
Not Diamond

When building complex LLM workflows, you may need to prompt different models depending on accuracy, cost, or latency requirements. Not Diamond routes each prompt in these workflows to the model best suited for it, helping you maximize accuracy while saving on model costs.

Getting started

Make sure you have created an account and generated an API key, then add your API key to your env as NOTDIAMOND_API_KEY.
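
Before running any of the snippets below, it can help to confirm the key is actually set. A minimal sketch; the placeholder string is not a real key:

import os

# Verify the key is available before making any Not Diamond calls.
# Set it in your shell (e.g. in .bashrc or a .env file) rather than
# hard-coding it in source.
api_key = os.environ.get("NOTDIAMOND_API_KEY")
assert api_key, "Set NOTDIAMOND_API_KEY in your environment first."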


From here, you can trace your Not Diamond calls with Weave and train custom routers on your Weave Evaluations.

Tracing

Weave integrates with Not Diamond's Python library to automatically log API calls. You only need to run weave.init() at the start of your workflow, then continue using the routed provider as usual:

import weave
from notdiamond import NotDiamond

# Initialize Weave before making any Not Diamond calls so they are traced
weave.init('notdiamond-quickstart')

client = NotDiamond()

# Route the prompt to the best-suited model among the candidates
session_id, provider = client.chat.completions.model_select(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."},
    ],
    model=['openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620'],
)

print("LLM called: ", provider.provider)   # openai, anthropic, etc.
print("Provider model: ", provider.model)  # gpt-4o, claude-3-5-sonnet-20240620, etc.

Custom routing

You can also train your own custom router on Weave Evaluations, allowing Not Diamond to route prompts according to evaluation performance for specialized use cases.

Start by training a custom router:

import weave
from weave.flow.eval import EvaluationResults
from weave.integrations.notdiamond.custom_router import train_router

# Build an Evaluation on gpt-4o and Claude 3.5 Sonnet
evaluation = weave.Evaluation(...)
gpt_4o = weave.Model(...)
sonnet = weave.Model(...)

# Collect evaluation results for each candidate model
model_evals = {
    'openai/gpt-4o': evaluation.get_eval_results(gpt_4o),
    'anthropic/claude-3-5-sonnet-20240620': evaluation.get_eval_results(sonnet),
}

# Train the custom router; the returned preference ID identifies it
# in future model_select requests
preference_id = train_router(
    model_evals=model_evals,
    prompt_column="prompt",
    response_column="actual",
    language="en",
    maximize=True,
)
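
For reference, here is a minimal sketch of what the elided Evaluation and Model definitions might look like. The dataset, scorer, and predict logic are hypothetical placeholders, and it assumes a recent Weave release in which scorers receive the model's prediction as output:

import weave
from openai import OpenAI

# Hypothetical scorer: exact-match accuracy against the expected answer
@weave.op()
def exact_match(expected: str, output: str) -> dict:
    return {"correct": expected.strip() == output.strip()}

# Hypothetical dataset; include the columns you pass to train_router
dataset = [
    {"prompt": "What is 2 + 2?", "expected": "4"},
    {"prompt": "Name the largest planet.", "expected": "Jupiter"},
]

evaluation = weave.Evaluation(dataset=dataset, scorers=[exact_match])

class OpenAIModel(weave.Model):
    model_name: str = "gpt-4o"

    @weave.op()
    def predict(self, prompt: str) -> str:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

gpt_4o = OpenAIModel(model_name="gpt-4o")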

By reusing this preference ID in any model_select request, you can route your prompts to maximize performance and minimize cost on your evaluation data:

import weave
from notdiamond import NotDiamond

weave.init('notdiamond-quickstart')
client = NotDiamond()

session_id, provider = client.chat.completions.model_select(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."},
    ],
    model=['openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620'],
    # Passing this preference ID reuses your custom router
    preference_id=preference_id,
)

print("LLM called: ", provider.provider)   # openai, anthropic, etc.
print("Provider model: ", provider.model)  # gpt-4o, claude-3-5-sonnet-20240620, etc.

Additional support

Visit the docs or send us a message for further support.