Local Models

Many developers download and run open source models like Llama-3, Mixtral, Gemma, Phi, and more locally. There are quite a few ways to run these models locally, and Weave supports several of them out of the box, as long as they offer OpenAI SDK compatibility.

Wrap local model functions with @weave.op()

You can integrate Weave with any LLM yourself by initializing Weave with weave.init('<your-project-name>') and then wrapping the functions that call your LLMs with @weave.op(). See our guide on tracing for more details.
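
For example, here is a minimal sketch; the project name, model name, and server address are placeholders for your own setup:

import weave
import openai

weave.init('local-models-demo')  # illustrative project name

# Point the OpenAI SDK at a locally running, OpenAI-compatible server
# (the base_url and api_key details are explained in the next section)
client = openai.OpenAI(api_key='fake', base_url="http://localhost:1234")

@weave.op()
def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="llama-3",  # whichever model name your local runner exposes
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content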

Updating your OpenAI SDK code to use local models

All of the frameworks and services that support OpenAI SDK compatibility require only a few minor changes.

First, and most important, is the base_url change when initializing openai.OpenAI().

import openai

client = openai.OpenAI(
    api_key='fake',
    base_url="http://localhost:1234",
)

In the case of local models, the api_key can be any string, but it should be set explicitly: otherwise the OpenAI SDK will try to read it from the OPENAI_API_KEY environment variable and raise an error if it is not set.
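
With the client configured this way, requests are made exactly as with the hosted API; only the model name changes to whatever your local runner exposes (the name below is a placeholder):

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder; use the model name your runner reports
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(response.choices[0].message.content)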

OpenAI SDK supported Local Model runners

Here's a list of apps that let you download and run models from Hugging Face on your computer and that support OpenAI SDK compatibility.

  1. Nomic GPT4All - supported via the Local Server option in settings (see the FAQ)
  2. LMStudio - Local Server OpenAI SDK support docs
  3. Ollama - experimental support for the OpenAI SDK
  4. llama.cpp via the llama-cpp-python Python package
  5. llamafile - automatically serves an OpenAI SDK-compatible API at http://localhost:8080/v1 when run (see the sketch after this list)
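
For instance, here is a sketch of pointing the client at a running llamafile instance, which serves an OpenAI-compatible API at http://localhost:8080/v1; other runners differ only in the base_url and the model names they expose:

import openai

# llamafile exposes an OpenAI-compatible endpoint at this address once started
client = openai.OpenAI(
    api_key='fake',
    base_url="http://localhost:8080/v1",
)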