OpenAI Proxy Server
Use this to spin up a proxy API that translates OpenAI API calls to any non-OpenAI model (e.g. Huggingface, TogetherAI, Ollama, etc.)
This works for async + streaming as well.
Works with ALL MODELS supported by LiteLLM. For the full list of supported providers, see the Provider List.
Requirements
Make sure the relevant API keys are set in your local .env.
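For example, for the Huggingface quick start below, a minimal .env might look like this (the key name matches the export commands used elsewhere on this page; the value is a placeholder):

# .env
HUGGINGFACE_API_KEY=my-huggingface-api-key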
Quick Start
Call Huggingface models through your OpenAI proxy.
Start Proxy
Run this in your CLI.
$ pip install litellm
$ litellm --model huggingface/bigcode/starcoder
#INFO: Uvicorn running on http://0.0.0.0:8000
This will host a local proxy API at: http://0.0.0.0:8000
Test it
Using the OpenAI Python SDK:
import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "temp-key"  # placeholder; the SDK requires a value, the local proxy does not need a real key

print(openai.ChatCompletion.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))
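The same proxy endpoint also handles streaming and async calls. A minimal sketch, assuming the pre-1.0 openai Python SDK used above (the placeholder key and the model name "test" are illustrative):

import asyncio
import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "temp-key"  # placeholder; assumed to be accepted by the local proxy

# Streaming: pass stream=True and iterate over chunks as they arrive
for chunk in openai.ChatCompletion.create(
    model="test",
    messages=[{"role": "user", "content": "Hey!"}],
    stream=True,
):
    print(chunk)

# Async: the same call via acreate
async def main():
    response = await openai.ChatCompletion.acreate(
        model="test",
        messages=[{"role": "user", "content": "Hey!"}],
    )
    print(response)

asyncio.run(main())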
Using curl:
curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"role": "user",
"content": "what do you know?"
}
]
}'
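Either way, the proxy returns a response in the standard OpenAI chat-completion shape, whatever model is running behind it. An abridged, illustrative example (IDs, content, and token counts will differ):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 42, "total_tokens": 51}
}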
Other supported models:
Anthropic
$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1

Huggingface
$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --model huggingface/bigcode/starcoder

TogetherAI
$ export TOGETHERAI_API_KEY=my-api-key
$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k

Replicate
$ export REPLICATE_API_KEY=my-api-key
$ litellm \
  --model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3

Petals
$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf

Palm
$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison

Azure OpenAI
$ export AZURE_API_KEY=my-api-key
$ export AZURE_API_BASE=my-api-base
$ export AZURE_API_VERSION=my-api-version
$ litellm --model azure/my-deployment-id

AI21
$ export AI21_API_KEY=my-api-key
$ litellm --model j2-light

Cohere
$ export COHERE_API_KEY=my-api-key
$ litellm --model command-nightly
Setting API base, temperature, max tokens
litellm --model huggingface/bigcode/starcoder \
--api_base https://my-endpoint.huggingface.cloud \
--max_tokens 250 \
--temperature 0.5
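These flags configure the proxy itself, so requests routed through it can pick up the API base, max_tokens, and temperature without passing them on every call. A sketch of what a client call then looks like, again assuming the pre-1.0 openai SDK (no per-request temperature or max_tokens needed):

import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "temp-key"  # placeholder

# max_tokens and temperature are omitted here; the proxy was started with them
response = openai.ChatCompletion.create(
    model="test",
    messages=[{"role": "user", "content": "Write a one-line docstring for a proxy server."}],
)
print(response)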
Ollama example
$ litellm --model ollama/llama2 --api_base http://localhost:11434
Tutorial - using with Aider
Aider is an AI pair programming tool that runs in your terminal.
However, it only accepts OpenAI API calls.
In this tutorial we'll use Aider with WizardCoder (hosted on HF Inference Endpoints).
[NOTE]: To learn how to deploy a model on Huggingface Inference Endpoints, see the Huggingface documentation.
Step 1: Install aider and litellm
$ pip install aider-chat litellm
Step 2: Spin up local proxy
Save your Huggingface API key in your local environment (you can also do this via .env):
$ export HUGGINGFACE_API_KEY=my-huggingface-api-key
Point your local proxy to your model endpoint
$ litellm \
--model huggingface/WizardLM/WizardCoder-Python-34B-V1.0 \
--api_base https://my-endpoint.huggingface.com
This will host a local proxy API at: http://0.0.0.0:8000
Step 3: Replace the OpenAI API base in Aider
Aider lets you set the OpenAI API base, so let's point it to our proxy instead.
$ aider --openai-api-base http://0.0.0.0:8000
And that's it!