Using GPTCache with LiteLLM

GPTCache is a library for creating a semantic cache for LLM queries.

GPTCache Docs: https://gptcache.readthedocs.io/en/latest/index.html#

GPTCache GitHub: https://github.com/zilliztech/GPTCache

In this document we cover:

  • Quick Start Usage
  • Advanced Usage - Set Custom Cache Keys

Quick Start Usage

👉 Jump to Colab Notebook Example

Install GPTCache

pip install gptcache
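
The examples in this document also call LiteLLM's completion wrapper, so install LiteLLM as well if you don't already have it:

pip install litellm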

Using GPTCache with LiteLLM completion()

Using GPTCache

To use GPTCache, instantiate it with the following lines:

from gptcache import cache
# set API keys in .env / os.environ
cache.init()
cache.set_openai_key()
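
cache.set_openai_key() reads your OpenAI key from the OPENAI_API_KEY environment variable, so make sure it is set (e.g. via your .env) before initializing the cache.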

Full Code using GPTCache and LiteLLM

By default, GPTCache uses the content in messages as the cache key.

import os
import time

from gptcache import cache
from litellm.gpt_cache import completion # import completion from litellm.gpt_cache

# Set your .env keys
os.environ['OPENAI_API_KEY'] = ""
cache.init()
cache.set_openai_key()

question = "what's LiteLLM"
for _ in range(2):
    start_time = time.time()
    response = completion(
        model='gpt-3.5-turbo',
        messages=[
            {
                'role': 'user',
                'content': question
            }
        ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))

Advanced Usage - Set Custom Cache Keys

By default, GPTCache uses the messages as the cache key.

GPTCache allows you to set custom cache keys by setting:

cache.init(pre_func=pre_cache_func)

In the code snippet below, we define a pre_func that returns the message content + model as the cache key.

Defining a pre_func for GPTCache

### using / setting up gpt cache
from typing import Any, Dict

from gptcache import cache
from gptcache.processor.pre import last_content_without_prompt

# use this function to set your cache keys -> gptcache
# data contains all the kwargs passed to your completion call
def pre_cache_func(data: Dict[str, Any], **params: Dict[str, Any]) -> Any:
    # use this to set the cache key
    print("in pre_cache_func")
    last_content_without_prompt_val = last_content_without_prompt(data, **params)
    print("last content without prompt", last_content_without_prompt_val)
    print("model", data["model"])
    cache_key = last_content_without_prompt_val + data["model"]
    print("cache_key", cache_key)
    return cache_key # using this as the cache_key
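
A pre_func does not have to use GPTCache's built-in pre-processors; it receives the kwargs of the completion call and whatever string it returns is used as the cache key, as in the snippet above. Below is a minimal sketch (the name simple_pre_cache_func is illustrative, and it assumes messages and model are always present in the call):

from typing import Any, Dict

def simple_pre_cache_func(data: Dict[str, Any], **params: Dict[str, Any]) -> str:
    # data holds the kwargs passed to completion(); build the key from the
    # last message's content plus the model name
    last_message_content = data["messages"][-1]["content"]
    return last_message_content + data["model"]

# pass it to cache.init(pre_func=simple_pre_cache_func) instead of pre_cache_func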

Init Cache with pre_func to set custom keys

# init GPT Cache with custom pre_func
cache.init(pre_func=pre_cache_func)
cache.set_openai_key()

Using Cache

  • Cache key is message + model

We make 3 LLM API calls

  • 2 to OpenAI
  • 1 to Cohere command-nightly

from litellm.gpt_cache import completion # completion wrapper from the quickstart above

messages = [{"role": "user", "content": "why should I use LiteLLM for completions()"}]
response1 = completion(model="gpt-3.5-turbo", messages=messages)
response2 = completion(model="gpt-3.5-turbo", messages=messages)
response3 = completion(model="command-nightly", messages=messages) # calling cohere command-nightly

if response1["choices"] != response2["choices"]: # same model + prompt should be served from cache
    print("Error occurred: Caching for same model+prompt failed")

if response3["choices"] == response2["choices"]: # different models should not share a cache entry
    # if models are different, it should not return a cached response
    print("Error occurred: Caching for different model+prompt failed")

print("response1", response1)
print("response2", response2)
print("response3", response3)