Archaeologist · Field Notes from BerriAI/litellm
Vol. I · Field Notes

BerriAI/litellm

Library to easily interface with LLM API providers

9 May 2026 · a vast project
Reading Posture
From the Field
Overengineered LLM gateway that solves problems you probably don't have.
Verdict: Worth a look
Reach for it when

You need a single API to manage 100+ LLMs with enterprise features like rate limiting, logging, and failover.

Look elsewhere when

You just want to call a few models from your app and don't need a full proxy server or enterprise overhead.

In context

It's like OpenAI's API proxy but self-hosted and bloated with 258K LOC for what should be a thin wrapper.

Complexity: ●●● Heavy
Read time: ~30 minutes
Language: Python
Runtime: Python >=3.10, <3.14
Dependencies: 0 total

What using it looks like

Drawn from the project's README

uv add litellm
Fig. 1 — installing litellm with uv

What this is

As told for the tourist

What Is This?

LiteLLM is a universal remote control for AI chatbots. Just like one remote can control your TV, soundbar, and streaming stick, LiteLLM lets you talk to dozens of different AI models (like GPT-4, Claude, Gemini) using the same simple commands, no matter which company made them.

What Can You Do With It?

You could use this to switch your app from OpenAI's GPT-4 to Anthropic's Claude by changing just one word in your code. Here's what that looks like in practice:

from litellm import completion

# One line for OpenAI
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hello!"}])

# Change one word for Anthropic
response = completion(model="anthropic/claude-3", messages=[{"role": "user", "content": "Hello!"}])

You could also run it as a server that sits between your app and all the AI companies. Just install it and start it up:

litellm --model gpt-4o

Now your whole team can call any AI model through this one server, and LiteLLM handles the billing, security, and keeping track of who's using what.
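
Because the proxy speaks the OpenAI wire format, existing OpenAI client code can be pointed at it without changes. A minimal sketch, assuming the proxy above is running on its default port (4000) and that provider API keys are set in the environment:

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy
# instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-anything",  # the proxy can enforce its own virtual keys; assumed open here
)

# The proxy forwards this to whichever backend serves "gpt-4o".
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)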


How It Works (No Jargon)

It's like a universal translator but for AI requests. When you ask GPT-4 something, you use a specific format. When you ask Claude, it expects a different format. LiteLLM takes your request in one standard format and automatically translates it into whatever each AI company expects. You write one way, it speaks a dozen languages.
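
The translation runs in both directions: whichever provider answers, the response comes back in one standard, OpenAI-style shape. A minimal sketch, reusing the illustrative model name from above:

from litellm import completion

response = completion(
    model="anthropic/claude-3",  # any supported provider/model string
    messages=[{"role": "user", "content": "Hello!"}],
)

# Every provider's answer is normalized to the OpenAI response schema.
print(response.choices[0].message.content)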

It's like a smart mailroom for your AI calls. Imagine every AI request is a package. LiteLLM's router decides which package goes to which AI company, checks if you're allowed to send it, and keeps a log of everything that went out. If one AI company is slow or down, it can automatically send your request to another one instead.
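
In code, this mailroom is litellm's Router. A minimal sketch of the failover idea, assuming API keys are set via environment variables and using illustrative deployment names:

from litellm import Router

router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "openai/gpt-4o"}},
        {"model_name": "backup", "litellm_params": {"model": "anthropic/claude-3"}},
    ],
    # If "primary" fails, retry the same request on "backup".
    fallbacks=[{"primary": ["backup"]}],
)

response = router.completion(
    model="primary",
    messages=[{"role": "user", "content": "Hello!"}],
)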

It's like a hotel concierge who remembers everything. When you use LiteLLM as a server, it tracks how many requests each person or team makes, how much they cost, and who's authorized to use which AI models. It's like having a billing system, security guard, and usage dashboard all built into one.

What's Cool About It?

The coolest thing is that it's completely open source and you can run it on your own computers. Most companies that offer this "AI gateway" service charge per request or lock you into their cloud. LiteLLM gives you the same power for free, and you can see exactly how every piece works.

It also handles the annoying parts automatically. If an AI company changes their pricing or adds a new feature, LiteLLM updates to support it. You don't have to rewrite your code every time a company tweaks their API.
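
One concrete example: litellm ships a bundled per-model price table and a cost helper, so spend tracking needs no extra setup. A hedged sketch:

from litellm import completion, completion_cost

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Looks up the model's per-token pricing from litellm's bundled table.
cost = completion_cost(completion_response=response)
print(f"cost: ${cost:.6f}")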

Who Should Care?

Reach for this if: You're building an app that uses AI and you want to be able to switch between models without rewriting everything. Or if you're running a team that shares AI access and you need to track costs and control who uses what.

Skip it if: You're only ever going to use one AI model from one company forever. Or if you're just playing around with AI in a notebook and don't need the extra complexity. For simple experiments, just call the AI directly.

Start Here

A recommended reading path through the code

1. Reveals the global configuration, entry points, and top-level exports that define the library's public API.

2. Central orchestration file containing the core completion, embedding, and streaming logic that all providers funnel through.

3. Exposes the load balancing and model deployment selection architecture, critical for understanding multi-provider routing.

4. Demonstrates the key abstraction for converting between OpenAI and LiteLLM formats, revealing the internal data model mapping.

5. Illustrates the provider-specific integration pattern with authentication, error handling, and shared utilities for a major backend.

What's inside

16 sections of the codebase

Read Next

Where to go from here

Sibling Projects

Codebases that occupy adjacent space

Related Expeditions
litellm · OpenAI Python SDK · Portkey Gateway · LangChain · MLC LLM


Words You'll Hear

Definitions for the terms used throughout these notes

Adapter pattern (pattern): A design pattern that allows incompatible interfaces to work together by wrapping one interface with another.

API gateway (concept): A server that acts as a single entry point for multiple backend services, handling requests, authentication, and routing.

Async buffered writes (pattern): Writing data to a storage system in batches asynchronously to improve performance by reducing I/O operations.

Bidirectional transformation (concept): Converting data in two directions, such as from one API format to another and back again.

Circuit breaker (pattern): A pattern that prevents repeated failed requests to a service by temporarily stopping requests after a threshold of failures.

Connection pooling (concept): A technique where a set of network connections is reused to avoid the overhead of creating new connections.

Dependency graph (concept): A diagram showing how different parts of a codebase depend on each other, often revealing tangled relationships.

Factory pattern (pattern): A design pattern that creates objects without specifying the exact class of object that will be created.

FastAPI (tool): A modern Python web framework for building APIs with automatic request validation and documentation.

Frozen set (concept): An immutable version of a Python set, used for efficient membership checks and as dictionary keys.

Lua script (tool): A small program written in Lua, a lightweight scripting language, often run inside Redis to execute multiple commands atomically.

MCP (concept): Model Context Protocol, a standard for connecting external tools and data sources to AI models.

OpenTelemetry (tool): A set of tools and APIs for collecting and exporting telemetry data like traces and metrics.

Plugin-core architecture (pattern): An architecture where core functionality is extended by pluggable modules that add features without changing the core.

Prometheus (tool): An open-source monitoring system that collects metrics from services and stores them in a time-series database.

Rate limiting (concept): A technique to control the number of requests a user can make to a service within a specific time window.

RBAC (concept): Role-Based Access Control, a method of restricting system access based on a user's assigned roles.

Redis (tool): An in-memory data store used for caching, real-time counters, and coordinating multiple server instances.

Regex pattern (concept): A sequence of characters that defines a search pattern, used for string matching.

Sliding window (concept): A time-based algorithm for rate limiting that tracks requests within a moving time interval.

SSO (concept): Single Sign-On, an authentication method that allows users to log in once and access multiple applications.

Strategy pattern (pattern): A design pattern where different algorithms (strategies) are selected at runtime based on configuration or conditions.

TOCTOU (concept): Time-of-check to time-of-use, a race condition where a resource's state changes between checking it and using it.
