Archaeologist · Field Notes from BerriAI/litellm
Vol. I · Field Notes

BerriAI/litellm

Library to easily interface with LLM API providers

9 May 2026 · a vast project
Reading Posture
From the Field
Overengineered LLM gateway that solves problems you probably don't have.
Verdict: Worth a look
Reach for it when

You need a single API to manage 100+ LLMs with enterprise features like rate limiting, logging, and failover.

Look elsewhere when

You just want to call a few models from your app and don't need a full proxy server or enterprise overhead.

In context

It's like OpenAI's API proxy but self-hosted and bloated with 258K LOC for what should be a thin wrapper.

Complexity: ●●● Heavy
Read time: ~30 minutes
Language: Python
Runtime: Python >=3.10, <3.14
Dependencies: 0 total

What using it looks like

Drawn from the project's README

uv add litellm
Fig. 1 — installing litellm with uv

What this is

As told for the tourist

What Is This?

LiteLLM is a universal remote control for AI chatbots. Just like one remote can control your TV, soundbar, and streaming stick, LiteLLM lets you talk to dozens of different AI models (like GPT-4, Claude, Gemini) using the same simple commands, no matter which company made them.

What Can You Do With It?

You could use this to switch your app from OpenAI's GPT-4 to Anthropic's Claude by changing just one word in your code. Here's what that looks like in practice:

from litellm import completion

# One line for OpenAI
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hello!"}])

# Change one word for Anthropic
response = completion(model="anthropic/claude-3", messages=[{"role": "user", "content": "Hello!"}])

You could also run it as a server that sits between your app and all the AI companies. Just install it and start it up:

litellm --model gpt-4o

Now your whole team can call any AI model through this one server, and LiteLLM handles the billing, security, and keeping track of who's using what.
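
Because the proxy speaks the OpenAI wire format, existing OpenAI client code can be pointed at it without changes. A minimal sketch, assuming the proxy above is running on its default port (4000) and that provider API keys are set in the environment:

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy
# instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-anything",  # the proxy can enforce its own virtual keys; assumed open here
)

# The proxy forwards this to whichever backend serves "gpt-4o".
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)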


How It Works (No Jargon)

It's like a universal translator but for AI requests. When you ask GPT-4 something, you use a specific format. When you ask Claude, it expects a different format. LiteLLM takes your request in one standard format and automatically translates it into whatever each AI company expects. You write one way, it speaks a dozen languages.
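
The translation runs in both directions: whichever provider answers, the response comes back in one standard, OpenAI-style shape. A minimal sketch, reusing the illustrative model name from above:

from litellm import completion

response = completion(
    model="anthropic/claude-3",  # any supported provider/model string
    messages=[{"role": "user", "content": "Hello!"}],
)

# Every provider's answer is normalized to the OpenAI response schema.
print(response.choices[0].message.content)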

It's like a smart mailroom for your AI calls. Imagine every AI request is a package. LiteLLM's router decides which package goes to which AI company, checks if you're allowed to send it, and keeps a log of everything that went out. If one AI company is slow or down, it can automatically send your request to another one instead.
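
In code, this mailroom is litellm's Router. A minimal sketch of the failover idea, assuming API keys are set via environment variables and using illustrative deployment names:

from litellm import Router

router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "openai/gpt-4o"}},
        {"model_name": "backup", "litellm_params": {"model": "anthropic/claude-3"}},
    ],
    # If "primary" fails, retry the same request on "backup".
    fallbacks=[{"primary": ["backup"]}],
)

response = router.completion(
    model="primary",
    messages=[{"role": "user", "content": "Hello!"}],
)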

It's like a hotel concierge who remembers everything. When you use LiteLLM as a server, it tracks how many requests each person or team makes, how much they cost, and who's authorized to use which AI models. It's like having a billing system, security guard, and usage dashboard all built into one.

What's Cool About It?

The coolest thing is that it's completely open source and you can run it on your own computers. Most companies that offer this "AI gateway" service charge per request or lock you into their cloud. LiteLLM gives you the same power for free, and you can see exactly how every piece works.

It also handles the annoying parts automatically. If an AI company changes their pricing or adds a new feature, LiteLLM updates to support it. You don't have to rewrite your code every time a company tweaks their API.
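
One concrete example: litellm ships a bundled per-model price table and a cost helper, so spend tracking needs no extra setup. A hedged sketch:

from litellm import completion, completion_cost

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Looks up the model's per-token pricing from litellm's bundled table.
cost = completion_cost(completion_response=response)
print(f"cost: ${cost:.6f}")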

Who Should Care?

Reach for this if: You're building an app that uses AI and you want to be able to switch between models without rewriting everything. Or if you're running a team that shares AI access and you need to track costs and control who uses what.

Skip it if: You're only ever going to use one AI model from one company forever. Or if you're just playing around with AI in a notebook and don't need the extra complexity. For simple experiments, just call the AI directly.

Start Here

A recommended reading path through the code

1. Reveals the global configuration, entry points, and top-level exports that define the library's public API.

2. Central orchestration file containing the core completion, embedding, and streaming logic that all providers funnel through.

3. Exposes the load balancing and model deployment selection architecture, critical for understanding multi-provider routing.

4. Demonstrates the key abstraction for converting between OpenAI and LiteLLM formats, revealing the internal data model mapping.

5. Illustrates the provider-specific integration pattern with authentication, error handling, and shared utilities for a major backend.

What's inside

16 sections of the codebase

Read Next

Where to go from here

Sibling Projects

Codebases that occupy adjacent space

Related Expeditions
litellm · OpenAI Python SDK · Portkey Gateway · LangChain · MLC LLM


Words You'll Hear

Definitions for the terms used throughout these notes

Adapter pattern (pattern): A design pattern that allows incompatible interfaces to work together by wrapping one interface with another.

API gateway (concept): A server that acts as a single entry point for multiple backend services, handling requests, authentication, and routing.

Async buffered writes (pattern): Writing data to a storage system in batches asynchronously to improve performance by reducing I/O operations.

Bidirectional transformation (concept): Converting data in two directions, such as from one API format to another and back again.

Circuit breaker (pattern): A pattern that prevents repeated failed requests to a service by temporarily stopping requests after a threshold of failures.

Connection pooling (concept): A technique where a set of network connections is reused to avoid the overhead of creating new connections.

Dependency graph (concept): A diagram showing how different parts of a codebase depend on each other, often revealing tangled relationships.

Factory pattern (pattern): A design pattern that creates objects without specifying the exact class of object that will be created.

FastAPI (tool): A modern Python web framework for building APIs with automatic request validation and documentation.

Frozen set (concept): An immutable version of a Python set, used for efficient membership checks and as dictionary keys.

Lua script (tool): A small program written in Lua, a lightweight scripting language, often run inside Redis to execute multiple commands atomically.

MCP (concept): Model Context Protocol, a standard for connecting external tools and data sources to AI models.

OpenTelemetry (tool): A set of tools and APIs for collecting and exporting telemetry data like traces and metrics.

Plugin-core architecture (pattern): An architecture where core functionality is extended by pluggable modules that add features without changing the core.

Prometheus (tool): An open-source monitoring system that collects metrics from services and stores them in a time-series database.

Rate limiting (concept): A technique to control the number of requests a user can make to a service within a specific time window.

RBAC (concept): Role-Based Access Control, a method of restricting system access based on a user's assigned roles.

Redis (tool): An in-memory data store used for caching, real-time counters, and coordinating multiple server instances.

Regex pattern (concept): A sequence of characters that defines a search pattern, used for string matching.

Sliding window (concept): A time-based algorithm for rate limiting that tracks requests within a moving time interval.

SSO (concept): Single Sign-On, an authentication method that allows users to log in once and access multiple applications.

Strategy pattern (pattern): A design pattern where different algorithms (strategies) are selected at runtime based on configuration or conditions.

TOCTOU (concept): Time-of-check to time-of-use, a race condition where a resource's state changes between checking it and using it.
