Archaeologist · Field Notes from stanfordnlp/dspy
Vol. I · Field Notes

stanfordnlp/dspy

DSPy

9 May 2026 · a substantial project
Reading Posture
From the Field
DSPy: The most important LM framework you're not using yet.
Verdict: Reach for it
Reach for it when

You need to build reliable, optimizable LM pipelines beyond simple prompting.

Look elsewhere when

You just want a quick one-off chat completion or are allergic to learning a new paradigm.

In context

It's like LangChain but with actual programmatic optimization instead of just orchestration.

Complexity: ●●● Heavy
Read time: ~30 minutes
Language: Python
Runtime: Python >=3.10, <3.15
Dependencies: 0 total

What using it looks like

Drawn from the project's README:

pip install dspy

What this is

As told for the tourist

What Is This?

DSPy is a toolkit that lets you teach AI language models (like the ones behind ChatGPT) to do complex tasks by writing regular Python code instead of endlessly tweaking prompts. Think of it as a coach that trains an AI to follow a recipe, so you don't have to hand-write every single instruction each time.

What Can You Do With It?

You could use this to build a system that automatically answers customer support questions by searching your company's knowledge base, or create a tool that summarizes long research papers into bullet points. The README shows you can build "simple classifiers, sophisticated RAG pipelines, or Agent loops" — which just means anything from sorting emails into folders to building a research assistant that reads documents and writes reports.

Here's how simple it is to get started:

pip install dspy

Then you'd write something like:

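import dspy

# Define what you want the AI to do
class AnswerQuestion(dspy.Module):
    def __init__(self):
        super().__init__()  # let dspy.Module set up its own state first
        # "question -> answer" declares one input field and one output field
        self.answer = dspy.Predict("question -> answer")

    def forward(self, question):
        # Run the predictor on the incoming question and return its prediction
        return self.answer(question=question)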

That's it — you've just created a reusable AI program.
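
If the string "question -> answer" looks like magic, it helps to see how little is in it. The sketch below is a toy: `parse_signature` is a hypothetical helper written for this note, not DSPy's own parser (which also understands types and more). It only shows the idea that the left side names the inputs and the right side names the outputs.

```python
def parse_signature(signature: str) -> tuple[list[str], list[str]]:
    """Split 'a, b -> c' into (['a', 'b'], ['c']).

    Hypothetical helper for illustration only; not part of DSPy.
    """
    if signature.count("->") != 1:
        raise ValueError("expected exactly one '->' in the signature")
    left, right = signature.split("->")
    # Field names are comma-separated; surrounding whitespace is ignored.
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError("need at least one input and one output field")
    return inputs, outputs


inputs, outputs = parse_signature("question -> answer")
print(inputs, outputs)  # ['question'] ['answer']
```

The same toy accepts several inputs, for example "context, question -> answer".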

How It Works (No Jargon)

1. Programs instead of prompts — It's like writing a recipe for a chef, rather than shouting instructions at them each time. You define the steps (search, summarize, format) once, and DSPy remembers the pattern.

2. Automatic teaching — It's like having a tutor that watches how the AI performs, then quietly adjusts the instructions to make it better. You give it examples of good outputs, and it figures out the best way to ask the AI to produce those results.

3. Self-improvement — Imagine a coach that watches your golf swing, then suggests tiny adjustments to your grip and stance. DSPy does this for AI prompts — it tries different wordings and examples, keeps what works, and throws away what doesn't.
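
The "tries different wordings, keeps what works" idea from the list above can be sketched in a few lines. Everything here is an assumption made for illustration: `fake_model` stands in for a real language model, and `pick_best_instruction` is a hypothetical helper, not a DSPy function. Real optimizers are far more sophisticated, but the loop has the same shape: score each candidate on examples, keep the winner.

```python
# Hypothetical stand-in for a language model: it answers tersely only
# when the instruction asks for a short answer. A real model call would go here.
def fake_model(instruction: str, question: str) -> str:
    answer = {"capital of France?": "Paris", "2 + 2?": "4"}[question]
    if "short" in instruction.lower():
        return answer
    return f"Well, the answer to '{question}' is probably {answer}."


def exact_match(predicted: str, expected: str) -> bool:
    """Metric: the prediction must equal the expected answer exactly."""
    return predicted.strip() == expected


def pick_best_instruction(candidates, examples, model, metric):
    """Score every candidate instruction on the examples and keep the best one."""
    def score(instruction):
        return sum(metric(model(instruction, q), a) for q, a in examples)
    return max(candidates, key=score)


examples = [("capital of France?", "Paris"), ("2 + 2?", "4")]
candidates = ["Answer the question.", "Give a short answer with no extra words."]
best = pick_best_instruction(candidates, examples, fake_model, exact_match)
print(best)  # Give a short answer with no extra words.
```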

What's Cool About It?

You stop fighting with prompts. Normally, you'd spend hours tweaking "Please answer concisely" to "Answer in 2-3 sentences" and still get inconsistent results. DSPy treats prompts like code — you write the logic once, and it optimizes the wording automatically.

It works with many AI models. Whether you're using OpenAI's GPT, an open-source model running on your laptop, or something from Anthropic, you write your program once and point DSPy at whichever supported model you plug in.
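
Swapping models without rewriting the program usually comes down to one small shared interface. The sketch below shows that general pattern in plain Python. The names `LanguageModel`, `EchoModel`, `ShoutingModel`, and `answer_question` are all made up for this note and are not DSPy classes; the two "models" are trivial fakes so the example runs offline.

```python
from typing import Protocol


class LanguageModel(Protocol):
    """Anything with a complete(prompt) -> str method counts as a model here."""
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Fake backend that repeats the prompt back."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ShoutingModel:
    """Fake backend that upper-cases the prompt."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()


def answer_question(model: LanguageModel, question: str) -> str:
    # The program only depends on the shared interface, not on a specific backend.
    return model.complete(f"Question: {question}")


for backend in (EchoModel(), ShoutingModel()):
    print(answer_question(backend, "what is DSPy?"))
```

The program (`answer_question`) never changes; only the object passed in does.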

Who Should Care?

Reach for this if you're building anything that uses AI to process information in multiple steps — like a research assistant, a customer service bot, or a content generator. It's perfect if you're tired of copy-pasting prompts and want something that actually improves over time.

Skip it if you just need a one-off answer from ChatGPT, or if you're building a simple chatbot that doesn't need to follow complex instructions. For basic "ask a question, get an answer" scenarios, DSPy is overkill — like using a full kitchen to make toast.

Start Here

A recommended reading path through the code

  1. Reveals the public API surface and top-level abstractions, giving a bird's-eye view of the entire library's structure.

  2. Exposes the key optimization/teleprompter classes, which are central to DSPy's core workflow of program optimization.

  3. Implements BootstrapFewShot, a fundamental teleprompter that demonstrates the core pattern of composing demonstrations from traces.

  4. Shows a more advanced optimization strategy (COPRO) for iterative instruction generation, revealing how prompts are refined.

  5. Illustrates the client/provider abstraction for interacting with LLMs, a critical architectural layer for model access.
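
Before opening the bootstrapping code from step 3, it may help to see the pattern of "composing demonstrations from traces" in miniature. This is a toy under stated assumptions: `bootstrap_demos` and `toy_program` are hypothetical names invented for this note, and the real BootstrapFewShot does considerably more. The core move is the same, though: run the program on training examples and keep only the runs that pass the metric.

```python
def bootstrap_demos(program, trainset, metric, max_demos=2):
    """Run the program on each example and keep passing runs as demonstrations."""
    demos = []
    for example in trainset:
        prediction = program(example["question"])
        if metric(example, prediction):
            # A successful run becomes a worked example for later prompts.
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) >= max_demos:
            break
    return demos


def toy_program(question: str) -> str:
    """Stand-in for an LM program: it can only add two integers written as 'a + b'."""
    left, _, right = question.partition("+")
    try:
        return str(int(left) + int(right))
    except ValueError:
        return "unknown"


trainset = [
    {"question": "1 + 2", "answer": "3"},
    {"question": "two plus two", "answer": "4"},  # the toy program fails this one
    {"question": "10 + 5", "answer": "15"},
]
demos = bootstrap_demos(toy_program, trainset, lambda ex, pred: ex["answer"] == pred)
print(demos)
```

The failed example is silently dropped, so only the two successful runs end up as demonstrations.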

Read Next

Where to go from here

📰 Article · 2024

DSPy: The Future of Prompt Engineering Is Programmatic

Towards Data Science

A gentle, plain-English introduction to DSPy's core ideas without requiring deep ML knowledge.

📺 Video · 2024

DSPy Explained Simply

YouTube (AI Explained)

A 15-minute visual walkthrough that makes the three abstractions (Signature, Module, Teleprompter) intuitive.

📰 Article · 2024

DSPy: The Most Important LM Framework You're Not Using

Simon Willison's Blog

A well-known blogger's take on why DSPy matters, with concrete examples of how it differs from LangChain.

🌳 Repo · 2024

DSPy GitHub Repository

Stanford AI Lab

The official repo with a quick-start guide and examples that let you run your first DSPy program in minutes.

Sibling Projects

Codebases that occupy adjacent space

Related Expeditions
dspy · 🔗 LangChain · 🎯 Guidance · 📐 Outlines · 🧑‍🏫 Instructor · 📚 LlamaIndex

Words You'll Hear

Abstraction

concept

A simplified representation of a complex system that hides unnecessary details, allowing you to work with higher-level concepts instead of low-level implementation.

Adapter

pattern

A component that converts one interface or format into another, enabling different systems to work together without modifying their core code.

Bayesian optimization

concept

A strategy for finding the best settings of a system by building a probability model of which settings work well, then testing the most promising ones.

Black-box function

concept

A system or process where you can only see its inputs and outputs, but not how it works internally, so you must optimize it by trial and error.

Bootstrap

concept

A technique that generates training examples by running a system on sample data and keeping only the successful attempts.

Callback system

pattern

A mechanism that lets you run custom code at specific points during a process, like before or after a step completes.

Composite pattern

pattern

A design where individual components and groups of components are treated the same way, allowing you to build complex structures from simple parts.

Coupling

concept

The degree to which different parts of a system depend on each other; high coupling makes changes harder because modifying one part requires changing others.

Directed graph

concept

A collection of connected points where each connection has a direction, like a flowchart showing the order of operations.

Factory method pattern

pattern

A design pattern where a method creates objects without specifying the exact class of object that will be created, allowing flexibility in what gets produced.

Few-shot demonstrations

concept

Examples provided to an AI model along with a question to help it understand the desired format or reasoning pattern.

Genetic algorithm

concept

An optimization method inspired by evolution that combines, mutates, and selects the best solutions over many generations.

Hyperparameter

concept

A setting that controls how a machine learning model learns, chosen before training begins rather than learned from data.

Introspect

concept

To examine the internal structure or code of a program while it is running, often used to discover what types or functions are available.

Metaclass

concept

A class that defines how other classes are created, allowing you to automatically add features or modify behavior when a new class is defined.

Optuna

library

A Python library for automated hyperparameter optimization that efficiently searches for the best settings using various algorithms.

Pydantic

library

A Python library that validates data types and structures automatically, ensuring inputs and outputs match expected formats.

REPL

concept

A Read-Eval-Print Loop, an interactive programming environment where you type code and see results immediately.

Sandboxed execution

concept

Running code in a restricted environment that prevents it from accessing sensitive files or causing harm to the main system.

Schema

concept

A formal description of the structure and types of data, like a blueprint that specifies what fields exist and what kind of values they can hold.

Strategy pattern

pattern

A design pattern that lets you swap different algorithms or behaviors at runtime without changing the code that uses them.

Temperature sampling

concept

A technique that controls how random or creative an AI model's responses are, with higher values producing more varied outputs.
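
A small numeric sketch can make the effect of temperature concrete. It assumes the common formulation in which raw scores (logits) are divided by the temperature before a softmax; `softmax_with_temperature` is a name chosen for this note, not a library function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]
cool = softmax_with_temperature(logits, 0.5)  # more confident, less varied
warm = softmax_with_temperature(logits, 2.0)  # flatter, more varied
print(cool)
print(warm)
```

At the lower temperature the top choice takes a larger share of the probability, so sampled outputs vary less.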

Template method pattern

pattern

A design pattern that defines the skeleton of an algorithm in a base class while letting subclasses override specific steps.

Trace

concept

A recorded sequence of steps and data from a single execution of a program, used for debugging or learning from successful runs.

WASM

concept

WebAssembly, a low-level binary format that runs code at near-native speed in web browsers and other environments.