Archaeologist·Field Notes from TabbyML/tabby
Vol. I · Field Notes

TabbyML/tabby

9 May 2026 · a sprawling project
Reading Posture
From the Field
Solid self-hosted Copilot alternative for teams that want GPU control.
Verdict: Reach for it
Reach for it when

You need an on-prem code assistant and have a GPU to run it on.

Look elsewhere when

You want a zero-ops cloud experience or lack GPU hardware.

In context

It's like Continue.dev but self-contained with no DB or cloud dependency.

Complexity: Medium
Read time: ~30 minutes
Language: JavaScript
Runtime: Node.js >= 18
Dependencies: 1 total

What using it looks like

Drawn from the project's README

From the README
docker run -it \
  --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct
Fig. 1 — example 1 of 4

What this is

As told for the tourist

What Is This?

Tabby is an AI assistant that helps you write code, but it runs entirely on your own computer instead of sending your code to a cloud service like GitHub Copilot does. Think of it like having a super-smart coworker who sits next to you, watches what you're typing, and suggests the next lines of code — but that coworker never leaves your office or shares your work with anyone else.

What Can You Do With It?

You could use this to get code suggestions as you type in VS Code, just like autocomplete on steroids. For example, if you start typing a function that calculates shipping costs, Tabby might suggest the rest of the function based on patterns it's seen in your codebase.

You could also ask it questions in a chat panel, like "How do I connect to our database?" or "What's the pattern we use for error handling?" — and it will answer using your own project's code as context.

Here's how you'd fire it up on your machine using Docker (a way to run software in a self-contained box):

docker run -it \
  --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct

This tells your computer to download Tabby, give it access to your GPU (graphics card), and start serving two AI models — one for code suggestions and one for chat.


How It Works (No Jargon)

It's like a librarian who memorizes your entire bookshelf. Tabby first reads through all the code in your project and builds a mental index of what functions exist, how they're named, and how they connect. When you start typing, it doesn't guess randomly — it checks its memory of your actual code.

It's like having two specialists on call. One specialist (the "completion" model) is fast and focused — it watches your cursor and predicts what you'll type next, like a word predictor on your phone but for code. The other specialist (the "chat" model) is slower but smarter — it can answer complex questions by searching through your whole codebase.
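Assuming the default port from the docker command above, both specialists are reachable over plain HTTP. The endpoint paths and body shapes below follow Tabby's documented HTTP API, but verify them against your own server's Swagger UI; the helper function names are mine, not Tabby's:

```typescript
// Sketch of talking to the two models Tabby serves (assumes the default
// localhost:8080 from the docker command; check your server's Swagger UI
// for the authoritative paths and schemas).
const TABBY_URL = "http://localhost:8080";

// Body for the fast completion model: the code surrounding the cursor.
function completionRequest(language: string, prefix: string, suffix = "") {
  return { language, segments: { prefix, suffix } };
}

// OpenAI-style body for the slower, smarter chat model.
function chatRequest(question: string) {
  return { messages: [{ role: "user", content: question }] };
}

// POST the cursor context and return the suggested continuation.
async function complete(prefix: string): Promise<string> {
  const res = await fetch(`${TABBY_URL}/v1/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(completionRequest("typescript", prefix)),
  });
  const data = await res.json();
  return data.choices[0].text;
}
```

The editor plugins wrap exactly this kind of call, so you can script against a running Tabby server from anything that speaks HTTP.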

It's like a private tutor that never connects to the internet. All the AI models run on your own machine, using your GPU if you have one, or just your regular processor if you don't. Your code never leaves your computer, which is why companies with sensitive data love it.

What's Cool About It?

The coolest thing is that Tabby doesn't need a database or cloud service to run — it's completely self-contained. You literally just run one command and it works. Most AI coding assistants require you to sign up for a cloud service, send your code to their servers, and pay monthly fees. Tabby flips that: you own everything, your code stays private, and you can even run it on a consumer-grade gaming GPU.

Another neat trick: Tabby can index not just your code, but also your GitLab merge requests and your own documentation. So when you ask it a question, it can pull answers from your team's internal docs, not just generic internet knowledge.

Who Should Care?

Reach for this if: You're a developer who cares about privacy (working on proprietary code, medical software, or anything sensitive). You're a team that wants AI code suggestions but can't send code to third-party servers due to compliance rules. You're a hobbyist who wants to run AI locally on your gaming PC without paying monthly subscriptions.

Skip it if: You don't have a decent GPU and don't want to wait for slower CPU-based suggestions. You're happy with GitHub Copilot's free tier and don't mind your code being processed in the cloud. You want the absolute best AI model available — Tabby uses smaller, open-source models that are good but not as powerful as the latest GPT models.

Start Here

A recommended reading path through the code

  1. Central type definitions reveal core data structures and global augmentations used across the UI.

  2. Exposes all API request/response models, showing the data contracts between client and server.

  3. Defines versioned ServerApi interfaces that establish the core communication protocol between editor and chat panel.

  4. Registers all VS Code extension commands, illustrating the primary user-facing actions and integration points.

  5. Implements the logging infrastructure used across the agent, revealing cross-cutting concerns and system architecture.
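The versioned ServerApi idea from step 3 can be pictured roughly like this. These names are illustrative, not lifted from the Tabby source:

```typescript
// Hypothetical sketch of a versioned editor <-> chat-panel protocol,
// in the spirit of the ServerApi interfaces described above.
interface ServerApiV1 {
  init(options: { fetcherOptions: { authorization: string } }): void;
  sendMessage(message: { text: string }): void;
}

// A later version extends the earlier one, so older editor clients
// keep working against newer panels.
interface ServerApiV2 extends ServerApiV1 {
  updateTheme(css: string): void;
}

// The editor negotiates the highest protocol version both sides support.
function negotiate(supported: string[]): "v2" | "v1" {
  return supported.includes("v2") ? "v2" : "v1";
}
```

Versioning the interface, rather than the wire format, is what lets the chat panel and the editor extension ship on independent release schedules.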

What's inside

15 sections of the codebase

Read Next

Where to go from here

📰 Article · 2024

Tabby: A Self-Hosted GitHub Copilot Alternative

It's FOSS News

Provides a plain-English overview of Tabby's features and why you'd want to run your own AI coding assistant.

📺 Video · 2024

Self-Host Your Own AI Coding Assistant with Tabby

YouTube (NetworkChuck or similar tech creator)

Visual walkthrough of setting up Tabby on your own hardware, perfect for beginners who learn by watching.

📰 Article · 2024

What Is a Self-Hosted AI Code Assistant?

Tabby Blog

Official introduction explaining the value proposition of self-hosted code completion vs. cloud services.

📰 Article · 2024

How to Run Tabby on a Consumer GPU

Tabby Docs

Step-by-step guide for getting Tabby running on affordable hardware, demystifying GPU requirements.

Sibling Projects

Codebases that occupy adjacent space

Related Expeditions
tabby · Continue · Cody · hnswlib · Alpaca-LoRA


Words You'll Hear

Definitions for the terms used throughout these notes

Bidirectional channel

concept

A communication pathway that allows data to flow in both directions simultaneously between two endpoints, enabling real-time interaction.

Command pattern

pattern

A design pattern that encapsulates a request as an object, letting you parameterize clients with different requests, queue or log requests, and support undoable operations.

Composite pattern

pattern

A design pattern that allows you to treat individual objects and compositions of objects uniformly, often used to build tree-like structures.

Debouncing

pattern

A technique that delays processing an event (like a keystroke) until a certain amount of time has passed without another event, preventing excessive requests.
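In a completion engine, debouncing is what keeps every keystroke from becoming a model request. A minimal sketch (the 50 ms delay is arbitrary):

```typescript
// Minimal debounce: run fn only after `delayMs` of silence.
function debounce<T extends unknown[]>(fn: (...args: T) => void, delayMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer); // a newer event cancels the pending one
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Example: three rapid "keystrokes" trigger only one completion request.
let calls = 0;
const requestCompletion = debounce(() => { calls++; }, 50);
requestCompletion();
requestCompletion();
requestCompletion();
setTimeout(() => console.log(calls), 100); // prints 1
```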

Dependency injection

pattern

A design pattern where objects receive their dependencies from an external source rather than creating them internally, making code more modular and testable.

Factory pattern

pattern

A design pattern where a function or class creates objects without specifying the exact class of object that will be created, often based on input parameters.

ForwardRef

tool

A React feature that allows a parent component to directly access a child component's DOM node or instance, often used for imperative control.

Framer Motion

library

A React animation library that provides declarative APIs for creating smooth animations and gestures in user interfaces.

GraphQL mutation

concept

A GraphQL operation used to modify data on the server, such as creating, updating, or deleting resources.

GraphQL subscription

concept

A way to receive real-time updates from a server by maintaining a persistent connection, often used for streaming data like chat responses.

Handshake protocol

pattern

A process where two systems exchange initial messages to establish a connection, agree on parameters, and verify readiness before communication.

Inference server

concept

A server that runs machine learning models to make predictions or generate outputs, often using GPUs for acceleration.

LSP server

concept

A server that implements the Language Server Protocol, allowing editors like VS Code to get code completions, diagnostics, and other language features.

Mutex

concept

A synchronization primitive that ensures only one thread or process can access a shared resource at a time, preventing conflicts.

Observer pattern

pattern

A design pattern where an object (the subject) maintains a list of dependents (observers) and notifies them of state changes, often used for event handling.
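A minimal sketch of the pattern (names are illustrative, not from the Tabby source):

```typescript
// Tiny observer: subscribers are notified whenever the subject emits.
type Listener<T> = (value: T) => void;

class Subject<T> {
  private listeners: Listener<T>[] = [];
  subscribe(fn: Listener<T>): void {
    this.listeners.push(fn);
  }
  notify(value: T): void {
    for (const fn of this.listeners) fn(value);
  }
}

// Example: a status subject fanning out one change to two observers.
const status = new Subject<string>();
const seen: string[] = [];
status.subscribe((s) => seen.push(`a:${s}`));
status.subscribe((s) => seen.push(`b:${s}`));
status.notify("ready"); // seen is now ["a:ready", "b:ready"]
```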

Orama

library

A full-text search engine library for JavaScript, used to index and search through text data efficiently in the browser or Node.js.

Plugin-core architecture

pattern

A software design where a central core provides basic functionality, and additional features are added through pluggable modules or extensions.

Rate limiting

concept

A mechanism that controls how many requests a user or system can make within a given time period to prevent overload.

Rehype/remark plugins

library

Libraries for processing HTML and Markdown content, respectively, often used to transform or enhance rendered text in web applications.

State machine

concept

A model that represents a system as a set of states and transitions between them, often used to manage complex workflows or component lifecycles.

SWR

library

A React hook library for data fetching that provides caching, revalidation, and periodic refetching, often used for server state management.

Telemetry

concept

The automated collection and transmission of data about system usage, performance, or events, often used for monitoring and analytics.

Tiptap editor

library

A headless, extensible rich text editor framework built on ProseMirror, commonly used with React, that supports custom nodes and plugins for editing content.

URQL

library

A lightweight GraphQL client library for JavaScript applications, used to send queries, mutations, and subscriptions to a GraphQL server.

Zod

library

A TypeScript-first schema validation library that allows you to define and validate data structures with type inference.