Archaeologist·Field Notes from TabbyML/tabby
Vol. I · Field Notes

TabbyML/tabby

9 May 2026 · a sprawling project
Reading Posture
From the Field
Solid self-hosted Copilot alternative for teams that want GPU control.
Verdict: Reach for it
Reach for it when

You need an on-prem code assistant and have a GPU to run it on.

Look elsewhere when

You want a zero-ops cloud experience or lack GPU hardware.

In context

It's like Continue.dev but self-contained with no DB or cloud dependency.

Complexity: Medium
Read time: ~30 minutes
Language: JavaScript
Runtime: Node.js >= 18
Dependencies: 1 total

What using it looks like

Drawn from the project's README

From the README
docker run -it \
  --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct
Fig. 1 — example 1 of 4

What this is

As told for the tourist

What Is This?

Tabby is an AI assistant that helps you write code, but it runs entirely on your own computer instead of sending your code to a cloud service like GitHub Copilot does. Think of it like having a super-smart coworker who sits next to you, watches what you're typing, and suggests the next lines of code — but that coworker never leaves your office or shares your work with anyone else.

What Can You Do With It?

You could use this to get code suggestions as you type in VS Code, just like autocomplete on steroids. For example, if you start typing a function that calculates shipping costs, Tabby might suggest the rest of the function based on patterns it's seen in your codebase.

You could also ask it questions in a chat panel, like "How do I connect to our database?" or "What's the pattern we use for error handling?" — and it will answer using your own project's code as context.

Here's how you'd fire it up on your machine using Docker (a way to run software in a self-contained box):

docker run -it \
  --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct

This tells your computer to download Tabby, give it access to your GPU (graphics card), and start serving two AI models — one for code suggestions and one for chat.


How It Works (No Jargon)

It's like a librarian who memorizes your entire bookshelf. Tabby first reads through all the code in your project and builds a mental index of what functions exist, how they're named, and how they connect. When you start typing, it doesn't guess randomly — it checks its memory of your actual code.

It's like having two specialists on call. One specialist (the "completion" model) is fast and focused — it watches your cursor and predicts what you'll type next, like a word predictor on your phone but for code. The other specialist (the "chat" model) is slower but smarter — it can answer complex questions by searching through your whole codebase.
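Assuming the default port from the docker command above, both specialists are reachable over plain HTTP. The endpoint paths and body shapes below follow Tabby's documented HTTP API, but verify them against your own server's Swagger UI; the helper function names are mine, not Tabby's:

```typescript
// Sketch of talking to the two models Tabby serves (assumes the default
// localhost:8080 from the docker command; check your server's Swagger UI
// for the authoritative paths and schemas).
const TABBY_URL = "http://localhost:8080";

// Body for the fast completion model: the code surrounding the cursor.
function completionRequest(language: string, prefix: string, suffix = "") {
  return { language, segments: { prefix, suffix } };
}

// OpenAI-style body for the slower, smarter chat model.
function chatRequest(question: string) {
  return { messages: [{ role: "user", content: question }] };
}

// POST the cursor context and return the suggested continuation.
async function complete(prefix: string): Promise<string> {
  const res = await fetch(`${TABBY_URL}/v1/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(completionRequest("typescript", prefix)),
  });
  const data = await res.json();
  return data.choices[0].text;
}
```

The editor plugins wrap exactly this kind of call, so you can script against a running Tabby server from anything that speaks HTTP.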

It's like a private tutor that never connects to the internet. All the AI models run on your own machine, using your GPU if you have one, or just your regular processor if you don't. Your code never leaves your computer, which is why companies with sensitive data love it.

What's Cool About It?

The coolest thing is that Tabby doesn't need a database or cloud service to run — it's completely self-contained. You literally just run one command and it works. Most AI coding assistants require you to sign up for a cloud service, send your code to their servers, and pay monthly fees. Tabby flips that: you own everything, your code stays private, and you can even run it on a consumer-grade gaming GPU.

Another neat trick: Tabby can index not just your code, but also your GitLab merge requests and your own documentation. So when you ask it a question, it can pull answers from your team's internal docs, not just generic internet knowledge.

Who Should Care?

Reach for this if: You're a developer who cares about privacy (working on proprietary code, medical software, or anything sensitive). You're a team that wants AI code suggestions but can't send code to third-party servers due to compliance rules. You're a hobbyist who wants to run AI locally on your gaming PC without paying monthly subscriptions.

Skip it if: You don't have a decent GPU and don't want to wait for slower CPU-based suggestions. You're happy with GitHub Copilot's free tier and don't mind your code being processed in the cloud. You want the absolute best AI model available — Tabby uses smaller, open-source models that are good but not as powerful as the latest GPT models.

Start Here

A recommended reading path through the code

  1. Central type definitions reveal core data structures and global augmentations used across the UI.

  2. Exposes all API request/response models, showing the data contracts between client and server.

  3. Defines versioned ServerApi interfaces that establish the core communication protocol between editor and chat panel.

  4. Registers all VS Code extension commands, illustrating the primary user-facing actions and integration points.

  5. Implements the logging infrastructure used across the agent, revealing cross-cutting concerns and system architecture.
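The versioned ServerApi idea from step 3 can be pictured roughly like this. These names are illustrative, not lifted from the Tabby source:

```typescript
// Hypothetical sketch of a versioned editor <-> chat-panel protocol,
// in the spirit of the ServerApi interfaces described above.
interface ServerApiV1 {
  init(options: { fetcherOptions: { authorization: string } }): void;
  sendMessage(message: { text: string }): void;
}

// A later version extends the earlier one, so older editor clients
// keep working against newer panels.
interface ServerApiV2 extends ServerApiV1 {
  updateTheme(css: string): void;
}

// The editor negotiates the highest protocol version both sides support.
function negotiate(supported: string[]): "v2" | "v1" {
  return supported.includes("v2") ? "v2" : "v1";
}
```

Versioning the interface, rather than the wire format, is what lets the chat panel and the editor extension ship on independent release schedules.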

What's inside

15 sections of the codebase

Read Next

Where to go from here

📰 Article · 2024

Tabby: A Self-Hosted GitHub Copilot Alternative

It's FOSS News

Provides a plain-English overview of Tabby's features and why you'd want to run your own AI coding assistant.

📺 Video · 2024

Self-Host Your Own AI Coding Assistant with Tabby

YouTube (NetworkChuck or similar tech creator)

Visual walkthrough of setting up Tabby on your own hardware, perfect for beginners who learn by watching.

📰 Article · 2024

What Is a Self-Hosted AI Code Assistant?

Tabby Blog

Official introduction explaining the value proposition of self-hosted code completion vs. cloud services.

📰 Article · 2024

How to Run Tabby on a Consumer GPU

Tabby Docs

Step-by-step guide for getting Tabby running on affordable hardware, demystifying GPU requirements.

Sibling Projects

Codebases that occupy adjacent space

Related Expeditions
tabby · Continue · Cody · hnswlib · Alpaca-LoRA


Words You'll Hear

Definitions for the terms used throughout these notes

Bidirectional channel

concept

A communication pathway that allows data to flow in both directions simultaneously between two endpoints, enabling real-time interaction.

Command pattern

pattern

A design pattern that encapsulates a request as an object, letting you parameterize clients with different requests, queue or log requests, and support undoable operations.

Composite pattern

pattern

A design pattern that allows you to treat individual objects and compositions of objects uniformly, often used to build tree-like structures.

Debouncing

pattern

A technique that delays processing an event (like a keystroke) until a certain amount of time has passed without another event, preventing excessive requests.
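In a completion engine, debouncing is what keeps every keystroke from becoming a model request. A minimal sketch (the 50 ms delay is arbitrary):

```typescript
// Minimal debounce: run fn only after `delayMs` of silence.
function debounce<T extends unknown[]>(fn: (...args: T) => void, delayMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer); // a newer event cancels the pending one
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Example: three rapid "keystrokes" trigger only one completion request.
let calls = 0;
const requestCompletion = debounce(() => { calls++; }, 50);
requestCompletion();
requestCompletion();
requestCompletion();
setTimeout(() => console.log(calls), 100); // prints 1
```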

Dependency injection

pattern

A design pattern where objects receive their dependencies from an external source rather than creating them internally, making code more modular and testable.

Factory pattern

pattern

A design pattern where a function or class creates objects without specifying the exact class of object that will be created, often based on input parameters.

ForwardRef

tool

A React feature that allows a parent component to directly access a child component's DOM node or instance, often used for imperative control.

Framer Motion

library

A React animation library that provides declarative APIs for creating smooth animations and gestures in user interfaces.

GraphQL mutation

concept

A GraphQL operation used to modify data on the server, such as creating, updating, or deleting resources.

GraphQL subscription

concept

A way to receive real-time updates from a server by maintaining a persistent connection, often used for streaming data like chat responses.

Handshake protocol

pattern

A process where two systems exchange initial messages to establish a connection, agree on parameters, and verify readiness before communication.

Inference server

concept

A server that runs machine learning models to make predictions or generate outputs, often using GPUs for acceleration.

LSP server

concept

A server that implements the Language Server Protocol, allowing editors like VS Code to get code completions, diagnostics, and other language features.

Mutex

concept

A synchronization primitive that ensures only one thread or process can access a shared resource at a time, preventing conflicts.

Observer pattern

pattern

A design pattern where an object (the subject) maintains a list of dependents (observers) and notifies them of state changes, often used for event handling.
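A minimal sketch of the pattern (names are illustrative, not from the Tabby source):

```typescript
// Tiny observer: subscribers are notified whenever the subject emits.
type Listener<T> = (value: T) => void;

class Subject<T> {
  private listeners: Listener<T>[] = [];
  subscribe(fn: Listener<T>): void {
    this.listeners.push(fn);
  }
  notify(value: T): void {
    for (const fn of this.listeners) fn(value);
  }
}

// Example: a status subject fanning out one change to two observers.
const status = new Subject<string>();
const seen: string[] = [];
status.subscribe((s) => seen.push(`a:${s}`));
status.subscribe((s) => seen.push(`b:${s}`));
status.notify("ready"); // seen is now ["a:ready", "b:ready"]
```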

Orama

library

A full-text search engine library for JavaScript, used to index and search through text data efficiently in the browser or Node.js.

Plugin-core architecture

pattern

A software design where a central core provides basic functionality, and additional features are added through pluggable modules or extensions.

Rate limiting

concept

A mechanism that controls how many requests a user or system can make within a given time period to prevent overload.

Rehype/remark plugins

library

Libraries for processing HTML and Markdown content, respectively, often used to transform or enhance rendered text in web applications.

State machine

concept

A model that represents a system as a set of states and transitions between them, often used to manage complex workflows or component lifecycles.

SWR

library

A React hook library for data fetching that provides caching, revalidation, and periodic refetching, often used for server state management.

Telemetry

concept

The automated collection and transmission of data about system usage, performance, or events, often used for monitoring and analytics.

Tiptap editor

library

A headless, extensible rich text editor framework built on ProseMirror, commonly used with React, that supports custom nodes and plugins for editing content.

URQL

library

A lightweight GraphQL client library for JavaScript applications, used to send queries, mutations, and subscriptions to a GraphQL server.

Zod

library

A TypeScript-first schema validation library that allows you to define and validate data structures with type inference.