What Is This?
Tabby is an AI assistant that helps you write code, but it runs entirely on your own computer instead of sending your code to a cloud service like GitHub Copilot does. Think of it like having a super-smart coworker who sits next to you, watches what you're typing, and suggests the next lines of code — but that coworker never leaves your office or shares your work with anyone else.
What Can You Do With It?
You could use this to get code suggestions as you type in VS Code, just like autocomplete on steroids. For example, if you start typing a function that calculates shipping costs, Tabby might suggest the rest of the function based on patterns it's seen in your codebase.
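Under the hood, that suggestion is just an HTTP call to the Tabby server. Here's roughly what a completion request looks like with curl (the /v1/completions path and the "segments" request shape follow Tabby's API docs; double-check against the Swagger UI your server hosts, since exact fields can vary by version):

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "language": "python",
    "segments": {
      "prefix": "def calculate_shipping_cost(weight_kg, distance_km):\n    "
    }
  }'

The response contains one or more suggested continuations; the editor extension picks one and shows it as ghost text at your cursor.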
You could also ask it questions in a chat panel, like "How do I connect to our database?" or "What's the pattern we use for error handling?" — and it will answer using your own project's code as context.
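The chat panel works the same way: it's a request to the server's chat endpoint, which follows the familiar OpenAI message format (endpoint path per Tabby's docs; verify it on your version):

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What pattern do we use for error handling?"}
    ]
  }'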
Here's how you'd fire it up on your machine using Docker (a way to run software in a self-contained box):
docker run -it \
--gpus all -p 8080:8080 -v $HOME/.tabby:/data \
tabbyml/tabby \
serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct
This tells your computer to download Tabby, give it access to your GPU (graphics card), and start serving two AI models — one for code suggestions and one for chat.
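Once the models finish downloading, you can sanity-check the server from another terminal (the /v1/health endpoint comes from Tabby's API; if your version doesn't have it, just open http://localhost:8080 in a browser instead):

curl http://localhost:8080/v1/health

It should answer with a small JSON blob listing the server version and the loaded models.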
How It Works (No Jargon)
It's like a librarian who memorizes your entire bookshelf. Tabby first reads through all the code in your project and builds a mental index of what functions exist, how they're named, and how they connect. When you start typing, it doesn't guess randomly — it checks its memory of your actual code.
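If you want the librarian to read extra bookshelves, the classic way is to list repositories in Tabby's config file. This snippet follows the older ~/.tabby/config.toml convention (newer releases let you register repositories from the web UI instead), and the repository name and URL below are made-up placeholders:

cat >> $HOME/.tabby/config.toml <<'EOF'
# Hypothetical example repo -- swap in your own name and git URL
[[repositories]]
name = "shop-backend"
git_url = "https://github.com/your-org/shop-backend.git"
EOF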
It's like having two specialists on call. One specialist (the "completion" model) is fast and focused — it watches your cursor and predicts what you'll type next, like a word predictor on your phone but for code. The other specialist (the "chat" model) is slower but smarter — it can answer complex questions by searching through your whole codebase.
It's like a private tutor that never connects to the internet. All the AI models run on your own machine, using your GPU if you have one, or just your regular processor if you don't. Your code never leaves your computer, which is why companies with sensitive data love it.
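No GPU? Drop the CUDA flags and Tabby runs on your regular processor. This CPU variant is inferred from the command shown earlier rather than copied from the docs, so expect to tweak it:

docker run -it \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct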
What's Cool About It?
The coolest thing is that Tabby doesn't need a database or cloud service to run — it's completely self-contained. You literally just run one command and it works. Most AI coding assistants require you to sign up for a cloud service, send your code to their servers, and pay monthly fees. Tabby flips that: you own everything, your code stays private, and you can even run it on a consumer-grade gaming GPU.
Another neat trick: Tabby can index not just your code, but also your GitLab merge requests and your own documentation. So when you ask it a question, it can pull answers from your team's internal docs, not just generic internet knowledge.
Who Should Care?
Reach for this if: You're a developer who cares about privacy (working on proprietary code, medical software, or anything sensitive). You're a team that wants AI code suggestions but can't send code to third-party servers due to compliance rules. You're a hobbyist who wants to run AI locally on your gaming PC without paying monthly subscriptions.
Skip it if: You don't have a decent GPU and don't want to wait for slower CPU-based suggestions. You're happy with GitHub Copilot's free tier and don't mind your code being processed in the cloud. You want the absolute best AI model available — Tabby uses smaller, open-source models that are good but not as powerful as the latest GPT models.