Tabby review: self-hosted AI code completion you actually control
Tabby is an open-source, self-hosted alternative to cloud AI code completion. What it runs, how to set it up with Docker, and when self-hosting is actually worth the ops overhead.
Cloud code completion has a quiet cost: every keystroke of context gets shipped to someone else’s servers. For a lot of developers that’s fine. For anyone working under a security policy, on a private codebase, or just allergic to sending proprietary code through a third-party API, it’s a dealbreaker. Tabby is the open-source answer — a coding assistant you run on your own hardware, where the model, the data, and the uptime are all yours.
We set it up on a single workstation with a consumer GPU and ran it against a real project to find where the “self-hosted” promise holds and where it costs you.
What you actually control with Tabby
Tabby is a self-hosted alternative to GitHub Copilot, written in Rust and shipped as a single Docker image. You run one container, point your editor at it, and you get inline completions plus a chat panel — without any of that traffic leaving your network.
Three things are genuinely yours when you run it:
- Your code stays put. Completion context never leaves the machine (or VPC) you deploy to. That’s the entire reason most teams look at Tabby.
- You pick the model. Tabby pulls from an open registry of code models — StarCoder, CodeLlama, DeepSeek Coder, and the Qwen2.5-Coder family. You can swap the completion model and the chat model independently.
- It’s free to run. The core is open source under Apache 2.0. There’s no per-seat fee; your cost is the hardware and the time to operate it.
It plugs into VS Code, JetBrains IDEs, and Vim/Neovim through official extensions, so the editor side feels close to what you’re used to.
Getting it running
The happy path is genuinely one command. With an NVIDIA GPU and Docker installed:
    docker run -it --gpus all -p 8080:8080 \
      -v $HOME/.tabby:/data \
      registry.tabbyml.com/tabbyml/tabby \
      serve --model Qwen2.5-Coder-1.5B \
      --chat-model Qwen2.5-Coder-3B-Instruct \
      --device cuda
That brings up the server on localhost:8080 with a web UI, a completion model, and a chat model. You create an account on first launch, generate a token, drop it into the editor extension, and you’re completing code.
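Once the token exists, you can also talk to the server directly, which is a quick way to confirm it works before blaming the editor extension. Here is a minimal Python sketch. It assumes the server exposes a `POST /v1/completions` endpoint that accepts a language plus prefix/suffix segments and a Bearer token; check the API docs your own instance serves before relying on that shape. The function names are ours, not Tabby's.

```python
import json
import urllib.request


def build_completion_request(base_url, token, language, prefix, suffix=""):
    """Build (but do not send) a completion request for a Tabby server.

    Assumption: the endpoint path and the JSON body shape below match the
    server's API; verify against your instance's own API docs.
    """
    body = {
        "language": language,
        "segments": {"prefix": prefix, "suffix": suffix},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the container from the command above running, `send(build_completion_request("http://localhost:8080", "YOUR_TOKEN", "python", "def fib(n):\n    "))` should return a JSON object with the suggested text. `YOUR_TOKEN` is a placeholder for the token generated in the web UI.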
The realistic version has a few more wrinkles:
- GPU matters. A 1.5B completion model fits comfortably on a card with 6–8 GB of VRAM and returns suggestions fast enough to feel live. Larger chat models want more headroom. You can run on CPU with `--device cpu`, but latency climbs to the point where inline completion stops feeling like completion.
- Model choice is a tradeoff, not a default. Bigger models give better suggestions and cost more memory and latency. The Qwen2.5-Coder line is a reasonable starting point; size up only if your hardware allows.
- Indexing takes a pass. Pointing Tabby at your repositories runs a one-time index, after which context retrieval is fast.
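The VRAM point is easy to sanity-check with back-of-the-envelope arithmetic: weight size is roughly parameter count times bytes per weight. The sketch below is an estimate, not a measurement. The 8-bit and 16-bit weight sizes and the 20% allowance for cache and runtime buffers are our assumptions, and the real footprint depends on how the model you pull is quantized and how much context you use.

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(models, vram_gb, overhead=1.2):
    """Return True if every model's weights, padded by `overhead`, fit in VRAM.

    `models` is a list of (params_billions, bits_per_weight) pairs.
    `overhead` is a guessed allowance for KV cache and runtime buffers.
    """
    needed = sum(weights_gb(p, b) for p, b in models) * overhead
    return needed <= vram_gb


# 1.5B completion model + 3B chat model, both assumed 8-bit, on a 6 GB card.
print(fits([(1.5, 8), (3.0, 8)], vram_gb=6))    # True
# The same pair at 16-bit no longer fits, even on an 8 GB card.
print(fits([(1.5, 16), (3.0, 16)], vram_gb=8))  # False
```

Under these assumptions the pair needs about 5.4 GB at 8-bit and about 10.8 GB at 16-bit, which is why the same two models can feel comfortable on one card and impossible on another.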
Here’s the honest comparison against hosted options:
| | Tabby (self-hosted) | Hosted (Copilot / Cursor) |
|---|---|---|
| Code privacy | Stays on your infra | Sent to vendor API |
| Cost | Hardware + ops time | Per-seat subscription |
| Setup | Docker + a GPU | Install and sign in |
| Completion quality | Good, model-dependent | Frontier-model strong |
| Maintenance | You own updates and uptime | Vendor handles it |
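The Cost row can be made concrete with a rough break-even calculation: how many months of avoided seat fees it takes to pay off the hardware. In this sketch the $20 seat price mirrors the Pro tier mentioned in this review, while the hardware price and monthly ops cost are made-up inputs you should replace with your own.

```python
import math


def break_even_months(hardware_cost, seats, seat_price_per_month,
                      ops_cost_per_month=0.0):
    """Months until avoided seat fees cover the hardware cost.

    Returns None when monthly ops cost cancels out the subscription savings,
    meaning self-hosting never pays off on cost alone.
    """
    monthly_saving = seats * seat_price_per_month - ops_cost_per_month
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)


# Hypothetical team: $2,000 workstation, 10 seats at $20/mo, $50/mo of ops time.
print(break_even_months(2000, 10, 20, 50))  # 14
# Hypothetical solo developer with the same ops cost: never breaks even.
print(break_even_months(2000, 1, 20, 50))   # None
```

The solo case returning `None` says something useful: for one seat, the argument for self-hosting is privacy and control, not money.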
If you’d rather skip the ops entirely and you don’t have a hard privacy constraint, a hosted AI editor is the lower-friction path:
Cursor
A hosted AI-native editor with frontier-model completion and chat. No infrastructure to run — the tradeoff is that your code context goes through Cursor's API.
Free tier; Pro from $20/mo
Affiliate link · We earn a commission at no cost to you.
Who should self-host — and who shouldn’t
Tabby earns its keep in specific situations:
- Regulated or air-gapped environments where code can’t legally or contractually leave the building.
- Teams at scale where per-seat AI subscriptions add up and you already have GPU capacity.
- Privacy-first solo developers who want completion without the data tradeoff and don’t mind running a container.
It’s the wrong call if you want the strongest possible completion quality with zero operational overhead. The open models Tabby runs are capable, but a 1.5B or 3B model self-hosted on a workstation won’t match a frontier model behind a hosted product on raw suggestion quality. You’re trading some completion strength for full control. Whether that trade pays off depends entirely on why you’re looking at self-hosting in the first place.
For most individuals with no privacy constraint, hosted tools win on convenience. For anyone who said “we can’t send our code to an API” out loud this quarter, Tabby is one of the few answers that closes that gap.