pickuma.

Tabby review: self-hosted AI code completion you actually control

Tabby is an open-source, self-hosted alternative to cloud AI code completion. What it runs, how to set it up with Docker, and when self-hosting is actually worth the ops overhead.

Cloud code completion has a quiet cost: every keystroke of context gets shipped to someone else’s servers. For a lot of developers that’s fine. For anyone working under a security policy, on a private codebase, or just allergic to sending proprietary code through a third-party API, it’s a dealbreaker. Tabby is the open-source answer — a coding assistant you run on your own hardware, where the model, the data, and the uptime are all yours.

We set it up on a single workstation with a consumer GPU and ran it against a real project to find where the “self-hosted” promise holds and where it costs you.

What you actually control with Tabby

Tabby is a self-hosted alternative to GitHub Copilot, written in Rust and shipped as a single Docker image. You run one container, point your editor at it, and you get inline completions plus a chat panel — without any of that traffic leaving your network.

Three things are genuinely yours when you run it:

  • Your code stays put. Completion context never leaves the machine (or VPC) you deploy to. That’s the entire reason most teams look at Tabby.
  • You pick the model. Tabby pulls from an open registry of code models — StarCoder, CodeLlama, DeepSeek Coder, and the Qwen2.5-Coder family. You can swap the completion model and the chat model independently.
  • It’s free to run. The core is open source under Apache 2.0. There’s no per-seat fee; your cost is the hardware and the time to operate it.

It plugs into VS Code, JetBrains IDEs, and Vim/Neovim through official extensions, so the editor side feels close to what you’re used to.
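For editors set up through a file rather than a settings panel, the connection comes down to two values: the server URL and a token. The sketch below writes them to a config file. It assumes the Tabby agent reads ~/.tabby-client/agent/config.toml with a [server] table, which is our understanding of the client docs rather than something we verified for every extension version. write_agent_config is a helper name of our own, and the endpoint and token are placeholders for your own deployment.

```shell
#!/bin/sh
# write_agent_config DIR ENDPOINT TOKEN
# Writes a minimal client config to DIR/config.toml.
# Note: this overwrites any existing config.toml in DIR.
write_agent_config() {
  mkdir -p "$1"
  cat > "$1/config.toml" <<EOF
[server]
endpoint = "$2"
token = "$3"
EOF
  # The token is a credential, so keep the file readable only by its owner.
  chmod 600 "$1/config.toml"
}
```

A typical call would be write_agent_config "$HOME/.tabby-client/agent" "http://localhost:8080" "your-token-here". The VS Code and JetBrains extensions also expose the same two values in their settings UI, so the file is optional there.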

Getting it running

The happy path is genuinely one command. With an NVIDIA GPU and Docker installed:

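# Start Tabby with GPU acceleration; the host's ~/.tabby is mounted as the container's /data directory.
docker run -it --gpus all -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  registry.tabbyml.com/tabbyml/tabby \
  serve --model Qwen2.5-Coder-1.5B \
        --chat-model Qwen2.5-Coder-3B-Instruct \
        --device cuda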

That brings up the server on localhost:8080 with a web UI, a completion model, and a chat model. You create an account on first launch, generate a token, drop it into the editor extension, and you’re completing code.
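Before wiring up an editor, it helps to confirm the server answers over HTTP. The sketch below is one way to do that with curl, not an official client. It assumes the /v1/health and /v1/completions endpoints and the request shape as we understand Tabby's HTTP API, so check them against the API docs your own instance serves. TABBY_URL, TABBY_TOKEN, and the function names are placeholders we made up for illustration.

```shell
#!/bin/sh
# Assumed defaults; override both through the environment.
TABBY_URL="${TABBY_URL:-http://localhost:8080}"
TABBY_TOKEN="${TABBY_TOKEN:-}"

# completion_payload LANGUAGE PREFIX
# Builds the JSON body for a completion request.
# Sketch only: PREFIX must not contain double quotes or raw newlines,
# because no JSON escaping is done here.
completion_payload() {
  printf '{"language":"%s","segments":{"prefix":"%s"}}' "$1" "$2"
}

# check_health
# Asks the server for its status (assumed endpoint: /v1/health).
check_health() {
  curl -sS -H "Authorization: Bearer $TABBY_TOKEN" "$TABBY_URL/v1/health"
}

# request_completion LANGUAGE PREFIX
# Sends the code before the cursor and prints the raw JSON response
# (assumed endpoint: /v1/completions).
request_completion() {
  curl -sS -X POST "$TABBY_URL/v1/completions" \
    -H "Authorization: Bearer $TABBY_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(completion_payload "$1" "$2")"
}
```

With the container running and TABBY_TOKEN exported, run check_health first, then request_completion python 'def fib(n):'. If either call fails, the problem is on the server or token side, not in the editor extension.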

The realistic version has a few more wrinkles:

  • GPU matters. A 1.5B completion model fits comfortably on a card with 6–8 GB of VRAM and returns suggestions fast enough to feel live. Larger chat models want more headroom. You can run on CPU with --device cpu, but latency climbs to the point where inline completion stops feeling like completion.
  • Model choice is a tradeoff, not a default. Bigger models give better suggestions and cost more memory and latency. The Qwen2.5-Coder line is a reasonable starting point; size up only if your hardware allows.
  • Indexing takes a pass. Pointing Tabby at your repositories kicks off an initial indexing pass; once it finishes, context retrieval is fast.
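The GPU and model-size points above come down to arithmetic you can do before downloading anything. As a rough sketch, model weights take about parameters × bits per weight ÷ 8 bytes. This covers weights only: the context cache and runtime overhead come on top, and the bits per weight depends on how the model is packaged, which is a value you have to supply. weights_gb is a helper name of our own.

```shell
#!/bin/sh
# weights_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# Prints the approximate size of the model weights in GB (10^9 bytes).
# Weights only: no context cache, no runtime overhead.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p * b / 8 }'
}

weights_gb 1.5 16   # 1.5B parameters at 16 bits per weight
weights_gb 3 16     # 3B parameters at 16 bits per weight
```

Those two calls print 3.00 and 6.00. Running a 1.5B completion model and a 3B chat model side by side at 16-bit weights would therefore need roughly 9 GB for weights alone, which is why the chat model is usually the first thing to shrink on a smaller card.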

Here’s the honest comparison against hosted options:

| | Tabby (self-hosted) | Hosted (Copilot / Cursor) |
|---|---|---|
| Code privacy | Stays on your infra | Sent to vendor API |
| Cost | Hardware + ops time | Per-seat subscription |
| Setup | Docker + a GPU | Install and sign in |
| Completion quality | Good, model-dependent | Frontier-model strong |
| Maintenance | You own updates and uptime | Vendor handles it |

If you’d rather skip the ops entirely and you don’t have a hard privacy constraint, a hosted AI editor is the lower-friction path:

Cursor is a hosted AI-native editor with frontier-model completion and chat. There is no infrastructure to run; the tradeoff is that your code context goes through Cursor's API. Pricing: free tier, with Pro from $20/mo. (Affiliate disclosure: we earn a commission at no cost to you.)

Who should self-host — and who shouldn’t

Tabby earns its keep in specific situations:

  • Regulated or air-gapped environments where code can’t legally or contractually leave the building.
  • Teams at scale where per-seat AI subscriptions add up and you already have GPU capacity.
  • Privacy-first solo developers who want completion without the data tradeoff and don’t mind running a container.

It’s the wrong call if you want the strongest possible completion quality with zero operational overhead. The open models Tabby runs are capable, but a 1.5B or 3B model self-hosted on a workstation won’t match a frontier model behind a hosted product on raw suggestion quality. You’re trading some completion strength for full control. Whether that trade pays off depends entirely on why you’re looking at self-hosting in the first place.

For most individuals with no privacy constraint, hosted tools win on convenience. For anyone who said “we can’t send our code to an API” out loud this quarter, Tabby is one of the few answers that close that gap.
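The per-seat argument for teams is easy to sanity-check with a break-even estimate: hardware cost divided by what the team would otherwise pay each month. The sketch below uses made-up figures (a $2,400 GPU workstation and a 10-person team) alongside the $20/mo Pro price quoted earlier, and it deliberately leaves out power and the ops time that self-hosting adds. breakeven_months is our own helper name.

```shell
#!/bin/sh
# breakeven_months HARDWARE_COST SEATS PRICE_PER_SEAT_PER_MONTH
# Prints how many months of subscription fees equal the hardware cost.
# Ignores electricity, maintenance, and staff time.
breakeven_months() {
  awk -v h="$1" -v s="$2" -v p="$3" 'BEGIN { printf "%.1f\n", h / (s * p) }'
}

# Hypothetical: $2,400 of hardware, 10 seats, $20 per seat per month.
breakeven_months 2400 10 20
```

That prints 12.0, meaning the hardware pays for itself in about a year under these assumptions. For a solo developer the same box takes 120 months to break even, which is why the cost case only works at team scale and the solo case rests on privacy instead.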

FAQ

Does Tabby work without a GPU?
Yes, with --device cpu, but inline completion latency gets high enough that it stops feeling responsive. A consumer NVIDIA GPU with 6+ GB of VRAM is the practical baseline for live completions.

Which editors does Tabby support?
Official extensions cover VS Code, JetBrains IDEs (IntelliJ, PyCharm, and the rest of the family), and Vim/Neovim. You point each one at your Tabby server's URL and token.

Is Tabby actually free?
The core is open source under Apache 2.0, so there's no license fee to self-host. Your real cost is the GPU hardware and the time to run and update it. There's also a paid Enterprise tier with team features like SSO.
