Tabby review: self-hosted AI code completion you actually control

Cloud code completion has a quiet cost: every keystroke of context gets shipped to someone else’s servers. For a lot of developers that’s fine. For anyone working under a security policy, on a private codebase, or just allergic to sending proprietary code through a third-party API, it’s a dealbreaker. Tabby is the open-source answer — a coding assistant you run on your own hardware, where the model, the data, and the uptime are all yours.

We set it up on a single workstation with a consumer GPU and ran it against a real project to find where the “self-hosted” promise holds and where it costs you.

What you actually control with Tabby

Tabby is a self-hosted alternative to GitHub Copilot, written in Rust and shipped as a single Docker image. You run one container, point your editor at it, and you get inline completions plus a chat panel — without any of that traffic leaving your network.

Three things are genuinely yours when you run it:

Your code stays put. Completion context never leaves the machine (or VPC) you deploy to. That’s the entire reason most teams look at Tabby.
You pick the model. Tabby pulls from an open registry of code models — StarCoder, CodeLlama, DeepSeek Coder, and the Qwen2.5-Coder family. You can swap the completion model and the chat model independently.
It’s free to run. The core is open source under Apache 2.0. There’s no per-seat fee; your cost is the hardware and the time to operate it.

It plugs into VS Code, JetBrains IDEs, and Vim/Neovim through official extensions, so the editor side feels close to what you’re used to.

Getting it running

The happy path is genuinely one command. With an NVIDIA GPU and Docker installed:

```shell
docker run -it --gpus all -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  registry.tabbyml.com/tabbyml/tabby \
  serve --model Qwen2.5-Coder-1.5B \
        --chat-model Qwen2.5-Coder-3B-Instruct \
        --device cuda
```

That brings up the server on localhost:8080 with a web UI, a completion model, and a chat model. You create an account on first launch, generate a token, drop it into the editor extension, and you’re completing code.
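The editor extensions talk to the server over HTTP, so it can help to check that a server and token work before involving an editor at all. Below is a minimal Python sketch using only the standard library. It assumes Tabby's completion endpoint is `POST /v1/completions`, that it accepts a JSON body with `language` and `segments.prefix`/`segments.suffix`, that the token goes in a bearer `Authorization` header, and that responses carry a `choices` list. Confirm those details against the API documentation your own Tabby version serves. The function names are hypothetical, and `TABBY_URL` assumes the port published by the `docker run` command above.

```python
import json
import urllib.request

# Assumed default: the docker run command publishes Tabby on port 8080.
TABBY_URL = "http://localhost:8080"


def build_completion_request(prefix: str, suffix: str, language: str,
                             token: str, base_url: str = TABBY_URL) -> urllib.request.Request:
    """Build, but do not send, an authenticated completion request.

    Assumption: the endpoint is POST /v1/completions and the body carries the
    code before the cursor ("prefix") and the code after it ("suffix").
    """
    body = {
        "language": language,
        "segments": {"prefix": prefix, "suffix": suffix},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The token generated in Tabby's web UI, sent as a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def first_choice_text(response_json: dict) -> str:
    """Return the first suggestion from a {"choices": [{"text": ...}]} response, or ""."""
    choices = response_json.get("choices") or []
    return choices[0].get("text", "") if choices else ""


def complete(prefix: str, suffix: str, language: str, token: str,
             base_url: str = TABBY_URL, timeout: float = 10.0) -> str:
    """Send the request to a running server and return the first suggestion."""
    request = build_completion_request(prefix, suffix, language, token, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return first_choice_text(json.load(response))
```

Against a running server, `complete("def add(a, b):\n    ", "\n", "python", token)` should come back with a single suggestion string. An authentication error would point at the token rather than the model.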

The realistic version has a few more wrinkles:

GPU matters. A 1.5B completion model fits comfortably on a card with 6–8 GB of VRAM and returns suggestions fast enough to feel live. Larger chat models want more headroom, and when the completion and chat models share one card, their memory needs add up. You can run on CPU with --device cpu, but latency climbs to the point where inline completion stops feeling like completion.
Model choice is a tradeoff, not a default. Bigger models give better suggestions and cost more memory and latency. The Qwen2.5-Coder line is a reasonable starting point; size up only if your hardware allows.
Indexing takes a pass. Pointing Tabby at your repositories kicks off an initial indexing job; once it finishes, context retrieval is fast.
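The VRAM guidance is mostly arithmetic. Weights take roughly parameter count × bits per weight ÷ 8 bytes, so a 1.5B model needs about 3 GB at 16-bit and about 1.5 GB at 8-bit, before any KV cache or runtime buffers. The Python sketch below encodes that check. The bit width depends on how the model file you download is quantized, and the 30% overhead allowance is an assumed placeholder, not a measured Tabby figure. The model's size on disk is a better starting point than its nominal parameter count.

```python
from typing import Iterable, Tuple

GB = 1e9  # decimal gigabytes keep the arithmetic easy to follow


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone: parameter count times bytes per parameter."""
    return params * bits_per_weight / 8


def fits_in_vram(models: Iterable[Tuple[float, float]], vram_gb: float,
                 overhead: float = 0.3) -> bool:
    """Rough check that every model's weights, plus a fractional allowance for
    KV cache and runtime buffers, fit on one card. The 0.3 default is an
    assumed placeholder, not a measured figure."""
    total = sum(weight_bytes(params, bits) for params, bits in models)
    return total * (1 + overhead) <= vram_gb * GB


# Completion (1.5B) and chat (3B) models from the docker run command,
# as (parameter count, bits per weight) pairs.
pair_8bit = [(1.5e9, 8), (3e9, 8)]
pair_16bit = [(1.5e9, 16), (3e9, 16)]
print(fits_in_vram(pair_8bit, vram_gb=6))   # True: 4.5 GB of weights, ~5.85 GB with overhead
print(fits_in_vram(pair_16bit, vram_gb=8))  # False: 9 GB of weights before overhead
```

Under these assumptions, the 8-bit pair lands just under 6 GB, which is consistent with 6–8 GB as a practical floor for running completion and chat on the same card.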

Here’s the honest comparison against hosted options:

	Tabby (self-hosted)	Hosted (Copilot / Cursor)
Code privacy	Stays on your infra	Sent to vendor API
Cost	Hardware + ops time	Per-seat subscription
Setup	Docker + a GPU	Install and sign in
Completion quality	Good, model-dependent	Strong, backed by frontier models
Maintenance	You own updates and uptime	Vendor handles it

If you’d rather skip the ops entirely and you don’t have a hard privacy constraint, a hosted AI editor is the lower-friction path:

Cursor

A hosted AI-native editor with frontier-model completion and chat. No infrastructure to run — the tradeoff is that your code context goes through Cursor's API.

Free tier; Pro from $20/mo

Affiliate link · We earn a commission at no cost to you.

Who should self-host — and who shouldn’t

Tabby earns its keep in specific situations:

Regulated or air-gapped environments where code can’t legally or contractually leave the building.
Teams at scale where per-seat AI subscriptions add up and you already have GPU capacity.
Privacy-first solo developers who want completion without the data tradeoff and don’t mind running a container.
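Whether per-seat subscriptions “add up” comes down to a break-even calculation: a one-off hardware spend divided by the monthly savings, where savings are the subscription total minus your monthly operating cost. The Python sketch below makes that explicit. The $20 seat price matches the Cursor Pro figure above. The hardware and ops numbers in the example are hypothetical placeholders, and the model ignores resale value, power pricing, and any staff time beyond a flat monthly ops figure.

```python
from typing import Optional


def break_even_months(hardware_cost: float, monthly_ops_cost: float,
                      seats: int, price_per_seat: float) -> Optional[float]:
    """Months until a one-off hardware purchase pays for itself versus
    per-seat subscriptions. Returns None when the monthly ops cost is at
    least the subscription bill, because savings never accumulate."""
    monthly_savings = seats * price_per_seat - monthly_ops_cost
    if monthly_savings <= 0:
        return None
    return hardware_cost / monthly_savings


# Hypothetical inputs: a $3,000 workstation and $200/month of ops time,
# compared against seats at a $20/month subscription price.
print(break_even_months(3000, 200, seats=25, price_per_seat=20))  # 10.0
print(break_even_months(3000, 200, seats=5, price_per_seat=20))   # None
```

With 25 seats, the subscriptions cost $500 a month; subtracting $200 of assumed ops leaves $300 of savings, so the hypothetical machine pays for itself in 10 months. With five seats, ops cost exceeds the $100 subscription bill and the function returns `None`.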

It’s the wrong call if you want the strongest possible completion quality with zero operational overhead. The open models Tabby runs are capable, but a 1.5B or 3B model self-hosted on a workstation won’t match a frontier model behind a hosted product on raw suggestion quality. You’re trading some completion strength for full control. Whether that trade pays off depends entirely on why you’re looking at self-hosting in the first place.

For most individuals with no privacy constraint, hosted tools win on convenience. For anyone who said “we can’t send our code to an API” out loud this quarter, Tabby is one of the few answers that closes that gap.

FAQ

Does Tabby work without a GPU?

Yes, with --device cpu, but inline completion latency gets high enough that it stops feeling responsive. A consumer NVIDIA GPU with 6+ GB of VRAM is the practical baseline for live completions.

Which editors does Tabby support?

Official extensions cover VS Code, JetBrains IDEs (IntelliJ, PyCharm, and the rest of the family), and Vim/Neovim. You point each one at your Tabby server's URL and token.

Is Tabby actually free?

The core is open source under Apache 2.0, so there's no license fee to self-host. Your real cost is the GPU hardware and the time to run and update it. There's also a paid Enterprise tier with team features like SSO.
