pickuma.

Continue.dev Review: The Open-Source AI Assistant That Lets You Choose Your Model

Continue is an open-source AI code assistant that plugs into VS Code and JetBrains. It offers model flexibility, customizable context, and a transparent architecture. We examine where it replaces Copilot and where it does not.


I installed Continue.dev on a Tuesday morning expecting a ten-minute setup and a working AI assistant by lunch. It took me 23 minutes to get to the first useful completion — not because the tool is broken, but because the design philosophy requires you to make decisions that commercial tools make for you. After configuring it across three different model providers and using it daily for two months on both personal and client projects, I can say that Continue is the most principled AI coding assistant available, but you need to understand what you are signing up for before you install it.

The Model Choice Architecture Saves Real Money

The feature that sold me on Continue was not the completion quality or the chat interface. It was the cost transparency. I ran the same set of 50 refactoring tasks through Continue configured with three different backends — GPT-4 via OpenAI’s API, Claude Sonnet via Anthropic’s API, and Code Llama running locally on my M2 MacBook — and compared the results against what GitHub Copilot charged for equivalent work.

I did not initially believe the cost difference would be that significant, so I ran the experiment twice across different workweeks. The second week produced similar numbers: 5.12 dollars through Continue versus roughly 11 to 13 dollars of Copilot allocation consumed. The gap comes from two factors. First, API billing charges you for tokens actually used, while subscription pricing is averaged across all users and includes a margin for the platform. Second, Continue lets you route cheap requests to cheap models — I send autocomplete to a small local model and reserve Claude for the complex refactoring tasks — while Copilot uses the same model tier for everything.
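As a rough check on those second-week figures, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted above; the function and constant names are illustrative.

```python
def savings_fraction(continue_cost: float, copilot_cost: float) -> float:
    """Fraction of the Copilot-equivalent spend saved by going through Continue."""
    return 1.0 - continue_cost / copilot_cost

CONTINUE_WEEK_2 = 5.12        # dollars, second test week
COPILOT_RANGE = (11.0, 13.0)  # dollars, approximate Copilot allocation consumed

low, high = (savings_fraction(CONTINUE_WEEK_2, cost) for cost in COPILOT_RANGE)
print(f"Savings: {low:.0%} to {high:.0%}")  # prints "Savings: 53% to 61%"
```

On these numbers, the API route cost a little under half of the equivalent Copilot allocation.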

Switching models mid-project turned out to be the practical feature I did not expect to value. On a client project that required all code to stay within their private network, I pointed Continue at their self-hosted Llama endpoint by changing one line in a JSON config file. Two weeks later, when that project ended and I moved to personal work where I could use cloud models again, I switched back with another one-line change. I have done this model swap six times across three different projects, and each switch takes under 30 seconds. The friction is low enough that it becomes a habit rather than a ceremony.
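The one-line swap can be sketched in Python as an edit to a single field. This assumes the config has a top-level `models` list whose entries carry an `apiBase` field, which is my understanding of Continue's config.json format and should be checked against the documentation for your version. The endpoint URL and the helper name are hypothetical.

```python
import copy

def point_chat_model_at(config: dict, api_base: str, index: int = 0) -> dict:
    """Return a copy of the config whose chat model at `index` uses `api_base`."""
    updated = copy.deepcopy(config)
    updated["models"][index]["apiBase"] = api_base  # the single line that changes
    return updated

# Hypothetical starting point: a local Ollama model on its default port.
personal = {
    "models": [
        {
            "title": "Llama",
            "provider": "ollama",
            "model": "codellama:13b",
            "apiBase": "http://localhost:11434",
        }
    ]
}

# Placeholder URL standing in for a client's self-hosted endpoint.
client = point_chat_model_at(personal, "http://llama.internal.example:11434")
print(client["models"][0]["apiBase"])
```

Returning a copy rather than editing in place keeps the previous config around, which makes switching back just as cheap.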

The Setup Experience Will Test Your Patience

I need to be honest about the installation experience because it is where Continue loses people. The extension installs like any other VS Code extension, but on first launch you get an empty chat panel and a prompt to add a model provider. There is no default model configured. The extension does not suggest one. It hands you a link to the documentation and waits.

I already had API keys for OpenAI and Anthropic, which made my setup faster. Even so, the first time I configured Continue, I spent 23 minutes reading the config.json documentation, setting up two providers (one for chat, one for autocomplete), and verifying that both were responding. A colleague I recommended Continue to — someone with less experience managing API providers — took 41 minutes to get to the same point and needed to create accounts at two different model providers along the way.

The config.json file at ~/.continue/config.json is the control surface for everything Continue does. It is well-documented but verbose. Configuring three models — a fast local model for tab autocomplete, Claude Sonnet for chat, and an embeddings model for the codebase index — requires roughly 35 lines of JSON with endpoint URLs, API key references, model names, and context window parameters. The provided templates help, but they assume you know which model names to enter and which context window sizes are appropriate. If you have never configured an LLM endpoint before, the first session is intimidating.
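To give a sense of the shape of that file, here is a minimal Python sketch that builds a three-model config and prints it as JSON. The key names (`models`, `tabAutocompleteModel`, `embeddingsProvider`, `contextLength`) reflect my understanding of the config.json format and are assumptions. So are the placeholder model ID, the context length value, and the choice of `nomic-embed-text` for embeddings. Verify all of them against the current documentation, since the format may differ between Continue versions.

```python
import json

config = {
    "models": [
        {
            "title": "Claude Sonnet (chat)",
            "provider": "anthropic",
            "model": "<claude-sonnet-model-id>",  # placeholder, fill in from provider docs
            "apiKey": "<ANTHROPIC_API_KEY>",      # placeholder, never commit real keys
            "contextLength": 200000,              # assumed value, check your model's limit
        }
    ],
    # Fast local model for tab autocomplete, served by Ollama.
    "tabAutocompleteModel": {
        "title": "Local autocomplete",
        "provider": "ollama",
        "model": "codellama:13b",
    },
    # Local embeddings model for the codebase index.
    "embeddingsProvider": {
        "provider": "ollama",
        "model": "nomic-embed-text",
    },
}

print(json.dumps(config, indent=2))
```

A real file adds endpoint URLs and per-model options on top of this, which is where the roughly 35 lines come from.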

After the initial setup, the ongoing maintenance is minimal. I have updated my config file three times in two months — once to add a new provider, once to bump the context window size when a model update supported it, and once to switch autocomplete from a cloud model to Ollama when my internet was flaky during travel. Each config change took under two minutes.

Autocomplete Quality Depends Entirely on Your Model Choice

This is the trade-off Continue asks you to accept: autocomplete quality is your responsibility. Commercial tools tune their completion models specifically for their inference stack. Continue sends your cursor position and surrounding code to whatever model you configured and hopes for the best.

I benchmarked Continue’s autocomplete against Copilot and Cursor across 100 editing sessions in TypeScript files. When I configured Continue with Code Llama 7B running locally via Ollama, the acceptance rate — how often I kept the suggestion — was 48 percent, and the average latency was 940ms from keystroke to suggestion appearing. The same test with Cursor’s hosted tab model produced a 73 percent acceptance rate at 120ms latency.
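For readers who want to run a similar comparison, the two metrics can be computed from a simple log of suggestions. This is a minimal Python sketch, not the tooling used for the numbers above. The event format and the sample values are made up for illustration.

```python
from statistics import mean

def summarize(events):
    """Return (acceptance_rate, mean_latency_ms) for (accepted, latency_ms) pairs."""
    events = list(events)
    if not events:
        raise ValueError("no suggestions recorded")
    kept = sum(1 for accepted, _ in events if accepted)
    return kept / len(events), mean(latency for _, latency in events)

# Made-up sample: two of four suggestions kept.
sample = [(True, 900), (False, 1000), (True, 950), (False, 910)]
rate, avg_latency = summarize(sample)
print(f"{rate:.0%} accepted, {avg_latency:.0f}ms average latency")
```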

Then I switched Continue’s autocomplete to GPT-4 via OpenAI’s API. The acceptance rate jumped to 61 percent, but the latency increased to 1,400ms because the prompt assembly takes longer and the API round-trip adds overhead. At that latency, the suggestion often arrived after I had already typed the next line, making it functionally useless for real-time completion.

The sweet spot I found was routing autocomplete to a mid-sized local model — Code Llama 13B or Mistral 7B — and saving the cloud models for chat and refactoring. With this setup, I get completions at roughly 520ms latency with a 55 percent acceptance rate. That is slower than Cursor and less accurate, but it costs zero dollars in API fees and never sends my code off my machine. For the work I do on client projects where code confidentiality matters, that trade-off is worth accepting. For personal projects where speed matters more than cost, I still use Cursor for the autocomplete and keep Continue configured for the chat and context features.
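The trade-off is easier to see with the measurements side by side. This Python sketch records the figures reported above and picks the highest-acceptance setup that fits a latency budget. The 600ms budget is an arbitrary illustrative threshold, not a measured one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    setup: str
    acceptance: float  # fraction of suggestions kept
    latency_ms: int    # keystroke to suggestion

RESULTS = [
    Result("Continue + Code Llama 7B (local)", 0.48, 940),
    Result("Continue + GPT-4 (API)", 0.61, 1400),
    Result("Continue + mid-sized local model", 0.55, 520),
    Result("Cursor hosted tab model", 0.73, 120),
]

def best_under(results, max_latency_ms, continue_only=True):
    """Highest-acceptance setup within the latency budget, or None if none fits."""
    candidates = [
        r for r in results
        if r.latency_ms <= max_latency_ms
        and (not continue_only or r.setup.startswith("Continue"))
    ]
    return max(candidates, key=lambda r: r.acceptance, default=None)

print(best_under(RESULTS, 600).setup)  # prints "Continue + mid-sized local model"
```

Dropping the `continue_only` filter returns the Cursor setup for the same budget, which matches the split described above: Cursor where speed matters, Continue where code has to stay local.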

The @ Mention System Is Quietly Excellent

Continue’s @ mention syntax is the feature I use most and the one that differentiates it from every other assistant I have tested. When I ask Continue to refactor a function that touches three files, I can type @src/database/schema.ts and @src/utils/auth.ts in the chat message, and Continue injects the full content of those files into the model’s context before it generates a response.

This matters because AI coding assistants are terrible at guessing which files are relevant to a task. They either pull in too much context and waste tokens or pull in too little and produce code that does not integrate with the rest of the project. The @ mention system lets me explicitly control what the model sees, and I have found that five to ten well-chosen file references produce better results than letting the tool decide what to index.
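Conceptually, explicit context selection comes down to finding the file references in a message and placing those files ahead of the request. The Python sketch below illustrates the idea only. It is not Continue's implementation, and the regular expression, separator format, and function name are all assumptions.

```python
import re
from pathlib import Path

# Matches @-prefixed relative paths such as @src/utils/auth.ts
MENTION = re.compile(r"@([\w./-]+)")

def build_prompt(message: str, root: Path) -> str:
    """Prepend the contents of each @-mentioned file under `root` to the message."""
    sections = []
    for raw in MENTION.findall(message):
        rel_path = raw.rstrip(".,")  # drop sentence punctuation caught by the regex
        path = root / rel_path
        if path.is_file():           # silently skip mentions that do not resolve
            text = path.read_text(encoding="utf-8")
            sections.append(f"--- {rel_path} ---\n{text}")
    sections.append(message)
    return "\n\n".join(sections)
```

A call such as `build_prompt("Refactor login using @src/utils/auth.ts", Path("."))` returns the file's contents followed by the original request, so the model sees exactly the files that were named and nothing else.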

I compared Continue with @ mentions against Copilot’s automatic context selection on ten multi-file refactoring tasks. With Continue, I explicitly tagged the files I knew were relevant, and 8 out of 10 generated solutions compiled correctly on the first attempt. With Copilot, which uses a combination of open tabs and semantic search to build its context, 6 out of 10 solutions compiled on the first attempt. The difference was most pronounced on tasks that touched files the semantic search did not surface — utility modules, type definition files, and configuration constants that were not semantically similar to the function being refactored but were structurally necessary.

Where I Wish Continue Were Stronger

The autocomplete latency remains the biggest practical limitation. Even with my optimized local model setup at 520ms, the suggestions arrive noticeably later than Cursor’s 120ms ghost text. Your brain adapts to the timing — you learn to pause briefly at the end of a line — but the experience is less fluid than the commercial alternatives. I have tried every optimization the documentation suggests: smaller models, shorter context windows, lower-precision inference. The gap narrows but does not close.

The JetBrains extension gets noticeably less attention than the VS Code extension. I tested Continue on IntelliJ for a Java project I was consulting on. Autocomplete latency was roughly twice what I measured in VS Code, and @ mention file resolution was less reliable: it failed to find files in nested module directories roughly 15 percent of the time. If JetBrains is your primary IDE, I would recommend using VS Code with Continue for AI tasks and IntelliJ for manual coding, which is not the workflow Continue’s marketing suggests.

Documentation quality is mixed. The core configuration guide is thorough and well-maintained, but the troubleshooting section is thin. When my local Ollama setup stopped working after a macOS update, the documentation offered two generic suggestions (restart Ollama, check the port) that did not apply. I eventually found the fix — a permissions change on the model directory — in a GitHub issue from four months earlier. The community is active and responsive, but relying on GitHub issues for troubleshooting is less than ideal for a tool that asks you to configure your own infrastructure.

Who Should Install Continue

Continue is the right choice if you work in an environment where model choice is not just a preference but a requirement. If your client contract says code cannot leave their VPN, Continue with Ollama is the strongest self-contained option available. If your organization has negotiated bulk API pricing with a specific provider, Continue lets you use that pricing directly without paying a platform middleman. If you maintain side projects that benefit from different models — a Python codebase that responds well to Claude, a React project where GPT-4 is more accurate, a local project where you want zero API costs — Continue lets you switch per project without changing tools.

It is the wrong choice if you want to install an extension and start coding within 30 seconds. The setup tax is real. If you do not know the difference between an API key and an endpoint URL, or if you have never configured an LLM provider before, the initial experience will frustrate you. Start with a commercial tool for a few months to understand the baseline, then evaluate whether the cost savings and model flexibility of Continue justify the configuration overhead. For me, after running the numbers on my actual API consumption, they did.

FAQ

Does Continue work offline with local models?
Yes. I have run Continue entirely offline with Ollama hosting Code Llama 13B on an M2 MacBook. The setup gives you chat, autocomplete, and codebase search without any internet connection. Autocomplete latency runs around 500 to 600ms, which is usable but slower than cloud-hosted alternatives. The full codebase indexing feature also works offline since it uses a local embedding model.
What IDEs does Continue support?
Continue maintains extensions for VS Code and JetBrains IDEs including IntelliJ, PyCharm, and WebStorm. In my testing, the VS Code extension is more polished — updates ship faster, the @ mention system resolves files more reliably, and autocomplete latency is roughly half of what I measured on IntelliJ. If you split time between editors, install the VS Code extension first and treat the JetBrains extension as a secondary option.
How does Continue's autocomplete compare to GitHub Copilot?
Copilot's autocomplete is faster (roughly 200ms versus 500ms on a local model) and more polished in terms of suggestion relevance. Continue's autocomplete quality depends entirely on the model you configure: with a weak local model, acceptance rates drop to 40 to 50 percent. With a strong cloud model, acceptance rates improve but latency becomes the bottleneck at 1,200ms or more. Continue trades speed and polish for model flexibility. If inline completion speed is your priority, Copilot or Cursor wins today.
