pickuma.

Veles: Hybrid BM25 + Semantic Code Search in a Local Rust MCP Server

Veles is an open-source MCP server in Rust that runs BM25 keyword search and semantic vector search together over a local index, giving Claude, Cursor, and other MCP assistants more precise code retrieval.


Your AI coding assistant is only as good as the code it can find. Ask Claude or Cursor to “tighten up the retry logic,” and the model does not read your entire repository — it retrieves a handful of files or snippets and reasons from those. If that retrieval step surfaces the wrong code, you get a confident, well-written answer built on the wrong context, and you may not notice until something breaks.

Veles is an open-source Model Context Protocol (MCP) server, written in Rust, that targets exactly this step. Instead of committing to one search strategy, it runs two at once — BM25 keyword ranking and semantic vector search — over a local index of your codebase, then returns the merged result to whatever MCP-compatible assistant you have connected.

Most AI coding tools fall back on one of two retrieval strategies, and each fails in a predictable way.

Exact-match search — grep, ripgrep, substring lookup — is precise when you already know the token you want. Search for parseAuthHeader and you get every hit. Ask for “the code that validates login tokens” and exact match returns nothing, because your phrasing shares no words with the identifier.
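A fixed-string lookup is easy to sketch. The Rust function below is a simplified stand-in for grep-style search; the name exact_match is hypothetical and is not part of Veles. It shows why the identifier query succeeds while the paraphrased query finds nothing.

```rust
/// Hypothetical grep-like helper (not Veles code): return (1-based line number, line)
/// for every line that contains `needle` verbatim.
fn exact_match<'a>(source: &'a str, needle: &str) -> Vec<(usize, &'a str)> {
    source
        .lines()
        .enumerate()
        // Keep only lines with a literal substring match; no notion of meaning.
        .filter(|(_, line)| line.contains(needle))
        .map(|(i, line)| (i + 1, line))
        .collect()
}
```

Searching a file for "parseAuthHeader" returns every line that mentions the identifier, while searching for "validates login tokens" returns an empty list, because no line contains that exact phrase.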

Semantic search flips the problem. It splits your code into chunks, turns each chunk into an embedding vector, and ranks chunks by how close their vectors are to the query’s vector. That handles paraphrase well: “validate login tokens” can match a function named parseAuthHeader. But embedding models are trained to generalize, and they tend to wash out the specificity you often need most: an exact error string, a config key like MAX_RETRIES, a rarely used function name. Ask a pure-embedding index for a literal string and it can return several files that are about the right topic while missing the one line that actually contains it.
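The ranking step can be sketched in a few lines. This is a minimal sketch that assumes cosine similarity as the vector-similarity measure; which measure and embedding model Veles uses is not covered here. The function names are hypothetical, and the three-dimensional vectors in the usage are invented for illustration, whereas real embedding models produce vectors with hundreds of dimensions.

```rust
/// Cosine similarity between two equal-length vectors (returns 0.0 for a zero vector).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical helper: rank (chunk id, embedding) pairs against a query embedding,
/// most similar first.
fn rank_by_similarity<'a>(query: &[f32], chunks: &[(&'a str, Vec<f32>)]) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = chunks
        .iter()
        .map(|(id, emb)| (*id, cosine(query, emb)))
        .collect();
    // Sort descending by similarity; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Nothing in this scoring looks at literal tokens, which is why a query embedding for “validate login tokens” can land near parseAuthHeader, and also why an exact string can be outranked by chunks that are merely on-topic.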

An assistant’s context window is small next to a real repository. You are not feeding it the whole codebase; you are feeding it the top few retrieved chunks. Precision in that step is the difference between a correct refactor and a plausible-looking wrong one. Worse, the failure is usually silent: the index returns its best guess no matter what, so the assistant proceeds as if it found the right code.

What hybrid BM25 + semantic search buys you

BM25 (the Okapi BM25 ranking function) is the keyword side of Veles. It is the default relevance scoring function in both Lucene and Elasticsearch. Through its inverse-document-frequency term it weights rare terms heavily, which is exactly the behavior you want for exact identifiers, string literals, and uncommon tokens. Semantic vector search is the conceptual side, catching matches where your wording and the code’s wording diverge.
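To make the rare-term weighting concrete, here is a minimal sketch of Okapi BM25 scoring; it is not Veles’s code. It assumes whitespace tokenization and the widely used parameters k1 = 1.2 and b = 0.75 (also Lucene’s defaults), and it uses the IDF form with +1 inside the logarithm, as Lucene does, so weights never go negative. The Bm25 type and its methods are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical Okapi BM25 scorer over pre-tokenized documents (a sketch, not Veles's code).
struct Bm25 {
    docs: Vec<Vec<String>>,
    doc_freq: HashMap<String, usize>, // number of documents containing each term
    avg_len: f64,                     // average document length in tokens
    k1: f64,                          // term-frequency saturation
    b: f64,                           // strength of document-length normalization
}

impl Bm25 {
    fn new(docs: Vec<Vec<String>>) -> Self {
        let mut doc_freq: HashMap<String, usize> = HashMap::new();
        for doc in &docs {
            // Count each distinct term once per document.
            let unique: HashSet<&String> = doc.iter().collect();
            for term in unique {
                *doc_freq.entry(term.clone()).or_insert(0) += 1;
            }
        }
        let total_tokens: usize = docs.iter().map(|d| d.len()).sum();
        let avg_len = if docs.is_empty() {
            0.0
        } else {
            total_tokens as f64 / docs.len() as f64
        };
        Bm25 { docs, doc_freq, avg_len, k1: 1.2, b: 0.75 }
    }

    /// IDF with +1 inside the log (Lucene's form): rare terms get large weights.
    fn idf(&self, term: &str) -> f64 {
        let n = self.docs.len() as f64;
        let df = self.doc_freq.get(term).copied().unwrap_or(0) as f64;
        ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
    }

    /// BM25 score of document `doc_idx` for a tokenized query.
    fn score(&self, doc_idx: usize, query: &[&str]) -> f64 {
        let doc = &self.docs[doc_idx];
        let len = doc.len() as f64;
        query
            .iter()
            .map(|&q| {
                let tf = doc.iter().filter(|t| t.as_str() == q).count() as f64;
                if tf == 0.0 {
                    return 0.0; // term absent from this document: no contribution
                }
                let norm = self.k1 * (1.0 - self.b + self.b * len / self.avg_len);
                self.idf(q) * tf * (self.k1 + 1.0) / (tf + norm)
            })
            .sum()
    }
}
```

In a three-document corpus where "fn" appears in every document and MAX_RETRIES appears in only one, the IDF for MAX_RETRIES is several times larger than the IDF for "fn", so a match on the config key dominates the score. Real code indexers usually also split identifiers such as parseAuthHeader into parse, auth, and header; this sketch does not.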

Run the two separately and you pick your failure mode. Run them together and each covers the other’s blind spot: BM25 anchors the result list to exact hits, while the semantic scorer pulls in conceptually related code that shares no literal terms. Fusing two ranked lists into one is a well-documented pattern in search engineering, and it is the core idea Veles applies to code retrieval. The raw scores cannot simply be added, because they sit on different scales: BM25 scores are unbounded and depend on the corpus, while vector-similarity scores such as cosine similarity fall between -1 and 1. A fusion step therefore either normalizes the two score lists or combines them by rank alone (reciprocal rank fusion is a common approach), so neither ranker dominates just because of its scale.
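Reciprocal rank fusion fits in a few lines. This sketch assumes RRF with the commonly used constant k = 60; it is one way to implement the merge step described above, not necessarily the one Veles uses, and the function name is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical reciprocal rank fusion helper. Each input list is ordered best-first.
/// A document at 1-based rank r in a list contributes 1 / (k + r); contributions
/// are summed across lists. Raw scores are ignored, so their scales do not matter.
fn reciprocal_rank_fusion<'a>(lists: &[Vec<&'a str>], k: f64) -> Vec<(&'a str, f64)> {
    let mut fused: HashMap<&'a str, f64> = HashMap::new();
    for list in lists {
        for (i, &doc) in list.iter().enumerate() {
            *fused.entry(doc).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut ranked: Vec<(&'a str, f64)> = fused.into_iter().collect();
    // Highest fused score first; ties broken by id so the output is deterministic.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    ranked
}
```

Because only ranks enter the formula, a file near the top of both lists beats a file that tops just one. If BM25 ranks retry.rs first and the semantic ranker ranks it second, it scores 1/61 + 1/62 ≈ 0.0325, ahead of a backoff.rs that only the semantic ranker returned in first place (1/61 ≈ 0.0164).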

Because it speaks MCP, Veles is not tied to one assistant. The Model Context Protocol is an open standard for connecting tools and data sources to LLM clients, so any MCP-compatible assistant — Claude, Cursor, and a growing list of others — can call Veles the same way. You configure the search backend once and keep it when you switch editors.
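MCP messages are JSON-RPC 2.0, and a client invokes a server’s tool with a tools/call request. The sketch below builds that request by hand in Rust to show that the payload does not depend on which assistant sends it. The tool name search_code and its query argument are hypothetical; the real name and input schema are whatever Veles advertises when a client asks for its tool list (tools/list).

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Other control characters must use \uXXXX escapes in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build an MCP `tools/call` JSON-RPC request. The tool name and the single
/// "query" argument are hypothetical placeholders, not Veles's documented schema.
fn tools_call_request(id: u64, tool: &str, query: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"query":"{}"}}}}}}"#,
        id,
        json_escape(tool),
        json_escape(query)
    )
}
```

Claude, Cursor, or any other MCP client would send a request of this shape; only the transport and configuration around it differ per client.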

Running it locally, and what that costs

Two design choices separate Veles from a hosted code-search service: it is local, and it is Rust.

Local means your code is indexed and searched on your own machine. For proprietary codebases, client work under NDA, or anything in a regulated industry, that removes a real blocker — you get embedding-quality retrieval without shipping source to a third-party API. The project describes Veles as keeping code off the cloud, which is the entire point of running the index yourself.

Rust means Veles ships as a native binary with no Python or Node runtime to install and manage. For a process that sits between your editor and every query you make, low overhead and fast indexing are not cosmetic — they decide whether the tool feels instant or laggy. A compiled binary with no garbage collector also keeps memory predictable on a large monorepo.

Setup follows the standard MCP pattern: add Veles to your client’s MCP configuration (the mcpServers block in Claude Desktop or Cursor), point it at the directory you want indexed, and let it build the index. On a monorepo, point it only at the folders you actually work in so the index stays focused. After that, the assistant gains a code-search tool it can call on its own, with no change to how you prompt.


Veles is young and open-source, so adopt it deliberately. Read the configuration docs, check how indexing and refresh are handled, and test retrieval quality on your own repository before you depend on it. What the project gets right is the diagnosis: retrieval, not the model, is often the weak link in an AI coding workflow, and a hybrid local index is a sound way to strengthen it.

FAQ

Do I need to send my code to the cloud to use Veles?
No. Veles indexes and searches your codebase locally, which is its main design goal. The detail to verify is the embedding backend: if it calls a remote embedding API, code chunks leave your machine for that step even though search runs locally. Check the project configuration before indexing a sensitive repo.
Why combine BM25 with semantic search instead of using embeddings alone?
Embedding-only search is weak at exact tokens such as identifiers, error strings, and config keys, because embedding models generalize that specificity away. BM25 handles those precisely. Running both rankers and merging their results means exact matches and conceptual matches both surface.
Which AI assistants can use Veles?
Any MCP-compatible client. Veles is a Model Context Protocol server, so Claude, Cursor, and other assistants that support MCP connect to it through their standard MCP configuration.
