Veles: Hybrid BM25 + Semantic Code Search in a Local Rust MCP Server
Veles is an open-source MCP server in Rust that runs BM25 keyword search and semantic vector search together over a local index, giving Claude, Cursor, and other MCP assistants more precise code retrieval.
Your AI coding assistant is only as good as the code it can find. Ask Claude or Cursor to “tighten up the retry logic,” and the model does not read your entire repository — it retrieves a handful of files or snippets and reasons from those. If that retrieval step surfaces the wrong code, you get a confident, well-written answer built on the wrong context, and you may not notice until something breaks.
Veles is an open-source Model Context Protocol (MCP) server, written in Rust, that targets exactly this step. Instead of committing to one search strategy, it runs two at once — BM25 keyword ranking and semantic vector search — over a local index of your codebase, then returns the merged result to whatever MCP-compatible assistant you have connected.
Why code retrieval is the weak link
Most AI coding tools fall back on one of two retrieval strategies, and each fails in a predictable way.
Exact-match search — grep, ripgrep, substring lookup — is precise when you already know the token you want. Search for parseAuthHeader and you get every hit. Ask for “the code that validates login tokens” and exact match returns nothing, because your phrasing shares no words with the identifier.
Semantic search flips the problem. It splits your code into chunks, turns each chunk into an embedding vector, and ranks by vector similarity. That handles paraphrase well, so “validate login tokens” can match a function named parseAuthHeader. But embedding models are trained to generalize, and they tend to wash out the specificity you often need most: an exact error string, a config key like MAX_RETRIES, a rarely used function name. Ask a pure-embedding index for a literal string and it can return several files that are about the right topic while missing the one line that actually contains it.
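To make the ranking step concrete, here is a minimal Rust sketch of semantic retrieval, assuming every chunk already has an embedding vector. The vectors are hand-written toy values standing in for a real embedding model's output, and the names `Chunk`, `cosine_similarity`, and `rank_by_similarity` are hypothetical, not part of Veles's API. It ranks chunks by cosine similarity to a query vector, one common similarity measure for this step.

```rust
/// A code chunk paired with its embedding vector.
/// Hypothetical type: real vectors would come from an embedding model,
/// these are hand-written toy values.
struct Chunk {
    path: &'static str,
    embedding: Vec<f32>,
}

/// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0.0 if either vector is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Score every chunk against the query vector and sort, most similar first.
fn rank_by_similarity(query: &[f32], chunks: &[Chunk]) -> Vec<(&'static str, f32)> {
    let mut scored: Vec<(&'static str, f32)> = chunks
        .iter()
        .map(|c| (c.path, cosine_similarity(query, &c.embedding)))
        .collect();
    // total_cmp gives a total order over f32, so sorting never panics.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}

fn main() {
    let chunks = vec![
        Chunk { path: "auth/header.rs", embedding: vec![0.9, 0.1, 0.0] },
        Chunk { path: "net/retry.rs", embedding: vec![0.0, 0.2, 0.9] },
        Chunk { path: "auth/session.rs", embedding: vec![0.7, 0.3, 0.1] },
    ];
    // Toy vector standing in for the embedding of "validate login tokens".
    let query = [1.0, 0.2, 0.0];
    for (path, score) in rank_by_similarity(&query, &chunks) {
        println!("{score:.3}  {path}");
    }
}
```

Nothing in the ranking looks at tokens: `auth/header.rs` wins because its vector points in nearly the same direction as the query's. That is why paraphrase works, and also why an exact string like MAX_RETRIES gets no special treatment.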
An assistant’s context window is small next to a real repository. You are not feeding it the whole codebase; you are feeding it the top few retrieved chunks. Precision in that step is the difference between a correct refactor and a plausible-looking wrong one. Worse, the failure is usually silent: the index returns its best guess no matter what, so the assistant proceeds as if it found the right code.
What hybrid BM25 + semantic search buys you
BM25, the Okapi BM25 ranking function, is the keyword side of Veles. It is the default relevance scoring in Lucene and Elasticsearch. Through its inverse document frequency (IDF) term it weights rare terms heavily, which is exactly the behavior you want for exact identifiers, string literals, and uncommon tokens. Semantic vector search is the conceptual side, catching matches where your wording and the code’s wording diverge.
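The sketch below scores code chunks with the standard Okapi BM25 formula, assuming the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant that Lucene uses. It illustrates the technique under those assumptions and is not Veles's implementation; the tokenizer (which keeps identifiers such as MAX_RETRIES as single tokens) and the `Bm25` type are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tokenizer: split on anything that is not alphanumeric or '_',
/// so identifiers like MAX_RETRIES stay whole, then lowercase.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Okapi BM25 over an in-memory corpus of code chunks (assumed non-empty).
struct Bm25 {
    docs: Vec<Vec<String>>,
    doc_freq: HashMap<String, usize>, // how many chunks contain each term
    avg_len: f64,
    k1: f64, // term-frequency saturation
    b: f64,  // strength of length normalization
}

impl Bm25 {
    fn new(corpus: &[&str]) -> Self {
        let docs: Vec<Vec<String>> = corpus.iter().map(|d| tokenize(d)).collect();
        let mut doc_freq = HashMap::new();
        for doc in &docs {
            // Count each term at most once per chunk.
            let unique: HashSet<&String> = doc.iter().collect();
            for term in unique {
                *doc_freq.entry(term.clone()).or_insert(0) += 1;
            }
        }
        let total: usize = docs.iter().map(|d| d.len()).sum();
        let avg_len = total as f64 / docs.len() as f64;
        Bm25 { docs, doc_freq, avg_len, k1: 1.2, b: 0.75 }
    }

    /// IDF with +1 inside the log (Lucene's variant), so it is never negative.
    /// Rare terms get a large weight; terms found in most chunks get a small one.
    fn idf(&self, term: &str) -> f64 {
        let n = self.docs.len() as f64;
        let df = self.doc_freq.get(term).copied().unwrap_or(0) as f64;
        ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
    }

    /// BM25 score of chunk `doc_idx` for `query`: sum over query terms.
    fn score(&self, query: &str, doc_idx: usize) -> f64 {
        let doc = &self.docs[doc_idx];
        let len_norm = 1.0 - self.b + self.b * doc.len() as f64 / self.avg_len;
        let terms = tokenize(query);
        terms
            .iter()
            .map(|term| {
                let tf = doc.iter().filter(|t| *t == term).count() as f64;
                self.idf(term) * tf * (self.k1 + 1.0) / (tf + self.k1 * len_norm)
            })
            .sum()
    }
}

fn main() {
    let corpus = [
        "const MAX_RETRIES: u32 = 5;",
        "fn retry_with_backoff(attempts: u32) { /* ... */ }",
        "fn parse_auth_header(h: &str) -> Option<Token> { /* ... */ }",
    ];
    let index = Bm25::new(&corpus);
    for (i, chunk) in corpus.iter().enumerate() {
        println!("{:.3}  {}", index.score("MAX_RETRIES", i), chunk);
    }
}
```

Only the first chunk scores above zero for the query MAX_RETRIES, and because that token appears in just one chunk out of three, its IDF weight is high. A term such as `fn`, which shows up in more chunks, contributes less per occurrence.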
Run the two separately and you pick your failure mode. Run them together and each covers the other’s blind spot: BM25 anchors the result list to exact hits, while the semantic scorer pulls in conceptually related code that shares no literal terms. Fusing two ranked lists into one is a well-documented pattern in search engineering, and it is the core idea Veles applies to code retrieval. A merge step like this typically either normalizes the two score lists or fuses them by rank (reciprocal rank fusion is a common approach), so that neither ranker dominates simply because its raw scores sit on a different scale.
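Reciprocal rank fusion is simple enough to show in full. The Rust sketch below merges two ranked lists of file paths by summing 1 / (k + rank) for every list a path appears in, with k = 60, the value commonly used in the literature. The article does not say which fusion method Veles uses, so treat this as an illustration of RRF in general; the function name and the file paths are hypothetical.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each ranked list adds 1 / (k + rank) for every
/// document it contains (rank is 1-based). Only ranks are used, never raw scores.
fn reciprocal_rank_fusion(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, doc) in list.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by name for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical top-3 results from each ranker.
    let bm25 = vec!["config/limits.rs", "net/retry.rs", "docs/README.md"];
    let semantic = vec!["net/retry.rs", "net/backoff.rs", "config/limits.rs"];
    for (path, score) in reciprocal_rank_fusion(&[bm25, semantic], 60.0) {
        println!("{score:.5}  {path}");
    }
}
```

Here `net/retry.rs` comes out on top even though BM25 ranked it second, because both rankers placed it near the top. Since only ranks enter the formula, BM25 scores and cosine similarities never have to be put on a common scale.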
Because it speaks MCP, Veles is not tied to one assistant. The Model Context Protocol is an open standard for connecting tools and data sources to LLM clients, so any MCP-compatible assistant — Claude, Cursor, and a growing list of others — can call Veles the same way. You configure the search backend once and keep it when you switch editors.
Running it locally, and what that costs
Two design choices separate Veles from a hosted code-search service: it is local, and it is Rust.
Local means your code is indexed and searched on your own machine. For proprietary codebases, client work under NDA, or anything in a regulated industry, that removes a real blocker — you get embedding-quality retrieval without shipping source to a third-party API. The project describes Veles as keeping code off the cloud, which is the entire point of running the index yourself.
Rust means Veles ships as a native binary with no Python or Node runtime to install and manage. For a process that sits between your editor and every query you make, low overhead and fast indexing are not cosmetic — they decide whether the tool feels instant or laggy. A compiled binary with no garbage collector also keeps memory predictable on a large monorepo.
Setup follows the standard MCP pattern: add Veles to your client’s MCP configuration — the mcpServers block in Claude Desktop or Cursor — point it at the directory you want indexed, and let it build the index. Most clients let you scope the server to specific folders, which is worth doing on a monorepo so the index stays focused. After that, the assistant gains a code-search tool it can call on its own, with no change to how you prompt.
Veles is young and open-source, so adopt it deliberately. Read the configuration docs, check how indexing and refresh are handled, and test retrieval quality on your own repository before you depend on it. What the project gets right is the diagnosis: retrieval, not the model, is often the weak link in an AI coding workflow, and a hybrid local index is a sound way to strengthen it.
FAQ
Do I need to send my code to the cloud to use Veles?
Why combine BM25 with semantic search instead of using embeddings alone?
Which AI assistants can use Veles?