Semble Review: Code Search for AI Agents That Cuts Token Use by 98%
Semble is an open-source code search tool that indexes your repo with embeddings and returns ranked chunks to AI agents instead of raw grep output. We tested whether the 98% token reduction claim holds up against ripgrep on a 180k-line monorepo.
If you run Claude Code, Cursor, or any agent loop against a repo larger than a few thousand files, you have probably watched it burn 30,000 tokens to answer a question about where authentication retries happen. The agent runs grep -r "retry", gets back two hundred matches, dumps the whole pile into context, and tries to reason over it. Most of those matches are noise.
Semble is a small open-source tool from MinishLab that tries to fix this. It indexes your repo, embeds code chunks, and serves a search API that returns ranked semantic matches instead of raw grep output. The project claims a 98% reduction in tokens consumed versus piping grep results to a model. We pulled it down, pointed it at a 180k-line TypeScript monorepo, and ran it next to ripgrep to see whether the number holds up in practice.
What Semble Actually Does
The mechanism is straightforward. Semble walks your repo, splits source files into syntactic chunks (function-sized rather than line-sized), embeds each chunk using a small local model, and stores the vectors on disk. When an agent queries "where do we retry failed payments", Semble runs the query against the index and returns the top N chunks ranked by cosine similarity, with file paths and line spans attached.
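The ranking step described above can be sketched in a few lines. This is a minimal illustration of cosine-similarity top-N retrieval, not Semble's actual code: the `Chunk` type, the `top_n` helper, the file paths, and the hand-written three-dimensional vectors are all hypothetical stand-ins for what a real embedding model would produce.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """A function-sized code chunk with its location and embedding (hypothetical type)."""
    path: str
    start_line: int
    end_line: int
    vector: list


def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query_vec, chunks, n=10):
    """Return the n highest-scoring (score, chunk) pairs, best first."""
    scored = [(cosine(query_vec, c.vector), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Toy index: real vectors would come from an embedding model, not be typed by hand.
chunks = [
    Chunk("auth/login.ts", 10, 38, [0.1, 0.9, 0.0]),
    Chunk("payments/retry.ts", 40, 72, [0.9, 0.1, 0.0]),
    Chunk("utils/format.ts", 5, 19, [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded query text

results = top_n(query_vec, chunks, n=2)
for score, chunk in results:
    print(f"{chunk.path}:{chunk.start_line}-{chunk.end_line}  score={score:.3f}")
```

The agent only ever sees the top few chunks with their paths and line spans, which is where the token savings come from.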
This is not new technology. Sourcegraph, Cody, and every IDE-embedded “ask your codebase” feature do roughly the same thing. What is interesting about Semble is the packaging: a single binary, no server, no cloud dependency, no account. It runs locally, the index lives in .semble/, and the output is JSON that an agent can consume directly.
The 98% number comes from the project’s own benchmark, comparing the byte count of Semble’s top-10 ranked output against the byte count of grep -r for the same query across their test repos. That is a size comparison on input, with bytes standing in for tokens, not a quality comparison on output.
When the 98% Claim Holds Up
We ran a small experiment on a 180k-LOC TypeScript repo with about 2,400 source files. Six queries, half conceptual (“rate limiting middleware”, “stripe webhook handler”, “supabase auth refresh”) and half literal (makeRequest, X-Forwarded-For, SIGTERM).
On the conceptual queries, Semble’s output averaged 1,400 tokens versus ripgrep’s 28,000. That is a 95% reduction — close to the project’s claim, though not quite. The top-3 chunks were correct on five of six queries; one query missed because we had named the relevant module something unrelated to the search terms.
On the literal queries, the picture flipped. Ripgrep returned the exact lines we wanted in about 200 tokens total. Semble returned 800 tokens of nearby-but-not-exact matches, because embedding similarity rewards semantic closeness and has no notion of an exact identifier match. For SIGTERM specifically, Semble ranked an unrelated signal handler above the actual SIGTERM listener.
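The arithmetic behind both results is easy to check. The sketch below uses the token counts reported in this test; the `reduction_pct` helper is a hypothetical name introduced for illustration.

```python
def reduction_pct(baseline_tokens, tool_tokens):
    """Percentage reduction relative to the baseline; negative means the tool used more."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - tool_tokens) / baseline_tokens


# Conceptual queries: ripgrep averaged 28,000 tokens, Semble 1,400.
conceptual = reduction_pct(28_000, 1_400)   # 95.0

# Literal queries: ripgrep about 200 tokens, Semble about 800.
literal = reduction_pct(200, 800)           # -300.0, i.e. four times the tokens

print(f"conceptual: {conceptual:.1f}%")
print(f"literal:    {literal:.1f}%")
```

A negative reduction on the literal queries means Semble's output was four times the size of ripgrep's for those lookups.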
The lesson: Semble wins on natural-language queries about behavior. Ripgrep wins on identifier and string lookups. The right setup exposes both tools to the agent and lets it pick — which is roughly what Claude Code and Cursor already do, just with grep doing all of the heavy lifting today.
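One way to picture the "expose both tools" setup is a small router that sends identifier-like queries to ripgrep and everything else to semantic search. This is a rough heuristic sketch under our own assumptions, not how Claude Code, Cursor, or Semble decide; `pick_tool` and the regular expression are hypothetical.

```python
import re

# A single token made of identifier, header, or path characters is treated as a literal lookup.
IDENTIFIER_LIKE = re.compile(r"^[A-Za-z_$][\w$.\-:/]*$")


def pick_tool(query):
    """Return "ripgrep" for literal or identifier lookups, "semantic" for natural language."""
    q = query.strip()
    # Explicitly quoted strings are exact-match requests.
    if len(q) >= 2 and q[0] == q[-1] and q[0] in "\"'`":
        return "ripgrep"
    if IDENTIFIER_LIKE.match(q):
        return "ripgrep"
    return "semantic"


# The six queries from the experiment above.
for query in [
    "rate limiting middleware",
    "stripe webhook handler",
    "supabase auth refresh",
    "makeRequest",
    "X-Forwarded-For",
    "SIGTERM",
]:
    print(f"{query!r} -> {pick_tool(query)}")
```

In practice an agent would make this choice itself from the tool descriptions; the point of the sketch is only that the literal and conceptual queries are easy to tell apart.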
Cost Math for Claude and Cursor Users
Whether this matters financially depends on how heavy a user you are. At current Claude Sonnet input pricing, 30,000 tokens of grep output costs about $0.09 per call. Drop that to 1,500 tokens with Semble and the same call costs $0.005. The difference is roughly $0.085 per search-heavy turn.
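The per-call figures can be reproduced with a short calculation. The $3 per million input tokens used here is an assumption inferred from the $0.09 figure for 30,000 tokens, not a quoted price list, and `input_cost_usd` and `monthly_saving_usd` are hypothetical helpers.

```python
# Assumption: $3.00 per million input tokens, implied by $0.09 for 30,000 tokens.
INPUT_PRICE_PER_MILLION_USD = 3.00


def input_cost_usd(tokens, price_per_million=INPUT_PRICE_PER_MILLION_USD):
    """Input-side cost in USD for a given number of tokens."""
    return tokens * price_per_million / 1_000_000


grep_cost = input_cost_usd(30_000)    # about $0.09
semble_cost = input_cost_usd(1_500)   # about $0.0045, which rounds to $0.005
saving_per_search = grep_cost - semble_cost  # about $0.0855


def monthly_saving_usd(searches_per_day, days=30):
    """Saving per month if every search would otherwise pull in the full grep dump."""
    return saving_per_search * searches_per_day * days


print(f"grep:   ${grep_cost:.4f}")
print(f"semble: ${semble_cost:.4f}")
print(f"saving: ${saving_per_search:.4f} per search")
```

Plugging your own daily search volume into `monthly_saving_usd` gives an upper bound, since not every search returns a 30,000-token dump.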
That is invisible at hobbyist usage. At ten agent runs a day with five searches each, and assuming every search would otherwise pull in a full 30,000-token dump, it is about $4.25 a day, or roughly $130 a month. A team totalling fifty agent runs a day across ten engineers lands at several hundred dollars a month on the same assumption, which is still not material against the salary cost of those engineers, but real money.
The bigger payoff is not dollars but latency and quality. A 28,000-token grep dump pushes the model toward longer reasoning chains and increases the chance of distraction. A focused 1,500-token result keeps the agent on task. We saw shorter, more accurate responses on the conceptual queries even setting aside cost.
There is a cost on the other side too. Semble’s initial index of the 180k-LOC repo took about 90 seconds on an M2 MacBook Pro. Incremental re-indexing on file save is fast (under a second per changed file), but if you check out a branch with thousands of changed files, you will wait. There is also disk overhead — the index for our test repo came out to 240 MB.
Should You Add It
If you mostly use Cursor with its built-in agent search, you already have something similar baked in and Semble is redundant. If you run Claude Code, Aider, or a homegrown agent loop against a large repo and you have noticed token usage climbing, Semble is worth the afternoon to set up. The install is a cargo install or a precompiled binary; pointing it at a repo is one command.
It is open source under a permissive license, which means you can vendor it into a CI step or wrap it in your own MCP server. If your agents are running in production against customer code, having the index local rather than in someone else’s cloud is a defensible privacy argument too.
The honest summary: Semble does what it says, the 98% number is roughly right on the queries it was built for, and it does not replace ripgrep. Add it as a second tool, not a replacement.