Semble Review: Code Search for AI Agents That Cuts Token Use by 98%

If you run Claude Code, Cursor, or any agent loop against a repo larger than a few thousand files, you have probably watched it burn 30,000 tokens to answer a question about where authentication retries happen. The agent runs grep -r "retry", gets back two hundred matches, dumps the whole pile into context, and tries to reason over it. Most of those matches are noise.

Semble is a small open-source tool from MinishLab that tries to fix this. It indexes your repo, embeds code chunks, and serves a search API that returns ranked semantic matches instead of raw grep output. The project claims a 98% reduction in tokens consumed versus piping grep results to a model. We pulled it down, pointed it at a 180k-line TypeScript monorepo, and ran it next to ripgrep to see whether the number holds up in practice.

What Semble Actually Does

The mechanism is straightforward. Semble walks your repo, splits source files into syntactic chunks (function-sized rather than line-sized), embeds each chunk using a small local model, and stores the vectors on disk. When an agent queries "where do we retry failed payments", Semble runs the query against the index and returns the top N chunks ranked by cosine similarity, with file paths and line spans attached.
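The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not Semble's actual code: the Chunk type, the search function, and especially the toy embed function (keyword counts over a tiny fixed vocabulary, standing in for a real embedding model) are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """A function-sized piece of source with its location (hypothetical type)."""
    path: str
    start_line: int
    end_line: int
    text: str


# Toy stand-in for an embedding model: one dimension per vocabulary term.
# A real tool would use a learned model here; this only makes the sketch runnable.
VOCAB = ["retry", "payment", "failed", "login", "token", "render"]


def embed(text: str) -> list[float]:
    words = text.lower().split()
    # Count how many whitespace-separated words contain each vocabulary term.
    return [float(sum(1 for w in words if term in w)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A zero vector has no direction, so treat it as unrelated.
    return dot / (norm_a * norm_b)


def search(query: str, chunks: list[Chunk], top_n: int = 10) -> list[tuple[float, Chunk]]:
    """Return the top_n chunks ranked by cosine similarity to the query."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(c.text)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


chunks = [
    Chunk("auth/session.ts", 10, 30, "function refreshLoginToken(user) { return user.token }"),
    Chunk("billing/charge.ts", 40, 72, "function retryFailedPayment(invoice) { return charge(invoice) }"),
    Chunk("ui/button.tsx", 1, 20, "function renderButton(props) { return draw(props) }"),
]

results = search("where do we retry failed payments", chunks, top_n=2)
score, best = results[0]
print(f"{best.path}:{best.start_line}-{best.end_line} score={score:.2f}")
```

In a real index the chunk vectors would be computed once and stored on disk, so only the query is embedded at search time.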

This is not new technology. Sourcegraph, Cody, and every IDE-embedded “ask your codebase” feature do roughly the same thing. What is interesting about Semble is the packaging: a single binary, no server, no cloud dependency, no account. It runs locally, the index lives in .semble/, and the output is JSON that an agent can consume directly.

The 98% number comes from the project’s own benchmark, which compares the byte count of Semble’s top-10 ranked output against the byte count of grep -r for the same query across their test repos. That measures how much text reaches the model, not how good the resulting answers are.

When the 98% Claim Holds Up

We ran a small experiment on a 180k-LOC TypeScript repo with about 2,400 source files. Six queries, half conceptual (“rate limiting middleware”, “stripe webhook handler”, “supabase auth refresh”) and half literal (makeRequest, X-Forwarded-For, SIGTERM).

On the conceptual queries, Semble’s output averaged 1,400 tokens versus ripgrep’s 28,000. That is a 95% reduction — close to the project’s claim, though not quite. The top-3 chunks were correct on five of six queries; one query missed because we had named the relevant module something unrelated to the search terms.

On the literal queries, the picture flipped. Ripgrep returned the exact lines we wanted in about 200 tokens total. Semble returned 800 tokens of nearby-but-not-exact matches, because embedding similarity rewards related meaning rather than an exact identifier match. For SIGTERM specifically, Semble ranked an unrelated signal handler above the actual SIGTERM listener.
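The two results can be put on the same scale with one small helper. This is only a worked version of the arithmetic, using the token counts reported above; the function name is ours.

```python
def percent_reduction(baseline_tokens: int, tool_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline (negative means more tokens)."""
    return 100.0 * (baseline_tokens - tool_tokens) / baseline_tokens


# Conceptual queries: ripgrep averaged 28,000 tokens, Semble 1,400.
conceptual = percent_reduction(28_000, 1_400)

# Literal queries: ripgrep needed about 200 tokens, Semble returned 800.
literal = percent_reduction(200, 800)

print(f"conceptual: {conceptual:.0f}%  literal: {literal:.0f}%")
```

A negative value on the literal queries means Semble sent four times as many tokens as ripgrep did for the same lookup.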

The lesson: Semble wins on natural-language queries about behavior. Ripgrep wins on identifier and string lookups. The right setup exposes both tools to the agent and lets it pick, which is roughly what Claude Code and Cursor already do, except that today grep does nearly all of the heavy lifting.
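If you would rather route queries yourself than leave the choice to the agent, a crude heuristic goes a long way. The sketch below is an assumption-laden example, not part of Semble or any agent: looks_literal and choose_tool are hypothetical names, and the rule (a single token with no spaces is treated as an identifier) is deliberately simple. The ripgrep call uses the real --fixed-strings and --line-number flags; the semantic side is left as a label because we do not want to guess at Semble's command line.

```python
import re
import subprocess

# A single token made of identifier-like characters, e.g. makeRequest,
# X-Forwarded-For, SIGTERM. Anything containing whitespace fails the match.
IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")


def looks_literal(query: str) -> bool:
    """Heuristic: a single identifier-like token is an exact-match lookup."""
    return bool(IDENTIFIER_RE.match(query.strip()))


def choose_tool(query: str) -> str:
    """Return 'ripgrep' for literal lookups and 'semantic' for everything else."""
    return "ripgrep" if looks_literal(query) else "semantic"


def run_ripgrep(query: str, repo_path: str) -> str:
    """Exact string search. rg exits with status 1 on no matches, so no check=True."""
    result = subprocess.run(
        ["rg", "--fixed-strings", "--line-number", "--", query, repo_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


for q in ["makeRequest", "X-Forwarded-For", "SIGTERM", "rate limiting middleware"]:
    print(f"{q!r} -> {choose_tool(q)}")
```

The obvious weakness is a one-word conceptual query such as "authentication", which this rule sends to ripgrep. Falling back to semantic search when ripgrep returns nothing is one way to soften that.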

Cost Math for Claude and Cursor Users

Whether this matters financially depends on how heavy a user you are. At current Claude Sonnet input pricing, 30,000 tokens of grep output costs about $0.09 per call. Drop that to 1,500 tokens with Semble and the same call costs $0.005. The difference is roughly $0.085 per search-heavy turn.
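The per-call figures can be reproduced with a few lines of Python. The price constant is an assumption: $3 per million input tokens is the rate implied by the $0.09 figure above, so check it against current pricing before reusing the numbers.

```python
# Assumed input price, implied by $0.09 for 30,000 tokens. Verify before relying on it.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00


def input_cost(tokens: int) -> float:
    """Dollar cost of sending this many input tokens to the model."""
    return tokens * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


grep_cost = input_cost(30_000)    # grep dump
semble_cost = input_cost(1_500)   # ranked Semble result
per_search_saving = grep_cost - semble_cost

print(f"grep:   ${grep_cost:.4f}")
print(f"semble: ${semble_cost:.4f}")
print(f"saved:  ${per_search_saving:.4f} per search")
```

Multiplying per_search_saving by your own daily search volume gives a daily or monthly figure for your workload.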

That is invisible at hobbyist usage. At ten agent runs a day with five searches each, fifty searches in all, it is about $4.25 a day, or roughly $130 a month. At a team doing fifty agent runs a day in total across ten engineers, at the same five searches per run, it comes to about $21 a day, or several hundred dollars a month. That is still not material against the salary cost of those engineers, but it is real money.

The bigger payoff is not dollars but latency and quality. A 28,000-token grep dump pushes the model toward longer reasoning chains and increases the chance of distraction. A focused 1,500-token result keeps the agent on task. We saw shorter, more accurate responses on the conceptual queries even setting aside cost.

There is a cost on the other side too. Semble’s initial index of the 180k-LOC repo took about 90 seconds on an M2 MacBook Pro. Incremental re-indexing on file save is fast (under a second per changed file), but if you check out a branch with thousands of changed files, you will wait. There is also disk overhead — the index for our test repo came out to 240 MB.

Should You Add It

If you mostly use Cursor with its built-in agent search, you already have something similar baked in and Semble is redundant. If you run Claude Code, Aider, or a homegrown agent loop against a large repo and you have noticed token usage climbing, Semble is worth the afternoon to set up. The install is a cargo install or a precompiled binary; pointing it at a repo is one command.

It is open source under a permissive license, which means you can vendor it into a CI step or wrap it in your own MCP server. If your agents are running in production against customer code, having the index local rather than in someone else’s cloud is a defensible privacy argument too.

The honest summary: Semble does what it says, the 98% number is roughly right on the queries it was built for, and it does not replace ripgrep. Add it as a second tool, not a replacement.

FAQ

Does Semble replace ripgrep entirely?

No. For exact identifier or string lookups, ripgrep is faster and more accurate. Semble is built for natural-language and conceptual queries. The right setup runs both.

How does Semble compare to Sourcegraph Cody?

Cody is a hosted product with a chat UI and team features; Semble is a local CLI and library. Cody indexes in the cloud and bills per seat; Semble runs entirely on your machine with no per-user cost.

Will the 98% token reduction translate into a 98% lower Claude bill?

No. Search calls are one part of an agent loop. Even if every search drops 98%, the model's reasoning tokens and tool-result processing stay the same. Expect 20-40% total reduction on search-heavy workflows, not 98%.
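One way to sanity-check that range is a back-of-the-envelope model in which search output accounts for a fixed share of your total token spend and everything else is unchanged. The share values below are illustrative assumptions, not measurements.

```python
def total_reduction(search_share: float, search_reduction: float = 0.98) -> float:
    """Overall fractional saving when only the search share of tokens shrinks."""
    return search_share * search_reduction


# Illustrative shares of total tokens that come from search results.
for share in (0.2, 0.3, 0.4):
    print(f"search share {share:.0%} -> total reduction {total_reduction(share):.1%}")
```

Under this model, a 20-40% overall reduction corresponds to search results making up roughly 20-40% of the tokens you pay for.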
