Every time you start a new session with an AI coding assistant — Claude Code, Cursor, Continue, Copilot — something wasteful happens. The agent doesn't know your codebase. It doesn't know which files matter, how modules connect, or where the entry points are. So it does what any confused newcomer does: it greps around, reads files one by one, and burns through tens of thousands of tokens before it can even begin to help.
On a 100K-line codebase, this cold-start exploration routinely consumes 50,000+ tokens. That's not a rounding error — it's the dominant cost of the interaction. This is the context retrieval problem. And it's the bottleneck that separates AI tools that feel magical from ones that feel like interns on their first day.
Here's a distinction that matters more than it first appears. Search answers: "where is the code that matches this query?" Orientation answers: "what does this codebase look like, and what should I pay attention to given what I'm working on?" Most existing tools solve search. Grep, ripgrep, LSP "find references" — these are all precise instruments for known queries. But an AI coding assistant doesn't always know what to ask. When you say "refactor the authentication module," the agent first needs to understand what the authentication module is, what it touches, and what would break if it changed. This is a fundamentally different retrieval problem. It's closer to "give me a mental model of this codebase" than "find me a string."
Looking at the open-source landscape, four distinct approaches to code context retrieval have emerged. Each embodies a different philosophy about what matters most.
Aider, with around 42K GitHub stars, pioneered what might be the most elegant solution. Its RepoMap algorithm parses the entire codebase with tree-sitter, builds a dependency graph, runs PageRank personalized by what you're editing, and fits the result to a token budget via binary search. A 100K-line codebase gets distilled into roughly 1,024 tokens of pure architectural signal. The key insight: the same repository generates different maps depending on context. If you're editing the auth module, auth-adjacent files get boosted 50x. Identifiers you mention in conversation get 10x weight. The map is conversation-aware. The tradeoff: purely structural. It knows UserService calls DatabasePool, but it doesn't know they're semantically related to "user authentication."
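The core mechanic can be sketched in pure Python. Aider's actual implementation uses tree-sitter tags and networkx; this is a minimal stand-in with hypothetical file names, where boosting the file being edited reshapes the whole ranking:

```python
def personalized_pagerank(graph, personalization, damping=0.85, iters=50):
    """graph: {file: [files it references]}; personalization: {file: weight}.
    Pure-Python power iteration standing in for Aider's networkx-based version."""
    nodes = list(graph)
    total = sum(personalization.get(n, 1.0) for n in nodes)
    base = {n: personalization.get(n, 1.0) / total for n in nodes}
    rank = dict(base)
    for _ in range(iters):
        new = {n: (1 - damping) * base[n] for n in nodes}
        for n in nodes:
            targets = [t for t in graph[n] if t in new]
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling file: spread its rank back via the personalization vector
                for t in nodes:
                    new[t] += damping * rank[n] * base[t]
        rank = new
    return rank

# Hypothetical dependency graph. Editing auth.py gives it 50x weight,
# which pulls its neighborhood (session.py, db.py) up the ranking.
deps = {
    "auth.py": ["session.py", "db.py"],
    "session.py": ["db.py"],
    "report.py": ["db.py"],
    "db.py": [],
}
ranks = personalized_pagerank(deps, {"auth.py": 50.0})
```

Run the same graph with `{"report.py": 50.0}` instead and the map reorders around reporting code: this is what "the same repository generates different maps" means in practice.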
Claude Context, built by Zilliz, represents the vector search school. It chunks code using AST-aware splitting, generates embeddings, stores vectors in Milvus with BM25 sparse vectors alongside dense vectors, and runs hybrid search at query time. The promise: find semantically related code even when naming conventions differ. The tradeoff: generic embedding models score only 0.42–0.49 on code retrieval quality benchmarks. Without the resources to train custom models, the ceiling is low.
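Zilliz doesn't publish its exact fusion formula here, but reciprocal rank fusion is one common way to merge a sparse (BM25) ranking with a dense (embedding) ranking without tuning score scales; a minimal sketch:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids. Each doc scores 1/(k + position)
    per list; documents that rank well in both lists float to the top."""
    scores = {}
    for ranked in rankings:
        for pos, doc in enumerate(ranked):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + pos + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: BM25 favors exact keywords, embeddings favor meaning.
sparse = ["auth.py", "login.py", "session.py"]   # keyword matches
dense  = ["session.py", "auth.py", "token.py"]   # semantic matches
fused = reciprocal_rank_fusion([sparse, dense])
```

The constant `k` damps the influence of top ranks so one list can't dominate; 60 is the value from the original RRF literature, not something Claude Context is known to use.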
Zoekt, originally built at Google and maintained by Sourcegraph, takes the "make search impossibly fast" approach: trigram indexes, a 4-tier progressive evaluation pipeline, results in under 50ms across millions of lines. Battle-tested at enormous scale. But it's pure text search — no understanding of code structure, no conversation-awareness.
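The trigram trick is simple enough to sketch. Zoekt's real index adds positional data, compression, and its tiered evaluation pipeline; this toy version shows only the core idea — use trigram postings to shrink the candidate set, then verify with an exact scan:

```python
from collections import defaultdict

def trigrams(text):
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of doc ids
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for t in trigrams(text):
            self.postings[t].add(doc_id)

    def search(self, query):
        tris = trigrams(query)
        if not tris:  # query too short for trigrams: brute-force scan
            return sorted(d for d, txt in self.docs.items() if query in txt)
        # A matching doc must contain every trigram of the query...
        candidates = set.intersection(*(self.postings.get(t, set()) for t in tris))
        # ...but trigram presence allows false positives, so verify exactly.
        return sorted(d for d in candidates if query in self.docs[d])
```

The intersection step is why this is fast: most documents are eliminated by postings-list lookups alone, and the expensive substring check runs only on the survivors.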
SCIP goes the other direction entirely: compiler-level precision. Language-specific indexers produce globally unique symbol IDs, typed relationships, and cross-repository navigation. This powers Sourcegraph's "Go to definition" with accuracy no heuristic can match. The tradeoff: massive infrastructure investment, and the precision doesn't directly solve "what should I look at first?"
Here's what the ideal context search engine looks like. It doesn't exist yet in open source, but the pieces are all there.
The single most important design principle: token budget as output contract. An AI coding assistant has a finite context window. Every token spent on context is a token not available for reasoning. The context engine's job isn't to return "all relevant results" — it's to return the most useful information that fits in N tokens. This means binary search over ranked results, fitting to an exact budget. It means AST-aware compression: signatures, not full function bodies. A 100K-line codebase should be representable in 1,024 tokens of pure signal.
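The budget-fitting step is a straightforward binary search over how many ranked items to include, valid because rendering a longer prefix never costs fewer tokens. A sketch, with a hypothetical chars-per-token heuristic standing in for a real tokenizer:

```python
def fit_to_budget(ranked, budget, render, count_tokens):
    """Return the rendering of the largest prefix of `ranked` that fits
    within `budget` tokens. Assumes token count is monotone in prefix length."""
    lo, hi, best = 0, len(ranked), ""
    while lo <= hi:
        mid = (lo + hi) // 2
        text = render(ranked[:mid])
        if count_tokens(text) <= budget:
            best, lo = text, mid + 1  # fits: try including more
        else:
            hi = mid - 1              # too big: include fewer
    return best

# Hypothetical ranked signatures (compression: signatures, not bodies).
sigs = [f"def handler_{i}(request) -> Response" for i in range(100)]
approx_tokens = lambda text: len(text) // 4  # rough ~4 chars/token heuristic
repo_map = fit_to_budget(sigs, budget=64,
                         render="\n".join, count_tokens=approx_tokens)
```

In a real engine `count_tokens` would call the model's tokenizer and `render` would emit the AST-compressed map, but the contract is the same: the output is guaranteed to fit the budget exactly.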
A context engine should behave differently depending on what you're working on. Aider demonstrated this with personalized PageRank. But this can go further. Imagine a context engine that tracks current editing context, conversation history, task trajectory, and change impact — if you just modified UserService.validate(), proactively surface its callers. No existing open-source tool combines all of these signals.
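One simple way to combine such signals is multiplicative boosts layered on top of a structural rank. The 50x and 10x factors below are the ones Aider uses; the 5x recency multiplier and the shape of `state` are hypothetical:

```python
def context_boost(path, file_symbols, state):
    """Multiplicative boosts applied on top of a structural PageRank score.
    `state` is a hypothetical session snapshot: files being edited,
    identifiers mentioned in conversation, files changed recently."""
    boost = 1.0
    if path in state["editing"]:
        boost *= 50.0          # Aider-style boost for files being edited
    if file_symbols & state["mentioned"]:
        boost *= 10.0          # identifiers named in the conversation
    if path in state["recently_changed"]:
        boost *= 5.0           # hypothetical recency multiplier
    return boost

state = {
    "editing": {"auth.py"},
    "mentioned": {"validate"},
    "recently_changed": {"auth.py", "db.py"},
}
```

Because the boosts multiply a base rank rather than replace it, an architecturally central file you aren't touching can still outrank a peripheral file you are.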
This is a controversial take, but the data supports it: structural signals should come before semantic similarity. Generic embedding models achieve 0.42–0.49 retrieval quality on code. Meanwhile, tree-sitter-based def/ref graphs are deterministic, cheap to compute, language-aware, and available for 130+ languages. The structural approach has a higher floor and a more predictable ceiling.
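To make "def/ref graph" concrete, here is a single-language sketch using Python's stdlib `ast` module instead of tree-sitter (tree-sitter generalizes the same idea across its 130+ grammars): collect definitions, then add a file-level edge for every reference to a name defined elsewhere.

```python
import ast

def def_ref_graph(sources):
    """sources: {path: python_source}. Returns (defs, edges) where defs maps
    each top-level name to its defining file, and edges are (referencing_file,
    defining_file) pairs. Name-based, so collisions are possible — a real
    engine would scope by module."""
    defs = {}
    for path, src in sources.items():
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                defs[node.name] = path
    edges = set()
    for path, src in sources.items():
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.Name) and node.id in defs and defs[node.id] != path:
                edges.add((path, defs[node.id]))
    return defs, edges

# Two hypothetical files: auth.py calls check(), which db.py defines.
sources = {
    "auth.py": "def validate(token):\n    return check(token)\n",
    "db.py": "def check(token):\n    return True\n",
}
defs, edges = def_ref_graph(sources)
```

The result is exact where it applies: either `auth.py` references a name defined in `db.py` or it doesn't, with no 0.42-quality guesswork involved.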
The ideal engine works on any machine, for any project, without setup. If Zoekt is installed, use it. If not, fall back to ripgrep. If tree-sitter has a grammar, use AST-aware parsing. If not, fall back to regex. No Docker containers. No vector databases. No cloud APIs. Start from zero infrastructure and add optional accelerators.
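The fallback chain amounts to a few lines of capability detection plus a pure-Python worst case. A sketch (the tool names are the ones from the chain above; `fallback_search` is a hypothetical last resort):

```python
import re
import shutil

def pick_backend():
    """Use the fastest installed search tool; never require one."""
    for tool in ("zoekt", "rg", "grep"):
        if shutil.which(tool):
            return tool
    return "python"  # zero-infrastructure fallback

def fallback_search(pattern, files):
    """Pure-Python regex scan used when nothing faster is installed.
    `files` maps path -> file contents; returns (path, lineno, line) hits."""
    rx = re.compile(pattern)
    return [(path, lineno, line)
            for path, text in files.items()
            for lineno, line in enumerate(text.splitlines(), 1)
            if rx.search(line)]
```

Slower, yes — but it runs on a fresh laptop with nothing installed, which is the whole point of starting from zero infrastructure.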
The architecture: four MCP tools, one shared ranking layer, zero cloud dependencies. get_repo_map solves cold-start orientation — one call, 1,024 tokens, and the agent understands the codebase. search_code replaces grep with ranked, compressed results. get_impact answers "what will this change break?" with BFS on the dependency graph. get_recent_changes provides temporal context with semantic summaries of recent commits.
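Of the four, get_impact is the simplest to sketch: breadth-first search over the reversed dependency graph, reporting who transitively depends on a changed file and at what distance. The graph below is hypothetical:

```python
from collections import deque

def get_impact(reverse_deps, changed, max_depth=3):
    """reverse_deps: {file: [files that depend on it]}. Returns
    {affected_file: distance} for everything within max_depth hops
    of the changed file — i.e., what this change might break."""
    seen = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        if seen[node] >= max_depth:
            continue
        for caller in reverse_deps.get(node, ()):
            if caller not in seen:
                seen[caller] = seen[node] + 1
                queue.append(caller)
    seen.pop(changed)
    return seen

# Hypothetical reversed edges: auth.py and report.py depend on db.py, etc.
reverse_deps = {
    "db.py": ["auth.py", "report.py"],
    "auth.py": ["api.py"],
}
impact = get_impact(reverse_deps, "db.py")
```

The distance doubles as a ranking signal: direct dependents (distance 1) matter more than second-order ones, so the same token-budget fitting can truncate the report sensibly.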
The Model Context Protocol has become the interoperability standard — 97M+ monthly SDK downloads, adopted by Anthropic, OpenAI, Google, and Microsoft. Building a context engine as an MCP server means it works with any compatible client: Claude Code, Cursor, Continue, Copilot, Zed.
The landscape is converging. Sourcegraph's Cody dropped embeddings in favor of native keyword search. Continue.dev acknowledged the accuracy limits of codebase retrieval with current embedding models. Aider continues refining its PageRank approach. The trend is clear: the industry is moving from "smarter models" to "smarter context." The models are already capable. The bottleneck is feeding them the right information.
The pieces exist. Aider proved conversation-aware PageRank works. Zoekt proved trigram indexing searches millions of lines in milliseconds. SCIP proved compiler-level precision is achievable. MCP proved tool interoperability is solvable. What nobody has done — in open source — is combine them into a single, zero-dependency engine optimized for AI coding assistants.
This isn't a moonshot. It's about 1,500 lines of Python, four well-designed MCP tools, and a shared ranking layer. The algorithms are published. The libraries are open source. The protocol is standardized.
The context search engine we need isn't a smarter search box. It's a system that understands what you're doing, knows what's architecturally important, fits its answers into your token budget, and works everywhere without asking you to set up a vector database first. The era of "grep and pray" should be behind us.