Technical Note
From keyword matching to semantic retrieval
Why TF-IDF remains useful, where dense embeddings help, and how retrieval quality should be evaluated before generation is added.
June 2026 · 6 min read
Start with a baseline
TF-IDF remains useful because it is fast, inspectable, and hard to misrepresent: every score is a sum of per-term weights that can be checked by hand. Dense retrieval should earn its added complexity by improving results on the queries that matter.
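To make the baseline concrete, here is a minimal TF-IDF sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the whitespace tokenizer, the smoothed IDF formula, the three sample documents, and the helper names (`build_index`, `search`, `explain`) are all chosen for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for the sketch.
    return text.lower().split()


def vectorize(tokens, idf):
    # Term frequency times IDF, then L2-normalized so dot products are cosines.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (one common variant); other formulas are equally valid.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return idf, [vectorize(tokens, idf) for tokens in tokenized]


def search(query, idf, vectors, k=3):
    q = vectorize(tokenize(query), idf)
    scored = []
    for i, vec in enumerate(vectors):
        score = sum(w * vec.get(t, 0.0) for t, w in q.items())
        if score > 0:
            scored.append((i, score))
    # Highest score first; ties broken by document index.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


def explain(query, idf, vec):
    # Per-term contribution to the score: this is what makes TF-IDF inspectable.
    q = vectorize(tokenize(query), idf)
    return {t: w * vec[t] for t, w in q.items() if t in vec}


# Hypothetical sample corpus, made up for the example.
docs = [
    "tf-idf is a sparse keyword baseline",
    "dense embeddings capture semantic similarity",
    "evaluate retrieval with precision and recall",
]
idf, vectors = build_index(docs)
results = search("keyword baseline", idf, vectors)
contributions = explain("keyword baseline", idf, vectors[0])
```

Here `results` ranks only the first document, because it is the only one sharing terms with the query, and `contributions` shows how much each matching term added to its score. That per-term breakdown is the property a dense model has to beat on real queries to justify replacing the baseline.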
Measure retrieval before generation
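A fluent generated answer can hide weak retrieval. Precision@k (the share of the top k results that are relevant) and Recall@k (the share of all relevant documents that appear in the top k) make the evidence layer visible before generation is added.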
- Compare top-k results across representative queries.
- Review failure cases manually.
- Keep baseline and dense retrieval outputs side by side.
- Attach query-level examples when publishing a demo.
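The checklist above can be sketched as a small per-query report. This is one possible shape, not a prescribed tool: the query IDs, document IDs, relevance judgments, and the two result lists are invented for illustration, and `compare` is a hypothetical helper name.

```python
def precision_at_k(retrieved, relevant, k):
    # Share of the top k results that are relevant.
    # Divides by k even if fewer than k results were returned.
    if k <= 0:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    # Share of all relevant documents that appear in the top k.
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def compare(runs, relevant, k):
    # Build {query: {system: (precision, recall)}} so systems sit side by side.
    report = {}
    for query, judged in relevant.items():
        report[query] = {
            system: (
                precision_at_k(ranked[query], judged, k),
                recall_at_k(ranked[query], judged, k),
            )
            for system, ranked in runs.items()
        }
    return report


# Made-up relevance judgments and rankings, for illustration only.
relevant = {"q1": {"d1", "d4"}, "q2": {"d2"}}
runs = {
    "baseline": {"q1": ["d1", "d3", "d4"], "q2": ["d5", "d2", "d6"]},
    "dense": {"q1": ["d4", "d1", "d7"], "q2": ["d2", "d5", "d6"]},
}
report = compare(runs, relevant, k=2)

for query, systems in report.items():
    for system, (p, r) in systems.items():
        print(f"{query} {system:<8} P@2={p:.2f} R@2={r:.2f}")
```

Keeping the report keyed by query rather than averaging immediately makes failure cases easy to pull out for manual review: in this toy data the two systems differ on `q1` and tie on `q2`, a distinction an aggregate score would blur.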
Grounding is an evidence problem
Retrieval-augmented generation is only as trustworthy as the evidence path it exposes. The right question is not whether the answer sounds good. It is whether the surfaced evidence is relevant enough to support the answer.