Concepts

Two-Stage Retrieval

The problem

A single embedding model (bi-encoder) is fast but has roughly 85% accuracy on food dedup and search tasks. For production use, you need higher precision. False positives (merging "Chicken Burger" with "Veg Burger") and false negatives (missing that "Murgh Makhani" = "Butter Chicken") both have real costs.

The solution: two stages

dish-embed uses a two-stage architecture internally:

Stage 1: Bi-encoder (fast candidate retrieval)

The bi-encoder embeds each item independently into a vector. Comparing items is just cosine similarity between vectors. This is fast (one encoding pass per item, then a cheap vector comparison per pair) and scales to the full catalog.

It produces a shortlist of candidates that pass a similarity threshold.
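As a minimal sketch of Stage 1: the tiny 3-dimensional vectors and the 0.8 threshold below are invented for illustration; the real bi-encoder produces high-dimensional embeddings.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy "embeddings" standing in for real bi-encoder output.
CATALOG = {
    "Butter Chicken": [0.90, 0.10, 0.00],
    "Murgh Makhani":  [0.88, 0.12, 0.05],
    "Sushi Roll":     [0.00, 0.20, 0.95],
}

def shortlist(query_vec, threshold=0.8):
    # Stage 1: keep every item whose similarity to the query passes the threshold.
    return [name for name, vec in CATALOG.items()
            if cosine(query_vec, vec) >= threshold]

print(shortlist([0.90, 0.10, 0.00]))  # ['Butter Chicken', 'Murgh Makhani']
```

"Sushi Roll" never reaches Stage 2: its vector points in a different direction, so it falls below the threshold.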

Stage 2: Cross-encoder reranker (precise scoring)

The reranker takes each candidate pair and examines them jointly in a single forward pass. Unlike the bi-encoder which embeds items independently, the reranker sees both items together and can reason about their relationship directly. This is slower (one inference per pair) but significantly more accurate.
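A minimal sketch of the Stage 2 pattern, with a simple word-overlap score standing in for the real cross-encoder (the actual reranker is a learned model that reads both strings jointly, not a lexical heuristic):

```python
def pair_score(a: str, b: str) -> float:
    # Stand-in for the cross-encoder: Jaccard overlap of the word sets.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def rerank(query: str, candidates: list[str], threshold: float = 0.3):
    # Stage 2: one scoring call per (query, candidate) pair, then sort by score
    # and drop pairs below the threshold.
    scored = sorted(((c, pair_score(query, c)) for c in candidates),
                    key=lambda pair: pair[1], reverse=True)
    return [(c, s) for c, s in scored if s >= threshold]

print(rerank("chicken burger", ["Chicken Burger", "Veg Burger", "Sushi Roll"]))
```

The shape matters more than the scoring function: per-pair inference is why Stage 2 only runs on the Stage 1 shortlist rather than the whole catalog.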

Two rerankers for two problems

dish-embed uses separate rerankers for dedup and search because these are fundamentally different tasks:

Dedup reranker

Used by /match, /dedup, and /report. Trained to answer: "Are these the same dish?"

  • Butter Chicken vs Murgh Makhani: YES (same dish, different names)
  • Butter Chicken vs Dal Makhani: NO (different dishes, despite similar names)
  • Chicken Biryani vs Chiken Biryani: YES (spelling variant)

Search reranker

Used by /search. Trained to answer: "Is this result relevant to the query?"

  • Query "Indian curry" vs Butter Chicken: YES (highly relevant)
  • Query "Indian curry" vs Dal Makhani: YES (also relevant)
  • Query "Indian curry" vs Sushi Roll: NO (not relevant)

Why separate?

A good dedup model must penalize related-but-different items. Butter Chicken and Dal Makhani are both Indian curries, but they are NOT the same dish. The dedup reranker correctly rejects this pair.

A good search model should rank both highly for the query "Indian curry". Using the dedup reranker for search would incorrectly suppress relevant results. Using the search reranker for dedup would incorrectly merge different dishes.
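The distinction can be made concrete with two toy predicates applied to the same pair. The alias and category tables below are hand-built for illustration; the real rerankers learn these judgments from training data.

```python
# Same-dish name variants (illustrative, not the real model's knowledge).
ALIASES = {"murgh makhani": "butter chicken"}
# Coarse category labels (also illustrative).
CATEGORIES = {
    "butter chicken": "indian curry",
    "dal makhani":    "indian curry",
    "sushi roll":     "japanese",
}

def canonical(name: str) -> str:
    n = name.lower()
    return ALIASES.get(n, n)

def dedup_same(a: str, b: str) -> bool:
    # Dedup question: are these the same dish?
    return canonical(a) == canonical(b)

def search_relevant(query: str, item: str) -> bool:
    # Search question: is this item relevant to the query?
    return CATEGORIES.get(canonical(item)) == query.lower()

print(dedup_same("Butter Chicken", "Dal Makhani"))      # False: different dishes
print(search_relevant("indian curry", "Dal Makhani"))   # True: relevant result
```

The same pair gets opposite answers depending on which question is being asked, which is exactly why one model cannot serve both tasks.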

Transparent to you

This two-stage pipeline runs automatically inside the API. You call /search or /dedup with plain text and get results. The staging, thresholding, and reranking are handled server-side. No configuration needed.

The only visible signal is the reranker_score field in responses, which reflects the Stage 2 precision score. Higher is more confident.
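One common use of the field is filtering results by confidence on the client side. The response shape below is an assumption for illustration (only the reranker_score field is documented here; the surrounding structure and the 0.95 cutoff are hypothetical):

```python
import json

# Hypothetical /search response body; only `reranker_score` is taken
# from the documentation, the rest is an assumed shape.
raw = """{"results": [
  {"name": "Butter Chicken", "reranker_score": 0.97},
  {"name": "Dal Makhani",    "reranker_score": 0.91}
]}"""

results = json.loads(raw)["results"]
# Keep only high-confidence Stage 2 matches.
confident = [r["name"] for r in results if r["reranker_score"] >= 0.95]
print(confident)  # ['Butter Chicken']
```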