API Reference
POST /dedup
Send a full menu and get back clusters of items that refer to the same dish. Catches spelling errors, transliterations, promotional noise, and serving size variations. Returns groups of duplicates plus a list of singletons (unique items).
Request
| Parameter | Type | Required | Description |
|---|---|---|---|
| items | string[] | Yes | Menu item texts to deduplicate. 1-2000 items, max 500 chars each. |
| cosine_threshold | float | No | Similarity threshold. Lower values group more aggressively. Default: 0.85, which works well. |
{
  "items": [
    "Chicken Biryani",
    "Murgh Biryani Serves 2",
    "Paneer Tikka",
    "Panner Tika",
    "Masala Dosa",
    "**NEW** Chiken Biryani (Half)"
  ]
}
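The request limits above can be checked on the client before spending a call. The following is a minimal sketch, not part of the API: `build_dedup_payload` is a hypothetical helper, and it assumes the 500-character limit counts Python string characters (the service may count differently).

```python
MAX_ITEMS = 2000  # documented upper bound on items per request
MAX_CHARS = 500   # documented upper bound on characters per item


def build_dedup_payload(items, cosine_threshold=None):
    """Validate items against the documented limits and build the JSON body.

    Hypothetical helper; only the two documented parameters are used.
    """
    if not 1 <= len(items) <= MAX_ITEMS:
        raise ValueError(f"expected 1-{MAX_ITEMS} items, got {len(items)}")
    for index, text in enumerate(items):
        if not isinstance(text, str):
            raise TypeError(f"item {index} is not a string")
        if len(text) > MAX_CHARS:
            raise ValueError(f"item {index} exceeds {MAX_CHARS} characters")

    payload = {"items": list(items)}
    # Omit the threshold unless set, so the server-side default (0.85) applies.
    if cosine_threshold is not None:
        payload["cosine_threshold"] = float(cosine_threshold)
    return payload


payload = build_dedup_payload(["Paneer Tikka", "Panner Tika"], cosine_threshold=0.9)
print(payload)
```

Leaving `cosine_threshold` out of the body keeps the request identical to the documented example, so the default applies.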
Response
| Field | Type | Description |
|---|---|---|
| clusters | object[] | Groups of duplicate items |
| singletons | string[] | Items with no duplicates found |
| total_items | int | Total items submitted |
| duplicate_items | int | Number of excess duplicates (items in clusters minus number of clusters) |
| processing_time_ms | float | Processing time in milliseconds |
Each cluster:
| Field | Type | Description |
|---|---|---|
| cluster_id | int | Unique cluster identifier (1-based) |
| canonical | string | Shortest member, recommended as the canonical name |
| members | string[] | All items in this duplicate group (original text) |
| pairwise_scores | object[] | Similarity scores between each pair of members |
{
  "clusters": [
    {
      "cluster_id": 1,
      "canonical": "Chicken Biryani",
      "members": ["Chicken Biryani", "Murgh Biryani Serves 2", "**NEW** Chiken Biryani (Half)"],
      "pairwise_scores": [
        {"item_a": "Chicken Biryani", "item_b": "Murgh Biryani Serves 2", "cosine": 0.93, "reranker_score": 0.96},
        {"item_a": "Chicken Biryani", "item_b": "**NEW** Chiken Biryani (Half)", "cosine": 0.95, "reranker_score": 0.98},
        {"item_a": "Murgh Biryani Serves 2", "item_b": "**NEW** Chiken Biryani (Half)", "cosine": 0.91, "reranker_score": 0.94}
      ]
    },
    {
      "cluster_id": 2,
      "canonical": "Panner Tika",
      "members": ["Paneer Tikka", "Panner Tika"],
      "pairwise_scores": [
        {"item_a": "Paneer Tikka", "item_b": "Panner Tika", "cosine": 0.96, "reranker_score": 0.99}
      ]
    }
  ],
  "singletons": ["Masala Dosa"],
  "total_items": 6,
  "duplicate_items": 3,
  "processing_time_ms": 187.4
}
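The counts in this response follow from the cluster sizes. The sketch below recomputes them from a trimmed copy of the sample (members only). `summarize_response` and `expected_pair_count` are hypothetical helper names, and the check that clustered items plus singletons equals `total_items` is an assumption that happens to hold for this sample.

```python
def expected_pair_count(num_members):
    """Number of unordered member pairs in a cluster of the given size."""
    return num_members * (num_members - 1) // 2


def summarize_response(data):
    """Recompute the documented counts from clusters and singletons."""
    clusters = data["clusters"]
    clustered = sum(len(c["members"]) for c in clusters)
    return {
        "clustered_items": clustered,
        # duplicate_items = items in clusters minus number of clusters
        "excess_duplicates": clustered - len(clusters),
        # Assumption: every submitted item is in a cluster or in singletons.
        "accounted_items": clustered + len(data["singletons"]),
    }


# Trimmed copy of the sample response: only the fields used here.
sample = {
    "clusters": [
        {"members": ["Chicken Biryani", "Murgh Biryani Serves 2",
                     "**NEW** Chiken Biryani (Half)"]},
        {"members": ["Paneer Tikka", "Panner Tika"]},
    ],
    "singletons": ["Masala Dosa"],
    "total_items": 6,
    "duplicate_items": 3,
}

summary = summarize_response(sample)
print(summary["excess_duplicates"] == sample["duplicate_items"])
print(summary["accounted_items"] == sample["total_items"])
print([expected_pair_count(len(c["members"])) for c in sample["clusters"]])
```

The last line gives the number of `pairwise_scores` entries to expect per cluster: three for the three-member cluster and one for the two-member cluster, matching the sample.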
Example
import requests

menu = [
    "Chicken Biryani",
    "Murgh Biryani Serves 2",
    "Paneer Tikka",
    "Panner Tika",
    "Masala Dosa",
    "**NEW** Chiken Biryani (Half)",
]

response = requests.post(
    "https://embed.statode.com/dedup",
    headers={"X-API-Key": "YOUR_KEY", "Content-Type": "application/json"},
    json={"items": menu},
    timeout=30,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # raise on an HTTP error status instead of parsing an error body
data = response.json()

print(f"Found {len(data['clusters'])} duplicate groups, {data['duplicate_items']} excess items")
for cluster in data["clusters"]:
    print(f"\n  Canonical: {cluster['canonical']}")
    print(f"  Duplicates: {cluster['members']}")
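Because `canonical` is simply the shortest member, it can land on a misspelled or noisy variant. One possible replacement rule is sketched below. `pick_canonical` and the `known_terms` vocabulary are hypothetical and supplied by the caller; the ranking (fewest unknown words, then fewest non-letter characters, then shortest) is only one reasonable choice and is not how the service works internally.

```python
import re


def pick_canonical(members, known_terms):
    """Choose a display name from a cluster's members.

    Ranks by: words missing from known_terms, then non-letter characters
    (digits, asterisks, brackets), then length. Lowest rank wins.
    """
    def rank(text):
        words = re.findall(r"[a-z]+", text.lower())
        unknown = sum(1 for word in words if word not in known_terms)
        noise = len(re.findall(r"[^A-Za-z\s]", text))
        return (unknown, noise, len(text))

    return min(members, key=rank)


# Hypothetical vocabulary of correctly spelled dish words.
vocabulary = {"chicken", "murgh", "biryani", "paneer", "tikka"}

print(pick_canonical(["Paneer Tikka", "Panner Tika"], vocabulary))
print(pick_canonical(
    ["Chicken Biryani", "Murgh Biryani Serves 2", "**NEW** Chiken Biryani (Half)"],
    vocabulary,
))
```

In practice this would be applied to `cluster["members"]` for each cluster in the response, with a vocabulary built from a trusted menu or dish list.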
Cost
1.0 credits per item.
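As a quick worked example of the pricing, the sketch below estimates the credits for a request. `estimate_credits` is a hypothetical helper, and it assumes the charge depends only on the number of items submitted.

```python
CREDITS_PER_ITEM = 1.0  # documented rate


def estimate_credits(num_items):
    """Estimated credits for one request with num_items menu items."""
    if num_items < 0:
        raise ValueError("num_items must be non-negative")
    return num_items * CREDITS_PER_ITEM


print(estimate_credits(6))     # the six-item sample menu
print(estimate_credits(2000))  # a request at the maximum size
```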
Notes
- duplicate_items counts excess items (total items in clusters minus number of clusters), not total clustered items. A cluster of 3 items = 2 excess duplicates.
- The canonical suggestion picks the shortest member. You may want to apply your own logic (e.g. prefer items without noise or misspellings).
- For menus over 2000 items, split into batches by category or restaurant and deduplicate each batch.
- Promotional text (prices, "NEW", serving sizes) is stripped before comparison, so "Chicken Biryani" and "BEST SELLER Chicken Biryani Rs. 299" will match.
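The batching advice for large menus could look like the sketch below. It assumes each item arrives with a category label, which is not part of the API; `batch_by_category` is a hypothetical helper. Note that duplicates split across two batches are never compared, which is why grouping related items together matters.

```python
from collections import defaultdict

MAX_ITEMS = 2000  # documented per-request limit


def batch_by_category(items_with_category, max_items=MAX_ITEMS):
    """Group (category, text) pairs by category, then cap each batch size.

    Returns a list of (category, texts) tuples, each small enough to send
    as one request.
    """
    by_category = defaultdict(list)
    for category, text in items_with_category:
        by_category[category].append(text)

    batches = []
    for category, texts in by_category.items():
        # A category larger than the limit is split into consecutive chunks.
        for start in range(0, len(texts), max_items):
            batches.append((category, texts[start:start + max_items]))
    return batches


menu = [
    ("rice", "Chicken Biryani"),
    ("rice", "Murgh Biryani Serves 2"),
    ("starters", "Paneer Tikka"),
    ("rice", "**NEW** Chiken Biryani (Half)"),
]
for category, texts in batch_by_category(menu, max_items=2):
    print(category, texts)
```

Each resulting batch would then be sent as its own `items` array.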