API Reference

POST /dedup

Send a full menu and get back clusters of items that refer to the same dish. The endpoint catches spelling errors, transliterations, promotional noise, and serving-size variations. It returns groups of duplicates plus a list of singletons (unique items).

Request

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| items | string[] | Yes | Menu item texts to deduplicate. 1-2000 items, max 500 chars each. |
| cosine_threshold | float | No | Similarity threshold (0.85 works well, lower = more aggressive grouping). Default: 0.85. |

{
  "items": [
    "Chicken Biryani",
    "Murgh Biryani Serves 2",
    "Paneer Tikka",
    "Panner Tika",
    "Masala Dosa",
    "**NEW** Chiken Biryani (Half)"
  ]
}
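
The documented limits (1-2000 items, 500 characters per item) can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_dedup_payload` is a hypothetical helper name, and the [0, 1] range check on `cosine_threshold` is an assumption, since the reference states only the default value.

```python
from typing import Optional

MAX_ITEMS = 2000        # documented upper bound on items per request
MAX_ITEM_CHARS = 500    # documented upper bound on characters per item


def build_dedup_payload(items: list[str], cosine_threshold: Optional[float] = None) -> dict:
    """Validate inputs against the documented limits and return a JSON-ready body.

    Hypothetical client-side helper; the service performs its own validation.
    """
    if not 1 <= len(items) <= MAX_ITEMS:
        raise ValueError(f"items must contain 1-{MAX_ITEMS} entries, got {len(items)}")
    for i, text in enumerate(items):
        if len(text) > MAX_ITEM_CHARS:
            raise ValueError(f"item {i} exceeds {MAX_ITEM_CHARS} characters")

    payload: dict = {"items": list(items)}
    if cosine_threshold is not None:
        # Assumption: the threshold is a cosine similarity in [0, 1].
        if not 0.0 <= cosine_threshold <= 1.0:
            raise ValueError("cosine_threshold is assumed to lie in [0, 1]")
        payload["cosine_threshold"] = cosine_threshold
    return payload
```

Omitting `cosine_threshold` leaves the key out of the body, so the server default of 0.85 applies.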

Response

| Field | Type | Description |
| --- | --- | --- |
| clusters | object[] | Groups of duplicate items |
| singletons | string[] | Items with no duplicates found |
| total_items | int | Total items submitted |
| duplicate_items | int | Number of excess duplicates (items in clusters minus number of clusters) |
| processing_time_ms | float | Processing time in milliseconds |

Each cluster:

| Field | Type | Description |
| --- | --- | --- |
| cluster_id | int | Unique cluster identifier (1-based) |
| canonical | string | Shortest member, recommended as the canonical name |
| members | string[] | All items in this duplicate group (original text) |
| pairwise_scores | object[] | Similarity scores between each pair of members |

{
  "clusters": [
    {
      "cluster_id": 1,
      "canonical": "Chicken Biryani",
      "members": ["Chicken Biryani", "Murgh Biryani Serves 2", "**NEW** Chiken Biryani (Half)"],
      "pairwise_scores": [
        {"item_a": "Chicken Biryani", "item_b": "Murgh Biryani Serves 2", "cosine": 0.93, "reranker_score": 0.96},
        {"item_a": "Chicken Biryani", "item_b": "**NEW** Chiken Biryani (Half)", "cosine": 0.95, "reranker_score": 0.98},
        {"item_a": "Murgh Biryani Serves 2", "item_b": "**NEW** Chiken Biryani (Half)", "cosine": 0.91, "reranker_score": 0.94}
      ]
    },
    {
      "cluster_id": 2,
      "canonical": "Paneer Tikka",
      "members": ["Paneer Tikka", "Panner Tika"],
      "pairwise_scores": [
        {"item_a": "Paneer Tikka", "item_b": "Panner Tika", "cosine": 0.96, "reranker_score": 0.99}
      ]
    }
  ],
  "singletons": ["Masala Dosa"],
  "total_items": 6,
  "duplicate_items": 3,
  "processing_time_ms": 187.4
}
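
The relationship between the counts in a response can be recomputed on the client as a sanity check. This is a minimal sketch: `summarize_dedup` is a hypothetical helper, and it assumes that every submitted item appears either in exactly one cluster or in `singletons`, which the sample response suggests but the reference does not state.

```python
def summarize_dedup(response: dict) -> dict:
    """Recompute the summary counts from the clusters and singletons of a /dedup response."""
    clusters = response["clusters"]
    clustered = sum(len(c["members"]) for c in clusters)
    return {
        # Total number of items that landed in some cluster.
        "clustered_items": clustered,
        # Items in clusters minus number of clusters (the duplicate_items definition).
        "excess_duplicates": clustered - len(clusters),
        # One dish per cluster plus one per singleton.
        "unique_dishes": len(clusters) + len(response["singletons"]),
        # Assumption: clusters and singletons together cover every submitted item.
        "accounts_for_all_items": clustered + len(response["singletons"]) == response["total_items"],
    }
```

For the sample response, this gives 5 clustered items, 3 excess duplicates (matching `duplicate_items`), and 3 unique dishes.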

Example

import requests

menu = [
    "Chicken Biryani",
    "Murgh Biryani Serves 2",
    "Paneer Tikka",
    "Panner Tika",
    "Masala Dosa",
    "**NEW** Chiken Biryani (Half)"
]

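# Send the whole menu in one request. cosine_threshold is omitted, so the default (0.85) applies.
response = requests.post(
    "https://embed.statode.com/dedup",
    headers={"X-API-Key": "YOUR_KEY", "Content-Type": "application/json"},
    json={"items": menu},
    timeout=30,  # requests has no default timeout; avoid hanging indefinitely
)
response.raise_for_status()  # surface HTTP 4xx/5xx errors before parsing the body

data = response.json()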
print(f"Found {len(data['clusters'])} duplicate groups, {data['duplicate_items']} excess items")

for cluster in data["clusters"]:
    print(f"\n  Canonical: {cluster['canonical']}")
    print(f"  Duplicates: {cluster['members']}")

Cost

1 credit per item.

Notes

  • duplicate_items counts excess items (total items in clusters minus number of clusters), not total clustered items. A cluster of 3 items = 2 excess duplicates.
  • The canonical suggestion picks the shortest member. You may want to apply your own logic (e.g. prefer items without noise or misspellings).
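  • For menus over 2000 items, split into batches by category or restaurant and deduplicate each batch. Items in different batches are never compared with each other, so keep likely duplicates in the same batch.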
  • Promotional text (prices, "NEW", serving sizes) is stripped before comparison, so "Chicken Biryani" and "BEST SELLER Chicken Biryani Rs. 299" will match.
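
Two of the notes above, custom canonical selection and batching large menus, can be sketched on the client side. This is an illustration under stated assumptions, not the service's logic: `pick_canonical` and `batch_by_key` are hypothetical helpers, `known_names` stands for a trusted list of dish names that you supply, and "noise" is approximated as any character that is neither a letter nor whitespace.

```python
from collections import defaultdict
from typing import Iterable

MAX_ITEMS = 2000  # documented per-request limit


def _noise_count(name: str) -> int:
    """Count characters that are neither letters nor whitespace (digits, '*', '(', ...)."""
    return sum(1 for ch in name if not (ch.isalpha() or ch.isspace()))


def pick_canonical(members: list[str], known_names: frozenset[str] = frozenset()) -> str:
    """Prefer a member found in known_names, then the least noisy, then the shortest."""
    def rank(name: str) -> tuple[int, int, int]:
        return (0 if name in known_names else 1, _noise_count(name), len(name))
    return min(members, key=rank)


def batch_by_key(rows: Iterable[tuple[str, str]], max_items: int = MAX_ITEMS) -> list[list[str]]:
    """Group (key, item) rows by key, e.g. category, and split groups above max_items."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key, item in rows:
        groups[key].append(item)

    batches: list[list[str]] = []
    for items in groups.values():
        # A group larger than the limit is cut into consecutive chunks.
        for start in range(0, len(items), max_items):
            batches.append(items[start:start + max_items])
    return batches
```

Each batch would be sent as its own `items` array. The noise heuristic cannot detect misspellings by itself, which is why a trusted name list takes priority when one is available. When a single category exceeds the limit and has to be chunked, duplicates that land in different chunks are not compared.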