Smarter food search that understands synonyms
What we're solving:
The food search engine currently has trouble understanding that different words can mean the same thing. For example, searching for "butter unsalted" won't find "butter without salt" even though they describe the same food. Additionally, the search prioritizes your personal foods over the main database regardless of match quality, so searching for "cream" might suggest "cream cheese" from your saved foods instead of plain cream from the USDA database. This task teaches the search system to recognize synonym descriptors, detect conflicting descriptors, and prioritize exact matches over partial ones.
Overview
Improve the food matching algorithm to handle descriptor synonyms, reject conflicting descriptors, prioritize exact name matches over partial matches from higher-priority sources, and keep brand-only matches from outranking name matches.
Issue 1: Descriptor Synonym Matching
Problem: Currently "butter unsalted" doesn't match "butter without salt" because the matching algorithm doesn't recognize equivalent food descriptors.
Solution: Implement synonym groups for common food descriptors:
| Category | Synonyms |
|---|---|
| Salt | "unsalted" = "without salt" = "no salt" = "salt free" |
| Pasteurization | "unpasteurized" = "raw" |
| Sugar | "unsweetened" = "no sugar" = "sugar free" |
| Fat (none) | "nonfat" = "fat free" = "skim" |
| Fat (reduced) | "low fat" = "reduced fat" = "light" |
| Cooking methods | "grilled" = "chargrilled" = "bbq", etc. |
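The synonym groups above could be encoded roughly as follows. This is a minimal sketch, not the project's implementation: descriptorSynonymGroups and normalizeDescriptor() are the names proposed in this ticket, but the signature, the "low_fat" canonical key, and the greedy longest-phrase lookup are assumptions made for illustration.

```swift
import Foundation

// Assumed layout: synonym phrase -> canonical descriptor.
// "no_salt", "no_sugar" and "fat_free" appear in this ticket; "low_fat" is a hypothetical key.
let descriptorSynonymGroups: [String: String] = [
    "unsalted": "no_salt", "without salt": "no_salt",
    "no salt": "no_salt", "salt free": "no_salt",
    "unsweetened": "no_sugar", "no sugar": "no_sugar", "sugar free": "no_sugar",
    "nonfat": "fat_free", "fat free": "fat_free", "skim": "fat_free",
    "low fat": "low_fat", "reduced fat": "low_fat", "light": "low_fat",
]

/// Lowercases the text, splits it into word tokens, and replaces the longest
/// known synonym phrase starting at each position with its canonical form.
/// Words that are not part of any synonym phrase are kept unchanged.
func normalizeDescriptor(_ text: String) -> [String] {
    let words = text.lowercased()
        .split(whereSeparator: { !($0.isLetter || $0.isNumber) })
        .map(String.init)
    // Longest synonym phrase, in words, so multi-word phrases are tried first.
    let maxPhraseLength = descriptorSynonymGroups.keys
        .map { $0.split(separator: " ").count }
        .max() ?? 1

    var result: [String] = []
    var index = 0
    while index < words.count {
        var matched = false
        let longest = min(maxPhraseLength, words.count - index)
        for length in stride(from: longest, through: 1, by: -1) {
            let phrase = words[index..<(index + length)].joined(separator: " ")
            if let canonical = descriptorSynonymGroups[phrase] {
                result.append(canonical)
                index += length
                matched = true
                break
            }
        }
        if !matched {
            result.append(words[index])
            index += 1
        }
    }
    return result
}
```

Under these assumptions, Set(normalizeDescriptor("butter unsalted")) and Set(normalizeDescriptor("Butter, without salt")) contain the same two elements, so word order and punctuation no longer affect the comparison.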
Issue 2: Conflicting Descriptors Not Handled
Problem: The synonym matching normalizes "unsalted" → no_salt, but doesn't check for conflicting descriptors. "salted" is the opposite of "unsalted" and should be rejected.
Examples:
- Query: "butter unsalted" should NOT match "Butter, salted"
- Query: "unsweetened tea" should NOT match "sweetened tea"
Solution: Add conflict detection - if query contains a descriptor, reject foods with the opposite descriptor:
- no_salt conflicts with salted
- no_sugar conflicts with sweetened
- fat_free conflicts with full_fat
- raw conflicts with cooked methods (grilled, fried, baked, boiled)
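The conflict pairs above could be checked with a lookup like the one below. It is a sketch under stated assumptions: descriptorConflicts and haveMatchingDescriptors() are names from this ticket, while the Set-based signature and the idea that both inputs are already canonical, lowercased descriptor tokens are illustrative choices.

```swift
import Foundation

// Canonical descriptor -> descriptors it cannot coexist with.
// Both directions are listed so the check is symmetric.
let descriptorConflicts: [String: Set<String>] = [
    "no_salt": ["salted"], "salted": ["no_salt"],
    "no_sugar": ["sweetened"], "sweetened": ["no_sugar"],
    "fat_free": ["full_fat"], "full_fat": ["fat_free"],
    "raw": ["grilled", "fried", "baked", "boiled"],
    "grilled": ["raw"], "fried": ["raw"], "baked": ["raw"], "boiled": ["raw"],
]

/// Returns false when any query descriptor conflicts with any food descriptor.
/// Both sets are assumed to hold canonical, lowercased tokens.
func haveMatchingDescriptors(query: Set<String>, food: Set<String>) -> Bool {
    for descriptor in query {
        guard let opposites = descriptorConflicts[descriptor] else { continue }
        if !opposites.isDisjoint(with: food) {
            return false
        }
    }
    return true
}
```

Comparing whole tokens rather than substrings matters here, because the string "unsalted" contains "salted" and a substring test would report a conflict between a descriptor and itself.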
Issue 3: Exact Match Prioritization
Problem: "200g cream" matches "Cheese, cream" from My Foods instead of "Cream" from USDA dataset.
Current behavior: The priority system (My Foods > Dataset > OFF) always prefers higher-priority sources regardless of match quality.
Expected behavior: Exact or near-exact name matches in lower-priority sources should beat partial matches in higher-priority sources.
Example: When searching "cream", "Cream" from USDA should rank higher than "Cheese, cream" from My Foods because "Cream" is an exact match.
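One way to make "exact" versus "partial" concrete is to grade each name match before any source priority is considered. The sketch below is illustrative only: the MatchQuality type, the nameTokens helper, and the rule for what counts as a strong match are assumptions; the numeric values follow the thresholds suggested under Technical Approach.

```swift
import Foundation

// Hypothetical quality levels for a name match.
enum MatchQuality: Double {
    case none = 0.0
    case partial = 0.5
    case strong = 0.9
    case exact = 1.0
}

/// Lowercased word tokens with punctuation stripped.
func nameTokens(_ text: String) -> [String] {
    text.lowercased()
        .split(whereSeparator: { !($0.isLetter || $0.isNumber) })
        .map(String.init)
}

/// Grades how well a food name matches the query.
/// - exact: same words in the same order
/// - strong: same words in a different order (assumed definition)
/// - partial: every query word appears, but the name has extra words
func matchQuality(query: String, foodName: String) -> MatchQuality {
    let queryWords = nameTokens(query)
    let nameWords = nameTokens(foodName)
    guard !queryWords.isEmpty, !nameWords.isEmpty else { return .none }
    if queryWords == nameWords { return .exact }
    if Set(queryWords) == Set(nameWords) { return .strong }
    if Set(queryWords).isSubset(of: Set(nameWords)) { return .partial }
    return .none
}
```

With this grading, "cream" against "Cream" is exact while "cream" against "Cheese, cream" is only partial, which is the distinction the ranking needs.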
Issue 4: Brand Matches Ranked Too High
Problem: Searching "milk skim" matches "Yoghurt, plain" with brand "skim milk" instead of actual skim milk products.
Solution:
- Deprioritize brand matches in isStrongMatch(): brand should only be a tiebreaker, not a primary match criterion
- In relevanceScore(), give lower scores to brand-only matches
- Consider excluding brand from single-word query matching entirely
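A possible shape for brand-as-tiebreaker scoring is sketched below. relevanceScore() is named in this ticket, but the FoodEntry type, the wordSet helper, the signature, and the 0.01 brand weight are all hypothetical and chosen only to show the idea.

```swift
import Foundation

// Hypothetical minimal food record for the example.
struct FoodEntry {
    let name: String
    let brand: String?
}

/// Lowercased set of word tokens with punctuation stripped.
func wordSet(_ text: String) -> Set<String> {
    Set(text.lowercased()
        .split(whereSeparator: { !($0.isLetter || $0.isNumber) })
        .map(String.init))
}

/// Scores by the fraction of query words found in the food name.
/// Brand overlap adds at most 0.01, so it can only break ties between
/// foods whose names match equally well.
func relevanceScore(query: String, food: FoodEntry) -> Double {
    let queryWords = wordSet(query)
    guard !queryWords.isEmpty else { return 0 }
    let nameHits = queryWords.intersection(wordSet(food.name)).count
    let brandHits = queryWords.intersection(wordSet(food.brand ?? "")).count
    let nameScore = Double(nameHits) / Double(queryWords.count)
    let brandBonus = 0.01 * Double(brandHits) / Double(queryWords.count)
    return nameScore + brandBonus
}
```

For the query "milk skim", a food named "Milk, skim" scores 1.0 while "Yoghurt, plain" with brand "skim milk" scores 0.01. Because a single name hit is worth 1/n for an n-word query, the brand bonus stays below any name hit as long as queries are shorter than 100 words.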
Technical Approach
- Add descriptorSynonymGroups dictionary in FoodMatchingUtils.swift
  - Create a data structure mapping each synonym to its canonical form
  - Example: ["unsalted": "no_salt", "without salt": "no_salt", "no salt": "no_salt", "salt free": "no_salt"]
- Add descriptorConflicts dictionary to define mutually exclusive pairs
  - Example: ["no_salt": ["salted"], "salted": ["no_salt"]]
- Create normalizeDescriptor() function to map synonyms to canonical forms
  - Should handle compound descriptors (e.g., "unsalted low fat")
  - Case-insensitive matching
- Create haveMatchingDescriptors() function to detect conflicts
  - Returns false if query and food have conflicting descriptors
- Update isStrongMatch() and relevanceScore() to use synonym matching and conflict detection
  - Normalize both query and candidate descriptors before comparison
  - Treat synonym-matched descriptors as equivalent for scoring purposes
  - Reject matches with conflicting descriptors
- Modify pickBestCandidate() to consider match quality across all sources before applying priority
  - Introduce a match quality threshold (e.g., exact match = 1.0, strong match = 0.9, partial = 0.5)
  - Only apply source priority as a tiebreaker when match quality is similar
  - Proposed logic: finalScore = matchQuality * 100 + sourcePriority (ensures quality dominates)
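The proposed finalScore formula can be sketched as below. pickBestCandidate() is the function named in this ticket; the Candidate and FoodSource types, the signature, and the numeric priorities 0 to 2 are assumptions for illustration. Quality dominates only while the spread of source priorities (here 2) stays smaller than the smallest quality gap times 100 (here 0.1 * 100 = 10).

```swift
import Foundation

// Hypothetical priorities matching "My Foods > Dataset > OFF"; larger wins ties.
enum FoodSource: Int {
    case off = 0
    case dataset = 1
    case myFoods = 2
}

// Hypothetical candidate record; matchQuality is assumed to be precomputed
// (for example exact = 1.0, strong = 0.9, partial = 0.5).
struct Candidate {
    let name: String
    let source: FoodSource
    let matchQuality: Double
}

/// finalScore = matchQuality * 100 + sourcePriority, as proposed in the ticket.
func finalScore(_ candidate: Candidate) -> Double {
    candidate.matchQuality * 100 + Double(candidate.source.rawValue)
}

/// Picks the candidate with the highest final score, or nil for an empty list.
func pickBestCandidate(_ candidates: [Candidate]) -> Candidate? {
    candidates.max(by: { finalScore($0) < finalScore($1) })
}
```

With these numbers, "Cream" from the dataset (exact, 100 + 1 = 101) beats "Cheese, cream" from My Foods (partial, 50 + 2 = 52). If both sources contain an exact "Cream", the My Foods entry wins the tie with 102 against 101.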
Files to Modify
- NutriKit/Voice/FoodMatchingUtils.swift - Main implementation location
- NutriKit/Voice/TextLoggingModel.swift - May need updates if search flow changes affect the model layer
Acceptance Criteria
- Synonym matching: "butter unsalted" matches "butter without salt" with high confidence
- Synonym matching: "unsweetened yogurt" matches "yogurt no sugar"
- Conflict detection: "butter unsalted" does NOT match "Butter, salted"
- Conflict detection: "unsweetened tea" does NOT match "sweetened tea"
- Exact match priority: Searching "cream" returns "Cream" from USDA over "Cheese, cream" from My Foods
- Exact match priority: Searching "chicken" returns "Chicken" over "Chicken salad" when exact match exists
- Brand deprioritization: "milk skim" matches milk products, not yogurt with "skim milk" brand
- Brand deprioritization: Name/detail matches always rank higher than brand-only matches
- Existing exact matches continue to work correctly (no regressions)
- Unit tests added for synonym normalization and conflict detection logic
Related
Follow-up to PXL-826 (Voice/Text Logging food matching fix).
Incorporates requirements from PXL-837 (now canceled as duplicate).
Build instruction: Use -destination 'platform=iOS Simulator,name=iPhone 17 Pro' when building this project.