Our client, a food product data transparency platform, needed to automate its pipeline for ingesting and quality-checking food product data (ingredients, brand names, and nutrition facts) so that product claims could be verified against the listed ingredients. The data was held in more than 2 million product label images, and the client was receiving data on more than 15,000 products per week. The client had OCR algorithms to parse the product label images into text.
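To make the idea of checking claims against ingredients concrete, here is a minimal rule-based sketch. It is an illustration only, not the client's method: the rule table `CONFLICTING_INGREDIENTS`, the function `find_claim_conflicts`, and the sample label text are all hypothetical, and the input is assumed to be ingredient text already produced by OCR.

```python
import re

# Hypothetical rule table (an assumption for illustration):
# each claim maps to ingredient words that would contradict it.
CONFLICTING_INGREDIENTS = {
    "vegan": {"milk", "egg", "honey", "gelatin", "whey"},
    "gluten-free": {"wheat", "barley", "rye"},
    "dairy-free": {"milk", "whey", "butter", "cream"},
}


def find_claim_conflicts(claims, ingredients_text):
    """Return {claim: sorted conflicting words} for contradicted claims."""
    # Lowercase the OCR text and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", ingredients_text.lower()))
    conflicts = {}
    for claim in claims:
        # Claims without a rule are skipped rather than flagged.
        hits = CONFLICTING_INGREDIENTS.get(claim.lower(), set()) & words
        if hits:
            conflicts[claim] = sorted(hits)
    return conflicts


# Made-up example label, not real product data.
flagged = find_claim_conflicts(
    ["vegan", "gluten-free"],
    "Water, wheat flour, whey powder, salt",
)
```

Exact word matching like this misses plurals, compound words, and OCR misreads, so a check of this kind would at most be a first pass before more robust matching.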
Our team trained a specialized deep learning model for natural language processing to classify and cluster 250,000+ unique products into 2,000 categories spanning aisle, shelf, and food type. We partnered with the client's engineering team to integrate the data engineering pipelines and the classification and clustering algorithms into their internal platform.
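As a rough illustration of assigning product text to categories, the sketch below uses a bag-of-words nearest-centroid classifier. This is a deliberately simple stand-in, not the deep learning model described above; the category names, the training examples, and the functions `train_centroids` and `classify` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train_centroids(labeled_examples):
    """Sum word counts per category to form one count vector each."""
    centroids = defaultdict(Counter)
    for text, category in labeled_examples:
        centroids[category].update(tokenize(text))
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(text, centroids):
    """Return the category whose centroid is most similar to the text."""
    vec = Counter(tokenize(text))
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


# Made-up training examples with hypothetical aisle/food-type labels.
EXAMPLES = [
    ("organic whole milk", "dairy/milk"),
    ("low fat milk", "dairy/milk"),
    ("greek yogurt plain", "dairy/yogurt"),
    ("sparkling water lemon", "beverages/water"),
]

centroids = train_centroids(EXAMPLES)
predicted = classify("whole milk gallon", centroids)
```

A learned text model would replace the word-count vectors with learned representations, but the surrounding flow (product text in, category label out) would look similar.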