Case Study

Automate taxonomy construction and product classification

Develop an algorithm to transform 2.3 TB of text for ~8.5 million unique products into a hierarchical taxonomy and unified naming convention

Challenge

Our client, a grocery and retail pricing platform, collects billions of pricing records from hundreds of online and brick-and-mortar retailers to provide sales and pricing insights and forecasts to consumer packaged goods (CPG) brands. The client needed a way to classify and organize products automatically and, in particular, to identify similar products even when they are named or described differently.

Solution

We trained a deep learning natural language processing (NLP) model on 2.5 TB of text, including product names, descriptions, and store categories for 8.5 million products, and used word vectors to auto-generate a taxonomy. We then translated that output into a hierarchical taxonomy and a unified naming convention. The client's software engineering team has incorporated both into their platform and has deployed data engineering pipelines that pre-process text for input into the deep learning model, which classifies new products.
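
To illustrate the general idea of grouping products by word vectors, here is a minimal Python sketch. It is not the model or pipeline used in this project: the three-dimensional vectors in `WORD_VECTORS` are hand-written toy values, and the function names (`embed`, `cosine`, `group_products`) and the `threshold` parameter are hypothetical. A production system would learn vectors from the product text and use a far more capable model.

```python
import math

# Hypothetical toy word vectors (assumption: a real system learns these from text).
WORD_VECTORS = {
    "milk": (1.0, 0.1, 0.0),
    "whole": (0.9, 0.2, 0.0),
    "dairy": (1.0, 0.0, 0.1),
    "cola": (0.0, 1.0, 0.1),
    "soda": (0.1, 1.0, 0.0),
    "drink": (0.2, 0.9, 0.0),
    "soap": (0.0, 0.1, 1.0),
    "bar": (0.1, 0.0, 0.9),
}


def embed(text):
    """Average the vectors of the known words in a product's text."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        raise ValueError(f"no known words in {text!r}")
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_products(products, threshold=0.8):
    """Greedy grouping: add a product to the first group containing a
    sufficiently similar member, otherwise start a new group."""
    vectors = {name: embed(name) for name in products}
    groups = []
    for name in products:
        for group in groups:
            if any(cosine(vectors[name], vectors[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    names = ["Whole Milk", "Dairy Milk", "Cola Soda", "Soda Drink", "Soap Bar"]
    for group in group_products(names):
        print(group)
```

In this toy example, differently named products such as "Whole Milk" and "Dairy Milk" land in the same group because their averaged vectors point in nearly the same direction. Building a hierarchy would require repeating the grouping at several levels of granularity, which this sketch does not attempt.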
