Our client, a grocery and retail pricing platform, collects billions of pricing records from hundreds of online and brick-and-mortar retailers to provide sales and pricing insights and forecasts to consumer packaged goods (CPG) brands. The client needed a way to automatically classify and organize products and, in particular, to identify similar products even when different retailers name or describe them differently.
We trained a deep learning natural language processing (NLP) model on 2.5 TB of text, including product names, descriptions, and store categories for 8.5 million products, and used word vectors to auto-generate a taxonomy. We then translated the generated taxonomy into a hierarchical taxonomy with a unified naming convention. The client's software engineering team has incorporated the taxonomy and naming convention into their platform, and has deployed data engineering pipelines that pre-process text for input into the deep learning model so that new products are classified automatically.
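The write-up does not say exactly how similar products were matched. One common way to use word vectors for this, shown here as a minimal sketch rather than the client's actual pipeline, is to represent each product by the average of its word vectors and compare products with cosine similarity. The tiny `WORD_VECTORS` table and the helpers `embed` and `most_similar` are hypothetical; a real system would use embeddings learned from the product text (names, descriptions, and store categories) instead of hand-set three-dimensional values.

```python
import math
import re

# Hypothetical toy word vectors (3 dimensions) standing in for learned embeddings.
WORD_VECTORS = {
    "cola": [0.9, 0.1, 0.0],
    "soda": [0.85, 0.15, 0.05],
    "soft": [0.7, 0.2, 0.1],
    "drink": [0.8, 0.1, 0.1],
    "diet": [0.6, 0.3, 0.0],
    "shampoo": [0.0, 0.1, 0.95],
    "hair": [0.05, 0.2, 0.9],
    "care": [0.1, 0.3, 0.8],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Average the vectors of known tokens; return None if no token is known."""
    vectors = [WORD_VECTORS[t] for t in tokenize(text) if t in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, catalog):
    """Return the catalog entry whose averaged vector is closest to the query's."""
    q = embed(query)
    if q is None:
        return None
    scored = []
    for name in catalog:
        v = embed(name)
        if v is not None:
            scored.append((cosine(q, v), name))
    return max(scored)[1] if scored else None
```

For example, `most_similar("Diet Cola 12-pack", ["Soft Drink, Soda", "Hair Care Shampoo"])` returns `"Soft Drink, Soda"`: the two listings share no words, but their word vectors point in nearly the same direction. Averaging is a deliberately simple choice; it ignores word order, which is one reason a trained model is used for the full classification task.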