
Case Study

Automatically discover themes from 6 million new web articles published daily

Develop specialized natural language processing (NLP) models to extract and organize newly published web content for a digital PR firm

Challenge

Our client, a digital PR firm, needed to automatically discover the concepts in articles newly published to the web each day, so that it could share them with its customers and drive real-time PR campaigns. The roughly 6 million articles published each day also had to be processed within 1 hour.

Solution

We built data engineering pipelines that preprocess the raw article text by lemmatizing it and removing stop words. Our team then developed a specialized natural language processing (NLP) model that processes, classifies, and clusters the articles by their primary purpose and content, which let us extract the most common themes across each day's full set of new articles. Finally, we optimized and parallelized the pipeline so that it processes all 6 million daily articles within 1 hour.
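The preprocessing and theme-extraction steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the client's model: the `STOP_WORDS` list and the suffix-stripping `crude_lemma` function are hypothetical stand-ins for a full stop word list and a dictionary-based lemmatizer (such as those in NLTK or spaCy), and "themes" are approximated here by counting how many articles contain each lemma, a much simpler technique than the specialized classification and clustering model. Because each article is processed independently, the per-article work can be spread across processes with the standard library's `ProcessPoolExecutor`.

```python
import re
from collections import Counter
from collections.abc import Iterable
from concurrent.futures import ProcessPoolExecutor

# Hypothetical, deliberately tiny stop word list. A production pipeline
# would use a complete list (for example, the one shipped with NLTK or spaCy).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "their", "to", "was", "with",
})

TOKEN_RE = re.compile(r"[a-z]+")


def crude_lemma(word: str) -> str:
    """Hypothetical suffix-stripping stand-in for a real lemmatizer."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"                 # "companies" -> "company"
    if len(word) > 4 and word.endswith(("ches", "shes", "xes", "sses")):
        return word[:-2]                       # "launches" -> "launch"
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]                       # "articles" -> "article"
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase and tokenize the text, drop stop words, then lemmatize."""
    tokens = TOKEN_RE.findall(text.lower())
    return [crude_lemma(t) for t in tokens if t not in STOP_WORDS]


def doc_terms(text: str) -> set[str]:
    """Unique lemmas in one article, so each article counts a term once."""
    return set(preprocess(text))


def common_themes(articles: Iterable[str], top_k: int = 5, workers: int = 1):
    """Return the top_k lemmas ranked by how many articles contain them."""
    counts: Counter[str] = Counter()
    if workers > 1:
        # Articles are independent, so per-article work runs in separate
        # processes; a large chunksize amortizes inter-process overhead.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            for terms in pool.map(doc_terms, articles, chunksize=1000):
                counts.update(terms)
    else:
        for terms in map(doc_terms, articles):
            counts.update(terms)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    sample = [
        "Companies announce new AI launches",
        "The company launches an AI product",
        "Analysts review AI companies and their products",
    ]
    print(common_themes(sample, top_k=3))
    # prints [('ai', 3), ('company', 3), ('launch', 2)]
```

Counting each lemma at most once per article keeps a single long article from dominating the ranking. For scale, 6 million articles in 1 hour works out to roughly 1,700 articles per second, which is the motivation in this sketch for keeping per-article work independent and batching it across worker processes.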
