Case Study

Automate data normalization for public real estate data

Use data science techniques to auto-normalize and unify data for an organization focused on showing the impact of governmental policy decisions on real estate values

Challenge

Our client focuses on showing the impact of governmental policy decisions on real estate values. Their business model depends on real estate transaction data contained in deeds, which come from county- and town-level recorders, each with its own data format. To support the growth and scalability of their business, our client needed an automated process to ingest data in these varying formats and normalize it into a single unified format.

Solution

Our team built data engineering pipelines to ingest semi-structured, heterogeneous data files, and trained machine learning models to infer data types and auto-normalize the data to a standard data model. The models were trained to classify data based on morphology, string matching, and the distribution of the data. To assign a data type, we identified the most similar distribution among past data. We worked with the client's engineering team to incorporate the data engineering pipelines and automated normalization algorithms into their platform.
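The idea of combining morphology, string matching, and distribution similarity can be illustrated with a minimal Python sketch. This is not the client's implementation, and it is a hand-written scoring heuristic rather than a trained model. The field names, sample values, the header weight of 0.3, and the choice of total variation distance as the similarity measure are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def shape(value: str) -> str:
    """Morphology: map digits to '9' and letters to 'A', keep other characters."""
    out = []
    for ch in value.strip():
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def shape_distribution(values):
    """Relative frequency of each shape found in a column of strings."""
    counts = Counter(shape(v) for v in values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: n / total for s, n in counts.items()}


def distribution_similarity(p, q):
    """1 minus total variation distance (an assumed measure); 1.0 means identical."""
    keys = set(p) | set(q)
    return 1.0 - 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def header_similarity(header, canonical):
    """String matching between an incoming header and a canonical field name."""
    return SequenceMatcher(None, header.lower(), canonical.lower()).ratio()


def infer_field(header, values, past_columns, header_weight=0.3):
    """Score each known field and return the best match plus all scores.

    past_columns maps a canonical field name to previously seen values.
    header_weight is a hypothetical tuning knob, not a value from the source.
    """
    new_dist = shape_distribution(values)
    scores = {}
    for field, past_values in past_columns.items():
        dist_score = distribution_similarity(new_dist, shape_distribution(past_values))
        name_score = header_similarity(header, field)
        scores[field] = (1 - header_weight) * dist_score + header_weight * name_score
    return max(scores, key=scores.get), scores


# Hypothetical "past data" for three fields of a standard data model.
PAST = {
    "sale_date": ["2019-03-14", "2020-11-02", "2018-07-30"],
    "sale_price": ["250000", "412500", "189900"],
    "parcel_id": ["12-345-678", "98-765-432", "11-222-333"],
}

best, scores = infer_field("Date of Sale", ["2021-01-05", "2021-02-17"], PAST)
print(best)  # sale_date
```

Exact shape signatures are brittle for values of varying length, such as prices with different digit counts. A production system would likely use richer features and a trained classifier, as the case study describes.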
