Our client has a platform that allows providers to share in-network and out-of-network pricing for laboratory tests with patients at the point of care. Under the new Transparency in Coverage rule, payors are now required to publish cost information in machine-readable JSON files.

Our client needs to incorporate the data from these files into their platform so that it reflects the most up-to-date cost and coverage information for each plan, geography, provider, and CPT code. The primary challenge is engineering a data processing pipeline that can efficiently process and refresh hundreds of terabytes of data, much of it stored in very large files that each consist of a single line of JSON.
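
A file that is one enormous line cannot be read line by line, so one way to avoid loading it whole is to read fixed-size chunks and decode one array element at a time. The following is a minimal sketch using only Python's standard library, not the pipeline we built. It assumes the rates sit under a top-level `in_network` key (as in the CMS in-network rate schema), that each array element is a JSON object, and that the quoted key does not appear inside an earlier string value. The name `iter_array_items` is hypothetical.

```python
import io
import json
from typing import Iterator, TextIO


def iter_array_items(stream: TextIO, key: str = "in_network",
                     chunk_size: int = 1 << 20) -> Iterator[dict]:
    """Yield each element of the array stored under `key`, reading `stream`
    in chunks so the whole single-line file never sits in memory at once."""
    decoder = json.JSONDecoder()
    marker = f'"{key}"'
    buf = ""

    def fill() -> bool:
        """Append the next chunk to the buffer; return False at end of input."""
        nonlocal buf
        chunk = stream.read(chunk_size)
        buf += chunk
        return bool(chunk)

    # Phase 1: scan for the key. Naive text search; assumes the quoted key
    # does not also occur inside an earlier string value.
    while (pos := buf.find(marker)) == -1:
        buf = buf[-len(marker):]  # keep enough text to catch a marker split across chunks
        if not fill():
            return  # key not present in this file
    buf = buf[pos + len(marker):]

    # Phase 2: step over the ':' and any whitespace to reach the opening '['.
    idx = 0
    while True:
        while idx < len(buf) and buf[idx] in " \t\r\n:":
            idx += 1
        if idx < len(buf):
            break
        if not fill():
            return
    if buf[idx] != "[":
        raise ValueError(f"expected '[' after {marker}")
    idx += 1

    # Phase 3: decode one element at a time. Elements are assumed to be
    # objects, so raw_decode fails (rather than half-succeeds) on a partial one.
    while True:
        while idx < len(buf) and buf[idx] in " \t\r\n,":
            idx += 1
        if idx >= len(buf):
            if not fill():
                raise ValueError("unexpected end of input inside array")
            continue
        if buf[idx] == "]":
            return
        try:
            item, end = decoder.raw_decode(buf, idx)
        except json.JSONDecodeError:
            if not fill():  # element is incomplete: read more, or fail at EOF
                raise
            continue
        yield item
        buf, idx = buf[end:], 0  # drop consumed text to keep memory bounded


if __name__ == "__main__":
    sample = '{"plan_id": "P1", "in_network": [{"billing_code": "80053"}, {"billing_code": "85025"}]}'
    for entry in iter_array_items(io.StringIO(sample), chunk_size=8):
        print(entry["billing_code"])
```

Because the consumed text is discarded after each element, peak memory is roughly one element plus one chunk rather than the size of the file.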


We designed a unified data model in Snowflake to represent the taxonomy of the payor data. We developed a data ingest pipeline that uses cloud computing resources to preprocess the files in parallel, then loads the results into the data warehouse.
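
To illustrate what flattening the nested payor files into a single model can look like, the sketch below expands each `in_network` entry into rows of one hypothetical staging table (one row per provider NPI and negotiated price) and spreads the per-file work across processes with `concurrent.futures.ProcessPoolExecutor`. It is an assumption-laden sketch, not the client's Snowflake model: `RateRow`, `flatten_item`, `preprocess_file`, and `preprocess_all` are hypothetical names; the field names follow the CMS in-network schema as we understand it; only inline `provider_groups` are handled; and `plan_id` is read from the file's top level, which not every schema version provides.

```python
import csv
import json
from concurrent.futures import ProcessPoolExecutor
from dataclasses import astuple, dataclass, fields
from pathlib import Path
from typing import Iterable, Iterator


@dataclass(frozen=True)
class RateRow:
    """One row of a hypothetical flattened NEGOTIATED_RATE staging table."""
    plan_id: str
    billing_code_type: str   # e.g. "CPT"
    billing_code: str        # e.g. "80053"
    npi: int                 # provider National Provider Identifier
    tin: str                 # provider Tax Identification Number
    negotiated_type: str     # e.g. "negotiated", "fee schedule"
    negotiated_rate: float
    billing_class: str       # "professional" or "institutional"


def flatten_item(plan_id: str, item: dict) -> Iterator[RateRow]:
    """Expand one nested in_network entry into one row per NPI and price."""
    for rate in item.get("negotiated_rates", []):
        # Only inline provider_groups are handled; resolving provider_references
        # against a top-level provider list is omitted in this sketch.
        groups = rate.get("provider_groups", [])
        for price in rate.get("negotiated_prices", []):
            for group in groups:
                tin = group.get("tin", {}).get("value", "")
                for npi in group.get("npi", []):
                    yield RateRow(
                        plan_id=plan_id,
                        billing_code_type=item.get("billing_code_type", ""),
                        billing_code=item["billing_code"],
                        npi=int(npi),
                        tin=tin,
                        negotiated_type=price.get("negotiated_type", ""),
                        negotiated_rate=float(price["negotiated_rate"]),
                        billing_class=price.get("billing_class", ""),
                    )


def preprocess_file(src: str, out_dir: str) -> str:
    """Flatten one payor file into a CSV that a warehouse bulk load can ingest."""
    # json.load keeps the sketch short; very large one-line files would need
    # an incremental parser instead.
    with open(src, encoding="utf-8") as f:
        doc = json.load(f)
    plan_id = str(doc.get("plan_id", ""))
    dst = Path(out_dir) / (Path(src).stem + ".csv")
    with open(dst, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow([f.name for f in fields(RateRow)])
        for item in doc.get("in_network", []):
            writer.writerows(astuple(row) for row in flatten_item(plan_id, item))
    return str(dst)


def preprocess_all(paths: Iterable[str], out_dir: str, workers: int = 8) -> list[str]:
    """Process files in parallel; JSON parsing is CPU-bound, so use processes.
    On platforms that spawn workers, call this under `if __name__ == "__main__":`."""
    paths = list(paths)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess_file, paths, [out_dir] * len(paths)))
```

The resulting CSV files could then be staged and bulk-loaded into the warehouse (in Snowflake, typically with `COPY INTO`). Splitting the work by file keeps workers independent, so throughput scales with the cores or machines available.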

We then developed optimized queries and views to retrieve this information. The pipeline gives the client access to valuable healthcare cost data that they can pass on to their customers, enabling healthcare consumers to make informed decisions about their choice of provider and to avoid unexpected medical bills.
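
As an example of the kind of view and lookup query involved, the sketch below uses Python's built-in `sqlite3` as a local stand-in for Snowflake so that it runs anywhere. The table `negotiated_rate`, the view `in_network_price`, and the helper functions are hypothetical, and collapsing duplicate rows to the lowest rate is an illustrative choice rather than a rule from the payor data.

```python
import sqlite3
from typing import List, Optional, Tuple

# sqlite3 stands in for Snowflake so the sketch runs locally; the table,
# index, and view names are hypothetical.
SCHEMA = """
CREATE TABLE negotiated_rate (
    plan_id         TEXT,
    billing_code    TEXT,     -- e.g. a CPT code such as 80053
    npi             INTEGER,  -- provider National Provider Identifier
    negotiated_type TEXT,
    negotiated_rate REAL
);
-- Cover the columns the point-of-care lookup filters on.
CREATE INDEX idx_rate_lookup ON negotiated_rate (plan_id, billing_code, npi);
-- Collapse duplicate rows to one price per plan, code, and provider
-- (taking the lowest is an illustrative choice).
CREATE VIEW in_network_price AS
SELECT plan_id, billing_code, npi, MIN(negotiated_rate) AS rate
FROM negotiated_rate
GROUP BY plan_id, billing_code, npi;
"""


def lookup_price(conn: sqlite3.Connection, plan_id: str,
                 billing_code: str, npi: int) -> Optional[float]:
    """Return the in-network price for one plan, code, and provider, if any."""
    row = conn.execute(
        "SELECT rate FROM in_network_price "
        "WHERE plan_id = ? AND billing_code = ? AND npi = ?",
        (plan_id, billing_code, npi),
    ).fetchone()
    return None if row is None else row[0]


def cheapest_providers(conn: sqlite3.Connection, plan_id: str,
                       billing_code: str, limit: int = 5) -> List[Tuple[int, float]]:
    """List the lowest-priced providers for one code under one plan."""
    return conn.execute(
        "SELECT npi, rate FROM in_network_price "
        "WHERE plan_id = ? AND billing_code = ? ORDER BY rate LIMIT ?",
        (plan_id, billing_code, limit),
    ).fetchall()
```

A point-of-care price check then reduces to a single filtered query such as `lookup_price(conn, "P1", "80053", 1111111111)`. Snowflake does not use conventional indexes on standard tables, so clustering keys would play the role of `idx_rate_lookup` there.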

