AI/ML for Health Tech and Medical Devices: Invest Early in Data Science Infrastructure to Create Competitive Advantage

Harnessing the power of AI/ML has the potential to transform health tech and medical device companies by providing better outcomes, better patient and provider satisfaction and engagement, and better insight into product performance and real world evidence.

Increasingly, health tech and medical device companies are building AI/ML-driven applications. Medical devices are being “connected” to build combination products. Digital health is expanding into regulated applications that qualify as Software as a Medical Device (SaMD).

Health tech and medical device companies starting on this journey often hire a data scientist to build a proof-of-concept model. Once the company commits to this path, we believe that it pays substantial dividends to plan and implement a scalable data science infrastructure early in the process. Unfortunately, most health tech and medical device companies and their data science hires don’t start with the cloud, DevOps, or software architecture background needed to do this easily. Skipping this planning can lead to substantial technical debt (backtracking) later, along with serious costs and inefficiencies in a company’s data science mission in both the short and long term.

AI/ML requires more ongoing engineering than you might think

As the quality assurance aphorism goes, “Quality is a journey, not a destination.” Likewise, an investment in AI/ML, or even in simple analytics, involves more than producing a model once, like some final equation to be written into code and forgotten.

Model performance can unexpectedly degrade with exposure to unplanned patient populations, shifting treatment paradigms, changed use cases, or underlying shifts in data streams such as EHR information, patient inputs, and sensors on a cell phone. On the other hand, AI/ML model performance often improves over time as a company develops a larger data set. The lifecycle of build, train, test, deploy, and monitor is continuous, and if the infrastructure is not well designed, a company loses time and spends more money just to keep that cycle going.
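To make the "monitor" step concrete, here is a minimal sketch of one common way to flag a shift in an input data stream: the Population Stability Index (PSI), which compares the distribution of a feature at training time with its distribution in production. This is an illustration under assumptions, not a description of any particular product: the function name, the ten equal-width bins, and the 0.25 alert threshold (a commonly cited rule of thumb) are all choices made for this example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,      # assumption: equal-width bins over the baseline range
    eps: float = 1e-6,     # floor for empty bins so the logarithm stays defined
) -> float:
    """Return the PSI of one numeric feature between two samples."""
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline sample has no spread; cannot build bins")
    width = (hi - lo) / n_bins

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]

    base = bin_fractions(baseline)
    cur = bin_fractions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, cur))


# Hypothetical data: a feature seen at training time vs. the same feature
# after the patient population has shifted.
training_values = [i / 100 for i in range(100)]
production_values = [v + 0.5 for v in training_values]

ALERT_THRESHOLD = 0.25  # assumption: rule-of-thumb cutoff, tune per feature
psi = population_stability_index(training_values, production_values)
if psi > ALERT_THRESHOLD:
    print(f"PSI={psi:.2f}: input distribution has shifted, review the model")
```

A check like this only detects that inputs have changed; whether model accuracy has actually degraded still has to be confirmed against labeled outcomes when they become available.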

What does a well-designed data science infrastructure look like?

It should:

  • Abstract away as much of the data engineering complexity as possible (while still providing the future capabilities that will be needed as the AI/ML initiative grows)
  • Automate data pipelines for cleaning, feature extraction and AI/ML
  • Track data transformations, model changes, and experiments so that data scientists can collaborate and understand how any given model was developed
  • Run on any cloud and keep computation costs low as the company scales
  • Allow AI/ML models to be developed, tested, deployed, and maintained efficiently
  • Provide for future needs like model monitoring, feature stores, etc.

There is more (there is always more), but just thinking through these features will lead to a better design.
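As an illustration of the tracking item in the list above, the sketch below records, for each training run, a fingerprint of the input data, the ordered transformation steps, a fingerprint of the model parameters, and the resulting metrics. It uses only the Python standard library; the `RunRecord` and `record_run` names and the record layout are hypothetical, and a real team would more likely adopt an existing experiment-tracking tool than write its own.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Dict, Sequence, Tuple


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact bytes given."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class RunRecord:
    """One row of lineage: which data, which steps, which parameters."""
    dataset_sha256: str
    transform_steps: Tuple[str, ...]
    model_params_sha256: str
    metrics: Dict[str, float]
    created_at: str


def record_run(
    dataset_bytes: bytes,
    transform_steps: Sequence[str],
    model_params: Dict[str, object],
    metrics: Dict[str, float],
) -> RunRecord:
    # Sorting keys makes the parameter fingerprint independent of dict order.
    params_blob = json.dumps(model_params, sort_keys=True).encode("utf-8")
    return RunRecord(
        dataset_sha256=fingerprint(dataset_bytes),
        transform_steps=tuple(transform_steps),
        model_params_sha256=fingerprint(params_blob),
        metrics=dict(metrics),
        created_at=datetime.now(timezone.utc).isoformat(),
    )


# Hypothetical usage: the data, step names, and metric value are made up.
run = record_run(
    dataset_bytes=b"patient_id,hr\n1,72\n2,88\n",
    transform_steps=["drop_nulls", "scale_hr"],
    model_params={"learning_rate": 0.1, "max_depth": 3},
    metrics={"auc": 0.9},
)
log_line = json.dumps(asdict(run))  # one JSON line per run, ready to append to a log
```

Because the fingerprints change whenever the data or parameters change, two records with matching hashes indicate that the same inputs produced both results, which is the kind of traceability that helps collaborators reconstruct how a given model was built.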

What does a well-designed data science infrastructure get you?

Our view is that putting the right infrastructure in place early in the game results in:

  1. Lower Cost of Cloud/Data/ML Engineering
    The biggest mistake we see at companies undergoing a data science transformation is having the data scientists build the infrastructure for deployment and maintenance. Very few data scientists are trained to build scalable, maintainable infrastructure, and once your company is committed to a set of tools, it is very hard to back up and start over. Technical debt slows down innovation and often means more spend on engineering staff in the future.
  2. Alignment with the Regulatory Process
    For regulated applications, traceability of data and model versions allows model testing and improvement and aligns with the FDA-mandated design process for testing, verification, and record keeping.
  3. Faster Innovation
    As a consulting firm, we use our own “click to deploy” architecture to build, test, and deploy models. It allows rapid creation of an environment and automation of many processes, shared by the entire data science team without engineering overhead. That, along with the benefits of collaboration and retention of knowledge, allows a level of excellence that you can’t get with an ad hoc data science infrastructure.
  4. Collaboration and Coordination
    As the data science team grows, a well-planned infrastructure lets the team get its best results into production more efficiently.
  5. Retention of Institutional Knowledge
    For the same reasons that collaboration is easier, new data scientists can come up to speed faster. This is critical given the fight for data science talent; we have seen life sciences companies lose their entire data science team all at once. Worse, we have seen companies that have lost critical data because the infrastructure wasn’t well defined and secure.

“How many data engineers does it take to support a data scientist?”

The tech industry consensus is that a business needs 2-5 data engineers per data scientist, both to maintain deployed models and to maintain the automated infrastructure that keeps data science efforts efficient. We don’t think this is always the case, but our experience suggests that data science initiatives always require more engineering work in the long run than companies anticipate at the outset.

Data engineers (or ML engineers or cloud engineers) are not cheap: think at least $250,000 per year each after overhead, recruiting fees, and misfires (hiring the wrong person). Design your system correctly up front and you reduce the risk of future technical debt and need fewer engineering resources to maintain your competitive edge.
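To see what those figures imply, the short calculation below multiplies the 2-5 engineers-per-scientist range by the $250,000 per-engineer figure quoted above. The team size of three data scientists is a hypothetical chosen only for illustration, and the function name is ours.

```python
COST_PER_ENGINEER_USD = 250_000  # the "at least" annual figure quoted in the text


def annual_engineering_cost(
    n_data_scientists: int,
    engineers_per_scientist: float,
    cost_per_engineer: int = COST_PER_ENGINEER_USD,
) -> float:
    """Yearly cost of the engineers needed to support a data science team."""
    return n_data_scientists * engineers_per_scientist * cost_per_engineer


TEAM_SIZE = 3  # hypothetical number of data scientists

for ratio in (2, 5):  # low and high ends of the quoted 2-5 range
    cost = annual_engineering_cost(TEAM_SIZE, ratio)
    print(f"{ratio} engineers per scientist: ${cost:,.0f} per year")
# 2 engineers per scientist: $1,500,000 per year
# 5 engineers per scientist: $3,750,000 per year
```

Even at the low end of the range, the supporting engineering cost for a small team runs well into seven figures per year, which is why reducing the engineering load through up-front design matters.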

What if your company is just doing “Analytics” and not really “AI”?

Anything beyond very simple data analytics can benefit from many of the same build-and-maintain processes that complex machine learning models require, so it’s worth thinking about future needs. Once a company commits to being data forward, both the amount of data and the demand for increasingly sophisticated solutions only seem to grow.

Bottom line

We encourage CEOs to look at the benefits of a well-designed data science infrastructure early in their AI/ML journey. We believe that investors and customers reward companies that are committed to building better products using data science and that a well-designed data science infrastructure leads to a real competitive advantage.

Connect with us

We build data science applications to support drug development, digital & molecular biomarker discovery, and digital health. Our team works at the intersection of biology and technology to accelerate innovation. If you have an AI/ML-related question or would like to discuss your AI strategy, we’d love to hear from you! Reach out today on Twitter @mercurydatasci or on LinkedIn.

Published on:
February 8, 2022