Our client is developing a metabolite-based diagnostic test to screen for one of the most common cancer types in men and women. After obtaining promising results from a smaller-scale phase 1 clinical study, the client needed machine learning expertise to evaluate a clinical dataset comprising raw biomarker concentrations for approximately 200 case and control subjects. In addition to validating the results of the first study, the client needed fast but robust analysis to identify a small panel of predictive biomarkers that could best drive ML model performance.
Our team customized our biomarker discovery platform, VIVO, to rapidly identify a core set of predictive biomarkers. The first step was to apply a comprehensive set of transformations to the raw concentrations, expanding the feature set to maximize potential signal. We then applied our feature importance ranking algorithms to home in on an initial subset of biomarkers. These features were evaluated across a wide array of ML models, using many iterations of cross-validation to ensure generalizability. Finally, we conducted subpopulation analysis to construct an ensemble model that demonstrated high accuracy while remaining interpretable. Our client is using these insights to plan their larger phase 2 clinical study.
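As a rough illustration of the transform, rank, and cross-validate steps described above, the following Python sketch runs the same kind of workflow on synthetic data. It is not VIVO's implementation: the log transform, the effect-size ranking, the nearest-centroid classifier, the synthetic dataset, and every function name here are assumptions chosen to keep the example small and free of third-party dependencies.

```python
import math
import random
import statistics


def expand_features(row):
    """Append a log1p-transformed copy of each raw concentration (assumed transform)."""
    return list(row) + [math.log1p(v) for v in row]


def rank_features(X, y):
    """Rank feature indices by absolute standardized mean difference (cases vs. controls)."""
    scores = []
    for j in range(len(X[0])):
        cases = [x[j] for x, label in zip(X, y) if label == 1]
        controls = [x[j] for x, label in zip(X, y) if label == 0]
        pooled_sd = math.sqrt(
            (statistics.pvariance(cases) + statistics.pvariance(controls)) / 2
        ) or 1.0  # guard against a constant feature
        diff = abs(statistics.fmean(cases) - statistics.fmean(controls))
        scores.append((diff / pooled_sd, j))
    return [j for _, j in sorted(scores, reverse=True)]


def fit_centroids(X, y, features):
    """Compute the per-class mean of the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, x, features):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(center):
        return sum((x[j] - c) ** 2 for j, c in zip(features, center))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_validate(X, y, k=5, top_n=2, seed=0):
    """k-fold CV; features are re-ranked inside each training fold to avoid leakage."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        features = rank_features(X_train, y_train)[:top_n]
        centroids = fit_centroids(X_train, y_train, features)
        correct += sum(predict(centroids, X[i], features) == y[i] for i in held_out)
    return correct / len(X)


# Synthetic stand-in for the clinical data: 200 subjects, 5 hypothetical biomarkers,
# with only biomarker 0 elevated in cases (label 1).
rng = random.Random(42)
y = [i % 2 for i in range(200)]
raw = []
for label in y:
    values = [rng.lognormvariate(0.0, 0.5) for _ in range(5)]
    if label == 1:
        values[0] *= 3.0
    raw.append(values)

X = [expand_features(row) for row in raw]   # 5 raw + 5 log-transformed features
ranking = rank_features(X, y)               # full-data ranking, for inspection only
accuracy = cross_validate(X, y, k=5, top_n=2)
print("top features:", ranking[:2], "cross-validated accuracy:", round(accuracy, 3))
```

Ranking features inside each training fold, rather than once on the full dataset, keeps the held-out subjects from influencing which biomarkers are selected. That matters for a cohort of roughly 200 subjects, where selection bias can noticeably inflate accuracy estimates. The sketch leaves features unscaled; a real pipeline would typically standardize them using training-fold statistics before any distance-based model.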