Extract Intelligence from the Biomedical Black Hole to Accelerate Discovery

Trying to extract usable insights from the world’s biomedical knowledge can be like trying to escape from a black hole. How can we unlock its potential?


Integrating public and proprietary data into a Knowledge Graph – including knowledge extracted from research texts with Natural Language Processing – is the first step toward realizing its potential.

Just as important is a user experience designed to fit, and augment, researchers’ workflows. We are building our platform ERGO* to support scientists in extracting usable insights to advance discoveries.

The always-accelerating growth of biomedical knowledge has opened the door to a wealth of new insights, drug targets, and therapies.

However, trying to extract usable insights from that massive set of knowledge can be like trying to escape from a black hole, due to the sheer size, complexity, and decentralized nature of biological data. A recent paper describes the challenge well: “The deluge of new papers may deprive reviewers and readers the cognitive slack required to fully recognize and understand novel ideas… These findings suggest that the progress of large scientific fields may be slowed, trapped in existing canon.”

Evaluating potential drug targets requires sifting through research articles, biomedical databases, experimental results, and more.

Key insights might be found only by connecting multiple pieces together, which could take weeks or months of research. And often, you don’t have just one target to evaluate. The cost multiplies if you have a list of promising candidate targets and a number of possible indications, because each target-indication pair needs its own evaluation. Missing out on a novel target is costly; so is pursuing a target someone else has already shown to be unsuccessful.

How do you empower your scientists with the insights they need to unlock new discoveries?

Computers excel at working with structured data, and modern cloud computing has made “big data” so accessible that nobody even uses the term anymore.

The first step to utilizing all of the available biomedical knowledge is to aggregate as many data sets as possible. Research articles from PubMed. Patents. Conference proceedings. Protein information from UniProt, PANTHER, Human Protein Atlas, PDB. Gene data from Ensembl, NCBI, GO. Pathways from Reactome. Drug indications and clinical trial data. These are just a few of the data sets that are freely available. And every drug development team is generating proprietary data sets that need to be integrated within the larger set of publicly available data.

Bringing all of that data together only recreates the black hole; the right tools are needed to extract actionable intelligence in real time. Many of these data sources describe the same underlying concepts or entities (e.g., “genes” or “diseases”). We need to know when different sources are referring to the same entity by alternative names or IDs. Only then can the relationships between them (drug-target interactions, for example) form a single, interconnected knowledge graph. This is more difficult for knowledge locked in text sources like publications, patents, or conference materials. Natural Language Processing (NLP) is needed to extract insights from that unstructured data and to link them correctly to the entities represented in the knowledge graph.
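To make the entity-matching step concrete, here is a minimal sketch in Python. It is not how ERGO works internally; it only illustrates the idea of mapping alternative names and IDs to one canonical entity before combining relationships. The alias table, source names, identifiers, and the helper functions build_lookup and merge_edges are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical alias table: each source-specific name or ID maps to one
# canonical entity ID. All values are illustrative placeholders.
ALIASES = {
    "GENE_A": "gene:1",
    "gene-a": "gene:1",
    "SRC1:0001": "gene:1",
    "DrugX": "drug:1",
    "drug x": "drug:1",
}


def build_lookup(aliases):
    """Normalize alias keys (trim, lowercase) so matching ignores case and padding."""
    return {alias.strip().lower(): cid for alias, cid in aliases.items()}


def merge_edges(records, lookup):
    """Merge (source, subject, relation, object) records into canonical edges.

    Returns a dict mapping each canonical edge to the set of sources that
    reported it, plus a list of records that could not be resolved.
    """
    graph = defaultdict(set)
    unresolved = []
    for source, subject, relation, obj in records:
        s = lookup.get(subject.strip().lower())
        o = lookup.get(obj.strip().lower())
        if s is None or o is None:
            # Keep unknown names for later review instead of guessing.
            unresolved.append((source, subject, relation, obj))
            continue
        graph[(s, relation, o)].add(source)
    return dict(graph), unresolved


# Hypothetical records: two sources describe the same interaction
# under different names, and a third record uses an unknown name.
RECORDS = [
    ("source_1", "DrugX", "targets", "GENE_A"),
    ("source_2", "drug x", "targets", "SRC1:0001"),
    ("source_2", "DrugY", "targets", "GENE_A"),
]

GRAPH, UNRESOLVED = merge_edges(RECORDS, build_lookup(ALIASES))
```

In this sketch the first two records collapse into a single edge supported by both sources, while the record with the unknown name is set aside rather than linked to the wrong entity. A real pipeline would draw its aliases from curated identifier mappings and would need NLP to find the mentions in text in the first place.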

A robust, comprehensive knowledge graph is necessary, but it is not sufficient on its own.

To truly accelerate discovery, scientists need to be able to access and explore the knowledge graph in a way that fits their research workflow. They can’t put their work on hold while they learn a new language to write graph queries. On the other hand, if we build a beautiful, intuitive user interface (UI) on top of the graph, it is of little use if it is less functional than the researcher’s existing workflow. The user experience (UX) is absolutely critical. Unfortunately, most attempts to design a single solution that will be useful for researchers in general miss in one direction or the other. The problem is that, although many of the questions being asked are similar, the workflows can be very different between academia and industry, between startups and big pharma, between preclinical and clinical, etc.

If we can’t build one solution that will help everyone, and building a bespoke solution from scratch for each use case is impractical, what can we do? The approach we are taking at Mercury Data Science with our ERGO platform is to build a flexible, modular core framework, with an extensible user experience that can be customized for different workflows and domains. We have built a scalable architecture to automatically ingest and refresh data from many sources into one normalized knowledge graph, including Natural Language Processing of biomedical texts. This knowledge network creates the foundation for predictive analytics to find new connections. And our team’s deep domain knowledge of computational biology, experience creating sophisticated data visualizations, and UX/UI design training help us work with individual customers – and the actual scientists whose workflows we are trying to enhance – to design interfaces that are beautiful, intuitive, and functional for the use case at hand.

Bottom Line

Scientists don’t need AI software that tries to do their job for them. They need tools to accelerate their workflows and insights. We are building ERGO to meet this need.

Connect with us

We build data science applications to support drug development, digital & molecular biomarker discovery, and digital health. Our team works at the intersection of biology and technology to accelerate innovation. If you have an AI/ML-related question or would like to discuss your AI strategy, we’d love to hear from you! Reach out today at inquire@mercuryds.com, on Twitter @mercurydatasci, or on LinkedIn.

*The ergosphere is an area just outside the event horizon of a rotating black hole, where it’s possible to extract usable energy from the black hole itself.

Published on: October 6, 2022