Integrating public and proprietary data into a Knowledge Graph – including knowledge extracted from research texts with Natural Language Processing – is the first step toward realizing its potential.
Just as important is a user experience designed to fit, and augment, researchers’ workflows. We are building our platform ERGO* to support scientists in extracting usable insights to advance discoveries.
However, trying to extract usable insights from that massive set of knowledge can be like trying to escape from a black hole, due to the sheer size, complexity, and decentralized nature of biological data. A recent paper describes the challenge well: “The deluge of new papers may deprive reviewers and readers the cognitive slack required to fully recognize and understand novel ideas… These findings suggest that the progress of large scientific fields may be slowed, trapped in existing canon.”
Key insights might be found only by connecting multiple pieces together, which could take weeks or months of research. And often, you don’t have just one target to evaluate. The cost multiplies if you have a list of promising candidate targets and a number of possible indications, because every target-indication pair may need its own review. Missing out on a novel target is costly; so is pursuing a target that someone else has already shown to be unsuccessful.
The first step to utilizing all of the available biomedical knowledge is to aggregate as many data sets as possible. Research articles from PubMed. Patents. Conference proceedings. Protein information from UniProt, PANTHER, Human Protein Atlas, PDB. Gene data from Ensembl, NCBI, GO. Pathways from Reactome. Drug indications and clinical trial data. These are just a few of the data sets that are freely available. And every drug development team is generating proprietary data sets that need to be integrated with the larger set of publicly available data.
Bringing all of that data together only recreates the black hole; the right tools are needed to extract actionable intelligence in real time. Many of these data sources describe the same underlying concepts or entities (e.g., “genes” or “diseases”). We need to know when different sources are referring to the same entity by alternative names or IDs. Only then can the relationships between them (drug-target interactions, for example) form a single, interconnected knowledge graph. This is more difficult for knowledge held in text sources like publications, patents, or conference materials. Natural Language Processing (NLP) is needed to extract insights from unstructured data and to link them correctly to the entities represented in the knowledge graph.
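As a rough illustration of that normalization step, here is a minimal Python sketch. It assumes a hand-built alias table; the entity IDs, names, and source labels are hypothetical placeholders rather than real database records, and this is not a description of how ERGO itself resolves entities.

```python
from collections import defaultdict

# Hypothetical alias table: every name or ID a source might use -> one canonical ID.
# All identifiers here are illustrative placeholders, not real database records.
ALIASES = {
    "GENE_A": "gene:0001",
    "gene-a": "gene:0001",
    "SRC1:12345": "gene:0001",
    "DrugX": "drug:0001",
    "drug x": "drug:0001",
    "SRC2:D-77": "drug:0001",
}

# Case-insensitive lookup built once from the alias table.
_LOOKUP = {alias.lower(): cid for alias, cid in ALIASES.items()}


def canonical_id(name):
    """Resolve a source-specific name or ID to a canonical ID (None if unknown)."""
    return _LOOKUP.get(name.strip().lower())


def build_graph(records):
    """Merge (subject, relation, object, source) records into one graph.

    Edges are keyed by canonical IDs, so the same relationship reported under
    different names collapses into a single edge that remembers its sources.
    """
    graph = defaultdict(set)
    for subj, relation, obj, source in records:
        s, o = canonical_id(subj), canonical_id(obj)
        if s is None or o is None:
            continue  # unresolved mentions would need curation or NLP-based linking
        graph[(s, relation, o)].add(source)
    return dict(graph)


records = [
    ("DrugX", "targets", "GENE_A", "source_1"),
    ("SRC2:D-77", "targets", "SRC1:12345", "source_2"),
    ("drug x", "targets", "unknown gene", "source_3"),
]
graph = build_graph(records)
```

The first two records use different names for the same drug and gene, so they merge into one edge supported by two sources. The third is skipped because one mention cannot be resolved, which is the kind of case where entity linking from text becomes necessary.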
To truly accelerate discovery, scientists need to be able to access and explore the knowledge graph in a way that fits their research workflow. They can’t put their work on hold while they learn a new language to write graph queries. On the other hand, a beautiful, intuitive user interface (UI) built on top of the graph is of little use if it is less functional than their existing workflow. The user experience (UX) is absolutely critical. Unfortunately, most attempts to design a single solution for researchers in general miss in one direction or the other. The problem is that, although many of the questions being asked are similar, workflows can differ widely between academia and industry, between startups and big pharma, between preclinical and clinical research, and so on.
If we can’t build one solution that will help everyone, and building a bespoke solution from scratch for each use case is impractical, what can we do? The approach we are taking at Mercury Data Science with our ERGO platform is to build a flexible, modular core framework, with an extensible user experience that can be customized for different workflows and domains. We have built a scalable architecture to automatically ingest and refresh data from many sources into one normalized knowledge graph, including Natural Language Processing of biomedical texts. This knowledge network creates the foundation for predictive analytics to find new connections. And our team’s deep domain knowledge of computational biology, experience creating sophisticated data visualizations, and UX/UI design training help us work with individual customers, and with the actual scientists whose workflows we are trying to enhance, to design interfaces that are beautiful, intuitive, and functional for the use case at hand.
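To make the idea of a modular ingestion core more concrete, here is a minimal Python sketch of one common pattern, a registry of pluggable source loaders. It is an assumption-laden illustration, not ERGO's actual architecture: the source names, loader functions, and triples are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Registry of ingestion functions, keyed by source name. All names are hypothetical.
SOURCES: Dict[str, Callable[[], Iterable[Triple]]] = {}


def register_source(name: str):
    """Decorator that adds a loader to the registry without touching core code."""
    def wrap(loader):
        SOURCES[name] = loader
        return loader
    return wrap


@register_source("public_pathways")
def load_public_pathways() -> Iterable[Triple]:
    # Stand-in for a real download-and-parse step.
    yield ("gene:0001", "participates_in", "pathway:0001")


@register_source("proprietary_assays")
def load_proprietary_assays() -> Iterable[Triple]:
    # Stand-in for reading a customer's internal data set.
    yield ("drug:0001", "targets", "gene:0001")


def refresh() -> Dict[str, List[Triple]]:
    """Re-run every registered loader and return the triples grouped by source."""
    return {name: list(loader()) for name, loader in SOURCES.items()}


snapshot = refresh()
```

With this kind of design, adding a new public or proprietary source means writing one more decorated loader; the refresh logic and everything downstream stay unchanged.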
Scientists don’t need AI software that tries to do their job for them. They need tools to accelerate their workflows and insights. We are building ERGO to meet this need.
We build data science applications to support drug development, digital & molecular biomarker discovery, and digital health. Our team works at the intersection of biology and technology to accelerate innovation. If you have an AI/ML-related question or would like to discuss your AI strategy, we’d love to hear from you! Reach out today at email@example.com, on Twitter @mercurydatasci, or on LinkedIn.
*The ergosphere is an area just outside the event horizon of a rotating black hole, where it’s possible to extract usable energy from the black hole itself.