The global scientific community has risen to confront COVID-19: there are over 200,000 research publications related to coronavirus, with more than 4,000 added every week. Scientists trying to draw knowledge and insight from this ever-expanding literature face difficult challenges: tracking synonymous terms for genes and drugs across fields, connecting biological relationships among entities across articles, and distilling the essential information from large sets of literature.
We built an open-access, NLP-based application using the CORD-19 dataset, the most extensive machine-readable coronavirus literature collection available for text mining. Our application provides domain-specific dictionaries for scientific entities, including human genes and molecular pathways, viral genes and mutations, drugs and treatments, symptoms, diseases, and keywords. The application offers multiple views of the relationships among entities, and our AI Summary feature identifies and summarizes the key themes in large document sets.