The Case for Video AI in Telemedicine

Read our top three use cases on the integration of AI video analytics to significantly raise the value of both Telemedicine-based care and Distributed Clinical Trials.

Telehealth is quickly becoming a critical channel for doctor-patient interactions. In less than two years, telemedicine utilization has increased 38-fold, and investment in the digital health sector has seen a similar boom: 2021 brought in almost $30 billion, nearly double the record-setting 2020 total.

At this point, telemedicine is usually just a scheduled videoconference. While this is fine for qualitative data, some important physical measurements are hard to obtain, largely because it is untenable to deploy specialized hardware in every home. However, software-based solutions can provide some of this information and, in some cases, a richer data set than an office visit. We believe that video analytics will significantly improve care in clinical interactions and allow trial sponsors to gather more information in distributed clinical trials (DCTs).

Examples of Software-only Solutions

A surprising number of vital signs and similar measurements can be obtained purely by applying advanced AI/ML techniques to video, including heart rate, heart rate variability, respiration, blood pressure, gaze, and pupil dilation. Additionally, information that is difficult to obtain in the office can be captured easily and compared longitudinally to detect changes in emotion, attention, cognition, and general health. These tools can also be applied to existing video archives to extract novel physiological measurements and derive longitudinal insights. Here, we describe in detail three potential applications of our holistic video analytics platform (VIVO) in the healthcare space:

Example One: Software-only Heart rate and HRV

Heart rate and heart rate variability (HRV) are important measures of health and fitness. Getting an accurate measurement of these features during a doctor’s visit typically entails use of a stethoscope, a tool that is inaccessible to most people and gives only a single discrete reading. Wearable technology is a growing alternative that can provide continuous readings, but it is so far used by only 21% of adults in the United States. At Mercury Data Science, we have integrated a heart rate detection module into our VIVO platform. It applies a computer-vision-based approach to the biological principle that fluctuations in the green color channel of a person’s face, invisible to the naked eye, correspond to pulse. We started with an open-source facial recognition and landmark prediction model. By using this model to parse color channels across key facial regions and then applying our custom signal processing filter, we are able to tease out a fluctuation pattern similar to that of an ECG. Finally, peak detection and error correction algorithms enable us to measure heart rate and variability with surprising fidelity.
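To make the last two steps concrete, here is a minimal sketch of filtering and peak detection on a per-frame green-channel average. It is not the VIVO implementation: the moving-average filter stands in for the custom signal processing filter, the function names are hypothetical, the error-correction step is omitted, and the input is a synthetic 72 bpm signal rather than real facial video.

```python
import math
from statistics import mean

def moving_average(x, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def bandlimit(signal, fps):
    """Crude band-pass: subtract a ~1 s trend, then smooth over 5 frames."""
    trend = moving_average(signal, int(fps) | 1)  # odd-length window
    detrended = [s - t for s, t in zip(signal, trend)]
    return moving_average(detrended, 5)

def find_peaks(x, fps, max_bpm=180):
    """Positive local maxima at least one minimum beat period apart."""
    min_gap = int(fps * 60 / max_bpm)
    peaks = []
    for i in range(1, len(x) - 1):
        if x[i] > 0 and x[i] > x[i - 1] and x[i] >= x[i + 1]:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks

def heart_rate_and_rmssd(green_means, fps):
    """Return (beats per minute, RMSSD in ms) from per-frame green averages."""
    edge = int(fps)
    # Drop one second at each end, where the shrunken window distorts the filter.
    filtered = bandlimit(green_means, fps)[edge:-edge]
    peaks = find_peaks(filtered, fps)
    ibis = [(b - a) / fps for a, b in zip(peaks, peaks[1:])]  # inter-beat intervals, s
    bpm = 60.0 / mean(ibis)
    diffs = [b - a for a, b in zip(ibis, ibis[1:])]
    rmssd_ms = 1000 * math.sqrt(mean(d * d for d in diffs))
    return bpm, rmssd_ms

# Synthetic input (assumption): 20 s at 30 fps, a 1.2 Hz pulse on slow lighting drift.
fps = 30
t = [i / fps for i in range(fps * 20)]
green = [120 + 0.5 * math.sin(2 * math.pi * 1.2 * ti)
         + 2.0 * math.sin(2 * math.pi * 0.05 * ti) for ti in t]
bpm, rmssd = heart_rate_and_rmssd(green, fps)
print(f"{bpm:.1f} bpm, RMSSD {rmssd:.1f} ms")
```

Because the synthetic pulse is perfectly regular, the variability estimate here is essentially zero; a real recording would need a sharper filter and outlier handling before its inter-beat intervals could be trusted.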

Example Two: Software-only Gaze Tracking

Gaze tracking has seen increased application in neuroscience, both as a diagnostic tool for predicting cognitive impairment and as a critical metric for assessing social behaviors. However, its dependency on hardware has made it a difficult measurement to make in a DCT. To bridge this gap, we found that a software-based approach can be deployed with accuracy that rivals that of specialized gaze tracking hardware. Using state-of-the-art iris detection models, intuitively engineered features, and spatial regression-based approaches, this model predicts gaze to within an error of 1-3 degrees on a frame-by-frame basis. This makes it possible to conduct fixation experiments and extract other gaze features in a cost-effective, decentralized, and minimally intrusive manner.
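As a rough illustration of what an engineered feature, a regression step, and an error in degrees can look like, the sketch below normalizes the iris position within the eye, fits a one-dimensional least-squares line from that feature to horizontal gaze angle, and measures the angle between predicted and true gaze directions. All function names, the landmark coordinates, and the calibration values are made up for the example; the actual VIVO features and regression model are not described in this post.

```python
import math

def iris_offset(inner_corner, outer_corner, iris_center):
    """Iris position relative to the eye midpoint, in units of eye width."""
    width = math.dist(inner_corner, outer_corner)
    mid_x = (inner_corner[0] + outer_corner[0]) / 2
    mid_y = (inner_corner[1] + outer_corner[1]) / 2
    return ((iris_center[0] - mid_x) / width,
            (iris_center[1] - mid_y) / width)

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def gaze_vector(yaw_deg, pitch_deg):
    """Unit gaze direction for the given yaw and pitch angles."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))

def angular_error_deg(pred, true):
    """Angle in degrees between two (yaw, pitch) gaze directions."""
    dot = sum(p * t for p, t in zip(gaze_vector(*pred), gaze_vector(*true)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

# Made-up calibration: horizontal iris offsets recorded while the subject
# looked at targets at known yaw angles.
offsets = [-0.20, -0.10, 0.0, 0.10, 0.20]
target_yaw = [-20.0, -10.0, 0.0, 10.0, 20.0]
a, b = fit_line(offsets, target_yaw)

# One new frame with hypothetical landmark pixel coordinates.
dx, _ = iris_offset((100, 50), (140, 50), (124, 50))
predicted_yaw = a * dx + b
print(predicted_yaw, angular_error_deg((predicted_yaw, 0.0), (12.0, 0.0)))
```

A frame-by-frame error like the 1-3 degrees quoted above would be the average of `angular_error_deg` over many frames with known targets.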

Example Three: Feature Extraction from Voice and Video for Multimodal AI/ML

Video, whether captured in DCTs or on site, offers some really interesting opportunities to create valuable data assets as part of a clinical trial or in the telepsychiatry space. For instance, gauging an individual’s emotional state consistently in the office has been a challenge. This is largely due to the lack of a standardized assessment metric and the biases inherent in measuring subjective features, often by different assessors across different patients and different visits. We built separate modules to predict emotion from facial landmarks and from voice-based sentiment analysis, both of which are predicated on established markers of different emotional states. Applying these methods allows for consistent assessment across videos at minimal cost and time. Additionally, video and voice analytics modules can be designed to extract many other features, such as speech segment isolation, speaker diarization, and general movements, allowing for many distinct applications to best fit one’s needs. This kind of data set, potentially spanning multiple clinical trials, could be hugely valuable for finding new digital biomarkers to measure patient drug response or disease progression, and can play a key role in the advancement of precision psychiatry.

About Security

Security of the video is an important point. Studies have found that 93% of adults felt that controlling data privacy was important and that healthcare information was considered the second most sensitive type of information, behind only Social Security numbers. Because of liability and reputational risk, most telemedicine companies and trial sponsors are reluctant to retain videos of trial participants, even though valuable data could be extracted. The solution to this is simple: extract the features and don’t keep the video. Doing so separates quantifiable digital biomarkers from identifiable features, alleviating potential problems with data breaches.
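One way to picture "extract the features and don’t keep the video" is a pipeline that consumes frames one at a time and stores only derived numbers. The sketch below is an assumption about how that could be structured, not a description of VIVO: `SessionFeatures`, `extract_features`, and the `fake_camera` frame source are hypothetical names, and a frame is represented as rows of (R, G, B) tuples.

```python
from dataclasses import dataclass, field

@dataclass
class SessionFeatures:
    """What gets stored: an opaque session ID and numeric features only."""
    session_id: str
    fps: float
    green_means: list = field(default_factory=list)

def extract_features(frames, session_id, fps):
    """Consume frames one by one; keep one number per frame, never the pixels."""
    record = SessionFeatures(session_id, fps)
    for frame in frames:
        greens = [pixel[1] for row in frame for pixel in row]
        record.green_means.append(sum(greens) / len(greens))
        # The frame goes out of scope here and is never written to disk.
    return record

def fake_camera(n_frames):
    """Stand-in frame source: tiny 3x4 frames whose green level rises each frame."""
    for i in range(n_frames):
        yield [[(10, 100 + i, 20)] * 4 for _ in range(3)]

record = extract_features(fake_camera(3), session_id="session-001", fps=30)
print(record)
```

Because the frame source is a generator, no more than one frame is held in memory at a time, and the stored record contains nothing from which the original images could be rebuilt.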

Final Thoughts and Takeaways

Though somewhat born out of pandemic-related necessity, the advantages of telemedicine are becoming increasingly evident. The integration of AI video analytics is, in our opinion, only a matter of time, and it will significantly raise the value of both telemedicine-based care and Distributed Clinical Trials.

Connect with Us

With expertise in NLP, Machine Learning, Bioinformatics, and Video/Voice Analytics and a passion for cutting-edge data science, our team is always looking for ways to enhance discoveries and accelerate your potential. If you have an AI/ML-related question or would like to discuss your AI strategy, we’d love to hear from you! Reach out at or on LinkedIn.

Written by:
Published on:
April 21, 2022