Electronic Health Record Big Data: Massive Responsibility, Little Oversight – A Learning Tale

Author: Pamela Davis, MD, PhD on January 07, 2025

The vision of universal, interoperable electronic health records (EHRs) that facilitate clinical care was articulated decades ago but is still not fully realized. Still, large EHR systems have grown and expanded, and health information exchanges have proceeded, creating both a billing bonanza and a perceived burden for many physicians. The potential for outcomes and other research applications became clear almost immediately; however, because the data were recorded for clinical or billing purposes, using these mega databases for rigorous studies has been challenging. Yet the potential advantages and opportunities are broad.

If realized, it would be possible to: 1) analyze data from many patients all at once to determine the frequency of a particular disease or an adverse event, 2) monitor the health care utilization of specific patient groups, 3) identify correlations between diseases or previously unappreciated risk factors, and 4) test, retrospectively, the impact of new drugs, diseases, or interventions, including non-pharmacological options such as behavioral or nutritional therapies. It would also be possible to look back over many years to examine the course of a disease or a treatment, similar to the Framingham Study. Moreover, such studies, e.g., a description of the longitudinal course of a particular disease, could be done quickly rather than over years or decades. An analysis of medical records spanning 20 years might be completed in days or weeks. Importantly, these advantages have been amplified as the available EHR systems coalesced into a few global leaders, collecting more and more patients into a single system. As academicians, clinical trialists, and entrepreneurs grasped the extent to which large data platforms could assist in developing, delimiting, and recruiting to clinical trials, as well as in conducting research, the vision for potential opportunities blossomed. Health data exchanges allowed some EHRs to access data from state health departments or from other health systems, further expanding their utility.

Several corporations, realizing the power and potential of medical records, developed research data platforms. Two large EHR systems, Epic and Cerner, developed research systems to accompany their clinical support structures and now boast hundreds of millions of records each. Another system, TriNetX, grew and evolved with voluntary contributions from many health care organizations that were hoping to participate in clinical trials. TriNetX engaged the pharmaceutical industry and is good at helping to define clinical inclusion and exclusion criteria to maximize potential enrollment in trials. It was also equipped with embedded analytic software, making it easier to use for these broader applications, which may be why it has become quite popular among residents and medical students. TriNetX contains more than 110 million records of patients in the US and 250 million records worldwide. With these numbers comes great capacity to study rare diseases and complications of procedures or medications, and to draw associations that are simply inaccessible to investigators working with much smaller numbers of patients.

Advantages vs. Downsides

As the size of the data platforms grows and their relative ease of use improves, so does the potential for error or misinterpretation. It becomes crucial to consider whether the questions we try to address are reasonable for the available data and their structure. It is equally critical to distinguish statistical significance from clinical meaningfulness or importance: very large databases can yield differences that are statistically significant but do not matter in the slightest for clinical practice or patient outcomes. What are the default settings for the question you are asking? Will you get the first data for an event or the most recent? What constitutes a diagnosis in the EHR: a constellation of symptoms, an ICD-10 code, or a set of therapeutic interventions?
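The gap between statistical significance and clinical importance can be illustrated with a short calculation. The sketch below is illustrative only: the cohort sizes and event rates are invented, the function name is hypothetical, and it applies a standard pooled two-proportion z-test rather than any method built into a particular EHR platform. It compares the same 0.1 percentage point difference in event rate in two small cohorts and in two very large ones.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z-test (hypothetical helper for illustration).

    Returns (risk difference, z statistic, two-sided p-value).
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# Invented numbers: event rates of 10.1% vs 10.0% in both scenarios.
scenarios = {
    "5,000 patients per cohort": (505, 5_000, 500, 5_000),
    "5,000,000 patients per cohort": (505_000, 5_000_000, 500_000, 5_000_000),
}

for label, counts in scenarios.items():
    diff, z, p = two_proportion_z_test(*counts)
    print(f"{label}: risk difference = {diff:.4f}, z = {z:.2f}, p = {p:.2g}")
```

In this made-up example the absolute difference is identical in both scenarios; only the sample size changes. The comparison is far from significant in the small cohorts and highly significant in the large ones, so the p-value alone says nothing about whether a difference of one event per 1,000 patients matters clinically.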

To emphasize and consolidate these questions so that medical professionals, including students, residents, and fellows, can appreciate the risks and benefits of an EHR study, Olaker and colleagues assembled a tutorial that will remind readers of the important considerations in this workspace. This paper arose from a student’s concern at a national meeting that her robust efforts to “get it right” had been glossed over by many of her fellow presenters and that, therefore, some of the inferences from their results were likely to be false. When she and a student colleague introduced incoming students to working with EHRs through brief lectures, presentations, and discussions at lab meetings and journal clubs, the students suggested that presenting the important issues for using EHRs in research to the broader community would be a service.

The two student colleagues, Veronica Olaker and Sarah Fry, began this tutorial intending to orient their peers, students and residents, to this rich but treacherous resource. Their project grew and recruited other students, Maggie Miller and Ian Dorney, as well as faculty advisers and a member of the TriNetX consortium to help. They ultimately titled their work “With Big Data Comes Big Responsibility: Strategies for Utilizing Aggregated, Standardized, De-Identified Electronic Health Record Data for Research.” We hope you enjoy their efforts and find them useful.
