Automating Detection of Diagnostic Error of Infectious Diseases using Machine Learning

Kelly S. Peterson, Alec B. Chapman, Wathsala Widanagamaachchi, Jesse Sutton, Brennan Ochoa, Barbara E. Jones, Vanessa Stevens, David C. Classen, Makoto M. Jones


Diagnostic error, a cause of substantial morbidity and mortality, is largely discovered and evaluated through self-report and manual review, which is costly and not suitable for real-time intervention. Opportunities exist to leverage electronic health record data for automated detection of potential misdiagnosis, executed at scale and generalized across diseases. We propose a novel automated approach to identifying diagnostic divergence that considers both diagnosis and risk of mortality.

Our objective was to identify cases of emergency department infectious disease misdiagnosis by measuring the deviation between predicted diagnosis and documented diagnosis, weighted by mortality. Two machine learning models were trained to predict infectious disease and mortality using the first 24 hours of data. Charts were manually reviewed by clinicians to determine whether a more correct or timely diagnosis could have been made.
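The idea of scoring deviation between predicted and documented diagnosis, weighted by mortality, can be sketched as follows. This is an illustrative formulation only, not the study's actual measure; the function name, inputs (`p_infection`, `p_mortality`, `documented_infection`), and the simple product weighting are assumptions for the sake of the example.

```python
def divergence_score(p_infection: float, p_mortality: float,
                     documented_infection: bool) -> float:
    """Hypothetical diagnostic-divergence score (illustrative only).

    Deviation between the model's predicted probability of an infectious
    disease and the documented diagnosis, weighted by predicted mortality
    risk so that high-risk discrepancies rank highest for review.
    """
    documented = 1.0 if documented_infection else 0.0
    deviation = abs(p_infection - documented)
    return deviation * p_mortality

# A visit predicted to be high-risk and infectious, but documented as
# non-infectious, scores much higher than a concordant low-risk visit.
flagged = divergence_score(0.9, 0.6, documented_infection=False)   # 0.54
concordant = divergence_score(0.9, 0.6, documented_infection=True)  # 0.06
```

Under such a scheme, visits are ranked by score and the top-ranked charts are sent for manual clinician review.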


Diagnostic errors are harmful to patients, traumatic for providers, and costly for healthcare systems. A recent study showed that infectious diseases are one of three major disease categories causing the majority of misdiagnosis-related harms [1]. The study estimated 40,000 to 80,000 deaths in US hospitals related to misdiagnosis. Diagnostic error evaluation can be conducted with instruments such as Safer Dx [2,3] to identify areas of improvement; however, these instruments require manual case review by clinicians with expertise and are most efficient when applied to known or probable cases of error. Thus, while such instruments are useful, they are not scalable to large populations or ultimately amenable to providing rapid feedback at the point of care.

Materials and Methods

This study was performed using data from the Veterans Health Administration (VHA) Health Care System, which cares for more than 9 million living Veterans at over one hundred emergency departments [20]. The study population included all emergency department (ED) visits to a VA medical center from January 1, 2017, to December 31, 2019. Data were extracted from the Corporate Data Warehouse (CDW), VHA's repository for electronic clinical and administrative records.


A total of 6,536,315 ED visits were initially included from 104 distinct VA medical centers. These visits represented 2,141,271 unique patients; the mean age at the time of the ED visit was 60 years, and 88.1% of patients were male.


Individual models for infectious diseases and mortality demonstrated reasonable diagnostic performance, but positive predictive value (PPV) and area under the precision-recall curve (PR AUC) were low given the low prevalence of infectious diseases. These models were trained with a very large number of features, representing a substantial fraction of the structured data readily available from an EHR.
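The dependence of PPV on prevalence follows directly from Bayes' rule: as the positive class becomes rarer, false positives from the large negative class dominate the model's positive calls. The sketch below uses illustrative sensitivity, specificity, and prevalence values, not the study's actual figures.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule.

    PPV = P(disease | positive test)
        = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

# Illustrative numbers only: even a model with 90% sensitivity and
# 90% specificity yields a modest PPV when prevalence is 5%.
print(round(ppv(0.90, 0.90, 0.05), 3))  # → 0.321
```

This is why PPV and PR AUC, unlike ROC AUC, degrade in low-prevalence settings even when a model's discrimination is otherwise good.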

Our primary concern was whether our combined measure of diagnostic divergence was related to diagnostic error. Correlations between human review of cases and the proposed measure showed a weak positive correlation. When examining the interrater reliability of our subjective diagnostic error measures and when performing an analysis of discordant cases, it became apparent that there was substantial disagreement in how cases were interpreted and how the individual components of diagnostic error were weighed.
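Interrater reliability of this kind is commonly quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch, assuming two reviewers each assign a binary error label per chart (the rater labels below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    # Observed proportion of charts on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Two reviewers labeling eight charts (1 = diagnostic error, 0 = none).
reviewer_1 = [1, 1, 0, 1, 0, 0, 1, 0]
reviewer_2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(cohens_kappa(reviewer_1, reviewer_2))  # → 0.5 (moderate agreement)
```

Low kappa values, as reported when reviewers disagree on interpreting cases, directly limit how well any automated measure can correlate with the human gold standard.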


Our proposed method for detecting diagnostic divergence yields candidate cases enriched for diagnostic error; it also surfaces coding errors and diagnostically difficult cases. Further refinement could yield a tool for flagging charts for review, and comparisons between human review and our approach indicate preliminary feasibility. Increases in accuracy will likely require natural language processing and methods that leverage information on timing and concept relatedness. Further work is needed to develop reliable instruments for rapidly evaluating diagnostic error. Continued development is also necessary to allow reviewers and users to explore more detailed information and to establish that the measurements are valid before such a metric can be implemented in clinical practice.

Citation: Peterson KS, Chapman AB, Widanagamaachchi W, Sutton J, Ochoa B, Jones BE, et al. (2024) Automating detection of diagnostic error of infectious diseases using machine learning. PLOS Digit Health 3(6): e0000528.

Editor: Hualou Liang, Drexel University, UNITED STATES

Received: January 30, 2024; Accepted: May 7, 2024; Published: June 7, 2024

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: Per IRB requirements and VA regulations, patient-level data from this study cannot be shared directly. Source data can be accessed by VA-credentialed investigators with an approved IRB and proper VA research authorization. Inquiries about this process for data access can be addressed to [email protected].

Funding: This work was supported by the Gordon and Betty Moore Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.