Extracting Standardized Visual Acuity Data from Free-Text Electronic Health Records: The visualacuity Toolkit
Investigative Ophthalmology & Visual Science, Jan 2024
Abstract
In electronic health records (EHRs), visual acuity (VA) is often written as free-form text, using various notation formats that correspond to different measurement techniques. This documentation practice is a barrier to consistent secondary use of EHR data. We introduce visualacuity, an open-source toolkit that extracts relevant VA values from text and converts them into a standardized, structured format for large-scale comparison and analysis. Ultimately, the tool will be usable across sites and vendors, facilitating multi-site studies as well as reproducibility of research.

We used VA data recorded in the EHR (Epic) at Casey Eye Institute during 2022. For each text field, tokens were extracted from the plain text using regular expressions. We then converted the tokens into a sequence of structured objects using a parser generator with a custom grammar. The grammar specified syntax for measurements taken with common VA methods (e.g., Snellen, Jaeger, and ETDRS) as well as additional pertinent details (e.g., laterality, distance of measurement, and the presence or absence of vision correction). Wherever a single, unambiguous VA measurement was found, we derived Snellen and LogMAR equivalents; otherwise, the value was marked as an error. The core library was written in Rust for interoperability with other programming languages, and we initially packaged it for Python. Similar bindings for R are under development.

Out of 4.5M VA values, 4.1M (91.1%) were successfully converted to a Snellen/LogMAR equivalent, including 4.0M (88.9%) Snellen, 100K (2.2%) Jaeger, and 692 (0.0%) ETDRS values. Of the remaining values, 544K (12.2%) were identified as using an alternative method (e.g., Teller cards or low-vision methods such as finger counting). The algorithm was unable to extract VA data from 46K (1.0%) records. The tool is available at https://github.com/HribarLab/visualacuity.

VA can be represented in multiple formats and is often recorded in non-standard ways.
Consistent extraction of VA values is needed for multi-site studies and reproducibility of research. We plan to expand the standardized computation of LogMAR values, validate the tool with data from other sites, and develop similar tools for other eye exam data recorded in the EHR. This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.
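
To make the extract-then-convert idea concrete, the following is a minimal Python sketch of the simplest case described in the abstract: finding a single, unambiguous Snellen fraction in free text and deriving its LogMAR equivalent. It is not the visualacuity API and does not use the toolkit's grammar. The pattern `SNELLEN` and the functions `snellen_to_logmar` and `extract_logmar` are hypothetical names introduced only for illustration, and the regular expression is an assumption that covers plain fractions such as 20/40 and nothing else. The conversion itself uses the standard definition, LogMAR = log10(denominator / numerator).

```python
import math
import re

# Hypothetical pattern (an assumption, not the toolkit's grammar):
# matches plain Snellen fractions such as "20/40" or "20 / 200".
SNELLEN = re.compile(r"\b(?P<num>\d{1,3})\s*/\s*(?P<den>\d{1,4})")


def snellen_to_logmar(numerator, denominator):
    """LogMAR is the base-10 log of the minimum angle of resolution,
    which for a Snellen fraction is denominator / numerator."""
    return math.log10(denominator / numerator)


def extract_logmar(text):
    """Return a rounded LogMAR value when exactly one Snellen fraction
    is found in the text; return None otherwise."""
    matches = list(SNELLEN.finditer(text))
    if len(matches) != 1:
        # No measurement, or more than one candidate: treat as an error.
        return None
    num = int(matches[0]["num"])
    den = int(matches[0]["den"])
    if num == 0 or den == 0:
        return None
    return round(snellen_to_logmar(num, den), 2)


if __name__ == "__main__":
    for note in ["20/20", "cc 20/200 OD", "CF at 3 ft", "20/20 20/30"]:
        print(repr(note), "->", extract_logmar(note))
```

A regex-only approach like this cannot represent Jaeger or ETDRS notation, laterality, or correction status, which is presumably why the toolkit pairs tokenization with a parser and a custom grammar rather than relying on pattern matching alone.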