Phenotypes
Clinical phenotypes are manifestations of disease, treatment, and response as represented in the electronic health record (EHR). EHRs have a large number of data elements representing diagnoses, procedures, laboratory test results, medication orders, and more that together reflect what diagnoses and treatments a patient has had, and what the patient’s treatment responses have been. The problem is that the medical record typically does not represent these phenotypes as distinct data elements; they must be inferred from combinations of data. Clinicians do this in their heads as they browse and scan a medical record. Personnel using the medical record for secondary purposes such as research and quality improvement do this during chart abstraction. Because chart abstraction is manual, it inherently limits the volume of EHR data that can be used in a research or quality analysis.
Eureka! aims in part to automate chart abstraction. In other words, it automatically computes the data elements defined for a research or quality improvement study from the data elements in the medical record. We call such study data elements derived data elements. The Phenotype Editor screens in Eureka! are where you specify your study’s data elements and how they should be computed, either from EHR data elements or from other study data elements that you have already specified. The Phenotype Editor supports four types of user-defined data elements (Category, Sequence, Frequency, and Value Threshold), described below. These may be combined to specify clinical phenotypes. Examples of combinations include a frequency threshold on the number of codes from a custom category of diagnosis codes that appear in a patient’s data, or a high blood pressure result together with at least two hypertension diagnosis codes within 6 months. The Phenotype Editor screens guide you through creating these derived data elements. Having defined your derived data elements of interest, you can then compute them in a dataset of interest and load both the dataset’s data and the computed data values into i2b2, a database system with an easy-to-use user interface for querying and exporting data.
Editing Phenotypes
To open the Phenotype Editor screen, click the Phenotypes link at the top of the Eureka! user interface. This screen lists the derived data elements that you have already specified, and it provides a Create New Element link for creating new ones. The icons to the left of each derived data element in the list let you edit or delete that element.
Select derived data element type
Clicking on Create New Element opens up a dropdown for selecting a data element type. There are four types:
- Category: Category data elements allow specifying clinically significant groupings and hierarchies of data elements.
- Sequence: Sequence data elements allow specifying two or more data elements that must occur in a specified temporal order.
- Frequency: Frequency data elements allow specifying the number of times a data element must be present in or computed from a patient’s data.
- Value threshold: Value threshold data elements allow specifying lower and/or upper limits on one or more numerical observation data elements such as laboratory test results or vital signs.
Name and Describe your Derived Data Element
After you select the data element type, a form specific to that type opens for specifying your derived data element. All of these forms start with fields for a name and an optional description.
Select Elements from the Concept Explorer
All of these forms have a concept explorer on the left side of the screen for selecting concepts representing EHR data and any other derived data elements you have already created. These are found in the System and User Defined tabs, respectively. The System tab represents EHR data as a hierarchy of clinical concepts. The User Defined tab displays the derived data elements that you have previously specified as a list. Because you can define derived data elements in terms of other derived data elements, you can build up data elements representing fairly sophisticated interpretations of EHR data. For example, you can specify a frequency derived data element on a previously defined value threshold data element such as at least two blood pressure values over 130/80.
In general, you drop concepts from the concept explorer into one or more blue boxes on the right side of the form. Clicking the icon to the left of the dropped concept will remove the concept from the box. Underneath those blue boxes, you may specify additional constraints on the concept. You may specify a duration constraint (for example, 2 days to 1 month). You also may specify a particular property value, if the dropped concept has properties (for example, Encounter with type INPATIENT). Make sure to check the checkboxes next to the constraint fields that you intend to use. You may drop only one concept into these blue boxes unless otherwise specified below.
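The duration and property constraints described above can be pictured as a simple filter over data values. The following is a minimal sketch only, not Eureka!'s implementation: the DataValue class, the satisfies_constraints function, and the property name "type" are hypothetical names introduced for illustration, and a month is approximated as 30 days.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class DataValue:
    """Hypothetical stand-in for one value of a dropped concept."""
    concept: str
    start: datetime
    finish: datetime
    properties: Dict[str, str] = field(default_factory=dict)


def satisfies_constraints(
    value: DataValue,
    min_duration: Optional[timedelta] = None,
    max_duration: Optional[timedelta] = None,
    property_name: Optional[str] = None,
    property_value: Optional[str] = None,
) -> bool:
    """Return True if the value meets every constraint that is set.

    A constraint left as None is ignored, mirroring an unchecked checkbox.
    """
    duration = value.finish - value.start
    if min_duration is not None and duration < min_duration:
        return False
    if max_duration is not None and duration > max_duration:
        return False
    if property_name is not None:
        if value.properties.get(property_name) != property_value:
            return False
    return True


# Example: an inpatient encounter lasting 4 days, checked against
# "2 days to 1 month" and "type INPATIENT".
stay = DataValue(
    concept="Encounter",
    start=datetime(2024, 1, 1),
    finish=datetime(2024, 1, 5),
    properties={"type": "INPATIENT"},
)
is_match = satisfies_constraints(
    stay,
    min_duration=timedelta(days=2),
    max_duration=timedelta(days=30),  # assumption: 1 month treated as 30 days
    property_name="type",
    property_value="INPATIENT",
)
```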
Category data element
Drag and drop one or more concepts from the concept explorer into the blue box to the right to specify the members of your category. You may drop in previously created categories from the User Defined tab to create category hierarchies. If you make a mistake, just click the icon to the left of a category member to remove it. Once created, a category may be dropped into any blue box where one of the category’s members is allowed.
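One way to think about a category hierarchy is as a mapping from each category to its members, where a member may itself be a category. The sketch below is illustrative only and does not reflect how Eureka! stores categories: the CATEGORIES mapping, the category names, the code prefixes, and the expand_category function are all assumptions made for this example.

```python
from typing import Dict, List, Set

# Hypothetical category definitions. Each category maps to its members,
# which may be EHR concept codes or the names of other categories.
CATEGORIES: Dict[str, List[str]] = {
    "Hypertension codes": ["ICD9:401.0", "ICD9:401.1", "ICD9:401.9"],
    "Heart failure codes": ["ICD9:428.0", "ICD9:428.1"],
    "Cardiovascular disease": ["Hypertension codes", "Heart failure codes"],
}


def expand_category(name: str, categories: Dict[str, List[str]]) -> Set[str]:
    """Return the leaf concepts reachable from a category.

    Nested categories are followed; a 'seen' set guards against cycles.
    """
    leaves: Set[str] = set()
    seen: Set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in categories:
            # A category: visit each of its members.
            stack.extend(categories[current])
        else:
            # Not a category, so treat it as a leaf concept.
            leaves.add(current)
    return leaves


cardio_leaves = expand_category("Cardiovascular disease", CATEGORIES)
```

Under this picture, a patient's data value belongs to the top-level category whenever its concept appears in the expanded set of leaves.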
Sequence data element
Drag and drop concepts from the concept explorer into the blue boxes for the primary data element (at the top of the form) and the first related data element. Each computed sequence derived data value has the start and finish times of the primary data element value from which it is computed. Optionally select duration and property constraints as described above. Then, specify the temporal relationship between these concepts (before or after) and whether the second concept must be a specified minimum and/or maximum time distance away from the first (for example, before by 2 days to 2 months). Click the Add to sequence link to specify additional temporal relationships between the primary and related data elements.
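The temporal relationship in a sequence can be sketched as a comparison of time gaps between intervals. This is a rough illustration under stated assumptions, not Eureka!'s algorithm: the Interval class and find_sequences function are hypothetical, "before" is assumed to mean that the primary value finishes before the related value starts, the gap bounds are treated as inclusive, and 2 months is approximated as 60 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Interval:
    """Hypothetical data value with a temporal extent."""
    concept: str
    start: datetime
    finish: datetime


def find_sequences(
    name: str,
    primary: List[Interval],
    related: List[Interval],
    relation: str,
    min_gap: Optional[timedelta] = None,
    max_gap: Optional[timedelta] = None,
) -> List[Interval]:
    """Return one derived value per primary value that has a matching related value."""
    matches: List[Interval] = []
    for p in primary:
        for r in related:
            if relation == "before":
                gap = r.start - p.finish  # primary ends, then related begins
            elif relation == "after":
                gap = p.start - r.finish  # related ends, then primary begins
            else:
                raise ValueError("relation must be 'before' or 'after'")
            if gap < timedelta(0):
                continue  # wrong order or overlapping
            if min_gap is not None and gap < min_gap:
                continue
            if max_gap is not None and gap > max_gap:
                continue
            # The derived value takes the primary value's start and finish times.
            matches.append(Interval(name, p.start, p.finish))
            break
    return matches


admission = Interval("Encounter", datetime(2024, 1, 1), datetime(2024, 1, 3))
too_soon = Interval("Encounter", datetime(2024, 1, 4), datetime(2024, 1, 5))
later = Interval("Encounter", datetime(2024, 1, 20), datetime(2024, 1, 22))

# "Before by 2 days to 2 months" (2 months approximated as 60 days).
readmissions = find_sequences(
    "Readmission", [admission], [too_soon, later],
    relation="before", min_gap=timedelta(days=2), max_gap=timedelta(days=60),
)
```

Here the encounter starting one day after discharge is rejected by the minimum gap, while the one starting 17 days later satisfies both bounds.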
Frequency data element
Drag and drop a concept from the concept explorer into the blue box. Select the count n of data values represented by the dropped concept. Then, select whether you are interested only in the first n data values or in any occurrence of at least n values. If you drop a value threshold into the blue box, a consecutive checkbox will appear. If checked, the count will apply only to consecutive values of the data element being thresholded. Optionally select duration and property constraints as described above. Finally, you may specify that the n data values must have a minimum and/or maximum time distance between consecutive values. The start and finish times of frequency derived data values match the temporal extent of the n data values from which they were computed.
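The counting logic of a frequency data element can be approximated with a sliding window over time-ordered values. The sketch below is one possible reading, not Eureka!'s actual computation: find_frequency_matches is a hypothetical function, each data value is reduced to a single timestamp, only runs of n adjacent values (after sorting) are considered, and gap bounds are inclusive.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def find_frequency_matches(
    timestamps: List[datetime],
    n: int,
    first_only: bool = False,
    min_gap: Optional[timedelta] = None,
    max_gap: Optional[timedelta] = None,
) -> List[Tuple[datetime, datetime]]:
    """Return (start, finish) extents of runs of n values that meet the gap limits.

    With first_only=True, only the first n values are examined.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(timestamps)
    # Candidate window start positions; empty when there are fewer than n values.
    starts = range(0, len(ordered) - n + 1)
    if first_only:
        starts = starts[:1]
    matches: List[Tuple[datetime, datetime]] = []
    for i in starts:
        window = ordered[i:i + n]
        gaps = [later - earlier for earlier, later in zip(window, window[1:])]
        if min_gap is not None and any(g < min_gap for g in gaps):
            continue
        if max_gap is not None and any(g > max_gap for g in gaps):
            continue
        # The derived value spans the temporal extent of the n values.
        matches.append((window[0], window[-1]))
    return matches


base = datetime(2024, 1, 1)
diagnoses = [base, base + timedelta(days=39), base + timedelta(days=44)]

# At least two values no more than 14 days apart, anywhere in the data.
any_time = find_frequency_matches(diagnoses, n=2, max_gap=timedelta(days=14))
# The same rule applied only to the first two values.
first_two = find_frequency_matches(
    diagnoses, n=2, first_only=True, max_gap=timedelta(days=14)
)
```

In this example the first two values are 39 days apart, so the "first n" variant finds nothing, while the "any time" variant finds the second and third values.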
Value threshold data element
Drag and drop a concept representing a data element that has a numerical observation from the concept tree into the blue box labeled Drop Thresholded Data Element Here. Specify upper and/or lower thresholds on the value of that data element in the provided form fields. Click the Add threshold link to specify thresholds on additional data elements. If you specify more than one threshold, use the Value thresholds selector at the top of the form to specify whether your derived data element should be computed if any of the specified thresholds is found or only if all of them are found. In some situations, you may want a threshold to apply only in one or more clinical contexts, such as patients with a particular diagnosis. You may specify a context by dragging one or more data elements from the concept explorer into the Drop Contextual Data Element Here boxes. You may specify that these contextual data elements be within a certain time distance before or after the data element values being thresholded. Computed value threshold derived data values have the same timestamp as the data from which they were computed.
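The any/all choice for multiple thresholds can be sketched as follows. This is a simplified illustration rather than Eureka!'s implementation: the Threshold class, the threshold_met function, and the concept names are hypothetical, bounds are assumed to be inclusive, and timestamps and contextual data elements are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Hypothetical lower and/or upper limit on one numerical data element."""
    concept: str
    lower: Optional[float] = None
    upper: Optional[float] = None

    def matches(self, value: float) -> bool:
        # Assumption: bounds are inclusive; an unset bound is ignored.
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


def threshold_met(
    thresholds: List[Threshold],
    observations: Dict[str, float],
    mode: str = "any",
) -> bool:
    """Check observations against thresholds, requiring any or all to match."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    results = [
        t.concept in observations and t.matches(observations[t.concept])
        for t in thresholds
    ]
    return any(results) if mode == "any" else all(results)


# Example: flag blood pressure at or above 130 systolic or 80 diastolic.
high_bp = [
    Threshold("Systolic blood pressure", lower=130),
    Threshold("Diastolic blood pressure", lower=80),
]
reading = {"Systolic blood pressure": 142, "Diastolic blood pressure": 76}

met_any = threshold_met(high_bp, reading, mode="any")
met_all = threshold_met(high_bp, reading, mode="all")
```

With this reading, only the systolic threshold is met, so the "any" setting produces a derived value and the "all" setting does not.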
Review and Save
After saving, your specified derived data element will appear in your derived data element list on the main editor screen and in the concept explorer. It also will be computed for any subsequent data processing job that you submit.