Datasets

Out of the box, Eureka! Clinical Analytics currently supports processing data in an Excel spreadsheet with the xlsx extension. The spreadsheet’s contents must conform to a specific format and the data must be represented in a certain way for Eureka! to read it. Note that we can access other data sources with this tool including flat files and relational databases, but there is no GUI method for configuring it to do that yet.

Creating a Spreadsheet

Eureka! comes with a sample spreadsheet that conforms to the proper format. To get it, click the Jobs button in the button bar, and then click the Download Sample Spreadsheet link. This spreadsheet contains synthetically generated data. Little attempt was made to make the data look like it came from real patients – it is there purely as an example. We highly recommend that when creating your own datasets that, instead of beginning with a blank spreadsheet, you start with this spreadsheet and replace the data with your own.

The sample spreadsheet has multiple tabs. The tabs each represents a kind of database table. Each row is a record and each column represents an attribute. The tabs are “linked” to each other through primary and foreign key relationships as described below that link patients to encounters, encounters to providers, etc.

The tabs are as follows:

Patient

Contains patient name (name_first and name_last) and demographics (dob, language, marital_status, race and gender). The patient_key field represents an unique identifier or key for each patient. There should be one row or record per patient.

Provider

Represents healthcare providers in the dataset. It has columns for each provider’s first name (name_first) and last name (name_last). The provider_key column represents a unique primary key for each provider. There should be one row or record per provider.

Encounter

Contains the encounter start and end date/time (start_ts and end_ts), the type of the encounter (encounter_type) and the encounter’s discharge status (discharge_disp). The encounter_key column contains a unique id for each encounter. The patient_key column contains the key of the patient in the patient tab whose encounter this was. The provider_key column contains the key of the provider (from the provider tab) that was the healthcare provider of record for the encounter. There should be one row or record per encounter in your dataset. There may be multiple encounters for a given patient.

eCPT

Contains billing codes for procedures from the Current Procedural Terminology (CPT). There are columns for the procedure time (obx_ts) and the procedure code (represented as the code prefixed by CPT:, e.g., CPT:75505). The event_key column represents a primary key for each procedure. The encounter_key contains the key of the encounter in which the procedure occurred.

eICD9D

Contains billing codes for diagnoses from the International Classification of Diseases version 9 (ICD-9-CM). There are columns for the diagnosis time (obx_ts) and the diagnosis code (represented as the code prefixed by ICD9:, e.g., ICD9:V44.1). The event_key column represents a primary key for each diagnosis code. The encounter_key contains the key of the encounter in which the diagnosis was recorded. There may be multiple ICD-9-CM diagnosis codes for a given encounter.

eICD9P

Contains billing codes for procedures from the Internal Classification of Diseases version 9 (ICD-9-CM). There are columns for the procedure time (obx_ts) and the procedure code (represented as the code prefixed by ICD9:, e.g., ICD9:55.02). The event_key column represents a primary key for each procedure. The encounter_key column contains the key of the encounter in which the procedure occurred. There may be multiple ICD-9-CM procedure codes for a given encounter.

eMEDS

Contains medication orders represented using the RxNorm terminology. There are columns for the order time (obx_ts) and the code for the order (entity_id, specified as the code prefixed by RXNORM:, e.g., RXNORM:200998). The event_key column represents a primary key for each order. The encounter_key column represents the key for the encounter in which the order was made. There may be multiple medication orders for a given encounter.

eLABS

Contains laboratory test results represented using the LOINC terminology. There are columns for the time of the test (obx_ts), the test code (entity_id, specified as the code prefixed by LOINC:, e.g., LOINC:778-1), the result in string (result_str) and numerical (result_num) formats, the units of the result (units), and a flag (flag, may be blank, high, normal or low). The event_key column represents a primary key for each test result. The encounter_key column represents the key for the encounter in which the laboratory test was performed. There may be multiple laboratory test results for a given encounter.

eVITALS

Contains vital signs. The coding system is non-standard. Available codes, descriptive strings and classes of vital signs may be found in the metadataVITALS tab. There are columns for the time of the observation (obx_ts), the vital sign code (entity_id), the result in string (result_str) and numerical (result_num) formats, the units (units) and a flag (flag, may be blank, high, normal or low). The event_key column represents a primary key for each vital sign observation. The encounter_key column represents the key for the encounter in which the vital sign was observed. There may be multiple vital sign observations for a given encounter.

The spreadsheets must have these tabs and column names. The cells in each column must have the same value category (e.g. Text, Number) as in the sample preadsheet. Data with an unknown code will be omitted by the tool.