Data Harmonization and Normalization for Combining EHR (Electronic Health Records) and Claims Data in Healthcare Industry

May 15, 2023
Data & Analytics | 7 min READ
In today's data-driven world, the healthcare industry generates petabytes of data on patient care and healthcare costs. Electronic Health Records (EHR) and Claims data are two critical data sources, each with distinct advantages: EHRs capture clinical detail recorded at the point of care, while claims capture billed services and their costs. Combining data from these sources helps healthcare organizations run more comprehensive analyses. However, the two sources are often siloed because they differ in terminology, structure, and format, so merging them poses a significant technical challenge. By combining EHR and Claims data through effective data harmonization and normalization, healthcare organizations can gain valuable insights into patient health outcomes, treatment effectiveness, and healthcare costs, leading to more informed decision-making and improved patient care.
Naga Ravi Teja D

GTM & Practice Director –



Data Models Used in the Healthcare Industry
Standardized data models are critical for ensuring the accuracy, consistency, and interoperability of healthcare data across an organization's systems and applications. The most common data models, exchange standards, vocabularies, and data resources used in the healthcare industry include the following:
1. Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM):
This is a widely used data model maintained by the Observational Health Data Sciences and Informatics (OHDSI) community. It standardizes clinical data across different sources, including electronic health records (EHRs), claims data, and registries. In addition, the OMOP CDM relies on standardized terminologies such as SNOMED CT, LOINC, and RxNorm.
2. Fast Healthcare Interoperability Resources (FHIR):
FHIR is a standard developed by Health Level Seven International (HL7) for exchanging healthcare information between different healthcare systems. It is designed to be flexible, extensible, and easy to implement, making it a popular choice for healthcare data exchange.
3. Clinical Document Architecture (CDA):
This standard, also developed by HL7, defines the structure of clinical documents. CDA is used to exchange clinical summaries, discharge summaries, and other clinical documents.
4. Logical Observation Identifiers Names and Codes (LOINC):
LOINC is a standardized vocabulary developed by the Regenstrief Institute to identify laboratory tests and clinical observations. It is used to standardize clinical data for research and quality improvement purposes.
5. Systematized Nomenclature of Medicine (SNOMED):
SNOMED is a standardized vocabulary for describing clinical concepts, developed by the International Health Terminology Standards Development Organisation (IHTSDO), which now operates as SNOMED International. It is generally used for data standardization in research and clinical decision-making.
6. Healthcare Cost and Utilization Project (HCUP):
HCUP is a collection of healthcare databases developed by the Agency for Healthcare Research and Quality (AHRQ), including inpatient, emergency department, and ambulatory surgery data. HCUP data is used for research and policy analysis.
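To make the relationship between an exchange standard and the vocabularies above more concrete, here is a minimal sketch of how coded concepts might be read from a FHIR-style Condition resource. The resource is hand-written for illustration and does not come from a real system; the patient reference, codes, and helper function names are assumptions, and the system URIs should be checked against the FHIR terminology documentation before use.

```python
# Hypothetical, hand-written FHIR-style Condition resource (illustrative only).
condition = {
    "resourceType": "Condition",
    "subject": {"reference": "Patient/example-123"},
    "code": {
        "coding": [
            {
                "system": "http://snomed.info/sct",
                "code": "44054006",
                "display": "Diabetes mellitus type 2",
            },
            {
                "system": "http://hl7.org/fhir/sid/icd-10-cm",
                "code": "E11.9",
                "display": "Type 2 diabetes mellitus without complications",
            },
        ]
    },
}


def extract_codings(resource):
    """Return (system, code) pairs found in the resource's code.coding list."""
    codings = resource.get("code", {}).get("coding", [])
    return [(c["system"], c["code"]) for c in codings]


def code_for_system(resource, system):
    """Return the first code that belongs to the given code system, or None."""
    for system_uri, code in extract_codings(resource):
        if system_uri == system:
            return code
    return None


snomed_code = code_for_system(condition, "http://snomed.info/sct")
```

A single clinical fact can carry several codings at once, which is why the vocabularies and the exchange format are usually discussed together.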
A Common Data Model (CDM) for Electronic Health Records (EHR) and Claims data standardizes medical concepts and data elements across different data sources. It harmonizes data generated during the delivery of healthcare in routine clinical settings, such as insurance billing claims, EHRs, and patient registries, and makes that data easier to use. The advantages of establishing a CDM for EHR and Claims data include the following:
Improved data quality:
Data standardization allows healthcare organizations to conduct collaborative research, facilitate large-scale analytics and share sophisticated tools and methodologies. Healthcare data can vary significantly from one organization to the next. Data collected for different purposes, such as provider reimbursement, clinical research, and direct patient care, can have different structures and codes. Establishing a CDM can help standardize data and improve data quality.
Enhanced research capabilities:
Linking EHR and claims data can help life sciences companies distinguish patient subgroups by treatment outcome, evaluate market performance and product safety, and better understand how physicians prescribe a commercial drug and what post-approval treatment patterns look like across multiple therapeutic areas. A CDM makes it easier to combine and analyze data from different sources, leading to more comprehensive and accurate research findings.
Better patient care and outcomes:
EHR data can provide reasonable predictions of adherence trajectory and may be helpful for providers seeking to deploy resource-intensive interventions. However, prior adherence information derived from claims is most predictive and can supplement EHR data when available. A CDM can help providers access complete and accurate data, leading to better patient care and outcomes.
The Observational Medical Outcomes Partnership (OMOP) Common Data Model is a standardized data model designed to address the challenges healthcare organizations face in managing large quantities of complex healthcare data. By using a common language and structure, OMOP enables seamless data exchange and interoperability across different systems and institutions, thereby improving the efficiency and accuracy of medical research. Applying it in a healthcare organization involves transforming raw clinical information into structured, machine-readable formats that can be easily analyzed and interpreted.
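The transformation step can be sketched as follows. This is a simplified illustration, not the actual OMOP table layout: the ConditionRecord class, the input field names (patient_id, snomed_code, onset, member_id, dx_code, service_dt), and the assumed date formats are all hypothetical. It shows one EHR row and one claims row being reshaped into the same structure.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class ConditionRecord:
    """Simplified common record; a real CDM table has many more fields."""
    person_id: int
    source_code: str
    source_vocabulary: str
    start_date: date
    origin: str  # "ehr" or "claims"


def normalize_icd10(code: str) -> str:
    """Upper-case an ICD-10 style code and put the dot after the third character."""
    code = code.strip().upper().replace(".", "")
    return code if len(code) <= 3 else f"{code[:3]}.{code[3:]}"


def from_ehr(row: dict) -> ConditionRecord:
    # Assumption: the EHR extract stores SNOMED codes and ISO dates (YYYY-MM-DD).
    return ConditionRecord(
        person_id=int(row["patient_id"]),
        source_code=row["snomed_code"].strip(),
        source_vocabulary="SNOMED",
        start_date=date.fromisoformat(row["onset"]),
        origin="ehr",
    )


def from_claim(row: dict) -> ConditionRecord:
    # Assumption: the claims feed stores ICD-10-CM codes and MM/DD/YYYY dates.
    return ConditionRecord(
        person_id=int(row["member_id"]),
        source_code=normalize_icd10(row["dx_code"]),
        source_vocabulary="ICD10CM",
        start_date=datetime.strptime(row["service_dt"], "%m/%d/%Y").date(),
        origin="claims",
    )


ehr_record = from_ehr({"patient_id": "42", "snomed_code": "44054006", "onset": "2023-03-10"})
claim_record = from_claim({"member_id": "42", "dx_code": "e119", "service_dt": "03/15/2023"})
```

Once both sources share one shape, the remaining difference is the vocabulary of the source code, which is what the concept mapping step addresses.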
OMOP also provides tools for generating high-quality evidence from observational studies, enabling clinicians to make more informed decisions about patient care. Adopting this innovative approach to data management has significant implications for optimizing treatment outcomes while minimizing costs associated with unnecessary tests or procedures. OMOP allows healthcare organizations to leverage big data in previously impossible ways, leading to better patient health outcomes.
Integrating Vocabularies and Standardization of Healthcare Concepts
Integrating vocabularies and standardizing healthcare concepts are essential for ensuring consistent and accurate data collection and analysis. One approach to achieving this is using a terminology service such as Athena. Athena is a web-based terminology service that provides access to standardized vocabularies, codes, and terminologies used in clinical care, research, and public health reporting. It allows users to search, browse, and download various medical terminologies, including SNOMED CT, LOINC, and RxNorm. Healthcare organizations can then map their local terminology to these standard terminologies. The mapping process involves assigning a code from a standard terminology to each local term used within the organization. It can be done manually or through automated tools that use natural language processing (NLP) and machine learning algorithms.
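A minimal sketch of the mapping step, assuming a small hand-curated lookup table, might look like this. The table contents, the term normalization rule, and the function names are illustrative assumptions; a production mapping would be built from downloaded vocabulary files and reviewed by clinical coders, and the codes shown should be verified against the vocabularies themselves.

```python
import re

# Hypothetical local-term lookup (illustrative codes; verify before use).
LOCAL_TO_STANDARD = {
    "type 2 diabetes": ("SNOMED", "44054006"),
    "t2dm": ("SNOMED", "44054006"),
    "hba1c": ("LOINC", "4548-4"),
}


def normalize_term(term: str) -> str:
    """Lower-case the term, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", term.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def map_terms(terms):
    """Split local terms into mapped (term -> standard code) and unmapped lists."""
    mapped, unmapped = {}, []
    for term in terms:
        key = normalize_term(term)
        if key in LOCAL_TO_STANDARD:
            mapped[term] = LOCAL_TO_STANDARD[key]
        else:
            unmapped.append(term)  # left for manual review
    return mapped, unmapped


mapped, unmapped = map_terms(["Type-2  Diabetes", "HbA1c", "Chest pain NOS"])
```

Keeping the unmapped terms in a separate list reflects the manual review path described above: exact lookup handles the easy cases, and anything it misses goes to a person or a more sophisticated matching tool.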
Once mapped, the standardized codes can be used to query, analyze, and exchange data between healthcare systems and organizations. This approach helps healthcare organizations keep their terminology consistent with national standards and improves system interoperability. Other methodologies for integrating vocabularies and standardizing healthcare concepts include clinical decision support systems, electronic health record (EHR) systems, and data warehouses. These systems can automatically map local terms to standard terminologies and provide decision support based on standardized codes. Additionally, clinical documentation improvement programs can standardize the language used in clinical documentation to improve coding accuracy and ensure consistent data collection.
Streamlining Healthcare Data with GraphDBs, Knowledge Graphs, Ontologies, Cloud, and AI
Combining Electronic Health Records (EHR) and Claims Data can be complex, as the two data types have different structures, formats, and vocabularies. However, advanced technologies such as GraphDBs, Knowledge Graphs, Ontologies, Cloud, and AI can help standardize terminology, identify correlations and relationships, and provide a unified and linked format for storing and managing data.
Graph databases (GraphDBs) are well suited to storing and managing structured and unstructured data collected from disparate sources. They are particularly useful for handling complex data relationships and can help identify patterns and trends within the data. In the context of EHR and Claims Data, GraphDBs can store patient, provider, and claims data in a unified and linked format. Because relationships are stored explicitly, it becomes easier to traverse the links between patients, providers, and claims.
Knowledge Graphs can then be used to identify correlations between different types of data, such as patient demographics, medical conditions, and treatments received.
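The idea of a linked format can be illustrated with a small in-memory graph built from subject-predicate-object triples. This is only a sketch using plain Python dictionaries rather than a real graph database; the node identifiers, relationship names, and data are all hypothetical.

```python
# Hypothetical triples linking EHR facts (conditions) and claims facts (billed drugs).
TRIPLES = [
    ("patient:1", "HAS_CONDITION", "condition:44054006"),
    ("patient:1", "HAS_CLAIM", "claim:A"),
    ("claim:A", "BILLED_FOR", "drug:metformin"),
    ("patient:2", "HAS_CONDITION", "condition:44054006"),
    ("patient:2", "HAS_CLAIM", "claim:B"),
    ("claim:B", "BILLED_FOR", "drug:insulin"),
    ("patient:3", "HAS_CLAIM", "claim:C"),
    ("claim:C", "BILLED_FOR", "drug:metformin"),
]


def build_graph(triples):
    """Index triples as node -> relationship -> set of target nodes."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, {}).setdefault(predicate, set()).add(obj)
    return graph


def patients_with_condition_and_drug(graph, condition, drug):
    """Find patients who have the condition and at least one claim billing the drug."""
    matches = []
    for node, edges in graph.items():
        if not node.startswith("patient:"):
            continue
        if condition not in edges.get("HAS_CONDITION", set()):
            continue
        for claim in edges.get("HAS_CLAIM", set()):
            if drug in graph.get(claim, {}).get("BILLED_FOR", set()):
                matches.append(node)
                break
    return sorted(matches)


graph = build_graph(TRIPLES)
result = patients_with_condition_and_drug(graph, "condition:44054006", "drug:metformin")
```

The query crosses from a clinical fact to a billing fact in a single traversal, which is the kind of question that is awkward to answer while the two sources remain separate.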
Furthermore, Ontologies can be used to standardize medical terminologies and concepts, making it easier to compare and analyze data from different sources. Cloud computing can be leveraged to provide a scalable and flexible platform for storing and processing large volumes of data. Cloud-based solutions can store and manage EHR and Claims Data, keeping it secure while making it accessible to authorized users.
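One way an ontology supports comparison across sources is by letting specific concepts roll up to broader ones. The sketch below assumes a tiny, hand-written "is a" hierarchy in which each concept has a single parent; real ontologies such as SNOMED CT allow multiple parents, and the concept names and function names here are illustrative only.

```python
# Hypothetical, simplified "is a" hierarchy: child concept -> parent concept.
IS_A = {
    "type 2 diabetes mellitus": "diabetes mellitus",
    "type 1 diabetes mellitus": "diabetes mellitus",
    "diabetes mellitus": "endocrine disorder",
}


def ancestors(concept, parents):
    """Walk up the hierarchy and return ancestors from nearest to most general."""
    found = []
    while concept in parents:
        concept = parents[concept]
        if concept in found:  # guard against accidental cycles in the table
            break
        found.append(concept)
    return found


def is_a(concept, ancestor, parents):
    """True if the concept equals the ancestor or descends from it."""
    return concept == ancestor or ancestor in ancestors(concept, parents)


lineage = ancestors("type 2 diabetes mellitus", IS_A)
```

With a roll-up like this, records coded at different levels of detail in EHR and claims data can be grouped under the same broader concept for analysis.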
Finally, AI can be used to identify patterns, correlations, and trends within the stored data. AI can also classify and categorize data for advanced analytics. By leveraging these technologies, healthcare organizations can gain valuable insights into patient care and treatment, leading to improved patient outcomes.
Merging EHR and claims data offers significant opportunities to improve patient outcomes, reduce costs, and enhance the overall quality of care in the healthcare industry. However, this can only be achieved through effective data harmonization and normalization processes. By establishing a common data model for EHR and claims data, healthcare organizations can unlock valuable insights that help drive better decision-making across all levels of care delivery. For example, using OMOP to transform disparate medical terminologies into a standardized language makes it easier to integrate vocabularies from various sources while ensuring consistency in meaning. Moreover, terminology services such as Athena support the standardization of healthcare concepts, which is critical in reducing errors associated with ambiguous or conflicting terms. By implementing these best practices for combining EHR and claims data, healthcare providers can make well-informed decisions that lead to better clinical outcomes for patients while driving down costs.