In healthcare, integrating and harmonising data across public and private records is crucial for advancing patient care, research, and policy-making. Data harmonisation brings together diverse sources, from public health records to private medical data, and standardizes them into a unified form that supports more comprehensive insights. This blog explores the concept, significance, and methodologies of data harmonisation in healthcare, referencing key sources and projects in the field.
Data harmonisation involves the alignment of datasets from various sources to create a coherent, unified view of information. It encompasses three primary dimensions: syntax (data format), structure (conceptual schema), and semantics (intended meaning of words). Syntax refers to the technical formats of data, such as CSV, JSON, or HTML. The structure pertains to how variables relate within a dataset, ranging from highly organized structured data to unstructured data with no fixed format. Semantics involve the interpretation of variables to ensure they measure the intended concepts consistently across datasets.
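The Python sketch below makes the three dimensions concrete by harmonising two invented exports into one shared schema. All source layouts, field names and code mappings are hypothetical.

- Source A is a flat CSV file (syntax) that records sex as "M"/"F" and weight in pounds.
- Source B is nested JSON (syntax and structure) that uses "male"/"female" and kilograms.
- The sketch therefore also has to reconcile meaning and units (semantics).

```python
import csv
import io
import json

# Hypothetical source A: flat CSV with its own column names, codes and units.
SOURCE_A_CSV = """patient_id,sex,weight_lb
A001,M,154.3
A002,F,132.0
"""

# Hypothetical source B: nested JSON with a different structure.
SOURCE_B_JSON = """[
  {"id": "B001", "demographics": {"gender": "female"}, "vitals": {"weight_kg": 61.2}},
  {"id": "B002", "demographics": {"gender": "male"}, "vitals": {"weight_kg": 80.5}}
]"""

# Semantic mapping of each source's codes onto one shared code set.
SEX_CODES = {"M": "male", "F": "female", "male": "male", "female": "female"}
LB_TO_KG = 0.45359237  # exact definition of the international pound


def from_source_a(text):
    """Parse CSV rows, recode sex and convert pounds to kilograms."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "patient_id": row["patient_id"],
            "sex": SEX_CODES[row["sex"]],
            "weight_kg": round(float(row["weight_lb"]) * LB_TO_KG, 1),
        })
    return records


def from_source_b(text):
    """Parse JSON and flatten the nested objects into the shared schema."""
    return [
        {
            "patient_id": item["id"],
            "sex": SEX_CODES[item["demographics"]["gender"]],
            "weight_kg": item["vitals"]["weight_kg"],
        }
        for item in json.loads(text)
    ]


harmonised = from_source_a(SOURCE_A_CSV) + from_source_b(SOURCE_B_JSON)
for record in harmonised:
    print(record)
```

Each source gets its own small adapter function. Adding a third source therefore means writing one more adapter, not changing the shared schema.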
Data harmonisation significantly enhances the quality of research by allowing data from various sources to be aggregated for more robust analysis. Combining and comparing data from different studies or healthcare systems increases the sample size, and therefore the statistical power, leading to more reliable and generalizable conclusions. This matters particularly in healthcare, where research findings often inform clinical guidelines and treatment protocols. Harmonised data can reveal trends and patterns that might be obscured in smaller, fragmented datasets, helping researchers better understand diseases, treatments, and patient outcomes. Beyond research, harmonisation underpins interoperability, cost efficiencies, and the implementation of evidence-based policies and practices. With careful methodological planning and collaboration among stakeholders, harmonised data can become a powerful tool for improving health outcomes and the operation of healthcare systems.
Harmonisation methods in data science and healthcare research aim to standardize disparate data sources to ensure consistency, comparability, and reliability across datasets. These methods are critical in the context of big data and the increasing reliance on electronic health records (EHRs), where data is often collected from various sources with different standards and formats. Harmonisation can be approached retrospectively, after data collection, or prospectively, before data collection begins. The choice between these approaches depends on the constraints of the existing datasets and the theoretical frameworks guiding the research or clinical objectives.
In data harmonisation, quality matters more than quantity. Accurate, reliable data is essential for developing algorithms that are genuinely effective and lead to meaningful insights. Each data point collected and integrated into a larger dataset should meet a high quality standard, because data quality directly affects the performance of machine learning models and other AI tools. Poor-quality data can lead to inaccurate predictions, biased outcomes, and ultimately decisions that do not serve patients or research objectives.
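This sketch shows two basic quality checks worth running before harmonised data feeds a model:

- **Completeness:** the share of non-missing values per field.
- **Plausibility:** whether values fall inside an expected range.

The records, the 0.8 completeness threshold and the ranges are assumptions for illustration, not clinical standards.

```python
# Hypothetical pooled records; None marks a missing value.
records = [
    {"patient_id": "P1", "age": 34, "systolic_bp": 120, "smoking_status": "never"},
    {"patient_id": "P2", "age": None, "systolic_bp": 135, "smoking_status": None},
    {"patient_id": "P3", "age": 51, "systolic_bp": 1200, "smoking_status": None},
    {"patient_id": "P4", "age": 47, "systolic_bp": 128, "smoking_status": "former"},
]

# Assumed plausibility limits for illustration only; real limits need clinical input.
PLAUSIBLE_RANGES = {"age": (0, 120), "systolic_bp": (50, 260)}


def completeness(rows):
    """Fraction of non-missing values per field."""
    fields = rows[0].keys()
    return {f: sum(r[f] is not None for r in rows) / len(rows) for f in fields}


def incomplete_fields(rows, threshold=0.8):
    """Fields whose completeness falls below the (assumed) threshold."""
    return sorted(f for f, share in completeness(rows).items() if share < threshold)


def implausible_values(rows):
    """(patient_id, field, value) triples that fall outside the plausible range."""
    flagged = []
    for r in rows:
        for field, (low, high) in PLAUSIBLE_RANGES.items():
            value = r[field]
            if value is not None and not low <= value <= high:
                flagged.append((r["patient_id"], field, value))
    return flagged


print(completeness(records))
print(incomplete_fields(records))
print(implausible_values(records))
```

On these records, the checks report that `age` and `smoking_status` fall below the threshold. They also flag P3's systolic pressure of 1200 as implausible, most likely a data-entry error.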
Machine learning and AI play a pivotal role in the cleaning and harmonising of data, making it more usable for predictive analytics. These technologies can automatically detect inconsistencies, outliers, and errors that would be time-consuming and challenging to identify manually. By employing advanced algorithms, AI can also suggest harmonisation strategies, such as normalization techniques and data transformation methods, to ensure that disparate data sources are compatible and can be analysed together. This not only improves the quality of the data but also accelerates the harmonisation process, enabling researchers and clinicians to focus on deriving insights and making informed decisions.
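The sketch below shows the kind of automated check and normalisation step described here. It deliberately uses simple statistics rather than a trained ML model:

- A modified z-score based on the median absolute deviation flags suspicious values. This is a common robust outlier rule, used here with the conventional 3.5 cut-off.
- Min-max scaling then rescales the remaining values to [0, 1].

The heart-rate readings are invented.

```python
import statistics


def robust_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median absolute deviation based) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to judge against, so flag nothing
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def min_max_scale(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical heart-rate readings (beats per minute); 720 looks like a keying error.
heart_rates = [72, 75, 70, 68, 74, 71, 720]
outliers = robust_outliers(heart_rates)
cleaned = [v for v in heart_rates if v not in outliers]
scaled = min_max_scale(cleaned)
print(outliers)
print([round(s, 3) for s in scaled])
```

In practice, flagged values would usually go to a reviewer rather than being deleted automatically, because an extreme value can be clinically real.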
Interoperability standards, such as Health Level Seven International’s Fast Healthcare Interoperability Resources (HL7 FHIR), are critical in enabling data exchange and integration across different healthcare systems. These standards provide a framework for the representation and exchange of healthcare information, facilitating the seamless sharing of patient data among providers, researchers, and other stakeholders. By adhering to such standards, healthcare organizations can ensure that data from various sources can be harmonised more efficiently, enabling more comprehensive and accurate analyses.
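To show what the standard looks like in practice, this Python sketch builds a minimal FHIR R4 `Observation` for body weight as a plain dictionary and serialises it to JSON. It uses the LOINC code 29463-7 (Body weight) and a UCUM unit. The patient ID, value and date are placeholders. A production system would more likely use a FHIR library and validate against the relevant profiles. For example, the vital-signs profile adds required elements such as a category.

```python
import json


def body_weight_observation(patient_id, weight_kg, when):
    """Build a minimal FHIR R4 Observation for body weight as a plain dict."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "29463-7",
                "display": "Body weight",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when,
        "valueQuantity": {
            "value": weight_kg,
            "unit": "kg",
            "system": "http://unitsofmeasure.org",  # UCUM
            "code": "kg",
        },
    }


obs = body_weight_observation("example", 70.0, "2024-03-01")
print(json.dumps(obs, indent=2))
```

Because every source emits the same resource shape and codes, a receiving system can pool weights from many providers without source-specific parsing.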
Advanced data management technologies, including cloud-based platforms and electronic health records (EHRs), make it possible to manage large volumes of disparate data. Cloud-based solutions offer scalable, flexible environments for storing and processing vast amounts of data, providing the infrastructure that effective data harmonisation needs. EHRs, in turn, collect and organize patient data in a structured format, making it more accessible for analysis and harmonisation. When properly configured, these technologies support the integration of diverse data sources and can also strengthen the security and privacy of sensitive health information.
The Social Determinants of Health (SDOH) refer to the conditions in which people are born, grow, live, work, and age, including factors like socioeconomic status, education, neighborhood and physical environment, employment, and social support networks, as well as access to healthcare. SDOH significantly impact health outcomes, with a growing body of evidence suggesting that they may account for a substantial part of health disparities and inequities observed across different populations.
The Gravity Project, an HL7 FHIR Accelerator initiative supported by the Office of the National Coordinator for Health Information Technology (ONC), focuses on developing consensus-driven standards for documenting, coding, and exchanging SDOH data within health records. By incorporating SDOH data into EHRs, the project aims to improve health outcomes by enabling healthcare providers to consider and address the broader social and environmental factors affecting their patients’ health. The standards developed by the Gravity Project are expected to support more holistic patient care and to advance health equity by helping identify and address the root causes of health disparities.
Harmonising health data presents a complex set of challenges, reflecting the diversity and intricacy of healthcare information systems. Addressing these challenges is critical for ensuring that data harmonisation efforts lead to improved healthcare outcomes, enhanced research capabilities, and more informed policy-making. Here’s a closer look at some of the primary challenges involved in healthcare data integration:
Diverse Data Sources: The complexity of healthcare data arises from its diverse sources, varying formats, and the different standards employed by public and private health records. Data collected from hospitals, clinics, laboratories, and patient-reported outcomes can differ significantly, not just in format but also in the detail and context of the information captured. Public health records may focus on population-level data, while private health records often contain detailed individual patient data. Harmonising these sources requires a deep understanding of the context and purpose of each data type and the development of methods to reconcile these differences without losing the nuances that make the data valuable.
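Differences in detail and context often show up as units. This Python sketch converts glucose results reported in mg/dL or mmol/L into a single target unit, keeping the source for provenance. The records are invented. The conversion factor follows from glucose's molar mass of about 180.16 g/mol. Unknown units raise an error rather than being guessed.

```python
# Factors that convert each reporting unit to mmol/L for glucose.
# mg/dL -> mg/L is x10; mg/L -> mmol/L divides by the molar mass (~180.16 g/mol).
GLUCOSE_FACTORS = {"mmol/L": 1.0, "mg/dL": 10 / 180.16}


def glucose_mmol_per_l(value, unit):
    """Return a glucose result in mmol/L, rounded to one decimal place."""
    try:
        factor = GLUCOSE_FACTORS[unit]
    except KeyError:
        raise ValueError(f"unsupported glucose unit: {unit!r}") from None
    return round(value * factor, 1)


# Hypothetical results for one analyte reported by two sources in different units.
results = [
    {"source": "hospital_lab", "value": 90, "unit": "mg/dL"},
    {"source": "gp_practice", "value": 5.4, "unit": "mmol/L"},
]
harmonised = [
    {**r, "value": glucose_mmol_per_l(r["value"], r["unit"]), "unit": "mmol/L"}
    for r in results
]
print(harmonised)
```

Conversion factors are analyte-specific. A real pipeline would keep one table per test, ideally keyed by a standard code such as LOINC, rather than one global factor.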
Data Privacy and Security: Data privacy and security are paramount concerns in the sharing and integration of health information. Legal frameworks such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States set strict guidelines for handling personal health information. The ethical considerations of maintaining patient confidentiality and ensuring data is used responsibly complicate the harmonisation process. Addressing these concerns involves implementing robust security measures, ensuring compliance with legal requirements, and often navigating complex consent and data governance frameworks to enable the ethical use of data.
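One technical measure often used when linking records across organisations is pseudonymisation with a keyed hash. This sketch normalises an identifier and replaces it with an HMAC-SHA-256 value. As a result, the same patient receives the same pseudonym in every dataset processed with the same secret key, and the raw identifier is never shared.

The identifier format and key are placeholders. On its own, keyed hashing does not guarantee that a dataset counts as de-identified under HIPAA. Key management and governance still matter.

```python
import hashlib
import hmac


def normalise_identifier(raw):
    """Remove all whitespace and upper-case, so formatting differences do not change the pseudonym."""
    return "".join(raw.split()).upper()


def pseudonymise(raw_identifier, secret_key):
    """Return a hex HMAC-SHA-256 pseudonym for the normalised identifier."""
    message = normalise_identifier(raw_identifier).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Placeholder key: in practice it would be generated securely and held by a trusted party.
KEY = b"replace-with-a-securely-generated-secret"

hospital_token = pseudonymise("MRN-00123", KEY)
registry_token = pseudonymise(" mrn-00123 ", KEY)
print(hospital_token == registry_token)
```

A keyed HMAC is used rather than a plain hash because identifiers such as medical record numbers come from small, guessable spaces. An unkeyed hash could be reversed simply by hashing every possible value.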
Technological Barriers: Technological barriers pose significant challenges to healthcare data integration. Many healthcare systems still rely on legacy systems that were not designed with interoperability in mind, making it difficult to extract, share, and integrate data across different platforms. The need for interoperable solutions that can communicate across various healthcare information systems is critical. Overcoming these barriers requires investment in technology upgrades and the adoption of standards like HL7 FHIR, which facilitate the seamless exchange of health information. However, the cost and complexity of implementing these solutions can be prohibitive for many organizations.
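This deliberately simplified Python sketch illustrates the kind of translation layer that bridges legacy systems and FHIR. It converts an HL7 version 2 `PID` (patient identification) segment into a FHIR `Patient` resource. The sample segment is invented. The parser ignores real-world details such as escape sequences, repeating fields and partial dates. Production integrations usually rely on an interface engine or a dedicated HL7 library.

```python
# Invented sample: fields separated by "|", components by "^".
PID_SEGMENT = "PID|1||12345^^^HOSP^MR||Doe^John||19800101|M"

# HL7 v2 administrative sex codes mapped to FHIR gender codes (unmapped codes -> "unknown").
GENDER_MAP = {"M": "male", "F": "female", "O": "other", "U": "unknown"}


def pid_to_fhir_patient(segment):
    """Convert a simple PID segment to a minimal FHIR Patient dictionary."""
    fields = segment.split("|")
    if fields[0] != "PID":
        raise ValueError("not a PID segment")
    patient_id = fields[3].split("^")[0]   # PID-3: patient identifier list
    name_parts = fields[5].split("^")      # PID-5: family^given^...
    birth = fields[7]                      # PID-7: date of birth, assumed YYYYMMDD
    name = {"family": name_parts[0]}
    if len(name_parts) > 1 and name_parts[1]:
        name["given"] = [name_parts[1]]
    return {
        "resourceType": "Patient",
        "identifier": [{"value": patient_id}],
        "name": [name],
        "gender": GENDER_MAP.get(fields[8], "unknown"),  # PID-8: administrative sex
        "birthDate": f"{birth[:4]}-{birth[4:6]}-{birth[6:8]}",
    }


patient = pid_to_fhir_patient(PID_SEGMENT)
print(patient)
```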
To address these challenges, a systematic and collaborative approach is essential.
Despite these challenges, the benefits of harmonising health data are substantial. Successful harmonisation efforts can lead to more comprehensive and accurate datasets, enabling advanced research, more personalized patient care, and informed healthcare policies that can ultimately lead to improved health outcomes and reduced healthcare disparities.
Data harmonisation can be implemented retrospectively, after data collection, or prospectively, before data is collected. Retrospective harmonisation, also known as ex-post or output harmonisation, aligns existing datasets. Prospective harmonisation, or ex-ante/input harmonisation, involves planning data collection methods and standards in advance to ensure compatibility. Each approach has its merits, and the choice between them often depends on the goals of the harmonisation effort and the nature of the data involved.
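Prospective harmonisation often starts with a shared data dictionary that every site validates against at the point of entry. This Python sketch encodes a hypothetical dictionary; the field names, types, permitted codes and ranges are all invented. It reports non-conforming values before they enter the pooled dataset, which is cheaper than reconciling them retrospectively.

```python
# Hypothetical data dictionary agreed by all sites before collection begins.
DATA_DICTIONARY = {
    "participant_id": {"type": str},
    "sex": {"type": str, "allowed": {"female", "male", "other", "unknown"}},
    "age_years": {"type": int, "range": (0, 120)},
    "smoker": {"type": bool},
}


def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, rule in DATA_DICTIONARY.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        # Exact type check, so True is not accepted where an int is expected.
        if type(value) is not rule["type"]:
            problems.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: {value!r} is not a permitted value")
        if "range" in rule:
            low, high = rule["range"]
            if not low <= value <= high:
                problems.append(f"{field}: {value} outside {low}-{high}")
    for field in sorted(set(record) - set(DATA_DICTIONARY)):
        problems.append(f"{field}: not in the data dictionary")
    return problems


print(validate({"participant_id": "S1-001", "sex": "female", "age_years": 42, "smoker": False}))
print(validate({"participant_id": "S2-014", "sex": "F", "age_years": "42", "smoker": False, "bmi": 24.1}))
```

The second record would be sent back to the site for correction, since "F", a string age and an unagreed `bmi` field all break the shared dictionary.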
The process involves defining the scope of harmonisation, identifying relevant data sources, standardizing data formats and terminologies, and using technologies such as natural language processing (NLP) to improve data quality and consistency. Numerous initiatives support data harmonisation, such as the Common Data Model Harmonization project, which aims to enhance data utility and interoperability across healthcare research networks. Standards such as CDISC’s Clinical Data Acquisition Standards Harmonization (CDASH) and the NIH Common Data Elements (CDEs) help make registries interoperable.
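Terminology standardisation can be prototyped with very simple text normalisation before heavier NLP is involved. This sketch maps local lab-test labels to a small, hypothetical list of standard terms in three steps:

1. Lower-case the label and replace punctuation with spaces.
2. Sort the words, so "Glucose, blood" matches "blood glucose".
3. Use `difflib.get_close_matches` to catch spelling variants.

It is a stand-in for illustration, not a substitute for a curated terminology service. Real mappings to vocabularies such as LOINC or SNOMED CT need expert review.

```python
import difflib

# Hypothetical target vocabulary standing in for a real standard terminology.
STANDARD_TERMS = ["haemoglobin", "serum creatinine", "blood glucose", "total cholesterol"]


def normalise(text):
    """Lower-case, replace punctuation with spaces, and sort the words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(sorted(cleaned.split()))


# Index standard terms by their normalised form.
INDEX = {normalise(term): term for term in STANDARD_TERMS}


def map_term(local_label, cutoff=0.8):
    """Return the closest standard term, or None when nothing is similar enough."""
    matches = difflib.get_close_matches(normalise(local_label), list(INDEX), n=1, cutoff=cutoff)
    return INDEX[matches[0]] if matches else None


for label in ["Hemoglobin", "Glucose, blood", "CREATININE (SERUM)", "Ferritin"]:
    print(label, "->", map_term(label))
```

Returning `None` for "Ferritin" rather than forcing a match is deliberate: unmapped terms should be queued for a human to review.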
The BRCA Challenge, under the umbrella of the Global Alliance for Genomics and Health (GA4GH), aims to harmonise genetic data related to the BRCA1 and BRCA2 genes, known for their link to breast and ovarian cancer risk. The project seeks to aggregate and standardize genetic variants to improve the understanding and clinical interpretation of BRCA mutations globally.
Contributing data sources and the number of BRCA1/2 variants from each:
Institution | Country | Number of BRCA1/2 Variants |
---|---|---|
International Cancer Genome Consortium (ICGC) | Global | 10,000 |
The Cancer Genome Atlas (TCGA) | USA | 5,000 |
UK Biobank | UK | 2,000 |
Variant names before and after standardisation with a reference sequence accession:
Original Variant Name | Standardized Variant Name |
---|---|
c.181T>G (BRCA1) | NM_007294.3:c.181T>G |
p.Val61Leu (BRCA2) | NP_000056.3:p.Val61Leu |
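Part of the nomenclature standardisation in the table above can be automated. This sketch handles only simple coding-DNA substitutions such as `c.181T>G`. It prefixes each one with a RefSeq transcript so the notation is unambiguous.

The gene-to-transcript lookup is an assumption. NM_007294.3 (BRCA1) and NM_000059.3 (BRCA2) are commonly used references, but the accession and version must match the ones the reporting laboratory used. A real pipeline would validate variants with dedicated HGVS tooling.

```python
import re

# Assumed gene -> reference transcript lookup; versions must match the source laboratory.
TRANSCRIPTS = {"BRCA1": "NM_007294.3", "BRCA2": "NM_000059.3"}

# Simple coding-DNA substitution only, e.g. "c.181T>G"; other HGVS variant types are not handled.
SUBSTITUTION = re.compile(r"c\.(\d+)([ACGT])>([ACGT])")


def standardise_substitution(variant, gene):
    """Return 'TRANSCRIPT:c.POSREF>ALT' for a bare substitution on a known gene."""
    match = SUBSTITUTION.fullmatch(variant.strip())
    if match is None or match.group(2) == match.group(3):
        raise ValueError(f"unsupported or malformed variant: {variant!r}")
    if gene not in TRANSCRIPTS:
        raise ValueError(f"no reference transcript configured for {gene!r}")
    return f"{TRANSCRIPTS[gene]}:{match.group(0)}"


print(standardise_substitution("c.181T>G", "BRCA1"))
```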
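Clinical classification of the example variants under the American College of Medical Genetics and Genomics (ACMG) guidelines:
Variant | ACMG Classification | Clinical Significance |
---|---|---|
c.181T>G (BRCA1) | Pathogenic | High risk |
p.Val61Leu (BRCA2) | Uncertain significance | Further investigation needed |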
Key features of BRCA Exchange, the project’s public variant portal:
Platform Feature | Description |
---|---|
Variant search | Find specific BRCA1/2 variants by gene, position, or ID. |
Clinical annotations | View clinical data associated with specific variants. |
Download options | Download variant data in different formats. |
PEDSnet, a pediatric learning health system, aims to harmonise electronic health record (EHR) data from multiple pediatric hospitals across the United States to facilitate research and improve pediatric care.
These case studies provide tangible examples of how harmonisation efforts in healthcare can leverage real data to advance research, improve clinical care, and facilitate global collaboration. The BRCA Challenge and PEDSnet demonstrate the potential of harmonised data to impact patient care and outcomes significantly by enabling more informed research and clinical decisions.
Selected participating hospitals:
Participating Hospital | Location | Number of Pediatric Beds |
---|---|---|
Children’s Hospital of Philadelphia | Philadelphia, USA | 500 |
Seattle Children’s Hospital | Seattle, USA | 300 |
Boston Children’s Hospital | Boston, USA | 400 |
Core elements of the common data model (CDM):
CDM Element | Description |
---|---|
Patient demographics | Age, gender, ethnicity, etc. |
Diagnoses | ICD-10 codes for medical conditions. |
Procedures | CPT codes for medical procedures. |
Medications | Medications administered during hospitalization. |
Length of stay | Number of days spent in the hospital. |
Example mapping from a local EHR data element to the CDM:
Hospital | Local EHR Data Element | CDM Element | Mapping Process |
---|---|---|---|
Children’s Hospital of Philadelphia | “Patient Age” | “Patient_Age_Years” | Calculate age in years from date of birth |
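The “Patient Age” row implies deriving age from a stored date of birth. Here is a minimal Python sketch of that mapping, assuming the CDM wants age in completed years at a reference date such as the encounter date. The function name is hypothetical; this is not PEDSnet’s actual ETL code. Under this rule, a person born on 29 February has their birthday counted on 1 March in non-leap years.

```python
from datetime import date


def age_in_years(birth_date: date, reference_date: date) -> int:
    """Completed years of age on reference_date."""
    if reference_date < birth_date:
        raise ValueError("reference date precedes date of birth")
    # Subtract one year if this year's birthday has not yet been reached.
    birthday_reached = (reference_date.month, reference_date.day) >= (birth_date.month, birth_date.day)
    return reference_date.year - birth_date.year - (0 if birthday_reached else 1)


print(age_in_years(date(2015, 6, 15), date(2024, 6, 14)))
```

Storing the date of birth and computing age at each encounter avoids the stale values that arise when a single age figure is copied between systems.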
The harmonisation of health data involves contributions and responsibilities from both the public and private sectors, each bringing its own resources and perspectives to the process.
Collaboration between these sectors is essential for advancing healthcare research, improving patient care, and informing health policy for the benefit of society.
The integration of AI and ML into data harmonisation offers promising prospects for healthcare. These technologies can automate the identification and resolution of discrepancies across diverse data sources, reducing the time and resources needed for manual data cleaning and standardization. Such automation could not only speed up research and analysis but also improve the reliability of data-driven decisions in healthcare settings.
Harmonised health data, enriched through AI-driven analyses, has the potential to revolutionise predictive analytics and personalized medicine. By aggregating and standardizing vast amounts of health data, researchers and clinicians can develop more accurate predictive models for disease risk, treatment outcomes, and health trends. This, in turn, enables the creation of tailored treatment plans that consider the unique genetic makeup, lifestyle, and environmental factors of individual patients, moving healthcare closer to a truly personalized approach.
The evolving landscape of data harmonisation will likely necessitate changes in policy and regulation. Future policy directions may focus on enhancing data sharing frameworks while ensuring patient privacy and data security. Regulations may evolve to encourage the adoption of standardized data models across healthcare systems, promoting interoperability and reducing silos in healthcare data. However, these changes must balance the need for open, collaborative research with the imperative to protect sensitive health information.
The promise of data harmonisation extends beyond technological advancements, touching on key aspects of healthcare efficiency and equity. By making it easier to combine and analyse data from varied sources, harmonisation efforts can help identify and address disparities in healthcare access and outcomes. This can lead to more equitable healthcare systems where decisions are informed by comprehensive, high-quality data reflecting diverse patient populations.
Moreover, the efficiency gains from streamlined data processes can reduce healthcare costs. Because harmonised data supports quicker, more accurate analyses, healthcare providers can make informed decisions sooner, leading to better health outcomes and more efficient use of resources.
Data harmonisation in healthcare is a complex but essential process for integrating public and private health records. By standardizing data across different sources, healthcare providers and researchers can improve patient care, enhance research capabilities, and support effective public health interventions. While challenges remain, the strategic implementation of data harmonisation, together with collaboration among stakeholders, can lead to significant advances in how healthcare information is managed and used. Much of healthcare’s future depends on using technology to make sense of vast amounts of data, and the role of data harmonisation in realizing the potential of predictive analytics, personalized medicine, and equitable healthcare cannot be overstated. With ongoing advances in AI and ML, coupled with thoughtful changes to policy and regulation, the harmonisation of health data stands as a cornerstone of future healthcare innovation.
For further reading on data harmonisation and its impact on healthcare, source1, source2, and source3 provide comprehensive insights.
This exploration into data harmonisation in healthcare highlights the intricate process and its significant impact on healthcare management and patient care. The concerted effort of all stakeholders in this domain is vital for achieving a more integrated, effective, and equitable healthcare system.
This blog synthesizes information from comprehensive sources on data harmonisation in healthcare. For detailed insights and methodologies, refer to the original articles in Nature and BMC Medical Informatics and Decision Making.