Data Harmonisation: Steps, Techniques, and Best Practices
Introduction
In the digital age, data is often collected from multiple sources, leading to variability in formats, standards, and quality. Data harmonisation addresses these issues by transforming disparate data into a cohesive dataset, enabling better analysis, insights, and decision-making. It is essential for organisations looking to leverage their data assets across diverse systems and platforms.
Data harmonisation involves several key steps: preparing, transforming, and validating data. Additionally, it’s built on a foundation of best practices that ensure the integrity, accuracy, and usability of the harmonised data.
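As a rough illustration of the three steps named above, here is a minimal sketch in plain Python. The two sources, the field names (`height_cm`, `height_in`) and the plausibility range are assumptions made up for this example; they do not come from Harmony or from any particular dataset.

```python
# Hypothetical records from two sources that follow different conventions.
source_a = [{"id": "a1", "height_cm": "172"}, {"id": "a2", "height_cm": " 180 "}]
source_b = [{"id": "b1", "height_in": "68"}, {"id": "b2", "height_in": ""}]

def prepare(record):
    """Preparing: strip whitespace and turn empty strings into None."""
    cleaned = {}
    for key, value in record.items():
        value = value.strip()
        cleaned[key] = value if value else None
    return cleaned

def transform(record):
    """Transforming: map both layouts onto one schema (id, height_cm)."""
    if "height_cm" in record:
        raw = record["height_cm"]
        height = float(raw) if raw is not None else None
    else:
        raw = record["height_in"]
        # Convert inches to centimetres.
        height = round(float(raw) * 2.54, 1) if raw is not None else None
    return {"id": record["id"], "height_cm": height}

def validate(record):
    """Validating: return a list of problems; an empty list means it passed."""
    problems = []
    if record["height_cm"] is None:
        problems.append("missing height")
    elif not 50 <= record["height_cm"] <= 250:  # assumed plausible range
        problems.append("height out of range")
    return problems

harmonised = [transform(prepare(r)) for r in source_a + source_b]
issues = {r["id"]: validate(r) for r in harmonised if validate(r)}
```

Keeping each step as its own function makes it easier to see which stage introduced a problem when a record fails validation.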
Ideas
Project ideas for future development of Harmony
Accepted project ideas
Below you can see the list of project ideas aligned with Harmony’s standards, all designed to elevate the functionality and accessibility of Harmony. These proposals aim to fortify Harmony as a comprehensive tool for researchers navigating questionnaire item harmonisation across diverse studies. Each idea maintains a clear scope, typically avoiding extensive overhauls. Quick-start guidelines and beginner-friendly tasks are provided for each idea.
Harmony on Kaggle
Harmony launches on Kaggle!
We are proud to have launched our first competition on Kaggle!
The primary challenge of this competition is to develop an AI tool or method that can accurately extract questionnaire questions from documents, primarily PDFs.
This competition offers a unique opportunity for participants to contribute to the field of natural language processing and document analysis as well as open source for social science while developing solutions that have real-world applications.
Data Standardisation vs Harmonisation - The Right Things at the Right Times
In the evolving landscape of data management, two concepts often come to the forefront: data standardisation and data harmonisation. Both play critical roles in how organisations manage and utilise their data, but they serve different purposes and are applicable in various contexts. This article delves into the nuances of each concept, particularly focusing on their significance in business and scientific environments.
What is data harmonisation - and why it matters in 2024
Data Harmonisation: Unifying Data for Deeper Insights
What is Data Harmonisation?
In today’s data-driven world, data harmonisation has become increasingly important. With data coming from disparate sources, it’s essential to ensure that this information is consistent, accurate, and usable. For example, in a large social science study, such as a longitudinal study or meta-analysis, a researcher will often want to combine data from different studies.
We can make data comparable by recoding variables from different studies, modifying them, or identifying which variables in one study match variables in another study.
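The idea of recoding variables and matching them across studies can be sketched as follows. The variable names, the two response scales and the mapping between studies are all hypothetical, chosen only to show the technique; in a real project the mapping would come from a careful comparison of the instruments.

```python
# Assumption: study A codes agreement from 1 to 5, while study B codes the
# same construct from 0 to 4 in reverse order (0 = strongest agreement).
def recode_study_b(value):
    """Map study B's reversed 0-4 scale onto study A's 1-5 scale."""
    return 5 - value

# Hypothetical lookup stating which variable in study A matches which in study B.
variable_map = {"anx_q1": "worry_item_3", "anx_q2": "worry_item_7"}

# One made-up respondent from each study.
study_a = {"anx_q1": 4, "anx_q2": 2}
study_b = {"worry_item_3": 1, "worry_item_7": 3}

# Build a comparable view keyed by study A's variable names.
comparable = {
    a_name: {
        "study_a": study_a[a_name],
        "study_b": recode_study_b(study_b[b_name]),
    }
    for a_name, b_name in variable_map.items()
}
```

After recoding, the values from both studies sit on the same 1 to 5 scale and can be analysed side by side.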
How can I contribute to an open source project?
Your guide to contributing to open source projects
Are you feeling intimidated by the thought of stepping into the world of open source contributions? You don’t have to be an expert to help.
You might find this guide helpful: https://opensource.guide/how-to-contribute as well as the Reddit Opensource community.
1. Start small, think big:
Don’t feel pressured to tackle the most complex issues right away.
Formatting help
How should I format my file for Harmony?
Harmony supports the following file types:
Word - download an example Word doc formatted for Harmony
Excel - download an example Excel spreadsheet formatted for Harmony with two tabs for two questionnaires
CSV - download an example tab-separated CSV file formatted for Harmony
PDF - download an example tabular PDF document formatted for Harmony
If you want to upload multiple questionnaires in a single file, you can use Excel format and put them in separate tabs.
Troubleshooting Harmony
Harmony didn’t find anything in my PDF
Harmony tends to perform better if you upload a file with item numbers.
If there are no question numbers in the instrument, it’s very hard for Harmony to distinguish question text from other content such as the copyright information. Click here to see an example PDF with question numbers included.
Also, if your PDF is a scanned document, please see if you can find a fully digitised (OCR’ed) version of the document.
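To see why item numbers help, consider the simple sketch below. It is not Harmony's actual extraction logic, only an illustration under the assumption that each question sits on its own line and starts with a number followed by `.` or `)`. The sample text is made up.

```python
import re

# Matches lines that start with an item number such as "1.", "2)" or "12."
ITEM_PATTERN = re.compile(r"^\s*(\d+)[.)]\s+(.+)$")

def extract_numbered_items(text):
    """Return (number, question text) pairs for lines that look like numbered items."""
    items = []
    for line in text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match:
            items.append((int(match.group(1)), match.group(2).strip()))
    return items

# Made-up document text: a title, a copyright line, and two numbered items.
sample = """Example Questionnaire
Copyright notice: all rights reserved.
1. Feeling nervous, anxious or on edge
2) Not being able to stop or control worrying
"""

questions = extract_numbered_items(sample)
```

With numbers present, the title and copyright line are skipped without any further analysis. Without them, a rule this simple has nothing to separate questions from surrounding text.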
Making Harmony sustainable long-term
Longevity is a tricky topic in software development.
We’ve been thinking about how we can make sure that Harmony continues to operate for a long time in the future, since Harmony is intended as a public good for researchers to use with no strings attached (an open source tool for social science).
Sustainability assessment
In April 2023, we completed the software sustainability assessment with the Software Sustainability Institute, which gave us 29 recommended improvements to make Harmony more sustainable.
Harmony update: new features and bug fixes
We are pleased to announce the release of a new update to Harmony, our open source online platform for harmonising question items. This update includes a number of new features and bug fixes, designed to improve the user experience and make Harmony even more useful for researchers.
New features:
Complete reworking of the search functionality: The search functionality in Harmony has been completely rewritten to support Lucene-like queries.