
Harmony: Natural Language Processing Tool for Item Harmonisation is now on CRAN


We are excited to announce that Harmony, an open source Natural Language Processing tool for data harmonisation, is now available on the Comprehensive R Archive Network (CRAN)!

Previously, Harmony R could be installed using devtools.

Harmony can be used to compare questionnaire items across studies, find the best match for a set of items, and identify different versions of the same questionnaire. Harmony is a collaborative project between Ulster University, University College London, the Universidade Federal de Santa Maria, and Fast Data Science. It has been funded by Wellcome as part of the Wellcome Data Prize in Mental Health, as well as by UKRI.

To install Harmony, you can use the following command in your R console or RStudio:

install.packages("harmonydata")

We encourage you to try Harmony and let us know what you think! You can also follow us on Twitter @harmonydata for updates.


Are you excited to use Harmony to harmonise your instruments?

Here is a quick walkthrough on how to do it:

  1. Import the Harmony library:
library(harmonydata)
  2. Load the instruments from a file or URL:
instrument <- load_instruments_from_file(path = "examples/GAD-7.pdf")
instrument_2 <- load_instruments_from_file("https://medfam.umontreal.ca/wp-content/uploads/sites/16/GAD-7-fran%C3%A7ais.pdf")
instruments <- append(instrument, instrument_2)
  3. Match the instruments:
match <- match_instruments(instruments)
  4. Get the results of the match:
names(match)
#> [1] "questions"        "matches"          "query_similarity"

As the output shows, the match object has three components: questions, matches and query_similarity. Together they hold the information needed to find the best match for each question, which you can use to harmonise the instruments and make them more comparable.
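To illustrate how a result like this might be inspected, here is a minimal sketch in base R. It assumes that `matches` is a square numeric matrix of pairwise similarity scores whose rows and columns line up with `questions`; that structure is an assumption, so run `str(match)` on your own result first. The item labels and scores below are made up for illustration, and `best_matches()` is a hypothetical helper, not part of harmonydata.

```r
# Mock stand-ins for match$questions and match$matches.
# Labels and scores are invented for illustration only.
questions <- c(
  "GAD-7 EN item 1",
  "GAD-7 EN item 2",
  "GAD-7 FR item 1",
  "GAD-7 FR item 2"
)

similarity <- matrix(
  c(1.00, 0.35, 0.90, 0.30,
    0.35, 1.00, 0.28, 0.88,
    0.90, 0.28, 1.00, 0.33,
    0.30, 0.88, 0.33, 1.00),
  nrow = 4, byrow = TRUE,
  dimnames = list(questions, questions)
)

# Hypothetical helper: for each question, find the other question
# with the highest similarity score.
best_matches <- function(similarity, labels = rownames(similarity)) {
  stopifnot(is.matrix(similarity), nrow(similarity) == ncol(similarity))
  sim <- similarity
  diag(sim) <- -Inf  # ignore each question's similarity with itself
  best <- apply(sim, 1, which.max)  # column index of the top score per row
  data.frame(
    question = labels,
    best_match = labels[best],
    similarity = sim[cbind(seq_len(nrow(sim)), best)],
    stringsAsFactors = FALSE
  )
}

result <- best_matches(similarity)
print(result)
```

In this mock data each English item pairs with its French counterpart. With a real result you could pass your own similarity matrix and question labels to the helper, once you have confirmed that they have the shape assumed here.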

We hope this walkthrough is helpful. Let us know if you have any other questions.

We are excited to see what you can do with Harmony!

