Harmony R released!

We have developed an R package for Harmony and open-sourced it. To get started, you need R installed on your system.

Click here to try an example in Google Colab.

Here’s a Jupyter Notebook with an example using Harmony in R

Installing R library

We are currently submitting the R library to CRAN.

In the meantime, you can install the development version of harmonydata from GitHub. The documentation is in the README file.

You also need devtools, which may already be installed if you are using RStudio. If not, you can install devtools with the following command in the R console:

install.packages("devtools") # If you don't have devtools installed already.

Next, to install Harmony, run:

library(devtools)
devtools::install_github("harmonydata/harmony_r")
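If you are unsure whether the installation succeeded, a quick check like the one below can help. This is a minimal sketch using only base R; `is_installed` is a hypothetical helper written for this post, not a function from harmonydata or devtools.

```r
# Hypothetical helper (not part of harmonydata): TRUE if the package can be loaded.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Remind the user to install the package if it is missing.
if (!is_installed("harmonydata")) {
  message("harmonydata is not installed yet; run the install_github() command above.")
}
```

`requireNamespace()` only checks that the package can be loaded and does not attach it, so the check has no side effects on your session.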

Parsing a raw file into an Instrument

Let’s load Harmony and parse an instrument.

If you want to read in a raw (unstructured) PDF or Excel file, you can do this via a POST request to the REST API, which converts the file into an Instrument object in JSON. The function below returns the instrument as a list.

library(harmonydata)
instrument = load_instruments_from_file(path = "examples/GAD-7.pdf")
names(instrument[[1]])
#>  [1] "file_id"         "instrument_id"   "instrument_name" "file_name"      
#>  [5] "file_type"       "file_section"    "study"           "sweep"          
#>  [9] "metadata"        "language"        "questions"
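Once you have the list, you will probably want to look at the questions themselves. The sketch below uses a mock instrument so that it runs without calling the API. The top-level names match the output above, but the fields inside each question (`question_no`, `question_text`) are assumptions about the structure, so check `str(instrument[[1]]$questions[[1]])` on your own data first.

```r
# Mock instrument with the same top-level shape as the output above.
# The fields inside each question (question_no, question_text) are assumptions.
mock_instrument <- list(
  list(
    instrument_name = "GAD-7",
    language = "en",
    questions = list(
      list(question_no = "1", question_text = "Feeling nervous, anxious, or on edge"),
      list(question_no = "2", question_text = "Not being able to stop or control worrying")
    )
  )
)

# Pull the text of every question into a character vector.
question_texts <- vapply(
  mock_instrument[[1]]$questions,
  function(q) q$question_text,
  character(1)
)
print(question_texts)
```

Using `vapply` with `character(1)` makes the extraction fail loudly if a question is missing its text, rather than silently returning a list.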

You can also pass a URL that points to the questionnaire.

instrument_2 = load_instruments_from_file("https://medfam.umontreal.ca/wp-content/uploads/sites/16/GAD-7-fran%C3%A7ais.pdf") 
names(instrument_2[[1]])
#>  [1] "file_id"         "instrument_id"   "instrument_name" "file_name"      
#>  [5] "file_type"       "file_section"    "study"           "sweep"          
#>  [9] "metadata"        "language"        "questions"

Matching instruments

You can get a list containing the results of the match. It includes a similarity score for each question compared to all the other questions in the other questionnaire.

instruments = append(instrument, instrument_2)
match = match_instruments(instruments)
names(match)
#> [1] "questions"        "matches"          "query_similarity"
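To show how such similarity scores might be used, here is a minimal sketch that finds the best match in the second questionnaire for each question in the first. It runs on a made-up matrix, not on real Harmony output. It assumes the scores can be arranged as a square numeric matrix with one row and column per question, which you should verify against `match$matches` before adapting it.

```r
# Mock similarity matrix for 4 questions: rows/columns 1-2 belong to the first
# questionnaire, 3-4 to the second. The values are made up for illustration.
sim <- matrix(
  c(1.00, 0.30, 0.85, 0.20,
    0.30, 1.00, 0.25, 0.90,
    0.85, 0.25, 1.00, 0.35,
    0.20, 0.90, 0.35, 1.00),
  nrow = 4, byrow = TRUE
)
first  <- 1:2
second <- 3:4

# Keep only the cross-questionnaire block (first vs. second).
cross <- sim[first, second, drop = FALSE]

# For each question in the first questionnaire, find its best match in the second.
best_idx   <- second[apply(cross, 1, which.max)]
best_score <- apply(cross, 1, max)

result <- data.frame(question = first, best_match = best_idx, similarity = best_score)
print(result)
```

Restricting the matrix to the cross-questionnaire block avoids each question trivially matching itself or another question from the same instrument.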
