Do you need to compare questionnaire items across studies? Do you want to find the best match for a set of items? Are there different versions of the same questionnaire floating around, and you want to check how compatible they are? Are the questionnaires you would like to compare written in different languages?
The Harmony project is a data harmonisation project that uses Natural Language Processing to help researchers make better use of existing data by supporting the harmonisation of the measures and items used in different studies. Harmony is a collaboration between Ulster University, University College London, the Universidade Federal de Santa Maria, and Fast Data Science. Harmony has been funded by the Economic and Social Research Council (ESRC) and by Wellcome as part of the Wellcome Data Prize in Mental Health.
In 2024, we published a paper validating Harmony on real-world mental health data:
1. McElroy, Wood, Bond, Mulvenna, Shevlin, Ploubidis, Scopel Hoffmann, Moltrecht, Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data. BMC Psychiatry 24, 530 (2024).
Harmony
pip install harmonydata
import harmony
harmony.download_models()
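# Pick two of the bundled example questionnaires to compare
instruments = [harmony.example_instruments["CES_D English"], harmony.example_instruments["GAD-7 Portuguese"]]
# Match the questions across the instruments; the fourth return value is not used here
questions, similarity, query_similarity, _ = harmony.match_instruments(instruments)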
# Load a PDF, Excel or Word file from disk as instruments
instruments = harmony.load_instruments_from_local_file("gad-7.pdf")
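Once a match has run, a common next step is to look up, for each question in one instrument, its closest counterpart in the other. The sketch below is a minimal illustration in plain Python. It assumes that `similarity` behaves like a square matrix whose rows and columns follow the order of `questions`, which is not confirmed here, and it uses a small hand-written matrix and illustrative question wordings instead of real Harmony output. The helper `best_matches` is hypothetical and not part of the harmony package.

```python
# Toy stand-ins for Harmony output (wordings and scores invented for illustration).
question_texts = [
    "Feeling nervous, anxious or on edge",   # instrument A, item 1
    "Not being able to stop worrying",       # instrument A, item 2
    "Sentir-se nervoso ou ansioso",          # instrument B, item 1
    "Não conseguir parar de se preocupar",   # instrument B, item 2
]
# Assumed layout: similarity[i][j] compares question i with question j.
similarity = [
    [1.00, 0.41, 0.93, 0.38],
    [0.41, 1.00, 0.36, 0.95],
    [0.93, 0.36, 1.00, 0.40],
    [0.38, 0.95, 0.40, 1.00],
]


def best_matches(similarity, rows, cols):
    """For each row index, return (row, best column, score) among `cols`."""
    results = []
    for i in rows:
        # Pick the candidate column with the highest score for this row.
        j = max(cols, key=lambda c: similarity[i][c])
        results.append((i, j, similarity[i][j]))
    return results


# Compare instrument A (indices 0-1) against instrument B (indices 2-3).
matches = best_matches(similarity, rows=[0, 1], cols=[2, 3])
for i, j, score in matches:
    print(f"{question_texts[i]!r} -> {question_texts[j]!r} ({score:.2f})")
```

Restricting `cols` to the other instrument's indices keeps a question from being matched to itself or to a sibling item in the same questionnaire.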
# Alternative: install from GitHub if the CRAN package is unavailable.
# install.packages("devtools") # Only if you don't have devtools installed already.
# library(devtools)
# devtools::install_github("harmonydata/harmony_r")
install.packages("harmonydata")
library(harmonydata)
instrument = load_instruments_from_file(path = "examples/GAD-7.pdf")
instrument_2 = load_instruments_from_file("https://medfam.umontreal.ca/wp-content/uploads/sites/16/GAD-7-fran%C3%A7ais.pdf")
instruments = append(instrument, instrument_2)
match = match_instruments(instruments)
names(match)
docker pull harmonydata/harmonyapi
docker run -p 8000:8000 -p 3000:3000 harmonydata/harmonyapi
Our tool, Harmony, allows researchers to upload a set of mental health questionnaires, such as the GAD-7 anxiety questionnaire, in PDF or Excel format. It identifies which questions across the questionnaires are identical, similar in meaning, or antonyms of each other, and generates a network graph. This allows researchers to harmonise datasets.
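The grouping into identical, similar and opposite-meaning questions can be pictured as thresholding a signed similarity score and keeping the surviving pairs as edges of a graph. The following is a minimal sketch under stated assumptions, not Harmony's actual implementation: the thresholds (0.99 and 0.6), the function names `classify_pair` and `build_edges`, and the convention that antonyms receive a negative score are all hypothetical choices made for illustration.

```python
# Hypothetical cut-offs, chosen only for this illustration.
IDENTICAL_THRESHOLD = 0.99
SIMILAR_THRESHOLD = 0.6


def classify_pair(score):
    """Map a signed similarity score in [-1, 1] to a relationship label."""
    if score >= IDENTICAL_THRESHOLD:
        return "identical"
    if score >= SIMILAR_THRESHOLD:
        return "similar"
    if score <= -SIMILAR_THRESHOLD:
        return "antonym"
    return None  # too weak to draw an edge


def build_edges(labels, scores):
    """Return (label_a, label_b, relationship) for each related pair."""
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):  # upper triangle: each pair once
            relation = classify_pair(scores[i][j])
            if relation is not None:
                edges.append((labels[i], labels[j], relation))
    return edges


# Invented items and scores: A1 comes from one questionnaire, B1-B3 from another.
labels = ["A1: I feel anxious", "B1: I feel anxious", "B2: I worry a lot", "B3: I feel calm"]
scores = [
    [1.00, 1.00, 0.72, -0.81],
    [1.00, 1.00, 0.72, -0.81],
    [0.72, 0.72, 1.00, -0.35],
    [-0.81, -0.81, -0.35, 1.00],
]
edges = build_edges(labels, scores)
for a, b, relation in edges:
    print(f"{a} -- {relation} -- {b}")
```

The resulting edge list can be handed to any graph library for drawing; pairs whose score falls between the two thresholds are simply left unconnected.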
Uniquely, Harmony relies on Transformer neural network architectures and is not dependent on a dictionary approach or word list. This allows for multilingual data harmonisation (English and Portuguese are our languages of focus), and Harmony is able to correctly map the GAD-7 used in the UK to the GAD-7 used in Brazil, despite the Brazilian questionnaire being in Brazilian Portuguese.
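Because a Transformer model maps each question to a numeric vector rather than looking words up in a list, two questions can be compared by the angle between their vectors, and with a multilingual model this works across languages. The sketch below shows that comparison with cosine similarity on tiny hand-made vectors. The three-dimensional vectors are invented stand-ins for real embeddings, and the use of cosine similarity is an assumption about a typical setup rather than a statement of how Harmony computes its scores.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented 3-dimensional "embeddings"; real models produce hundreds of dimensions.
english_item = [0.9, 0.1, 0.3]      # e.g. an English anxiety item
portuguese_item = [0.8, 0.2, 0.3]   # a Portuguese item with the same meaning
unrelated_item = [0.1, 0.9, -0.2]   # an item on a different topic

same_meaning = cosine_similarity(english_item, portuguese_item)
different_meaning = cosine_similarity(english_item, unrelated_item)
print(f"same meaning: {same_meaning:.2f}, different meaning: {different_meaning:.2f}")
```

Items that mean the same thing end up pointing in nearly the same direction, so their score is close to 1, while unrelated items score much lower.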
Using Harmony, our team was able to harmonise multilingual datasets and conduct groundbreaking research into social isolation and anxiety, with NLP supplying a quantitative measure of the equivalence of the different mental health datasets.