Harmony walkthrough - R library

This notebook shows how to use Harmony to compute the similarity matrix between the questions of several questionnaires.

The Harmony project is a data harmonisation project that uses Natural Language Processing to help researchers make better use of existing data from different studies, by supporting the harmonisation of the measures and items those studies use.

Harmony is a collaboration project between Ulster University, University College London, the Universidade Federal de Santa Maria, and Fast Data Science. Harmony is funded by Wellcome as part of the Wellcome Data Prize in Mental Health.

First let’s install and import Harmony. If you haven’t already, you need to install it with install.packages("harmonydata").

install.packages("harmonydata")
## Installing package into '/home/thomas/R/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(harmonydata)

Now we can create our instruments.

If you had everything in a PDF or Excel file, you could load the instruments directly from the file: instrument = load_instruments_from_file(path = "CBCL_GOASSESS.xlsx"). In this example we will load two questionnaires from Harmony’s database and manually input a third questionnaire.

First we load the instruments from Harmony’s database:

example_instruments <- get_example_instruments()
names(example_instruments)
##  [1] "CES_D English"                                      
##  [2] "SCARED English (adult)"                             
##  [3] "SCARED English (child)"                             
##  [4] "GAD-7 Portuguese"                                   
##  [5] "De Jong Gierveld Loneliness Scale English"          
##  [6] "Market research survey fictional soft drink English"
##  [7] "GAD-7 English"                                      
##  [8] "GHQ 12 English"                                     
##  [9] "Market research survey template English"            
## [10] "Adult ADHD Self-Report Scale English"               
## [11] "MacLean Screening Instrument for BPD English"       
## [12] "RCADS Child Reported English"                       
## [13] "GAD-7 French"                                       
## [14] "GAD-7 German"                                       
## [15] "GAD-7 Spanish"                                      
## [16] "GAD-7 Russian"                                      
## [17] "GAD-7 Chinese"                                      
## [18] "GAD-7 Afrikaans"                                    
## [19] "GAD-7 Cebuano"                                      
## [20] "GAD-7 Kannada"                                      
## [21] "GAD-7 Hebrew"                                       
## [22] "GAD-7 Norwegian"

Let’s use the CES_D English and GAD-7 Portuguese instruments from the examples, and create a GAD-7 Norwegian instrument by hand from two of its items.

ces_d_english <- example_instruments[["CES_D English"]]
gad_7_portuguese <- example_instruments[["GAD-7 Portuguese"]]

gad_7_norwegian <- create_instrument_from_list(
    list(
        "Følt deg nervøs, engstelig eller veldig stresset",
        "Ikke klart å slutte å bekymre deg eller kontrolleren bekymringene dine"
    ),
    instrument_name = "GAD-7 Norwegian"
)
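If the item texts are already in a character vector, for instance after reading them from a text file with readLines, they can be turned into the list form used above with as.list. This is a minimal sketch: the vector norwegian_items and the cleaning step are illustrative assumptions, and the call to create_instrument_from_list is left commented out so the snippet runs without the package.

```r
# Hypothetical character vector of item texts (e.g. from readLines("items.txt")).
norwegian_items <- c(
    "  Følt deg nervøs, engstelig eller veldig stresset ",
    "",
    "Ikke klart å slutte å bekymre deg eller kontrolleren bekymringene dine"
)

# Strip surrounding whitespace and drop empty lines.
cleaned <- trimws(norwegian_items)
cleaned <- cleaned[nzchar(cleaned)]

# The call above passes a list, so convert the character vector to a list.
question_list <- as.list(cleaned)
str(question_list)

# With harmonydata loaded, this would mirror the call shown above:
# gad_7_norwegian <- create_instrument_from_list(
#     question_list,
#     instrument_name = "GAD-7 Norwegian"
# )
```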

We can combine our instruments into a single list:

instruments_list <- list(ces_d_english, gad_7_portuguese, gad_7_norwegian)

Let’s call Harmony to calculate the match:

match <- match_instruments(instruments_list)

We can also look up the documentation for match_instruments:

help(match_instruments)

Check what has come out of the match:

names(match)
## [1] "instruments"                          
## [2] "questions"                            
## [3] "matches"                              
## [4] "query_similarity"                     
## [5] "closest_catalogue_instrument_matches" 
## [6] "instrument_to_instrument_similarities"
## [7] "clusters"

The first question is CES_D English question 1.

match$questions[[1]]
## $question_no
## [1] "1"
## 
## $question_text
## [1] "I was bothered by things that usually don’t bother me."
## 
## $options
## $options[[1]]
## [1] "Rarely or none of the time (less than 1 day)"
## 
## $options[[2]]
## [1] "Some or a little of the time (1-2 days)"
## 
## $options[[3]]
## [1] "Occasionally or a moderate amount of time (3-4 days)"
## 
## $options[[4]]
## [1] "Most or all of the time (5-7 days)"
## 
## 
## $source_page
## [1] 0
## 
## $instrument_id
## [1] "b45b7169e711414582768b8c8431027c"
## 
## $topics_auto
## $topics_auto[[1]]
## [1] "depression"
## 
## 
## $topics_strengths
## $topics_strengths$depression
## [1] 0.9928044
## 
## 
## $nearest_match_from_mhc_auto
## $nearest_match_from_mhc_auto$question_no
## NULL
## 
## $nearest_match_from_mhc_auto$question_intro
## NULL
## 
## $nearest_match_from_mhc_auto$question_text
## [1] "I was bothered by things that usually don't bother me\n"
## 
## $nearest_match_from_mhc_auto$options
## list()
## 
## $nearest_match_from_mhc_auto$source_page
## [1] 0
## 
## $nearest_match_from_mhc_auto$instrument_id
## NULL
## 
## $nearest_match_from_mhc_auto$instrument_name
## NULL
## 
## $nearest_match_from_mhc_auto$topics_auto
## NULL
## 
## $nearest_match_from_mhc_auto$topics_strengths
## NULL
## 
## $nearest_match_from_mhc_auto$nearest_match_from_mhc_auto
## NULL
## 
## $nearest_match_from_mhc_auto$closest_catalogue_question_match
## NULL
## 
## $nearest_match_from_mhc_auto$seen_in_catalogue_instruments
## NULL
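Each question is a nested list, so printing one at a time gets verbose. One way to get an overview is to pull a couple of fields into a data frame. The sketch below runs on a small mock list shaped like match$questions (only question_no and question_text are reproduced); the names questions and question_table are illustrative, and it assumes both fields are present on every question.

```r
# Mock questions with the same field names as the output above.
questions <- list(
    list(question_no = "1",
         question_text = "I was bothered by things that usually don't bother me."),
    list(question_no = "6",
         question_text = "I felt depressed.")
)

# vapply checks that each extracted value is a single string.
question_table <- data.frame(
    question_no = vapply(questions, function(q) as.character(q$question_no), character(1)),
    question_text = vapply(questions, function(q) q$question_text, character(1)),
    stringsAsFactors = FALSE
)

print(question_table)
```

With a real result, replacing questions by match$questions should give one row per question, provided the fields are filled in as they are in the example above.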

The total number of questions is:

length(match$questions)
## [1] 29

The first three entries in the first row of the similarity matrix are:

match$matches[[1]][c(1, 2, 3)]
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 0.31365
## 
## [[3]]
## [1] 0.3432307
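The crosswalk section that follows describes these scores as cosine similarities. As a reminder of what that measure computes, here is a small base-R sketch on made-up vectors. The vectors stand in for text representations and are not Harmony output, and cosine_similarity is a helper defined here, not a function of the package.

```r
# Cosine similarity: dot product divided by the product of the vector norms.
cosine_similarity <- function(a, b) {
    sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up vectors for illustration only.
u <- c(1, 2, 3)
v <- c(2, 4, 6)   # same direction as u
w <- c(3, 0, -1)  # orthogonal to u (dot product is 0)

cosine_similarity(u, v)  # close to 1: same direction
cosine_similarity(u, w)  # 0: orthogonal
```

The score depends only on the angle between the two vectors, not on their lengths, which is why u and v score as identical even though v is twice as long.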

Generate a Crosswalk table

We can generate a crosswalk table to match questions with a cosine similarity above a certain threshold.

threshold <- 0.7

df_crosswalk_table <- generate_crosswalk_table(
    match$instruments,
    match$matches,
    threshold,
    is_allow_within_instrument_matches = TRUE,
    is_enforce_one_to_one = TRUE
)

df_crosswalk_table
##                              pair_name       question1_id question1_no
## 1     CES_D English_6_CES_D English_17    CES_D English_6            6
## 2  CES_D English_10_GAD-7 Portuguese_7   CES_D English_10           10
## 3    CES_D English_12_CES_D English_16   CES_D English_12           12
## 4    CES_D English_17_CES_D English_18   CES_D English_17           17
## 5 GAD-7 Portuguese_1_GAD-7 Norwegian_1 GAD-7 Portuguese_1            1
## 6 GAD-7 Portuguese_2_GAD-7 Norwegian_2 GAD-7 Portuguese_2            2
##                                             question1_text       question2_id
## 1                                        I felt depressed.   CES_D English_17
## 2                                          I felt fearful. GAD-7 Portuguese_7
## 3                                             I was happy.   CES_D English_16
## 4                                     I had crying spells.   CES_D English_18
## 5          Sentir-se nervoso/a, ansioso/a ou muito tenso/a  GAD-7 Norwegian_1
## 6 Não ser capaz de impedir ou de controlar as preocupações  GAD-7 Norwegian_2
##   question2_no
## 1           17
## 2            7
## 3           16
## 4           18
## 5            1
## 6            2
##                                                           question2_text
## 1                                                   I had crying spells.
## 2                      Sentir medo como se algo horrível fosse acontecer
## 3                                                        I enjoyed life.
## 4                                                            I felt sad.
## 5                       Følt deg nervøs, engstelig eller veldig stresset
## 6 Ikke klart å slutte å bekymre deg eller kontrolleren bekymringene dine
##   match_score
## 1   0.7211360
## 2   0.8114453
## 3   0.7279788
## 4   0.7682067
## 5   0.9261098
## 6   0.8807539
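To make the role of the threshold concrete, here is a rough base-R sketch that lists every pair of questions whose score reaches the threshold. It is not how generate_crosswalk_table is implemented, and it does not enforce one-to-one matching, so its output can differ from the table above. The small matrix reuses rounded scores from this notebook's output; the names sim, hits and pairs are illustrative.

```r
labels <- c(
    "1 I was bothered by things that usually don't bother me.",
    "6 I felt depressed.",
    "17 I had crying spells.",
    "18 I felt sad."
)

# Symmetric similarity matrix (rounded values, diagonal set to 1).
sim <- matrix(c(
    1.0000000, 0.3405484, 0.2869450, 0.3109424,
    0.3405484, 1.0000000, 0.7211360, 0.8496905,
    0.2869450, 0.7211360, 1.0000000, 0.7682067,
    0.3109424, 0.8496905, 0.7682067, 1.0000000
), nrow = 4, byrow = TRUE, dimnames = list(labels, labels))

threshold <- 0.7

# upper.tri keeps each unordered pair once and drops the diagonal.
hits <- which(sim >= threshold & upper.tri(sim), arr.ind = TRUE)

pairs <- data.frame(
    question1 = labels[hits[, "row"]],
    question2 = labels[hits[, "col"]],
    match_score = sim[hits],
    stringsAsFactors = FALSE
)

# Strongest matches first.
pairs <- pairs[order(-pairs$match_score), ]
print(pairs, row.names = FALSE)
```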

Cluster the questions

Display the clusters that come out of Harmony by default.

for (i in seq_along(match$clusters)) {
    print(paste("Cluster", match$clusters[[i]]$cluster_id, match$clusters[[i]]$text_description))

    for (j in seq_along(match$clusters[[i]]$item_ids)) {
        id <- match$clusters[[i]]$item_ids[[j]]
        print(paste("    ", match$questions[[id]]$question_text))
    }
}
## [1] "Cluster 1 Ficar facilmente aborrecido/a ou irritado/a"
## [1] "     I was bothered by things that usually don’t bother me."
## [1] "     Ficar facilmente aborrecido/a ou irritado/a"
## [1] "Cluster 2 I did not feel like eating; my appetite was poor."
## [1] "     I did not feel like eating; my appetite was poor."
## [1] "Cluster 3 I thought my life had been a failure."
## [1] "     I felt I was just as good as other people."
## [1] "     I thought my life had been a failure."
## [1] "Cluster 4 I could not get “going.”"
## [1] "     I had trouble keeping my mind on what I was doing."
## [1] "     I could not get “going.”"
## [1] "     I felt that I could not shake off the blues even with help from my family or friends."
## [1] "Cluster 5 I felt that everything I did was an effort."
## [1] "     I felt that everything I did was an effort."
## [1] "Cluster 6 I felt fearful."
## [1] "     I felt fearful."
## [1] "     Sentir medo como se algo horrível fosse acontecer"
## [1] "Cluster 7 My sleep was restless."
## [1] "     My sleep was restless."
## [1] "Cluster 8 I was happy."
## [1] "     I was happy."
## [1] "     I enjoyed life."
## [1] "     I felt hopeful about the future."
## [1] "Cluster 9 I talked less than usual."
## [1] "     I talked less than usual."
## [1] "Cluster 10 I felt sad."
## [1] "     I felt lonely."
## [1] "     I felt sad."
## [1] "     I felt depressed."
## [1] "     I felt that people dislike me."
## [1] "     I had crying spells."
## [1] "Cluster 11 People were unfriendly."
## [1] "     People were unfriendly."
## [1] "Cluster 12 Ikke klart å slutte å bekymre deg eller kontrolleren bekymringene dine"
## [1] "     Não ser capaz de impedir ou de controlar as preocupações"
## [1] "     Ikke klart å slutte å bekymre deg eller kontrolleren bekymringene dine"
## [1] "     Preocupar-se muito com diversas coisas"
## [1] "Cluster 13 Følt deg nervøs, engstelig eller veldig stresset"
## [1] "     Ficar tão agitado/a que se torna difícil permanecer sentado/a"
## [1] "     Følt deg nervøs, engstelig eller veldig stresset"
## [1] "     Sentir-se nervoso/a, ansioso/a ou muito tenso/a"
## [1] "     Dificuldade para relaxar"
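A compact summary of the clusters can be easier to scan than the full printout. The sketch below counts the items per cluster on a mock list that uses the same field names as the loop above (cluster_id, text_description, item_ids). The item_ids values are reconstructed from the printed output under the assumption that they index into match$questions in order, and cluster_summary is an illustrative name.

```r
# Mock of three entries of match$clusters (clusters 6, 7 and 8 above).
clusters <- list(
    list(cluster_id = 6, text_description = "I felt fearful.",
         item_ids = list(10, 27)),
    list(cluster_id = 7, text_description = "My sleep was restless.",
         item_ids = list(11)),
    list(cluster_id = 8, text_description = "I was happy.",
         item_ids = list(12, 16, 8))
)

cluster_summary <- data.frame(
    cluster_id = vapply(clusters, function(cl) as.numeric(cl$cluster_id), numeric(1)),
    description = vapply(clusters, function(cl) cl$text_description, character(1)),
    n_items = vapply(clusters, function(cl) length(cl$item_ids), integer(1)),
    stringsAsFactors = FALSE
)
print(cluster_summary)

# Clusters with a single item were not grouped with any other question.
singletons <- cluster_summary[cluster_summary$n_items == 1, ]
print(singletons$description)
```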

Display the similarities between instruments

Let’s display the similarities between the instruments.

for (i in seq_along(match$instrument_to_instrument_similarities)) {
    obj <- match$instrument_to_instrument_similarities[[i]]
    print(paste("F1 similarity of", obj$instrument_1_name, "to", obj$instrument_2_name, "is", obj$f1))
}
## [1] "F1 similarity of CES_D English to GAD-7 Portuguese is 0.675"
## [1] "F1 similarity of CES_D English to GAD-7 Norwegian is 0.55"
## [1] "F1 similarity of GAD-7 Portuguese to GAD-7 Norwegian is 0.642857142857143"
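With more than a handful of instruments, a sorted table is easier to read than printed sentences. The following sketch collects the same three fields into a data frame. It runs on a mock list that copies the values printed above, and df_similarities is an illustrative name.

```r
# Mock of match$instrument_to_instrument_similarities with the values above.
similarities <- list(
    list(instrument_1_name = "CES_D English",
         instrument_2_name = "GAD-7 Portuguese", f1 = 0.675),
    list(instrument_1_name = "CES_D English",
         instrument_2_name = "GAD-7 Norwegian", f1 = 0.55),
    list(instrument_1_name = "GAD-7 Portuguese",
         instrument_2_name = "GAD-7 Norwegian", f1 = 0.642857142857143)
)

# One single-row data frame per pair, stacked with rbind.
df_similarities <- do.call(rbind, lapply(similarities, function(obj) {
    data.frame(
        instrument_1 = obj$instrument_1_name,
        instrument_2 = obj$instrument_2_name,
        f1 = obj$f1,
        stringsAsFactors = FALSE
    )
}))

# Most similar pairs first.
df_similarities <- df_similarities[order(df_similarities$f1, decreasing = TRUE), ]
print(df_similarities, row.names = FALSE)
```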

Saving to CSV/Excel

Let’s convert the similarity matrix to a data frame so we can export it to CSV:

df <- data.frame(match$matches[[1]])
for (x in seq_along(match$matches)) {
    df[x, ] <- match$matches[[x]]
}
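As an alternative to filling the data frame row by row, the list of rows can be stacked in one step. This is a sketch under the assumption that every row is a list of plain numbers of equal length, as in the output shown earlier. It uses a 3 x 3 mock with rounded scores from this notebook, and sim_matrix and df_alt are illustrative names.

```r
# Mock of match$matches: a list of rows, each row a list of numbers.
matches <- list(
    list(1, 0.31365, 0.3432307),
    list(0.31365, 1, 0.3253126),
    list(0.3432307, 0.3253126, 1)
)

# unlist turns each row into a numeric vector; rbind stacks them.
sim_matrix <- do.call(rbind, lapply(matches, unlist))

# A numeric matrix converts directly to a data frame.
df_alt <- as.data.frame(sim_matrix)
print(df_alt)
```

Keeping the numeric matrix around is also convenient for functions such as heatmap or which, which expect a matrix rather than a data frame.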

Set the row and column names to something human readable:

# Build one label per question ("<number> <text>") and reuse it for both axes.
question_labels <- sapply(match$questions, function(x) paste(x$question_no, x$question_text, sep = " "))
colnames(df) <- question_labels
rownames(df) <- question_labels

Now we can save to CSV:

write.csv(df, "matches.csv")

If you open the generated CSV, you can see the complete matrix. You can also open it in Excel.

You can also see the file in Google Colab online by opening the file system in the left panel.

Let’s take a look at the first few lines of the CSV file using the Unix command head (this call assumes a Unix-like system where head is available):

system("head matches.csv", intern=TRUE)
##  [1] "\"\",\"1 I was bothered by things that usually don’t bother me.\",\"2 I did not feel like eating; my appetite was poor.\",\"3 I felt that I could not shake off the blues even with help from my family or friends.\",\"4 I felt I was just as good as other people.\",\"5 I had trouble keeping my mind on what I was doing.\",\"6 I felt depressed.\",\"7 I felt that everything I did was an effort.\",\"8 I felt hopeful about the future.\",\"9 I thought my life had been a failure.\",\"10 I felt fearful.\",\"11 My sleep was restless.\",\"12 I was happy.\",\"13 I talked less than usual.\",\"14 I felt lonely.\",\"15 People were unfriendly.\",\"16 I enjoyed life.\",\"17 I had crying spells.\",\"18 I felt sad.\",\"19 I felt that people dislike me.\",\"20 I could not get “going.”\",\"1 Sentir-se nervoso/a, ansioso/a ou muito tenso/a\",\"2 Não ser capaz de impedir ou de controlar as preocupações\",\"3 Preocupar-se muito com diversas coisas\",\"4 Dificuldade para relaxar\",\"5 Ficar tão agitado/a que se torna difícil permanecer sentado/a\",\"6 Ficar facilmente aborrecido/a ou irritado/a\",\"7 Sentir medo como se algo horrível fosse acontecer\",\"1 Følt deg nervøs, engstelig eller veldig stresset\",\"2 Ikke klart å slutte å bekymre deg eller kontrolleren bekymringene dine\""
##  [2] "\"1 I was bothered by things that usually don’t bother me.\",0.999999999999999,0.313650047598938,0.343230663499091,-0.26082840444773,0.427888051190685,0.340548426607828,-0.307489283086903,-0.184493825916893,-0.259145711080717,0.312328015759908,0.280571828742239,-0.281010362699159,0.485770730945963,0.27214036152058,-0.280003827766512,-0.198906231995062,0.286944981670527,0.310942396490167,0.375459782504639,0.288291389823687,0.337880247659449,0.442903150088537,0.438708029981696,-0.265802079487998,0.387832016192289,0.530011442980522,0.258458669143746,0.348343991802497,0.475978137279983"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
##  [3] "\"2 I did not feel like eating; my appetite was poor.\",0.313650047598938,0.999999999999999,0.325312636148027,-0.384496744718963,-0.393828107393692,-0.436075435200343,-0.44434258786392,-0.237017966209617,-0.469965647187694,-0.36608214873585,-0.344935740246911,-0.2956175296965,0.362998747444815,0.385506492944318,-0.207623165034618,-0.301744841507874,-0.411185159038716,-0.427179414589496,-0.339413721208655,0.379519141291404,-0.317062207829909,0.114358660733742,-0.110786833270329,-0.194964660023755,-0.250276708464404,-0.143823431136999,-0.248033884409163,-0.299941041005306,-0.124765207591952"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
##  [4] "\"3 I felt that I could not shake off the blues even with help from my family or friends.\",0.343230663499091,0.325312636148027,0.999999999999999,-0.457278337076548,0.360037837728839,0.446186372641613,-0.424006298787741,-0.307091262268577,-0.409098908621717,0.340755719365608,-0.223673061882965,-0.392709933625194,-0.223701952898128,0.329841903209299,-0.224017882472451,-0.279029504893053,-0.40553120744061,0.398032196090193,0.333325178197449,0.502272579017944,-0.247389841340177,0.368957384919643,-0.224658042616917,-0.302741265998839,0.299558111765408,-0.242008777373454,-0.313666935789848,-0.262069211517291,0.301125833096098"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
##  [5] "\"4 I felt I was just as good as other people.\",-0.26082840444773,-0.384496744718963,-0.457278337076548,0.999999999999997,-0.45695771385426,-0.422779601372985,0.387032134669,0.303661584582296,-0.522683600041443,-0.399109777253549,-0.237761981145234,0.516332383916904,-0.341080448498692,-0.376720474098066,-0.363402654326304,0.432010545821781,-0.350384920623116,-0.418676316012637,-0.489129521475084,-0.334745070798226,-0.23707136631531,-0.119743197391102,-0.187020615621434,-0.0884884693546621,-0.200390171336801,-0.184016998804753,-0.297614050070531,-0.214550444763353,-0.150253560835375"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
##  [6] "\"5 I had trouble keeping my mind on what I was doing.\",0.427888051190685,-0.393828107393692,0.360037837728839,-0.45695771385426,1,0.466129813474605,0.421571817602657,-0.383538625904366,0.43456732088477,0.48017422177863,0.414968161147523,-0.428218466300276,0.399571566241566,0.311824263099099,0.187116769950136,-0.330268159039091,0.387677982557209,0.405366590531665,0.350426408900563,0.553618057210407,0.32335716262896,0.358479398631665,0.39130566150574,-0.288193584395147,0.385785350085118,0.230910969127669,0.311274530821619,0.326613140747239,0.405171407085974"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
##  [7] "\"6 I felt depressed.\",0.340548426607828,-0.436075435200343,0.446186372641613,-0.422779601372985,0.466129813474605,0.999999999999999,0.376669143854367,0.377858633471302,0.610592763275505,0.623487927958546,0.389058133525941,-0.50561308983916,0.302299928495629,0.634577132103332,0.14138991572749,-0.305348210988583,0.721135979181554,0.84969054624074,0.48665361159447,0.403131842058468,0.590245972670556,0.195999306127027,0.379916687766378,0.343792003391211,0.565290736311644,0.336558130088427,0.582476852630194,0.573595968928026,0.339948135047175"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
##  [8] "\"7 I felt that everything I did was an effort.\",-0.307489283086903,-0.44434258786392,-0.424006298787741,0.387032134669,0.421571817602657,0.376669143854367,1,0.267383791867338,-0.444579768015073,-0.280491826567327,0.173945314106401,0.267921404132051,-0.20775773301154,-0.34379544405005,-0.0680977218797329,0.308170225107309,0.329160613870581,0.31908612421549,0.231909595183986,-0.405221663055693,-0.231724660790774,-0.190446556987743,0.16489999322555,-0.198454774969685,-0.242967831220911,-0.217054472811035,-0.19064060773193,-0.224499080049824,-0.201212010108047"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
##  [9] "\"8 I felt hopeful about the future.\",-0.184493825916893,-0.237017966209617,-0.307091262268577,0.303661584582296,-0.383538625904366,0.377858633471302,0.267383791867338,0.999999999999999,-0.449764716669952,0.358443762849618,-0.238848197889448,0.512804019722446,-0.165696298568966,-0.276575436767877,-0.156910707215902,0.457223203524353,0.17416214444544,0.318487756133858,-0.136338457955145,-0.426408070036268,0.328153764154816,-0.155400498634685,0.266038525899472,0.219371674557606,-0.157425500813318,0.187176191585208,0.322128105694421,0.289229467294762,0.261893328235846"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [10] "\"9 I thought my life had been a failure.\",-0.259145711080717,-0.469965647187694,-0.409098908621717,-0.522683600041443,0.43456732088477,0.610592763275505,-0.444579768015073,-0.449764716669952,0.999999999999999,0.410728623279194,0.29550894947264,-0.497530817403389,0.297007906515359,0.513598247306311,0.255892344791703,-0.551406431357963,0.539852360586166,0.612483820496241,0.560260811238461,0.357839372214516,0.196036400343276,-0.154716243425524,0.196679665892338,-0.229190423375902,0.284539955878017,0.155592578793472,0.438463254791755,0.177084943005242,-0.222636761164522"
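The saved matrix can also be read back into R for further analysis. The sketch below writes a small mock matrix to a temporary file, reads it back, and reports the best match for each question. The values are rounded scores from this notebook, and the names df_small, df_back and top are illustrative.

```r
labels <- c("6 I felt depressed.", "17 I had crying spells.", "18 I felt sad.")

df_small <- as.data.frame(matrix(c(
    1.0000000, 0.7211360, 0.8496905,
    0.7211360, 1.0000000, 0.7682067,
    0.8496905, 0.7682067, 1.0000000
), nrow = 3, byrow = TRUE))
colnames(df_small) <- labels
rownames(df_small) <- labels

csv_path <- tempfile(fileext = ".csv")
write.csv(df_small, csv_path)

# row.names = 1 restores the row labels; check.names = FALSE keeps the
# column headers as written instead of converting them to syntactic names.
df_back <- read.csv(csv_path, row.names = 1, check.names = FALSE)

m <- as.matrix(df_back)
diag(m) <- NA  # ignore each question's similarity to itself

# which.max skips NA values, so it picks the best other question per row.
best <- apply(m, 1, which.max)
top <- data.frame(
    question = rownames(m),
    best_match = colnames(m)[best],
    score = m[cbind(seq_len(nrow(m)), best)],
    row.names = NULL
)
print(top)
```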