Harmony launches on Kaggle!

We are proud to have launched our first competition on Kaggle!

The primary challenge of this competition is to develop an AI tool or method that can accurately extract questionnaire questions from documents, primarily PDFs.

This competition offers participants a unique opportunity to contribute to natural language processing, document analysis, and open source software for social science while developing solutions with real-world applications. We encourage participants to think creatively, leverage available resources, and push the boundaries of current technologies.

Try Kaggle

Try your hand at our competition

Kaggle GitHub repo

Check out the GitHub repo associated with the Kaggle competition and the tagged PDF data

Entering the Kaggle competition

Requirements: Python 3.10 or greater

Create an account on Kaggle.
Install the Kaggle command-line tool on your computer:

pip install kaggle

On the Kaggle website, download your kaggle.json file (your API token) and save it in your home folder as .kaggle/kaggle.json.
Download and unzip the competition data:

kaggle competitions download -c harmony-pdf-and-word-questionnaires-extract
unzip harmony-pdf-and-word-questionnaires-extract.zip
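If the download command above fails with an authentication error, the token file is a likely culprit. The following is a minimal sketch of a check you could run yourself; it is not part of the competition repo. The helper name check_kaggle_token is hypothetical, and the sketch assumes the token file is a JSON object with the fields username and key.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_token(path: Path) -> list[str]:
    """Return a list of human-readable problems found with the token file.

    Hypothetical helper for illustration; an empty list means no problem found.
    """
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]

    problems = []
    # Assumption: the token holds a "username" and a "key" entry.
    for field in ("username", "key"):
        if not data.get(field):
            problems.append(f"missing field: {field}")

    # On POSIX systems, flag a token that group or other users can access.
    if os.name == "posix" and path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("token file is accessible to other users; consider chmod 600")
    return problems


if __name__ == "__main__":
    token_path = Path.home() / ".kaggle" / "kaggle.json"
    for problem in check_kaggle_token(token_path):
        print(problem)
```

If the script prints nothing, the file is in place and looks well formed, so the cause of the error is probably elsewhere (for example, not having accepted the competition rules on the Kaggle website).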

Run create_sample_submission.py in the folder containing your data to create your train and test predictions:

To generate predictions for the training data and write to train_predictions.csv:

python create_sample_submission.py train

To evaluate the train predictions:

python evaluate_train_results.py

To modify the prediction logic or inject your own model, you can edit the function dummy_extract_questions.
To generate predictions for the test data and write to submission.csv:

python create_sample_submission.py test

Submit your CSV file to Kaggle:

kaggle competitions submit -c harmony-pdf-and-word-questionnaires-extract -f submission.csv -m "Message"
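As a starting point for replacing the logic in dummy_extract_questions, here is a minimal sketch of a rule-based baseline. The signature of dummy_extract_questions is not shown in this post, so the function below, extract_questions_heuristic, is hypothetical: it assumes the input is the plain text of one document and returns a list of candidate question strings. Adapt the input and output to whatever create_sample_submission.py actually expects. The numbering pattern is also an assumption about how questionnaire items tend to be laid out, not a rule from the competition.

```python
import re

# Matches a leading item number such as "1.", "2)" or "(3)".
# Assumption: questionnaire items are often numbered this way.
ITEM_NUMBER = re.compile(r"^\s*\(?\d{1,3}[.)]\s+")


def extract_questions_heuristic(text: str) -> list[str]:
    """Keep lines that look like questionnaire items.

    A line is kept if it starts with an item number or ends with "?".
    The item number is stripped and exact duplicates are dropped.
    """
    questions = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        is_numbered = ITEM_NUMBER.match(line) is not None
        if is_numbered or line.endswith("?"):
            cleaned = ITEM_NUMBER.sub("", line).strip()
            if cleaned and cleaned not in questions:
                questions.append(cleaned)
    return questions


if __name__ == "__main__":
    sample = "1. Feeling nervous\n2) Trouble relaxing\nPage 1\nDo you smoke?"
    for question in extract_questions_heuristic(sample):
        print(question)
```

A rule like this will miss unnumbered items and pick up numbered lines that are not questions, so treat it as a baseline to compare against with evaluate_train_results.py before trying a learned model.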

Harmony at MQ and DataMind Data Science Workshop

On 2 May 2025, Dr Eoin McElroy demonstrated Harmony at the MQ and DataMind Data Science workshop at Deutsche Bank. Eoin’s presentation focused on “Maximising the use of existing survey data: facilitating cross-study research using retrospective harmonization.” The workshop brought together researchers interested in applying novel harmonisation techniques to existing datasets. Eoin explained traditional harmonisation processes and presented a user-friendly guide to the Harmony tool, demonstrating how natural language processing can streamline the harmonisation process.

'Send to Harmony' Chrome plugin

[Beta mode: we are currently testing this extension] We have developed a browser extension for Harmony called “Send to Harmony”, which lets you send selected text to the Harmony Data Harmonization (https://harmonydata.ac.uk/) platform for analysis with a right-click. For PDFs, use the popup to paste your selected text. The context menu item lets users bring text into their harmonisations easily, making it simpler to compare and analyse different measurement scales across research studies.
