Harmony on Kaggle

Harmony launches on Kaggle!

We are proud to have launched our first competition on Kaggle!

The primary challenge of this competition is to develop a tool or method that can accurately extract questionnaire questions from documents, primarily PDFs.

This competition offers a unique opportunity for participants to contribute to the field of natural language processing and document analysis while developing solutions that have real-world applications. We encourage participants to think creatively, leverage available resources, and push the boundaries of current technologies.

Try Kaggle

Try your hand at our competition

Kaggle GitHub repo

Check out the GitHub repo associated with the Kaggle competition and the tagged PDF data.

Entering the Kaggle competition

Requirements: Python 3.10 or greater
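
Before starting, it can help to confirm that the interpreter meets this requirement and that the Kaggle credentials file is where the steps below place it (.kaggle/kaggle.json in your home folder). The following is a minimal sketch only: the function name check_environment and its optional home argument are hypothetical and not part of the competition repository.

```python
import sys
from pathlib import Path
from typing import List, Optional


def check_environment(home: Optional[Path] = None) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper for illustration, not part of the competition repo.
    """
    problems = []

    # The competition instructions ask for Python 3.10 or greater.
    if sys.version_info < (3, 10):
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python 3.10 or greater is required, found {found}")

    # The Kaggle credentials file is expected under the home folder.
    credentials = (home or Path.home()) / ".kaggle" / "kaggle.json"
    if not credentials.is_file():
        problems.append(f"Kaggle credentials not found at {credentials}")

    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Environment looks ready.")
```

Run it once before the steps below and again after placing kaggle.json; the second run should no longer report the credentials file as missing.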

  1. Create an account on Kaggle.

  2. Install Kaggle on your computer:

         pip install kaggle

  3. On the Kaggle website, download your kaggle.json file and put it in your home folder under .kaggle/kaggle.json.

  4. Download and unzip the competition data:

         kaggle competitions download -c harmony-pdf-and-word-questionnaires-extract
         unzip harmony-pdf-and-word-questionnaires-extract.zip

  5. Run create_sample_submission.py in the folder containing your data to create your train and test predictions.

     To generate predictions for the training data and write to train_predictions.csv:

         python create_sample_submission.py train

     To evaluate the train predictions:

         python evaluate_train_results.py

  6. To modify the prediction logic or inject your own model, you can edit the function dummy_extract_questions.

  7. To generate predictions for the test data and write to submission.csv:

         python create_sample_submission.py test

  8. Submit your CSV file to Kaggle:

         kaggle competitions submit -c harmony-pdf-and-word-questionnaires-extract -f submission.csv -m "Message"
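
The step above that mentions dummy_extract_questions leaves the extraction logic open. As one illustration of what could go there, here is a minimal rule-based sketch that keeps lines which end in a question mark or start with item numbering. The function name extract_candidate_questions, its input (plain text already extracted from a document) and its output (a list of strings) are assumptions for illustration; the actual signature of dummy_extract_questions in the competition repository may differ, so check it before wiring anything in.

```python
import re

# Leading item numbering such as "1. ", "2) ", "(3) " or "Q4: ".
# Whitespace is required after the numbering so that values like "3.5" are not stripped.
_NUMBERING = re.compile(r"^\s*(?:\(?\d+[.):]\)?|Q\d+[.:)]?)\s+", re.IGNORECASE)


def extract_candidate_questions(text: str) -> list[str]:
    """Return lines that look like questionnaire items.

    Heuristic only: a line is kept if it ends with a question mark or
    starts with item numbering. Hypothetical helper, not part of the
    competition repository.
    """
    questions = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        is_numbered = _NUMBERING.match(line) is not None
        if line.endswith("?") or is_numbered:
            # Drop the numbering prefix so only the item text remains.
            cleaned = _NUMBERING.sub("", line).strip()
            if cleaned:
                questions.append(cleaned)
    return questions


if __name__ == "__main__":
    sample = (
        "Instructions: please answer honestly.\n"
        "1. Feeling nervous, anxious or on edge\n"
        "2) Do you have trouble sleeping?\n"
        "How often do you feel tired?\n"
    )
    for question in extract_candidate_questions(sample):
        print(question)
```

A rule like this will miss unnumbered statement-style items and will pick up numbered text that is not a question, so it is best treated as a baseline to compare against when scoring with evaluate_train_results.py.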
