Improve Harmony’s PDF parsing on DOXA AI

Train your own Large Language Model to parse PDFs and win up to £1000 in vouchers!

Join a competition to train a machine learning model to improve Harmony’s PDF parsing. You don’t need to have trained a machine learning model before.

Register on DOXA AI

Enter the competition on DOXA AI by fine-tuning your own model to improve Harmony!

Join our Discord

Join the Harmony Discord server. Check out the 🏅「parsing-challenge」 channel!

We would like to improve Harmony’s PDF parsing algorithm.

We would like to improve Harmony with a fine-tuned language model. We have teamed up with DOXA AI to run an online competition in which you can improve on the off-the-shelf LLMs that we currently use. You can win up to £1000 in vouchers! Click here to join the Harmony matching competition on DOXA AI.

Webinar

We will be running a webinar to introduce the challenge, where you can ask questions.

What about data?

We have gathered training data that you can use to fine-tune your model. We will score your model on unseen validation data.

More information is available on the DOXA AI page.

How can I get started?

Create an account on DOXA AI, enroll in the competition, then download the code examples and training data.

Prizes

The winner of the competition will receive £1000 in vouchers, and the runner-up will receive £500 in vouchers.

See also

Matching competition

Related Posts

Improving Harmony's PDF extraction with user testing

Since we built Harmony, a common complaint has been that it frequently identifies the wrong questions in PDFs. The original algorithm for finding questions in PDFs was a mixture of rule-based heuristics and hand-coded logic that looked for, for example, lines in the document which begin with numbers. This approach was fragile: it worked well on short questionnaires such as the GAD-7 but failed on larger documents. We decided to run a competition with our partner DOXA AI where members of the public could train their own model to extract questions from PDFs.

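
To illustrate why a heuristic of this kind is fragile, here is a minimal sketch of a rule that treats any line beginning with a number as a question. It is not Harmony's actual code: the pattern `NUMBERED_LINE`, the function `find_numbered_questions` and the sample text are all hypothetical and chosen only for illustration.

```python
import re

# Assumed pattern (not taken from Harmony): a line that starts with a number
# followed by "." or ")" and then some text.
NUMBERED_LINE = re.compile(r"^\s*(\d+)[.)]\s+(.+)$")


def find_numbered_questions(text: str) -> list[str]:
    """Return the text of every line that begins with a number."""
    questions = []
    for line in text.splitlines():
        match = NUMBERED_LINE.match(line)
        if match:
            # Keep only the text after the number.
            questions.append(match.group(2).strip())
    return questions


# Made-up sample text resembling what a PDF text extractor might produce.
sample = """Over the last 2 weeks, how often have you been bothered by the following?
1. Feeling nervous, anxious or on edge
2. Not being able to stop or control worrying
Page 3 of 12
3) Worrying too much about different things
2024. All rights reserved."""

questions = find_numbered_questions(sample)
for question in questions:
    print(question)
```

On this sample the rule finds the three numbered items but also picks up the footer line that starts with "2024.", the kind of false positive that becomes more common in longer documents.
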
Harmony at MQ and DataMind Data Science Workshop

On 2 May 2025, Dr Eoin McElroy demonstrated Harmony at the MQ and Datamind Data Science workshop at Deutsche Bank. Eoin’s presentation focused on “Maximising the use of existing survey data: facilitating cross-study research using retrospective harmonization.” The workshop brought together researchers interested in applying novel harmonisation techniques to existing datasets. Eoin explained traditional harmonisation processes and presented a user-friendly guide to the Harmony tool, demonstrating how natural language processing can streamline the harmonisation process.
