Troubleshooting Harmony

Harmony didn’t find anything in my PDF

Harmony tends to perform better if you upload a file with item numbers.

If there are no question numbers in the instrument, it’s very hard for Harmony to distinguish question text from other content such as the copyright information. Click here to see an example PDF with question numbers included.

Also, if your PDF is a scanned document, please see if you can find a fully digitised (OCR’ed) version, since a scan without OCR contains images of the pages rather than text.

We suggest finding a file that has question numbers or better quality content, or trying a different file format such as Word, Excel or CSV. We have guidance on formatting your files for Harmony.

Harmony supports:

Word
Excel
CSV
PDF (if you are having problems parsing PDFs, please try another format)
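If you decide to move your questions out of a PDF and into CSV, something like the following may help. It is only a minimal sketch: the column names `question_no` and `question_text`, the helper `questions_to_csv` and the two sample items are illustrative assumptions, not Harmony's required format, so please check the formatting guidance before uploading.

```python
import csv
import io


def questions_to_csv(questions):
    """Serialise (number, text) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical column names; adjust to match the formatting guidance.
    writer.writerow(["question_no", "question_text"])
    for number, text in questions:
        writer.writerow([number, text])
    return buffer.getvalue()


# Illustrative items only, not taken from a real instrument.
example_items = [
    (1, "I feel nervous, anxious or on edge"),
    (2, "I have trouble relaxing"),
]

csv_text = questions_to_csv(example_items)
print(csv_text)
```

Keeping one question per row, with its number in a separate column, gives the same cue that question numbers give in a PDF, without relying on PDF text extraction.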

Finally, feel free to raise an issue to let us know that your PDF isn’t being parsed, and please share the PDF in question. Harmony is an open-source tool for social sciences research.

Introducing Harmony Meta: A Simpler Way to Discover Research Metadata

Finding the right data is the foundation of good research, but it shouldn’t be the hardest part of the process. We are pleased to announce the beta release of Harmony Meta, a search tool built to help researchers find datasets and study information more efficiently. This first version is now live, and we are inviting the research community to test it and help us shape its future.

Improving Harmony's PDF extraction with user testing

Since we built Harmony, a common complaint has been that it frequently identifies the wrong questions in PDFs. The original algorithm for finding questions in PDFs was a mixture of rule-based heuristics and hand-coded logic that looked for, e.g., lines in the document which begin with numbers. This approach was fragile: it worked fine on short questionnaires such as the GAD-7, but failed on larger documents. We decided to run a competition with our partner DOXA AI where members of the public could train their own model to extract questions from PDFs.
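To give a feel for what a "lines which begin with numbers" rule looks like, here is a minimal sketch. It is not Harmony's actual code: the regular expression, the function name `find_numbered_questions` and the sample text are all assumptions made for illustration.

```python
import re

# A line is treated as a candidate question if it starts with a 1-3 digit
# number followed by "." or ")" and then some text, e.g. "1. ..." or "2) ...".
NUMBERED_LINE = re.compile(r"^\s*(\d{1,3})[.)]\s+(\S.*)$")


def find_numbered_questions(text):
    """Return (number, question_text) pairs for lines that look numbered."""
    questions = []
    for line in text.splitlines():
        match = NUMBERED_LINE.match(line)
        if match:
            questions.append((int(match.group(1)), match.group(2).strip()))
    return questions


# Illustrative text only, standing in for the text extracted from a PDF.
sample = """Copyright notice: illustrative questionnaire
How often have you been bothered by the following?
1. Feeling nervous or on edge
2) Trouble relaxing
Page 3 of 4
"""

found = find_numbered_questions(sample)
print(found)
```

A rule like this shows why numbered items are easier to pick up, and also why it is fragile: an unnumbered question is missed entirely, while a numbered line that is not a question (for example a numbered footnote) would be picked up by mistake.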
