We are proud to have launched our first competition on Kaggle!
The primary challenge of this competition is to develop a tool or method that can accurately extract questionnaire questions from documents, primarily PDFs.
This competition offers a unique opportunity for participants to contribute to the field of natural language processing and document analysis while developing solutions that have real-world applications. We encourage participants to think creatively, leverage available resources, and push the boundaries of current technologies.
Requirements: Python 3.10 or greater
Create an account on Kaggle.
Install Kaggle on your computer:
pip install kaggle
On the Kaggle website, download your kaggle.json file and put it in your home folder under
Download and unzip the competition data:
kaggle competitions download -c harmony-pdf-and-word-questionnaires-extract
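The download command should leave a zip archive in the current directory. Here is a minimal Python sketch of the unzip step, using only the standard library. It assumes the archive is named after the competition slug (harmony-pdf-and-word-questionnaires-extract.zip), so check the actual file name on your machine. The helper name unzip_competition_data and the target folder data are hypothetical.

```python
import zipfile
from pathlib import Path


# Hypothetical helper: extract a downloaded competition archive into dest_dir.
def unzip_competition_data(zip_path: str, dest_dir: str) -> list[str]:
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Return the archive member names so the caller can see what was unpacked.
        return sorted(archive.namelist())
```

For example, unzip_competition_data("harmony-pdf-and-word-questionnaires-extract.zip", "data") unpacks the archive into a data folder. Running unzip on the command line would do the same job.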
To generate predictions for the training data and write to train_predictions.csv:
python create_sample_submission.py train
To evaluate the train predictions:
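This section does not describe the official scoring metric. To illustrate one way extracted questions can be scored locally, here is a minimal sketch that compares one document's predicted questions against the expected ones. It uses exact-match precision, recall and F1 after lower-casing and collapsing whitespace. The function names question_f1 and _normalise are hypothetical. The competition's own metric may differ.

```python
import re


def _normalise(question: str) -> str:
    # Lower-case and collapse runs of whitespace so trivial formatting
    # differences are not counted as errors.
    return re.sub(r"\s+", " ", question).strip().lower()


def question_f1(predicted: list[str], expected: list[str]) -> float:
    # Compare as sets of normalised, non-empty question strings.
    pred = {_normalise(q) for q in predicted if q.strip()}
    gold = {_normalise(q) for q in expected if q.strip()}
    if not pred and not gold:
        return 1.0  # nothing expected and nothing predicted
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, if one of two predicted questions matches one of two expected questions, precision and recall are both 0.5, so F1 is 0.5. Exact matching is strict: a prediction that differs by a single word counts as a miss.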
To modify the prediction logic or inject your own model, you can edit the function
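Here is the kind of logic you might swap in, shown as a toy rule-based extractor. It assumes the PDF or Word file has already been converted to plain text. It keeps lines that either end with a question mark or start with an item number such as "1.", "2)" or "Q3:", and strips the numbering. The name extract_questions is hypothetical and is not the function in create_sample_submission.py. This is not the competition baseline. It will miss questions that are neither numbered nor end in "?", and it will wrongly keep numbered headings such as "1. Introduction".

```python
import re

# Matches common item prefixes such as "1.", "2)", "Q3:" or "a)" at the start
# of a line, followed by whitespace.
_ITEM_PREFIX = re.compile(r"^\s*(?:Q?\d+[.):]|[a-z][.)])\s+", re.IGNORECASE)


def extract_questions(text: str) -> list[str]:
    questions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        numbered = _ITEM_PREFIX.match(line)
        if numbered or line.endswith("?"):
            # Drop the numbering so only the question wording is kept.
            questions.append(_ITEM_PREFIX.sub("", line, count=1).strip())
    return questions
```

A real submission would more likely combine layout cues from the PDF with a trained model. A simple heuristic like this is mainly useful as a baseline to compare against.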
To generate predictions for the test data and write to submission.csv:
python create_sample_submission.py test
To submit your predictions to the competition:
kaggle competitions submit -c harmony-pdf-and-word-questionnaires-extract -f submission.csv -m "Message"
Tracking Harmony issues and raising feedback
Have you found a problem with Harmony? You can use the links on the Harmony homepage at harmonydata.ac.uk. There are buttons titled Raise NLP issue and Raise UI issue, which take you to the process for logging issues in GitHub. The issues remain in the issue trackers and can be assigned to people either under “Assignee” or in the comments.
Git and GitHub workflow
The preferred workflow for contributing to Harmony’s repository is to fork the main repository on GitHub, clone it, and develop on a new branch. Please read our general guide about contributing to Harmony. Fork the main project repository by clicking on the ‘Fork’ button near the top right of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository, see this guide.