Contributing to Harmony

Contribute to the Harmony open source NLP project

Are you a scientist, researcher, data wrangler, or language maestro? Harmony needs YOU! We’re always looking for talented individuals to join our team.

  • Contribute to our open-source code: Whether you’re a seasoned developer or a curious newbie, your contributions are valued.
  • Join the conversation: Join our Discord server, or follow us on Twitter, LinkedIn, and other social media channels.

Getting started

Participating in an open source project can be very rewarding.

Please familiarise yourself with Git. You can fork Harmony and make a pull request any time! We’re glad to have your contribution.

Video of our orientation session on how to contribute to Harmony.

Issues and bug reports

First, do a quick search to see if the issue has already been reported. If so, it’s often better to just leave a comment on an existing issue, rather than creating a new one. Old issues also often include helpful tips and solutions to common problems. You should also check the troubleshooting guide to see if your problem is already listed there.

If you’re looking for help with your code, consider posting a question on the GitHub Discussions board. Please understand that we won’t be able to provide individual support via email. We also believe that help is much more valuable if it’s shared publicly, so that more people can benefit from it.

Make your first contribution

There are lots of ways you can contribute to Harmony! You can work on code, improve the API, or add code examples.

Where do we need help in Harmony?

In particular, PDF extraction (converting PDFs to structured questionnaire items) is very hard, and we have a separate GitHub repo with examples: https://github.com/harmonydata/pdf-questionnaire-extraction

We are planning on running a hackathon focused on this aspect of the tool.

Other initiatives that could be really useful include:

We have started a new repo of manually annotated training data to improve the PDF data processing: https://github.com/harmonydata/pdf-questionnaire-extraction

One really helpful improvement would be to handle active and passive voice. For example, “child is bullied” and “child bullies others” are currently harmonised as close together, whereas they should be far apart.

Maybe a small task is easier to start with. Can you see any obvious bugs that you’d like to pick up?

We’ve been looking at integrations with other data repositories. One way forward is to make an npm package that other sites can install, making it easy for them to send data over to the app. Another is for them to interact with Harmony’s API directly.
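The active/passive voice item above could be expressed as a regression test, so that any fix can be checked automatically. The sketch below is only an illustration: the `similarity` callable is a placeholder for whatever function you use to score two questionnaire items, the paraphrase sentence is invented for comparison, and the stub scores are made up. None of these names come from the Harmony codebase.

```python
from typing import Callable

# Example pair from the text: currently scored as close, but should be far apart.
PASSIVE = "child is bullied"
ACTIVE = "child bullies others"
# Hypothetical paraphrase of the passive item, added only for comparison.
PARAPHRASE = "child is picked on by others"


def voice_is_separated(similarity: Callable[[str, str], float]) -> bool:
    """Return True if the active/passive pair scores lower than the paraphrase pair."""
    return similarity(PASSIVE, ACTIVE) < similarity(PASSIVE, PARAPHRASE)


def test_voice_is_separated_with_stub():
    # Stub scores invented for illustration; swap in a real similarity function.
    stub_scores = {
        (PASSIVE, ACTIVE): 0.2,
        (PASSIVE, PARAPHRASE): 0.9,
    }
    assert voice_is_separated(lambda a, b: stub_scores[(a, b)])
```

Comparing against a paraphrase, rather than against a fixed threshold, keeps the check independent of the scale of the similarity scores.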

Raising issues and the issue tracker

The issue list is in the GitHub repository. You can view the open issues, pick one to fix, or raise your own issue. Even if you’re not a coder, feel free to raise an issue.

Coding Harmony

Harmony is mostly coded in Python. We use the PyCharm IDE by JetBrains. Please ensure you are familiar with Python, HuggingFace, and FastAPI, or with JavaScript and React if you want to work on the front end.

Please make sure all code you commit is linted using the PyCharm default linter. Using a different one (such as VS Code’s linter, or pylint) introduces formatting-only changes that make the code history hard to follow, so please be consistent.

Unit tests and code stability

Harmony uses the pytest framework for testing. For more info on this, see the pytest documentation. For pytest to discover and run them, all test files and test functions need to be prefixed with test_.
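As a minimal sketch of the naming convention, the file below would be picked up by pytest because both the file name and the test functions start with test_. The file name and the `normalise_whitespace` helper are hypothetical and are not part of the Harmony library.

```python
# test_text_cleaning.py  (hypothetical file name; note the test_ prefix)


def normalise_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(text.split())


def test_normalise_whitespace_collapses_spaces():
    assert normalise_whitespace("  I feel   nervous ") == "I feel nervous"


def test_normalise_whitespace_handles_empty_string():
    assert normalise_whitespace("") == ""


def helper_not_collected():
    # No test_ prefix, so pytest will not run this function as a test.
    return normalise_whitespace("a  b")
```

Running `pytest` in the folder containing this file should collect the two test_ functions and ignore the helper.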

The Harmony Python library https://github.com/harmonydata/harmony contains the core Harmony functionality, and most of the logic is in this repo. It has unit tests which run automatically on commits to main.

However, the Harmony API repo https://github.com/harmonydata/harmonyapi uses the Harmony Python library as a submodule. When you update the Python library, please run the unit tests and integration tests in the API repo to check nothing is broken - including the Selenium tests which test the browser app end to end. You will need to install Selenium to run the tests.

Since the API repo includes the Python library as a submodule, when you update the Python library, you will need to update the submodule (in the harmonyapi repo, cd into the submodule folder and do git pull, then cd out and do git add, commit and push). We recommend you familiarise yourself with Git submodules.
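The submodule update steps above can be sketched as a small script. This is an illustration under assumptions, not an official Harmony tool: the submodule folder name is a placeholder you must replace with the real path inside your harmonyapi checkout, and the commit message is only an example. By default the script prints the commands without running them.

```python
import subprocess
from pathlib import Path


def submodule_update_commands(submodule_dir, message):
    """Return (command, working directory) pairs in the order described above.

    Working directories are relative to the root of the harmonyapi checkout.
    """
    return [
        (["git", "pull"], submodule_dir),          # inside the submodule folder
        (["git", "add", submodule_dir], "."),      # back in the API repo root
        (["git", "commit", "-m", message], "."),
        (["git", "push"], "."),
    ]


def run(commands, repo_root, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd, cwd in commands:
        workdir = Path(repo_root) / cwd
        print(f"[{workdir}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    # "path/to/submodule" is a placeholder, not the real folder name.
    steps = submodule_update_commands("path/to/submodule", "Update Harmony submodule")
    run(steps, repo_root=".", dry_run=True)
```

Note that Git often leaves a submodule on a detached HEAD, in which case `git pull` will refuse to run until you check out a branch inside the submodule folder.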

Finally, the app repo https://github.com/harmonydata/app is the React front end. Please also check that you can run this repo locally before you start contributing. To point the front end at a local copy of the API repo, edit the .env file so that it points to http://localhost:8000.
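Before starting the front end against a local API, it can help to confirm that something is actually listening on the address in .env. The following is a minimal sketch using only the Python standard library; the `is_listening` helper is hypothetical, and it only checks that the port accepts connections, not that the process behind it is the Harmony API.

```python
import socket
from urllib.parse import urlsplit


def is_listening(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(base_url)
    host = parts.hostname or "localhost"
    # Fall back to the scheme's default port when none is given in the URL.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    url = "http://localhost:8000"
    state = "reachable" if is_listening(url) else "not reachable"
    print(f"{url} is {state}")
```

If the address is reported as not reachable, start the API repo locally first and then launch the front end.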

Pull requests

If you’d like to contribute to this project, you can contact us at https://harmonydata.ac.uk/ or make a pull request on our GitHub repository. You can also raise an issue.
