The term ‘harmonisation’ is used in many different contexts, often interchangeably with related ideas such as collaboration, coherence, alignment, integration and partnership. We might argue, however, that in regional cooperation these concepts do little more than indicate the extent and scale of integration among different entities.
The degree of interaction between the players involved runs deeper and tighter as we move from collaboration, partnership and cooperation towards integration, community, harmonisation and interdependence. We might also infer that integration of any kind is a process made up of multiple steps, each depending on the level of commitment of the actors involved.
The term ‘harmonisation’ has been used in policy documents for well over a decade. For example, “harmonisation” was a chief element of the Sorbonne Declaration of May 1998, a joint declaration on harmonising the architecture of the European higher education system and a precursor to the EHEA (European Higher Education Area).
The African Union has developed a similar framework to harmonise its higher education systems, and its policy documentation explicitly refers to the process as “harmonisation”.
The data harmonisation process in education often involves narrowing all kinds of variance – i.e. variance in processes, structural factors, quality standards, qualification frameworks, credits, degree cycles, and so on. Data harmonisation for education may also be seen as the process of creating frameworks through which the relationships of actors involved in the international relations field can be governed.
No matter what an academic institution or education business sets out to do via data harmonisation in education, the purpose of all efforts is to converge on the common element of achieving identical standards or regulations which enrich and promote local as well as global achievements. It also implies greater access to reliable and transparent information, greater networking among all stakeholders in education, and sharing of best practices models with a view toward improving sharing of resources and inter-regional mobility, just to name a few.
Does this sound too complex or overly technical at a glance? It isn’t, especially once you know the context in which you want to use data harmonisation for education.
Let’s continue with a practical example:
Surveys and questionnaires are a time-tested data source because they are a standardised form of acquiring the necessary data. This leads to a highly structured and uniform questionnaire which can be presented as a standardised interview.
However, some processes and data are either difficult to standardise or cannot be standardised through survey operations, questionnaires or standardised interview administration. What’s actually needed is further standardisation, which can be achieved via data processing.
Efforts focused on standardising inputs and outputs in comparative surveys are called ‘harmonisation’. The measurement of educational attainment within cross-national surveys, for instance, requires data harmonisation procedures with robust data processing practices at the core. Educational systems and their qualifications differ drastically across countries, the names of qualifications often cannot be reliably translated between languages, and similar terms may be used for quite different levels of education.
No terminology is equally well understood by respondents from different countries. When measuring levels of education, it is therefore not advisable to design a ‘single source’ measurement instrument and translate it into the languages needed for the survey (input harmonisation).
Instead, comparative research relies on country-specific measures of a person’s educational attainment. Once the data has been collected, the resulting country-specific variables are harmonised by recoding them into a common, universally understood standard. This is referred to as output harmonisation.
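To make the recoding step concrete, here is a minimal Python sketch of output harmonisation. It assumes a small hand-written crosswalk from country-specific qualification labels to ISCED 2011 levels; the `CROSSWALK` table, the `harmonise_education` helper and the sample responses are hypothetical and only cover a handful of qualifications, so treat it as an illustration rather than an official mapping.

```python
# Illustrative crosswalk from (country, qualification label) to an
# ISCED 2011 level. This is a tiny hand-picked sample, not an official
# or complete mapping.
CROSSWALK = {
    ("DE", "Hauptschulabschluss"): 2,
    ("DE", "Abitur"): 3,
    ("GB", "A-level"): 3,
    ("DE", "Bachelor"): 6,
    ("GB", "Bachelor's degree"): 6,
    ("GB", "Master's degree"): 7,
}


def harmonise_education(country, qualification):
    """Return the harmonised level, or None if no mapping is known."""
    return CROSSWALK.get((country, qualification))


# Hypothetical survey responses using each country's own wording.
responses = [
    {"id": 1, "country": "DE", "education": "Abitur"},
    {"id": 2, "country": "GB", "education": "A-level"},
    {"id": 3, "country": "GB", "education": "Master's degree"},
    {"id": 4, "country": "DE", "education": "Meister"},  # not in the sample crosswalk
]

# Recode after data collection: keep the original answer and add the
# harmonised target variable next to it.
harmonised = [
    {**r, "isced_level": harmonise_education(r["country"], r["education"])}
    for r in responses
]

for row in harmonised:
    print(row)
```

Keeping the original answer alongside the harmonised value means unmapped responses (here the one with `isced_level` set to `None`) can be reviewed and the crosswalk extended later.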
For surveys that are specifically designed for the purpose of comparative research – i.e. true cross-national surveys – data harmonisation in education is already integrated into the design phase of the survey and not just limited to the data processing aspect (referred to as ex-ante output harmonisation). In such surveys, the harmonised target variables are designed before creating the national questionnaires, along with the designing of measurement instruments pertaining to the different countries and the coding rules for harmonisation (prior to data collection).
This is a good way to ensure that every kind of education intended to be distinct and coded internationally will be easily identifiable in the country-specific questionnaire items.
Other surveys are not designed to be comparable from the outset. Researchers may nonetheless want to combine data from different surveys, for example to increase variation at the national level for a specific set of research questions, or to increase the sample size in order to study specific groups. In such a situation, an ex-post data harmonisation approach is taken, as opposed to the ex-ante output harmonisation approach discussed above.
In this approach, the data to be pooled is first adjusted, which results in a single integrated dataset with coherent target variables. This applies both to surveys conducted in different countries and to surveys conducted in a single country, even though they were designed completely independently of each other.
The data is then made comparable after the data collection process by recoding variables related through the same underlying concept but originating from different measurement instruments – into a common, universally identifiable standard. In this case, the degree of flexibility allowed is limited by the information collected through the different surveys. As one might imagine, harmonisation in education surveys or otherwise can become difficult.
Having said that, education categories in some surveys are more detailed or better documented within specific datasets, compared to others – thus, making ex-post harmonisation more challenging and the results more limited, as opposed to ex-ante harmonisation.
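The limitation described above can be sketched in a few lines of Python. The example assumes two hypothetical surveys: survey A recorded a detailed ISCED level, while survey B only recorded three broad bands. The grouping used here (levels 0 to 2 as "low", 3 to 4 as "medium", 5 to 8 as "high") is a common convention but is still an assumption of this sketch, as are all variable names.

```python
def isced_to_band(level):
    """Collapse a detailed ISCED level (0-8) into a broad band."""
    if level <= 2:
        return "low"
    if level <= 4:
        return "medium"
    return "high"


# Survey A measured education in detail; survey B only in three bands.
survey_a = [{"id": "a1", "isced": 3}, {"id": "a2", "isced": 7}]
survey_b = [{"id": "b1", "edu3": "low"}, {"id": "b2", "edu3": "high"}]

# The pooled target variable can only be as detailed as the coarsest
# source, so survey A loses information when it is recoded.
pooled = [
    {"id": r["id"], "source": "A", "edu_band": isced_to_band(r["isced"])}
    for r in survey_a
] + [
    {"id": r["id"], "source": "B", "edu_band": r["edu3"]}
    for r in survey_b
]

for row in pooled:
    print(row)
```

Recording the `source` of each row keeps the pooled dataset traceable back to the original survey.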
Nearly all surveys cover education as a core social background variable, and it is used in most statistical analyses, yet it is difficult to harmonise. To support ex-post harmonisation projects whose goal is to retain as much information from the original data as possible, a proper harmonisation framework needs to be in place for harmonised educational attainment variables as ‘target’ variables.
The Harmony app builds on the ISCED (International Standard Classification of Education) standard, while extending it for use in survey data where ex-post harmonisation is required. The framework includes a coding scheme called GISCED (Generalised International Standard Classification of Education). It builds on experience from ex-ante output harmonisation in education for comparative surveys and complements the concepts underlying ISCED in order to better represent stratified education systems, which still exist in a number of EU countries.
In the above section, we explored data harmonisation in education from a technical standpoint, specifically, in the context of surveys. However, in a broader sense, data harmonisation for education can provide a clear edge to educational and academic institutions, especially as competition gets fiercer and the technology used to deliver education continues to evolve at a rapid pace.
We’ve already discussed how data harmonisation is used to unify disparate data fields, columns, formats and dimensions into a composite dataset. In order for an academic institution to succeed, its users must have democratic access to ‘clean’ and high-quality data where data formats are agreed upon and the underlying data itself is not complex to decipher.
Data harmonisation in education draws information from diverse sources, clears away errors, and presents the finished data as a unified and accurate body of information. We discussed one of these sources, surveys, in the section above. When every user in the organisation has access to a single window of critical knowledge, it facilitates faster, more informed decision-making at every level of the organisational hierarchy.
With unaligned and disparate data, on the other hand, the process of extracting meaningful insights and trends, or identifying pain points, for example, becomes difficult. But once you use the right tool to clean, sort and aggregate the data – in other words, harmonise it – it can easily provide users with a complete picture on which to make decisions.
In plain and simple terms, education stakeholders and businesses must understand that data harmonisation for education can notably boost the underlying value and utilisation of data. It makes it possible to transform large chunks of inaccurate and fragmented data into workable and usable information – all of which can help to create new insights, analyses, and visualisations – which can help everyone involved – from the students, teachers and professors to the institutional owners and stakeholders.
Ultimately, data harmonisation in education can help the end users (in this case, all non-student entities) reduce the time it might take to discover key insights, for example, or access the right business intelligence, or detect early market disruptions. It can also significantly lower the overall cost involved in complex data analysis, not to mention the cost of handling and processing data over the long term. If an academic business is spending less time struggling and scrambling to find the appropriate source of data, then it can definitely spend that time more efficiently elsewhere – such as improving student and teacher-specific services or expanding operations and making a more desirable revenue impact.
Whether an academic institution has been around for decades or is just starting out, it will gather a myriad of data over the course of its existence, irrespective of its business model. There is also a real possibility that the large volumes of information it gathers from multiple sources will contain errors and misinformation. Additionally, the sheer volume of information that a business may collect throughout its lifespan can be overwhelming and unwieldy.
With the right data harmonisation tools in your arsenal, such as the Harmony app, your data can be a priceless mine of business intelligence and insights. Educational institutions can learn things about their customers that they didn’t know before. They can discover changing market forces which can help them prepare better for the future. They can even gain insights on their competitors in order to better tailor their strategy.
The trend of education businesses mining and storing data to make more informed decisions and better manage their customers is growing. But, of course, you need to use the right tool or system to harmonise that data to be able to benefit from it.
In the education sector all across the world, and not just the UK, you’ll find organisations dishing out huge amounts of time, money, and resources on commissioning surveys, gathering information from social media networks, news channels, and other sources on the internet, as well as conducting lengthy focus group sessions. When this information comes to the respective businesses, it almost always comes in a mish-mash of raw, unstructured data, and not data that’s presented in a single, cohesive and manageable body.
Furthermore, to make sense of entire datasets as a whole, the data must be harmonised, because unharmonised, raw data simply isn’t suitable or usable for any kind of business analysis. Even if you try to make sense of it in its raw form, you will likely come across nothing more than misleading values, irrelevant pointers, and duplicate statistics.
But when academic businesses utilise effective data harmonisation methods, they can standardise their data, creating a single source of verifiable and usable information – which can then be used for many of the purposes stated above, and more.
As we mentioned earlier, the data you gather can be a gold mine of business intelligence and insights – but only if you process it correctly using a powerful tool built for data harmonisation in education. When you align, verify, and clear up inconsistencies in your data mine, you can interpret it successfully and use it with confidence to fuel growth, drive higher revenues, meet stakeholder objectives, etc. but that’s just the tip of the iceberg.
You see, clean and verifiable data is extremely important for confident business decision-making and for strategising within your educational institution. Unless you can trust the source and veracity of your data, you will naturally hesitate to make tough, crucial or timely decisions, because a wrong call could affect your business as a whole very adversely. That hesitation not only holds up critical decisions but also stunts your business’s growth trajectory. All this combined will impact revenue negatively and may lead to ill-informed and ill-timed decisions, lacklustre production numbers and possibly a loss in market share.
The Harmony app is built to help you avoid all that, by way of accessing the business-critical insights you need to plan your next steps – essentially providing insights into trends, patterns, and opportunities before your competitors have a chance to discover them.
Once your master data has been gathered and is in place, it can be reused as often as you like across different departments within the education system to harmonise each department’s data on a regular basis. Incremental data updates will also improve the quality of that master data.
When you have a single source of accurate data, your education teams and departments will not be burdened with the task of developing their own datasets – which as you may already know, can be conflicting, expensive, and prone to errors. Therefore, teams from different departments within your academic organisation like sales, HR, marketing, and operations, will always benefit by using a single harmonised data set.
The first step is to homogenise and organise your data:
Data is mostly gathered from multiple sources, with each data origin point having its own unique format and structure. This is why we need to homogenise our data into the same format so that we can create hierarchies.
Proper data alignment will mean a standard language and hierarchy for everything in your education system, such as brands and products, geography (regions, cities, and areas of operation), currencies, time frames, channels, advertising & sales campaigns, customers, reviews, feedback, and transactions – and so on.
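As a rough illustration of this first step, the following Python sketch brings two records from different sources into one format. The field names, the `FIELD_ALIASES` table and the two date formats are assumptions made for the example; real sources will need their own alias tables and parsing rules.

```python
from datetime import datetime

# Hypothetical alias table: every source-specific field name is mapped
# to one agreed name.
FIELD_ALIASES = {
    "student_id": "student_id",
    "StudentID": "student_id",
    "enrolled": "enrol_date",
    "EnrolmentDate": "enrol_date",
}

# Date formats we assume the sources use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known format and return the date as an ISO 8601 string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {text!r}")


def homogenise(record):
    """Rename fields, then bring values into a single format."""
    out = {FIELD_ALIASES[key]: value for key, value in record.items()}
    out["student_id"] = str(out["student_id"]).strip()
    out["enrol_date"] = parse_date(out["enrol_date"])
    return out


source_a = {"student_id": " 1001 ", "enrolled": "2023-09-18"}
source_b = {"StudentID": 1002, "EnrolmentDate": "18/09/2023"}

unified = [homogenise(source_a), homogenise(source_b)]
print(unified)
```

Raising an error on an unknown date format, rather than guessing, keeps badly formatted values from slipping silently into the homogenised data.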
The second step is to create and build an entire information model:
After successfully harmonising your data, you need to integrate it into an aggregate information model, allowing users to access a general view of your products and the smaller, finer details of those products. The model is in a regular and logical format, allowing you to see correlations between different data categories – from sales and advertising to production and distribution, for example.
You can also create specific metrics for future analysis, so that your users can pick up anomalies before the data is sent for analysis.
In the third step, we extract, transform, and load the data, a process known as ETL:
We must now move the data into a common dataset, which is done in three stages. Extraction gathers the data from the original sources. Transformation converts the format so that the data is ready for analysis. Loading writes the data to the designated dataset.
ETL may cause pressing issues during the final data integration process because all it takes is a single error to throw the entire system out of whack.
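The three stages can be sketched with Python’s standard library alone. This is a minimal example under stated assumptions: the CSV text, the `scores` table and the function names are invented for illustration, and an in-memory SQLite database stands in for the designated dataset. The load step runs inside a transaction so that a single failing row does not leave the target half-written.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from one source system.
RAW_CSV = """student,score
Ada,81
Ben,n/a
Cy,67
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and drop rows that cannot be converted."""
    clean = []
    for row in rows:
        try:
            clean.append((row["student"].strip(), int(row["score"])))
        except ValueError:
            continue  # skip rows whose score is not an integer
    return clean


def load(conn, rows):
    """Load: write all rows in one transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT INTO scores (student, score) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT PRIMARY KEY, score INTEGER)")

load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT student, score FROM scores ORDER BY student"
).fetchall()
print(loaded)
```

In this sketch the unparseable row is silently skipped; in practice you would probably log or quarantine such rows so they can be corrected at the source.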
Alternatively, you can make use of data virtualisation, which creates a layer that allows applications to access, retrieve, and manipulate data as and when needed. It brings the information together under a single virtual location. Unlike ETL, it gives users real-time access, and it is often more accurate and cost-effective.
Choosing the right approach for data harmonisation in education is very important – data scientists and analysts should be focusing more on adding real value to the business by extracting insights and value from the data, rather than primarily focusing on micromanaging data movement.
In the fourth step, we will perform data cleansing:
This process involves correcting or removing faulty, inconsistent, or inaccurate data from a dataset. Think of it as the data analyst giving the data an overhaul or makeover. Examples of data cleansing include removing duplicate records and correcting misspelled names.
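Both examples, duplicates and misspellings, can be illustrated with a short Python sketch. It assumes a hypothetical list of valid course names and uses the standard library’s `difflib.get_close_matches` for fuzzy matching; the 0.8 similarity cutoff is an arbitrary choice for this example and would need tuning on real data.

```python
from difflib import get_close_matches

# Hypothetical reference list of valid course names.
KNOWN_COURSES = ["Mathematics", "Physics", "History"]


def clean_course(name):
    """Return the closest known course name, or None if nothing is close."""
    candidate = name.strip().title()
    match = get_close_matches(candidate, KNOWN_COURSES, n=1, cutoff=0.8)
    return match[0] if match else None


def cleanse(rows):
    """Correct misspellings, drop duplicates and set aside unusable rows."""
    seen, clean, rejected = set(), [], []
    for student, course in rows:
        fixed = clean_course(course)
        if fixed is None:
            rejected.append((student, course))  # keep for manual review
        elif (student, fixed) not in seen:
            seen.add((student, fixed))
            clean.append((student, fixed))
    return clean, rejected


raw = [
    ("s1", "Mathematics"),
    ("s1", "mathematics "),    # duplicate once case and spacing are fixed
    ("s2", "Phyiscs"),         # misspelling
    ("s3", "Basket weaving"),  # no close match in the reference list
]

clean, rejected = cleanse(raw)
print(clean)
print(rejected)
```

Rows that cannot be matched are returned separately instead of being deleted, so nothing disappears without a person having a chance to check it.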
In step five, we carry out data normalisation:
Data harmonisation and data normalisation work toward more or less the same thing: they make the basic aspects of data uniform. For instance, normalisation enables a video and a tweet, which have different formats, to exist in the same dataset without any compatibility issues arising as a result.
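One way to picture the video and tweet example is a shared record type that both formats are converted into. The Python sketch below is only an illustration: the `ContentItem` schema and the input field names (`timestamp`, `user`, `body`, `uploaded`, `channel`, `title`) are hypothetical and do not correspond to any real platform’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ContentItem:
    """Common schema shared by every kind of content."""
    source: str
    author: str
    text: str
    created_at: str  # ISO 8601 timestamp in UTC


def from_tweet(tweet):
    """Normalise a tweet-like record that stores a Unix timestamp."""
    created = datetime.fromtimestamp(tweet["timestamp"], tz=timezone.utc)
    return ContentItem(
        "tweet", tweet["user"].lower(), tweet["body"], created.isoformat()
    )


def from_video(video):
    """Normalise a video-like record that stores a YYYY-MM-DD date."""
    created = datetime.strptime(video["uploaded"], "%Y-%m-%d").replace(
        tzinfo=timezone.utc
    )
    return ContentItem(
        "video", video["channel"].lower(), video["title"], created.isoformat()
    )


items = [
    from_tweet({"timestamp": 0, "user": "UniNews", "body": "Open day soon"}),
    from_video({"uploaded": "2024-01-15", "channel": "UniTV", "title": "Campus tour"}),
]

for item in items:
    print(item)
```

Once both kinds of record share the same fields and the same timestamp format, they can be sorted, filtered and stored together.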
The last step involves making classifications:
To explain it in a very basic way, classifications allow your users to segment the data, filter it on an as-and-when needed basis, and then extract the necessary information. It’s similar to the headings you see at the top of an Excel spreadsheet.
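A small Python sketch shows how classification fields support segmenting and filtering. The records and the `department` and `level` fields are invented for the example, and the two helper functions are hypothetical names rather than part of any particular tool.

```python
from collections import defaultdict

# Hypothetical harmonised records; "department" and "level" act as the
# classification fields, much like spreadsheet column headings.
records = [
    {"student": "Ada", "department": "Science", "level": "undergraduate"},
    {"student": "Ben", "department": "Arts", "level": "postgraduate"},
    {"student": "Cy", "department": "Science", "level": "postgraduate"},
]


def segment(rows, key):
    """Group rows by the value of one classification field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def filter_rows(rows, **criteria):
    """Keep only rows that match every given classification value."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]


by_department = segment(records, "department")
science_postgrads = filter_rows(
    records, department="Science", level="postgraduate"
)

print(sorted(by_department))
print([row["student"] for row in science_postgrads])
```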
If you’re not harmonising your data, you may be risking a lot. In today’s fiercely competitive and hyper-digital world, access to quality, meaningful, and usable data is everything.
Without data harmonisation for education, your business model strategists will not have a clear and authentic picture of sales, trends, and other essential business metrics. Plus, it’s going to be very hard to see an overall view of your data or drill down to extract the necessary micro insights, for that matter.
More importantly though, your academic institution’s management team can easily miss out on opportunities or potential disruption because the data they are viewing is largely disorganised, widespread, and available in many different disparate forms. If you’re going to make decisions based on data in this state (raw and unharmonised), you could be making a lot of erratic and expensive decisions, causing you to lose sales, opportunities to grow, and potentially even risk the entire business.
With the above said, and everything else we’ve discussed in the article thus far, data harmonisation in education can be a time-consuming and exhaustive process. Without the right tool, you could go in circles and never find the insights you need to strategise and plan your next move.
Harmony makes data harmonisation in education a seamless step in the entire data analytics process, helping your management team focus on finding the appropriate insights which will drive your education business forward.