A data science approach to EU differentiated integration

The EU has adopted around 180,000 laws in its 62 years of history, shaping the lives of European citizens, from food to trade, from privacy to roaming, from banking to migration. In the context of the TRIGGER project, we have collected a dataset of more than 148,000 of these EU laws and have started experimenting with a machine-based approach to analyse them.

Our first testing ground is the so-called ‘differentiated integration’ of the European Union. A differentiation in European law can be defined as “a provision that formally exempts at least one member state from applying a legal rule otherwise valid for all EU member states” (Duttle et al. 2017). Differentiation has become increasingly relevant to EU integration: as the EU’s membership and competences have grown, it has become ever more necessary to offer member states flexibility and exemptions in certain policy areas. Since the Maastricht Treaty, member states have been able to negotiate specific arrangements within EU law that allow them to opt in to or out of certain policy areas, based on the complexity of their internal governance structures (Denmark and the UK being prime examples). Differentiated integration thus refers to these country-specific arrangements enshrined in EU law. While differentiated integration has been the EU’s go-to remedy for overcoming political stalemates, it raises important questions about the EU’s internal cohesion and capacity to act. A better understanding of this phenomenon is therefore crucial for assessing the EU’s overall effectiveness and actorness.

What can data science contribute to the study of differentiated integration? We share our approach and initial findings below.

Why are large-scale studies of differentiated integration so difficult?

Identifying differentiated integration means identifying cases in which member states opt out of specific policies. Exemptions can be specified either in primary law (the Treaties) or in secondary law (individual regulations, directives and decisions). The former case is relatively straightforward: it suffices to find each member state’s opt-outs, as defined in a limited number of annexes and articles, and count them per country. Secondary law, however, is much more challenging: the sheer number of legal texts makes manually identifying and counting opt-out rights a daunting task.

Given the complexity of the exercise, researchers have so far had to rely on alternative approaches, in particular qualitative case studies. Over the years, researchers have selected specific cases of differentiated integration and investigated why they occurred and what consequences they have had. Larger-scale analyses have relied on manually interpreting a sample of individual laws to identify cases of differentiated integration. Given the sheer size of the EU acquis, however, these manual analyses have been limited to small samples, up to a few thousand laws in some cases. Imagine having to skim through the entire corpus of EU laws to find exemptions: even in the unlikely case that each law took only 5 minutes to analyse, covering all 180,000 would take 625 days without sleep.
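The 625-day figure follows from simple arithmetic. The short Python sketch below reproduces it, using only the two numbers given in the text (180,000 laws and an optimistic 5 minutes per law) and assuming reading continues around the clock.

```python
# Back-of-the-envelope estimate: manually reading the entire corpus of EU laws.
N_LAWS = 180_000        # approximate number of EU laws adopted (from the text)
MINUTES_PER_LAW = 5     # optimistic reading time per law (from the text)

total_minutes = N_LAWS * MINUTES_PER_LAW
total_hours = total_minutes / 60
total_days = total_hours / 24   # assumes continuous reading, no sleep or breaks

print(f"{total_minutes:,} minutes = {total_hours:,.0f} hours = {total_days:,.0f} days")
# prints: 900,000 minutes = 15,000 hours = 625 days
```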

Can ‘data science’ come to the rescue?

Thanks to modern data science tools, manual text analysis is no longer the only option. In TRIGGER, we are experimenting with a new approach to analysing EU secondary law and, thereby, differentiated integration. Our approach combines social science with data science. More specifically, we are using the programming language R to collect a dataset containing the metadata of around 148,000 European laws (regulations, directives, decisions) along with their full legal texts in a machine-readable format. This ‘big’ dataset opens entirely new possibilities for studying European law and governance, including differentiated integration. In addition, we carried out an extensive literature review on the topic to identify indicators of differentiated integration. We then wrote a script in the programming language Python that automatically detects patterns of differentiated integration in these laws based on a customised keyword search, allowing us to map differentiated integration across the history of EU law-making.
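To illustrate what a rules-based keyword search of this kind can look like, here is a minimal Python sketch. It is not the project’s script: the member-state list and the three regular expressions are hypothetical indicators chosen for illustration, whereas the actual keyword list was derived from the literature review and is not reproduced here.

```python
import re

# HYPOTHETICAL indicator patterns for illustration only; the project's actual
# keyword list was derived from a literature review and is not shown here.
MEMBER_STATES = r"(Denmark|Ireland|the United Kingdom|Poland|Sweden)"

DI_PATTERNS = [
    # e.g. "Denmark is not taking part in the adoption of this Regulation"
    re.compile(rf"{MEMBER_STATES}\s+(is|are)\s+not\s+(taking\s+part|bound)", re.IGNORECASE),
    # e.g. "Article 3 shall not apply to Ireland"
    re.compile(rf"shall\s+not\s+apply\s+to\s+{MEMBER_STATES}", re.IGNORECASE),
    # e.g. "Protocol No 22 on the position of Denmark"
    re.compile(r"Protocol\s+(\(No\s*\d+\)|No\s*\d+)\s+on\s+the\s+position\s+of", re.IGNORECASE),
]


def find_di_matches(text: str) -> list[str]:
    """Return every text snippet matched by an indicator pattern."""
    return [m.group(0) for p in DI_PATTERNS for m in p.finditer(text)]


def detect_di(text: str) -> bool:
    """Flag a legal text as containing differentiated integration if any pattern matches."""
    return bool(find_di_matches(text))


if __name__ == "__main__":
    sample = "Denmark is not taking part in the adoption of this Regulation and is not bound by it."
    print(detect_di(sample), find_di_matches(sample))
```

Returning the matched snippets, not just a yes/no flag, keeps the rules auditable: a reviewer can check exactly which phrase triggered each classification.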

Figure 1. Differentiated integration over time (preliminary results)

Figure 1 presents the first preliminary results of our approach, showing how differentiated integration has developed over time. For each year, the figure shows the share of laws adopted that contain some form of differentiated integration according to the Python script. We are currently finalising the analysis and will publish a more granular and comprehensive account of the method and findings in the coming months.
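The yearly share plotted in Figure 1 is a simple ratio: laws flagged in a year divided by all laws adopted that year. A minimal Python sketch of that aggregation, assuming each law has already been reduced to a hypothetical `(year, contains_di)` pair; the toy records below are invented for illustration and are not project data.

```python
from collections import defaultdict


def share_by_year(records):
    """records: iterable of (year, contains_di) pairs, one per law.

    Returns {year: share of that year's laws flagged as differentiated},
    ordered by year.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for year, contains_di in records:
        totals[year] += 1
        flagged[year] += int(contains_di)
    return {year: flagged[year] / totals[year] for year in sorted(totals)}


# Toy records for illustration only; these are NOT results from the dataset.
toy = [(1995, True), (1995, False), (1995, False), (1995, False),
       (2005, True), (2005, True), (2005, False), (2005, False)]
print(share_by_year(toy))  # {1995: 0.25, 2005: 0.5}
```

Using a share, not a raw count, keeps years comparable even though the number of laws adopted varies considerably from year to year.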

Quantitative text mining is a standard approach in the data science literature but, to our knowledge, has never been applied to a large dataset of EU laws in European affairs research. It offers three distinct advantages over traditional manual approaches. First, scalability: once the analysis code is written, it can process hundreds of thousands of laws in minutes. Even as our dataset grows beyond 148,000 laws, analysing it is a matter of minutes of computer processing rather than years of human labour. Second, transparency: traditional interpretation of legal texts often rests on implicit assumptions that are hard for external reviewers to verify. The machine-based approach translates human interpretation into code, making those assumptions explicit and open to verification (this holds for our rules-based matching approach, less so for approaches based on machine learning). Third, reproducibility: the same code applied to the same dataset yields the same results, allowing others to reproduce them exactly (and improve on them).

It is important to stress, however, that machine-based approaches do not replace human expertise in social science methods. If the coder lacks a deep understanding of the research question, the resulting code will be of no value. Code is a crystallisation of human interpretation, not a replacement for it. Rather than removing assumptions and bias, the machine-based approach writes them into code. Qualitative research is therefore a necessary first step that must inform software development, and a necessary final step for validating and contextualising the results. Moreover, machine-based analyses have one key weakness: precision. Even the best software available today produces more superficial results than human interpretation. For smaller amounts of data, qualitative research will always produce more comprehensive results.

Quo vadis, differentiated integration analysis?

We are confident that our proposed approach will produce new empirical insights into differentiated integration and many other research questions related to European law. If it is to inspire action, however, empirical research should be complemented by normative reflections: How much differentiation is good for the EU? Does differentiation increase fragmentation instead of unity in diversity? If differentiated integration increased fragmentation, how could the EU reduce it? In TRIGGER, for example, we try to answer the question: how does differentiated integration affect the EU’s actorness and effectiveness in a variety of global governance settings? These are questions only humans can answer. By significantly speeding up and improving quantitative research, however, data science gives us more time to focus on these questions and adds a significant piece to the overall puzzle.

Authors: Ahmad Wali Ahmad Yar, Camille Borrett, Moritz Laurer



Thomas Duttle, Katharina Holzinger, Thomas Malang, Thomas Schäubli, Frank Schimmelfennig & Thomas Winzen (2017) Opting out from European Union legislation: the differentiation of secondary law, Journal of European Public Policy, 24:3, 406-428, DOI: