New DIE-GDI Briefing Paper: Digitalization for supporting refugees

I published a new DIE-GDI Briefing Paper today on how digital tools can be used by donors, and refugees themselves, to manage and support safe resettlement processes. Feel free to share, and expect the German-language version in the next week or so!


Where Are the Legislators (Who Ostensibly Pay for Data)?

I watched from a distance on Twitter as the World Bank hosted its annual data event. I would love to have attended – the participants were a pretty amazing collection of economists, data professionals and academics. This tweet seemed to resonate with a theme I’ve been focused on the last week or so: There is a data shortage such that even the most advanced countries can’t measure the Sustainable Development Goals (SDGs).

The European Statistical System can only produce around 1/3 of #SDG indicators, according to Pieter Everaers of @EU_Eurostat #ABCDEwb — Neil Fantom (@neilfantom) June 21, 2016

I replied to this tweet with a query about whether there was evidence of political will among EU member states to actually collect this data. In keeping with the “data is political” line that I started on last week, political will is important because the European Statistical System relies heavily on EU member states’ statistics offices to provide data. The above tweet highlights two things for me – there needs to be a conversation about where the existing data comes from, and there need to be MPs or MEPs (legislative representatives) at meetings like the World Bank’s annual data event.

Since Eurostat and the European Statistical System were the topic of the tweet, I’ll focus on how they gather statistics. Most of my expertise is in their social and crime stats so I’ll speak to those primarily, but it’s important to note that the quality and quantity of any statistic is based on its importance to the collector and end user. Eurostat got its start as a hub for data on the coal and steel industries in the 1950s, and while its mandate has grown significantly the quality and density of the economic and business indicators hosted on its data site reflect its founding purpose. Member states provide good economic data because states have decided that trade is important – there is a compelling political reason to provide these statistics. Much of this data is available at high levels of granularity, down to the NUTS 3 level. It’s mostly eye-wateringly boring agricultural, land use, and industrial data, but it’s the kind of stuff that’s important for keeping what is primarily an economic union running smoothly(-ish).

If we compare Eurostat’s economic data to its social and crime data, the quality and coverage decrease notably. This is when it’s important to ask where the data comes from and how it’s gathered – if 2/3 of the data necessary to measure the SDGs isn’t available for Europe (let alone say, the Central African Republic) we need to be thinking clearly about why we have the data we have, and the values that undergird gathering good social data. Eurostat statistics that would be important to measuring the SDGs might include the SILC surveys that measure social inclusion, and general data on crime and policing. The SILC surveys are designed by Eurostat and implemented by national statistics offices in EU member states. The granularity and availability varies depending on the capacity of the national stats office and the domestic laws regarding personal data and privacy. For example, some countries run the SILC surveys at the NUTS 2 level while others administer them only at the national level. A handful of countries, such as France, do the surveys at the individual level and produce panel data. The problem is that the SILC data has mixed levels of availability due to national laws regarding privacy – for example, if you want the SILC panel data you have to apply for it and prove you have data storage standards that meet France’s national laws for data security.

Crime and police data is even more of an issue. Eurostat generally doesn’t collect crime data directly from member states. They have an arrangement with the UN Office on Drugs and Crime where crime and police data reported to the UN by EU member states gets passed to Eurostat and made available through their database. One exception is a dataset of homicide, robbery and burglary in the EU from 2008-2010 that is disaggregated down to the NUTS 3 level. When I spoke with the crime stats lead at Eurostat about this dataset he explained that it was a one-off survey in which Eurostat worked with national statistics offices to gather the data; in the end it was so time consuming and expensive that it was canceled. Why would such a rich data collection process get the axe? Because it’s an established fact that crime statistics can’t be compared across jurisdictions due to definitional and counting differences. So funders reasonably asked: What’s the point of spending a lot of money and time collecting data that isn’t comparable in the first place?

A key problem I see in the open data discussion is a heavy focus on data availability with relatively little focus on why the data we have exists in the first place, and by extension what would go into gathering new SDG-focused data (e.g. the missing 2/3 noted in the opening tweet). Some of this is driven by, in my opinion, an over confidence in/fetishization of ‘big data’ and crowdsourced statistics. Software platforms are important if you think the data availability problem is just a shortage of capacity to mine social networks, geospatial satellite feeds and passive web-produced data. I’d argue though that the problem isn’t collection ability, and that the focus on collection and validation of ‘big data’ distracts from the important political discussion of whether societies value the SDGs enough to put money and resources into filling the 2/3 gap with purpose-designed surveys instead of mining the internet’s exhaust hoping to find data that’s good enough to build policy on.

I’m not a Luddite crank – I’m all for using technology in innovative ways to gather good data and make it available to all citizens. Indeed, ‘big data’ can provide interesting insights into political and social processes, so finding technical solutions for managing reams and reams of it are important. But there is something socially and politically important about allocating public funds for gathering purpose-designed administrative statistics. When MPs, members of Congress, or MEPs allocate public funds they are making two statements. One is that they value data-driven policy making; the other, more important in my opinion, is that they value a policy area enough to use public resources to improve government performance in it. For this reason I’d argue that data events which don’t have legislative representatives featured as speakers are missing a key chance to talk seriously about the politics of data gathering. Perhaps next year instead of having a technical expert from Eurostat tell us that 2/3 of the necessary data for measuring the SDGs is missing, have Marianne Thyssen, the Commissioner for Employment, Social Affairs and Inclusion that covers Eurostat, come and take questions about EU and member state political will to actually measure the SDGs.

The World Bank’s data team, as well as countless other technical experts at stats offices and research organizations, are doing great work when it comes to making existing data available through better web services, APIs, and open databases. But we’re only having 50% of the necessary discussion if the representatives who set budgets and represent the interests of constituents aren’t participating in the discussion of what we value enough to measure, and what kind of public resources it will take to gather the necessary data.


How is Public Data Produced (Part 2)

I published a post yesterday about how administrative data is produced. In the end I claimed that data gathering is an inherently political process. Far from being comparable, scientifically standardized representations of general behavior, public data and statistics are imbued with all the vagaries and unique socio-administrative preferences of the country or locality that collects them.

Administrative criminal statistics are an interesting starting point if someone wants to understand how data reflects the vagaries of administrative structures. If someone thought “I would really like to compare crime rates across European Union member states” they would probably be surprised to learn that unless they just compare homicide rates it’s impossible to compare crime rates between countries. This is not only because definitions of different crimes are different between countries (though the UNODC has done a lot of work to at least standardize definitions), but the actual events of crime are counted differently. For example, Germany uses what’s called “principal offense” counting – this means that in the event that multiple crimes are committed at the same time, the final statistics only count the most serious crime. Belgium doesn’t use this counting method, so its crime statistics look much higher than Germany’s on paper. The University of Lausanne’s Marcelo Aebi, arguably the expert on comparative criminal statistics, published an excellent paper on comparing criminal statistics and the problems posed by different counting procedures (pages 17-18 for those who just want the gist).

Aebi makes a crucial point in the conclusion of his article: Statistics are social constructs and each society has a different way of constructing them. Statistics represent the things we have valued. The past-tense is important here; when we see data it’s showing us the past (the 2016 Global Peace Index uses numbers from 2015, for example), and thus represents what we valued at the time. Data can be used to build and test models of potential future events, but there is no such thing as ‘future data’. The value in data is that it can help citizens and policy makers understand what worked, or didn’t work, so that policies and behaviors can be adjusted going forward.

Of course institutional and administrative behavior is often resistant to trends in data (or very comfortable with data that supports the status quo). This can be for valid, or at least non-nefarious, reasons. For example the Sustainable Development Goals (SDGs) rely heavily on GDP as an economic indicator. The SDGs are supposed to represent sustainable growth and social development into the future, so it’s interesting that they use an economic indicator that many experts and organizations view as quite flawed.

Why would the SDGs rely so heavily on GDP then? For one, it’s a reliable indicator – everyone at least has some vague idea of what is represents. Two, it’s got a long history – we have tracked it for decades. Three, most of the people who created the SDGs come from backgrounds where GDP is a standard indicator – they pick targets and data based on their professional and institutional experience. They didn’t do this because they’re jerks. They did it because GDP represents the standard, if flawed, way that we measure economic performance. They probably also did it because gathering new data is an expensive, time consuming process that everyone says is important [for someone else to pay for].

This is all to say: If you want better public data, or to at least understand why the public data you have seems to reflect the status quo instead of telling you how to break out of it, it’s imperative to understand the qualitative political, social and administrative behaviors inherent to the place or people you’re researching. Once you’ve got that, you can start the political process of organizing the resources to get newer, better, data to formulate newer, better public policy and/or smashing the status quo.


How Is Public Data Produced?

The 2016 Global Peace Index (GPI) launched recently. Along with its usual ranking of most to least peaceful countries it included a section analyzing the capacity for the global community to effectively measure progress in the Sustainable Development Goals (SDGs), specifically Goal 16, the peace goal. The GPI’s analysis of statistical capacity (pp. 73-94) motivates a critical question: Where does data come from, and why does it get produced? This is important, because while the GPI notes that some of the Goal 16 targets can be measured with existing data, many cannot. How will we get all this new data?

Some of the data necessary to measure the Targets for Goal 16 is available. I’d say the GPI’s  findings can probably be extended to the other goals, so we’ll imagine for the sake of argument that we can measure 50-60% of the 169 Targets across all the SDGs with the data currently available globally. How will we get the other 40-50%? To deal with these questions it’s important to know who collects data: The primary answer is of course national statistics offices. These are the entities tasked by governments with managing statistics across a country’s ministries and agencies, as well as doing population censuses. Other data organizations include international institutions and polling firms. NGOs and academic institutes gather data too, but I’d argue that the scale of the SDGs means that governments, international organizations and big polling firms are going to carry the primary load. Knowing the Who, we can now get to the How.

National statistics offices (NSOs) should be the place where all data that will be used for demonstrating a nation’s progress toward goals is gathered and reported. In a perfect world NSOs would have necessary resources for collecting data, and the flexibility to run new surveys using innovative technologies to meet the rapidly evolving data needs of public policy. This is of course not how NSOs work. Much of what happens in a statistics office is less about gathering new data, and more about making sure what exists is accessible. In my experience NSOs have a core budget for census taking, but if new data has to be collected the funding comes from another government office. This last bit is important: NSOs do not generally have the authority to go get whatever data is necessary. If NSOs are going to be the primary source for data that will be used to measure the SDGs, it is critical that legislatures provide funding to government offices for data gathering.

International organizations are the next place we might look to for data. The World Bank, in my opinion, is the gold standard for international data. United Nations agencies also collect a fair amount of data. What sets the Bank apart is that they do some of their own data collection. Most international organizations’ data though is actually just NSO data from member states. For example, when you go to the UN Office on Drugs and Crime’s database, most of what you’ll find are statistics that were voluntarily reported by member states’ statistical offices. The UN, World Bank, OECD and other myriad organizations do relatively little of their own data gathering; much of their effort is spent making sure that the data they are given is accessible. Unless legislatures in member states provide funding to government agencies to gather data, and the government agrees to share the data with international organizations, most international institutions won’t have much new data.

Polling firms such as Gallup gather international survey data that is both timely, accurate and covers a wide range of topics relevant to the SDGs. Unfortunately their data is expensive to access. As a for-profit entity they have a level of flexibility to gather new data  that statistics offices don’t, but this level of flexibility is very expensive to maintain. A problem arises too when Gallup (and similar firms) decide that the data necessary to measure the SDGs is not commercially viable to gather and sell access to. In this case legislatures would need to provide funding to government agencies to hire Gallup to gather data that is relevant to measuring progress toward the SDGs.

There is a pattern in the preceding paragraphs. All of them end with the legislature or representative body of government having to provide funding for data gathering. How we gather data (the funding, budgeting, administration, and authority) is entirely political. This is a key issue that gets lost in a lot of discussion around ‘open data’ and demands for data-driven policy making. It is too easy to fall into a trap where data gets treated as a neutral, values-free thing, existing in a plane outside the messy politics of public administration. The Global Peace Index does a good service by highlighting where there are serious gaps in the necessary data for tracking the SDG Targets. This leads us to the political question of financing data collection.

If the UN and the various stakeholders who developed the Sustainable Development Goals can’t make the case to legislatures and parliaments that investments in data gathering and statistical capacity are politically worthwhile, it is entirely likely that the SDGs will go unmeasured and we’ll be back around a table in 2030 hacking away at the same development challenges while missing the harder conversation about the politics necessary to drive sustainable change.