I watched from a distance on Twitter as the World Bank hosted its annual data event. I would love to have attended – the participants were a pretty amazing collection of economists, data professionals and academics. This tweet seemed to resonate with a theme I’ve been focused on the last week or so: There is a data shortage such that even the most advanced countries can’t measure the Sustainable Development Goals (SDGs).
I replied to this tweet with a query about whether there was evidence of political will among EU member states to actually collect this data. In keeping with the “data is political” line that I started on last week, political will is important because the European Statistical System relies heavily on EU member states’ statistics offices to provide data. The above tweet highlights two things for me – there needs to be a conversation about where the existing data comes from, and there need to be MPs or MEPs (legislative representatives) at meetings like the World Bank’s annual data event.
Since Eurostat and the European Statistical System were the topic of the tweet, I’ll focus on how they gather statistics. Most of my expertise is in their social and crime stats so I’ll speak to those primarily, but it’s important to note that the quality and quantity of any statistic depend on its importance to the collector and end user. Eurostat got its start as a hub for data on the coal and steel industries in the 1950s, and while its mandate has grown significantly, the quality and density of the economic and business indicators hosted on its data site reflect its founding purpose. Member states provide good economic data because states have decided that trade is important – there is a compelling political reason to provide these statistics. Much of this data is available at high levels of granularity, down to the NUTS 3 level. It’s mostly eye-wateringly boring agricultural, land use, and industrial data, but it’s the kind of stuff that’s important for keeping what is primarily an economic union running smoothly(-ish).
If we compare Eurostat’s economic data to its social and crime data, the quality and coverage decrease notably. This is when it’s important to ask where the data comes from and how it’s gathered – if 2/3 of the data necessary to measure the SDGs isn’t available for Europe (let alone, say, the Central African Republic) we need to be thinking clearly about why we have the data we have, and the values that undergird gathering good social data. Eurostat statistics that would be important for measuring the SDGs might include the SILC surveys that measure social inclusion, and general data on crime and policing. The SILC surveys are designed by Eurostat and implemented by national statistics offices in EU member states. The granularity and availability vary depending on the capacity of the national stats office and the domestic laws regarding personal data and privacy. For example, some countries run the SILC surveys at the NUTS 2 level while others administer them only at the national level. A handful of countries, such as France, do the surveys at the individual level and produce panel data. The problem is that the SILC data has mixed levels of availability due to national laws regarding privacy – for example, if you want the SILC panel data you have to apply for it and prove you have data storage standards that meet France’s national laws for data security.
Crime and police data is even more of an issue. Eurostat generally doesn’t collect crime data directly from member states. Instead, it has an arrangement with the UN Office on Drugs and Crime where crime and police data reported to the UN by EU member states gets passed to Eurostat and made available through its database. One exception is a dataset of homicide, robbery and burglary in the EU from 2008-2010 that is disaggregated down to the NUTS 3 level. When I spoke with the crime stats lead at Eurostat about this dataset he explained that it was a one-off survey in which Eurostat worked with national statistics offices to gather the data; in the end it was so time-consuming and expensive that it was canceled. Why would such a rich data collection process get the axe? Because it’s an established fact that crime statistics can’t be compared across jurisdictions due to definitional and counting differences. So funders reasonably asked: What’s the point of spending a lot of money and time collecting data that isn’t comparable in the first place?
A key problem I see in the open data discussion is a heavy focus on data availability with relatively little focus on why the data we have exists in the first place, and by extension what would go into gathering new SDG-focused data (e.g. the missing 2/3 noted in the opening tweet). Some of this is driven by, in my opinion, an overconfidence in, or fetishization of, ‘big data’ and crowdsourced statistics. Software platforms are important if you think the data availability problem is just a shortage of capacity to mine social networks, geospatial satellite feeds and passive web-produced data. I’d argue though that the problem isn’t collection ability, and that the focus on collection and validation of ‘big data’ distracts from the important political discussion of whether societies value the SDGs enough to put money and resources into filling the 2/3 gap with purpose-designed surveys instead of mining the internet’s exhaust hoping to find data that’s good enough to build policy on.
I’m not a Luddite crank – I’m all for using technology in innovative ways to gather good data and make it available to all citizens. Indeed, ‘big data’ can provide interesting insights into political and social processes, so finding technical solutions for managing reams and reams of it is important. But there is something socially and politically important about allocating public funds for gathering purpose-designed administrative statistics. When MPs, members of Congress, or MEPs allocate public funds they are making two statements. One is that they value data-driven policy making; the other, more important in my opinion, is that they value a policy area enough to use public resources to improve government performance in it. For this reason I’d argue that data events which don’t have legislative representatives featured as speakers are missing a key chance to talk seriously about the politics of data gathering. Perhaps next year instead of having a technical expert from Eurostat tell us that 2/3 of the necessary data for measuring the SDGs is missing, have Marianne Thyssen, the Commissioner for Employment, Social Affairs and Inclusion, whose portfolio covers Eurostat, come and take questions about EU and member state political will to actually measure the SDGs.
The World Bank’s data team, as well as countless other technical experts at stats offices and research organizations, are doing great work when it comes to making existing data available through better web services, APIs, and open databases. But we’re only having 50% of the necessary discussion if the representatives who set budgets and represent the interests of constituents aren’t participating in the discussion of what we value enough to measure, and what kind of public resources it will take to gather the necessary data.
I was invited to be a speaker on the panel on behavior change and technology in peacebuilding at Build Peace 2015. The panel was a lot of fun, with some fascinating presentations! You can find them on the Build Peace YouTube page. Here’s mine:
This was a particularly fun conference, pulling together practitioners, activists and academics in a setting that breaks away from the usual paper/panel/questions format of most conferences. Looking forward to next year!
I’m excited to have my work included in Building Peace’s latest issue on technology and peacebuilding. This is my doctoral topic and one of my main interest areas, so it’s exciting to see it become an increasingly important topic in the conflict resolution and peacebuilding sphere.
Here’s a link to the entire contents of the issue. I particularly enjoyed reading Jen Welch’s article on games and peacebuilding, and Swedish Minister of Foreign Affairs Margot Wallström’s piece on how government can integrate new technology into foreign policy.
If you’re new to the space, have a look at the issue – it’s a great contribution to a new and exciting area of peacebuilding and conflict resolution!
No, I won’t be ‘Dr.’ tomorrow, but the proposal defense is a milestone nonetheless. For those who are interested in my dissertation research and can’t make it to my proposal defense tomorrow at 12:00 PM at the School for Conflict Analysis and Resolution, below is a sound file you can listen to. You can download my slideshow here and follow along that way as well!
I got to interview Dr. Walter Dorn of Canadian Forces College about his work on technology and peacekeeping for my TechChange course on technology for conflict management and peacebuilding – a good interview that offers some operational and political insight into using these tools in peacekeeping settings!
Unfortunately the last few months have been fairly low output in terms of blog posts. This can be credited to resettling after returning from Samoa, getting back to work with the tech community in D.C., and of course getting a dissertation written. I have had the chance to get myself on a few panels this month and next to discuss my research, though. I’ll be joined by some awesome people too, so hopefully if you’re in D.C. you can come out and join us!
Later in November: Dissertation proposal defense at the School for Conflict Analysis and Resolution (exact date TBD). Open to the public!
Hopefully you can make it out to one or more of these – I think they’ll be really interesting!
For those who were curious about what I discussed with USAID’s Office on Conflict Management and Mitigation on September 4, wonder no more. TechChange’s video guru got me on camera to record the presentation – hopefully it’s useful (or leads to some good arguments at least).
GDELT just released their new Global Visualization dashboard, and it’s pretty cool. It blinks and flashes, glows and pulses, and is really interesting to navigate. Naturally, as a social scientist who studies conflict, I have some thoughts.
1) This is really cool. The user interface is attractive, it’s easy to navigate, and it’s intuitive. I don’t need a raft of instructions on how to use it, and I don’t need to be a programmer or have any background in programming to make use of all its functionality. If the technology and data sectors are going to make inroads into the conflict analysis space, they should take note of how GDELT did this, since most conflict specialists don’t have programming backgrounds and will ignore tools that are too programming intensive. Basically, if it takes more than about 10 minutes for me to get a tool or data program functioning, I’m probably not going to use it, since I’ve already mastered other analytic techniques that can achieve the same outcome.
2) Beware the desire to forecast! As I dug through the data a bit, I realized something important. This is not a database of information that will be particularly useful for forecasting or predictive analysis. Well, replicable predictive analysis at least. You might be able to identify some trends, but since the data itself is news reports there’s going to be a lot of variation across tone, lag between event and publication, and a whole host of other things that will make quasi-experiments difficult. The example I gave to a friend I was discussing this with was the challenge of predicting election results using Twitter; it worked when political scientists tried to predict the distribution of seats in the German Bundestag by party, but then when they replicated the experiment in the 2010 U.S. midterm elections it didn’t work at all. Most of this stemmed from the socio-linguistics of political commentary in the two countries. Germans aren’t particularly snarky or sarcastic in their political tweeting (apparently), while Americans are. This caused a major problem for the algorithm that was tracking keywords and phrases during the American campaign season. Consider: if we have trouble predicting relatively uniform events like elections using language-based data, how much harder will it be to predict something like violence, which is far more complex?
3) Do look for qualitative details in the data! A friend of mine pointed out that the data contained on this map is a treasure trove of sentiment, perception and narrative about how the media at a very local level conceptualizes violence. Understanding how media, especially local media, perceive things like risk or frame political issues is incredibly valuable for conflict analysts or peacebuilding professionals. I would argue that this is actually more valuable than forecasting or predictive modeling; if we’re honest with ourselves I think we’d have to admit that ‘predicting’ conflict and then rushing to stop it before it starts has proven to be a pretty lost endeavor. But if we understand at a deeper level why people would turn to violence, and how their context helps distill their perception of risk into something hard enough to fight over, then interventions such as negotiation, mediation and political settlements are going to be better tailored to the specific conflict. This is where the GDELT dashboard really shines as an analytic tool.
I’m excited to see how GDELT continues to make the dashboard better – there are already plans to provide more options for layering and filtering data, which will be helpful. Overall though, I’m excited to see what can be done with some creative qualitative research using this data, particularly for understanding sentiment and perception in the media during conflict.
My colleague Dr. Pamina Firchow and I are organizing a panel for next year’s ISA meeting in New Orleans (Feb. 15-21, 2015) on crowdsourcing and the study of violence and violence prevention. Below you’ll find our panel description, and instructions for submitting an abstract to us. We’ll need them by May 23 so we can make decisions on the five papers we will include in the panel proposal that we’ll be submitting before the June 1 deadline. We’d love to see what you all are working on, and look forward to your proposals!
Crowdsourcing Peace and Violence: Methods and Technologies in the Field
Over the last five years crowdsourcing has been increasingly used by researchers and practitioners who study peace and violence. The primary goals of this panel are to discuss examples of successful projects, highlight ongoing challenges of using crowdsourcing and crowdseeding, and frame crowd-based research methodologies within the framework of established social science methods. The technologies that are used in crowdsourcing are readily available and inexpensive; these include mobile phones, social media, and open source software systems like Ushahidi maps. With all this expansion, however, there have been persistent challenges to using crowdsourcing and crowdseeding for peace and conflict research. Some of these are methodological, including problems with sampling bias, validity, and data integrity. Others are techno-social, such as how people use crowdsourcing technologies in their daily life, privacy concerns, and information security. This panel will feature papers from researchers who are actively using crowdsourcing and crowdseeding methods in their research, continuing the theme of ISA 2014’s panel “Crowdsourcing in the Study of Violence (WD26).”
Panelists will also be invited to submit their papers to be included in a special journal issue on crowdsourcing in violence prevention and peacebuilding. Abstracts for the ISA panel should be submitted to Pamina Firchow (pfirchow[at]nd.edu) and Charles Martin-Shields (cmarti17[at]gmu.edu) by May 23, 2014 via email in Word format. Titles need to be fewer than 50 words and abstracts fewer than 200 words. Please include affiliation and contact information in your abstract!