GENEVA — Earlier this year, Peter Maurer, president of the International Committee of the Red Cross, noted that “big data and better contextual analysis have the potential to transform how the ICRC responds to and anticipates humanitarian crises.”
He was not merely speculating. For the last three years, ICRC has been developing a digital system to analyze mountains of data gathered from messages posted on Twitter and local social media platforms, especially in fragile corners of the world.
The initiative, in collaboration with the École polytechnique fédérale de Lausanne, or EPFL, a leading Swiss engineering university, aims to give the Red Cross a new tool for early detection and a better understanding of emerging humanitarian issues. It has already carried out three pilot programs and will be running 11 more this year.
Charlotte Lindsey-Curtet, ICRC’s director of communication and information management and lead on the project, said the pilots have shown that the new system works, but much work remains to make it more effective, and to integrate it into ICRC’s operations and training. She spoke to Devex about the project at ICRC’s Geneva headquarters. The conversation has been edited for length and clarity.
What are the origins of the project and your collaboration with EPFL?
EPFL had quite a lot of experience in big data analytics. I asked them: We have massive challenges in reading our operating environment. At the same time, there are increasingly many public sources of information that can have a bearing on our security, our reputation, and our ability to understand the operating environment we’re in. So, could the same algorithmic reading [that the university has used for other projects, such as searching for trends in journal articles] be used across different data sets in order to understand the environment?
Almost three years ago, we initiated a project to try to better understand the environments we operate in, without any idea whether it would be feasible. Many people were very skeptical. Our input, for nearly nine months at the beginning, was to aggregate all of the terms that would be interesting for us in various languages. We’re looking at terms such as “war,” “bomb,” “terrorist,” “attack,” “conflict,” “hospital,” and “shelter.” We had to find local terms for all these things. For example, would local people use the word “humanitarian,” or “charity,” or “alms”?
Our input was building that terminology, what we call the ontology of language. Then you have an algorithmic reading from maybe 1 million different public data sources that contain those terms. That’s what took a lot of work. Now, the system’s artificial intelligence learns on its own which terminology to include.
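In code, the kind of term matching Lindsey-Curtet describes might look like the minimal sketch below: a hand-built ontology of themes and terms matched against post text. The themes and terms here are illustrative stand-ins, not the actual ICRC/EPFL ontology, and real multilingual matching would be far more sophisticated.

```python
# Minimal sketch of ontology-based term matching, as described above.
# Themes and terms are illustrative, not the actual ICRC/EPFL ontology.
ONTOLOGY = {
    "conflict": {"war", "bomb", "attack", "terrorist"},
    "water": {"water", "chlorine", "well"},
    "health": {"hospital", "clinic"},
    # Local-language equivalents ("charity," "alms," ...) would be added
    # per context, as the interview describes.
}

def tag_post(text):
    """Return the ontology themes whose terms appear in a post."""
    words = set(text.lower().split())
    return {theme for theme, terms in ONTOLOGY.items() if terms & words}

print(tag_post("No chlorine tablets for 3 weeks after the attack"))
# -> {'water', 'conflict'} (a set, so order may vary)
```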
What kind of progress have you made with the pilots?
The first pilot test we did took us about nine months. That was around developing the ontology and creating some sort of dashboard. Then we piloted two more projects last year in Syria and Iraq, where we wanted to analyze results in multiple languages. With those two projects, we are now able to do what previously took nine months in just eight hours. It’s what EPFL would call “near real time.”
We have developed a web-based application that allows us to disaggregate the reading thematically: water, health, time, geolocation, and so on. In one of the pilots, the system told us there was a water problem in one area. Our water people said, “No, not as far as we know.” We went and had a look and yes, there was a water problem!
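As a rough illustration of that thematic disaggregation, a dashboard backend might count tagged posts per theme and region and surface unusual spikes. The field names and records below are invented for the example, not drawn from the ICRC system.

```python
# Hedged sketch: counting tagged posts by theme and region, the kind of
# grouping a dashboard like the one described might surface. All field
# names and records here are invented for illustration.
from collections import Counter

posts = [
    {"theme": "water", "region": "district-a", "day": "2018-03-01"},
    {"theme": "water", "region": "district-a", "day": "2018-03-02"},
    {"theme": "health", "region": "district-b", "day": "2018-03-01"},
]

counts = Counter((p["theme"], p["region"]) for p in posts)
for (theme, region), n in counts.most_common():
    print(f"{theme:<8} {region:<12} {n} mention(s)")
```

An unusually high count for one theme-and-region pair is the kind of signal that flagged the water problem she mentions.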
We have 11 pilots this year, each on a theme or a context — for example, missing people, or migration, or hate speech. Or we might say, “we can’t get access to a particular region, so what can big data tell us,” to have a complementary reading of the situation there.
What potential does the system seem to have for ICRC’s work?
We went from asking, “Could this be relevant,” to seeing that it is. It works; there’s no doubt about that. We’ve shown the outcomes to our heads of delegations. They’ve said: “Wow, how quickly can we get this?” Now we have to look at how we can change processes and adapt this to the ways that our different teams work.
It can provide us with information to manage our own presence and our own security, as well as to understand where humanitarian needs will be. And it reflects the fact that in more and more contexts, the potential beneficiaries and stakeholders who could have humanitarian needs will not necessarily be expressing them directly to an organization, but through their social media presence. The quicker we can understand the needs on the ground, the better we can make the humanitarian response.
Most of the information on Facebook is closed and private, so our new system uses publicly sourced information. Twitter is the biggest one, but there are a growing number of local social media platforms that you can get to through various search engines, in different languages. There is increasing use of social media, even in the highly complex environments we work in.
For example, you might get a platform where the head of the local water district is saying, “No chlorine tablets for 3 weeks, as no aid convoys have arrived;” or you might have a lot of people suddenly saying, “We can’t get to that particular water well or water source because of security issues.”
There could be 20 million tweets that we analyze in a particular area and time period. Are we able to filter out things that are noise? Are we able to then work out the reliability of the information, and are we able to work out what that might mean for a humanitarian response? It will never be the only information we rely on. But increasingly, people are posting more and more information about what is happening to them in the environments they are in.
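One simple way to approach the noise and reliability questions she raises is to treat corroboration across independent authors as a proxy for signal strength. The sketch below assumes that framing; the threshold and field names are invented for illustration.

```python
# Hedged sketch of noise filtering: keep ontology matches, then treat a
# term as a signal only if several distinct authors mention it, so one
# account repeating itself does not look like a trend. The threshold and
# field names are illustrative assumptions.
from collections import defaultdict

def filter_and_score(posts, ontology_terms, min_sources=3):
    """posts: iterable of {"author": str, "text": str} dicts."""
    authors_by_term = defaultdict(set)
    for post in posts:
        words = set(post["text"].lower().split())
        for term in ontology_terms & words:
            authors_by_term[term].add(post["author"])
    return {term: len(authors)
            for term, authors in authors_by_term.items()
            if len(authors) >= min_sources}
```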
What will you be doing with this in the short- and medium-term?
At the moment, we’re looking at building the servers here [at the ICRC headquarters in Geneva].
We also want the system to look at images. Not captions — it would read the images themselves. For example, we could ask for images of bombed hospitals. EPFL has gone quite far with that. They have a relatively reliable system — I think it was 80 percent [accurate] last time I checked.
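The EPFL image system itself is not public, but the generic pattern she describes, classifying the images themselves rather than their captions, looks roughly like the sketch below. The checkpoint file and class labels are hypothetical; this is a standard fine-tuned-network inference pattern, not the EPFL implementation.

```python
# Hedged sketch of image (not caption) classification. This is NOT the
# EPFL system; it only shows the generic pattern of running a fine-tuned
# network over a photo. "damaged_vs_intact.pt" is a hypothetical checkpoint.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(num_classes=2)  # two classes: damaged / intact
model.load_state_dict(torch.load("damaged_vs_intact.pt"))  # hypothetical weights
model.eval()

img = preprocess(Image.open("photo.jpg")).unsqueeze(0)
with torch.no_grad():
    prob_damaged = torch.softmax(model(img), dim=1)[0, 1].item()
print(f"P(damaged) = {prob_damaged:.2f}")
```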
Our big focus for 2018-2019 is to integrate this: to make it part of the working tools you have across the institution, while continuing to develop it, to run different pilots, and to ask how we can integrate it into our different training programs. Of course, you’re storing the algorithmic reading over a period of time. What will be interesting for us is that you’re able to look backward, which you can’t do very well in normal aggregation because you’re not keeping that information. So you’re able to assess, over a particular time period, whether a problem is getting worse or better.
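Because each day’s reading is stored, that backward-looking question reduces to simple comparisons over the archive. A toy example, with invented dates and counts:

```python
# Hedged sketch: looking backward over stored daily readings to ask
# whether a theme is trending up. Dates and counts are invented.
daily_water_mentions = {
    "2018-03-01": 40,
    "2018-03-02": 55,
    "2018-03-03": 90,
    "2018-03-04": 140,
}
days = sorted(daily_water_mentions)
first, last = daily_water_mentions[days[0]], daily_water_mentions[days[-1]]
trend = "getting worse" if last > first else "stable or improving"
print(f"Water mentions went from {first} to {last}: {trend}")
```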
But we also understand how careful we have to be with it. Because you’re aggregating information that is coming from individuals, you can’t just assume that that’s the only reading of the environment and say: “Right, there’s a water problem, let’s forget about all the other problems, like medical, because nobody’s talking about medical.” Maybe medical is actually the bigger problem. That’s why the ability to have an additional, proximity-based reading [through our delegates on the ground] is also important.