Data-driven development needs both social and computer scientists

By Catherine Cheney 29 July 2016

Social scientists and computer scientists don't always see eye to eye, but to harness all the emerging data for development, they will need to find ways to bridge their differences. Photo by: Rachel Johnson / CC BY-ND

There can be tension, and at times even ill will, between social scientists and computer scientists.

When presented with the same figures, social scientists are likely to caution that a conclusion is only as good as the data behind it, whereas computer scientists (often called data scientists) are likely to warn against letting the pursuit of perfect data stand in the way of useful information. Both groups can pursue the growing field of data science, and how they approach their work and cooperate can either bring out the best of these two perspectives or create ambiguity and animosity in the global development industry.

“Social science has always relied on a combination of statistics and models or theories. The increase in available data and the increasingly powerful and sophisticated means of automated analysis changes this equation,” said Lambert Hogenhout, chief of data analytics at the United Nations. “Over the long term, we will be able to explain and predict social dynamics much better. Getting there will require collaboration between social scientists and data scientists.”

The rise of big data and the pursuit of the Sustainable Development Goals demand a diverse set of skills that cannot be found in a single person or discipline. While the divide in language and culture between social and computer scientists remains, a number of efforts are underway to foster more conversation and collaboration in pursuit of better data for development.

Fractions and divisions

Broadly speaking, development economists value theory and causal identification, whereas the growing number of data scientists entering the development sector are focused on finding patterns in data and making predictions based on what they see.

“At least on the academic side, the two communities have very different traditions, and generally approach problems very differently,” said Josh Blumenstock, the director of the Data Science and Analytics Lab at the University of Washington, whose recent work has combined more traditional data sources such as phone surveys with new data sources including call detail records from mobile phones in Rwanda and Afghanistan.

“I don't think this necessarily presents a problem, but it does mean that you can't just put a social scientist and a data scientist in a room and assume that magic will ensue,” Blumenstock added.

The integration of technologies such as machine learning into social science curricula, and the interest computer scientists are taking in social theories and methods, make him optimistic about the future. Very real obstacles stand in the way of collaboration at scale, but each side also has something of value to bring to the table, as long as they can learn to get along.

Data scientists are programmers who ignore probability but like pretty graphs, said Patrick Ball, a statistician and human rights advocate who cofounded the Human Rights Data Analysis Group.

“Data is broken,” Ball said. “Anyone who thinks they’re going to use big data to solve a problem is already on the path to fantasy land.”

Data-driven development means nothing if the information on which decisions are based is partial, unreliable, or false, he told Devex. He outlined three routes to rigorous statistics: have all the data for a given population; failing that, draw a formal probability sample, that is, a random sample of the underlying population; or, if necessary, use models that can support statistical inference.
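To make Ball's distinction concrete, here is a minimal, hypothetical Python sketch contrasting his first two routes (full enumeration and a simple random sample) with a convenience sample that only reaches easy-to-reach households. The population size, the 30 percent "hard to reach" share and the income figures are invented for illustration and are not drawn from any project mentioned in this article.

```python
import random

# Hypothetical population: 10,000 households, 30% of which are "hard to reach"
# and, by assumption, poorer on average (incomes are made-up illustrative values).
random.seed(0)
population = []
for i in range(10_000):
    hard_to_reach = i < 3_000
    income = random.gauss(50 if hard_to_reach else 100, 10)
    population.append((hard_to_reach, income))


def mean(values):
    return sum(values) / len(values)


# Route 1, complete enumeration: with all the data, the mean is simply the truth.
true_mean = mean([inc for _, inc in population])

# Route 2, simple random sample: every household has an equal chance of selection.
srs = random.sample(population, 500)
srs_mean = mean([inc for _, inc in srs])

# Convenience sample: only easy-to-reach households can be selected,
# so hard-to-reach households have zero chance of being counted.
easy = [p for p in population if not p[0]]
conv = random.sample(easy, 500)
conv_mean = mean([inc for _, inc in conv])

print(f"true {true_mean:.1f} | random sample {srs_mean:.1f} | convenience {conv_mean:.1f}")
```

The random sample lands close to the full-enumeration mean because every household had the same chance of being drawn; the convenience sample overshoots because part of the population was never observable. Ball's third route, model-based inference, is not shown here.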

“All data is flawed but that doesn’t mean all data is useless,” countered Andrew Means, co-founder of The Impact Lab, a data science consulting group that provides data visualization and advanced analytics services. “With lots of information, we can estimate some unknown counterfactual based on lots of correlations, rather than what we have traditionally needed to do to understand causality.”

Illustrating the back-and-forth between the two camps, Ball said people with backgrounds in tech are too quick to say the data speaks for itself, whereas Means said people with statistics backgrounds can succumb to paralysis by analysis.

Social scientists don't always have the same skills to wrangle unruly data as computer scientists, but computer scientists don’t always bring the same understanding of statistical rigor as social scientists, said Jake Porway, the founder and executive director of DataKind, which pairs data scientists with social change organizations.

They do have at least one thing in common.

“Neither group can resist a fascinating question that might help improve the world and that can be a great way to bring them together,” he said.

What statisticians, demographers, and economists need to realize is that data science is not just a fad, and what computer scientists and engineers need to acknowledge is that they cannot solve global poverty by crunching numbers alone, said Emmanuel Letouzé, director of the Data-Pop Alliance.

Changing the equation

While social scientists and computer scientists tend to work in two different worlds, a growing number of projects and initiatives are leveraging the best of both approaches.

In Uganda, an expert in development economics from Northwestern University and a pioneer in remote sensing from Stanford University were part of a team that tested how payments to landowners might prevent them from cutting down trees.

The economists figured out the cost of the trees and pollution and measured the program's effects on individuals, whereas the satellite experts developed an algorithm that would scan high-resolution photos and count trees.

Those experiences can certainly change relationships between individuals, but broader definitions may also affect how professionals in the two fields view each other.

“A computer scientist can be a data scientist. A social scientist can be a data scientist. Neither of them are data scientists individually. They form a data science team when they come together,” said Matt Gee, who is also a cofounder at The Impact Lab and organizes the Data Science for Social Good Fellowship funded by Alphabet Inc. Executive Chairman Eric Schmidt.

“That to us is a much more powerful definition of data science than you are crowned a data scientist if you know computer science and engineering and you are crowned a social scientist if you know how to do regression discontinuity,” he said.

The Uganda collaboration is one example, but there are others as well, and some are beginning to show results. Inspired by its work with Data Science for Social Good fellows working on issues ranging from maternal mortality to online petitioning, the Office of the President in Mexico is launching a data lab that will leverage the collaboration between those who value theory and causality and those who value patterns and prediction.

Ultimately, each problem has a different set of data needs. Some may rely more on traditional methods, such as surveys and statistical models, while others might benefit from using emerging technologies to draw inferences about hard-to-reach populations.

“The place to begin is first understanding what problem you are trying to solve, and then deciding which tool will be the biggest help,” said Veronica Olazabal, senior associate director for evaluation at the Rockefeller Foundation. The foundation is funding United Nations Global Pulse, an initiative of the U.N. secretary-general tasked with harnessing big data innovation for sustainable development, which is producing a report on integrating big data into monitoring and evaluation that it will launch this fall.

The Data Science in Africa workshop organized this July by the U.N. Global Pulse focused on how data science can be used for development and humanitarian action and be best applied to support and achieve the SDGs. Photo by: UN Global Pulse

“Social science and evaluation are typically used to assess results, whereas data science is thus far more valued for its predictive ability. Both are rapidly evolving, which is exciting, so long as novelty doesn’t take precedence over verified impact,” Olazabal said.

The Rockefeller Foundation is seeking ways to generate a dialogue with goodwill rather than to perpetuate a debate on the differences between big data analysis and development evaluation and whether the former should replace the latter.

As computer scientists do more to understand social scientific inquiry, and social scientists learn more about new tools and techniques for analyzing big data, the global development community would benefit from considering not only how to identify common ground, but also how to put both skill sets to use.

“We’ve found it's precisely in this cross-pollination of different backgrounds and skills that new solutions and approaches can come to life,” said DataKind’s Porway.

With potential to change the trajectory of crises, such as famines or the spread of diseases, the innovative use of data will drive a new era for global development. Throughout this monthlong Data Driven discussion, Devex and partners will explore how the data revolution is changing our approach to achieving development outcomes and reshaping the future of our industry. Help us drive the conversation forward by tagging #DataDriven and @devex.

About the author

Catherine Cheney covers the West Coast global development community for Devex. Since graduating from Yale University, where she earned bachelor's and master's degrees in political science, Catherine has worked as a reporter and editor for a range of publications including World Politics Review, POLITICO, and NationSwell, a media company and membership network she helped to build. She is also an ambassador for the Solutions Journalism Network and the Franklin Project at the Aspen Institute.