Opinion: The promises — and challenges — of data collaboratives for the SDGs

A data stewards workshop in Cape Town, organized by The GovLab. Photo by: The GovLab

As the road to achieving the Sustainable Development Goals becomes more complex and challenging, policymakers around the world need both new solutions and new ways to become more innovative. This includes better policy and program design based on evidence to solve problems at scale. The use of big data — the vast majority of which is collected, processed, and analyzed by the private sector — is key.

In the past few months, we at UN Global Pulse and The GovLab have sought to understand pathways to make policymaking more evidence-based and data-driven with the use of big data. Working in parallel at both local and global scale, we have conducted extensive desk research, held a series of workshops, and conducted in-depth conversations and interviews with key stakeholders, including government, civil society, and private sector representatives.

Our work is driven by a recognition of the potential of using privately processed data through data collaboratives, a new form of public-private partnership in which government, private industry, and civil society work together to release previously siloed data, making it available to address the challenges of our era.

Research suggests that data collaboratives offer tremendous potential when implemented strategically under the appropriate policy and ethical frameworks. Nonetheless, this remains a nascent field, and we have summarized some of the barriers that continue to confront data collaboratives, with an eye toward ultimately proposing solutions to make them more effective, scalable, sustainable, and responsible.

Here are seven challenges:

1. Lack of public awareness of the potential

Both among those supplying data and those using it, but perhaps most importantly among the public at large, there is a lack of awareness and appreciation of the potential and value of privately processed data being deployed for the public good.

2. Absence of trust

Our research indicates that the relationship between the private sector and governments or actors from civil society, including researchers, is often uneasy, and there is frequently a reluctance to collaborate and use the former's assets to bring about social, environmental, and economic impact.

In addition, a lack of trust manifests in public ambivalence: Although citizens support the theoretical use of data for positive public impact, in practice they often remain uncomfortable about private companies releasing their information into the public domain, even if it is anonymized.

3. Private sector uncertainties

Companies often have legitimate concerns and reservations about the reuse of big data, and these limit both the extent and the impact of that reuse. A still-incomplete list of the concerns we have encountered in our research includes:

■ Data leaks and competitors gaining business intelligence about markets and operations.
■ Penalties and fines from regulators or other lawmakers imposed due to the interpretation of legislation and processes.
■ Reputation loss if customers grow suspicious of governments using their data for surveillance or other purposes.
■ Apprehensions over how the public sector may use or misuse data on citizens.

In our experience, while these reservations often go unstated, they are the implicit reasons why companies do not allow reuse of big data or participate more actively in data collaboratives.

4. Limited capacity

The ability to process, analyze, and use big data varies widely, which is another factor limiting the positive public impact of its reuse. This is true both among data suppliers, such as private companies, and among those who use the data: civil society groups, governments, or citizens.

Our research indicates that while several corporations do have or are developing significant in-house capacity, many still have limited capacities in IT equipment, data analytics skills, and the critical ability to anonymize data and otherwise establish controls and de-risk it before release.

Capacity on the demand side is similarly varied, especially when it comes to processing the large unstructured data sets that are at the core of big data.

5. Transaction costs

Private data is often released or leveraged without charge, but it would be wrong to assert that opening up data is free. Transaction costs are incurred at several points in the data lifecycle: while preparing data; while de-risking data, for example through anonymization; and while coordinating with partners, including through the preparation of legal agreements or other structures, mechanisms, or institutions to permit data sharing and reuse.

Our research indicates that such costs are especially burdensome in the context of low- and middle-income countries and the global development community. In addition, high transaction costs may pose a particular problem for smaller entities.

6. Scaling challenges

In our experience, most implemented projects with big data are still limited in their impact because of the scale of implementation. Factors that limit the scaling up of pilots include a lack of mechanisms, such as platforms, to help identify partners or opportunities to collaborate, and a lack of industry-specific guidelines and protocols to guide collaborations. In low- and middle-income countries, limited funding also hinders the scaling up of successful test cases.

7. Limited community of practice and expertise

To an extent, the absence of a well-defined community of practice and expertise is unavoidable, given the nascent nature of the field. Yet as big data sharing initiatives continue to multiply, we would expect to see the emergence of new bodies and institutions that could offer the foundation of such a community.

The above diagnosis allows us to develop more targeted approaches to making data collaboratives more systemic, sustainable, and responsible. Our work on establishing and connecting data stewards in the private sector, documenting case studies, and supporting the definition of policies, regulations, and ethics in low- and middle-income countries to frame the reuse of big data for the SDGs has begun to reveal some possible solutions to the challenges above.

These initiatives are, like the field of leveraging data itself, fledgling and dynamic, and we would welcome any suggestions, support, and recommendations moving forward.

About the authors

    Paula Hidalgo-Sanchis

    Paula Hidalgo-Sanchis is based in Uganda and manages Pulse Lab Kampala, a lab of the UN Global Pulse network. Paula has worked as a humanitarian and development practitioner for 18 years. She has worked as a manager of innovations, social policy adviser, and analyst, with postings in America, Asia, and Africa. With field experience in over 20 countries, Paula is passionate about promoting innovations for human development. She holds a Ph.D. in geography and a master's-level qualification in international assistance, and is strongly motivated to promote the use of big data and artificial intelligence to address today’s global challenges.

    Stefaan G. Verhulst

    Stefaan G. Verhulst is co-founder and chief research and development officer of The GovLab at New York University where he is building a knowledge foundation on how to transform governance using advances in science and technology. Before joining NYU, Verhulst spent more than a decade as chief of research for the Markle Foundation, where he continues to serve as senior advisor.