When Rigorous Impact Evaluation Is Not a Luxury: Scrutinizing the Millennium Villages

EDITOR’S NOTE: A rigorous impact evaluation of the Millennium Village Project cannot be conducted at present because of several weaknesses in the current evaluation method, according to Michael Clemens, research fellow at the Center for Global Development. These include subjective choice of intervention and comparison sites, lack of baseline data on comparison sites, small sample size, and short time horizon, all of which, Clemens says, can easily be addressed.

Back in 2004 a major new development project started in Bar-Sauri, Kenya. This Millennium Village Project (MVP) seeks to break individual village clusters free from poverty with an intense, combined aid package for agriculture, education, health, and infrastructure. The United Nations and Columbia University began the pilot phase in Bar-Sauri and have extended it to numerous village clusters in nine other countries. They hope to scale up the approach across much of Africa.

But wait: Before we consider blanketing a continent with any aid intervention, we have to know whether or not it works. For example, we have to know whether what has happened in Bar-Sauri differs from what has happened in nearby Uranga, which was not touched by the project. And we have to know if those differences will last. This matters because aid money is scarce, and the tens of millions slated for the MVP are tens of millions that won’t be spent on other efforts.

Here I discuss a new research paper that I wrote with Gabriel Demombynes of the World Bank. We ask when it’s important to take great care in measuring a project’s impacts, and we illustrate one concrete case: the Millennium Village Project. We show how easy it can be to get the wrong idea about the project’s impacts when careful, scientific impact evaluation methods are not used. And we detail how the impact evaluation could be done better, at low cost.

The paper deliberately makes no conclusion about the wisdom or effectiveness of the intervention itself, except for the modest claim that the intervention shouldn’t be massively scaled up until its effects have been reliably estimated. To me that’s uncontroversial; Africans have urgent needs, but they have urgent needs for things that work, and many of them have been disappointed by well-intended outsiders in the past.

It’s all relative

Our first point is about comparing trends inside the Millennium Villages to trends outside them. The June 2010 MVP midterm evaluation shows positive trends within the Millennium Villages in development indicators like access to sanitation, water, and cell phones. The project claims responsibility for those changes by calling them “impacts” of the project. But those trends don’t tell you the impact of the project, because it’s possible that some or all of those changes would have happened even without the project.

One way to approximate what might have happened without the MVP is to look at trends in the same indicators across the regions around the Millennium Villages, and across the whole country. For example, below are trends in the fraction of small children sleeping under insecticide-treated nets (ITNs), which help prevent malaria. The black line shows the Millennium Village of Bar-Sauri, Kenya. The blue line shows the trend in rural Nyanza Province. (Bar-Sauri comprises just 1.3% of the rural population of Nyanza.) Green shows rural Kenya, and red shows all of Kenya:

Clearly there are big changes in this indicator happening across the region and country, changes that could not possibly be due to the MVP. Attributing the entire change in Bar-Sauri to the MVP is thus incorrect.

Here’s another example from the Millennium Village in Bonsaaso, Ghana. This time the indicator is the fraction of households with access to improved water sources such as piped water:

Again, the blue line shows that there are major changes in this indicator going on across rural Ashanti Region, where Bonsaaso is located (the village makes up a tiny 1.6% of that population). The MVP cannot be causing all those changes. Treating the changes in Bonsaaso as the “impacts” of the project can’t be a careful measurement of those impacts.

The paper discusses several indicators like this. Perhaps most striking is the fraction of households that own mobile phones. Anyone who has recently spent time in Africa knows that the continent is undergoing a mobile phone revolution. For two of the three intervention sites we study, cell phone ownership has been growing just as fast in the large regions around the Millennium Villages as it has inside the Millennium Villages. Simply stating that increases in mobile phone ownership constitute an “impact” of the project, as the midterm evaluation does, cannot be a careful measurement of the project’s impact.
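The comparison running through these examples, the change inside a village set against the change in the surrounding region, can be sketched in a few lines of Python. The numbers here are hypothetical placeholders, not figures from the midterm evaluation or from our paper, and the function name is purely illustrative. Subtracting the regional trend is only a rough approximation of what would have happened without the project, not a rigorous impact estimate.

```python
def trend_adjusted_change(site_before, site_after, region_before, region_after):
    """Change at the site minus the change in the surrounding region.

    All inputs are fractions between 0 and 1, for example the share of
    households that own a mobile phone.
    """
    site_change = site_after - site_before
    region_change = region_after - region_before
    return site_change - region_change


# Hypothetical values, for illustration only (not real MVP data).
site_before, site_after = 0.10, 0.40
region_before, region_after = 0.12, 0.35

naive_change = site_after - site_before
adjusted_change = trend_adjusted_change(
    site_before, site_after, region_before, region_after
)

print(f"Simple before/after change at the site: {naive_change:.2f}")
print(f"Change net of the regional trend:       {adjusted_change:.2f}")
```

With these made-up inputs, most of the before/after change at the site is matched by the change in the region, which is the pattern described above for mobile phone ownership at two of the three sites.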

The project’s future evaluation plans

This analysis shows that rigorous estimates of the project’s impact depend heavily on how one does the evaluation. The true impacts of the project are likely to be much smaller than the impacts claimed in the midterm evaluation report, for many indicators in many villages. But the figures above don’t constitute a rigorous evaluation of the project’s impacts. Such a rigorous evaluation can’t be done right now—for five reasons, all of which can be easily fixed:

  1. Subjective choice of intervention sites. The initial wave of Millennium Villages was chosen in part because they had special traits, such as committed NGO partners and local leaders, that would help the project work well. This is a perfectly reasonable choice from a project management point of view. But this means that it’s unclear if the intervention would work as well at other sites. The MVP evaluation protocol mentions this issue. 

  2. Subjective choice of comparison sites. In the future the MVP plans to release data on comparison villages that did not get the intervention. But the evaluation protocol doesn’t give you a way to know whether or not the comparison sites are truly just like the intervention sites in every way that could affect project success—such as (again) the commitment of local leaders. I discussed this problem back in April. 

  3. Lack of baseline data on comparison sites. Unfortunately, though the project now includes comparison villages, no data were gathered on conditions in those villages at the time the project began (at “baseline”). This may be because having comparison villages at all is a recent change in the project, which stated as recently as three years ago that it considered collection of data in comparison villages to be unethical. Now there are comparison villages, but the lack of baseline data means that it will be harder to tell if any differences between the intervention sites and the comparison villages existed before the project started. The MVP evaluation protocol notes this weakness too. 

  4. Small sample size. Just ten village clusters are involved in the project. The MVP evaluation protocol notes that this small sample size will only be able to reliably detect a very large minimum effect of the project on child mortality: a 40% drop in five years. That is a giant effect, more than twice as fast as the decline in child mortality called for by the ambitious Millennium Development Goals. Any smaller effects might be difficult to scientifically distinguish from zero. 

  5. Short time horizon. The current MVP evaluation protocol only contains plans to evaluate the impact of the MVP over a five-year timespan. This is inadequate, because past village-level package interventions in poor rural areas have had short-term effects that rapidly dissipated after ten years. The goal of the project is to create “self-sustaining economic growth” in the villages, and until it’s known whether or not the project is capable of that, the project shouldn’t be greatly scaled up.
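The comparison in item 4 can be checked with a little arithmetic. The sketch below rests on two assumptions that are not spelled out in this article: that the Millennium Development Goal target is a two-thirds reduction in under-five mortality between 1990 and 2015, and that each decline happens at a constant annual rate. The function name is illustrative.

```python
def annual_decline_rate(total_drop, years):
    """Constant annual rate of decline implied by a total proportional drop.

    total_drop is a fraction (0.40 means a 40% drop) achieved over `years`.
    """
    return 1 - (1 - total_drop) ** (1 / years)


# Minimum detectable effect cited from the MVP evaluation protocol:
# a 40% drop in child mortality in five years.
mvp_rate = annual_decline_rate(0.40, 5)

# Assumed MDG target: a two-thirds drop over the 25 years from 1990 to 2015.
mdg_rate = annual_decline_rate(2 / 3, 25)

print(f"Annual decline implied by a 40% drop in 5 years: {mvp_rate:.1%}")
print(f"Annual decline implied by the assumed MDG target: {mdg_rate:.1%}")
print(f"Ratio: {mvp_rate / mdg_rate:.2f}")
```

Under these assumptions the smallest effect the current design can reliably detect corresponds to an annual decline more than double the one the MDG target implies.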

In our paper, we detail exactly how a proper impact evaluation could be done, at a cost per village not much higher than the cost of the current evaluation, and in a way that remedies all five of the above weaknesses. The proposal is for random assignment of treatment between about 20 matched pairs of village clusters, where both members of each pair are monitored over 15 years.
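To make the idea of random assignment within matched pairs concrete, here is a minimal sketch. It is not the procedure from our paper: the cluster names are placeholders, the pairs are assumed to have been matched already on characteristics observed at baseline, and the function name and seed are illustrative.

```python
import random


def assign_within_pairs(pairs, seed=None):
    """Randomly pick one member of each matched pair to receive treatment.

    `pairs` is a list of (cluster_a, cluster_b) tuples. Returns a list of
    dicts naming the treated and comparison cluster in each pair.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    assignments = []
    for a, b in pairs:
        # A fair coin flip decides which member of the pair is treated.
        treated, comparison = (a, b) if rng.random() < 0.5 else (b, a)
        assignments.append({"treated": treated, "comparison": comparison})
    return assignments


# 20 hypothetical matched pairs with placeholder names.
pairs = [(f"cluster_{i}A", f"cluster_{i}B") for i in range(1, 21)]

for row in assign_within_pairs(pairs, seed=2010):
    print(row)
```

Because chance alone decides which member of each pair gets the intervention, traits such as the commitment of local leaders cannot systematically differ between the treated and comparison groups, and both members of every pair can be surveyed at baseline.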

The paper has two ultimate purposes. One is to highlight the need in general for rigorous impact evaluation when it is feasible. The second is to argue for such an evaluation of the MVP going forward. It’s too late for the current crop of MVP sites. But there is no obstacle to undertaking a rigorous evaluation for the next 20 sites. There are plans for far more than that number.

Re-published with permission by the Center for Global Development.
