Data around COVID-19 is a mess and here's why that matters

Nurses check medical records at a field hospital in Santo Andre, São Paulo, Brazil. Photo by: Amanda Perobelli / Reuters

CANBERRA/NAIROBI/MANILA — Infections from the coronavirus have now passed 4 million globally, with deaths now running over 270,000. But experts warn these figures — used to understand the spread and impact of the pandemic — need to be treated with caution.

Data plays a critical role in the COVID-19 response. Researchers rely on case data to make predictions of how many people will likely be infected by the virus. Governments use this information to identify policies and measures they need to adopt and implement in their countries’ contexts. Aid organizations use data to help understand needs and target their interventions.

But their analyses and responses are only as good as the data at hand. So if there is underreporting of deaths in a country, modeling analyses picking up that data will likely underreport deaths in their predictions, said Nilanjan Chatterjee, a Bloomberg distinguished professor at Johns Hopkins University.

And that will have an impact on how governments prepare for the pandemic.

“If the data is not as good, then our forecast in the future [will also carry] underreported deaths … and that will lead to under preparation [as these predictions] help governments to prepare how many hospital beds will be needed, ICU beds, and ventilators,” said Chatterjee, who has evaluated some of the oft-cited modeling analyses on COVID-19.

The problem is there’s a lot of underreporting globally, from testing to deaths from COVID-19. Inconsistent reporting and differences in countries’ reporting methods also make it challenging to make comparisons between countries.

Health data experts, including professor Alan Lopez, director of the Bloomberg Initiative for Civil Registration and Vital Statistics, told Devex that as little as one-quarter of reporting countries’ data on deaths may be trustworthy enough to support policy and decision-making.

“In some countries, we know the data quite well, but those countries typically are only covering 25 to 30% of global deaths,” he said. “Many other countries will take several years to produce reliable data.”

Data challenges in LMICs

So far COVID-19 has been reported in over 180 countries and territories. But 19 other countries have not reported any confirmed cases, many of them in the Pacific.

Countries and territories reporting cases of COVID-19, 11 May 2020. Source: Johns Hopkins COVID-19 dashboard.

In the Pacific region, international travel bans imposed by countries early in the outbreak could explain the lack of confirmed cases in most Pacific Islands. But the wider Indo-Pacific region is also known to have poor medical reporting systems, Lopez said.

“I don’t know much about the surveillance of cases, but I know enough about their surveillance and mortality registration systems to say that they would be missing 50% to 80% of deaths typically, and it is unlikely they would be reliably capturing the COVID deaths that occur in hospital,” Lopez said.

“The deaths that are being reported in [the Indo-Pacific] are likely to be gross underestimates of the actual number of COVID-19 deaths,” he said.
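To give a sense of scale, the arithmetic behind that warning can be sketched in a few lines of Python. This is only an illustration: the reported count of 100 is hypothetical, the function name is invented for this example, and applying Lopez's 50% to 80% figure for deaths in general directly to COVID-19 deaths is an assumption, not something he stated.

```python
def adjust_for_underregistration(reported_deaths, share_missed):
    """Scale a reported count up, assuming a known share of deaths is never registered."""
    if not 0 <= share_missed < 1:
        raise ValueError("share_missed must be at least 0 and below 1")
    return reported_deaths / (1 - share_missed)

# Hypothetical reported count, not real data.
reported = 100

# Assumption: the 50% to 80% under-registration range applies to COVID-19 deaths too.
low = adjust_for_underregistration(reported, 0.5)
high = adjust_for_underregistration(reported, 0.8)

print(f"{reported} reported deaths could correspond to roughly {low:.0f} to {high:.0f} actual deaths")
```

Under those assumptions, every 100 reported deaths would stand for roughly 200 to 500 actual deaths, which is why small differences in registration coverage matter so much for comparisons between countries.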

Through the Data for Health initiative, Lopez has been working with Indo-Pacific countries, including Myanmar, Papua New Guinea, and the Solomon Islands, to improve death registration. This includes training medical experts and introducing verbal autopsies that help to rapidly identify and categorize deaths outside hospitals.

In many of these countries, Lopez said he starts with a “basis of zero,” which means having limited or no ability to record deaths accurately. But he said it could take years to measure the impact of these efforts, as many of the countries have limited capacities to comprehensively identify and track deaths in the region.

Other countries outside the Indo-Pacific likely face similar challenges.

“Some collect it but don’t rapidly report it, so people can’t access the data. But at this point, we need that data now. There’s no point getting that data in 2021,” Chatterjee said.

Because of this, there is a lag before cases show up in WHO figures. The Kenyan government, for example, confirmed its first case on March 12 and alerted the public on March 13, but the case only appeared in WHO's daily situation report on March 14, which listed Kenya among the countries that "have reported cases of COVID-19 in the past 24 hours."

At time of publication, there was a difference of 199,401 cases between the latest WHO daily situation report and the tally of COVID-19 cases published by Johns Hopkins University.

In other countries, the challenge extends to testing capacities and health care access.

“Differences in health-seeking behavior among different populations, say, because of differences in endowments (e.g. access to health care, support systems, information) or expectations (e.g. how they will be treated in healthcare facilities), may also hide the true state of the epidemic,” Michael Abrigo, research specialist at the Philippine Institute for Development Studies, told Devex in an email.

In countries such as North Korea, surveillance is poor and testing capacity is a concern, but experts also speculate that authorities may be hiding cases.

Because of the lack of access to tests, Michel Yao, WHO emergency operations manager in Africa, advised that countries may need to move forward with treating people and triggering community efforts to contain the virus even if people are never tested. While not ideal, this means relying on “case definitions,” or the presence of symptoms as a signal that a person is positive for COVID-19.

"It is done in many other outbreaks. In cholera, for example, where in the past we were not able to test everybody, we just confirm a few cases and rely on case definitions," he said during a press briefing on April 30.

Unreliable data in advanced economies

While COVID-19 reporting is highly unreliable in many low- and middle-income countries, high-income economies also have challenges. There are lags in the detection of COVID-19 and reporting of cases, as doctors struggle to differentiate it from other diseases with similar symptoms, such as the flu. Official death counts in some countries change as governments add or subtract deaths, or revise their reporting methodologies.

The United Kingdom saw a recent spike in its death toll after the government started to include COVID-19 deaths in the community, including in nursing homes. New York City also recently revised its reporting to include deceased individuals presumed to have died from COVID-19 but never tested for the virus. The change led to an increase in the city’s COVID-19 death toll.

On April 22, the Japanese government revised its reporting methodology for COVID-19 deaths by adding data for deceased cases that are still in the process of verification.

Reported deaths in Japan between April 3 and May 8. Source: WHO COVID-19 dashboard.

Meanwhile, China has changed how it reports cases a number of times throughout the outbreak.

On Feb. 13, China recorded a spike in COVID-19 cases after the government decided to include clinically diagnosed cases in the official tally, meaning cases that a medical professional classified as confirmed on the basis of chest imaging. But just a week later, the Chinese government reverted to relying on laboratory-confirmed cases only, after finding some clinically diagnosed cases were COVID-19 negative.

On April 17, China again revised case and death toll numbers in Wuhan, to include previously unreported cases.

The invisibility of asymptomatic cases also makes reported cumulative case counts unreliable. Lopez said between 10% and 30% of all COVID cases may be unreported because of this. But the delay in the publication of cases, especially deaths, is an area he sees as particularly challenging for the reliability and usefulness of data from high-income economies.

“In the U.K., I know you have data for last week,” he said. “You don’t want to know about COVID deaths weeks ago — you want to know about them yesterday and today. That timeliness dimension is as important as quality when it comes to capturing the deaths,” Lopez said.

WHO recommendations

WHO has acknowledged COVID-19 data challenges, noting differences in countries’ reporting methods, retrospective data consolidation, and reporting delays.

“WHO issued recommendations for reporting surveillance data to all countries with a minimum set of indicators to better understand the epidemiology and trends of COVID-19. That said, not all countries have been able to report this data systematically to WHO, rendering [challenges] to provide a clear picture of the severity of the disease,” WHO said in a statement sent to Devex.

Maria Van Kerkhove, WHO’s COVID-19 technical lead, said that many countries are struggling to capture deaths from COVID-19.

“There is a very good example across Europe through the EuroMOMO project, which is capturing excess mortality in many countries across Europe. And excess mortality right now is very high.

“So I think it will take some time for us to really understand which deaths are due to COVID-19 directly, in terms of the infection causing that death, and which of the deaths are associated with COVID-19 either because someone has died because they didn't get care for some other reason,” she said during the agency’s press briefing on May 1.
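The idea of excess mortality that Van Kerkhove refers to can be illustrated with simple arithmetic: deaths observed in a period minus the deaths that would normally be expected. The sketch below is a simplification under stated assumptions. All figures are hypothetical, the function name is invented for this example, and the baseline is a plain average of earlier years rather than the statistical model a project such as EuroMOMO would use.

```python
def excess_deaths(observed, baseline_years):
    """Observed deaths minus the average for the same period in earlier years."""
    expected = sum(baseline_years) / len(baseline_years)
    return observed - expected

# Hypothetical weekly all-cause deaths for one region, not real data.
baseline = [1000, 1040, 960, 1000]  # same week in four earlier years
observed = 1400                     # same week during the outbreak
reported_covid = 250                # officially reported COVID-19 deaths that week

excess = excess_deaths(observed, baseline)
gap = excess - reported_covid

print(f"Excess deaths: {excess:.0f}")
print(f"Excess deaths not reported as COVID-19: {gap:.0f}")
```

In this made-up example the gap would include both uncounted COVID-19 deaths and deaths from other causes among people who did not get care, which are the two groups Van Kerkhove said will take time to separate.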

Improved data, better predictions

Christopher Murray, director of the Institute for Health Metrics and Evaluation, discussed his work on collating and analyzing COVID-19 data to predict the next phases of the epidemic as part of a recent United Nations World Data Forum webinar.

He said a lot of issues affect modeling analyses for COVID-19, including testing rates and data timeliness. Reduced reporting on Sundays and Mondays, followed by a spike on Tuesdays, can lead models to incorrectly assume the peak has been reached.

“There is a lot of bias in data,” Murray explained. “We know that there is a day of the event … and then there is the day that it is reported. If that is made transparent, then on the modeling side we can take that into account.”
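The weekday effect Murray describes is easy to reproduce. The sketch below uses hypothetical numbers in which true deaths are a constant 100 per day, but Sunday and Monday reports come in low and Tuesday catches up. A trailing seven-day average is one common way to smooth such a pattern; it is shown here as an illustration and is not a description of how Murray's institute adjusts its data.

```python
from statistics import mean

def rolling_mean(values, window=7):
    """Trailing moving average; returns one value per full window."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]

# Hypothetical daily reported deaths, Monday first, not real data.
# Each week sums to 700, so the true average is 100 per day.
one_week = [60, 180, 100, 100, 100, 100, 60]
reported = one_week * 3  # three identical weeks

smoothed = rolling_mean(reported)

print("Raw reports:", reported[:7])
print("7-day average:", smoothed[:3])
```

Read day by day, the raw series appears to peak every Tuesday and then fall, while the seven-day average stays flat because every window contains exactly one of each weekday.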

To improve modeling predictions, two things need to happen, Chatterjee said: accurate reporting of hospitalizations, deaths, and other types of serious illness; and getting other types of data, such as people’s behavior, mobility, and co-morbidity conditions in a given area.

Proper vetting of models is also needed to ensure quality over quantity.

“People are hungry for data, good data … [but] I’m a little worried that there might be too many models out there. There should be data coordination … whereas there may be fewer models, but more vetted models so people can have more faith in these models,” Chatterjee said, adding that different models produce conflicting predictions, thereby confusing people.

But journalists have a role to play too. When reporting on modeling analyses, he said, they should pay more attention to the range of “uncertainty estimates” than to the actual estimates themselves.

“People should pay more attention to that, and think about [the] worst case and best case scenario because sometimes those might be more useful than looking at the estimate itself,” Chatterjee said.

About the authors

  • Lisa Cornish

    Lisa Cornish is a Senior Reporter based in Canberra, where she focuses on the Australian aid community. Lisa formerly worked with News Corp Australia as a data journalist for the national network and was published throughout Australia in major metropolitan and regional newspapers, including the Daily Telegraph in Sydney, Herald Sun in Melbourne, and Courier-Mail in Brisbane, as well as online. Lisa additionally consults with the Australian government, providing data analytics, reporting, and visualization services. Lisa was awarded the 2014 Journalist of the Year by the New South Wales Institute of Surveyors.
  • Sara Jerving

    Sara Jerving is a global health reporter based in Nairobi. Her work has appeared in The Wall Street Journal, The New York Times, the Los Angeles Times, Vice News, and Bloomberg News, among others. Sara holds a master's degree from Columbia University Graduate School of Journalism where she was a Lorana Sullivan fellow. She was a finalist for the Livingston Award for Young Journalists in 2018, part of a Vice News Tonight on HBO team that received an Emmy nomination in 2018 and received the Philip Greer Memorial Award from Columbia University Graduate School of Journalism in 2014. She has reported from over a dozen countries.
  • Jenny Lei Ravelo

    Jenny Lei Ravelo is a Devex Senior Reporter based in Manila. She covers global health, with a particular focus on the World Health Organization, and other development and humanitarian aid trends in Asia Pacific. Prior to Devex, she wrote for ABS-CBN, one of the largest broadcasting networks in the Philippines, and was a copy editor for various international scientific journals. She received her journalism degree from the University of Santo Tomas.