How CEPI and its partners are using AI to prepare for 'Disease X'
The realm of possibilities of what pathogen could lead to the next pandemic is vast. But artificial intelligence can help researchers sort through vast amounts of information in order to prepare.
By Sara Jerving // 14 March 2024
There are very few ways the world might have been considered “lucky” during the COVID-19 pandemic. But one of them is how quickly researchers were able to develop vaccines.
Scientists had already been working on vaccines against the infectious respiratory diseases MERS and SARS, which come from the same virus family as COVID-19. This enabled them to use that information to quickly develop COVID-19 vaccines.
For the next pandemic, the world might not be so lucky. A virus that the scientific community knows little about might spread rapidly across borders.
But researchers are trying to prepare ahead of time with a strategy to tackle virus families and the highest-risk viruses within those families.
It’s an exercise in probability. “Disease X,” an unknown pathogen that could cause the next pandemic, is likely to come from one of 25 virus families, and there can be thousands of viruses within each family. The realm of possibilities is therefore vast, and preparing for the unknown is complicated.
“We don't know which is the one that's going to hit,” said In-Kyu Yoon, acting executive director of research & development at the Coalition for Epidemic Preparedness Innovations, or CEPI.
CEPI is working with a network of research institutions to help the world get into a position where it takes only 100 days to develop a vaccine when a pandemic strikes. They call this their 100 days mission.
One arm of this work is the use of artificial intelligence, which helps researchers process and analyze vast amounts of information in a way that is more manageable and less expensive than previous methods or assigning the task to humans.
“It makes it, in some ways, practical. That's what AI allows. You could do it if you had a million years, but it just maybe makes it realistic in terms of actually using that strategy and approach for preparation,” Yoon said.
Mounds of data
One of the CEPI-funded research consortia using AI is led by the Houston Methodist Research Institute, and includes Argonne National Laboratory at the University of Chicago, the J. Craig Venter Institute, La Jolla Institute for Immunology, The University of Texas at Austin, and the University of Texas Medical Branch at Galveston.
They are contributing to building vaccine libraries for different viral families, filling out details on what is known about how the body’s immune system responds to each family. They are also developing strategies to grow the knowledge base around specific viruses with pandemic potential.
Two of the virus families they are currently working on are Arenaviridae, which includes the virus that causes Lassa fever, and Paramyxoviridae, which includes the virus that causes measles, said Jimmy Gollihar, a professor of pathology and genomic medicine as well as head of the antibody discovery and accelerated protein therapeutics laboratory at the Houston Methodist Research Institute.
And there’s a lot of data at their disposal. When viruses circulate, data known as genomic sequences is collected from individual samples to decipher the genetic material of the virus. But viruses also evolve and mutate, so these sequences continually change. There is also data on how likely a virus is to move from an animal to a human, and the conditions under which this could happen.
Thousands of different sequences then need to interface with people’s potential immune responses, Yoon said. But instead of humans sifting through this data, both publicly available and generated in labs, researchers can train machine learning models to help them process and analyze it.
“Going through that requires a lot of computing power,” Yoon said. “The real benefit of AI is that it can, in some ways, learn from its algorithm, and then keeps going forward in that way.”
Optimal designs
AI can help with identifying priority viruses. It can also be used to design immunogens, molecules capable of eliciting an immune response in the human body against pathogens, and then to assess their performance.
While there are over a billion potential immunogens that could be used against each pathogen, Gollihar said, researchers can use AI to narrow down the most promising protein designs.
“You can imagine how complex that might eventually get. So actually having this capability to go through all of those thousands and thousands of possibilities, and the possible interactions, to narrow down the interactions that might be most favorable,” Yoon said.
The researchers are teaching neural networks, an AI method that teaches computers to process data in a way similar to the human brain, tasks that enable them to optimally design proteins that serve as immunogens in a vaccine. This automates a process that was previously done manually by researchers.
AI can also be used to stabilize immunogens, which is important for making vaccines durable, Gollihar said. The computer takes an immunogen, puts it in three-dimensional coordinates, and then the neural network identifies where it should be mutated to make it more stable and give it a structure that will elicit protective monoclonal antibodies, which are human-made proteins that act like antibodies in the immune system.
Their research teams are also using large language models, AI programs that can recognize and generate text, which have been popularized through applications such as ChatGPT. They use these models to analyze data from genomic surveillance, which shows how viruses evolve naturally, as well as data from experiments conducted safely in laboratories. They hope to use this information to design immunogens that protect broadly against a wide range of closely related pathogens.
And they are using machine learning and computational tools to better understand immune responses to related viruses, for example by looking at how some types of white blood cells act when confronted with a pathogen, and then using this data to design more effective vaccines.
“By knowing and understanding how the immune system protects people from related viruses, we want to use that information to make sure that our designs have those types of sequence features in them, while also being stable, and also being as broadly protective as possible,” Gollihar said.
Real-world trials, data limitations
The computer work is the less expensive part. When the most promising immunogen designs have been narrowed down, they would then be paired with a vaccine platform, such as messenger RNA, and tested in labs and in small animals, and ultimately in clinical trials with humans. This is the more expensive part of the process.
But AI can also play a role in the next steps. CEPI hopes that further downstream, AI computation methods can assist with designing clinical trials. Last October, CEPI announced a new partnership with IQVIA, a company with leading AI software, to “rapidly conduct life-saving clinical research for vaccines and other biological countermeasures against emerging infectious diseases.”
And then, when a vaccine is proven efficacious, researchers can use AI computational methods to help design the optimal way to roll it out and decide whom to target.
And while there are vast amounts of information to sort through, it’s also not enough. The quality and the outcomes of all of this work depend on access to quality data. There may be an abundance of information on one kind of virus but little on another that might be high risk, Yoon said. For example, there is a lot of data on coronaviruses, but limited information about Nipah viruses.
“There's just not a lot of data. Labs haven't been able to generate the kind of data that is needed to really train these models well,” Gollihar said. “I think a lot of the low-hanging fruit from taking data from public repositories and doing stuff with it with AI is over. The next wave of AI is going to have been trained on very specific datasets.”
His teams are working on generating those datasets.
“Your models are only as good as the data that you feed them. So our mission is really to generate the datasets that don't exist so that we can train the models to do what we want them to do,” he said.
A million pieces
How soon, and whether, CEPI and its partners will be in a position to develop a vaccine in 100 days is a tricky question, because it depends on the characteristics of the new virus.
Yoon said pandemic preparedness is like a million-piece jigsaw puzzle.
“The more we put in, the better the picture is,” he said.
But then a virus with pandemic potential strikes, which is like a dart, he said. It lands somewhere on the million-piece jigsaw puzzle. It might land on a part of the puzzle that researchers hadn’t yet filled in, and that’s going to make vaccine development much harder. But if that dart lands on a part that is partially filled in, they have a head start.
And so, the more people working to fill in puzzle pieces on those 25 virus families, the better.
“If we had the resources, we would target everything,” Yoon said.
If the world is lucky, in two years, researchers might be in a position to develop a vaccine in 100 days, Yoon said.
But, again, it all depends on where the dart lands.
“We want to go through each virus family, with pandemic potential, learn as much as we can ... so that if something were to happen, it will be related to a virus that we know something about,” Gollihar said.
Sara Jerving is a Senior Reporter at Devex, where she covers global health. Her work has appeared in The New York Times, the Los Angeles Times, The Wall Street Journal, VICE News, and Bloomberg News among others. Sara holds a master's degree from Columbia University Graduate School of Journalism where she was a Lorana Sullivan fellow. She was a finalist for One World Media's Digital Media Award in 2021; a finalist for the Livingston Award for Young Journalists in 2018; and she was part of a VICE News Tonight on HBO team that received an Emmy nomination in 2018. She received the Philip Greer Memorial Award from Columbia University Graduate School of Journalism in 2014.