‘New era in digital biology’: AI reveals structures of nearly all known proteins

What a difference a year makes. Twelve months ago, the artificial intelligence (AI) company DeepMind stunned many scientists with the release of predicted structures for some 350,000 proteins, part of the work recognized as Science’s 2021 Breakthrough of the Year. Yesterday, DeepMind and its partners went much, much further. The company unveiled the likely structures of nearly all known proteins, more than 200 million from bacteria to humans, a striking achievement for AI and a potential treasure trove for drug development and evolutionary studies.

“We’re releasing now the structures for the whole protein universe,” said Demis Hassabis, founder and CEO of DeepMind, at a press conference in London.

The structural bounty comes from AlphaFold, one of the new AI programs that have cracked the protein-folding problem, the long-standing challenge of accurately deriving the 3D shapes of proteins from their amino acid sequences. AlphaFold’s newly predicted structures were released yesterday into an existing database through a partnership with the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI). The database “has provided structural biologists with this powerful new tool where you can look up the 3D structure of a protein almost as easily as you can do a keyword Google search,” Hassabis said.

Eric Topol, director of the Scripps Research Translational Institute, echoed the amazement of many outside scientists. “AlphaFold is the singular and momentous advance in life science that demonstrates the power of AI,” he tweeted. “With this new addition of structures illuminating nearly the entire protein universe, we can expect more biological mysteries to be solved each day.”

A big day for #AI in life science. Release of >200 million predicted 3D protein structures from open-source #AlphaFold, nearly the entire protein universe
See: https://t.co/gjASHqACqa @DeepMind
my comment below pic.twitter.com/yPgtPHMZac

— Eric Topol (@EricTopol) July 28, 2022

The DeepMind structure release is “remarkable,” said Ewan Birney, deputy director general of EMBL, at the press conference. “It will make many researchers around the world think about what experiments they can now do.”

The proteins solved by AlphaFold come from organisms ranging from bacteria to plants to vertebrates, including mice, zebrafish, and humans. Kathryn Tunyasuvunakool, a DeepMind research scientist, said it took AlphaFold roughly 10 to 20 seconds to make each protein prediction. The company had to work closely with EMBL-EBI, she noted, to figure out how to present the immense number of structures in the database.

DeepMind says more than 500,000 researchers have used the database since its launch last year. Hassabis predicted a “new era in digital biology” in which drug developers could go from AI-predicted structures of proteins important to any medical condition to using AI to design small molecules that influence those proteins, and therefore treat an illness.

Others are using the structure predictions to develop vaccine candidates, to probe basic biology questions such as how the so-called nuclear pore complex gatekeeps which molecules enter a cell’s nucleus, or to examine how proteins evolved early in the history of life.

Hassabis, however, cautioned that the release of the structures is merely a starting point. “There’s still obviously a lot of biology, and a lot of chemistry, that has to be done.”