DeepMind has predicted the structure of almost every protein known to science

DeepMind says its AlphaFold tool has successfully predicted the structure of nearly all proteins known to science. From today, the Alphabet-owned AI lab is offering its database of over 200 million proteins to anyone for free. 

When DeepMind introduced AlphaFold in 2020, it took the scientific community by surprise. Scientists had spent decades trying to work out how proteins, which are essential to life, are structured; the problem was considered one of the “grand challenges” of biology. Understanding how proteins are shaped is crucial to understanding how they function. 

Last year, DeepMind released the source code of AlphaFold and made the structures of 1 million proteins, including nearly every protein in the human body, available in its AlphaFold Protein Structure Database. The database was built together with the European Molecular Biology Laboratory, an international public research institute that already hosts a large database of protein information.

The latest data release gives the database a massive boost. The update includes structures for “plants, bacteria, animals, and many, many other organisms, opening up huge opportunities for AlphaFold to have impact on important issues such as sustainability, fuel, food insecurity, and neglected diseases,” Demis Hassabis, DeepMind’s founder and CEO, told reporters on a call this week. 

The expanded database could act as an important resource for scientists, helping them to better understand diseases. It could also speed innovation in drug discovery and biology. 

“AlphaFold is probably the most major contribution from the AI community to the scientific community,” said Jian Peng, a computer science professor at the University of Illinois Urbana-Champaign who specialises in computational biology. 

Since AlphaFold’s release in 2020, researchers have used it to understand proteins that affect the health of honeybees and to develop an effective malaria vaccine.

The database allows researchers to look up 3D structures of proteins “almost as easily as doing a keyword Google search,” said Hassabis. 

Predicting the structures of proteins is very time-consuming, and having a tool with 200 million readily available protein structures will save researchers a lot of time, said Mohammed AlQuraishi, a systems biologist at Columbia University, who is not involved in DeepMind’s research. 

AlphaFold could also help scientists to reassess previous research to better understand how diseases happen, Peng said. 

However, for many proteins “we’re interested in understanding how their structure is altered by mutations and natural allelic variation, and that won’t be addressed by this database,” said AlQuraishi. “But of course the field is developing fast, and so I expect tools to accurately model protein variants will begin to appear soon,” he added. 

AlphaFold’s predictions may also be less accurate for rarer proteins with less available evolutionary information, says Peng.

The move is the latest development in DeepMind’s push into “digital biology,” where “AI and computational methods can help to understand and model important biological processes,” said Hassabis. Hassabis also leads Isomorphic Labs, a new Alphabet-owned venture that is developing AI for drug discovery. 

Pushmeet Kohli, head of AI for science at DeepMind, said the company has plenty of challenges in the life sciences it still wants to tackle, such as how proteins behave and interact with other proteins. 

Hassabis said his dream is that AI could not just help figure out the structure of proteins, but become a “significant part of the discovery process for new drugs and cures.”