On November 30, 2020, news that drew wide attention in the technology community broke: AlphaFold, a deep learning system developed by DeepMind, had solved the protein folding problem that has puzzled scientists for fifty years, a milestone for AI-based prediction of protein structure. On July 15, 2021, DeepMind published the details of its further optimized AlphaFold system in Nature; the same day, RoseTTAFold, a related system from David Baker's group at the University of Washington, appeared in Science.
But how do proteins fold into their unique three-dimensional shapes? This question has puzzled biologists for almost 50 years. AlphaFold, the artificial intelligence system created by researchers at DeepMind in the UK, has raised the accuracy of protein structure prediction to the atomic level, essentially solving the protein folding problem. This came decades earlier than many scientists expected and demonstrates the potential of AI to tackle major scientific problems.
A protein is like a carefully assembled machine whose parts are the 20 standard amino acids. When a protein is synthesized, amino acids are linked one after another, like beads on a string, in the order specified by the gene's coding sequence. The resulting polypeptide chain, that is, the linear sequence of amino acids, is the protein's primary structure.
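To make the gene-to-sequence step concrete, the sketch below translates a DNA coding sequence into its amino acid sequence (the primary structure) using the standard genetic code. It is a simplified illustration: it assumes the input already starts at a start codon, reads three bases at a time, stops at the first stop codon, and ignores real-world details such as introns and alternative genetic codes. The names translate and CODON_TABLE and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with bases in T, C, A, G order,
# first position varying slowest; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads codons from the first base, ignores any trailing partial codon,
    and stops at the first stop codon. Raises KeyError on non-ACGT input.
    """
    seq = coding_dna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon: translation ends here
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTTTGGATAA"))  # -> MAKFG
```

Here ATG, GCC, AAG, TTT and GGA encode methionine, alanine, lysine, phenylalanine and glycine, and TAA ends translation.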
However, building a protein molecule does not end once the amino acids are linked into a polypeptide chain: the chain must fold into a specific spatial structure before it can function. Yet the gene sequence only specifies the amino acid sequence; it carries no separate instructions for how that chain folds into its unique three-dimensional structure.
In fact, the information about how a chain folds is contained in the amino acid sequence itself: the protein in effect designs its own fold. The amino acids along the one-dimensional chain behave as if they communicate with one another, some repelling and some attracting, so that local segments form α-helices and β-sheets, which make up the protein's secondary structure. The chain then folds further into a unique spatial structure, much as yarn is wound into a ball, forming the protein's tertiary structure.
To solve the folding problem, AlphaFold needs large amounts of protein sequence and structure data. DeepMind researchers trained it on about 170,000 publicly available protein structures from the Protein Data Bank, together with large databases of protein sequences whose structures are unknown. From these data AlphaFold learns how amino acids interact with one another and how evolutionarily related protein sequences vary, which gives it a strong ability to predict structure. As a result, once a protein's amino acid sequence is known, its structure can be predicted quickly and accurately; in effect, the algorithm links a protein's primary structure directly to its tertiary structure.
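One intuition behind learning from evolutionarily related sequences can be shown with a classical covariation statistic. If two positions in a protein are in contact, a mutation at one is often compensated by a mutation at the other, so the two columns of a multiple sequence alignment vary together. The sketch below scores every pair of columns by mutual information. It only illustrates that older idea under simplified assumptions (a tiny, hypothetical, gap-free alignment and no correction for sequence redundancy); AlphaFold itself learns such signals with a deep neural network rather than computing this score. The function names and TOY_MSA are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def rank_coupled_pairs(msa: list[str]) -> list[tuple[int, int, float]]:
    """Score every column pair; assumes all sequences have equal length."""
    length = len(msa[0])
    scores = [
        (i, j, column_mutual_information(msa, i, j))
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, key=lambda t: t[2], reverse=True)


# Hypothetical alignment: column 0 is conserved, columns 1 and 4 swap K/E
# together (as a compensating salt bridge might), columns 2 and 3 vary independently.
TOY_MSA = ["MKALE", "MEAVK", "MKGVE", "MEGLK"]

i, j, score = rank_coupled_pairs(TOY_MSA)[0]
print(i, j, round(score, 3))  # -> 1 4 0.693
```

Columns 1 and 4 rank first with a score of ln 2, while the conserved column and the independently varying columns score 0 against everything.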
In the 2018 CASP13 protein structure prediction competition, AlphaFold ranked first among all participating teams, accurately predicting the structures of 24 of the 43 target proteins, an unprecedented result. For CASP14 in 2020, DeepMind upgraded the algorithm, drawing on recent advances in biology, physics, and machine learning, and the new version won again by a wide margin. This time, for most targets, AlphaFold's predicted structures differed only slightly from the experimentally determined ones at the atomic scale, reaching accuracy comparable to traditional experimental methods. In this sense, AlphaFold has essentially solved the protein folding problem.
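CASP assessors score predictions mainly with GDT_TS, which averages the percentage of residues whose predicted Cα positions lie within 1, 2, 4 and 8 Å of the experimental positions. The sketch below is a simplified version that assumes the two structures have already been superimposed; the official calculation also searches for the superposition that maximizes each percentage. The function name gdt_ts and the four-residue coordinates are hypothetical.

```python
import math

# A C-alpha position as (x, y, z) in angstroms.
Coord = tuple[float, float, float]


def gdt_ts(predicted: list[Coord], experimental: list[Coord]) -> float:
    """Simplified GDT_TS (0-100) for two already-superimposed C-alpha traces.

    Unlike the official calculation, no superposition search is performed.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    n = len(distances)
    # Fraction of residues within each GDT_TS cutoff (in angstroms).
    fractions = [
        sum(d <= cutoff for d in distances) / n for cutoff in (1.0, 2.0, 4.0, 8.0)
    ]
    return 100.0 * sum(fractions) / len(fractions)


experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
print(gdt_ts(predicted, experimental))  # -> 56.25
```

With these coordinates the residues are 0.5, 1.5, 3.0 and 9.0 Å from their experimental positions, so the fractions within the four cutoffs are 0.25, 0.5, 0.75 and 0.75, giving a score of 56.25. A perfect prediction scores 100.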