AI creates new therapeutic proteins from scratch, an innovative and cost-effective technique
Functional de novo proteins, with limited homology to natural proteins, have previously been designed using large learning models, but the approach is time-consuming and expensive. Recently, a small startup in California used artificial intelligence based on a text-generating language model, similar to the one behind ChatGPT, to design new functional antibacterial proteins. This approach could allow new drugs to be developed in less time.
Directed evolution has proved highly effective at finding variants of known proteins with improved properties. The technique, used in protein engineering, mimics the process of natural selection to “direct” the evolution of proteins toward a specific purpose, such as developing sustainable treatments for diseases or creating new enzymes that can break down non-recyclable plastics. It was recognized with the Nobel Prize in Chemistry in 2018.
However, designing proteins that are not identical to proteins in nature is extremely difficult. Until now, the field has relied on two traditional methods: expensive and time-consuming searches for proteins that exist in nature, or trying to make small changes to an existing protein in the hope of achieving the desired result.
Artificial intelligence can provide a smart way to take on the tedious task of protein design. A protein is nothing more and nothing less than a chain of amino acids connected by peptide bonds, and these amino acids are arranged like words in a sentence. ChatGPT has already shown how effective language models can be, passing law and economics exams by generating coherent text.
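To make the “words in a sentence” analogy concrete, here is a minimal Python sketch of how a protein sequence can be turned into the integer tokens a language model consumes. The vocabulary, the function names and the example peptide are all illustrative assumptions; this is not ProGen's actual tokenizer.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: each residue letter maps to an integer id.
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Convert a protein sequence into a list of integer tokens."""
    return [TOKEN_ID[aa] for aa in sequence.upper()]


def detokenize(tokens: list[int]) -> str:
    """Convert integer tokens back into a one-letter protein sequence."""
    return "".join(AMINO_ACIDS[t] for t in tokens)


# An arbitrary short peptide, chosen only for illustration.
peptide = "MKTAYIAK"
tokens = tokenize(peptide)
print(tokens)
print(detokenize(tokens))
```

Just as a text model sees a sentence as a sequence of word ids, a protein model sees a protein as a sequence of residue ids, which is what allows the same kind of architecture to be reused.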
Recently, based on this principle, the Berkeley (California) startup Profluent, in collaboration with the University of California San Francisco (UCSF), used deep generative models to “teach the language of biology” to an artificial intelligence and designed new functional proteins. The models are trained on large-scale data. Their work is published in the journal Nature Biotechnology.
ProGen, a revolution for biological engineering
Like language models for text, Profluent models are built on large-scale data, but instead of adjectives and nouns, they learn the “language of the genetic code.”
Profluent founder Ali Madani explained in a statement: “While companies are experimenting with exciting new biotechnologies like genome editing with CRISPR, repurposing what nature has given us, we are doing something different. We use artificial intelligence and large language models, like the ones that power ChatGPT, to learn the basic language of biology and design new proteins with the potential to treat disease.”
To create their model, the scientists fed the amino acid sequences of 280 million different proteins, drawn from all kinds of species, into a machine learning model and let it digest the information over several weeks. Next, they fine-tuned the model with 56,000 sequences from five lysozyme families, along with contextual information about these proteins.
The model quickly generated a million sequences. The research team selected 100 of them to test, based on how closely they resembled natural protein sequences and on how natural the underlying amino acid “grammar” and “semantics” of the AI-generated proteins appeared.
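The idea of generating sequences one residue at a time can be illustrated with a deliberately tiny stand-in. The sketch below trains a bigram model (each residue depends only on the previous one) on three made-up sequences and samples new ones from it. It is far simpler than ProGen, which is a large neural network, and every name and sequence here is a hypothetical chosen for illustration.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical markers for sequence start and end


def train_bigram(sequences):
    """Count how often each residue follows another in the training set."""
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = [START, *seq, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_sequence(counts, rng, max_len=50):
    """Generate one sequence, choosing each residue given the previous one."""
    out = []
    prev = START
    while len(out) < max_len:
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Made-up training sequences, not real lysozymes.
training = ["MKVLA", "MKTLA", "MRVLG"]
model = train_bigram(training)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample_sequence(model, rng) for _ in range(5)]
print(samples)
```

A real protein language model conditions each residue on the whole preceding sequence rather than a single neighbor, but the generation loop (predict, sample, append, repeat) follows the same pattern.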
From this first batch of 100 proteins, the team made five artificial proteins to test in cells and compared their activity to an enzyme found in egg whites known as hen egg white lysozyme (HEWL). Similar lysozymes are found in human tears, saliva and milk, where they fight bacteria and fungi.
In particular, two of the artificial enzymes were able to break down bacterial cell walls with activity comparable to HEWL, yet their sequences were only about 18% identical to each other. The two sequences were approximately 90% and 70% identical, respectively, to any known protein.
A single mutation in a natural protein can be enough to stop it from working. Yet the team found that AI-created enzymes still showed activity even when their sequence was only 31.4% similar to that of any known natural protein.
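For readers wondering what a figure such as “18% identical” means in practice, the following Python sketch computes percent identity between two sequences. It assumes the sequences have already been aligned to the same length, with “-” marking gaps; the alignment step itself, and the exact identity definition used in the study, are not covered here. The two example peptides are invented.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions where both sequences share a residue.

    Assumes `a` and `b` are already aligned (equal length, '-' for gaps).
    Columns where both sequences have a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Two invented peptides differing at 2 of 8 positions.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))  # 75.0
```

Identity can be defined in several ways (for example, relative to the shorter sequence or to the alignment length), so published percentages depend on the convention chosen.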
The AI was even able to learn how enzymes should be shaped just by studying raw sequence data. The atomic structures of the artificial proteins, as measured by X-ray crystallography, appeared “correct,” even though the sequences were unlike anything known.
Use of ProGen for drug production
Salesforce Research developed ProGen in 2020, building on a type of natural language processing that its researchers had originally developed to generate English text. They knew from their previous work that an AI system could learn the grammar and meaning of words on its own, as well as other basic rules that make writing well composed.
Nikhil Naik, PhD, director of artificial intelligence research at Salesforce Research and senior author of the study, said in a statement from UCSF: “When you train sequence-based models with lots of data, they are really strong at learning structure and rules. They learn which words can go together.”
The design options with proteins are almost endless. Lysozymes are small as proteins go, containing up to about 300 amino acids. Yet with 20 possible amino acids at each position, there are 20^300 possible combinations, an enormous number.
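As a rough check of the scale involved, assume 20 possible amino acids at each of 300 positions. The count is then 20 raised to the power 300, which Python can evaluate exactly thanks to its arbitrary-precision integers. The variable names below are illustrative.

```python
import math

n_amino_acids = 20   # standard amino acids
length = 300         # approximate upper length of a lysozyme

# Exact number of possible sequences of this length.
combinations = n_amino_acids ** length

# Size of that number, as a digit count and as a power of ten.
digits = len(str(combinations))
exponent = length * math.log10(n_amino_acids)

print(f"20^300 has {digits} digits (about 10^{exponent:.1f})")
```

The result has 391 digits, roughly 2 followed by 390 zeros, which is why exhaustively testing every candidate sequence is out of the question.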
Given these nearly limitless possibilities, it is remarkable that the model can generate working enzymes so easily. Ali Madani concludes: “This is a versatile new tool available to protein engineers, and we look forward to seeing therapeutic applications.”