🟥 Summary

This is our legacy pipeline. As a proof-of-concept, we set out to computationally design a more efficient version of the human glyoxalase I (Glo1) enzyme for subsequent wet lab validation of predicted candidates.

To engineer the enhanced enzyme, we mutated selected amino acid residues in the wild-type enzyme’s active site and tested the newly generated variants for improved binding affinity towards the enzyme’s substrate — S-lactoylglutathione.

Since substrate binding alone does not directly translate into catalytic efficiency, we ran a second, parallel approach that optimizes the mutants for the turnover number (kcat), i.e., the number of reaction cycles each enzyme molecule completes per unit time. Screening candidates for both substrate affinity and turnover rate lets us filter out suboptimal variants and yields higher-confidence hits for further work.
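The two-criteria screen described above can be sketched as a simple filter. This is a minimal illustration, not the pipeline's actual code: the `Variant` class, the `dual_filter` helper, and all numbers are hypothetical, and it assumes affinity is expressed as a docking score in kcal/mol (more negative means tighter predicted binding) and kcat as a predicted value in 1/s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    vina_score: float  # kcal/mol; more negative = tighter predicted binding
    kcat: float        # predicted turnover number, 1/s


def dual_filter(variants, wild_type):
    """Keep variants predicted to beat the wild type on both criteria."""
    return [
        v for v in variants
        if v.vina_score < wild_type.vina_score and v.kcat > wild_type.kcat
    ]


# Placeholder numbers for illustration only, not pipeline results.
wild_type = Variant("WT", vina_score=-6.0, kcat=100.0)
candidates = [
    Variant("mut_a", vina_score=-7.1, kcat=140.0),  # better on both criteria
    Variant("mut_b", vina_score=-7.5, kcat=60.0),   # binds tighter, but slower
    Variant("mut_c", vina_score=-5.2, kcat=180.0),  # faster, but binds weaker
]

hits = dual_filter(candidates, wild_type)
print([v.name for v in hits])
```

Requiring an improvement on both axes is the strictest possible choice; a weighted score or a Pareto front would retain more candidates at the cost of a longer wet lab list.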

Broadly speaking, the pipeline aims to replace conventional high-throughput candidate screening, as used in a typical protein directed evolution workflow, with modular computational variant generation and ranking, so that most of the work is done in silico. This streamlines the traditionally labor-intensive and costly process of assaying thousands of enzyme variants at the wet lab validation stage: the number of variants to assay drops to a few dozen, which significantly cuts overall R&D costs.

For details, see the description of our pipeline below.

🟥 Introduction

[Figure: dev.png]

In silico protein design is a computational approach to designing new proteins with desired properties. The following steps can be taken to design a protein using in silico methods:

  1. Point mutations generation: First, a pre-trained protein language model such as ProtBERT, a BERT-style model trained on protein sequences, can be used to generate point mutations in the target protein sequence. The original sequence is fed to the model, and its predictions are used to propose substitutions and to estimate their impact on protein properties such as stability and function.
  2. Protein sequence folding: Next, the protein sequence with the desired mutations can be fed into AlphaFold, a deep learning-based protein folding prediction tool that can predict the three-dimensional structure of the protein. This step is important for understanding how the mutations will affect the overall structure of the protein.
  3. Protein structure and ligand docking: Once the protein structure has been predicted, it can be used in ligand docking simulations, which place a small molecule (here, the enzyme's substrate) in the binding site of each variant. This can be done with computational tools like AutoDock Vina, which estimates the binding affinity of the ligand to the protein.
  4. Binding affinity Vina score: After docking, the binding affinity of the ligand to each variant is evaluated using the Vina score. The score is a predicted binding energy in kcal/mol, so more negative values indicate stronger predicted binding; it helps identify the most promising protein-ligand pairs for further testing.
  5. Selection of best mutated sequences: Based on the Vina scores, the best mutated protein sequences can be selected for further in vitro testing.

In summary, in silico protein design is a powerful tool for designing proteins with specific properties. Protein language models like ProtBERT and structure predictors like AlphaFold let designers propose mutations and predict their structural effects. Docking simulations and Vina scores then rank the resulting protein-ligand interactions, leading to the selection of the best mutated sequences for experimental testing.
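The first and last steps of this workflow can be sketched in a few lines. Note the substitution: instead of ProtBERT, this sketch exhaustively enumerates every single-residue substitution at chosen positions, which is a simpler stand-in for model-guided mutation proposal. The helper names, the toy sequence, and the scores are hypothetical; the folding and docking steps are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_point_mutants(sequence, positions):
    """Yield (label, mutated_sequence) for every substitution at 1-based positions."""
    for pos in positions:
        original = sequence[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != original:
                label = f"{original}{pos}{aa}"  # e.g. K2A: Lys at position 2 -> Ala
                yield label, sequence[:pos - 1] + aa + sequence[pos:]


def top_by_vina(scores, n):
    """Return the n labels with the most negative (best) Vina scores."""
    return sorted(scores, key=scores.get)[:n]


toy_sequence = "MKTAYIAK"  # hypothetical toy peptide, not the Glo1 sequence
mutants = dict(single_point_mutants(toy_sequence, positions=[2, 5]))
print(len(mutants), "single-point mutants generated")

# Placeholder docking scores in kcal/mol, for illustration only.
placeholder_scores = {"K2A": -6.8, "Y5F": -7.4, "K2R": -6.1}
print(top_by_vina(placeholder_scores, n=2))
```

Exhaustive enumeration grows as 19 substitutions per position, so it is only practical for a small set of active site residues; a language model is one way to prune that space before folding and docking.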

🟥 Active site residues identification

Resources: https://prankweb.cz

Input: 7WT1.pdb

Results (predicted pocket residues, given as chain_residue number): A_101, A_103, A_33, A_35, A_37, A_60, A_62, A_65, A_67, A_69, A_71, A_92, A_99, B_118, B_122, B_126, B_150, B_157, B_160, B_162, B_170, B_172, B_179, B_182, B_183

Center = (-9.2, -11.2, -7.5)
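For downstream steps, the residue list and center above can be turned into machine-readable form. The sketch below groups the residues by chain and writes the center into an AutoDock Vina configuration. It assumes the tokens follow a chain_residue number format and that the reported center is used as the docking box center; the 20 Å box size and the helper names are hypothetical, not values from the pipeline.

```python
from collections import defaultdict

POCKET = (
    "A_101, A_103, A_33, A_35, A_37, A_60, A_62, A_65, A_67, A_69, A_71, "
    "A_92, A_99, B_118, B_122, B_126, B_150, B_157, B_160, B_162, B_170, "
    "B_172, B_179, B_182, B_183"
)
CENTER = (-9.2, -11.2, -7.5)  # pocket center from the results above


def parse_residues(text):
    """Group 'chain_number' tokens into {chain: sorted residue numbers}."""
    by_chain = defaultdict(list)
    for token in text.split(","):
        chain, number = token.strip().split("_")
        by_chain[chain].append(int(number))
    return {chain: sorted(numbers) for chain, numbers in by_chain.items()}


def vina_config(center, box_size=20.0):
    """Build Vina config text; box_size (in angstroms) is an assumed value."""
    x, y, z = center
    lines = [
        f"center_x = {x}",
        f"center_y = {y}",
        f"center_z = {z}",
        f"size_x = {box_size}",
        f"size_y = {box_size}",
        f"size_z = {box_size}",
    ]
    return "\n".join(lines)


residues = parse_residues(POCKET)
config_text = vina_config(CENTER)
print(residues)
print(config_text)
```

The per-chain residue numbers are the natural input for choosing mutation positions, while the config text can be saved to a file and passed to Vina alongside the receptor and ligand.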