🟥 Deck

🟥 Initial ML considerations

The ML model should recommend amino acid mutations for a selected enzyme so that the engineered variant binds a target substrate while showing the desired functionality.

The main requirements for the model:

Generate foldable amino acid sequences
Guarantee high binding activity on a target substrate
Preserve the functionality of the starting enzyme
Achieve the above with a small library size to decrease wet lab work

Since substrate binding alone does not directly translate to catalytic efficiency, we will pursue a parallel track that optimizes the mutants for catalysis, specifically the turnover number (kcat): the number of substrate molecules converted per enzyme active site per unit time. Screening candidates for both substrate affinity and turnover rate filters out suboptimal variants and leaves high-confidence hits for further work.
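The dual screen can be pictured as a simple two-threshold filter. The sketch below is illustrative only: the `Variant` record, its score fields, the thresholds, and the example values are all hypothetical placeholders, not outputs or settings of the actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    binding_score: float  # hypothetical predicted affinity score; lower = tighter binding
    kcat: float           # hypothetical predicted turnover number, in 1/s


def filter_hits(variants, max_binding_score, min_kcat):
    """Keep variants that pass both criteria, highest predicted kcat first."""
    hits = [
        v for v in variants
        if v.binding_score <= max_binding_score and v.kcat >= min_kcat
    ]
    return sorted(hits, key=lambda v: v.kcat, reverse=True)


# Made-up example values, for illustration only.
candidates = [
    Variant("mut_A", binding_score=-9.1, kcat=120.0),
    Variant("mut_B", binding_score=-9.8, kcat=15.0),   # binds well, turns over slowly
    Variant("mut_C", binding_score=-6.2, kcat=300.0),  # fast, but binds weakly
    Variant("mut_D", binding_score=-8.7, kcat=210.0),
]

hits = filter_hits(candidates, max_binding_score=-8.0, min_kcat=100.0)
print([v.name for v in hits])  # ['mut_D', 'mut_A']
```

Only variants that clear both cut-offs survive, which is the point of screening on affinity and turnover together rather than on binding alone.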

<aside> 💡

Broadly speaking, the pipeline aims to replace conventional high-throughput candidate screening, as used in a typical directed evolution workflow, with modular computational generation and ranking of variants, so that most of the work is done in silico. This approach, detailed below, streamlines the traditionally labor-intensive and costly process of assaying thousands of enzyme variants at the wet lab validation stage, reducing the number to a few dozen and significantly cutting overall R&D costs.

</aside>

🟥 Proof-of-concept

As a proof-of-concept, we set out to computationally design a more efficient version of the human glyoxalase I (Glo1) enzyme (EC 4.4.1.5) for subsequent wet lab validation of predicted candidates.

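To engineer the enhanced enzyme, we will mutate selected amino acid residues in the wild-type enzyme’s active site and test the newly generated variants for improved binding affinity towards S-lactoylglutathione, the substrate as written in the EC 4.4.1.5 lyase reaction (in the physiological direction, Glo1 forms S-D-lactoylglutathione from the hemithioacetal of methylglyoxal and glutathione).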
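As a rough illustration of how a variant library grows with the number of mutated positions, the sketch below enumerates every single-residue substitution at a set of chosen sites. It assumes single-point mutants only; the sequence and positions are toy placeholders, not the Glo1 sequence or its actual active-site residues, and `single_point_mutants` is a hypothetical helper name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def single_point_mutants(sequence, positions):
    """Yield (label, mutant_sequence) for each substitution at 1-based positions."""
    for pos in positions:
        wild_type = sequence[pos - 1]
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the unchanged residue
            mutant = sequence[:pos - 1] + aa + sequence[pos:]
            yield f"{wild_type}{pos}{aa}", mutant


# Toy placeholders, NOT the real Glo1 sequence or active-site positions.
toy_sequence = "MKTAYIAKQR"
toy_positions = [3, 7]

library = dict(single_point_mutants(toy_sequence, toy_positions))
print(len(library))    # 38 = 2 positions x 19 substitutions
print(library["T3A"])  # MKAAYIAKQR
```

Single substitutions scale linearly (19 per position), whereas combining mutations across positions grows multiplicatively, which is why keeping the library small matters for wet lab validation.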

Information on EC 4.4.1.5 - lactoylglutathione lyase - BRENDA Enzyme Database

Rhea - reaction knowledgebase

🟥 Pipeline overview

In silico protein design is a computational approach to designing new proteins with desired properties. The following steps can be taken to design a protein using in silico methods: