How good is ML protein engineering?
One of the conundrums facing protein engineers is whether (and how) to use directed evolution or machine learning. On the one hand, directed evolution will generally get you where you’re going, since most selections and screens are directly for function. However, the path can be circuitous and inefficient (as above). Machine learning, in contrast, promises a more direct path to function by predicting the mutations that should work, rather than chancing upon them at random. However, there are a number of hurdles to overcome along the way (also as above), and some of them are far from easy to clear.
Still, the excitement surrounding machine learning for protein engineering is well-deserved. We continue to use MutCompute (mutcompute.com), which is relatively old (in the general scheme of things) but still proves adept at predicting mutations that improve protein function (most recently for photocatalysts; Liu et al., 2024). However, the hit rate of predicted mutations can be relatively low for many software packages, so considerable experimental effort is needed to find the few variants that actually improve function. For instance, MutCompute was used to select 159 predicted beneficial variants of the plastic-degrading enzyme PETase, and ultimately only four were carried forward into combinations that produced a much better enzyme (Lu et al., 2022). Similarly, Johnson et al. (2025) used three generative protein models to redesign malate dehydrogenase and copper superoxide dismutase. Trained on 5,000 protein sequences per enzyme, the models generated 144 variants for experimental testing, of which only 19% showed activity above background. Additional computational filtering, however, improved success rates by 50–150%.
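To make the hit-rate arithmetic concrete, here is a minimal Python sketch. It is not taken from any of the cited studies. The helper names (`hit_rate`, `p_at_least_one_hit`, `variants_needed`) are hypothetical, and the screening estimate assumes that every variant is an independent hit with the same probability, which real libraries rarely satisfy. The two rates come from the numbers quoted above, and "hit" means something different in each: for PETase, it counts only the four variants carried forward into combinations, so it understates how many of the 159 were beneficial.

```python
from math import ceil, log


def hit_rate(hits: int, tested: int) -> float:
    """Fraction of tested variants that counted as hits."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return hits / tested


def p_at_least_one_hit(rate: float, n: int) -> float:
    """P(>= 1 hit among n variants), ASSUMING each variant is an
    independent hit with the same probability `rate`."""
    return 1.0 - (1.0 - rate) ** n


def variants_needed(rate: float, confidence: float = 0.95) -> int:
    """Smallest n with p_at_least_one_hit(rate, n) >= confidence,
    under the same (simplifying) independence assumption."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    return ceil(log(1.0 - confidence) / log(1.0 - rate))


if __name__ == "__main__":
    # Rates from the numbers quoted in the text; "hit" is defined differently per study.
    rates = {
        "MutCompute/PETase (4 of 159 carried forward)": hit_rate(4, 159),
        "Generative models (19% of 144 active)": 0.19,
    }
    for name, rate in rates.items():
        n = variants_needed(rate)
        print(f"{name}: rate {rate:.1%}, ~{n} variants for a 95% chance of >= 1 hit")
```

Under this simplified model, a 19% activity rate implies testing about 15 variants for a 95% chance of finding at least one active one, while a rate of 4 in 159 implies roughly 118. A rate of zero makes the estimate undefined, which is why `variants_needed` rejects it.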
But not all proteins are equally amenable to prediction. In particular, it has proven exceedingly difficult to generate useful predictions for allosteric proteins. For example, Taylor et al. (2016) used Rosetta to engineer LacI to respond to new ligands by computationally redesigning its ligand-binding pocket. Of the 15 highest-ranked designs, 14 failed to repress transcription, so the authors turned instead to screening variants with fewer mutations (in the range of thousands). They did ultimately identify variants that responded to each of the new ligands, but only because they used a high-throughput selection and screening method to evaluate function.
In our recent bioRxiv study (Clark-ElSayed et al., 2025), we tested whether new generative protein design models could be used to engineer allosteric transcription factors. Specifically, we compared LigandMPNN, a structure-informed generative model, with traditional directed evolution for engineering RamR to respond to benzylisoquinoline alkaloids. Using two structure-prediction tools, we generated protein-ligand complexes and used these as input backbones for LigandMPNN, targeting the same residues as in the directed evolution screen. We cloned and tested nine designed variants in E. coli, but none of them repressed transcription or responded to the target molecules. This highlights the current limitations of computational design for changing allosteric binding specificity.
Hit rates may therefore be especially low for allosteric proteins, in part because most algorithms do not explicitly account for conformational change. One might expect sequence-based methods like ESM to capture such constraints implicitly, yet they have also struggled to engineer allosteric transcription factors, possibly because they lack awareness of long-range residue interactions. To address this, protein design models could be integrated with tools that identify or preserve functionality. For example, in a recent study, FuncLib was used to design a library of approximately 17,000 TtgR variants, of which ~85% retained transcriptional repression activity. FuncLib combines evolutionary information with energy-guided design to introduce mutations that increase thermodynamic stability. This allows it to explore sequence space while preserving allosteric function, and so increases the probability of generating functional designs (Nishikawa & Chen, 2024).
References
Clark-ElSayed, A., Creed, E., Nayvelt, K., & Ellington, A. (2025). Comparing LigandMPNN and Directed Evolution for Altering the Effector-Binding Site in the RamR Transcription Factor. bioRxiv. https://doi.org/10.1101/2025.07.10.663684
Johnson, S. R., Fu, X., Viknander, S., Goldin, C., Monaco, S., Zelezniak, A., & Yang, K. K. (2025). Computational scoring and experimental evaluation of enzymes generated by neural networks. Nature Biotechnology, 43(3), 396–405. https://doi.org/10.1038/s41587-024-02214-2
Liu, Y., Bender, S. G., Sorigue, D., Diaz, D. J., Ellington, A. D., Mann, G., Allmendinger, S., & Hyster, T. K. (2024). Asymmetric Synthesis of α-Chloroamides via Photoenzymatic Hydroalkylation of Olefins. Journal of the American Chemical Society, 146(11), 7191–7197. https://doi.org/10.1021/jacs.4c00927
Lu, H., Diaz, D. J., Czarnecki, N. J., Zhu, C., Kim, W., Shroff, R., Acosta, D. J., Alexander, B. R., Cole, H. O., Zhang, Y., Lynd, N. A., Ellington, A. D., & Alper, H. S. (2022). Machine learning-aided engineering of hydrolases for PET depolymerization. Nature, 604(7907), 662–667. https://doi.org/10.1038/s41586-022-04599-z
Nishikawa, K., & Chen, J. (2024). Highly multiplexed design of an allosteric transcription factor to sense novel ligands [Dataset]. Zenodo. https://doi.org/10.5281/ZENODO.13381000
Taylor, N. D., Garruss, A. S., Moretti, R., Chan, S., Arbing, M. A., Cascio, D., Rogers, J. K., Isaacs, F. J., Kosuri, S., Baker, D., Fields, S., Church, G. M., & Raman, S. (2016). Engineering an allosteric transcription factor to respond to new ligands. Nature Methods, 13(2), 177–183. https://doi.org/10.1038/nmeth.3696