
Abstract

Engineering enzymes for increased efficiency is key to enabling sustainable, ‘green’ biocatalytic production processes in the chemical and pharmaceutical industries. This challenge can be tackled from two angles: by directed evolution, based on labour-intensive experimental testing of enzyme variant libraries, or by computational methods, in which data-dependent algorithms relating sequence to function are used to predict biocatalyst improvements. Here, we combine both approaches into a two-week, low-cost workflow in which ultra-high-throughput screening of a library of imine reductases (IREDs) in microfluidic devices provides not only selected ‘hits’ but also long-read sequence data linked to fitness scores for >17,000 enzyme variants. We demonstrate the engineering of an IRED for chiral amine synthesis by mapping its local fitness landscape in a single pass, ready for interpretation and extrapolation by protein engineers with the help of machine learning (ML). We calculate position-dependent mutability and combinability scores of mutations and comprehensively illuminate a complex interplay of mutations driven by synergistic, often positively epistatic effects. With easy-to-use regression and tree-based ML algorithms designed for random whole-gene mutagenesis data, the 3-fold improved ‘hits’ initially obtained from experimental screening are extrapolated further, giving another order-of-magnitude improvement (23-fold in kcat) after testing only a handful of designed mutants. Predictions succeed in >80% of cases. The catalytic features discovered in one IRED are shown to be portable, conferring activity on IREDs with ∼50% homology. Our campaigns yield biocatalytically efficient IREDs and serve as a paradigm for future enzyme engineering efforts that rely on large sequence-function maps profiling how a biocatalyst responds to mutation. In the age of predictive biology, these maps will chart the way to improved function by exploiting the synergy of rapid experimental screening with ML evaluation and extrapolation.
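The abstract attributes much of the improvement to synergistic, often positively epistatic combinations of mutations. As a minimal sketch, assuming the common log-multiplicative definition of pairwise epistasis (not necessarily the exact score used in this work), epsilon = ln(f_AB/f_WT) - ln(f_A/f_WT) - ln(f_B/f_WT) measures how far a double mutant deviates from the product of its single-mutant fold-changes. The function names below and the fitness values in the usage example are hypothetical illustrations, not data from the study.

```python
import math


def epistasis(f_wt: float, f_a: float, f_b: float, f_ab: float) -> float:
    """Log-multiplicative epistasis between two mutations A and B.

    Hypothetical helper: fitness values are activities (e.g. kcat) of the
    wild type, the two single mutants and the double mutant. Under a
    no-epistasis (multiplicative) null model, fold-changes multiply, so
    their logarithms add. A positive result means the combination performs
    better than expected (synergy); a negative result means worse.
    """
    for f in (f_wt, f_a, f_b, f_ab):
        if f <= 0:
            raise ValueError("fitness values must be positive")
    return (math.log(f_ab / f_wt)
            - math.log(f_a / f_wt)
            - math.log(f_b / f_wt))


def expected_double(f_wt: float, f_a: float, f_b: float) -> float:
    """Fitness predicted for A+B if the two mutations act independently."""
    return f_wt * (f_a / f_wt) * (f_b / f_wt)


if __name__ == "__main__":
    # Invented numbers: wild type 1.0, A is 2-fold, B is 1.5-fold better,
    # and the double mutant reaches 6.0 instead of the expected 3.0.
    print(expected_double(1.0, 2.0, 1.5))   # 3.0
    print(epistasis(1.0, 2.0, 1.5, 6.0))    # ln(2), about 0.693: positive epistasis
```

Working in log space turns the multiplicative null model into a simple sum, which is also why additive regression models over mutations are a natural baseline for random-mutagenesis fitness data; any curvature left over after such a fit is where epistasis shows up.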
