Massive Protein Redesign to Make Overlapping Genes
Authors:
Sean P. Leonard1* ([email protected]), Guillaume Urtecho2, Jennifer Chlebek1, Christina Kang-Yun1, Jose-Manuel Marti3, Jonathan Allen3, Dante Ricci1, Dan Park1, Mimi Yung1, Harris Wang2, Yongqin Jiao1
Institutions:
1Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory; 2Columbia University–New York; 3GS-CAD, Lawrence Livermore National Laboratory
Abstract
The future bioeconomy requires engineered microbes that behave predictably, robustly, and safely in natural environments. Most engineered microbes, however, fail to function outside of the laboratory and carry uncontrolled risks of genetic pollution into natural gene pools. The BioSecure Science Focus Area (SFA) at Lawrence Livermore National Laboratory is developing new approaches to enhance biocontainment of engineered bacteria.
Introducing overlapping genes into engineered bacteria can enhance stability and productivity while reducing the risk of uncontrolled genetic spread. Overlapping genes share a single coding sequence of DNA and RNA but are translated in alternate reading frames. By designing overlapping genes, researchers align the genes' evolutionary trajectories towards desirable traits. For example, overlapping an engineered gene with an essential gene can prolong the engineered function, and overlapping it with a toxic gene can reduce horizontal gene transfer. However, creating functional overlapping sequences requires extensive protein redesign and remains technically and computationally challenging.
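The idea of one nucleotide sequence encoding two proteins in alternate reading frames can be illustrated with a minimal sketch. The DNA string below is a hypothetical toy sequence, not one from this work, and the function name `translate` is an assumption for illustration. The sketch covers only same-strand frame shifts under the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1, or 2).

    Trailing bases that do not fill a complete codon are ignored.
    Stop codons are rendered as '*'.
    """
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Hypothetical toy sequence: the same bases read in two frames
# give two unrelated peptides.
toy_dna = "ATGGCTAAAGGTTGA"
frame0_peptide = translate(toy_dna, frame=0)
frame1_peptide = translate(toy_dna, frame=1)
print(frame0_peptide, frame1_peptide)
```

Because every base contributes to a codon in both frames, a mutation in the shared sequence can alter both peptides at once, which is why the two genes' evolutionary fates become coupled.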
The team performed a large-scale computational screen across 118 genes, generating over 7 million overlapping sequences in silico. Researchers compared these overlapping sequences and their predicted functional scores between gene pairs to identify genes and gene pairs more amenable to overlap. Genes vary in their malleability, and the resulting scores are influenced by several factors, including gene length, number of orthologs, and amino acid content. Several small genes, such as hicA, infA, and purE, appear amenable to overlap with multiple partner genes. Several larger genes, such as lacZ and aroB, also score well when overlapped with multiple smaller partners. These results suggest designed overlaps are feasible for many genes. However, these predictions are based on unproven computational models of protein function derived from evolutionary sequence data and must be experimentally validated.
To validate the protein models, researchers have begun experimentally testing redesigned variants of individual proteins identified in the screen. Given that most genes in the screen are conditionally essential in E. coli, the team tested protein function by complementing growth in auxotrophic strains that lack these genes. The team has implemented a pipeline to test pooled variants at scale by sequencing and then use the resulting fitness data to interpret model predictions.
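One common way to turn pooled sequencing counts into per-variant fitness estimates is a log2 enrichment score comparing each variant's frequency before and after selection. The sketch below is an assumption about how such a score could be computed, not the team's pipeline. The function name, the pseudocount, and the read counts are all hypothetical.

```python
import math


def log2_enrichment(pre_counts, post_counts, pseudocount=0.5):
    """Return log2(post frequency / pre frequency) for each variant.

    pre_counts and post_counts map variant names to read counts before
    and after growth selection. The pseudocount (an assumed value) keeps
    variants that drop out of the pool from producing log2(0).
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    scores = {}
    for variant, pre in pre_counts.items():
        freq_pre = (pre + pseudocount) / pre_total
        freq_post = (post_counts.get(variant, 0) + pseudocount) / post_total
        scores[variant] = math.log2(freq_post / freq_pre)
    return scores


# Hypothetical read counts for a three-member pool.
pre = {"wild_type": 100, "variant_1": 100, "variant_2": 100}
post = {"wild_type": 200, "variant_1": 200, "variant_2": 0}
scores = log2_enrichment(pre, post)
for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
```

Under this scoring, a variant that complements growth as well as wild type ends up with a score close to the wild-type score, while a nonfunctional variant is strongly depleted.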
In small-scale trials, researchers have identified variants that have undergone significant redesign yet retain function. For example, purE variants exhibited wild-type function despite over 40% of residues being altered, while hisI variants maintained wild-type function with ~50% of residues changed. In a recent trial, all tested hicA sequences (6/6) were functional, with each variant featuring 30% to 50% changed residues. Overall, researchers have experimentally validated that the protein models for multiple genes can create sequence-diverse yet functional variants, with successful validation observed in 10 out of the 14 genes tested.
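The "percent of residues changed" figures above can be read as the fraction of aligned positions at which a variant differs from the wild-type protein. A minimal sketch follows, assuming the two sequences have equal length with no insertions or deletions. The peptide strings are hypothetical and the function name is an assumption for illustration.

```python
def fraction_changed(wild_type, variant):
    """Fraction of positions where `variant` differs from `wild_type`.

    Assumes both protein sequences are already aligned and of equal
    length (no indels); raises ValueError otherwise.
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have equal length")
    if not wild_type:
        raise ValueError("sequences must be non-empty")
    differences = sum(a != b for a, b in zip(wild_type, variant))
    return differences / len(wild_type)


# Hypothetical 10-residue peptides differing at 3 positions.
changed = fraction_changed("MKTAYIAKQR", "MKSAYLAKQK")
print(f"{changed:.0%} of residues changed")
```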
In the next phase, researchers will expand the experimental throughput by testing thousands of variants for select genes while also testing the function of overlapping sequences for both genes. These results demonstrate the feasibility of computational redesign of entire proteins in support of designed gene overlap. This ability to create novel overlapping genes will foster the next generation of dependable and secure engineered microbes.
Funding Information
This work is supported by the DOE, Office of Science, BER program, Lawrence Livermore National Laboratory (LLNL) BioSecure Science Focus Area within the Secure Biosystems Design program. Work at LLNL is performed under the auspices of the DOE at LLNL under contract no. DE-AC52-07NA27344.