Columbia Technology Ventures

Computational methods for assessing and optimizing the protein expression potential of gene sequences in vivo

This technology is a quantitative method to predict the effect of mRNA secondary structure folding energy on protein expression. Achieving high-yield protein expression is an important aspect of biotechnology, structural biology, and biochemistry. Many proteins, however, have low expression levels or none at all. Protein expression levels are determined by a number of biochemical factors including the mRNA translation rate. The formation of stable secondary structures in mRNA can impede translation and reduce protein expression. By applying this technology, mRNA sequences can be designed with favorable secondary structure elements, thus effectively increasing the yield of protein expression in vivo. As such, the technology provides a platform to enhance protein yield without compromising protein function, and can potentially be applied to reduce costs in industrial and commercial applications such as food production, drug discovery, drug production and synthetic biology.

Identification and removal of undesirable secondary RNA structures for high-yield protein expression

Researchers used mathematical analysis of a large-scale protein-expression dataset to elucidate the nucleotide sequence features associated with protein expression in E. coli, a widely used bacterial host organism. These results were used to develop new computational algorithms to assess and optimize the protein expression potential of gene sequences. Unlike most approaches, which optimize protein expression by altering factors extrinsic to the target protein, this technology focuses on the intrinsic properties of the target protein. The likelihood of expressing high quantities of soluble protein can be increased using computational predictors developed to score RNA folding, solubility, and expression. These predictors provide a method for modifying amino acid content through codon substitutions to increase the solubility and expression of heterologous genes. This technology therefore lets the user not only identify problematic mRNA structures but also remove them to improve protein expression. Both capabilities can potentially reduce the cost of protein production by making the process more efficient and requiring less starting material when scaled for industrial or commercial applications.
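
The idea of identifying and removing unfavorable mRNA structure through codon substitutions can be illustrated with a minimal Python sketch. It is not the predictor described here: the scoring function below is a deliberately crude stand-in that counts Watson-Crick-complementary windows able to form a short perfect stem, whereas a real workflow would use a thermodynamic folding-energy calculation and the statistical predictors derived from the expression dataset. The names `stem_score`, `reduce_structure`, and `translate`, the window sizes, and the partial codon table are all assumptions made for illustration.

```python
# Partial synonymous-codon table (standard genetic code); illustrative only.
SYNONYMS = {
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
    "K": ["AAA", "AAG"],
    "L": ["CUU", "CUC", "CUA", "CUG", "UUA", "UUG"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "M": ["AUG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick pairs only


def revcomp(rna):
    """Reverse complement of an RNA string."""
    return "".join(PAIR[base] for base in reversed(rna))


def translate(rna):
    """Translate an in-frame RNA string using the partial table above."""
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))


def stem_score(rna, stem=4, min_loop=3):
    """Crude structure proxy (hypothetical): count pairs of windows that could
    form a perfect stem of length `stem` separated by at least `min_loop` nt."""
    score = 0
    last_start = len(rna) - stem
    for i in range(last_start + 1):
        target = revcomp(rna[i:i + stem])
        for j in range(i + stem + min_loop, last_start + 1):
            if rna[j:j + stem] == target:
                score += 1
    return score


def reduce_structure(rna):
    """Greedy synonymous substitution: accept any single-codon swap that
    strictly lowers the proxy score, and repeat until no swap helps."""
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    best = stem_score("".join(codons))
    improved = True
    while improved:
        improved = False
        for idx in range(len(codons)):
            for alt in SYNONYMS[CODON_TO_AA[codons[idx]]]:
                if alt == codons[idx]:
                    continue
                trial = codons[:idx] + [alt] + codons[idx + 1:]
                score = stem_score("".join(trial))
                if score < best:
                    codons, best, improved = trial, score, True
    return "".join(codons), best


original = "AUGGCCGGCAAAGCCGGC"  # made-up sequence encoding M-A-G-K-A-G
optimized, new_score = reduce_structure(original)
print(stem_score(original), new_score, optimized)
```

Because only synonymous codons are swapped, the encoded protein is unchanged while the proxy score drops. Replacing `stem_score` with a folding-energy estimate over the region of interest would bring the sketch closer to the approach described above.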

This technology has been validated using modified mRNA sequences in the E. coli bacterial expression system, including statistical analysis of translation efficiency with a large-scale dataset.

Lead Inventor:

John F. Hunt, Ph.D.

Applications:

  • Increasing yield in industrial protein production processes
  • Rational computational design of genes to optimize protein expression in any organism
  • Rational computational design of genes to optimize protein expression under specific growth conditions
  • Identification of gene sequences likely to be translated inefficiently
  • Diagnosing condition-specific and organism-specific problems in translation that limit process efficiency
  • Identification of codon-specific and amino-acid-specific translation stress under process conditions
  • Improving performance of synthetic biology systems based on rational optimization of gene sequences

Advantages:

  • Reduces costs by increasing protein expression yields
  • Quantitative method for assessing and predicting protein expression
  • Enhanced design of mRNA sequences with designated removal of unfavorable regions increases protein yield
  • Rational computational design of genes to optimize protein expression

Patent information:

Tech Ventures Reference: IR CU12083, IR M10-014
