The Codon Adaptation Index (CAI) is an effective measure of synonymous codon usage bias. It can give an approximate indication of the likely success of heterologous gene expression. This online tool calculates CAI from the relative synonymous codon usage of a reference sequence or of an existing expression host organism.
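As an illustration of how a CAI value can be derived from a reference sequence, here is a minimal Python sketch. It assumes the commonly used formulation: each codon gets a relative adaptiveness weight (its count divided by the count of the most used synonymous codon in the reference), and CAI is the geometric mean of those weights over the gene. The function names, the 0.5 pseudo-count for unobserved codons, and the example sequences are assumptions of this sketch, not a description of how this tool is implemented.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into triplets, dropping any trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference):
    """Weight of each codon relative to the most used synonym in the reference."""
    counts = Counter(c for c in codons(reference) if c in CODON_TABLE)
    weights = {}
    # Met and Trp have a single codon and stop codons are excluded.
    for aa in set(AMINO) - set("MW*"):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference: no weights defined
        for c in family:
            # Assumption: unobserved codons get a pseudo-count of 0.5 so that
            # a single unseen codon does not force the CAI to zero.
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the scorable codons in seq."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical toy reference: leucine encoded three times by CTG, once by CTA.
w = relative_adaptiveness("CTGCTGCTGCTA")
print(cai("CTGCTG", w))  # only the preferred codon is used
print(cai("CTGCTA", w))  # one preferred codon and one less used codon
```

In practice the reference would be a set of highly expressed genes from the expression host rather than a toy sequence like the one used here.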
Why should you use this tool?
A positive correlation between the degree of codon bias and the level of gene expression has been demonstrated. This rare codon analysis tool plots the codon usage frequency of your sequence and shows its codon usage distribution. It can help you decide whether your sequence needs to be optimized for heterologous gene expression.
What is GC Content?
GC content is usually expressed as a percentage and is sometimes called the G+C ratio or GC ratio. It is calculated as Count(G + C) / Count(A + T + G + C) * 100.
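The formula above can be sketched in a few lines of Python. The function name gc_content is hypothetical, and the sketch assumes that only A, T, G and C are counted in the denominator, so ambiguity codes such as N are ignored.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of seq as a percentage of its A/T/G/C bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C bases")
    return 100.0 * gc / total


# Four of the six bases are G or C, so the result is roughly 66.7.
print(round(gc_content("ATGCGC"), 1))
```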
Why care about GC content?
A GC pair is held together by three hydrogen bonds, whereas an AT pair is held together by two. As a result:
The GC content affects the stability of DNA.
The GC content affects the secondary structure of mRNA.
The GC content affects the annealing temperature for template DNA in PCR experiments.
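To make the link between GC content and PCR temperatures concrete, here is a small sketch of the Wallace rule, a rough rule of thumb for the melting temperature (Tm) of short oligonucleotides: 2 °C for each A or T plus 4 °C for each G or C. The annealing temperature is then usually chosen somewhat below the primer Tm. The function name is hypothetical, and the rule is only an approximation that ignores salt concentration and sequence context.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees C using the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    # Each GC pair contributes twice as much as each AT pair.
    return 2 * at + 4 * gc


# Two hypothetical 16-mers of equal length but opposite base composition.
print(wallace_tm("ATATATATATATATAT"))  # 32
print(wallace_tm("GCGCGCGCGCGCGCGC"))  # 64
```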
Where is GC content applied?
GC content can be used in:
1. primer design for PCR experiments
2. gene design for protein expression
3. mRNA hairpin prediction
In general, codons can be grouped into 20 disjoint families, one family for each of the standard amino acids, with a 21st family for the translation termination signal. Each family in the universal genetic code contains between 1 and 6 codons. Where present, alternate codons are termed synonymous. Although the choice among synonymous codons would not be expected to alter the primary structure of a protein, it has been known for the past 20 years that alternative synonymous codons are not used randomly. This in itself is not startling, as codon usage might be expected to be influenced, at the very least, by mutational biases.
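The grouping into 21 families can be checked directly against the standard genetic code. The sketch below builds the 64-codon table and groups codons by the amino acid (or stop signal, written as `*`) they encode. The variable names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (or '*' for stop) to its list of synonymous codons.
families = defaultdict(list)
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    families[aa].append("".join(codon))

sizes = {aa: len(members) for aa, members in families.items()}

print(len(families))                             # 21 families
print(min(sizes.values()), max(sizes.values()))  # 1 6
print(families["L"])                             # the six leucine codons
```

Methionine and tryptophan are the single-codon families, while leucine, serine and arginine each have six synonymous codons.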
Analysis of genes from the RNA bacteriophage MS2 identified differences between the codon usage of phage genes and that of genes from its host, E. coli. Codon bias in MS2 might result from selection for the rate of chain elongation during protein translation. It was suggested that the most frequent synonyms of MS2 were those translated by the major tRNAs of its host. The observation of codon usage bias implied that not all synonymous mutations were neutral. The codon usage of the bacteriophage ΦX174 (5,386 bp), the first genome to be sequenced entirely, was found to be non-random, with a bias towards codons whose third position was thymidine (T) and away from codons starting with adenosine (A) or guanosine (G).
The number of species where the abundance and structures of tRNAs are known is limited relative to the number of organisms from which sequence data has been obtained. Indeed, what knowledge there is of tRNA abundance is potentially biased, because measurements are made under laboratory growth conditions. It is therefore desirable to define an optimal codon in terms of a more readily estimated characteristic. The most commonly used characteristic is the pattern of codon usage itself. The definition used in this thesis is: “an optimal codon is any codon whose frequency of usage is significantly higher in putatively highly expressed genes”. Significance is estimated using a two-way chi-squared contingency test, with a cut-off at p<0.01. Under this definition, the most frequent codon for an amino acid is not necessarily an optimal codon. This differs subtly from the original definition used by Ikemura, who defined optimal codons as those codons occurring most often in biased genes.
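One plausible reading of this definition is a 2x2 contingency table for each codon: rows are the highly and lowly expressed gene sets, and columns are counts of the codon in question versus counts of its synonymous alternatives. The sketch below applies a chi-squared test with one degree of freedom and the p<0.01 cut-off. The table layout, the function name, the absence of a continuity correction, and the example counts are all assumptions of this sketch; the source does not specify them.

```python
import math


def is_optimal_codon(high_codon, high_other, low_codon, low_other, alpha=0.01):
    """Return True if the codon is used significantly more often in the
    highly expressed set than in the lowly expressed set (2x2 chi-squared)."""
    table = [[high_codon, high_other], [low_codon, low_other]]
    rows = [sum(r) for r in table]
    cols = [high_codon + low_codon, high_other + low_other]
    n = sum(rows)
    if 0 in rows or 0 in cols:
        return False  # test is undefined when a margin is empty

    # Pearson chi-squared statistic, no continuity correction (assumption).
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For one degree of freedom, the upper tail probability is erfc(sqrt(chi2/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # The definition requires higher usage in the highly expressed genes.
    more_frequent = high_codon / rows[0] > low_codon / rows[1]
    return more_frequent and p_value < alpha


# Hypothetical counts: codon used 80/100 times in high genes, 40/100 in low genes.
print(is_optimal_codon(80, 20, 40, 60))  # True
# Nearly identical usage in both sets is not significant.
print(is_optimal_codon(50, 50, 48, 52))  # False
```

Note that a codon can be the most frequent synonym in both gene sets and still fail this test, which is the distinction from Ikemura's definition drawn above.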
In all known organisms, from bacteria to man, the same triplets of DNA bases code for the same amino acids. However, this does not mean that all species encode their genomes in exactly the same way. The code is redundant: a number of triplets code for the same amino acid. While all species are able to translate any sequence of DNA interchangeably, E. coli prefers to use certain triplets to code for certain amino acids, which may be different from the ones we use. This ‘preference’ is reflected in the levels of the tRNAs that match such triplets. In this project we resynthesised a number of genes de novo and were thus able to codon-optimise them for expression in E. coli.
E. coli remains a popular choice for the expression of heterologous proteins. The presence of rare codons per se does not imply weak expression. Despite the poor overlap between the codon usage of Halobacterium halobium (70% G+C) and E. coli (50% G+C), genes from H. halobium can be highly expressed in E. coli. In E. coli, mutation of the ribosomal binding site of atpH can increase its level of expression 20-fold. An oligonucleotide of rare codons within the coding sequence of B. subtilis sspB (small acid-soluble spore protein) did not have a discernible effect on yield. The addition of rare AGG codons near the terminus actually enhanced expression of chloramphenicol acetyltransferase in E. coli.
However, the expression of heterologous genes can be adversely affected by unusual codon usage or context. The presence of rare codons in a recombinant gene can be compensated for either by adding the appropriate tRNA or by synthesising the gene to remove the rare codons. The expression in E. coli of the human granulocyte macrophage stimulating factor was enhanced after argU was induced (even though the recombinant protein had only a single AGG codon). The human rap74 gene (RNA polymerase associating protein) was expressed more efficiently in E. coli after its codon usage was adjusted; previously, there had been a large number of amino-terminal fragments due to frameshifts. Similarly, altering the codon usage of avidin, tropoelastin (Martin et al. 1995) and isovaleryl-CoA dehydrogenase enhanced their expression in E. coli.