The secret language of a small alphabet

It is fair to say that life has a very modest dictionary: four letters (A, T, C and G) are all it takes to create life. Considering that most of the languages we speak are written with far more letters than that, this is quite incredible (a bit of trivia for you: the smallest alphabet currently in use is that of Rotokas, a language spoken in Papua New Guinea).

So how is life created from just these four letters? A, T, C and G are the four nucleobases that make up the genetic material of the cell, i.e. the DNA. This genetic material is “read” by an enzyme called RNA polymerase, which converts it into a closely related four-letter language, RNA, in which uracil (U) takes the place of thymine (T). Finally, another large molecular machine, the ribosome, translates the RNA language into the language of proteins, which is written in amino acids.

Generally speaking, there are 20 common amino acids in the cell, so clearly each nucleobase cannot code for a single amino acid, or we would run out of letters pretty soon. In 1961, the brilliant Francis Crick and colleagues (1) demonstrated that each amino acid is coded by a combination of three bases, known as a triplet codon. This makes more sense than the one-base-one-amino-acid idea: pairs of bases would give only 4 × 4 = 16 combinations, still too few, whereas triplets give enough variability to encode all 20 amino acids. In fact, there are now more than enough: 4 × 4 × 4 = 64 (any of the four bases can occupy any of the three positions in a codon). So what happens to the remaining codons? It turns out that several different codons can code for the same amino acid, while three codons (UAA, UAG and UGA) do not code for any amino acid and instead signal the end of the protein. This redundancy is called the degeneracy of the genetic code and is illustrated in the table below. In addition to degeneracy, the genetic code has two other properties: it is unambiguous (each triplet codon codes for only one amino acid) and nearly universal (almost all known organisms use the same genetic code, with only minor variations, for example in mitochondria).

Codon Table
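
The counting argument above can be checked directly. Below is a minimal Python sketch that enumerates every possible codon and tallies how many codons map to each amino acid. It assumes the standard genetic code (NCBI translation table 1), written in DNA letters (T rather than U) and packed into a 64-character string in T, C, A, G order, with "*" marking a stop codon. The names `CODON_TABLE` and `degeneracy` are illustrative names chosen for this sketch, not part of any library.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), DNA alphabet.
# Codons are enumerated with each position cycling through T, C, A, G,
# the third position varying fastest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many distinct "words" can be spelled with 1, 2 or 3 letters?
for length in (1, 2, 3):
    print(f"{length}-base words: {4 ** length}")  # 4, 16 and 64

# Degeneracy: number of codons that code for each amino acid (or for stop).
degeneracy = Counter(CODON_TABLE.values())

amino_acids = sorted(aa for aa in degeneracy if aa != "*")
print(len(CODON_TABLE), "codons,", len(amino_acids), "amino acids,",
      degeneracy["*"], "stop codons")
# 64 codons, 20 amino acids, 3 stop codons
print("Leu:", degeneracy["L"], "Ala:", degeneracy["A"], "Met:", degeneracy["M"])
# Leu: 6 Ala: 4 Met: 1
```

Both GCC and GCG, the two alanine codons discussed below, map to "A" in this table, which is exactly the kind of redundancy that makes codon bias possible.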

At first glance, this codon business might seem pretty straightforward, right? Nucleobases make up DNA, which stores information about which proteins should be made, and the sequence of amino acids in each protein is determined by the codons in the DNA. This is the kind of thing you tend to learn on the first day of a cell biology class. However, biological systems are rarely as simple as one would like them to be, and the genetic code is no exception.

In recent years, more and more has been learned about the intricacies of the genetic code, and one feature that has attracted a lot of attention is known as codon bias. Codon bias refers to the fact that synonymous codons (different codons for the same amino acid) are not used equally often, and that the preferred codons differ between organisms. For example, the codon GCC, which codes for the amino acid alanine, is used in human cells roughly four times more often than GCG, which also codes for alanine. However, that does not mean that GCC will be the preferred codon in other organisms. Codon bias usually reflects the relative abundance of the corresponding tRNAs (the molecules that carry amino acids to specific codons) in the cell, meaning that there are more tRNAs for the more commonly used codons. This allows the most efficient translation rates to be achieved, and mRNAs with a suboptimal codon sequence will typically be translated less efficiently.

This phenomenon has significant consequences for many laboratory applications. In the lab we often use genes that are not native to the cells we are working with. A classic example is GFP (Green Fluorescent Protein), which was originally cloned from the jellyfish Aequorea victoria and is now widely used to tag and track different molecules in the cell. Because it came from a jellyfish, when GFP is used in human cells the DNA sequence that codes for it needs to be codon optimized, i.e. rewritten to use the codons that are most common in human cells, to achieve the most efficient translation.
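
The idea behind codon optimization can be sketched in a few lines of Python. The sketch below, under simplifying assumptions, measures relative synonymous codon usage in a set of host reference sequences and then rewrites a foreign gene so that each amino acid is encoded by the host's most frequently used codon. This naive "one amino acid, one best codon" rule is only an illustration; practical codon-optimization tools typically weigh additional factors such as GC content and unwanted sequence motifs. The function names are hypothetical, and the reference sequence is made-up toy data, not real human codon usage.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), DNA alphabet, T/C/A/G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate codon by codon (stop codons appear as '*')."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def relative_usage(reference_seqs):
    """For each codon seen, the fraction of its amino acid's codons it accounts for."""
    codon_counts = Counter(c for s in reference_seqs for c in split_codons(s))
    aa_counts = Counter()
    for codon, n in codon_counts.items():
        aa_counts[CODON_TABLE[codon]] += n
    return {c: n / aa_counts[CODON_TABLE[c]] for c, n in codon_counts.items()}


def codon_optimize(seq, usage):
    """Replace every codon with the host's most-used synonymous codon.

    Amino acids absent from `usage` keep their original codon; on ties,
    the codon encountered first in `usage` wins.
    """
    preferred = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in preferred or freq > usage[preferred[aa]]:
            preferred[aa] = codon
    return "".join(preferred.get(CODON_TABLE[c], c) for c in split_codons(seq))


# Toy "host" reference in which GCC is used 3x as often as GCG for alanine
# (made-up numbers for illustration only).
host_usage = relative_usage(["ATGGCCGCCGCCGCGTAA"])
foreign_gene = "ATGGCGGCGTGGTAA"
optimized = codon_optimize(foreign_gene, host_usage)
print(optimized)                                        # ATGGCCGCCTGGTAA
print(translate(optimized) == translate(foreign_gene))  # True
```

The last line is the key point: the DNA sequence changes, but the encoded protein does not, because only synonymous codons are swapped.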

The idea of codon bias has also been explored in order to create virus-resistant organisms and to expand the capacity of the genetic code. Recently, a team from Harvard and MIT (2) created an E. coli strain in which all 321 UAG codons were changed to UAA. Both of these codons are used by the bacterium as signals to stop protein synthesis. This change led to two important results.
First, UAG is normally recognized by the so-called release factor 1 (RF1), which triggers termination of translation. Now that UAG no longer occurs in the genome, RF1 can be deleted (release factor 2 still terminates translation at UAA and UGA). If UAG is then reintroduced into these cells, it is free to encode a new amino acid. The scientists added UAG-recognizing tRNAs charged with an amino acid that is not normally present in the cell. As a result, they were able to expand the genetic code: the cells successfully produced proteins containing this nonstandard amino acid.
Second, the UAG-lacking cells also became resistant to bacteriophage T7. The resistance emerged because T7 has important genes that, like the genes of normal E. coli, use UAG to stop protein synthesis. Because the re-engineered bacterium no longer had the machinery required to recognize this stop codon, the proteins required for the assembly of viral particles could not be translated successfully.
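
The logic of the recoding described above can be illustrated with a toy Python model. It makes several simplifying assumptions: genes are in-frame DNA strings, "recoding" only swaps a terminal TAG (UAG in mRNA) for TAA, and in the recoded cell TAG is read as a nonstandard amino acid shown as "X", a placeholder symbol chosen for this sketch. The function names and sequences are hypothetical, and real translation (and the actual mechanism of T7 resistance) is far more complex than this lookup.

```python
from itertools import product

# Standard genetic code (NCBI table 1), DNA alphabet, T/C/A/G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Toy recoded cell: TAG is no longer a stop signal and instead carries
# a nonstandard amino acid, written "X" here (hypothetical placeholder).
RECODED = dict(STANDARD, TAG="X")


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq, table):
    """Translate until the first codon the table reads as a stop ('*')."""
    protein = []
    for codon in split_codons(seq):
        aa = table[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def recode_tag_stop(gene):
    """Swap a terminal TAG stop codon for TAA; both mean 'stop' in normal cells."""
    codons = split_codons(gene)
    if codons[-1] == "TAG":
        codons[-1] = "TAA"
    return "".join(codons)


# A host gene: recoding changes its DNA but not its protein.
host_gene = "ATGGCCTGGTAG"
recoded_gene = recode_tag_stop(host_gene)
print(recoded_gene)                      # ATGGCCTGGTAA
print(translate(host_gene, STANDARD))    # MAW
print(translate(recoded_gene, RECODED))  # MAW

# A hypothetical phage gene that still relies on TAG to stop.
phage_gene = "ATGAAATAGGGCTAA"
print(translate(phage_gene, STANDARD))   # MK
print(translate(phage_gene, RECODED))    # MKXG: the intended stop is read through
```

In this model, host genes survive recoding unchanged at the protein level, while any gene that was not recoded, such as a phage gene ending in TAG, no longer stops where it should.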


Although these experiments might seem far removed from everyday life, they have more immediate applications than one might think. Bacteriophages are one of the main threats to a number of biotechnology processes that involve bacterial cells. For example, the production of the yoghurt you eat in the morning and the cheese you have with a glass of wine in the evening both require lactic acid bacteria for fermentation. If phages infect the bacterial cultures used in these processes, they can reduce the quality and quantity of the product. If we could engineer bacterial strains that use alternative codons, the industry could benefit greatly by minimizing phage attacks.

Finally, codon bias has also been explored for vaccine development. Poliovirus has been made with altered codon usage so that it replicates poorly in human cells, and the same strategy has been used to make influenza virus with suboptimal codon usage (3). Such a virus, which can still infect cells but has reduced virulence and pathogenicity, could be used as a vaccine candidate. Vaccination with replicating viruses typically achieves stronger and broader immune protection, so, considering the high pandemic and epidemic potential of influenza, codon de-optimization seems an attractive strategy to pursue.

There are many aspects of codon usage that I have not discussed here. For example, it seems that not only which codons are used matters, but also how codons are paired in the genome: certain codons are found next to each other more (or less) often than one would predict from their individual frequencies. For those interested in learning more about the secrets within the genetic code, I suggest reading references 2 and 3.
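
Codon-pair bias can be made concrete by comparing how often two codons actually sit next to each other with how often they would if codons were placed independently. The Python sketch below assumes in-frame coding sequences and a deliberately simple independence model, so treat it as an illustration rather than the standard metric: published codon-pair scores additionally correct for how often the encoded amino acids themselves occur side by side. The function name and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_pair_log_ratios(genes):
    """ln(observed / expected) for every adjacent in-frame codon pair.

    Expected counts assume codons are placed independently:
    expected(A, B) = f(A) * f(B) * (total number of adjacent pairs),
    where f is each codon's overall frequency in `genes`.
    Pairs never observed are omitted (their log ratio would be -infinity),
    and pairs spanning two different genes are not counted.
    """
    codon_counts = Counter()
    pair_counts = Counter()
    for gene in genes:
        codons = split_codons(gene)
        codon_counts.update(codons)
        pair_counts.update(zip(codons, codons[1:]))

    total_codons = sum(codon_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), observed in pair_counts.items():
        expected = (codon_counts[a] / total_codons) * (codon_counts[b] / total_codons) * total_pairs
        scores[(a, b)] = math.log(observed / expected)
    return scores


scores = codon_pair_log_ratios(["AAAGCCAAAGCCGCC"])  # toy sequence, not real data
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
# ('AAA', 'GCC') 0.73
# ('GCC', 'AAA') 0.04
# ('GCC', 'GCC') -0.36
```

A positive score means the pair appears more often than its codon frequencies alone would predict (over-represented); a negative score means it appears less often (under-represented).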

All in all, it seems that even a language based on a four-letter alphabet can be a difficult one to learn.

1. Crick, Francis, Leslie Barnett, Sydney Brenner, and Richard J. Watts-Tobin. “General nature of the genetic code for proteins.” Nature 192, no. 4809 (1961): 1227-1232.

2. Lajoie, Marc J., Alexis J. Rovner, Daniel B. Goodman, Hans-Rudolf Aerni, Adrian D. Haimovich, Gleb Kuznetsov, Jaron A. Mercer et al. “Genomically recoded organisms expand biological functions.” Science 342, no. 6156 (2013): 357-360.

3. Wimmer, Eckard, Steffen Mueller, Terrence M. Tumpey, and Jeffery K. Taubenberger. “Synthetic viruses: a new opportunity to understand and prevent viral disease.” Nature Biotechnology 27, no. 12 (2009): 1163-1172.
