Why is AUG the initiation codon?


Is there any reason why AUG is the initiation codon? Can't translation start with different codons?

A good question (if a little mixed up on transcription vs. translation!)

AUG is not always the start codon, but whatever the codon is it will always code for Methionine (or fMet, but still a variation on Met), even if the codon codes for a different amino acid otherwise. A separate transfer RNA (tRNAi, the initiator tRNA) is used for the arrangement of this first step, guided by eIF2 [in eukarya].

In this respect it's less a question of "why AUG" than "why this specific initiator tRNA", the answer being that it has certain sequence elements and modifications that distinguish it from the elongator tRNAs, which bind elongation factors and hence are delivered to the ribosomal A site rather than the P site. Function depends on form: the initiator tRNA is shaped to set up translation rather than to extend an existing nascent polypeptide chain.

"Identity elements appear to finely tune the structure of the initiator tRNA, and growing evidence suggests that the body of the tRNA is involved in transmitting the signal that the start codon has been found to the rest of the pre-initiation complex." - Kolitz and Lorsch, 2010

The other start codons are just from natural chemical variation (or evolution, whichever way you want to look at it) giving rise to different codon-recognising protein shapes.

The machinery for starting translation works, and as such is "conserved": evolution has kept it, and that's why it is always the same codon (more or less). Archaea have a very similarly shaped tRNAi acceptor stem, which is an equally good ligand; this shows how ancient the system is and gives an idea of how fundamentally difficult it must be for an organism to change a system like this through mutation (3.5 billion years of evolution can't be wrong! etc. haha).

Not to just gloss over the part where I said that non-AUG codons get used too (in yeast and mammalian cells), the following is from a study in which it was found changing 1 (and only 1) of the AUG bases still permitted translation initiation:

"Naturally occurring misrecognition indicates that discriminating two base-paired near-cognate codons from the perfect three-base-paired AUG codon is subject to mistakes. Mutations in translation initiation factors, such as eIF1 and eIF2b, further increase the levels of these mistakes.

Two base-pairing interactions between non-AUG codons and the anticodon of the Met-tRNAi are sufficient to trigger translation initiation, suggesting that wild-type eIF1 plays a role in monitoring proper base-pairing interactions when scanning for the AUG start site. It would be predicted that the Met-tRNAi, not a cognate tRNA matching an individual non-AUG codon, is used in translation initiation at these non-AUG start codons.

The translation initiation complex will bind only the Met-tRNAi as opposed to other tRNAs because Met-tRNAi has unique sequence and structural features that allow it to be loaded onto eIF2 of the ternary complex and enable it to fit into the P site of the ribosome [...]"
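To make "changing 1 (and only 1) of the AUG bases" concrete, here is a minimal Python sketch that enumerates the nine near-cognate codons. The function names are illustrative and do not come from the quoted study, and the sketch says nothing about how efficiently each codon actually initiates.

```python
BASES = "ACGU"


def near_cognate_codons(start="AUG"):
    """Return every codon that differs from `start` at exactly one position."""
    variants = []
    for pos in range(3):
        for base in BASES:
            if base != start[pos]:
                variants.append(start[:pos] + base + start[pos + 1:])
    return variants


def mismatches(codon, reference="AUG"):
    """Count the positions at which `codon` differs from `reference`."""
    return sum(a != b for a, b in zip(codon, reference))


if __name__ == "__main__":
    for codon in near_cognate_codons():
        print(codon, mismatches(codon))
```

Each of the three positions can take three alternative bases, which is where the nine single-substitution variants (CUG, GUG, UUG, and so on) come from.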

From recent empirical research (Wang et al., 2018) on eukaryotes, basically the non-AUG start codons have context-dependent [translation initiation] efficiency, while AUG is a "sure thing", i.e. the nucleotides surrounding it have little impact on its efficiency.

There are some theoretical biochem explanations for this, which I'll just quote as-is:

We demonstrated that non-AUG codons are more dependent on their surrounding nucleotide sequence context than AUG codons. Base pairing between an AUG start codon and anticodon of the initiator tRNA along with interactions between the scanning ribosome and the nucleotides surrounding the start codon (e.g., the interaction between Arg55 of eukaryotic initiation factor 2a and the -3 position [… ]) cause the preinitiation complex to shift from an open conformation to a closed conformation so that translation initiation can occur. Most preinitiation complexes undergo translation initiation when they encounter an AUG start codon whether it is an efficient or inefficient context because the strong interaction between the codon and anticodon provide enough energy to drive the conformational shift. However, mismatches between a non-AUG start codon and the anticodon reduce the binding energy from the codon and anticodon. Therefore, the contributions from interactions between the preinitiation complex and the 'context nucleotides' likely become more significant and necessary. We also showed that sequence context, specifically the +4 position, affects the efficiency of each non-AUG start codon differently. The observed differential effect of the +4 position demonstrates that sequence properties can have codon-specific effects on TIS efficiency. It is possible that there are other properties with codon-specific effects. Furthermore, differences in these properties between reporters may explain why some previous studies have identified GUG or ACG as the most efficient non-AUG start codon [… ] while others are in agreement with our findings [… ].

Also pointed out in the paper, in some corner cases (specific sequences) a sequence containing a non-AUG codon can be "as efficient as" another sequence that contains an AUG. (From the box plot it looks to me like some non-AUG sequences have better efficiency than some AUG sequences, but the paper phrases it as "as good as".)

Anyway, this inversion of efficiency only happens in some specific sequences. On average over all sequences, AUG has the best efficiency.

If we go [farther back in evolutionary time] to prokaryotes, a similar study of all potential start codons was done on E. coli in 2017. The picture is a little different: AUG is still ahead in terms of efficiency, but together with GUG and UUG it forms a cluster of its own, well ahead of the rest.

The standard explanation for this that I found in a review is that AUG, GUG and UUG are all decoded by fMet-tRNAfMet. (Also given in an answer here, based on an older review.) Actually the older review does offer a tiny bit more insight:

AUG is the most common initiator codon because it forms the most stable interaction with the CAU anticodon in fMet-tRNA
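As a rough illustration of that statement, the sketch below counts Watson-Crick pairs between a codon and the CAU anticodon, read antiparallel. It is a deliberate simplification: it ignores wobble pairing, base modifications and everything the ribosome contributes, and the names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def watson_crick_pairs(codon, anticodon="CAU"):
    """Count Watson-Crick pairs between a codon (5'->3') and an anticodon (5'->3').

    The two strands pair antiparallel, so codon position 1 faces
    anticodon position 3, and so on.
    """
    return sum(
        (codon_base, anti_base) in WATSON_CRICK
        for codon_base, anti_base in zip(codon, reversed(anticodon))
    )


if __name__ == "__main__":
    for codon in ("AUG", "GUG", "UUG", "CUG"):
        print(codon, watson_crick_pairs(codon))
```

Under this simple count only AUG pairs at all three positions, while GUG, UUG and CUG each keep two of the three pairs.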

Of course, one could equivalently ask how this tRNA came to co-evolve with the start codons (that it decodes). But I haven't found an answer to that. I'm probably going to ask that as a separate question.

First of all, it is the coding sequence, the open reading frame (ORF), and not the gene that starts with an AUG. Also, there are actually quite a few ORFs that start with different initiation codons; they are just the exceptions rather than the norm.

As for the need, you can think of START and STOP codons as punctuation. The AUG is read like the first (capital) letter of a sentence (if you will allow me to stretch the definition of punctuation a little). If you read up on the process of translation, you will see that the ribosome attaches to the mRNA molecule, which includes UTRs, and uses the AUG codon as an indication that it should now start translating.

UTRs are the untranslated regions and often contain regulatory sequences that can control translation. However, these should not be in the final protein product so the cellular machinery needs a way of knowing where the UTR ends and the coding sequence begins.
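The "punctuation" idea can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: it takes the first AUG as the start, reads in-frame triplets until a stop codon, and ignores sequence context and alternative start codons. The function names and the example sequence are made up for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orf(mrna, start_codon="AUG"):
    """Return (start, end) indices of the first ORF, or None if there is none.

    `end` is exclusive and includes the stop codon.
    """
    start = mrna.find(start_codon)
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


def split_regions(mrna):
    """Split an mRNA into 5' UTR, coding sequence and 3' UTR."""
    orf = find_orf(mrna)
    if orf is None:
        return None
    start, end = orf
    return {"5'UTR": mrna[:start], "CDS": mrna[start:end], "3'UTR": mrna[end:]}


if __name__ == "__main__":
    example = "GGCACC" + "AUGGCUUAA" + "GCGAAA"  # invented sequence
    print(split_regions(example))
```

Everything before the start codon and after the stop codon is left out of the coding sequence, which is how the UTRs stay out of the protein product.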

Interpretation of the Question

There are two questions here. The one about alternative start codons is factual and has been well answered by Louis Maddox. The other is an evolutionary question which I would recast as

“Why was methionine selected as the initiating amino acid?”

Like Louis Maddox I would have said that this was almost impossible to answer. However, it turns out that there is at least one hypothesis - perhaps related to the comments on Maddox's answer - so I feel it useful to present and discuss it.

The Regulatory Hypothesis for methionine as initiator

This hypothesis was presented by Bhattacharyya and Varshney in a paper in RNA Biology (2016). (A personal or institutional subscription may be required for access to the full version.) Their argument can be summarized as:

  • Methionine does not perform a role in proteins that other aliphatic amino acid side-chains could not, and is one of the rarest of amino acids in proteins.
  • The synthesis of methionine has the highest metabolic cost among amino acids.
  • The synthesis of methionine (and N-formyl methionine) depends on one-carbon metabolism.
  • Hence its adoption as an initiator of translation could have been to ensure that protein synthesis only occurred when there was sufficient energy in the cell for one-carbon metabolism, and, by implication, for protein synthesis itself.
  • In addition, the requirement for S-adenosyl methionine for the methylation of rRNA and certain tRNAs would represent a specific coupling to other components of protein synthesis.

[Methionine synthesis - from Berg et al., Section 24.2.7]


A difficulty I personally have with this hypothesis is that I would expect a mechanism to evolve first, with regulation only appearing later. One possible way round this would be if methionine had displaced a similarly hydrophobic amino acid such as norleucine from the early genetic code, as suggested by Alvarez-Carreño et al. Before this, one would assume that initiation did not take place at a specific codon, as discussed in relation to another SE Biology question.

[Formylation of Methionine - from Berg et al., Figure 29.21]

Why Is AUG the Start Codon?

This article is commented on in the Idea to Watch article by Paweł Mackiewicz.


The rational design of theoretical minimal RNA rings predetermines AUG as the universal start codon. This design maximizes coded amino acid diversity over minimal sequence length, defining in silico theoretical minimal RNA rings, candidate ancestral genes. RNA rings code for 21 amino acids and a stop codon after three consecutive translation rounds, and form a degradation-delaying stem-loop hairpin. Twenty-five RNA rings match these constraints, ten start with the universal initiation codon AUG. No first codon bias exists among remaining RNA rings. RNA ring design predetermines AUG as initiation codon. This is the only explanation yet for AUG as start codon. RNA ring design determines additional RNA ring gene- and tRNA-like properties described previously, because it presumably mimics constraints on life's primordial RNAs.

An AUG initiation codon, not codon–anticodon complementarity, is required for the translation of unleadered mRNA in Escherichia coli

Department of Microbiology, Miami University, Oxford, OH 45056, USA.



We determined the in vivo translational efficiency of ‘unleadered’ lacZ compared with a conventionally leadered lacZ with and without a Shine–Dalgarno (SD) sequence in Escherichia coli and found that changing the SD sequence of leadered lacZ from the consensus 5′-AGGA-3′ to 5′-UUUU-3′ results in a 15-fold reduction in translational efficiency; however, removing the leader altogether results in only a twofold reduction. An increase in translation coincident with the removal of the leader lacking a SD sequence suggests the existence of stronger or novel translational signals within the coding sequence in the absence of the leader. We examined, therefore, a change in the translational signals provided by altering the AUG initiation codon to other naturally occurring initiation codons (GUG, UUG, CUG) in the presence and absence of a leader and find that mRNAs lacking leader sequences are dependent upon an AUG initiation codon, whereas leadered mRNAs are not. This suggests that mRNAs lacking leader sequences are either more dependent on perfect codon–anticodon complementarity or require an AUG initiation codon in a sequence-specific manner to form productive initiation complexes. A mutant initiator tRNA with compensating anticodon mutations restored expression of leadered, but not unleadered, mRNAs with UAG start codons, indicating that codon–anticodon complementarity was insufficient for the translation of mRNA lacking leader sequences. These data suggest that a cognate AUG initiation codon specifically serves as a stronger and different translational signal in the absence of an untranslated leader.

Therapeutic Areas II: Cancer, Infectious Diseases, Inflammation & Immunology and Dermatology Peptide deformylase (PDF) as a therapeutic target

Prokaryotic translation begins with N-formylmethionine, and the resulting proteins undergo N-terminal modification to become functionally mature. This involves stepwise removal of the N-formyl group catalyzed by PDF, and then the methionine residue through the action of methionine aminopeptidase. 427 Studies have shown that the proteins which fail to undergo this modification are inactive, and in recent years, there has been a surge in scientific efforts to develop new antibacterial agents by targeting PDF. 428–430 A nuclear encoded PDF with a signal sequence, indicating its localization in the apicoplast, was recently identified from P. falciparum, 431 and its crystal structure is available at 2.2 Å resolution. 432,433 Its sensitivity toward a number of standard PDF inhibitors has highlighted its importance as a drug target. 431,434

Although the PDF from human mitochondria was originally believed to be nonfunctional, recent studies have shown that this enzyme is active and sensitive to a number of actinonin-based PDF inhibitors. These compounds exhibited antiproliferative activities on tumor cells by affecting the mitochondrial membrane potential. 435 Another factor which may come as an obstacle in drug design based on PfPDF is the possibility of resistance development as has been demonstrated recently in studies using Escherichia coli. 436

Properties of Genetic Codons

Genetic code possesses the following salient features:


The genetic code is nearly universal, i.e. almost all living organisms use the same 64 codons to specify the same 20 amino acids, with only minor variations (for example, in mitochondria). This shared code is taken as evidence that all organisms, from prokaryotes to eukaryotes, have a common evolutionary origin.


The genetic code is non-ambiguous: each codon specifies only one amino acid.


The genetic code is redundant: the 20 amino acids are encoded by 61 of the 64 possible codons (the other three are stop codons), which means that more than one codon can encode the same amino acid. For instance, tyrosine is encoded by both UAU and UAC. A single codon, however, cannot encode different amino acids. This property is also called the degeneracy of the genetic code.
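The degeneracy can be checked directly by building the standard codon table and counting how many codons map to each amino acid. The sketch below assumes the standard genetic code, written in the usual U, C, A, G ordering with one-letter amino acid symbols and `*` for stop; the variable and function names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (UUU, UUC, UUA, UUG, UCU, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_per_amino_acid():
    """Return a Counter mapping each amino acid (or '*') to its number of codons."""
    return Counter(CODON_TABLE.values())


if __name__ == "__main__":
    for aa, n in sorted(codons_per_amino_acid().items()):
        print(aa, n)
```

In this table tyrosine (Y) has two codons, UAU and UAC, while methionine (M) has only AUG.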


The genetic code is non-overlapping. Within a reading frame, each nucleotide belongs to only one codon, so successive codons are read one after another without sharing bases, and each codon specifies a single amino acid.


A codon is a combination of three nucleotides, which is why the genetic code is called a triplet code. The triplet codons are formed by the various combinations of the four nucleotide bases.

Continuous and comma-less

The reading frame of the genetic code is continuous, i.e. there is no comma or any symbols in between them. In simple words, a genetic code has no punctuations in between.

Reading frame

The codons of a reading frame are read from the 5′ end to the 3′ end. A coding sequence starts with the initiation codon and ends with a termination codon. Of the 64 codons, 61 code for the 20 amino acids. The reading frame usually starts with AUG; the next three letters form the second codon, and so on. In eukaryotes, the coding sequence of a gene is interrupted by introns, which are removed from the pre-mRNA by splicing before translation.


Changes such as deletions or insertions of nucleotides in the reading frame of the mRNA cause gene mutations. Point mutations and chromosomal aberrations are the most common results of alterations in the genetic sequence. Mutated nucleotide sequences or codons may affect the phenotypic features of the organism.
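A small Python sketch shows how the reading frame is set by the first codon and how deleting a single nucleotide changes every codon downstream. The sequence is invented for illustration and the helper name is hypothetical.

```python
def codons(seq, frame=0):
    """Split `seq` into consecutive triplets starting at offset `frame`.

    Trailing bases that do not fill a complete codon are dropped.
    """
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


if __name__ == "__main__":
    original = "AUGGCUGAAUAA"               # invented sequence
    deleted = original[:3] + original[4:]   # remove the fourth nucleotide
    print(codons(original))
    print(codons(deleted))
```

After the deletion the first codon is still AUG, but every codon after it is read from shifted positions, so the stop codon of the original frame is no longer read as a codon.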

Codon usage bias

Codon usage bias is the frequency with which synonymous codons occur in the coding DNA or RNA, which varies from one species to another. In simple words, some codons for a given amino acid are used more frequently, while others occur rarely. Therefore, the preferred codons may differ from organism to organism.
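Codon usage can be tallied by counting in-frame triplets in a coding sequence. The sketch below is a minimal illustration on an invented sequence; a real analysis would pool many genes and compare synonymous codons per amino acid.

```python
from collections import Counter


def codon_usage(cds):
    """Return the relative frequency of each in-frame codon in a coding sequence."""
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    example = "AUGCUGCUGCUAUAA"  # invented: AUG CUG CUG CUA UAA
    for codon, freq in sorted(codon_usage(example).items()):
        print(codon, round(freq, 2))
```

In this toy sequence the leucine codon CUG appears twice and the synonymous codon CUA once, which is the kind of imbalance that codon usage bias describes.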


We can conclude that the genetic code consists of triplet combinations of the four nucleotides, giving 64 possible codons, of which 61 encode a specific amino acid and three signal termination. In other words, we can say that the genetic code is a blueprint of the genetic information, which holds all the combinations of nucleotide bases.

The Organization of mRNAs and the Initiation of Translation

Although the mechanisms of protein synthesis in prokaryotic and eukaryotic cells are similar, there are also differences, particularly in the signals that determine the positions at which synthesis of a polypeptide chain is initiated on an mRNA template (Figure 7.6). Translation does not simply begin at the 5´ end of the mRNA; it starts at specific initiation sites. The 5´ terminal portions of both prokaryotic and eukaryotic mRNAs are therefore noncoding sequences, referred to as 5´ untranslated regions. Eukaryotic mRNAs usually encode only a single polypeptide chain, but many prokaryotic mRNAs encode multiple polypeptides that are synthesized independently from distinct initiation sites. For example, the E. coli lac operon consists of three genes that are translated from the same mRNA (see Figure 6.8). Messenger RNAs that encode multiple polypeptides are called polycistronic, whereas monocistronic mRNAs encode a single polypeptide chain. Finally, both prokaryotic and eukaryotic mRNAs end in noncoding 3´ untranslated regions.

Figure 7.6

Prokaryotic and eukaryotic mRNAs. Both prokaryotic and eukaryotic mRNAs contain untranslated regions (UTRs) at their 5´ and 3´ ends. Eukaryotic mRNAs also contain 5´ 7-methylguanosine (m7G) caps and 3´ poly-A tails. Prokaryotic [...]

In both prokaryotic and eukaryotic cells, translation always initiates with the amino acid methionine, usually encoded by AUG. Alternative initiation codons, such as GUG, are used occasionally in bacteria, but when they occur at the beginning of a polypeptide chain, these codons direct the incorporation of methionine rather than of the amino acid they normally encode (GUG normally encodes valine). In most bacteria, protein synthesis is initiated with a modified methionine residue (N-formylmethionine), whereas unmodified methionines initiate protein synthesis in eukaryotes (except in mitochondria and chloroplasts, whose ribosomes resemble those of bacteria).

The signals that identify initiation codons are different in prokaryotic and eukaryotic cells, consistent with the distinct functions of polycistronic and monocistronic mRNAs (Figure 7.7). Initiation codons in bacterial mRNAs are preceded by a specific sequence (called a Shine-Dalgarno sequence, after its discoverers) that aligns the mRNA on the ribosome for translation by base-pairing with a complementary sequence near the 3´ terminus of 16S rRNA. This base-pairing interaction enables bacterial ribosomes to initiate translation not only at the 5´ end of an mRNA but also at the internal initiation sites of polycistronic messages. In contrast, ribosomes recognize most eukaryotic mRNAs by binding to the 7-methylguanosine cap at their 5´ terminus (see Figure 6.39). The ribosomes then scan downstream of the 5´ cap until they encounter an AUG initiation codon. Sequences that surround AUGs affect the efficiency of initiation, so in many cases the first AUG in the mRNA is bypassed and translation initiates at an AUG farther downstream. However, eukaryotic mRNAs have no sequence equivalent to the Shine-Dalgarno sequence of prokaryotic mRNAs. Translation of eukaryotic mRNAs is instead initiated at a site determined by scanning from the 5´ terminus, consistent with their functions as monocistronic messages that encode only single polypeptides.
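The contrast between the two strategies can be sketched as two toy search functions. Everything here is an illustrative assumption rather than a model of the real machinery: the AGGA motif stands in for a Shine-Dalgarno sequence, the allowed spacing of 5 to 9 nucleotides between motif and start codon is an assumed window, and the eukaryotic function simply takes the first AUG without considering sequence context.

```python
def bacterial_starts(mrna, sd="AGGA", min_gap=5, max_gap=9,
                     starts=("AUG", "GUG", "UUG")):
    """Return indices of start codons preceded by an SD-like motif.

    The gap is the number of nucleotides between the end of the motif
    and the first base of the start codon (assumed window, see text).
    """
    hits = []
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] not in starts:
            continue
        for gap in range(min_gap, max_gap + 1):
            sd_start = i - gap - len(sd)
            if sd_start >= 0 and mrna[sd_start:sd_start + len(sd)] == sd:
                hits.append(i)
                break
    return hits


def scanning_start(mrna):
    """Return the index of the first AUG from the 5' end, or None."""
    idx = mrna.find("AUG")
    return idx if idx != -1 else None


if __name__ == "__main__":
    # Invented two-cistron message: each cistron has its own SD-like motif.
    message = "UUAGGAUUUUUUAUGAAAUAA" + "CCAGGACCCCCCGUGCCCUAA"
    print(bacterial_starts(message))
    print(scanning_start(message))
```

On the invented message the motif-based search reports one start per cistron, including the internal GUG, whereas the scanning function only ever reports the first AUG.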

Figure 7.7

Signals for translation initiation. Initiation sites in prokaryotic mRNAs are characterized by a Shine-Dalgarno sequence that precedes the AUG initiation codon. Base pairing between the Shine-Dalgarno sequence and a complementary sequence near the 3´ [...]

Protein Modifications

During and after translation, individual amino acids may be chemically modified, signal sequences may be appended, and the new protein “folds” into a distinct three-dimensional structure as a result of intramolecular interactions. A signal sequence is a short tail of amino acids that directs a protein to a specific cellular compartment. These sequences at the amino end or the carboxyl end of the protein can be thought of as the protein’s “train ticket” to its ultimate destination. Other cellular factors recognize each signal sequence and help transport the protein from the cytoplasm to its correct compartment. For instance, a specific sequence at the amino terminus will direct a protein to the mitochondria or chloroplasts (in plants). Once the protein reaches its cellular destination, the signal sequence is usually clipped off.

Many proteins fold spontaneously, but some proteins require helper molecules, called chaperones, to prevent them from aggregating during the complicated process of folding. Even if a protein is properly specified by its corresponding mRNA, it could take on a completely dysfunctional shape if abnormal temperature or pH conditions prevent it from folding correctly.

Chemical Modifications, Protein Activity, and Longevity

Proteins can be chemically modified with the addition of groups including methyl, phosphate, acetyl, and ubiquitin groups. The addition or removal of these groups from proteins regulates their activity or the length of time they exist in the cell. Sometimes these modifications can regulate where a protein is found in the cell—for example, in the nucleus, the cytoplasm, or attached to the plasma membrane.

Chemical modifications occur in response to external stimuli such as stress, the lack of nutrients, heat, or ultraviolet light exposure. These changes can alter epigenetic accessibility, transcription, mRNA stability, or translation, all resulting in changes in expression of various genes. This is an efficient way for the cell to rapidly change the levels of specific proteins in response to the environment. Because proteins are involved in every stage of gene regulation, the phosphorylation of a protein (depending on the protein that is modified) can alter accessibility to the chromosome, can alter transcription (by altering transcription factor binding or function), can change nuclear shuttling (by influencing modifications to the nuclear pore complex), can alter RNA stability (by binding or not binding to the RNA to regulate its stability), can modify translation (increase or decrease), or can change post-translational modifications (add or remove phosphates or other chemical modifications).

The addition of a ubiquitin group to a protein marks that protein for degradation. Ubiquitin acts like a flag indicating that the protein lifespan is complete. These proteins are moved to the proteasome, a large protein complex that functions to remove proteins, to be degraded (Figure 7). One way to control gene expression, therefore, is to alter the longevity of the protein.

Figure 7. Proteins with ubiquitin tags are marked for degradation within the proteasome.

Like transcription, translation is controlled by proteins that bind and initiate the process. In translation, before protein synthesis can begin, ribosome assembly has to be completed. This is a multi-step process.

In ribosome assembly, the large and small ribosomal subunits and an initiator tRNA (tRNAi) containing the first amino acid of the final polypeptide chain all come together at the translation start codon on an mRNA to allow translation to begin. First, the small ribosomal subunit binds to the tRNAi which carries methionine in eukaryotes and archaea and carries N-formyl-methionine in bacteria. (Because the tRNAi is carrying an amino acid, it is said to be charged.) Next, the small ribosomal subunit with the charged tRNAi still bound scans along the mRNA strand until it reaches the start codon AUG, which indicates where translation will begin. The start codon also establishes the reading frame for the mRNA strand, which is crucial to synthesizing the correct sequence of amino acids. A shift in the reading frame results in mistranslation of the mRNA. The anticodon on the tRNAi then binds to the start codon via basepairing. The complex consisting of mRNA, charged tRNAi, and the small ribosomal subunit attaches to the large ribosomal subunit, which completes ribosome assembly. These components are brought together by the help of proteins called initiation factors which bind to the small ribosomal subunit during initiation and are found in all three domains of life. In addition, the cell spends GTP energy to help form the initiation complex. Once ribosome assembly is complete, the charged tRNAi is positioned in the P site of the ribosome and the empty A site is ready for the next aminoacyl-tRNA. The polypeptide synthesis begins and always proceeds from the N-terminus to the C-terminus, called the N-to-C direction.

In eukaryotes, several eukaryotic initiation factor proteins (eIFs) assist in ribosome assembly. The eukaryotic initiation factor-2 (eIF-2) is active when it binds to guanosine triphosphate (GTP). With GTP bound to it, eIF-2 protein binds to the small 40S ribosomal subunit. Next, the initiator tRNA charged with methionine (Met-tRNAi) associates with the GTP-eIF-2/40S ribosome complex, and once all these components are bound to each other, they are collectively called the 43S complex.

Eukaryotic initiation factors eIF1, eIF3, eIF4, and eIF5 help bring the 43S complex to the 5′ m7G cap of an mRNA to be translated. Once bound to the mRNA's 5′ m7G cap, the 43S complex starts travelling down the mRNA until it reaches the initiation AUG codon at the start of the mRNA's reading frame. Sequences around the AUG may help ensure the correct AUG is used as the initiation codon in the mRNA.

Once the 43S complex is at the initiation AUG, the tRNAi-Met is positioned over the AUG. The anticodon on tRNAi-Met basepairs with the AUG codon. At this point, the GTP bound to eIF2 in the 43S complex is hydrolyzed to GDP + phosphate, and energy is released. This energy is used to release the eIF2 (with GDP bound to it) from the 43S complex, leaving the 40S ribosomal subunit and the tRNAi-Met at the translation start site of the mRNA.

Next, eIF5B with GTP bound binds to the 40S ribosomal subunit complexed to the mRNA and the tRNAi-Met. The eIF5B-GTP allows the 60S large ribosomal subunit to bind. Once the 60S ribosomal subunit arrives, eIF5B hydrolyzes its bound GTP to GDP + phosphate, and energy is released. This energy powers assembly of the two ribosomal subunits into the intact 80S ribosome, with tRNAi-Met in its P site while also basepaired to the initiation AUG codon on the mRNA. Translation is ready to begin.

The binding of eIF-2 to the 40S ribosomal subunit is controlled by phosphorylation. If eIF-2 is phosphorylated, it undergoes a conformational change and cannot bind to GTP. Therefore, the 43S complex cannot form properly and translation is impeded. When eIF-2 remains unphosphorylated, it can bind GTP and the 40S ribosomal subunit, and translation initiation proceeds.

Figure: Translation Initiation Complex: Gene expression can be controlled by factors that bind the translation initiation complex.

The ability to fully assemble the ribosome directly affects the rate at which translation occurs. But protein synthesis is regulated at various other levels as well, including mRNA synthesis, tRNA synthesis, rRNA synthesis, and eukaryotic initiation factor synthesis. Alteration in any of these components affects the rate at which translation can occur.

Reduced but accurate translation from a mutant AUA initiation codon in the mitochondrial COX2 mRNA of Saccharomyces cerevisiae

We have changed the translation initiation codon of the COX2 mRNA of Saccharomyces cerevisiae from AUG to AUA, generating a mutation termed cox2-10. This mutation reduced translation of the COX2 mRNA at least five-fold without affecting the steady-state level of the mRNA, and produced a leaky nonrespiratory growth phenotype. To address the question of whether residual translation of the cox2-10 mRNA was initiating at the altered initiation codon or at the next AUG codon downstream (at position 14), we took advantage of the fact that the mature coxII protein is generated from the electrophoretically distinguishable coxII precursor by removal of the amino-terminal 15 residues, and that this processing can be blocked by a mutation in the nuclear gene PET2858. We constructed a pet2858, cox2-10 double mutant strain using a pet2858 allele from our mutant collection. The double mutant accumulated low levels of a polypeptide which comigrated with the coxII precursor protein, not the mature species, providing strong evidence that residual initiation was occurring at the mutant AUA codon. Residual translation of the mutant mRNA required the COX2 mRNA-specific activator PET111. Furthermore, growth of cox2-10 mutant strains was sensitive to alterations in PET111 gene dosage: the respiratory-defective growth phenotype was partially suppressed in haploid strains containing PET111 on a high-copy-number vector, but became more severe in diploid strains containing only one functional copy of PET111.


Interaction between pre-AUG poly(A) and Pab1p

Our finding that yeast genes with a pre-AUG A≥12 have a much reduced predicted protein synthesis rate (Figures 6) and ribosomal density (Figure 7) is consistent with the hypothesis that, while a pre-AUG AN may enhance translation by binding to translation initiation factors (Gallie and Tanguay 1994; Shirokikh and Spirin 2008), a long pre-AUG AN, by binding tightly to PABP, may repress translation by interfering with ribosomal scanning (Sachs et al. 1987; Wu and Bag 1998; Bag 2001; Melo et al. 2003a,b; Patel et al. 2005; Ma et al. 2006; Patel and Bag 2006). This hypothesis would predict that removing PABP would remove its inhibitory effect on mRNA with long pre-AUG AN. This is exactly what has been observed in a previous in vitro experiment (Shirokikh and Spirin 2008) without PABP, where the translation-enhancing effect is greater for longer poly(A) than for shorter poly(A), i.e., the ranking order of translation initiation efficiency is A25 > A12 > A5.
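As a rough illustration of how a 5′-UTR might be classified by its longest run of consecutive A's, consider the sketch below. It is not the authors' pipeline: the function names and example sequences are invented, and only the threshold of 12 consecutive A's is taken from the text above.

```python
import re


def longest_poly_a(utr5):
    """Return the length of the longest run of consecutive A's in a 5'-UTR."""
    return max((len(run) for run in re.findall(r"A+", utr5)), default=0)


def has_long_pre_aug_poly_a(utr5, threshold=12):
    """True if the 5'-UTR contains a run of at least `threshold` A's."""
    return longest_poly_a(utr5) >= threshold


if __name__ == "__main__":
    short_tract = "GCAAAAAGCAAAC"            # invented UTR, longest run is 5
    long_tract = "GC" + "A" * 14 + "GC"      # invented UTR, longest run is 14
    print(longest_poly_a(short_tract), has_long_pre_aug_poly_a(short_tract))
    print(longest_poly_a(long_tract), has_long_pre_aug_poly_a(long_tract))
```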

Our results suggest an alternative to the dominant hypothesis concerning the function of PABP on translation initiation. Several studies have shown that exogenous poly(A) can inhibit translation initiation (Lodish and Nathan 1972; Jacobson and Favreau 1983; Grossi De Sa et al. 1988), and the inhibitive effect can be eliminated by the addition of PABP (Grossi De Sa et al. 1988; Gilbert et al. 2007). Similarly, noncoding poly(A) sequences, such as BC1 and BC200 RNA expressed in neurons, are known to bind PABP (Muddashetty et al. 2002) and to inhibit translation initiation when highly expressed (Wang et al. 2002, 2005; Kondrashov et al. 2005). The dominant hypothesis is that PABP, in addition to its function in mRNA stabilization and circularization, also serves as a translation initiation factor (Kahvejian et al. 2005; Khanam et al. 2006) that functions by binding to pre-AUG AN. Thus, either exogenous poly(A) RNA or intrinsically produced poly(A) RNA such as BC1 and BC200 RNA that sequesters PABP would reduce translation initiation (Khanam et al. 2006; Gilbert et al. 2007). Consistent with this hypothesis, adding PABP eliminated the inhibitive effect of the exogenous poly(A) RNA (Grossi De Sa et al. 1988; Gilbert et al. 2007). The hypothesis also explains why poly(A) is over-represented in 5′-UTR in yeast genes, especially those highly expressed ones because such pre-AUG AN would gain enhanced translation initiation by interacting with PABP. However, this hypothesis has three difficulties. First, it cannot explain why genes with a long pre-AUG AN have a reduced protein synthesis rate as well as a reduced ribosomal density shown in this article. Second, it cannot explain why, in the complete absence of PABP, pre-AUG AN can still enhance translation initiation for both capped and uncapped mRNA (Shirokikh and Spirin 2008). Third, it cannot explain why adding translation initiation factors eIF-4B and eIF-4F (including eIF-4A) in combination also eliminated the inhibitive effect of exogenous poly(A) RNA on translation initiation (Gallie and Tanguay 1994). Our new hypothesis is that pre-AUG AN binds to translation initiation factors such as eIF-4B and eIF-4F to facilitate translation initiation. Exogenous or intrinsic poly(A) RNAs can inhibit translation initiation not only because they compete for PABP but also because they would sequester the translation initiation factors eIF-4B and eIF-4F. Adding PABP can eliminate the inhibitive effect of exogenous poly(A) RNA because PABP would bind to the poly(A) and free the translation initiation factors sequestered by these poly(A) RNAs. This new hypothesis, which was implicitly proposed in a previous study (Gallie and Tanguay 1994) demonstrating the binding of poly(A) RNA to eIF-4B and eIF-4F, eliminates all three difficulties plaguing the other hypothesis.

The presence of a pre-AUG AN appears to be a key feature in a set of internal ribosomal entry sites (IRESs) empirically verified in a recent study on yeast translation (Gilbert et al. 2007). All those poly(A) tracts are shorter than 12 consecutive A’s. They include not only genes involved in invasive growth in yeast, but also transcripts that are routinely transcribed and translated, such as the eIF-4G and Pab1 transcripts. The IRES activity mediated by the pre-AUG AN does seem to require Pab1p (Gilbert et al. 2007). A recent study using mRNAs without a poly(A) tail (Kahvejian et al. 2005) suggests that PABP may serve as a translation initiation factor independent of its binding to the poly(A) tail. It is possible that the multiple functions of PABP depend on how strongly it binds to pre-AUG AN, with strong binding inhibiting translation and weak binding enhancing translation. It is also possible that the association between the IRES activity and the pre-AUG AN is coincidental. A recent study on IRESs from both yeast and Drosophila melanogaster shows that IRES activity increases consistently with decreasing stability of secondary structure (Xia and Holcik 2009). A pre-AUG poly(A) would contribute to a weak RNA secondary structure when nucleotide U usage is dramatically reduced (Figures 1–3).

While there is empirical evidence that mammalian PABP expression may be autoregulated by PABP binding to the pre-AUG AN of its own mRNA (Wu and Bag 1998; Bag 2001; Ma et al. 2003a,b, 2006; Patel et al. 2005; Patel and Bag 2006; Bag and Bhattacharjee 2010), there is no evidence that Pab1p abundance in yeast is autoregulated. Pab1p abundance is high, ranking 39th among the 3841 yeast genes with characterized protein abundance (Ghaemmaghami et al. 2003). Its mRNA ranks 114th in ribosomal density among the 5164 yeast genes with ribosomal density characterized by Ingolia et al. (2009). Such high protein abundance and high ribosomal density are strong evidence that the high abundance of Pab1p does not interfere with the translation of its mRNA. If autoregulation requires a pre-AUG A12, then Pab1 mRNA would escape autoregulation because it has only a pre-AUG A11. The mammalian PABP seems to be less strict about the contiguity of poly(A), especially its RNA-recognition motifs (RRMs) 3+4 (Khanam et al. 2006).

Relevance to the translation of early and late genes in vaccinia virus

The finding that the length of pre-AUG AN is strongly associated with ribosomal loading and protein synthesis sheds light on the evolutionary significance of the difference in the length of pre-AUG AN between early and late genes in vaccinia virus. The early vaccinia viral genes have a pre-AUG AN with 4–14 A residues (Ahn et al. 1990; Ink and Pickup 1990), but the poly(A) tracts in late genes are often around 35 A residues (Bertholet et al. 1987; Schwer et al. 1987; Schwer and Stunnenberg 1988). The early viral genes are translated in the presence of abundant PABP, which would repress the translation of mRNAs with a long pre-AUG AN. This implies that the transcripts of the viral early genes should have only short poly(A) to avoid repression. In contrast, late viral genes are translated when cellular protein production has been much reduced, i.e., when PABP is expected to be less abundant. So mRNAs from late viral genes can have a long pre-AUG AN without suffering from translation repression mediated by PABP. It has been experimentally demonstrated that, in the absence of PABP, the translation-enhancing effect of pre-AUG AN increases with its length (Shirokikh and Spirin 2008).

There is some controversy concerning whether the PABP level is reduced during the infection cycle of vaccinia virus. The degradation of host mRNA appears nearly complete 6 hr after viral infection, as no host poly(A) mRNA is detectable at or after this time (Katsafanas and Moss 2007). Furthermore, a large-scale characterization of mRNA from HeLa cells infected with vaccinia virus (Yang et al. 2010) showed that PABP mRNA was reduced to 50% by 4 hr. Although no mRNA characterization was done after this time, continued reduction seems likely, which would be consistent with the finding that no host mRNA is detectable 6 hr after viral infection (Katsafanas and Moss 2007).

The study by Katsafanas and Moss (2007) also showed that the viral mRNAs are located in the cavities of viral factories (VFs), where they are transcribed and translated. A number of translation initiation factors such as eIF4E and eIF4G are also localized in these cavities (Katsafanas and Moss 2007; Walsh et al. 2008). In contrast, PABP is localized on the periphery of a VF (Walsh et al. 2008), which suggests that PABP does not participate in translation of the viral genes. It is known that vaccinia virus produces poly(A) nontranslated small RNA sequences that selectively inhibit cap-dependent translation of host messages (Bablanian and Banerjee 1986; Bablanian et al. 1986, 1987, 1993; Lu and Bablanian 1996), presumably by binding to PABP and preventing it from interacting with other translation initiation factors. Both Rubella virus and Bunyamwera virus inhibit translation of host genes by producing a capsid protein that binds to PABP and prevents it from binding to other translation initiation factors (Ilkow et al. 2008; Blakqori et al. 2009).

Walsh et al. (2008) found a persistent level of PABP during the infection cycle of vaccinia virus, but did not provide any evidence that PABP is in fact produced during the viral infection cycle. However, a subsequent paper (Perez et al. 2011) from the same laboratory found that PABP is continuously synthesized in cytomegalovirus-infected cells. They suggested that this selective translation of PABP is through the mTOR+4E-BP pathway. However, this suggestion does not seem coherent. Preventing 4E-BP from binding to eIF-4E would lead to a general increase of cap-dependent translation, not selective translation of the PABP mRNA.

In summary, multiple lines of empirical evidence suggest that a pre-AUG AN shorter than 12 may enhance translation in the yeast. However, yeast genes with a pre-AUG A≥12 tend to be translated inefficiently with a low ribosomal density and output a reduced amount of protein, consistent with the interpretation that such long poly(A) tracts may bind to Pab1p, resulting in repression of translation.
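
The distinction between short and long pre-AUG poly(A) tracts used throughout this discussion can be illustrated with a minimal sketch. It assumes, purely for illustration, that the tract length is the longest run of consecutive A residues in a 5′-UTR sequence and that the cutoff is 12; the function names and the example sequences are hypothetical and are not taken from the analysis pipeline of this study.

```python
import re


def longest_a_run(utr5: str) -> int:
    """Return the length of the longest run of consecutive A's.

    Assumption: the 5'-UTR is given as a plain string of nucleotides
    (upper or lower case); an empty sequence yields 0.
    """
    runs = re.findall(r"A+", utr5.upper())
    return max((len(run) for run in runs), default=0)


def has_long_pre_aug_tract(utr5: str, threshold: int = 12) -> bool:
    """True if the longest A run meets the (assumed) threshold of 12."""
    return longest_a_run(utr5) >= threshold


if __name__ == "__main__":
    # Hypothetical 5'-UTRs with an A11 and an A12 tract, respectively.
    a11_utr = "GC" + "A" * 11 + "UC"
    a12_utr = "GC" + "A" * 12 + "UC"
    for name, seq in [("A11", a11_utr), ("A12", a12_utr)]:
        print(name, longest_a_run(seq), has_long_pre_aug_tract(seq))
```

Under this definition a transcript with a pre-AUG A11 falls just below the cutoff, whereas one with an A12 is classified as carrying a long tract.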