I am a computer scientist working on nucleic acids from the perspective of Formal Language Theory. I am dealing with several papers that analyze the stem-loop structure of DNA.
I would like some solid reference that confirms (or negates):
1) The existence of these structures
2) The existence of a (species, subspecies, …) whose DNA or RNA has been confirmed to have strings which are a pure stem-loop structure.
The existence and importance of stem loop structures is well documented.
I'm not sure what the second question means exactly. If you mean 'an organism that has at least one stem loop', try HIV. The MS2 system, which is built on a protein that binds an RNA stem loop, is also used a lot as a tool in biology; I suggest googling it.
Comparative genomics identifies thousands of candidate structured RNAs in human microbiomes
Structured RNAs play varied bioregulatory roles within microbes. To date, hundreds of candidate structured RNAs have been predicted using informatic approaches that search for motif structures in genomic sequence data. The human microbiome contains thousands of species and strains of microbes. Yet, much of the metagenomic data from the human microbiome remains unmined for structured RNA motifs primarily due to computational limitations.
We sought to apply a large-scale, comparative genomics approach to these organisms to identify candidate structured RNAs. With a carefully constructed, though computationally intensive automated analysis, we identify 3161 conserved candidate structured RNAs in intergenic regions, as well as 2022 additional candidate structured RNAs that may overlap coding regions. We validate the RNA expression of 177 of these candidate structures by analyzing small fragment RNA-seq data from four human fecal samples.
This approach identifies a wide variety of candidate structured RNAs, including tmRNAs, antitoxins, and likely ribosome protein leaders, from a wide variety of taxa. Overall, our pipeline enables conservative predictions of thousands of novel candidate structured RNAs from human microbiomes.
Ribosomal RNA (rRNA) associates with a set of proteins to form ribosomes. These complex structures, which physically move along an mRNA molecule, catalyze the assembly of amino acids into protein chains. They also bind tRNAs and various accessory molecules necessary for protein synthesis. Ribosomes are composed of a large and small subunit, each of which contains its own rRNA molecule or molecules.
Translation is the whole process by which the base sequence of an mRNA is used to order and to join the amino acids in a protein. The three types of RNA participate in this essential protein-synthesizing pathway in all cells; in fact, the development of the three distinct functions of RNA was probably the molecular key to the origin of life. How each RNA carries out its specific task is discussed in this section, while the biochemical events in protein synthesis and the required protein factors are described in the final section of the chapter.
Messenger RNA Carries Information from DNA in a Three-Letter Genetic Code
RNA contains ribonucleotides of adenine, cytosine, guanine, and uracil; DNA contains deoxyribonucleotides of adenine, cytosine, guanine, and thymine. Because 4 nucleotides, taken individually, could represent only 4 of the 20 possible amino acids in coding the linear arrangement in proteins, a group of nucleotides is required to represent each amino acid. The code employed must be capable of specifying at least 20 words (i.e., amino acids).
If two nucleotides were used to code for one amino acid, then only 16 (or 4²) different code words could be formed, which would be an insufficient number. However, if a group of three nucleotides is used for each code word, then 64 (or 4³) code words can be formed. Any code using groups of three or more nucleotides will have more than enough units to encode 20 amino acids. Many such coding systems are mathematically possible. However, the actual genetic code used by cells is a triplet code, with every three nucleotides being "read" from a specified starting point in the mRNA. Each triplet is called a codon. Of the 64 possible codons in the genetic code, 61 specify individual amino acids and three are stop codons. Table 4-2 shows that most amino acids are encoded by more than one codon. Only two, methionine and tryptophan, have a single codon; at the other extreme, leucine, serine, and arginine are each specified by six different codons. The different codons for a given amino acid are said to be synonymous. The code itself is termed degenerate, which means that it contains redundancies.
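The counting argument and the degeneracy figures above can be checked with a short Python sketch. It assumes the standard genetic code, written as a 64-letter string of one-letter amino acid codes in U, C, A, G order; the names `code_words`, `CODON_TABLE`, and `degeneracy` are illustrative and do not come from the text.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons in UCAG order; '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")


def code_words(n):
    """Number of distinct words of length n over the four-base alphabet."""
    return len(BASES) ** n


# Map each codon (e.g. "AUG") to its amino acid (e.g. "M").
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons specify each amino acid (or stop).
degeneracy = Counter(CODON_TABLE.values())

print(code_words(1), code_words(2), code_words(3))                # 4 16 64
print(sum(n for aa, n in degeneracy.items() if aa != "*"))        # 61 sense codons
print(sorted(aa for aa, n in degeneracy.items() if n == 1))       # ['M', 'W']
print(sorted(aa for aa, n in degeneracy.items() if n == 6))       # ['L', 'R', 'S']
```

Doublets fall short of 20 words, triplets give 64, and the table confirms that only methionine (M) and tryptophan (W) have a single codon while leucine, arginine, and serine have six each.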
Table 4-2. The Genetic Code (RNA to Amino Acids)*.
Synthesis of all protein chains in prokaryotic and eukaryotic cells begins with the amino acid methionine. In most mRNAs, the start (initiator) codon specifying this aminoterminal methionine is AUG. In a few bacterial mRNAs, GUG is used as the initiator codon, and CUG occasionally is used as an initiator codon for methionine in eukaryotes. The three codons UAA, UGA, and UAG do not specify amino acids but constitute stop (terminator) signals that mark the carboxyl terminus of protein chains in almost all cells. The sequence of codons that runs from a specific start site to a terminating codon is called a reading frame. This precise linear array of ribonucleotides in groups of three in mRNA specifies the precise linear sequence of amino acids in a protein and also signals where synthesis of the protein chain starts and stops.
Because the genetic code is a commaless, non-overlapping triplet code, a particular mRNA theoretically could be translated in three different reading frames. Indeed some mRNAs have been shown to contain overlapping information that can be translated in different reading frames, yielding different polypeptides (Figure 4-21). The vast majority of mRNAs, however, can be read in only one frame because stop codons encountered in the other two possible reading frames terminate translation before a functional protein is produced. Another unusual coding arrangement occurs because of frameshifting. In this case the protein-synthesizing machinery may read four nucleotides as one amino acid and then continue reading triplets, or it may back up one base and read all succeeding triplets in the new frame until termination of the chain occurs. These frameshifts are not common events, but a few dozen such instances are known.
Figure 4-21. Example of how the genetic code, a non-overlapping, commaless triplet code, can be read in two different frames.
If translation of the mRNA sequence shown begins at two different upstream start sites (not …)
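The idea of three possible reading frames can be made concrete with a small Python sketch. It assumes the standard genetic code and uses a made-up 16-nucleotide message (`MRNA`); the function names `translate` and `first_orf` are hypothetical helpers, not anything defined in the text.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons in UCAG order; '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna, frame=0):
    """Read triplets starting at offset `frame` until a stop codon or the end."""
    peptide = []
    for i in range(frame, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


def first_orf(mrna):
    """Translate from the first AUG to the next in-frame stop codon."""
    start = mrna.find("AUG")
    return translate(mrna, start) if start >= 0 else ""


MRNA = "CCAUGGCAUUUUAAGC"  # hypothetical message, chosen only for illustration

for frame in range(3):
    print(frame, translate(MRNA, frame))   # 0 PWHFK / 1 HGILS / 2 MAF
print(first_orf(MRNA))                     # MAF
```

The same nucleotide string yields three unrelated peptides depending on the starting offset; only the frame set by the AUG runs into a stop codon (UAA) here.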
The meaning of each codon is the same in most known organisms, a strong argument that life on earth evolved only once. Recently the genetic code has been found to differ for a few codons in many mitochondria, in ciliated protozoans, and in Acetabularia, a single-celled plant. As shown in Table 4-3, most of these changes involve reading of normal stop codons as amino acids, not an exchange of one amino acid for another. It is now thought that these exceptions to the general code are later evolutionary developments; that is, at no single time was the code immutably fixed, although massive changes were not tolerated once a general code began to function early in evolution.
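A variant code can be modeled as a small set of overrides on the standard table. The sketch below is illustrative only: the two overrides shown (UGA read as tryptophan in many mitochondria, UAA and UAG read as glutamine in ciliates) are well-documented examples supplied here, not a transcription of Table 4-3, and the messages are invented.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative overrides: normal stop codons reassigned to amino acids.
VARIANT_OVERRIDES = {
    "mitochondrial": {"UGA": "W"},          # stop -> tryptophan
    "ciliate": {"UAA": "Q", "UAG": "Q"},    # stop -> glutamine
}


def code_for(name):
    """Return the standard table with the named variant's overrides applied."""
    return {**STANDARD, **VARIANT_OVERRIDES.get(name, {})}


def translate(mrna, table):
    """Translate from position 0 until a stop codon (in `table`) or the end."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


msg_a = "AUGUGAGCUUAA"  # hypothetical: AUG UGA GCU UAA
msg_b = "AUGUAAGCUUGA"  # hypothetical: AUG UAA GCU UGA

print(translate(msg_a, STANDARD), translate(msg_a, code_for("mitochondrial")))  # M MWA
print(translate(msg_b, STANDARD), translate(msg_b, code_for("ciliate")))        # M MQA
```

In both cases the standard code stops after the first methionine, while the variant code reads through the reassigned stop codon.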
Table 4-3. Unusual Codon Usage in Nuclear and Mitochondrial Genes.
Experiments with Synthetic mRNAs and Trinucleotides Broke the Genetic Code
Having described the genetic code, we briefly recount how it was deciphered — one of the great triumphs of modern biochemistry. The underlying experimental work was carried out largely with cell-free bacterial extracts containing all the necessary components for protein synthesis except mRNA (i.e., tRNAs, ribosomes, amino acids, and the energy-rich nucleotides ATP and GTP).
Initially, researchers added synthetic mRNAs containing a single type of nucleotide to such extracts and then determined the amino acid incorporated into the polypeptide that was formed. In the first successful experiment, synthetic mRNA composed only of U residues [poly(U)] yielded polypeptides made up only of phenylalanine. Thus it was concluded that a codon for phenylalanine consisted entirely of U's. Likewise, experiments with poly(C) and poly(A) showed that a codon for proline contained only C's and a codon for lysine only A's (Figure 4-22). [Poly(G) did not work in this type of experiment because it assumes an unusable stacked structure that is not translated well.] Next, synthetic mRNAs composed of alternating bases were used. The results of these experiments not only revealed more codons but also demonstrated that codons are three bases long. The example of this approach illustrated in Figure 4-23 led to identification of ACA as the codon for threonine and CAC for histidine. Similar experiments with many such mixed polynucleotides revealed a substantial part of the genetic code.
Figure 4-22. Assigning codons using synthetic mRNAs containing a single ribonucleotide.
Addition of such a synthetic mRNA to a bacterial extract that contained all the components necessary for protein synthesis except mRNA resulted in synthesis of polypeptides composed …
Figure 4-23. Assigning codons using mixed polynucleotides.
(a) When a synthetic mRNA with alternating A and C residues was added to a protein-synthesizing bacterial extract, the resulting polypeptide contained alternating threonine and histidine residues. This finding …
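The logic of the synthetic-mRNA experiments can be replayed in Python. The sketch assumes the standard genetic code and translates each synthetic message in all three frames; the helper name `translate` and the chosen message lengths are illustrative assumptions.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna, frame=0):
    """Read triplets from offset `frame`; none of the messages below has a stop."""
    return "".join(CODON_TABLE[mrna[i:i + 3]]
                   for i in range(frame, len(mrna) - 2, 3))


# Homopolymers: every frame contains the same single codon.
for base in "UCA":
    residues = {aa for f in range(3) for aa in translate(base * 15, f)}
    print(f"poly({base}) ->", residues)   # {'F'}, {'P'}, {'K'}

# Alternating A and C: every frame alternates ACA (Thr) and CAC (His).
poly_ac = "AC" * 9
for frame in range(3):
    print(frame, translate(poly_ac, frame))   # THTHTH / HTHTH / THTHT
```

Whatever frame the ribosome happens to pick, poly(AC) gives an alternating threonine-histidine chain, which is the pattern expected of a triplet code: an alternating two-letter message read in pairs would give a homopolymer instead.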
The entire genetic code was finally worked out by a second type of experiment conducted by Marshall Nirenberg and his collaborators. In this approach, all the possible trinucleotides were tested for their ability to attract tRNAs attached to the 20 different amino acids found in natural proteins (Figure 4-24). In all, 61 of the 64 possible trinucleotides were found to code for a specific amino acid; the trinucleotides UAA, UGA, and UAG did not encode amino acids.
Figure 4-24. Breaking the entire genetic code by use of chemically synthesized trinucleotides.
Marshall Nirenberg and his collaborators prepared 20 ribosome-free bacterial extracts containing all possible aminoacyl-tRNAs (tRNAs with an amino acid attached). In each …
Although synthetic mRNAs were useful in deciphering the genetic code, in vitro protein synthesis from these mRNAs is very inefficient and yields polypeptides of variable size. Successful in vitro synthesis of a naturally occurring protein was achieved first when mRNA from bacteriophage F2 (a virus) was added to bacterial extracts, leading to formation of the coat, or capsid, protein (the "packaging" protein that covers the virus particle). Studies with such natural mRNAs established that AUG encodes methionine at the start of almost all proteins and is required for efficient initiation of protein synthesis, while the three trinucleotides (UAA, UGA, and UAG) that do not encode any amino acid act as stop codons, necessary for precise termination of synthesis.
The Folded Structure of tRNA Promotes Its Decoding Functions
The next step in understanding the flow of genetic information from DNA to protein was to determine how the nucleotide sequence of mRNA is converted into the amino acid sequence of protein. This decoding process requires two types of adapter molecules: tRNAs and enzymes called aminoacyl-tRNA synthetases. First we describe the role of tRNAs in decoding mRNA codons, and then examine how synthetases recognize tRNAs.
All tRNAs have two functions: to be chemically linked to a particular amino acid and to base-pair with a codon in mRNA so that the amino acid can be added to a growing peptide chain. Each tRNA molecule is recognized by one and only one of the 20 aminoacyl-tRNA synthetases. Likewise, each of these enzymes links one and only one of the 20 amino acids to a particular tRNA, forming an aminoacyl-tRNA. Once its correct amino acid is attached, a tRNA then recognizes a codon in mRNA, thereby delivering its amino acid to the growing polypeptide (Figure 4-25).
Figure 4-25. Translation of nucleic acid sequences in mRNA into amino acid sequences in proteins requires a two-step decoding process.
First, an aminoacyl-tRNA synthetase couples a specific amino acid to its corresponding tRNA. Second, a three-base sequence in the …
As studies on tRNA proceeded, 30–40 different tRNAs were identified in bacterial cells and as many as 50–100 in animal and plant cells. Thus the number of tRNAs in most cells is more than the number of amino acids found in proteins (20) and also differs from the number of codons in the genetic code (61). Consequently, many amino acids have more than one tRNA to which they can attach (explaining how there can be more tRNAs than amino acids); in addition, many tRNAs can attach to more than one codon (explaining how there can be more codons than tRNAs). As noted previously, most amino acids are encoded by more than one codon, requiring some tRNAs to recognize more than one codon.
The function of tRNA molecules, which are 70–80 nucleotides long, depends on their precise three-dimensional structures. In solution, all tRNA molecules fold into a similar stem-loop arrangement that resembles a cloverleaf when drawn in two dimensions (Figure 4-26a). The four stems are short double helices stabilized by Watson-Crick base pairing; three of the four stems have loops containing seven or eight bases at their ends, while the remaining, unlooped stem contains the free 3′ and 5′ ends of the chain. Three nucleotides termed the anticodon, located at the center of one loop, can form base pairs with the three complementary nucleotides forming a codon in mRNA. As discussed later, specific aminoacyl-tRNA synthetases recognize the surface structure of each tRNA for a specific amino acid and covalently attach the proper amino acid to the unlooped amino acid acceptor stem. The 3′ end of all tRNAs has the sequence CCA, which in most cases is added after synthesis and processing of the tRNA are complete. Viewed in three dimensions, the folded tRNA molecule has an L shape with the anticodon loop and acceptor stem forming the ends of the two arms (Figure 4-26b).
Figure 4-26. Structure of tRNAs.
(a) The primary structure of yeast alanine tRNA (tRNA^Ala), the first such sequence determined. This molecule is synthesized from the nucleotides A, C, G, and U, but some of the nucleotides, shown in red, are modified after synthesis: D = dihydrouridine, I = inosine, T = thymine, Ψ = pseudouridine, …
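From a formal-language point of view, a single stem-loop is a string of the form stem + loop + reverse complement of the stem. A minimal Python sketch of that test follows; it assumes Watson-Crick pairs only (no G·U), and the thresholds `min_stem` and `min_loop`, the function name `find_stem_loop`, and the example sequence are all illustrative assumptions rather than values from the text.

```python
# Watson-Crick pairs only; G·U pairs are deliberately left out of this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_stem_loop(rna, min_stem=3, min_loop=3):
    """Return (stem_length, loop) if `rna` is one hairpin, else None.

    Bases are paired from the two ends inward for as long as they are
    complementary and at least `min_loop` unpaired bases remain inside.
    """
    i, j = 0, len(rna) - 1
    stem = 0
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in PAIRS:
        i += 1
        j -= 1
        stem += 1
    if stem < min_stem:
        return None
    return stem, rna[i:j + 1]


hairpin = "GGCAC" + "UUCG" + "GUGCC"   # hypothetical stem + loop + reverse complement
print(find_stem_loop(hairpin))         # (5, 'UUCG')
print(find_stem_loop("AAAAAAAAAA"))    # None
```

With unbounded stem length, strings of this shape form a context-free language that is not regular, which is why hairpins are a standard example when nucleic acids are treated with grammar-based tools. A tRNA cloverleaf, with three looped stems enclosed by the acceptor stem, would need a nested version of this check.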
Besides addition of CCA at the 3′ terminus after a tRNA molecule is synthesized, several of its nucleic acid bases typically are modified. For example, most tRNAs are synthesized with a four-base sequence of UUCG near the middle of the molecule. The first uridylate is methylated to become a thymidylate; the second is rearranged into a pseudouridylate (abbreviated Ψ), in which the ribose is attached to carbon 5 instead of to nitrogen 1 of the uracil. These modifications produce a characteristic TΨCG loop in an unpaired region at approximately the same position in nearly all tRNAs (see Figure 4-26a).
Nonstandard Base Pairing Often Occurs between Codons and Anticodons
If perfect Watson-Crick base pairing were demanded between codons and anticodons, cells would have to contain exactly 61 different tRNA species, one for each codon that specifies an amino acid. As noted above, however, many cells contain fewer than 61 tRNAs. The explanation for the smaller number lies in the capability of a single tRNA anticodon to recognize more than one, but not necessarily every, codon corresponding to a given amino acid. This broader recognition can occur because of nonstandard pairing between bases in the so-called "wobble" position: the third base in an mRNA codon and the corresponding first base in its tRNA anticodon. Although the first and second bases of a codon form standard Watson-Crick base pairs with the third and second bases of the corresponding anticodon, four nonstandard interactions can occur between bases in the wobble position. Particularly important is the G·U base pair, which structurally fits almost as well as the standard G·C pair. Thus, a given anticodon in tRNA with G in the first (wobble) position can base-pair with the two corresponding codons that have either pyrimidine (C or U) in the third position (Figure 4-27). For example, the phenylalanine codons UUU and UUC (5′ → 3′) are both recognized by the tRNA that has GAA (5′ → 3′) as the anticodon. In fact, any two codons of the type NNPyr (N = any base; Pyr = pyrimidine) encode a single amino acid and are decoded by a single tRNA with G in the first (wobble) position of the anticodon.
Figure 4-27. The first and second bases in an mRNA codon form Watson-Crick base pairs with the third and second bases, respectively, of a tRNA anticodon.
However, the base in the third (or wobble) position of an mRNA codon often forms a nonstandard base pair with …
Although adenine rarely is found in the anticodon wobble position, many tRNAs in plants and animals contain inosine (I), a deaminated product of adenine, at this position. Inosine can form nonstandard base pairs with A, C, and U (Figure 4-28). A tRNA with inosine in the wobble position thus can recognize the corresponding mRNA codons with A, C, or U in the third (wobble) position (see Figure 4-27). For this reason, inosine-containing tRNAs are heavily employed in translation of the synonymous codons that specify a single amino acid. For example, four of the six codons for leucine have a 3′ A, C, or U (see Table 4-2); these four codons are all recognized by the same tRNA (3′-GAI-5′), which has inosine in the wobble position of the anticodon (and thus recognizes CUA, CUC, and CUU), and uses a G·U pair in position 1 to recognize the UUA codon.
Figure 4-28. The nonstandard, wobble base pairs U·G, C·I, A·I, and U·I.
The heavy black lines indicate the bonds by which the nitrogenous bases attach to the 1′ carbon of ribose (C1).
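The wobble rules can be written as a lookup from the anticodon's first base to the codon third bases it accepts. In the Python sketch below, the entries for G and I follow the text; the entries for C, A, and U are taken from Crick's wobble hypothesis and are an addition here. Only the wobble position is relaxed, so the G·U pair at codon position 1 mentioned above for UUA is not modeled. The name `codons_recognized` is hypothetical.

```python
# Strict Watson-Crick partners, used for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases accepted by each anticodon wobble (first) base.
# G and I are as described in the text; C, A, U follow Crick's wobble rules.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "ACU"}


def codons_recognized(anticodon):
    """Given an anticodon written 5'->3', list the codons (5'->3') it can read."""
    wobble, middle, last = anticodon
    first = WATSON_CRICK[last]      # anticodon base 3 pairs with codon base 1
    second = WATSON_CRICK[middle]   # anticodon base 2 pairs with codon base 2
    return sorted(first + second + third for third in WOBBLE[wobble])


print(codons_recognized("GAA"))  # ['UUC', 'UUU'], the two phenylalanine codons
print(codons_recognized("IAG"))  # ['CUA', 'CUC', 'CUU'], i.e. anticodon 3'-GAI-5'
```

Because codon and anticodon pair antiparallel, the anticodon is read in reverse when building the codon, which is why the wobble base is the anticodon's first base but the codon's third.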
Aminoacyl-tRNA Synthetases Activate Amino Acids by Linking Them to tRNAs
Recognition of the codon or codons specifying a given amino acid by a particular tRNA is actually the second step in decoding the genetic message. The first step, attachment of the appropriate amino acid to a tRNA, is catalyzed by a specific aminoacyl-tRNA synthetase (see Figure 4-25). Each of the 20 different synthetases recognizes one amino acid and all its compatible, or cognate, tRNAs. These coupling enzymes link an amino acid to the free 2′ or 3′ hydroxyl of the adenosine at the 3′ terminus of tRNA molecules by a two-step ATP-requiring reaction (Figure 4-29). About half the aminoacyl-tRNA synthetases transfer the aminoacyl group to the 2′ hydroxyl of the terminal adenosine (class I), and about half to the 3′ hydroxyl (class II). In this reaction, the amino acid is linked to the tRNA by a high-energy bond and thus is said to be activated. The energy of this bond subsequently drives the formation of peptide bonds between adjacent amino acids in a growing polypeptide chain. The equilibrium of the aminoacylation reaction is driven further toward activation of the amino acid by hydrolysis of the high-energy phosphoanhydride bond in pyrophosphate. The overall reaction is: amino acid + ATP + tRNA → aminoacyl-tRNA + AMP + PPi.
Figure 4-29. Aminoacylation of tRNA. Amino acids are covalently linked to tRNAs by aminoacyl-tRNA synthetases.
Each of these enzymes recognizes one kind of amino acid and all the cognate tRNAs that recognize codons for that amino acid. The two-step aminoacylation …
The amino acid sequences of the aminoacyl-tRNA synthetases (ARSs) from many organisms are now known, and the three-dimensional structures of over a dozen enzymes of both classes have been solved. Each of these enzymes has a rather precise binding site for ATP (GTP is not admitted and CTP and UTP are too small) and binding pockets for its specific amino acid. Class I and class II enzymes bind to opposite faces of the incoming tRNAs. The binding surfaces of class I enzymes tend to be somewhat complementary to those of class II enzymes. These different binding surfaces and the consequent alignment of bound tRNAs probably account in part for the difference in the hydroxyl group to which the aminoacyl group is transferred (Figure 4-30). Because some amino acids are so similar structurally, aminoacyl-tRNA synthetases sometimes make mistakes. These are corrected, however, by the enzymes themselves, which check the fit in the binding pockets and facilitate deacylation of any misacylated tRNAs. This crucial function helps guarantee that a tRNA delivers the correct amino acid to the protein-synthesizing machinery.
Figure 4-30. Recognition of a tRNA by aminoacyl synthetases. Aspartyl-tRNA synthetase (AspRS) is a class II enzyme, and arginyl-tRNA synthetase (ArgRS) is a class I enzyme.
Shown here are the outlines of the three-dimensional structures of the two synthetases. …
Each tRNA Molecule Is Recognized by a Specific Aminoacyl-tRNA Synthetase
The ability of aminoacyl-tRNA synthetases to recognize their correct cognate tRNAs is just as important to the accurate translation of the genetic code as codon-anticodon pairing. Once a tRNA is loaded with an amino acid, codon-anticodon pairing directs the tRNA into the proper ribosome site; if the wrong amino acid is attached to the tRNA, an error in protein synthesis results.
As noted already, each aminoacyl-tRNA synthetase can aminoacylate all the different tRNAs whose anticodons correspond to the same amino acid. Therefore, all these cognate tRNAs must have a similar binding site, or "identity element," that is recognized by the synthetase. One approach for studying the identity elements in tRNAs that are recognized by aminoacyl-tRNA synthetases is to produce synthetic genes that encode tRNAs with normal and various mutant sequences by techniques discussed in Chapter 7. The normal and mutant tRNAs produced from such synthetic genes then can be tested for their ability to bind purified synthetases.
Very probably no single structure or sequence completely determines a specific tRNA identity. However, some important structural features of several E. coli tRNAs that allow their cognate synthetases to recognize them are known. Perhaps the most logical identity element in a tRNA molecule is the anticodon itself. Experiments in which the anticodons of methionine tRNA (tRNA^Met) and valine tRNA (tRNA^Val) were interchanged showed that the anticodon is of major importance in determining the identity of these two tRNAs. In addition, x-ray crystallographic analysis of the complex between glutamine aminoacyl-tRNA synthetase (GlnRS) and glutamine tRNA (tRNA^Gln) showed that each of the anticodon bases neatly fits into a separate, specific "pocket" in the three-dimensional structure of GlnRS. Thus this synthetase specifically recognizes the correct anticodon.
However, the anticodon may not be the principal identity element in other tRNAs (see Figure 4-30). Figure 4-31 shows the extent of base sequence conservation in E. coli tRNAs that become linked to the same amino acid. Identity elements are found in several regions, particularly the end of the acceptor arm. A simple case is presented by tRNA^Ala: a single G·U base pair (G3·U70) in the acceptor stem is necessary and sufficient for recognition of this tRNA by its cognate aminoacyl-tRNA synthetase. Solution of the three-dimensional structure of additional complexes between aminoacyl-tRNA synthetases and their cognate tRNAs should provide a clear understanding of the rules governing the recognition of tRNAs by specific synthetases.
Figure 4-31. Identity elements in tRNA involved in recognition by aminoacyl-tRNA synthetases, as demonstrated by both conservation and experimentation.
The 67 known tRNA sequences in E. coli were compared by computer analysis. The conserved nucleotides in different …
Ribosomes Are Protein-Synthesizing Machines
If the many components that participate in translating mRNA had to interact in free solution, the likelihood of simultaneous collisions occurring would be so low that the rate of amino acid polymerization would be very slow. The efficiency of translation is greatly increased by the binding of the mRNA and the individual aminoacyl-tRNAs to the most abundant RNA-protein complex in the cell: the ribosome. This two-part machine directs the elongation of a polypeptide at a rate of three to five amino acids added per second. Small proteins of 100–200 amino acids are therefore made in a minute or less. On the other hand, it takes 2 to 3 hours to make the largest known protein, titin, which is found in muscle and contains 30,000 amino acid residues. The machine that accomplishes this task must be precise and persistent.
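The timing figures follow from simple division, as the Python sketch below shows. It assumes a constant elongation rate of 3 to 5 residues per second and ignores initiation and termination; `synthesis_time_seconds` is an illustrative helper name.

```python
def synthesis_time_seconds(n_residues, rate_per_second):
    """Time to polymerize n_residues at a constant elongation rate."""
    return n_residues / rate_per_second


SLOW, FAST = 3, 5  # amino acids added per second, the range quoted in the text

for name, length in [("100-residue protein", 100),
                     ("200-residue protein", 200),
                     ("titin (30,000 residues)", 30_000)]:
    fastest = synthesis_time_seconds(length, FAST)
    slowest = synthesis_time_seconds(length, SLOW)
    print(f"{name}: {fastest:.0f} to {slowest:.0f} s "
          f"({fastest / 3600:.2f} to {slowest / 3600:.2f} h)")
```

Under these assumptions a 100-residue protein takes 20 to 33 seconds and a 200-residue protein 40 to 67 seconds, while titin takes roughly 1.7 to 2.8 hours, in line with the "minute or less" and "2 to 3 hours" estimates.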
With the aid of the electron microscope, ribosomes were first discovered as discrete, rounded structures prominent in animal tissues secreting large amounts of protein; initially, however, they were not known to play a role in protein synthesis. Once reasonably pure ribosome preparations were obtained, radiolabeling experiments showed that radioactive amino acids first were incorporated into growing polypeptide chains associated with ribosomes before appearing in finished chains.
A ribosome is composed of several different ribosomal RNA (rRNA) molecules and more than 50 proteins, organized into a large subunit and a small subunit. The proteins in the two subunits differ, as do the molecules of rRNA. The small ribosomal subunit contains a single rRNA molecule, referred to as small rRNA; the large subunit contains a molecule of large rRNA and one molecule each of two much smaller rRNAs in eukaryotes (Figure 4-32). The ribosomal subunits and the rRNA molecules are commonly designated in svedbergs (S), a measure of the sedimentation rate of suspended particles centrifuged under standard conditions (Chapter 3). The lengths of the rRNA molecules, the quantity of proteins in each subunit, and consequently the sizes of the subunits differ in prokaryotic and eukaryotic cells. (The small and large rRNAs are about 1500 and 3000 nucleotides long in bacteria and about 1800 and 5000 nucleotides long in humans.) Perhaps of more interest than these differences are the great structural and functional similarities among ribosomes from all species. This consistency is another reflection of the common evolutionary origin of the most basic constituents of living cells.
Figure 4-32. The general structure of ribosomes in prokaryotes and eukaryotes.
In all cells, each ribosome consists of a large and a small subunit. The two subunits contain rRNAs of different lengths, as well as a different set of proteins. All ribosomes contain two …
The sequences of the small and large rRNAs from several thousand organisms are now known. Although the primary nucleotide sequences of these rRNAs vary considerably, the same parts of each type of rRNA theoretically can form base-paired stem-loops, generating a similar three-dimensional structure for each rRNA in all organisms. Evidence that such stem-loops occur in rRNA was obtained by treating rRNA with chemical agents that cross-link paired bases; the samples then were digested with enzymes that destroy single-stranded rRNA, but not any cross-linked (base-paired) regions. Finally, the intact, cross-linked rRNA that remained was collected and sequenced, thus identifying the stem-loops in the original rRNA. Experiments of this type have located about 45 stem-loops at similar positions in small rRNAs from many different prokaryotes and eukaryotes (Figure 4-33). An even larger number of regularly positioned stem-loops have been demonstrated in large rRNAs. All the ribosomal proteins have been identified and their sequences determined, and many have been shown to bind specific regions of rRNA. It seems clear that the fundamental protein-synthesizing machinery in all present-day cells arose only once and has been modified about a common plan during evolution.
Figure 4-33. Two-dimensional map of the secondary structure of the small (16S) rRNA from bacteria, showing the location of base-paired stems and loops.
In general, the length and position of the stem-loops are very similar in all species, although the exact sequence …
During protein synthesis, a ribosome moves along an mRNA chain, interacts with various protein factors and tRNAs, and very likely even undergoes shape changes. Despite the complexity of the ribosome, great progress has been made both in determining the overall structure of bacterial ribosomes and in identifying reactive sites that bind specific proteins, mRNA, and tRNA and that participate in important steps in protein synthesis. Quite detailed models of the large and small ribosomal subunits from E. coli have been constructed based on cryoelectron microscope and neutron-scattering studies (Figure 4-34). These studies not only have determined the dimensions and overall shape of the ribosomal subunits, but also have localized the positions of tRNAs bound to the ribosome during protein chain elongation. Powerful chemical experiments have also helped unravel the complex interactions between proteins and RNAs. In a technique called footprinting, for example, ribosomes are treated with chemical reagents that modify single-stranded RNA unprotected by binding either to protein or to another RNA. If the total sequence of the RNA is known, then the modified nucleotides can be located within the molecule. (This technique, which is also useful for locating protein-binding sites in DNA, is described in Chapter 10.) Thus the overall structure and function of ribosomes during protein synthesis are finally, after 40 years, yielding to successful experiments. How these results aid in understanding the specific steps in protein synthesis is described in the next section.
Figure 4-34. Overall structure of the E. coli ribosome at 25-Å resolution inferred from cryoelectron microscopy and three-dimensional reconstruction based on the analysis of 4300 individual projections.
(a) This model shows the shapes of the large (blue) and …
Genetic information is copied into mRNA in the form of a commaless, non-overlapping, degenerate triplet code. Each amino acid is encoded by one or more three-base sequences, or codons, in mRNA. Each codon specifies one amino acid, but most amino acids are encoded by multiple codons (see Table 4-2).
The AUG codon for methionine is the most common start codon, specifying the amino acid at the NH2-terminus of a protein chain. Three codons function as stop codons and specify no amino acids.
A reading frame, the uninterrupted sequence of codons in mRNA from a specific start codon to a stop codon, is translated into the linear sequence of amino acids in a protein.
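The reading-frame idea above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `CODON_TABLE` and `translate` are made up for this example, the table is the standard genetic code written in the conventional U, C, A, G order, and the function simply starts at the first AUG it finds and stops at the first in-frame stop codon.

```python
# Standard genetic code, laid out in the conventional U, C, A, G order.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no reading frame
    protein = []
    # Step through the message three bases at a time (nonoverlapping codons).
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break  # stop codons specify no amino acid
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGGCUUGGUAAGG"))
    # Degeneracy: count how many codons encode leucine (L).
    print(sum(1 for aa in CODON_TABLE.values() if aa == "L"))
```

Shifting the start position by one or two bases would yield a different codon series, which is why the start codon fixes the frame.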
Decoding of the nucleotide sequence in mRNA into the amino acid sequence of proteins depends on transfer RNAs and amino-acyl tRNA synthetases (see Figure 4-25).
All tRNAs have a similar three-dimensional structure that includes an acceptor arm for attachment of a specific amino acid and a stem-loop with a three-base anticodon sequence at its end (see Figure 4-26). The anticodon can base-pair with its corresponding codon or codons in mRNA.
Because of nonstandard interactions, a tRNA may base-pair with more than one mRNA codon, and conversely, a particular codon may base-pair with multiple tRNAs.
Each of the 20 aminoacyl-tRNA synthetases recognizes a single amino acid and covalently links it to a cognate tRNA, forming an aminoacyl-tRNA (see Figure 4-29). This reaction activates the amino acid, so it can participate in peptide-bond formation.
The composition of ribosomes — the large ribonucleoprotein complexes on which proteins are synthesized — is quite similar in all organisms (see Figure 4-32). All ribosomes are composed of a small and a large subunit. Each contains numerous different proteins and one rRNA (small or large). The large subunit also contains one accessory RNA (5S).
Analogous rRNAs from many different species fold into quite similar three-dimensional structures containing numerous stem-loops and binding sites for proteins, mRNA, and tRNAs. As a ribosome moves along an mRNA, a region of the large rRNA molecule in each ribosome sequentially binds the aminoacylated ends of incoming tRNAs and probably catalyzes peptide-bond formation (see Figure 4-34).
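For readers approaching stem-loops from the string side, the following Python sketch shows one naive way to spot a candidate stem-loop in an RNA sequence: a short 5' arm, an unpaired loop, and a 3' arm that pairs with the 5' arm when read in reverse. Everything here is an assumption made for illustration (the function name `find_stem_loops`, the fixed stem length, the loop-size limits, and the decision to allow G-U wobble pairs). It ignores thermodynamics entirely, so real structure prediction tools will give different and better answers.

```python
# Allowed base pairs: Watson-Crick pairs plus the G-U wobble pair (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(rna, stem_len=4, min_loop=3, max_loop=8):
    """Return (start, stem_len, loop_len) for each perfect-stem hairpin found."""
    hits = []
    n = len(rna)
    for i in range(n - 2 * stem_len - min_loop + 1):
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # index where the 3' arm begins
            if j + stem_len > n:
                break  # longer loops would run off the end
            five_arm = rna[i:i + stem_len]
            three_arm = rna[j:j + stem_len]
            # The arms are antiparallel: first base of the 5' arm pairs
            # with the last base of the 3' arm, and so on inward.
            if all((a, b) in PAIRS for a, b in zip(five_arm, reversed(three_arm))):
                hits.append((i, stem_len, loop))
    return hits


if __name__ == "__main__":
    # GGGC (stem) + AAAA (loop) + GCCC (stem)
    print(find_stem_loops("GGGCAAAAGCCC"))
```

In formal-language terms, the perfect stem is a reverse-complement repeat separated by a spacer, which is the pattern this scan checks position by position.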
The emergence of viral diseases represents a serious threat to global public health. The past few decades have witnessed several viral epidemics, emerging with increasing frequency, including severe acute respiratory syndrome coronavirus (SARS-CoV, identified in 2002/2003), swine flu (H1N1 influenza, identified in 2009), and later Middle East respiratory syndrome coronavirus (MERS-CoV, identified in 2012). The latest outbreak, coronavirus disease 2019 (COVID-19), is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a highly transmissible and pathogenic virus that has spread all over the world. Coronaviruses are known to cause disease in humans, other mammals, and birds. Many coronavirus species have been identified so far, of which only six had been known to cause disease in humans: HCoV-229E, HCoV-NL63, HCoV-OC43, HCoV-HKU1, SARS-CoV, and MERS-CoV. 1, 2 Of these six species, the first four are locally endemic and cause mild symptoms in humans; the other two can cause severe illness. 3 In late December 2019, a new viral respiratory illness appeared, initially as a pneumonia epidemic of unknown cause, and spread rapidly around the world, becoming the focus of global attention. The virus was tentatively named 2019 novel coronavirus (2019-nCoV) and has since been named SARS-CoV-2 by the International Committee on Taxonomy of Viruses (ICTV). 4 This new virus belongs to the subgenus Sarbecovirus of the genus Betacoronavirus in the family Coronaviridae. Viruses in the family Coronaviridae possess a single-stranded, positive-sense ribonucleic acid (RNA) genome ranging from 26 to 32 kb in length. SARS-CoV-2 is genetically more closely related to SARS-CoV than to MERS-CoV, but all three are betacoronaviruses believed to have originated in bats.
Next-generation sequencing (NGS) and phylogenetic analysis of the genome revealed that 2019-nCoV was closely related (88% identical) to two bat-derived SARS-like coronaviruses and more distant from SARS-CoV (79%) and MERS-CoV (50%). At the amino acid level, 2019-nCoV is quite similar to SARS-CoV, but there are some notable differences. For example, the 8a protein present in SARS-CoV is absent in 2019-nCoV; the 8b protein is 84 amino acids in SARS-CoV but longer in 2019-nCoV, with 121 amino acids; and the 3b protein is 154 amino acids in SARS-CoV but shorter in 2019-nCoV, with only 22 amino acids. Like SARS-CoV and MERS-CoV, SARS-CoV-2 can lead to lethal pneumonia. It also has a stronger human-to-human transmission capacity than SARS-CoV and MERS-CoV. 5, 6 COVID-19 spreads primarily through close contact with respiratory droplets generated by infected individuals. 7 The clinical manifestations of COVID-19 range from mild flu-like symptoms to life-threatening conditions. Most infected patients have reported a high fever, while some showed dyspnoea and invasive lesions in both lungs on chest radiographs. However, patients infected with SARS-CoV-2, mostly younger people who make up the majority of the workforce and are more likely to be socially active, may have mild disease or even be asymptomatic and can still transmit the virus to others. This might be crucial in the further spread of the disease. Early recognition of infected persons and cutting off the route of transmission are key to controlling COVID-19. However, most people with asymptomatic infections do not seek medical assistance, owing to the absence of obvious clinical signs and poor prevention awareness, which contributes to the rapid spread of COVID-19. Preventing and controlling transmission from this specific type of patient is therefore a great challenge that requires more attention worldwide.
COVID-19 has become a major global public health concern, spreading very fast and causing significant morbidity and mortality around the world. As the disease spread to many countries, the WHO declared the outbreak a public health emergency of international concern on January 30, 2020, and subsequently, on March 11, 2020, declared the rapidly spreading coronavirus outbreak a pandemic. According to WHO COVID-19 situation report–147, as of August 22, 2020, the disease had spread to 213 countries, with 22,812,491 confirmed cases and 795,132 deaths recorded worldwide.
Early detection with fast and sensitive assays and timely intervention are crucial for interrupting the spread of the COVID-19 virus (SARS-CoV-2). 8 In the absence of suitable antiviral drugs or vaccines, a reliable method for detecting SARS-CoV-2 infection is critical not only for the prevention and control of the COVID-19 pandemic but also for clinical treatment. Many countries throughout the world have employed combinations of containment and mitigation strategies together with early diagnosis of COVID-19. Effective strategies need to be implemented at the right time to prevent the spread of disease within the population. In addition, early diagnosis is essential for prompt intervention in patients who are at higher risk of developing serious complications when infected. At present, several diagnostic methods are available for early detection of the virus and the disease, each with its advantages and disadvantages (Table 1). However, successful detection depends on several factors, such as the time of testing relative to the onset of illness, the concentration of virus in the sample, the quality of the specimen collected, how it is processed, and the precise formulation of the reagents in the test kits. 28
| Diagnostic technology | Detection | Type of sample | Laboratory/point of care test | Advantage | Disadvantage | Reference |
| --- | --- | --- | --- | --- | --- | --- |
| RT-PCR | Viral RNA | Nasopharyngeal swab, sputum, stool | Laboratory based | Specific detection and time saving | False negative results occur at low viral load | 5, 9-13 |
| ddPCR | Viral RNA | Nasopharyngeal swab, sputum, stool | Laboratory based | Detects SARS-CoV-2 accurately in low-viral-load samples and minimizes false negative results | More expensive and susceptible to exogenous contamination | 14, 15 |
| Nested RT-PCR | Viral RNA | Nasopharyngeal swab | Laboratory based | Detects SARS-CoV-2 accurately in low-viral-load samples and minimizes false negative results | Opening the lid after the first round of nested PCR increases the risk of cross-contamination, which may lead to false positive results | 16 |
| RT-LAMP | Viral RNA | Nasopharyngeal swab, sputum, stool | Laboratory based/point of care | Sensitive and specific results in less time; results can be read by eye; does not need a thermocycler | Cumbersome to optimize primers and reaction conditions | 87 |
| Penn-RAMP technology | Viral RNA | Nasopharyngeal swab | Laboratory based/point of care | Sensitivity of one-tube Penn-RAMP is 10 times higher than LAMP and RT-PCR | Requires more costly reagents, such as polymerase and primers, than LAMP | 17 |
| CRISPR | Viral RNA | Nasopharyngeal swab, bronchoalveolar lavage fluid | Laboratory based | Easy to perform and low cost. The STOP COVID test does not even require RNA extraction | Sometimes gives false results | 18 |
| Microarray | Viral RNA | Nasopharyngeal swab, bronchoalveolar lavage fluid | Laboratory based | Can identify any mutations associated with SARS-CoV and can detect SNPs associated with the spike (S) gene of SARS-CoV-2 | High cost of a single experiment, the large number of probe designs based on sequences of low specificity, and the lack of control over the pool of analyzed transcripts | 19 |
| Viral sequencing | Viral RNA | Nasopharyngeal swab | Laboratory based | Convenient, high sensitivity, suitable for detecting samples with low viral load | Sophisticated instruments, increased cost, requires trained personnel | 20 |
| Serology | Antibody/antigen | Blood | Laboratory based/point of care | Less complex than molecular tests | Antibody detection tests targeting COVID-19 may cross-react with other pathogens, and false negative results may occur at low antigen levels in the early stage of infection | 21 |
| ELISA | Antibody | Blood | Laboratory based | Simple and cheap laboratory technique, important for vaccine development and convalescent plasma therapy | ELISA tests are not yet well established for SARS-CoV-2 (COVID-19) testing | 22 |
| CT scan | Lungs | Images of chest | Point of care | CT scans have a higher sensitivity (86–98%) and improved false negative rates compared to RT-PCR | Specificity is low (25%) because the imaging features overlap with other viral pneumonia | 23 |
| Biosensor | S protein of SARS-CoV-2 | Nasopharyngeal swab, sputum, stool | Point of care | Quick detection without any pretreatment of the sample | Cost-effective | 24 |
| Biomarker | C-reactive protein, lymphocytes, and SARS-CoV-2 RNA | Nasopharyngeal swab and blood | Laboratory based | Easy, nonmicrobiologic rapid tests | The same biomarkers are also abnormal in other illnesses and are not specific enough to establish a diagnosis of COVID-19 | 25 |
| Nanotechnology | SARS-CoV-2 RNA | Nasopharyngeal swab | Point of care | High sensitivity, adopted in fully automated RNA extraction systems, excellent RNA binding performance | Complex pretreatment steps, requires skilled personnel, more expensive than qRT-PCR, with the risk of photobleaching | 26 |
| Viral culture | In vitro live virus | Human epithelial cell lines | Laboratory based | Important for identification, detection of mutation and development of vaccine | Low sensitivity as isolation is not 100% | 27 |
- Abbreviations: COVID-19, coronavirus disease; CRISPR, clustered regularly interspaced short palindromic repeats; CT, computed tomography; ddPCR, droplet digital polymerase chain reaction; ELISA, enzyme-linked immunosorbent assay; LAMP, loop-mediated isothermal amplification; qRT-PCR, real-time quantitative reverse transcription PCR; RNA, ribonucleic acid; RT-PCR, reverse transcription PCR; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; SNP, single nucleotide polymorphism.
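The CT scan row quotes a sensitivity of 86–98% and a specificity of 25%. A short Python sketch shows how such figures turn into predictive values through Bayes' rule. The 10% prevalence used below is purely a hypothetical input chosen for illustration and does not come from the table; the function name `predictive_values` is likewise made up for this example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(disease | positive result)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | negative result)
    return ppv, npv


if __name__ == "__main__":
    # Sensitivity range and specificity from the CT scan row;
    # prevalence is an assumed value, not taken from the source.
    for sens in (0.86, 0.98):
        ppv, npv = predictive_values(sens, specificity=0.25, prevalence=0.10)
        print(f"sensitivity={sens:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With low specificity, most positive results at modest prevalence are false positives, which is the practical meaning of the disadvantage listed for CT.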
To date, the available diagnostic methods for COVID-19 fall broadly into two categories: molecular and serological tests. The first category, molecular assays for detection of SARS-CoV-2 viral RNA, uses reverse transcription-polymerase chain reaction (RT-PCR)-based methods. Other methods, such as isothermal nucleic acid amplification assays, including transcription-mediated amplification, and clustered regularly interspaced short palindromic repeats (CRISPR)-based methodologies, are promising alternatives. Based on the SARS-CoV-2 structure (Figure 1), various genes such as E, N, S, ORF, and RdRp are targeted for molecular detection. In addition, viral genomic sequencing, a sophisticated technique, has become an essential tool for determining the rate and degree of mutational variability associated with SARS-CoV-2. This technique can also be employed to identify newly emerging strains of the virus for more effective vaccine development.
The second category of diagnostic methods includes serological and immunological assays. Serological assays largely rely on detecting antibodies produced by individuals as a result of exposure to the virus, while immunological methods are based on detection of antigenic proteins in infected individuals. Antibodies produced in response to SARS-CoV-2 infection can be detected by serological tests such as enzyme-linked immunosorbent assay (ELISA) and lateral flow immunoassay. Importantly, these diagnostic techniques serve both the detection and the management of the COVID-19 pandemic. Testing for SARS-CoV-2 viral RNA by molecular methods identifies infected individuals during the acute phase of infection and supports contact tracing for disease management. Antibody-based serological testing, on the other hand, subsequently identifies individuals who have developed antibodies to the virus and could be potential convalescent plasma donors. Antibody-based tests can also be used to monitor the immune status of individuals and groups exposed to the virus over time. 29
At present, a race is under way to develop cost-effective point-of-contact test kits and efficient laboratory techniques for quick diagnosis of SARS-CoV-2 infection. This review briefly discusses different diagnostic technologies along with their scope and limitations. It also provides an overview of current developments in COVID-19 diagnostic techniques and products to facilitate future improvement and innovation.
Virology - Midterm 1
2. Research on the replication of bacteriophages and animal virus DNA laid the foundation for understanding the enzymes involved in cellular DNA replication.
3. RNA splicing in eukaryotic cells was first discovered by studying mRNA of DNA viruses.
Serial dilutions of virus suspension are added to a monolayer of bacterial cells. As the virus replicates and spreads, it kills its bacterial host cell leaving a plaque.
MOI = 1 when there is 1 virion per 1 host cell.
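The plaque assay and MOI notes above reduce to simple arithmetic, sketched here in Python. The function names and the example numbers are hypothetical. The titer formula (plaques divided by dilution times volume plated) and the Poisson estimate of the uninfected fraction are the standard textbook relations, with the usual caveat that a plaque count measures infectious particles rather than all virions.

```python
import math


def titer_pfu_per_ml(plaque_count, dilution, volume_plated_ml):
    """Plaque-forming units per mL of the undiluted stock."""
    return plaque_count / (dilution * volume_plated_ml)


def moi(infectious_particles, host_cells):
    """Multiplicity of infection: infectious particles added per host cell."""
    return infectious_particles / host_cells


def fraction_uninfected(m):
    """Poisson estimate of the fraction of cells receiving zero particles."""
    return math.exp(-m)


if __name__ == "__main__":
    # Hypothetical plate: 50 plaques from 0.1 mL of a 10^-6 dilution.
    stock = titer_pfu_per_ml(50, 1e-6, 0.1)
    print(f"stock titer: {stock:.2e} PFU/mL")
    # Hypothetical infection: 1e6 PFU added to 1e6 cells.
    m = moi(1e6, 1e6)
    print(f"MOI: {m:.1f}, uninfected fraction: {fraction_uninfected(m):.3f}")
```

Even at an MOI of 1, the Poisson estimate leaves a sizeable share of cells uninfected, because particles distribute randomly across cells.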
4. Replication of viral genome.
2. Plant viruses penetrate through a wound caused by insects.
3. Enveloped animal viruses fuse their lipid envelope directly with the plasma membrane, releasing into the cell.
RNA viruses (except retroviruses) must synthesize an RNA-dependent RNA polymerase to replicate their genomes. This enzyme is not present in the host cell. It replicates the (+) sense strand of mRNA.
2. Is the appropriate cellular machinery available for the virus to replicate? E.g., some DNA viruses can only replicate in dividing cells, which have high levels of dNTPs available for viral DNA synthesis.
Capsids can assemble spontaneously.
Simple capsids have repeating subunits with identical interactions.
Some viral capsids have icosahedral symmetry.
More complex capsids have repeating subunits interacting in a quasi-equivalent manner.
Many virus capsids are organized as helical tubes which have helical symmetry.
Use TEM --> look at the virus. What shape is it?
Determine if the virus is enveloped or not.
Most have icosahedral capsids.
All dsDNA viruses have unfragmented genomes, that is all genes are contained on a single DNA molecule.
All RNA viruses have linear genomes.
Most genomes are small and capsids are non-enveloped, however the larger nucleocapsids are enveloped, perhaps to protect against degradation.
Have helical nucleocapsids
Enveloped with linear genome.
Many have fragmented genomes, and therefore have high numbers of pseudo-viral particles.
Each genome segment codes for a single mRNA and protein.
Have to carry RNA polymerase.
Capsid provides structure that positions each RNA polymerase molecule and various genome segments, allowing transcription of each segment and directional transfer of each mRNA into the cytoplasm.
Infects and replicates in a wide variety of hosts, perhaps because it carries its own RNA polymerase and is therefore less dependent on the specific cellular environment.
2. Viroids, which do not code for proteins but replicate independently of other viruses -> have autoenzymatic activities, which supports the postulate that RNAs were once able to replicate themselves in the absence of proteins (thought to involve the action of ribozymes.)
2. The regressive (or reduction) hypothesis.
Autonomous organisms initially developed a symbiotic relationship but turned parasitic as it became more dependent on the host due to the loss of previously essential genes. Eventually it was unable to replicate independently becoming an obligate intracellular parasite (virus.)
Proposes that viruses may have been the first replicating entities, existing in a pre-cellular world as self-replicating units which gave rise to cells.
Host receptors have their own functions but have been exploited by the virus.
Receptors are usually highly conserved and are therefore unlikely to undergo mutation (Eg. CD4, LDL Receptor, CXC (chemokine) receptor.)
Many viruses bind to carbohydrate groups on glycoproteins/lipids. (Eg. Influenza binds sialic acid residues on glycoproteins present on virtually all cells.)
Some glycoproteins mediate attachment and fusion (Ex. HA protein of influenza uses the HA1 domain to bind sialic acid and the HA2 domain acts as a membrane fusion factor.)
Viral penetration can be performed in vitro by artificially lowering the pH.
2. Can be imported after partial disassembly in the cytoplasm.
3. Can be imported after disassembly at the nuclear pore.
4. Intact virions can be transported through the nuclear pore complex.
The hepatitis virus is so small that the capsid can move through the nuclear pore directly into the nucleus.
2. Can flood the extracellular space with a soluble receptor.
3. Can block the cellular receptor.
4. Can use lysosomotropic agents, carboxylic ionophores and bafilomycin A1, which inhibit the acidification of the endosome.
5. Can inhibit un-coating (using amantadine or WIN compounds.)
Neuraminidase helps the flu virus break through the host cell membrane.
Tamiflu is an inhibitor of the influenza neuraminidase that binds to the enzyme's active site; thus, the virus cannot leave the cell to infect other cells (Tamiflu blocks the exit of the virus.)
1st - Fibers bind to LPS. Phage exhibits very specific host tropism. If there is any change in the fibers, there will be no binding.
2nd - Core proteins are inserted via the phage tail into the cell, forming the pore. Class 1 genes enter the cell first (mediated by a cell signal.)
3rd - gp16 can begin degrading the peptidoglycan.
4th - DNA enters exposing 3-P which bind E. coli RNA polymerase.
Class 1 genes are required first for class II and III, and to shut down the host functions.
Class II are needed for replication. Class II genes code for enzymes involved in T7 dependent DNA replication. The use of multiple promoters leads to a nested set of overlapping transcripts with a common 3' end.
Class III genes are needed for assembly.
2. Class II controls the activity of T7 RNA polymerase through the encoded lysozyme, which binds to T7 RNA polymerase to reduce its activity.
Lac genes express enzymes involved in lactose breakdown in the presence of lactose (or IPTG, a lactose analogue.) The binding of IPTG triggers the transcription of Lac genes.
Copy it in high numbers using the pET vector.
Isolate your protein of interest using a his-tag you incorporated into your gene.
2. Understanding viral evolution to avoid/infect certain tissues is important to understanding the nature of within-host viral adaptation and the balance between adaptation for maximum transmission and adaptation within the host.
Both bacteria and phage are associated with this mucus.
Tail fiber binds to outer membrane protein LamB (a maltose inducible porin.)
DNA is then injected through into the periplasm.
T1D depends on the balance between auto reactive effector T Cells and regulatory T cells. This balance is influenced by dendritic cells (DC).
- The Picornavirus capsid contains 60 copies of each of the proteins (VP1, VP2, VP3 and VP4.)
- The Poliovirus binds to the poliovirus receptor on the host cell.
- Once bound to the host cell, the genomic RNA passes through pores formed in the cell membrane by capsid proteins VP4 and VP1.
VPg is a protein which is linked to the 5' end of the ssRNA.
A major determinant of paralysis is the stem-loop V in the IRES.
This is a translation factor specific to neurons which has a defective interaction with the RNA.
A point mutation in the stem-loop V is what is responsible for the attenuated Sabin vaccine.
- A full length negative RNA strand is made and then used as a template for (+) strand synthesis.
- Translation is first required to generate replication proteins (the polyprotein is cleaved.)
- RNA synthesis is primed by VPg covalently bound to uridine residues. The uridylylation of VPg allows it to hybridize to the poly(A) tail and serve as a primer for (-) strand synthesis.
- RNA synthesis is completed by the viral RNA-dependent RNA polymerase (encoded by the viral genome.)
- Five protomers self-assemble into pentamers.
- Pentamers and RNA assemble into a provirion.
Symptoms of a viral infection (pain, fatigue, fever) come from the immune response (not the virus itself.)
The first case in an epidemic is called the index case.
The only difference is that flavivirus undergoes cap-dependent translation.
All genes are translated as a single polyprotein followed by proteolytic cleavage (same as Picornavirus.)
Initial stage --> High fever and muscle pain, headache and severe vomiting.
Quiescent Stage --> Fever and Symptoms disappear
Final Stage --> Repeat of original symptoms, only more severe. Can lead to multi-system organ failure.
- Dengue fever
- Dengue hemorrhagic fever
- Dengue shock syndrome.
Leads to death in 5% of cases.
- Dengue targets areas with high white blood cell counts (liver, spleen, lymph nodes, bone marrow and glands.)
- Dengue enters the white blood cells and lymphatic tissues.
- The flavivirus E protein covers the entire surface and directs both binding to the host receptors and fusion of viral and cellular membranes. The E protein is immunogenic. The host flavivirus receptor is currently unknown.
Following attachment, entry into the cell is mediated by endocytosis. The vesicle fuses with late endosomes, resulting in a drop in pH. Acidification results in rearrangement of the E protein dimer into its fusion-active trimeric state.
- The signal sequence is cleaved in the lumen of the ER, releasing the polyprotein.
- The mature C protein is cleaved by NS1, releasing the mature capsid protein.
- Incubation period lasts from 12-23 days.
- Viral replication occurs in the nasopharynx and regional lymph nodes.
- Viremia occurs within 5-7 days.
- Placenta and fetus are both infected during viremia.
Once bound, the virus then undergoes receptor-mediated endocytosis.
The drop in pH causes a conformational change in E1 and E2 which leads to membrane fusion and exposes the hydrophobic fusion peptide.
Non-structural proteins are translated directly from the genomic RNA as a polyprotein which is cleaved by viral proteases.
In the early phase of infection, P1234 is proteolytically processed and P4 is cleaved off. P2 codes for a protease and P4 codes for RNA polymerase. P4 binds to P123 and forms a complex which allows for the synthesis of the (-) sense anti-genome RNA.
The anti-genome serves as a template for both full-length and subgenomic RNA synthesis.
Cytoplasmic RNA Scaffolds
Several examples of flexible cytoplasmic RNA scaffolds and decoys have recently come to light, particularly involving cytoplasmic lncRNAs and mRNAs. These, together with a long-known ribosome-associated flexible RNA scaffold, are now discussed (Figure 2 and Table 1).
7SL – a paradigm of cytoplasmic RNA scaffolding. (1) Proteins with N-terminal signal peptides (green) are bound during translation by the SRP complex, scaffolded by 7SL. (2) Signal peptide binding induces an SRP conformational change and tighter binding, thus blocking the ribosomal A-site and stalling translation elongation. (3) SRP interaction with the SRP receptor positions the nascent peptide for entry into the ER translocon. (4) Dissociation of SRP relieves the elongation stall, and the nascent peptide extends into and folds within the ER lumen.
7SL Scaffolds Signal Recognition Particle
The signal recognition particle (SRP) is a well-studied RNP complex that is scaffolded by an RNA Polymerase III-derived RNA (7SL; Figure 2). SRP recognizes and binds to N-terminal signal peptides present on proteins destined for secretion or membrane localization as they emerge from a ribosome. SRP binding to signal peptides causes a stall in translation elongation, and targets the ribosome to a translocon pore in the ER membrane where translation then resumes, resulting in the unstructured nascent peptide being threaded into the ER lumen, where folding and later trafficking can ultimately occur (Keenan et al., 2001).
The human 7SL lncRNA is a nt long structured RNA characterized by helical regions forming 2 domains separated by a flexible linker, a smaller domain (Alu) and a larger domain (S), with each part recruiting specific proteins (6 in total) that form the SRP (Pool, 2005). The Alu domain of 7SL preferentially binds the SRP14/SRP9 heterodimer whose function involves binding the ribosome and inhibiting translation elongation after signal recognition by obscuring access to the ribosomal A-site, while the S domain binds the heterodimer SRP68/SRP72, SRP19 and SRP54, and binds in proximity to the nascent peptide tunnel (Siegel and Walter, 1988; Wild et al., 2001). Besides signal peptide binding (via SRP54), the S domain also mediates SRP interactions with the SRP receptor at the ER membrane (Gowda et al., 1997; Pool et al., 2002). Crystal structure analyses suggest that SRP68/SRP72 heterodimer binding of 7SL drives an initial conformation change enabling proper interaction of SRP with the ribosome (Grotwinkel et al., 2014). Furthermore, cryo-EM studies indicate that SRP bound to the elongating ribosome exists in at least two distinct states. In the “scanning” state, SRP samples nascent peptides for signal peptide sequences. If none are detected, elongation factor binding readily displaces the Alu domain away from the ribosomal A-site, whereas in the “engaged” state, with SRP54 bound to the nascent peptide, the Alu domain remains robustly bound near the A-site, thus inhibiting translation (Voorhees and Hegde, 2015). Thus, dynamic flexibility of the 7SL RNA scaffold is critical to SRP function.
Cytoplasmic lncRNAs as RNA Scaffolds and Decoys
While most lncRNAs exhibit a significant bias toward nuclear localization, many also localize in the cytoplasm and thus unsurprisingly, are increasingly being linked to an array of diverse scaffolding and decoy roles.
lncRNAs can bind to mRNA in trans, and scaffold complexes that impact mRNA function. For example, numerous lncRNAs harboring Alu elements (“1/2-sbsRNAs”) were identified that form partially complementary duplexes in trans with Alu elements present in various mRNA 3′UTRs (Gong and Maquat, 2011; Figure 3). Upon lncRNA-mRNA binding, the resulting double-stranded RNA serves as a binding site for the RBP Staufen, which in turn recruits the RNA helicase Upf1, resulting in degradation of many target mRNAs by a process termed Staufen-mediated decay (SMD) (Kim et al., 2005). SMD regulation by lncRNAs is complex, given that a) several different Alu-containing lncRNAs can promote decay of the same mRNA; b) a given Alu-containing lncRNA can regulate several mRNAs; and c) not all mRNA targets with complementarity to a given Alu-containing lncRNA necessarily undergo decay, possibly owing to other structural aspects of the mRNA-protein complex (mRNP) (Gong and Maquat, 2011; Gong et al., 2012; Park and Maquat, 2013). In another case, p21, a highly studied lncRNA transcriptional target of p53 with many nuclear functions, also forms hybrids with target cytoplasmic mRNAs (Figure 3). Specifically, p21 exhibits partial base pair complementarity and affinity purifies with CTNNB1 and JUNB mRNAs, which encode oncogenic proteins β-catenin and JunB (Yoon et al., 2012). Based on knockdown studies, p21 exerts an inhibitory effect on translation of these mRNAs, rather than mRNA decay. Consistent with this, p21 also affinity purifies with the translational repressors RCK and FMRP, and fractionates in polysomes, suggesting that p21 pairs to and recruits translation repressors to mRNA targets (Yoon et al., 2012).
Examples of cytoplasmic RNA scaffolds and decoys. Scaffold and decoy RNAs are depicted in pink. (A) mRNA translation: lncRNA-p21 partially base pairs with target mRNAs, leading to the recruitment of translation repression factors like RCK and FMRP, thus inhibiting mRNA translation. (B) mRNA decay: 1/2-sbsRNAs base pair with target mRNAs; the resulting dsRNA recruits Staufen, leading to Staufen-mediated mRNA decay. (C) miRNA sequestration: various lncRNAs (linear and circular) in multiple species, regulated in a tissue, developmental or environmentally sensitive manner, can base-pair with and sequester miRNAs, preventing their regulation of mRNA translation or decay. (D) Protein degradation: HOTAIR lncRNA binds to two ubiquitin ligases and their substrates, causing their ubiquitination and degradation.
A second example of 3′UTR co-translationally scaffolding assembly of protein complexes involves the Baculoviral IAP repeat-containing protein 3 (BIRC3) gene. BIRC3 encodes a ubiquitin ligase, and alternative cleavage and polyadenylation generates short (BIRC3-SU) and long (BIRC3-LU) 3′UTR mRNA isoforms, the latter of which is significantly upregulated in chronic lymphocytic leukemia (CLL) cells (Lee and Mayr, 2019). While not affecting BIRC3 mRNA or protein localization, or protein levels, BIRC3-LU is specifically bound by HuR and Staufen, which in turn recruits trafficking-regulating proteins (IQ motif containing GTPase IQGAP1, and the Ras GTPase RALA) (Lee and Mayr, 2019). These interact with nascently translated BIRC3 such that the 3′UTR scaffolds formation of a BIRC3-RALA-IQGAP1 complex. This complex in turn binds and promotes plasma membrane recycling of a receptor protein called CXCR-4, which promotes CLL progression by aiding malignant B cell migration and survival via targeting to protective bone marrow niches (Burger et al., 2005). Finally, the BIRC3-LU 3′UTR, distinct from the BIRC3-SU isoform, uniquely binds proteins involved in mitochondrial biology and chromatin remodeling, suggesting that multiple functionally distinct protein complexes may be scaffolded by this particular 3′UTR (Lee and Mayr, 2019).
While best known for its nuclear function HOTAIR is also found in the cytoplasm, particularly under cellular conditions such as senescence (Yoon et al., 2013). Cytoplasmic HOTAIR serves as a binding site for two ubiquitin ligases DZIP3 and MEX3B, and their substrates Snurportin 1 (SNUPN) and Ataxin-1 (ATXN1) this binding leads to increased SNUPN and ATXN1 ubiquitination and their degradation. HOTAIR accumulation is necessary for eliciting proper cellular senescence phenotypes in response to inducing stimuli, possibly in part through SNUPN and ATXN1 degradation (Yoon et al., 2013). Both HOTAIR and p21 are downregulated by binding of the RBP HuR, which normally stabilizes mRNAs, but in the case of these lncRNAs, stimulates assembly and recruitment of a miRNA silencing complex (Let-7/Ago2) to elicit their decay (Yoon et al., 2012). Thus, cytoplasmic lncRNAs can regulate translation and protein turnover events and can be subject to miRNA-mediated regulation of their stability.
Conversely, a well appreciated mode of lncRNA function in the cytoplasm is to act as decoys for miRNAs (aka sponges; Figure 3). First described in plants, where the lncRNA IPS1 sequesters miR-399 as part of the regulation of phosphate homeostasis (Franco-Zorrilla et al., 2007), several lncRNA miRNA decoys are now known, some of which harbor multiple sites for distinct miRNAs and can exist as splicing-intermediate derived circular lncRNAs (Hansen et al., 2013; Memczak et al., 2013). Many miRNA decoys appear highly regulated, with distinct tissue, developmental and environmentally responsive expression patterns, and have the potential to exhibit large-scale effects on mRNA translation and decay, owing to the diversity of miRNA molecules whose function they may impede. We guide readers to other detailed reviews on these topics (Ebert and Sharp, 2010a, b; Yoon et al., 2014; Rong et al., 2017).
mRNAs Scaffold Co-translational Assembly of Protein Complexes
Increasing evidence indicates that mRNAs serve as scaffolding platforms for co-translational protein complex assembly (Natan et al., 2017). In prokaryotes, subunits of a given protein complex are often encoded in operons (a cluster of genes of common function under a single promoter), which are transcribed into polycistronic mRNAs. Translation of polycistronic mRNAs in turn enhances the likelihood of the encoded protein complex subunits interacting. Such interactions can occur on polycistronic mRNAs even when nascently synthesized proteins are still associated with ribosomes, owing in part to their close spatial proximity upon synthesis (Shieh et al., 2015). In this way, the mRNA serves as a scaffolding molecule facilitating nascent protein interactions. Furthermore, adjacent genes in operons tend to exhibit larger protein interaction interfaces (Koonin and Mushegian, 1996; Dandekar et al., 1998; Shieh et al., 2015; Wells et al., 2016). These findings, combined with additional bioinformatic and structural analyses of protein complex assembly, suggest that operon gene order often matches (and thus may help direct) the order in which protein subunits assemble (Wells et al., 2016). Besides spatial proximity of subunits, lower stoichiometric variation in protein complex subunit expression is an additional advantage of protein complex assembly on polycistronic mRNAs (Swain, 2004; Sneppen et al., 2010; Shieh et al., 2015).
Eukaryotes lack operons, meaning any co-translational interactions must involve an in trans event such as recruitment of an interacting protein to a translating mRNP, or juxtaposition of translating mRNPs. Nonetheless, co-translational assembly of protein complexes in eukaryotes has been reported (Natan et al., 2017). Notably, recent affinity purification studies of well characterized protein complexes in yeast (Shiber et al., 2018) and human cells (Kamenova et al., 2019) demonstrate that, in many cases, nascently translating proteins interact strongly with the proteins with which they are destined to form complexes. Co-translational interactions were identified via careful interaction domain mapping, mRNA-protein co-localization microscopy and, critically, by detection of both protein interactors and their mRNA template molecules following purification of a protein complex subunit of interest. Combining nascent peptide purification with ribosome profiling (an RNA sequencing based method to map ribosome mRNA footprints) allowed determination of precisely where ribosomes were located on a given mRNA when their nascent peptides interacted with a protein of interest (Shiber et al., 2018). Co-translational protein interactions could be unidirectional (meaning Protein A and its mRNA are purified by Protein B isolation, but not vice versa) or bidirectional. This depended largely on how close a given protein’s interaction domain was to the C-terminus, as this affects the degree and timing of exposure of the interaction domain from within the translating ribosome (Shiber et al., 2018). Co-translational assembly of protein complexes was important in preventing protein misfolding and increasing the efficiency of protein complex assembly (Shiber et al., 2018; Kamenova et al., 2019). Thus, co-translational assembly of proteins in eukaryotes may be a common phenomenon.
Two primary models for eukaryotic co-translational protein interaction are generally postulated. The first states that a fully synthesized protein interacts with a nascently synthesizing protein, though whether this simply involves random diffusion, co-localization or an active chaperone driven recruiting process is unclear. A second model is that, particularly in the case of bidirectional co-translational protein interactions, translating mRNPs may be tethered to one another in trans by their nascent peptides (Natan et al., 2017; Schwarz and Beck, 2019). Though the above discussed examples do not distinguish between these models, recent data, discussed below, add weight and new insight to both ideas and focus attention on the importance of mRNA as a scaffolding molecule.
mRNA 3′UTR Scaffolding of Protein Complexes
mRNAs possess three primary domains: a 5′UTR, an open reading frame (ORF) and a 3′UTR. While all are implicated in translation, ribosomes ordinarily only transit through 5′UTRs and ORFs, leaving 3′UTRs as regions where more stable interactions with proteins and other RNA molecules can be sustained. 3′UTR sequences are unconstrained by selection for a given protein sequence, are significantly longer than 5′UTRs, and increase in length with organismal complexity (Pesole, 2002; Mayr, 2017). Thus, it is not surprising that while 3′UTR localized interactions clearly govern individual mRNA functions, such as translation, decay and localization (Wilkie et al., 2003; Andreassi and Riccio, 2009), new functions for 3′UTRs are being uncovered.
Pioneering work by the Mayr lab demonstrated that specific mRNA 3′UTRs can scaffold protein-protein interactions in which a 3′UTR bound protein (or proteins) interacts with nascently synthesized protein translated on the mRNA itself, with important physiological consequences (Figure 4). The first report of this phenomenon (Berkovits and Mayr, 2015) concerned the CD47 gene, which encodes a protein capable of both ER and cell surface localization; at this latter site, CD47 protects cells from phagocytosis by macrophages (Oldenborg et al., 2000; Jaiswal et al., 2009). Alternative cleavage and polyadenylation, a highly regulated means by which alternate 3′UTR isoforms are generated (Sandberg et al., 2008; Mayr and Bartel, 2009; Lianoglou et al., 2013), results in two primary CD47 3′UTR mRNA isoforms: a short (CD47-SU) and a long (CD47-LU) isoform. Knockdown of CD47-LU inhibits CD47 protein cell surface localization, independently of any mRNA localization effects, as both short and long CD47 mRNA reporters localized to the ER. Subsequent analysis of 3′UTR binding sites, gene knockdown, RNA-immunoprecipitation and mRNA-protein tethering approaches determined that HuR likely binds the CD47-LU 3′UTR in association with SET. In turn, SET interacts with nascently translated CD47 protein and with the membrane-localizing small GTPase RAC1, which then aids translocation of CD47 (and SET) to the plasma membrane (Ten Klooster et al., 2007). Underscoring the importance of this 3′UTR scaffolding of a protein-localizing complex, cells forced to express either the CD47-LU or CD47-SU in isolation showed significant differences in phagocytic susceptibility (CD47-LU cells more resistant) and induction of apoptosis following γ-irradiation (only CD47-SU cells induce apoptosis) (Berkovits and Mayr, 2015).
mRNAs as cytoplasmic RNA scaffolds. (A) Protein localization: (1) The CD47-LU mRNA 3′UTR recruits HuR and SET. (2) Within TIS11B-ER membrane compartments (TIS granules), nascently translated CD47’s interaction with SET is facilitated. (3) Recruitment of RAC1 by SET results in subsequent translocation of CD47 to the plasma membrane. (B) mRNP granule assembly: (1) The RPS28B 3′UTR is presumed to recruit Edc3 prior to its subsequent interaction, (2) with either nascently or newly translated Rps28. (3) Since translating mRNAs are excluded from mRNP granules, ribosome run-off is likely required for an RPS28B-Edc3-Rps28 RNP complex to help nucleate yeast P-body assembly.
3′UTR driven co-translational assembly of protein complexes likely occurs in all eukaryotes. We recently demonstrated in yeast that a specific mRNA 3′UTR acts as an mRNA scaffold promoting the assembly of P-bodies (Fernandes and Buchan, 2020), which are cytoplasmic membrane-less organelles enriched in non-translating mRNPs. The mRNA in question is RPS28B, one of two RPS28 paralogs (which code for ribosomal protein S28). RPS28B has an unusually long 3′UTR (643 nts) harboring a stem-loop structural element that binds Edc3 (He et al., 2014), a protein involved in P-body assembly (Decker et al., 2007). Using genetics, microscopy, and protein interaction assays, we demonstrated that Edc3 recruited by the RPS28B 3′UTR binds nascently or newly translated Rps28b in a manner dependent on the RPS28 ORF and 3′UTR being in cis (Figure 4). The resulting Edc3-Rps28 interaction in turn is key to normal P-body assembly.
To our knowledge, RPS28B is the first case of a specific mRNA implicated in scaffolding an mRNP granule, and it is likely that other RNA molecules (like NEAT1 in paraspeckles) will possess potent scaffolding potential for RNP granules, based on their potentially high interaction valency (either with proteins or other RNA molecules; Van Treeck and Parker, 2018; Van Treeck et al., 2018) and ability to exist in a non-translating state. Indeed, recent data argue that poorly translated, long RNAs preferentially accumulate within RNP granules such as P-bodies and stress granules (Khong et al., 2017). RPS28B also meets these criteria (Ingolia, 2009), though questions remain as to the nature of the Rps28-Edc3 protein interaction, and whether regulation of RPS28B translation rates or abundance may also occur to influence P-body assembly. Finally, our work demonstrates that mRNAs from gene paralogs, in addition to transcript isoforms from a single gene, represent another means to allow mRNA-based regulation of macromolecular complex assembly. Given that 50% of all human genes generate mRNAs with alternative 3′UTRs (Lianoglou et al., 2013), a broad role for 3′UTRs in scaffolding co-translational protein complex assembly seems feasible (Mayr, 2019).
mRNA Co-localization Can Aid Co-translational Protein Interactions
While mRNA localization clearly facilitates protein localization (Holt and Bullock, 2009), co-localization of mRNAs in specific cellular compartments may also aid co-translational protein interactions and/or protein complex assembly. Such a mechanism is an appealing explanation for the co-translational protein assembly mechanisms discussed above. Interestingly, a few studies suggest the existence of “translation factories,” where functionally related mRNAs are spatially concentrated in cellular compartments.
In human cells, several membrane protein-encoding mRNAs, including CD47-LU (but not CD47-SU), harbor binding sites for an RBP called TIS11B. These mRNAs, as well as SET and HuR, coalesce with TIS11B in an ER-intertwined reticular structure termed TIS granules, which rely on TIS11B expression for their assembly (Ma and Mayr, 2018). Importantly, TIS11B expression and formation of TIS granules promote interaction of SET with CD47, and localization of CD47 to the plasma membrane. Based on fluorescence recovery after photobleaching data, this may reflect increased retention of CD47 mRNA and SET within the TIS granule, thus increasing the likelihood of co-translational CD47 interaction with SET (Ma and Mayr, 2018).
In yeast, specific glycolytic mRNAs (Lui et al., 2014) and select translation factor mRNAs (Pizzinga et al., 2019) co-localize in distinct translationally active granules, at least some of which encode proteins that collectively form multi-subunit complexes. Finally, several mRNAs encoding components of the Arp2/Arp3 complex, an actin cytoskeleton-regulating complex, co-localize and at least in some cases are translated at the leading edge of fibroblasts (Mingle et al., 2005; Willett et al., 2013). However, direct testing of whether mRNA co-localization effectively serves as a eukaryotic analog of a prokaryotic operon-based protein complex assembly mechanism remains a question for future study.
Trans-Encoded Base-Pairing sRNAs
In contrast to cis-encoded sRNAs, trans-encoded base-pairing sRNAs are not physically linked to their mRNA target, and the formation of RNA duplexes is mediated by short imperfect RNA interactions (Figure 1C). The function of many of the trans-encoded base-pairing sRNAs depends on the RNA-binding protein Hfq, which is thought to enhance the likelihood of a productive interaction between the sRNA and its target (Waters and Storz, 2009). This is in contrast to cis-encoded base-pairing sRNAs, which do not generally require the participation of an RNA chaperone (e.g., Hfq) to bind their target mRNA (Brantl, 2007).
In bacterial pathogens, deletion of the hfq gene often leads to a reduction in virulence, as was observed for E. coli, Salmonella, Shigella, Yersinia, and Listeria (reviewed in Chao and Vogel, 2010). For example, in E. coli, Hfq was shown to regulate the locus of enterocyte effacement (LEE) encoding a type III secretion system (TTSS; Hansen and Kaper, 2009; Shakhnovich et al., 2009). In the intracellular pathogen Salmonella enterica serovar Typhimurium, Hfq is necessary for optimal growth in epithelial cells and macrophages (Sittka et al., 2007). Burkholderia cenocepacia encodes two Hfq homologs and both of them are required for optimal resistance to stress and virulence (Ramos et al., 2011). Deletion of the hfq gene of Staphylococcus aureus has no effect on metabolism but reduces virulence (Bohn et al., 2007; Liu et al., 2010). However, in Neisseria gonorrhoeae, deletion of hfq leads to only a weak reduction of virulence (Dietrich et al., 2009). Moreover, in some bacteria, Hfq is required for the function of some sRNAs but dispensable for others. For example, in V. cholerae, Hfq is required for the control of the quorum sensing systems by the sRNAs Qrr1–Qrr4, but dispensable for the repression of ompA by VrrA (Lenz et al., 2004; Song et al., 2008). It is noteworthy that Helicobacter pylori does not encode an Hfq homolog but still expresses hundreds of sRNAs (Sharma et al., 2010). This suggests that in some bacterial species, the function mediated by Hfq is not necessary for sRNA-mediated gene regulation or that an as yet unknown protein could carry out a similar function. Following genome-wide identification of Hfq-binding sRNAs, it was postulated that even in E. coli, some base-pairing sRNAs might not bind to, or use, Hfq (Zhang et al., 2003). Careful review of the Hfq-related literature led Jousselin et al. (2009) to postulate that the need for Hfq in mRNA–sRNA interaction is related to a number of factors.
First, the higher the overall GC content of the bacterial genome, the more likely Hfq is required; Hfq seems to be dispensable in bacteria whose genomes display a low GC value, such as S. aureus (32% GC). Second, Hfq is dispensable when the sRNA–mRNA interaction is mediated by long (>30) and uninterrupted pairing. Third, they observed a correlation between a requirement for Hfq and the C-terminal extension length of Hfq, which forms an mRNA interaction surface. Hfq proteins that have a short C-terminus tend to be found in bacteria in which Hfq is dispensable.
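The first of these factors rests on a quantity that is easy to compute from a genome sequence. The following is a minimal sketch of a GC-content calculation; the function names and the 0.40 cut-off used to call a genome "low GC" are illustrative assumptions, not values given by Jousselin et al. (2009).

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the unambiguous A/C/G/T/U bases."""
    counted = [base for base in sequence.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


def is_low_gc(sequence: str, threshold: float = 0.40) -> bool:
    """Flag a sequence as 'low GC'.

    The default threshold is a hypothetical cut-off chosen for illustration only.
    """
    return gc_content(sequence) < threshold


if __name__ == "__main__":
    toy_genome = "AATTAATTGC"  # toy sequence, not real genomic data
    print(f"GC content: {gc_content(toy_genome):.2f}")
    print(f"Low GC: {is_low_gc(toy_genome)}")
```

Ambiguous bases such as N are ignored rather than counted as non-GC, so draft assemblies with gaps do not bias the estimate downward.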
In L. pneumophila, deletion of the hfq gene affects the duration of the lag phase after inoculation in fresh broth (McNealy et al., 2005). Moreover, the L. pneumophila hfq mutant shows a reduced growth rate in chemically defined medium containing low concentrations of iron and a reduction in the expression of the ferric uptake regulator (fur). In E. coli, the RyhB sRNA negatively regulates expression of fur in an Hfq-dependent manner (Vecerek et al., 2007). In addition, the L. pneumophila hfq mutant shows a small reduction in intracellular growth (McNealy et al., 2005). The somewhat limited effect of deleting the hfq gene on L. pneumophila phenotypes suggests that Hfq is not critical for sRNA–mRNA interactions in this organism. The GC content of the L. pneumophila genome is low (38%), and alignment of its Hfq protein sequence with other homologs (Figure 2) reveals that the C-terminal region is short and comparable in length to that of the V. cholerae Hfq, which is not essential for all mRNA–sRNA interactions. According to the postulates of Jousselin et al. (2009), one could hypothesize that Hfq will not be required for all sRNA–mRNA interactions in L. pneumophila.
Figure 2. Alignment of Hfq proteins from E. coli (Eco), V. cholerae (Vch), L. pneumophila (Lpn), and S. aureus (Sau) was performed with ClustalW2 (Chenna et al., 2003).
Nonetheless, one can speculate that in L. pneumophila, base-pairing sRNAs acting through Hfq may regulate iron acquisition, virulence-related functions and possibly other systems as well, although Hfq function would not be essential for these. Expression profiling of an hfq-deficient L. pneumophila strain would shed light on the importance of Hfq in gene regulation and be of great help in identifying phenotypes that could be affected by it. A similar approach was used for other bacteria such as E. coli (Zhang et al., 2003), S. Typhimurium (Sittka et al., 2008), B. cenocepacia (Ramos et al., 2011), Pseudomonas aeruginosa (Sonnleitner et al., 2006), and N. gonorrhoeae (Dietrich et al., 2009). In addition, immunoprecipitation of Hfq with subsequent identification of bound sRNAs by enzymatic RNA-sequencing (Christiansen et al., 2006), tiling microarray (Zhang et al., 2003), or deep-sequencing (Sittka et al., 2008) would shed light on the mRNA species affected by Hfq and on the potential sRNAs whose functions are at least partially dependent on Hfq. Windbichler et al. (2008) have used an affinity chromatography procedure to identify RNA-binding proteins in E. coli. Briefly, they tagged a number of known sRNAs with a streptomycin-binding RNA aptamer, allowing them to bind to a streptomycin-coated column, which was then used to capture RNA-binding proteins from cellular extracts. They found that three proteins were consistently bound to a variety of sRNA sequences: Hfq, RNAP β-subunit and the small ribosomal subunit S1. Moreover, they showed that specific proteins could interact with a specific sRNA, depending on its sequence and secondary structure. Therefore, a hunt for sRNA-binding proteins is necessary to complete the sRNA-mediated regulatory landscape and to fully understand the extent of their impact on regulation of cellular functions.
In L. pneumophila, a number of trans-encoded base-pairing sRNA candidates have been identified, but mechanistic studies are needed to evaluate their mode of action and to validate them as authentic base-pairing sRNAs (Table 1). Five intergenic RNAs were identified based on computer prediction by using the sRNAPredict software (Faucher et al., 2010). By searching for Rho-independent terminators in intergenic regions preceded by a sequence conserved in other L. pneumophila strains, 143 sRNA molecules were predicted. Using a custom-made microarray, the expression of 101 of these predicted sRNAs was monitored during growth in a variety of conditions. This two-step approach led to the identification of 12 sRNA molecules that were actively expressed, including 6S RNA, six 3′UTRs, and five sRNAs that are independently transcribed (Faucher et al., 2010; Table 1). At this point the functions of the five identified sRNAs are unclear. Interestingly, expression of LprA during exponential growth is dependent on OxyR but dependent on RpoS during post-exponential phase (Figure 3). Since RpoS is an important regulator of virulence, it is tempting to speculate that LprA could be part of its regulatory cascade and play a role in expression of virulence factors. Regardless of the growth phase, the presence of H2O2 induces its expression, which suggests that LprA responds to oxidative stress. This is similar to the E. coli sRNA OxyS, which is part of the oxidative stress response and reduces its mutagenic effects (Altuvia et al., 1997).
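The terminator search step can be illustrated with a toy scan for the hallmark of a Rho-independent terminator: a short stem-loop followed directly by a run of T residues on the coding strand. This is a simplified sketch and not the algorithm of the prediction software used in the cited study; the function name and the stem length, loop range, and T-run length are assumptions chosen only for illustration, and perfect stem complementarity is required.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_terminator_like_hairpins(dna, stem_len=6, min_loop=3, max_loop=8,
                                  min_t_run=4):
    """Return (start, loop_length) for each perfect stem-loop followed by a T run.

    start is the 0-based index of the first base of the 5' stem arm.
    All default parameter values are illustrative assumptions.
    """
    dna = dna.upper()
    n = len(dna)
    hits = []
    for start in range(n - 2 * stem_len - min_loop - min_t_run + 1):
        left_arm = dna[start:start + stem_len]
        expected_right_arm = reverse_complement(left_arm)
        for loop_len in range(min_loop, max_loop + 1):
            right_start = start + stem_len + loop_len
            right_end = right_start + stem_len
            tail_end = right_end + min_t_run
            if tail_end > n:
                break  # longer loops would run past the end of the sequence
            if (dna[right_start:right_end] == expected_right_arm
                    and dna[right_end:tail_end] == "T" * min_t_run):
                hits.append((start, loop_len))
    return hits


if __name__ == "__main__":
    # Toy intergenic region: stem GCCGCC, loop TTCG, stem GGCGGC, then a T tract.
    toy_region = "CCC" + "GCCGCC" + "TTCG" + "GGCGGC" + "TTTTTT" + "A"
    print(find_terminator_like_hairpins(toy_region))
```

A realistic predictor would tolerate mismatches and bulges in the stem and would score candidates thermodynamically; the exact-match scan above only shows the shape of the search.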
Figure 3. Model of the regulatory networks involving sRNAs in L. pneumophila. The lines show interaction between the players: arrow, activation; T bar, repression; dotted line, putative or predicted interaction. See text for details.
RNA-sequencing identified 38 sRNA molecules encoded in intergenic regions that could be considered as potential trans-encoded sRNAs (Weissenmayer et al., 2011; Table 1). Of these, nine were predicted to be functional based on the stability of their predicted secondary structures at 37°C. The predicted structure of one sRNA (Lpr0010) was less stable than 1000 randomly permuted sequences of the same length and base composition at 20 or 37°C, suggesting that it is under evolutionary pressure to form an unstable secondary structure. The biological relevance of this was not explored further, but one can hypothesize that the structure is only stable at low temperatures (less than 20°C) and that it could be part of a cellular response to low temperature. Interestingly, five sRNA pairs were identified in which two distinct sRNAs are transcribed antisense to each other (Weissenmayer et al., 2011). In E. coli, the sRNAs RyeB and SraC are encoded opposite to each other and RyeB is completely complementary to the longer SraC segment (Vogel et al., 2003). The size of SraC is nt, but when RyeB is present a shorter band ( nt) is also detected. This reduction in size seems to be dependent on RNase III, suggesting that RyeB mediates degradation of SraC. For the sRNA pairs identified in Legionella, one sRNA can act as a negative regulator of the other, efficiently sequestering it by extended base-pairing and potentially targeting it for degradation. Moreover, mRNAs can also regulate sRNAs. This mechanism, named trap-RNA, was described for the MicM sRNA that induces degradation of the YbfM porin mRNA. The chb polycistronic mRNA contains a sequence complementary to MicM, and expression of the chb operon leads to MicM hybridization and degradation, resulting in stabilization of the ybfM mRNA (Figueroa-Bossi et al., 2009; Overgaard et al., 2009). Again, additional work is needed to understand the regulatory functions of Legionella trans-encoded base-pairing sRNAs.
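The comparison of a native sequence against randomly permuted sequences of the same length and base composition can be sketched as a simple shuffle test. The sketch below makes a strong simplifying assumption: it scores structure by the maximum number of nested base pairs (a Nussinov-style dynamic program) rather than by a temperature-dependent free energy, so it only illustrates the logic of the test and does not reproduce the thermodynamic predictions of the cited study. All function names are hypothetical.

```python
import random

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with at least min_loop unpaired
    bases enclosed by any pair (a crude stand-in for folding stability)."""
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j is unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with base k
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def shuffle_test(rna: str, n_shuffles: int = 1000, seed: int = 0):
    """Return (native_score, fraction of shuffles scoring >= native).

    Shuffling preserves length and base composition. A fraction close to 1
    means the native sequence is no more structured than random permutations.
    """
    rng = random.Random(seed)
    native = max_base_pairs(rna)
    bases = list(rna.upper().replace("T", "U"))
    at_least_native = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if max_base_pairs("".join(bases)) >= native:
            at_least_native += 1
    return native, at_least_native / n_shuffles


if __name__ == "__main__":
    native_pairs, fraction = shuffle_test("GGGAAACCC", n_shuffles=200)
    print(native_pairs, fraction)
```

To approach the published analysis, the scoring function would be swapped for a minimum free energy calculation at the temperature of interest; the shuffle loop itself would stay the same.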
There are a number of base-pairing sRNAs encoded in other bacterial genomes that are known to affect virulence. A few examples are provided below that might be relevant in the context of L. pneumophila intracellular growth. One intracellular pathogen for which extensive identification and characterization of sRNAs have been and are being performed is Salmonella. In this species, outer membrane protein (OMP) expression is regulated by a network of sRNAs. One of them, InvR, is encoded on the Salmonella pathogenicity island-1, acquired by horizontal gene transfer (HGT) and encoding the TTSS responsible for enterocyte invasion (Pfeiffer et al., 2007). Expression of this sRNA is dependent on HilD, a key regulator of TTSS expression. When the TTSS is expressed, InvR acts as a negative regulator of the synthesis of OmpD, one of the most abundant OMPs in S. Typhimurium. Indirect evidence suggests that repression of OmpD could stabilize the membrane in the context of TTSS expression, allowing successful translocation of bacterial effectors (Vogel, 2009). InvR is therefore thought to have helped establishment of the TTSS sequences after HGT by repressing expression of OMPs that were incompatible with the virulence advantage provided by the TTSS (Vogel, 2009). It is tempting to speculate that similar mechanisms exist in L. pneumophila to repress OMPs during expression of the Icm/Dot system, the Type IVA secretion system (lvr/lvh) or the Tra conjugative system. However, to date, no trans-encoded sRNAs have been identified in the vicinity of these systems, although, as described above, one cis-encoded sRNA is antisense to lvrA (lpg1259).
The sRNA VrrA of V. cholerae is part of the membrane stress response pathway mediated by σE and targets ompA mRNA, presumably to limit synthesis of OMPs (Song et al., 2008). Deletion of vrrA leads to an increase in the synthesis of outer membrane vesicles that are known to be involved in delivery of virulence factors to host cells (Mashburn-Warren and Whiteley, 2006). Moreover, VrrA seems to negatively regulate expression of the adhesion molecule Tcp and therefore affects intestinal colonization (Song et al., 2008). There is structural similarity between VrrA and LprD of L. pneumophila and it is tempting to speculate a role for LprD in the regulation of OMP synthesis. However, structure comparisons of trans-encoded sRNAs have been of limited help for predicting function or targets, and an experimental strategy should be taken to determine if LprD regulates OMP synthesis.
The quorum system of V. cholerae comprises four redundant sRNAs named Qrr1–Qrr4 and two signaling molecules, the furanosyl borate diester (AI-2) and the α-hydroxyketone Cqs (Lenz et al., 2004). At low cell density, the system positively regulates expression of Qrr1–Qrr4, which destabilize the mRNA of hapR, a negative regulator of virulence. Therefore, at low cell density, hapR mRNA is degraded, allowing expression of virulence traits. L. pneumophila also possesses a putative quorum system, based solely on the presence of the α-hydroxyketone Lqs and the LqsR/LqsS two-component system (TCS) (Tiaden et al., 2007; Spirig et al., 2008). Besides the absence of AI-2 signaling in L. pneumophila, the quorum system architectures of L. pneumophila and V. cholerae are quite similar (Tiaden et al., 2010). However, in L. pneumophila no sRNA has been implicated in this regulatory system as yet. Following RNA-sequencing, two sRNAs (Lpr0001 and Lpr0069) were found to have substantial homology both at the sequence and the secondary structure levels, which is reminiscent of the Qrr1–Qrr4 sRNAs (Weissenmayer et al., 2011). A search for homologous sequences throughout the genome revealed 20 more copies of these sRNAs, one (Lpr0049) being partially antisense to lpg2142, which encodes a putative ORF. The consensus structure of these sRNAs is a long stem–loop with two central bulges comprised of nt and two small hairpins extruding from either side of the central stem 20 nt before the loop (Weissenmayer et al., 2011). Many of these sequences were found in other Legionella strains as well, often in the same configuration, which indicates that they are evolutionarily conserved and likely to play a beneficial role. Moreover, both the Lqs system and the homologous sRNA sequences are absent in L. longbeachae. These observations are only suggestive and experimental evidence is needed to link the Lqs quorum sensing system with this group of homologous sRNA sequences.
It is noteworthy that deletion of all four Qrr sRNAs was needed to see a phenotype on the quorum sensing system (Lenz et al., 2004). Since only Lpr0001 and Lpr0069 seem to be expressed at appreciable levels, it might be informative to generate a double lpr0001/lpr0069 mutant and monitor its effect on a population density-related phenotype.
Although the vast majority of base-pairing sRNAs do not encode proteins, there are at least two examples where they do. In E. coli, the sgrS gene encodes an sRNA, SgrS, and a small protein, SgrT, that together regulate glucose uptake by different strategies (Wadler and Vanderpool, 2007). In S. aureus, the sRNA RNA III targets virulence factors and functions as a key regulator of virulence, but also encodes a 26 amino acid long hemolysin (Boisset et al., 2007). Therefore, one should keep in mind that sRNAs are not necessarily non-coding. We recently identified two small RNA molecules, LstA and LstB, that are predicted to encode small proteins with transmembrane motifs (Faucher et al., 2010). Because small proteins are difficult to predict accurately from genomic sequences, the hunt for small RNA molecules has the added benefit of filling gaps in genome annotation by identifying putative small proteins and correcting annotation errors.
Characterization of the mitochondrial genome of Diphyllobothrium latum (Cestoda: Pseudophyllidea) – implications for the phylogeny of eucestodes
The complete nucleotide sequence of the mitochondrial genome was determined for the fish tapeworm Diphyllobothrium latum. This genome is 13 608 bp in length and encodes 12 protein-coding genes (but lacks atp8), 22 transfer RNA (tRNA) and 2 ribosomal RNA (rRNA) genes, corresponding to the gene complement found thus far in other flatworm mitochondrial (mt) DNAs. The gene arrangement of this pseudophyllidean cestode is the same as the 6 cyclophyllidean cestodes characterized to date, with only minor variation in structure among these other genomes: the relative position of trnS2 and trnL1 is switched in Hymenolepis diminuta. Phylogenetic analyses of the concatenated amino acid sequences for 12 protein-coding genes of all complete cestode mtDNAs confirmed taxonomic and previous phylogenetic assessments, with D. latum being a sister taxon to the cyclophyllideans. High nodal support and phylogenetic congruence between different methods suggest that mt genomes may be of utility in resolving ordinal relationships within the cestodes. All species of Diphyllobothrium infect fish-eating vertebrates, and D. latum commonly infects humans through the ingestion of raw, poorly cooked or pickled fish. The complete mitochondrial genome provides a wealth of genetic markers which could be useful for identifying different life-cycle stages and for investigating their population genetics, ecology and epidemiology.
C. The Small RNAs: miRNA and siRNA in Eukaryotes
Micro RNAs (miRNAs) and small interfering RNAs (siRNAs) are found in C. elegans, a small nematode (roundworm) that quickly became a model for studies of cell and molecular biology and development. The particular attractions of C. elegans are that (a) its genome has about 21,700 genes, comparable to the roughly 25,000 genes in a human genome; (b) it uses the products of these genes to produce an adult worm consisting of just 1031 cells organized into all of the major organs found in higher organisms; and (c) it is possible to trace the embryonic origins of every single cell in its body. C. elegans is illustrated below.
1. Small Interfering RNA (siRNA)
siRNA was first found in plants as well as in C. elegans. However, siRNAs (and miRNAs) are common in many higher organisms. siRNAs were so-named because they interfere with the function of other RNAs foreign to the cell or organism. Their action was dubbed RNA interference (RNAi). For their discovery of siRNAs, A. Z. Fire and C. C. Mello shared the 2006 Nobel Prize in Physiology or Medicine. The action of siRNA targeting foreign RNA is illustrated below.
When cells recognize foreign double-stranded RNAs (e.g., some viral RNA genomes) as alien, a nuclease called DICER hydrolyzes them. The resulting short double-stranded hydrolysis products (the siRNAs) combine with RNA-induced silencing complex (RISC) proteins. The antisense siRNA strand in the resulting siRNA-RISC complex binds to complementary regions of foreign RNAs, targeting them for degradation. Cellular use of RISC to control gene expression in this way may have derived from the use of RISC proteins by miRNAs as part of a cellular defense mechanism, to be discussed next.
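The base-pairing logic behind target recognition can be sketched in a few lines. The example below finds the positions in a target RNA that are perfectly complementary to an antisense strand; the function names and the short toy sequences are illustrative assumptions, and the sketch requires perfect complementarity and ignores everything the RISC proteins contribute.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def find_target_sites(antisense_strand: str, target_rna: str) -> list:
    """Return 0-based start positions in target_rna that pair perfectly
    with the antisense strand (overlapping sites are reported too)."""
    site = reverse_complement_rna(antisense_strand)
    target = target_rna.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Toy sequences, far shorter than real siRNAs, used only for illustration.
    antisense = "CGUAAUC"
    foreign_rna = "AAGAUUACGCCGAUUACGUU"
    print(find_target_sites(antisense, foreign_rna))
```

The antisense strand is read 5′ to 3′ and its reverse complement is searched for in the target, which is why the sites are reported as positions on the target strand.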
Custom-designed siRNAs have been used to disable expression of specific genes in order to study their function in vivo and in vitro. Both siRNAs and miRNAs are being investigated as possible therapeutic tools to interfere with RNAs whose expression leads to cancer or other diseases.
For an example, check out a YouTube video of unexpected results of an RNAi experiment at this link. In the experiment described, RNAi was used to block embryonic expression of the orthodenticle (odt) gene that is normally required for the growth of horns in a dung beetle. The effect of this knock-out mutation was, as expected, to prevent horn growth. What was unexpected, however, was the development of an eye in the middle of the beetle's head ('third eye' in the micrograph).
The 3rd eye not only looks like an eye, but is a functional one. This was demonstrated by preventing normal eye development in odt-knockout mutants. The 3rd eye appeared... and was responsive to light! Keep in mind that this was a beetle with a 3rd eye, not Drosophila! Justin Kumar of Indiana University, who was not involved in the research, stated that "...lessons learned from Drosophila may not be as generally applicable as I or other Drosophilists would like to believe ... The ability to use RNAi in non-traditional model systems is a huge advance that will probably lead to a more balanced view of development."
2. Micro RNAs (miRNA)
miRNAs target unwanted endogenous cellular RNAs for degradation. They are transcribed from genes now known to be widely distributed in eukaryotes. The pathway from pre-miRNA transcription through processing and target mRNA degradation is illustrated on the next page.
As they are transcribed, pre-miRNAs fold into a stem-loop structure that is lost during cytoplasmic processing. Like siRNAs, mature miRNAs combine with RISC proteins. The RISC protein-miRNA complex targets old or no-longer-needed mRNAs or mRNAs damaged during transcription.
An estimated 250 miRNAs in humans may be sufficient to H-bond to diverse target RNAs; only targets with strong complementarity to a RISC protein-miRNA complex will be degraded.
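The idea that only strongly complementary targets are degraded can be illustrated with a minimal sketch. The sequences, the function names and the 0.9 threshold below are hypothetical choices for illustration only; the text gives no numerical cutoff, and real miRNA targeting also involves G-U wobble pairs and seed-region rules that this sketch ignores.

```python
# Watson-Crick partners for RNA. G-U wobble pairs are deliberately ignored.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementarity(mirna: str, site: str) -> float:
    """Fraction of miRNA positions that pair with the target site.

    Both strings are written 5'->3'; the site is reversed so that the
    two strands are compared in antiparallel orientation.
    """
    if len(mirna) != len(site):
        raise ValueError("miRNA and target site must have equal length")
    site_rev = site[::-1]
    matches = sum(PAIRS.get(m) == s for m, s in zip(mirna, site_rev))
    return matches / len(mirna)


def is_degraded(mirna: str, site: str, threshold: float = 0.9) -> bool:
    """Hypothetical rule: degrade only when complementarity is strong."""
    return complementarity(mirna, site) >= threshold


mirna = "AUGCAUGC"          # made-up 8-nt example, not a real miRNA
perfect_site = "GCAUGCAU"   # exact reverse complement of mirna
weak_site = "GCAUGCAA"      # one mismatch at the 3' end of the site

print(complementarity(mirna, perfect_site), is_degraded(mirna, perfect_site))
print(complementarity(mirna, weak_site), is_degraded(mirna, weak_site))
```

With these toy inputs the perfect site scores 1.0 and the single-mismatch site scores 0.875, so only the first passes the assumed threshold.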
To search for conserved intergenic sequences, NCBI BLAST Assembled Genomes (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and BLAST with microbial genomes (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi) were used. BLAST with microbial genomes used a value of 10 for expect and the default filter. Nucleotide BLAST searches were optimized both for highly similar sequences (megablast) and for discontiguous megablast. Default parameters were used. For similar sequence megablast the parameters were: maximum target sequences, 100; automatically adjusted for short sequences; expect, 10; word size, 28; match/mismatch scores, 1,-2; gap costs, linear; filter, low complexity regions. Discontiguous megablast: same parameters as those of similar sequence megablast with the exceptions: word size, 11; match/mismatch scores, 2,-3; gap costs, existence: 5, extension: 2.
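The searches above were run through the NCBI web interface. For readers who want to approximate the same settings with the standalone BLAST+ `blastn` program, the sketch below assembles the corresponding command-line arguments. The query file name and database name are hypothetical placeholders, and the mapping from the web form to command-line options is an assumption, not something stated in the methods. The web-only option "automatically adjust parameters for short sequences" has no direct equivalent here.

```python
def blastn_args(task, query="intergenic.fasta", db="microbial_genomes"):
    """Build a blastn argument list mirroring the reported settings.

    `query` and `db` are placeholder names, not taken from the source.
    """
    # Settings shared by both search types: expect 10, 100 targets,
    # low-complexity (DUST) filtering switched on.
    common = [
        "blastn", "-task", task,
        "-query", query, "-db", db,
        "-evalue", "10",
        "-max_target_seqs", "100",
        "-dust", "yes",
    ]
    if task == "megablast":
        # Gap options are left unset so that megablast keeps its
        # default linear gap costs.
        specific = ["-word_size", "28", "-reward", "1", "-penalty", "-2"]
    elif task == "dc-megablast":
        specific = [
            "-word_size", "11", "-reward", "2", "-penalty", "-3",
            "-gapopen", "5", "-gapextend", "2",
        ]
    else:
        raise ValueError(f"unsupported task: {task}")
    return common + specific


for task in ("megablast", "dc-megablast"):
    print(" ".join(blastn_args(task)))
```

The returned list can be passed to `subprocess.run` on a machine where BLAST+ and a suitable nucleotide database are installed.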
The Swiss Institute of Bioinformatics (SIB) ExPASy BLAST server was used to find protein homologies. The BLAST program and database used were blastp, with the query run against the UniProt Knowledgebase (Swiss-Prot + TrEMBL), and the default parameters shown under "Options" were used. The database was the complete database.
Initial searches for repeat sequences and RNA motifs were performed by "walking" intergenic sequences from plasmid lp28 of B. afzelii Pko. In addition, several regions that contain relatively large intergenic sequences from B. burgdorferi B31 and Borrelia garinii PB plasmids were also scanned.
Intergenic regions were scanned 200 bp at a time for conserved or partially conserved sequences. These sequences were then modeled for conserved RNA stem loops. Cutoffs in regions 5' and 3' of a determined stem loop(s) were made when the additional sequences failed to provide conserved stem-loops. Reverse transcript sequences as well as overlapping sequences at the 200 bp junctions were also structure modeled. Repeat sequences were found that displayed stem-loop structures, but these structures either were not found conserved in homologous sequences in other Borrelia species, or the nt sequence identity was too high and thus the structures did not show base-pair changes. These were discarded. The criteria for potential RNA identification were as follows: 1) presence of the sequence in three or more different plasmid regions and/or two or more Borrelia species, 2) presence of a conserved stem loop with at least 9 contiguous base-pairs, 3) two or more compensatory base changes that maintain a stem, 4) in some cases, the presence of conserved looped out or bulged positions.
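Criteria 2 and 3 above, together with the 200 bp scanning step, can be sketched in a few lines of Python. This is not the procedure the authors used (they modeled structures with mfold, as described next); it is a simplified illustration that considers only contiguous Watson-Crick pairs. The 50 nt window overlap, the minimum loop size of 3 and all function names are assumptions, and the test sequence is made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def windows(seq, size=200, overlap=50):
    """Yield (start, subsequence); consecutive windows share `overlap` nt."""
    step = size - overlap
    for start in range(0, max(len(seq) - overlap, 1), step):
        yield start, seq[start:start + size]


def longest_stem(seq, min_loop=3):
    """Return (pairs, i, j) for the longest run of contiguous pairs.

    Position i+k pairs with position j-k; i is the 5' end of the stem
    and j its 3' end. At least `min_loop` unpaired nt stay in the loop.
    """
    best = (0, -1, -1)
    n = len(seq)
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            k = 0
            while ((j - k) - (i + k) > min_loop
                   and COMPLEMENT.get(seq[i + k]) == seq[j - k]):
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best


def compensatory_changes(seq_a, seq_b, pairs):
    """Count pairs where both bases differ between two ungapped,
    aligned homologs while the pairing is kept in each of them."""
    count = 0
    for i, j in pairs:
        paired_a = COMPLEMENT.get(seq_a[i]) == seq_a[j]
        paired_b = COMPLEMENT.get(seq_b[i]) == seq_b[j]
        changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        if paired_a and paired_b and changed:
            count += 1
    return count


# Made-up hairpin: 9-nt stem, 4-nt loop, 9-nt complementary strand.
hairpin = "GGAGGAGGA" + "AAAA" + "TCCTCCTCC"
n_pairs, i, j = longest_stem(hairpin)
stem_pairs = [(i + k, j - k) for k in range(n_pairs)]

# A hypothetical homolog with two compensatory changes (G-C to A-T, G-C to C-G).
homolog = list(hairpin)
homolog[0], homolog[21] = "A", "T"
homolog[3], homolog[18] = "C", "G"
homolog = "".join(homolog)

meets_criteria = n_pairs >= 9 and compensatory_changes(hairpin, homolog, stem_pairs) >= 2
print(n_pairs, i, j, meets_criteria)
```

The stem search is a brute-force scan, which is adequate for 200 nt windows but would be too slow for whole plasmids without windowing.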
RNA secondary structure modeling of repeat nt sequences was performed with the Zuker and Turner mfold program, version 3.2 [28, 29]. Parameters used were: default window parameter, maximum interior/bulge loop size = 30, maximum asymmetry of an interior/bulge loop = 30, and no limit on maximum distance between paired bases.