Large scale reverse transcription?

I need to make RNA:DNA duplexes. I can make 100 to 200 ug of mRNA through in vitro transcription, and I know how to use reverse transcription to make a cDNA library, but I have questions with this.

The mRNA I make with IVT is all poly A RNA, and the reverse transcription kit I have says to use 5ug of total RNA or 500ng of poly A RNA. I need to make at least 10ug of RNA:DNA, which is 20 times the suggested limit for poly A RNA, so a direct scale up probably isn't practical.

I can't find a literature precedent for making such large amounts of DNA through reverse transcription. Everything I have found that uses RNA:DNA duplexes used short strands that were chemically synthesized and combined. My mRNA is 1800 bases long, so I can't just order an oligo; I have to make it.

EDIT : I can't comment on my own question.

The duplexes will be about 1800 bases long, it's for firefly luciferase.

Trying to make the DNA for the duplex from the original DNA template presents other issues. While it would be easy to cut it out from the plasmid, it would be double stranded, and I would have to mix with the RNA and heat it up to dissociate everything and hope that the RNA bound before the DNA strands reassociated. I also think it would complicate my controls, because it would be harder to say that any expression I see after delivering these duplexes came from the RNA and not the DNA. Of course synthesizing the DNA from the RNA could also introduce that problem.

Some crude calculations suggest that the dNTPs are the limiting reagent for a transcript this size, and that even with 5ug of mRNA I should have enough oligo dT primer for the reaction. Right now I am running reactions from 0.5 to 5ug of mRNA, with 1uL of primer and 4uL of dNTPs, to make sure I have enough. Hopefully the gel will turn out ok.
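The crude calculation mentioned above can be sketched roughly as follows. This is only an illustration: the average nucleotide mass (about 340 g/mol per RNA nucleotide) is a textbook approximation, and the dNTP stock concentration is a placeholder, since the kit's actual concentration is not given here.

```python
# Rough reagent arithmetic for a reverse transcription reaction.
# All constants are approximations or placeholders, not kit specifications.

RNA_NT_MW = 340.0  # g/mol per RNA nucleotide (average, approximate)


def rna_pmol(mass_ug: float, length_nt: int) -> float:
    """Picomoles of single-stranded RNA contained in mass_ug micrograms."""
    return mass_ug * 1e6 / (length_nt * RNA_NT_MW)


def dntp_needed_nmol(mass_ug: float, length_nt: int) -> float:
    """Total nmol of dNTPs consumed if every template is copied once, full length."""
    return rna_pmol(mass_ug, length_nt) * length_nt / 1000.0


def dntp_supplied_nmol(volume_ul: float, each_dntp_mM: float) -> float:
    """Total nmol of dNTPs added (four dNTPs, each at each_dntp_mM in the stock)."""
    return volume_ul * each_dntp_mM * 4  # 1 uL of a 1 mM solution holds 1 nmol


if __name__ == "__main__":
    ASSUMED_STOCK_MM = 2.5  # hypothetical value: check the kit manual
    for ug in (0.5, 1.0, 3.0, 5.0):
        need = dntp_needed_nmol(ug, 1800)
        have = dntp_supplied_nmol(4.0, ASSUMED_STOCK_MM)
        print(f"{ug:>4} ug mRNA: {rna_pmol(ug, 1800):5.2f} pmol template, "
              f"needs {need:5.2f} nmol dNTP, supplied {have:5.1f} nmol")
```

Under this approximation the dNTP requirement depends only on the RNA mass (roughly 2.9 nmol per ug), not on the transcript length, so the comparison against the supplied dNTPs is easy to redo once the real stock concentration is known.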

EDIT: I got the results. So far it looks like 3.0ug of mRNA works the best. It has the darkest band, and looks like the right length. Less mRNA gives fainter bands that appear too short, and more mRNA gives slightly fainter bands that appear slightly too short. Also, I ran the reaction for 2 hours instead of 1.

The best way to scale up these kinds of reactions is to set up many reactions. Prepare a master mix for, let's say, 20 reactions. You can pool them together, precipitate the RNA and dissolve it to an appropriate concentration in nuclease-free water. Also, use a gene-specific primer instead of oligo-dT and don't add poly-A tails (poly-A does not affect the in vitro stability of RNA).
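A master mix for many identical reactions is just the per-reaction recipe multiplied out, usually with a little extra to cover pipetting losses. The sketch below assumes a made-up per-reaction recipe and a 10% overage; neither comes from a specific kit.

```python
def master_mix(per_reaction_ul: dict[str, float], n_reactions: int,
               overage: float = 0.10) -> dict[str, float]:
    """Scale per-reaction volumes (uL) to n_reactions plus a fractional overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


if __name__ == "__main__":
    # Hypothetical recipe for illustration only; use the volumes from your kit.
    recipe = {
        "5x RT buffer": 4.0,
        "dNTP mix": 4.0,
        "primer": 1.0,
        "reverse transcriptase": 1.0,
    }
    for component, volume in master_mix(recipe, 20).items():
        print(f"{component:>22}: {volume:6.1f} uL")
```

Because every reaction here uses the same template, the mRNA itself could also go into the mix; it is left out of the hypothetical recipe only to keep the example short.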

As Mad Scientist already indicated, you would not need to make DNA again from RNA. Simply take a 1:2 molar ratio of your IVT template dsDNA (restriction digested) and the RNA (post IVT). Heat and anneal slowly. DNA-RNA hybrids are stronger than DNA-DNA hybrids, and the former are more likely to form after annealing. You can digest the unpaired strands using enzymes like mung bean nuclease.
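Converting the suggested molar ratio into masses might look like the sketch below. It reads "1:2" as one dsDNA molecule per two RNA molecules, which is an assumption, and it uses approximate average masses (about 650 g/mol per base pair of dsDNA, about 340 g/mol per RNA nucleotide). The fragment length is a parameter because the size of the digested template is not stated.

```python
# Approximate average molecular masses; real values depend on sequence.
DSDNA_BP_MW = 650.0  # g/mol per base pair of double-stranded DNA
RNA_NT_MW = 340.0    # g/mol per RNA nucleotide


def ug_per_pmol_rna(length_nt: int) -> float:
    """Mass in ug of 1 pmol of single-stranded RNA of the given length."""
    return length_nt * RNA_NT_MW / 1e6


def ug_per_pmol_dsdna(length_bp: int) -> float:
    """Mass in ug of 1 pmol of double-stranded DNA of the given length."""
    return length_bp * DSDNA_BP_MW / 1e6


def dsdna_ug_for_ratio(rna_ug: float, rna_nt: int, dna_bp: int,
                       dna_per_rna: float = 0.5) -> float:
    """ug of dsDNA to mix with rna_ug of RNA at dna_per_rna moles DNA per mole RNA."""
    rna_pmol = rna_ug / ug_per_pmol_rna(rna_nt)
    return rna_pmol * dna_per_rna * ug_per_pmol_dsdna(dna_bp)


if __name__ == "__main__":
    # Example: 5 ug of an 1800-nt RNA and a hypothetical 1800-bp fragment.
    print(f"{dsdna_ug_for_ratio(5.0, 1800, 1800):.2f} ug dsDNA")
```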

Reverse Transcription Polymerase Chain Reaction in Giant Unilamellar Vesicles

We assessed the applicability of giant unilamellar vesicles (GUVs) for RNA detection using in vesicle reverse transcription polymerase chain reaction (RT-PCR). We prepared GUVs that encapsulated one-pot RT-PCR reaction mixture including template RNA, primers, and Taqman probe, using water-in-oil emulsion transfer method. After thermal cycling, we analysed the GUVs that exhibited intense fluorescence signals, which represented the cDNA amplification. The detailed analysis of flow cytometry data demonstrated that rRNA and mRNA in the total RNA can be amplified from 10–100 copies in the GUVs with 5–10 μm diameter, although the fraction of reactable GUV was approximately 60% at most. Moreover, we report that the target RNA, which was directly transferred into the GUV reactors via membrane fusion, can be amplified and detected using in vesicle RT-PCR. These results suggest that the GUVs can be used as biomimetic reactors capable of performing PCR and RT-PCR, which are important in analytical and diagnostic applications with additional functions.

Reverse Transcription Linearity

The relative concentration of total RNA can influence the efficiency of the RT and the concentration of cDNA produced from a given transcript. Therefore, it is desirable to include the same or a very similar concentration of RNA into all two-step cDNA synthesis reactions, unless the RT system has been verified to have a linear response. As can be seen in Figure 8.2, using a conventional RT protocol, the 100-fold dilutions of input RNA do not result in a corresponding 100-fold difference in cDNA yield for the templates tested. Interestingly, the data presented are duplicate qPCR run on duplicate RT reactions. As shown, the lack of linearity is reproducible between the two RT reactions.

Figure 8.2. Total RNA was diluted through 100-fold steps and reverse transcribed using two-step random priming; two independent RT reactions were performed. β-actin was detected in duplicate qPCRs for each RT reaction. The RT is reproducible, but cDNA yield is not proportional to input RNA concentration. Therefore, if the experimental constraints require that a variable concentration of RNA is included in the RT, it is critical to verify that the protocol and reagent combination result in a linear response.

In the example shown in Figure 8.3, ReadyScript® RT reagent (RDRT) was used to reverse transcribe total RNA from a 2-fold and a 10-fold serial dilution of template using a two-step protocol and a combination of oligo-dT (O4387) and random priming (described below). The CANX gene was detected in both dilution series with a direct proportionality to the input RNA concentration.

Figure 8.3. ReadyScript® RT reagent (RDRT) was used to reverse transcribe total RNA from a 2-fold and a 10-fold serial dilution. The gene CANX was detected in both dilution series resulting in a direct proportionality to the input RNA concentration (data from student groups attending an EMBL Advanced qPCR Workshop).
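One simple way to check whether an RT protocol responds linearly is to regress the qPCR Cq values on log10 of the input RNA amount. If cDNA yield is proportional to input and the qPCR is close to 100% efficient, the slope should be near -3.32 cycles per 10-fold dilution (about 6.64 cycles per 100-fold). The sketch below is a plain least-squares fit; the Cq values in the example are invented for illustration and are not the data behind the figures.

```python
import math

# Slope expected for a perfectly proportional RT and 100% efficient qPCR.
IDEAL_SLOPE = -1.0 / math.log10(2.0)  # about -3.32 cycles per 10-fold dilution


def cq_slope(input_amounts: list[float], cq_values: list[float]) -> float:
    """Least-squares slope of Cq against log10(input RNA amount)."""
    xs = [math.log10(a) for a in input_amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    return sxy / sxx


if __name__ == "__main__":
    inputs_ng = [1000.0, 10.0, 0.1]   # 100-fold dilution steps
    cqs = [18.0, 23.1, 28.4]          # hypothetical Cq values
    slope = cq_slope(inputs_ng, cqs)
    print(f"slope {slope:.2f} vs ideal {IDEAL_SLOPE:.2f} cycles per 10-fold")
```

A slope that is clearly shallower than the ideal would suggest that the cDNA yield does not track the input RNA, which is the non-linear behaviour described for the conventional protocol.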


Reverse transcriptases were discovered by Howard Temin at the University of Wisconsin–Madison in Rous sarcoma virions [5] and independently isolated by David Baltimore in 1970 at MIT from two RNA tumour viruses: murine leukemia virus and again Rous sarcoma virus. [6] For their achievements, they shared the 1975 Nobel Prize in Physiology or Medicine (with Renato Dulbecco).

Well-studied reverse transcriptases include:

  • HIV-1 reverse transcriptase from human immunodeficiency virus type 1 (PDB: 1HMV) has two subunits, which have respective molecular weights of 66 and 51 kDa. [7]
  • M-MLV reverse transcriptase from the Moloney murine leukemia virus is a single 75 kDa monomer. [8]
  • AMV reverse transcriptase from the avian myeloblastosis virus also has two subunits, a 63 kDa subunit and a 95 kDa subunit. [8]
  • Telomerase reverse transcriptase, which maintains the telomeres of eukaryotic chromosomes. [9]

The enzymes are encoded and used by viruses that use reverse transcription as a step in the process of replication. Reverse-transcribing RNA viruses, such as retroviruses, use the enzyme to reverse-transcribe their RNA genomes into DNA, which is then integrated into the host genome and replicated along with it. Reverse-transcribing DNA viruses, such as the hepadnaviruses, can allow RNA to serve as a template in assembling and making DNA strands. HIV infects humans with the use of this enzyme. Without reverse transcriptase, the viral genome would not be able to incorporate into the host cell, resulting in failure to replicate.

Process of reverse transcription or retrotranscription

Reverse transcriptase creates double-stranded DNA from an RNA template.

In virus species with reverse transcriptase lacking DNA-dependent DNA polymerase activity, creation of double-stranded DNA can possibly be done by host-encoded DNA polymerase δ, mistaking the viral DNA-RNA for a primer and synthesizing a double-stranded DNA by similar mechanism as in primer removal, where the newly synthesized DNA displaces the original RNA template.

The process of reverse transcription, also called retrotranscription, is extremely error-prone, and it is during this step that mutations may occur. Such mutations may cause drug resistance.

Retroviral reverse transcription

Retroviruses, also referred to as class VI ssRNA-RT viruses, are RNA reverse-transcribing viruses with a DNA intermediate. Their genomes consist of two molecules of positive-sense single-stranded RNA with a 5' cap and 3' polyadenylated tail. Examples of retroviruses include the human immunodeficiency virus (HIV) and the human T-lymphotropic virus (HTLV). Creation of double-stranded DNA occurs in the cytosol [10] as a series of these steps:

  1. tRNA acts as a primer and hybridizes to a complementary part of the virus RNA genome called the primer binding site or PBS.
  2. Reverse transcriptase then adds DNA nucleotides onto the 3' end of the primer, synthesizing DNA complementary to the U5 (non-coding region) and R region (a direct repeat found at both ends of the RNA molecule) of the viral RNA.
  3. A domain on the reverse transcriptase enzyme called RNAse H degrades the U5 and R regions on the 5' end of the RNA.
  4. The tRNA primer then "jumps" to the 3' end of the viral genome, and the newly synthesised DNA strand hybridizes to the complementary R region on the RNA.
  5. The complementary DNA (cDNA) added in (2) is further extended.
  6. The majority of viral RNA is degraded by RNAse H, leaving only the PP sequence.
  7. Synthesis of the second DNA strand begins, using the remaining PP fragment of viral RNA as a primer.
  8. The tRNA primer leaves and a "jump" happens. The PBS from the second strand hybridizes with the complementary PBS on the first strand.
  9. Both strands are extended to form a complete double-stranded DNA copy of the original viral RNA genome, which can then be incorporated into the host's genome by the enzyme integrase.

Creation of double-stranded DNA also involves strand transfer, in which there is a translocation of short DNA product from initial RNA-dependent DNA synthesis to acceptor template regions at the other end of the genome, which are later reached and processed by the reverse transcriptase for its DNA-dependent DNA activity. [11]

Retroviral RNA is arranged from the 5’ terminus to the 3’ terminus. The site where the primer is annealed to viral RNA is called the primer-binding site (PBS). The RNA 5’ end to the PBS site is called U5, and the RNA 3’ end to the PBS is called the leader. The tRNA primer is unwound between 14 and 22 nucleotides and forms a base-paired duplex with the viral RNA at the PBS. The fact that the PBS is located near the 5’ terminus of viral RNA is unusual because reverse transcriptase synthesizes DNA from the 3’ end of the primer in the 5’ to 3’ direction (with respect to the newly synthesized DNA strand). Therefore, the primer and reverse transcriptase must be relocated to the 3’ end of the viral RNA. In order to accomplish this repositioning, multiple steps and various enzymes including DNA polymerase, ribonuclease H (RNase H) and polynucleotide unwinding are needed. [12] [13]

The HIV reverse transcriptase also has ribonuclease activity that degrades the viral RNA during the synthesis of cDNA, as well as DNA-dependent DNA polymerase activity that copies the sense cDNA strand into an antisense DNA to form a double-stranded viral DNA intermediate (vDNA). [14]

Self-replicating stretches of eukaryotic genomes known as retrotransposons utilize reverse transcriptase to move from one position in the genome to another via an RNA intermediate. They are found abundantly in the genomes of plants and animals. Telomerase is another reverse transcriptase found in many eukaryotes, including humans, which carries its own RNA template; this RNA is used as a template for DNA replication. [15]

Initial reports of reverse transcriptase in prokaryotes came as far back as 1971 in France (Beljanski et al., 1971a, 1972) and a few years later in the USSR (Romashchenko 1977 [16] ). These have since been broadly described as part of bacterial Retrons, distinct sequences that code for reverse transcriptase, and are used in the synthesis of msDNA. In order to initiate synthesis of DNA, a primer is needed. In bacteria, the primer is synthesized during replication. [17]

Valerian Dolja of Oregon State argues that viruses, due to their diversity, have played an evolutionary role in the development of cellular life, with reverse transcriptase playing a central role. [18]

The reverse transcriptase employs a "right hand" structure similar to that found in other viral nucleic acid polymerases. [19] [20] In addition to the transcription function, retroviral reverse transcriptases have a domain belonging to the RNase H family, which is vital to their replication. By degrading the RNA template, it allows the other strand of DNA to be synthesized. [21] Some fragments from the digestion also serve as the primer for the DNA polymerase (either the same enzyme or a host protein), responsible for making the other (plus) strand. [19]

There are three different replication systems during the life cycle of a retrovirus. The first process is the reverse transcriptase synthesis of viral DNA from viral RNA, which then forms newly made complementary DNA strands. The second replication process occurs when host cellular DNA polymerase replicates the integrated viral DNA. Lastly, RNA polymerase II transcribes the proviral DNA into RNA, which will be packed into virions. Mutation can occur during one or all of these replication steps. [22]

Reverse transcriptase has a high error rate when transcribing RNA into DNA since, unlike most other DNA polymerases, it has no proofreading ability. This high error rate allows mutations to accumulate at an accelerated rate relative to proofread forms of replication. The commercially available reverse transcriptases produced by Promega are quoted by their manuals as having error rates in the range of 1 in 17,000 bases for AMV and 1 in 30,000 bases for M-MLV. [23]
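To put the quoted error rates in perspective, the sketch below estimates the expected number of errors in a cDNA of a given length and the fraction of cDNA molecules expected to be error-free. It assumes errors occur independently at each base with the quoted per-base rate, which is a simplification; the 1800-nt length in the example is just an illustrative transcript size.

```python
# Per-base error rates as quoted above (1 error in N bases).
ERROR_RATE = {
    "AMV": 1.0 / 17_000,
    "M-MLV": 1.0 / 30_000,
}


def expected_errors(length_nt: int, enzyme: str) -> float:
    """Expected number of misincorporations in one cDNA copy."""
    return length_nt * ERROR_RATE[enzyme]


def fraction_error_free(length_nt: int, enzyme: str) -> float:
    """Probability that a copy has no errors, assuming independent errors per base."""
    return (1.0 - ERROR_RATE[enzyme]) ** length_nt


if __name__ == "__main__":
    for enzyme in ERROR_RATE:
        print(f"{enzyme:>6}: {expected_errors(1800, enzyme):.3f} errors per copy, "
              f"{fraction_error_free(1800, enzyme):.1%} of copies error-free")
```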

Other than creating single-nucleotide polymorphisms, reverse transcriptases have also been shown to be involved in processes such as transcript fusions, exon shuffling and creating artificial antisense transcripts. [24] [25] It has been speculated that this template switching activity of reverse transcriptase, which can be demonstrated completely in vivo, may have been one of the causes for finding several thousand unannotated transcripts in the genomes of model organisms. [26]

Template switching

Two RNA genomes are packaged into each retrovirus particle, but, after an infection, each virus generates only one provirus. [27] After infection, reverse transcription is accompanied by template switching between the two genome copies (copy choice recombination). [27] There are two models that suggest why reverse transcriptase switches templates. The first, the forced copy-choice model, proposes that reverse transcriptase changes the RNA template when it encounters a nick, implying that recombination is obligatory to maintaining virus genome integrity. The second, the dynamic choice model, suggests that reverse transcriptase changes templates when the RNAse function and the polymerase function are not in sync rate-wise, implying that recombination occurs at random and is not in response to genomic damage. A study by Rawson et al. supported both models of recombination. [27] From 5 to 14 recombination events per genome occur at each replication cycle. [28] Template switching (recombination) appears to be necessary for maintaining genome integrity and as a repair mechanism for salvaging damaged genomes. [29] [27]

Antiviral drugs

As HIV uses reverse transcriptase to copy its genetic material and generate new viruses (part of a retrovirus proliferation circle), specific drugs have been designed to disrupt the process and thereby suppress its growth. Collectively, these drugs are known as reverse-transcriptase inhibitors and include the nucleoside and nucleotide analogues zidovudine (trade name Retrovir), lamivudine (Epivir) and tenofovir (Viread), as well as non-nucleoside inhibitors, such as nevirapine (Viramune).

Molecular biology

Reverse transcriptase is commonly used in research to apply the polymerase chain reaction technique to RNA in a technique called reverse transcription polymerase chain reaction (RT-PCR). The classical PCR technique can be applied only to DNA strands, but, with the help of reverse transcriptase, RNA can be transcribed into DNA, thus making PCR analysis of RNA molecules possible. Reverse transcriptase is used also to create cDNA libraries from mRNA. The commercial availability of reverse transcriptase greatly improved knowledge in the area of molecular biology, as, along with other enzymes, it allowed scientists to clone, sequence, and characterise RNA.

Reverse transcriptase has also been employed in insulin production. By inserting eukaryotic mRNA for insulin production along with reverse transcriptase into bacteria, the mRNA could be inserted into the prokaryote's genome. Large amounts of insulin can then be created, sidestepping the need to harvest pig pancreas and other such traditional sources. Directly inserting eukaryotic DNA into bacteria would not work because it carries introns, so would not translate successfully using the bacterial ribosomes. Processing in the eukaryotic cell during mRNA production removes these introns to provide a suitable template. Reverse transcriptase converts this edited RNA back into DNA so it could be incorporated in the genome.

Robust catalytic activity (even at high temperatures) and high fidelity and template-primer affinity are desirable reverse transcriptase (RT) properties in most biotechnological applications.

Engineered murine leukemia virus RTs are the most commonly used enzymes, although enzymes from other retroviruses (avian myeloblastosis virus or HIV-1) and bacterial group II intron RTs are also efficient in most applications.

New technologies using reverse transcription and showing promise in life science research and technology are RNA sequencing (RNA-seq) (transcriptomics analysis), epitranscriptomics and synthetic biology, and genome editing (through the use of prime editors).

Reverse transcriptases (RTs) are enzymes that can generate a complementary strand of DNA (cDNA) from RNA. Coupled with PCR, RTs have been widely used to detect RNAs and to clone expressed genes. Classical retroviral RTs have been improved by protein engineering. These enzymes and newly characterized RTs are key elements in the development of next-generation sequencing techniques that are now being applied to the study of transcriptomics. In addition, engineered RTs fused to a CRISPR/Cas9 nickase have recently shown great potential as tools to manipulate eukaryotic genomes. In this review, we discuss the properties and uses of wild type and engineered RTs in biotechnological applications, from conventional RT-PCR to recently introduced prime editing.


Library preparation

The general steps to prepare a complementary DNA (cDNA) library for sequencing are described below, but often vary between platforms. [8] [3] [9]

  1. RNA Isolation: RNA is isolated from tissue and mixed with deoxyribonuclease (DNase). DNase reduces the amount of genomic DNA. The amount of RNA degradation is checked with gel and capillary electrophoresis and is used to assign an RNA integrity number to the sample. This RNA quality and the total amount of starting RNA are taken into consideration during the subsequent library preparation, sequencing, and analysis steps.
  2. RNA selection/depletion: To analyze signals of interest, the isolated RNA can either be kept as is, filtered for RNA with 3' polyadenylated (poly(A)) tails to include only mRNA, depleted of ribosomal RNA (rRNA), and/or filtered for RNA that binds specific sequences (RNA selection and depletion methods table, below). The RNA with 3' poly(A) tails is mainly composed of mature, processed, coding sequences. Poly(A) selection is performed by mixing RNA with poly(T) oligomers covalently attached to a substrate, typically magnetic beads. [10][11] Poly(A) selection has important limitations in RNA biotype detection. Many RNA biotypes are not polyadenylated, including many noncoding RNA and histone-core protein transcripts, or are regulated via their poly(A) tail length (e.g., cytokines) and thus might not be detected after poly(A) selection. [12] Furthermore, poly(A) selection may increase 3' bias, especially with lower quality RNA. [13][14] These limitations can be avoided with ribosomal depletion, removing rRNA that typically represents over 90% of the RNA in a cell. Both poly(A) enrichment and ribosomal depletion steps are labor intensive and could introduce biases, so simpler approaches have been developed to omit these steps. [15] Small RNA targets, such as miRNA, can be further isolated through size selection with exclusion gels, magnetic beads, or commercial kits.
  3. cDNA synthesis: RNA is reverse transcribed to cDNA because DNA is more stable and to allow for amplification (which uses DNA polymerases) and leverage more mature DNA sequencing technology. Amplification subsequent to reverse transcription results in loss of strandedness, which can be avoided with chemical labeling or single molecule sequencing. Fragmentation and size selection are performed to purify sequences that are the appropriate length for the sequencing machine. The RNA, cDNA, or both are fragmented with enzymes, sonication, or nebulizers. Fragmentation of the RNA reduces 5' bias of randomly primed reverse transcription and the influence of primer binding sites, [11] with the downside that the 5' and 3' ends are converted to DNA less efficiently. Fragmentation is followed by size selection, where either small sequences are removed or a tight range of sequence lengths is selected. Because small RNAs like miRNAs are lost, these are analyzed independently. The cDNA for each experiment can be indexed with a hexamer or octamer barcode, so that these experiments can be pooled into a single lane for multiplexed sequencing.
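The barcode indexing described in the last step can be illustrated with a toy demultiplexer. The sketch assumes the barcode is simply the first six bases of each read, which is a simplification (sequencing platforms often read the index separately), and the barcode-to-sample table is made up.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcodes: dict[str, str],
                barcode_len: int = 6) -> dict[str, list[str]]:
    """Group reads by the sample whose barcode matches the read's first bases.

    Reads with an unknown barcode go into the "unassigned" bin. The barcode
    itself is trimmed off before the read is stored.
    """
    by_sample: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        by_sample[barcodes.get(tag, "unassigned")].append(insert)
    return dict(by_sample)


if __name__ == "__main__":
    # Hypothetical hexamer barcodes for two pooled experiments.
    table = {"ACGTAC": "experiment_1", "TTGCAA": "experiment_2"}
    pooled = ["ACGTACGGGATTC", "TTGCAACCCTAGA", "GGGGGGAAATTTC"]
    for sample, inserts in demultiplex(pooled, table).items():
        print(sample, inserts)
```

Exact matching is the simplest choice; real pipelines usually tolerate one mismatch in the barcode, which this sketch leaves out.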

Complementary DNA sequencing (cDNA-Seq)

The cDNA library derived from RNA biotypes is then sequenced into a computer-readable format. There are many high-throughput sequencing technologies for cDNA sequencing including platforms developed by Illumina, Thermo Fisher, BGI/MGI, PacBio, and Oxford Nanopore Technologies. [16] For Illumina short-read sequencing, a common technology for cDNA sequencing, adapters are ligated to the cDNA, DNA is attached to a flow cell, clusters are generated through cycles of bridge amplification and denaturing, and sequence-by-synthesis is performed in cycles of complementary strand synthesis and laser excitation of bases with reversible terminators. Sequencing platform choice and parameters are guided by experimental design and cost. Common experimental design considerations include deciding on the sequencing length, sequencing depth, use of single versus paired-end sequencing, number of replicates, multiplexing, randomization, and spike-ins. [17]

Small RNA/non-coding RNA sequencing

When sequencing RNA other than mRNA, the library preparation is modified. The cellular RNA is selected based on the desired size range. For small RNA targets, such as miRNA, the RNA is isolated through size selection. This can be performed with a size exclusion gel, through size selection magnetic beads, or with a commercially developed kit. Once isolated, linkers are added to the 3' and 5' end then purified. The final step is cDNA generation through reverse transcription.

Direct RNA sequencing

Because converting RNA into cDNA, ligation, amplification, and other sample manipulations have been shown to introduce biases and artifacts that may interfere with both the proper characterization and quantification of transcripts, [18] single molecule direct RNA sequencing has been explored by companies including Helicos (bankrupt), Oxford Nanopore Technologies, [19] and others. This technology sequences RNA molecules directly in a massively-parallel manner.

Single-molecule real-time RNA sequencing

Massively parallel single molecule direct RNA-Seq has been explored as an alternative to traditional RNA-Seq, in which RNA-to-cDNA conversion, ligation, amplification, and other sample manipulation steps may introduce biases and artifacts. [20] Technology platforms that perform single-molecule real-time RNA-Seq include Oxford Nanopore Technologies (ONT) Nanopore sequencing, [19] PacBio IsoSeq, and Helicos (bankrupt). Sequencing RNA in its native form preserves modifications like methylation, allowing them to be investigated directly and simultaneously. [19] Another benefit of single-molecule RNA-Seq is that transcripts can be covered in full length, allowing for higher confidence isoform detection and quantification compared to short-read sequencing. Traditionally, single-molecule RNA-Seq methods have higher error rates compared to short-read sequencing, but newer methods like ONT direct RNA-Seq limit errors by avoiding fragmentation and cDNA conversion. Recent uses of ONT direct RNA-Seq for differential expression in human cell populations have demonstrated that this technology can overcome many limitations of short and long cDNA sequencing. [21]

Single-cell RNA sequencing (scRNA-Seq)

Standard methods such as microarrays and standard bulk RNA-Seq analysis analyze the expression of RNAs from large populations of cells. In mixed cell populations, these measurements may obscure critical differences between individual cells within these populations. [22] [23]

Single-cell RNA sequencing (scRNA-Seq) provides the expression profiles of individual cells. Although it is not possible to obtain complete information on every RNA expressed by each cell, due to the small amount of material available, patterns of gene expression can be identified through gene clustering analyses. This can uncover the existence of rare cell types within a cell population that may never have been seen before. For example, rare specialized cells in the lung called pulmonary ionocytes that express the Cystic Fibrosis Transmembrane Conductance Regulator were identified in 2018 by two groups performing scRNA-Seq on lung airway epithelia. [24] [25]

Experimental procedures

Current scRNA-Seq protocols involve the following steps: isolation of single cell and RNA, reverse transcription (RT), amplification, library generation and sequencing. Single cells are either mechanically separated into microwells (e.g., BD Rhapsody, Takara ICELL8, Vycap Puncher Platform, or CellMicrosystems CellRaft) or encapsulated in droplets (e.g., 10x Genomics Chromium, Illumina Bio-Rad ddSEQ, 1CellBio InDrop, Dolomite Bio Nadia). [26] Single cells are labeled by adding beads with barcoded oligonucleotides; both cells and beads are supplied in limited amounts such that co-occupancy with multiple cells and beads is a very rare event. Once reverse transcription is complete, the cDNAs from many cells can be mixed together for sequencing; transcripts from a particular cell are identified by each cell's unique barcode. [27] [28] Unique molecular identifiers (UMIs) can be attached to mRNA/cDNA target sequences to help identify artifacts during library preparation. [29]
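The role of cell barcodes and UMIs can be shown with a small counting sketch: reads that share the same cell barcode, gene, and UMI are treated as amplification copies of one original molecule and counted once. The record layout and the example values are invented for illustration, and real pipelines also correct sequencing errors in barcodes and UMIs, which is omitted here.

```python
from collections import defaultdict
from typing import Iterable


def umi_counts(records: Iterable[tuple[str, str, str]]) -> dict[tuple[str, str], int]:
    """Count distinct UMIs per (cell barcode, gene).

    Each record is (cell_barcode, umi, gene). Duplicate records, which would
    arise from amplification of the same cDNA molecule, are collapsed.
    """
    seen: dict[tuple[str, str], set[str]] = defaultdict(set)
    for cell, umi, gene in records:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


if __name__ == "__main__":
    reads = [
        ("CELL_A", "AACG", "ACTB"),
        ("CELL_A", "AACG", "ACTB"),  # amplification duplicate, counted once
        ("CELL_A", "TTGA", "ACTB"),
        ("CELL_B", "AACG", "ACTB"),  # same UMI in another cell is a new molecule
    ]
    for (cell, gene), n in umi_counts(reads).items():
        print(cell, gene, n)
```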

Challenges for scRNA-Seq include preserving the initial relative abundance of mRNA in a cell and identifying rare transcripts. [30] The reverse transcription step is critical as the efficiency of the RT reaction determines how much of the cell's RNA population will be eventually analyzed by the sequencer. The processivity of reverse transcriptases and the priming strategies used may affect full-length cDNA production and the generation of libraries biased toward the 3’ or 5' end of genes.

In the amplification step, either PCR or in vitro transcription (IVT) is currently used to amplify cDNA. One of the advantages of PCR-based methods is the ability to generate full-length cDNA. However, different PCR efficiency on particular sequences (for instance, GC content and snapback structure) may also be exponentially amplified, producing libraries with uneven coverage. On the other hand, while libraries generated by IVT can avoid PCR-induced sequence bias, specific sequences may be transcribed inefficiently, thus causing sequence drop-out or generating incomplete sequences. [31] [22] Several scRNA-Seq protocols have been published: Tang et al., [32] STRT, [33] SMART-seq, [34] CEL-seq, [35] RAGE-seq, [36] Quartz-seq [37] and C1-CAGE. [38] These protocols differ in terms of strategies for reverse transcription, cDNA synthesis and amplification, and the possibility to accommodate sequence-specific barcodes (i.e. UMIs) or the ability to process pooled samples. [39]

In 2017, two approaches were introduced to simultaneously measure single-cell mRNA and protein expression through oligonucleotide-labeled antibodies known as REAP-seq, [40] and CITE-seq. [41]

Applications

scRNA-Seq is becoming widely used across biological disciplines including Development, Neurology, [42] Oncology, [43] [44] [45] Autoimmune disease, [46] and Infectious disease. [47]

scRNA-Seq has provided considerable insight into the development of embryos and organisms, including the worm Caenorhabditis elegans, [48] and the regenerative planarian Schmidtea mediterranea. [49] [50] The first vertebrate animals to be mapped in this way were Zebrafish [51] [52] and Xenopus laevis. [53] In each case multiple stages of the embryo were studied, allowing the entire process of development to be mapped on a cell-by-cell basis. [8] Science recognized these advances as the 2018 Breakthrough of the Year. [54]

Experimental considerations

A variety of parameters are considered when designing and conducting RNA-Seq experiments:

  • Tissue specificity: Gene expression varies within and between tissues, and RNA-Seq measures this mix of cell types. This may make it difficult to isolate the biological mechanism of interest. Single cell sequencing can be used to study each cell individually, mitigating this issue.
  • Time dependence: Gene expression changes over time, and RNA-Seq only takes a snapshot. Time course experiments can be performed to observe changes in the transcriptome.
  • Coverage (also known as depth): RNA harbors the same mutations observed in DNA, and detection requires deeper coverage. With high enough coverage, RNA-Seq can be used to estimate the expression of each allele. This may provide insight into phenomena such as imprinting or cis-regulatory effects. The depth of sequencing required for specific applications can be extrapolated from a pilot experiment. [55]
  • Data generation artifacts (also known as technical variance): The reagents (e.g., library preparation kit), personnel involved, and type of sequencer (e.g., Illumina, Pacific Biosciences) can result in technical artifacts that might be mis-interpreted as meaningful results. As with any scientific experiment, it is prudent to conduct RNA-Seq in a well controlled setting. If this is not possible or the study is a meta-analysis, another solution is to detect technical artifacts by inferring latent variables (typically principal component analysis or factor analysis) and subsequently correcting for these variables. [56]
  • Data management: A single RNA-Seq experiment in humans is usually 1-5 Gb (compressed), or more when including intermediate files. [57] This large volume of data can pose storage issues. One solution is compressing the data using multi-purpose computational schemas (e.g., gzip) or genomics-specific schemas. The latter can be based on reference sequences or de novo. Another solution is to perform microarray experiments, which may be sufficient for hypothesis-driven work or replication studies (as opposed to exploratory research).

Transcriptome assembly

Two methods are used to assign raw sequence reads to genomic features (i.e., assemble the transcriptome):

  • De novo: This approach does not require a reference genome to reconstruct the transcriptome, and is typically used if the genome is unknown, incomplete, or substantially altered compared to the reference. [58] Challenges when using short reads for de novo assembly include 1) determining which reads should be joined together into contiguous sequences (contigs), 2) robustness to sequencing errors and other artifacts, and 3) computational efficiency. The primary algorithm used for de novo assembly transitioned from overlap graphs, which identify all pair-wise overlaps between reads, to de Bruijn graphs, which break reads into sequences of length k and collapse all k-mers into a hash table. [59] Overlap graphs were used with Sanger sequencing, but do not scale well to the millions of reads generated with RNA-Seq. Examples of assemblers that use de Bruijn graphs are Trinity, [58] Oases [60] (derived from the genome assembler Velvet[61] ), Bridger, [62] and rnaSPAdes. [63] Paired-end and long-read sequencing of the same sample can mitigate the deficits in short read sequencing by serving as a template or skeleton. Metrics to assess the quality of a de novo assembly include median contig length, number of contigs and N50. [64]
  • Genome guided: This approach relies on the same methods used for DNA alignment, with the additional complexity of aligning reads that cover non-continuous portions of the reference genome. [65] These non-continuous reads are the result of sequencing spliced transcripts (see figure). Typically, alignment algorithms have two steps: 1) align short portions of the read (i.e., seed the genome), and 2) use dynamic programming to find an optimal alignment, sometimes in combination with known annotations. Software tools that use genome-guided alignment include Bowtie, [66] TopHat (which builds on Bowtie results to align splice junctions), [67][68] Subread, [69] STAR, [65] HISAT2, [70] and GMAP. [71] The output of genome-guided alignment (mapping) tools can be further utilized by tools such as Cufflinks [68] or StringTie [72] to reconstruct contiguous transcript sequences (i.e., a FASTA file). The quality of a genome-guided assembly can be measured with both 1) de novo assembly metrics (e.g., N50) and 2) comparisons to known transcript, splice junction, genome, and protein sequences using precision, recall, or their combination (e.g., F1 score). [64] In addition, in silico assessment could be performed using simulated reads. [73][74]
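
Two ideas from the list above lend themselves to a small illustration: splitting reads into k-mers stored in a hash table (the first step of a de Bruijn graph assembler) and the N50 metric used to summarize an assembly. The following is only a minimal sketch in Python; the function names kmer_counts and n50 are made up for this example and do not come from Trinity, Velvet, or any other assembler named above.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Break each read into overlapping k-mers and tally them in a hash table."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def n50(contig_lengths):
    """Return the length L such that contigs of length >= L hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


# Toy inputs, invented for illustration only.
toy_kmers = kmer_counts(["ACGT", "CGTA"], k=3)
toy_n50 = n50([2, 3, 4, 5, 6])
```

A real assembler goes on to connect k-mers that overlap by k-1 bases and to resolve sequencing errors and isoforms, which this sketch does not attempt.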

A note on assembly quality: The current consensus is that 1) assembly quality can vary depending on which metric is used, 2) assembly tools that scored well in one species do not necessarily perform well in other species, and 3) combining different approaches might be the most reliable. [75] [76] [77]

Gene expression quantification

Expression is quantified to study cellular changes in response to external stimuli, differences between healthy and diseased states, and other research questions. Transcript levels are often used as a proxy for protein abundance, but these are often not equivalent due to post transcriptional events such as RNA interference and nonsense-mediated decay. [78]

Expression is quantified by counting the number of reads that mapped to each locus in the transcriptome assembly step. Expression can be quantified for exons or genes using contigs or reference transcript annotations. [8] These observed RNA-Seq read counts have been robustly validated against older technologies, including expression microarrays and qPCR. [55] [79] Tools that quantify counts are HTSeq, [80] FeatureCounts, [81] Rcount, [82] maxcounts, [83] FIXSEQ, [84] and Cuffquant. These tools determine read counts from aligned RNA-Seq data, but alignment-free counts can also be obtained with Sailfish [85] and Kallisto. [86] The read counts are then converted into appropriate metrics for hypothesis testing, regressions, and other analyses. Parameters for this conversion are:

  • Sequencing depth/coverage: Although depth is pre-specified when conducting multiple RNA-Seq experiments, it will still vary widely between experiments. [87] Therefore, the total number of reads generated in a single experiment is typically normalized by converting counts to fragments, reads, or counts per million mapped reads (FPM, RPM, or CPM). The difference between RPM and FPM was historically derived during the evolution from single-end sequencing of fragments to paired-end sequencing. In single-end sequencing, there is only one read per fragment (i.e., RPM = FPM). In paired-end sequencing, there are two reads per fragment (i.e., RPM = 2 x FPM). Sequencing depth is sometimes referred to as library size, the number of intermediary cDNA molecules in the experiment.
  • Gene length: Longer genes will have more fragments/reads/counts than shorter genes if transcript expression is the same. This is adjusted by dividing the FPM by the length of a feature in kilobases (the feature can be a gene, transcript, or exon), resulting in the metric fragments per kilobase of feature per million mapped reads (FPKM). [88] When looking at groups of features across samples, FPKM is converted to transcripts per million (TPM) by dividing each FPKM by the sum of FPKMs within a sample and multiplying by one million. [89][90][91]
  • Total sample RNA output: Because the same amount of RNA is extracted from each sample, samples with more total RNA will have less RNA per gene. These genes appear to have decreased expression, resulting in false positives in downstream analyses. [87] Normalization strategies including quantile, DESeq2, TMM and Median Ratio attempt to account for this difference by comparing a set of non-differentially expressed genes between samples and scaling accordingly. [92]
  • Variance for each gene's expression: Variance is modeled to account for sampling error (important for genes with low read counts), increase power, and decrease false positives. It can be estimated assuming a normal, Poisson, or negative binomial distribution [93][94][95] and is frequently decomposed into technical and biological variance.
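
The depth and length conversions described in the list above are simple arithmetic, sketched below in Python. The gene names, counts, and lengths are invented for illustration, and the function names cpm, fpkm, and tpm are not taken from any of the tools cited. The sketch treats every counted read or fragment as mapped and ignores the distinction between reads and fragments.

```python
def cpm(counts):
    """Counts per million: scale each count by the sample's total count."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def fpkm(counts, lengths_bp):
    """Per-million counts divided by feature length in kilobases."""
    per_million = cpm(counts)
    return {gene: per_million[gene] / (lengths_bp[gene] / 1000) for gene in counts}


def tpm(counts, lengths_bp):
    """Rescale FPKM so the values within one sample sum to one million."""
    f = fpkm(counts, lengths_bp)
    total = sum(f.values())
    return {gene: v * 1e6 / total for gene, v in f.items()}


# Hypothetical sample: raw counts and feature lengths in base pairs.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

sample_tpm = tpm(counts, lengths_bp)
# geneA and geneB end up with the same TPM: geneB has three times the
# counts but is also three times as long.
```

Because TPM values always sum to one million within a sample, they are easier to compare across samples than FPKM, whose per-sample sum varies.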

Spike-ins for absolute quantification and detection of genome-wide effects

RNA spike-ins are samples of RNA at known concentrations that can be used as gold standards in experimental design and during downstream analyses for absolute quantification and detection of genome-wide effects.

  • Absolute quantification: Absolute quantification of gene expression is not possible with most RNA-Seq experiments, which quantify expression relative to all transcripts. It is possible by performing RNA-Seq with spike-ins, samples of RNA at known concentrations. After sequencing, read counts of spike-in sequences are used to determine the relationship between each gene's read counts and absolute quantities of biological fragments. [11][96] In one example, this technique was used in Xenopus tropicalis embryos to determine transcription kinetics. [97]
  • Detection of genome-wide effects: Changes in global regulators including chromatin remodelers, transcription factors (e.g., MYC), acetyltransferase complexes, and nucleosome positioning are not congruent with normalization assumptions, and spike-in controls can offer precise interpretation. [98][99]
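
As a rough illustration of how spike-in read counts can be turned into a calibration curve, the Python sketch below fits a straight line in log-log space and uses it to convert a gene's read count into an absolute amount. The log-linear model, the spike-in amounts, the counts, and the function names fit_loglog and estimate_amount are all assumptions made for this example, not a description of the cited studies.

```python
import math


def fit_loglog(known_amounts, read_counts):
    """Least-squares fit of log10(amount) = slope * log10(count) + intercept."""
    xs = [math.log10(c) for c in read_counts]
    ys = [math.log10(a) for a in known_amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(read_count, slope, intercept):
    """Apply the calibration line to a gene's read count."""
    return 10 ** (slope * math.log10(read_count) + intercept)


# Hypothetical spike-ins: known input amounts (arbitrary units) and observed reads.
spike_amounts = [1, 10, 100, 1000]
spike_reads = [20, 200, 2000, 20000]

slope, intercept = fit_loglog(spike_amounts, spike_reads)
gene_amount = estimate_amount(500, slope, intercept)  # about 25 units for this toy curve
```

In practice the calibration also has to account for transcript length and for spike-ins that fall outside the linear range, which this sketch ignores.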

Differential expression

The simplest but often most powerful use of RNA-Seq is finding differences in gene expression between two or more conditions (e.g., treated vs. not treated); this process is called differential expression. The outputs are frequently referred to as differentially expressed genes (DEGs), and these genes can be either up- or down-regulated (i.e., higher or lower in the condition of interest). There are many tools that perform differential expression. Most are run in R, Python, or the Unix command line. Commonly used tools include DESeq, [94] edgeR, [95] and voom+limma, [93] [100] all of which are available through R/Bioconductor. [101] [102] These are the common considerations when performing differential expression:

  • Inputs: Differential expression inputs include (1) an RNA-Seq expression matrix (M genes x N samples) and (2) a design matrix containing experimental conditions for N samples. The simplest design matrix contains one column, corresponding to labels for the condition being tested. Other covariates (also referred to as factors, features, labels, or parameters) can include batch effects, known artifacts, and any metadata that might confound or mediate gene expression. In addition to known covariates, unknown covariates can also be estimated through unsupervised machine learning approaches including principal component, surrogate variable, [103] and PEER [56] analyses. Hidden variable analyses are often employed for human tissue RNA-Seq data, which typically have additional artifacts not captured in the metadata (e.g., ischemic time, sourcing from multiple institutions, underlying clinical traits, collecting data across many years with many personnel).
  • Methods: Most tools use regression or non-parametric statistics to identify differentially expressed genes, and are either based on read counts mapped to a reference genome (DESeq2, limma, edgeR) or based on read counts derived from alignment-free quantification (sleuth, [104] Cuffdiff, [105] Ballgown [106] ). [107] Following regression, most tools employ either familywise error rate (FWER) or false discovery rate (FDR) p-value adjustments to account for multiple hypotheses (in human studies,

20,000 protein-coding genes or

Downstream analyses for a list of differentially expressed genes come in two flavors, validating observations and making biological inferences. Owing to the pitfalls of differential expression and RNA-Seq, important observations are replicated with (1) an orthogonal method in the same samples (like real-time PCR) or (2) another, sometimes pre-registered, experiment in a new cohort. The latter helps ensure generalizability and can typically be followed up with a meta-analysis of all the pooled cohorts. The most common method for obtaining higher-level biological understanding of the results is gene set enrichment analysis, although sometimes candidate gene approaches are employed. Gene set enrichment determines if the overlap between two gene sets is statistically significant, in this case the overlap between differentially expressed genes and gene sets from known pathways/databases (e.g., Gene Ontology, KEGG, Human Phenotype Ontology) or from complementary analyses in the same data (like co-expression networks). Common tools for gene set enrichment include web interfaces (e.g., ENRICHR, g:profiler, WEBGESTALT) [114] and software packages. When evaluating enrichment results, one heuristic is to first look for enrichment of known biology as a sanity check and then expand the scope to look for novel biology.
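
One common way to test whether the overlap between two gene sets is statistically significant is a one-sided hypergeometric test. The Python sketch below shows only that calculation; it is not the method of any particular tool named above, and the function name enrichment_p_value and the numbers in the example are invented for illustration.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_de, n_overlap):
    """Probability of seeing at least n_overlap gene-set members among n_de
    genes drawn without replacement from n_background genes, of which
    n_in_set belong to the gene set (hypergeometric upper tail)."""
    upper = min(n_in_set, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 4 in the pathway,
# 3 differentially expressed genes, all 3 of them in the pathway.
p = enrichment_p_value(n_background=10, n_in_set=4, n_de=3, n_overlap=3)
```

When many gene sets are tested, the resulting p-values need the same kind of multiple-testing adjustment that is applied to the differential expression results themselves.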

Alternative splicing

RNA splicing is integral to eukaryotes and contributes significantly to protein regulation and diversity, occurring in >90% of human genes. [115] There are multiple alternative splicing modes: exon skipping (most common splicing mode in humans and higher eukaryotes), mutually exclusive exons, alternative donor or acceptor sites, intron retention (most common splicing mode in plants, fungi, and protozoa), alternative transcription start site (promoter), and alternative polyadenylation. [115] One goal of RNA-Seq is to identify alternative splicing events and test if they differ between conditions. Long-read sequencing captures the full transcript and thus minimizes many of the issues in estimating isoform abundance, like ambiguous read mapping. For short-read RNA-Seq, there are multiple methods to detect alternative splicing that can be classified into three main groups: [116] [89] [117]

  • Count-based (also event-based, differential splicing): estimate exon retention. Examples are DEXSeq, [118] MATS, [119] and SeqGSEA. [120]
  • Isoform-based (also multi-read modules, differential isoform expression): estimate isoform abundance first, and then relative abundance between conditions. Examples are Cufflinks 2 [121] and DiffSplice. [122]
  • Intron excision based: calculate alternative splicing using split reads. Examples are MAJIQ [123] and Leafcutter. [117]

Differential gene expression tools can also be used for differential isoform expression if isoforms are quantified ahead of time with other tools like RSEM. [124]

Coexpression networks

Coexpression networks are data-derived representations of genes behaving in a similar way across tissues and experimental conditions. [125] Their main purpose lies in hypothesis generation and guilt-by-association approaches for inferring functions of previously unknown genes. [125] RNA-Seq data has been used to infer genes involved in specific pathways based on Pearson correlation, both in plants [126] and mammals. [127] The main advantage of RNA-Seq data over microarray platforms in this kind of analysis is the capability to cover the entire transcriptome, which allows more complete representations of the gene regulatory networks to be unraveled. Differential regulation of the splice isoforms of the same gene can be detected and used to predict their biological functions. [128] [129] Weighted gene co-expression network analysis has been successfully used to identify co-expression modules and intramodular hub genes based on RNA-Seq data. Co-expression modules may correspond to cell types or pathways. Highly connected intramodular hubs can be interpreted as representatives of their respective module. An eigengene is a weighted sum of expression of all genes in a module. Eigengenes are useful biomarkers (features) for diagnosis and prognosis. [130] Variance-stabilizing transformation approaches for estimating correlation coefficients based on RNA-Seq data have been proposed. [126]
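
The Pearson-correlation step behind a simple coexpression network can be sketched as follows. This is an illustrative Python example only: the expression values, the 0.9 cutoff, and the function names pearson and coexpression_edges are assumptions, and it is not the weighted network method mentioned above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.
    Assumes neither profile is constant (otherwise the denominator is zero)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Link every gene pair whose absolute correlation reaches the threshold."""
    genes = sorted(expression)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expression[g1], expression[g2])
            if abs(r) >= threshold:
                edges.append((g1, g2, r))
    return edges


# Hypothetical expression values for three genes across four samples.
expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 1, 3, 2],
}
edges = coexpression_edges(expression)  # only geneA and geneB are linked
```

With only four samples any correlation is very noisy; real analyses use many samples and usually transform the counts first.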

Variant discovery

RNA-Seq captures DNA variation, including single nucleotide variants, small insertions/deletions, and structural variation. Variant calling in RNA-Seq is similar to DNA variant calling and often employs the same tools (including SAMtools mpileup [131] and GATK HaplotypeCaller [132] ) with adjustments to account for splicing. One unique dimension for RNA variants is allele-specific expression (ASE): the variants from only one haplotype might be preferentially expressed due to regulatory effects including imprinting, expression quantitative trait loci, and noncoding rare variants. [133] [134] Limitations of RNA variant identification include that it only reflects expressed regions (in humans, <5% of the genome), could be subject to biases introduced by data processing (e.g., de novo transcriptome assemblies underestimate heterozygosity [135] ), and has lower quality when compared to direct DNA sequencing.

RNA editing (post-transcriptional alterations)

Having the matching genomic and transcriptomic sequences of an individual can help detect post-transcriptional edits (RNA editing). [3] A post-transcriptional modification event is identified if the gene's transcript has an allele/variant not observed in the genomic data.
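
The comparison described above amounts to a per-site set difference between the alleles seen in RNA and those seen in DNA. The Python sketch below shows that logic on invented data; the function name candidate_edits and the input layout (a dictionary from site to set of alleles) are assumptions for illustration, and a real pipeline would also have to filter out sequencing and alignment errors.

```python
def candidate_edits(dna_alleles, rna_alleles):
    """Return RNA alleles that are absent from the genomic calls at the same site.
    Both inputs map a (chromosome, position) site to the set of alleles observed."""
    candidates = {}
    for site, rna in rna_alleles.items():
        if site not in dna_alleles:
            continue  # no genomic call here, so nothing can be concluded
        novel = rna - dna_alleles[site]
        if novel:
            candidates[site] = novel
    return candidates


# Hypothetical calls from matched DNA and RNA of one individual.
dna = {("chr1", 100): {"A"}, ("chr1", 200): {"C", "T"}}
rna = {("chr1", 100): {"A", "G"}, ("chr1", 200): {"C"}, ("chr2", 5): {"T"}}

edits = candidate_edits(dna, rna)  # only the G at chr1:100 is flagged
```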

Fusion gene detection

Caused by different structural modifications in the genome, fusion genes have gained attention because of their relationship with cancer. [136] The ability of RNA-Seq to analyze a sample's whole transcriptome in an unbiased fashion makes it an attractive tool to find these kinds of common events in cancer. [4]

The idea follows from the process of aligning the short transcriptomic reads to a reference genome. Most of the short reads will fall within one complete exon, and a smaller but still large set would be expected to map to known exon-exon junctions. The remaining unmapped short reads would then be further analyzed to determine whether they match an exon-exon junction where the exons come from different genes. This would be evidence of a possible fusion event; however, because the reads are short, this evidence could prove to be very noisy. An alternative approach is to use paired-end reads, in which case a potentially large number of read pairs would map each end to a different exon, giving better coverage of these events (see figure). Nonetheless, the end result consists of multiple and potentially novel combinations of genes, providing an ideal starting point for further validation.
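
The paired-end approach can be illustrated by counting read pairs whose two mates map to different genes. The following Python sketch assumes each pair has already been reduced to the gene hit by each mate; the input format, the min_pairs cutoff, and the function name fusion_candidates are made up for this example and do not correspond to any specific fusion caller.

```python
from collections import Counter


def fusion_candidates(read_pairs, min_pairs=2):
    """Count read pairs whose mates map to two different genes.
    read_pairs holds (gene_of_mate1, gene_of_mate2); None marks an unmapped mate."""
    support = Counter()
    for gene1, gene2 in read_pairs:
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue  # unmapped mate or ordinary same-gene pair
        # Sort so that (X, Y) and (Y, X) support the same candidate.
        support[tuple(sorted((gene1, gene2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_pairs}


# Illustrative input only.
pairs = [
    ("BCR", "ABL1"),
    ("ABL1", "BCR"),
    ("TP53", "TP53"),
    ("MYC", None),
    ("GENE_X", "GENE_Y"),
]
candidates = fusion_candidates(pairs)  # one candidate, supported by two pairs
```

Requiring more than one supporting pair is a crude guard against the noise mentioned above; the candidates still need validation.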

RNA-Seq was first developed in the mid 2000s with the advent of next-generation sequencing technology. [139] The first manuscripts that used RNA-Seq, even without using the term, include those on prostate cancer cell lines [140] (dated 2006), Medicago truncatula [141] (2006), maize [142] (2007), and Arabidopsis thaliana [143] (2007), while the term "RNA-Seq" itself was first mentioned in 2008. [144] The number of manuscripts referring to RNA-Seq in the title or abstract (Figure, blue line) is continuously increasing, with 6754 manuscripts published in 2018. The intersection of RNA-Seq and medicine (Figure, gold line) has similar celerity. [145]

Applications to medicine

RNA-Seq has the potential to identify new disease biology, profile biomarkers for clinical indications, infer druggable pathways, and make genetic diagnoses. These results could be further personalized for subgroups or even individual patients, potentially highlighting more effective prevention, diagnostics, and therapy. The feasibility of this approach is in part dictated by costs in money and time; a related limitation is the team of specialists (bioinformaticians, physicians/clinicians, basic researchers, technicians) required to fully interpret the huge amount of data generated by this analysis. [146]

Large-scale sequencing efforts

A lot of emphasis has been given to RNA-Seq data since the Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA) projects used this approach to characterize dozens of cell lines [147] and thousands of primary tumor samples, [148] respectively. ENCODE aimed to identify genome-wide regulatory regions in different cohorts of cell lines, and transcriptomic data are paramount for understanding the downstream effect of those epigenetic and genetic regulatory layers. TCGA, instead, aimed to collect and analyze thousands of patient samples from 30 different tumor types in order to understand the underlying mechanisms of malignant transformation and progression. In this context, RNA-Seq data provide a unique snapshot of the transcriptomic status of the disease and look at an unbiased population of transcripts, which allows the identification of novel transcripts, fusion transcripts, and non-coding RNAs that could go undetected with other technologies.

This article was submitted to WikiJournal of Science for external academic peer review in 2019 (reviewer reports). The updated content was reintegrated into the Wikipedia page under a CC-BY-SA-3.0 license ( 2021 ). The version of record as reviewed is: Felix Richter et al. (17 May 2021). "A broad introduction to RNA-Seq". WikiJournal of Science. 4 (2): 4. doi:10.15347/WJS/2021.004. ISSN 2470-6345. Wikidata Q100146647.

Endogenous Reverse Transcriptase Could Allow mRNA Vaccines to Permanently Alter DNA

Research on SARS-CoV-2 RNA by scientists at Harvard and MIT has implications for how mRNA vaccines could permanently alter genomic DNA, according to Doug Corrigan, Ph.D., a biochemist-molecular biologist who says more research is needed.

Over the past year, it would be all but impossible for Americans not to notice the media’s decision to make vaccines the dominant COVID narrative, rushing to do so even before any coronavirus-attributed deaths occurred.

The media’s slanted coverage has provided a particularly fruitful public relations boost for messenger RNA (mRNA) vaccines — decades in the making but never approved for human use — helping to usher the experimental technology closer to the regulatory finish line.

Under ordinary circumstances, the body makes (“transcribes”) mRNA from the DNA in a cell’s nucleus. The mRNA then travels out of the nucleus into the cytoplasm, where it provides instructions about which proteins to make.

By comparison, mRNA vaccines send their chemically synthesized mRNA payload (bundled with spike protein-manufacturing instructions) directly into the cytoplasm.

According to the Centers for Disease Control and Prevention (CDC) and most mRNA vaccine scientists, the buck then stops there — mRNA vaccines “do not affect or interact with our DNA in any way,” the CDC says. The CDC asserts first, that the mRNA cannot enter the cell’s nucleus (where DNA resides), and second, that the cell — Mission-Impossible-style — “gets rid of the mRNA soon after it is finished using the instructions.”

A December preprint about SARS-CoV-2, by scientists at Harvard and Massachusetts Institute of Technology (MIT), produced findings about wild coronavirus that raise questions about how viral RNA operates.

The scientists conducted the analysis because they were “puzzled by the fact that there is a respectable number of people who are testing positive for COVID-19 by PCR long after the infection was gone.”

Their key findings were as follows: SARS-CoV-2 RNAs “can be reverse transcribed in human cells,” “these DNA sequences can be integrated into the cell genome and subsequently be transcribed” (a phenomenon called “retro-integration”) — and there are viable cellular pathways to explain how this happens.

According to Ph.D. biochemist and molecular biologist Dr. Doug Corrigan, these important findings (which run contrary to “current biological dogma”) belong to the category of “Things We Were Absolutely and Unequivocally Certain Couldn’t Happen Which Actually Happened.”

The findings of the Harvard and MIT researchers also put the CDC’s assumptions about mRNA vaccines on shakier ground, according to Corrigan. In fact, a month before the Harvard-MIT preprint appeared, Corrigan had already written a blog outlining possible mechanisms and pathways whereby mRNA vaccines could produce the identical phenomenon.

In a second blog post, written after the preprint came out, Corrigan emphasized that the Harvard-MIT findings about coronavirus RNA have major implications for mRNA vaccines — a fact he describes as “the big elephant in the room.” While not claiming that vaccine RNA will necessarily behave in the same way as coronavirus RNA — that is, permanently altering genomic DNA — Corrigan believes that the possibility exists and deserves close scrutiny.

In Corrigan’s view, the preprint’s contribution is that it “validates that this is at least plausible, and most likely probable.”

Reverse transcription

As the phrase “reverse transcription” implies, the DNA-to-mRNA pathway is not always a one-way street. Enzymes called reverse transcriptases can also convert RNA into DNA, allowing the latter to be integrated into the DNA in the cell nucleus.

Nor is reverse transcription uncommon. Geneticists report that “Over 40% of mammalian genomes comprise the products of reverse transcription.”

The preliminary evidence cited by the Harvard-MIT researchers indicates that endogenous reverse transcriptase enzymes may facilitate reverse transcription of coronavirus RNAs and trigger their integration into the human genome.

The authors suggest that while the clinical consequences require further study, detrimental effects are a distinct possibility and — depending on the integrated viral fragments’ “insertion sites in the human genome” and an individual’s underlying health status — could include “a more severe immune response … such as a ‘cytokine storm’ or auto-immune reactions.”

In 2012, a study suggested that viral genome integration could “lead to drastic consequences for the host cell, including gene disruption, insertional mutagenesis and cell death.”

Corrigan makes a point of saying that the pathways hypothesized to facilitate retro-integration of viral — or vaccine — RNA into DNA “are not unknown to people who understand molecular biology at a deeper level.”

Even so, the preprint’s discussion of reverse transcription and genome integration elicited a maelstrom of negative comments from readers unwilling to rethink biological dogma, some of whom even advocated for retraction (though preprints are, by definition, unpublished) on the grounds that “conspiracy theorists … will take this paper to ‘proof’ that mRNA vaccines can in fact alter your genetic code.”

More thoughtful readers agreed with Corrigan that the paper raises important questions. For example, one reader stated that confirmatory evidence is lacking “to show that the spike protein only is expressed for a short amount of time (say 1-3 days) after vaccination,” adding, “We think that this is the case, but there is no evidence for that.”

In fact, just how long the vaccines’ synthetic mRNA — and thus the instructions for cells to keep manufacturing spike protein — persist inside the cells is an open question.

Ordinarily, RNA is a “notoriously fragile” and unstable molecule. According to scientists, “this fragility is true of the mRNA of any living thing, whether it belongs to a plant, bacteria, virus or human.”

But the synthetic mRNA in the COVID vaccines is a different story. In fact, the step that ultimately allowed scientists and vaccine manufacturers to resolve their decades-long mRNA vaccine impasse was when they figured out how to chemically modify mRNA to increase its stability and longevity — in other words, produce RNA “that hangs around in the cell much longer than viral RNA, or even RNA that our cell normally produces for normal protein production.”

It is anyone’s guess what the synthetic mRNA is doing while it is “hanging around,” but Corrigan speculates that its enhanced longevity raises the probability of it “being converted over into DNA.”

Moreover, because the vaccine mRNA is also engineered to be more efficient at being translated into protein, “negative effects could be more frequent and more pronounced with the vaccine when compared to the natural virus.” ….


Experimental Design

To study the RT error and RDD rates at STRs, we designed the following experiment ( fig. 1). We isolated genomic DNA and total RNA from the same sample (orangutan testis of a single individual). The genomic DNA was sequenced using two different library preparation protocols—PCR-containing and PCR-free (see Materials and Methods section for details)—allowing us to test for genotype congruence between the two libraries (see “Genotyping STRs Using the DNA Sequencing Data” in Results). Total RNA was divided into two aliquots that were used to construct two separate RNA-seq libraries. Each of these two libraries was sequenced in two separate batches. Such an experiment, ideally, should allow one to differentiate between RDDs (such differences from the DNA sequence should be present in both RNA-seq libraries) and RT errors (such variants should be present in only one of the two RNA-seq libraries but in both sequencing batches). However, empirical data frequently have missing information at some loci due to limited sampling, which can distort results. For example, if a deviant STR variant is not sampled in one cDNA library, then an RT error can be incorrectly inferred instead of an RDD. For instance, if one-tenth of RNA molecules at a locus was modified from (A)6 to (A)7 due to RDD, then we should expect to observe (A)7 in both replicated cDNA libraries sequenced. However, if (A)7 was not sampled in one library, then we will observe (A)7 only in the other library, thereby misclassifying this situation as an RT error. Therefore, we developed a full likelihood method that permits sampling errors in the likelihood calculation to avoid error misclassifications.
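
The misclassification risk described above can be put into numbers with a back-of-the-envelope calculation. The Python sketch below is not the authors' full likelihood method: it assumes reads at a locus are sampled independently, and the names miss_probability and naive_call as well as the read depth of 10 are hypothetical.

```python
def miss_probability(variant_fraction, n_reads):
    """Chance that none of n_reads independently sampled reads carries a
    variant present in variant_fraction of the RNA molecules at the locus."""
    return (1 - variant_fraction) ** n_reads


def naive_call(in_library_1, in_library_2):
    """Classification by presence or absence alone, ignoring sampling."""
    if in_library_1 and in_library_2:
        return "RDD"
    if in_library_1 or in_library_2:
        return "RT error"
    return "no variant"


# The (A)6 to (A)7 example: one-tenth of molecules modified, 10 reads at the locus.
p_miss = miss_probability(0.1, 10)  # roughly 0.35
# If the variant is missed in one library, the naive rule calls a true RDD an RT error.
label = naive_call(True, False)
```

At low read depth a true RDD is therefore missed in a given library about a third of the time in this toy setting, which is why a method that models sampling explicitly is needed.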

cDNA cloning and library construction

One of the first applications of reverse transcriptase in molecular biology was the construction of cDNA libraries [2-4]. A cDNA library consists of cDNA clones that represent the transcribed sequences within a specific sample. Therefore, a library provides information about the temporal and spatial expression of genes for a given cell type, organ, or developmental stage, for example. The cDNA library clones are used in the characterization of novel RNA transcripts, determination of gene sequences, and expression of recombinant proteins.

Essential in constructing cDNA libraries is the proper representation of RNAs in their full length and/or their relative abundance, making the selection of a reverse transcriptase extremely important. Highly processive reverse transcriptases are capable of synthesizing long cDNAs as well as capturing low-abundance RNAs. Similarly, reverse transcriptases with increased thermostability are recommended for reverse-transcribing RNA with a high degree of secondary structure.

After reverse transcription, a number of approaches may be used to insert cDNA into a vector for cloning. The double-stranded cDNAs after second-strand synthesis often have blunt ends and can be cloned into blunt-ended vectors (Figure 5A). Although this approach involves fewer steps, blunt-end cloning may result in less efficient ligation and loss of directionality after insertion.

Alternatively, cDNA ends may be modified to include additional nucleotides of known sequences. For example, to modify the 5′ end of cDNA, oligo(dT) primers with additional 5′ nucleotides can be used to initiate reverse transcription; to modify the 3′ end, short DNA oligos called linkers or adapters with desired sequences may be ligated (Figure 5B). In this manner, sites for directional insertion (e.g., restriction and homologous recombination), promoter binding (e.g., T3 and T7 sequences), and affinity purification (e.g., biotin and His tags) can be readily incorporated into the cDNA sequence.

In another popular strategy, the 3′ ends of cDNA inserts and vectors are enzymatically extended with complementary homopolymeric tails. Using terminal deoxynucleotidyl transferase (TdT) and a single dNTP, a string of 20–30 nucleotides can be added to an insert, and a similar string of complementary nucleotides added to a vector (e.g., Cs on the insert and Gs on the vector), enabling the vector and insert tails to anneal to each other (Figure 5C). Ligation is not required because the gaps are repaired inside the bacteria after transformation.

When the target sequence is known, the insert may be generated by RT-PCR for cloning of a specific region of a cDNA (Figure 5D).


Real-time RT-PCR is extremely powerful and can generate reliable, reproducible, and biologically meaningful results. However, this brief review of some of the underlying problems should also have made it clear that great care must be taken in planning and analyzing real-time RT-PCR assays. We have barely touched on the problems of normalization and reference genes (previously known as housekeeping genes), and have not mentioned “absolute” versus relative quantification or the need for standard curves and how they should be generated. Because the reporting of Ct values alone can conceal as much as it reports, we believe it is necessary to begin a concerted effort to introduce more standard analysis and reporting procedures, as has been done for microarray technology in the establishment of the MIAME guidelines. Certainly, in the absence of such standards for real-time RT-PCR, it falls to the editors of journals to ensure that papers that include this technology are appropriately reviewed, and that any conclusions are rigorously supported by the actual data.

For the researcher, it is vital to consider each stage of the experimental protocol, starting with the laboratory setup and proceeding through sample acquisition, template preparation, RT, and finally the PCR step. Only if every one of these stages is properly validated is it possible to obtain reliable quantitative data. Of course, choice of chemistries, primers and probes, and instruments must be appropriate to whatever is being quantitated. Finally, data must be interpreted, and this remains a real problem. Clearly, real-time qPCR is a valuable, versatile, and powerful technique. But, like anything powerful, it needs to be treated with respect.
