Are there eukaryotes without introns?

This question on the function of introns in eukaryotic genes made me think: I know that more basal organisms have smaller introns and fewer alternatively spliced exons compared to mammals. But are there eukaryotes whose genomes have no introns at all?

I don't believe so. I have never come across a eukaryote that lacks introns entirely, although some individual genes do not contain introns. Some eukaryotes, such as ciliates, also contain other, non-intronic sequences that appear to be nonfunctional. These are called internally eliminated sequences (IESs), and they are removed from the germline micronuclear genome when the transcriptionally active macronucleus is formed. Interestingly, introns are also present in these organisms, in both the micronucleus and the macronucleus, and they are spliced out in textbook fashion. I believe one of the organisms in which introns were discovered early on was Tetrahymena thermophila, where an intron was found in the rDNA.

Why Do Prokaryote Genomes Lack Genes with Introns Processed by Spliceosomes?

Until 1977, we thought that eukaryotic genes were like those of prokaryotes, that is, continuous sequences beginning with an initiation codon (ATG), followed by an open reading frame whose length is always a multiple of three bases (codons), with the message (mRNA) ending when a stop codon (TAA, TAG or TGA) was reached. This paradigm, which historically seemed completely logical, changed dramatically when two groups, led by Sharp and Roberts (Berget et al. 1977; Chow et al. 1977), discovered that (at least some) eukaryotic protein-coding genes were interrupted by non-coding sequences that are removed from the mRNA before translation. Then, Gilbert (1978) coined the terms exons (the regions of a gene that remain in the mature mRNA) and introns (the regions that are removed from the mature mRNA and are therefore not represented in the encoded proteins).

Today, there is no doubt that this discovery was a revolution in genetics. It not only challenged our previous definition of what a “gene” is, but also led to discoveries and concepts such as splicing (how introns are removed and exons are joined to make the mature mRNA), alternative splicing (how different exons from the same “gene” can be combined to make different proteins), and the great discovery that some RNAs, once transcribed, can remove their own introns, a mechanism known as autosplicing (see, for instance, Bass and Cech 1986; Cech and Bass 1986; Guerrier-Takada and Altman 1986). In turn, the discovery of autosplicing not only reinforced the idea of the “RNA world” (for a review see Lehman 2015) but also overturned, for good, the once-unanimous idea that proteins were the only catalytic molecules.

Therefore, the presence of introns in the majority of eukaryotic genes has challenged most of our concepts about genes, their regulation, their evolution, what an enzyme is… and, last but not least, why the human genome probably contains fewer than 20,000 genes while a “simple” organism like Escherichia coli has around 4000. In other words, introns, and the way they are removed from mature mRNAs, have changed our concepts of molecular biology and evolution.

But given what we have said above (which is of course not intended as a review of the subject), there is a problem that, in our opinion, deserves some attention. As is well known, prokaryotes are enormously diverse, display different metabolic routes and lifestyles, and occupy all known environments. Why, then, did they never develop introns processed by spliceosomes? In the next few lines we propose an explanation.

Of course, the simplest explanation is that, because transcription and translation are coupled in prokaryotes, such a system would be a disadvantage from an evolutionary point of view. Furthermore, introns would first have to be present, and there is no evidence that this was the case. So far, nothing new. But there is a point that, in our opinion, is very important.

As is known, the modern spliceosome, in its simplest form, is a complex of no fewer than ten different proteins and several RNAs. Let us imagine that this complex, or an even simpler one, evolved in a prokaryote and, for some reason (perhaps because it allowed different genes to be combined into new, longer ones, or because primitive introns already existed), became fixed. Such a scenario is difficult to imagine, but let us assume that it indeed happened, for example by “putting together” pieces of different genes from the same operon. This could be an advantage, because pieces from different mRNAs could be combined to produce new proteins with different, but related, functions. There is no biological constraint that would prevent this. Moreover, it would be a new way to create new genes and, consequently, new functions. We stress that this scenario is hard to imagine only because (as far as we know) it did not happen. But if an example were found, it should not come as a big surprise.

However, in our opinion it did not happen for another reason: one of the main forces in the evolution of prokaryotes is horizontal gene transfer (HGT; see, for example, Puigbò et al. 2010). As is known, for a gene (or group of genes) to become fixed, several biochemical and evolutionary “steps” must be fulfilled, among them: (a) being transferred as a unit, (b) carrying (or being integrated near) a promoter, (c) not disturbing the normal functions of the recipient, and (d) conferring a selective advantage.

Very probably, a putative “primitive spliceosome” (PS) was not as complex as the modern one. But in any case, it would have been a rather complex particle, composed of several proteins and RNAs. For it to be transferred successfully, several conditions are needed: (a) All the components of the PS would have to be transferred simultaneously, which is hard to imagine, because we would need to postulate a large “PS operon”, which probably did not exist as such; therefore, multiple transfer events would need to be invoked, which is very unlikely. (b) Introns could not be present in the recipient (otherwise, it would already have a PS), and if they did not exist, the PS machinery would very probably be extremely harmful to the recipient and eliminated from the population, because the recipient's genes would not be adapted to the action of the xeno-PS. (c) Even if (a) and (b) were disregarded (which of course is more than unlikely), new genes acquired by HGT by the recipient of the PS would be negatively affected by the acquired PS. Hence, new HGT events would be eliminated in the recipient, thereby removing one of the major forces in evolution.

Hence, we conclude that in the same manner that HGT was one of the main factors that contributed to fix the universal genetic code as postulated by Vetsigian et al. (2006), it could be a major force inhibiting the appearance (and fixation) of introns processed by spliceosomes among prokaryotes.

Gene Regulation in Eukaryotes

Let us make an in-depth study of gene regulation in eukaryotes. After reading this article you will learn about:
1. Chromatin Modification
2. Control of Transcription by Hormones
3. Regulation of Processing of mRNA
4. Control of Life Span of mRNA
5. Gene Amplification
6. Post-Translational Regulation
7. Post-Transcriptional Gene Silencing

Introduction to Gene Regulation:

The expression of genes in eukaryotes can be regulated by all the same principles as in prokaryotes. But eukaryotes have many additional mechanisms for controlling gene expression, as their genomes are much bigger. The genes are located in the nucleus, where mRNA is synthesized. The mRNA is then exported to the cytoplasm, where translation takes place.

In multicellular eukaryotes, the body is organized into specialized tissues and organs. The cells are differentiated, and the cells of a given tissue express a particular set of genes to produce tissue-specific proteins. Many other genes are kept switched off in that cell type and are not transcribed.

Structural features of eukaryotes that influence gene expression include the packaging of chromatin into nucleosomes, heterochromatin, and the presence of split genes in chromosomes.

Compared with prokaryotic genes, eukaryotic genes have many more regulatory binding sites and are controlled by many more regulatory proteins. Regulatory sequences can be located thousands of nucleotides away from the promoter, either upstream or downstream, and act from a distance: the intervening DNA loops out, so that the regulatory sequence and the promoter come to lie near each other.

Most gene regulation occurs at the level of transcription initiation. Regulation of translation initiation also has a major influence on gene expression.

Chromatin Modification:

The genome of eukaryotes is wrapped around histone proteins to form nucleosomes. This packaging partially conceals genes and reduces their expression.

The packing of DNA around histone octamers is not permanent. Any portion of DNA can be released from the octamer whenever DNA-binding proteins have to act on it. These DNA-binding proteins or enzymes recognize their binding sites only when the DNA is released from the histone octamer or lies in linker DNA, so the DNA must first be unwrapped from the nucleosomes.

This unwrapping of DNA from nucleosomes is performed by nucleosome-modifying enzymes or nucleosome-remodelling complexes, which act in various ways. They may remodel the structure of the octamer or slide the octamer along the DNA, thus uncovering DNA binding sites for regulatory proteins and activating the genes.

Some of these nucleosome modifiers add acetyl groups to the tails of histones (acetylation), which loosens the DNA wrapping and exposes DNA binding sites, promoting gene expression. Conversely, deacetylation by histone deacetylases tightens the wrapping and represses transcription.

Nucleosomes are largely absent from some very actively transcribed regions, such as rRNA genes.

The dense form of chromatin in eukaryotes is called heterochromatin. It leads to gene inhibition or gene silencing, because densely packaged chromatin cannot be easily transcribed. Some enzymes make chromatin denser. Telomeres and centromeres are in the form of heterochromatin.

In higher animals about 50% of the genome is in the form of heterochromatin. Enzymes are capable of changing the density of chromatin by chemically modifying the tails of histones. This affects transcription.

In this way, both activation and repression of transcription are achieved by converting chromatin between heterochromatin and euchromatin.

Methylation of certain DNA sequences prevents the transcription of genes in mammals. Genes that are heavily methylated are generally not transcribed and are therefore not expressed. DNA methyltransferase enzymes methylate certain DNA sequences, thereby silencing genes.

Control of Transcription by Hormones:

Various intercellular and intracellular signals regulate the gene expression.

Hormones exercise considerable control over transcription. Hormones are extracellular substances synthesized by endocrine glands. They are carried to the distant target cells. Various hormones like insulin, estrogen, progesterone, testosterone etc. often act by “switching on” transcription of DNA.

On entering a target cell, the hormone forms a complex with its receptor in the cytoplasm. This hormone-receptor complex enters the nucleus and binds to particular sites on the chromosomes by means of specific proteins, which initiates transcription. Depending on the gene, the hormone-receptor complex can enhance or suppress expression.

It has been observed in chickens that when the hormone estrogen is injected, the oviduct responds by synthesizing ovalbumin mRNA, which directs the synthesis of the main egg-white (albumen) protein. The estrogen-receptor complex binds to DNA and acts as an inducer.

Regulation of Processing of mRNA:

Genes of eukaryotes have non-coding regions (introns) between coding regions (exons); such genes are called split genes. The entire gene is transcribed to produce a precursor mRNA, or primary transcript (pre-mRNA). Before translation, the introns are excised and discarded and the exons are joined. This is known as mRNA processing, and the processed mRNA, called mature mRNA, is used in protein synthesis. Mature mRNA is considerably shorter than the pre-mRNA.

Higher eukaryotes have various mechanisms by which a pre-mRNA can be processed in alternative ways to produce different mRNAs that encode different proteins. In this way, multiple proteins are produced from one gene, and many cells use different splicing pathways to alter gene expression and synthesize different polypeptides. Alternative splicing thus increases the number of proteins encoded by a single eukaryotic gene.

Alternative processing of pre-mRNA is accomplished by mechanisms such as exon skipping and intron retention.
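As a toy illustration of the processing described above, the sketch below treats a pre-mRNA as a plain string and exon boundaries as 0-based, half-open coordinates (both assumptions made only for illustration). Splicing keeps the exon segments in order and discards everything between them; exon skipping is modelled by dropping internal exons. Intron retention could be represented by passing a single exon range that spans the retained intron. The function names are hypothetical.

```python
from itertools import combinations

def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments in order; the sequence between them (introns) is discarded.

    Exons are 0-based, half-open (start, end) coordinates on the pre-mRNA.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

def exon_skipping_isoforms(pre_mrna: str, exons: list[tuple[int, int]]) -> list[str]:
    """List mature mRNAs that keep the first and last exons plus any subset of internal exons."""
    if len(exons) < 2:
        return [splice(pre_mrna, exons)]
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal), -1, -1):  # keep all internal exons first, then fewer
        for kept in combinations(internal, k):
            isoforms.append(splice(pre_mrna, [first, *kept, last]))
    return isoforms

# Toy pre-mRNA: exons in upper case, introns (starting GT, ending AG) in lower case.
pre = "AAA" + "gtaag" + "CCC" + "gtcag" + "GGG"
exons = [(0, 3), (8, 11), (16, 19)]
print(splice(pre, exons))                  # AAACCCGGG
print(exon_skipping_isoforms(pre, exons))  # ['AAACCCGGG', 'AAAGGG']
```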

These alternate processing pathways are highly regulated.

In Drosophila, the pre-mRNA of the muscle protein myosin is processed in four different ways, producing four different kinds of myosin. Different kinds of myosin are produced in the larval, pupal and late embryonic stages.

Control of Life Span of mRNA:

In prokaryotes, the life span of an mRNA molecule is very brief, lasting only a minute or less. The mRNA is degraded soon after protein synthesis.

In eukaryotes, by contrast, mRNA is exported to the cytoplasm through the nuclear pores and is translated repeatedly. This repeated translation is made possible by the longer life span of eukaryotic mRNA. In a highly differentiated cell, a single long-lived mRNA molecule can produce a large amount of a single protein. The life span of a eukaryotic mRNA varies from a few hours to several days.
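To see why a longer life span allows many more rounds of translation, the sketch below assumes simple first-order (exponential) mRNA decay and treats “life span” as a half-life. Both the decay model and the specific half-lives are illustrative assumptions, not values taken from the text.

```python
def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of an mRNA population still intact after t_min minutes under first-order decay."""
    return 0.5 ** (t_min / half_life_min)

# Assumed half-lives: ~1 min for a bacterial mRNA, ~6 h for a eukaryotic one.
print(round(fraction_remaining(10, 1), 4))       # 0.001
print(round(fraction_remaining(10, 6 * 60), 4))  # 0.9809
```

After ten minutes almost none of the short-lived message is left to translate, whereas nearly all of the long-lived message is still available.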

Chicken oviduct cells have a single copy of the ovalbumin gene but produce large amounts of ovalbumin.

The silk gland of the silkworm produces a very long thread made of the protein fibroin, which forms the cocoon. The silk gland cells are highly polyploid and produce large numbers of mRNA molecules, which have a long life span of several days.

Gene Amplification:

In various organisms, a mechanism exists whereby the number of copies of a gene is increased many-fold without mitotic division. This is called gene amplification.

During amplification, DNA repeatedly undergoes replication without mitotic separation into daughter DNA molecules or chromatids. This enables the cell to produce a large amount of protein in a short time.

Post-Translational Regulation:

In prokaryotes, a single polycistronic mRNA molecule codes for many different proteins. In eukaryotes, whose mRNAs are monocistronic, the synthesis of different proteins from one mRNA is achieved in a different way: a single mRNA yields a large polypeptide called a polyprotein, which has many cleavage sites and is then cleaved in alternative ways to produce different proteins. All of these proteins are therefore products of a single gene.

Post-Transcriptional Gene Silencing:

Many small RNAs in eukaryotes play a role in gene silencing. These small RNAs act on mRNA, disrupting its translation. They include microRNAs (miRNAs), small interfering RNAs (siRNAs) and many others.

Do eukaryotic genes without introns exist?

Our professor told us that eukaryotic genes without introns exist, but she didn't say which genes that would be.

This is more tangential than an answer about eukaryotic genes that have evolved for millions of years without introns, but I don't have any scholarly resources on that situation.

However, a class of microscopic animals called bdelloid rotifers has been found to have some amazing properties. An interesting property of bdelloids is that they can survive complete desiccation (absence of water) at any stage of their life cycle. Most organisms can survive desiccation only at certain life stages (like plant seeds). This appears to be the result of truly amazing DNA repair capabilities.

How does this tie into eukaryotic genes without introns? Well, when bdelloid rotifers dry out, their cell membranes split open, much like with many other organisms. When water comes along and they spring up, it appears that fragmented DNA from other dried-up organisms sometimes gets washed into the bdelloids, and the amazing DNA repair mechanisms simply integrate this DNA into the bdelloid genome.

The result is that modern bdelloids have relatively recent versions of genes from very distantly related organisms, including plants, fungi, and even bacteria. Some of these genes are broken, others appear to be functional but unused, and others yield active protein products in living bdelloids. Amazingly, there is one case of a gene that is virtually identical to a gene from E. coli, except that an intron has been inserted into it in the bdelloids. After transcription, the bdelloids splice out the intron, and the gene yields a functional enzyme. Presumably, at one point, this gene functioned without any introns.


Strains and Plasmids

Two independent colonies of the strain 10I (MATa ura3Δ0/ura3Δ0 leu2Δ0/leu2Δ0 lys2Δ0/lys2Δ0 ADE2/ade2Δ::hisG HIS3/his3Δ200) were used in parallel for intron deletions; these strains were created by mating the strains LLY34 (MATa ura3Δ0 leu2Δ0 lys2Δ0 ade2Δ::hisG) and LLY35 (MATα ura3Δ0 leu2Δ0 lys2Δ0 his3Δ200). Strains LLY34 and LLY35 were obtained by mating the strains BY4700 (MATa ura3Δ0) and BY4705 (MATα ura3Δ0 leu2Δ0 lys2Δ0 ade2Δ::hisG his3Δ200 trp1Δ63 met15Δ0; Brachmann et al., 1998). Cells lacking the MTR2 gene (JPY500) were created in the 10I2 strain by replacing the coding sequence, as well as the 5′ and 3′ UTRs, with the URA3 gene using standard PCR-based gene replacement (Sikorski and Hieter, 1989). The PCR fragments were generated using the primers D-MTR2_for, 5′-GCACGCTTGGCGTCAATATCCTAAAACGGAAAACTAATCAGTTAGATGTGAGATTGTACTGAGAGTGCAC-3′, and D-MTR2_rev, 5′-GTACCCGTGCAGCCGTTTCCGTGCCTCGGTTCCTCCGAGATATCCTTAGGCTGTGCGGTATTTCACACCG-3′ (the underlined bases are complementary to pRS plasmids; Sikorski and Hieter, 1989). Strains lacking the YRA1 and TAD3 genes were obtained from the yeast knockout strain collection of Open Biosystems (Open Biosystems, Huntsville, AL; record no. 24217: MATa ura3Δ0/ura3Δ0 leu2Δ0/leu2Δ0 lys2Δ0/LYS2 his3Δ1/his3Δ1 met15Δ0/MET15 yra1Δ::KMX4/YRA1, and record no. 25225: MATa ura3Δ0/ura3Δ0 leu2Δ0/leu2Δ0 lys2Δ0/LYS2 his3Δ1/his3Δ1 met15Δ0/MET15 tad3Δ::KMX4/TAD3, respectively). The plasmid pISCE was generated by inserting a 3.85-kb EcoRI fragment of pPEX7 (Fairhead and Dujon, 1993), containing the I-SceI endonuclease under the control of the inducible GAL1-CYC1 promoter, into the EcoRI site of pRS424 (Christianson et al., 1992). The expression plasmid pRS425-SCE was constructed by inserting the 3.94-kb SacI-ApaI fragment of pISCE into the 6753-bp SacI-ApaI fragment of pRS425 (Christianson et al., 1992).
The plasmid pURA3-SCE, which contains the recognition site of I-SceI in front of the URA3 marker, was constructed by inserting two annealed oligonucleotides (Nde.SCE for: TACCGTAGGGATAACAGGGTAATCGG and Nde.SCE rev: TACCGATTACCCTGTTATCCCTACGG, where the underlined bases are the I-SceI site) into the NdeI site of pRS306 (Sikorski and Hieter, 1989). To amplify the yeast ACT1 promoter from wild-type genomic DNA, we used the following PCR primers: XhoI-ACT1p-F 5′-CTCCTCGAGCTCACCCTAACATATTTTCCAATTA-3′ (the underlined bases are complementary to positions −450 to −435 relative to the ACT1 translational start site, and the 5′ extension contains an XhoI site) and HindIII-ACT1p-rev 5′-CCCAAGCTTTGTTAATTCAGTAAATTTTCGATCT-3′ (the underlined bases are complementary to positions −25 to −1 relative to the ACT1 translational start site, and the 5′ extension contains a HindIII site). The PCR product was digested with XhoI and HindIII and cloned into pRS316 (Sikorski and Hieter, 1989) to generate pACT1pr. The mtr2Δ1i, mtr2Δ2i, mtr2Δ3i, mtr2Δ4i, mtr2Δ5i, and mtr2Δ6i genes were amplified by PCR from their respective heterozygous diploid yeast strains with primers designed against regions −262 to −237 and +1681 to +1706 relative to the MTR2 start codon, as defined by the Saccharomyces Genome Database (SGD). The PCR products were digested with the appropriate restriction enzymes and cloned into pACT1pr. The yeast gene YRA1, with or without its intron (Δi), was amplified by PCR from wild-type or yra1Δi DNA, respectively, with primers designed against regions 0 bp (without promoter) or 430 bp (with its own promoter) upstream of the start codon and 153 bp downstream of the stop codon. The PCR products were digested with the appropriate restriction enzymes and cloned into pACT1pr or pRS316 (Sikorski and Hieter, 1989). The pACT1pr-TAD3Δ2i plasmid was constructed by inserting the TAD3 cDNA into the pACT1pr plasmid.
Yeast cells were transformed by a modification (Gietz and Woods, 2002) of the lithium acetate method (Ito et al., 1983) and were grown in standard yeast media (Zakian and Scott, 1982; Rose et al., 1990). All plasmids were propagated in bacteria using standard Escherichia coli strains and growth conditions (Sambrook et al., 1989).
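The restriction and I-SceI sites mentioned in the plasmid constructions above can be checked directly in the printed oligonucleotides. The sketch below is an illustrative sanity check, not part of the published method. It relies only on the standard recognition sequences of XhoI, HindIII, NdeI and I-SceI and on the oligonucleotide sequences as printed; the helper names are hypothetical, and the underlining referred to in the text is not preserved here.

```python
# Recognition sequences (top strand, 5'->3') of the enzymes named in the text.
SITES = {
    "XhoI": "CTCGAG",
    "HindIII": "AAGCTT",
    "NdeI": "CATATG",
    "I-SceI": "TAGGGATAACAGGGTAAT",
}

# Oligonucleotide sequences copied from the text.
NDE_SCE_FOR = "TACCGTAGGGATAACAGGGTAATCGG"
NDE_SCE_REV = "TACCGATTACCCTGTTATCCCTACGG"
XHOI_ACT1P_F = "CTCCTCGAGCTCACCCTAACATATTTTCCAATTA"
HINDIII_ACT1P_REV = "CCCAAGCTTTGTTAATTCAGTAAATTTTCGATCT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]

def site_position(seq: str, enzyme: str) -> int:
    """0-based position of the first recognition site on the given strand, or -1 if absent."""
    return seq.upper().find(SITES[enzyme])

# NdeI (CA^TATG) leaves 5'-TA overhangs, so each annealed oligo should start with TA
# and the two oligos should be complementary over the remaining 24 nt.
overhangs_ok = NDE_SCE_FOR[:2] == NDE_SCE_REV[:2] == "TA"
duplex_ok = reverse_complement(NDE_SCE_FOR[2:]) == NDE_SCE_REV[2:]
print(overhangs_ok, duplex_ok)                      # True True
print(site_position(NDE_SCE_FOR, "I-SceI"))         # 5
print(site_position(XHOI_ACT1P_F, "XhoI"))          # 3
print(site_position(HINDIII_ACT1P_REV, "HindIII"))  # 3
```

Each restriction site sits just after a short 5′ clamp (CTC or CCC), as is usual for primers whose ends must be cut efficiently.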

Direct Intron Displacement Strategy

To remove an intron, we used the direct intron displacement strategy described in Figure S1A. A fragment containing the recognition site of the endonuclease I-SceI in front of the URA3 marker gene was amplified from the plasmid pURA3-SCE. The forward primer used to amplify URA3 included 80 nucleotides complementary to the exon 2 sequence of the targeted gene, whereas the reverse primer included 45 nucleotides complementary to the targeted intron sequence. The resulting PCR products were used as template in a second round of PCR amplification. The reverse primer was the same as that used in the first round, whereas the forward primer included a sequence representing the perfect junction between exon 1 (45 nt) and exon 2 (55 nt). The resulting PCR product was transformed into the diploid yeast strains 10I1 and 10I2 harboring the pRS425-SCE plasmid. After the fragment insertion was verified by PCR, production of the I-SceI enzyme was induced for 8 h at 30°C by growing the cells in the presence of galactose (final concentration, 2%) to favor the pop-out of the URA3 gene (Plessis et al., 1992; Lukacsovich et al., 1994; Cohen-Tannoudji et al., 1998). Loss of the URA3 marker was screened for by growing cells on media containing 5-fluoroorotic acid (5-FOA; Boeke et al., 1987). Cells containing the correct replacements were sporulated and dissected, and haploid strains containing the mutated allele were mated together to obtain homozygous diploid yeast strains. Each step of the direct intron displacement strategy was verified by PCR amplification using primers upstream and downstream of the exon 1–exon 2 junction. The list of primers used can be supplied upon request.
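The second-round forward primer carries the intronless exon 1/exon 2 junction. A minimal sketch of how that 100-nt junction segment could be assembled, assuming the exon sequences are known, is shown below. The helper name and toy exon sequences are hypothetical, and the URA3-annealing part of the primer and the intron-targeting reverse primer are deliberately not modelled.

```python
def junction_tail(exon1: str, exon2: str, n1: int = 45, n2: int = 55) -> str:
    """Last n1 nt of exon 1 joined to the first n2 nt of exon 2 (the 'perfect junction')."""
    if len(exon1) < n1 or len(exon2) < n2:
        raise ValueError("exons are shorter than the requested junction arms")
    return exon1[-n1:] + exon2[:n2]

# Hypothetical toy exons
exon1 = "ACGT" * 15   # 60 nt
exon2 = "TTGCA" * 14  # 70 nt
tail = junction_tail(exon1, exon2)
print(len(tail))  # 100
```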

Growth Assays

The yeast strains were grown to ∼4 × 10⁶ cells/ml in rich growth medium (YPD). For drug assays, 50 μl of yeast cell suspension was added to 50 μl of each drug diluted in YPD (see Table S1 for final concentrations; drug concentrations were adjusted to decrease the growth of wild-type cells by 50%). For carbon source assays, the yeast cells were washed twice and resuspended in rich medium without glucose (YEP). Assays were carried out by mixing equal volumes of washed yeast cells and YEP. The carbon source was added separately and adjusted to a final concentration of 2%, except for glycerol, which was used at 3%. Samples were prepared in triplicate, and the assays were performed in Costar 96-well non-treated polystyrene microplates (Fisher Scientific Company, Ottawa, ON, Canada); growth was monitored using a PowerWave microplate spectrophotometer (BioTek Instruments, Winooski, VT; Toussaint et al., 2006). Methyl methanesulfonate (MMS), hydrogen peroxide (H2O2), benomyl, ketoconazole, camptothecin, calcofluor white, cycloheximide, and caffeine were obtained from Sigma-Aldrich Canada (Oakville, ON, Canada). Hygromycin B was obtained from Roche Diagnostics (Laval, QC, Canada). Staurosporine and cyclosporin A were obtained from Alexis (Alexis Biochemicals, San Diego, CA). Latrunculin A was obtained from Calbiochem (VWR CANLAB, Mississauga, ON, Canada).
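Because each well mixes equal volumes (50 μl of cells plus 50 μl of drug), both the drug and the cells are diluted two-fold in the well, so each drug dilution has to be prepared at twice its intended final concentration. The arithmetic is sketched below; Table S1 is not reproduced here, so the drug value is a placeholder and the function names are hypothetical.

```python
def stock_concentration(final_conc: float, v_ul: float = 50.0, v_other_ul: float = 50.0) -> float:
    """Concentration needed so that v_ul of solution mixed with v_other_ul reaches final_conc."""
    return final_conc * (v_ul + v_other_ul) / v_ul

def final_concentration(stock_conc: float, v_ul: float = 50.0, v_other_ul: float = 50.0) -> float:
    """Concentration after mixing v_ul of a solution with v_other_ul of another."""
    return stock_conc * v_ul / (v_ul + v_other_ul)

print(stock_concentration(10.0))  # 20.0  (placeholder drug: 2x working solution)
print(final_concentration(4e6))   # 2000000.0  (cells/ml in the well)
```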

Growth Curves Analysis

The maximal growth rate (μm) of each growth curve was calculated as described previously (Toussaint et al., 2006). For each 96-well microplate, the μm of each mutant strain was divided by that of the wild-type strain grown in the same plate.
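The exact fitting procedure of Toussaint et al. (2006) is not given in this text. As a stand-in, the sketch below assumes that μm is the steepest least-squares slope of ln(OD) over a short sliding window of readings, a common approach swapped in here for illustration (the five-reading window is arbitrary). Each mutant is then expressed relative to the wild type on the same plate.

```python
import math

def max_growth_rate(times_h: list[float], od: list[float], window: int = 5) -> float:
    """Largest least-squares slope of ln(OD) versus time (per hour) over any `window` consecutive readings."""
    if len(times_h) != len(od) or len(od) < window:
        raise ValueError("need matching time/OD lists with at least `window` readings")
    ln_od = [math.log(x) for x in od]
    best = float("-inf")
    for i in range(len(od) - window + 1):
        t = times_h[i:i + window]
        y = ln_od[i:i + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        best = max(best, sxy / sxx)
    return best

def relative_growth_rate(mutant_mu: float, wild_type_mu: float) -> float:
    """Mutant maximal growth rate divided by that of the wild type grown on the same plate."""
    return mutant_mu / wild_type_mu

# Synthetic curves whose ln(OD) rises at 0.40 h^-1 (wild type) and 0.30 h^-1 (mutant)
times = [0.5 * i for i in range(21)]
wt = [0.05 * math.exp(0.40 * t) for t in times]
mut = [0.05 * math.exp(0.30 * t) for t in times]
ratio = relative_growth_rate(max_growth_rate(times, mut), max_growth_rate(times, wt))
print(round(ratio, 3))  # 0.75
```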

Fitness Test

Competitive growth assays were performed by adding equal quantities (6.5 × 10⁵ cells/ml) of wild-type (ade2Δ background) and Δi (ADE2 background) cells into 25 ml of YPD containing 100 μg/ml adenine, to prevent accumulation of the red pigment of ade2 mutants. The mixed cultures were grown at 30°C and diluted each day (to 6.5 × 10⁵ cells/ml) until 50 (±2) generations were reached. The number of generations was calculated using the equation G = ln(Cf/Ci)/ln(2), where G is the number of generations and Cf and Ci are the final and initial cell concentrations of the mixed culture, respectively. At the beginning of the experiment (generation 0) and after 50 generations, ∼300 cells of the mixed cultures were plated on SC plates containing a low concentration of adenine (20 μg/ml), which allows ade2Δ cells to turn red, and grown at 30°C. The white (Δi) and red (wild-type) colonies were counted to calculate the percentage of each strain at the beginning and at the end of the experiment.
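The generation count and the colony-based strain percentages can be computed as sketched below. Summing G over the daily passages assumes that each day's culture starts at the stated 6.5 × 10⁵ cells/ml; the daily counts and colony numbers in the example are hypothetical.

```python
import math

def generations(c_initial: float, c_final: float) -> float:
    """G = ln(Cf/Ci) / ln(2)."""
    return math.log(c_final / c_initial) / math.log(2)

def cumulative_generations(daily_final: list[float], c_initial: float = 6.5e5) -> float:
    """Sum of generations over daily passages, assuming each day starts at c_initial cells/ml."""
    return sum(generations(c_initial, cf) for cf in daily_final)

def percent_intronless(white: int, red: int) -> float:
    """Share of white (Δi, ADE2) colonies among all colonies counted."""
    return 100.0 * white / (white + red)

# Hypothetical data: five passages of ~10 doublings each, then 120 white and 180 red colonies
days = [6.5e5 * 2 ** 10] * 5
print(round(cumulative_generations(days), 1))  # 50.0
print(percent_intronless(120, 180))            # 40.0
```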

Northern Blot

Total RNA from exponentially growing cells was isolated and blotted as described earlier (Abou Elela et al., 1996; Abou Elela and Ares, 1998). The level of each mRNA was quantified using a random-primed probe complementary to the last exon of the corresponding gene. All signals were quantified by storage phosphorimaging using a Storm phosphorimager and ImageQuant software (GE Healthcare Bio-Sciences, Baie d'Urfé, QC, Canada). The mRNA levels of the intronless genes were normalized to HSC82 or RPL8A mRNA, which were used as internal controls. SAC6 mRNA loading was normalized using 25S rRNA as a reference.
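The normalization described above is a simple ratio of the target signal to its internal reference. The sketch below also expresses a mutant's normalized level as a fraction of the wild-type level; that second step is an assumption about how the values might be compared, not something stated in the text. The signal values are hypothetical phosphorimager counts.

```python
def normalized_level(target: float, reference: float) -> float:
    """Target mRNA signal divided by its internal reference (e.g. HSC82, RPL8A or 25S rRNA)."""
    return target / reference

def relative_to_wild_type(mut_target: float, mut_ref: float, wt_target: float, wt_ref: float) -> float:
    """Normalized mutant level expressed as a fraction of the normalized wild-type level."""
    return normalized_level(mut_target, mut_ref) / normalized_level(wt_target, wt_ref)

print(relative_to_wild_type(1800, 900, 4000, 1000))  # 0.5
```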

Materials and Methods

Data Sources

We obtained publicly available genome and full-length cDNA sequence data for D. melanogaster, A. thaliana, human, and mouse (table 1). Boundaries between UTRs and CDSs in full-length cDNA sequences were determined using annotations from GenBank (D. melanogaster and A. thaliana) and the Mammalian Genome Consortium (human and mouse).

Table 1. Sources and Versions for Genome and Full-Length cDNA Sequence Data

Species | Genome Sequence | Full-Length cDNA | Reference
Arabidopsis thaliana | The Institute for Genomic Research (version 13 June 2001) | Knowledge-based Oryza Molecular biological Encyclopedia (24 October 2003) | Castelli et al. 2004
Drosophila melanogaster | Berkeley Drosophila Genome Project (release 3) | Berkeley Drosophila Genome Project (10 July 2003) | Stapleton et al. 2002
Human | GenBank (build 34.3) | Mammalian Genome Consortium (28 January 2004) | Strausberg et al. 2002
Mouse | GenBank (build 32.1) | Mammalian Genome Consortium (28 January 2004) | Strausberg et al. 2002

Intron Positions

Intron positions were determined through the recognition of gaps in alignment of full-length cDNA transcripts with genomic sequences. In brief, for a single full-length cDNA aligned against a contiguous stretch of genomic sequence, exons were determined as proximal blocks of homologous sequence alignment between full-length cDNA and genomic sequence, whereas introns were determined as gaps between exons consisting solely of genomic sequence.
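The gap-based definition of introns can be sketched as follows. Each aligned block is represented with cDNA (query) and genomic (target) coordinates, in the spirit of BLAT output; the `Block` type, the 0-based half-open convention and the restriction to positive-strand alignments are assumptions made for this illustration. Each intron is reported at its position from the 5′ end of the transcript, together with its genomic length.

```python
from typing import NamedTuple

class Block(NamedTuple):
    """One aligned exon block; coordinates are 0-based and half-open."""
    q_start: int  # position in the full-length cDNA
    q_end: int
    t_start: int  # position in the genomic sequence (positive strand)
    t_end: int

def introns_from_blocks(blocks: list[Block]) -> list[tuple[int, int]]:
    """Return (cDNA position, intron length) for each genomic gap between consecutive blocks."""
    ordered = sorted(blocks, key=lambda b: b.q_start)
    introns = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.t_start - prev.t_end  # genomic sequence absent from the cDNA
        if gap > 0:
            introns.append((prev.q_end, gap))
    return introns

blocks = [Block(0, 100, 1000, 1100), Block(100, 250, 1600, 1750), Block(250, 300, 2000, 2050)]
print(introns_from_blocks(blocks))  # [(100, 500), (250, 250)]
```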

We first cleaned the full-length cDNA libraries by removing transcripts with inconsistent annotation and incomplete CDSs. We then aligned the cleaned library of full-length cDNA transcripts to genome sequences with BLAT (Kent 2002). Apart from being very fast, the search algorithm used by BLAT has at least two advantages for alignment of potentially spliced transcripts to genome sequences. First, BLAT begins by searching for high-quality matches of short discrete sequences (K-mers, each 8–16 nt), and attempts to “stitch together” proximal high-quality K-mers by extending the match through intervening sequences that also provide high-quality matches. The scale at which this occurs is that of typical exon size. Second, once K-mers are stitched into blocks of high-quality alignment, gaps between matching blocks are adjusted so that the ends of gaps provide the best match to consensus sequences typical of intron ends (Kent 2002).
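The K-mer seeding idea can be illustrated with a toy index. This sketch loosely follows the approach of indexing non-overlapping genomic K-mers and looking up the K-mer at every query offset. It is not BLAT's implementation, and the stitching step is reduced to grouping seed hits by diagonal (genome position minus query position). The function names are hypothetical, and the default k of 11 is simply one value within the 8–16 nt range quoted above.

```python
from collections import defaultdict

def build_index(genome: str, k: int = 11) -> dict[str, list[int]]:
    """Index the non-overlapping k-mers of the genome."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)

def seed_hits(query: str, index: dict[str, list[int]], k: int = 11) -> list[tuple[int, int]]:
    """Exact k-mer matches (query_pos, genome_pos), looking up the k-mer at every query offset."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits

def hits_by_diagonal(hits: list[tuple[int, int]]) -> dict[int, list[tuple[int, int]]]:
    """Group hits by genome_pos - query_pos; hits on one diagonal are candidates for stitching."""
    diagonals = defaultdict(list)
    for qpos, gpos in hits:
        diagonals[gpos - qpos].append((qpos, gpos))
    return dict(diagonals)

genome = "AAAACCCCGGGGTTTT"
index = build_index(genome, k=4)
hits = seed_hits("CCCCGGGG", index, k=4)
print(hits)                    # [(0, 4), (4, 8)]
print(hits_by_diagonal(hits))  # {4: [(0, 4), (4, 8)]}
```

For a spliced transcript, seeds from consecutive exons fall on diagonals that differ by the intron length, which is why gaps between stitched blocks can then be interpreted as candidate introns.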

The BLAT alignment for each transcript was refined in two steps. First, the best alignment was chosen, defined as the alignment having the highest sequence identity greater than 95%; if no alignment had sequence identity greater than 95%, the transcript was discarded. If there were multiple best alignments with equal sequence identity, the longest alignment was chosen. Second, putative exon blocks separated by fewer than 5 bp were merged. Under some sequence and gap size conditions, BLAT does not stitch together proximal blocks (Kent 2002). We reasoned that gaps with fewer than 5 bp represent indels within full-length transcript sequences rather than actual introns. These small gaps occurred in 0.02% of A. thaliana full-length alignments, 15% of D. melanogaster alignments, 19% of human alignments, and 25% of mouse alignments. Per gap-containing full-length alignment, the mean total gap length was 7.7 bp in A. thaliana, 2.8 bp in D. melanogaster, 6.2 bp in human, and 6.7 bp in mouse. We merged these gaps while moving through each alignment in a 5′–3′ direction along the positive-sense genomic strand. We thus introduced the possibility of a slight bias to intron positions that increases in absolute magnitude from 0 bp at the intron in the 5′-most genomic position within the alignment to a maximum of the total gap length at the intron in the 3′-most genomic position within the alignment. For genes encoded on the positive-sense genomic strand, this bias increased in the 5′–3′ direction within the full-length transcript, whereas for genes encoded on the negative-sense genomic strand, this bias increased in the 3′-to-5′ direction within the transcript. As genes have essentially equal proportions of positive- and negative-sense orientations in these genomes, the data set-wide degree of bias was negligible. We recorded intron positions according to their location from the 5′ end of each full-length transcript.
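The two refinement steps can be sketched as below. Alignments are represented as plain dictionaries with hypothetical `identity` (0 to 1) and `length` keys, and blocks as genomic (start, end) pairs on the positive strand; these representations are assumptions for illustration, not the authors' data structures.

```python
def choose_best(alignments: list[dict], min_identity: float = 0.95):
    """Highest identity above min_identity wins; ties go to the longest alignment; None if none qualify."""
    candidates = [a for a in alignments if a["identity"] > min_identity]
    if not candidates:
        return None  # transcript discarded
    return max(candidates, key=lambda a: (a["identity"], a["length"]))

def merge_small_gaps(blocks: list[tuple[int, int]], max_gap: int = 5) -> list[tuple[int, int]]:
    """Merge genomic blocks separated by fewer than max_gap bp, scanning 5'->3' on the positive strand."""
    if not blocks:
        return []
    ordered = sorted(blocks)
    merged = [list(ordered[0])]
    for start, end in ordered[1:]:
        if start - merged[-1][1] < max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # treat the small gap as an indel, not an intron
        else:
            merged.append([start, end])
    return [tuple(b) for b in merged]

alignments = [
    {"id": "a", "identity": 0.97, "length": 900},
    {"id": "b", "identity": 0.99, "length": 500},
    {"id": "c", "identity": 0.99, "length": 800},
]
print(choose_best(alignments)["id"])                         # c
print(merge_small_gaps([(0, 100), (103, 200), (700, 800)]))  # [(0, 200), (700, 800)]
```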

Following the refinement of the BLAT alignment, we created our set of “qualifying transcripts.” A qualifying transcript contained at least one intron within its 5′ UTR, CDS, or 3′ UTR, as indicated by the alignment. Additionally, to avoid potential inconsistencies introduced by our use of automated alignments, we required that all introns in a qualifying transcript were between 20 bp and 100 kbp (100,000 bp) in length. We chose 20 bp as our minimum intron size because we were concerned about the potential inflation of intron numbers due to spurious gaps larger than our merge limit of 5 bp, and because very few introns are known to be less than 20 bp in length. For example, minimum CDS intron size was 13 bp in 2903 genes from 10 eukaryotes (Deutsch and Long 1999) and minimum intron size in ESTs was 27 bp in a diverse collection of fungi (Kupfer et al. 2004). A number of introns as large as 100 kbp and larger are known in, for example, humans (Nobile et al. 1997; Bärlund et al. 2002), but we did not wish to include such introns in our data set without manual confirmation of each such alignment. Such large introns were extremely rare in our data set. For all species, larger data sets constructed using less stringent qualifying rules resulted in equivalent estimates and distributions, though the occurrence of extremely large introns (>100 kbp) for D. melanogaster, human, and mouse somewhat increased estimates sensitive to extreme outliers (data not shown).
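Applied to merged exon blocks, the qualifying-transcript rule reduces to a short check. In this sketch (function names hypothetical), intron lengths are the gaps between consecutive half-open exon blocks, and both size limits are treated as inclusive, which is an assumption since the text does not specify it.

```python
def intron_lengths(blocks: list[tuple[int, int]]) -> list[int]:
    """Gaps between consecutive half-open (start, end) exon blocks."""
    return [next_start - prev_end
            for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])]


def is_qualifying(blocks: list[tuple[int, int]],
                  min_intron: int = 20, max_intron: int = 100_000) -> bool:
    """At least one intron, and every intron within [min_intron, max_intron] bp."""
    introns = intron_lengths(blocks)
    return bool(introns) and all(min_intron <= n <= max_intron for n in introns)


print(is_qualifying([(0, 100), (200, 300)]))          # True: one 100-bp intron
print(is_qualifying([(0, 100)]))                      # False: no intron
print(is_qualifying([(0, 100), (110, 200)]))          # False: 10-bp gap
print(is_qualifying([(0, 100), (200_100, 200_200)]))  # False: 200-kbp intron
```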

Statistical Analysis of Intron Distribution

We analyzed the general distribution of introns within each region by examining the distribution of exon sizes, which are directly dependent upon intron locations. The mean exon size alone is uninformative: for a region containing ni introns it is (length of region)/(ni + 1), regardless of how the introns are distributed. We therefore calculated the effective number of exons ne, which is sensitive to the variance in exon size (Lynch and Kewalramani 2003). When introns are uniformly distributed, so that all exons have equal length, ne = ni + 1. When introns are closely clumped so that one exon is much longer than the others, ne approaches 1. To calculate ne, for each exon within a region, we determined its size ei relative to the total length of the region, so that the ei within each region sum to 1. We then calculated ne for each species, region, and number of introns using $n_e = 1 \big/ \sum_{i=1}^{n} e_i^2$, where n = ni + 1 is the number of exons in the region; this is equivalent to the classical formula used in population genetics to calculate the effective number of alleles for a locus (Kimura and Crow 1964).
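A short worked example of the ne formula, written as a minimal Python sketch (the function name is illustrative): two evenly spaced introns give ne = ni + 1 = 3, while two introns clumped near one end of the region give a value close to 1.

```python
def effective_exon_number(exon_lengths: list[int]) -> float:
    """n_e = 1 / sum(e_i^2), where e_i is exon i's share of the region length."""
    total = sum(exon_lengths)
    if total <= 0:
        raise ValueError("exon lengths must sum to a positive value")
    return 1.0 / sum((length / total) ** 2 for length in exon_lengths)


uniform = effective_exon_number([100, 100, 100])  # ~3.0, i.e. ni + 1
clumped = effective_exon_number([980, 10, 10])    # ~1.04, approaching 1
print(round(uniform, 2), round(clumped, 2))       # 3.0 1.04
```

For the clumped case the relative lengths are 0.98, 0.01 and 0.01, so the sum of squares is 0.9606 and ne = 1/0.9606.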

Falling between the two extremes of ne = 1, for unusually high variation in exon length, and ne = ni + 1, for no variation in exon length, are expected values of ne for a random distribution of introns within a region, which serve as a null model for comparison. Introns positioned at random create a distribution of exon sizes that follows the “broken-stick” distribution for random partitions of a finite distance (MacArthur 1957; Goss and Lewontin 1996; Lynch and Kewalramani 2003). For each set of species' genes containing ni = 1–5 introns in a region, we calculated mean ne ± standard error. We then calculated the broken-stick expectation of ne for ni random intron locations via simulation. To create this null expectation, we randomly chose a region length in bp, with replacement, from the set of all observed regions having ni introns for each species. Within this randomly chosen observed region length, we then randomly chose locations for ni introns and calculated the relative length ei of each resulting exon. We calculated ne for this simulated region and repeated this for 10⁵ iterations for each combination of species, region, and number of introns ni. We call this random distribution of ne values the “unrestricted random distribution.” Other than an absolute minimum exon size limit of 1 bp, we did not place a lower limit on exon size in these simulations, so this approach assumes an absence of exon-size constraints. However, such extremely short exons are quite rare (Deutsch and Long 1999) and introduce the possibility of numerous splicing difficulties (Dominski and Kole 1991, 1992; Sterner and Berget 1993; Carlo et al. 1996). We thus also simulated “minimum-exon-size random distributions” for each combination of species, region, and number of introns.
In these simulations, the random draw of intron locations was repeated until the size of the smallest resulting exon was ≥20 bp; this is somewhat smaller than the smallest exon size that can apparently be constitutively spliced reliably without additional splicing enhancers (∼50 bp; Dominski and Kole 1991, 1992). The minimum-exon-size random distribution has a larger mean ne than the unrestricted random distribution, and the difference between the mean ne of the two distributions increases as the length of the simulated region decreases (see also Goss and Lewontin 1996). As there are species-specific relationships between mean total exon length of a region and the number of introns found therein (Lynch and Kewalramani 2003), our minimum-exon-size distributions are not independent of species identities. There is negligible difference among species in the two random distributions within CDSs (data not shown). However, the shorter lengths of 5′ UTRs accentuate the among-species differences (fig. 3), such that we present separate minimum-exon-size random distributions for human and mouse as a group, and for A. thaliana and D. melanogaster as a group.

Here, we will call observed intron distributions overdispersed or underdispersed in comparison to one of these random distributions if the mean ne value is greater than or less than, respectively, the simulated values of ne for the appropriate random distribution. Overdispersed introns are more uniformly distributed in the region in comparison to a random distribution, whereas underdispersed introns are more clumped.
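The broken-stick null model and the over/underdispersion call can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' simulation code: intron positions are drawn as distinct cut points between bases (which enforces the 1 bp minimum exon size), the minimum-exon-size variant redraws until every exon is at least `min_exon` bp, and the hypothetical `dispersion()` helper compares means directly, whereas a real analysis would account for the standard errors mentioned above.

```python
import random


def effective_exon_number(exon_lengths):
    """n_e = 1 / sum(e_i^2), with e_i the relative length of exon i."""
    total = sum(exon_lengths)
    return 1.0 / sum((e / total) ** 2 for e in exon_lengths)


def random_exon_lengths(region_length, n_introns, rng, min_exon=1):
    """Broken stick: n_introns distinct random cut points in the region,
    redrawn until every resulting exon is at least min_exon bp long."""
    if region_length < min_exon * (n_introns + 1):
        raise ValueError("region too short for the requested minimum exon size")
    while True:
        cuts = sorted(rng.sample(range(1, region_length), n_introns))
        bounds = [0, *cuts, region_length]
        lengths = [b - a for a, b in zip(bounds, bounds[1:])]
        if min(lengths) >= min_exon:
            return lengths


def simulated_mean_ne(observed_lengths, n_introns, iterations=100_000, min_exon=1, seed=1):
    """Mean n_e over random regions whose lengths are resampled, with
    replacement, from the observed region lengths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        length = rng.choice(observed_lengths)
        total += effective_exon_number(random_exon_lengths(length, n_introns, rng, min_exon))
    return total / iterations


def dispersion(observed_mean_ne, simulated_mean):
    if observed_mean_ne > simulated_mean:
        return "overdispersed"   # more uniform than random
    if observed_mean_ne < simulated_mean:
        return "underdispersed"  # more clumped than random
    return "random"


# Short 80-bp regions with two introns: the minimum-exon-size constraint
# forces more even exons and therefore a larger mean n_e.
unrestricted = simulated_mean_ne([80], 2, iterations=5000)
constrained = simulated_mean_ne([80], 2, iterations=5000, min_exon=20)
print(round(unrestricted, 2), round(constrained, 2))
```

Rejection sampling keeps the constrained draws uniform over all admissible intron placements; it becomes slow only when the region is barely long enough to hold the required minimum exons.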

Bacterial Chromosome and Eukaryotic Chromosome: Difference

Bacterial Chromosome:

1. The structural organization of the bacterial chromosome is simple: it consists mainly of a double-stranded DNA molecule. Specific proteins (not histones) are associated with the bacterial chromosome and help stabilize its supercoiled domains, but compared with the eukaryotic chromosome, the bacterial chromosome can be considered naked DNA.

2. The bacterial chromosome is a covalently closed circular structure consisting of a single DNA molecule (with a few exceptions, such as Borrelia burgdorferi and Streptomyces, whose chromosomes are linear).

3. Only one chromosome occurs per bacterial cell (with a few exceptions, such as Rhodobacter sphaeroides, a gram-negative phototroph that possesses two chromosomes per cell).

4. Bacterial chromosome contains only a single copy of each gene and is therefore genetically haploid.

5. Bacterial chromosomes are shorter and contain fewer genes.

6. The bacterial chromosome lies free in the cytoplasm, with no membrane separating it from the cytoplasm. Since ribosomes also occur free in the cytoplasm, transcription and translation are not spatially separated.

7. With few exceptions, bacterial DNA does not contain introns (noncoding sequences). As a result, protein-coding genes are not interrupted by introns, and genes organized in an operon are often transcribed into a single mRNA containing more than one coding region, each of which is translated independently into a protein (Fig. 5.37).

Eukaryotic Chromosome:

1. The structural organization of the eukaryotic chromosome is complex, as it contains more than just DNA. In addition to DNA, it contains large amounts of histone proteins, around which the DNA is wound in a very regular fashion to form structures called nucleosomes. The nucleosomes aggregate to form a fibrous material called chromatin, which is further compacted by folding and looping to eventually form the very dense structure called a chromosome.

2. Each eukaryotic chromosome is linear and consists of a single long DNA molecule; the genome as a whole is divided among several chromosomes.

3. Eukaryotic cells contain more than one chromosome, and the number varies with the organism. For example, a haploid cell of the single-celled baker's yeast Saccharomyces cerevisiae contains 16 chromosomes, while human cells contain 46 chromosomes arranged in 23 pairs.

4. Eukaryotic cells typically contain two copies of each chromosome, and hence of each gene, and are therefore genetically diploid. The diploid eukaryotic genome is halved to a haploid one by meiosis.

5. Eukaryotic chromosomes are larger and contain a greater number of genes.

6. The eukaryotic chromosome is located in the cell nucleus, which is surrounded by a nuclear membrane that separates the chromosome from the cytoplasm. Because the ribosomes are in the cytoplasm, transcription and translation are spatially separated.

7. Eukaryotic chromosomes contain both introns (noncoding sequences) and exons (coding sequences). As a result, protein-coding genes are interrupted by introns. Both introns and exons are transcribed into the primary RNA transcript, from which the mature (functional) mRNA is formed by excision of the introns and then transported to the cytoplasm for translation (protein synthesis) (Fig. 5.38).


Introns were first discovered in protein-coding genes of adenovirus, [8] [9] and were subsequently identified in genes encoding transfer RNA and ribosomal RNA. Introns are now known to occur within a wide variety of genes throughout organisms and viruses within all of the biological kingdoms.

The fact that genes were split or interrupted by introns was discovered independently in 1977 by Phillip Allen Sharp and Richard J. Roberts, for which they shared the Nobel Prize in Physiology or Medicine in 1993. [10] The term intron was introduced by American biochemist Walter Gilbert: [5]

"The notion of the cistron [i.e., gene] … must be replaced by that of a transcription unit containing regions which will be lost from the mature messenger – which I suggest we call introns (for intragenic regions) – alternating with regions which will be expressed – exons." (Gilbert 1978)

The term intron also refers to intracistron, i.e., an additional piece of DNA that arises within a cistron. [11]

Although introns are sometimes called intervening sequences, [12] the term "intervening sequence" can refer to any of several families of internal nucleic acid sequences that are not present in the final gene product, including inteins, untranslated regions (UTR), and nucleotides removed by RNA editing, in addition to introns.

The frequency of introns within different genomes is observed to vary widely across the spectrum of biological organisms. For example, introns are extremely common within the nuclear genome of jawed vertebrates (e.g. humans and mice), where protein-coding genes almost always contain multiple introns, while introns are rare within the nuclear genes of some eukaryotic microorganisms, [13] for example baker's/brewer's yeast (Saccharomyces cerevisiae). In contrast, the mitochondrial genomes of vertebrates are entirely devoid of introns, while those of eukaryotic microorganisms may contain many introns. [14]

A particularly extreme case is the Drosophila dhc7 gene containing a ≥3.6 megabase (Mb) intron, which takes roughly three days to transcribe. [15] [16] At the other extreme, a recent study suggests that the shortest known eukaryotic intron length is 30 base pairs (bp) belonging to the human MST1L gene. [17]

Splicing of all intron-containing RNA molecules is superficially similar, as described above. However, different types of introns were identified through the examination of intron structure by DNA sequence analysis, together with genetic and biochemical analysis of RNA splicing reactions.

At least four distinct classes of introns have been identified: [1]

  • Introns in nuclear protein-coding genes that are removed by spliceosomes (spliceosomal introns)
  • Introns in nuclear and archaeal transfer RNA genes that are removed by proteins (tRNA introns)
  • Self-splicing group I introns that are removed by RNA catalysis
  • Self-splicing group II introns that are removed by RNA catalysis

Group III introns are proposed to be a fifth family, but little is known about the biochemical apparatus that mediates their splicing. They appear to be related to group II introns, and possibly to spliceosomal introns. [18]

Spliceosomal introns

Nuclear pre-mRNA introns (spliceosomal introns) are characterized by specific intron sequences located at the boundaries between introns and exons. [19] These sequences are recognized by spliceosomal RNA molecules when the splicing reactions are initiated. [20] In addition, they contain a branch point, a particular nucleotide sequence near the 3' end of the intron that becomes covalently linked to the 5' end of the intron during the splicing process, generating a branched (lariat) intron. Apart from these three short conserved elements, nuclear pre-mRNA intron sequences are highly variable. Nuclear pre-mRNA introns are often much longer than their surrounding exons.
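As an illustration of splicing by exon coordinates, the sketch below (hypothetical function names; DNA letters used for the pre-mRNA) joins exon blocks into a mature sequence and checks each excised intron for the GT...AG boundary dinucleotides that mark the ends of most spliceosomal introns. That dinucleotide rule is an assumption added here rather than something stated in the text, minor non-canonical classes (for example GC-AG) exist, and branch point recognition is not modeled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> tuple[str, list[str]]:
    """Join half-open exon blocks (sorted 5'->3') and return (mature, introns)."""
    mature = "".join(pre_mrna[start:end] for start, end in exons)
    introns = [pre_mrna[prev_end:next_start]
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    return mature, introns


def has_gt_ag_boundaries(intron: str) -> bool:
    """True if the intron starts with GT (5' splice site) and ends with AG (3' splice site)."""
    return intron.startswith("GT") and intron.endswith("AG")


# Toy gene: exon 1 + intron + exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTTCAG", "GCTTAA"
pre = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre))]
mature, introns = splice(pre, exons)
print(mature, [has_gt_ag_boundaries(i) for i in introns])  # ATGGCCGCTTAA [True]
```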

tRNA introns

Transfer RNA introns that depend upon proteins for removal occur at a specific location within the anticodon loop of unspliced tRNA precursors, and are removed by a tRNA splicing endonuclease. The exons are then linked together by a second protein, the tRNA splicing ligase. [21] Note that self-splicing introns are also sometimes found within tRNA genes. [22]

Group I and group II introns

Group I and group II introns are found in genes encoding proteins (messenger RNA), transfer RNA and ribosomal RNA in a very wide range of living organisms. [23] [24] Following transcription into RNA, group I and group II introns also make extensive internal interactions that allow them to fold into a specific, complex three-dimensional architecture. These complex architectures allow some group I and group II introns to be self-splicing, that is, the intron-containing RNA molecule can rearrange its own covalent structure so as to precisely remove the intron and link the exons together in the correct order. In some cases, particular intron-binding proteins are involved in splicing, acting in such a way that they assist the intron in folding into the three-dimensional structure that is necessary for self-splicing activity. Group I and group II introns are distinguished by different sets of internal conserved sequences and folded structures, and by the fact that splicing of RNA molecules containing group II introns generates branched introns (like those of spliceosomal RNAs), while group I introns use a non-encoded guanosine nucleotide (typically GTP) to initiate splicing, adding it on to the 5'-end of the excised intron.

While introns do not encode protein products, they are integral to gene expression regulation. Some introns themselves encode functional RNAs through further processing after splicing to generate noncoding RNA molecules. [25] Alternative splicing is widely used to generate multiple proteins from a single gene. Furthermore, some introns play essential roles in a wide range of gene expression regulatory functions such as nonsense-mediated decay [26] and mRNA export. [27]

The biological origins of introns are obscure. After the initial discovery of introns in protein-coding genes of the eukaryotic nucleus, there was significant debate as to whether introns in modern-day organisms were inherited from a common ancient ancestor (termed the introns-early hypothesis), or whether they appeared in genes rather recently in the evolutionary process (termed the introns-late hypothesis). Another theory is that the spliceosome and the intron-exon structure of genes is a relic of the RNA world (the introns-first hypothesis). [28] There is still considerable debate about which of these hypotheses is most correct. The popular consensus at the moment is that introns arose within the eukaryote lineage as selfish elements. [29]

Early studies of genomic DNA sequences from a wide range of organisms show that the intron-exon structure of homologous genes in different organisms can vary widely. [30] More recent studies of entire eukaryotic genomes have now shown that the lengths and density (introns/gene) of introns vary considerably between related species. For example, while the human genome contains an average of 8.4 introns/gene (139,418 in the genome), the unicellular fungus Encephalitozoon cuniculi contains only 0.0075 introns/gene (15 introns in the genome). [31] Since eukaryotes arose from a common ancestor (common descent), there must have been extensive gain or loss of introns during evolutionary time. [32] [33] This process is thought to be subject to selection, with a tendency towards intron gain in larger species due to their smaller population sizes, and the converse in smaller (particularly unicellular) species. [34] Biological factors also influence which genes in a genome lose or accumulate introns. [35] [36] [37]
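Because intron density is simply introns divided by genes, the figures quoted above can be cross-checked with a little arithmetic. The gene counts computed in this sketch are only what the quoted numbers imply, not independent data.

```python
# Values quoted in the text: (total introns, introns per gene).
genomes = {
    "human": (139_418, 8.4),
    "Encephalitozoon cuniculi": (15, 0.0075),
}


def implied_gene_count(total_introns: int, introns_per_gene: float) -> float:
    """Intron density = introns / genes, so genes = introns / density."""
    return total_introns / introns_per_gene


for name, (introns, density) in genomes.items():
    print(f"{name}: ~{implied_gene_count(introns, density):,.0f} genes")
# human: ~16,597 genes
# Encephalitozoon cuniculi: ~2,000 genes

fold = genomes["human"][1] / genomes["Encephalitozoon cuniculi"][1]
print(f"human intron density is ~{fold:,.0f}-fold higher")  # ~1,120-fold
```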

Alternative splicing of exons within a gene after intron excision acts to introduce greater variability of protein sequences translated from a single gene, allowing multiple related proteins to be generated from a single gene and a single precursor mRNA transcript. The control of alternative RNA splicing is performed by a complex network of signaling molecules that respond to a wide range of intracellular and extracellular signals.

Introns contain several short sequences that are important for efficient splicing, such as acceptor and donor sites at either end of the intron as well as a branch point site, which are required for proper splicing by the spliceosome. Some introns are known to enhance the expression of the gene that they are contained in by a process known as intron-mediated enhancement (IME).

Actively transcribed regions of DNA frequently form R-loops that are vulnerable to DNA damage. In highly expressed yeast genes, introns inhibit R-loop formation and the occurrence of DNA damage. [38] Genome-wide analysis in both yeast and humans revealed that intron-containing genes have decreased R-loop levels and decreased DNA damage compared to intronless genes of similar expression. [38] Insertion of an intron within an R-loop prone gene can also suppress R-loop formation and recombination. Bonnet et al. (2017) [38] speculated that the function of introns in maintaining genetic stability may explain their evolutionary maintenance at certain locations, particularly in highly expressed genes.

Starvation adaptation

The physical presence of introns promotes cellular resistance to starvation via intron-enhanced repression of ribosomal protein genes regulated by nutrient-sensing pathways. [39]

Introns may be lost or gained over evolutionary time, as shown by many comparative studies of orthologous genes. Subsequent analyses have identified thousands of examples of intron loss and gain events, and it has been proposed that the emergence of eukaryotes, or the initial stages of eukaryotic evolution, involved an intron invasion. [40] Two definitive mechanisms of intron loss, reverse transcriptase-mediated intron loss (RTMIL) and genomic deletions, have been identified, and are known to occur. [41] The definitive mechanisms of intron gain, however, remain elusive and controversial. At least seven mechanisms of intron gain have been reported thus far: intron transposition, transposon insertion, tandem genomic duplication, intron transfer, intron gain during double-strand break repair (DSBR), insertion of a group II intron, and intronization. In theory it should be easiest to deduce the origin of recently gained introns due to the lack of host-induced mutations, yet many recently gained introns cannot be attributed to any of the aforementioned mechanisms. These findings thus raise the question of whether the proposed mechanisms of intron gain fail to describe the mechanistic origin of many novel introns because they are not accurate mechanisms of intron gain, or whether there are other, yet to be discovered, processes generating novel introns. [42]

In intron transposition, the most commonly purported intron gain mechanism, a spliced intron is thought to reverse splice into either its own mRNA or another mRNA at a previously intron-less position. This intron-containing mRNA is then reverse transcribed and the resulting intron-containing cDNA may then cause intron gain via complete or partial recombination with its original genomic locus. Transposon insertions can also result in intron creation. Such an insertion could intronize the transposon without disrupting the coding sequence when a transposon inserts into the sequence AGGT, resulting in the duplication of this sequence on each side of the transposon. It is not yet understood why these elements are spliced, whether by chance, or by some preferential action by the transposon. In tandem genomic duplication, due to the similarity between consensus donor and acceptor splice sites, which both closely resemble AGGT, the tandem genomic duplication of an exonic segment harboring an AGGT sequence generates two potential splice sites. When recognized by the spliceosome, the sequence between the original and duplicated AGGT will be spliced, resulting in the creation of an intron without alteration of the coding sequence of the gene. Double-stranded break repair via non-homologous end joining was recently identified as a source of intron gain when researchers identified short direct repeats flanking 43% of gained introns in Daphnia. [42] These numbers must be compared to the number of conserved introns flanked by repeats in other organisms, though, for statistical relevance. For group II intron insertion, the retrohoming of a group II intron into a nuclear gene was proposed to cause recent spliceosomal intron gain.
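The tandem-duplication route can be checked with a small string model. In this hypothetical sketch, a segment containing AGGT is duplicated in tandem; removing the sequence from the GT of the first copy through the AG of the second copy yields an intron that begins with GT, ends with AG, is exactly as long as the duplicated segment, and leaves the original coding sequence intact after splicing. Everything else the spliceosome needs (branch point, splicing enhancers) is ignored.

```python
def tandem_duplication_intron(gene: str, start: int, end: int) -> tuple[str, str, str]:
    """Duplicate gene[start:end] in tandem and splice out the sequence between
    the GT of the first AGGT copy and the AG of the second copy.
    Returns (duplicated_gene, new_intron, spliced_product)."""
    segment = gene[start:end]
    k = segment.find("AGGT")
    if k < 0:
        raise ValueError("duplicated segment must contain AGGT")
    duplicated = gene[:end] + segment + gene[end:]
    donor = start + k + 2       # intron starts at the GT of the first copy
    acceptor_end = end + k + 2  # intron ends just after the AG of the second copy
    new_intron = duplicated[donor:acceptor_end]
    spliced = duplicated[:donor] + duplicated[acceptor_end:]
    return duplicated, new_intron, spliced


gene = "ATGAAAGGTCCCTAA"
dup, new_intron, spliced = tandem_duplication_intron(gene, 3, 11)
print(dup)              # ATGAAAGGTCCAAAGGTCCCTAA
print(new_intron)       # GTCCAAAG
print(spliced == gene)  # True
```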

Intron transfer has been hypothesized to result in intron gain when a paralog or pseudogene gains an intron and then transfers this intron via recombination to an intron-absent location in its sister paralog. Intronization is the process by which mutations create novel introns from formerly exonic sequence. Thus, unlike other proposed mechanisms of intron gain, this mechanism does not require the insertion or generation of DNA to create a novel intron. [42]

The only hypothesized mechanism of recent intron gain lacking any direct evidence is that of group II intron insertion, which, when demonstrated in vivo, abolishes gene expression. [43] Group II introns, which act as site-specific retroelements, are therefore the likely ancestors of spliceosomal introns, but are likely no longer responsible for intron gain. [44] [45] Tandem genomic duplication is the only proposed mechanism with supporting in vivo experimental evidence: a short intragenic tandem duplication can insert a novel intron into a protein-coding gene, leaving the corresponding peptide sequence unchanged. [46] This mechanism also has extensive indirect evidence lending support to the idea that tandem genomic duplication is a prevalent mechanism for intron gain. The testing of other proposed mechanisms in vivo, particularly intron gain during DSBR, intron transfer, and intronization, is possible, although these mechanisms must be demonstrated in vivo to solidify them as actual mechanisms of intron gain. Further genomic analyses, especially when executed at the population level, may then quantify the relative contribution of each mechanism, possibly identifying species-specific biases that may shed light on varied rates of intron gain amongst different species. [42]

Eukaryotic Epigenetic Gene Regulation

The human genome encodes over 20,000 genes; each of the 23 pairs of human chromosomes encodes thousands of genes. The DNA in the nucleus is precisely wound, folded, and compacted into chromosomes so that it will fit into the nucleus. It is also organized so that specific segments can be accessed as needed by a specific cell type.

The first level of organization, or packing, is the winding of DNA strands around histone proteins. Histones package and order DNA into structural units called nucleosome complexes, which can control the access of proteins to the DNA regions (Figure 1a). Under the electron microscope, this winding of DNA around histone proteins to form nucleosomes looks like small beads on a string (Figure 1b). These beads (histone proteins) can move along the string (DNA) and change the structure of the molecule.

Figure 1. DNA is folded around histone proteins to create (a) nucleosome complexes. These nucleosomes control the access of proteins to the underlying DNA. When viewed through an electron microscope (b), the nucleosomes look like beads on a string. (credit “micrograph”: modification of work by Chris Woodcock)

If DNA encoding a specific gene is to be transcribed into RNA, the nucleosomes surrounding that region of DNA can slide down the DNA to open that specific chromosomal region and allow for the transcriptional machinery (RNA polymerase) to initiate transcription (Figure 2). Nucleosomes can move to open the chromosome structure to expose a segment of DNA, but do so in a very controlled manner.

Practice Question

Figure 2. Nucleosomes can slide along DNA. When nucleosomes are spaced closely together (top), transcription factors cannot bind and gene expression is turned off. When the nucleosomes are spaced far apart (bottom), the DNA is exposed. Transcription factors can bind, allowing gene expression to occur. Modifications to the histones and DNA affect nucleosome spacing.

In females, one of the two X chromosomes is inactivated during embryonic development because of epigenetic changes to the chromatin. What impact do you think these changes would have on nucleosome packing?

This type of gene regulation is called epigenetic regulation. Epigenetic means “around genetics.” The changes that occur to the histone proteins and DNA do not alter the nucleotide sequence and are not permanent. Instead, these changes are temporary (although they often persist through multiple rounds of cell division) and alter the chromosomal structure (open or closed) as needed. A gene can be turned on or off depending upon the location and modifications to the histone proteins and DNA.


In Summary: Eukaryotic Epigenetic Gene Regulation

In eukaryotic cells, the first stage of gene expression control occurs at the epigenetic level. Epigenetic mechanisms control access to the chromosomal region to allow genes to be turned on or off. These mechanisms control how DNA is packed into the nucleus by regulating how tightly the DNA is wound around histone proteins. The addition or removal of chemical modifications (or flags) to histone proteins or DNA signals to the cell to open or close a chromosomal region. Therefore, eukaryotic cells can control whether a gene is expressed by controlling accessibility to transcription factors and the binding of RNA polymerase to initiate transcription.

RIGD: A Database for Intronless Genes in the Rosaceae

Most eukaryotic genes are interrupted by one or more introns, whereas prokaryotic genomes consist mainly of single-exon genes without introns. Because they lack introns, intronless genes in eukaryotes have become important materials for comparative genomics and evolutionary biology. Although many databases on exons and introns exist, there is currently no database that collects intronless genes in plants in one place. In this study, we constructed the Rosaceae Intronless Genes Database (RIGD), a user-friendly web interface to explore and collect information on intronless genes from different plants. Six Rosaceae species, Pyrus bretschneideri, Pyrus communis, Malus domestica, Prunus persica, Prunus mume, and Fragaria vesca, are included in the current release of the RIGD. Sequence data and gene annotation were collected from different databases and integrated. The main purpose of this study is to provide gene sequence data. In addition, attribute analysis, functional annotations, subcellular localization prediction, and GO analysis are reported. The RIGD allows users to browse, search, and download data with ease. Blast and comparative analyses are also provided through this online database, which is available at

Keywords: Rosaceae; database; gene annotations; intronless genes; platform.

Copyright © 2020 Chen, Meng, Liu, Cheng, Wang, Jin, Xu, Cao and Cai.


The flowchart describing the analysis of RIGD. After the identification of intronless genes,…

The Flowchart of RIGD Sitemap. All the data is stored in MySQL database,…

An overview of the website page in the RIGD. (A) Home interface show…

The website page of Species. In addition to the download links to the…

The website page of Search. Researchers can search the RIGD database by species…

The website page of Blast. The interface can be used for Blast comparison…

The statistics of subcellular localization prediction. The largest number of intronless genes were…

Top 10 largest number of intronless genes in protein function. The largest number…

Subcellular localization prediction of intronless PPR genes in Pyrus bretschneideri. Most proportion…

Motif analysis and visualization. Motif1 is the most covered. Motif3 only existed at…