11.2: Clade Age and Diversity - Biology

11.2: Clade Age and Diversity - Biology

We are searching data for your request:

Forums and discussions:
Manuals and reference books:
Data from registers:
Wait the end of the search in all databases.
Upon completion, a link will appear to access the found materials.

If we know the age of a clade and its current diversity, then we can calculate the net diversification rate for that clade. Before presenting equations, I want to make a distinction between two different ways that one can measure the age of a clade: the stem age and the crown age. A clade's crown age is the age of the common ancestor of all species in the clade. By contrast, a clade's stem age measures the time that that clade descended from a common ancestor with its sister clade. The stem age of a clade is always at least as old, and usually older, than the crown age.

Magallón and Sanderson (2001) give an equation for the estimate of net diversification rate given clade age and diversity:

$$ hat{r} = frac{ln(n)}{t_{stem}} label{11.1}$$

where n is the number of species in the clade at the present day and tstem is the stem group age. In this equation we also see the net diversification rate parameter, r = λμ (see Chapter 10). This is a good reminder that this parameter best predicts how species accumulate through time and reflects the balance between speciation and extinction rates.

Alternatively, one can use tcrown, the crown group age:

$$ hat{r} = frac{ln(n)-ln(2)}{t_{crown}} label{11.2}$$

The two equations differ because at the crown group age one is considering the clade's diversification starting with two lineages rather than one (Figure 11.2).

Even though these two equations reflect the balance of births and deaths through time, they give ML estimates for r only under a pure birth model where there is no extinction. If there has been extinction in the history of the clade, then our estimates using equations 11.1 and 11.2 will be biased. The bias comes from the fact that we see only clades that survive to the present day, and miss any clades of the same age that died out before they could be observed. By observing only the "winners" of the diversification lottery, we overestimate the net diversification rate. If we know the relative amount of extinction, then we can correct for this bias.

Under a scenario with extinction, one can define ϵ = μ/λ and use the following method-of-moments estimators from Rohatgi (1976, following the notation of Magallon and Sanderson (2001)):

$$ hat{r} = frac{log⁡[n(1-epsilon)+epsilon]}{t_{stem}} label{11.3}$$

for stem age, and

$$ hat{r} = frac{ln[frac{n(1-epsilon^2 )}{2}+2 epsilon+frac{(1-epsilon) sqrt{n(n epsilon^2-8epsilon+2 n epsilon+n)}}{2}]-ln(2)}{t_{crown}} label{11.4}$$

for crown age. (Note that eq. 11.3 and 11.4 reduce to 11.1 and 11.2, respectively, when ϵ = 0). Of course, we usually have little idea what ϵ should be. Common practice in the literature is to try a few different values for ϵ and see how the results change (e.g. Magallon and Sanderson 2001).

Magallón and Sanderson (2001), following Strathmann and Slatkin (1983), also describe how to use eq. 10.13 and 10.15 to calculate confidence intervals for the number of species at a given time.

As a worked example, let's consider the data in table 11.1, which give the crown ages and diversities of a number of plant lineages in the Páramo (from Madriñán et al. 2013). For each lineage, I have calculated the pure-birth estimate of speciation rate (from equation 11.2, since these are crown ages), and net diversification rates under three scenarios for extinction (ϵ = 0.1, ϵ = 0.5, and ϵ = 0.9).

LineagenAge$hat{r}_{pb}$$hat{r}_{epsilon = 0.1}$$hat{r}_{epsilon = 0.5}$$hat{r}_{epsilon = 0.9}$

The table above shows estimates of net diversification rates for Páramo lineages (data from Madriñán et al. 2013) using equations 11.2 and 11.4. Lineages are as follows: 1: Aragoa, 2: Arcytophyllum, 3: Berberis, 4: Calceolaria, 5: Draba, 6: Espeletiinae, 7: Festuca, 8: Jamesonia + Eriosorus, 9: Lupinus, 10: Lysipomia, 11: Oreobolus, 12: Puya, 13: Valeriana.

Inspecting the last four columns of this table, we can make a few general observations. First, these plant lineages really do have remarkably high diversification rates (Madriñán et al. Second, the net diversification rate we estimate depends on what we assume about relative extinction rates (ϵ). You can see in the table that the effect of extinction is relatively mild until extinction rates are assumed to be quite high (ϵ = 0.9). Finally, assuming different levels of extinction affects the diversification rates but not their relative ordering. In all cases, net diversification rates for Aragoa, which formed 17 species in less than half a million years, is higher than the rest of the clades. This relationship holds only when we assume relative extinction rates are constant across clades, though. For example, the net diversification rate we calculate for Calceolaria with ϵ = 0 is higher than the calculated rate for Aragoa with ϵ = 0.9. In other words, we can't completely ignore the role of extinction in altering our view of present-day diversity patterns.

We can also estimate birth and death rates for clade ages and diversities using ML or Bayesian approaches. We already know the full probability distribution for birth-death models starting from any standing diversity N(0)=n0 (see equations 10.13 and 10.15). We can use these equations to calculate the likelihood of any particular combination of N and t (either tstem or tcrown) given particular values of λ and μ. We can then find parameter values that maximize that likelihood. Of course, with data from only a single clade, we cannot estimate parameters reliably; in fact, we are trying to estimate two parameters from a single data point, which is a futile endeavor. (It is common, in this case, to assume some level of extinction and calculate net diversification rates based on that, as we did in Table 11.1 above).

One can also assume that a set of clades have the same speciation and extinction rates and fit them simultaneously, estimating ML parameter values. This is the approach taken by Magallón and Sanderson (2001) in calculating diversification rates across angiosperms. When we apply this approach to the Paramo data, shown above, we obtain ML estimates of $hat{r} = 0.27$ and $hat{epsilon} = 0$. If we were forced to estimate an overall average rate of speciation for all of these clades, this might be a reasonable estimate. However, the table above also suggests that some of these clades are diversifiying faster than others. We will return to the issue of variation in diversification rates across clades in the next chapter.

We can also use a Bayesian approach to calculate posterior distributions for birth and death rates based on clade ages and diversities. This approach has not, to my knowledge, been implemented in any software package, although the method is straightforward (for a related approach, see Höhna et al. 2016). To do this, we will modify the basic algorithm for Bayesian MCMC (see Chapter 2) as follows.


  1. Sample a set of starting parameter values, r and ϵ, from their prior distributions. For this example, we can set our prior distribution for both parameters as exponential with a mean and variance of λprior (note that your choice for this parameter should depend on the units you are using, especially for r). We then select starting r and ϵ from their priors.
  2. Given the current parameter values, select new proposed parameter values using the proposal density Q(p′|p). For both parameter values, we can use a uniform proposal density with width wp, so that Q(p′|p) U(pwp/2, p + wp/2). We can either choose both parameter values simultaneously, or one at a time (the latter is typically more effective).
  3. Calculate three ratios:
    • a. The prior odds ratio. This is the ratio of the probability of drawing the parameter values p and p′ from the prior. Since we have exponential priors for both parameters, we can calculate this ratio as: $$ R_{prior} = frac{lambda_{prior} e^{-lambda_{prior} p'}}{lambda_{prior} e^{-lambda_{prior} p}}=e^{lambda_{prior} (p-p')} label{11.5}$$
  • b. The proposal density ratio. This is the ratio of probability of proposals going from p to p′ and the reverse. We have already declared a symmetrical proposal density, so that Q(p′|p)=Q(p|p′) and Rproposal = 1.
  • c. The likelihood ratio. This is the ratio of probabilities of the data given the two different parameter values. We can calculate these probabilities from equations 10.13 or 10.16 (depending on if the data are stem ages or crown ages).
  1. Find Raccept as the product of the prior odds, proposal density ratio, and the likelihood ratio. In this case, the proposal density ratio is 1, so (eq. 11.6):

    Raccept = RpriorRlikelihood

An Inordinate Fondness for Eukaryotic Diversity

Why do some groups of organisms, like beetles, have so many species, and others, like the tuataras, so few? This classic question in evolutionary biology has a deep history and has been studied using both fossils and phylogenetic trees. Phylogeny-based studies have focused on tree balance, which compares the number of species across clades of the same age in the tree. These studies have suggested that rates of speciation and extinction vary tremendously across the tree of life. In this issue, Rabosky et al. report the most ambitious study to date on the differences in species diversity across clades in the tree of life. The authors bring together a tremendously large dataset of multicellular eukaryotes, including all living species of plants, animals, and fungi they divide these organisms into 1,397 clades, accounting for more than 1.2 million species in total. Rabosky et al. find tremendous variation in diversity across the tree of life. There are old clades with few species, young clades with many species, and everything in between. They also note a peculiar aspect of their data: it is difficult or impossible to predict how many species will be found in a particular clade knowing how long a clade has been diversifying from a common ancestor. This pattern suggests complex dynamics of speciation and extinction in the history of eukaryotes. Rabosky et al.'s paper represents the latest development in our efforts to understand the Earth's biodiversity at the broadest scales.

Citation: Harmon LJ (2012) An Inordinate Fondness for Eukaryotic Diversity. PLoS Biol 10(8): e1001382.

Published: August 28, 2012

Copyright: © Luke J. Harmon. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Funded by NSF DEB 0919499 and DEB 1208912. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The author has declared that no competing interests exist.

J.B.S. Haldane—one of the most quotable of all evolutionary biologists—had a favorite saying about what patterns of species richness tell us about the nature of the Universe: “God has an inordinate fondness for beetles” (see [1] for more details). With this quip, Haldane is referring to the overwhelming number of beetle species on Earth. We still don't know exactly how many species of beetles there are on the Earth—perhaps around 400,000—but certainly, there are a lot.

One of the primary mysteries in macroevolution is the tremendous difference in numbers of species among different taxonomic groups. Modern systematists classify species into clades (groups of species that represent all of the descendents of a common ancestor, like turtles or arthropods). Different clades in the tree of life have dramatically different diversities. This might not be surprising—after all, species in one clade can be distinct from other species in size, energy use, and a thousand other ways. Also, some clades are much older than others. However, even when we control for differences in age by comparing sister clades—that is, pairs of clades that are each others' closest relative—we still see profound differences in number of species. For example, there are currently two living species of tuatara, a clade of lizard-like reptiles that currently inhabit small islands around New Zealand (Figure 1, left). These tuatara are the sister clade to the squamates, a clade of 7,000 species that includes all living snakes and lizards (Figure 1, right) [2]. Since these two groups are sister clades, they diverged from a common ancestor at exactly the same time (∼250 million years ago [3]). Tuataras used to be far more diverse in the past (though almost certainly not as diverse as squamates [4]), but their current diversity is dwarfed by the tremendous number and variety of snakes and lizards around the globe. Similar patterns occur across the whole tree of life. In fact, old, low diversity clades contain some of the most enigmatic species on Earth: ginkgo trees, coelacanths, tailed frogs, horseshoe crabs, and monotremes, among others. These species are sometimes called “living fossils,” although only some of them are actually thought to resemble their ancient ancestors [5].

The sister clade to the tuatara is Squamata, which includes the ∼7,000 living species of snakes and lizards, including the ornate day gecko (Phelsuma ornata, right). (Left) from Wikimedia commons, taken by user KeresH, (Right) by the author.

We can learn a lot about the differences in diversity across clades from phylogenetic trees. In particular, phylogenetic tree balance summarizes the pattern of differences in the number of species between sister clades across a whole phylogeny [6]. A phylogenetic tree can be completely balanced, such that each pair of sister taxa in the tree have exactly the same number of species (Figure 2A this is only possible if the number of species in the tree is a power of 2: 2, 4, 8, 16, 32, etc.). A phylogenetic tree can also be completely imbalanced, so that every comparison of sister clades has a single species in one clade and the remainder in the other (such a tree is also called pectinate Figure 2B). There are a few different ways to quantify tree balance, but they all work in basically the same way: compare the number of species between sister clades in the tree, and summarize those differences across a whole phylogeny [7].

Imagine for a moment that you have the tree of life (a phylogenetic tree of all species on Earth). We don't have such a tree yet, but scientists are moving in that direction and trees are getting bigger and bigger (e.g., [8]). The tree of life is a huge and complicated structure, and—as one might imagine at a scale that encompasses all living things, bacteria to beetles to beagles—resists generalizations. But even as the tree of life takes shape, we already know that it is highly imbalanced. This statement applies broadly across living things, and applies equally well to plants as it does to animals, and everything else (as far as we know). Dramatic differences in diversity among clades is a characteristic feature of life on Earth.

To better understand patterns of balance in the tree of life, we can start with the birth-death model, a simple model of how phylogenetic trees might grow through time (reviewed in [9]). Under a birth-death model, phylogenetic trees “grow” through time following two processes: speciation, where one species splits into two, and extinction, when one species dies out. For simple birth-death models we assume that both of these processes happen at a constant rate through time for each species alive at that moment, and use the parameters λ (speciation rate) and μ (extinction rate). If we simulate this birth-death model using a computer, we will obtain a phylogenetic tree (Figure 2C). If the extinction rate is greater than zero, such a tree will include surviving species as well as species that have gone extinct. Since we will be comparing this tree to phylogenies of living species, we can assume that any species that went extinct before the present has been pruned. We can then measure the balance of the resulting birth-death tree (our simulated tree will also have branch lengths, but we will ignore those for now). If we repeat this process a large number of times, we obtain a statistical distribution of our tree balance measure, which represents the expectation of that distribution under the birth-death model. It turns out, perhaps surprisingly, that this distribution of balance depends only on the fact that the trees are simulated under a birth-death model in terms of tree balance, the actual rates of speciation and extinction do not matter, as long as they are constant across clades [10].

There is another counterintuitive feature of the balance of birth-death trees: these trees are surprisingly imbalanced compared to what, perhaps, your intuition might suggest. For example, imagine that a certain phylogenetic tree contains 100 species. If you look at the deepest split in that tree and compare the diversity of the two sister clades, what do you expect to find? Are you more likely to find an even number of species in each of these two clades—say, 50/50 or 49/51—or a very uneven number, like 2/98 or 1/99? The surprising answer is that all four of these listed possibilities are equally likely. In fact, all possible combinations of diversity for each of the two clades are equally probable [11].

This property of birth-death models means that birth-death trees are quite imbalanced: it is not uncommon, for example, to find sister clades that differ in diversity by a factor of 20 just by random chance. However, the tree of life is imbalanced even compared to birth-death trees! This general observation was first discovered in an influential paper by Arne Mooers and colleagues [7]. In that paper, the authors measured the balance of phylogenetic trees that had been reconstructed using trait data, DNA sequences, or both. They then compared their balance to what we might expect under a birth-death model. They found that phylogenetic trees from a wide range of taxa are extraordinarily imbalanced. This paper showed that the general “shape” of the tree of life is highly imbalanced.

The classical interpretation of the imbalanced tree of life is that clades vary in their rates of speciation and/or extinction. There are many reasons to suspect that species in some clades might speciate more frequently, or go extinct less frequently, than their relatives. For example, perhaps a species' range affects its probability of speciating or going extinct—as suggested by a recent paper in PLOS Biology by Pigot et al. [12]—so that clades of species with different distributions of range sizes will experience different rates of diversification. Many studies have attempted to measure speciation and extinction rates in groups with a good fossil record, and compare these rates across different types of organisms and time periods (e.g., [13],[14]). These studies have generally found wide variation in both rates and in their difference (speciation−extinction = net diversification rate).

Recent studies go beyond measures of tree balance by using the tree's branch lengths to gain information about speciation rates. One simple way to do this is to compare the diversity of a clade to its age one can then estimate the speciation rate as λ = ln(n)/t, where n is the number of living species in the clade and t is the clade's stem age (the time since divergence from the clade's sister group) [15]. There are also modifications of this equation that incorporate extinction and that can use the clade's crown age (the time since all living species in the clade shared a common ancestor see [15]).

Perhaps surprisingly, one can even estimate extinction from a phylogenetic tree based only on living species [16]. This is because old and young lineages are hit by extinction with different probabilities: since young lineages have not been around very long, they are less likely to have gone extinct than older lineages. The phenomenon is called the “pull of the present” [16] and means that extinction leads to an overabundance of very young lineages in a tree. We can look for this pull in patterns of lineage accumulation through time, which can thus be used to estimate both speciation and extinction rates and to compare these rates across clades (e.g., [17] but see [18], which points out that this method does not work well when its assumptions of rate constancy are strongly violated). More recently, new methods have been developed to search for variation in speciation and/or extinction rates across large phylogenetic trees, and to try to correlate these rates with the traits of lineages [19],[20].

In this issue, Rabosky et al. [21] attempt the most ambitious study to date investigating the differences in species diversity across clades in the tree of life. The authors bring together a tremendously large dataset that spans the multicellular eukaryotes, including all living species of plants, animals, and fungi. For each of 1,397 eukaryotic clades, the authors gathered estimates of age and diversity from the literature – accounting for more than 1.2 million species in total. The authors also summarize the evolutionary relationships among these clades using a “backbone” phylogenetic tree with branch lengths in millions of years. This provides a remarkably complete view of what we currently know about the species diversity of clades across a huge section of the tree of life.

The diversity data in Rabosky et al. [21] are broadly consistent with the historical background above: there are major differences in diversification rates across the tree of life. There are old clades with few species, young clades with many species, and everything in between. But Rabosky et al. also note a peculiar aspect of their data: there is typically either a very weak or no relationship between the number of species in a clade and its age. That is, in the data they analyze, it is difficult to guess how many species are in a clade on the basis of how long it has been diversifying from a common ancestor. The traditional explanation for this pattern would be differences in diversification rates across clades—although the authors use simulations to show that, at least under one scenario about how rates might vary across trees, one rarely finds such weak or absent relationships between age and diversity. The authors speculate about other possible explanations for this peculiar (lack of) pattern, from bias in the way clades are named to ecological processes that limit the number of coexisting, competing species.

Rabosky et al.'s [21] analysis is not the final chapter the tree of life is still under construction, and the total number of species in some clades is best viewed as an educated guess. Specifically, I suspect that we have very poor estimates of the extant diversity of many eukaryotic groups, particularly small, understudied organisms. Indeed, new techniques that use genomic sequencing to identify undiscovered species from DNA sequences from environmental samples often reveal that the species we know are only a small component of natural ecosystems [22]. One might also note that the total diversity of multicellular eukaryotes counted in this study might be a vast underestimate compared to recent estimates that use statistical analyses to correct for incomplete sampling [23]. Still, the results in Rabosky et al. [21] are intriguing and will certainly inspire further study, which I expect will be focused on testing more sophisticated mathematical models, beyond the constant-rate birth-death models prevalent today, that might be able to explain patterns in the data.

I first learned of the Huxley quote that opens this article in a classic paper, “Homage to Santa Rosalia, or why are there so many kinds of animals?” [24]. This paper, written by the great ecologist G.E. Hutchinson, speculates about the mechanisms that allow so many different species to coexist in natural communities. Hutchinson describes collecting Italian water boatmen from a small pond in the shadow of Santa Rosalia's shrine, and wondering why the pond contained two species of water beetle—no more, no less. Hutchinson says “Nothing in her history being known to the contrary, perhaps for the moment we may take Santa Rosalia as the patroness of evolutionary studies…” [24]. Rabosky et al.'s [21] paper represents the latest development in our efforts to understand why the Earth has the particular number of species that it has – no more, no less. Santa Rosalia would be proud.


Geographic distribution of SARS-CoV-2 clades

As of January 25, 2021, WHO reported a total number of 98,925,221 confirmed COVID-19 cases and 2,127,294 deaths 19 . The calculated world case fatality rate (CFR) was 2.15%. Of all continents, the highest number of COVID-19 cases was reported from North America while most deaths were in Europe. South America had the highest calculated CFR.

GR was the most common clade (34.0%), followed by GV (22.3%) and GH (21.4%). Lower prevalence was noted for their parent clade, G, (15.8%). Other less common clades including L, S, and V were identified in 1.2%, 2.1% and 1.5% of the submitted genomes, respectively. About 1.7% of the genomes were not clustered into any of the major clades and thus had the designation “clade O”.

Analysis of the continent distribution of SARS-CoV-2 clades (Fig. 1) showed that clade GR was the most frequently identified among the genomes submitted from four continents namely Africa (41.1%), Asia (52.7%), Oceania (74.8%), and South America (66.8%). Clade GH was the most common in North America (59.0%). In Europe, both GR (35.5%) and the newly emerged clade, GV (34.6%) predominated.

Continent distribution of various SARS-CoV-2 clades. The figure shows the predominance of clade GR in Africa, Asia, Oceania, and South America and the predominance of clade GH in North America. The clades GR and GV predominated in Europe.

The number of coexisting clades was compared between countries with respect to different disease epidemiology parameters including the number of cases, total number of deaths and CFRs. Viral strains belonging to all known clades coexisted in 31 countries (21.1%). Of them, 61.3% reported above median local values for the studied disease epidemiology parameters.

Mann–Whitney test showed a significant difference in the distribution of the number of coexisting clades in the two groups with respect to the total number of cases (P-value < 0.001), total deaths (P-value < 0.001) and CFRs (P-value = 0.020). Higher medians of the number of coexisting clades were shown in the group of countries where above median cases, deaths and CFRs were recorded.

The impact of the distribution of individual clades on the disease epidemiology was also analyzed. Distribution bias of some clades was noted, as shown in Table 1. This was statistically significant for all clades with all disease epidemiology parameters.

Among all studied cases, patient’s clinical status were specified for only 2,634. Based on the provided data, such cases were grouped into asymptomatic/mild cases and severe/deceased cases. Although clades GH, and GR were significantly more prevalent among viral genomes isolated from severe/deceased cases, L clade showed the same distribution but with P-value > 0.05. In contrast all other clades showed higher prevalence in asymptomatic/mild cases than severe/deceased ones. This was statistically significant for all clades except clade V (Table 2).

Analysis based on the chronological distribution of SARS-CoV-2 clades was done for 404,496 cases for which the exact date of collection was available. The analysis showed that clade L predominated at the beginning of the pandemic. Thereafter, new clades evolved including the clades S and V. Viral clades carrying D614G mutation also emerged. Of them, clade G first emerged and soon split into the clades GH and GR. A gradual regression of all clades was then noted with an expansion of clade GR that predominated the scene for six months till the emergence of the last clade, “GV”, by which it was rapidly outweighed. Currently, GV clade is distributed in at least 49 countries predominantly in the United Kingdom (73.3% of the reported GV cases). The global chronological distribution of SARS-CoV-2 clades is shown in Fig. 2. The origins and the evolution time of SARS-CoV-2 clades inferred from the genomes submitted to GISAID are shown in Table 3.

Global chronological distribution of SARS-CoV-2 clades in the period from December 2019 till December 2020. Percentages are used for labelling the predominant clade in each month. The emergence of D614G mutation is marked by a red asterisk.

Gender distribution of SARS-CoV-2 clades

Analysis of 96,350 cases (Males = 49,454, Females = 46,896) for which patient gender was specified showed gender distribution bias for some clades (Table 4). This was statistically significantly for clades L and G that showed higher prevalence in males than females. Similarly, the clades GR and GV were more frequently isolated from females than males with P-value less than 0.05.

The severity of cases in both genders were compared in 2495 cases for which both gender and patient’s clinical status are known. Among the group of cases for which the clinical status was recorded as severe or deceased, the number of male patients was significantly higher than female patients (49.9% versus 35.4%, P-value < 0.001). Clinical status of patients infected by SARS-CoV-2 of different clades in different gender groups is shown in Fig. 3.

Clinical status of patients infected by SARS-CoV-2 of different clades in different gender groups.

Age distribution of SARS-CoV-2 clades

The distribution of the genomes that belonged to different clades in different age groups was analyzed among 95,848 cases for which the patient age was specified (Table 5). As shown in Table 5, viral isolates belonging to clade GR were more common in adult patients than other groups. Meanwhile, children were the age group from which the clades GH and GV were more frequently isolated. Isolation of all other clades was commonest in elderly patients compared to others. Clinical status of patients of different age groups from which viral genomes belonging to different clades were isolated is shown in Fig. 4.

Clinical status of patients infected by SARS-CoV-2 of different clades in different age groups.

A significant correlation was found between age groups and patient’s clinical status. The analysis included 2,524 cases for which both patient age and clinical status are known (Fig. 4). Severe/deceased cases were significantly more prevalent in elderly than in adults (71.9% vs 31.6%, Pearson Chi-Square P-value < 0.001) or in children (71.9% vs 3.4%, Pearson Chi-Square P-value < 0.001). They were also more frequently reported among adults compared to children in a statistically significant manner (31.6 vs 3.4%, Fisher’s Exact test P-value < 0.001).

Evolutionary time drives global tetrapod diversity

Global variation in species richness is widely recognized, but the explanation for what drives it continues to be debated. Previous efforts have focused on a subset of potential drivers, including evolutionary rate, evolutionary time (maximum clade age of species restricted to a region), dispersal (migration from one region to another), ecological factors and climatic stability. However, no study has evaluated these competing hypotheses simultaneously at a broad spatial scale. Here, we examine their relative contribution in determining the richness of the most comprehensive dataset of tetrapods to our knowledge (84% of the described species), distinguishing between the direct influences of evolutionary rate, evolutionary time and dispersal, and the indirect influences of ecological factors and climatic stability through their effect on direct factors. We found that evolutionary time exerted a primary influence on species richness, with evolutionary rate being of secondary importance. By contrast, dispersal did not significantly affect richness patterns. Ecological and climatic stability factors influenced species richness indirectly by modifying evolutionary time (i.e. persistence time) and rate. Overall, our findings suggest that global heterogeneity in tetrapod richness is explained primarily by the length of time species have had to diversify.

Keywords: climatic stability energy richness evolutionary rate evolutionary time species richness tetrapod.

Conflict of interest statement

We declare we have no competing interests.


Theoretical framework depicting the relationships,…

Theoretical framework depicting the relationships, and the underlying hypotheses, between evolutionary, ecological, climatic…

Global species richness maps. Species…

Global species richness maps. Species richness was computed by bioregion for ( a…

Structural equation models depicting the…

Structural equation models depicting the associations of direct and indirect variables with species…


Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng J-F, et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature. 2013499:431–7.

De Vargas C, Audic S, Henry N, Decelle J, Mahé F, Logares R, et al. Eukaryotic plankton diversity in the sunlit ocean. Science. 2015348:1261605.

Malviya S, Scalco E, Audic S, Vincent F, Veluchamy A, Poulain J, et al. Insights into global diatom distribution and diversity in the world’s ocean. P Natl Acad Sci USA. 2016113:E1516–25.

Pleijel F, Rouse GW. Ceci n’est pas une pipe: Names, clades and phylogenetic nomenclature. J Zool Syst Evol Res. 200341:162–74.

Sundberg PER, Pleijel F. Phylogenetic classification and the definition of taxon names. Zool Scr. 199423:19–25.

Cantino PD, de Queiroz K (2010). PhyloCode: A Phylogenetic Code of Biological Nomenclature. Ohio University. Athens, Ohio.

Harvey PH, Pagel MK. The Comparative Method in Evolutionary Biology. Oxford: Oxford University Press 1991.

Stadler T, Rabosky DL, Ricklefs RE, Bokma F. On age and species richness of higher taxa. Am Nat. 2014184:447–55.

Magallón S, Sanderson MJ. Absolute diversification rates in angiosperm clades. Evolution. 200155:1762–80.

Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, et al. Structure and function of the global ocean microbiome. Science. 2015348:1261359.

Nakov T, Beaulieu JM and Alverson AJ (2018). Accelerated diversification is related to life history and locomotion in a hyperdiverse lineage of microbial eukaryotes (Diatoms, Bacillariophyta). New Phytol.

Lazarus D, Barron J, Renaudie J, Diver P, Türke A. Cenozoic planktonic marine diatom diversity and correlation to climate change. PLOS ONE. 20149:e84857.

Mann DG, Vanormelingen P. An inordinate fondness? The number, distributions, and origins of diatom species. J Eukaryot Microbiol. 201360:414–20.

Kociolek JP, Balasubramanian K, Blanco S, Coste M, Ector L, Liu Y et al. (2018). DiatomBase. Accessed 18 Feb 2018.

Alverson AJ, Beszteri B, Julius ML, Theriot EC. The model marine diatom Thalassiosira pseudonana likely descended from a freshwater ancestor in the genus Cyclotella. BMC Evol Biol. 201111:125.

Wiese R, Renaudie J, Lazarus DB. Testing the accuracy of genus-level data to predict species diversity in Cenozoic marine diatoms. Geology. 201644:1051–4.

Kociolek JP. Taxonomy and ecology: further considerations. Proc Calif Acad Sci. 200556:99–106.

How reliably can we infer diversity-dependent diversification from phylogenies?

Fig. S1. Power as in Fig. 2, but for intrinsic speciation rate λ0 = 0.5 and clade-level carrying capacity K = 40.

Fig. S2. Power as in Fig. 2, but for intrinsic speciation rate λ0 = 0.5 and clade-level carrying capacity K = 80.

Fig. S3. Power as in Fig. 2, but for intrinsic speciation rate λ0 = 0.8 and clade-level carrying capacity K = 80.

Fig. S4. Parameter estimates as in Fig. 1, but for intrinsic speciation rate λ0 = 0.5 and clade-level carrying capacity K = 40.

Fig. S5. Parameter estimates as in Fig. 1, but for intrinsic speciation rate λ0 = 0.5 and clade-level carrying capacity K = 80.

Fig. S6. Parameter estimates as in Fig. 1, but for intrinsic speciation rate λ0 = 0.8 and clade-level carrying capacity K = 80.

Fig. S7. Supplementary statistics of the simulated data for three forms of conditioning (columns) across different diversification scenarios (1000 simulations).

Fig. S8. Supplementary statistics as in Fig. S7 but for intrinsic speciation rate λ0 = 0.5 and clade-level carrying capacity K = 40.

Fig. S9. Supplementary statistics as in Fig. S7, but for intrinsic speciation rate λ0 = 0.5 and clade-level carrying capacity K = 80.

Fig. S10. Supplementary statistics as in Fig. S7, but for intrinsic speciation rate λ0 = 0.8 and clade-level carrying capacity K = 80.

Fig. S11. Estimates of K as a function of the size of the simulated tree for different diversification scenarios and forms of conditioning (1000 simulations).

Fig. S12. Relationship of the estimated K vs. tree size as in Fig. S11, but for intrinsic speciation rate λ0 = 0.5 and clade-level carrying capacity K = 40.

Fig. S13. Relationship of the estimated K vs. tree size as in Fig. S11, but for intrinsic speciation rate λ0 = 0.5 and clade-level carrying capacity K = 80.

Fig. S14. Relationship of the estimated K vs. tree size as in Fig. S11, but for intrinsic speciation rate λ0 = 0.8 and clade-level carrying capacity K = 80.

Fig. S15. Bootstrap distribution of the parameter values for the Dendroica clade.

Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.

Taxon List. List of taxa used for the study, their geographic localities, and GenBank accession numbers.

We thank R. Bryson Jr., J. DaCosta, G. Spellman, J. Batten, C. Tabone, UNLV Systematics Group, T. Monterrubio, J. Chaves, R. Canales, M. Gurrola, J. Rodríguez, J. Jaeger, and two anonymous reviewers. The following individuals and institutions provided samples for the study: Paul Sweet and George Barrowclough (American Museum of Natural History), Christopher Witt and Andrew Jones (Museum of Southwest Biology, University of New Mexico), Robert Zink (Bell Museum of Natural History), Sharon Birks (U of Washington, Burke Museum of Natural History), and Mark Robbins (University of Kansas). This work was partially funded by a UNLV International Studies Scholarship, a UNLV GPSA grant and an American Museum of Natural History Chapman Grant. This study was conducted in accordance with American and Mexican state and federal permits and in compliance with each government's animal research laws.


Outer Ear

Table 2 presents the generic averages for the variables used in the outer ear analysis. The pinna shape index (Psi) is plotted by family/subfamily in Figure 7A. The values for Cebus and Saimiri are plotted separately. This plot underscores the pattern observed in Figure 1 that the major distinction in primate outer ear shape is between anthropoids and all other primates. There is a significant difference in outer ear shape between these two groups (U = 9 P < 0.001) with anthropoids having lower index values indicative of a pinna shape that is more symmetrical compared with the relatively tall and narrow pinnae characterizing nonanthropoids. This difference in pinna shape index may be influenced more by differences in pinna width than pinna height. Comparisons of small-bodied primates suggest that nonanthropoids have narrower pinnae than anthropoids, although this difference is not quite statistically significant at P ≤ 0.05 (F = 3.88 P = 0.06). Certainly, in this body size range, the values for pinna height show almost complete overlap. The suborder-level effects on Psi remain significant using phylogenetically adjusted critical values of the F ratio (F = 72.18 Fphylo = 44.05).

Genus Pinna height (mm) Pinna width (mm) Pinna ratio (Psi)
Alouatta 33.9 ± 2.33 (10) 24.5 ± 2.22 (10) 1.39 ± 0.097 (10)
Aotus 27.6 ± 1.58 (10) 21.5 ± 1.27 (10) 1.29 ± 0.057 (10)
Arctocebus 25.0 (1) 18.0 (1) 1.39 (1)
Ateles 33.0 ± 3.62 (10) 23.9 ± 1.85 (10) 1.38 ± 0.129 (10)
Callimico 25.5 ± 0.71 (2) 20.5 ± 0.71 (2) 1.25 ± 0.077 (2)
Callithrix 24.4 ± 1.41 (8) 18.6 ± 0.92 (8) 1.31 ± 0.091 (8)
Cebus 35.0 ± 2.29 (23) 25.3 ± 1.96 (23) 1.39 ± 0.084 (23)
Cercocebus 33.5 ± 2.12 (2) 27.5 ± 0.71 (2) 1.22 ± 0.109 (2)
Cercopithecus 33.1 ± 3.80 (7) 27.7 ± 2.06 (7) 1.19 ± 0.082 (7)
Cheirogaleus 17.3 ± 0.29 (4) 12.8 ± 0.29 (4) 1.35 ± 0.008 (4)
Daubentonia 75.1 (1) 46.0 (1) 1.63 (1)
Erythrocebus 43.7 ± 4.89 (7) 33.9 ± 2.54 (7) 1.29 ± 0.154 (7)
Euoticus 25.3 ± 0.28 (2) 15.7 ± 1.27 (2) 1.62 ± 0.113 (2)
Galago 38.8 ± 2.80 (6) 25.3 ± 2.30 (6) 1.54 ± 0.118 (6)
Galagoides 24.1 ± 1.34 (9) 13.89 ± 0.65 (9) 1.73 ± 0.068 (9)
Hylobates 31.2 ± 2.95 (5) 26.8 ± 2.59 (5) 1.16 ± 0.066 (5)
Lagothrix 29.8 ± 1.89 (4) 21.3 ± 0.96 (4) 1.40 ± 0.123 (4)
Leontopithecus 27.0 ± 0.63 (6) 22.7 ± 1.03 (6) 1.19 ± 0.070 (6)
Loris 23.8 ± 1.50 (4) 15.8 ± 2.22 (4) 1.52 ± 0.148 (4)
Macaca 38.1 ± 3.59 (9) 29.8 ± 2.59 (9) 1.28 ± 0.062 (9)
Microcebus 22.5 ± 0.71 (2) 15.0 ± 1.41 (2) 1.50 ± 0.095 (2)
Miopithecus 28.8 ± 1.72 (6) 22.8 ± 1.60 (6) 1.27 ± 0.088 (6)
Nasalis 34.5 ± 3.00 (4) 27.0 ± 2.16 (4) 1.28 ± 0.065 (4)
Nycticebus 23.0 ± 2.18 (3) 13.5 ± 0.87 (3) 1.70 ± 0.126 (3)
Otolemur 46.3 ± 11.2 (3) 29.8 ± 5.48 (3) 1.54 ± 0.083 (3)
Papio 50.1 ± 4.94 (9) 38.9 ± 4.23 (9) 1.29 ± 0.131 (9)
Perodicticus 24.0 (1) 15.0 (1) 1.60 (1)
Presbytis 32.5 ± 2.12 (2) 26.0 ± 1.41 (2) 1.25 ± 0.014 (2)
Pygathrix 37.4 ± 2.61 (5) 30.8 ± 1.48 (5) 1.21 ± 0.076 (5)
Saguinus 19.0 ± 0.82 (10) 14.7 ± 0.82 (10) 1.30 ± 0.086 (10)
Saimiri 22.0 ± 2.16 (10) 20.4 ± 2.22 (10) 1.08 ± 0.103 (10)
Semnopithecus 48.0 (1) 35.0 (1) 1.37 (1)
Tarsius 30.0 ± 2.00 (6) 18.4 ± 1.50 (6) 1.64 ± 0.160 (6)
Theropithecus 50.2 ± 6.14 (5) 35.2 ± 4.44 (5) 1.43 ± 0.064 (5)
Trachypithecus 38.5 ± 3.87 (4) 28.8 ± 2.63 (4) 1.34 ± 0.036 (4)
Varecia 41.0 ± 1.73 (3) 24.3 ± 3.06 (3) 1.70 ± 0.209 (3)
Total 32.4 ± 9.24 (204) 24.0 ± 6.91 (204) 1.36 ± 0.182 (204)
  • * Error ranges represent one standard deviation and numbers in parentheses are number of specimens per genus.

A: Pinna shape index values (Psi) for families/subfamilies investigated in this study (Cebus and Saimiri plotted separately) showing that the major differences in outer ear shape occur between anthropoids and all other primates. Error bars represent two standard deviations. B: Estimated pinna shape index values for selected nodes on phylogenetic tree in Figure 6. Pinna shape index is plotted on the abscissa in the same scale as A. The ordinate is time (Myr), from 85 Myr at the bottom to the present at the top. Estimates are phylogenetically weighted means of the tip data calculated in PDTREE. Error bars represent standard error of the estimate. Lines connecting nodes are phylogenetic lines of descent.

Middle Ear

Table 3 presents the generic averages for the variables used in the middle ear analysis. The percentage of acoustic transmission (T1), calculated using the traditional cochlear impedance value of 5,600 dynes sec/cm 3 , is presented in Figure 8 grouped by family/subfamily. Once again, subordinal level differences are apparent. The majority of anthropoids have transmission values between 81% and 92% while strepsirrhines show higher values ranging from 92% to nearly 100%. The exception to this dichotomy is the Callitrichinae, with all five genera having T1 values between 93% and 97%. Using the higher estimate of cochlear impedance (11,200 dynes sec/cm 3 ), anthropoids exhibit T2 values ranging from 54% to 69%, strepsirrhines from 68% to 92%, and callitrichines with values between 70% and 76%. Despite the intermediate values for callitrichines, the difference between anthropoids and strepsirrhines remains significant when evaluated using standard nonparametric statistics, regardless of the estimate for cochlear impedance (T1: U = 12.5, P < 0.001 T2: U = 12.0, P < 0.001).

Genus Tympanic memrbane area (mm 2 ) Stapedial footplate area (mm 2 ) Areal convergence ratio Malleus lever length (mm) Incus lever length (mm) Ossicular lever arm ratio ITR T1 T2
Alouatta 51.5 ± 5.03 (25) 1.53 ± 0.14 (20) 22.21 3.57 ± 0.30 2.24 ± 0.14 1.60 ± 0.18 (6) 0.0178 83.3 57.1
Aotus 25.8 ± 3.11 (24) 0.77 ± 0.08 (15) 22.11 2.84 ± 0.14 1.66 ± 0.14 1.72 ± 0.10 (23) 0.0152 88.3 63.0
Arctocebus 23.7 ± 2.33 (2) 0.78 ± 0.11 (2) 20.05 2.90 ± 0.10 1.44 ± 0.07 2.01 ± 0.03 (2) 0.0122 94.1 71.6
Ateles 56.9 ± 9.13 (9) 1.57 ± 0.25 (10) 23.92 3.24 ± 0.08 1.99 ± 0.25 1.64 ± 0.24 (2) 0.0157 87.0 61.8
Avahi 26.2 ± 0.52 (3) 0.82 (1) 21.41 3.31 ± 0.06 1.59 ± 0.06 2.08 ± 0.11 (2) 0.0109 96.4 75.7
Brachyteles 49.2 (1) 1.43 (1) 22.71 3.30 2.06 1.60 (1) 0.0170 84.9 58.8
Cacajao 32.0 ± 3.26 (22) 1.08 ± 0.01 (2) 19.56 2.89 ± 0.26 1.78 ± 0.22 1.63 ± 0.10 (12) 0.0191 80.7 54.4
Callicebus 31.0 ± 3.51 (17) 0.90 ± 0.11 (9) 22.73 3.02 ± 0.19 1.68 ± 0.10 1.80 ± 0.11 (10) 0.0136 91.6 67.4
Callimico 24.2 ± 2.33 (4) 0.59 ± 0.09 (3) 27.07 2.43 ± 0.10 1.36 ± 0.03 1.78 ± 0.04 (2) 0.0116 95.2 73.4
Callithrix 20.0 ± 1.69 (20) 0.55 ± 0.04 (5) 24.00 2.47 ± 0.14 1.36 ± 0.10 1.82 ± 0.13 (25) 0.0125 93.4 70.5
Cebuella 14.3 ± 1.01 (4) 0.40 ± 0.02 (2) 23.60 2.03 ± 0.04 1.11 ± 0.06 1.85 ± 0.08 (3) 0.0125 93.4 70.7
Cebus 38.6 ± 4.00 (57) 1.06 ± 0.14 (29) 24.04 3.05 ± 0.19 1.90 ± 0.13 1.61 ± 0.11 (37) 0.0160 86.3 61.1
Chiropotes 32.5 ± 3.00 (19) 0.97 ± 0.13 (5) 22.11 2.85 ± 0.13 1.57 ± 0.10 1.82 ± 0.17 (4) 0.0137 91.1 67.0
Daubentonia 43.1 ± 2.65 (5) 1.32 ± 0.17 (2) 21.55 4.78 2.20 2.17 (1) 0.0097 98.1 80.0
Euoticus 17.8 ± 0.21 (2) 2.35 1.16 2.03 (1)
Galago 22.1 ± 1.89 (18) 0.53 ± 0.06 (4) 27.52 2.67 ± 0.12 1.19 ± 0.07 2.24 ± 0.07 (18) 0.0072 99.9 89.8
Galagoides 2.04 0.98 2.08 (1)
Hapalemur 23.0 ± 2.18 (13) 3.18 ± 0.13 1.57 ± 0.04 2.02 ± 0.04 (3)
Indri 36.0 ± 2.89 (5) 1.05 ± 0.18 (2) 22.63 3.65 ± 0.14 1.92 ± 0.20 1.91 ± 0.13 (2) 0.0121 94.1 71.9
Lagothrix 46.2 ± 3.66 (13) 1.67 ± 0.18 (2) 18.26 3.14 ± 0.25 1.82 ± 0.09 1.72 ± 0.06 (4) 0.0183 82.5 56.0
Lemur 26.9 ± 2.70 (26) 0.67 (1) 26.50 3.41 ± 0.10 1.62 ± 0.03 2.10 ± 0.06 (5) 0.0085 99.6 84.7
Leontopithecus 23.7 ± 2.01 (10) 0.65 ± 0.08 (6) 24.06 2.76 ± 0.08 1.49 ± 0.09 1.85 ± 0.11 (7) 0.0121 94.1 71.8
Lepilemur 30.5 ± 3.30 (8) 0.73 ± 0.15 (5) 27.58 3.44 ± 0.12 1.48 ± 0.13 2.34 ± 0.15 (5) 0.0066 99.7 92.0
Macaca 1.17 (1) 3.77 ± 0.36 2.35 ± 0.05 1.60 ± 0.18 (3)
Microcebus 13.0 ± 4.27 (8) 0.37 ± 0.11 (2) 23.19 2.44 ± 0.08 1.23 ± 0.12 1.99 ± 0.13 (5) 0.0108 96.6 76.2
Miopithecus 3.14 1.98 1.59 (1)
Perodicticus 25.3 ± 1.76 (17) 0.73 ± 0.06 (8) 22.87 2.87 ± 0.18 1.39 ± 0.08 2.06 ± 0.05 (8) 0.0102 97.6 78.1
Phaner 0.54 (1) 2.73 1.49 1.83 (1)
Pithecia 38.8 ± 4.50 (20) 0.99 ± 0.12 (8) 25.87 3.01 ± 0.15 1.76 ± 0.09 1.71 ± 0.10 (7) 0.0131 92.0 68.7
Propithecus 29.5 ± 2.83 (17) 0.97 ± 0.11 (2) 20.07 4.12 ± 0.05 2.16 ± 0.23 1.92 ± 0.23 (2) 0.0135 91.6 67.5
Saguinus 20.4 ± 1.16 (13) 0.45 ± 0.05 (4) 29.92 2.47 ± 0.11 1.34 ± 0.16 1.86 ± 0.19 (4) 0.0108 96.5 76.0
Saimiri 20.5 ± 2.10 (57) 0.62 ± 0.06 (14) 21.82 2.35 ± 0.09 1.39 ± 0.06 1.70 ± 0.08 (8) 0.0158 87.0 61.5
Tarsius 2.47 1.47 1.68 (1)
Varecia 30.7 ± 2.25 (10) 3.62 ± 0.19 1.98 ± 0.16 1.84 ± 0.22 (7)
Total 30.4 ± 10.44 (449) 0.95 ± 0.37 (168) 21.12 (26) 2.91 ± 0.43 1.62 ± 0.42 1.82 ± 0.23 (223) 0.0129 (26) 92.1 70.3
  • * T1 is the percentage of transmission through the middle ear calculated using 5,600 dynes sec/cm 3 as an estimate of the specific acoustic impedance of the inner ear and T2 is calculated using 11,200 dynes sec/cm 3 . Areal convergence ratio, ITR, T1, and T2 calculated using genus means. Two-thirds correction factor applied to tympanic membrane area before calculating areal convergence ratio. Error ranges represent one standard deviation and numbers in parentheses are number of specimens/taxa.

Theoretical percentage of acoustic transmission through the middle ear (T) based on ITR values derived from areal convergence and ossicular lever arm ratios. There is a significant difference in T between anthropoids and strepsirrhines despite the intermediate values for Callitrichinae.

The component of the ITR that is primarily responsible for this pattern is the ossicular lever arm ratio. The areal convergence ratios of anthropoids and strepsirrhines are not significantly different (U = 77 P = 0.874), but the difference in lever ratios is highly significant (U = 17 P < 0.001). It is noteworthy that lever ratio is significantly correlated (r = −0.487 P = 0.012) with overall size across all primates, and anthropoids and strepsirrhines differ significantly in overall size. To determine whether the suborder difference in lever ratio persists when this size difference is taken into account, an ANCOVA was performed on lever ratio with size (GM) as a covariate. This ANCOVA confirmed significant differences between anthropoids and strepsirrhines in lever ratio independent of their size differences (F = 74.55 P < 0.001).

To reveal whether the difference between anthropoids and strepsirrhines in lever ratios is due to differences in malleolar or incudal lever arm lengths, ANCOVAs were performed to identify suborder differences in lever arms with size (GM) as a covariate. This revealed no differences between suborders in incudal lever arm length, but significant differences in malleolar lever arm length (F = 23.06 P < 0.001). Thus, suborder differences in ITR are primarily driven by differences in length of the manubrium of the malleus (Fig. 9). Although areal convergence ratio data are not available for tarsiers, the ossicular lever arm ratio of 1.68 (derived from a single specimen) is clearly within the anthropoid range (1.58–1.86) and outside the range for strepsirrhines (1.83–2.34).

Malleus-incus pairs of two similarly sized primates (body mass and geometric mean of skull dimensions), Varecia (left) and Cebus (right), illustrating the relatively longer manubrium of strepsirrhines compared with anthropoids. The relative lengths of the incudal lever arms were not found to differ between these two groups, suggesting that malleolar lever arms are ultimately responsible for the increased ITR and T values of strepsirrhines compared with anthropoids.

When suborder differences in T values are analyzed using a phylogenetically adjusted critical value of the F ratio (Fphylo = 32.96), the suborder differences are not significant despite the high F ratio (F = 19.90). In contrast, the ANCOVA testing for suborder differences in lever ratio while correcting for size remained significant (F = 74.55), even relative to a phylogenetically adjusted critical value of Fphylo = 33. ANCOVAs testing for suborder differences in malleus and incus lever arms with size as a covariate were not significant.


Lorisoid audiograms were compared with platyrrhine audiograms (Fig. 10A) since it was found that the mean body mass for the species in each group (608 and 656 g, respectively) is not significantly different (U = 4 P = 0.827) and the species in one group have equally sized counterparts in the other. The most obvious difference between lorises and platyrrhines is in their low-frequency sensitivity where there is no overlap between the species from each group. Both low-frequency cutoff and low area are significantly different (U = 0 P ≤ 0.05) with platyrrhines showing an average increase in sensitivity of around 15 dB in the lower range. Full area and middle area as well as overall range were also found to be greater in platyrrhines compared with lorisoids (all three: U = 0 P = 0.05). These differences seem to be related to the accentuated secondary peak evident in platyrrhine audiograms (local maxima around 2 kHz Fig. 10A), but only minimally evident in lorisoid audiograms. Threshold of the primary peak is the remaining audiometric variable that shows significant differences (U = 0 P = 0.05) and indicates that platyrrhines are about 9 dB more sensitive at the frequency of greatest sensitivity than are lorisoids. When phylogenetically adjusted critical values of F ratios are calculated, only the suborder differences in low area remained significant (F = 30.40 Fphylo = 25.70).

Audiograms of similarly sized lorisoids and platyrrhines. A: Significant differences were found between the means for these two groups in low-frequency sensitivity, audible area (except high area), range, and threshold of the primary peak. B: When intraspecific variation is considered, platyrrhines (Callithrix) still show better low-frequency sensitivity than lorisoids (Galago), but the other audiometric traits become nonsignificant. Although a difference in high-frequency sensitivity was detected, this appears related to the reduced high-frequency sensitivity of Callithrix and is not characteristic of platyrrhines in general.

To assess the impact of intraspecific variation on comparisons between audiograms, the audiograms for Callithrix jacchus and Galago senegalensis were compared and are shown in Figure 10B. Although the galago data are less complete and appear more variable than the marmoset data, Callithrix still shows significantly better low-frequency sensitivity (U = 0 P = 0.016). However, the values for threshold of the primary peak are nonsignificant, despite there being over 6 dB difference between the means for the two species. A significant difference in high-frequency sensitivity was also detected (U = 0 P = 0.016). However, this difference is obviously related to the restricted high-frequency limit in Callithrix (Fig. 10A) and cannot be said to be characteristic of platyrrhine-lorisoid comparisons in general.

Morphometrics vs. Audiometrics

The results of the correlation analyses using both the original (tip) data and independent contrasts are given in Table 4. Using tip data, Psi shows a positive correlation with frequency of the primary peak (best Hz r = 0.604 P = 0.042) but a negative correlation with low-frequency sensitivity (low cutoff and area), total range, and total audible area (r = 0.677, P = 0.023 r = −0.736, P = 0.012 r = −0.631, P = 0.034 r = −0.759, P = 0.009). Note that an increasing value for low cutoff corresponds to decreased low-frequency sensitivity. Hence, a negative relationship is interpreted from positive correlation coefficient values.

Psi T
Tips Pics Tips Pics
r P r P r P r P
Best Hz 0.604 0.04 0.357 0.19 0.242 0.32 −0.614 0.14
Best dB 0.451 0.11 0.013 0.49 0.636 0.09 0.147 0.41
Low cutoff 0.677 0.02 −0.139 0.37 0.595 0.11 0.130 0.42
High cutoff 0.219 0.29 −0.196 0.32 0.500 0.16 0.106 0.43
Low area −0.736 0.01 −0.194 0.32 −0.626 0.09 −0.108 0.43
Middle area −0.814 < 0.01 −0.391 0.17 −0.436 0.19 0.092 0.44
High area 0.055 0.44 −0.321 0.22 0.124 0.41 −0.388 0.26
Total area −0.759 < 0.01 −0.460 0.13 −0.554 0.13 −0.349 0.28
Total range −0.631 0.03 −0.476 0.12 −0.509 0.15 −0.437 0.23
  • * Results using tip data and standardized independent contrasts are given. Tips, r and P calculated using tip data Pics, r and P calculated using standardized independent contrasts on tree with all branch lengths equal to 1.0. See Figure 5 for audiometric variable descriptions.

The correlation analyses between Psi and audiometric variables using independent contrasts produced rather different results from those obtained using tip data (Table 4). None of the correlations between pinna shape index and the audiometric variables are significant at P < 0.05 or 0.10. The highest correlations are between Psi and total range (r = −0.476 P = 0.117) and total area (r = −0.460 P = 0.126), although both are just above the critical P value.

Using tip data, there is a weak negative correlation between T1 and low area (r = −0.626 P = 0.092). Across all primates, there is a positive correlation between T1 values and the thresholds of the primary peak (best dB r = 0.636 P = 0.087), i.e., as transmission percentage goes up, so does the loudness level (dB) required to hear a sound at the most sensitive frequency. This result is paradoxical because animals should be more sensitive as their transmission percentage increases, not less. This result is attributable to suborder differences. Strepsirrhines have lower sensitivity compared with anthropoids, but highly elevated T1 values (Fig. 11), producing a positive correlation between T1 and sensitivity across all primates. However, correlation coefficients calculated within suborders reveal the expected negative correlation between T1 and threshold of the primary peak (anthropoids, contrasts r = −0.765 strepsirrhines contrast r = −0.262).

The relationship between the theoretical percentage of transmission through the middle ear and threshold of the primary peak taken from audiograms is in the opposite direction of that expected when considering primates as a group, but appears to agree with theoretical expectations when compared within anthropoids and strepsirrhines.

As with the correlations between Psi and audiometric variables, none of the correlations between T1 and audiometric variables were significant when contrasts were used. The lowest P value was produced by the negative correlation between T1 and frequency of the primary peak (r = −0.614 P = 0.135).


Here, we generated a high-quality genome sequence of a representative species (J. sinuosa) from within Jaltomata, a rapidly evolving, florally and reproductively diverse genus in the Solanaceae. We used these data to clarify a complex history of origin that Jaltomata shares with its two most closely related genera—Solanum and Capsicum—and to infer recent TE dynamics that might be responsible for genome-size evolution within the Solanaceae. We also identified overall patterns of significant gene family gain and loss, as well as adaptive molecular evolution, which could be implicated in the rapid reproductive trait evolution that is distinctive to Jaltomata among its Solanaceous relatives.

Comparative Phylogenomic Analysis Reveals Complex History of Divergence among Jaltomata and Its Closest Relatives

The Solanaceae is a highly speciose plant family, with an estimated 100 genera and 2,500 species that have all evolved within the last ∼30 Myr. Previous phylogenetic studies have confirmed that many lineages within the Solanaceae arose within a highly compressed time frame ( Olmstead et al. 1999 Särkinen et al. 2013), making resolution of some evolutionary relationships challenging. In particular, previous molecular phylogenetic studies using chloroplast and nuclear loci indicated that Jaltomata is close to both Solanum and Capsicum, however the relationship among these three genera has varied depending upon the specific loci used in phylogenetic reconstruction ( Bohs and Olmstead 1997 Olmstead et al. 1999 Walsh and Hoot 2001 Olmstead et al. 2008 Särkinen et al. 2013). The J.sinuosa genome therefore provided an opportunity to evaluate and clarify the historical evolutionary relationships among key genera within Solanaceae.

Just as with other recent phylogenomic studies of contemporary ( Brawand et al. 2014 Lamichhaney et al. 2015 Novikova et al. 2016 Pease et al. 2016) or more ancient rapid radiations ( Jarvis et al. 2014 Wickett et al. 2014 Suh et al. 2015 Yang et al. 2015), we detected evidence for substantial gene tree discordance in relationships among the seven Solanaceous species analyzed here. This included high discordance (internode uncertainty) at the internode that split the Petunia lineages from the remaining species ( fig. 3), as well as the internodes separating Jaltomata, Solanum, and Capsicum. In particular, our concatenation-maximum-likelihood phylogeny supported a closer relationship between Jaltomata and Capsicum, but we also detected a similar number of individual gene trees supporting the alternative topology of Jaltomata and Solanum as more closely related ( fig. 3). Genome-wide, the observed pattern of minority gene tree discordance was not consistent with the action of incomplete lineage sorting (ILS) alone (under ILS the two alternative minority trees are expected to be approximately equally represented) ( Degnan and Rosenberg 2009), indicating that a substantial component of discordance was likely also due to introgression between lineages ( Huson et al. 2005).

Following the logical framework in Fontaine et al. 2015, in which the younger (shallower) tree topology is inferred to be due to introgression, we used the relative depth (age) of the two alternative tree topologies to infer that Jaltomata is likely sister to Solanum (a relationship supported by the tree with the older/deeper mean node depths). In contrast, an excess of gene trees supporting a sister relationship between Jaltomata and Capsicum (the tree with on average shorter/younger node depths) is likely to be due to introgression since the split of the three species this is despite the observation that the latter tree is marginally more frequent across all gene trees compared with the next most common bipartition ( fig. 3). Our use of whole-genome sequence data from Jaltomata therefore enabled us to disentangle the likely complex history of origin at the base of the clade that unites these three lineages.

Our analyses also highlighted the extensive phylogenetic incongruence among these and other Solanaceae genera more generally, a history that should be accounted for in comparative studies. In particular, assessing the level and distribution of incongruence is critical when making inferences about trait evolution in radiating lineages, as substantial gene tree discordance can contribute to incorrect inferences of convergence (“hemiplasy”) at both the phenotypic and molecular level ( Hahn and Nakhleh 2016 Wu et al. 2018).

TEs Contribute to Genome-Size Evolution across the Solanaceae

We also used the J. sinuosa genome assembly to examine possible causes of genome-size evolution, specifically variation in TE history. The estimated genome size of J. sinuosa (∼1.5 Gb) is >50% larger than the tomato genome (∼0.9 Gb), but less than half of the hot pepper genome (∼3.5 Gb), despite no accompanying change in ploidy level, suggesting that TEs might be an important contributor to genome-size variation in the Solanaceae. Indeed, we determined that nearly 80% of the assembly consists of repetitive elements, with the vast majority belonging to the Gypsy superfamily of LTR-RTs ( fig. 2). We also inferred a recent proliferation of Gypsy elements that occurred around 1–2 Ma, which might have contributed to the larger genome size in Jaltomata lineages relative to Solanum. Previous studies have revealed a similar pattern of relatively recent Gypsy proliferation in other Solanaceae species, including a substantial excess of Gypsy elements (12-fold more frequent than Copia elements) within the hot pepper genome compared with domestic tomato ( Kim et al. 2014). Similarly, genome-size variation scales with variation in the number of Gypsy repeats among four Nicotiana species ( Xu et al. 2017). In contrast, within Solanum, the larger genome size of wild species S. pennellii compared with domesticated tomato is associated with a recent proliferation of Copia-like elements ( Bolger et al. 2014). Overall, this and other recent analyses indicate that differential activity of LTR-RTs contributed substantially to differences in genome size across the Solanaceae and suggest that future studies examining these and broader TE activity and DNA loss ( Kapusta et al. 2017) could reveal important dynamics of genome-size and content variation in this group.

Both Gene Family Evolution and Specific Molecular Changes Contribute to Unique Reproductive Trait Variation within Jaltomata

Unlike close relatives Solanum and Capsicum, species in Jaltomata exhibit extensive floral diversity in corolla shape and nectar volume and color ( Miller et al. 2011) in addition to an apparently ancestral transition to self-compatibility ( Mione 1992) (Kostyun and Mione, unpublished data). Another goal in developing a representative genome sequence for Jaltomata was therefore to identify genetic mechanisms that might have contributed to the evolutionary origin of these derived floral and reproductive states. In our previous phylogenomic study using transcriptome data, we investigated patterns of molecular evolution in several thousand loci across 14 species within Jaltomata and found that relatively few of the genes that showed a signal of positive selection (i.e., with dN/dS > 1) had functional associations with floral development. However, extensive gene tree discordance and very low sequence divergence (<1% among the species in the radiating group that displays the derived floral traits) only allowed a small subset of genes to be tested at some internal branches ( Wu et al. 2018). Moreover, this transcriptome-based study only focused on molecular variation in coding sequences, although regulatory changes and structural variation, such as gene duplications, are also known contributors to the evolution of novel traits ( Hoekstra and Coyne 2007). The generation of a reference genome in this group therefore allowed us to address the possible role of additional genetic variation that could not be investigated previously.

Interestingly, a major inference from our whole-genome analysis is that each of the seven Solanaceae lineages examined has experienced substantial expansion and contraction of gene families ( fig. 4B), and several of these might have contributed to the distinctive reproductive trait evolution in Jaltomata. First, we detected significant contractions in gene families functionally associated with pollen–pistil interactions, including the putative S-locus receptor kinase family proteins, which could be a genomic signature of an ancestral transition to self-compatibility in this genus. Within the Solanaceae, self-incompatibility is mediated by the “S-locus” which encodes a single female/stylar S-determinant (i.e., S-RNase) and one or more pollen-expressed F-box proteins ( Iwano and Takayama 2012) that interact to cause pollen rejection when the male and female parent share the same allele at the S-locus ( McClure et al. 1989). Multiple studies have shown that loss of the pistil-side S-RNase mediates the transition from SI to SC in Solanaceous species, often followed by the subsequent loss of one or more pollen-side F-box proteins and other components of the SI-machinery ( Charlesworth and Charlesworth 1979 Stone 2002 Takayama and Isogai 2005 Covey et al. 2010). Similarly in Brassicaceae, breakdown of SI appears to occur via loss of pistil-side factors, although the specific genetic mechanisms controlling SI are distinct between these two families ( Kusaba et al. 2001 Sherman-Broyles et al. 2007 Tang et al. 2007 Shimizu et al. 2008). Several analyses in both these families have documented genome-wide effects of loss of SI ( Hu et al. 2011 Slotte et al. 2013), however in most cases the associated transition to selfing has been recent. In contrast to these studies, all examined Jaltomata species are self-compatible ( Mione 1992) (Kostyun and Mione, unpublished data), indicating that the loss of self-incompatibility occurred early within Jaltomata, prior to the origin of the major subgroups within this clade (i.e., >3 Ma). Consistent with an early loss of SI in this genus, we found that heterozygosity in each of 14 Jaltomata species (inferred by remapping RNA-seq reads from Wu et al. [2018] to the genome assembly) is comparable to that found in self-compatible species of Solanum, and much lower than in self-incompatible species of Solanum ( Pease et al. 2016) (see supplementary table S8 , Supplementary Material online). This early loss of SI might explain why we identified significant contractions of gene families related to pollen–pistil interactions specifically in Jaltomata. A similar pattern was observed in the genome of Caenorhabditis nigoni—a self-fertile nematode species that split from its outcrossing sibling species C. briggsae ∼3.5 Ma ( Thomas et al. 2015)—in which several hundred genes mediating protein–protein interactions, including the genes important for sperm–egg interactions, appear to have been lost ( Yin et al. 2018).

Unlike the contraction of gene families potentially associated with relaxed selection, we also identified expansion of gene families that might be associated with the evolution of novel floral traits. In particular, we detected recent tandem duplications of one interesting candidate gene SEUSS ( fig. 5). SEUSS is a transcription factor, which in A. thaliana interacts with LEUNIG within a transcriptional corepressor complex to negatively regulate the expression of homeotic MADS-box gene AGAMOUS ( Franks et al. 2002, , 2006 Sridhar et al. 2006). Mutations in SEUSS cause ectopic and precocious expression of AGAMOUS mRNA, leading to partial homeotic transformation of floral organs in the outer two whorls (i.e., sepal and petal) ( Franks et al. 2002). LEUNIG and SEUSS also have a more general role in lateral organ patterning and boundary formation in Arabidopsis, possibly through interactions with transcriptional factors in the YABBY and KNOX gene families ( Stahle et al. 2009 Lee et al. 2014).

These functional roles are especially interesting with respect to the origin of floral diversity in Jaltomata, as the two derived corolla shapes (i.e., campanulate and tubular) are formed due to differential accelerated growth rates and variation in petal fusion (i.e., boundary formation) ( Kostyun et al. 2017). Thus, the function of SEUSS is closely related to the mechanism generating the corolla shape diversity in Jaltomata. In general, duplication and divergence of floral identity genes appear to play an important role in the evolution of floral morphology in plants, such as the expansion of MIKC c -type MADS-box genes ( Hileman et al. 2006 Panchy et al. 2016). Although we do not yet have expression data for individual tissues (such as individual floral organs), we found that SEUSS copies were expressed only in the reproductive tissues of some lineages, and that species differ in the expression of these SEUSS copies. Interestingly, the SEUSS genes showed more limited expression in the orange-fruited lineages ( supplementary table S15 , Supplementary Material online) with the derived floral traits (i.e., higher extent of corolla fusion), suggesting that downregulation of some or all copies of this negative regulator of floral development might have contributed to the evolution of novel floral (corolla) morphs in this group. Future work with tissue-specific expression and function across different Jaltomata species with distinct corolla shapes will allow us to further assess whether SEUSS variants are strongly implicated in floral trait divergence across Jaltomata.

Apart from gene family contractions and expansions, we also detected loci with patterns of adaptive molecular evolution that could be functionally implicated in reproductive trait changes. Although many of the 58 genes that are inferred to be under positive selection specifically within the Jaltomata lineage ( supplementary table S16 , Supplementary Material online) are involved in general plant processes, some are from functional classes associated with the unique reproductive trait variation observed within this clade. For example, several candidates are associated with pollen–pistil interactions, including an S-ribonuclease binding protein that mediates the degradation of self-pollen ( Sims and Ordanic 2001 Kao and Tsukamoto 2004). Novel changes in these proteins might have followed the early loss of self-incompatibility in this genus, perhaps as a result of reduced constraint on previous functions and the adoption of new roles under the altered reproductive environment of self-compatibility. Other potentially interesting candidates include an auxin response factor, a WRKY transcription factor, and a NAC transcription factor, all of which function in multiple plant developmental processes ( Johnson and Lenhard 2011). Although the plant NAC gene family is quite large and its members play key roles in numerous developmental and stress response pathways, this putative candidate is especially interesting as NAM (a NAC protein) has been shown to function in organ boundary specification, including in flowers of closely related Petunia ( Souer et al. 1996 Zhong et al. 2016), and development of the novel floral forms in Jaltomata appears to be specifically associated with altered floral organ boundaries ( Kostyun et al. 2017).

The Jaltomata Genome as a Tool for Future Comparative Analysis

Overall, the generation of a high-quality Jaltomata genome enabled us to evaluate genetic changes specific to this diverse lineage and across the Solanaceae more generally, as well as to clarify the historical evolutionary relationships among these important clades. In doing so, we inferred that Jaltomata has a complex history of divergence from its most closely related genera, which involves substantial introgression following initial lineage divergence. Based on quantification of recent TE activity, we infer that TE dynamics are likely an important contributor to genome content and genome-size differences among groups in the Solanaceae. Using comparisons of gene family expansion and contraction, and more conventional analyses of adaptive protein evolution, we identified several genetic changes that might have either facilitated or accompanied the unique reproductive trait evolution observed in this clade.

In the future, the interesting evolutionary features and central phylogenetic placement of Jaltomata within the Solanaceae offer a unique opportunity to examine patterns of coding, structural, and TE evolution across this diverse and speciose plant family, as well as to further evaluate mechanisms that could contribute to the rapid reproductive diversification observed specifically within Jaltomata. These future comparative genomic analysis could reveal additional large- and small-scale genetic changes responsible for genomic differentiation and divergent form and function among a set of economically important clades that are well-established models for developmental biology (Petunia), reproductive interactions (Nicotiana), secondary metabolite production (e.g., Capsicum), and ecophysiological responses (e.g., Solanum), as well as within this emerging model for analyzing the evolution of novel floral evolution.

Watch the video: A2 Clades (January 2023).