Basic question about phylogenetic trees

The following is an illustration from my book of the phylogenetic tree of 6 animal phyla (the red stuff is mine):

I think what's causing some confusion is that there are no names/years at the circled nodes.

My first question is: was there an actual population between, for example, nodes 2 and 3?

I would like to check if my understanding of the tree is correct:

Before node 1, there was a population which split into two populations: Porifera and the population between nodes 1 and 2 which we will call population 1-2. This population is unnamed on the diagram, either because we don't have a name for it, or because members of that population no longer exist, or to keep the diagram simplified.

Next, after some millions of years, population 1-2 split into the Cnidarians and population 2-3. Etc.

Is my understanding correct?

Thank you very much.

Yes, your understanding is correct.

That being said, it is standard to refer to a population or species at a specific point in time. It is also standard to name these ancestral populations by what they are the ancestors of. For example, we generally don't refer to the population 2-3 (as you named it); we rather refer to the node/population/species 3 as the Most Recent Common Ancestor (MRCA) of Arthropoda, Mollusca, Chordata and Echinodermata. The tree presented here is not complete, but all Arthropoda, Mollusca, Chordata and Echinodermata are Bilateria, while Cnidaria and Porifera aren't. Hence, you could say that the ancestor at node 3 is the MRCA of all Bilateria.
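To make the MRCA idea concrete, here is a minimal Python sketch that stores the tree as child-to-parent links and walks upward to find the first shared ancestor. The labels node1 to node5 are illustrative, and the branching inside Bilateria (node4 and node5) is an assumption, since the diagram itself is not reproduced here.

```python
# Child -> parent links for the six-phylum tree.
# node1..node3 follow the numbering used in the question; node4 and node5
# (the splits inside Bilateria) are assumed for illustration only.
PARENT = {
    "Porifera": "node1",
    "node2": "node1",
    "Cnidaria": "node2",
    "node3": "node2",
    "node4": "node3",
    "node5": "node3",
    "Arthropoda": "node4",
    "Mollusca": "node4",
    "Chordata": "node5",
    "Echinodermata": "node5",
}


def ancestors(taxon):
    """Return the ancestral nodes of a taxon, most recent first."""
    path = []
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def mrca(*taxa):
    """Return the most recent node that is an ancestor of every taxon given."""
    shared = set(ancestors(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(ancestors(taxon))
    # ancestors() lists nodes from most recent to oldest, so the first hit wins.
    for node in ancestors(taxa[0]):
        if node in shared:
            return node
    return None


print(mrca("Arthropoda", "Mollusca", "Chordata", "Echinodermata"))  # node3
print(mrca("Cnidaria", "Chordata"))                                 # node2
```

Under these assumptions, node 3 comes out as the MRCA of the four bilaterian phyla, and adding Cnidaria pushes the answer back to node 2.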

More Questions About Trees

A few weeks ago, after taking a walk in the woods, I wrote about the puzzling diversity of forest trees. On my walk I found a dozen species sharing the same habitat and apparently competing for the same resources—primarily access to sunlight. An ecological principle says that one species should win this contest and drive out all the others, but the trees haven’t read the ecology textbooks.

In that essay I also mentioned three other questions about trees that have long been bothering me. In this sequel I want to poke at those other questions a bit more deeply.

Top row: white oak (Quercus alba), red oak (Quercus rubra), pin oak (Quercus palustris), sugar maple (Acer saccharum). Bottom row: shagbark hickory (Carya ovata), sweet birch (Betula lenta). All specimens were collected along the Robert Frost Trail in Amherst, Massachusetts, from trees growing no more than 100 meters apart.

Botanists have an elaborate vocabulary for describing leaf shapes: cordate (like a Valentine heart), cuneate (wedgelike), ensiform (sword shaped), hastate (like an arrowhead, with barbs), lanceolate (like a spearhead), oblanceolate (a backwards spearhead), palmate (leaflets radiating like fingers), pandurate (violin shaped), reniform (kidney shaped), runcinate (saw-toothed), spatulate (spoonlike). That’s not, by any means, a complete list.

Steven Vogel, in his 2012 book The Life of a Leaf, enumerates many factors and forces that might have an influence on leaf shape. For example, leaves can’t be too heavy, or they would break the limbs that hold them aloft. On the other hand, they can’t be too delicate and wispy, or they’ll be torn to shreds by the wind. Leaves also must not generate too much aerodynamic drag, or the whole tree might topple in a storm.

Job One for a leaf is photosynthesis: gathering sunlight, bringing together molecules of carbon dioxide and water, synthesizing carbohydrates. Doing that efficiently puts further constraints on the design. As much as possible, the leaf should turn its face to the sun, maximizing the flux of photons absorbed. But temperature control is also important; the biosynthetic apparatus shuts down if the leaf is too hot or too cold.

Vogel points out that subtle features of leaf shape can have a measurable impact on thermal and aerodynamic performance. For example, convective cooling is most effective near the margins of a leaf; temperature rises with distance from the nearest edge. In environments where overheating is a risk, shapes that minimize this distance, such as the lobate forms of oak leaves, would seem to have an advantage over simpler, disklike shapes. But the choice between frilly and compact forms depends on other factors as well. Broad leaves with convex shapes intercept the most sunlight, but that may not always be a good thing. Leaves with a lacy design let dappled sunlight pass through, allowing multiple layers of leaves to share the work of photosynthesis.

Natural selection is a superb tool for negotiating a compromise among such interacting criteria. If there is some single combination of traits that works best for leaves growing in a particular habitat, I would expect evolution to find it. But I see no evidence of convergence on an optimal solution. On the contrary, even closely related species put out quite distinctive leaves.

Take a look at the three oak leaves in the upper-left quadrant of the image above. They are clearly variations on a theme. What the leaves have in common is a sequence of peninsular protrusions springing alternately to the left and the right of the center line. The variations on the theme have to do with the number of peninsulas (three to five per side in these specimens), their shape (rounded or pointy), and the depth of the coves between peninsulas. Those variations could be attributed to genetic differences at just a few loci. But why have the leaves acquired these different characteristics? What evolutionary force makes rounded lobes better for white oak trees and pointy ones better for red oak and pin oak?

Much has been learned about the developmental mechanisms that generate leaf shapes. Biochemically, the main actors are the plant hormones known as auxins; their spatial distribution and their transport through plant tissues regulate local growth rates and hence the pattern of development. (A 2014 review article by Jeremy Dkhar and Ashwani Pareek covers these aspects of leaf form in great detail.) On the mathematical and theoretical side, Adam Runions, Miltos Tsiantis, and Przemyslaw Prusinkiewicz have devised an algorithm that can generate a wide spectrum of leaf shapes with impressive verisimilitude. (Their 2017 paper is available online, along with source code and videos.) With different parameter values the same program yields shapes that are recognizable as oaks, maples, sycamores, and so on. Again, however, all this work addresses questions of how, not why.

Another property of tree leaves, their size, does seem to respond in a simple way to evolutionary pressures. Across all land plants (not just trees), leaf area varies by a factor of a million, from about 1 square millimeter per leaf to 1 square meter. A 2017 paper by Ian J. Wright and colleagues reports that this variation is strongly correlated with climate. Warm, moist regions favor large leaves; think of the banana. Cold, dry environments, such as alpine ridges, host mainly tiny plants with even tinier leaves. So natural selection is alive and well in the realm of tree leaves; it just appears to have no clear preferences when it comes to shape.

Or am I missing something important? Elsewhere in nature we find flamboyant variations that seem gratuitous if you view them strictly in the grim context of survival-of-the-fittest. I’m thinking of the fancy-dress feathers of birds, for example. Cardinals and bluejays both frequent my back yard, but I don’t spend much time wondering whether red or blue is the optimal color for survival in that habitat. Nor do I expect the two species to converge on some shade of purple. Their gaudy plumes are not adaptations to the physical environment but elements of a communication system; they send signals to rivals or potential mates. Could something similar be going on with leaf shape? Do the various oak species maintain distinctive leaves to identify themselves to animals that help with pollination or seed dispersal? I rate this idea unlikely, but I don’t have a better one.

Surely this question is too easy! We know why trees grow tall. They reach for the sky. It’s their only hope of escaping the gloomy depths of the forest’s lower stories and getting a share of the sunshine. In other words, if you are a forest tree, you need to grow tall because your neighbors are tall; they overshadow you. And the neighbors grow tall because you’re tall. It’s a classic arms race. Vogel has an acute commentary on this point:

In every lineage that has independently come up with treelike plants, a variety of species achieve great height. That appears to me to be the height of stupidity…. We’re looking at, almost surely, an object lesson in the limitations of evolutionary design….

A trunk limitation treaty would permit all individuals to produce more seeds and to start producing seeds at earlier ages. But evolution, stupid process that it is, hasn’t figured that out—foresight isn’t exactly its strong suit.

Vogel’s trash-talking of Darwinian evolution is meant playfully, of course. But I think the question of height-limitation treaties (or should we call them treeties? The joke comes courtesy of Rosalind Reid) deserves more serious attention.

Forest trees in the eastern U.S. often grow to a height of 25 or 30 meters, approaching 100 feet. It takes a huge investment of material and energy to erect a structure that tall. To ensure sufficient strength and stiffness, the girth of the trunk must increase as the \(\frac{3}{2}\) power of the height, and so the cross-sectional area (\(\pi r^2\)) grows as the cube of the height. It follows that doubling the height of a tree trunk multiplies its mass by a factor of 16.
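The factor of 16 follows from chaining the scaling rules: girth goes as the 3/2 power of height, area as the cube, and mass (area times height) as the fourth power. A short Python sketch of that arithmetic, assuming a cylindrical trunk of constant wood density (a simplification not stated in the text):

```python
def trunk_mass_ratio(height_factor):
    """Relative trunk mass after scaling the height by height_factor.

    Assumes a cylindrical trunk of uniform density whose girth scales as
    the 3/2 power of its height.
    """
    radius_factor = height_factor ** 1.5    # girth (and radius) ~ h^(3/2)
    area_factor = radius_factor ** 2        # cross-section pi*r^2 ~ h^3
    return area_factor * height_factor      # volume = area * height ~ h^4


print(round(trunk_mass_ratio(2), 6))   # doubling the height
print(round(trunk_mass_ratio(3), 6))   # tripling the height
```

Because mass goes as the fourth power of height, tripling the height would cost a factor of 81.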

Great height imposes another, ongoing, metabolic cost. Every day, a living tree must lift 500 liters of water—weighing 500 kilograms—from the root zone at ground level up to the leaves in the crown. It’s like carrying enough water to fill four or five bathtubs from the basement of a building to the 10th floor.
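For a rough sense of scale, the minimum mechanical work of lifting that water is mass times gravitational acceleration times height. The sketch below assumes a 30-meter lift (the upper end of the heights quoted earlier) and ignores viscous resistance in the wood, so it is a lower bound rather than a measurement.

```python
G = 9.81  # gravitational acceleration in m/s^2


def lifting_energy_joules(mass_kg, height_m):
    """Minimum work needed to raise a mass through a height (E = m * g * h)."""
    return mass_kg * G * height_m


# 500 kg of water raised 30 m; the 30 m figure is an assumed crown height.
energy = lifting_energy_joules(500, 30)
print(f"{energy / 1000:.0f} kJ per day")
```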

Height also exacerbates certain hazards to the life and health of the tree. A taller trunk forms a longer lever arm for any force that might tend to overturn the tree. Compounding the risk, average wind speed increases with distance above the ground.

Standing on the forest floor, I tilt my head back and stare dizzily upward toward the leafy crowns, perched atop great pillars of wood. I can’t help seeing these plants on stilts as a colossal waste of resources. It’s even sillier than the needlelike towers of apartments for billionaires that now punctuate the Manhattan skyline. In those buildings, all the floors are put to some use. In the forest, the tree trunks are denuded of leaves and sometimes of branches over 90 percent of their length; only the penthouses are occupied.

If the trees could somehow get together and negotiate a deal (a zoning ordinance or a building code), they would all benefit. Perhaps they could decree a maximum height of 10 meters. Nothing would change about the crowns of the trees; the rule would simply chop off the bottom 20 meters of the trunk.

If every tree would gain from the accord, why don’t we see such amputated forests evolving in nature? The usual response to this why-can’t-everybody-get-along question is that evolution just doesn’t work that way. Natural selection is commonly taken to be utterly selfish and individualist, even when it hurts. A tree reaching the 10-meter limit would say to itself: “Yes, this is good; I’m getting plenty of light without having to stand on tiptoe. But it could be even better. If I stretched my trunk another meter or two, I’d collect an even bigger share of solar energy.” Of course the other trees reason with themselves in exactly the same way, and so the futile arms race resumes. As Vogel said, foresight is not evolution’s strong suit.

I am willing to accept this dour view of evolution, but I am not at all sure it actually explains what we see in the forest. If evolution has no place for cooperative action in a situation like this one, how does it happen that all the trees do in fact stop growing at about the same height? Specifically, if an agreement to limit height to 10 meters would be spoiled by rampant cheating, why doesn’t the same thing happen at 30 meters?

One might conjecture that 30 meters is a physiological limit, that the trees would grow taller if they could, but some physical constraint prevents it. Perhaps they just can’t lift the water any higher. I would consider this a very promising hypothesis if it weren’t for the sequoias and the coast redwoods in the western U.S. Those trees have not heard about any such physical barriers. They routinely grow to 70 or 80 meters, and a few specimens have exceeded 100 meters. Thus the question for the East Coast trees is not just “Why are you so tall?” but also “Why aren’t you taller?”

I can think of at least one good reason for forest trees to grow to a uniform height. If a tree is shorter than average, it will suffer for being left in the shade. But standing head and shoulders above the crowd also has disadvantages: Such a standout tree is exposed to stronger winds, a heavier load of ice and snow, and perhaps higher odds of lightning strikes. Thus straying too far either below or above the mean height may be punished by lower reproductive success. But the big question remains: How do all the trees reach consensus on what height is best?

Another possibility: Perhaps the height of forest trees is not a result of an arms race after all but instead is a response to predation. The trees are holding their leaves on high to keep them away from herbivores. I can’t say this is wrong, but it strikes me as unlikely. No giraffes roam the woods of North America (and if they did, 10 meters would be more than enough to put the leaves out of reach). Most of the animals that nibble on tree leaves are arthropods, which can either fly (adult insects) or crawl up the trunk (caterpillars and other larvae). Thus height cannot fully protect the leaves; at best it might provide a deterrent. Tree leaves are not a nutritious diet; perhaps some small herbivores consider them worth a climb of 10 meters, but not 30.

To a biologist, a tree is a woody plant of substantial height. To a mathematician, a tree is a graph without loops. It turns out that math-trees and bio-trees have some important properties in common.

The diagram below shows two mathematical graphs. They are collections of dots (known more formally as vertices), linked by line segments (called edges). A graph is said to be connected if you can travel from any vertex to any other vertex by following some sequence of edges. Both of the graphs shown here are connected. Trees form a subspecies of connected graphs. They are minimally connected: Between any two vertices there is exactly one path. (Fiddly technical detail: A path is a sequence of edges in which any given edge can appear at most once. This rules out pointless back-and-forth repetition. A sequence such as x, y, x, z is not a path.) The graph on the left is a tree. The red line shows the unique path from a to b. The graph at right is not a tree. There are two routes from a to b (red and yellow lines).
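The definition can be tested mechanically. A connected graph on n vertices is a tree exactly when it has n - 1 edges, so a small Python sketch only needs an edge count and a breadth-first search. The two example graphs are hypothetical stand-ins, not the ones in the diagram.

```python
from collections import deque


def is_tree(vertices, edges):
    """Return True if the undirected graph is connected and has no cycles."""
    vertices = list(vertices)
    if not vertices:
        return False
    # A connected graph on n vertices is a tree exactly when it has n - 1 edges.
    if len(edges) != len(vertices) - 1:
        return False
    neighbors = {v: [] for v in vertices}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    # Breadth-first search from an arbitrary vertex to test connectivity.
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(vertices)


# Hypothetical four-vertex examples: a single chain from a to b, and the
# same chain with an extra edge that creates a second route from a to b.
tree_edges = [("a", "c"), ("c", "d"), ("d", "b")]
loop_edges = tree_edges + [("a", "b")]

print(is_tree("abcd", tree_edges))   # True
print(is_tree("abcd", loop_edges))   # False
```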

Here’s another way to describe a math-tree. It’s a graph that obeys the antimatrimonial rule: What branching puts asunder, let no one join together again. Bio-trees generally work the same way: Two limbs that branch away from the trunk will not later return to the trunk or fuse with each other. In other words, there are no cycles, or closed loops. The pattern of radiating branches that never reconverge is evident in the highly regular structure of the bio-tree pictured below. (The tree is a Norfolk Island pine, native to the South Pacific, but this specimen was photographed on Sardinia.)

Trees have achieved great success without loops in their branches. Why would a plant ever want to have its structural elements growing in circles?

I can think of two reasons. The first is mechanical strength and stability. Engineers know the value of triangles (the smallest closed loops) in building rigid structures. Also arches, where two vertical elements that could not stand alone lean on each other. Trees can’t take advantage of these tricks; their limbs are cantilevers, supported only at the point of juncture with the trunk or the parent limb. Loopy structures would allow for various kinds of bracing and buttressing.

The second reason is reliability. Providing multiple channels from the roots to the leaves would improve the robustness of the tree’s circulatory system. An injury near the base of a limb would no longer doom all the structures beyond the point of damage.

Networks with multiple paths between nodes are exploited elsewhere in nature, and even in other aspects of the anatomy of trees. The reticulated channels in the image below are veins distributing fluids and nutrients within a leaf from a red oak tree. The very largest veins (or ribs) have a treelike arrangement, but the smaller channels form a nested hierarchy of loops within loops. (The pattern reminds me of a map of an ancient city.) Because of the many redundant pathways, an insect taking a chomp out of the middle of this network will not block communication with the rest of the leaf.

The absence of loops in the larger-scale structure of trunk and branches may be a natural consequence of the developmental program that guides the growth of a tree. Aristid Lindenmayer, a Hungarian-Dutch biologist, invented a family of formal languages (now called L-systems) for describing such growth. The languages are rewriting systems: You start with a single symbol (the axiom) and replace it with a string of symbols specified by the rules of a grammar. Then the string resulting from this substitution becomes a new input to the same rewriting process, with each of its symbols being replaced by another string formed according to the grammar rules. In the end, the symbols are interpreted as commands for constructing a geometric figure.

Here’s an L-system grammar for drawing cartoonish two-dimensional trees:

f → f [l f] [r f]
l → l
r → r

The symbols f, l, and r are the basic elements of the language; when interpreted as drawing commands, they stand for forward, left, and right. The first rule of the grammar replaces any occurrence of f with the string f [l f] [r f]; the second and third rules change nothing, replacing l and r with themselves. Square brackets enclose a subprogram. On reaching a left bracket, the system makes note of its current position and orientation in the drawing. Then it executes the instructions inside the brackets, and finally on reaching the right bracket backtracks to the saved position and orientation.

Starting with the axiom f , the grammar yields a succession of ever-more-elaborate command sequences:

When this rewriting process is continued for a few further stages and then converted to graphic output, we see a sapling growing into a young tree, with a shape reminiscent of an elm. I have neglected to specify a few details. At each stage, the length of a forward step is reduced by a factor of 0.6. And all turns, both left and right, are through an angle of 20 degrees.
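The rewriting and drawing steps can be sketched in a few lines of Python. This is an illustration, not the program behind the figures. Spaces are dropped from the command strings, and tying step length to bracket depth (0.6 raised to the nesting level) is one assumed reading of "reduced by a factor of 0.6 at each stage".

```python
import math

# The single productive rule; every other symbol rewrites to itself.
RULES = {"f": "f[lf][rf]"}


def rewrite(commands):
    """Apply the grammar once to every symbol of the command string."""
    return "".join(RULES.get(symbol, symbol) for symbol in commands)


def expand(axiom, stages):
    """Rewrite the axiom the given number of times."""
    commands = axiom
    for _ in range(stages):
        commands = rewrite(commands)
    return commands


def segments(commands, angle_deg=20.0, shrink=0.6):
    """Interpret the commands as turtle moves and return the line segments."""
    x, y, heading, depth = 0.0, 0.0, 90.0, 0   # start at the origin, pointing up
    stack, lines = [], []
    for symbol in commands:
        if symbol == "f":
            step = shrink ** depth              # assumption: length set by nesting depth
            nx = x + step * math.cos(math.radians(heading))
            ny = y + step * math.sin(math.radians(heading))
            lines.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif symbol == "l":
            heading += angle_deg                # turn left
        elif symbol == "r":
            heading -= angle_deg                # turn right
        elif symbol == "[":
            stack.append((x, y, heading, depth))  # remember position and heading
            depth += 1
        elif symbol == "]":
            x, y, heading, depth = stack.pop()    # backtrack to the saved state
    return lines


for stage in range(3):
    print(stage, expand("f", stage))
print(len(segments(expand("f", 5))), "segments at stage 5")
```

Each f becomes three at every stage, so stage n has 3^n forward steps. The returned segments can be passed to any plotting library to draw the tree.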

L-systems like this one can produce a rich variety of branching structures. More elaborate versions of the same program can create realistic images of biological trees. (The Algorithmic Botany website at the University of Calgary has an abundance of examples.) What the L-systems can’t do is create closed loops. That would require a fundamentally different kind of grammar, such as a transformation rule that takes two symbols or strings as input and produces a conjoined result. (Note that in the stage 5 diagram above, two branches of the tree appear to overlap, but they are not joined. The graph has no vertex at the intersection point.)

If the biochemical mechanisms governing the growth and development of trees operate with the same constraints as L-systems, we have a tidy explanation for the absence of loops in the branching of bio-trees. But perhaps the explanation is a little too tidy. I’ve been saying that trees don’t do loops, and it’s generally true. But what about the tree pictured below—a crepe myrtle I photographed some years ago on a street in Raleigh, North Carolina? (It reminds me of a sinewy Henry Moore sculpture.)

This plant is a tree in the botanical sense, but it’s certainly not a mathematical tree. A single trunk comes out of the ground and immediately divides. At waist height there are four branches, then three of them recombine. At chest height, there’s another split and yet another merger. This rogue tree is flouting all the canons and customs of treedom.

And the crepe myrtle is not the only offender. Banyan trees, native to India, have their horizontal branches propped up by numerous outrigger supports that drop to the ground. The banyan shown below, in Hilo, Hawaii, has a hollowed-out cavity where the trunk ought to be, surrounded by dozens or hundreds of supporting shoots, with cross-braces overhead. The L-system described above could never create such a network. But if the banyan can do this, why don’t other trees adopt the same trick?

In biology, the question “Why x?” is shorthand for “What is the evolutionary advantage of x?” or “How does x contribute to the survival and reproductive success of the organism?” Answering such questions often calls for a leap of imagination. We look at the mottled brown moth clinging to tree bark and propose that its coloration is camouflage, concealing the insect from predators. We look at a showy butterfly and conclude that its costume is aposematic: a warning that says, “I am toxic; you’ll be sorry if you eat me.”

These explanations risk turning into just-so stories, like Kipling’s tale of how the elephant got its trunk in a tussle with a crocodile. (The French actually call them “why stories”: les contes des pourquoi.) In place of the Darwinian mechanism of mutation and selection, we tend to think in terms of an individual’s needs and wishes. That’s difficult when the individual is an animal whose mental life, if it has any, is remote from ours. Does the moth fear being eaten? Is the butterfly overjoyed when coming upon a sunny meadow filled with wildflowers? We really don’t know.

And if we have a hard time imagining the experiences of animals, the lives of plants are even further beyond our ken. Does the flower lust for the pollen-laden bee? Does the oak tree grieve when its acorns are eaten by squirrels? How do trees feel about woodpeckers? Confronted with these questions, I can only shrug. I have no idea what plants desire or dread.

Others claim to know much more about vegetable sensibilities. Peter Wohlleben, a German forester, has published a book titled The Hidden Life of Trees: What They Feel, How They Communicate. He reports that trees suckle their young, maintain friendships with their neighbors, and protect sick or wounded members of their community. To the extent these ideas have a scientific basis, they draw heavily on work done in the laboratory of Suzanne Simard at the University of British Columbia. Simard, leader of the Mother Tree project, studies communication networks formed by tree roots and their associated soil fungi.

I find Simard’s work interesting. I find the anthropomorphic rhetoric unhelpful and offensive. The aim, I gather, is to make us care more about trees and forests by suggesting they are a lot like us: they have families and communities, friendships, alliances. In my view that’s exactly wrong. What’s most intriguing about trees is that they are aliens among us, living beings whose long, immobile, mute lives bear no resemblance to our own frenetic toing-and-froing. Trees are deeply mysterious all on their own, without any overlay of humanizing sentiment.

Constructing an Animal Phylogenetic Tree

The current understanding of evolutionary relationships between animal, or Metazoa, phyla begins with the distinction between “true” animals with true differentiated tissues, called Eumetazoa, and animal phyla that do not have true differentiated tissues (such as the sponges), called Parazoa. Both Parazoa and Eumetazoa evolved from a common ancestral organism that resembles the modern-day protists called choanoflagellates. These protist cells strongly resemble the sponge choanocyte cells today (Figure 1).

Figure 1. Cells of the protist choanoflagellate resemble sponge choanocyte cells. Beating of choanocyte flagella draws water through the sponge so that nutrients can be extracted and waste removed.

Eumetazoa are subdivided into radially symmetrical animals and bilaterally symmetrical animals, and are thus classified into clade Radiata or Bilateria, respectively. As mentioned earlier, the cnidarians and ctenophores are animal phyla with true radial symmetry. All other Eumetazoa are members of the Bilateria clade. The bilaterally symmetrical animals are further divided into deuterostomes (including chordates and echinoderms) and two distinct clades of protostomes (including ecdysozoans and lophotrochozoans) (Figure 2). Ecdysozoa includes nematodes and arthropods; they are so named for a commonly found characteristic among the group: exoskeletal molting (termed ecdysis). Lophotrochozoa is named for two structural features, each common to certain phyla within the clade. Some lophotrochozoan phyla are characterized by a larval stage called trochophore larvae, and other phyla are characterized by the presence of a feeding structure called a lophophore.

Figure 2. Animals that molt their exoskeletons, such as these (a) Madagascar hissing cockroaches, are in the clade Ecdysozoa. (b) Phoronids are in the clade Lophotrochozoa. The tentacles are part of a feeding structure called a lophophore. (credit a: modification of work by Whitney Cranshaw, Colorado State University, credit b: modification of work by NOAA)

Review Questions

What do scientists in the field of systematics accomplish?

  1. discover new fossil sites
  2. organize and classify organisms
  3. name new species
  4. communicate between field biologists

Which statement about the taxonomic classification system is correct?

  1. There are more domains than kingdoms.
  2. Kingdoms are the top category of classification.
  3. A phylum may be represented in more than one kingdom.
  4. Species are the most specific category of classification.

Which best describes the relationship between chimpanzees and humans?

  1. chimpanzees evolved from humans
  2. humans evolved from chimpanzees
  3. chimpanzees and humans evolved from a common ancestor
  4. chimpanzees and humans belong to the same species

Which best describes a branch point in a phylogenetic tree?

Which statement about analogies is correct?

  1. They occur only as errors.
  2. They are synonymous with homologous traits.
  3. They are derived by response to similar environmental pressures.
  4. They are a form of mutation.

What kind of trait is important to cladistics?

  1. shared derived traits
  2. shared ancestral traits
  3. analogous traits
  4. parsimonious traits

What is true about organisms that are a part of the same clade?

  1. They all share the same basic characteristics.
  2. They evolved from a shared ancestor.
  3. They all are on the same tree.
  4. They have identical phylogenies.

Which assumption of cladistics is stated incorrectly?

  1. Living things are related by descent from a common ancestor.
  2. Speciation can produce one, two, or three new species.
  3. Traits change from one state to another.
  4. The polarity of a character state change can be determined.

A monophyletic group is a ________.

Want to cite, share, or modify this book? This book is licensed under a Creative Commons Attribution License 4.0, and you must attribute OpenStax.

  • Authors: Samantha Fowler, Rebecca Roush, James Wise
  • Publisher/website: OpenStax
  • Book title: Concepts of Biology
  • Publication date: Apr 25, 2013
  • Location: Houston, Texas

© Jan 12, 2021 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.


    A total of 332 students participated in the study across all six semesters (Table 1). Study participants reflected the typical composition of students in BIOL 1134, with most being sophomores (40.5 %) majoring in the life sciences (95.9 %). Overall, 53.6 % had previously completed at least one semester of introductory biology at the university level. Because participation in the study was voluntary, self-selection bias could influence the results. However, we did not notice any differences in student characteristics, assessment instrument scores, or responses to assessment instrument questions between participants and other course enrollees that would suggest participants were dissimilar from course enrollees overall. Furthermore, study participants’ UET and TTCI scores showed a range of tree-thinking abilities, and scores on MATE pretests and posttests ranged from very high acceptance to very low acceptance of evolution. Therefore, we are confident that there was no bias towards students with high tree-thinking skills or initial acceptance of evolution. Unfortunately, the procedures to collect the necessary data and methods to statistically detect, identify, and correct for the myriad of potential factors that can contribute to self-selection bias (Cuddeback et al. 2004) are beyond the scope of this study.

    Tree Thinking and Acceptance of Evolution in F09 and S10 Semesters

    The mean UET score (Table 2) was significantly higher in posttests than pretests for both semesters (F09: t = 3.86, df = 14, p < 0.01; S10: t = 3.77, df = 12, p < 0.01) and showed a normalized change increase of c_U = 0.28 ± 0.31 (mean ± SD) in F09 and c_U = 0.23 ± 0.2 in S10. Analysis of pretest responses to the UET assessment administered in F09 and S10 identified similar initial misconceptions (Table 2). In both pretests, students had a poor understanding of how time is represented in phylogenies, and they frequently used node counting and reading across the tips to determine relationships. Consequently, rotations of trees at the nodes were thought to indicate different evolutionary relationships among lineages. Node counting was used to determine relatedness by 73 and 82 % of students in F09 and S10 pretests, respectively. However, the frequency of node counting decreased by over 70 % in posttests for both semesters. Surprisingly, the use of reading across the tips to determine relatedness increased slightly in posttests for both semesters. In pretests, students had difficulty reading how traits of different lineages are represented in trees. They also considered taxa on branches at the left of the phylogeny to be older or “less evolved” than taxa on branches on the right side of the phylogeny. Although many students had difficulty constructing a phylogenetic tree when given a small data set for five hypothetical taxa in pretests, they showed marked improvement in their ability to construct a simple phylogenetic tree in the posttest.
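For readers unfamiliar with normalized change, the Python sketch below uses a common definition from education research: a gain is divided by the room left for improvement, and a loss by the pretest score. Whether the study used exactly this form is an assumption, and the score pairs are hypothetical, not data from the study.

```python
from statistics import mean, stdev


def normalized_change(pre, post, max_score=100.0):
    """Normalized change c for one student's pretest and posttest scores."""
    if post > pre:
        return (post - pre) / (max_score - pre)   # gain relative to possible gain
    if post < pre:
        return (post - pre) / pre                 # loss relative to possible loss
    return 0.0


# Hypothetical (pretest %, posttest %) pairs for four students.
pairs = [(40, 60), (50, 50), (80, 70), (30, 65)]
changes = [normalized_change(pre, post) for pre, post in pairs]
print(f"c = {mean(changes):.2f} ± {stdev(changes):.2f} (mean ± SD)")
```

Because c is computed per student and then averaged, a standard deviation as large as the mean (as in 0.28 ± 0.31) means that some individual students scored lower on the posttest.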

    Median MATEPRE scores in both F09 and S10 (Table 3, top portion) were in the High Acceptance of Evolution category (MATE scores 77–88). MATEPOST scores increased in both semesters but were significantly greater than MATEPRE in the S10 semester only. The interquartile range showed a greater increase in S10 than F09. In the S10 MATEPOST responses (Table 4), shifts towards increased acceptance of evolution were particularly clear in questions probing students’ attitudes on the occurrence of evolution (M1, M3, M15, M19), the evidence supporting evolutionary theory (M6, M7, M8, M11), the scientific validity of evolutionary theory (M2, M4, M10, M12, M16, M20), the scientific applicability of evolutionary theory (M9, M13, M14, M18), and the acceptance of evolutionary theory in the scientific community (M5, M17). Normalized change for MATE in S10, the semester in which tree thinking was integrated into the curriculum, was almost twice as large as the normalized change for MATE in F09 (Table 3).

    Tree Thinking and Acceptance of Evolution in S11 Through F12 Semesters

    Mean TTCIPOST scores were significantly higher than mean TTCIPRE scores for all individual semesters and in the pooled data (Table 5, Additional file 3). Students who reported having previously taken introductory biology or previously seen or learned to read phylogenetic trees had significantly better tree reading than those who had not (Table 6). Within pretests and posttests, there were no differences in TTCI scores among students in different years of school or between majors and non-majors. Normalized change scores, c_T, showed an average increase of 33 % across all questions in pooled data, with a low of 15 % in S12 and a high of 46 % in S11 (Table 5). Reliability analysis of the TTCI had a Cronbach’s α of 0.60 and 0.68 in the pretest and posttest, respectively, indicating that the assessment instrument questions are measuring the same conceptual construct.

    Pretest TTCI assessment showed that in approximately half of the responses, students correctly answered that phylogenies show relationships among lineages indicated by lines in the diagram (Fig. 1, Table 7). Slightly over one-third of responses correctly identified how time is represented in a phylogeny. Only one-fourth correctly responded that nodes indicate where lineages diverge, but approximately one-third held the misconception that nodes indicated where lineages came together or hybridized. Students correctly described relationships shown in phylogenies in half of their TTCIPRE responses. As in F09 and S10, node counting was used to determine relationships in phylogenies to a greater extent than reading across the tips. Students could generally identify that identical trees showed the same branching pattern of relationships despite node rotations. However, there was a low percentage of correct answers in questions asking students to interpret how trait evolution and speciation are shown in trees. Posttest responses showed improvement in all areas (Fig. 1, Table 7). In posttests, a greater percentage of responses correctly described the components of a phylogenetic tree, and students improved in their ability to determine relationships and interpret traits in trees. Although the frequency of node counting to determine relationships decreased in the TTCI posttest, it still tended to be used with greater frequency than reading across the tips. Responses in the TTCI posttest showed increases in correct responses of 15 % or less in Questions T1, T5, T6, T7, and T11. Other questions showed larger increases of 20–40 % in the percentage of correct answers.

    Summary of concepts and skills from S11–F12 TTCI. Percentage of TTCI responses for different tree-thinking concepts and skills. Misconceptions and incorrect methods of determining relationships are indicated by an asterisk. TTCI questions addressing a concept or skill are given in parentheses. See Additional file 2 and Table 7.

    Looking across questions, TTCI pretest discrimination index values had an average of D = 0.45 ± 0.130 (indicating that the percentage of correct answers to pretest questions was 45 % higher in the U 27 % group than the L 27 % group) (Table 7). Questions such as T4 had low D values due to low percentages of correct answers in the U 27 % and L 27 % groups, while questions such as T5 and T6 had high percentages of correct answers in all groups. Most informative were the questions with the higher D-values (i.e., T1, T9, T12, and T13) which showed that the tree thinking skills that seemed to be most difficult for the low scoring students involved determining relationships and interpreting information about speciation and character evolution.
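    As an illustration of the discrimination index described above, the following minimal sketch computes D for one question as the fraction of correct answers in the upper 27 % group minus the fraction in the lower 27 % group. The function name, the rounding of the group size, and the toy data are assumptions made for this example; the study's exact procedure is not given in the text.

```python
def discrimination_index(total_scores, correct, fraction=0.27):
    """Return D = p_upper - p_lower for a single question.

    total_scores: overall test score of each student.
    correct: 1 if that student answered this question correctly, else 0.
    fraction: share of students in each extreme group (0.27 as in the text).
    The group size is rounded to the nearest student (an assumption).
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank students from lowest to highest overall score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(correct[i] for i in upper) / k
    p_lower = sum(correct[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: ten students with overall scores 1..10.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
answers = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
d_value = discrimination_index(scores, answers)
```

    In this toy data set the three lowest-scoring students all miss the question and the three highest-scoring students all answer it correctly, so the question discriminates as strongly as possible. A question that everyone answers correctly gives D = 0, which matches the remark about questions with high percentages of correct answers in all groups.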

    Median MATEPRE scores were in the High Acceptance of Evolution category (MATE scores 77–88) in combined data and for F11, S12, and F12 (Table 3, lower portion) and in the Moderate Acceptance of Evolution category (MATE scores 65–76) for S11. Median MATEPOST scores were in the High Acceptance category overall and for individual semesters. The median MATEPOST (Mdn = 82.50) was significantly higher (Z = 7.97, p < 0.001) than MATEPRE (Mdn = 79.00) in pooled data and within semesters (S11: Z = 2.60, p = 0.009; F11: Z = 3.23, p = 0.001; S12: Z = 4.47, p < 0.001; F12: Z = 4.90, p < 0.001). Reliability analysis of the MATE showed excellent internal consistency, with Cronbach’s α values of 0.95 in pretests and 0.925 in posttests.

    The only significant differences in MATE pretest or posttest scores detected among different student categories were for year in school, in which seniors had significantly higher MATEPRE and MATEPOST scores than students in other years of school (Table 6). Median MATE posttest scores were significantly higher than pretest scores in all categories except for students who were not life science majors and students who had not previously completed introductory biology. Mean normalized change across semesters for MATE scores was significantly greater than zero (t = 7.648, df = 295, p < 0.001) and showed an average increase of 23 % (c_M = 0.23 ± 0.53, N = 296). A large majority of the pretest and posttest responses from students in the U 27 % group were in Likert categories indicating high acceptance of evolution, whereas pretest and posttest responses in the L 27 % group spanned a greater range of Likert categories (Table 8). Despite covering the scientific evidence of evolution during the semester, posttest responses showed that low acceptance of evolution persisted in some students. However, there were shifts in the MATE posttest responses of the L 27 % group in statements that address the basis of evolutionary theory on data and scientific research (Table 8).

    Correlation and Linear Regression Analyses of TTCI and MATE

    Pooling across the S11 through S12 semesters, MATEPRE scores had a significant, positive correlation with all variables except year in school and major (Table 9). The correlation coefficient calculated in regression analysis of MATEPRE was low (R = 0.309, F (6, 297) = 5.21, p < 0.001), and the only significant predictors of MATEPRE in the model were the TTCIPRE score and whether a student had previously seen phylogenetic trees. The R value for MATEPOST was over twice as large as in the MATEPRE linear regression analyses (R = 0.823, F (8, 295) = 77.58, p < 0.001). MATEPOST scores had a significant, positive correlation with all variables except major and previous introductory biology experience, but only year in school, MATEPRE, and TTCIPOST were significant predictors of a student’s acceptance of evolution as measured by MATEPOST at the end of the semester in the linear regression model (Table 9). MATEPOST also had a high correlation coefficient when TTCI scores were replaced with c_T in the model (R = 0.822, F (7, 296) = 87.87, p < 0.001), indicating that final acceptance of evolution in the MATE posttest was significantly correlated not only with a student’s initial acceptance of evolution, but also with their gains in tree reading ability. The relationship between improvement in tree thinking and acceptance of evolution is further supported by a significant, positive correlation between c_T and c_M with ρ = 0.127 (p < 0.05).

    There are no specific resources required to complete this course.

    DOI: 10.6019/TOL.phyl.2015.00001.1

    Course contents

    How and when to access the course

    All our courses are designed with flexibility in mind. You can access them for free at any time, just click on the “Enter Course” button.

    It is up to you how you use the course: you can either study the full course or focus on the sections that are relevant to you. To jump between sections, use the navigation bar on the left or the arrows at the bottom of the page. You can also choose whether to complete the course in one go, or over several visits.

    The average time to read through the main body of the course is 1 hour (not including exercises and external links). The time may vary depending on your prior knowledge and how you choose to work through the course.

    Making the most of the course

    Learning something new takes time and practice. We encourage you to:

    • Use the activities and quizzes to help you check your learning, recall and apply key concepts. Look out for the Activities, Quizzes and Videos icons.
    • Revisit sections as and when you need them. Bookmark relevant pages in your browser or use the navigation panel to jump to the relevant section.

    Getting help and providing feedback

    If something isn’t working or if you have a question get in touch by contacting us at [email protected]

    Tell us what you thought about the course (both good and bad!) using the “Feedback and help” button found at the top of each page.

    Your feedback helps us ensure we are providing training that is relevant and useful for you.

    For help and support on EMBL-EBI resources you can contact the helpdesk directly.

    Learn more

    You can explore other training on offer from EMBL-EBI on our website. We offer online courses, webinars, face-to-face courses and offsite training.

    Building Trees for Biologists

    Mathematics has always been a valuable tool for subjects such as physics, chemistry, and engineering. By comparison, it is only more recently that mathematics has proved to be as illuminating for biology. Examples abound of the way that mathematics and biology are cross-fertilizing each other, ranging from new approaches to finding the distance between two objects of interest (DNA sequences or ancestral trees) to studying the workings of neurons or cells. However, there is a small section of biology in which mathematics has had a longer involvement and more success: the area of genetics. While in some ways genomics and mathematical contributions to computational biology may overshadow this earlier work, it is still the case that methods and ideas developed in "classical" mathematical genetics continue to find applications even now. Some of these classical results are sometimes placed within the domain of population genetics.

    Here I will look at some of the contributions that mathematics has made relatively recently to genetics/genomics and how mathematics grew as a consequence of attempts to model genetics/genomics problems using mathematical tools.

    The idea that species evolve certainly had roots before Charles Darwin (1809-1882) but his book On the Origin of Species from 1859 was a landmark in intellectual history. Darwin was a great observer of nature and he extrapolated from his observations of the natural world both within his homeland England and on his travels (voyage of the Beagle). However, Darwin made limited use of mathematics in his work. He wrote: “I have deeply regretted that I did not proceed far enough at least to understand something of the great leading principles of mathematics, for men thus endowed seem to have an extra sense.” Many parts of mathematics have been brought to bear on understanding genetics and the genome with goals ranging from increasing the yields of crops, keeping livestock healthier, developing new drugs, finding ways to cure inherited diseases, and getting a basic understanding of the history of life on Earth.

    A basic tenet of evolution is that the species we see today are descended from species that existed in the past and any particular species alive today has origins in such past species. To understand the complexity of living things, scholars developed a classification system for living things that went from the specific to the general. Many scholars contributed to the taxonomy (classification) terms used today but here is a current version:

    Thus, there are many species within a particular genus. American textbooks usually assert there are 6 kingdoms, but British books often use 5 kingdoms, and when the names in the two systems are lumped we get: Animalia, Plantae, Fungi, Protista, Archaea/Archaebacteria, Bacteria/Eubacteria and Monera. At all levels of the classification system there are controversies. For a long time there was thought to be only one species of giraffe but recently it has been argued that giraffes belong to four different species. The basic notion of a species is that different species can't have reproducing offspring. However, the reality, as is typical in the sciences, is more complex. What can happen, and in many cases did happen in the wild in the past, is that different "species" did mate; the phenomenon is known as "hybridization." The hybrids sometimes may have become isolated, leading to a new species, or they may have become "integrated" via mating with one or both of the species that gave rise to the hybrids. In modern times, two different hybrids are, for example, ligers (lion father, tiger mother) and tigons (tiger father, lion mother).

    Often hybrid males are not able to reproduce though females can, but it appears that sometimes the male hybrids can have progeny. Recently quite a bit of attention has been paid to DNA evidence that the two hominid "species" Homo sapiens and Neanderthal interbred and that the genetic legacy of this can be found in the genomes of modern Homo sapiens.

    We will return to classification issues shortly but first let us take a "detour" to put in perspective how the taxonomy issues of what is being investigated today build on our biological insights since Darwin.

    When early naturalists tried to tell which bird species were alike (or different) they would typically use things like size of the bird, size and shape of bird beaks, etc. to group the birds. However, with the advent of modern biology's discovery of the role of DNA in genetics, new ways were developed to tell which species were close and which ones were far apart.

    There are different approaches to looking at what is the makeup of the genetic material of a species. One can think of the genome of species X as just one long DNA string, as a collection of chromosomes, or as a collection of genes that are located at various places on the chromosomes. When we speak of comparing the genome, say, of a chimp with that of a human there are already complications. Humans have 46 chromosomes (23 pairs) while chimps have 48 chromosomes (24 pairs) though it is thought that we have a good understanding of how these chromosomes match up and there are two pairs of chromosomes of the chimp that seem to "match up" with one pair of human chromosomes. Recall that for most pairs of chromosomes, an individual is made up of one chromosome of a pair from the father and one chromosome of a pair from the mother. One complication is the issue of the sex (female or male) of a human (or chimp). Of the 23 pairs of human chromosomes, 22 are known as autosomes and there is one pair known as sex chromosomes. Usually, the sex chromosomes are known as X and Y chromosomes. Women have two X chromosomes and men have one X chromosome and one Y chromosome. (But as is often the case the usual is not the full story.) Girls get one X chromosome from their dad and one from their mom. Boys must have gotten the Y chromosome from their dad (and not his X chromosome) and one of the two X chromosomes from their mom.

    Typically, one looks inside the nucleus of a cell for the genetic material, where one finds the chromosomes, but there is also genetic material in the mitochondria, which are structures (organelles) found within most cells and which are involved with the energy needs of the cell. The genetic material in the mitochondria in humans is only inherited from one's mother. In some ways tracking genetic material in the mitochondria simplifies the "mechanics" of determining inheritance of traits but the price is that any individual's full genetic endowment is a mixture of what one gets from both of one's parents. It is possible that in some cases the "same" gene inherited from a mother or a dad might function differently.

    The ability of scientists to examine the genetic material of the chromosomes was a milestone in understanding human genetic makeup. However, there is a "finer" unit of inheritance, the gene. Genes as a concept were partially understood before the discovery of DNA. They are "stretches" of the chromosomes which we now know "code" for proteins. Typically one looks for traits such as hair color, eye color, albinism, blood type, sickle cell disease, etc. Here one is noticing the phenotypes (things one can see) associated with a genotype. However, recently the notion that there is exactly one gene associated with some of the traits thought to have been "controlled" by a single gene has been called into question. Rather, in many cases there may be several genes that affect traits that historically were thought to have simpler mechanisms. A further complication in understanding genes has been that it was initially thought that genes were "contiguous" pieces of DNA. We now know that some pieces of a "gene" are cut out and pieced together with other stretches of DNA to get a piece of genetic material whose function is understood. Commonly, now the terms introns and exons are used. There are many stretches of DNA that are believed to be genes but whose purpose is not yet known. The term "junk DNA" is sometimes used for DNA which was thought to be "non-coding" but many surprises about inheritance are occurring with greater understanding of DNA that is not part of genes.

    After 1953 and the development of the ideas of Francis Crick and James Watson, there has been accelerating progress in understanding how the genetic component of each human being works. However, as time has gone on it has become more and more clear that the details of inheritance are much more complicated than was initially thought. In the Crick/Watson "model" genes are responsible (in many cases by complex mechanisms) for generating one or more proteins and these proteins are responsible for the "trait."

    As our knowledge of molecular genetics has increased, very simple graphical displays have been used to try to get insights into issues of inheritance. Start with the chromosomes. The autosomal chromosomes are named using the numbers 1 to 22, and the two "sex" chromosomes are called X and Y. The display in Figure 4 shows that the sizes of the chromosomes measured in "base pairs" (that is, the number of DNA letters present in each chromosome) and the number of genes currently identified per chromosome are very varied. Much work has been and is being done to understand the structure of chromosomes and why particular genes (and what their functions are) have come to be where they are located on "maps" of chromosomes.

    It is interesting to compare the diagrams, where the bottom one of the two is based on less recent information about the chromosomes and the genes on them. The scales on the edges are not quite the same but the changes reflect the growing identification of sections of DNA on individual chromosomes that are actually genes. Another complication that has arisen with time is that the phenomenon of pseudogenes has been identified. The way that genetic material is transferred between generations is subject to a variety of changes. Some of these changes are due to the process known as mutation. In the Crick/Watson model this means that some DNA letters get altered, which in turn can mean that the pattern of amino acids that build up proteins (as "programmed" via the DNA) gets altered. Another mechanism is that parts of chromosomes get diced up and reassembled in different ways. One such kind of event is called duplication. When a duplication event occurs many sections of genetic material may get repeated. With time, two identical stretches may "evolve" to perform different functions, or one of the two copies of the duplicated stretch may shut down, as it were, and not function. Such stretches of DNA "look" very similar to active genes that code for proteins but don't seem, based on current knowledge, to do that. These gene-like stretches of DNA are sometimes labeled pseudogenes. Not so long ago, biologists (and/or journalists) sometimes referred to such DNA as "junk DNA." These were long stretches of DNA that seemed not to play a role in the inheritance process. Now, many stretches of so-called junk DNA have been shown to regulate genes and alter the way the genes are expressed. Other parts of junk DNA are pseudogenes, stretches of DNA that perhaps were active in the past but are no longer so, etc.

    For those who learned their genetics years ago, one has to bridge the way discussion of genetics was couched in the not so distant past (and the old point of view has value still) and the more modern terminology. Before Crick/Watson, genetics tried to explain the visual differences that run in families of people (animals and plants), and that was charted through the insights of Gregor Mendel and those who built on what he did. Thus, one could see that some people had red hair and it seemed to run in families but sometimes with a puzzling pattern. Putting the old and new pieces together, let us imagine that a trait or its lack is controlled via gene J. The answer lies in the idea of an allele, which is short for the term allelomorph. Alleles for a particular gene are variants of the gene. Thus, the phenotype of a trait--what one sees--might depend on the alleles of gene J. Let us call the two different alleles involved U and V. Remember each person would have two copies of the gene J, one from the person's mother and the other from the person's father. The gene pair could be UU, UV, VU, or VV, where the first letter of the pair would be from the mother and the second from the father. What one would see in a child if they had UV or VU would be identical. However, some alleles have the property of being dominant and some recessive. If U is dominant, this terminology means that when the pair of alleles in a child is UU, UV, or VU one sees trait T. One only sees the absence of T when one's genetic makeup is VV. One of the remarkable insights of mathematics into genetics concerns what will happen to recessive traits in the future, assuming that having the recessive trait does not change a person's chance to live and have a family of the usual size. Assuming random mating, where the allele U appears with relative frequency u in the population and the allele V appears with relative frequency v (where u and v are positive real numbers that add to 1, u + v = 1), the recessive trait will not disappear. What happens in the long run is independent of the initial sizes of the numbers u and v. Using this relative frequency information we have these two relationships, where the second arises from squaring the first:

    u + v = 1

    (u + v)² = u² + 2uv + v² = 1

    Note that u² is the fraction of the population whose genotype has two U alleles (having the same allele twice is often stated as being homozygous), 2uv is the fraction of the population whose genotype consists of one allele of each kind (heterozygotes), and v² is the fraction of the population homozygous for the V allele. If the U allele is dominant over the V allele then the fraction of the population that will show the dominant phenotype is u² + 2uv, and the fraction with the recessive phenotype is v².

    It was G.H. Hardy (1877-1947) and Wilhelm Weinberg (1862-1937) who showed there are circumstances under which one's first intuition that recessive traits will "disappear" goes against a simple model for gene propagation. The mathematics (basically the calculation above) shows that after one generation of random mating there will be an equilibrium where the relative frequencies u and v don't change with time. Of course, some traits will make it less likely that people will in fact randomly mate, but in some cases the same mutations that create some of these traits recur, another reason a recessive trait might not disappear. Thus, albino tigers (and albinos of other species) "regularly" appear though they are rare.
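    The equilibrium claim can be checked with a few lines of exact arithmetic. The sketch below is only an illustration under the idealized assumptions stated in the text (random mating, no selection); the function names and the starting population are hypothetical. It starts from genotype fractions that are not in the proportions u², 2uv, v² and shows that one round of random mating produces those proportions, after which nothing changes.

```python
from fractions import Fraction


def allele_frequency_u(uu, uv, vv):
    """Relative frequency of allele U given the three genotype fractions.

    Every UU individual carries two U alleles and every heterozygote one,
    so the share of U among all alleles is uu + uv / 2.
    """
    return uu + uv / 2


def random_mating(uu, uv, vv):
    """Genotype fractions (UU, heterozygous, VV) after one generation
    of random mating, i.e. (u^2, 2uv, v^2)."""
    u = allele_frequency_u(uu, uv, vv)
    v = 1 - u
    return (u * u, 2 * u * v, v * v)


# Hypothetical starting population: half UU, half VV, no heterozygotes.
start = (Fraction(1, 2), Fraction(0), Fraction(1, 2))
gen1 = random_mating(*start)   # first generation of random mating
gen2 = random_mating(*gen1)    # second generation
```

    Here the first generation has fractions 1/4, 1/2, 1/4 and the second generation is identical to the first, so the recessive VV genotype stays at one quarter of the population instead of dying out.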


    We will now start to take a look at how one can study ancestry mathematically. The easiest "model" to think of here is that given two species of mammals, for example, there was a common ancestor of these two species at a prior time. This is the notion of a tree of life. However, it may have happened that a past species X and a past species Y both gave rise to a species Z via "separate pathways" to what we see today. In terms of using dot/line diagrams, known as graphs, to represent the history of descent, this might mean that the graph involved would not be a tree but would have circuits. Trees have exactly one path between any two vertices (dots), while in graphs with circuits some pairs of vertices have two (or more) paths between them. A path is a sequence of edges that join two vertices. Darwin used the metaphor that there is a tree of life but perhaps, in some cases, descent diagrams are not the mathematical objects called trees.

    Mathematics typically responds to questions that are raised outside of mathematics by using existing mathematical tools or developing new ones to understand the phenomenon at hand. A good example is the development of the part of mathematics and biology called phylogeny, phylogenetics, which includes the notion of a phylogenetic tree. The term phylogeny was coined by the great German biologist Ernst Haeckel (1834-1919). In addition to coining the term phylogeny he also coined the terms ecology and phylum.

    (Portrait of Ernst Haeckel, courtesy of Wikipedia.)

    Phylogenetic trees are a framework in which one studies how species change with time, one of the goals being to try to understand which species are closer to one another than to other species. This suggests the idea of defining a distance between two species which would obey the nice properties that distances developed in mathematics for other reasons (Euclidean distance, taxicab distance, Hamming distance, etc.) might obey. Typically the distance between two objects P and Q (whether they are points, trees, or languages), d(P,Q), would obey the rules:

    a. d(P,Q) is a non-negative real number.

    b. d(P,Q) = d(Q,P)

    c. d(P,Q) = 0 if and only if P = Q

    d. For any three objects P, Q, R

    d(P,Q) + d(Q,R) ≥ d(P,R)

    (which is known as the triangle inequality).
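    To make the distance rules concrete, the following sketch checks them by brute force for the Hamming distance (the number of positions at which two equal-length strings differ) on all DNA words of length 3. The helper name is_metric and the choice of test set are assumptions made for this illustration; a finite check like this is evidence for a particular set of objects, not a proof.

```python
from itertools import product


def hamming(p, q):
    """Number of positions at which two equal-length strings differ."""
    if len(p) != len(q):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(p, q))


def is_metric(points, d):
    """Check the distance rules for d on a finite collection of points."""
    for p, q in product(points, repeat=2):
        if d(p, q) < 0:                    # non-negative
            return False
        if d(p, q) != d(q, p):             # symmetric
            return False
        if (d(p, q) == 0) != (p == q):     # zero exactly when P = Q
            return False
    for p, q, r in product(points, repeat=3):
        if d(p, q) + d(q, r) < d(p, r):    # triangle inequality
            return False
    return True


# All 64 DNA words of length 3 over the letters A, C, G, T.
words = ["".join(t) for t in product("ACGT", repeat=3)]
hamming_ok = is_metric(words, hamming)

# Squared difference of numbers fails the triangle inequality:
# d(0,1) + d(1,2) = 2, which is less than d(0,2) = 4.
squared_ok = is_metric([0, 1, 2], lambda x, y: (x - y) ** 2)
```

    The second check shows why the triangle inequality is a genuine restriction: a perfectly natural-looking measure of "how far apart" two numbers are can fail it.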

    So given two trees, how might we determine how far apart they are? Or put slightly differently, how can we measure similarity versus dissimilarity for trees, perhaps in some other way than using a "distance"? If one thinks mathematically, one might want to know what kinds of trees are allowed in the comparisons because trees come in many "flavors." A tree is a special kind of geometric diagram known as a graph, which consists of dots called vertices and line segments called edges. What makes trees special is that they have no "circuits" (sometimes called cycles) and consist of one piece. The fact that a tree is in one piece (the technical term is connected) means that one can move between any pair of distinct vertices along edges. A circuit is a collection of edges such that if one starts at a vertex one can walk back to the start using distinct edges. In Figure 5 we see a graph which has several pieces; it is not connected. The piece labeled H has 6 vertices and 7 edges and is not a tree--it has three different circuits. The other three pieces, considered together as a graph G in Figure 5, form what is called a forest, because each piece is a tree. Graph G (without part H) is not connected and consists of three separate trees. The unlabeled piece is a special kind of tree sometimes called a star. This star tree has 6 vertices and 5 edges.
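    The definitions of connected, forest and tree can be tested mechanically. The sketch below uses a small union-find structure and assumes a simple graph given as a vertex list and an edge list. Since Figure 5 is not reproduced here, the graph called h_edges is a hypothetical stand-in with 6 vertices and 7 edges (a hexagon plus one diagonal), not the actual piece H; the star matches the description of a star tree with 6 vertices and 5 edges.

```python
def analyze(vertices, edges):
    """Return (number of pieces, True if the graph has no circuit)."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the representative of v's piece.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    acyclic = True
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            # a and b were already joined by a path, so this edge closes a circuit.
            acyclic = False
        else:
            parent[ra] = rb
    pieces = len({find(v) for v in vertices})
    return pieces, acyclic


def is_forest(vertices, edges):
    return analyze(vertices, edges)[1]


def is_tree(vertices, edges):
    pieces, acyclic = analyze(vertices, edges)
    return pieces == 1 and acyclic


six = list(range(6))
star_edges = [(0, leaf) for leaf in range(1, 6)]            # 5 edges
h_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),  # hexagon
           (0, 3)]                                          # one diagonal
two_paths = [(0, 1), (1, 2), (3, 4), (4, 5)]                # two separate paths
```

    With these inputs the star is a tree, the hexagon with a diagonal is connected but not a tree, and the pair of paths is a forest with two pieces that is not a tree because it is not connected.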

    Figure 6 shows examples of 7 different trees that are all different from each other, that is, they are not isomorphic. In discussions of phylogenetic trees, typically there are some restrictions on the types of trees that are being looked at.

    We will be particularly interested in trees where all of the vertices are either 1-valent or 3-valent (1 edge or 3 edges at a vertex), which are labeled or unlabeled, as well as the related graphs which are known as rooted binary trees, which have exactly one 2-valent vertex. One does have to be careful because not all scholarly work in this area uses the same definitions of terms that other scholars use. In particular, here I will disregard the issue of having direction on the edges (an arrow on one or more edges). In many discussions of phylogenetic trees the edges have a direction. In cases where trees are being used to show ancestral information, vertices towards the top typically represent "older" things. Often in graph theory one concentrates on the number of edges and vertices of a graph in problem descriptions, but here we will be interested in the number of leaves (1-valent vertices) of a tree. One usually identifies the 1-valent vertices with the taxa (or language families) whose ancestry one is studying. For example, Figure 7 displays the two different (non-isomorphic) trees with 4 leaves.

    But for what follows, we will not look at trees where there are valences (degrees) higher than three. From this point of view there is only one "topological type" of tree with 4 leaves and with the other two vertices having valence 3. Note that for the tree on the left, pairs of 1-valent vertices are not all alike. For example, the two 1-valent vertices on the left can be joined by a path of length two while the two 1-valent vertices at the top need a path of length 3 to join them.

    There are many ways that mathematicians have expanded their study of trees from the first efforts of mathematicians like Arthur Cayley (1821-1895), who in part was interested in using trees as a model for chemical molecules, a practice that continues today. Graph theory finds many applications in chemistry. Before returning to the thread of ancestry ideas in genetics, let us "review" some more aspects of trees.

    a. What are the degrees (valences) of the vertices of the tree?

    In the above tree diagram there are 6 vertices, and 5 edges. In general, trees have the property that there is always one more vertex than there are edges.
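    A short sketch makes the counting explicit. It assumes the tree in question is the one with four leaves and two 3-valent internal vertices joined by a middle edge, which is consistent with the description of Figure 8 in the text; the vertex labels a, b, c, d, x, y are hypothetical.

```python
from collections import Counter

# Assumed shape: leaves a, b hang off x; leaves c, d hang off y; x and y are joined.
edges = [("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("d", "y")]

# The valence (degree) of a vertex is the number of edge ends that touch it.
valence = Counter()
for p, q in edges:
    valence[p] += 1
    valence[q] += 1

num_vertices = len(valence)
num_edges = len(edges)
leaves = sorted(v for v, k in valence.items() if k == 1)
internal = sorted(v for v, k in valence.items() if k > 1)
```

    The valences add up to twice the number of edges because each edge contributes to two vertices, and the vertex count exceeds the edge count by one, as stated for trees in general.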

    b. Does the tree have a special vertex called the "root?"

    In Figure 9 below there are two trees which are isomorphic (same structure) but have a rather different visual appearance because of the way they are drawn on the page. Both of these trees arise from the tree in Figure 8 by subdividing the horizontal edge of Figure 8. This subdivision process can be thought of as removing the edge and replacing it with a path of length 2, which has a 2-valent vertex in the path. This new vertex of degree 2 can be thought of as the "root" of the tree. From this perspective, for the tree in Figure 8 four of the edges give rise to rooted trees that are the "same" while when the horizontal edge is subdivided one gets a "different" rooted tree.

    The tree on the left in Figure 9 is sometimes the style with which trees that are used for showing the "forks" in making a decision from a starting "root" are displayed. The tree on the right is often the style used for showing family trees and/or trees that show common ancestors of species or genes.

    Figure 10 (Parts of the trees in Figure 9 identified.)

    The concept of the root of a tree may be familiar if you have looked at genealogical trees drawn to look at one's family ancestors. Your genealogical tree may show your "great maternal grandmother" Eve as its root and have labels that show how Eve married and who her children were and who the spouses and children of her children were. Some families are complicated when an ancestor has children by one partner and then marries again and has other children. Other terminology for trees in biology makes reference to the 1-valent vertices as leaves and the non-leaf vertices other than the root as the internal vertices of the tree.

    Both of the non-leaves of the tree in Figure 8 are of valence 3. When one picks an edge of such a tree to place a root on, this can be done by using any edge of the tree. However, sometimes the resulting rooted trees, arising from different edges which can be subdivided to create a root, can be isomorphic. Figure 11 shows how to subdivide an edge in Figure 8 to get a tree that can be thought of as rooted using the 2-valent vertex at the top so that the resulting tree is not isomorphic to the trees in Figure 10. One way to see this is that Figure 11 has a leaf that is joined to a 2-valent vertex while the graph in Figure 10 has no leaf of this kind.

    c. Are you interested in comparing the distance between any pair of vertices in a tree or only vertices which have particular properties? For example, sometimes one is interested in the distance between the leaves of a tree or between the root of a tree and its leaves, or between the "internal" vertices of a tree--vertices of an unrooted tree which are not leaves.

    d. Are there labels assigned to the vertices and/or edges of the tree?

    Sometimes in a diagram vertices/edges are labeled as a convenience to be able to "talk about" which vertex or edge one has in mind when referring to a vertex or an edge. However, sometimes the labels are part of the mathematical model (representation) that the graph is being used to construct. For example, in Figure 12 we have some graphs (trees in fact) with the labels homo sapiens (H), baboon (B), gorilla (G). Do you think all three drawings with their labels are different, or do you think that the top two drawings are "isomorphic" (have the same structure)?

    e. Are there weights located at the vertices of the tree?

    One might want to have a labeled rooted binary tree where the root vertex is assigned the weight 0, which represents the time in the past when a "clock" was started. Now the weights at other vertices are the time when an event occurred as indicated by the presence of that vertex. Sometimes only the root is labeled (0) and the leaves.

    f. Are there weights indicated for the edges of the tree?

    Weights on edges might indicate some way of saying how far apart the ends of the edges are. In Figure 12 we might put weights on the edges in the tree at the top to indicate whether gorillas were closer to humans or baboons were closer to humans. For this kind of model one has to decide if summing edges along a path is meaningful or not.

    Phylogenetic trees

    Now suppose we want to label the leaves of our trees (rooted or unrooted binary trees) with taxa (species), and we want to see how many different ways there are to do this. Consider the example of such a tree with four leaves. The number of ways to choose labels for the two 1-valent vertices at the left of the tree T1 (the tree on the left in Figure 7) is 4C2. This quantity counts the number of ways to select combinations of two things from 4, where order does not matter. If you are not familiar with how to calculate this number, which is 6, you can just count them all: (A, B), (A, C), (A, D), (B, C), (B, D) and (C, D), where we are using the letters A, B, C and D as the 4 labels. Note that the order of the selection does not matter: the selection of B and C is the same as the selection of C and B. Figure 13 shows the three ways of labeling the tree on the left of Figure 7 with labels drawn from the four labels A, B, C, and D.

    Why don't we show three additional trees with the other pairs of labels for the vertices which are 1-valent to the left? The answer is that when we select a pair, say B and C, we have also selected the pair A and D. Thus, the 3 labeled trees in Figure 13 already show the different ways that a pair of labels can appear at 1-valent vertices that are two units apart (connected with a path of length 2) or at 1-valent vertices that are three units apart (connected with a path of length 3). What we have done here is a very common thing in mathematics. When it suits our purpose to think of two things that look different to be the same we do it! It is similar to the fact that the fractions 2/4 and 3/6 look different from 1/2 and each other, but we still think of these three fractions as representing the same number.
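
    The counting argument above can be checked with a few lines of Python. This is only a sketch: it treats a labeling of the 4-leaf tree as an unordered partition of the labels into two pairs, which is the identification the paragraph above describes, and the variable names are chosen here for illustration.

```python
from itertools import combinations

labels = ["A", "B", "C", "D"]

# All 4C2 ways to pick the two labels that sit a path of length 2 apart.
pairs = list(combinations(labels, 2))

# Picking a pair also fixes the complementary pair, so a labeling is
# really the unordered partition {pair, complement}.
labelings = {
    frozenset([frozenset(p), frozenset(set(labels) - set(p))])
    for p in pairs
}

print(len(pairs))      # 6 selections of a pair
print(len(labelings))  # 3 genuinely different labeled trees
```

    The six selections collapse to three labelings because each partition is produced twice, once from each of its two pairs.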

    Note that for the tree on the right in Figure 7 if we label the leaves for this tree with a distinct letter from A, B, C, D (thus, all 4 leaves have different labels) there is a sense in which one might say all the different ways of labeling the tree on the right are equivalent. However, for the tree on the left in Figure 7 it is natural to think of the 3 trees shown in Figure 13 as being different.

    You may be wondering for a tree whose vertices are 1-valent and 3-valent how many different (non-isomorphic) labeled trees there are with n leaves.

    If a tree has 1-valent and 3-valent vertices and only its leaves are to be labeled where the number of leaves is n (n being 3 or more), then:

    i. The tree has n-2 internal vertices

    ii. The tree has n-3 internal edges

    iii. There are 1(3)(5)...(2n - 5) different such labeled trees.

    Note that if b(n) (binary trees) denotes the number of trees we are counting, then

        b(n) = (2n - 5) b(n - 1) = 1(3)(5)...(2n - 5)     (*)

    If there are n leaves, the number of edges in our tree is n (leaf edges) + (n - 3) internal edges for a total of 2n - 3 edges. For a tree with n-1 leaves we have (n - 1) leaf edges together with n - 4 internal edges, or 2n - 5 edges total. In our count in going from trees with n-1 leaves to a tree with one more leaf, for each "old" tree we can get (2n - 5) new ones, since we can add a new 3-valent vertex (by subdividing an edge and attaching a 1-valent vertex) on any existing edge. We can try checking equation (*) above for n = 4. We get (since 2 x 4 - 5 = 3) 1 x 3 = 3, which is the number of trees we got in Figure 13. For n equal to 5 we get 15 trees. The size of b(n), the number of binary labeled trees, grows very rapidly, meaning that the computational work for solving many phylogenetic problems seems not to be doable with algorithms that work in polynomial time.
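
    To get a feel for how fast this count grows, here is a small sketch that evaluates the product 1(3)(5)...(2n - 5) by applying the edge-subdivision step described above one leaf at a time. The function name unrooted_count is made up for this illustration.

```python
def unrooted_count(n):
    """Number of leaf-labeled trees with n leaves whose other vertices are 3-valent."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1  # exactly one such tree on 3 leaves
    for k in range(4, n + 1):
        # A tree with k-1 leaves has 2(k-1) - 3 = 2k - 5 edges, and the
        # k-th leaf can be attached by subdividing any one of them.
        count *= 2 * k - 5
    return count

for n in (3, 4, 5, 10):
    print(n, unrooted_count(n))
```

    The values for 4 and 5 leaves are 3 and 15, matching the text, and already for 10 leaves the count is 2027025.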

    What about rooted labeled trees? For each labeled tree with n leaves there are n leaf edges and n - 3 internal edges for a total of 2n - 3 edges. The root we select can be placed on any of these edges, so to find the number of labeled rooted trees with n leaves we need to multiply b(n) by (2n - 3), obtaining

        (2n - 3) b(n) = 1(3)(5)...(2n - 3)

    Thus, for n = 4 we should have 15 such trees with 4 leaves. Can you find them all? (Check your answer against Figure 14 where the labels are colors rather than letters.)
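
    If you would rather let a computer find the rooted trees, the following sketch enumerates them. It is one possible encoding, not the one used for the figures: a leaf is a string, and an internal vertex is a frozenset of its two subtrees, so switching the two subtrees below a vertex automatically gives the "same" tree. The helper names are invented for this example.

```python
def insert_leaf(tree, leaf):
    """Yield every rooted tree obtained by attaching leaf on one edge of tree."""
    yield frozenset([tree, leaf])  # attach on the edge above the current root
    if isinstance(tree, frozenset):
        left, right = tuple(tree)
        for new_left in insert_leaf(left, leaf):
            yield frozenset([new_left, right])
        for new_right in insert_leaf(right, leaf):
            yield frozenset([left, new_right])

def enumerate_rooted(labels):
    """Build all rooted binary trees on the given labels, one leaf at a time."""
    trees = [labels[0]]
    for leaf in labels[1:]:
        trees = [t for old in trees for t in insert_leaf(old, leaf)]
    return trees

def rooted_count(n):
    """The product 1(3)(5)...(2n - 3)."""
    count = 1
    for k in range(2, n + 1):
        count *= 2 * k - 3
    return count

trees = enumerate_rooted(["A", "B", "C", "D"])
print(len(set(trees)), rooted_count(4))
```

    Both numbers printed are 15, and because the trees are stored in a set, the check also confirms that no tree was produced twice.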

    It is worth noting that we are considering two labeled trees the same if the order of the two labeled leaves below a vertex of degree 3 is switched. Thus, for the leftmost tree in the first row of Figure 14, had the blue and red labels been switched, we would consider this the "same" tree. The trees in the top row of Figure 14 are those that arise from rooting the one fully internal edge of the trees in Figure 13.

    Many people find diagrams such as those in Figures 7 and 13 an entry point into getting insights into trees and labeled trees. Others want something more algebraic, and typically computers can't be used to help with calculations when they have to work on diagrams rather than strings of symbols. So for those who like to think about things in a more "algebraic" way, one can get equations which show relations expressed by the geometry in the pictures of graphs. Since a tree with n vertices has n-1 edges, and every tree (other than the trivial tree, the tree consisting of a single vertex) has at least two 1-valent vertices, we can try to count the number of 1-valent vertices of a tree in terms of the other degrees of vertices in the tree. If a tree has t_i vertices of valence i (degree i) we can derive the equation below. The key idea is that adding up the degrees of the vertices counts each edge twice, once at each end (together with the fact that there is one more vertex than there are edges in a tree).

        t_1 = 2 + t_3 + 2 t_4 + 3 t_5 + ...

    Note that for counting the number of 1-valent vertices in a tree the number of 2-valent vertices does not play a role. Of course, one needs to count the number of 2-valent vertices to see what the total number of vertices in a tree is. You can check that since the bottom tree in Figure 13 has two vertices of degree three, it has four 1-valent vertices.
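
    For readers who want the bookkeeping spelled out, here is one way to carry out the derivation just described. The symbols V and E (the numbers of vertices and edges) and m (the largest valence that occurs) are introduced here for convenience and are assumptions of this sketch rather than notation from the discussion above.

```latex
\begin{align*}
\sum_{i=1}^{m} t_i &= V, \\
\sum_{i=1}^{m} i\,t_i &= 2E = 2(V-1) = 2\sum_{i=1}^{m} t_i - 2, \\
\sum_{i=1}^{m} (i-2)\,t_i &= -2, \\
t_1 &= 2 + t_3 + 2t_4 + 3t_5 + \cdots + (m-2)\,t_m .
\end{align*}
```

    The coefficient of t_2 in the third line is zero, which is why 2-valent vertices play no role, and with t_3 = 2 and no vertices of higher valence the last line gives t_1 = 4.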

    There is also the theory of splits, which we will look at for trees with vertices of degree 1 and 3 (such as those in Figure 13) where the leaves have distinct labels. However, the idea carries over to labeled trees in general. Since each edge, when cut with scissors, divides the tree into two pieces, we can for each edge record the labels in the two pieces into which the tree is cut. I will use the notation U | V for the "split" of the labels into two sets, where the split U | V is the same as the split V | U.

    Here are the splits for the lower tree in Figure 13:

    Of the 5 splits above, each of the other two trees in Figure 13 would have these splits as well but the last split above "codes" the fact that the bottom tree is "different" as a labeled tree from the two above it (up to permutations of the labels).

    The "essential" splits (those for edges that connect two non-leaf vertices) for the other two trees are:

    Thus, we can tell trees apart using their splits instead of relying on visual aspects of trees to cue us that they are different.
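
    Splits are easy to compute mechanically. The sketch below cuts each edge of a tree in turn and records which labels land on each side. The tree is given as a list of edges, and the example tree, with leaves A, B, C, D and internal vertices called u and v, is a hypothetical one chosen for illustration; it is not claimed to be any particular tree from Figure 13.

```python
def splits(edges, leaves):
    """Return the set of splits U | V, one for each edge of the tree."""
    leaves = frozenset(leaves)
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    result = set()
    for a, b in edges:
        # Walk outward from a without ever crossing the cut edge (a, b).
        seen, stack = {a}, [a]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen and {x, y} != {a, b}:
                    seen.add(y)
                    stack.append(y)
        side = frozenset(seen & leaves)
        # A frozenset of the two sides makes U | V equal to V | U.
        result.add(frozenset([side, leaves - side]))
    return result

def show(split):
    """Format a split as text, e.g. 'AB | CD'."""
    return " | ".join("".join(sorted(s)) for s in sorted(split, key=sorted))

edges = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
result = splits(edges, "ABCD")
for s in sorted(result, key=show):
    print(show(s))
```

    For this example there are five splits, one per edge. Four of them separate a single leaf from the rest, and only AB | CD comes from the edge joining the two non-leaf vertices.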

    The notion of splits can also be used to construct a distance between two trees, in what is known as the Robinson-Foulds distance (D. F. Robinson and L. R. Foulds) between two trees. (Usually, author names for scholarly papers in mathematics are listed in alphabetical order but for the paper that introduced this distance, Robinson's name is listed first.)

    Given two sets A and B one can compute the symmetric difference between the two sets, which consists of those elements which are in A or B but not both. So the Robinson-Foulds distance between two trees (it can be done for more general trees and splits than the binary unrooted tree case we looked at above) builds on the notion of symmetric difference of sets. Suppose we are given trees T and T*. If S(T) is the number of splits of tree T which are not present for T* and S(T*) is the number of splits of T* which are not present for T, then d(T,T*) = S(T) + S(T*). This distance is relatively easy to compute for sizable trees but it tends to reflect small changes in tree topology in a way which is not "ideal." Also the behavior of this distance on random trees is not ideal. However, the attempt to use this distance in phylogenetic problems has been responsible for research about the properties of this distance function for trees which is of interest for "theoretical" reasons. Also, to the extent that its properties are not "what one wants" it encourages other researchers to find better ideas for finding the distance between trees.
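
    Once each tree is stored as a set of splits, the distance d(T,T*) described above is just the size of a symmetric difference. Here is a minimal sketch under that assumption; the helper names and the two 5-leaf example trees are invented for illustration, and only the splits from internal edges are listed because both trees share all of the single-leaf splits.

```python
def split(u, v):
    """The split U | V as an unordered pair of label sets."""
    return frozenset([frozenset(u), frozenset(v)])

def rf_distance(splits_t, splits_t_star):
    """Count the splits that belong to exactly one of the two trees."""
    return len(splits_t ^ splits_t_star)

# Tree T groups (A,B) and (D,E); tree T* groups (A,B) and (C,E).
t = {split("AB", "CDE"), split("ABC", "DE")}
t_star = {split("AB", "CDE"), split("ABD", "CE")}

print(rf_distance(t, t_star))
```

    Here T has one split missing from T* and T* has one split missing from T, so the distance is 1 + 1 = 2. Comparing a tree with itself gives 0.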

    Graphs are a wonderful tool for studying many parts of mathematics. All of you are probably familiar with examples of convex polyhedra such as the tetrahedron and the cube. For example, here is a diagram of the tetrahedron drawn as a graph in the plane.

    Figure 15 (Graph of the tetrahedron drawn in the plane.)

    It turns out that all polyhedra with all of their vertices of degree three can be generated from the tetrahedron by a sequence of "graph transformations." This fact is a special case of what is known as Steinitz's Theorem. Here is a diagram that shows the way to make such a transformation. One picks two edges that bound the same region (called a face) and joins them as shown in Figure 16, which is a schematic. The only face that is shown is the one where two edges have been chosen and this face is "split."

    There are many theorems about how to "modify" graphs which already have a property (say that they represent the edge-vertex graph of a convex polyhedron) and show how to get a new and different graph with the same property. The theory of phylogenetic trees also involves taking trees and showing how to change them or piece them together to get a "related" tree that has the properties one wants. For example, Figure 17 shows how to take two trees on the left, where each of these trees has vertices of degree 1 and 3, and perhaps at most one vertex of degree 2, and piece them together to form a new tree with the property that it has vertices of valence 1 and 3 and at most one vertex of valence 2. The process is done in such a way that the number of 1-valent vertices in the new tree is the sum of the numbers of 1-valent vertices in the separate trees.

    If the first tree was built from one set of species and the second from a disjoint set of species, one might try to piece together the two trees to get a tree based on the larger collection of species in some way.

    To tell how far apart two species are one can conceive of various different approaches, each of which might give different insights.

    a. If one had the whole of the human genome from start to finish and the chimp genome from start to finish, one could compute the "distance" between the two genomes.

    b. If one had a list of corresponding genes for humans and for chimps one could for each gene compute the distance between these genes. One could then measure how far apart the two species were by summing the distances between the genes.
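
    Approach b can be sketched in a few lines. The text does not say which distance between genes is meant, so this example assumes the simplest choice, the number of positions at which two aligned sequences of equal length differ (the Hamming distance). The gene names and the short sequences are toy data made up for the example, not real genomic data.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def species_distance(genes_a, genes_b):
    """Sum the per-gene distances over the genes both species have."""
    shared = genes_a.keys() & genes_b.keys()
    return sum(hamming(genes_a[g], genes_b[g]) for g in shared)

# Toy data: two hypothetical species, two corresponding genes each.
species_x = {"gene1": "ACGTAC", "gene2": "TTGACA"}
species_y = {"gene1": "ACGTTC", "gene2": "TAGACG"}

print(species_distance(species_x, species_y))
```

    The two copies of gene1 differ at one position and the two copies of gene2 at two, so the summed distance is 3. Real analyses typically need an alignment step first and often use distances that model how sequences change over time.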

    In the world of phylogenetic trees one can use similar distinctions in order to build a tree which shows the "ancestry" relations between a collection of species of interest, say the primates, the birds, or a collection of viruses. However, as noted above, the tree one constructs might merely be one which shows the connection relationships or might try to do more and indicate "distance" relationships for the primates.

    Many "operations" or transformations of one tree to another tree have been conceived of and discussed in view of species closeness issues and mathematical motivations. One wants to see the minimal number of steps to transform one tree to another with the same number of leaves, as some measure of "distance." Three particular operations are well studied. In the sequence of diagrams below, these three operations known as TBR (tree bisection and reconnection), SPR (subtree prune and regraft) and NNI (nearest neighbor interchange) are explored by the use of diagrams. If a transformation results in a 2-valent vertex such vertices are "suppressed" in these transformations, so that the only trees obtained are ones which have 1-valent and 3-valent vertices. However, sometimes more general versions of trees are used with operations of this spirit.

    Note that in Figure 18 the elements in the sub-pieces of the tree consisting of blobs Z and Y are closer after the switch than before. Here, we switched the blobs X and Z but you might think about switching the blobs X and W instead.
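
    A nearest neighbor interchange can be described purely in terms of the four blobs hanging off the two ends of an internal edge. The sketch below assumes a layout in which blobs a and b hang from one end of the edge and blobs c and d from the other. That layout and the names a, b, c, d are assumptions of this example and need not match the arrangement drawn in Figure 18. Each blob is represented only by its set of leaf labels.

```python
def central_split(left_pair, right_pair):
    """Split of the labels induced by the internal edge between two pairs of blobs."""
    left = frozenset().union(*left_pair)
    right = frozenset().union(*right_pair)
    return frozenset([left, right])

def nni_neighbors(a, b, c, d):
    """Starting from ((a, b), (c, d)), exchange b with each blob across the edge."""
    return [
        ((a, d), (c, b)),  # b and d exchanged
        ((a, c), (b, d)),  # b and c exchanged
    ]

a, b = frozenset("AB"), frozenset("C")
c, d = frozenset("D"), frozenset("EF")

before = central_split((a, b), (c, d))
after = [central_split(left, right) for left, right in nni_neighbors(a, b, c, d)]
print(len({before, *after}))
```

    The three central splits are all different, so this internal edge gives two distinct neighbors of the starting tree. Exchanging two blobs that hang from the same end of the edge would leave the split, and hence the tree, unchanged.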

    Figure 19 (TBR tree operation.)

    Again, the tree one constructs could be based on how far apart pairs of species are in terms of their whole genomes or one could construct a family of trees which is constructed on the basis of using each gene that can be compared across the species to find a tree associated with that gene. Now one has a collection of trees, each one based on a different gene and one can try to construct a single tree which is the most "consistent" with the given collection of trees that would represent the species as a whole. Various ways can be devised for constructing such a "consensus" tree but typically the different methods yield different trees. The insights have been helped by the mathematics which attempts to understand how to choose a winner in an election where there is information about how members of a group vote.
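
    One of the simplest consensus methods, and the one closest to an election, is majority rule: keep exactly those splits that appear in more than half of the gene trees. The sketch below assumes each gene tree is given as a set of splits. It is offered as an illustration of this one method only, and the three example gene trees are invented.

```python
from collections import Counter

def split(u, v):
    """The split U | V as an unordered pair of label sets."""
    return frozenset([frozenset(u), frozenset(v)])

def majority_rule(trees):
    """Keep the splits that occur in more than half of the input trees."""
    votes = Counter(s for tree in trees for s in tree)
    return {s for s, count in votes.items() if count > len(trees) / 2}

# Three hypothetical gene trees on species A-E (internal-edge splits only).
gene_tree_1 = {split("AB", "CDE"), split("ABC", "DE")}
gene_tree_2 = {split("AB", "CDE"), split("ABD", "CE")}
gene_tree_3 = {split("AB", "CDE"), split("ABC", "DE")}

consensus = majority_rule([gene_tree_1, gene_tree_2, gene_tree_3])
print(len(consensus))
```

    The split AB | CDE gets three votes and ABC | DE gets two, so both survive, while ABD | CE, with a single vote, is dropped. Any two splits that each occur in more than half of the trees must occur together in at least one tree, so the surviving splits always fit together into a single tree, though that tree may have vertices of valence greater than 3.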

    In a rapidly emerging field like computational biology (computational genetics), mathematics has benefited as much as biology (genetics) from the attempts of the mathematically trained to use existing or new ideas to help get further comprehension. Over a period of time these ideas and insights become the foundations of whole new pieces of the mathematical and biological landscape. Many mathematicians and biologists continue to inspire each other in getting a better understanding of both the world of mathematics and the world of biology.


    Alon, N. and H. Naves, B. Sudakov, On the maximum quartet distance between phylogenetic trees, SIAM J. Discrete Math 30 (2016) 718-735.

    Bryant, D. Building trees, hunting for trees and comparing trees, Doctoral Thesis, U. of Canterbury, 1997.

    Clote, P. and R. Backofen, Computational Molecular Biology, Wiley, New York, 2000.

    Felsenstein, J. Inferring Phylogenies, Sinauer, Sunderland, 2004.

    Finden, C., Obtaining common pruned trees, J. of Classification 2 (1985) 255-276.

    Gusfield, D., Algorithms on Strings, Trees, and Sequences, Cambridge U. Press, New York, 1997.

    Gusfield, D., ReCombinatorics: The Algorithmics of Ancestral Recombination Graphs and Explicit Phylogenetic Networks, MIT Press, Cambridge, 2014.

    Hall, B., Phylogenetic Trees Made Easy, Sinauer, Sunderland, 2004.

    Hein, J. and M. Schierup, C. Wiuf, Gene Genealogies, Variation and Evolution, Oxford, New York, 2005.

    Jansson, J. and C. Shen, W-K Sung, An optimal algorithm for building a majority rule consensus tree, RECOMB 2013, 2013, p. 88-99, Springer.

    Jones, N. and P. Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004.

    Klein, P., Computing the edit-distance between unrooted ordered trees, in G. Bilardi et al. (Eds.), ESA 98, LNCS 1461, pp. 91-102, 1998, Springer.

    Lin, Y., and V. Rajan, B. Moret, A metric for phylogenetic trees based on matching, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9 (2012) 1014-1022.

    Margush, T. and F. McMorris, Consensus-trees, Bull. Math. Bio. 43 (1981) 239-244.

    Page, R. and E. Holmes, Molecular Evolution: A Phylogenetic Approach, Blackwell Science, Oxford, 1998.

    Pevzner, P., Computational Molecular Biology, MIT Press, 2000.

    Robinson, D., Comparison of labeled trees with valency 3, J. Comb. Theory 11 (1971) 105-119.

    Robinson, D. and L. Foulds, Comparison of phylogenetic trees, Mathematical Biosciences 53 (1981) 131-147.

    Semple, C. and M. Steel, Phylogenetics, Oxford U. Press, New York, 2003.

    Strimmer, K. and A. von Haeseler, Quartet puzzling: A quartet maximum-likelihood method for reconstructing tree topologies, Mol. Biol. Evol., 13 (1996) 964-969.

    Steel, M., Phylogeny, SIAM, Philadelphia, 2016.

    Steel, M. and D. Penny, Distributions of tree comparison metrics--some new results, Syst. Bio., 42 (1993) 126-141.

    Waterman, M., Introduction to Computational Biology, Chapman & Hall, London, 1995.

    Those who can access JSTOR can find some of the papers mentioned above there. For those with access, the American Mathematical Society's MathSciNet can be used to get additional bibliographic information and reviews of some of these materials. Some of the items above can be found via the ACM Portal, which also provides bibliographic services.





    Although there are quite a few texts that cover phylogenetic inference, relatively few deal explicitly with interpreting phylogenetic trees or applying them to address broader evolutionary questions. In this section, some of the classic books that cover areas of phylogenetics beyond tree inference itself are listed. Of these, Baum and Smith 2013 provides the most in-depth introduction to reading phylogenetic trees, while the others (Brooks and McLennan 1991, Eldredge and Cracraft 1980, Harvey and Pagel 1991, Wiley and Lieberman 2011) discuss some of the ways that phylogenies are important for addressing questions about evolutionary patterns and processes.

    Baum, D. A., and S. D. Smith. 2013. Tree thinking: An introduction to phylogenetic biology. Greenwood Village, CO: Roberts.

    This textbook provides a broad introduction to tree thinking, with several chapters devoted to interpreting evolutionary relatedness and patterns of trait evolution using phylogenies. It is written to be accessible to undergraduates and biologists in fields outside of phylogenetics.

    Brooks, D. R., and D. A. McLennan. 1991. Phylogeny, ecology, and behavior: A research program in comparative biology. Chicago: Univ. of Chicago Press.

    This book focuses on interpreting patterns of speciation and adaptation in a parsimony framework and includes many fascinating biological examples. It is important to note that many of the trees are based on morphological characters, which are not commonly used today for phylogenetic studies of adaptation because of the potential for circularity.

    Eldredge, N., and J. Cracraft. 1980. Phylogenetic patterns and the evolutionary process: Method and theory in comparative biology. New York: Columbia Univ. Press.

    The authors cover the basics of cladistics but also devote special attention to how phylogenetic analysis can be applied to addressing macroevolutionary questions, such as the role of adaptation to new niches in diversification. Recent advances in statistical methods have resulted in renewed interest in applying phylogenies to testing such questions.

    Harvey, P. H., and M. D. Pagel. 1991. The comparative method in evolutionary biology. Oxford Series in Ecology and Evolution. Oxford: Oxford Univ. Press.

    A classic book in comparative biology that lays out the need for incorporating phylogenetic history in any analysis that spans multiple taxa. It describes how phylogenetic approaches can be used to test adaptive hypotheses by using both discrete and continuous data.

    Wiley, E. O., and B. S. Lieberman. 2011. Phylogenetics: The theory and practice of phylogenetic systematics. 2d ed. Hoboken, NJ: Wiley.

    An updated version of Wiley’s 1981 book (New York: Wiley), this text covers many concepts relevant to interpreting trees. For example, the authors discuss different kinds of trees and different ways that character evolution is represented graphically on trees.



