Class notes
1 Overview & species concepts
2025-08-26
1.1 Course overview
Follows Coyne & Orr (2004) broadly. This was a highly impactful and timely book that heralded the dawn of the genomic era. The period of roughly 2012–20 saw incredible growth in our understanding of genomes and speciation. Species concepts are important, especially if your thesis includes the term “speciation”, but we won’t spend too much class time on debates over definitions.
Till exam 1: reading fundamentals of speciation questions. Post-exam-1: stick with basic questions but delve into what genomics has done to advance understanding. Word and sentence counts in assignments need to be followed strictly; the aim is to get used to condensing important information.
1.2 Concepts of species
Species are not just for bookkeeping; different species are actually different, and they do different things. Cryptic species are an example of why (community) ecologists should care about work of population geneticists.
If one could conclude as to the nature of the Creator from a study of creation it would appear that God has an inordinate fondness for stars and beetles.
— JBS Haldane, on creationism and the sheer number of beetle species
Early ideas of species concepts can date back to, as with many things, Aristotle. His notion of a perfect “essence” asserted the existence of a perfect form for each species, variations from which are imperfections in its actualisation. The essence of a species was fixed, but involved descent as the crucial factor tying different individuals in a species together. This implied a fundamental difference between inter- and intraspecific variation.
1.2.1 Biological species concept
The Darwinian revolution insisted that variation is not just noise, but rather the core (nuisance vs crux). Everything shares common descent, variation is at the heart of evolution, and there is no true difference between inter- and intraspecific variation. Darwin’s way of distinguishing variants was the morphological gap criterion (few intermediates), although his concept can be applied to ecology, behaviour or genetics too.
Ernst Mayr, starting in the 1940s, adapted Darwin’s ideas to eventually come up with the biological species concept (BSC), where species are defined as interbreeding natural populations, reproductively isolated from others. Before him, Poulton (1904) had similar ideas, with the notion of “syngamy” (interbreeding) as the true meaning of species. (As did Lotsy and Dobzhansky.)
Some adherents of BSC include those who worked extensively on model organisms like Drosophila in the lab (Coyne, Orr, Noor).
Impact of BSC
- Species now defined by process important to their own maintenance (biological, as opposed to scientist sitting in a museum describing from specimens).
- Allowed, and drew attention to, concepts of gene flow and isolating mechanisms.
- Provided criterion for diagnosing cryptic taxa.
Problems of BSC
Less universally applicable than Darwinian concept:
- Provides no useful insight regarding large-ranged species having several geographically isolated populations (difficult to check interbreeding in allopatry).
- Hybridisation is rather common, even between non-sister taxa. This might not be just noise!
- Cannot deal with asexual organisms that have divided into obviously recognisable populations/groups.
- Circularity: diagnosis of species and cause of speciation are the same.
Also, reproductive isolation is not likely adaptive in itself. Paterson (1985) argued that prezygotic compatibility is the true reality underlying species: isolating mechanisms are a result rather than a cause of species separateness. This is the recognition concept (species share a common fertilisation system), as opposed to the isolation concept.
1.2.2 Phylogenetic species concepts
1.2.2.1 Hennig 1966: Cladistics/monophyly-based
Hennig brought about the discipline of cladistics, which eventually evolved into the modern PSC, where a clade means a monophyletic group. Therefore, this species concept is based on monophyly, i.e., history. Problems:
- Can lead to considering very small isolated groups (due to focus on monophyly)
- Forced to leave behind paraphyletic taxa
- Contingent on quality of trees! Early on, mostly mitochondrial DNA.
- Species tree vs gene tree discordance.
- Maddison 1997 showed how this discordance can arise (short speciation event times, wide branches?, related to incomplete lineage sorting).
- Can’t rely blindly on numbers from gene trees.
1.2.2.2 Cracraft 1989: Irreducible cluster/diagnosably distinct
Focus on irreducible clusters of organisms, diagnosably distinct from others, and among which there is a parental pattern of ancestry and descent. Most proponents of this were from a museum background. Recently, this concept has started to make inroads into zoological taxonomy, raising subspecies previously demoted (under polytypic species concept) to species level.
1.2.2.3 Modern phylogenetic approaches
- Species delimitations using multi-locus coalescent (Yang & Rannala 2014); using Bayesian posterior to compare single-species or two-species models based on genomic data
- Problems: Same as Bayesian methods in general—importance of prior info on phylogenetic relationships.
- Barcode of Life (Paul Hebert): driven by biodiversity crisis, don’t need to wait for taxonomic expertise, can instead just collect and insert in barcoding pipeline.
- Problems: For many species, there is no clean gap in barcodes to distinguish intra- and interspecific variation. So, useful for identifying taxa from samples, but controversial for describing new species.
- Also, single gene taxonomy, gene-vs-species tree, introgression of mtDNA
1.2.3 van Valen 1976: Ecological species concept
Places greater focus on stabilising selection than gene flow. This is an especially true/useful approach for bacteria: different bacteria can occupy very different niches and have no gene flow, but act/behave the same in terms of where they are found, physiology, etc.
1.2.4 De Queiroz 1998: General lineage concept
Focuses on evolutionary independence, and considers species as segments of population-level variation?
1.2.5 Alternative/dissenting species concepts
Species are not real!
Some perspectives/justifications for this view:
- Taxonomic perspective: subspecies may be elevated to species, but still grade into each other so don’t hold much meaning
- Consider populations as evolutionary units, not species
- Spatial and temporal forms that are not easy to tell apart as species vs not.
- Genetic markers: gene flow too low to maintain cohesion
- Phenetics
1.2.6 Operational definition
Hellberg’s operational definition is based on genotypic cluster criterion (Mallet 1995). The emphasis is on how combinations of traits cluster separately or together between groups. What’s critical is the presence of genotypic gaps between local populations, so this concept makes sense for sympatric populations but not so much for allopatric ones.
It can be pictured as a bimodal distribution of traits. This view relies on Bayesian stats, using “assignment tests” and so on.
2 Geography of speciation
2025-08-28: Allopatric and sympatric speciation.
Allopatric speciation has always been popular, and has been the preferred model of speciation until recently, whereas sympatric speciation has kinda been the underdog.
2.1 Allopatric speciation
Jordan’s Law (1907): David Starr Jordan was the founding president of Stanford University. His model system was Indopacific wrasses, and he had ideas about allopatric speciation well before Ernst Mayr. He believed that, given any species in any region, the nearest related species is not likely to be found in the same region, but in a neighbouring district separated from the first by a barrier of some sort. A demonstrative example was squirrels in the Grand Canyon.
Ernst Mayr was a big proponent of allopatry, and specifically peripatry. He emphasised the role of drift in populations. Model systems were birds of paradise and Dicrurus in SE Asia.
This has been the preferred model for multiple reasons:
- Easier: no gene flow to contend with
- Lots of opportunities in nature for allopatry
- Lots of evidence:
- Young sister species tend to be allopatric
- Concordance between geography and species borders
- Extension of above: Geographic coincidence of multiple separations of sister species and/or hybrid zones (one barrier/feature showing up again separating multiple groups of species)
- Increase in reproductive isolation in allopatry
However, even in this model, not incorporating explicit selection (i.e., relying only on drift/mutation) into calculations of species generation times results in very long times.
2.2 Sympatric speciation
2.2.1 Problems
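Genetic problem: Felsenstein (1981) argued for the problem of recombination. In sympatry, recombination keeps breaking up the association between alleles for ecological divergence and alleles for mate choice, so diverging groups need to avoid recombination by mating differentially. (Felsenstein also introduced bootstrapping and maximum likelihood methods to phylogenetics, and has 15 papers with >1000 citations.)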
Ecological problem: Ecological patterns like coexistence, competitive exclusion (Gause’s Rule) occur because in sympatry need to live differently enough from others.
2.2.2 Solutions
Disruptive sexual selection; disruptive natural selection (mating is linked to habitat, and the fewer genes [of large effect] the better). These are now very much linked to the concept of magic traits, for which there has been much evidence.
2.2.3 Outstanding problem
Additional requirements for demonstrating sympatric speciation are that the species in question need to be sister species, and that we need to be sure past allopatry was unlikely. This has been more difficult to show, and many popular examples fail here.
In Lake Victoria cichlids, extensive sympatric speciation is thought to have been possible due to a combination of factors like trophic divergence, sexual selection, and microallopatry. However, there is controversy around the idea that the radiation is only 14,000 years old: some people don’t think the lake fully dried out.
In the case of Rhagoletis fruit flies, which have sympatric differentiation in host plant choice (between hawthorn and apple; GL Bush), it turned out that they were actually allopatric at some point and only the completion of the speciation process happened in sympatry (Feder et al 2003).
Similarly, in the case of stickleback ecomorphs in postglacial lakes, secondary contact drove character displacement and resulted in sister species, even though completion was in sympatry (Rundle & Schluter 2004).
Thus, more and more evidence has piled up to transform the dichotomy from sympatric vs allopatric speciation, to primary sympatric divergence vs secondary contact introgression.
Discussion: Geography of speciation
2025-09-02
- Doesn’t mean old school (primary sympatric divergence) is fully obsolete. In fact, palm paper is an example. Hellberg also believes it should be one model under consideration.
- What % hybridisation is relevant gene flow? 5% is kinda high still (1 in 20).
- Each data point in a genetic structure plot (k-means clustering) is the allele type expressed at each biallelic locus (genotype).
- Differentiation is different from variation
3 Prezygotic isolation
2025-09-04
Reproductive isolation was a hot topic in evolution after WWII and well into the 1990s. Studies focused on two kinds of reproductive barriers (prezygotic and postzygotic), and eventually this dichotomy was paralleled in the scientists who studied them (field vs lab [e.g., Drosophila folks] respectively).
Prezygotic isolation can be considered in two broad, but not mutually exclusive, categories: ecological (temporal, pollinator, habitat) and sexual (behavioural, mechanical, gametic) isolation.
3.1 Evidence for ecological prezygotic isolation
Parallel speciation: stickleback ecomorphs in post-glacial lakes; ecotypes that are geographically isolated still reproduce
Magic traits: ecological gene under selection pleiotropically causes assortative mating (affects reproductive isolation). Heliconius butterflies of Amazonia; cichlids in E Africa lake (depth of lake affects colouration favoured); feather lice & pigeon hosts (Villa et al 2019 PNAS; check videos).
Strong habitat selection: walking sticks in California, immigrant inviability (abiotic-dispersed organisms)
Isolation-by-distance/adaptation: geographic distance \(\longrightarrow\) genetic distance OR ecological/adaptive divergence \(\longrightarrow\) genetic distance
3.2 Evidence for sexual prezygotic isolation
Runaway sexual selection: positive feedback loops
Sexual antagonism: males and females may have different optima for same phenotype
Mating isolation: genitalia differentiation in beetles, moths, etc.; lock-and-key also in pollinator/flower compatibility.
Gametic isolation is a special case which is post-mating but pre-zygotic (reproductive tracts, bodily fluids). Conspecific sperm precedence (mating order); acrosome reaction for dissolution of the vitelline envelope around eggs in marine gastropods.
Discussion: Prezygotic isolation
2025-09-09
- Sentences in Science are usually more assertive/confident than would be otherwise.
- GWAS: genome-wide association studies.
- Sex chromosomes limit damage when dangerous alleles show up (doesn’t work same in both sexes, sexually antagonistic)
4 Postzygotic isolation
2025-09-11
Barriers after the zygote is formed, which can act via reduced hybrid viability, reduced hybrid fertility, hybrid breakdown, etc.
This topic has received much less focus than prezygotic isolation, because of ascertainment bias: once prezygotic isolation is completed, then cannot see postzygotic isolation. Also, other things like reinforcement strengthen prezygotic isolation.
However, there are many reasons to consider postzygotic isolation. First, in many ways, ecological isolation is a form of postzygotic isolation. Moreover, even in cases where hybrid formation occurs, the persistence of distinct species means there is something decreasing the fitness of these hybrids. Also, intrinsic postzygotic isolation may become stronger at \(\ge\) F2 crosses, while extrinsic postzygotic and prezygotic isolation become weaker.
Darwin had a conundrum about contradictory ways in which evolution worked: on the one hand natural selection operates, but on the other hand unfit hybrids are produced. His solution for this was that sterility arises as an incidental byproduct of “unknown differences”.
Much of the interest in postzygotic isolation is wrt intrinsic.
4.1 Extrinsic postzygotic isolation
External to the individual/organism, i.e., non-developmental. This includes things like habitat isolation, where say niche of hybrids falls somewhere between parental lineages but fitness of hybrids is lower than that of either parent. But in some cases, hybrids may have higher fitness in intermediate habitats, which would explain the increase in hybridisation in the first place.
In stickleback crossing studies, Rundle 2002 found that F1s were bad (relative to parents), but F1-parent backcrosses yielded some fitter individuals (proportional to how much parental heredity the cross had).
Other times, hybrids could have higher disease susceptibility, or intermediate traits that are less attractive (sexual selection).
4.2 Intrinsic postzygotic isolation
Internal to the individual/organism, so quite simply, developmental defects lead to hybrid sterility or inviability.
It can be meiosis issues related to ploidy levels, where hybrids like F1s simply cannot perform meiosis. Or chromosomal rearrangements/inversions like in Australian morabine grasshoppers (White 1978). However, many inversions surprisingly have no effect on heterozygote fitness, because recombination can still happen in other regions of genome.
Other mechanisms include endosymbiotic infection risk for hybrids, or cytoplasmic incompatibilities especially relating to parasites (e.g., intracellular parasites like Wolbachia: when uninfected female flies are crossed with infected males, many/all progeny die, but infection is transmitted from the mother so those that are born from infected females are infected). There can even be mitonuclear incompatibilities (i.e., between mitochondrial and nuclear genetic material), famously in copepods (Ron Burton), so much so that Geoffrey Hill even introduced a mitonuclear compatibility species concept.
Allelic incompatibilities are a major way intrinsic postzygotic isolation can occur. The original definition of Dobzhansky-Muller Incompatibility focused on alleles at different loci, arguing that low hybrid fitness is a byproduct of genomes that are geographically isolated. The newer, more specific definition emphasises epistasis (effects of an allele at one locus depend on alleles present at other loci), arguing that negative epistatic interactions occur between alleles of different loci with different evolutionary histories. Some assumptions of this theory are that there occurs at least one pair of allele changes, that most incompatibilities are due to derived alleles, and that the number of incompatible combinations snowballs over time.
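A rough sketch of why the count snowballs, assuming the simplest pairwise model: each derived allele can potentially clash with any allele that has changed at another locus, so k substitutions give k(k-1)/2 candidate pairs. The function names and the per-pair probability `p_incompatible` are illustrative assumptions, not from the lecture.

```python
def potential_dmi_pairs(k):
    """Number of locus pairs that could interact after k substitutions."""
    return k * (k - 1) // 2


def expected_dmis(k, p_incompatible):
    """Expected incompatibilities if each pair clashes with probability p (assumed)."""
    return p_incompatible * potential_dmi_pairs(k)


# Doubling divergence more than quadruples the number of candidate pairs.
for k in (10, 20, 40):
    print(k, potential_dmi_pairs(k))
```

Under this assumption the number of incompatibilities grows roughly with the square of divergence, which is what "snowballs" refers to.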
4.2.1 Haldane’s Rule
Rule about sex chromosome incompatibilities; it is exceptional among biological rules, because there are very few exceptions to it. Basically: in offspring of two different races, if one (and only one) sex is messed up (absent, rare, or sterile) then that is the heterogametic sex (e.g., XY males or ZW females).
NOT caused by: incompatibilities between X- and Y-linked genes; dosage compensation (compensate based on importance of stuff on X); chromosomal arrangement.
Possible causes:
- Dominance: in XY species, hybrid males are affected by all X-linked genes involved in genic incompatibilities, whereas hybrid females are affected only by dominant X-linked genes
- Faster-males: in XY species, incompatibilities affecting XY hybrids accumulate more quickly (but there are exceptions to this, so it doesn’t satisfactorily explain the rule)
- Faster-X: faster drift at population levels because more X than Y in population (thus this mechanism interacts with dominance)
- Meiotic drive: distorter alleles that distort Mendelian ratios to their own advantage; suppressors evolve to stop them. When two allopatric populations each independently evolve a driver and suppressor, suppressed drivers can become unmasked.
Discussion: Postzygotic isolation
2025-09-16: (absent)
5 Reinforcement
2025-09-18
Reinforcement has been explored in the context of driving enhanced prezygotic isolation in sympatry. It has traditionally been defined as when the formation of maladaptive hybrids selects for enhanced prezygotic isolation, where hybrids are maladaptive if their fitness is non-zero but lower than either parent’s. The late 1980s saw renewed interest in exploring such questions of reinforcement.
However, there was also a concurrent realisation that enhanced prezygotic isolation in sympatry could arise not just due to selection against hybrids, but also due to ecological or other factors. This brought focus to reproductive character displacement, where selection for enhanced prezygotic isolation occurs in order to reduce wasted reproductive effort between RI species. For instance, acoustic space divergence in frogs in TX/MX occurs between not just close congeners but also distant relatives.
Many early popular examples of reinforcement are actually of reproductive character displacement, because the hybrid has zero fitness (no hybrid formation in sympatry). Reviewed in Butlin 1987. Other cases, like the Collared and Pied flycatchers, do conform to the strict reinforcement expectations, but we now know that there is more going on there.
5.1 Experimental tests for reinforcement
Reviewed in Rice & Hostert 1993.
- Kill-the-hybrid: let them mate but kill hybrids and see if displacement develops
- But this is actually evidence of reproductive character displacement (RCD) rather than of reinforcement
- Seemingly high success rate, but publication bias?
- Disruptive selection on arbitrary trait: kill everything but high and low values of certain neutral trait (like bristles in Drosophila)
- Bristles are not really neutral/arbitrary, have important function
- On the other hand, these results were never replicated in further studies
5.2 Comparative studies of reinforcement
Can ask the following questions:
- Is prezygotic isolation greater in sympatry?
- (Difficult to test) Are they forming hybrids?
- (Difficult to test) What is the genetic basis? e.g., does having many large-effect genes at a few nearby loci facilitate more reinforcement (magic traits)?
Lots of work in this area, including on Drosophila from Dobzhansky & Koller 1938 (western US) to Matute 2010 (Sao Tome & Principe), and bindin protein in sea urchins. In the case of the Ficedula flycatchers, male plumage is driven by female choice (because females in sympatry are not different, only males are), and this is just females being choosier than males (fewer eggs than sperm). However, it is not fully clear whether the brown in Pied males evolved to enhance isolation from sympatric Collared, or whether it’s just female-mimicry in order to reduce intermale aggression. This uncertainty is the “more going on” in this case.
The genetic basis of reinforcement has been reviewed in Garner et al 2018, and consists of genetic divergence both within species (allopatric vs sympatric contrasts) and between species (greater in sympatry, lower in allopatry). One example is flower colour in Phlox wildflowers, where colour is driven by two loci (intensity and hue), but selection happens on the intensity one.
5.3 Objections to reinforcement
- Need to reconcile selection and recombination (antagonism, similar to that in case of sympatric speciation), which gets worse with more interacting genes
- Race against extinction: Have to be ecologically distinct to survive in hybrid zone, BUT not enough, need to stick around in hybrid zone (asking for a lot)
- Swamping effect: genes for reinforcement must be favoured only in sympatry; genes from outside hybrid zone may swamp out (again, asking a lot)
5.4 Solutions to objections
- Liou & Price 1994: sexual selection can promote reinforcement in hybrid zone
- Kelly & Noor 1996: selection for greater female discrimination can promote reinforcement; “one-allele” solution (same allele favoured in both species); not just sexual selection but strength of selection (turn up volume/intensity)
5.5 Bottom line
This is a theoretically possible mechanism, but in most cases it’s asking for a lot. There is a lot of evidence for greater divergence in sympatry, but most of these cases also have alternative explanations.
The best bet for such a mechanism would involve female choice resulting in heavier divergence in sympatry. This means that systems with greater sexual selection may be more likely to have reinforcement.
6 Wait, what? Background primer
2025-09-25
6.1 Historical demography of speciation
Genetic studies of speciation often infer the species tree from gene trees. Divergence time for the genes is always older than for the species. Trees made with different genes may be different and have disagreements (is this a methods issue, or the actual population demographics, or are they important factors driving speciation itself?). Moreover, gene trees can be complicated by mixing after divergence. So all this variation needs to be accounted for.
Historical demography helps us infer about population size, changes in population size, divergence genetics, and isolation and hybridisation. In this demographic context, certain terms have specific meanings. Population size is considered an indication of genetic variation, with larger populations holding more genetic variation. Genetic diversity is commonly denoted as \(\theta\), but there are multiple metrics used.
6.2 Measures of genetic diversity
\(\theta = 4 N_e u\) is a simple metric representing expected genetic variation, where \(N_e\) is the effective population size (size of an idealised population that would show the same degree of drift as observed), and \(u\) is the mutation rate per generation (sometimes loosely called the substitution rate; the two are equal for neutral mutations).
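As a minimal illustration of the formula, here is the calculation in both directions. The function names and the example values (\(N_e\) = 10,000 and \(u\) = 1e-8 per site per generation) are assumptions chosen for the sketch, not values from the lecture.

```python
def expected_theta(ne, u):
    """theta = 4 * Ne * u for a diploid population."""
    return 4 * ne * u


def ne_from_theta(theta, u):
    """Rearranged: effective population size implied by an observed theta."""
    return theta / (4 * u)


theta = expected_theta(10_000, 1e-8)  # illustrative inputs only
print(theta, ne_from_theta(theta, 1e-8))
```

In practice the second direction is the common one: diversity is measured from sequence data, and \(N_e\) is inferred given an assumed mutation rate.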
Watterson’s \(\theta_w\): Ignores frequencies, and is strongly affected by rare alleles. Based on the number of segregating sites (polymorphisms) in a DNA sample.
Tajima’s \(\theta_\pi\): Based on frequency so rare alleles don’t matter. Observed nucleotide diversity: average number of differences between pairs of sequences in a DNA sample.
Tajima’s \(D\) gives an overall measure by comparing the two estimates (\(\theta_\pi - \theta_w\), scaled by its expected standard deviation). Premise is that under neutrality at constant population size (observed diversity matching expected diversity), \(D = 0\). \(D \lt 0\) means more rare alleles than expected, suggesting population expansion after a bottleneck (multiple loci), or a selective sweep or purifying (negative) selection (single locus). During selective sweep, when \(D\) drops, linkage disequilibrium rises. Biston betularia is a classic example of selective sweep. \(D \gt 0\) means few rare alleles, suggesting balancing selection (e.g., malaria resistance) or overdominant selection. If you follow a certain selective pressure over time, \(D\) would go from negative to positive.
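The three statistics above can be sketched from a small alignment as follows. This assumes equal-length, gap-free sequences given as strings, and reports per-sequence (not per-site) values. The function names are illustrative, and the variance constants follow the standard published formula for \(D\) as best I recall it, so treat this as a sketch rather than a reference implementation.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def pairwise_diversity(seqs):
    """Average number of differences over all pairs of sequences (theta_pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def wattersons_theta(seqs):
    """S divided by a1 = sum of 1/i for i = 1..n-1 (theta_w)."""
    a1 = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


def tajimas_d(seqs):
    """(theta_pi - theta_w) divided by its estimated standard deviation."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pairwise_diversity(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

A sample where every variant is a singleton (all rare alleles) gives \(D \lt 0\), while a sample split into two deeply divergent haplotype groups (intermediate-frequency alleles) gives \(D \gt 0\).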
Mismatch distribution (Rogers & Harpending 1992): count the pairwise sequence mismatches between individuals in a population, and visualise as a histogram. A single clustered peak indicates a sudden past population expansion, and its position lets us estimate when that happened (e.g., bottleneck followed by an exploding population). This requires examining non-recombining bits of DNA; earlier mitochondrial DNA was used for this, but nowadays it’s the Y chromosome.
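A minimal sketch of how the histogram is tallied, assuming aligned, equal-length sequences from a non-recombining region (at least two sequences). The function name is illustrative.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Return counts[k] = number of sequence pairs differing at exactly k sites."""
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


# Each list index is a number of mismatches; each value is how many pairs show it.
print(mismatch_distribution(["AAAAA", "AAAAT", "AAATA", "AATAA"]))
```

Plotting this list as a bar chart gives the mismatch distribution; whether it is unimodal or ragged is the feature usually read off it.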
Site/allele frequency spectrum: distribution of allele frequencies for all segregating sites in a sample of DNA sequences. Certain demographic scenarios like bottleneck would result in allele frequency distribution (using combined polymorphism data) close to 0 for 2nd most common allele (i.e., minor allele in biallelic). We can compare frequency with reference of outgroup (unfolded) to infer derived vs ancestral.
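The folded and unfolded spectra can be sketched as below, assuming aligned sequences as strings, biallelic sites only, and (for the unfolded version) an outgroup sequence taken as the ancestral state. Both function names are illustrative.

```python
from collections import Counter


def folded_sfs(seqs):
    """sfs[i] = number of biallelic sites whose minor allele appears i times."""
    n = len(seqs)
    sfs = [0] * (n // 2 + 1)
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) == 2:  # skip monomorphic and multi-allelic sites
            sfs[min(counts.values())] += 1
    return sfs


def unfolded_sfs(seqs, ancestral):
    """sfs[i] = number of biallelic sites whose derived allele appears i times.

    The outgroup sequence `ancestral` is assumed to carry the ancestral allele.
    """
    n = len(seqs)
    sfs = [0] * n
    for column, anc in zip(zip(*seqs), ancestral):
        counts = Counter(column)
        if len(counts) == 2 and anc in counts:
            sfs[n - counts[anc]] += 1
    return sfs
```

The folded version is what you get without an outgroup, since a site with 1 copy of one allele and 3 of the other cannot be polarised into derived vs ancestral.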
Synteny: conservation of blocks of (gene) order between two sets of chromosomes being compared.
6.3 Isolation-migration model for historical demography
Developed by Jody Hey and others, as a way to address the problem of discordance between histories of gene and species trees. It is based on coalescent theory, where all the lineages present today coalesce going back to one common ancestor. Using only the present-day observed lineages helps because we can ignore others from the past who didn’t make it (simplifies computation). Smaller (pop. size) and younger lineages would coalesce faster. We can also get gene history or gene genealogy along the same steps.
This can be used to test for no isolation between populations, by looking for \(t = 0\) (divergence time). Similarly, it can test for no migration (genetic exchange), if \(m_1 = m_2 = 0\). On the other hand, it can also estimate values for these parameters for conditions of isolation and migration.
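To illustrate the point that smaller populations coalesce faster, here is a sketch of the basic single-population coalescent. It is not the IM model itself: there is no divergence or migration. It assumes a diploid population of constant effective size, where k lineages wait an exponential time with mean 4Ne/(k(k-1)) generations before the next coalescence. Function names are illustrative.

```python
import random


def expected_tmrca(n_samples, ne):
    """Expected generations back to the common ancestor of n sampled lineages."""
    return 4 * ne * (1 - 1 / n_samples)


def simulate_tmrca(n_samples, ne, rng):
    """Draw one time to the most recent common ancestor, in generations."""
    t = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / (4 * ne)  # rate at which some pair of k lineages merges
        t += rng.expovariate(rate)
    return t


rng = random.Random(1)
sims = [simulate_tmrca(10, 1000, rng) for _ in range(2000)]
print(sum(sims) / len(sims), expected_tmrca(10, 1000))
```

The simulated mean should sit close to the analytical expectation, and both scale linearly with \(N_e\).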
6.4 Other models
The IM model is very simplistic, mainly because it assumes just one population size outside of divergence, and migration rate is constant. One alternative is DaDi (Diffusion Approximations for Demographic Inference), which allows more than one value for each parameter. However, this is a different framework, because it works by model comparison between numerous fixed (prespecified) models. So it is also important to remove very unlikely models from the starting options.
Model selection is typically done using AIC (\(AIC = 2k - 2l\)) where \(k\) (complexity) is number of parameters and \(l\) (fit) is log-likelihood of data under model. In essence, penalised likelihood is a parsimonious compromise. Sometimes, model averaging can be used.
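A small sketch of the AIC comparison, using the formula as written above. The Akaike weights are one common basis for model averaging. The model names and log-likelihood values in the usage lines are made up for illustration.

```python
from math import exp


def aic(n_params, log_likelihood):
    """AIC = 2k - 2l; lower is better."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Relative support for each model, normalised to sum to 1."""
    best = min(aic_values)
    rel = [exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical models: (number of parameters, log-likelihood)
models = {"isolation_only": (3, -100.0), "isolation_migration": (5, -99.5)}
scores = {name: aic(k, l) for name, (k, l) in models.items()}
print(scores, akaike_weights(list(scores.values())))
```

Here the two extra parameters improve the fit only slightly, so the simpler model gets the lower AIC: that is the parsimony compromise in action.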
6.5 Tests for selection
Either Tajima’s \(D\), or codon-based metrics like \(\omega = dN/dS\) (non-synonymous substitution per non-synonymous site vs synonymous), can be used to infer selection for new variation, i.e., positive selection in the population. People involved include Nielsen, Yang.
Neutrally, there should be more N than S (changes at the first two codon positions vs only the third). But there are methods to standardise this to 1, simplifying the other cases. Purifying selection is indicated by \(\omega \lt 1\), meaning most amino-acid changes are being selected against. Diversifying (positive) selection is indicated by \(\omega \gt 1\), meaning the system wants actual changes to proteins, not benign changes. In this case (if \(\omega \gt 1\)) we can further use Bayesian analysis to see into which class codons go.
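The standardisation can be sketched as dividing each kind of difference by the number of sites where it could occur. The version below is a deliberately simplified counting approach, not the likelihood methods used in practice. It assumes two aligned, in-frame coding sequences under the standard genetic code, ignores codons that differ at more than one position, and applies no multiple-hit correction. All function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (0 to 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1 / 3
    return total


def simple_dn_ds(seq1, seq2):
    """Crude omega: nonsynonymous differences per N site over synonymous per S site."""
    s_sites = n_sites = 0.0
    s_diffs = n_diffs = 0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        if sum(a != b for a, b in zip(c1, c2)) == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                s_diffs += 1
            else:
                n_diffs += 1
        # Codons with 2-3 differences are skipped in this simplified sketch.
    if s_diffs == 0:
        raise ValueError("no synonymous differences; omega is undefined")
    return (n_diffs / n_sites) / (s_diffs / s_sites)
```

For example, comparing `TTTCTTGGG` with `TTCATTGAG` gives two nonsynonymous differences and one synonymous one, yet the standardised ratio comes out below 1, because there are far more nonsynonymous sites than synonymous ones.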
7 Genetic basis of speciation I: How many genes?
2025-10-02
What we’ve seen so far could have been gauged based on only distributions, crosses, etc. Nowadays much more based on genomics, and at much greater scale than possible before. Some questions include:
- How many genes & changes (few genes of large effect)?
- Structural or regulatory?
- cis (close) or trans (far away across genome)?
- Special role for inversions? (as a protective measure against recombination for divergent genes)
7.1 Darwinian gradualism | many genes of small effect
“many slight differences … afford materials for natural selection”
— Darwin
Particularly favoured by biometricians who had agricultural backgrounds, and were working to improve crop growth, size, etc. Douglas Falconer was influential, with his book Introduction to Quantitative Genetics.
7.2 Mendelian saltationism | few genes of large effect
“Nature does make jumps now and then”
— TH Huxley
This was particularly favoured by Mendelian geneticists, who were used to looking at systems and data of big, diagnosable traits.
At the extreme, Goldschmidt’s “hopeful monsters” placed faith that random aberrant (seemingly non-beneficial) mutations would somehow turn out uniquely useful. e.g., Diaz et al 2020, chameleon and gecko embryo development, two-digit could have been due to single mutation as seen in the gecko.
7.3 Modern Synthesis
The two views formed a divide from the start of the 20th century until the 1930s–40s, when the Modern Synthesis brought about reconciliation of Mendelian inheritance and Darwinian natural selection. This viewed evolution as being based on Mendelian genes, but involving many alleles of small effect. Thus, there was a lean towards gradualism.
There were several people involved in this:
- Julian Huxley (grandson of TH; brother Aldous; half-brother Andrew, action potentials): invented “cline”, co-founded WWF, UNESCO
- Fisher (quantitative genetics; farm data)
- Wright (effective pop. size; farm data)
- Mayr (allopatric speciation)
- Dobzhansky (Drosophila)
- Stebbins (plants)
- GG Simpson (fossils, macro-patterns)
Evidence
Much of the evidence came from farm systems, and was good (held up) in crops, farm animals, etc. But people believed this without looking at wild populations. And this is also linked to the trend at the time of turning a blind eye to true ecology.
But as more and more actual evidence started coming in, the true picture seemed different. Orr & Coyne (1992) looked at many studies and how many genes involved in adaptations. Gavrilets used theoretical models of evolution of RI and showed that fewer genes were most favoured. Yeaman & Whitlock (2011) found support for fewer genes of large effect, and that genes of smaller effect tend to cluster in genome over time.
7.4 Which genes?
How to identify the genes responsible for speciation?
7.4.1 Forward genetics
Start with trait(s) in crosses, then work down to genes.
Association mapping
Sequence or genotype many individuals that vary in a particular trait, and look for non-random genetic/genomic associations, i.e., look for correlations between markers and the trait. Recently emerged traits will be in LD (linkage disequilibrium) with regions around them. Can use AFLPs, microsatellites, SNPs. Can be done across the whole genome: genome-wide association studies (GWAS, like Schield et al 2024).
Issues: If goal not medical but evolutionary, results will be biased towards genes of large effect, which will drown out the many genes of small effect. It’s also affected by the extent of genomic LD: easier to pick up some region if high LD, but low precision especially with high recombination rates.
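Since association mapping leans on LD between markers and causal variants, here is a minimal sketch of how LD between two biallelic loci is commonly quantified (\(D\) and \(r^2\)). It assumes phased haplotypes coded as pairs of 0/1; the function name and coding are illustrative. Note that this \(D\) is the LD coefficient, unrelated to Tajima’s \(D\).

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two loci, given haplotypes as (allele_a, allele_b) in 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


print(linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)]))  # complete LD
print(linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)]))  # no LD
```

With high \(r^2\) a marker far from the causal site still shows an association, which is why detection is easier but localisation is coarser.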
Quantitative Trait Loci mapping (QTL)
Create recombinant populations by first crossing between extremes for F1 (uninformative) then doing F1-parent backcrosses. From these, correlate trait values with distributed genetic markers.
Issues: Requires pedigree info (ideally 2 generations of crosses; inbred lines important so that extremes are certainly homozygous), marker map, phenotypes associated with genotypes, need large families of individuals. Again, it has low power to detect many loci of small effect.
7.4.2 Candidate genes
Start with knowledge, or a fundamental idea, of the genetic basis for a trait in other taxa, and apply it to the new system. e.g., MC1R for melanin in birds.
But this prior assumption means that this is closed in two ways: it’s not open to all variation out there, and it’s also close-ended because there is no further path of questioning to follow.
7.4.3 Reverse genetics
Start with genomic data—ideally the whole genome—and look for signals of adaptive changes, then work back to see what traits they encode for. This of course has grown recently with the availability of large whole-genome sequences. Essentially, search the data for footprints of selection using genome scans, outlier analyses, etc.
Outlier analysis
Many methods: can compare allele frequency spectra (AFS), or look for signs of positive selection using \(\frac{D_N}{D_S}\).
Or, the popular \(F_{ST}\) approach that was advanced by Wright: calculate the proportion of total genetic variation that is attributable to differences among subpopulations. Survey differentiated/candidate populations for many loci, calculate average overall \(F_{ST}\), simulate expected values based on neutral expectation for variation across loci, then identify the outliers. (\(F_{IT}\) is the analogous measure for individuals relative to the total population.) Wilding et al 2001 were among the first to use this approach to find that shell structure in Littorina saxatilis varied according to tidal zone.
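A rough sketch of the per-locus calculation, assuming biallelic loci, equally weighted subpopulations, and the heterozygosity form \(F_{ST} = \frac{H_T - H_S}{H_T}\). The locus names and frequencies are invented, and the "10x the median" cutoff is only a crude stand-in for the neutral simulations described above.

```python
from statistics import median

def fst_biallelic(p_subpops):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus.

    p_subpops: frequency of one allele in each subpopulation
    (subpopulations are weighted equally, an assumption of this sketch).
    """
    k = len(p_subpops)
    p_bar = sum(p_subpops) / k
    h_t = 2 * p_bar * (1 - p_bar)                      # expected het., total
    h_s = sum(2 * p * (1 - p) for p in p_subpops) / k  # mean within-subpop het.
    if h_t == 0:
        return 0.0  # locus is fixed everywhere: no variation to partition
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies in two populations.
loci = {
    "locus_a": [0.50, 0.52],
    "locus_b": [0.30, 0.35],
    "locus_c": [0.05, 0.95],  # strongly differentiated
    "locus_d": [0.60, 0.55],
}
fst = {name: fst_biallelic(p) for name, p in loci.items()}

# Crude outlier rule (illustrative only): far above the genome-wide median.
cutoff = 10 * median(fst.values())
outliers = [name for name, f in fst.items() if f > cutoff]
print(fst, outliers)
```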
Issues of \(F_{ST}\): false positives in outliers that arise from many other kinds of variance including demographic history; and like QTL, it needs validation.
- Poelstra et al 2014: hooded & carrion crows
- Reference genome, then analysis
- Some loci (0.28%, most of them on the same chromosome) showed clear separation, others absolutely none
- Important empirical followup: later went back and got tissues from various parts of the body, and found a substantial difference in gene expression (of that gene) in torso skin and not in any other part.
152x sequence coverage vs resequencing difference?
Discussion: Genetic basis of speciation I
2025-10-07
- Snails have so many sites of small differentiation because low gene flow (very low dispersal).
- They are ovoviviparous
- Scaffolds are fragments of genome that you make using pipeline. If same data and same pipeline, should end up with pretty similar scaffolds.
8 Genetic basis of speciation II: Which genes?
2025-10-09
8.1 Structural vs regulatory sequences
Structural gene sequences control what is being made, whereas regulatory sequences control when and how much is being made. Most regulation happens at the level of transcription, but some at translation via attachment of siRNAs (small interfering; becomes double stranded when attach to mRNA so no translation).
In crows, colour divergence was not always necessarily due to differences in amino acids but rather in non-coding regions like introns. Do regulatory changes underlie qualitatively different types of phenotypic change?
8.2 cis vs trans | mechanism of regulation
Regulation of transcription fundamentally occurs due to sequence-specific binding of transcription factors (trans-, acting from far across) to sites close to the protein-coding region (cis-).
cis-regulatory elements are regions of non-coding DNA that regulate the transcription of nearby genes; they act at the intra-molecular level via binding sequences.
trans-regulatory elements are proteins that regulate the transcription of distant genes; they act mostly at the inter-molecular level, on many genes (not 1:1). aka transcription factors.
8.3 cis-regulatory changes > structural changes
“regulatory evolution is the creative force underlying morphological diversity across the evolutionary spectrum, from variation within species to body plans”
— Sean Carroll
Sean Carroll: UoWisconsin, HHMI funding (millions over 7 years, renewable). Showed evidence for convergent evolution of CRE for the same gene in multiple Drosophila species. Also wrote popular books.
Proponents: Britten & Davidson, Wray, Valentine. Rebuttal: Hoekstra & Coyne (2007).
Sean Carroll’s argument is founded on several factors:
- Mosaic pleiotropy: most proteins regulating development participate in multiple independent developmental processes.
- Ancestral genetic complexity: morphologically disparate and long-diverged animal taxa share similar toolkits of body-building and -patterning genes.
- Functional equivalence of distant homologs: many animal toolkit proteins exhibit functional equivalency (redundancy) in vivo when substituted for one another.
- Deep homology: formation and differentiation of many structures are governed by similar sets of genes and regulatory circuits.
- Infrequent toolkit gene duplication
- Heterotopy: changes in spatial regulation of toolkit genes and the genes they regulate are associated with morphological divergence.
- Modularity of CRE: can have multiple CRE for same genes controlling different aspects of it, like differential expression in different tissues/body parts. (But counter argument by Hoekstra et al. is that duplication can give the same result by copying both CRE and gene.)
- Vast regulatory networks of transcription factors: individual regulatory proteins (transcription factors) control scores to hundreds of target gene CREs; so changes to TF are “catastrophic”.
8.4 In defence of structural changes
Hoekstra & Coyne (2007): morphological changes are actually not so special. Gene duplications can accomplish many of the special roles claimed for cis-regulatory elements (we now know gene duplication rates are roughly similar to point mutation rates). And some “regulatory” changes are actually structural: transcription factors include some of the fastest evolving human genes (?).
At the time, there was also more evidence of protein changes than CR changes, but later we have seen that both factors are important. Interestingly, both Carroll and Hoekstra ended up publishing later concluding the very opposite of their own earlier POVs.
Much of this debate has placed a strong focus on characters/traits, and is not necessarily tied to speciation in closely related species.
Discussion: Genetic basis of speciation II (Gene regulation)
2025-10-21
- Fundamental of -omics approach: look at everything (genomes or transcriptomes), let data show you who’s interesting
- Genome size related to cell size, so in flying birds for metabolic reasons cells have to be small and therefore genomes small. Same in bats versus other mammals.
9 Genomic architecture
aka, Islands of Discord
2025-10-23
9.1 Genomic islands of speciation (differentiation)
Felsenstein’s paradox made him “[skeptical] towards Santa Rosalia” (vs Hutchinson): recombination brings good genes together, but makes it hard for them to stay together!
Genomic islands of differentiation are relatively small regions of the genome that have elevated divergence, surrounded by regions of low/no divergence. This concept arose (in “speciation” flavour) due to multiple key influences around the same time, ~ 20 years ago:
- Rebirth of reinforcement
- Genic speciation, Chung-I Wu (versus Coyne): shifting emphasis from isolation at whole genome to individual genes
- Isolation-with-migration model (Nielsen, Wakeley, Hey): shift from coalescent approaches to speciation with gene flow
- Rieseberg et al 1996. Helianthus anomalus is a dune specialist. Non-random hybrid composition between two parental lineages, even in lab crosses, very similar to that in wild hybrid. “Interactions between coadapted parental species’ genes constrain the genomic composition of hybrid species.”
- Turner et al 2005. Anopheles gambiae: Three regions (< 2.8 Mb) contain most differentiation, and these speciation islands remain differentiated despite considerable gene flow (therefore responsible for RI).
By 2010–15, speciation with gene flow was already dogmatic. GIDs are thus in regions of low recombination, and co-localise with species-defining traits (e.g., hood in Hooded Crow). There is also LD among unlinked islands (e.g., capuchino seedeaters), and island (gene) trees match species trees (gene tree–species tree concordance). Further, the islands grow over time (Via 2012) because recombination fragment size is larger than individual genes, and some genes/loci hitchhike along with loci under selection.
GIDs can involve inversions (rearrangement/flipping of gene order in chromatin packing; these areas simply do not get recombined, stay the same, due to the differing order; Noor et al 2001 inc. Jane Reiland: QTLs also associated with regions of inversion) and structural genomic variants (beyond just SNPs) which are often associated with adaptations, clines, and speciation (Hooper & Price 2015).
9.2 Disputed islands
Turner & Hahn 2010: what is “divergence”? Most approaches focused on relative divergence, comparing between vs within (e.g., \(F_{ST}\) compares sub with tot). Markers with high variation within populations are less able to show between-variation (max \(F_{ST}\) will be low when within-population diversity \(H_{S}\) is high, as in regions of high recombination). Conversely, low recombination reduces within-population diversity and so drives up \(F_{ST}\).
On the other hand, absolute divergence (e.g., \(d_{xy}\)) measures the average number of pairwise differences between sequences from two species, and is not sensitive to within-population variation. True genomic islands of speciation (GIS) should show both relative and absolute divergence high, hence the shift in terminology to genomic islands of differentiation (GID).
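The contrast between relative and absolute divergence can be made concrete with a toy calculation. This is a sketch under stated assumptions: sequences are pre-aligned and equal length, the relative measure is an \(F_{ST}\)-like ratio \(1 - \pi_{within}/d_{xy}\), and both "windows" are invented data.

```python
def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)

def dxy(pop_x, pop_y):
    """Absolute divergence: mean per-site differences between X and Y sequences."""
    length = len(pop_x[0])
    total = sum(hamming(a, b) for a in pop_x for b in pop_y)
    return total / (len(pop_x) * len(pop_y) * length)

def pi(pop):
    """Within-population diversity: mean per-site differences among pairs."""
    n, length = len(pop), len(pop[0])
    total = sum(hamming(pop[i], pop[j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2 * length)

def relative_divergence(pop_x, pop_y):
    """F_ST-like ratio: 1 - (mean within-population diversity) / d_xy."""
    between = dxy(pop_x, pop_y)
    within = (pi(pop_x) + pi(pop_y)) / 2
    return 1 - within / between

# Window 1 (hypothetical low-diversity region): one fixed difference only.
low_x = ["AAAAAAAAAA"] * 3
low_y = ["AAAAAAAAAT"] * 3
# Window 2 (hypothetical high-diversity region): many differences everywhere.
high_x = ["AAAAAAAAAA", "AAAAAAAATT", "AAAAAATTAA"]
high_y = ["TTAAAAAAAA", "AATTAAAAAA", "AAAATTAAAA"]

for name, x, y in [("low", low_x, low_y), ("high", high_x, high_y)]:
    print(name, round(dxy(x, y), 3), round(relative_divergence(x, y), 3))
```

Here the low-diversity window looks like an "island" by the relative measure even though its absolute divergence is the lower of the two.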
Genomic islands of speciation or genomic islands and speciation?
Cruickshank & Hahn 2014 showed that absolute divergence doesn’t keep up, supporting the divergence-after-speciation model. Thus, the view changed:
- In regions of low recombination
- Co-localise with species-defining traits \(\rightarrow\) traits arose post-speciation
- LD among unlinked islands \(\rightarrow\) impossible to maintain
- Gene tree–species tree concordance \(\rightarrow\) selection speeds lineage sorting
Guerrero & Hahn 2017: Speciation only responsible for sieving (sorting) process. Divergence itself ancestral (polymorphism in population). Ancestral balanced polymorphisms are sorted unequally between descendant lineages (balanced: both coexist in population).
Discussion: Genomic architecture
2025-10-28
- Parallel speciation: Speciation version of convergent evolution. Parallel evolution assumes more similar starting points, is a subset of convergent evolution.
- Type I inversions: divergent/balancing selection. Type II: good things on both inversions, so best is to have heterozygous (e.g., trapped along with good stuff might be bad, and some bad effects may be covered by having heterokaryotype). (Zygote: single locus; karyotype: can be set of loci, structural variants.)
10 Hybridisation and introgression
2025-10-30
Hybridisation is reproduction between members of genetically differentiated populations that yields offspring of mixed ancestry. e.g., Carrion & Hooded Crows. It is often initiated by factors like range change, habitat shifts/mosaics, disturbance.
The outcomes of hybridisation include fusion, hybrid speciation, balance, and introgression.
10.1 Fusion
This is the reversal of speciation, the wholesale collapse of a distinct entity which continues to be more homogenised.
A good example is Common & Chihuahuan Raven. Chi. originally split from Common, then one fused back with Common forming the California type, while the other remains Chi. So the California lineage is considered to have fused (subsumed by Common).
10.2 Hybrid speciation
This forms a transgressive variant, a third group. Allopolyploid speciation is driven by the different numbers of chromosomes, and is common in angiosperms, ferns, Acropora. Homoploid hybrid speciation happens with no ploidy changes, and is more common in animals, e.g., Cottus fish, sparrows (House, Italian, Spanish), snub-nosed monkey.
Often happens due to transgressive phenotypes: hybridisation generates fitness-related phenotypes that lie outside the parental distribution.
10.3 Balance
Forms hybrid zones. Generally involves habitat mosaics, and is common in mice, mussels, crickets.
Tension zones are an interesting case where dispersal into zone fights with selection against crossing. (See Fire-bellied Toad, Nick Barton.) Measure strength of tension by \(\theta = \frac{S}{R}\) (ratio of effective selection and pairwise recombination rate).
Hybrid zones tend to be drawn to areas of low population density, and show spatial coupling of genotypic and phenotypic clines enhanced by reduced recombination, epistasis among barrier loci, moving clines to low fitness landscape (?).
Brumfield et al 2001: clines in manakins do not line up with hybrid zones. The transition in certain markers is at a different geographical location from the actual hybrid zone (where the crucial trait also transitions), so looking at just those markers would suggest introgression.
10.4 Introgression
The seeping of a small set of genes, that allow adaptations to jump the species barrier (cf fusion). Something that was originally in one species ends up in another, and this is therefore kind of an exception because it manages to escape the forces acting to keep it within the hybrid zone.
But this allows hybridisation to act as the important source of adaptive variation, instead of mutations which are rare. Examples include the Grants’ work on Galapagos finches; high altitude adaptation in Tibetans (adaptive introgression from Denisovans); Heliconius aposematism: same morph in sympatry, different in allopatry.
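ABBA-BABA test: Patterson’s \(D = \frac{ABBA - BABA}{ABBA + BABA}\). Tests for introgression (\(ABBA \neq BABA\), e.g., an excess of ABBA) vs incomplete lineage sorting (\(ABBA = BABA\) in expectation, because ILS is random). Need four groups (three ingroup populations plus an outgroup), and a large part of the genome if not the whole genome.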
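A minimal sketch of counting site patterns for Patterson’s \(D\), assuming one aligned haploid sequence per group in the order P1, P2, P3, outgroup, with the outgroup allele taken as ancestral (A). The sequences are invented, and real analyses would use allele frequencies over many sites and a block jackknife to test whether \(D\) differs from zero.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Return (ABBA count, BABA count, D) from four aligned sequences.

    D is None when there are no ABBA or BABA sites (statistic undefined).
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep only biallelic sites
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        return abba, baba, None
    return abba, baba, (abba - baba) / (abba + baba)

# Hypothetical alignment (8 sites): four ABBA, one BABA, one BBAA, two invariant.
p1 = "AAAGAGAA"
p2 = "GGGAAGGA"
p3 = "GGGGAAGA"
og = "AAAAAAAA"
abba, baba, d = patterson_d(p1, p2, p3, og)
print(abba, baba, d)
```

A positive \(D\) here reflects the excess of ABBA sites, i.e., more allele sharing between P2 and P3 than between P1 and P3.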
Migrant tracts: chromosome segments that show migrant ancestry. Length distribution of chromosome segments will change over time due to recombination. Archaic ancestor SNPs will become smaller/rarer farther away from introgression event. Can compare increasing vs decreasing vs constant introgression. Can also identify private alleles (only found in one population): if due to de novo mutations, random distribution, but if past introgression, expect longer runs of identity in SNPs.
Is introgression the new allopatry?
Discussion: Introgression
2025-11-04
- Looks like Jensen et al 2024 did not start out with hypothesis/prediction, instead got the data and framed the story around it.
- PSMC: only need one individual to recreate the population history
11 Old variation is the new thing
2025-11-06
11.1 de novo variation
DMI can arise through point mutations, duplication, scrambling, retroposition, mobile elements, lateral gene transfer (viruses, parasites), fission and fusion, chimeric genes (origins in separate genes).
But this is a long waiting game: most mutations are deleterious and start at low frequencies (\(\frac{1}{2N_{e}}\)), and small effect mutations are genomically diffuse.
Geospiza difficilis has few genes with large effect on beak size, thus an ecological impact along with an easily identifiable morphological feature. When looking at the overall species tree, the genetic basis for species difference is older than the basis for species split itself. The current sister species have blunt or pointed beaks, so introgression with blunt type introduced the blunt beak in new lineages.
11.2 Standing variation
The diversity of genetic variants present in a population (can include de novo mutation).
Ancestral variation: genetic variation present in common ancestor of populations/species being compared. Forms the substrate for rapid divergence, and explains why speciation alleles often predate species divergence. No waiting, pre-screened for deleterious alleles, high starting frequency, small effect alleles clustered (so enriched for large-effect haplotypes/islands), co-adapted genes clustered (LD). Many sources: mutation, large \(N_{e}\), balanced polymorphisms (broad geo/ecological range species, e.g., balanced alleles for two different habitats), introgression (admixture variation).
11.3 Combinatorial view of speciation
de novo vs standing variation \(\sim\) gene flow vs sieve speciation.
Old hybridisation creates variation that can be sorted into new species.
Seehausen 2005: beginning of speciation event is not the origin but the reassembly of several old variants into novel combinations. Hybrid swarm: hybridisation upon invasion of new habitat producing lots of heritable variation for ecological traits. Syngameon: a complex of genetically weakly but ecologically highly distinctive species capable of exchanging genetic material (in context of radiations).
Implications
- Decoupling of mutation and speciation rates
- Facilitates evolution of LD in face of gene flow
- These two together: no wait when ecological opportunity arises
Things left to do:
- Rapid (inc. non-adaptive) radiations vs depauperate sister taxa
- Allele splits vs species splits
- Compare LD among species
- Locus size effects for old vs new variants
- Revisit old data (e.g., incongruent gene trees)
Discussion: Old variation
2025-11-11
- Mitochondrial DNA has way more drift for same eff pop size than nuclear, which is why you usually see discordance.
12 Microbes, microbiomes and speciation
2025-11-13
12.1 Species
First of all, only 1-2% of bacteria are culturable, so biased view (unculturable paradigm). Heuristic for bacterial species concept is 3% divergence at small ribosomal subunit (SSU) rRNA.
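The 3% heuristic amounts to a simple comparison of aligned sequences. This is an illustrative sketch only: it assumes the two SSU sequences are already aligned, skips gap columns, and treats "under 3% divergence" as "same species"; the function names and toy sequences are made up.

```python
def percent_divergence(seq_a, seq_b):
    """Percent of differing sites between two pre-aligned sequences (gaps skipped)."""
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * diffs / compared

def same_species_by_ssu(seq_a, seq_b, threshold=3.0):
    """Apply the rule of thumb: below `threshold` percent divergence = same species."""
    return percent_divergence(seq_a, seq_b) < threshold

def mutate(seq, positions):
    """Toy helper: substitute a different base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)

reference = "ACGT" * 25                              # 100 bp stand-in for 16S
close = mutate(reference, [3, 50])                   # 2% divergent
distant = mutate(reference, [1, 20, 40, 60, 80])     # 5% divergent

print(percent_divergence(reference, close), same_species_by_ssu(reference, close))
print(percent_divergence(reference, distant), same_species_by_ssu(reference, distant))
```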
Why SSU?
- Encoded by nucleus and organelles (e.g., 16S)
- Slow and fast (hour-hands, minute-hands)
- Conserved structure
- Practicalities
- Ancient and essential (ribosome)
- Highly integrated (>100 co-evolved cellular RNAs and proteins). Should also minimise LGT
- Lateral/Horizontal Gene Transfer: Movement of genetic material between organisms other than by vertical (parent-to-offspring) transmission.
Problems with SSU rRNA:
- Discordance with DNA hybridisation: overall rates of DNA similarity (reannealing) did not match with those of SSU rRNA
- Ecological differentiation within “species”: 3% rule of thumb doesn’t capture full picture.
- Descriptive, not a species concept
Reality of species, and how recombination works, is different in bacteria. There is no meiosis, so recombination is rare and not every single generation. On the other hand, recombination can happen with a wide array of organisms (promiscuous), and is highly localised, adding/dropping individual genes at very small levels. There can be transfer of non-homologous loci, and exogenous DNA is a major raw material for adaptation.
The basis of bacterial species cohesion involves extinction/recolonisation (metapopulation model, bottlenecks) and stabilising selection on ecotypes.
12.2 Communities
Are there bacterial communities? From different habitats? On/in different hosts? How tight are associations between species or between species and environment?
Yes, habitat-wise. Yes, host-specific. Tight association with host life cycle (transfer of symbionts/microbiota).
Microbiome: Microbes that live on and inside a defined habitat (e.g., an organism’s body). The human body is often said to have 10x more prokaryotic cells than eukaryotic ones (more recent estimates put the ratio closer to 1:1). e.g., C. difficile is a natural part of the microbiome, but in hospitals the antibiotics around affect the other bacteria that hold it down under normal conditions. Fecal transplants help with this more than antibiotics.
12.3 Holobiont
A host and its microbiota. Multicellular eukaryotes are not and never have been autonomous organisms.
Hologenome: the complete genetic content of the host genome, its organelles, and its microbiome. Host’s nuclear genome inherited a la Mendel, but microbiome is Lamarckian and often uniparental! So, sources of holobiont variation are:
- SNPs
- Recombination
- Gene loss and duplication
- Microbe loss or amplification
- Horizontal gene transfer: genes for digesting the red alga seaweed Porphyra, usually only found in bacteria that live on the seaweed, were transferred via HGT to the gut bacteria of people in Japan, especially mother-child (Hehemann et al 2010).
Implication for evolution is that microbes in microbiome can be equated with genes in genome. Role of host-associated microbes in speciation of host can be tested using microbe-free (antibiotic) versions of same old experiments. e.g., Brucker & Bordenstein 2013: Nasonia wasps and microbiome affecting viability.
Phylosymbiosis: microbial community similarity that mirrors the phylogeny of their host (Lim & Bordenstein 2020). Dunaj et al 2020 showed phylosymbiosis in ovaries and silk glands but not fat/midgut.
Cophylogeny: agreement of evolutionary histories of symbionts and their hosts (Mark Hafner LSU; not bacteria, but lice and the host gophers).
13 Macroevolution
2025-11-20
Evolution above the species level, studying:
- Patterns of diversity over time
- Mode and tempo of evolution
- Speciation and extinction
- Drivers of patterns of diversity over time
- Key innovations (tied to rates)
- Ecological/geographical circumstances
- Species selection
- Can natural selection occur at levels higher than species?
- Diversification of clades caused by heritable species-level traits
- Rabosky & McCune 2010?
- Problem was need trait that’s species-level, not individual-level. e.g., polymorphism, range size
13.1 Fossil record
Can help:
- Calibrate molecular clock
- Identify transitional forms
- Unveil morphological history
- Unveil ecological history
- Estimate extinction rates
- Mass extinctions are only 4% of all extinctions; others are all background extinctions
- Terminal Ordovician (444 Ma), Late Devonian (360 Ma), End Permian (251 Ma), End Triassic (200 Ma), Cretaceous-Paleogene (65.5 Ma)
- Biotic replacement (e.g., ecological): brachiopods vs bivalves
- Estimate “tempo and mode” of speciation
- Punctuated equilibrium (SJ Gould, Eldredge), aka “evolution by jerks”
- Most morphological change accumulates over short time, then stays quiet for long time (vs phyletic gradualism)
- (Gould influenced by Marxist ideas)
- Main counter: this pattern likely due to incomplete fossil record
- But examples of adaptive radiation, etc. suggest this may not be impossible at all
- So the more remarkable pattern is sometimes patterns of stasis (at least morphological) in fossil records, despite rapid change elsewhere. e.g. in Louisiana, horseshoe crab, gar: genetic evolution rates similar between horseshoe crab and hermit crab, so must be something else about organisation of genome.
However, ILS throws off fossil calibrations. Also, if fossils are old and you’re looking at recent events, bias (and converse).
13.2 Phylogenetic trees
Can study:
- Changes in no. of surviving lineages over time (net diversification rate = speciation rate - extinction rate)
- Speciation rate: Yule process (birth/speciation only), generates distribution of clade ages and sizes
- Extinction rate: contingent on sampling completeness, branch length estimation, constancy of rates
- In general, our methods are really good at inferring topology (branch order), but not good at nailing down branch lengths
- Detecting diversification patterns
- Sister clade comparison. e.g., Farrell 1998: beetles and angiosperm associations.
- Lineage-through time (LTT) plots
- How many lineages exist at a specific time, over a period of time (e.g., Glor 2010)
- Test: slow constant species accumulation vs adaptive radiation
- Bromeliads: C3-C4 vs tankless–tank-forming
- Adaptive radiation: short early branches and long later branches
- Antarctic icefish: anti-freeze glycoproteins (hyperdiver) prevent growth of ice crystals
- BAMM: Bayesian Analysis of Macroevolutionary Mixtures (Rabosky 2014). Like cluster analysis, but this identifies how many different groups of differential diversification rates exist in your tree.
- But the likelihood ignores rate shifts on extinct lineages, which biases estimates of extinction probabilities. Estimates are also too sensitive to the CPP (compound Poisson process) prior used for the distribution of rate shifts over the tree (Moore et al 2016).
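The net diversification rate and lineage-through-time ideas in this section can be sketched with a constant-rate birth-death simulation. This is a toy, not any published method: the function names and rates are made up, and it counts lineages alive at each moment rather than a reconstructed phylogeny of survivors (the two coincide only under the Yule process, where extinction is zero).

```python
import math
import random

def expected_lineages(n0, speciation, extinction, t):
    """Expected lineage count under constant rates: N0 * exp((lambda - mu) * t)."""
    return n0 * math.exp((speciation - extinction) * t)

def simulate_ltt(speciation, extinction, t_max, n0=1, seed=None):
    """Simulate a birth-death process; return [(time, lineages alive), ...]."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    ltt = [(t, n)]
    while n > 0:
        total_rate = n * (speciation + extinction)
        if total_rate <= 0:
            break  # nothing can happen
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_max:
            break
        if rng.random() < speciation / (speciation + extinction):
            n += 1  # speciation
        else:
            n -= 1  # extinction
        ltt.append((t, n))
    return ltt

# Yule process (speciation only) with hypothetical rate 0.5 per unit time.
yule = simulate_ltt(speciation=0.5, extinction=0.0, t_max=5.0, seed=1)
print(len(yule), yule[-1])
print(expected_lineages(1, 0.5, 0.0, 5.0))
```

Plotting log lineage count against time from such output gives an LTT curve; under constant rates it is roughly linear, whereas an early-burst radiation would rise steeply at first and then flatten.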