Class notes
1 Overview & species concepts
2025-08-26
1.1 Course overview
Follows Coyne & Orr 2004(?) broadly. This was a highly impactful book, very timely, and heralded the dawn of the genomic era. The period of roughly 2012–20 saw incredible growth in our understanding of genomes and speciation. Species concepts are important, especially if your thesis includes the term “speciation”, but we won’t spend too much time in the class on these debates on definition.
Till exam 1: reading fundamentals of speciation questions. Post-exam-1: stick with basic questions but delve into what genomics has done to advance understanding. Word and sentence counts in assignments need to be followed strictly; the aim is to get used to condensing important information.
1.2 Concepts of species
Species are not just for bookkeeping; different species are actually different, and they do different things. Cryptic species are an example of why (community) ecologists should care about work of population geneticists.
If one could conclude as to the nature of the Creator from a study of creation it would appear that God has an inordinate fondness for stars and beetles.
— JBS Haldane, on creationism and the sheer number of beetle species
Early ideas of species concepts can date back to, as with many things, Aristotle. His notion of a perfect “essence” asserted the existence of a perfect form for each species, variations from which are imperfections in its actualisation. The essence of a species was fixed, but involved descent as the crucial factor tying different individuals in a species together. This implied a fundamental difference between inter- and intraspecific variation.
1.2.1 Biological species concept
The Darwinian revolution insisted that variation is not just noise, but rather the core (nuisance vs crux). Everything shares common descent, variation is at the heart of evolution, and there is no true difference between inter- and intraspecific variation. Darwin’s way of distinguishing variants was using the morphological gap criterion (few intermediates), although his concept can be applied to ecology, behaviour or genetics too.
Ernst Mayr, starting in the 1940s, adapted Darwin’s ideas to eventually come up with the biological species concept, where species are defined as interbreeding natural populations, reproductively isolated from others. Before him, Poulton (1904) had similar ideas, with the notion of “syngamy” (interbreeding) as the true meaning of species. (As did Lotsy and Dobzhansky.)
Some adherents of BSC include those who worked extensively on model organisms like Drosophila in the lab (Coyne, Orr, Noor).
Impact of BSC
- Species now defined by process important to their own maintenance (biological, as opposed to scientist sitting in a museum describing from specimens).
- Allowed, and drew attention to, concepts of gene flow and isolating mechanisms.
- Provided criterion for diagnosing cryptic taxa.
Problems of BSC
Less universally applicable than Darwinian concept:
- Provides no useful insight regarding large-ranged species having several geographically isolated populations (difficult to check interbreeding in allopatry).
- Hybridisation is rather common, even between non-sister taxa. This might not be just noise!
- Cannot deal with asexual organisms that have divided into obviously recognisable populations/groups.
- Circularity: diagnosis of species and cause of speciation are the same.
Also, reproductive isolation not likely adaptive, and prezygotic compatibility is the true reality underlying species—isolating mechanisms a result rather than a cause of species separateness (Paterson 1985). Recognition concept vs isolation concept, common fertilisation system.
1.2.2 Phylogenetic species concepts
1.2.2.1 Hennig 1966: Cladistics/monophyly-based
Hennig brought about the discipline of cladistics, which eventually evolved into the modern PSC, where a clade means a monophyletic group. Therefore, this species concept is based on monophyly, i.e., history. Problems:
- Can lead to considering very small isolated groups (due to focus on monophyly)
- Forced to leave behind paraphyletic taxa
- Contingent on quality of trees! Early on, mostly mitochondrial DNA.
- Species tree vs gene tree discordance.
- Maddison 1997 showed how this discordance can arise (short speciation event times, wide branches?, related to incomplete lineage sorting).
- Can’t rely blindly on numbers from gene trees.
1.2.2.2 Cracraft 1989: Irreducible cluster/diagnosably distinct
Focus on irreducible clusters of organisms, diagnosably distinct from others, and among which there is a parental pattern of ancestry and descent. Most proponents of this were from a museum background. Recently, this concept has started to make inroads into zoological taxonomy, raising subspecies previously demoted (under polytypic species concept) to species level.
1.2.2.3 Modern phylogenetic approaches
- Species delimitations using multi-locus coalescent (Yang & Rannala 2014); using Bayesian posterior to compare single-species or two-species models based on genomic data
- Problems: Same as Bayesian methods in general—importance of prior info on phylogenetic relationships.
- Barcode of Life (Paul Hebert): driven by biodiversity crisis, don’t need to wait for taxonomic expertise, can instead just collect and insert in barcoding pipeline.
- Problems: For many species, there is no clean gap in barcodes to distinguish intra- and interspecific variation. So, useful for identifying taxa from samples, but controversial for describing new species.
- Also, single gene taxonomy, gene-vs-species tree, introgression of mtDNA
1.2.3 van Valen 1976: Ecological species concept
Places greater focus on stabilising selection than gene flow. This is an especially true/useful approach for bacteria: different bacteria can occupy very different niches and have no gene flow, but act/behave the same in terms of where they are found, physiology, etc.
1.2.4 De Queiroz 1998: General lineage concept
Focuses on evolutionary independence, and considers species as segments of population-level variation?
1.2.5 Alternative/dissenting species concepts
Species are not real!
Some perspectives/justifications for this view:
- Taxonomic perspective: subspecies may be elevated to species, but still grade into each other so don’t hold much meaning
- Consider populations as evolutionary units, not species
- Spatial and temporal forms that are not easy to tell apart as species vs not.
- Genetic markers: gene flow too low to maintain cohesion
- Phenetics
1.2.6 Operational definition
Hellberg’s operational definition is based on genotypic cluster criterion (Mallet 1995). The emphasis is on how combinations of traits cluster separately or together between groups. What’s critical is the presence of genotypic gaps between local populations, so this concept makes sense for sympatric populations but not so much for allopatric ones.
It can be pictured as a bimodal distribution of traits. This view relies on Bayesian stats, using “assignment tests” and so on.
2 Geography of speciation
2025-08-28: Allopatric and sympatric speciation.
Allopatric speciation has always been popular, and has been the preferred model of speciation until recently, whereas sympatric speciation has kinda been the underdog.
2.1 Allopatric speciation
Jordan’s Law (1907): David Starr Jordan was the founding president of Stanford University. His model system was Indo-Pacific wrasses, and he had ideas about allopatric speciation well before Ernst Mayr. He believed: given any species in any region, the nearest related species is not likely to be found in the same region but in a neighbouring district separated from the first by a barrier of some sort. A demonstrative example was squirrels in the Grand Canyon.
Ernst Mayr was a big proponent of allopatry, and specifically peripatry. He emphasised the role of drift in populations. Model systems were birds of paradise and Dicrurus in SE Asia.
This has been the preferred model for multiple reasons:
- Easier: no gene flow to contend with
- Lots of opportunities in nature for allopatry
- Lots of evidence:
- Young sister species tend to be allopatric
- Concordance between geography and species borders
- Extension of above: Geographic coincidence of multiple separations of sister species and/or hybrid zones (one barrier/feature showing up again separating multiple groups of species)
- Increase in reproductive isolation in allopatry
However, even in this model, not incorporating explicit selection (i.e., relying only on drift/mutation) into calculations of species generation times results in very long times.
2.2 Sympatric speciation
2.2.1 Problems
Genetic problem: Felsenstein (1981) argued for the problem of recombination: in sympatry, recombination breaks up associations between genes for divergence and genes for mating, so it has to be avoided by mating differentially. (Felsenstein also introduced the bootstrap and maximum likelihood methods to phylogenetics, and has 15 papers with >1000 citations.)
Ecological problem: Ecological patterns like coexistence, competitive exclusion (Gause’s Rule) occur because in sympatry need to live differently enough from others.
2.2.2 Solutions
Disruptive sexual selection; disruptive natural selection (mating is linked to habitat, and the fewer genes [of large effect] the better). These are now very much linked to the concept of magic traits, for which there has been much evidence.
2.2.3 Outstanding problem
Additional requirements for sympatric speciation are that they need to be sister species, and we need to be sure that past allopatry was unlikely. This has been more difficult to show, and many popular examples fail here.
Lake Victoria cichlids had crazy sympatric speciation possible due to a combination of factors like trophic divergence, sexual selection, and microallopatry. However, there is controversy around the idea that they are only 14,000 years old—some people don’t think the lake fully dried out.
In the case of Rhagoletis fruit flies, which have sympatric differentiation in host plant choice (between hawthorn and apple; GL Bush), it turned out that they were actually allopatric at some point and only the completion of the speciation process happened in sympatry (Feder et al 2003).
Similarly, in the case of stickleback ecomorphs in postglacial lakes, secondary contact drove character displacement and resulted in sister species, even though completion was in sympatry (Rundle & Schluter 2004).
Thus, more and more evidence has piled up to transform the dichotomy from sympatric vs allopatric speciation, to primary sympatric divergence vs secondary contact introgression.
Discussion: Geography of speciation
2025-09-02
- Doesn’t mean old school (primary sympatric divergence) is fully obsolete. In fact, palm paper is an example. Hellberg also believes it should be one model under consideration.
- What % hybridisation is relevant gene flow? 5% is kinda high still (1 in 20).
- Each data point in a genetic structure plot (k-means clustering) is the allele type expressed at each biallelic locus (genotype).
- Differentiation is different from variation
3 Prezygotic isolation
2025-09-04
Reproductive isolation was a hot topic in evolution after WWII and well into the 1990s. Studies focused on two kinds of reproductive barriers (prezygotic and postzygotic), and eventually this dichotomy was paralleled in the scientists who studied them (field vs lab [e.g., Drosophila folks] respectively).
Prezygotic isolation can be considered in two broad, but not mutually exclusive, categories: ecological (temporal, pollinator, habitat) and sexual (behavioural, mechanical, gametic) isolation.
3.1 Evidence for ecological prezygotic isolation
Parallel speciation: stickleback ecomorphs in post-glacial lakes; ecotypes that are geographically isolated still reproduce
Magic traits: ecological gene under selection pleiotropically causes assortative mating (affects reproductive isolation). Heliconius butterflies of Amazonia; cichlids in E Africa lake (depth of lake affects colouration favoured); feather lice & pigeon hosts (Villa et al 2019 PNAS; check videos).
Strong habitat selection: walking sticks in California, immigrant inviability (abiotic-dispersed organisms)
Isolation-by-distance/adaptation: geographic distance \(\longrightarrow\) genetic distance OR ecological/adaptive divergence \(\longrightarrow\) genetic distance
3.2 Evidence for sexual prezygotic isolation
Runaway sexual selection: positive feedback loops
Sexual antagonism: males and females may have different optima for same phenotype
Mating isolation: genitalia differentiation in beetles, moths, etc.; lock-and-key also in pollinator/flower compatibility.
Gametic isolation is a special case which is post-mating but pre-zygotic (reproductive tracts, bodily fluids). Conspecific sperm precedence (mating order); acrosome reaction for vitelline envelope dissolution around eggs in marine gastropods.
Discussion: Prezygotic isolation
2025-09-09
- Sentences in Science are usually more assertive/confident than would be otherwise.
- GWAS: genome-wide association studies.
- Sex chromosomes limit damage when dangerous alleles show up (doesn’t work same in both sexes, sexually antagonistic)
4 Postzygotic isolation
2025-09-11
Barriers after the zygote is formed, which can act via reduced hybrid viability, reduced hybrid fertility, hybrid breakdown, etc.
This topic has received much less focus than prezygotic isolation, because of ascertainment bias: once prezygotic isolation is completed, then cannot see postzygotic isolation. Also, other things like reinforcement strengthen prezygotic isolation.
However, there are many reasons to consider postzygotic isolation. First, in many ways, ecological isolation is a form of postzygotic isolation. Moreover, even in cases where hybrid formation occurs, the persistence of distinct species means there is something decreasing the fitness of these hybrids. Also, intrinsic postzygotic isolation may become stronger at \(\ge\) F2 crosses, while extrinsic postzygotic and prezygotic isolation become weaker.
Darwin had a conundrum about contradictory ways in which evolution worked: on the one hand natural selection operates, but on the other hand unfit hybrids are produced. His solution for this was that sterility arises as an incidental byproduct of “unknown differences”.
Much of the interest in postzygotic isolation is with respect to intrinsic isolation.
4.1 Extrinsic postzygotic isolation
External to the individual/organism, i.e., non-developmental. This includes things like habitat isolation, where say niche of hybrids falls somewhere between parental lineages but fitness of hybrids is lower than that of either parent. But in some cases, hybrids may have higher fitness in intermediate habitats, which would explain the increase in hybridisation in the first place.
In stickleback crossing studies, Rundle 2002 found that F1s were bad (relative to parents), but F1-parent backcrosses yielded some fitter individuals (proportional to how much parental heredity the cross had).
Other times, hybrids could have higher disease susceptibility, or intermediate traits that are less attractive (sexual selection).
4.2 Intrinsic postzygotic isolation
Internal to the individual/organism, so quite simply, developmental defects lead to hybrid sterility or inviability.
It can be meiosis issues related to ploidy levels, where hybrids like F1s simply cannot perform meiosis. Or chromosomal rearrangements/inversions like in Australian morabine grasshoppers (White 1978). However, many inversions surprisingly have no effect on heterozygote fitness, because recombination can still happen in other regions of genome.
Other mechanisms include endosymbiotic infection risk for hybrids, or cytoplasmic incompatibilities especially relating to parasites (e.g., intracellular parasites like Wolbachia: when uninfected female flies are crossed with infected males, many/all progeny die, but infection transmitted from mother so those that are born from infected females are infected). There can even be mitonuclear incompatibilities (i.e., between mitochondrial and nuclear genetic material), famously in copepods (Ron Burton), so much so that Geoffrey Hill even introduced a mitonuclear compatibility species concept.
Allelic incompatibilities are a major way intrinsic postzygotic isolation can occur. The original definition of Dobzhansky-Muller Incompatibility focused on alleles at different loci, arguing that low hybrid fitness is a byproduct of genomes that are geographically isolated. The newer, more specific definition emphasises epistasis (effects of an allele at one locus depend on alleles present at other loci), arguing that negative epistatic interactions occur between alleles of different loci with different evolutionary histories. Some assumptions of this theory are that there occurs at least one pair of allele changes, that most incompatibilities are due to derived alleles, and that the number of incompatible combinations snowballs over time.
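The snowballing assumption can be illustrated with a simple count. The sketch below assumes, purely for illustration, that every pair of diverged loci has the same small chance of being incompatible. The function name and the parameter `p_incompat` are hypothetical and not from the lecture. Under that assumption the expected number of incompatibilities grows with the number of pairs of substitutions, i.e., roughly with the square of divergence rather than linearly.

```python
from math import comb


def expected_incompatibilities(k_substitutions: int, p_incompat: float) -> float:
    """Expected number of pairwise incompatibilities after k substitutions.

    Illustrative assumption: every pair of diverged loci is incompatible
    with the same small probability p_incompat.
    """
    # Number of distinct pairs among k diverged loci is k choose 2.
    return p_incompat * comb(k_substitutions, 2)


if __name__ == "__main__":
    # Doubling the number of substitutions roughly quadruples the expectation.
    for k in (10, 20, 40, 80):
        print(k, expected_incompatibilities(k, p_incompat=0.001))
```

The value of `p_incompat` here is made up; only the faster-than-linear shape of the curve matters for the argument.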
4.2.1 Haldane’s Rule
Rule about sex chromosome incompatibilities; it is exceptional among biological rules, because there are very few exceptions to it. Basically: in offspring of two different races, if one (and only one) sex is messed up (absent, rare, or sterile) then that is the heterogametic sex (XY or ZW).
NOT caused by: incompatibilities between X- and Y-linked genes; dosage compensation (compensate based on importance of stuff on X); chromosomal arrangement.
Possible causes:
- Dominance: in XY species, hybrid males are affected by all X-linked genes involved in genic incompatibilities, whereas hybrid females are affected only by dominant X-linked genes
- Faster-males: in XY species, incompatibilities affecting XY hybrids accumulate more quickly (but exceptions to this, so doesn’t satisfactorily explain)
- Faster-X: faster drift at population levels because more X than Y in population (thus this mechanism interacts with dominance)
- Meiotic drive: distorter alleles that distort Mendelian ratios to their own advantage; suppressors evolve to stop them. When two allopatric populations each independently evolve a driver and suppressor, suppressed drivers can become unmasked.
Discussion: Postzygotic isolation
2025-09-16: (absent)
5 Reinforcement
2025-09-18
Reinforcement has been explored in the context of driving enhanced prezygotic isolation in sympatry. It has traditionally been defined as when the formation of maladaptive hybrids selects for enhanced prezygotic isolation, where hybrids are maladaptive if their fitness is non-zero but lower than either parent’s. The late 1980s saw renewed interest in exploring such questions of reinforcement.
However, there was also a concurrent realisation that enhanced prezygotic isolation in sympatry could arise not just due to selection against hybrids, but also due to ecological or other factors. This brought focus to reproductive character displacement, where selection for enhanced prezygotic isolation occurs in order to reduce wasted reproductive effort between RI species. For instance, acoustic space divergence in frogs in TX/MX occurs between not just close congeners but also distant relatives.
Many early popular examples of reinforcement are actually of reproductive character displacement, because the hybrid has zero fitness (no hybrid formation in sympatry). Reviewed in Butlin 1987. Other cases, like the Collared and Pied flycatchers, do conform to strict reinforcement expectations, but we now know that there is more going on there.
5.1 Experimental tests for reinforcement
Reviewed in Rice & Hostert 1993.
- Kill-the-hybrid: let them mate but kill hybrids and see if displacement develops
- But this is actually evidence of RCD rather than of reinforcement
- Seemingly high success rate, but publication bias?
- Disruptive selection on arbitrary trait: kill everything but high and low values of certain neutral trait (like bristles in Drosophila)
- Bristles are not really neutral/arbitrary, have important function
- On the other hand, these results were never replicated in further studies
5.2 Comparative studies of reinforcement
Can ask the following questions:
- Is prezygotic isolation greater in sympatry?
- (Difficult to test) Are they forming hybrids?
- (Difficult to test) What is the genetic basis? (e.g., does having many genes of large effect at a few nearby loci facilitate more reinforcement (magic traits)?)
Lots of work in this area, including on Drosophila from Dobzhansky & Koller 1938 (western US) to Matute 2010 (Sao Tome & Principe), and bindin protein in sea urchins. In the case of the Ficedula flycatchers, male plumage is driven by female choice (because females in sympatry are not different, only males are), and this is just females being choosier than males (fewer eggs than sperm). However, it is not fully clear whether the brown in Pied males evolved to enhance isolation from sympatric Collared, or whether it’s just female-mimicry in order to reduce intermale aggression. This is the “more going on” in this case.
The genetic basis of reinforcement has been reviewed in Garner et al 2018, and consists of genetic divergence both within species (allopatric vs sympatric contrasts) and between species (greater in sympatry, lower in allopatry). One example is flower colour in Phlox wildflowers, where colour is driven by two loci (intensity and hue), but selection happens on the intensity one.
5.3 Objections to reinforcement
- Need to reconcile selection and recombination (antagonism, similar to that in case of sympatric speciation), which gets worse with more interacting genes
- Race against extinction: Have to be ecologically distinct to survive in hybrid zone, BUT not enough, need to stick around in hybrid zone (asking for a lot)
- Swamping effect: genes for reinforcement must be favoured only in sympatry; genes from outside hybrid zone may swamp out (again, asking a lot)
5.4 Solutions to objections
- Liou & Price 1994: sexual selection can promote reinforcement in hybrid zone
- Kelly & Noor 1996: selection for greater female discrimination can promote reinforcement; “one-allele” solution (same allele favoured in both species); not just sexual selection but strength of selection (turn up volume/intensity)
5.5 Bottom line
This is a theoretically possible mechanism, but in most cases it’s asking for a lot. There is a lot of evidence for greater divergence in sympatry, but most of these cases also have alternative explanations.
The best bet for such a mechanism would involve female choice resulting in heavier divergence in sympatry. This means that systems with greater sexual selection may be more likely to have reinforcement.
6 Wait, what? Background primer
2025-09-25
6.1 Historical demography of speciation
Genetic studies of speciation often infer the species tree from gene trees. Divergence time for the genes is always older than for the species. Trees made with different genes may be different and have disagreements (is this a methods issue, or the actual population demographics, or are they important factors driving speciation itself?). Moreover, gene trees can be complicated by mixing after divergence. So all this variation needs to be accounted for.
Historical demography helps us infer about population size, changes in population size, divergence genetics, and isolation and hybridisation. In this demographic context, certain terms have specific meanings. Population size is considered an indication of genetic variation, with larger populations holding more genetic variation. Genetic diversity is commonly denoted as \(\theta\), but there are multiple metrics used.
6.2 Measures of genetic diversity
\(\theta = 4 N_e u\) is a simple metric representing expected genetic variation, where \(N_e\) is the effective population size (size of an idealised population that would show the same degree of drift as observed), and \(u\) is substitution rate (sometimes aka mutation rate).
Watterson’s \(\theta_w\): Ignores frequencies, and is strongly affected by rare alleles. Based on the number of segregating sites (polymorphisms) in a DNA sample.
Tajima’s \(\theta_\pi\): Based on frequency so rare alleles don’t matter. Observed nucleotide diversity: average number of differences between pairs of sequences in a DNA sample.
Tajima’s \(D\) gives an overall measure by comparing the two estimates above: it is the difference \(\theta_\pi - \theta_w\), scaled by its standard deviation. Premise is that under neutrality (observed diversity matching expected diversity), \(D = 0\). \(D \lt 0\) means more rare alleles than expected (as under negative selection), suggesting population expansion after a bottleneck (signal at multiple loci) or a selective sweep (signal at a single locus). During a selective sweep, when \(D\) drops, linkage disequilibrium rises. Biston betularia is a classic example of a selective sweep. \(D \gt 0\) means few rare alleles, suggesting balancing selection (e.g., malaria resistance) or overdominant selection. If you follow a certain selective pressure over time, \(D\) would go from negative to positive.
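The three statistics can be computed from a small alignment. The sketch below uses the standard Tajima (1989) normalising constants and returns values per sequence, not per site. The function name `diversity_stats` and the toy alignments in the usage lines are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (theta_w, theta_pi, tajimas_d) for equal-length aligned sequences.

    Values are per sequence, not per site. D is None when there are no
    segregating sites, because it is undefined in that case.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences for Tajima's D")
    # S: number of alignment columns with more than one state.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # theta_pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    theta_pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1  # Watterson's estimator
    if s == 0:
        return theta_w, theta_pi, None
    # Constants for the variance of (theta_pi - theta_w).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (theta_pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, theta_pi, d


if __name__ == "__main__":
    # Only singletons (rare alleles): D comes out negative.
    print(diversity_stats(["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]))
    # Two intermediate-frequency haplotypes: D comes out positive.
    print(diversity_stats(["AA", "AA", "TT", "TT"]))
```

The two toy alignments mirror the cases in the text: an excess of rare alleles pulls \(\theta_\pi\) below \(\theta_w\), while intermediate-frequency alleles push it above.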
Mismatch distribution (Rogers & Harpending 1992): quantify mismatches between individuals in a population for each locus, and visualise like a histogram. A clustered peak shows sudden high mutation event; so we can estimate divergence time (e.g., bottleneck in exploding population). This requires examining non-recombining bits of DNA, and earlier mitochondrial DNA was used for this, but nowadays it’s the Y chromosome.
Site/allele frequency spectrum: distribution of allele frequencies for all segregating sites in a sample of DNA sequences. Certain demographic scenarios like bottleneck would result in allele frequency distribution (using combined polymorphism data) close to 0 for 2nd most common allele (i.e., minor allele in biallelic). We can compare frequency with reference of outgroup (unfolded) to infer derived vs ancestral.
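A minimal sketch of how folded and unfolded spectra could be tallied from an alignment. It assumes equal-length sequences and counts only biallelic sites. The function names and the toy data are hypothetical.

```python
from collections import Counter


def folded_sfs(seqs):
    """Folded SFS: entry i counts biallelic sites whose minor allele occurs i times."""
    n = len(seqs)
    sfs = [0] * (n // 2 + 1)
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) != 2:
            continue  # skip invariant and multi-allelic sites
        sfs[min(counts.values())] += 1
    return sfs


def unfolded_sfs(seqs, outgroup):
    """Unfolded SFS: entry i counts sites whose derived allele occurs i times.

    The outgroup state at each site is taken as ancestral (an assumption).
    """
    n = len(seqs)
    sfs = [0] * (n + 1)
    for column, ancestral in zip(zip(*seqs), outgroup):
        counts = Counter(column)
        if len(counts) != 2 or ancestral not in counts:
            continue  # cannot polarise this site with the outgroup
        sfs[n - counts[ancestral]] += 1
    return sfs


if __name__ == "__main__":
    sample = ["AAAA", "TAAA", "ATAA", "AATT", "AATT"]
    print(folded_sfs(sample))
    print(unfolded_sfs(sample, outgroup="AAAA"))
```

Folding discards the derived vs ancestral information, which is why the unfolded version needs the outgroup sequence.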
Synteny: conservation of blocks of (gene) order between two sets of chromosomes being compared.
6.3 Isolation-migration model for historical demography
Developed by Jody Hey and others, as a way to address the problem of discordance between histories of gene and species trees. It is based on coalescent theory, where all the lineages present today coalesce going back to one common ancestor. Using only the present-day observed lineages helps because we can ignore others from past who didn’t make it (simplifies compute). Smaller (pop. size) and younger lineages would coalesce faster. We can also get gene history or gene genealogy along the same steps.
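The claim that smaller populations coalesce faster can be checked with a minimal sketch of the standard neutral coalescent. It assumes a single diploid population of constant effective size, so it is not the IM model itself. While \(k\) lineages remain, the waiting time to the next coalescence is taken as exponential with mean \(4N_e / (k(k-1))\) generations. The function names are hypothetical.

```python
import random


def expected_tmrca(n_lineages, n_e):
    """Expected generations back to the common ancestor of n sampled lineages."""
    return sum(4 * n_e / (k * (k - 1)) for k in range(2, n_lineages + 1))


def simulate_tmrca(n_lineages, n_e, rng):
    """Draw one time to the most recent common ancestor, in generations."""
    t = 0.0
    for k in range(n_lineages, 1, -1):
        rate = k * (k - 1) / (4 * n_e)  # coalescence rate with k lineages
        t += rng.expovariate(rate)
    return t


if __name__ == "__main__":
    rng = random.Random(1)
    for n_e in (1_000, 10_000):
        sims = [simulate_tmrca(10, n_e, rng) for _ in range(5_000)]
        print(n_e, expected_tmrca(10, n_e), sum(sims) / len(sims))
```

The expected time scales linearly with \(N_e\), which is the sense in which smaller populations coalesce faster.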
This can be used to test for no isolation between populations, by looking for \(t = 0\) (divergence time). Similarly, it can test for no migration (genetic exchange), if \(m_1 = m_2 = 0\). On the other hand, it can also estimate values for these parameters for conditions of isolation and migration.
6.4 Other models
The IM model is very simplistic, mainly because it assumes just one population size outside of divergence, and migration rate is constant. One alternative is DaDi (Diffusion Approximations for Demographic Inference), which allows more than one value for each parameter. However, this is a different framework, because it works by model comparison between numerous fixed (prespecified) models. So it is also important to remove very unlikely models from the starting options.
Model selection is typically done using AIC (\(AIC = 2k - 2l\)) where \(k\) (complexity) is number of parameters and \(l\) (fit) is log-likelihood of data under model. In essence, penalised likelihood is a parsimonious compromise. Sometimes, model averaging can be used.
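A minimal sketch of AIC-based comparison, using the formula above plus Akaike weights (a common basis for model averaging). The model names and log-likelihoods are made up for illustration and do not come from any real analysis.

```python
from math import exp


def aic(n_params, log_likelihood):
    """AIC = 2k - 2l, with k parameters and log-likelihood l."""
    return 2 * n_params - 2 * log_likelihood


def compare_models(models):
    """Rank models by AIC.

    models maps a name to (number of parameters, log-likelihood).
    Returns a list of (name, AIC, delta AIC, Akaike weight), best first.
    """
    scores = {name: aic(k, lnl) for name, (k, lnl) in models.items()}
    best = min(scores.values())
    # Relative likelihood of each model given the best one.
    rel = {name: exp(-(score - best) / 2) for name, score in scores.items()}
    total = sum(rel.values())
    rows = [(name, scores[name], scores[name] - best, rel[name] / total)
            for name in scores]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical fits for three prespecified demographic models.
example_models = {
    "isolation_only": (3, -1050.0),
    "isolation_migration": (5, -1042.0),
    "im_with_growth": (7, -1041.5),
}

if __name__ == "__main__":
    for row in compare_models(example_models):
        print(row)
```

In this toy case the most complex model fits slightly better but not enough to pay for its two extra parameters, which is the parsimonious compromise in action.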
6.5 Tests for selection
Either Tajima’s \(D\), or codon-based metrics like \(\omega = dN/dS\) (non-synonymous substitution per non-synonymous site vs synonymous), can be used to infer selection for new variation, i.e., positive selection in the population. People involved include Nielsen, Yang.
Neutrally, there should be more N than S (first two positions of codon vs only third). But there are methods to standardise this to 1, simplifying the other cases. Purifying selection is indicated by \(\omega \lt 1\), meaning most changes in alleles are being selected against. Diversifying (positive) selection is indicated by \(\omega \gt 1\), meaning the system wants actual changes to proteins, not benign changes. In this case (if \(\omega \gt 1\)) we can further use Bayesian analysis to see into which class codons go.
7 Genetic basis of speciation I
2025-10-02
How many, effect size
- What we’ve seen so far could have been based on only distributions, crosses, etc.
- Nowadays much more based on genomics, and at much greater scale than possible before
- Questions in genetics of speciation
- How many genes & changes (genes of large effect)?
- Structural or regulatory?
- cis (close) or trans (far away across genome)?
- Special role for inversions? (as a protective measure against recombination for divergent genes)
7.1 How many genes?
- Before genes, how many changes?
- “many slight differences … afford materials for natural selection” — Darwin
- gradualism (many genes of small effect)
- Followed by biometricians (agricultural background, working to improve crop growth, size, etc.)
- Douglas Falconer, Introduction to Quantitative Genetics
- T H Huxley on the other side, few genes of large effect: “Nature does make jumps now and then”
- Also influenced by saltationism
- Extreme, Goldschmidt: “hopeful monsters” (e.g., Diaz et al 2020, chameleon and gecko embryo development, two-digit could have been due to single mutation as seen in the gecko) (very aberrant mutation \(\longrightarrow\) monstrosity)
- Also Mendelian geneticists: looking at data of pretty big traits (diagnosable)
- This divide continued till start of 20th century
- Modern Synthesis, starting 1930s (–1950s): reconciliation of Mendelian inheritance and Darwinian natural selection
- Julian Huxley (grandson of TH), brother (Aldous), half-brother (Andrew, action potentials): invented “cline”, co-founded WWF, UNESCO
- Fisher (quantitative genetics; farm data)
- Wright (effective pop. size; farm data)
- Mayr (allopatric speciation)
- Dobzhansky (Drosophila)
- Stebbins (plants)
- G G Simpson (fossils, macro-patterns)
- Evolution based on Mendelian genes, involving many alleles of small effect (Fisher, quantitative genetics)
- (more lean towards gradualism)
- Evidence:
- People believed as gospel, without looking at WILD populations. (Evidence was good in crops, farm animals, etc.)
- Also linked to turning blind eye to true ecology
- Orr & Coyne (1992): looked at studies and how many genes involved in adaptations
- Gavrilets: theoretical models of evolution of RI showed fewer genes most favoured
- Yeaman & Whitlock (2011): fewer genes of large effect, and genes of smaller effect tend to cluster in genome over time
- How to identify the genes responsible for speciation?
- Forward genetics
- Start with trait (in crosses), work down to genes
- Candidate genes
- Start with knowledge/fundamental idea of genetic basis for trait in other taxa
- Reverse genetics
- Start with genomic data (ideally whole genome), look for signals of adaptive changes, then work back to see what traits they encode for
7.2 Forward genetics
- Association mapping: get sequences and trait values across many individuals, look for non-random associations
- Look for correlations between markers and particular trait
- Recently emerged traits will be in LD with regions around them
- Can use AFLPs, microsatellites, SNPs
- Genome wide association mapping (GWAS, like Schield et al 2024):
- Issues:
- If goal not medical but evolutionary, this will give biased result for test for genes of large effect (drown out more genes of small effect)
- Affected by extent of genomic LD: easier to pick up some region if high LD, but low precision especially when high recombination
- Quantitative Trait Loci mapping (QTL)
- Create recombinant populations (crossing between extremes for F1 (uninformative) then F1-parent backcrosses), correlate trait values with distributed genetic markers
- Requires pedigree info (ideally 2 generations of crosses of inbred line), marker map, phenotypes associated with genotypes, need large families of individuals, low power to detect many loci of small effect
- Inbred lines important so that extremes you identify are certainly homozygous
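The core step shared by association and QTL mapping (correlating trait values with genotypes at each marker) can be sketched as below. This is a toy illustration only: the marker names, genotype coding (0/1/2 copies of one allele) and trait values are made up, and a plain Pearson correlation stands in for the likelihood-based tests that real mapping software uses.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation at this marker (or in the trait): uninformative
    return sxy / math.sqrt(sxx * syy)


def scan_markers(genotypes, trait):
    """Return {marker: r} for every marker, genotype coded as 0/1/2."""
    return {marker: pearson_r(calls, trait) for marker, calls in genotypes.items()}


# Hypothetical data: six individuals, three markers.
trait = [1.0, 1.2, 2.1, 2.0, 3.1, 2.9]
genotypes = {
    "m1": [0, 0, 1, 1, 2, 2],  # tracks the trait closely
    "m2": [0, 2, 1, 0, 1, 2],
    "m3": [1, 1, 1, 0, 2, 1],
}

scores = scan_markers(genotypes, trait)
best_marker = max(scores, key=lambda m: abs(scores[m]))
print(best_marker, round(scores[best_marker], 3))
```

With few individuals, only a marker of large effect (like the toy `m1`) stands out, which is the low-power problem for small-effect loci noted above.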
7.3 Candidate genes
- Knowledge of genes function applied to new system
- e.g., MC1R for melanin in birds
- Issue: Not open to all variation out there, now where do you go? (close-ended)
7.4 Reverse genetics
- Availability of full sequences, large
- Genome scans, outlier analyses
- Search data for footprints of selection
- Outlier analyses:
- Allele frequency spectrum (AFS)
- Positive selection (\(\frac{D_N}{D_S}\))
- \(F_{ST}\) (Wright): proportion of total genetic variation that is attributable to differences among subpopulations
- Survey differentiated pops for many loci, calculate average overall \(F_{ST}\), simulate expected range (neutral expectation for variation across loci), identify outliers
- \(F_{IT}\): variation in individuals cf. total population
- Wilding et al 2001: Littorina saxatilis: snail shell structure differences according to tidal zone
- Caveats:
- False positives: outliers due to a lot of other variance
- Demographic history can give false signals
- Need validation (like QTLs)
- Poelstra et al 2014: hooded & carrion crows
- Reference genome, then analysis
- Some loci (0.28%, among those most on same chromosome) showed clear separation, some absolutely none
- Important empirical followup: Later went back and got tissues from various parts of body, found substantial difference in gene expression (of that gene) in torso skin and not in any other part.
152x sequence coverage vs resequencing difference?
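The \(F_{ST}\) outlier scan from this section can be sketched as below. Assumptions, none of them from the notes: loci are biallelic, the two subpopulations are weighted equally, and the neutral upper bound (`neutral_upper`) is simply passed in as a number, standing in for the simulated neutral expectation. Locus names and frequencies are invented.

```python
def fst_biallelic(p1, p2):
    """Wright's F_ST at one biallelic locus for two equally weighted subpopulations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, total
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within subpopulations
    if h_t == 0:
        return 0.0  # locus is monomorphic overall
    return (h_t - h_s) / h_t


def fst_outliers(allele_freqs, neutral_upper):
    """Per-locus F_ST, the genome-wide mean, and loci above the neutral bound."""
    per_locus = {
        locus: fst_biallelic(p1, p2) for locus, (p1, p2) in allele_freqs.items()
    }
    mean_fst = sum(per_locus.values()) / len(per_locus)
    outliers = sorted(locus for locus, f in per_locus.items() if f > neutral_upper)
    return per_locus, mean_fst, outliers


# Hypothetical allele frequencies (pop 1, pop 2) at four loci.
toy_freqs = {
    "locus1": (0.50, 0.55),
    "locus2": (0.30, 0.35),
    "locus3": (0.10, 0.90),  # strongly differentiated
    "locus4": (0.60, 0.60),
}

# neutral_upper=0.3 is a made-up stand-in for a simulated neutral cutoff.
per_locus, mean_fst, outliers = fst_outliers(toy_freqs, neutral_upper=0.3)
print(outliers)
```

Any locus flagged this way is only a candidate: as the caveats above say, demographic history can produce the same signal, so outliers still need validation.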
Discussion: Genetic basis of speciation I
2025-10-07
- Snails have so many sites of small differentiation because low gene flow (very low dispersal).
- They are ovoviviparous
- Scaffolds are fragments of genome that you make using pipeline. If same data and same pipeline, should end up with pretty similar scaffolds.
8 Genetic basis of speciation II
2025-10-09
8.1 Regulatory sequences
- In crows, colour divergence not necessarily due to differences in amino acids but rather in non-coding regions like introns
- Do regulatory changes underlie qualitatively different types of phenotypic change?
- Structural: what is being made; regulatory: when and how much being made.
- Most regulation is at the level of transcription, some at translation via attachment of siRNAs (small interfering RNAs; the mRNA becomes double-stranded where they attach, so no translation)
- Mechanism of regulation:
- Fundamentally, transcriptional regulation is due to sequence-specific binding of transcription factors (trans-, acting from far across) to sites close to the protein-coding region (cis-)
- cis-regulatory elements: regions of non-coding DNA that regulate the transcription of nearby genes; intra-molecular; act via binding sequences
- trans-regulatory elements: proteins that regulate the transcription of distant genes; inter-molecular (can be intra- but generally not 1:1, act on many); transcription factors
- Controversy regarding what has driven most fundamental morphological changes important in adaptive history: cis-regulatory changes vs structural changes
- Now known to be false dichotomy
- Selection works differently on cis-reg regions: they are co-dominant (as opposed to recessive; transcription factors are either on or off, but cis elements have three expression levels) and have reduced pleiotropy (modular cis-reg regions)
- Sean Carroll, University of Wisconsin, HHMI (millions in funding over 7 years, renewable). Convergent evolution of CREs for the same gene in multiple Drosophila species. Also wrote popular books.
- “regulatory evolution is the creative force underlying morphological diversity across the evolutionary spectrum, from variation within species to body plans”
- His argument:
- Mosaic pleiotropy: most proteins regulating development participate in multiple independent developmental processes
- Ancestral genetic complexity: morphologically disparate and long-diverged animal taxa share similar toolkits of body-building and -patterning genes
- Functional equivalence of distant homologs: many animal toolkit proteins exhibit functional equivalency (redundancy) in vivo when substituted for one another
- Deep homology: formation and differentiation of many structures are governed by similar sets of genes and regulatory circuits
- Infrequent toolkit gene duplication
- Heterotopy: changes in spatial regulation of toolkit genes and the genes they regulate are associated with morphological divergence
- Modularity of CRE: can have multiple CRE for same genes controlling different aspects of it, like differential expression in different tissues/body parts. (Hoekstra et al argument is that duplication can give same result by copying both CRE and gene.)
- Vast regulatory networks of transcription factors: individual regulatory proteins (transcription factors) control scores to hundreds of target gene CREs; so changes to TF are “catastrophic”
- Proponents of this view: Britten & Davidson, Wray, Valentine
- Hoekstra & Coyne (2007): rebuttal (defence of structural changes)
- Carroll saw morphology as unique to development; but they are actually not so special
- Gene duplications can accomplish many of the things claimed as special to cis-regulatory elements (we now know gene duplication rates are close to point mutation rates)
- Some “regulatory” changes are actually structural: transcription factors include some of fastest evolving human genes.
- At the time there was more evidence of protein changes than CR changes
- Later, more evidence of both factors. Both Carroll and Hoekstra ended up publishing later concluding opposite to their own earlier POV.
- Lot of this is strong focus on characters, not necessarily tied to speciation in closely related species.
Discussion: Genetic basis of speciation II (Gene regulation)
2025-10-21
- Fundamental of -omics approach: look at everything (genomes or transcriptomes), let data show you who’s interesting
- Genome size related to cell size, so in flying birds for metabolic reasons cells have to be small and therefore genomes small. Same in bats versus other mammals.
9 Genomic architecture
aka, Islands of Discord
2025-10-23
- Felsenstein: Skepticism towards Santa Rosalia (against Hutchinson)
- Felsenstein’s paradox: recombination brings good genes together, but makes it hard for them to stay together
- Genomic islands of differentiation: relatively small regions of genome that have elevated divergence, surrounded by regions of low/no divergence. Few key influences around same time, ~ 20 years ago.
- Rebirth of reinforcement
- Genic speciation, Chung-I Wu (versus Coyne): shifting emphasis from isolation at whole genome to individual genes
- Isolation-with-migration model (Nielsen, Wakeley, Hey): shift from coalescent approaches to speciation with gene flow
- Helianthus anomalus is a dune specialist. Non-random hybrid composition between two parental lineages, even in lab crosses, very similar to that in wild hybrid. “Interactions between coadapted parental species’ genes constrain the genomic composition of hybrid species.”, Rieseberg et al 1996.
- Anopheles gambiae, Turner et al 2005. Three regions (< 2.8 Mb) contain most differentiation, and these speciation islands remain differentiated despite considerable gene flow (therefore responsible for RI).
- By 2010–15, speciation with gene flow was already dogmatic.
- Genomic islands of differentiation:
- In regions of low recombination
- Co-localise with species-defining traits (e.g., hood in Hooded Crow)
- LD among unlinked islands (e.g., capuchino seedeaters)
- Island (gene) trees = species trees (gene tree–species tree concordance)
- Via 2012: because recombination fragment size is larger than individual genes, some genes/loci hitchhike along with loci under selection. Island grows over time.
- Potential disproportionate role of inversions
- Noor et al 2001 (inc. Jane Reiland): QTLs also associated with regions of inversion
- Inversion: rearrangement (flipping) of gene order along a chromosome segment
- Inverted/rearranged areas effectively do not recombine in heterozygotes, so stay the same, due to the differing order
- Role of structural genomic variants, beyond just SNPs; often associated with adaptations, clines, speciation (Hooper & Price 2015)
- Disputed islands: Turner & Hahn 2010 (genomic islands of speciation or genomic islands and speciation?)
- What is “divergence”?
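- Relative: between vs within (\(F_{ST}\); sub, tot). Problem: markers with high variation within populations are less able to show differentiation between them (max \(F_{ST}\) will be low when \(H_S\), within-population diversity, is high, as in high-recombination regions). Low recombination, which lowers within-population diversity, will drive up \(F_{ST}\).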
- Absolute (dxy): average # pairwise diffs between sequences from two species. Not sensitive to within-population variation.
- If genomic islands of speciation, both relative and absolute divergence should be high.
- Shift from islands of speciation to islands of differentiation.
- Cruickshank & Hahn 2014 showed that absolute doesn’t keep up. Divergence-after-speciation.
- Island features, each paired with its counter-argument where one was noted:
- In regions of low recombination
- Co-localise with species-defining traits (e.g., hood in Hooded Crow): traits arose post-speciation
- LD among unlinked islands (e.g., capuchino seedeaters): impossible to maintain
- Island (gene) trees = species trees (gene tree–species tree concordance): selection speeds lineage sorting
- Speciation as sieve (Guerrero & Hahn 2017). Speciation only responsible for sorting/sieving process, the divergence itself ancestral (polymorphism in population). Ancestral balanced polymorphisms are sorted unequally between descendant lineages. (Balanced: both coexist in population.)
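The relative vs absolute divergence contrast in this section can be made concrete with a toy calculation. Assumptions, not from the notes: sites are biallelic, population allele frequencies are known exactly (no sample-size correction), and \(F_{ST}\) is computed as 1 minus within-population diversity over \(d_{xy}\), which is one of several estimators in use. All frequencies are invented.

```python
def window_stats(freqs_pop1, freqs_pop2):
    """Return (dxy, mean within-pop diversity, F_ST) for one genomic window.

    Each list holds the frequency of one allele per site, invariant sites included.
    """
    n_sites = len(freqs_pop1)
    if n_sites == 0 or n_sites != len(freqs_pop2):
        raise ValueError("need two non-empty frequency lists of equal length")
    # Absolute divergence: chance that one sequence from each population differs.
    dxy = sum(p * (1 - q) + q * (1 - p) for p, q in zip(freqs_pop1, freqs_pop2)) / n_sites
    # Within-population diversity (expected heterozygosity per site).
    pi1 = sum(2 * p * (1 - p) for p in freqs_pop1) / n_sites
    pi2 = sum(2 * q * (1 - q) for q in freqs_pop2) / n_sites
    pi_within = (pi1 + pi2) / 2
    fst = 1 - pi_within / dxy if dxy > 0 else 0.0
    return dxy, pi_within, fst


# Hypothetical 10-site windows (six and eight invariant sites respectively).
background = window_stats([0.3, 0.6, 0.5, 0.4] + [0.0] * 6,
                          [0.5, 0.4, 0.7, 0.6] + [0.0] * 6)
# "Island": within-population diversity almost gone, but no extra differences.
island = window_stats([1.0, 0.95] + [0.0] * 8,
                      [0.0, 0.05] + [0.0] * 8)

for name, (dxy, pi_within, fst) in [("background", background), ("island", island)]:
    print(f"{name}: dxy={dxy:.3f} pi_within={pi_within:.3f} FST={fst:.3f}")
```

In this toy case the island window has far higher \(F_{ST}\) than the background while its \(d_{xy}\) is slightly lower, which is the pattern that makes "islands of differentiation" a safer label than "islands of speciation".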
10 Discussion: Genomic architecture
2025-10-28
- Parallel speciation: Speciation version of convergent evolution. Parallel evolution assumes more similar starting points, is a subset of convergent evolution.
- Type I inversions: divergent/balancing selection. Type II: good things on both inversions, so best is to have heterozygous (e.g., trapped along with good stuff might be bad, and some bad effects may be covered by having heterokaryotype). (Zygote: single locus; karyotype: can be set of loci, structural variants.)
11 Hybridisation and introgression
2025-10-30
- Hybridisation: Reproduction between members of genetically differentiated populations that yields offspring of mixed ancestry. e.g., Carrion & Hooded Crows
- Initiated by range change, habitat shifts/mosaics, disturbance
- Outcomes: fusion (speciation reversal), hybrid speciation (transgressive variation; forms a 3rd group), balance (hybrid zones), introgression (seeping of small set of genes; adaptations jump the species barrier; cf fusion)
- Fusion: Common & Chihuahuan Raven. Chi. originally split from Common, then one fused back with Common forming the California type, while the other remains Chi. So the California lineage is considered to have fused (subsumed by Common).
- Wholesale collapse of distinct entity, continue to be more homogenised.
- Hybrid speciation
- Allopolyploid speciation: different numbers of chromosomes. Angiosperms, ferns, Acropora.
- Homoploid hybrid speciation: no ploidy changes, more common in animals. Cottus fish, sparrows (House, Italian, Spanish), snub-nosed monkey
- Transgressive phenotypes: does hybridisation generate fitness-related phenotypes that lie outside the parental distribution?
- Balance (hybrid zones)
- Mice, mussels, crickets (habitat mosaics)
- Tension zones: dispersal into zone vs selection against crossing.
- Nick Barton developed a lot of the theory.
- Fire-bellied Toad brilliant example of this.
- Ratio of effective selection and recombination (see photo in phone).
- Spatial coupling of clines (genotypic & phenotypic). Hybrid zones tend to be drawn to areas of low population density.
- What enhances coupling? Reduced recombination, epistasis among barrier loci, moving clines to low fitness landscape.
- Brumfield et al 2001, clines in manakins do not line up with hybrid zones. The transition in certain markers is at a different geographical location than the actual hybrid zone (where crucial trait also transitions), so looking at just that would suggest introgression.
- Carling & Brumfield 2008: Lazuli and Indigo Buntings.
- Introgression
- Kind of an exception to hybrid zone, it manages to escape forces acting to keep it within the zone. Something that was originally in one species ends up in another.
- Hybridisation as a source of adaptive variation: hybridisation is common, mutations are rare.
- Grants in Galapagos finches. High altitude adaptation in Tibetans (adaptive introgression from Denisovans). Heliconius aposematism: same morph in sympatry, different in allopatry.
- ABBA-BABA test (Patterson’s D = (ABBA - BABA) / (ABBA + BABA)): test for introgression vs incomplete lineage sorting. In the former, ABBA should be more likely than BABA; in the latter they should be equal (random). Need four groups; use a large part of the genome if not the whole genome.
- Migrant tracts: chromosome segments that show migrant ancestry. Length distribution will change over time due to recombination. Archaic ancestor SNPs will become smaller/rarer farther away from introgression event. Can compare increasing vs decreasing vs constant introgression.
- Identify private alleles (only found in one population). If due to de novo mutations, random distribution, but if past introgression, expect longer runs of identity in SNPs.
- Is introgression the new allopatry?
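The ABBA-BABA count behind Patterson's D, noted earlier in this section, can be sketched as below. Assumptions, not from the notes: one aligned sequence per group in the order P1, P2, P3, outgroup; the outgroup allele is treated as ancestral (A); only biallelic sites are counted. Real analyses use allele frequencies across many individuals plus a block jackknife for significance, both omitted here. The sequences are invented.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Count ABBA and BABA sites in four aligned sequences and return (abba, baba, D)."""
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep biallelic sites only
        if c == o:
            continue  # P3 must carry the derived allele (B)
        if a == o and b == c:
            abba += 1  # P1 ancestral, P2 shares derived allele with P3
        elif a == c and b == o:
            baba += 1  # P1 shares derived allele with P3, P2 ancestral
    total = abba + baba
    d = (abba - baba) / total if total else float("nan")
    return abba, baba, d


# Hypothetical 7-site alignment.
P1 = "AACTATG"
P2 = "AGTCGTG"
P3 = "AGTTGCG"
OUT = "AACCACG"

abba, baba, d = patterson_d(P1, P2, P3, OUT)
print(abba, baba, d)
```

D near zero is what incomplete lineage sorting alone predicts; a clear excess of either pattern points to gene flow between P3 and whichever of P1 or P2 shares its derived alleles more often.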
12 Discussion: Introgression
2025-11-04
- Looks like Jensen et al 2024 did not start out with hypothesis/prediction, instead got the data and framed the story around it.
- PSMC: only need one individual to recreate the population history
13 Old variation is the new thing
2025-11-06
- G. difficilis has few genes with large effect on beak size, so also ecological impact on top of easily identifiable morphological feature. When looking at overall species tree, the genetic basis for the difference between these species is older than the basis for species split itself (currently sister species can be blunt or pointed beak). Introgression with blunt type introduced blunt beak in new lineages.
- DMI is a long waiting game (de novo variants): point mutations, duplication, scrambling, retroposition, mobile elements, lateral gene transfer (viruses, parasites), fission and fusion, chimeric genes (origins in separate genes)
- Problems with this
- Most are deleterious
- Start at low frequency (\(1/(2N_e)\))
- Small effect mutations genomically diffuse
- Standing variation
- Diversity of genetic variants present in a population. Can include de novo mutation.
- Ancestral variation: genetic variation present in common ancestor of populations/species being compared
- Substrate for rapid divergence, no waiting. Explain why speciation alleles often predate species divergence.
- No wait for new mutations
- Pre-screened for deleterious alleles
- High initial frequency
- Theoretically, small effect alleles clustered (so enriched for large-effect haplotypes/islands)
- Co-adapted genes may be clustered, e.g., LD
- Sources: mutation, large Ne, balanced polymorphisms (broad geo/ecological range species, e.g., balanced alleles for two different habitats), introgression (admixture variation)
- Speciation with gene flow vs sieve
- Old hybridisation creates variation that can be sorted into new species
- Combinatorial view of speciation: Seehausen. Started thinking about this over 20 years ago.
- 2005: not the origin but the reassembly of several old variants into novel combinations that constitutes the beginning of a speciation event
- Hybrid swarm: hybridisation upon invasion of new habitat producing lots of heritable variation for ecological traits
- Syngameon: a complex of genetically weakly but ecologically highly distinctive species capable of exchanging genetic material
- Implications of combinatorial speciation:
- Decoupling of mutation and speciation rates
- Facilitates evolution of LD in face of gene flow
- These two together: no wait when ecological opportunity arises
- Things still to do:
- Rapid (inc. non-adaptive) radiations vs depauperate sister taxa
- Compare allele splits to species splits
- Compare LD among species
- Locus size effects for old vs new variants
- Revisit old data (e.g., incongruent gene trees)
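The "high initial frequency" advantage of standing variation over de novo mutation, listed earlier in this section, can be illustrated with Kimura's diffusion approximation for fixation probability. This formula is brought in here as an assumption and is not from the notes: it treats a diploid population with additive (genic) selection, where `s` is the advantage of the heterozygote. The values of `ne`, `s` and the standing frequency are arbitrary.

```python
import math


def fixation_probability(p0, ne, s):
    """Kimura's approximation: chance that an allele at frequency p0 eventually fixes."""
    if s == 0:
        return p0  # neutral allele: fixation probability equals its frequency
    return (1 - math.exp(-4 * ne * s * p0)) / (1 - math.exp(-4 * ne * s))


ne = 1000   # effective population size (arbitrary)
s = 0.01    # selective advantage (arbitrary)

new_mutation = fixation_probability(1 / (2 * ne), ne, s)  # single new copy
standing = fixation_probability(0.10, ne, s)              # already at 10%

print(f"new mutation: {new_mutation:.4f}")
print(f"standing variant: {standing:.4f}")
```

Even a beneficial new mutation is usually lost (its fixation probability is roughly 2s under these assumptions), while the same allele already segregating at moderate frequency is very likely to fix, so there is no wait.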
14 Discussion: Old variation
2025-11-11
- Mitochondrial DNA has way more drift for same eff pop size than nuclear, which is why you usually see discordance.
15 Microbes, microbiomes and speciation
2025-11-13
- Unculturable paradigm: only 1-2% of bacteria are culturable (so biased view)
- Rule of thumb for bacterial species definition is 3% divergent at small-ribosomal subunit (SSU) rRNA. Why SSU?
- Encoded by nucleus and organelles (e.g., 16S)
- Slow and fast (hour-hands, minute-hands)
- Conserved structure
- Practicalities
- Ancient and essential (ribosome)
- Highly integrated (>100 co-evolved cellular RNAs and proteins). Should also minimise LGT
- Lateral/Horizontal Gene Transfer: Movement of genetic material between organisms other than by vertical (parent-to-offspring) transmission.
- Problems with SSU rRNA:
- Discordance with DNA hybridisation: overall rates of DNA similarity (reannealing) did not match with those of SSU rRNA
- Ecological differentiation within “species”: 3% rule of thumb doesn’t capture full picture.
- Descriptive, not a species concept
- Reality of species in bacteria is different (how recombination works), so definitions will be too
- No meiosis, so recombination rare (not every single generation)
- Recombining happens with wide array of organisms (promiscuous)
- Highly localised: adding/dropping individual genes at very small levels
- Transfer of non-homologous loci
- Exogenous DNA is a major raw material for adaptation
- Basis of bacterial species cohesion
- Extinction/recolonisation
- Metapopulation model, bottlenecks
- Stabilising selection on ecotypes
- Bacterial species, OK… Are there bacterial communities? From different habitats? On/in different hosts? How tight are associations between species or between species and environment?
- Yes, habitat-wise.
- Yes, host-specific.
- Tight association with host life cycle (transfer of symbionts/microbiota)
- Microbiome
- Microbes that live on and inside a defined habitat (e.g., an organism’s body)
- Human body often said to have 10x more prokaryotic cells than eukaryotic (more recent estimates put the ratio closer to 1:1)
- C. difficile is natural part of microbiome, but in hospitals the antibiotics around affect other bacteria that hold it down in normal condition. Fecal transplants help with this more than antibiotics.
- Holobiont
- A host and its microbiota. Multicellular eukaryotes are not and never have been autonomous organisms.
- Hologenome: the complete genetic content of the host genome, its organelles, and its microbiome
- Host’s nuclear genome inherited a la Mendel, but microbiome is Lamarckian and often uniparental
- So, sources of holobiont variation:
- SNPs
- Recombination
- Gene loss and duplication
- Microbe loss or amplification
- Horizontal gene transfer:
- Red alga seaweed Porphyra: genes that allow its digestion, usually only found in bacteria that live on the alga, were transferred via HGT to gut bacteria of people in Japan, shared especially between mother and child (Hehemann et al 2010)
- WRT evolution: equate genes in genome with microbes in microbiome
- How many host-associated microbes cause speciation and what are they? Microbe-free (antibiotic) versions of same old experiments
- Brucker & Bordenstein 2013: Nasonia wasps and microbiome affecting viability
- Phylosymbiosis: microbial community similarity that mirrors the phylogeny of their host (Lim & Bordenstein 2020). Dunaj et al 2020 showed phylosymbiosis in ovaries and silk glands but not fat/midgut.
- Cophylogeny: agreement of evolutionary histories of symbionts and their hosts (Mark Hafner LSU; not bacteria, but lice and the host gophers).
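The 3% SSU rRNA rule of thumb from earlier in this section amounts to a pairwise distance with a cutoff, sketched below. Assumptions, not from the notes: sequences are already aligned, gap columns (`-`) are skipped, and divergence is the simple proportion of differing sites with no substitution-model correction. The sequences are artificial 100-base strings, not real 16S data.

```python
def divergence(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def same_species_by_ssu(seq_a, seq_b, threshold=0.03):
    """Apply the 3% rule of thumb: below the threshold counts as the same species."""
    return divergence(seq_a, seq_b) < threshold


def substitute(seq, positions):
    """Helper for the toy data: swap in a different base at the given positions."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


reference = "ACGT" * 25                               # 100 artificial sites
close = substitute(reference, [3, 57])                # 2% divergent
distant = substitute(reference, [3, 20, 41, 57, 88])  # 5% divergent

print(same_species_by_ssu(reference, close), same_species_by_ssu(reference, distant))
```

The cutoff only describes sequence similarity at one gene; as the problems listed in this section point out, two strains under 3% can still be ecologically distinct.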