# The rise of grasslands is linked to atmospheric CO2 decline in the late Palaeogene

### Taxonomic representativeness in grasslands

To quantify the taxonomic representativeness of vascular-plant families found in open-habitat landscapes (Supplementary Fig. 1), we selected seven distantly distributed eco-regions dominated by grasslands from the World Wide Fund for Nature58. Using the coordinate boundaries of each of the selected eco-regions, we extracted the vascular plant taxa (=Tracheophyta) using the R ‘rgbif:Interface to the Global ‘Biodiversity’ Information Facility API’ package59 with the option ‘hasGeospatialIssue=FALSE’, that includes only records without spatial issues (e.g., invalid coordinates, country coordinate mismatch). Plant families were sorted according to the number of species, removing duplicated species.

### Palaeobotanical analysis

Asteraceae and Poaceae have a fairly similar fossil record; their oldest findings are known from the Late Cretaceous—which mainly comprise microscopic remains (that is, phytoliths60 or pollen grains61)—whereas the first indisputable macroscopic Asteraceae and Poaceae fossils are first known from the Eocene62,63, with a substantial increase of diversity since the Oligocene/Miocene. While the fossil record of Poaceae has been fully revised4,64,65, the fossil record of Asteraceae has not been as carefully reviewed. We compiled published pollen and macroscopic fossil data for Asteraceae including all fossil species assigned to Asteraceae (Supplementary Table 1). The earliest record of the Asteroideae (the clade that includes the most common open-habitat daisy tribes) occurs since the Late Oligocene of New Zealand but in very low frequencies. Fossils refer to this subfamily increased in abundance and diversity during the Miocene and Pliocene. Pollen referred to Artemisia, in particular, did not become abundant until the Middle-Late Miocene with several reports from central Europe, Asia and North America. Pre-Miocene findings need further verification. Overall, the Late Oligocene and in particular the Miocene witnessed the major step in the diversification of Asteraceae; ca. 80% of the fossil species recorded have been assigned to this time interval.

### Divergence-time estimation

To construct the Asteraceae supertree (2723 tips), we first inferred a backbone chronogram using 14 plastid DNA regions from 54 species, including representatives of all 13 subfamilies, with an additional four species of Calyceraceae used as outgroup taxa (Supplementary Table 2). Sequences were compiled from GenBank and each region was aligned separately using MAFFT66 with the options maxiterate 1000 and localpair. Two fossil constraints were applied: (i) a macrofossil (capitulum) and associated pollen (Raiguenrayun cura + Mutisiapollis telleriae) from the Eocene (45.6 Mya) to calibrate the non-Barnadesioideae Asteraceae clade63 and; (ii) the fossil-pollen species Tubulifloridites lilliei type A from the late Cretaceous (72.1 Mya)61 to calibrate the crown Asteraceae (considering T. lilliei as a stem group, see Huang et al.67 for further discussion). Divergence-time estimates and phylogenetic relationships were inferred using RevBayes17. For the aligned molecular sequences we assume a general-time reversible substitution model with gamma-distributed rate variation among sites (GTR+Γ), an UCLN prior on substitution-rate variation across branches (UCLN relaxed clock), and a birth–death prior model on the distribution on node ages/tree topologies. A densely sampled phylogeny is crucial to identify shifts in diversification rates. Therefore, we constructed a supertree by inserting eleven individual sub-trees —representing all subfamilies of the Asteraceae except those less diverse or monotypic clades (that is, Gymnarrhenoideae, Corymbioideae, Hecastocleidoideae, Pertyoideae)—into the calibrated backbone chronogram. This method follows a previous study that constructed a supertree of grasses using the same approach11. Each of the eleven clades of Asteraceae was built using their own set of markers and the same phylogenetic approach as the one used to infer the backbone tree (Supplementary Table 2). Sequence data for each of the eleven trees and their respective outgroup taxa were collected from Genbank using the NCBIminer tool68. The estimated ages of the nodes given by the backbone analysis were used to constrain the age of each of the eleven sub-trees (Supplementary Table 2). Divergence-time estimates and phylogenetic relationships for each of the eleven sub-clades were estimated using RevBayes as described above. The eleven trees were grafted onto the backbone tree using the function ‘paste.tree’ from the phytools R package69. We used ggtree R package70 to plot the circle phylogenetic tree of Figure 1 and phytools69 to include the concentric geological scale. The supertree of the grass family (3,595 taxa) was obtained from Spriggs et al.11 (Supplementary Table 3). They inferred two chronograms using two different calibration scenarios, that is, a younger scenario (#1) calibrated using an Eocene megafossil62 and an older scenario (#2) calibrated using Cretaceous phytoliths60. We ran our diversification analyses using these two chronograms.

### Inferring changes in diversification rate through time

Our species-diversification model is based on the reconstructed evolutionary process described by Nee et al.12 and more specifically on the episodic birth–death process18,19,20,21. We assume that each lineage gives birth to another species with rate λ (cladogenetic speciation events) and dies with rate μ (extinction event; see Fig. 4). We model diversification rates (i.e., speciation and extinction rates) as constant within an interval but independent between intervals, where intervals are demarcated by instantaneous rate-shift events, and equal among contemporaneous lineages. We denote the vector of speciation rates Λ = {λ1, …, λk} and extinction rates M = {μ1, …, μk} where λi and μi are the (constant) speciation and extinction rates in interval i. Additionally, we use the taxon-sampling fraction at the present denoted by ρ15,16. Following the notation of May et al.20, we construct a unique vector, ({mathbb{X}}), that contains all divergence times and rate-shift event times sorted in increasing order. It is convenient for notation to expand the vectors for all the other parameters so that they have the same number of elements (k=| {mathbb{X}}|). Let Ψ denote an inferred tree relating n species, comprising a tree topology, τ, and the set of branching times, ({mathbb{T}}). We use the notation S(2, t1 = 0, T) to represent the survival of two lineages in the interval [t1, T], which is the condition we enforce on the reconstructed evolutionary process. Transforming Equation (A4) in May et al.20 to our model yields the probability density of a reconstructed tree as:

$$f({{Psi }}| N({t}_{1},=,0),=,2,S(2,{t}_{1},=,0,T))\ =frac{{2}^{n-1}}{n!}\ times {left(1!+!mathop{sum }limits_{i = 0}^{k}left(frac{{mu }_{i}}{{mu }_{i}-{lambda }_{i}}times {{{mbox{e}}}}^{mathop{sum }limits_{j = 0}^{i-1}({mu }_{j}-{lambda }_{j})({x}_{j+1}-{x}_{j})}times left({{{mbox{e}}}}^{({mu }_{i}-{lambda }_{i})({x}_{i+1}-{x}_{i})}-1right)right)-frac{rho -1}{rho }times {{{mbox{e}}}}^{mathop{sum }limits_{i = 0}^{k}({mu }_{i}-{lambda }_{i})({x}_{i+1}-{x}_{i})}right)}^{-2}\ times {left({{{mbox{e}}}}^{-{{{{mathrm{log}}}}},(rho )mathop{sum }limits_{j = 0}^{k}({mu }_{j}-{lambda }_{j})({x}_{j+1}-{x}_{j})}right)}^{2}\ times mathop{prod}limits_{iin {{mathbb{I}}}_{T}}Bigg[{lambda }_{i}times Bigg(1+mathop{sum }limits_{l = i}^{k}Bigg(frac{{mu }_{l}}{{mu }_{l}-{lambda }_{l}}times {{{mbox{e}}}}^{mathop{sum }limits_{j = 0}^{l-1}({mu }_{j}-{lambda }_{j})({x}_{j+1}-{x}_{j})}times Big({{{mbox{e}}}}^{({mu }_{l}-{lambda }_{l})({x}_{l+1}-{x}_{l})}-1Big)Bigg) \ -frac{rho -1}{rho }times {{{mbox{e}}}}^{mathop{sum }limits_{l = i}^{k}({mu }_{l}-{lambda }_{l})({x}_{l+1}-{x}_{l})}Bigg)^{-2}times {{{mbox{e}}}}^{-{{{{mathrm{log}}}}},(rho )mathop{sum }limits_{j = i}^{k}({mu }_{j}-{lambda }_{j})({x}_{j+1}-{x}_{j})}Bigg].$$

(1)

The first term, (frac{{2}^{n-1}}{n!}) corresponds to the combinatorial constant for the number of labeled histories18, the second term corresponds to the condition of two initial lineage at the root of the phylogeny surviving until the present, and the third term corresponds to the product of all speciation events and the new lineages surviving until the present.

### Empirical taxon-sampling model

Here we develop an empirical taxon-sampling model that uses taxonomic information on the membership of unsampled species to clades and speciation times of unsampled species, which is an extension to the work by Höhna et al15,16 and similar to the approach used by Stadler and Bokma57. The main difference of our approach and the approach by Stadler and Bokma57 is that their model uses a constant-rate birth–death process (compared to our episodic birth–death process). Additionally, Stadler and Bokma57 derive the density of the missing species using a random probability s of an edge being sampled, which differs from our approach where we integrate over the time of the missing speciation event. Nevertheless, at least for the constant-rate birth–death process, both approaches arrive at the same final likelihood function.

We include information on the missing speciation events by integrating over the known interval when these speciation events must have occurred (that is, between the stem age tc of the MRCA of the clade and the present, Figure 4). This integral of the probability density of a speciation event is exactly the same as one minus the cumulative distribution function of a speciation event16,

$$F({t}_{c}| N({t}_{1})=1,{t}_{1}le tle T)=1-frac{1-P(N(T) > 0| N({t}_{c})=1)exp (r({t}_{c},T))}{1-P(N(T) > 0| N({t}_{1})=1)exp (r({t}_{1},T))},$$

(2)

where t1 is the age of the root. The probability of survival is given by:

$$P(N(T), > ,0| N({t}_{c})=1)\ !=!{left(1+mathop{sum }limits_{i = c}^{k}left(frac{{mu }_{i}}{{mu }_{i}-{lambda }_{i}}times {{{mbox{e}}}}^{mathop{sum }limits_{j = c}^{i-1}({mu }_{j}-{lambda }_{j})({x}_{j+1}-{x}_{j})}times ({{{mbox{e}}}}^{({mu }_{i}-{lambda }_{i})({x}_{i+1}-{x}_{i})}-1)right)-frac{rho -1}{rho }times {{{mbox{e}}}}^{mathop{sum }limits_{i = c}^{k}({mu }_{i}-{lambda }_{i})({x}_{i+1}-{x}_{i})}right)}^{-1}$$

(3)

where (k=| {mathbb{X}}|). Let us define n as the number of sampled species, m as the total number of species in the study group, ({mathbb{K}}) as the set of missing species per clade and (| {mathbb{K}}|) the number of clades with missing species. Additionally, we define ci as the time of most recent common ancestor of the ith clade. Then, the joint probability density of the sampled reconstructed tree and the empirically informed missing speciation times is

$$f({{Psi }},{mathbb{K}}| N({t}_{1},=,0),= ,2,S(2,{t}_{1},=,0,T))=f({{Psi }}| N({t}_{1},=,0),=,2,S(2,{t}_{1},=,0,T))\ times frac{(m-1)!}{(n-1)!}mathop{prod }limits_{i=1}^{| {mathbb{K}}| }frac{1}{{k}_{i}!}{left(1-F(t| N({c}_{i}) = 1,{c}_{i}le tle T)right)}^{{k}_{i}}.$$

(4)

### Prior models on diversification rates

Our model assumes that speciation and extinction rates are piecewise constant but can be different for different time intervals (Fig. 4). Thus, we divide time into equal-length intervals (e.g., Δt = 1). Following Magee et al.21, we specify prior distributions on the log-transformed speciation rates (({{{{{{mathrm{ln}}}}}}},({lambda }_{i}))) and extinction rates (({{{{{{mathrm{ln}}}}}}},({mu }_{i}))) because the rates are only defined for positive numbers and our prior distributions are defined for all real numbers. We apply and compare three different prior models: (i) an UCLN prior distribution, (ii) a GMRF prior21, and (iii) a HMRF prior21. The first prior distribution specifies temporally uncorrelated speciation and extinction rates, whereas the second and third prior distributions are autocorrelated prior models. The assumption of autocorrelated rates might make more sense biologically (an interval of high speciation rates is likely to be followed by another interval with high speciation) but also improves our ability to estimate parameters21. Nevertheless, our inclusion of both uncorrelated and autocorrelated prior distributions allows for testing whether an uncorrelated or autocorrelated model is preferred.

The prior distribution on the speciation rates λi and extinction rates μi are set in exactly the same form in our models with their respective hyperprior parameters. Thus, for the sake of simplicity, we omit the prior distribution on the extinction rates here in the text. Our first prior distribution, the UCLN distributed prior, specifies the same prior probability for each speciation rate λi,

$${{{{{{mathrm{ln}}}}}}},({lambda }_{i}) sim ,{{mbox{Normal}}},(m,sigma ).$$

(5)

Thus, each speciation rate is independent and identically distributed.

Our second prior distribution, the GMRF prior, models rates in an autocorrelated form analogous to a discretized Brownian motion. That is, we assume that diversification rates λ(t) and μ(t) are autocorrelated and the rates in the next time interval will be centered at the rates in the current time interval,

$${{{{{{mathrm{ln}}}}}}},({lambda }_{i}) sim ,{{mbox{Normal}}},({{{{{{mathrm{ln}}}}}}},({lambda }_{i-1}),{sigma }_{lambda }).$$

(6)

The standard deviation σ regulates the amount of change between each time interval.

Our third prior distribution, the HSMRF prior, is very similar to the GMRF but additionally allows for the variance to change between time intervals,

$${gamma }_{i} sim ,{{mbox{halfCauchy}}},(0,1)$$

(7)

$${{{{{{mathrm{ln}}}}}}},({lambda }_{i}) sim ,{{mbox{Normal}}},({{{{{{mathrm{ln}}}}}}},({lambda }_{i-1}),sigma {gamma }_{i}).$$

(8)

The HSMRF prior model is more adaptive than the GMRF; it allows for more extreme jumps between intervals while favoring/smoothing more constant-rate trajectories if there is no evidence for rate changes21.

These three prior models of diversification rates provide the null models of our analyses as they do not assume any dependence to an environmental variable. We use these models first to estimate diversification rates through time before testing for a correlation of the speciation or extinction rate to an environmental variable (e.g., atmospheric CO2 or paleo-temperature). Magee et al.21 found that 100 epochs perform well for autocorrelated models. Since we do not know how many bins (i.e., epochs) should be used for the episodic birth–death process, we test various numbers of equal-sized epochs (4, 10, 20, 50, 100, and 200, Supplementary Figure 7). We show both the median posterior diversification rates (Supplementary Fig. 6) as well as select the best fitting model based on the number of epochs (Supplementary Fig. 8).

### Correlation between speciation and extinction rate to CO2

Previously, Condamine et al.22 introduced an environmentally-dependent diversification model. In their model, diversification rates are correlated with an environmental variable22,23,24,25,26,27. For example, the speciation rate can be modeled as (lambda (t)={lambda }_{0}{{{mbox{e}}}}^{beta times {{{mbox{CO}}}}_{2}(t)}) (see Box 1 in Condamine et al.22), which is equivalent to ({{{{{{mathrm{ln}}}}}}},(lambda (t))={{{{{{mathrm{ln}}}}}}},({lambda }_{0})+beta times {{{mbox{CO}}}}_{2}(t)). Since we are using the episodic birth–death process which has piecewise-constant diversification rates, we modify the original continuous-time environmentally-dependent diversification model to ({{{{{{mathrm{ln}}}}}}},({lambda }_{i})={{{{{{mathrm{ln}}}}}}},({lambda }_{0})+beta times {{{mbox{CO}}}}_{2,i}), which is equivalent to and more conveniently written as ({{{{{{mathrm{ln}}}}}}},({lambda }_{i})={{{{{{mathrm{ln}}}}}}},({lambda }_{i-1})+beta times {{Delta }}{{{mbox{CO}}}}_{2,i}) where ΔCO2,i = CO2,i − CO2,i−1. Note that we only use the so-called exponential dependency and not the linear dependency24 because the linear dependency can result in negative rates which are mathematically and biologically impossible71.

We applied this original environmentally-dependent diversification model and three environmentally-dependent diversification models described in this work. The original environmentally-dependent diversification model of Condamine et al.22 does not accommodate diversification-rate variation that is independent of the environmental variable. Instead, our three environmentally-dependent diversification models build on our diversification-rate prior models which allow for rate variation through time (see above). Thus, our environmentally-dependent diversification models will collapse to the episodic birth–death model if rates of diversification and atmospheric CO2 are uncorrelated and hence inherently allows for diversification rate variation. The linkage of environmental variable and diversification rates without allowing for independent diversification rate variation might provide spurious results, as has been noticed for trait evolution72 and state-dependent diversification rates73. We explore this potential of misattribution of diversification rate variation to the environmental variable in our model selection procedure and simulation study (see below).

As before, we omit the description of the extinction rates in the text for the sake of notational simplicity. Both speciation and extinction rates are model exactly in the same way with their corresponding set of hyperparameters (e.g., see the Supplementary Tables 47). Our first environmentally-dependent diversification model has a fixed linkage between the diversification rate variation and variation in the environmental variable;

$${lambda }_{0} sim ,{{mbox{Uniform}}},(0,100)$$

(9)

$${{{{{{mathrm{ln}}}}}}},({lambda }_{i}) = {{{{{{mathrm{ln}}}}}}},({lambda }_{i-1})+{beta }_{lambda }times {{Delta }}{{{mbox{CO}}}}_{2}.$$

(10)

This model does not have a counterpart in the above diversification rate priors, but is included as a comparison to the work Condamine et al.22.

Our second environmentally-dependent diversification model adds UCLN variation on top of the variation in the environmental variable;

$${lambda }_{0} sim ,{{mbox{Uniform}}},(0,100)$$

(11)

$${{{{{{mathrm{ln}}}}}}},({hat{lambda }}_{i})={{{{{{mathrm{ln}}}}}}},({hat{lambda }}_{i-1})+{beta }_{lambda }times {{Delta }}{{{mbox{CO}}}}_{2}$$

(12)

$${epsilon }_{i} sim ,{{mbox{Normal}}},(0,sigma )$$

(13)

$${{{{{{mathrm{ln}}}}}}},({lambda }_{i}) = {{{{{{mathrm{ln}}}}}}},({hat{lambda }}_{i})+{epsilon }_{i}.$$

(14)

Thus, this model collapses to the above UCLN model if there is no correlation between the environmental variable and diversification rates (β = 0). Importantly, the difference in the variation of the diversification rates and environmental variable is independent in each epoch, contributed by the variable ϵi. The environmental-dependent part of the diversification rates ({hat{lambda }}_{i}) is equivalent to the fixed environmentally-dependent diversification model.

Our third environmentally dependent diversification model adds correlated lognormal variation on top of the fixed environmentally-dependent diversification mode;

$${lambda }_{0} sim ,{{mbox{Uniform}}},(0,100)$$

(15)

$${{{{{{mathrm{ln}}}}}}},({lambda }_{i}) sim ,{{mbox{Normal}}},({{{{{{mathrm{ln}}}}}}},({lambda }_{i-1})+{beta }_{lambda }times {{Delta }}{{{mbox{CO}}}}_{2}),left.sigma right).$$

(16)

This model an extension of the above GMRF model and collapses to it if there is no correlation between the environmental variable and diversification rates (β = 0). As the GMRF model is a discretized Brownian motion model, this environmentally-dependent extension can be considered as a Brownian motion with trend model, where the trend is predicted by the environmental variable. Instead of writing this model with a separate environmentally-dependent part ({hat{lambda }}_{i}) and autocorrelated part ϵi, we directly use the combined environmentally-dependent and independent rate variation as the mean for the next time interval. Nevertheless, we want to emphasize this equivalence to bridge the connection to the UCLN model above.

Finally, our fourth environmentally-dependent diversification model extends the above HSRMF to allow for diversification rates predicted by the environmental variable;

$${lambda }_{0} sim ,{{mbox{Uniform}}},(0,100)$$

(17)

$${gamma }_{i} sim ,{{mbox{halfCauchy}}},(0,1)$$

(18)

$${{{{{{mathrm{ln}}}}}}},({lambda }_{i}) sim ,{{mbox{Normal}}},({{{{{{mathrm{ln}}}}}}},({lambda }_{i-1})+{beta }_{lambda }times {{Delta }}{{{mbox{CO}}}}_{2}),sigma {gamma }_{i}left.right).$$

(19)

This model follows the same extension as the environmentally-dependent GMRF model with local adaptability of the rate variation through the parameter γi, as before for the HSMRF.

In all our four models, we denote the correlation factor by β. If β > 0 then there is a positive correlation between the speciation rate and CO2, that is, if the CO2 increases then the speciation rate will also increases. By contrast, if β < 0 then there is a negative correlation between the speciation rate and CO2, that is, if the CO2 concentration increases then the speciation rate will decrease. Finally, if β = 0 then there is no correlation and our environmentally-dependent diversification model collapses to corresponding episodic birth–death model.

All four models have the same parameter for the initial speciation rate λ0 with a uniform prior distribution between 0 and 100. The models are constructed in increasing complexity and all three models can collapse either to the fixed environmentally-dependent diversification model or to their environmentally independent episodic birth–death process.

### Environmental data

In our analyses we tested for correlation between two environmental factors: CO2 and temperature. The concentration of atmospheric CO2 throughout the Cenozoic were compiled by Beerling & Royer41 using terrestrial and marine proxies. An updated dataset was provided by Dr. Dana Royer. Paleo-temperature fluctuations come from Zachos et al.74. Raw data were extracted from ftp://ftp.ncdc.noaa.gov/pub/data/paleo/.

Analogous to our tests about the number of epochs for the diversification rate analyses, we computed the arithmetic mean for the environmental variable for 1-, 2- and 5-million year intervals. We both estimated the correlation between the environmental variable and diversification rates for each interval size and performed model selection using Bayes factors.

### Model selection

We performed three sets of empirical diversification rate analyses for each dataset. We estimated the diversification rates over time using three different models, we estimated the environmentally-dependent diversification rates using four different models, and we applied two different taxon-sampling schemes. For the first two sets of analyses we performed standard model selection in a Bayesian framework using Bayes factors75. Thus, we computed the marginal likelihood for each model using stepping-stone sampling76 as implemented in RevBayes77. We run 128 stepping stones with each stone comprising of its own MCMC run with 2000 iteration and on average 1374 moves per iteration (i.e., the runs being equivalent to standard single-move-per-iteration software with 2,748,000 iterations).

We tested the support for the environmental correlation using Bayes factors computed from the posterior odds. Our prior probability for the correlation factor β was symmetric and centered at zero, that is, we specified exactly a probability of 0.5 that β < 0 and β > 0. Thus, the prior probability ratio of (frac{P(beta < 0)}{P(beta > 0)}=1.0) Then, to compute the Bayes factor for in support of a negative correlation is simply the number of MCMC samples with β < 0 divided over the total number of MCMC samples.

We did not compute marginal likelihoods for the two different sampling schemes; the uniform taxon sampling and the empirical taxon sampling. Empirical taxon sampling uses additional data, the age ranges of the missing speciation events, and two analyses with different data cannot be compared using traditional model selection. Instead, we performed a simulation study to show the robustness of our parameter estimates under empirical taxon sampling and the resulting bias if wrongly uniform taxon sampling was assumed.

### Simulation study

We performed two sets of simulations; focusing (a) on the environmentally-correlated diversification model, and (b) the incomplete taxon same scheme. First, we simulated phylogenies under the UCLN and GMRF environmentally-correlated diversification model using the R package TESS78,79. We set the diversification rate variation to σ = {0, 0.02, 0.04} and correlation factor to β = {0, −0.005, −0.01}. Thus, our simulations included the constant-rate birth–death process (when σ = 0 and β = 0), time-varying but environmentally independent diversification rates (when σ > 0 and β = 0), the fixed environmentally-dependent diversification model (when σ = 0 and β ≠ 0), and the time-varying and environmentally-dependent diversification model (when σ > 0 and β ≠ 0). For each setting, we simulated ten diversification rate trajectories (Fig. S16 and S17) and trees (Fig. S18 and S19). We analyzed each simulated tree under the same four environmentally dependent diversification model as in our empirical analysis (see above).

Second, we simulated phylogenetic trees under empirical taxon sampling to validate the correctness of our model derivation. Unfortunately, simulation of empirical taxon sampling is not straight forward. We circumvented the problem by randomly adding the missing species to the daisy phylogenetic tree, then drawing new divergence times under (a) a constant-rate birth–death process, and (b) a time-varying episodic birth–death process with rates taken from the empirical estimates. Then, we pruned the additional species to mimic empirical taxon sampling. The simulations under the constant-rate birth–death process provide information about falsely inferring diversification rate variation (false positives) and the simulations under the time-varying episodic birth–death process provide information about the power to correctly inferring diversification rate variation (power analysis). We simulated 100 trees under each setting and analyzed each tree using the GMRF prior model with both empirical and uniform taxon sampling. The MCMC inference settings were identical to the empirical analyses.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.