Introduction
We are enduring a biodiversity crisis (Myers et al. 2000; Pimm and Raven 2000; Brook et al. 2003; Thomas et al. 2004; Barnosky et al. 2011), and harnessing all available data on biodiversity patterns through space and time is critical for understanding the history of life and for setting and accomplishing conservation goals (Dietl and Flessa 2011; Rick and Lockwood 2013; Hunt and Slater 2016). We know that species and communities move and reorganize in response to climate change and habitat alterations (Walther et al. 2002; Parmesan 2006; Walther 2010). With increasing anthropogenic pressures, including human population growth, habitat destruction and fragmentation, and intensifying land use, there will be less habitat and climate connectivity for species movement and biological community reorganization in the future (Rosenzweig et al. 2007; Ryberg et al. 2013; McGuire et al. 2016). Habitat composition is also an important consideration for sustaining metacommunity dynamics (Ryberg and Fitzgerald 2016). Species have perished and will perish locally or entirely, and recovery from species loss due to extinction will take millions of years (Davis et al. 2018). Because the ecological and evolutionary processes leading to adaptation, movement, and extinction occur over long time periods and because the Earth has experienced major alterations to geographic ranges and composition of flora and fauna in the past, it is critical to draw on a deep time perspective to investigate species and community response to climate and environmental change.
The current geographic arrangement of species distributions and community compositions has been significantly influenced by humans (Sinclair et al. 2002; Kampichler et al. 2012; Newbold et al. 2015; Pineda-Munoz et al. 2021). In fact, it has been shown that human pressures on a landscape predict species' geographic ranges better than species' own biological traits (Di Marco and Santini 2015). Thus, incorporating multiple lines of evidence from field and laboratory experimentation, as well as observation and modeling studies, is particularly important to better understand the response of species and communities to climate and environmental change (Louys et al. 2012). Carefully designed ecological experiments over local to regional geographic extents reveal ecological processes important for determining ecological community composition, dominance, and abundance structures, such as stochastic ecological drift, priority effects, and filtering due to niche selection (Chase 2007; Ryberg et al. 2012; Fukami 2015). Coordinated distributed experiments over larger geographic extents are positioned to address global ecological and environmental problems and contribute to a better understanding of basic ecological theory (Fraser et al. 2013). 
Physiological experimentation reveals other global change drivers relevant to understanding species' geographic range shifts, such as oxygen- and capacity-limited thermal tolerance as a way to link biological levels of organization from cells to ecosystems (Bozinovic and Pörtner 2015). Furthermore, advances in modeling species' ecological niches and geographic distributions have made good use of observational data to evaluate past and potential species range shifts, demographic changes, lineage diversification, and extirpation of species (Maguire et al. 2015). Taken together, the modeling advances allow us to better understand how and why species and communities move and reorganize in response to climate and environmental change, and we can begin to anticipate future responses due to impending climate, land use, and land cover change.
Another important line of evidence to better understand the response of species and communities to climate and environmental change comes from the fossil record (Pardi and Smith 2012). Fossils show where and when species occurred in the past, as well as aspects of past species' morphology and which species occurred together within a community or regional species pool. Ecological information from fossils has been derived from their morphology, chemical composition, and the depositional setting of associated sedimentary deposits (Damuth et al. 1992; Croft et al. 2018). This information has allowed paleoecologists to answer many relevant questions about the response of species and communities to climate and environmental change, because they have been able to track species through space and time (Jablonski et al. 2003; Stigall 2008); evaluate the geographic shifts of species (Enquist et al. 1995; Rödder et al. 2013; Gavin et al. 2014), ancient invasion dynamics (Jackson 1997; Dudei and Stigall 2010), and change in ancient ecosystem functioning using functional traits (Polly and Head 2015; Polly et al. 2016; Lawing et al. 2017); and inform conservation decision making (Dietl and Flessa 2011; Barnosky et al. 2017).
It can be informative to incorporate information from studies on modern flora and fauna into paleontological studies (Fritz et al. 2013; Lawing and Matzke 2014). Taxonomic resolution, time-averaging, transport, and age uncertainty in data associated with fossils sometimes make it difficult to integrate modern and fossil occurrence data, but these useful pieces of information can be combined to make inferences beyond what each data type will allow on its own. For example, using a phylogenetic framework with randomization procedures would allow one to anchor and extract important clues from the fossil record that can be bolstered by more abundant and taxonomically resolved data from the modern record (Hunt and Slater 2016). This is not to say that fossil occurrence data are not useful on their own. There are hundreds of studies that use fossil occurrence data and have revealed important biological insights into ecological and evolutionary processes, biogeographic history, and community assembly. However, designing methods that integrate modern and fossil occurrence data bolsters our ability to make inferences using information from multiple taxonomic and phylogenetic scales (Hunt and Slater 2016), strengthens our ability to use findings from paleontological studies as past anchoring points to investigate ongoing ecological and evolutionary processes (Lawing and Matzke 2014), and helps us translate findings from paleontological studies to inform conservation practices (Dietl and Flessa 2011; Barnosky et al. 2017).
My intention for this paper is to provide an entry-level discussion of various modern and paleontological data types and methodologies that can be integrated in analyses that span ecological, evolutionary, and geologic time. The discussion provided in this paper is not comprehensive in reviewing all studies that integrate modern and paleontological data and methods but will discuss several methods relevant to understanding how species and communities respond to climate and environmental change through time. I will frame the discussion focusing on PaleoPhyloGeographic species distribution Models (PPGMs) as an organizing theme that integrates multiple lines of evidence to infer species' past geographic response to climate change and to estimate where and when there were hotspots of ancient diversification (Lawing and Polly 2011; Rödder et al. 2013; Lawing et al. 2016; Rivera et al. 2020). Using PPGMs as an organizing concept in this paper will allow me to home in on a few important methods that were integrated in this particular framework. It is intended to help readers get basic information about how these methods work so they can think through how they might integrate multiple modeling techniques with heterogeneous data types. However, this paper is meant to be useful to readers beyond those only interested in implementing a PPGM analysis. In an effort to triangulate species distribution modeling, phylogenetic comparative methods, and paleontological observations, this paper provides entry-level remarks on each of these aspects and its required or associated data types and considerations. 
I attempt to answer basic questions about each of the data types and methods, including (1) what are the data and methods, (2) how are they related to other frameworks and methods, (3) how have they been used in previous work, (4) what are the basic premises of the methods and how do they work, (5) why are they useful to further develop, (6) what are the pitfalls for new researchers to be aware of, and (7) how can we move forward in this field of integration?
Paleophylogeographic Species Distribution Models (PPGMs)
PPGMs are retrodictions of species' idealized geographic distributions based on phylogenetic comparative methods, modeled climate tolerances, and paleoclimate general circulation models (GCMs). This framework draws on evidence from evolutionary information in the form of phylogenetic relatedness from clades of extant closely related species, where and when there are associated fossil occurrences, and deep time paleoclimate. Thus far, PPGMs have been used to trace species range dynamics over shorter geologic time frames through glacial–interglacial cycles. These studies found that species ranges probably move more quickly than species adapt to new climate conditions (Lawing and Polly 2011; Rödder et al. 2013). Extending PPGMs over longer geologic time frames back to the Miocene shows that incorporating evolutionary history and phylogenetic comparative methods changes our understanding of deep time range shifts and helps pinpoint hotspots of ancient diversification (Lawing et al. 2016). This framework has been supported by deep time projections of physiological models of climate tolerance (Lawing et al. 2016). However, it is clear that more work is needed to better understand the evolution of physiological tolerances and how they relate to the climate space in which species occur. Rivera et al. (2020) honed this framework to investigate lineage-specific differences among congeners. They showed that large shifts in the climate system drove expansion and contraction of suitable habitat and that geologic events, such as orogeny, relate to diversification events.
Other frameworks have combined several of the same data sources and methodologies in different ways. One of the earlier studies to use phylogenetic comparative methods in combination with climate envelope modeling investigated factors that may have influenced speciation in a group of dendrobatid frogs (Graham et al. 2004). In that study, ancestral reconstructions of climate envelopes were calculated and compared with extant climate envelopes in a principal components space representing an ordination of all the environmental layers that were used to characterize species climate envelopes. Phyloclimatic modeling also combines climate envelopes and phylogenetic comparative methods to reconstruct the history of climate tolerances of species (Yesson and Culham 2006). It extended the previous framework to include projections of ancestral node estimates onto past paleoclimate scenarios.
Another implementation using fossils with phylogenetic comparative methods and climate envelope models revealed new information about the distribution of stem lineages that influence the interpretations of crown group diversification and ancient evolutionary history (Meseguer et al. 2015). This approach used climate envelope modeling and a scale-invariant Mahalanobis distance to represent a lineage's optimum climate envelope (Varela et al. 2011). The authors built paleoclimate envelope models from fossil occurrences and projected those models onto paleoclimate maps. The paleoclimate envelope models were not informed by extant species climate envelopes, but they did incorporate ancestral area reconstructions, combining multiple lines of evidence to better infer the biogeographic history of a genus.
The PPGM framework moves these methods forward in two ways. First, PPGMs incorporated a simple paleoclimate interpolation along with a phylogenetic climate envelope lineage interpolation to extract concerted reconstructions of paleoclimate and phylogenetically informed climate envelopes at multiple coincident time periods of the past (Lawing and Polly 2011). This allowed for more nuanced phylogenetic reconstruction of climate envelopes and more nuanced paleoclimate estimations between time periods where there are available global atmosphere and ocean circulation models reconstructing paleoclimate across geographic space. Second, PPGMs incorporated a method to include paleoclimate information associated with fossil localities into phylogenetic climate envelope reconstructions (Lawing et al. 2016; Rivera et al. 2020). If the paleoclimate information shows that a fossil occurred in a climate that is outside the distribution of current climates for a group under evaluation, then that information can improve our understanding of the evolution of climate envelopes and the paleobiogeographic reconstruction of species.
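The two coincident interpolations described above can be illustrated with a minimal sketch. It assumes plain linear interpolation in time, both between two dated paleoclimate layers and along a single phylogenetic branch; the interpolation actually used in the cited studies may differ, and all ages, temperatures, and names below are hypothetical.

```python
def interpolate_linear(age, age_young, value_young, age_old, value_old):
    """Linearly interpolate a value at `age` (Ma) between two dated values."""
    if not age_young <= age <= age_old:
        raise ValueError("age must lie between the two bounding ages")
    if age_old == age_young:
        return value_young
    fraction = (age - age_young) / (age_old - age_young)
    return value_young + fraction * (value_old - value_young)


# Hypothetical mean annual temperature (deg C) of one grid cell in two
# paleoclimate layers dated to 0 Ma and 5 Ma.
cell_temp_at_3ma = interpolate_linear(3.0, 0.0, 12.0, 5.0, 16.0)

# Hypothetical upper temperature limit of a lineage, reconstructed at a
# descendant node (1 Ma) and at its ancestral node (6 Ma).
lineage_max_at_3ma = interpolate_linear(3.0, 1.0, 18.0, 6.0, 23.0)

# Both estimates now refer to the same time slice, so they can be compared.
cell_is_suitable = cell_temp_at_3ma <= lineage_max_at_3ma
```

The point of the sketch is only that climate and climate envelope are brought to the same time slice before they are compared; a full analysis would repeat this for every grid cell, climate variable, and lineage.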
Data for Integration
Multiple data types are available for integration of paleontological and modern data and methods (Fig. 1). Data have been made more readily available through compilation of databases and accessible data portals (Uhen et al. 2013). Some of these data portals include paleontological and modern data, such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org). Others focus more closely on the compilation of specific modern or paleontological datasets. For example, iNaturalist is an online social network that compiles modern observations of biodiversity around the world but currently is heavily biased toward observations from Europe and North America (https://www.inaturalist.org). The Neotoma Paleoecology Database is a community database that compiles information about fossil data from the Pliocene to the Quaternary (www.neotomadb.org). The Paleobiology Database compiles data of fossil occurrences within collections that span all geologic ages (https://paleobiodb.org/#). GBIF compiles many of these more focused databases, yet not all information associated with fossil sites and occurrences is passed through to GBIF. This section explains data requirements for PPGM, where to find primary data, how some data are derived, and how other data are modeled. Each section addresses associated assumptions and uncertainties.
Modern Occurrence Data
Modern occurrence data are recorded observations of individual organisms often taxonomically identified to the species or subspecies level at specific geographic places and times. Occurrence data are systematically collected through surveys or, more often, opportunistically collected through incidental observations. Data are housed in museum collections with vouchered specimens or in online databases. GBIF is one of the most comprehensive online databases that collates and stores locality data for Earth's biodiversity obtained from numerous museums and observation networks. However, the often-incidental nature of many observations produces biases in these primary biodiversity data through space and time (Boakes et al. 2010; Beck et al. 2014), and there are notable gaps in distributions globally (Yesson et al. 2007; Collen et al. 2008).
Methods have been developed that attempt to account for bias in occurrence data. Those include subsampling the available occurrence data in geographic space (Hijmans 2012; Boria et al. 2014) or in environmental space (Varela et al. 2014) and weighting occurrences based on sampling effort (Stolar and Nielsen 2015). Environmental filtering, systematically subsampling occurrence data based on position in environmental space, is preferred to geographic filtering, systematically subsampling occurrence data based on position in geographic space, because environmental predictors are typically used to build climate envelope models, species distribution models (SDMs), or ecological niche models (ENMs) for species, and those are the relevant axes to deal with observation bias. In either case, bin or pixel sizes used to subsample observations influence the number of retained samples and influence model performance (Castellanos et al. 2019). Weighting subsamples of the occurrence data based on sampling effort is known to improve model predictions (Stolar and Nielsen 2015). Calibrating model evaluation statistics with a null model (Hijmans 2012), deriving a proxy variable for sampling effort (Fithian et al. 2015), and weighting samples by the inverse probability of sampling (Stolar and Nielsen 2015) are other useful ways forward.
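As a rough illustration of environmental filtering, the sketch below bins occurrences on a regular grid in environmental space and keeps one record per occupied bin. This is a simplified stand-in for the published filtering methods, not a reproduction of them; the function name, the two climate variables, and the record values are hypothetical.

```python
import math


def environmental_filter(occurrences, bin_widths):
    """Keep the first occurrence found in each bin of environmental space.

    occurrences: list of tuples of environmental values, one tuple per record.
    bin_widths: tuple of bin widths, one per environmental variable.
    """
    retained = {}
    for record in occurrences:
        bin_key = tuple(
            math.floor(value / width) for value, width in zip(record, bin_widths)
        )
        # setdefault stores a record only if its bin is still empty.
        retained.setdefault(bin_key, record)
    return list(retained.values())


# Hypothetical records: (mean annual temperature in deg C, annual precipitation in mm).
records = [
    (10.1, 505.0),
    (10.4, 520.0),
    (10.9, 590.0),
    (14.2, 510.0),
    (14.8, 1210.0),
]

coarse = environmental_filter(records, bin_widths=(2.0, 200.0))
fine = environmental_filter(records, bin_widths=(0.5, 50.0))
```

With these made-up records the coarse bins retain three occurrences and the fine bins retain four, which shows in miniature how the choice of bin size changes the number of retained samples.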
Sampling bias, among other factors such as biotic interactions and available climate in geographic space, contributes to the incomplete characterization of climate envelopes of species, and incomplete characterization has been shown to bias parameter estimates in evolutionary models (Saupe et al. 2018). This problem is exacerbated by anthropogenic influences on the ability of species to occupy their full range of climates (Pineda-Munoz et al. 2021). Correcting sampling bias in occurrence records has not yet been widely incorporated in climate envelope modeling, nor in PPGM-type models. The typical reasoning for using these simple modeling schemes is to allow for flexibility in the covariation of climates within climate envelopes and to attempt to more completely characterize certain aspects of a species' climate niche, in terms of minimum and maximum tolerances of climate, rather than allowing incomplete characterization to drive the relationships established between occurrences and climates. Regardless, it will be a fruitful path forward to carefully consider sampling bias and its implications for climate envelope modeling and PPGM.
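A climate envelope defined by minimum and maximum tolerances can be sketched in a few lines. The example below is an illustrative rectangular envelope under the assumption that each climate variable is treated independently, so no covariation among variables is imposed; variable names and values are hypothetical.

```python
def fit_envelope(climate_records):
    """Return {variable: (minimum, maximum)} across per-occurrence climate dicts."""
    variables = climate_records[0].keys()
    return {
        name: (
            min(record[name] for record in climate_records),
            max(record[name] for record in climate_records),
        )
        for name in variables
    }


def within_envelope(envelope, cell_climate):
    """True if every climate variable of a grid cell falls inside the envelope."""
    return all(
        low <= cell_climate[name] <= high for name, (low, high) in envelope.items()
    )


# Hypothetical climate values extracted at three occurrence localities.
occurrence_climate = [
    {"tmax": 31.0, "prec": 400.0},
    {"tmax": 35.5, "prec": 650.0},
    {"tmax": 33.2, "prec": 520.0},
]

envelope = fit_envelope(occurrence_climate)
inside = within_envelope(envelope, {"tmax": 34.0, "prec": 600.0})
outside = within_envelope(envelope, {"tmax": 34.0, "prec": 900.0})
```

Because the limits come straight from the observed extremes, an unsampled part of the range narrows the envelope, which is one way incomplete characterization can propagate into later steps.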
Fossil Occurrence Data and Age Ranges
Fossil occurrence data and age ranges stem from recorded observations of remains of organisms, their excrement, or their tracks, documenting presence at a particular geographic location and within a particular time range. Fossils representing occurrences can be fragmentary, weathered, or morphologically distorted through death, transport, deposition, and the fossilization process. However, it has been shown that fossils are rarely transported out of their original life habitats. Many species with robust parts (e.g., bones or shells) are found in death assemblages with high fidelity to their rank abundance in life assemblages, and time-averaging of fossil assemblages filters out short-term seasonal or yearly signals of variation (Kidwell and Flessa 1995). Thus, fossils provide meaningful information on ecological and evolutionary dynamics in shallow and deep time.
Taxonomic assignments of fossils are often easier to make at the genus level, rather than the species level, at least for many groups of vertebrate fossils, so many more fossils will be included in an analysis if genus-level identifications are allowed in a dataset. In fact, many paleontological studies use genera as a unit of study (Polly and Spang 2002), but it has been debated whether insights gained from analyses with genera “trickle down” to the species level and enhance our understanding of evolution (Hendricks et al. 2014). At the species or genus level, information about the paleoenvironment or paleoclimate associated with fossil occurrences can provide valuable information about where species lived in the past and can alter our understanding of the biogeographic history of a group (Meseguer et al. 2015; Lawing et al. 2016).
Another piece of critical information gained from fossil occurrence data is the estimated geologic time when the organism died or when the dead organism was deposited into a depositional environment. There are many strategies for numerical and relative dating of fossil deposits (Elias 2015), as well as age–depth models for inferring age in deposits that were not directly dated (Blaauw and Christen 2011). Estimates of geologic age are typically derived from fossilized organisms or from the sedimentary deposits where fossils were found. The sedimentary deposits are either dated or correlated into a time-calibrated stratigraphic column. For the purpose of integrating fossil occurrences with modern occurrences, it is useful to extract an age range from fossil occurrences, that is, the maximum and minimum possible geologic ages of a fossil.
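One way an age range might be used downstream is to match a fossil to every time slice it could belong to. The sketch below is only an illustration under that assumption; the time slices, their boundaries, and the fossil ages are hypothetical, and intervals are treated as closed, so an age exactly on a boundary matches both adjacent slices.

```python
def slices_overlapping(min_age, max_age, time_slices):
    """Return the names of time slices that overlap a fossil age range (ages in Ma).

    time_slices: dict mapping a slice name to (younger bound, older bound).
    """
    if min_age > max_age:
        raise ValueError("min_age must not exceed max_age")
    return [
        name
        for name, (young, old) in time_slices.items()
        if min_age <= old and max_age >= young
    ]


# Hypothetical one-million-year time slices.
time_slices = {
    "0-1 Ma": (0.0, 1.0),
    "1-2 Ma": (1.0, 2.0),
    "2-3 Ma": (2.0, 3.0),
}

# A hypothetical fossil whose age is constrained to 1.5-2.4 Ma.
candidate_slices = slices_overlapping(1.5, 2.4, time_slices)
```

Keeping every candidate slice, rather than a single midpoint age, carries the dating uncertainty forward into any later comparison with paleoclimate layers.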
There are multiple databases hosting information about the locality and deposits associated with occurrences of fossil specimens. A review of these sources for vertebrate fossils documents the nature, history, and development of multiple database efforts and how they interrelate (Uhen et al. 2013). Some of the databases discussed in that review include other types of data. For example, the Neotoma Paleoecology Database holds community-curated data in a data model framework that supports any type of paleoecological and paleoenvironmental data from sedimentary archives (Williams et al. 2018).
Modern Climate Data
Modern climate data are derived from weather stations across the globe. Weather stations systematically record the minimum temperature, maximum temperature, and precipitation on a daily basis. The temperature values are averaged within each month for 12 monthly estimates of minimum and maximum temperature, and the precipitation values are summed within each month for 12 monthly estimates of total precipitation, resulting in 36 variables representing 1 year of temperature and precipitation measures. Often these 36 variables are averaged across multiple years (Hijmans et al. 2005). Because weather is variable from year to year, it is useful to derive variables from these 36 measures that summarize the general climate patterns and that may be biologically meaningful for species (Nix 1986; Booth et al. 2014).
Weather stations are not uniformly distributed across the globe, so high-resolution interpolation has been used to estimate climate data for points where no primary climate information is available (Hutchinson 1991). Biases are introduced into the dataset from the choice of interpolation method and from the geographic bias in placement of weather stations. Because I am concerned here with comparing modern climate data with paleontological climate data, the variation produced from the biases in the modern climate data is low to negligible when compared with the variation in modeled climate data from the paleontological record.
Although the calendar months are a useful standard to summarize and store climate data, calendar months are not consistently biologically meaningful to species. For example, minimum temperature in January does not mean the same thing in Canada as in Australia for species experiencing their climate environment (i.e., a minimum temperature value in a cold month compared with a minimum temperature value in a warm month). Nix (1986) developed a framework, termed BIOCLIM, to combine the 36 climate variables into 19 biologically meaningful variables. The 19 variables represent means and extremes of temperature and precipitation at monthly, quarterly, and annual temporal scales. They have been used extensively in studies of species distribution modeling and as predictor variables for other biodiversity assessments. See Booth et al. (2014) for further explanation of deriving BIOCLIM variables and Hutchinson et al. (2014) for climate interpolation.
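To make the conversion concrete, the sketch below derives a subset of BIOCLIM-style summaries from the 36 monthly values. It assumes the widely used definitions (for example, monthly mean temperature taken as the average of monthly minimum and maximum, and quarters as any three consecutive months, wrapping across the year end); the function name, dictionary keys, and climate series are hypothetical, and the remaining variables of the full set of 19 are omitted.

```python
def bioclim_subset(tmin, tmax, prec):
    """Derive a few BIOCLIM-style summaries from 12 monthly values per variable."""
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("each series needs exactly 12 monthly values")
    tmean = [(low + high) / 2 for low, high in zip(tmin, tmax)]
    # A quarter is any run of three consecutive months, wrapping Dec -> Jan.
    quarter_tmean = [
        sum(tmean[(month + k) % 12] for k in range(3)) / 3 for month in range(12)
    ]
    return {
        "annual_mean_temperature": sum(tmean) / 12,              # BIO1
        "max_temperature_warmest_month": max(tmax),              # BIO5
        "min_temperature_coldest_month": min(tmin),              # BIO6
        "temperature_annual_range": max(tmax) - min(tmin),       # BIO7
        "mean_temperature_warmest_quarter": max(quarter_tmean),  # BIO10
        "annual_precipitation": sum(prec),                       # BIO12
        "precipitation_wettest_month": max(prec),                # BIO13
        "precipitation_driest_month": min(prec),                 # BIO14
    }


# Hypothetical Northern Hemisphere site, January through December.
tmin_n = [-15, -13, -7, 0, 6, 11, 14, 13, 8, 2, -5, -12]
tmax_n = [-5, -3, 3, 10, 17, 22, 25, 24, 18, 10, 2, -3]
prec_n = [20, 18, 25, 30, 50, 70, 80, 75, 55, 40, 30, 25]


def shift_six_months(series):
    """Move the seasons by half a year, as for a mirrored southern site."""
    return series[6:] + series[:6]


north = bioclim_subset(tmin_n, tmax_n, prec_n)
south = bioclim_subset(
    shift_six_months(tmin_n), shift_six_months(tmax_n), shift_six_months(prec_n)
)
```

The two sites have different January values but identical summaries, which is the sense in which these variables are independent of the calendar month.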
Paleoclimate Data and Models
Climate information from the geologic record is usually documented from tree rings, corals, ice cores, and sediment deposits (Fritts 1991; Evans et al. 2002; Jones et al. 2009). Just as is the case for weather stations, many climate proxies from the geologic record are geographically unequally distributed. But there are far fewer primary data extracted from the geologic record than there are weather stations, so interpolation techniques alone are not sufficient for estimating the geographic distribution of climate in the past. GCMs of the ocean and atmosphere model the geographic distribution of modern, future, and past climates (Randall et al. 2007). These models use atmospheric and ocean circulation process modeling combined with knowledge of exogenous forcing and boundary conditions from the geologic record to anchor model behavior. Important forcings include orbital changes, solar irradiance, explosive volcanicity, land surface characteristics, and aerosols (Jones et al. 2009). GCMs are calibrated over many time steps, and they often record minimum temperature, maximum temperature, and precipitation at each temporal step in the model; thus, those variables can be summarized as the BIOCLIM suite of 19 climate variables. See Nix (1986) and Booth et al. (2014) for an explanation of how to convert minimum temperature, maximum temperature, and precipitation variables to BIOCLIM variables.
Because GCMs are computationally intensive, we do not yet have comprehensive models of climate through all geologic time and space. At the global scale, GCMs are typically low resolution, and finer resolution GCMs have been developed by downscaling models using various techniques (Wilby and Wigley 1997). There are many GCM algorithms and boundary conditions, and each produces a different estimate of climate, so it is important to incorporate modeled climate data from multiple sources. There are several initiatives to calibrate GCMs and paleo-GCMs to make models more comparable, such as the Paleoclimate Modeling Intercomparison Project (Jungclaus et al. 2017; Kageyama et al. 2017, 2018; Otto-Bliesner et al. 2017).
Many of the modeling results describing spatial and temporal variation in paleoclimate are provided as part of a publication. Links to available modeling results are also compiled on relevant websites, such as the network of websites documenting the Paleoclimate Modeling Intercomparison Project. In addition to searching the web for modeling results, it is important to search through the literature for GCMs within relevant time intervals of interest. The results files for the GCMs may be made available from the corresponding authors. Some recent efforts have provided fine-resolution paleo-GCMs for time periods that have been less available to the research community. One example is the PaleoClim database, providing free, easily accessible, high-resolution paleoclimate surfaces of global terrestrial areas (Brown et al. 2018).
Phylogenetic Data
Phylogenetic information provides the hierarchical structure to cross taxonomic scales and integrate paleontological and modern occurrence data (Felsenstein 2004). In phylogenies, tips and nodes are linked together by branches, depicting a hypothesis about the relationship between tips or their topology. The relationships are modeled based on molecular or morphological similarities between tips. Tips are the operational taxonomic units used in a study; for studies on modern species, these are typically species, subspecies, or populations, and for studies on ancient species, these are typically species, genera, or even families. Nodes represent hypothetical ancestral taxa. Ultimately, it is important to understand how closely related to each other species and genera are and who is most closely related to whom. That information can be extracted from phylogenies in the form of topology and branch lengths.
To obtain phylogenetic information for the organisms of interest, one can build phylogenetic hypotheses or use phylogenetic hypotheses that have already been established. Baum and Smith (2013) and Lemey et al. (2009) provide an introduction to building phylogenies and phylogenetic analysis. Numerous phylogenetic studies have been published, and their resulting phylogenetic hypotheses are typically available as supplemental information. TreeBASE is an online database that hosts phylogenetic information and is a good resource for published phylogenies (Piel et al. 2000).
Often there are differing hypotheses from phylogenies built with different combinations of molecular and morphological data (Hillis 1987; Shaffer et al. 1997; Larson 1998; Swalla and Smith 2008), as well as differences in phylogenetic hypotheses when both modern and ancient operational taxonomic units are included in the analysis (Novacek 1992; Eklund et al. 2004; O'Leary and Gatesy 2008). Because there often is contention around which phylogenetic topology is best supported, it is important to collect multiple phylogenetic hypotheses and repeat analyses to gain an understanding of the range of potentially different results due to phylogenetic uncertainty. In addition, within the framework of PPGMs, for integration with fossil occurrence data and for projection onto relevant paleoclimate maps, time-calibrated phylogenies are required.
Under the Hood
Multiple methods are required for integration of paleontological and modern data within the context of PPGM. This section explains how six methods for integration work. Several of these methods, such as ecological niche modeling, species distribution modeling, and phylogenetic comparative methods, are massive fields and have had many articles and books written about them. Here, I intend to briefly introduce each method and highlight the relevant information required and considerations needed for integration in PPGM.
Modeling Ecological Niches and Species Distributions
Ecological niche modeling and species distribution modeling typically begin with compiling information on species occurrences, associating climate or environmental data with those occurrences, and applying an algorithm to estimate the suitable climate or environmental space that is or probably could be occupied by a species (i.e., estimating the climate or environmental niche) (Peterson et al. 2011). One then uses the parameters from that algorithm to project a potential distribution of a species into geographic space. The majority of ENMs and SDMs are correlative in nature, as they are often based on incidental observation data and associate occurrences with predictor variables (Elith and Leathwick 2009). Many algorithms have been described for the association of occurrences to predictor variables (Elith et al. 2006), multiple algorithm projections have been combined to reduce uncertainty in projections (Hao et al. 2019), and different algorithms have been shown to be appropriate in different situations (Elith and Graham 2009).
There are many good review papers and books that provide an introduction and review of species distribution modeling and its associated concepts of ecological, environmental, and climate niches (Austin 2007; Elith and Leathwick 2009; Franklin 2010; Peterson et al. 2011; Maguire et al. 2015). These overviews and reviews detail the many considerations that are required when modeling a species’ niche and its geographic distribution. More recently, guidelines have been developed to help researchers evaluate the quality of species distribution modeling studies and to help systematically account for all of the steps involved in building SDMs (Sofaer et al. 2019). I follow the recommendation of Peterson and Soberón (2012) and recognize that SDMs are inclusive of ENMs, but see Warren (2012) for further consideration of this topic. In this paper, when referring to a niche (ecological, climate, environmental, or otherwise), I am using the term consistent with a Hutchinsonian niche concept, which recognizes there is an n-dimensional hypervolume made up of biologically important axes that quantify where a species can live (Hutchinson 1957). I will use the term “climate” or “environmental niche” to explicitly refer to the type of predictor variables being used in conceptualizing the niche model. It is important to point out these practical aspects of terminology because of contention over the use and misuse of terminology and associated concepts in this field (Jiménez-Valverde et al. 2008; Peterson and Soberón 2012; McInerny and Etienne 2013).
To integrate paleontological and modern data in a phylogenetic framework, and specifically for use in PPGM, rectilinear climate envelope models have been used due to their simplicity and fidelity to the Hutchinsonian niche concept (Graham et al. 2004; Yesson and Culham 2006; Lawing and Polly 2011). The rectilinear climate envelope model is one way to characterize niche dimensions for use in projecting potential species distributions into geographic space. This method extracts a range from each climate or environmental variable associated with occurrence data, either the maximum and minimum or some narrower interval, such as the 5th and 95th percentiles, and considers the climate within that envelope suitable for the species being modeled. In geographic space, any point that fits within the ranges of all the climate variables included in the climate envelope model is considered suitable for the species. One drawback to the climate envelope method is that it oversimplifies the ecological niche and potential geographic distribution of modern species. However, other algorithms that have been shown to perform well in characterizing the ecological niche and potential geographic distribution of a species, such as maximum entropy and boosted regression trees (Elith et al. 2006), have multiple parameter estimates and complicated associations or breakpoints between occurrences and predictor variables. So far, it has been unclear how to model their parameters along a phylogenetic tree in a phylogenetic comparative methods framework.
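The envelope logic described above can be illustrated with a minimal sketch. The variable names, climate values, and the helper functions `percentile`, `fit_envelope`, and `is_suitable` are hypothetical and chosen only for illustration; this is not the code of any particular package.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fit_envelope(occurrence_climate, lower_q=0.0, upper_q=100.0):
    """Return {variable: (low, high)} from climate sampled at occurrences.

    The defaults give a minimum/maximum envelope; lower_q=5 and
    upper_q=95 give the 5th to 95th percentile envelope.
    """
    return {
        var: (percentile(vals, lower_q), percentile(vals, upper_q))
        for var, vals in occurrence_climate.items()
    }


def is_suitable(envelope, cell_climate):
    """A map cell is suitable only if every variable lies inside its range."""
    return all(low <= cell_climate[var] <= high
               for var, (low, high) in envelope.items())


# Hypothetical climate values extracted at five occurrence points.
occurrences = {
    "mean_annual_temp_c": [12.0, 14.5, 15.0, 17.0, 21.0],
    "annual_precip_mm": [300.0, 420.0, 500.0, 610.0, 900.0],
}
envelope = fit_envelope(occurrences)
inside = is_suitable(envelope, {"mean_annual_temp_c": 16.0, "annual_precip_mm": 450.0})
outside = is_suitable(envelope, {"mean_annual_temp_c": 25.0, "annual_precip_mm": 450.0})
print(envelope, inside, outside)
```

Because the envelope reduces each variable to two numbers, those numbers can be treated as traits in later phylogenetic steps, which is the property that more complex algorithms lack.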
Projecting ENMs forward and backward in time has now received considerable attention, as models typically do not perform well under new conditions, which are known as non-analogue climate scenarios (Fitzpatrick and Hargrove 2009; McGuire and Davis 2013; Davis et al. 2014; Moreno-Amat et al. 2017). This problem is particularly relevant when projecting models to the past, when there was quite a bit of non-analogous climate compared with modern climates (Fitzpatrick and Hargrove 2009). One method to improve model projections is to incorporate fossil occurrences into ENMs along with extant occurrences (Varela et al. 2009, 2011). This accounts for shifts in the realized niche of a species through time and is meant to more closely approximate its fundamental niche. In addition, directly projecting niche models built with modern data does not incorporate the potential for niche evolution. Thus, PPGMs and other methods have been developed to take into consideration the potential evolution of a niche and the vastly different climates in which the close relatives of modern species occur.
Phylogenetic Comparative Methods
Phylogenetic comparative methods are typically used to correct for non-independence of samples in comparative studies with multiple species (Felsenstein 1985), to study the processes of evolution and speciation among multiple species (Harvey and Pagel 1991), or to infer character states of hypothetical ancestral species (Martins 1999; Omland 1999). Biologists have typically used these methods to learn about the history of organisms by using modern information stored in species’ DNA, and paleontologists have compared model results with fossil data to demonstrate model reliability and uncertainty (Polly 2001).
Brownian motion has traditionally been used to model the amount of expected evolutionary change, or accumulated variation, over a specified number of time steps (generations) with either no selection or randomly varying selection acting on a phenotype (Harvey and Purvis 1991). This is a one-parameter model that estimates evolutionary rate. Other models of evolution have been described that might more accurately represent the evolutionary history of a phenotype (Butler and King 2004; Boucher et al. 2014). Notably, the Ornstein-Uhlenbeck model has been used to model selection of a trait toward an optimum and might be particularly important for climate studies, as Lawing et al. (2016) showed that much variation in climate variables among species is best modeled by an Ornstein-Uhlenbeck process. The Ornstein-Uhlenbeck model is typically a two- or three-parameter model that estimates the evolutionary rate and the strength of selection (also known as the selection coefficient or alpha) toward a fixed optimum. If the optimum is not in the same location as the mean of the population, then the third parameter of the Ornstein-Uhlenbeck model is the location of the optimum. There are multiple review papers that introduce phylogenetic comparative methods and explain their various categories and uses (Miles and Dunham 1993; Martins and Hansen 1996; O'Meara 2012; Pennell and Harmon 2013; Cooper et al. 2016).
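To make the difference between the two models concrete, the following sketch simulates a single trait through discrete time steps. It assumes a simple Euler discretization; the function `simulate_trait` and all parameter values are hypothetical. Setting `alpha` to zero gives Brownian motion (rate `sigma` only), and a positive `alpha` adds the Ornstein-Uhlenbeck pull toward the optimum `theta`.

```python
import random


def simulate_trait(x0, sigma, steps, dt=1.0, alpha=0.0, theta=0.0, rng=None):
    """Simulate one trait path; alpha == 0 is Brownian motion, alpha > 0 is OU."""
    rng = rng or random.Random(0)
    x = x0
    path = [x]
    for _ in range(steps):
        pull = alpha * (theta - x) * dt                    # selection toward optimum
        noise = sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)  # random evolutionary change
        x = x + pull + noise
        path.append(x)
    return path


# Brownian motion: variance among replicate paths keeps growing with time.
bm_path = simulate_trait(x0=20.0, sigma=0.5, steps=100, rng=random.Random(1))

# Ornstein-Uhlenbeck: paths are drawn back toward theta, so variance levels off.
ou_path = simulate_trait(x0=20.0, sigma=0.5, steps=100, alpha=0.2, theta=25.0,
                         rng=random.Random(1))
print(bm_path[-1], ou_path[-1])
```

With `sigma` set to zero the Ornstein-Uhlenbeck path moves deterministically toward `theta`, which isolates the role of the selection coefficient.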
Phylogenetic comparative methods have been employed to study the evolution of a climate niche and physiological tolerances of organisms. To integrate these methods with ENMs, researchers considered parameters from ENMs (such as the maximum and minimum value of a climate envelope meant to represent a climate niche) as phenotypes for a species. These climate parameters are treated as species traits or phenotypes and regressed along phylogenies according to a specified model (or models) of evolution. Evolutionary parameters associated with the model, such as evolutionary rate, the selection coefficient, and the optimum, are estimated. These estimates are then used to reconstruct the histories of a climate niche.
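As a minimal illustration of treating an envelope parameter as a trait, the sketch below estimates its value at the root of a small tree under Brownian motion, using the standard recursive weighting by inverse branch length (as in independent contrasts). The tree encoding and the function `root_estimate` are hypothetical, only the value returned for the root is the maximum likelihood estimate, and other models of evolution would need different machinery.

```python
def root_estimate(node):
    """Return (estimate, extra_variance) for a node under Brownian motion.

    A tip is a float trait value (e.g., the maximum temperature of a
    species' climate envelope). An internal node is a pair of
    (child, branch_length) tuples.
    """
    if isinstance(node, (int, float)):
        return float(node), 0.0
    (left, v_left), (right, v_right) = node
    x_left, extra_left = root_estimate(left)
    x_right, extra_right = root_estimate(right)
    # Lengthen each branch by the uncertainty of the estimate below it.
    v_left += extra_left
    v_right += extra_right
    # Children on shorter branches receive more weight.
    estimate = (x_left / v_left + x_right / v_right) / (1.0 / v_left + 1.0 / v_right)
    extra = v_left * v_right / (v_left + v_right)
    return estimate, extra


# Hypothetical tree ((A:1, B:1):1, C:2) with envelope maxima 10, 14, and 20.
tree = (((10.0, 1.0), (14.0, 1.0)), 1.0), (20.0, 2.0)
root_value, _ = root_estimate(tree)
print(root_value)
```

The same calculation would be repeated for each envelope parameter (for example, the minimum and maximum of every climate variable) to assemble a reconstructed envelope.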
Estimates of the history of a climate niche using only extant species information and their phylogenetic relationships will not allow for reconstructions outside the distribution of climate parameters among the tip taxa. This is a problem, because we know that even as recently as the last glacial maximum (26–19 ka) there was a reasonable amount of non-analogous climate, populations of species closely related to those that occur now also occurred during that time, and climate during that time does occur outside the climate envelopes of extant species. Thus, PPGMs and other methods have developed procedures to incorporate evidence of past climate that is geographically associated with fossil occurrence data by incorporating fossil occurrences into a phylogenetic reconstruction.
Anchoring Phylogenetic Comparative Methods with Fossil Occurrences
There have been many efforts to incorporate fossil information to inform phylogenetic methods (Finarelli and Flynn 2006; Pyron and Burbrink 2012; Hunt 2013; Slater 2013; Slater and Harmon 2013). These typically focus on time-calibrating trees with fossil information (Felsenstein 2002; Pyron 2011; Ronquist et al. 2012; Bapst 2013) and on tree building that incorporates total evidence from morphological and molecular data into character matrices to analyze and develop hypotheses about the relationships between species, extant and extinct (Williams 1994; Purvis 1995; Ronquist et al. 2012). Incorporating paleoecological or paleoclimate information associated with ancient species has been less explored, but it is important to consider: the information associated with fossils allows us to anchor models in the past, and it becomes more valuable as better proxies and GCMs provide more realistic reconstructions of the past climates that species would have encountered.
Ideally, the species or genera associated with the modern and fossil occurrences being modeled would have one or more time-calibrated phylogenetic trees that incorporate all extant and extinct species in the study. In this case, regular phylogenetic comparative methods can handle incorporating modern and paleontological information about climate niches. There are occurrences in the fossil record that are assigned to extant species or genera. In the case of fossil occurrences assigned to extant species, one may incorporate the paleoclimate associated with the fossil occurrences directly into the ENM for the extant species. More often, at least for vertebrate species, fossil occurrences are assigned to a genus, but the species affinity is unknown.
One way to deal with the unknown placement of a fossil occurrence within a phylogeny is to repeat a randomization procedure for its placement, perform a phylogenetic comparative analysis, evaluate the model, and extract important parameter estimates (Lawing et al. 2016; Rivera et al. 2020). After this procedure is repeated many times, a distribution of important parameter estimates is available for comparison to the original phylogenetic comparative method performed with no fossil occurrences included. This anchoring procedure can be used to evaluate the usefulness of anchoring a phylogenetic comparative reconstruction with fossil occurrences. The fossil occurrences will only introduce noise in the analysis if they occur within the range of extant variation. However, they will provide useful insight into ancestral reconstructions if they occur in places with paleoclimate estimates outside the range of extant climates associated with modern occurrences (Fig. 2).
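The randomization step might be sketched as follows. This is only an illustration under stated assumptions: branches are represented as hypothetical dictionaries with older and younger ages in Ma, a fossil may attach to any branch that has a portion at least as old as the fossil, and the attachment age is drawn uniformly along that portion. Published implementations may constrain placement differently, for example to branches within the fossil's genus.

```python
import random
from collections import Counter


def random_placement(branches, fossil_age_ma, rng):
    """Pick a branch and an attachment age for a fossil of unknown affinity."""
    # Only branches with a portion older than the fossil can host it.
    eligible = [b for b in branches if b["older_ma"] > fossil_age_ma]
    if not eligible:
        raise ValueError("fossil is older than every branch in the tree")
    branch = rng.choice(eligible)
    youngest_allowed = max(branch["younger_ma"], fossil_age_ma)
    attach_age = rng.uniform(youngest_allowed, branch["older_ma"])
    return branch["name"], attach_age


# Hypothetical time-calibrated tree ((A, B), C) with the root at 6 Ma.
branches = [
    {"name": "root_to_AB", "older_ma": 6.0, "younger_ma": 3.0},
    {"name": "AB_to_A", "older_ma": 3.0, "younger_ma": 0.0},
    {"name": "AB_to_B", "older_ma": 3.0, "younger_ma": 0.0},
    {"name": "root_to_C", "older_ma": 6.0, "younger_ma": 0.0},
]
rng = random.Random(42)
placements = [random_placement(branches, 2.0, rng) for _ in range(1000)]

# In a full analysis each placement would be followed by a phylogenetic
# comparative fit; here the placements are only tallied by branch.
tally = Counter(name for name, _ in placements)
print(tally)
```

Collecting a parameter estimate from each replicate, rather than a tally, would yield the distribution that is compared with the fossil-free analysis.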
Coherent Models for Projection from Lineage Interpolation
Ancestral reconstructions for phylogenetic comparative methods produce estimates for hypothetical ancestral nodes. Those nodes are located within the phylogeny at a place and time that depends on the amount of similarity between taxa in the study and not on particularly important points in the geologic past. Thus, the estimated times of the ancestral node reconstructions do not necessarily line up with the times of the available paleo-GCMs. Matching ancestral climate estimates through lineage interpolation with paleoclimate interpolations for projection was a novel implementation from a PPGM-type analysis (Lawing and Polly 2011). Lineage interpolation uses the evolutionary parameters from best-fit models from a phylogenetic comparative analysis to interpolate along a branch (or lineage) between tips and nodes or between nodes and nodes. Estimates of a climate niche can be extracted from the lineage interpolation for any specified time since the most recent common ancestor of a clade. These interpolation methods allow for the production of coherent time-calibrated models of a past climate niche to project onto an appropriate time-calibrated map of paleoclimate (Fig. 3).
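For the simplest case, lineage interpolation can be sketched as below. The sketch assumes Brownian motion and treats the two branch endpoints as known, in which case the expected trait value at an intermediate time is a straight line between them; other best-fit models would give a different curve. The function `interpolate_lineage` and the values are hypothetical.

```python
def interpolate_lineage(anc_value, anc_age_ma, desc_value, desc_age_ma, target_age_ma):
    """Expected trait value at target_age_ma along one branch under Brownian motion."""
    if not desc_age_ma <= target_age_ma <= anc_age_ma:
        raise ValueError("target age must fall on the branch")
    # Fraction of the branch elapsed between the ancestor and the target time.
    fraction = (anc_age_ma - target_age_ma) / (anc_age_ma - desc_age_ma)
    return anc_value + fraction * (desc_value - anc_value)


# Hypothetical branch: node at 6 Ma with an envelope maximum of 24 degrees C,
# extant tip at 0 Ma with 30 degrees C; a paleoclimate layer exists for 3 Ma.
max_temp_3ma = interpolate_lineage(24.0, 6.0, 30.0, 0.0, 3.0)
print(max_temp_3ma)  # 27.0
```

Applying the same function to every envelope parameter at the age of each paleoclimate layer gives an envelope that is aligned in time with the map it is projected onto.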
Paleoclimate Interpolations
Paleoclimate interpolations use linear interpolations weighted by stable oxygen isotope values between climate extremes from geologically interesting end points modeled with paleo-GCMs, GCMs, or modern climate data (Fig. 4). So far, these interpolations have used one global proxy of climate to proportionally adjust climate values between two or more extremes (Lawing and Polly 2011; Lawing et al. 2016; Gamisch 2019). The adjustment is applied uniformly across the globe. Other proxies for deep-ocean and surface temperatures include alkenones (Bard 2001) and Mg/Ca from benthic foraminifera (Billups and Schrag 2002), which have been used to successfully reconstruct global temperatures and could be explored as other proxies for paleoclimate interpolations.
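The proportional adjustment can be sketched as follows, assuming two end-point climate layers stored as flat lists of grid-cell values and one global isotope value per time slice. The function `interpolate_climate`, the grid values, and the isotope values are hypothetical and are not taken from any published curve.

```python
def interpolate_climate(layer_a, layer_b, d18o_a, d18o_b, d18o_target):
    """Interpolate every grid cell between two end-point layers.

    The weight is the position of the target isotope value between the
    isotope values of the two end points; the same weight is applied to
    every cell, so the adjustment is uniform across the globe.
    """
    weight = (d18o_target - d18o_a) / (d18o_b - d18o_a)
    return [a + weight * (b - a) for a, b in zip(layer_a, layer_b)]


# Hypothetical mean annual temperature (degrees C) in three grid cells.
modern_layer = [15.0, 22.0, 5.0]
glacial_layer = [9.0, 19.0, -4.0]

# Hypothetical benthic d18O values for the end points and an intermediate time.
intermediate = interpolate_climate(modern_layer, glacial_layer,
                                   d18o_a=3.2, d18o_b=5.0, d18o_target=4.1)
print(intermediate)
```

Because one global weight is used, spatial patterns in the interpolated layer are inherited entirely from the two end-point layers.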
Without a doubt, GCMs are preferable to paleoclimate interpolations, because they account for complex processes of ocean and atmospheric circulation. However, GCMs are computationally intensive and so have not been modeled for all time periods. Stable oxygen isotope ratios from benthic foraminifera record a global signal of changes in temperature and are useful proxies for changes in global climate (Zachos et al. 2001; Lisiecki and Raymo 2005; Cramer et al. 2009). There is a reasonable amount of variation between the climate estimates produced by some GCMs. The simple linear interpolation method, paleoclimate interpolations, shows less variation between an interpolated paleoclimate and two paleo-GCMs than between the two paleo-GCMs for a test period during the Holocene (Lawing and Polly 2011). A new suite of interpolated paleoclimate layers is available at 10 kyr time intervals back to 5.4 Ma at a spatial resolution of 2.5 arc-minutes (Gamisch 2019). However, the procedure could be improved by incorporating more GCM layers to anchor the interpolation to capture deeper time paleoclimate alterations. Gamisch (2019) also provides a detailed protocol for the paleoclimate interpolation procedure.
Multivariate Environmental Similarity Surface through Time
Rectilinear climate envelope models identify whether geographic places fall within or outside a defined climate niche. Some studies projecting models built with only modern occurrences onto climates of the past find no suitable area for species (Rödder et al. 2013; Franklin et al. 2015). Instead of showing that no climate is suitable, it is often more interesting to determine how close the climate is to a climate envelope. Elith et al. (2010) developed a method, multivariate environmental similarity surface (MESS), to calculate how similar the suite of climate variables at a location is to suitable conditions. To calculate similarity between a reference set (here the set of observations occurring within a climate envelope) and each sample point in geographic space, the Euclidean distance is measured from the edge of each variable in the climate envelope to the particular value of the climate variable at the sample point and summed. MESS maps highlight the geographic areas that are within a climate envelope and the level of similarity of areas that are outside a climate envelope. MESS is particularly useful in evaluating PPGM predictions, because of the non-analogous nature of modeled past climates (Rivera et al. 2020).
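The distance calculation as described in this section can be sketched in a few lines. The sketch follows the summed edge distance described here rather than the original formulation of Elith et al. (2010), which scales each variable against the reference distribution and reports the minimum similarity across variables. The function `envelope_distance` and the values are hypothetical, and in practice variables with different units would probably need to be rescaled before summing.

```python
def envelope_distance(envelope, cell_climate):
    """Sum, over variables, of the distance from a cell's value to the envelope.

    A variable inside its range contributes zero, so a total of zero
    means the cell lies within the climate envelope.
    """
    total = 0.0
    for var, (low, high) in envelope.items():
        value = cell_climate[var]
        if value < low:
            total += low - value
        elif value > high:
            total += value - high
    return total


# Hypothetical envelope and two paleoclimate grid cells.
envelope = {"mean_annual_temp_c": (12.0, 21.0), "annual_precip_mm": (300.0, 900.0)}
cell_inside = {"mean_annual_temp_c": 16.0, "annual_precip_mm": 450.0}
cell_outside = {"mean_annual_temp_c": 25.0, "annual_precip_mm": 250.0}

print(envelope_distance(envelope, cell_inside))   # 0.0
print(envelope_distance(envelope, cell_outside))  # 54.0
```

Mapping this value for every cell of a paleoclimate layer distinguishes areas that narrowly miss the envelope from areas that are far from it.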
Integration with PPGM
Earlier, I described the various data types and methods that are required to build PPGMs for a group of species. Integrating this information into a framework to project species climate envelopes onto paleoclimate maps through time requires multiple steps. (1) Obtain and clean species occurrence data for all extant species included in the analysis. (2) Obtain and clean all fossil occurrence data for relevant species or genera included in the analysis. (3) Obtain one or more time-calibrated phylogenetic trees. (4) Determine the relevant descriptors of the climate niche for all species in the analysis. (5) Calibrate SDMs for each species in the study using a rectilinear climatic envelope model to determine the maximum and minimum, or 5th and 95th percentiles, of relevant descriptors of the climate niche. (6) Add fossils into the phylogenetic trees according to the described randomization procedure or constrained to more appropriate locations in the phylogenies. (7) Obtain paleoclimate information from GCMs for relevant time periods. (8) Extract relevant descriptors of the climate niche at fossil locations from paleoclimate maps. (9) Use phylogenetic comparative methods to estimate climate envelopes at hypothetical ancestral nodes. (10) Interpolate between node reconstructions and extant species at relevant time periods. (11) Project climate envelope reconstructions onto paleoclimate maps that have been aligned for each relevant time period for each lineage of the phylogeny. (12) Conduct post hoc comparisons of the projections to address biogeographic hypotheses, which might include the use of MESS to characterize the similarity of an entire paleoclimate surface to a specified climate envelope.
This method is probably most beneficial when there is a reasonable amount of observation data and phylogenetic information for an extant species group and at least some fossils identified as belonging within the crown group. In addition, groups that have good information on their physiological tolerances to climate will be particularly fitting. Over shallow time periods during the Quaternary, consideration of evolutionary change in physiological tolerances is likely less important because there is less time for evolution and speciation to occur, so it would be less useful to go through the process of modeling phylogenetic changes when they might not influence projections of climate envelopes into paleoclimate space. This would be true for species in which speciation occurs over millions of years, but it would not be true for species with shorter times to speciation. At deeper time periods, it is essential to consider species evolutionary change.
Caveats with this methodology include the assumption that the climate niche evolves, that we can capture the evolution of the climate niche using parameters associated with its distribution, and that those parameters are related to physiological requirements of a species (Meik et al. 2015). Climate data as a proxy for physiological tolerances are probably not adequate. Addo-Bediako et al. (2000) found that, in insects, although species maintain little variation in upper thermal limits across their geographic ranges, they have more variable lower thermal limits that decline with increasing latitude. Gouveia et al. (2014) show that, in anurans, upper thermal limits are related to the position of the climate niche in climate space but do not relate to the maximum temperature extracted from the geographic range.
One way forward is to use principles of biophysical or physiological ecology to model the climate niche of species (i.e., mechanistic models), instead of the climate envelope models described here, which are considered a correlative approach to species distribution modeling. Some researchers have advocated using mechanistic models derived from species physiology to build algorithms to estimate the climate or environmental niche in place of the first two steps of a correlative SDM of collecting species occurrence data and associated climate or environmental data (Kearney and Porter 2009). This is an interesting path forward, because the physiological parameters might be considered phenotypes on which natural selection could act, more directly linking phylogenetic models with models of a species’ distribution. However, mechanistic models require very specific physiological data for organisms, with extensive validation from the field and lab, whereas correlative models based on observational data can be more readily populated with data that are already available.
Another caveat tangentially related to the caveats already presented is the incomplete characterization of the climate niche. Due to expected biotic influences on species geographic distributions and the variation in available climate space through time, occurrences of species are not expected to capture the full range of climates in which a species may be able to survive and reproduce. Saupe et al. (2018) investigate the effects of incomplete characterization of climate niches by modeling the evolution of a couple of climate niche variables in virtual species. They find that the incomplete characterization of niches increases rates of niche evolution and biases in the comparisons of evolutionary patterns between clades. They caution researchers to beware of these effects and to correct for them by estimating niche truncation. One way to check for niche truncation is to test whether species distributions are in equilibrium with modern climate (Araújo et al. 2005; Munguía et al. 2012). However, even if the species distributions are in equilibrium with modern climate, there remain potential gaps in climate space not occupied by available modern climate. If those gaps occur on the edges of species climate niches, then niche truncation could occur. Including younger fossils in the characterization of the climate niche might offer a more complete characterization of a truncated niche (Varela et al. 2009, 2011).
Even with these caveats, this method remains interesting to investigate and improve upon, because it provides an avenue for developing models of species potential distributions through time, while accounting for evolutionary and climate change (Rivera et al. 2020). The results of these models can also be harnessed to provide various expectations of past community composition, which could be compared with observed past communities. These investigations would improve our understanding of how compositional changes and non-analogous compositions affected past ecosystem dynamics.
Conclusions
We can gain critical insight into biotic response to climate and environmental change by integrating modern and paleontological data, along with phylogenetic comparative methods, ecological niche and species distribution modeling, and paleoclimate interpolations. The approach described here could be broadly applied to integrative studies addressing questions about biota that cross spatial and temporal scales, including investigating biodiversity patterns, macroevolution, community assembly and disassembly, and ecological resilience. Study designs that iterate through divergent assumptions, such as parameters that emphasize niche evolution contrasted to niche conservatism, will result in a suite of possible outcomes that could be evaluated to gain insight into ecological and evolutionary processes governing the distribution of species and their responses to environmental change.
There are multiple study designs that will accommodate the integration of paleontological and neontological datasets. One approach to evaluate biotic response to environmental change is to use methods designed for paleontology and paleontological data to forecast biotic response and compare it with modern biodiversity data. Another approach is to use methods designed for modern observations and inference, project those back in time, and compare the projections with paleontological data. Although these are powerful approaches, especially for model validation, there are several considerations when making these comparisons; see Willig (2003) for a discussion on factors that limit our understanding of biodiversity in space and time. Importantly, the modeling procedure and validation dataset can be mismatched in spatial and temporal scale, so it may be unclear whether a validation procedure fails because the modeling does not accurately capture the important biological processes or because of the spatial and temporal mismatch of the datasets.
A third approach is focused on integrating modern and paleontological data into the same algorithmic procedures for making inferences through new methods development. This approach accommodates the inclusion of both paleontological and neontological data sources and specifically deals with aligning spatial and temporal scales for integration. PPGM-type modeling relies on this third approach, and the associated methods still require multiple aspects of development. The most pressing development needs include understanding how to better characterize niches of species, how those relate to genus occurrences in the fossil record, and how to best incorporate phylogenetic modeling for complicated niche characterization algorithms. It is also critical to better understand the link between physiological ecology and climate tolerances and to further investigate whether and how climate niches evolve.
Modeling species' potential distributions in deep time also remains challenging because of limitations in the available occurrence data. For many species, there is sparse coverage of spatial and temporal occurrences in the fossil record. Kemp and Hadly (2016) highlight the taxonomic biases present in available data. Targeted sampling will be required to gain more comprehensive coverage for some species. In addition, we need more paleoclimate general circulation models to describe distributions of climate through time and to help interpret the geographic distribution of ancient climate availability. So far, PPGM-type modeling has been applied to only a couple of groups of squamate reptiles and to North American chelonians. It is important to extend the application of these methods to species groups with more numerous fossils and with more taxonomically resolved fossil identifications.
Many of the biological and paleontological data we rely on for these modeling efforts are supported by natural history collections (Cook and Light 2019). But natural history collections are struggling, as they are underfunded and undersupported, and many important collections have even been closed (Dalton 2003; Schilthuizen et al. 2015). In addition, there is a dearth of researchers depositing new specimens into collections (Turney et al. 2015; Salvador and Cunha 2020). We need more support for natural history collections in the twenty-first century and more support for new users and depositors of voucher specimens (Miller et al. 2020).
Despite the complexities and caveats, it is useful to continue to develop ways to further integrate data and methods across the biology–paleontology spectrum. These methods allow us to meaningfully incorporate paleoclimate data associated with fossils into phylogenetic comparative analyses to anchor reconstructions and better gauge the evolutionary tempo and mode of climate tolerances. They allow us to test current biogeographic hypotheses and develop new suites of hypotheses to better understand geographic shifts in species distributions in response to past global change events. In addition to providing insight into ecological and evolutionary processes that support biodiversity, these past modeled responses may serve as a comparison to recent, modern, and future projected responses to global change.
Acknowledgments
I would like to thank J. Lamsdell and C. Congreve for the invitation to speak at the GSA symposium titled “Phylogenetic Paleoecology: Macroecology within an Evolutionary Framework” and for editing this volume of papers presented at the symposium. I would also like to thank J. Lamsdell for his encouragement to write this article and two anonymous reviewers for providing insightful feedback that improved this article. This work was partly supported by the USDA NIFA Hatch TEX09600 project 1003462 and 1020451 and by the Integrative Climate Change Biology and Conservation Paleobiology in Africa programs of the International Union of Biological Sciences.