Impact Statement
This position paper reviews the severe limitations of max-stable processes for environmental extreme data science, and discusses recently emerged, more appropriate alternative statistical frameworks for the modeling of spatial extremes. The use of machine learning and artificial intelligence methods in spatial extreme-value modeling and inference is also discussed, and seven key recommendations to push the field forward are given.
1. Introduction
With the rise of statistical machine learning that marks the “data science revolution” (Donoho, 2017), and the increasing availability of massive high-quality environmental data products based on observation and simulation (e.g., large climate model ensembles, see Danabasoglu et al., 2020; reanalysis data, see Hersbach et al., 2020; remote sensing, see OCO-2 Science Team et al., 2020; wide in-situ observation networks, see Menne et al., 2009; mobile sensors or citizen-science data, see https://www.nesdis.noaa.gov/about/citizen-science), the relevance of traditional statistical models is more in question than ever. This is especially true for the modeling and prediction of environmental extreme events, where assumptions are crucial for accurate risk assessment and mitigation, and where applied findings are of key societal importance on a global scale (IPCC, 2023). On the one hand, sophisticated models that are well supported by probability theory are desired, in order to provide sound probability estimates of future low-likelihood-high-impact events that occur in the tail of the distribution. On the other hand, practical considerations should guide the model construction to ensure that it can efficiently utilize available data and provide the answers needed to appropriately address the specific scientific problem of interest. In particular, the variability of processes along the space and time dimensions usually plays a key role in environmental science, and a given statistical model should capture the most important marginal and dependence features of the data, such as spatiotemporal trends and nonstationarity, non-Gaussianity, and subasymptotic forms of tail dependence (see Section 3 for details). Such models should also enable fast-enough inference, which includes model fitting, validation, simulation, and prediction.
The speed at which a statistical extreme-value analysis must be performed, and the amount of human and material resources required to achieve it, depend strongly on the context: while spending months or years could be acceptable for academic purposes or for retrospective studies, it is crucial in some cases to complete the analysis within just a few days or weeks (as, e.g., with rapid extreme-event attribution studies responding to the media about the role of anthropogenic forcings in the occurrence of a recent catastrophic event; see Stott et al., 2004; Risser and Wehner, 2017) or even “online” (as, e.g., with operational early warning systems predicting natural hazards in real time, where the safety of people or infrastructure is at risk; see Nguyen et al., 2023). Oftentimes, however, the requirements for rigor and speed are at odds with each other: popular spatial model classes arising from asymptotic extreme-value theory are often computationally prohibitive due to their intricate probabilistic structure, or subject to important practical restrictions that hamper their widespread application in real operational settings, where data are generally big and complex. In this article, we recall that an asymptotic motivation should never supersede applied scientific considerations and proper model checking. Importantly, we argue that the class of max-stable processes (MSPs), characterized by extreme-value copulas, whose practical usage by statisticians and climate scientists has multiplied since the article of Padoan et al. (2010), not only has many severe built-in limitations, but also fails to address the basic purpose it was designed for: namely, to provide a suitable statistical framework for modeling spatial extremes and estimating small joint tail probabilities (or, similarly, high return levels of spatial aggregates) far beyond observed levels. The enthusiasm about, and adoption of, MSP models in environmental studies—for which we are partly responsible—is due to their solid theoretical foundation, which makes them appear as “natural” models to use; the fact that the extreme-value community has traditionally been more theory-oriented, while being somewhat “detached” from concrete issues arising in real applications; and the availability of convenient user-friendly software, such as the R package SpatialExtremes (Ribatet, 2022), to fit and simulate these models. While (part of) the statistics of extremes community has already realized some of their limitations (Davison et al., 2019; Huang et al., 2021; Huser and Wadsworth, 2022) and started to develop alternative modeling strategies that transcend the classical framework (Wadsworth and Tawn, 2012; Opitz, 2016; de Fondeville and Davison, 2018; Huser and Wadsworth, 2019; Engelke and Hitz, 2020; Huser et al., 2021; Wadsworth and Tawn, 2022; Castro-Camilo et al., 2022), MSPs and extreme-value copulas continue to be used in many spatial data applications and to be treated as the “default” option in simulation studies.
This article aims to openly discuss the known deficiencies of MSPs, and to strongly encourage statisticians and climate scientists to move away from them in real applications unless no better alternative is available. We argue that, as a community, it is now time to reflect and act upon the lessons learned over the past two decades, and to move on with the broader adoption of more recent, flexible, efficient, and pragmatic solutions for the modeling of extremes in operational risk assessment studies and, more generally, environmental data science. We stress that extreme-value theory remains the appropriate framework to conduct reliable statistical analyses in the data-scarce setting of extreme events, but its historical standard models, MSPs, are often not the most appropriate tools, especially for environmental data applications.
The rest of this article is organized as follows. In Section 2, we review MSPs and their main limitations. In Section 3, we discuss alternative modeling strategies. In Section 4, we conclude with some final remarks, and an outlook on the future of environmental extreme data science, with a particular view on advances at the interface between statistics of extremes and modern machine learning. We also list seven key recommendations moving forward.
2. Max-stable processes: a restrictive tool for the wrong problem?
The theoretical foundations underpinning MSPs start with the wish to generalize univariate extreme-value theory (Davison and Huser, 2015) to the spatial context. Consider a sequence of independent and identically distributed random processes, $ {Y}_1\left(\boldsymbol{s}\right),{Y}_2\left(\boldsymbol{s}\right),\dots $, defined over a spatial domain $ \mathcal{S} $. Extreme-value theory states that under broad conditions, the only possible limits, $ Z\left(\boldsymbol{s}\right) $, for the process of pointwise maxima, $ {Z}_n\left(\boldsymbol{s}\right)=\max \left\{{Y}_1\left(\boldsymbol{s}\right),\dots, {Y}_n\left(\boldsymbol{s}\right)\right\} $ as $ n\to \infty $, when appropriately affinely renormalized, are MSPs. This result implies that all univariate margins of $ Z\left(\boldsymbol{s}\right) $ follow the generalized extreme-value (GEV) distribution, while all finite-dimensional margins are characterized by an extreme-value copula (Davison et al., 2012; Segers, 2012). This asymptotic characterization (as the block size $ n $ tends to infinity) has been the principal argument for fitting max-stable models and extreme-value copulas in practice (with fixed and finite $ n $). While the theory of MSPs was established in the 1980s, their popularity started to grow with Schlather (2002), who showed how to construct max-stable models with realistic-looking realizations based on de Haan's (1984) spectral representation. Padoan et al. (2010) later promoted their adoption in environmental applications by proposing a method of inference for spatial max-stable models based on pairwise likelihoods, and Davison et al. (2012) further advocated their use against other natural alternatives available at the time. The spectral representation of MSPs (de Haan, 1984; Schlather, 2002) essentially states that, on the unit Fréchet scale, they can be constructed as
$$ Z\left(\boldsymbol{s}\right)=\underset{i\ge 1}{\max }\;{\xi}_i\hskip0.1em {W}_i\left(\boldsymbol{s}\right),\hskip2em (1) $$
for a Poisson point process $ {\left\{{\xi}_i\right\}}_{i\ge 1} $ on $ \left(0,\infty \right) $ with intensity $ {\xi}^{-2}\hskip0.1em d\hskip0.1em \xi $, and independent copies $ {W}_i\left(\boldsymbol{s}\right) $ of a spatial process $ W\left(\boldsymbol{s}\right) $ satisfying $ \hskip0.1em E\hskip0.1em \left[\max \left\{W\left(\boldsymbol{s}\right),0\right\}\right]=1 $. The unit Fréchet scale is a common standard since it permits simple expression of the finite-dimensional distribution functions as $ G\left(\boldsymbol{z}\right)=\exp \left\{-V\left(\boldsymbol{z}\right)\right\} $, with $ V $ the so-called exponent function, which is homogeneous of order $ -1 $, meaning that $ t\times V\left(t\boldsymbol{z}\right)=V\left(\boldsymbol{z}\right) $ for any positive $ t $ and $ \boldsymbol{z} $. This construction principle has led to various MSP models, the most popular ones being the Smith (1990) model, the Schlather (2002) model, the extremal-$ t $ model (Opitz, 2013), the Brown and Resnick (1977) model (Kabluchko et al., 2009), and the Reich and Shaby (2012) model, which have been widely used in numerous applications. However, MSPs have many intrinsic limitations and practical restrictions, and we now summarize the most critical ones. These restrictions are related to the actual definition of MSPs, the form of their dependence structure, and the cost of likelihood inference and simulation algorithms.
Definition of max-stable processes: A first drawback of MSPs goes back to their basic theoretical motivation, and the implicit definition of a spatial extreme event in this framework. By construction, MSPs $ Z\left(\boldsymbol{s}\right) $ approximate the distribution of pointwise maxima $ {Z}_n\left(\boldsymbol{s}\right) $ for large $ n $. However, while the $ {Y}_i\left(\boldsymbol{s}\right) $ processes represent the original individual (e.g., daily) spatial events, which are directly observable and may or may not be extreme, the block (e.g., yearly) maximum process $ {Z}_n\left(\boldsymbol{s}\right) $ does not correspond to a real event, unless a single individual process, say $ {Y}_j\left(\boldsymbol{s}\right) $, is more extreme than all other processes $ {Y}_i\left(\boldsymbol{s}\right) $, $ i\in \{1,\dots, n\}\hskip0.1em \mathrm{\backslash}\hskip0.1em \{j\} $, simultaneously at all sites $ \boldsymbol{s} $ within the domain $ \mathcal{S} $, such that $ {Z}_n\left(\boldsymbol{s}\right)={Y}_j\left(\boldsymbol{s}\right) $ for all $ \boldsymbol{s} $. This situation rarely occurs in practice, especially for large domains. Therefore, in fitting an MSP to realizations of $ {Z}_n\left(\boldsymbol{s}\right) $, we effectively model artificially created spatial extreme data that may have little to do with real observations. This is illustrated in Figure 1 for daily summer precipitation data from the United Kingdom (UK) Climate Projections (see https://www.metoffice.gov.uk/research/approach/collaboration/ukcp) over the historical period 1981–2000. One argument used to justify still fitting MSPs to extremes of individual observations (e.g., Huser and Davison, 2014) is that they provide, in some sense, an approximation to the joint tail of $ {Y}_i\left(\boldsymbol{s}\right) $ itself. Specifically, if a vector $ \boldsymbol{Y}\sim F $ has a joint distribution $ F $ with unit Fréchet margins attracted to the max-stable distribution $ G\left(\boldsymbol{z}\right)=\exp \left\{-V\left(\boldsymbol{z}\right)\right\} $, then we have $ F\left(n\boldsymbol{y}\right)={\left[{\left\{F\left(n\boldsymbol{y}\right)\right\}}^n\right]}^{1/n}\approx {\left\{G\left(\boldsymbol{y}\right)\right\}}^{1/n}=G\left(n\boldsymbol{y}\right) $, where the last equality uses the homogeneity of $ V $, and the approximation is justified when $ n $ is large. However, as is clear from this simple derivation, the max-stable distribution $ G $ only provides an accurate approximation to $ F $ when all observations are large at the same time, and it does not provide a suitable approximation in the much more common situation where only a subset of variables are extreme. This issue can be partly dealt with by censoring small observations when performing inference; nevertheless, we argue that, by design, MSPs address the “wrong problem,” and that the very statistical framework motivating their practical use is flawed.
The rigidity of max-stability: MSPs possess a dependence structure that is too rigid for most environmental applications. A max-stable distribution $ G $ is one for which $ {G}^t $ (with $ {G}^t(x) $ defined as $ {\left\{G(x)\right\}}^t $) remains a valid distribution within the same location-scale family for all $ t>0 $. This implies that the dependence structure of MSPs is invariant to the maximum operator; in other words, if a specific MSP is an appropriate model for annual maxima of a given variable, then the same model also provides an appropriate characterization of the dependence structure of 10-year maxima, 50-year maxima, or even 1000-year maxima. This means that, upon marginal standardization, the dependence patterns of a spatial extreme event do not change with the severity of the event, no matter how extreme it is. This rigidity often contradicts empirical findings, where extreme events are dependent but tend to become spatially more localized as they become more extreme (Huser and Wadsworth, 2019; Castro-Camilo and Huser, 2020; Zhong et al., 2022). Very few studies that employ MSPs actually scrutinize this stability property, but diagnostic tools can be found in Gabda et al. (2012) and Huser et al. (2021), or could be adapted from available multivariate hypothesis testing tools (Bücher and Kojadinovic, 2016).
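This invariance can be verified in one line from the homogeneity of the exponent function $ V $ introduced below (1): on the unit Fréchet scale,
$$ {G}^t\left(\boldsymbol{z}\right)=\exp \left\{-tV\left(\boldsymbol{z}\right)\right\}=\exp \left\{-V\left(\boldsymbol{z}/t\right)\right\}=G\left(\boldsymbol{z}/t\right), $$
so the distribution of the maximum of $ t $ independent copies differs from $ G $ only by a marginal rescaling; upon marginal standardization, the dependence structure of $ t $-fold maxima is therefore identical for every $ t $.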
Models only for limited subclasses of possible tail structures: The asymptotic world in which MSPs live is also “black and white”: max-stable models are indeed always asymptotically dependent unless they are exactly independent, and they lack a more nuanced representation of the important case of asymptotic independence. Asymptotic independence means that the limiting dependence structure of the normalized $ {Z}_n\left(\boldsymbol{s}\right) $ corresponds to the independence copula, yet in practice, there is almost always residual positive dependence present in $ {Z}_n\left(\boldsymbol{s}\right) $ for finite $ n $, even when the limit would be independence. By fitting an MSP model to maxima of asymptotically independent data, one incorrectly models this residual dependence as asymptotic dependence, leading to biased extrapolation further into the tail of the distribution. Put differently, a max-stable model fitted to block maxima stemming from data with a weakening tail dependence structure is inevitably misspecified; the effect of this misspecification is that MSPs will capture an “average” strength of dependence and, therefore, tend to slightly underestimate the occurrence probability of joint extreme events at relatively low levels but potentially grossly overestimate it at high levels situated far beyond the observed range of data. Another way to define asymptotic dependence of the process $ Y\left(\boldsymbol{s}\right) $ is to consider the coefficient $ {\chi}_{\left\{i,j\right\}}(u)=\Pr \left\{Y\left({\boldsymbol{s}}_i\right)>{F}_i^{-1}(u),Y\left({\boldsymbol{s}}_j\right)>{F}_j^{-1}(u)\right\}/\left(1-u\right) $, where $ Y\left({\boldsymbol{s}}_i\right)\sim {F}_i $ and $ Y\left({\boldsymbol{s}}_j\right)\sim {F}_j $. Asymptotic dependence is present if this coefficient converges to a positive limit $ {\chi}_{\left\{i,j\right\}}>0 $ as $ u\to 1 $: the joint tail decay rate is then proportional to the marginal tail decay rate, and dependence at moderate levels never vanishes completely but remains in the limiting tail. While asymptotic dependence (or independence) is a property that is difficult to verify or test in practice, models allowing for asymptotic independence often have richer tail decay rates than MSPs, which are strongly limited due to their focus on the possible asymptotic structures rather than the subasymptotic tail behavior. This difference is crucial in practice, because the flexibility of a model in its joint tail dictates extrapolations to higher levels, and thus impacts risk assessment. Being always asymptotically dependent, MSPs will thus have a tendency to overestimate the risk of very large joint extremes. Studies advising against asymptotic dependence models for environmental risk assessment include Bortot et al. (2000) for oceanographic processes, and Opitz (2016) and Dawkins and Stephenson (2018) for wind gusts.
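Empirically, $ {\chi}_{\left\{i,j\right\}}(u) $ (and its $ d $-dimensional analogue $ {\chi}_D(u) $ used in Section 3) can be estimated by counting joint exceedances of marginal quantiles, as in the following minimal Python sketch; the function name `chi_emp` and the Gaussian-copula toy data are our own illustrative choices, not part of the original analysis.

```python
import numpy as np

def chi_emp(data, u):
    """Empirical chi_D(u): Pr(all margins exceed their u-quantile) / (1 - u).

    data : (n, d) array of n replicates at d sites (d = 2 gives chi_{i,j}).
    u    : probability level in (0, 1).
    """
    n, d = data.shape
    # Transform each margin to uniform scores via ranks (empirical CDF).
    unif = (np.argsort(np.argsort(data, axis=0), axis=0) + 1) / (n + 1)
    joint_exc = np.mean(np.all(unif > u, axis=1))
    return joint_exc / (1 - u)

# Illustration: a bivariate Gaussian copula (asymptotically independent),
# for which chi(u) visibly decays towards 0 as u -> 1.
rng = np.random.default_rng(1)
rho = 0.8
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=100_000)
for u in (0.90, 0.99, 0.999):
    print(f"u = {u}: chi approx {chi_emp(z, u):.3f}")
```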
Lack of flexible and physically realistic models: Some of the most popular max-stable models are also characterized by unphysical properties. The Schlather and extremal-$ t $ max-stable models, for example, are non-ergodic, which implies that they cannot approach full independence between infinitely distanced sites. With such models, spatial extreme events have a positive probability of being “infinitely wide” in extent, since the $ {\chi}_{\left\{i,j\right\}} $ coefficient for $ Z\left(\boldsymbol{s}\right) $ is uniformly bounded below by a positive constant for any two locations $ {\boldsymbol{s}}_i $ and $ {\boldsymbol{s}}_j $, whatever their distance $ \parallel {\boldsymbol{s}}_i-{\boldsymbol{s}}_j\parallel $. On the other hand, the Smith model has overly smooth realizations based on analytical spatial profiles $ {W}_i $ in (1), and the original Reich–Shaby model is a noisy version of the Smith model with artificial nonstationarity, though a variant based on random basis functions producing more realistic realizations was proposed by Bopp et al. (2021). Overall, the Brown–Resnick model, which is constructed from intrinsically stationary log-Gaussian processes $ {W}_i $ in (1), seems to be the most physically reasonable one. Nevertheless, the very broad subclass of MSPs possessing a positive continuous density in all of their finite-dimensional distributions (including the Brown–Resnick model itself) shares the common drawback that conditional independence implies full independence (Papastathopoulos and Strokorb, 2016). This “entanglement effect” means that most max-stable models cannot exhibit any interesting Markov structure directly in $ Z\left(\boldsymbol{s}\right) $, which is a major impediment to leveraging such Markov structures for efficient inference and validation of max-stable models. This also implies that physically meaningful stochastic partial differential equation models (see, e.g., Lindgren et al., 2011; Bolin and Wallin, 2020; Zhang et al., 2023b), which commonly lead to conditional independencies and graphical structures when discretized, are—at least on the surface—directly incompatible with MSPs. Although conditional independence can be enforced at the deeper level of the latent variables $ {W}_i $ used to construct the MSP, or in the underlying exponent measure (Engelke and Hitz, 2020; Engelke et al., 2022), it will never manifest at the data level except in special cases such as max-linear models (Améndola et al., 2022). Furthermore, given that the total vapor holding capacity of the atmosphere and the total energy in the climate system are finite, the property of asymptotic dependence that characterizes all MSPs may also be questioned and intuitively considered as “unphysical” in typical environmental applications.
Computational complexity: Finally, MSPs are also notoriously computationally cumbersome to fit and simulate from. Fully Bayesian inference in high dimensions, as well as efficient conditional simulation, are possible for the Reich–Shaby model by taking advantage of its simple hierarchical construction (see also Bopp et al., 2021), but this is more the exception than the rule: for most other MSPs, likelihood-based inference is very challenging, even with advanced supercomputers (Castruccio et al., 2016), because the full likelihood function contains a number of terms that grows super-exponentially fast with the number of observed locations. Therefore, alternative inference solutions are generally required. Proposed approaches include pairwise likelihoods (Padoan et al., 2010), higher-order composite likelihoods (Genton et al., 2011; Huser and Davison, 2013; Huser et al., 2024), M-estimators (Yuen and Stoev, 2014; Einmahl et al., 2016), a customized stochastic expectation–maximization algorithm (Huser et al., 2019) and its Bayesian counterpart (Thibaud et al., 2016; Dombry et al., 2017), distributed inference through a divide-and-conquer strategy (Hector and Reich, 2024), and, more recently, neural Bayes estimators (Lenzi et al., 2023; Sainsbury-Dale et al., 2024a). However, except for the latter, which can be trained offline, these approaches all face a delicate trade-off between computational and statistical efficiency, and they often remain quite expensive to apply in moderate-to-high dimensions (i.e., from a few dozen to a few hundred spatial locations, depending on the method and specific setting). Likelihood-based inference can be simplified by including event times in the dataset (Stephenson and Tawn, 2005), but this can lead to bias if the number of locations is large in comparison to the number of replicates over which maxima are taken (Wadsworth, 2015). Nonetheless, the inclusion of event times starts to mimic the paradigm of peaks-over-threshold modeling, which we argue in Section 3 is a more sensible and natural approach.
The stochastic representation (1), involving an infinite number of processes over which the pointwise maximum is taken, also makes simulation cumbersome (apart from the Reich–Shaby model and its variants). While various types of approximate and exact simulation algorithms have been developed for certain max-stable families (Schlather, 2002; Oesting et al., 2012; Thibaud and Opitz, 2015; Dombry et al., 2016; Oesting and Strokorb, 2022), conditional simulation remains computationally laborious (Dombry et al., 2013; Oesting and Schlather, 2013) and does not scale well with the dimension. To construct more “generic” stochastic generators for spatial extremes, it is also possible to directly exploit generative artificial-intelligence techniques from the machine learning literature, such as generative adversarial networks (see, e.g., Boulaguiem et al., 2022), which bypass the need to specify a parametric dependence structure for extremes at the expense of completely abandoning any known structure, or variational autoencoders (VAEs; see, e.g., Lafon et al., 2023; Zhang et al., 2023a). However, these recent machine-learning-based approaches are sometimes difficult to train, are always data-hungry, and are often challenging to study from a theoretical perspective, for example, in terms of the approximation quality of the trained generator or its ability to accurately reproduce joint tail decay rates.
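To make the difficulty concrete, the sketch below simulates approximately from the spectral construction (1) for the Schlather model by truncating the Poisson point process to finitely many terms; this naive truncation is known to underestimate the true maximum, which is precisely why the exact algorithms cited above were developed. The grid, the exponential correlation, and the truncation level are our own illustrative assumptions.

```python
import numpy as np

def schlather_approx(coords, n_terms=500, range_par=1.0, seed=0):
    """Approximate simulation of a Schlather MSP via truncation of the
    spectral representation Z(s) = max_i xi_i * W_i(s), with
    W(s) = sqrt(2*pi) * max{eps(s), 0} and eps a standard Gaussian process,
    so that E[max{W(s), 0}] = 1 as required."""
    rng = np.random.default_rng(seed)
    d = coords.shape[0]
    # Exponential correlation for the underlying Gaussian process.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    L = np.linalg.cholesky(np.exp(-dists / range_par) + 1e-10 * np.eye(d))
    # Poisson points xi_1 > xi_2 > ... with intensity xi^{-2} d(xi):
    # xi_i = 1 / Gamma_i, where Gamma_i are unit-rate Poisson arrival times.
    gamma = np.cumsum(rng.exponential(size=n_terms))
    xi = 1.0 / gamma
    eps = L @ rng.standard_normal((d, n_terms))          # Gaussian copies
    W = np.sqrt(2 * np.pi) * np.maximum(eps, 0.0)        # spectral profiles
    return np.max(xi[None, :] * W, axis=1)               # pointwise maxima

# One (approximate) realization on a 1-D transect of 50 sites,
# with unit Fréchet margins up to the truncation error.
coords = np.linspace(0, 10, 50)[:, None]
z = schlather_approx(coords)
print(z[:5])
```

Note that each realization already requires simulating hundreds of correlated Gaussian fields, which hints at why conditional simulation and likelihood computations scale so poorly.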
Discussion: Overall, the difficulties with MSPs are “built-in”: they are a direct consequence of their basic definition leading to the complex structure in (1), and the fact that they are not meant to describe extremes of the original individual events. The action of computing block maxima indeed masks information about the timings of events and temporal dependence, and specifically about the co-occurrence of maxima at different spatial locations, which has implications for modeling, inference, and simulation. Max-stability arises as an “asymptotic artifact” resulting from taking the limit of block maxima as the block size $ n $ goes to infinity; in practice, however, interest often lies in the original events themselves, rather than maxima. Moreover, even when the modeling of maxima (or minima) may be desired (e.g., when assessing the survival probability of certain species over an entire region, see Thibaud et al., 2016), the effective block size is often quite moderate in most environmental data due to serial dependence and seasonality, which can in some cases create a severe mismatch between theory and practice, thus casting serious doubts on the suitability of MSPs in such cases. Using the limit model for data in this context severely constrains the form of dependence structures that can be obtained, in a way that is unrealistic in most environmental applications, while simultaneously complicating statistical inference significantly. Simply put, we think that MSPs are often “not worth the trouble,” as the benefits they bring do not counterbalance their other limitations. In the next section, we summarize more recent modeling strategies that bypass several of the above roadblocks by going beyond max-stability, and that are thus better suited for the spatial modeling of extremes.
3. Solutions beyond max-stability
In Section 2, we argued strongly against the continued use of MSPs in all but exceptional circumstances. Here we outline the breadth of alternative models for spatially indexed environmental data, highlighting their relative merits and potential drawbacks.
Peaks-over-threshold versus maxima: Unlike approaches based on block maxima, peaks-over-threshold approaches focus on modeling extremes of the original spatial events that actually took place; in the context of Figure 1, this includes the fields highlighted in color in the center panel, for example. Therefore, they are not only more meaningful from a practical perspective, but they also offer ways to customize the definition of a “spatial extreme event” to the specific problem of interest. In particular, peaks-over-threshold approaches do not only provide valid probability approximations when all variables are simultaneously large (which rarely occurs in practice), but they can also be adapted to events where only a subset of variables are extreme. This is illustrated in Figure 2 for the two main peaks-over-threshold approaches, namely the (generalized) Pareto process (Ferreira and de Haan, 2014; Dombry and Ribatet, 2015) and the spatial conditional extremes process (Wadsworth and Tawn, 2022), which are justified by different asymptotic paradigms.
Pareto processes (Ferreira and de Haan, 2014; Dombry and Ribatet, 2015; de Fondeville and Davison, 2018, 2022) are usually viewed as the peaks-over-threshold analogue of MSPs. Both classes of models are grounded in the theory of functional regular variation, but Pareto processes are, in principle, applicable to all data that are in some sense extreme. As a result, they possess much simpler likelihoods than max-stable models and make more efficient use of data. Slightly different formulations of Pareto processes arise depending on the so-called aggregation (or risk) functional, usually denoted by $ r\left(\cdot \right) $, used to define a functional extreme event (such as the spatial maximum); see, for example, de Fondeville and Davison (2018, 2022) for more details.
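For concreteness, one convenient constructive form (a sketch following de Fondeville and Davison, 2018, up to standardization conventions) writes an $ r $-Pareto process, for a risk functional $ r $ that is homogeneous of order $ 1 $ (such as the spatial maximum), as
$$ P\left(\boldsymbol{s}\right)=R\,\frac{W\left(\boldsymbol{s}\right)}{r\left(W\right)},\hskip2em \Pr \left(R>t\right)={t}^{-1},\hskip1em t\ge 1, $$
where the unit Pareto intensity $ R $ is independent of the spectral process $ W $. Since $ r\left(P\right)=R $, the threshold-stability property $ \Pr \left\{r\left(P\right)>st\hskip0.2em |\hskip0.2em r\left(P\right)>t\right\}={s}^{-1} $ holds exactly, mirroring the max-stability property discussed in Section 2.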
The spatial conditional extremes model (Wadsworth and Tawn, 2022), on the other hand, is a different peaks-over-threshold approach that only applies to the case where a single variable (at a fixed chosen location) exceeds a high threshold; however, it offers improved tail flexibility and other benefits compared to Pareto processes, as discussed below.
Improved tail flexibility with “subasymptotic” models: While Pareto processes have simpler likelihoods and permit the use of more data than MSPs, they are unfortunately seldom appropriate models in practice when used for tail extrapolation and the estimation of small probabilities associated with environmental extreme events that lie far in the upper joint tail. This is because their threshold-stability property, analogous to the max-stability property of MSPs, is rarely satisfied by the environmental data available at observed levels of extremity. The threshold-stability property will never be satisfied by data that exhibit asymptotic independence, and it represents a very strong additional assumption over the mere presence of asymptotic dependence. Let $ {Y}_j=Y\left({\boldsymbol{s}}_j\right)\sim {F}_j,j\in D=\left\{1,\dots, d\right\} $, represent a spatial process observed at $ d $ locations. A simple way to assess whether this property may hold is to consider the quantity $ {\chi}_D(u) $, defined similarly to $ {\chi}_{\left\{i,j\right\}}(u) $ in Section 2, as
$$ {\chi}_D(u)=\Pr \left\{{Y}_1>{F}_1^{-1}(u),\dots, {Y}_d>{F}_d^{-1}(u)\right\}/\left(1-u\right). $$
If a Pareto process is applicable, then for any $ d $ there always exist $ d $ distinct locations such that $ {\chi}_D(u)\equiv {\chi}_D>0 $ for all sufficiently large $ u<1 $ . Our collective experience is that, in contrast, it is almost always the case that $ {\chi}_D(u) $ decreases as $ u\to 1 $ , representing a weakening of spatial dependence at extreme levels. This is illustrated in Figure 3 for datasets of precipitation (using the UKCP data from Figure 1), as well as air temperature, sea surface temperature, and windspeed.
To deal with this deficiency, some authors—including ourselves—have advocated the use of what are often termed subasymptotic models. This terminology is used because such models can be seen to bridge the gap between the world of finite-level data and the “mythical land” of asymptopia, where models such as max-stable and Pareto processes should be applicable. However, it is perhaps a misnomer: if data exhibit asymptotic independence, then Pareto processes are not well defined for domains $ \mathcal{S} $ composed of an infinite number of locations, and max-stable models offer no benefits. In that setting, therefore, aside from conditional extremes models, such subasymptotic alternatives are currently the only approach to performing useful extreme-value inference.
Broadly, these subasymptotic models are designed to represent flexible forms of tail decay, permitting extrapolation from observed levels to more extreme levels. A typical consideration is that probabilities such as $ \Pr \left\{{Y}_1>{F}_1^{-1}(u),\dots, {Y}_d>{F}_d^{-1}(u)\right\} $ or $ \Pr \left\{{Y}_i>{F}_i^{-1}(u),{Y}_j>{F}_j^{-1}(u)\right\} $ should have flexible forms as $ u\to 1 $, and these forms fit with the assumptions of regular variation and/or hidden regular variation (Ledford and Tawn, 1996, 1997; Resnick, 2002) to provide some theoretical grounding. Where modeling componentwise maxima remains relevant, we can relax the max-stability property to obtain more realistic dependence structures with constructions such as max-infinitely divisible processes (Huser et al., 2021) or max-mixtures (Wadsworth and Tawn, 2012; Bacro et al., 2016); however, working with such extensions of MSPs can further exacerbate the issues of model interpretation and computational cost outlined before.
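To make “flexible forms of tail decay” concrete, the canonical assumption of Ledford and Tawn (1996) posits a coefficient of tail dependence $ \eta \in \left(0,1\right] $ such that
$$ \Pr \left\{{Y}_i>{F}_i^{-1}(u),{Y}_j>{F}_j^{-1}(u)\right\}\sim \mathcal{L}\left\{{\left(1-u\right)}^{-1}\right\}{\left(1-u\right)}^{1/\eta },\hskip2em u\to 1, $$
for a slowly varying function $ \mathcal{L} $: the case $ \eta =1 $ with $ \mathcal{L} $ not vanishing corresponds to asymptotic dependence, whereas $ \eta <1 $ yields asymptotic independence with a joint tail that decays faster than the margins, at a rate the data can inform.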
To date, most subasymptotic models have a random scale construction: extreme data are modeled using the spatial copula of the process $ X\left(\boldsymbol{s}\right)= RW\left(\boldsymbol{s}\right) $, where the scalar random variable $ R>0 $ is independent of the spatial process $ W $. The relative tail heaviness of $ R $ and $ W $, together with the dependence structure of $ W $, provides a rich array of dependence possibilities (see Engelke et al., 2019, for an almost-exhaustive description). Examples of this class of models include Opitz (2016), Huser et al. (2017), and Huser and Wadsworth (2019). With appropriate specification of $ R $ and $ W $, Pareto processes also possess a random scale representation (Ferreira and de Haan, 2014). Limitations of simple random scale models include their inability to capture complex dependence structures observed over large domains, their lack of a gradual decay of positive dependence to full independence at large distances (because of the spatially constant $ R $ variable, which makes them non-ergodic), and the difficulty of incorporating nonstationarity. These limitations also make it difficult to adapt such models to the case of spatiotemporal data. Hazra et al. (2024) recently attempted to address these issues by proposing a Gaussian scale mixture model extension that can capture short-range asymptotic dependence, mid-range asymptotic independence, and long-range exact independence, by replacing $ R $ with a suitable spatial process $ R\left(\boldsymbol{s}\right) $ (see also Krupskii and Huser, 2022).
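The random scale construction is also straightforward to simulate, which partly explains its popularity. The sketch below draws replicates of $ X\left(\boldsymbol{s}\right)=RW\left(\boldsymbol{s}\right) $ with $ W $ a standard Gaussian process; the Pareto-tailed choice for $ R $ and all parameter values are our own illustrative assumptions (as noted above, the relative tail heaviness of $ R $ and $ W $ governs the resulting asymptotic dependence class).

```python
import numpy as np

def random_scale_field(coords, alpha=2.0, range_par=1.0, n_rep=1, seed=0):
    """Simulate n_rep replicates of X(s) = R * W(s), with W a zero-mean,
    unit-variance Gaussian process (exponential correlation) and R an
    independent Pareto(alpha) scale: Pr(R > t) = t^{-alpha}, t >= 1."""
    rng = np.random.default_rng(seed)
    d = coords.shape[0]
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    L = np.linalg.cholesky(np.exp(-dists / range_par) + 1e-10 * np.eye(d))
    W = (L @ rng.standard_normal((d, n_rep))).T         # (n_rep, d) fields
    R = rng.uniform(size=(n_rep, 1)) ** (-1.0 / alpha)  # Pareto(alpha) scales
    return R * W

# 1000 replicates at 30 random sites; the copula of X, not X itself,
# is what would be fitted to (marginally transformed) extreme data.
coords = np.random.default_rng(1).uniform(0, 5, size=(30, 2))
X = random_scale_field(coords, alpha=2.0, n_rep=1000)
print(X.shape)  # (1000, 30)
```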
Conditional spatial extremes model: The recently introduced conditional spatial extremes model (Wadsworth and Tawn, 2022) is another class of models with flexible tail structures, which has been adapted to the spatiotemporal case in Simpson and Wadsworth (2021). These models are based on the assumption of a limiting process for suitably normalized $ Y\left(\boldsymbol{s}\right) $, conditional upon the event $ Y\left({\boldsymbol{s}}_0\right)>t $, for some conditioning location $ {\boldsymbol{s}}_0 $. The formulation permits modeling of both asymptotically dependent and asymptotically independent data, while the most commonly used version of the likelihood is relatively simple. The “price” for these two major gains is the necessity of conditioning on a particular location being large, rather than, say, any location being large, though Wadsworth and Tawn (2022) outline ways in which this can be mitigated if it is an issue.
Considerations for inference: As mentioned, fitting peaks-over-threshold models via likelihood is much simpler than fitting max-stable models. However, to avoid bias in the estimation of tail properties, it is necessary to fit the models only to extreme data. For generalized Pareto processes and subasymptotic models, this is usually done using an appropriately censored likelihood (see, e.g., Huser and Wadsworth, 2022) to avoid the influence of small values. Censoring can be done in different ways: either by applying a risk functional, such as the maximum, to the observation vector and fully censoring or discarding the vectors for which the risk is below a fixed high threshold; or by focusing on marginal exceedances, without the need to define a global risk functional, where any component of the vector that falls below its marginal threshold is censored. Especially in the latter case, censoring makes likelihood evaluation significantly more computationally intensive, as it can require calculating numerous potentially high-dimensional integrals of the density function. This typically limits likelihood-based inference for such models to fewer than about $ 30 $ observation locations. One way to circumvent this difficulty is to add a measurement-error (nugget effect) term and then use a Markov chain Monte Carlo algorithm to approximate these multifold integrals numerically (Morris et al., 2017; Zhang et al., 2022a; Hazra et al., 2024). Another possibility is to use a weighted gradient scoring approach for inference, as advocated by de Fondeville and Davison (2018), which mimics smooth censoring while remaining relatively cheap computationally.
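To illustrate the mechanics of marginal censoring (not any specific model above), the sketch below writes the censored log-likelihood for a bivariate Gaussian copula stand-in, for which the partially censored terms have closed forms; for the actual spatial extremes models discussed here, the analogous terms involve the expensive multivariate integrals just described. All function names and parameter choices are our own.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def censored_loglik(u1, u2, rho, thresh=0.9):
    """Censored log-likelihood of a bivariate Gaussian copula (illustration).
    u1, u2 : uniform-scale observations; values below `thresh` are censored."""
    z1, z2, t = norm.ppf(u1), norm.ppf(u2), norm.ppf(thresh)
    mvn = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
    s = np.sqrt(1 - rho**2)
    ll = 0.0
    for a, b in zip(z1, z2):
        if a > t and b > t:    # both exceed: full copula density
            ll += mvn.logpdf([a, b]) - norm.logpdf(a) - norm.logpdf(b)
        elif a > t:            # only site 1 exceeds: dC(u1, thresh)/du1
            ll += norm.logcdf((t - rho * a) / s)
        elif b > t:            # only site 2 exceeds: dC(thresh, u2)/du2
            ll += norm.logcdf((t - rho * b) / s)
        else:                  # fully censored pair: C(thresh, thresh)
            ll += np.log(mvn.cdf([t, t]))
    return ll

# Toy usage on rank-transformed data: compare two candidate correlations.
rng = np.random.default_rng(2)
z = rng.multivariate_normal([0, 0], [[1, .7], [.7, 1]], size=500)
u = (np.argsort(np.argsort(z, axis=0), axis=0) + 1) / (len(z) + 1)
print(censored_loglik(u[:, 0], u[:, 1], rho=0.7),
      censored_loglik(u[:, 0], u[:, 1], rho=0.2))
```

Even in this toy bivariate case, the fully censored contribution requires a bivariate normal integral; in dimension $ d $, the corresponding terms become $ d $-fold integrals, which is exactly the bottleneck noted above.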
The spatial conditional extremes model can usually be fitted to data from hundreds of observation locations, since its common variants take the form of a nonstationary and marginally transformed Gaussian process. The different nature of the asymptotics means that an extreme event is one that is extreme at the conditioning location $ {\boldsymbol{s}}_0 $ but may be large or small elsewhere, which (often) avoids the need for censoring in the likelihood. Simpson et al. (2023) outline extensions to thousands of dimensions via a slight change in formulation and the use of Gaussian Markov random fields constructed from stochastic partial differential equation models, which have a sparse probabilistic structure (see also Vandeskog et al., 2024, for related methodology).
Techniques from the machine learning literature have recently permeated the world of spatial extreme-value modeling: Sainsbury-Dale et al. (2024a, 2024b) and Walchessen et al. (2024) describe the use of neural networks (NNs) for likelihood-free inference on the parameters of spatial models whose likelihoods are costly to evaluate because of high dimensionality. Richards et al. (2024) extend these ideas to incorporate censoring of small values, a key consideration for extremes. These NN-based estimators are “amortized” (Zammit-Mangion et al., 2025), meaning that all computational effort is encapsulated in a training period, after which estimates can be obtained in a fraction of a second; this is in comparison to hours or sometimes days for likelihood-based estimates. To illustrate the potential of such amortized inference approaches, Figure 4 displays the estimated parameters obtained by fitting the copula associated with the anisotropic Huser and Wadsworth (2019) model to the UKCP precipitation data from Figure 1, locally in all subgrids of size $ 32\times 32 $ within the study domain; these estimates were obtained by reusing the pretrained censored neural Bayes estimators from Richards et al. (2024), thus bypassing the expensive training time. In this implementation, data below a marginal 0.9-quantile are censored. Each estimate was obtained in about 38 milliseconds, which amounts to less than 20 minutes overall for the entire domain (requiring about $ 30\,000 $ fits to censored peaks-over-threshold data in dimension $ d=1024 $). While parameter estimates are quite variable and model fits would need to be more carefully validated using advanced diagnostics, the results make a degree of sense at first sight, as the parameter estimates are partially aligned with topography and the coastline; in particular, larger range parameters can be observed over the sea (e.g., over the Channel), as one would expect. However, the estimated shape parameters controlling the asymptotic dependence type are somewhat too large for precipitation data. This may be caused by a number of factors, including out-of-the-box model parameter estimation and the use of relatively large subregions. Nevertheless, the model-based $ {\chi}_D(u) $ curve displayed in Figure 3 shows a decent fit for the particular highlighted subregion over the Channel, for which that parameter estimate is quite small (about $ 0.39 $). Overall, we emphasize that these results only serve to showcase the potential of flexible modeling strategies and modern inference approaches; they should thus not be over-interpreted nor used in practical risk assessment studies without further model validation.
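For readers unfamiliar with the idea, the sketch below trains a toy amortized neural estimator: parameters are drawn from a prior, fields are simulated from the model, and a small network is trained to map fields to parameters under squared-error loss (so that it targets the posterior mean). The toy model (a Gaussian process range parameter on a small grid), the plain multilayer perceptron, and all tuning choices are our own illustrative assumptions, far simpler than the censored estimators of Richards et al. (2024).

```python
import numpy as np
import torch
import torch.nn as nn

def simulate(n, grid=12, seed=0):
    """Draw (range, field) pairs: lambda ~ U(0.5, 3); field ~ GP(exp. corr.)."""
    rng = np.random.default_rng(seed)
    xy = np.stack(np.meshgrid(np.arange(grid), np.arange(grid)), -1).reshape(-1, 2)
    D = np.linalg.norm(xy[:, None] - xy[None], axis=-1)
    lam = rng.uniform(0.5, 3.0, size=n)
    fields = np.empty((n, grid * grid), dtype=np.float32)
    for i in range(n):
        L = np.linalg.cholesky(np.exp(-D / lam[i]) + 1e-8 * np.eye(grid * grid))
        fields[i] = L @ rng.standard_normal(grid * grid)
    return torch.tensor(lam, dtype=torch.float32), torch.tensor(fields)

net = nn.Sequential(nn.Linear(144, 256), nn.ReLU(),
                    nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

lam, x = simulate(4000)                  # offline: simulate the training set
for epoch in range(200):                 # offline: amortized training phase
    opt.zero_grad()
    loss = ((net(x).squeeze() - lam) ** 2).mean()
    loss.backward()
    opt.step()

# Online: estimation on new data is now a single cheap forward pass.
lam_test, x_test = simulate(5, seed=1)
print(lam_test.numpy().round(2))
print(net(x_test).squeeze().detach().numpy().round(2))
```

All simulation and optimization effort happens before any real data arrive, which is exactly the amortization property exploited in the Figure 4 illustration.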
In practical use, a major consideration is the availability of robust and user-friendly code for implementation. This is where alternative models are playing catch-up to MSPs, inference for which can be done in the well-established SpatialExtremes package in R, based on pairwise likelihood. For Pareto processes, the mvPot package (de Fondeville and Belzile, 2023) offers useful options. Belzile et al. (2023) provide a recent tour through extreme-value software, including for spatial extremes. They highlight the general lack of packages for implementing modern spatial extremes approaches, although they point to supplementary material and unpublished packages in several cases. Recently, the NeuralEstimators package has filled a gap by providing user-friendly functions to facilitate the construction and training of neural Bayes estimators that can, in particular, handle spatial censored peaks-over-threshold data from general spatial extremes models (see also Zammit-Mangion et al., 2025, who summarize available software for general amortized inference methods, including neural posterior and full likelihood approximations). Nonetheless, further efforts are needed to address the relative scarcity of software that would facilitate uptake of novel modeling approaches.
Discussion: It is almost always the case that interest lies in understanding the extremes of original events, thus it makes sense to model their extremal behavior directly. This usually leads to simpler inference and more flexible classes of models. One reason sometimes cited for preferring a block maximum approach is that the resulting $ {Z}_n\left(\boldsymbol{s}\right) $ should be independent in time, as the environmental events comprising the block maxima will fall in different years. Although the modeling of original events does lead to more temporal dependence in extremes, it is typically preferable to ignore this when fitting via likelihood and to adjust the uncertainty of parameter estimates post hoc (Fawcett and Walshaw, 2007).
Recent developments in machine-learning-based inference are beginning to circumvent some computational difficulties for both peaks-over-threshold models and MSPs alike. Nonetheless, the conceptual drawbacks of MSPs remain, together with potential simulation difficulties. These machine-learning-based developments have thus far largely focused on demonstrating how to fit existing models in previously unfeasible scenarios. An exciting possibility is their potential to facilitate the generation of flexible new models with desirable tail properties: it is often easier to write down a stochastic representation for a flexible model than to derive (and evaluate) its likelihood function. For example, this could permit the specification of, and inference on, models exhibiting asymptotic dependence at short range, with asymptotic independence and exact independence at longer range. The modeling of spatiotemporal extreme dependence, thus far tackled in relatively few cases, may also be facilitated via this route.
4. Conclusion
The probability theory supporting MSPs is rich and should not be despised, as it has contributed significantly to extreme-value theory and led to important advances as well as a better understanding of extremes in stochastic processes; we thus do not question the rigor or historical developments of this theory itself, but rather its relevance in concrete environmental applications. While the systematic use of MSPs in environmental studies needs to be gradually phased out, we also do not categorically advocate against the use of asymptotically justified models. Asymptotic theory can indeed be very useful provided the asymptotic paradigm directly responds to a concrete need posed by the applied scientific problem at hand. In particular, peaks-over-threshold approaches should be prioritized over block maxima approaches whenever possible. Pareto processes, and the more recent spatial conditional extremes model, are two possible frameworks that stem from more helpful asymptotic regimes. However, while asymptotic guarantees are in principle desired for tail extrapolation, we also stress that they should never supersede careful model checking. Oftentimes, more pragmatic solutions (e.g., certain types of random location and scale constructions, or physics-informed models) that display improved subasymptotic tail flexibility, physically more realistic properties, or computationally more affordable inference, can be better suited to address the specific scientific problem and should thus not be disregarded. When risk assessment of future extreme events (within a reasonable time frame) is of interest, the incorporation of nonstationary climate change signals (e.g., from climate model outputs under various greenhouse gas emission scenarios) and the development of asymptotic independence models with a flexible joint tail decay rate are more important than focusing on accurately identifying the asymptotic dependence class. Furthermore, when models are intended to be used in operational settings, some accuracy must sometimes be traded for speed of inference; in this context, geostatistical models constructed from Gaussian building blocks and/or based on sparse probabilistic structures, and fast approximate inference and simulation techniques (e.g., based on deep learning), can be particularly helpful. Nonparametric extreme-value frameworks for tail extrapolation that rely on broad regular variation (or hidden regular variation) assumptions, which are model-free and thus likelihood-free, are also promising approaches when the dataset is massive and the inference target is too complex to be captured by simple statistical models (Oesting and Huser, 2022).
We also mention that if interest lies only in the marginal properties of the estimated distributions (e.g., pointwise return-level maps), and not in event-level dependence, then alternative simpler approaches are likely warranted. For example, the spatial dependence between parameters of a GEV distribution for maxima, or of a generalized Pareto distribution for threshold exceedances, may be well captured by a model that assumes conditional independence of observations given parameters, which are themselves modeled via Gaussian random fields (see Sang and Gelfand, 2010; Geirsson et al., 2015; Jalbert et al., 2017; and Jóhannesson et al., 2022, for GEV-focused approaches, and Cooley et al., 2007, and Opitz et al., 2018, for generalized-Pareto-focused approaches).
An interesting direction for future research is to develop models and methods deeply rooted in extreme-value theory that harness the power and computational efficiency of advanced machine learning (e.g., deep extreme quantile regression, deep nonstationary spatial models, neural inference, and generative approaches) to better address problems in environmental data science; machine learning tools can indeed overcome limitations of classical statistical tools designed for estimating few model parameters from datasets of only moderate size. Wikle and Zammit-Mangion (2023) review statistical deep learning methods in classical spatial statistics; with some suitable adjustments, these methods could potentially be adapted to the extreme-value context.
In the future, it would also be interesting to develop unified spatial frameworks for tractably modeling the full data range in a way that offers high flexibility in the lower joint tail, bulk, and upper joint tail, in the same vein as Naveau et al. (2016) in the univariate context. For example, normal mean–variance mixtures allow for both asymptotic dependence and independence, as well as for controlling asymmetry in lower- and upper-tail dependence (Zhang et al., 2022b). Another related future line of research is to build upon recent advances in the geometric approach to multivariate extremes (Nolde, 2014; Nolde and Wadsworth, 2022). The great benefit of this new asymptotic framework is that it provides a unified representation of multivariate extremes approaches (Wadsworth and Campbell, 2024) and a flexible strategy for the joint modeling of extremes in “all directions”—including the lower and upper tails (Papastathopoulos et al., 2023; see also Mackay and Jonathan, 2023, and Murphy-Barltrop et al., 2024, for related methodology). Extending this approach to the spatial context and to capturing both bulk and tail behaviors simultaneously is an interesting area of investigation. Capturing the full range of values and their dependence relationships is particularly important for modeling compound extremes (AghaKouchak et al., 2020; Zscheischler et al., 2020), defined as combinations of conditions (over space, time, or several variables) leading to extreme impacts, where the contributing events are not necessarily extreme individually. With spatial data increasingly available on regular grids at relatively high resolution, such as climate model output and remote sensing data, the study of geometric properties of pixellated exceedance sets at increasingly high threshold levels provides another line of research toward better understanding and modeling the behavior of joint extremes, and could offer methods that scale well to very large data volumes (Cotsakis et al., 2023).
Although highly promising, neural likelihood-free parameter inference, mentioned in both Sections 2 and 3, still requires several developments to replace traditional likelihood-based inference. For example, theoretical guarantees on the accuracy of neural estimators in terms of the chosen NN architecture and the number of training samples remain to be established. Furthermore, while uncertainty quantification can be handled via the bootstrap, for example, more needs to be understood about its properties. The effects of model misspecification also need to be studied more comprehensively: with likelihood-based inference, it is known that parameter point estimates are robust to certain types of misspecification, and that adjustments can be made to properly handle uncertainty (White, 1982; Chandler and Bate, 2007). Model selection and comparison techniques, which have often relied on log-likelihood values, also require attention. Finally, the estimation of a relatively large number of parameters in flexible models for spatial tail dependence remains challenging. While some of these challenges can be addressed by adopting recent amortized fully Bayesian neural methods, such as BayesFlow (Radev et al., 2020, 2023b) or JANA (Radev et al., 2023a), their use in extreme-value applications remains to be carefully investigated.
Our main recommendations moving forward can be summarized with the acronym MACHINE, which highlights seven “golden rules” for extreme-value analyses and the special role that machine learning is likely to play in the future of environmental extreme data science:
• Move away from MSPs;
• Adopt a peaks-over-threshold approach or a unified bulk-tail model whenever possible;
• Capture subasymptotic behavior rather than focusing on the asymptotic structure and the dichotomy between asymptotic dependence and independence;
• Harness specialized models with a sparse and numerically convenient probabilistic structure for speed and interpretation;
• Incorporate physics/climate knowledge into probability models as much as possible;
• Never prioritize asymptotic justification over careful model checking;
• Embrace modern machine learning and artificial intelligence methods to enhance the modeling and inference of extreme events in complex settings.
“Starting the MACHINE” (or keeping it on) is key, in our opinion, to remain relevant and maximize the impact of extreme-value theory across statistics and applied environmental sciences, especially in the face of the escalating challenges posed by today’s world of extreme climate-driven events.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/eds.2024.54.
Acknowledgments
The authors warmly thank the Environmental Data Science (EDS) editorial board and D. Nychka from the advisory board for inviting them to write this opinion piece. The authors also thank Jordan Richards for his help in fitting models in the UKCP precipitation data illustration, using neural Bayes estimators he had previously trained in Richards et al. (2024).
Author contribution
Conceptualization: J.L.W., T.O., R.H.; Writing – original draft: J.L.W., T.O., R.H.; Writing – review & editing: J.L.W., T.O., R.H.; Funding acquisition: R.H. All authors contributed equally to this work.
Competing interest
The authors have no competing interests to declare.
Data availability statement
Data to reproduce Figures 1 and 3 can be freely obtained upon reasonable request from the authors or through the following websites: https://www.metoffice.gov.uk/research/approach/collaboration/ukcp (UKCP precipitation data); https://data.marine.copernicus.eu/product/SST_GLO_SST_L4_NRT_OBSERVATIONS_010_001/description (Red Sea surface temperature data); https://www.ecad.eu/download/ensembles/download.php (E-OBS Irish temperature data); and https://meteo.data.gouv.fr/form (French windspeed data).
Funding statement
This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-CRG2020-4394.
Comments
Dear Editor,
We thank you for the invitation to write this opinion piece about environmental data science and extremes. In this paper, we discuss the severe limitations of max-stable processes, which are currently widely used in environmental applications. After strongly arguing against their systematic use in environmental extreme data science, we discuss more appropriate modeling frameworks based on the peaks-over-threshold approach, as well as the use of modern hybrid statistical machine learning approaches that enhance the modeling and inference of extreme events.
We hope you will appreciate our contributions, and look forward to hearing from you.
With thanks and best wishes,
Raphael Huser (on behalf of all authors)