Introduction
Seismicity models form the backbone of probabilistic seismic hazard assessment (PSHA) for tectonic and induced earthquakes. They describe the frequency-magnitude distribution and the earthquake rate in space and time. Except for the occurrence of aftershock sequences, tectonic seismicity can be well described by a stationary process, and because of this, empirical fits of the declustered, observed seismicity data can be used to forecast future activity. In contrast, anthropogenic seismicity is time-dependent because it is related to activities such as, for instance, gas production. Forecasting and modelling the expected time dependence of future seismic activity requires an understanding of the physical processes involved in earthquake nucleation.
Stress changes are believed to be the main cause of seismicity. In particular, the Coulomb failure stress (CFS) is a key parameter controlling rock failure and earthquake nucleation (Okada, Reference Okada1992; King et al., Reference King, Stein and Lin1994). The reasons for stress changes can be manifold, such as earthquake-induced or poroelastic changes due to fluid injection or extraction (e.g., Geertsma, Reference Geertsma1973; Segall, Reference Segall1989). Besides estimating the expected stress changes, models need to define the corresponding seismicity response function. For this purpose, physical models use constitutive friction relations. While simple static–kinetic friction relationships predict an immediate response after a stress change, more complex relationships such as the rate- and state-dependent friction law (Ruina, Reference Ruina1983) explain delayed responses in seismicity.
Deterministic physical models based on CFS and friction laws can simulate earthquake nucleation, rupture propagation and stress changes on pre-defined neighbouring faults (Fig. 1a). They allow to address specific problems of earthquake patterns and to confirm or dismiss hypotheses, for example, to analyse the conditions for a rupture front propagating into the over- or underburden of the reservoir, which is important to estimate the maximum possible magnitude of triggered earthquakes. Their strength at the same time constitutes their Achilles’ heal since models are highly specific, that is, adapted to individual scenarios. Thus, unknown initial conditions, structural settings, or material properties require running a large number of alternative scenarios to account for epistemic uncertainties and provide realistic forecasts. Due to the high computational costs of deterministic rupture simulations, their direct use for PSHA is very expensive.
On the other hand, more efficient, purely data-driven statistical models cannot easily account for transient driving forces (Fig. 1b). Two representatives of such models are the temporally stationary, spatially inhomogeneous Poisson model and the Epidemic-Type Aftershock Sequences (ETAS) model (Ogata, Reference Ogata1988, Reference Ogata1998). The ETAS model supplements the Poisson model with aftershock activity described by empirical relationships for aftershock productivity, Omori-type rate decay and distance decay. The standard ETAS model explains the short-term clustering of earthquakes but is unable to account for changing boundary conditions that lead to transient seismicity. Similar challenges apply to machine learning (ML), a third class of data-driven models (Jordan & Mitchell, Reference Jordan and Mitchell2015). Unlike physical modelling approaches, ML approaches learn directly from data without explicitly reasoning about the underlying physical mechanisms (Bergen et al., Reference Bergen, Chen and Li2019) and thus avoid model biases due to incomplete process understanding. However, ML should not be used to extrapolate behaviour outside the parameter range of the training dataset. Thus, its accuracy and forecasting power are limited in the case of small training datasets, specifically if they are derived from transient processes.
While the empirical data-driven models ignore important physical knowledge and constraints, the model class of hybrid models combines the understanding of physical processes and statistical components and is thus most promising for seismic hazard assessment of anthropogenic seismicity. We will provide an overview on seismicity models applied to the Groningen field in the following, while more details, including a careful assessment of the advantages and disadvantages, can be found in Dahm et al. (Reference Dahm, Hainzl, Kühn, Oye, Richter and Vera Rodriguez2020b).
The occurrence of felt earthquakes accompanying gas production in Groningen, the Netherlands, is of high social importance and has initiated dense instrumentation and numerous field studies. Many of the more recent models for explaining and quantifying induced or triggered seismicity in the world have benefited from the outstanding comprehensive datasets collected at Groningen. This paper discusses the main categories of seismicity models at Groningen and potential future directions.
Seismicity models applied to the Groningen field
The Groningen gas reservoir and its seismicity already was the topic of numerous publications. For example, De Jager & Visser (Reference De Jager and Visser2017) give an overview on its geology, Visser & Solano Viota (Reference Visser and Solano Viota2017) and Van Oeveren et al. (Reference Van Oeveren, Valvatne, Geurtsen and Van Elk2017) on the static and dynamic reservoir model, respectively, Van Thienen-Visser & Fokker (Reference Van Thienen-Visser and Fokker2017) on compaction and subsidence, Dost et al. (Reference Dost, Ruigrok and Spetzler2017) on the development of seismicity, monitoring network and PSHA, Kortekaas & Jaarsma (Reference Kortekaas and Jaarsma2017) on the generation of an improved fault model, Bommer et al. (Reference Bommer, Dost, Edwards, Kruiver, Ntinalexis, Rodriguez-Marek, Stafford and Van Elk2017) on the development of a ground motion model, Kruiver et al. (Reference Kruiver, Van Dedem, Romijn, de Lange, Korff, Stafleu, Gunnink, Rodriguez-Marek, Bommer, Van Elk and Doornhof2017) of a shear wave and Hofman et al. (Reference Hofman, Ruigrok, Dost and Paulssen2017) of a shallow velocity model. The Groningen field, measuring approximately 30× 30 km (Spiers et al., Reference Spiers, Hangx and Niemeijer2017) and thus being one of the largest in the world, is located in the north-east of the Netherlands at a depth of approximately 3000 m. It consists of Slochteren sandstone sealed on the top by the Zechstein salt formation and is primarily limited in its lateral extents by fault closures. The net reservoir thickness increases from about 0 m to 280 m in a south-east to north-west direction (Bourne et al., Reference Bourne, Oates, Van Elk and Doornhof2014), with seismic profiles showing the existence of numerous normal faults with variable spatial density mainly striking in NNW–SSE direction with secondary fault trends running E–W and N–S (De Jager & Visser, Reference De Jager and Visser2017) mostly dipping around 70∘ (Van Wees et al., Reference Van Wees, Osinga, Van Thienen-Visser and Fokker2018). The presence of gas in Groningen was discovered in 1959, with production starting in 1963. The first seismic event related to gas extraction in Groningen was detected in 1991 (Van Eck et al., Reference Van Eck, Goutbeek, Haak and Dost2006). Since then, more than 350 earthquakes with magnitude M L ≥ 1.5 have been recorded. Today, with more than 100 seismic stations operated by KNMI at an interstation distance smaller than 5 km, the Groningen area is one of the most thoroughly instrumented sites globally for monitoring of induced seismicity.
Over the monitoring period 1 April 1995 to 30 October 2012, it was assessed that less than 0.1% of the induced strain in Groningen had been accommodated by earthquakes; either aseismic deformation such as fault creep and ductile flow is dominant or induced strain is elastic and remains available to be released by future earthquakes (Bourne et al., Reference Bourne, Oates, Van Elk and Doornhof2014). Especially in case of a termination of gas production in the near future, understanding the stress-induced seismicity is a challenge for seismicity models, since such a future stress scenario is not covered by the previous usage history.
Deterministic physical models
Several physical models were applied to model the seismicity of the Groningen field. Most researchers simplified the geometry to 2-D including only a single fault to investigate earthquake generation and rupture processes, but a 3-D model was developed as well to investigate the total fault slip (Sanz et al., Reference Sanz, Lele, Searles, Hsu, Garzon, Burdette, Kline, Dale and Hector2015).
Van Wees et al. (Reference Van Wees, Buijze, van Thienen-Visser, Nepveu, Wassing, Orlic and Fokker2014, Reference Van Wees, Fokker, Van Thienen-Visser, Wassing, Osinga, Orlic, Ghouri, Buijze and Pluymaekers2017) studied the quasi-static nucleation and rupture of earthquakes on a fault offsetting a reservoir. Model dimensions of 6 × 6 × 6 km were considered to be representative for major fault zones in the central area of the Groningen field. The model explains why earthquakes occur at pre-existing faults, since CFSs are far more pronounced on faults bounding or offsetting the reservoir than elsewhere. Assuming not critically stressed faults at the onset of the depletion, the model further explains the delayed onset of induced seismicity and the non-linear release of seismic moment observed in Groningen (Bourne et al., Reference Bourne, Oates, Van Elk and Doornhof2014). However, the overall characteristics (a Gutenberg–Richter b-value of approximately one and a maximum observed magnitude of ${M_{{\rm{max}}}} \approx 3.6$ ) can only be reproduced if the fault is critically stressed from the beginning. These limitations may result from considering only a single fault, whereas many faults with different orientations are present in the Groningen field. In addition, Van Wees et al. (Reference Van Wees, Buijze, van Thienen-Visser, Nepveu, Wassing, Orlic and Fokker2014, Reference Van Wees, Fokker, Van Thienen-Visser, Wassing, Osinga, Orlic, Ghouri, Buijze and Pluymaekers2017) neglect the impact of absorption of stress by the salt cap rock and energy losses through aseismic slip, which may explain the overestimation of the seismic moment release.
For the same model set-up, Van den Bogert (Reference Van den Bogert2018) and Buijze et al. (Reference Buijze, van den Bogert, Wassing, Orlic and ten Veen2017, Reference Buijze, van den Bogert, Wassing and Orlic2019) explored a fully dynamic initiation and propagation of ruptures in order to understand the impact of reservoir depletion, offset and thickness on the occurrence of fault instability and earthquake magnitudes. The simulations showed that aseismic slip always precedes seismic ruptures in the nucleation phase, which accelerates to seismic slip when its size exceeds a critical length, thus corroborating the analytical expression derived by Uenishi & Rice (Reference Uenishi and Rice2003). The pressure drop necessary to initiate seismic ruptures was found to be independent of the reservoir width, but dependent on the slip-weakening relationship and the reservoir offset. Whereas larger fault offsets promote the nucleation of ruptures, they reduce the down-dip rupture size because ruptures were more easily arrested in stable stress regions. Propagation outside the reservoir interval is promoted by critical in situ stress, a large stress drop and small fracture energy (Buijze et al., Reference Buijze, van den Bogert, Wassing and Orlic2019). Apart from the offset, the onset of fault slip is influenced by the orientation of the fault relative to the background stress field and the initial friction coefficient.
Zbinden et al. (Reference Zbinden, Rinaldi, Urpi and Wiemer2017) addressed the role of fluid flow in faults offsetting a reservoir similarly to Groningen. They used a fully coupled fluid flow and geomechanics 2-D simulator and found that stress-dependent permeability and linear poroelasticity played a major role in pore pressure and stress evolution within the fault. Especially, the fault strength was significantly reduced due to flow from the neighbouring reservoir compartment and other formations into the fault zone. In the case of well shut-in, a highly stressed fault zone could still be reactivated several decades after the production had ceased, although the shut-in resulted in a reduction of seismicity in general.
So far, 3-D reservoir models are restricted to quasi-static approaches, since stresses on faults offsetting the reservoir cannot be calculated by analytical expressions, particularly during rupture nucleation and propagation. Models of sufficient resolution to study the dynamic effects of seismic events are simply not feasible due to computational costs. An example of a 3-D fault model applied to the Groningen field is the finite-element model of Sanz et al. (Reference Sanz, Lele, Searles, Hsu, Garzon, Burdette, Kline, Dale and Hector2015), which consists of two detailed submodels with faults being embedded in a regional model. Such models can predict the cumulative fault slip but do not account for the nucleation and propagation of individual events. Thus, they can predict neither the radiated seismic energy and earthquake magnitude nor the relation between seismic and aseismic slip.
Since the Groningen field has long been produced with fluctuations in production rate over a range of timescales, DeDontney & Lele (Reference DeDontney and Lele2018) studied the impact of production fluctuations on seismicity and seismic hazard. First, they investigated the distance range influenced by daily and seasonal production variations using an analytic reservoir pressure model. Employing a rate-and-state (RS) model, they subsequently compared the seismicity rates for fluctuating and steady production schemes. In a third approach, the impact of such production schemes on seismic hazard was examined using an earthquake cycle model. The first model showed small pore pressure variations (approximately 0.1 bar) in the vicinity of wellbores (up to 200 m); since most faults are sufficiently far from wells, they should thus not be influenced by local production rates. The RS simulations demonstrated that clear seasonal seismicity rate changes are expected for the same stressing histories, but that high seismicity rates are required to conclusively reflect the signature of the imposed sinusoidal variation in stressing rate. In the third model, seismicity rates varied for fluctuating production rates and different parameters as well, but no change in the aggregate character of the seismicity and therefore, seismic hazard, was detected due to constant versus fluctuating production, since neither the cumulative number of events nor their magnitudes significantly changed over the year. However, the choice of only a few fault dip and strike angles is not representative of the highly heterogeneous fault system and fault interactions in the Groningen field. The question if the catalogue of observed earthquakes exhibits a seasonality is still debated (Nepveu et al., Reference Nepveu, van Thienen-Visser and Sijacic2016; Bierman et al., Reference Bierman, Paleja and Jones2018; DeDontney & Lele, Reference DeDontney and Lele2018; Park et al., Reference Park, Jamali-Rad, Oosterbosch, Limbeck, Lanz, Harris, Barbaro, Bisdom and Nevenzeel2018; Park & Nevenzeel, Reference Park and Nevenzeel2018).
As mentioned above, stresses on faults offsetting the reservoir could not be calculated by analytical expressions, restricting 3-D reservoir models to quasi-static approaches. However, Jansen et al. (Reference Jansen, Singhal and Vossepoel2019) recently provided analytic expressions for a normal fault offsetting a homogeneous reservoir of infinite extension, which may enable computationally feasible 3-D dynamic fault simulations in the future.
Data-driven models
In contrast to the deterministic models, data-driven models estimate the relation between input (e.g., production rates) and output (seismicity) by data fitting, either by statistical approaches fitting parametric functions or ML approaches. The advantage of fully data-driven approaches is that the functional relationship between seismicity and driving mechanisms is not predetermined. Thus, data-driven models avoid any bias related to an incomplete understanding of the earthquake nucleation mechanism.
Statistical approaches
For the Groningen field, statistical model approaches differ in input data. Whereas DeDontney (Reference DeDontney2017) tested functional relations between local compaction of the reservoir and cumulative number of induced earthquakes, Hettema et al. (Reference Hettema, Jaarsma, Schroot and van Yperen2017) and Vlek (Reference Vlek2019) showed that the cumulative number of induced earthquakes accelerates as function of the cumulative volume extracted. Based on the fault slip model of Lele et al. (Reference Lele, Hsu, Garzon, DeDontney, Searles, Gist and Dale2016), DeDontney (Reference DeDontney2017) tested the same set of relationships between seismicity rate and fault slip instead of compaction. In areas in which a fault slip model exists, the fault slip-based metric yielded a significantly better representation of the observed seismicity than the compaction-based models. However, the model yielded the best results when a threshold in time was used to suppress modelled earthquakes at the beginning of gas production, indicating a physical inconsistency of the statistical model. In addition, although various functional forms fit the observed seismicity almost equally well, they resulted in very different activity forecasts over the next 10 years. This highlights the large epistemic uncertainty of predictions based on empirical functions and the insufficient size of the observed dataset.
A similar criticism can be expressed for the application of the ETAS model to Groningen seismicity. Since the ETAS model depends non-linearly on its model parameters, its forecasts are highly sensitive to the parameters estimated from the data. Thus, the input catalogue needs to include a sufficient number of aftershocks. The percentage of activity classified as aftershocks differs from ∼ 5% (Muntendam-Bos, Reference Muntendam-Bos2020) to 10–20% (Bourne et al., Reference Bourne, Oates and Van Elk2018) and 18–36% (Post et al., Reference Post, Michels, Ampuero, Candela, Fokker, van Wees, van der Hofstad and van den Heuvel2021), but likely does not fulfil this requirement.
For the model presented by Hettema et al. (Reference Hettema, Jaarsma, Schroot and van Yperen2017) and Vlek (Reference Vlek2019), it is noteworthy that if the Groningen reservoir is completely exploited, this model predicts a fixed number of future events independently of the production scenario. Consequently, lowering the annual production only postpones earthquakes. This behaviour results from the simple empirical relationship between the cumulative earthquake number and the cumulative extracted gas volume, which has the advantage that time is eliminated as a variable, and short-term fluctuations on the order of months are suppressed. In combination with the empirical Gutenberg–Richter frequency-magnitude relation, the occurrence probabilities can be easily estimated for any magnitude range, in particular, the maximum expected magnitude (Vlek, Reference Vlek2019; Zöller & Holschneider, Reference Zöller and Holschneider2016). Nonetheless, the approach was only tested for the entire field without spatial resolution. The first test on different sub-regions of the Groningen field indicated that the empirical relation is not uniform and varies significantly in space, limiting the model’s applicability (Hettema et al., Reference Hettema, Jaarsma, Schroot and van Yperen2017).
ML approaches
Limbeck et al. (Reference Limbeck, Lanz, Barbaro, Harris, Bisdom, Park, Oosterbosch, Jamali-Rad and Nevenzeel2018) and Lanz et al. (Reference Lanz, Bisdom, Barbaro, Limbeck, Park, Harris and Nevenzeel2019) applied ML to forecast production-induced seismicity in the Groningen field. While the model of Limbeck et al. (Reference Limbeck, Lanz, Barbaro, Harris, Bisdom, Park, Oosterbosch, Jamali-Rad and Nevenzeel2018) was limited to a cumulative prediction of the earthquake numbers, Lanz et al. (Reference Lanz, Bisdom, Barbaro, Limbeck, Park, Harris and Nevenzeel2019) extended the model to predictions in time and space. The ML model was trained only with data recorded before the year 2013. The hold-out data between 2013 and 2016 were used to test the forecast power compared to two baseline models. The first assumed that the forecast rate was proportional to the change in depletion thickness, that is, pressure multiplied by reservoir thickness, while the other considered a proportionality to the change in strain thickness, that is, vertical strain multiplied by reservoir thickness.
The results of Lanz et al. (Reference Lanz, Bisdom, Barbaro, Limbeck, Park, Harris and Nevenzeel2019) demonstrated that ML models are able to qualitatively capture the decreasing trend in seismicity observed during the 2013–2016 hold-out period. However, although there was a relative match in trends, the models systematically underestimated seismicity. The random forest algorithm (Hastie et al., Reference Hastie, Tibshirani and Friedman2009) yielded the best performance and a significantly better result than the two baseline models. In addition, the support vector machine with kernel function (Hastie et al., Reference Hastie, Tibshirani and Friedman2009) showed significant improvements. The most significant spatial features were the topographic gradient of the reservoir, reservoir thickness variations along major faults and compressibility. Overall, however, static features had less impact than dynamic features such as pore pressure.
In any case, if the pressure development in the long term does not fall within the pressure range used in the training dataset, the ML model will likely lead to erroneous forecasts. In addition, important features are based on the results of the static and dynamic reservoir models, which in turn rest upon reservoir flow history matches and forecasts. Thus, the quality of the ML model depends on the quality of these input data. Confidence bands for the forecasts cannot be easily derived for the space-time ML model, and a separate magnitude model is required to complement the current spatiotemporal model.
Hybrid models
Most of the seismicity models applied to the Groningen field fall into the category of hybrid models. Due to their nature of combining physical models with statistical components, hybrid models differ notably in their level of sophistication.
One of the simplest approaches is the seismogenic index (SI) model (Shapiro et al., Reference Shapiro, Dinske, Langenbruch and Wenzel2010; Langenbruch & Zoback, Reference Langenbruch and Zoback2016; Broccardo et al., Reference Broccardo, Mignan, Wiemer, Stojadinovic and Giardini2017). Although initially developed for fluid injections, it has been applied to fluid production as well (Shapiro, Reference Shapiro2018). The SI model assumes that the earthquake rate is proportional to the flow rate, where the proportionality factor quantifies a seismogenic reaction of rocks to a unit volume fluid change. This factor is estimated by empirical fits of fluid volumes to seismicity rates. The assumption that the ratio between earthquake number and fluid volume is constant relies on physical considerations. The model is based on the Gutenberg–Richter distribution and the application of an inhomogeneous Poisson model. The index is temporally invariable, since it represents a characteristic of the seismogenic state of a crustal volume independent of the anthropogenic perturbation. Shapiro (Reference Shapiro2018) estimated that in the Groningen field, the dominantly normal faulting conditions are unfavourable for seismicity induced by injection but favourable for earthquakes induced by production. Although the model has been demonstrated to work well for many injection sites, the assumed constant ratio between number of earthquakes and fluid volume contradicts the observed ratio increase with cumulative production volume in Groningen. In addition, the hydraulic diffusivity is expected to change over time due to compaction (Hettema et al., Reference Hettema, Jaarsma, Schroot and van Yperen2017). So far, the model has not been demonstrated to fit the (spatio-)temporal distribution of seismicity in Groningen.
Wentinck (Reference Wentinck2015) postulated that during interseismic periods, the increase of shear stress on faults is proportional to the pressure drop. In contrast, earthquakes are assumed to lead to shear stress drops on faults proportional to the seismic moments of events. The likelihood of earthquake occurrences is described in his model by a Weibull probability distribution function. The model was applied to six relatively small regions (circular areas with a radius of ∼5 km) within the Groningen, Annerveen and Eleveld fields. Overall, the model fitted the observed data well for each region, but so far was only tested assuming uniform pore pressure changes with a constant rate, which is an oversimplification. Thus, it is unclear whether more realistic spatiotemporal pressure data, despite continuing using uniform stress state model parameters, would lead to a good fit of all regions. After production shut-in, the model predicts an almost constantly high activity, only slowly decaying due to the stress release by the earthquakes. Therefore, the model was not able to correctly forecast the observed decrease in seismicity in the region around Eleveld after 2005, when the reservoir pressure reduction almost fully stopped. In addition, the model not only neglects aftershocks but predicts a decreased activity level after an earthquake.
Based on the work of Segall & Fitzgerald (Reference Segall and Fitzgerald1998) and Zoback (Reference Zoback2010), the model of Dempsey & Suckale (Reference Dempsey and Suckale2017) assumes poroelastic contraction as earthquake generation mechanism. An earthquake nucleates if a critical stress level is exceeded over a critical slip distance resulting in a slip-weakening instability (Uenishi & Rice, Reference Uenishi and Rice2003). A fractal model determines the initial stress profile on the one-dimensional faults. The rupture propagation on those heterogeneous faults is calculated by solving the equation of motion for two tips of an expanding crack (Eshelby, Reference Eshelby1969; Dempsey & Suckale, Reference Dempsey and Suckale2016). However, to reduce computational costs, the fault system was restricted to the 325 largest faults represented as 1-D line segments. An ensemble approach was chosen to account for unknown initial conditions by selecting a different realisation of the initial shear stress profile for each simulation. In contrast to the other hybrid models, the earthquake magnitudes are directly determined by the model. Based on the best model describing the seismicity from 1991 to 2017, forecasts of seismicity rates and magnitudes were obtained for the period 2017 to 2024 according to three production scenarios. The key parameters were determined by a Bayesian approach. The model forecasts a rapid drop in seismicity rates after production reduction and results in a widely varying range of felt earthquakes with almost the same expected maximum magnitude for the three production scenarios. Advantageously, the model combines a deterministic description of the earthquake nucleation and rupture propagation with a probabilistic determination of uncertain parameters and the seismic catalogue. However, it suffers from limitations: the sensitivity to the selection of the faults and their orientations was not investigated, only the total field seismicity rate was modelled without any spatial resolution, and earthquake–earthquake interactions were not considered.
Candela et al. (Reference Candela, Osinga, Ampuero, Wassing, Pluymaekers, Fokker, van Wees, de Waal and Muntendam-Bos2019) employed a semi-analytic approach to calculate the Coulomb stress on faults dividing the reservoir into compartments resulting in differential compaction. The approach takes into account 3-D faults, especially the vertical offset of the faults (Van Wees et al., Reference Van Wees, Pluymaekers, Osinga, Fokker, van Thienen-Visser, Orlic, Wassing, Hegen and Candela2019), and the pressure history combined with the Dieterich (Reference Dieterich1994) RS model to obtain the spatiotemporal distribution of seismicity rates. The poroelastic change in CFS along the faults was computed on a metre scale, but finally, only the maximum Coulomb stressing rate was used for each vertical section. Candela et al. (Reference Candela, Osinga, Ampuero, Wassing, Pluymaekers, Fokker, van Wees, de Waal and Muntendam-Bos2019) successfully modelled the first-order spatiotemporal distribution of observed seismicity for two sub-areas of the Groningen gas field, the region exhibiting the highest seismic activity within the central area and a second area to the south-west. For some years, the number of observed earthquakes lay outside the 95% confidence interval of the model rates, which may be caused by aftershocks, short-period stress perturbations, or uncertainties in the stress or seismicity model. The analysis explained the observed differences in the seismicity response between the two sub-areas by different fault frictional responses. A subsequent study confirmed that spatial heterogeneity in the fault frictional response is required even after honouring the spatial heterogeneity in stress development across the Groningen gas field (Candela et al., Reference Candela, Pluymaekers, Ampuero, van Wees, Buijze, Wassing, Osinga, Grobbe and Muntendam-Bos2022). However, Candela et al. (Reference Candela, Osinga, Ampuero, Wassing, Pluymaekers, Fokker, van Wees, de Waal and Muntendam-Bos2019, Reference Candela, Pluymaekers, Ampuero, van Wees, Buijze, Wassing, Osinga, Grobbe and Muntendam-Bos2022) employed the seismicity rate in 1993 as apparent background rate, which constitutes a critical assumption since it disregards the most intensive production period (Fig. 2a). Thus, a large part of the most probably spatially heterogeneous pre-stressing history is not included. In addition, the model does not account for event magnitudes.
Heimisson et al. (Reference Heimisson, Smith, Avouac and Bourne2022) tested a revised RS model called the threshold RS (TRS) model, which takes into account that seismic sources can initially be far from instability. They applied the TRS model to the Groningen field using the maximum Coulomb stress estimated by Smith et al. (Reference Smith, Heimisson, Bourne and Avouac2021) 5 m above the top of the reservoir. The RS and TRS models result in comparable spatial distributions of earthquakes in good agreement with the observations, but the TRS model’s fit to the observed time-varying seismicity rate is superior, better reproducing its onset, peak and decline.
Richter et al. (Reference Richter, Hainzl, Dahm and Zöller2020) tested the RS model of Dieterich (Reference Dieterich1994) for the Groningen field employing a simplified set-up. Instead of detailed CFS calculations, they assumed a simple proportionality of CFS to pore pressure changes and compaction strain and compared the results of the RS model with those of the Coulomb failure model. In particular, the RS formula was solved by describing the stress history as a succession of steps (Hainzl et al., Reference Hainzl, Steacy and Marsan2010), and the stress field was based on the annual pore pressure and compaction data from a 2-D model provided by NAM. The fault distribution as defined by Dempsey & Suckale (Reference Dempsey and Suckale2017) was taken into account as fault density map, implemented as a factor varying the background seismicity rate. The inclusion of the fault density improved the models significantly and explained prominent spatial patterns of the seismicity. The best-fitting model was found to be the RS model based on pore pressure (Fig. 2b and c). In particular, the sparse seismicity before the year 2000 and the subsequently increasing number of events, focused on the zones of high fault density in the central part of the field, can be recognised. The seismic density is overestimated in the southern part of the field where more E–W striking faults are located and the homogeneous model assumptions are insufficient.
Figure 3 shows a fit of the models illustrated in Richter et al. (Reference Richter, Hainzl, Dahm and Zöller2020) updated until summer 2021. Again, the RS model provides the best fit to the observations, well explaining the acceleration and decay of the activity before and after 2015. The CFS model, assuming a proportionality between the stress and earthquake rate, cannot explain the data, potentially because the initial stress state was subcritical. Indeed, the subcritical CFS model (CFSsub), which requires that a critical stress state is reached locally to initialise earthquake nucleation, fits significantly better than the CFS model, but still worse than the RS model. The corresponding information gain (Rhoades et al., Reference Rhoades, Gerstenberger, Christophersen, Zechar, Schorlemmer, Werner and Jordan2014) per event relative to the Poisson model is 0.26, 1.37 and 1.50 for CFS, CFSsub and RS model, respectively.
Beside the oversimplification of the stressing history, the models tested by Richter et al. (Reference Richter, Hainzl, Dahm and Zöller2020) provide no information on event magnitudes and ignore earthquake interactions. For demonstration purposes, we combined the models with the ETAS approach to account for aftershock generation. To this end, we used the calculated rates r(t) of the CFS, CFSsub and RS model as background activity for the ETAS model (Ogata, Reference Ogata1988). In particular, we considered the earthquake rate as:
calculated for earthquakes with magnitudes larger than 1.45. Here, c and p are parameters of the Omori–Utsu law (Utsu et al., Reference Utsu, Ogata and Matsu’ura1995), while K and α describe aftershock productivity. The pre-factor f is needed to re-scale the background rate. Because the earthquake number is too small to constrain the fit of all five parameters, we assume typical parameter values, K= 0.018, α = 0.8, c = 0.01 days and p = 1.1. The only free parameter is f, which was fitted by the maximum likelihood method. The resulting model trends accounting for aftershocks are shown in Fig. 4. The fit of the RS model has improved even more. Especially, the years with high observed seismicity rates are explained better. For two simplified production scenarios, we ran 10,000 forward simulations to calculate the expected mean rate and the confidence intervals. In the first case, production continues in the same manner, while in the second case, it is strongly reduced. In particular, we assume that the local stress changes per year remain the same or drop to 10% of the last period. Compared to the model forecasts disregarding aftershocks, the models combined with ETAS forecast on average higher rates and wider confidence intervals.
Probably, the most sophisticated seismogenic model for the Groningen field is the NAM hybrid model (Bourne & Oates, Reference Bourne and Oates2017a; Bourne et al., Reference Bourne, Oates and Van Elk2018). It was introduced by Van Elk et al. (Reference Van Elk, Doornhof, Bourne, Oates, Bommer, Visser, van Eijs and van den Bogert2013) and Bourne et al. (Reference Bourne, Oates, Van Elk and Doornhof2014) as part of a PSHA. Similar to our examples described above (Figs. 3 and 4), the NAM model predictions are based on ensembles of earthquake catalogues using Monte Carlo simulations (Bourne et al., Reference Bourne, Oates, Van Elk and Doornhof2014, Reference Bourne, Oates, Bommer, Dost, Van Elk and Doornhof2015). Except for the initial version of the model, aftershock occurrence is considered employing the ETAS approach. The model underwent a considerable development from an empirical basis (Bourne et al., Reference Bourne, Oates, Van Elk and Doornhof2014) to a more physical and complex formulation (Bourne & Oates, Reference Bourne and Oates2017b; Bourne et al., Reference Bourne, Oates and Van Elk2018). A detailed review can be found in Dahm et al. (Reference Dahm, Hainzl, Kühn, Oye, Richter and Vera Rodriguez2020a). The initial version of the NAM model (Van Elk et al., Reference Van Elk, Doornhof, Bourne, Oates, Bommer, Visser, van Eijs and van den Bogert2013; Bourne et al., Reference Bourne, Oates, Van Elk and Doornhof2014) assumed that compaction is the main mechanism behind induced seismicity in the Groningen field and employed an empirical relation based on strain partitioning linking compaction strain and total seismic moment. A revision of the model (Bourne & Oates, Reference Bourne and Oates2014, Reference Bourne and Oates2015) changed the focus from modelling total seismic moment budget to modelling seismic activity rate. In particular, the model was based on calculations of the time-dependent compaction strain and the assumption of statistically distributed pre-stress values, which led to a time-dependent seismicity rate modelled by an inhomogeneous Poisson model. A new framework for fault reactivation induced by poroelastic deformation (Bourne & Oates, Reference Bourne and Oates2017b; Bourne et al., Reference Bourne, Oates and Van Elk2018) provided an operational, stochastic- and physics-based model to account for the early spatiotemporal evolution of seismicity rates and magnitudes as a function of reservoir pore pressure, strain and lateral geometrical changes within a heterogeneous, thin-sheet reservoir. In this model, elastic and geometric reservoir heterogeneities governed the incremental Coulomb stresses induced by pore pressure changes, and under these incremental Coulomb stress loads, the weakest parts of pre-existing geological faults could experience frictional failure resulting in an induced earthquake. Structural heterogeneities are assumed to localise shear stresses and seismicity where structural gradients are greatest. These places are often associated with faults that partially offset the reservoir, and elastic heterogeneities serve to localise induced shear stresses and seismicity within the regions of greatest reservoir compressibility (Bourne & Oates, Reference Bourne and Oates2017b; Bourne et al., Reference Bourne, Oates and Van Elk2018). The location and geometry of faults were taken from the seismic interpretation. Faults with larger offsets were ignored because they were supposed to be aseismic due to their contact with the overlying Zechstein salt formation. This involved the risk of excluding major faults that may become activated in the future. The elastic and geometric heterogeneities were represented by a deterministic, smoothed, poroelastic, stress–strain tensor field, which was derived using geodetic measurements of surface displacements, geophysical imaging of reservoir geometry and in-well monitoring of reservoir pore pressures (Bourne & Oates, Reference Bourne and Oates2017b; Bourne et al., Reference Bourne, Oates and Van Elk2018). Fault friction heterogeneities localised seismicity within the weakest fault segments. Since these were located in the tail of their joint probability distribution, extreme value theory could be applied to govern the exponential-like increase in the rate of induced seismicity and an increase in expected magnitudes as the fraction of fault segments that are close to failure increases (Bourne & Oates, Reference Bourne and Oates2017b; Bourne et al., Reference Bourne, Oates and Van Elk2018). However, once the reservoir transitions into a steady state, the seismic activity rates become proportional to induced strains leading to their overestimation. Since the parameters of frictional fault strength and initial stress heterogeneities are not directly observable, they are represented in the model by a single invariant probability distribution of initial fault stress and a transient stochastic function for stress-induced earthquake nucleation due to previous earthquakes (Bourne et al., Reference Bourne, Oates and Van Elk2018). A constant observation made throughout the development of the NAM seismogenic model was the lack of sufficient data to constrain its parameters. This limitation became more accentuated as the number of model parameters increased with the model sophistication.
Recently, Smith et al. (Reference Smith, Heimisson, Bourne and Avouac2021) demonstrated that the lag and exponential onset of seismicity are well reproduced assuming either a generalised Pareto distribution of initial strength excess, as previously considered by Bourne & Oates (Reference Bourne and Oates2017b) as well as Bourne et al. (Reference Bourne, Oates and Van Elk2018), or a Gaussian distribution. While the former can only represent the tail of the distribution, the latter describes both its tail and body. The authors used this representation to test whether the induced seismicity at Groningen has transitioned to a steady state in which the earthquake rate is proportional to the stressing rate but found that this is not yet the case.
All of the discussed hybrid models are based partially on stress changes related to the reservoir model. Uncertainties from these as well as other input data will certainly be carried forward into the model forecasts. However, during seismic hazard computation, Monte Carlo simulations can incorporate these uncertainties in the final estimates due to the low computational costs. Different branches in the logic tree can address the epistemic uncertainties related to the reservoir model and parameter uncertainties. In contrast, aleatoric uncertainties can be quantified by a large number of stochastic forward simulations.
Discussion and concluding remarks
Open questions
The information on the 3-D reservoir and fault structures, pressure depletion, reservoir compaction, and seismicity in Groningen is comprehensive and detailed. The seismicity models for Groningen have considered most of this unique and outstanding information and are advanced. However, there are knowledge gaps in terms of process understanding as well as the role of structural and dynamic parameters controlling earthquake nucleation and rupture.
The state of stress on faults and their history is assumed to be a key factor controlling the trigger potential of earthquakes. Nevertheless, the magnitude and orientation of stress as a function of space and time is often least known, since stresses cannot be measured directly with remote sensors and are difficult to determine in situ. Another crucial point in the development of more accurate fault models is the insight that salt intrusions within the faults may affect dynamic ruptures significantly, because they may introduce a rate dependency to fault movement and change the frequency-magnitude distribution of seismic events (Kettermann et al., Reference Kettermann, Abe, Raith, de Jager and Urai2017). Although for Groningen, there are currently no subsurface data indicating the existence of salt-filled faults, this may only be due to technical limitations. Using analogue models, Kettermann et al. (Reference Kettermann, Abe, Raith, de Jager and Urai2017) demonstrated that salt from the Zechstein formation might flow down-dip into opening faults due to gravitational flow.
In addition, Kortekaas & Jaarsma (Reference Kortekaas and Jaarsma2017) pointed out that the permeability structure of the Groningen reservoir is complex and highly anisotropic. Therefore, most clusters of production wells have their specific drainage areas. Another process potentially influencing pore pressure and thus Coulomb stress inside and outside the reservoir is related to the underlying aquifer system reacting to production and injections.
While much work has been done to estimate pressure and stress within the reservoir layer, the stresses outside of the field and within basement rocks have been rarely studied. Interestingly, induced seismicity in the Groningen area only occurred after a pore pressure reduction of ∼10 MPa was reached (e.g., Candela et al., Reference Candela, Wassing, Ter Heege and Buijze2018), which is often taken as evidence that before production, most regional faults were far from critical tectonic Coulomb stresses. On the other hand, North German gas fields, placed in similar Rotliegend formations and showing a similarly delayed onset of seismicity, experienced significant earthquakes on faults outside of the reservoirs (e.g., Fig. 1 in Brandes et al., Reference Brandes, Steffen, Bönnemann, Plenefisch, Gestermann and Winsemann2014; Dahm et al., Reference Dahm, Krüger, Stammler, Klinge, Kind, Wylegalla and Grasso2007; Uta, Reference Uta2017). As the setting and production are comparable, and similar geomechanical fault stressing models are in use (e.g., Haug et al., Reference Haug, Nüchtern and Henk2018), the seismicity models suggested for Groningen can likely be used as well to study induced seismicity in most of the North German gas fields.
A further key question concerns the partitioning of deformation between poroelastic and time-dependent inelastic components and the heterogeneity of the remaining stored elastic stress. The Groningen field contains only a single injection cluster in the east of the field. However, it is unclear whether intermediate and long-term effects from injections may impact the seismicity at larger distances. Globally, examples of causal relationships as well as far-reaching and strongly delayed effects on induced and triggered seismicity are documented (Van der Elst et al., Reference Van der Elst, Page, Weiser, Goebel and Hosseini2016; Galis et al., Reference Galis, Ampuero, Mai and Cappa2017; Grigoli et al., Reference Grigoli, Cesca, Rinaldi, Manconi, Lopez-Comino, Clinton, Westaway, Cauzzi, Dahm and Wiemer2018).
Seismicity models for Groningen were designed to reproduce the historical seismicity for the given production and the derived Coulomb stress loading. For seismic hazard assessment, a frequency-magnitude distribution has to be assigned. Due to the limited number of recorded events, the most simple and conservative assumption is a Gutenberg–Richter distribution with a constant b-value combined with load-dependent productivity. However, newer models suggest implementing time- or stress-dependent variations of the distribution. For instance, Bourne & Oates (Reference Bourne and Oates2017a) proposed the use of a stress-dependent b-value and Bourne & Oates (Reference Bourne and Oates2019) the use of a stress-dependent tapered power law for larger magnitudes.
New high-precision source and rupture information combined with highly resolved fault models provide additional physical constraints that are not yet fully explored. The association of earthquake hypocentres with faults and the observed variability of seismic source mechanisms and isotropic compaction components has not yet been considered in seismicity models developed for Groningen. For example, Kortekaas & Jaarsma (Reference Kortekaas and Jaarsma2017) derived high-resolution faults from 3-D seismic data, which seem to better correlate with recent earthquake hypocentres than previously mapped faults. Intersecting faults, as close to the 2018 Zeerijp event (Wentinck, Reference Wentinck2018b), may accumulate stress. Therefore, such fault complexity may be related to larger induced events (see also Candela et al., Reference Candela, Wassing, Ter Heege and Buijze2018). Furthermore, improved detection and location of small magnitude events within the Groningen field can help to improve seismicity statistics and characterise active weak zones prior to the occurrence of large earthquakes. Such improvements might be achieved, for example, by applying novel methodologies as the highly scalable convolutional neural network presented by Perol et al. (Reference Perol, Gharbi and Denolle2018) for induced seismicity in Oklahoma, USA. In general, the low number of spatially and temporally clustered earthquakes in the Groningen field (Jagt et al., Reference Jagt, Ruigrok and Paulssen2017; Muntendam-Bos, Reference Muntendam-Bos2020) is an obstacle for deriving relative event locations. Another feature that may significantly help to improve seismicity models is the earthquake source mechanism as analysed for the Groningen field by Willacy et al. (Reference Willacy, van Dedem, Minisini, Li, Blokland, Das and Droujinine2018); Willacy et al. (Reference Willacy, van Dedem, Minisini, Li, Blokland, Das and Droujinine2019); Kühn et al. (Reference Kühn, Heimann, Isken, Ruigrok and Dost2020) and Dost et al. (Reference Dost, van Stiphout, Kühn, Kortekaas, Ruigrok and Heimann2020). For specific events, kinematic rupture models have been estimated (Wentinck, Reference Wentinck2017; Wentinck, Reference Wentinck2018a, b, c) or derived from apparent source spectra (Ameri et al., Reference Ameri, Martin and Oth2020). This rupture information is not yet considered in current seismicity models. However, estimation of the kinematic and dynamic rupture parameters of such small earthquakes remains a challenge despite the monitoring network extensions in 2015.
Objective assessment and comparison of different model types
For a specific case as the Groningen field, it is a challenge to objectively evaluate and compare models against each other, since they are developed for various aims and specific targets. Physical models and deterministic scenarios were mostly developed to investigate specific aspects of the nucleation and rupture process of induced earthquakes. They cannot be directly applied in seismic hazard studies because of limited computational power as well as unknown model parameters and initial conditions. Instead, they can help to improve the process understanding and to enhance concepts for seismicity models. However, both analytical and deterministic approaches have clear limitations considering the complex nature of reservoirs, subsurface and earthquake ruptures. ML presents a possibility to bypass the problem of incomplete and potentially biased process understanding, as it extracts relevant patterns and information directly from available multivariate datasets without requiring insight into the underlying physical mechanisms. The drawbacks of the ML approaches are the limited forecasting power due to the small number of earthquakes available for model training and that future stress evolution will likely not be covered by the historical training dataset.
Empirical statistical models also avoid physical constraints but implement – in contrast to ML – parametric relations between observable characteristics and use basic statistical models to account for aleatoric variability. Statistical models can be effective for short-term forecasts, such as predicting aftershock activity by ETAS-type models or predicting cumulative seismicity. However, a weakness similar to ML is that purely statistical models cannot provide reliable predictions for system states evolving outside the range covered during the learning period.
Hybrid models combine physics-based deterministic with statistical models. In particular, they are based on modelled stress values in the seismogenic zone and a threshold criterion leading to a forecast of the average rate. The natural variability in magnitude and time is accounted for by the statistical Gutenberg–Richter and Poisson models. Hybrid models are preferential for time-dependent hazard assessment of induced seismicity, because they can more easily adapt to changing production scenarios. Most hybrid models are based on reservoir models and reasonable physical considerations, but the validity of those assumptions is not always proven. For example, it is still debated whether to favour the RS model assuming RS-dependent friction (Bourne & Oates, Reference Bourne and Oates2018; Candela et al., Reference Candela, Osinga, Ampuero, Wassing, Pluymaekers, Fokker, van Wees, de Waal and Muntendam-Bos2019; Richter et al., Reference Richter, Hainzl, Dahm and Zöller2020; Heimisson et al., Reference Heimisson, Smith, Avouac and Bourne2022) or the CFS model assuming static–kinetic friction with instantaneous earthquake nucleation (Bourne et al., Reference Bourne, Oates and Van Elk2018; Richter et al., Reference Richter, Hainzl, Dahm and Zöller2020; Smith et al, Reference Smith, Heimisson, Bourne and Avouac2021).
Because of the large variation of parameter sets, assumptions, input data and different forecast outputs, it is almost impossible to compare and judge the performance of the alternative approaches. Further, due to the complexity of most models and the partly specific development and adaptation to the Groningen field, it remains unclear whether or not hidden parameters were set based on the knowledge of the full dataset without explicit perception of the authors. Thus, for a quantitative comparison, an independent, systematic testing approach is essential. This task has been already recognised for a long time for natural seismicity. To tackle it, the Collaboratory for the Study of Earthquake Predictability (CSEP) was established (Jordan, Reference Jordan2006). Within the CSEP framework, fully prospective tests of earthquake forecasts are carried out in independent testing centres using standardised statistical tests and authoritative datasets to assess the models’ predictive skills and to compare them objectively. CSEP experiments use pre-defined rules to test forecasts against observations using a number of different evaluation metrics (Jordan, Reference Jordan2006; Zechar et al., Reference Zechar, Gerstenberger and Rhoades2010; Jordan et al., Reference Jordan, Chen, Gasparini, Madariaga, Main, Marzocchi, Papadopoulos, Sobolev, Yamaoka and Zschau2011; Rhoades et al., Reference Rhoades, Schorlemmer, Gerstenberger, Christophersen, Zechar and Imoto2011; Michael & Werner, Reference Michael and Werner2018; Schorlemmer & Gerstenberger, Reference Schorlemmer and Gerstenberger2007; Schorlemmer et al., Reference Schorlemmer, Werner, Marzocchi, Jordan, Ogata, Jackson, Mak, Rhoades, Gerstenberger, Hirata, Liukis, Maechling, Strader, Taroni, Wiemer, Zechar and Zhuang2018). In addition, these include comparative metrics to statistically determine whether one model performs better than another over an evaluation period. Similar efforts have not been made so far for induced seismicity. In principle, it would be straightforward to implement a CSEP-type testing framework to validate, compare and rank the models developed for Groningen. Whether or not such a test set-up is feasible and successful depends on at least three conditions: (1) a sufficient number of earthquakes to allow for discrimination between models, (2) the willingness of model developers to participate and provide software codes that run automatically and (3) a test centre in which independent researchers run the codes.
It should be noted that although the production within the Groningen field will be terminated in the near future as decided by the Dutch government, the modelling of the seismicity after a production stop is a challenge for seismicity models because delayed processes and mechanisms can play a crucial role. For instance, a clear seismicity decrease followed the significant production reduction in 2014, but larger earthquakes of magnitudes M ≥ 3.0 continued to occur.
Thus, the feasibility of fully prospective tests should be evaluated carefully, since a CSEP-type testing framework may provide important indications about the predictive power of the different models. In particular, the test results may provide important information on how different models should be weighted in a logic tree approach for seismic hazard assessment in so-called ensemble models. This approach would ensure the best usage of existing seismogenic models.
Acknowledgements
The project KEM-08 ’Seismogenic source models for the Groningen gas reservoir' was financed by the Dutch State Supervision of Mines (SodM). The authors would like to thank Volker Oye (NORSAR) for the quality assurance of the project. Further, the authors are grateful to NAM personnel, especially Stephen Bourne, for openly discussing and explaining their source model at great length. Finally, the authors would like to thank two anonymous reviewers for their critical input that helped to improve and clarify this manuscript.