Introduction
The Groningen gas field is one of the largest in Europe and has been exploited since the 1960s. Since then, surface subsidence has been observed and induced seismicity slowly developed (van Thienen-Visser & Fokker, Reference van Thienen-Visser and Fokker2017; Dost et al., Reference Dost, Ruigrok and Spetzler2017). The connection between gas production, subsidence and seismicity is well-established (Van Eijs et al., Reference Van Eijs, Mulders, Nepveu, Kenter and Scheffers2006; Bierman et al., Reference Bierman, Paleja and Jones2015) and a result of pore pressure loss due to gas extraction (Bourne et al., Reference Bourne, Oates, Elk and Doornhof2014). Compaction in the various geological units has been measured (Kole et al., Reference Kole, Cannon, Doornhof and van Elk2017; Cannon & Cole, Reference Cannon and Cole2018; Kole et al., Reference Kole, Cannon, Tomic and Bierman2020) and linked to the observed seismic moment tensors (Bourne et al., Reference Bourne, Oates, Elk and Doornhof2014), which mostly show normal faulting (Willacy et al., Reference Willacy, van Dedem, Minisini, Li, Blokland, Das and Droujinine2019; Dost et al., Reference Dost, van Stiphout, Kühn, Kortekaas, Riogrok and Heinmann2020). Seismicity has slowly been building up over time, and with it, a very sophisticated and dense monitoring system (Dost et al., Reference Dost, Ruigrok and Spetzler2017). The correlation between the amount of gas production and the number of induced events can therefore be modelled reasonably well (Bourne et al., Reference Bourne, Oates and Elk2018).
Most earthquakes are small, but some reached a magnitude above 3. The fear is that even bigger events might cause serious damage to societal infrastructures in the area. Moment tensors are only estimated for the largest events (Willacy et al., Reference Willacy, van Dedem, Minisini, Li, Blokland, Das and Droujinine2019; Dost et al., Reference Dost, van Stiphout, Kühn, Kortekaas, Riogrok and Heinmann2020), which are well recorded across the monitoring network. Most seismic hazard analysis is based on point measurements of ground acceleration, maximum magnitude estimates and the frequency-magnitude relation. The latter is relatively well constrained, but the maximum magnitude is more uncertain (Dost et al., Reference Dost, Ruigrok and Spetzler2017).
While it is appealing to have simple models for earthquake generation, i.e. due to pore pressure loss in the thin reservoir layer (Bourne et al., Reference Bourne, Oates, Elk and Doornhof2014), it is not clear whether earthquakes occur randomly or present some space-time-magnitude correlations, resulting in earthquake clusters. A related question is whether all events are due to gas extraction, or if some are triggered by master events. Recent papers looked at this from various angles, using space-time declustering (Muntendam-Bos, Reference Muntendam-Bos2020), nowcasting (Luginbuhl et al., Reference Luginbuhl, Rundle and Turcote2018), non-stationary point processes (Post et al., Reference Post, Michels, Ampuero, Candela, Fokker, van Wees, van der Hofstad and van den Heuvel2021) and geomechanical modelling (Bourne et al., Reference Bourne, Oates and Elk2018; Candela et al., Reference Candela, Osinga, Ampuero, Wassing, Pluymaekers, Fokker, van Wees, de Waal and Muntendam-Bos2019). The estimates vary from very few to up to 27% correlated events. Most statistical analyses are based on stationarity, but the seismicity in Groningen clearly is not stationary: there is no sign of tectonic activity in the area, and seismicity slowly built up after gas production, which itself is irregular, and hopefully will stop some time after gas extraction ceases. Post et al. (Reference Post, Michels, Ampuero, Candela, Fokker, van Wees, van der Hofstad and van den Heuvel2021) proposed a method to take this non-stationarity into account. Also, the number of recorded events is moderate, which means that robust statistical analysis is a challenge.
We will revisit the statistics of recorded events and analyse their scale-invariance. If size and interevent time probability distributions are scale-invariant, the rate of elastic energy accumulation can be inferred (Benzi et al., Reference Benzi, Castaldi, Toschi and Trampert2022), which is very important for geodynamic modelling in the area. We will also propose a method, based on machine learning, to extract information on the number of (un)correlated events in the catalogue. This model does not require any assumption on stationarity or point processes. It simply is a nonlinear inference method using the interevent time distribution as input.
Event statistics
In this study, we used the induced event file maintained by the KNMI (https://www.knmi.nl/kennis-en-datacentrum/dataset/aardbevingscatalogus) containing all induced earthquakes recorded since 1986 in the Netherlands. Not all induced events occurred within the Groningen gas field. We therefore restricted the catalogue to the Groningen events by imposing a simple latitude-longitude filter (53.0931-53.4909N and 6.5516-7.1048E). By using such a simple filter, we probably lost a few Groningen events and included a few from nearby fields, but this will not greatly influence our statistical arguments below. We also discarded all events before 2002 as the instrumentation was sparse before that date. Rather than looking at magnitudes, a more physically meaningful quantity is the seismic moment, which is proportional to seismic energy release. We therefore converted the local magnitudes m l given by the KNMI into seismic moments using log10 M 0 = 1.5m + 9.105, where M 0 is the seismic moment in [Nm] and m the moment magnitude (Hanks & Kanamori, Reference Hanks and Kanamori1979). Before doing so, we had to change local magnitudes m l into moment magnitudes m using the expression in Dost et al. (Reference Dost, Edwards and Bommer2018). This expression is valid for local magnitudes between 0.5 ≤ m l ≤ 3.6. Restricting our catalogue further to events with a magnitude greater than 0.5 leaves us with 1372 events. We evaluated the probability distribution of the seismic moment using the Gaussian kernel density estimator from the SciPy statistics package (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html). We see (Fig. 1) that the probability density of the seismic moment presents a power law decay over four decades with a slope α = 1.69 ± 0.01 giving the power law p(M 0) ∝ M 0 −α . Using known relations from seismology (Ben-Zion, Reference Ben-Zion2008), we can relate the slope of the seismic moment probability density to the Gutenberg–Richter law, which is cumulative, to give a Gutenberg-Richter slope b = 1.04 ± 0.02. This is close to what has been reported elsewhere (Dost et al., Reference Dost, Ruigrok and Spetzler2017). It is worth noting that the reported b-values vary in different regions of the gas field, which is most likely due to changes in compaction (Bourne et al., Reference Bourne, Oates, Elk and Doornhof2014), or changes in magnitude of completeness used in various parts of the region.
Another statistic easily evaluated is that of interevent times. We follow Corral (Reference Corral2004) and estimate the probability density for interevent times, normalized by the average interevent time < t i >, which is 5.06 days, using the same events. We observe (Fig. 3) that for longer times, the normalized interevent times follow a Gamma distribution $\gamma \left(0.64,1.70\right)$ resulting in a negative slope of γ = 0.36 ± 0.02 giving a power law behaviour p(t i /<t i >) ∝ (t i /<t i >)−γ over two decades. This is slightly steeper than the values reported by Post et al. (Reference Post, Michels, Ampuero, Candela, Fokker, van Wees, van der Hofstad and van den Heuvel2021), but close to the universal values of Corral (Reference Corral2004). At shorter times, the slope is much steeper and is thought to be due to the presence of aftershocks following an Omori-type behaviour (Corral, Reference Corral2004). We find an Omori slope p = 0.82 ± 0.06. Although on the low end, this is within the range of expected values (Utsu et al., Reference Utsu, Ogata and Matsu’ura1995).
Both seismic moment and interevent times show a plateau at very low values of the respective variables. This is most likely due to the fact that the existing network, although dense, does not record all small events. There is a magnitude of completeness, above which all events are captured by the monitoring network. Dost et al. (Reference Dost, Ruigrok and Spetzler2017) estimated this magnitude of completeness to be above 1.2 for the Groningen area. If we now repeat the statistical analysis for a magnitude cut-off of 1.3, we are left with 694 events. The new probability density functions can be seen in Figs. 2 and 4. We now find α = 1.66 ± 0.02 and γ = 0.31 ± 0.05. The uncertainty in the slope of the intervent time curve is significantly larger, due to the dip at 10−3, which is likely due to the significantly smaller number of events using this higher cut-off. Within the uncertainties though, and certainly within two standard deviations, the slopes are the same. Introducing a magnitude of completeness eliminates the plateau in Fig. 2, while the interevent time curve remains largely unchanged (the Omori slope is now p= 0.87 ± 0.05). The average interevent time < t i > for this higher magnigtude cut-off is 8.87 days. We checked that for other magnitude cut-offs, the power law parts of p(M 0) and p(t i /<t i >) remained the same within the uncertainties, indicating that they are indeed scale-invariant.
Implications of the scale-invariance
In Benzi et al. (Reference Benzi, Castaldi, Toschi and Trampert2022), we showed that if p(M 0) and p(t i /<t i >) are scale-invariant, α and γ are related for two specific cases of energy loading in various systems. Let us generalise this result. Assume that an earthquake releases the elastic energy that has been stored in the system over time. After the earthquake released it, energy will start being stored again and we assume that it grows as some power of the normalized interevent time. The released energy is proportional to the seismic moment (Ben-Zion, Reference Ben-Zion2008). We can then consider a new random variable $X={E_{{\rm stored}} \over E_{{\rm released}}}={\left(t_{i}/\lt t_{i}\gt \right)^{\varepsilon } \over M_{0}}$ . This constant power ϵ can be interpreted as the average over the considered time span 2002–2021 and does not model possible changes due to variations in the volume of gas extraction over the same time span. Indeed Bierman et al. (Reference Bierman, Paleja and Jones2015) analysed the rates of earthquakes in Groningen and found a possible correlation with changes in gas extraction for events of magnitude lower than 1.5. Using the ratio distribution rule, p(t i /<t i >) ∝ (t i /<t i >)−γ and p(M 0) ∝ M 0 −α , the probability density for X is
where the integration is over the part where the seismic moment shows a power law. Now let us consider the scale transformation
We then find that
Since both p(M 0) and p(t i /<t i >) are scale-invariant, p(X) is too. This implies that the exponent of λ has to be 0 and
Using the measured slopes for the seismic moment and interevent time above, we find that for a magnitude cut-off of 0.5, ϵ = 0.93 ±0.03and ϵ = 1.05 ± 0.04 for events above the magnitude cut-off of 1.3. For all practical purposes, we infer that the elastic energy within the Groningen gas reservoir roughly increases linearly with time until a new event occurs, i.e. ϵ ≈ 1. It is remarkable that this loading is very similar to that of tectonic earthquakes (Benzi et al., Reference Benzi, Castaldi, Toschi and Trampert2022).
Modelling interevent time using gradient boosted trees
Except for very short interevent times, p(t i /<t i >) closely follows a Gamma distribution (Figs. 3 or 4). A Gamma distribution could be an indication of some correlation between magnitude, space and time (Corral, Reference Corral2005). Furthermore, a clear Omori slope is present at short interevent times. Another common assumption is that some events are truly independent and follow a Poisson process, resulting in an exponential distribution. An important question therefore is how many aftershocks are following Omori’s power law, how many events a Gamma distribution and how many independent events an exponential distribution in the Groningen catalogue. Without explicitly distinguishing between aftershocks (Omori) and otherwise correlated events, the amount of correlated events was estimated to be between a few percent up to 27% (Luginbuhl et al., Reference Luginbuhl, Rundle and Turcote2018; Bourne et al., Reference Bourne, Oates and Elk2018; Candela et al., Reference Candela, Osinga, Ampuero, Wassing, Pluymaekers, Fokker, van Wees, de Waal and Muntendam-Bos2019; Muntendam-Bos, Reference Muntendam-Bos2020; Post et al., Reference Post, Michels, Ampuero, Candela, Fokker, van Wees, van der Hofstad and van den Heuvel2021). Most of the studies based on statistics assume stationarity to perform some form of declustering, except for the latter. We propose a machine learning route to interpret the interevent time distribution directly without any assumptions, except that the observed distribution is a mixture of Omori distributed, Gamma distributed and exponentially distributed events.
We generated 105 samples from a power law, 105 samples from a Gamma distribution, whose coefficients have been taken from the interevent times above, and 105 samples from an exponential distribution. We then uniformly draw a random number n g ∈ [0,1] representing the fraction of samples from the Gamma distribution and n e ∈ [0,1] representing the fraction of samples from the exponential distribution. If n g + n e ≤ 1, then n p = 1 − n g − n e is the fraction of power law samples. If however, n g + n e > 1, we assume that the interevent distribution is dominated by either the Gamma or the exponential distribution, i.e. if n g > n e , n e = 0 and n p = 1 − n g , else if n g < n e ,n g = 0 and n p = 1 − n e . We then generate a set of in total 104 samples randomly composed of a fraction n g Gamma distributed samples, a fraction n e exponentially distributed samples and a fraction n p power law distributed samples. Similar to the Groningen event data, we estimate the probability density for this set using a Gaussian kernel density estimator. The aim is to infer the fractions n g , n e and n p directly from the probability density curve. To solve this regression problem, we used gradient boosting trees (Friedman, Reference Friedman2001) implemented by the Python machine learning package (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html). We created a training set of 5000 such probability density curves, which serve as input to the gradient boosting regressor and the fractions n g , n e and n p are the target.
Splitting the training set into 80% training data and 20% test data, we show the performance of the trained gradient boosting regressor on the test data not previously seen by the regressor model (Fig. 5). The output points nicely fall on the diagonal, indication that the model has successfully learned the sample fractions from the input data. Another indication that the learning was successful is that the R 2-score for the test data is 0.98, similar to the score for the training data. We experimented with increasing the training set and slightly changing the parameters of the input distributions, but results remained similar. As can be seen in Fig. 5, the model is not perfect and there is some spread around the diagonal, which will result in some uncertainty in the inference from the observed interevent time distribution. We can directly use this spread to infer the uncertainty on our inference by estimating the standard deviation of the points around the diagonal.
Regardless of the cut-off magnitude, we find n p = 0.03 ± 0.02, n g = 0.92 ± 0.03 and n e = 0.03 ± 0.03. It is not surprising that this inference is independent of the magnitude cut-off as the interevent time distribution is scale-invariant. Figure 6 shows that indeed the inferred fractions closely match the observed interevent time distribution, except for the plateau at the shortest times, which has not been modelled.
Concluding remarks
We reanalysed the Groningen event catalogue and estimated the distributions for seismic moment and interevent times. Upon increasing the cut-off magnitude in the catalogue, we showed that both distributions are scale-invariant. This scale-invariance implies that the dominant slopes in the seismic moment and interevent time distributions are related to the time evolution of the elastic energy loading in the system. We found that $${E_{{\rm{stored}}}} \propto \left( {{t_i}/ \lt {t_i} \gt } \right)$$ . The interevent time distribution indicates that most events are Gamma distributed. To gain more insight, we trained a gradient boosting regressor to infer the proportion of aftershocks (following an Omori power law), the proportion of events following a Gamma distribution and the proportion of independent events following an exponential distribution. Most events follow a Gamma distribution and very few an Omori power law or an exponential distribution.
The elastic energy that is stored over time, until a seismic event releases it, can be written as $E$ , where σ is the effective stress and ϵ the vertical strain (Bourne & Oates, Reference Bourne and Oates2017). Most events are located within the gas reservoir (Dost et al., Reference Dost, Ruigrok and Spetzler2017) and thus the temporal evolution of the stored energy, which we measured, also relates to the reservoir. Measurements of vertical strain in the reservoir using radioactive markers or distributed strain sensing show that the vertical strain increases proportional to time (Kole et al., Reference Kole, Cannon, Doornhof and van Elk2017; Cannon & Cole, Reference Cannon and Cole2018; Kole et al., Reference Kole, Cannon, Tomic and Bierman2020), we therefore deduce that the effective stress is roughly constant. This is similar to tectonic earthquakes where it is commonly assumed that apparent stress is constant (Madariaga, Reference Madariaga and Meyer2011). Bourne & Oates showed that the effective stress is $\sigma =C\Delta P$ where C is a factor, which is a function of the elastic constants and the Biot coefficient and $\Delta P$ is the pressure depletion in the reservoir. The pressure depletion $\Delta P\propto t$ (https://www.nam.nl/feiten-en-cijfers/gasdruk.html) and hence C ∝ t −1. We suggest that these loading rules should be used in future geodynamic modelling of the reservoir.
Recently, there has been some interest in declustering the Groningen catalogue and estimate the number of correlated events using various techniques (Luginbuhl et al., Reference Luginbuhl, Rundle and Turcote2018; Bourne et al., Reference Bourne, Oates and Elk2018; Candela et al., Reference Candela, Osinga, Ampuero, Wassing, Pluymaekers, Fokker, van Wees, de Waal and Muntendam-Bos2019; Muntendam-Bos, Reference Muntendam-Bos2020; Post et al., Reference Post, Michels, Ampuero, Candela, Fokker, van Wees, van der Hofstad and van den Heuvel2021). We proposed a machine learning approach using the interevent time distribution as input to infer the proportion of aftershocks, the proportion of events following a Gamma distribution and the proportion of independent events. Existing estimates of correlated events (mostly assumed to be aftershocks) vary from a few percent to up to 27%. We find that the proportion of aftershocks in the Groningen catalogue following an Omori-type law is most likely less than 7%. Similarly, we find that the most likely range of truly independent events, following an exponential distribution, to be between 0 and 9%. Both of these ranges are 95% confidence levels or 2 standard deviations. This leaves the bulk of events following a Gamma distribution. The question is if those events are correlated in some way or not. From a statistical point of view, assuming stationarity, the shape parameter of the Gamma distribution can be thought of as the ratio of independent over total events. Post et al. (Reference Post, Michels, Ampuero, Candela, Fokker, van Wees, van der Hofstad and van den Heuvel2021) modified this relation to incorporate non-stationarity. These statistical arguments do not contain any physical insight though. From a material science point of view, we know that many systems presenting avalanche dynamics show interevent time statistics evolving from exponential to Gamma distributions upon increasing compaction, confining pressure or rigidity (Kumar et al., Reference Kumar, Korkolis, Benzi, Denisov, Niemeijer, Schall, Toschi and Trampert2020). Kumar et al. (Reference Kumar, Korkolis, Benzi, Denisov, Niemeijer, Schall, Toschi and Trampert2020) also indirectly showed that, in case these interevent times are Gamma distributed, the systems presented some space-time correlations. Future work should therefore try and unravel the exact physics processes, which give rise to these correlations, and how they relate to gas extraction and hence compaction in the reservoir and try and link this understanding to the occurrence and forecasting of seismicity.
Acknowledgements
Two anonymous reviewers provided constructive comments, which are greatfully acknowledged. We thank Elmer Ruigrok who provided suggestions for filtering KNMI’s induced event catalogue and Hanneke Paulssen who provided constructive comments on an earlier version of this paper. The research leading to these results has received funding from the research programme DeepNL of the Dutch Research Council (NWO) under project number DeepNL.2018.033.
Competing interests
The authors declare none.