1. Introduction
Subglacial outburst floods, or jökulhlaups, originating from ice-dammed lakes are some of the most dramatic and hazardous glaciological phenomena. With peak discharges often exceeding 10³ m³ s⁻¹, they can pose significant danger to downstream settlements and environments, a threat likely exacerbated by population growth and climatic warming (e.g. Björnsson, 2004; Barnett and others, 2005; Ng and others, 2007). While an ability to predict the timing and magnitude of these floods can enable mitigation of their consequences, this problem has remained largely untackled, despite much research into their underlying physics (Nye, 1976; Spring and Hutter, 1981; Clarke, 1982, 2003; Evatt and others, 2006; Fowler, 2009; Kingslake and Ng, 2013), how they initiate (Thórarinsson, 1953; Glen, 1954; Liestøl, 1956; Fowler, 1999; Jóhannesson, 2002; Flowers and others, 2004; Walder and others, 2005, 2006; Huss and others, 2007; Werder and Funk, 2009), their geomorphic impact (Walder and Driedger, 1994; Cenderelli and Wohl, 2003; Roberts and others, 2003; Russell and others, 2006; Carrivick, 2007; Burke and others, 2009) and their influence on the flow dynamics of glaciers (Anderson and others, 2005; Walder and others, 2006; Magnússon and others, 2007, 2010; Sugiyama and others, 2007; Mayer and others, 2008; Riesen and others, 2010; Bartholomaus and others, 2011; Kingslake and Ng, 2013) and ice sheets (Bell and others, 2007; Sergienko and others, 2007; Stearns and others, 2008; Sergienko and Hulbe, 2011).
Recently, Ng and Liu (2009) put forward a mathematical theory for understanding the long-term timing pattern of jökulhlaups. Their model represents the ice-dammed lake as a threshold system that drains when it reaches a threshold water depth, which they initially kept constant. They found that in such a nonlinear system the lake yields an irregular sequence of flood dates with predictable timing characteristics when the annual total volume of meltwater feeding the lake differs from the volume needed to trigger a flood. They used their threshold model to analyse the recorded date sequence of 51 jökulhlaups from Merzbacher Lake, Kyrgyzstan, and showed that the weather-driven variability of meltwater input to the lake interacts with the threshold water depth to produce flood sequences that resemble the Merzbacher Lake flood record. Like the recorded flood dates, the simulated flood dates are focused on the warmest part of the year, when the lake fills with meltwater more rapidly than at other times. This analysis was conducted with a constant outburst threshold, but Ng and Liu (2009) went further and found empirical evidence suggesting that Merzbacher Lake’s threshold is not constant, but varies from flood to flood and is partially correlated with the rate of lake-level rise.
If certain aspects of jökulhlaup timing sequences can be explained using the outburst threshold concept, it may be possible to use that concept to predict individual flood dates. We explore this possibility in this paper and take steps towards operational flood forecasting. Like Ng and Liu (2009), we focus on the timing of floods (which could inform predictions of flood size) and use the Merzbacher Lake system as an example (Fig. 1a). How reliable, or hopeless, is our best prediction of the next flood? Is operational forecasting feasible given our current knowledge of how jökulhlaups initiate? How complex should the forecast models be? We consider such questions, which should interest policy-makers as well as populations exposed to flood risks. Accordingly, we set out to establish how well simple threshold models can predict the Merzbacher flood dates and, in so doing, develop measures of predictability that can be applied to other jökulhlaup lakes. Our results can serve as a benchmark for future forecasting efforts.
Because successful flood prediction requires accounting for the variability in the lake outburst threshold highlighted by the work of Ng and Liu (2009) and by earlier studies (e.g. Clague and Mathews, 1973; Walder and Costa, 1996; Björnsson, 2003; Ng and Björnsson, 2003), a hierarchy of assumptions for this threshold is examined below. The assumptions are motivated by different hypotheses of how the threshold depends on environmental conditions, such as the weather near the lake. Our ‘threshold assumptions’ range from the predominantly empirical to those based more strongly on physical grounds, and are presented and studied in order of increasing complexity.
The paper is organized as follows. In Section 2, we outline Ng and Liu’s (2009) model of lake filling and draining and explain how it is used to predict flood dates in our study. Section 3 introduces Merzbacher Lake, its record of jökulhlaup dates and a melt equation needed in the Ng–Liu model for estimating this lake’s water supply. The flood dates are used with the melt equation to reconstruct flood volumes, whose distribution gives us a probabilistic handle on the size and timing of future floods. In Sections 4 and 5, we use the Ng–Liu model with different threshold assumptions to evaluate how well this model predicts the Merzbacher flood dates. After considering how to quantify the mismatch between predicted and observed flood dates, we optimize each version of the model for prediction success. In this exercise, the choices of threshold behaviour, weather-data source and assumptions about future weather lead to multiple sets of prediction results, whose performances are compared and discussed. Significantly, we show that, in terms of their flood prediction ability, models that assume a constant outburst threshold are not only inferior to more complex models that allow the threshold to vary, but also inferior to a naïve model that assumes floods occur on the same date each year.
2. Flood-Date Prediction with the Threshold Model
2.1. The Ng–Liu model
In Ng and Liu’s (2009) model, the jökulhlaup lake, which has volume V and water depth h, fills in response to meltwater input at a rate Q_i, and drains suddenly and completely in a flood when a threshold water depth, h_c, is reached (Fig. 1b and c). With t denoting time, their model equations are
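(reconstructed here from the surrounding description, with the split into (1a) and (1b) following later references in the text)

dV/dt = Q_i,          (1a)
h = h(V),             (1b)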
where the function h(V) represents lake geometry. These equations generate sawtooth-shaped filling and draining cycles in the lake level that are irregular when Q_i depends on the weather. By assuming a constant h_c and estimating Q_i from daily air temperature with the sub-model described later (Section 3.3), Ng and Liu (2009) simulated Eqn (1) for Merzbacher Lake from 1956 to 2005 to obtain model flood dates, which they compared with the observed flood dates in this period (Table 1). Motivated by concepts from nonlinear dynamics, they analysed the simulated and observed date sequences with time-delay maps and demonstrated how the sequences resemble one another.
Ng and Liu (2009) also investigated the possibility of a variable h_c. They integrated Eqn (1a) for each period between successive floods to reconstruct the flood volumes and the long-term lake-level history. They interpreted inter-flood variability in the reconstructed flood volumes as a manifestation of corresponding variability in Merzbacher Lake’s outburst threshold. They then plotted each reconstructed flood’s outburst threshold against the mean lake-filling rate during the week that preceded the corresponding flood, which they extracted from their reconstructed lake-level history. This revealed a negative correlation between the two variables, suggesting that the lake-filling rate plays a role in determining the outburst threshold.
2.2. Our prediction strategy
In this paper, we use Ng and Liu’s threshold model to predict the next flood date from the date of the last (known) flood, and specifically to hindcast the observed Merzbacher flood dates in Table 1. Our study period spans the first and the last flood dates and contains 19 006 days on which we could ask when the next flood will occur. We explore the effect of a variable outburst threshold on prediction success and seek the best assumption for h_c. Most of our simulations use Eqns (1), but one set of simulations uses a modified model with a threshold, different from h_c, that mimics a mobile subglacial seal. A detailed explanation of this threshold is given in Section 4.3.
In a typical prediction run, we situate ourselves on a particular day D (Fig. 2); and, starting with an empty lake on the day after the previous known flood (e.g. one of the floods in Table 1), we integrate Eqn (1a) forward on a daily time-step to fill the lake, until h, found from the lake volume via h(V), reaches h_c. Thus the next flood date is predicted, and its mismatch in timing from the actual (observed) next flood forms a prediction error, E (Fig. 2). As D marches forward in time past a known flood date, the start date of the integration is renewed. In each set of prediction runs assuming a specific threshold behaviour and a specific temperature-data source, we apply this procedure for every day in the study period to obtain the same number of predictions (and hence the same number of errors) as the length of the period, and we summarize the errors into an overall measure of mismatch. Two measures considered later are the root-mean-square (RMS) error and the fraction of predictions that are accurate to within a fixed number of days.
2.3. ‘Real’ and ‘Simple’ schemes of weather forcing
When we perform this exercise to quantify the predictability of Merzbacher Lake’s floods, we know the complete past weather forcing – from archived daily temperature data – for deriving Q_i in Eqn (1a). (These data are outlined in Section 3.3.) However, we put ourselves in the position of a forecaster in the past and assume that, on any day D on which a prediction is made, he or she knew the past weather but not the future weather, which must be estimated to run the model. We simulate this scenario by the procedure shown in Figure 2a. For all days up to and including D, we use archived daily temperature data to find Q_i. But for all days after D (until the model lake reaches its outburst threshold), we find Q_i from a temperature forecast, taken to be the multi-year mean of the archived daily temperature on that calendar date (Fig. 2a). Although such a forecast could be made using sophisticated methods (e.g. regional climate models), this procedure is the simplest one that incorporates weather uncertainty into our predictions; we call it the ‘Real’ weather-forcing scheme. As D approaches the next flood, more of the lake-filling period is simulated with known weather forcing, so the predicted flood date or ‘flood-date hindcast’ varies with D. In Section 5.3, this idea allows us to study whether predictions improve as we approach the next flood.
Later, in Section 5.2, we consider a second scheme of weather forcing, called the ‘Simple’ scheme, to assess the impact of weather uncertainty on flood predictability. The previous procedure is followed, but we deliberately assume that the forecaster knew the future as well as the past weather; thus, archived daily temperature data are used to derive Q_i throughout each lake-filling period (Fig. 2b). In this case, flood-date hindcasts calculated on different days D that are bracketed by the same pair of observed floods will be identical.
3. Study Site and Data Sources
3.1. The Inylchek glaciers and Merzbacher Lake
South Inylchek Glacier and North Inylchek Glacier together form the largest glacier system in the Tien Shan, central Asia (Fig. 1a). The southern glacier stretches for a distance of 50 km from its accumulation area by the ~7000 m high peaks of Khan Tengri and Pobeda to its debris-covered terminus at ~2900 m a.s.l. Although located in Kyrgyzstan, its runoff flows into China to feed that country’s fifth-largest river, the Tarim, which provides a vital water supply to oases around the Taklamakan Desert. Glacial meltwater contributes at least 35% of the Tarim’s total runoff (Aizen and Aizen, 1998), and this proportion is predicted to rise over the next few decades (Aizen and others, 2007).
Merzbacher Lake forms behind the ice dam made by South Inylchek Glacier across the valley occupied by North Inylchek Glacier (Fig. 1a). The lake typically fills to about 80–100 m depth before draining subglacially, producing jökulhlaups with durations of about a week and peak discharges of up to 1500 m³ s⁻¹ (Liu, 1992; Mavlyudov, 1997; Ng and Liu, 2009). Besides being a hazard, these floods represent a waste of valuable water resources (Shen and others, 2007). The Inylchek river is also a candidate for hydroelectric projects (Ng and others, 2007; Mamatkanov and Deng, 2011). For these reasons, reliable forecasting of the floods is needed.
3.2. Flood-date record
Merzbacher Lake is chosen for study because of its long and comprehensive jökulhlaup record and because its outbursts recur on a roughly regular basis (once a year on average, with two floods or none in some years; Ng and Liu, 2009), so that attempts to hindcast their dates may have some chance of success. Table 1 lists the lake’s 54 flood dates from 1956 to 2008. All but the last three dates are taken from Ng and Liu (2009), who compiled them from hydrological measurements at Xiehela gauging station near Aksu, China (Fig. 1a), and from earlier publications. Those authors also derived the volumes of 19 floods by subtracting an estimated base-flow component from the area of the flood hydrographs (Ng and others, 2007; Ng and Liu, 2009). Of these flood volumes, 13 (bold in Table 1) are considered more reliable than the remaining 6 (Ng and Liu, 2009) and are used to calibrate our melt sub-model below. The last three dates in Table 1 come from Liu Shiyin (personal communication, 2009).
3.3. Lake water supply sub-model and temperature data
Following Ng and Liu (2009), we calculate the rate of meltwater input to the lake, Q_i, in Eqn (1a) by using the temperature-index parameterization
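(Eqn (2), written here in the form implied by the parameter definitions that follow)

Q_i = k (T − T_0)_{0+} + c,          (2)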
where T (°C) is air temperature near the lake, k (m³ d⁻¹ °C⁻¹) quantifies the melt sensitivity to temperature, T_0 (°C) is the temperature threshold above which melting occurs, and c (m³ d⁻¹) denotes effective water input to the lake due to calving from the ice dam. k, T_0 and c are assumed constant. The subscript 0+ means that T − T_0 is set to zero whenever this difference is negative.
Ng and Liu (2009) prescribed data for T from the daily surface temperature provided by the US National Centers for Environmental Prediction (NCEP)/US National Center for Atmospheric Research (NCAR) reanalysis project (Kalnay and others, 1996; Kistler and others, 2001). Here we also employ the reanalysed daily surface temperature provided by the European Centre for Medium-Range Weather Forecasts, called ERA-40 (Uppala and others, 2005), in order to study the impact of different weather forcings on our predictions. We call these sources ‘NCEP’ and ‘ERA’ and use T_NCEP and T_ERA to denote the respective temperature data after these have been interpolated to the coordinates of Merzbacher Lake. Both projects assimilate weather data from multiple sources (e.g. satellites, radiosondes, aircraft, ships, ocean buoys) into a global climate model to reconstruct the state of the atmosphere, but they have different temporal ranges. T_NCEP begins before 1956, continues to the present and covers our entire study period. T_ERA is available from 1 September 1957 to 31 August 2002, so it can be used in our prediction of floods 3–48 only. Like Ng and Liu (2009), we do not use the temperature data from Tien Shan weather station (41.92° N, 78.23° E), Kyrgyzstan, because gaps in this dataset make its application difficult.
Ng and Liu (2009) reported values of c estimated by other authors. They derived the other two sub-model parameters, k and T_0, by a nonlinear regression that fits Eqn (2) to the 13 reliable flood volumes in Table 1. The idea is that the daily melt volume predicted by Eqn (2), when summed over the known filling period in the run-up to each of these 13 jökulhlaups, should match each observed flood volume, because Merzbacher Lake is seen to drain completely in the floods. We find k and T_0 here by the same method, assuming Ng and Liu’s ‘typical’ value of c, (0.33 ± 0.06) × 10⁵ m³ d⁻¹. Table 2 lists the parameter values corresponding to the T_NCEP and T_ERA temperature forcings.
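To make the fitting constraint explicit (using notation introduced here for illustration), if V_j^obs denotes the j-th reliable flood volume and D_j the set of days in the filling period that preceded it, the regression is taken to choose k and T_0 to minimize

Σ_j [ Σ_{t ∈ D_j} ( k (T(t) − T_0)_{0+} + c ) − V_j^obs ]²,

where j runs over the 13 reliable floods; the inner sum is the model flood volume implied by Eqn (2).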
3.4. Probability distributions of flood volume and timing
Armed with Eqns (1a) and (2), the flood dates in Table 1 can be used to derive an empirical probability distribution of historical flood volumes, which in turn helps us gauge the size and timing of future floods. We do this here in three steps. First we reconstruct the volumes of the floods in Table 1 by integrating Eqn (1a) in the period between each pair of successive flood dates, filling the lake from empty. Two separate reconstructions are made with T_NCEP and T_ERA as forcings, with Eqn (2) taking the corresponding parameters in Table 2. With the NCEP and ERA forcings, this procedure reconstructs 53 and 45 flood volumes, respectively (Table 1). Figure 3 compares these two sets of results with the 19 observed volumes in Table 1 and with each other. The good agreement shown by these plots supports our use of both temperature data sources for flood-date prediction.
Next we form a cumulative probability distribution to represent the likelihood that a flood does not exceed a certain size, a volume V_F. This is done by putting the reconstructed flood volumes in Table 1 in ascending order, counting the number of floods with volumes ≤ V_F and turning this number into a fraction of the total, to represent probability. Figure 4a plots the resulting probability distribution against V_F. It shows that historically the mean and the median flood volumes have been similar, ~1.64 × 10⁸ m³. No floods had V_F exceeding 3.2 × 10⁸ m³, and 83% of them had 1.1 × 10⁸ m³ < V_F < 2.2 × 10⁸ m³ (the interval between guides A and C in Fig. 4a). This volumetric range, where the cumulative probability rises steeply, highlights the size of most floods.
Finally, we use the distribution in Figure 4a to characterize flood timing. We assume that the probability of a flood having volume ≤ V_F is also the probability that a lake filled to this volume has already burst out. Hence, by calculating how long the lake takes to fill to this volume (from empty), we can determine the probability of a flood having occurred by this time after the date of the last flood. By varying V_F in this calculation, we can also determine how this probability changes with time. Figure 4b shows such an outburst probability curve calculated with flood 3 (t = 1957.7 years) as the starting time reference. At any point on the curve with probability p, the time value has been found by integrating Eqn (1a) from t = 1957.7 years until the lake reaches a volume that equals the V_F value corresponding to the same probability p in Figure 4a.
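Expressed compactly (with notation introduced here for illustration), if F(V_F) denotes the cumulative probability in Figure 4a and V(t) the lake volume obtained by integrating Eqn (1a) from an empty lake at the last flood date, the outburst probability at time t is taken as p(t) = F(V(t)).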
Since the outburst probability curve combines information from the flood-volume distribution, the last flood date and the weather-dependent lake supply Q_i (which is seasonal, with peaks in summer and troughs in winter), probability curves constructed for other times using other floods in Table 1 (or future floods) as the starting reference will be different. However, Figure 4b illustrates key features of such curves. Its shape derives from the curve in Figure 4a, with the steep rise in outburst probability between A′ and B′ corresponding to the rise in cumulative probability between A and B in Figure 4a. This rise occurs in 80 days (between 1958.53 and 1958.75 years). This is a manifestation of the ‘focusing effect’ discussed by Ng and Liu (2009, p. 611 and fig. 11), who showed that abundant meltwater supply during summer fills Merzbacher Lake rapidly and concentrates most of its outbursts in this period. In contrast, low supply towards the end of the melt season and through winter causes a slow increase in outburst probability with time. Figure 4b shows that the interval B′–C′ (corresponding to the interval B–C in Fig. 4a) is 200 days long but experiences only a 10% increase in outburst probability.
While useful, these results offer predictions whose validity relies wholly on past empirical data and that are probabilistic, unable to tell us each flood’s size and timing exactly. For instance, although Figure 4b indicates a high probability of flood 4 occurring by late August (t ≈ 1958.65, p = 0.5) and a very high probability of this before early November (guide B′, p = 0.81), the actual flood came later, on 24 November. We turn next to specific flood-date prediction based on the model.
4. Flood-Date Prediction Models: Threshold Assumptions
In this section we detail the four threshold assumptions to be used with Eqns (1) and (2) to hindcast the Merzbacher flood dates. Three of them concern the lake-depth threshold h_c; the fourth invokes a mobile subglacial seal, as mentioned before. We label each assumption with an abbreviation (Table 3).
4.1. Constant Date (CD) model, and measures of prediction error
We first consider a naïve prediction model, called the Constant Date (CD) model, which, unlike the Ng–Liu model, neither implements an outburst threshold nor simulates the filling of the lake. The CD model postulates that the flood occurs on the same date each year, the motivation being that most Merzbacher flood dates cluster in the months July–September (see histogram in fig. 2 of Ng and Liu, 2009). Although this model neglects environmental influence on the system, it sets a benchmark for assessing other prediction models: those failing to match its performance are practically useless. In introducing it here, we also consider how to quantify prediction success.
The CD model assumes a fixed calendar date Δ (expressed as day of year) for all floods, where Δ is chosen by the forecaster to optimize prediction success. Using two trial values, Δ = 268 (25 September in non-leap years) and Δ = 216 (4 August in non-leap years), we have used this model to hindcast the next flood date on every day of the study period. Each of these two sets of runs yields 19 006 hindcasts. Figure 5a shows their prediction errors, E, found by differencing each hindcast and the known date of the next flood (we define E to be positive if the hindcast is early). Although the errors span a large range, most of them cluster within 100 days. The outlier errors are associated with floods that occurred after unusually long or short lake-filling periods.
A straightforward quantifier of the errors in Figure 5a is their RMS, and we can find the best Δ that minimizes the RMS. But the RMS is not the only measure of prediction performance, nor necessarily the best measure, as it emphasizes outliers. Other measures may be more desirable from a forecaster’s viewpoint. For example, a good forecasting model may not be the one with the lowest errors overall but the one that gives the greatest number of ‘successful’ forecasts, i.e. forecasts near the actual flood dates. The histograms in Figure 5b show the percentage frequency of the errors in the two trial runs. Their central bars, each 40 days wide, show that if we define successful hindcasts as those having errors within ±20 days (|E| ≤ 20), then the run assuming Δ = 216 yields nearly five times as many successful hindcasts (54.3%) as the run assuming Δ = 268 (11.2%). This percentage offers a measure of prediction success, and we call it P_20 (the subscript indicates the tolerance). In contrast to the RMS, P_20 emphasizes hindcasts that are relatively accurate over those that are out by a long way. It is a useful measure for the CD hindcasts because they miss some floods by over a year.
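Written out explicitly (with notation introduced here), if E_1, …, E_M are the errors from the M = 19 006 hindcasts in a set of runs, then

RMS = [ (1/M) Σ_i E_i² ]^{1/2}   and   P_20 = (100/M) × (number of hindcasts with |E_i| ≤ 20 days).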
We proceed to optimize the CD model by making prediction runs with different Δ values and calculating the corresponding RMS and P_20. The results (Fig. 6) show that the RMS is minimized at 121.9 days when Δ = 268 (then P_20 = 11.2%), whereas P_20 is maximized at 54.3% when Δ = 216 (then RMS = 133 days). These Δ choices are optimal in their own right; each choice optimizes one measure at the expense of the other, since RMS and P_20 quantify different kinds of prediction success. In Section 5, these measures and the approach described here are used to optimize all the prediction models before we evaluate their performance.
The tolerance of 20 days in the P_20 measure is acceptable because jökulhlaup forecasts with such accuracy could be useful, but the choice of tolerance depends on what one perceives as accurate. A general measure P_n considers a tolerance of ±n days, and this motivates another way of quantifying prediction errors. In Figure 5c we track the percentage frequency of successful hindcasts for different tolerances; their window of admittance widens as n increases. The two curves, derived from the errors of the trial runs in Figure 5a, rise monotonically. The perfect result would be zero errors for all hindcasts, plotting at 100% across the graph; in reality, we seek a curve lying as close to the top-left corner of the graph as possible. Figure 5c shows that the run that optimizes P_20 (Δ = 216; black curve) excels over the run that optimizes RMS (Δ = 268; grey curve) for tolerances of tens of days. For n > 72 days, the latter run performs better on two occasions, but such tolerances are too large to be useful.
Other measures of prediction performance could be used. For instance, in order to facilitate flood mitigation and emergency evacuation, one might favour forecasts that are early as opposed to late and employ a one-sided definition of P_n based on 0 ≤ E ≤ n. Generally, a suite of measures offers a more complete characterization of the prediction errors. The forecaster must decide which measures are more important in a given scenario and weigh their outcomes accordingly for decision-making.
4.2. Depth-threshold assumptions
In the real system, weather influences how fast the lake fills, and glacio-hydrological factors govern the outburst condition of a flood, so we expect prediction schemes incorporating these environmental factors to excel over the CD model. The Ng–Liu model is the simplest of such schemes: Eqn (1) has an adjustable outburst threshold h_c, and Eqn (2) captures the weather dependence of lake filling. We posit three assumptions for the behaviour of h_c to be used with this model.
4.2.1. Constant Threshold (CT)
The simplest assumption, called Constant Threshold (CT), is that h_c is constant. Figure 7 shows an example where it is used with the Ng–Liu model to hindcast flood 3 in the ‘Real’ scheme (Section 2). In this run, which assumes h_c = 86 m and that the forecaster made the prediction on 2 July 1957 (302 days after flood 2), the flood hindcast is 19 August 1957, 19 days before the actual flood 3. We have made many sets of prediction runs like this to hindcast the Merzbacher flood dates, by adopting different thresholds in the range 70 < h_c < 100 m and T_NCEP or T_ERA as weather forcing. Each set of runs used a fixed combination of these inputs to produce next-flood hindcasts on every day of the study period, and we compiled the prediction errors into the RMS and P_20 as described in Section 4.1. Figure 8 shows how these error measures vary with h_c. For all values of h_c in the imposed range, prediction runs using the NCEP forcing yield lower RMS than runs using the ERA forcing, but the optimal values of h_c are similar: 87 m with NCEP and 86.5 m with ERA (Fig. 8a). P_20 is maximized when h_c = 86.5 m with the NCEP forcing and when h_c = 85.5 m with the ERA forcing (Fig. 8b). The value of h_c that optimizes prediction performance thus falls within a narrow range (85.5–87 m). However, with the CT assumption, the lowest achievable RMS is 129.1 days and the highest achievable P_20 is 54.1%. This performance is worse than that of the CD model (121.9 days, 54.3%). Thus, while using a threshold allows changing weather to be accounted for by the model, a fixed threshold does not improve the flood-date predictions in the ‘Real’ scheme.
4.2.2. Variable thresholds (VTh and VTT)
Historical changes in h_c, shown by differences between the reconstructed flood volumes in Section 3.4, suggest that one might improve the forecasts by using a threshold that responds to environmental conditions. As discussed in Section 2, this idea originates from Ng and Liu (2009), who reconstructed the lake-level history at Merzbacher Lake to extract the h_c of each flood and the rate of lake-level rise (dh/dt) before it initiated. They found a negative correlation between these two quantities (see their fig. 15b) and interpreted it as being a result of pressure coupling between the lake and the subglacial hydraulics beneath the ice dam.
Motivated by this empirical finding, we explore two formulations of a variable threshold. The first formulation, which we label VTh, is the linear model proposed by Ng and Liu (2009):
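(Eqn (3), written here in the form implied by the description below)

h_c = h_0β + β dh/dt,          (3)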
where h_0β and β are constants. When using this with Eqn (1) to predict flood dates, we calculate h_c using the exponential moving average of dh/dt (with smoothing constant S = 0.25; Brown, 2004). Ng and Liu (2009) determined h_0β = 102 m and β = −58 d⁻¹ for Merzbacher Lake from their correlation.
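For reference, the exponential moving average is assumed here to take its standard recursive form (Brown, 2004): if x_t is the daily value of the quantity being smoothed (dh/dt here, or (T − T_0)_{0+} in Eqn (4) below) and X_t its smoothed value, then X_t = S x_t + (1 − S) X_{t−1} with S = 0.25, so that the most recent days carry the greatest weight.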
The second variable-threshold formulation, which we label VTT, assumes control on h_c by surface temperature rather than by the lake-level rise rate. The idea is that meltwater produced at the surface and penetrating to the glacier bed influences the subglacial outburst hydraulics. We suppose
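(Eqn (4), written here in the form implied by the description below)

h_c = h_0λ + λ (T − T_0)_{0+},          (4)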
where h_0λ and λ are constants. h_c is calculated using the exponential moving average of (T − T_0)_{0+} (again with S = 0.25) and the same temperature data, weather-forcing scheme and parameters as used in the lake water supply model in Eqn (2). Since dh/dt depends on T via melt generation and lake filling, Eqn (4) is a more general threshold description than Eqn (3), in the sense that it encapsulates meltwater control on h_c via both the lake pressure and supraglacial water injection to the glacier bed.
When each formulation described here (VTh or VTT) is used with Eqns (1a) and (2) to predict flood dates, we optimize the model in the same way as before, by minimizing RMS and maximizing P_20, but do so by searching over the corresponding two-parameter space (of h_0β and β, or of h_0λ and λ).
4.3. Threshold based on subglacial water-divide migration (DM)
Our final threshold assumption treats flood-initiation physics in more detail than any of the previous assumptions. In a novel study, Fowler (1999) extended Nye’s (1976) equations of jökulhlaup mechanics to model the establishment of a subglacial water divide or ‘seal’ under the ice dam in the period between floods. He envisaged that as the lake fills, the seal migrates towards the lake in response to changing subglacial water pressure and initiates the flood when it reaches the lake. The rate of seal migration depends on the rate of lake-level rise and the conditions of channelized subglacial drainage (e.g. hydraulic gradients associated with glacier geometry). Faster filling of the lake (due to greater meltwater input Q_i) results in a higher lake ‘highstand’ by the time the seal arrives at the lake to start a flood. Although this theory has not been validated by subglacial observations, it highlights mechanistic links between environmental factors and flood initiation.
Fowler’s (1999) equations are not used here because the basal topography of South Inylchek Glacier and the attendant hydraulic potential gradients are poorly known. However, we mimic his seal dynamics with a lower-order model that parameterizes seal migration as a function of the lake water pressure:
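(Eqn (5), reconstructed here in a relaxation form consistent with the description below)

dY/dt = α [ (h_cα − h)/h_0 − Y ].          (5)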
In this ‘Divide Migration’ (DM) model, h is the lake depth as before, Y measures the dimensionless distance of the seal from the lake, h_0 is the ice-dam flotation depth (estimated at 108 m by Ng and others, 2007), and h_cα and α are constants. The outburst condition h = h_c in the Ng–Liu model is replaced by Y = 0, and we modify Eqn (1) to
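(Eqns (6a) and (6b), reconstructed here from the surrounding description; the numbering is assumed)

dV/dt = Q_i,          (6a)
h = h(V).             (6b)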
During filling, this model describes coupled evolution of the lake volume and seal position, with the seal migrating towards the target position (h_cα − h)/h_0, which itself moves. The constant α controls the migration rate, and h_cα is the theoretical outburst depth of the lake if there were no delay in the seal’s arrival at the lake. The water supply Q_i is calculated by Eqn (2) as before.
Equations (5) and (6a) reproduce the behaviour of Fowler’s (1999) model. In each prediction run, we integrate them by the finite-difference method, starting with V = Y = 0 on the day after the last flood. Low lake level initially locates the target down-glacier from the lake, so the seal migrates into Y > 0. But filling of the lake relocates the target up-glacier from the dam eventually (when (h_cα − h)/h_0 < 0), attracting the seal back towards the lake after some delay. As with the models in Section 4.2, we optimize the success of this DM model by tuning α and h_cα over their parameter space.
5. Flood-Date Prediction Experiments: Results and Discussion
Table 3 summarizes our five prediction models. (For convenience we use the word ‘model’ to refer to the three versions of the Ng–Liu model with different thresholds, as well as to the CD and DM models.) In this section we analyse their performance in prediction runs made to hindcast the Merzbacher flood dates. In total, 18 sets of runs were made. The first 9 sets use the ‘Real’ scheme for weather forcing and consist of 1 set of runs for the CD model and 2 sets of runs each for the CT, VTh, VTT and DM models (one set with NCEP forcing, the other with ERA forcing). In each set of runs, model parameters were tuned to optimize prediction success separately in terms of RMS and P_20. Table 4 shows the results, listing in its columns the optimal model parameters and the RMS and P_20 achieved when each of these is optimized. The other 9 sets of runs are structured identically but use the ‘Simple’ scheme for weather forcing. Table 5 shows the corresponding results.
Several matters guide the following analysis of these results: (1) model performance and the floods’ predictability, (2) influences of model complexity and weather forcing on prediction success and (3) the impact of weather uncertainty on the predictions. We cover these matters in Sections 5.1 and 5.2. In Section 5.3 we ask how the reliability of predictions varies with time, and this leads us to suggest an ensemble prediction strategy that uses multiple prediction models to derive forecasts.
5.1. Prediction performance of the models
Since RMS and P_20 are fundamentally different quantifiers of the prediction errors, a model optimized for RMS must yield lower RMS than the same model optimized for P_20, and a model optimized for P_20 must yield higher P_20 than the same model optimized for RMS. Table 4 confirms this expectation (compare the RMSs in column 4 with those in column 7, and the P_20’s in column 8 with those in column 5). Accordingly we focus our comparisons here on columns 4 and 8. These show RMS around 120 days and P_20 of 50–60%. The RMS values are not useful quantifiers of our models’ performances, as they are disappointingly large and dominated by outlier errors (Section 4.1). Thus, although Table 4 provides findings for both measures, forecasters of the Merzbacher floods might choose P_20 as the main criterion for identifying the best model.
Considering the results in both columns 4 and 8 of Table 4, the overall pattern is that the Variable Threshold models (VTh and VTT) perform best, followed by the DM model, then the CT model. This pattern is robust for each temperature forcing. Also, the CD model performs better than the CT model (as noted before), implying that a fixed threshold does not enhance prediction accuracy. Compared with the other models, the CD model performs nearly as well as the DM model but consistently worse than both Variable Threshold models. The latter models yield P_20 = 57.3–57.5%, meaning that we can hindcast the floods accurately to within 20 days ~57.4% of the time. We have made additional comparisons with the generalized measure P_n (defined in Section 4.1), which show that the Variable Threshold models surpass all other models in the practical tolerance range n ≤ 20. These results support Ng and Liu’s (2009) finding that dh/dt plays a role in flood-initiation physics. However, in our VTh model the optimal β-values are positive, not negative as found by those authors; our threshold h_c needs to increase with dh/dt for successful prediction. One possible explanation of this sign difference is that the correlation analysis of Ng and Liu (see the description leading up to Eqn (3) in Section 4.2) and our fitting of flood hindcasts to observed flood dates do not amount to the same optimization constraint for β.
Reassuringly, Table 4 shows that models incorporating environmental conditions in their determination of the outburst threshold (e.g. the rates of meltwater input to the lake and of lake-level rise) do better than the CD model. But does prediction success always improve with model complexity? A model with richer physics and more parameters ought to yield a closer fit between hindcasts and observations, unless it is faulty. Hence we expect model performance to improve in the order: CD, CT, VTh/VTT, DM. Our results upset this trend in two ways. Firstly, the DM model performs worse than the Variable Threshold models. This suggests either that Fowler’s (1999) seal migration does not capture what happens at Merzbacher Lake, or that it does but the DM model incorrectly mimics his mechanism. However, Table 4 shows that the DM model improves upon the CT model in all cases. We can explain this because the two models are related, the CT model being the limit of the DM model as α → ∞: then the seal responds infinitely fast to lake-level changes to track the target, and the outburst condition Y = 0 in Eqn (5) becomes equivalent to h = h_cα (constant). Seen in this light, the DM model performs better because it has an extra degree of freedom over the CT model.
Secondly, the benchmark CD model outperforms our CT models, despite the latter models incorporating the physical concept of an outburst threshold and the effect of variable weather conditions on the lake-filling rate. This is significant because, given our lack of understanding of the details of flood-initiation physics, a likely first step for flood forecasters aiming to include simple physics in their models and improve their flood predictions is to assume a constant outburst threshold and simulate the filling of the lake physically (with an equation like our Eqn (2)). Our results show that unless the threshold is allowed to vary with environmental conditions, this approach does not yield better flood hindcasts than simply assuming floods occur on the same day every year.
5.2. Influence of weather forcing on prediction success
How does weather uncertainty influence our predictions? Such uncertainty features in the model runs implementing the ‘Real’ scheme in two ways: (1) via the difference between the NCEP and ERA temperature data, which themselves are estimates of a weather variable, and (2) via the fact that the forecaster does not know the future weather. We assess the influence from both angles.
Considering the data sources first, Table 4 shows that the NCEP forcing yields consistently lower optimal RMS values than the ERA forcing, while the optimal P_20 values are almost independent of the choice of forcing. Thus, NCEP temperature data seem to be only marginally better as a forcing for predicting the Merzbacher floods. As noted before, the choice of temperature forcing does not change the overall ordering of model performance (CT, DM, VTh/VTT) in Table 4.
And what of the need to forecast future weather in the ‘Real’ scheme? We assess the impact of this on prediction performance by comparing Table 4 with Table 5, focusing again on the optimal RMS and P_20 results in columns 4 and 8. The ‘Simple’ scheme ought to perform better than the ‘Real’ scheme because it uses reanalysis data rather than forecasts of future daily temperature to derive the lake-water supply, and this eliminates a source of uncertainty. Tables 4 and 5 show that only the Variable Threshold models that use the ERA forcing fulfil this expectation. In contrast, the Variable Threshold models that use the NCEP forcing perform worse in the ‘Simple’ scheme than in the ‘Real’ scheme. The same is true of the CT and DM models, but the poorer performance of these models suggests that they are limited more by other deficiencies (notably in their threshold assumptions) than by weather uncertainty. It is less clear why the Variable Threshold models with NCEP forcing do not perform better in the ‘Simple’ scheme. A possible explanation is that short-term weather fluctuations are more accurately represented by T_ERA than by T_NCEP, even though these temperature forcings produce similar meltwater volumes on seasonal and annual timescales (e.g. Fig. 3c) and have similar multi-year means, so that their flood-prediction performances in the ‘Real’ scheme are similar (Table 4).
5.3. The forecaster’s time frame and ensemble prediction
Having used the RMS and P_20 of the hindcast errors to evaluate the models, we turn to a different aspect of the forecasting problem. As each prediction is made, the forecaster may wish to know whether its reliability depends on how far the next flood lies in the future. Given the impact of weather uncertainty examined above, a reasonable hypothesis is that predictions made close to the flood are more successful than those made far in advance. Here we study this time dependence using hindcast data from our optimized model runs for the two best models in Table 4: VTh with T_ERA temperature forcing, and VTT with T_NCEP temperature forcing.
On the day when each prediction is made, D, the forecaster cannot in fact determine its time difference from the next flood because the flood has not yet occurred. However, the forecaster knows the time difference between D and the predicted flood date. We denote by N this ‘predicted time before the next flood’. N is easy to calculate for our hindcast data.
Figure 9 shows how the success of hindcasts (measured with P_20) varies with N in the two Variable Threshold models, with the data for N arranged in 10-day bins to suppress noise in the distributions. Results of the CD model are included for comparison. In practice, after predicting a flood date with a given model, the forecaster can look up the corresponding distribution on this plot to learn the probability that the forecast is accurate to within 20 days. We see that both Variable Threshold models excel over the CD benchmark for most of the range in N, and the VTh model with the ERA temperature forcing performs best overall. The distributions of both Variable Threshold models seem to be consistent with our hypothesis at the beginning of this subsection, as they show a negative (albeit weak) trend. In the ‘Real’ scheme, prediction runs that forecast an imminent flood (with a small value of N) would have used more archived temperature data and fewer forecasts of future weather for deriving the lake-water supply, and hence be more likely to be accurate. The CD model shows no such trend as it does not incorporate weather forcing.
The empirical probabilities in Figure 9 motivate an ensemble prediction strategy, as follows. The forecaster predicts the next flood date using each of the models (VTh, VTT and CD), then calculates the values of N for these predictions, and reads the corresponding probabilities of prediction success from Figure 9. In the simplest strategy, the forecast having the highest P_20 is taken as the best guess of the flood date. Consider using this ensemble method on 1 January 1959 (this is 38 days after the date of flood 4, 24 November 1958) to forecast flood 5 (19 September 1959). The CD model predicts the next flood on 4 August 1959, whereas the VTh and VTT models predict it on 8 September 1959 and 9 September 1959, respectively. The circles in Figure 9 mark the corresponding values of N (215 days, 250 days, 251 days) and P_20 (60.4%, 64.3%, 63.5%). In this example, the forecaster learns that the best forecast is the one by the VTh model, 8 September 1959. The P_20 of this forecast (64.3%) is much higher than the mean P_20 of this model in Table 4 (57.5%). In fact, this forecast is successful (accurate to within 20 days) and has a true error of E = 11 days, while the true errors of the other two forecasts by the VTT and CD models are 10 days and 46 days, respectively. This ensemble strategy allows the forecaster to choose between models based on empirical experience.
6. Conclusions and Outlook
Low-order models that implement an outburst threshold based on the lake-water depth can give useful predictions of the dates of jökulhlaups from Merzbacher Lake. Our best models (VTh and VTT) assume a variable threshold depth governed by weather. They hindcast observed flood dates successfully to within ±20 days 57.4% of the time, excelling over a benchmark model that assumes a constant flood date each year. For the Merzbacher Lake–Inylchek Glacier system, these models can be readily incorporated into practical flood-forecasting schemes, and may aid decisions regarding the development of hydropower down-valley and the management of the corresponding reservoirs (Mamatkanov and Deng, 2011). The present work complements the theory of Ng and Liu (2009) that addressed mechanisms underlying the long-term timing pattern of multiple floods.
Although we can understand why predictions made far in advance of flood events are generally less reliable (Section 5.3), it is less clear why weather uncertainty impacts differently on the success of model predictions made with NCEP and ERA temperature forcings (Section 5.2). A limitation of our current study is its use of these reanalysis data, which are themselves uncertain estimates of the past weather. By comparing them to meteorological measurements at the lake, future work will evaluate how severely such uncertainty affects jökulhlaup predictability. For example, reanalysis data may prove completely incapable of reproducing short-term (e.g. daily) weather variations, limiting the prediction ability of models that use such data.
An ability to forecast flood dates to within ±20 days a little more than half of the time is not overwhelmingly successful, and reflects how poorly we understand the physics of flood initiation. However, the relative success of our prediction models may reveal aspects of these physics. For example, assuming a constant outburst threshold – one of the simplest possible physically based assumptions – completely fails to yield useful flood predictions. In contrast, our most successful models allow the threshold to vary with weather conditions. This supports Ng and Liu’s (2009) inference that, at Merzbacher Lake, the rate of change of lake depth (dh/dt) influences the outburst threshold through subglacial seal dynamics. Also revealing is the failure of our DM model, designed to mimic the behaviour of Fowler’s (1999) model of seal dynamics. How closely our toy divide model mimics Fowler’s full thermomechanical model is presently unclear; however, the fact that models with thresholds linearly dependent on the lake-filling rate (VTh) and air temperature (VTT) outperform the divide model suggests that dh/dt may not influence the threshold in the way predicted by Fowler’s (1999) theory. When the bed geometry of Inylchek Glacier along the flood path has been accurately determined, it will be worth revisiting the full model of Fowler (1999) and quantifying its prediction ability.
Future work could investigate several model extensions. First, observations show that after a lake empties in a jökulhlaup, water entering the lake basin may flow directly into the glacier dam as an open subglacial stream for some time before the lake reforms (e.g. Bartholomaus and others, 2011). The duration of this flow presumably depends on the lake-water input rate and the characteristics of the preceding flood; a large flood might cause a significant delay before the lake begins to refill. Such a scenario could be incorporated into our models by considering the open-channel hydraulics. Second, the rate of calving from the ice dam, c, which affects the lake-water balance (see Eqn (2)), may vary in time and depend on factors such as the lake-water depth. One could account for this variability in our models using ‘calving laws’ (e.g. Benn and others, 2007). Third, the ensemble prediction strategy in Section 5.3 could be developed to exploit weighted averages of the predictions from different models. A further valuable extension will be to combine the prediction of flood timing as examined here with thermomechanical modelling of flood-discharge evolution, to forecast the duration and magnitude (peak size and volume) of each flood.
Acknowledgements
We thank Liu Shiyin for providing the last three flood dates in Table 1. J.K. acknowledges the support of a University of Sheffield PhD Studentship. We thank Joseph Walder and an anonymous reviewer for positive and constructive reviews that have helped us clarify and strengthen the paper.