Preface
Numerical models of glaciers and ice sheets include parameters such as viscosity (often expressed using other parameters), density, thermal conductivity, heat capacity, geothermal heat flux and basal traction coefficients. (Even basal topography can be considered a model ‘parameter’.) Many of these parameters are difficult to measure directly. We use the phrase ‘data assimilation’ to refer to a method where more easily measured data, such as surface velocities, are used to estimate the values of such parameters. The idea behind data assimilation is to find the values of the parameter of interest (e.g. a frictional coefficient along the bed) that result in the best match between values calculated by the model (surface velocities, perhaps) and measured data. Thus, finding appropriate values for model parameters can be accomplished by solving an optimization problem, namely that of minimizing a measure of the difference between calculated and measured values.
In this paper ‘adjoint methods’ are described in the context of optimization problems. Application to an idealized data-assimilation problem from glaciology is described in the final section. Although applications using real glaciological data can be found in various published research papers (e.g. Reference Brinkerhoff, Meierbachtol, Johnson and HarperBrinkerhoff and others, 2011) no attempt is made to review them here.
Introduction
Suppose we have a problem involving a collection of parameters p whose solution is u = F (p). We want to find the values of the parameters p that minimize (or maximize) a given (scalar) function g( u ). Since u is a function of the parameters p, we can think of g as a function of p. (Formally we could introduce a new function, but we believe the presentation is clearer, albeit less rigorous, without it.) Most efficient algorithms used to optimize g require knowledge of ∂g /∂pk for each of the parameters pk (components of p). Adjoint methods can be used to find these derivatives. We write this collection of partial derivatives as a column matrix ∂g /∂p = [∂g /∂p1 ∂g /∂p2 ∂g /∂pm ]T
The Jacobian matrix of the function F that maps the parameters p to the solution u is defined as
This Jacobian matrix can be used to approximate changes in the solution u resulting from small changes in the parameters p. For an individual component of the solution u,
The approximate changes for all components of u can be written compactly (using matrix multiplication) as
To find the desired derivatives, ∂g /∂ p, consider
Using the chain rule,
Note that the derivatives of the components of u with respect to pk are the elements of the k th column of the Jacobian matrix, ∂ u /∂ p. We can write the desired derivatives compactly using the transpose (also called the adjoint Footnote * ) of the Jacobian matrix
Use of the adjoint of a Jacobian matrix seems to be where ‘adjoint methods’ get their name.
The method can be illustrated by a simple example where the function F mapping the parameters p to the solution u is given explicitly.
Example 1:
Suppose and g(u) = u 1 + u 2 2. We want to find .
To calculate ∂g/∂ p using the adjoint method we begin by finding the Jacobian matrix
Then
To check this solution we fine ∂g/∂p directly using g(u) = u1 + u2 2 =)p1 2 + p2) + (p1p2)2, from which we find ∂g /∂p1 = 2p1 + 2p1p2 2 and ∂g /∂p2 = 1 + 2p1 2p2.
A Typical Situation
Often the problem to be solved (whose solution is u = F(p)) is expressed as a system of n equations, f(u, p) = 0. For this problem we still have a function g (u), for which we want to find optimal values of the parameters p and thus need ∂g/∂ p = [∂g/∂p 1 ∂g/∂p 2 ∂g/∂pm ]T. In the preceding section the chain rule was used to show that ∂g/∂ p = ∂ u/∂p T ∂g/∂ u. To simplify the appearance of some of the equations that follow, we begin by taking the transpose of both sides of this equation and use the matrix identity (AB) T = BT AT to obtain
As before, the elements of ∂g/∂ u are obtained by differentiating g (u), but now the elements of ∂ u /∂ p must be determined from f(u, p) = 0. If we take the total derivative of both sides of the equation fi (u, p) = 0 with respect to pk we obtain
Doing this for every combination of the n elements of u and the m elements of p results in nm equations that can be represented compactly as ∂ f/ ∂ u ∂ u /∂ p + ∂ f/∂p = 0:
This equation can be solved for the ∂ui/∂pk factors
needed to obtain our goal
Here ∂g /∂u T is a 1 × n (row) matrix, ∂f/∂u is n × n and ∂f /∂p is n × m. To avoid multiplying the two large matrices we perform the multiplication in the order
With the product λT = ∂g /∂uT ∂f/∂u−1 found by solving the adjoint problem
Note that ∂ f /∂ u is a Jacobian matrix; once again obtaining the desired partial derivatives (∂g/∂ p) uses the transpose (i.e. adjoint) of a Jacobian matrix. Also note that this Jacobian matrix is the one used in Newton’s method to (approximately) solve f(u, p) = 0 (with fixed p) iteratively.
Example 2:
Suppose
f 1( u 1, u 2, p 1, p 2) = u 1 + u 2 + p 1
f 2( u 1, u 2, p 1, p 2) = u 3 1 – u 2 + p 2
g (u 1, u 2) = u 2 1 + u 2 2
The system of equations f 1( u 1, u 2, p 1, p 2) = 0 and f 2(u 1, u 2, p 1, p 2) = 0 has a solution, (u 1, u 2), that depends on the parameters p 1 and p 2. Changing the values of the parameters p 1 and p 2 causes the values of the variables u 1 and u 2 to change and thus the value of g (u 1, u 2) also changes. We want to find ∂g/∂ p T = [∂g/∂p 1 ∂g/∂p 2].
To calculate ∂g/∂ p T using the adjoint method we begin by finding the Jacobian matrix
along with
Next we write the adjoint problem (∂f /∂u)T λ = ∂g /∂u
and solve it to find
The desired derivatives are now found from
To appreciate the meaning of these derivatives we consider a couple of specific value pairs for the parameters p 1 and p 2.
First, if p 1 = 0 and p 2 = 0, the equation f(u, p) = 0 has solution u = 0, so g( u) = 0. Since g = u 2 1 + u 2 2 ≥ 0 (always), g( u) = 0 must be a global minimum so we expect ∂g/∂ p = 0. Indeed, substituting u 1 = 0 and u 2 = 0 into our expression, confirm this,
To gain further appreciation of ∂g/∂ p, we consider a graphical representation of our example problem. If we plot solutions of f 1(u 1, u 2, p 1, p 2) = u 1 + u 2 + p 1 = 0 in the (u 1, u 2) plane we obtain a line with slope –1 and vertical intercept –p 1. If we plot solutions of f 2( u 1, u 2, p 1, p 2) = u 3 1 – u 2 + p 2 = 0 in the (u 1, u 2) plane we obtain a cubic curve that crosses the vertical axis at p 2 with slope 0. The solution to the system of equations f(u, p) = 0 is represented by the point where the line and the cubic curve cross. The function g gives the square of the distance from the origin, so circles centered on the origin represent contours of constant g. Curves for p 1 = –2, p 2 = 0 and g = 2 are shown in Figure 1.
As can be seen from Figure 1, for this value of p the solution of f(u, p) = 0 is u = (1, 1). If we increase p 1, the line with slope 1 will move down in the figure so the solution point will follow the cubic curve, decreasing its distance from the origin, so g decreases. If, however, we increase p 2, the cubic curve will move up so the solution point will follow the line with slope –1 which is tangent to the (dashed) circle representing a contour of constant g. Thus we expect ∂g/∂p 1 < 0 and ∂g=∂p 2 = 0. Substituting the solution u 1 = 1 and u 2 = 1 into our expression gives ∂g/∂p T = [–2 0], confirming this. The reader is encouraged to use the graphical representation to find other parameter pairs where one of the components of ∂g/∂p is zero and use the expression for ∂g/∂p T to confirm their findings.
A Numerical Example
In this section we use finite differences to approximate the boundary-value problem
where p (x) = p 0 + p 1 x + p 2 x 2 (a second-order polynomial). After computing an approximate solution for particular values of the parameters c 2, c 1, c 0, p 0, p 1, p 2, a 0 and a 1, we use the adjoint method to calculate the rate of change of a function of the solution, g (u), as we vary these parameters. Two different functions are considered: (1) g is the value of u at the midpoint of the domain, and (2) g is the average value of u,
To apply the finite-difference method to our differential equation we divide the domain [0, 1] into n subintervals, each with length The endpoints of these subintervals are x 0, x 1, …, xn , where xk = k∆x. We denote approximate values of u at each of these points as uk . The first derivative, u ʹ, at xk is approximated by . The second derivative, u ʹʹ, at xk is approximated by uk ’ = Substituting these approximations into the differential equation gives
which can be rearranged to give
a linear algebraic equation that can be written for each of the interior points k = 1, 2, …, n –1. At the endpoints of the domain we assert u 0 = a 0 and un = a 1. This collection of equations can be written compactly as Au = b, where A is a tridiagonal matrix, and u and b are column matrices. The matrix equation can be solved (e.g. using Gaussian elimination and back substitution) for u, giving an approximation to the solution of the boundary-value problem.
Following the procedure in the preceding paragraph for the (somewhat arbitrary) parameter values c 2 = 1, c 1 = –2, c 0 = 1, p 0 = 1, p 1 = 1, p 2 = –5, a 0 = 0 and a 1 = 0 generates the approximation shown in Figure 2.
We can use the adjoint method to find the rate of change of a function of u with respect to the parameters c 2, c 1, c 0, p 0, p 1, p 2, a 0 and a 1. We consider first the simple function where an even value for n has been chosen.
To utilize the procedure described in the preceding section, the system of equations represented by the matrix equation Au = b is written in the form f(u, p) = 0, by subtracting the right-hand sides from both sides of each equation. Here p = [c 2 c 1 c 0 p 0 p 1 p 2 a 0 a 1]. To find ∂g/∂ p we need the three matrices ∂ f/ @ u, ∂ f/ @ p and ∂g/∂ u. Since ∂f/ ∂ u and ∂ f/ ∂ p require two columns to be displayed clearly, they are shown in Figure 3. For g (u) = un/ 2,
After solving the adjoint problem ∂ f/ ∂ u T λ = ∂g/∂ u for λ we can calculate ∂g/∂ p T = –λT∂ f/∂p. The results for the example problem whose solution is shown in Figure 2 are given in Table 1.
The values of the partial derivatives found using the adjoint method were verified by perturbing the parameters one at a time. For each parameter (c 2, c 1, c 0, p 0, p 1, p 2, a 0 and a 1) an additional finite-difference approximation was calculated with that parameter increased by 0.01, giving new values of u. Calling these new values ũ we can approximate the partial derivative of g with respect to the perturbed parameter by . As seen in Table 1, these approximations are very close to the values found using the adjoint method.
To calculate the partial derivatives of we use (the composite) Simpson’s rule (which requires an even value for n) to approximate this integral,
from which we have
Again, ∂g/∂p T = –λT ∂f/∂p, where λ is the solution of the adjoint problem ∂f/∂u Tλ = ∂g/∂u. Results are given in Table 1.
A Data-Assimilation Example from Glaciology
Mathematical models of glaciers typically include parameters which are difficult to estimate directly from field measurements. For example, some type of frictional coefficient is required at the base of the glacier where the ice rests upon the earth. The basal traction (described mathematically using this frictional coefficient) affects the velocity of the ice throughout the glacier. Thus, it is plausible to use the more easily measured velocity at the upper surface of the glacier to determine the frictional coefficient. In this concluding section we present an example problem, using a simplified geometry to demonstrate the use of the adjoint method to accomplish this.
The example problem is related to a benchmark problem, ‘Experiment D’ of the Ice Sheet Model Intercomparison Project for Higher-Order Models (ISMIP-HOM) (Reference PattynPattyn and others, 2008). In ISMIP-HOM Experiment D, a sheet of ice with uniform thickness of 1000 m rests on a plane inclined at an angle α = 0: 1°. A domain of length 20 km with periodic boundary conditions at each end is considered. A numerical model is used to calculate the velocity of the ice, which varies not only throughout the thickness of the ice but also along the length of the domain, due to variations in basal traction. There is no variation in the horizontal direction transverse to the inclination.
In ISMIP-HOM Experiment D, the basal traction is prescribed and ice velocities (and pressure) are calculated. We call this a ‘forward problem’. Here we are interested in a related ‘inverse problem’: given the surface velocity, find the basal traction.
The model used is a finite-element approximationFootnote * of Stokes’ equationsFootnote †
along with conservation of mass,
In these equations, x and y are the coordinate directions, x along the incline and y perpendicular to it (almost vertical). The components of velocity in the x – and y –directions are given by u and v, respectively. Pressure is represented by the variable p, while ρ = 910 kg m–3 is the density of ice and g = 9: 81 m s−2 is the acceleration due to gravity. The viscosity, μ, varies according to Glen’s flow lawFootnote ‡
with the effective strain rate
And flow parameters A = 10−16 Pa-na−1 and n = 3.
Stress-free boundary conditions apply at the upper surface
While at the base v = 0 and basal traction is modeled using the boundary condition
In ISMIP-HOM Experiment D
where L is the length of the domain (20 km for the problem presented here). This frictional coefficient is plotted as a heavy black curve in the lower left panel of Figure 4. The upper left panel of Figure 4 shows the velocity component, u, at the (top) surface of the ice for a solution of the ‘forward’ problem (using β 2 = 1000 + 1000 sin 2πx/L)) plotted as a heavy black line.
Now, for the inverse problem. Suppose we are given the surface velocity component, u, plotted as a heavy black curve in the upper left panel of Figure 4. We hope to use this information to discover the basal friction coefficient, β 2 (plotted as a heavy black curve in the lower left panel of Fig. 4). To do so we pose the problem as an optimization problem: find β 2 such that the surface velocity component, u, found by solving the ‘forward’ problem minimizes the error given by
Where ud is the desired surface velocity. To discretize the function β2 a trigonometric expansionFootnote § is used,
A variety of algorithms for solving optimization problems exist (Reference PressPress, 2007). Here we choose the Broyden–Fletcher– Goldfarb–Shano (BFGS) algorithm (Reference PressPress, 2007, p. 521–524), which uses partial derivatives of the objective function, g(u), with respect to the parameters, pk , that can be varied. These partial derivatives (∂g/∂pk ) are exactly what the adjoint method described in the previous sections can provide. Starting with a ‘guess’ for β 2, an approximate solution to the inverse problem is obtained by repeating the following steps:
-
1. Solve the ‘forward’ problem with the current estimate of β2.
-
2. Evaluate the error, g(u).
-
3. Evaluate the partial derivatives, ∂g/∂pk , using the adjoint method.
-
4. Calculate an improved estimate of β2 (i.e. the pk ) according to the BFGS algorithm.
The results of applying this process are shown in Figure 4. The initial guess was β2 = 1000; i.e. p 0 = 1000 and pk = 0 for 1 ≤ k ≤ N where, here, N = 30. The gray curves in the left panels of Figure 4 show approximations of u and β 2 as the process progressed. The original guess of2 ¼ 1000 (represented by a horizontal line in the lower left panel of Fig. 4) resulted in the bottommost horizontal line in the upper left panel of Figure 4. As the estimate of2 improves, the curves approach the desired values plotted as heavy black curves. The upper right panel of Figure 4 shows the error, g (u), decreasing throughout the process. Further improvement can be made by allowing more iterations of the algorithm. The lower right panel of Figure 4 shows the evolution of the coefficient (pk ) values. The upper curve represents p 0 which starts at 1000, dips down for a while but returns to a value very near 1000. The curve that increases from zero to near 1000 represents p 1. The remainder of the curves, which remain near zero, represent p 2 through p 30. In the optimal solution these values are all zero, but the largest, p 9, was 36.9 when the process was terminated. Again, further improvement can be made by allowing more iterations.
Acknowledgements
Most of the ideas in the Introduction were taken from section 3 of Ericco (1997). Most of the ideas in the section entitled ‘A typical situation’ were taken from section 8.7 of Reference StrangStrang (2007). For the ‘A data-assimilation example from glaciology’ section the finite-element and adjoint methods were implemented using the FEniCS and dolfin-adjoint software packages. Thanks to Jesse Johnson who served as committee chair for my thesis (Reference GranzowGranzow, 2013), which includes source code for computer programs that produced Figures 2 and 4. Thanks also to scientific editor Weili Wang and reviewers Fuyuki Saito and Stephen Price. This paper was written while the author was supported by NASA Research Opportunities in Space and Earth Sciences (ROSES) grant NNX11AR23G.