Verification

Taylor remainder convergence test

The fundamental tool used in verification of gradients is the Taylor remainder convergence test. Let \(\widehat{J}(m)\) be the functional, considered as a pure function of the parameter of interest, let \(\nabla \widehat{J}\) be its gradient, and let \(\delta m\) be a perturbation to \(m\). This test is based on the observation that

\[\left|\widehat{J}(m + h\delta m) - \widehat{J}(m)\right| \rightarrow 0 \quad \mathrm{at} \ O(h),\]

but that

\[\left|\widehat{J}(m + h\delta m) - \widehat{J}(m) - h\nabla \widehat{J} \cdot \delta m\right| \rightarrow 0 \quad \mathrm{at} \ O(h^2),\]

by Taylor’s theorem. The quantity \(\left|\widehat{J}(m + h\delta m) - \widehat{J}(m)\right|\) is called the first-order Taylor remainder (because it converges to zero at first order in \(h\)), and the quantity \(\left|\widehat{J}(m + h\delta m) - \widehat{J}(m) - h\nabla \widehat{J} \cdot \delta m\right|\) is called the second-order Taylor remainder.

Suppose someone gives you two functions \(\widehat{J}\) and \(d\widehat{J}\), and claims that \(d\widehat{J}\) is the gradient of \(\widehat{J}\). Then their claim can be checked by computing the second-order Taylor remainder for some choice of \(h\) and \(\delta m\), then repeatedly halving \(h\) and verifying that the remainder decreases by a factor of approximately 4 each time, i.e. that it converges at second order.
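To make this concrete, here is a minimal sketch of the test in plain Python with NumPy (independent of dolfin-adjoint; all names here are chosen for illustration), applied to a quadratic functional whose gradient is known in closed form:

import numpy as np

def Jhat(m):                       # a simple functional of m
    return 0.5 * np.dot(m, m)

def dJhat(m):                      # its claimed gradient
    return m

m = np.array([1.0, 2.0])
dm = np.array([0.3, -0.7])         # an arbitrary perturbation direction

hs = [0.01 / 2**i for i in range(5)]
remainders = [abs(Jhat(m + h*dm) - Jhat(m) - h*np.dot(dJhat(m), dm))
              for h in hs]

# If dJhat is correct, each remainder is roughly a quarter of the
# previous one, i.e. the observed convergence orders are close to 2.
orders = [np.log2(remainders[i-1] / remainders[i])
          for i in range(1, len(remainders))]
print(orders)

For this quadratic functional the second-order remainder equals \(h^2 \, \delta m \cdot \delta m / 2\) exactly, so the printed orders are exactly 2; for a nonlinear model they approach 2 as \(h\) decreases.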

Applying this in dolfin-adjoint

In the case of PDE-constrained optimisation, computing \(\widehat{J}(m)\) involves solving the PDE for that choice of \(m\) to compute the solution \(u\), and then evaluating the functional \(J\). The main function in dolfin-adjoint for applying the Taylor remainder convergence test is taylor_test. To see how this works, let’s restructure our forward model so that we can run it as a pure function of the diffusivity \(\nu\):

from dolfin import *
from dolfin_adjoint import *

n = 30
mesh = UnitSquareMesh(n, n)
V = VectorFunctionSpace(mesh, "CG", 2)

ic = project(Expression(("sin(2*pi*x[0])", "cos(2*pi*x[1])"), degree=2), V)

def main(nu):
    u = ic.copy(deepcopy=True)
    u_next = Function(V)
    v = TestFunction(V)

    timestep = Constant(0.01)

    F = (inner((u_next - u)/timestep, v)
       + inner(grad(u_next)*u_next, v)
       + nu*inner(grad(u_next), grad(v)))*dx

    bc = DirichletBC(V, (0.0, 0.0), "on_boundary")

    t = 0.0
    end = 0.1
    while (t <= end):
        solve(F == 0, u_next, bc)
        u.assign(u_next)
        t += float(timestep)

    return u

if __name__ == "__main__":
    nu = Constant(0.0001)
    u = main(nu)

    J = Functional(inner(u, u)*dx*dt[FINISH_TIME])
    dJdnu = compute_gradient(J, Control(nu))

As you can see, we’ve moved the body of the model into a function main, so that we can re-run the forward model several times for the verification. Now let’s see how to use taylor_test:

if __name__ == "__main__":
    nu = Constant(0.0001)
    u = main(nu)

    J = Functional(inner(u, u)*dx*dt[FINISH_TIME])
    dJdnu = compute_gradient(J, Control(nu))

    Jnu = assemble(inner(u, u)*dx) # current value

    parameters["adjoint"]["stop_annotating"] = True # stop registering equations

    def Jhat(nu): # the functional as a pure function of nu
        u = main(nu)
        return assemble(inner(u, u)*dx)

    conv_rate = taylor_test(Jhat, Control(nu), Jnu, dJdnu)


We set parameters["adjoint"]["stop_annotating"] to True, so that the solve calls in the perturbed runs of main are not registered by libadjoint.

Running this program yields the following output:

$ python tutorial4.py
...
Taylor remainder without gradient information:
  [0.0023634768826859553, 0.001219686435181555, 0.0006197555788530762,
  0.0003124116082189321, 0.0001568463925042951]

The first line gives the values computed for the first-order Taylor remainder. As you can see, each value is approximately half the previous one.

Convergence orders for Taylor remainder without gradient information (should all be 1):
  [0.9544004555219242, 0.9767390399645689, 0.9882512926546192, 0.9940957131087177]

The second line shows the convergence orders of the first-order Taylor remainders: these should always be 1. (If they are not, try passing the seed argument to taylor_test to use a smaller perturbation.)
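For example, rerunning the test with a smaller initial perturbation might look like this (the value 1.0e-4 below is purely illustrative; a suitable seed is problem-dependent):

conv_rate = taylor_test(Jhat, Control(nu), Jnu, dJdnu, seed=1.0e-4)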

Taylor remainder with gradient information:
  [0.00015639195264909554, 4.0247982485970384e-05, 1.0211629980686528e-05,
  2.5719961979492594e-06, 6.454097041455739e-07]

The third line gives the values computed for the second-order Taylor remainder. These values should be much smaller than those on the first line.

Convergence orders for Taylor remainder with gradient information (should all be 2):
  [1.9581779067535698, 1.9787032993204938, 1.9892527525050359, 1.9946013350664813]

The fourth line shows the convergence orders of the second-order Taylor remainders: if the gradient has been computed correctly with the adjoint, then these numbers should be 2.
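Since \(h\) is halved between evaluations, each of these orders is just the base-2 logarithm of the ratio of successive remainders. As a quick sanity check (a sketch of the computation, not taylor_test’s internals verbatim), the orders can be reproduced from the remainders printed above:

import math

# second-order Taylor remainders reported by taylor_test above
R = [0.00015639195264909554, 4.0247982485970384e-05,
     1.0211629980686528e-05, 2.5719961979492594e-06,
     6.454097041455739e-07]

# h is halved between entries, so the observed order between
# consecutive remainders is log2 of their ratio
orders = [math.log2(R[i-1] / R[i]) for i in range(1, len(R))]
print(orders)  # approximately [1.958, 1.979, 1.989, 1.995]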

As can be seen, the second-order Taylor remainders do indeed converge at second order, and so the gradient dJdnu is correct.

So, what if the Taylor remainders do not converge at the expected orders? Such a situation can occur if the model manually modifies Function values, modifies the entries of assembled matrices and vectors, is not differentiable, or if there is a bug in dolfin-adjoint. dolfin-adjoint offers many ways to pinpoint precisely where an error might lie; these are discussed in the section on debugging.

Once development of the adjoint model is complete, you may wish to move on to much bigger production runs. By default, dolfin-adjoint stores every variable computed in memory, as they will be necessary to linearise the model about the forward trajectory. If you wish to solve much bigger problems, or if the model must run for many timesteps, it may not be feasible to store everything: some balance between storage and recomputation must be struck, in the form of a checkpointing scheme. Checkpointing is the topic of the next section.