Debugging

dolfin-adjoint offers a suite of debugging routines to help identify exactly why an adjoint might not be correct.

Visualising the forward and adjoint systems

It is sometimes useful when debugging a problem to see dolfin-adjoint’s interpretation of your forward system, and the other models it derives from that. The adj_html function dumps an HTML visualisation:

adj_html("forward.html", "forward")
adj_html("adjoint.html", "adjoint")

For example, let us include these calls in the Burgers’ equation example, after the forward run has completed:

from dolfin import *
from dolfin_adjoint import *

adj_checkpointing(strategy='multistage', steps=11,
                  snaps_on_disk=2, snaps_in_ram=2, verbose=True)

n = 30
mesh = UnitSquareMesh(n, n)
V = VectorFunctionSpace(mesh, "CG", 2)

ic = project(Expression(("sin(2*pi*x[0])", "cos(2*pi*x[1])"), degree=2), V)

def main(nu):
    u = ic.copy(deepcopy=True)
    u_next = Function(V)
    v = TestFunction(V)

    timestep = Constant(0.01)

    F = (inner((u_next - u)/timestep, v)
       + inner(grad(u_next)*u_next, v)
       + nu*inner(grad(u_next), grad(v)))*dx

    bc = DirichletBC(V, (0.0, 0.0), "on_boundary")

    t = 0.0
    end = 0.1
    while (t <= end):
        solve(F == 0, u_next, bc)
        u.assign(u_next)
        t += float(timestep)
        adj_inc_timestep()

    return u

if __name__ == "__main__":
    nu = Constant(0.0001)
    u = main(nu)

    adj_html("forward.html", "forward")
    adj_html("adjoint.html", "adjoint")

    J = Functional(inner(u, u)*dx*dt[FINISH_TIME])
    dJdnu = compute_gradient(J, Control(nu))

    Jnu = assemble(inner(u, u)*dx)

    parameters["adjoint"]["stop_annotating"] = True
    def Jhat(nu):
        u = main(nu)
        return assemble(inner(u, u)*dx)

    conv_rate = taylor_test(Jhat, Control(nu), Jnu, dJdnu)

Download the code to dump the HTML visualisation.

The resulting forward and adjoint tables can then be opened in a browser for inspection.

Each row corresponds to one equation solve. The variable being solved for is listed at the top. If the variable is green, its value is recorded; if it is red, its value is not recorded. To identify the dependencies of each operator, hover over its block (on the diagonal and on the right-hand side) with the mouse.
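
The variable names in these tables are generated automatically (w_1, w_2, ...), which can make large tables hard to read. A human-readable name can be supplied when constructing a function or constant; a minimal sketch, assuming the function space V from the example above:

u_next = Function(V, name="VelocityNext")       # appears as "VelocityNext" in the tables
timestep = Constant(0.01, name="Timestep")

The supplied name, rather than the Python variable name, is what then appears in the tables and in dolfin-adjoint’s error messages.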

Replaying the forward run

In order to derive a consistent adjoint model, dolfin-adjoint must correctly understand your forward model. If dolfin-adjoint’s record of your forward model is incorrect, then it cannot derive a correct adjoint model.

One way this could happen is if the forward model manually modifies the vector() of a Function. For example, suppose that instead of using

u.assign(u_next)

the code used

u.vector()[:] = u_next.vector()

then the adjoint would be incorrect, as dolfin-adjoint cannot detect the modification:

$ python tutorial_incorrect.py
...
Convergence orders for Taylor remainder with adjoint information (should all be 2):
  [0.9544004555220237, 0.9767390399643741, 0.9882512926547484, 0.9940957131097388]

How would we detect this situation? To check the consistency of its annotation, dolfin-adjoint can replay its interpretation of the forward model and compare the results against those of the real forward model. To do this, use the replay_dolfin function; with tol=0.0, any difference at all is reported, and with stop=True, the replay halts at the first divergence:

success = replay_dolfin(tol=0.0, stop=True)

To see this in action, consider the following (incorrect) code:

from dolfin import *
from dolfin_adjoint import *

n = 30
mesh = UnitSquareMesh(n, n)
V = VectorFunctionSpace(mesh, "CG", 2)

ic = project(Expression(("sin(2*pi*x[0])", "cos(2*pi*x[1])"), degree=2), V)

def main(nu):
    u = ic.copy(deepcopy=True)
    u_next = Function(V)
    v = TestFunction(V)

    timestep = Constant(0.01)

    F = (inner((u_next - u)/timestep, v)
       + inner(grad(u_next)*u_next, v)
       + nu*inner(grad(u_next), grad(v)))*dx

    bc = DirichletBC(V, (0.0, 0.0), "on_boundary")

    t = 0.0
    end = 0.1
    while (t <= end):
        solve(F == 0, u_next, bc)
        u.vector()[:] = u_next.vector()
        t += float(timestep)

    return u

if __name__ == "__main__":
    nu = Constant(0.0001)
    u = main(nu)

    success = replay_dolfin(tol=0.0, stop=True)

Download the broken adjoint code, where the error is detected by replay.

The replay detects that dolfin-adjoint’s interpretation and the real forward model diverge:

$ python tutorial8.py
...
Comparing w_4:1:0:Forward against previously recorded value:
   norm of the difference is 7.120735e-02 (> tolerance of 0.000000e+00)

w_4 refers to u_next (try printing it), and w_4:1:0 means “the function w_4 at timestep 1 and iteration 0”. In this error message, libadjoint tells us that the second solution for u_next differs between the replay and the forward model. Of course, this is because dolfin-adjoint’s knowledge of the value of u is wrong. With this information, the user can inspect the code around the solve for u_next more closely.
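
To confirm which Function a given w_i label corresponds to, print the function: dolfin prints the same internal name that appears in the message. A short sketch, assuming the example above:

print(u_next)    # prints the internal name, e.g. "w_4"

Giving your functions explicit names, as described in the previous section, makes these messages considerably easier to decode.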

Note that the replay cannot be exactly perfect when the model is run in parallel: the order of the parallel reductions is nondeterministic, and so the answers can diverge within floating-point roundoff. When debugging, run in serial.
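
If the same script must also run in parallel, one option is to demand an exact replay only in serial. A sketch, assuming the dolfin MPI utilities of this era (mpi_comm_world may differ in other dolfin versions):

from dolfin import MPI, mpi_comm_world

# An exact (tol=0.0) comparison is only meaningful on one process; in
# parallel, the nondeterministic reduction order perturbs the results.
if MPI.size(mpi_comm_world()) == 1:
    success = replay_dolfin(tol=0.0, stop=True)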

Testing the derivatives of operators

If the replay works perfectly, but the adjoint is still incorrect, then very few possibilities remain. The main one is that the model uses a discretisation that is not differentiable. In order to assemble the adjoint equations, any operator in the forward model that depends on previously computed values must be differentiated with respect to those values. If that dependency is not differentiable, then no consistent derivative of the model exists.

A simple way to check the differentiability of your model is to set

parameters["adjoint"]["test_derivative"] = True

before solving any equations. Then, whenever libadjoint assembles a term involving the derivative of a nonlinear operator, it applies the Taylor test (at the level of the operator, instead of the whole model). For example, a typical error message from the derivative test looks like:

Traceback (most recent call last):
...
libadjoint.exceptions.LibadjointWarnComparisonFailed: Expected the Taylor series remainder
   of operator 96c04e026b91e44576aa43ccef66e6a8 with respect to w_{16}:1:22:Forward to
   converge at second order, but got 1.000000
   (the error values are 1.393963e-11 and 6.969817e-12).

In this message, libadjoint tells us that the operator “96c04e026b91e44576aa43ccef66e6a8” (the names are automatically generated from the hash of the form) depends on the variable w_{16}:1:22, but that its dependence is not differentiable (the Taylor remainder convergence test yielded 1, instead of 2). In this example, the operator is an upwinded DG discretisation of the advection operator, and w_{16}:1:22 is an advecting velocity.
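
For concreteness, here is how the check could be enabled in the Burgers’ example from above (a sketch: main and nu are as defined earlier, and the parameter is set before any equations are solved):

from dolfin import *
from dolfin_adjoint import *

# Enable the operator-level Taylor test before any annotated solves.
parameters["adjoint"]["test_derivative"] = True

nu = Constant(0.0001)
u = main(nu)

# Every derivative of a nonlinear operator assembled during the
# gradient computation is now checked for second-order convergence.
J = Functional(inner(u, u)*dx*dt[FINISH_TIME])
dJdnu = compute_gradient(J, Control(nu))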

Note that even if the adjoint is not perfectly consistent (i.e., the Taylor remainders do not converge at second order), the resulting gradients can still be “good enough” for the purposes of an optimisation algorithm. All that matters to the optimisation algorithm is that the gradient provides a descent direction; if the Taylor remainders are “small”, then the convergence of the algorithm will usually not be affected. Thus, the adjoint is generally still useful, even for nondifferentiable discretisations.

Note that a more rigorous approach for the case where the functional is nondifferentiable is to consider the functional gradient produced by the adjoint as a subgradient: for a convex functional J, a vector g is a subgradient at m if J(m′) ≥ J(m) + ⟨g, m′ − m⟩ for every m′. For more information, see the Wikipedia article on subgradients.