# Checkpointing

As discussed in the mathematical background, the adjoint model is a linearisation of the forward model. If the forward model is nonlinear, then the forward solution must be available in order to linearise about it. By default, dolfin-adjoint stores every computed variable in memory, as this is the fastest and most straightforward option; however, this may not be feasible for large runs, or for runs with very many timesteps.

The solution to this problem is to employ a checkpointing scheme. Rather than store every variable during the forward run, checkpoints are stored at strategically chosen intervals, from which the model may recompute the missing solutions. During the adjoint run, if a forward variable is necessary and unavailable, the forward model is restarted from the nearest available checkpoint to compute the missing solutions; once these are available, the adjoint run continues.
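The store-and-recompute idea described above can be illustrated with a toy scalar model. This is only a sketch under stated assumptions: `step`, `dstep` and the other names are hypothetical, checkpoints are placed at a fixed interval rather than by revolve's optimal schedule, and nothing here uses the dolfin-adjoint API.

```python
# Toy illustration of checkpointing; not dolfin-adjoint code.
# Hypothetical scalar model: u_{k+1} = step(u_k).

def step(u):
    """One nonlinear forward timestep."""
    return u + 0.1 * u * u

def dstep(u):
    """Derivative of step with respect to u, linearised about the forward state u."""
    return 1.0 + 0.2 * u

def forward_with_checkpoints(u0, n_steps, every):
    """Run the forward model, storing the state only every `every` steps."""
    checkpoints = {}
    u = u0
    for k in range(n_steps):
        if k % every == 0:
            checkpoints[k] = u
        u = step(u)
    return u, checkpoints

def recompute_state(checkpoints, k):
    """Recover u_k by restarting from the nearest checkpoint at or before step k."""
    start = max(j for j in checkpoints if j <= k)
    u = checkpoints[start]
    for _ in range(start, k):
        u = step(u)
    return u

def adjoint_gradient(u0, n_steps, every):
    """Return d(u_N)/d(u_0) by a reverse sweep, recomputing forward states on demand."""
    _, checkpoints = forward_with_checkpoints(u0, n_steps, every)
    lam = 1.0  # adjoint variable, seeded with d(u_N)/d(u_N) = 1
    for k in reversed(range(n_steps)):
        u_k = recompute_state(checkpoints, k)  # forward state needed for the linearisation
        lam *= dstep(u_k)
    return lam
```

Calling `adjoint_gradient(0.5, 10, 4)` keeps only three forward states (steps 0, 4 and 8) yet gives the same derivative as `adjoint_gradient(0.5, 10, 1)`, which stores all of them; the price is the extra forward steps performed in `recompute_state`.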

Thus, to employ a checkpointing scheme, the control flow of the adjoint run must seamlessly jump between assembling and solving the adjoint equations, and assembling and solving parts of the forward run. Coding a checkpointing scheme by hand is quite complicated, and so most hand-coded adjoint models do not use one. However, the libadjoint library underlying dolfin-adjoint embeds the excellent revolve library of Griewank and Walther, and can automatically employ optimal checkpointing schemes for almost no marginal user effort.

Activating checkpointing is very straightforward: two calls to dolfin-adjoint functions are necessary. Firstly, before any equations are solved, the user must call the adj_checkpointing function, which activates and configures the checkpointing scheme. Secondly, the user must place a call to adj_inc_timestep at the end of the time loop, which indicates to libadjoint that a timestep has ended. (Internally, the checkpointing scheme relies on the concept of timesteps, but dolfin-adjoint has no way of automatically determining when a timestep has ended, and so the user must help out.) For example, to activate checkpointing for the Burgers’ equation:

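    from dolfin import *
    from dolfin_adjoint import *

    # Activate and configure checkpointing before any equation is solved.
    # The strategy and steps values are assumed here; steps should cover
    # the number of timesteps taken by the time loop below.
    adj_checkpointing(strategy='multistage', steps=11,
                      snaps_on_disk=2, snaps_in_ram=2, verbose=True)

    n = 30
    mesh = UnitSquareMesh(n, n)
    V = VectorFunctionSpace(mesh, "CG", 2)

    ic = project(Expression(("sin(2*pi*x[0])", "cos(2*pi*x[1])"), degree=2), V)

    def main(nu):
        u = ic.copy(deepcopy=True)
        u_next = Function(V)
        v = TestFunction(V)

        timestep = Constant(0.01)

        # Burgers' equation: time derivative, advection and viscous terms
        F = (inner((u_next - u)/timestep, v)
           + inner(grad(u_next)*u_next, v)
           + nu*inner(grad(u_next), grad(v)))*dx

        bc = DirichletBC(V, (0.0, 0.0), "on_boundary")

        t = 0.0
        end = 0.1
        while (t <= end):
            solve(F == 0, u_next, bc)
            u.assign(u_next)
            t += float(timestep)
            # Tell libadjoint that a timestep has ended
            adj_inc_timestep()

        return u

    if __name__ == "__main__":
        nu = Constant(0.0001)
        u = main(nu)

        J = Functional(inner(u, u)*dx*dt[FINISH_TIME])
        # Gradient of J with respect to nu, computed with the adjoint
        dJdnu = compute_gradient(J, Control(nu))

        Jnu = assemble(inner(u, u)*dx)

        # Stop recording equations before the forward model is re-run
        parameters["adjoint"]["stop_annotating"] = True

        def Jhat(nu):
            u = main(nu)
            return assemble(inner(u, u)*dx)

        conv_rate = taylor_test(Jhat, Control(nu), Jnu, dJdnu)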


Running the script gives:

    $ python tutorial5.py
    ...
    Convergence orders for Taylor remainder with adjoint information (should all be 2):
    [1.9581779061731224, 1.9787032981594719, 1.9892527501829258, 1.994601330422228]

To see what the checkpointing scheme does, pass verbose=True:

    $ python tutorial5.py | grep Revolve
    Revolve: Checkpoint statistics:
    Revolve: Checkpoint timestep 0 on disk.
    Revolve: Advance from timestep 0 to timestep 3.
    Revolve: Checkpoint timestep 3 on disk.
    Revolve: Advance from timestep 3 to timestep 5.
    Revolve: Checkpoint timestep 5 in memory.
    Revolve: Advance from timestep 5 to timestep 7.
    Revolve: Solve last timestep 7.
    ====== Revolve: Replay from equation 13 (first equation of timestep 7) to equation 14 (last equation of timestep 7). ======
    Revolve: Replaying equation 13.
    Revolve: Checkpoint equation 13 in memory.
    Revolve: Replaying equation 14.
    Revolve: Delete checkpoint equation 13.
    ...
    ====== Revolve: Replay from equation 2 (first equation of timestep 1) to equation 2 (last equation of timestep 1). ======
    Revolve: No need to replay equation 2.
    Revolve: Checkpoint equation 2 in memory.
    Revolve: Delete checkpoint equation 2.
    ====== Revolve: Replay from equation 0 (first equation of timestep 0) to equation 1 (last equation of timestep 0). ======
    Revolve: Replaying equation 0.
    Revolve: No need to replay equation 1.

To use checkpointing, the user must specify how many checkpoint slots are available in memory and on disk (the snaps_in_ram and snaps_on_disk arguments). When libadjoint instructs dolfin-adjoint to checkpoint, dolfin-adjoint records the values of all variables at that time. Therefore, each checkpoint slot holds the whole state of the model at one point in time. In the above example, both u and u_next will be checkpointed, and so each checkpoint will store 2*V.dim() floating point numbers. Keep this in mind when estimating how many checkpoints your machine can fit.
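As a rough aid for that estimate, the arithmetic can be written out as below. This is a back-of-the-envelope sketch, not part of dolfin-adjoint: the function names are hypothetical, values are assumed to be 8-byte doubles, and any storage overhead is ignored. The figure 7442 is what `V.dim()` should be for a two-component P2 space on a 30 by 30 triangulated unit square (2 × 61²).

```python
def checkpoint_bytes(dofs_per_function, n_functions=2, bytes_per_value=8):
    """Approximate size of one checkpoint slot in bytes.

    n_functions is the number of functions stored per checkpoint
    (u and u_next in the example), each with dofs_per_function values.
    """
    return dofs_per_function * n_functions * bytes_per_value

def slots_that_fit(available_bytes, dofs_per_function, n_functions=2):
    """Number of whole checkpoint slots that fit in available_bytes."""
    return available_bytes // checkpoint_bytes(dofs_per_function, n_functions)

# Assumed value of V.dim() for the example above
example_dofs = 7442
print(checkpoint_bytes(example_dofs), "bytes per checkpoint")
print(slots_that_fit(10**9, example_dofs), "checkpoints fit in 1 GB")
```

For a small problem like this one the slots are cheap; the estimate becomes relevant when `V.dim()` is in the millions, where a single slot can take tens of megabytes or more.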