.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-automatic_derivatives:

=====================
Automatic Derivatives
=====================
We will now consider automatic differentiation. It is a technique that
can compute exact derivatives, fast, while requiring about the same
effort from the user as is needed to use numerical differentiation.
Don't believe me? Well here goes. The following code fragment
implements an automatically differentiated ``CostFunction`` for `Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_.
.. code-block:: c++

   struct Rat43CostFunctor {
     Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

     template <typename T>
     bool operator()(const T* parameters, T* residuals) const {
       const T b1 = parameters[0];
       const T b2 = parameters[1];
       const T b3 = parameters[2];
       const T b4 = parameters[3];
       residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
       return true;
     }

    private:
     const double x_;
     const double y_;
   };

   CostFunction* cost_function =
       new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
           new Rat43CostFunctor(x, y));
Notice that compared to numeric differentiation, the only difference
when defining the functor for use with automatic differentiation is
the signature of ``operator()``.
In the case of numeric differentiation it was

.. code-block:: c++

   bool operator()(const double* parameters, double* residuals) const;
and for automatic differentiation it is a templated function of the
form

.. code-block:: c++

   template <typename T>
   bool operator()(const T* parameters, T* residuals) const;
So what does this small change buy us? The following table compares
the time it takes to evaluate the residual and the Jacobian for
`Rat43` using various methods.
=========================== =========
CostFunction                Time (ns)
=========================== =========
Rat43AnalyticOptimized             92
Rat43NumericDiffForward           262
Rat43NumericDiffCentral           517
Rat43NumericDiffRidders          3760
Rat43AutomaticDiff                129
=========================== =========
We can get exact derivatives using automatic differentiation
(``Rat43AutomaticDiff``) with about the same effort that is required
to write the code for numeric differentiation, but only :math:`40\%`
slower than hand-optimized analytical derivatives.
So how does it work? For this we will have to learn about **Dual
Numbers** and **Jets**.

Dual Numbers & Jets
===================
.. NOTE::

   Reading this and the next section on implementing Jets is not
   necessary to use automatic differentiation in Ceres Solver. But
   knowing the basics of how Jets work is useful when debugging and
   reasoning about the performance of automatic differentiation.
Dual numbers are an extension of the real numbers analogous to complex
numbers: whereas complex numbers augment the reals by introducing an
imaginary unit :math:`\iota` such that :math:`\iota^2 = -1`, dual
numbers introduce an *infinitesimal* unit :math:`\epsilon` such that
:math:`\epsilon^2 = 0`. A dual number :math:`a + v\epsilon` has two
components, the *real* component :math:`a` and the *infinitesimal*
component :math:`v`.
Surprisingly, this simple change leads to a convenient method for
computing exact derivatives without needing to manipulate complicated
symbolic expressions.
For example, consider the function

.. math::

   f(x) = x^2 ,

then

.. math::

   f(10 + \epsilon) &= (10 + \epsilon)^2\\
            &= 100 + 20 \epsilon + \epsilon^2\\
            &= 100 + 20 \epsilon
Observe that the coefficient of :math:`\epsilon` is :math:`Df(10) =
20`. Indeed this generalizes to functions which are not
polynomial. Consider an arbitrary differentiable function
:math:`f(x)`. Then we can evaluate :math:`f(x + \epsilon)` by
considering the Taylor expansion of :math:`f` near :math:`x`, which
gives us the infinite series
.. math::

   f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x)
   \frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\
   f(x + \epsilon) &= f(x) + Df(x) \epsilon

Here we are using the fact that :math:`\epsilon^2 = 0`.
A `Jet <https://en.wikipedia.org/wiki/Jet_(mathematics)>`_ is an
:math:`n`-dimensional dual number, where we augment the real numbers
with :math:`n` infinitesimal units :math:`\epsilon_i,\ i=1,...,n` with
the property that :math:`\forall i, j\ :\ \epsilon_i\epsilon_j = 0`. Then
a Jet consists of a *real* part :math:`a` and an :math:`n`-dimensional
*infinitesimal* part :math:`\mathbf{v}`, i.e.,
.. math::

   x = a + \sum_j v_{j} \epsilon_j
The summation notation gets tedious, so we will also just write

.. math::

   x = a + \mathbf{v}.

where the :math:`\epsilon_i`'s are implicit. Then, using the same
Taylor series expansion used above, we can see that:
.. math::

   f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.
Similarly for a multivariate function
:math:`f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m`, evaluated on
:math:`x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n`:

.. math::

   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i
So if each :math:`\mathbf{v}_i = e_i` were the :math:`i^{\text{th}}`
standard basis vector, then the above expression would simplify to

.. math::

   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i
and we can extract the coordinates of the Jacobian by inspecting the
coefficients of :math:`\epsilon_i`.
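To make the basis-vector trick concrete, take the example function
:math:`f(x_1, x_2) = x_1 x_2` evaluated at :math:`(a_1, a_2) = (3,
5)`. Seeding :math:`x_1 = 3 + \epsilon_1` and :math:`x_2 = 5 +
\epsilon_2` and multiplying out, using :math:`\epsilon_i \epsilon_j =
0`, gives

.. math::

   f(3 + \epsilon_1, 5 + \epsilon_2) = 15 + 5 \epsilon_1 + 3 \epsilon_2,

so the coefficients of :math:`\epsilon_1` and :math:`\epsilon_2` are
exactly the partial derivatives :math:`D_1 f = x_2 = 5` and
:math:`D_2 f = x_1 = 3`.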
Implementing Jets
=================

In order for the above to work in practice, we will need the ability
to evaluate an arbitrary function :math:`f` not just on real numbers
but also on dual numbers, but one does not usually evaluate functions
by evaluating their Taylor expansions.
This is where C++ templates and operator overloading come into
play. The following code fragment has a simple implementation of a
``Jet`` and some operators/functions that operate on them.
.. code-block:: c++

   template <int N> struct Jet {
     double a;
     Eigen::Matrix<double, 1, N> v;
   };

   template <int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a + g.a, f.v + g.v);
   }

   template <int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a - g.a, f.v - g.v);
   }

   template <int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
   }

   template <int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
   }

   template <int N> Jet<N> exp(const Jet<N>& f) {
     return Jet<N>(exp(f.a), exp(f.a) * f.v);
   }

   // This is a simple implementation for illustration purposes; the
   // actual implementation of pow requires careful handling of a number
   // of corner cases.
   template <int N> Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(pow(f.a, g.a),
                   g.a * pow(f.a, g.a - 1.0) * f.v +
                   pow(f.a, g.a) * log(f.a) * g.v);
   }
With these overloaded functions in hand, we can now call
``Rat43CostFunctor`` with an array of Jets instead of doubles. Putting
that together with appropriately initialized Jets allows us to compute
the Jacobian as follows:
.. code-block:: c++

   class Rat43Automatic : public ceres::SizedCostFunction<1, 4> {
    public:
     Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
     virtual ~Rat43Automatic() {}
     virtual bool Evaluate(double const* const* parameters,
                           double* residuals,
                           double** jacobians) const {
       // Just evaluate the residuals if Jacobians are not required.
       if (!jacobians) return (*functor_)(parameters[0], residuals);

       // Initialize the Jets
       ceres::Jet<4> jets[4];
       for (int i = 0; i < 4; ++i) {
         jets[i].a = parameters[0][i];
         jets[i].v.setZero();
         jets[i].v[i] = 1.0;
       }

       // Evaluate the functor on the Jets.
       ceres::Jet<4> result;
       (*functor_)(jets, &result);

       // Copy the values out of the Jet.
       residuals[0] = result.a;
       for (int i = 0; i < 4; ++i) {
         jacobians[0][i] = result.v[i];
       }
       return true;
     }

    private:
     std::unique_ptr<const Rat43CostFunctor> functor_;
   };
Indeed, this is essentially how :class:`AutoDiffCostFunction` works.
Pitfalls
========

Automatic differentiation frees the user from the burden of computing
and reasoning about the symbolic expressions for the Jacobians, but
this freedom comes at a cost. For example, consider the following
simple functor:
.. code-block:: c++

   struct Functor {
     template <typename T> bool operator()(const T* x, T* residual) const {
       residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
       return true;
     }
   };
Looking at the code for the residual computation, one does not foresee
any problems. However, if we look at the analytical expressions for
the residual and its derivatives:

.. math::

   y &= 1 - \sqrt{x_0^2 + x_1^2}\\
   D_1y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\
   D_2y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}
we find that it is an indeterminate form at :math:`x_0 = 0, x_1 = 0`.
There is no single solution to this problem. In some cases one needs
to reason explicitly about the points where indeterminacy may occur
and use alternate expressions using `L'Hopital's rule
<https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
example some of the conversion routines in `rotation.h
<https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_). In
other cases, one may need to regularize the expressions to eliminate
these points.