.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-automatic_derivatives:

=====================
Automatic Derivatives
=====================
We will now consider automatic differentiation. It is a technique that
can compute exact derivatives, fast, while requiring about the same
effort from the user as is needed to use numerical differentiation.
Don't believe me? Well here goes. The following code fragment
implements an automatically differentiated ``CostFunction`` for `Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_.
.. code-block:: c++

   struct Rat43CostFunctor {
     Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

     template <typename T>
     bool operator()(const T* parameters, T* residuals) const {
       const T b1 = parameters[0];
       const T b2 = parameters[1];
       const T b3 = parameters[2];
       const T b4 = parameters[3];
       residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
       return true;
     }

    private:
     const double x_;
     const double y_;
   };

   CostFunction* cost_function =
       new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
           new Rat43CostFunctor(x, y));
Notice that compared to numeric differentiation, the only difference
when defining the functor for use with automatic differentiation is
the signature of ``operator()``.
In the case of numeric differentiation it was

.. code-block:: c++

   bool operator()(const double* parameters, double* residuals) const;
and for automatic differentiation it is a templated function of the
form

.. code-block:: c++

   template <typename T>
   bool operator()(const T* parameters, T* residuals) const;
So what does this small change buy us? The following table compares
the time it takes to evaluate the residual and the Jacobian for
`Rat43` using various methods.
=========================== =========
CostFunction                Time (ns)
=========================== =========
Rat43AnalyticOptimized             92
Rat43NumericDiffForward           262
Rat43NumericDiffCentral           517
Rat43NumericDiffRidders          3760
Rat43AutomaticDiff                129
=========================== =========
We can get exact derivatives using automatic differentiation
(``Rat43AutomaticDiff``) with about the same effort that is required
to write the code for numeric differentiation, but only :math:`40\%`
slower than hand-optimized analytical derivatives.
So how does it work? For this we will have to learn about **Dual
Numbers** and **Jets**.

Dual Numbers & Jets
===================
.. NOTE::

   Reading this and the next section on implementing Jets is not
   necessary to use automatic differentiation in Ceres Solver. But
   knowing the basics of how Jets work is useful when debugging and
   reasoning about the performance of automatic differentiation.
Dual numbers are an extension of the real numbers analogous to complex
numbers: whereas complex numbers augment the reals by introducing an
imaginary unit :math:`\iota` such that :math:`\iota^2 = -1`, dual
numbers introduce an *infinitesimal* unit :math:`\epsilon` such that
:math:`\epsilon^2 = 0`. A dual number :math:`a + v\epsilon` has two
components, the *real* component :math:`a` and the *infinitesimal*
component :math:`v`.
Surprisingly, this simple change leads to a convenient method for
computing exact derivatives without needing to manipulate complicated
symbolic expressions.
For example, consider the function

.. math::

   f(x) = x^2 ,

then

.. math::

   f(10 + \epsilon) &= (10 + \epsilon)^2\\
            &= 100 + 20 \epsilon + \epsilon^2\\
            &= 100 + 20 \epsilon
Observe that the coefficient of :math:`\epsilon` is :math:`Df(10) =
20`. Indeed this generalizes to functions which are not
polynomial. Consider an arbitrary differentiable function
:math:`f(x)`. Then we can evaluate :math:`f(x + \epsilon)` by
considering the Taylor expansion of :math:`f` near :math:`x`, which
gives us the infinite series
.. math::

   f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x)
   \frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\
   f(x + \epsilon) &= f(x) + Df(x) \epsilon

Here we are using the fact that :math:`\epsilon^2 = 0`.
A `Jet <https://en.wikipedia.org/wiki/Jet_(mathematics)>`_ is an
:math:`n`-dimensional dual number, where we augment the real numbers
with :math:`n` infinitesimal units :math:`\epsilon_i,\ i=1,...,n` with
the property that :math:`\forall i, j\ :\ \epsilon_i\epsilon_j = 0`. Then
a Jet consists of a *real* part :math:`a` and an :math:`n`-dimensional
*infinitesimal* part :math:`\mathbf{v}`, i.e.,
.. math::

   x = a + \sum_j v_{j} \epsilon_j
The summation notation gets tedious, so we will also just write

.. math::

   x = a + \mathbf{v}.

where the :math:`\epsilon_i`'s are implicit. Then, using the same
Taylor series expansion used above, we can see that:
.. math::

   f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.
Similarly for a multivariate function
:math:`f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m`, evaluated on
:math:`x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n`:

.. math::

   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i
So if each :math:`\mathbf{v}_i = e_i` were the :math:`i^{\text{th}}`
standard basis vector, then the above expression would simplify to

.. math::

   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i
and we can extract the coordinates of the Jacobian by inspecting the
coefficients of :math:`\epsilon_i`.
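To make the basis-vector trick concrete, take the example function
:math:`f(x_1, x_2) = x_1 x_2` evaluated at :math:`(a_1, a_2) = (3,
5)`. Seeding :math:`x_1 = 3 + \epsilon_1` and :math:`x_2 = 5 +
\epsilon_2` and multiplying out, using :math:`\epsilon_i \epsilon_j =
0`, gives

.. math::

   f(3 + \epsilon_1, 5 + \epsilon_2) = 15 + 5 \epsilon_1 + 3 \epsilon_2,

so the coefficients of :math:`\epsilon_1` and :math:`\epsilon_2` are
exactly the partial derivatives :math:`D_1 f = x_2 = 5` and
:math:`D_2 f = x_1 = 3`.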
Implementing Jets
=================

In order for the above to work in practice, we will need the ability
to evaluate an arbitrary function :math:`f` not just on real numbers
but also on dual numbers, but one does not usually evaluate functions
by evaluating their Taylor expansions.
This is where C++ templates and operator overloading come into
play. The following code fragment has a simple implementation of a
``Jet`` and some operators/functions that operate on them.
.. code-block:: c++

   template <int N> struct Jet {
     double a;
     Eigen::Matrix<double, 1, N> v;
   };

   template <int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a + g.a, f.v + g.v);
   }

   template <int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a - g.a, f.v - g.v);
   }

   template <int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
   }

   template <int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
   }

   template <int N> Jet<N> exp(const Jet<N>& f) {
     return Jet<N>(exp(f.a), exp(f.a) * f.v);
   }

   // This is a simple implementation for illustration purposes; the
   // actual implementation of pow requires careful handling of a number
   // of corner cases.
   template <int N> Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
     return Jet<N>(pow(f.a, g.a),
                   g.a * pow(f.a, g.a - 1.0) * f.v +
                   pow(f.a, g.a) * log(f.a) * g.v);
   }
With these overloaded functions in hand, we can now call
``Rat43CostFunctor`` with an array of Jets instead of doubles. Putting
that together with appropriately initialized Jets allows us to compute
the Jacobian as follows:
.. code-block:: c++

   class Rat43Automatic : public ceres::SizedCostFunction<1, 4> {
    public:
     Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
     virtual ~Rat43Automatic() {}
     virtual bool Evaluate(double const* const* parameters,
                           double* residuals,
                           double** jacobians) const {
       // Just evaluate the residuals if Jacobians are not required.
       if (!jacobians) return (*functor_)(parameters[0], residuals);

       // Initialize the Jets
       ceres::Jet<4> jets[4];
       for (int i = 0; i < 4; ++i) {
         jets[i].a = parameters[0][i];
         jets[i].v.setZero();
         jets[i].v[i] = 1.0;
       }

       // Evaluate the functor on the Jets.
       ceres::Jet<4> result;
       (*functor_)(jets, &result);

       // Copy the values out of the Jet.
       residuals[0] = result.a;
       for (int i = 0; i < 4; ++i) {
         jacobians[0][i] = result.v[i];
       }
       return true;
     }

    private:
     std::unique_ptr<const Rat43CostFunctor> functor_;
   };
Indeed, this is essentially how :class:`AutoDiffCostFunction` works.
Pitfalls
========

Automatic differentiation frees the user from the burden of computing
and reasoning about the symbolic expressions for the Jacobians, but
this freedom comes at a cost. For example, consider the following
simple functor:
.. code-block:: c++

   struct Functor {
     template <typename T> bool operator()(const T* x, T* residual) const {
       residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
       return true;
     }
   };
Looking at the code for the residual computation, one does not foresee
any problems. However, if we look at the analytical expressions for
the residual and its derivatives:

.. math::

   y &= 1 - \sqrt{x_0^2 + x_1^2}\\
   D_1y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\
   D_2y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}
we find that it is an indeterminate form at :math:`x_0 = 0, x_1 = 0`.
There is no single solution to this problem. In some cases one needs
to reason explicitly about the points where indeterminacy may occur
and use alternate expressions using `L'Hopital's rule
<https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
example some of the conversion routines in `rotation.h
<https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_). In
other cases, one may need to regularize the expressions to eliminate
these points.