docs/source/analytical_derivatives.rst

   1 .. default-domain:: cpp
   2
   3 .. cpp:namespace:: ceres
   4
   5 .. _chapter-analytical_derivatives:
   6
   7 ====================
   8 Analytic Derivatives
   9 ====================
  10
  11 Consider the problem of fitting the following curve (`Rat43
  12 <http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
  13 data:
  14
  15 .. math::
  16   y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}
  17
  18 That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,... ,n`,
  19 determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
  20 fit this data.
  21
  22 Which can be stated as the problem of finding the
  23 values of :math:`b_1, b_2, b_3` and :math:`b_4` are the ones that
  24 minimize the following objective function [#f1]_:
  25
  26 .. math::
  27    \begin{align}
  28    E(b_1, b_2, b_3, b_4)
  29    &= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
  30    &= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
  31    \end{align}
  32
  33 To solve this problem using Ceres Solver, we need to define a
  34 :class:`CostFunction` that computes the residual :math:`f` for a given
  35 :math:`x` and :math:`y` and its derivatives with respect to
  36 :math:`b_1, b_2, b_3` and :math:`b_4`.
  37
  38 Using elementary differential calculus, we can see that:
  39
  40 .. math::
  41   \begin{align}
  42   D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
  43   D_2 f(b_1, b_2, b_3, b_4; x,y) &=
  44   \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  45   D_3 f(b_1, b_2, b_3, b_4; x,y) &=
  46   \frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  47   D_4 f(b_1, b_2, b_3, b_4; x,y) & = \frac{b_1  \log\left(1+e^{b_2-b_3x}\right) }{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
  48   \end{align}
  49
  50 With these derivatives in hand, we can now implement the
  51 :class:`CostFunction` as:
  52
  53 .. code-block:: c++
  54
  55   class Rat43Analytic : public SizedCostFunction<1,4> {
  56      public:
  57        Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
  58        virtual ~Rat43Analytic() {}
  59        virtual bool Evaluate(double const* const* parameters,
  60                              double* residuals,
  61                              double** jacobians) const {
  62          const double b1 = parameters[0][0];
  63          const double b2 = parameters[0][1];
  64          const double b3 = parameters[0][2];
  65          const double b4 = parameters[0][3];
  66
  67          residuals[0] = b1 *  pow(1 + exp(b2 -  b3 * x_), -1.0 / b4) - y_;
  68
  69          if (!jacobians) return true;
  70          double* jacobian = jacobians[0];
  71          if (!jacobian) return true;
  72
  73          jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
  74          jacobian[1] = -b1 * exp(b2 - b3 * x_) *
  75                        pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
  76          jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
  77                        pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
  78          jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
  79                        pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
  80          return true;
  81        }
  82
  83       private:
  84        const double x_;
  85        const double y_;
  86    };
  87
  88 This is tedious code, hard to read and with a lot of
  89 redundancy. So in practice we will cache some sub-expressions to
  90 improve its efficiency, which would give us something like:
  91
  92 .. code-block:: c++
  93
  94   class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
  95      public:
  96        Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
  97        virtual ~Rat43AnalyticOptimized() {}
  98        virtual bool Evaluate(double const* const* parameters,
  99                              double* residuals,
 100                              double** jacobians) const {
 101          const double b1 = parameters[0][0];
 102          const double b2 = parameters[0][1];
 103          const double b3 = parameters[0][2];
 104          const double b4 = parameters[0][3];
 105
 106          const double t1 = exp(b2 -  b3 * x_);
 107          const double t2 = 1 + t1;
 108          const double t3 = pow(t2, -1.0 / b4);
 109          residuals[0] = b1 * t3 - y_;
 110
 111          if (!jacobians) return true;
 112          double* jacobian = jacobians[0];
 113          if (!jacobian) return true;
 114
 115          const double t4 = pow(t2, -1.0 / b4 - 1);
 116          jacobian[0] = t3;
 117          jacobian[1] = -b1 * t1 * t4 / b4;
 118          jacobian[2] = -x_ * jacobian[1];
 119          jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
 120          return true;
 121        }
 122
 123      private:
 124        const double x_;
 125        const double y_;
 126    };
 127
 128 What is the difference in performance of these two implementations?
 129
 130 ==========================   =========
 131 CostFunction                 Time (ns)
 132 ==========================   =========
 133 Rat43Analytic                      255
 134 Rat43AnalyticOptimized              92
 135 ==========================   =========
 136
 137 ``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
 138 ``Rat43Analytic``.  This difference in run-time is not uncommon. To
 139 get the best performance out of analytically computed derivatives, one
 140 usually needs to optimize the code to account for common
 141 sub-expressions.
 142
 143
 144 When should you use analytical derivatives?
 145 ===========================================
 146
 147 #. The expressions are simple, e.g. mostly linear.
 148
 149 #. A computer algebra system like `Maple
 150    <https://www.maplesoft.com/products/maple/>`_ , `Mathematica
 151    <https://www.wolfram.com/mathematica/>`_, or `SymPy
 152    <http://www.sympy.org/en/index.html>`_ can be used to symbolically
 153    differentiate the objective function and generate the C++ to
 154    evaluate them.
 155
 156 #. Performance is of utmost concern and there is algebraic structure
 157    in the terms that you can exploit to get better performance than
 158    automatic differentiation.
 159
 160    That said, getting the best performance out of analytical
 161    derivatives requires a non-trivial amount of work.  Before going
 162    down this path, it is useful to measure the amount of time being
 163    spent evaluating the Jacobian as a fraction of the total solve time
 164    and remember `Amdahl's Law
 165    <https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend.
 166
 167 #. There is no other way to compute the derivatives, e.g. you
 168    wish to compute the derivative of the root of a polynomial:
 169
 170    .. math::
 171      a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0
 172
 173
 174    with respect to :math:`x` and :math:`y`. This requires the use of
 175    the `Inverse Function Theorem
 176    <https://en.wikipedia.org/wiki/Inverse_function_theorem>`_
 177
 178 #. You love the chain rule and actually enjoy doing all the algebra by
 179    hand.
 180
 181
 182 .. rubric:: Footnotes
 183
 184 .. [#f1] The notion of best fit depends on the choice of the objective
 185          function used to measure the quality of fit, which in turn
 186          depends on the underlying noise process which generated the
 187          observations. Minimizing the sum of squared differences is
 188          the right thing to do when the noise is `Gaussian
 189          <https://en.wikipedia.org/wiki/Normal_distribution>`_. In
 190          that case the optimal value of the parameters is the `Maximum
 191          Likelihood Estimate
 192          <https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.