docs/source/nnls_solving.rst

   1
   2 .. default-domain:: cpp
   3
   4 .. cpp:namespace:: ceres
   5
   6 .. _chapter-nnls_solving:
   7
   8 ================================
   9 Solving Non-linear Least Squares
  10 ================================
  11
  12 Introduction
  13 ============
  14
  15 Effective use of Ceres requires some familiarity with the basic
  16 components of a non-linear least squares solver, so before we describe
  17 how to configure and use the solver, we will take a brief look at how
  18 some of the core optimization algorithms in Ceres work.
  19
  20 Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
  21 variables, and
  22 :math:`F(x) = \left[f_1(x), ... ,  f_{m}(x) \right]^{\top}` be a
  23 :math:`m`-dimensional function of :math:`x`.  We are interested in
  24 solving the optimization problem [#f1]_
  25
  26 .. math:: \arg \min_x \frac{1}{2}\|F(x)\|^2\ . \\
  27           L \le x \le U
  28   :label: nonlinsq
  29
  30 Where, :math:`L` and :math:`U` are lower and upper bounds on the
  31 parameter vector :math:`x`.
  32
  33 Since the efficient global minimization of :eq:`nonlinsq` for
  34 general :math:`F(x)` is an intractable problem, we will have to settle
  35 for finding a local minimum.
  36
  37 In the following, the Jacobian :math:`J(x)` of :math:`F(x)` is an
  38 :math:`m\times n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)`
  39 and the gradient vector is :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2
  40 = J(x)^\top F(x)`.
  41
  42 The general strategy when solving non-linear optimization problems is
  43 to solve a sequence of approximations to the original problem
  44 [NocedalWright]_. At each iteration, the approximation is solved to
  45 determine a correction :math:`\Delta x` to the vector :math:`x`. For
  46 non-linear least squares, an approximation can be constructed by using
  47 the linearization :math:`F(x+\Delta x) \approx F(x) + J(x)\Delta x`,
  48 which leads to the following linear least squares problem:
  49
  50 .. math:: \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
  51    :label: linearapprox
  52
  53 Unfortunately, naively solving a sequence of these problems and
  54 updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that
  55 may not converge.  To get a convergent algorithm, we need to control
  56 the size of the step :math:`\Delta x`. Depending on how the size of
  57 the step :math:`\Delta x` is controlled, non-linear optimization
  58 algorithms can be divided into two major categories [NocedalWright]_.
  59
  60 1. **Trust Region** The trust region approach approximates the
  61    objective function using using a model function (often a quadratic)
  62    over a subset of the search space known as the trust region. If the
  63    model function succeeds in minimizing the true objective function
  64    the trust region is expanded; conversely, otherwise it is
  65    contracted and the model optimization problem is solved again.
  66
  67 2. **Line Search** The line search approach first finds a descent
  68    direction along which the objective function will be reduced and
  69    then computes a step size that decides how far should move along
  70    that direction. The descent direction can be computed by various
  71    methods, such as gradient descent, Newton's method and Quasi-Newton
  72    method. The step size can be determined either exactly or
  73    inexactly.
  74
  75 Trust region methods are in some sense dual to line search methods:
  76 trust region methods first choose a step size (the size of the trust
  77 region) and then a step direction while line search methods first
  78 choose a step direction and then a step size. Ceres implements
  79 multiple algorithms in both categories.
  80
  81 .. _section-trust-region-methods:
  82
  83 Trust Region Methods
  84 ====================
  85
  86 The basic trust region algorithm looks something like this.
  87
  88    1. Given an initial point :math:`x` and a trust region radius :math:`\mu`.
  89    2. Solve
  90
  91       .. math::
  92          \arg \min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 \\
  93          \text{such that} &\|D(x)\Delta x\|^2 \le \mu\\
  94          &L \le x + \Delta x \le U.
  95
  96    3. :math:`\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 -
  97       \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 -
  98       \|F(x)\|^2}`
  99    4. if :math:`\rho > \epsilon` then  :math:`x = x + \Delta x`.
 100    5. if :math:`\rho > \eta_1` then :math:`\mu = 2  \mu`
 101    6. else if :math:`\rho < \eta_2` then :math:`\mu = 0.5 * \mu`
 102    7. Go to 2.
 103
 104 Here, :math:`\mu` is the trust region radius, :math:`D(x)` is some
 105 matrix used to define a metric on the domain of :math:`F(x)` and
 106 :math:`\rho` measures the quality of the step :math:`\Delta x`, i.e.,
 107 how well did the linear model predict the decrease in the value of the
 108 non-linear objective. The idea is to increase or decrease the radius
 109 of the trust region depending on how well the linearization predicts
 110 the behavior of the non-linear objective, which in turn is reflected
 111 in the value of :math:`\rho`.
 112
 113 The key computational step in a trust-region algorithm is the solution
 114 of the constrained optimization problem
 115
 116 .. math::
 117    \arg \min_{\Delta x}&\quad \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 \\
 118    \text{such that} &\quad \|D(x)\Delta x\|^2 \le \mu\\
 119     &\quad L \le x + \Delta x \le U.
 120    :label: trp
 121
 122 There are a number of different ways of solving this problem, each
 123 giving rise to a different concrete trust-region algorithm. Currently,
 124 Ceres implements two trust-region algorithms - Levenberg-Marquardt
 125 and Dogleg, each of which is augmented with a line search if bounds
 126 constraints are present [Kanzow]_. The user can choose between them by
 127 setting :member:`Solver::Options::trust_region_strategy_type`.
 128
 129 .. rubric:: Footnotes
 130
 131 .. [#f1] At the level of the non-linear solver, the block structure is
 132          not relevant, therefore our discussion here is in terms of an
 133          optimization problem defined over a state vector of size
 134          :math:`n`. Similarly the presence of loss functions is also
 135          ignored as the problem is internally converted into a pure
 136          non-linear least squares problem.
 137
 138
 139 .. _section-levenberg-marquardt:
 140
 141 Levenberg-Marquardt
 142 -------------------
 143
 144 The Levenberg-Marquardt algorithm [Levenberg]_  [Marquardt]_ is the
 145 most popular algorithm for solving non-linear least squares problems.
 146 It was also the first trust region algorithm to be developed
 147 [Levenberg]_ [Marquardt]_. Ceres implements an exact step [Madsen]_
 148 and an inexact step variant of the Levenberg-Marquardt algorithm
 149 [WrightHolt]_ [NashSofer]_.
 150
 151 It can be shown, that the solution to :eq:`trp` can be obtained by
 152 solving an unconstrained optimization of the form
 153
 154 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda  \|D(x)\Delta x\|^2
 155
 156 Where, :math:`\lambda` is a Lagrange multiplier that is inverse
 157 related to :math:`\mu`. In Ceres, we solve for
 158
 159 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
 160    :label: lsqr
 161
 162 The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
 163 the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.
 164
 165 Before going further, let us make some notational simplifications. We
 166 will assume that the matrix :math:`\sqrt{\mu} D` has been concatenated
 167 at the bottom of the matrix :math:`J` and similarly a vector of zeros
 168 has been added to the bottom of the vector :math:`f` and the rest of
 169 our discussion will be in terms of :math:`J` and :math:`f`, i.e, the
 170 linear least squares problem.
 171
 172 .. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
 173    :label: simple
 174
 175 For all but the smallest problems the solution of :eq:`simple` in
 176 each iteration of the Levenberg-Marquardt algorithm is the dominant
 177 computational cost in Ceres. Ceres provides a number of different
 178 options for solving :eq:`simple`. There are two major classes of
 179 methods - factorization and iterative.
 180
 181 The factorization methods are based on computing an exact solution of
 182 :eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
 183 step Levenberg-Marquardt algorithm. But it is not clear if an exact
 184 solution of :eq:`lsqr` is necessary at each step of the LM algorithm
 185 to solve :eq:`nonlinsq`. In fact, we have already seen evidence
 186 that this may not be the case, as :eq:`lsqr` is itself a regularized
 187 version of :eq:`linearapprox`. Indeed, it is possible to
 188 construct non-linear optimization algorithms in which the linearized
 189 problem is solved approximately. These algorithms are known as inexact
 190 Newton or truncated Newton methods [NocedalWright]_.
 191
 192 An inexact Newton method requires two ingredients. First, a cheap
 193 method for approximately solving systems of linear
 194 equations. Typically an iterative linear solver like the Conjugate
 195 Gradients method is used for this
 196 purpose [NocedalWright]_. Second, a termination rule for
 197 the iterative solver. A typical termination rule is of the form
 198
 199 .. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
 200    :label: inexact
 201
 202 Here, :math:`k` indicates the Levenberg-Marquardt iteration number and
 203 :math:`0 < \eta_k <1` is known as the forcing sequence.  [WrightHolt]_
 204 prove that a truncated Levenberg-Marquardt algorithm that uses an
 205 inexact Newton step based on :eq:`inexact` converges for any
 206 sequence :math:`\eta_k \leq \eta_0 < 1` and the rate of convergence
 207 depends on the choice of the forcing sequence :math:`\eta_k`.
 208
 209 Ceres supports both exact and inexact step solution strategies. When
 210 the user chooses a factorization based linear solver, the exact step
 211 Levenberg-Marquardt algorithm is used. When the user chooses an
 212 iterative linear solver, the inexact step Levenberg-Marquardt
 213 algorithm is used.
 214
 215 .. _section-dogleg:
 216
 217 Dogleg
 218 ------
 219
 220 Another strategy for solving the trust region problem :eq:`trp` was
 221 introduced by M. J. D. Powell. The key idea there is to compute two
 222 vectors
 223
 224 .. math::
 225
 226         \Delta x^{\text{Gauss-Newton}} &= \arg \min_{\Delta x}\frac{1}{2} \|J(x)\Delta x + f(x)\|^2.\\
 227         \Delta x^{\text{Cauchy}} &= -\frac{\|g(x)\|^2}{\|J(x)g(x)\|^2}g(x).
 228
 229 Note that the vector :math:`\Delta x^{\text{Gauss-Newton}}` is the
 230 solution to :eq:`linearapprox` and :math:`\Delta
 231 x^{\text{Cauchy}}` is the vector that minimizes the linear
 232 approximation if we restrict ourselves to moving along the direction
 233 of the gradient. Dogleg methods finds a vector :math:`\Delta x`
 234 defined by :math:`\Delta x^{\text{Gauss-Newton}}` and :math:`\Delta
 235 x^{\text{Cauchy}}` that solves the trust region problem. Ceres
 236 supports two variants that can be chose by setting
 237 :member:`Solver::Options::dogleg_type`.
 238
 239 ``TRADITIONAL_DOGLEG`` as described by Powell, constructs two line
 240 segments using the Gauss-Newton and Cauchy vectors and finds the point
 241 farthest along this line shaped like a dogleg (hence the name) that is
 242 contained in the trust-region. For more details on the exact reasoning
 243 and computations, please see Madsen et al [Madsen]_.
 244
 245 ``SUBSPACE_DOGLEG`` is a more sophisticated method that considers the
 246 entire two dimensional subspace spanned by these two vectors and finds
 247 the point that minimizes the trust region problem in this subspace
 248 [ByrdSchnabel]_.
 249
 250 The key advantage of the Dogleg over Levenberg-Marquardt is that if
 251 the step computation for a particular choice of :math:`\mu` does not
 252 result in sufficient decrease in the value of the objective function,
 253 Levenberg-Marquardt solves the linear approximation from scratch with
 254 a smaller value of :math:`\mu`. Dogleg on the other hand, only needs
 255 to compute the interpolation between the Gauss-Newton and the Cauchy
 256 vectors, as neither of them depend on the value of :math:`\mu`.
 257
 258 The Dogleg method can only be used with the exact factorization based
 259 linear solvers.
 260
 261 .. _section-inner-iterations:
 262
 263 Inner Iterations
 264 ----------------
 265
 266 Some non-linear least squares problems have additional structure in
 267 the way the parameter blocks interact that it is beneficial to modify
 268 the way the trust region step is computed. For example, consider the
 269 following regression problem
 270
 271 .. math::   y = a_1 e^{b_1 x} + a_2 e^{b_3 x^2 + c_1}
 272
 273
 274 Given a set of pairs :math:`\{(x_i, y_i)\}`, the user wishes to estimate
 275 :math:`a_1, a_2, b_1, b_2`, and :math:`c_1`.
 276
 277 Notice that the expression on the left is linear in :math:`a_1` and
 278 :math:`a_2`, and given any value for :math:`b_1, b_2` and :math:`c_1`,
 279 it is possible to use linear regression to estimate the optimal values
 280 of :math:`a_1` and :math:`a_2`. It's possible to analytically
 281 eliminate the variables :math:`a_1` and :math:`a_2` from the problem
 282 entirely. Problems like these are known as separable least squares
 283 problem and the most famous algorithm for solving them is the Variable
 284 Projection algorithm invented by Golub & Pereyra [GolubPereyra]_.
 285
 286 Similar structure can be found in the matrix factorization with
 287 missing data problem. There the corresponding algorithm is known as
 288 Wiberg's algorithm [Wiberg]_.
 289
 290 Ruhe & Wedin present an analysis of various algorithms for solving
 291 separable non-linear least squares problems and refer to *Variable
 292 Projection* as Algorithm I in their paper [RuheWedin]_.
 293
 294 Implementing Variable Projection is tedious and expensive. Ruhe &
 295 Wedin present a simpler algorithm with comparable convergence
 296 properties, which they call Algorithm II.  Algorithm II performs an
 297 additional optimization step to estimate :math:`a_1` and :math:`a_2`
 298 exactly after computing a successful Newton step.
 299
 300
 301 This idea can be generalized to cases where the residual is not
 302 linear in :math:`a_1` and :math:`a_2`, i.e.,
 303
 304 .. math:: y = f_1(a_1, e^{b_1 x}) + f_2(a_2, e^{b_3 x^2 + c_1})
 305
 306 In this case, we solve for the trust region step for the full problem,
 307 and then use it as the starting point to further optimize just `a_1`
 308 and `a_2`. For the linear case, this amounts to doing a single linear
 309 least squares solve. For non-linear problems, any method for solving
 310 the :math:`a_1` and :math:`a_2` optimization problems will do. The
 311 only constraint on :math:`a_1` and :math:`a_2` (if they are two
 312 different parameter block) is that they do not co-occur in a residual
 313 block.
 314
 315 This idea can be further generalized, by not just optimizing
 316 :math:`(a_1, a_2)`, but decomposing the graph corresponding to the
 317 Hessian matrix's sparsity structure into a collection of
 318 non-overlapping independent sets and optimizing each of them.
 319
 320 Setting :member:`Solver::Options::use_inner_iterations` to ``true``
 321 enables the use of this non-linear generalization of Ruhe & Wedin's
 322 Algorithm II.  This version of Ceres has a higher iteration
 323 complexity, but also displays better convergence behavior per
 324 iteration.
 325
 326 Setting :member:`Solver::Options::num_threads` to the maximum number
 327 possible is highly recommended.
 328
 329 .. _section-non-monotonic-steps:
 330
 331 Non-monotonic Steps
 332 -------------------
 333
 334 Note that the basic trust-region algorithm described in
 335 :ref:`section-trust-region-methods` is a descent algorithm in that it
 336 only accepts a point if it strictly reduces the value of the objective
 337 function.
 338
 339 Relaxing this requirement allows the algorithm to be more efficient in
 340 the long term at the cost of some local increase in the value of the
 341 objective function.
 342
 343 This is because allowing for non-decreasing objective function values
 344 in a principled manner allows the algorithm to *jump over boulders* as
 345 the method is not restricted to move into narrow valleys while
 346 preserving its convergence properties.
 347
 348 Setting :member:`Solver::Options::use_nonmonotonic_steps` to ``true``
 349 enables the non-monotonic trust region algorithm as described by Conn,
 350 Gould & Toint in [Conn]_.
 351
 352 Even though the value of the objective function may be larger
 353 than the minimum value encountered over the course of the
 354 optimization, the final parameters returned to the user are the
 355 ones corresponding to the minimum cost over all iterations.
 356
 357 The option to take non-monotonic steps is available for all trust
 358 region strategies.
 359
 360
 361 .. _section-line-search-methods:
 362
 363 Line Search Methods
 364 ===================
 365
 366 The line search method in Ceres Solver cannot handle bounds
 367 constraints right now, so it can only be used for solving
 368 unconstrained problems.
 369
 370 Line search algorithms
 371
 372    1. Given an initial point :math:`x`
 373    2. :math:`\Delta x = -H^{-1}(x) g(x)`
 374    3. :math:`\arg \min_\mu \frac{1}{2} \| F(x + \mu \Delta x) \|^2`
 375    4. :math:`x = x + \mu \Delta x`
 376    5. Goto 2.
 377
 378 Here :math:`H(x)` is some approximation to the Hessian of the
 379 objective function, and :math:`g(x)` is the gradient at
 380 :math:`x`. Depending on the choice of :math:`H(x)` we get a variety of
 381 different search directions :math:`\Delta x`.
 382
 383 Step 4, which is a one dimensional optimization or `Line Search` along
 384 :math:`\Delta x` is what gives this class of methods its name.
 385
 386 Different line search algorithms differ in their choice of the search
 387 direction :math:`\Delta x` and the method used for one dimensional
 388 optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
 389 primary source of computational complexity in these
 390 methods. Currently, Ceres Solver supports three choices of search
 391 directions, all aimed at large scale problems.
 392
 393 1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
 394    be the identity matrix. This is not a good search direction for
 395    anything but the simplest of the problems. It is only included here
 396    for completeness.
 397
 398 2. ``NONLINEAR_CONJUGATE_GRADIENT`` A generalization of the Conjugate
 399    Gradient method to non-linear functions. The generalization can be
 400    performed in a number of different ways, resulting in a variety of
 401    search directions. Ceres Solver currently supports
 402    ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and ``HESTENES_STIEFEL``
 403    directions.
 404
 405 3. ``BFGS`` A generalization of the Secant method to multiple
 406    dimensions in which a full, dense approximation to the inverse
 407    Hessian is maintained and used to compute a quasi-Newton step
 408    [NocedalWright]_.  BFGS is currently the best known general
 409    quasi-Newton algorithm.
 410
 411 4. ``LBFGS`` A limited memory approximation to the full ``BFGS``
 412    method in which the last `M` iterations are used to approximate the
 413    inverse Hessian used to compute a quasi-Newton step [Nocedal]_,
 414    [ByrdNocedal]_.
 415
 416 Currently Ceres Solver supports both a backtracking and interpolation
 417 based Armijo line search algorithm, and a sectioning / zoom
 418 interpolation (strong) Wolfe condition line search algorithm.
 419 However, note that in order for the assumptions underlying the
 420 ``BFGS`` and ``LBFGS`` methods to be guaranteed to be satisfied the
 421 Wolfe line search algorithm should be used.
 422
 423 .. _section-linear-solver:
 424
 425 LinearSolver
 426 ============
 427
 428 Recall that in both of the trust-region methods described above, the
 429 key computational cost is the solution of a linear least squares
 430 problem of the form
 431
 432 .. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
 433    :label: simple2
 434
 435 Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
 436 f(x)`. For notational convenience let us also drop the dependence on
 437 :math:`x`. Then it is easy to see that solving :eq:`simple2` is
 438 equivalent to solving the *normal equations*.
 439
 440 .. math:: H \Delta x = g
 441    :label: normal
 442
 443 Ceres provides a number of different options for solving :eq:`normal`.
 444
 445 .. _section-qr:
 446
 447 ``DENSE_QR``
 448 ------------
 449
 450 For small problems (a couple of hundred parameters and a few thousand
 451 residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
 452 of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
 453 :math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
 454 an upper triangular matrix [TrefethenBau]_. Then it can be shown that
 455 the solution to :eq:`normal` is given by
 456
 457 .. math:: \Delta x^* = -R^{-1}Q^\top f
 458
 459
 460 Ceres uses ``Eigen`` 's dense QR factorization routines.
 461
 462 .. _section-cholesky:
 463
 464 ``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
 465 ------------------------------------------------------
 466
 467 Large non-linear least square problems are usually sparse. In such
 468 cases, using a dense QR factorization is inefficient. Let :math:`H =
 469 R^\top R` be the Cholesky factorization of the normal equations, where
 470 :math:`R` is an upper triangular matrix, then the solution to
 471 :eq:`normal` is given by
 472
 473 .. math::
 474
 475     \Delta x^* = R^{-1} R^{-\top} g.
 476
 477
 478 The observant reader will note that the :math:`R` in the Cholesky
 479 factorization of :math:`H` is the same upper triangular matrix
 480 :math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
 481 orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
 482 Q^\top Q R = R^\top R`. There are two variants of Cholesky
 483 factorization -- sparse and dense.
 484
 485 ``DENSE_NORMAL_CHOLESKY``  as the name implies performs a dense
 486 Cholesky factorization of the normal equations. Ceres uses
 487 ``Eigen`` 's dense LDLT factorization routines.
 488
 489 ``SPARSE_NORMAL_CHOLESKY``, as the name implies performs a sparse
 490 Cholesky factorization of the normal equations. This leads to
 491 substantial savings in time and memory for large sparse
 492 problems. Ceres uses the sparse Cholesky factorization routines in
 493 Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_
 494 or the sparse Cholesky factorization algorithm in ``Eigen`` (which
 495 incidently is a port of the algorithm implemented inside ``CXSparse``)
 496
 497 .. _section-cgnr:
 498
 499 ``CGNR``
 500 --------
 501
 502 For general sparse problems, if the problem is too large for
 503 ``CHOLMOD`` or a sparse linear algebra library is not linked into
 504 Ceres, another option is the ``CGNR`` solver. This solver uses the
 505 Conjugate Gradients solver on the *normal equations*, but without
 506 forming the normal equations explicitly. It exploits the relation
 507
 508 .. math::
 509     H x = J^\top J x = J^\top(J x)
 510
 511 The convergence of Conjugate Gradients depends on the conditioner
 512 number :math:`\kappa(H)`. Usually :math:`H` is poorly conditioned and
 513 a :ref:`section-preconditioner` must be used to get reasonable
 514 performance. Currently only the ``JACOBI`` preconditioner is available
 515 for use with ``CGNR``. It uses the block diagonal of :math:`H` to
 516 precondition the normal equations.
 517
 518 When the user chooses ``CGNR`` as the linear solver, Ceres
 519 automatically switches from the exact step algorithm to an inexact
 520 step algorithm.
 521
 522 .. _section-schur:
 523
 524 ``DENSE_SCHUR`` & ``SPARSE_SCHUR``
 525 ----------------------------------
 526
 527 While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
 528 adjustment problems, bundle adjustment problem have a special
 529 structure, and a more efficient scheme for solving :eq:`normal`
 530 can be constructed.
 531
 532 Suppose that the SfM problem consists of :math:`p` cameras and
 533 :math:`q` points and the variable vector :math:`x` has the block
 534 structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`. Where,
 535 :math:`y` and :math:`z` correspond to camera and point parameters,
 536 respectively.  Further, let the camera blocks be of size :math:`c` and
 537 the point blocks be of size :math:`s` (for most problems :math:`c` =
 538 :math:`6`--`9` and :math:`s = 3`). Ceres does not impose any constancy
 539 requirement on these block sizes, but choosing them to be constant
 540 simplifies the exposition.
 541
 542 A key characteristic of the bundle adjustment problem is that there is
 543 no term :math:`f_{i}` that includes two or more point blocks.  This in
 544 turn implies that the matrix :math:`H` is of the form
 545
 546 .. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
 547    :label: hblock
 548
 549 where :math:`B \in \mathbb{R}^{pc\times pc}` is a block sparse matrix
 550 with :math:`p` blocks of size :math:`c\times c` and :math:`C \in
 551 \mathbb{R}^{qs\times qs}` is a block diagonal matrix with :math:`q` blocks
 552 of size :math:`s\times s`. :math:`E \in \mathbb{R}^{pc\times qs}` is a
 553 general block sparse matrix, with a block of size :math:`c\times s`
 554 for each observation. Let us now block partition :math:`\Delta x =
 555 [\Delta y,\Delta z]` and :math:`g=[v,w]` to restate :eq:`normal`
 556 as the block structured linear system
 557
 558 .. math:: \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix}
 559                 \right]\left[ \begin{matrix} \Delta y \\ \Delta z
 560                     \end{matrix} \right] = \left[ \begin{matrix} v\\ w
 561                     \end{matrix} \right]\ ,
 562    :label: linear2
 563
 564 and apply Gaussian elimination to it. As we noted above, :math:`C` is
 565 a block diagonal matrix, with small diagonal blocks of size
 566 :math:`s\times s`.  Thus, calculating the inverse of :math:`C` by
 567 inverting each of these blocks is cheap. This allows us to eliminate
 568 :math:`\Delta z` by observing that :math:`\Delta z = C^{-1}(w - E^\top
 569 \Delta y)`, giving us
 570
 571 .. math:: \left[B - EC^{-1}E^\top\right] \Delta y = v - EC^{-1}w\ .
 572    :label: schur
 573
 574 The matrix
 575
 576 .. math:: S = B - EC^{-1}E^\top
 577
 578 is the Schur complement of :math:`C` in :math:`H`. It is also known as
 579 the *reduced camera matrix*, because the only variables
 580 participating in :eq:`schur` are the ones corresponding to the
 581 cameras. :math:`S \in \mathbb{R}^{pc\times pc}` is a block structured
 582 symmetric positive definite matrix, with blocks of size :math:`c\times
 583 c`. The block :math:`S_{ij}` corresponding to the pair of images
 584 :math:`i` and :math:`j` is non-zero if and only if the two images
 585 observe at least one common point.
 586
 587
 588 Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
 589 :math:`\Delta y`, and then back-substituting :math:`\Delta y` to
 590 obtain the value of :math:`\Delta z`.  Thus, the solution of what was
 591 an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
 592 inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
 593 and matrix-vector multiplies, and the solution of block sparse
 594 :math:`pc\times pc` linear system :eq:`schur`.  For almost all
 595 problems, the number of cameras is much smaller than the number of
 596 points, :math:`p \ll q`, thus solving :eq:`schur` is
 597 significantly cheaper than solving :eq:`linear2`. This is the
 598 *Schur complement trick* [Brown]_.
 599
 600 This still leaves open the question of solving :eq:`schur`. The
 601 method of choice for solving symmetric positive definite systems
 602 exactly is via the Cholesky factorization [TrefethenBau]_ and
 603 depending upon the structure of the matrix, there are, in general, two
 604 options. The first is direct factorization, where we store and factor
 605 :math:`S` as a dense matrix [TrefethenBau]_. This method has
 606 :math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
 607 is only practical for problems with up to a few hundred cameras. Ceres
 608 implements this strategy as the ``DENSE_SCHUR`` solver.
 609
 610
 611 But, :math:`S` is typically a fairly sparse matrix, as most images
 612 only see a small fraction of the scene. This leads us to the second
 613 option: Sparse Direct Methods. These methods store :math:`S` as a
 614 sparse matrix, use row and column re-ordering algorithms to maximize
 615 the sparsity of the Cholesky decomposition, and focus their compute
 616 effort on the non-zero part of the factorization [Chen]_. Sparse
 617 direct methods, depending on the exact sparsity structure of the Schur
 618 complement, allow bundle adjustment algorithms to significantly scale
 619 up over those based on dense factorization. Ceres implements this
 620 strategy as the ``SPARSE_SCHUR`` solver.
 621
 622 .. _section-iterative_schur:
 623
 624 ``ITERATIVE_SCHUR``
 625 -------------------
 626
 627 Another option for bundle adjustment problems is to apply
 628 Preconditioned Conjugate Gradients to the reduced camera matrix
 629 :math:`S` instead of :math:`H`. One reason to do this is that
 630 :math:`S` is a much smaller matrix than :math:`H`, but more
 631 importantly, it can be shown that :math:`\kappa(S)\leq \kappa(H)`.
 632 Ceres implements Conjugate Gradients on :math:`S` as the
 633 ``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
 634 as the linear solver, Ceres automatically switches from the exact step
 635 algorithm to an inexact step algorithm.
 636
 637 The key computational operation when using Conjuagate Gradients is the
 638 evaluation of the matrix vector product :math:`Sx` for an arbitrary
 639 vector :math:`x`. There are two ways in which this product can be
 640 evaluated, and this can be controlled using
 641 ``Solver::Options::use_explicit_schur_complement``. Depending on the
 642 problem at hand, the performance difference between these two methods
 643 can be quite substantial.
 644
 645   1. **Implicit** This is default. Implicit evaluation is suitable for
 646      large problems where the cost of computing and storing the Schur
 647      Complement :math:`S` is prohibitive. Because PCG only needs
 648      access to :math:`S` via its product with a vector, one way to
 649      evaluate :math:`Sx` is to observe that
 650
 651      .. math::  x_1 &= E^\top x
 652      .. math::  x_2 &= C^{-1} x_1
 653      .. math::  x_3 &= Ex_2\\
 654      .. math::  x_4 &= Bx\\
 655      .. math::   Sx &= x_4 - x_3
 656         :label: schurtrick1
 657
 658      Thus, we can run PCG on :math:`S` with the same computational
 659      effort per iteration as PCG on :math:`H`, while reaping the
 660      benefits of a more powerful preconditioner. In fact, we do not
 661      even need to compute :math:`H`, :eq:`schurtrick1` can be
 662      implemented using just the columns of :math:`J`.
 663
 664      Equation :eq:`schurtrick1` is closely related to *Domain
 665      Decomposition methods* for solving large linear systems that
 666      arise in structural engineering and partial differential
 667      equations. In the language of Domain Decomposition, each point in
 668      a bundle adjustment problem is a domain, and the cameras form the
 669      interface between these domains. The iterative solution of the
 670      Schur complement then falls within the sub-category of techniques
 671      known as Iterative Sub-structuring [Saad]_ [Mathew]_.
 672
 673   2. **Explicit** The complexity of implicit matrix-vector product
 674      evaluation scales with the number of non-zeros in the
 675      Jacobian. For small to medium sized problems, the cost of
 676      constructing the Schur Complement is small enough that it is
 677      better to construct it explicitly in memory and use it to
 678      evaluate the product :math:`Sx`.
 679
 680 When the user chooses ``ITERATIVE_SCHUR`` as the linear solver, Ceres
 681 automatically switches from the exact step algorithm to an inexact
 682 step algorithm.
 683
 684   .. NOTE::
 685
 686      In exact arithmetic, the choice of implicit versus explicit Schur
 687      complement would have no impact on solution quality. However, in
 688      practice if the Jacobian is poorly conditioned, one may observe
 689      (usually small) differences in solution quality. This is a
 690      natural consequence of performing computations in finite arithmetic.
 691
 692
 693 .. _section-preconditioner:
 694
 695 Preconditioner
 696 --------------
 697
 698 The convergence rate of Conjugate Gradients for
 699 solving :eq:`normal` depends on the distribution of eigenvalues
 700 of :math:`H` [Saad]_. A useful upper bound is
 701 :math:`\sqrt{\kappa(H)}`, where, :math:`\kappa(H)` is the condition
 702 number of the matrix :math:`H`. For most bundle adjustment problems,
 703 :math:`\kappa(H)` is high and a direct application of Conjugate
 704 Gradients to :eq:`normal` results in extremely poor performance.
 705
 706 The solution to this problem is to replace :eq:`normal` with a
 707 *preconditioned* system.  Given a linear system, :math:`Ax =b` and a
 708 preconditioner :math:`M` the preconditioned system is given by
 709 :math:`M^{-1}Ax = M^{-1}b`. The resulting algorithm is known as
 710 Preconditioned Conjugate Gradients algorithm (PCG) and its worst case
 711 complexity now depends on the condition number of the *preconditioned*
 712 matrix :math:`\kappa(M^{-1}A)`.
 713
 714 The computational cost of using a preconditioner :math:`M` is the cost
 715 of computing :math:`M` and evaluating the product :math:`M^{-1}y` for
 716 arbitrary vectors :math:`y`. Thus, there are two competing factors to
 717 consider: How much of :math:`H`'s structure is captured by :math:`M`
 718 so that the condition number :math:`\kappa(HM^{-1})` is low, and the
 719 computational cost of constructing and using :math:`M`.  The ideal
 720 preconditioner would be one for which :math:`\kappa(M^{-1}A)
 721 =1`. :math:`M=A` achieves this, but it is not a practical choice, as
 722 applying this preconditioner would require solving a linear system
 723 equivalent to the unpreconditioned problem.  It is usually the case
 724 that the more information :math:`M` has about :math:`H`, the more
 725 expensive it is use. For example, Incomplete Cholesky factorization
 726 based preconditioners have much better convergence behavior than the
 727 Jacobi preconditioner, but are also much more expensive.
 728
 729 The simplest of all preconditioners is the diagonal or Jacobi
 730 preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
 731 block structured matrices like :math:`H` can be generalized to the
 732 block Jacobi preconditioner. Ceres implements the block Jacobi
 733 preconditioner and refers to it as ``JACOBI``. When used with
 734 :ref:`section-cgnr` it refers to the block diagonal of :math:`H` and
 735 when used with :ref:`section-iterative_schur` it refers to the block
 736 diagonal of :math:`B` [Mandel]_.
 737
 738 Another obvious choice for :ref:`section-iterative_schur` is the block
 739 diagonal of the Schur complement matrix :math:`S`, i.e, the block
 740 Jacobi preconditioner for :math:`S`. Ceres implements it and refers to
 741 is as the ``SCHUR_JACOBI`` preconditioner.
 742
 743 For bundle adjustment problems arising in reconstruction from
 744 community photo collections, more effective preconditioners can be
 745 constructed by analyzing and exploiting the camera-point visibility
 746 structure of the scene [KushalAgarwal]_. Ceres implements the two
 747 visibility based preconditioners described by Kushal & Agarwal as
 748 ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. These are fairly new
 749 preconditioners and Ceres' implementation of them is in its early
 750 stages and is not as mature as the other preconditioners described
 751 above.
 752
 753 .. _section-ordering:
 754
 755 Ordering
 756 --------
 757
 758 The order in which variables are eliminated in a linear solver can
 759 have a significant of impact on the efficiency and accuracy of the
 760 method. For example when doing sparse Cholesky factorization, there
 761 are matrices for which a good ordering will give a Cholesky factor
 762 with :math:`O(n)` storage, where as a bad ordering will result in an
 763 completely dense factor.
 764
 765 Ceres allows the user to provide varying amounts of hints to the
 766 solver about the variable elimination ordering to use. This can range
 767 from no hints, where the solver is free to decide the best ordering
 768 based on the user's choices like the linear solver being used, to an
 769 exact order in which the variables should be eliminated, and a variety
 770 of possibilities in between.
 771
 772 Instances of the :class:`ParameterBlockOrdering` class are used to
 773 communicate this information to Ceres.
 774
 775 Formally an ordering is an ordered partitioning of the parameter
 776 blocks. Each parameter block belongs to exactly one group, and each
 777 group has a unique integer associated with it, that determines its
 778 order in the set of groups. We call these groups *Elimination Groups*
 779
 780 Given such an ordering, Ceres ensures that the parameter blocks in the
 781 lowest numbered elimination group are eliminated first, and then the
 782 parameter blocks in the next lowest numbered elimination group and so
 783 on. Within each elimination group, Ceres is free to order the
 784 parameter blocks as it chooses. For example, consider the linear system
 785
 786 .. math::
 787   x + y &= 3\\
 788   2x + 3y &= 7
 789
 790 There are two ways in which it can be solved. First eliminating
 791 :math:`x` from the two equations, solving for :math:`y` and then back
 792 substituting for :math:`x`, or first eliminating :math:`y`, solving
 793 for :math:`x` and back substituting for :math:`y`. The user can
 794 construct three orderings here.
 795
 796 1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
 797 2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
 798 3. :math:`\{0: x, y\}`        : Solver gets to decide the elimination order.
 799
 800 Thus, to have Ceres determine the ordering automatically using
 801 heuristics, put all the variables in the same elimination group. The
 802 identity of the group does not matter. This is the same as not
 803 specifying an ordering at all. To control the ordering for every
 804 variable, create an elimination group per variable, ordering them in
 805 the desired order.
 806
 807 If the user is using one of the Schur solvers (``DENSE_SCHUR``,
 808 ``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses to specify an
 809 ordering, it must have one important property. The lowest numbered
 810 elimination group must form an independent set in the graph
 811 corresponding to the Hessian, or in other words, no two parameter
 812 blocks in in the first elimination group should co-occur in the same
 813 residual block. For the best performance, this elimination group
 814 should be as large as possible. For standard bundle adjustment
 815 problems, this corresponds to the first elimination group containing
 816 all the 3d points, and the second containing the all the cameras
 817 parameter blocks.
 818
 819 If the user leaves the choice to Ceres, then the solver uses an
 820 approximate maximum independent set algorithm to identify the first
 821 elimination group [LiSaad]_.
 822
 823 .. _section-solver-options:
 824
 825 :class:`Solver::Options`
 826 ========================
 827
 828 .. class:: Solver::Options
 829
 830    :class:`Solver::Options` controls the overall behavior of the
 831    solver. We list the various settings and their default values below.
 832
 833 .. function:: bool Solver::Options::IsValid(string* error) const
 834
 835    Validate the values in the options struct and returns true on
 836    success. If there is a problem, the method returns false with
 837    ``error`` containing a textual description of the cause.
 838
 839 .. member:: MinimizerType Solver::Options::minimizer_type
 840
 841    Default: ``TRUST_REGION``
 842
 843    Choose between ``LINE_SEARCH`` and ``TRUST_REGION`` algorithms. See
 844    :ref:`section-trust-region-methods` and
 845    :ref:`section-line-search-methods` for more details.
 846
 847 .. member:: LineSearchDirectionType Solver::Options::line_search_direction_type
 848
 849    Default: ``LBFGS``
 850
 851    Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
 852    ``BFGS`` and ``LBFGS``.
 853
 854 .. member:: LineSearchType Solver::Options::line_search_type
 855
 856    Default: ``WOLFE``
 857
 858    Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
 859    Note that in order for the assumptions underlying the ``BFGS`` and
 860    ``LBFGS`` line search direction algorithms to be guaranteed to be
 861    satisifed, the ``WOLFE`` line search should be used.
 862
 863 .. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type
 864
 865    Default: ``FLETCHER_REEVES``
 866
 867    Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and
 868    ``HESTENES_STIEFEL``.
 869
 870 .. member:: int Solver::Options::max_lbfgs_rank
 871
 872    Default: 20
 873
 874    The L-BFGS hessian approximation is a low rank approximation to the
 875    inverse of the Hessian matrix. The rank of the approximation
 876    determines (linearly) the space and time complexity of using the
 877    approximation. Higher the rank, the better is the quality of the
 878    approximation. The increase in quality is however is bounded for a
 879    number of reasons.
 880
 881      1. The method only uses secant information and not actual
 882         derivatives.
 883
 884      2. The Hessian approximation is constrained to be positive
 885         definite.
 886
 887    So increasing this rank to a large number will cost time and space
 888    complexity without the corresponding increase in solution
 889    quality. There are no hard and fast rules for choosing the maximum
 890    rank. The best choice usually requires some problem specific
 891    experimentation.
 892
 893 .. member:: bool Solver::Options::use_approximate_eigenvalue_bfgs_scaling
 894
 895    Default: ``false``
 896
 897    As part of the ``BFGS`` update step / ``LBFGS`` right-multiply
 898    step, the initial inverse Hessian approximation is taken to be the
 899    Identity.  However, [Oren]_ showed that using instead :math:`I *
 900    \gamma`, where :math:`\gamma` is a scalar chosen to approximate an
 901    eigenvalue of the true inverse Hessian can result in improved
 902    convergence in a wide variety of cases.  Setting
 903    ``use_approximate_eigenvalue_bfgs_scaling`` to true enables this
 904    scaling in ``BFGS`` (before first iteration) and ``LBFGS`` (at each
 905    iteration).
 906
 907    Precisely, approximate eigenvalue scaling equates to
 908
 909    .. math:: \gamma = \frac{y_k' s_k}{y_k' y_k}
 910
 911    With:
 912
 913   .. math:: y_k = \nabla f_{k+1} - \nabla f_k
 914   .. math:: s_k = x_{k+1} - x_k
 915
 916   Where :math:`f()` is the line search objective and :math:`x` the
 917   vector of parameter values [NocedalWright]_.
 918
 919   It is important to note that approximate eigenvalue scaling does
 920   **not** *always* improve convergence, and that it can in fact
 921   *significantly* degrade performance for certain classes of problem,
 922   which is why it is disabled by default.  In particular it can
 923   degrade performance when the sensitivity of the problem to different
 924   parameters varies significantly, as in this case a single scalar
 925   factor fails to capture this variation and detrimentally downscales
 926   parts of the Jacobian approximation which correspond to
 927   low-sensitivity parameters. It can also reduce the robustness of the
 928   solution to errors in the Jacobians.
 929
 930 .. member:: LineSearchIterpolationType Solver::Options::line_search_interpolation_type
 931
 932    Default: ``CUBIC``
 933
 934    Degree of the polynomial used to approximate the objective
 935    function. Valid values are ``BISECTION``, ``QUADRATIC`` and
 936    ``CUBIC``.
 937
 938 .. member:: double Solver::Options::min_line_search_step_size
 939
 940    The line search terminates if:
 941
 942    .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}
 943
 944    where :math:`\|\cdot\|_\infty` refers to the max norm, and
 945    :math:`\Delta x_k` is the step change in the parameter values at
 946    the :math:`k`-th iteration.
 947
 948 .. member:: double Solver::Options::line_search_sufficient_function_decrease
 949
 950    Default: ``1e-4``
 951
 952    Solving the line search problem exactly is computationally
 953    prohibitive. Fortunately, line search based optimization algorithms
 954    can still guarantee convergence if instead of an exact solution,
 955    the line search algorithm returns a solution which decreases the
 956    value of the objective function sufficiently. More precisely, we
 957    are looking for a step size s.t.
 958
 959    .. math:: f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]
 960
 961    This condition is known as the Armijo condition.
 962
 963 .. member:: double Solver::Options::max_line_search_step_contraction
 964
 965    Default: ``1e-3``
 966
 967    In each iteration of the line search,
 968
 969    .. math:: \text{new_step_size} >= \text{max_line_search_step_contraction} * \text{step_size}
 970
 971    Note that by definition, for contraction:
 972
 973    .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1
 974
 975 .. member:: double Solver::Options::min_line_search_step_contraction
 976
 977    Default: ``0.6``
 978
 979    In each iteration of the line search,
 980
 981    .. math:: \text{new_step_size} <= \text{min_line_search_step_contraction} * \text{step_size}
 982
 983    Note that by definition, for contraction:
 984
 985    .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1
 986
 987 .. member:: int Solver::Options::max_num_line_search_step_size_iterations
 988
 989    Default: ``20``
 990
 991    Maximum number of trial step size iterations during each line
 992    search, if a step size satisfying the search conditions cannot be
 993    found within this number of trials, the line search will stop.
 994
 995    As this is an 'artificial' constraint (one imposed by the user, not
 996    the underlying math), if ``WOLFE`` line search is being used, *and*
 997    points satisfying the Armijo sufficient (function) decrease
 998    condition have been found during the current search (in :math:`<=`
 999    ``max_num_line_search_step_size_iterations``).  Then, the step size
1000    with the lowest function value which satisfies the Armijo condition
1001    will be returned as the new valid step, even though it does *not*
1002    satisfy the strong Wolfe conditions.  This behaviour protects
1003    against early termination of the optimizer at a sub-optimal point.
1004
1005 .. member:: int Solver::Options::max_num_line_search_direction_restarts
1006
1007    Default: ``5``
1008
1009    Maximum number of restarts of the line search direction algorithm
1010    before terminating the optimization. Restarts of the line search
1011    direction algorithm occur when the current algorithm fails to
1012    produce a new descent direction. This typically indicates a
1013    numerical failure, or a breakdown in the validity of the
1014    approximations used.
1015
1016 .. member:: double Solver::Options::line_search_sufficient_curvature_decrease
1017
1018    Default: ``0.9``
1019
1020    The strong Wolfe conditions consist of the Armijo sufficient
1021    decrease condition, and an additional requirement that the
1022    step size be chosen s.t. the *magnitude* ('strong' Wolfe
1023    conditions) of the gradient along the search direction
1024    decreases sufficiently. Precisely, this second condition
1025    is that we seek a step size s.t.
1026
1027    .. math:: \|f'(\text{step_size})\| <= \text{sufficient_curvature_decrease} * \|f'(0)\|
1028
1029    Where :math:`f()` is the line search objective and :math:`f'()` is the derivative
1030    of :math:`f` with respect to the step size: :math:`\frac{d f}{d~\text{step size}}`.
1031
1032 .. member:: double Solver::Options::max_line_search_step_expansion
1033
1034    Default: ``10.0``
1035
1036    During the bracketing phase of a Wolfe line search, the step size
1037    is increased until either a point satisfying the Wolfe conditions
1038    is found, or an upper bound for a bracket containing a point
1039    satisfying the conditions is found.  Precisely, at each iteration
1040    of the expansion:
1041
1042    .. math:: \text{new_step_size} <= \text{max_step_expansion} * \text{step_size}
1043
1044    By definition for expansion
1045
1046    .. math:: \text{max_step_expansion} > 1.0
1047
1048 .. member:: TrustRegionStrategyType Solver::Options::trust_region_strategy_type
1049
1050    Default: ``LEVENBERG_MARQUARDT``
1051
1052    The trust region step computation algorithm used by
1053    Ceres. Currently ``LEVENBERG_MARQUARDT`` and ``DOGLEG`` are the two
1054    valid choices. See :ref:`section-levenberg-marquardt` and
1055    :ref:`section-dogleg` for more details.
1056
1057 .. member:: DoglegType Solver::Options::dogleg_type
1058
1059    Default: ``TRADITIONAL_DOGLEG``
1060
1061    Ceres supports two different dogleg strategies.
1062    ``TRADITIONAL_DOGLEG`` method by Powell and the ``SUBSPACE_DOGLEG``
1063    method described by [ByrdSchnabel]_ .  See :ref:`section-dogleg`
1064    for more details.
1065
1066 .. member:: bool Solver::Options::use_nonmonotonic_steps
1067
1068    Default: ``false``
1069
1070    Relax the requirement that the trust-region algorithm take strictly
1071    decreasing steps. See :ref:`section-non-monotonic-steps` for more
1072    details.
1073
1074 .. member:: int Solver::Options::max_consecutive_nonmonotonic_steps
1075
1076    Default: ``5``
1077
1078    The window size used by the step selection algorithm to accept
1079    non-monotonic steps.
1080
1081 .. member:: int Solver::Options::max_num_iterations
1082
1083    Default: ``50``
1084
1085    Maximum number of iterations for which the solver should run.
1086
1087 .. member:: double Solver::Options::max_solver_time_in_seconds
1088
1089    Default: ``1e6``
1090    Maximum amount of time for which the solver should run.
1091
1092 .. member:: int Solver::Options::num_threads
1093
1094    Default: ``1``
1095
1096    Number of threads used by Ceres to evaluate the Jacobian.
1097
1098 .. member::  double Solver::Options::initial_trust_region_radius
1099
1100    Default: ``1e4``
1101
1102    The size of the initial trust region. When the
1103    ``LEVENBERG_MARQUARDT`` strategy is used, the reciprocal of this
1104    number is the initial regularization parameter.
1105
1106 .. member:: double Solver::Options::max_trust_region_radius
1107
1108    Default: ``1e16``
1109
1110    The trust region radius is not allowed to grow beyond this value.
1111
1112 .. member:: double Solver::Options::min_trust_region_radius
1113
1114    Default: ``1e-32``
1115
1116    The solver terminates, when the trust region becomes smaller than
1117    this value.
1118
1119 .. member:: double Solver::Options::min_relative_decrease
1120
1121    Default: ``1e-3``
1122
1123    Lower threshold for relative decrease before a trust-region step is
1124    accepted.
1125
1126 .. member:: double Solver::Options::min_lm_diagonal
1127
1128    Default: ``1e6``
1129
1130    The ``LEVENBERG_MARQUARDT`` strategy, uses a diagonal matrix to
1131    regularize the trust region step. This is the lower bound on
1132    the values of this diagonal matrix.
1133
1134 .. member:: double Solver::Options::max_lm_diagonal
1135
1136    Default:  ``1e32``
1137
1138    The ``LEVENBERG_MARQUARDT`` strategy, uses a diagonal matrix to
1139    regularize the trust region step. This is the upper bound on
1140    the values of this diagonal matrix.
1141
1142 .. member:: int Solver::Options::max_num_consecutive_invalid_steps
1143
1144    Default: ``5``
1145
1146    The step returned by a trust region strategy can sometimes be
1147    numerically invalid, usually because of conditioning
1148    issues. Instead of crashing or stopping the optimization, the
1149    optimizer can go ahead and try solving with a smaller trust
1150    region/better conditioned problem. This parameter sets the number
1151    of consecutive retries before the minimizer gives up.
1152
1153 .. member:: double Solver::Options::function_tolerance
1154
1155    Default: ``1e-6``
1156
1157    Solver terminates if
1158
1159    .. math:: \frac{|\Delta \text{cost}|}{\text{cost}} <= \text{function_tolerance}
1160
1161    where, :math:`\Delta \text{cost}` is the change in objective
1162    function value (up or down) in the current iteration of
1163    Levenberg-Marquardt.
1164
1165 .. member:: double Solver::Options::gradient_tolerance
1166
1167    Default: ``1e-10``
1168
1169    Solver terminates if
1170
1171    .. math:: \|x - \Pi \boxplus(x, -g(x))\|_\infty <= \text{gradient_tolerance}
1172
1173    where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
1174    is projection onto the bounds constraints and :math:`\boxplus` is
1175    Plus operation for the overall local parameterization associated
1176    with the parameter vector.
1177
1178 .. member:: double Solver::Options::parameter_tolerance
1179
1180    Default: ``1e-8``
1181
1182    Solver terminates if
1183
1184    .. math:: \|\Delta x\| <= (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}
1185
1186    where :math:`\Delta x` is the step computed by the linear solver in
1187    the current iteration.
1188
1189 .. member:: LinearSolverType Solver::Options::linear_solver_type
1190
1191    Default: ``SPARSE_NORMAL_CHOLESKY`` / ``DENSE_QR``
1192
1193    Type of linear solver used to compute the solution to the linear
1194    least squares problem in each iteration of the Levenberg-Marquardt
1195    algorithm. If Ceres is built with support for ``SuiteSparse`` or
1196    ``CXSparse`` or ``Eigen``'s sparse Cholesky factorization, the
1197    default is ``SPARSE_NORMAL_CHOLESKY``, it is ``DENSE_QR``
1198    otherwise.
1199
1200 .. member:: PreconditionerType Solver::Options::preconditioner_type
1201
1202    Default: ``JACOBI``
1203
1204    The preconditioner used by the iterative linear solver. The default
1205    is the block Jacobi preconditioner. Valid values are (in increasing
1206    order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``,
1207    ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See
1208    :ref:`section-preconditioner` for more details.
1209
1210 .. member:: VisibilityClusteringType Solver::Options::visibility_clustering_type
1211
1212    Default: ``CANONICAL_VIEWS``
1213
1214    Type of clustering algorithm to use when constructing a visibility
1215    based preconditioner. The original visibility based preconditioning
1216    paper and implementation only used the canonical views algorithm.
1217
1218    This algorithm gives high quality results but for large dense
1219    graphs can be particularly expensive. As its worst case complexity
1220    is cubic in size of the graph.
1221
1222    Another option is to use ``SINGLE_LINKAGE`` which is a simple
1223    thresholded single linkage clustering algorithm that only pays
1224    attention to tightly coupled blocks in the Schur complement. This
1225    is a fast algorithm that works well.
1226
1227    The optimal choice of the clustering algorithm depends on the
1228    sparsity structure of the problem, but generally speaking we
1229    recommend that you try ``CANONICAL_VIEWS`` first and if it is too
1230    expensive try ``SINGLE_LINKAGE``.
1231
1232 .. member:: DenseLinearAlgebraLibrary Solver::Options::dense_linear_algebra_library_type
1233
1234    Default:``EIGEN``
1235
1236    Ceres supports using multiple dense linear algebra libraries for
1237    dense matrix factorizations. Currently ``EIGEN`` and ``LAPACK`` are
1238    the valid choices. ``EIGEN`` is always available, ``LAPACK`` refers
1239    to the system ``BLAS + LAPACK`` library which may or may not be
1240    available.
1241
1242    This setting affects the ``DENSE_QR``, ``DENSE_NORMAL_CHOLESKY``
1243    and ``DENSE_SCHUR`` solvers. For small to moderate sized probem
1244    ``EIGEN`` is a fine choice but for large problems, an optimized
1245    ``LAPACK + BLAS`` implementation can make a substantial difference
1246    in performance.
1247
1248 .. member:: SparseLinearAlgebraLibrary Solver::Options::sparse_linear_algebra_library_type
1249
1250    Default: The highest available according to: ``SUITE_SPARSE`` >
1251    ``CX_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``
1252
1253    Ceres supports the use of three sparse linear algebra libraries,
1254    ``SuiteSparse``, which is enabled by setting this parameter to
1255    ``SUITE_SPARSE``, ``CXSparse``, which can be selected by setting
1256    this parameter to ``CX_SPARSE`` and ``Eigen`` which is enabled by
1257    setting this parameter to ``EIGEN_SPARSE``.  Lastly, ``NO_SPARSE``
1258    means that no sparse linear solver should be used; note that this is
1259    irrespective of whether Ceres was compiled with support for one.
1260
1261    ``SuiteSparse`` is a sophisticated and complex sparse linear
1262    algebra library and should be used in general.
1263
1264    If your needs/platforms prevent you from using ``SuiteSparse``,
1265    consider using ``CXSparse``, which is a much smaller, easier to
1266    build library. As can be expected, its performance on large
1267    problems is not comparable to that of ``SuiteSparse``.
1268
1269    Last but not the least you can use the sparse linear algebra
1270    routines in ``Eigen``. Currently the performance of this library is
1271    the poorest of the three. But this should change in the near
1272    future.
1273
1274    Another thing to consider here is that the sparse Cholesky
1275    factorization libraries in Eigen are licensed under ``LGPL`` and
1276    building Ceres with support for ``EIGEN_SPARSE`` will result in an
1277    LGPL licensed library (since the corresponding code from Eigen is
1278    compiled into the library).
1279
1280    The upside is that you do not need to build and link to an external
1281    library to use ``EIGEN_SPARSE``.
1282
1283 .. member:: int Solver::Options::num_linear_solver_threads
1284
1285    Default: ``1``
1286
1287    Number of threads used by the linear solver.
1288
1289 .. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering
1290
1291    Default: ``NULL``
1292
1293    An instance of the ordering object informs the solver about the
1294    desired order in which parameter blocks should be eliminated by the
1295    linear solvers. See section~\ref{sec:ordering`` for more details.
1296
1297    If ``NULL``, the solver is free to choose an ordering that it
1298    thinks is best.
1299
1300    See :ref:`section-ordering` for more details.
1301
1302 .. member:: bool Solver::Options::use_explicit_schur_complement
1303
1304    Default: ``false``
1305
1306    Use an explicitly computed Schur complement matrix with
1307    ``ITERATIVE_SCHUR``.
1308
1309    By default this option is disabled and ``ITERATIVE_SCHUR``
1310    evaluates evaluates matrix-vector products between the Schur
1311    complement and a vector implicitly by exploiting the algebraic
1312    expression for the Schur complement.
1313
1314    The cost of this evaluation scales with the number of non-zeros in
1315    the Jacobian.
1316
1317    For small to medium sized problems there is a sweet spot where
1318    computing the Schur complement is cheap enough that it is much more
1319    efficient to explicitly compute it and use it for evaluating the
1320    matrix-vector products.
1321
1322    Enabling this option tells ``ITERATIVE_SCHUR`` to use an explicitly
1323    computed Schur complement. This can improve the performance of the
1324    ``ITERATIVE_SCHUR`` solver significantly.
1325
1326    .. NOTE:
1327
1328      This option can only be used with the ``SCHUR_JACOBI``
1329      preconditioner.
1330
1331 .. member:: bool Solver::Options::use_post_ordering
1332
1333    Default: ``false``
1334
1335    Sparse Cholesky factorization algorithms use a fill-reducing
1336    ordering to permute the columns of the Jacobian matrix. There are
1337    two ways of doing this.
1338
1339    1. Compute the Jacobian matrix in some order and then have the
1340       factorization algorithm permute the columns of the Jacobian.
1341
1342    2. Compute the Jacobian with its columns already permuted.
1343
1344    The first option incurs a significant memory penalty. The
1345    factorization algorithm has to make a copy of the permuted Jacobian
1346    matrix, thus Ceres pre-permutes the columns of the Jacobian matrix
1347    and generally speaking, there is no performance penalty for doing
1348    so.
1349
1350    In some rare cases, it is worth using a more complicated reordering
1351    algorithm which has slightly better runtime performance at the
1352    expense of an extra copy of the Jacobian matrix. Setting
1353    ``use_postordering`` to ``true`` enables this tradeoff.
1354
1355 .. member:: bool Solver::Options::dynamic_sparsity
1356
1357    Some non-linear least squares problems are symbolically dense but
1358    numerically sparse. i.e. at any given state only a small number of
1359    Jacobian entries are non-zero, but the position and number of
1360    non-zeros is different depending on the state. For these problems
1361    it can be useful to factorize the sparse jacobian at each solver
1362    iteration instead of including all of the zero entries in a single
1363    general factorization.
1364
1365    If your problem does not have this property (or you do not know),
1366    then it is probably best to keep this false, otherwise it will
1367    likely lead to worse performance.
1368
1369    This setting only affects the `SPARSE_NORMAL_CHOLESKY` solver.
1370
1371 .. member:: int Solver::Options::min_linear_solver_iterations
1372
1373    Default: ``0``
1374
1375    Minimum number of iterations used by the linear solver. This only
1376    makes sense when the linear solver is an iterative solver, e.g.,
1377    ``ITERATIVE_SCHUR`` or ``CGNR``.
1378
1379 .. member:: int Solver::Options::max_linear_solver_iterations
1380
1381    Default: ``500``
1382
1383    Minimum number of iterations used by the linear solver. This only
1384    makes sense when the linear solver is an iterative solver, e.g.,
1385    ``ITERATIVE_SCHUR`` or ``CGNR``.
1386
1387 .. member:: double Solver::Options::eta
1388
1389    Default: ``1e-1``
1390
1391    Forcing sequence parameter. The truncated Newton solver uses this
1392    number to control the relative accuracy with which the Newton step
1393    is computed. This constant is passed to
1394    ``ConjugateGradientsSolver`` which uses it to terminate the
1395    iterations when
1396
1397    .. math:: \frac{Q_i - Q_{i-1}}{Q_i} < \frac{\eta}{i}
1398
1399 .. member:: bool Solver::Options::jacobi_scaling
1400
1401    Default: ``true``
1402
1403    ``true`` means that the Jacobian is scaled by the norm of its
1404    columns before being passed to the linear solver. This improves the
1405    numerical conditioning of the normal equations.
1406
1407 .. member:: bool Solver::Options::use_inner_iterations
1408
1409    Default: ``false``
1410
1411    Use a non-linear version of a simplified variable projection
1412    algorithm. Essentially this amounts to doing a further optimization
1413    on each Newton/Trust region step using a coordinate descent
1414    algorithm.  For more details, see :ref:`section-inner-iterations`.
1415
1416 .. member:: double Solver::Options::inner_iteration_tolerance
1417
1418    Default: ``1e-3``
1419
1420    Generally speaking, inner iterations make significant progress in
1421    the early stages of the solve and then their contribution drops
1422    down sharply, at which point the time spent doing inner iterations
1423    is not worth it.
1424
1425    Once the relative decrease in the objective function due to inner
1426    iterations drops below ``inner_iteration_tolerance``, the use of
1427    inner iterations in subsequent trust region minimizer iterations is
1428    disabled.
1429
1430 .. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
1431
1432    Default: ``NULL``
1433
1434    If :member:`Solver::Options::use_inner_iterations` true, then the
1435    user has two choices.
1436
1437    1. Let the solver heuristically decide which parameter blocks to
1438       optimize in each inner iteration. To do this, set
1439       :member:`Solver::Options::inner_iteration_ordering` to ``NULL``.
1440
1441    2. Specify a collection of of ordered independent sets. The lower
1442       numbered groups are optimized before the higher number groups
1443       during the inner optimization phase. Each group must be an
1444       independent set. Not all parameter blocks need to be included in
1445       the ordering.
1446
1447    See :ref:`section-ordering` for more details.
1448
1449 .. member:: LoggingType Solver::Options::logging_type
1450
1451    Default: ``PER_MINIMIZER_ITERATION``
1452
1453 .. member:: bool Solver::Options::minimizer_progress_to_stdout
1454
1455    Default: ``false``
1456
1457    By default the :class:`Minimizer` progress is logged to ``STDERR``
1458    depending on the ``vlog`` level. If this flag is set to true, and
1459    :member:`Solver::Options::logging_type` is not ``SILENT``, the logging
1460    output is sent to ``STDOUT``.
1461
1462    For ``TRUST_REGION_MINIMIZER`` the progress display looks like
1463
1464    .. code-block:: bash
1465
1466       iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
1467          0  4.185660e+06    0.00e+00    1.09e+08   0.00e+00   0.00e+00  1.00e+04       0    7.59e-02    3.37e-01
1468          1  1.062590e+05    4.08e+06    8.99e+06   5.36e+02   9.82e-01  3.00e+04       1    1.65e-01    5.03e-01
1469          2  4.992817e+04    5.63e+04    8.32e+06   3.19e+02   6.52e-01  3.09e+04       1    1.45e-01    6.48e-01
1470
1471    Here
1472
1473    #. ``cost`` is the value of the objective function.
1474    #. ``cost_change`` is the change in the value of the objective
1475       function if the step computed in this iteration is accepted.
1476    #. ``|gradient|`` is the max norm of the gradient.
1477    #. ``|step|`` is the change in the parameter vector.
1478    #. ``tr_ratio`` is the ratio of the actual change in the objective
1479       function value to the change in the value of the trust
1480       region model.
1481    #. ``tr_radius`` is the size of the trust region radius.
1482    #. ``ls_iter`` is the number of linear solver iterations used to
1483       compute the trust region step. For direct/factorization based
1484       solvers it is always 1, for iterative solvers like
1485       ``ITERATIVE_SCHUR`` it is the number of iterations of the
1486       Conjugate Gradients algorithm.
1487    #. ``iter_time`` is the time take by the current iteration.
1488    #. ``total_time`` is the total time taken by the minimizer.
1489
1490    For ``LINE_SEARCH_MINIMIZER`` the progress display looks like
1491
1492    .. code-block:: bash
1493
1494       0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e:  0 it: 2.98e-02 tt: 8.50e-02
1495       1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e:  1 it: 4.54e-02 tt: 1.31e-01
1496       2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e:  1 it: 4.96e-02 tt: 1.81e-01
1497
1498    Here
1499
1500    #. ``f`` is the value of the objective function.
1501    #. ``d`` is the change in the value of the objective function if
1502       the step computed in this iteration is accepted.
1503    #. ``g`` is the max norm of the gradient.
1504    #. ``h`` is the change in the parameter vector.
1505    #. ``s`` is the optimal step length computed by the line search.
1506    #. ``it`` is the time take by the current iteration.
1507    #. ``tt`` is the total time taken by the minimizer.
1508
1509 .. member:: vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
1510
1511    Default: ``empty``
1512
1513    List of iterations at which the trust region minimizer should dump
1514    the trust region problem. Useful for testing and benchmarking. If
1515    ``empty``, no problems are dumped.
1516
1517 .. member:: string Solver::Options::trust_region_problem_dump_directory
1518
1519    Default: ``/tmp``
1520
1521     Directory to which the problems should be written to. Should be
1522     non-empty if
1523     :member:`Solver::Options::trust_region_minimizer_iterations_to_dump` is
1524     non-empty and
1525     :member:`Solver::Options::trust_region_problem_dump_format_type` is not
1526     ``CONSOLE``.
1527
1528 .. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format
1529
1530    Default: ``TEXTFILE``
1531
1532    The format in which trust region problems should be logged when
1533    :member:`Solver::Options::trust_region_minimizer_iterations_to_dump`
1534    is non-empty.  There are three options:
1535
1536    * ``CONSOLE`` prints the linear least squares problem in a human
1537       readable format to ``stderr``. The Jacobian is printed as a
1538       dense matrix. The vectors :math:`D`, :math:`x` and :math:`f` are
1539       printed as dense vectors. This should only be used for small
1540       problems.
1541
1542    * ``TEXTFILE`` Write out the linear least squares problem to the
1543      directory pointed to by
1544      :member:`Solver::Options::trust_region_problem_dump_directory` as
1545      text files which can be read into ``MATLAB/Octave``. The Jacobian
1546      is dumped as a text file containing :math:`(i,j,s)` triplets, the
1547      vectors :math:`D`, `x` and `f` are dumped as text files
1548      containing a list of their values.
1549
1550      A ``MATLAB/Octave`` script called
1551      ``ceres_solver_iteration_???.m`` is also output, which can be
1552      used to parse and load the problem into memory.
1553
1554 .. member:: bool Solver::Options::check_gradients
1555
1556    Default: ``false``
1557
1558    Check all Jacobians computed by each residual block with finite
1559    differences. This is expensive since it involves computing the
1560    derivative by normal means (e.g. user specified, autodiff, etc),
1561    then also computing it using finite differences. The results are
1562    compared, and if they differ substantially, the optimization fails
1563    and the details are stored in the solver summary.
1564
1565 .. member:: double Solver::Options::gradient_check_relative_precision
1566
1567    Default: ``1e08``
1568
1569    Precision to check for in the gradient checker. If the relative
1570    difference between an element in a Jacobian exceeds this number,
1571    then the Jacobian for that cost term is dumped.
1572
1573 .. member:: double Solver::Options::gradient_check_numeric_derivative_relative_step_size
1574
1575    Default: ``1e-6``
1576
1577    .. NOTE::
1578
1579       This option only applies to the numeric differentiation used for
1580       checking the user provided derivatives when when
1581       `Solver::Options::check_gradients` is true. If you are using
1582       :class:`NumericDiffCostFunction` and are interested in changing
1583       the step size for numeric differentiation in your cost function,
1584       please have a look at :class:`NumericDiffOptions`.
1585
1586    Relative shift used for taking numeric derivatives when
1587    `Solver::Options::check_gradients` is `true`.
1588
1589    For finite differencing, each dimension is evaluated at slightly
1590    shifted values, e.g., for forward differences, the numerical
1591    derivative is
1592
1593    .. math::
1594
1595      \delta &= gradient\_check\_numeric\_derivative\_relative\_step\_size\\
1596      \Delta f &= \frac{f((1 + \delta)  x) - f(x)}{\delta x}
1597
1598    The finite differencing is done along each dimension. The reason to
1599    use a relative (rather than absolute) step size is that this way,
1600    numeric differentiation works for functions where the arguments are
1601    typically large (e.g. :math:`10^9`) and when the values are small
1602    (e.g. :math:`10^{-5}`). It is possible to construct *torture cases*
1603    which break this finite difference heuristic, but they do not come
1604    up often in practice.
1605
1606 .. member:: vector<IterationCallback> Solver::Options::callbacks
1607
1608    Callbacks that are executed at the end of each iteration of the
1609    :class:`Minimizer`. They are executed in the order that they are
1610    specified in this vector. By default, parameter blocks are updated
1611    only at the end of the optimization, i.e., when the
1612    :class:`Minimizer` terminates. This behavior is controlled by
1613    :member:`Solver::Options::update_state_every_variable`. If the user
1614    wishes to have access to the update parameter blocks when his/her
1615    callbacks are executed, then set
1616    :member:`Solver::Options::update_state_every_iteration` to true.
1617
1618    The solver does NOT take ownership of these pointers.
1619
1620 .. member:: bool Solver::Options::update_state_every_iteration
1621
1622    Default: ``false``
1623
1624    Normally the parameter blocks are only updated when the solver
1625    terminates. Setting this to true update them in every
1626    iteration. This setting is useful when building an interactive
1627    application using Ceres and using an :class:`IterationCallback`.
1628
1629 :class:`ParameterBlockOrdering`
1630 ===============================
1631
1632 .. class:: ParameterBlockOrdering
1633
1634    ``ParameterBlockOrdering`` is a class for storing and manipulating
1635    an ordered collection of groups/sets with the following semantics:
1636
1637    Group IDs are non-negative integer values. Elements are any type
1638    that can serve as a key in a map or an element of a set.
1639
1640    An element can only belong to one group at a time. A group may
1641    contain an arbitrary number of elements.
1642
1643    Groups are ordered by their group id.
1644
1645 .. function:: bool ParameterBlockOrdering::AddElementToGroup(const double* element, const int group)
1646
1647    Add an element to a group. If a group with this id does not exist,
1648    one is created. This method can be called any number of times for
1649    the same element. Group ids should be non-negative numbers.  Return
1650    value indicates if adding the element was a success.
1651
1652 .. function:: void ParameterBlockOrdering::Clear()
1653
1654    Clear the ordering.
1655
1656 .. function:: bool ParameterBlockOrdering::Remove(const double* element)
1657
1658    Remove the element, no matter what group it is in. If the element
1659    is not a member of any group, calling this method will result in a
1660    crash.  Return value indicates if the element was actually removed.
1661
1662 .. function:: void ParameterBlockOrdering::Reverse()
1663
1664    Reverse the order of the groups in place.
1665
1666 .. function:: int ParameterBlockOrdering::GroupId(const double* element) const
1667
1668    Return the group id for the element. If the element is not a member
1669    of any group, return -1.
1670
1671 .. function:: bool ParameterBlockOrdering::IsMember(const double* element) const
1672
1673    True if there is a group containing the parameter block.
1674
1675 .. function:: int ParameterBlockOrdering::GroupSize(const int group) const
1676
1677    This function always succeeds, i.e., implicitly there exists a
1678    group for every integer.
1679
1680 .. function:: int ParameterBlockOrdering::NumElements() const
1681
1682    Number of elements in the ordering.
1683
1684 .. function:: int ParameterBlockOrdering::NumGroups() const
1685
1686    Number of groups with one or more elements.
1687
1688
1689 :class:`IterationCallback`
1690 ==========================
1691
1692 .. class:: IterationSummary
1693
1694    :class:`IterationSummary` describes the state of the minimizer at
1695    the end of each iteration.
1696
1697 .. member:: int32 IterationSummary::iteration
1698
1699    Current iteration number.
1700
1701 .. member:: bool IterationSummary::step_is_valid
1702
1703    Step was numerically valid, i.e., all values are finite and the
1704    step reduces the value of the linearized model.
1705
1706     **Note**: :member:`IterationSummary::step_is_valid` is `false`
1707     when :member:`IterationSummary::iteration` = 0.
1708
1709 .. member::  bool IterationSummary::step_is_nonmonotonic
1710
1711     Step did not reduce the value of the objective function
1712     sufficiently, but it was accepted because of the relaxed
1713     acceptance criterion used by the non-monotonic trust region
1714     algorithm.
1715
1716     **Note**: :member:`IterationSummary::step_is_nonmonotonic` is
1717     `false` when when :member:`IterationSummary::iteration` = 0.
1718
1719 .. member:: bool IterationSummary::step_is_successful
1720
1721    Whether or not the minimizer accepted this step or not.
1722
1723    If the ordinary trust region algorithm is used, this means that the
1724    relative reduction in the objective function value was greater than
1725    :member:`Solver::Options::min_relative_decrease`. However, if the
1726    non-monotonic trust region algorithm is used
1727    (:member:`Solver::Options::use_nonmonotonic_steps` = `true`), then
1728    even if the relative decrease is not sufficient, the algorithm may
1729    accept the step and the step is declared successful.
1730
1731    **Note**: :member:`IterationSummary::step_is_successful` is `false`
1732    when when :member:`IterationSummary::iteration` = 0.
1733
1734 .. member:: double IterationSummary::cost
1735
1736    Value of the objective function.
1737
1738 .. member:: double IterationSummary::cost_change
1739
1740    Change in the value of the objective function in this
1741    iteration. This can be positive or negative.
1742
1743 .. member:: double IterationSummary::gradient_max_norm
1744
1745    Infinity norm of the gradient vector.
1746
1747 .. member:: double IterationSummary::gradient_norm
1748
1749    2-norm of the gradient vector.
1750
1751 .. member:: double IterationSummary::step_norm
1752
1753    2-norm of the size of the step computed in this iteration.
1754
1755 .. member:: double IterationSummary::relative_decrease
1756
1757    For trust region algorithms, the ratio of the actual change in cost
1758    and the change in the cost of the linearized approximation.
1759
1760    This field is not used when a linear search minimizer is used.
1761
1762 .. member:: double IterationSummary::trust_region_radius
1763
1764    Size of the trust region at the end of the current iteration. For
1765    the Levenberg-Marquardt algorithm, the regularization parameter is
1766    1.0 / member::`IterationSummary::trust_region_radius`.
1767
1768 .. member:: double IterationSummary::eta
1769
1770    For the inexact step Levenberg-Marquardt algorithm, this is the
1771    relative accuracy with which the step is solved. This number is
1772    only applicable to the iterative solvers capable of solving linear
1773    systems inexactly. Factorization-based exact solvers always have an
1774    eta of 0.0.
1775
1776 .. member:: double IterationSummary::step_size
1777
1778    Step sized computed by the line search algorithm.
1779
1780    This field is not used when a trust region minimizer is used.
1781
1782 .. member:: int IterationSummary::line_search_function_evaluations
1783
1784    Number of function evaluations used by the line search algorithm.
1785
1786    This field is not used when a trust region minimizer is used.
1787
1788 .. member:: int IterationSummary::linear_solver_iterations
1789
1790    Number of iterations taken by the linear solver to solve for the
1791    trust region step.
1792
1793    Currently this field is not used when a line search minimizer is
1794    used.
1795
1796 .. member:: double IterationSummary::iteration_time_in_seconds
1797
1798    Time (in seconds) spent inside the minimizer loop in the current
1799    iteration.
1800
1801 .. member:: double IterationSummary::step_solver_time_in_seconds
1802
1803    Time (in seconds) spent inside the trust region step solver.
1804
1805 .. member:: double IterationSummary::cumulative_time_in_seconds
1806
1807    Time (in seconds) since the user called Solve().
1808
1809
1810 .. class:: IterationCallback
1811
1812    Interface for specifying callbacks that are executed at the end of
1813    each iteration of the minimizer.
1814
1815    .. code-block:: c++
1816
1817       class IterationCallback {
1818        public:
1819         virtual ~IterationCallback() {}
1820         virtual CallbackReturnType operator()(const IterationSummary& summary) = 0;
1821       };
1822
1823
1824   The solver uses the return value of ``operator()`` to decide whether
1825   to continue solving or to terminate. The user can return three
1826   values.
1827
1828   #. ``SOLVER_ABORT`` indicates that the callback detected an abnormal
1829      situation. The solver returns without updating the parameter
1830      blocks (unless ``Solver::Options::update_state_every_iteration`` is
1831      set true). Solver returns with ``Solver::Summary::termination_type``
1832      set to ``USER_FAILURE``.
1833
1834   #. ``SOLVER_TERMINATE_SUCCESSFULLY`` indicates that there is no need
1835      to optimize anymore (some user specified termination criterion
1836      has been met). Solver returns with
1837      ``Solver::Summary::termination_type``` set to ``USER_SUCCESS``.
1838
1839   #. ``SOLVER_CONTINUE`` indicates that the solver should continue
1840      optimizing.
1841
1842   For example, the following :class:`IterationCallback` is used
1843   internally by Ceres to log the progress of the optimization.
1844
1845   .. code-block:: c++
1846
1847     class LoggingCallback : public IterationCallback {
1848      public:
1849       explicit LoggingCallback(bool log_to_stdout)
1850           : log_to_stdout_(log_to_stdout) {}
1851
1852       ~LoggingCallback() {}
1853
1854       CallbackReturnType operator()(const IterationSummary& summary) {
1855         const char* kReportRowFormat =
1856             "% 4d: f:% 8e d:% 3.2e g:% 3.2e h:% 3.2e "
1857             "rho:% 3.2e mu:% 3.2e eta:% 3.2e li:% 3d";
1858         string output = StringPrintf(kReportRowFormat,
1859                                      summary.iteration,
1860                                      summary.cost,
1861                                      summary.cost_change,
1862                                      summary.gradient_max_norm,
1863                                      summary.step_norm,
1864                                      summary.relative_decrease,
1865                                      summary.trust_region_radius,
1866                                      summary.eta,
1867                                      summary.linear_solver_iterations);
1868         if (log_to_stdout_) {
1869           cout << output << endl;
1870         } else {
1871           VLOG(1) << output;
1872         }
1873         return SOLVER_CONTINUE;
1874       }
1875
1876      private:
1877       const bool log_to_stdout_;
1878     };
1879
1880
1881
1882 :class:`CRSMatrix`
1883 ==================
1884
1885 .. class:: CRSMatrix
1886
1887    A compressed row sparse matrix used primarily for communicating the
1888    Jacobian matrix to the user.
1889
1890 .. member:: int CRSMatrix::num_rows
1891
1892    Number of rows.
1893
1894 .. member:: int CRSMatrix::num_cols
1895
1896    Number of columns.
1897
1898 .. member:: vector<int> CRSMatrix::rows
1899
1900    :member:`CRSMatrix::rows` is a :member:`CRSMatrix::num_rows` + 1
1901    sized array that points into the :member:`CRSMatrix::cols` and
1902    :member:`CRSMatrix::values` array.
1903
1904 .. member:: vector<int> CRSMatrix::cols
1905
1906    :member:`CRSMatrix::cols` contain as many entries as there are
1907    non-zeros in the matrix.
1908
1909    For each row ``i``, ``cols[rows[i]]`` ... ``cols[rows[i + 1] - 1]``
1910    are the indices of the non-zero columns of row ``i``.
1911
1912 .. member:: vector<int> CRSMatrix::values
1913
1914    :member:`CRSMatrix::values` contain as many entries as there are
1915    non-zeros in the matrix.
1916
1917    For each row ``i``,
1918    ``values[rows[i]]`` ... ``values[rows[i + 1] - 1]`` are the values
1919    of the non-zero columns of row ``i``.
1920
1921 e.g., consider the 3x4 sparse matrix
1922
1923 .. code-block:: c++
1924
1925    0 10  0  4
1926    0  2 -3  2
1927    1  2  0  0
1928
1929 The three arrays will be:
1930
1931 .. code-block:: c++
1932
1933             -row0-  ---row1---  -row2-
1934    rows   = [ 0,      2,          5,     7]
1935    cols   = [ 1,  3,  1,  2,  3,  0,  1]
1936    values = [10,  4,  2, -3,  2,  1,  2]
1937
1938
1939 :class:`Solver::Summary`
1940 ========================
1941
1942 .. class:: Solver::Summary
1943
1944    Summary of the various stages of the solver after termination.
1945
1946 .. function:: string Solver::Summary::BriefReport() const
1947
1948    A brief one line description of the state of the solver after
1949    termination.
1950
1951 .. function:: string Solver::Summary::FullReport() const
1952
1953    A full multiline description of the state of the solver after
1954    termination.
1955
1956 .. function:: bool Solver::Summary::IsSolutionUsable() const
1957
1958    Whether the solution returned by the optimization algorithm can be
1959    relied on to be numerically sane. This will be the case if
1960    `Solver::Summary:termination_type` is set to `CONVERGENCE`,
1961    `USER_SUCCESS` or `NO_CONVERGENCE`, i.e., either the solver
1962    converged by meeting one of the convergence tolerances or because
1963    the user indicated that it had converged or it ran to the maximum
1964    number of iterations or time.
1965
1966 .. member:: MinimizerType Solver::Summary::minimizer_type
1967
1968    Type of minimization algorithm used.
1969
1970 .. member:: TerminationType Solver::Summary::termination_type
1971
1972    The cause of the minimizer terminating.
1973
1974 .. member:: string Solver::Summary::message
1975
1976    Reason why the solver terminated.
1977
1978 .. member:: double Solver::Summary::initial_cost
1979
1980    Cost of the problem (value of the objective function) before the
1981    optimization.
1982
1983 .. member:: double Solver::Summary::final_cost
1984
1985    Cost of the problem (value of the objective function) after the
1986    optimization.
1987
1988 .. member:: double Solver::Summary::fixed_cost
1989
1990    The part of the total cost that comes from residual blocks that
1991    were held fixed by the preprocessor because all the parameter
1992    blocks that they depend on were fixed.
1993
1994 .. member:: vector<IterationSummary> Solver::Summary::iterations
1995
1996    :class:`IterationSummary` for each minimizer iteration in order.
1997
1998 .. member:: int Solver::Summary::num_successful_steps
1999
2000    Number of minimizer iterations in which the step was
2001    accepted. Unless :member:`Solver::Options::use_non_monotonic_steps`
2002    is `true` this is also the number of steps in which the objective
2003    function value/cost went down.
2004
2005 .. member:: int Solver::Summary::num_unsuccessful_steps
2006
2007    Number of minimizer iterations in which the step was rejected
2008    either because it did not reduce the cost enough or the step was
2009    not numerically valid.
2010
2011 .. member:: int Solver::Summary::num_inner_iteration_steps
2012
2013    Number of times inner iterations were performed.
2014
2015  .. member:: int Solver::Summary::num_line_search_steps
2016
2017     Total number of iterations inside the line search algorithm across
2018     all invocations. We call these iterations "steps" to distinguish
2019     them from the outer iterations of the line search and trust region
2020     minimizer algorithms which call the line search algorithm as a
2021     subroutine.
2022
2023 .. member:: double Solver::Summary::preprocessor_time_in_seconds
2024
2025    Time (in seconds) spent in the preprocessor.
2026
2027 .. member:: double Solver::Summary::minimizer_time_in_seconds
2028
2029    Time (in seconds) spent in the Minimizer.
2030
2031 .. member:: double Solver::Summary::postprocessor_time_in_seconds
2032
2033    Time (in seconds) spent in the post processor.
2034
2035 .. member:: double Solver::Summary::total_time_in_seconds
2036
2037    Time (in seconds) spent in the solver.
2038
2039 .. member:: double Solver::Summary::linear_solver_time_in_seconds
2040
2041    Time (in seconds) spent in the linear solver computing the trust
2042    region step.
2043
2044 .. member:: double Solver::Summary::residual_evaluation_time_in_seconds
2045
2046    Time (in seconds) spent evaluating the residual vector.
2047
2048 .. member:: double Solver::Summary::jacobian_evaluation_time_in_seconds
2049
2050    Time (in seconds) spent evaluating the Jacobian matrix.
2051
2052 .. member:: double Solver::Summary::inner_iteration_time_in_seconds
2053
2054    Time (in seconds) spent doing inner iterations.
2055
2056 .. member:: int Solver::Summary::num_parameter_blocks
2057
2058    Number of parameter blocks in the problem.
2059
2060 .. member:: int Solver::Summary::num_parameters
2061
2062    Number of parameters in the problem.
2063
2064 .. member:: int Solver::Summary::num_effective_parameters
2065
2066    Dimension of the tangent space of the problem (or the number of
2067    columns in the Jacobian for the problem). This is different from
2068    :member:`Solver::Summary::num_parameters` if a parameter block is
2069    associated with a :class:`LocalParameterization`.
2070
2071 .. member:: int Solver::Summary::num_residual_blocks
2072
2073    Number of residual blocks in the problem.
2074
2075 .. member:: int Solver::Summary::num_residuals
2076
2077    Number of residuals in the problem.
2078
2079 .. member:: int Solver::Summary::num_parameter_blocks_reduced
2080
2081    Number of parameter blocks in the problem after the inactive and
2082    constant parameter blocks have been removed. A parameter block is
2083    inactive if no residual block refers to it.
2084
2085 .. member:: int Solver::Summary::num_parameters_reduced
2086
2087    Number of parameters in the reduced problem.
2088
2089 .. member:: int Solver::Summary::num_effective_parameters_reduced
2090
2091    Dimension of the tangent space of the reduced problem (or the
2092    number of columns in the Jacobian for the reduced problem). This is
2093    different from :member:`Solver::Summary::num_parameters_reduced` if
2094    a parameter block in the reduced problem is associated with a
2095    :class:`LocalParameterization`.
2096
2097 .. member:: int Solver::Summary::num_residual_blocks_reduced
2098
2099    Number of residual blocks in the reduced problem.
2100
2101 .. member:: int Solver::Summary::num_residuals_reduced
2102
2103    Number of residuals in the reduced problem.
2104
2105 .. member:: int Solver::Summary::num_threads_given
2106
2107    Number of threads specified by the user for Jacobian and residual
2108    evaluation.
2109
2110 .. member:: int Solver::Summary::num_threads_used
2111
2112    Number of threads actually used by the solver for Jacobian and
2113    residual evaluation. This number is not equal to
2114    :member:`Solver::Summary::num_threads_given` if `OpenMP` is not
2115    available.
2116
2117 .. member:: int Solver::Summary::num_linear_solver_threads_given
2118
2119    Number of threads specified by the user for solving the trust
2120    region problem.
2121
2122 .. member:: int Solver::Summary::num_linear_solver_threads_used
2123
2124    Number of threads actually used by the solver for solving the trust
2125    region problem. This number is not equal to
2126    :member:`Solver::Summary::num_linear_solver_threads_given` if
2127    `OpenMP` is not available.
2128
2129 .. member:: LinearSolverType Solver::Summary::linear_solver_type_given
2130
2131    Type of the linear solver requested by the user.
2132
2133 .. member:: LinearSolverType Solver::Summary::linear_solver_type_used
2134
2135    Type of the linear solver actually used. This may be different from
2136    :member:`Solver::Summary::linear_solver_type_given` if Ceres
2137    determines that the problem structure is not compatible with the
2138    linear solver requested or if the linear solver requested by the
2139    user is not available, e.g. The user requested
2140    `SPARSE_NORMAL_CHOLESKY` but no sparse linear algebra library was
2141    available.
2142
2143 .. member:: vector<int> Solver::Summary::linear_solver_ordering_given
2144
2145    Size of the elimination groups given by the user as hints to the
2146    linear solver.
2147
2148 .. member:: vector<int> Solver::Summary::linear_solver_ordering_used
2149
2150    Size of the parameter groups used by the solver when ordering the
2151    columns of the Jacobian.  This maybe different from
2152    :member:`Solver::Summary::linear_solver_ordering_given` if the user
2153    left :member:`Solver::Summary::linear_solver_ordering_given` blank
2154    and asked for an automatic ordering, or if the problem contains
2155    some constant or inactive parameter blocks.
2156
2157 .. member:: std::string Solver::Summary::schur_structure_given
2158
2159     For Schur type linear solvers, this string describes the template
2160     specialization which was detected in the problem and should be
2161     used.
2162
2163 .. member:: std::string Solver::Summary::schur_structure_used
2164
2165    For Schur type linear solvers, this string describes the template
2166    specialization that was actually instantiated and used. The reason
2167    this will be different from
2168    :member:`Solver::Summary::schur_structure_given` is because the
2169    corresponding template specialization does not exist.
2170
2171    Template specializations can be added to ceres by editing
2172    ``internal/ceres/generate_template_specializations.py``
2173
2174 .. member:: bool Solver::Summary::inner_iterations_given
2175
2176    `True` if the user asked for inner iterations to be used as part of
2177    the optimization.
2178
2179 .. member:: bool Solver::Summary::inner_iterations_used
2180
2181    `True` if the user asked for inner iterations to be used as part of
2182    the optimization and the problem structure was such that they were
2183    actually performed. For example, in a problem with just one parameter
2184    block, inner iterations are not performed.
2185
2186 .. member:: vector<int> inner_iteration_ordering_given
2187
2188    Size of the parameter groups given by the user for performing inner
2189    iterations.
2190
2191 .. member:: vector<int> inner_iteration_ordering_used
2192
2193    Size of the parameter groups given used by the solver for
2194    performing inner iterations. This maybe different from
2195    :member:`Solver::Summary::inner_iteration_ordering_given` if the
2196    user left :member:`Solver::Summary::inner_iteration_ordering_given`
2197    blank and asked for an automatic ordering, or if the problem
2198    contains some constant or inactive parameter blocks.
2199
2200 .. member:: PreconditionerType Solver::Summary::preconditioner_type_given
2201
2202    Type of the preconditioner requested by the user.
2203
2204 .. member:: PreconditionerType Solver::Summary::preconditioner_type_used
2205
2206    Type of the preconditioner actually used. This may be different
2207    from :member:`Solver::Summary::linear_solver_type_given` if Ceres
2208    determines that the problem structure is not compatible with the
2209    linear solver requested or if the linear solver requested by the
2210    user is not available.
2211
2212 .. member:: VisibilityClusteringType Solver::Summary::visibility_clustering_type
2213
2214    Type of clustering algorithm used for visibility based
2215    preconditioning. Only meaningful when the
2216    :member:`Solver::Summary::preconditioner_type` is
2217    ``CLUSTER_JACOBI`` or ``CLUSTER_TRIDIAGONAL``.
2218
2219 .. member:: TrustRegionStrategyType Solver::Summary::trust_region_strategy_type
2220
2221    Type of trust region strategy.
2222
2223 .. member:: DoglegType Solver::Summary::dogleg_type
2224
2225    Type of dogleg strategy used for solving the trust region problem.
2226
2227 .. member:: DenseLinearAlgebraLibraryType Solver::Summary::dense_linear_algebra_library_type
2228
2229    Type of the dense linear algebra library used.
2230
2231 .. member:: SparseLinearAlgebraLibraryType Solver::Summary::sparse_linear_algebra_library_type
2232
2233    Type of the sparse linear algebra library used.
2234
2235 .. member:: LineSearchDirectionType Solver::Summary::line_search_direction_type
2236
2237    Type of line search direction used.
2238
2239 .. member:: LineSearchType Solver::Summary::line_search_type
2240
2241    Type of the line search algorithm used.
2242
2243 .. member:: LineSearchInterpolationType Solver::Summary::line_search_interpolation_type
2244
2245    When performing line search, the degree of the polynomial used to
2246    approximate the objective function.
2247
2248 .. member:: NonlinearConjugateGradientType Solver::Summary::nonlinear_conjugate_gradient_type
2249
2250    If the line search direction is `NONLINEAR_CONJUGATE_GRADIENT`,
2251    then this indicates the particular variant of non-linear conjugate
2252    gradient used.
2253
2254 .. member:: int Solver::Summary::max_lbfgs_rank
2255
2256    If the type of the line search direction is `LBFGS`, then this
2257    indicates the rank of the Hessian approximation.