libs/math/doc/sf/lambert_w.qbk

   1 [section:lambert_w Lambert /W/ function]
   2
   3 [h4:synopsis Synopsis]
   4
   5 ``
   6 #include <boost/math/special_functions/lambert_w.hpp>
   7 ``
   8
   9  namespace boost { namespace math {
  10
  11    template <class T>
  12    ``__sf_result`` lambert_w0(T z);                        // W0 branch, default policy.
  13    template <class T>
  14    ``__sf_result`` lambert_wm1(T z);                       // W-1 branch, default policy.
  15    template <class T>
  16    ``__sf_result`` lambert_w0_prime(T z);                  // W0 branch 1st derivative.
  17    template <class T>
  18    ``__sf_result`` lambert_wm1_prime(T z);                 // W-1 branch 1st derivative.
  19
  20    template <class T, class ``__Policy``>
  21    ``__sf_result`` lambert_w0(T z, const ``__Policy``&);         // W0 with policy.
  22    template <class T, class ``__Policy``>
  23    ``__sf_result`` lambert_wm1(T z, const ``__Policy``&);        // W-1 with policy.
  24    template <class T, class ``__Policy``>
  25    ``__sf_result`` lambert_w0_prime(T z, const ``__Policy``&);   // W0 derivative with policy.
  26    template <class T, class ``__Policy``>
  27    ``__sf_result`` lambert_wm1_prime(T z, const ``__Policy``&);  // W-1 derivative with policy.
  28
  29   } // namespace boost
  30   } // namespace math
  31
  32
  33 [h4:description Description]
  34
  35 The __Lambert_W is the solution of the equation /W/(/z/)/e/[super /W/(/z/)] = /z/.
  36 It is also called the Omega function, the inverse of /f/(/W/) = /We/[super /W/].
  37
  38 On the interval \[0, [infin]\), there is just one real solution.
  39 On the interval (-/e/[super -1], 0), there are two real solutions, generating two branches which we will denote by /W/[sub 0] and /W/[sub -1].
  40 In Boost.Math, we call these principal branches `lambert_w0` and `lambert_wm1`; their derivatives are labelled `lambert_w0_prime` and `lambert_wm1_prime`.
  41
  42 [import ../../example/lambert_w_graph.cpp]
  43
  44 [graph lambert_w_graph]
  45 [graph lambert_w_graph_big_w]
  46 [graph lambert_w0_prime_graph]
  47 [graph lambert_wm1_prime_graph]
  48
  49 There is a singularity where the branches meet at /e/[super -1] [cong] [^ -0.367879].
  50 Approaching this point, the condition number of function evaluation tends to infinity,
  51 and the only method of recovering high accuracy is use of higher precision.
  52
  53 This implementation computes the two real branches /W/[sub 0] and /W/[sub -1]
  54 with the functions `lambert_w0` and `lambert_wm1`,
  55 and their derivatives, `lambert_w0_prime` and  `lambert_wm1_prime`.
  56 Complex arguments are not supported.
  57
  58 The final __Policy argument is optional and can be used to control how the function deals with errors.
  59 Refer to __policy_section for more details and see examples below.
  60
  61 [h5:applications Applications of the Lambert /W/ function]
  62
  63 The Lambert /W/ function has a myriad of applications.
  64 [@http://www.apmaths.uwo.ca/~djeffrey/Offprints/W-adv-cm.pdf Corless et al.] provide a summary of applications,
  65 from the mathematical, like iterated exponentiation and asymptotic roots of trinomials,
  66 to the real-world, such as the range of a jet plane, enzyme kinetics, water movement in soil,
  67 epidemics, and diode current (an example replicated [@../../example/lambert_w_diode.cpp here]).
  68 Since the publication of their landmark paper, there have been many more applications, and
  69 also many new implementations of the function, upon which this implementation builds.
  70
  71 [h4:examples Examples]
  72
  73 [import ../../example/lambert_w_simple_examples.cpp]
  74
  75 The most basic usage of the Lambert-/W/ function is demonstrated below:
  76
  77 [lambert_w_simple_examples_includes]
  78
  79 [lambert_w_simple_examples_0]
  80
  81 Other floating-point types can be used too, here `float`,
  82 including user-defined types like __multiprecision.
  83 It is convenient to use a function like `show_value`
  84 to display all (and only) potentially significant decimal digits, including any significant trailing zeros,
  85 (`std::numeric_limits<T>::max_digits10`) for the type `T`.
  86
  87 [lambert_w_simple_examples_1]
  88
  89 Example of an integer argument to `lambert_w0`,
  90 showing that an `int` literal is correctly promoted to a `double`.
  91
  92 [lambert_w_simple_examples_2]
  93
  94 Using __multiprecision types to get much higher precision is painless.
  95
  96 [lambert_w_simple_examples_3]
  97
  98 [warning When using multiprecision, take very great care not to
  99 construct or assign non-integers from `double`, `float` ... silently losing precision.
 100 Use `"1.2345678901234567890123456789"` rather than `1.2345678901234567890123456789`.]
 101
 102 Using multiprecision types, it is all too easy to get multiprecision precision wrong!
 103
 104 [lambert_w_simple_examples_4]
 105
 106 [note See spurious non-seven decimal digits appearing after digit #17 in the argument 0.7777777777777777...!]
 107
 108 And similarly constructing from a literal `double 0.9`, with more random digits after digit number 17.
 109
 110 [lambert_w_simple_examples_4a]
 111
 112 Note how the `cpp_float_dec_50` result is only as correct as from a `double = 0.9`.
 113
 114 Now see the correct result for all 50 decimal digits constructing from a decimal digit string "0.9":
 115
 116 [lambert_w_simple_examples_4b]
 117
 118 Note the expected zeros for all places up to 50 - and the correct Lambert /W/ result!
 119
 120 (It is just as easy to compute even higher precisions,
 121 at least to thousands of decimal digits, but not shown here for brevity.
 122 See [@../../example/lambert_w_simple_examples.cpp lambert_w_simple_examples.cpp]
 123 for comparison of an evaluation at 1000 decimal digit precision with __WolframAlpha).
 124
 125 Policies can be used to control what action to take on errors:
 126
 127 [lambert_w_simple_examples_error_policies]
 128
 129 An example error message:
 130
 131 [lambert_w_simple_examples_error_message_1]
 132
 133 Showing an error reported if a value is passed to `lambert_w0` that is out of range,
 134 (and was probably meant to be passed to `lambert_wm1` instead).
 135
 136 [lambert_w_simple_examples_out_of_range]
 137
 138 The full source of these examples is at [@../../example/lambert_w_simple_examples.cpp lambert_w_simple_examples.cpp]
 139
 140 [h5:diode_resistance Diode Resistance Example]
 141
 142 [import ../../example/lambert_w_diode_graph.cpp]
 143
 144 A typical example of a practical application is estimating the current flow
 145 through a diode with series resistance from a paper by Banwell and Jayakumar.
 146
 147 Having the Lambert /W/ function available makes it simple to reproduce the plot
 148 in their paper (Fig 2) comparing estimates using with Lambert /W/ function
 149 and some actual measurements.
 150 The colored curves show the effect of various series resistance on the current
 151 compared to an extrapolated line in grey with no internal (or external) resistance.
 152
 153 Two formulae relating the diode current and effect of series resistance can be combined,
 154 but yield an otherwise intractable equation relating the current versus voltage
 155 with a varying series resistance. This was reformulated as a
 156 generalized equation in terms of the Lambert W function:
 157
 158 Banwell and Jakaumar equation 5
 159
 160 [expression I(V) = [mu] V[sub T]/ R [sub S] [dot] W[sub 0](I[sub 0] R[sub S] / ([mu] V[sub T]))]
 161
 162 Using these variables
 163
 164 [lambert_w_diode_graph_1]
 165
 166 the formulas can be rendered in C++
 167
 168 [lambert_w_diode_graph_2]
 169
 170 to reproduce their Fig 2:
 171
 172 [graph diode_iv_plot]
 173
 174 The plotted points for no external series resistance
 175 (derived from their published plot as the raw data are not publicly available)
 176 are used to extrapolate back to estimate the intrinsic emitter resistance as 0.3 ohm.
 177 The effect of external series resistance is visible when the colored lines start to curve away from the straight line as voltage increases.
 178
 179 See [@../../example/lambert_w_diode.cpp  lambert_w_diode.cpp] and
 180 [@../../example/lambert_w_diode_graph.cpp  lambert_w_diode_graph.cpp]
 181 for details of the calculation.
 182
 183 [h5:implementations Existing implementations]
 184
 185 The principal value of the Lambert /W/ function is implemented in the [@http://mathworld.wolfram.com/LambertW-Function.html Wolfram Language] as `ProductLog[k, z]`,
 186 where `k` is the branch.
 187
 188 The symbolic algebra program __Maple also computes Lambert /W/ to an arbitrary precision.
 189
 190 [h4:precision Controlling the compromise between Precision and Speed]
 191
 192 [h5:small_floats Floating-point types `double` and `float`]
 193 This implementation provides good precision and excellent speed for __fundamental `float` and `double`.
 194
 195 All the functions usually return values within a few __ulp
 196 for the floating-point type, except for very small arguments very near zero,
 197 and for arguments very close to the singularity at the branch point.
 198
 199 By default, this implementation provides the best possible speed.
 200 Very slightly average higher precision and less bias might be obtained
 201 by adding a __halley step refinement,
 202 but at the cost of more than doubling the runtime.
 203
 204 [h5:big_floats Floating-point types larger than double]
 205
 206 For floating-point types with precision greater than `double` and `float` __fundamental_types,
 207 a `double` evaluation is used as a first approximation followed by Halley refinement,
 208 using a single step where it can be predicted that this will be sufficient,
 209 and only using __halley iteration when necessary.
 210 Higher precision types are always going to be [*very, very much slower].
 211
 212 The 'best' evaluation (the nearest __representable) can be achieved by `static_cast`ing from a
 213 higher precision type, typically a __multiprecision type like `cpp_bin_float_50`,
 214 but at the cost of increasing run-time 100-fold;
 215 this has been used here to provide some of our reference values for testing.
 216
 217 [import ../../example/lambert_w_precision_example.cpp]
 218
 219 For example, we get a reference value using a high precision type, for example;
 220
 221   using boost::multiprecision::cpp_bin_float_50;
 222
 223 that uses Halley iteration to refine until it is as precise as possible for this `cpp_bin_float_50` type.
 224
 225 As a further check we can compare this with a __WolframAlpha computation
 226 using command [^N\[ProductLog\[10.\], 50\]] to get 50 decimal digits
 227 and similarly [^N\[ProductLog\[10.\], 17\]] to get the nearest representable for 64-bit `double` precision.
 228
 229 [lambert_w_precision_reference_w]
 230
 231 giving us the same nearest representable using 64-bit `double` as `1.7455280027406994`.
 232
 233 However, the rational polynomial and Fukushima Schroder approximations are so good for type `float` and `double`
 234 that negligible improvement is gained from a `double` Halley step.
 235
 236 This is shown with [@../../example/lambert_w_precision_example.cpp  lambert_w_precision_example.cpp]
 237 for Lambert /W/[sub 0]:
 238
 239 [lambert_w_precision_1]
 240
 241 with this output:
 242
 243 [lambert_w_precision_output_1]
 244
 245 and then for /W/[sub -1]:
 246
 247 [lambert_w_precision_2]
 248
 249 with this output:
 250
 251 [lambert_w_precision_output_2]
 252
 253 [h5:differences_distribution Distribution of differences from 'best' `double` evaluations]
 254
 255 The distribution of differences from 'best' are shown in these graphs comparing `double` precision evaluations
 256 with reference 'best' z50 evaluations using `cpp_bin_float_50` type reduced to `double` with `static_cast<double(z50)` :
 257
 258 [graph lambert_w0_errors_graph]
 259
 260 [graph lambert_wm1_errors_graph]
 261
 262 As noted in the implementation section, the distribution of these differences is somewhat biased
 263 for Lambert /W/[sub -1] and this might be reduced using a `double` Halley step at small runtime cost.
 264 But if you are seriously concerned to get really precise computations,
 265 the only way is using a higher precision type and then reduce to the desired type.
 266 Fortunately, __multiprecision makes this very easy to program, if much slower.
 267
 268 [h4:edge_cases Edge and Corner cases]
 269
 270 [h5:w0_edges The /W/[sub 0] Branch]
 271
 272 The domain of /W/[sub 0] is \[-/e/[super -1], [infin]\). Numerically,
 273
 274 * `lambert_w0(-1/e)` is exactly -1.
 275 * `lambert_w0(z)` for `z < -1/e` throws a `domain_error`, or returns `NaN` according to the policy.
 276 * `lambert_w0(std::numeric_limits<T>::infinity())`  throws an `overflow_error`.
 277
 278 (An infinite argument probably indicates that something has already gone wrong,
 279 but if it is desired to return infinity,
 280 this case should be handled before calling `lambert_w0`).
 281
 282 [h5:wm1_edges /W/[sub -1] Branch]
 283
 284 The domain of /W/[sub -1] is \[-/e/[super -1], 0\). Numerically,
 285
 286 * `lambert_wm1(-1/e)` is exactly -1.
 287 * `lambert_wm1(0)`  returns  -[infin] (or the nearest equivalent if `std::has_infinity == false`).
 288 * `lambert_wm1(-std::numeric_limits<T>::min())` returns the maximum (most negative) possible value of Lambert /W/ for the type T. [br]
 289 For example, for `double`: lambert_wm1(-2.2250738585072014e-308) = -714.96865723796634 [br]
 290 and for `float`: lambert_wm1(-1.17549435e-38) = -91.8567734 [br]
 291
 292 * `z < -std::numeric_limits<T>::min()`, means that z is zero or denormalized (if `std::numeric_limits<T>::has_denorm_min == true`),
 293 for example: `r = lambert_wm1(-std::numeric_limits<double>::denorm_min());` and an overflow_error exception is thrown,
 294 and will give a message like:
 295
 296   Error in function boost::math::lambert_wm1<RealType>(<RealType>):
 297   Argument z = -4.9406564584124654e-324 is too small
 298   (z < -std::numeric_limits<T>::min so denormalized) for Lambert W-1 branch!
 299
 300 Denormalized values are not supported for Lambert /W/[sub -1] (because not all floating-point types denormalize),
 301 and anyway it only covers a tiny fraction of the range of possible z arguments values.
 302
 303 [h4:compilers Compilers]
 304
 305 The `lambert_w.hpp` code has been shown to work on most C++98 compilers.
 306 (Apart from requiring C++11 extensions for using of `std::numeric_limits<>::max_digits10` in some diagnostics.
 307 Many old pre-c++11 compilers provide this extension but may require enabling to use,
 308 for example using b2/bjam the lambert_w examples use this command:
 309
 310    [ run lambert_w_basic_example.cpp  : : : [ requires cxx11_numeric_limits ] ]
 311
 312 See [@../../example/Jamfile.v2 jamfile.v2].)
 313
 314 For details of which compilers are expected to work see lambert_w tests and examples in:[br]
 315 [@https://www.boost.org/development/tests/master/developer/math.html  Boost Test Summary report for master branch (used for latest release)][br]
 316 [@https://www.boost.org/development/tests/develop/developer/math.html Boost Test Summary report for latest developer branch].
 317
 318 As expected, debug mode is very much slower than release.
 319
 320 [h5:diagnostics Diagnostics Macros]
 321
 322 Several macros are provided to output diagnostic information (potentially [*much] output).
 323 These can be statements, for example:
 324
 325 `#define BOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS`
 326
 327 placed [*before] the `lambert_w` include statement
 328
 329 `#include <boost/math/special_functions/lambert_w.hpp>`,
 330
 331 or defined on the project compile command-line:  `/DBOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS`,
 332
 333 or defined in a jamfile.v2:  `<define>BOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS`
 334
 335 [import ../../include/boost/math/special_functions/lambert_w.hpp]
 336
 337 [boost_math_instrument_lambert_w_macros]
 338
 339 [h4:implementation Implementation]
 340
 341 There are many previous implementations, each with increasing accuracy and/or speed.
 342 See __lambert_w_references below.
 343
 344 For most of the range of ['z] arguments, some initial approximation followed by a single refinement,
 345 often using Halley or similar method, gives a useful precision.
 346 For speed, several implementations avoid evaluation of a iteration test using the exponential function,
 347 estimating that a single refinement step will suffice,
 348 but these rarely get to the best result possible.
 349 To get a better precision, additional refinements, probably iterative, are needed
 350 for example, using __halley or __schroder methods.
 351
 352 For C++, the most precise results possible, closest to the nearest __representable for the C++ type being used,
 353 it is usually necessary to use a higher precision type for intermediate computation,
 354 finally static-casting back to the smaller desired result type.
 355 This strategy is used by __Maple and __WolframAlpha, for example, using arbitrary precision arithmetic,
 356 and some of their high-precision values are used for testing this library.
 357 This method is also used to provide some __boost_test values using __multiprecision,
 358 typically, a 50 decimal digit type like `cpp_bin_float_50`
 359 `static_cast` to a `float`, `double` or `long double` type.
 360
 361 For ['z] argument values near the singularity and near zero, other approximations may be used,
 362 possibly followed by refinement or increasing number of series terms until a desired precision is achieved.
 363 At extreme arguments near to zero or the singularity at the branch point,
 364 even this fails and the only method to achieve a really close result is to cast from a higher precision type.
 365
 366 In practical applications, the increased computation required
 367 (often towards a thousand-fold slower and requiring much additional code for __multiprecision)
 368 is not justified and the algorithms here do not implement this.
 369 But because the Boost.Lambert_W algorithms has been tested using __multiprecision,
 370 users who require this can always easily achieve the nearest representation for __fundamental_types
 371 - if the application justifies the very large extra computation cost.
 372
 373 [h5 Evolution of this implementation]
 374
 375 One compact real-only implementation was based on an algorithm by __Luu_thesis,
 376 (see routine 11 on page 98 for his Lambert W algorithm)
 377 and his Halley refinement is used iteratively when required.
 378 A first implementation was based on Thomas Luu's code posted at
 379 [@https://svn.boost.org/trac/boost/ticket/11027 Boost Trac \#11027].
 380 It has been implemented from Luu's algorithm but templated on `RealType` parameter and result
 381 and handles both __fundamental_types (`float, double, long double`), __multiprecision,
 382 and also has been tested successfully with a proposed fixed_point type.
 383
 384 A first approximation was computed using the method of Barry et al (see references 5 & 6 below).
 385 This was extended to the widely used [@https://people.sc.fsu.edu/~jburkardt/f_src/toms443/toms443.html TOMS443]
 386 FORTRAN and C++ versions by John Burkardt using Schroeder refinement(s).
 387 (For users only requiring an accuracy of relative accuracy of 0.02%, Barry's function alone might suffice,
 388 but a better __rational_function approximation method has since been developed for this implementation).
 389
 390 We also considered using __newton method.
 391 ``
 392   f(w) = w e^w -z = 0 // Luu equation 6.37
 393   f'(w) = e^w (1 + w), Wolfram alpha (d)/(dw)(f(w) = w exp(w) - z) = e^w (w + 1)
 394   if (f(w) / f'(w) -1 < tolerance
 395   w1 = w0 - (expw0 * (w0 + 1)); // Refine new Newton/Raphson estimate.
 396 ``
 397 but concluded that since the Newton-Raphson method takes typically 6 iterations to converge within tolerance,
 398 whereas Halley usually takes only 1 to 3 iterations to achieve an result within 1 __ulp,
 399 so the Newton-Raphson method is unlikely to be quicker
 400 than the additional cost of computing the 2nd derivative for Halley's method.
 401
 402 Halley refinement uses the simplified formulae obtained from
 403 [@http://www.wolframalpha.com/input/?i=%5B2(z+exp(z)-w)+d%2Fdx+(z+exp(z)-w)%5D+%2F+%5B2+(d%2Fdx+(z+exp(z)-w))%5E2+-+(z+exp(z)-w)+d%5E2%2Fdx%5E2+(z+exp(z)-w)%5D Wolfram Alpha]
 404
 405   [2(z exp(z)-w) d/dx (z exp(z)-w)] / [2 (d/dx (z exp(z)-w))^2 - (z exp(z)-w) d^2/dx^2 (z exp(z)-w)]
 406
 407 [h4:compact_implementation Implementing Compact Algorithms]
 408
 409 The most compact algorithm can probably be implemented using the log approximation of Corless et al.
 410 followed by Halley iteration (but is also slowest and least precise near zero and near the branch singularity).
 411
 412 [h4:faster_implementation Implementing Faster Algorithms]
 413
 414 More recently, the Tosio Fukushima has developed an even faster algorithm,
 415 avoiding any transcendental function calls as these are necessarily expensive.
 416 The current implementation of Lambert /W/[sub -1] is based on his algorithm
 417 starting with a translation from Fukushima's FORTRAN into C++ by Darko Veberic.
 418
 419 Many applications of the Lambert W function make many repeated evaluations for Monte Carlo methods;
 420 for these applications speed is very important.
 421 Luu, and Chapeau-Blondeau and Monir provide typical usage examples.
 422
 423 Fukushima improves the important observation that much of the execution time of all
 424 previous iterative algorithms was spent evaluating transcendental functions, usually `exp`.
 425 He has put a lot of work into avoiding any slow transcendental functions by using lookup tables and
 426 bisection, finishing with a single Schroeder refinement,
 427 without any check on the final precision of the result (necessarily evaluating an expensive exponential).
 428
 429 Theoretical and practical tests confirm that Fukushima's algorithm gives Lambert W estimates
 430 with a known small error bound (several __ulp) over nearly all the range of ['z] argument.
 431
 432 A mean difference was computed to express the typical error and is often about 0.5 epsilon,
 433 the theoretical minimum.  Using the __float_distance, we can also express this as the number of
 434 bits that are different from the nearest representable or 'exact' or 'best' value.
 435 The number and distribution of these few bits differences was studied by binning, including their sign.
 436 Bins for (signed) 0, 1, 2 and 3 and 4 bits proved suitable.
 437
 438 However, though these give results within a few __epsilon of the nearest representable result,
 439 they do not get as close as is very often possible with further refinement,
 440 nrealy always to within one or two __epsilon.
 441
 442 More significantly, the evaluations of the sum of all signed differences using the Fukshima algorithm
 443 show a slight bias, being more likely to be a bit or few below the nearest representation than above;
 444 bias might have unwanted effects on some statistical computations.
 445
 446 Fukushima's method also does not cover the full range of z arguments of 'float' precision and above.
 447
 448 For this implementation of Lambert /W/[sub 0], John Maddock used the Boost.Math __remez method program
 449 to devise a __rational_function for several ranges of argument for the /W/[sub 0] branch of Lambert /W/ function.
 450 These minimax rational approximations are combined for an algorithm that is both smaller and faster.
 451
 452 Sadly it has not proved practical to use the same __remez method for Lambert /W/[sub -1] branch
 453 and so the Fukushima algorithm is retained for this branch.
 454
 455 An advantage of both minimax rational __remez approximations
 456 is that the [*distribution] from the reference values is reasonably random and insignificantly biased.
 457
 458 For example, table below a test of Lambert /W/[sub 0] 10000 values of argument covering the main range of possible values,
 459 10000 comparisons from z = 0.0501 to 703, in 0.001 step factor 1.05 when module 7 == 0
 460
 461 [table:lambert_w0_Fukushima Fukushima Lambert /W/[sub 0] and typical improvement from a single Halley step.
 462 [[Method]  [Exact]  [One_bit]   [Two_bits]  [Few_bits]  [inexact] [bias]]
 463 [[Schroeder /W/[sub 0] ] [8804 ] [   1154 ] [  37 ] [ 5 ] [1243  ] [-1193]]
 464 [[after Halley step] [      9710 ] [ 288 ] [  2 ] [0] [ 292 ] [22]]
 465 ] [/table:lambert_w0_Fukushima /W/[sub 0] ]
 466
 467 Lambert /W/[sub 0] values computed using the Fukushima method with Schroeder refinement gave about 1/6 `lambert_w0` values that are
 468 one bit different from the 'best', and < 1% that are a few bits 'wrong'.
 469 If a Halley refinement step is added, only 1 in 30 are even one bit different, and only 2 two-bits 'wrong'.
 470
 471 [table:lambert_w0_plus_halley Rational polynomial Lambert /W/[sub 0] and typical improvement from a single Halley step.
 472 [[Method]  [Exact]  [One_bit]   [Two_bits]  [Few_bits]  [inexact] [bias]]
 473 [[rational/polynomial] [7135] [     2863    ] [ 2 ] [0] [    2867    ] [    -59] ]
 474 [[after Halley step ] [     9724  ] [273] [ 3 ] [0] [    279   ] [5] ]
 475 ] [/table:lambert_w0_plus_halley]
 476
 477 With the rational polynomial approximation method, there are a third one-bit from the best and none more than two-bits.
 478 Adding a Halley step (or iteration) reduces the number that are one-bit different from about a third down to one in 30;
 479 this is unavoidable 'computational noise'.
 480 An extra Halley step would double the runtime for a tiny gain and so is not chosen for this implementation,
 481 but remains a option, as detailed above.
 482
 483 For the Lambert /W/[sub -1] branch, the Fukushima algorithm is used.
 484
 485 [table:lambert_wm1_fukushima Lambert /W/[sub -1] using Fukushima algorithm.
 486 [[Method]  [Exact]  [One_bit]   [Two_bits]  [Few_bits]  [inexact] [bias]]
 487 [[Fukushima /W/[sub -1]]    [ 7167] [2704 ]  [ 129 ]  [0] [2962  ] [-160]]
 488 [[plus Halley step] [ 7379 ] [ 2529 ] [92  ]  [0] [ 2713 ] [ 549]]
 489 ] [/table:lambert_wm1_fukushima]
 490
 491 [/ generated using PAB j:\Cpp\Misc\lambert_w_compare_jm2\lambert_w_compare_jm2.cpp]
 492
 493 [h5:lookup_tables Lookup tables]
 494
 495 For speed during the bisection, Fukushima's algorithm computes lookup tables of powers of e and z for integral Lambert W.
 496 There are 64 elements in these tables. The FORTRAN version (and the C++ translation by Veberic)
 497 computed these (once) as `static` data. This is slower, may cause trouble with multithreading, and
 498 is slightly inaccurate because of rounding errors from repeated(64) multiplications.
 499
 500 In this implementation the array values have been computed using __multiprecision 50 decimal digit
 501 and output as C++ arrays 37 decimal digit `long double` literals using `max_digits10` precision
 502
 503   std::cout.precision(std::numeric_limits<cpp_bin_float_quad>::max_digits10);
 504
 505 The arrays are as `const` and `constexpr` and `static` as possible (for the compiler version),
 506 using BOOST_STATIC_CONSTEXPR macro.
 507 (See [@../../tools/lambert_w_lookup_table_generator.cpp  lambert_w_lookup_table_generator.cpp]
 508 The precision was chosen to ensure that if used as `long double` arrays,
 509 then the values output to
 510 [@ ../../include/boost/math/special_functions/detail/lambert_w_lookup_table.ipp lambert_w_lookup_table.ipp]
 511 will be the nearest representable value for the type chose by a `typedef` in
 512 [@../../include/boost/math/special_functions/lambert_w.hpp lambert_w.hpp].
 513
 514   typedef double lookup_t; // Type for lookup table (`double` or `float`, or even `long double`?)
 515
 516 This is to allow for future use at higher precision, up to platforms that use 128-bit
 517 (hardware or software) for their `long double` type.
 518
 519 The accuracy of the tables was confirmed using __WolframAlpha and agrees at the 37th decimal place,
 520 so ensuring that the value is exactly read into even 128-bit `long double` to the nearest representation.
 521
 522 [h5:higher_precision Higher precision]
 523
 524 For types more precise than `double`, Fukushima reported that it was best to use the `double` estimate
 525 as a starting point, followed by refinement using __halley iterations or other methods;
 526 our experience confirms this.
 527
 528 Using __multiprecision it is simple to compute very high precision values of Lambert
 529 W at least to thousands of decimal digits over most of the range of z arguments.
 530
 531 For this reason, the lookup tables and bisection are only carried out at low precision,
 532 usually `double`, chosen by the `typedef double lookup_t`.  Unlike the FORTRAN version,
 533 the lookup tables of Lambert_W of integral values are precomputed as C++ static arrays of floating-point literals.
 534 The default is a `typedef` setting the type to `double`.
 535 To allow users to vary the precision from `float` to `long double` these are computed to
 536 128-bit precision to ensure that even platforms with `long double` do not lose precision.
 537
 538 The FORTRAN version and translation only permits the z argument to be the largest
 539 items in these lookup arrays, `wm0s[64] = 3.99049`, producing an error message and returning `NaN`.
 540 So 64 is the largest possible value ever returned from the `lambert_w0` function.
 541 This is far from the `std::numeric_limits<>::max()` for even `float`s.
 542 Therefore this implementation uses an approximation or 'guess' and Halley's method to refine the result.
 543 Logarithmic approximation is discussed at length by R.M.Corless et al. (page 349).
 544 Here we use the first two terms of equation 4.19:
 545
 546   T lz = log(z);
 547   T llz = log(lz);
 548   guess = lz - llz + (llz / lz);
 549
 550 This gives a useful precision suitable for Halley refinement.
 551
 552 Similarly, for Lambert /W/[sub -1] branch, tiny values very near zero,
 553 W > 64 cannot be computed using the lookup table.
 554 For this region, an approximation followed by a few (usually 3) Halley refinements.
 555 See __lambert_w_wm1_near_zero.
 556
 557 For the less well-behaved regions for Lambert /W/[sub 0] ['z] arguments near zero,
 558 and near the branch singularity at ['-1/e], some series functions are used.
 559
 560 [h5:small_z Small values of argument z near zero]
 561
 562 When argument ['z] is small and near zero, there is an efficient and accurate series evaluation method available
 563 (implemented in `lambert_w0_small_z`).
 564 There is no equivalent for the /W/[sub -1] branch as this only covers argument `z < -1/e`.
 565 The cutoff used `abs(z) < 0.05` is as found by trial and error by Fukushima.
 566
 567 Coefficients of the inverted series expansion of the Lambert W function around `z = 0`
 568 are computed following Fukushima using 17 terms of a Taylor series
 569 computed using __Mathematica with
 570
 571   InverseSeries[Series[z Exp[z],{z,0,17}]]
 572
 573 See Tosio Fukushima, Journal of Computational and Applied Mathematics 244 (2013), page 86.
 574
 575 To provide higher precision constants (34 decimal digits) for types larger than `long double`,
 576
 577    InverseSeries[Series[z Exp[z],{z,0,34}]]
 578
 579 were also computed, but for current hardware it was found that evaluating a `double` precision
 580 and then refining with Halley's method was quicker and more accurate.
 581
 582 Decimal values of specifications for built-in floating-point types below
 583 are 21 digits precision == `std::numeric_limits<T>::max_digits10` for `long double`.
 584
 585 Specializations for `lambert_w0_small_z` are provided for
 586 `float`, `double`, `long double`, `float128` and for __multiprecision types.
 587
 588 The `tag_type` selection is based on the value `std::numeric_limits<T>::max_digits10`
 589 (and [*not] on the floating-point type T).
 590 This distinguishes between `long double` types that commonly vary between 64 and 80-bits,
 591 and also compilers that have a `float` type using 64 bits and/or `long double` using 128-bits.
 592
 593 As noted in the __lambert_w_implementation section above,
 594 it is only possible to ensure the nearest representable value by casting from a higher precision type,
 595 computed at very, very much greater cost.
 596
 597 For multiprecision types, first several terms of the series are tabulated and evaluated as a polynomial:
 598 (this will save us a bunch of expensive calls to `pow`).
 599 Then our series functor is initialized "as if" it had already reached term 18,
 600 enough evaluation of built-in 64-bit double and float (and 80-bit `long double`) types.
 601 Finally the functor is called repeatedly to compute as many additional series terms
 602 as necessary to achive the desired precision, set from `get_epsilon`
 603 (or terminated by `evaluation_error` on reaching the set iteration limit `max_series_iterations`).
 604
 605 A little more than one decimal digit of precision is gained by each additional series term.
 606 This allows computation of Lambert W near zero to at least 1000 decimal digit precision,
 607 given sufficient compute time.
 608
 609 [h4:near_singularity Argument z near the singularity at -1/e between branches /W/[sub 0] and /W/[sub -1] ]
 610
 611 Variants of Function `lambert_w_singularity_series` are used to handle ['z] arguments
 612 which are near to the singularity at `z = -exp(-1) = -3.6787944` where the branches /W/[sub 0] and /W/[sub -1] join.
 613
 614 T. Fukushima / Journal of Computational and Applied Mathematics 244 (2013) 77-89
 615 describes using __Mathematica
 616
 617   InverseSeries\[Series\[sqrt\[2(p Exp\[1 + p\] + 1)\], {p,-1, 20}\]\]
 618
 619 to provide his Table 3, page 85.
 620
 621 This implementation used __Mathematica to obtain 40 series terms at 50 decimal digit precision
 622 ``
 623   N\[InverseSeries\[Series\[Sqrt\[2(p Exp\[1 + p\] + 1)\], { p,-1,40 }\]\], 50\]
 624
 625   -1+p-p^2/3+(11 p^3)/72-(43 p^4)/540+(769 p^5)/17280-(221 p^6)/8505+(680863 p^7)/43545600 ...
 626 ``
 627 These constants are computed at compile time for the full precision for any `RealType T`
 628 using the original rationals from Fukushima Table 3.
 629
 630 Longer decimal digits strings are rationals pre-evaluated using __Mathematica.
 631 Some integer constants overflow, so largest size available is used, suffixed by `uLL`.
 632
 633 Above the 14th term, the rationals exceed the range of `unsigned long long` and are replaced
 634 by pre-computed decimal values at least 21 digits precision == `max_digits10` for `long double`.
 635
 636 A macro `BOOST_MATH_TEST_VALUE` (defined in [@../../test/test_value.hpp  test_value.hpp])
 637 taking a decimal floating-point literal was used
 638 to allow testing with both built-in floating-point types like `double`
 639 which have contructors taking literal decimal values like `3.14`,
 640 [*and] also multiprecision and other User-defined Types that only provide full-precision construction
 641 from decimal digit strings like `"3.14"`.
 642 (Construction of multiprecision types from built-in floating-point types
 643 only provides the precision of the built-in type, like `double`, only 17 decimal digits).
 644
 645 [tip Be exceeding careful not to silently lose precision by constructing multiprecision types from literal decimal types, usually [^double]. Use decimal digit strings like "3.1459" instead. See examples.]
 646
 647 Fukushima's implementation used 20 series terms; it was confirmed that using more terms does not usefully increase accuracy.
 648
 649 [h5:wm1_near_zero Lambert /W/[sub -1] arguments values very near zero.]
 650
 651 The lookup tables of Fukushima have only 64 elements,
 652 so that the z argument nearest zero is -1.0264389699511303e-26,
 653 that corresponds to a maximum Lambert /W/[sub -1] value of 64.0.
 654 Fukushima's implementation did not cater for z argument values that are smaller (nearer to zero),
 655 but this implementation adds code to accept smaller (but not denormalised) values of z.
 656 A crude approximation for these very small values is to take the exponent and multiply by ln[10] ~= 2.3.
 657 We also tried the approximation first proposed by Corless et al. using ln(-z), (equation 4.19 page 349)
 658 and then tried improving by a 2nd term -ln(ln(-z)), and finally the ratio term -ln(ln(-z))/ln(-z).
 659
 660 For a z very close to z = -1.0264389699511303e-26 when W = 64, when effect of ln(ln(-z) term, and ratio L1/L2 is greatest,
 661 the possible 'guesses' are
 662
 663   z = -1.e-26, w = -64.02, guess = -64.0277, ln(-z) = -59.8672, ln(-ln(-z) = 4.0921, llz/lz = -0.0684
 664
 665 whereas at the minimum (unnormalized) z
 666
 667   z = -2.2250e-308, w = -714.9, guess = -714.9687, ln(-z) = -708.3964, ln(-ln(-z) = 6.5630, llz/lz = -0.0092
 668
 669 Although the addition of the 3rd ratio term did not reduce the number of Halley iterations needed,
 670 it might allow return of a better low precision estimate [*without any Halley iterations].
 671 For the worst case near w = 64, the error in the 'guess' is 0.008, ratio 0.0001 or 1 in 10,000 digits 10 ~= 4.
 672 Two log evalutations are still needed, but is probably over an order of magnitude faster.
 673
 674 Halley's method was then used to refine the estimate of Lambert /W/[sub -1] from this guess.
 675 Experiments showed that although all approximations reached with __ulp of the closest representable value,
 676 the computational cost of the log functions was easily paid by far fewer iterations
 677 (typically from 8 down to 4 iterations for double or float).
 678 [h5:halley Halley refinement]
 679
 680 After obtaining a double approximation, for `double`, `long double` and `quad` 128-bit precision,
 681 a single iteration should suffice because
 682 Halley iteration should triple the precision with each step
 683 (as long as the function is well behaved - and it is),
 684 and since we have at least half of the bits correct already,
 685 one Halley step is ample to get to 128-bit precision.
 686
 687
 688 [h5:lambert_w_derivatives Lambert W Derivatives]
 689
 690 The derivatives are computed using the formulae in [@https://en.wikipedia.org/wiki/Lambert_W_function#Derivative Wikipedia].
 691
 692 [h4:testing Testing]
 693
 694 Initial testing of the algorithm was done using a small number of spot tests.
 695
 696 After it was established that the underlying algorithm (including unlimited Halley refinements with a tight terminating criterion) was correct,
 697 some tables of Lambert W values were computed using a 100 decimal digit precision __multiprecision `cpp_dec_float_100` type and saved as
 698 a C++ program that will initialise arrays of values of z arguments and lambert_W0 (`lambert_w_mp_high_values.ipp` and `lambert_w_mp_low_values.ipp` ).
 699
 700 (A few of these pairs were checked against values computed by Wolfram Alpha to try to guard against mistakes;
 701 all those tested agreed to the penultimate decimal place, so they can be considered reliable to at least 98 decimal digits precision).
 702
 703 A macro `BOOST_MATH_TEST_VALUE` was used to allow tests with any real type, both __fundamental_types and __multiprecision.
 704 (This is necessary because __fundamental_types have a constructor from floating-point literals like 3.1459F, 3.1459 or 3.1459L
 705 whereas __multiprecision types may lose precision unless constructed from decimal digits strings like "3.1459").
 706
 707 The 100-decimal digits precision pairs were then used to assess the precision of less-precise types, including
 708 __multiprecision `cpp_bin_float_quad` and `cpp_bin_float_50`.  `static_cast`ing from the high precision types should
 709 give the closest representable value of the less-precise type; this is then be used to assess the precision of
 710 the Lambert W algorithm.
 711
 712 Tests using
 713 confirm that over nearly all the range of z arguments,
 714 nearly all estimates are the nearest __representable value, a minority are within 1 __ulp and only a very few 2 ULP.
 715
 716 [graph lambert_w0_errors_graph]
 717 [graph lambert_wm1_errors_graph]
 718
 719 For the range of z arguments over the range -0.35 to 0.5, a different algorithm is used, but the same
 720 technique of evaluating reference values using a __multiprecision `cpp_dec_float_100` was used.
 721 For extremely small z arguments, near zero, and those extremely near the singularity at the branch point,
 722 precision can be much lower, as might be expected.
 723
 724 See source at:
 725 [@../../example/lambert_w_simple_examples.cpp lambert_w_simple_examples.cpp]
 726 [@../../test/test_lambert_w.cpp test_lambert_w.cpp] contains routine tests using __boost_test.
 727 [@../../tools/lambert_w_errors_graph.cpp lambert_w_errors_graph.cpp] generating error graphs.
 728
 729 [h5:quadrature_testing  Testing with quadrature]
 730
 731 A further method of testing over a wide range of argument z values was devised by Nick Thompson
 732 (cunningly also to test the recently written quadrature routines including __multiprecision !).
 733 These are definite integral formulas involving the W function that are exactly known constants,
 734 for example, LambertW0(1/(z[sup2]) == [sqrt](2[pi]),
 735 see [@https://en.wikipedia.org/wiki/Lambert_W_function#Definite_integrals Definite Integrals].
 736 Some care was needed to avoid overflow and underflow as the integral function must evaluate to a finite result over the entire range.
 737
 738 [h5 Other implementations]
 739
 740 The Lambert W has also been discussed in a [@http://lists.boost.org/Archives/boost/2016/09/230819.php Boost thread].
 741
 742 This also gives link to a prototype version by which also gives complex results [^(x < -exp(-1)], about -0.367879).
 743 [@https://github.com/CzB404/lambert_w/ Balazs Cziraki 2016]
 744 Physicist, PhD student at Eotvos Lorand University, ELTE TTK Institute of Physics, Budapest.
 745 has also produced a prototype C++ library that can compute the Lambert W function for floating point
 746 [*and complex number types].
 747 This is not implemented here but might be completed in the future.
 748
 749 [h4:acknowledgements  Acknowledgements]
 750
 751 * Thanks to Wolfram for use of their invaluable online Wolfram Alpha service.
 752 * Thanks for Mark Chapman for performing offline Wolfram computations.
 753
 754 [h4:references References]
 755
 756 # NIST Digital Library of Mathematical Functions. [@http://dlmf.nist.gov/4.13.F1].
 757
 758 # [@http://www.orcca.on.ca/LambertW/ Lambert W Poster],
 759 R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffery and D. E. Knuth,
 760 On the Lambert W function Advances in Computational Mathematics, Vol 5, (1996) pp 329-359.
 761
 762 # [@https://people.sc.fsu.edu/~jburkardt/f_src/toms443/toms443.html TOMS443],
 763 Andrew Barry, S. J. Barry, Patricia Culligan-Hensley,
 764 Algorithm 743: WAPR - A Fortran routine for calculating real values of the W-function,[br]
 765 ACM Transactions on Mathematical Software, Volume 21, Number 2, June 1995, pages 172-181.[br]
 766 BISECT approximates the W function using bisection (GNU licence).
 767 Original FORTRAN77 version by Andrew Barry, S. J. Barry, Patricia Culligan-Hensley,
 768 this version by C++ version by John Burkardt.
 769
 770 # [@https://people.sc.fsu.edu/~jburkardt/f_src/toms743/toms743.html TOMS743] Fortran 90 (updated 2014).
 771
 772 Initial guesses based on:
 773
 774 # R.M.Corless, G.H.Gonnet, D.E.G.Hare, D.J.Jeffrey, and D.E.Knuth,
 775 On the Lambert W function, Adv.Comput.Math., vol. 5, pp. 329 to 359, (1996).
 776
 777 # D.A. Barry, J.-Y. Parlange, L. Li, H. Prommer, C.J. Cunningham, and
 778 F. Stagnitti. Analytical approximations for real values of the Lambert
 779 W-function. Mathematics and Computers in Simulation, 53(1), 95-103 (2000).
 780
 781 # D.A. Barry, J.-Y. Parlange, L. Li, H. Prommer, C.J. Cunningham, and
 782 F. Stagnitti. Erratum to analytical approximations for real values of the
 783 Lambert W-function. Mathematics and Computers in Simulation, 59(6):543-543, 2002.
 784
 785 # C++ __CUDA version of Luu algorithm, [@https://github.com/thomasluu/plog/blob/master/plog.cu plog].
 786
 787 # __Luu_thesis, see routine 11, page 98 for Lambert W algorithm.
 788
 789 # Having Fun with Lambert W(x) Function, Darko Veberic
 790 University of Nova Gorica, Slovenia IK, Forschungszentrum Karlsruhe, Germany, J. Stefan Institute, Ljubljana, Slovenia.
 791
 792 # Fran[ccedil]ois Chapeau-Blondeau and Abdelilah Monir, Numerical Evaluation of the Lambert W Function and Application to Generation of Generalized
 793 Gaussian Noise With Exponent 1/2, IEEE Transactions on Signal Processing, 50(9) (2002) 2160 - 2165.
 794
 795 # Toshio Fukushima, Precise and fast computation of Lambert W-functions without transcendental function evaluations, Journal of Computational and Applied
 796 Mathematics, 244 (2013) 77-89.
 797
 798 # T.C. Banwell and A. Jayakumar, Electronic Letter, Feb 2000, 36(4), pages 291-2.
 799 Exact analytical solution for current flow through diode with series resistance.
 800 [@https://doi.org/10.1049/el:20000301]
 801
 802 # Princeton Companion to Applied Mathematics,
 803 'The Lambert-W function', Section 1.3: Series and Generating Functions.
 804
 805 # Cleve Moler, Mathworks blog [@https://blogs.mathworks.com/cleve/2013/09/02/the-lambert-w-function/#bfba4e2d-e049-45a6-8285-fe8b51d69ce7 The Lambert W Function]
 806
 807 # Digital Library of Mathematical Function, [@https://dlmf.nist.gov/4.13 Lambert W function].
 808
 809 [endsect] [/section:lambert_w Lambert W function]
 810
 811 [/
 812   Copyright 2016 John Maddock, Paul A. Bristow, Thomas Luu, Nicholas Thompson.
 813   Distributed under the Boost Software License, Version 1.0.
 814   (See accompanying file LICENSE_1_0.txt
 815   or copy at http://www.boost.org/LICENSE_1_0.txt).
 816 ]