   1 HALF-PRECISION FLOATING-POINT LIBRARY (Version 2.1.0)
   2 -----------------------------------------------------
   3
   4 This is a C++ header-only library to provide an IEEE 754 conformant 16-bit
   5 half-precision floating-point type along with corresponding arithmetic
   6 operators, type conversions and common mathematical functions. It aims for both
   7 efficiency and ease of use, trying to accurately mimic the behaviour of the
   8 built-in floating-point types at the best performance possible.
   9
  10
  11 INSTALLATION AND REQUIREMENTS
  12 -----------------------------
  13
  14 Conveniently, the library consists of just a single header file containing all
  15 the functionality, which can be directly included by your projects, without the
  16 necessity to build anything or link to anything.
  17
  18 Whereas this library is fully C++98-compatible, it can profit from certain
  19 C++11 features. Support for those features is checked automatically at compile
  20 (or rather preprocessing) time, but can be explicitly enabled or disabled by
  21 predefining the corresponding preprocessor symbols to either 1 or 0 yourself
  22 before including half.hpp. This is useful when the automatic detection fails
  23 (for more exotic implementations) or when a feature should be explicitly
  24 disabled:
  25
  26   - 'long long' integer type for mathematical functions returning 'long long'
  27     results (enabled for VC++ 2003 and icc 11.1 and newer, gcc and clang,
  28     overridable with 'HALF_ENABLE_CPP11_LONG_LONG').
  29
  30   - Static assertions for extended compile-time checks (enabled for VC++ 2010,
  31     gcc 4.3, clang 2.9, icc 11.1 and newer, overridable with
  32     'HALF_ENABLE_CPP11_STATIC_ASSERT').
  33
  34   - Generalized constant expressions (enabled for VC++ 2015, gcc 4.6, clang 3.1,
  35     icc 14.0 and newer, overridable with 'HALF_ENABLE_CPP11_CONSTEXPR').
  36
  37   - noexcept exception specifications (enabled for VC++ 2015, gcc 4.6,
  38     clang 3.0, icc 14.0 and newer, overridable with 'HALF_ENABLE_CPP11_NOEXCEPT').
  39
  40   - User-defined literals for half-precision literals to work (enabled for
  41     VC++ 2015, gcc 4.7, clang 3.1, icc 15.0 and newer, overridable with
  42     'HALF_ENABLE_CPP11_USER_LITERALS').
  43
  44   - Thread-local storage for per-thread floating-point exception flags (enabled
  45     for VC++ 2015, gcc 4.8, clang 3.3, icc 15.0 and newer, overridable with
  46     'HALF_ENABLE_CPP11_THREAD_LOCAL').
  47
  48   - Type traits and template meta-programming features from <type_traits>
  49     (enabled for VC++ 2010, libstdc++ 4.3, libc++ and newer, overridable with
  50     'HALF_ENABLE_CPP11_TYPE_TRAITS').
  51
  52   - Special integer types from <cstdint> (enabled for VC++ 2010, libstdc++ 4.3,
  53     libc++ and newer, overridable with 'HALF_ENABLE_CPP11_CSTDINT').
  54
  55   - Certain C++11 single-precision mathematical functions from <cmath> for
  56     floating-point classification during conversions from higher precision types
  57     (enabled for VC++ 2013, libstdc++ 4.3, libc++ and newer, overridable with
  58     'HALF_ENABLE_CPP11_CMATH').
  59
  60   - Floating-point environment control from <cfenv> for possible exception
  61     propagation to the built-in floating-point platform (enabled for VC++ 2013,
  62     libstdc++ 4.3, libc++ and newer, overridable with 'HALF_ENABLE_CPP11_CFENV').
  63
  64   - Hash functor 'std::hash' from <functional> (enabled for VC++ 2010,
  65     libstdc++ 4.3, libc++ and newer, overridable with 'HALF_ENABLE_CPP11_HASH').
  66
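The "detect automatically unless predefined" scheme described above can be
sketched in isolation. The 'DEMO_ENABLE_CPP11_CONSTEXPR' and 'DEMO_CONSTEXPR'
macros and the 'twice' function are hypothetical stand-ins, not the library's
actual detection code. With the real library you would instead define e.g.
'HALF_ENABLE_CPP11_CONSTEXPR' to 0 or 1 before including half.hpp.

```cpp
#include <cassert>

// Sketch only: the DEMO_* names are hypothetical and not taken from half.hpp.
// A user may predefine DEMO_ENABLE_CPP11_CONSTEXPR to 0 or 1 before this
// point; otherwise it is detected from the reported language version.
#ifndef DEMO_ENABLE_CPP11_CONSTEXPR
    #if __cplusplus >= 201103L
        #define DEMO_ENABLE_CPP11_CONSTEXPR 1
    #else
        #define DEMO_ENABLE_CPP11_CONSTEXPR 0
    #endif
#endif

// The rest of the header only ever looks at the resolved 0/1 value.
#if DEMO_ENABLE_CPP11_CONSTEXPR
    #define DEMO_CONSTEXPR constexpr
#else
    #define DEMO_CONSTEXPR
#endif

DEMO_CONSTEXPR int twice(int x) { return 2 * x; }

// Usage: behaves identically at runtime whichever branch was selected.
inline int twice_demo() {
    const int result = twice(21);
    assert(result == 42);
    return result;
}
```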
  67 The library has been tested successfully with Visual C++ 2005-2015, gcc 4-8
  68 and clang 3-8 on 32- and 64-bit x86 systems. Please contact me if you have any
  69 problems, suggestions or even just success testing it on other platforms.
  70
  71
  72 DOCUMENTATION
  73 -------------
  74
  75 What follows are some general words about the usage of the library and its
  76 implementation. For complete documentation of its interface, consult the
  77 corresponding website http://half.sourceforge.net. You may also generate the
  78 complete developer documentation from the doxygen comments in the library's
  79 single header file, but this is more relevant to developers than to mere users.
  80
  81 BASIC USAGE
  82
  83 To make use of the library just include its only header file half.hpp, which
  84 defines all half-precision functionality inside the 'half_float' namespace. The
  85 actual 16-bit half-precision data type is represented by the 'half' type, which
  86 uses the standard IEEE representation with 1 sign bit, 5 exponent bits and 11
  87 mantissa bits (including the hidden bit) and supports all types of special
  88 values, like subnormal values, infinity and NaNs. This type behaves like the
  89 built-in floating-point types as much as possible, supporting the usual
  90 arithmetic, comparison and streaming operators, which makes its use pretty
  91 straightforward:
  92
  93     using half_float::half;
  94     half a(3.4), b(5);
  95     half c = a * b;
  96     c += 3;
  97     if(c > a)
  98         std::cout << c << std::endl;
  99
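To make the bit layout described above concrete (1 sign bit, 5 exponent bits
and 10 stored mantissa bits plus the hidden bit), here is a minimal sketch that
decodes a raw 16-bit pattern into a float. The 'decode_half_bits' helper is
hypothetical and written only for illustration; it is not how the library
performs its conversions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a raw IEEE 754 binary16 bit pattern into a float.
// Hypothetical helper for illustration; not part of half.hpp.
inline float decode_half_bits(std::uint16_t bits) {
    const int sign     = (bits >> 15) & 0x1;   // 1 sign bit
    const int exponent = (bits >> 10) & 0x1F;  // 5 exponent bits, bias 15
    const int mantissa = bits & 0x3FF;         // 10 stored mantissa bits
    float magnitude;
    if (exponent == 0x1F) {
        // All exponent bits set: infinity (mantissa 0) or NaN (otherwise).
        magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                             : std::numeric_limits<float>::infinity();
    } else if (exponent == 0) {
        // Zero or subnormal: no hidden bit, fixed scale of 2^-24.
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else {
        // Normal: the hidden bit (0x400) yields 11 significant bits.
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}

// Usage: a few well-known bit patterns.
inline void decode_half_bits_examples() {
    assert(decode_half_bits(0x3C00) == 1.0f);      // one
    assert(decode_half_bits(0xC000) == -2.0f);     // minus two
    assert(decode_half_bits(0x7BFF) == 65504.0f);  // largest finite value
    assert(std::isinf(decode_half_bits(0x7C00)));  // positive infinity
    assert(std::isnan(decode_half_bits(0x7E00)));  // a NaN
}
```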
 100 Additionally the 'half_float' namespace also defines half-precision versions
 101 for all mathematical functions of the C++ standard library, which can be used
 102 directly through ADL:
 103
 104     half a(-3.14159);
 105     half s = sin(abs(a));
 106     long l = lround(s);
 107
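The reason the calls above need no 'half_float::' qualification is
argument-dependent lookup (ADL): an unqualified call also searches the
namespace of its argument types. A minimal sketch of the mechanism, using a
hypothetical 'toy' namespace and 'toy::number' type in place of 'half_float'
and 'half':

```cpp
#include <cassert>

namespace toy {                        // hypothetical stand-in for 'half_float'
    struct number { double value; };   // hypothetical stand-in for 'half'

    // Found by ADL whenever the argument is a toy::number.
    inline number abs(number x) {
        number result = { x.value < 0 ? -x.value : x.value };
        return result;
    }
}

inline double adl_example() {
    toy::number n = { -3.0 };
    // No 'toy::' qualification and no using-declaration: the compiler searches
    // the namespace of the argument type and finds toy::abs.
    const double magnitude = abs(n).value;
    assert(magnitude >= 0.0);
    return magnitude;
}
```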
 108 You may also specify explicit half-precision literals, since the library
 109 provides a user-defined literal inside the 'half_float::literal' namespace,
 110 which you just need to import (assuming support for C++11 user-defined literals):
 111
 112     using namespace half_float::literal;
 113     half x = 1.0_h;
 114
 115 Furthermore the library provides proper specializations for
 116 'std::numeric_limits', defining various implementation properties, and
 117 'std::hash' for hashing half-precision numbers (assuming support for C++11
 118 'std::hash'). Similar to the corresponding preprocessor symbols from <cmath>
 119 the library also defines the 'HUGE_VALH' constant and maybe the 'FP_FAST_FMAH'
 120 symbol.
 121
 122 CONVERSIONS AND ROUNDING
 123
 124 The half is explicitly constructible/convertible from a single-precision float
 125 argument. Thus it is also explicitly constructible/convertible from any type
 126 implicitly convertible to float, but constructing it from types like double or
 127 int will involve the usual warnings arising when implicitly converting those to
 128 float because of the lost precision. On the one hand those warnings are
 129 intentional, because converting those types to half necessarily also reduces
 130 precision. But on the other hand they are also raised for explicit conversions
 131 from those types, when the user knows what they are doing. So if those warnings
 132 keep bugging you, you will have to either convert explicitly to float first
 133 and then to half, or use the 'half_cast' described below. In addition
 134 you can also directly assign float values to halfs.
 135
 136 In contrast to the float-to-half conversion, which reduces precision, the
 137 conversion from half to float (and thus to any other type implicitly
 138 convertible from float) is implicit, because all values representable with
 139 half-precision are also representable with single-precision. This way the
 140 half-to-float conversion behaves similarly to the built-in float-to-double
 141 conversion and all arithmetic expressions involving both half-precision and
 142 single-precision arguments will be of single-precision type. This way you can
 143 also directly use the mathematical functions of the C++ standard library,
 144 though in this case you will invoke the single-precision versions which will
 145 also return single-precision values, which is (even if maybe performing the
 146 exact same computation, see below) not as conceptually clean when working in a
 147 half-precision environment.
 148
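The asymmetry described above (explicit when narrowing from float, implicit
when widening to float) is plain C++ and can be sketched with a toy type. The
'toy_half' class is hypothetical and simply stores a float; it only mirrors the
shape of the conversions, not the real 16-bit representation.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical toy type: stores a float internally, unlike the real 'half'.
class toy_half {
public:
    explicit toy_half(float f) : value_(f) {}   // narrowing: must be spelled out
    operator float() const { return value_; }   // widening: allowed implicitly
private:
    float value_;
};

inline float mixed_sum() {
    toy_half h(1.5f);
    // toy_half bad = 1.5f;   // would not compile: the constructor is explicit
    // h converts to float implicitly, so the mixed expression has type float.
    static_assert(std::is_same<decltype(h + 2.0f), float>::value,
                  "mixed arithmetic yields single precision");
    const float sum = h + 2.0f;
    assert(sum == 3.5f);
    return sum;
}
```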
 149 The default rounding mode for conversions between half and more precise types
 150 as well as for rounding results of arithmetic operations and mathematical
 151 functions rounds to the nearest representable value. But by predefining the
 152 'HALF_ROUND_STYLE' preprocessor symbol this default can be overridden with one
 153 of the other standard rounding modes using their respective constants or the
 154 equivalent values of 'std::float_round_style' (it can even be synchronized with
 155 the built-in single-precision implementation by defining it to
 156 'std::numeric_limits<float>::round_style'):
 157
 158   - 'std::round_indeterminate' (-1) for the fastest rounding.
 159
 160   - 'std::round_toward_zero' (0) for rounding toward zero.
 161
 162   - 'std::round_to_nearest' (1) for rounding to the nearest value (default).
 163
 164   - 'std::round_toward_infinity' (2) for rounding toward positive infinity.
 165
 166   - 'std::round_toward_neg_infinity' (3) for rounding toward negative infinity.
 167
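The numbers in parentheses in the list above are the values of the standard
'std::float_round_style' enumerators, which the following sketch checks at
compile time. The 'round_style_macro_value' helper is hypothetical; it merely
yields the integer one could supply when predefining 'HALF_ROUND_STYLE'.

```cpp
#include <cassert>
#include <limits>

// The numeric values listed above are those of std::float_round_style.
static_assert(std::round_indeterminate       == -1, "fastest rounding");
static_assert(std::round_toward_zero         ==  0, "toward zero");
static_assert(std::round_to_nearest          ==  1, "to nearest (default)");
static_assert(std::round_toward_infinity     ==  2, "toward positive infinity");
static_assert(std::round_toward_neg_infinity ==  3, "toward negative infinity");

// Hypothetical helper: the integer to use when predefining HALF_ROUND_STYLE
// for a given standard rounding style.
inline int round_style_macro_value(std::float_round_style style) {
    const int value = static_cast<int>(style);
    assert(value >= -1 && value <= 3);
    return value;
}
```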
 168 In addition to changing the overall default rounding mode one can also use the
 169 'half_cast'. This converts between half and any built-in arithmetic type using
 170 a configurable rounding mode (or the default rounding mode if none is
 171 specified). In addition to a configurable rounding mode, 'half_cast' has
 172 another big difference to a mere 'static_cast': Any conversions are performed
 173 directly using the given rounding mode, without any intermediate conversion
 174 to/from 'float'. This is especially relevant for conversions to integer types,
 175 which don't necessarily truncate anymore. But also for conversions from
 176 'double' or 'long double' this may produce more precise results than a
 177 pre-conversion to 'float' using the single-precision implementation's current
 178 rounding mode would.
 179
 180     half a = half_cast<half>(4.2);
 181     half b = half_cast<half,std::numeric_limits<float>::round_style>(4.2f);
 182     assert( half_cast<int, std::round_to_nearest>( 0.7_h )     == 1 );
 183     assert( half_cast<half,std::round_toward_zero>( 4097 )     == 4096.0_h );
 184     assert( half_cast<half,std::round_toward_infinity>( 4097 ) == 4100.0_h );
 185     assert( half_cast<half,std::round_toward_infinity>( std::numeric_limits<double>::min() ) > 0.0_h );
 186
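The 4097 examples above follow from the spacing of half-precision values:
with 11 significant bits, neighbouring values between 4096 and 8192 are 4
apart. The sketch below reproduces that arithmetic with doubles. The
'round_to_half_grid' helper is hypothetical, handles only finite inputs up to
the largest finite half value (65504), and assumes that round-to-nearest ties
go to the even neighbour; it is not the library's conversion routine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Round 'value' onto the grid of numbers with 11 significant bits.
// Hypothetical helper; any style other than the three directed ones is
// treated as round-to-nearest with ties to even (an assumption of this sketch).
inline double round_to_half_grid(double value, std::float_round_style style) {
    assert(std::isfinite(value) && std::fabs(value) <= 65504.0);
    if (value == 0.0) return value;
    int exp2 = 0;
    std::frexp(value, &exp2);             // |value| lies in [2^(exp2-1), 2^exp2)
    // Grid spacing for 11 significant bits, never finer than 2^-24 (subnormals).
    const double spacing = std::ldexp(1.0, std::max(exp2 - 11, -24));
    const double q = value / spacing;     // exact: spacing is a power of two
    const double lower = std::floor(q);
    double rounded;
    switch (style) {
        case std::round_toward_zero:         rounded = std::trunc(q); break;
        case std::round_toward_infinity:     rounded = std::ceil(q);  break;
        case std::round_toward_neg_infinity: rounded = lower;         break;
        default: {
            const double frac = q - lower;                    // in [0, 1)
            const bool lower_is_odd = std::fmod(lower, 2.0) != 0.0;
            rounded = (frac > 0.5 || (frac == 0.5 && lower_is_odd)) ? lower + 1.0
                                                                    : lower;
        }
    }
    return rounded * spacing;
}
```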
 187 ACCURACY AND PERFORMANCE
 188
 189 From version 2.0 onward the library is implemented without employing the
 190 underlying floating-point implementation of the system (except for conversions,
 191 of course), providing an entirely self-contained half-precision implementation
 192 with results independent from the system's existing single- or double-precision
 193 implementation and its rounding behaviour.
 194
 195 As to accuracy, many of the operators and functions provided by this library
 196 are exact to rounding for all rounding modes, i.e. the error to the exact
 197 result is at most 0.5 ULP (unit in the last place) for rounding to nearest and
 198 less than 1 ULP for all other rounding modes. This holds for all the operations
 199 required by the IEEE 754 standard and many more. Specifically the following
 200 functions might exhibit a deviation from the correctly rounded exact result by
 201 1 ULP for a select few input values: 'expm1', 'log1p', 'pow', 'atan2', 'erf',
 202 'erfc', 'lgamma', 'tgamma' (for more details see the documentation of the
 203 individual functions). All other functions and operators are always exact to
 204 rounding or independent of the rounding mode altogether.
 205
 206 The increased IEEE-conformance and cleanliness of this implementation comes
 207 with a certain performance cost compared to doing computations and mathematical
 208 functions in hardware-accelerated single-precision. On average and depending on
 209 the platform, the arithmetic operators are about 75% as fast and the
 210 mathematical functions about 33-50% as fast as performing the corresponding
 211 operations in single-precision and converting between the inputs and outputs.
 212 However, directly computing with half-precision values is a rather rare
 213 use-case and usually using actual 'float' values for all computations and
 214 temporaries and using 'half's only for storage is the recommended way. But
 215 nevertheless the goal of this library was to provide a complete and
 216 conceptually clean IEEE-conformant half-precision implementation and in the few
 217 cases when you do need to compute directly in half-precision you do so for a
 218 reason and want accurate results.
 219
 220 If necessary, this internal implementation can be overridden by predefining the
 221 'HALF_ARITHMETIC_TYPE' preprocessor symbol to one of the built-in
 222 floating-point types ('float', 'double' or 'long double'), which will cause the
 223 library to use this type for computing arithmetic operations and mathematical
 224 functions (if available). However, due to using the platform's floating-point
 225 implementation (and its rounding behaviour) internally, this might cause
 226 results to deviate from the specified half-precision rounding mode. It will of
 227 course also inhibit the automatic exception detection described below.
 228
 229 The conversion operations between half-precision and single-precision types can
 230 also make use of the F16C extension for x86 processors by using the
 231 corresponding compiler intrinsics from <immintrin.h>. Support for this is
 232 checked at compile-time by looking for the '__F16C__' macro which at least gcc
 233 and clang define based on the target platform. It can also be enabled manually
 234 by predefining the 'HALF_ENABLE_F16C_INTRINSICS' preprocessor symbol to 1, or 0
 235 for explicitly disabling it. However, this will directly use the corresponding
 236 intrinsics for conversion without checking if they are available at runtime
 237 (possibly crashing if they are not), so make sure they are supported on the
 238 target platform before enabling this.
 239
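Since the intrinsics are used without a runtime check, an application that
force-enables them may want its own guard. The following sketch shows one
possible shape, assuming gcc or clang on x86: it queries CPUID leaf 1 (ECX bit
29 reports F16C) via <cpuid.h> and only then touches the '_cvtss_sh' and
'_cvtsh_ss' intrinsics. All function names and the 'DEMO_HAVE_CPUID' macro are
hypothetical, and the check deliberately ignores operating system support for
the required register state.

```cpp
#include <cassert>
#if defined(__F16C__)
    #include <immintrin.h>
#endif
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    #include <cpuid.h>
    #define DEMO_HAVE_CPUID 1
#else
    #define DEMO_HAVE_CPUID 0
#endif

// Hypothetical runtime guard: does the executing CPU report F16C?
// Conservatively answers false where CPUID cannot be queried this way.
inline bool f16c_available_at_runtime() {
#if DEMO_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    return (ecx & (1u << 29)) != 0;   // CPUID leaf 1, ECX bit 29 = F16C
#else
    return false;
#endif
}

// Round-trip a float through half precision with the hardware conversion when
// it was compiled in and is present; otherwise return the input unchanged.
inline float roundtrip_via_f16c(float x) {
#if defined(__F16C__)
    if (f16c_available_at_runtime()) {
        const unsigned short h = _cvtss_sh(x, 0);   // 0 = round to nearest even
        return _cvtsh_ss(h);
    }
#endif
    return x;
}

// Usage: values exactly representable in half precision survive on both paths.
inline void f16c_examples() {
    assert(roundtrip_via_f16c(1.5f) == 1.5f);
    assert(roundtrip_via_f16c(-0.25f) == -0.25f);
}
```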
 240 EXCEPTION HANDLING
 241
 242 The half-precision implementation supports all 5 required floating-point
 243 exceptions from the IEEE standard to indicate erroneous inputs or inexact
 244 results during operations. These are represented by exception flags which
 245 actually use the same values as the corresponding 'FE_...' flags defined in
 246 C++11's <cfenv> header if supported, specifically:
 247
 248   - 'FE_INVALID' for invalid inputs to an operation.
 249   - 'FE_DIVBYZERO' for finite inputs producing infinite results.
 250   - 'FE_OVERFLOW' if a result is too large to represent finitely.
 251   - 'FE_UNDERFLOW' for a subnormal or zero result after rounding.
 252   - 'FE_INEXACT' if a result needed rounding to be representable.
 253   - 'FE_ALL_EXCEPT' as a convenient OR of all possible exception flags.
 254
 255 The internal exception flag state will start with all flags cleared and is
 256 maintained per thread if C++11 thread-local storage is supported, otherwise it
 257 will be maintained globally and will theoretically NOT be thread-safe (while
 258 practically being as thread-safe as a simple integer variable can be). These
 259 flags can be managed explicitly using the library's error handling functions,
 260 which again try to mimic the built-in functions for handling floating-point
 261 exceptions from <cfenv>. You can clear them with 'feclearexcept' (which is the
 262 only way a flag can be cleared), test them with 'fetestexcept', explicitly
 263 raise errors with 'feraiseexcept' and save and restore their state using
 264 'fegetexceptflag' and 'fesetexceptflag'. You can also throw corresponding C++
 265 exceptions based on the current flag state using 'fethrowexcept'.
 266
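Because the library's flag functions are modelled on their <cfenv> namesakes,
the clear / raise / test / save / restore cycle can be sketched with the
standard functions alone. Note that this sketch operates on the platform's
floating-point environment, whereas the library's functions in the
'half_float' namespace manage its own internal flag state. The
'flag_lifecycle_demo' function is hypothetical.

```cpp
#include <cassert>
#include <cfenv>

// Walks through the flag lifecycle using the standard <cfenv> functions.
// Returns true if every step behaved as described in the text.
inline bool flag_lifecycle_demo() {
    std::feclearexcept(FE_ALL_EXCEPT);                     // start with all clear
    if (std::fetestexcept(FE_ALL_EXCEPT) != 0) return false;

    std::feraiseexcept(FE_DIVBYZERO);                      // raise explicitly
    if (!std::fetestexcept(FE_DIVBYZERO)) return false;    // flag is now set
    if (std::fetestexcept(FE_INVALID)) return false;       // others untouched

    std::fexcept_t saved;
    std::fegetexceptflag(&saved, FE_ALL_EXCEPT);           // save the state
    std::feclearexcept(FE_ALL_EXCEPT);                     // only clearing resets
    if (std::fetestexcept(FE_DIVBYZERO)) return false;

    std::fesetexceptflag(&saved, FE_ALL_EXCEPT);           // restore the state
    const bool restored = std::fetestexcept(FE_DIVBYZERO) != 0;
    std::feclearexcept(FE_ALL_EXCEPT);                     // leave things clean
    assert(std::fetestexcept(FE_ALL_EXCEPT) == 0);
    return restored;
}
```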
 267 However, any automatic exception detection and handling during half-precision
 268 operations and functions is DISABLED by default, since it comes with a minor
 269 performance overhead due to runtime checks, and reacting to IEEE floating-point
 270 exceptions is rarely ever needed in application code. But the library fully
 271 supports IEEE-conformant detection of floating-point exceptions and various
 272 ways for handling them, which can be enabled by pre-defining the corresponding
 273 preprocessor symbols to 1. They can be enabled individually or all at once and
 274 they will be processed in the order they are listed here:
 275
 276   - 'HALF_ERRHANDLING_FLAGS' sets the internal exception flags described above
 277     whenever the corresponding exception occurs.
 278   - 'HALF_ERRHANDLING_ERRNO' sets the value of 'errno' from <cerrno> similar to
 279     the behaviour of the built-in floating-point types when 'MATH_ERRNO' is used.
 280   - 'HALF_ERRHANDLING_FENV' will propagate exceptions to the built-in
 281     floating-point implementation using 'std::feraiseexcept' if support for
 282     C++11 floating-point control is enabled. However, this does not synchronize
 283     exceptions: neither will clearing propagate nor will it work in reverse.
 284   - 'HALF_ERRHANDLING_THROW_...' can be defined to a string literal which will
 285     be used as description message for a C++ exception that is thrown whenever
 286     a 'FE_...' exception occurs, similar to the behaviour of 'fethrowexcept'.
 287
 288 If any of the above error handling is activated, non-quiet operations on
 289 half-precision values will also raise a 'FE_INVALID' exception whenever
 290 they encounter a signaling NaN value, in addition to transforming the value
 291 into a quiet NaN. If error handling is disabled, signaling NaNs will be
 292 treated like quiet NaNs (while still getting explicitly quieted if propagated
 293 to the result). There can also be additional treatment of overflow and
 294 underflow errors after they have been processed as above, which is ENABLED by
 295 default (but of course only takes effect if any other exception handling is
 296 activated) unless overridden by pre-defining the corresponding preprocessor
 297 symbol to 0:
 298
 299   - 'HALF_ERRHANDLING_OVERFLOW_TO_INEXACT' will cause overflow errors to also
 300     raise a 'FE_INEXACT' exception.
 301   - 'HALF_ERRHANDLING_UNDERFLOW_TO_INEXACT' will cause underflow errors to also
 302     raise a 'FE_INEXACT' exception. This will also slightly change the
 303     behaviour of the underflow exception, which will ONLY be raised if the
 304     result is actually inexact due to underflow. If this is disabled, underflow
 305     exceptions will be raised for ANY (possibly exact) subnormal result.
 306
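The effect of 'HALF_ERRHANDLING_UNDERFLOW_TO_INEXACT' on underflow reporting
can be summarised as a small decision function. This is a sketch of the rule
as worded above, not the library's code: the 'underflow_flags' helper and its
parameters are hypothetical, and it models only the underflow path, ignoring
any other source of 'FE_INEXACT'.

```cpp
#include <cassert>
#include <cfenv>

// Flags raised by the underflow handling for one result, per the description.
//   tiny                 - the result landed in the subnormal range
//   inexact              - rounding changed the result's value
//   underflow_to_inexact - value of HALF_ERRHANDLING_UNDERFLOW_TO_INEXACT
inline int underflow_flags(bool tiny, bool inexact, bool underflow_to_inexact) {
    assert(FE_UNDERFLOW != 0 && FE_INEXACT != 0);
    if (!tiny) return 0;                  // no underflow handling involved
    if (underflow_to_inexact) {
        // Underflow is reported only for inexact results, together with inexact.
        return inexact ? (FE_UNDERFLOW | FE_INEXACT) : 0;
    }
    // Disabled: any subnormal result reports underflow, even an exact one.
    return FE_UNDERFLOW;
}
```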
 307
 308 CREDITS AND CONTACT
 309 -------------------
 310
 311 This library is developed by CHRISTIAN RAU and released under the MIT License
 312 (see LICENSE.txt). If you have any questions or problems with it, feel free to
 313 contact me at rauy@users.sourceforge.net.
 314
 315 Additional credit goes to JEROEN VAN DER ZIJP for his paper on "Fast Half Float
 316 Conversions", whose algorithms have been used in the library for converting
 317 between half-precision and single-precision values.