OpenBLAS ChangeLog
====================================================================
+Version 0.3.11
+ 17-Oct-2020
+
+ common:
+ * API change:
+ the newly added BFLOAT16 functions were renamed to use the
+ letter "B" instead of "H" to avoid potential confusion with
+ the IEEE "half precision float" type, i.e. the 0.3.10
+ SHGEMM is now SBGEMM and the corresponding build option
+ was changed from "BUILD_HALF" to "BUILD_BFLOAT16".
+ * Reduced the default BLAS3_MEM_ALLOC_THRESHOLD (used as an upper
+ limit for placing temporary arrays on the stack) to be compatible
+ with a stack size of 1mb (as imposed by the JAVA runtime library)
+ * Added mixed-precision dot function SBDOT and utility functions
+ shstobf16, shdtobf16, sbf16tos and dbf16tod to convert between
+ single or double precision float arrays and bfloat16 arrays
+ * Fixed prototypes of LAPACK_?ggsvp and LAPACK_?ggsvd functions
+ in lapack.h
+ * Fixed underflow and rounding errors in LAPACK SLANV2 and DLANV2
+ (causing miscalculations in e.g. SHSEQR/DHSEQR, LAPACK issue #263)
+ * Fixed workspace calculation in LAPACK ?GELQ (LAPACK issue #415)
+ * Fixed several bugs in the LAPACK testsuite
+ * Improved performance of TRMM and TRSM for certain problem sizes
+ * Fixed infinite recursions and workspace miscalculations in ReLAPACK
+ * CMAKE builds no longer require pkg-config for creating the .pc file
+ * Makefile builds no longer misread NO_CBLAS=0 or NO_LAPACK=0 as
+ enabling these options
+ * Fixed detection of gfortran when invoked through an mpi wrapper
+ * Improve thread reinitialization performance with OpenMP xafter a fork
+ * Added support for building only the subset of the library required
+ for a particular precision by specifying BUILD_SINGLE, BUILD_DOUBLE
+ * Optional function name prefixes and suffixes are now correctly
+ reflected in the generated cblas.h
+ * Added CMAKE build support for the LAPACK and multithreading tests
+
+POWER:
+ * Added optimized support for POWER10
+ * Added support for compiling for POWER8 in 32bit mode
+ * Added support for compilation with LLVM/clang
+ * Added support for compilation with NVIDIA/PGI compilers
+ * Fixed building on big-endian POWER8
+ * Fixed miscompilation of ZDOTC by gcc10
+ * Fixed alignment errors in the POWER8 SAXPY kernel
+ * Improved CPU detection on AIX
+ * Supported building with older compilers on POWER9
+
+x86_64:
+ * Added support for Intel Cooperlake
+ * Added autodetection of AMD Renoir/Matisse/Zen3 cpus
+ * Added autodetection of Intel Comet Lake cpus
+ * Reimplemented ?sum, ?dot and daxpy using universal intrinsics
+ * Reset the fpu state before using the fpu on Windows as a workaround
+ for a problem introduced in Windows 10 build 19041 (a.k.a. SDK 2004)
+ * Fixed potentially undefined behaviour in the dot and gemv_t kernels
+ * Fixed a potential segmentation fault in DYNAMIC_ARCH builds
+ * Fixed building for ZEN with PGI/NVIDIA and AMD AOCC compilers
+
+ARMV7:
+ * Fixed cpu detection on BSD-like systems
+
+ARMV8:
+ * Added preliminary support for Apple Vortex cpus
+ * Added support for the Cavium ThunderX3T110 cpu
+ * Fixed cpu detection on BSD-like systems
+ * Fixed compilation in -std=C18 mode
+
+
+IBM Z:
+ * Added support for compiling with the clang compiler
+ * Improved GEMM performance on Z14
+
+====================================================================
Version 0.3.10
14-Jun-2020