X-Git-Url: http://review.tizen.org/git/?a=blobdiff_plain;f=Changelog.txt;h=205ca02e2ed18f3eb11cfc1f3900c945b12bb222;hb=refs%2Fheads%2Faccepted%2Ftizen_6.0_unified;hp=c59166c38fdd264ead023e75c68f19d93cebf524;hpb=a71e8c82f6a9f73093b631e5deab1e8da716b61f;p=platform%2Fupstream%2Fopenblas.git
diff --git a/Changelog.txt b/Changelog.txt
index c59166c..205ca02 100644
--- a/Changelog.txt
+++ b/Changelog.txt
@@ -1,5 +1,446 @@
 OpenBLAS ChangeLog
 ====================================================================
+Version 0.3.7
+11-Aug-2019
+
+common:
+ * having the gmake special variables TARGET_ARCH or TARGET_MACH
+   defined no longer causes build failures in ctest or utest
+ * defining NO_AFFINITY or USE_TLS to 0 in gmake builds no longer
+   has the same effect as setting them to 1
+ * a new test program was added to allow checking the library for
+   thread safety
+ * a new option USE_LOCKING was added to ensure thread safety when
+   OpenBLAS itself is built without multithreading but will be
+   called from multiple threads.
+ * a build failure on Linux with glibc versions earlier than 2.5
+   was fixed
+ * a runtime error with CPU enumeration (and NO_AFFINITY not set)
+   on glibc 2.6 was fixed
+ * NO_AFFINITY was added to the CMAKE options (and defaults to being
+   active on Linux, as in the gmake builds)
+
+x86_64:
+ * the build-time logic for detection of AVX512 availability in
+   the processor and compiler was fixed
+ * gmake builds on OSX now set the internal name of the library to
+   libopenblas.0.dylib (consistent with CMAKE)
+ * the Haswell DGEMM kernel received a significant speedup through
+   improved prefetch and load instructions
+ * performance of DGEMM, DTRMM, DTRSM and ZDOT on Zen/Zen2 was markedly
+   increased by avoiding vpermpd instructions
+ * the SKYLAKEX (AVX512) DGEMM helper functions have now been disabled
+   to fix remaining errors in DGEMM, DSYMM and DTRMM
+
+POWER:
+ * added support for building on FreeBSD/powerpc64 and FreeBSD/ppc970
+ * added optimized kernels for POWER9 single and double precision complex BLAS3
+ * added optimized kernels for POWER9 SGEMM and STRMM
+
+ARMV7:
+ * fixed the softfp implementations of xAMAX and IxAMAX
+ * removed the predefined -march= flags on both ARMV5 and ARMV6 as
+   they were appropriate for only a subset of platforms
+
+====================================================================
+Version 0.3.6
+29-Apr-2019
+
+common:
+ * the build tools now check that a given cpu TARGET is actually valid
+ * the build-time check of system features (c_check) has been made
+   less dependent on particular perl features (this should mainly
+   benefit building on Windows)
+ * several problems with the ReLAPACK integration were fixed,
+   including INTERFACE64 support and building a shared library
+ * building with CMAKE on BSD systems was improved
+ * a non-absolute SUM function was added based on the
+   existing optimized code for ASUM
+ * CBLAS interfaces to the IxMIN and IxMAX functions were added
+ * a name clash between LAPACKE and BOOST headers was resolved
+ * a failure of CMAKE builds with OpenMP to include the appropriate getrf_parallel
+   kernels was fixed
+ * a crash on thread (key) deletion with the USE_TLS=1 memory management
+   option was fixed
+ * restored several earlier fixes, in particular for OpenMP performance,
+   building on BSD, and calling fork on CYGWIN, which had inadvertently
+   been dropped in the 0.3.3 rewrite of the memory management code.
+
+x86_64:
+ * the AVX512 DGEMM kernel has been disabled again due to unresolved problems
+ * building with old versions of MSVC was fixed
+ * it is now possible to build a static library on Windows with CMAKE
+ * accessing environment variables on CYGWIN at run time was fixed
+ * the CMAKE build system now recognizes 32bit userspace on 64bit hardware
+ * Intel "Denverton" Atom and Hygon "Dhyana" Zen CPUs are now autodetected
+ * building for DYNAMIC_ARCH with a DYNAMIC_LIST of targets is now supported
+   with CMAKE as well
+ * building for DYNAMIC_ARCH with GENERIC as the default target is now supported
+ * a buffer overflow in the SSE GEMM kernel for Intel Nano targets was fixed
+ * assembly bugs involving undeclared modification of input operands were fixed
+   in the AXPY, DOT, GEMV, GER, SCAL, SYMV and TRSM microkernels for Nehalem,
+   Sandybridge, Haswell, Bulldozer and Piledriver. These would typically cause
+   test failures or segfaults when compiled with recent versions of gcc from 8 onward.
+ * a similar bug was fixed in the blas_quickdivide code used to split workloads
+   in most functions
+ * fixed a bug in the IxMIN implementation for the GENERIC target that made it return the result of IxMAX
+ * fixed building on SkylakeX systems when either the compiler or the (emulated) operating
+   environment does not support AVX512
+ * improved GEMM performance on ZEN targets
+
+x86:
+ * build failures caused by the recently added checks for AVX512 were fixed
+ * an inline assembly bug involving undeclared modification of an input argument was
+   fixed in the blas_quickdivide code used to split workloads in most functions
+ * fixed a bug in the IMIN implementation for the GENERIC target that made it return the result of IMAX
+
+MIPS32:
+ * fixed a bug in the IMIN implementation that made it return the result of IMAX
+
+POWER:
+ * single precision BLAS1/2 functions have received optimized POWER8 kernels
+ * POWER9 is now a separate target, with an optimized DGEMM/DTRMM kernel
+ * building on PPC970 systems under OSX Leopard or Tiger is now supported
+ * out-of-bounds memory accesses in the gemm_beta microkernels were fixed
+ * building a shared library on AIX is now supported for POWER6
+ * DYNAMIC_ARCH support has been added for POWER6 and newer
+
+ARMV7:
+ * corrected xDOT behaviour with zero INC_X or INC_Y
+ * fixed a bug in the IMIN implementation that made it return the result of IMAX
+
+ARMV8:
+ * added support for HiSilicon TSV110 cpus
+ * the CMAKE build system now recognizes 32bit userspace on 64bit hardware
+ * cross-compilation with CMAKE now works again
+ * fixed a bug in the IMIN implementation that made it return the result of IMAX
+ * ARMV8 builds with the BINARY=32 option are now automatically handled as ARMV7
+
+IBM Z:
+ * optimized microkernels for single precision BLAS1/2 functions have been added
+   for both Z13 and Z14
+
+====================================================================
+Version 0.3.5
+31-Dec-2018
+
+common:
+ * loop unrolling in TRMV has been enabled again.
+ * A domain error in the thread workload distribution for SYRK
+   has been fixed.
+ * gmake builds will now automatically add -fPIC to the build
+   options if the platform requires it.
+ * a pthreads key leak (and the associated crash on dlclose) in
+   the USE_TLS codepath was fixed.
+ * building of the utest cases on systems that do not provide
+   an implementation of complex.h was fixed.
+
+x86_64:
+ * the SkylakeX code was changed to compile on OSX.
+ * unwanted application of the -march=skylake-avx512 option
+   to the common code parts of a DYNAMIC_ARCH build was fixed.
+ * improved performance of SGEMM for small workloads on Skylake X.
+ * performance of SGEMM and DGEMM was improved on Haswell.
+
+ARMV8:
+ * a configuration error that broke the CNRM2 kernel was corrected.
+ * compilation of the GEMM kernels with CMAKE was fixed.
+ * DYNAMIC_ARCH builds are now available with CMAKE as well.
+ * using CMAKE for cross-compilation to the new cpu TARGETs
+   introduced in 0.3.4 now works.
+
+POWER:
+ * a problem in cpu autodetection for AIX has been corrected.
+
+====================================================================
+Version 0.3.4
+02-Dec-2018
+
+common:
+ * the new, experimental thread-local memory allocation had
+   inadvertently been left enabled for gmake builds in 0.3.3
+   despite the announcement. It is now disabled by default, and
+   single-threaded builds will keep using the old allocator even
+   if the USE_TLS option is turned on.
+ * OpenBLAS will now provide enough buffer space for at least 50
+   threads by default.
+ * The output of openblas_get_config() now contains the version
+   number.
+ * A serious thread safety bug in GEMV operation with small M and
+   large N size has been fixed.
+ * The code will now automatically call blas_thread_init after a
+   fork if needed before handling a call to openblas_set_num_threads
+ * Accesses to parallelized level3 functions from multiple callers
+   are now serialized to avoid thread races (unless using OpenMP).
+   This should provide better performance than the known-threadsafe
+   (but non-default) USE_SIMPLE_THREADED_LEVEL3 option.
+ * When building LAPACK with gfortran, -frecursive is now (again)
+   enabled by default to ensure correct behaviour.
+ * The OpenBLAS version of cblas.h now supports both CBLAS_ORDER and
+   CBLAS_LAYOUT as the name of the matrix row/column order option.
+ * Externally set LDFLAGS are now passed through to the final compile/link
+   steps to facilitate setting platform-specific linker flags.
+ * A potential race condition during the build of LAPACK (that would
+   usually manifest itself as a failure to build TESTING/MATGEN) has been
+   fixed.
+ * xHEMV has been changed to stay single-threaded for small input sizes
+   where the overhead of multithreading exceeds any possible gains
+ * CSWAP and ZSWAP have been limited to a single thread except on ARMV8 or
+   ThunderX hardware with sizable input.
+ * Linker flags for the PGI compiler have been updated
+ * Behaviour of AXPY with zero increments is now handled in the C interface,
+   correcting the result on at least Intel Atom.
+ * The result matrix from calling SGELSS with an all-zero input matrix is
+   now zeroed completely.
+
+x86_64:
+ * Autodetection of AMD Ryzen2 has been fixed (again).
+ * CMAKE builds now support labeling of an INTERFACE64=1 build of
+   the library with the _64 suffix.
+ * An AVX512 version of DGEMM has been added, and the AVX512 SGEMM kernel
+   has been sped up by rewriting it with C intrinsics
+ * Fixed compilation on RHEL5/CENTOS5 (issue with typename __WAIT_STATUS)
+
+POWER:
+ * added support for building on AIX (with gcc and GNU tools from AIX Toolbox).
+ * CPU type detection has been implemented for AIX.
+ * CPU type detection has been fixed for NETBSD.
+
+MIPS64:
+ * AXPY on LOONGSON3A has been corrected to pass the "zero increment" utest.
+ * DSDOT on LOONGSON3A has been fixed.
+ * the SGEMM microkernel has been hardened against potential data loss.
+
+ARMV8:
+ * DYNAMIC_ARCH support is now available for 64bit ARM
+ * cross-compiling for ARMV8 under iOS now works.
+ * cpu-specific code has been rearranged to make better use of both
+   hardware commonalities and model-specific compiler optimizations.
+ * XGENE1 has been removed as a TARGET, superseded by the improved generic
+   ARMV8 support.
+
+ARMV7:
+ * Older assembly mnemonics have been converted to UAL form to allow
+   building with clang 7.0
+ * Cross compiling LAPACKE for Android has been fixed again (broken by the
+   update to LAPACK 3.7.0 some time ago).
+
+====================================================================
+Version 0.3.3
+31-Aug-2018
+
+common:
+ * thread memory allocation has been switched back to the method
+   used before version 0.3.1 due to unexpected problems caused by
+   the new code under some circumstances. A new compile-time option
+   USE_TLS has been added to enable the new code, and it is hoped
+   that this can become the default again in the next version.
+ * LAPACK PR272 has been integrated, which fixes spurious errors
+   in DSYEVR and related functions caused by missing conversion
+   from ILAENV to ILAENV_2STAGE in several _2stage routines.
+ * the cmake-generated OpenBLASConfig.cmake now uses the correct case
+   for the name of the library
+ * added support for Haiku OS
+
+x86_64:
+ * added AVX512 implementations of SDOT, DDOT, SAXPY, DAXPY,
+   DSCAL, DGEMVN and DSYMVL
+ * added a workaround for a cygwin issue that prevented compilation
+   of AVX512 code
+
+IBM Z:
+ * added autodetection of Z14
+ * fixed TRMM errors in the generic target
+
+====================================================================
+Version 0.3.2
+30-Jul-2018
+
+common:
+ * fixes for regressions caused by the rewrite of the thread
+   initialization code in 0.3.1
+
+POWER:
+ * fixed cpu autodetection for the BSDs
+
+MIPS64:
+ * fixed utest errors in AXPY, DSDOT, ROT and SWAP
+
+x86_64:
+ * added autodetection of AMD Ryzen 2
+ * fixed build with older versions of MSVC
+
+====================================================================
+Version 0.3.1
+01-Jul-2018
+
+common:
+ * rewritten thread initialization code with significantly reduced overhead
+ * added CBLAS interfaces to the IxAMIN BLAS extension functions
+ * fixed the lapack-test target
+ * CMAKE builds now create an OpenBLASConfig.cmake file
+ * ZAXPY now uses a single thread for small input sizes
+ * the LAPACK code was updated from Reference-LAPACK/lapack#253
+   (fixing LAPACKE interfaces to Aasen's functions)
+
+POWER:
+ * corrected CROT and ZROT behaviour with zero INC_X
+
+ARMV7:
+ * corrected xDOT behaviour with zero INC_X or INC_Y
+
+x86_64:
+ * retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER;
+   this affects PENRYN, DUNNINGTON, OPTERON, OPTERON_SSE3, BOBCAT, ATOM and NANO
+   (which will still be supported via the slower PRESCOTT kernels when this option is not set)
+ * added an option DYNAMIC_LIST that (used in conjunction with DYNAMIC_ARCH) allows
+   specifying the list of x86_64 targets to include. Any target not on the list will be supported
+   by the Sandybridge or Nehalem kernels if available, or by Prescott.
+ * improved SWITCH_RATIO on Haswell for increased GEMM throughput
+ * added initial support for Intel Skylake X, including an AVX512 SGEMM kernel
+ * added autodetection of Intel Cannon Lake series as Skylake X
+ * added a default L2 cache size for hypervisors that report it as zero (Chromebook)
+ * fixed a name clash with recent Windows 10 headers that broke the build with (at least)
+   recent mingw from MSYS2
+ * fixed a link error in mixed clang/gfortran builds with OpenMP
+ * updated the OSX deployment target to 10.8
+ * switched on parallel make for builds on MS Windows by default
+
+x86:
+ * fixed SSWAP and DSWAP behaviour with zero INC_X and INC_Y
+
+====================================================================
+Version 0.3.0
+23-May-2018
+
+common:
+ * fixed some more thread race and locking bugs
+ * added preliminary support for calling an OpenMP build of the library from multiple threads
+ * removed the performance impact of thread locks added in 0.2.20 on OpenMP code
+ * general code cleanup
+ * optimized DSDOT implementation
+ * improved thread distribution for GEMM
+ * corrected IMATCOPY/OMATCOPY implementation
+ * fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations
+ * cmake build improvements
+ * pkgconfig file now contains build options
+ * openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build
+ * corrections and improvements for systems with more than 64 cpus
+ * LAPACK code updated to 3.8.0 including later fixes
+ * added ReLAPACK, a recursive implementation of several LAPACK functions
+ * rewrote ROTMG to handle cases that the netlib code failed to address
+ * disabled (broken) multithreading code for xTRMV
+ * corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard
+ * shared memory access failures on startup are now handled more gracefully
+ * restored utests from earlier releases (and made them pass on all affected systems)
+
+SPARC:
+ * several fixes for cpu autodetection
+
+POWER:
+ * corrected vector register overwriting in several Power8 kernels
+ * optimized additional BLAS functions
+
+ARM:
+ * added support for Cortex-A53 and Cortex-A72
+ * added autodetection for ThunderX2T99
+ * made most optimized kernels the default for generic ARMv8 targets
+
+x86_64:
+ * parallelized DDOT kernel for Haswell
+ * changed alignment directives in assembly kernels to boost performance on OSX
+ * fixed register handling in the GEMV microkernels (bug exposed by gcc7)
+ * added support for building on OpenBSD and Dragonfly
+ * updated compiler options to work with Intel release 2018
+ * added support for fully optimized builds with clang/flang on Microsoft Windows
+ * fixed building on AIX
+
+IBM Z:
+ * added optimized BLAS 1/2 functions
+
+MIPS:
+ * fixed cpu autodetection helper code
+ * added mips32 1004K cpu (Mediatek MT7621 and similar SoC)
+ * added mips64 I6500 cpu
+
+====================================================================
+Version 0.2.20
+24-Jul-2017
+
+common:
+ * Improved CMake support
+ * Fixed several thread race and locking bugs
+ * Fixed default LAPACK optimization level
+ * Updated LAPACK to 3.7.0
+ * Added ReLAPACK (https://github.com/HPAC/ReLAPACK, make BUILD_RELAPACK=1)
+
+POWER:
+ * Optimizations for Power9
+ * Fixed several Power8 assembly bugs
+
+ARM:
+ * New optimized Vulcan and ThunderX2T99 targets
+ * Support for ARMV7 SOFT_FP ABI (make ARM_SOFTFP_ABI=1)
+ * Detect all cpu cores including offline ones
+ * Fix compilation with CLANG
+ * Support building a shared library for Android
+
+MIPS:
+ * Fixed several threading issues
+ * Fix compilation with CLANG
+
+x86_64:
+ * Detect Intel Bay Trail and Apollo Lake
+ * Detect Intel Sky Lake and Kaby Lake
+ * Detect Intel Knights Landing
+ * Detect AMD A8, A10, A12 and Ryzen
+ * Support 64bit builds with Visual Studio
+ * Fix building with Intel and PGI compilers
+ * Fix building with MINGW and TDM-GCC
+ * Fix cmake builds for Haswell and related cpus
+ * Fix building for Sandybridge with CLANG 3.9
+ * Add support for the FLANG compiler
+
+IBM Z:
+ * New target z13 with BLAS3 optimizations
+
+====================================================================
+Version 0.2.19
+01-Sep-2016
+common:
+ * Improved cross compiling.
+ * Fixed a bug on musl libc.
+
+POWER:
+ * Optimize BLAS on Power8
+ * Fixed Julia+OpenBLAS bugs on Power8
+
+MIPS:
+ * Optimize BLAS on MIPS P5600 and I6400 (Thanks, Shivraj Patil, Kaustubh Raste)
+
+ARM:
+ * Improvements for ARM Cortex-A57. (Thanks, Ashwin Sekhar T K)
+
+
+====================================================================
+Version 0.2.18
+12-Apr-2016
+common:
+ * If the MAKE_NB_JOBS flag is set to a value less than or equal to zero,
+   make will be run without -j.
+
+x86/x86_64:
+ * Support building a static library with Visual Studio. (#813, Thanks, theoractice)
+ * Fixed bugs to pass the buildbot CI tests (http://build.openblas.net)
+
+ARM:
+ * Provide a DGEMM 8x4 kernel for Cortex-A57 (Thanks, Ashwin Sekhar T K)
+
+POWER:
+ * Optimize S and C BLAS3 on Power8
+ * Optimize BLAS2/1 on Power8
+
+====================================================================
 Version 0.2.17
 20-Mar-2016
 common: