Martin Kroeker [Sun, 6 Dec 2020 18:14:16 +0000 (19:14 +0100)]
Fix compilation with SolarisStudio
Martin Kroeker [Sun, 6 Dec 2020 18:12:56 +0000 (19:12 +0100)]
Fix utest build with SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 18:12:02 +0000 (19:12 +0100)]
Change comments to C style for compatibility
Martin Kroeker [Sun, 6 Dec 2020 18:08:43 +0000 (19:08 +0100)]
Fix complex ABI for 32bit SolarisStudio builds
Martin Kroeker [Sun, 6 Dec 2020 18:07:45 +0000 (19:07 +0100)]
Fix hostarch detection for sparc
Martin Kroeker [Sun, 6 Dec 2020 18:05:27 +0000 (19:05 +0100)]
Fix build options for SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 17:52:51 +0000 (18:52 +0100)]
Merge pull request #1 from xianyi/develop
rebase
Martin Kroeker [Fri, 4 Dec 2020 21:08:17 +0000 (22:08 +0100)]
Merge pull request #3018 from martin-frbg/issue3015
Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds
Martin Kroeker [Fri, 4 Dec 2020 21:07:16 +0000 (22:07 +0100)]
Merge pull request #3016 from xiegengxin/complex-asum
Improve the performance of zasum and casum with AVX512 intrinsics
Martin Kroeker [Fri, 4 Dec 2020 07:54:11 +0000 (08:54 +0100)]
Merge pull request #3013 from martin-frbg/gcc46
Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)
Martin Kroeker [Fri, 4 Dec 2020 07:50:59 +0000 (08:50 +0100)]
Merge pull request #3011 from cyyever/fix_link
link math lib on FreeBSD
Martin Kroeker [Fri, 4 Dec 2020 07:49:28 +0000 (08:49 +0100)]
Merge pull request #3019 from RajalakshmiSR/dgemm_param
POWER10: Update param.h
Martin Kroeker [Thu, 3 Dec 2020 22:43:17 +0000 (23:43 +0100)]
Update f_check
Rajalakshmi Srinivasaraghavan [Thu, 3 Dec 2020 20:40:11 +0000 (14:40 -0600)]
POWER10: Update param.h
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q improves
DGEMM performance by about 10%.
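As a rough illustration of what these two parameters control: in OpenBLAS's level-3 driver, the P value bounds the block of rows of A (the M direction) and the Q value bounds the shared K depth, so larger values mean fewer, larger panels kept in cache. The sketch below is a plain-C cache-blocked matrix multiply with hypothetical constants BLOCK_P and BLOCK_Q; it is not the OpenBLAS driver or the POWER10 kernel, and the tiny block sizes are only for demonstration.

```c
#include <assert.h>

/* Hypothetical block sizes. The real DGEMM_DEFAULT_P/Q values live in
   param.h, differ per target, and are much larger. */
#define BLOCK_P 4 /* rows of A (M direction) per block */
#define BLOCK_Q 3 /* depth (K direction) per block */

/* C (m x n) += A (m x k) * B (k x n), all matrices row-major. */
static void blocked_dgemm(int m, int n, int k,
                          const double *a, const double *b, double *c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    /* Outer loop walks the K dimension in chunks of BLOCK_Q. */
    for (int kk = 0; kk < k; kk += BLOCK_Q) {
        int kb = (k - kk < BLOCK_Q) ? k - kk : BLOCK_Q;
        /* Middle loop walks the M dimension in chunks of BLOCK_P. */
        for (int ii = 0; ii < m; ii += BLOCK_P) {
            int ib = (m - ii < BLOCK_P) ? m - ii : BLOCK_P;
            /* Inner kernel: (ib x kb) block of A times (kb x n) panel of B. */
            for (int i = ii; i < ii + ib; i++) {
                for (int p = kk; p < kk + kb; p++) {
                    double aip = a[i * k + p];
                    for (int j = 0; j < n; j++)
                        c[i * n + j] += aip * b[p * n + j];
                }
            }
        }
    }
}
```

Tuning P and Q trades cache residency against loop overhead; the best values depend on cache sizes, which is why they are set per target.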
Martin Kroeker [Thu, 3 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds
Martin Kroeker [Thu, 3 Dec 2020 20:25:57 +0000 (21:25 +0100)]
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds
Martin Kroeker [Thu, 3 Dec 2020 13:32:21 +0000 (14:32 +0100)]
use gfortran-10 with xcode 12
Martin Kroeker [Thu, 3 Dec 2020 08:17:27 +0000 (09:17 +0100)]
Update .travis.yml
Martin Kroeker [Wed, 2 Dec 2020 22:13:13 +0000 (23:13 +0100)]
fix misplaced lines
Martin Kroeker [Wed, 2 Dec 2020 14:56:21 +0000 (15:56 +0100)]
fix gfortran requirement in osx interface64 test
Martin Kroeker [Wed, 2 Dec 2020 06:49:43 +0000 (07:49 +0100)]
Disable deprecated 32bit xcode
Gengxin Xie [Wed, 2 Dec 2020 01:51:52 +0000 (09:51 +0800)]
fix incorrect declaration of function blas_level1_thread_with_return_value
Martin Kroeker [Tue, 1 Dec 2020 21:05:35 +0000 (22:05 +0100)]
Update an overlooked instance of xcode 10.0 as well
Martin Kroeker [Tue, 1 Dec 2020 11:23:30 +0000 (12:23 +0100)]
Update OSX xcode version to 11.5
Gengxin Xie [Tue, 1 Dec 2020 08:49:26 +0000 (16:49 +0800)]
Improve the performance of zasum and casum with AVX512 intrinsics
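For context on what these kernels compute: the BLAS complex asum (dzasum/scasum) sums |Re(x_i)| + |Im(x_i)| over the vector, not the complex modulus |x_i|. A minimal scalar C sketch of that definition, assuming a hypothetical name zasum_ref, interleaved (re, im) storage, and the reference-BLAS convention of returning 0 for n <= 0 or a non-positive increment; it is not the AVX512 kernel from this commit.

```c
#include <assert.h>
#include <math.h>

/* Scalar reference for complex asum: sum of |Re(x_i)| + |Im(x_i)|.
   x holds complex values as interleaved (re, im) doubles;
   inc_x is the stride measured in complex elements. */
static double zasum_ref(int n, const double *x, int inc_x)
{
    /* Reference BLAS returns 0 for empty input or non-positive stride. */
    if (n <= 0 || inc_x <= 0)
        return 0.0;
    assert(x != 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        const double *xi = x + 2 * (long)i * inc_x;
        sum += fabs(xi[0]) + fabs(xi[1]);
    }
    return sum;
}
```

For x = {3 - 4i, -1 + 2i} this gives 3 + 4 + 1 + 2 = 10, whereas summing moduli would give 5 + sqrt(5).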
Martin Kroeker [Mon, 30 Nov 2020 20:41:51 +0000 (21:41 +0100)]
Suppress -mfma as well for gcc 4.6
Martin Kroeker [Mon, 30 Nov 2020 16:24:27 +0000 (17:24 +0100)]
Move the version check to avoid overwriting unprocessed compiler data
Martin Kroeker [Mon, 30 Nov 2020 07:18:24 +0000 (08:18 +0100)]
Merge pull request #3014 from RajalakshmiSR/dgemvnp10
POWER10: Optimize dgemv_n
Rajalakshmi Srinivasaraghavan [Sun, 29 Nov 2020 21:28:28 +0000 (15:28 -0600)]
POWER10: Optimize dgemv_n
Processing the matrix in 4x8 blocks with vector pairs gives better
performance on POWER10 than the existing code.
Martin Kroeker [Sun, 29 Nov 2020 14:33:07 +0000 (15:33 +0100)]
Add SSE flags for x86
Martin Kroeker [Sun, 29 Nov 2020 14:32:17 +0000 (15:32 +0100)]
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx
Martin Kroeker [Sun, 29 Nov 2020 12:27:47 +0000 (13:27 +0100)]
Merge pull request #3012 from martin-frbg/restore-getarch
Restore RISCV entries accidentally trashed by my PR 3005
Martin Kroeker [Sun, 29 Nov 2020 12:19:51 +0000 (13:19 +0100)]
Restore RISCV entries accidentally trashed by my PR 3005
Martin Kroeker [Sun, 29 Nov 2020 10:36:43 +0000 (11:36 +0100)]
Merge pull request #3010 from ggouaillardet/topic/fj_compilers
add Fujitsu compilers
cyy [Sun, 29 Nov 2020 09:17:07 +0000 (17:17 +0800)]
link math lib on FreeBSD
Gilles Gouaillardet [Sun, 29 Nov 2020 04:57:57 +0000 (13:57 +0900)]
add Fujitsu compilers
Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>
Martin Kroeker [Mon, 23 Nov 2020 07:35:32 +0000 (08:35 +0100)]
Merge pull request #3005 from martin-frbg/ssefix
Add -msse for x86 and silence build warning in getarch
Martin Kroeker [Mon, 23 Nov 2020 07:35:12 +0000 (08:35 +0100)]
Merge pull request #3004 from martin-frbg/bsd_getauxval
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
Martin Kroeker [Sun, 22 Nov 2020 21:51:26 +0000 (22:51 +0100)]
Merge pull request #3002 from martin-frbg/issue3000
Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size
Martin Kroeker [Sun, 22 Nov 2020 21:50:41 +0000 (22:50 +0100)]
Merge pull request #3001 from martin-frbg/issue2996
Fix ambiguous ifdefs in tests for user-defined options in Makefiles
Martin Kroeker [Sun, 22 Nov 2020 20:16:07 +0000 (21:16 +0100)]
Avoid redefinition warning
Martin Kroeker [Sun, 22 Nov 2020 20:15:08 +0000 (21:15 +0100)]
Add -msse if supported
Martin Kroeker [Sun, 22 Nov 2020 19:20:28 +0000 (20:20 +0100)]
Build fix for systems that do not support getauxval
Martin Kroeker [Sun, 22 Nov 2020 16:41:44 +0000 (17:41 +0100)]
Fix syntax mixup
Martin Kroeker [Sun, 22 Nov 2020 16:16:22 +0000 (17:16 +0100)]
Restore proper Makefile
Martin Kroeker [Sun, 22 Nov 2020 15:48:22 +0000 (16:48 +0100)]
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds
Martin Kroeker [Sun, 22 Nov 2020 15:33:34 +0000 (16:33 +0100)]
Use ifneq instead of ifdef for CROSS option
Martin Kroeker [Sun, 22 Nov 2020 15:31:44 +0000 (16:31 +0100)]
Use ifeq instead of ifdef for user-definable build options
Martin Kroeker [Sun, 22 Nov 2020 15:29:56 +0000 (16:29 +0100)]
Use ifeq instead of ifdef for user-definable options
Martin Kroeker [Sun, 22 Nov 2020 15:27:17 +0000 (16:27 +0100)]
Convert ifndefs to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:25:36 +0000 (16:25 +0100)]
Change ifndef CROSS to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:19:31 +0000 (16:19 +0100)]
Change ifndefs to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:17:19 +0000 (16:17 +0100)]
Merge pull request #112 from xianyi/develop
rebase
Martin Kroeker [Sun, 22 Nov 2020 11:25:33 +0000 (12:25 +0100)]
Merge pull request #2965 from epsilon-0/develop
allow setting soname without suffix or prefix
Martin Kroeker [Sun, 22 Nov 2020 11:24:13 +0000 (12:24 +0100)]
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
Martin Kroeker [Sun, 22 Nov 2020 11:22:57 +0000 (12:22 +0100)]
Merge pull request #2997 from Flamefire/reproduce_crash
Add reproducer test for crash after fork
Xianyi Zhang [Sun, 22 Nov 2020 08:05:32 +0000 (16:05 +0800)]
Merge branch 'risc-v' into develop
Xianyi Zhang [Sun, 22 Nov 2020 08:04:50 +0000 (16:04 +0800)]
Merge branch 'develop' into risc-v
Xianyi Zhang [Sun, 22 Nov 2020 08:02:19 +0000 (16:02 +0800)]
Update doc for C910.
Martin Kroeker [Fri, 20 Nov 2020 08:42:10 +0000 (09:42 +0100)]
Merge pull request #2995 from Flamefire/fix_thread_buffer_init
Don't overwrite blas_thread_buffer if already set
Alexander Grund [Thu, 19 Nov 2020 14:24:57 +0000 (15:24 +0100)]
Add reproducer test for crash after fork
See #2993 for an analysis
Alexander Grund [Thu, 19 Nov 2020 13:39:00 +0000 (14:39 +0100)]
Don't overwrite blas_thread_buffer if already set
After a fork, blas_thread_buffer may already hold allocated memory
buffers: goto_set_num_threads allocates them, and it may be called by
num_cpu_avail if the OpenBLAS NUM_THREADS setting differs from the
OpenMP thread count.
Overwriting these buffers leaks memory, which can cause subsequent
BLAS kernel executions to fail.
Fixes #2993
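The commit describes the fix only in prose. A minimal sketch of the guard it implies, assuming a hypothetical fixed-size table thread_buffer and helper init_thread_buffers as stand-ins for blas_thread_buffer and the real initialization path (these are not the OpenBLAS names or code): a slot is allocated only if it is still empty, so a second initialization after fork reuses existing buffers instead of leaking them.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_THREADS 8
#define BUFFER_BYTES 4096

/* Hypothetical stand-in for the per-thread buffer table. */
static void *thread_buffer[MAX_THREADS];
static int allocations; /* counts real allocations, for illustration only */

/* Ensure the first n slots hold a buffer. A slot that is already set,
   e.g. by an earlier call during re-initialization after fork, is kept
   rather than overwritten, so no buffer is leaked. */
static void init_thread_buffers(int n)
{
    assert(n >= 0 && n <= MAX_THREADS);
    for (int i = 0; i < n; i++) {
        if (thread_buffer[i] == NULL) {
            thread_buffer[i] = malloc(BUFFER_BYTES);
            if (thread_buffer[i] != NULL)
                allocations++;
        }
    }
}
```

Calling init_thread_buffers twice with the same count performs no new allocations; calling it with a larger count only fills the additional empty slots.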
Martin Kroeker [Mon, 16 Nov 2020 07:40:46 +0000 (08:40 +0100)]
Merge pull request #2981 from Qiyu8/fix-sum
Fix sum optimization issues
Martin Kroeker [Mon, 16 Nov 2020 07:38:37 +0000 (08:38 +0100)]
Merge pull request #2983 from Qiyu8/optimize-srot
Optimize the performance of rot by using universal intrinsics
Qiyu8 [Mon, 16 Nov 2020 01:14:56 +0000 (09:14 +0800)]
remove the -mfma flag when the host has AVX.
Martin Kroeker [Fri, 13 Nov 2020 11:35:09 +0000 (12:35 +0100)]
Merge pull request #2989 from martin-frbg/cmake-fma
Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH
Martin Kroeker [Fri, 13 Nov 2020 08:16:34 +0000 (09:16 +0100)]
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well
Martin Kroeker [Fri, 13 Nov 2020 08:14:23 +0000 (09:14 +0100)]
Merge pull request #111 from xianyi/develop
rebase
Gengxin Xie [Fri, 13 Nov 2020 06:20:52 +0000 (14:20 +0800)]
Improve the performance of dasum and sasum when SMP is defined
Qiyu8 [Fri, 13 Nov 2020 02:20:24 +0000 (10:20 +0800)]
modify system.cmake to enable fma flag
Qiyu8 [Thu, 12 Nov 2020 12:31:03 +0000 (20:31 +0800)]
fix the CI failure caused by a target-specific option mismatch
Qiyu8 [Thu, 12 Nov 2020 09:35:17 +0000 (17:35 +0800)]
fix the CI failure caused by a missing header
Qiyu8 [Wed, 11 Nov 2020 07:53:48 +0000 (15:53 +0800)]
modify macro
Qiyu8 [Wed, 11 Nov 2020 07:18:01 +0000 (15:18 +0800)]
Only FMA3 and vectors wider than 128 bits have a positive effect.
Qiyu8 [Wed, 11 Nov 2020 06:33:12 +0000 (14:33 +0800)]
Optimize the performance of rot by using universal intrinsics
Qiyu8 [Tue, 10 Nov 2020 08:16:38 +0000 (16:16 +0800)]
fix sum optimization issues
Xianyi Zhang [Tue, 10 Nov 2020 01:38:43 +0000 (09:38 +0800)]
Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v
Xianyi Zhang [Tue, 10 Nov 2020 01:38:04 +0000 (09:38 +0800)]
Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
Xianyi Zhang [Tue, 10 Nov 2020 01:18:25 +0000 (09:18 +0800)]
Merge branch 'develop' into risc-v
Martin Kroeker [Sun, 8 Nov 2020 21:43:00 +0000 (22:43 +0100)]
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsics
Martin Kroeker [Sun, 8 Nov 2020 16:39:05 +0000 (17:39 +0100)]
Merge pull request #2980 from martin-frbg/fixgetarch
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker [Sun, 8 Nov 2020 12:15:40 +0000 (13:15 +0100)]
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker [Sun, 8 Nov 2020 09:19:34 +0000 (10:19 +0100)]
Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
Martin Kroeker [Sun, 8 Nov 2020 09:19:17 +0000 (10:19 +0100)]
Merge pull request #2978 from martin-frbg/fixdynfeatures
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 23:12:55 +0000 (00:12 +0100)]
Stay compatible with older gmake versions that do not support undefine
Martin Kroeker [Sat, 7 Nov 2020 23:01:36 +0000 (00:01 +0100)]
Update Makefile.system
Martin Kroeker [Sat, 7 Nov 2020 22:37:21 +0000 (23:37 +0100)]
Update Makefile.system
Rajalakshmi Srinivasaraghavan [Sat, 7 Nov 2020 21:21:58 +0000 (15:21 -0600)]
Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 7 Nov 2020 19:39:56 +0000 (20:39 +0100)]
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 19:37:03 +0000 (20:37 +0100)]
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode
Martin Kroeker [Sat, 7 Nov 2020 19:30:15 +0000 (20:30 +0100)]
Fix propagation of cpu properties to compiler options
Martin Kroeker [Sat, 7 Nov 2020 19:27:42 +0000 (20:27 +0100)]
Remove extraneous quotes that caused a cmake policy warning
Martin Kroeker [Sat, 7 Nov 2020 19:26:12 +0000 (20:26 +0100)]
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 19:22:41 +0000 (20:22 +0100)]
Merge pull request #110 from xianyi/develop
rebase
Martin Kroeker [Sat, 7 Nov 2020 13:41:34 +0000 (14:41 +0100)]
Merge pull request #2977 from martin-frbg/issue2976
Fix macro name used in ifdef for POWERPC/PGI
Martin Kroeker [Sat, 7 Nov 2020 11:17:49 +0000 (12:17 +0100)]
Fix macro name used in ifdef
Gengxin Xie [Thu, 5 Nov 2020 08:25:17 +0000 (16:25 +0800)]
fix typo
Gengxin Xie [Sun, 27 Sep 2020 02:38:19 +0000 (10:38 +0800)]
Improve the performance of rot by using AVX512 and AVX2 intrinsics
Martin Kroeker [Wed, 4 Nov 2020 15:02:46 +0000 (16:02 +0100)]
Merge pull request #2966 from martin-frbg/issue2964
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
Martin Kroeker [Tue, 3 Nov 2020 22:47:04 +0000 (23:47 +0100)]
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target