Martin Kroeker [Tue, 8 Dec 2020 19:53:35 +0000 (20:53 +0100)]
Merge pull request #2 from xianyi/develop
rebase
Martin Kroeker [Tue, 8 Dec 2020 08:39:27 +0000 (09:39 +0100)]
Merge pull request #3025 from TiredNotTear/develop
MIPS: Fix two bugs
Xianyi Zhang [Mon, 7 Dec 2020 08:55:05 +0000 (16:55 +0800)]
Add PingTouGe contribution credit.
Martin Kroeker [Mon, 7 Dec 2020 07:09:11 +0000 (08:09 +0100)]
Merge pull request #3022 from jinboson/develop
Fix test errors reported by cblas_cgemm & cblas_ctrmm
Hao Chen [Mon, 7 Dec 2020 02:18:51 +0000 (10:18 +0800)]
Fix failed cgemv and zgemv test case after using msa optimization
The cgemv and zgemv test cases call the cgemv_n/t_msa.c and zgemv_n/t_msa.c files in a MIPS environment.
When the macro CONJ is defined, the calculated result is wrong because OP2 is defined incorrectly.
This patch updates the value of OP2 so that the corresponding test passes.
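As a rough illustration of why a single wrong operator in a conjugated kernel corrupts the result, the sketch below shows a complex multiply-accumulate in which conjugating one operand flips the sign of two of the four partial products. The function name cmadd and the runtime conj_a flag are hypothetical and chosen for this example only; this is not the OP2 definition from the MSA kernels, which select the operators at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* acc += a * x, or acc += conj(a) * x when conj_a is non-zero.
 * Complex values are stored as {real, imaginary} pairs.
 * Hypothetical helper, not taken from cgemv_n/t_msa.c. */
void cmadd(float acc[2], const float a[2], const float x[2], int conj_a)
{
    assert(acc != NULL && a != NULL && x != NULL);

    /* Conjugation negates Im(a), which flips the sign of both
     * partial products that contain a[1]. */
    float s = conj_a ? -1.0f : 1.0f;

    acc[0] += a[0] * x[0] - s * a[1] * x[1];  /* real part */
    acc[1] += a[0] * x[1] + s * a[1] * x[0];  /* imaginary part */
}
```

If only one of the two sign flips is applied, the plain case still passes while the conjugated case fails, which is consistent with the symptom described above.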
Hao Chen [Mon, 7 Dec 2020 02:04:00 +0000 (10:04 +0800)]
Fix failed sswap and dswap case by using msa optimization
The swap test case calls the sswap_msa.c and dswap_msa.c files in a MIPS environment.
When inc_x or inc_y is equal to zero, the result calculated by the two functions is wrong.
This patch adds handling for inc_x or inc_y equal to zero, and the swap test case now passes.
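A minimal sketch of the zero-increment problem, assuming a kernel that loads several elements before storing any. The names swap_scalar, swap_blocked and sswap_sketch are hypothetical, the blocked loop only stands in for a vector path, and negative increments are deliberately not handled; the actual change in sswap_msa.c and dswap_msa.c may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference loop: swaps one element pair per iteration.
 * A zero increment simply revisits the same element each time. */
static void swap_scalar(size_t n, float *x, ptrdiff_t inc_x,
                        float *y, ptrdiff_t inc_y)
{
    for (size_t i = 0; i < n; i++) {
        float tmp = *x;
        *x = *y;
        *y = tmp;
        x += inc_x;
        y += inc_y;
    }
}

/* Blocked loop standing in for a vector kernel: loads four elements of
 * each operand before storing any, so it is only valid when every
 * element is visited once, i.e. for non-zero increments. */
static void swap_blocked(size_t n, float *x, ptrdiff_t inc_x,
                         float *y, ptrdiff_t inc_y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float tx[4], ty[4];
        for (ptrdiff_t k = 0; k < 4; k++) {
            tx[k] = x[k * inc_x];
            ty[k] = y[k * inc_y];
        }
        for (ptrdiff_t k = 0; k < 4; k++) {
            x[k * inc_x] = ty[k];
            y[k * inc_y] = tx[k];
        }
        x += 4 * inc_x;
        y += 4 * inc_y;
    }
    swap_scalar(n - i, x, inc_x, y, inc_y);  /* remaining 0..3 elements */
}

/* Entry point: route zero increments to the scalar loop. */
void sswap_sketch(size_t n, float *x, ptrdiff_t inc_x,
                  float *y, ptrdiff_t inc_y)
{
    assert(inc_x >= 0 && inc_y >= 0);  /* assumption: no negative strides */
    if (inc_x == 0 || inc_y == 0) {
        swap_scalar(n, x, inc_x, y, inc_y);
        return;
    }
    swap_blocked(n, x, inc_x, y, inc_y);
}
```

With inc_x equal to zero, the blocked loop would read the same x element four times and write it back four times, so the sequential exchange performed by the scalar loop is lost. Dispatching on the zero case keeps the fast path untouched for ordinary strides.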
Martin Kroeker [Sun, 6 Dec 2020 21:34:36 +0000 (22:34 +0100)]
Merge pull request #3024 from martin-frbg/sparc
Fix 32 and 64bit builds on SPARC with SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 18:20:50 +0000 (19:20 +0100)]
Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio
Martin Kroeker [Sun, 6 Dec 2020 18:15:37 +0000 (19:15 +0100)]
Work around DOT and SWAP test failures
Martin Kroeker [Sun, 6 Dec 2020 18:14:16 +0000 (19:14 +0100)]
Fix compilation with SolarisStudio
Martin Kroeker [Sun, 6 Dec 2020 18:12:56 +0000 (19:12 +0100)]
Fix utest build with SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 18:12:02 +0000 (19:12 +0100)]
Change comments to C style for compatibility
Martin Kroeker [Sun, 6 Dec 2020 18:08:43 +0000 (19:08 +0100)]
Fix complex ABI for 32bit SolarisStudio builds
Martin Kroeker [Sun, 6 Dec 2020 18:07:45 +0000 (19:07 +0100)]
Fix hostarch detection for sparc
Martin Kroeker [Sun, 6 Dec 2020 18:05:27 +0000 (19:05 +0100)]
Fix build options for SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 17:52:51 +0000 (18:52 +0100)]
Merge pull request #1 from xianyi/develop
rebase
Jin Bo [Sat, 5 Dec 2020 07:06:12 +0000 (15:06 +0800)]
Fix test errors reported by cblas_cgemm & cblas_ctrmm
The file cgemm_kernel_8x4_msa.c holds the MSA-optimized
code for cblas_cgemm and cblas_ctrmm. It defines two
macros: CGEMM_SCALE_1X2 and CGEMM_TRMM_SCALE_1X2. The pc1
array indices in the two macros should be 0 and 1.
Martin Kroeker [Fri, 4 Dec 2020 21:08:17 +0000 (22:08 +0100)]
Merge pull request #3018 from martin-frbg/issue3015
Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds
Martin Kroeker [Fri, 4 Dec 2020 21:07:16 +0000 (22:07 +0100)]
Merge pull request #3016 from xiegengxin/complex-asum
Improve the performance of zasum and casum with AVX512 intrinsic
Martin Kroeker [Fri, 4 Dec 2020 07:54:11 +0000 (08:54 +0100)]
Merge pull request #3013 from martin-frbg/gcc46
Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)
Martin Kroeker [Fri, 4 Dec 2020 07:50:59 +0000 (08:50 +0100)]
Merge pull request #3011 from cyyever/fix_link
link math lib on FreeBSD
Martin Kroeker [Fri, 4 Dec 2020 07:49:28 +0000 (08:49 +0100)]
Merge pull request #3019 from RajalakshmiSR/dgemm_param
POWER10: Update param.h
Martin Kroeker [Thu, 3 Dec 2020 22:43:17 +0000 (23:43 +0100)]
Update f_check
Rajalakshmi Srinivasaraghavan [Thu, 3 Dec 2020 20:40:11 +0000 (14:40 -0600)]
POWER10: Update param.h
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q
improves DGEMM performance by ~10%.
Martin Kroeker [Thu, 3 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds
Martin Kroeker [Thu, 3 Dec 2020 20:25:57 +0000 (21:25 +0100)]
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds
Martin Kroeker [Thu, 3 Dec 2020 13:32:21 +0000 (14:32 +0100)]
use gfortran-10 with xcode 12
Martin Kroeker [Thu, 3 Dec 2020 08:17:27 +0000 (09:17 +0100)]
Update .travis.yml
Martin Kroeker [Wed, 2 Dec 2020 22:13:13 +0000 (23:13 +0100)]
fix misplaced lines
Martin Kroeker [Wed, 2 Dec 2020 14:56:21 +0000 (15:56 +0100)]
fix gfortran requirement in osx interface64 test
Martin Kroeker [Wed, 2 Dec 2020 06:49:43 +0000 (07:49 +0100)]
Disable deprecated 32bit xcode
Gengxin Xie [Wed, 2 Dec 2020 01:51:52 +0000 (09:51 +0800)]
Fix incorrect declaration of function blas_level1_thread_with_return_value
Martin Kroeker [Tue, 1 Dec 2020 21:05:35 +0000 (22:05 +0100)]
Update an overlooked instance of xcode 10.0 as well
Martin Kroeker [Tue, 1 Dec 2020 11:23:30 +0000 (12:23 +0100)]
Update OSX xcode version to 11.5
Gengxin Xie [Tue, 1 Dec 2020 08:49:26 +0000 (16:49 +0800)]
Improve the performance of zasum and casum with AVX512 intrinsic
Martin Kroeker [Mon, 30 Nov 2020 20:41:51 +0000 (21:41 +0100)]
Suppress -mfma as well for gcc 4.6
Martin Kroeker [Mon, 30 Nov 2020 16:24:27 +0000 (17:24 +0100)]
Move the version check to avoid overwriting unprocessed compiler data
Martin Kroeker [Mon, 30 Nov 2020 07:18:24 +0000 (08:18 +0100)]
Merge pull request #3014 from RajalakshmiSR/dgemvnp10
POWER10: Optimize dgemv_n
Rajalakshmi Srinivasaraghavan [Sun, 29 Nov 2020 21:28:28 +0000 (15:28 -0600)]
POWER10: Optimize dgemv_n
Handling as 4x8 with vector pairs gives better performance than
the existing code on POWER10.
Martin Kroeker [Sun, 29 Nov 2020 14:33:07 +0000 (15:33 +0100)]
Add SSE flags for x86
Martin Kroeker [Sun, 29 Nov 2020 14:32:17 +0000 (15:32 +0100)]
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx
Martin Kroeker [Sun, 29 Nov 2020 12:27:47 +0000 (13:27 +0100)]
Merge pull request #3012 from martin-frbg/restore-getarch
Restore RISCV entries accidentally trashed by my PR 3005
Martin Kroeker [Sun, 29 Nov 2020 12:19:51 +0000 (13:19 +0100)]
Restore RISCV entries accidentally trashed by my PR 3005
Martin Kroeker [Sun, 29 Nov 2020 10:36:43 +0000 (11:36 +0100)]
Merge pull request #3010 from ggouaillardet/topic/fj_compilers
add Fujitsu compilers
cyy [Sun, 29 Nov 2020 09:17:07 +0000 (17:17 +0800)]
link math lib on FreeBSD
Gilles Gouaillardet [Sun, 29 Nov 2020 04:57:57 +0000 (13:57 +0900)]
add Fujitsu compilers
Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>
Martin Kroeker [Mon, 23 Nov 2020 07:35:32 +0000 (08:35 +0100)]
Merge pull request #3005 from martin-frbg/ssefix
Add -msse for x86 and silence build warning in getarch
Martin Kroeker [Mon, 23 Nov 2020 07:35:12 +0000 (08:35 +0100)]
Merge pull request #3004 from martin-frbg/bsd_getauxval
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
Martin Kroeker [Sun, 22 Nov 2020 21:51:26 +0000 (22:51 +0100)]
Merge pull request #3002 from martin-frbg/issue3000
Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size
Martin Kroeker [Sun, 22 Nov 2020 21:50:41 +0000 (22:50 +0100)]
Merge pull request #3001 from martin-frbg/issue2996
Fix ambiguous ifdefs in tests for user-defined options in Makefiles
Martin Kroeker [Sun, 22 Nov 2020 20:16:07 +0000 (21:16 +0100)]
Avoid redefinition warning
Martin Kroeker [Sun, 22 Nov 2020 20:15:08 +0000 (21:15 +0100)]
Add -msse if supported
Martin Kroeker [Sun, 22 Nov 2020 19:20:28 +0000 (20:20 +0100)]
Build fix for systems that do not support getauxval
Martin Kroeker [Sun, 22 Nov 2020 16:41:44 +0000 (17:41 +0100)]
Fix syntax mixup
Martin Kroeker [Sun, 22 Nov 2020 16:16:22 +0000 (17:16 +0100)]
Restore proper Makefile
Martin Kroeker [Sun, 22 Nov 2020 15:48:22 +0000 (16:48 +0100)]
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds
Martin Kroeker [Sun, 22 Nov 2020 15:33:34 +0000 (16:33 +0100)]
Use ifneq instead of ifdef for CROSS option
Martin Kroeker [Sun, 22 Nov 2020 15:31:44 +0000 (16:31 +0100)]
Use ifeq instead of ifdef for user-definable build options
Martin Kroeker [Sun, 22 Nov 2020 15:29:56 +0000 (16:29 +0100)]
Use ifeq instead of ifdef for user-definable options
Martin Kroeker [Sun, 22 Nov 2020 15:27:17 +0000 (16:27 +0100)]
Convert ifndefs to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:25:36 +0000 (16:25 +0100)]
Change ifndef CROSS to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:19:31 +0000 (16:19 +0100)]
Change ifndefs to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:17:19 +0000 (16:17 +0100)]
Merge pull request #112 from xianyi/develop
rebase
Martin Kroeker [Sun, 22 Nov 2020 11:25:33 +0000 (12:25 +0100)]
Merge pull request #2965 from epsilon-0/develop
allow setting soname without suffix or prefix
Martin Kroeker [Sun, 22 Nov 2020 11:24:13 +0000 (12:24 +0100)]
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
Martin Kroeker [Sun, 22 Nov 2020 11:22:57 +0000 (12:22 +0100)]
Merge pull request #2997 from Flamefire/reproduce_crash
Add reproducer test for crash after fork
Xianyi Zhang [Sun, 22 Nov 2020 08:05:32 +0000 (16:05 +0800)]
Merge branch 'risc-v' into develop
Xianyi Zhang [Sun, 22 Nov 2020 08:04:50 +0000 (16:04 +0800)]
Merge branch 'develop' into risc-v
Xianyi Zhang [Sun, 22 Nov 2020 08:02:19 +0000 (16:02 +0800)]
Update doc for C910.
Martin Kroeker [Fri, 20 Nov 2020 08:42:10 +0000 (09:42 +0100)]
Merge pull request #2995 from Flamefire/fix_thread_buffer_init
Don't overwrite blas_thread_buffer if already set
Alexander Grund [Thu, 19 Nov 2020 14:24:57 +0000 (15:24 +0100)]
Add reproducer test for crash after fork
See #2993 for an analysis
Alexander Grund [Thu, 19 Nov 2020 13:39:00 +0000 (14:39 +0100)]
Don't overwrite blas_thread_buffer if already set
After a fork, blas_thread_buffer may already hold allocated
memory buffers: goto_set_num_threads allocates them, and it may
be called by num_cpu_avail when the OpenBLAS NUM_THREADS
differs from the OMP num threads.
This leads to a memory leak which can cause subsequent execution of BLAS
kernels to fail.
Fixes #2993
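The guard described in this commit can be sketched as follows, assuming a fixed-size table of per-thread buffers. MAX_THREADS, BUFFER_BYTES, thread_buffer and init_thread_buffers are hypothetical names for this example; the real blas_thread_buffer handling in OpenBLAS is more involved.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_THREADS  4      /* hypothetical limit for the sketch */
#define BUFFER_BYTES 1024   /* hypothetical per-thread buffer size */

/* Stands in for blas_thread_buffer: one slot per thread, NULL if unset. */
static void *thread_buffer[MAX_THREADS];

/* Make sure threads [0, nthreads) each have a buffer.
 * Slots that are already set are left alone, because overwriting them
 * would drop the only pointer to the earlier allocation.
 * Returns the number of new allocations, or -1 if malloc fails. */
int init_thread_buffers(int nthreads)
{
    assert(nthreads >= 0 && nthreads <= MAX_THREADS);

    int allocated = 0;
    for (int i = 0; i < nthreads; i++) {
        if (thread_buffer[i] != NULL)
            continue;   /* already set, e.g. by an earlier call */
        thread_buffer[i] = malloc(BUFFER_BYTES);
        if (thread_buffer[i] == NULL)
            return -1;
        allocated++;
    }
    return allocated;
}
```

Calling init_thread_buffers(2) and then init_thread_buffers(4) allocates two buffers each time, and a further call with 4 allocates none, so repeated initialization no longer replaces live buffers.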
Martin Kroeker [Mon, 16 Nov 2020 07:40:46 +0000 (08:40 +0100)]
Merge pull request #2981 from Qiyu8/fix-sum
Fix sum optimize issues
Martin Kroeker [Mon, 16 Nov 2020 07:38:37 +0000 (08:38 +0100)]
Merge pull request #2983 from Qiyu8/optimize-srot
Optimize the performance of rot by using universal intrinsics
Qiyu8 [Mon, 16 Nov 2020 01:14:56 +0000 (09:14 +0800)]
remove the -mfma flag when the host has AVX.
Martin Kroeker [Fri, 13 Nov 2020 11:35:09 +0000 (12:35 +0100)]
Merge pull request #2989 from martin-frbg/cmake-fma
Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH
Martin Kroeker [Fri, 13 Nov 2020 08:16:34 +0000 (09:16 +0100)]
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well
Martin Kroeker [Fri, 13 Nov 2020 08:14:23 +0000 (09:14 +0100)]
Merge pull request #111 from xianyi/develop
rebase
Gengxin Xie [Fri, 13 Nov 2020 06:20:52 +0000 (14:20 +0800)]
Improve the performance of dasum and sasum when SMP is defined
Qiyu8 [Fri, 13 Nov 2020 02:20:24 +0000 (10:20 +0800)]
modify system.cmake to enable fma flag
Qiyu8 [Thu, 12 Nov 2020 12:31:03 +0000 (20:31 +0800)]
fix the CI failure caused by a target-specific option mismatch
Qiyu8 [Thu, 12 Nov 2020 09:35:17 +0000 (17:35 +0800)]
fix the CI failure caused by a missing header
Qiyu8 [Wed, 11 Nov 2020 07:53:48 +0000 (15:53 +0800)]
modify macro
Qiyu8 [Wed, 11 Nov 2020 07:18:01 +0000 (15:18 +0800)]
only FMA3 and vectors wider than 128 bits have positive effects.
Qiyu8 [Wed, 11 Nov 2020 06:33:12 +0000 (14:33 +0800)]
Optimize the performance of rot by using universal intrinsics
Qiyu8 [Tue, 10 Nov 2020 08:16:38 +0000 (16:16 +0800)]
fix sum optimize issues
Xianyi Zhang [Tue, 10 Nov 2020 01:38:43 +0000 (09:38 +0800)]
Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v
Xianyi Zhang [Tue, 10 Nov 2020 01:38:04 +0000 (09:38 +0800)]
Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
Xianyi Zhang [Tue, 10 Nov 2020 01:18:25 +0000 (09:18 +0800)]
Merge branch 'develop' into risc-v
Martin Kroeker [Sun, 8 Nov 2020 21:43:00 +0000 (22:43 +0100)]
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
Martin Kroeker [Sun, 8 Nov 2020 16:39:05 +0000 (17:39 +0100)]
Merge pull request #2980 from martin-frbg/fixgetarch
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker [Sun, 8 Nov 2020 12:15:40 +0000 (13:15 +0100)]
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker [Sun, 8 Nov 2020 09:19:34 +0000 (10:19 +0100)]
Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
Martin Kroeker [Sun, 8 Nov 2020 09:19:17 +0000 (10:19 +0100)]
Merge pull request #2978 from martin-frbg/fixdynfeatures
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 23:12:55 +0000 (00:12 +0100)]
Stay compatible with old gmake that did not support undefine
Martin Kroeker [Sat, 7 Nov 2020 23:01:36 +0000 (00:01 +0100)]
Update Makefile.system
Martin Kroeker [Sat, 7 Nov 2020 22:37:21 +0000 (23:37 +0100)]
Update Makefile.system
Rajalakshmi Srinivasaraghavan [Sat, 7 Nov 2020 21:21:58 +0000 (15:21 -0600)]
Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 7 Nov 2020 19:39:56 +0000 (20:39 +0100)]
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 19:37:03 +0000 (20:37 +0100)]
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode