Martin Kroeker [Sat, 12 Dec 2020 22:27:40 +0000 (23:27 +0100)]
Merge pull request #3034 from xianyi/release-0.3.0
Merge back the release branch into develop to copy tag
Martin Kroeker [Sat, 12 Dec 2020 17:19:29 +0000 (18:19 +0100)]
Merge pull request #3033 from xianyi/develop
Update branch from develop to release 0.3.13
Martin Kroeker [Sat, 12 Dec 2020 17:15:33 +0000 (18:15 +0100)]
Update version to 0.3.13 for release
Martin Kroeker [Sat, 12 Dec 2020 17:14:49 +0000 (18:14 +0100)]
Update version to 0.3.13 for release
Martin Kroeker [Sat, 12 Dec 2020 17:13:23 +0000 (18:13 +0100)]
Merge pull request #3031 from martin-frbg/changelog13
Update Changelog.txt
Martin Kroeker [Sat, 12 Dec 2020 13:27:37 +0000 (14:27 +0100)]
Update Changelog.txt
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
Martin Kroeker [Sat, 12 Dec 2020 09:01:45 +0000 (10:01 +0100)]
Merge pull request #3030 from martin-frbg/fix2994
Make fallback from POWER10 to POWER9 depend on new enough compiler
Martin Kroeker [Sat, 12 Dec 2020 00:25:20 +0000 (01:25 +0100)]
Update Changelog.txt for 0.3.13
Martin Kroeker [Fri, 11 Dec 2020 22:41:17 +0000 (23:41 +0100)]
Make fallback from P10 to P9 conditional on suitable compiler
Martin Kroeker [Fri, 11 Dec 2020 22:38:42 +0000 (23:38 +0100)]
Merge pull request #3 from xianyi/develop
rebase
Martin Kroeker [Fri, 11 Dec 2020 22:37:30 +0000 (23:37 +0100)]
Merge pull request #2994 from antonblanchard/power10-fixes
Power10 fixes
Martin Kroeker [Thu, 10 Dec 2020 21:49:28 +0000 (22:49 +0100)]
Merge pull request #3029 from RajalakshmiSR/axpyp10
POWER10: Improve axpy performance
Martin Kroeker [Thu, 10 Dec 2020 18:42:54 +0000 (19:42 +0100)]
Merge pull request #3021 from austinpagan/trsm_p10
POWER: Added special unrolled vectorized versions of "Solve" for specific si…
Rajalakshmi Srinivasaraghavan [Thu, 10 Dec 2020 17:51:42 +0000 (11:51 -0600)]
POWER10: Improve axpy performance
This patch aligns the stores to 32 byte boundary for saxpy and daxpy
before entering into vector pair loop. Fox caxpy, changed the store
instructions to stxv to improve performance of unaligned cases.
Martin Kroeker [Thu, 10 Dec 2020 15:29:41 +0000 (16:29 +0100)]
Merge pull request #3026 from martin-frbg/revert747
Revert PR747 - SYRK parameter changes for Haswell and related targets
Martin Kroeker [Thu, 10 Dec 2020 15:27:30 +0000 (16:27 +0100)]
Merge pull request #3027 from gxw-loongson/develop
Add msa support for loongson
gxw [Thu, 10 Dec 2020 02:48:53 +0000 (10:48 +0800)]
Keep LOONGSON3A and LOONGSON3B for loongson
gxw [Thu, 26 Nov 2020 06:59:41 +0000 (14:59 +0800)]
Add msa support for loongson
1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
Martin Kroeker [Tue, 8 Dec 2020 20:07:57 +0000 (21:07 +0100)]
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)
Martin Kroeker [Tue, 8 Dec 2020 20:01:36 +0000 (21:01 +0100)]
remove extra/intermediate size step for min_jj introduced in PR747
Martin Kroeker [Tue, 8 Dec 2020 19:59:56 +0000 (20:59 +0100)]
remove extra/intermediate size step of min_jj from PR747
Martin Kroeker [Tue, 8 Dec 2020 19:53:35 +0000 (20:53 +0100)]
Merge pull request #2 from xianyi/develop
rebase
gxw [Tue, 8 Dec 2020 11:16:39 +0000 (19:16 +0800)]
Remove gcc unrecognized option '-msched-weight' when check msa
Martin Kroeker [Tue, 8 Dec 2020 08:39:27 +0000 (09:39 +0100)]
Merge pull request #3025 from TiredNotTear/develop
MIPS: Fix two bugs
Xianyi Zhang [Mon, 7 Dec 2020 08:55:05 +0000 (16:55 +0800)]
Add PingTouGe contribution credit.
Martin Kroeker [Mon, 7 Dec 2020 07:09:11 +0000 (08:09 +0100)]
Merge pull request #3022 from jinboson/develop
Fix test errors reported by cblas_cgemm & cblas_ctrmm
Hao Chen [Mon, 7 Dec 2020 02:18:51 +0000 (10:18 +0800)]
Fix failed cgemv and zgemv test case after using msa optimization
The cgemv and zgemv test case will call cgemv_n/t_msa.c zgemv_n/t_msa.c files in MIPS environment.
When the macro CONJ is defined, the calculation result will be wrong due to the wrong definition of OP2.
This patch updates the value of OP2 and passes the corresponding test.
Hao Chen [Mon, 7 Dec 2020 02:04:00 +0000 (10:04 +0800)]
Fix failed sswap and dswap case by using msa optimization
The swap test case will call sswap_msa.c and dswap_msa.c files in MIPS environmnet.
When inc_x or inc_y is equal to zero, the calculation result of the two functions will be wrong.
This patch adds the processing of inc_x or inc_y equal to zero, and the swap test case has passed.
Martin Kroeker [Sun, 6 Dec 2020 21:34:36 +0000 (22:34 +0100)]
Merge pull request #3024 from martin-frbg/sparc
Fix 32 and 64bit builds on SPARC with SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 18:20:50 +0000 (19:20 +0100)]
Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio
Martin Kroeker [Sun, 6 Dec 2020 18:15:37 +0000 (19:15 +0100)]
Work around DOT and SWAP test failures
Martin Kroeker [Sun, 6 Dec 2020 18:14:16 +0000 (19:14 +0100)]
Fix compilation with SolarisStudio
Martin Kroeker [Sun, 6 Dec 2020 18:12:56 +0000 (19:12 +0100)]
Fix utest build with SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 18:12:02 +0000 (19:12 +0100)]
Change comments to C style for compatibility
Martin Kroeker [Sun, 6 Dec 2020 18:08:43 +0000 (19:08 +0100)]
Fix complex ABI for 32bit SolarisStudio builds
Martin Kroeker [Sun, 6 Dec 2020 18:07:45 +0000 (19:07 +0100)]
Fix hostarch detection for sparc
Martin Kroeker [Sun, 6 Dec 2020 18:05:27 +0000 (19:05 +0100)]
Fix build options for SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 17:52:51 +0000 (18:52 +0100)]
Merge pull request #1 from xianyi/develop
rebase
Jin Bo [Sat, 5 Dec 2020 07:06:12 +0000 (15:06 +0800)]
Fix test errors reported by cblas_cgemm & cblas_ctrmm
The file cgemm_kernel_8x4_msa.c holds the MSA optimization
codes of cblas_cgemm and cblas_ctrmm. It defines two
macros: CGEMM_SCALE_1X2 and CGEMM_TRMM_SCALE_1X2. The pc1
array index in the two macros should be 0 and 1.
Gordon Fossum [Fri, 4 Dec 2020 23:07:06 +0000 (17:07 -0600)]
Added special unrolled vectorized versions of "Solve" for specific sizes,
in DTRSM and STRSM, to improve performance in Power9 and Power10.
Martin Kroeker [Fri, 4 Dec 2020 21:08:17 +0000 (22:08 +0100)]
Merge pull request #3018 from martin-frbg/issue3015
Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds
Martin Kroeker [Fri, 4 Dec 2020 21:07:16 +0000 (22:07 +0100)]
Merge pull request #3016 from xiegengxin/complex-asum
Improve the performance of zasum and casum with AVX512 intrinsic
Martin Kroeker [Fri, 4 Dec 2020 07:54:11 +0000 (08:54 +0100)]
Merge pull request #3013 from martin-frbg/gcc46
Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)
Martin Kroeker [Fri, 4 Dec 2020 07:50:59 +0000 (08:50 +0100)]
Merge pull request #3011 from cyyever/fix_link
link math lib on FreeBSD
Martin Kroeker [Fri, 4 Dec 2020 07:49:28 +0000 (08:49 +0100)]
Merge pull request #3019 from RajalakshmiSR/dgemm_param
POWER10: Update param.h
Martin Kroeker [Thu, 3 Dec 2020 22:43:17 +0000 (23:43 +0100)]
Update f_check
Rajalakshmi Srinivasaraghavan [Thu, 3 Dec 2020 20:40:11 +0000 (14:40 -0600)]
POWER10: Update param.h
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q helps
in improving performance ~10% for DGEMM.
Martin Kroeker [Thu, 3 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds
Martin Kroeker [Thu, 3 Dec 2020 20:25:57 +0000 (21:25 +0100)]
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds
Martin Kroeker [Thu, 3 Dec 2020 13:32:21 +0000 (14:32 +0100)]
use gfortran-10 with xcode 12
Martin Kroeker [Thu, 3 Dec 2020 08:17:27 +0000 (09:17 +0100)]
Update .travis.yml
Martin Kroeker [Wed, 2 Dec 2020 22:13:13 +0000 (23:13 +0100)]
fix misplaced lines
Martin Kroeker [Wed, 2 Dec 2020 14:56:21 +0000 (15:56 +0100)]
fix gfortran requirement in osx interface64 test
Martin Kroeker [Wed, 2 Dec 2020 06:49:43 +0000 (07:49 +0100)]
Disable deprecated 32bit xcode
Gengxin Xie [Wed, 2 Dec 2020 01:51:52 +0000 (09:51 +0800)]
fix error declare function blas_level1_thread_with_return_value
Martin Kroeker [Tue, 1 Dec 2020 21:05:35 +0000 (22:05 +0100)]
Update an overlooked instance of xcode 10.0 as well
Martin Kroeker [Tue, 1 Dec 2020 11:23:30 +0000 (12:23 +0100)]
Update OSX xcode version to 11.5
Gengxin Xie [Tue, 1 Dec 2020 08:49:26 +0000 (16:49 +0800)]
Improve the performance of zasum and casum with AVX512 intrinsic
Martin Kroeker [Mon, 30 Nov 2020 20:41:51 +0000 (21:41 +0100)]
Suppress -mfma as well for gcc 4.6
Martin Kroeker [Mon, 30 Nov 2020 16:24:27 +0000 (17:24 +0100)]
Move the version check to avoid overwriting unprocessed compiler data
Martin Kroeker [Mon, 30 Nov 2020 07:18:24 +0000 (08:18 +0100)]
Merge pull request #3014 from RajalakshmiSR/dgemvnp10
POWER10: Optimize dgemv_n
Rajalakshmi Srinivasaraghavan [Sun, 29 Nov 2020 21:28:28 +0000 (15:28 -0600)]
POWER10: Optimize dgemv_n
Handling as 4x8 with vector pairs gives better performance than
existing code in POWER10.
Martin Kroeker [Sun, 29 Nov 2020 14:33:07 +0000 (15:33 +0100)]
Add SSE flags for x86
Martin Kroeker [Sun, 29 Nov 2020 14:32:17 +0000 (15:32 +0100)]
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx
Martin Kroeker [Sun, 29 Nov 2020 12:27:47 +0000 (13:27 +0100)]
Merge pull request #3012 from martin-frbg/restore-getarch
Restore RISCV entries accidentally trashed by my PR 3005
Martin Kroeker [Sun, 29 Nov 2020 12:19:51 +0000 (13:19 +0100)]
Restore RISCV entries accidentally trashed by my PR 3005
Martin Kroeker [Sun, 29 Nov 2020 10:36:43 +0000 (11:36 +0100)]
Merge pull request #3010 from ggouaillardet/topic/fj_compilers
add Fujitsu compilers
cyy [Sun, 29 Nov 2020 09:17:07 +0000 (17:17 +0800)]
link math lib on FreeBSD
Gilles Gouaillardet [Sun, 29 Nov 2020 04:57:57 +0000 (13:57 +0900)]
add Fujitsu compilers
Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>
Martin Kroeker [Mon, 23 Nov 2020 07:35:32 +0000 (08:35 +0100)]
Merge pull request #3005 from martin-frbg/ssefix
Add -msse for x86 and silence build warning in getarch
Martin Kroeker [Mon, 23 Nov 2020 07:35:12 +0000 (08:35 +0100)]
Merge pull request #3004 from martin-frbg/bsd_getauxval
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
Martin Kroeker [Sun, 22 Nov 2020 21:51:26 +0000 (22:51 +0100)]
Merge pull request #3002 from martin-frbg/issue3000
Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size
Martin Kroeker [Sun, 22 Nov 2020 21:50:41 +0000 (22:50 +0100)]
Merge pull request #3001 from martin-frbg/issue2996
Fix ambiguous ifdefs in tests for user-defined options in Makefiles
Martin Kroeker [Sun, 22 Nov 2020 20:16:07 +0000 (21:16 +0100)]
Avoid redefinition warning
Martin Kroeker [Sun, 22 Nov 2020 20:15:08 +0000 (21:15 +0100)]
Add -msse if supported
Martin Kroeker [Sun, 22 Nov 2020 19:20:28 +0000 (20:20 +0100)]
Build fix for systems that do not support getauxval
Martin Kroeker [Sun, 22 Nov 2020 16:41:44 +0000 (17:41 +0100)]
Fix syntax mixup
Martin Kroeker [Sun, 22 Nov 2020 16:16:22 +0000 (17:16 +0100)]
Restore proper Makefile
Martin Kroeker [Sun, 22 Nov 2020 15:48:22 +0000 (16:48 +0100)]
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds
Martin Kroeker [Sun, 22 Nov 2020 15:33:34 +0000 (16:33 +0100)]
Use ifneq instead of ifdef for CROSS option
Martin Kroeker [Sun, 22 Nov 2020 15:31:44 +0000 (16:31 +0100)]
Use ifeq instead of ifdef for user-definable build options
Martin Kroeker [Sun, 22 Nov 2020 15:29:56 +0000 (16:29 +0100)]
Use ifeq instead of ifdef for user-definable options
Martin Kroeker [Sun, 22 Nov 2020 15:27:17 +0000 (16:27 +0100)]
Convert ifndefs to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:25:36 +0000 (16:25 +0100)]
Change ifndef CROSS to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:19:31 +0000 (16:19 +0100)]
Change ifndefs to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:17:19 +0000 (16:17 +0100)]
Merge pull request #112 from xianyi/develop
rebase
Martin Kroeker [Sun, 22 Nov 2020 11:25:33 +0000 (12:25 +0100)]
Merge pull request #2965 from epsilon-0/develop
allow setting soname without suffix or prefix
Martin Kroeker [Sun, 22 Nov 2020 11:24:13 +0000 (12:24 +0100)]
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
Martin Kroeker [Sun, 22 Nov 2020 11:22:57 +0000 (12:22 +0100)]
Merge pull request #2997 from Flamefire/reproduce_crash
Add reproducer test for crash after fork
Xianyi Zhang [Sun, 22 Nov 2020 08:05:32 +0000 (16:05 +0800)]
Merge branch 'risc-v' into develop
Xianyi Zhang [Sun, 22 Nov 2020 08:04:50 +0000 (16:04 +0800)]
Merge branch 'develop' into risc-v
Xianyi Zhang [Sun, 22 Nov 2020 08:02:19 +0000 (16:02 +0800)]
Update doc for C910.
Martin Kroeker [Fri, 20 Nov 2020 08:42:10 +0000 (09:42 +0100)]
Merge pull request #2995 from Flamefire/fix_thread_buffer_init
Don't overwrite blas_thread_buffer if already set
Alexander Grund [Thu, 19 Nov 2020 14:24:57 +0000 (15:24 +0100)]
Add reproducer test for crash after fork
See #2993 for an analysis
Alexander Grund [Thu, 19 Nov 2020 13:39:00 +0000 (14:39 +0100)]
Don't overwrite blas_thread_buffer if already set
After a fork it is possible that blas_thread_buffer has already
allocated memory buffers: goto_set_num_threads does allocate those
already and it may be called by num_cpu_avail in case the OpenBLAS
NUM_THREADS differ from the OMP num threads.
This leads to a memory leak which can cause subsequent execution of BLAS
kernels to fail.
Fixes #2993
Anton Blanchard [Thu, 19 Nov 2020 10:04:10 +0000 (21:04 +1100)]
POWER10: Use POWER9 as a fallback
If the toolchain is too old, or the mma features isn't set on a POWER10
fall back to the POWER9 loops.
Anton Blanchard [Thu, 19 Nov 2020 09:50:42 +0000 (20:50 +1100)]
POWER10: Fix ld version detection
LDVERSIONGTEQ35 needs to escape the '>' character.
LDVERSIONGTEQ35 is checking the system ld version which may be different
to the toolchain being used to compile OpenBLAS. We don't have a path
to the linker in our Makefiles, so (ab)use gcc -Wl,--version to get the
version of ld in our toolchain.
Martin Kroeker [Mon, 16 Nov 2020 07:40:46 +0000 (08:40 +0100)]
Merge pull request #2981 from Qiyu8/fix-sum
Fix sum optimize issues
Martin Kroeker [Mon, 16 Nov 2020 07:38:37 +0000 (08:38 +0100)]
Merge pull request #2983 from Qiyu8/optimize-srot
Optimize the performance of rot by using universal intrinsics
Qiyu8 [Mon, 16 Nov 2020 01:14:56 +0000 (09:14 +0800)]
remove the -mfma flag in when the host has AVX.