platform/upstream/openblas.git

Martin Kroeker [Tue, 8 Dec 2020 19:53:35 +0000 (20:53 +0100)]
Merge pull request #2 from xianyi/develop

rebase

Martin Kroeker [Tue, 8 Dec 2020 08:39:27 +0000 (09:39 +0100)]
Merge pull request #3025 from TiredNotTear/develop

MIPS: Fix two bugs

Xianyi Zhang [Mon, 7 Dec 2020 08:55:05 +0000 (16:55 +0800)]
Add PingTouGe contribution credit.

Martin Kroeker [Mon, 7 Dec 2020 07:09:11 +0000 (08:09 +0100)]
Merge pull request #3022 from jinboson/develop

Fix test errors reported by cblas_cgemm & cblas_ctrmm

Hao Chen [Mon, 7 Dec 2020 02:18:51 +0000 (10:18 +0800)]
Fix failed cgemv and zgemv test case after using msa optimization

The cgemv and zgemv test cases call the cgemv_n/t_msa.c and zgemv_n/t_msa.c files in a MIPS environment.
When the macro CONJ is defined, the calculation result is wrong because OP2 is defined incorrectly.
This patch corrects the definition of OP2 so that the corresponding tests pass.
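
To illustrate why a single wrong sign operator breaks the conjugated case, here is a minimal sketch of a complex column update with an optional conjugation. It is not the MSA kernel: the function name `complex_axpy_column` and the `conj_a` flag are hypothetical, the actual definition of OP2 is not shown in this log, and the sketch assumes that the conjugation applies to the matrix element.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: y += op(a) * x for one column of interleaved
 * (re, im) single-precision complex values, where op(a) is a or conj(a).
 * The name and the conj_a flag are hypothetical, not OpenBLAS API. */
void complex_axpy_column(size_t m, const float *a, float x_re, float x_im,
                         float *y, int conj_a)
{
    assert(a != NULL && y != NULL);

    /* Conjugating a flips the sign of every term that contains a_im. */
    const float s = conj_a ? -1.0f : 1.0f;

    for (size_t i = 0; i < m; i++) {
        float a_re = a[2 * i];
        float a_im = a[2 * i + 1];
        y[2 * i]     += a_re * x_re - s * a_im * x_im;  /* real part */
        y[2 * i + 1] += a_re * x_im + s * a_im * x_re;  /* imaginary part */
    }
}
```

If the sign used for one of the two `a_im` terms is not flipped together with the other, only the conjugated variant produces wrong values, which matches the symptom described in the commit message.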

Hao Chen [Mon, 7 Dec 2020 02:04:00 +0000 (10:04 +0800)]
Fix failed sswap and dswap case by using msa optimization

The swap test case calls the sswap_msa.c and dswap_msa.c files in a MIPS environment.
When inc_x or inc_y is equal to zero, both functions compute a wrong result.
This patch adds handling for inc_x or inc_y equal to zero, and the swap test case now passes.
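
For context, the expected behaviour with a zero increment can be seen in a plain strided loop. The sketch below is a scalar reference, not the patched MSA code; `ref_dswap` is a hypothetical name, and negative increments are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference swap (illustrative, not the MSA kernel).
 * With an increment of zero, every iteration addresses element 0 of
 * that vector, so the loop must really run n times in order. */
void ref_dswap(size_t n, double *x, size_t inc_x, double *y, size_t inc_y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; i++) {
        double tmp = x[i * inc_x];
        x[i * inc_x] = y[i * inc_y];
        y[i * inc_y] = tmp;
    }
}
```

A vectorized kernel that loads several elements at once implicitly assumes distinct addresses per iteration, which is why the zero-increment case needs its own path.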

Martin Kroeker [Sun, 6 Dec 2020 21:34:36 +0000 (22:34 +0100)]
Merge pull request #3024 from martin-frbg/sparc

Fix 32 and 64bit builds on SPARC with SolarisStudio compilers

Martin Kroeker [Sun, 6 Dec 2020 18:20:50 +0000 (19:20 +0100)]
Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio

Martin Kroeker [Sun, 6 Dec 2020 18:15:37 +0000 (19:15 +0100)]
Work around DOT and SWAP test failures

Martin Kroeker [Sun, 6 Dec 2020 18:14:16 +0000 (19:14 +0100)]
Fix compilation with SolarisStudio

Martin Kroeker [Sun, 6 Dec 2020 18:12:56 +0000 (19:12 +0100)]
Fix utest build with SolarisStudio compilers

Martin Kroeker [Sun, 6 Dec 2020 18:12:02 +0000 (19:12 +0100)]
Change comments to C style for compatibility

Martin Kroeker [Sun, 6 Dec 2020 18:08:43 +0000 (19:08 +0100)]
Fix complex ABI for 32bit SolarisStudio builds

Martin Kroeker [Sun, 6 Dec 2020 18:07:45 +0000 (19:07 +0100)]
Fix hostarch detection for sparc

Martin Kroeker [Sun, 6 Dec 2020 18:05:27 +0000 (19:05 +0100)]
Fix build options for SolarisStudio compilers

Martin Kroeker [Sun, 6 Dec 2020 17:52:51 +0000 (18:52 +0100)]
Merge pull request #1 from xianyi/develop

rebase

Jin Bo [Sat, 5 Dec 2020 07:06:12 +0000 (15:06 +0800)]
Fix test errors reported by cblas_cgemm & cblas_ctrmm

The file cgemm_kernel_8x4_msa.c holds the MSA-optimized code
for cblas_cgemm and cblas_ctrmm. It defines two macros,
CGEMM_SCALE_1X2 and CGEMM_TRMM_SCALE_1X2. The pc1 array
indices used in both macros should be 0 and 1.

Martin Kroeker [Fri, 4 Dec 2020 21:08:17 +0000 (22:08 +0100)]
Merge pull request #3018 from martin-frbg/issue3015

Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds

Martin Kroeker [Fri, 4 Dec 2020 21:07:16 +0000 (22:07 +0100)]
Merge pull request #3016 from xiegengxin/complex-asum

Improve the performance of zasum and casum with AVX512 intrinsic

Martin Kroeker [Fri, 4 Dec 2020 07:54:11 +0000 (08:54 +0100)]
Merge pull request #3013 from martin-frbg/gcc46

Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)

Martin Kroeker [Fri, 4 Dec 2020 07:50:59 +0000 (08:50 +0100)]
Merge pull request #3011 from cyyever/fix_link

link math lib on FreeBSD

Martin Kroeker [Fri, 4 Dec 2020 07:49:28 +0000 (08:49 +0100)]
Merge pull request #3019 from RajalakshmiSR/dgemm_param

POWER10: Update param.h

Martin Kroeker [Thu, 3 Dec 2020 22:43:17 +0000 (23:43 +0100)]
Update f_check

Rajalakshmi Srinivasaraghavan [Thu, 3 Dec 2020 20:40:11 +0000 (14:40 -0600)]
POWER10: Update param.h

Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q improves
DGEMM performance by about 10%.
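
As a rough illustration of what such blocking parameters control, the sketch below tiles a matrix multiply with two block sizes `p` and `q`. It assumes that P blocks the M dimension and Q blocks the K dimension, uses row-major storage for brevity, and is not the OpenBLAS driver; `blocked_dgemm` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative blocked C += A * B with A (m x k), B (k x n), C (m x n),
 * all row-major. p tiles the m dimension and q tiles the k dimension
 * (assumed roles of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q). */
void blocked_dgemm(size_t m, size_t n, size_t k,
                   const double *a, const double *b, double *c,
                   size_t p, size_t q)
{
    assert(p > 0 && q > 0);

    for (size_t ls = 0; ls < k; ls += q) {
        size_t l_end = (ls + q < k) ? ls + q : k;
        for (size_t is = 0; is < m; is += p) {
            size_t i_end = (is + p < m) ? is + p : m;
            /* Work on one p x q tile of A at a time so it can stay in cache. */
            for (size_t i = is; i < i_end; i++) {
                for (size_t l = ls; l < l_end; l++) {
                    double a_il = a[i * k + l];
                    for (size_t j = 0; j < n; j++)
                        c[i * n + j] += a_il * b[l * n + j];
                }
            }
        }
    }
}
```

Larger tiles mean fewer passes over B and C but need more cache, so the best values depend on the target CPU, which is presumably why they are tuned per architecture in param.h.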

Martin Kroeker [Thu, 3 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds

Martin Kroeker [Thu, 3 Dec 2020 20:25:57 +0000 (21:25 +0100)]
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds

Martin Kroeker [Thu, 3 Dec 2020 13:32:21 +0000 (14:32 +0100)]
use gfortran-10 with xcode 12

Martin Kroeker [Thu, 3 Dec 2020 08:17:27 +0000 (09:17 +0100)]
Update .travis.yml

Martin Kroeker [Wed, 2 Dec 2020 22:13:13 +0000 (23:13 +0100)]
fix misplaced lines

Martin Kroeker [Wed, 2 Dec 2020 14:56:21 +0000 (15:56 +0100)]
fix gfortran requirement in osx interface64 test

Martin Kroeker [Wed, 2 Dec 2020 06:49:43 +0000 (07:49 +0100)]
Disable deprecated 32bit xcode

Gengxin Xie [Wed, 2 Dec 2020 01:51:52 +0000 (09:51 +0800)]
fix error declare function blas_level1_thread_with_return_value

Martin Kroeker [Tue, 1 Dec 2020 21:05:35 +0000 (22:05 +0100)]
Update an overlooked instance of xcode 10.0 as well

Martin Kroeker [Tue, 1 Dec 2020 11:23:30 +0000 (12:23 +0100)]
Update OSX xcode version to 11.5

Gengxin Xie [Tue, 1 Dec 2020 08:49:26 +0000 (16:49 +0800)]
Improve the performance of zasum and casum with AVX512 intrinsic

Martin Kroeker [Mon, 30 Nov 2020 20:41:51 +0000 (21:41 +0100)]
Suppress -mfma as well for gcc 4.6

Martin Kroeker [Mon, 30 Nov 2020 16:24:27 +0000 (17:24 +0100)]
Move the version check to avoid overwriting unprocessed compiler data

Martin Kroeker [Mon, 30 Nov 2020 07:18:24 +0000 (08:18 +0100)]
Merge pull request #3014 from RajalakshmiSR/dgemvnp10

POWER10: Optimize dgemv_n

Rajalakshmi Srinivasaraghavan [Sun, 29 Nov 2020 21:28:28 +0000 (15:28 -0600)]
POWER10: Optimize dgemv_n

Handling the computation as 4x8 with vector pairs performs better than
the existing code on POWER10.

Martin Kroeker [Sun, 29 Nov 2020 14:33:07 +0000 (15:33 +0100)]
Add SSE flags for x86

Martin Kroeker [Sun, 29 Nov 2020 14:32:17 +0000 (15:32 +0100)]
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx

Martin Kroeker [Sun, 29 Nov 2020 12:27:47 +0000 (13:27 +0100)]
Merge pull request #3012 from martin-frbg/restore-getarch

Restore RISCV entries accidentally trashed by my PR 3005

Martin Kroeker [Sun, 29 Nov 2020 12:19:51 +0000 (13:19 +0100)]
Restore RISCV entries accidentally trashed by my PR 3005

Martin Kroeker [Sun, 29 Nov 2020 10:36:43 +0000 (11:36 +0100)]
Merge pull request #3010 from ggouaillardet/topic/fj_compilers

add Fujitsu compilers

cyy [Sun, 29 Nov 2020 09:17:07 +0000 (17:17 +0800)]
link math lib on FreeBSD

Gilles Gouaillardet [Sun, 29 Nov 2020 04:57:57 +0000 (13:57 +0900)]
add Fujitsu compilers

Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>

Martin Kroeker [Mon, 23 Nov 2020 07:35:32 +0000 (08:35 +0100)]
Merge pull request #3005 from martin-frbg/ssefix

Add -msse for x86 and silence build warning in getarch

Martin Kroeker [Mon, 23 Nov 2020 07:35:12 +0000 (08:35 +0100)]
Merge pull request #3004 from martin-frbg/bsd_getauxval

ARM64 DYNAMIC_ARCH build fix for BSD/OSX

Martin Kroeker [Sun, 22 Nov 2020 21:51:26 +0000 (22:51 +0100)]
Merge pull request #3002 from martin-frbg/issue3000

Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size

Martin Kroeker [Sun, 22 Nov 2020 21:50:41 +0000 (22:50 +0100)]
Merge pull request #3001 from martin-frbg/issue2996

Fix ambiguous ifdefs in tests for user-defined options in Makefiles

Martin Kroeker [Sun, 22 Nov 2020 20:16:07 +0000 (21:16 +0100)]
Avoid redefinition warning

Martin Kroeker [Sun, 22 Nov 2020 20:15:08 +0000 (21:15 +0100)]
Add -msse if supported

Martin Kroeker [Sun, 22 Nov 2020 19:20:28 +0000 (20:20 +0100)]
Build fix for systems that do not support getauxval

Martin Kroeker [Sun, 22 Nov 2020 16:41:44 +0000 (17:41 +0100)]
Fix syntax mixup

Martin Kroeker [Sun, 22 Nov 2020 16:16:22 +0000 (17:16 +0100)]
Restore proper Makefile

Martin Kroeker [Sun, 22 Nov 2020 15:48:22 +0000 (16:48 +0100)]
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds

Martin Kroeker [Sun, 22 Nov 2020 15:33:34 +0000 (16:33 +0100)]
Use ifneq instead of ifdef for CROSS option

Martin Kroeker [Sun, 22 Nov 2020 15:31:44 +0000 (16:31 +0100)]
Use ifeq instead of ifdef for user-definable build options

Martin Kroeker [Sun, 22 Nov 2020 15:29:56 +0000 (16:29 +0100)]
Use ifeq instead of ifdef for user-definable options

Martin Kroeker [Sun, 22 Nov 2020 15:27:17 +0000 (16:27 +0100)]
Convert ifndefs to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:25:36 +0000 (16:25 +0100)]
Change ifndef CROSS to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:19:31 +0000 (16:19 +0100)]
Change ifndefs to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:17:19 +0000 (16:17 +0100)]
Merge pull request #112 from xianyi/develop

rebase

Martin Kroeker [Sun, 22 Nov 2020 11:25:33 +0000 (12:25 +0100)]
Merge pull request #2965 from epsilon-0/develop

allow setting soname without suffix or prefix

Martin Kroeker [Sun, 22 Nov 2020 11:24:13 +0000 (12:24 +0100)]
Merge pull request #2988 from xiegengxin/smp-asum

Improve the performance of dasum and sasum when SMP is defined

Martin Kroeker [Sun, 22 Nov 2020 11:22:57 +0000 (12:22 +0100)]
Merge pull request #2997 from Flamefire/reproduce_crash

Add reproducer test for crash after fork

Xianyi Zhang [Sun, 22 Nov 2020 08:05:32 +0000 (16:05 +0800)]
Merge branch 'risc-v' into develop

Xianyi Zhang [Sun, 22 Nov 2020 08:04:50 +0000 (16:04 +0800)]
Merge branch 'develop' into risc-v

Xianyi Zhang [Sun, 22 Nov 2020 08:02:19 +0000 (16:02 +0800)]
Update doc for C910.

Martin Kroeker [Fri, 20 Nov 2020 08:42:10 +0000 (09:42 +0100)]
Merge pull request #2995 from Flamefire/fix_thread_buffer_init

Don't overwrite blas_thread_buffer if already set

Alexander Grund [Thu, 19 Nov 2020 14:24:57 +0000 (15:24 +0100)]
Add reproducer test for crash after fork

See #2993 for an analysis

Alexander Grund [Thu, 19 Nov 2020 13:39:00 +0000 (14:39 +0100)]
Don't overwrite blas_thread_buffer if already set

After a fork, blas_thread_buffer may already hold allocated memory
buffers: goto_set_num_threads allocates them, and it may be called by
num_cpu_avail when the OpenBLAS NUM_THREADS differs from the OMP
thread count.
Overwriting those entries leaks the earlier allocations, which can cause
subsequent execution of BLAS kernels to fail.

Fixes #2993
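
The guard described above can be sketched as follows. This is a simplified stand-alone model, not the OpenBLAS source: `thread_buffer`, `init_thread_buffers`, `MAX_THREADS` and `BUFFER_BYTES` are hypothetical stand-ins, and plain `malloc` replaces whatever allocator the library actually uses.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_THREADS 4      /* hypothetical limit for this sketch */
#define BUFFER_BYTES 1024  /* hypothetical per-thread buffer size */

/* Stand-in for blas_thread_buffer: one slot per worker thread. */
void *thread_buffer[MAX_THREADS];

/* Allocate a buffer only for slots that are still empty.
 * Returns the number of new allocations, or -1 if malloc fails. */
int init_thread_buffers(int num_threads)
{
    assert(num_threads >= 0 && num_threads <= MAX_THREADS);

    int allocated = 0;
    for (int i = 0; i < num_threads; i++) {
        if (thread_buffer[i] != NULL)
            continue;  /* already set, e.g. by an earlier call: keep it */
        thread_buffer[i] = malloc(BUFFER_BYTES);
        if (thread_buffer[i] == NULL)
            return -1;
        allocated++;
    }
    return allocated;
}
```

Calling the function twice leaves the pointers from the first call untouched, so nothing is leaked when initialization runs again after an earlier allocation.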

Martin Kroeker [Mon, 16 Nov 2020 07:40:46 +0000 (08:40 +0100)]
Merge pull request #2981 from Qiyu8/fix-sum

Fix sum optimize issues

Martin Kroeker [Mon, 16 Nov 2020 07:38:37 +0000 (08:38 +0100)]
Merge pull request #2983 from Qiyu8/optimize-srot

Optimize the performance of rot by using universal intrinsics

Qiyu8 [Mon, 16 Nov 2020 01:14:56 +0000 (09:14 +0800)]
remove the -mfma flag when the host has AVX.

Martin Kroeker [Fri, 13 Nov 2020 11:35:09 +0000 (12:35 +0100)]
Merge pull request #2989 from martin-frbg/cmake-fma

Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH

Martin Kroeker [Fri, 13 Nov 2020 08:16:34 +0000 (09:16 +0100)]
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well

Martin Kroeker [Fri, 13 Nov 2020 08:14:23 +0000 (09:14 +0100)]
Merge pull request #111 from xianyi/develop

rebase

Gengxin Xie [Fri, 13 Nov 2020 06:20:52 +0000 (14:20 +0800)]
Improve the performance of dasum and sasum when SMP is defined

Qiyu8 [Fri, 13 Nov 2020 02:20:24 +0000 (10:20 +0800)]
modify system.cmake to enable fma flag

Qiyu8 [Thu, 12 Nov 2020 12:31:03 +0000 (20:31 +0800)]
fix the CI failure of target specific option mismatch

Qiyu8 [Thu, 12 Nov 2020 09:35:17 +0000 (17:35 +0800)]
fix the CI failure of lack the head

Qiyu8 [Wed, 11 Nov 2020 07:53:48 +0000 (15:53 +0800)]
modify macro

Qiyu8 [Wed, 11 Nov 2020 07:18:01 +0000 (15:18 +0800)]
only FMA3 and vector larger than 128 have positive effects.

Qiyu8 [Wed, 11 Nov 2020 06:33:12 +0000 (14:33 +0800)]
Optimize the performance of rot by using universal intrinsics

Qiyu8 [Tue, 10 Nov 2020 08:16:38 +0000 (16:16 +0800)]
fix sum optimize issues

Xianyi Zhang [Tue, 10 Nov 2020 01:38:43 +0000 (09:38 +0800)]
Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v

Xianyi Zhang [Tue, 10 Nov 2020 01:38:04 +0000 (09:38 +0800)]
Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910

Xianyi Zhang [Tue, 10 Nov 2020 01:18:25 +0000 (09:18 +0800)]
Merge branch 'develop' into risc-v

Martin Kroeker [Sun, 8 Nov 2020 21:43:00 +0000 (22:43 +0100)]
Merge pull request #2972 from xiegengxin/rot-intrinsic

Improve the performance of rot by using AVX512 and AVX2 intrinsic

Martin Kroeker [Sun, 8 Nov 2020 16:39:05 +0000 (17:39 +0100)]
Merge pull request #2980 from martin-frbg/fixgetarch

Fix missing AVX2 and FMA3 capabilities in FORCE_target mode

Martin Kroeker [Sun, 8 Nov 2020 12:15:40 +0000 (13:15 +0100)]
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode

Martin Kroeker [Sun, 8 Nov 2020 09:19:34 +0000 (10:19 +0100)]
Merge pull request #2979 from RajalakshmiSR/dot_power10

Optimize sdot/ddot for POWER10

Martin Kroeker [Sun, 8 Nov 2020 09:19:17 +0000 (10:19 +0100)]
Merge pull request #2978 from martin-frbg/fixdynfeatures

Fix handling of cpu capability flags in DYNAMIC_ARCH builds

Martin Kroeker [Sat, 7 Nov 2020 23:12:55 +0000 (00:12 +0100)]
Stay compatible with old gmake that did not support undefine

Martin Kroeker [Sat, 7 Nov 2020 23:01:36 +0000 (00:01 +0100)]
Update Makefile.system

Martin Kroeker [Sat, 7 Nov 2020 22:37:21 +0000 (23:37 +0100)]
Update Makefile.system

Rajalakshmi Srinivasaraghavan [Sat, 7 Nov 2020 21:21:58 +0000 (15:21 -0600)]
Optimize sdot/ddot for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Sat, 7 Nov 2020 19:39:56 +0000 (20:39 +0100)]
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds

Martin Kroeker [Sat, 7 Nov 2020 19:37:03 +0000 (20:37 +0100)]
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode