platform/upstream/openblas.git

Martin Kroeker [Sun, 6 Dec 2020 18:14:16 +0000 (19:14 +0100)]
Fix compilation with SolarisStudio

Martin Kroeker [Sun, 6 Dec 2020 18:12:56 +0000 (19:12 +0100)]
Fix utest build with SolarisStudio compilers

Martin Kroeker [Sun, 6 Dec 2020 18:12:02 +0000 (19:12 +0100)]
Change comments to C style for compatibility

Martin Kroeker [Sun, 6 Dec 2020 18:08:43 +0000 (19:08 +0100)]
Fix complex ABI for 32bit SolarisStudio builds

Martin Kroeker [Sun, 6 Dec 2020 18:07:45 +0000 (19:07 +0100)]
Fix hostarch detection for sparc

Martin Kroeker [Sun, 6 Dec 2020 18:05:27 +0000 (19:05 +0100)]
Fix build options for SolarisStudio compilers

Martin Kroeker [Sun, 6 Dec 2020 17:52:51 +0000 (18:52 +0100)]
Merge pull request #1 from xianyi/develop

rebase

Martin Kroeker [Fri, 4 Dec 2020 21:08:17 +0000 (22:08 +0100)]
Merge pull request #3018 from martin-frbg/issue3015

Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds

Martin Kroeker [Fri, 4 Dec 2020 21:07:16 +0000 (22:07 +0100)]
Merge pull request #3016 from xiegengxin/complex-asum

Improve the performance of zasum and casum with AVX512 intrinsics

Martin Kroeker [Fri, 4 Dec 2020 07:54:11 +0000 (08:54 +0100)]
Merge pull request #3013 from martin-frbg/gcc46

Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)

Martin Kroeker [Fri, 4 Dec 2020 07:50:59 +0000 (08:50 +0100)]
Merge pull request #3011 from cyyever/fix_link

link math lib on FreeBSD

Martin Kroeker [Fri, 4 Dec 2020 07:49:28 +0000 (08:49 +0100)]
Merge pull request #3019 from RajalakshmiSR/dgemm_param

POWER10: Update param.h

Martin Kroeker [Thu, 3 Dec 2020 22:43:17 +0000 (23:43 +0100)]
Update f_check

Rajalakshmi Srinivasaraghavan [Thu, 3 Dec 2020 20:40:11 +0000 (14:40 -0600)]
POWER10: Update param.h

Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q improves
DGEMM performance by about 10%.
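
For context, a sketch of how two blocking parameters of this kind shape a cache-blocked GEMM loop. It assumes, as in OpenBLAS's level-3 driver, that P bounds the rows of A (M dimension) handled per block and Q bounds the shared inner (K) dimension; BLOCK_P, BLOCK_Q and blocked_dgemm are hypothetical names, and the tiny values are for illustration only. Larger blocks let each block of A be reused over more work before it leaves the cache, which is the usual reason such tuning pays off when the caches can hold them.

```c
#include <stddef.h>

/* Hypothetical block sizes standing in for DGEMM_DEFAULT_P and
 * DGEMM_DEFAULT_Q; real values are tuned per CPU in param.h. */
#define BLOCK_P 4 /* rows of A (M dimension) per block */
#define BLOCK_Q 3 /* inner (K dimension) slice per block */

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* C += A * B with row-major A (m x k), B (k x n), C (m x n).
 * Simplified: no packing, no micro-kernel, no N-dimension blocking. */
void blocked_dgemm(size_t m, size_t n, size_t k,
                   const double *a, const double *b, double *c)
{
    for (size_t kk = 0; kk < k; kk += BLOCK_Q) {
        size_t kb = min_size(BLOCK_Q, k - kk);
        for (size_t ii = 0; ii < m; ii += BLOCK_P) {
            size_t ib = min_size(BLOCK_P, m - ii);
            /* The ib x kb block of A is reused for every column of C. */
            for (size_t i = ii; i < ii + ib; i++) {
                for (size_t j = 0; j < n; j++) {
                    double sum = 0.0;
                    for (size_t p = kk; p < kk + kb; p++)
                        sum += a[i * k + p] * b[p * n + j];
                    c[i * n + j] += sum;
                }
            }
        }
    }
}
```

The result equals a plain triple loop; only the traversal order changes, so the block sizes affect speed, not correctness.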

Martin Kroeker [Thu, 3 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds

Martin Kroeker [Thu, 3 Dec 2020 20:25:57 +0000 (21:25 +0100)]
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds

Martin Kroeker [Thu, 3 Dec 2020 13:32:21 +0000 (14:32 +0100)]
use gfortran-10 with xcode 12

Martin Kroeker [Thu, 3 Dec 2020 08:17:27 +0000 (09:17 +0100)]
Update .travis.yml

Martin Kroeker [Wed, 2 Dec 2020 22:13:13 +0000 (23:13 +0100)]
fix misplaced lines

Martin Kroeker [Wed, 2 Dec 2020 14:56:21 +0000 (15:56 +0100)]
fix gfortran requirement in osx interface64 test

Martin Kroeker [Wed, 2 Dec 2020 06:49:43 +0000 (07:49 +0100)]
Disable deprecated 32bit xcode

Gengxin Xie [Wed, 2 Dec 2020 01:51:52 +0000 (09:51 +0800)]
fix erroneous declaration of function blas_level1_thread_with_return_value

Martin Kroeker [Tue, 1 Dec 2020 21:05:35 +0000 (22:05 +0100)]
Update an overlooked instance of xcode 10.0 as well

Martin Kroeker [Tue, 1 Dec 2020 11:23:30 +0000 (12:23 +0100)]
Update OSX xcode version to 11.5

Gengxin Xie [Tue, 1 Dec 2020 08:49:26 +0000 (16:49 +0800)]
Improve the performance of zasum and casum with AVX512 intrinsics
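
For reference, the value both routines return is defined by reference BLAS as the sum of |Re(x_i)| + |Im(x_i)| over the vector, not the sum of complex moduli, and a vectorized kernel has to reproduce that result. A minimal scalar version in C (zasum_ref is a hypothetical name; x stores interleaved real/imaginary pairs and incx is the stride in complex elements):

```c
#include <stddef.h>

/* Absolute value without <math.h>, so no -lm is needed. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Sum of |Re(x_i)| + |Im(x_i)| over n complex elements (the dzasum
 * definition, not the sum of moduli). x holds interleaved re/im pairs;
 * incx is the stride in complex elements. Returns 0 for n == 0 or
 * incx == 0, mirroring reference BLAS for non-positive arguments. */
double zasum_ref(size_t n, const double *x, size_t incx)
{
    double sum = 0.0;
    if (n == 0 || incx == 0)
        return sum;
    for (size_t i = 0; i < n; i++) {
        const double *z = x + 2 * i * incx; /* start of element i */
        sum += abs_d(z[0]) + abs_d(z[1]);
    }
    return sum;
}
```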

Martin Kroeker [Mon, 30 Nov 2020 20:41:51 +0000 (21:41 +0100)]
Suppress -mfma as well for gcc 4.6

Martin Kroeker [Mon, 30 Nov 2020 16:24:27 +0000 (17:24 +0100)]
Move the version check to avoid overwriting unprocessed compiler data

Martin Kroeker [Mon, 30 Nov 2020 07:18:24 +0000 (08:18 +0100)]
Merge pull request #3014 from RajalakshmiSR/dgemvnp10

POWER10: Optimize dgemv_n

Rajalakshmi Srinivasaraghavan [Sun, 29 Nov 2020 21:28:28 +0000 (15:28 -0600)]
POWER10: Optimize dgemv_n

Handling it as 4x8 with vector pairs gives better performance on
POWER10 than the existing code.

Martin Kroeker [Sun, 29 Nov 2020 14:33:07 +0000 (15:33 +0100)]
Add SSE flags for x86

Martin Kroeker [Sun, 29 Nov 2020 14:32:17 +0000 (15:32 +0100)]
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx

Martin Kroeker [Sun, 29 Nov 2020 12:27:47 +0000 (13:27 +0100)]
Merge pull request #3012 from martin-frbg/restore-getarch

Restore RISCV entries accidentally trashed by my PR 3005

Martin Kroeker [Sun, 29 Nov 2020 12:19:51 +0000 (13:19 +0100)]
Restore RISCV entries accidentally trashed by my PR 3005

Martin Kroeker [Sun, 29 Nov 2020 10:36:43 +0000 (11:36 +0100)]
Merge pull request #3010 from ggouaillardet/topic/fj_compilers

add Fujitsu compilers

cyy [Sun, 29 Nov 2020 09:17:07 +0000 (17:17 +0800)]
link math lib on FreeBSD

Gilles Gouaillardet [Sun, 29 Nov 2020 04:57:57 +0000 (13:57 +0900)]
add Fujitsu compilers

Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>

Martin Kroeker [Mon, 23 Nov 2020 07:35:32 +0000 (08:35 +0100)]
Merge pull request #3005 from martin-frbg/ssefix

Add -msse for x86 and silence build warning in getarch

Martin Kroeker [Mon, 23 Nov 2020 07:35:12 +0000 (08:35 +0100)]
Merge pull request #3004 from martin-frbg/bsd_getauxval

ARM64 DYNAMIC_ARCH build fix for BSD/OSX

Martin Kroeker [Sun, 22 Nov 2020 21:51:26 +0000 (22:51 +0100)]
Merge pull request #3002 from martin-frbg/issue3000

Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size

Martin Kroeker [Sun, 22 Nov 2020 21:50:41 +0000 (22:50 +0100)]
Merge pull request #3001 from martin-frbg/issue2996

Fix ambiguous ifdefs in tests for user-defined options in Makefiles
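
The ambiguity these commits remove comes from GNU make's `ifdef`, which only checks whether a variable has a non-empty value, so a user setting such as USE_OPENMP=0 still counts as "defined"; `ifeq ($(USE_OPENMP),1)` compares the actual value. The C preprocessor has the same distinction between `#ifdef` and `#if`, shown here only as an analogy (USE_FEATURE is a hypothetical macro, not an OpenBLAS option):

```c
/* Hypothetical option switched "off" by giving it the value 0, the way a
 * user might pass USE_OPENMP=0 to make. */
#define USE_FEATURE 0

/* Existence test: true, because the macro is defined at all. */
int enabled_by_ifdef(void)
{
#ifdef USE_FEATURE
    return 1;
#else
    return 0;
#endif
}

/* Value test: false, because the macro's value is not 1. */
int enabled_by_value(void)
{
#if USE_FEATURE == 1
    return 1;
#else
    return 0;
#endif
}
```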

Martin Kroeker [Sun, 22 Nov 2020 20:16:07 +0000 (21:16 +0100)]
Avoid redefinition warning

Martin Kroeker [Sun, 22 Nov 2020 20:15:08 +0000 (21:15 +0100)]
Add -msse if supported

Martin Kroeker [Sun, 22 Nov 2020 19:20:28 +0000 (20:20 +0100)]
Build fix for systems that do not support getauxval

Martin Kroeker [Sun, 22 Nov 2020 16:41:44 +0000 (17:41 +0100)]
Fix syntax mixup

Martin Kroeker [Sun, 22 Nov 2020 16:16:22 +0000 (17:16 +0100)]
Restore proper Makefile

Martin Kroeker [Sun, 22 Nov 2020 15:48:22 +0000 (16:48 +0100)]
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds

Martin Kroeker [Sun, 22 Nov 2020 15:33:34 +0000 (16:33 +0100)]
Use ifneq instead of ifdef for CROSS option

Martin Kroeker [Sun, 22 Nov 2020 15:31:44 +0000 (16:31 +0100)]
Use ifeq instead of ifdef for user-definable build options

Martin Kroeker [Sun, 22 Nov 2020 15:29:56 +0000 (16:29 +0100)]
Use ifeq instead of ifdef for user-definable options

Martin Kroeker [Sun, 22 Nov 2020 15:27:17 +0000 (16:27 +0100)]
Convert ifndefs to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:25:36 +0000 (16:25 +0100)]
Change ifndef CROSS to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:19:31 +0000 (16:19 +0100)]
Change ifndefs to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:17:19 +0000 (16:17 +0100)]
Merge pull request #112 from xianyi/develop

rebase

Martin Kroeker [Sun, 22 Nov 2020 11:25:33 +0000 (12:25 +0100)]
Merge pull request #2965 from epsilon-0/develop

allow setting soname without suffix or prefix

Martin Kroeker [Sun, 22 Nov 2020 11:24:13 +0000 (12:24 +0100)]
Merge pull request #2988 from xiegengxin/smp-asum

Improve the performance of dasum and sasum when SMP is defined

Martin Kroeker [Sun, 22 Nov 2020 11:22:57 +0000 (12:22 +0100)]
Merge pull request #2997 from Flamefire/reproduce_crash

Add reproducer test for crash after fork

Xianyi Zhang [Sun, 22 Nov 2020 08:05:32 +0000 (16:05 +0800)]
Merge branch 'risc-v' into develop

Xianyi Zhang [Sun, 22 Nov 2020 08:04:50 +0000 (16:04 +0800)]
Merge branch 'develop' into risc-v

Xianyi Zhang [Sun, 22 Nov 2020 08:02:19 +0000 (16:02 +0800)]
Update doc for C910.

Martin Kroeker [Fri, 20 Nov 2020 08:42:10 +0000 (09:42 +0100)]
Merge pull request #2995 from Flamefire/fix_thread_buffer_init

Don't overwrite blas_thread_buffer if already set

Alexander Grund [Thu, 19 Nov 2020 14:24:57 +0000 (15:24 +0100)]
Add reproducer test for crash after fork

See #2993 for an analysis

Alexander Grund [Thu, 19 Nov 2020 13:39:00 +0000 (14:39 +0100)]
Don't overwrite blas_thread_buffer if already set

After a fork it is possible that blas_thread_buffer has already
allocated memory buffers: goto_set_num_threads already allocates them,
and it may be called by num_cpu_avail when the OpenBLAS NUM_THREADS
setting differs from the OpenMP thread count.
Overwriting those buffer pointers leaks memory, which can cause subsequent
execution of BLAS kernels to fail.

Fixes #2993
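
The fix follows a common allocate-only-if-empty pattern. A minimal sketch, assuming a fixed-size table of per-thread buffer pointers; thread_buffer, init_thread_buffers, MAX_THREADS and BUFFER_BYTES are hypothetical stand-ins, not the actual OpenBLAS code:

```c
#include <stdlib.h>

#define MAX_THREADS 8      /* hypothetical limit */
#define BUFFER_BYTES 4096  /* hypothetical per-thread buffer size */

/* Hypothetical stand-in for a per-thread table like blas_thread_buffer. */
static void *thread_buffer[MAX_THREADS];

/* Allocate buffers for the first nthreads slots, skipping any slot that an
 * earlier call (e.g. a thread-count change after fork) already filled.
 * Without the NULL check the old pointer would be overwritten and leaked. */
int init_thread_buffers(int nthreads)
{
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int i = 0; i < nthreads; i++) {
        if (thread_buffer[i] == NULL) {
            thread_buffer[i] = malloc(BUFFER_BYTES);
            if (thread_buffer[i] == NULL)
                return -1; /* allocation failed */
        }
    }
    return 0;
}
```

Calling the initializer a second time with a larger thread count keeps the existing buffers and only fills the empty slots.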

Martin Kroeker [Mon, 16 Nov 2020 07:40:46 +0000 (08:40 +0100)]
Merge pull request #2981 from Qiyu8/fix-sum

Fix sum optimization issues

Martin Kroeker [Mon, 16 Nov 2020 07:38:37 +0000 (08:38 +0100)]
Merge pull request #2983 from Qiyu8/optimize-srot

Optimize the performance of rot by using universal intrinsics

Qiyu8 [Mon, 16 Nov 2020 01:14:56 +0000 (09:14 +0800)]
remove the -mfma flag when the host has AVX.

Martin Kroeker [Fri, 13 Nov 2020 11:35:09 +0000 (12:35 +0100)]
Merge pull request #2989 from martin-frbg/cmake-fma

Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH

Martin Kroeker [Fri, 13 Nov 2020 08:16:34 +0000 (09:16 +0100)]
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well

Martin Kroeker [Fri, 13 Nov 2020 08:14:23 +0000 (09:14 +0100)]
Merge pull request #111 from xianyi/develop

rebase

Gengxin Xie [Fri, 13 Nov 2020 06:20:52 +0000 (14:20 +0800)]
Improve the performance of dasum and sasum when SMP is defined
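
One common way to parallelize a reduction such as dasum is to split the vector into chunks, let each thread sum its chunk into a private slot, and add the partial results after joining. OpenBLAS uses its own level-1 threading helpers (such as blas_level1_thread_with_return_value), not this code; the following is a POSIX-threads sketch with hypothetical names (threaded_dasum, NUM_WORKERS), assuming a contiguous vector:

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4 /* hypothetical thread count */

struct asum_job {
    const double *x; /* start of this worker's chunk */
    size_t n;        /* chunk length */
    double partial;  /* written only by the owning worker */
};

static void *asum_worker(void *arg)
{
    struct asum_job *job = arg;
    double s = 0.0;
    for (size_t i = 0; i < job->n; i++)
        s += job->x[i] < 0.0 ? -job->x[i] : job->x[i];
    job->partial = s;
    return NULL;
}

/* Sum of |x_i| over a contiguous vector, split into NUM_WORKERS chunks. */
double threaded_dasum(size_t n, const double *x)
{
    pthread_t tid[NUM_WORKERS];
    struct asum_job job[NUM_WORKERS];
    int started[NUM_WORKERS];
    size_t chunk = (n + NUM_WORKERS - 1) / NUM_WORKERS;

    for (int t = 0; t < NUM_WORKERS; t++) {
        size_t start = (size_t)t * chunk;
        if (start > n)
            start = n;
        job[t].x = x + start;
        job[t].n = (n - start < chunk) ? n - start : chunk;
        job[t].partial = 0.0;
        started[t] = pthread_create(&tid[t], NULL, asum_worker, &job[t]) == 0;
        if (!started[t])
            asum_worker(&job[t]); /* fall back to running this chunk inline */
    }

    double total = 0.0;
    for (int t = 0; t < NUM_WORKERS; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += job[t].partial;
    }
    return total;
}
```

Because each worker writes only its own `partial` field, no locking is needed. The summation order differs from a serial loop, so floating-point results can differ in the last bits for general inputs.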

Qiyu8 [Fri, 13 Nov 2020 02:20:24 +0000 (10:20 +0800)]
modify system.cmake to enable fma flag

Qiyu8 [Thu, 12 Nov 2020 12:31:03 +0000 (20:31 +0800)]
fix the CI failure caused by a target-specific option mismatch

Qiyu8 [Thu, 12 Nov 2020 09:35:17 +0000 (17:35 +0800)]
fix the CI failure caused by a missing header

Qiyu8 [Wed, 11 Nov 2020 07:53:48 +0000 (15:53 +0800)]
modify macro

Qiyu8 [Wed, 11 Nov 2020 07:18:01 +0000 (15:18 +0800)]
only FMA3 and vectors wider than 128 bits have positive effects.

Qiyu8 [Wed, 11 Nov 2020 06:33:12 +0000 (14:33 +0800)]
Optimize the performance of rot by using universal intrinsics

Qiyu8 [Tue, 10 Nov 2020 08:16:38 +0000 (16:16 +0800)]
fix sum optimization issues

Xianyi Zhang [Tue, 10 Nov 2020 01:38:43 +0000 (09:38 +0800)]
Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v

Xianyi Zhang [Tue, 10 Nov 2020 01:38:04 +0000 (09:38 +0800)]
Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910

Xianyi Zhang [Tue, 10 Nov 2020 01:18:25 +0000 (09:18 +0800)]
Merge branch 'develop' into risc-v

Martin Kroeker [Sun, 8 Nov 2020 21:43:00 +0000 (22:43 +0100)]
Merge pull request #2972 from xiegengxin/rot-intrinsic

Improve the performance of rot by using AVX512 and AVX2 intrinsics

Martin Kroeker [Sun, 8 Nov 2020 16:39:05 +0000 (17:39 +0100)]
Merge pull request #2980 from martin-frbg/fixgetarch

Fix missing AVX2 and FMA3 capabilities in FORCE_target mode

Martin Kroeker [Sun, 8 Nov 2020 12:15:40 +0000 (13:15 +0100)]
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode

Martin Kroeker [Sun, 8 Nov 2020 09:19:34 +0000 (10:19 +0100)]
Merge pull request #2979 from RajalakshmiSR/dot_power10

Optimize sdot/ddot for POWER10

Martin Kroeker [Sun, 8 Nov 2020 09:19:17 +0000 (10:19 +0100)]
Merge pull request #2978 from martin-frbg/fixdynfeatures

Fix handling of cpu capability flags in DYNAMIC_ARCH builds

Martin Kroeker [Sat, 7 Nov 2020 23:12:55 +0000 (00:12 +0100)]
Stay compatible with old gmake that did not support undefine

Martin Kroeker [Sat, 7 Nov 2020 23:01:36 +0000 (00:01 +0100)]
Update Makefile.system

Martin Kroeker [Sat, 7 Nov 2020 22:37:21 +0000 (23:37 +0100)]
Update Makefile.system

Rajalakshmi Srinivasaraghavan [Sat, 7 Nov 2020 21:21:58 +0000 (15:21 -0600)]
Optimize sdot/ddot for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Sat, 7 Nov 2020 19:39:56 +0000 (20:39 +0100)]
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds

Martin Kroeker [Sat, 7 Nov 2020 19:37:03 +0000 (20:37 +0100)]
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode

Martin Kroeker [Sat, 7 Nov 2020 19:30:15 +0000 (20:30 +0100)]
Fix propagation of cpu properties to compiler options

Martin Kroeker [Sat, 7 Nov 2020 19:27:42 +0000 (20:27 +0100)]
Remove extraneous quotes that caused a cmake policy warning

Martin Kroeker [Sat, 7 Nov 2020 19:26:12 +0000 (20:26 +0100)]
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds

Martin Kroeker [Sat, 7 Nov 2020 19:22:41 +0000 (20:22 +0100)]
Merge pull request #110 from xianyi/develop

rebase

Martin Kroeker [Sat, 7 Nov 2020 13:41:34 +0000 (14:41 +0100)]
Merge pull request #2977 from martin-frbg/issue2976

Fix macro name used in ifdef for POWERPC/PGI

Martin Kroeker [Sat, 7 Nov 2020 11:17:49 +0000 (12:17 +0100)]
Fix macro name used in ifdef

Gengxin Xie [Thu, 5 Nov 2020 08:25:17 +0000 (16:25 +0800)]
fix typo

Gengxin Xie [Sun, 27 Sep 2020 02:38:19 +0000 (10:38 +0800)]
Improve the performance of rot by using AVX512 and AVX2 intrinsics

Martin Kroeker [Wed, 4 Nov 2020 15:02:46 +0000 (16:02 +0100)]
Merge pull request #2966 from martin-frbg/issue2964

Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMake DYNAMIC_ARCH builds

Martin Kroeker [Tue, 3 Nov 2020 22:47:04 +0000 (23:47 +0100)]
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target