platform/upstream/openblas.git
Martin Kroeker [Sun, 6 Dec 2020 17:52:51 +0000 (18:52 +0100)]
Merge pull request #1 from xianyi/develop

rebase

Martin Kroeker [Fri, 4 Dec 2020 21:08:17 +0000 (22:08 +0100)]
Merge pull request #3018 from martin-frbg/issue3015

Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds

Martin Kroeker [Fri, 4 Dec 2020 21:07:16 +0000 (22:07 +0100)]
Merge pull request #3016 from xiegengxin/complex-asum

Improve the performance of zasum and casum with AVX512 intrinsic

Martin Kroeker [Fri, 4 Dec 2020 07:54:11 +0000 (08:54 +0100)]
Merge pull request #3013 from martin-frbg/gcc46

Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)

Martin Kroeker [Fri, 4 Dec 2020 07:50:59 +0000 (08:50 +0100)]
Merge pull request #3011 from cyyever/fix_link

link math lib on FreeBSD

Martin Kroeker [Fri, 4 Dec 2020 07:49:28 +0000 (08:49 +0100)]
Merge pull request #3019 from RajalakshmiSR/dgemm_param

POWER10: Update param.h

Martin Kroeker [Thu, 3 Dec 2020 22:43:17 +0000 (23:43 +0100)]
Update f_check

Rajalakshmi Srinivasaraghavan [Thu, 3 Dec 2020 20:40:11 +0000 (14:40 -0600)]
POWER10: Update param.h

Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q helps
in improving performance ~10% for DGEMM.
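For context, DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q are cache-blocking sizes. In the GotoBLAS-style level-3 driver, P limits how many rows of A go into one block and Q limits the depth (k), and each block is then reused across many columns of B. The C sketch below uses plain loop blocking, with two hypothetical constants, BLOCK_P and BLOCK_Q, in place of those parameters. It does no packing or SIMD and is not the OpenBLAS kernel. It only shows which loops the two values control.

```c
#include <stddef.h>

/* Hypothetical stand-ins for DGEMM_DEFAULT_P / DGEMM_DEFAULT_Q; the real
   per-target values live in OpenBLAS's param.h. */
#define BLOCK_P 64   /* rows of A (and C) handled per block */
#define BLOCK_Q 128  /* depth k handled per block */

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* C (m x n) += A (m x k) * B (k x n); all matrices column-major with
   leading dimension equal to their row count. */
void dgemm_blocked(size_t m, size_t n, size_t k,
                   const double *A, const double *B, double *C)
{
    for (size_t kk = 0; kk < k; kk += BLOCK_Q) {          /* Q-sized slice of k */
        size_t kb = min_size(BLOCK_Q, k - kk);
        for (size_t ii = 0; ii < m; ii += BLOCK_P) {      /* P-sized slice of m */
            size_t ib = min_size(BLOCK_P, m - ii);
            /* The ib x kb block of A is reused for every column j of B and C. */
            for (size_t j = 0; j < n; ++j) {
                for (size_t p = kk; p < kk + kb; ++p) {
                    double b = B[p + j * k];
                    for (size_t i = ii; i < ii + ib; ++i)
                        C[i + j * m] += A[i + p * m] * b;
                }
            }
        }
    }
}
```

Larger blocks reuse each block of A more often but need more cache. The ~10% gain quoted above applies to POWER10 with the values chosen in param.h, not to this sketch.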

Martin Kroeker [Thu, 3 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds

Martin Kroeker [Thu, 3 Dec 2020 20:25:57 +0000 (21:25 +0100)]
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds

Martin Kroeker [Thu, 3 Dec 2020 13:32:21 +0000 (14:32 +0100)]
use gfortran-10 with xcode 12

Martin Kroeker [Thu, 3 Dec 2020 08:17:27 +0000 (09:17 +0100)]
Update .travis.yml

Martin Kroeker [Wed, 2 Dec 2020 22:13:13 +0000 (23:13 +0100)]
fix misplaced lines

Martin Kroeker [Wed, 2 Dec 2020 14:56:21 +0000 (15:56 +0100)]
fix gfortran requirement in osx interface64 test

Martin Kroeker [Wed, 2 Dec 2020 06:49:43 +0000 (07:49 +0100)]
Disable deprecated 32bit xcode

Gengxin Xie [Wed, 2 Dec 2020 01:51:52 +0000 (09:51 +0800)]
fix erroneous declaration of function blas_level1_thread_with_return_value

Martin Kroeker [Tue, 1 Dec 2020 21:05:35 +0000 (22:05 +0100)]
Update an overlooked instance of xcode 10.0 as well

Martin Kroeker [Tue, 1 Dec 2020 11:23:30 +0000 (12:23 +0100)]
Update OSX xcode version to 11.5

Gengxin Xie [Tue, 1 Dec 2020 08:49:26 +0000 (16:49 +0800)]
Improve the performance of zasum and casum with AVX512 intrinsic
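zasum and casum follow the BLAS definition of asum for complex vectors: the sum of |Re(x_i)| + |Im(x_i)|, not the sum of complex moduli. An AVX512 kernel has to reproduce a scalar reference like the one sketched below, up to floating-point rounding. The name casum_ref is hypothetical. zasum is the same computation with double in place of float. The stride incx is assumed to be positive and counted in complex elements.

```c
#include <math.h>
#include <stddef.h>

/* Reference casum: sum of |Re(x_i)| + |Im(x_i)| over n complex values stored
   as interleaved (re, im) floats. incx is a positive stride in complex
   elements, so element i starts at x[2 * i * incx]. */
float casum_ref(size_t n, const float *x, size_t incx)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        const float *z = x + 2 * i * incx;
        sum += fabsf(z[0]) + fabsf(z[1]);
    }
    return sum;
}
```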

Martin Kroeker [Mon, 30 Nov 2020 20:41:51 +0000 (21:41 +0100)]
Suppress -mfma as well for gcc 4.6

Martin Kroeker [Mon, 30 Nov 2020 16:24:27 +0000 (17:24 +0100)]
Move the version check to avoid overwriting unprocessed compiler data

Martin Kroeker [Mon, 30 Nov 2020 07:18:24 +0000 (08:18 +0100)]
Merge pull request #3014 from RajalakshmiSR/dgemvnp10

POWER10: Optimize dgemv_n

Rajalakshmi Srinivasaraghavan [Sun, 29 Nov 2020 21:28:28 +0000 (15:28 -0600)]
POWER10: Optimize dgemv_n

Handling as 4x8 with vector pairs gives better performance than
existing code in POWER10.
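dgemv_n computes y += alpha * A * x with A not transposed. The portable sketch below is a rough illustration of why blocking helps that kernel. It handles four columns per pass, so each y[i] is read and written once per group of columns instead of once per column. The sketch does not model the exact 4x8 tile shape or the POWER10 vector-pair loads. The function name and the choice of four columns are assumptions made for illustration.

```c
#include <stddef.h>

/* y += alpha * A * x for column-major A (m x n, leading dimension lda >= m).
   Four columns are combined per pass over y; leftover columns are handled
   one at a time. */
void dgemv_n_ref(size_t m, size_t n, double alpha, const double *A,
                 size_t lda, const double *x, double *y)
{
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        double x0 = alpha * x[j], x1 = alpha * x[j + 1];
        double x2 = alpha * x[j + 2], x3 = alpha * x[j + 3];
        const double *a0 = A + j * lda, *a1 = a0 + lda;
        const double *a2 = a1 + lda, *a3 = a2 + lda;
        for (size_t i = 0; i < m; ++i)   /* one load/store of y[i] per 4 columns */
            y[i] += a0[i] * x0 + a1[i] * x1 + a2[i] * x2 + a3[i] * x3;
    }
    for (; j < n; ++j) {                 /* remaining columns */
        double xj = alpha * x[j];
        const double *aj = A + j * lda;
        for (size_t i = 0; i < m; ++i)
            y[i] += aj[i] * xj;
    }
}
```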

Martin Kroeker [Sun, 29 Nov 2020 14:33:07 +0000 (15:33 +0100)]
Add SSE flags for x86

Martin Kroeker [Sun, 29 Nov 2020 14:32:17 +0000 (15:32 +0100)]
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx

Martin Kroeker [Sun, 29 Nov 2020 12:27:47 +0000 (13:27 +0100)]
Merge pull request #3012 from martin-frbg/restore-getarch

Restore RISCV entries accidentally trashed by my PR 3005

Martin Kroeker [Sun, 29 Nov 2020 12:19:51 +0000 (13:19 +0100)]
Restore RISCV entries accidentally trashed by my PR 3005

Martin Kroeker [Sun, 29 Nov 2020 10:36:43 +0000 (11:36 +0100)]
Merge pull request #3010 from ggouaillardet/topic/fj_compilers

add Fujitsu compilers

cyy [Sun, 29 Nov 2020 09:17:07 +0000 (17:17 +0800)]
link math lib on FreeBSD

Gilles Gouaillardet [Sun, 29 Nov 2020 04:57:57 +0000 (13:57 +0900)]
add Fujitsu compilers

Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>
Martin Kroeker [Mon, 23 Nov 2020 07:35:32 +0000 (08:35 +0100)]
Merge pull request #3005 from martin-frbg/ssefix

Add -msse for x86 and silence build warning in getarch

Martin Kroeker [Mon, 23 Nov 2020 07:35:12 +0000 (08:35 +0100)]
Merge pull request #3004 from martin-frbg/bsd_getauxval

ARM64 DYNAMIC_ARCH build fix for BSD/OSX

Martin Kroeker [Sun, 22 Nov 2020 21:51:26 +0000 (22:51 +0100)]
Merge pull request #3002 from martin-frbg/issue3000

Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size

Martin Kroeker [Sun, 22 Nov 2020 21:50:41 +0000 (22:50 +0100)]
Merge pull request #3001 from martin-frbg/issue2996

Fix ambiguous ifdefs in tests for user-defined options in Makefiles

Martin Kroeker [Sun, 22 Nov 2020 20:16:07 +0000 (21:16 +0100)]
Avoid redefinition warning

Martin Kroeker [Sun, 22 Nov 2020 20:15:08 +0000 (21:15 +0100)]
Add -msse if supported

Martin Kroeker [Sun, 22 Nov 2020 19:20:28 +0000 (20:20 +0100)]
Build fix for systems that do not support getauxval

Martin Kroeker [Sun, 22 Nov 2020 16:41:44 +0000 (17:41 +0100)]
Fix syntax mixup

Martin Kroeker [Sun, 22 Nov 2020 16:16:22 +0000 (17:16 +0100)]
Restore proper Makefile

Martin Kroeker [Sun, 22 Nov 2020 15:48:22 +0000 (16:48 +0100)]
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds

Martin Kroeker [Sun, 22 Nov 2020 15:33:34 +0000 (16:33 +0100)]
Use ifneq instead of ifdef for CROSS option

Martin Kroeker [Sun, 22 Nov 2020 15:31:44 +0000 (16:31 +0100)]
Use ifeq instead of ifdef for user-definable build options

Martin Kroeker [Sun, 22 Nov 2020 15:29:56 +0000 (16:29 +0100)]
Use ifeq instead of ifdef for user-definable options

Martin Kroeker [Sun, 22 Nov 2020 15:27:17 +0000 (16:27 +0100)]
Convert ifndefs to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:25:36 +0000 (16:25 +0100)]
Change ifndef CROSS to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:19:31 +0000 (16:19 +0100)]
Change ifndefs to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:17:19 +0000 (16:17 +0100)]
Merge pull request #112 from xianyi/develop

rebase

Martin Kroeker [Sun, 22 Nov 2020 11:25:33 +0000 (12:25 +0100)]
Merge pull request #2965 from epsilon-0/develop

allow setting soname without suffix or prefix

Martin Kroeker [Sun, 22 Nov 2020 11:24:13 +0000 (12:24 +0100)]
Merge pull request #2988 from xiegengxin/smp-asum

Improve the performance of dasum and sasum when SMP is defined

Martin Kroeker [Sun, 22 Nov 2020 11:22:57 +0000 (12:22 +0100)]
Merge pull request #2997 from Flamefire/reproduce_crash

Add reproducer test for crash after fork

Xianyi Zhang [Sun, 22 Nov 2020 08:05:32 +0000 (16:05 +0800)]
Merge branch 'risc-v' into develop

Xianyi Zhang [Sun, 22 Nov 2020 08:04:50 +0000 (16:04 +0800)]
Merge branch 'develop' into risc-v

Xianyi Zhang [Sun, 22 Nov 2020 08:02:19 +0000 (16:02 +0800)]
Update doc for C910.

Martin Kroeker [Fri, 20 Nov 2020 08:42:10 +0000 (09:42 +0100)]
Merge pull request #2995 from Flamefire/fix_thread_buffer_init

Don't overwrite blas_thread_buffer if already set

Alexander Grund [Thu, 19 Nov 2020 14:24:57 +0000 (15:24 +0100)]
Add reproducer test for crash after fork

See #2993 for an analysis

Alexander Grund [Thu, 19 Nov 2020 13:39:00 +0000 (14:39 +0100)]
Don't overwrite blas_thread_buffer if already set

After a fork, blas_thread_buffer may already hold allocated
memory buffers: goto_set_num_threads allocates them, and it
may be called by num_cpu_avail when the OpenBLAS NUM_THREADS
differs from the OpenMP number of threads.
Overwriting those buffers leaks memory, which can cause subsequent
BLAS kernel calls to fail.

Fixes #2993
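In effect, the fix is a guard: a per-thread buffer is allocated only when its slot is still empty. The sketch below is not the OpenBLAS source. Its names (thread_buffer, init_thread_buffers, MAX_THREADS, BUFFER_BYTES) are hypothetical. It only shows why a second initialization pass, such as one triggered after a fork, must not replace buffers that already exist.

```c
#include <stdlib.h>

#define MAX_THREADS 8             /* hypothetical limit */
#define BUFFER_BYTES (64 * 1024)  /* hypothetical buffer size */

/* Hypothetical per-thread buffer table. */
static void *thread_buffer[MAX_THREADS];

/* Set up buffers for the first nthreads slots. A slot that already holds a
   buffer (for example from an earlier call made after a fork) is left alone;
   overwriting it would leak the old allocation. */
void init_thread_buffers(int nthreads)
{
    for (int i = 0; i < nthreads && i < MAX_THREADS; ++i) {
        if (thread_buffer[i] == NULL)
            thread_buffer[i] = malloc(BUFFER_BYTES);
    }
}
```

Without the NULL check, the second call would discard the pointers from the first call. That is the kind of leak the commit message describes. For brevity, the sketch never frees its buffers.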

Martin Kroeker [Mon, 16 Nov 2020 07:40:46 +0000 (08:40 +0100)]
Merge pull request #2981 from Qiyu8/fix-sum

Fix sum optimize issues

Martin Kroeker [Mon, 16 Nov 2020 07:38:37 +0000 (08:38 +0100)]
Merge pull request #2983 from Qiyu8/optimize-srot

Optimize the performance of rot by using universal intrinsics

Qiyu8 [Mon, 16 Nov 2020 01:14:56 +0000 (09:14 +0800)]
remove the -mfma flag when the host has AVX.

Martin Kroeker [Fri, 13 Nov 2020 11:35:09 +0000 (12:35 +0100)]
Merge pull request #2989 from martin-frbg/cmake-fma

Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH

Martin Kroeker [Fri, 13 Nov 2020 08:16:34 +0000 (09:16 +0100)]
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well

Martin Kroeker [Fri, 13 Nov 2020 08:14:23 +0000 (09:14 +0100)]
Merge pull request #111 from xianyi/develop

rebase

Gengxin Xie [Fri, 13 Nov 2020 06:20:52 +0000 (14:20 +0800)]
Improve the performance of dasum and sasum when SMP is defined
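In an SMP build, the asum reduction can be split across threads. Each thread sums |x_i| over its own contiguous chunk, and the partial sums are added at the end. To stay self-contained, the sketch below runs the chunks one after another. dasum_partitioned, the even split, and the limit of 64 parts are assumptions; this is not the OpenBLAS threading code. Changing the number of parts can change the last bits of the result, because floating-point addition is not associative.

```c
#include <math.h>
#include <stddef.h>

/* Sum of |x_i| over one contiguous chunk; a threaded build would give each
   chunk to a separate worker. */
static double dasum_chunk(size_t n, const double *x)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += fabs(x[i]);
    return s;
}

/* Split x into nparts nearly equal chunks (the first n % nparts chunks get
   one extra element), compute each partial sum, then add the partials. */
double dasum_partitioned(size_t n, const double *x, size_t nparts)
{
    double partial[64];                  /* hypothetical upper bound on parts */
    if (nparts == 0) nparts = 1;
    if (nparts > 64) nparts = 64;
    size_t base = n / nparts, rem = n % nparts, off = 0;
    for (size_t t = 0; t < nparts; ++t) {
        size_t len = base + (t < rem ? 1 : 0);
        partial[t] = dasum_chunk(len, x + off);
        off += len;
    }
    double total = 0.0;
    for (size_t t = 0; t < nparts; ++t)  /* final reduction of partial sums */
        total += partial[t];
    return total;
}
```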

Qiyu8 [Fri, 13 Nov 2020 02:20:24 +0000 (10:20 +0800)]
modify system.cmake to enable fma flag

Qiyu8 [Thu, 12 Nov 2020 12:31:03 +0000 (20:31 +0800)]
fix the CI failure of target specific option mismatch

Qiyu8 [Thu, 12 Nov 2020 09:35:17 +0000 (17:35 +0800)]
fix the CI failure of lack the head

Qiyu8 [Wed, 11 Nov 2020 07:53:48 +0000 (15:53 +0800)]
modify macro

Qiyu8 [Wed, 11 Nov 2020 07:18:01 +0000 (15:18 +0800)]
only FMA3 and vector larger than 128 have positive effects.

Qiyu8 [Wed, 11 Nov 2020 06:33:12 +0000 (14:33 +0800)]
Optimize the performance of rot by using universal intrinsics
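Every rot change in this log (universal intrinsics, AVX2, AVX512) has to match the BLAS plane rotation. That rotation replaces each pair (x_i, y_i) with (c*x_i + s*y_i, c*y_i - s*x_i). The sketch below is a scalar reference for the unit-stride, single-precision case. srot_ref is a hypothetical name. Vectorized kernels that use FMA may differ from it in the last bit.

```c
#include <stddef.h>

/* Reference srot for unit stride: applies [c s; -s c] to each (x_i, y_i).
   The originals are saved first so the y update uses the old x value. */
void srot_ref(size_t n, float *x, float *y, float c, float s)
{
    for (size_t i = 0; i < n; ++i) {
        float xi = x[i], yi = y[i];
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}
```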

Qiyu8 [Tue, 10 Nov 2020 08:16:38 +0000 (16:16 +0800)]
fix sum optimize issues

Xianyi Zhang [Tue, 10 Nov 2020 01:38:43 +0000 (09:38 +0800)]
Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v

Xianyi Zhang [Tue, 10 Nov 2020 01:38:04 +0000 (09:38 +0800)]
Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910

Xianyi Zhang [Tue, 10 Nov 2020 01:18:25 +0000 (09:18 +0800)]
Merge branch 'develop' into risc-v

Martin Kroeker [Sun, 8 Nov 2020 21:43:00 +0000 (22:43 +0100)]
Merge pull request #2972 from xiegengxin/rot-intrinsic

Improve the performance of rot by using AVX512 and AVX2 intrinsic

Martin Kroeker [Sun, 8 Nov 2020 16:39:05 +0000 (17:39 +0100)]
Merge pull request #2980 from martin-frbg/fixgetarch

Fix missing AVX2 and FMA3 capabilities in FORCE_target mode

Martin Kroeker [Sun, 8 Nov 2020 12:15:40 +0000 (13:15 +0100)]
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode

Martin Kroeker [Sun, 8 Nov 2020 09:19:34 +0000 (10:19 +0100)]
Merge pull request #2979 from RajalakshmiSR/dot_power10

Optimize sdot/ddot for POWER10

Martin Kroeker [Sun, 8 Nov 2020 09:19:17 +0000 (10:19 +0100)]
Merge pull request #2978 from martin-frbg/fixdynfeatures

Fix handling of cpu capability flags in DYNAMIC_ARCH builds

Martin Kroeker [Sat, 7 Nov 2020 23:12:55 +0000 (00:12 +0100)]
Stay compatible with old gmake that did not support undefine

Martin Kroeker [Sat, 7 Nov 2020 23:01:36 +0000 (00:01 +0100)]
Update Makefile.system

Martin Kroeker [Sat, 7 Nov 2020 22:37:21 +0000 (23:37 +0100)]
Update Makefile.system

Rajalakshmi Srinivasaraghavan [Sat, 7 Nov 2020 21:21:58 +0000 (15:21 -0600)]
Optimize sdot/ddot for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.
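The vector-pair instructions are specific to POWER10. The usual structure of such a kernel can still be sketched portably: several independent accumulators fed by wide loads, followed by a scalar tail. The function below illustrates that structure under this assumption and is not the OpenBLAS POWER10 code. Because multiple accumulators change the summation order compared with a naive loop, results may differ by rounding.

```c
#include <stddef.h>

/* Portable ddot with four independent accumulators, so consecutive
   multiply-adds do not depend on each other; the tail handles n % 4. */
double ddot_unrolled(size_t n, const double *x, const double *y)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; ++i)              /* remainder */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);   /* combine the accumulators */
}
```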

Martin Kroeker [Sat, 7 Nov 2020 19:39:56 +0000 (20:39 +0100)]
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds

Martin Kroeker [Sat, 7 Nov 2020 19:37:03 +0000 (20:37 +0100)]
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode

Martin Kroeker [Sat, 7 Nov 2020 19:30:15 +0000 (20:30 +0100)]
Fix propagation of cpu properties to compiler options

Martin Kroeker [Sat, 7 Nov 2020 19:27:42 +0000 (20:27 +0100)]
Remove extraneous quotes that caused a cmake policy warning

Martin Kroeker [Sat, 7 Nov 2020 19:26:12 +0000 (20:26 +0100)]
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds

Martin Kroeker [Sat, 7 Nov 2020 19:22:41 +0000 (20:22 +0100)]
Merge pull request #110 from xianyi/develop

rebase

Martin Kroeker [Sat, 7 Nov 2020 13:41:34 +0000 (14:41 +0100)]
Merge pull request #2977 from martin-frbg/issue2976

Fix macro name used in ifdef for POWERPC/PGI

Martin Kroeker [Sat, 7 Nov 2020 11:17:49 +0000 (12:17 +0100)]
Fix macro name used in ifdef

Gengxin Xie [Thu, 5 Nov 2020 08:25:17 +0000 (16:25 +0800)]
fix typo

Gengxin Xie [Sun, 27 Sep 2020 02:38:19 +0000 (10:38 +0800)]
Improve the performance of rot by using AVX512 and AVX2 intrinsic

Martin Kroeker [Wed, 4 Nov 2020 15:02:46 +0000 (16:02 +0100)]
Merge pull request #2966 from martin-frbg/issue2964

Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds

Martin Kroeker [Tue, 3 Nov 2020 22:47:04 +0000 (23:47 +0100)]
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target

Martin Kroeker [Tue, 3 Nov 2020 22:45:49 +0000 (23:45 +0100)]
Add -msse3 where needed for DYNAMIC_ARCH builds

Martin Kroeker [Mon, 2 Nov 2020 22:17:46 +0000 (23:17 +0100)]
Fix target test

Martin Kroeker [Mon, 2 Nov 2020 21:43:50 +0000 (22:43 +0100)]
Add files via upload

Martin Kroeker [Mon, 2 Nov 2020 17:54:36 +0000 (18:54 +0100)]
Merge pull request #2967 from RajalakshmiSR/dgemm88

POWER10: Change dgemm unroll factors

Aisha Tammy [Mon, 2 Nov 2020 13:04:53 +0000 (13:04 +0000)]
allow setting soname without suffix or prefix

Allows creating a library with a different
SONAME without the need to add suffixes to symbols.
Backwards compatible and should have no effect
on existing workflows or users.
Useful for installing an INTERFACE64 library alongside
the standard library without file conflicts.

Martin Kroeker [Sun, 1 Nov 2020 21:25:43 +0000 (22:25 +0100)]
typo fix