platform/upstream/openblas.git
Martin Kroeker [Mon, 12 Oct 2020 22:14:29 +0000 (00:14 +0200)]
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM

Martin Kroeker [Sun, 11 Oct 2020 21:34:36 +0000 (23:34 +0200)]
Merge pull request #96 from xianyi/develop

rebase

Martin Kroeker [Sun, 11 Oct 2020 21:34:14 +0000 (23:34 +0200)]
Merge branch 'develop' into develop

Martin Kroeker [Sun, 11 Oct 2020 20:22:30 +0000 (22:22 +0200)]
Merge pull request #2882 from martin-frbg/issue2709

Use generic C for (D/Z)NRM2 on Windows x86_64

Martin Kroeker [Sun, 11 Oct 2020 20:21:33 +0000 (22:21 +0200)]
Merge pull request #2852 from martin-frbg/issue2588-cmake

Support building only a subset of variable types

Martin Kroeker [Sun, 11 Oct 2020 16:29:34 +0000 (18:29 +0200)]
repair TABs

Martin Kroeker [Sun, 11 Oct 2020 16:25:16 +0000 (18:25 +0200)]
Copy BUILD_ settings to the LAPACK make.inc

Martin Kroeker [Sun, 11 Oct 2020 16:08:21 +0000 (18:08 +0200)]
Set BUILD_ options to 1 instead of just defining them

Martin Kroeker [Sun, 11 Oct 2020 15:45:41 +0000 (17:45 +0200)]
Add cblas_xerbla interface

Martin Kroeker [Sun, 11 Oct 2020 15:33:51 +0000 (17:33 +0200)]
If none of the BUILD_ options is set, enable them all

Martin Kroeker [Sun, 11 Oct 2020 15:23:08 +0000 (17:23 +0200)]
remove debug output

Martin Kroeker [Sun, 11 Oct 2020 13:45:24 +0000 (15:45 +0200)]
Merge pull request #2885 from martin-frbg/ifexists

Improve CMAKE check for conflicting config_kernel.h

Martin Kroeker [Sun, 11 Oct 2020 13:14:03 +0000 (15:14 +0200)]
Merge pull request #2884 from martin-frbg/sse_fixup

Add workaround for unwanted default activation of -msse3 in DYNAMIC_ARCH builds

Martin Kroeker [Sun, 11 Oct 2020 13:11:15 +0000 (15:11 +0200)]
Allow building support for only a subset of variable types

Martin Kroeker [Sun, 11 Oct 2020 13:01:32 +0000 (15:01 +0200)]
Adapt for supporting only a subset of variable types

Martin Kroeker [Sun, 11 Oct 2020 12:58:57 +0000 (14:58 +0200)]
Adapt for supporting only a subset of variable types

Martin Kroeker [Sun, 11 Oct 2020 12:57:32 +0000 (14:57 +0200)]
Adapt for supporting only a subset of variable types

Martin Kroeker [Sun, 11 Oct 2020 12:53:26 +0000 (14:53 +0200)]
Allow supporting only a subset of variable types

Martin Kroeker [Sun, 11 Oct 2020 12:52:09 +0000 (14:52 +0200)]
Allow compiling only a subset of kernels for specific variable types

Martin Kroeker [Sun, 11 Oct 2020 12:49:58 +0000 (14:49 +0200)]
Add Makefile support for enabling only some variable types

Martin Kroeker [Sun, 11 Oct 2020 12:49:06 +0000 (14:49 +0200)]
Add Makefile support for enabling only some variable types

Martin Kroeker [Sun, 11 Oct 2020 12:48:23 +0000 (14:48 +0200)]
Add Makefile support for enabling only some variable types

Martin Kroeker [Sun, 11 Oct 2020 12:46:24 +0000 (14:46 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:45:40 +0000 (14:45 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:44:56 +0000 (14:44 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:44:13 +0000 (14:44 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:43:13 +0000 (14:43 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:42:26 +0000 (14:42 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:41:43 +0000 (14:41 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:40:51 +0000 (14:40 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:40:06 +0000 (14:40 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:39:19 +0000 (14:39 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:38:25 +0000 (14:38 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:36:45 +0000 (14:36 +0200)]
Adapt to having only a subset of variable types supported

Martin Kroeker [Sun, 11 Oct 2020 12:34:12 +0000 (14:34 +0200)]
Adapt for having only a subset of variable types

Martin Kroeker [Sun, 11 Oct 2020 12:32:00 +0000 (14:32 +0200)]
Adapt ctests to having only a subset of types in the build

Martin Kroeker [Sun, 11 Oct 2020 12:25:24 +0000 (14:25 +0200)]
Adapt tests to having only a subset of types in the build

Martin Kroeker [Sun, 11 Oct 2020 12:15:35 +0000 (14:15 +0200)]
Adapt utests for builds supporting only some variable types

Martin Kroeker [Sun, 11 Oct 2020 11:57:07 +0000 (13:57 +0200)]
Merge branch 'develop' into issue2588-cmake

Martin Kroeker [Sun, 11 Oct 2020 11:26:05 +0000 (13:26 +0200)]
Add files via upload

Martin Kroeker [Sun, 11 Oct 2020 10:58:17 +0000 (12:58 +0200)]
Improve check for conflicting config_kernel.h

Martin Kroeker [Sun, 11 Oct 2020 10:53:18 +0000 (12:53 +0200)]
Merge pull request #95 from xianyi/develop

rebase

Martin Kroeker [Sun, 11 Oct 2020 08:30:33 +0000 (10:30 +0200)]
Merge pull request #2883 from martin-frbg/issue2872

Minor CMAKE fixes

Martin Kroeker [Sat, 10 Oct 2020 23:06:46 +0000 (01:06 +0200)]
Add whitelist of DYNAMIC_ARCH kernels for which -msse3 needs to be enabled

Martin Kroeker [Sat, 10 Oct 2020 23:04:57 +0000 (01:04 +0200)]
Do not rely on HAVE_SSE3 in DYNAMIC_ARCH builds

Martin Kroeker [Sat, 10 Oct 2020 23:03:00 +0000 (01:03 +0200)]
Merge pull request #94 from xianyi/develop

rebase

Martin Kroeker [Sat, 10 Oct 2020 22:43:09 +0000 (00:43 +0200)]
restore PRESCOTT default for DYNAMIC_LIST

Martin Kroeker [Sat, 10 Oct 2020 22:40:22 +0000 (00:40 +0200)]
Stop DYNAMIC_ARCH build if the toplevel source contains a stray config_kernel.h from a gmake build

This is unlikely to happen in practice, but if it does, the rogue file would get included instead of the dynamically generated version for each target_core. That leads to very confusing errors such as "invalid operands (undefined UND and ABS sections)" when compiling the assembly kernels, because macros like PREFETCH would remain undefined.

Martin Kroeker [Sat, 10 Oct 2020 10:10:25 +0000 (12:10 +0200)]
Merge pull request #2867 from Qiyu8/usimd-floatdot

Optimize the performance of dot by using universal intrinsics in X86/ARM

Qiyu8 [Sat, 10 Oct 2020 02:36:15 +0000 (10:36 +0800)]
add sse3 compiler flag

Martin Kroeker [Tue, 6 Oct 2020 21:26:52 +0000 (23:26 +0200)]
Merge pull request #2879 from martin-frbg/issue2839

Default BLAS3_MEM_ALLOC_THRESHOLD on all platforms to 32

Martin Kroeker [Tue, 6 Oct 2020 19:33:16 +0000 (21:33 +0200)]
Use generic C for D/Z nrm2 kernels on Windows to work around fpu exception bug

Martin Kroeker [Sun, 4 Oct 2020 21:01:06 +0000 (23:01 +0200)]
make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows

Martin Kroeker [Sun, 4 Oct 2020 20:59:24 +0000 (22:59 +0200)]
Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable

Martin Kroeker [Sun, 4 Oct 2020 20:57:11 +0000 (22:57 +0200)]
Merge pull request #93 from xianyi/develop

rebase

Martin Kroeker [Sun, 4 Oct 2020 13:16:51 +0000 (15:16 +0200)]
Merge pull request #2874 from Flamefire/memory_fixes

Avoid out of bounds access on invalid memory free

Martin Kroeker [Sat, 3 Oct 2020 20:52:17 +0000 (22:52 +0200)]
Merge pull request #2876 from Flamefire/omp_fork_fix

Lazily reinit threads after a fork in OMP mode

Martin Kroeker [Sat, 3 Oct 2020 20:51:49 +0000 (22:51 +0200)]
Merge pull request #2878 from brada4/asms

fix clang std=c18 compilation on aarch64

User User-User [Sat, 3 Oct 2020 15:00:34 +0000 (18:00 +0300)]
aarch64 fix std=c18 compilation

Alexander Grund [Thu, 1 Oct 2020 13:41:42 +0000 (15:41 +0200)]
Lazily reinit threads after a fork in OMP mode

This initializes the per-thread memory buffers which get
cleared/released on a fork via pthread_atfork. Not doing so leads to
each thread calling blas_memory_alloc on almost every execution which
slows down the code significantly as the threads race for the memory
allocation using locks to serialize that.

Alexander Grund [Thu, 1 Oct 2020 08:48:45 +0000 (10:48 +0200)]
Avoid out of bounds access on invalid memory free

Alexander Grund [Thu, 1 Oct 2020 08:43:16 +0000 (10:43 +0200)]
Fix TABs and trailing space

Martin Kroeker [Thu, 1 Oct 2020 04:38:22 +0000 (06:38 +0200)]
Merge pull request #2873 from martin-frbg/issue2871

Check for __linux rather than linux in cpuid code and benchmarks

Martin Kroeker [Thu, 1 Oct 2020 04:38:06 +0000 (06:38 +0200)]
Merge pull request #2865 from thisch/backticks

Consolidate usage of backticks for build options

Martin Kroeker [Wed, 30 Sep 2020 21:28:49 +0000 (23:28 +0200)]
Remove redundant status message

Martin Kroeker [Wed, 30 Sep 2020 20:59:41 +0000 (22:59 +0200)]
Change ifdef linux to __linux for C11 compatibility

Martin Kroeker [Wed, 30 Sep 2020 20:50:21 +0000 (22:50 +0200)]
Change ifdef linux to __linux for C11 compatibility

and add a fallback for unsupported operating systems in detect()

Martin Kroeker [Wed, 30 Sep 2020 20:47:25 +0000 (22:47 +0200)]
Change ifdef linux to __linux for C11 compatibility

Martin Kroeker [Wed, 30 Sep 2020 20:46:25 +0000 (22:46 +0200)]
Change ifdef linux to __linux for C11 compatibility

Martin Kroeker [Wed, 30 Sep 2020 20:45:18 +0000 (22:45 +0200)]
Change ifdef linux to __linux for C11 compatibility

Martin Kroeker [Wed, 30 Sep 2020 20:43:25 +0000 (22:43 +0200)]
Change ifdef linux to __linux for C11 compatibility

Qiyu8 [Tue, 29 Sep 2020 08:36:14 +0000 (16:36 +0800)]
Adapt ARM architecture

Martin Kroeker [Mon, 28 Sep 2020 20:48:53 +0000 (22:48 +0200)]
Merge pull request #91 from xianyi/develop

rebase

Qiyu8 [Mon, 28 Sep 2020 12:36:53 +0000 (20:36 +0800)]
Optimize the performance of dot by using universal intrinsics in X86/ARM

Martin Kroeker [Mon, 28 Sep 2020 05:22:54 +0000 (07:22 +0200)]
Merge pull request #2866 from RajalakshmiSR/p10_dcopy

Optimize dcopy/zcopy for POWER10

Rajalakshmi Srinivasaraghavan [Mon, 28 Sep 2020 02:42:32 +0000 (21:42 -0500)]
Optimize dcopy/zcopy for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in a simulator with no new failures.

Thomas Hisch [Sun, 27 Sep 2020 22:42:17 +0000 (00:42 +0200)]
Consolidate usage of backticks for build options

There were some build options in the README that were not
highlighted. Now all are highlighted.

Martin Kroeker [Sun, 27 Sep 2020 21:19:59 +0000 (23:19 +0200)]
Merge pull request #2853 from Qiyu8/usimd-daxpy

Optimize the performance of daxpy by using universal intrinsics

Martin Kroeker [Sun, 27 Sep 2020 21:11:17 +0000 (23:11 +0200)]
Merge pull request #2864 from martin-frbg/lapack445

Fix underflow/rounding errors in LAPACK (S,D)LANV2

Martin Kroeker [Sun, 27 Sep 2020 20:59:20 +0000 (22:59 +0200)]
Fix underflow/rounding errors in LAPACK (S,D)LANV2

Reference-LAPACK PR 445, fixing their issue 263

Martin Kroeker [Sun, 27 Sep 2020 20:50:25 +0000 (22:50 +0200)]
Merge pull request #2863 from martin-frbg/readmefixes

Readmefixes

Martin Kroeker [Sun, 27 Sep 2020 20:48:41 +0000 (22:48 +0200)]
Update cpu list, outline cmake build, clarify scope of set_num_threads extension

Martin Kroeker [Sun, 27 Sep 2020 20:35:45 +0000 (22:35 +0200)]
Merge pull request #90 from xianyi/develop

rebase

Martin Kroeker [Sun, 27 Sep 2020 10:12:35 +0000 (12:12 +0200)]
Merge pull request #2850 from xiaojiayuan111/develop

fix a bug of trmm

Qiyu8 [Sun, 27 Sep 2020 01:35:50 +0000 (09:35 +0800)]
remove default support for FMA4 on zen architecture

Martin Kroeker [Sat, 26 Sep 2020 21:25:55 +0000 (23:25 +0200)]
Add support for building only selected variable types

Martin Kroeker [Sat, 26 Sep 2020 21:24:37 +0000 (23:24 +0200)]
Work around sgemm_r/dgemm_r not being properly defined with BUILD_COMPLEX/BUILD_COMPLEX16

Martin Kroeker [Wed, 23 Sep 2020 19:59:18 +0000 (21:59 +0200)]
Merge pull request #2854 from martin-frbg/travis-graviton

Add an AWS-Graviton2 build to Travis CI

Martin Kroeker [Wed, 23 Sep 2020 17:02:20 +0000 (19:02 +0200)]
Add AWS Graviton2 build

Martin Kroeker [Tue, 22 Sep 2020 21:28:57 +0000 (23:28 +0200)]
Adapt tests to having only a subset of types in the library

Martin Kroeker [Tue, 22 Sep 2020 21:28:03 +0000 (23:28 +0200)]
Adapt tests to having only a subset of types in the build

Martin Kroeker [Tue, 22 Sep 2020 21:25:59 +0000 (23:25 +0200)]
Support building only a subset of types

Martin Kroeker [Tue, 22 Sep 2020 21:25:04 +0000 (23:25 +0200)]
Support building only a subset of types

Martin Kroeker [Tue, 22 Sep 2020 21:24:22 +0000 (23:24 +0200)]
Add BUILD_vartype defines

Martin Kroeker [Tue, 22 Sep 2020 21:23:33 +0000 (23:23 +0200)]
Add BUILD_vartype defines

Martin Kroeker [Tue, 22 Sep 2020 21:21:30 +0000 (23:21 +0200)]
Support building only selected types

Martin Kroeker [Tue, 22 Sep 2020 21:20:51 +0000 (23:20 +0200)]
Support building only selected types

Martin Kroeker [Tue, 22 Sep 2020 21:20:05 +0000 (23:20 +0200)]
fix grouping of sources used for more than one type

Martin Kroeker [Tue, 22 Sep 2020 21:18:55 +0000 (23:18 +0200)]
add defines for building a subset of types

Martin Kroeker [Tue, 22 Sep 2020 21:15:33 +0000 (23:15 +0200)]
Merge pull request #88 from xianyi/develop

rebase