platform/upstream/openblas.git
Martin Kroeker [Thu, 18 Mar 2021 06:47:03 +0000 (07:47 +0100)]
Merge pull request #19 from xianyi/develop

rebase

Martin Kroeker [Wed, 17 Mar 2021 20:14:42 +0000 (21:14 +0100)]
Update version to 0.3.14.dev

Martin Kroeker [Wed, 17 Mar 2021 20:14:05 +0000 (21:14 +0100)]
Update version to 0.3.14.dev

Martin Kroeker [Wed, 17 Mar 2021 20:13:25 +0000 (21:13 +0100)]
Merge pull request #3151 from xianyi/release-0.3.0

Merge 0.3.14 release branch back into develop to acquire tag

Martin Kroeker [Wed, 17 Mar 2021 19:21:42 +0000 (20:21 +0100)]
Merge pull request #3150 from xianyi/develop

Update branch from develop for 0.3.14 release

Martin Kroeker [Wed, 17 Mar 2021 19:20:34 +0000 (20:20 +0100)]
Update version to 0.3.14 for release

Martin Kroeker [Wed, 17 Mar 2021 19:20:00 +0000 (20:20 +0100)]
Update version to 0.3.14 for release

Martin Kroeker [Wed, 17 Mar 2021 19:14:50 +0000 (20:14 +0100)]
Merge pull request #3149 from martin-frbg/changelog14

Update Changelog for 0.3.14

Martin Kroeker [Wed, 17 Mar 2021 15:17:55 +0000 (16:17 +0100)]
Update Changelog for 0.3.14

Martin Kroeker [Wed, 17 Mar 2021 08:05:43 +0000 (09:05 +0100)]
Merge pull request #3148 from martin-frbg/issue3145

Add workaround for older gcc on big-endian ppc64 not supporting casts in defines

Martin Kroeker [Tue, 16 Mar 2021 20:20:05 +0000 (21:20 +0100)]
Add workaround for older gcc on ppc64be not supporting casts in defines

Martin Kroeker [Tue, 16 Mar 2021 20:09:45 +0000 (21:09 +0100)]
Merge pull request #18 from xianyi/develop

rebase

Martin Kroeker [Tue, 16 Mar 2021 19:25:42 +0000 (20:25 +0100)]
Merge pull request #3147 from martin-frbg/issue3146

Fix DYNAMIC_ARCH builds with CLANG on ppc64

Martin Kroeker [Tue, 16 Mar 2021 15:52:57 +0000 (16:52 +0100)]
Fix compilation with CLANG

Martin Kroeker [Sun, 14 Mar 2021 22:12:55 +0000 (23:12 +0100)]
Merge pull request #3143 from martin-frbg/fix3088

Resolve circular dependency between common.h and param.h

Martin Kroeker [Sun, 14 Mar 2021 22:12:33 +0000 (23:12 +0100)]
Merge pull request #3144 from xoviat/fix-test

disable openmp

xoviat [Sun, 14 Mar 2021 21:34:02 +0000 (16:34 -0500)]
disable openmp

Martin Kroeker [Sun, 14 Mar 2021 16:36:51 +0000 (17:36 +0100)]
remove inclusion of common.h again to avoid circular dependency

Martin Kroeker [Sun, 14 Mar 2021 16:28:43 +0000 (17:28 +0100)]
Include common.h (and indirectly param.h) rather than just param.h to have BLASLONG available w/o circular dependencies

Martin Kroeker [Sun, 14 Mar 2021 16:20:49 +0000 (17:20 +0100)]
Merge pull request #17 from xianyi/develop

rebase

Martin Kroeker [Sun, 14 Mar 2021 16:14:28 +0000 (17:14 +0100)]
Merge pull request #3088 from xoviat/msvc

add misc fixes.

Martin Kroeker [Sat, 13 Mar 2021 22:04:53 +0000 (23:04 +0100)]
Merge pull request #3141 from martin-frbg/nagfor-2

Leave out ARM64 march/mtune options when compiling with nagfor

Martin Kroeker [Sat, 13 Mar 2021 19:16:18 +0000 (20:16 +0100)]
Support compilation with NAG fortran

Martin Kroeker [Fri, 12 Mar 2021 14:35:58 +0000 (15:35 +0100)]
Merge pull request #3140 from martin-frbg/issue3139

Fix compilation on older x86_64 targets with old compilers that lack intrinsics support

Martin Kroeker [Fri, 12 Mar 2021 11:46:19 +0000 (12:46 +0100)]
Merge pull request #3138 from martin-frbg/nagfor

Add support for compilation with the NAG Fortran compiler

Martin Kroeker [Fri, 12 Mar 2021 11:42:05 +0000 (12:42 +0100)]
Move includes under the ifdef for compilers w/o intrinsics support

Martin Kroeker [Thu, 11 Mar 2021 22:03:58 +0000 (23:03 +0100)]
Fix syntax

Martin Kroeker [Thu, 11 Mar 2021 14:17:48 +0000 (15:17 +0100)]
Merge pull request #3136 from austinpagan/Gemm.PQ

Modifying a couple parameters in the "POWER10"-specific section of pa…

Martin Kroeker [Thu, 11 Mar 2021 10:53:51 +0000 (11:53 +0100)]
Support compilation with nagfor

Martin Kroeker [Thu, 11 Mar 2021 10:52:29 +0000 (11:52 +0100)]
Support compilation with nagfor

Martin Kroeker [Thu, 11 Mar 2021 10:51:09 +0000 (11:51 +0100)]
Support compilation with the NAG Fortran compiler

Martin Kroeker [Thu, 11 Mar 2021 10:48:37 +0000 (11:48 +0100)]
Merge pull request #16 from xianyi/develop

rebase

Martin Kroeker [Thu, 11 Mar 2021 06:18:05 +0000 (07:18 +0100)]
Merge pull request #3137 from RajalakshmiSR/zscal_p10

Optimize zscal function for POWER10

austinpagan [Wed, 10 Mar 2021 23:19:12 +0000 (18:19 -0500)]
Modifying a couple parameters in the "POWER10"-specific section of param.h, for performance enhancements for SGEMM and DGEMM.

Rajalakshmi Srinivasaraghavan [Wed, 10 Mar 2021 23:15:33 +0000 (17:15 -0600)]
Optimize zscal function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Sat, 6 Mar 2021 18:15:53 +0000 (19:15 +0100)]
Merge pull request #3130 from martin-frbg/issue3128

Replace spurious AVX512 requirement in the Haswell srot microkernel with an AVX2/FMA3 guard

Martin Kroeker [Sat, 6 Mar 2021 13:35:49 +0000 (14:35 +0100)]
Remove spurious AVX512 requirement and add AVX2/FMA3 guard
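
The guard named in this commit can be pictured with a minimal sketch. It is not the Haswell srot microkernel itself: `srot_sketch` is a hypothetical name, and the only assumption is that the compiler defines `__AVX2__` and `__FMA__` when those instruction sets are enabled, as GCC and Clang do. The vector path is compiled only under that guard and a plain C loop handles everything else.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
#endif

/* Hypothetical illustration: apply a plane rotation in place,
   x[i] <- c*x[i] + s*y[i] and y[i] <- c*y[i] - s*x[i]. */
void srot_sketch(size_t n, float *x, float *y, float c, float s) {
    size_t i = 0;
    assert(n == 0 || (x != NULL && y != NULL));

#if defined(__AVX2__) && defined(__FMA__)
    /* Vector path: only built when AVX2 and FMA3 are both available.
       No AVX512 feature is required here. */
    const __m256 vc = _mm256_set1_ps(c);
    const __m256 vs = _mm256_set1_ps(s);
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        /* s*y + (c*x) */
        __m256 nx = _mm256_fmadd_ps(vs, vy, _mm256_mul_ps(vc, vx));
        /* -(s*x) + (c*y) */
        __m256 ny = _mm256_fnmadd_ps(vs, vx, _mm256_mul_ps(vc, vy));
        _mm256_storeu_ps(x + i, nx);
        _mm256_storeu_ps(y + i, ny);
    }
#endif

    /* Scalar path: handles the tail, or the whole vector when the
       guard above is not satisfied. */
    for (; i < n; i++) {
        float nx = c * x[i] + s * y[i];
        float ny = c * y[i] - s * x[i];
        x[i] = nx;
        y[i] = ny;
    }
}
```

Built without AVX2/FMA flags, the same source still compiles and runs through the scalar loop only.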

Martin Kroeker [Sat, 6 Mar 2021 08:13:59 +0000 (09:13 +0100)]
Merge pull request #3129 from RajalakshmiSR/asum_p10

Optimize s/dasum function for POWER10

Rajalakshmi Srinivasaraghavan [Fri, 5 Mar 2021 22:22:36 +0000 (16:22 -0600)]
Optimize s/dasum function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Tue, 2 Mar 2021 20:27:21 +0000 (21:27 +0100)]
Merge pull request #3126 from martin-frbg/m1bench

Support timing Apple M1 in the benchmarks

Martin Kroeker [Tue, 2 Mar 2021 16:50:55 +0000 (17:50 +0100)]
Support timing Apple M1

Martin Kroeker [Tue, 2 Mar 2021 08:58:40 +0000 (09:58 +0100)]
Merge pull request #3125 from martin-frbg/issue3123

Fix AMD AOCC compiler detection

Martin Kroeker [Mon, 1 Mar 2021 20:00:10 +0000 (21:00 +0100)]
Fix AMD AOCC compiler detection

Martin Kroeker [Sun, 28 Feb 2021 21:13:09 +0000 (22:13 +0100)]
Merge pull request #3122 from martin-frbg/xeigtstz

Fix unusual stack size requirements of the LAPACK EIG tests (from Reference-LAPACK PR 335)

Martin Kroeker [Sun, 28 Feb 2021 17:57:05 +0000 (18:57 +0100)]
Adjust build rules for ?chkee.F

Martin Kroeker [Sun, 28 Feb 2021 17:53:20 +0000 (18:53 +0100)]
Adjust build rules for ?chkee.F

Martin Kroeker [Sun, 28 Feb 2021 17:51:03 +0000 (18:51 +0100)]
Add rewritten cchkee.F from Reference-LAPACK PR335

Martin Kroeker [Sun, 28 Feb 2021 17:50:26 +0000 (18:50 +0100)]
Add rewritten dchkee.F from Reference-LAPACK PR335

Martin Kroeker [Sun, 28 Feb 2021 17:49:50 +0000 (18:49 +0100)]
Add rewritten schkee.F from Reference-LAPACK PR335

Martin Kroeker [Sun, 28 Feb 2021 17:49:10 +0000 (18:49 +0100)]
Add rewritten zchkee.F from Reference-LAPACK PR335

Martin Kroeker [Sun, 28 Feb 2021 17:47:06 +0000 (18:47 +0100)]
Delete zchkee.f

Martin Kroeker [Sun, 28 Feb 2021 17:46:52 +0000 (18:46 +0100)]
Delete schkee.f

Martin Kroeker [Sun, 28 Feb 2021 17:46:38 +0000 (18:46 +0100)]
Delete dchkee.f

Martin Kroeker [Sun, 28 Feb 2021 17:46:08 +0000 (18:46 +0100)]
Delete cchkee.f

Martin Kroeker [Sat, 27 Feb 2021 18:15:49 +0000 (19:15 +0100)]
Merge pull request #3121 from RajalakshmiSR/mmarename

POWER10: Rename mma builtins

Rajalakshmi Srinivasaraghavan [Sat, 27 Feb 2021 02:56:34 +0000 (20:56 -0600)]
POWER10: Rename mma builtins

The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. This patch is to make
corresponding changes in dgemm kernel. Also made changes in
inputs to those builtins to avoid some potential typecasting issues.

Reference gcc commit id:77ef995c1fbcab76a2a69b9f4700bcfd005d8e62

Martin Kroeker [Fri, 26 Feb 2021 10:50:47 +0000 (11:50 +0100)]
Merge pull request #3120 from martin-frbg/3118-x

Fix use of undefined CC variable in f_check

Martin Kroeker [Fri, 26 Feb 2021 08:09:43 +0000 (09:09 +0100)]
fix undefined CC variable

Martin Kroeker [Fri, 26 Feb 2021 08:06:25 +0000 (09:06 +0100)]
Merge pull request #15 from xianyi/develop

rebase

Martin Kroeker [Fri, 26 Feb 2021 03:18:33 +0000 (04:18 +0100)]
Merge pull request #3119 from xianyi/revert-3118-issue3018-2

Revert "Fix undefined CC in f_check (again)"

Martin Kroeker [Fri, 26 Feb 2021 03:18:04 +0000 (04:18 +0100)]
Revert "Fix undefined CC in f_check (again)"

Martin Kroeker [Thu, 25 Feb 2021 12:48:41 +0000 (13:48 +0100)]
Merge pull request #3118 from martin-frbg/issue3018-2

Fix undefined CC in f_check (again)

Martin Kroeker [Thu, 25 Feb 2021 12:47:34 +0000 (13:47 +0100)]
fix undefined CC again

Martin Kroeker [Thu, 25 Feb 2021 12:45:27 +0000 (13:45 +0100)]
Merge pull request #14 from xianyi/develop

rebase

Martin Kroeker [Wed, 24 Feb 2021 17:39:28 +0000 (18:39 +0100)]
Merge pull request #3117 from haampie/fix-perl

use /usr/bin/env perl

Martin Kroeker [Wed, 24 Feb 2021 17:38:25 +0000 (18:38 +0100)]
Merge pull request #3114 from martin-frbg/issue3113

Fix dll_callback and p_process_term signatures for USE_TLS on Windows x64

Martin Kroeker [Wed, 24 Feb 2021 17:37:36 +0000 (18:37 +0100)]
Merge pull request #3115 from martin-frbg/issue2532

Replace unoptimized OMATCOPY_RT with 4x4 blocked version

Harmen Stoppels [Wed, 24 Feb 2021 13:07:20 +0000 (14:07 +0100)]
use /usr/bin/env perl

Martin Kroeker [Wed, 24 Feb 2021 08:34:14 +0000 (09:34 +0100)]
Update omatcopy_rt.c

Martin Kroeker [Wed, 24 Feb 2021 08:13:12 +0000 (09:13 +0100)]
Update omatcopy_rt.c

Martin Kroeker [Wed, 24 Feb 2021 08:03:41 +0000 (09:03 +0100)]
Enable optimized S/D OMATCOPY_RT

Martin Kroeker [Wed, 24 Feb 2021 08:00:54 +0000 (09:00 +0100)]
Add optimized omatcopy_rt

Martin Kroeker [Tue, 23 Feb 2021 12:14:35 +0000 (13:14 +0100)]
Typo fix

Martin Kroeker [Mon, 22 Feb 2021 20:35:42 +0000 (21:35 +0100)]
Replace naive omatcopy_rt with 4x4 blocked implementation

as suggested by MigMuc in issue 2532
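
As a rough illustration of what a 4x4 blocked transposed copy looks like, here is a minimal sketch. It is not the code from omatcopy_rt.c: the function name, the use of `double` and `size_t`, and the row-major layout are all assumptions made for the example. It computes B = alpha * transpose(A) one 4x4 tile at a time, so reads from A and writes to B both stay within a small working set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: A is rows x cols, row-major, leading dimension lda.
   B is cols x rows, row-major, leading dimension ldb.
   Computes B = alpha * transpose(A) in 4x4 tiles. */
void omatcopy_rt_sketch(size_t rows, size_t cols, double alpha,
                        const double *a, size_t lda,
                        double *b, size_t ldb) {
    const size_t bs = 4; /* tile edge length */
    assert(lda >= cols && ldb >= rows);

    for (size_t i0 = 0; i0 < rows; i0 += bs) {
        /* Clamp the tile at the matrix edge. */
        size_t i1 = (i0 + bs < rows) ? i0 + bs : rows;
        for (size_t j0 = 0; j0 < cols; j0 += bs) {
            size_t j1 = (j0 + bs < cols) ? j0 + bs : cols;
            /* Copy one tile, swapping row and column indices. */
            for (size_t i = i0; i < i1; i++) {
                for (size_t j = j0; j < j1; j++) {
                    b[j * ldb + i] = alpha * a[i * lda + j];
                }
            }
        }
    }
}
```

The clamped tile bounds let the same loops handle matrices whose dimensions are not multiples of 4; a tuned kernel would typically unroll the full-tile case separately.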

Martin Kroeker [Mon, 22 Feb 2021 18:40:36 +0000 (19:40 +0100)]
Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64

Martin Kroeker [Mon, 22 Feb 2021 18:31:41 +0000 (19:31 +0100)]
Merge pull request #13 from xianyi/develop

rebase

Martin Kroeker [Fri, 19 Feb 2021 08:57:18 +0000 (09:57 +0100)]
Merge pull request #3111 from hawkinsp/forkrace

Fix race in blas_thread_shutdown.

Peter Hawkins [Thu, 18 Feb 2021 18:46:50 +0000 (13:46 -0500)]
Fix race in blas_thread_shutdown.

blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.
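
The pattern behind this fix can be sketched as follows. This is not the OpenBLAS source: the variable names are borrowed from the commit message, while `thread_shutdown_sketch` and `teardown_count` are hypothetical stand-ins for the real teardown work. The point is that the availability flag is both tested and cleared while the lock is held, so only one caller ever performs the shutdown.

```c
#include <assert.h>
#include <pthread.h>

/* Names follow the commit message; everything else is illustrative. */
static pthread_mutex_t server_lock = PTHREAD_MUTEX_INITIALIZER;
static int blas_server_avail = 1; /* 1 while the worker pool is up */
static int teardown_count = 0;    /* how many times teardown actually ran */

/* Returns 1 if this call performed the shutdown, 0 if it was already done. */
int thread_shutdown_sketch(void) {
    int did_work = 0;
    int rc = pthread_mutex_lock(&server_lock);
    assert(rc == 0);

    /* The flag is read only while server_lock is held, so two callers
       cannot both observe "available" and both tear down. */
    if (blas_server_avail) {
        teardown_count++;      /* placeholder for joining and freeing workers */
        blas_server_avail = 0;
        did_work = 1;
    }

    rc = pthread_mutex_unlock(&server_lock);
    assert(rc == 0);
    return did_work;
}
```

Calling the function a second time is harmless: the flag is already cleared, so the teardown body is skipped.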

Martin Kroeker [Thu, 18 Feb 2021 14:45:25 +0000 (15:45 +0100)]
Merge pull request #3110 from martin-frbg/issue3108

Fix get_num_procs() in the USE_TLS branch for non-glibc systems

Martin Kroeker [Thu, 18 Feb 2021 10:14:05 +0000 (11:14 +0100)]
Fix get_num_procs() in the USE_TLS branch for non-glibc systems

Martin Kroeker [Fri, 12 Feb 2021 12:29:53 +0000 (13:29 +0100)]
Merge pull request #3105 from martin-frbg/tigerlake

Recognize Intel Tiger Lake CPUID as SkylakeX

Martin Kroeker [Fri, 12 Feb 2021 12:29:23 +0000 (13:29 +0100)]
Merge pull request #3106 from RajalakshmiSR/ppcbe

Fix build issue on POWER8 with DYNAMIC_ARCH

Rajalakshmi Srinivasaraghavan [Fri, 12 Feb 2021 03:28:03 +0000 (21:28 -0600)]
Fix build issue on POWER8 with DYNAMIC_ARCH

Running make DYNAMIC_ARCH=1 on big-endian POWER8 with gcc 10.2 gives
the following error, caused by the difference in UNROLL_M/N:
'No rule to make target 'dgemm_incopy_POWER10.o', needed by kernel'

Martin Kroeker [Thu, 11 Feb 2021 19:17:11 +0000 (20:17 +0100)]
Recognize Intel Tiger Lake as SkylakeX

Martin Kroeker [Thu, 11 Feb 2021 19:16:27 +0000 (20:16 +0100)]
Recognize Intel Tiger Lake as SkylakeX

Martin Kroeker [Thu, 11 Feb 2021 14:42:47 +0000 (15:42 +0100)]
Merge pull request #3104 from martin-frbg/issue3103

Enable optimized Haswell/AVX2 kernels for sasum/dasum and srot/drot on Ryzen

Martin Kroeker [Thu, 11 Feb 2021 14:42:18 +0000 (15:42 +0100)]
Merge pull request #3101 from jake-arkinstall/issue-3100

Addressed issue #3100 - removing an unnecessary write to the include directory

Martin Kroeker [Thu, 11 Feb 2021 08:26:15 +0000 (09:26 +0100)]
Use Haswell optimizations for Zen as well

Martin Kroeker [Thu, 11 Feb 2021 08:25:36 +0000 (09:25 +0100)]
Use Haswell optimizations for Zen as well

Martin Kroeker [Thu, 11 Feb 2021 08:24:51 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well

Martin Kroeker [Thu, 11 Feb 2021 08:24:16 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well

Martin Kroeker [Thu, 11 Feb 2021 08:23:05 +0000 (09:23 +0100)]
Enable optimized srot/drot kernels from Haswell

Martin Kroeker [Thu, 11 Feb 2021 07:56:46 +0000 (08:56 +0100)]
Merge pull request #3102 from martin-frbg/issue3099

Strip pkgversion info from compiler version string before comparing

Martin Kroeker [Wed, 10 Feb 2021 13:22:59 +0000 (14:22 +0100)]
Strip parenthesized (pkgversion) data from GCC version string to avoid misinterpretation
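
To make the idea concrete, here is a minimal sketch of stripping parenthesized text from a version banner before looking for the version number. The actual change is in the build scripts and is not shown in this log; `strip_parenthesized` and the sample banner in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies `in` to `out` while dropping every
   parenthesis and any text enclosed in parentheses (nesting is tracked).
   The output is always NUL-terminated and truncated to fit outsz.
   Returns the number of characters written, excluding the terminator. */
size_t strip_parenthesized(const char *in, char *out, size_t outsz) {
    size_t n = 0;
    int depth = 0;
    assert(in != NULL && out != NULL && outsz > 0);

    for (; *in != '\0'; in++) {
        if (*in == '(') {
            depth++;            /* entering a parenthesized group */
            continue;
        }
        if (*in == ')') {
            if (depth > 0) {
                depth--;        /* leaving a group */
            }
            continue;           /* an unmatched ')' is dropped as well */
        }
        if (depth == 0 && n + 1 < outsz) {
            out[n++] = *in;     /* keep text outside any group */
        }
    }
    out[n] = '\0';
    return n;
}
```

With an input such as "gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0", only "gcc  9.3.0" remains, so a vendor string inside the parentheses can no longer be mistaken for the compiler version.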

Martin Kroeker [Wed, 10 Feb 2021 13:17:24 +0000 (14:17 +0100)]
Merge pull request #12 from xianyi/develop

rebase

Jake Arkinstall [Wed, 10 Feb 2021 12:11:17 +0000 (12:11 +0000)]
Addressed issue #3100, removing an unnecessary write to the include directory

Martin Kroeker [Tue, 2 Feb 2021 12:36:17 +0000 (13:36 +0100)]
Merge pull request #3094 from xoviat/patch-1

build openmp on appveyor

Martin Kroeker [Tue, 2 Feb 2021 12:33:15 +0000 (13:33 +0100)]
Merge pull request #3096 from martin-frbg/fixclangcmake

Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows

Martin Kroeker [Tue, 2 Feb 2021 09:53:46 +0000 (10:53 +0100)]
fix case in compiler name check

Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>

Martin Kroeker [Mon, 1 Feb 2021 20:02:53 +0000 (21:02 +0100)]
remove spurious lines (probably editor malfunction)