platform/upstream/openblas.git
Martin Kroeker [Tue, 20 Oct 2020 09:27:36 +0000 (11:27 +0200)]
Merge pull request #2925 from martin-frbg/issue2911-2

Add binutils version check as prerequisite for POWER10 in DYNAMIC_ARCH build

Martin Kroeker [Tue, 20 Oct 2020 07:24:47 +0000 (09:24 +0200)]
Merge pull request #2926 from bartoldeman/vzeroupper-clobber-all

x86_64: clobber all xmm registers after vzeroupper

Martin Kroeker [Tue, 20 Oct 2020 06:37:53 +0000 (08:37 +0200)]
Fix missing backquotes

Bart Oldeman [Tue, 20 Oct 2020 02:16:47 +0000 (02:16 +0000)]
x86_64: clobber all xmm registers after vzeroupper

As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, the compiler is now smart enough to find
clobbers inside non-inlined static functions.

In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the upper
half was zeroed by vzeroupper. This caused many tests to fail.

This patch makes sure all xmm (and by extension ymm/zmm) registers
are listed as clobbered so that this cannot happen; most kernels
already did this correctly.
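Below is a minimal sketch of the pattern this commit enforces. It assumes GCC or Clang extended inline asm on x86_64. `toy_sum_kernel` is a hypothetical stand-in for an OpenBLAS assembly kernel, not code from the repository. The arithmetic is plain C and only the closing `vzeroupper` is inline asm, but its clobber list names every vector register that the instruction can disturb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy kernel (not OpenBLAS code): plain C arithmetic followed
   by an inline-asm epilogue that executes vzeroupper, the way an assembly
   kernel would before returning. Assumes GCC/Clang extended asm syntax. */
float toy_sum_kernel(const float *x, size_t n)
{
    assert(x != NULL || n == 0);

    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];

#if defined(__x86_64__) && defined(__GNUC__)
    /* Only execute vzeroupper on CPUs that actually support AVX. */
    if (__builtin_cpu_supports("avx")) {
        /* vzeroupper zeroes bits 128 and above of ymm0-ymm15 (zmm0-zmm15).
           Listing every xmm register as clobbered tells the compiler that
           no value held in those registers survives the asm, so it cannot
           rely on, e.g., the upper half of %ymm2 being preserved. */
        __asm__ volatile("vzeroupper"
                         :
                         :
                         : "xmm0", "xmm1", "xmm2", "xmm3",
                           "xmm4", "xmm5", "xmm6", "xmm7",
                           "xmm8", "xmm9", "xmm10", "xmm11",
                           "xmm12", "xmm13", "xmm14", "xmm15");
    }
#endif
    return acc;
}
```

Called on `float v[4] = {1, 2, 3, 4}`, `toy_sum_kernel(v, 4)` returns 10.0f. The sum itself is incidental; the point is that with the full clobber list the compiler has to spill or recompute anything it would otherwise have kept in a vector register across the asm.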

Martin Kroeker [Mon, 19 Oct 2020 23:09:49 +0000 (01:09 +0200)]
Add POWER10 support flag (unconditionally for now)

Martin Kroeker [Mon, 19 Oct 2020 23:04:20 +0000 (01:04 +0200)]
Add ld/binutils version check for POWER10 support

Martin Kroeker [Mon, 19 Oct 2020 22:55:41 +0000 (00:55 +0200)]
Move HAVE_P10_SUPPORT to the build system

to be able to include a binutils version check

Martin Kroeker [Mon, 19 Oct 2020 22:52:08 +0000 (00:52 +0200)]
Merge pull request #104 from xianyi/develop

rebase

Martin Kroeker [Mon, 19 Oct 2020 21:33:45 +0000 (23:33 +0200)]
Merge pull request #2924 from martin-frbg/issue2920

Put back all symbols accidentally dropped in the reorganization of gensymbol

Martin Kroeker [Mon, 19 Oct 2020 21:33:31 +0000 (23:33 +0200)]
Merge pull request #2916 from martin-frbg/issue2911

Clean up duplicate definitions in POWER8 kernels and fix power10 option passing

Martin Kroeker [Mon, 19 Oct 2020 18:37:52 +0000 (20:37 +0200)]
Add back symbols that got dropped when splitting by type

Martin Kroeker [Mon, 19 Oct 2020 15:43:53 +0000 (17:43 +0200)]
Add POWER10 compiler options to CCOMMON_OPT rather than COMMON_OPT

Martin Kroeker [Mon, 19 Oct 2020 13:56:20 +0000 (15:56 +0200)]
Merge pull request #103 from xianyi/develop

rebase

Martin Kroeker [Mon, 19 Oct 2020 07:12:12 +0000 (09:12 +0200)]
Fix spurious trailing whitespace in comment

Martin Kroeker [Mon, 19 Oct 2020 06:14:27 +0000 (08:14 +0200)]
Merge pull request #2919 from isuruf/export

Fix exporting some lapack and cblas symbols

Martin Kroeker [Mon, 19 Oct 2020 06:11:22 +0000 (08:11 +0200)]
Remove -mmma again (redundant with cpu=power10) and add override statements

Isuru Fernando [Mon, 19 Oct 2020 02:42:32 +0000 (21:42 -0500)]
Fix exporting some lapack and cblas

Martin Kroeker [Sun, 18 Oct 2020 22:09:54 +0000 (00:09 +0200)]
Merge pull request #2915 from bartoldeman/no-empty_sgemm_direct_skylakex

sgemm_direct_skylakex: fix 75eeb26 regression.

Martin Kroeker [Sun, 18 Oct 2020 21:04:56 +0000 (23:04 +0200)]
Merge pull request #2913 from martin-frbg/issue2910

Support cross-compiling for Apple Vortex

Bart Oldeman [Sun, 18 Oct 2020 19:50:38 +0000 (19:50 +0000)]
sgemm_direct_skylakex: fix 75eeb26 regression.

The `#if defined(SKYLAKEX) || defined (COOPERLAKE)` guard from that
commit was placed before #include "common.h", so it was evaluated
before those macros were visible. The compiled function was therefore
empty and returned garbage results for qualifying sgemm calls on those
architectures.

Closes #2914
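The ordering problem can be reproduced in a single file. This sketch assumes nothing about OpenBLAS's real headers. `ARCH_FLAG` is a hypothetical stand-in for `SKYLAKEX`/`COOPERLAKE`, and a plain `#define` stands in for the definitions that arrive with `common.h`. The same `#if defined(...)` test gives different answers depending on whether it appears before or after that point.

```c
#include <assert.h>

/* Guard evaluated BEFORE the architecture macro is visible (the position
   described in the commit message): the test is false, so a kernel body
   guarded this way would be dropped. */
#if defined(ARCH_FLAG)
#define BODY_COMPILED_EARLY 1
#else
#define BODY_COMPILED_EARLY 0
#endif

/* Hypothetical stand-in for #include "common.h", which is where the
   architecture macros become visible in this illustration. */
#define ARCH_FLAG 1

/* The same guard evaluated AFTER the stand-in "include" sees the macro. */
#if defined(ARCH_FLAG)
#define BODY_COMPILED_LATE 1
#else
#define BODY_COMPILED_LATE 0
#endif

/* C11 static_assert (from <assert.h>) pins both outcomes at compile time. */
static_assert(BODY_COMPILED_EARLY == 0, "guard before the include is false");
static_assert(BODY_COMPILED_LATE == 1, "guard after the include is true");

int body_compiled_early(void) { return BODY_COMPILED_EARLY; }
int body_compiled_late(void) { return BODY_COMPILED_LATE; }
```

Both `static_assert` lines hold when the file is compiled as C11 or later without an external `-DARCH_FLAG`. Because the preprocessor works strictly top to bottom, only the guard's position relative to the include changes the result.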

Martin Kroeker [Sun, 18 Oct 2020 17:41:43 +0000 (19:41 +0200)]
Clean up STACKSIZE redefinition

Martin Kroeker [Sun, 18 Oct 2020 17:39:18 +0000 (19:39 +0200)]
Clean up STACKSIZE redefinition

Martin Kroeker [Sun, 18 Oct 2020 17:37:04 +0000 (19:37 +0200)]
Clean up STACKSIZE redefinition

Martin Kroeker [Sun, 18 Oct 2020 17:31:01 +0000 (19:31 +0200)]
Clean up STACKSIZE redefinition

Martin Kroeker [Sun, 18 Oct 2020 17:29:45 +0000 (19:29 +0200)]
Clean up STACKSIZE redefinition

Martin Kroeker [Sun, 18 Oct 2020 17:27:51 +0000 (19:27 +0200)]
Add compiler option -mmma for POWER10

Martin Kroeker [Sun, 18 Oct 2020 17:22:05 +0000 (19:22 +0200)]
Fix naming of L2 cache size item reported for Vortex

Martin Kroeker [Sun, 18 Oct 2020 17:16:08 +0000 (19:16 +0200)]
Merge pull request #2909 from isuruf/patch-1

Need a space when redirecting to file

Martin Kroeker [Sun, 18 Oct 2020 17:10:58 +0000 (19:10 +0200)]
Support cross-compiling for Apple Vortex

Martin Kroeker [Sun, 18 Oct 2020 16:54:54 +0000 (18:54 +0200)]
Support cross-compiling for Apple Vortex

Martin Kroeker [Sun, 18 Oct 2020 16:49:59 +0000 (18:49 +0200)]
Merge pull request #102 from xianyi/develop

rebase

Isuru Fernando [Sun, 18 Oct 2020 14:40:31 +0000 (09:40 -0500)]
Need a space when redirecting to file

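The following two commands have completely different meanings:
perl ./gensymbol objcopy x86_64 _ 0 0  0 0 0 0 "" "64_" 1 0 1 1 1 1 > objcopy.def
perl ./gensymbol objcopy x86_64 _ 0 0  0 0 0 0 "" "64_" 1 0 1 1 1 1> objcopy.def
In the second, 1> is parsed as a redirection of file descriptor 1
(stdout), so the last 1 is never passed to gensymbol as an argument.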

Martin Kroeker [Sat, 17 Oct 2020 20:40:47 +0000 (22:40 +0200)]
Update version string to 0.3.11.dev

Martin Kroeker [Sat, 17 Oct 2020 20:40:06 +0000 (22:40 +0200)]
Update version string to 0.3.11.dev

Martin Kroeker [Sat, 17 Oct 2020 20:38:58 +0000 (22:38 +0200)]
Merge pull request #2908 from xianyi/release-0.3.0

Synchronise tag with release 0.3.11

Martin Kroeker [Sat, 17 Oct 2020 20:14:12 +0000 (22:14 +0200)]
Merge pull request #2907 from xianyi/develop

Update from develop for 0.3.11

Martin Kroeker [Sat, 17 Oct 2020 20:11:34 +0000 (22:11 +0200)]
Update version number to 0.3.11

Martin Kroeker [Sat, 17 Oct 2020 20:10:50 +0000 (22:10 +0200)]
Update version for 0.3.11 release

Martin Kroeker [Sat, 17 Oct 2020 20:07:14 +0000 (22:07 +0200)]
Merge pull request #2906 from martin-frbg/changelog-0311

Update Changelog.txt with the 0.3.11 changes

Martin Kroeker [Sat, 17 Oct 2020 20:05:36 +0000 (22:05 +0200)]
Update Changelog.txt with the 0.3.11 changes

Martin Kroeker [Sat, 17 Oct 2020 07:45:22 +0000 (09:45 +0200)]
Merge pull request #2905 from martin-frbg/aocc-clang

Add -mavx for clang & aocc

Martin Kroeker [Fri, 16 Oct 2020 18:52:15 +0000 (20:52 +0200)]
Add AVX flags for clang/aocc as well

Martin Kroeker [Fri, 16 Oct 2020 18:48:58 +0000 (20:48 +0200)]
Merge pull request #101 from xianyi/develop

rebase

Martin Kroeker [Fri, 16 Oct 2020 14:17:36 +0000 (16:17 +0200)]
Merge pull request #2900 from martin-frbg/fixcmake_sse

Add compiler options for SSE to the cmake support files

Martin Kroeker [Fri, 16 Oct 2020 08:47:06 +0000 (10:47 +0200)]
Add compiler options for sse/sse2/ssse3/sse4.1

Martin Kroeker [Fri, 16 Oct 2020 08:41:53 +0000 (10:41 +0200)]
Add sse options for use of intrinsics with older compilers

Martin Kroeker [Fri, 16 Oct 2020 07:55:48 +0000 (09:55 +0200)]
fix core list for sse/sse2

Martin Kroeker [Fri, 16 Oct 2020 05:26:39 +0000 (07:26 +0200)]
Merge pull request #2898 from martin-frbg/morefixes

More pre-release fixes

Martin Kroeker [Thu, 15 Oct 2020 20:10:32 +0000 (22:10 +0200)]
add sse2

Martin Kroeker [Thu, 15 Oct 2020 18:16:15 +0000 (20:16 +0200)]
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels

Martin Kroeker [Thu, 15 Oct 2020 17:08:12 +0000 (19:08 +0200)]
Silence a redefinition warning

Martin Kroeker [Thu, 15 Oct 2020 17:06:45 +0000 (19:06 +0200)]
Add -msse where supported, apparently required for older gcc

Martin Kroeker [Thu, 15 Oct 2020 17:05:37 +0000 (19:05 +0200)]
Use ifdef instead of if

Martin Kroeker [Thu, 15 Oct 2020 16:54:20 +0000 (18:54 +0200)]
Merge pull request #100 from xianyi/develop

rebase

Martin Kroeker [Thu, 15 Oct 2020 09:12:35 +0000 (11:12 +0200)]
Merge pull request #2896 from martin-frbg/intrin-double

Add compiler flag for SSE4 where available

Martin Kroeker [Thu, 15 Oct 2020 06:38:24 +0000 (08:38 +0200)]
Merge pull request #2897 from Qiyu8/usimd-double

Add double precision universal intrinsics for X86/ARM

Martin Kroeker [Thu, 15 Oct 2020 06:37:02 +0000 (08:37 +0200)]
Revert "add double precision SSE"

Qiyu8 [Thu, 15 Oct 2020 03:08:10 +0000 (11:08 +0800)]
adapt arm platform

Qiyu8 [Thu, 15 Oct 2020 02:29:42 +0000 (10:29 +0800)]
Add double precision universal intrinsics for X86/ARM

Martin Kroeker [Wed, 14 Oct 2020 18:34:33 +0000 (20:34 +0200)]
add sse4.1 for DYNAMIC_ARCH kernels

Martin Kroeker [Wed, 14 Oct 2020 17:18:07 +0000 (19:18 +0200)]
Add -msse4.1 when SSE4.1 is supported

Martin Kroeker [Wed, 14 Oct 2020 16:10:45 +0000 (18:10 +0200)]
Add double precision operations

Martin Kroeker [Wed, 14 Oct 2020 16:09:20 +0000 (18:09 +0200)]
Merge pull request #99 from xianyi/develop

rebase

Martin Kroeker [Wed, 14 Oct 2020 07:02:03 +0000 (09:02 +0200)]
Merge pull request #2890 from martin-frbg/s-d-sum

Revert special handling of Windows xNRM2 and enable C+intrinsics kern…

Martin Kroeker [Wed, 14 Oct 2020 07:01:16 +0000 (09:01 +0200)]
Merge pull request #2895 from martin-frbg/sb-tests

Fix remaining build errors related to bfloat16 and cmake

Martin Kroeker [Wed, 14 Oct 2020 06:12:08 +0000 (08:12 +0200)]
Merge pull request #2894 from RajalakshmiSR/bf16_packing

POWER10: Change the packing format for bfloat16

Martin Kroeker [Tue, 13 Oct 2020 23:08:50 +0000 (01:08 +0200)]
Replace Makefile with simplified version again

Martin Kroeker [Tue, 13 Oct 2020 23:01:58 +0000 (01:01 +0200)]
Add express -mavx and -msse options (and fix a stray = for cooperlake)

Martin Kroeker [Tue, 13 Oct 2020 21:21:38 +0000 (23:21 +0200)]
Add the BFLOAT16 functions to cmake builds

Rajalakshmi Srinivasaraghavan [Tue, 13 Oct 2020 21:05:10 +0000 (16:05 -0500)]
POWER10: Change the packing format for bfloat16

As the new MMA instructions need bfloat16 inputs in 4x2 order,
change the format in the copy/packing code. This avoids permute
instructions in the inner loop of the gemm kernel.
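To make "4x2 order" concrete, here is a sketch of a packing loop. It assumes that each 8-element bfloat16 vector fed to the MMA instructions holds a tile of A that is 4 rows by 2 columns, stored row by row. `pack_bf16_4x2` is hypothetical and much simpler than the actual OpenBLAS copy routines. It also assumes a column-major source and dimensions that are exact multiples of the tile size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bfloat16 values are handled as raw 16-bit patterns; no arithmetic here. */
typedef uint16_t bf16_t;

/* Hypothetical packing routine, not OpenBLAS's actual copy kernel.
   Source: column-major m x k matrix A with leading dimension lda.
   Output: for each panel of 4 rows and each pair of columns (c, c+1),
   the 8 values A[r][c], A[r][c+1], A[r+1][c], A[r+1][c+1], ...,
   A[r+3][c+1], i.e. one 4x2 tile stored row by row. */
void pack_bf16_4x2(size_t m, size_t k, const bf16_t *a, size_t lda,
                   bf16_t *packed)
{
    /* Keep the sketch short: no edge handling for partial tiles. */
    assert(m % 4 == 0 && k % 2 == 0 && lda >= m);

    size_t out = 0;
    for (size_t r = 0; r < m; r += 4) {            /* panel of 4 rows */
        for (size_t c = 0; c < k; c += 2) {        /* pair of columns */
            for (size_t i = r; i < r + 4; i++) {   /* row within tile */
                packed[out++] = a[i + c * lda];        /* A[i][c]   */
                packed[out++] = a[i + (c + 1) * lda];  /* A[i][c+1] */
            }
        }
    }
}
```

For a 4x2 column-major input whose columns are {0, 1, 2, 3} and {10, 11, 12, 13}, the packed buffer is {0, 10, 1, 11, 2, 12, 3, 13}. Each row's two consecutive k values end up adjacent, so under the stated assumption no permute is needed after the vector load.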

Martin Kroeker [Tue, 13 Oct 2020 18:07:19 +0000 (20:07 +0200)]
Rename "HALF" type to "BFLOAT16"

Martin Kroeker [Tue, 13 Oct 2020 17:56:09 +0000 (19:56 +0200)]
Cleanup

Martin Kroeker [Tue, 13 Oct 2020 17:55:14 +0000 (19:55 +0200)]
sh prefix renamed to sb

Martin Kroeker [Tue, 13 Oct 2020 16:50:30 +0000 (18:50 +0200)]
Merge pull request #98 from xianyi/develop

rebase

Martin Kroeker [Tue, 13 Oct 2020 16:48:37 +0000 (18:48 +0200)]
Merge pull request #2892 from RajalakshmiSR/bf16_make

Fix build issues with bfloat16

Rajalakshmi Srinivasaraghavan [Tue, 13 Oct 2020 16:00:22 +0000 (11:00 -0500)]
Fix build issues with bfloat16

This patch fixes compilation errors with BUILD_BFLOAT16 caused by
the recent renaming from SH to SB.

Martin Kroeker [Tue, 13 Oct 2020 13:02:17 +0000 (15:02 +0200)]
Fix typo

Martin Kroeker [Tue, 13 Oct 2020 12:41:25 +0000 (14:41 +0200)]
Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well

Martin Kroeker [Tue, 13 Oct 2020 11:46:17 +0000 (13:46 +0200)]
Merge pull request #2891 from martin-frbg/fix-2886

Fix several bugs and omissions from the BFLOAT16 rename

Martin Kroeker [Tue, 13 Oct 2020 09:57:04 +0000 (11:57 +0200)]
Add -mssse3 if supported by the hardware

Martin Kroeker [Tue, 13 Oct 2020 09:55:41 +0000 (11:55 +0200)]
Add -mssse3

Martin Kroeker [Tue, 13 Oct 2020 09:42:39 +0000 (11:42 +0200)]
Add Haswell and Zen to temporary sse3 whitelist

Martin Kroeker [Tue, 13 Oct 2020 08:32:19 +0000 (10:32 +0200)]
whitelist SANDYBRIDGE for SSE3

Martin Kroeker [Tue, 13 Oct 2020 08:14:08 +0000 (10:14 +0200)]
Cleanup

Martin Kroeker [Tue, 13 Oct 2020 07:17:15 +0000 (09:17 +0200)]
Fix typos in currently unused sections

Martin Kroeker [Tue, 13 Oct 2020 07:11:36 +0000 (09:11 +0200)]
Fix bfloat16 conditional

Martin Kroeker [Tue, 13 Oct 2020 07:07:50 +0000 (09:07 +0200)]
Add a POWER9 build with BFLOAT16 enabled

Martin Kroeker [Tue, 13 Oct 2020 07:05:04 +0000 (09:05 +0200)]
Fix some overlooked "SHBLAS" entries

Martin Kroeker [Tue, 13 Oct 2020 07:01:49 +0000 (09:01 +0200)]
Merge pull request #97 from xianyi/develop

rebase

Martin Kroeker [Mon, 12 Oct 2020 22:14:29 +0000 (00:14 +0200)]
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM

Martin Kroeker [Mon, 12 Oct 2020 22:04:35 +0000 (00:04 +0200)]
Merge pull request #2886 from martin-frbg/issue_2767

Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix

Martin Kroeker [Mon, 12 Oct 2020 21:50:41 +0000 (23:50 +0200)]
Merge pull request #2881 from mattip/fninit

add fninit to reset fpu registers before assembler routines

Martin Kroeker [Mon, 12 Oct 2020 21:22:08 +0000 (23:22 +0200)]
Merge pull request #2888 from Qiyu8/usimd-sum

Optimize the performance of sum by using universal intrinsics

Matti Picus [Mon, 12 Oct 2020 15:15:01 +0000 (18:15 +0300)]
use emms instead, add WIN guards

Martin Kroeker [Mon, 12 Oct 2020 12:44:33 +0000 (14:44 +0200)]
Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme

Qiyu8 [Mon, 12 Oct 2020 11:48:53 +0000 (19:48 +0800)]
Optimize the performance of sum by using universal intrinsics

Martin Kroeker [Sun, 11 Oct 2020 22:42:05 +0000 (00:42 +0200)]
Restore -msse3

Martin Kroeker [Sun, 11 Oct 2020 22:27:11 +0000 (00:27 +0200)]
common_sh.h renamed to common_sb.h

Martin Kroeker [Sun, 11 Oct 2020 22:11:31 +0000 (00:11 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"

Martin Kroeker [Sun, 11 Oct 2020 22:08:29 +0000 (00:08 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"