Martin Kroeker [Tue, 20 Oct 2020 09:27:36 +0000 (11:27 +0200)]
Merge pull request #2925 from martin-frbg/issue2911-2
Add binutils version check as prerequisite for POWER10 in DYNAMIC_ARCH build
Martin Kroeker [Tue, 20 Oct 2020 07:24:47 +0000 (09:24 +0200)]
Merge pull request #2926 from bartoldeman/vzeroupper-clobber-all
x86_64: clobber all xmm registers after vzeroupper
Martin Kroeker [Tue, 20 Oct 2020 06:37:53 +0000 (08:37 +0200)]
Fix missing backquotes
Bart Oldeman [Tue, 20 Oct 2020 02:16:47 +0000 (02:16 +0000)]
x86_64: clobber all xmm registers after vzeroupper
As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, the compiler is now smart enough to find clobbers
inside non-inlined static functions.
In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.
This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered so that this cannot happen, as most kernels
in fact already did correctly.
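The clobber-list idea can be sketched as follows. This is a minimal illustration, not the actual OpenBLAS kernel: the function name sum_kernel and its body are hypothetical, and the asm statement is only compiled when the build targets x86_64 with AVX enabled (for example via -mavx or -march=native on an AVX-capable machine).

```c
#include <stddef.h>

/* Hypothetical kernel (not taken from OpenBLAS): sums n floats and, when
 * built for x86_64 with AVX enabled, executes vzeroupper before returning. */
float sum_kernel(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];

#if defined(__x86_64__) && defined(__AVX__)
    /* vzeroupper zeroes the bits above 128 of ymm0 through ymm15 (and of
     * the corresponding zmm registers). Naming every one of xmm0..xmm15 as
     * clobbered tells the compiler not to keep a live vector value in any
     * of them across this statement. */
    __asm__ __volatile__(
        "vzeroupper"
        : /* no outputs */
        : /* no inputs */
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15");
#endif
    return acc;
}
```

If a register such as xmm2 were left out of the list, a caller that the compiler had analysed together with this function could legitimately assume the full %ymm2 survives the call, which is the failure mode described above.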
Martin Kroeker [Mon, 19 Oct 2020 23:09:49 +0000 (01:09 +0200)]
Add POWER10 support flag (unconditionally for now)
Martin Kroeker [Mon, 19 Oct 2020 23:04:20 +0000 (01:04 +0200)]
Add ld/binutils version check for POWER10 support
Martin Kroeker [Mon, 19 Oct 2020 22:55:41 +0000 (00:55 +0200)]
Move HAVE_P10_SUPPORT to the build system
to be able to include a binutils version check
Martin Kroeker [Mon, 19 Oct 2020 22:52:08 +0000 (00:52 +0200)]
Merge pull request #104 from xianyi/develop
rebase
Martin Kroeker [Mon, 19 Oct 2020 21:33:45 +0000 (23:33 +0200)]
Merge pull request #2924 from martin-frbg/issue2920
Put back all symbols accidentally dropped in the reorganization of gensymbol
Martin Kroeker [Mon, 19 Oct 2020 21:33:31 +0000 (23:33 +0200)]
Merge pull request #2916 from martin-frbg/issue2911
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
Martin Kroeker [Mon, 19 Oct 2020 18:37:52 +0000 (20:37 +0200)]
Add back symbols that got dropped when splitting by type
Martin Kroeker [Mon, 19 Oct 2020 15:43:53 +0000 (17:43 +0200)]
Add POWER10 compiler options to CCOMMON_OPT rather than COMMON_OPT
Martin Kroeker [Mon, 19 Oct 2020 13:56:20 +0000 (15:56 +0200)]
Merge pull request #103 from xianyi/develop
rebase
Martin Kroeker [Mon, 19 Oct 2020 07:12:12 +0000 (09:12 +0200)]
Fix spurious trailing whitespace in comment
Martin Kroeker [Mon, 19 Oct 2020 06:14:27 +0000 (08:14 +0200)]
Merge pull request #2919 from isuruf/export
Fix exporting some lapack and cblas symbols
Martin Kroeker [Mon, 19 Oct 2020 06:11:22 +0000 (08:11 +0200)]
Remove -mmma again (redundant with cpu=power10) and add override statements
Isuru Fernando [Mon, 19 Oct 2020 02:42:32 +0000 (21:42 -0500)]
Fix exporting some lapack and cblas
Martin Kroeker [Sun, 18 Oct 2020 22:09:54 +0000 (00:09 +0200)]
Merge pull request #2915 from bartoldeman/no-empty_sgemm_direct_skylakex
sgemm_direct_skylakex: fix 75eeb26 regression.
Martin Kroeker [Sun, 18 Oct 2020 21:04:56 +0000 (23:04 +0200)]
Merge pull request #2913 from martin-frbg/issue2910
Support cross-compiling for Apple Vortex
Bart Oldeman [Sun, 18 Oct 2020 19:50:38 +0000 (19:50 +0000)]
sgemm_direct_skylakex: fix 75eeb26 regression.
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit came before #include "common.h", which caused the
compiled function to be empty and to return garbage results for
qualifying sgemm calls on those architectures.
Closes #2914
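The ordering problem can be reproduced in isolation. The sketch below is an assumption-laden stand-in, not the OpenBLAS source: a plain #define plays the role of #include "common.h" (assumed here to be what makes SKYLAKEX visible), and the helper names guard_before_header and guard_after_header are hypothetical.

```c
/* Guard evaluated BEFORE the simulated header: SKYLAKEX is not defined
 * yet, so the condition is false and the guarded body is compiled out. */
#if defined(SKYLAKEX) || defined(COOPERLAKE)
#define BODY_COMPILED_EARLY 1
#else
#define BODY_COMPILED_EARLY 0
#endif

/* Stand-in for #include "common.h" (assumption: the header is what
 * defines the target macro). */
#define SKYLAKEX 1

/* The same guard evaluated AFTER the simulated header: now it is true. */
#if defined(SKYLAKEX) || defined(COOPERLAKE)
#define BODY_COMPILED_LATE 1
#else
#define BODY_COMPILED_LATE 0
#endif

/* Report which of the two guards would have kept the function body. */
int guard_before_header(void) { return BODY_COMPILED_EARLY; }
int guard_after_header(void)  { return BODY_COMPILED_LATE; }
```

The preprocessor evaluates each #if at the point where it appears, so a guard placed above the include sees none of the macros the header would define.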
Martin Kroeker [Sun, 18 Oct 2020 17:41:43 +0000 (19:41 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:39:18 +0000 (19:39 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:37:04 +0000 (19:37 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:31:01 +0000 (19:31 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:29:45 +0000 (19:29 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:27:51 +0000 (19:27 +0200)]
Add compiler option -mmma for POWER10
Martin Kroeker [Sun, 18 Oct 2020 17:22:05 +0000 (19:22 +0200)]
Fix naming of L2 cache size item reported for Vortex
Martin Kroeker [Sun, 18 Oct 2020 17:16:08 +0000 (19:16 +0200)]
Merge pull request #2909 from isuruf/patch-1
Need a space when redirecting to file
Martin Kroeker [Sun, 18 Oct 2020 17:10:58 +0000 (19:10 +0200)]
Support cross-compiling for Apple Vortex
Martin Kroeker [Sun, 18 Oct 2020 16:54:54 +0000 (18:54 +0200)]
Support cross-compiling for Apple Vortex
Martin Kroeker [Sun, 18 Oct 2020 16:49:59 +0000 (18:49 +0200)]
Merge pull request #102 from xianyi/develop
rebase
Isuru Fernando [Sun, 18 Oct 2020 14:40:31 +0000 (09:40 -0500)]
Need a space when redirecting to file
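The following two commands have completely different meanings:
perl ./gensymbol objcopy x86_64 _ 0 0 0 0 0 0 "" "64_" 1 0 1 1 1 1 > objcopy.def
perl ./gensymbol objcopy x86_64 _ 0 0 0 0 0 0 "" "64_" 1 0 1 1 1 1> objcopy.def
In the second, the shell reads `1>` as a redirection of file descriptor 1,
so the final argument `1` is never passed to gensymbol.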
Martin Kroeker [Sat, 17 Oct 2020 20:40:47 +0000 (22:40 +0200)]
Update version string to 0.3.11.dev
Martin Kroeker [Sat, 17 Oct 2020 20:40:06 +0000 (22:40 +0200)]
Update version string to 0.3.11.dev
Martin Kroeker [Sat, 17 Oct 2020 20:38:58 +0000 (22:38 +0200)]
Merge pull request #2908 from xianyi/release-0.3.0
Synchronise tag with release 0.3.11
Martin Kroeker [Sat, 17 Oct 2020 20:14:12 +0000 (22:14 +0200)]
Merge pull request #2907 from xianyi/develop
Update from develop for 0.3.11
Martin Kroeker [Sat, 17 Oct 2020 20:11:34 +0000 (22:11 +0200)]
Update version number to 0.3.11
Martin Kroeker [Sat, 17 Oct 2020 20:10:50 +0000 (22:10 +0200)]
Update version for 0.3.11 release
Martin Kroeker [Sat, 17 Oct 2020 20:07:14 +0000 (22:07 +0200)]
Merge pull request #2906 from martin-frbg/changelog-0311
Update Changelog.txt with the 0.3.11 changes
Martin Kroeker [Sat, 17 Oct 2020 20:05:36 +0000 (22:05 +0200)]
Update Changelog.txt with the 0.3.11 changes
Martin Kroeker [Sat, 17 Oct 2020 07:45:22 +0000 (09:45 +0200)]
Merge pull request #2905 from martin-frbg/aocc-clang
Add -mavx for clang & aocc
Martin Kroeker [Fri, 16 Oct 2020 18:52:15 +0000 (20:52 +0200)]
Add AVX flags for clang/aocc as well
Martin Kroeker [Fri, 16 Oct 2020 18:48:58 +0000 (20:48 +0200)]
Merge pull request #101 from xianyi/develop
rebase
Martin Kroeker [Fri, 16 Oct 2020 14:17:36 +0000 (16:17 +0200)]
Merge pull request #2900 from martin-frbg/fixcmake_sse
Add compiler options for SSE to the cmake support files
Martin Kroeker [Fri, 16 Oct 2020 08:47:06 +0000 (10:47 +0200)]
Add compiler options for sse/sse2/ssse3/sse4.1
Martin Kroeker [Fri, 16 Oct 2020 08:41:53 +0000 (10:41 +0200)]
Add sse options for use of intrinsics with older compilers
Martin Kroeker [Fri, 16 Oct 2020 07:55:48 +0000 (09:55 +0200)]
fix core list for sse/sse2
Martin Kroeker [Fri, 16 Oct 2020 05:26:39 +0000 (07:26 +0200)]
Merge pull request #2898 from martin-frbg/morefixes
More pre-release fixes
Martin Kroeker [Thu, 15 Oct 2020 20:10:32 +0000 (22:10 +0200)]
add sse2
Martin Kroeker [Thu, 15 Oct 2020 18:16:15 +0000 (20:16 +0200)]
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels
Martin Kroeker [Thu, 15 Oct 2020 17:08:12 +0000 (19:08 +0200)]
Silence a redefinition warning
Martin Kroeker [Thu, 15 Oct 2020 17:06:45 +0000 (19:06 +0200)]
Add -msse where supported, apparently required for older gcc
Martin Kroeker [Thu, 15 Oct 2020 17:05:37 +0000 (19:05 +0200)]
Use ifdef instead of if
Martin Kroeker [Thu, 15 Oct 2020 16:54:20 +0000 (18:54 +0200)]
Merge pull request #100 from xianyi/develop
rebase
Martin Kroeker [Thu, 15 Oct 2020 09:12:35 +0000 (11:12 +0200)]
Merge pull request #2896 from martin-frbg/intrin-double
Add compiler flag for SSE4 where available
Martin Kroeker [Thu, 15 Oct 2020 06:38:24 +0000 (08:38 +0200)]
Merge pull request #2897 from Qiyu8/usimd-double
Add double precision universal intrinsics for X86/ARM
Martin Kroeker [Thu, 15 Oct 2020 06:37:02 +0000 (08:37 +0200)]
Revert "add double precision SSE"
Qiyu8 [Thu, 15 Oct 2020 03:08:10 +0000 (11:08 +0800)]
adapt arm platform
Qiyu8 [Thu, 15 Oct 2020 02:29:42 +0000 (10:29 +0800)]
Add double precision universal intrinsics for X86/ARM
Martin Kroeker [Wed, 14 Oct 2020 18:34:33 +0000 (20:34 +0200)]
add sse4.1 for DYNAMIC_ARCH kernels
Martin Kroeker [Wed, 14 Oct 2020 17:18:07 +0000 (19:18 +0200)]
Add -msse4.1 when SSE4.1 is supported
Martin Kroeker [Wed, 14 Oct 2020 16:10:45 +0000 (18:10 +0200)]
Add double precision operations
Martin Kroeker [Wed, 14 Oct 2020 16:09:20 +0000 (18:09 +0200)]
Merge pull request #99 from xianyi/develop
rebase
Martin Kroeker [Wed, 14 Oct 2020 07:02:03 +0000 (09:02 +0200)]
Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
Martin Kroeker [Wed, 14 Oct 2020 07:01:16 +0000 (09:01 +0200)]
Merge pull request #2895 from martin-frbg/sb-tests
Fix remaining build errors related to bfloat16 and cmake
Martin Kroeker [Wed, 14 Oct 2020 06:12:08 +0000 (08:12 +0200)]
Merge pull request #2894 from RajalakshmiSR/bf16_packing
POWER10: Change the packing format for bfloat16
Martin Kroeker [Tue, 13 Oct 2020 23:08:50 +0000 (01:08 +0200)]
Replace Makefile with simplified version again
Martin Kroeker [Tue, 13 Oct 2020 23:01:58 +0000 (01:01 +0200)]
Add express -mavx and -msse options (and fix a stray = for cooperlake)
Martin Kroeker [Tue, 13 Oct 2020 21:21:38 +0000 (23:21 +0200)]
Add the BFLOAT16 functions to cmake builds
Rajalakshmi Srinivasaraghavan [Tue, 13 Oct 2020 21:05:10 +0000 (16:05 -0500)]
POWER10: Change the packing format for bfloat16
As the new MMA instructions need their bfloat16 inputs in 4x2 order,
change the format in the copy/packing code accordingly. This avoids
permute instructions in the gemm kernel inner loop.
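One possible reading of "4x2 order" is sketched below; it is an illustration under stated assumptions, not the packing code from this patch. It assumes a row-major source, m a multiple of 4 and k a multiple of 2, and treats bfloat16 values as raw uint16_t. The function name pack_4x2 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packing routine (assumed layout, not the OpenBLAS one):
 * src is an m x k row-major matrix of raw bfloat16 bit patterns.
 * For every block of 4 rows and every pair of k values, the output holds
 * row0[k0,k1], row1[k0,k1], row2[k0,k1], row3[k0,k1] contiguously, so one
 * group of 8 values can be loaded without any permute.
 * Requires m % 4 == 0 and k % 2 == 0; dst must hold m * k elements. */
void pack_4x2(const uint16_t *src, size_t m, size_t k, uint16_t *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < m; i += 4) {           /* block of 4 rows   */
        for (size_t p = 0; p < k; p += 2) {       /* pair of k values  */
            for (size_t r = 0; r < 4; r++) {      /* each row in block */
                dst[out++] = src[(i + r) * k + p];
                dst[out++] = src[(i + r) * k + p + 1];
            }
        }
    }
}
```

Doing the reordering once in the packing step moves the cost out of the inner loop, which is the benefit the commit message describes.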
Martin Kroeker [Tue, 13 Oct 2020 18:07:19 +0000 (20:07 +0200)]
Rename "HALF" type to "BFLOAT16"
Martin Kroeker [Tue, 13 Oct 2020 17:56:09 +0000 (19:56 +0200)]
Cleanup
Martin Kroeker [Tue, 13 Oct 2020 17:55:14 +0000 (19:55 +0200)]
sh prefix renamed to sb
Martin Kroeker [Tue, 13 Oct 2020 16:50:30 +0000 (18:50 +0200)]
Merge pull request #98 from xianyi/develop
rebase
Martin Kroeker [Tue, 13 Oct 2020 16:48:37 +0000 (18:48 +0200)]
Merge pull request #2892 from RajalakshmiSR/bf16_make
Fix build issues with bfloat16
Rajalakshmi Srinivasaraghavan [Tue, 13 Oct 2020 16:00:22 +0000 (11:00 -0500)]
Fix build issues with bfloat16
This patch fixes compilation errors caused by the recent renaming from
SH to SB when building with BUILD_BFLOAT16.
Martin Kroeker [Tue, 13 Oct 2020 13:02:17 +0000 (15:02 +0200)]
Fix typo
Martin Kroeker [Tue, 13 Oct 2020 12:41:25 +0000 (14:41 +0200)]
Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well
Martin Kroeker [Tue, 13 Oct 2020 11:46:17 +0000 (13:46 +0200)]
Merge pull request #2891 from martin-frbg/fix-2886
Fix several bugs and omissions from the BFLOAT16 rename
Martin Kroeker [Tue, 13 Oct 2020 09:57:04 +0000 (11:57 +0200)]
Add -mssse3 if supported by the hardware
Martin Kroeker [Tue, 13 Oct 2020 09:55:41 +0000 (11:55 +0200)]
Add -mssse3
Martin Kroeker [Tue, 13 Oct 2020 09:42:39 +0000 (11:42 +0200)]
Add Haswell and Zen to temporary sse3 whitelist
Martin Kroeker [Tue, 13 Oct 2020 08:32:19 +0000 (10:32 +0200)]
whitelist SANDYBRIDGE for SSE3
Martin Kroeker [Tue, 13 Oct 2020 08:14:08 +0000 (10:14 +0200)]
Cleanup
Martin Kroeker [Tue, 13 Oct 2020 07:17:15 +0000 (09:17 +0200)]
Fix typos in currently unused sections
Martin Kroeker [Tue, 13 Oct 2020 07:11:36 +0000 (09:11 +0200)]
Fix bfloat16 conditional
Martin Kroeker [Tue, 13 Oct 2020 07:07:50 +0000 (09:07 +0200)]
Add a POWER9 build with BFLOAT16 enabled
Martin Kroeker [Tue, 13 Oct 2020 07:05:04 +0000 (09:05 +0200)]
Fix some overlooked "SHBLAS" entries
Martin Kroeker [Tue, 13 Oct 2020 07:01:49 +0000 (09:01 +0200)]
Merge pull request #97 from xianyi/develop
rebase
Martin Kroeker [Mon, 12 Oct 2020 22:14:29 +0000 (00:14 +0200)]
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM
Martin Kroeker [Mon, 12 Oct 2020 22:04:35 +0000 (00:04 +0200)]
Merge pull request #2886 from martin-frbg/issue_2767
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
Martin Kroeker [Mon, 12 Oct 2020 21:50:41 +0000 (23:50 +0200)]
Merge pull request #2881 from mattip/fninit
add fninit to reset fpu registers before assembler routines
Martin Kroeker [Mon, 12 Oct 2020 21:22:08 +0000 (23:22 +0200)]
Merge pull request #2888 from Qiyu8/usimd-sum
Optimize the performance of sum by using universal intrinsics
Matti Picus [Mon, 12 Oct 2020 15:15:01 +0000 (18:15 +0300)]
use emms instead, add WIN guards
Martin Kroeker [Mon, 12 Oct 2020 12:44:33 +0000 (14:44 +0200)]
Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme
Qiyu8 [Mon, 12 Oct 2020 11:48:53 +0000 (19:48 +0800)]
Optimize the performance of sum by using universal intrinsics
Martin Kroeker [Sun, 11 Oct 2020 22:42:05 +0000 (00:42 +0200)]
Restore -msse3
Martin Kroeker [Sun, 11 Oct 2020 22:27:11 +0000 (00:27 +0200)]
common_sh.h renamed to common_sb.h
Martin Kroeker [Sun, 11 Oct 2020 22:11:31 +0000 (00:11 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 22:08:29 +0000 (00:08 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"