Martin Kroeker [Sat, 24 Oct 2020 11:10:51 +0000 (13:10 +0200)]
Merge pull request #2944 from xianyi/release-0.3.0
Merge back 0.3.12 tag (and Changelog typo fixes) from release
Martin Kroeker [Sat, 24 Oct 2020 11:03:28 +0000 (13:03 +0200)]
Fix typos
Martin Kroeker [Sat, 24 Oct 2020 10:52:59 +0000 (12:52 +0200)]
Merge pull request #2943 from xianyi/develop
Merge from develop for 0.3.12 release
Martin Kroeker [Sat, 24 Oct 2020 10:50:04 +0000 (12:50 +0200)]
Update Changelog with 0.3.12 changes
Martin Kroeker [Sat, 24 Oct 2020 10:15:33 +0000 (12:15 +0200)]
Update version to 0.3.12 for release
Martin Kroeker [Sat, 24 Oct 2020 10:14:45 +0000 (12:14 +0200)]
Update version to 0.3.12 for release
Martin Kroeker [Sat, 24 Oct 2020 07:26:50 +0000 (09:26 +0200)]
Merge pull request #2942 from martin-frbg/makebuildtypes
Comment out BUILD_SINGLE etc. in Makefile.rule and add a short explanation
Martin Kroeker [Fri, 23 Oct 2020 21:32:06 +0000 (23:32 +0200)]
Comment out BUILD_SINGLE etc. and add a short explanation
Martin Kroeker [Fri, 23 Oct 2020 18:47:35 +0000 (20:47 +0200)]
Merge pull request #2941 from martin-frbg/exportsfix
Fix grouping of sladiv1/dladiv1/ilaenv2stage in gensymbol
Martin Kroeker [Fri, 23 Oct 2020 13:53:40 +0000 (15:53 +0200)]
Fix wrong grouping of dcombssq
Martin Kroeker [Fri, 23 Oct 2020 13:31:25 +0000 (15:31 +0200)]
fix missing split of sladiv1/dladiv/ilaenv2stage by build type
Martin Kroeker [Fri, 23 Oct 2020 13:29:48 +0000 (15:29 +0200)]
Merge pull request #108 from xianyi/develop
rebase
Martin Kroeker [Fri, 23 Oct 2020 05:15:32 +0000 (07:15 +0200)]
Merge pull request #2937 from martin-frbg/pwr-buffersz
Increase and unify BUFFERSIZE on POWER; fix gcc inline warning
Martin Kroeker [Thu, 22 Oct 2020 22:19:49 +0000 (00:19 +0200)]
Merge pull request #2938 from martin-frbg/2934-3
Fix twisted spelling that broke the gfortran version test again
Martin Kroeker [Thu, 22 Oct 2020 22:18:29 +0000 (00:18 +0200)]
Fix twisted spelling that broke the gfortran version test again
Martin Kroeker [Thu, 22 Oct 2020 22:12:06 +0000 (00:12 +0200)]
Increase BUFFERSIZE further
Martin Kroeker [Thu, 22 Oct 2020 20:14:26 +0000 (22:14 +0200)]
label always_inline function as inline to silence a gcc warning
Martin Kroeker [Thu, 22 Oct 2020 20:08:46 +0000 (22:08 +0200)]
Merge pull request #2936 from martin-frbg/issue2934-2
Fix compiler version check for -mavx2 support (DYNAMIC_ARCH case)
Martin Kroeker [Thu, 22 Oct 2020 17:25:58 +0000 (19:25 +0200)]
Merge pull request #2935 from martin-frbg/lapack458
Fix macro used in argument conversion (LAPACK PR 458)
Martin Kroeker [Thu, 22 Oct 2020 16:47:07 +0000 (18:47 +0200)]
Increase BUFFERSIZE for POWER8-10 and use same value for POWER6
to fix overflow warning for PWR8 ZGEMM and PWR9 C/ZGEMM and avoid size mismatches in DYNAMIC_ARCH
Martin Kroeker [Thu, 22 Oct 2020 14:23:29 +0000 (16:23 +0200)]
Fix compiler version check
Martin Kroeker [Thu, 22 Oct 2020 14:21:09 +0000 (16:21 +0200)]
Merge pull request #106 from xianyi/develop
rebase
Martin Kroeker [Thu, 22 Oct 2020 14:19:26 +0000 (16:19 +0200)]
Fix macro used in argument conversion (LAPACK PR 458)
Martin Kroeker [Wed, 21 Oct 2020 22:29:46 +0000 (00:29 +0200)]
Merge pull request #2932 from RajalakshmiSR/copyp10
Optimize scopy/ccopy for POWER10
Martin Kroeker [Wed, 21 Oct 2020 22:29:02 +0000 (00:29 +0200)]
Merge pull request #2934 from thrasibule/improve_version_check
actually check that version is greater than 4.7
Guillaume Horel [Wed, 21 Oct 2020 20:42:37 +0000 (16:42 -0400)]
actually check that version is greater than 4.7
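The commit message does not show the faulty test, so the following is only a sketch of one common pitfall in "major.minor" version checks, written in C for illustration (the real check lives in the build system). The function names are hypothetical and do not come from OpenBLAS.

```c
/* Returns 1 when major.minor is at least req_major.req_minor, else 0.
   The minor number only matters when the major numbers are equal. */
int version_at_least(int major, int minor, int req_major, int req_minor)
{
    if (major != req_major)
        return major > req_major;
    return minor >= req_minor;
}

/* Pitfall shown for contrast: comparing the two fields independently
   rejects e.g. 5.1 against a 4.7 requirement because 1 < 7. */
int naive_version_at_least(int major, int minor, int req_major, int req_minor)
{
    return major >= req_major && minor >= req_minor;
}
```

With a requirement of 4.7, `version_at_least(5, 1, 4, 7)` accepts the newer compiler, while the naive form rejects it.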
Rajalakshmi Srinivasaraghavan [Wed, 21 Oct 2020 14:53:45 +0000 (09:53 -0500)]
Optimize scopy/ccopy for POWER10
This patch makes use of the new POWER10 vector pair instructions for
loads and stores. It also reorganizes all variants of the copy functions
to use the same kernel.
Martin Kroeker [Wed, 21 Oct 2020 09:43:01 +0000 (11:43 +0200)]
Merge pull request #2930 from ismail/fix-no-return
Fix build with -Werror=return-type
Martin Kroeker [Wed, 21 Oct 2020 08:11:02 +0000 (10:11 +0200)]
Merge pull request #2928 from martin-frbg/issue2917
Enable -mavx2 for flang as well where supported
İsmail Dönmez [Wed, 21 Oct 2020 06:43:39 +0000 (08:43 +0200)]
Fix build with -Werror=return-type
The CNAME function in dgemm_tcopy_16_skylakex.c should return an int; add a
return 0, as in the other files.
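As a minimal illustration of this class of error: a function declared to return int but falling off its end is rejected under -Werror=return-type. The function below is a hypothetical stand-in, not the actual CNAME kernel.

```c
/* Hypothetical stand-in for a copy kernel declared to return int. */
int copy_block(long n, const double *src, double *dst)
{
    for (long i = 0; i < n; i++)
        dst[i] = src[i];

    /* Without this statement, gcc -Werror=return-type refuses to
       compile the function because control reaches the end of a
       non-void function. */
    return 0;
}
```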
Martin Kroeker [Tue, 20 Oct 2020 21:56:30 +0000 (23:56 +0200)]
Enable -mavx2 for flang as well
Martin Kroeker [Tue, 20 Oct 2020 21:48:53 +0000 (23:48 +0200)]
Merge pull request #105 from xianyi/develop
rebase
Martin Kroeker [Tue, 20 Oct 2020 09:27:36 +0000 (11:27 +0200)]
Merge pull request #2925 from martin-frbg/issue2911-2
Add binutils version check as prerequisite for POWER10 in DYNAMIC_ARCH build
Martin Kroeker [Tue, 20 Oct 2020 07:24:47 +0000 (09:24 +0200)]
Merge pull request #2926 from bartoldeman/vzeroupper-clobber-all
x86_64: clobber all xmm registers after vzeroupper
Martin Kroeker [Tue, 20 Oct 2020 06:37:53 +0000 (08:37 +0200)]
Fix missing backquotes
Bart Oldeman [Tue, 20 Oct 2020 02:16:47 +0000 (02:16 +0000)]
x86_64: clobber all xmm registers after vzeroupper
As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, the compiler is now smart enough to find clobbers
inside non-inlined static functions.
In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.
This patch makes sure all xmm (and by extension ymm/zmm) registers
are listed as clobbered to avoid this, as most kernels
already did correctly.
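A minimal sketch of the pattern described above, assuming GCC-style extended inline assembly on x86_64. The function names are hypothetical and this is not the OpenBLAS kernel code; it only shows a vzeroupper statement whose clobber list names every xmm register.

```c
#include <stddef.h>

#if defined(__x86_64__) && defined(__GNUC__)
/* Executes vzeroupper and declares all sixteen xmm registers as
   clobbered, so the compiler will not keep a live value in any of
   them (or in the overlapping ymm/zmm registers) across the asm. */
static void zero_upper_halves(void)
{
    __asm__ __volatile__(
        "vzeroupper"
        : /* no outputs */
        : /* no inputs */
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15");
}
#endif

/* Hypothetical caller: sums x[0..n-1], then clears the upper register
   halves if the CPU reports AVX support at run time. */
double sum_and_zero_upper(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx"))
        zero_upper_halves();
#endif
    return s;
}
```

Because the clobber list is complete, the running sum is saved and restored around the asm statement if the compiler had placed it in a vector register.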
Martin Kroeker [Mon, 19 Oct 2020 23:09:49 +0000 (01:09 +0200)]
Add POWER10 support flag (unconditionally for now)
Martin Kroeker [Mon, 19 Oct 2020 23:04:20 +0000 (01:04 +0200)]
Add ld/binutils version check for POWER10 support
Martin Kroeker [Mon, 19 Oct 2020 22:55:41 +0000 (00:55 +0200)]
Move HAVE_P10_SUPPORT to the build system
to be able to include a binutils version check
Martin Kroeker [Mon, 19 Oct 2020 22:52:08 +0000 (00:52 +0200)]
Merge pull request #104 from xianyi/develop
rebase
Martin Kroeker [Mon, 19 Oct 2020 21:33:45 +0000 (23:33 +0200)]
Merge pull request #2924 from martin-frbg/issue2920
Put back all symbols accidentally dropped in the reorganization of gensymbol
Martin Kroeker [Mon, 19 Oct 2020 21:33:31 +0000 (23:33 +0200)]
Merge pull request #2916 from martin-frbg/issue2911
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
Martin Kroeker [Mon, 19 Oct 2020 18:37:52 +0000 (20:37 +0200)]
Add back symbols that got dropped when splitting by type
Martin Kroeker [Mon, 19 Oct 2020 15:43:53 +0000 (17:43 +0200)]
Add POWER10 compiler options to CCOMMON_OPT rather than COMMON_OPT
Martin Kroeker [Mon, 19 Oct 2020 13:56:20 +0000 (15:56 +0200)]
Merge pull request #103 from xianyi/develop
rebase
Martin Kroeker [Mon, 19 Oct 2020 07:12:12 +0000 (09:12 +0200)]
Fix spurious trailing whitespace in comment
Martin Kroeker [Mon, 19 Oct 2020 06:14:27 +0000 (08:14 +0200)]
Merge pull request #2919 from isuruf/export
Fix exporting some lapack and cblas symbols
Martin Kroeker [Mon, 19 Oct 2020 06:11:22 +0000 (08:11 +0200)]
Remove -mmma again (redundant with cpu=power10) and add override statements
Isuru Fernando [Mon, 19 Oct 2020 02:42:32 +0000 (21:42 -0500)]
Fix exporting some lapack and cblas
Martin Kroeker [Sun, 18 Oct 2020 22:09:54 +0000 (00:09 +0200)]
Merge pull request #2915 from bartoldeman/no-empty_sgemm_direct_skylakex
sgemm_direct_skylakex: fix 75eeb26 regression.
Martin Kroeker [Sun, 18 Oct 2020 21:04:56 +0000 (23:04 +0200)]
Merge pull request #2913 from martin-frbg/issue2910
Support cross-compiling for Apple Vortex
Bart Oldeman [Sun, 18 Oct 2020 19:50:38 +0000 (19:50 +0000)]
sgemm_direct_skylakex: fix 75eeb26 regression.
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
guard from that commit was placed before #include "common.h", so the
compiled function was empty and returned garbage results for
qualifying sgemm calls on those architectures.
Closes #2914
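The ordering problem can be reproduced in a few lines. In this sketch the macro TARGET_HAS_KERNEL is hypothetical, and a plain #define stands in for the header that makes the target macros visible; the point is only that a guard evaluated before that definition sees nothing.

```c
/* Guard evaluated BEFORE the target macro is defined: the test is false. */
#if defined(TARGET_HAS_KERNEL)
#define EARLY_GUARD_SEEN 1
#else
#define EARLY_GUARD_SEEN 0
#endif

/* Stand-in for the #include that defines the target macros. */
#define TARGET_HAS_KERNEL 1

/* Same guard evaluated AFTER the definition: the test is true. */
#if defined(TARGET_HAS_KERNEL)
#define LATE_GUARD_SEEN 1
#else
#define LATE_GUARD_SEEN 0
#endif

int early_guard_seen(void) { return EARLY_GUARD_SEEN; }
int late_guard_seen(void)  { return LATE_GUARD_SEEN; }
```

Any function body wrapped in the early guard would be compiled away, which matches the empty function described in the commit message.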
Martin Kroeker [Sun, 18 Oct 2020 17:41:43 +0000 (19:41 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:39:18 +0000 (19:39 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:37:04 +0000 (19:37 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:31:01 +0000 (19:31 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:29:45 +0000 (19:29 +0200)]
Clean up STACKSIZE redefinition
Martin Kroeker [Sun, 18 Oct 2020 17:27:51 +0000 (19:27 +0200)]
Add compiler option -mmma for POWER10
Martin Kroeker [Sun, 18 Oct 2020 17:22:05 +0000 (19:22 +0200)]
Fix naming of L2 cache size item reported for Vortex
Martin Kroeker [Sun, 18 Oct 2020 17:16:08 +0000 (19:16 +0200)]
Merge pull request #2909 from isuruf/patch-1
Need a space when redirecting to file
Martin Kroeker [Sun, 18 Oct 2020 17:10:58 +0000 (19:10 +0200)]
Support cross-compiling for Apple Vortex
Martin Kroeker [Sun, 18 Oct 2020 16:54:54 +0000 (18:54 +0200)]
Support cross-compiling for Apple Vortex
Martin Kroeker [Sun, 18 Oct 2020 16:49:59 +0000 (18:49 +0200)]
Merge pull request #102 from xianyi/develop
rebase
Isuru Fernando [Sun, 18 Oct 2020 14:40:31 +0000 (09:40 -0500)]
Need a space when redirecting to file
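The following two commands have completely different meanings:
perl ./gensymbol objcopy x86_64 _ 0 0 0 0 0 0 "" "64_" 1 0 1 1 1 1 > objcopy.def
perl ./gensymbol objcopy x86_64 _ 0 0 0 0 0 0 "" "64_" 1 0 1 1 1 1> objcopy.def
In the second, the shell parses `1>` as a redirection of file descriptor 1, so the final `1` is not passed to gensymbol as an argument.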
Martin Kroeker [Sat, 17 Oct 2020 20:40:47 +0000 (22:40 +0200)]
Update version string to 0.3.11.dev
Martin Kroeker [Sat, 17 Oct 2020 20:40:06 +0000 (22:40 +0200)]
Update version string to 0.3.11.dev
Martin Kroeker [Sat, 17 Oct 2020 20:38:58 +0000 (22:38 +0200)]
Merge pull request #2908 from xianyi/release-0.3.0
Synchronize tag with release 0.3.11
Martin Kroeker [Sat, 17 Oct 2020 20:14:12 +0000 (22:14 +0200)]
Merge pull request #2907 from xianyi/develop
Update from develop for 0.3.11
Martin Kroeker [Sat, 17 Oct 2020 20:11:34 +0000 (22:11 +0200)]
Update version number to 0.3.11
Martin Kroeker [Sat, 17 Oct 2020 20:10:50 +0000 (22:10 +0200)]
Update version for 0.3.11 release
Martin Kroeker [Sat, 17 Oct 2020 20:07:14 +0000 (22:07 +0200)]
Merge pull request #2906 from martin-frbg/changelog-0311
Update Changelog.txt with the 0.3.11 changes
Martin Kroeker [Sat, 17 Oct 2020 20:05:36 +0000 (22:05 +0200)]
Update Changelog.txt with the 0.3.11 changes
Martin Kroeker [Sat, 17 Oct 2020 07:45:22 +0000 (09:45 +0200)]
Merge pull request #2905 from martin-frbg/aocc-clang
Add -mavx for clang & aocc
Martin Kroeker [Fri, 16 Oct 2020 18:52:15 +0000 (20:52 +0200)]
Add AVX flags for clang/aocc as well
Martin Kroeker [Fri, 16 Oct 2020 18:48:58 +0000 (20:48 +0200)]
Merge pull request #101 from xianyi/develop
rebase
Martin Kroeker [Fri, 16 Oct 2020 14:17:36 +0000 (16:17 +0200)]
Merge pull request #2900 from martin-frbg/fixcmake_sse
Add compiler options for SSE to the cmake support files
Martin Kroeker [Fri, 16 Oct 2020 08:47:06 +0000 (10:47 +0200)]
Add compiler options for sse/sse2/ssse3/sse4.1
Martin Kroeker [Fri, 16 Oct 2020 08:41:53 +0000 (10:41 +0200)]
Add sse options for use of intrinsics with older compilers
Martin Kroeker [Fri, 16 Oct 2020 07:55:48 +0000 (09:55 +0200)]
fix core list for sse/sse2
Martin Kroeker [Fri, 16 Oct 2020 05:26:39 +0000 (07:26 +0200)]
Merge pull request #2898 from martin-frbg/morefixes
More pre-release fixes
Martin Kroeker [Thu, 15 Oct 2020 20:10:32 +0000 (22:10 +0200)]
add sse2
Martin Kroeker [Thu, 15 Oct 2020 18:16:15 +0000 (20:16 +0200)]
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels
Martin Kroeker [Thu, 15 Oct 2020 17:08:12 +0000 (19:08 +0200)]
Silence a redefinition warning
Martin Kroeker [Thu, 15 Oct 2020 17:06:45 +0000 (19:06 +0200)]
Add -msse where supported, apparently required for older gcc
Martin Kroeker [Thu, 15 Oct 2020 17:05:37 +0000 (19:05 +0200)]
Use ifdef instead of if
Martin Kroeker [Thu, 15 Oct 2020 16:54:20 +0000 (18:54 +0200)]
Merge pull request #100 from xianyi/develop
rebase
Martin Kroeker [Thu, 15 Oct 2020 09:12:35 +0000 (11:12 +0200)]
Merge pull request #2896 from martin-frbg/intrin-double
Add compiler flag for SSE4 where available
Martin Kroeker [Thu, 15 Oct 2020 06:38:24 +0000 (08:38 +0200)]
Merge pull request #2897 from Qiyu8/usimd-double
Add double precision universal intrinsics for X86/ARM
Martin Kroeker [Thu, 15 Oct 2020 06:37:02 +0000 (08:37 +0200)]
Revert "add double precision SSE"
Qiyu8 [Thu, 15 Oct 2020 03:08:10 +0000 (11:08 +0800)]
adapt arm platform
Qiyu8 [Thu, 15 Oct 2020 02:29:42 +0000 (10:29 +0800)]
Add double precision universal intrinsics for X86/ARM
Martin Kroeker [Wed, 14 Oct 2020 18:34:33 +0000 (20:34 +0200)]
add sse4.1 for DYNAMIC_ARCH kernels
Martin Kroeker [Wed, 14 Oct 2020 17:18:07 +0000 (19:18 +0200)]
Add -msse4.1 when SSE4.1 is supported
Martin Kroeker [Wed, 14 Oct 2020 16:10:45 +0000 (18:10 +0200)]
Add double precision operations
Martin Kroeker [Wed, 14 Oct 2020 16:09:20 +0000 (18:09 +0200)]
Merge pull request #99 from xianyi/develop
rebase
Martin Kroeker [Wed, 14 Oct 2020 07:02:03 +0000 (09:02 +0200)]
Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
Martin Kroeker [Wed, 14 Oct 2020 07:01:16 +0000 (09:01 +0200)]
Merge pull request #2895 from martin-frbg/sb-tests
Fix remaining build errors related to bfloat16 and cmake
Martin Kroeker [Wed, 14 Oct 2020 06:12:08 +0000 (08:12 +0200)]
Merge pull request #2894 from RajalakshmiSR/bf16_packing
POWER10: Change the packing format for bfloat16
Martin Kroeker [Tue, 13 Oct 2020 23:08:50 +0000 (01:08 +0200)]
Replace Makefile with simplified version again
Martin Kroeker [Tue, 13 Oct 2020 23:01:58 +0000 (01:01 +0200)]
Add express -mavx and -msse options (and fix a stray = for cooperlake)