Qiyu8 [Thu, 15 Oct 2020 03:08:10 +0000 (11:08 +0800)]
adapt arm platform
Qiyu8 [Thu, 15 Oct 2020 02:29:42 +0000 (10:29 +0800)]
Add double precision universal intrinsics for X86/ARM
Martin Kroeker [Wed, 14 Oct 2020 07:02:03 +0000 (09:02 +0200)]
Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
Martin Kroeker [Wed, 14 Oct 2020 07:01:16 +0000 (09:01 +0200)]
Merge pull request #2895 from martin-frbg/sb-tests
Fix remaining build errors related to bfloat16 and cmake
Martin Kroeker [Wed, 14 Oct 2020 06:12:08 +0000 (08:12 +0200)]
Merge pull request #2894 from RajalakshmiSR/bf16_packing
POWER10: Change the packing format for bfloat16
Martin Kroeker [Tue, 13 Oct 2020 23:08:50 +0000 (01:08 +0200)]
Replace Makefile with simplified version again
Martin Kroeker [Tue, 13 Oct 2020 23:01:58 +0000 (01:01 +0200)]
Add express -mavx and -msse options (and fix a stray = for cooperlake)
Martin Kroeker [Tue, 13 Oct 2020 21:21:38 +0000 (23:21 +0200)]
Add the BFLOAT16 functions to cmake builds
Rajalakshmi Srinivasaraghavan [Tue, 13 Oct 2020 21:05:10 +0000 (16:05 -0500)]
POWER10: Change the packing format for bfloat16
The new MMA instructions need bfloat16 inputs in 4x2 order, so change the
format in the copy/packing code. This avoids permute instructions
in the gemm kernel inner loop.
Martin Kroeker [Tue, 13 Oct 2020 18:07:19 +0000 (20:07 +0200)]
Rename "HALF" type to "BFLOAT16"
Martin Kroeker [Tue, 13 Oct 2020 17:56:09 +0000 (19:56 +0200)]
Cleanup
Martin Kroeker [Tue, 13 Oct 2020 17:55:14 +0000 (19:55 +0200)]
sh prefix renamed to sb
Martin Kroeker [Tue, 13 Oct 2020 16:50:30 +0000 (18:50 +0200)]
Merge pull request #98 from xianyi/develop
rebase
Martin Kroeker [Tue, 13 Oct 2020 16:48:37 +0000 (18:48 +0200)]
Merge pull request #2892 from RajalakshmiSR/bf16_make
Fix build issues with bfloat16
Rajalakshmi Srinivasaraghavan [Tue, 13 Oct 2020 16:00:22 +0000 (11:00 -0500)]
Fix build issues with bfloat16
This patch fixes compilation errors that occur with BUILD_BFLOAT16 after
the recent renaming from SH to SB.
Martin Kroeker [Tue, 13 Oct 2020 13:02:17 +0000 (15:02 +0200)]
Fix typo
Martin Kroeker [Tue, 13 Oct 2020 12:41:25 +0000 (14:41 +0200)]
Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well
Martin Kroeker [Tue, 13 Oct 2020 11:46:17 +0000 (13:46 +0200)]
Merge pull request #2891 from martin-frbg/fix-2886
Fix several bugs and omissions from the BFLOAT16 rename
Martin Kroeker [Tue, 13 Oct 2020 09:57:04 +0000 (11:57 +0200)]
Add -mssse3 if supported by the hardware
Martin Kroeker [Tue, 13 Oct 2020 09:55:41 +0000 (11:55 +0200)]
Add -mssse3
Martin Kroeker [Tue, 13 Oct 2020 09:42:39 +0000 (11:42 +0200)]
Add Haswell and Zen to temporary sse3 whitelist
Martin Kroeker [Tue, 13 Oct 2020 08:32:19 +0000 (10:32 +0200)]
whitelist SANDYBRIDGE for SSE3
Martin Kroeker [Tue, 13 Oct 2020 08:14:08 +0000 (10:14 +0200)]
Cleanup
Martin Kroeker [Tue, 13 Oct 2020 07:17:15 +0000 (09:17 +0200)]
Fix typos in currently unused sections
Martin Kroeker [Tue, 13 Oct 2020 07:11:36 +0000 (09:11 +0200)]
Fix bfloat16 conditional
Martin Kroeker [Tue, 13 Oct 2020 07:07:50 +0000 (09:07 +0200)]
Add a POWER9 build with BFLOAT16 enabled
Martin Kroeker [Tue, 13 Oct 2020 07:05:04 +0000 (09:05 +0200)]
Fix some overlooked "SHBLAS" entries
Martin Kroeker [Tue, 13 Oct 2020 07:01:49 +0000 (09:01 +0200)]
Merge pull request #97 from xianyi/develop
rebase
Martin Kroeker [Mon, 12 Oct 2020 22:14:29 +0000 (00:14 +0200)]
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM
Martin Kroeker [Mon, 12 Oct 2020 22:04:35 +0000 (00:04 +0200)]
Merge pull request #2886 from martin-frbg/issue_2767
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
Martin Kroeker [Mon, 12 Oct 2020 21:50:41 +0000 (23:50 +0200)]
Merge pull request #2881 from mattip/fninit
add fninit to reset fpu registers before assembler routines
Martin Kroeker [Mon, 12 Oct 2020 21:22:08 +0000 (23:22 +0200)]
Merge pull request #2888 from Qiyu8/usimd-sum
Optimize the performance of sum by using universal intrinsics
Matti Picus [Mon, 12 Oct 2020 15:15:01 +0000 (18:15 +0300)]
use emms instead, add WIN guards
Martin Kroeker [Mon, 12 Oct 2020 12:44:33 +0000 (14:44 +0200)]
Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme
Qiyu8 [Mon, 12 Oct 2020 11:48:53 +0000 (19:48 +0800)]
Optimize the performance of sum by using universal intrinsics
Martin Kroeker [Sun, 11 Oct 2020 22:42:05 +0000 (00:42 +0200)]
Restore -msse3
Martin Kroeker [Sun, 11 Oct 2020 22:27:11 +0000 (00:27 +0200)]
common_sh.h renamed to common_sb.h
Martin Kroeker [Sun, 11 Oct 2020 22:11:31 +0000 (00:11 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 22:08:29 +0000 (00:08 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 22:07:37 +0000 (00:07 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 22:06:06 +0000 (00:06 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 22:05:05 +0000 (00:05 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 22:03:21 +0000 (00:03 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 22:02:16 +0000 (00:02 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 22:00:55 +0000 (00:00 +0200)]
Change "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:56:17 +0000 (23:56 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:54:53 +0000 (23:54 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:53:50 +0000 (23:53 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:52:45 +0000 (23:52 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:51:34 +0000 (23:51 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:50:54 +0000 (23:50 +0200)]
Rename common_sh.h to common_sb.h
Martin Kroeker [Sun, 11 Oct 2020 21:49:22 +0000 (23:49 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:44:38 +0000 (23:44 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:43:36 +0000 (23:43 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:42:45 +0000 (23:42 +0200)]
Rename compare_sgemm_shgemm.c to compare_sgemm_sbgemm.c
Martin Kroeker [Sun, 11 Oct 2020 21:42:07 +0000 (23:42 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:41:13 +0000 (23:41 +0200)]
Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c
Martin Kroeker [Sun, 11 Oct 2020 21:40:43 +0000 (23:40 +0200)]
Rename shdot.c to sbdot.c
Martin Kroeker [Sun, 11 Oct 2020 21:39:42 +0000 (23:39 +0200)]
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
Martin Kroeker [Sun, 11 Oct 2020 21:37:38 +0000 (23:37 +0200)]
Rename shgemm_kernel_power10.c to sbgemm_kernel_power10.c
Martin Kroeker [Sun, 11 Oct 2020 21:34:36 +0000 (23:34 +0200)]
Merge pull request #96 from xianyi/develop
rebase
Martin Kroeker [Sun, 11 Oct 2020 21:34:14 +0000 (23:34 +0200)]
Merge branch 'develop' into develop
Martin Kroeker [Sun, 11 Oct 2020 20:22:30 +0000 (22:22 +0200)]
Merge pull request #2882 from martin-frbg/issue2709
Use generic C for (D/Z)NRM2 on Windows x86_64
Martin Kroeker [Sun, 11 Oct 2020 20:21:33 +0000 (22:21 +0200)]
Merge pull request #2852 from martin-frbg/issue2588-cmake
Support building only a subset of variable types
Martin Kroeker [Sun, 11 Oct 2020 16:29:34 +0000 (18:29 +0200)]
repair TABs
Martin Kroeker [Sun, 11 Oct 2020 16:25:16 +0000 (18:25 +0200)]
Copy BUILD_ settings to the LAPACK make.inc
Martin Kroeker [Sun, 11 Oct 2020 16:08:21 +0000 (18:08 +0200)]
Set BUILD_ options to 1 instead of just defining them
Martin Kroeker [Sun, 11 Oct 2020 15:45:41 +0000 (17:45 +0200)]
Add cblas_xerbla interface
Martin Kroeker [Sun, 11 Oct 2020 15:33:51 +0000 (17:33 +0200)]
If none of the BUILD_ options is set, enable them all
Martin Kroeker [Sun, 11 Oct 2020 15:23:08 +0000 (17:23 +0200)]
remove debug output
Martin Kroeker [Sun, 11 Oct 2020 13:45:24 +0000 (15:45 +0200)]
Merge pull request #2885 from martin-frbg/ifexists
Improve CMAKE check for conflicting config_kernel.h
Martin Kroeker [Sun, 11 Oct 2020 13:14:03 +0000 (15:14 +0200)]
Merge pull request #2884 from martin-frbg/sse_fixup
Add workaround for unwanted default activation of -msse3 in DYNAMIC_ARCH builds
Martin Kroeker [Sun, 11 Oct 2020 13:11:15 +0000 (15:11 +0200)]
Allow building support for only a subset of variable types
Martin Kroeker [Sun, 11 Oct 2020 13:01:32 +0000 (15:01 +0200)]
Adapt for supporting only a subset of variable types
Martin Kroeker [Sun, 11 Oct 2020 12:58:57 +0000 (14:58 +0200)]
Adapt for supporting only a subset of variable types
Martin Kroeker [Sun, 11 Oct 2020 12:57:32 +0000 (14:57 +0200)]
Adapt for supporting only a subset of variable types
Martin Kroeker [Sun, 11 Oct 2020 12:53:26 +0000 (14:53 +0200)]
Allow supporting only a subset of variable types
Martin Kroeker [Sun, 11 Oct 2020 12:52:09 +0000 (14:52 +0200)]
Allow compiling only a subset of kernels for specific variable types
Martin Kroeker [Sun, 11 Oct 2020 12:49:58 +0000 (14:49 +0200)]
Add Makefile support for enabling only some variable types
Martin Kroeker [Sun, 11 Oct 2020 12:49:06 +0000 (14:49 +0200)]
Add Makefile support for enabling only some variable types
Martin Kroeker [Sun, 11 Oct 2020 12:48:23 +0000 (14:48 +0200)]
Add Makefile support for enabling only some variable types
Martin Kroeker [Sun, 11 Oct 2020 12:46:24 +0000 (14:46 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:45:40 +0000 (14:45 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:44:56 +0000 (14:44 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:44:13 +0000 (14:44 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:43:13 +0000 (14:43 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:42:26 +0000 (14:42 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:41:43 +0000 (14:41 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:40:51 +0000 (14:40 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:40:06 +0000 (14:40 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:39:19 +0000 (14:39 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:38:25 +0000 (14:38 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:36:45 +0000 (14:36 +0200)]
Adapt to having only a subset of variable types supported
Martin Kroeker [Sun, 11 Oct 2020 12:34:12 +0000 (14:34 +0200)]
Adapt for having only a subset of variable types
Martin Kroeker [Sun, 11 Oct 2020 12:32:00 +0000 (14:32 +0200)]
Adapt ctests to having only a subset of types in the build
Martin Kroeker [Sun, 11 Oct 2020 12:25:24 +0000 (14:25 +0200)]
Adapt tests to having only a subset of types in the build
Martin Kroeker [Sun, 11 Oct 2020 12:15:35 +0000 (14:15 +0200)]
Adapt utests for builds supporting only some variable types
Martin Kroeker [Sun, 11 Oct 2020 11:57:07 +0000 (13:57 +0200)]
Merge branch 'develop' into issue2588-cmake
Martin Kroeker [Sun, 11 Oct 2020 11:26:05 +0000 (13:26 +0200)]
Add files via upload
Martin Kroeker [Sun, 11 Oct 2020 10:58:17 +0000 (12:58 +0200)]
Improve check for conflicting config_kernel.h