Xianyi Zhang [Tue, 10 Nov 2020 01:38:43 +0000 (09:38 +0800)]
Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v
Xianyi Zhang [Tue, 10 Nov 2020 01:38:04 +0000 (09:38 +0800)]
Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
Xianyi Zhang [Tue, 10 Nov 2020 01:18:25 +0000 (09:18 +0800)]
Merge branch 'develop' into risc-v
Martin Kroeker [Sun, 8 Nov 2020 21:43:00 +0000 (22:43 +0100)]
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
Martin Kroeker [Sun, 8 Nov 2020 16:39:05 +0000 (17:39 +0100)]
Merge pull request #2980 from martin-frbg/fixgetarch
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker [Sun, 8 Nov 2020 12:15:40 +0000 (13:15 +0100)]
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker [Sun, 8 Nov 2020 09:19:34 +0000 (10:19 +0100)]
Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
Martin Kroeker [Sun, 8 Nov 2020 09:19:17 +0000 (10:19 +0100)]
Merge pull request #2978 from martin-frbg/fixdynfeatures
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 23:12:55 +0000 (00:12 +0100)]
Stay compatible with old gmake that did not support undefine
Martin Kroeker [Sat, 7 Nov 2020 23:01:36 +0000 (00:01 +0100)]
Update Makefile.system
Martin Kroeker [Sat, 7 Nov 2020 22:37:21 +0000 (23:37 +0100)]
Update Makefile.system
Rajalakshmi Srinivasaraghavan [Sat, 7 Nov 2020 21:21:58 +0000 (15:21 -0600)]
Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 7 Nov 2020 19:39:56 +0000 (20:39 +0100)]
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 19:37:03 +0000 (20:37 +0100)]
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode
Martin Kroeker [Sat, 7 Nov 2020 19:30:15 +0000 (20:30 +0100)]
Fix propagation of cpu properties to compiler options
Martin Kroeker [Sat, 7 Nov 2020 19:27:42 +0000 (20:27 +0100)]
Remove extraneous quotes that caused a cmake policy warning
Martin Kroeker [Sat, 7 Nov 2020 19:26:12 +0000 (20:26 +0100)]
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 19:22:41 +0000 (20:22 +0100)]
Merge pull request #110 from xianyi/develop
rebase
Martin Kroeker [Sat, 7 Nov 2020 13:41:34 +0000 (14:41 +0100)]
Merge pull request #2977 from martin-frbg/issue2976
Fix macro name used in ifdef for POWERPC/PGI
Martin Kroeker [Sat, 7 Nov 2020 11:17:49 +0000 (12:17 +0100)]
Fix macro name used in ifdef
Gengxin Xie [Thu, 5 Nov 2020 08:25:17 +0000 (16:25 +0800)]
fix typo
Gengxin Xie [Sun, 27 Sep 2020 02:38:19 +0000 (10:38 +0800)]
Improve the performance of rot by using AVX512 and AVX2 intrinsic
Martin Kroeker [Wed, 4 Nov 2020 15:02:46 +0000 (16:02 +0100)]
Merge pull request #2966 from martin-frbg/issue2964
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
Martin Kroeker [Tue, 3 Nov 2020 22:47:04 +0000 (23:47 +0100)]
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target
Martin Kroeker [Tue, 3 Nov 2020 22:45:49 +0000 (23:45 +0100)]
Add -msse3 where needed for DYNAMIC_ARCH builds
Martin Kroeker [Mon, 2 Nov 2020 22:17:46 +0000 (23:17 +0100)]
Fix target test
Martin Kroeker [Mon, 2 Nov 2020 21:43:50 +0000 (22:43 +0100)]
Add files via upload
Martin Kroeker [Mon, 2 Nov 2020 17:54:36 +0000 (18:54 +0100)]
Merge pull request #2967 from RajalakshmiSR/dgemm88
POWER10: Change dgemm unroll factors
Martin Kroeker [Sun, 1 Nov 2020 21:25:43 +0000 (22:25 +0100)]
typo fix
Martin Kroeker [Sun, 1 Nov 2020 21:11:48 +0000 (22:11 +0100)]
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target
Martin Kroeker [Sun, 1 Nov 2020 20:58:26 +0000 (21:58 +0100)]
Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
NO_EXPRECISION is disabled for the GENERIC_TARGET already, so prevent mixing with code parts that use a different float size by default
Martin Kroeker [Sun, 1 Nov 2020 13:24:40 +0000 (14:24 +0100)]
Merge pull request #2962 from brada4/develop
add openbsd 68+ gfortran name
Martin Kroeker [Sun, 1 Nov 2020 08:14:54 +0000 (09:14 +0100)]
Merge pull request #2963 from martin-frbg/issue2959
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
Rajalakshmi Srinivasaraghavan [Sat, 31 Oct 2020 23:28:57 +0000 (18:28 -0500)]
POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 shows improved performance with
the POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
Martin Kroeker [Sat, 31 Oct 2020 23:00:43 +0000 (00:00 +0100)]
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
User User-User [Sat, 31 Oct 2020 22:38:08 +0000 (00:38 +0200)]
add openbsd gfortran
Martin Kroeker [Sat, 31 Oct 2020 21:33:52 +0000 (22:33 +0100)]
Merge pull request #109 from xianyi/develop
rebase
Martin Kroeker [Sat, 31 Oct 2020 19:24:21 +0000 (20:24 +0100)]
Merge pull request #2960 from thrasibule/avx2_detection
fix avx2 detection
Guillaume Horel [Sat, 31 Oct 2020 14:00:48 +0000 (10:00 -0400)]
fix avx2 detection
reword commits to make it clearer
Martin Kroeker [Fri, 30 Oct 2020 07:54:10 +0000 (08:54 +0100)]
Merge pull request #2956 from RajalakshmiSR/caxpy_p10
Optimize caxpy for POWER10
Rajalakshmi Srinivasaraghavan [Thu, 29 Oct 2020 19:57:51 +0000 (14:57 -0500)]
Optimize caxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Thu, 29 Oct 2020 19:28:37 +0000 (20:28 +0100)]
Merge pull request #2940 from Qiyu8/optimize-benchmark
Refactor the performance measurement system
Martin Kroeker [Thu, 29 Oct 2020 08:22:33 +0000 (09:22 +0100)]
Merge pull request #2954 from Guobing-Chen/BF16_gemv_support
Implementation of BF16 based gemv
Martin Kroeker [Thu, 29 Oct 2020 08:22:07 +0000 (09:22 +0100)]
Merge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue
Fix cooperlake compile issue
Chen, Guobing [Wed, 28 Oct 2020 19:37:51 +0000 (03:37 +0800)]
Fix cooperlake compile issue
Add a missing macro that is required in Makefile.x86_64 after the recent
cleanup; its absence caused cooperlake platform builds to fail.
Chen, Guobing [Wed, 28 Oct 2020 00:49:12 +0000 (08:49 +0800)]
Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
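For readers unfamiliar with bfloat16 gemv, the following is a minimal reference sketch of the operation such a generic kernel computes, y := alpha*A*x + beta*y, with A and x stored as bfloat16 and y as float. It assumes bfloat16 is the upper 16 bits of an IEEE-754 binary32 value and that A is row-major. The names bf16_t, bf16_to_float, float_to_bf16_trunc and sbgemv_ref are hypothetical and do not reflect the actual OpenBLAS sbgemv signature or its kernels.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: a bfloat16 value kept as the upper 16 bits
   of an IEEE-754 binary32 bit pattern. */
typedef uint16_t bf16_t;

/* Widen bfloat16 to float by placing its bits in the high half of a word. */
float bf16_to_float(bf16_t v) {
    uint32_t bits = (uint32_t)v << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow float to bfloat16 by truncation (assumption: no rounding, kept
   simple for illustration; production code usually rounds). */
bf16_t float_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}

/* Reference gemv: y := alpha * A * x + beta * y, with A row-major m x n.
   Products are accumulated in float. */
void sbgemv_ref(size_t m, size_t n, float alpha,
                const bf16_t *a, const bf16_t *x,
                float beta, float *y) {
    for (size_t i = 0; i < m; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; j++)
            acc += bf16_to_float(a[i * n + j]) * bf16_to_float(x[j]);
        y[i] = alpha * acc + beta * y[i];
    }
}
```

Accumulating in float rather than bfloat16 is the usual reason for pairing bfloat16 inputs with a single-precision output vector.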
Martin Kroeker [Wed, 28 Oct 2020 08:38:40 +0000 (09:38 +0100)]
Merge pull request #2939 from thrasibule/Makefile_cleanup
reuse variables defined in Makefile.system
Martin Kroeker [Wed, 28 Oct 2020 08:37:56 +0000 (09:37 +0100)]
Merge pull request #2951 from martin-frbg/cleanup_make
Minor Makefile cleanup
Martin Kroeker [Wed, 28 Oct 2020 08:37:32 +0000 (09:37 +0100)]
Merge pull request #2952 from martin-frbg/issue2931
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
Martin Kroeker [Wed, 28 Oct 2020 08:37:09 +0000 (09:37 +0100)]
Merge pull request #2948 from martin-frbg/issue2947
Expressly enable neon for use with intrinsics if available
Martin Kroeker [Tue, 27 Oct 2020 22:01:19 +0000 (23:01 +0100)]
Output predefined HAVE_ entries to Makefile.conf for ARM with specified TARGET
Martin Kroeker [Tue, 27 Oct 2020 16:51:32 +0000 (17:51 +0100)]
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails
Martin Kroeker [Mon, 26 Oct 2020 23:02:18 +0000 (00:02 +0100)]
Merge pull request #2950 from RajalakshmiSR/saxpy
Optimize saxpy for POWER10
Martin Kroeker [Mon, 26 Oct 2020 20:37:04 +0000 (21:37 +0100)]
Remove debug printout of object list
Martin Kroeker [Mon, 26 Oct 2020 20:35:40 +0000 (21:35 +0100)]
Remove spurious expr in flang version check
Rajalakshmi Srinivasaraghavan [Mon, 26 Oct 2020 18:24:59 +0000 (13:24 -0500)]
Optimize saxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Qiyu8 [Mon, 26 Oct 2020 02:25:05 +0000 (10:25 +0800)]
Refactoring remaining benchmark cases.
Martin Kroeker [Sun, 25 Oct 2020 23:43:44 +0000 (00:43 +0100)]
Merge pull request #2946 from martin-frbg/issue2945
Move definitions that are neither needed nor supported on Solaris
Martin Kroeker [Sun, 25 Oct 2020 23:21:56 +0000 (00:21 +0100)]
Expressly enable neon for use with intrinsics if available
Martin Kroeker [Sun, 25 Oct 2020 11:01:50 +0000 (12:01 +0100)]
Move definitions that are neither needed nor supported on SUNOS
Martin Kroeker [Sat, 24 Oct 2020 21:29:05 +0000 (23:29 +0200)]
Update version to 0.3.12.dev
Martin Kroeker [Sat, 24 Oct 2020 21:28:29 +0000 (23:28 +0200)]
Update version to 0.3.12.dev
Martin Kroeker [Sat, 24 Oct 2020 11:10:51 +0000 (13:10 +0200)]
Merge pull request #2944 from xianyi/release-0.3.0
Merge back 0.3.12 tag (and Changelog typo fixes) from release
Martin Kroeker [Sat, 24 Oct 2020 11:03:28 +0000 (13:03 +0200)]
Fix typos
Martin Kroeker [Sat, 24 Oct 2020 10:52:59 +0000 (12:52 +0200)]
Merge pull request #2943 from xianyi/develop
Merge from develop for 0.3.12 release
Martin Kroeker [Sat, 24 Oct 2020 10:50:04 +0000 (12:50 +0200)]
Update Changelog with 0.3.12 changes
Martin Kroeker [Sat, 24 Oct 2020 10:15:33 +0000 (12:15 +0200)]
Update version to 0.3.12 for release
Martin Kroeker [Sat, 24 Oct 2020 10:14:45 +0000 (12:14 +0200)]
Update version to 0.3.12 for release
Martin Kroeker [Sat, 24 Oct 2020 07:26:50 +0000 (09:26 +0200)]
Merge pull request #2942 from martin-frbg/makebuildtypes
Comment out BUILD_SINGLE etc. in Makefile.rule and add a short explanation
Martin Kroeker [Fri, 23 Oct 2020 21:32:06 +0000 (23:32 +0200)]
Comment out BUILD_SINGLE etc. and add a short explanation
Martin Kroeker [Fri, 23 Oct 2020 18:47:35 +0000 (20:47 +0200)]
Merge pull request #2941 from martin-frbg/exportsfix
Fix grouping of sladiv1/dladiv1/ilaenv2stage in gensymbol
Martin Kroeker [Fri, 23 Oct 2020 13:53:40 +0000 (15:53 +0200)]
Fix wrong grouping of dcombssq
Martin Kroeker [Fri, 23 Oct 2020 13:31:25 +0000 (15:31 +0200)]
Fix missing split of sladiv1/dladiv1/ilaenv2stage by build type
Martin Kroeker [Fri, 23 Oct 2020 13:29:48 +0000 (15:29 +0200)]
Merge pull request #108 from xianyi/develop
rebase
Martin Kroeker [Fri, 23 Oct 2020 05:15:32 +0000 (07:15 +0200)]
Merge pull request #2937 from martin-frbg/pwr-buffersz
Increase and unify BUFFERSIZE on POWER; fix gcc inline warning
Qiyu8 [Fri, 23 Oct 2020 02:32:03 +0000 (10:32 +0800)]
Refactor the performance measurement system
Guillaume Horel [Fri, 23 Oct 2020 02:00:00 +0000 (22:00 -0400)]
reuse variables defined in Makefile.system
Martin Kroeker [Thu, 22 Oct 2020 22:19:49 +0000 (00:19 +0200)]
Merge pull request #2938 from martin-frbg/2934-3
Fix twisted spelling that broke the gfortran version test again
Martin Kroeker [Thu, 22 Oct 2020 22:18:29 +0000 (00:18 +0200)]
Fix twisted spelling that broke the gfortran version test again
Martin Kroeker [Thu, 22 Oct 2020 22:12:06 +0000 (00:12 +0200)]
Increase BUFFERSIZE further
Martin Kroeker [Thu, 22 Oct 2020 20:14:26 +0000 (22:14 +0200)]
label always_inline function as inline to silence a gcc warning
Martin Kroeker [Thu, 22 Oct 2020 20:08:46 +0000 (22:08 +0200)]
Merge pull request #2936 from martin-frbg/issue2934-2
Fix compiler version check for -mavx2 support (DYNAMIC_ARCH case)
Martin Kroeker [Thu, 22 Oct 2020 17:25:58 +0000 (19:25 +0200)]
Merge pull request #2935 from martin-frbg/lapack458
Fix macro used in argument conversion (LAPACK PR 458)
Martin Kroeker [Thu, 22 Oct 2020 16:47:07 +0000 (18:47 +0200)]
Increase BUFFERSIZE for POWER8-10 and use same value for POWER6
to fix overflow warning for PWR8 ZGEMM and PWR9 C/ZGEMM and avoid size mismatches in DYNAMIC_ARCH
Martin Kroeker [Thu, 22 Oct 2020 14:23:29 +0000 (16:23 +0200)]
Fix compiler version check
Martin Kroeker [Thu, 22 Oct 2020 14:21:09 +0000 (16:21 +0200)]
Merge pull request #106 from xianyi/develop
rebase
Martin Kroeker [Thu, 22 Oct 2020 14:19:26 +0000 (16:19 +0200)]
Fix macro used in argument conversion (LAPACK PR 458)
Martin Kroeker [Wed, 21 Oct 2020 22:29:46 +0000 (00:29 +0200)]
Merge pull request #2932 from RajalakshmiSR/copyp10
Optimize scopy/ccopy for POWER10
Martin Kroeker [Wed, 21 Oct 2020 22:29:02 +0000 (00:29 +0200)]
Merge pull request #2934 from thrasibule/improve_version_check
actually check that version is greater than 4.7
Guillaume Horel [Wed, 21 Oct 2020 20:42:37 +0000 (16:42 -0400)]
actually check that version is greater than 4.7
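The actual check lives in the build scripts and is not shown in this log. As an illustration only, here is a minimal sketch of a two-component version comparison. The function name version_at_least is hypothetical, and treating 4.7 itself as acceptable is an assumption.

```c
/* Hypothetical helper: returns 1 when major.minor is at least
   min_major.min_minor, and 0 otherwise. The major component decides
   unless both majors are equal. */
int version_at_least(int major, int minor, int min_major, int min_minor) {
    if (major != min_major)
        return major > min_major;
    return minor >= min_minor;
}
```

Comparing the components separately matters for versions such as 10.2, where the minor number alone (2) is smaller than 7 even though the version is newer than 4.7.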
Rajalakshmi Srinivasaraghavan [Wed, 21 Oct 2020 14:53:45 +0000 (09:53 -0500)]
Optimize scopy/ccopy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores. It also reorganizes all variants of the copy functions
to make use of the same kernel.
Martin Kroeker [Wed, 21 Oct 2020 09:43:01 +0000 (11:43 +0200)]
Merge pull request #2930 from ismail/fix-no-return
Fix build with -Werror=return-type
Martin Kroeker [Wed, 21 Oct 2020 08:11:02 +0000 (10:11 +0200)]
Merge pull request #2928 from martin-frbg/issue2917
Enable -mavx2 for flang as well where supported
İsmail Dönmez [Wed, 21 Oct 2020 06:43:39 +0000 (08:43 +0200)]
Fix build with -Werror=return-type
The CNAME function in dgemm_tcopy_16_skylakex.c should return an int; add a
return 0 as in other files.
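A minimal sketch of the kind of fix described, using a hypothetical stand-in function (copy_block is not the real kernel): a function declared to return int must end in a return statement, otherwise GCC warns under -Wreturn-type and fails the build when -Werror=return-type is set.

```c
#include <stddef.h>

/* Hypothetical stand-in for a copy kernel whose declared return type is int. */
int copy_block(size_t n, const double *src, double *dst) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    /* Without this statement, control would reach the end of a non-void
       function, which GCC reports under -Wreturn-type. */
    return 0;
}
```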
Martin Kroeker [Tue, 20 Oct 2020 21:56:30 +0000 (23:56 +0200)]
Enable -mavx2 for flang as well
Martin Kroeker [Tue, 20 Oct 2020 21:48:53 +0000 (23:48 +0200)]
Merge pull request #105 from xianyi/develop
rebase
Martin Kroeker [Tue, 20 Oct 2020 09:27:36 +0000 (11:27 +0200)]
Merge pull request #2925 from martin-frbg/issue2911-2
Add binutils version check as prerequisite for POWER10 in DYNAMIC_ARCH build
Martin Kroeker [Tue, 20 Oct 2020 07:24:47 +0000 (09:24 +0200)]
Merge pull request #2926 from bartoldeman/vzeroupper-clobber-all
x86_64: clobber all xmm registers after vzeroupper
Martin Kroeker [Tue, 20 Oct 2020 06:37:53 +0000 (08:37 +0200)]
Fix missing backquotes
Bart Oldeman [Tue, 20 Oct 2020 02:16:47 +0000 (02:16 +0000)]
x86_64: clobber all xmm registers after vzeroupper
As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, the compiler is now smart enough to find clobbers inside
non-inlined static functions.
In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.
This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to prevent this, as most kernels
already did correctly.
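A minimal sketch of the pattern, assuming GCC-style extended inline assembly: the asm statement that executes vzeroupper names every xmm register in its clobber list, so the compiler does not assume any vector register survives it. The helper names zero_upper_lanes and sum_with_barrier are hypothetical and are not OpenBLAS kernels. The asm is only compiled on x86_64 builds with AVX enabled and is a no-op elsewhere.

```c
#include <stddef.h>

/* Hypothetical helper. On x86_64 builds with AVX enabled it executes
   vzeroupper and declares xmm0..xmm15 as clobbered; on other targets it
   compiles to an empty function. */
void zero_upper_lanes(void) {
#if defined(__GNUC__) && defined(__x86_64__) && defined(__AVX__)
    __asm__ __volatile__(
        "vzeroupper"
        : /* no outputs */
        : /* no inputs */
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15");
#endif
}

/* Sums n floats, calling the helper halfway through. Because of the
   clobber list, the compiler must not keep data it still needs in the
   upper part of a vector register across the call. */
float sum_with_barrier(const float *v, size_t n) {
    float acc = 0.0f;
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++)
        acc += v[i];
    zero_upper_lanes();
    for (size_t i = half; i < n; i++)
        acc += v[i];
    return acc;
}
```

Omitting a register from the clobber list tells the compiler that the asm leaves it untouched, which is the assumption that broke once the compiler began analysing non-inlined static functions.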