Martin Kroeker [Mon, 23 Nov 2020 07:35:12 +0000 (08:35 +0100)]
Merge pull request #3004 from martin-frbg/bsd_getauxval
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
Martin Kroeker [Sun, 22 Nov 2020 21:51:26 +0000 (22:51 +0100)]
Merge pull request #3002 from martin-frbg/issue3000
Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size
Martin Kroeker [Sun, 22 Nov 2020 21:50:41 +0000 (22:50 +0100)]
Merge pull request #3001 from martin-frbg/issue2996
Fix ambiguous ifdefs in tests for user-defined options in Makefiles
Martin Kroeker [Sun, 22 Nov 2020 19:20:28 +0000 (20:20 +0100)]
Build fix for systems that do not support getauxval
Martin Kroeker [Sun, 22 Nov 2020 16:41:44 +0000 (17:41 +0100)]
Fix syntax mixup
Martin Kroeker [Sun, 22 Nov 2020 16:16:22 +0000 (17:16 +0100)]
Restore proper Makefile
Martin Kroeker [Sun, 22 Nov 2020 15:48:22 +0000 (16:48 +0100)]
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds
Martin Kroeker [Sun, 22 Nov 2020 15:33:34 +0000 (16:33 +0100)]
Use ifneq instead of ifdef for CROSS option
Martin Kroeker [Sun, 22 Nov 2020 15:31:44 +0000 (16:31 +0100)]
Use ifeq instead of ifdef for user-definable build options
Martin Kroeker [Sun, 22 Nov 2020 15:29:56 +0000 (16:29 +0100)]
Use ifeq instead of ifdef for user-definable options
Martin Kroeker [Sun, 22 Nov 2020 15:27:17 +0000 (16:27 +0100)]
Convert ifndefs to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:25:36 +0000 (16:25 +0100)]
Change ifndef CROSS to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:19:31 +0000 (16:19 +0100)]
Change ifndefs to ifneq
Martin Kroeker [Sun, 22 Nov 2020 15:17:19 +0000 (16:17 +0100)]
Merge pull request #112 from xianyi/develop
rebase
Martin Kroeker [Sun, 22 Nov 2020 11:25:33 +0000 (12:25 +0100)]
Merge pull request #2965 from epsilon-0/develop
allow setting soname without suffix or prefix
Martin Kroeker [Sun, 22 Nov 2020 11:24:13 +0000 (12:24 +0100)]
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
Martin Kroeker [Sun, 22 Nov 2020 11:22:57 +0000 (12:22 +0100)]
Merge pull request #2997 from Flamefire/reproduce_crash
Add reproducer test for crash after fork
Xianyi Zhang [Sun, 22 Nov 2020 08:05:32 +0000 (16:05 +0800)]
Merge branch 'risc-v' into develop
Xianyi Zhang [Sun, 22 Nov 2020 08:04:50 +0000 (16:04 +0800)]
Merge branch 'develop' into risc-v
Xianyi Zhang [Sun, 22 Nov 2020 08:02:19 +0000 (16:02 +0800)]
Update doc for C910.
Martin Kroeker [Fri, 20 Nov 2020 08:42:10 +0000 (09:42 +0100)]
Merge pull request #2995 from Flamefire/fix_thread_buffer_init
Don't overwrite blas_thread_buffer if already set
Alexander Grund [Thu, 19 Nov 2020 14:24:57 +0000 (15:24 +0100)]
Add reproducer test for crash after fork
See #2993 for an analysis
Alexander Grund [Thu, 19 Nov 2020 13:39:00 +0000 (14:39 +0100)]
Don't overwrite blas_thread_buffer if already set
After a fork, blas_thread_buffer may already hold allocated memory
buffers: goto_set_num_threads allocates them, and it may be called by
num_cpu_avail when the OpenBLAS NUM_THREADS differs from the OMP number
of threads.
Overwriting them leads to a memory leak which can cause subsequent execution of BLAS
kernels to fail.
Fixes #2993
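The guard described in this commit can be illustrated with a small model. This is only a sketch, not the OpenBLAS code: the class, its method names and the slot count are hypothetical, and because Python has no manual memory management, the leak is modelled by counting allocations rather than by losing pointers.

```python
# Hypothetical model of the "don't overwrite if already set" guard.
# Nothing here is taken from the OpenBLAS sources.
MAX_SLOTS = 4  # assumed number of per-thread buffer slots


class ThreadBuffers:
    def __init__(self):
        self.slots = [None] * MAX_SLOTS
        self.allocations = 0  # every allocation is counted

    def _alloc(self):
        self.allocations += 1
        return bytearray(16)

    def init_overwriting(self, nthreads):
        # Old behaviour: always allocate, dropping any earlier buffer.
        for i in range(nthreads):
            self.slots[i] = self._alloc()

    def init_guarded(self, nthreads):
        # Fixed behaviour: allocate only where no buffer is set yet.
        for i in range(nthreads):
            if self.slots[i] is None:
                self.slots[i] = self._alloc()


leaky = ThreadBuffers()
leaky.init_overwriting(2)   # e.g. an early call that sets two buffers
leaky.init_overwriting(4)   # later initialisation replaces both

guarded = ThreadBuffers()
guarded.init_guarded(2)
guarded.init_guarded(4)     # keeps the two existing buffers
```

In the unguarded variant the two buffers from the first call are replaced and, in C, would no longer be reachable for freeing; the guarded variant allocates exactly one buffer per slot.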
Martin Kroeker [Mon, 16 Nov 2020 07:40:46 +0000 (08:40 +0100)]
Merge pull request #2981 from Qiyu8/fix-sum
Fix sum optimization issues
Martin Kroeker [Mon, 16 Nov 2020 07:38:37 +0000 (08:38 +0100)]
Merge pull request #2983 from Qiyu8/optimize-srot
Optimize the performance of rot by using universal intrinsics
Qiyu8 [Mon, 16 Nov 2020 01:14:56 +0000 (09:14 +0800)]
remove the -mfma flag when the host has AVX.
Martin Kroeker [Fri, 13 Nov 2020 11:35:09 +0000 (12:35 +0100)]
Merge pull request #2989 from martin-frbg/cmake-fma
Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH
Martin Kroeker [Fri, 13 Nov 2020 08:16:34 +0000 (09:16 +0100)]
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well
Martin Kroeker [Fri, 13 Nov 2020 08:14:23 +0000 (09:14 +0100)]
Merge pull request #111 from xianyi/develop
rebase
Gengxin Xie [Fri, 13 Nov 2020 06:20:52 +0000 (14:20 +0800)]
Improve the performance of dasum and sasum when SMP is defined
Qiyu8 [Fri, 13 Nov 2020 02:20:24 +0000 (10:20 +0800)]
modify system.cmake to enable fma flag
Qiyu8 [Thu, 12 Nov 2020 12:31:03 +0000 (20:31 +0800)]
fix the CI failure of target specific option mismatch
Qiyu8 [Thu, 12 Nov 2020 09:35:17 +0000 (17:35 +0800)]
fix the CI failure caused by a missing header
Qiyu8 [Wed, 11 Nov 2020 07:53:48 +0000 (15:53 +0800)]
modify macro
Qiyu8 [Wed, 11 Nov 2020 07:18:01 +0000 (15:18 +0800)]
only FMA3 and vectors wider than 128 bits have positive effects.
Qiyu8 [Wed, 11 Nov 2020 06:33:12 +0000 (14:33 +0800)]
Optimize the performance of rot by using universal intrinsics
Qiyu8 [Tue, 10 Nov 2020 08:16:38 +0000 (16:16 +0800)]
fix sum optimization issues
Xianyi Zhang [Tue, 10 Nov 2020 01:38:43 +0000 (09:38 +0800)]
Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v
Xianyi Zhang [Tue, 10 Nov 2020 01:38:04 +0000 (09:38 +0800)]
Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
Xianyi Zhang [Tue, 10 Nov 2020 01:18:25 +0000 (09:18 +0800)]
Merge branch 'develop' into risc-v
Martin Kroeker [Sun, 8 Nov 2020 21:43:00 +0000 (22:43 +0100)]
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsics
Martin Kroeker [Sun, 8 Nov 2020 16:39:05 +0000 (17:39 +0100)]
Merge pull request #2980 from martin-frbg/fixgetarch
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker [Sun, 8 Nov 2020 12:15:40 +0000 (13:15 +0100)]
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker [Sun, 8 Nov 2020 09:19:34 +0000 (10:19 +0100)]
Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
Martin Kroeker [Sun, 8 Nov 2020 09:19:17 +0000 (10:19 +0100)]
Merge pull request #2978 from martin-frbg/fixdynfeatures
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 23:12:55 +0000 (00:12 +0100)]
Stay compatible with old gmake that did not support undefine
Martin Kroeker [Sat, 7 Nov 2020 23:01:36 +0000 (00:01 +0100)]
Update Makefile.system
Martin Kroeker [Sat, 7 Nov 2020 22:37:21 +0000 (23:37 +0100)]
Update Makefile.system
Rajalakshmi Srinivasaraghavan [Sat, 7 Nov 2020 21:21:58 +0000 (15:21 -0600)]
Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 7 Nov 2020 19:39:56 +0000 (20:39 +0100)]
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 19:37:03 +0000 (20:37 +0100)]
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode
Martin Kroeker [Sat, 7 Nov 2020 19:30:15 +0000 (20:30 +0100)]
Fix propagation of cpu properties to compiler options
Martin Kroeker [Sat, 7 Nov 2020 19:27:42 +0000 (20:27 +0100)]
Remove extraneous quotes that caused a cmake policy warning
Martin Kroeker [Sat, 7 Nov 2020 19:26:12 +0000 (20:26 +0100)]
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
Martin Kroeker [Sat, 7 Nov 2020 19:22:41 +0000 (20:22 +0100)]
Merge pull request #110 from xianyi/develop
rebase
Martin Kroeker [Sat, 7 Nov 2020 13:41:34 +0000 (14:41 +0100)]
Merge pull request #2977 from martin-frbg/issue2976
Fix macro name used in ifdef for POWERPC/PGI
Martin Kroeker [Sat, 7 Nov 2020 11:17:49 +0000 (12:17 +0100)]
Fix macro name used in ifdef
Gengxin Xie [Thu, 5 Nov 2020 08:25:17 +0000 (16:25 +0800)]
fix typo
Gengxin Xie [Sun, 27 Sep 2020 02:38:19 +0000 (10:38 +0800)]
Improve the performance of rot by using AVX512 and AVX2 intrinsics
Martin Kroeker [Wed, 4 Nov 2020 15:02:46 +0000 (16:02 +0100)]
Merge pull request #2966 from martin-frbg/issue2964
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
Martin Kroeker [Tue, 3 Nov 2020 22:47:04 +0000 (23:47 +0100)]
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target
Martin Kroeker [Tue, 3 Nov 2020 22:45:49 +0000 (23:45 +0100)]
Add -msse3 where needed for DYNAMIC_ARCH builds
Martin Kroeker [Mon, 2 Nov 2020 22:17:46 +0000 (23:17 +0100)]
Fix target test
Martin Kroeker [Mon, 2 Nov 2020 21:43:50 +0000 (22:43 +0100)]
Add files via upload
Martin Kroeker [Mon, 2 Nov 2020 17:54:36 +0000 (18:54 +0100)]
Merge pull request #2967 from RajalakshmiSR/dgemm88
POWER10: Change dgemm unroll factors
Aisha Tammy [Mon, 2 Nov 2020 13:04:53 +0000 (13:04 +0000)]
allow setting soname without suffix or prefix
Allows creating a library with a different
SONAME without the need to add suffixes to symbols.
Backwards compatible and should have no effect
on the workflow of previous users.
Useful for allowing an INTERFACE64 library alongside
the standard library without file conflicts.
Martin Kroeker [Sun, 1 Nov 2020 21:25:43 +0000 (22:25 +0100)]
typo fix
Martin Kroeker [Sun, 1 Nov 2020 21:11:48 +0000 (22:11 +0100)]
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target
Martin Kroeker [Sun, 1 Nov 2020 20:58:26 +0000 (21:58 +0100)]
Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
NO_EXPRECISION is disabled for the GENERIC_TARGET already, so prevent mixing with code parts that use a different float size by default
Martin Kroeker [Sun, 1 Nov 2020 13:24:40 +0000 (14:24 +0100)]
Merge pull request #2962 from brada4/develop
add openbsd 68+ gfortran name
Martin Kroeker [Sun, 1 Nov 2020 08:14:54 +0000 (09:14 +0100)]
Merge pull request #2963 from martin-frbg/issue2959
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
Rajalakshmi Srinivasaraghavan [Sat, 31 Oct 2020 23:28:57 +0000 (18:28 -0500)]
POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
Martin Kroeker [Sat, 31 Oct 2020 23:00:43 +0000 (00:00 +0100)]
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
User User-User [Sat, 31 Oct 2020 22:38:08 +0000 (00:38 +0200)]
add openbsd gfortran
Martin Kroeker [Sat, 31 Oct 2020 21:33:52 +0000 (22:33 +0100)]
Merge pull request #109 from xianyi/develop
rebase
Martin Kroeker [Sat, 31 Oct 2020 19:24:21 +0000 (20:24 +0100)]
Merge pull request #2960 from thrasibule/avx2_detection
fix avx2 detection
Guillaume Horel [Sat, 31 Oct 2020 14:00:48 +0000 (10:00 -0400)]
fix avx2 detection
reword commits to make it clearer
Martin Kroeker [Fri, 30 Oct 2020 07:54:10 +0000 (08:54 +0100)]
Merge pull request #2956 from RajalakshmiSR/caxpy_p10
Optimize caxpy for POWER10
Rajalakshmi Srinivasaraghavan [Thu, 29 Oct 2020 19:57:51 +0000 (14:57 -0500)]
Optimize caxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Thu, 29 Oct 2020 19:28:37 +0000 (20:28 +0100)]
Merge pull request #2940 from Qiyu8/optimize-benchmark
Refactor the performance measurement system
Martin Kroeker [Thu, 29 Oct 2020 08:22:33 +0000 (09:22 +0100)]
Merge pull request #2954 from Guobing-Chen/BF16_gemv_support
Implementation of BF16 based gemv
Martin Kroeker [Thu, 29 Oct 2020 08:22:07 +0000 (09:22 +0100)]
Merge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue
Fix cooperlake compile issue
Chen, Guobing [Wed, 28 Oct 2020 19:37:51 +0000 (03:37 +0800)]
Fix cooperlake compile issue
Add a macro required in Makefile.x86_64 that went missing in the recent
cleanup, causing build failures on the cooperlake platform.
Chen, Guobing [Wed, 28 Oct 2020 00:49:12 +0000 (08:49 +0800)]
Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
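As a rough illustration of what a bfloat16-based gemv computes, the sketch below converts between float32 and bfloat16 bit patterns and evaluates y := alpha*A*x + beta*y. It makes several assumptions: the helper names and the argument list are hypothetical and do not reproduce the actual sbgemv interface, the conversion truncates instead of rounding, A is stored row-major, and accumulation uses Python floats (double precision) rather than single precision.

```python
import struct


def float_to_bf16(value):
    """Return the upper 16 bits of the float32 encoding (truncation, no rounding)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return bits >> 16


def bf16_to_float(pattern):
    """Widen a 16-bit bfloat16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (pattern & 0xFFFF) << 16))[0]


def bf16_gemv_ref(m, n, alpha, a, x, beta, y):
    """Hypothetical reference: a is a row-major m*n list of bf16 patterns,
    x is a list of n bf16 patterns, y is a list of m floats."""
    result = []
    for i in range(m):
        acc = 0.0
        for j in range(n):
            acc += bf16_to_float(a[i * n + j]) * bf16_to_float(x[j])
        result.append(alpha * acc + beta * y[i])
    return result


# Small integers are exactly representable in bfloat16.
a_bf16 = [float_to_bf16(v) for v in (1.0, 2.0, 3.0, 4.0)]
x_bf16 = [float_to_bf16(v) for v in (1.0, 1.0)]
y_out = bf16_gemv_ref(2, 2, 0.5, a_bf16, x_bf16, 2.0, [1.0, 1.0])
```

Because bfloat16 keeps the float32 exponent and only shortens the mantissa, widening is a plain 16-bit shift; a production kernel would typically round to nearest when narrowing.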
Martin Kroeker [Wed, 28 Oct 2020 08:38:40 +0000 (09:38 +0100)]
Merge pull request #2939 from thrasibule/Makefile_cleanup
reuse variables defined in Makefile.system
Martin Kroeker [Wed, 28 Oct 2020 08:37:56 +0000 (09:37 +0100)]
Merge pull request #2951 from martin-frbg/cleanup_make
Minor Makefile cleanup
Martin Kroeker [Wed, 28 Oct 2020 08:37:32 +0000 (09:37 +0100)]
Merge pull request #2952 from martin-frbg/issue2931
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
Martin Kroeker [Wed, 28 Oct 2020 08:37:09 +0000 (09:37 +0100)]
Merge pull request #2948 from martin-frbg/issue2947
Expressly enable neon for use with intrinsics if available
Martin Kroeker [Tue, 27 Oct 2020 22:01:19 +0000 (23:01 +0100)]
Output predefined HAVE_ entries to Makefile.conf for ARM with specified TARGET
Martin Kroeker [Tue, 27 Oct 2020 16:51:32 +0000 (17:51 +0100)]
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails
Martin Kroeker [Mon, 26 Oct 2020 23:02:18 +0000 (00:02 +0100)]
Merge pull request #2950 from RajalakshmiSR/saxpy
Optimize saxpy for POWER10
Martin Kroeker [Mon, 26 Oct 2020 20:37:04 +0000 (21:37 +0100)]
Remove debug printout of object list
Martin Kroeker [Mon, 26 Oct 2020 20:35:40 +0000 (21:35 +0100)]
Remove spurious expr in flang version check
Rajalakshmi Srinivasaraghavan [Mon, 26 Oct 2020 18:24:59 +0000 (13:24 -0500)]
Optimize saxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Qiyu8 [Mon, 26 Oct 2020 02:25:05 +0000 (10:25 +0800)]
Refactoring remaining benchmark cases.
Martin Kroeker [Sun, 25 Oct 2020 23:43:44 +0000 (00:43 +0100)]
Merge pull request #2946 from martin-frbg/issue2945
Move definitions that are neither needed nor supported on Solaris
Martin Kroeker [Sun, 25 Oct 2020 23:21:56 +0000 (00:21 +0100)]
Expressly enable neon for use with intrinsics if available
Martin Kroeker [Sun, 25 Oct 2020 11:01:50 +0000 (12:01 +0100)]
Move definitions that are neither needed nor supported on SUNOS
Martin Kroeker [Sat, 24 Oct 2020 21:29:05 +0000 (23:29 +0200)]
Update version to 0.3.12.dev
Martin Kroeker [Sat, 24 Oct 2020 21:28:29 +0000 (23:28 +0200)]
Update version to 0.3.12.dev