Martin Kroeker [Fri, 26 Feb 2021 08:06:25 +0000 (09:06 +0100)]
Merge pull request #15 from xianyi/develop
rebase
Martin Kroeker [Fri, 26 Feb 2021 03:18:33 +0000 (04:18 +0100)]
Merge pull request #3119 from xianyi/revert-3118-issue3018-2
Revert "Fix undefined CC in f_check (again)"
Martin Kroeker [Fri, 26 Feb 2021 03:18:04 +0000 (04:18 +0100)]
Revert "Fix undefined CC in f_check (again)"
Martin Kroeker [Thu, 25 Feb 2021 12:48:41 +0000 (13:48 +0100)]
Merge pull request #3118 from martin-frbg/issue3018-2
Fix undefined CC in f_check (again)
Martin Kroeker [Thu, 25 Feb 2021 12:47:34 +0000 (13:47 +0100)]
fix undefined CC again
Martin Kroeker [Thu, 25 Feb 2021 12:45:27 +0000 (13:45 +0100)]
Merge pull request #14 from xianyi/develop
rebase
Martin Kroeker [Wed, 24 Feb 2021 17:39:28 +0000 (18:39 +0100)]
Merge pull request #3117 from haampie/fix-perl
use /usr/bin/env perl
Martin Kroeker [Wed, 24 Feb 2021 17:38:25 +0000 (18:38 +0100)]
Merge pull request #3114 from martin-frbg/issue3113
Fix dll_callback and p_process_term signatures for USE_TLS on Windows x64
Martin Kroeker [Wed, 24 Feb 2021 17:37:36 +0000 (18:37 +0100)]
Merge pull request #3115 from martin-frbg/issue2532
Replace unoptimized OMATCOPY_RT with 4x4 blocked version
Harmen Stoppels [Wed, 24 Feb 2021 13:07:20 +0000 (14:07 +0100)]
use /usr/bin/env perl
Martin Kroeker [Wed, 24 Feb 2021 08:34:14 +0000 (09:34 +0100)]
Update omatcopy_rt.c
Martin Kroeker [Wed, 24 Feb 2021 08:13:12 +0000 (09:13 +0100)]
Update omatcopy_rt.c
Martin Kroeker [Wed, 24 Feb 2021 08:03:41 +0000 (09:03 +0100)]
Enable optimized S/D OMATCOPY_RT
Martin Kroeker [Wed, 24 Feb 2021 08:00:54 +0000 (09:00 +0100)]
Add optimized omatcopy_rt
Martin Kroeker [Tue, 23 Feb 2021 12:14:35 +0000 (13:14 +0100)]
Typo fix
Martin Kroeker [Mon, 22 Feb 2021 20:35:42 +0000 (21:35 +0100)]
Replace naive omatcopy_rt with 4x4 blocked implementation
as suggested by MigMuc in issue 2532
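The idea behind a 4x4 blocked OMATCOPY_RT (B = alpha * A^T for row-major matrices) can be sketched as below. This is only an illustration of blocking with scalar tails. The function name `omatcopy_rt_blocked`, the use of `double` and `size_t`, and the loop layout are assumptions, not the code merged into OpenBLAS. It assumes A is `rows x cols` with leading dimension `lda` and B is `cols x rows` with leading dimension `ldb`.

```c
#include <stddef.h>

/* Sketch (hypothetical name): B = alpha * A^T, both row-major.
 * A is rows x cols (leading dimension lda); B is cols x rows (leading dimension ldb).
 * Full 4x4 tiles are transposed first, then leftover columns and rows
 * are handled by plain scalar loops. */
void omatcopy_rt_blocked(size_t rows, size_t cols, double alpha,
                         const double *a, size_t lda,
                         double *b, size_t ldb)
{
    size_t rows4 = rows & ~(size_t)3;   /* rows rounded down to a multiple of 4 */
    size_t cols4 = cols & ~(size_t)3;   /* cols rounded down to a multiple of 4 */
    size_t i, j, ii, jj;

    for (i = 0; i < rows4; i += 4) {
        /* Transpose one 4x4 tile at a time within this 4-row strip. */
        for (j = 0; j < cols4; j += 4)
            for (ii = 0; ii < 4; ii++)
                for (jj = 0; jj < 4; jj++)
                    b[(j + jj) * ldb + (i + ii)] = alpha * a[(i + ii) * lda + (j + jj)];
        /* Leftover columns of this 4-row strip. */
        for (j = cols4; j < cols; j++)
            for (ii = 0; ii < 4; ii++)
                b[j * ldb + (i + ii)] = alpha * a[(i + ii) * lda + j];
    }
    /* Leftover rows (fewer than 4). */
    for (i = rows4; i < rows; i++)
        for (j = 0; j < cols; j++)
            b[j * ldb + i] = alpha * a[i * lda + j];
}
```

Blocking keeps each tile's reads and writes within a few cache lines, which is what makes it faster than the naive element-by-element loop for large matrices; a production kernel would typically also unroll or vectorize the inner tile.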
Martin Kroeker [Mon, 22 Feb 2021 18:40:36 +0000 (19:40 +0100)]
Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64
Martin Kroeker [Mon, 22 Feb 2021 18:31:41 +0000 (19:31 +0100)]
Merge pull request #13 from xianyi/develop
rebase
Martin Kroeker [Fri, 19 Feb 2021 08:57:18 +0000 (09:57 +0100)]
Merge pull request #3111 from hawkinsp/forkrace
Fix race in blas_thread_shutdown.
Peter Hawkins [Thu, 18 Feb 2021 18:46:50 +0000 (13:46 -0500)]
Fix race in blas_thread_shutdown.
blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.
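The fix described above amounts to reading and clearing the availability flag only while holding the lock, so that concurrent callers cannot both see it set. A minimal sketch with POSIX threads follows. The variables `server_lock` and `blas_server_avail` are named after the commit message, but their definitions here, the function `blas_thread_shutdown_sketch`, and the `shutdown_count` counter are hypothetical stand-ins, not OpenBLAS's actual blas_server.c code.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the server state named in the commit message. */
static pthread_mutex_t server_lock = PTHREAD_MUTEX_INITIALIZER;
static int blas_server_avail = 1;   /* 1 while worker threads are running */
static int shutdown_count = 0;      /* how many times teardown actually ran */

int blas_thread_shutdown_sketch(void)
{
    pthread_mutex_lock(&server_lock);
    /* Check and clear the flag under the lock: only the first caller tears down. */
    if (blas_server_avail) {
        /* ... signal and join worker threads, release their buffers ... */
        shutdown_count++;
        blas_server_avail = 0;
    }
    pthread_mutex_unlock(&server_lock);
    return 0;
}
```

Any later or concurrent call finds `blas_server_avail == 0` and returns without touching already-freed state, which is the double-shutdown scenario the commit message describes.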
Martin Kroeker [Thu, 18 Feb 2021 14:45:25 +0000 (15:45 +0100)]
Merge pull request #3110 from martin-frbg/issue3108
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
Martin Kroeker [Thu, 18 Feb 2021 10:14:05 +0000 (11:14 +0100)]
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
Martin Kroeker [Fri, 12 Feb 2021 12:29:53 +0000 (13:29 +0100)]
Merge pull request #3105 from martin-frbg/tigerlake
Recognize Intel Tiger Lake CPUID as SkylakeX
Martin Kroeker [Fri, 12 Feb 2021 12:29:23 +0000 (13:29 +0100)]
Merge pull request #3106 from RajalakshmiSR/ppcbe
Fix build issue on POWER8 with DYNAMIC_ARCH
Rajalakshmi Srinivasaraghavan [Fri, 12 Feb 2021 03:28:03 +0000 (21:28 -0600)]
Fix build issue on POWER8 with DYNAMIC_ARCH
Running make DYNAMIC_ARCH=1 on POWER8 BE with gcc 10.2 gives
the following error, caused by the difference in UNROLL_M/N:
'No rule to make target 'dgemm_incopy_POWER10.o', needed by kernel'
Martin Kroeker [Thu, 11 Feb 2021 19:17:11 +0000 (20:17 +0100)]
Recognize Intel Tiger Lake as SkylakeX
Martin Kroeker [Thu, 11 Feb 2021 19:16:27 +0000 (20:16 +0100)]
Recognize Intel Tiger Lake as SkylakeX
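Recognizing a CPU such as Tiger Lake usually comes down to decoding the family and model fields of CPUID leaf 1 (EAX) and mapping known models to an existing kernel target. The sketch below shows that decoding. The enum, the function names, and the choice of models 0x8C and 0x8D (the Tiger Lake model numbers used in the Linux kernel's intel-family.h) are assumptions for illustration; the log does not say exactly which model values OpenBLAS checks.

```c
#include <stdint.h>

/* Hypothetical labels for this sketch; OpenBLAS uses its own core enumeration. */
enum core_sketch { CORE_UNKNOWN_SKETCH, CORE_SKYLAKEX_SKETCH };

/* Display family from CPUID leaf 1 EAX (extended family added only for family 0xF). */
static unsigned display_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;
    if (family == 0xF)
        family += (eax >> 20) & 0xFF;
    return family;
}

/* Display model: the extended model nibble is prepended for families 0x6 and 0xF. */
static unsigned display_model(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;
    unsigned model = (eax >> 4) & 0xF;
    if (family == 0x6 || family == 0xF)
        model |= ((eax >> 16) & 0xF) << 4;
    return model;
}

enum core_sketch classify_intel(uint32_t eax)
{
    if (display_family(eax) == 6) {
        switch (display_model(eax)) {
        case 0x8C:   /* Tiger Lake (TIGERLAKE_L in Linux naming) */
        case 0x8D:   /* Tiger Lake (TIGERLAKE in Linux naming) */
            return CORE_SKYLAKEX_SKETCH;   /* reuse the AVX-512 SkylakeX kernels */
        }
    }
    return CORE_UNKNOWN_SKETCH;
}
```

For example, an EAX signature of 0x806C1 decodes to family 6, model 0x8C, so it would be treated as SkylakeX by this sketch.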
Martin Kroeker [Thu, 11 Feb 2021 14:42:47 +0000 (15:42 +0100)]
Merge pull request #3104 from martin-frbg/issue3103
Enable optimized Haswell/AVX2 kernels for sasum/dasum and srot/drot on Ryzen
Martin Kroeker [Thu, 11 Feb 2021 14:42:18 +0000 (15:42 +0100)]
Merge pull request #3101 from jake-arkinstall/issue-3100
Addressed issue #3100 - removing an unnecessary write to the include directory
Martin Kroeker [Thu, 11 Feb 2021 08:26:15 +0000 (09:26 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:25:36 +0000 (09:25 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:24:51 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:24:16 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:23:05 +0000 (09:23 +0100)]
Enable optimized srot/drot kernels from Haswell
Martin Kroeker [Thu, 11 Feb 2021 07:56:46 +0000 (08:56 +0100)]
Merge pull request #3102 from martin-frbg/issue3099
Strip pkgversion info from compiler version string before comparing
Martin Kroeker [Wed, 10 Feb 2021 13:22:59 +0000 (14:22 +0100)]
Strip parenthesized (pkgversion) data from GCC version string to avoid misinterpretation
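GCC's version banner often includes a vendor "pkgversion" string in parentheses, and any version-like numbers inside it can be mistaken for the compiler version. The real fix lives in OpenBLAS's build scripts; the C sketch below only illustrates the idea of dropping parenthesized groups before parsing the first `major.minor` pair. The function names are hypothetical, and the second banner in the test is a made-up example chosen to show the failure mode.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst (capacity dstlen), dropping every "( ... )" group. */
static void strip_parenthesized(const char *src, char *dst, size_t dstlen)
{
    size_t depth = 0, n = 0;
    if (dstlen == 0)
        return;
    for (; *src && n + 1 < dstlen; src++) {
        if (*src == '(') { depth++; continue; }
        if (*src == ')') { if (depth) depth--; continue; }
        if (depth == 0)
            dst[n++] = *src;
    }
    dst[n] = '\0';
}

/* Parse the first "major.minor" left after stripping; returns 0 on success, -1 otherwise. */
int gcc_version_from_banner(const char *banner, int *major, int *minor)
{
    char buf[256];
    strip_parenthesized(banner, buf, sizeof buf);
    for (const char *p = buf; *p; p++) {
        /* Only start a match at a digit, so "-10.2" is not read as negative. */
        if (isdigit((unsigned char)*p) && sscanf(p, "%d.%d", major, minor) == 2)
            return 0;
    }
    return -1;
}
```

Without the stripping step, a banner like `cc (custom build 1.2) 10.2.0` would be read as version 1.2 instead of 10.2.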
Martin Kroeker [Wed, 10 Feb 2021 13:17:24 +0000 (14:17 +0100)]
Merge pull request #12 from xianyi/develop
rebase
Jake Arkinstall [Wed, 10 Feb 2021 12:11:17 +0000 (12:11 +0000)]
Addressed issue #3100, removing an unnecessary write to the include directory
Martin Kroeker [Tue, 2 Feb 2021 12:36:17 +0000 (13:36 +0100)]
Merge pull request #3094 from xoviat/patch-1
build openmp on appveyor
Martin Kroeker [Tue, 2 Feb 2021 12:33:15 +0000 (13:33 +0100)]
Merge pull request #3096 from martin-frbg/fixclangcmake
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
Martin Kroeker [Tue, 2 Feb 2021 09:53:46 +0000 (10:53 +0100)]
fix case in compiler name check
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
Martin Kroeker [Mon, 1 Feb 2021 20:02:53 +0000 (21:02 +0100)]
remove spurious lines (probably editor malfunction)
Martin Kroeker [Mon, 1 Feb 2021 19:18:53 +0000 (20:18 +0100)]
handle AppleClang in Cooperlake support condition
Martin Kroeker [Mon, 1 Feb 2021 18:45:25 +0000 (19:45 +0100)]
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion)
xoviat [Sun, 31 Jan 2021 03:28:12 +0000 (21:28 -0600)]
appveyor: cleanup and add openmp run
Martin Kroeker [Sun, 31 Jan 2021 17:02:41 +0000 (18:02 +0100)]
Merge pull request #3073 from xoviat/embedded
add embedded option
Martin Kroeker [Sat, 30 Jan 2021 21:21:28 +0000 (22:21 +0100)]
Merge pull request #3093 from martin-frbg/fix3064
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
Martin Kroeker [Sat, 30 Jan 2021 15:46:25 +0000 (16:46 +0100)]
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
Martin Kroeker [Sat, 30 Jan 2021 15:23:37 +0000 (16:23 +0100)]
Merge pull request #3092 from RajalakshmiSR/cscal_p10
Optimize cscal function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 29 Jan 2021 19:51:43 +0000 (13:51 -0600)]
Optimize cscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Fri, 29 Jan 2021 12:37:23 +0000 (13:37 +0100)]
Merge pull request #3091 from martin-frbg/lapack477-2
Fix calculation of the non-exceptional shift values in LAPACK complex QZ
Martin Kroeker [Fri, 29 Jan 2021 09:45:36 +0000 (10:45 +0100)]
fix data type
Martin Kroeker [Fri, 29 Jan 2021 08:56:12 +0000 (09:56 +0100)]
fix calculation of non-exceptional shift (from Reference-LAPACK PR 477)
Martin Kroeker [Fri, 29 Jan 2021 08:52:21 +0000 (09:52 +0100)]
Merge pull request #11 from xianyi/develop
rebase
Martin Kroeker [Wed, 27 Jan 2021 18:11:55 +0000 (19:11 +0100)]
Merge pull request #3087 from martin-frbg/lapack477
Apply Reference-LAPACK PR 477 for convergence problems in CHGEQZ/ZHGEQZ
Martin Kroeker [Wed, 27 Jan 2021 12:41:45 +0000 (13:41 +0100)]
Add exceptional shift to fix rare convergence problems
Martin Kroeker [Wed, 27 Jan 2021 12:39:26 +0000 (13:39 +0100)]
Merge pull request #10 from xianyi/develop
rebase
Martin Kroeker [Wed, 27 Jan 2021 12:25:45 +0000 (13:25 +0100)]
Merge pull request #3076 from martin-frbg/dyn-thunderx
Add CI job for ARM64/gcc10 DYNAMIC_ARCH
Martin Kroeker [Tue, 26 Jan 2021 19:11:42 +0000 (20:11 +0100)]
Merge pull request #3085 from alexhenrie/memory_alloc
Fix null pointer check in blas_memory_alloc
Martin Kroeker [Tue, 26 Jan 2021 14:13:35 +0000 (15:13 +0100)]
Merge pull request #3083 from martin-frbg/develop
Add DYNAMIC_LIST support for ARM64
Martin Kroeker [Mon, 25 Jan 2021 18:02:21 +0000 (19:02 +0100)]
Remove the VORTEX support bits again for now
Martin Kroeker [Mon, 25 Jan 2021 12:13:20 +0000 (13:13 +0100)]
Add DYNAMIC_LIST support for ARM64
Alex Henrie [Mon, 25 Jan 2021 05:20:44 +0000 (22:20 -0700)]
Fix null pointer check in blas_memory_alloc
Martin Kroeker [Sun, 24 Jan 2021 22:18:52 +0000 (23:18 +0100)]
Add DYNAMIC_LIST support for ARM64
Martin Kroeker [Sun, 24 Jan 2021 22:18:01 +0000 (23:18 +0100)]
Add DYNAMIC_LIST option for ARM64
Martin Kroeker [Sun, 24 Jan 2021 22:14:45 +0000 (23:14 +0100)]
Merge pull request #9 from xianyi/develop
rebase
Martin Kroeker [Sun, 24 Jan 2021 18:03:40 +0000 (19:03 +0100)]
Merge pull request #3082 from RajalakshmiSR/scalp10
Optimize s/dscal function for POWER10
Rajalakshmi Srinivasaraghavan [Sun, 24 Jan 2021 13:48:28 +0000 (07:48 -0600)]
Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
xoviat [Sun, 24 Jan 2021 04:12:17 +0000 (22:12 -0600)]
add functions for embedded
Martin Kroeker [Sat, 23 Jan 2021 18:08:05 +0000 (19:08 +0100)]
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
Martin Kroeker [Sat, 23 Jan 2021 18:06:29 +0000 (19:06 +0100)]
Merge pull request #3068 from alexhenrie/scan-build
scan-build fixes
Martin Kroeker [Fri, 22 Jan 2021 07:26:00 +0000 (08:26 +0100)]
Merge pull request #3079 from RajalakshmiSR/rotp10
Optimize s/drot function for POWER10
Rajalakshmi Srinivasaraghavan [Thu, 21 Jan 2021 19:24:45 +0000 (13:24 -0600)]
Optimize s/drot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Thu, 21 Jan 2021 07:51:30 +0000 (08:51 +0100)]
Merge pull request #3075 from martin-frbg/issue3074
Fix DYNAMIC_ARCH compilation on POWER with gcc <11
Martin Kroeker [Wed, 20 Jan 2021 20:34:36 +0000 (21:34 +0100)]
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well
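Taken together with the earlier "Require gcc 11 for builtin_cpu_is(power10)" commit, the resulting rule appears to be "GCC 11 or later, or GCC 10.2 or later". The sketch below expresses that rule as a runtime helper and as a preprocessor test. Both `gcc_has_power10_cpu_is` and `HAVE_P10_CPU_IS_SKETCH` are hypothetical names, and the exact condition in OpenBLAS's sources is not shown in this log.

```c
/* Hypothetical helper: 1 if a GCC of this version is assumed to accept
 * __builtin_cpu_is("power10") (GCC 11, or the backport in 10.2). */
int gcc_has_power10_cpu_is(int major, int minor)
{
    return major > 10 || (major == 10 && minor >= 2);
}

/* The same rule applied at compile time to the compiler in use.
 * clang also defines __GNUC__, so it is excluded explicitly here. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 10 || (__GNUC__ == 10 && __GNUC_MINOR__ >= 2))
#define HAVE_P10_CPU_IS_SKETCH 1
#else
#define HAVE_P10_CPU_IS_SKETCH 0
#endif
```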
Martin Kroeker [Wed, 20 Jan 2021 19:21:27 +0000 (20:21 +0100)]
Update .drone.yml
Martin Kroeker [Wed, 20 Jan 2021 17:30:05 +0000 (18:30 +0100)]
Add gcc10/arm64 DYNAMIC_ARCH build
Martin Kroeker [Wed, 20 Jan 2021 14:41:04 +0000 (15:41 +0100)]
Require gcc 11 for builtin_cpu_is(power10)
fixes #3074
Martin Kroeker [Wed, 20 Jan 2021 14:38:30 +0000 (15:38 +0100)]
Merge pull request #8 from xianyi/develop
rebase
xoviat [Tue, 19 Jan 2021 14:57:44 +0000 (08:57 -0600)]
add cortex-m platform
Martin Kroeker [Sat, 16 Jan 2021 14:47:34 +0000 (15:47 +0100)]
Merge pull request #3070 from RajalakshmiSR/cdot
Optimize cdot function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 15 Jan 2021 19:40:34 +0000 (13:40 -0600)]
Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Alex Henrie [Fri, 15 Jan 2021 02:40:32 +0000 (19:40 -0700)]
Remove dead assignment to dflag in rotmg functions
Alex Henrie [Fri, 15 Jan 2021 02:40:31 +0000 (19:40 -0700)]
Don't define the mode variable when not needed in gemm functions
Alex Henrie [Fri, 15 Jan 2021 02:40:31 +0000 (19:40 -0700)]
Fix uninitialized argument value in dasum_k
Martin Kroeker [Thu, 14 Jan 2021 20:35:19 +0000 (21:35 +0100)]
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake
Fix building "generic" TRMM kernel with CMake
Martin Kroeker [Thu, 14 Jan 2021 15:47:59 +0000 (16:47 +0100)]
Merge pull request #3064 from martin-frbg/issue3063
Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
Martin Kroeker [Thu, 14 Jan 2021 15:00:38 +0000 (16:00 +0100)]
Merge pull request #3066 from martin-frbg/buffsizefix
Fix compile-time setting of the GEMM buffer size for gmake builds
Martin Kroeker [Thu, 14 Jan 2021 14:59:53 +0000 (15:59 +0100)]
Merge pull request #3062 from austinpagan/GemmPreferedSize3
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…
Martin Kroeker [Thu, 14 Jan 2021 14:59:21 +0000 (15:59 +0100)]
Merge pull request #3061 from martin-frbg/arm64-pgi
Support NVIDIA HPC SDK on ARM64
Martin Kroeker [Thu, 14 Jan 2021 14:56:25 +0000 (15:56 +0100)]
Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
Albert Ziegenhagel [Thu, 14 Jan 2021 09:00:49 +0000 (10:00 +0100)]
Fix building "generic" TRMM kernel with CMake
The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but gets compared to an all uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to all uppercase before comparing its value, so that case mismatches cannot cause this problem again.
Martin Kroeker [Wed, 13 Jan 2021 21:36:04 +0000 (22:36 +0100)]
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor
Martin Kroeker [Wed, 13 Jan 2021 11:30:26 +0000 (12:30 +0100)]
Workaround for cmake having its own C_COMPILER variable
Martin Kroeker [Wed, 13 Jan 2021 08:46:53 +0000 (09:46 +0100)]
try to work around gcc update problems
Martin Kroeker [Tue, 12 Jan 2021 23:30:27 +0000 (00:30 +0100)]
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG
Martin Kroeker [Tue, 12 Jan 2021 23:29:38 +0000 (00:29 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well
Martin Kroeker [Tue, 12 Jan 2021 23:28:43 +0000 (00:28 +0100)]
restore Makefile after accidental overwrite
Martin Kroeker [Tue, 12 Jan 2021 23:27:42 +0000 (00:27 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well
Martin Kroeker [Tue, 12 Jan 2021 22:22:00 +0000 (23:22 +0100)]
Add CBLAS interfaces for csrot and zdrot
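For reference, csrot and zdrot apply a plane rotation with real cosine `c` and real sine `s` to a pair of complex vectors. The sketch below shows that computation for unit stride only, using C99 complex types. `zdrot_sketch` is a hypothetical name; it does not reproduce the `cblas_zdrot` prototype declared in OpenBLAS's cblas.h, and it ignores the increment arguments the real interface takes.

```c
#include <complex.h>
#include <stddef.h>

/* Unit-stride sketch of the ?drot-style rotation on complex vectors:
 *   x[i] <- c*x[i] + s*y[i]
 *   y[i] <- c*y[i] - s*x[i]
 * with real c and s (the old x[i] is used on both lines). */
void zdrot_sketch(size_t n, double complex *x, double complex *y, double c, double s)
{
    for (size_t i = 0; i < n; i++) {
        double complex xi = x[i], yi = y[i];
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}
```

With `c = 0` and `s = 1` the rotation swaps the vectors and negates the new `y`; for example, x = 1+2i and y = 3-1i become x = 3-1i and y = -1-2i.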