Martin Kroeker [Thu, 11 Mar 2021 22:03:58 +0000 (23:03 +0100)]
Fix syntax
Martin Kroeker [Thu, 11 Mar 2021 10:53:51 +0000 (11:53 +0100)]
Support compilation with nagfor
Martin Kroeker [Thu, 11 Mar 2021 10:52:29 +0000 (11:52 +0100)]
Support compilation with nagfor
Martin Kroeker [Thu, 11 Mar 2021 10:51:09 +0000 (11:51 +0100)]
Support compilation with the NAG Fortran compiler
Martin Kroeker [Thu, 11 Mar 2021 10:48:37 +0000 (11:48 +0100)]
Merge pull request #16 from xianyi/develop
rebase
Martin Kroeker [Thu, 11 Mar 2021 06:18:05 +0000 (07:18 +0100)]
Merge pull request #3137 from RajalakshmiSR/zscal_p10
Optimize zscal function for POWER10
Rajalakshmi Srinivasaraghavan [Wed, 10 Mar 2021 23:15:33 +0000 (17:15 -0600)]
Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 6 Mar 2021 18:15:53 +0000 (19:15 +0100)]
Merge pull request #3130 from martin-frbg/issue3128
Replace spurious AVX512 requirement in the Haswell srot microkernel with an AVX2/FMA3 guard
Martin Kroeker [Sat, 6 Mar 2021 13:35:49 +0000 (14:35 +0100)]
Remove spurious AVX512 requirement and add AVX2/FMA3 guard
Martin Kroeker [Sat, 6 Mar 2021 08:13:59 +0000 (09:13 +0100)]
Merge pull request #3129 from RajalakshmiSR/asum_p10
Optimize s/dasum function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 5 Mar 2021 22:22:36 +0000 (16:22 -0600)]
Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Tue, 2 Mar 2021 20:27:21 +0000 (21:27 +0100)]
Merge pull request #3126 from martin-frbg/m1bench
Support timing Apple M1 in the benchmarks
Martin Kroeker [Tue, 2 Mar 2021 16:50:55 +0000 (17:50 +0100)]
Support timing Apple M1
Martin Kroeker [Tue, 2 Mar 2021 08:58:40 +0000 (09:58 +0100)]
Merge pull request #3125 from martin-frbg/issue3123
Fix AMD AOCC compiler detection
Martin Kroeker [Mon, 1 Mar 2021 20:00:10 +0000 (21:00 +0100)]
Fix AMD AOCC compiler detection
Martin Kroeker [Sun, 28 Feb 2021 21:13:09 +0000 (22:13 +0100)]
Merge pull request #3122 from martin-frbg/xeigtstz
Fix unusual stack size requirements of the LAPACK EIG tests (from Reference-LAPACK PR 335)
Martin Kroeker [Sun, 28 Feb 2021 17:57:05 +0000 (18:57 +0100)]
Adjust build rules for ?chkee.F
Martin Kroeker [Sun, 28 Feb 2021 17:53:20 +0000 (18:53 +0100)]
Adjust build rules for ?chkee.F
Martin Kroeker [Sun, 28 Feb 2021 17:51:03 +0000 (18:51 +0100)]
Add rewritten cchkee.F from Reference-LAPACK PR335
Martin Kroeker [Sun, 28 Feb 2021 17:50:26 +0000 (18:50 +0100)]
Add rewritten dchkee.F from Reference-LAPACK PR335
Martin Kroeker [Sun, 28 Feb 2021 17:49:50 +0000 (18:49 +0100)]
Add rewritten schkee.F from Reference-LAPACK PR335
Martin Kroeker [Sun, 28 Feb 2021 17:49:10 +0000 (18:49 +0100)]
Add rewritten zchkee.F from Reference-LAPACK PR335
Martin Kroeker [Sun, 28 Feb 2021 17:47:06 +0000 (18:47 +0100)]
Delete zchkee.f
Martin Kroeker [Sun, 28 Feb 2021 17:46:52 +0000 (18:46 +0100)]
Delete schkee.f
Martin Kroeker [Sun, 28 Feb 2021 17:46:38 +0000 (18:46 +0100)]
Delete dchkee.f
Martin Kroeker [Sun, 28 Feb 2021 17:46:08 +0000 (18:46 +0100)]
Delete cchkee.f
Martin Kroeker [Sat, 27 Feb 2021 18:15:49 +0000 (19:15 +0100)]
Merge pull request #3121 from RajalakshmiSR/mmarename
POWER10: Rename mma builtins
Rajalakshmi Srinivasaraghavan [Sat, 27 Feb 2021 02:56:34 +0000 (20:56 -0600)]
POWER10: Rename mma builtins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. This patch makes the
corresponding changes in the dgemm kernel. It also changes the
inputs to those builtins to avoid potential typecasting issues.
Reference gcc commit id:
77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
Martin Kroeker [Fri, 26 Feb 2021 10:50:47 +0000 (11:50 +0100)]
Merge pull request #3120 from martin-frbg/3118-x
Fix use of undefined CC variable in f_check
Martin Kroeker [Fri, 26 Feb 2021 08:09:43 +0000 (09:09 +0100)]
fix undefined CC variable
Martin Kroeker [Fri, 26 Feb 2021 08:06:25 +0000 (09:06 +0100)]
Merge pull request #15 from xianyi/develop
rebase
Martin Kroeker [Fri, 26 Feb 2021 03:18:33 +0000 (04:18 +0100)]
Merge pull request #3119 from xianyi/revert-3118-issue3018-2
Revert "Fix undefined CC in f_check (again)"
Martin Kroeker [Fri, 26 Feb 2021 03:18:04 +0000 (04:18 +0100)]
Revert "Fix undefined CC in f_check (again)"
Martin Kroeker [Thu, 25 Feb 2021 12:48:41 +0000 (13:48 +0100)]
Merge pull request #3118 from martin-frbg/issue3018-2
Fix undefined CC in f_check (again)
Martin Kroeker [Thu, 25 Feb 2021 12:47:34 +0000 (13:47 +0100)]
fix undefined CC again
Martin Kroeker [Thu, 25 Feb 2021 12:45:27 +0000 (13:45 +0100)]
Merge pull request #14 from xianyi/develop
rebase
Martin Kroeker [Wed, 24 Feb 2021 17:39:28 +0000 (18:39 +0100)]
Merge pull request #3117 from haampie/fix-perl
use /usr/bin/env perl
Martin Kroeker [Wed, 24 Feb 2021 17:38:25 +0000 (18:38 +0100)]
Merge pull request #3114 from martin-frbg/issue3113
Fix dll_callback and p_process_term signatures for USE_TLS on Windows x64
Martin Kroeker [Wed, 24 Feb 2021 17:37:36 +0000 (18:37 +0100)]
Merge pull request #3115 from martin-frbg/issue2532
Replace unoptimized OMATCOPY_RT with 4x4 blocked version
Harmen Stoppels [Wed, 24 Feb 2021 13:07:20 +0000 (14:07 +0100)]
use /usr/bin/env perl
Martin Kroeker [Wed, 24 Feb 2021 08:34:14 +0000 (09:34 +0100)]
Update omatcopy_rt.c
Martin Kroeker [Wed, 24 Feb 2021 08:13:12 +0000 (09:13 +0100)]
Update omatcopy_rt.c
Martin Kroeker [Wed, 24 Feb 2021 08:03:41 +0000 (09:03 +0100)]
Enable optimized S/D OMATCOPY_RT
Martin Kroeker [Wed, 24 Feb 2021 08:00:54 +0000 (09:00 +0100)]
Add optimized omatcopy_rt
Martin Kroeker [Tue, 23 Feb 2021 12:14:35 +0000 (13:14 +0100)]
Typo fix
Martin Kroeker [Mon, 22 Feb 2021 20:35:42 +0000 (21:35 +0100)]
Replace naive omatcopy_rt with 4x4 blocked implementation
as suggested by MigMuc in issue 2532
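For illustration only, the blocking idea behind this change can be sketched as below. This is not the code from omatcopy_rt.c: the function name, the argument order, and the use of flat Python lists are assumptions made for the example. It computes B = alpha * A^T for row-major storage, working through the matrix in 4x4 tiles so that reads and writes stay close together in memory.

```python
def omatcopy_rt_blocked(rows, cols, alpha, a, lda, b, ldb, block=4):
    """Write B = alpha * A^T into b and return it.

    A is rows x cols and B is cols x rows, both stored row-major in flat
    lists with leading dimensions lda and ldb (hypothetical interface).
    """
    for i0 in range(0, rows, block):
        i_end = min(i0 + block, rows)
        for j0 in range(0, cols, block):
            j_end = min(j0 + block, cols)
            # Transpose one tile; tiles at the matrix edge may be
            # smaller than block x block.
            for i in range(i0, i_end):
                for j in range(j0, j_end):
                    b[j * ldb + i] = alpha * a[i * lda + j]
    return b
```

The result is the same as an element-by-element transpose; only the traversal order differs, which is where a blocked version would be expected to gain its speed in a compiled language.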
Martin Kroeker [Mon, 22 Feb 2021 18:40:36 +0000 (19:40 +0100)]
Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64
Martin Kroeker [Mon, 22 Feb 2021 18:31:41 +0000 (19:31 +0100)]
Merge pull request #13 from xianyi/develop
rebase
Martin Kroeker [Fri, 19 Feb 2021 08:57:18 +0000 (09:57 +0100)]
Merge pull request #3111 from hawkinsp/forkrace
Fix race in blas_thread_shutdown.
Peter Hawkins [Thu, 18 Feb 2021 18:46:50 +0000 (13:46 -0500)]
Fix race in blas_thread_shutdown.
blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.
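A minimal sketch of the pattern described above, assuming nothing about the actual C implementation: the class and attribute names here are hypothetical stand-ins for blas_server_avail and server_lock. The point is that the availability flag is checked and cleared while the lock is held, so only one caller performs the shutdown.

```python
import threading


class ServerSketch:
    """Hypothetical stand-in for the thread server state."""

    def __init__(self):
        self.lock = threading.Lock()   # plays the role of server_lock
        self.avail = True              # plays the role of blas_server_avail
        self.shutdown_count = 0        # counts how often teardown really ran

    def shutdown(self):
        # Check and clear the flag under the lock; reading it outside the
        # lock would let several threads pass the check at the same time.
        with self.lock:
            if not self.avail:
                return False
            self.avail = False
            self.shutdown_count += 1   # the actual teardown would go here
            return True


def run_concurrent_shutdowns(n_threads=8):
    """Call shutdown() from several threads and report how often it ran."""
    server = ServerSketch()
    threads = [threading.Thread(target=server.shutdown) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.shutdown_count
```

With the check inside the lock, run_concurrent_shutdowns() reports a single teardown regardless of how many threads race to call it.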
Martin Kroeker [Thu, 18 Feb 2021 14:45:25 +0000 (15:45 +0100)]
Merge pull request #3110 from martin-frbg/issue3108
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
Martin Kroeker [Thu, 18 Feb 2021 10:14:05 +0000 (11:14 +0100)]
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
Martin Kroeker [Fri, 12 Feb 2021 12:29:53 +0000 (13:29 +0100)]
Merge pull request #3105 from martin-frbg/tigerlake
Recognize Intel Tiger Lake CPUID as SkylakeX
Martin Kroeker [Fri, 12 Feb 2021 12:29:23 +0000 (13:29 +0100)]
Merge pull request #3106 from RajalakshmiSR/ppcbe
Fix build issue on POWER8 with DYNAMIC_ARCH
Rajalakshmi Srinivasaraghavan [Fri, 12 Feb 2021 03:28:03 +0000 (21:28 -0600)]
Fix build issue on POWER8 with DYNAMIC_ARCH
Running make DYNAMIC_ARCH=1 on POWER8 BE with gcc 10.2 gives
the following error due to the difference in UNROLL_M/N:
'No rule to make target 'dgemm_incopy_POWER10.o', needed by kernel'
Martin Kroeker [Thu, 11 Feb 2021 19:17:11 +0000 (20:17 +0100)]
Recognize Intel Tiger Lake as SkylakeX
Martin Kroeker [Thu, 11 Feb 2021 19:16:27 +0000 (20:16 +0100)]
Recognize Intel Tiger Lake as SkylakeX
Martin Kroeker [Thu, 11 Feb 2021 14:42:47 +0000 (15:42 +0100)]
Merge pull request #3104 from martin-frbg/issue3103
Enable optimized Haswell/AVX2 kernels for sasum/dasum and srot/drot on Ryzen
Martin Kroeker [Thu, 11 Feb 2021 14:42:18 +0000 (15:42 +0100)]
Merge pull request #3101 from jake-arkinstall/issue-3100
Addressed issue #3100 - removing an unnecessary write to the include directory
Martin Kroeker [Thu, 11 Feb 2021 08:26:15 +0000 (09:26 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:25:36 +0000 (09:25 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:24:51 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:24:16 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:23:05 +0000 (09:23 +0100)]
Enable optimized srot/drot kernels from Haswell
Martin Kroeker [Thu, 11 Feb 2021 07:56:46 +0000 (08:56 +0100)]
Merge pull request #3102 from martin-frbg/issue3099
Strip pkgversion info from compiler version string before comparing
Martin Kroeker [Wed, 10 Feb 2021 13:22:59 +0000 (14:22 +0100)]
Strip parenthesized (pkgversion) data from GCC version string to avoid misinterpretation
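As a rough illustration of why the parenthesized part matters, the sketch below removes it before looking for the version number. It is not the code used in the build scripts; the function name and the sample strings are assumptions for the example, and nested parentheses are not handled.

```python
import re


def gcc_version(version_line):
    """Return (major, minor, patch) from a compiler version line, or None.

    Parenthesized packager text is dropped first, because it can itself
    contain digits that look like a version number.
    """
    stripped = re.sub(r"\([^)]*\)", "", version_line)
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", stripped)
    if match is None:
        return None
    # A missing patch level is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))
```

Without the stripping step, a line such as "gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 10.2.0" would yield the first dotted number inside the parentheses rather than the compiler version that follows them.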
Martin Kroeker [Wed, 10 Feb 2021 13:17:24 +0000 (14:17 +0100)]
Merge pull request #12 from xianyi/develop
rebase
Jake Arkinstall [Wed, 10 Feb 2021 12:11:17 +0000 (12:11 +0000)]
Addressed issue #3100, removing an unnecessary write to the include directory
Martin Kroeker [Tue, 2 Feb 2021 12:36:17 +0000 (13:36 +0100)]
Merge pull request #3094 from xoviat/patch-1
build openmp on appveyor
Martin Kroeker [Tue, 2 Feb 2021 12:33:15 +0000 (13:33 +0100)]
Merge pull request #3096 from martin-frbg/fixclangcmake
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
Martin Kroeker [Tue, 2 Feb 2021 09:53:46 +0000 (10:53 +0100)]
fix case in compiler name check
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
Martin Kroeker [Mon, 1 Feb 2021 20:02:53 +0000 (21:02 +0100)]
remove spurious lines (probably editor malfunction)
Martin Kroeker [Mon, 1 Feb 2021 19:18:53 +0000 (20:18 +0100)]
handle AppleClang in Cooperlake support condition
Martin Kroeker [Mon, 1 Feb 2021 18:45:25 +0000 (19:45 +0100)]
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion)
xoviat [Sun, 31 Jan 2021 03:28:12 +0000 (21:28 -0600)]
appveyor: cleanup and add openmp run
Martin Kroeker [Sun, 31 Jan 2021 17:02:41 +0000 (18:02 +0100)]
Merge pull request #3073 from xoviat/embedded
add embedded option
Martin Kroeker [Sat, 30 Jan 2021 21:21:28 +0000 (22:21 +0100)]
Merge pull request #3093 from martin-frbg/fix3064
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
Martin Kroeker [Sat, 30 Jan 2021 15:46:25 +0000 (16:46 +0100)]
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
Martin Kroeker [Sat, 30 Jan 2021 15:23:37 +0000 (16:23 +0100)]
Merge pull request #3092 from RajalakshmiSR/cscal_p10
Optimize cscal function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 29 Jan 2021 19:51:43 +0000 (13:51 -0600)]
Optimize cscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Fri, 29 Jan 2021 12:37:23 +0000 (13:37 +0100)]
Merge pull request #3091 from martin-frbg/lapack477-2
Fix calculation of the non-exceptional shift values in LAPACK complex QZ
Martin Kroeker [Fri, 29 Jan 2021 09:45:36 +0000 (10:45 +0100)]
fix data type
Martin Kroeker [Fri, 29 Jan 2021 08:56:12 +0000 (09:56 +0100)]
fix calculation of non-exceptional shift (from Reference-LAPACK PR 477)
Martin Kroeker [Fri, 29 Jan 2021 08:52:21 +0000 (09:52 +0100)]
Merge pull request #11 from xianyi/develop
rebase
Martin Kroeker [Wed, 27 Jan 2021 18:11:55 +0000 (19:11 +0100)]
Merge pull request #3087 from martin-frbg/lapack477
Apply Reference-LAPACK PR 477 for convergence problems in CHGEQZ/ZHGEQZ
Martin Kroeker [Wed, 27 Jan 2021 12:41:45 +0000 (13:41 +0100)]
Add exceptional shift to fix rare convergence problems
Martin Kroeker [Wed, 27 Jan 2021 12:39:26 +0000 (13:39 +0100)]
Merge pull request #10 from xianyi/develop
rebase
Martin Kroeker [Wed, 27 Jan 2021 12:25:45 +0000 (13:25 +0100)]
Merge pull request #3076 from martin-frbg/dyn-thunderx
Add CI job for ARM64/gcc10 DYNAMIC_ARCH
Martin Kroeker [Tue, 26 Jan 2021 19:11:42 +0000 (20:11 +0100)]
Merge pull request #3085 from alexhenrie/memory_alloc
Fix null pointer check in blas_memory_alloc
Martin Kroeker [Tue, 26 Jan 2021 14:13:35 +0000 (15:13 +0100)]
Merge pull request #3083 from martin-frbg/develop
Add DYNAMIC_LIST support for ARM64
Martin Kroeker [Mon, 25 Jan 2021 18:02:21 +0000 (19:02 +0100)]
Remove the VORTEX support bits again for now
Martin Kroeker [Mon, 25 Jan 2021 12:13:20 +0000 (13:13 +0100)]
Add DYNAMIC_LIST support for ARM64
Alex Henrie [Mon, 25 Jan 2021 05:20:44 +0000 (22:20 -0700)]
Fix null pointer check in blas_memory_alloc
Martin Kroeker [Sun, 24 Jan 2021 22:18:52 +0000 (23:18 +0100)]
Add DYNAMIC_LIST support for ARM64
Martin Kroeker [Sun, 24 Jan 2021 22:18:01 +0000 (23:18 +0100)]
Add DYNAMIC_LIST option for ARM64
Martin Kroeker [Sun, 24 Jan 2021 22:14:45 +0000 (23:14 +0100)]
Merge pull request #9 from xianyi/develop
rebase
Martin Kroeker [Sun, 24 Jan 2021 18:03:40 +0000 (19:03 +0100)]
Merge pull request #3082 from RajalakshmiSR/scalp10
Optimize s/dscal function for POWER10
Rajalakshmi Srinivasaraghavan [Sun, 24 Jan 2021 13:48:28 +0000 (07:48 -0600)]
Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
xoviat [Sun, 24 Jan 2021 04:12:17 +0000 (22:12 -0600)]
add functions for embedded
Martin Kroeker [Sat, 23 Jan 2021 18:08:05 +0000 (19:08 +0100)]
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel