Martin Kroeker [Wed, 28 Apr 2021 17:20:08 +0000 (19:20 +0200)]
Clean up misdeclaration of the dummy stand-in for A in ?ORGBR/?UNGBR workspace queries (Reference-LAPACK PR 468 and 530)
Martin Kroeker [Tue, 20 Apr 2021 19:30:28 +0000 (21:30 +0200)]
Merge pull request #24 from xianyi/develop
rebase
Martin Kroeker [Tue, 20 Apr 2021 05:31:07 +0000 (07:31 +0200)]
Merge pull request #3184 from martin-frbg/ctestfix
Fix obscure ctest crashes on OSX and add OSX builds to Azure CI
Martin Kroeker [Mon, 19 Apr 2021 20:27:08 +0000 (22:27 +0200)]
Add OSX build variations to Azure CI
Martin Kroeker [Mon, 19 Apr 2021 20:26:34 +0000 (22:26 +0200)]
Include cblas_test.h to achieve int/long size change with INTERFACE64
Martin Kroeker [Mon, 19 Apr 2021 20:24:12 +0000 (22:24 +0200)]
Merge pull request #23 from xianyi/develop
rebase
Martin Kroeker [Fri, 16 Apr 2021 12:52:12 +0000 (14:52 +0200)]
Merge pull request #3183 from martin-frbg/2715-x
Restore __volatile__ keyword in ARM64 DYNAMIC_ARCH detection mechanism
Martin Kroeker [Fri, 16 Apr 2021 08:27:32 +0000 (10:27 +0200)]
Restore __volatile__ keyword
Martin Kroeker [Wed, 14 Apr 2021 20:43:02 +0000 (22:43 +0200)]
Merge pull request #3181 from RajalakshmiSR/dgemmp10vp
POWER10: Improve dgemm performance
Rajalakshmi Srinivasaraghavan [Wed, 14 Apr 2021 03:30:06 +0000 (22:30 -0500)]
POWER10: Improve dgemm performance
This patch uses a vector pair pointer for the input load operation,
which helps the compiler generate POWER10 lxvp instructions.
Martin Kroeker [Sun, 11 Apr 2021 08:01:09 +0000 (10:01 +0200)]
Merge pull request #3179 from RajalakshmiSR/zgemvp10
POWER10: Optimized zgemv
Rajalakshmi Srinivasaraghavan [Sun, 11 Apr 2021 00:00:24 +0000 (19:00 -0500)]
POWER10: Optimized zgemv
This patch makes use of the Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1 for zgemv_n and zgemv_t.
Martin Kroeker [Fri, 9 Apr 2021 11:38:05 +0000 (13:38 +0200)]
Merge pull request #3178 from martin-frbg/fix2864
Fix unwanted fallback to implicit typing in slanv2/dlanv2
Martin Kroeker [Fri, 9 Apr 2021 08:04:15 +0000 (10:04 +0200)]
Fix implicit typing of new variable TWO
Martin Kroeker [Fri, 9 Apr 2021 08:03:31 +0000 (10:03 +0200)]
Fix implicit typing of new variable TWO
Martin Kroeker [Wed, 7 Apr 2021 06:22:42 +0000 (08:22 +0200)]
Merge pull request #3177 from martin-frbg/issue3176
Use "old" compute(24) function with clang due to register limitations
Martin Kroeker [Wed, 7 Apr 2021 06:22:28 +0000 (08:22 +0200)]
Merge pull request #3175 from LYP951018/develop
Pass NO_AVX512 macro def when `DYNAMIC_ARCH` is enabled
Martin Kroeker [Tue, 6 Apr 2021 17:58:32 +0000 (19:58 +0200)]
Use "old" compute(24) function with clang due to register limitations
刘雨培 [Tue, 6 Apr 2021 16:10:41 +0000 (00:10 +0800)]
pass NO_AVX512 macro def
Martin Kroeker [Mon, 5 Apr 2021 11:39:17 +0000 (13:39 +0200)]
Merge pull request #3173 from martin-frbg/dyna-sse3
Fix spillover of host-specific build flags into the shared part of x86 DYNAMIC_ARCH builds
Martin Kroeker [Sun, 4 Apr 2021 21:12:17 +0000 (23:12 +0200)]
Avoid adding host-specific cpuflags to the common part of DYNAMIC_ARCH builds
Martin Kroeker [Sun, 4 Apr 2021 18:19:09 +0000 (20:19 +0200)]
Merge pull request #3172 from martin-frbg/lapack477-final
Copy missing fixes from the final revision of Reference-LAPACK PR477
Martin Kroeker [Sun, 4 Apr 2021 10:31:22 +0000 (12:31 +0200)]
Update Makefile.x86_64
Martin Kroeker [Sat, 3 Apr 2021 20:18:15 +0000 (22:18 +0200)]
Fix spillover of host-specific build flags into the shared part of DYNAMIC_ARCH builds with gmake
for #3139
Martin Kroeker [Sat, 3 Apr 2021 20:11:14 +0000 (22:11 +0200)]
Fix typo and potentially undefined variables
(copies fixes made in Reference-LAPACK PR 477 after the initial cherrypick)
Martin Kroeker [Sat, 3 Apr 2021 19:58:36 +0000 (21:58 +0200)]
Merge pull request #22 from xianyi/develop
rebase
Martin Kroeker [Sat, 3 Apr 2021 17:49:47 +0000 (19:49 +0200)]
Merge pull request #3170 from CodesWithWolves/sgemm_tcopy_16-invalid-read
Remove Unnecessary/Erroneous Adds/Reads In sgemm_tcopy_16.S COPY1x8 Macro
Martin Kroeker [Thu, 1 Apr 2021 19:20:24 +0000 (21:20 +0200)]
Merge pull request #3171 from RajalakshmiSR/BE_p10
POWER10: Adding check for little endian
Rajalakshmi Srinivasaraghavan [Thu, 1 Apr 2021 02:32:42 +0000 (21:32 -0500)]
POWER10: Adding check for little endian
This patch makes sure that recent POWER10 patches are used
only for little endian.
CodesWithWolves [Wed, 31 Mar 2021 19:38:07 +0000 (15:38 -0400)]
Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro
Some code appears to have been left over from copying the COPY2x8
macro above: we read 8 bytes into each of d4-d7 directly after
reading 4 bytes into each of s4-s7. These 32 bytes in d4-d7 are unused and can
possibly overrun the boundary of allocated memory. Valgrind detected
this for a 128,1 copy, which is what drew my attention to it.
Additionally, there is no need to update the addresses stored in A0-A7
as the only possible paths after running this macro will overwrite A0-7
if looping to the next 8 rows, or overwrite A0-3 if moving to 4 rows --
in which case A4-7 are unused.
Martin Kroeker [Sat, 27 Mar 2021 11:40:42 +0000 (12:40 +0100)]
Merge pull request #3167 from xianyi/fix3126
Fix compilation of the benchmarks on older OSX versions
Martin Kroeker [Fri, 26 Mar 2021 21:29:29 +0000 (22:29 +0100)]
Fix compilation on older OSX versions
Martin Kroeker [Wed, 24 Mar 2021 13:05:34 +0000 (14:05 +0100)]
Merge pull request #3165 from martin-frbg/azure-osx
Add OSX build to Azure
Martin Kroeker [Wed, 24 Mar 2021 09:36:29 +0000 (10:36 +0100)]
Update .travis.yml
Martin Kroeker [Wed, 24 Mar 2021 09:34:24 +0000 (10:34 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 07:54:30 +0000 (08:54 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 07:41:48 +0000 (08:41 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 07:30:48 +0000 (08:30 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 06:50:35 +0000 (07:50 +0100)]
Add OSX build to Azure
Martin Kroeker [Wed, 24 Mar 2021 05:56:10 +0000 (06:56 +0100)]
Merge pull request #3164 from martin-frbg/travisosxomp
Fix xcode12 build on Travis and add OSX/OpenMP job
Martin Kroeker [Wed, 24 Mar 2021 05:55:14 +0000 (06:55 +0100)]
Fix xcode12 build and add OSX/OpenMP
Martin Kroeker [Mon, 22 Mar 2021 16:53:43 +0000 (17:53 +0100)]
Merge pull request #20 from xianyi/develop
rebase
Martin Kroeker [Fri, 19 Mar 2021 19:53:21 +0000 (20:53 +0100)]
Merge pull request #3158 from austinpagan/Gemm.CZPQ
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
Martin Kroeker [Fri, 19 Mar 2021 14:23:12 +0000 (15:23 +0100)]
Merge pull request #3157 from martin-frbg/issue3020-final
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler on PPC
Martin Kroeker [Fri, 19 Mar 2021 14:22:48 +0000 (15:22 +0100)]
Merge pull request #3156 from martin-frbg/omatcopy_d
Move x86_64 DOMATCOPY_RT back to the C implementation
Gordon Fossum [Fri, 19 Mar 2021 14:05:23 +0000 (10:05 -0400)]
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
Martin Kroeker [Fri, 19 Mar 2021 10:47:58 +0000 (11:47 +0100)]
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler
Martin Kroeker [Fri, 19 Mar 2021 10:46:25 +0000 (11:46 +0100)]
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
Martin Kroeker [Fri, 19 Mar 2021 10:44:31 +0000 (11:44 +0100)]
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
Martin Kroeker [Fri, 19 Mar 2021 08:55:31 +0000 (09:55 +0100)]
Merge pull request #3155 from martin-frbg/issue3152
Fix recent SGEMM_direct breakage on SkylakeX and Cooperlake
Martin Kroeker [Thu, 18 Mar 2021 20:53:50 +0000 (21:53 +0100)]
Remove premature entry for DOMATCOPY_RT
Martin Kroeker [Thu, 18 Mar 2021 20:28:19 +0000 (21:28 +0100)]
Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time
Martin Kroeker [Thu, 18 Mar 2021 11:35:47 +0000 (12:35 +0100)]
Merge pull request #3154 from martin-frbg/issue3153
Fix premature include in getarch_2nd
Martin Kroeker [Thu, 18 Mar 2021 06:50:19 +0000 (07:50 +0100)]
Include just the definition of BLASLONG rather than all of common.h
Martin Kroeker [Thu, 18 Mar 2021 06:47:03 +0000 (07:47 +0100)]
Merge pull request #19 from xianyi/develop
rebase
Martin Kroeker [Wed, 17 Mar 2021 20:14:42 +0000 (21:14 +0100)]
Update version to 0.3.14.dev
Martin Kroeker [Wed, 17 Mar 2021 20:14:05 +0000 (21:14 +0100)]
Update version to 0.3.14.dev
Martin Kroeker [Wed, 17 Mar 2021 20:13:25 +0000 (21:13 +0100)]
Merge pull request #3151 from xianyi/release-0.3.0
Merge 0.3.14 release branch back into develop to acquire tag
Martin Kroeker [Wed, 17 Mar 2021 19:21:42 +0000 (20:21 +0100)]
Merge pull request #3150 from xianyi/develop
Update branch from develop for 0.3.14 release
Martin Kroeker [Wed, 17 Mar 2021 19:20:34 +0000 (20:20 +0100)]
Update version to 0.3.14 for release
Martin Kroeker [Wed, 17 Mar 2021 19:20:00 +0000 (20:20 +0100)]
Update version to 0.3.14 for release
Martin Kroeker [Wed, 17 Mar 2021 19:14:50 +0000 (20:14 +0100)]
Merge pull request #3149 from martin-frbg/changelog14
Update Changelog for 0.3.14
Martin Kroeker [Wed, 17 Mar 2021 15:17:55 +0000 (16:17 +0100)]
Update Changelog for 0.3.14
Martin Kroeker [Wed, 17 Mar 2021 08:05:43 +0000 (09:05 +0100)]
Merge pull request #3148 from martin-frbg/issue3145
Add workaround for older gcc on big-endian ppc64 not supporting casts in defines
Martin Kroeker [Tue, 16 Mar 2021 20:20:05 +0000 (21:20 +0100)]
Add workaround for older gcc on ppc64be not supporting casts in defines
Martin Kroeker [Tue, 16 Mar 2021 20:09:45 +0000 (21:09 +0100)]
Merge pull request #18 from xianyi/develop
rebase
Martin Kroeker [Tue, 16 Mar 2021 19:25:42 +0000 (20:25 +0100)]
Merge pull request #3147 from martin-frbg/issue3146
Fix DYNAMIC_ARCH builds with CLANG on ppc64
Martin Kroeker [Tue, 16 Mar 2021 15:52:57 +0000 (16:52 +0100)]
Fix compilation with CLANG
Martin Kroeker [Sun, 14 Mar 2021 22:12:55 +0000 (23:12 +0100)]
Merge pull request #3143 from martin-frbg/fix3088
Resolve circular dependency between common.h and param.h
Martin Kroeker [Sun, 14 Mar 2021 22:12:33 +0000 (23:12 +0100)]
Merge pull request #3144 from xoviat/fix-test
disable openmp
xoviat [Sun, 14 Mar 2021 21:34:02 +0000 (16:34 -0500)]
disable openmp
Martin Kroeker [Sun, 14 Mar 2021 16:36:51 +0000 (17:36 +0100)]
remove inclusion of common.h again to avoid circular dependency
Martin Kroeker [Sun, 14 Mar 2021 16:28:43 +0000 (17:28 +0100)]
Include common.h (and indirectly param.h) rather than just param.h to have BLASLONG available w/o circular dependencies
Martin Kroeker [Sun, 14 Mar 2021 16:20:49 +0000 (17:20 +0100)]
Merge pull request #17 from xianyi/develop
rebase
Martin Kroeker [Sun, 14 Mar 2021 16:14:28 +0000 (17:14 +0100)]
Merge pull request #3088 from xoviat/msvc
add misc fixes.
Martin Kroeker [Sat, 13 Mar 2021 22:04:53 +0000 (23:04 +0100)]
Merge pull request #3141 from martin-frbg/nagfor-2
Leave out ARM64 march/mtune options when compiling with nagfor
Martin Kroeker [Sat, 13 Mar 2021 19:16:18 +0000 (20:16 +0100)]
Support compilation with NAG fortran
Martin Kroeker [Fri, 12 Mar 2021 14:35:58 +0000 (15:35 +0100)]
Merge pull request #3140 from martin-frbg/issue3139
Fix compilation on older x86_64 targets with old compilers that lack intrinsics support
Martin Kroeker [Fri, 12 Mar 2021 11:46:19 +0000 (12:46 +0100)]
Merge pull request #3138 from martin-frbg/nagfor
Add support for compilation with the NAG Fortran compiler
Martin Kroeker [Fri, 12 Mar 2021 11:42:05 +0000 (12:42 +0100)]
Move includes under the ifdef for compilers w/o intrinsics support
Martin Kroeker [Thu, 11 Mar 2021 22:03:58 +0000 (23:03 +0100)]
Fix syntax
Martin Kroeker [Thu, 11 Mar 2021 14:17:48 +0000 (15:17 +0100)]
Merge pull request #3136 from austinpagan/Gemm.PQ
Modifying a couple parameters in the "POWER10"-specific section of pa…
Martin Kroeker [Thu, 11 Mar 2021 10:53:51 +0000 (11:53 +0100)]
Support compilation with nagfor
Martin Kroeker [Thu, 11 Mar 2021 10:52:29 +0000 (11:52 +0100)]
Support compilation with nagfor
Martin Kroeker [Thu, 11 Mar 2021 10:51:09 +0000 (11:51 +0100)]
Support compilation with the NAG Fortran compiler
Martin Kroeker [Thu, 11 Mar 2021 10:48:37 +0000 (11:48 +0100)]
Merge pull request #16 from xianyi/develop
rebase
Martin Kroeker [Thu, 11 Mar 2021 06:18:05 +0000 (07:18 +0100)]
Merge pull request #3137 from RajalakshmiSR/zscal_p10
Optimize zscal function for POWER10
austinpagan [Wed, 10 Mar 2021 23:19:12 +0000 (18:19 -0500)]
Modifying a couple parameters in the "POWER10"-specific section of param.h, for performance enhancements for SGEMM and DGEMM.
Rajalakshmi Srinivasaraghavan [Wed, 10 Mar 2021 23:15:33 +0000 (17:15 -0600)]
Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 6 Mar 2021 18:15:53 +0000 (19:15 +0100)]
Merge pull request #3130 from martin-frbg/issue3128
Replace spurious AVX512 requirement in the Haswell srot microkernel with an AVX2/FMA3 guard
Martin Kroeker [Sat, 6 Mar 2021 13:35:49 +0000 (14:35 +0100)]
Remove spurious AVX512 requirement and add AVX2/FMA3 guard
Martin Kroeker [Sat, 6 Mar 2021 08:13:59 +0000 (09:13 +0100)]
Merge pull request #3129 from RajalakshmiSR/asum_p10
Optimize s/dasum function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 5 Mar 2021 22:22:36 +0000 (16:22 -0600)]
Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Tue, 2 Mar 2021 20:27:21 +0000 (21:27 +0100)]
Merge pull request #3126 from martin-frbg/m1bench
Support timing Apple M1 in the benchmarks
Martin Kroeker [Tue, 2 Mar 2021 16:50:55 +0000 (17:50 +0100)]
Support timing Apple M1
Martin Kroeker [Tue, 2 Mar 2021 08:58:40 +0000 (09:58 +0100)]
Merge pull request #3125 from martin-frbg/issue3123
Fix AMD AOCC compiler detection
Martin Kroeker [Mon, 1 Mar 2021 20:00:10 +0000 (21:00 +0100)]
Fix AMD AOCC compiler detection
Martin Kroeker [Sun, 28 Feb 2021 21:13:09 +0000 (22:13 +0100)]
Merge pull request #3122 from martin-frbg/xeigtstz
Fix unusual stack size requirements of the LAPACK EIG tests (from Reference-LAPACK PR 335)
Martin Kroeker [Sun, 28 Feb 2021 17:57:05 +0000 (18:57 +0100)]
Adjust build rules for ?chkee.F
Martin Kroeker [Sun, 28 Feb 2021 17:53:20 +0000 (18:53 +0100)]
Adjust build rules for ?chkee.F