Martin Kroeker [Fri, 7 May 2021 06:51:45 +0000 (08:51 +0200)]
Merge pull request #3223 from martin-frbg/develop
Use percent instead of ampersand as placeholder for substitutions
Martin Kroeker [Thu, 6 May 2021 18:20:08 +0000 (20:20 +0200)]
Use percent instead of ampersand as placeholder for substitutions
Martin Kroeker [Wed, 5 May 2021 12:30:41 +0000 (14:30 +0200)]
Merge pull request #3219 from austinpagan/Gemm.ErrorFix
Add error message token for SBGEMM in gemm.c
Martin Kroeker [Wed, 5 May 2021 12:30:24 +0000 (14:30 +0200)]
Merge pull request #3220 from drhpc/drhpc-fixup
Delete lapack_wrappers.c.orig
drhpc [Tue, 4 May 2021 19:02:07 +0000 (21:02 +0200)]
Delete lapack_wrappers.c.orig
This looks like a leftover from patching and confuses further patching ;-)
Gordon Fossum [Tue, 4 May 2021 18:55:02 +0000 (13:55 -0500)]
Add error message token for SBGEMM in gemm.c
Martin Kroeker [Sun, 2 May 2021 22:01:08 +0000 (00:01 +0200)]
Update version to 0.3.15.dev
Martin Kroeker [Sun, 2 May 2021 22:00:29 +0000 (00:00 +0200)]
Update version to 0.3.15.dev
Martin Kroeker [Sun, 2 May 2021 21:59:55 +0000 (23:59 +0200)]
Merge pull request #3217 from xianyi/release-0.3.0
merge 0.3.15 back into develop to copy tag
Martin Kroeker [Sun, 2 May 2021 21:50:22 +0000 (23:50 +0200)]
Update version to 0.3.15
Martin Kroeker [Sun, 2 May 2021 21:49:49 +0000 (23:49 +0200)]
Update version to 0.3.15
Martin Kroeker [Sun, 2 May 2021 21:48:28 +0000 (23:48 +0200)]
Merge pull request #3216 from xianyi/develop
Update from develop for 0.3.15 release
Martin Kroeker [Sun, 2 May 2021 21:47:24 +0000 (23:47 +0200)]
Merge pull request #3215 from martin-frbg/cl0315
Update Changelog for 0.3.15
Martin Kroeker [Sun, 2 May 2021 21:46:55 +0000 (23:46 +0200)]
Update Changelog for 0.3.15
Martin Kroeker [Sun, 2 May 2021 21:40:03 +0000 (23:40 +0200)]
Merge pull request #3214 from martin-frbg/lapack-3.9.1hrt
Add new Householder Reconstruction functions from LAPACK 3.9.1
Martin Kroeker [Sun, 2 May 2021 18:47:58 +0000 (20:47 +0200)]
Add files via upload
Martin Kroeker [Sun, 2 May 2021 17:57:47 +0000 (19:57 +0200)]
Add LAPACKE interfaces for the new Householder Reconstruction functions from 3.9.1
Martin Kroeker [Sun, 2 May 2021 17:56:11 +0000 (19:56 +0200)]
Add entries for the new Householder Reconstruction functions from 3.9.1
Martin Kroeker [Sun, 2 May 2021 17:55:15 +0000 (19:55 +0200)]
Add entries for the new Householder Reconstruction functions from 3.9.1
Martin Kroeker [Sun, 2 May 2021 17:28:21 +0000 (19:28 +0200)]
Add new tests for Householder reconstruction functions from 3.9.1
Martin Kroeker [Sun, 2 May 2021 17:25:43 +0000 (19:25 +0200)]
Add new files for Householder reconstruction functions from 3.9.1
Martin Kroeker [Sun, 2 May 2021 17:21:59 +0000 (19:21 +0200)]
Add entries for Householder reconstruction functions from 3.9.1
Martin Kroeker [Sun, 2 May 2021 17:19:28 +0000 (19:19 +0200)]
Merge pull request #26 from xianyi/develop
rebase
Martin Kroeker [Sun, 2 May 2021 16:45:15 +0000 (18:45 +0200)]
Merge pull request #3213 from martin-frbg/lapack382
Avoid allocating the transposed triangular matrix in LAPACKE_xlantr_work (Reference-LAPACK 382)
Martin Kroeker [Sun, 2 May 2021 16:44:59 +0000 (18:44 +0200)]
Merge pull request #3212 from martin-frbg/lapack463
Initialize X and Y to zero for N=0 in xGGGLM (Reference-LAPACK PR463)
Martin Kroeker [Sun, 2 May 2021 16:44:29 +0000 (18:44 +0200)]
Merge pull request #3211 from martin-frbg/lapack471
Handle norm NaN value in xGESDD (Reference LAPACK PR471)
Martin Kroeker [Sun, 2 May 2021 10:18:17 +0000 (12:18 +0200)]
Avoid allocating the transposed triangular matrix (Reference-LAPACK PR382)
Martin Kroeker [Sun, 2 May 2021 09:40:56 +0000 (11:40 +0200)]
Initialize X and Y to zero for N=0 (Reference-LAPACK PR463)
Martin Kroeker [Sun, 2 May 2021 09:24:50 +0000 (11:24 +0200)]
Handle norm NaN value (Reference LAPACK PR471)
Martin Kroeker [Sun, 2 May 2021 07:02:11 +0000 (09:02 +0200)]
Merge pull request #3210 from martin-frbg/lapack502
Fix possible division by zero in LAPACK xTGSJA (Reference-LAPACK PR502)
Martin Kroeker [Sat, 1 May 2021 19:31:13 +0000 (21:31 +0200)]
Fix possible division by zero in xTGSJA (Reference-LAPACK PR502)
Martin Kroeker [Sat, 1 May 2021 18:18:29 +0000 (20:18 +0200)]
Merge pull request #3208 from martin-frbg/lapack534
Apply MKL team fixes to the LAPACKE interfaces (Reference-LAPACK PR 534)
Martin Kroeker [Sat, 1 May 2021 18:08:24 +0000 (20:08 +0200)]
Merge pull request #3209 from martin-frbg/issue3160
Add casts to prevent overflow of intermediate results
Martin Kroeker [Sat, 1 May 2021 12:48:19 +0000 (14:48 +0200)]
Add casts to prevent overflow of intermediate result
Martin Kroeker [Sat, 1 May 2021 12:47:22 +0000 (14:47 +0200)]
Add cast to prevent overflow of intermediate result
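The kind of fix described in the cast commits above can be illustrated with a minimal sketch. The helper name `matrix_bytes` and its parameters are hypothetical and not taken from the OpenBLAS sources; the point is only that multiplying two 32-bit dimensions is done in 32-bit arithmetic unless one operand is widened first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from OpenBLAS): size in bytes of an m x n
 * matrix of doubles. With 32-bit m and n, the product m * n would be
 * computed in 32-bit arithmetic and could overflow before the result is
 * widened. Casting the operands first makes the whole product 64-bit. */
static uint64_t matrix_bytes(int32_t m, int32_t n)
{
    assert(m >= 0 && n >= 0);  /* dimensions are expected to be non-negative */
    return (uint64_t)m * (uint64_t)n * sizeof(double);
}
```

For example, `matrix_bytes(65536, 65536)` needs a 64-bit intermediate, because 65536 * 65536 no longer fits in a 32-bit integer.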
Martin Kroeker [Sat, 1 May 2021 11:22:10 +0000 (13:22 +0200)]
Apply MKL team fixes to the LAPACKE interfaces (Reference-LAPACK PR 534)
Remove spurious checks for INFO after xLACPY and xLASET, which do not return any, and redundant requirements for ldvt in xGESVD_WORK
Martin Kroeker [Sat, 1 May 2021 11:10:16 +0000 (13:10 +0200)]
Add const qualifiers
Martin Kroeker [Sat, 1 May 2021 10:42:54 +0000 (12:42 +0200)]
Merge pull request #3207 from hjl-tools/hjl/cet/develop
x86: Enable Intel CET
H.J. Lu [Sat, 1 May 2021 01:01:14 +0000 (18:01 -0700)]
x86: Enable Intel CET
When Intel CET is enabled, we need to include <cet.h> in the assembly code
to mark Intel CET support and place _CET_ENDBR at each function entry.
Martin Kroeker [Fri, 30 Apr 2021 19:42:44 +0000 (21:42 +0200)]
Merge pull request #3206 from martin-frbg/lapack480535
Import packing improvements to LAPACK xLAQR from Reference-LAPACK (PR 480+535)
Martin Kroeker [Fri, 30 Apr 2021 11:50:55 +0000 (13:50 +0200)]
Import packing improvements in LAPACK xLAQR from Reference-LAPACK PR 480+535
Martin Kroeker [Fri, 30 Apr 2021 11:47:17 +0000 (13:47 +0200)]
Merge pull request #25 from xianyi/develop
rebase
Martin Kroeker [Fri, 30 Apr 2021 11:25:48 +0000 (13:25 +0200)]
Merge pull request #3204 from martin-frbg/lapack506
Correct INFO value returned by SLASQ2/DLASQ2 (Reference-LAPACK 506)
Martin Kroeker [Fri, 30 Apr 2021 07:26:54 +0000 (09:26 +0200)]
Correct INFO value (Reference-LAPACK 506)
Martin Kroeker [Thu, 29 Apr 2021 16:58:27 +0000 (18:58 +0200)]
Merge pull request #3202 from martin-frbg/issue3201
Fix division by zero in the non-x86 codepath of C/ZROTG
Martin Kroeker [Thu, 29 Apr 2021 07:47:18 +0000 (09:47 +0200)]
Fix division by zero in the non-x86 codepath
Martin Kroeker [Thu, 29 Apr 2021 03:39:50 +0000 (05:39 +0200)]
Merge pull request #3199 from martin-frbg/lapack537
Add LAPACKE fixes from Reference-LAPACK PR 537
Martin Kroeker [Thu, 29 Apr 2021 03:39:35 +0000 (05:39 +0200)]
Merge pull request #3198 from martin-frbg/lapack539
Apply fixes from Reference-LAPACK PR468 and 539 for array declarations in ?ORGBR/?UNGBR
Martin Kroeker [Wed, 28 Apr 2021 18:56:55 +0000 (20:56 +0200)]
Add missing break statements in the ?lascl functions
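As a generic illustration of why a missing break matters in a C switch, here is a minimal sketch. The function `classify_type` and the values it returns are hypothetical and do not reproduce the LAPACKE ?lascl code; they only show the fall-through behaviour that such a fix prevents.

```c
/* Hypothetical example (not the LAPACKE source): map a matrix-type
 * character to a small integer code. Each case ends in break; without
 * it, control would fall through into the next case and the result of
 * the earlier case would be overwritten. */
static int classify_type(char type)
{
    int kind;
    switch (type) {
    case 'G':            /* general matrix */
        kind = 1;
        break;           /* omitting this would continue into case 'L' */
    case 'L':            /* lower triangular */
        kind = 2;
        break;
    case 'U':            /* upper triangular */
        kind = 3;
        break;
    default:             /* unknown type */
        kind = -1;
        break;
    }
    return kind;
}
```

With the break statements in place, `classify_type('G')` returns 1; if the first break were removed it would return 2 instead.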
Martin Kroeker [Wed, 28 Apr 2021 18:55:37 +0000 (20:55 +0200)]
Add const qualifiers
Martin Kroeker [Wed, 28 Apr 2021 17:20:08 +0000 (19:20 +0200)]
Clean up misdeclaration of the dummy stand-in for A in ?ORGBR/?UNGBR workspace queries (Reference-LAPACK PR 468 and 539)
Martin Kroeker [Wed, 28 Apr 2021 16:17:25 +0000 (18:17 +0200)]
Merge pull request #3195 from martin-frbg/lapack536
Apply lapack-testing fix from Reference-LAPACK PR536
Martin Kroeker [Tue, 27 Apr 2021 13:48:22 +0000 (15:48 +0200)]
Apply lapack-testing fix from Reference-LAPACK PR536
Fixes switching back from the single OMP thread used for error exit testing to the originally requested number of threads for the computational tests
Martin Kroeker [Tue, 27 Apr 2021 13:40:51 +0000 (15:40 +0200)]
Merge pull request #3193 from martin-frbg/lapack538
Apply lapack-testing fixes from Reference-LAPACK PR538
Martin Kroeker [Tue, 27 Apr 2021 11:35:16 +0000 (13:35 +0200)]
Merge pull request #3191 from martin-frbg/issue3188
Delay creation of the (soft)link until after the library has been built
Martin Kroeker [Tue, 27 Apr 2021 10:52:49 +0000 (12:52 +0200)]
Apply fixes from Reference-LAPACK PR538
Martin Kroeker [Tue, 27 Apr 2021 04:36:28 +0000 (06:36 +0200)]
Merge pull request #3190 from martin-frbg/issue3128-2
Replace spurious AVX512 requirement in the Haswell drot microkernel with an AVX2/FMA3 guard
Martin Kroeker [Mon, 26 Apr 2021 20:32:23 +0000 (22:32 +0200)]
Delay creation of the softlink until after the library has been created
Martin Kroeker [Mon, 26 Apr 2021 19:55:30 +0000 (21:55 +0200)]
Replace spurious AVX512 requirement with FMA check
Martin Kroeker [Thu, 22 Apr 2021 00:11:20 +0000 (02:11 +0200)]
Add mixed clang/ifort build on OSX to Azure CI (#3185)
* Add mixed clang/ifort build on OSX to the Azure CI config based on https://github.com/oneapi-src/oneapi-ci
(and remove debugging tools from the clang+gfortran job)
* Remove extraneous libgfortran dependency of ifort builds
* Remove FEXTRALIB from the link line of the shared library, as ifort keeps track of dependencies (and they are different for a .dylib than what f_check got for an executable)
Martin Kroeker [Tue, 20 Apr 2021 19:30:28 +0000 (21:30 +0200)]
Merge pull request #24 from xianyi/develop
rebase
Martin Kroeker [Tue, 20 Apr 2021 05:31:07 +0000 (07:31 +0200)]
Merge pull request #3184 from martin-frbg/ctestfix
Fix obscure ctest crashes on OSX and add OSX builds to Azure CI
Martin Kroeker [Mon, 19 Apr 2021 20:27:08 +0000 (22:27 +0200)]
Add OSX build variations to Azure CI
Martin Kroeker [Mon, 19 Apr 2021 20:26:34 +0000 (22:26 +0200)]
Include cblas_test.h to achieve int/long size change with INTERFACE64
Martin Kroeker [Mon, 19 Apr 2021 20:24:12 +0000 (22:24 +0200)]
Merge pull request #23 from xianyi/develop
rebase
Martin Kroeker [Fri, 16 Apr 2021 12:52:12 +0000 (14:52 +0200)]
Merge pull request #3183 from martin-frbg/2715-x
Restore __volatile__ keyword in ARM64 DYNAMIC_ARCH detection mechanism
Martin Kroeker [Fri, 16 Apr 2021 08:27:32 +0000 (10:27 +0200)]
Restore __volatile__ keyword
Martin Kroeker [Wed, 14 Apr 2021 20:43:02 +0000 (22:43 +0200)]
Merge pull request #3181 from RajalakshmiSR/dgemmp10vp
POWER10: Improve dgemm performance
Rajalakshmi Srinivasaraghavan [Wed, 14 Apr 2021 03:30:06 +0000 (22:30 -0500)]
POWER10: Improve dgemm performance
This patch uses a vector pair pointer for the input load operations,
which helps to generate POWER10 lxvp instructions.
Martin Kroeker [Sun, 11 Apr 2021 08:01:09 +0000 (10:01 +0200)]
Merge pull request #3179 from RajalakshmiSR/zgemvp10
POWER10: Optimized zgemv
Rajalakshmi Srinivasaraghavan [Sun, 11 Apr 2021 00:00:24 +0000 (19:00 -0500)]
POWER10: Optimized zgemv
This patch makes use of the Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1 for zgemv_n and zgemv_t.
Martin Kroeker [Fri, 9 Apr 2021 11:38:05 +0000 (13:38 +0200)]
Merge pull request #3178 from martin-frbg/fix2864
Fix unwanted fallback to implicit typing in slanv2/dlanv2
Martin Kroeker [Fri, 9 Apr 2021 08:04:15 +0000 (10:04 +0200)]
Fix implicit typing of new variable TWO
Martin Kroeker [Fri, 9 Apr 2021 08:03:31 +0000 (10:03 +0200)]
Fix implicit typing of new variable TWO
Martin Kroeker [Wed, 7 Apr 2021 06:22:42 +0000 (08:22 +0200)]
Merge pull request #3177 from martin-frbg/issue3176
Use "old" compute(24) function with clang due to register limitations
Martin Kroeker [Wed, 7 Apr 2021 06:22:28 +0000 (08:22 +0200)]
Merge pull request #3175 from LYP951018/develop
Pass NO_AVX512 macro def when `DYNAMIC_ARCH` is enabled
Martin Kroeker [Tue, 6 Apr 2021 17:58:32 +0000 (19:58 +0200)]
Use "old" compute(24) function with clang due to register limitations
刘雨培 [Tue, 6 Apr 2021 16:10:41 +0000 (00:10 +0800)]
Pass NO_AVX512 macro def
Martin Kroeker [Mon, 5 Apr 2021 11:39:17 +0000 (13:39 +0200)]
Merge pull request #3173 from martin-frbg/dyna-sse3
Fix spillover of host-specific build flags into the shared part of x86 DYNAMIC_ARCH builds
Martin Kroeker [Sun, 4 Apr 2021 21:12:17 +0000 (23:12 +0200)]
Avoid adding host-specific cpuflags to the common part of DYNAMIC_ARCH builds
Martin Kroeker [Sun, 4 Apr 2021 18:19:09 +0000 (20:19 +0200)]
Merge pull request #3172 from martin-frbg/lapack477-final
Copy missing fixes from the final revision of Reference-LAPACK PR477
Martin Kroeker [Sun, 4 Apr 2021 10:31:22 +0000 (12:31 +0200)]
Update Makefile.x86_64
Martin Kroeker [Sat, 3 Apr 2021 20:18:15 +0000 (22:18 +0200)]
Fix spillover of host-specific build flags into the shared part of DYNAMIC_ARCH builds with gmake
for #3139
Martin Kroeker [Sat, 3 Apr 2021 20:11:14 +0000 (22:11 +0200)]
Fix typo and potentially undefined variables
(copies fixes made in Reference-LAPACK PR 477 after the initial cherrypick)
Martin Kroeker [Sat, 3 Apr 2021 19:58:36 +0000 (21:58 +0200)]
Merge pull request #22 from xianyi/develop
rebase
Martin Kroeker [Sat, 3 Apr 2021 17:49:47 +0000 (19:49 +0200)]
Merge pull request #3170 from CodesWithWolves/sgemm_tcopy_16-invalid-read
Remove Unnecessary/Erroneous Adds/Reads In sgemm_tcopy_16.S COPY1x8 Macro
Martin Kroeker [Thu, 1 Apr 2021 19:20:24 +0000 (21:20 +0200)]
Merge pull request #3171 from RajalakshmiSR/BE_p10
POWER10: Adding check for little endian
Rajalakshmi Srinivasaraghavan [Thu, 1 Apr 2021 02:32:42 +0000 (21:32 -0500)]
POWER10: Adding check for little endian
This patch makes sure that recent POWER10 patches are used
only for little endian.
CodesWithWolves [Wed, 31 Mar 2021 19:38:07 +0000 (15:38 -0400)]
Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro
There appears to have been some code left over from copying the COPY2x8
macro above: we read 8 bytes into each of d4-d7 directly after
reading 4 bytes into each of s4-s7. These 32 bytes in d4-d7 are unused, and
the reads can overrun the boundary of allocated memory. Valgrind detected
this for a 128,1 copy, which is what drew my attention to it.
Additionally, there is no need to update the addresses stored in A0-A7,
as the only possible paths after running this macro will overwrite A0-A7
if looping to the next 8 rows, or overwrite A0-A3 if moving to 4 rows,
in which case A4-A7 are unused.
Martin Kroeker [Sat, 27 Mar 2021 11:40:42 +0000 (12:40 +0100)]
Merge pull request #3167 from xianyi/fix3126
Fix compilation of the benchmarks on older OSX versions
Martin Kroeker [Fri, 26 Mar 2021 21:29:29 +0000 (22:29 +0100)]
Fix compilation on older OSX versions
Martin Kroeker [Wed, 24 Mar 2021 13:05:34 +0000 (14:05 +0100)]
Merge pull request #3165 from martin-frbg/azure-osx
Add OSX build to Azure
Martin Kroeker [Wed, 24 Mar 2021 09:36:29 +0000 (10:36 +0100)]
Update .travis.yml
Martin Kroeker [Wed, 24 Mar 2021 09:34:24 +0000 (10:34 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 07:54:30 +0000 (08:54 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 07:41:48 +0000 (08:41 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 07:30:48 +0000 (08:30 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 06:50:35 +0000 (07:50 +0100)]
Add OSX build to Azure
Martin Kroeker [Wed, 24 Mar 2021 05:56:10 +0000 (06:56 +0100)]
Merge pull request #3164 from martin-frbg/travisosxomp
Fix xcode12 build on Travis and add OSX/OpenMP job
Martin Kroeker [Wed, 24 Mar 2021 05:55:14 +0000 (06:55 +0100)]
Fix xcode12 build and add OSX/OpenMP