Martin Kroeker [Sun, 4 Apr 2021 21:12:17 +0000 (23:12 +0200)]
Avoid adding host-specific cpuflags to the common part of DYNAMIC_ARCH builds
Martin Kroeker [Sun, 4 Apr 2021 10:31:22 +0000 (12:31 +0200)]
Update Makefile.x86_64
Martin Kroeker [Sat, 3 Apr 2021 20:18:15 +0000 (22:18 +0200)]
Fix spillover of host-specific build flags into the shared part of DYNAMIC_ARCH builds with gmake
for #3139
Martin Kroeker [Sat, 3 Apr 2021 19:58:36 +0000 (21:58 +0200)]
Merge pull request #22 from xianyi/develop
rebase
Martin Kroeker [Sat, 3 Apr 2021 17:49:47 +0000 (19:49 +0200)]
Merge pull request #3170 from CodesWithWolves/sgemm_tcopy_16-invalid-read
Remove Unnecessary/Erroneous Adds/Reads In sgemm_tcopy_16.S COPY1x8 Macro
Martin Kroeker [Thu, 1 Apr 2021 19:20:24 +0000 (21:20 +0200)]
Merge pull request #3171 from RajalakshmiSR/BE_p10
POWER10: Adding check for little endian
Rajalakshmi Srinivasaraghavan [Thu, 1 Apr 2021 02:32:42 +0000 (21:32 -0500)]
POWER10: Adding check for little endian
This patch makes sure that recent POWER10 patches are used
only for little endian.
CodesWithWolves [Wed, 31 Mar 2021 19:38:07 +0000 (15:38 -0400)]
Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro
There appears to have been some code leak when copying from the COPY2x8
macro above where we're reading 8 bytes into d4-d7 directly after
reading 4 bytes into s4-s7. These 32 bytes in d4-7 are unused and can
possibly overrun the boundary of allocated memory -- Valgrind detected
this for a 128,1 copy, which is what drew my attention to it.
Additionally, there is no need to update the addresses stored in A0-A7
as the only possible paths after running this macro will overwrite A0-7
if looping to the next 8 rows, or overwrite A0-3 if moving to 4 rows --
in which case A4-7 are unused.
Martin Kroeker [Sat, 27 Mar 2021 11:40:42 +0000 (12:40 +0100)]
Merge pull request #3167 from xianyi/fix3126
Fix compilation of the benchmarks on older OSX versions
Martin Kroeker [Fri, 26 Mar 2021 21:29:29 +0000 (22:29 +0100)]
Fix compilation on older OSX versions
Martin Kroeker [Wed, 24 Mar 2021 13:05:34 +0000 (14:05 +0100)]
Merge pull request #3165 from martin-frbg/azure-osx
Add OSX build to Azure
Martin Kroeker [Wed, 24 Mar 2021 09:36:29 +0000 (10:36 +0100)]
Update .travis.yml
Martin Kroeker [Wed, 24 Mar 2021 09:34:24 +0000 (10:34 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 07:54:30 +0000 (08:54 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 07:41:48 +0000 (08:41 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 07:30:48 +0000 (08:30 +0100)]
Update azure-pipelines.yml
Martin Kroeker [Wed, 24 Mar 2021 06:50:35 +0000 (07:50 +0100)]
Add OSX build to Azure
Martin Kroeker [Wed, 24 Mar 2021 05:56:10 +0000 (06:56 +0100)]
Merge pull request #3164 from martin-frbg/travisosxomp
Fix xcode12 build on Travis and add OSX/OpenMP job
Martin Kroeker [Wed, 24 Mar 2021 05:55:14 +0000 (06:55 +0100)]
Fix xcode12 build and add OSX/OpenMP
Martin Kroeker [Mon, 22 Mar 2021 16:53:43 +0000 (17:53 +0100)]
Merge pull request #20 from xianyi/develop
rebase
Martin Kroeker [Fri, 19 Mar 2021 19:53:21 +0000 (20:53 +0100)]
Merge pull request #3158 from austinpagan/Gemm.CZPQ
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
Martin Kroeker [Fri, 19 Mar 2021 14:23:12 +0000 (15:23 +0100)]
Merge pull request #3157 from martin-frbg/issue3020-final
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler on PPC
Martin Kroeker [Fri, 19 Mar 2021 14:22:48 +0000 (15:22 +0100)]
Merge pull request #3156 from martin-frbg/omatcopy_d
Move x86_64 DOMATCOPY_RT back to the C implementation
Gordon Fossum [Fri, 19 Mar 2021 14:05:23 +0000 (10:05 -0400)]
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
Martin Kroeker [Fri, 19 Mar 2021 10:47:58 +0000 (11:47 +0100)]
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler
Martin Kroeker [Fri, 19 Mar 2021 10:46:25 +0000 (11:46 +0100)]
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
Martin Kroeker [Fri, 19 Mar 2021 10:44:31 +0000 (11:44 +0100)]
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
Martin Kroeker [Fri, 19 Mar 2021 08:55:31 +0000 (09:55 +0100)]
Merge pull request #3155 from martin-frbg/issue3152
Fix recent SGEMM_direct breakage on SkylakeX and Cooperlake
Martin Kroeker [Thu, 18 Mar 2021 20:53:50 +0000 (21:53 +0100)]
Remove premature entry for DOMATCOPY_RT
Martin Kroeker [Thu, 18 Mar 2021 20:28:19 +0000 (21:28 +0100)]
Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time
Martin Kroeker [Thu, 18 Mar 2021 11:35:47 +0000 (12:35 +0100)]
Merge pull request #3154 from martin-frbg/issue3153
Fix premature include in getarch_2nd
Martin Kroeker [Thu, 18 Mar 2021 06:50:19 +0000 (07:50 +0100)]
Include just the definition of BLASLONG rather than all of common.h
Martin Kroeker [Thu, 18 Mar 2021 06:47:03 +0000 (07:47 +0100)]
Merge pull request #19 from xianyi/develop
rebase
Martin Kroeker [Wed, 17 Mar 2021 20:14:42 +0000 (21:14 +0100)]
Update version to 0.3.14.dev
Martin Kroeker [Wed, 17 Mar 2021 20:14:05 +0000 (21:14 +0100)]
Update version to 0.3.14.dev
Martin Kroeker [Wed, 17 Mar 2021 20:13:25 +0000 (21:13 +0100)]
Merge pull request #3151 from xianyi/release-0.3.0
Merge 0.3.14 release branch back into develop to acquire tag
Martin Kroeker [Wed, 17 Mar 2021 19:21:42 +0000 (20:21 +0100)]
Merge pull request #3150 from xianyi/develop
Update branch from develop for 0.3.14 release
Martin Kroeker [Wed, 17 Mar 2021 19:20:34 +0000 (20:20 +0100)]
Update version to 0.3.14 for release
Martin Kroeker [Wed, 17 Mar 2021 19:20:00 +0000 (20:20 +0100)]
Update version to 0.3.14 for release
Martin Kroeker [Wed, 17 Mar 2021 19:14:50 +0000 (20:14 +0100)]
Merge pull request #3149 from martin-frbg/changelog14
Update Changelog for 0.3.14
Martin Kroeker [Wed, 17 Mar 2021 15:17:55 +0000 (16:17 +0100)]
Update Changelog for 0.3.14
Martin Kroeker [Wed, 17 Mar 2021 08:05:43 +0000 (09:05 +0100)]
Merge pull request #3148 from martin-frbg/issue3145
Add workaround for older gcc on big-endian ppc64 not supporting casts in defines
Martin Kroeker [Tue, 16 Mar 2021 20:20:05 +0000 (21:20 +0100)]
Add workaround for older gcc on ppc64be not supporting casts in defines
Martin Kroeker [Tue, 16 Mar 2021 20:09:45 +0000 (21:09 +0100)]
Merge pull request #18 from xianyi/develop
rebase
Martin Kroeker [Tue, 16 Mar 2021 19:25:42 +0000 (20:25 +0100)]
Merge pull request #3147 from martin-frbg/issue3146
Fix DYNAMIC_ARCH builds with CLANG on ppc64
Martin Kroeker [Tue, 16 Mar 2021 15:52:57 +0000 (16:52 +0100)]
Fix compilation with CLANG
Martin Kroeker [Sun, 14 Mar 2021 22:12:55 +0000 (23:12 +0100)]
Merge pull request #3143 from martin-frbg/fix3088
Resolve circular dependency between common.h and param.h
Martin Kroeker [Sun, 14 Mar 2021 22:12:33 +0000 (23:12 +0100)]
Merge pull request #3144 from xoviat/fix-test
disable openmp
xoviat [Sun, 14 Mar 2021 21:34:02 +0000 (16:34 -0500)]
disable openmp
Martin Kroeker [Sun, 14 Mar 2021 16:36:51 +0000 (17:36 +0100)]
remove inclusion of common.h again to avoid circular dependency
Martin Kroeker [Sun, 14 Mar 2021 16:28:43 +0000 (17:28 +0100)]
Include common.h (and indirectly param.h) rather than just param.h to have BLASLONG available w/o circular dependencies
Martin Kroeker [Sun, 14 Mar 2021 16:20:49 +0000 (17:20 +0100)]
Merge pull request #17 from xianyi/develop
rebase
Martin Kroeker [Sun, 14 Mar 2021 16:14:28 +0000 (17:14 +0100)]
Merge pull request #3088 from xoviat/msvc
add misc fixes.
Martin Kroeker [Sat, 13 Mar 2021 22:04:53 +0000 (23:04 +0100)]
Merge pull request #3141 from martin-frbg/nagfor-2
Leave out ARM64 march/mtune options when compiling with nagfor
Martin Kroeker [Sat, 13 Mar 2021 19:16:18 +0000 (20:16 +0100)]
Support compilation with NAG fortran
Martin Kroeker [Fri, 12 Mar 2021 14:35:58 +0000 (15:35 +0100)]
Merge pull request #3140 from martin-frbg/issue3139
Fix compilation on older x86_64 targets with old compilers that lack intrinsics support
Martin Kroeker [Fri, 12 Mar 2021 11:46:19 +0000 (12:46 +0100)]
Merge pull request #3138 from martin-frbg/nagfor
Add support for compilation with the NAG Fortran compiler
Martin Kroeker [Fri, 12 Mar 2021 11:42:05 +0000 (12:42 +0100)]
Move includes under the ifdef for compilers w/o intrinsics support
Martin Kroeker [Thu, 11 Mar 2021 22:03:58 +0000 (23:03 +0100)]
Fix syntax
Martin Kroeker [Thu, 11 Mar 2021 14:17:48 +0000 (15:17 +0100)]
Merge pull request #3136 from austinpagan/Gemm.PQ
Modifying a couple parameters in the "POWER10"-specific section of pa…
Martin Kroeker [Thu, 11 Mar 2021 10:53:51 +0000 (11:53 +0100)]
Support compilation with nagfor
Martin Kroeker [Thu, 11 Mar 2021 10:52:29 +0000 (11:52 +0100)]
Support compilation with nagfor
Martin Kroeker [Thu, 11 Mar 2021 10:51:09 +0000 (11:51 +0100)]
Support compilation with the NAG Fortran compiler
Martin Kroeker [Thu, 11 Mar 2021 10:48:37 +0000 (11:48 +0100)]
Merge pull request #16 from xianyi/develop
rebase
Martin Kroeker [Thu, 11 Mar 2021 06:18:05 +0000 (07:18 +0100)]
Merge pull request #3137 from RajalakshmiSR/zscal_p10
Optimize zscal function for POWER10
austinpagan [Wed, 10 Mar 2021 23:19:12 +0000 (18:19 -0500)]
Modifying a couple parameters in the "POWER10"-specific section of param.h, for performance enhancements for SGEMM and DGEMM.
Rajalakshmi Srinivasaraghavan [Wed, 10 Mar 2021 23:15:33 +0000 (17:15 -0600)]
Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 6 Mar 2021 18:15:53 +0000 (19:15 +0100)]
Merge pull request #3130 from martin-frbg/issue3128
Replace spurious AVX512 requirement in the Haswell srot microkernel with an AVX2/FMA3 guard
Martin Kroeker [Sat, 6 Mar 2021 13:35:49 +0000 (14:35 +0100)]
Remove spurious AVX512 requirement and add AVX2/FMA3 guard
Martin Kroeker [Sat, 6 Mar 2021 08:13:59 +0000 (09:13 +0100)]
Merge pull request #3129 from RajalakshmiSR/asum_p10
Optimize s/dasum function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 5 Mar 2021 22:22:36 +0000 (16:22 -0600)]
Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Tue, 2 Mar 2021 20:27:21 +0000 (21:27 +0100)]
Merge pull request #3126 from martin-frbg/m1bench
Support timing Apple M1 in the benchmarks
Martin Kroeker [Tue, 2 Mar 2021 16:50:55 +0000 (17:50 +0100)]
Support timing Apple M1
Martin Kroeker [Tue, 2 Mar 2021 08:58:40 +0000 (09:58 +0100)]
Merge pull request #3125 from martin-frbg/issue3123
Fix AMD AOCC compiler detection
Martin Kroeker [Mon, 1 Mar 2021 20:00:10 +0000 (21:00 +0100)]
Fix AMD AOCC compiler detection
Martin Kroeker [Sun, 28 Feb 2021 21:13:09 +0000 (22:13 +0100)]
Merge pull request #3122 from martin-frbg/xeigtstz
Fix unusual stack size requirements of the LAPACK EIG tests (from Reference-LAPACK PR 335)
Martin Kroeker [Sun, 28 Feb 2021 17:57:05 +0000 (18:57 +0100)]
Adjust build rules for ?chkee.F
Martin Kroeker [Sun, 28 Feb 2021 17:53:20 +0000 (18:53 +0100)]
Adjust build rules for ?chkee.F
Martin Kroeker [Sun, 28 Feb 2021 17:51:03 +0000 (18:51 +0100)]
Add rewritten cchkee.F from Reference-LAPACK PR335
Martin Kroeker [Sun, 28 Feb 2021 17:50:26 +0000 (18:50 +0100)]
Add rewritten dchkee.F from Reference-LAPACK PR335
Martin Kroeker [Sun, 28 Feb 2021 17:49:50 +0000 (18:49 +0100)]
Add rewritten schkee.F from Reference-LAPACK PR335
Martin Kroeker [Sun, 28 Feb 2021 17:49:10 +0000 (18:49 +0100)]
Add rewritten zchkee.F from Reference-LAPACK PR335
Martin Kroeker [Sun, 28 Feb 2021 17:47:06 +0000 (18:47 +0100)]
Delete zchkee.f
Martin Kroeker [Sun, 28 Feb 2021 17:46:52 +0000 (18:46 +0100)]
Delete schkee.f
Martin Kroeker [Sun, 28 Feb 2021 17:46:38 +0000 (18:46 +0100)]
Delete dchkee.f
Martin Kroeker [Sun, 28 Feb 2021 17:46:08 +0000 (18:46 +0100)]
Delete cchkee.f
Martin Kroeker [Sat, 27 Feb 2021 18:15:49 +0000 (19:15 +0100)]
Merge pull request #3121 from RajalakshmiSR/mmarename
POWER10: Rename mma builtins
Rajalakshmi Srinivasaraghavan [Sat, 27 Feb 2021 02:56:34 +0000 (20:56 -0600)]
POWER10: Rename mma builtins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. This patch makes the
corresponding changes in the dgemm kernel. It also changes the
inputs to those builtins to avoid potential typecasting issues.
Reference gcc commit id:
77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
Martin Kroeker [Fri, 26 Feb 2021 10:50:47 +0000 (11:50 +0100)]
Merge pull request #3120 from martin-frbg/3118-x
Fix use of undefined CC variable in f_check
Martin Kroeker [Fri, 26 Feb 2021 08:09:43 +0000 (09:09 +0100)]
fix undefined CC variable
Martin Kroeker [Fri, 26 Feb 2021 08:06:25 +0000 (09:06 +0100)]
Merge pull request #15 from xianyi/develop
rebase
Martin Kroeker [Fri, 26 Feb 2021 03:18:33 +0000 (04:18 +0100)]
Merge pull request #3119 from xianyi/revert-3118-issue3018-2
Revert "Fix undefined CC in f_check (again)"
Martin Kroeker [Fri, 26 Feb 2021 03:18:04 +0000 (04:18 +0100)]
Revert "Fix undefined CC in f_check (again)"
Martin Kroeker [Thu, 25 Feb 2021 12:48:41 +0000 (13:48 +0100)]
Merge pull request #3118 from martin-frbg/issue3018-2
Fix undefined CC in f_check (again)
Martin Kroeker [Thu, 25 Feb 2021 12:47:34 +0000 (13:47 +0100)]
fix undefined CC again
Martin Kroeker [Thu, 25 Feb 2021 12:45:27 +0000 (13:45 +0100)]
Merge pull request #14 from xianyi/develop
rebase
Martin Kroeker [Wed, 24 Feb 2021 17:39:28 +0000 (18:39 +0100)]
Merge pull request #3117 from haampie/fix-perl
use /usr/bin/env perl
Martin Kroeker [Wed, 24 Feb 2021 17:38:25 +0000 (18:38 +0100)]
Merge pull request #3114 from martin-frbg/issue3113
Fix dll_callback and p_process_term signatures for USE_TLS on Windows x64
Martin Kroeker [Wed, 24 Feb 2021 17:37:36 +0000 (18:37 +0100)]
Merge pull request #3115 from martin-frbg/issue2532
Replace unoptimized OMATCOPY_RT with 4x4 blocked version
Harmen Stoppels [Wed, 24 Feb 2021 13:07:20 +0000 (14:07 +0100)]
use /usr/bin/env perl