Martin Kroeker [Sun, 17 Feb 2019 10:36:04 +0000 (11:36 +0100)]
Merge pull request #1988 from TiborGY/patch-1
Reword/expand comments in Makefile.rule
TiborGY [Sat, 16 Feb 2019 22:26:13 +0000 (23:26 +0100)]
fix the the
Martin Kroeker [Sat, 16 Feb 2019 17:05:40 +0000 (18:05 +0100)]
Merge pull request #2021 from martin-frbg/gcc9fixes2
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
TiborGY [Sat, 16 Feb 2019 11:12:39 +0000 (12:12 +0100)]
Update Makefile.rule
add note about NUM_THREADS for package maintainers, add examples of programs that cause affinity troubles
Martin Kroeker [Fri, 15 Feb 2019 14:08:16 +0000 (15:08 +0100)]
Fix wrong constraints in inline assembly
for #2009
Martin Kroeker [Fri, 15 Feb 2019 14:02:54 +0000 (15:02 +0100)]
Merge pull request #2019 from martin-frbg/gcc9fixes
Fix unannounced modification of input operand 8 (lda4) in Haswell GEMVN microkernel
Martin Kroeker [Fri, 15 Feb 2019 09:10:04 +0000 (10:10 +0100)]
Rename operands to put lda on the input/output constraint list
Martin Kroeker [Fri, 15 Feb 2019 08:57:59 +0000 (09:57 +0100)]
Merge pull request #2020 from martin-frbg/issue1956
With the Intel compiler on Linux, prefer ifort for the final link step
Martin Kroeker [Thu, 14 Feb 2019 21:57:30 +0000 (22:57 +0100)]
With the Intel compiler on Linux, prefer ifort for the final link step
icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956
Martin Kroeker [Thu, 14 Feb 2019 21:43:18 +0000 (22:43 +0100)]
Save and restore input argument 8 (lda4)
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
Martin Kroeker [Thu, 14 Feb 2019 20:55:11 +0000 (21:55 +0100)]
Merge pull request #2018 from bartoldeman/fix-dgemv-znver1-tree-vectorize
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
Bart Oldeman [Thu, 14 Feb 2019 16:19:41 +0000 (16:19 +0000)]
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2
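The sketch below shows what a clobber list does. It assumes GCC-style extended inline assembly on x86_64. The function add_with_scratch is hypothetical and is not taken from the dgemv kernel.

The asm writes xmm0 and xmm1, so it lists them as clobbers. Without that list, the compiler may keep a live value in those registers across the asm statement, for example a vector produced by -ftree-vectorize. On other targets the sketch falls back to plain C.

```c
/* Hypothetical helper, not the OpenBLAS dgemv kernel. */
static double add_with_scratch(double a, double b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    double r;
    __asm__ (
        "movsd %1, %%xmm0\n\t"     /* xmm0 = a        */
        "movsd %2, %%xmm1\n\t"     /* xmm1 = b        */
        "addsd %%xmm1, %%xmm0\n\t" /* xmm0 += xmm1    */
        "movsd %%xmm0, %0\n\t"     /* r = xmm0        */
        : "=m"(r)
        : "m"(a), "m"(b)
        : "xmm0", "xmm1");         /* registers the asm overwrites */
    return r;
#else
    return a + b;                  /* portable fallback on other targets */
#endif
}
```

Dropping the clobber line would still assemble. It would only show up as a miscompilation once the optimizer happens to allocate xmm0 or xmm1 around the call.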
See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
Martin Kroeker [Thu, 14 Feb 2019 14:21:36 +0000 (15:21 +0100)]
Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly function (#2017)
* Fix missing clobber in blas_quickdivide assembly
Martin Kroeker [Thu, 14 Feb 2019 08:29:34 +0000 (09:29 +0100)]
Merge pull request #2013 from martin-frbg/issue2011
Fix invalid memory access in PPC gemm_beta
Martin Kroeker [Wed, 13 Feb 2019 21:08:37 +0000 (22:08 +0100)]
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), assuming a typo by K. Goto
Martin Kroeker [Wed, 13 Feb 2019 21:06:41 +0000 (22:06 +0100)]
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), presuming a typo by K. Goto
Martin Kroeker [Wed, 13 Feb 2019 19:15:56 +0000 (20:15 +0100)]
Merge pull request #2012 from maamountki/z14
[ZARCH] Many improvements
maamountki [Wed, 13 Feb 2019 19:06:25 +0000 (21:06 +0200)]
[ZARCH] Modify constraints
maamountki [Wed, 13 Feb 2019 10:54:35 +0000 (12:54 +0200)]
[ZARCH] Fix caxpy
Martin Kroeker [Tue, 12 Feb 2019 22:24:02 +0000 (23:24 +0100)]
Merge pull request #2010 from martin-frbg/issue2009
Fix declaration of input arguments in x86_64 GEMV, SYMV and DSCAL
Martin Kroeker [Tue, 12 Feb 2019 15:14:02 +0000 (16:14 +0100)]
Fix declaration of arguments in inline assembly
Argument 0 is modified, so it should be declared as both input and output
Martin Kroeker [Tue, 12 Feb 2019 15:00:18 +0000 (16:00 +0100)]
Fix declaration of assembly arguments in SSYMV and DSYMV microkernels
Arguments 0 and 1 are both input and output
Martin Kroeker [Tue, 12 Feb 2019 14:51:43 +0000 (15:51 +0100)]
Fix declaration of input arguments in inline assembly
Argument 0 is modified as it doubles as a counter
Martin Kroeker [Tue, 12 Feb 2019 14:33:48 +0000 (15:33 +0100)]
Fix declaration of input arguments in the x86_64 s/dGEMV_T and s/dGEMV_N kernels
Arguments 0 and 1 need to be tagged as both input and output
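The sketch below applies the same rule. It assumes GCC-style inline assembly on x86_64. The helper sum_int64 is hypothetical and is not one of the GEMV or SYMV kernels.

The asm loop advances the pointer and counts the length down to zero, so both operands use the "+r" (read-write) constraint. If they were declared as plain "r" inputs, the compiler could assume they still hold their original values after the asm statement and reuse them.

```c
#include <stdint.h>

/* Hypothetical helper: sum of n int64_t values using an asm loop. */
static int64_t sum_int64(const int64_t *x, int64_t n)
{
    int64_t acc = 0;
#if defined(__x86_64__) && defined(__GNUC__)
    if (n <= 0)
        return 0;
    __asm__ volatile (
        "1:\n\t"
        "addq (%1), %0\n\t"   /* acc += *x                 */
        "addq $8, %1\n\t"     /* x++ (8 bytes per element) */
        "decq %2\n\t"         /* n--, sets ZF at zero      */
        "jnz 1b\n\t"
        : "+r"(acc), "+r"(x), "+r"(n) /* all three are modified */
        :
        : "cc", "memory");            /* flags changed, memory read */
#else
    for (int64_t i = 0; i < n; i++)
        acc += x[i];
#endif
    return acc;
}
```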
maamountki [Tue, 12 Feb 2019 11:12:28 +0000 (13:12 +0200)]
[ZARCH] Fix cgemv_t_4
maamountki [Mon, 11 Feb 2019 14:01:13 +0000 (16:01 +0200)]
[ZARCH] Fix constraints and source code formatting
Martin Kroeker [Sun, 10 Feb 2019 22:24:45 +0000 (23:24 +0100)]
Fix potential memory leak in cpu enumeration on Linux (#2008)
* Fix potential memory leak in cpu enumeration with glibc
An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. Wrong calculation of the size argument in that call increased the likelihood of that failure. Fixes #2003
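The sketch below shows a leak-free version of the pattern. It assumes glibc's dynamically sized CPU set macros (CPU_ALLOC, CPU_ALLOC_SIZE, CPU_ZERO_S, CPU_COUNT_S, CPU_FREE). The helper count_affinity_cpus is hypothetical and is not OpenBLAS's actual enumeration code.

- The set is freed on the error path as well as on success.
- The size argument is the byte size returned by CPU_ALLOC_SIZE.
- Where those macros are unavailable, the sketch just returns the fallback.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes the CPU_*_S macros in glibc */
#endif
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: number of CPUs in this process's affinity mask,
   or `fallback` if the query is unavailable or fails. */
static int count_affinity_cpus(int max_cpus, int fallback)
{
#if defined(__linux__) && defined(CPU_ALLOC)
    cpu_set_t *set = CPU_ALLOC(max_cpus);
    if (set == NULL)
        return fallback;

    /* Byte size of the allocated set: not the CPU count and
       not sizeof(cpu_set_t). */
    size_t size = CPU_ALLOC_SIZE(max_cpus);
    CPU_ZERO_S(size, set);

    if (sched_getaffinity(0, size, set) != 0) {
        CPU_FREE(set);         /* free before the early return, or it leaks */
        return fallback;
    }

    int count = CPU_COUNT_S(size, set);
    CPU_FREE(set);
    return count;
#else
    (void)max_cpus;
    return fallback;
#endif
}
```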
Martin Kroeker [Thu, 7 Feb 2019 19:06:13 +0000 (20:06 +0100)]
Restore dropped patches in the non-TLS branch of memory.c (#2004)
* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002, the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2, thereby dropping the commits for #1450, #1468, #1501, #1504 and #1520.
maamountki [Wed, 6 Feb 2019 18:11:44 +0000 (20:11 +0200)]
[ZARCH] Undo the last commit
Martin Kroeker [Wed, 6 Feb 2019 07:39:24 +0000 (08:39 +0100)]
Merge pull request #2001 from martin-frbg/cmake-dynlist
Support DYNAMIC_LIST option in cmake
Martin Kroeker [Tue, 5 Feb 2019 23:29:30 +0000 (00:29 +0100)]
Merge pull request #2000 from martin-frbg/issue1989
Make c_check robust against old or incomplete perl installations
Martin Kroeker [Tue, 5 Feb 2019 22:51:40 +0000 (23:51 +0100)]
Support DYNAMIC_LIST option in cmake
e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" ..
original issue was #1639
Martin Kroeker [Tue, 5 Feb 2019 21:02:11 +0000 (22:02 +0100)]
Merge pull request #1999 from martin-frbg/issue1996-2
fix second instance of complex.h for c++ as well
Martin Kroeker [Tue, 5 Feb 2019 19:06:34 +0000 (20:06 +0100)]
Make c_check robust against old or incomplete perl installations
by catching and working around failures to load modules, and avoiding object-oriented syntax in tempfile creation.
Fixes #1989
Martin Kroeker [Tue, 5 Feb 2019 18:29:33 +0000 (19:29 +0100)]
fix second instance of complex.h for c++ as well
maamountki [Tue, 5 Feb 2019 17:17:08 +0000 (19:17 +0200)]
[ZARCH] Set alignment hint for vl/vst
Martin Kroeker [Tue, 5 Feb 2019 16:39:59 +0000 (17:39 +0100)]
Merge pull request #1998 from martin-frbg/issue1992
Include complex rather than complex.h in C++ contexts
Martin Kroeker [Tue, 5 Feb 2019 12:30:13 +0000 (13:30 +0100)]
Include complex rather than complex.h in C++ contexts
to avoid name clashes, e.g. with Boost headers that use I as a generic placeholder.
Fixes #1992 as suggested by aprokop in that issue ticket.
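The sketch below shows the header selection. It assumes a header shared by C and C++ callers. The typedef names example_complex_float and example_complex_double are hypothetical and are not OpenBLAS's own.

C++ translation units get <complex> and std::complex, so the C99 macro I never reaches them. C translation units keep <complex.h>.

```c
#ifdef __cplusplus
#include <complex>                 /* C++: std::complex, no macro named I */
typedef std::complex<float>  example_complex_float;
typedef std::complex<double> example_complex_double;
#else
#include <complex.h>               /* C99: _Complex types and the macro I */
typedef float _Complex  example_complex_float;
typedef double _Complex example_complex_double;
#endif
```

In a C++ file that also includes Boost, a template parameter named I no longer collides with a macro.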
maamountki [Tue, 5 Feb 2019 05:51:19 +0000 (07:51 +0200)]
[ZARCH] Fix copy constraint
maamountki [Tue, 5 Feb 2019 05:30:38 +0000 (07:30 +0200)]
[ZARCH] Format source code, Fix constraints
maamountki [Tue, 5 Feb 2019 05:25:38 +0000 (07:25 +0200)]
Merge pull request #1 from xianyi/develop
Update
Martin Kroeker [Mon, 4 Feb 2019 15:52:04 +0000 (16:52 +0100)]
Merge pull request #1996 from quickwritereader/develop
NBMAX=4096 for gemvn, added sgemvn 8x8 kernel for future use
Ubuntu [Mon, 4 Feb 2019 15:41:56 +0000 (15:41 +0000)]
Note for unused kernels
Ubuntu [Mon, 4 Feb 2019 06:57:11 +0000 (06:57 +0000)]
NBMAX=4096 for gemvn, added sgemvn 8x8 kernel for future use
Martin Kroeker [Fri, 1 Feb 2019 20:04:47 +0000 (21:04 +0100)]
Merge pull request #1994 from quickwritereader/develop
sgemv cgemv pairs
Ubuntu [Fri, 1 Feb 2019 13:45:00 +0000 (13:45 +0000)]
sgemv cgemv pairs
Martin Kroeker [Fri, 1 Feb 2019 11:58:59 +0000 (12:58 +0100)]
Fix incorrect sgemv results for IBM z14
part of PR #1993 that was inadvertently misplaced into the top-level directory
Martin Kroeker [Fri, 1 Feb 2019 11:57:01 +0000 (12:57 +0100)]
Delete misplaced file sgemv_t_4.c
from #1993; the file should have gone into kernel/zarch
Martin Kroeker [Thu, 31 Jan 2019 20:27:00 +0000 (21:27 +0100)]
Merge pull request #1993 from martin-frbg/aarnes-zarch
Various fixes for the new Z14 target
Martin Kroeker [Thu, 31 Jan 2019 20:24:55 +0000 (21:24 +0100)]
Improve the z14 SGEMVT kernel
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:22:26 +0000 (21:22 +0100)]
Fix precision of zarch DSDOT
from patch provided by aarnez in #991
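DSDOT takes single-precision vectors but forms the products and the running sum in double precision. The sketch below shows those semantics, assuming unit strides. The routine dsdot_ref is a hypothetical reference, not the zarch kernel.

```c
#include <stddef.h>

/* Hypothetical reference for DSDOT semantics with unit stride:
   each float is widened to double before multiplying, and the
   sum is kept in double. */
static double dsdot_ref(size_t n, const float *x, const float *y)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * (double)y[i];
    return acc;
}
```

Take x = {16777216.0f, 1.0f} and y = {1.0f, 1.0f}. Here dsdot_ref returns 16777217.0, a value that a single-precision accumulator (24-bit significand) cannot represent exactly.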
Martin Kroeker [Thu, 31 Jan 2019 20:21:40 +0000 (21:21 +0100)]
Fix typo in the zarch min/max kernels
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:18:09 +0000 (21:18 +0100)]
USE_TRMM on Z14
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:16:44 +0000 (21:16 +0100)]
Add cache sizes for Z14
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:15:50 +0000 (21:15 +0100)]
Add FORCE Z14
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:14:37 +0000 (21:14 +0100)]
Add parameters for Z14
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:13:46 +0000 (21:13 +0100)]
Add Z14 target
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 18:10:03 +0000 (19:10 +0100)]
Merge pull request #1991 from maamountki/z14
[ZARCH] Z14 Support, BLAS 1/2 single precision implementations
maamountki [Thu, 31 Jan 2019 17:36:41 +0000 (19:36 +0200)]
Merge branch 'develop' into z14
maamountki [Thu, 31 Jan 2019 17:11:11 +0000 (19:11 +0200)]
[ZARCH] Add Z13 version for max/min functions
maamountki [Thu, 31 Jan 2019 16:52:11 +0000 (18:52 +0200)]
[ZARCH] Improve loading performance for camax/icamax
Martin Kroeker [Thu, 31 Jan 2019 14:27:21 +0000 (15:27 +0100)]
Fix wrong comparison that made IMIN identical to IMAX
as reported by aarnez in #1990
Martin Kroeker [Thu, 31 Jan 2019 14:25:15 +0000 (15:25 +0100)]
Fix wrong comparison that made IMIN identical to IMAX
as suggested in #1990
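The sketch below shows how the two index functions should differ. It assumes OpenBLAS's IMAX/IMIN convention: compare raw values (not absolute values as IAMAX/IAMIN do) and return a 1-based index. The routines imax_ref and imin_ref are hypothetical references, not the zarch kernels.

The comparison operator is the only difference between them. That is why a wrong comparison makes IMIN behave exactly like IMAX.

```c
#include <stddef.h>

/* Hypothetical reference: 1-based index of the largest element (0 if n == 0). */
static size_t imax_ref(size_t n, const double *x)
{
    if (n == 0)
        return 0;
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (x[i] > x[best])      /* strict '>' keeps the first occurrence */
            best = i;
    return best + 1;
}

/* Hypothetical reference: 1-based index of the smallest element (0 if n == 0). */
static size_t imin_ref(size_t n, const double *x)
{
    if (n == 0)
        return 0;
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (x[i] < x[best])      /* only difference from imax_ref: '<' */
            best = i;
    return best + 1;
}
```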
maamountki [Thu, 31 Jan 2019 07:26:50 +0000 (09:26 +0200)]
Remove ztest
maamountki [Tue, 29 Jan 2019 15:59:38 +0000 (17:59 +0200)]
[ZARCH] Fix bug in max/min functions
maamountki [Tue, 29 Jan 2019 01:47:49 +0000 (03:47 +0200)]
[ZARCH] Fix icamax/icamin
maamountki [Mon, 28 Jan 2019 15:52:23 +0000 (17:52 +0200)]
[ZARCH] Fix iamax/imax single precision
maamountki [Mon, 28 Jan 2019 15:32:24 +0000 (17:32 +0200)]
[ZARCH] Undo the last commit
maamountki [Mon, 28 Jan 2019 15:16:18 +0000 (17:16 +0200)]
[ZARCH] Fix bug in iamax/iamin/imax/imin
Martin Kroeker [Mon, 28 Jan 2019 14:44:57 +0000 (15:44 +0100)]
Merge pull request #1985 from martin-frbg/issue1984
Correct naming of getrf_parallel object
Martin Kroeker [Mon, 28 Jan 2019 14:44:42 +0000 (15:44 +0100)]
Merge pull request #1981 from edisongustavo/develop
Fix include directory of exported targets
Martin Kroeker [Mon, 28 Jan 2019 14:43:35 +0000 (15:43 +0100)]
Merge pull request #1978 from danielgindi/feature/msvc_cmake
Better support for MSVC/Windows in CMake (v0.3.x)
Martin Kroeker [Mon, 28 Jan 2019 14:42:57 +0000 (15:42 +0100)]
Merge pull request #1962 from brada4/r
Modernize R benchmarks slightly
TiborGY [Sun, 27 Jan 2019 16:22:26 +0000 (17:22 +0100)]
Update Makefile.rule
Revert generate to install, explain the nature of the affinity conflict
TiborGY [Sun, 27 Jan 2019 14:33:00 +0000 (15:33 +0100)]
Reword/expand comments in Makefile.rule
Lots of small changes in the wording of the comments, plus an expansion of the NUM_THREADS and NO_AFFINITY sections.
Martin Kroeker [Sat, 26 Jan 2019 21:25:29 +0000 (22:25 +0100)]
Merge pull request #1987 from martin-frbg/issue1961
Change ARMV8 target with BINARY=32 to ARMV7 automatically
Martin Kroeker [Sat, 26 Jan 2019 16:52:33 +0000 (17:52 +0100)]
Change ARMV8 target to ARMV7 for BINARY=32
Martin Kroeker [Sat, 26 Jan 2019 16:47:22 +0000 (17:47 +0100)]
Change ARMV8 target to ARMV7 when BINARY32 is set
fixes #1961
Martin Kroeker [Fri, 25 Jan 2019 23:45:45 +0000 (00:45 +0100)]
Correct naming of getrf_parallel object
fixes #1984
Martin Kroeker [Thu, 24 Jan 2019 08:17:48 +0000 (09:17 +0100)]
Merge pull request #1971 from martin-frbg/trsm-threshold
Shift transition to multithreading towards larger matrix sizes
Edison Gustavo Muenz [Wed, 23 Jan 2019 14:09:13 +0000 (15:09 +0100)]
Fix include directory of exported targets
Martin Kroeker [Wed, 23 Jan 2019 09:03:00 +0000 (10:03 +0100)]
Avoid penalizing tall skinny matrices
Martin Kroeker [Tue, 22 Jan 2019 20:10:38 +0000 (21:10 +0100)]
Merge pull request #1980 from martin-frbg/issue1979
Report SkylakeX as Haswell if compiler does not support AVX512
Martin Kroeker [Tue, 22 Jan 2019 17:55:43 +0000 (18:55 +0100)]
Syntax fix
Martin Kroeker [Tue, 22 Jan 2019 17:47:12 +0000 (18:47 +0100)]
Report SkylakeX as Haswell if compiler does not support AVX512
... or make was invoked with NO_AVX512=1
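The sketch below expresses the fallback rule. The names are hypothetical: effective_core and its flag parameters are not OpenBLAS's cpuid code.

A detected SkylakeX core is reported as Haswell whenever the AVX512 kernels cannot be used. That happens either because the compiler lacks AVX512 support or because NO_AVX512=1 was given.

```c
#include <string.h>

/* Hypothetical sketch: map the detected core name to the one the build
   should use, given compiler capability and the NO_AVX512 setting. */
static const char *effective_core(const char *detected,
                                  int compiler_has_avx512, int no_avx512)
{
    if (strcmp(detected, "SKYLAKEX") == 0 &&
        (!compiler_has_avx512 || no_avx512))
        return "HASWELL";   /* AVX2 kernels work on SkylakeX too */
    return detected;
}
```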
Daniel Cohen Gindi [Tue, 22 Jan 2019 12:38:01 +0000 (14:38 +0200)]
Adjust test script for correct deployment
Martin Kroeker [Tue, 22 Jan 2019 11:32:24 +0000 (12:32 +0100)]
Use VERSION_LESS for comparisons involving software version numbers
Daniel Cohen Gindi [Mon, 21 Jan 2019 06:35:23 +0000 (08:35 +0200)]
Better support for MSVC/Windows in CMake
maamountki [Mon, 21 Jan 2019 13:56:04 +0000 (15:56 +0200)]
[ZARCH] Update max/min functions
Martin Kroeker [Sun, 20 Jan 2019 19:30:11 +0000 (20:30 +0100)]
Merge pull request #1973 from martin-frbg/issue1464
Increase Zen SWITCH_RATIO to 16
Martin Kroeker [Sun, 20 Jan 2019 11:18:53 +0000 (12:18 +0100)]
Fix compilation with NO_AVX=1 set
fixes #1974
Martin Kroeker [Sat, 19 Jan 2019 22:01:31 +0000 (23:01 +0100)]
Increase Zen SWITCH_RATIO to 16
following GEMM benchmarks on a Ryzen 2700X. For #1464
Martin Kroeker [Fri, 18 Jan 2019 23:10:01 +0000 (00:10 +0100)]
Shift transition to multithreading towards larger matrix sizes
See #1886 and JuliaRobotics issue 500. trsm benchmarks on Haswell and Zen showed that with these values performance is roughly doubled for matrix sizes between 8x8 and 14x14, and still 10 to 20 percent better near the new cutoff at 32x32.
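The sketch below shows a size cutoff for multithreading. The names are hypothetical (choose_threads, EXAMPLE_MT_CUTOFF), and it is not the actual trsm interface code. It places the cutoff on the element count m*n at the 32x32 mark mentioned above. Whether OpenBLAS tests m*n or each dimension separately is an assumption not stated here.

A rule on m*n rather than on each dimension keeps a tall, skinny problem from being forced onto a single thread.

```c
#include <stddef.h>

/* Illustrative cutoff: 32x32 elements. */
#define EXAMPLE_MT_CUTOFF ((size_t)32 * 32)

/* Hypothetical sketch: number of threads to use for an m x n problem. */
static int choose_threads(size_t m, size_t n, int available)
{
    if (available < 1)
        return 1;
    if (m * n < EXAMPLE_MT_CUTOFF)
        return 1;            /* small problem: threading overhead dominates */
    return available;
}
```

With this rule a 4096 x 4 problem still gets all available threads. A per-dimension test such as "both m and n at least 32" would have run it single-threaded.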
Martin Kroeker [Fri, 18 Jan 2019 07:11:39 +0000 (08:11 +0100)]
Fix declaration of input arguments in the Sandybridge GER microkernels (#1967)
* Tag arguments 0 and 1 as both input and output
Martin Kroeker [Fri, 18 Jan 2019 07:11:07 +0000 (08:11 +0100)]
Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966)
* Tag arguments 0 and 1 as both input and output (see #1964)
Martin Kroeker [Thu, 17 Jan 2019 22:20:32 +0000 (23:20 +0100)]
Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965)
* Tag operands 0 and 1 as both input and output
For #1964 (basically a continuation of coding problems first seen in #1292)
Martin Kroeker [Thu, 17 Jan 2019 15:42:11 +0000 (16:42 +0100)]
Merge pull request #1970 from quickwritereader/develop
crot fix
Martin Kroeker [Thu, 17 Jan 2019 15:19:03 +0000 (16:19 +0100)]
Bump Xcode version to 10.1 to make sure it handles AVX512
Ubuntu [Thu, 17 Jan 2019 14:45:31 +0000 (14:45 +0000)]
crot fix
Martin Kroeker [Wed, 16 Jan 2019 17:41:03 +0000 (18:41 +0100)]
Merge pull request #1963 from quickwritereader/develop
Blas1 single missing kernels implemented with vector builtins