Martin Kroeker [Sat, 16 Feb 2019 17:36:39 +0000 (18:36 +0100)]
Fix inline assembly constraints
rework indices to allow marking argument lda as input and output.
Martin Kroeker [Sat, 16 Feb 2019 17:24:11 +0000 (18:24 +0100)]
Fix inline assembly constraints
rework indices to allow marking argument lda4 as input and output. For #2009
Martin Kroeker [Thu, 7 Feb 2019 19:06:13 +0000 (20:06 +0100)]
Restore dropped patches in the non-TLS branch of memory.c (#2004)
* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002, the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2, thereby dropping the commits for #1450, #1468, #1501, #1504 and #1520.
Martin Kroeker [Wed, 6 Feb 2019 07:39:24 +0000 (08:39 +0100)]
Merge pull request #2001 from martin-frbg/cmake-dynlist
Support DYNAMIC_LIST option in cmake
Martin Kroeker [Tue, 5 Feb 2019 23:29:30 +0000 (00:29 +0100)]
Merge pull request #2000 from martin-frbg/issue1989
Make c_check robust against old or incomplete perl installations
Martin Kroeker [Tue, 5 Feb 2019 22:51:40 +0000 (23:51 +0100)]
Support DYNAMIC_LIST option in cmake
e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" ..
original issue was #1639
Martin Kroeker [Tue, 5 Feb 2019 21:02:11 +0000 (22:02 +0100)]
Merge pull request #1999 from martin-frbg/issue1996-2
fix second instance of complex.h for c++ as well
Martin Kroeker [Tue, 5 Feb 2019 19:06:34 +0000 (20:06 +0100)]
Make c_check robust against old or incomplete perl installations
by catching and working around failures to load modules, and avoiding object-oriented syntax in tempfile creation.
Fixes #1989
Martin Kroeker [Tue, 5 Feb 2019 18:29:33 +0000 (19:29 +0100)]
fix second instance of complex.h for c++ as well
Martin Kroeker [Tue, 5 Feb 2019 16:39:59 +0000 (17:39 +0100)]
Merge pull request #1998 from martin-frbg/issue1992
Include complex rather than complex.h in C++ contexts
Martin Kroeker [Tue, 5 Feb 2019 12:30:13 +0000 (13:30 +0100)]
Include complex rather than complex.h in C++ contexts
to avoid name clashes e.g. with boost headers that use I as a generic placeholder.
Fixes #1992 as suggested by aprokop in that issue ticket.
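A minimal sketch of the idea behind this change, not the actual OpenBLAS header: the C header complex.h defines a macro named I, which can collide with code that uses I as an ordinary identifier, so a header shared between C and C++ can select the C++ header complex when compiled as C++. The helper i_macro_is_defined is hypothetical and exists only to make the effect observable.

```c
/* Illustrative only: choose the complex-number header by language. */
#ifdef __cplusplus
#include <complex>      /* C++ header, avoids pulling in the C macro I */
#else
#include <complex.h>    /* C99 header, defines the macro I */
#endif

/* Hypothetical helper: reports whether the macro I is visible here. */
int i_macro_is_defined(void) {
#ifdef I
    return 1;
#else
    return 0;
#endif
}
```

Compiled as C, the helper returns 1 because complex.h is what gets included.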
Martin Kroeker [Mon, 4 Feb 2019 15:52:04 +0000 (16:52 +0100)]
Merge pull request #1996 from quickwritereader/develop
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
Ubuntu [Mon, 4 Feb 2019 15:41:56 +0000 (15:41 +0000)]
Note for unused kernels
Ubuntu [Mon, 4 Feb 2019 06:57:11 +0000 (06:57 +0000)]
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
Martin Kroeker [Fri, 1 Feb 2019 20:04:47 +0000 (21:04 +0100)]
Merge pull request #1994 from quickwritereader/develop
sgemv cgemv pairs
Ubuntu [Fri, 1 Feb 2019 13:45:00 +0000 (13:45 +0000)]
sgemv cgemv pairs
Martin Kroeker [Fri, 1 Feb 2019 11:58:59 +0000 (12:58 +0100)]
Fix incorrect sgemv results for IBM z14
part of PR #1993 that was inadvertently misplaced into the toplevel directory
Martin Kroeker [Fri, 1 Feb 2019 11:57:01 +0000 (12:57 +0100)]
Delete misplaced file sgemv_t_4.c
from #1993; the file should have gone into kernel/zarch
Martin Kroeker [Thu, 31 Jan 2019 20:27:00 +0000 (21:27 +0100)]
Merge pull request #1993 from martin-frbg/aarnes-zarch
Various fixes for the new Z14 target
Martin Kroeker [Thu, 31 Jan 2019 20:24:55 +0000 (21:24 +0100)]
Improve the z14 SGEMVT kernel
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:22:26 +0000 (21:22 +0100)]
Fix precision of zarch DSDOT
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:21:40 +0000 (21:21 +0100)]
Fix typo in the zarch min/max kernels
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:18:09 +0000 (21:18 +0100)]
USE_TRMM on Z14
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:16:44 +0000 (21:16 +0100)]
Add cache sizes for Z14
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:15:50 +0000 (21:15 +0100)]
Add FORCE Z14
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:14:37 +0000 (21:14 +0100)]
Add parameters for Z14
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 20:13:46 +0000 (21:13 +0100)]
Add Z14 target
from patch provided by aarnez in #991
Martin Kroeker [Thu, 31 Jan 2019 18:10:03 +0000 (19:10 +0100)]
Merge pull request #1991 from maamountki/z14
[ZARCH] Z14 Support, BLAS 1/2 single precision implementations
maamountki [Thu, 31 Jan 2019 17:36:41 +0000 (19:36 +0200)]
Merge branch 'develop' into z14
maamountki [Thu, 31 Jan 2019 17:11:11 +0000 (19:11 +0200)]
[ZARCH] Add Z13 version for max/min functions
maamountki [Thu, 31 Jan 2019 16:52:11 +0000 (18:52 +0200)]
[ZARCH] Improve loading performance for camax/icamax
Martin Kroeker [Thu, 31 Jan 2019 14:27:21 +0000 (15:27 +0100)]
Fix wrong comparison that made IMIN identical to IMAX
as reported by aarnez in #1990
Martin Kroeker [Thu, 31 Jan 2019 14:25:15 +0000 (15:25 +0100)]
Fix wrong comparison that made IMIN identical to IMAX
as suggested in #1990
maamountki [Thu, 31 Jan 2019 07:26:50 +0000 (09:26 +0200)]
Remove ztest
maamountki [Tue, 29 Jan 2019 15:59:38 +0000 (17:59 +0200)]
[ZARCH] Fix bug in max/min functions
maamountki [Tue, 29 Jan 2019 01:47:49 +0000 (03:47 +0200)]
[ZARCH] Fix icamax/icamin
maamountki [Mon, 28 Jan 2019 15:52:23 +0000 (17:52 +0200)]
[ZARCH] Fix iamax/imax single precision
maamountki [Mon, 28 Jan 2019 15:32:24 +0000 (17:32 +0200)]
[ZARCH] Undo the last commit
maamountki [Mon, 28 Jan 2019 15:16:18 +0000 (17:16 +0200)]
[ZARCH] Fix bug in iamax/iamin/imax/imin
Martin Kroeker [Mon, 28 Jan 2019 14:44:57 +0000 (15:44 +0100)]
Merge pull request #1985 from martin-frbg/issue1984
Correct naming of getrf_parallel object
Martin Kroeker [Mon, 28 Jan 2019 14:44:42 +0000 (15:44 +0100)]
Merge pull request #1981 from edisongustavo/develop
Fix include directory of exported targets
Martin Kroeker [Mon, 28 Jan 2019 14:43:35 +0000 (15:43 +0100)]
Merge pull request #1978 from danielgindi/feature/msvc_cmake
Better support for MSVC/Windows in CMake (v0.3.x)
Martin Kroeker [Mon, 28 Jan 2019 14:42:57 +0000 (15:42 +0100)]
Merge pull request #1962 from brada4/r
Modernize R benchmarks slightly
Martin Kroeker [Sat, 26 Jan 2019 21:25:29 +0000 (22:25 +0100)]
Merge pull request #1987 from martin-frbg/issue1961
Change ARMV8 target with BINARY=32 to ARMV7 automatically
Martin Kroeker [Sat, 26 Jan 2019 16:52:33 +0000 (17:52 +0100)]
Change ARMV8 target to ARMV7 for BINARY=32
Martin Kroeker [Sat, 26 Jan 2019 16:47:22 +0000 (17:47 +0100)]
Change ARMV8 target to ARMV7 when BINARY=32 is set
fixes #1961
Martin Kroeker [Fri, 25 Jan 2019 23:45:45 +0000 (00:45 +0100)]
Correct naming of getrf_parallel object
fixes #1984
Martin Kroeker [Thu, 24 Jan 2019 08:17:48 +0000 (09:17 +0100)]
Merge pull request #1971 from martin-frbg/trsm-threshold
Shift transition to multithreading towards larger matrix sizes
Edison Gustavo Muenz [Wed, 23 Jan 2019 14:09:13 +0000 (15:09 +0100)]
Fix include directory of exported targets
Martin Kroeker [Wed, 23 Jan 2019 09:03:00 +0000 (10:03 +0100)]
Avoid penalizing tall skinny matrices
Martin Kroeker [Tue, 22 Jan 2019 20:10:38 +0000 (21:10 +0100)]
Merge pull request #1980 from martin-frbg/issue1979
Report SkylakeX as Haswell if compiler does not support AVX512
Martin Kroeker [Tue, 22 Jan 2019 17:55:43 +0000 (18:55 +0100)]
Syntax fix
Martin Kroeker [Tue, 22 Jan 2019 17:47:12 +0000 (18:47 +0100)]
Report SkylakeX as Haswell if compiler does not support AVX512
... or make was invoked with NO_AVX512=1
Daniel Cohen Gindi [Tue, 22 Jan 2019 12:38:01 +0000 (14:38 +0200)]
Adjust test script for correct deployment
Martin Kroeker [Tue, 22 Jan 2019 11:32:24 +0000 (12:32 +0100)]
Use VERSION_LESS for comparisons involving software version numbers
Daniel Cohen Gindi [Mon, 21 Jan 2019 06:35:23 +0000 (08:35 +0200)]
Better support for MSVC/Windows in CMake
maamountki [Mon, 21 Jan 2019 13:56:04 +0000 (15:56 +0200)]
[ZARCH] Update max/min functions
Martin Kroeker [Sun, 20 Jan 2019 19:30:11 +0000 (20:30 +0100)]
Merge pull request #1973 from martin-frbg/issue1464
Increase Zen SWITCH_RATIO to 16
Martin Kroeker [Sun, 20 Jan 2019 11:18:53 +0000 (12:18 +0100)]
Fix compilation with NO_AVX=1 set
fixes #1974
Martin Kroeker [Sat, 19 Jan 2019 22:01:31 +0000 (23:01 +0100)]
Increase Zen SWITCH_RATIO to 16
following GEMM benchmarks on Ryzen2700X. For #1464
Martin Kroeker [Fri, 18 Jan 2019 23:10:01 +0000 (00:10 +0100)]
Shift transition to multithreading towards larger matrix sizes
See #1886 and JuliaRobotics issue 500. trsm benchmarks on Haswell and Zen showed that with these values performance is roughly doubled for matrix sizes between 8x8 and 14x14, and still 10 to 20 percent better near the new cutoff at 32x32.
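The kind of size-based decision described here can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the name choose_num_threads, the macro SMALL_MATRIX_CUTOFF, and the use of an element count with a strict comparison are all hypothetical; only the 32x32 figure comes from the message above.

```c
#include <assert.h>

/* Assumed cutoff, taken from the "32x32" figure in the commit message. */
#define SMALL_MATRIX_CUTOFF (32L * 32L)

/* Hypothetical helper: how many threads to use for an m-by-n problem. */
int choose_num_threads(long m, long n, int max_threads) {
    assert(m >= 0 && n >= 0 && max_threads >= 1);
    if (m * n < SMALL_MATRIX_CUTOFF)
        return 1;           /* small problem: stay single-threaded */
    return max_threads;     /* large enough to amortize threading overhead */
}
```

With this sketch, an 8x8 or 14x14 problem stays on one thread, while 32x32 and larger problems use all available threads.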
Martin Kroeker [Fri, 18 Jan 2019 07:11:39 +0000 (08:11 +0100)]
Fix declaration of input arguments in the Sandybridge GER microkernels (#1967)
* Tag arguments 0 and 1 as both input and output
Martin Kroeker [Fri, 18 Jan 2019 07:11:07 +0000 (08:11 +0100)]
Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966)
* Tag arguments 0 and 1 as both input and output (see #1964)
Martin Kroeker [Thu, 17 Jan 2019 22:20:32 +0000 (23:20 +0100)]
Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965)
* Tag operands 0 and 1 as both input and output
For #1964 (basically a continuation of coding problems first seen in #1292)
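A small sketch of what tagging operands as both input and output means in GCC-style extended inline assembly. It is not one of the OpenBLAS microkernels: the function sum_longs is hypothetical, and the assembly path is only used on x86_64 LP64 targets with a GNU-compatible compiler. The loop modifies the counter and the pointer, so both are declared with the "+r" read-write constraint rather than as plain inputs; declaring them input-only would let the compiler assume their registers are unchanged after the asm statement.

```c
#include <assert.h>

/* Hypothetical example: sum n longs with a hand-written loop. */
long sum_longs(const long *x, long n) {
    long acc = 0;
    assert(n >= 0);
#if defined(__x86_64__) && defined(__GNUC__) && defined(__LP64__)
    if (n == 0)
        return 0;
    __asm__ volatile(
        "1:\n\t"
        "addq (%1), %2\n\t"   /* acc += *x                 */
        "addq $8, %1\n\t"     /* x++  (pointer is changed) */
        "subq $1, %0\n\t"     /* n--  (counter is changed) */
        "jnz 1b\n\t"
        : "+r"(n), "+r"(x), "+r"(acc)  /* operands 0 and 1 are read and written */
        :
        : "cc", "memory");
#else
    /* Portable fallback for other targets. */
    for (long i = 0; i < n; i++)
        acc += x[i];
#endif
    return acc;
}
```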
Martin Kroeker [Thu, 17 Jan 2019 15:42:11 +0000 (16:42 +0100)]
Merge pull request #1970 from quickwritereader/develop
crot fix
Martin Kroeker [Thu, 17 Jan 2019 15:19:03 +0000 (16:19 +0100)]
Bump xcode version to 10.1 to make sure it handles AVX512
Ubuntu [Thu, 17 Jan 2019 14:45:31 +0000 (14:45 +0000)]
crot fix
Martin Kroeker [Wed, 16 Jan 2019 17:41:03 +0000 (18:41 +0100)]
Merge pull request #1963 from quickwritereader/develop
Blas1 single missing kernels implemented with vector builtins
Abdelrauf [Wed, 16 Jan 2019 15:25:13 +0000 (19:25 +0400)]
Merge branch 'develop' into develop
Ubuntu [Wed, 16 Jan 2019 15:16:21 +0000 (15:16 +0000)]
Added missing Blas1 single fp {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin}.
Fixed idamin, icamin choosing the first occurrence index of equal minima
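The tie-breaking rule mentioned for idamin can be illustrated with a plain scalar reference loop. This is a hedged sketch, not the vectorized kernel from the commit: ref_idamin and abs_val are hypothetical names, and the 1-based return value with 0 for an empty vector is assumed to follow the usual BLAS convention.

```c
#include <assert.h>

/* Local absolute value, to keep the sketch free of libm. */
static double abs_val(double v) { return v < 0 ? -v : v; }

/* Hypothetical reference: 1-based index of the smallest |x[i]|. */
long ref_idamin(long n, const double *x) {
    assert(n <= 0 || x != 0);
    if (n <= 0)
        return 0;
    long best = 0;
    for (long i = 1; i < n; i++) {
        /* Strict '<' keeps the earliest index when values tie. */
        if (abs_val(x[i]) < abs_val(x[best]))
            best = i;
    }
    return best + 1;
}
```

Using '<=' instead of '<' in the comparison would return the last of several equal minima, which is the kind of discrepancy a vectorized kernel has to avoid.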
Andrew [Wed, 16 Jan 2019 09:54:22 +0000 (11:54 +0200)]
disable NaN checks before BLAS calls dgemm.R
Andrew [Wed, 16 Jan 2019 09:41:46 +0000 (11:41 +0200)]
disable NaN checks before BLAS calls deig.R (shorten matrix def)
Andrew [Wed, 16 Jan 2019 09:38:14 +0000 (11:38 +0200)]
disable NaN checks before BLAS calls deig.R
Andrew [Wed, 16 Jan 2019 09:34:46 +0000 (11:34 +0200)]
disable NaN checks before BLAS calls dsolve.R (shorter formula)
Martin Kroeker [Wed, 16 Jan 2019 09:27:14 +0000 (10:27 +0100)]
Merge pull request #1960 from cnjsdfcy/Hygon
Add support for Hygon Dhyana
Andrew [Wed, 16 Jan 2019 09:23:51 +0000 (11:23 +0200)]
disable NaN checks before BLAS calls dsolve.R (shorter config part)
Andrew [Wed, 16 Jan 2019 09:18:54 +0000 (11:18 +0200)]
disable NaN checks before BLAS calls dsolve.R
Andrew [Wed, 16 Jan 2019 07:51:29 +0000 (09:51 +0200)]
init
caiyu [Wed, 16 Jan 2019 06:25:19 +0000 (14:25 +0800)]
Add support for Hygon Dhyana
maamountki [Tue, 15 Jan 2019 19:04:22 +0000 (21:04 +0200)]
[ZARCH] fix a bug in max/min functions
Martin Kroeker [Mon, 14 Jan 2019 21:41:31 +0000 (22:41 +0100)]
Fix missing braces in support_avx() call
Martin Kroeker [Mon, 14 Jan 2019 21:38:32 +0000 (22:38 +0100)]
Fix missing braces in support_avx()
maamountki [Fri, 11 Jan 2019 15:43:11 +0000 (17:43 +0200)]
[ZARCH] Update dgemv_n_4.c
maamountki [Fri, 11 Jan 2019 15:39:17 +0000 (17:39 +0200)]
[ZARCH] update cgemv_n_4.c
maamountki [Fri, 11 Jan 2019 15:37:11 +0000 (17:37 +0200)]
[ZARCH] Update cgemv_t_4.c
maamountki [Fri, 11 Jan 2019 15:14:04 +0000 (17:14 +0200)]
Update sgemv_t_4.c
maamountki [Fri, 11 Jan 2019 15:13:02 +0000 (17:13 +0200)]
Update dgemv_t_4.c
maamountki [Fri, 11 Jan 2019 15:08:24 +0000 (17:08 +0200)]
[ZARCH] fix sgemv_n_4.c
maamountki [Fri, 11 Jan 2019 14:44:46 +0000 (16:44 +0200)]
[ZARCH] fix cgemv_n_4.c
Martin Kroeker [Thu, 10 Jan 2019 11:04:08 +0000 (12:04 +0100)]
Merge pull request #1957 from martin-frbg/issue1954
Move TLS key deletion to openblas_quit
Martin Kroeker [Wed, 9 Jan 2019 23:32:50 +0000 (00:32 +0100)]
Move TLS key deletion to openblas_quit
fixes #1954 (as suggested by thrasibule in that issue)
maamountki [Wed, 9 Jan 2019 14:50:07 +0000 (16:50 +0200)]
[ZARCH] fix data prefetch type in sdot
maamountki [Wed, 9 Jan 2019 14:49:44 +0000 (16:49 +0200)]
[ZARCH] fix data prefetch type in ddot
maamountki [Wed, 9 Jan 2019 14:33:54 +0000 (16:33 +0200)]
[ZARCH] fix dsdot.c
maamountki [Wed, 9 Jan 2019 05:43:45 +0000 (07:43 +0200)]
[ZARCH] fix cgemv_n_4.c
Martin Kroeker [Tue, 8 Jan 2019 19:44:08 +0000 (20:44 +0100)]
Merge pull request #1949 from martin-frbg/issue1947
Query AVX2 and AVX512VL support when selecting x86 kernels
Martin Kroeker [Tue, 8 Jan 2019 13:43:45 +0000 (14:43 +0100)]
Bump xcode to 8.3
Martin Kroeker [Tue, 8 Jan 2019 13:41:48 +0000 (14:41 +0100)]
Update OSX environment to Sierra
as homebrew seems to have dropped support for El Capitan in their gcc packages
Martin Kroeker [Tue, 8 Jan 2019 09:46:47 +0000 (10:46 +0100)]
Add travis_wait to the OSX brew install phase
Martin Kroeker [Sat, 5 Jan 2019 18:41:13 +0000 (19:41 +0100)]
Add message for SkylakeX and KNL fallbacks to Haswell