Martin Kroeker [Fri, 26 Jan 2018 22:20:00 +0000 (23:20 +0100)]
Make USE_TRMM depend on TARGET_CORE not TARGET
Fixes #1432 (and possibly other DTRMM-related failures on Haswell and related architectures when built with cmake)
Martin Kroeker [Wed, 24 Jan 2018 08:02:55 +0000 (09:02 +0100)]
Merge pull request #1429 from martin-frbg/override_omp
When forcing USE_THREAD to zero, override USE_OPENMP as well
Martin Kroeker [Tue, 23 Jan 2018 20:33:21 +0000 (21:33 +0100)]
When forcing USE_THREAD=0, override USE_OPENMP as well
This avoids an error exit a few lines down as USE_THREAD=0 conflicts with USE_OPENMP=1
Martin Kroeker [Sat, 20 Jan 2018 17:20:42 +0000 (18:20 +0100)]
Merge pull request #1417 from xoviat/openblas-library-name
CMake: Use the correct library name on windows
Martin Kroeker [Sat, 20 Jan 2018 16:34:54 +0000 (17:34 +0100)]
Merge pull request #1426 from quickwritereader/develop
(Z13) Blas1 microkernels can be inlined by gcc. Refactoring, fixes, tunings
the mslm [Thu, 18 Jan 2018 02:05:44 +0000 (18:05 -0800)]
Blas1 microkernels can be inlined by gcc. Refactoring (symbolic operand
names). Some fixes and tunings
xoviat [Thu, 11 Jan 2018 17:34:53 +0000 (11:34 -0600)]
CMake: Use the correct library name on Windows
FindBLAS and FindLAPACK don't correctly detect the OpenBLAS library
name on Windows. I am unsure why this was originally set this way, but
it has caused me some trouble.
Martin Kroeker [Thu, 11 Jan 2018 07:35:02 +0000 (08:35 +0100)]
Merge pull request #1415 from quickwritereader/develop
(Z systems Z13) small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) microkernels…
Abdelrauf [Fri, 13 Oct 2017 15:29:27 +0000 (19:29 +0400)]
small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) microkernels can be inlined
Martin Kroeker [Sat, 6 Jan 2018 19:02:46 +0000 (20:02 +0100)]
Merge pull request #1410 from brada4/develop
Address warnings #1357
Martin Kroeker [Sat, 6 Jan 2018 19:01:35 +0000 (20:01 +0100)]
Merge pull request #1411 from christoph-conrads/fix-pkgconfig-path-escapes
Make: escape paths to pkg-config file
Christoph Conrads [Fri, 5 Jan 2018 22:08:55 +0000 (17:08 -0500)]
Make: escape paths to pkg-config file
Add double quotes around the path to the pkg-config file so that a path
containing whitespace does not break the build.
Andrew [Tue, 2 Jan 2018 13:38:50 +0000 (14:38 +0100)]
LAPACK helpers in C that need care too
Andrew [Mon, 1 Jan 2018 19:57:12 +0000 (20:57 +0100)]
address last warnings as seen by gcc7
Andrew [Mon, 1 Jan 2018 19:56:26 +0000 (20:56 +0100)]
remove surplus parentheses to silence clang5
Andrew [Mon, 1 Jan 2018 19:54:39 +0000 (20:54 +0100)]
Eliminate remaining unused results in kernels (clang5 analyzer)
Martin Kroeker [Sun, 31 Dec 2017 19:18:48 +0000 (20:18 +0100)]
Merge pull request #1409 from martin-frbg/issue1292-2
Tag %1 and %2 as both input and output operands
Martin Kroeker [Sun, 31 Dec 2017 17:03:36 +0000 (18:03 +0100)]
Tag %1 and %2 as both input and output operands
fix from #1292 extended to the other gemv microkernels
Martin Kroeker [Sat, 30 Dec 2017 13:52:21 +0000 (14:52 +0100)]
Merge pull request #1408 from xoviat/flang-ninja
Appveyor: speed up fortran builds
Martin Kroeker [Sat, 30 Dec 2017 13:52:03 +0000 (14:52 +0100)]
Merge pull request #1406 from martin-frbg/issue1292
Tag %1 and %2 as both input and output
Martin Kroeker [Sat, 30 Dec 2017 13:51:34 +0000 (14:51 +0100)]
Merge pull request #1403 from brada4/develop
Address few more warnings
xoviat [Sat, 30 Dec 2017 01:58:35 +0000 (19:58 -0600)]
Appveyor: enable building fortran with ninja
Martin Kroeker [Fri, 29 Dec 2017 22:56:41 +0000 (23:56 +0100)]
Tag %1 and %2 as both input and output
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
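To illustrate the kind of constraint change described above, here is a minimal sketch, not the actual OpenBLAS gemv microkernel. The function name `sum_doubles` is hypothetical. It assumes GCC or Clang extended inline assembly on x86-64 and falls back to plain C elsewhere. Because the assembly advances the pointer and decrements the counter, both are declared with `"+r"` (read and written) rather than as inputs only, so the compiler does not assume the registers still hold their original values afterwards.

```c
#include <stddef.h>

/* Hypothetical helper: sums n doubles starting at x. */
double sum_doubles(const double *x, size_t n)
{
    double acc = 0.0;
    if (n == 0)
        return acc;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile(
        "1:                 \n\t"
        "addsd  (%1), %0    \n\t"   /* acc += *x                     */
        "addq   $8, %1      \n\t"   /* advance pointer (modifies %1) */
        "decq   %2          \n\t"   /* decrement count (modifies %2) */
        "jnz    1b          \n\t"
        : "+x"(acc), "+r"(x), "+r"(n)   /* all three are read AND written */
        :                               /* no input-only operands         */
        : "cc", "memory");
#else
    /* Portable fallback for other compilers and architectures. */
    while (n--)
        acc += *x++;
#endif
    return acc;
}
```

Had `x` and `n` been listed only as inputs, the compiler would be entitled to reuse those registers after the asm statement as if they were unchanged, which is the sort of surprise the commit message refers to.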
Andrew [Tue, 26 Dec 2017 08:24:24 +0000 (09:24 +0100)]
initialize potentially uninitialized variables (clang5)
Andrew [Thu, 21 Dec 2017 23:56:35 +0000 (00:56 +0100)]
fix couple of dead assignment warnings
Andrew [Thu, 21 Dec 2017 23:55:40 +0000 (00:55 +0100)]
remove unused buffer
Martin Kroeker [Thu, 21 Dec 2017 22:36:52 +0000 (23:36 +0100)]
Merge pull request #1399 from martin-frbg/issue1398
Fix LAPACKE build problems with both cmake and make
Martin Kroeker [Thu, 21 Dec 2017 19:42:30 +0000 (20:42 +0100)]
Add conditionals around ar calls for optional modules
The macOS ar aborts when it gets called with no input, see #1398
Martin Kroeker [Thu, 21 Dec 2017 18:43:09 +0000 (19:43 +0100)]
Restore LAPACKE files for Xgeqpf, Xggsvd and Xggsvp
These were inadvertently dropped from the list in my PR #1095
Martin Kroeker [Wed, 13 Dec 2017 19:27:14 +0000 (20:27 +0100)]
Merge pull request #1393 from martin-frbg/daxpybug
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
Martin Kroeker [Wed, 13 Dec 2017 17:40:39 +0000 (18:40 +0100)]
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
related to issue #1332
Martin Kroeker [Sun, 10 Dec 2017 20:46:36 +0000 (21:46 +0100)]
Merge pull request #1390 from martin-frbg/daxpybug
Use Sandybridge daxpy kernel on Haswell and Zen for now
Martin Kroeker [Sun, 10 Dec 2017 18:24:31 +0000 (19:24 +0100)]
Use Sandybridge daxpy kernel on Haswell and Zen for now
The testcase from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with
any of the other Intel x86_64 microkernels.
Martin Kroeker [Sat, 9 Dec 2017 21:29:03 +0000 (22:29 +0100)]
Issue1388 (#1389)
* Calculation of chunk range limits was ignoring num_cpu
bug introduced by me in #1262 - should fix #1388
* Calculation of range limits was ignoring num_cpu
bug introduced by me in #1262
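As a rough illustration of why the number of already assigned workers matters when computing chunk limits, the sketch below splits `m` items over `nthreads` workers. This is not the OpenBLAS code; `split_range` and its parameters are hypothetical names. The point is that each chunk width is derived from the work still left divided by the workers still unassigned (`nthreads - num_cpu`), so ignoring `num_cpu` would produce wrong limits.

```c
#include <stddef.h>

/* Hypothetical helper: fills range[0..nthreads] so that worker i handles
 * the half-open interval [range[i], range[i+1]).
 * Returns the number of workers that actually received work. */
int split_range(long m, int nthreads, long *range)
{
    int num_cpu = 0;     /* workers assigned so far */
    long remaining = m;  /* items not yet assigned  */

    range[0] = 0;
    while (remaining > 0 && num_cpu < nthreads) {
        /* Divide what is left among the workers that are still free. */
        int left = nthreads - num_cpu;
        long width = (remaining + left - 1) / left;  /* ceiling division */

        range[num_cpu + 1] = range[num_cpu] + width;
        remaining -= width;
        num_cpu++;
    }
    return num_cpu;
}
```

With `m = 10` and `nthreads = 4` this yields the limits 0, 3, 6, 8, 10, and with fewer items than workers only as many workers as there are items receive a chunk.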
Andrew [Tue, 5 Dec 2017 18:54:10 +0000 (19:54 +0100)]
warning cleanup (#1380)
* dead increments in driver/level2
* dead increments in kernel/generic
* part dead increments in kernel/x86_64
Martin Kroeker [Tue, 5 Dec 2017 18:53:23 +0000 (19:53 +0100)]
Merge pull request #1382 from martin-frbg/dtrmv-1332
Work around errors in multithreaded dtrmv
Martin Kroeker [Tue, 5 Dec 2017 18:52:52 +0000 (19:52 +0100)]
Merge pull request #1386 from martin-frbg/bignuma
Limit MAX_CPU to 1024 for now
Martin Kroeker [Tue, 5 Dec 2017 18:52:03 +0000 (19:52 +0100)]
Merge pull request #1387 from martin-frbg/cmakeandroid
Explicitly link against libm on Android with cmake as well
Martin Kroeker [Tue, 5 Dec 2017 12:02:48 +0000 (13:02 +0100)]
Explicitly link against libm on Android with cmake as well
Patch from #1384
Martin Kroeker [Tue, 5 Dec 2017 11:54:15 +0000 (12:54 +0100)]
Limit MAX_CPU to 1024 for now
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348, the highest value to survive array initialization (on a desktop system) is 3232, and 1024, which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
Martin Kroeker [Sun, 3 Dec 2017 21:41:54 +0000 (22:41 +0100)]
Disable gemv unrolling
as a (hopefully temporary) workaround for #1332
Martin Kroeker [Sun, 3 Dec 2017 21:40:54 +0000 (22:40 +0100)]
Disable multithreading for trmv
as a (hopefully temporary) workaround for #1332
Martin Kroeker [Sun, 3 Dec 2017 20:35:20 +0000 (21:35 +0100)]
Merge pull request #1381 from martin-frbg/ctest-warnings
Fix compiler warnings in ctest
Martin Kroeker [Sun, 3 Dec 2017 17:19:30 +0000 (18:19 +0100)]
Fix compiler warnings in ctest
Various fixes for const correctness, stray tab characters and unused labels
Martin Kroeker [Sun, 3 Dec 2017 12:04:02 +0000 (13:04 +0100)]
Merge pull request #1379 from martin-frbg/warnfix
Work around compiler warnings for unused variables
Martin Kroeker [Sat, 2 Dec 2017 21:51:58 +0000 (22:51 +0100)]
Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels
Martin Kroeker [Sat, 2 Dec 2017 15:49:47 +0000 (16:49 +0100)]
Merge pull request #1372 from martin-frbg/param
Correct zgeadd_k prototype
Martin Kroeker [Sat, 2 Dec 2017 11:59:27 +0000 (12:59 +0100)]
Prefix make jobs with travis_wait (#1378)
* Prefix make with travis_wait to prevent it getting killed for producing no output
* Extend travis_wait to 30mins for the windows build
* Trying 45 mins wait time
* Increase travis_wait time to 45 minutes for linux builds as well
Martin Kroeker [Fri, 1 Dec 2017 15:22:35 +0000 (16:22 +0100)]
Merge pull request #1377 from isuruf/threads
Allow overriding NUM_THREADS in cmake
Isuru Fernando [Fri, 1 Dec 2017 07:39:46 +0000 (01:39 -0600)]
Allow overriding NUM_THREADS
Martin Kroeker [Fri, 1 Dec 2017 07:11:12 +0000 (08:11 +0100)]
Merge pull request #1376 from xoviat/patch-2
[appveyor] fix test directory
xoviat [Thu, 30 Nov 2017 22:31:09 +0000 (16:31 -0600)]
[appveyor] fix test directory
Martin Kroeker [Thu, 30 Nov 2017 21:43:54 +0000 (22:43 +0100)]
Merge pull request #1375 from xoviat/patch-1
[appveyor] Use out-of-tree build and cache
xoviat [Thu, 30 Nov 2017 21:33:32 +0000 (15:33 -0600)]
[appveyor] fix syntax
xoviat [Thu, 30 Nov 2017 21:30:10 +0000 (15:30 -0600)]
[appveyor] Use out-of-tree build and cache
Martin Kroeker [Thu, 30 Nov 2017 11:54:52 +0000 (12:54 +0100)]
Merge pull request #1373 from mc10/patch-1
README: Use the SVG Travis badge
Kevin Ji [Wed, 29 Nov 2017 23:21:12 +0000 (15:21 -0800)]
README: Use the SVG Travis badge
Martin Kroeker [Wed, 29 Nov 2017 18:57:35 +0000 (19:57 +0100)]
Correct zgeadd_k prototype
Martin Kroeker [Wed, 29 Nov 2017 18:55:21 +0000 (19:55 +0100)]
Merge pull request #1371 from martin-frbg/develop
Add trivially optimized DSDOT for POWER8
martin [Tue, 28 Nov 2017 17:38:07 +0000 (18:38 +0100)]
Add trivially optimized DSDOT for POWER8
Martin Kroeker [Tue, 28 Nov 2017 17:15:31 +0000 (18:15 +0100)]
Merge pull request #1369 from martin-frbg/dsdot
Add optimized dsdot to all other x86_64 kernels that use sdot.c
Martin Kroeker [Tue, 28 Nov 2017 17:15:04 +0000 (18:15 +0100)]
Merge pull request #1368 from brada4/develop
Eliminate warnings
Martin Kroeker [Sun, 26 Nov 2017 18:12:00 +0000 (19:12 +0100)]
Merge pull request #1366 from martin-frbg/develop
Update LAPACK to 3.8.0
Andrew [Sun, 26 Nov 2017 16:24:08 +0000 (17:24 +0100)]
more dead increments clang4 scan-build deadcode.deadstores
Andrew [Sun, 26 Nov 2017 12:26:11 +0000 (13:26 +0100)]
Eliminate 2-8 dead increments code
Andrew [Sat, 25 Nov 2017 01:54:37 +0000 (02:54 +0100)]
eliminate unread variable, after reiteration 3 of them (clang4)
Martin Kroeker [Fri, 24 Nov 2017 19:05:27 +0000 (20:05 +0100)]
Add trivially optimized dsdot based on sdot
Martin Kroeker [Fri, 24 Nov 2017 19:04:29 +0000 (20:04 +0100)]
Add trivially optimized dsdot based on sdot
Martin Kroeker [Fri, 24 Nov 2017 19:03:40 +0000 (20:03 +0100)]
Add trivially optimized dsdot based on sdot
Martin Kroeker [Fri, 24 Nov 2017 19:02:28 +0000 (20:02 +0100)]
Add trivially optimized dsdot based on sdot
Martin Kroeker [Fri, 24 Nov 2017 19:01:42 +0000 (20:01 +0100)]
Add trivially optimized dsdot based on sdot
Martin Kroeker [Fri, 24 Nov 2017 19:00:23 +0000 (20:00 +0100)]
Add trivially optimized dsdot based on sdot
Martin Kroeker [Fri, 24 Nov 2017 18:59:28 +0000 (19:59 +0100)]
Add trivially optimized dsdot based on sdot
Andrew [Fri, 24 Nov 2017 18:13:24 +0000 (19:13 +0100)]
eliminate Wunused-const gcc7 warning
Andrew [Fri, 24 Nov 2017 17:39:04 +0000 (18:39 +0100)]
fix spurious compiler warning (no code change)
Andrew [Fri, 24 Nov 2017 17:36:37 +0000 (18:36 +0100)]
fix spurious compiler warning (no code change)
martin [Fri, 24 Nov 2017 08:15:20 +0000 (09:15 +0100)]
fix location of lapacke_nancheck
martin [Fri, 24 Nov 2017 07:15:40 +0000 (08:15 +0100)]
update cmake files
martin [Thu, 23 Nov 2017 20:22:01 +0000 (21:22 +0100)]
update cmakefiles for lapack 3.8.0
martin [Thu, 23 Nov 2017 17:13:35 +0000 (18:13 +0100)]
Update LAPACK to 3.8.0
Martin Kroeker [Wed, 22 Nov 2017 20:13:41 +0000 (21:13 +0100)]
Merge pull request #1365 from xoviat/patch-1
[appveyor] use cmake from conda forge
xoviat [Wed, 22 Nov 2017 00:44:02 +0000 (18:44 -0600)]
[appveyor] use cmake from conda forge
Martin Kroeker [Sun, 19 Nov 2017 11:50:16 +0000 (12:50 +0100)]
Merge pull request #1364 from martin-frbg/shmem-init
Handle shmem init failures in cpu affinity setup code
Martin Kroeker [Sat, 18 Nov 2017 22:57:44 +0000 (23:57 +0100)]
Handle shmem init failures in cpu affinity setup code
Failures to obtain or attach shared memory segments would lead to an exit without explanation of the exact cause.
This change introduces a more verbose error message and tries to make the code continue without setting cpu affinity.
Fixes #1351
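The behaviour described above (report the cause, then continue without the optional feature) can be sketched as follows. This is an assumption-laden illustration using the System V calls `shmget`, `shmat`, `shmdt` and `shmctl`, not the OpenBLAS affinity code; the helper names `try_attach_shared` and `release_shared` are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: tries to create and attach a private shared memory
 * segment of `size` bytes. On failure it prints the reason and returns NULL
 * so that the caller can carry on without the feature needing the segment. */
void *try_attach_shared(size_t size, int *shmid_out)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1) {
        fprintf(stderr, "shmget failed: %s; continuing without shared state\n",
                strerror(errno));
        return NULL;
    }

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        fprintf(stderr, "shmat failed: %s; continuing without shared state\n",
                strerror(errno));
        shmctl(id, IPC_RMID, NULL);  /* do not leak the segment */
        return NULL;
    }

    *shmid_out = id;
    return addr;
}

/* Hypothetical helper: detaches and removes a segment obtained above. */
void release_shared(void *addr, int shmid)
{
    if (addr != NULL) {
        shmdt(addr);
        shmctl(shmid, IPC_RMID, NULL);
    }
}
```

A caller would test the return value for NULL and simply skip the CPU affinity setup in that case instead of exiting.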
Martin Kroeker [Sat, 18 Nov 2017 22:47:17 +0000 (23:47 +0100)]
Merge pull request #1359 from brada4/develop
Eliminate mode variable where not needed in syrk interface
Martin Kroeker [Sat, 18 Nov 2017 22:46:57 +0000 (23:46 +0100)]
Merge pull request #1347 from martin-frbg/issue1322
Change CBLAS complex functions to take void pointers
Martin Kroeker [Sat, 18 Nov 2017 19:28:02 +0000 (20:28 +0100)]
Make return parameter of cblas_Xdotc_sub, cblas_Xdotu_sub a void pointer as well
Martin Kroeker [Sat, 18 Nov 2017 17:58:40 +0000 (18:58 +0100)]
Make last parameter of cblas_Xdotc_sub/cblas_Xdotu_sub a void pointer as well
Martin Kroeker [Sat, 18 Nov 2017 17:56:30 +0000 (18:56 +0100)]
Fix declaration of cblas_Xdotc_sub and cblas_Xdotu_sub
last parameter of cblas_(c,z)dotc_sub and cblas_(c,z)dotu_sub should be void* as well
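For readers unfamiliar with the void pointer convention, the sketch below shows a function with the same parameter shape as the complex dot-product routines named above. It is a self-contained stand-in with the hypothetical name `demo_zdotu_sub`, not the OpenBLAS implementation, and it assumes positive strides and interleaved (real, imaginary) double pairs.

```c
#include <stddef.h>

/* Hypothetical stand-in: unconjugated dot product of two complex vectors
 * stored as interleaved (re, im) doubles. The vectors and the result are
 * passed as void pointers, so callers may use any layout-compatible type. */
void demo_zdotu_sub(int n, const void *x, int incx,
                    const void *y, int incy, void *ret)
{
    const double *xd = (const double *)x;
    const double *yd = (const double *)y;
    double *out = (double *)ret;
    double re = 0.0, im = 0.0;

    for (int i = 0; i < n; i++) {
        const double *a = xd + 2 * (ptrdiff_t)i * incx;
        const double *b = yd + 2 * (ptrdiff_t)i * incy;
        re += a[0] * b[0] - a[1] * b[1];
        im += a[0] * b[1] + a[1] * b[0];
    }
    out[0] = re;
    out[1] = im;
}
```

Because every complex argument is a `void *`, the same call works with a plain `double[2]` result, a user-defined struct of two doubles, or a C99 `double _Complex`, without casts at the call site.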
Andrew [Wed, 15 Nov 2017 14:32:38 +0000 (15:32 +0100)]
Eliminate mode variable where not needed
Martin Kroeker [Wed, 15 Nov 2017 10:31:43 +0000 (11:31 +0100)]
Merge pull request #1358 from martin-frbg/unused_vars
Clean up spurious unused variables in the kernels
Martin Kroeker [Tue, 14 Nov 2017 22:35:10 +0000 (23:35 +0100)]
Remove unused variables from Haswell dtrmm and Bulldozer dtrsm
Martin Kroeker [Tue, 14 Nov 2017 22:32:25 +0000 (23:32 +0100)]
Remove unused variables at0...at3 from ?symv_U
Martin Kroeker [Tue, 14 Nov 2017 22:29:42 +0000 (23:29 +0100)]
Remove unused (loop?) variable j from the gemv_n_4 implementations
Martin Kroeker [Tue, 14 Nov 2017 22:25:50 +0000 (23:25 +0100)]
Remove unused variable btpr
Martin Kroeker [Tue, 14 Nov 2017 22:23:44 +0000 (23:23 +0100)]
Silence an unused variable warning with a cast
The L2 cache size is not universally needed to assign default unrolling limits, but neither putting its declaration inside an ifdef nor cloning it into all ifdef sections that need it really makes sense here.
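A minimal sketch of the cast technique, under the assumption of a variable that only some preprocessor branches read. All names here (`query_l2_kb`, `default_unroll`, `DEMO_TUNE_BY_L2`) are hypothetical and the numbers are placeholders, not OpenBLAS tuning values.

```c
/* Hypothetical placeholder for a cache size query, in kilobytes. */
static int query_l2_kb(void)
{
    return 256;
}

/* Hypothetical function that picks a default unrolling limit. */
int default_unroll(void)
{
    int l2 = query_l2_kb();

    /* Referencing the variable once through a cast to void keeps compilers
     * from warning about it in builds where no branch below reads it. */
    (void)l2;

#if defined(DEMO_TUNE_BY_L2)
    return l2 >= 512 ? 8 : 4;
#else
    return 4;
#endif
}
```

The declaration stays in one place, outside any ifdef, which is the trade-off the commit message argues for.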
Martin Kroeker [Fri, 10 Nov 2017 21:16:31 +0000 (22:16 +0100)]
Merge pull request #1353 from xoviat/patch-1
[appveyor] use flang from conda-forge
Martin Kroeker [Fri, 10 Nov 2017 21:15:27 +0000 (22:15 +0100)]
Merge pull request #1356 from martin-frbg/lapack-issue196
Break out of potentially infinite rescaling loop after 1000 iterations
Martin Kroeker [Fri, 10 Nov 2017 19:02:21 +0000 (20:02 +0100)]
Break out of potentially infinite rescaling loop after 1000 iterations
Inf values in the input vector will survive rescaling, causing an infinite loop. The value of 1000 is arbitrarily chosen as a large but finite value with the intention to never interfere with regular calculations.
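The idea of an iteration cap can be sketched in a few lines. This is an illustration only, not the LAPACK routine that was patched; `rescale_until_small`, its parameters and the `-1` return convention are hypothetical, and `factor` is assumed to lie strictly between 0 and 1.

```c
#define MAX_RESCALE_ITER 1000  /* arbitrary large but finite cap, as in the commit */

/* Hypothetical helper: multiplies *x by factor until |*x| <= limit.
 * Returns the number of rescaling steps, or -1 if the cap was reached,
 * which happens for infinite input because Inf * factor is still Inf. */
int rescale_until_small(double *x, double limit, double factor)
{
    int iter = 0;

    while (*x > limit || *x < -limit) {
        if (iter >= MAX_RESCALE_ITER)
            return -1;  /* give up instead of looping forever */
        *x *= factor;
        iter++;
    }
    return iter;
}
```

For ordinary finite input the loop ends long before the cap, so the limit does not affect regular calculations.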
Martin Kroeker [Fri, 10 Nov 2017 08:11:03 +0000 (09:11 +0100)]
Merge pull request #1354 from martin-frbg/shmem
Try to handle shmget or shmat failing