Martin Kroeker [Tue, 20 Feb 2018 16:17:38 +0000 (17:17 +0100)]
Merge pull request #1466 from xianyi/revert-1464-issue1461
Revert "Add locks only for non-OPENMP multithreading"
Martin Kroeker [Tue, 20 Feb 2018 16:17:12 +0000 (17:17 +0100)]
Revert "Add locks only for non-OPENMP multithreading"
Martin Kroeker [Tue, 20 Feb 2018 13:08:24 +0000 (14:08 +0100)]
Merge pull request #1464 from martin-frbg/issue1461
Add locks only for non-OPENMP multithreading
Martin Kroeker [Tue, 20 Feb 2018 11:17:18 +0000 (12:17 +0100)]
Add locks only for non-OPENMP multithreading
to mitigate performance problems caused by #1052 and #1299 as seen in #1461
Martin Kroeker [Tue, 20 Feb 2018 09:07:17 +0000 (10:07 +0100)]
Restore the remaining utests (#1462)
* Restore the remaining utests
* Try fork test on Cygwin and Linux only; it hangs on at least ARMv8/Android as well
* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest
* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API)
Martin Kroeker [Sun, 18 Feb 2018 14:12:42 +0000 (15:12 +0100)]
Merge pull request #1463 from martin-frbg/rotmg2
Fix wrong conditionals in scaling loops of rotmg and update BLAS1 tests from netlib
Martin Kroeker [Sun, 18 Feb 2018 14:12:20 +0000 (15:12 +0100)]
Merge pull request #1460 from martin-frbg/issue1425
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
Martin Kroeker [Sun, 18 Feb 2018 11:54:52 +0000 (12:54 +0100)]
Fix conditionals in the rescaling against GAMSQ
Martin Kroeker [Sun, 18 Feb 2018 11:44:14 +0000 (12:44 +0100)]
Update single and double precision BLAS1 tests from LAPACK 3.8.0
adding tests for SROTMG, SROTM, SDSDOT, DROTMG, DROTM, DSDOT
Martin Kroeker [Sun, 18 Feb 2018 11:37:09 +0000 (12:37 +0100)]
Fix condition in both second scaling loops
Martin Kroeker [Wed, 14 Feb 2018 21:58:14 +0000 (22:58 +0100)]
Make "OMP task depend" sections conditional on OpenMP4, not just OpenMP
To allow compiling with gcc versions older than 4.9
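A minimal sketch of such a guard, assuming the compiler reports its OpenMP level through the standard _OPENMP macro (201307 corresponds to OpenMP 4.0, the first version with the depend clause). The macro HAVE_OMP_TASK_DEPEND and the function two_step are hypothetical names for illustration, not taken from the OpenBLAS sources.

```c
#include <assert.h>

/* OpenMP 4.0 is the first version that defines "task depend". */
#if defined(_OPENMP) && (_OPENMP >= 201307)
#define HAVE_OMP_TASK_DEPEND 1
#else
#define HAVE_OMP_TASK_DEPEND 0
#endif

/* Computes (x + 1) * 2 in two ordered steps. */
static int two_step(int x)
{
    int a = 0, b = 0;
#if HAVE_OMP_TASK_DEPEND
#pragma omp parallel
#pragma omp single
    {
        /* The second task may only start once the first has written a. */
#pragma omp task depend(out: a) shared(a)
        a = x + 1;
#pragma omp task depend(in: a) depend(out: b) shared(a, b)
        b = a * 2;
#pragma omp taskwait
    }
#else
    /* Older OpenMP, or no OpenMP at all: plain sequential fallback. */
    a = x + 1;
    b = a * 2;
#endif
    assert(b == 2 * a);
    return b;
}
```

Testing only for _OPENMP would let a pre-4.0 compiler see the depend clauses and reject them; checking the version value keeps the fallback path available.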
Martin Kroeker [Tue, 13 Feb 2018 21:44:45 +0000 (22:44 +0100)]
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
This was added in #1046 citing a problem with mingw, but in effect it quietly reduces thread safety on all non-Windows platforms (while -fopenmp is already disabled for Windows builds through the toplevel Makefile.system). Removing the filter fixes #1425
Martin Kroeker [Sun, 11 Feb 2018 22:13:18 +0000 (23:13 +0100)]
Merge pull request #1457 from martin-frbg/issue1456
test_fork is not meant to be run (nor expected to work) with OpenMP
Martin Kroeker [Sun, 11 Feb 2018 19:58:27 +0000 (20:58 +0100)]
test_fork is not meant (nor expected) to be run with OpenMP
Fixes #1456
Martin Kroeker [Sun, 11 Feb 2018 19:48:04 +0000 (20:48 +0100)]
Merge pull request #1454 from martin-frbg/issue1452
Keep the flag handling separate from the scaling loops in rotmg
Martin Kroeker [Sun, 11 Feb 2018 19:47:49 +0000 (20:47 +0100)]
Merge pull request #1449 from martin-frbg/armv8
Enable assembly kernels for the generic ARMV8 target and treat CortexA53,A72 as A57
Martin Kroeker [Sat, 10 Feb 2018 13:18:21 +0000 (14:18 +0100)]
Add tests for rotmg
Martin Kroeker [Sat, 10 Feb 2018 13:17:41 +0000 (14:17 +0100)]
Add rotmg tests for CMAKE MSVC+CLANG build
Martin Kroeker [Sat, 10 Feb 2018 13:10:55 +0000 (14:10 +0100)]
Merge current Makefile from develop
Martin Kroeker [Sat, 10 Feb 2018 11:48:03 +0000 (12:48 +0100)]
Resurrect utest for rotmg and add testcase for issue 1452
Martin Kroeker [Fri, 9 Feb 2018 22:06:50 +0000 (23:06 +0100)]
Remove debug printfs
Martin Kroeker [Fri, 9 Feb 2018 22:00:03 +0000 (23:00 +0100)]
Keep the flag handling separate from the scaling loops
Fixes #1452 and is more in line with how ATLAS does it. The earlier fix from #356 only moved the bug elsewhere, but we will never want the iterative rescaling to change the dflag setting and variable associations with each cycle.
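The idea can be sketched as follows, assuming the reference BLAS conventions for rotmg (GAM = 4096, flag 0 implying h11 = h22 = 1, flag 1 implying h12 = 1 and h21 = -1). The function names promote_flag and rescale_row are hypothetical; this is an illustration of the separation, not the OpenBLAS kernel.

```c
#include <assert.h>
#include <stddef.h>

#define GAM    4096.0
#define GAMSQ  16777216.0      /* GAM * GAM */
#define RGAMSQ 5.9604645e-8    /* approximately 1 / GAMSQ */

static double absd(double v) { return v < 0.0 ? -v : v; }

/* Done once, before any rescaling: fill in the matrix entries that the
   compact flag forms leave implicit, then switch to the full form (-1). */
static double promote_flag(double flag, double h[4] /* h11,h12,h21,h22 */)
{
    assert(h != NULL);
    if (flag == 0.0) { h[0] = 1.0; h[3] = 1.0; }
    else if (flag == 1.0) { h[1] = 1.0; h[2] = -1.0; }
    return -1.0;
}

/* Iterative rescaling of one row; it never touches the flag.
   Returns the number of cycles performed. */
static int rescale_row(double *d, double *x, double *ha, double *hb)
{
    int cycles = 0;
    if (*d == 0.0) return 0;
    while (absd(*d) <= RGAMSQ || absd(*d) >= GAMSQ) {
        if (absd(*d) <= RGAMSQ) {
            *d *= GAMSQ; *x /= GAM; *ha /= GAM; *hb /= GAM;
        } else {
            *d /= GAMSQ; *x *= GAM; *ha *= GAM; *hb *= GAM;
        }
        cycles++;
    }
    return cycles;
}
```

Because the flag is promoted before the loop, a second or third rescaling cycle cannot overwrite matrix entries that an earlier cycle has already scaled.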
Martin Kroeker [Thu, 8 Feb 2018 21:39:24 +0000 (22:39 +0100)]
Merge pull request #1453 from martin-frbg/netlib228
Remove spurious EXTERNAL reference
Martin Kroeker [Thu, 8 Feb 2018 21:39:11 +0000 (22:39 +0100)]
Merge pull request #1450 from embray/cygwin/forking
Fix issues with forking on Cygwin
Martin Kroeker [Thu, 8 Feb 2018 13:57:13 +0000 (14:57 +0100)]
Remove spurious EXTERNAL reference
From Reference-LAPACK issue 228, remove spurious EXTERNAL reference to unused and nonexistent function xLACGV that could cause linking problems.
Erik M. Bray [Tue, 6 Feb 2018 10:11:30 +0000 (11:11 +0100)]
On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe.
Erik M. Bray [Tue, 6 Feb 2018 10:10:45 +0000 (11:10 +0100)]
Perform blas_thread_shutdown with pthread_atfork() on Cygwin
Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state.
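A minimal sketch of registering a pre-fork handler with pthread_atfork(), assuming a POSIX environment. The pool_running variable and the functions shutdown_before_fork, install_prefork_handler and fork_once are hypothetical stand-ins; the real shutdown routine named in the commit is blas_thread_shutdown, whose internals are not shown here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int pool_running = 1;   /* stand-in for the threading server state */
static int prefork_calls = 0;  /* how often the handler has run */

/* Runs in the parent just before fork(): return the pool to a
   pre-initialized state so the child does not inherit dead threads. */
static void shutdown_before_fork(void)
{
    pool_running = 0;
    prefork_calls++;
}

/* Registers only a "prepare" handler; no parent or child handlers. */
static int install_prefork_handler(void)
{
    return pthread_atfork(shutdown_before_fork, NULL, NULL);
}

/* Forks a child that exits immediately and waits for it. */
static int fork_once(void)
{
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(0);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return 0;
}
```

The handler is registered through the pthreads API even though the worker threads themselves may be created by a different threading driver, which is the point made in the commit message.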
Erik M. Bray [Tue, 6 Feb 2018 10:07:56 +0000 (11:07 +0100)]
Rewrite this test to work with ctest and re-enable it on the appropriate platforms (including Cygwin, which has fork())
Martin Kroeker [Tue, 6 Feb 2018 10:42:58 +0000 (11:42 +0100)]
Enable most assembly kernels in the generic ARMV8 target
ref #1439
Martin Kroeker [Tue, 6 Feb 2018 10:38:18 +0000 (11:38 +0100)]
Detect CORTEX A53 and A72 as CORTEXA57
Martin Kroeker [Fri, 2 Feb 2018 08:02:49 +0000 (09:02 +0100)]
Merge pull request #1447 from martin-frbg/sparcfix
Generate CHAR_CORENAME for SPARC
Martin Kroeker [Thu, 1 Feb 2018 21:06:04 +0000 (22:06 +0100)]
Fix my copypaste blunder with get_corename
Martin Kroeker [Thu, 1 Feb 2018 17:20:38 +0000 (18:20 +0100)]
Use get_corename for SPARC as well
Martin Kroeker [Thu, 1 Feb 2018 17:15:15 +0000 (18:15 +0100)]
Return a corename for SPARC
Martin Kroeker [Thu, 1 Feb 2018 07:28:53 +0000 (08:28 +0100)]
Merge pull request #1445 from quickwritereader/develop
small fix inside ifdef z13mvc (z13mvc code is not used in production)
Abdelrauf [Thu, 1 Feb 2018 00:17:04 +0000 (16:17 -0800)]
Merge branch 'develop' into develop
Abdelrauf [Wed, 31 Jan 2018 15:49:38 +0000 (07:49 -0800)]
small fix inside ifdef z13mvc (z13mvc code is not used in production)
Martin Kroeker [Wed, 31 Jan 2018 22:49:01 +0000 (23:49 +0100)]
Merge pull request #1443 from martin-frbg/sparcfix
Also #define SPARC in config.h when autodetecting
Martin Kroeker [Wed, 31 Jan 2018 22:48:47 +0000 (23:48 +0100)]
Merge pull request #1440 from quickwritereader/develop
small corrections
Martin Kroeker [Wed, 31 Jan 2018 22:48:34 +0000 (23:48 +0100)]
Merge pull request #1419 from brada4/develop
Initialize uninitialized values for repeated calls
Martin Kroeker [Wed, 31 Jan 2018 21:02:00 +0000 (22:02 +0100)]
Also #define SPARC in config.h when autodetecting
Fixes #1442
Abdelrauf [Wed, 31 Jan 2018 15:49:38 +0000 (07:49 -0800)]
small fix
small fix inside ifdef z13mvc (z13mvc code is not used in production)
Martin Kroeker [Sat, 27 Jan 2018 13:26:49 +0000 (14:26 +0100)]
Merge pull request #1434 from xoviat/flang-wall
CMake: Remove unused wall option when FC=flang
Martin Kroeker [Sat, 27 Jan 2018 09:04:39 +0000 (10:04 +0100)]
Merge pull request #1436 from martin-frbg/cmaketrmm
Make USE_TRMM depend on TARGET_CORE not TARGET
the mslm [Sat, 27 Jan 2018 02:20:36 +0000 (18:20 -0800)]
zscal (case: real alpha=0) microkernel shift & mem fix, da_i as input reg. Small typo fixes
Martin Kroeker [Fri, 26 Jan 2018 22:20:00 +0000 (23:20 +0100)]
Make USE_TRMM depend on TARGET_CORE not TARGET
Fixes #1432 (and possibly other DTRMM-related failures on Haswell and related architectures when built with cmake)
xoviat [Fri, 26 Jan 2018 20:09:48 +0000 (14:09 -0600)]
CMake: Remove unused wall option when FC=flang
Martin Kroeker [Wed, 24 Jan 2018 08:02:55 +0000 (09:02 +0100)]
Merge pull request #1429 from martin-frbg/override_omp
When forcing USE_THREAD to zero, override USE_OPENMP as well
Martin Kroeker [Tue, 23 Jan 2018 20:33:21 +0000 (21:33 +0100)]
When forcing USE_THREAD=0, override USE_OPENMP as well
This avoids an error exit a few lines down as USE_THREAD=0 conflicts with USE_OPENMP=1
Martin Kroeker [Sat, 20 Jan 2018 17:20:42 +0000 (18:20 +0100)]
Merge pull request #1417 from xoviat/openblas-library-name
CMake: Use the correct library name on windows
Martin Kroeker [Sat, 20 Jan 2018 16:34:54 +0000 (17:34 +0100)]
Merge pull request #1426 from quickwritereader/develop
(Z13) BLAS1 microkernels can be inlined by gcc. Refactoring, fixes, tunings
Andrew [Sat, 20 Jan 2018 10:42:31 +0000 (11:42 +0100)]
take out unused variables
the mslm [Thu, 18 Jan 2018 02:05:44 +0000 (18:05 -0800)]
BLAS1 microkernels can be inlined by gcc. Refactoring (symbolic operand names). Some fixes and tunings
Andrew [Fri, 19 Jan 2018 22:17:43 +0000 (23:17 +0100)]
core.IdenticalExpr clang501 checker
Andrew [Fri, 19 Jan 2018 22:15:58 +0000 (23:15 +0100)]
core.IdenticalExpr clang501 checker
Andrew [Fri, 19 Jan 2018 22:11:12 +0000 (23:11 +0100)]
add missing brackets to silence indentation warnings gcc721
Andrew [Fri, 12 Jan 2018 21:35:48 +0000 (22:35 +0100)]
add missing bracket for old glibc (cppcheck)
Andrew [Fri, 12 Jan 2018 21:35:00 +0000 (22:35 +0100)]
Initialize values to silence cppcheck
Andrew [Fri, 12 Jan 2018 21:33:41 +0000 (22:33 +0100)]
Initialize uninitialized variables (cppcheck)
xoviat [Thu, 11 Jan 2018 17:34:53 +0000 (11:34 -0600)]
CMake: Use the correct library name on windows
FindBLAS and FindLAPACK don't correctly detect the OpenBLAS library
name on windows. I am unsure why this was originally set this way, but
it has caused me some trouble.
Martin Kroeker [Thu, 11 Jan 2018 07:35:02 +0000 (08:35 +0100)]
Merge pull request #1415 from quickwritereader/develop
(Z systems Z13) small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) microkernels…
Abdelrauf [Fri, 13 Oct 2017 15:29:27 +0000 (19:29 +0400)]
small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) microkernels can be inlined
Martin Kroeker [Sat, 6 Jan 2018 19:02:46 +0000 (20:02 +0100)]
Merge pull request #1410 from brada4/develop
Address warnings #1357
Martin Kroeker [Sat, 6 Jan 2018 19:01:35 +0000 (20:01 +0100)]
Merge pull request #1411 from christoph-conrads/fix-pkgconfig-path-escapes
Make: escape paths to pkg-config file
Christoph Conrads [Fri, 5 Jan 2018 22:08:55 +0000 (17:08 -0500)]
Make: escape paths to pkg-config file
Add double quotes around the path to the pkg-config file so that a path
containing whitespace does not break the build.
Andrew [Tue, 2 Jan 2018 13:38:50 +0000 (14:38 +0100)]
LAPACK helpers in C that need care too
Andrew [Mon, 1 Jan 2018 19:57:12 +0000 (20:57 +0100)]
address last warnings as seen by gcc7
Andrew [Mon, 1 Jan 2018 19:56:26 +0000 (20:56 +0100)]
remove surplus parentheses to silence clang5
Andrew [Mon, 1 Jan 2018 19:54:39 +0000 (20:54 +0100)]
Eliminate remaining unused results in kernels (clang5 analyzer)
Martin Kroeker [Sun, 31 Dec 2017 19:18:48 +0000 (20:18 +0100)]
Merge pull request #1409 from martin-frbg/issue1292-2
Tag %1 and %2 as both input and output operands
Martin Kroeker [Sun, 31 Dec 2017 17:03:36 +0000 (18:03 +0100)]
Tag %1 and %2 as both input and output operands
fix from #1292 extended to the other gemv microkernels
Martin Kroeker [Sat, 30 Dec 2017 13:52:21 +0000 (14:52 +0100)]
Merge pull request #1408 from xoviat/flang-ninja
Appveyor: speed up fortran builds
Martin Kroeker [Sat, 30 Dec 2017 13:52:03 +0000 (14:52 +0100)]
Merge pull request #1406 from martin-frbg/issue1292
Tag %1 and %2 as both input and output
Martin Kroeker [Sat, 30 Dec 2017 13:51:34 +0000 (14:51 +0100)]
Merge pull request #1403 from brada4/develop
Address few more warnings
xoviat [Sat, 30 Dec 2017 01:58:35 +0000 (19:58 -0600)]
Appveyor: enable building fortran with ninja
Martin Kroeker [Fri, 29 Dec 2017 22:56:41 +0000 (23:56 +0100)]
Tag %1 and %2 as both input and output
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
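A minimal sketch of the constraint change, assuming GCC-style extended asm on x86_64 (other targets take a plain C fallback). The function sum_longs is a hypothetical example, not one of the gemv microkernels; it only shows operands that the assembly modifies being declared with "+r" rather than as inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n longs. The asm loop advances the pointer and decrements the
   counter, so both are declared read-write ("+r") instead of input-only. */
static long sum_longs(const long *p, size_t n)
{
    long acc = 0;
    assert(p != NULL || n == 0);
    if (n == 0) return 0;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "1:\n\t"
        "addq (%1), %0\n\t"   /* acc += *p */
        "addq $8, %1\n\t"     /* p++       */
        "decq %2\n\t"         /* n--       */
        "jnz 1b\n\t"
        : "+r"(acc), "+r"(p), "+r"(n)   /* all three are modified */
        :
        : "cc", "memory");
#else
    while (n--) acc += *p++;
#endif
    return acc;
}
```

If p and n were listed as input-only operands, the compiler would be entitled to assume their registers still hold the original values after the asm statement, which is the kind of surprise under optimization that the commit message describes.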
Andrew [Tue, 26 Dec 2017 08:24:24 +0000 (09:24 +0100)]
initialize potentially uninitialized variables (clang5)
Andrew [Thu, 21 Dec 2017 23:56:35 +0000 (00:56 +0100)]
fix couple of dead assignment warnings
Andrew [Thu, 21 Dec 2017 23:55:40 +0000 (00:55 +0100)]
remove unused buffer
Martin Kroeker [Thu, 21 Dec 2017 22:36:52 +0000 (23:36 +0100)]
Merge pull request #1399 from martin-frbg/issue1398
Fix LAPACKE build problems with both cmake and make
Martin Kroeker [Thu, 21 Dec 2017 19:42:30 +0000 (20:42 +0100)]
Add conditionals around ar calls for optional modules
The macOS ar aborts when it gets called with no input, see #1398
Martin Kroeker [Thu, 21 Dec 2017 18:43:09 +0000 (19:43 +0100)]
Restore LAPACKE files for Xgeqpf, Xggsvd and Xggsvp
These were inadvertently dropped from the list in my PR #1095
Martin Kroeker [Wed, 13 Dec 2017 19:27:14 +0000 (20:27 +0100)]
Merge pull request #1393 from martin-frbg/daxpybug
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
Martin Kroeker [Wed, 13 Dec 2017 17:40:39 +0000 (18:40 +0100)]
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
related to issue #1332
Martin Kroeker [Sun, 10 Dec 2017 20:46:36 +0000 (21:46 +0100)]
Merge pull request #1390 from martin-frbg/daxpybug
Use Sandybridge daxpy kernel on Haswell and Zen for now
Martin Kroeker [Sun, 10 Dec 2017 18:24:31 +0000 (19:24 +0100)]
Use Sandybridge daxpy kernel on Haswell and Zen for now
The testcase from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with
any of the other Intel x86_64 microkernels.
Martin Kroeker [Sat, 9 Dec 2017 21:29:03 +0000 (22:29 +0100)]
Issue1388 (#1389)
* Calculation of chunk range limits was ignoring num_cpu
bug introduced by me in #1262 - should fix #1388
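For illustration, a sketch of chunk range limits that do take the number of CPUs into account. The function split_range and its signature are hypothetical and do not reproduce the OpenBLAS driver code; it only shows boundaries being derived from the remaining work and the remaining workers, so that no more than num_cpu chunks are produced.

```c
#include <assert.h>
#include <stddef.h>

/* Fills range[0..k] with boundaries covering [0, m) and returns k, the
   number of non-empty chunks. range must have room for num_cpu + 1 entries. */
static int split_range(long m, int num_cpu, long *range)
{
    int used = 0;
    long pos = 0;
    assert(m >= 0 && num_cpu > 0 && range != NULL);
    range[0] = 0;
    while (pos < m) {
        int left = num_cpu - used;                 /* workers still free */
        long width = (m - pos + left - 1) / left;  /* ceil(remaining / left) */
        pos += width;
        used++;
        range[used] = pos;
    }
    return used;
}
```

With m = 10 and num_cpu = 4 this yields the boundaries 0, 3, 6, 8, 10; with m = 2 and num_cpu = 4 only two chunks are created.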
Andrew [Tue, 5 Dec 2017 18:54:10 +0000 (19:54 +0100)]
warning cleanup (#1380)
* dead increments in driver/level2
* dead increments in kernel/generic
* part dead increments in kernel/x86_64
Martin Kroeker [Tue, 5 Dec 2017 18:53:23 +0000 (19:53 +0100)]
Merge pull request #1382 from martin-frbg/dtrmv-1332
Work around errors in multithreaded dtrmv
Martin Kroeker [Tue, 5 Dec 2017 18:52:52 +0000 (19:52 +0100)]
Merge pull request #1386 from martin-frbg/bignuma
Limit MAX_CPU to 1024 for now
Martin Kroeker [Tue, 5 Dec 2017 18:52:03 +0000 (19:52 +0100)]
Merge pull request #1387 from martin-frbg/cmakeandroid
Explicitly link against libm on Android with cmake as well
Martin Kroeker [Tue, 5 Dec 2017 12:02:48 +0000 (13:02 +0100)]
Explicitly link against libm on Android with cmake as well
Patch from #1384
Martin Kroeker [Tue, 5 Dec 2017 11:54:15 +0000 (12:54 +0100)]
Limit MAX_CPU to 1024 for now
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348, the highest value to survive array initialization (on a desktop system) is 3232, and 1024, which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
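A sketch of such a cap, assuming glibc's CPU_SETSIZE macro from sched.h (with a fallback of 1024 where the macro is not visible). MAX_CPU_LIMIT and effective_max_cpu are hypothetical names; how MAX_CPU is actually defined in the OpenBLAS sources is not shown in this log.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc exposes CPU_SETSIZE only with this */
#endif
#include <assert.h>
#include <sched.h>

#ifndef CPU_SETSIZE
#define CPU_SETSIZE 1024       /* assumed fallback: the usual limit */
#endif

#define MAX_CPU_LIMIT 1024     /* cap taken from the commit message */

/* Returns the system's set size, clamped to the cap. */
static int effective_max_cpu(int setsize)
{
    assert(setsize > 0);
    return setsize > MAX_CPU_LIMIT ? MAX_CPU_LIMIT : setsize;
}
```

On a distribution that reports 4096, arrays sized by the clamped value stay at 1024 entries, the size reported to work on the bignuma system mentioned above.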
Martin Kroeker [Sun, 3 Dec 2017 21:41:54 +0000 (22:41 +0100)]
Disable gemv unrolling
as a (hopefully temporary) workaround for #1332
Martin Kroeker [Sun, 3 Dec 2017 21:40:54 +0000 (22:40 +0100)]
Disable multithreading for trmv
as a (hopefully temporary) workaround for #1332
Martin Kroeker [Sun, 3 Dec 2017 20:35:20 +0000 (21:35 +0100)]
Merge pull request #1381 from martin-frbg/ctest-warnings
Fix compiler warnings in ctest
Martin Kroeker [Sun, 3 Dec 2017 17:19:30 +0000 (18:19 +0100)]
Fix compiler warnings in ctest
Various fixes for const correctness, stray tab characters and unused labels
Martin Kroeker [Sun, 3 Dec 2017 12:04:02 +0000 (13:04 +0100)]
Merge pull request #1379 from martin-frbg/warnfix
Work around compiler warnings for unused variables
Martin Kroeker [Sat, 2 Dec 2017 21:51:58 +0000 (22:51 +0100)]
Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels
Martin Kroeker [Sat, 2 Dec 2017 15:49:47 +0000 (16:49 +0100)]
Merge pull request #1372 from martin-frbg/param
Correct zgeadd_k prototype