Martin Kroeker [Sat, 27 Apr 2019 17:06:00 +0000 (19:06 +0200)]
Add support for INTERFACE64 and fix XERBLA calls
1. Replaced all instances of "int" with "blasint"
2. Added string length as "hidden" third parameter in calls to Fortran XERBLA
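A minimal sketch of the two changes listed above, not the actual OpenBLAS code. It assumes that blasint widens to 64 bits under a build-time switch and that the Fortran callee receives the length of the routine name as a trailing by-value argument. DEMO_INTERFACE64, demo_xerbla_ and demo_check_n are hypothetical names, and the type of the hidden length (blasint here) is an assumption that differs between Fortran compilers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical switch standing in for an INTERFACE64=1 build. */
#ifdef DEMO_INTERFACE64
typedef int64_t blasint;
#else
typedef int blasint;
#endif

/* Records what the stand-in error handler last received. */
static char demo_last_name[16];
static blasint demo_last_info;

/* Stand-in for a Fortran XERBLA(SRNAME, INFO): both arguments are passed
 * by reference, and the length of SRNAME arrives as a hidden trailing
 * argument passed by value (assumed here to have type blasint). */
void demo_xerbla_(const char *srname, const blasint *info, blasint srname_len)
{
    assert(srname != NULL && info != NULL && srname_len >= 0);
    size_t n = (size_t)srname_len;
    if (n > sizeof demo_last_name - 1)
        n = sizeof demo_last_name - 1;
    memcpy(demo_last_name, srname, n); /* Fortran strings carry no NUL terminator */
    demo_last_name[n] = '\0';
    demo_last_info = *info;
    fprintf(stderr, "On entry to %s parameter number %lld had an illegal value\n",
            demo_last_name, (long long)demo_last_info);
}

/* Hypothetical argument check: reports a negative n as parameter 1. */
blasint demo_check_n(blasint n)
{
    if (n < 0) {
        static const char name[] = "DEMO  ";
        blasint info = 1;
        /* The third argument is the "hidden" string length. */
        demo_xerbla_(name, &info, (blasint)strlen(name));
        return info;
    }
    return 0;
}
```

Without the length argument the callee would read an uninitialized value as the string length, which is the kind of mismatch the second item addresses.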
Martin Kroeker [Sat, 27 Apr 2019 16:55:47 +0000 (18:55 +0200)]
Support INTERFACE64=1
Martin Kroeker [Sat, 30 Mar 2019 20:21:38 +0000 (21:21 +0100)]
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
Disable the AVX512 DGEMM kernel (again)
Martin Kroeker [Sat, 30 Mar 2019 13:54:28 +0000 (14:54 +0100)]
Merge pull request #2071 from martin-frbg/issue2068
Provide CBLAS interfaces to I?MIN and I?MAX
Martin Kroeker [Sat, 30 Mar 2019 11:38:41 +0000 (12:38 +0100)]
Build CBLAS interfaces for I?MIN and I?MAX
Martin Kroeker [Sat, 30 Mar 2019 11:37:13 +0000 (12:37 +0100)]
Expose CBLAS interfaces for I?MIN and I?MAX
Martin Kroeker [Fri, 29 Mar 2019 20:46:21 +0000 (21:46 +0100)]
Merge pull request #2070 from quickwritereader/develop
power9 makefile. dgemm based on power8 kernel with following changes …
Martin Kroeker [Fri, 29 Mar 2019 18:36:29 +0000 (19:36 +0100)]
Merge branch 'develop' into develop
AbdelRauf [Thu, 14 Mar 2019 10:42:04 +0000 (10:42 +0000)]
power9 makefile. dgemm based on power8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). Improvement from 17 to 22-23 GFLOPS. dtrmm cases were added into dgemm itself
Martin Kroeker [Mon, 25 Mar 2019 20:34:30 +0000 (21:34 +0100)]
Merge pull request #2069 from aixoss/aix-asm-change
AIX asm syntax changes needed for shared object creation
Ayappan P [Mon, 25 Mar 2019 13:23:25 +0000 (18:53 +0530)]
AIX asm syntax changes needed for shared object creation
Martin Kroeker [Tue, 19 Mar 2019 21:12:51 +0000 (22:12 +0100)]
Merge pull request #2064 from embray/cygwin/use-tls-thread-memory-cleanup
Fix for #2063
Erik M. Bray [Tue, 19 Mar 2019 09:22:02 +0000 (10:22 +0100)]
Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles.
Erik M. Bray [Mon, 18 Mar 2019 19:32:48 +0000 (20:32 +0100)]
Fix for #2063: The DllMain used in Cygwin did not run the thread memory
pool cleanup upon THREAD_DETACH which is needed when compiled with
USE_TLS=1.
Martin Kroeker [Sat, 16 Mar 2019 10:57:23 +0000 (11:57 +0100)]
Merge pull request #2058 from xsacha/patch-3
Change 64-bit detection as explained in #2056
Martin Kroeker [Sat, 16 Mar 2019 10:56:51 +0000 (11:56 +0100)]
Merge pull request #2060 from embray/cygwin/readenv
Use POSIX getenv on Cygwin
Erik M. Bray [Fri, 15 Mar 2019 14:06:30 +0000 (15:06 +0100)]
Use POSIX getenv on Cygwin
The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork().
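A minimal sketch of reading a numeric setting through the C library's getenv(), which on Cygwin sees variables set from the Cygwin side. This is not the OpenBLAS implementation; demo_parse_nonneg_int and demo_read_int_env are hypothetical helpers, and the fallback behaviour is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses a non-negative decimal integer; returns fallback on any problem. */
int demo_parse_nonneg_int(const char *text, int fallback)
{
    if (text == NULL || *text == '\0')
        return fallback;
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0 || value > INT_MAX)
        return fallback;
    return (int)value;
}

/* Looks the variable up with getenv() rather than a Windows-native call,
 * so the process environment maintained by the C runtime is consulted. */
int demo_read_int_env(const char *name, int fallback)
{
    assert(name != NULL);
    return demo_parse_nonneg_int(getenv(name), fallback);
}
```

A caller would use it as demo_read_int_env("SOME_SETTING", 1), where the variable name is a placeholder.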
Martin Kroeker [Wed, 13 Mar 2019 21:10:28 +0000 (22:10 +0100)]
Disable the AVX512 DGEMM kernel (again)
Due to as yet unresolved errors seen in #1955 and #2029
Martin Kroeker [Wed, 13 Mar 2019 18:20:23 +0000 (19:20 +0100)]
Trivial typo fix
as suggested in #2022
Sacha [Wed, 13 Mar 2019 13:21:54 +0000 (23:21 +1000)]
Change 64-bit detection as explained in #2056
Martin Kroeker [Tue, 12 Mar 2019 21:57:39 +0000 (22:57 +0100)]
Merge pull request #2042 from maomao194313/develop
add TARGET support for HiSilicon tsv110 CPUs
Martin Kroeker [Tue, 12 Mar 2019 21:57:07 +0000 (22:57 +0100)]
Merge pull request #2055 from martin-frbg/atomid
Add CPUID data for Intel Denverton (as Nehalem)
Martin Kroeker [Tue, 12 Mar 2019 15:09:55 +0000 (16:09 +0100)]
Add Intel Denverton
Martin Kroeker [Tue, 12 Mar 2019 15:03:56 +0000 (16:03 +0100)]
Add Intel Denverton
for #2048
maomao194313 [Tue, 12 Mar 2019 08:11:01 +0000 (16:11 +0800)]
make DYNAMIC_ARCH=1 package work on TSV110
maomao194313 [Tue, 12 Mar 2019 08:05:19 +0000 (16:05 +0800)]
make DYNAMIC_ARCH=1 package work on TSV110.
Martin Kroeker [Sat, 9 Mar 2019 15:39:35 +0000 (16:39 +0100)]
Merge pull request #2051 from martin-frbg/issue2048
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
Martin Kroeker [Sat, 9 Mar 2019 15:39:08 +0000 (16:39 +0100)]
Merge pull request #2050 from kencu/PowerMacFix
PowerMac 970 fixes
Martin Kroeker [Sat, 9 Mar 2019 10:21:16 +0000 (11:21 +0100)]
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
for issue #2048
ken-cunningham-webuse [Thu, 7 Mar 2019 19:41:58 +0000 (11:41 -0800)]
common_power.h: force DCBT_ARG 0 on PPC970 Darwin
without this, we see
../kernel/power/gemv_n.S:427:Parameter syntax error
and many more similar entries
that relates to this assembly command
dcbt 8, r24, r18
this change makes the DCBT_ARG = 0
and openblas builds through to completion on PowerMac 970
Tests pass
ken-cunningham-webuse [Thu, 7 Mar 2019 19:36:35 +0000 (11:36 -0800)]
param.h : enable defines for PPC970 on DarwinOS
fixes:
gemm.c: In function 'sgemm_':
../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function)
#define SGEMM_P SGEMM_DEFAULT_P
^
Martin Kroeker [Thu, 7 Mar 2019 18:28:06 +0000 (19:28 +0100)]
Merge pull request #2049 from Celelibi/fix_crash_sgemm_sse_x64
Fix crash in sgemm SSE/nano kernel on x86_64
Celelibi [Thu, 7 Mar 2019 15:39:41 +0000 (16:39 +0100)]
Fix crash in sgemm SSE/nano kernel on x86_64
Fix bug #2047.
Signed-off-by: Celelibi <celelibi@gmail.com>
Martin Kroeker [Thu, 7 Mar 2019 13:51:41 +0000 (14:51 +0100)]
Merge pull request #2046 from kencu/powermac
ctest.c : add __POWERPC__ for PowerMac
ken-cunningham-webuse [Thu, 7 Mar 2019 04:55:06 +0000 (20:55 -0800)]
ctest.c : add __POWERPC__ for PowerMac
Martin Kroeker [Wed, 6 Mar 2019 21:40:26 +0000 (22:40 +0100)]
Merge pull request #2045 from martin-frbg/2033-3
Do not compile in AVX512 check if AVX support is disabled
Martin Kroeker [Tue, 5 Mar 2019 15:04:25 +0000 (16:04 +0100)]
Do not compile in AVX512 check if AVX support is disabled
The xgetbv function depends on NO_AVX being undefined. We could change that too, but that combination is unlikely to work anyway.
Martin Kroeker [Tue, 5 Mar 2019 11:11:32 +0000 (12:11 +0100)]
Merge pull request #2044 from martin-frbg/issue2043
Fix module definition conflicts between LAPACK and ReLAPACK
Martin Kroeker [Tue, 5 Mar 2019 11:11:15 +0000 (12:11 +0100)]
Merge pull request #2039 from brada4/meminit
Address warning in memory.c
Martin Kroeker [Mon, 4 Mar 2019 20:17:08 +0000 (21:17 +0100)]
Fix module definition conflicts between LAPACK and ReLAPACK
for #2043
Martin Kroeker [Mon, 4 Mar 2019 14:08:31 +0000 (15:08 +0100)]
Merge pull request #2026 from martin-frbg/trmv_threads
Correct range limiting in trmv_thread and re-enable TRMV multithreading
Martin Kroeker [Mon, 4 Mar 2019 14:07:48 +0000 (15:07 +0100)]
Merge pull request #2038 from martin-frbg/issue2035
Improve handling of NO_STATIC and NO_SHARED
Martin Kroeker [Mon, 4 Mar 2019 14:07:14 +0000 (15:07 +0100)]
Merge pull request #2040 from martin-frbg/locks2002
Restore locking optimizations for OpenMP case
maomao194313 [Mon, 4 Mar 2019 08:48:49 +0000 (16:48 +0800)]
add TARGET support for HiSilicon tsv110 CPUs
maomao194313 [Mon, 4 Mar 2019 08:45:22 +0000 (16:45 +0800)]
add TARGET support for HiSilicon tsv110 CPUs
maomao194313 [Mon, 4 Mar 2019 08:41:21 +0000 (16:41 +0800)]
add TARGET support for HiSilicon tsv110 CPUs
maomao194313 [Mon, 4 Mar 2019 08:30:50 +0000 (16:30 +0800)]
HiSilicon tsv110 CPUs optimization branch
add HiSilicon tsv110 CPUs optimization branch
Martin Kroeker [Sun, 3 Mar 2019 13:17:07 +0000 (14:17 +0100)]
Restore locking optimizations for OpenMP case
restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461
Andrew [Sun, 3 Mar 2019 07:05:11 +0000 (09:05 +0200)]
address warning introduced with #1814 et al
Andrew [Sun, 3 Mar 2019 06:59:27 +0000 (08:59 +0200)]
init
Martin Kroeker [Sat, 2 Mar 2019 22:36:36 +0000 (23:36 +0100)]
Improve handling of NO_STATIC and NO_SHARED
to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422
Martin Kroeker [Fri, 1 Mar 2019 10:45:02 +0000 (11:45 +0100)]
Merge pull request #2037 from martin-frbg/issue2033-2
Make sure that AVX512 is disabled in 32bit builds
Martin Kroeker [Fri, 1 Mar 2019 08:23:03 +0000 (09:23 +0100)]
Make sure that AVX512 is disabled in 32bit builds
for #2033
Martin Kroeker [Thu, 28 Feb 2019 21:10:12 +0000 (22:10 +0100)]
Merge pull request #2034 from martin-frbg/issue2033
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
Martin Kroeker [Thu, 28 Feb 2019 09:51:54 +0000 (10:51 +0100)]
Keep xcode8.3 for osx BINARY=32 build
as xcode10 deprecated i386
Martin Kroeker [Thu, 28 Feb 2019 08:58:25 +0000 (09:58 +0100)]
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
fixes #2033
Martin Kroeker [Mon, 25 Feb 2019 16:58:31 +0000 (17:58 +0100)]
Fix AVX512 test always returning false due to missing compiler option
Martin Kroeker [Mon, 25 Feb 2019 16:55:36 +0000 (17:55 +0100)]
Fix missing -c option in AVX512 test
Martin Kroeker [Sun, 24 Feb 2019 18:50:23 +0000 (19:50 +0100)]
Merge pull request #2028 from brada4/mv
Move one of clobber fixes to right place
Andrew [Sun, 24 Feb 2019 18:41:02 +0000 (20:41 +0200)]
move fix to right place
Andrew [Sun, 24 Feb 2019 18:39:25 +0000 (20:39 +0200)]
init
Martin Kroeker [Wed, 20 Feb 2019 09:27:48 +0000 (10:27 +0100)]
Reduce list of kernels in the dynamic arch build
to make compilation complete reliably within the 1h limit again
Martin Kroeker [Tue, 19 Feb 2019 21:16:33 +0000 (22:16 +0100)]
Fix error introduced during cleanup
Martin Kroeker [Tue, 19 Feb 2019 20:03:30 +0000 (21:03 +0100)]
Allow multithreading TRMV again
revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388)
Martin Kroeker [Tue, 19 Feb 2019 19:59:48 +0000 (20:59 +0100)]
Correct range_n limiting
same bug as seen in #1388, somehow missed in corresponding PR #1389
Martin Kroeker [Sun, 17 Feb 2019 10:49:15 +0000 (11:49 +0100)]
Merge pull request #2024 from martin-frbg/gcc9fixes4
Fix inline assembly constraints in Bulldozer TRSM kernels
Martin Kroeker [Sun, 17 Feb 2019 10:48:57 +0000 (11:48 +0100)]
Merge pull request #2023 from martin-frbg/gcc9fixes3
Fix inline assembly constraints in various x86_64 GEMVN kernels
Martin Kroeker [Sun, 17 Feb 2019 10:36:04 +0000 (11:36 +0100)]
Merge pull request #1988 from TiborGY/patch-1
Reword/expand comments in Makefile.rule
TiborGY [Sat, 16 Feb 2019 22:26:13 +0000 (23:26 +0100)]
fix the the
Martin Kroeker [Sat, 16 Feb 2019 19:06:48 +0000 (20:06 +0100)]
Fix inline assembly constraints in Bulldozer TRSM kernels
rework indices to allow marking i, as and bs as both input and output (marked operand n1 as well for simplicity). For #2009
Martin Kroeker [Sat, 16 Feb 2019 17:51:09 +0000 (18:51 +0100)]
Fix inline assembly constraints
Martin Kroeker [Sat, 16 Feb 2019 17:46:17 +0000 (18:46 +0100)]
Fix inline assembly constraints
Martin Kroeker [Sat, 16 Feb 2019 17:36:39 +0000 (18:36 +0100)]
Fix inline assembly constraints
rework indices to allow marking argument lda as input and output.
Martin Kroeker [Sat, 16 Feb 2019 17:24:11 +0000 (18:24 +0100)]
Fix inline assembly constraints
rework indices to allow marking argument lda4 as input and output. For #2009
Martin Kroeker [Sat, 16 Feb 2019 17:05:40 +0000 (18:05 +0100)]
Merge pull request #2021 from martin-frbg/gcc9fixes2
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
TiborGY [Sat, 16 Feb 2019 11:12:39 +0000 (12:12 +0100)]
Update Makefile.rule
add note about NUM_THREADS for package maintainers, add examples of programs that cause affinity troubles
Martin Kroeker [Fri, 15 Feb 2019 14:08:16 +0000 (15:08 +0100)]
Fix wrong constraints in inline assembly
for #2009
Martin Kroeker [Fri, 15 Feb 2019 14:02:54 +0000 (15:02 +0100)]
Merge pull request #2019 from martin-frbg/gcc9fixes
Fix unannounced modification of input operand 8 (lda4) in Haswell GEMVN microkernel
Martin Kroeker [Fri, 15 Feb 2019 09:10:04 +0000 (10:10 +0100)]
Rename operands to put lda on the input/output constraint list
Martin Kroeker [Fri, 15 Feb 2019 08:57:59 +0000 (09:57 +0100)]
Merge pull request #2020 from martin-frbg/issue1956
With the Intel compiler on Linux, prefer ifort for the final link step
Martin Kroeker [Thu, 14 Feb 2019 21:57:30 +0000 (22:57 +0100)]
With the Intel compiler on Linux, prefer ifort for the final link step
icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956
Martin Kroeker [Thu, 14 Feb 2019 21:43:18 +0000 (22:43 +0100)]
Save and restore input argument 8 (lda4)
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
Martin Kroeker [Thu, 14 Feb 2019 20:55:11 +0000 (21:55 +0100)]
Merge pull request #2018 from bartoldeman/fix-dgemv-znver1-tree-vectorize
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
Bart Oldeman [Thu, 14 Feb 2019 16:19:41 +0000 (16:19 +0000)]
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2
See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
Martin Kroeker [Thu, 14 Feb 2019 14:21:36 +0000 (15:21 +0100)]
Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly function (#2017)
* Fix missing clobber in blas_quickdivide assembly
Martin Kroeker [Thu, 14 Feb 2019 08:29:34 +0000 (09:29 +0100)]
Merge pull request #2013 from martin-frbg/issue2011
Fix invalid memory access in PPC gemm_beta
Martin Kroeker [Wed, 13 Feb 2019 21:08:37 +0000 (22:08 +0100)]
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
Martin Kroeker [Wed, 13 Feb 2019 21:06:41 +0000 (22:06 +0100)]
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
Martin Kroeker [Wed, 13 Feb 2019 19:15:56 +0000 (20:15 +0100)]
Merge pull request #2012 from maamountki/z14
[ZARCH] Many improvements
maamountki [Wed, 13 Feb 2019 19:06:25 +0000 (21:06 +0200)]
[ZARCH] Modify constraints
maamountki [Wed, 13 Feb 2019 10:54:35 +0000 (12:54 +0200)]
[ZARCH] Fix caxpy
Martin Kroeker [Tue, 12 Feb 2019 22:24:02 +0000 (23:24 +0100)]
Merge pull request #2010 from martin-frbg/issue2009
Fix declaration of input arguments in x86_64 GEMV, SYMV and DSCAL
Martin Kroeker [Tue, 12 Feb 2019 15:14:02 +0000 (16:14 +0100)]
Fix declaration of arguments in inline assembly
Argument 0 is modified so should be input and output
Martin Kroeker [Tue, 12 Feb 2019 15:00:18 +0000 (16:00 +0100)]
Fix declaration of assembly arguments in SSYMV and DSYMV microkernels
Arguments 0 and 1 are both input and output
Martin Kroeker [Tue, 12 Feb 2019 14:51:43 +0000 (15:51 +0100)]
Fix declaration of input arguments in inline assembly
Argument 0 is modified as it doubles as a counter
Martin Kroeker [Tue, 12 Feb 2019 14:33:48 +0000 (15:33 +0100)]
Fix declaration of input arguments in the x86_64 s/dGEMV_T and s/dGEMV_N kernels
Arguments 0 and 1 need to be tagged as both input and output
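A minimal sketch of tagging operands as both input and output, assuming GCC-style extended inline assembly on x86_64. It is not one of the GEMV kernels; demo_sum is a hypothetical function. The loop advances the pointer and decrements the counter, so both use the "+r" constraint instead of being listed as plain inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n doubles starting at x. */
double demo_sum(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double acc = 0.0;
    if (n == 0)
        return acc;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "1:                        \n\t"
        "addsd  (%[ptr]), %[acc]   \n\t" /* acc += *ptr */
        "addq   $8, %[ptr]         \n\t" /* ptr is modified */
        "decq   %[cnt]             \n\t" /* cnt doubles as the loop counter */
        "jnz    1b                 \n\t"
        /* "+" marks each operand as read and written by the asm body. */
        : [acc] "+x"(acc), [ptr] "+r"(x), [cnt] "+r"(n)
        :
        : "cc", "memory");
#else
    /* Portable fallback for other compilers and architectures. */
    for (size_t i = 0; i < n; i++)
        acc += x[i];
#endif
    return acc;
}
```

If ptr and cnt were declared as inputs only, the compiler could assume their registers still hold the original values after the asm statement, which is the kind of miscompilation reported with gcc9 -ftree-vectorize.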
maamountki [Tue, 12 Feb 2019 11:12:28 +0000 (13:12 +0200)]
[ZARCH] Fix cgemv_t_4
maamountki [Mon, 11 Feb 2019 14:01:13 +0000 (16:01 +0200)]
[ZARCH] Fix constraints and source code formatting
Martin Kroeker [Sun, 10 Feb 2019 22:24:45 +0000 (23:24 +0100)]
Fix potential memory leak in cpu enumeration on Linux (#2008)
* Fix potential memory leak in cpu enumeration with glibc
An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. Wrong calculation of the size argument in that call increased the likelihood of that failure. Fixes #2003
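A minimal sketch of the pattern described above, assuming Linux with glibc. It is not the OpenBLAS code; demo_count_affinity_cpus is a hypothetical function, and the retry-on-EINVAL loop is a design choice of this sketch. The set is freed on every path, and the size passed to sched_getaffinity is the byte size from CPU_ALLOC_SIZE, not the CPU count.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for sched_getaffinity() and the CPU_*_S macros */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Returns the number of CPUs the calling thread may run on, or -1 on error. */
int demo_count_affinity_cpus(void)
{
    /* Start with room for 1024 CPUs and grow if the kernel wants more. */
    for (size_t ncpus = 1024; ncpus <= ((size_t)1 << 20); ncpus *= 2) {
        cpu_set_t *set = CPU_ALLOC(ncpus);
        if (set == NULL)
            return -1;
        size_t size = CPU_ALLOC_SIZE(ncpus); /* size in bytes, not in CPUs */
        CPU_ZERO_S(size, set);
        if (sched_getaffinity(0, size, set) == 0) {
            int count = CPU_COUNT_S(size, set);
            CPU_FREE(set);
            assert(count > 0); /* the caller runs on at least one CPU */
            return count;
        }
        int saved_errno = errno;
        CPU_FREE(set); /* release the set before retrying or returning */
        if (saved_errno != EINVAL)
            return -1;
    }
    return -1;
}
```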
Martin Kroeker [Thu, 7 Feb 2019 19:06:13 +0000 (20:06 +0100)]
Restore dropped patches in the non-TLS branch of memory.c (#2004)
* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002, the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2, thereby dropping the commits for #1450, #1468, #1501, #1504 and #1520.