Martin Kroeker [Sat, 23 Nov 2019 21:38:07 +0000 (22:38 +0100)]
Fix AVX512 capability test (always returning zero)
from #2322
Martin Kroeker [Thu, 21 Nov 2019 17:14:29 +0000 (18:14 +0100)]
Add the cpuid of the business/rackmount version of z15 as well
Martin Kroeker [Thu, 21 Nov 2019 17:03:00 +0000 (18:03 +0100)]
Merge pull request #2316 from sharkcz/s390x
zarch: treat z15 as z14 instead of generic
Martin Kroeker [Thu, 21 Nov 2019 16:59:21 +0000 (17:59 +0100)]
Merge pull request #2317 from aarnez/develop
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
Andreas Arnez [Fri, 20 Sep 2019 16:32:47 +0000 (18:32 +0200)]
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
The ZARCH implementations of ?sum contain a cut-and-paste error: an inline
assembly argument is named "sum", but the assembly references "asum"
instead. The mismatch causes a build error. This commit fixes it.
Dan Horák [Thu, 21 Nov 2019 11:49:54 +0000 (12:49 +0100)]
zarch: treat z15 as z14 instead of generic
Signed-off-by: Dan Horák <dan@danny.cz>
Martin Kroeker [Thu, 21 Nov 2019 04:06:44 +0000 (05:06 +0100)]
Merge pull request #2315 from ewanglong/develop
revised fix windows compatible for #2313
Wang, Long [Thu, 21 Nov 2019 02:19:40 +0000 (10:19 +0800)]
revised fix windows compatible for #2313
Signed-off-by: Wang, Long <long1.wang@intel.com>
Martin Kroeker [Wed, 20 Nov 2019 15:16:35 +0000 (16:16 +0100)]
Merge pull request #2314 from Jehan/wip/Jehan/fix-openblas-crash
Fix usage of TerminateThread() causing critical section corruption.
Martin Kroeker [Wed, 20 Nov 2019 14:12:06 +0000 (15:12 +0100)]
Merge pull request #2312 from martin-frbg/power8be
Further Power8 big-endian corrections
Martin Kroeker [Wed, 20 Nov 2019 13:49:15 +0000 (14:49 +0100)]
Merge pull request #2313 from ewanglong/develop
Fix the integer overflow issue for large matrix size
Wang, Long [Wed, 20 Nov 2019 13:30:16 +0000 (21:30 +0800)]
For Windows compatibility, use "unsigned long long" to ensure 64-bit length
Signed-off-by: Wang, Long <long1.wang@intel.com>
Jehan [Wed, 20 Nov 2019 11:21:35 +0000 (12:21 +0100)]
Fix usage of TerminateThread() causing critical section corruption.
This patch was submitted to the GIMP project by a publisher wishing to
remain anonymous for confidentiality reasons. I am just passing the patch along.
Here is the explanation that came with the patch:
First they remind us what Microsoft documentation says about
TerminateThread:
> TerminateThread is a dangerous function that should only be used in
> the most extreme cases. You should call TerminateThread only if you
> know exactly what the target thread is doing, and you control all of
> the code that the target thread could possibly be running at the time
> of the termination.
(https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminatethread)
Then they say that a 5-millisecond time-out might not be long enough for
the thread to exit gracefully. They propose to set it to a much higher
value (for instance, 5 seconds here).
And finally, you should always check the return value of
WaitForSingleObject(). In particular, you want to run TerminateThread()
only if WaitForSingleObject() failed, not in the success case.
Wang, Long [Wed, 20 Nov 2019 03:50:37 +0000 (11:50 +0800)]
Fix the integer overflow issue for large matrix size
For large matrices, e.g. M=N=K with M>1290, int mnk=M*N*K will overflow.
This leads to wrongly branching to the single-threaded code path, and
performance degrades significantly.
Signed-off-by: Wang, Long <long1.wang@intel.com>
Martin Kroeker [Sun, 17 Nov 2019 22:19:48 +0000 (23:19 +0100)]
Merge pull request #2310 from martin-frbg/ppc440
Fix PPC440 big-endian support and disable the QCDOC qalloc routine by default
Martin Kroeker [Sun, 17 Nov 2019 22:12:10 +0000 (23:12 +0100)]
Define alternate kernels for big-endian POWER8
Martin Kroeker [Sun, 17 Nov 2019 21:58:32 +0000 (22:58 +0100)]
Fix compilation for big-endian POWER8
Martin Kroeker [Sun, 17 Nov 2019 18:25:08 +0000 (19:25 +0100)]
Define alternate kernels for big-endian PPC440
Martin Kroeker [Sun, 17 Nov 2019 18:22:04 +0000 (19:22 +0100)]
Disable the old QCDOC qalloc by default and copy utility functions from memory.c
1. qalloc() appears to have been a special routine written for the PPC440-based QCDOC supercomputer(s) from around 2005; its source does not seem to be readily available. So switch the #if 1 in the code to rely on standard malloc() by default.
2. Utility functions like get_num_procs, get_num_threads that were added to the "normally" used memory.c in the meantime were still missing here.
Martin Kroeker [Sun, 17 Nov 2019 18:09:49 +0000 (19:09 +0100)]
Merge pull request #17 from xianyi/develop
rebase
Martin Kroeker [Sun, 17 Nov 2019 17:22:24 +0000 (18:22 +0100)]
Merge pull request #2309 from martin-frbg/ppc970-be
Fix PPC970 big-endian support
Martin Kroeker [Sun, 17 Nov 2019 14:19:39 +0000 (15:19 +0100)]
Define alternate kernels for big-endian PPC970
The altivec versions of SGEMM and CGEMM fail most tests in LAPACK-TESTING when compiled for big endian; STRSM/CTRSM even cause segfaults. The rot kernels either fail the corresponding utest or lead to failures in LAPACK-TESTING.
Martin Kroeker [Sun, 17 Nov 2019 14:10:26 +0000 (15:10 +0100)]
Use "generic" S/CGEMM unroll M on big-endian PPC970
as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian
Martin Kroeker [Fri, 15 Nov 2019 07:33:17 +0000 (08:33 +0100)]
Merge pull request #2308 from martin-frbg/ctestfix
Fix potential issue in the c/z blas3 ctests
Martin Kroeker [Thu, 14 Nov 2019 23:20:36 +0000 (00:20 +0100)]
Fix potential spurious failure from uninitialized variable
Martin Kroeker [Thu, 14 Nov 2019 23:19:24 +0000 (00:19 +0100)]
Fix potential spurious failure from uninitialized variable
Martin Kroeker [Tue, 12 Nov 2019 06:38:37 +0000 (07:38 +0100)]
Merge pull request #2305 from wjc404/develop
AVX512 CGEMM & ZGEMM kernels
wjc404 [Mon, 11 Nov 2019 12:04:52 +0000 (20:04 +0800)]
AVX512 CGEMM & ZGEMM kernels
96-99% 1-thread performance of MKL2018
Martin Kroeker [Sat, 9 Nov 2019 17:52:08 +0000 (18:52 +0100)]
Merge pull request #15 from xianyi/develop
rebase
Martin Kroeker [Wed, 6 Nov 2019 06:27:33 +0000 (07:27 +0100)]
Merge pull request #2300 from wjc404/develop
Optimize SGEMM on SKYLAKEX CPUs
wjc404 [Tue, 5 Nov 2019 05:36:56 +0000 (13:36 +0800)]
optimizations of software prefetching
Martin Kroeker [Mon, 4 Nov 2019 21:55:05 +0000 (22:55 +0100)]
Merge pull request #2302 from martin-frbg/ppc970
Disable three-operand DCBT on PPC970 regardless of operating system
Martin Kroeker [Mon, 4 Nov 2019 21:54:28 +0000 (22:54 +0100)]
Merge pull request #2301 from martin-frbg/ppc8be
Disable IDAMIN/MAX and IZAMIN/MAX optimizations on big-endian POWER8
Martin Kroeker [Mon, 4 Nov 2019 21:53:58 +0000 (22:53 +0100)]
Merge pull request #2294 from martin-frbg/ios-cleanup
Remove obsolete workarounds for IOS on ARMV8
wjc404 [Mon, 4 Nov 2019 12:10:12 +0000 (20:10 +0800)]
Add files via upload
wjc404 [Mon, 4 Nov 2019 11:37:19 +0000 (19:37 +0800)]
optimizations via software prefetches
Martin Kroeker [Sun, 3 Nov 2019 21:55:31 +0000 (22:55 +0100)]
Use the two-operand form of DCBT on all PPC970 regardless of OS
There seems to be no advantage to the three-operand form used in the earliest GotoBLAS kernels, and it causes compilation problems on platforms other than the previously special-cased ones as well
Martin Kroeker [Sun, 3 Nov 2019 21:42:46 +0000 (22:42 +0100)]
The assembly microkernel is not safe to use on ELFv1
Martin Kroeker [Sun, 3 Nov 2019 21:41:19 +0000 (22:41 +0100)]
The assembly microkernel is not safe to use on ELFv1
Martin Kroeker [Sun, 3 Nov 2019 21:39:06 +0000 (22:39 +0100)]
The assembly microkernel is not safe to use on ELFv1
Martin Kroeker [Sun, 3 Nov 2019 21:37:27 +0000 (22:37 +0100)]
The assembly microkernel is not safe to use on ELFv1
Martin Kroeker [Sun, 3 Nov 2019 21:33:31 +0000 (22:33 +0100)]
Merge pull request #13 from xianyi/develop
resync with upstream
wjc404 [Sat, 2 Nov 2019 02:09:19 +0000 (10:09 +0800)]
Add files via upload
wjc404 [Sat, 2 Nov 2019 02:06:13 +0000 (10:06 +0800)]
Add files via upload
wjc404 [Fri, 1 Nov 2019 16:00:48 +0000 (00:00 +0800)]
new sgemm kernel for skylakex
wjc404 [Fri, 1 Nov 2019 15:59:18 +0000 (23:59 +0800)]
update sgemm_q on skylakex cpus
Martin Kroeker [Mon, 28 Oct 2019 12:24:18 +0000 (13:24 +0100)]
Merge pull request #2296 from kdunee/develop
Fixed a minor cmake problem, occurring when DYNAMIC_ARCH=ON and CMAKE_C_FLAGS was empty
k.dunikowski [Mon, 28 Oct 2019 07:51:05 +0000 (08:51 +0100)]
Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty
Martin Kroeker [Fri, 25 Oct 2019 21:46:39 +0000 (23:46 +0200)]
Merge pull request #2293 from martin-frbg/pr2288
Add support for NetBSD by adding it to the existing xBSD conditionals
Martin Kroeker [Fri, 25 Oct 2019 21:07:00 +0000 (23:07 +0200)]
Remove special parameter set for obsolete IOS/ARMV8 workaround
Martin Kroeker [Fri, 25 Oct 2019 21:02:37 +0000 (23:02 +0200)]
Remove the IOS fallbacks to generic C kernels
Martin Kroeker [Fri, 25 Oct 2019 20:52:30 +0000 (22:52 +0200)]
Fix regex to parse -R options with and without whitespace
Both forms are seen on NetBSD (#2288)
Martin Kroeker [Fri, 25 Oct 2019 10:52:49 +0000 (12:52 +0200)]
Add NetBSD to the xBSD conditionals
Martin Kroeker [Fri, 25 Oct 2019 10:51:06 +0000 (12:51 +0200)]
Add NetBSD
Martin Kroeker [Fri, 25 Oct 2019 08:35:17 +0000 (10:35 +0200)]
Merge pull request #2292 from martin-frbg/g95fixes
Improve support for g95 and non-GNU ld
Martin Kroeker [Fri, 25 Oct 2019 08:34:50 +0000 (10:34 +0200)]
Merge pull request #2291 from martin-frbg/gensymbol
Fix netlib 3.7/3.8 function enumeration for linktest
Martin Kroeker [Fri, 25 Oct 2019 07:56:30 +0000 (09:56 +0200)]
Merge pull request #2282 from martin-frbg/issue2281
Optimize RPCC function on ARM64
Martin Kroeker [Thu, 24 Oct 2019 20:52:15 +0000 (22:52 +0200)]
Merge pull request #2290 from martin-frbg/cpuidfixes
Fixup x86 cpuid changes from #2283
Martin Kroeker [Thu, 24 Oct 2019 20:43:27 +0000 (22:43 +0200)]
Improve support for g95 and non-GNU ld
Auto-add "-fno-second-underscore" option to make LAPACKE compile (as it calls LAPACK functions that may have gotten a second underscore added otherwise). Also support -R for rpath when parsing compiler directives in f_check
Martin Kroeker [Thu, 24 Oct 2019 19:26:20 +0000 (21:26 +0200)]
Move most lapack 3.7/3.8 additions to the embedded_underscores list
to allow linktest to pass with a compiler that adds a second underscore to such names
Martin Kroeker [Thu, 24 Oct 2019 19:18:17 +0000 (21:18 +0200)]
Disable direct clock register access on IOS and Android
as I find conflicting information on accessibility from non-privileged processes
luzpaz [Thu, 24 Oct 2019 16:56:53 +0000 (12:56 -0400)]
Remove prototype of unused, unimplemented function (#2274)
* Fix source typo
Found via `codespell -q 3 -L amin,als,ba,dum,mone,nd,nto,orign -S Changelog.txt,./lapack*`
* Remove beta-thread function per request
Martin Kroeker [Thu, 24 Oct 2019 16:45:27 +0000 (18:45 +0200)]
Restore Goldmont ID and improve QEMU support
#2283 had inadvertently removed Goldmont+, and cpuid was reporting a mix of Core2 and Pentium2 for some QEMU configurations
Martin Kroeker [Thu, 24 Oct 2019 16:40:13 +0000 (18:40 +0200)]
Merge pull request #12 from xianyi/develop
resync with upstream
Martin Kroeker [Sun, 20 Oct 2019 10:44:19 +0000 (12:44 +0200)]
Merge pull request #2286 from wjc404/develop
AVX512 DGEMM kernel
wjc404 [Fri, 18 Oct 2019 19:54:44 +0000 (03:54 +0800)]
native support for icopy_4
90% MKL 1-thread performance.
wjc404 [Fri, 18 Oct 2019 07:00:17 +0000 (15:00 +0800)]
Update dgemm_kernel_8x8_skylakex.c
wjc404 [Fri, 18 Oct 2019 06:58:07 +0000 (14:58 +0800)]
some correction
wjc404 [Fri, 18 Oct 2019 02:47:31 +0000 (10:47 +0800)]
make further changes to icopy_8 easier
wjc404 [Wed, 16 Oct 2019 11:23:36 +0000 (19:23 +0800)]
Add files via upload
wjc404 [Wed, 16 Oct 2019 02:14:51 +0000 (10:14 +0800)]
Update dgemm_kernel_8x8_skylakex.c
wjc404 [Tue, 15 Oct 2019 19:20:08 +0000 (03:20 +0800)]
Update dgemm_kernel_8x8_skylakex.c
wjc404 [Tue, 15 Oct 2019 18:01:13 +0000 (02:01 +0800)]
Add files via upload
wjc404 [Tue, 15 Oct 2019 18:00:34 +0000 (02:00 +0800)]
Add files via upload
Martin Kroeker [Wed, 9 Oct 2019 20:06:09 +0000 (22:06 +0200)]
Merge pull request #2283 from martin-frbg/issue2176
Support QEMU virtual cpu in 64bit mode as CORE2 or BARCELONA
Martin Kroeker [Wed, 9 Oct 2019 16:24:13 +0000 (18:24 +0200)]
Support QEMU cpu calling itself 64bit AMD Athlon as well
Some QEMU instances pretend to be "AuthenticAMD" with the same family 6/model 6 even when running on an Intel host
(could be related to qemu or libvirt version and/or kvm availability). Also fix the define to depend on __x86_64__ set by the
compiler, as the defines using __64BIT__ will only work for getarch_2nd.
Martin Kroeker [Tue, 8 Oct 2019 20:30:02 +0000 (22:30 +0200)]
Support QEMU virtual cpu as CORE2
qemu itself claims it is a 64bit P6, which does not exist in the wild.
Martin Kroeker [Tue, 8 Oct 2019 18:13:14 +0000 (20:13 +0200)]
Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)
* Add automatic fixups for OSX/IOS cross-compilation
* Add OSX/IOS cross-compilation test to Travis CI
* Handle platforms that lack hwcap.h by falling back to ARMV8
* Fix PROLOGUE for OSX/IOS
Martin Kroeker [Tue, 8 Oct 2019 18:12:08 +0000 (20:12 +0200)]
Update common_arm64.h
Martin Kroeker [Tue, 8 Oct 2019 08:25:25 +0000 (10:25 +0200)]
Merge pull request #2280 from martin-frbg/iosfix
Add overlooked part of IOS compilation fix
Martin Kroeker [Tue, 8 Oct 2019 06:37:50 +0000 (08:37 +0200)]
Remove automatic label postfixes from macro included only once
Martin Kroeker [Tue, 8 Oct 2019 06:32:52 +0000 (08:32 +0200)]
Merge pull request #11 from xianyi/develop
sync with upstream
Martin Kroeker [Tue, 8 Oct 2019 06:09:26 +0000 (08:09 +0200)]
Fix accidental duplication of jump instruction
Martin Kroeker [Sun, 6 Oct 2019 21:01:54 +0000 (23:01 +0200)]
Merge pull request #2277 from martin-frbg/issue2275
Rewrite ARMV8 code to allow cross-compilation for IOS
Martin Kroeker [Sun, 6 Oct 2019 09:12:44 +0000 (11:12 +0200)]
Merge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative
Revert "Avoid taking root of negative number in symv_thread.c"
Martin Kroeker [Sat, 5 Oct 2019 08:52:47 +0000 (10:52 +0200)]
Move 32bit OSX build back to xcode 8.3 but switch to gcc8
Martin Kroeker [Fri, 4 Oct 2019 12:53:23 +0000 (14:53 +0200)]
Make local labels in macro compatible with the xcode assembler
... which does not perform the automatic numbering on instantiation that the _@ suffix signifies
Martin Kroeker [Fri, 4 Oct 2019 12:50:03 +0000 (14:50 +0200)]
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios
Martin Kroeker [Wed, 2 Oct 2019 23:09:02 +0000 (01:09 +0200)]
Update 32bit macOS again to xcode 9.3
OS version 10.13 "High Sierra" appears to be the oldest release now for which Homebrew provides a gcc package.
With anything older, the Travis job will run out of time building gcc from source
Martin Kroeker [Wed, 2 Oct 2019 20:35:34 +0000 (22:35 +0200)]
Update the OSX BINARY=32 test to xcode9.2
in response to Homebrew updates
Martin Kroeker [Tue, 1 Oct 2019 21:50:41 +0000 (23:50 +0200)]
Revert "Avoid taking root of negative number in symv_thread.c"
Martin Kroeker [Mon, 30 Sep 2019 09:27:29 +0000 (11:27 +0200)]
Merge pull request #2272 from seberg/thread-sqrt-of-negative
Avoid taking root of negative number in symv_thread.c
Sebastian Berg [Mon, 30 Sep 2019 05:03:12 +0000 (22:03 -0700)]
Avoid taking root of negative number in symv_thread.c
This is similar to fixes in gh-1929, but there was one remaining
occurrence of this type of pattern in the driver/level2/*_thread.c
files.
Martin Kroeker [Sun, 29 Sep 2019 11:53:45 +0000 (13:53 +0200)]
Merge pull request #2271 from quickwritereader/strmm_fix
Fixed bug in power9 strmm. BLAS-TESTER passes
AbdelRauf [Sun, 29 Sep 2019 02:27:50 +0000 (02:27 +0000)]
trmm fix
Martin Kroeker [Fri, 27 Sep 2019 07:52:19 +0000 (09:52 +0200)]
Merge pull request #2269 from martin-frbg/ppc-fixes
Ppc fixes
Martin Kroeker [Thu, 26 Sep 2019 22:47:18 +0000 (00:47 +0200)]
Fix prologue of power9 assembly cdot(c) kernel to provide cdotc
Martin Kroeker [Thu, 26 Sep 2019 22:44:26 +0000 (00:44 +0200)]
Fix mis-edits in the gcc-derived power8 caxpy kernel
Martin Kroeker [Thu, 26 Sep 2019 22:42:32 +0000 (00:42 +0200)]
Merge pull request #7 from xianyi/develop
update
Martin Kroeker [Wed, 25 Sep 2019 21:13:24 +0000 (23:13 +0200)]
Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)
There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a trivial indication of whether the target is a server-class device with a big cache, or just a single-board toy or smartphone.