platform/upstream/openblas.git
4 years ago Merge pull request #2317 from aarnez/develop
Martin Kroeker [Thu, 21 Nov 2019 16:59:21 +0000 (17:59 +0100)]
Merge pull request #2317 from aarnez/develop

Change bad usage of "asum" to "sum" in ZARCH versions of ?sum

4 years ago Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
Andreas Arnez [Fri, 20 Sep 2019 16:32:47 +0000 (18:32 +0200)]
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum

The ZARCH implementations of ?sum contain a cut-and-paste error: an inline
assembly argument is named "sum", but the assembly references "asum"
instead.  The mismatch causes a build error.  This commit fixes it.

4 years ago Merge pull request #2315 from ewanglong/develop
Martin Kroeker [Thu, 21 Nov 2019 04:06:44 +0000 (05:06 +0100)]
Merge pull request #2315 from ewanglong/develop

revised fix windows compatible for #2313

4 years ago revised fix windows compatible for #2313
Wang, Long [Thu, 21 Nov 2019 02:19:40 +0000 (10:19 +0800)]
revised fix windows compatible for #2313

Signed-off-by: Wang, Long <long1.wang@intel.com>
4 years ago Merge pull request #2314 from Jehan/wip/Jehan/fix-openblas-crash
Martin Kroeker [Wed, 20 Nov 2019 15:16:35 +0000 (16:16 +0100)]
Merge pull request #2314 from Jehan/wip/Jehan/fix-openblas-crash

Fix usage of TerminateThread() causing critical section corruption.

4 years ago Merge pull request #2312 from martin-frbg/power8be
Martin Kroeker [Wed, 20 Nov 2019 14:12:06 +0000 (15:12 +0100)]
Merge pull request #2312 from martin-frbg/power8be

Further Power8 big-endian corrections

4 years ago Merge pull request #2313 from ewanglong/develop
Martin Kroeker [Wed, 20 Nov 2019 13:49:15 +0000 (14:49 +0100)]
Merge pull request #2313 from ewanglong/develop

Fix the integer overflow issue for large matrix size

4 years ago For the sake of windows compatible, used "unsigned long long" to ensure 64-bit length
Wang, Long [Wed, 20 Nov 2019 13:30:16 +0000 (21:30 +0800)]
For the sake of windows compatible, used "unsigned long long" to ensure 64-bit length

Signed-off-by: Wang, Long <long1.wang@intel.com>
4 years ago Fix usage of TerminateThread() causing critical section corruption.
Jehan [Wed, 20 Nov 2019 11:21:35 +0000 (12:21 +0100)]
Fix usage of TerminateThread() causing critical section corruption.

This patch was submitted to the GIMP project by a publisher who wishes to
remain anonymous. I am just passing the patch along.
Here is the explanation that came with it:

First they remind us what Microsoft documentation says about
TerminateThread:
> TerminateThread is a dangerous function that should only be used in
> the most extreme cases. You should call TerminateThread only if you
> know exactly what the target thread is doing, and you control all of
> the code that the target thread could possibly be running at the time
> of the termination.
(https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminatethread)

Then they say that the 5-millisecond timeout might not be long enough for
the thread to exit gracefully. They propose setting it to a much higher
value (5 seconds in this patch).

Finally, you should always check the return value of
WaitForSingleObject(). In particular, TerminateThread() should run
only if WaitForSingleObject() failed, not when it succeeded.
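
The points above can be sketched in C roughly as follows. This is an
illustration only, not the code from the patch: stop_worker(),
must_force_terminate() and WORKER_EXIT_TIMEOUT_MS are hypothetical names,
and the non-Windows branch merely supplies stand-in constants (with the
documented Win32 values) so that the decision logic compiles elsewhere.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins carrying the documented Win32 values, present only so the
   decision logic below can be compiled on non-Windows hosts. */
#define WAIT_OBJECT_0 0x00000000UL
#define WAIT_TIMEOUT  0x00000102UL
#define WAIT_FAILED   0xFFFFFFFFUL
#endif

/* Hypothetical timeout: 5 seconds rather than 5 milliseconds. */
#define WORKER_EXIT_TIMEOUT_MS 5000UL

/* Returns nonzero when the wait did NOT report a clean thread exit,
   which is the only case where forced termination is considered. */
static int must_force_terminate(unsigned long wait_result)
{
    return wait_result != WAIT_OBJECT_0;
}

#ifdef _WIN32
/* Hypothetical helper: give the worker time to exit on its own and
   fall back to TerminateThread() only if the wait did not succeed. */
static void stop_worker(HANDLE thread)
{
    assert(thread != NULL);
    DWORD result = WaitForSingleObject(thread, WORKER_EXIT_TIMEOUT_MS);
    if (must_force_terminate(result)) {
        TerminateThread(thread, 0);
    }
    CloseHandle(thread);
}
#endif
```

Keeping the check in a small pure function makes the "terminate only on
failure" rule easy to verify without spawning real threads.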

4 years ago Fix the integer overflow issue for large matrix size
Wang, Long [Wed, 20 Nov 2019 03:50:37 +0000 (11:50 +0800)]
Fix the integer overflow issue for large matrix size

For large matrices, e.g. M=N=K with M>1290, int mnk=M*N*K overflows.
This leads to wrong branching into the single-threaded path, and
performance degrades significantly.
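
A minimal sketch of the arithmetic, assuming a 32-bit int; the helper
names mnk_product() and mnk_fits_in_int() are hypothetical and do not
appear in the commit. The product is formed in "unsigned long long",
which is at least 64 bits wide on every C99 platform, including 64-bit
Windows where "long" is only 32 bits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: form M*N*K in 64-bit arithmetic so that the
   multiplication itself cannot overflow for int-sized dimensions
   in the range discussed here. */
static unsigned long long mnk_product(int m, int n, int k)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    return (unsigned long long)m * (unsigned long long)n
         * (unsigned long long)k;
}

/* Nonzero if the product would still have fit into a plain int. */
static int mnk_fits_in_int(int m, int n, int k)
{
    return mnk_product(m, n, k) <= (unsigned long long)INT_MAX;
}
```

With INT_MAX = 2147483647, 1290^3 = 2146689000 still fits while
1291^3 = 2151685171 does not, which matches the M>1290 threshold above.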

Signed-off-by: Wang, Long <long1.wang@intel.com>
4 years ago Merge pull request #2310 from martin-frbg/ppc440
Martin Kroeker [Sun, 17 Nov 2019 22:19:48 +0000 (23:19 +0100)]
Merge pull request #2310 from martin-frbg/ppc440

Fix PPC440 big-endian support and disable the QCDOC qalloc routine by default

4 years ago Define alternate kernels for big-endian POWER8
Martin Kroeker [Sun, 17 Nov 2019 22:12:10 +0000 (23:12 +0100)]
Define alternate kernels for big-endian POWER8

4 years ago Fix compilation for big-endian POWER8
Martin Kroeker [Sun, 17 Nov 2019 21:58:32 +0000 (22:58 +0100)]
Fix compilation for big-endian POWER8

4 years ago Define alternate kernels for big-endian PPC440
Martin Kroeker [Sun, 17 Nov 2019 18:25:08 +0000 (19:25 +0100)]
Define alternate kernels for big-endian PPC440

4 years ago Disable the old QCDOC qalloc by default and copy utility functions from memory.c
Martin Kroeker [Sun, 17 Nov 2019 18:22:04 +0000 (19:22 +0100)]
Disable the old QCDOC qalloc by default and copy utility functions from memory.c

1. qalloc() appears to have been a special routine written for the PPC440-based QCDOC supercomputer(s) from around 2005; its source does not seem to be readily available. So switch the #if 1 in the code to rely on standard malloc() by default.
2. Utility functions like get_num_procs, get_num_threads that were added to the "normally" used memory.c in the meantime were still missing here.

4 years ago Merge pull request #17 from xianyi/develop
Martin Kroeker [Sun, 17 Nov 2019 18:09:49 +0000 (19:09 +0100)]
Merge pull request #17 from xianyi/develop

rebase

4 years ago Merge pull request #2309 from martin-frbg/ppc970-be
Martin Kroeker [Sun, 17 Nov 2019 17:22:24 +0000 (18:22 +0100)]
Merge pull request #2309 from martin-frbg/ppc970-be

Fix PPC970 big-endian support

4 years ago Define alternate kernels for big-endian PPC970
Martin Kroeker [Sun, 17 Nov 2019 14:19:39 +0000 (15:19 +0100)]
Define alternate kernels for big-endian PPC970

The altivec versions of SGEMM and CGEMM fail most tests in LAPACK-TESTING when compiled for big endian, and STRSM/CTRSM even cause segfaults. The rot kernels either fail the corresponding utest or lead to failures in LAPACK-TESTING.

4 years ago Use "generic" S/CGEMM unroll M on big-endian PPC970
Martin Kroeker [Sun, 17 Nov 2019 14:10:26 +0000 (15:10 +0100)]
Use "generic" S/CGEMM unroll M on big-endian PPC970

as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian

4 years ago Merge pull request #2308 from martin-frbg/ctestfix
Martin Kroeker [Fri, 15 Nov 2019 07:33:17 +0000 (08:33 +0100)]
Merge pull request #2308 from martin-frbg/ctestfix

Fix potential issue in the c/z blas3 ctests

4 years ago Fix potential spurious failure from uninitialized variable
Martin Kroeker [Thu, 14 Nov 2019 23:20:36 +0000 (00:20 +0100)]
Fix potential spurious failure from uninitialized variable

4 years ago Fix potential spurious failure from uninitialized variable
Martin Kroeker [Thu, 14 Nov 2019 23:19:24 +0000 (00:19 +0100)]
Fix potential spurious failure from uninitialized variable

4 years ago Merge pull request #2305 from wjc404/develop
Martin Kroeker [Tue, 12 Nov 2019 06:38:37 +0000 (07:38 +0100)]
Merge pull request #2305 from wjc404/develop

AVX512 CGEMM & ZGEMM kernels

4 years ago AVX512 CGEMM & ZGEMM kernels
wjc404 [Mon, 11 Nov 2019 12:04:52 +0000 (20:04 +0800)]
AVX512 CGEMM & ZGEMM kernels

96-99% 1-thread performance of MKL2018

4 years ago Merge pull request #15 from xianyi/develop
Martin Kroeker [Sat, 9 Nov 2019 17:52:08 +0000 (18:52 +0100)]
Merge pull request #15 from xianyi/develop

rebase

4 years ago Merge pull request #2300 from wjc404/develop
Martin Kroeker [Wed, 6 Nov 2019 06:27:33 +0000 (07:27 +0100)]
Merge pull request #2300 from wjc404/develop

Optimize SGEMM on SKYLAKEX CPUs

4 years ago optimizations of software prefetching
wjc404 [Tue, 5 Nov 2019 05:36:56 +0000 (13:36 +0800)]
optimizations of software prefetching

4 years ago Merge pull request #2302 from martin-frbg/ppc970
Martin Kroeker [Mon, 4 Nov 2019 21:55:05 +0000 (22:55 +0100)]
Merge pull request #2302 from martin-frbg/ppc970

Disable three-operand DCBT on PPC970 regardless of operating system

4 years ago Merge pull request #2301 from martin-frbg/ppc8be
Martin Kroeker [Mon, 4 Nov 2019 21:54:28 +0000 (22:54 +0100)]
Merge pull request #2301 from martin-frbg/ppc8be

Disable IDAMIN/MAX and IZAMIN/MAX optimizations on big-endian POWER8

4 years ago Merge pull request #2294 from martin-frbg/ios-cleanup
Martin Kroeker [Mon, 4 Nov 2019 21:53:58 +0000 (22:53 +0100)]
Merge pull request #2294 from martin-frbg/ios-cleanup

Remove obsolete workarounds for IOS on ARMV8

4 years ago Add files via upload
wjc404 [Mon, 4 Nov 2019 12:10:12 +0000 (20:10 +0800)]
Add files via upload

4 years ago optimizations via software prefetches
wjc404 [Mon, 4 Nov 2019 11:37:19 +0000 (19:37 +0800)]
optimizations via software prefetches

4 years ago Use the two-operand form of DCBT on all PPC970 regardless of OS
Martin Kroeker [Sun, 3 Nov 2019 21:55:31 +0000 (22:55 +0100)]
Use the two-operand form of DCBT on all PPC970 regardless of OS

There seems to be no advantage to the three-operand form used in the earliest GotoBLAS kernels, and it also causes compilation problems on platforms other than the previously special-cased ones

4 years ago The assembly microkernel is not safe to use on ELFv1
Martin Kroeker [Sun, 3 Nov 2019 21:42:46 +0000 (22:42 +0100)]
The assembly microkernel is not safe to use on ELFv1

4 years ago The assembly microkernel is not safe to use on ELFv1
Martin Kroeker [Sun, 3 Nov 2019 21:41:19 +0000 (22:41 +0100)]
The assembly microkernel is not safe to use on ELFv1

4 years ago The assembly microkernel is not safe to use on ELFv1
Martin Kroeker [Sun, 3 Nov 2019 21:39:06 +0000 (22:39 +0100)]
The assembly microkernel is not safe to use on ELFv1

4 years ago The assembly microkernel is not safe to use on ELFv1
Martin Kroeker [Sun, 3 Nov 2019 21:37:27 +0000 (22:37 +0100)]
The assembly microkernel is not safe to use on ELFv1

4 years ago Merge pull request #13 from xianyi/develop
Martin Kroeker [Sun, 3 Nov 2019 21:33:31 +0000 (22:33 +0100)]
Merge pull request #13 from xianyi/develop

resync with upstream

4 years ago Add files via upload
wjc404 [Sat, 2 Nov 2019 02:09:19 +0000 (10:09 +0800)]
Add files via upload

4 years ago Add files via upload
wjc404 [Sat, 2 Nov 2019 02:06:13 +0000 (10:06 +0800)]
Add files via upload

4 years ago new sgemm kernel for skylakex
wjc404 [Fri, 1 Nov 2019 16:00:48 +0000 (00:00 +0800)]
new sgemm kernel for skylakex

4 years ago update sgemm_q on skylakex cpus
wjc404 [Fri, 1 Nov 2019 15:59:18 +0000 (23:59 +0800)]
update sgemm_q on skylakex cpus

4 years ago Merge pull request #2296 from kdunee/develop
Martin Kroeker [Mon, 28 Oct 2019 12:24:18 +0000 (13:24 +0100)]
Merge pull request #2296 from kdunee/develop

Fixed a minor cmake problem, occurring when DYNAMIC_ARCH=ON and CMAKE_C_FLAGS was empty

4 years ago Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was...
k.dunikowski [Mon, 28 Oct 2019 07:51:05 +0000 (08:51 +0100)]
Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty

4 years ago Merge pull request #2293 from martin-frbg/pr2288
Martin Kroeker [Fri, 25 Oct 2019 21:46:39 +0000 (23:46 +0200)]
Merge pull request #2293 from martin-frbg/pr2288

Add support for NetBSD by adding it to the existing xBSD conditionals

4 years ago Remove special parameter set for obsolete IOS/ARMV8 workaround
Martin Kroeker [Fri, 25 Oct 2019 21:07:00 +0000 (23:07 +0200)]
Remove special parameter set for obsolete IOS/ARMV8 workaround

4 years ago Remove the IOS fallbacks to generic C kernels
Martin Kroeker [Fri, 25 Oct 2019 21:02:37 +0000 (23:02 +0200)]
Remove the IOS fallbacks to generic C kernels

4 years ago Fix regex to parse -R options with and without whitespace
Martin Kroeker [Fri, 25 Oct 2019 20:52:30 +0000 (22:52 +0200)]
Fix regex to parse -R options with and without whitespace

Both forms are seen on NetBSD (#2288)
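
The two forms in question are "-R/some/dir" (path attached) and
"-R /some/dir" (path in the next token). The sketch below shows the same
idea in C on already-split tokens; it is not the regex from the commit,
and rpath_from_tokens() is a hypothetical helper introduced only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if tok[i] is a -R option, return the directory
   it names and set *consumed to the number of tokens used (1 for
   "-R/dir", 2 for "-R" followed by "/dir"). Returns NULL otherwise. */
static const char *rpath_from_tokens(const char *const *tok, size_t n,
                                     size_t i, size_t *consumed)
{
    assert(tok != NULL && consumed != NULL && i < n);
    *consumed = 1;
    if (strncmp(tok[i], "-R", 2) != 0)
        return NULL;                /* not an -R option at all */
    if (tok[i][2] != '\0')
        return tok[i] + 2;          /* attached form: "-R/dir" */
    if (i + 1 < n) {
        *consumed = 2;
        return tok[i + 1];          /* separated form: "-R" "/dir" */
    }
    return NULL;                    /* dangling "-R" without a path */
}
```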

4 years ago Add NetBSD to the xBSD conditionals
Martin Kroeker [Fri, 25 Oct 2019 10:52:49 +0000 (12:52 +0200)]
Add NetBSD to the xBSD conditionals

4 years ago Add NetBSD
Martin Kroeker [Fri, 25 Oct 2019 10:51:06 +0000 (12:51 +0200)]
Add NetBSD

4 years ago Merge pull request #2292 from martin-frbg/g95fixes
Martin Kroeker [Fri, 25 Oct 2019 08:35:17 +0000 (10:35 +0200)]
Merge pull request #2292 from martin-frbg/g95fixes

Improve support for g95 and non-GNU ld

4 years ago Merge pull request #2291 from martin-frbg/gensymbol
Martin Kroeker [Fri, 25 Oct 2019 08:34:50 +0000 (10:34 +0200)]
Merge pull request #2291 from martin-frbg/gensymbol

Fix netlib 3.7/3.8 function enumeration for linktest

4 years ago Merge pull request #2282 from martin-frbg/issue2281
Martin Kroeker [Fri, 25 Oct 2019 07:56:30 +0000 (09:56 +0200)]
Merge pull request #2282 from martin-frbg/issue2281

Optimize RPCC function on ARM64

4 years ago Merge pull request #2290 from martin-frbg/cpuidfixes
Martin Kroeker [Thu, 24 Oct 2019 20:52:15 +0000 (22:52 +0200)]
Merge pull request #2290 from martin-frbg/cpuidfixes

Fixup x86 cpuid changes from #2283

4 years ago Improve support for g95 and non-GNU ld
Martin Kroeker [Thu, 24 Oct 2019 20:43:27 +0000 (22:43 +0200)]
Improve support for g95 and non-GNU ld

Auto-add "-fno-second-underscore" option to make LAPACKE compile (as it calls LAPACK functions that may have gotten a second underscore added otherwise). Also support -R for rpath when parsing compiler directives in f_check

4 years ago Move most lapack 3.7/3.8 additions to the embedded_underscores list
Martin Kroeker [Thu, 24 Oct 2019 19:26:20 +0000 (21:26 +0200)]
Move most lapack 3.7/3.8 additions to the embedded_underscores list

to allow linktest to pass with a compiler that adds a second underscore to such names

4 years ago Disable direct clock register access on IOS and Android
Martin Kroeker [Thu, 24 Oct 2019 19:18:17 +0000 (21:18 +0200)]
Disable direct clock register access on IOS and Android

as I find conflicting information on accessibility from non-privileged processes

4 years ago Remove prototype of unused, unimplemented function (#2274)
luzpaz [Thu, 24 Oct 2019 16:56:53 +0000 (12:56 -0400)]
Remove prototype of unused, unimplemented function (#2274)

* Fix source typo

Found via `codespell -q 3 -L amin,als,ba,dum,mone,nd,nto,orign -S Changelog.txt,./lapack*`

* Remove beta-thread function per request

4 years ago Restore Goldmont ID and improve QEMU support
Martin Kroeker [Thu, 24 Oct 2019 16:45:27 +0000 (18:45 +0200)]
Restore Goldmont ID and improve QEMU support

#2283 had inadvertently removed Goldmont+, and cpuid was reporting a mix of Core2 and Pentium2 for some QEMU configurations

4 years ago Merge pull request #12 from xianyi/develop
Martin Kroeker [Thu, 24 Oct 2019 16:40:13 +0000 (18:40 +0200)]
Merge pull request #12 from xianyi/develop

resync with upstream

4 years ago Merge pull request #2286 from wjc404/develop
Martin Kroeker [Sun, 20 Oct 2019 10:44:19 +0000 (12:44 +0200)]
Merge pull request #2286 from wjc404/develop

AVX512 DGEMM kernel

4 years ago native support for icopy_4
wjc404 [Fri, 18 Oct 2019 19:54:44 +0000 (03:54 +0800)]
native support for icopy_4

90% MKL 1-thread performance.

4 years ago Update dgemm_kernel_8x8_skylakex.c
wjc404 [Fri, 18 Oct 2019 07:00:17 +0000 (15:00 +0800)]
Update dgemm_kernel_8x8_skylakex.c

4 years ago some correction
wjc404 [Fri, 18 Oct 2019 06:58:07 +0000 (14:58 +0800)]
some correction

4 years ago make further changes to icopy_8 easier
wjc404 [Fri, 18 Oct 2019 02:47:31 +0000 (10:47 +0800)]
make further changes to icopy_8 easier

4 years ago Add files via upload
wjc404 [Wed, 16 Oct 2019 11:23:36 +0000 (19:23 +0800)]
Add files via upload

4 years ago Update dgemm_kernel_8x8_skylakex.c
wjc404 [Wed, 16 Oct 2019 02:14:51 +0000 (10:14 +0800)]
Update dgemm_kernel_8x8_skylakex.c

4 years ago Update dgemm_kernel_8x8_skylakex.c
wjc404 [Tue, 15 Oct 2019 19:20:08 +0000 (03:20 +0800)]
Update dgemm_kernel_8x8_skylakex.c

4 years ago Add files via upload
wjc404 [Tue, 15 Oct 2019 18:01:13 +0000 (02:01 +0800)]
Add files via upload

4 years ago Add files via upload
wjc404 [Tue, 15 Oct 2019 18:00:34 +0000 (02:00 +0800)]
Add files via upload

4 years ago Merge pull request #2283 from martin-frbg/issue2176
Martin Kroeker [Wed, 9 Oct 2019 20:06:09 +0000 (22:06 +0200)]
Merge pull request #2283 from martin-frbg/issue2176

Support QEMU virtual cpu in 64bit mode as CORE2 or BARCELONA

4 years ago Support QEMU cpu calling itself 64bit AMD Athlon as well
Martin Kroeker [Wed, 9 Oct 2019 16:24:13 +0000 (18:24 +0200)]
Support QEMU cpu calling itself 64bit AMD Athlon as well

Some QEMU instances pretend to be "AuthenticAMD" with the same family 6/model 6 even when running on an Intel host
(could be related to qemu or libvirt version and/or kvm availability). Also fix the define to depend on __x86_64__ set by the
compiler, the defines using __64BIT__ will only work for getarch_2nd.

4 years ago Support QEMU virtual cpu as CORE2
Martin Kroeker [Tue, 8 Oct 2019 20:30:02 +0000 (22:30 +0200)]
Support QEMU virtual cpu as CORE2

qemu itself claims it is a 64bit P6, which does not exist in the wild.

4 years ago Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)
Martin Kroeker [Tue, 8 Oct 2019 18:13:14 +0000 (20:13 +0200)]
Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)

* Add automatic fixups for OSX/IOS cross-compilation

* Add OSX/IOS cross-compilation test to Travis CI

* Handle platforms that lack hwcap.h by falling back to ARMV8

* Fix PROLOGUE for OSX/IOS

4 years ago Update common_arm64.h
Martin Kroeker [Tue, 8 Oct 2019 18:12:08 +0000 (20:12 +0200)]
Update common_arm64.h

4 years ago Merge pull request #2280 from martin-frbg/iosfix
Martin Kroeker [Tue, 8 Oct 2019 08:25:25 +0000 (10:25 +0200)]
Merge pull request #2280 from martin-frbg/iosfix

Add overlooked part of IOS compilation fix

4 years ago Remove automatic label postfixes from macro included only once
Martin Kroeker [Tue, 8 Oct 2019 06:37:50 +0000 (08:37 +0200)]
Remove automatic label postfixes from macro included only once

4 years ago Merge pull request #11 from xianyi/develop
Martin Kroeker [Tue, 8 Oct 2019 06:32:52 +0000 (08:32 +0200)]
Merge pull request #11 from xianyi/develop

sync with upstream

4 years ago Fix accidental duplication of jump instruction
Martin Kroeker [Tue, 8 Oct 2019 06:09:26 +0000 (08:09 +0200)]
Fix accidental duplication of jump instruction

4 years ago Merge pull request #2277 from martin-frbg/issue2275
Martin Kroeker [Sun, 6 Oct 2019 21:01:54 +0000 (23:01 +0200)]
Merge pull request #2277 from martin-frbg/issue2275

Rewrite ARMV8 code to allow cross-compilation for IOS

4 years ago Merge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative
Martin Kroeker [Sun, 6 Oct 2019 09:12:44 +0000 (11:12 +0200)]
Merge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative

Revert "Avoid taking root of negative number in symv_thread.c"

4 years ago Move 32bit OSX build back to xcode 8.3 but switch to gcc8
Martin Kroeker [Sat, 5 Oct 2019 08:52:47 +0000 (10:52 +0200)]
Move 32bit OSX build back to xcode 8.3 but switch to gcc8

4 years ago Make local labels in macro compatible with the xcode assembler
Martin Kroeker [Fri, 4 Oct 2019 12:53:23 +0000 (14:53 +0200)]
Make local labels in macro compatible with the xcode assembler

... which does not perform the automatic numbering on instantiation that the _@ suffix signifies

4 years ago Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios
Martin Kroeker [Fri, 4 Oct 2019 12:50:03 +0000 (14:50 +0200)]
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios

4 years ago Update 32bit macOS again to xcode 9.3
Martin Kroeker [Wed, 2 Oct 2019 23:09:02 +0000 (01:09 +0200)]
Update 32bit macOS again to xcode 9.3

os version 10.13 "High Sierra" appears to be the oldest release now for which Homebrew provides a gcc package.
Anything older and the Travis job will run out of time building gcc from source

4 years ago Update the OSX BINARY=32 test to xcode9.2
Martin Kroeker [Wed, 2 Oct 2019 20:35:34 +0000 (22:35 +0200)]
Update the OSX BINARY=32 test to xcode9.2

in response to Homebrew updates

4 years ago Revert "Avoid taking root of negative number in symv_thread.c"
Martin Kroeker [Tue, 1 Oct 2019 21:50:41 +0000 (23:50 +0200)]
Revert "Avoid taking root of negative number in symv_thread.c"

4 years ago Merge pull request #2272 from seberg/thread-sqrt-of-negative
Martin Kroeker [Mon, 30 Sep 2019 09:27:29 +0000 (11:27 +0200)]
Merge pull request #2272 from seberg/thread-sqrt-of-negative

Avoid taking root of negative number in symv_thread.c

4 years ago Avoid taking root of negative number in symv_thread.c
Sebastian Berg [Mon, 30 Sep 2019 05:03:12 +0000 (22:03 -0700)]
Avoid taking root of negative number in symv_thread.c

This is similar to fixes in gh-1929, but there was one remaining
occurrence of this type of pattern in the driver/level2/*_thread.c
files.

4 years ago Merge pull request #2271 from quickwritereader/strmm_fix
Martin Kroeker [Sun, 29 Sep 2019 11:53:45 +0000 (13:53 +0200)]
Merge pull request #2271 from quickwritereader/strmm_fix

fixed bug power9 strmm . BLAS-TESTER passes

4 years ago trmm fix
AbdelRauf [Sun, 29 Sep 2019 02:27:50 +0000 (02:27 +0000)]
trmm fix

4 years ago Merge pull request #2269 from martin-frbg/ppc-fixes
Martin Kroeker [Fri, 27 Sep 2019 07:52:19 +0000 (09:52 +0200)]
Merge pull request #2269 from martin-frbg/ppc-fixes

Ppc fixes

4 years ago Fix prologue of power9 assembly cdot(c) kernel to provide cdotc
Martin Kroeker [Thu, 26 Sep 2019 22:47:18 +0000 (00:47 +0200)]
Fix prologue of power9 assembly cdot(c) kernel to provide cdotc

4 years ago Fix mis-edits in the gcc-derived power8 caxpy kernel
Martin Kroeker [Thu, 26 Sep 2019 22:44:26 +0000 (00:44 +0200)]
Fix mis-edits in the gcc-derived power8 caxpy kernel

4 years ago Merge pull request #7 from xianyi/develop
Martin Kroeker [Thu, 26 Sep 2019 22:42:32 +0000 (00:42 +0200)]
Merge pull request #7 from xianyi/develop

update

4 years ago Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)
Martin Kroeker [Wed, 25 Sep 2019 21:13:24 +0000 (23:13 +0200)]
Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)

There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a trivial indication if the target is a server-class device with a big cache, or just a single-board toy or smartphone.
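
As a rough illustration of this heuristic, the sketch below switches
between two parameter sets based on a core count supplied by the caller.
Everything in it is an assumption made for the example: the struct, the
function name pick_gemm_pq(), the threshold of 8 cores and the numeric
values are placeholders, not the values chosen in the commit.

```c
#include <assert.h>

/* Hypothetical parameter pair standing in for the GEMM P/Q settings. */
struct gemm_pq {
    long p;
    long q;
};

/* Hypothetical cut-off between "small device" and "server-class". */
#define SERVER_CORE_THRESHOLD 8

/* Pick a parameter set from the number of cores, on the assumption
   that many cores imply a larger cache. All values are placeholders. */
static struct gemm_pq pick_gemm_pq(int num_cores)
{
    assert(num_cores > 0);
    struct gemm_pq small_cache = {128, 256};
    struct gemm_pq large_cache = {256, 512};
    return (num_cores >= SERVER_CORE_THRESHOLD) ? large_cache
                                                : small_cache;
}
```

On Linux the core count could be obtained with, for example,
sysconf(_SC_NPROCESSORS_ONLN); how the commit itself counts cores is not
shown here.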

4 years ago Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (...
Martin Kroeker [Sun, 22 Sep 2019 20:35:22 +0000 (22:35 +0200)]
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)

* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy

To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0

* Use gcc-generated assembly instead of original C sources

to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3

* Use gcc-generated assembly instead of the original C source

to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3

* Add gcc7-generated assembler version of caxpy for power8

to work around wrong code generated by gcc 8.3

* Handle CONJ define for caxpyc

* Handle CONJ define for caxpyc

* Add gcc7-generated assembly cdot for POWER9

* Use prebuilt assembly for POWER9 cdot

created with gcc 7.3.1 to work around ICE in older gcc versions

* Exclude POWER9 from DYNAMIC_ARCH when gcc version is lower than 6

* Update Makefile.system

* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH

* Disable POWER9 with old gcc versions

4 years ago Restore ppc64 CI job and remove the travis_wait that caused the problem with it
Martin Kroeker [Fri, 20 Sep 2019 08:29:35 +0000 (10:29 +0200)]
Restore ppc64 CI job and remove the travis_wait that caused the problem with it

4 years ago Revert #2051 and replace with a better fix (#2261)
Martin Kroeker [Tue, 17 Sep 2019 16:56:04 +0000 (18:56 +0200)]
Revert #2051 and replace with a better fix (#2261)

* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again

4 years ago Merge pull request #6 from xianyi/develop
Martin Kroeker [Fri, 13 Sep 2019 12:00:23 +0000 (14:00 +0200)]
Merge pull request #6 from xianyi/develop

update to current develop