platform/upstream/openblas.git

Martin Kroeker [Tue, 31 Dec 2019 17:07:37 +0000 (18:07 +0100)]
Merge pull request #2351 from Zeyiii/develop

prefetching for dgemm_beta

int_13h [Tue, 31 Dec 2019 17:03:27 +0000 (22:33 +0530)]
add in runtime cpu detection for zarch (#2349)

add in runtime cpu detection for zarch

w00421467 [Tue, 31 Dec 2019 02:13:24 +0000 (10:13 +0800)]
Merge remote-tracking branch 'pub/develop' into develop

w00421467 [Mon, 30 Dec 2019 03:45:49 +0000 (11:45 +0800)]
prefetching for dgemm_beta

Martin Kroeker [Sat, 28 Dec 2019 19:07:56 +0000 (20:07 +0100)]
Merge pull request #2348 from wjc404/develop

AVX2 CGEMM3M kernel

wjc404 [Fri, 27 Dec 2019 15:36:13 +0000 (23:36 +0800)]
Update CONTRIBUTORS.md

wjc404 [Fri, 27 Dec 2019 10:23:29 +0000 (18:23 +0800)]
Update cgemm3m_kernel_8x4_haswell.c

wjc404 [Fri, 27 Dec 2019 10:06:42 +0000 (18:06 +0800)]
Update param.h

wjc404 [Fri, 27 Dec 2019 10:04:08 +0000 (18:04 +0800)]
Update KERNEL.ZEN

wjc404 [Fri, 27 Dec 2019 10:03:01 +0000 (18:03 +0800)]
Update gemm3m_level3.c

wjc404 [Fri, 27 Dec 2019 10:01:38 +0000 (18:01 +0800)]
Update KERNEL.HASWELL

wjc404 [Fri, 27 Dec 2019 10:00:55 +0000 (18:00 +0800)]
Create cgemm3m_kernel_8x4_haswell.c

Martin Kroeker [Wed, 25 Dec 2019 21:26:41 +0000 (22:26 +0100)]
Merge pull request #2345 from wjc404/develop

Optimize AVX2 CGEMM

wjc404 [Mon, 23 Dec 2019 16:40:16 +0000 (00:40 +0800)]
Update cgemm_kernel_8x2_haswell.c

wjc404 [Mon, 23 Dec 2019 16:30:16 +0000 (00:30 +0800)]
Update CONTRIBUTORS.md

wjc404 [Mon, 23 Dec 2019 16:24:40 +0000 (00:24 +0800)]
Update CONTRIBUTORS.md

wjc404 [Mon, 23 Dec 2019 15:44:55 +0000 (23:44 +0800)]
Update param.h

wjc404 [Mon, 23 Dec 2019 15:42:30 +0000 (23:42 +0800)]
Update KERNEL.ZEN

wjc404 [Mon, 23 Dec 2019 15:41:44 +0000 (23:41 +0800)]
Update KERNEL.HASWELL

wjc404 [Mon, 23 Dec 2019 15:40:03 +0000 (23:40 +0800)]
Fast Haswell CGEMM kernel

Martin Kroeker [Sat, 21 Dec 2019 11:16:55 +0000 (12:16 +0100)]
Merge pull request #2344 from wjc404/develop

Optimize AVX2 ZGEMM

wjc404 [Sat, 21 Dec 2019 06:38:51 +0000 (14:38 +0800)]
Adjust Haswell ZGEMM blocking parameters

wjc404 [Sat, 21 Dec 2019 06:37:06 +0000 (14:37 +0800)]
Fast Haswell ZGEMM kernel

wjc404 [Sat, 21 Dec 2019 06:35:15 +0000 (14:35 +0800)]
Fast Haswell ZGEMM kernel

Martin Kroeker [Fri, 20 Dec 2019 07:38:57 +0000 (08:38 +0100)]
Merge pull request #2340 from Zeyiii/develop

[WIP] Use arm neon instructions to optimize gemm beta operation

w00421467 [Fri, 20 Dec 2019 02:11:50 +0000 (10:11 +0800)]
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL

w00421467 [Tue, 17 Dec 2019 02:00:13 +0000 (10:00 +0800)]
use arm neon instructions to optimize gemm beta operation

Martin Kroeker [Fri, 13 Dec 2019 13:57:26 +0000 (14:57 +0100)]
Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout

driver: more reasonable thread wait timeout on Windows.

Jehan [Wed, 11 Dec 2019 16:51:42 +0000 (17:51 +0100)]
driver: more reasonable thread wait timeout on Windows.

It used to be 5ms, which might not be long enough in some cases for the
thread to exit cleanly, but when it was then set to 5000 (5s), it slowed
down any program depending on OpenBLAS.

Let's just set it to 50ms, which is ten times the original value, but
still reasonable in case of failed thread termination.

Martin Kroeker [Mon, 9 Dec 2019 16:54:49 +0000 (17:54 +0100)]
Merge pull request #2338 from kavanabhat/aix_mod

Changes to build on AIX in POWER8 mode

Martin Kroeker [Sat, 7 Dec 2019 08:38:06 +0000 (09:38 +0100)]
Merge pull request #2337 from martin-frbg/issue2336

Support two-digit version numbers in gcc version check

Martin Kroeker [Fri, 6 Dec 2019 20:23:56 +0000 (21:23 +0100)]
Support two-digit version numbers in gcc version check

Fixes #2336 (gcc 10 not being recognized) with a patch provided by JeffreyALaw.

Kavana Bhat [Fri, 6 Dec 2019 10:33:32 +0000 (04:33 -0600)]
AIX changes for Power8

Martin Kroeker [Wed, 4 Dec 2019 10:06:03 +0000 (11:06 +0100)]
Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)

* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets

Kavana Bhat [Wed, 4 Dec 2019 06:23:46 +0000 (00:23 -0600)]
AIX changes for Power8

Martin Kroeker [Tue, 3 Dec 2019 21:23:52 +0000 (22:23 +0100)]
Merge pull request #2334 from martin-frbg/fix2228

Remove misplaced file

Martin Kroeker [Tue, 3 Dec 2019 07:32:29 +0000 (08:32 +0100)]
Add Intel Goldmont+ cpuid

was originally in #2228 but that PR had misplaced the file in the toplevel directory

Martin Kroeker [Tue, 3 Dec 2019 07:24:10 +0000 (08:24 +0100)]
Delete stray copy of dynamic.c from PR 2228

Martin Kroeker [Tue, 3 Dec 2019 07:22:40 +0000 (08:22 +0100)]
Merge pull request #20 from xianyi/develop

Rebase

Martin Kroeker [Mon, 2 Dec 2019 07:30:43 +0000 (08:30 +0100)]
Merge pull request #2329 from isuruf/patch-1

Workaround an ICE in clang 9.0.0

Isuru Fernando [Sun, 1 Dec 2019 17:55:49 +0000 (11:55 -0600)]
Workaround an ICE in clang 9.0.0

This bug is not there in 8.x nor in the 9.0 daily snapshot.

Martin Kroeker [Sat, 30 Nov 2019 11:23:57 +0000 (12:23 +0100)]
Merge pull request #2328 from martin-frbg/ppc9

Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version

Martin Kroeker [Fri, 29 Nov 2019 23:03:42 +0000 (00:03 +0100)]
Merge pull request #2324 from antonblanchard/power9_segv

Fix SEGV in cdot_power9

Martin Kroeker [Fri, 29 Nov 2019 22:56:57 +0000 (23:56 +0100)]
Fix caxpy/caxpyc naming in localentry

Martin Kroeker [Fri, 29 Nov 2019 22:54:15 +0000 (23:54 +0100)]
Fix caxpy/caxpyc naming in localentry

Martin Kroeker [Fri, 29 Nov 2019 22:49:50 +0000 (23:49 +0100)]
Substitute precompiled gcc7 codes only when gcc is older than 9.x

Martin Kroeker [Fri, 29 Nov 2019 22:47:23 +0000 (23:47 +0100)]
Add variable for gcc >=9 test

used in KERNEL.POWER9

Martin Kroeker [Fri, 29 Nov 2019 22:44:09 +0000 (23:44 +0100)]
Merge pull request #19 from xianyi/develop

rebase

Martin Kroeker [Thu, 28 Nov 2019 19:55:16 +0000 (20:55 +0100)]
Merge pull request #2323 from wjc404/develop

some optimizations of AVX512 DGEMM

wjc404 [Thu, 28 Nov 2019 11:57:50 +0000 (19:57 +0800)]
Update param.h

wjc404 [Thu, 28 Nov 2019 11:56:35 +0000 (19:56 +0800)]
Update dgemm_kernel_4x8_skylakex_2.c

Martin Kroeker [Thu, 28 Nov 2019 08:30:24 +0000 (09:30 +0100)]
Merge pull request #2321 from martin-frbg/issue2319

Fix race conditions in multithreaded GEMM3M

Martin Kroeker [Thu, 28 Nov 2019 07:43:45 +0000 (08:43 +0100)]
Merge pull request #2327 from martin-frbg/travisosx

Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now

Martin Kroeker [Wed, 27 Nov 2019 23:17:19 +0000 (00:17 +0100)]
Merge pull request #2326 from xianyi/revert-2325-travisosx

Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"

Martin Kroeker [Wed, 27 Nov 2019 23:15:36 +0000 (00:15 +0100)]
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now

Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment

Martin Kroeker [Wed, 27 Nov 2019 23:09:06 +0000 (00:09 +0100)]
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"

Martin Kroeker [Wed, 27 Nov 2019 20:59:36 +0000 (21:59 +0100)]
Merge pull request #2325 from martin-frbg/travisosx

Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now

Martin Kroeker [Wed, 27 Nov 2019 14:10:57 +0000 (15:10 +0100)]
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now

Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment

Anton Blanchard [Wed, 27 Nov 2019 04:55:04 +0000 (21:55 -0700)]
Fix SEGV in cdot_power9

We were corrupting r2 because the local entry wasn't being
set up correctly.

wjc404 [Tue, 26 Nov 2019 06:12:20 +0000 (14:12 +0800)]
some optimizations

Martin Kroeker [Sat, 23 Nov 2019 21:38:07 +0000 (22:38 +0100)]
Fix AVX512 capability test (always returning zero)

from #2322

Martin Kroeker [Sat, 23 Nov 2019 18:54:56 +0000 (19:54 +0100)]
Fix race conditions in multithreaded GEMM3M

by adding barriers (and a mutex lock for the non-OpenMP case), as was already done for GEMM in level3_thread.c some time ago

Martin Kroeker [Thu, 21 Nov 2019 17:14:29 +0000 (18:14 +0100)]
Add the cpuid of the business/rackmount version of z15 as well

Martin Kroeker [Thu, 21 Nov 2019 17:03:00 +0000 (18:03 +0100)]
Merge pull request #2316 from sharkcz/s390x

zarch: treat z15 as z14 instead of generic

Martin Kroeker [Thu, 21 Nov 2019 16:59:21 +0000 (17:59 +0100)]
Merge pull request #2317 from aarnez/develop

Change bad usage of "asum" to "sum" in ZARCH versions of ?sum

Andreas Arnez [Fri, 20 Sep 2019 16:32:47 +0000 (18:32 +0200)]
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum

The ZARCH implementations of ?sum contain a cut-and-paste error: an inline
assembly argument is named "sum", but the assembly references "asum"
instead. The mismatch causes a build error, which this commit fixes.

Dan Horák [Thu, 21 Nov 2019 11:49:54 +0000 (12:49 +0100)]
zarch: treat z15 as z14 instead of generic

Signed-off-by: Dan Horák <dan@danny.cz>

Martin Kroeker [Thu, 21 Nov 2019 04:06:44 +0000 (05:06 +0100)]
Merge pull request #2315 from ewanglong/develop

Revised Windows-compatibility fix for #2313

Wang, Long [Thu, 21 Nov 2019 02:19:40 +0000 (10:19 +0800)]
Revised Windows-compatibility fix for #2313

Signed-off-by: Wang, Long <long1.wang@intel.com>

Martin Kroeker [Wed, 20 Nov 2019 15:16:35 +0000 (16:16 +0100)]
Merge pull request #2314 from Jehan/wip/Jehan/fix-openblas-crash

Fix usage of TerminateThread() causing critical section corruption.

Martin Kroeker [Wed, 20 Nov 2019 14:12:06 +0000 (15:12 +0100)]
Merge pull request #2312 from martin-frbg/power8be

Further Power8 big-endian corrections

Martin Kroeker [Wed, 20 Nov 2019 13:49:15 +0000 (14:49 +0100)]
Merge pull request #2313 from ewanglong/develop

Fix the integer overflow issue for large matrix size

Wang, Long [Wed, 20 Nov 2019 13:30:16 +0000 (21:30 +0800)]
For Windows compatibility, use "unsigned long long" to ensure a 64-bit length

Signed-off-by: Wang, Long <long1.wang@intel.com>

Jehan [Wed, 20 Nov 2019 11:21:35 +0000 (12:21 +0100)]
Fix usage of TerminateThread() causing critical section corruption.

This patch was submitted to the GIMP project by a publisher wishing to
keep confidentiality (hence anonymously). I am just passing the patch along.
Here is the explanation that came with it:

First they remind us what Microsoft documentation says about
TerminateThread:
> TerminateThread is a dangerous function that should only be used in
> the most extreme cases. You should call TerminateThread only if you
> know exactly what the target thread is doing, and you control all of
> the code that the target thread could possibly be running at the time
> of the termination.
(https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminatethread)

Then they say that 5 milliseconds time-out might not be long enough for
the thread to exit gracefully. They propose to set it to a much higher
value (for instance here 5 seconds).

And finally you should always check the return value of
WaitForSingleObject(). In particular you want to run TerminateThread()
only if WaitForSingleObject() failed, not on success case.

Wang, Long [Wed, 20 Nov 2019 03:50:37 +0000 (11:50 +0800)]
Fix the integer overflow issue for large matrix size

For large matrices, e.g. M=N=K with M>1290, int mnk=M*N*K overflows.
This leads to wrongly branching to the single-threaded path, and
performance degrades significantly.

Signed-off-by: Wang, Long <long1.wang@intel.com>

Martin Kroeker [Sun, 17 Nov 2019 22:19:48 +0000 (23:19 +0100)]
Merge pull request #2310 from martin-frbg/ppc440

Fix PPC440 big-endian support and disable the QCDOC qalloc routine by default

Martin Kroeker [Sun, 17 Nov 2019 22:12:10 +0000 (23:12 +0100)]
Define alternate kernels for big-endian POWER8

Martin Kroeker [Sun, 17 Nov 2019 21:58:32 +0000 (22:58 +0100)]
Fix compilation for big-endian POWER8

Martin Kroeker [Sun, 17 Nov 2019 18:25:08 +0000 (19:25 +0100)]
Define alternate kernels for big-endian PPC440

Martin Kroeker [Sun, 17 Nov 2019 18:22:04 +0000 (19:22 +0100)]
Disable the old QCDOC qalloc by default and copy utility functions from memory.c

1. qalloc() appears to have been a special routine written for the PPC440-based QCDOC supercomputer(s) from around 2005; its source does not seem to be readily available. So switch the #if 1 in the code to rely on standard malloc() by default.
2. Utility functions like get_num_procs and get_num_threads, which were added to the "normally" used memory.c in the meantime, were still missing here.

Martin Kroeker [Sun, 17 Nov 2019 18:09:49 +0000 (19:09 +0100)]
Merge pull request #17 from xianyi/develop

rebase

Martin Kroeker [Sun, 17 Nov 2019 17:22:24 +0000 (18:22 +0100)]
Merge pull request #2309 from martin-frbg/ppc970-be

Fix PPC970 big-endian support

Martin Kroeker [Sun, 17 Nov 2019 14:19:39 +0000 (15:19 +0100)]
Define alternate kernels for big-endian PPC970

The altivec versions of SGEMM and CGEMM fail most tests in LAPACK-TESTING when compiled for big endian, and STRSM/CTRSM even cause segfaults. The rot kernels either fail the corresponding utest or lead to failures in LAPACK-TESTING.

Martin Kroeker [Sun, 17 Nov 2019 14:10:26 +0000 (15:10 +0100)]
Use "generic" S/CGEMM unroll M on big-endian PPC970

as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian

Martin Kroeker [Fri, 15 Nov 2019 07:33:17 +0000 (08:33 +0100)]
Merge pull request #2308 from martin-frbg/ctestfix

Fix potential issue in the c/z blas3 ctests

Martin Kroeker [Thu, 14 Nov 2019 23:20:36 +0000 (00:20 +0100)]
Fix potential spurious failure from uninitialized variable

Martin Kroeker [Thu, 14 Nov 2019 23:19:24 +0000 (00:19 +0100)]
Fix potential spurious failure from uninitialized variable

Martin Kroeker [Tue, 12 Nov 2019 06:38:37 +0000 (07:38 +0100)]
Merge pull request #2305 from wjc404/develop

AVX512 CGEMM & ZGEMM kernels

wjc404 [Mon, 11 Nov 2019 12:04:52 +0000 (20:04 +0800)]
AVX512 CGEMM & ZGEMM kernels

96-99% of the single-threaded performance of MKL 2018

Martin Kroeker [Sat, 9 Nov 2019 17:52:08 +0000 (18:52 +0100)]
Merge pull request #15 from xianyi/develop

rebase

Martin Kroeker [Wed, 6 Nov 2019 06:27:33 +0000 (07:27 +0100)]
Merge pull request #2300 from wjc404/develop

Optimize SGEMM on SKYLAKEX CPUs

wjc404 [Tue, 5 Nov 2019 05:36:56 +0000 (13:36 +0800)]
optimizations of software prefetching

Martin Kroeker [Mon, 4 Nov 2019 21:55:05 +0000 (22:55 +0100)]
Merge pull request #2302 from martin-frbg/ppc970

Disable three-operand DCBT on PPC970 regardless of operating system

Martin Kroeker [Mon, 4 Nov 2019 21:54:28 +0000 (22:54 +0100)]
Merge pull request #2301 from martin-frbg/ppc8be

Disable IDAMIN/MAX and IZAMIN/MAX optimizations on big-endian POWER8

Martin Kroeker [Mon, 4 Nov 2019 21:53:58 +0000 (22:53 +0100)]
Merge pull request #2294 from martin-frbg/ios-cleanup

Remove obsolete workarounds for IOS on ARMV8

wjc404 [Mon, 4 Nov 2019 12:10:12 +0000 (20:10 +0800)]
Add files via upload

wjc404 [Mon, 4 Nov 2019 11:37:19 +0000 (19:37 +0800)]
optimizations via software prefetches

Martin Kroeker [Sun, 3 Nov 2019 21:55:31 +0000 (22:55 +0100)]
Use the two-operand form of DCBT on all PPC970 regardless of OS

There seems to be no advantage to the three-operand form used in the earliest GotoBLAS kernels, and it also causes compilation problems on platforms other than the previously special-cased ones

Martin Kroeker [Sun, 3 Nov 2019 21:42:46 +0000 (22:42 +0100)]
The assembly microkernel is not safe to use on ELFv1

Martin Kroeker [Sun, 3 Nov 2019 21:41:19 +0000 (22:41 +0100)]
The assembly microkernel is not safe to use on ELFv1