platform/upstream/openblas.git
4 years agoAdd files via upload
wjc404 [Tue, 15 Oct 2019 18:01:13 +0000 (02:01 +0800)]
Add files via upload

4 years agoAdd files via upload
wjc404 [Tue, 15 Oct 2019 18:00:34 +0000 (02:00 +0800)]
Add files via upload

4 years agoMerge pull request #2283 from martin-frbg/issue2176
Martin Kroeker [Wed, 9 Oct 2019 20:06:09 +0000 (22:06 +0200)]
Merge pull request #2283 from martin-frbg/issue2176

Support QEMU virtual cpu in 64bit mode as CORE2 or BARCELONA

4 years agoSupport QEMU cpu calling itself 64bit AMD Athlon as well
Martin Kroeker [Wed, 9 Oct 2019 16:24:13 +0000 (18:24 +0200)]
Support QEMU cpu calling itself 64bit AMD Athlon as well

Some QEMU instances pretend to be "AuthenticAMD" with the same family 6/model 6 even when running on an Intel host
(this could be related to the qemu or libvirt version and/or kvm availability). Also fix the define to depend on __x86_64__, which is set by the
compiler; the defines using __64BIT__ only work for getarch_2nd.
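
A minimal sketch of the detection rule described in this commit and the two related QEMU commits, assuming the vendor string, family and model have already been read via CPUID. The names classify_qemu_cpu and built_for_x86_64, their parameters, and the exact Intel-to-CORE2 / AMD-to-BARCELONA mapping are illustrative assumptions, not the code in the OpenBLAS sources; only __x86_64__ is a real compiler-provided macro.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 when the compiler targets 64-bit x86. __x86_64__ is predefined
 * by gcc and clang, so this check works in any translation unit. */
static int built_for_x86_64(void)
{
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: map a QEMU-style virtual CPU to a build target.
 * Returns NULL when the CPU is not the 64-bit family 6 / model 6 case. */
static const char *classify_qemu_cpu(const char *vendor, int family,
                                     int model, int is_64bit)
{
    if (!is_64bit || family != 6 || model != 6)
        return NULL;
    if (strcmp(vendor, "GenuineIntel") == 0)
        return "CORE2";      /* the "64bit P6" that qemu reports */
    if (strcmp(vendor, "AuthenticAMD") == 0)
        return "BARCELONA";  /* the "64bit AMD Athlon" variant */
    return NULL;
}
```

A caller would pass built_for_x86_64() as the last argument, so the special case only applies to 64-bit builds.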

4 years agoSupport QEMU virtual cpu as CORE2
Martin Kroeker [Tue, 8 Oct 2019 20:30:02 +0000 (22:30 +0200)]
Support QEMU virtual cpu as CORE2

qemu itself claims it is a 64bit P6, which does not exist in the wild.

4 years agoSimplify OSX/IOS cross-compilation and add a CI test for it (#2279)
Martin Kroeker [Tue, 8 Oct 2019 18:13:14 +0000 (20:13 +0200)]
Simplify OSX/IOS cross-compilation and add a CI test for it (#2279)

* Add automatic fixups for OSX/IOS cross-compilation

* Add OSX/IOS cross-compilation test to Travis CI

* Handle platforms that lack hwcap.h by falling back to ARMV8

* Fix PROLOGUE for OSX/IOS

4 years agoMerge pull request #2280 from martin-frbg/iosfix
Martin Kroeker [Tue, 8 Oct 2019 08:25:25 +0000 (10:25 +0200)]
Merge pull request #2280 from martin-frbg/iosfix

Add overlooked part of IOS compilation fix

4 years agoRemove automatic label postfixes from macro included only once
Martin Kroeker [Tue, 8 Oct 2019 06:37:50 +0000 (08:37 +0200)]
Remove automatic label postfixes from macro included only once

4 years agoMerge pull request #11 from xianyi/develop
Martin Kroeker [Tue, 8 Oct 2019 06:32:52 +0000 (08:32 +0200)]
Merge pull request #11 from xianyi/develop

sync with upstream

4 years agoFix accidental duplication of jump instruction
Martin Kroeker [Tue, 8 Oct 2019 06:09:26 +0000 (08:09 +0200)]
Fix accidental duplication of jump instruction

4 years agoMerge pull request #2277 from martin-frbg/issue2275
Martin Kroeker [Sun, 6 Oct 2019 21:01:54 +0000 (23:01 +0200)]
Merge pull request #2277 from martin-frbg/issue2275

Rewrite ARMV8 code to allow cross-compilation for IOS

4 years agoMerge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative
Martin Kroeker [Sun, 6 Oct 2019 09:12:44 +0000 (11:12 +0200)]
Merge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative

Revert "Avoid taking root of negative number in symv_thread.c"

4 years agoMove 32bit OSX build back to xcode 8.3 but switch to gcc8
Martin Kroeker [Sat, 5 Oct 2019 08:52:47 +0000 (10:52 +0200)]
Move 32bit OSX build back to xcode 8.3 but switch to gcc8

4 years agoMake local labels in macro compatible with the xcode assembler
Martin Kroeker [Fri, 4 Oct 2019 12:53:23 +0000 (14:53 +0200)]
Make local labels in macro compatible with the xcode assembler

... which does not perform the automatic numbering on instantiation that the _@ suffix signifies

4 years agoRewrite ARM64 PROLOGUE to make it compatible with xcode/ios
Martin Kroeker [Fri, 4 Oct 2019 12:50:03 +0000 (14:50 +0200)]
Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios

4 years agoUpdate 32bit macOS again to xcode 9.3
Martin Kroeker [Wed, 2 Oct 2019 23:09:02 +0000 (01:09 +0200)]
Update 32bit macOS again to xcode 9.3

macOS 10.13 "High Sierra" appears to be the oldest release for which Homebrew now provides a gcc package.
On anything older, the Travis job runs out of time building gcc from source.

4 years agoUpdate the OSX BINARY=32 test to xcode9.2
Martin Kroeker [Wed, 2 Oct 2019 20:35:34 +0000 (22:35 +0200)]
Update the OSX BINARY=32 test to xcode9.2

in response to Homebrew updates

4 years agoRevert "Avoid taking root of negative number in symv_thread.c"
Martin Kroeker [Tue, 1 Oct 2019 21:50:41 +0000 (23:50 +0200)]
Revert "Avoid taking root of negative number in symv_thread.c"

4 years agoMerge pull request #2272 from seberg/thread-sqrt-of-negative
Martin Kroeker [Mon, 30 Sep 2019 09:27:29 +0000 (11:27 +0200)]
Merge pull request #2272 from seberg/thread-sqrt-of-negative

Avoid taking root of negative number in symv_thread.c

4 years agoAvoid taking root of negative number in symv_thread.c
Sebastian Berg [Mon, 30 Sep 2019 05:03:12 +0000 (22:03 -0700)]
Avoid taking root of negative number in symv_thread.c

This is similar to fixes in gh-1929, but there was one remaining
occurrence of this type of pattern in the driver/level2/*_thread.c
files.

4 years agoMerge pull request #2271 from quickwritereader/strmm_fix
Martin Kroeker [Sun, 29 Sep 2019 11:53:45 +0000 (13:53 +0200)]
Merge pull request #2271 from quickwritereader/strmm_fix

Fixed power9 strmm bug. BLAS-TESTER passes.

4 years agotrmm fix
AbdelRauf [Sun, 29 Sep 2019 02:27:50 +0000 (02:27 +0000)]
trmm fix

4 years agoMerge pull request #2269 from martin-frbg/ppc-fixes
Martin Kroeker [Fri, 27 Sep 2019 07:52:19 +0000 (09:52 +0200)]
Merge pull request #2269 from martin-frbg/ppc-fixes

Ppc fixes

4 years agoFix prologue of power9 assembly cdot(c) kernel to provide cdotc
Martin Kroeker [Thu, 26 Sep 2019 22:47:18 +0000 (00:47 +0200)]
Fix prologue of power9 assembly cdot(c) kernel to provide cdotc

4 years agoFix mis-edits in the gcc-derived power8 caxpy kernel
Martin Kroeker [Thu, 26 Sep 2019 22:44:26 +0000 (00:44 +0200)]
Fix mis-edits in the gcc-derived power8 caxpy kernel

4 years agoMerge pull request #7 from xianyi/develop
Martin Kroeker [Thu, 26 Sep 2019 22:42:32 +0000 (00:42 +0200)]
Merge pull request #7 from xianyi/develop

update

4 years agoCount cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)
Martin Kroeker [Wed, 25 Sep 2019 21:13:24 +0000 (23:13 +0200)]
Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)

There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a rough indication of whether the target is a server-class device with a big cache or just a single-board toy or smartphone.
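
The heuristic can be sketched as follows, assuming a system where sysconf(_SC_NPROCESSORS_ONLN) is available (Linux and macOS provide it). The struct gemm_pq, the function names, the threshold of 16 cores and all numeric values are placeholders chosen for illustration; they are not the GEMM_P/GEMM_Q values or the core-counting code used by OpenBLAS.

```c
#include <unistd.h>

/* Hypothetical parameter pair; field names and numbers are placeholders. */
struct gemm_pq { int p; int q; };

/* Number of online cores, or 1 if the query fails. */
static int count_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/* Use the core count as a stand-in for cache size: many cores suggests a
 * server-class part with a large cache, few cores a small device. */
static struct gemm_pq pick_gemm_pq(int cores)
{
    struct gemm_pq small = { 128, 256 };
    struct gemm_pq large = { 256, 512 };
    return cores >= 16 ? large : small;  /* threshold is an assumption */
}
```

Calling pick_gemm_pq(count_cores()) once at startup would then select the blocking parameters for the rest of the run.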

4 years agoReplace several POWER8/9 C kernels with their gcc7-generated assembly versions (...
Martin Kroeker [Sun, 22 Sep 2019 20:35:22 +0000 (22:35 +0200)]
Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263)

* Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy

To work around internal compiler errors encountered when compiling the original C source with gcc 4 and 5, and wrong code generated by gcc 8.3.0

* Use gcc-generated assembly instead of original C sources

to work around internal compiler errors encountered with gcc 4.8/5.4 and wrong code generation by gcc 8.3

* Use gcc-generated assembly instead of the original C source

to work around internal compiler errors encountered with gcc 4.8 and 5.4, and wrong code generation by gcc 8.3

* Add gcc7-generated assembler version of caxpy for power8

to work around wrong code generated by gcc 8.3

* Handle CONJ define for caxpyc

* Handle CONJ define for caxpyc

* Add gcc7-generated assembly cdot for POWER9

* Use prebuilt assembly for POWER9 cdot

created with gcc 7.3.1 to work around ICE in older gcc versions

* Exclude POWER9 from DYNAMIC_ARCH when the gcc version is lower than 6

* Update Makefile.system

* Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH

* Disable POWER9 with old gcc versions

4 years agoRestore ppc64 CI job and remove the travis_wait that caused the problem with it
Martin Kroeker [Fri, 20 Sep 2019 08:29:35 +0000 (10:29 +0200)]
Restore ppc64 CI job and remove the travis_wait that caused the problem with it

4 years agoRevert #2051 and replace with a better fix (#2261)
Martin Kroeker [Tue, 17 Sep 2019 16:56:04 +0000 (18:56 +0200)]
Revert #2051 and replace with a better fix (#2261)

* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again

4 years agoMerge pull request #6 from xianyi/develop
Martin Kroeker [Fri, 13 Sep 2019 12:00:23 +0000 (14:00 +0200)]
Merge pull request #6 from xianyi/develop

update to current develop

4 years agoMerge pull request #2252 from thrasibule/trtrs
Martin Kroeker [Thu, 12 Sep 2019 19:45:47 +0000 (21:45 +0200)]
Merge pull request #2252 from thrasibule/trtrs

Optimized ?trtrs

4 years agomore bugfix
Guillaume Horel [Tue, 10 Sep 2019 21:30:57 +0000 (17:30 -0400)]
more bugfix

4 years agofix Makefile
Guillaume Horel [Tue, 10 Sep 2019 21:11:01 +0000 (17:11 -0400)]
fix Makefile

4 years agofix error codes
Guillaume Horel [Tue, 10 Sep 2019 21:10:33 +0000 (17:10 -0400)]
fix error codes

4 years agoMerge pull request #2249 from brada4/gcc7minor
Martin Kroeker [Tue, 10 Sep 2019 06:27:32 +0000 (08:27 +0200)]
Merge pull request #2249 from brada4/gcc7minor

Address minor warnings popping up in gcc7+

4 years agoFix C compiler handling and BINARY=32 mode in CMAKE builds (#2248)
Martin Kroeker [Tue, 10 Sep 2019 06:27:06 +0000 (08:27 +0200)]
Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248)

* Fix compiler identification and option setting

* Handle BINARY=32 option on X86_64

* Add xGEMM3M unroll parameters for crossbuild-target CORE2

* Replace bogus mingw64/32bit CI job with actual 32bit build

mingw64 is not multilib-capable, so using an x86_64-mingw with BINARY=32 in the CI was not going to work anyway (but build passed while BINARY=32 was ignored).

4 years agofix Makefile
Guillaume Horel [Mon, 9 Sep 2019 15:36:50 +0000 (11:36 -0400)]
fix Makefile

4 years agobugfix
Guillaume Horel [Sun, 8 Sep 2019 02:06:27 +0000 (22:06 -0400)]
bugfix

4 years agoturn on optimized code
Guillaume Horel [Fri, 6 Sep 2019 21:19:40 +0000 (17:19 -0400)]
turn on optimized code

4 years agoadd missing file
Guillaume Horel [Fri, 6 Sep 2019 20:49:27 +0000 (16:49 -0400)]
add missing file

4 years agofix Makefile
Guillaume Horel [Fri, 6 Sep 2019 20:49:12 +0000 (16:49 -0400)]
fix Makefile

4 years agoadd missing defines and headers
Guillaume Horel [Fri, 6 Sep 2019 20:48:18 +0000 (16:48 -0400)]
add missing defines and headers

4 years agoupdate Makefile
Guillaume Horel [Fri, 6 Sep 2019 20:01:55 +0000 (16:01 -0400)]
update Makefile

4 years agoadd missing files
Guillaume Horel [Tue, 3 Sep 2019 18:45:43 +0000 (14:45 -0400)]
add missing files

4 years agoadd logic
Guillaume Horel [Tue, 3 Sep 2019 01:57:28 +0000 (21:57 -0400)]
add logic

4 years agoadd missing objects
Guillaume Horel [Tue, 3 Sep 2019 01:15:20 +0000 (21:15 -0400)]
add missing objects

4 years agoadd files
Guillaume Horel [Fri, 30 Aug 2019 20:31:25 +0000 (16:31 -0400)]
add files

4 years agostart working on ?trtrs
Guillaume Horel [Fri, 30 Aug 2019 19:06:38 +0000 (15:06 -0400)]
start working on ?trtrs

5 years agoaddress minor warnings from gcc7
Andrew [Sat, 7 Sep 2019 07:21:08 +0000 (10:21 +0300)]
address minor warnings from gcc7

5 years agoinit
Andrew [Sat, 7 Sep 2019 07:18:46 +0000 (10:18 +0300)]
init

5 years agoImprove cmake build behaviour with non-host cpu targets (#2246)
Martin Kroeker [Tue, 3 Sep 2019 20:41:17 +0000 (22:41 +0200)]
Improve cmake build behaviour with non-host cpu targets (#2246)

1. Supply appropriate values for C/Z GEMM unroll when cross-compiling for CORE2 or ARMV7
2. Add the required xLOCAL_BUFFER_SIZE parameters for cross-compiling CORE2
3. Add -DFORCE_<target> option to getarch when building with -DTARGET=target
for #2245

5 years agoMerge pull request #2 from xianyi/develop
Martin Kroeker [Tue, 3 Sep 2019 13:12:14 +0000 (15:12 +0200)]
Merge pull request #2 from xianyi/develop

update

5 years agoMerge pull request #2242 from martin-frbg/issue2235
Martin Kroeker [Mon, 2 Sep 2019 20:06:29 +0000 (22:06 +0200)]
Merge pull request #2242 from martin-frbg/issue2235

Add arch data for cmake cross-compiling to CORE2

5 years agoAdd cgemm and zgemm unroll factors for core2
Martin Kroeker [Mon, 2 Sep 2019 13:03:45 +0000 (15:03 +0200)]
Add cgemm and zgemm unroll factors for core2

5 years agoDisable ppc64le test environment on Travis CI
Martin Kroeker [Sat, 31 Aug 2019 16:06:12 +0000 (18:06 +0200)]
Disable ppc64le test environment on Travis CI

as this semi-official beta option has suddenly reverted to a standard x86_64 environment, causing spurious failures

5 years agoMerge pull request #2243 from quickwritereader/develop
Martin Kroeker [Fri, 30 Aug 2019 21:06:23 +0000 (23:06 +0200)]
Merge pull request #2243 from quickwritereader/develop

possible cgemv,caxpy,cdot fix

5 years ago fix uninitialized variables i
AbdelRauf [Fri, 30 Aug 2019 11:14:55 +0000 (11:14 +0000)]
 fix uninitialized variables i

5 years agocaxpy and cdot are using vec_vsx_ld
AbdelRauf [Fri, 30 Aug 2019 04:09:15 +0000 (04:09 +0000)]
caxpy and cdot are using vec_vsx_ld

5 years agocgemv using vec_vsx_ld instead of letting gcc decide
AbdelRauf [Fri, 30 Aug 2019 02:52:04 +0000 (02:52 +0000)]
cgemv using vec_vsx_ld instead of letting gcc decide

5 years agoaligned
AbdelRauf [Thu, 29 Aug 2019 23:22:23 +0000 (23:22 +0000)]
aligned

5 years agoMerge pull request #2241 from martin-frbg/zdotfix
Martin Kroeker [Thu, 29 Aug 2019 05:12:54 +0000 (07:12 +0200)]
Merge pull request #2241 from martin-frbg/zdotfix

Make x86_64 zdot compile with PGI and Sun C again

5 years agoKeep both PGI/SUN and default code paths to avoid breaking Clang/Windows
Martin Kroeker [Wed, 28 Aug 2019 16:07:44 +0000 (18:07 +0200)]
Keep both PGI/SUN and default code paths to avoid breaking Clang/Windows

5 years agoAdd arch data for cross-compiling to CORE2
Martin Kroeker [Wed, 28 Aug 2019 15:35:56 +0000 (17:35 +0200)]
Add arch data for cross-compiling to CORE2

for #2235

5 years agoMerge pull request #2240 from martin-frbg/issue2237
Martin Kroeker [Wed, 28 Aug 2019 13:30:53 +0000 (15:30 +0200)]
Merge pull request #2240 from martin-frbg/issue2237

Fix PGI build options (again)

5 years agoMake x86_64 zdot compile with PGI and Sun C again
Martin Kroeker [Wed, 28 Aug 2019 09:35:31 +0000 (11:35 +0200)]
Make x86_64 zdot compile with PGI and Sun C again

broken by #2222 as CREAL,CIMAG do not expand to a valid lvalue with these compilers
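
The underlying C issue can be illustrated in isolation: the standard creal() and cimag() are function calls that yield values, so they cannot appear on the left-hand side of an assignment. The sketch below shows one portable way to set both parts of a result, assuming a compiler with C99 complex support. The helper names make_zresult, real_part and imag_part are hypothetical, and this is not the code path that the zdot kernel adopted for PGI and Sun C.

```c
#include <complex.h>
#include <string.h>

/* Build a complex value without assigning through creal()/cimag().
 * C99 specifies that a complex type has the same layout as a two-element
 * array of the real type: real part first, then imaginary part. */
static double complex make_zresult(double re, double im)
{
    double parts[2] = { re, im };
    double complex z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Read the parts back through the same array layout. */
static double real_part(double complex z)
{
    double parts[2];
    memcpy(parts, &z, sizeof parts);
    return parts[0];
}

static double imag_part(double complex z)
{
    double parts[2];
    memcpy(parts, &z, sizeof parts);
    return parts[1];
}
```

Going through the array layout avoids relying on any compiler-specific lvalue extension for the real and imaginary parts.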

5 years agoFix PGI build options (again)
Martin Kroeker [Wed, 28 Aug 2019 09:31:20 +0000 (11:31 +0200)]
Fix PGI build options (again)

for #2237

5 years agoMerge pull request #2239 from martin-frbg/issue2231
Martin Kroeker [Wed, 28 Aug 2019 05:54:57 +0000 (07:54 +0200)]
Merge pull request #2239 from martin-frbg/issue2231

Fix 32bit armv8 compilation regression

5 years agoDo not abuse the global ARCH variable as a local temporary
Martin Kroeker [Tue, 27 Aug 2019 20:52:17 +0000 (22:52 +0200)]
Do not abuse the global ARCH variable as a local temporary

Setting it with a simple "uname -m" just to be able to decide whether to compile getarch.c with -march=native
may actually keep getarch from doing a proper probe. Fixes #2231, a regression caused by #2110

5 years agoMerge pull request #2 from xianyi/develop
Martin Kroeker [Tue, 27 Aug 2019 20:41:31 +0000 (22:41 +0200)]
Merge pull request #2 from xianyi/develop

merge develop

5 years agoMerge pull request #2228 from martin-frbg/issue2227
Martin Kroeker [Mon, 19 Aug 2019 16:26:51 +0000 (18:26 +0200)]
Merge pull request #2228 from martin-frbg/issue2227

Add Intel Goldmont Plus CPUID

5 years agoMerge branch 'develop' into issue2227
Martin Kroeker [Mon, 19 Aug 2019 12:20:39 +0000 (14:20 +0200)]
Merge branch 'develop' into issue2227

5 years agoAdd Intel Goldmont Plus CPUID
Martin Kroeker [Mon, 19 Aug 2019 12:19:21 +0000 (14:19 +0200)]
Add Intel Goldmont Plus CPUID

fixes #2227

5 years agoMerge pull request #2223 from martin-frbg/getarch-pgi
Martin Kroeker [Fri, 16 Aug 2019 10:21:30 +0000 (12:21 +0200)]
Merge pull request #2223 from martin-frbg/getarch-pgi

Make getarch compile with PGI

5 years agoFix PGI compiler detection for getarch
Martin Kroeker [Fri, 16 Aug 2019 07:00:11 +0000 (09:00 +0200)]
Fix PGI compiler detection for getarch

5 years agoDo not use -march=native with the PGI compiler
Martin Kroeker [Fri, 16 Aug 2019 06:58:10 +0000 (08:58 +0200)]
Do not use -march=native with the PGI compiler

5 years agoMerge pull request #1 from xianyi/develop
Martin Kroeker [Fri, 16 Aug 2019 06:56:15 +0000 (08:56 +0200)]
Merge pull request #1 from xianyi/develop

rebase

5 years agoAdd multithreading support to the x86_64 zdot kernel (#2222)
Martin Kroeker [Thu, 15 Aug 2019 20:09:12 +0000 (22:09 +0200)]
Add multithreading support to the x86_64 zdot kernel (#2222)

* Add multithreading support

copied from the ThunderX2T99 kernel. For #2221

5 years agoMerge pull request #2218 from martin-frbg/issue2215
Martin Kroeker [Wed, 14 Aug 2019 05:32:31 +0000 (07:32 +0200)]
Merge pull request #2218 from martin-frbg/issue2215

Make the new DGEMM regression test properly depend on CBLAS and LAPACKE

5 years agoMake the new DGEMM regression test properly depend on CBLAS and LAPACKE
Martin Kroeker [Tue, 13 Aug 2019 20:29:48 +0000 (22:29 +0200)]
Make the new DGEMM regression test properly depend on CBLAS and LAPACKE

fixes #2215

5 years agoMerge pull request #2216 from martin-frbg/issue2214
Martin Kroeker [Tue, 13 Aug 2019 11:59:33 +0000 (13:59 +0200)]
Merge pull request #2216 from martin-frbg/issue2214

Remove case-sensitivity in x86 LSAME on (AMD) cpus without CMOV

5 years agoFix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV
Martin Kroeker [Tue, 13 Aug 2019 08:19:10 +0000 (10:19 +0200)]
Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV

Problem was already noticed some years ago in #238, but back then the problem was only corrected in one of the #ifdef branches.
Fixes #2214
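
For reference, the behaviour LSAME is expected to have can be sketched in C: two characters compare equal when they are the same letter regardless of case. The function name lsame_sketch is hypothetical, and this is an illustration of the semantics only, not the x86 assembly kernel that this commit corrects.

```c
#include <ctype.h>

/* Nonzero when a and b are the same letter, ignoring case.
 * The casts avoid undefined behaviour for negative char values. */
static int lsame_sketch(char a, char b)
{
    return toupper((unsigned char)a) == toupper((unsigned char)b);
}
```

A case-sensitive variant would report 'u' and 'U' as different, which is the kind of mismatch the fix removes.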

5 years agoUpdate with changes from 0.3.7
Martin Kroeker [Sun, 11 Aug 2019 21:31:36 +0000 (23:31 +0200)]
Update with changes from 0.3.7

5 years agoIncrement version to 0.3.8.dev
Martin Kroeker [Sun, 11 Aug 2019 21:28:47 +0000 (23:28 +0200)]
Increment version to 0.3.8.dev

5 years agoIncrement version to 0.3.8.dev
Martin Kroeker [Sun, 11 Aug 2019 21:28:13 +0000 (23:28 +0200)]
Increment version to 0.3.8.dev

5 years agoMerge pull request #2212 from martin-frbg/nofort-nolib
Martin Kroeker [Sun, 11 Aug 2019 18:26:34 +0000 (20:26 +0200)]
Merge pull request #2212 from martin-frbg/nofort-nolib

Avoid spurious dependency on the fortran runtime despite NOFORTRAN=1

5 years agoAvoid adding a spurious dependency on the fortran runtime despite NOFORTRAN=1
Martin Kroeker [Sun, 11 Aug 2019 14:24:39 +0000 (16:24 +0200)]
Avoid adding a spurious dependency on the fortran runtime despite NOFORTRAN=1

for cases where a fortran compiler is present but not wanted (e.g. not fully functional)

5 years agoMerge pull request #2211 from martin-frbg/arm64_gcc_trivial
Martin Kroeker [Sun, 11 Aug 2019 14:08:05 +0000 (16:08 +0200)]
Merge pull request #2211 from martin-frbg/arm64_gcc_trivial

Silence two nuisance warnings from gcc

5 years agoSilence two nuisance warnings from gcc
Martin Kroeker [Sun, 11 Aug 2019 10:46:05 +0000 (12:46 +0200)]
Silence two nuisance warnings from gcc

5 years agoMerge pull request #2208 from martin-frbg/munmap-debug
Martin Kroeker [Fri, 9 Aug 2019 05:55:35 +0000 (07:55 +0200)]
Merge pull request #2208 from martin-frbg/munmap-debug

Provide more information on mmap/munmap failure

5 years agoMerge pull request #2206 from martin-frbg/zen-dtrmm
Martin Kroeker [Fri, 9 Aug 2019 05:55:20 +0000 (07:55 +0200)]
Merge pull request #2206 from martin-frbg/zen-dtrmm

Replace vpermpd with vpermilpd in the Haswell DTRMM kernel

5 years agoMerge pull request #2199 from martin-frbg/zen-dtrsm
Martin Kroeker [Fri, 9 Aug 2019 05:55:02 +0000 (07:55 +0200)]
Merge pull request #2199 from martin-frbg/zen-dtrsm

Replace most vpermpd calls in the Haswell DTRSM_RN kernel

5 years agoAdd files via upload
Martin Kroeker [Thu, 8 Aug 2019 22:08:11 +0000 (00:08 +0200)]
Add files via upload

5 years agoProvide more information on mmap/munmap failure
Martin Kroeker [Thu, 8 Aug 2019 21:15:35 +0000 (23:15 +0200)]
Provide more information on mmap/munmap failure

for #2207

5 years agoReplace most vpermpd calls in the Haswell DTRSM_RN kernel
Martin Kroeker [Sat, 3 Aug 2019 10:40:13 +0000 (12:40 +0200)]
Replace most vpermpd calls in the Haswell DTRSM_RN kernel

5 years agoMerge pull request #2198 from martin-frbg/icelake
Martin Kroeker [Fri, 2 Aug 2019 06:36:14 +0000 (08:36 +0200)]
Merge pull request #2198 from martin-frbg/icelake

Update CPUID recognition for Intel Ice Lake

5 years agoAdd CPUID identification of Intel Ice Lake
Martin Kroeker [Thu, 1 Aug 2019 20:52:35 +0000 (22:52 +0200)]
Add CPUID identification of Intel Ice Lake

5 years agoAutodetect Intel Ice Lake (as SKYLAKEX target)
Martin Kroeker [Thu, 1 Aug 2019 20:51:09 +0000 (22:51 +0200)]
Autodetect Intel Ice Lake (as SKYLAKEX target)

5 years agoReplace vpermpd with vpermilpd in the Haswell DTRMM kernel
Martin Kroeker [Sun, 28 Jul 2019 21:17:28 +0000 (23:17 +0200)]
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel

to improve performance on AMD Zen (#2180), applying wjc404's improvement of the DGEMM kernel from #2186

5 years agoMerge pull request #2196 from wjc404/develop
Martin Kroeker [Sun, 28 Jul 2019 21:11:40 +0000 (23:11 +0200)]
Merge pull request #2196 from wjc404/develop

Add vbroadcastsd kernel to dgemm_kernel_4x8_haswell.S