platform/upstream/openblas.git
Martin Kroeker [Thu, 1 Oct 2020 04:38:22 +0000 (06:38 +0200)]
Merge pull request #2873 from martin-frbg/issue2871

Check for __linux rather than linux in cpuid code and benchmarks

Martin Kroeker [Thu, 1 Oct 2020 04:38:06 +0000 (06:38 +0200)]
Merge pull request #2865 from thisch/backticks

Consolidate usage of backticks for build options

Martin Kroeker [Wed, 30 Sep 2020 20:59:41 +0000 (22:59 +0200)]
Change ifdef linux to __linux for C11 compatibility

Martin Kroeker [Wed, 30 Sep 2020 20:50:21 +0000 (22:50 +0200)]
Change ifdef linux to __linux for C11 compatibility

and add a fallback for unsupported operating systems in detect()
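
Why the spelling matters: in strict ISO modes such as `-std=c11`, GCC and clang predefine only the reserved macros `__linux` and `__linux__`, not the bare `linux` (a GNU-mode extension in the user namespace). As a result, `#ifdef linux` silently takes the non-Linux branch. Below is a minimal sketch of the pattern, with a fallback for other systems. The function `detect_os_name` and its return strings are hypothetical; this is not OpenBLAS's `detect()`.

```c
/* Returns a short name for the OS this translation unit was compiled for.
   Tests the reserved macros __linux / __linux__, which stay defined under
   -std=c11; the bare `linux` macro is only predefined in GNU modes. */
const char *detect_os_name(void)
{
#if defined(__linux) || defined(__linux__)
    return "linux";
#elif defined(__APPLE__)
    return "darwin";
#elif defined(_WIN32)
    return "windows";
#else
    /* Fallback for operating systems this sketch does not recognize. */
    return "unknown";
#endif
}
```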

Martin Kroeker [Wed, 30 Sep 2020 20:47:25 +0000 (22:47 +0200)]
Change ifdef linux to __linux for C11 compatibility

Martin Kroeker [Wed, 30 Sep 2020 20:46:25 +0000 (22:46 +0200)]
Change ifdef linux to __linux for C11 compatibility

Martin Kroeker [Wed, 30 Sep 2020 20:45:18 +0000 (22:45 +0200)]
Change ifdef linux to __linux for C11 compatibility

Martin Kroeker [Wed, 30 Sep 2020 20:43:25 +0000 (22:43 +0200)]
Change ifdef linux to __linux for C11 compatibility

Martin Kroeker [Mon, 28 Sep 2020 20:48:53 +0000 (22:48 +0200)]
Merge pull request #91 from xianyi/develop

rebase

Martin Kroeker [Mon, 28 Sep 2020 05:22:54 +0000 (07:22 +0200)]
Merge pull request #2866 from RajalakshmiSR/p10_dcopy

Optimize dcopy/zcopy for POWER10

Rajalakshmi Srinivasaraghavan [Mon, 28 Sep 2020 02:42:32 +0000 (21:42 -0500)]
Optimize dcopy/zcopy for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in a simulator with no new failures.

Thomas Hisch [Sun, 27 Sep 2020 22:42:17 +0000 (00:42 +0200)]
Consolidate usage of backticks for build options

There were some build options in the README that were not
highlighted. Now all are highlighted.

Martin Kroeker [Sun, 27 Sep 2020 21:19:59 +0000 (23:19 +0200)]
Merge pull request #2853 from Qiyu8/usimd-daxpy

Optimize the performance of daxpy by using universal intrinsics
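
For orientation, the operation that #2853 speeds up is plain BLAS daxpy, y := alpha*x + y. The scalar loop below sketches its semantics only, following the reference BLAS handling of strides and early exits. It is not the universal-intrinsics kernel from the PR, and `daxpy_ref` is a hypothetical name.

```c
/* Reference semantics of BLAS daxpy: y := alpha * x + y over n elements,
   with strides incx/incy. Negative strides start from the far end, as in
   the reference BLAS. */
void daxpy_ref(int n, double alpha, const double *x, int incx,
               double *y, int incy)
{
    if (n <= 0 || alpha == 0.0)
        return;                          /* nothing to do */
    int ix = incx < 0 ? (1 - n) * incx : 0;
    int iy = incy < 0 ? (1 - n) * incy : 0;
    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

An optimized kernel must reproduce exactly this element mapping; vectorization only changes how many elements are processed per step.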

Martin Kroeker [Sun, 27 Sep 2020 21:11:17 +0000 (23:11 +0200)]
Merge pull request #2864 from martin-frbg/lapack445

Fix underflow/rounding errors in LAPACK (S,D)LANV2

Martin Kroeker [Sun, 27 Sep 2020 20:59:20 +0000 (22:59 +0200)]
Fix underflow/rounding errors in LAPACK (S,D)LANV2

Reference-LAPACK PR 445, fixing their issue 263

Martin Kroeker [Sun, 27 Sep 2020 20:50:25 +0000 (22:50 +0200)]
Merge pull request #2863 from martin-frbg/readmefixes

Readmefixes

Martin Kroeker [Sun, 27 Sep 2020 20:48:41 +0000 (22:48 +0200)]
Update cpu list, outline cmake build, clarify scope of set_num_threads extension

Martin Kroeker [Sun, 27 Sep 2020 20:35:45 +0000 (22:35 +0200)]
Merge pull request #90 from xianyi/develop

rebase

Martin Kroeker [Sun, 27 Sep 2020 10:12:35 +0000 (12:12 +0200)]
Merge pull request #2850 from xiaojiayuan111/develop

fix a bug of trmm

Qiyu8 [Sun, 27 Sep 2020 01:35:50 +0000 (09:35 +0800)]
remove default support for FMA4 on Zen architecture

Martin Kroeker [Wed, 23 Sep 2020 19:59:18 +0000 (21:59 +0200)]
Merge pull request #2854 from martin-frbg/travis-graviton

Add an AWS-Graviton2 build to Travis CI

Martin Kroeker [Wed, 23 Sep 2020 17:02:20 +0000 (19:02 +0200)]
Add AWS Graviton2 build

Martin Kroeker [Tue, 22 Sep 2020 21:15:33 +0000 (23:15 +0200)]
Merge pull request #88 from xianyi/develop

rebase

Martin Kroeker [Tue, 22 Sep 2020 19:44:55 +0000 (21:44 +0200)]
Merge pull request #2851 from martin-frbg/travis-xcode12

Add an OSX build with xcode12

Martin Kroeker [Tue, 22 Sep 2020 15:26:19 +0000 (17:26 +0200)]
Add an OSX build with xcode12

Qiyu8 [Tue, 22 Sep 2020 08:52:15 +0000 (16:52 +0800)]
performance improved

y00512012 [Tue, 22 Sep 2020 08:47:10 +0000 (16:47 +0800)]
fix a bug of trmm

Qiyu8 [Tue, 22 Sep 2020 02:38:35 +0000 (10:38 +0800)]
Optimize the performance of daxpy by using universal intrinsics

Martin Kroeker [Mon, 21 Sep 2020 20:22:43 +0000 (22:22 +0200)]
Merge pull request #2847 from mhillenibm/fixup_cscal

s390x: fix cscal and zscal implementations

Marius Hillenbrand [Mon, 14 Sep 2020 16:36:31 +0000 (18:36 +0200)]
s390x: fix cscal and zscal implementations

The implementation of complex scalar * vector multiplication for Z14
makes some LAPACK tests fail because the numerical differences to the
reference implementation exceed the threshold (as can be seen by running
make lapack-test and replacing kernel/zarch/cscal.c with a generic
implementation for comparison).

The complex multiplication uses terms of the form a * b + c * d for both
real and imaginary parts. The assembly code (and compiler-emitted code
as well) uses fused multiply add operations for the second product and
sum. The results can be "surprising", for example when both terms in the
imaginary part nearly cancel each other out. In that case, the second
product contributes more digits to the sum than the first product that
has been rounded before.

One option is to use separate multiplications (which then round the same
way) and a distinct add. Change the code to pursue that path, by (1)
requesting the compiler not to contract the operations into FMAs and (2)
replacing the assembly kernel with corresponding vectorized C code
(where change 1 also applies).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

Marius Hillenbrand [Wed, 16 Sep 2020 13:55:38 +0000 (15:55 +0200)]
s390x: for clang use fp-contract=on instead of fast

Make clang slightly more cautious when contracting floating-point
operations (e.g., when applying fused multiply add) by setting
-ffp-contract=on (instead of fast).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

Marius Hillenbrand [Tue, 15 Sep 2020 08:54:37 +0000 (10:54 +0200)]
s390x: move common vector definitions and utils into header

... to facilitate reuse beyond gemm_vec.c and avoid code duplication.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

Martin Kroeker [Fri, 18 Sep 2020 21:18:41 +0000 (23:18 +0200)]
Merge pull request #2845 from martin-frbg/lapack443

Fix workspace query in LAPACK xGELQ (Reference-LAPACK 443)

Martin Kroeker [Fri, 18 Sep 2020 07:19:46 +0000 (09:19 +0200)]
Fix workspace query in xGELQ (Reference-LAPACK PR443)
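
The fix concerns LAPACK's workspace-query convention. Calling a routine with LWORK = -1 performs no computation and only returns the required workspace size in WORK(1). Below is a C sketch of that calling pattern. The routine `toy_routine` and its size formula are hypothetical stand-ins for xGELQ.

```c
#include <stdlib.h>

/* Hypothetical LAPACK-style routine. If lwork == -1 it only reports the
   workspace it needs in work[0]; otherwise it validates lwork and works. */
void toy_routine(int n, double *work, int lwork, int *info)
{
    int needed = n > 1 ? 4 * n : 1;  /* illustrative size formula */
    *info = 0;
    if (lwork == -1) {               /* workspace query: no computation */
        work[0] = (double)needed;
        return;
    }
    if (lwork < needed) {
        *info = -3;                  /* third argument (lwork) is invalid */
        return;
    }
    for (int i = 0; i < needed; i++) /* stand-in for the real computation */
        work[i] = 0.0;
}

/* Caller side: query first, allocate, then call for real. */
int run_with_query(int n)
{
    double query;
    int info;
    toy_routine(n, &query, -1, &info);
    int lwork = (int)query;
    double *work = malloc((size_t)lwork * sizeof *work);
    if (work == NULL)
        return -1;
    toy_routine(n, work, lwork, &info);
    free(work);
    return info;
}
```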

Martin Kroeker [Fri, 18 Sep 2020 07:17:49 +0000 (09:17 +0200)]
Merge pull request #86 from xianyi/develop

rebase

Martin Kroeker [Thu, 17 Sep 2020 21:46:32 +0000 (23:46 +0200)]
Merge pull request #2844 from RajalakshmiSR/daxpy_p10

Optimize daxpy/zaxpy for POWER10

Rajalakshmi Srinivasaraghavan [Thu, 17 Sep 2020 17:56:28 +0000 (12:56 -0500)]
Optimize daxpy/zaxpy for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in a simulator with no new failures.

Martin Kroeker [Thu, 17 Sep 2020 15:29:56 +0000 (17:29 +0200)]
Merge pull request #2841 from martin-frbg/cpp_gemvtest

Make thread safety tests available to CMAKE and support running only the GEMV version

Martin Kroeker [Thu, 17 Sep 2020 15:28:43 +0000 (17:28 +0200)]
Merge pull request #2843 from mhillenibm/fixup_merge_dynamic_zarch

s390x/DYNAMIC_ARCH: fixup broken merge and reapply simplification

Marius Hillenbrand [Thu, 17 Sep 2020 14:45:07 +0000 (16:45 +0200)]
s390x/DYNAMIC_ARCH: fixup broken merge and reapply simplification

An unrelated commit and merge inadvertently reverted our recent two
changes for simplifying DYNAMIC_ARCH on s390x. Simply reapply the
changes.

Simplify detection of which kernels we can compile on s390x. Instead of
decoding the gcc version in a complicated manner, just check if CC
supports a given -march=archXY flag. Together with the next patch, we
thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1.

To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.

We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.

Closes: #2842
Fixes: ba644378dce7 ("Copy BUILD_ options available to the compiler flags")

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

Martin Kroeker [Thu, 17 Sep 2020 11:49:24 +0000 (13:49 +0200)]
Add option for running only the less demanding GEMV version of the thread safety tests

Martin Kroeker [Thu, 17 Sep 2020 11:46:41 +0000 (13:46 +0200)]
Support running just the GEMV version of the thread safety test

Martin Kroeker [Thu, 17 Sep 2020 11:45:40 +0000 (13:45 +0200)]
Add cpp_thread_test options

Martin Kroeker [Thu, 17 Sep 2020 11:43:55 +0000 (13:43 +0200)]
Add CMakeLists.txt

Martin Kroeker [Thu, 17 Sep 2020 11:42:47 +0000 (13:42 +0200)]
Merge pull request #85 from xianyi/develop

rebase

Martin Kroeker [Wed, 16 Sep 2020 16:55:50 +0000 (18:55 +0200)]
Merge pull request #2840 from martin-frbg/fixup2833

Fix for cmake BUILD_ settings PR 2833

Martin Kroeker [Tue, 15 Sep 2020 21:15:34 +0000 (23:15 +0200)]
Activate all BUILD_ options if none was specified

Martin Kroeker [Tue, 15 Sep 2020 21:13:30 +0000 (23:13 +0200)]
Merge pull request #84 from xianyi/develop

rebase

Martin Kroeker [Tue, 15 Sep 2020 19:17:48 +0000 (21:17 +0200)]
Merge pull request #2838 from austinpagan/gordon_trmm

Adding performance patch for trmm, just like trsm (#2836)

fossum [Tue, 15 Sep 2020 13:59:50 +0000 (08:59 -0500)]
Adding performance patch for trmm, just like #2836

Martin Kroeker [Tue, 15 Sep 2020 09:26:37 +0000 (11:26 +0200)]
Merge pull request #2836 from austinpagan/gordon_trsm

Fixing a performance bug in trsm_[LR].c.

fossum [Mon, 14 Sep 2020 18:10:48 +0000 (13:10 -0500)]
Fixing a performance bug in trsm_[LR].c.

Martin Kroeker [Mon, 14 Sep 2020 13:00:19 +0000 (15:00 +0200)]
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis

Add bfloat16 based dot and conversion with single/double

Martin Kroeker [Mon, 14 Sep 2020 05:24:23 +0000 (07:24 +0200)]
Merge pull request #2833 from martin-frbg/issue2830

Make building the tests for individual data types conditional on the respective BUILD option

Martin Kroeker [Sun, 13 Sep 2020 22:03:33 +0000 (00:03 +0200)]
Copy BUILD_ options available to the compiler flags

Martin Kroeker [Sun, 13 Sep 2020 21:55:11 +0000 (23:55 +0200)]
Add BUILD_SINGLE etc

Martin Kroeker [Sun, 13 Sep 2020 21:29:01 +0000 (23:29 +0200)]
Rearrange ifdefs

Martin Kroeker [Sun, 13 Sep 2020 20:20:41 +0000 (22:20 +0200)]
Remove spurious tests for complex ASUM and NRM2

Martin Kroeker [Sun, 13 Sep 2020 20:17:46 +0000 (22:17 +0200)]
Make tests conditional on BUILD_DOUBLE

Martin Kroeker [Sun, 13 Sep 2020 19:52:18 +0000 (21:52 +0200)]
Make tests for individual variable types conditional on the respective BUILD_ option

Martin Kroeker [Sun, 13 Sep 2020 19:50:12 +0000 (21:50 +0200)]
Make building individual tests depend on BUILD_SINGLE etc defines

Martin Kroeker [Sun, 13 Sep 2020 19:49:01 +0000 (21:49 +0200)]
Remove spurious complex16 tests

Martin Kroeker [Sun, 13 Sep 2020 19:47:55 +0000 (21:47 +0200)]
Copy BUILD_* directives to the compiler options to allow ifdef in tests
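
Copying the `BUILD_*` settings into the compiler flags lets the test sources select test groups with the preprocessor. The hypothetical C sketch below assumes the build passes any of `-DBUILD_SINGLE`, `-DBUILD_DOUBLE`, `-DBUILD_COMPLEX` and `-DBUILD_COMPLEX16`. Its fallback block mirrors the "activate all BUILD_ options if none was specified" behaviour listed above, but in preprocessor form rather than CMake.

```c
/* Default: if no BUILD_* option was given, enable every data type. */
#if !defined(BUILD_SINGLE) && !defined(BUILD_DOUBLE) && \
    !defined(BUILD_COMPLEX) && !defined(BUILD_COMPLEX16)
#define BUILD_SINGLE
#define BUILD_DOUBLE
#define BUILD_COMPLEX
#define BUILD_COMPLEX16
#endif

/* Hypothetical test driver: counts the test groups that get compiled in.
   Groups for disabled data types are dropped by the preprocessor, so their
   (possibly missing) library symbols are never referenced. */
int enabled_test_groups(void)
{
    int groups = 0;
#ifdef BUILD_SINGLE
    groups++;   /* single-precision real tests would go here */
#endif
#ifdef BUILD_DOUBLE
    groups++;   /* double-precision real tests */
#endif
#ifdef BUILD_COMPLEX
    groups++;   /* single-precision complex tests */
#endif
#ifdef BUILD_COMPLEX16
    groups++;   /* double-precision complex tests */
#endif
    return groups;
}
```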

Martin Kroeker [Sun, 13 Sep 2020 19:20:30 +0000 (21:20 +0200)]
Merge pull request #2832 from martin-frbg/issue2831

Fix gfortran detection by vendor matching

Martin Kroeker [Sun, 13 Sep 2020 16:40:59 +0000 (18:40 +0200)]
Fix vendor match for GCC gfortran

Martin Kroeker [Sun, 13 Sep 2020 16:30:11 +0000 (18:30 +0200)]
Merge pull request #83 from xianyi/develop

rebase

Martin Kroeker [Tue, 8 Sep 2020 21:36:41 +0000 (23:36 +0200)]
Merge pull request #2829 from mhillenibm/clang_s390x

Fix DYNAMIC_ARCH=1 with clang s390x

Marius Hillenbrand [Tue, 8 Sep 2020 17:30:37 +0000 (19:30 +0200)]
Add an s390 build with clang to the Travis configuration

Since clang builds have been fixed on s390x, including support for
DYNAMIC_ARCH, cover that build type in Travis.

Explicitly request Ubuntu 20.04 (codename focal) to get a recent
LLVM/clang version 10.x and thereby cover all s390x architecture
generations supported in OpenBLAS. Ubuntu 18.10's LLVM/clang 6.x cannot
build the inline assembly in some of the Z13 and Z14 kernels.

LLVM/clang currently does not support OpenMP on s390x, so disable that
in the build.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

Marius Hillenbrand [Tue, 8 Sep 2020 13:15:15 +0000 (15:15 +0200)]
Update CONTRIBUTORS.md - clang build fixes for IBM z

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

Marius Hillenbrand [Mon, 7 Sep 2020 15:13:03 +0000 (17:13 +0200)]
s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions

Enable building DYNAMIC_ARCH support with older versions of glibc that
do not know about the hwcap flag HWCAP_S390_VXE yet.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
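
Below is a minimal sketch of such a fallback definition. It assumes the value 8192 (bit 13), which glibc and the Linux kernel use for the s390 vector-enhancements facility. The helper `has_vector_enhancements` is hypothetical.

```c
#include <stdbool.h>

/* Older glibc headers lack HWCAP_S390_VXE; supply the bit value so the
   run-time check still compiles (assumed value, matching newer headers). */
#ifndef HWCAP_S390_VXE
#define HWCAP_S390_VXE 8192
#endif

/* On Linux the hwcap word would come from getauxval(AT_HWCAP) in
   <sys/auxv.h>; it is passed in here so the check is testable anywhere. */
bool has_vector_enhancements(unsigned long hwcap)
{
    return (hwcap & HWCAP_S390_VXE) != 0;
}
```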

Marius Hillenbrand [Mon, 7 Sep 2020 15:04:03 +0000 (17:04 +0200)]
s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code

... instead of duplicating the (old) mechanism from the Makefile that
aimed to derive supported architecture generations from the gcc
version.

To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.

We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
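
One way to picture the run-time side of this scheme: each `DYN_<ARCH NAME>` define from the Makefile gates whether a kernel family may be selected at all. When a newer generation was not built in, selection falls back to an older one. The define names `DYN_Z13`/`DYN_Z14`, the enum and `select_core` are assumptions for illustration, not the actual run-time code.

```c
#include <stdbool.h>

/* For this illustration pretend the Makefile passed -DDYN_Z13 but not
   -DDYN_Z14 (e.g. because CC rejected the -march flag for z14). */
#define DYN_Z13

typedef enum { CORE_GENERIC, CORE_Z13, CORE_Z14 } zarch_core;

/* Pick the newest kernel set that is both built in and supported by the
   CPU; fall back to older generations otherwise. */
zarch_core select_core(bool cpu_has_vx, bool cpu_has_vxe)
{
#ifdef DYN_Z14
    if (cpu_has_vxe)
        return CORE_Z14;
#endif
#ifdef DYN_Z13
    if (cpu_has_vx)
        return CORE_Z13;
#endif
    (void)cpu_has_vx;   /* avoid unused warnings when no DYN_ is set */
    (void)cpu_has_vxe;
    return CORE_GENERIC;
}
```

With only `DYN_Z13` built in, a z14-class CPU still gets the Z13 kernels rather than an unavailable Z14 path.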

Marius Hillenbrand [Fri, 4 Sep 2020 14:32:45 +0000 (16:32 +0200)]
s390x/DYNAMIC_ARCH: generalize detecting supported archs for clang

Simplify detection of which kernels we can compile on s390x. Instead of
decoding the gcc version in a complicated manner, just check if CC
supports a given -march=archXY flag. Together with the next patch, we
thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

Martin Kroeker [Tue, 8 Sep 2020 08:25:19 +0000 (10:25 +0200)]
Merge pull request #2828 from martin-frbg/lapack438

Correct xLASET arguments in LAPACK EIG tests

Martin Kroeker [Mon, 7 Sep 2020 20:03:46 +0000 (22:03 +0200)]
Correct dimension argument to xLASET

from Reference-LAPACK PR 438

Martin Kroeker [Mon, 7 Sep 2020 19:59:13 +0000 (21:59 +0200)]
Merge pull request #82 from xianyi/develop

rebase

Martin Kroeker [Sun, 6 Sep 2020 16:32:15 +0000 (18:32 +0200)]
Merge pull request #2803 from xiegengxin/AVX2-asum

Implementation of dasum, sasum with AVX2 & AVX512 intrinsics
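
When checking the AVX2/AVX512 kernels, it helps to recall that dasum returns the sum of absolute values of a strided vector. Below is a scalar sketch of the double-precision case, following the reference BLAS early exit for non-positive `n` or `incx`. `dasum_ref` is a hypothetical name; this is not the intrinsic kernel.

```c
/* Reference semantics of BLAS dasum: sum of |x[i*incx]| for i < n.
   Non-positive n or incx yields 0, as in the reference BLAS. */
double dasum_ref(int n, const double *x, int incx)
{
    double sum = 0.0;
    if (n <= 0 || incx <= 0)
        return 0.0;
    for (int i = 0; i < n; i++) {
        double v = x[i * incx];
        sum += v < 0.0 ? -v : v;   /* absolute value without libm */
    }
    return sum;
}
```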

Martin Kroeker [Sun, 6 Sep 2020 08:05:47 +0000 (10:05 +0200)]
Merge pull request #2824 from martin-frbg/asumbench

Use POSIX.1-2001 clock_gettime in asum benchmark if available

Martin Kroeker [Sat, 5 Sep 2020 17:44:01 +0000 (19:44 +0200)]
Use POSIX.1-2001 clock_gettime for higher resolution
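
Below is a timing helper in the spirit of that change. It uses POSIX.1-2001 `clock_gettime` when the system advertises `_POSIX_TIMERS`, and otherwise falls back to microsecond-resolution `gettimeofday`. This is a sketch only: the helper name `wall_seconds` is hypothetical, and this is not the benchmark's actual code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request POSIX.1-2001 declarations */
#endif
#include <time.h>
#include <unistd.h>
#include <sys/time.h>

/* Seconds as a double, with nanosecond resolution where clock_gettime is
   available, falling back to gettimeofday otherwise. */
double wall_seconds(void)
{
#if defined(_POSIX_TIMERS) && _POSIX_TIMERS > 0
    struct timespec ts;
#ifdef CLOCK_MONOTONIC
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* immune to clock adjustments */
#else
    clock_gettime(CLOCK_REALTIME, &ts);
#endif
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#else
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
#endif
}
```

A benchmark takes `wall_seconds()` before and after the timed loop and divides the difference by the number of repetitions.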

Martin Kroeker [Sat, 5 Sep 2020 17:17:59 +0000 (19:17 +0200)]
Merge pull request #2816 from martin-frbg/silicon

Add basic support for Apple Vortex (ARM64) cpu

Martin Kroeker [Sat, 5 Sep 2020 15:29:38 +0000 (17:29 +0200)]
Merge pull request #2823 from martin-frbg/fix2778

Improve fix for lapack-test EIG/cchkhb2stg from PR 2778

Martin Kroeker [Sat, 5 Sep 2020 11:06:31 +0000 (13:06 +0200)]
Correct argument to SLASET (Improves fix from PR2778)

As explained by serguei-patchkovskii in a comment on Reference-LAPACK/lapack#438, passing in an index of 1 instead of N leads to a standards violation when accessing matrix A in SLASET, i.e. undefined behavior.
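
In column-major storage, element (i, j) of A sits at offset i + j*LDA. This is why LAPACK requires LDA >= max(1, M). With LDA = 1 and M > 1, the row index exceeds the declared first dimension and distinct elements alias the same storage. Below is a hypothetical C analogue of SLASET('Full', ...) illustrating the indexing, assuming the "index" mentioned above is the leading-dimension argument. Unlike the reference routine, this sketch rejects an invalid LDA instead of proceeding.

```c
#include <stddef.h>

/* Hypothetical analogue of xLASET('Full', m, n, alpha, beta, A, lda) for a
   column-major matrix: off-diagonal entries get alpha, the diagonal beta.
   Element (i, j) lives at a[i + j*lda], so lda must be >= max(1, m). */
int laset_full(int m, int n, double alpha, double beta, double *a, int lda)
{
    if (m < 0 || n < 0 || lda < (m > 1 ? m : 1))
        return -1;   /* invalid leading dimension: refuse to write */
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            a[(size_t)i + (size_t)j * (size_t)lda] = (i == j) ? beta : alpha;
    return 0;
}
```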

Martin Kroeker [Sat, 5 Sep 2020 10:47:03 +0000 (12:47 +0200)]
Merge pull request #81 from xianyi/develop

rebase

Martin Kroeker [Sat, 5 Sep 2020 10:39:32 +0000 (12:39 +0200)]
Merge pull request #2822 from martin-frbg/issue2821

Fix potential domain error in sqrt

Martin Kroeker [Sat, 5 Sep 2020 07:44:33 +0000 (09:44 +0200)]
Fix potential domain error in sqrt

Martin Kroeker [Fri, 4 Sep 2020 21:50:43 +0000 (23:50 +0200)]
Merge pull request #2819 from h-vetinari/carry_lapack_437

Carry lapack#437

Martin Kroeker [Fri, 4 Sep 2020 21:09:31 +0000 (23:09 +0200)]
Merge pull request #2820 from RajalakshmiSR/clang

POWER9: Fix mcpu option with clang

Rajalakshmi Srinivasaraghavan [Fri, 4 Sep 2020 15:36:19 +0000 (10:36 -0500)]
POWER9: Fix mcpu option with clang

Adding check for compiler type before checking GCC version in Makefile.
This allows clang to use power9 instead of power8 when CORE is POWER9.

H. Vetinari [Wed, 2 Sep 2020 20:46:47 +0000 (22:46 +0200)]
adapt ?ggsv?-functions to ambient code style in LAPACKE/include/lapack.h

H. Vetinari [Wed, 2 Sep 2020 20:41:50 +0000 (22:41 +0200)]
Follow-up to lapack#434 & lapack#409: add missing 'const' in signatures

Based on how the surrounding functions in lapack.h handle these
parameters, particularly the ?ggsv?3 variants of the affected functions.

H. Vetinari [Wed, 2 Sep 2020 20:38:56 +0000 (22:38 +0200)]
Follow-up to lapack#434 & lapack#409: fix signature mismatches

Martin Kroeker [Fri, 4 Sep 2020 08:06:02 +0000 (10:06 +0200)]
Merge pull request #2778 from martin-frbg/lapackeig

Fix various wrong calls to SLASET/DLASET in the EIG part of the LAPACK testsuite

Chen, Guobing [Wed, 26 Aug 2020 22:42:28 +0000 (06:42 +0800)]
Add bfloat16 based dot and conversion with single/double

1. Added a bfloat16-based dot product as a new API: shdot
2. Implemented a generic kernel and a Cooper Lake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs between the bfloat16 data type and single/double: shstobf16, shdtobf16, sbf16tos, dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and Cooper Lake-specific kernels for shstobf16 and shdtobf16
5. Updated the level-1 threading helper functions and macros to support multithreading for these new APIs
6. Fixed an issue with detecting or specifying the Cooper Lake platform in dynamic-arch builds
7. Changed the typedef of bfloat16 from unsigned short to the stricter uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
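
The sketch below illustrates the ideas behind these APIs with hypothetical helpers. These are portable scalar functions, not the generic or Cooper Lake kernels, and not OpenBLAS's shdot/shstobf16/sbf16tos signatures. They assume IEEE-754 binary32 `float`, and they use round-to-nearest-even when dropping the low 16 bits (an assumption; the library's kernels may round differently). bfloat16 keeps the float's sign, 8-bit exponent and top 7 mantissa bits, so widening back to float is exact.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bfloat16;   /* raw bit pattern, cf. item 7 above */

/* float -> bfloat16 with round-to-nearest-even on the 16 dropped bits. */
bfloat16 f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u)      /* NaN: keep it a quiet NaN */
        return (bfloat16)((bits >> 16) | 0x0040u);
    uint32_t lsb = (bits >> 16) & 1u;            /* ties go to even */
    bits += 0x7fffu + lsb;
    return (bfloat16)(bits >> 16);
}

/* bfloat16 -> float is exact: the 16 bits become the high half. */
float bf16_to_f32(bfloat16 h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Dot product of two bfloat16 vectors, accumulated in single precision. */
float bf16_dot(int n, const bfloat16 *x, const bfloat16 *y)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += bf16_to_f32(x[i]) * bf16_to_f32(y[i]);
    return acc;
}
```

Accumulating in `float` rather than in bfloat16 avoids losing most of the sum's precision, since bfloat16 carries only 8 significant bits.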

Martin Kroeker [Thu, 3 Sep 2020 15:10:23 +0000 (17:10 +0200)]
Merge pull request #2817 from martin-frbg/lapack436

LAPACKE: fix declaration of work arrays in [cz]gesvdq

Martin Kroeker [Thu, 3 Sep 2020 06:44:20 +0000 (08:44 +0200)]
Rename KERNEL.SILICON to KERNEL.VORTEX

Martin Kroeker [Thu, 3 Sep 2020 06:43:26 +0000 (08:43 +0200)]
Rename SILICON to VORTEX and fix duplicate numbering

Martin Kroeker [Thu, 3 Sep 2020 06:38:53 +0000 (08:38 +0200)]
Rename SILICON to VORTEX

Martin Kroeker [Thu, 3 Sep 2020 06:38:08 +0000 (08:38 +0200)]
rename SILICON to VORTEX

Gengxin Xie [Tue, 1 Sep 2020 07:41:48 +0000 (15:41 +0800)]
align to 64, using SSE when input size is small

Martin Kroeker [Wed, 2 Sep 2020 21:44:44 +0000 (23:44 +0200)]
Fix data type of work array in zgesvdq prototype

Martin Kroeker [Wed, 2 Sep 2020 21:41:51 +0000 (23:41 +0200)]
Fix data type of rwork array