Martin Kroeker [Sun, 13 Sep 2020 19:47:55 +0000 (21:47 +0200)]
Copy BUILD_* directives to the compiler options to allow ifdef in tests
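A minimal sketch of what this enables in a test source, assuming the directives reach the compiler as `-DBUILD_SINGLE=1` and `-DBUILD_DOUBLE=1`. The helper name `enabled_test_sections` is hypothetical and only illustrates the guard pattern.

```c
#include <stdio.h>

/* Hypothetical test helper: reports which precision-specific sections
   were compiled in. BUILD_SINGLE and BUILD_DOUBLE are assumed to be
   passed on the compiler command line by the build system. */
int enabled_test_sections(void)
{
    int n = 0;
#ifdef BUILD_SINGLE
    puts("single-precision tests enabled");
    n++;
#endif
#ifdef BUILD_DOUBLE
    puts("double-precision tests enabled");
    n++;
#endif
    return n;
}
```

A test can then skip whole sections for precisions that were not built instead of failing at link time.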
Martin Kroeker [Sun, 13 Sep 2020 16:30:11 +0000 (18:30 +0200)]
Merge pull request #83 from xianyi/develop
rebase
Martin Kroeker [Tue, 8 Sep 2020 21:36:41 +0000 (23:36 +0200)]
Merge pull request #2829 from mhillenibm/clang_s390x
Fix DYNAMIC_ARCH=1 with clang s390x
Marius Hillenbrand [Tue, 8 Sep 2020 17:30:37 +0000 (19:30 +0200)]
Add an s390 build with clang to the Travis configuration
Since clang builds have been fixed on s390x, including support for
DYNAMIC_ARCH, cover that build type in Travis.
Explicitly request Ubuntu 20.04 (codename focal) to get a recent
LLVM/clang version 10.x and thereby cover all s390x architecture
generations supported in OpenBLAS. Ubuntu 18.10's LLVM/clang 6.x cannot
build the inline assembly in some of the Z13 and Z14 kernels.
LLVM/clang currently does not support OpenMP on s390x, so disable that
in the build.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Marius Hillenbrand [Tue, 8 Sep 2020 13:15:15 +0000 (15:15 +0200)]
Update CONTRIBUTORS.md - clang build fixes for IBM z
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Marius Hillenbrand [Mon, 7 Sep 2020 15:13:03 +0000 (17:13 +0200)]
s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions
Enable building DYNAMIC_ARCH support with older versions of glibc that
do not know about the hwcap flag HWCAP_S390_VXE yet.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
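The fallback described above can be sketched as follows. This is an illustration, not the code from the commit: the value 8192 is an assumption based on current glibc headers and should be checked against your kernel headers, and `hwcap_has_vxe` is a hypothetical helper that takes the hwcap word as a parameter (on Linux it would typically come from `getauxval(AT_HWCAP)`).

```c
/* Fallback for C libraries whose headers predate this flag.
   Assumption: 8192 matches the value in newer glibc headers. */
#ifndef HWCAP_S390_VXE
#define HWCAP_S390_VXE 8192
#endif

/* Hypothetical helper: returns 1 if the given hwcap word advertises
   the vector-enhancements facility, 0 otherwise. */
int hwcap_has_vxe(unsigned long hwcap)
{
    return (hwcap & HWCAP_S390_VXE) != 0;
}
```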
Marius Hillenbrand [Mon, 7 Sep 2020 15:04:03 +0000 (17:04 +0200)]
s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code
... instead of duplicating the (old) mechanism from the Makefile that
aimed to derive supported architecture generations from the gcc
version.
To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.
We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
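As a rough illustration of the DYN_<ARCH NAME> notation, run-time code can guard each architecture on a define passed from the Makefile. The names `DYN_Z13` and `DYN_Z14`, the always-present "generic" entry, and the function `built_in_archs` are assumptions made for this sketch.

```c
#include <stddef.h>

/* Fills `out` with the names of the architecture generations compiled
   into the library and returns how many were written. Each generation
   is present only if the build passed the matching -DDYN_<ARCH NAME>. */
size_t built_in_archs(const char **out, size_t cap)
{
    size_t n = 0;
    if (n < cap) out[n++] = "generic";   /* assumed always-built fallback */
#ifdef DYN_Z13
    if (n < cap) out[n++] = "z13";
#endif
#ifdef DYN_Z14
    if (n < cap) out[n++] = "z14";
#endif
    return n;
}
```

Because the Makefile decides which defines to pass, the build system and the run-time dispatcher cannot disagree about what was built.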
Marius Hillenbrand [Fri, 4 Sep 2020 14:32:45 +0000 (16:32 +0200)]
s390x/DYNAMIC_ARCH: generalize detecting supported archs for clang
Simplify detection of which kernels we can compile on s390x. Instead of
decoding the gcc version in a complicated manner, just check if CC
supports a given -march=archXY flag. Together with the next patch, we
thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Martin Kroeker [Tue, 8 Sep 2020 08:25:19 +0000 (10:25 +0200)]
Merge pull request #2828 from martin-frbg/lapack438
Correct xLASET arguments in LAPACK EIG tests
Martin Kroeker [Mon, 7 Sep 2020 20:03:46 +0000 (22:03 +0200)]
Correct dimension argument to xLASET
from Reference-LAPACK PR 438
Martin Kroeker [Mon, 7 Sep 2020 19:59:13 +0000 (21:59 +0200)]
Merge pull request #82 from xianyi/develop
rebase
Martin Kroeker [Sun, 6 Sep 2020 16:32:15 +0000 (18:32 +0200)]
Merge pull request #2803 from xiegengxin/AVX2-asum
Implementation of dasum, sasum with AVX2 & AVX512 intrinsics
Martin Kroeker [Sun, 6 Sep 2020 08:05:47 +0000 (10:05 +0200)]
Merge pull request #2824 from martin-frbg/asumbench
Use POSIX.1-2001 clock_gettime in asum benchmark if available
Martin Kroeker [Sat, 5 Sep 2020 17:44:01 +0000 (19:44 +0200)]
Use POSIX.1-2001 clock_gettime for higher resolution
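A minimal sketch of nanosecond-resolution timing with `clock_gettime`. The choice of `CLOCK_MONOTONIC` and the helper names are assumptions for illustration; the commit does not state which clock the benchmark uses.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <time.h>

/* Elapsed seconds between two timespec samples. */
double elapsed_seconds(const struct timespec *start,
                       const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) * 1e-9;
}

/* Times a single call of fn(arg) and returns the duration in seconds.
   CLOCK_MONOTONIC is an assumption; any supported clock id works. */
double time_call(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_seconds(&t0, &t1);
}
```

On older C libraries, linking may additionally require `-lrt`.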
Martin Kroeker [Sat, 5 Sep 2020 17:17:59 +0000 (19:17 +0200)]
Merge pull request #2816 from martin-frbg/silicon
Add basic support for Apple Vortex (ARM64) cpu
Martin Kroeker [Sat, 5 Sep 2020 15:29:38 +0000 (17:29 +0200)]
Merge pull request #2823 from martin-frbg/fix2778
Improve fix for lapack-test EIG/cchkhb2stg from PR 2778
Martin Kroeker [Sat, 5 Sep 2020 11:06:31 +0000 (13:06 +0200)]
Correct argument to SLASET (Improves fix from PR2778)
As explained by serguei-patchkovskii in Reference-LAPACK/lapack#438 (comment), passing in an index of 1 instead of N makes SLASET access matrix A in violation of the standard, i.e. undefined behavior.
Martin Kroeker [Sat, 5 Sep 2020 10:47:03 +0000 (12:47 +0200)]
Merge pull request #81 from xianyi/develop
rebase
Martin Kroeker [Sat, 5 Sep 2020 10:39:32 +0000 (12:39 +0200)]
Merge pull request #2822 from martin-frbg/issue2821
Fix potential domain error in sqrt
Martin Kroeker [Sat, 5 Sep 2020 07:44:33 +0000 (09:44 +0200)]
Fix potential domain error in sqrt
Martin Kroeker [Fri, 4 Sep 2020 21:50:43 +0000 (23:50 +0200)]
Merge pull request #2819 from h-vetinari/carry_lapack_437
Carry lapack#437
Martin Kroeker [Fri, 4 Sep 2020 21:09:31 +0000 (23:09 +0200)]
Merge pull request #2820 from RajalakshmiSR/clang
POWER9: Fix mcpu option with clang
Rajalakshmi Srinivasaraghavan [Fri, 4 Sep 2020 15:36:19 +0000 (10:36 -0500)]
POWER9: Fix mcpu option with clang
Adding check for compiler type before checking GCC version in Makefile.
This allows clang to use power9 instead of power8 when CORE is POWER9.
H. Vetinari [Wed, 2 Sep 2020 20:46:47 +0000 (22:46 +0200)]
adapt ?ggsv?-functions to ambient code style in LAPACKE/include/lapack.h
H. Vetinari [Wed, 2 Sep 2020 20:41:50 +0000 (22:41 +0200)]
Follow-up to lapack#434 & lapack#409: add missing 'const' in signatures
Based on how the surrounding functions in lapack.h are handling the
parameters, particularly the ?ggsv?3-variants of the affected functions
H. Vetinari [Wed, 2 Sep 2020 20:38:56 +0000 (22:38 +0200)]
Follow-up to lapack#434 & lapack#409: fix signature mismatches
Martin Kroeker [Fri, 4 Sep 2020 08:06:02 +0000 (10:06 +0200)]
Merge pull request #2778 from martin-frbg/lapackeig
Fix various wrong calls to SLASET/DLASET in the EIG part of the LAPACK testsuite
Martin Kroeker [Thu, 3 Sep 2020 15:10:23 +0000 (17:10 +0200)]
Merge pull request #2817 from martin-frbg/lapack436
LAPACKE: fix declaration of work arrays in [cz]gesvdq
Martin Kroeker [Thu, 3 Sep 2020 06:44:20 +0000 (08:44 +0200)]
Rename KERNEL.SILICON to KERNEL.VORTEX
Martin Kroeker [Thu, 3 Sep 2020 06:43:26 +0000 (08:43 +0200)]
Rename SILICON to VORTEX and fix duplicate numbering
Martin Kroeker [Thu, 3 Sep 2020 06:38:53 +0000 (08:38 +0200)]
Rename SILICON to VORTEX
Martin Kroeker [Thu, 3 Sep 2020 06:38:08 +0000 (08:38 +0200)]
rename SILICON to VORTEX
Gengxin Xie [Tue, 1 Sep 2020 07:41:48 +0000 (15:41 +0800)]
align to 64, using SSE when input size is small
Martin Kroeker [Wed, 2 Sep 2020 21:44:44 +0000 (23:44 +0200)]
Fix data type of work array in zgesvdq prototype
Martin Kroeker [Wed, 2 Sep 2020 21:41:51 +0000 (23:41 +0200)]
Fix data type of rwork array
Martin Kroeker [Wed, 2 Sep 2020 20:56:58 +0000 (22:56 +0200)]
Create KERNEL.SILICON
Martin Kroeker [Wed, 2 Sep 2020 20:52:12 +0000 (22:52 +0200)]
Add AppleSilicon cpu
Martin Kroeker [Wed, 2 Sep 2020 20:48:49 +0000 (22:48 +0200)]
Add Apple Silicon
Martin Kroeker [Wed, 2 Sep 2020 20:47:38 +0000 (22:47 +0200)]
Detect AppleSilicon cpu on OSX
Martin Kroeker [Wed, 2 Sep 2020 20:16:41 +0000 (22:16 +0200)]
Merge pull request #80 from xianyi/develop
rebase
Martin Kroeker [Wed, 2 Sep 2020 14:56:01 +0000 (16:56 +0200)]
Merge pull request #2815 from mhillenibm/clang_s390x
Fix build with clang on s390x
Marius Hillenbrand [Tue, 1 Sep 2020 14:16:53 +0000 (16:16 +0200)]
s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang
The code for SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses
explicit unrolling and interleaving to improve performance. The code
employs an empty inline asm statement with operands that constrain the
compiler's instruction scheduling and thereby enforce proper overlapping
of load and compute phases. Fix an ifdef to apply that for clang builds,
as well.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
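The scheduling trick can be sketched in portable form as below. This is an illustration under assumptions: it uses integer data and general-register (`"+r"`) operands so that it compiles with gcc or clang on any target, whereas the kernels in question operate on vector registers. The function name is hypothetical.

```c
/* Dot product, unrolled by two, with separate load and compute phases. */
long dot_interleaved(const long *a, const long *b, int n)
{
    long acc = 0;
    for (int i = 0; i + 1 < n; i += 2) {
        /* Load phase. */
        long a0 = a[i],     b0 = b[i];
        long a1 = a[i + 1], b1 = b[i + 1];

        /* Empty asm: emits no instruction. Because all four values are
           read-write register operands, the compiler must have them in
           registers before this point and may only use them after it. */
        __asm__ volatile("" : "+r"(a0), "+r"(b0), "+r"(a1), "+r"(b1));

        /* Compute phase. */
        acc += a0 * b0;
        acc += a1 * b1;
    }
    if (n > 0 && n % 2 != 0)          /* leftover element for odd n */
        acc += a[n - 1] * b[n - 1];
    return acc;
}
```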
Marius Hillenbrand [Tue, 1 Sep 2020 13:09:32 +0000 (15:09 +0200)]
s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior)
gcc's default setting for floating-point expression contraction is
"fast", which allows the compiler to emit fused multiply adds instead of
separate multiplies and adds (amongst others). Fused multiply-adds,
which assembly kernels typically apply, also bring a significant
performance advantage to the C implementation for matrix-matrix
multiplication on s390x. To enable that performance advantage for builds
with clang, add -ffp-contract=fast to the compiler options.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
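For illustration, the loop below contains the multiply-then-add pattern that a compiler may contract into a fused multiply-add when built with `-ffp-contract=fast`. The function is a generic sketch, not a kernel from the library.

```c
/* y[i] += alpha * x[i]
   Each iteration is a multiply followed by an add. With contraction
   allowed, the compiler may emit one fused multiply-add instead, which
   rounds once rather than twice. */
void axpy(int n, double alpha, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

Since a fused operation rounds only once, results can differ in the last bit from a build without contraction.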
Marius Hillenbrand [Tue, 1 Sep 2020 10:08:05 +0000 (12:08 +0200)]
s390x: avoid variable-length arrays in struct for asm operands
... since it is not required and clang does not support that gcc
extension. Instead, use a variable-length array directly for these
operands.
Note that, while the actual inline assembly code does not directly use
these memory operands, they serve to inform the compiler that it cannot
reorder reads or writes to/from the input and output data across the
inline asm statements.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
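A hedged sketch of the operand form described above: a pointer is cast to a pointer-to-variable-length-array and dereferenced, so the memory operand covers all n elements. The empty asm bodies and the function name are assumptions for illustration; real kernels would place vector instructions inside the asm statements.

```c
/* Scales x[0..n-1] in place, fenced by asm statements that name the
   whole array as a memory operand. */
void scale_in_place(double *x, int n, double factor)
{
    if (n <= 0)
        return;                       /* VLA types need a positive length */

    /* "m" input: the asm may read all n elements, so pending writes to
       x must be completed before this statement. */
    __asm__ volatile("" : : "m"(*(const double (*)[n]) x));

    for (int i = 0; i < n; i++)
        x[i] *= factor;

    /* "+m": the asm may read and write all n elements, so the stores
       above cannot be moved past this statement. */
    __asm__ volatile("" : "+m"(*(double (*)[n]) x));
}
```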
Marius Hillenbrand [Tue, 1 Sep 2020 10:04:28 +0000 (12:04 +0200)]
s390x: avoid inline assembly for vector loads for clang
... since clang does not support the instruction format for inline
assembly, and it is not required for current versions of clang.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Marius Hillenbrand [Tue, 1 Sep 2020 09:58:48 +0000 (11:58 +0200)]
s390x: replace nop with "nop 0" in inline assembly
... as a bandaid for building with clang until LLVM's internal assembler
supports nops without operand.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Marius Hillenbrand [Tue, 1 Sep 2020 11:59:06 +0000 (13:59 +0200)]
s390x: use "lghi" for immediate values to fix build with clang
Some of the kernels written in assembly use a "load address"
instruction to load an immediate value into a register. That is
unnecessarily complex, and LLVM's assembler does not understand that
specific syntax. Replace it with the appropriate "load immediate"
instruction, which is also clearer to read.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Martin Kroeker [Tue, 1 Sep 2020 21:39:46 +0000 (23:39 +0200)]
Merge pull request #2813 from martin-frbg/issue2804-2
Fix for c_check misinterpreting arm64 in uname -m output as armv7
Martin Kroeker [Tue, 1 Sep 2020 17:54:08 +0000 (19:54 +0200)]
Fix c_check misinterpreting arm64 in uname output to mean armv7
Additional fix for upcoming OSX on ARM64, related to #2804, as suggested by fxcoudert in #2805
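The class of bug can be sketched as a match-ordering problem, assuming the misdetection came from a broad "arm" match catching "arm64" first. c_check itself is a script; the C function below is only a hypothetical illustration of testing the specific names before the generic prefix.

```c
#include <string.h>

/* Hypothetical classifier for `uname -m` output. The 64-bit names are
   tested first so that "arm64" is not caught by the generic "arm"
   prefix that is meant for 32-bit variants. */
const char *classify_machine(const char *m)
{
    if (strcmp(m, "aarch64") == 0 || strcmp(m, "arm64") == 0)
        return "arm64";
    if (strncmp(m, "arm", 3) == 0)
        return "arm";
    return "unknown";
}
```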
Martin Kroeker [Tue, 1 Sep 2020 15:19:14 +0000 (17:19 +0200)]
Merge pull request #2811 from martin-frbg/issue2806
Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well
Martin Kroeker [Tue, 1 Sep 2020 14:04:03 +0000 (16:04 +0200)]
Merge pull request #2797 from martin-frbg/relafixes1
ReLAPACK fixes
Martin Kroeker [Tue, 1 Sep 2020 10:03:53 +0000 (12:03 +0200)]
Merge pull request #79 from xianyi/develop
rebase
Martin Kroeker [Tue, 1 Sep 2020 08:44:48 +0000 (10:44 +0200)]
Fix misnaming of LAPACK_?ggsvp function prototypes as LAPACKE_ (#2808)
* Fix misnaming of LAPACK_?ggsvp and ?ggsvd function prototypes as LAPACKE_
* Drop the LAPACKE matrix_layout parameter from the argument lists, change ints to pointers and add missing work arguments.
Martin Kroeker [Mon, 31 Aug 2020 21:44:56 +0000 (23:44 +0200)]
Merge pull request #2807 from martin-frbg/issue2804
Work around ARMV8 build-time cpu detection problems on non-Linux systems
Martin Kroeker [Mon, 31 Aug 2020 18:03:21 +0000 (20:03 +0200)]
Report cpu as ARMV8 instead of just giving up on non-Linux hosts
Martin Kroeker [Mon, 31 Aug 2020 18:02:08 +0000 (20:02 +0200)]
Handle Apple labeling armv8 as arm64 rather than aarch64
Gengxin Xie [Mon, 31 Aug 2020 06:39:08 +0000 (14:39 +0800)]
define __AVX2__ to ensure the haswell code compiled with avx2
Gengxin Xie [Fri, 21 Aug 2020 06:44:36 +0000 (14:44 +0800)]
Implementation of dasum, sasum with AVX2 & AVX512 intrinsics
Martin Kroeker [Fri, 28 Aug 2020 20:52:11 +0000 (22:52 +0200)]
Merge pull request #2799 from RajalakshmiSR/p10_ger
POWER10: Avoid setting accumulators to zero in gemm kernels
Rajalakshmi Srinivasaraghavan [Fri, 28 Aug 2020 15:42:54 +0000 (10:42 -0500)]
POWER10: Avoid setting accumulators to zero in gemm kernels
For the first iteration, it is better to use the xvf*ger builtins
instead of xvf*gerpp, which avoids having to set the accumulators to
zero and thereby saves a few instructions.
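A scalar analogy of the idea, not the MMA builtins themselves: the first iteration assigns the product (like the non-accumulating form), and only later iterations accumulate, so the accumulator never needs to be zeroed. The function name is hypothetical.

```c
/* Dot product without a separate zero-initialization of the accumulator. */
double dot_no_zero_init(const double *a, const double *b, int n)
{
    if (n <= 0)
        return 0.0;
    double acc = a[0] * b[0];        /* first iteration: plain product */
    for (int i = 1; i < n; i++)
        acc += a[i] * b[i];          /* later iterations: accumulate */
    return acc;
}
```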
Martin Kroeker [Fri, 28 Aug 2020 06:30:59 +0000 (08:30 +0200)]
Merge pull request #2798 from kadler/aix-cpuid
Fix compile error on AIX cpuid detection
Kevin Adler [Fri, 28 Aug 2020 04:08:33 +0000 (23:08 -0500)]
Fix compile error on AIX cpuid detection
In 589c74a the cpuid detection was changed to use systemcfg, but a copy
and paste error was introduced during some refactoring that caused
POWER7 detection to reference CPUTYPE_POWER7 (which doesn't exist)
instead of CPUTYPE_POWER6.
Martin Kroeker [Thu, 27 Aug 2020 09:25:18 +0000 (11:25 +0200)]
Add early returns and fix sign errors in workspace calculations
Martin Kroeker [Thu, 27 Aug 2020 09:22:50 +0000 (11:22 +0200)]
Add early returns
Martin Kroeker [Thu, 27 Aug 2020 09:20:31 +0000 (11:20 +0200)]
Add early returns
Martin Kroeker [Thu, 27 Aug 2020 09:15:12 +0000 (11:15 +0200)]
Add early returns
Martin Kroeker [Thu, 27 Aug 2020 08:59:08 +0000 (10:59 +0200)]
Make ILAENV and xGETRF2 functions available
Martin Kroeker [Mon, 24 Aug 2020 18:18:09 +0000 (20:18 +0200)]
Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify
Fix OMP num specify issue
Martin Kroeker [Mon, 24 Aug 2020 06:03:39 +0000 (08:03 +0200)]
Merge pull request #2792 from pkubaj/patch-1
Add aliases for armv6, armv7
pkubaj [Sun, 23 Aug 2020 18:50:19 +0000 (18:50 +0000)]
Add aliases for armv6, armv7
FreeBSD uses those names for 32-bit ARM variants.
Chen, Guobing [Tue, 11 Aug 2020 19:28:25 +0000 (03:28 +0800)]
Fix OMP num specify issue
In the current code, all available CPUs are used when invoking OMP, no
matter what number of threads is specified. This leads to very bad
performance when the workload is small and the number of available
CPUs is large, because a lot of time is wasted on inter-thread
synchronization. Fix this by actually using the number specified by the
variable 'num' from the calling API.
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
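A minimal sketch of honoring a requested thread count with the OpenMP `num_threads` clause. The function is hypothetical and only mirrors the variable name 'num' from the message above. Compile with `-fopenmp` to enable the pragma; without it the loop runs serially with the same result.

```c
/* Sums x[0..n-1] using at most `num` OpenMP threads. */
long parallel_sum(const long *x, int n, int num)
{
    long total = 0;
    if (num < 1)
        num = 1;                     /* guard against invalid requests */
#pragma omp parallel for num_threads(num) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += x[i];
    return total;
}
```

For small inputs, passing a small `num` avoids paying synchronization cost for threads that have almost no work.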
Martin Kroeker [Sun, 23 Aug 2020 17:33:03 +0000 (19:33 +0200)]
Merge pull request #2791 from martin-frbg/issue2787
Fix crashes in parallelized x86_64 ZDOT particularly on Windows
Martin Kroeker [Sun, 23 Aug 2020 13:08:16 +0000 (15:08 +0200)]
Fix missing dummy parameter (imag part of alpha) of zdot_thread_function
Martin Kroeker [Sun, 23 Aug 2020 12:42:35 +0000 (14:42 +0200)]
Merge pull request #2790 from martin-frbg/issue2789
Add OpenMP dependency to pkgconfig information if needed
Martin Kroeker [Sat, 22 Aug 2020 11:55:18 +0000 (13:55 +0200)]
Add OpenMP dependency to pkgconfig file if needed
Martin Kroeker [Sat, 22 Aug 2020 11:53:44 +0000 (13:53 +0200)]
Add OpenMP dependency to pkgconfig file if needed
Martin Kroeker [Sat, 22 Aug 2020 11:52:29 +0000 (13:52 +0200)]
Merge pull request #78 from xianyi/develop
rebase
Martin Kroeker [Thu, 20 Aug 2020 17:54:29 +0000 (19:54 +0200)]
Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
Martin Kroeker [Wed, 19 Aug 2020 20:51:10 +0000 (22:51 +0200)]
Update system.cmake
Martin Kroeker [Wed, 19 Aug 2020 20:30:19 +0000 (22:30 +0200)]
Update system.cmake
Martin Kroeker [Wed, 19 Aug 2020 18:48:39 +0000 (20:48 +0200)]
fallback from cooperlake to skylake if gcc<10
Martin Kroeker [Wed, 19 Aug 2020 15:44:23 +0000 (17:44 +0200)]
Typo fix
Martin Kroeker [Wed, 19 Aug 2020 15:22:12 +0000 (17:22 +0200)]
-march=cooperlake requires gcc10
Martin Kroeker [Wed, 19 Aug 2020 15:17:53 +0000 (17:17 +0200)]
-march=cooperlake requires gcc10
Martin Kroeker [Wed, 19 Aug 2020 14:36:55 +0000 (16:36 +0200)]
Fix typo
Martin Kroeker [Wed, 19 Aug 2020 14:10:15 +0000 (16:10 +0200)]
-march=cooperlake only available in gcc >= 10
Martin Kroeker [Wed, 19 Aug 2020 13:06:30 +0000 (15:06 +0200)]
make march=cooperlake option conditional on gcc >= 10.1
Martin Kroeker [Wed, 19 Aug 2020 12:51:09 +0000 (14:51 +0200)]
[WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
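The general shape of such a refactoring can be sketched as a per-architecture function table with a run-time dispatch. Everything named here (`kernel_table`, `sgemm_direct_fn`, the row-major layout, the generic fallback) is a hypothetical stand-in for illustration and not the library's actual struct or signatures.

```c
#include <stddef.h>

/* Hypothetical direct SGEMM kernel: C = A * B with row-major
   A (m x k), B (k x n) and C (m x n). */
typedef void (*sgemm_direct_fn)(int m, int n, int k,
                                const float *a, const float *b, float *c);

/* Hypothetical stand-in for a per-architecture function table. */
typedef struct {
    sgemm_direct_fn sgemm_direct;    /* NULL if no direct kernel exists */
} kernel_table;

/* Portable reference kernel. */
void sgemm_direct_generic(int m, int n, int k,
                          const float *a, const float *b, float *c)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
}

/* Calls the kernel selected at run time, or the generic one if the
   table has no entry. */
void sgemm_direct_dispatch(const kernel_table *t, int m, int n, int k,
                           const float *a, const float *b, float *c)
{
    sgemm_direct_fn fn = (t != NULL && t->sgemm_direct != NULL)
                             ? t->sgemm_direct
                             : sgemm_direct_generic;
    fn(m, n, k, a, b, c);
}
```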
Martin Kroeker [Wed, 19 Aug 2020 12:42:58 +0000 (14:42 +0200)]
Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config
Do not require pkg-config to generate the *.pc file
Albert Ziegenhagel [Tue, 18 Aug 2020 06:48:48 +0000 (08:48 +0200)]
Do not require pkg-config to generate the *.pc file
Generating the pkg-config file does not actually depend on pkg-config being available.
Martin Kroeker [Mon, 17 Aug 2020 17:06:13 +0000 (19:06 +0200)]
Merge pull request #2784 from martin-frbg/issue2783
Add fallback typedef for bfloat16 to openblas_config.h template
Martin Kroeker [Mon, 17 Aug 2020 13:32:14 +0000 (15:32 +0200)]
Add typedef for bfloat16 if needed
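A sketch of what a fallback typedef can look like, assuming a 16-bit unsigned integer as the storage type. The guard macro name and the two conversion helpers are hypothetical additions for illustration; the conversion shown simply truncates and does not round.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical guard; the real template may use a different condition. */
#ifndef EXAMPLE_BFLOAT16_DEFINED
#define EXAMPLE_BFLOAT16_DEFINED
typedef uint16_t bfloat16;
#endif

/* Truncating conversion: keeps the upper 16 bits of a 32-bit float. */
bfloat16 float_to_bfloat16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bfloat16)(bits >> 16);
}

/* Widening conversion: the lower 16 bits of the float are zero. */
float bfloat16_to_float(bfloat16 h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```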
Martin Kroeker [Mon, 17 Aug 2020 13:28:15 +0000 (15:28 +0200)]
Merge pull request #77 from xianyi/develop
rebase
Martin Kroeker [Mon, 17 Aug 2020 13:20:41 +0000 (15:20 +0200)]
revert
Martin Kroeker [Mon, 17 Aug 2020 13:20:16 +0000 (15:20 +0200)]
revert
Martin Kroeker [Mon, 17 Aug 2020 13:19:40 +0000 (15:19 +0200)]
revert
Martin Kroeker [Sat, 15 Aug 2020 13:46:18 +0000 (15:46 +0200)]
Update .drone.yml
Martin Kroeker [Sat, 15 Aug 2020 12:46:26 +0000 (14:46 +0200)]
Update Makefile
Martin Kroeker [Sat, 15 Aug 2020 11:38:05 +0000 (13:38 +0200)]
Add simple MT sgemm precision test and INTERFACE64 build
Martin Kroeker [Sat, 15 Aug 2020 11:33:52 +0000 (13:33 +0200)]
Add simple sgemm precision test