Martin Kroeker [Sun, 9 Jul 2017 11:15:24 +0000 (13:15 +0200)]
Do not add -lpthread on Android builds (#1229)
* Do not add -lpthread on Android builds
* Do not add -lpthread on Android cmake builds
Zhang Xianyi [Fri, 7 Jul 2017 07:43:33 +0000 (15:43 +0800)]
Merge pull request #1225 from martin-frbg/stolen_from_wernsaar_fork
fixed syrk_thread.c taken from wernsaar
Martin Kroeker [Thu, 6 Jul 2017 15:30:12 +0000 (17:30 +0200)]
fixed syrk_thread.c taken from wernsaar
Stride calculation fix copied from https://github.com/wernsaar/OpenBLAS/commit/88900e1
Zhang Xianyi [Wed, 5 Jul 2017 09:01:03 +0000 (17:01 +0800)]
Link -lm or -lm_hard for Android ARMv7.
Zhang Xianyi [Mon, 3 Jul 2017 05:48:29 +0000 (13:48 +0800)]
Merge pull request #1218 from m-brow/power9
Optimise loads on Power9 LE
Zhang Xianyi [Mon, 3 Jul 2017 05:43:48 +0000 (13:43 +0800)]
Merge pull request #1212 from neilsh-msft/develop
Add Microsoft Windows 10 UWP build support
Martin Kroeker [Sat, 1 Jul 2017 18:43:23 +0000 (20:43 +0200)]
Merge pull request #1220 from ashwinyes/develop_aarch64_20170701_t99_options
arm64: Change mtune/mcpu options for THUNDERX2T99 target
Ashwin Sekhar T K [Sat, 1 Jul 2017 18:16:12 +0000 (11:16 -0700)]
arm64: Change mtune/mcpu options for THUNDERX2T99 target
Neil Shipp [Fri, 23 Jun 2017 20:07:34 +0000 (13:07 -0700)]
Add Microsoft Windows 10 UWP build support
Zhang Xianyi [Fri, 23 Jun 2017 03:35:25 +0000 (11:35 +0800)]
Merge branch 'arm_soft_fp_abi' into develop
Zhang Xianyi [Fri, 23 Jun 2017 03:33:09 +0000 (11:33 +0800)]
Merge pull request #1211 from neilsh-msft/develop
Add 64bit support for Microsoft Visual Studio
Neil Shipp [Fri, 23 Jun 2017 01:05:19 +0000 (18:05 -0700)]
Reorder dependencies to allow in-place build to succeed the first time.
Neil Shipp [Fri, 23 Jun 2017 00:08:09 +0000 (17:08 -0700)]
Avoid truncating cblas.h when compiling gencblas target
Neil Shipp [Thu, 22 Jun 2017 00:49:57 +0000 (17:49 -0700)]
Revert changes to sed and awk
Neil Shipp [Wed, 21 Jun 2017 18:06:48 +0000 (11:06 -0700)]
Add 64bit support for Microsoft Visual Studio
Matt Brown [Wed, 14 Jun 2017 06:47:56 +0000 (16:47 +1000)]
Optimise sscal for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:45:58 +0000 (16:45 +1000)]
Optimise srot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:43:31 +0000 (16:43 +1000)]
Optimise sdot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:39:27 +0000 (16:39 +1000)]
Optimise sasum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:38:32 +0000 (16:38 +1000)]
Optimise casum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:36:10 +0000 (16:36 +1000)]
Optimise cswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:23:20 +0000 (16:23 +1000)]
Optimise sswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 04:58:00 +0000 (14:58 +1000)]
Optimise scopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 04:25:10 +0000 (14:25 +1000)]
Optimise ccopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Martin Kroeker [Thu, 1 Jun 2017 14:36:26 +0000 (16:36 +0200)]
Fix installation of header files with cmake (#1186)
* Fix installation of header files with cmake
Install only the required header files, with openblas_config.h preprocessed like in Makefile.install
Fixes #1184
* Update CMakeLists.txt
Escape remaining semicolons in awk argument list (to get it working on Windows as well)
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Add files via upload
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
See if it is the single quotes that cause the problem on Windows
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Use C utility instead of awk for header generation in cmake builds
* Update CMakeLists.txt
* Fix generation and installation of header files
Generate openblas_config.h and f77blas.h with same contents as in plain Makefile builds and install only the public header files
Martin Kroeker [Thu, 1 Jun 2017 14:35:52 +0000 (16:35 +0200)]
Merge pull request #1190 from oviradoi/utest_make_complex
Update test to use openblas_make_complex_float and openblas_make_comp…
Ovidiu Radoi [Tue, 30 May 2017 09:07:43 +0000 (12:07 +0300)]
Update test to use openblas_make_complex_float and openblas_make_complex_double functions
Martin Kroeker [Sun, 28 May 2017 09:07:57 +0000 (11:07 +0200)]
Merge pull request #1189 from pawosm-arm/flang
build: Flang has the same interface as PGI
Paul Osmialowski [Sat, 27 May 2017 05:23:58 +0000 (06:23 +0100)]
build: Flang has the same interface as PGI
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
Martin Kroeker [Fri, 26 May 2017 21:02:47 +0000 (23:02 +0200)]
Merge pull request #1188 from pawosm-arm/flang
build: Flang compiler support
Paul Osmialowski [Thu, 25 May 2017 11:22:17 +0000 (12:22 +0100)]
build: LLVM: Add Flang compiler support and enable OpenMP for Clang
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
Zhang Xianyi [Wed, 24 May 2017 07:54:58 +0000 (15:54 +0800)]
Merge pull request #1187 from mine260309/develop
build: fix libxlmass errors building on Power CPU
Lei YU [Wed, 24 May 2017 06:18:45 +0000 (14:18 +0800)]
build: fix libxlmass errors building on Power CPU
The IBM MASS library has been upgraded to 8.1.5, and 8.1.3 is not available.
Update README.md and Makefile.power to use version 8.1.5 of libxlmass.
Martin Kroeker [Wed, 10 May 2017 17:39:09 +0000 (19:39 +0200)]
Merge pull request #1182 from martin-frbg/martin-frbg-patch-1
Build shared library on Android without SONAME versioning
Martin Kroeker [Wed, 10 May 2017 11:08:13 +0000 (13:08 +0200)]
Build shared library on Android without SONAME versioning
Android does not support versioned SONAME entries, ref. #1173
Martin Kroeker [Sat, 6 May 2017 15:20:10 +0000 (17:20 +0200)]
Merge pull request #1178 from jcowgill/mips-fixes
MIPS threading fixes
Martin Kroeker [Sat, 6 May 2017 11:08:46 +0000 (13:08 +0200)]
Merge pull request #1179 from jcowgill/memory-fixes
Fixes to driver/others/memory.c
James Cowgill [Fri, 5 May 2017 09:33:56 +0000 (10:33 +0100)]
memory: Fix buffer overflow when position == NUM_BUFFERS
James Cowgill [Thu, 4 May 2017 13:35:36 +0000 (14:35 +0100)]
mips: remove incorrect blas_lock implementations
MIPS 32-bit currently has an empty blas_lock implementation, which is
worse than nothing at all. MIPS 64-bit does have a blas_lock
implementation, but it is broken. Remove them and fall back to the generic
version in common.h, which should do the right thing on MIPS.
James Cowgill [Thu, 4 May 2017 13:32:46 +0000 (14:32 +0100)]
mips: implement MB and WMB
The MIPS architecture has weak memory ordering and therefore requires
suitable memory barriers when doing lock-free programming with multiple
threads (just like ARM does). This commit implements those barriers for
MIPS and MIPS64 using GCC builtins, which is probably the easiest way.
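As a rough illustration of the barrier idea described above, the following C sketch publishes a value from a producer to a consumer with a full barrier between the data write and the flag write. It assumes the GCC builtin `__sync_synchronize()` is the barrier in question; the commit message only says "GCC builtins". All `demo_` and `DEMO_` names are hypothetical and are not taken from OpenBLAS.

```c
#include <assert.h>

/* Hypothetical stand-ins for barrier macros such as MB and WMB.
 * Assumption: both map to the GCC full-barrier builtin. A full barrier
 * is a safe (if conservative) choice for a write barrier as well. */
#define DEMO_MB  __sync_synchronize()
#define DEMO_WMB __sync_synchronize()

int demo_payload;            /* data handed from producer to consumer */
volatile int demo_ready;     /* flag set once the payload is valid */

/* Producer: store the payload, then publish the flag. The barrier keeps
 * the payload store from being reordered after the flag store. */
void demo_publish(int value) {
    demo_payload = value;
    DEMO_WMB;
    demo_ready = 1;
}

/* Consumer: returns 1 and copies the payload if the flag is set, else 0.
 * The barrier keeps the payload load from being reordered before the
 * flag load. */
int demo_try_consume(int *out) {
    assert(out != 0);
    if (!demo_ready)
        return 0;
    DEMO_MB;
    *out = demo_payload;
    return 1;
}
```

On a weakly ordered CPU, dropping either barrier would allow the consumer to see the flag set while still reading a stale payload.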
James Cowgill [Thu, 4 May 2017 13:29:48 +0000 (14:29 +0100)]
memory: switch loop condition around in blas_memory_free
Before this commit, the "position < NUM_BUFFERS" loop condition from
blas_memory_free was completely optimized away by GCC. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removes the subsequent if statement
and all the code after the error label because all of it is dead.
This commit switches the loop condition around so it works as intended.
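The ordering issue described above can be sketched in C as follows. This is a simplified stand-in, not the actual code from driver/others/memory.c: the table, its size, and the `demo_` names are all hypothetical. The point is that the bounds check must come first in the `&&` so that the array is never read out of range.

```c
#include <assert.h>

#define DEMO_NUM_BUFFERS 4   /* hypothetical table size */

void *demo_table[DEMO_NUM_BUFFERS];

/* Returns the index of addr in demo_table, or -1 if it is not present. */
int demo_find(void *addr) {
    int position = 0;

    assert(addr != 0);

    /* Bounds check first: demo_table[position] is only read while
     * position is in range. With the operands swapped, the loop would
     * read demo_table[DEMO_NUM_BUFFERS] before testing position, which
     * is undefined behavior, so a compiler may treat the bounds test
     * as always true and delete it. */
    while (position < DEMO_NUM_BUFFERS && demo_table[position] != addr)
        position++;

    return position < DEMO_NUM_BUFFERS ? position : -1;
}
```

With this ordering, the "not found" path stays reachable, so the error handling after the loop is no longer dead code.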
Martin Kroeker [Fri, 5 May 2017 10:00:04 +0000 (12:00 +0200)]
Merge pull request #1175 from martin-frbg/lapack_143
Fix workspace computation in LAPACKE ?tpmqrt
Martin Kroeker [Fri, 5 May 2017 09:59:41 +0000 (11:59 +0200)]
Merge pull request #1176 from staticfloat/sf/dynamic_arch
Fix DYNAMIC_ARCH=1 breaking builds on non-x86 platforms
Elliot Saba [Thu, 4 May 2017 18:52:34 +0000 (11:52 -0700)]
Force `DYNAMIC_ARCH` to empty when `DYNAMIC_CORE` is not set
Elliot Saba [Thu, 4 May 2017 18:51:29 +0000 (11:51 -0700)]
Add Makefile debugging trick so that we can inspect runtime Makefile variables
Martin Kroeker [Thu, 4 May 2017 18:01:41 +0000 (20:01 +0200)]
Fix workspace computation for side=L
From netlib PR#144
Martin Kroeker [Thu, 4 May 2017 17:59:02 +0000 (19:59 +0200)]
Fix workspace computation for side=L
From netlib PR#144
Martin Kroeker [Thu, 4 May 2017 17:55:02 +0000 (19:55 +0200)]
Fix workspace computation for side=L
From netlib PR#144
Martin Kroeker [Thu, 4 May 2017 17:49:51 +0000 (19:49 +0200)]
Fix workspace allocation in lapacke_ctp for side=L
from netlib PR #144
Martin Kroeker [Thu, 4 May 2017 17:32:50 +0000 (19:32 +0200)]
Merge pull request #1169 from martin-frbg/cblas_xerbla
Add trivial implementation of cblas_xerbla
Martin Kroeker [Wed, 26 Apr 2017 18:29:30 +0000 (20:29 +0200)]
Update xerbla.c
Martin Kroeker [Wed, 26 Apr 2017 18:01:34 +0000 (20:01 +0200)]
Add cblas_xerbla
Martin Kroeker [Wed, 26 Apr 2017 17:58:59 +0000 (19:58 +0200)]
Add cblas_xerbla()
Martin Kroeker [Fri, 21 Apr 2017 13:14:16 +0000 (15:14 +0200)]
Merge pull request #1165 from rcoscali/patch-1
README.md update
Rémi Cohen-Scali [Fri, 21 Apr 2017 12:18:57 +0000 (14:18 +0200)]
Update README.md
Martin Kroeker [Fri, 21 Apr 2017 08:53:49 +0000 (10:53 +0200)]
Merge pull request #1164 from sharkcz/s390x
detect CPU on zArch
Dan Horák [Thu, 20 Apr 2017 19:13:41 +0000 (21:13 +0200)]
detect CPU on zArch
Martin Kroeker [Wed, 19 Apr 2017 18:03:23 +0000 (20:03 +0200)]
Merge pull request #1160 from gcp/extra-streamroller-cpuid
Add an extra family/model combination used by AMD Steamroller.
Gian-Carlo Pascutto [Wed, 19 Apr 2017 17:15:47 +0000 (19:15 +0200)]
Add an extra family/model combination used by AMD Steamroller (Godavari).
Martin Kroeker [Wed, 19 Apr 2017 13:04:41 +0000 (15:04 +0200)]
Merge pull request #1158 from martin-frbg/force-zen
Make FORCE_ZEN option in getarch.c actually set target names to ZEN
Martin Kroeker [Wed, 19 Apr 2017 12:20:42 +0000 (14:20 +0200)]
Fix FORCE_ZEN option in getarch.c
Martin Kroeker [Tue, 18 Apr 2017 11:32:16 +0000 (13:32 +0200)]
Merge pull request #1157 from gcp/revert-zen-param
Revert Zen param.h to Haswell values (instead of Excavator).
Gian-Carlo Pascutto [Tue, 18 Apr 2017 10:40:25 +0000 (12:40 +0200)]
Revert Zen param.h to Haswell values (instead of Excavator).
Martin Kroeker [Tue, 18 Apr 2017 07:00:24 +0000 (09:00 +0200)]
Merge pull request #1156 from SoapGentoo/cmake-fixes
Use GNUInstallDirs to allow changing target directories
David Seifert [Sat, 15 Apr 2017 22:43:34 +0000 (00:43 +0200)]
Use GNUInstallDirs to allow changing target directories
* Multi-lib distributions need to change the libdir
which is only portably possible with `GNUInstallDirs`.
* Multi-arch distributions such as Debian and Exherbo
need to be able to change the bindir.
Martin Kroeker [Thu, 13 Apr 2017 14:37:29 +0000 (16:37 +0200)]
Merge pull request #1154 from sharkcz/s390x
add lapack laswp directory for zarch
Dan Horák [Thu, 13 Apr 2017 10:21:10 +0000 (12:21 +0200)]
add lapack laswp for zarch
Zhang Xianyi [Tue, 11 Apr 2017 03:56:10 +0000 (11:56 +0800)]
Build shared library for Android.
Martin Kroeker [Mon, 10 Apr 2017 18:17:14 +0000 (20:17 +0200)]
Merge pull request #1148 from gcp/fix-dynamic-zen
Fix dynamic detection for ZEN CPUs.
Gian-Carlo Pascutto [Mon, 10 Apr 2017 18:05:16 +0000 (20:05 +0200)]
Recognize ZEN when passed as OPENBLAS_CORETYPE.
Gian-Carlo Pascutto [Mon, 10 Apr 2017 17:07:52 +0000 (19:07 +0200)]
Fix dynamic detection for ZEN CPUs.
Martin Kroeker [Thu, 6 Apr 2017 14:20:01 +0000 (16:20 +0200)]
Merge pull request #1142 from amodra/develop
Power8 inline assembly tweaks
Alan Modra [Sat, 1 Apr 2017 09:05:59 +0000 (19:35 +1030)]
Power8 inline assembly tweaks
Further fixes on top of 9e2f316ed. Writing some doco for gcc on
inline assembly woke me up to some more errors.
- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
*y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
operand, a problem if the "=f" sum output was ever allocated a vsx
reg in the altivec set. This might be possible with inlining and
future gcc optimisation.
Martin Kroeker [Mon, 3 Apr 2017 07:47:09 +0000 (09:47 +0200)]
Merge pull request #1140 from JohannesBuchner/develop
Autodetect AMD A8-6410 as BARCELONA
Johannes Buchner [Mon, 3 Apr 2017 07:07:27 +0000 (17:07 +1000)]
Autodetect AMD A8-6410 as BARCELONA
Martin Kroeker [Fri, 24 Mar 2017 21:05:22 +0000 (22:05 +0100)]
Fix integer overflow in LAPACK DBDSQR, SBDSQR (#1135)
* Fix integer overflow in DBDSQR
As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.
* Fix integer overflow in SBDSQR
As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.
* Fix integer overflow in threshold calculation
Related to lapack issue 135, the threshold calculation can overflow as well, because the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919
* Fix integer overflow in threshold calculation
Related to lapack issue 135, the threshold calculation can overflow as well, because the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919
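The arithmetic behind the N >= 18919 figure can be checked with a small C sketch. It assumes the overflowing product has the form maxitr*N*N with an iteration constant of 6; that constant is an assumption chosen because it reproduces the quoted bound, and the function names are hypothetical. The second function shows how explicit parentheses force every multiplication into floating point.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if maxitr*n*n fits in a 32-bit signed integer, else 0.
 * The product is evaluated in 64 bits so the check itself cannot
 * overflow for the small inputs used here. */
int demo_fits_int32(int64_t maxitr, int64_t n) {
    assert(maxitr >= 0 && n >= 0);
    return maxitr * n * n <= INT32_MAX;
}

/* Threshold term with explicit parentheses. The innermost product
 * n * unfl is already a double, so no integer multiplication of
 * maxitr, n and n ever takes place. Written as maxitr * n * n * unfl,
 * the integer part would be multiplied first, left to right, and
 * could overflow before the conversion to double. */
double demo_threshold_term(int maxitr, int n, double unfl) {
    return maxitr * (n * (n * unfl));
}
```

Under that assumption, `demo_fits_int32(6, 18918)` holds while `demo_fits_int32(6, 18919)` does not, which matches the bound stated in the commit message.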
Martin Kroeker [Fri, 24 Mar 2017 12:47:32 +0000 (13:47 +0100)]
Merge pull request #1133 from steckdenis/develop
Add ZEN support
Zhang Xianyi [Mon, 20 Mar 2017 09:39:25 +0000 (17:39 +0800)]
Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger.
Denis Steckelmacher [Sun, 19 Mar 2017 14:32:50 +0000 (15:32 +0100)]
Add ZEN support (tested for auto-detected static backend)
Andrew [Thu, 16 Mar 2017 12:13:31 +0000 (13:13 +0100)]
Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
Martin Kroeker [Wed, 15 Mar 2017 09:00:52 +0000 (10:00 +0100)]
Merge pull request #1130 from quickwritereader/develop
Blas 3 for single precision
Martin Kroeker [Tue, 14 Mar 2017 16:17:39 +0000 (17:17 +0100)]
Merge pull request #1126 from martin-frbg/pgi
Fix compilation with PGI by replacing verbatim __real__, __imag__ extensions and updating macro definitions for modern, C99-capable versions of the PGI compiler
Martin Kroeker [Mon, 13 Mar 2017 17:08:00 +0000 (18:08 +0100)]
Update zdot.c
Martin Kroeker [Mon, 13 Mar 2017 16:49:07 +0000 (17:49 +0100)]
Update zdot.c
Martin Kroeker [Sun, 12 Mar 2017 23:40:11 +0000 (00:40 +0100)]
Replace GNU __real__, __imag__ extensions in initializers
Martin Kroeker [Sun, 12 Mar 2017 23:38:37 +0000 (00:38 +0100)]
Replace GNU __real__, __imag__ extensions in initializers
Martin Kroeker [Sun, 12 Mar 2017 23:36:01 +0000 (00:36 +0100)]
Fix CREAL,CIMAG macros for PGI
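One portable way to set the real and imaginary parts without compiler-specific keywords is sketched below. It relies on the C99 guarantee that a complex type has the same layout as an array of two values of the corresponding real type. The `DEMO_CREAL` and `DEMO_CIMAG` macros and `demo_make_complex` are hypothetical illustrations, not the definitions used in OpenBLAS.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical accessor macros that yield lvalues for the two parts.
 * C99 lays out a double complex as two consecutive doubles: the real
 * part first, the imaginary part second. */
#define DEMO_CREAL(z) (*((double *)&(z) + 0))
#define DEMO_CIMAG(z) (*((double *)&(z) + 1))

/* Builds a complex value by assigning its parts, instead of using a
 * compiler-specific initializer. */
double complex demo_make_complex(double re, double im) {
    double complex z;

    /* Sanity check on the layout assumption stated above. */
    assert(sizeof(double complex) == 2 * sizeof(double));

    DEMO_CREAL(z) = re;
    DEMO_CIMAG(z) = im;
    return z;
}
```

Because the macros expand to lvalues, the same names work for both reading and assigning a part, which is what an initializer replacement needs.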
Abdurrauf [Sun, 12 Mar 2017 21:23:16 +0000 (01:23 +0400)]
strmm and ctrmm
Martin Kroeker [Fri, 10 Mar 2017 11:58:38 +0000 (12:58 +0100)]
Merge pull request #1124 from martin-frbg/c_check-ppc
Update c_check.cmake to label ppc64 as power ARCH
Martin Kroeker [Fri, 10 Mar 2017 10:45:48 +0000 (11:45 +0100)]
Update c_check.cmake
Martin Kroeker [Fri, 10 Mar 2017 08:51:34 +0000 (09:51 +0100)]
Merge pull request #1122 from martin-frbg/zlasyf
Fix misspelling of zlasyf_aa from previous commit
Martin Kroeker [Fri, 10 Mar 2017 07:44:49 +0000 (08:44 +0100)]
Fix misspelling of zlasyf_aa from previous commit
Martin Kroeker [Fri, 10 Mar 2017 07:33:36 +0000 (08:33 +0100)]
Merge pull request #1121 from staticfloat/sf/Xsymv_export
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
Elliot Saba [Thu, 9 Mar 2017 23:30:43 +0000 (15:30 -0800)]
Whitespace cleanup/reformatting
Elliot Saba [Thu, 9 Mar 2017 23:22:40 +0000 (15:22 -0800)]
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
Zhang Xianyi [Mon, 6 Mar 2017 14:16:13 +0000 (22:16 +0800)]
Support ARM softfp ABI for sgemm on ARMV7.
make ARM_SOFTFP_ABI=1
Zhang Xianyi [Mon, 6 Mar 2017 05:53:56 +0000 (13:53 +0800)]
Merge branch 'develop' into arm_soft_fp_abi
Abdurrauf [Mon, 6 Mar 2017 00:27:40 +0000 (04:27 +0400)]
initial strmm(sgemm). not tuned yet
Martin Kroeker [Thu, 2 Mar 2017 17:43:59 +0000 (18:43 +0100)]
Merge pull request #1111 from martin-frbg/kaby-no-avx
Fix core detection for Kaby Lake without AVX (G4560)
Martin Kroeker [Thu, 2 Mar 2017 16:36:16 +0000 (17:36 +0100)]
Fix core detection for Kaby Lake without AVX (G4560)
Should fix #1109