Martin Kroeker [Thu, 12 Oct 2017 15:00:00 +0000 (17:00 +0200)]
Add cmake build list file for ReLAPACK
Martin Kroeker [Thu, 12 Oct 2017 14:58:37 +0000 (16:58 +0200)]
Add ReLAPACK option
Martin Kroeker [Tue, 10 Oct 2017 10:14:34 +0000 (12:14 +0200)]
Merge pull request #1325 from grisuthedragon/patch-1
Update README.md to include POWER8
Martin Köhler [Tue, 10 Oct 2017 08:12:04 +0000 (10:12 +0200)]
Update README.md
Add POWER 8 to the list of additional architectures.
Martin Kroeker [Mon, 9 Oct 2017 21:34:18 +0000 (23:34 +0200)]
Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323)
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
Martin Kroeker [Sun, 8 Oct 2017 21:31:33 +0000 (23:31 +0200)]
Merge pull request #1320 from timmoon10/develop
2D thread distribution for multi-threaded GEMMs
Martin Kroeker [Sun, 8 Oct 2017 21:31:06 +0000 (23:31 +0200)]
Merge pull request #1319 from martin-frbg/issue601
Fix out-of-bounds memory accesses exposed by xccblat3 testcase
Martin Kroeker [Sun, 8 Oct 2017 21:30:46 +0000 (23:30 +0200)]
Merge pull request #1317 from martin-frbg/power8-asm
Save and restore VSX registers
Martin Kroeker [Fri, 6 Oct 2017 21:51:32 +0000 (23:51 +0200)]
Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
Martin Kroeker [Fri, 6 Oct 2017 19:13:45 +0000 (21:13 +0200)]
Merge pull request #1279 from xsacha/develop
CMake improvements
Tim Moon [Wed, 4 Oct 2017 19:37:49 +0000 (12:37 -0700)]
Reduce number of data partitions in n.
Martin Kroeker [Wed, 4 Oct 2017 18:35:00 +0000 (20:35 +0200)]
Merge pull request #1316 from timmoon10/develop
Variable thread count for multi-threaded GEMMs
Tim Moon [Tue, 3 Oct 2017 23:32:08 +0000 (16:32 -0700)]
Cleaning up and documenting multi-threaded GEMM code.
Tim Moon [Tue, 3 Oct 2017 20:43:39 +0000 (13:43 -0700)]
Use 2D thread distribution for small GEMMs.
Allows maximum use of available cores if one of M and N is small and the other is large.
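A minimal sketch of the idea behind a 2D thread distribution, not the code from this commit. The helper `choose_thread_grid` and the `thread_grid` type are hypothetical names, and the cost function (half the perimeter of one tile) is an assumption chosen only for illustration. It picks a `threads_m x threads_n` grid so that a small M or N does not leave threads without work:

```c
#include <assert.h>

/* Hypothetical type and helper, NOT OpenBLAS's actual implementation. */
typedef struct {
    int threads_m; /* number of partitions along M */
    int threads_n; /* number of partitions along N */
} thread_grid;

/* Pick a grid with threads_m * threads_n == nthreads whose tiles are as
 * close to square as possible. Falls back to a single thread (1 x 1) when
 * no factorization fits inside the m x n problem. */
static thread_grid choose_thread_grid(long m, long n, int nthreads)
{
    thread_grid best = { 1, 1 };
    double best_cost = -1.0;

    assert(m > 0 && n > 0 && nthreads > 0);

    for (int tm = 1; tm <= nthreads; tm++) {
        if (nthreads % tm != 0)
            continue;                 /* only exact factorizations */
        int tn = nthreads / tm;
        if (tm > m || tn > n)
            continue;                 /* some threads would get no rows/columns */

        /* Assumed cost model: half perimeter of one tile, a rough proxy
         * for the amount of A and B data each thread has to touch. */
        double cost = (double)m / tm + (double)n / tn;
        if (best_cost < 0.0 || cost < best_cost) {
            best_cost = cost;
            best.threads_m = tm;
            best.threads_n = tn;
        }
    }
    return best;
}
```

With `m = 2`, `n = 100000` and 8 threads, a split along M alone could use at most 2 threads, whereas this sketch selects a 1 x 8 grid and keeps all 8 busy.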
Martin Kroeker [Sat, 30 Sep 2017 23:06:39 +0000 (01:06 +0200)]
Fix out-of-bounds accesses where the data should be zero anyway
Martin Kroeker [Sat, 30 Sep 2017 19:31:28 +0000 (21:31 +0200)]
Merge pull request #1318 from pv/potrf-smoketest
Add trivial smoketest for xpotrf
Pauli Virtanen [Sat, 30 Sep 2017 16:40:03 +0000 (18:40 +0200)]
Add trivial smoketest for xpotrf
Tim Moon [Thu, 28 Sep 2017 19:56:29 +0000 (12:56 -0700)]
Increasing flexibility of GEMM benchmark.
m, n, and k can be set to arbitrary constants. A and B matrices can be transposed independently.
Martin Kroeker [Thu, 28 Sep 2017 10:17:09 +0000 (12:17 +0200)]
Save and restore VSX registers
Tim Moon [Thu, 28 Sep 2017 02:26:38 +0000 (19:26 -0700)]
Merge https://github.com/timmoon10/OpenBLAS into develop
Tim Moon [Thu, 28 Sep 2017 02:25:33 +0000 (19:25 -0700)]
Reducing threads for multi-threaded GEMMs on small matrices.
Martin Kroeker [Tue, 26 Sep 2017 08:34:18 +0000 (10:34 +0200)]
Merge pull request #1314 from martin-frbg/nofortran-fix-2
Rewrite NOFORTRAN conditionals
Martin Kroeker [Mon, 25 Sep 2017 21:45:14 +0000 (23:45 +0200)]
Rewrite NOFORTRAN conditionals
... so that they do not trigger accidentally when NOFORTRAN is empty/unset
Martin Kroeker [Fri, 22 Sep 2017 07:34:54 +0000 (09:34 +0200)]
Merge pull request #1310 from sva-img/develop
Added mips I6500 core
Shivraj Patil [Fri, 22 Sep 2017 06:27:43 +0000 (11:57 +0530)]
Added mips I6500 core
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Martin Kroeker [Tue, 19 Sep 2017 12:04:37 +0000 (14:04 +0200)]
Merge pull request #1308 from sebastien-villemot/develop
Add support for TARGET=ZARCH_GENERIC and TARGET=Z13
Sébastien Villemot [Tue, 19 Sep 2017 10:16:42 +0000 (12:16 +0200)]
Add support for TARGET=ZARCH_GENERIC and TARGET=Z13
Martin Kroeker [Mon, 18 Sep 2017 08:16:40 +0000 (10:16 +0200)]
Merge pull request #1304 from martin-frbg/aix-build-fixes
(Plain make) build system fixes for AIX
Martin Kroeker [Sun, 17 Sep 2017 23:29:21 +0000 (01:29 +0200)]
(Plain make) build system fixes for AIX
- retry fortran compiler test with aix-specific option if generic -m32/-m64 fails
- pass any custom ARFLAGS to lapack
- no addition of -m32/-m64 to the CFLAGS and FFLAGS on AIX
Martin Kroeker [Thu, 14 Sep 2017 19:46:26 +0000 (21:46 +0200)]
Merge pull request #1303 from martin-frbg/imatcopy-rowscols
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
Martin Kroeker [Thu, 14 Sep 2017 17:59:05 +0000 (19:59 +0200)]
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
Equivalent of #1244 (issue #899) for the non-complex cases. Fixes #1289
Martin Kroeker [Thu, 14 Sep 2017 09:54:20 +0000 (11:54 +0200)]
Merge pull request #1302 from martin-frbg/nofortran-fix
Remove default FEXTRALIBS in NOFORTRAN case
Martin Kroeker [Thu, 14 Sep 2017 07:21:04 +0000 (09:21 +0200)]
Remove default FEXTRALIBS in NOFORTRAN case
Martin Kroeker [Sat, 9 Sep 2017 21:47:17 +0000 (23:47 +0200)]
Merge pull request #1288 from quickwritereader/develop
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision). Issue 884
Martin Kroeker [Sat, 9 Sep 2017 21:46:27 +0000 (23:46 +0200)]
Merge pull request #1293 from embray/cygwin/install
More canonical installation on Cygwin
Martin Kroeker [Sat, 9 Sep 2017 21:41:53 +0000 (23:41 +0200)]
Merge pull request #1299 from martin-frbg/race_fixes
Fix thread data races uncovered by gcc thread sanitizer
Martin Kroeker [Sat, 9 Sep 2017 18:30:33 +0000 (20:30 +0200)]
Convert another caller of "allocation" to LOCK_COMMAND
... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock
Martin Kroeker [Sat, 9 Sep 2017 17:07:06 +0000 (19:07 +0200)]
Fix thread data races
Martin Kroeker [Sat, 9 Sep 2017 16:58:38 +0000 (18:58 +0200)]
Fix thread data race in memory.c
Erik M. Bray [Thu, 7 Sep 2017 12:18:56 +0000 (14:18 +0200)]
More canonical installation on Cygwin:
* The DLL is named cygopenblas.dll, not libopenblas.dll
* The import lib (still called libopenblas.dll.a) is installed
Abdurrauf [Sat, 8 Apr 2017 17:51:15 +0000 (21:51 +0400)]
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision)
Martin Kroeker [Sun, 3 Sep 2017 11:02:10 +0000 (13:02 +0200)]
Merge pull request #1290 from martin-frbg/imatcopy
Use in-place transform shortcut only if matrix is square
Martin Kroeker [Sun, 3 Sep 2017 07:52:55 +0000 (09:52 +0200)]
Use in-place transform shortcut only if matrix is square
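Why the shortcut is only valid for square matrices can be illustrated with a small sketch. The function name `transpose_inplace_if_square` and the row-major layout are assumptions for this example, not the OpenBLAS code. Swapping element (i, j) with (j, i) works only when both positions exist in the same buffer layout, which requires rows == cols:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT the OpenBLAS implementation.
 * Transposes a row-major matrix in place by swapping across the diagonal.
 * Returns 0 on success, or -1 if the matrix is not square, in which case
 * the caller has to fall back to an out-of-place copy. */
static int transpose_inplace_if_square(double *a, size_t rows, size_t cols,
                                       size_t ld)
{
    assert(a != NULL && ld >= cols);

    if (rows != cols)
        return -1;  /* (j, i) would not map to a valid swap partner */

    for (size_t i = 0; i < rows; i++) {
        for (size_t j = i + 1; j < cols; j++) {
            double tmp = a[i * ld + j];
            a[i * ld + j] = a[j * ld + i];
            a[j * ld + i] = tmp;
        }
    }
    return 0;
}
```

A caller would check the return value and use a temporary buffer for the rectangular case.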
Martin Kroeker [Sun, 27 Aug 2017 11:23:57 +0000 (13:23 +0200)]
Merge pull request #1286 from martin-frbg/baytrail
Fix coretype detection for Bay Trail Atom
Martin Kroeker [Sun, 27 Aug 2017 11:06:54 +0000 (13:06 +0200)]
Fix coretype detection for Bay Trail Atom
My earlier PR #982 appears to have been incomplete in this regard - fixes #1285
Sacha [Wed, 23 Aug 2017 02:47:38 +0000 (12:47 +1000)]
Clean up config file writing.
Sacha [Wed, 23 Aug 2017 01:16:24 +0000 (11:16 +1000)]
Fix open_blas.config which was never working out-of-source. Remove need for gen_config_h.exe. If OpenMP is requested, do not silently ignore when it isn't available.
Sacha Refshauge [Tue, 22 Aug 2017 21:19:02 +0000 (07:19 +1000)]
Do not require Perl for MSVC if CMake >= 3.4
Sacha Refshauge [Sun, 20 Aug 2017 14:37:29 +0000 (00:37 +1000)]
Clean up, fix old typos. Simplify arch usages. Move system arch check to earlier position.
Sacha Refshauge [Sun, 20 Aug 2017 12:50:31 +0000 (22:50 +1000)]
Improvements to previous commit (cross-compile).
Fix typos and bad if statements discovered in 0.2.20.
Sacha Refshauge [Sun, 20 Aug 2017 10:08:53 +0000 (20:08 +1000)]
Add support for cross compiling.
Add support for not having host compiler as CMake cannot detect such a compiler.
Add support for not using getarch.
Successfully builds Android ARMV8. Any target can be added by supplying the TARGET_CORE config in prebuild.cmake.
Martin Kroeker [Sat, 19 Aug 2017 18:37:19 +0000 (20:37 +0200)]
Merge pull request #1281 from sharkcz/armv8
fix detection of generic ARMv8 CPUs
Sacha Refshauge [Sat, 19 Aug 2017 14:59:14 +0000 (00:59 +1000)]
Add kernel/Makefile.LA to CMake
Sacha Refshauge [Sat, 19 Aug 2017 14:59:00 +0000 (00:59 +1000)]
Add a CMake GCC and Clang target to Travis CI
Sacha Refshauge [Sat, 19 Aug 2017 14:13:46 +0000 (00:13 +1000)]
Remove _static usages for tests
Sacha Refshauge [Sat, 19 Aug 2017 14:13:24 +0000 (00:13 +1000)]
Only run utest without NOFORTRAN, same as Makefile. Linux now compiles.
Sacha Refshauge [Sat, 19 Aug 2017 05:07:42 +0000 (15:07 +1000)]
Fix threading usage in CMake: s/SMP/USE_THREAD/
Dan Horák [Fri, 18 Aug 2017 12:53:29 +0000 (14:53 +0200)]
fix detection of generic ARMv8 CPUs
Sacha Refshauge [Thu, 17 Aug 2017 07:27:01 +0000 (17:27 +1000)]
Fix typos and use CMake OpenMP support.
<srefshauge@imagus.com.au> [Wed, 16 Aug 2017 17:32:04 +0000 (03:32 +1000)]
Fix bug that required fortran. Fix bug that needed CXX var. Remove redundant set vars. Fix threading detection. Do not attempt to run code if cross compiling.
<srefshauge@imagus.com.au> [Wed, 16 Aug 2017 16:04:36 +0000 (02:04 +1000)]
Drop some redundant vars and improve arch detection in CMake.
<srefshauge@imagus.com.au> [Wed, 16 Aug 2017 14:51:04 +0000 (00:51 +1000)]
Allow CMake to determine if it is building static or shared.
<srefshauge@imagus.com.au> [Wed, 16 Aug 2017 14:35:54 +0000 (00:35 +1000)]
Let CMake deal with build type.
Martin Kroeker [Thu, 10 Aug 2017 21:42:23 +0000 (23:42 +0200)]
Merge pull request #1277 from cconrads-scicomp/fix-installation-instructions
Make: fix installation instructions
Martin Kroeker [Thu, 10 Aug 2017 19:35:32 +0000 (21:35 +0200)]
Merge pull request #1276 from cconrads-scicomp/android_-lm_fix
ARM: do not add linker flag `-lm` unconditionally
Martin Kroeker [Thu, 10 Aug 2017 19:32:09 +0000 (21:32 +0200)]
Merge pull request #1275 from cconrads-scicomp/recognize-gfortran-on-arm
ARM: recognize gfortran pre-releases
Christoph Conrads [Thu, 10 Aug 2017 18:22:26 +0000 (14:22 -0400)]
Make: show installation instructions after build
Christoph Conrads [Thu, 10 Aug 2017 16:47:18 +0000 (12:47 -0400)]
Make: fix installation instructions
The installation instructions as shown after successfully compiling
OpenBLAS are wrong because the arguments used during compilation have
to be provided to Make again.
Christoph Conrads [Thu, 10 Aug 2017 15:34:21 +0000 (11:34 -0400)]
ARM: do not add linker flag `-lm` unconditionally
On ARM the required math library depends on whether the soft floating
point ABI is used or not but this is already handled in
`Makefile.system`, lines 499-505.
Christoph Conrads [Thu, 10 Aug 2017 15:48:29 +0000 (11:48 -0400)]
ARM: recognize gfortran pre-releases
Without proper recognition of gfortran versions such as
> GNU Fortran (GCC) 4.9.x 20150123 (prerelease)
OpenBLAS assumes the presence of the G77 compiler. Consequently,
`-lgfortran` is missing from the pkg-config file.
The aforementioned compiler is the gfortran compiler in the Android repo
in a commit tagged as `ndk-r14`, cf. Paul Mustière's gfortran build
instructions for Android at https://github.com/buffer51/android-gfortran
Martin Kroeker [Tue, 8 Aug 2017 21:47:47 +0000 (23:47 +0200)]
Merge pull request #1264 from isuruf/dyn
Support DYNAMIC_ARCH with CMake
Martin Kroeker [Tue, 8 Aug 2017 19:54:35 +0000 (21:54 +0200)]
Merge pull request #1268 from jirutka/travis-2
Travis: Add jobs building with clang and disable job `LINUX64_MUSL USE_OPENMP=1`
Martin Kroeker [Tue, 8 Aug 2017 14:39:13 +0000 (16:39 +0200)]
Change travis back to sudo true
...to see if this has any influence on the recent ld SIGKILLS
Isuru Fernando [Mon, 7 Aug 2017 18:37:25 +0000 (00:07 +0530)]
No strncasecmp with MSVC
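A common way to cope with the missing function is sketched below. This is an assumption about the general technique, not necessarily what this commit does. The names `portable_strncasecmp` and `name_matches` are hypothetical; `_strnicmp` is the case-insensitive comparison provided by the Microsoft C runtime, and `strncasecmp` is the POSIX function from `<strings.h>`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef _MSC_VER
/* MSVC does not provide strncasecmp; its CRT offers _strnicmp instead. */
#define portable_strncasecmp _strnicmp
#else
#include <strings.h>
#define portable_strncasecmp strncasecmp
#endif

/* Hypothetical helper: returns nonzero when the first n characters of a
 * and b are equal, ignoring case. */
static int name_matches(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return portable_strncasecmp(a, b, n) == 0;
}
```

For example, `name_matches("Haswell", "HASWELL", 7)` is true on both toolchains.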
Isuru Fernando [Mon, 7 Aug 2017 17:38:44 +0000 (23:08 +0530)]
Add commonobjs
Isuru Fernando [Sun, 6 Aug 2017 13:47:31 +0000 (19:17 +0530)]
Test DYNAMIC_ARCH on appveyor
Isuru Fernando [Sun, 6 Aug 2017 13:37:00 +0000 (19:07 +0530)]
Merge remote-tracking branch 'upstream/develop' into dyn
Martin Kroeker [Sun, 6 Aug 2017 12:11:44 +0000 (14:11 +0200)]
Merge pull request #1262 from martin-frbg/xmv_thread-splitting
Make sure that range limit of last thread never exceeds data size
Martin Kroeker [Sun, 6 Aug 2017 12:10:18 +0000 (14:10 +0200)]
Merge pull request #1256 from isuruf/develop
Support compiling with clang on windows
Jakub Jirutka [Sun, 6 Aug 2017 09:17:02 +0000 (11:17 +0200)]
Travis: Add jobs building with clang
Jakub Jirutka [Sun, 6 Aug 2017 09:06:03 +0000 (11:06 +0200)]
Travis: Disable job "LINUX64_MUSL USE_OPENMP=1"
https://github.com/xianyi/OpenBLAS/pull/1255#issuecomment-320494610
Isuru Fernando [Fri, 4 Aug 2017 02:34:16 +0000 (08:04 +0530)]
Build all branches so that appveyor works in forks
Isuru Fernando [Fri, 4 Aug 2017 02:27:20 +0000 (07:57 +0530)]
New utest for clang
Isuru Fernando [Fri, 4 Aug 2017 02:27:55 +0000 (07:57 +0530)]
Merge remote-tracking branch 'upstream/develop' into develop
Martin Kroeker [Thu, 3 Aug 2017 13:33:28 +0000 (15:33 +0200)]
Merge pull request #1266 from ashwinyes/develop_thunderx2t99_fix_clang_compilation
THUNDERX2T99: Fix clang compilation
Ashwin Sekhar T K [Wed, 2 Aug 2017 18:28:45 +0000 (11:28 -0700)]
THUNDERX2T99: Fix clang compilation
Isuru Fernando [Wed, 2 Aug 2017 13:09:04 +0000 (18:39 +0530)]
Add missing EXCAVATOR
Martin Kroeker [Wed, 2 Aug 2017 13:31:05 +0000 (15:31 +0200)]
Merge pull request #1259 from isuruf/cmake
CMake Improvements
Isuru Fernando [Wed, 2 Aug 2017 13:00:26 +0000 (18:30 +0530)]
Fix extra whitespace. The CMake parser macro fails on it
TODO: Fix the parser macro to strip trailing whitespaces
Isuru Fernando [Wed, 2 Aug 2017 12:54:54 +0000 (18:24 +0530)]
Add hemm3m and symm3m objects
Isuru Fernando [Wed, 2 Aug 2017 10:44:34 +0000 (16:14 +0530)]
Fixes for dynamic_arch. almost there
Martin Kroeker [Wed, 2 Aug 2017 10:03:54 +0000 (12:03 +0200)]
Update trmv_thread.c
Martin Kroeker [Wed, 2 Aug 2017 09:59:17 +0000 (11:59 +0200)]
Merge pull request #1255 from jirutka/travis
Travis: Rewrite config, build and test also on Alpine Linux (musl libc)
Martin Kroeker [Tue, 1 Aug 2017 22:37:58 +0000 (00:37 +0200)]
Make sure that range_n of last thread never exceeds the actual data size when splitting the workload
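The kind of overrun this guards against can be shown with a small sketch. The helper `split_range` is a hypothetical name and the ceiling-division split is an assumption for illustration, not the code in the threaded drivers. When the per-thread width is rounded up, the running end position can pass the data size unless it is clamped:

```c
#include <assert.h>

/* Hypothetical helper, NOT the OpenBLAS implementation.
 * Fills range[0..nthreads] with split points so that thread i works on
 * the half-open interval [range[i], range[i + 1]). */
static void split_range(long n, int nthreads, long *range)
{
    assert(n >= 0 && nthreads > 0 && range != 0);

    long width = (n + nthreads - 1) / nthreads; /* ceiling division */

    range[0] = 0;
    for (int i = 0; i < nthreads; i++) {
        long end = range[i] + width;
        if (end > n)
            end = n;  /* clamp: never let a range limit exceed the data size */
        range[i + 1] = end;
    }
}
```

With `n = 5` and 4 threads the width is 2. Without the clamp the split points would be 0, 2, 4, 6, 8; with it they are 0, 2, 4, 5, 5, so the last thread simply receives an empty range.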
Jakub Jirutka [Fri, 28 Jul 2017 16:08:44 +0000 (18:08 +0200)]
Travis: Allow job LINUX64_MUSL USE_OPENMP=1 to fail
See: https://github.com/xianyi/OpenBLAS/pull/1255#issuecomment-318692183
Jakub Jirutka [Fri, 28 Jul 2017 12:32:17 +0000 (14:32 +0200)]
Travis: Disable some gcc warnings to avoid exceeding Travis limit
See: https://github.com/xianyi/OpenBLAS/pull/1255#issuecomment-318628666
Jakub Jirutka [Fri, 28 Jul 2017 00:31:27 +0000 (02:31 +0200)]
Travis: Build and test also on Alpine Linux (musl libc)
Alpine jobs need sudo (for chroot), so they run on Travis's VM infrastructure.
That's why they are much slower than the other jobs.
Jakub Jirutka [Fri, 28 Jul 2017 00:01:44 +0000 (02:01 +0200)]
Travis: Simplify configuration using Build Stages and APT addon
Using the APT addon has a nice side effect: you don't need sudo anymore, so
the jobs can run on Travis's container-based infrastructure, which is much faster
than the VM infrastructure (used when sudo is needed).
The builds were still running on Ubuntu Precise builders, but the new default is
Trusty. Thus I've explicitly set `dist: precise` to stay on
Precise, so that this commit does not change the build environment.
Martin Kroeker [Tue, 1 Aug 2017 18:07:32 +0000 (20:07 +0200)]
Merge pull request #1260 from xianyi/revert-1254-xbmv_range
Revert "Fix calculated range limit exceeding actual data size for last thread"
Isuru Fernando [Tue, 1 Aug 2017 17:53:55 +0000 (23:23 +0530)]
configure kernel_core.h