Isuru Fernando [Sat, 29 Jul 2017 15:30:32 +0000 (21:00 +0530)]
Ninja complains that file openblas.def does not exist
Isuru Fernando [Sat, 29 Jul 2017 15:29:17 +0000 (20:59 +0530)]
clang on windows needs FU=''
Isuru Fernando [Sat, 29 Jul 2017 15:08:16 +0000 (20:38 +0530)]
typedefs only for c
Isuru Fernando [Fri, 28 Jul 2017 06:20:29 +0000 (11:50 +0530)]
Fix complex support for MSVC headers
Isuru Fernando [Fri, 28 Jul 2017 06:19:39 +0000 (11:49 +0530)]
check compiler is msvc instead of msvc
Martin Kroeker [Tue, 25 Jul 2017 21:31:57 +0000 (23:31 +0200)]
Merge pull request #1249 from martin-frbg/cgroup
Honor cgroup/cpuset limits when enumerating cpus
Martin Kroeker [Tue, 25 Jul 2017 20:47:34 +0000 (22:47 +0200)]
Honor cgroup/cpuset limits when enumerating cpus
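As an illustration only (this is a sketch, not the code from this commit): Linux expresses a cpuset as a list such as "0-3,8", and honoring the limit means never assuming more CPUs than that list names. The helper names count_cpus_in_list and effective_cpu_count are hypothetical, and the list string is assumed to have been read from wherever the system exposes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count the CPUs named in a cpuset list such as "0-3,8".
 * Returns -1 if the text is malformed. */
int count_cpus_in_list(const char *list)
{
    assert(list != NULL);
    int count = 0;
    const char *p = list;

    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p || first < 0)
            return -1;                 /* no digits, or a negative id */
        long last = first;
        p = end;
        if (*p == '-') {               /* a range "first-last" */
            p++;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
            p = end;
        }
        count += (int)(last - first + 1);
        if (*p == ',')
            p++;                       /* move on to the next entry */
        else if (*p != '\0' && *p != '\n')
            return -1;                 /* unexpected character */
    }
    return count;
}

/* Use the smaller of the online CPU count and the cpuset size.
 * Falls back to the online count if the list cannot be parsed. */
int effective_cpu_count(int online, const char *list)
{
    int listed = count_cpus_in_list(list);
    if (listed < 1 || listed > online)
        return online;
    return listed;
}
```

With 16 CPUs online and a cpuset of "0-3", effective_cpu_count returns 4, so a thread pool sized from it stays within the container's limit.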
Martin Kroeker [Mon, 24 Jul 2017 14:17:50 +0000 (16:17 +0200)]
Revert "Honor cgroup/cpuset limits when enumerating cpus" (#1246)
Zhang Xianyi [Mon, 24 Jul 2017 04:07:00 +0000 (12:07 +0800)]
Merge pull request #1236 from martin-frbg/l1cache
Use cpuid 4 with subleafs to query L1 cache size on Intel processors
Zhang Xianyi [Mon, 24 Jul 2017 04:06:29 +0000 (12:06 +0800)]
Bump develop version for 0.3.0.
Zhang Xianyi [Mon, 24 Jul 2017 04:03:35 +0000 (12:03 +0800)]
Merge branch 'develop'
0.2.20 version
Zhang Xianyi [Mon, 24 Jul 2017 03:55:10 +0000 (11:55 +0800)]
Update doc for 0.2.20 version.
Zhang Xianyi [Mon, 24 Jul 2017 03:46:52 +0000 (11:46 +0800)]
Merge pull request #1239 from martin-frbg/cgroups
Honor cgroup/cpuset limits when enumerating cpus
Zhang Xianyi [Mon, 24 Jul 2017 03:45:27 +0000 (11:45 +0800)]
Merge pull request #1244 from martin-frbg/micmuc_cimatcopy
Fix complex imatcopy for Trans cases with non-square matrix
Martin Kroeker [Fri, 21 Jul 2017 09:20:15 +0000 (11:20 +0200)]
Use in-place transform shortcut only if matrix is square
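A simplified sketch of why the shortcut needs a square matrix (this is not the OpenBLAS imatcopy code, and transpose_rowmajor is a hypothetical name): swapping element (i,j) with (j,i) inside the same buffer is only valid when rows equal cols, because otherwise the transposed matrix has a different stride. The non-square case below goes through a temporary buffer instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Transpose a contiguous row-major rows x cols matrix in the buffer a.
 * Afterwards the buffer holds a cols x rows row-major matrix.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int transpose_rowmajor(double *a, size_t rows, size_t cols)
{
    assert(a != NULL);

    if (rows == cols) {
        /* Square: swap across the diagonal, no extra memory needed. */
        for (size_t i = 0; i < rows; i++) {
            for (size_t j = i + 1; j < cols; j++) {
                double t = a[i * cols + j];
                a[i * cols + j] = a[j * cols + i];
                a[j * cols + i] = t;
            }
        }
        return 0;
    }

    /* Non-square: the pairwise swap would index with the wrong stride,
     * so copy through a temporary buffer. */
    size_t n = rows * cols;
    if (n == 0)
        return 0;
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            tmp[j * rows + i] = a[i * cols + j];
    memcpy(a, tmp, n * sizeof *tmp);
    free(tmp);
    return 0;
}
```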
Martin Kroeker [Thu, 20 Jul 2017 18:51:06 +0000 (20:51 +0200)]
Add files via upload
Martin Kroeker [Sat, 15 Jul 2017 20:02:53 +0000 (22:02 +0200)]
Exchange rows and cols in final omatcopy with BlasTrans
This is MicMuc's patch from #899
Martin Kroeker [Sat, 15 Jul 2017 10:48:42 +0000 (12:48 +0200)]
More fixes for silly misedits
Martin Kroeker [Sat, 15 Jul 2017 09:53:28 +0000 (11:53 +0200)]
Fixup braces lost in previous edit
Martin Kroeker [Sat, 15 Jul 2017 08:40:42 +0000 (10:40 +0200)]
Merge branch 'develop' into cgroups
Martin Kroeker [Thu, 13 Jul 2017 20:01:47 +0000 (22:01 +0200)]
Disable ReLAPACK by default (#1238)
* Disable ReLAPACK by default; mention it in final build message if included
* Add files via upload
* Add files via upload
* Add files via upload
Zhang Xianyi [Thu, 13 Jul 2017 12:31:08 +0000 (20:31 +0800)]
Merge pull request #1214 from martin-frbg/relapack
Initial import of ReLAPACK
Zhang Xianyi [Thu, 13 Jul 2017 12:27:37 +0000 (20:27 +0800)]
Merge pull request #1234 from brada4/develop
Fix write past fixed size buffer
Martin Kroeker [Wed, 12 Jul 2017 19:56:23 +0000 (21:56 +0200)]
Add dummy implementation of cpuid_count for the CPUIDEMU case
Martin Kroeker [Wed, 12 Jul 2017 18:43:09 +0000 (20:43 +0200)]
Use cpuid 4 with subleafs to query L1 cache size on Intel processors
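For readers unfamiliar with cpuid leaf 4, a minimal sketch of how its registers encode a cache size, following the bit layout in Intel's CPUID documentation. The register values are passed in as plain integers so the sketch compiles on any platform; the function names are hypothetical and this is not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the cache described by one subleaf of cpuid leaf 4.
 * EBX[31:22] = ways - 1, EBX[21:12] = partitions - 1,
 * EBX[11:0]  = line size - 1, ECX = sets - 1. */
uint32_t cache_size_from_leaf4(uint32_t ebx, uint32_t ecx)
{
    uint32_t ways       = ((ebx >> 22) & 0x3ffu) + 1;
    uint32_t partitions = ((ebx >> 12) & 0x3ffu) + 1;
    uint32_t line_size  = (ebx & 0xfffu) + 1;
    uint32_t sets       = ecx + 1;
    return ways * partitions * line_size * sets;
}

/* EAX[4:0] is the cache type (1 = data) and EAX[7:5] is the level.
 * A type of 0 means there are no more subleafs to enumerate. */
int is_l1_data_cache(uint32_t eax)
{
    uint32_t type  = eax & 0x1fu;
    uint32_t level = (eax >> 5) & 0x7u;
    return type == 1 && level == 1;
}

/* Worked example: 8 ways, 1 partition, 64-byte lines, 64 sets. */
void leaf4_example(void)
{
    uint32_t ebx = (7u << 22) | 63u;
    uint32_t ecx = 63u;
    assert(cache_size_from_leaf4(ebx, ecx) == 32768u);  /* 32 KiB */
}
```

On x86 with GCC or Clang, the registers could be obtained with the __cpuid_count macro from <cpuid.h>, looping over subleaf 0, 1, 2, ... until the cache type is 0.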
Martin Kroeker [Wed, 12 Jul 2017 07:37:55 +0000 (09:37 +0200)]
Merge pull request #1235 from xianyi/revert-1233-cpuid-fix
Revert "Fix unintentional fall-through cases in get_cacheinfo"
Martin Kroeker [Wed, 12 Jul 2017 07:35:11 +0000 (09:35 +0200)]
Revert "Fix unintentional fall-through cases in get_cacheinfo"
Andrew [Tue, 11 Jul 2017 22:59:30 +0000 (00:59 +0200)]
Fix write past fixed size buffer
Martin Kroeker [Tue, 11 Jul 2017 16:48:13 +0000 (18:48 +0200)]
Add files via upload
Martin Kroeker [Tue, 11 Jul 2017 16:42:39 +0000 (18:42 +0200)]
Add files via upload
Martin Kroeker [Tue, 11 Jul 2017 16:27:33 +0000 (18:27 +0200)]
Honor cgroup/cpuset constraints when enumerating cpus
Martin Kroeker [Tue, 11 Jul 2017 15:10:55 +0000 (17:10 +0200)]
Merge pull request #1233 from martin-frbg/cpuid-fix
Fix unintentional fall-through cases in get_cacheinfo
Martin Kroeker [Tue, 11 Jul 2017 13:39:15 +0000 (15:39 +0200)]
Fix unintentional fall-through cases in get_cacheinfo
These appear to be unintended side effects of PR #1091, probably causing #1232
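A generic sketch of the kind of bug described, with made-up descriptor codes and sizes (these are not the real cpuid tables or the get_cacheinfo code): in a C switch, a case without a break continues into the next case, so a value assigned for one descriptor is silently overwritten.

```c
#include <assert.h>

/* Map a (hypothetical) descriptor code to an L1 size in KB. */
int l1_size_kb(int descriptor)
{
    int size = 0;

    switch (descriptor) {
    case 0x01:
        size = 8;
        break;
    case 0x02:
        size = 16;
        break;  /* if this break were missing, control would fall into
                   case 0x03 and the function would return 32 */
    case 0x03:
        size = 32;
        break;
    default:
        size = 0;  /* unknown descriptor */
        break;
    }

    assert(size >= 0);
    return size;
}
```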
Zhang Xianyi [Mon, 10 Jul 2017 12:04:42 +0000 (20:04 +0800)]
Merge pull request #1226 from ashwinyes/develop_arm_clang_ual_fix
arm: Fix clang compilation for ARMv7
Zhang Xianyi [Mon, 10 Jul 2017 12:03:57 +0000 (20:03 +0800)]
Merge pull request #1221 from ashwinyes/develop_arm_softfp
arm: add support for softfp in arm vfp assembly files
Zhang Xianyi [Mon, 10 Jul 2017 12:02:36 +0000 (20:02 +0800)]
Merge branch 'develop' into develop_arm_softfp
Martin Kroeker [Sun, 9 Jul 2017 11:16:16 +0000 (13:16 +0200)]
Merge pull request #1230 from martin-frbg/rhel5
Add sched_getcpu implementation for pre-2.6 glibc
Martin Kroeker [Sun, 9 Jul 2017 11:15:24 +0000 (13:15 +0200)]
Do not add -lpthread on Android builds (#1229)
* Do not add -lpthread on Android builds
* Do not add -lpthread on Android cmake builds
Martin Kroeker [Sun, 9 Jul 2017 07:45:38 +0000 (09:45 +0200)]
Add sched_getcpu implementation for pre-2.6 glibc
Fixes #1210, compilation on RHEL5 with affinity enabled
Zhang Xianyi [Fri, 7 Jul 2017 07:43:33 +0000 (15:43 +0800)]
Merge pull request #1225 from martin-frbg/stolen_from_wernsaar_fork
fixed syrk_thread.c taken from wernsaar
Ashwin Sekhar T K [Fri, 7 Jul 2017 07:00:42 +0000 (12:30 +0530)]
arm: Fix clang compilation for ARMv7
clang does not recognize some pre-UAL VFP mnemonics such as fnmacs, fnmacd,
fnmuls and fnmuld. Replaced them with the equivalent UAL mnemonics,
vmls.f32, vmls.f64, vnmul.f32 and vnmul.f64 respectively.
Martin Kroeker [Thu, 6 Jul 2017 15:30:12 +0000 (17:30 +0200)]
fixed syrk_thread.c taken from wernsaar
Stride calculation fix copied from https://github.com/wernsaar/OpenBLAS/commit/88900e1
Martin Kroeker [Thu, 6 Jul 2017 08:12:00 +0000 (10:12 +0200)]
Handle different object extensions in Makefile
The optimized LAPACK functions from interface use OS-dependent suffixes .o/.obj for the object files, while netlib LAPACK uses .o throughout. ReLAPACK object names have to match in order for function replacement in the growing library file to work.
Zhang Xianyi [Wed, 5 Jul 2017 09:01:03 +0000 (17:01 +0800)]
Link -lm or -lm_hard for Android ARMv7.
Zhang Xianyi [Mon, 3 Jul 2017 05:48:29 +0000 (13:48 +0800)]
Merge pull request #1218 from m-brow/power9
Optimise loads on Power9 LE
Zhang Xianyi [Mon, 3 Jul 2017 05:43:48 +0000 (13:43 +0800)]
Merge pull request #1212 from neilsh-msft/develop
Add Microsoft Windows 10 UWP build support
Martin Kroeker [Sat, 1 Jul 2017 23:46:23 +0000 (01:46 +0200)]
Update Makefile
Martin Kroeker [Sat, 1 Jul 2017 22:50:14 +0000 (00:50 +0200)]
Add files via upload
Ashwin Sekhar T K [Sat, 1 Jul 2017 21:36:36 +0000 (03:06 +0530)]
arm: Remove unnecessary files/code
Since softfp code has been added to all required vfp kernels,
the code for auto-detection of the ABI is no longer required.
The option to force the softfp ABI on the make command line by giving
ARM_SOFTFP_ABI=1 is retained, but there is no longer any need to give
this option.
Also, the newly added C versions of the 4x4/4x2 gemm/trmm kernels are removed.
These are no longer required. Moreover, these kernels have bugs.
Ashwin Sekhar T K [Sat, 1 Jul 2017 21:24:32 +0000 (02:54 +0530)]
arm: add softfp support in zgemm/ztrmm vfp kernels
Ashwin Sekhar T K [Sat, 1 Jul 2017 21:12:32 +0000 (02:42 +0530)]
arm: add softfp support in cgemm/ctrmm vfp kernels
Ashwin Sekhar T K [Sat, 1 Jul 2017 20:54:38 +0000 (02:24 +0530)]
arm: add softfp support in dgemm/dtrmm vfp kernels
Ashwin Sekhar T K [Sat, 1 Jul 2017 20:35:48 +0000 (02:05 +0530)]
arm: add softfp support in sgemm/strmm vfp kernels
Ashwin Sekhar T K [Sat, 1 Jul 2017 20:30:48 +0000 (02:00 +0530)]
generic: Bug fixes in generic 4x2 and 4x4 gemm kernels
Ashwin Sekhar T K [Sat, 1 Jul 2017 19:08:44 +0000 (00:38 +0530)]
arm: add softfp support in vfp gemv kernels
Martin Kroeker [Sat, 1 Jul 2017 18:43:23 +0000 (20:43 +0200)]
Merge pull request #1220 from ashwinyes/develop_aarch64_20170701_t99_options
arm64: Change mtune/mcpu options for THUNDERX2T99 target
Ashwin Sekhar T K [Sat, 1 Jul 2017 18:16:12 +0000 (11:16 -0700)]
arm64: Change mtune/mcpu options for THUNDERX2T99 target
Ashwin Sekhar T K [Sat, 1 Jul 2017 15:07:40 +0000 (20:37 +0530)]
arm: add softfp support in kernel/arm/swap_vfp.S
Ashwin Sekhar T K [Sat, 1 Jul 2017 14:27:28 +0000 (19:57 +0530)]
arm: add softfp support in kernel/arm/nrm2_vfp*.S
Ashwin Sekhar T K [Fri, 30 Jun 2017 18:16:02 +0000 (23:46 +0530)]
arm: add softfp support in kernel/arm/*dot_vfp.S
Ashwin Sekhar T K [Fri, 30 Jun 2017 16:22:32 +0000 (21:52 +0530)]
arm: add softfp support in kernel/arm/rot_vfp.S
Ashwin Sekhar T K [Fri, 30 Jun 2017 14:36:29 +0000 (20:06 +0530)]
arm: add softfp support in kernel/arm/axpy_vfp.S
Ashwin Sekhar T K [Fri, 30 Jun 2017 07:42:05 +0000 (13:12 +0530)]
arm: add softfp support in kernel/arm/asum_vfp.S
Ashwin Sekhar T K [Fri, 30 Jun 2017 07:36:38 +0000 (13:06 +0530)]
arm: Use assembly implementations based on the ARM abi
With the softfp ABI, assembly implementations are used only for those APIs
that have no floating-point argument or return value.
With the hard ABI, all assembly implementations are used.
Ashwin Sekhar T K [Fri, 30 Jun 2017 07:16:18 +0000 (12:46 +0530)]
generic: add some generic gemm and trmm kernels
Added generic 4x4 and 4x2 gemm kernels
Added generic 4x2 trmm kernel
Ashwin Sekhar T K [Fri, 30 Jun 2017 07:13:13 +0000 (12:43 +0530)]
arm: Determine the abi from compiler if not specified on command line
If the ARM ABI is not explicitly specified on the command line, set the
ARM ABI to softfp or hard according to the compiler environment.
This assumes that the compiler sets the defines __ARM_PCS and __ARM_PCS_VFP
accordingly.
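A minimal sketch of the detection this message describes, under the same assumption that the compiler defines __ARM_PCS_VFP for the hard-float calling convention and __ARM_PCS otherwise. The enum and function names are hypothetical; on non-ARM compilers neither macro is defined and the result is "unknown".

```c
#include <assert.h>

enum arm_abi { ARM_ABI_UNKNOWN = 0, ARM_ABI_SOFTFP = 1, ARM_ABI_HARD = 2 };

/* Decide the ABI at compile time from the compiler's predefined macros. */
enum arm_abi detect_arm_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return ARM_ABI_HARD;      /* floating-point arguments go in VFP registers */
#elif defined(__ARM_PCS)
    return ARM_ABI_SOFTFP;    /* base calling convention, integer registers */
#else
    return ARM_ABI_UNKNOWN;   /* not a 32-bit ARM target, or macros absent */
#endif
}

/* Human-readable name, e.g. for a build summary message. */
const char *arm_abi_name(enum arm_abi abi)
{
    static const char *const names[] = { "unknown", "softfp", "hard" };
    assert((unsigned)abi <= (unsigned)ARM_ABI_HARD);
    return names[abi];
}
```

An explicit command-line setting would take precedence over this check; the sketch covers only the fallback.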
Martin Kroeker [Wed, 28 Jun 2017 16:15:21 +0000 (18:15 +0200)]
Add ReLAPACK to Makefiles
Martin Kroeker [Wed, 28 Jun 2017 16:13:14 +0000 (18:13 +0200)]
Restore ReLAPACK test folder
Martin Kroeker [Wed, 28 Jun 2017 15:38:41 +0000 (17:38 +0200)]
Add Elmar Peise's ReLAPACK
Neil Shipp [Fri, 23 Jun 2017 20:07:34 +0000 (13:07 -0700)]
Add Microsoft Windows 10 UWP build support
Zhang Xianyi [Fri, 23 Jun 2017 03:35:25 +0000 (11:35 +0800)]
Merge branch 'arm_soft_fp_abi' into develop
Zhang Xianyi [Fri, 23 Jun 2017 03:33:09 +0000 (11:33 +0800)]
Merge pull request #1211 from neilsh-msft/develop
Add 64bit support for Microsoft Visual Studio
Neil Shipp [Fri, 23 Jun 2017 01:05:19 +0000 (18:05 -0700)]
Reorder dependencies to allow in-place build to succeed the first time.
Neil Shipp [Fri, 23 Jun 2017 00:08:09 +0000 (17:08 -0700)]
Avoid truncating cblas.h when compiling gencblas target
Neil Shipp [Thu, 22 Jun 2017 00:49:57 +0000 (17:49 -0700)]
Revert changes to sed and awk
Neil Shipp [Wed, 21 Jun 2017 18:06:48 +0000 (11:06 -0700)]
Add 64bit support for Microsoft Visual Studio
Matt Brown [Wed, 14 Jun 2017 06:47:56 +0000 (16:47 +1000)]
Optimise sscal for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:45:58 +0000 (16:45 +1000)]
Optimise srot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:43:31 +0000 (16:43 +1000)]
Optimise sdot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:39:27 +0000 (16:39 +1000)]
Optimise sasum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:38:32 +0000 (16:38 +1000)]
Optimise casum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:36:10 +0000 (16:36 +1000)]
Optimise cswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 06:23:20 +0000 (16:23 +1000)]
Optimise sswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 04:58:00 +0000 (14:58 +1000)]
Optimise scopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Matt Brown [Wed, 14 Jun 2017 04:25:10 +0000 (14:25 +1000)]
Optimise ccopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
Martin Kroeker [Thu, 1 Jun 2017 14:36:26 +0000 (16:36 +0200)]
Fix installation of header files with cmake (#1186)
* Fix installation of header files with cmake
Install only the required header files, with openblas_config.h preprocessed like in Makefile.install
Fixes #1184
* Update CMakeLists.txt
Escape remaining semicolons in awk argument list (to get it working on Windows as well)
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Add files via upload
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
see if it is the single quotes that cause the problem on windows
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Use C utility instead of awk for header generation in cmake builds
* Update CMakeLists.txt
* Fix generation and installation of header files
Generate openblas_config.h and f77blas.h with same contents as in plain Makefile builds and install only the public header files
Martin Kroeker [Thu, 1 Jun 2017 14:35:52 +0000 (16:35 +0200)]
Merge pull request #1190 from oviradoi/utest_make_complex
Update test to use openblas_make_complex_float and openblas_make_comp…
Ovidiu Radoi [Tue, 30 May 2017 09:07:43 +0000 (12:07 +0300)]
Update test to use openblas_make_complex_float and openblas_make_complex_double functions
Martin Kroeker [Sun, 28 May 2017 09:07:57 +0000 (11:07 +0200)]
Merge pull request #1189 from pawosm-arm/flang
build: Flang has the same interface as PGI
Paul Osmialowski [Sat, 27 May 2017 05:23:58 +0000 (06:23 +0100)]
build: Flang has the same interface as PGI
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
Martin Kroeker [Fri, 26 May 2017 21:02:47 +0000 (23:02 +0200)]
Merge pull request #1188 from pawosm-arm/flang
build: Flang compiler support
Paul Osmialowski [Thu, 25 May 2017 11:22:17 +0000 (12:22 +0100)]
build: LLVM: Add Flang compiler support and enable OpenMP for Clang
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
Zhang Xianyi [Wed, 24 May 2017 07:54:58 +0000 (15:54 +0800)]
Merge pull request #1187 from mine260309/develop
build: fix libxlmass errors building on Power CPU
Lei YU [Wed, 24 May 2017 06:18:45 +0000 (14:18 +0800)]
build: fix libxlmass errors building on Power CPU
The IBM MASS library has been upgraded to 8.1.5, and 8.1.3 is no longer available.
Update README.md and Makefile.power to use version 8.1.5 of libxlmass.
Martin Kroeker [Wed, 10 May 2017 17:39:09 +0000 (19:39 +0200)]
Merge pull request #1182 from martin-frbg/martin-frbg-patch-1
Build shared library on Android without SONAME versioning
Martin Kroeker [Wed, 10 May 2017 11:08:13 +0000 (13:08 +0200)]
Build shared library on Android without SONAME versioning
Android does not support versioned SONAME entries, ref. #1173
Martin Kroeker [Sat, 6 May 2017 15:20:10 +0000 (17:20 +0200)]
Merge pull request #1178 from jcowgill/mips-fixes
MIPS threading fixes
Martin Kroeker [Sat, 6 May 2017 11:08:46 +0000 (13:08 +0200)]
Merge pull request #1179 from jcowgill/memory-fixes
Fixes to driver/others/memory.c
James Cowgill [Fri, 5 May 2017 09:33:56 +0000 (10:33 +0100)]
memory: Fix buffer overflow when position == NUM_BUFFERS
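A simplified sketch of this class of bug (not the code from driver/others/memory.c; the table, the value of NUM_BUFFERS and the function names are illustrative): a search loop over a fixed table can finish with position equal to NUM_BUFFERS, which is one past the last valid index, so the result must be checked before it is used.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFFERS 4              /* illustrative value only */

struct slot { int used; };
static struct slot table[NUM_BUFFERS];

/* Claim the first free slot. Returns its index, or -1 if all are taken. */
int claim_slot(void)
{
    int position = 0;

    while (position < NUM_BUFFERS && table[position].used)
        position++;

    /* Without this check, a full table leaves position == NUM_BUFFERS
     * and the write below would land past the end of the array. */
    if (position >= NUM_BUFFERS)
        return -1;

    table[position].used = 1;
    return position;
}

/* Give a slot back so it can be claimed again. */
void release_slot(int position)
{
    assert(position >= 0 && position < NUM_BUFFERS);
    table[position].used = 0;
}
```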
James Cowgill [Thu, 4 May 2017 13:35:36 +0000 (14:35 +0100)]
mips: remove incorrect blas_lock implementations
MIPS 32-bit currently has an empty blas_lock implementation, which is
worse than nothing at all. MIPS 64-bit does have a blas_lock
implementation, but it is broken. Remove them and fall back to the generic
version in common.h, which should do the right thing on MIPS.
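To show why an empty lock is worse than none, here is a minimal spinlock sketch using C11 atomic_flag. This is a swapped-in illustration, not the generic blas_lock from common.h, and the demo_* names are hypothetical. A lock function with an empty body would return immediately for every caller, so two threads could enter the protected region at once while the code still looks synchronized.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_flag held; } demo_lock_t;

#define DEMO_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once. Returns 1 if the lock was acquired, 0 if it was already held. */
int demo_trylock(demo_lock_t *lock)
{
    assert(lock != NULL);
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired. */
void demo_lock(demo_lock_t *lock)
{
    while (!demo_trylock(lock)) {
        /* busy-wait; a real implementation might yield or pause here */
    }
}

/* Release the lock so another caller can acquire it. */
void demo_unlock(demo_lock_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```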