Werner Saar [Sat, 5 Mar 2016 09:27:27 +0000 (10:27 +0100)]
Modified assembly label names so that they are hidden.
Added license information.
Werner Saar [Fri, 4 Mar 2016 14:01:15 +0000 (15:01 +0100)]
enabled gemm_beta assembly kernels
Werner Saar [Fri, 4 Mar 2016 12:38:57 +0000 (13:38 +0100)]
modified configuration, to use power6 sgemm kernel for power8
Werner Saar [Fri, 4 Mar 2016 12:20:50 +0000 (13:20 +0100)]
enabled hemv assembly function for power8
Werner Saar [Fri, 4 Mar 2016 12:08:18 +0000 (13:08 +0100)]
enabled symv assembly kernels on power8
Werner Saar [Fri, 4 Mar 2016 11:53:31 +0000 (12:53 +0100)]
enabled gemv assembly on power8
Werner Saar [Fri, 4 Mar 2016 11:35:25 +0000 (12:35 +0100)]
enabled all level1 assembly kernels for power8
Werner Saar [Fri, 4 Mar 2016 09:26:53 +0000 (10:26 +0100)]
BUGFIX: increased BUFFER_SIZE for POWER8
Werner Saar [Tue, 1 Mar 2016 06:33:56 +0000 (07:33 +0100)]
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8
Zhang Xianyi [Tue, 3 Nov 2015 04:25:05 +0000 (12:25 +0800)]
Init POWER8 kernels by POWER6.
Zhang Xianyi [Fri, 5 Feb 2016 00:39:08 +0000 (08:39 +0800)]
Fixed #778. Merge branch 'buffer51-develop' into develop
buffer51 [Thu, 4 Feb 2016 22:20:07 +0000 (17:20 -0500)]
Restored LAPACK_COMPLEX_STRUCTURE for Android prior to 21. Refs #682.
buffer51 [Thu, 4 Feb 2016 22:05:31 +0000 (17:05 -0500)]
Fixed linking error when compiling ARMv7 for Android (disabled -lpthread and added -Wl,--no-warn-mismatch).
buffer51 [Sun, 8 Nov 2015 00:31:13 +0000 (19:31 -0500)]
Fix LAPACK complex implementation of lauu2 and potf2 for Android (use FLOAT instead of FLOAT[2], as the imaginary part is not used).
Zhang Xianyi [Thu, 4 Feb 2016 20:23:32 +0000 (15:23 -0500)]
Fixed #773 blas_quickdivide bug on CMake and Visual Studio x86 32-bit.
Zhang Xianyi [Tue, 2 Feb 2016 18:56:27 +0000 (02:56 +0800)]
Fixed #711, #698. Merge branch 'byzhang-develop' into develop
Zhang Xianyi [Tue, 2 Feb 2016 18:48:32 +0000 (02:48 +0800)]
Merge branch 'develop' of https://github.com/byzhang/OpenBLAS into byzhang-develop
Zhang Xianyi [Tue, 2 Feb 2016 18:46:12 +0000 (13:46 -0500)]
Merge pull request #743 from tkelman/patch-1
Re-enable Fortran optimization flag on Windows
Zhang Xianyi [Tue, 2 Feb 2016 18:43:51 +0000 (13:43 -0500)]
Fixed #769. Merge branch 'martin-frbg-develop' into develop
Martin Kroeker [Tue, 2 Feb 2016 08:00:18 +0000 (09:00 +0100)]
Update dynamic.c and cpuid_x86.c for Intel Avoton.
Second part of "support Intel Avoton via Nehalem kernel"
Zhang Xianyi [Tue, 2 Feb 2016 01:15:02 +0000 (09:15 +0800)]
Refs #768. Swap the result of zdot x87 fp kernel.
Martin Kroeker [Sun, 31 Jan 2016 14:33:56 +0000 (15:33 +0100)]
Update cpuid_x86.c
Add recognition of Intel Atom C27xx (Avoton, model code 4D)
Benyu Zhang [Tue, 2 Feb 2016 02:32:42 +0000 (18:32 -0800)]
Fix the source paths
Tony Kelman [Mon, 18 Jan 2016 16:44:46 +0000 (08:44 -0800)]
Re-enable Fortran optimization flag on Windows
Partial revert of https://github.com/xianyi/OpenBLAS/commit/299cdcdc29999d591fcb300630d50b2986bfb6fc
from #696; it was not explained why that change was needed.
Zhang Xianyi [Fri, 29 Jan 2016 04:18:38 +0000 (22:18 -0600)]
Fix utest bug when INTERFACE64=1.
Zhang Xianyi [Fri, 29 Jan 2016 03:35:31 +0000 (11:35 +0800)]
Use ctest.h for unit test. Enable unit test on travis CI.
Zhang Xianyi [Thu, 28 Jan 2016 17:30:26 +0000 (17:30 +0000)]
Detect ARMV8 in 32-bit mode and use ARMV7 kernels for it.
Zhang Xianyi [Wed, 27 Jan 2016 20:38:07 +0000 (04:38 +0800)]
Refs #714. Avoid compiler warnings.
Zhang Xianyi [Tue, 26 Jan 2016 20:03:27 +0000 (14:03 -0600)]
Merge pull request #764 from martin-frbg/develop
Update Makefile.system to fix awk/nawk issue #763
Martin Kroeker [Tue, 26 Jan 2016 19:35:25 +0000 (20:35 +0100)]
Update Makefile.system
Define AWK as "nawk" for SunOS (actually Illumos) only - fixes #763
Zhang Xianyi [Tue, 26 Jan 2016 15:14:57 +0000 (09:14 -0600)]
Refs #723. Avoid out-of-bounds access in getf2.
Zhang Xianyi [Tue, 26 Jan 2016 14:45:16 +0000 (08:45 -0600)]
Merge pull request #762 from jeromerobert/bug760
Let openblas_get_num_threads return the number of active threads
Zhang Xianyi [Tue, 26 Jan 2016 14:43:32 +0000 (08:43 -0600)]
Merge pull request #759 from jeromerobert/bug742
Bug742
Zhang Xianyi [Tue, 26 Jan 2016 14:42:20 +0000 (08:42 -0600)]
Merge pull request #749 from lotheac/illumos_fixes
illumos fixes
Jerome Robert [Tue, 26 Jan 2016 12:04:16 +0000 (13:04 +0100)]
Let openblas_get_num_threads return the number of active threads
... not the number of allocated threads.
Close #760
wernsaar [Tue, 26 Jan 2016 08:19:14 +0000 (09:19 +0100)]
Merge pull request #761 from wernsaar/develop
Ref #740: all assembly code now clears floating-point registers correctly
Werner Saar [Mon, 25 Jan 2016 14:00:13 +0000 (15:00 +0100)]
updated gemv_n_vfpv3.S for armv7
Werner Saar [Mon, 25 Jan 2016 10:55:25 +0000 (11:55 +0100)]
updated nrm2 kernel for armv7
Werner Saar [Mon, 25 Jan 2016 10:08:56 +0000 (11:08 +0100)]
updated trmm kernels for armv7
Werner Saar [Mon, 25 Jan 2016 09:46:10 +0000 (10:46 +0100)]
updated gemm kernels for armv7
Lauri Tirkkonen [Fri, 22 Jan 2016 16:46:27 +0000 (18:46 +0200)]
don't pass -Y at all to the linker on illumos
the illumos linker can't understand the "-Y/lib"... form that f_check
generates, and -Wl cannot pass options that include commas
Werner Saar [Sun, 24 Jan 2016 16:12:07 +0000 (17:12 +0100)]
updated KERNEL.ARMV6
Werner Saar [Sun, 24 Jan 2016 15:31:19 +0000 (16:31 +0100)]
updated gemv kernel for armv6
Werner Saar [Sun, 24 Jan 2016 13:42:38 +0000 (14:42 +0100)]
updated cgemv and zgemv kernels for armv6
Werner Saar [Sun, 24 Jan 2016 12:03:33 +0000 (13:03 +0100)]
updated trmm_kernels for armv6
Werner Saar [Sun, 24 Jan 2016 10:55:50 +0000 (11:55 +0100)]
updated gemm_kernels for armv6
Jerome Robert [Sun, 24 Jan 2016 09:30:50 +0000 (10:30 +0100)]
Use GEMM_MULTITHREAD_THRESHOLD as a number of ops
...not a matrix size. For GEMM_MULTITHREAD_THRESHOLD=4
(the default value) this does not change anything, but
for other values it makes the GEMM and GEMV thresholds
change in the same way.
Close #742
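A minimal sketch of what "threshold as a number of ops" means, assuming a hypothetical scale factor SMP_THRESHOLD_MIN and hypothetical helpers gemm_use_threads()/gemv_use_threads(); these are illustrations, not the OpenBLAS source. Because both GEMM (m*n*k operations) and GEMV (m*n operations) compare their operation count against the same scaled threshold, changing GEMM_MULTITHREAD_THRESHOLD moves both cut-offs proportionally.

```c
/* Assumption: SMP_THRESHOLD_MIN and the helper names are hypothetical,
   chosen only to illustrate the idea described in the commit message. */
#define SMP_THRESHOLD_MIN 65536.0
#define GEMM_MULTITHREAD_THRESHOLD 4

/* Threshold expressed in multiply-add operations, not a matrix dimension. */
static double smp_op_threshold(void) {
    return SMP_THRESHOLD_MIN * GEMM_MULTITHREAD_THRESHOLD;
}

/* GEMM performs roughly m*n*k multiply-adds. */
static int gemm_use_threads(long m, long n, long k) {
    double ops = (double)m * (double)n * (double)k;
    return ops >= smp_op_threshold();
}

/* GEMV performs roughly m*n multiply-adds, compared against the same limit. */
static int gemv_use_threads(long m, long n) {
    double ops = (double)m * (double)n;
    return ops >= smp_op_threshold();
}
```

With these assumed constants, a 100x100x100 GEMM (10^6 ops) would be threaded while a 100x100 GEMV (10^4 ops) would stay single-threaded.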
Werner Saar [Sun, 24 Jan 2016 09:56:49 +0000 (10:56 +0100)]
updated cdot and zdot on arm
Jerome Robert [Fri, 15 Jan 2016 17:40:13 +0000 (18:40 +0100)]
[z]ger: increase multithread threshold
The ones given in 3ae30cd were far too low because I
mixed up m and m*n in my measurements. Note that the new ones
are close to the [z]gemv ones, which is reassuring evidence
that both are right.
Werner Saar [Sat, 23 Jan 2016 16:47:58 +0000 (17:47 +0100)]
Ref #740: updated nrm2_vfp.S
Werner Saar [Sat, 23 Jan 2016 13:44:34 +0000 (14:44 +0100)]
Ref #740: updated asum_vfp.S and iamax_vfp.S
Werner Saar [Sat, 23 Jan 2016 10:59:51 +0000 (11:59 +0100)]
Ref #750 and Ref #740: bugfix for sdot, dsdot and ddot on arm
Lauri Tirkkonen [Fri, 22 Jan 2016 16:50:05 +0000 (18:50 +0200)]
actually install the shared lib on illumos
Lauri Tirkkonen [Fri, 22 Jan 2016 16:50:29 +0000 (18:50 +0200)]
actually build the shared lib on illumos
Lauri Tirkkonen [Fri, 22 Jan 2016 16:50:53 +0000 (18:50 +0200)]
use $(AWK) in Makefile.install and switch it to nawk
Lauri Tirkkonen [Fri, 22 Jan 2016 16:49:17 +0000 (18:49 +0200)]
RLIMIT_NPROC doesn't exist on illumos
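A minimal sketch of the kind of portability guard this implies, assuming a hypothetical helper max_user_processes(); it is not the actual OpenBLAS change. Testing the RLIMIT_NPROC macro with #ifdef lets the same source compile on platforms such as illumos where the constant is absent.

```c
#include <sys/resource.h>

/* Hypothetical helper: returns the soft per-user process limit, or -1 when
   RLIMIT_NPROC is not defined on this platform (e.g. illumos), the query
   fails, or the limit is unlimited. */
static long max_user_processes(void) {
#ifdef RLIMIT_NPROC
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long)rl.rlim_cur;
#else
    return -1; /* limit not available on this OS */
#endif
}
```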
Lauri Tirkkonen [Fri, 22 Jan 2016 16:48:50 +0000 (18:48 +0200)]
make parallel make work on illumos
Lauri Tirkkonen [Fri, 22 Jan 2016 16:48:25 +0000 (18:48 +0200)]
illumos fixes to memory.c
wernsaar [Thu, 21 Jan 2016 13:21:59 +0000 (14:21 +0100)]
Merge pull request #747 from wernsaar/develop
Ref #730: added performance updates for syrk and syr2k
Werner Saar [Thu, 21 Jan 2016 12:16:44 +0000 (13:16 +0100)]
added updates for syrk and syr2k
Zhang Xianyi [Wed, 20 Jan 2016 17:24:22 +0000 (11:24 -0600)]
Merge pull request #745 from jakirkham/minor_fix_scipy_prof
BENCH: Minor fixes in SciPy benchmarks
Zhang Xianyi [Wed, 20 Jan 2016 17:18:21 +0000 (11:18 -0600)]
Merge pull request #744 from jeromerobert/bug731
Bug731
John Kirkham [Tue, 19 Jan 2016 20:32:28 +0000 (15:32 -0500)]
benchmark/scripts/SCIPY/dsyrk.py: Overwrite will work on a Fortran array of the correct type.
John Kirkham [Tue, 19 Jan 2016 20:31:37 +0000 (15:31 -0500)]
benchmark/scripts/SCIPY/ssyrk.py: Overwrite will work on a Fortran array of the correct type.
John Kirkham [Tue, 19 Jan 2016 20:29:43 +0000 (15:29 -0500)]
benchmark/scripts/SCIPY/dsyrk.py: Arrays should be Fortran order.
John Kirkham [Tue, 19 Jan 2016 20:28:22 +0000 (15:28 -0500)]
benchmark/scripts/SCIPY/ssyrk.py: Arrays should be Fortran order.
John Kirkham [Tue, 19 Jan 2016 20:06:17 +0000 (15:06 -0500)]
benchmark/scripts/SCIPY/ssyrk.py: Fix PEP8 issues.
John Kirkham [Tue, 19 Jan 2016 20:05:18 +0000 (15:05 -0500)]
benchmark/scripts/SCIPY/dsyrk.py: Fix PEP8 issues.
John Kirkham [Tue, 19 Jan 2016 20:00:54 +0000 (15:00 -0500)]
benchmark/scripts/SCIPY/ssyrk.py: Write values into `C`.
John Kirkham [Tue, 19 Jan 2016 20:00:23 +0000 (15:00 -0500)]
benchmark/scripts/SCIPY/dsyrk.py: Write values into `C`.
John Kirkham [Tue, 19 Jan 2016 19:05:14 +0000 (14:05 -0500)]
benchmark/scripts/SCIPY/ssyrk.py: Use the environment python.
John Kirkham [Tue, 19 Jan 2016 19:04:55 +0000 (14:04 -0500)]
benchmark/scripts/SCIPY/dsyrk.py: Use the environment python.
John Kirkham [Tue, 19 Jan 2016 17:34:01 +0000 (12:34 -0500)]
benchmark/scripts/SCIPY/ssyrk.py: Drop unneeded semicolons.
John Kirkham [Tue, 19 Jan 2016 17:33:44 +0000 (12:33 -0500)]
benchmark/scripts/SCIPY/dsyrk.py: Drop unneeded semicolons.
John Kirkham [Tue, 19 Jan 2016 17:32:26 +0000 (12:32 -0500)]
benchmark/scripts/SCIPY/ssyrk.py: Allocate `C` using zeros instead of randomly generating it.
John Kirkham [Tue, 19 Jan 2016 17:32:14 +0000 (12:32 -0500)]
benchmark/scripts/SCIPY/dsyrk.py: Allocate `C` using zeros instead of randomly generating it.
Jerome Robert [Tue, 19 Jan 2016 16:15:31 +0000 (17:15 +0100)]
update CONTRIBUTORS.md
Jerome Robert [Mon, 18 Jan 2016 08:12:37 +0000 (09:12 +0100)]
swap: disable multi-threading for small matrices
Close #731
Jerome Robert [Fri, 15 Jan 2016 16:12:04 +0000 (17:12 +0100)]
Disable multi-threading for small matrices in [z]ger
Ref #731
Werner Saar [Sun, 17 Jan 2016 14:37:12 +0000 (15:37 +0100)]
Ref #740: simple solution to clear floating-point registers on arm
Zhang Xianyi [Thu, 14 Jan 2016 22:42:54 +0000 (06:42 +0800)]
Fixed CMake bug for single core.
Zhang Xianyi [Wed, 13 Jan 2016 02:44:49 +0000 (20:44 -0600)]
[av skip] Change test cmd on Travis.
Zhang Xianyi [Wed, 13 Jan 2016 02:01:49 +0000 (20:01 -0600)]
Refs #738. Fix previous commit bug. Run BLAS and CBLAS test on Travis.
Zhang Xianyi [Tue, 12 Jan 2016 22:52:47 +0000 (22:52 +0000)]
Refs #738. Run test on Travis.
Zhang Xianyi [Tue, 12 Jan 2016 22:25:36 +0000 (22:25 +0000)]
Merge branch 'develop' of github.com:xianyi/OpenBLAS into develop
Zhang Xianyi [Tue, 12 Jan 2016 22:25:08 +0000 (22:25 +0000)]
Merge branch 'jeromerobert-bug736' into develop
Zhang Xianyi [Tue, 12 Jan 2016 22:19:58 +0000 (22:19 +0000)]
#736 Revert #733 patch to fix bus error on ARM.
Zhang Xianyi [Tue, 12 Jan 2016 20:47:34 +0000 (14:47 -0600)]
Merge pull request #739 from sebastien-villemot/develop
Fixes for old outstanding bugs in CBLAS test programs
Sébastien Villemot [Mon, 11 Jan 2016 10:22:17 +0000 (11:22 +0100)]
Fix output descriptors of c_{s,d,c,z}blat3
The NTRA argument can be equal to -1 if one does not want a snapshot file
(and this is the case with sample data {s,d,c,z}in3).
The routines {S,D,C,Z}PRCN3 will try to use their first argument as an output
unit number, so we avoid calling them when NTRA < 0.
Patch originally written by Camm Maguire.
Sébastien Villemot [Mon, 11 Jan 2016 10:15:33 +0000 (11:15 +0100)]
Fix CBLAS double complex level 2 tests
The SNAME variable contains names of C functions like "cblas_dgemv".
Apparently the code was not taking into account the 6-letter "cblas_"
prefix when determining the task to be done.
The issue does not affect c_{s,d,c}blat2.f, which use the correct
offsetting.
Patch originally written by Camm Maguire.
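The offsetting bug can be illustrated in C with a hypothetical helper cblas_routine() (the real test programs are Fortran and index into SNAME directly). The routine name only starts after the 6-character "cblas_" prefix, so any task check must skip those characters first.

```c
#include <string.h>

/* Hypothetical helper: skips the 6-character "cblas_" prefix so that the
   task is determined from e.g. "zgemv" rather than from "cblas". */
static const char *cblas_routine(const char *sname) {
    static const char prefix[] = "cblas_";
    size_t len = sizeof prefix - 1; /* 6 characters, excluding the NUL */
    if (strncmp(sname, prefix, len) == 0)
        return sname + len;
    return sname; /* no prefix: name is already the routine */
}
```

For "cblas_zgemv" this returns "zgemv", whose characters after the precision letter ("ge") identify the level 2 task.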
Jerome Robert [Sun, 10 Jan 2016 18:04:37 +0000 (19:04 +0100)]
stack alloc: Fix stack smashing detection on 32-bit
* Fix commit 87a2ccc
* Close #736
Werner Saar [Sun, 10 Jan 2016 11:19:03 +0000 (12:19 +0100)]
added benchmark tests for ssyrk and dsyrk
Zhang Xianyi [Sat, 9 Jan 2016 04:13:37 +0000 (22:13 -0600)]
Merge pull request #734 from jeromerobert/common_stackalloc
Factorize MAX_STACK_ALLOC code to common_stackalloc.h
Jerome Robert [Sun, 3 Jan 2016 12:59:37 +0000 (13:59 +0100)]
Factorize MAX_STACK_ALLOC code to common_stackalloc.h
Ref #727
Zhang Xianyi [Wed, 6 Jan 2016 01:35:12 +0000 (19:35 -0600)]
Merge pull request #733 from yuyichao/arm-asm
Do not use vsub to clear the register values
Yichao Yu [Tue, 5 Jan 2016 04:36:25 +0000 (23:36 -0500)]
Do not use vsub to clear the register values since it doesn't work with non-normal numbers.
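The reason subtraction cannot clear a register is plain IEEE 754 arithmetic, shown here in C rather than ARM assembly: x - x is 0.0 only for finite x, while for infinity or NaN it yields NaN. The two function names are hypothetical; the second stands in for any approach that writes a constant zero regardless of the previous contents.

```c
#include <math.h>

/* "Clear" by subtracting the register from itself, as vsub did. */
static double clear_by_sub(double x) { return x - x; }

/* Clear by writing a constant zero, independent of the old value. */
static double clear_by_const(double x) { (void)x; return 0.0; }
```

clear_by_sub(1.5) gives 0.0, but clear_by_sub(INFINITY) and clear_by_sub(NAN) both give NaN, which would then propagate into the kernel's accumulators.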
wernsaar [Tue, 5 Jan 2016 14:34:08 +0000 (15:34 +0100)]
Merge pull request #732 from wernsaar/develop
added optimized trsm_kernels
Werner Saar [Tue, 5 Jan 2016 12:05:05 +0000 (13:05 +0100)]
added optimized trsm_kernels
Werner Saar [Tue, 5 Jan 2016 11:36:49 +0000 (12:36 +0100)]
include sched.h if OS is Android