platform/upstream/openblas.git
Werner Saar [Sun, 6 Mar 2016 07:40:51 +0000 (08:40 +0100)]
updated smallscaling.c to build without C99 or C11
increased the threshold value of nep.in to 40

Zhang Xianyi [Sat, 5 Mar 2016 20:25:27 +0000 (15:25 -0500)]
Merge pull request #790 from jeromerobert/bug786

ztrmv_L.c: no longer need a 4kB buffer

Jerome Robert [Sat, 5 Mar 2016 18:07:03 +0000 (19:07 +0100)]
ztrmv_L.c: no longer need a 4kB buffer

Fix #786

Zhang Xianyi [Sat, 5 Mar 2016 14:34:37 +0000 (09:34 -0500)]
Fixed #789 Fix utest/ctest.h on Mingw.

Zhang Xianyi [Sat, 5 Mar 2016 11:03:19 +0000 (06:03 -0500)]
Merge remote-tracking branch 'origin/power8' into develop

Refs #774

Werner Saar [Sat, 5 Mar 2016 09:27:27 +0000 (10:27 +0100)]
Modified assembly label names so that they are hidden.
Added license information.

Zhang Xianyi [Sat, 5 Mar 2016 00:32:03 +0000 (08:32 +0800)]
Refs #786. avoid old assembly c/zgemv kernels.

Werner Saar [Fri, 4 Mar 2016 14:01:15 +0000 (15:01 +0100)]
enabled gemm_beta assembly kernels

Werner Saar [Fri, 4 Mar 2016 12:38:57 +0000 (13:38 +0100)]
modified configuration, to use power6 sgemm kernel for power8

Werner Saar [Fri, 4 Mar 2016 12:20:50 +0000 (13:20 +0100)]
enabled hemv assembly function for power8

Werner Saar [Fri, 4 Mar 2016 12:08:18 +0000 (13:08 +0100)]
enabled symv assembly kernels on power8

Werner Saar [Fri, 4 Mar 2016 11:53:31 +0000 (12:53 +0100)]
enabled gemv assembly on power8

Werner Saar [Fri, 4 Mar 2016 11:35:25 +0000 (12:35 +0100)]
enabled all level1 assembly kernels for power8

Werner Saar [Fri, 4 Mar 2016 09:26:53 +0000 (10:26 +0100)]
BUGFIX: increased BUFFER_SIZE for POWER8

Zhang Xianyi [Thu, 3 Mar 2016 20:24:43 +0000 (04:24 +0800)]
Modify travis script.

Zhang Xianyi [Tue, 1 Mar 2016 12:13:08 +0000 (20:13 +0800)]
Change Opteron(SSE3) to Opteron_SSE3 in the dynamic core name.

Werner Saar [Tue, 1 Mar 2016 06:33:56 +0000 (07:33 +0100)]
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8

Zhang Xianyi [Tue, 1 Mar 2016 06:05:56 +0000 (01:05 -0500)]
Refs #695 add testcase.

Zhang Xianyi [Tue, 1 Mar 2016 03:18:56 +0000 (11:18 +0800)]
Refs #695 #783. Replace default x86_64 cgemv_t
asm kernel by C kernel.

Zhang Xianyi [Sat, 27 Feb 2016 16:24:20 +0000 (11:24 -0500)]
Merge pull request #784 from peterph/develop

collected usage notes

Petr Cerny [Sat, 27 Feb 2016 15:57:22 +0000 (16:57 +0100)]
collected usage notes

Zhang Xianyi [Wed, 24 Feb 2016 19:18:39 +0000 (14:18 -0500)]
Refs #696. Turn off stack limit setting on Linux.

I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux.

Zhang Xianyi [Tue, 23 Feb 2016 22:47:53 +0000 (22:47 +0000)]
Fix c/zaxpyc kernel bug on Cortex-A57.

Zhang Xianyi [Fri, 19 Feb 2016 22:56:07 +0000 (17:56 -0500)]
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.

On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.

Zhang Xianyi [Fri, 19 Feb 2016 16:21:43 +0000 (00:21 +0800)]
[av skip] Fix utest makefile bug on travis ci.

Zhang Xianyi [Thu, 18 Feb 2016 22:01:48 +0000 (17:01 -0500)]
Fix makefile bug for utest.

Zhang Xianyi [Sat, 13 Feb 2016 15:38:52 +0000 (15:38 +0000)]
Fix compiling bug on ARM Cortex-A57.

Zhang Xianyi [Fri, 12 Feb 2016 16:33:53 +0000 (00:33 +0800)]
Update readme.

Zhang Xianyi [Fri, 12 Feb 2016 16:33:31 +0000 (00:33 +0800)]
Run utest when building.

Zhang Xianyi [Fri, 12 Feb 2016 06:50:20 +0000 (01:50 -0500)]
Enable utest for appveyor.

Zhang Xianyi [Thu, 11 Feb 2016 21:38:13 +0000 (05:38 +0800)]
Add utest for CMake.

Zhang Xianyi [Thu, 11 Feb 2016 21:28:16 +0000 (05:28 +0800)]
Added missing lapacke files for CMake.

Zhang Xianyi [Thu, 11 Feb 2016 21:02:51 +0000 (05:02 +0800)]
Add gemm3m building for CMake.

Zhang Xianyi [Thu, 11 Feb 2016 21:01:57 +0000 (05:01 +0800)]
Update ctest.h from github.com:xianyi/ctest.git.

Zhang Xianyi [Wed, 10 Feb 2016 21:14:53 +0000 (05:14 +0800)]
Refs #707. Bugfix for previous commit.

Zhang Xianyi [Wed, 10 Feb 2016 20:22:53 +0000 (04:22 +0800)]
Refs #707. Add BUILD_LAPACK_DEPRECATED flag in Makefile.rule.

To build the LAPACK functions deprecated since LAPACK 3.6.0, run:

make BUILD_LAPACK_DEPRECATED=1

Zhang Xianyi [Wed, 10 Feb 2016 19:51:26 +0000 (03:51 +0800)]
Refs #727. Align stack buffer address on 32-bytes.
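
To illustrate what aligning a buffer address on 32 bytes involves, here is a minimal C sketch. It is not the code from this commit: `align_up` and `STACK_ALIGN` are hypothetical names introduced for illustration, and the sketch assumes a compiler that provides `uintptr_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_ALIGN 32u /* hypothetical name; 32 bytes as in the commit title */

/* Round the address p up to the next multiple of align.
 * align must be a nonzero power of two. */
static void *align_up(void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)align - 1;

    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)((addr + mask) & ~mask);
}
```

A caller would typically over-allocate the raw stack array by `STACK_ALIGN - 1` bytes and pass its address through `align_up`, so that the aligned pointer still has the full requested size behind it.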

Zhang Xianyi [Mon, 8 Feb 2016 18:24:40 +0000 (13:24 -0500)]
Merge pull request #780 from jeromerobert/bug727

Bug727

Jerome Robert [Mon, 8 Feb 2016 11:05:02 +0000 (12:05 +0100)]
Fix zgemv.c compilation when stack allocation is disabled

Jerome Robert [Mon, 18 Jan 2016 17:54:51 +0000 (18:54 +0100)]
update CONTRIBUTORS.md

Jerome Robert [Sun, 3 Jan 2016 13:04:33 +0000 (14:04 +0100)]
Add benchmark/smallscaling.c

* Bench small matrices with multi-threading
* Close #727

Jerome Robert [Sun, 24 Jan 2016 09:14:41 +0000 (10:14 +0100)]
zgemv: Add a workaround for #746

Jerome Robert [Thu, 14 Jan 2016 21:12:57 +0000 (22:12 +0100)]
Improve performance of ztrmv on small matrices

* Use stack allocation
* Disable multi-threading
* Ref #727

Jerome Robert [Sun, 3 Jan 2016 13:01:12 +0000 (14:01 +0100)]
Use stack allocation in zgemv and zger

For better performance with small matrices
Ref #727
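
The general pattern behind using stack allocation for small problems can be sketched as follows. This is an illustrative example, not the zgemv/zger code: the function `scaled_reverse`, the limit `MAX_STACK_ELEMS`, and the `used_heap` out-parameter are all hypothetical. The scratch buffer lives on the stack when the input is small and falls back to `malloc` otherwise.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_STACK_ELEMS 256 /* hypothetical limit for the stack path */

/* Write alpha * x, in reversed order, into y using a scratch buffer.
 * Small inputs use a fixed stack array and avoid the allocator;
 * larger inputs fall back to the heap.
 * Returns 0 on success, -1 if the heap allocation fails. */
static int scaled_reverse(const double *x, double *y, size_t n,
                          double alpha, int *used_heap)
{
    double stack_buf[MAX_STACK_ELEMS];
    double *buf = stack_buf;
    int heap = n > MAX_STACK_ELEMS;
    size_t i;

    assert(x != NULL && y != NULL);
    if (heap) {
        buf = (double *)malloc(n * sizeof *buf);
        if (buf == NULL)
            return -1;
    }
    for (i = 0; i < n; i++)
        buf[i] = alpha * x[i];
    for (i = 0; i < n; i++)
        y[i] = buf[n - 1 - i];
    if (heap)
        free(buf);
    if (used_heap != NULL)
        *used_heap = heap; /* reports which path was taken */
    return 0;
}
```

The fixed upper bound keeps stack usage predictable; the right value of such a limit depends on the platform's stack size and is an assumption here.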

Zhang Xianyi [Fri, 5 Feb 2016 00:39:08 +0000 (08:39 +0800)]
Fixed #778. Merge branch 'buffer51-develop' into develop

buffer51 [Thu, 4 Feb 2016 22:20:07 +0000 (17:20 -0500)]
Restored LAPACK_COMPLEX_STRUCTURE for Android prior to 21. Refs #682.

buffer51 [Thu, 4 Feb 2016 22:05:31 +0000 (17:05 -0500)]
Fixed linking error when compiling ARMv7 for Android (disabled -lpthread and added -Wl,--no-warn-mismatch).

buffer51 [Sun, 8 Nov 2015 00:31:13 +0000 (19:31 -0500)]
Fix lapack complex implementation of lauu2 and potf2 for Android (use FLOAT instead of FLOAT[2] as imaginary part is not used).

Zhang Xianyi [Thu, 4 Feb 2016 20:23:32 +0000 (15:23 -0500)]
Fixed #773 blas_quickdivide bug on CMake and Visual Studio x86 32-bit.

Zhang Xianyi [Tue, 2 Feb 2016 18:56:27 +0000 (02:56 +0800)]
Fixed #711, #698. Merge branch 'byzhang-develop' into develop

Zhang Xianyi [Tue, 2 Feb 2016 18:48:32 +0000 (02:48 +0800)]
Merge branch 'develop' of https://github.com/byzhang/OpenBLAS into byzhang-develop

Zhang Xianyi [Tue, 2 Feb 2016 18:46:12 +0000 (13:46 -0500)]
Merge pull request #743 from tkelman/patch-1

re-enable Fortran optimization flag on windows

Zhang Xianyi [Tue, 2 Feb 2016 18:43:51 +0000 (13:43 -0500)]
Fixed #769. Merge branch 'martin-frbg-develop' into develop

Martin Kroeker [Tue, 2 Feb 2016 08:00:18 +0000 (09:00 +0100)]
Update dynamic.c and cpuid_x86.c for Intel Avoton.

Second part of "support Intel Avoton via Nehalem kernel"

Zhang Xianyi [Tue, 2 Feb 2016 01:15:02 +0000 (09:15 +0800)]
Refs #768. Swap the result of zdot x87 fp kernel.

Martin Kroeker [Sun, 31 Jan 2016 14:33:56 +0000 (15:33 +0100)]
Update cpuid_x86.c

Add recognition of Intel Atom C27xx (Avoton, model code 4D)

Benyu Zhang [Tue, 2 Feb 2016 02:32:42 +0000 (18:32 -0800)]
Fix the source paths

Zhang Xianyi [Tue, 2 Feb 2016 01:15:02 +0000 (09:15 +0800)]
Refs #768. Swap the result of zdot x87 fp kernel.

Tony Kelman [Mon, 18 Jan 2016 16:44:46 +0000 (08:44 -0800)]
re-enable Fortran optimization flag on windows

partial revert of https://github.com/xianyi/OpenBLAS/commit/299cdcdc29999d591fcb300630d50b2986bfb6fc
from #696; it was not explained why that change was needed

Zhang Xianyi [Fri, 29 Jan 2016 04:18:38 +0000 (22:18 -0600)]
Fix utest bug when INTERFACE64=1.

Zhang Xianyi [Fri, 29 Jan 2016 03:35:31 +0000 (11:35 +0800)]
Use ctest.h for unit test. Enable unit test on travis CI.

Zhang Xianyi [Thu, 28 Jan 2016 17:30:26 +0000 (17:30 +0000)]
Detect ARMV8 on 32-bit mode by using ARMV7 kernels.

Zhang Xianyi [Wed, 27 Jan 2016 20:38:07 +0000 (04:38 +0800)]
Refs #714. avoid compiling warnings.

Zhang Xianyi [Tue, 26 Jan 2016 20:03:27 +0000 (14:03 -0600)]
Merge pull request #764 from martin-frbg/develop

Update Makefile.system to fix awk/nawk issue #763

Martin Kroeker [Tue, 26 Jan 2016 19:35:25 +0000 (20:35 +0100)]
Update Makefile.system

Define AWK as "nawk" for SunOS (actually Illumos) only - fixes #763

Zhang Xianyi [Tue, 26 Jan 2016 15:14:57 +0000 (09:14 -0600)]
Refs #723. Avoid out of boundary for getf2.

Zhang Xianyi [Tue, 26 Jan 2016 14:45:16 +0000 (08:45 -0600)]
Merge pull request #762 from jeromerobert/bug760

Let openblas_get_num_threads return the number of active threads

Zhang Xianyi [Tue, 26 Jan 2016 14:43:32 +0000 (08:43 -0600)]
Merge pull request #759 from jeromerobert/bug742

Bug742

Zhang Xianyi [Tue, 26 Jan 2016 14:42:20 +0000 (08:42 -0600)]
Merge pull request #749 from lotheac/illumos_fixes

illumos fixes

Jerome Robert [Tue, 26 Jan 2016 12:04:16 +0000 (13:04 +0100)]
Let openblas_get_num_threads return the number of active threads

... not the number of allocated threads.

Close #760

wernsaar [Tue, 26 Jan 2016 08:19:14 +0000 (09:19 +0100)]
Merge pull request #761 from wernsaar/develop

Ref #740: all assembly code now clears the floating point registers correctly

Werner Saar [Mon, 25 Jan 2016 14:00:13 +0000 (15:00 +0100)]
updated gemv_n_vfpv3.S for armv7

Werner Saar [Mon, 25 Jan 2016 10:55:25 +0000 (11:55 +0100)]
updated nrm2 kernel for armv7

Werner Saar [Mon, 25 Jan 2016 10:08:56 +0000 (11:08 +0100)]
updated trmm kernels for armv7

Werner Saar [Mon, 25 Jan 2016 09:46:10 +0000 (10:46 +0100)]
updated gemm kernels for armv7

Lauri Tirkkonen [Fri, 22 Jan 2016 16:46:27 +0000 (18:46 +0200)]
don't pass -Y at all to the linker on illumos

the illumos linker can't understand the "-Y/lib"... form that f_check
generates, and -Wl cannot pass options that include commas

Werner Saar [Sun, 24 Jan 2016 16:12:07 +0000 (17:12 +0100)]
updated KERNEL.ARMV6

Werner Saar [Sun, 24 Jan 2016 15:31:19 +0000 (16:31 +0100)]
updated gemv kernel for armv6

Werner Saar [Sun, 24 Jan 2016 13:42:38 +0000 (14:42 +0100)]
updated cgemv and zgemv kernels for armv6

Werner Saar [Sun, 24 Jan 2016 12:03:33 +0000 (13:03 +0100)]
updated trmm_kernels for armv6

Werner Saar [Sun, 24 Jan 2016 10:55:50 +0000 (11:55 +0100)]
updated gemm_kernels for armv6

Jerome Robert [Sun, 24 Jan 2016 09:30:50 +0000 (10:30 +0100)]
Use GEMM_MULTITHREAD_THRESHOLD as a number of ops

...not a matrix size. For GEMM_MULTITHREAD_THRESHOLD=4
(the default value) this does not change anything, but
for other values it makes the GEMM and GEMV thresholds
change in the same way.

Close #742
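
The idea of comparing an operation count, rather than a matrix dimension, against the threshold can be sketched in C as below. This is a hedged illustration, not the actual OpenBLAS logic: the function names, the macro `MULTITHREAD_THRESHOLD`, and the scale factor `OPS_PER_THRESHOLD_UNIT` are assumptions made for the example; only the default value 4 comes from the commit message.

```c
#include <assert.h>

#define MULTITHREAD_THRESHOLD 4        /* default value quoted in the commit message */
#define OPS_PER_THRESHOLD_UNIT 65536.0 /* hypothetical scale factor, for illustration */

/* Decide on threading from the amount of work, not from one dimension. */
static int worth_threading(double ops)
{
    return ops > OPS_PER_THRESHOLD_UNIT * MULTITHREAD_THRESHOLD;
}

/* GEMM-like product of an m x k and a k x n matrix: work grows as m*n*k. */
static int gemm_use_threads(long m, long n, long k)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    return worth_threading((double)m * (double)n * (double)k);
}

/* GEMV-like matrix-vector product: work grows as m*n. */
static int gemv_use_threads(long m, long n)
{
    assert(m >= 0 && n >= 0);
    return worth_threading((double)m * (double)n);
}
```

Because both predicates share one scaled threshold, changing `MULTITHREAD_THRESHOLD` moves the two cut-off points together, which is the behaviour the message describes.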

Werner Saar [Sun, 24 Jan 2016 09:56:49 +0000 (10:56 +0100)]
updated cdot and zdot on arm

Jerome Robert [Fri, 15 Jan 2016 17:40:13 +0000 (18:40 +0100)]
[z]ger: increase multithread threshold

The thresholds given in 3ae30cd were far too low because I
mixed up m and m*n in my measurements. Note that the new ones
are close to the [z]gemv ones, which suggests
that both are right.

Werner Saar [Sat, 23 Jan 2016 16:47:58 +0000 (17:47 +0100)]
Ref #740: updated nrm2_vfp.S

Werner Saar [Sat, 23 Jan 2016 13:44:34 +0000 (14:44 +0100)]
Ref #740: updated asum_vfp.S and iamax_vfp.S

Werner Saar [Sat, 23 Jan 2016 10:59:51 +0000 (11:59 +0100)]
Ref #750 and Ref #740 : bugfix for sdot, dsdot and ddot on arm

Lauri Tirkkonen [Fri, 22 Jan 2016 16:50:05 +0000 (18:50 +0200)]
actually install the shared lib on illumos

Lauri Tirkkonen [Fri, 22 Jan 2016 16:50:29 +0000 (18:50 +0200)]
actually build the shared lib on illumos

Lauri Tirkkonen [Fri, 22 Jan 2016 16:50:53 +0000 (18:50 +0200)]
use $(AWK) in Makefile.install and switch it to nawk

Lauri Tirkkonen [Fri, 22 Jan 2016 16:49:17 +0000 (18:49 +0200)]
RLIMIT_NPROC doesn't exist on illumos

Lauri Tirkkonen [Fri, 22 Jan 2016 16:48:50 +0000 (18:48 +0200)]
make parallel make work on illumos

Lauri Tirkkonen [Fri, 22 Jan 2016 16:48:25 +0000 (18:48 +0200)]
illumos fixes to memory.c

wernsaar [Thu, 21 Jan 2016 13:21:59 +0000 (14:21 +0100)]
Merge pull request #747 from wernsaar/develop

Ref #730: added performance updates for syrk and syr2k

Werner Saar [Thu, 21 Jan 2016 12:16:44 +0000 (13:16 +0100)]
added updates for syrk and syr2k

Zhang Xianyi [Wed, 20 Jan 2016 17:24:22 +0000 (11:24 -0600)]
Merge pull request #745 from jakirkham/minor_fix_scipy_prof

BENCH: Minor fixes in SciPy benchmarks

Zhang Xianyi [Wed, 20 Jan 2016 17:18:21 +0000 (11:18 -0600)]
Merge pull request #744 from jeromerobert/bug731

Bug731

John Kirkham [Tue, 19 Jan 2016 20:32:28 +0000 (15:32 -0500)]
benchmark/scripts/SCIPY/dsyrk.py: Overwrite will work on a Fortran array of the correct type.

John Kirkham [Tue, 19 Jan 2016 20:31:37 +0000 (15:31 -0500)]
benchmark/scripts/SCIPY/ssyrk.py: Overwrite will work on a Fortran array of the correct type.

John Kirkham [Tue, 19 Jan 2016 20:29:43 +0000 (15:29 -0500)]
benchmark/scripts/SCIPY/dsyrk.py: Arrays should be Fortran order.