platform/upstream/openblas.git
wernsaar [Tue, 22 Mar 2016 14:24:59 +0000 (15:24 +0100)]
Merge pull request #814 from wernsaar/develop

added optimized daxpy kernel for POWER8

Werner Saar [Tue, 22 Mar 2016 13:50:03 +0000 (14:50 +0100)]
added optimized daxpy kernel for POWER8

wernsaar [Mon, 21 Mar 2016 12:59:44 +0000 (13:59 +0100)]
Merge pull request #812 from wernsaar/develop

added optimized sdot kernel for POWER8

Werner Saar [Mon, 21 Mar 2016 12:18:23 +0000 (13:18 +0100)]
added optimized sdot kernel for POWER8

wernsaar [Mon, 21 Mar 2016 09:48:41 +0000 (10:48 +0100)]
Merge pull request #811 from wernsaar/develop

added optimized zdot kernel for POWER8

Werner Saar [Mon, 21 Mar 2016 09:12:07 +0000 (10:12 +0100)]
added optimized zdot kernel for POWER8

Zhang Xianyi [Mon, 21 Mar 2016 00:52:43 +0000 (20:52 -0400)]
Merge branch 'release-0.2.17' into develop

Tag: v0.2.17
Zhang Xianyi [Mon, 21 Mar 2016 00:52:15 +0000 (20:52 -0400)]
Fix change log typo.

Zhang Xianyi [Mon, 21 Mar 2016 00:48:21 +0000 (20:48 -0400)]
Merge branch 'master' into develop
Bump to 0.2.18.dev

Conflicts:
CMakeLists.txt
Makefile.rule

Zhang Xianyi [Mon, 21 Mar 2016 00:44:01 +0000 (20:44 -0400)]
Merge branch 'release-0.2.17'

Zhang Xianyi [Mon, 21 Mar 2016 00:43:42 +0000 (20:43 -0400)]
Update doc for 0.2.17.

Zhang Xianyi [Sun, 20 Mar 2016 13:24:28 +0000 (09:24 -0400)]
Merge branch 'release-0.2.17' into develop

Zhang Xianyi [Sun, 20 Mar 2016 13:22:56 +0000 (09:22 -0400)]
Refs #807. Enable BUILD_LAPACK_DEPRECATED=1 by default.

Zhang Xianyi [Sun, 20 Mar 2016 13:07:47 +0000 (09:07 -0400)]
Merge pull request #808 from theoractice/develop

Fix a minor compiler error in VisualStudio with CMake

wernsaar [Sun, 20 Mar 2016 12:16:41 +0000 (13:16 +0100)]
Merge pull request #809 from wernsaar/develop

Ref #795: added optimized ddot kernel for POWER8

theoractice [Sun, 20 Mar 2016 10:58:18 +0000 (18:58 +0800)]
Fix a minor compiler error in VisualStudio with CMake

Werner Saar [Sun, 20 Mar 2016 10:19:27 +0000 (11:19 +0100)]
ddot for POWER8: updated licence information

Werner Saar [Sun, 20 Mar 2016 10:06:06 +0000 (11:06 +0100)]
added optimized ddot kernel for POWER8

wernsaar [Fri, 18 Mar 2016 11:46:16 +0000 (12:46 +0100)]
Merge pull request #806 from wernsaar/develop

adding optimized single precision blas level3 kernels for POWER8

Werner Saar [Fri, 18 Mar 2016 11:12:03 +0000 (12:12 +0100)]
fixed sgemm- and strmm-kernel

Werner Saar [Fri, 18 Mar 2016 07:17:25 +0000 (08:17 +0100)]
add optimized cgemm- and ctrmm-kernel for POWER8

Zhang Xianyi [Tue, 15 Mar 2016 18:52:01 +0000 (14:52 -0400)]
Bump develop version to 0.2.17.dev.

Tag: v0.2.16
Zhang Xianyi [Tue, 15 Mar 2016 18:49:10 +0000 (14:49 -0400)]
Merge branch 'release-0.2.16'

Zhang Xianyi [Tue, 15 Mar 2016 18:48:41 +0000 (14:48 -0400)]
Update 0.2.16 doc

Zhang Xianyi [Tue, 15 Mar 2016 17:56:01 +0000 (13:56 -0400)]
Merge branch 'develop' into release-0.2.16

Zhang Xianyi [Tue, 15 Mar 2016 00:31:03 +0000 (20:31 -0400)]
Merge pull request #802 from ashwinyes/develop_20160314_dgemm_optimization

DGEMM Optimizations for Cortex-A57

Zhang Xianyi [Mon, 14 Mar 2016 19:42:31 +0000 (15:42 -0400)]
Merge pull request #801 from Keno/patch-3

Don't pass REALNAME to `.end`

Ashwin Sekhar T K [Mon, 14 Mar 2016 14:29:41 +0000 (19:59 +0530)]
Update CONTRIBUTORS.md

Ashwin Sekhar T K [Mon, 14 Mar 2016 14:05:23 +0000 (19:35 +0530)]
Optimize Dgemm 4x4 for Cortex A57

Ashwin Sekhar T K [Mon, 14 Mar 2016 14:03:21 +0000 (19:33 +0530)]
Functional Assembly Kernels for CortexA57

Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8

Werner Saar [Mon, 14 Mar 2016 13:36:59 +0000 (14:36 +0100)]
BUGFIX: KERNEL.POWER8

Werner Saar [Mon, 14 Mar 2016 12:52:44 +0000 (13:52 +0100)]
added sgemm- and strmm-kernel for POWER8

Keno Fischer [Sun, 13 Mar 2016 22:56:21 +0000 (18:56 -0400)]
Don't pass REALNAME to `.end`

Putting the procedure name there is an MSVC-ism, where it is optional. GCC silently ignores it and Clang raises an error, so it is best to remove it.

Zhang Xianyi [Thu, 10 Mar 2016 20:45:33 +0000 (15:45 -0500)]
Merge pull request #800 from jeromerobert/smallscaling

Fix smallscaling compilation

Jerome Robert [Thu, 10 Mar 2016 19:24:41 +0000 (20:24 +0100)]
Fix smallscaling compilation

Also revert 0bbca5e

Werner Saar [Thu, 10 Mar 2016 10:10:38 +0000 (11:10 +0100)]
FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c

wernsaar [Thu, 10 Mar 2016 09:22:08 +0000 (10:22 +0100)]
Merge pull request #799 from wernsaar/develop

Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver…

Werner Saar [Thu, 10 Mar 2016 08:42:07 +0000 (09:42 +0100)]
Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller

Zhang Xianyi [Wed, 9 Mar 2016 19:52:47 +0000 (14:52 -0500)]
Add missing openblas_env makefile.

Zhang Xianyi [Wed, 9 Mar 2016 17:50:07 +0000 (12:50 -0500)]
Refs #716. Only call getenv at init function.
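
The idea of reading the environment only during initialization can be sketched as follows. This is an illustrative sketch, not the code from this commit: the variable name EXAMPLE_SKETCH_VERBOSE and the functions parse_setting, read_env_once and get_verbose are all hypothetical.

```c
#include <stdlib.h>

/* Hypothetical cached setting; the name is illustrative only. */
static int cached_verbose = 0;

/* Convert an environment string to an integer setting; unset means 0. */
int parse_setting(const char *value) {
    return (value != NULL) ? atoi(value) : 0;
}

/* Call once from the library's init routine: the only place getenv is used. */
void read_env_once(void) {
    /* EXAMPLE_SKETCH_VERBOSE is a made-up variable name for this sketch. */
    cached_verbose = parse_setting(getenv("EXAMPLE_SKETCH_VERBOSE"));
}

/* Hot-path accessor: returns the cached value and never touches the environment. */
int get_verbose(void) {
    return cached_verbose;
}
```

Caching the value this way keeps getenv out of frequently called routines, at the cost of ignoring changes made to the environment after initialization.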

wernsaar [Wed, 9 Mar 2016 14:55:56 +0000 (15:55 +0100)]
Merge pull request #798 from wernsaar/develop

Optimized zgemv_n kernel for bulldozer, piledriver and steamroller

Werner Saar [Wed, 9 Mar 2016 14:48:29 +0000 (15:48 +0100)]
modified common.h for piledriver

Werner Saar [Wed, 9 Mar 2016 13:02:03 +0000 (14:02 +0100)]
Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller

wernsaar [Mon, 7 Mar 2016 15:44:17 +0000 (16:44 +0100)]
Merge pull request #797 from wernsaar/develop

bugfixes for lapack and lapacke

Werner Saar [Mon, 7 Mar 2016 09:34:04 +0000 (10:34 +0100)]
BUGFIX: removed fixes for bugs #148 and #149, because info for xerbla is wrong

Werner Saar [Mon, 7 Mar 2016 09:10:00 +0000 (10:10 +0100)]
bugfixes from lapack svn for bugs #142 - #155

Werner Saar [Mon, 7 Mar 2016 08:39:34 +0000 (09:39 +0100)]
Bugfix for ztrmv

Zhang Xianyi [Mon, 7 Mar 2016 03:34:58 +0000 (11:34 +0800)]
Refs #786. Revert to default assembly kernel.

Werner Saar [Sun, 6 Mar 2016 10:54:41 +0000 (11:54 +0100)]
removed build of smallscaling, because build on arm, arm64 and power fails

Werner Saar [Sun, 6 Mar 2016 08:07:24 +0000 (09:07 +0100)]
modified KERNEL for power, to use the generic DSDOT-KERNEL

Werner Saar [Sun, 6 Mar 2016 07:40:51 +0000 (08:40 +0100)]
updated smallscaling.c to build without C99 or C11
increased the threshold value of nep.in to 40

Zhang Xianyi [Sat, 5 Mar 2016 20:25:27 +0000 (15:25 -0500)]
Merge pull request #790 from jeromerobert/bug786

ztrmv_L.c: no longer need a 4kB buffer

Jerome Robert [Sat, 5 Mar 2016 18:07:03 +0000 (19:07 +0100)]
ztrmv_L.c: no longer need a 4kB buffer

Fix #786

Zhang Xianyi [Sat, 5 Mar 2016 14:34:37 +0000 (09:34 -0500)]
Fixed #789 Fix utest/ctest.h on Mingw.

Zhang Xianyi [Sat, 5 Mar 2016 11:03:19 +0000 (06:03 -0500)]
Merge remote-tracking branch 'origin/power8' into develop

Refs #774

Werner Saar [Sat, 5 Mar 2016 09:27:27 +0000 (10:27 +0100)]
Modified assembly label names, so that they are hidden.
Added license information.

Zhang Xianyi [Sat, 5 Mar 2016 00:32:03 +0000 (08:32 +0800)]
Refs #786. avoid old assembly c/zgemv kernels.

Werner Saar [Fri, 4 Mar 2016 14:01:15 +0000 (15:01 +0100)]
enabled gemm_beta assembly kernels

Werner Saar [Fri, 4 Mar 2016 12:38:57 +0000 (13:38 +0100)]
modified configuration, to use power6 sgemm kernel for power8

Werner Saar [Fri, 4 Mar 2016 12:20:50 +0000 (13:20 +0100)]
enabled hemv assembly function for power8

Werner Saar [Fri, 4 Mar 2016 12:08:18 +0000 (13:08 +0100)]
enabled symv assembly kernels on power8

Werner Saar [Fri, 4 Mar 2016 11:53:31 +0000 (12:53 +0100)]
enabled gemv assembly on power8

Werner Saar [Fri, 4 Mar 2016 11:35:25 +0000 (12:35 +0100)]
enabled all level1 assembly kernels for power8

Werner Saar [Fri, 4 Mar 2016 09:26:53 +0000 (10:26 +0100)]
BUGFIX: increased BUFFER_SIZE for POWER8

Zhang Xianyi [Thu, 3 Mar 2016 20:24:43 +0000 (04:24 +0800)]
Modify travis script.

Zhang Xianyi [Tue, 1 Mar 2016 12:13:08 +0000 (20:13 +0800)]
Change Opteron(SSE3) to Opteron_SSE3 in the dynamic core name.

Werner Saar [Tue, 1 Mar 2016 06:33:56 +0000 (07:33 +0100)]
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8

Zhang Xianyi [Tue, 1 Mar 2016 06:05:56 +0000 (01:05 -0500)]
Refs #695 add testcase.

Zhang Xianyi [Tue, 1 Mar 2016 03:18:56 +0000 (11:18 +0800)]
Refs #695 #783. Replace default x86_64 cgemv_t
asm kernel by C kernel.

Zhang Xianyi [Sat, 27 Feb 2016 16:24:20 +0000 (11:24 -0500)]
Merge pull request #784 from peterph/develop

collected usage notes

Petr Cerny [Sat, 27 Feb 2016 15:57:22 +0000 (16:57 +0100)]
collected usage notes

Zhang Xianyi [Wed, 24 Feb 2016 20:21:22 +0000 (15:21 -0500)]
Update Changelog for 0.2.16.rc1.

Zhang Xianyi [Wed, 24 Feb 2016 19:18:39 +0000 (14:18 -0500)]
Refs #696. Turn off stack limit setting on Linux.

I cannot reproduce the lapack-test SEGFAULT with the default stack size
on ARM Linux.

Zhang Xianyi [Wed, 24 Feb 2016 19:18:39 +0000 (14:18 -0500)]
Refs #696. Turn off stack limit setting on Linux.

I cannot reproduce the lapack-test SEGFAULT with the default stack size
on ARM Linux.

Tag: v0.2.16.rc1
Zhang Xianyi [Tue, 23 Feb 2016 23:29:21 +0000 (18:29 -0500)]
Release 0.2.16 rc1

Zhang Xianyi [Tue, 23 Feb 2016 22:47:53 +0000 (22:47 +0000)]
Fix c/zaxpyc kernel bug on Cortex-A57.

Zhang Xianyi [Fri, 19 Feb 2016 22:56:07 +0000 (17:56 -0500)]
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.

On Mac OS X, the directive should be .align 4 (equivalent to .align 16 on Linux).
I saw no performance benefit from .align, so I removed it.
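
The equivalence in the note above follows from the two operand conventions: on Mac OS X the .align operand is a power-of-two exponent, while on x86 Linux it is a byte count, so 2^4 = 16. A small sketch of the arithmetic, with hypothetical helper names that are not part of OpenBLAS:

```c
#include <assert.h>
#include <limits.h>

/* Byte alignment implied by an exponent-style operand, as in ".align 4" on Mac OS X. */
unsigned long align_exponent_to_bytes(unsigned exponent) {
    /* Guard against shifting by the full width of the type. */
    assert(exponent < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << exponent;
}

/* Exponent for a byte-style operand; returns -1 if bytes is not a power of two. */
int align_bytes_to_exponent(unsigned long bytes) {
    int exponent = 0;
    if (bytes == 0 || (bytes & (bytes - 1)) != 0) {
        return -1;
    }
    while ((1UL << exponent) != bytes) {
        exponent++;
    }
    return exponent;
}
```

With these helpers, align_exponent_to_bytes(4) gives 16 and align_bytes_to_exponent(16) gives 4, matching the two spellings mentioned in the commit message.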

Zhang Xianyi [Fri, 19 Feb 2016 16:21:43 +0000 (00:21 +0800)]
[av skip] Fix utest makefile bug on travis ci.

Zhang Xianyi [Thu, 18 Feb 2016 22:01:48 +0000 (17:01 -0500)]
Fix makefile bug for utest.

Zhang Xianyi [Sat, 13 Feb 2016 15:38:52 +0000 (15:38 +0000)]
Fix compiling bug on ARM Cortex-A57.

Zhang Xianyi [Fri, 12 Feb 2016 16:33:53 +0000 (00:33 +0800)]
Update readme.

Zhang Xianyi [Fri, 12 Feb 2016 16:33:31 +0000 (00:33 +0800)]
Run utest when building.

Zhang Xianyi [Fri, 12 Feb 2016 06:50:20 +0000 (01:50 -0500)]
Enable utest for appveyor.

Zhang Xianyi [Thu, 11 Feb 2016 21:38:13 +0000 (05:38 +0800)]
Add utest for CMake.

Zhang Xianyi [Thu, 11 Feb 2016 21:28:16 +0000 (05:28 +0800)]
Added missing lapacke files for CMake.

Zhang Xianyi [Thu, 11 Feb 2016 21:02:51 +0000 (05:02 +0800)]
Add gemm3m building for CMake.

Zhang Xianyi [Thu, 11 Feb 2016 21:01:57 +0000 (05:01 +0800)]
Update ctest.h from github.com:xianyi/ctest.git.

Zhang Xianyi [Wed, 10 Feb 2016 21:14:53 +0000 (05:14 +0800)]
Refs #707. Bugfix for previous commit.

Zhang Xianyi [Wed, 10 Feb 2016 20:22:53 +0000 (04:22 +0800)]
Refs #707. Add BUILD_LAPACK_DEPRECATED flag in Makefile.rule.

To build the LAPACK functions that have been deprecated since LAPACK 3.6.0, run:

make BUILD_LAPACK_DEPRECATED=1

Zhang Xianyi [Wed, 10 Feb 2016 19:51:26 +0000 (03:51 +0800)]
Refs #727. Align stack buffer address to 32 bytes.
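
One common way to get a 32-byte aligned working buffer on the stack is to over-allocate and round the start address up. The sketch below shows that general technique under stated assumptions; it is not the code from this commit, and STACK_ALIGN, MAX_ELEMS, align_up and aligned_stack_sum are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_ALIGN 32  /* bytes, matching the 32 in the commit title */
#define MAX_ELEMS   64  /* hypothetical small-problem limit */

/* Round an address up to the next multiple of STACK_ALIGN (a power of two). */
uintptr_t align_up(uintptr_t addr) {
    return (addr + (STACK_ALIGN - 1)) & ~(uintptr_t)(STACK_ALIGN - 1);
}

/* Copy n <= MAX_ELEMS doubles into an aligned stack buffer and sum them.
   Returns 1 if the buffer was used and is 32-byte aligned, 0 otherwise. */
int aligned_stack_sum(const double *x, size_t n, double *result) {
    /* Extra elements guarantee that an aligned start fits inside raw. */
    double raw[MAX_ELEMS + STACK_ALIGN / sizeof(double)];
    double *buf = (double *)align_up((uintptr_t)raw);
    double sum = 0.0;
    size_t i;

    if (n > MAX_ELEMS) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        buf[i] = x[i];
        sum += buf[i];
    }
    *result = sum;
    return ((uintptr_t)buf % STACK_ALIGN) == 0;
}
```

With a C11 compiler, declaring the array with _Alignas(32) would be a simpler alternative to the manual rounding shown here.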

Zhang Xianyi [Mon, 8 Feb 2016 18:24:40 +0000 (13:24 -0500)]
Merge pull request #780 from jeromerobert/bug727

Bug727

Jerome Robert [Mon, 8 Feb 2016 11:05:02 +0000 (12:05 +0100)]
Fix zgemv.c compilation when stack allocation is disabled

Jerome Robert [Mon, 18 Jan 2016 17:54:51 +0000 (18:54 +0100)]
update CONTRIBUTORS.md

Jerome Robert [Sun, 3 Jan 2016 13:04:33 +0000 (14:04 +0100)]
Add benchmark/smallscaling.c

* Bench small matrices with multi-threading
* Close #727

Jerome Robert [Sun, 24 Jan 2016 09:14:41 +0000 (10:14 +0100)]
zgemv: Add a workaround for #746

Jerome Robert [Thu, 14 Jan 2016 21:12:57 +0000 (22:12 +0100)]
Improve performance of ztrmv on small matrices

* Use stack allocation
* Disable multi-threading
* Ref #727

Jerome Robert [Sun, 3 Jan 2016 13:01:12 +0000 (14:01 +0100)]
Use stack allocation in zgemv and zger

For better performance with small matrices
Ref #727
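
The general pattern behind using stack allocation for small problems is to keep a fixed-size scratch buffer on the stack and fall back to the heap only above a size threshold. A minimal sketch, assuming a made-up threshold STACK_LIMIT and a hypothetical routine scale_with_scratch (neither is taken from zgemv or zger):

```c
#include <stdlib.h>
#include <string.h>

#define STACK_LIMIT 256  /* hypothetical threshold, in elements */

/* Compute y = alpha * x through a scratch buffer: stack when small, heap otherwise.
   Returns 0 on success, -1 if the heap allocation fails. */
int scale_with_scratch(const double *x, double *y, size_t n, double alpha) {
    double stack_buf[STACK_LIMIT];
    double *buf = stack_buf;
    size_t i;

    if (n > STACK_LIMIT) {
        /* Large problem: pay for a heap allocation. */
        buf = malloc(n * sizeof *buf);
        if (buf == NULL) {
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        buf[i] = alpha * x[i];
    }
    memcpy(y, buf, n * sizeof *buf);
    if (buf != stack_buf) {
        free(buf);
    }
    return 0;
}
```

For small n the routine never calls malloc or free, which is where the gain for small matrices would come from; the threshold has to stay small enough to be safe for the thread's stack size.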

Zhang Xianyi [Fri, 5 Feb 2016 00:39:08 +0000 (08:39 +0800)]
Fixed #778. Merge branch 'buffer51-develop' into develop

buffer51 [Thu, 4 Feb 2016 22:20:07 +0000 (17:20 -0500)]
Restored LAPACK_COMPLEX_STRUCTURE for Android prior to 21. Refs #682.

buffer51 [Thu, 4 Feb 2016 22:05:31 +0000 (17:05 -0500)]
Fixed linking error when compiling ARMv7 for Android (disabled -lpthread and added -Wl,--no-warn-mismatch).