wernsaar [Mon, 30 Sep 2013 16:03:56 +0000 (18:03 +0200)]
changed some values for arm
wernsaar [Mon, 30 Sep 2013 15:31:23 +0000 (17:31 +0200)]
updated dgemm_kernel_8x2_vfpv3.S
wernsaar [Sun, 29 Sep 2013 17:42:33 +0000 (19:42 +0200)]
add modified c_check perl program
wernsaar [Sun, 29 Sep 2013 16:55:21 +0000 (18:55 +0200)]
added Makefile.arm
wernsaar [Sun, 29 Sep 2013 15:46:23 +0000 (17:46 +0200)]
changed dgemm_kernel to use fused multiply add
wernsaar [Sat, 28 Sep 2013 17:13:47 +0000 (19:13 +0200)]
modified Makefile.L3 for ARM
wernsaar [Sat, 28 Sep 2013 17:10:32 +0000 (19:10 +0200)]
common files modified for ARM
wernsaar [Sat, 28 Sep 2013 17:02:25 +0000 (19:02 +0200)]
initial checkin of kernel/arm
Zhang Xianyi [Thu, 5 Sep 2013 07:39:45 +0000 (15:39 +0800)]
Added backers.
Lars Buitinck [Wed, 28 Aug 2013 15:20:16 +0000 (17:20 +0200)]
Merge pull request #290 from larsmans/missing-threshold
check if GEMM_MULTITHREAD_THRESHOLD defined in gemm.c
Set a fallback value.
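A minimal sketch of the fallback pattern this commit describes, not the actual contents of gemm.c. The fallback value 4, the scale factor 65536.0 and the helper name gemm_runs_single_threaded are assumptions made for illustration only.

```c
#include <assert.h>

/* Use a fallback only when the build system did not define the macro.
   The value 4 is an assumption for this sketch. */
#ifndef GEMM_MULTITHREAD_THRESHOLD
#define GEMM_MULTITHREAD_THRESHOLD 4
#endif

/* Hypothetical helper: returns 1 when the m*n*k work estimate is small
   enough that the call should stay on one thread, 0 otherwise. */
int gemm_runs_single_threaded(long m, long n, long k)
{
    const double scale = 65536.0; /* assumed scale factor */
    double work;

    assert(m >= 0 && n >= 0 && k >= 0);
    work = (double)m * (double)n * (double)k;
    return work <= scale * (double)GEMM_MULTITHREAD_THRESHOLD;
}
```

Because the macro is guarded with #ifndef, a value passed on the compiler command line still takes precedence over the fallback.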
Zhang Xianyi [Wed, 28 Aug 2013 16:26:16 +0000 (09:26 -0700)]
Merge pull request #291 from larsmans/fix-makefile-prefix
fix default prefix handling in makefiles
Zhang Xianyi [Wed, 28 Aug 2013 16:25:23 +0000 (09:25 -0700)]
Merge pull request #289 from larsmans/no-noconst
get rid of the generated cblas_noconst.h file
Lars Buitinck [Wed, 28 Aug 2013 15:39:54 +0000 (17:39 +0200)]
fix default prefix handling in makefiles
PREFIX wasn't communicated to Makefile.install (where it matters)
by Makefile. As a result, the default PREFIX was empty and
OpenBLAS was being installed in /lib.
Lars Buitinck [Wed, 28 Aug 2013 14:52:24 +0000 (16:52 +0200)]
get rid of the generated cblas_noconst.h file
Zhang Xianyi [Wed, 28 Aug 2013 13:26:37 +0000 (06:26 -0700)]
Merge pull request #288 from sebastien-villemot/develop
Avoid failure on qemu guests declaring an Athlon CPU without 3dnow!
Sébastien Villemot [Wed, 28 Aug 2013 12:27:59 +0000 (14:27 +0200)]
Avoid failure on qemu guests declaring an Athlon CPU without 3dnow!
The present patch verifies that, on machines declaring an Athlon CPU model and
family, the 3dnow and 3dnowext feature flags are indeed present. If they are
not, it falls back to the most generic x86 kernel. This prevents crashes due to
illegal instruction on qemu guests with a weird configuration.
Closes #272
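The check described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the names kernel_choice and choose_kernel are hypothetical, and the caller is assumed to pass in the EDX value returned by CPUID leaf 0x80000001, where bit 31 reports 3DNow! and bit 30 reports the 3DNow! extensions.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits in EDX of CPUID leaf 0x80000001 (AMD extended features). */
#define EDX_3DNOWEXT (UINT32_C(1) << 30)
#define EDX_3DNOW    (UINT32_C(1) << 31)

/* Hypothetical result type for this sketch. */
typedef enum { KERNEL_GENERIC_X86, KERNEL_ATHLON } kernel_choice;

/* Pick the Athlon kernel only when the CPU claims to be an Athlon AND
   both feature flags are really present. Everything else gets the most
   generic kernel (a simplification: other CPU models are not handled). */
kernel_choice choose_kernel(int claims_athlon, uint32_t ext_edx)
{
    const uint32_t required = EDX_3DNOW | EDX_3DNOWEXT;

    if (claims_athlon && (ext_edx & required) == required)
        return KERNEL_ATHLON;
    return KERNEL_GENERIC_X86;
}
```

Keeping the decision separate from the CPUID instruction makes it possible to test the logic on any host, including the qemu configuration that declares an Athlon without the flags.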
Zhang Xianyi [Sat, 24 Aug 2013 14:46:18 +0000 (11:46 -0300)]
Merge branch 'bulldozer' into develop
Zhang Xianyi [Sat, 24 Aug 2013 05:09:49 +0000 (13:09 +0800)]
Refs #281. Detect __CYGWIN__ macro for Cygwin x86_64.
Signed-off-by: Zhang Xianyi <traits.zhang@gmail.com>
Zhang Xianyi [Fri, 23 Aug 2013 17:10:02 +0000 (01:10 +0800)]
Refs #281. Detect _WIN32 macro for Windows API.
http://www.mail-archive.com/bug-gnulib@gnu.org/msg05722.html
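A rough sketch of the macro checks the two commits above refer to. The function names detected_platform and is_windows_like are hypothetical; only the predefined macros __CYGWIN__ and _WIN32 come from the log. __CYGWIN__ is tested first so that a Cygwin build is always reported as Cygwin, whether or not _WIN32 is also defined.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name the platform based on predefined macros. */
const char *detected_platform(void)
{
#if defined(__CYGWIN__)
    return "cygwin";   /* Cygwin toolchain, including x86_64 */
#elif defined(_WIN32)
    return "windows";  /* native Windows API, 32-bit or 64-bit */
#else
    return "other";
#endif
}

/* Hypothetical helper: 1 for Cygwin or native Windows, 0 otherwise. */
int is_windows_like(void)
{
    const char *name = detected_platform();

    assert(name != NULL);
    return strcmp(name, "other") != 0;
}
```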
wernsaar [Sat, 17 Aug 2013 04:46:17 +0000 (06:46 +0200)]
removed unnecessary instructions from zgemm_kernel_2x2_bulldozer.S
wernsaar [Fri, 16 Aug 2013 18:23:34 +0000 (20:23 +0200)]
removed unnecessary instructions
Zhang Xianyi [Fri, 23 Aug 2013 08:27:17 +0000 (16:27 +0800)]
Refs #282. Fixed zgemv_n typo bug on Win64.
Zhang Xianyi [Wed, 21 Aug 2013 15:21:51 +0000 (08:21 -0700)]
Merge pull request #280 from ViralBShah/develop
Patch LAPACK XLASD4.f as discussed in JuliaLang/julia#2340
Viral B. Shah [Wed, 21 Aug 2013 13:44:07 +0000 (19:14 +0530)]
Patch LAPACK XLASD4.f as discussed in JuliaLang/julia#2340
Zhang Xianyi [Tue, 20 Aug 2013 16:03:25 +0000 (00:03 +0800)]
Refs #279. Provide the ONLY_CBLAS flag. If you only need CBLAS and have
no Fortran compiler, please try make ONLY_CBLAS=1.
This mode compiles only CBLAS, without the BLAS Fortran interface and LAPACK.
Zhang Xianyi [Mon, 12 Aug 2013 15:22:10 +0000 (23:22 +0800)]
Merge branch 'bulldozer' into develop
Zhang Xianyi [Fri, 9 Aug 2013 02:49:44 +0000 (10:49 +0800)]
Fixed #276. Merge branch 'wernsaar-develop' into bulldozer
Zhang Xianyi [Fri, 9 Aug 2013 02:48:46 +0000 (10:48 +0800)]
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
wernsaar [Thu, 8 Aug 2013 15:49:30 +0000 (17:49 +0200)]
modified KERNEL.BULLDOZER
wernsaar [Thu, 8 Aug 2013 05:14:08 +0000 (07:14 +0200)]
added dtrsm_kernel_RN_8x2_bulldozer.S
wernsaar [Mon, 5 Aug 2013 09:27:16 +0000 (11:27 +0200)]
dtrsm_kernel_LT_8x2_bulldozer.S performance optimization
Zhang Xianyi [Mon, 5 Aug 2013 08:17:15 +0000 (16:17 +0800)]
Refs #270 #268. Merge branch 'wernsaar-develop' into bulldozer
Zhang Xianyi [Mon, 5 Aug 2013 08:09:47 +0000 (16:09 +0800)]
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Zhang Xianyi [Mon, 5 Aug 2013 08:07:54 +0000 (16:07 +0800)]
Enable bulldozer kernels.
Zhang Xianyi [Mon, 5 Aug 2013 07:51:53 +0000 (15:51 +0800)]
Merge branch 'develop' into bulldozer
wernsaar [Sun, 4 Aug 2013 10:16:12 +0000 (12:16 +0200)]
modified dtrsm_kernel_LT_8x2_bulldozer.S
wernsaar [Sun, 4 Aug 2013 08:15:33 +0000 (10:15 +0200)]
modified dtrsm_kernel_LT_8x2_bulldozer.S
wernsaar [Sun, 4 Aug 2013 07:54:40 +0000 (09:54 +0200)]
added dtrsm_kernel_LT_8x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 13:40:51 +0000 (15:40 +0200)]
removed dtrsm_kernel_LT_8x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 10:19:29 +0000 (12:19 +0200)]
fixed bug in dgemv_t_bulldozer.S
wernsaar [Sat, 3 Aug 2013 09:43:25 +0000 (11:43 +0200)]
repaired trmm bug in sgemm_kernel_16x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 08:32:51 +0000 (10:32 +0200)]
repaired trmm bug in cgemm_kernel_4x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 08:17:08 +0000 (10:17 +0200)]
repaired trmm bug in zgemm_kernel_2x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 07:35:39 +0000 (09:35 +0200)]
repaired trmm bug in dgemm_kernel_8x2_bulldozer.S
Zhang Xianyi [Thu, 1 Aug 2013 15:57:19 +0000 (23:57 +0800)]
Merge branch 'hotfix-v0.2.8' into develop
Zhang Xianyi [Thu, 1 Aug 2013 15:52:43 +0000 (23:52 +0800)]
Update the doc for 0.2.8 version.
Zhang Xianyi [Wed, 31 Jul 2013 06:49:16 +0000 (14:49 +0800)]
OpenBLAS 0.2.8 rc1.
Zhang Xianyi [Wed, 31 Jul 2013 06:46:56 +0000 (14:46 +0800)]
Merge branch 'hotfix-v0.2.8' into develop
Zhang Xianyi [Wed, 31 Jul 2013 06:41:39 +0000 (14:41 +0800)]
Refs #266. Fixed the compiling bug with Open64 5.0.
wernsaar [Tue, 30 Jul 2013 18:18:57 +0000 (20:18 +0200)]
added generic trmm kernels and modified Makefile.L3
Zhang Xianyi [Mon, 29 Jul 2013 15:21:10 +0000 (23:21 +0800)]
Fixed #264 the memory leak bug in dtrtri_U.
Zhang Xianyi [Sat, 27 Jul 2013 14:37:57 +0000 (22:37 +0800)]
Fixed the FMA3 detection bug.
Zhang Xianyi [Fri, 26 Jul 2013 15:43:54 +0000 (23:43 +0800)]
Fixed #261. Use strncmp instead of a comparing trick.
Zhang Xianyi [Mon, 29 Jul 2013 07:42:00 +0000 (15:42 +0800)]
Fixed typo in getarch_2nd.c.
wernsaar [Sun, 28 Jul 2013 14:47:58 +0000 (16:47 +0200)]
added dtrsm_kernel_LT_8x2_bulldozer.S
Zhang Xianyi [Sun, 28 Jul 2013 09:39:24 +0000 (17:39 +0800)]
Refs #263. Rollback bulldozer and piledriver kernels to barcelona kernels.
Zhang Xianyi [Sun, 28 Jul 2013 04:38:25 +0000 (06:38 +0200)]
Merge branch 'develop' into bulldozer
Conflicts:
kernel/x86_64/KERNEL.BULLDOZER
Zhang Xianyi [Sat, 27 Jul 2013 16:09:40 +0000 (00:09 +0800)]
Refs #262. Added executable stack markings.
Zhang Xianyi [Sat, 27 Jul 2013 15:03:07 +0000 (23:03 +0800)]
Merge branch 'sfabbro-ldflags' into develop
Zhang Xianyi [Sat, 27 Jul 2013 15:01:36 +0000 (23:01 +0800)]
Fixed #260. Fixed generating 32-bit shared library on previous commit.
Zhang Xianyi [Sat, 27 Jul 2013 14:37:57 +0000 (22:37 +0800)]
Fixed the FMA3 detection bug.
Zhang Xianyi [Sat, 27 Jul 2013 14:19:54 +0000 (22:19 +0800)]
Merge branch 'ldflags' of https://github.com/sfabbro/OpenBLAS into sfabbro-ldflags
Zhang Xianyi [Fri, 26 Jul 2013 15:43:54 +0000 (23:43 +0800)]
Fixed #261. Use strncmp instead of a comparing trick.
Sebastien Fabbro [Wed, 24 Jul 2013 16:37:16 +0000 (09:37 -0700)]
Respect user's LDFLAGS
Zhang Xianyi [Thu, 25 Jul 2013 17:34:45 +0000 (01:34 +0800)]
Merge branch 'develop'
Zhang Xianyi [Thu, 25 Jul 2013 17:32:32 +0000 (01:32 +0800)]
Refs #259. Fixed missing LAPACK functions in shared library.
Zhang Xianyi [Tue, 23 Jul 2013 05:40:08 +0000 (13:40 +0800)]
Merge branch 'develop'
Zhang Xianyi [Tue, 23 Jul 2013 05:35:29 +0000 (22:35 -0700)]
Merge pull request #257 from staticfloat/develop
Add in return value for `interface/trtri.c`
Elliot Saba [Tue, 23 Jul 2013 00:02:06 +0000 (17:02 -0700)]
Fix xianyi/OpenBLAS#256
Zhang Xianyi [Mon, 22 Jul 2013 03:34:43 +0000 (11:34 +0800)]
Refs #255. Didn't use f77 compiler.
Zhang Xianyi [Sat, 20 Jul 2013 15:32:23 +0000 (23:32 +0800)]
Update CONTRIBUTORS.md
Zhang Xianyi [Sat, 20 Jul 2013 15:05:36 +0000 (23:05 +0800)]
Merge branch 'develop'
Zhang Xianyi [Sat, 20 Jul 2013 15:05:12 +0000 (23:05 +0800)]
Fixed #253. Update doc for v0.2.7 version.
Zhang Xianyi [Sat, 20 Jul 2013 14:33:35 +0000 (22:33 +0800)]
Merge branch 'loongson3b' into develop
Zhang Xianyi [Sat, 20 Jul 2013 14:32:38 +0000 (22:32 +0800)]
Merge branch 'loongson3a' into develop
Conflicts:
Makefile.system
Zhang Xianyi [Sat, 20 Jul 2013 03:35:27 +0000 (11:35 +0800)]
Fixed #254. Added the date of changes in contributors file.
Zhang Xianyi [Fri, 19 Jul 2013 00:38:03 +0000 (08:38 +0800)]
create contributor file.
wangqian [Thu, 18 Jul 2013 12:23:21 +0000 (20:23 +0800)]
Fixed a computational error in zgemm_kernel_4x4_sandy.S file.
Zhang Xianyi [Wed, 17 Jul 2013 07:19:07 +0000 (15:19 +0800)]
Ensure the correct stack alignment on Win32.
Zhang Xianyi [Tue, 16 Jul 2013 15:18:18 +0000 (23:18 +0800)]
Fixed typo in generating shared library on x86_64.
Zhang Xianyi [Tue, 16 Jul 2013 14:44:27 +0000 (22:44 +0800)]
Modified Makefile to avoid redundant echo.
Zhang Xianyi [Tue, 16 Jul 2013 09:45:00 +0000 (17:45 +0800)]
Modified Makefile.install
Zhang Xianyi [Mon, 15 Jul 2013 01:56:19 +0000 (09:56 +0800)]
Refs #225. Fixed a bug in GEMM OpenMP threading.
Zhang Xianyi [Sun, 14 Jul 2013 14:16:30 +0000 (22:16 +0800)]
Refs #191. A workaround for the dtrtri_U single-thread bug.
This function caused the ERKALE serial test to fail.
I replaced it with the LAPACK source code.
Zhang Xianyi [Sun, 14 Jul 2013 02:41:54 +0000 (10:41 +0800)]
Changed makefile for lapack.
Zhang Xianyi [Fri, 12 Jul 2013 13:41:12 +0000 (21:41 +0800)]
Updated travis.
Zhang Xianyi [Thu, 11 Jul 2013 15:49:29 +0000 (23:49 +0800)]
Update build matrix for Travis CI.
Zhang Xianyi [Thu, 11 Jul 2013 15:47:07 +0000 (23:47 +0800)]
Fixed the typo.
Zhang Xianyi [Thu, 11 Jul 2013 14:24:50 +0000 (22:24 +0800)]
Fixed generating dll bug in last commit.
Zhang Xianyi [Thu, 11 Jul 2013 13:41:44 +0000 (21:41 +0800)]
Fixed #251. Merge branch 'grisuthedragon-develop' into develop
grisuthedragon [Thu, 11 Jul 2013 11:39:27 +0000 (13:39 +0200)]
create openblas_get_parallel to report which
parallelization model OpenBLAS uses.
Zhang Xianyi [Wed, 10 Jul 2013 19:20:02 +0000 (03:20 +0800)]
Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on Windows.
Zhang Xianyi [Wed, 10 Jul 2013 08:02:27 +0000 (16:02 +0800)]
Refs #248. Support LAPACK and LAPACKE with lsbcc.
For LAPACKE, use LAPACK_COMPLEX_STRUCTURE.
The reason is that lsbcc doesn't define the complex I in complex.h.
Zhang Xianyi [Wed, 10 Jul 2013 08:01:03 +0000 (01:01 -0700)]
Merge pull request #249 from wernsaar/develop
replaced defined(DOUBLE) by !defined(XDOUBLE)
wernsaar [Tue, 9 Jul 2013 16:17:50 +0000 (18:17 +0200)]
replaced defined(DOUBLE) by !defined(XDOUBLE)
Zhang Xianyi [Tue, 9 Jul 2013 09:00:02 +0000 (17:00 +0800)]
Refs #247. Included the LAPACK source code. Avoid downloading the tar.gz from netlib.org.
Based on version 3.4.2, with patch.for_lapack-3.4.2 applied.
Zhang Xianyi [Tue, 9 Jul 2013 08:26:59 +0000 (16:26 +0800)]
Fixed the typo in getarch.c
Zhang Xianyi [Tue, 9 Jul 2013 07:38:03 +0000 (15:38 +0800)]
Refs #248. Fixed the LSB compatibility issue for BLAS only.
For example, make CC=lsbcc NO_LAPACK=1.
Zhang Xianyi [Sun, 7 Jul 2013 17:07:05 +0000 (01:07 +0800)]
Refs #221 #246. Fixed the stack overflow bug in multithreaded BLAS3.
When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256, the job array
below was allocated on the stack:
typedef struct {
    volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
The job array takes 8MB.
Thus, we use malloc instead of stack allocation.
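A minimal sketch of the heap allocation described above, not the actual BLAS3 threading code. The values CACHE_LINE_SIZE = 8 and DIVIDE_RATE = 2 and a 64-bit BLASLONG are assumptions chosen because they reproduce the 8MB figure for MAX_CPU_NUMBER = 256; the job_array_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values for this sketch; the log only gives 256 as an example. */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2

typedef int64_t BLASLONG; /* assumption: 64-bit integer type */

typedef struct {
    volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Size of the whole job array: 256 * (256 * 16 * 8 bytes) = 8 MiB. */
size_t job_array_bytes(void)
{
    return sizeof(job_t) * MAX_CPU_NUMBER;
}

/* Allocate on the heap instead of declaring job_t job[MAX_CPU_NUMBER]
   as a local variable. Returns NULL if the allocation fails. */
job_t *job_array_alloc(void)
{
    return malloc(job_array_bytes());
}

/* malloc does not zero memory, so clear every flag before first use. */
void job_array_clear(job_t *job)
{
    size_t i, j, k;

    assert(job != NULL);
    for (i = 0; i < MAX_CPU_NUMBER; i++)
        for (j = 0; j < MAX_CPU_NUMBER; j++)
            for (k = 0; k < CACHE_LINE_SIZE * DIVIDE_RATE; k++)
                job[i].working[j][k] = 0;
}

void job_array_free(job_t *job)
{
    free(job);
}
```

With these assumed constants the array is eight times larger than the 1MB Windows stack mentioned elsewhere in this log, which is why a local array of this size cannot be relied on.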
Zhang Xianyi [Sat, 6 Jul 2013 15:06:43 +0000 (12:06 -0300)]
Support AMD Piledriver by bulldozer kernels.