platform/upstream/openblas.git
Zhang Xianyi [Sat, 27 Jul 2013 14:19:54 +0000 (22:19 +0800)]
Merge branch 'ldflags' of https://github.com/sfabbro/OpenBLAS into sfabbro-ldflags

Zhang Xianyi [Fri, 26 Jul 2013 15:43:54 +0000 (23:43 +0800)]
Fixed #261. Use strncmp instead of a comparing trick.

Sebastien Fabbro [Wed, 24 Jul 2013 16:37:16 +0000 (09:37 -0700)]
Respect user's LDFLAGS

Zhang Xianyi [Thu, 25 Jul 2013 17:32:32 +0000 (01:32 +0800)]
Refs #259. Fixed missing LAPACK functions in shared library.

Zhang Xianyi [Tue, 23 Jul 2013 05:35:29 +0000 (22:35 -0700)]
Merge pull request #257 from staticfloat/develop

Add a return value to `interface/trtri.c`

Elliot Saba [Tue, 23 Jul 2013 00:02:06 +0000 (17:02 -0700)]
Fix xianyi/OpenBLAS#256

Zhang Xianyi [Mon, 22 Jul 2013 03:34:43 +0000 (11:34 +0800)]
Refs #255. Didn't use f77 compiler.

Zhang Xianyi [Sat, 20 Jul 2013 15:32:23 +0000 (23:32 +0800)]
Update CONTRIBUTORS.md

Zhang Xianyi [Sat, 20 Jul 2013 15:05:12 +0000 (23:05 +0800)]
Fixed #253. Update doc for v0.2.7 version.

Zhang Xianyi [Sat, 20 Jul 2013 14:33:35 +0000 (22:33 +0800)]
Merge branch 'loongson3b' into develop

Zhang Xianyi [Sat, 20 Jul 2013 14:32:38 +0000 (22:32 +0800)]
Merge branch 'loongson3a' into develop

Conflicts:
Makefile.system

Zhang Xianyi [Sat, 20 Jul 2013 03:35:27 +0000 (11:35 +0800)]
Fixed #254. Added the date of changes in contributors file.

Zhang Xianyi [Fri, 19 Jul 2013 00:38:03 +0000 (08:38 +0800)]
create contributor file.

wangqian [Thu, 18 Jul 2013 12:23:21 +0000 (20:23 +0800)]
Fixed a computational error in zgemm_kernel_4x4_sandy.S file.

Zhang Xianyi [Wed, 17 Jul 2013 07:19:07 +0000 (15:19 +0800)]
Ensure the correct stack alignment on Win32.

Zhang Xianyi [Tue, 16 Jul 2013 15:18:18 +0000 (23:18 +0800)]
Fixed typo in generating shared library on x86_64.

Zhang Xianyi [Tue, 16 Jul 2013 14:44:27 +0000 (22:44 +0800)]
Modified Makefile to avoid redundant echo.

Zhang Xianyi [Tue, 16 Jul 2013 09:45:00 +0000 (17:45 +0800)]
Modified Makefile.install

Zhang Xianyi [Mon, 15 Jul 2013 01:56:19 +0000 (09:56 +0800)]
Refs #225. Fixed a bug in GEMM OpenMP threading.

Zhang Xianyi [Sun, 14 Jul 2013 14:16:30 +0000 (22:16 +0800)]
Refs #191. A workaround for the dtrtri_U single-thread bug.

This function caused the ERKALE serial test to fail.
I replaced it with the LAPACK source code.

Zhang Xianyi [Sun, 14 Jul 2013 02:41:54 +0000 (10:41 +0800)]
Changed makefile for lapack.

Zhang Xianyi [Fri, 12 Jul 2013 13:41:12 +0000 (21:41 +0800)]
Updated travis.

Zhang Xianyi [Thu, 11 Jul 2013 15:49:29 +0000 (23:49 +0800)]
Update build matrix for Travis CI.

Zhang Xianyi [Thu, 11 Jul 2013 15:47:07 +0000 (23:47 +0800)]
Fixed the typo.

Zhang Xianyi [Thu, 11 Jul 2013 14:24:50 +0000 (22:24 +0800)]
Fixed a DLL generation bug in the last commit.

Zhang Xianyi [Thu, 11 Jul 2013 13:41:44 +0000 (21:41 +0800)]
Fixed #251. Merge branch 'grisuthedragon-develop' into develop

grisuthedragon [Thu, 11 Jul 2013 11:39:27 +0000 (13:39 +0200)]
Create openblas_get_parallel to retrieve information about which
parallelization model is used by OpenBLAS.
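A program can call openblas_get_parallel() (no arguments, returns an int) to report how the library it is linked against was built. The sketch below assumes the return codes that later OpenBLAS cblas.h headers name OPENBLAS_SEQUENTIAL (0), OPENBLAS_THREAD (1) and OPENBLAS_OPENMP (2); they are redefined locally under #ifndef guards so the example compiles without OpenBLAS installed, and the helper parallel_model_name is hypothetical.

```c
/* Assumed return codes of openblas_get_parallel(); the guards let the
   definitions from OpenBLAS's own cblas.h take precedence if included. */
#ifndef OPENBLAS_SEQUENTIAL
#define OPENBLAS_SEQUENTIAL 0
#endif
#ifndef OPENBLAS_THREAD
#define OPENBLAS_THREAD 1
#endif
#ifndef OPENBLAS_OPENMP
#define OPENBLAS_OPENMP 2
#endif

/* Hypothetical helper: map the integer code to a human-readable name. */
const char *parallel_model_name(int code) {
  switch (code) {
    case OPENBLAS_SEQUENTIAL: return "sequential";
    case OPENBLAS_THREAD:     return "threaded (OpenBLAS thread server)";
    case OPENBLAS_OPENMP:     return "OpenMP";
    default:                  return "unknown";
  }
}
```

When linked against OpenBLAS, printf("%s\n", parallel_model_name(openblas_get_parallel())); would print the model the library was compiled with.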

Zhang Xianyi [Wed, 10 Jul 2013 19:20:02 +0000 (03:20 +0800)]
Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.

I used a smaller threshold since the default stack size is 1 MB on Windows.
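The general pattern behind a platform-dependent threshold can be sketched as follows: scratch space up to a size limit lives on the stack, anything larger comes from the heap, and the limit is lower on Windows because of its 1 MB default stack. The commit does not give its actual threshold; SCRATCH_STACK_LIMIT, both of its values, and the functions below are hypothetical, and fill_and_sum merely stands in for the real computation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cut-offs in bytes; Windows gets the smaller one. */
#ifdef _WIN32
#define SCRATCH_STACK_LIMIT (64 * 1024)
#else
#define SCRATCH_STACK_LIMIT (512 * 1024)
#endif

#define STACK_DOUBLES (SCRATCH_STACK_LIMIT / sizeof(double))

/* Stand-in workload: fill the buffer with 0..n-1 and sum it. */
static double fill_and_sum(double *buf, size_t n) {
  double s = 0.0;
  for (size_t i = 0; i < n; i++) {
    buf[i] = (double)i;
    s += buf[i];
  }
  return s;
}

/* Small requests use a bounded, fixed-size stack array; larger ones use
   the heap. Returns -1.0 if the size overflows or malloc fails. */
double sum_with_scratch(size_t n) {
  if (n <= STACK_DOUBLES) {
    double buf[STACK_DOUBLES];
    return fill_and_sum(buf, n);
  }
  if (n > SIZE_MAX / sizeof(double)) return -1.0;
  double *buf = malloc(n * sizeof(double));
  if (buf == NULL) return -1.0;
  double s = fill_and_sum(buf, n);
  free(buf);
  return s;
}
```

A fixed-size array is used for the stack path instead of a variable-length array because MSVC does not support VLAs.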

Zhang Xianyi [Wed, 10 Jul 2013 08:02:27 +0000 (16:02 +0800)]
Refs #248. Support LAPACK and LAPACKE with lsbcc.

For LAPACKE, use LAPACK_COMPLEX_STRUCTURE.
The reason is that lsbcc's complex.h does not define the complex unit I.
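With LAPACK_COMPLEX_STRUCTURE defined, lapacke.h represents complex numbers as a plain struct with real and imag members instead of the C99 complex type, so nothing from complex.h (including I) is required. The sketch below imitates that idea with its own hypothetical names (my_complex_double, make_complex, complex_mul) rather than the actual lapacke.h typedefs and macros.

```c
/* A representation that needs no <complex.h>, in the spirit of the
   LAPACK_COMPLEX_STRUCTURE mode (names here are hypothetical). */
typedef struct {
  double real, imag;
} my_complex_double;

my_complex_double make_complex(double re, double im) {
  my_complex_double z = { re, im };
  return z;
}

/* (a + bi)(c + di) = (ac - bd) + (ad + bc)i, spelled out because the
   struct has no built-in complex arithmetic. */
my_complex_double complex_mul(my_complex_double x, my_complex_double y) {
  return make_complex(x.real * y.real - x.imag * y.imag,
                      x.real * y.imag + x.imag * y.real);
}
```

Two consecutive doubles match the memory layout of Fortran's COMPLEX*16 in practice, which is what allows such a struct to be passed through to the Fortran LAPACK routines.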

Zhang Xianyi [Wed, 10 Jul 2013 08:01:03 +0000 (01:01 -0700)]
Merge pull request #249 from wernsaar/develop

replaced defined(DOUBLE) by !defined(XDOUBLE)

wernsaar [Tue, 9 Jul 2013 16:17:50 +0000 (18:17 +0200)]
replaced defined(DOUBLE) by !defined(XDOUBLE)

Zhang Xianyi [Tue, 9 Jul 2013 09:00:02 +0000 (17:00 +0800)]
Refs #247. Included the LAPACK source code to avoid downloading the tar.gz from netlib.org.

Based on version 3.4.2, with patch.for_lapack-3.4.2 applied.

Zhang Xianyi [Tue, 9 Jul 2013 08:26:59 +0000 (16:26 +0800)]
Fixed the typo in getarch.c

Zhang Xianyi [Tue, 9 Jul 2013 07:38:03 +0000 (15:38 +0800)]
Refs #248. Fixed the LSB compatibility issue (BLAS only).
For example, make CC=lsbcc NO_LAPACK=1.

Zhang Xianyi [Sun, 7 Jul 2013 17:07:05 +0000 (01:07 +0800)]
Refs #221 #246. Fixed the stack overflow bug in multithreaded BLAS3.

This happens when NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256:

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

job_t          job[MAX_CPU_NUMBER];

The job array then occupies 8 MB.

Thus, we use malloc instead of stack allocation.
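The 8 MB figure can be checked with a short program. It assumes CACHE_LINE_SIZE = 8, DIVIDE_RATE = 2 and an 8-byte BLASLONG, values chosen because they reproduce the size stated above; job_array_bytes and alloc_jobs are hypothetical helpers, not OpenBLAS functions.

```c
#include <stdlib.h>

#define MAX_CPU_NUMBER  256  /* e.g. NUM_THREADS=256 */
#define CACHE_LINE_SIZE 8    /* assumed */
#define DIVIDE_RATE     2    /* assumed */

typedef long long BLASLONG;  /* assumed 8-byte integer type */

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* 256 * (8 * 2) * 8 bytes = 32 KB per job_t; 256 of them = 8 MB. */
size_t job_array_bytes(void) {
  return sizeof(job_t) * MAX_CPU_NUMBER;
}

/* Heap replacement for a local `job_t job[MAX_CPU_NUMBER];`.
   Memory is zeroed; the caller must free() it. Returns NULL on failure. */
job_t *alloc_jobs(void) {
  return calloc(MAX_CPU_NUMBER, sizeof(job_t));
}
```

Moving the array to the heap takes the 8 MB out of the thread's stack entirely, so the fix does not depend on the platform's default stack size.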

Zhang Xianyi [Sat, 6 Jul 2013 15:06:43 +0000 (12:06 -0300)]
Support AMD Piledriver using the Bulldozer kernels.

Zhang Xianyi [Fri, 5 Jul 2013 07:28:41 +0000 (15:28 +0800)]
Added Travis CI status image.

Zhang Xianyi [Fri, 5 Jul 2013 06:52:57 +0000 (14:52 +0800)]
Use quiet make for Travis CI.

Zhang Xianyi [Fri, 5 Jul 2013 03:11:18 +0000 (11:11 +0800)]
Install gfortran in Travis CI.

Zhang Xianyi [Thu, 4 Jul 2013 15:30:53 +0000 (23:30 +0800)]
Added travis.yml file.

Zhang Xianyi [Tue, 2 Jul 2013 06:37:30 +0000 (14:37 +0800)]
Improved make clean on Mac OS X.

Zhang Xianyi [Tue, 2 Jul 2013 06:17:55 +0000 (14:17 +0800)]
Refs #221. Set stack limit to 16MB to prevent a SEGFAULT bug on Mac OS X with DYNAMIC_ARCH=1 & NUM_THREADS=256.

Zhang Xianyi [Mon, 1 Jul 2013 08:09:05 +0000 (16:09 +0800)]
Use ALIGN_5 instead of .align 32 in the assembly kernels. Added ALIGN_5 for 32-bit OS X.

Zhang Xianyi [Sun, 30 Jun 2013 16:40:32 +0000 (09:40 -0700)]
Merge pull request #242 from danluu/readme.haswell

Update README to reflect Haswell support, etc.

Dan Luu [Sun, 30 Jun 2013 16:36:13 +0000 (11:36 -0500)]
Fix miscellaneous typos

Zhang Xianyi [Sun, 30 Jun 2013 16:35:14 +0000 (00:35 +0800)]
Fixed #217 openblas_config.h bug on Windows 64.

Dan Luu [Sun, 30 Jun 2013 16:35:00 +0000 (11:35 -0500)]
Add Haswell support

Dan Luu [Sat, 29 Jun 2013 22:26:56 +0000 (17:26 -0500)]
Refs #241. Add Haswell support (using sandybridge optimizations)

Zhang Xianyi [Sat, 29 Jun 2013 02:36:01 +0000 (10:36 +0800)]
Fixed #239 bug in param.h about BARCELONA and BULLDOZER.

Zhang Xianyi [Fri, 28 Jun 2013 14:43:41 +0000 (22:43 +0800)]
Fixed #238 bug in lsame on x86.

Zhang Xianyi [Sat, 22 Jun 2013 00:59:26 +0000 (17:59 -0700)]
Merge pull request #235 from wernsaar/develop

Added ddot, daxpy, dcopy kernels for AMD bulldozer.

wernsaar [Fri, 21 Jun 2013 14:06:51 +0000 (16:06 +0200)]
added dcopy_bulldozer.S

wernsaar [Thu, 20 Jun 2013 14:15:09 +0000 (16:15 +0200)]
added ddot_bulldozer.S

wernsaar [Thu, 20 Jun 2013 12:07:54 +0000 (14:07 +0200)]
added daxpy_bulldozer.S

wernsaar [Wed, 19 Jun 2013 17:31:38 +0000 (19:31 +0200)]
cleanup of dgemm_ncopy_8_bulldozer.S

wernsaar [Wed, 19 Jun 2013 15:32:42 +0000 (17:32 +0200)]
added dgemv_t_bulldozer.S

Zhang Xianyi [Wed, 19 Jun 2013 03:02:36 +0000 (20:02 -0700)]
Merge pull request #233 from wernsaar/develop

added dgemv_n and some faster gemm_copy routines to BULLDOZER.

wernsaar [Tue, 18 Jun 2013 11:29:23 +0000 (13:29 +0200)]
added dgemm_ncopy_8_bulldozer.S

wernsaar [Tue, 18 Jun 2013 09:01:33 +0000 (11:01 +0200)]
added gemm_tcopy_2_bulldozer.S

wernsaar [Mon, 17 Jun 2013 12:19:09 +0000 (14:19 +0200)]
added dgemm_tcopy_8_bulldozer.S

wernsaar [Mon, 17 Jun 2013 10:55:12 +0000 (12:55 +0200)]
added gemm_ncopy_2_bulldozer.S

wernsaar [Sun, 16 Jun 2013 10:50:45 +0000 (12:50 +0200)]
cleanup of dgemv_n_bulldozer.S and optimization of inner loop

wernsaar [Sat, 15 Jun 2013 14:42:37 +0000 (16:42 +0200)]
added dgemv_n_bulldozer.S

Zhang Xianyi [Thu, 13 Jun 2013 14:29:27 +0000 (07:29 -0700)]
Merge pull request #230 from wernsaar/develop

Refs #230. New dgemm and sgemm Kernel for BULLDOZER

Zhang Xianyi [Thu, 13 Jun 2013 14:15:19 +0000 (22:15 +0800)]
Refs #231. Change the default C compiler to clang on Mac OSX.

wernsaar [Thu, 13 Jun 2013 09:35:15 +0000 (11:35 +0200)]
performance optimizations in sgemm_kernel_16x2_bulldozer.S

wernsaar [Wed, 12 Jun 2013 13:55:27 +0000 (15:55 +0200)]
added cgemm_kernel_4x2_bulldozer.S

wernsaar [Tue, 11 Jun 2013 10:00:49 +0000 (12:00 +0200)]
added zgemm_kernel_2x2_bulldozer.S

wernsaar [Sun, 9 Jun 2013 15:26:42 +0000 (17:26 +0200)]
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3

wernsaar [Sun, 9 Jun 2013 13:57:42 +0000 (15:57 +0200)]
added new sgemm kernel for BULLDOZER

wernsaar [Sat, 8 Jun 2013 08:43:08 +0000 (10:43 +0200)]
changed stack touching

wernsaar [Sat, 8 Jun 2013 08:03:59 +0000 (10:03 +0200)]
correct GEMM_THREAD in param.h

wernsaar [Sat, 8 Jun 2013 07:40:17 +0000 (09:40 +0200)]
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S

Zhang Xianyi [Thu, 6 Jun 2013 15:43:40 +0000 (23:43 +0800)]
Refs #227. Detected LLVM/Clang compiler.

Zhang Xianyi [Thu, 6 Jun 2013 14:50:43 +0000 (22:50 +0800)]
Refs #124. Check XSAVE flag on x86 CPU.
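Presumably the point of this check is that AVX kernels (such as the Sandy Bridge ones) are only safe when the operating system saves the YMM registers. The sketch assumes the usual rule from Intel's documentation: CPUID leaf 1 must report XSAVE (ECX bit 26), OSXSAVE (ECX bit 27) and AVX (ECX bit 28), and XCR0, read with XGETBV, must have bits 1 and 2 (SSE and AVX state) set. It is written as a pure function of those register values so it compiles on any platform; avx_usable is a hypothetical name, and the commit does not say exactly which bits OpenBLAS tests.

```c
/* CPUID.(EAX=1):ECX feature bits */
#define CPUID1_ECX_XSAVE   (1u << 26)
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)

/* XCR0 state-component bits */
#define XCR0_SSE (1ull << 1)
#define XCR0_YMM (1ull << 2)

/* Returns 1 if AVX can be used: the CPU supports XSAVE and AVX, the OS has
   enabled XSAVE (OSXSAVE), and the OS saves both XMM and YMM state.
   xcr0 is only meaningful when OSXSAVE is set; XGETBV faults otherwise. */
int avx_usable(unsigned int cpuid1_ecx, unsigned long long xcr0) {
  const unsigned int need =
      CPUID1_ECX_XSAVE | CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
  if ((cpuid1_ecx & need) != need) return 0;
  return (xcr0 & (XCR0_SSE | XCR0_YMM)) == (XCR0_SSE | XCR0_YMM);
}
```

With GCC or Clang on x86, the ECX value could come from __get_cpuid(1, &eax, &ebx, &ecx, &edx) in <cpuid.h>, and XCR0 from the XGETBV instruction (for example via inline assembly), issued only after OSXSAVE has been confirmed.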

Zhang Xianyi [Tue, 4 Jun 2013 08:05:28 +0000 (16:05 +0800)]
Change LIBSUFFIX from .lib to .a on Windows.

Zhang Xianyi [Tue, 4 Jun 2013 08:01:05 +0000 (16:01 +0800)]
Refs #223. Fixed an s/dgemv bug on Windows.

wangqian [Wed, 29 May 2013 11:48:31 +0000 (19:48 +0800)]
Fixed an internal buffer overflow bug in (s/d/c/z)gemv on x86_64.

wangqian [Wed, 29 May 2013 05:23:12 +0000 (13:23 +0800)]
Fixed an internal buffer overflow bug in (s/d/c/z)gemv on x86.

Zhang Xianyi [Fri, 24 May 2013 07:52:35 +0000 (15:52 +0800)]
Fixed a bug in testing for the existence of the LAPACK tar package.

Zhang Xianyi [Fri, 24 May 2013 07:29:10 +0000 (15:29 +0800)]
Refs #205. Merged boegel's code for downloading LAPACK.

Zhang Xianyi [Fri, 24 May 2013 07:15:52 +0000 (15:15 +0800)]
Fixed #199. Saved USE_THREAD switch for make install.

Zhang Xianyi [Tue, 21 May 2013 14:59:45 +0000 (22:59 +0800)]
Refs #220. Support Power7 using the old Power6 kernels.

Zhang Xianyi [Fri, 17 May 2013 08:41:05 +0000 (16:41 +0800)]
Refs #215. Fixed the compatibility between <complex.h> and <complex> in C++.

Zhang Xianyi [Fri, 3 May 2013 01:08:54 +0000 (09:08 +0800)]
Refs #216. Revert the default value of GEMM_MULTITHREAD_THRESHOLD to 4.

wernsaar [Tue, 30 Apr 2013 08:07:17 +0000 (10:07 +0200)]
changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit

wernsaar [Sun, 28 Apr 2013 09:14:23 +0000 (11:14 +0200)]
bad performance with some data

wernsaar [Sat, 27 Apr 2013 15:23:08 +0000 (17:23 +0200)]
removed trsm_kernel_RT_4x4_bulldozer.S because of wrong results

wernsaar [Sat, 27 Apr 2013 14:48:48 +0000 (16:48 +0200)]
added trsm_kernel_RT_4x4_bulldozer.S

wernsaar [Sat, 27 Apr 2013 12:30:00 +0000 (14:30 +0200)]
added trsm_kernel_LT_4x4_bulldozer.S

wernsaar [Sat, 27 Apr 2013 11:40:49 +0000 (13:40 +0200)]
prefetch improved. Defined 2 different kernels for inner loop

Zhang Xianyi [Sat, 27 Apr 2013 07:02:04 +0000 (15:02 +0800)]
Refs #210. Disable checking /lib/libpthread.so*.

wernsaar [Fri, 26 Apr 2013 18:05:42 +0000 (20:05 +0200)]
minor improvements and code cleanup

Xianyi Zhang [Wed, 24 Apr 2013 16:45:42 +0000 (00:45 +0800)]
Updated the mailing list for OpenBLAS.

Xianyi Zhang [Wed, 24 Apr 2013 16:44:22 +0000 (00:44 +0800)]
Updated the mailing list for OpenBLAS.

Zhang Xianyi [Thu, 18 Apr 2013 06:56:09 +0000 (23:56 -0700)]
Merge pull request #213 from wernsaar/develop

Merged some improvements into dgemm_kernel_4x4_bulldozer.S.

wernsaar [Tue, 16 Apr 2013 17:05:06 +0000 (19:05 +0200)]
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts

Zhang Xianyi [Mon, 15 Apr 2013 13:37:30 +0000 (21:37 +0800)]
Added NO_PARALLEL_MAKE flag to disable parallel make.

Zhang Xianyi [Mon, 15 Apr 2013 07:20:55 +0000 (00:20 -0700)]
Merge pull request #211 from wernsaar/develop

New version of dgemm_kernel_4x4_bulldozer.S

wernsaar [Fri, 12 Apr 2013 15:55:51 +0000 (17:55 +0200)]
New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops