Rajalakshmi Srinivasaraghavan [Thu, 12 May 2022 16:17:33 +0000 (11:17 -0500)]
POWER10: Changing store instructions for Level1 functions
This patch changes 32-byte stores to two 16-byte stores
to fix a recent degradation due to 32-byte stores.
Martin Kroeker [Wed, 4 May 2022 13:12:22 +0000 (15:12 +0200)]
Merge pull request #3619 from martin-frbg/fixup-3613
Initial attempt at proper cpu detection on RISCV
Martin Kroeker [Wed, 4 May 2022 06:58:56 +0000 (08:58 +0200)]
Initial attempt at proper cpu detection on RISCV
Martin Kroeker [Wed, 4 May 2022 05:22:47 +0000 (07:22 +0200)]
Merge pull request #3613 from Rabenda/fix-riscv
Fix riscv64 detect
Martin Kroeker [Wed, 4 May 2022 05:22:25 +0000 (07:22 +0200)]
Merge pull request #3618 from martin-frbg/issue3606
Automatically downgrade C910V to RISCV64_GENERIC if the compiler lacks vector support
Martin Kroeker [Tue, 3 May 2022 21:29:55 +0000 (23:29 +0200)]
Have getarch downgrade the RISCV C910V target to GENERIC if compiler lacks vector support
Martin Kroeker [Tue, 3 May 2022 21:27:50 +0000 (23:27 +0200)]
Add compiler check for RISCV vector support
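
A compiler capability check of this kind can be illustrated with a predefined feature macro. The sketch below is not the check added by this commit (which lives in the build scripts and may work differently); it assumes the compiler follows the RISC-V C API convention of defining `__riscv_vector` when the vector extension is enabled for the target. The function name `compiler_has_rvv` is hypothetical.

```c
/* Hypothetical helper, not part of OpenBLAS.
 * Reports whether this translation unit was compiled with the RISC-V
 * vector extension enabled, based on predefined compiler macros. */
static int compiler_has_rvv(void)
{
#if defined(__riscv) && defined(__riscv_vector)
    return 1;   /* RISC-V target with vector support enabled */
#else
    return 0;   /* not RISC-V, or no vector support in this compiler/target */
#endif
}
```

A build system could use a result like this to fall back from a vector-specific target to a generic one when the function reports 0.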
Martin Kroeker [Sat, 30 Apr 2022 22:09:20 +0000 (00:09 +0200)]
Merge pull request #3616 from martin-frbg/issue3615
Fix CMAKE generator rules for ?laswp_ncopy and ?neg_tcopy kernels
Martin Kroeker [Sat, 30 Apr 2022 18:38:09 +0000 (20:38 +0200)]
rename lapack subtarget to lapack_overrides to avoid name clash with netlib in case-insensitive settings
Martin Kroeker [Sat, 30 Apr 2022 18:35:17 +0000 (20:35 +0200)]
Merge pull request #3614 from martin-frbg/clapackfix
Makefile fixes related to C_LAPACK, plus Travis CI fixes
Martin Kroeker [Sat, 30 Apr 2022 16:49:04 +0000 (18:49 +0200)]
Update .travis.yml
Martin Kroeker [Sat, 30 Apr 2022 16:33:00 +0000 (18:33 +0200)]
try to fix assembler errors on z13
Martin Kroeker [Sat, 30 Apr 2022 13:28:38 +0000 (15:28 +0200)]
Fix generator rules for ?laswp_ncopy and ?neg_tcopy
Martin Kroeker [Wed, 27 Apr 2022 20:18:22 +0000 (22:18 +0200)]
fix arch tags
Martin Kroeker [Wed, 27 Apr 2022 19:59:45 +0000 (21:59 +0200)]
Remove leftover debug output
Martin Kroeker [Wed, 27 Apr 2022 18:31:42 +0000 (20:31 +0200)]
Avoid adding -lgfortran with NOFORTRAN
Martin Kroeker [Wed, 27 Apr 2022 18:26:45 +0000 (20:26 +0200)]
Update NOFORTRAN message for fallback to C_LAPACK
Han Gao [Tue, 26 Apr 2022 18:29:43 +0000 (02:29 +0800)]
Fix riscv64 arch detect
Signed-off-by: Han Gao <gaohan@uniontech.com>
Han Gao [Tue, 26 Apr 2022 17:34:55 +0000 (01:34 +0800)]
Fix other arch build in detect.
When CORE was empty, -march=loongson3a was used. Fix it.
Signed-off-by: Han Gao <gaohan@uniontech.com>
Martin Kroeker [Mon, 25 Apr 2022 13:51:34 +0000 (15:51 +0200)]
Merge pull request #3612 from nsait-linaro/fix-windows-make-build
build: minor fixes to build on windows with make
Niyas Sait [Sun, 24 Apr 2022 22:58:26 +0000 (23:58 +0100)]
build: minor fixes to build on windows with make
This patch contains the following fixes:
1. Fix to build without PIC flag
2. Define LAPACK_COMPLEX_STRUCTURE for windows. Builds are failing
without it and changes are consistent with the CMake rules defined
in system.cmake (line 576)
Martin Kroeker [Sun, 17 Apr 2022 15:49:38 +0000 (17:49 +0200)]
C_LAPACK: Fixes to make it compile with MSVC (#3605)
* Fix f2c-like support functions to compile with MSVC, and
re-enable C_LAPACK for MSVC in CMAKE
* Add MSVC&flang build to Azure CI in order to check C_LAPACK correctness
Martin Kroeker [Sat, 16 Apr 2022 10:54:35 +0000 (12:54 +0200)]
Merge pull request #3607 from martin-frbg/issue3603
Fix undefined PREFETCHSIZEs in PPC440 GEMV kernels
Martin Kroeker [Sat, 16 Apr 2022 08:04:27 +0000 (10:04 +0200)]
fix undefined prefetchsizes
Martin Kroeker [Sat, 16 Apr 2022 08:00:10 +0000 (10:00 +0200)]
fix undefined prefetchsize
Martin Kroeker [Mon, 11 Apr 2022 17:31:26 +0000 (19:31 +0200)]
Merge pull request #3604 from mmuetzel/ci
Adapt commands for tests with GNU make.
Markus Mützel [Mon, 11 Apr 2022 09:45:05 +0000 (11:45 +0200)]
Adapt commands for tests with GNU make.
Martin Kroeker [Sat, 9 Apr 2022 20:38:58 +0000 (22:38 +0200)]
Use f2c translations of LAPACK when no Fortran compiler is available (#3539)
* Add C equivalents of the Fortran routines from Reference-LAPACK as fallbacks, and C_LAPACK variable to trigger their use
Martin Kroeker [Sat, 9 Apr 2022 20:25:15 +0000 (22:25 +0200)]
Merge pull request #3601 from mmuetzel/ci
Consolidate actions on GitHub runners.
Martin Kroeker [Sat, 9 Apr 2022 20:23:45 +0000 (22:23 +0200)]
Merge pull request #3602 from martin-frbg/fixup3600
Fix missing braces from previous commit (PR3600)
Martin Kroeker [Sat, 9 Apr 2022 18:03:36 +0000 (20:03 +0200)]
Fix missing braces from previous commit (PR3600)
Markus Mützel [Sat, 9 Apr 2022 16:46:27 +0000 (18:46 +0200)]
Consolidate actions on GitHub runners.
Re-organize build matrix for Ubuntu and MacOS runners.
Don't start runners that don't do anything.
Run tests.
Martin Kroeker [Sat, 9 Apr 2022 15:14:24 +0000 (17:14 +0200)]
Disable flang (over-)optimizations in BLAS tests (#3600)
* limit flang optimizations to -O2
Martin Kroeker [Thu, 7 Apr 2022 12:25:15 +0000 (14:25 +0200)]
Merge pull request #3593 from e4t/Fix_build_targets_Makefile_prebuild
Fix build targets in Makefile.prebuild
Martin Kroeker [Thu, 7 Apr 2022 12:24:19 +0000 (14:24 +0200)]
Prevent powershell from adding a BOM to test input (#3595)
* Prevent addition of a BOM to test input (which would distort the names of output files)
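
To make the BOM problem concrete, here is a minimal sketch of how a leading byte-order mark could be recognized in a buffer read from a test input file. It is an illustration only, not code from OpenBLAS; the names `bom_kind` and `detect_bom` are hypothetical, and the byte sequences are the standard UTF-8 and UTF-16 marks.

```c
#include <stddef.h>

/* Hypothetical classification of a leading byte-order mark. */
enum bom_kind { BOM_NONE = 0, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE };

/* Inspect the first bytes of buf and report which BOM, if any, is present. */
static enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;        /* EF BB BF */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;    /* FF FE */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;    /* FE FF */
    return BOM_NONE;
}
```

If such bytes precede the first token of an input file, a program that takes an output file name from that token would include them in the name.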
Egbert Eich [Mon, 21 Mar 2022 10:16:49 +0000 (11:16 +0100)]
Fix build targets in Makefile.prebuild
- config.h was used as target even when it wasn't generated.
This only worked because the 'dummy' target always triggers
a full rebuild.
It is however better to specify the exact target that is to
be rebuilt to avoid confusion.
- Explicitly mark 'dummy' as a 'phony' target.
Signed-off-by: Egbert Eich <eich@suse.com>
Martin Kroeker [Sun, 3 Apr 2022 17:53:38 +0000 (19:53 +0200)]
Merge pull request #3590 from mmuetzel/ci-msys2
Add action for MSYS2 builds.
Markus Mützel [Thu, 31 Mar 2022 09:07:18 +0000 (11:07 +0200)]
Add action for MSYS2 builds.
Martin Kroeker [Tue, 29 Mar 2022 18:04:04 +0000 (20:04 +0200)]
Merge pull request #3575 from mmuetzel/lapacke-win64
Fix LAPACKE with 64-bit indexing on Windows.
Martin Kroeker [Tue, 29 Mar 2022 17:36:27 +0000 (19:36 +0200)]
Merge pull request #3588 from martin-frbg/fix3586
Fix mistaken declaration of CortexX1 as ArmV9 in PR#3586
Martin Kroeker [Tue, 29 Mar 2022 17:35:56 +0000 (19:35 +0200)]
Merge pull request #3589 from e4t/Exclude_paramter.c_symbols_with_DYNAMIC_ARCH
Do not include symbols defined in driver/others/parameter.c in DYNAMI…
Egbert Eich [Sun, 13 Mar 2022 09:57:59 +0000 (10:57 +0100)]
Do not include symbols defined in driver/others/parameter.c in DYNAMIC_ARCH
driver/others/parameter.c does not get built during DYNAMIC_ARCH, thus,
do not declare its symbols. This will make the build fail early and in
an obvious way if functions are trying to use these symbols.
Signed-off-by: Egbert Eich <eich@suse.com>
Martin Kroeker [Mon, 28 Mar 2022 16:10:08 +0000 (18:10 +0200)]
Update param.h
Martin Kroeker [Mon, 28 Mar 2022 15:40:27 +0000 (17:40 +0200)]
Cortex X1 is only Arm8.2
Martin Kroeker [Mon, 28 Mar 2022 15:37:06 +0000 (17:37 +0200)]
fix defines for CORTEX-X
Martin Kroeker [Mon, 28 Mar 2022 15:31:26 +0000 (17:31 +0200)]
CortexX1 is only ArmV8
Martin Kroeker [Mon, 28 Mar 2022 15:28:29 +0000 (17:28 +0200)]
CortexX1 is ARMV8 like A7x
Martin Kroeker [Mon, 28 Mar 2022 15:18:56 +0000 (17:18 +0200)]
CortexX1 is only ARMV8
Martin Kroeker [Mon, 28 Mar 2022 12:58:32 +0000 (14:58 +0200)]
Merge pull request #3587 from e4t/fix_avx512
Use CC and full command line instead of hard-coding gcc for AVX512 ch…
Egbert Eich [Mon, 28 Mar 2022 06:14:52 +0000 (08:14 +0200)]
Use CC and full command line instead of hard-coding gcc for AVX512 checking
Hard-coding gcc may provide incorrect results when a different compiler
for the target build is used. To remain in sync with the main call to c_check,
pass the full command line.
Signed-off-by: Egbert Eich <eich@suse.com>
Martin Kroeker [Sun, 27 Mar 2022 16:12:21 +0000 (18:12 +0200)]
Merge pull request #3586 from martin-frbg/arm64cpus
Initial support for M1 on Linux, Phytium FT2000 series, ARMV9 Cortex X1,X2,A510,A710
Martin Kroeker [Sun, 27 Mar 2022 13:29:20 +0000 (15:29 +0200)]
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
Martin Kroeker [Sun, 27 Mar 2022 13:26:42 +0000 (15:26 +0200)]
Add initial support for ARMV9 Cortex 510/710/X1/X2
Martin Kroeker [Sun, 27 Mar 2022 13:24:40 +0000 (15:24 +0200)]
Add initial support for M1 on Linux, Phytium FT2xxx series, ARM Cortex 510/710/X1/X2
Martin Kroeker [Sun, 27 Mar 2022 13:19:26 +0000 (15:19 +0200)]
Merge pull request #3585 from martin-frbg/issue3581
Revert accidental change of generic ARMV8 DGEMM parameters from #3425
Martin Kroeker [Sun, 27 Mar 2022 11:10:47 +0000 (13:10 +0200)]
Revert accidental change of generic ARMV8 DGEMM parameters from #3425
Martin Kroeker [Fri, 25 Mar 2022 13:35:15 +0000 (14:35 +0100)]
Merge pull request #3584 from martin-frbg/ctestskip
Add a (CMAKE) option to skip the LAPACK testsuite and use it in Azure CI
Markus Mützel [Fri, 25 Mar 2022 12:37:15 +0000 (13:37 +0100)]
Add support for Intel Fortran compilers.
Port changes from upstream Reference-LAPACK.
Martin Kroeker [Thu, 24 Mar 2022 20:25:16 +0000 (21:25 +0100)]
Disable the LAPACK testsuite for the Windows clang/flang build as it takes too long
Martin Kroeker [Thu, 24 Mar 2022 20:23:28 +0000 (21:23 +0100)]
Add LAPACK-like option to omit the LAPACK testsuite
Larson, Eric [Fri, 24 Sep 2021 20:03:59 +0000 (13:03 -0700)]
ILP support
longs on Windows are 4 bytes (MSVS, Intel compilers). Use int64_t and int32_t
to ensure 8 byte integers for ILP interface.
support 8 byte integer flag for intel ifort compiler
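
The point about integer widths can be sketched as follows. This is an illustration under assumptions, not the code from this commit: the typedef name `ilp_int` and the function `matrix_elems` are hypothetical. The fixed-width types from `<stdint.h>` have the same size on every platform, whereas `long` is 4 bytes on 64-bit Windows and 8 bytes on typical 64-bit Linux.

```c
#include <stdint.h>

/* Hypothetical index type for an ILP64-style interface:
 * int64_t is exactly 8 bytes regardless of platform. */
typedef int64_t ilp_int;

/* Number of elements in a rows x cols matrix. With a 4-byte index type
 * this product would overflow for large matrices. */
static ilp_int matrix_elems(ilp_int rows, ilp_int cols)
{
    return rows * cols;
}
```

For example, a 70000 x 70000 matrix has 4,900,000,000 elements, which does not fit in a 32-bit signed integer.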
Aisha Tammy [Sun, 1 Nov 2020 02:43:56 +0000 (02:43 +0000)]
create INDEX64 target
Martin Kroeker [Thu, 24 Mar 2022 10:33:00 +0000 (11:33 +0100)]
Merge pull request #3580 from martin-frbg/dynx86_sbgemm
Remove extraneous (and wrong) definition of sbgemm_r on x86_64
Martin Kroeker [Thu, 24 Mar 2022 07:28:37 +0000 (08:28 +0100)]
Merge pull request #3579 from martin-frbg/issue3557-2
Fix malfunctioning AVX512 check
Martin Kroeker [Wed, 23 Mar 2022 19:05:32 +0000 (20:05 +0100)]
Remove extraneous (and wrong) definition of sbgemm_r on x86_64
Martin Kroeker [Wed, 23 Mar 2022 18:13:54 +0000 (19:13 +0100)]
Merge branch 'xianyi:develop' into issue3557-2
Martin Kroeker [Wed, 23 Mar 2022 14:48:58 +0000 (15:48 +0100)]
Fix checks for AVX512 and atomics
Martin Kroeker [Wed, 23 Mar 2022 14:22:13 +0000 (15:22 +0100)]
Revert AVX512 capability check from PR #1980 (moved to build)
Martin Kroeker [Wed, 23 Mar 2022 14:19:55 +0000 (15:19 +0100)]
Utilize compiler AVX512 capability info from c_check when building getarch
Martin Kroeker [Wed, 23 Mar 2022 10:28:13 +0000 (11:28 +0100)]
Merge pull request #3561 from AlessioZanga/patch-msvc
Remove MSVC limitation
Martin Kroeker [Wed, 23 Mar 2022 06:19:15 +0000 (07:19 +0100)]
Merge pull request #3576 from martin-frbg/cmaketestbom
Skip BLAS tests if Windows powershell added a BOM
Martin Kroeker [Wed, 23 Mar 2022 06:18:45 +0000 (07:18 +0100)]
Merge pull request #3577 from martin-frbg/azure_win2022
Update Windows jobs in Azure CI to use Windows2022
Martin Kroeker [Tue, 22 Mar 2022 20:51:09 +0000 (21:51 +0100)]
Update Windows jobs in Azure CI to use Windows2022
Martin Kroeker [Tue, 22 Mar 2022 20:37:55 +0000 (21:37 +0100)]
Skip tests if Windows powershell added a BOM
Martin Kroeker [Sat, 19 Mar 2022 08:21:56 +0000 (09:21 +0100)]
Merge pull request #3574 from AdamNiederer/fix-dynamic-list-compilation
Fix broken elif in dynamic.c
Adam Niederer [Fri, 18 Mar 2022 00:02:39 +0000 (20:02 -0400)]
Fix broken elif in dynamic.c
This fixes compilation in the following case:
$(MAKE) USE_OPENMP=1 USE_THREAD=1 NO_LAPACK=0 DYNAMIC_ARCH=1 \
DYNAMIC_LIST="HASWELL SKYLAKEX ATOM COOPERLAKE SAPPHIRERAPIDS ZEN"
Martin Kroeker [Sat, 12 Mar 2022 12:40:17 +0000 (13:40 +0100)]
Merge pull request #3567 from cenewcombe/develop
Fix unsafe read of Y in zsymv_L_sse2.S
Caroline Newcombe [Fri, 11 Mar 2022 17:56:33 +0000 (11:56 -0600)]
fix unsafe read of Y in assembly kernel
Martin Kroeker [Fri, 11 Mar 2022 13:29:30 +0000 (14:29 +0100)]
Merge pull request #3565 from jonaszhou1/develop
Support Zhaoxin/Centaur kh40000 as ZEN
Martin Kroeker [Fri, 11 Mar 2022 13:27:27 +0000 (14:27 +0100)]
Merge pull request #3566 from martin-frbg/configtls
Report USE_TLS in get_config output if set
Martin Kroeker [Thu, 10 Mar 2022 15:19:29 +0000 (16:19 +0100)]
Report USE_TLS if set
JonasZhou [Fri, 4 Mar 2022 09:14:52 +0000 (17:14 +0800)]
Support Zhaoxin/Centaur kh40000 as ZEN
Signed-off-by: JonasZhou <JonasZhou@zhaoxin.com>
AlessioZanga [Sat, 5 Mar 2022 22:35:29 +0000 (23:35 +0100)]
Change `BUILD_WITHOUT_LAPACK` to `OFF` by default
Alessio Zanga [Fri, 4 Mar 2022 23:07:01 +0000 (00:07 +0100)]
Remove MSVC limitation
Martin Kroeker [Mon, 28 Feb 2022 07:28:02 +0000 (08:28 +0100)]
Merge pull request #3550 from guowangy/smatrix-mask-fix
Small Matrix: use proper inline asm input constraint for AVX512 mask
Wangyang Guo [Mon, 28 Feb 2022 03:22:31 +0000 (03:22 +0000)]
Small Matrix: use proper inline asm input constraint for AVX512 mask
Martin Kroeker [Sat, 26 Feb 2022 20:49:05 +0000 (21:49 +0100)]
Merge pull request #3549 from martin-frbg/issue3543
Annotate LAPACKE_lsame with attribute const for GCC(+compatible)
Martin Kroeker [Sat, 26 Feb 2022 20:48:39 +0000 (21:48 +0100)]
Merge pull request #3548 from martin-frbg/rela-gemmt
Enable the ?GEMMT functions in ReLAPACK
Martin Kroeker [Sat, 26 Feb 2022 18:27:34 +0000 (19:27 +0100)]
Annotate LAPACKE_lsame with the const attribute for GCC and compatible compilers
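
As a rough illustration of what such an annotation looks like, the sketch below marks a small character-comparison function with GCC's `const` function attribute, which tells the compiler that the result depends only on the arguments so that repeated calls with the same arguments may be merged. The macro `EXAMPLE_CONST` and the function `example_lsame` are hypothetical stand-ins, not the actual LAPACKE declarations.

```c
/* Expand to the attribute only on GCC-compatible compilers. */
#if defined(__GNUC__)
#define EXAMPLE_CONST __attribute__((const))
#else
#define EXAMPLE_CONST
#endif

/* Declaration carrying the attribute. */
EXAMPLE_CONST int example_lsame(char a, char b);

/* ASCII-only uppercase conversion; reads no global state. */
static char ascii_upper(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c;
}

/* Return 1 if a and b are the same letter ignoring case, else 0. */
int example_lsame(char a, char b)
{
    return ascii_upper(a) == ascii_upper(b);
}
```

The helper avoids locale-dependent library calls so that the function really does depend only on its arguments, which is what the attribute promises.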
Martin Kroeker [Sat, 26 Feb 2022 18:24:27 +0000 (19:24 +0100)]
Fix xGEMMT argument lists
Martin Kroeker [Sat, 26 Feb 2022 18:23:40 +0000 (19:23 +0100)]
Enable xGEMMT functions
Martin Kroeker [Fri, 25 Feb 2022 20:54:11 +0000 (21:54 +0100)]
Merge pull request #3547 from martin-frbg/issue3540-2
More build fixes for CooperLake with BFLOAT16 and DYNAMIC_ARCH
Martin Kroeker [Fri, 25 Feb 2022 14:36:02 +0000 (15:36 +0100)]
really fix definition of SHUFFLE_MAGIC_NO
Martin Kroeker [Fri, 25 Feb 2022 14:33:02 +0000 (15:33 +0100)]
Remove stray $
Martin Kroeker [Fri, 25 Feb 2022 09:05:36 +0000 (10:05 +0100)]
Declare SHUFFLE_MAGIC_NO as const to placate clang
Martin Kroeker [Fri, 25 Feb 2022 09:04:00 +0000 (10:04 +0100)]
Define sbgemm_r to fix DYNAMIC_ARCH builds
Martin Kroeker [Wed, 23 Feb 2022 23:00:00 +0000 (00:00 +0100)]
Merge pull request #3542 from martin-frbg/issue3540
Fix compilation for CooperLake on Windows/clang
Martin Kroeker [Wed, 23 Feb 2022 22:57:57 +0000 (23:57 +0100)]
Merge pull request #3544 from giordano/mg/gcc6
Fix compilation of Skylake AVX512 kernels with GCC 6
Mosè Giordano [Wed, 23 Feb 2022 22:51:59 +0000 (22:51 +0000)]
Fix compilation of Skylake AVX512 kernels with GCC 6
Martin Kroeker [Wed, 23 Feb 2022 22:13:53 +0000 (23:13 +0100)]
Merge pull request #3541 from martin-frbg/issue3530
Fix compilation for SkylakeX with gcc 6.x