platform/upstream/openblas.git
Arjan van de Ven [Sat, 11 Aug 2018 17:14:57 +0000 (17:14 +0000)]
Add an AVX512-enabled DSCAL function

Written in C intrinsics for readability.
(The same C code also works for Haswell.)

For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
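A minimal sketch of what an intrinsics-based DSCAL with a compile-time fallback can look like. It is not the OpenBLAS kernel. The real code falls back to the Haswell AVX2 kernel based on the compiler version, whereas here the compiler-defined macros `__AVX512F__` and `__AVX2__` choose the path. The name `dscal_sketch` is hypothetical, and only unit stride is handled.

```c
#include <stddef.h>
#if defined(__AVX512F__) || defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical unit-stride DSCAL: x[i] *= alpha for i in [0, n). */
void dscal_sketch(size_t n, double alpha, double *x)
{
    size_t i = 0;
#if defined(__AVX512F__)
    /* AVX512 path: eight doubles per 512-bit register. */
    __m512d va = _mm512_set1_pd(alpha);
    for (; i + 8 <= n; i += 8)
        _mm512_storeu_pd(&x[i], _mm512_mul_pd(_mm512_loadu_pd(&x[i]), va));
#elif defined(__AVX2__)
    /* Fallback for AVX2-class (Haswell) targets: four doubles per register. */
    __m256d va = _mm256_set1_pd(alpha);
    for (; i + 4 <= n; i += 4)
        _mm256_storeu_pd(&x[i], _mm256_mul_pd(_mm256_loadu_pd(&x[i]), va));
#endif
    /* Scalar remainder; this is the whole loop when neither path is compiled in. */
    for (; i < n; i++)
        x[i] *= alpha;
}
```

Each vector path handles only full 8- or 4-element chunks. The scalar loop finishes whatever is left.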

Martin Kroeker [Sat, 11 Aug 2018 09:01:20 +0000 (11:01 +0200)]
Merge pull request #1725 from fenrus75/axpy

Add AVX512-enabled SAXPY/DAXPY functions

Martin Kroeker [Sat, 11 Aug 2018 09:00:56 +0000 (11:00 +0200)]
Merge pull request #1724 from fenrus75/sdot

Add an AVX512-enabled SDOT function

Martin Kroeker [Fri, 10 Aug 2018 11:24:36 +0000 (13:24 +0200)]
Merge pull request #1728 from martin-frbg/changelog

Add changes from the 0.3.x releases

Martin Kroeker [Fri, 10 Aug 2018 11:23:47 +0000 (13:23 +0200)]
Add changes from the 0.3.x releases

fixes #1727

Arjan van de Ven [Fri, 10 Aug 2018 02:58:32 +0000 (02:58 +0000)]
Add AVX512-enabled SAXPY/DAXPY functions

Written in C intrinsics for readability.
(The same C code also works for Haswell.)

For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
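A sketch of the same approach applied to DAXPY (`y := alpha*x + y`). It assumes unit stride and handles the tail with an AVX512 mask register instead of a scalar loop. `daxpy_sketch` is a hypothetical name. Gating on `__AVX512F__` stands in for OpenBLAS's compiler-version check.

```c
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical unit-stride DAXPY: y[i] = alpha * x[i] + y[i] for i in [0, n). */
void daxpy_sketch(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;
#ifdef __AVX512F__
    __m512d va = _mm512_set1_pd(alpha);
    /* Full 8-element chunks with fused multiply-add. */
    for (; i + 8 <= n; i += 8) {
        __m512d vy = _mm512_fmadd_pd(va, _mm512_loadu_pd(&x[i]), _mm512_loadu_pd(&y[i]));
        _mm512_storeu_pd(&y[i], vy);
    }
    /* Masked tail: only the low (n - i) lanes are loaded and stored. */
    if (i < n) {
        __mmask8 m = (__mmask8)((1u << (n - i)) - 1u);
        __m512d vx = _mm512_maskz_loadu_pd(m, &x[i]);
        __m512d vy = _mm512_maskz_loadu_pd(m, &y[i]);
        _mm512_mask_storeu_pd(&y[i], m, _mm512_fmadd_pd(va, vx, vy));
        i = n;
    }
#endif
    /* Scalar loop: does the work only when the AVX512 path is not compiled in. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

The vector path uses fused multiply-add. Its results can therefore differ in the last bit from a separate multiply and add for inputs that are not exactly representable.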

Arjan van de Ven [Fri, 10 Aug 2018 02:31:48 +0000 (02:31 +0000)]
Add an AVX512-enabled SDOT function

Written in C intrinsics for readability.
(The same C code also works for Haswell.)

For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.

Martin Kroeker [Thu, 9 Aug 2018 13:39:06 +0000 (15:39 +0200)]
Merge pull request #1721 from fenrus75/ddot2

Add an AVX512-enabled DDOT function

Arjan van de Ven [Wed, 8 Aug 2018 02:59:11 +0000 (02:59 +0000)]
Add an AVX512-enabled DDOT function

Written in C intrinsics for readability.
(The same C code also works for Haswell.)

For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
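In a dot product, the vector loop accumulates into a register and reduces it once at the end. In this sketch, `_mm512_reduce_add_pd` performs the horizontal sum. It is a convenience intrinsic that may be missing from older compilers. The function name is hypothetical, and `__AVX512F__` again stands in for the real compiler check.

```c
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical unit-stride DDOT: returns sum of x[i] * y[i] for i in [0, n). */
double ddot_sketch(size_t n, const double *x, const double *y)
{
    size_t i = 0;
    double dot = 0.0;
#ifdef __AVX512F__
    /* Eight independent partial sums, one per lane. */
    __m512d acc = _mm512_setzero_pd();
    for (; i + 8 <= n; i += 8)
        acc = _mm512_fmadd_pd(_mm512_loadu_pd(&x[i]), _mm512_loadu_pd(&y[i]), acc);
    /* Horizontal reduction of the eight lanes into one double. */
    dot = _mm512_reduce_add_pd(acc);
#endif
    /* Scalar remainder (or the whole sum without AVX512). */
    for (; i < n; i++)
        dot += x[i] * y[i];
    return dot;
}
```

Summing in eight lanes changes the order of additions compared with the scalar loop. In general, the two paths can therefore differ slightly.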

Martin Kroeker [Mon, 6 Aug 2018 20:05:47 +0000 (22:05 +0200)]
Merge pull request #1717 from martin-frbg/issue1708

Add workaround for AVX512 builds on Cygwin

Martin Kroeker [Mon, 6 Aug 2018 14:40:32 +0000 (16:40 +0200)]
Add workaround for AVX512 builds on Cygwin

fixes #1708

Martin Kroeker [Sun, 5 Aug 2018 20:48:44 +0000 (22:48 +0200)]
Merge pull request #1715 from stevengj/patch-1

fix blasabs for windows

Steven G. Johnson [Sun, 5 Aug 2018 12:18:51 +0000 (08:18 -0400)]
fix blasabs for windows

Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
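The `blasabs` switch can be sketched as a preprocessor macro. Under LLP64 (64-bit Windows), `long` is still 32 bits, so a 64-bit integer needs `llabs`. Under LP64 (most 64-bit Unix-like systems), `labs` is enough. The `blasint` typedefs and the use of `_WIN32` as the Windows test are assumptions made for this sketch, which also assumes a 64-bit target. OpenBLAS defines these through its own build configuration.

```c
#include <stdlib.h>

/* Assumed stand-ins for OpenBLAS's build configuration. */
#ifdef INTERFACE64
#  ifdef _WIN32
/* LLP64 (64-bit Windows): long is 32 bits, so 64-bit integers are long long. */
typedef long long blasint;
#    define blasabs(x) llabs(x)
#  else
/* LP64 (most 64-bit Unix-like systems): long is 64 bits. */
typedef long blasint;
#    define blasabs(x) labs(x)
#  endif
#else
/* Default 32-bit integer interface. */
typedef int blasint;
#  define blasabs(x) abs(x)
#endif
```

In each branch, the absolute-value function returns the same type as `blasint`, so no truncation happens on either LP64 or LLP64.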

Martin Kroeker [Sat, 4 Aug 2018 21:51:31 +0000 (23:51 +0200)]
Merge pull request #1713 from martin-frbg/issue1710

Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64

Martin Kroeker [Sat, 4 Aug 2018 21:51:10 +0000 (23:51 +0200)]
Merge pull request #1709 from stevengj/patch-1

fabs -> fabsl

Martin Kroeker [Sat, 4 Aug 2018 18:14:51 +0000 (20:14 +0200)]
fabs -> fabsl

Martin Kroeker [Sat, 4 Aug 2018 18:07:59 +0000 (20:07 +0200)]
Introduce blasabs() to switch between abs() and labs() for INTERFACE64

Martin Kroeker [Sat, 4 Aug 2018 18:06:49 +0000 (20:06 +0200)]
Use blasabs to switch between abs and labs as needed for INTERFACE64

Steven G. Johnson [Fri, 3 Aug 2018 17:00:10 +0000 (13:00 -0400)]
fabs -> fabsl

Fixes two calls that used `fabs` rather than `fabsl` on a `long double` argument, which appears to truncate the value to `double` precision unintentionally.
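A small illustration of the truncation. `fabs` takes a `double`, so a `long double` argument is converted before the absolute value is taken. `fabsl` keeps the full precision. The helper names are hypothetical. On platforms where `long double` has the same format as `double` (MSVC, for example), both calls give the same result.

```c
#include <float.h>
#include <math.h>

/* A value just above 1 that needs more than double precision
   whenever long double is wider than double. */
long double just_above_one(void)
{
    return 1.0L + LDBL_EPSILON;
}

/* Buggy pattern: fabs() takes a double, so x is converted (and may be
   rounded) to double before the absolute value is taken. */
long double abs_via_fabs(long double x)
{
    return fabs(x);
}

/* Fixed pattern: fabsl() operates on long double directly. */
long double abs_via_fabsl(long double x)
{
    return fabsl(x);
}
```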

Martin Kroeker [Thu, 2 Aug 2018 21:48:42 +0000 (23:48 +0200)]
Merge pull request #1703 from wsttiger/cmake_fix

Set EXPORT_NAME to match OpenBLASConfig.cmake

Martin Kroeker [Thu, 2 Aug 2018 20:27:00 +0000 (22:27 +0200)]
Merge pull request #1707 from extrowerk/haiku_support

Haiku supporting patches

Scott Thornton [Thu, 2 Aug 2018 19:58:52 +0000 (14:58 -0500)]
Added target_include_directories()

Zoltán Mizsei [Thu, 2 Aug 2018 18:49:14 +0000 (20:49 +0200)]
Haiku supporting patches

Martin Kroeker [Thu, 2 Aug 2018 16:53:34 +0000 (18:53 +0200)]
Merge pull request #1706 from oon3m0oo/develop

Fix #1705 where we incorrectly calculate page locations.

Craig Donner [Thu, 2 Aug 2018 15:21:19 +0000 (16:21 +0100)]
Fix #1705 where we incorrectly calculate page locations.

Since we now use an allocation size that isn't a multiple of PAGESIZE, the loop
that finds the pages for run_bench wasn't terminating properly. Now we detect when
we've found enough pages for the allocation and terminate the loop.
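A sketch of the termination condition described above, assuming a power-of-two page size. It collects the start address of every page that overlaps `[base, base + size)`. It stops once those pages cover the allocation, whether or not `size` is a multiple of the page size. `collect_pages` is hypothetical and does not reproduce the `run_bench` code in OpenBLAS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: store the start address of every page overlapping
   [base, base + size) in pages[], up to max_pages entries, and return the count.
   pagesize must be a power of two. */
size_t collect_pages(uintptr_t base, size_t size, size_t pagesize,
                     uintptr_t *pages, size_t max_pages)
{
    uintptr_t page = base & ~(uintptr_t)(pagesize - 1);  /* round down to a page boundary */
    uintptr_t end = base + size;                         /* one past the last byte */
    size_t count = 0;

    /* Stop as soon as the pages found cover the allocation. Comparing against
       `end` works whether or not size is a multiple of pagesize. */
    while (page < end && count < max_pages) {
        pages[count++] = page;
        page += pagesize;
    }
    return count;
}
```

For example, a 10000-byte allocation starting on a page boundary with 4096-byte pages spans three pages. A loop that waited for an exact multiple of the page size would never see its stopping condition.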

Scott Thornton [Mon, 30 Jul 2018 20:18:29 +0000 (15:18 -0500)]
Set EXPORT_NAME to match OpenBLASConfig.cmake

Martin Kroeker [Mon, 30 Jul 2018 06:23:13 +0000 (08:23 +0200)]
Set version to 0.3.3.dev

Martin Kroeker [Mon, 30 Jul 2018 06:22:38 +0000 (08:22 +0200)]
Set version to 0.3.3.dev

Martin Kroeker [Sun, 29 Jul 2018 20:37:09 +0000 (22:37 +0200)]
Merge branch 'release-0.3.0' into develop

Martin Kroeker [Wed, 25 Jul 2018 17:55:29 +0000 (19:55 +0200)]
Merge pull request #1697 from martin-frbg/issue1696

Do not treat Windows UWP builds as cross-compiling

Martin Kroeker [Tue, 24 Jul 2018 15:46:33 +0000 (17:46 +0200)]
Do not treat Windows UWP builds as cross-compiling

Martin Kroeker [Sun, 22 Jul 2018 14:34:09 +0000 (16:34 +0200)]
Merge pull request #1695 from martin-frbg/issue1692

On shutdown, unset the memory table entry, not just the local pointer to it

Martin Kroeker [Sun, 22 Jul 2018 07:19:19 +0000 (09:19 +0200)]
On shutdown, unset the memory table entry, not just the temporary pointer to it

This fixes a crash with multiple instances of OpenBLAS (#1692).
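The pitfall named in the commit message can be shown with a hypothetical table of buffer pointers. Setting a local copy of a table entry to `NULL` leaves the entry itself pointing at freed memory. A later shutdown, or a second user of the table, can then free or reuse that memory again. The names `memory_table`, `NUM_SLOTS`, `shutdown_buggy` and `shutdown_fixed` are illustrative, not the identifiers used in OpenBLAS.

```c
#include <stdlib.h>

#define NUM_SLOTS 4                    /* hypothetical table size */

static void *memory_table[NUM_SLOTS];  /* hypothetical table of buffers */

/* Buggy: p is a copy of the entry, so clearing it leaves memory_table[i]
   pointing at freed memory. */
void shutdown_buggy(void)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        void *p = memory_table[i];
        free(p);
        p = NULL;                      /* no effect on memory_table[i] */
    }
}

/* Fixed: clear the table entry itself, so a later shutdown sees NULL. */
void shutdown_fixed(void)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        free(memory_table[i]);
        memory_table[i] = NULL;
    }
}
```

With the fixed version, a repeated shutdown is harmless because `free(NULL)` is a no-op. The buggy version would free the same buffers twice.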

Martin Kroeker [Thu, 19 Jul 2018 17:03:45 +0000 (19:03 +0200)]
Merge pull request #1688 from martin-frbg/issue1673

Temporarily disable special handling of OPENMP thread memory allocation

Martin Kroeker [Thu, 19 Jul 2018 06:57:56 +0000 (08:57 +0200)]
Temporarily disable special handling of OPENMP thread memory allocation

for issue #1673

Martin Kroeker [Mon, 16 Jul 2018 20:47:05 +0000 (22:47 +0200)]
Merge pull request #1681 from martin-frbg/issue1671

Add CPU identification via the mfpvr instruction for the BSDs

Martin Kroeker [Mon, 16 Jul 2018 20:46:49 +0000 (22:46 +0200)]
Merge pull request #1684 from martin-frbg/issue1672

Work around utest failures in the MIPS64 SICORTEX target

Martin Kroeker [Mon, 16 Jul 2018 10:56:39 +0000 (12:56 +0200)]
typo fix

Martin Kroeker [Sun, 15 Jul 2018 15:11:40 +0000 (17:11 +0200)]
Fix precision problem in DSDOT
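The log does not describe the precision problem itself. For context, DSDOT computes the dot product of two single-precision vectors with double-precision accumulation. A minimal unit-stride sketch (hypothetical name `dsdot_sketch`) widens each element before multiplying and adding.

```c
#include <stddef.h>

/* Hypothetical unit-stride DSDOT: float inputs, double accumulation and result. */
double dsdot_sketch(size_t n, const float *x, const float *y)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * (double)y[i];   /* widen before multiplying and adding */
    return acc;
}
```

Take `x = {16777216.0f, 1.0f, 1.0f}` with `y` all ones. This returns 16777218.0. A `float` accumulator evaluated in single precision rounds each partial sum back to 16777216.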

Martin Kroeker [Sun, 15 Jul 2018 15:09:55 +0000 (17:09 +0200)]
Use C kernels for default c/zAXPY, xROT, c/zSWAP

Martin Kroeker [Thu, 12 Jul 2018 21:39:00 +0000 (23:39 +0200)]
Add CPU identification via the mfpvr instruction for the BSDs

fixes #1671

Martin Kroeker [Thu, 12 Jul 2018 12:05:13 +0000 (14:05 +0200)]
Merge pull request #1680 from martin-frbg/snprint

Fix wrong redefinitions of snprintf for older MSVC

Martin Kroeker [Thu, 12 Jul 2018 09:47:52 +0000 (11:47 +0200)]
Fix declaration of snprintf for older MSVC

_snprintf_s takes an additional (size) argument, so it is not a direct replacement.
(Note that this code is currently unused: the two instances of snprintf here are within ifdef blocks that are not compiled for MSVC.)

Martin Kroeker [Thu, 12 Jul 2018 09:42:25 +0000 (11:42 +0200)]
Fix definition of snprintf for MSVC

MS _snprintf_s takes an additional argument for the buffer size, so it is not a direct replacement (utest/ctest.h, from which I copied the definition, was wrong).
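A sketch of a wrapper that avoids the problem described. Instead of `#define`-ing `snprintf` to `_snprintf_s`, it calls the `va_list` variant. On older MSVC (before Visual Studio 2015, `_MSC_VER < 1900`), that means `_vsnprintf_s` with an explicit buffer size and `_TRUNCATE` as the count. Elsewhere it means C99 `vsnprintf`. `portable_snprintf` is a hypothetical name.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical snprintf replacement that also works on pre-2015 MSVC. */
int portable_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
#if defined(_MSC_VER) && _MSC_VER < 1900
    /* _vsnprintf_s takes the buffer size AND a maximum count;
       _TRUNCATE means "write as much as fits". Returns -1 on truncation. */
    n = _vsnprintf_s(buf, size, _TRUNCATE, fmt, ap);
#else
    /* C99: returns the length the full output would have had. */
    n = vsnprintf(buf, size, fmt, ap);
#endif
    va_end(ap);
    return n;
}
```

The two branches differ only when the output is truncated. `_vsnprintf_s` returns -1, while `vsnprintf` returns the untruncated length. Callers that inspect the return value need to account for this.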

Martin Kroeker [Thu, 12 Jul 2018 07:21:34 +0000 (09:21 +0200)]
Merge pull request #1678 from martin-frbg/issue1677

Define snprintf for older versions of MSVC

Martin Kroeker [Thu, 12 Jul 2018 05:30:58 +0000 (07:30 +0200)]
Define snprintf for older versions of MSVC

for #1677

Martin Kroeker [Wed, 4 Jul 2018 06:27:21 +0000 (08:27 +0200)]
Merge pull request #1667 from xianyi/revert-1642-develop

Revert "Rewrite &= -> = and simplify the initial blocking phase."

Martin Kroeker [Wed, 4 Jul 2018 06:19:40 +0000 (08:19 +0200)]
Merge pull request #1665 from martin-frbg/cpuid-ryzen2

Add cpuid for AMD Ryzen 2

Martin Kroeker [Wed, 4 Jul 2018 06:19:11 +0000 (08:19 +0200)]
Merge pull request #1663 from martin-frbg/issue1641

Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave

Martin Kroeker [Tue, 3 Jul 2018 19:42:28 +0000 (21:42 +0200)]
Revert "Rewrite &= -> = and simplify the initial blocking phase."

Martin Kroeker [Tue, 3 Jul 2018 19:03:24 +0000 (21:03 +0200)]
Add cpuid for AMD Ryzen 2

Martin Kroeker [Tue, 3 Jul 2018 19:01:35 +0000 (21:01 +0200)]
Add cpuid for AMD Ryzen 2

for #1664
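Identifying a CPU generation starts from the family and model fields of CPUID leaf 1 (EAX). The sketch below decodes them under AMD's convention, where the extended family and model fields apply only when the base family is 0xF. AMD's Zen-based processors report family 0x17. The function name is hypothetical. The sketch does not reproduce how OpenBLAS maps model numbers to its core types.

```c
#include <stdint.h>

/* Decode family and model from CPUID leaf 1 EAX using AMD's convention:
   the extended fields are added only when the base family is 0xF. */
void decode_cpuid_signature(uint32_t eax, unsigned *family, unsigned *model)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    unsigned ext_model   = (eax >> 16) & 0xF;

    *family = base_family;
    *model  = base_model;
    if (base_family == 0xF) {
        *family += ext_family;         /* e.g. 0xF + 0x8 = 0x17 for Zen-based parts */
        *model  |= ext_model << 4;
    }
}
```

For example, a signature of `0x00800F82` decodes to family 0x17, model 0x08.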

Martin Kroeker [Tue, 3 Jul 2018 15:40:09 +0000 (17:40 +0200)]
Merge pull request #1662 from martin-frbg/cmake-avx512

Add -march=skylake-avx512 to AVX512 compile check and suppress its ou…

Martin Kroeker [Tue, 3 Jul 2018 15:35:54 +0000 (17:35 +0200)]
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave

for #1641

Martin Kroeker [Tue, 3 Jul 2018 12:41:44 +0000 (14:41 +0200)]
Add -march=skylake-avx512 to AVX512 compile check and suppress its output

Martin Kroeker [Mon, 2 Jul 2018 15:48:19 +0000 (17:48 +0200)]
Merge pull request #1660 from martin-frbg/issue1659

Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2

Martin Kroeker [Mon, 2 Jul 2018 12:40:41 +0000 (14:40 +0200)]
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2

fixes #1659

Martin Kroeker [Sun, 1 Jul 2018 10:03:07 +0000 (12:03 +0200)]
Merge pull request #1657 from martin-frbg/release-0.3.0 (tag: v0.3.1)

Release 0.3.1

Martin Kroeker [Sun, 1 Jul 2018 10:01:51 +0000 (12:01 +0200)]
set version number to 0.3.2.dev

Martin Kroeker [Sun, 1 Jul 2018 10:01:16 +0000 (12:01 +0200)]
set version number to 0.3.2.dev

Martin Kroeker [Sun, 1 Jul 2018 09:59:47 +0000 (11:59 +0200)]
remove dev suffix from version number

Martin Kroeker [Sun, 1 Jul 2018 09:58:57 +0000 (11:58 +0200)]
remove dev suffix from version number

Martin Kroeker [Sun, 1 Jul 2018 09:56:40 +0000 (11:56 +0200)]
Merge pull request #1648 from martin-frbg/nofort

Handle NOFORTRAN=0

Martin Kroeker [Sun, 1 Jul 2018 09:55:21 +0000 (11:55 +0200)]
Merge pull request #1656 from xianyi/develop

Update the 0.3 branch from develop

Martin Kroeker [Sun, 1 Jul 2018 06:41:22 +0000 (08:41 +0200)]
Merge pull request #1655 from martin-frbg/issue1641

Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS

Martin Kroeker [Sat, 30 Jun 2018 23:17:03 +0000 (01:17 +0200)]
Merge pull request #1654 from martin-frbg/avx512check

Add compiler option to AVX512 test and hide test output

Martin Kroeker [Sat, 30 Jun 2018 21:57:50 +0000 (23:57 +0200)]
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS

fixes #1641

Martin Kroeker [Sat, 30 Jun 2018 21:47:44 +0000 (23:47 +0200)]
Add compiler option to AVX512 test and hide test output

Martin Kroeker [Sat, 30 Jun 2018 15:48:03 +0000 (17:48 +0200)]
Merge pull request #1651 from martin-frbg/avx512-nodgemm

Disable the 16x2 DTRMM kernel on SkylakeX as well

Martin Kroeker [Sat, 30 Jun 2018 15:31:06 +0000 (17:31 +0200)]
Disable the 16x2 DTRMM kernel on SkylakeX as well

Martin Kroeker [Sat, 30 Jun 2018 11:05:46 +0000 (13:05 +0200)]
Merge pull request #1650 from martin-frbg/avx512-nodgemm

Disable the AVX512 DGEMM kernel for now

Martin Kroeker [Sat, 30 Jun 2018 11:05:30 +0000 (13:05 +0200)]
Merge pull request #1639 from martin-frbg/dyn_list

Add DYNAMIC_LIST option for user-defined list of dynamic targets

Martin Kroeker [Sat, 30 Jun 2018 09:34:48 +0000 (11:34 +0200)]
Disable the AVX512 DGEMM kernel for now

due to #1643

Martin Kroeker [Tue, 26 Jun 2018 22:09:21 +0000 (00:09 +0200)]
Update Makefile

Martin Kroeker [Tue, 26 Jun 2018 22:07:32 +0000 (00:07 +0200)]
Merge branch 'develop' into nofort

Martin Kroeker [Tue, 26 Jun 2018 22:00:27 +0000 (00:00 +0200)]
Handle NOFORTRAN=0

Martin Kroeker [Tue, 26 Jun 2018 20:27:30 +0000 (22:27 +0200)]
Merge pull request #1647 from martin-frbg/armv7-dot

Remove premature exits from ARMV7 xdot codes

Martin Kroeker [Tue, 26 Jun 2018 18:46:42 +0000 (20:46 +0200)]
Remove premature exit when INC_X or INC_Y is zero

Martin Kroeker [Tue, 26 Jun 2018 18:45:57 +0000 (20:45 +0200)]
Remove premature exit when INC_X or INC_Y is zero

Martin Kroeker [Tue, 26 Jun 2018 18:45:00 +0000 (20:45 +0200)]
Remove premature exit when INC_X or INC_Y is zero

Martin Kroeker [Tue, 26 Jun 2018 18:44:13 +0000 (20:44 +0200)]
Remove premature exit when INC_X or INC_Y is zero
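In BLAS, an increment of 0 means the same element is used for every term. A negative increment walks the vector from its end. An early exit on a zero increment therefore returns the wrong result. Below is a reference-style strided dot product (hypothetical name `ddot_strided`, not the ARMV7 kernel code). It returns early only when `n <= 0`.

```c
#include <stddef.h>

/* Reference-style strided dot product following BLAS conventions. */
double ddot_strided(long n, const double *x, long incx, const double *y, long incy)
{
    double dot = 0.0;
    if (n <= 0)
        return dot;                               /* the only legitimate early exit */

    /* Negative increments start from the far end of the vector. */
    long ix = (incx < 0) ? (1 - n) * incx : 0;
    long iy = (incy < 0) ? (1 - n) * incy : 0;

    /* An increment of 0 simply keeps reusing the same element. */
    for (long i = 0; i < n; i++) {
        dot += x[ix] * y[iy];
        ix += incx;
        iy += incy;
    }
    return dot;
}
```

With `x = {2.0}`, `incx = 0` and `y = {1.0, 2.0, 3.0}`, `incy = 1`, the result is 2*1 + 2*2 + 2*3 = 12, not 0.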

Martin Kroeker [Tue, 26 Jun 2018 08:15:15 +0000 (10:15 +0200)]
Merge pull request #1644 from martin-frbg/revert-filterout

Revert changes to NOFORTRAN handling in Makefile

Martin Kroeker [Tue, 26 Jun 2018 06:09:52 +0000 (08:09 +0200)]
Revert changes to NOFORTRAN handling from 952541e

Martin Kroeker [Mon, 25 Jun 2018 19:02:31 +0000 (21:02 +0200)]
Try gradual fallback for cores not in the dynamic core list
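One possible reading of "gradual fallback": if kernels for the detected core were not built (for example, because of a restricted dynamic target list), step down through progressively older cores until one that was built is found. The chain below and `select_core` are hypothetical and only illustrate the idea. OpenBLAS's dynamic dispatch uses its own tables and logic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback order, newest first. */
static const char *const fallback_chain[] = {
    "SKYLAKEX", "HASWELL", "SANDYBRIDGE", "NEHALEM", "PRESCOTT"
};

/* Return the first built core at or after `detected` in the chain, or NULL. */
const char *select_core(const char *detected, const char *const *built, size_t nbuilt)
{
    size_t nchain = sizeof fallback_chain / sizeof fallback_chain[0];
    size_t start = nchain;                /* unknown cores match nothing */

    for (size_t i = 0; i < nchain; i++)
        if (strcmp(fallback_chain[i], detected) == 0) {
            start = i;
            break;
        }

    /* Walk down the chain until a built core is found. */
    for (size_t i = start; i < nchain; i++)
        for (size_t j = 0; j < nbuilt; j++)
            if (strcmp(fallback_chain[i], built[j]) == 0)
                return built[j];
    return NULL;
}
```

With only HASWELL and PRESCOTT built, a SKYLAKEX machine would get HASWELL and a SANDYBRIDGE machine would get PRESCOTT, instead of both failing outright.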

Martin Kroeker [Mon, 25 Jun 2018 18:48:10 +0000 (20:48 +0200)]
Merge pull request #2 from martin-frbg/develop

merge develop

Martin Kroeker [Mon, 25 Jun 2018 18:45:56 +0000 (20:45 +0200)]
Merge pull request #1 from xianyi/develop

Merge xianyi:develop into develop

Martin Kroeker [Mon, 25 Jun 2018 17:23:40 +0000 (19:23 +0200)]
Merge pull request #1642 from oon3m0oo/develop

Rewrite &= -> = and simplify the initial blocking phase.

Craig Donner [Mon, 25 Jun 2018 12:53:11 +0000 (13:53 +0100)]
Rewrite &= -> = and simplify the initial blocking phase.

Martin Kroeker [Sat, 23 Jun 2018 17:42:15 +0000 (19:42 +0200)]
Add support for a user-defined list of dynamic targets

Martin Kroeker [Sat, 23 Jun 2018 17:41:32 +0000 (19:41 +0200)]
Add support for a user-defined list of dynamic targets

Martin Kroeker [Sat, 23 Jun 2018 13:01:02 +0000 (15:01 +0200)]
Merge pull request #1638 from martin-frbg/issue1637

Expose the CBLAS interface to the IxAMIN functions and have make build it

Martin Kroeker [Sat, 23 Jun 2018 11:31:09 +0000 (13:31 +0200)]
Expose the CBLAS interface to the iXamin BLAS extensions

Martin Kroeker [Sat, 23 Jun 2018 11:27:30 +0000 (13:27 +0200)]
Build cblas_iXamin interfaces

Martin Kroeker [Thu, 21 Jun 2018 19:01:03 +0000 (21:01 +0200)]
Merge pull request #1634 from oon3m0oo/develop

Fix data races reported by TSAN.

oon3m0oo [Thu, 21 Jun 2018 16:47:45 +0000 (17:47 +0100)]
Use BLAS rather than CBLAS in test_fork.c (#1626)

This is handy for people not using LAPACK.

Craig Donner [Thu, 21 Jun 2018 10:13:57 +0000 (11:13 +0100)]
Fix data races reported by TSAN.

oon3m0oo [Wed, 20 Jun 2018 20:04:03 +0000 (21:04 +0100)]
Further improvements to memory.c. (#1625)

- Compiler TLS is now used only when the compiler supports it
- If compiler TLS is unsupported, we use platform-specific TLS
- Only one variable (an index) is now in TLS
- We only access TLS once per alloc, and never when freeing
- Allocation / release info is now stored within the allocation itself, by
  over-allocating; this saves having external structures do the bookkeeping, and
  reduces some of the redundant data that was being stored (such as addresses)
- We never hit the alloc lock when not using SMP or when using OpenMP (that was
  my fault)
- Now that there are fewer tracking structures, I think this is a bit easier to
  read than before
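The over-allocation scheme in the list above can be sketched as a small header placed in front of each buffer, so that freeing needs no external lookup. The `alloc_info` fields and the function names are hypothetical. The union with `max_align_t` keeps the pointer returned to the caller suitably aligned for any standard type.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping stored in front of each buffer. */
typedef struct {
    size_t size;   /* usable bytes requested by the caller */
    int    slot;   /* e.g. the (thread-local) index that allocated it */
} alloc_info;

/* Pad the header so the user pointer keeps max_align_t alignment. */
typedef union {
    alloc_info  info;
    max_align_t align;
} alloc_header;

void *tracked_alloc(size_t size, int slot)
{
    alloc_header *h = malloc(sizeof *h + size);   /* over-allocate by one header */
    if (h == NULL)
        return NULL;
    h->info.size = size;
    h->info.slot = slot;
    return h + 1;                                 /* caller sees memory after the header */
}

alloc_info *tracked_info(void *p)
{
    return &((alloc_header *)p - 1)->info;        /* header sits just before p */
}

void tracked_free(void *p)
{
    if (p != NULL)
        free((alloc_header *)p - 1);              /* no external table lookup needed */
}
```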

Martin Kroeker [Wed, 20 Jun 2018 19:51:57 +0000 (21:51 +0200)]
Merge pull request #1630 from martin-frbg/x86-march

Add -march=skylake-avx512 to flags if the target is SkylakeX

Martin Kroeker [Wed, 20 Jun 2018 19:51:38 +0000 (21:51 +0200)]
Merge pull request #1631 from oon3m0oo/stack

Avoid declaring arrays of size 0 when making large stack allocations.

Craig Donner [Wed, 20 Jun 2018 16:03:18 +0000 (17:03 +0100)]
Avoid declaring arrays of size 0 when making large stack allocations.
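A sketch of the pattern, assuming a hypothetical `MAX_STACK_ALLOC` byte budget. The buffer lives on the stack when it fits and on the heap otherwise. The variable-length array always gets at least one element, because declaring an array of size 0 is undefined behavior in C. `sum_with_scratch` is illustrative only and is not OpenBLAS's stack-allocation macro.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_STACK_ALLOC 2048   /* hypothetical stack budget in bytes */

/* Sum n doubles via a scratch copy that lives on the stack when it fits
   within MAX_STACK_ALLOC and on the heap otherwise. */
double sum_with_scratch(const double *src, size_t n)
{
    if (n == 0)
        return 0.0;

    /* 0 means "too large for the stack"... */
    size_t stack_n = (n <= MAX_STACK_ALLOC / sizeof(double)) ? n : 0;
    /* ...but the array itself always gets at least one element, because a
       variable-length array of size 0 is undefined behavior. */
    double stack_buf[stack_n ? stack_n : 1];
    double *buf = stack_n ? stack_buf : malloc(n * sizeof(double));
    double s = 0.0;

    if (buf == NULL)
        return 0.0;            /* allocation failure; real code would report it */
    for (size_t i = 0; i < n; i++)
        buf[i] = src[i];
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    if (buf != stack_buf)
        free(buf);
    return s;
}
```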