Sunil K Pandey [Wed, 29 Dec 2021 17:17:28 +0000 (09:17 -0800)]
x86-64: Add vector atan2/atan2f implementation to libmvec
Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atan2/atan2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Wed, 29 Dec 2021 17:11:23 +0000 (09:11 -0800)]
x86-64: Add vector cbrt/cbrtf implementation to libmvec
Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Wed, 29 Dec 2021 17:05:18 +0000 (09:05 -0800)]
x86-64: Add vector sinh/sinhf implementation to libmvec
Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector sinh/sinhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Wed, 29 Dec 2021 16:59:16 +0000 (08:59 -0800)]
x86-64: Add vector expm1/expm1f implementation to libmvec
Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector expm1/expm1f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Wed, 29 Dec 2021 16:53:16 +0000 (08:53 -0800)]
x86-64: Add vector cosh/coshf implementation to libmvec
Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector cosh/coshf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Wed, 29 Dec 2021 16:47:16 +0000 (08:47 -0800)]
x86-64: Add vector exp10/exp10f implementation to libmvec
Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector exp10/exp10f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Wed, 29 Dec 2021 16:41:18 +0000 (08:41 -0800)]
x86-64: Add vector exp2/exp2f implementation to libmvec
Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector exp2/exp2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Wed, 29 Dec 2021 16:35:22 +0000 (08:35 -0800)]
x86-64: Add vector hypot/hypotf implementation to libmvec
Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector hypot/hypotf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Wed, 29 Dec 2021 16:29:26 +0000 (08:29 -0800)]
x86-64: Add vector asin/asinf implementation to libmvec
Implement vectorized asin/asinf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector asin/asinf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Wed, 29 Dec 2021 16:23:33 +0000 (08:23 -0800)]
x86-64: Add vector atan/atanf implementation to libmvec
Implement vectorized atan/atanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atan/atanf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Florian Weimer [Tue, 28 Dec 2021 21:52:56 +0000 (22:52 +0100)]
elf: Add _dl_find_object function
It can be used to speed up the libgcc unwinder, and the internal
_dl_find_dso_for_object function (which is used for caller
identification in dlopen and related functions, and in dladdr).
_dl_find_object is in the internal namespace due to bug 28503.
If libgcc switches to _dl_find_object, this namespace issue will
be fixed. It is located in libc for two reasons: it is necessary
to forward the call to the static libc after static dlopen, and
there is a link ordering issue with -static-libgcc and libgcc_eh.a
because libc.so is not a linker script that includes ld.so in the
glibc build tree (so that GCC's internal -lc after libgcc_eh.a does
not pick up ld.so).
It is necessary to do the i386 customization in the
sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because
otherwise, multilib installations are broken.
The implementation uses software transactional memory, as suggested
by Torvald Riegel. Two copies of the supporting data structures are
used, also achieving full async-signal-safety.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Adhemerval Zanella [Thu, 16 Dec 2021 18:26:36 +0000 (15:26 -0300)]
malloc: Remove memusage.h
And use machine-sp.h instead. The Linux implementation is based on
already provided CURRENT_STACK_FRAME (used on nptl code) and
STACK_GROWS_UPWARD is replaced with _STACK_GROWS_UP.
Adhemerval Zanella [Thu, 16 Dec 2021 16:43:38 +0000 (13:43 -0300)]
malloc: Use hp-timing on libmemusage
Instead of reimplemeting on GETTIME macro.
Adhemerval Zanella [Thu, 16 Dec 2021 16:42:31 +0000 (13:42 -0300)]
Remove atomic-machine.h atomic typedefs
Now that memusage.c uses generic types we can remove them.
Adhemerval Zanella [Thu, 16 Dec 2021 15:32:17 +0000 (12:32 -0300)]
malloc: Remove atomic_* usage
These typedef are used solely on memusage and can be replaced with
generic types.
Thomas Petazzoni [Tue, 28 Dec 2021 12:09:49 +0000 (09:09 -0300)]
microblaze: Add missing implementation when !__ASSUME_TIME64_SYSCALLS
In commit
a92f4e6299fe0e3cb6f77e79de00817aece501ce ("linux: Add time64
pselect support"), a Microblaze specific implementation of
__pselect32() was added to cover the case of kernels < 3.15 which lack
the pselect6 system call.
This new file sysdeps/unix/sysv/linux/microblaze/pselect32.c takes
precedence over the default implementation
sysdeps/unix/sysv/linux/pselect32.c.
However sysdeps/unix/sysv/linux/pselect32.c provides an implementation
of __pselect32() which is needed when __ASSUME_TIME64_SYSCALLS is not
defined. On Microblaze, which is a 32-bit architecture,
__ASSUME_TIME64_SYSCALLS is only true for kernels >= 5.1.
Due to sysdeps/unix/sysv/linux/microblaze/pselect32.c taking
precedence over sysdeps/unix/sysv/linux/pselect32.c, it means that
when we are with a kernel >= 3.15 but < 5.1, we need a __pselect32()
implementation, but sysdeps/unix/sysv/linux/microblaze/pselect32.c
doesn't provide it, and sysdeps/unix/sysv/linux/pselect32.c which
would provide it is not compiled in.
This causes the following build failure on Microblaze with for example
Linux kernel headers 4.9:
[...]/build/libc_pic.os: in function `__pselect64':
(.text+0x120b44): undefined reference to `__pselect32'
collect2: error: ld returned 1 exit status
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Adhemerval Zanella [Wed, 30 Jun 2021 20:33:57 +0000 (17:33 -0300)]
elf: Do not fail for failed dlmopen on audit modules (BZ #28061)
The dl_main sets the LM_ID_BASE to RT_ADD just before starting to
add load new shared objects. The state is set to RT_CONSISTENT just
after all objects are loaded.
However if a audit modules tries to dlmopen an inexistent module,
the _dl_open will assert that the namespace is in an inconsistent
state.
This is different than dlopen, since first it will not use
LM_ID_BASE and second _dl_map_object_from_fd is the sole responsible
to set and reset the r_state value.
So the assert on _dl_open can not really be seen if the state is
consistent, since _dt_main resets it. This patch removes the assert.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Mon, 19 Jul 2021 21:42:26 +0000 (18:42 -0300)]
elf: Issue audit la_objopen for vDSO
The vDSO is is listed in the link_map chain, but is never the subject of
an la_objopen call. A new internal flag __RTLD_VDSO is added that
acts as __RTLD_OPENEXEC to allocate the required 'struct auditstate'
extra space for the 'struct link_map'.
The return value from the callback is currently ignored, since there
is no PLT call involved by glibc when using the vDSO, neither the vDSO
are exported directly.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Wed, 30 Jun 2021 18:51:31 +0000 (15:51 -0300)]
elf: Add audit tests for modules with TLSDESC
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Wed, 30 Jun 2021 13:24:09 +0000 (10:24 -0300)]
elf: Avoid unnecessary slowdown from profiling with audit (BZ#15533)
The rtld-audit interfaces introduces a slowdown due to enabling
profiling instrumentation (as if LD_AUDIT implied LD_PROFILE).
However, instrumenting is only necessary if one of audit libraries
provides PLT callbacks (la_pltenter or la_pltexit symbols). Otherwise,
the slowdown can be avoided.
The following patch adjusts the logic that enables profiling to iterate
over all audit modules and check if any of those provides a PLT hook.
To keep la_symbind to work even without PLT callbacks, _dl_fixup now
calls the audit callback if the modules implements it.
Co-authored-by: Alexander Monakov <amonakov@ispras.ru>
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Thu, 22 Jul 2021 21:02:42 +0000 (18:02 -0300)]
elf: Add _dl_audit_pltexit
It consolidates the code required to call la_pltexit audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Thu, 22 Jul 2021 20:45:33 +0000 (17:45 -0300)]
elf: Add _dl_audit_pltenter
It consolidates the code required to call la_pltenter audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Thu, 22 Jul 2021 20:10:57 +0000 (17:10 -0300)]
elf: Add _dl_audit_preinit
It consolidates the code required to call la_preinit audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Tue, 20 Jul 2021 18:58:35 +0000 (15:58 -0300)]
elf: Add _dl_audit_symbind_alt and _dl_audit_symbind
It consolidates the code required to call la_symbind{32,64} audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Tue, 20 Jul 2021 17:04:51 +0000 (14:04 -0300)]
elf: Add _dl_audit_objclose
It consolidates the code required to call la_objclose audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Tue, 20 Jul 2021 16:47:36 +0000 (13:47 -0300)]
elf: Add _dl_audit_objsearch
It consolidates the code required to call la_objsearch audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Tue, 20 Jul 2021 14:03:34 +0000 (11:03 -0300)]
elf: Add _dl_audit_activity_map and _dl_audit_activity_nsid
It consolidates the code required to call la_activity audit
callback.
Also for a new Lmid_t the namespace link_map list are empty, so it
requires to check if before using it. This can happen for when audit
module is used along with dlmopen.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Adhemerval Zanella [Mon, 19 Jul 2021 18:47:51 +0000 (15:47 -0300)]
elf: Add _dl_audit_objopen
It consolidates the code required to call la_objopen audit callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Samuel Thibault [Tue, 28 Dec 2021 09:27:06 +0000 (10:27 +0100)]
hurd: Fix static-PIE startup
hurd initialization stages use RUN_HOOK to run various initialization
functions. That is however using absolute addresses which need to be
relocated, which is done later by csu. We can however easily make the
linker compute relative addresses which thus don't need a relocation.
The new SET_RELHOOK and RUN_RELHOOK macros implement this.
Samuel Thibault [Tue, 28 Dec 2021 09:15:52 +0000 (10:15 +0100)]
hurd: let csu initialize tls
Since
9cec82de715b ("htl: Initialize later"), we let csu initialize
pthreads. We can thus let it initialize tls later too, to better align
with the generic order. Initialization however accesses ports which
links/unlinks into the sigstate for unwinding. We can however easily
skip that during initialization.
Samuel Thibault [Mon, 27 Dec 2021 21:21:08 +0000 (22:21 +0100)]
hurd: Fix XFAIL-ing mallocfork2 tests
They are using setpshared but are outside the htl directory.
Samuel Thibault [Mon, 27 Dec 2021 21:15:43 +0000 (22:15 +0100)]
hurd: XFAIL more tests that require setpshared support
Samuel Thibault [Mon, 27 Dec 2021 21:10:15 +0000 (22:10 +0100)]
malloc: Add missing shared thread library flags
Samuel Thibault [Mon, 27 Dec 2021 20:22:14 +0000 (21:22 +0100)]
stdio-common: Fix %m sprintf test output for GNU/Hurd
GNU/Hurd has slightly different error messages for undefined numbers,
due to the notion of error subsystems.
Noah Goldstein [Sat, 25 Dec 2021 00:54:53 +0000 (18:54 -0600)]
x86: Optimize L(less_vec) case in memcmpeq-evex.S
No bug.
Optimizations are twofold.
1) Replace page cross and 0/1 checks with masked load instructions in
L(less_vec). In applications this reduces branch-misses in the
hot [0, 32] case.
2) Change controlflow so that L(less_vec) case gets the fall through.
Change 2) helps copies in the [0, 32] size range but comes at the cost
of copies in the [33, 64] size range. From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to the the right tradeoff.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Sat, 25 Dec 2021 00:54:41 +0000 (18:54 -0600)]
x86: Optimize L(less_vec) case in memcmp-evex-movbe.S
No bug.
Optimizations are twofold.
1) Replace page cross and 0/1 checks with masked load instructions in
L(less_vec). In applications this reduces branch-misses in the
hot [0, 32] case.
2) Change controlflow so that L(less_vec) case gets the fall through.
Change 2) helps copies in the [0, 32] size range but comes at the cost
of copies in the [33, 64] size range. From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to the the right tradeoff.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
H.J. Lu [Thu, 23 Dec 2021 14:48:24 +0000 (06:48 -0800)]
elf: Remove AArch64 from comment for AT_MINSIGSTKSZ
Remove AArch64 from comment for AT_MINSIGSTKSZ to match
commit
7cd60e43a6def40ecb75deb8decc677995970d0b
Author: Chang S. Bae <chang.seok.bae@intel.com>
Date: Tue May 18 13:03:15 2021 -0700
uapi/auxvec: Define the aux vector AT_MINSIGSTKSZ
Define AT_MINSIGSTKSZ in the generic uapi header. It is already used
as generic ABI in glibc's generic elf.h, and this define will prevent
future namespace conflicts. In particular, x86 is also using this
generic definition.
in Linux kernel 5.14.
H.J. Lu [Mon, 20 Dec 2021 23:00:24 +0000 (15:00 -0800)]
math: Properly cast X_TLOSS to float [BZ #28713]
Add
#define AS_FLOAT_CONSTANT_1(x) x##f
#define AS_FLOAT_CONSTANT(x) AS_FLOAT_CONSTANT_1(x)
to cast X_TLOSS to float at compile-time to fix:
FAIL: math/test-float-j0
FAIL: math/test-float-jn
FAIL: math/test-float-y0
FAIL: math/test-float-y1
FAIL: math/test-float-yn
FAIL: math/test-float32-j0
FAIL: math/test-float32-jn
FAIL: math/test-float32-y0
FAIL: math/test-float32-y1
FAIL: math/test-float32-yn
when compiling with GCC 12.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Adhemerval Zanella [Fri, 17 Dec 2021 12:18:52 +0000 (09:18 -0300)]
Set default __TIMESIZE default to 64
This is expected size for newer ABIs.
Florian Weimer [Thu, 23 Dec 2021 14:01:07 +0000 (15:01 +0100)]
stdio: Implement %#m for vfprintf and related functions
%#m prints errno as an error constant if one is available, or
a decimal number as a fallback. This intends to address the gap
that strerrorname_np does not work well with printf for unknown
error codes due to its NULL return values in those cases.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Florian Weimer [Thu, 23 Dec 2021 11:24:30 +0000 (12:24 +0100)]
elf: Remove unused NEED_DL_BASE_ADDR and _dl_base_addr
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Sunil K Pandey [Wed, 22 Dec 2021 14:20:41 +0000 (06:20 -0800)]
x86-64: Add vector acos/acosf implementation to libmvec
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector acos/acosf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Andrea Monaco [Sun, 12 Dec 2021 09:24:28 +0000 (10:24 +0100)]
intl/plural.y: Avoid conflicting declarations of yyerror and yylex
bison-3.8 includes these lines in the generated intl/plural.c:
#if !defined __gettexterror && !defined YYERROR_IS_DECLARED
void __gettexterror (struct parse_args *arg, const char *msg);
#endif
#if !defined __gettextlex && !defined YYLEX_IS_DECLARED
int __gettextlex (YYSTYPE *yylvalp, struct parse_args *arg);
#endif
Those default prototypes provided by bison conflict with the
declarations later on in plural.y. This patch solves the issue.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
H.J. Lu [Tue, 21 Dec 2021 20:35:47 +0000 (12:35 -0800)]
elf: Remove excessive p_align check on PT_LOAD segments [BZ #28688]
p_align does not have to be a multiple of the page size. Only PT_LOAD
segment layout should be aligned to the page size.
1: Remove p_align check against the page size.
2. Use the page size, instead of p_align, to check PT_LOAD segment layout.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
H.J. Lu [Mon, 20 Dec 2021 22:37:26 +0000 (14:37 -0800)]
s_sincosf.h: Change pio4 type to float [BZ #28713]
s_cosf.c and s_sinf.c have
if (abstop12 (y) < abstop12 (pio4))
where abstop12 takes a float argument, but pio4 is static const double.
pio4 is used only in calls to abstop12 and never in arithmetic. Apply
-static const double pio4 = 0x1.921FB54442D18p-1;
+static const float pio4 = 0x1.921FB6p-1f;
to fix:
FAIL: math/test-float-cos
FAIL: math/test-float-sin
FAIL: math/test-float-sincos
FAIL: math/test-float32-cos
FAIL: math/test-float32-sin
FAIL: math/test-float32-sincos
when compiling with GCC 12.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
maminjie [Mon, 20 Dec 2021 11:36:32 +0000 (19:36 +0800)]
Linux: Fix 32-bit vDSO for clock_gettime on powerpc32
When the clock_id is CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID,
on the 5.10 kernel powerpc 32-bit, the 32-bit vDSO is executed successfully (
because the __kernel_clock_gettime in arch/powerpc/kernel/vdso32/gettimeofday.S
does not support these two IDs, the 32-bit time_t syscall will be used),
but tp32.tv_sec is equal to 0, causing the 64-bit time_t syscall to continue to be used,
resulting in two system calls.
Fix commit
72e84d1db22203e01a43268de71ea8669eca2863.
Signed-off-by: maminjie <maminjie2@huawei.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
H.J. Lu [Mon, 20 Dec 2021 23:15:12 +0000 (15:15 -0800)]
Regenerate ulps on x86_64 with GCC 12
Fix
FAIL: math/test-float-clog10
FAIL: math/test-float32-clog10
on Intel Core i7-1165G7 with GCC 12.
Joseph Myers [Mon, 20 Dec 2021 15:38:32 +0000 (15:38 +0000)]
Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h
Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along
with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit
cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol
numbers for PF_CAN") but apparently missed for glibc at the time.
Tested for x86_64.
Adhemerval Zanella [Thu, 16 Dec 2021 13:32:38 +0000 (10:32 -0300)]
Remove ununsed tcb-offset
Some architectures do not use the auto-generated tcb-offsets.h.
Aurelien Jarno [Wed, 15 Dec 2021 23:06:28 +0000 (00:06 +0100)]
riscv: align stack before calling _dl_init [BZ #28703]
Align the stack pointer to 128 bits during the call to _dl_init() as
specified by the RISC-V ABI [1]. This fixes the elf/tst-align2 test.
Fixes bug 28703.
[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc
Aurelien Jarno [Tue, 14 Dec 2021 21:44:35 +0000 (22:44 +0100)]
riscv: align stack in clone [BZ #28702]
The RISC-V ABI [1] mandates that "the stack pointer shall be aligned to
a 128-bit boundary upon procedure entry". This as not the case in clone.
This fixes the misc/tst-misalign-clone-internal and
misc/tst-misalign-clone tests.
Fixes bug 28702.
[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc
Aurelien Jarno [Wed, 15 Dec 2021 22:46:19 +0000 (23:46 +0100)]
elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704]
On KVM guests running on some AMD systems, the IBRS feature is reported
as a synthetic feature using the Intel feature, while the cpuinfo entry
keeps the same. Handle that by first checking the presence of the Intel
feature on AMD systems.
Fixes bug 28704.
Matheus Castanho [Wed, 1 Dec 2021 14:14:40 +0000 (11:14 -0300)]
powerpc64[le]: Allocate extra stack frame on syscall.S
The syscall function does not allocate the extra stack frame for scv like other
assembly syscalls using DO_CALL_SCV. So after commit
d120fb9941 changed the
offset that is used to save LR, syscall ended up using an invalid offset,
causing regressions on powerpc64. So make sure the extra stack frame is
allocated in syscall.S as well to make it consistent with other uses of
DO_CALL_SCV and avoid similar issues in the future.
Tested on powerpc, powerpc64, and powerpc64le (with and without scv)
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Maxim Kuvyrkov [Thu, 16 Dec 2021 18:55:15 +0000 (18:55 +0000)]
Update copyright header in recently merged ab_GE locale
ab_GE locale was committed under DCO and this header
proposed in [1] suits it better.
[1] https://sourceware.org/pipermail/libc-alpha/2021-September/130692.html
Signed-off-by: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Signed-off-by: Nart Tlisha <daniel.abzakh@gmail.com>
Siddhesh Poyarekar [Fri, 17 Dec 2021 13:05:44 +0000 (18:35 +0530)]
fortify: Fix spurious warning with realpath
The length and object size arguments were swapped around for realpath.
Also add a smoke test so that any changes in this area get caught in
future.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Florian Weimer [Fri, 17 Dec 2021 11:01:20 +0000 (12:01 +0100)]
nss: Use "files dns" as the default for the hosts database (bug 28700)
This matches what is currently in nss/nsswitch.conf. The new ordering
matches what most distributions use in their installed configuration
files.
It is common to add localhost to /etc/hosts because the name does not
exist in the DNS, but is commonly used as a host name.
With the built-in "dns [!UNAVAIL=return] files" default, dns is
searched first and provides an answer for "localhost" (NXDOMAIN).
We never look at the files database as a result, so the contents of
/etc/hosts is ignored. This means that "getent hosts localhost"
fail without a /etc/nsswitch.conf file, even though the host name
is listed in /etc/hosts.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Florian Weimer [Fri, 17 Dec 2021 10:48:41 +0000 (11:48 +0100)]
arm: Guard ucontext _rtld_global_ro access by SHARED, not PIC macro
Due to PIE-by-default, PIC is now defined in more cases. libc.a
does not have _rtld_global_ro, and statically linking setcontext
fails. SHARED is the right condition to use, so that libc.a
references _dl_hwcap instead of _rtld_global_ro.
For static PIE support, the !SHARED case would still have to be made
PIC. This patch does not achieve that.
Fixes commit
23645707f12f2dd9d80b51effb2d9618a7b65565
("Replace --enable-static-pie with --disable-default-pie").
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Siddhesh Poyarekar [Fri, 17 Dec 2021 02:29:45 +0000 (07:59 +0530)]
Fix The GNU ToolChain Authors copyright notice
I (and maybe one or two others) added a (C) to the copyright notice
regardless of the contribution checklist[1] not mentioning it. Fix all
these instances so that the notice reads as "Copyright The GNU Toolchain
Authors" across the source code.
[1] https://sourceware.org/glibc/wiki/Contribution%20checklist
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Patrick McGehearty [Thu, 16 Dec 2021 00:43:24 +0000 (00:43 +0000)]
Remove upper limit on tunable MALLOC_MMAP_THRESHOLD
The current limit on MALLOC_MMAP_THRESHOLD is either 1 Mbyte (for
32-bit apps) or 32 Mbytes (for 64-bit apps). This value was set by a
patch dated 2006 (15 years ago). Attempts to set the threshold higher
are currently ignored.
The default behavior is appropriate for many highly parallel
applications where many processes or threads are sharing RAM. In other
situations where the number of active processes or threads closely
matches the number of cores, a much higher limit may be desired by the
application designer. By today's standards on personal computers and
small servers, 2 Gbytes of RAM per core is commonly available. On
larger systems 4 Gbytes or more of RAM is sometimes available.
Instead of raising the limit to match current needs, this patch
proposes to remove the limit of the tunable, leaving the decision up
to the user of a tunable to judge the best value for their needs.
This patch does not change any of the defaults for malloc tunables,
retaining the current behavior of the dynamic malloc mmap threshold.
bugzilla 27801 - Remove upper limit on tunable MALLOC_MMAP_THRESHOLD
Reviewed-by: DJ Delorie <dj@redhat.com>
malloc/
malloc.c changed do_set_mmap_threshold to remove test
for HEAP_MAX_SIZE.
Nart Tlisha [Thu, 16 Dec 2021 10:49:02 +0000 (13:49 +0300)]
localedata: add new locale ab_GE
Add the Abkhazian language in the Georgia territory
The ab_GE was just recently added to CLDR, it should be available
in CLDR v41, https://github.com/unicode-org/cldr/pull/1402
The Abkhazian language has been added to Gnome for localization
The locale has been tested on Ubuntu 20.04, Mint 20.2 and Fedora 35 Beta
Signed-off-by: Nart Tlisha <daniel.abzakh@gmail.com>
Reviewed-by: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Stefan Liebler [Thu, 16 Dec 2021 11:47:11 +0000 (12:47 +0100)]
Fix __minimal_malloc segfaults in __mmap due to stack-protector
Starting with commit
b05fae4d8e34604a72ee36d2d3164391b76fcf0b
"elf: Use the minimal malloc on tunables_strdup",
I get lots of segfaults in static tests on s390x when also using, e.g.:
export GLIBC_TUNABLES="glibc.elision.enable=1"
tunables_strdup callls __minimal_malloc which tries to call __mmap
due to insufficient space left. __mmap itself first setups a new
stack frame and segfaults when copying the stack-protector canary
from thread-pointer. The latter one is not yet setup.
Thus this patch also turns off stack-protection for mmap.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Siddhesh Poyarekar [Thu, 16 Dec 2021 01:49:14 +0000 (07:19 +0530)]
__glibc_unsafe_len: Fix comment
We know that the length is *unsafe*.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Adhemerval Zanella [Mon, 30 Aug 2021 17:01:00 +0000 (14:01 -0300)]
malloc: Enable huge page support on main arena
This patch adds support huge page support on main arena allocation,
enable with tunable glibc.malloc.hugetlb=2. The patch essentially
disable the __glibc_morecore() sbrk() call (similar when memory
tag does when sbrk() call does not support it) and fallback to
default page size if the memory allocation fails.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Mon, 30 Aug 2021 13:56:55 +0000 (10:56 -0300)]
malloc: Move MORECORE fallback mmap to sysmalloc_mmap_fallback
So it can be used on hugepage code as well.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Fri, 20 Aug 2021 16:22:35 +0000 (13:22 -0300)]
malloc: Add Huge Page support to arenas
It is enabled as default for glibc.malloc.hugetlb set to 2 or higher.
It also uses a non configurable minimum value and maximum value,
currently set respectively to 1 and 4 selected huge page size.
The arena allocation with huge pages does not use MAP_NORESERVE. As
indicate by kernel internal documentation [1], the flag might trigger
a SIGBUS on soft page faults if at memory access there is no left
pages in the pool.
On systems without a reserved huge pages pool, is just stress the
mmap(MAP_HUGETLB) allocation failure. To improve test coverage it is
required to create a pool with some allocated pages.
Checked on x86_64-linux-gnu with no reserved pages, 10 reserved pages
(which trigger mmap(MAP_HUGETBL) failures) and with 256 reserved pages
(which does not trigger mmap(MAP_HUGETLB) failures).
[1] https://www.kernel.org/doc/html/v4.18/vm/hugetlbfs_reserv.html#resv-map-modifications
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Mon, 16 Aug 2021 18:08:27 +0000 (15:08 -0300)]
malloc: Add Huge Page support for mmap
With the morecore hook removed, there is not easy way to provide huge
pages support on with glibc allocator without resorting to transparent
huge pages. And some users and programs do prefer to use the huge pages
directly instead of THP for multiple reasons: no splitting, re-merging
by the VM, no TLB shootdowns for running processes, fast allocation
from the reserve pool, no competition with the rest of the processes
unlike THP, no swapping all, etc.
This patch extends the 'glibc.malloc.hugetlb' tunable: the value
'2' means to use huge pages directly with the system default size,
while a positive value means and specific page size that is matched
against the supported ones by the system.
Currently only memory allocated on sysmalloc() is handled, the arenas
still uses the default system page size.
To test is a new rule is added tests-malloc-hugetlb2, which run the
addes tests with the required GLIBC_TUNABLE setting. On systems without
a reserved huge pages pool, is just stress the mmap(MAP_HUGETLB)
allocation failure. To improve test coverage it is required to create
a pool with some allocated pages.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Mon, 16 Aug 2021 14:14:20 +0000 (11:14 -0300)]
malloc: Move mmap logic to its own function
So it can be used with different pagesize and flags.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Fri, 13 Aug 2021 13:06:04 +0000 (10:06 -0300)]
malloc: Add THP/madvise support for sbrk
To increase effectiveness with Transparent Huge Page with madvise, the
large page size is use instead page size for sbrk increment for the
main arena.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Fri, 13 Aug 2021 11:36:29 +0000 (08:36 -0300)]
malloc: Add madvise support for Transparent Huge Pages
Linux Transparent Huge Pages (THP) current supports three different
states: 'never', 'madvise', and 'always'. The 'never' is
self-explanatory and 'always' will enable THP for all anonymous
pages. However, 'madvise' is still the default for some system and
for such case THP will be only used if the memory range is explicity
advertise by the program through a madvise(MADV_HUGEPAGE) call.
To enable it a new tunable is provided, 'glibc.malloc.hugetlb',
where setting to a value diffent than 0 enables the madvise call.
This patch issues the madvise(MADV_HUGEPAGE) call after a successful
mmap() call at sysmalloc() with sizes larger than the default huge
page size. The madvise() call is disable is system does not support
THP or if it has the mode set to "never" and on Linux only support
one page size for THP, even if the architecture supports multiple
sizes.
To test is a new rule is added tests-malloc-hugetlb1, which run the
addes tests with the required GLIBC_TUNABLE setting.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Florian Weimer [Wed, 15 Dec 2021 15:06:25 +0000 (16:06 +0100)]
powerpc: Use global register variable in <thread_pointer.h>
A local register variable is merely a compiler hint, and so not
appropriate in this context. Move the global register variable into
<thread_pointer.h> and include it from <tls.h>, as there can only
be one global definition for one particular register.
Fixes commit
8dbeb0561eeb876f557ac9eef5721912ec074ea5
("nptl: Add <thread_pointer.h> for defining __thread_pointer").
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Adhemerval Zanella [Thu, 20 May 2021 17:20:18 +0000 (14:20 -0300)]
Use LFS and 64 bit time for installed programs (BZ #15333)
The installed programs are built with a combination of different
values for MODULE_NAME, as below. To enable both Long File Support
and 64 bt time, -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64 is added for
nonlibi, nscd, lddlibc4, libresolv, ldconfig, locale_programs,
iconvprogs, libnss_files, libnss_compat, libnss_db, libnss_hesiod,
libutil, libpcprofile, and libSegFault.
nscd/nscd
nscd/nscd.o MODULE_NAME=nscd
nscd/connections.o MODULE_NAME=nscd
nscd/pwdcache.o MODULE_NAME=nscd
nscd/getpwnam_r.o MODULE_NAME=nscd
nscd/getpwuid_r.o MODULE_NAME=nscd
nscd/grpcache.o MODULE_NAME=nscd
nscd/getgrnam_r.o MODULE_NAME=nscd
nscd/getgrgid_r.o MODULE_NAME=nscd
nscd/hstcache.o MODULE_NAME=nscd
nscd/gethstbyad_r.o MODULE_NAME=nscd
nscd/gethstbynm3_r.o MODULE_NAME=nscd
nscd/getsrvbynm_r.o MODULE_NAME=nscd
nscd/getsrvbypt_r.o MODULE_NAME=nscd
nscd/servicescache.o MODULE_NAME=nscd
nscd/dbg_log.o MODULE_NAME=nscd
nscd/nscd_conf.o MODULE_NAME=nscd
nscd/nscd_stat.o MODULE_NAME=nscd
nscd/cache.o MODULE_NAME=nscd
nscd/mem.o MODULE_NAME=nscd
nscd/nscd_setup_thread.o MODULE_NAME=nscd
nscd/xmalloc.o MODULE_NAME=nscd
nscd/xstrdup.o MODULE_NAME=nscd
nscd/aicache.o MODULE_NAME=nscd
nscd/initgrcache.o MODULE_NAME=nscd
nscd/gai.o MODULE_NAME=nscd
nscd/res_hconf.o MODULE_NAME=nscd
nscd/netgroupcache.o MODULE_NAME=nscd
nscd/cachedumper.o MODULE_NAME=nscd
elf/lddlibc4
elf/lddlibc4 MODULE_NAME=lddlibc4
elf/pldd
elf/pldd.o MODULE_NAME=nonlib
elf/xmalloc.o MODULE_NAME=nonlib
elf/sln
elf/sln.o MODULE_NAME=nonlib
elf/static-stubs.o MODULE_NAME=nonlib
elf/sprof MODULE_NAME=nonlib
elf/ldconfig
elf/ldconfig.o MODULE_NAME=ldconfig
elf/cache.o MODULE_NAME=nonlib
elf/readlib.o MODULE_NAME=nonlib
elf/xmalloc.o MODULE_NAME=nonlib
elf/xstrdup.o MODULE_NAME=nonlib
elf/chroot_canon.o MODULE_NAME=nonlib
elf/static-stubs.o MODULE_NAME=nonlib
elf/stringtable.o MODULE_NAME=nonlib
io/pwd
io/pwd.o MODULE_NAME=nonlib
locale/locale
locale/locale.o MODULE_NAME=locale_programs
locale/locale-spec.o MODULE_NAME=locale_programs
locale/charmap-dir.o MODULE_NAME=locale_programs
locale/simple-hash.o MODULE_NAME=locale_programs
locale/xmalloc.o MODULE_NAME=locale_programs
locale/xstrdup.o MODULE_NAME=locale_programs
locale/record-status.o MODULE_NAME=locale_programs
locale/xasprintf.o MODULE_NAME=locale_programs
locale/localedef
locale/localedef.o MODULE_NAME=locale_programs
locale/ld-ctype.o MODULE_NAME=locale_programs
locale/ld-messages.o MODULE_NAME=locale_programs
locale/ld-monetary.o MODULE_NAME=locale_programs
locale/ld-numeric.o MODULE_NAME=locale_programs
locale/ld-time.o MODULE_NAME=locale_programs
locale/ld-paper.o MODULE_NAME=locale_programs
locale/ld-name.o MODULE_NAME=locale_programs
locale/ld-address.o MODULE_NAME=locale_programs
locale/ld-telephone.o MODULE_NAME=locale_programs
locale/ld-measurement.o MODULE_NAME=locale_programs
locale/ld-identification.o MODULE_NAME=locale_programs
locale/ld-collate.o MODULE_NAME=locale_programs
locale/charmap.o MODULE_NAME=locale_programs
locale/linereader.o MODULE_NAME=locale_programs
locale/locfile.o MODULE_NAME=locale_programs
locale/repertoire.o MODULE_NAME=locale_programs
locale/locarchive.o MODULE_NAME=locale_programs
locale/md5.o MODULE_NAME=locale_programs
locale/charmap-dir.o MODULE_NAME=locale_programs
locale/simple-hash.o MODULE_NAME=locale_programs
locale/xmalloc.o MODULE_NAME=locale_programs
locale/xstrdup.o MODULE_NAME=locale_programs
locale/record-status.o MODULE_NAME=locale_programs
locale/xasprintf.o MODULE_NAME=locale_programs
catgets/gencat
catgets/gencat.o MODULE_NAME=nonlib
catgets/xmalloc.o MODULE_NAME=nonlib
nss/makedb
nss/makedb.o MODULE_NAME=nonlib
nss/xmalloc.o MODULE_NAME=nonlib
nss/hash-string.o MODULE_NAME=nonlib
nss/getent
nss/getent.o MODULE_NAME=nonlib
posix/getconf
posix/getconf.o MODULE_NAME=nonlib
login/utmpdump
login/utmpdump.o MODULE_NAME=nonlib
debug/pcprofiledump
debug/pcprofiledump.o MODULE_NAME=nonlib
timezone/zic
timezone/zic.o MODULE_NAME=nonlib
timezone/zdump
timezone/zdump.o MODULE_NAME=nonlib
iconv/iconv_prog
iconv/iconv_prog.o MODULE_NAME=nonlib
iconv/iconv_charmap.o MODULE_NAME=iconvprogs
iconv/charmap.o MODULE_NAME=iconvprogs
iconv/charmap-dir.o MODULE_NAME=iconvprogs
iconv/linereader.o MODULE_NAME=iconvprogs
iconv/dummy-repertoire.o MODULE_NAME=iconvprogs
iconv/simple-hash.o MODULE_NAME=iconvprogs
iconv/xstrdup.o MODULE_NAME=iconvprogs
iconv/xmalloc.o MODULE_NAME=iconvprogs
iconv/record-status.o MODULE_NAME=iconvprogs
iconv/iconvconfig
iconv/iconvconfig.o MODULE_NAME=nonlib
iconv/strtab.o MODULE_NAME=iconvprogs
iconv/xmalloc.o MODULE_NAME=iconvprogs
iconv/hash-string.o MODULE_NAME=iconvprogs
nss/libnss_files.so MODULE_NAME=libnss_files
nss/libnss_compat.so.2 MODULE_NAME=libnss_compat
nss/libnss_db.so MODULE_NAME=libnss_db
hesiod/libnss_hesiod.so MODULE_NAME=libnss_hesiod
login/libutil.so MODULE_NAME=libutil
debug/libpcprofile.so MODULE_NAME=libpcprofile
debug/libSegFault.so MODULE_NAME=libSegFault
Also, to avoid adding both LFS and 64 bit time support on internal
tests they are moved to a newer 'testsuite-internal' module. It
should be similar to 'nonlib' regarding internal definition and
linking namespace.
This patch also enables LFS and 64 bit support of libsupport container
programs (echo-container, test-container, shell-container, and
true-container).
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
H.J. Lu [Fri, 10 Dec 2021 23:44:46 +0000 (15:44 -0800)]
Support target specific ALIGN for variable alignment test [BZ #28676]
Add <tst-file-align.h> to support target specific ALIGN for variable
alignment test:
1. Alpha: Use 0x10000.
2. MicroBlaze and Nios II: Use 0x8000.
3. All others: Use 0x200000.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
H.J. Lu [Tue, 14 Dec 2021 15:19:36 +0000 (07:19 -0800)]
NEWS: Document LD_PREFER_MAP_32BIT_EXEC as x86-64 only
H.J. Lu [Mon, 13 Dec 2021 15:17:29 +0000 (07:17 -0800)]
elf: Align argument of __munmap to page size [BZ #28676]
On Linux/x86-64, for elf/tst-align3, we now get
munmap(0x7f88f9401000,
1126424) = 0
instead of
munmap(0x7f1615200018, 544768) = -1 EINVAL (Invalid argument)
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Florian Weimer [Tue, 14 Dec 2021 11:37:44 +0000 (12:37 +0100)]
elf: Use new dependency sorting algorithm by default
The default has to change eventually, and there are no known failures
that require a delay.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Khem Raj [Thu, 2 Dec 2021 07:13:13 +0000 (23:13 -0800)]
intl: Emit no lines in bison generated files
Improve reproducibility:
Do not put any #line preprocessor commands in bison generated files.
These lines contain absolute paths containing file locations on
the host build machine.
Signed-off-by: Juro Bystricky <juro.bystricky@intel.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Samuel Thibault [Tue, 14 Dec 2021 07:38:05 +0000 (08:38 +0100)]
hurd: Do not set PIE_UNSUPPORTED
This is now supported.
H.J. Lu [Tue, 14 Dec 2021 00:33:57 +0000 (16:33 -0800)]
NEWS: Move LD_PREFER_MAP_32BIT_EXEC
Move LD_PREFER_MAP_32BIT_EXEC to
Deprecated and removed features, and other changes affecting compatibility:
Samuel Thibault [Tue, 14 Dec 2021 00:01:48 +0000 (01:01 +0100)]
mach: Fix spurious inclusion of stack_chk_fail_local in libmachuser.a
When linking programs statically, stack_chk_fail_local already comes
from libc_nonshared, so we don't need it in lib{mach,hurd}user.a.
H.J. Lu [Wed, 8 Dec 2021 15:02:27 +0000 (07:02 -0800)]
Disable DT_RUNPATH on NSS tests [BZ #28455]
The glibc internal NSS functions should always load NSS modules from
the system. For testing purpose, disable DT_RUNPATH on NSS tests so
that the glibc internal NSS functions can load testing NSS modules
via DT_RPATH.
This partially fixes BZ #28455.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Akila Welihinda [Sun, 12 Dec 2021 18:35:03 +0000 (10:35 -0800)]
sysdeps: Simplify sin Taylor Series calculation
The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes
of regaining some precision as a function of da. However the
comment says we add the term `-0.5*da*a^2 + 0.5*da` which is
different. This fix updates the comment to reflect the
code and also simplifies the calculation by replacing `a` with `x`
because they always have the same value.
Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu>
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Adhemerval Zanella [Tue, 6 Apr 2021 17:33:14 +0000 (14:33 -0300)]
math: Remove the error handling wrapper from hypot and hypotf
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper with
SVID error handling around the new code. There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Wilco Dijkstra [Wed, 1 Dec 2021 14:08:14 +0000 (11:08 -0300)]
math: Use fmin/fmax on hypot
It optimizes for architectures that provides fast builtins.
Checked on aarch64-linux-gnu.
Adhemerval Zanella [Wed, 1 Dec 2021 13:57:32 +0000 (10:57 -0300)]
aarch64: Add math-use-builtins-f{max,min}.h
It allows to remove the arch-specific implementations.
Adhemerval Zanella [Wed, 1 Dec 2021 13:44:58 +0000 (10:44 -0300)]
math: Add math-use-builtinds-fmin.h
It allows the architecture to use the builtin instead of generic
implementation.
Adhemerval Zanella [Wed, 1 Dec 2021 13:37:44 +0000 (10:37 -0300)]
math: Add math-use-builtinds-fmax.h
It allows the architecture to use the builtin instead of generic
implementation.
Adhemerval Zanella [Sun, 4 Apr 2021 02:52:45 +0000 (23:52 -0300)]
math: Remove powerpc e_hypot
The generic implementation is shows only slight worse performance:
POWER10 reciprocal-throughput latency
master 8.28478 13.7253
new hypot 7.21945 13.1933
POWER9 reciprocal-throughput latency
master 13.4024 14.0967
new hypot 14.8479 15.8061
POWER8 reciprocal-throughput latency
master 15.5767 16.8885
new hypot 16.5371 18.4057
One way to improve might to make gcc generate xsmaxdp/xsmindp for
fmax/fmin (it onl does for -ffast-math, clang does for default
options).
Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
Adhemerval Zanella [Tue, 6 Apr 2021 15:32:06 +0000 (12:32 -0300)]
i386: Move hypot implementation to C
The generic hypotf is slight slower, mostly due the tricks the assembly
does to optimize the isinf/isnan/issignaling. The generic hypot is way
slower, since the optimized implementation uses the i386 default
excessive precision to issue the operation directly. A similar
implementation is provided instead of using the generic implementation:
Checked on i686-linux-gnu.
Adhemerval Zanella [Tue, 6 Apr 2021 02:55:55 +0000 (23:55 -0300)]
math: Use an improved algorithm for hypotl (ldbl-128)
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With a
random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.05% results with an error of
1 ulp (453266 results) while the new implementation only shows
0.0001% of total (1280).
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
Adhemerval Zanella [Mon, 5 Apr 2021 20:28:48 +0000 (17:28 -0300)]
math: Use an improved algorithm for hypotl (ldbl-96)
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With a
random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.02% results with an error of
1 ulp (23158 results) while the new implementation only shows
0.0001% of total (111).
[1] https://arxiv.org/pdf/1904.09481.pdf
Wilco Dijkstra [Tue, 30 Nov 2021 19:29:25 +0000 (16:29 -0300)]
math: Improve hypot performance with FMA
Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency. The non-fma version has 30% higher throughput and 10%
higher latency.
Max ULP error is 0.949 with fma and 0.792 without fma.
Passes GLIBC testsuite.
Wilco Dijkstra [Mon, 8 Mar 2021 20:07:39 +0000 (17:07 -0300)]
math: Use an improved algorithm for hypot (dbl-64)
This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for denormal results.
The main advantage of the new algorithm is its precision: with a
random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc
current implementation shows around 0.34% results with an error of
1 ulp (
3424869 results) while the new implementation only shows
0.002% of total (18851).
The performance result are also only slight worse than current
implementation. On x86_64 (Ryzen 5900X) with gcc 12:
Before:
"hypot": {
"workload-random": {
"duration": 3.73319e+09,
"iterations": 1.12e+08,
"reciprocal-throughput": 22.8737,
"latency": 43.7904,
"max-throughput": 4.37184e+07,
"min-throughput": 2.28361e+07
}
}
After:
"hypot": {
"workload-random": {
"duration": 3.7597e+09,
"iterations": 9.8e+07,
"reciprocal-throughput": 23.7547,
"latency": 52.9739,
"max-throughput": 4.2097e+07,
"min-throughput": 1.88772e+07
}
}
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
Adhemerval Zanella [Mon, 5 Apr 2021 17:49:47 +0000 (14:49 -0300)]
math: Simplify hypotf implementation
Use a more optimized comparison for check for NaN and infinite and
add an inlined issignaling implementation for float. With gcc it
results in 2 FP comparisons.
The file Copyright is also changed to use GPL, the implementation was
completely changed by
7c10fd3515f to use double precision instead of
scaling and this change removes all the GET_FLOAT_WORD usage.
Checked on x86_64-linux-gnu.
Siddhesh Poyarekar [Mon, 13 Dec 2021 04:31:45 +0000 (10:01 +0530)]
Cleanup encoding in comments
Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8
equivalents so that files don't end up with mixed encodings. With this,
all files (except tests that actually test different encodings) have a
single encoding.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Siddhesh Poyarekar [Wed, 8 Dec 2021 05:51:26 +0000 (11:21 +0530)]
Replace --enable-static-pie with --disable-default-pie
Build glibc programs and tests as PIE by default and enable static-pie
automatically if the architecture and toolchain supports it.
Also add a new configuration option --disable-default-pie to prevent
building programs as PIE.
Only the following architectures now have PIE disabled by default
because they do not work at the moment. hppa, ia64, alpha and csky
don't work because the linker is unable to handle a pcrel relocation
generated from PIE objects. The microblaze compiler is currently
failing with an ICE. GNU hurd tries to enable static-pie, which does
not work and hence fails. All these targets have default PIE disabled
at the moment and I have left it to the target maintainers to enable PIE
on their targets.
build-many-glibcs runs clean for all targets. I also tested x86_64 on
Fedora and Ubuntu, to verify that the default build as well as
--disable-default-pie work as expected with both system toolchains.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Samuel Thibault [Sat, 11 Dec 2021 22:08:32 +0000 (23:08 +0100)]
hurd: Add rules for static PIE build
This fixes [BZ #28671].
Samuel Thibault [Sat, 11 Dec 2021 23:41:38 +0000 (00:41 +0100)]
hurd: Fix gmon-static
We need to use crt0 for gmon-static too.
H.J. Lu [Fri, 10 Dec 2021 21:00:09 +0000 (13:00 -0800)]
x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]
Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since
the first PT_LOAD segment is no longer executable due to defaulting to
-z separate-code.
This fixes [BZ #28656].
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Florian Weimer [Fri, 10 Dec 2021 20:34:30 +0000 (21:34 +0100)]
elf: Use errcode instead of (unset) errno in rtld_chain_load
H.J. Lu [Thu, 9 Dec 2021 15:01:33 +0000 (07:01 -0800)]
Add a testcase to check alignment of PT_LOAD segment [BZ #28676]