review.tizen.org Git - platform/upstream/glibc.git/log

elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)

This second patch contains the actual implementation of a new sorting algorithm
for shared objects in the dynamic loader, which solves the slow behavior that
the current "old" algorithm falls into when the DSO set contains circular
dependencies.

The new algorithm implemented here is simply depth-first search (DFS) to obtain
the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1
bitfield is added to struct link_map to more elegantly facilitate such a search.

The DFS algorithm is applied to the input maps[nmap-1] backwards towards
maps[0]. This has the effect of a more "shallow" recursion depth in general
since the input is in BFS. Also, when combined with the natural order of
processing l_initfini[] at each node, this creates a resulting output sorting
closer to the intuitive "left-to-right" order in most cases.

Another notable implementation adjustment related to this _dl_sort_maps change
is the removing of two char arrays 'used' and 'done' in _dl_close_worker to
represent two per-map attributes. This has been changed to simply use two new
bit-fields l_map_used:1, l_map_done:1 added to struct link_map. This also allows
discarding the clunky 'used' array sorting that _dl_sort_maps had to sometimes
do along the way.

Tunable support for switching between different sorting algorithms at runtime is
also added. A new tunable 'glibc.rtld.dynamic_sort' with current valid values 1
(old algorithm) and 2 (new DFS algorithm) has been added. At time of commit
of this patch, the default setting is 1 (old algorithm).

Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: Testing infrastructure for ld.so DSO sorting (BZ #17645)

This is the first of a 2-part patch set that fixes slow DSO sorting behavior in
the dynamic loader, as reported in BZ #17645. In order to facilitate such a
large modification to the dynamic loader, this first patch implements a testing
framework for validating shared object sorting behavior, to enable comparison
between old/new sorting algorithms, and any later enhancements.

This testing infrastructure consists of a Python script
scripts/dso-ordering-test.py' which takes in a description language, consisting
of strings that describe a set of link dependency relations between DSOs, and
generates testcase programs and Makefile fragments to automatically test the
described situation, for example:

  a->b->c->d          # four objects linked one after another

  a->[bc]->d;b->c     # a depends on b and c, which both depend on d,
                      # b depends on c (b,c linked to object a in fixed order)

  a->b->c;{+a;%a;-a}  # a, b, c serially dependent, main program uses
                      # dlopen/dlsym/dlclose on object a

  a->b->c;{}!->[abc]  # a, b, c serially dependent; multiple tests generated
                      # to test all permutations of a, b, c ordering linked
                      # to main program

(Above is just a short description of what the script can do, more
  documentation is in the script comments.)

Two files containing several new tests, elf/dso-sort-tests-[12].def are added,
including test scenarios for BZ #15311 and Redhat issue #1162810 [1].

Due to the nature of dynamic loader tests, where the sorting behavior and test
output occurs before/after main(), generating testcases to use
support/test-driver.c does not suffice to control meaningful timeout for ld.so.
Therefore a new utility program 'support/test-run-command', based on
test-driver.c/support_test_main.c has been added. This does the same testcase
control, but for a program specified through a command-line rather than at the
source code level. This utility is used to run the dynamic loader testcases
generated by dso-ordering-test.py.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1162810

Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

iconv: Use TIMEOUTFACTOR for iconv test timeout

Currently the timeout for each iconv test is hard coded to 3 seconds.
On my OpenRISC test platform this is too slow and the test fails with a
HANG error.

This change uses the available TIMEOUTFACTOR to compute the timeout.
The default value is still 3.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

posix: Remove alloca usage for internal fnmatch implementation

This patch replaces the internal fnmatch pattern list generation
to use a dynamic array.

Checked on x86_64-linux-gnu.

Add alloc_align attribute to memalign et al

GCC 4.9.0 added the alloc_align attribute to say that a function
argument specifies the alignment of the returned pointer. Clang supports
the attribute too. Using the attribute can allow a compiler to generate
better code if it knows the returned pointer has a minimum alignment.
See https://gcc.gnu.org/PR60092 for more details.

GCC implicitly knows the semantics of aligned_alloc and posix_memalign,
but not the obsolete memalign. As a result, GCC generates worse code
when memalign is used, compared to aligned_alloc. Clang knows about
aligned_alloc and memalign, but not posix_memalign.

This change adds a new __attribute_alloc_align__ macro to <sys/cdefs.h>
and then uses it on memalign (where it helps GCC) and aligned_alloc
(where GCC and Clang already know the semantics, but it doesn't hurt)
and xposix_memalign. It can't be used on posix_memalign because that
doesn't return a pointer (the allocated pointer is returned via a void**
parameter instead).

Unlike the alloc_size attribute, alloc_align only allows a single
argument. That means the new __attribute_alloc_align__ macro doesn't
really need to be used with double parentheses to protect a comma
between its arguments. For consistency with __attribute_alloc_size__
this patch defines it the same way, so that double parentheses are
required.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>

linux: Fix a possibly non-constant expression in _Static_assert

According to C11 6.6p6, `const int` as an operand may not make up a
constant expression. GCC -O0 errors:

../sysdeps/unix/sysv/linux/opendir.c:107:19: error: static_assert expression is not an integral constant expression
_Static_assert (allocation_size >= sizeof (struct dirent64),

-O2 -Wpedantic has a similar warning.
See https://gcc.gnu.org/PR102502 for GCC's inconsistency.

Use enum which is guaranteed to be a constant expression.
This also makes the file compilable with Clang.

Fixes: 4b962c9e859de23b461d61f860dbd3f21311e83a ("linux: Simplify opendir buffer allocation")

x86-64: Add sysdeps/x86_64/fpu/Makeconfig

1. Add sysdeps/x86_64/fpu/Makeconfig to auto-generate libmvec.mk, which
contains libmvec ABI test dependencies and CFLAGS, in the build directory.
2. Include libmvec.mk for libmvec ABI test dependencies and CFLAGS.

Tested on SSE4, AVX, AVX2 and AVX512 machines.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>

stdlib: Fix tst-canon-bz26341 when the glibc build current working directory is itself using symlinks.

powerpc: Remove backtrace implementation

The powerpc optimization to provide a fast stacktrace requires some
ad-hoc code to handle Linux signal frames and the change is fragile
once the kernel decides to slight change its execution sequence [1].

The generic implementation work as-is and it should be future proof
since the kernel provides the expected CFI directives in vDSO shared
page.

Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and
powerpc64-linux-gnu.

[1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html

Correct access attribute on memfrob (bug 28475)

As noted in bug 28475, the access attribute on memfrob in <string.h>
is incorrect: the function both reads and writes the memory pointed to
by its argument, so it needs to use __read_write__, not
__write_only__.  This incorrect attribute results in a build failure
for accessing uninitialized memory for s390x-linux-gnu-O3 with
build-many-glibcs.py using GCC mainline.

Correct the attribute.  Fixing this shows up that some calls to
memfrob in elf/ tests are reading uninitialized memory; I'm not
entirely sure of the purpose of those calls, but guess they are about
ensuring that the stack space is indeed allocated at that point in the
function, and so it matters that they are calling a function whose
semantics are unknown to the compiler.  Thus, change the first memfrob
call in those tests to use explicit_bzero instead, as suggested by
Florian in
<https://sourceware.org/pipermail/libc-alpha/2021-October/132119.html>,
to avoid the use of uninitialized memory.

Tested for x86_64, and with build-many-glibcs.py (GCC mainline) for
s390x-linux-gnu-O3.

debug: Add tests for _FORTIFY_SOURCE=3

Add some testing coverage for _FORTIFY_SOURCE=3.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Make sure that the fortified function conditionals are constant

In _FORTIFY_SOURCE=3, the size expression may be non-constant,
resulting in branches in the inline functions remaining intact and
causing a tiny overhead.  Clang (and in future, gcc) make sure that
the -1 case is always safe, i.e. any comparison of the generated
expression with (size_t)-1 is always false so that bit is taken care
of.  The rest is avoidable since we want the _chk variant whenever we
have a size expression and it's not -1.

Rework the conditionals in a uniform way to clearly indicate two
conditions at compile time:

- Either the size is unknown (-1) or we know at compile time that the
  operation length is less than the object size.  We can call the
  original function in this case.  It could be that either the length,
  object size or both are non-constant, but the compiler, through
  range analysis, is able to fold the *comparison* to a constant.

- The size and length are known and the compiler can see at compile
  time that operation length > object size.  This is valid grounds for
  a warning at compile time, followed by emitting the _chk variant.

For everything else, emit the _chk variant.

This simplifies most of the fortified function implementations and at
the same time, ensures that only one call from _chk or the regular
function is emitted.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Don't add access size hints to fortifiable functions

In the context of a function definition, the size hints imply that the
size of an object pointed to by one parameter is another parameter.
This doesn't make sense for the fortified versions of the functions
since that's the bit it's trying to validate.

This is harmless with __builtin_object_size since it has fairly simple
semantics when it comes to objects passed as function parameters.
With __builtin_dynamic_object_size we could (as my patchset for gcc[1]
already does) use the access attribute to determine the object size in
the general case but it misleads the fortified functions.

Basically the problem occurs when access attributes are present on
regular functions that have inline fortified definitions to generate
_chk variants; the attributes get inherited by these definitions,
causing problems when analyzing them.  For example with poll(fds, nfds,
timeout), nfds is hinted using the __attr_access as being the size of
fds.

Now, when analyzing the inline function definition in bits/poll2.h, the
compiler sees that nfds is the size of fds and tries to use that
information in the function body.  In _FORTIFY_SOURCE=3 case, where the
object size could be a non-constant expression, this information results
in the conclusion that nfds is the size of fds, which defeats the
purpose of the implementation because we're trying to check here if nfds
does indeed represent the size of fds.  Hence for this case, it is best
to not have the access attribute.

With the attributes gone, the expression evaluation should get delayed
until the function is actually inlined into its destinations.

Disable the access attribute for fortified function inline functions
when building at _FORTIFY_SOURCE=3 to make this work better.  The
access attributes remain for the _chk variants since they can be used
by the compiler to warn when the caller is passing invalid arguments.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581125.html

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

glibcextract.py: Place un-assemblable @@@ in a comment

Unlike GCC, Clang parses asm statements and verifies they are valid
instructions/directives. Place the magic @@@ into a comment to avoid
a parse error.

nss: Unnest nested function add_key

This makes makedb.c compilable with Clang which does not support nested
functions.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

ld.so: Initialize bootstrap_map.l_ld_readonly [BZ #28340]

1. Define DL_RO_DYN_SECTION to initalize bootstrap_map.l_ld_readonly
before calling elf_get_dynamic_info to get dynamic info in bootstrap_map,
2. Define a single

static inline bool
dl_relocate_ld (const struct link_map *l)
{
/* Don't relocate dynamic section if it is readonly */
return !(l->l_ld_readonly || DL_RO_DYN_SECTION);
}

This updates BZ #28340 fix.

timex: Use 64-bit fields on 32-bit TIMESIZE=64 systems (BZ #28469)

This was found when testing the OpenRISC port I am working on.  These
two tests fail with SIGSEGV:

  FAIL: misc/tst-ntp_gettime
  FAIL: misc/tst-ntp_gettimex

This was found to be due to the kernel overwriting the stack space
allocated by the timex structure.  The reason for the overwrite being
that the kernel timex has 64-bit fields and user space code only
allocates enough stack space for timex with 32-bit fields.

On 32-bit systems with TIMESIZE=64 __USE_TIME_BITS64 is not defined.
This causes the timex structure to use 32-bit fields with type
__syscall_slong_t.

This patch adjusts the ifdef condition to allow 32-bit systems with
TIMESIZE=64 to use the 64-bit long long timex definition.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

manual: Update _TIME_BITS to clarify it's user defined

The current language reads "This macro determines...", changing to
"Define this macro...". This is consistent with other feature macro
documentation language.

When I first read the previous language it seems to indicate that the
macro is already defined. By changing the language to "Define this
macro..." it's clear that its the user's responsibility to define it.

nptl: Fix tst-cancel7 and tst-cancelx7 pidfile race

The check for waiting for the pidfile to be created looks wrong. At the
point when ACCESS is run the pid file will always be created and
accessible as it is created during DO_PREPARE. This means that thread
cancellation may be performed before the pid is written to the pidfile.

This was found to be flaky when testing on my OpenRISC platform.

Fix this by using the semaphore to wait for pidfile pid write
completion.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: Fix elf_get_dynamic_info() for bootstrap

THe d6d89608ac8c broke powerpc for --enable-bind-now because it turned
out that different than patch assumption rtld elf_get_dynamic_info()
does require to handle RTLD_BOOTSTRAP to avoid DT_FLAGS and
DT_RUNPATH (more specially the GLRO usage which is not reallocate
yet).

This patch fixes by passing two arguments to elf_get_dynamic_info()
to inform that by rtld (bootstrap) or static pie initialization
(static_pie_bootstrap). I think using explicit argument is way more
clear and burried C preprocessor, and compiler should remove the
dead code.

I checked on x86_64 and i686 with default options, --enable-bind-now,
and --enable-bind-now and --enable--static-pie. I also check on
aarch64, armhf, powerpc64, and powerpc with default and
--enable-bind-now.

hurd if_index: Explicitly use AF_INET for if index discovery

5bf07e1b3a74 ("Linux: Simplify __opensock and fix race condition [BZ #28353]")
made __opensock try NETLINK then UNIX then INET. On the Hurd, only INET
knows about network interfaces, so better actually specify that in
if_index.

hurd: Fix intr-msg parameter/stack kludge

INTR_MSG_TRAP was tinkering with esp to make it point to
_hurd_intr_rpc_mach_msg's parameters, and notably use (&msg)[-1] which is
meaningless in C.

Instead, just push the parameters on the stack, which also avoids leaving
local variables of _hurd_intr_rpc_mach_msg below esp. We now also
properly express that OPTION and TIMEOUT may be updated during the trap
call.

x86-64: Add test-vector-abi.h/test-vector-abi-sincos.h

Add templates for vector ABI test and use them for vector sincos/sincosf
ABI tests.

elf: Fix dynamic-link.h usage on rtld.c

The 4af6982e4c fix does not fully handle RTLD_BOOTSTRAP usage on
rtld.c due two issues:

  1. RTLD_BOOTSTRAP is also used on dl-machine.h on various
     architectures and it changes the semantics of various machine
     relocation functions.

  2. The elf_get_dynamic_info() change was done sideways, previously
     to 490e6c62aa get-dynamic-info.h was included by the first
     dynamic-link.h include *without* RTLD_BOOTSTRAP being defined.
     It means that the code within elf_get_dynamic_info() that uses
     RTLD_BOOTSTRAP is in fact unused.

To fix 1. this patch now includes dynamic-link.h only once with
RTLD_BOOTSTRAP defined.  The ELF_DYNAMIC_RELOCATE call will now have
the relocation fnctions with the expected semantics for the loader.

And to fix 2. part of 4af6982e4c is reverted (the check argument
elf_get_dynamic_info() is not required) and the RTLD_BOOTSTRAP
pieces are removed.

To reorganize the includes the static TLS definition is moved to
its own header to avoid a circular dependency (it is defined on
dynamic-link.h and dl-machine.h requires it at same time other
dynamic-link.h definition requires dl-machine.h defitions).

Also ELF_MACHINE_NO_REL, ELF_MACHINE_NO_RELA, and ELF_MACHINE_PLT_REL
are moved to its own header.  Only ancient ABIs need special values
(arm, i386, and mips), so a generic one is used as default.

The powerpc Elf64_FuncDesc is also moved to its own header, since
csu code required its definition (which would require either include
elf/ folder or add a full path with elf/).

Checked on x86_64, i686, aarch64, armhf, powerpc64, powerpc32,
and powerpc64le.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

x86: Optimize memset-vec-unaligned-erms.S

No bug.

Optimization are

1. change control flow for L(more_2x_vec) to fall through to loop and
   jump for L(less_4x_vec) and L(less_8x_vec). This uses less code
   size and saves jumps for length > 4x VEC_SIZE.

2. For EVEX/AVX512 move L(less_vec) closer to entry.

3. Avoid complex address mode for length > 2x VEC_SIZE

4. Slightly better aligning code for the loop from the perspective of
   code size and uops.

5. Align targets so they make full use of their fetch block and if
   possible cache line.

6. Try and reduce total number of icache lines that will need to be
   pulled in for a given length.

7. Include "local" version of stosb target. For AVX2/EVEX/AVX512
   jumping to the stosb target in the sse2 code section will almost
   certainly be to a new page. The new version does increase code size
   marginally by duplicating the target but should get better iTLB
   behavior as a result.

test-memset, test-wmemset, and test-bzero are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

x86: Optimize memcmp-evex-movbe.S for frontend behavior and size

No bug.

The frontend optimizations are to:
1. Reorganize logically connected basic blocks so they are either in
the same cache line or adjacent cache lines.
2. Avoid cases when basic blocks unnecissarily cross cache lines.
3. Try and 32 byte align any basic blocks possible without sacrificing
code size. Smaller / Less hot basic blocks are used for this.

Overall code size shrunk by 168 bytes. This should make up for any
extra costs due to aligning to 64 bytes.

In general performance before deviated a great deal dependending on
whether entry alignment % 64 was 0, 16, 32, or 48. These changes
essentially make it so that the current implementation is at least
equal to the best alignment of the original for any arguments.

The only additional optimization is in the page cross case. Branch on
equals case was removed from the size == [4, 7] case. As well the [4,
7] and [2, 3] case where swapped as [4, 7] is likely a more hot
argument size.

test-memcmp and test-wmemcmp are both passing.

libio: Update tst-wfile-sync to not depend on stdin

The test expects stdin to be a file which is not the case when running
tests over ssh where stdin is piped in.

The test fails with:
error: xlseek.c:27: lseek64 (0, 0, 1): Illegal seek

Update the test to create a temporary file and use that to perform the
test.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: Update audit tests to not depend on stdout

The tst-audit14, tst-audit15 and tst-audit16 tests all have audit
modules that write to stdout; the test reads from stdout to confirm
what was written.  This assumes the stdout is a file which is not the
case when run over ssh.

This patch updates the tests to use a post run cmp command to compare
the output against and .exp file.  This is similar to how many other
tests work and it fixes the stdout limitation.  Also, this means the
test code can be greatly simplified.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: Fix elf_get_dynamic_info definition

Before to 490e6c62aa31a8a ('elf: Avoid nested functions in the loader
[BZ #27220]'), elf_get_dynamic_info() was defined twice on rtld.c: on
the first dynamic-link.h include and later within _dl_start(). The
former definition did not define DONT_USE_BOOTSTRAP_MAP and it is used
on setup_vdso() (since it is a global definition), while the former does
define DONT_USE_BOOTSTRAP_MAP and it is used on loader self-relocation.

With the commit change, the function is now included and defined once
instead of defined as a nested function. So rtld.c defines without
defining RTLD_BOOTSTRAP and it brokes at least powerpc32.

This patch fixes by moving the get-dynamic-info.h include out of
dynamic-link.h, which then the caller can corirectly set the expected
semantic by defining STATIC_PIE_BOOTSTRAP, RTLD_BOOTSTRAP, and/or
RESOLVE_MAP.

It also required to enable some asserts only for the loader bootstrap
to avoid issues when called from setup_vdso().

As a side note, this is another issues with nested functions: it is
not clear from pre-processed output (-E -dD) how the function will
be build and its semantic (since nested function will be local and
extra C defines may change it).

I checked on x86_64-linux-gnu (w/o --enable-static-pie),
i686-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu-power4,
aarch64-linux-gnu, arm-linux-gnu, sparc64-linux-gnu, and
s390x-linux-gnu.

Reviewed-by: Fangrui Song <maskray@google.com>

Add TEST_COMPARE_STRING_WIDE to support/check.h

I'd like to be able to test narrow and wide string interfaces, with
the narrow string tests using TEST_COMPARE_STRING and the wide string
tests using something analogous (possibly generated using macros from
a common test template for both the narrow and wide string tests where
appropriate).

Add such a TEST_COMPARE_STRING_WIDE, along with functions
support_quote_blob_wide and support_test_compare_string_wide that it
builds on. Those functions are built using macros from common
templates shared by the narrow and wide string implementations, though
I didn't do that for the tests of test functions. In
support_quote_blob_wide, I chose to use the \x{} delimited escape
sequence syntax proposed for C2X in N2785, rather than e.g. trying to
generate the end of a string and the start of a new string when
ambiguity would result from undelimited \x (when the next character
after such an escape sequence is valid hex) or forcing an escape
sequence to be used for the next character in the case of such
ambiguity.

Tested for x86_64.

Fix nios2 localplt failure

Building for nios2-linux-gnu has recently started showing a localplt
test failure, arising from a reference to __floatunsidf from
getloadavg after commit b5c8a3aa82f66f49b731ca5204104cee48bccfa5
("Linux: implement getloadavg(3) using sysinfo(2)") (this is an
architecture with soft-fp in libc). Add this as a permitted local PLT
reference in localplt.data.

Tested with build-many-glibcs.py for nios2-linux-gnu.

elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT)

Intel MPX failed to gain wide adoption and has been deprecated for a
while. GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in
2019.

This patch removes the support code from the dynamic loader.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

resolv: Avoid GCC 12 false positive warning [BZ #28439].

Replace a call to sprintf with an equivalent pair of stpcpy/strcpy calls
to avoid a GCC 12 -Wformat-overflow false positive due to recent optimizer
improvements.

benchtests: Add medium cases and increase iters in bench-memset.c

No bug.

This commit adds new medium size cases for lengths in [512, 1024). As
well it increase the iters to INNER_LOOP_ITERS_LARGE for more reliable
results.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>

x86: Modify ENTRY in sysdep.h so that p2align can be specified

No bug.

This change adds a new macro ENTRY_P2ALIGN which takes a second
argument, log2 of the desired function alignment.

The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this
doesn't affect any existing functionality.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>

resolv: make res_randomid use random_bits()

It is at least "more random" than 0xffff & __getpid ();

Signed-off-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Linux: implement getloadavg(3) using sysinfo(2)

Signed-off-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Remove unreliable parts of rt/tst-cpuclock2

This is a follow-up to the tst-cpuclock1.c change here:
9a29f1a2ae3d4bb253ee368e0d71db0ca9494120

This test, like tst-cpuclock1, may fail on heavily loaded VM
servers (and has occasionally failed on the 32bit trybot),
so tests that rely on "wall time" have been removed.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

elf: Avoid nested functions in the loader [BZ #27220]

dynamic-link.h is included more than once in some elf/ files (rtld.c,
dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested
functions. This harms readability and the nested functions usage
is the biggest obstacle prevents Clang build (Clang doesn't support GCC
nested functions).

The key idea for unnesting is to add extra parameters (struct link_map
*and struct r_scope_elm *[]) to RESOLVE_MAP,
ELF_MACHINE_BEFORE_RTLD_RELOC, ELF_DYNAMIC_RELOCATE, elf_machine_rel[a],
elf_machine_lazy_rel, and elf_machine_runtime_setup. (This is inspired
by Stan Shebs' ppc64/x86-64 implementation in the
google/grte/v5-2.27/master which uses mixed extra parameters and static
variables.)

Future simplification:
* If mips elf_machine_runtime_setup no longer needs RESOLVE_GOTSYM,
elf_machine_runtime_setup can drop the `scope` parameter.
* If TLSDESC no longer need to be in elf_machine_lazy_rel,
elf_machine_lazy_rel can drop the `scope` parameter.

Tested on aarch64, i386, x86-64, powerpc64le, powerpc64, powerpc32,
sparc64, sparcv9, s390x, s390, hppa, ia64, armhf, alpha, and mips64.
In addition, tested build-many-glibcs.py with {arc,csky,microblaze,nios2}-linux-gnu
and riscv64-linux-gnu-rv64imafdc-lp64d.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Add run-time check for indirect external access

When performing symbol lookup for references in executable without
indirect external access:

1. Disallow copy relocations in executable against protected data symbols
in a shared object with indirect external access.
2. Disallow non-zero symbol values of undefined function symbols in
executable, which are used as the function pointer, against protected
function symbols in a shared object with indirect external access.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Initial support for GNU_PROPERTY_1_NEEDED

1. Add GNU_PROPERTY_1_NEEDED:

#define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO

to indicate the needed properties by the object file.
2. Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:

#define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)

to indicate that the object file requires canonical function pointers and
cannot be used with copy relocation.
3. Scan GNU_PROPERTY_1_NEEDED property and store it in l_1_needed.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

io: Fix ftw internal realloc buffer (BZ #28126)

The 106ff08526d3ca did not take in consideration the buffer might be
reallocated if the total path is larger than PATH_MAX. The realloc
uses 'dirbuf', where 'dirstreams' is the allocated buffer.

Checked on x86_64-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Fix subscript error with odd TZif file [BZ #28338]

* time/tzfile.c (__tzfile_compute): Fix unlikely off-by-one bug
that accessed before start of an array when an oddball-but-valid
TZif file was queried with an unusual time_t value.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

S390: Add PCI_MIO and SIE HWCAPs

Both new HWCAPs were introduced in these kernel commits:
- 7e8403ecaf884f307b627f3c371475913dd29292
  "s390: add HWCAP_S390_PCI_MIO to ELF hwcaps"
- 7e82523f2583e9813e4109df3656707162541297
  "s390/hwcaps: make sie capability regular hwcap"

Also note that the kernel commit 511ad531afd4090625def4d9aba1f5227bd44b8e
"s390/hwcaps: shorten HWCAP defines" has shortened the prefix of the macros
from "HWCAP_S390_" to "HWCAP_".  For compatibility reasons, we do not
change the prefix in public glibc header file.

support: Also return fd when it is 0

The fd validity check in open_dev_null checks if fd > 0, which would
lead to a leaked fd if it is == 0.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

ld.so: Don't fill the DT_DEBUG entry in ld.so [BZ #28129]

Linker creates the DT_DEBUG entry only in executables. Don't fill the
non-existent DT_DEBUG entry in ld.so with the run-time address of the
r_debug structure. This fixes BZ #28129.

S390: update libm test ulps

Update after

commit 6bbf7298323bf31bc43494b2201465a449778e10.
Fixed inaccuracy of j0f (BZ #28185)

See also e.g.
commit c75b106145c30e6c7bcf87f384a5c68ce56406e9
aarch64: update libm test ulps

powerpc: update libm test ulps

Update after commit 6bbf7298323bf31bc43494b2201465a449778e10
(Fixed inaccuracy of j0f (BZ #28185)).

math: Also xfail the new j0f tests for ibm128-libgcc

From commit 6bbf7298323bf31b.

Checked on powerpc64-linux-gnu.

y2038: Use a common definition for stat for sparc32

The sparc32 misses support for support done by 4e8521333bea6.

Checked on sparcv9-linux-gnu.

Fix stdlib/tst-setcontext.c for GCC 12 -Warray-compare

Building stdlib/tst-setcontext.c fails with GCC mainline:

tst-setcontext.c: In function 'f2':
tst-setcontext.c:61:16: error: comparison between two arrays [-Werror=array-compare]
61 | if (on_stack < st2 || on_stack >= st2 + sizeof (st2))
| ^
tst-setcontext.c:61:16: note: use '&on_stack[0] < &st2[0]' to compare the addresses

The comparison in this case is deliberate, so adjust it as suggested
in that note.

Tested with build-many-glibcs.py (GCC mainline) for aarch64-linux-gnu.

aarch64: update libm test ulps

Update after

commit 6bbf7298323bf31bc43494b2201465a449778e10.
Fixed inaccuracy of j0f (BZ #28185)

Fixed inaccuracy of j0f (BZ #28185)

The largest errors over the full binary32 range are after this
patch (on x86_64):

RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7

Inputs that were yielding huge errors have been added to "make check".
Reviewed-by: Adhemeral Zanella <adhemerval.zanella@linaro.org>

Fix stdio-common tests for GCC 12 -Waddress

My glibc bot shows failures building the testsuite with GCC mainline
across all architectures:

tst-vfprintf-width-prec.c: In function 'do_test':
tst-vfprintf-width-prec.c:90:16: error: the comparison will always evaluate as 'false' for the address of 'result' will never be NULL [-Werror=address]
   90 |     if (result == NULL)
      |                ^~
tst-vfprintf-width-prec.c:89:13: note: 'result' declared here
   89 |     wchar_t result[100];
      |             ^~~~~~

This is clearly a correct warning; the comparison against NULL is
clearly a cut-and-paste mistake from an earlier case in the test that
does use calloc.  Thus, remove the unnecessary check for NULL shown up
by the warning.

Similarly, two other tests have bogus comparisons against NULL; remove
those as well:

scanf14a.c:95:13: error: the comparison will always evaluate as 'false' for the address of 'fname' will never be NULL [-Werror=address]
   95 |   if (fname == NULL)
      |             ^~
scanf14a.c:93:8: note: 'fname' declared here
   93 |   char fname[strlen (tmpdir) + sizeof "/tst-scanf14.XXXXXX"];
      |        ^~~~~

scanf16a.c:125:13: error: the comparison will always evaluate as 'false' for the address of 'fname' will never be NULL [-Werror=address]
  125 |   if (fname == NULL)
      |             ^~
scanf16a.c:123:8: note: 'fname' declared here
  123 |   char fname[strlen (tmpdir) + sizeof "/tst-scanf16.XXXXXX"];
      |        ^~~~~

Tested with build-many-glibcs.py (GCC mainline) for aarch64-linux-gnu.

benchtests: Building benchmarks as static executables

Building benchmarks as static executables:
=========================================

To build benchmarks as static executables, on the build system, run:

$ make STATIC-BENCHTESTS=yes bench-build

You can copy benchmark executables to another machine and run them
without copying the source nor build directories.

elf: Avoid deadlock between pthread_create and ctors [BZ #28357]

The fix for bug 19329 caused a regression such that pthread_create can
deadlock when concurrent ctors from dlopen are waiting for it to finish.
Use a new GL(dl_load_tls_lock) in pthread_create that is not taken
around ctors in dlopen.

The new lock is also used in __tls_get_addr instead of GL(dl_load_lock).

The new lock is held in _dl_open_worker and _dl_close_worker around
most of the logic before/after the init/fini routines.  When init/fini
routines are running then TLS is in a consistent, usable state.
In _dl_open_worker the new lock requires catching and reraising dlopen
failures that happen in the critical section.

The new lock is reinitialized in a fork child, to keep the existing
behaviour and it is kept recursive in case malloc interposition or TLS
access from signal handlers can retake it.  It is not obvious if this
is necessary or helps, but avoids changing the preexisting behaviour.

The new lock may be more appropriate for dl_iterate_phdr too than
GL(dl_load_write_lock), since TLS state of an incompletely loaded
module may be accessed.  If the new lock can replace the old one,
that can be a separate change.

Fixes bug 28357.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

time: Ignore interval nanoseconds on tst-itimer

Running the test on a 4.4 kernel within KVM, the precision used on
ITIMER_VIRTUAL and ITIMER_PROF seems to different than the one used
for ITIMER_REAL (it seems the same used for CLOCK_REALTIME_COARSE and
CLOCK_MONOTONIC_COARSE). I did not see it on other kernels, for
instance 5.11 and 4.15.

To avoid trying to guess the resolution used, do not check the
nanosecond internal values for the specific timers.

Checked on i686-linux-gnu with a 4.4 kernel.

io: Do not skip timestamps tests for 32-bit time_t

The first test in the set do not require 64-bit time_t support, so there
is no need to return UNSUPPORTED for the whole test. The patch also adds
another test with arbitrary date prior y2038.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Update to Unicode 14.0.0 [BZ #28390]

Unicode 14.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 14.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).

Total added characters in newly generated CHARMAP: 838
Total removed characters in newly generated WIDTH: 1
    (Characters not in WIDTH get width 1 by default, i.e. these have width 1 now.)
    removed: <U1734> 0 : eaw=N category=Mc bidi=L   name=HANUNOO SIGN PAMUDPOD
    That seems intentional, the character had category Mn (Mark, nonspacing) before
    and now has Mc (Mark, spacing combining)
Total changed characters in newly generated WIDTH: 0
Total added characters in newly generated WIDTH: 175

nptl: pthread_kill must send signals to a specific thread [BZ #28407]

The choice between the kill vs tgkill system calls is not just about
the TID reuse race, but also about whether the signal is sent to the
whole process (and any thread in it) or to a specific thread.

This was caught by the openposix test suite:

LTP: openposix test suite - FAIL: SIGUSR1 is member of new thread pendingset.
<https://gitlab.com/cki-project/kernel-tests/-/issues/764>

Fixes commit 526c3cf11ee9367344b6b15d669e4c3cb461a2be ("nptl: Fix race
between pthread_kill and thread exit (bug 12889)").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>

support: Add check for TID zero in support_wait_for_thread_exit

Some kernel versions (observed with kernel 5.14 and earlier) can list
"0" entries in /proc/self/task. This happens when a thread exits
while the task list is being constructed. Treat this entry as not
present, like the proposed kernel patch does:

[PATCH] procfs: Do not list TID 0 in /proc/<pid>/task
<https://lore.kernel.org/all/8735pn5dx7.fsf@oldenburg.str.redhat.com/>

Fixes commit 032d74eaf6179100048a5bf0ce942e97dc8b9a60 ("support: Add
support_wait_for_thread_exit").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>

nptl: Add CLOCK_MONOTONIC support for PI mutexes

Linux added FUTEX_LOCK_PI2 to support clock selection
(commit bf22a6976897977b0a3f1aeba6823c959fc4fdae). With the new
flag we can now proper support CLOCK_MONOTONIC for
pthread_mutex_clocklock with Priority Inheritance. If kernel
does not support, EINVAL is returned instead.

The difference is the futex operation will be issued and the kernel
will advertise the missing support (instead of hard-code error
return).

Checked on x86_64-linux-gnu and i686-linux-gnu on Linux 5.14, 5.11,
and 4.15.

support: Add support_mutex_pi_monotonic

Returns true if Priority Inheritance support CLOCK_MONOTONIC.

nptl: Use FUTEX_LOCK_PI2 when available

This patch uses the new futex PI operation provided by Linux v5.14
when it is required.

The futex_lock_pi64() is moved to futex-internal.c (since it used on
two different places and its code size might be large depending of the
kernel configuration) and clockid is added as an argument.

Co-authored-by: Kurt Kanzenbach <kurt@linutronix.de>

Linux: Add FUTEX_LOCK_PI2

Linux v5.14.0 introduced a new futex operation called FUTEX_LOCK_PI2.

This kernel feature can be used to implement
pthread_mutex_clocklock(MONOTONIC)/PI.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Add C2X _PRINTF_NAN_LEN_MAX

C2X adds a macro _PRINTF_NAN_LEN_MAX to <stdio.h>, giving the maximum
length of printf output for a NaN. glibc never includes an
n-char-sequence in its printf output for NaNs, so the correct value
for glibc is 4 ("-nan" or "-NAN"); define the macro accordingly.

This patch makes the macro definition conditional on __GLIBC_USE
(ISOC2X), as is generally done with features from new standard
versions. The name is in the implementation namespace for older
standards, so it would also be possible to define it unconditionally.

Tested for x86_64.

Add exp10 macro to <tgmath.h> (bug 26108)

glibc has had exp10 functions since long before they were
standardized; now they are standardized in TS 18661-4 and C2X, they
are also specified there to have a corresponding type-generic macro.
Add one to <tgmath.h>, so fixing bug 26108.

glibc doesn't have other functions from TS 18661-4 yet, but when
added, it will be natural to add the type-generic macro for each
function family at the same time as the functions.

Tested for x86_64.

elf: Replace nsid with args.nsid [BZ #27609]

commit ec935dea6332cb22f9881cd1162bad156173f4b0
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Apr 24 22:31:15 2020 +0200

    elf: Implement __libc_early_init

has

@@ -856,6 +876,11 @@ no more namespaces available for dlmopen()"));
   /* See if an error occurred during loading.  */
   if (__glibc_unlikely (exception.errstring != NULL))
     {
+      /* Avoid keeping around a dangling reference to the libc.so link
+   map in case it has been cached in libc_map.  */
+      if (!args.libc_already_loaded)
+  GL(dl_ns)[nsid].libc_map = NULL;
+

do_dlopen calls _dl_open with nsid == __LM_ID_CALLER (-2), which calls
dl_open_worker with args.nsid = nsid.  dl_open_worker updates args.nsid
if it is __LM_ID_CALLER.  After dl_open_worker returns, it is wrong to
use nsid.

Replace nsid with args.nsid after dl_open_worker returns.  This fixes
BZ #27609.

Add missing braces to bsearch inline implementation [BZ #28400]

GCC treats the pragma as a statement, so that the else branch only
consists of the pragma, not the return statement.

Fixes commit a725ff1de965f4cc4f36a7e8ae795d40ca0350d7 ("Suppress
-Wcast-qual warnings in bsearch").

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Update alpha libm-test-ulps

Suppress -Wcast-qual warnings in bsearch

The first cast to (void *) is redundant but should be (const void *)
anyway, because that's the type of the lvalue being assigned to.

The second cast is necessary and intentionally not const-correct, so
tell the compiler not to warn about it.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

elf: Copy l_addr/l_ld when adding ld.so to a new namespace

When add ld.so to a new namespace, we don't actually load ld.so. We
create a new link map and refers the real one for almost everything.
Copy l_addr and l_ld from the real ld.so link map to avoid GDB warning:

warning: .dynamic section for ".../elf/ld-linux-x86-64.so.2" is not at the expected address (wrong library or version mismatch?)

when handling shared library loaded by dlmopen.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

powerpc: Fix unrecognized instruction errors with recent binutils

Recent versions of binutils (with commit
b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a) stopped preserving "sticky"
options across a base `.machine` directive, nullifying the use of
passing "-many" through GCC to the assembler. As a result, some
instructions which were recognized even under older, more stringent
`.machine` directives become unrecognized instructions in that
context.

In `sysdeps/powerpc/tst-set_ppr.c`, the use of the `mfppr32` extended
mnemonic became unrecognized, as the default compilation with GCC for
32bit powerpc adds a `.machine ppc` in the resulting assembly, so the
command line option `-Wa,-many` is essentially ignored, and the ISA 2.06
instructions and mnemonics, like `mfppr32`, are unrecognized.

The compilation of `sysdeps/powerpc/tst-set_ppr.c` fails with:
Error: unrecognized opcode: `mfppr32'

Add appropriate `.machine` directives in the assembly to bracket the
`mfppr32` instruction.

Part of a 2019 fix (commit 9250e6610fdb0f3a6f238d2813e319a41fb7a810) to
the above test's Makefile to add `-many` to the compilation when GCC
itself stopped passing `-many` to the assember no longer has any effect,
so remove that.

Reported-by: Joseph Myers <joseph@codesourcery.com>

Do not declare fmax, fmin _FloatN, _FloatNx versions for C2X

At the last WG14 meeting,
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2711.htm> was
accepted, which places more emphasis on the new fmaximum / fminimum
functions and less on the old fmax / fmin functions.  Some of the
changes are to examples, notes or otherwise don't require
implementation changes.  However, the changes include removing the
_FloatN / _FloatNx versions of the fmax and fmin functions that came
from TS 18661-3.

Thus, those function versions should only be declared under similar
conditions to the _FloatN / _FloatNx versions of fmaxmag and fminmag:
for _GNU_SOURCE and pre-C2X use of __STDC_WANT_IEC_60559_TYPES_EXT__,
but not for C2X without _GNU_SOURCE.

In turn this requires a tgmath.h change so that the corresponding
tgmath.h macros, for C2X with __STDC_WANT_IEC_60559_TYPES_EXT__ but
without _GNU_SOURCE, don't try to use function variants that aren't
declared.  (That issue doesn't arise for the tgmath.h macros for
fmaxmag and fminmag, because those aren't defined at all in those
circumstances unless __STDC_WANT_IEC_60559_BFP_EXT__ (from TS 18661-1
and not specified at all by C2X) is also defined, and in that case the
_FloatN / _FloatNx versions of fmaxmag and fminmag get declared - this
is only ever an issue when it's possible for some functions
corresponding to a type-generic-macro to be declared, and for _FloatN
/ _FloatNx functions in general to be declared, but without the
_FloatN / _FloatNx functions corresponding to that particular macro
being declared.)

Tested for x86_64.

Do not define tgmath.h fmaxmag, fminmag macros for C2X (bug 28397)

C2X does not include fmaxmag and fminmag. When I updated feature test
macro handling accordingly (commit
858045ad1c5ac1682288bbcb3676632b97a21ddf, "Update floating-point
feature test macro handling for C2X", included in 2.34), I missed
updating tgmath.h so it doesn't define the corresponding type-generic
macros unless __STDC_WANT_IEC_60559_BFP_EXT__ is defined; I've now
reported this as bug 28397. Adjust the conditionals in tgmath.h
accordingly.

Tested for x86_64.

Add fmaximum, fminimum functions

C2X adds new <math.h> functions for floating-point maximum and
minimum, corresponding to the new operations that were added in IEEE
754-2019 because of concerns about the old operations not being
associative in the presence of signaling NaNs.  fmaximum and fminimum
handle NaNs like most <math.h> functions (any NaN argument means the
result is a quiet NaN).  fmaximum_num and fminimum_num handle both
quiet and signaling NaNs the way fmax and fmin handle quiet NaNs (if
one argument is a number and the other is a NaN, return the number),
but still raise "invalid" for a signaling NaN argument, making them
exceptions to the normal rule that a function with a floating-point
result raising "invalid" also returns a quiet NaN.  fmaximum_mag,
fminimum_mag, fmaximum_mag_num and fminimum_mag_num are corresponding
functions returning the argument with greatest or least absolute
value.  All these functions also treat +0 as greater than -0.  There
are also corresponding <tgmath.h> type-generic macros.

Add these functions to glibc.  The implementations use type-generic
templates based on those for fmax, fmin, fmaxmag and fminmag, and test
inputs are based on those for those functions with appropriate
adjustments to the expected results.  The RISC-V maintainers might
wish to add optimized versions of fmaximum_num and fminimum_num (for
float and double), since RISC-V (F extension version 2.2 and later)
provides instructions corresponding to those functions - though it
might be at least as useful to add architecture-independent built-in
functions to GCC and teach the RISC-V back end to expand those
functions inline, which is what you generally want for functions that
can be implemented with a single instruction.

Tested for x86_64 and x86, and with build-many-glibcs.py.

Linux: Simplify __opensock and fix race condition [BZ #28353]

AF_NETLINK support is not quite optional on modern Linux systems
anymore, so it is likely that the first attempt will always succeed.
Consequently, there is no need to cache the result. Keep AF_UNIX
and the Internet address families as a fallback, for the rare case
that AF_NETLINK is missing. The other address families previously
probed are totally obsolete be now, so remove them.

Use this simplified version as the generic implementation, disabling
Netlink support as needed.

pthread/tst-cancel28: Fix barrier re-init race condition

When running this test on the OpenRISC port I am working on this test
fails with a timeout.  The test passes when being straced or debugged.
Looking at the code there seems to be a race condition in that:

  1 main thread: calls xpthread_cancel
  2 sub thread : receives cancel signal
  3 sub thread : cleanup routine waits on barrier
  4 main thread: re-inits barrier
  5 main thread: waits on barrier

After getting to 5 the main thread and sub thread wait forever as the 2
barriers are no longer the same.

Removing the barrier re-init seems to fix this issue.  Also, the barrier
does not need to be reinitialized as that is done by default.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

powerpc: Delete unneeded ELF_MACHINE_BEFORE_RTLD_RELOC

Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>

posix: Remove spawni.c

Although it provide an alternate implementation that communicates
using pipe() instead of shared memory, no port uses and it adds extra
burden for posix_spawn() extensions.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Disable symbol hack in libc_nonshared.a

Don't reference __GI_memmove, __GI_memset, __GI_memcpy, __divdi3_internal,
__udivdi3_internal and __moddi3_internal in libc_nonshared.a.

linux: Revert the use of sched_getaffinity on get_nproc (BZ #28310)

The use of sched_getaffinity on get_nproc and
sysconf (_SC_NPROCESSORS_ONLN) done in 903bc7dcc2acafc40 (BZ #27645)
breaks the top command in common hypervisor configurations and also
other monitoring tools.

The main issue using sched_getaffinity changed the symbols semantic
from system-wide scope of online CPUs to per-process one (which can
be changed with kernel cpusets or book parameters in VM).

This patch reverts mostly of the 903bc7dcc2acafc40, with the
exceptions:

  * No more cached values and atomic updates, since they are inherent
    racy.

  * No /proc/cpuinfo fallback, since /proc/stat is already used and
    it would require to revert more arch-specific code.

  * The alloca is replace with a static buffer of 1024 bytes.

So the implementation first consult the sysfs, and fallbacks to procfs.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

linux: Simplify get_nprocs

This patch simplifies the memory allocation code and uses the sched
routines instead of reimplement it.  This still uses a stack
allocation buffer, so it can be used on malloc initialization code.

Linux currently supports at maximum of 4096 cpus for most architectures:

$ find -iname Kconfig | xargs git grep -A10 -w NR_CPUS | grep -w range
arch/alpha/Kconfig- range 2 32
arch/arc/Kconfig- range 2 4096
arch/arm/Kconfig- range 2 16 if DEBUG_KMAP_LOCAL
arch/arm/Kconfig- range 2 32 if !DEBUG_KMAP_LOCAL
arch/arm64/Kconfig- range 2 4096
arch/csky/Kconfig- range 2 32
arch/hexagon/Kconfig- range 2 6 if SMP
arch/ia64/Kconfig- range 2 4096
arch/mips/Kconfig- range 2 256
arch/openrisc/Kconfig- range 2 32
arch/parisc/Kconfig- range 2 32
arch/riscv/Kconfig- range 2 32
arch/s390/Kconfig- range 2 512
arch/sh/Kconfig- range 2 32
arch/sparc/Kconfig- range 2 32 if SPARC32
arch/sparc/Kconfig- range 2 4096 if SPARC64
arch/um/Kconfig- range 1 1
arch/x86/Kconfig-# [NR_CPUS_RANGE_BEGIN ... NR_CPUS_RANGE_END] range.
arch/x86/Kconfig- range NR_CPUS_RANGE_BEGIN NR_CPUS_RANGE_END
arch/xtensa/Kconfig- range 2 32

With x86 supporting 8192:

arch/x86/Kconfig
976 config NR_CPUS_RANGE_END
977         int
978         depends on X86_64
979         default 8192 if  SMP && CPUMASK_OFFSTACK
980         default  512 if  SMP && !CPUMASK_OFFSTACK
981         default    1 if !SMP

So using a maximum of 32k cpu should cover all cases (and I would
expect once we start to have many more CPUs that Linux would provide
a more straightforward way to query for such information).

A test is added to check if sched_getaffinity can successfully return
with large buffers.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

misc: Add __get_nprocs_sched

This is an internal function meant to return the number of avaliable
processor where the process can scheduled, different than the
__get_nprocs which returns a the system available online CPU.

The Linux implementation currently only calls __get_nprocs(), which
in tuns calls sched_getaffinity.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

htl: Fix sigset of main thread

d482ebfa6785 ('htl: Keep thread signals blocked during its initialization')
fixed not letting signals get delivered too early during thread creation,
but it also affected the main thread, thus making it block signals by
default. We need to just let the main thread sigset as it is.

htl: make pthread_sigstate read/write set/oset outside sigstate section

so that if a segfault occurs, the handler can run fine.

Avoid warning: overriding recipe for .../tst-ro-dynamic-mod.so

Add tst-ro-dynamic-mod to modules-names-nobuild to avoid

../Makerules:767: warning: ignoring old recipe for target '.../elf/tst-ro-dynamic-mod.so'

This updates BZ #28340 fix.

benchtests: Improve reliability of memcmp benchmarks

No bug. Remove reallocation of bufs between implementation tests. Move
initialization outside of foreach implementation test loop. Increase
iteration count.

Generally before this commit was seeing a great deal of variability
between runs. The goal of this commit is to make the results more
reliable.

Benchtests build and bench-memcmp succeeding.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>

Define __STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__

TS 18661-1 and C2X specify predefined macros __STDC_IEC_60559_BFP__
and __STDC_IEC_60559_COMPLEX__, making __STDC_IEC_559__ and
__STDC_IEC_559_COMPLEX__ obsolescent (but still included in the
standard).  Now that we have all the functions from TS 18661-1, define
these macros in stdc-predef.h, under the same conditions in which the
older macros are defined, since support for the floating-point
features in TS 18661-1 is now at the same level as that for those in
C11 and before (all library functions and other library APIs present,
but no standard pragma support).

The macros are defined for now with their TS 18661-1 values.  C2X will
give them new values (listed as yyyymmL in the working drafts until
the final standard), at which point there will be the question of what
value to use in stdc-predef.h (where it could depend on
__STDC_VERSION__, but not on feature test macros defined by the user).
My inclination then would be to use the C2X value unconditionally
rather than using an older value to indicate TS support, and only have
any C standard version conditionals for the value when subsequent C
standard versions define further values.

(Note that I'm also inclined, when we implement the C2X change to the
return types of fromfp functions, to make that change unconditional
much like the change made to the types of totalorder functions, with
the old version only supported with compat symbols for already-linked
programs and not as an API for newly built objects.  So using the C2X
value would also accurately reflect not supporting the versions of
APIs in the TS where those ended up being incompatible with the first
version actually added to the standard.)

Tested for x86_64.

build-many-glibcs.py: add powerpc64le glibc variant without multiarch

This configuration tests the float128 to ldouble128 redirect support
on powerpc64le without the extra wrappers needed to support ifunc
on this target.

Fix sysdeps/x86/fpu/s_ffma.c for 32-bit FMA processor case

It turns out the __SSE2_MATH__ conditional in sysdeps/x86/fpu/s_ffma.c
does not cover all cases where the x86 fenv_private.h macros might
manipulate one of the SSE and 387 floating-point state, while the
actual fma implementation uses the other.  Specifically, in the 32-bit
case, with a compiler not defaulting to -mfpmath=sse, but testing on a
processor with hardware FMA support, the multiarch fma function
implementations will end up using SSE, while the fenv_private.h macros
will use the 387 state for double.  Change the conditional to use the
default macros rather than the optimized ones in all cases except when
the compiler inlines an fma instruction (in which case, since all
those instructions are SSE instructions and -mfpmath=sse must be in
effect for them to be inlined, the optimized macros will only use the
SSE state and it's OK for them to only use the SSE state).

Tested for x86_64 and x86.  H.J. reports in
<https://sourceware.org/pipermail/libc-alpha/2021-September/131367.html>
that it fixes the problems he observed.

Linux: Avoid closing -1 on failure in __closefrom_fallback

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

i386: Port elf_machine_{load_address,dynamic} from x86-64

This drops reliance on _GLOBAL_OFFSET_TABLE_[0] being the link-time
address of _DYNAMIC.

The code sequence length does not change.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

aarch64: Disable A64FX memcpy/memmove BTI unconditionally

This patch disables A64FX memcpy/memmove BTI instruction insertion
unconditionally such as A64FX memset patch [1] for performance.

[1] commit 07b427296b8d59f439144029d9a948f6c1ce0a31

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

xsysconf: Only fail on error results and errno set

When testing nptl/tst-pthread-attr-affinity-fail fails with:

    error: xsysconf.c:33: sysconf (83): Cannot allocate memory
    error: 1 test failures

This happens as xsysconf checks the errno after running sysconf.
Internally the sysconf request for _SC_NPROCESSORS_CONF on linux
allocates memory.  But there is a problem, even though malloc succeeds
errno is getting set to ENOMEM.

POSIX allows successful calls to clobber errno.  So xsysconf just
checking errno is wrong.  Fix xsysconf by only failing if we have an
error result and errno is set.

powerpc64le: Avoid conflicting types for f64xfmaf128 when IFUNC is not used

Avoid defining f64xfmaf128 twice when building s_fmaf128.c.
This can be reproduced on powerpc64le whenever f128 functions do not
have IFUNC enabled, e.g. using "--with-cpu=power8 --disable-multi-arch", or
when using "-with-cpu=power9".

Fixes: b3f27d8150d4f ("Add narrowing fma functions")

Fix ffma use of round-to-odd on x86

On 32-bit x86 with -mfpmath=sse, and on x86_64 with
--disable-multi-arch, the tests of ffma and its aliases (fma narrowing
from binary64 to binary32) fail.  This is probably the issue reported
by H.J. in
<https://sourceware.org/pipermail/libc-alpha/2021-September/131277.html>.

The problem is the use of fenv_private.h macros in the round-to-odd
implementation.  Those macros are set up to manipulate only one of the
SSE and 387 floating-point state, whichever is relevant for the type
indicated by the suffix on the macro name.  But x86 configurations
sometimes use the ldbl-96 implementation of binary64 fma (that's where
--disable-multi-arch is relevant for x86_64: it causes the ldbl-96
implementation to be used, instead of an IFUNC implementation that
falls back to the dbl-64 version), contrary to the expectations of
those macros for functions operating on double when __SSE2_MATH__ is
defined.

This can be addressed by using the default versions of those macros
(giving x86 its own version of s_ffma.c), as is done for the *f128
macro variants where it depends on the details of how GCC was
configured when building libgcc which floating-point state is affected
by _Float128 arithmetic.  The issue only applies when __SSE2_MATH__ is
defined, and doesn't apply when __FP_FAST_FMA is defined (because in
that case, fma will be inlined by the compiler, meaning it's
definitely an SSE operation; for the same reason, this is not an issue
for narrowing sqrt, as hardware sqrt is always inlined in that
implementation for x86), but in other cases it's safest to use the
default versions of the fenv_private.h macros to ensure things work
whichever fma implementation is used.

Tested for x86_64 (with and without --disable-multi-arch) and x86
(with and without -mfpmath=sse).

vfprintf: Unify argument handling in process_arg

Instead of checking a pointer argument for NULL, use helper macros
defined differently in the non-positional and positional cases.
This avoids frequent conditional checks and a GCC 12 warning
about comparing pointers against NULL which cannot be NULL.

vfprintf: Handle floating-point cases outside of process_arg macro

A lot of the code is unique to the positional and non-positional
code. Also unify the decimal and hexadecimal cases via the new
helper function __printf_fp_spec.

nptl: Avoid setxid deadlock with blocked signals in thread exit [BZ #28361]

As part of the fix for bug 12889, signals are blocked during
thread exit, so that application code cannot run on the thread that
is about to exit.  This would cause problems if the application
expected signals to be delivered after the signal handler revealed
the thread to still exist, despite pthread_kill can no longer be used
to send signals to it.  However, glibc internally uses the SIGSETXID
signal in a way that is incompatible with signal blocking, due to the
way the setxid handshake delays thread exit until the setxid operation
has completed.  With a blocked SIGSETXID, the handshake can never
complete, causing a deadlock.

As a band-aid, restore the previous handshake protocol by not blocking
SIGSETXID during thread exit.

The new test sysdeps/pthread/tst-pthread-setuid-loop.c is based on
a downstream test by Martin Osvald.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>