Florian Weimer [Tue, 5 Jul 2022 07:05:22 +0000 (09:05 +0200)]
locale: Fix signed char bug in lr_getc
The array lr->buf contains characters, which can be signed. A 0xff
byte in the input could be incorrectly reported as EOF. More
importantly, get_string in linereader.c converts a signed input byte
to a Unicode code point using ADDWC ((uint32_t) ch), under the
assumption that this decodes the ISO-8859-1 input encoding. If char
is signed, this does not give the correct result. This means that
ISO-8859-1 input files for localedef are not actually supported,
contrary to the comment in get_string. This is a happy accident because
we can therefore change the file encoding to UTF-8 without impacting
backwards compatibility.
While at it, remove the \32 check for MS-DOS end-of-file character (^Z).
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
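The sign-extension problem described above can be reproduced in a few lines. This is a minimal sketch, not the localedef code: the helper names are hypothetical, and `signed char` is used explicitly so the result does not depend on the platform's default char signedness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts the byte to int directly, so a 0xff
   byte is sign-extended to a negative value that can be mistaken for
   an end-of-file marker.  */
int
read_byte_sign_extended (const signed char *buf, size_t pos)
{
  return buf[pos];
}

/* Hypothetical helper: converts through unsigned char first, so every
   byte maps to 0..255 and never collides with a negative EOF.  */
int
read_byte_unsigned (const signed char *buf, size_t pos)
{
  return (unsigned char) buf[pos];
}

/* Widening a signed byte straight to uint32_t does not yield the
   ISO-8859-1 code point: 0xff becomes 0xffffffff, not 0xff.  */
uint32_t
widen_signed_byte (signed char ch)
{
  return (uint32_t) ch;
}
```

Going through unsigned char before any widening conversion is the usual way to keep byte values in the 0..255 range.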
Florian Weimer [Tue, 5 Jul 2022 07:05:22 +0000 (09:05 +0200)]
locale: Turn ADDC and ADDS into functions in linereader.c
And introduce struct lr_buffer. The functions addc and adds can
be called from functions, enabling subsequent refactoring.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Fangrui Song [Tue, 5 Jul 2022 04:15:51 +0000 (21:15 -0700)]
libc-symbols.h: remove unused macros
Besides weak_hidden_alias/declare_symbol_alias/hidden_data_ver, many
*_hidden_* macros are removed. If there is a rare need to use one, one
may write something like `#if IS_IN (libm)\nhidden_def (...)\n#endif`
instead.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Adhemerval Zanella [Mon, 4 Jul 2022 13:41:58 +0000 (10:41 -0300)]
Fix hurd namespace issues for internal signal functions
It was introduced by "Refactor internal-signals.h
(a1bdd81664aa681364d)". Use the internal symbols instead.
Checked with a build for i686-gnu.
Guilherme Janczak [Wed, 22 Jun 2022 14:42:39 +0000 (14:42 +0000)]
argp: Remove old includes in !_LIBC case
The headers mempcpy.h, strcase.h, strchrnul.h, and strndup.h are
included if not building argp for glibc. Commit
c5af724c0b214a517f8558887f7a70efcfa2c813 added them in 2003 for gnulib,
but gnulib's current master patches them out:
https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/argp-namefrob.h;h=9c82ac79c215540f986c3e04398edba3ba1b7234;hb=HEAD
Joseph Myers [Mon, 4 Jul 2022 13:15:58 +0000 (13:15 +0000)]
Use GCC 12 branch in build-many-glibcs.py
This patch makes build-many-glibcs.py use GCC 12 branch by default.
Tested with build-many-glibcs.py (host-libraries, compilers and glibcs
builds).
Adhemerval Zanella [Thu, 21 Apr 2022 12:41:59 +0000 (09:41 -0300)]
Refactor internal-signals.h
The main goal is to optimize the internal usage and required size
when sigset_t is embedded in other data structures. On Linux, the
currently supported signal set requires up to 8 bytes (16 on mips),
which is lower than the user-defined sigset_t (128 bytes).
A new internal type internal_sigset_t is added, along with
functions to operate on it similar to the ones for sigset_t. The
internal-signals.h header is also refactored to remove unused functions.
Besides reducing stack usage in some functions (posix_spawn, abort),
it shrinks struct pthread by about 120 bytes (112 on mips).
Checked on x86_64-linux-gnu.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
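The size argument above can be illustrated with a compact signal set. This is a sketch under assumptions: the type and function names are hypothetical and are not the definitions glibc uses for internal_sigset_t; it only shows how 64 signals fit in 8 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compact signal set: one 64-bit word covers signals
   1..64, so the whole object is 8 bytes.  */
typedef struct
{
  uint64_t bits;
} compact_sigset_t;

void
compact_sigemptyset (compact_sigset_t *set)
{
  set->bits = 0;
}

/* SIG must be in the range 1..64.  */
void
compact_sigaddset (compact_sigset_t *set, int sig)
{
  set->bits |= UINT64_C (1) << (sig - 1);
}

/* SIG must be in the range 1..64.  */
bool
compact_sigismember (const compact_sigset_t *set, int sig)
{
  return ((set->bits >> (sig - 1)) & 1) != 0;
}
```

Embedding such a type in a larger structure costs 8 bytes per set instead of the 128 bytes quoted for the user-visible sigset_t.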
Kito Cheng [Tue, 28 Jun 2022 13:52:19 +0000 (21:52 +0800)]
riscv: Use memcpy to handle unaligned access when fixing R_RISCV_RELATIVE
Although RISC-V Linux enables the unaligned memory access handler by
default, that handler is quite expensive in general. Using memcpy is much
cheaper: it just breaks the access down into several byte load/store
instructions.
ARM and MIPS have a similar issue:
ARM: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51456
MIPS: https://gcc.gnu.org/legacy-ml/gcc-help/2005-07/msg00325.html
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
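The memcpy technique mentioned above looks roughly like this. It is a generic sketch with hypothetical helper names, not the relocation code from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Store VALUE at DST, which may have any alignment.  memcpy makes no
   alignment assumption, so the compiler can emit byte accesses on
   targets where a misaligned word store would trap.  */
void
store_unaligned_word (void *dst, uintptr_t value)
{
  memcpy (dst, &value, sizeof value);
}

/* Load a pointer-sized value from SRC, which may have any alignment.  */
uintptr_t
load_unaligned_word (const void *src)
{
  uintptr_t value;
  memcpy (&value, src, sizeof value);
  return value;
}
```

On targets with cheap unaligned access, compilers typically turn these calls into a single load or store, so the portable form costs nothing there.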
Tejas Belagod [Mon, 27 Jun 2022 18:00:50 +0000 (18:00 +0000)]
AArch64: Add asymmetric faulting mode for tag violations in mem.tagging tunable
The new asymmetric mode is available when HWCAP2_MTE3 is set (support is
available), bit2 is set in the tunable (user request per application),
and the system is configured such that the asymmetric mode is preferred over
sync or async (per-cpu system-wide setting).
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
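The three conditions above combine as a simple conjunction. The sketch below uses placeholder flag values and a hypothetical function name. It assumes "bit2" means the tunable bit with value 4, and it takes the system preference as an already-computed boolean rather than showing how it is queried.

```c
#include <stdbool.h>

/* Placeholder values for this sketch only; real code would take
   HWCAP2_MTE3 from the kernel headers.  */
#define SKETCH_HWCAP2_MTE3    (1UL << 0)
#define SKETCH_TUNABLE_ASYMM  (1UL << 2)  /* assumed meaning of "bit2" */

/* Asymmetric mode is chosen only if the hardware supports it, the
   user requested it for this application, and the system prefers it
   over sync or async.  */
bool
asymmetric_mode_selected (unsigned long hwcap2,
                          unsigned long tagging_tunable,
                          bool system_prefers_asymmetric)
{
  return (hwcap2 & SKETCH_HWCAP2_MTE3) != 0
         && (tagging_tunable & SKETCH_TUNABLE_ASYMM) != 0
         && system_prefers_asymmetric;
}
```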
Adhemerval Zanella [Thu, 30 Jun 2022 12:08:31 +0000 (09:08 -0300)]
linux: Fix mq_timereceive check for 32 bit fallback code (BZ 29304)
On success, mq_receive() and mq_timedreceive() return the number of
bytes in the received message, so the code must check whether the value
is larger than 0.
Checked on i686-linux-gnu.
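The shape of such a fallback check can be sketched as follows. The predicate name is hypothetical and this is not the glibc source. The sketch treats any non-negative return value as success, on the assumption that a zero-length message is also a valid result.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical predicate: the result of the first syscall attempt is
   final if it succeeded (a byte count, not just 0) or if it failed
   for any reason other than the syscall being unavailable.  Only a
   failure with ENOSYS should fall through to the 32-bit fallback.  */
bool
first_attempt_is_final (ssize_t ret, int saved_errno)
{
  return ret >= 0 || saved_errno != ENOSYS;
}
```

A check that only accepts a return value of exactly 0 would wrongly retry after every successful receive of a non-empty message.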
Noah Goldstein [Thu, 30 Jun 2022 01:56:18 +0000 (18:56 -0700)]
x86: Add missing IS_IN (libc) check to strncmp-sse4_2.S
The check was missing, so for the multiarch build rtld-strncmp-sse4_2.os was
being built and exporting symbols:
build/glibc/string/rtld-strncmp-sse4_2.os:
0000000000000000 T __strncmp_sse42
Introduced in:
commit 11ffcacb64a939c10cfc713746b8ec88837f5c4a
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Jun 21 12:10:50 2017 -0700
x86-64: Implement strcmp family IFUNC selectors in C
Noah Goldstein [Thu, 30 Jun 2022 01:56:17 +0000 (18:56 -0700)]
x86: Add missing IS_IN (libc) check to strcspn-sse4.c
The check was missing, so for the multiarch build rtld-strcspn-sse4.os was
being built and exporting symbols:
build/glibc/string/rtld-strcspn-sse4.os:
U ___m128i_shift_right
U __strcspn_generic
0000000000000000 T __strcspn_sse42
U strlen
build/glibc/string/rtld-varshift.os:
0000000000000000 R ___m128i_shift_right
Introduced in:
commit 06e51c8f3de38761f8855700841bc49cf495c8c0
Author: H.J. Lu <hongjiu.lu@intel.com>
Date: Fri Jul 3 02:48:56 2009 -0700
Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64.
Noah Goldstein [Thu, 30 Jun 2022 01:56:16 +0000 (18:56 -0700)]
x86: Add missing IS_IN (libc) check to memmove-ssse3.S
The check was missing, so for the multiarch build rtld-memmove-ssse3.os was
being built and exporting symbols:
>$ nm string/rtld-memmove-ssse3.os
U __GI___chk_fail
0000000000000020 T __memcpy_chk_ssse3
0000000000000040 T __memcpy_ssse3
0000000000000020 T __memmove_chk_ssse3
0000000000000040 T __memmove_ssse3
0000000000000000 T __mempcpy_chk_ssse3
0000000000000010 T __mempcpy_ssse3
U __x86_shared_cache_size_half
Introduced after 2.35 in:
commit 26b2478322db94edc9e0e8f577b2f71d291e5acb
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Thu Apr 14 11:47:40 2022 -0500
x86: Reduce code size of mem{move|pcpy|cpy}-ssse3
H.J. Lu [Wed, 29 Jun 2022 20:42:06 +0000 (13:42 -0700)]
x86-64: Properly indent X86_IFUNC_IMPL_ADD_VN arguments
Properly indent X86_IFUNC_IMPL_ADD_VN arguments for memchr, rawmemchr
and wmemchr.
Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Tue, 28 Jun 2022 15:26:26 +0000 (08:26 -0700)]
x86-64: Small improvements to dl-trampoline.S
1. Remove sse2 instructions when using the avx512 or avx version.
2. Fixup some format nits in how the address offsets were aligned.
3. Use more space efficient instructions in the conditional AVX
restoral.
- vpcmpeqq -> vpcmpeqb
- cmp imm32, r; jz -> inc r; jz
4. Use `rep movsb` instead of `rep movsq`. The former is guaranteed to
be fast with the ERMS flags, the latter is not. The latter also
wastes an instruction in size setup.
Noah Goldstein [Wed, 29 Jun 2022 23:07:15 +0000 (16:07 -0700)]
x86: Move mem{p}{mov|cpy}_{chk_}erms to its own file
The primary memmove_{impl}_unaligned_erms implementations don't
interact with this function. Putting them in same file both
wastes space and unnecessarily bloats a hot code section.
Noah Goldstein [Wed, 29 Jun 2022 23:07:05 +0000 (16:07 -0700)]
x86: Move and slightly improve memset_erms
Implementation wise:
1. Remove the VZEROUPPER as memset_{impl}_unaligned_erms does not
use the L(stosb) label that was previously defined.
2. Don't give the hotpath (fallthrough) to zero size.
Code positioning wise:
Move memset_{chk}_erms to its own file. Leaving it in between the
memset_{impl}_unaligned both adds unnecessary complexity to the
file and wastes space in a relatively hot cache section.
Noah Goldstein [Wed, 29 Jun 2022 23:07:04 +0000 (16:07 -0700)]
x86: Add definition for __wmemset_chk AVX2 RTM in ifunc impl list
This was simply missing and meant we weren't testing it properly.
Arjun Shankar [Wed, 29 Jun 2022 22:37:34 +0000 (00:37 +0200)]
linux: Remove unnecessary nice.c and signal.c
These files simply include the sysdeps/posix implementations which would
be used even in the absence of the files. They have been unnecessary
since 7b17aeda0c5e, when nice and signal were removed from the
syscalls.list file.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Adhemerval Zanella [Thu, 21 Apr 2022 12:44:48 +0000 (09:44 -0300)]
nptl: Remove unused members from struct pthread
It removes both pid_ununsed and cpuclock_offset_ununsed, saving about
12 bytes from struct pthread.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
Florian Weimer [Wed, 29 Jun 2022 11:50:50 +0000 (13:50 +0200)]
Linux: Forward declaration of struct iovec for process_madvise
This maintains compatibility between <sys/mman.h> and <linux/uio.h>.
Before that, the addition of process_madvise made those two header
files incompatible. This has been observed resulting in a build
failure in LLDB's Process/Linux/NativeRegisterContextLinux_s390x.cpp
source file.
Fixes commit d19ee3473d68ca0e794f3a8b7677a0983ae1342e
("linux: Add process_madvise").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Noah Goldstein [Tue, 28 Jun 2022 04:07:03 +0000 (21:07 -0700)]
x86: Add more feature definitions to isa-level.h
This commit doesn't change anything in itself. It is just to add
definitions that will be needed by future patches.
Florian Weimer [Tue, 28 Jun 2022 08:40:16 +0000 (10:40 +0200)]
elf: Fix -DNDEBUG warning in _dl_start_args_adjust
This is another blocker for building glibc with the default
-Werror setting and -DNDEBUG.
Yang Yanchao [Fri, 15 Apr 2022 09:25:05 +0000 (17:25 +0800)]
elf: Fix compile error with -Werror and -DNDEBUG
Using -Werror and -DNDEBUG at the same time will trigger the
following compiler error:
cache.c: In function 'save_cache':
cache.c:758:15: error: unused variable 'old_offset' [-Werror=unused-variable]
758 | off64_t old_offset = lseek64 (fd, extension_offset, SEEK_SET);
| ^~~~~~~~~~
-DNDEBUG disables the assertion, making old_offset unused.
Use __attribute__ ((unused)) to disable this warning.
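The pattern is easy to reproduce outside glibc. In the sketch below the function and variable names are hypothetical; only the use of `__attribute__ ((unused))` on a variable that is referenced solely by an assertion mirrors the fix described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advances *POS by N and returns the old value.  */
static size_t
advance (size_t *pos, size_t n)
{
  size_t old = *pos;
  *pos += n;
  return old;
}

size_t
skip_header (size_t *pos)
{
  /* With -DNDEBUG the assert below expands to nothing, leaving
     old_pos unreferenced.  The attribute keeps -Werror builds from
     failing on -Wunused-variable in that configuration.  */
  size_t old_pos __attribute__ ((unused)) = advance (pos, 16);
  assert (old_pos % 16 == 0);
  return *pos;
}
```

The call with the side effect stays outside the assert, so the behavior is identical with and without -DNDEBUG.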
H.J. Lu [Mon, 27 Jun 2022 18:36:28 +0000 (11:36 -0700)]
x86-64: Only define used SSE/AVX/AVX512 run-time resolvers
When glibc is built with x86-64 ISA level v3, SSE run-time resolvers
aren't used. For x86-64 ISA level v4 build, both SSE and AVX resolvers
are unused. Check the minimum x86-64 ISA level to exclude the unused
run-time resolvers.
H.J. Lu [Mon, 27 Jun 2022 19:52:58 +0000 (12:52 -0700)]
x86: Move CPU_FEATURE{S}_{USABLE|ARCH}_P to isa-level.h
Move X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P to
where MINIMUM_X86_ISA_LEVEL and XXX_X86_ISA_LEVEL are defined.
Noah Goldstein [Fri, 24 Jun 2022 23:15:42 +0000 (16:15 -0700)]
x86: Fix backwards Prefer_No_VZEROUPPER check in ifunc-evex.h
Add third argument to X86_ISA_CPU_FEATURES_ARCH_P macro so the runtime
CPU_FEATURES_ARCH_P check can be inverted if the
MINIMUM_X86_ISA_LEVEL is not high enough to constantly evaluate
the check.
Use this new macro to correct the backwards check in ifunc-evex.h
Noah Goldstein [Fri, 24 Jun 2022 16:42:13 +0000 (09:42 -0700)]
x86: Rename strstr_sse2 to strstr_generic as it uses string/strstr.c
This is in accordance with other files in the multiarch directory.
Noah Goldstein [Fri, 24 Jun 2022 16:42:14 +0000 (09:42 -0700)]
x86: Remove unused file wmemcmp-sse4
The memcmp-sse4 was removed in:
commit 7cbc03d03091d5664060924789afe46d30a5477e
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Fri Apr 15 12:28:00 2022 -0500
x86: Remove memcmp-sse4.S
so this file does nothing.
Noah Goldstein [Fri, 24 Jun 2022 16:42:15 +0000 (09:42 -0700)]
x86: Put wcs{n}len-sse4.1 in the sse4.1 text section
Previously this was missing, but the two implementations shouldn't end up
in the sse2 (generic) text section.
Noah Goldstein [Fri, 24 Jun 2022 16:42:12 +0000 (09:42 -0700)]
x86: Align entry for memrchr to 64-bytes.
The function was tuned around 64-byte entry alignment and performs
better for all sizes with it.
As well, different code paths were explicitly written to touch the
minimum number of cache lines, i.e. sizes <= 32 touch only the entry
cache line.
Fangrui Song [Sun, 26 Jun 2022 22:31:19 +0000 (15:31 -0700)]
Makerules: Remove no-op -Wl,-d when linking libc_pic.os
In GNU ld, -d assigns space to common symbols for -r (i.e. change common
symbols to STB_GLOBAL definitions). This option was added in commit
da2d1bc5adf49352232ad0514e79fbd5dcae08e8 (1998) perhaps because ld at
that time had a bug that common symbols did not override shared object
definitions. -d has been long unneeded and more so since -fno-common
was added to +cflags.
Andreas Schwab [Fri, 24 Jun 2022 19:24:40 +0000 (21:24 +0200)]
m68k: optimize RTLD_START
Adhemerval Zanella [Tue, 7 Jun 2022 14:11:03 +0000 (11:11 -0300)]
misc: Optimize internal usage of __libc_single_threaded
By adding an internal alias to avoid the GOT indirection.
On some architectures, __libc_single_threaded may be accessed through
copy relocations, and thus the copy needs to be updated as well as
the default one.
This is done by adding a new internal macro,
libc_hidden_data_{proto,def}, which has an additional argument that
specifies the alias name (instead of the default __GI_ one).
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Fangrui Song <maskray@google.com>
Adhemerval Zanella [Mon, 31 Jan 2022 13:04:57 +0000 (10:04 -0300)]
linux: Add move_mount
It was added in Linux 5.2 (2db154b3ea8e14b04fee23e3fdfd5e9d17fbc6ae)
as a way to move a mount from one place to another and, in the next
commit, to allow attaching an unattached mount tree.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Adhemerval Zanella [Mon, 31 Jan 2022 12:46:24 +0000 (09:46 -0300)]
linux: Add fsmount
It was added in Linux 5.2 (93766fbd2696c2c4453dd8e1070977e9cd4e6b6d) to
provide a way by which a filesystem opened with fsopen and configured
by a series of fsconfig calls can have a detached mount object
created for it.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Adhemerval Zanella [Mon, 31 Jan 2022 12:45:12 +0000 (09:45 -0300)]
linux: Add fsopen
It was added in Linux 5.2 (24dcb3d90a1f67fe08c68a004af37df059d74005)
to start the process of preparing to create a superblock that will
then be mountable, using an fd as a context handle.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Florian Weimer [Fri, 24 Jun 2022 17:38:14 +0000 (19:38 +0200)]
resolv/tst-resolv-noaaaa: Support building for older C standards
This avoids a compilation error:
tst-resolv-noaaaa.c: In function 'response':
tst-resolv-noaaaa.c:74:11: error: a label can only be part of a statement and a declaration is not a statement
char ipv4[4] = {192, 0, 2, i + 1};
^~~~
tst-resolv-noaaaa.c:79:11: error: a label can only be part of a statement and a declaration is not a statement
char *name = xasprintf ("ptr-%d", i);
^~~~
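One common way to satisfy older C standards is to give the label a compound statement to attach to. The sketch below uses a hypothetical function and is not the change made to tst-resolv-noaaaa.c; it only shows the construct.

```c
/* Hypothetical example: returns the last octet built for KIND 1,
   or -1 for any other kind.  */
int
last_octet (int kind, int i)
{
  switch (kind)
    {
    case 1:
      {
        /* The braces open a compound statement, so the case label is
           attached to a statement and the declaration inside it is
           accepted by older C standards as well.  */
        unsigned char ipv4[4] = { 192, 0, 2, i + 1 };
        return ipv4[3];
      }
    default:
      return -1;
    }
}
```

Writing `case 1:;` with an empty statement after the label is another option, but the braces also limit the scope of the declared variable.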
Florian Weimer [Fri, 24 Jun 2022 16:16:41 +0000 (18:16 +0200)]
resolv: Implement no-aaaa stub resolver option
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Florian Weimer [Fri, 24 Jun 2022 16:16:41 +0000 (18:16 +0200)]
support: Change non-address output format of support_format_dns_packet
It makes sense to include the owner name (LHS) and record type in the
output, so that they can be checked for correctness.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Kito Cheng [Thu, 23 Jun 2022 15:47:04 +0000 (23:47 +0800)]
riscv: Use elf_machine_rela_relative to handle R_RISCV_RELATIVE
Minor clean-up: this part needs to change in a following patch, so clean it up
first to avoid making the same change twice.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Noah Goldstein [Thu, 23 Jun 2022 17:49:19 +0000 (10:49 -0700)]
x86: Remove faulty sanity tests for RTLD build with no multiarch
The sanity tests were meant to ensure that the default implementation
was only being built without multiarch, with the exception of the
multiarch/rtld-*.S files.
The code used IS_IN (rtld) to check whether the build was for a
multiarch/rtld-*.S file, which is incorrect as IS_IN (rtld) is set for
the non-multiarch build as well.
Noah Goldstein [Wed, 22 Jun 2022 23:34:42 +0000 (16:34 -0700)]
stdlib: Fixup mbstowcs NULL __dst handling. [BZ #29279]
commit 464d189b9622932a75302290625de84931656ec0 (origin/master, origin/HEAD)
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Jun 22 08:24:21 2022 -0700
stdlib: Remove attr_write from mbstows if dst is NULL [BZ: 29265]
That commit called `__mbstowcs_chk` in the NULL __dst case, which is
incorrect as in the NULL __dst case we are explicitly skipping
the objsize checks.
As well, remove the `__always_inline` attribute which exists in
`__fortify_function`.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Noah Goldstein [Mon, 20 Jun 2022 20:02:10 +0000 (13:02 -0700)]
x86: Replace all sse instructions with vex equivilent in avx+ files
Most of these don't really matter as there was no dirty upper state,
but we should generally avoid stray sse when it's not needed.
The one case that really matters is in svml_d_tanh4_core_avx2.S:
blendvps %xmm0, %xmm8, %xmm7
where there was a dirty upper state.
Tested on x86_64-linux
Noah Goldstein [Wed, 22 Jun 2022 23:51:20 +0000 (16:51 -0700)]
x86: Add support for compiling {raw|w}memchr with high ISA level
1. Refactor files so that all implementations live in the multiarch
directory.
- Essentially moved sse2 {raw|w}memchr.S implementation to
multiarch/{raw|w}memchr-sse2.S
- The non-multiarch {raw|w}memchr.S file now only includes one of
the implementations in the multiarch directory based on the
compiled ISA level (only used for non-multiarch builds.
Otherwise we go through the ifunc selector).
2. Add ISA level build guards to different implementations.
- I.e memchr-avx2.S which is ISA level 3 will only build if
compiled ISA level <= 3. Otherwise there is no reason to include
it as we will always use one of the ISA level 4
implementations (memchr-evex{-rtm}.S).
3. Add new multiarch/rtld-{raw}memchr.S that just include the
non-multiarch {raw}memchr.S which will in turn select the best
implementation based on the compiled ISA level.
4. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guaranteed replacement) to be skipped.
- Guaranteed replacement essentially means that for any ISA level
build there must be a function that the baseline of the ISA
supports. So for {raw|w}memchr.S since there is no ISA level 2
function, the ISA level 2 build still includes the ISA level
1 (sse2) function. Once we reach the ISA level 3 build, however,
{raw|w}memchr-avx2{-rtm}.S will always be sufficient so the ISA
level 1 implementation ({raw|w}memchr-sse2.S) will not be built.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
Noah Goldstein [Wed, 22 Jun 2022 23:51:19 +0000 (16:51 -0700)]
x86: Add defines / utilities for making ISA specific x86 builds
1. Factor out some of the ISA level defines in isa-level.c to
standalone header isa-level.h
2. Add new headers with ISA level dependent macros for handling
ifuncs.
Note, this commit does not change any code.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
Noah Goldstein [Wed, 22 Jun 2022 15:24:21 +0000 (08:24 -0700)]
stdlib: Remove attr_write from mbstows if dst is NULL [BZ: 29265]
mbstowcs is defined if dst is NULL and is special-cased in that
situation, so the fortify objsize check is incorrect in that case.
Tested on x86-64 linux.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
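The NULL destination case is the standard way to ask mbstowcs how many wide characters a string needs. The wrapper name below is hypothetical; the call itself is the documented C library interface.

```c
#include <stdlib.h>

/* Returns the number of wide characters needed to hold S, not
   counting the terminating null, or (size_t) -1 if S contains an
   invalid multibyte sequence.  With a NULL destination nothing is
   written, so there is no destination object whose size could be
   checked.  */
size_t
wide_length (const char *s)
{
  return mbstowcs (NULL, s, 0);
}
```

A caller would typically allocate `wide_length (s) + 1` wide characters and then call mbstowcs again with the real buffer.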
Noah Goldstein [Wed, 22 Jun 2022 17:53:33 +0000 (10:53 -0700)]
stdlib: Remove trailing whitespace from Makefile
This causes precommit tests to fail when pushing commits that modify
this file.
Andreas Schwab [Wed, 22 Jun 2022 11:16:30 +0000 (13:16 +0200)]
debug: make __read_chk a cancellation point (bug 29274)
The __read_chk function, as the implementation behind the fortified read
function, must be a cancellation point, thus it cannot use INLINE_SYSCALL.
Sam James [Thu, 9 Jun 2022 02:56:23 +0000 (03:56 +0100)]
s390: use LC_ALL=C for readelf call
Let's use LC_ALL=C as we do elsewhere for consistency.
Tested on s390x-ibm-linux-gnu.
See: 72bd208846535725ea28b8173e79ef60e57a968c
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
Sam James [Thu, 9 Jun 2022 02:56:22 +0000 (03:56 +0100)]
s390: use $READELF
We already check for it in root configure.ac with AC_CHECK_TOOL. Let's
use the result.
Tested on s390x-ibm-linux-gnu.
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
Noah Goldstein [Fri, 17 Jun 2022 18:18:32 +0000 (11:18 -0700)]
i386: Fix include paths for strspn, strcspn, and strpbrk
commit c22eb807b0c8125101f6a274795425be2bbd0386
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Thu Jun 16 15:07:12 2022 -0700
x86: Rename generic functions with unique postfix for clarity
Changed the names of the strspn-c, strcspn-c, and strpbrk-c files
in a general refactor. It didn't change the include paths for the
i386 files breaking the i386 build. This commit fixes that.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
H.J. Lu [Fri, 17 Jun 2022 01:52:02 +0000 (18:52 -0700)]
elf: Silence GCC 11/12 false positive warning
Silence GCC 11/12 false positive warning with -mavx512f on dl-load.c:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106008
$ gcc -O2 -fPIC -march=x86-64 -mavx512f -S -Wall ...
dl-load.c: In function ‘_dl_map_object_from_fd.constprop’:
dl-load.c:1158:30: warning: ‘(((char *)loadcmds.113_68 + _933 + 16))[329406144173384849].mapend’ may be used uninitialized [-Wmaybe-uninitialized]
Noah Goldstein [Thu, 16 Jun 2022 22:07:12 +0000 (15:07 -0700)]
x86: Rename generic functions with unique postfix for clarity
No functions are changed. It just renames generic implementations from
'{func}_sse2' to '{func}_generic'. This is just because the postfix
"_sse2" was overloaded and was used for files that had hand-optimized
sse2 assembly implementations and files that just redirected back
to the generic implementation.
Full xcheck passed on x86_64.
Noah Goldstein [Thu, 16 Jun 2022 22:01:08 +0000 (15:01 -0700)]
x86: Add BMI1/BMI2 checks for ISA_V3 check
BMI1/BMI2 are part of the ISA V3 requirements:
https://en.wikipedia.org/wiki/X86-64
And defined by GCC when building with `-march=x86-64-v3`
Fangrui Song [Thu, 16 Jun 2022 18:48:15 +0000 (11:48 -0700)]
x86-64: Handle fewer relocation types for RTLD_BOOTSTRAP
The RTLD_BOOTSTRAP branch is used to relocate ld.so itself. It only
needs to handle RELATIVE, GLOB_DAT, and JUMP_SLOT. RELATIVE has been
handled (by _ELF_DYNAMIC_DO_RELOC due to DT_RELACOUNT, or RELR), so the
switch statement only needs to handle GLOB_DAT and JUMP_SLOT.
We can drop these `#if[n]def RTLD_BOOTSTRAP` and add a large
`# ifndef RTLD_BOOTSTRAP` instead.
Fangrui Song [Thu, 16 Jun 2022 02:21:53 +0000 (19:21 -0700)]
aarch64: Handle fewer relocations for RTLD_BOOTSTRAP
The RTLD_BOOTSTRAP branch is used to relocate ld.so itself. It only
needs to handle RELATIVE, GLOB_DAT, and JUMP_SLOT.
TLSDESC/TLS_DTPMOD/TLS_DTPREL handling can be removed. Remove
`case AARCH64_R(RELATIVE)` as well as elf_machine_rela has checked it.
Tested on aarch64-linux-gnu.
Fangrui Song [Thu, 16 Jun 2022 01:42:03 +0000 (18:42 -0700)]
riscv: Change the relocations handled for RTLD_BOOTSTRAP
The RTLD_BOOTSTRAP branch is used to relocate ld.so itself. It only
needs to handle RELATIVE, GLOB_DAT, and the symbolic relocation type
(R_RISCV_{32,64}). NONE and IRELATIVE can be removed.
The code relies on ld.so having DT_RELACOUNT so that the RTLD_BOOTSTRAP
branch does not need handle RELATIVE. Drop this minor size
optimization for clarity.
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Noah Goldstein [Wed, 15 Jun 2022 17:41:28 +0000 (10:41 -0700)]
x86: Cleanup bounds checking in large memcpy case
1. Fix incorrect lower-bound threshold in L(large_memcpy_2x).
Previously was using `__x86_rep_movsb_threshold` and should
have been using `__x86_shared_non_temporal_threshold`.
2. Avoid reloading __x86_shared_non_temporal_threshold before
the L(large_memcpy_4x) bounds check.
3. Document the second bounds check for L(large_memcpy_4x)
more clearly.
Noah Goldstein [Wed, 15 Jun 2022 17:41:29 +0000 (10:41 -0700)]
x86: Add bounds `x86_non_temporal_threshold`
The lower-bound (16448) and upper-bound (SIZE_MAX / 16) are assumed
by memmove-vec-unaligned-erms.
The lower-bound is needed because memmove-vec-unaligned-erms unrolls
the loop aggressively in the L(large_memcpy_4x) case.
The upper-bound is needed because memmove-vec-unaligned-erms
left-shifts the value of `x86_non_temporal_threshold` by
LOG_4X_MEMCPY_THRESH (4) which without a bound may overflow.
The lack of lower-bound can be a correctness issue. The lack of
upper-bound cannot.
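The two bounds can be expressed as a small range check. This is a sketch under assumptions: the macro and function names other than LOG_4X_MEMCPY_THRESH are hypothetical, the shift is assumed to multiply the threshold by 16, and how glibc treats an out-of-range tunable value is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NT_THRESHOLD_MIN ((size_t) 16448)  /* lower bound quoted above */
#define NT_THRESHOLD_MAX (SIZE_MAX / 16)   /* upper bound quoted above */
#define LOG_4X_MEMCPY_THRESH 4

/* True if THRESHOLD lies within the range the copy loop assumes.  */
bool
nt_threshold_in_bounds (size_t threshold)
{
  return threshold >= NT_THRESHOLD_MIN && threshold <= NT_THRESHOLD_MAX;
}

/* Scale the threshold by 16.  Because the input is at most
   SIZE_MAX / 16, the shifted result still fits in size_t.  */
size_t
scaled_4x_threshold (size_t threshold)
{
  assert (nt_threshold_in_bounds (threshold));
  return threshold << LOG_4X_MEMCPY_THRESH;
}
```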
Fangrui Song [Wed, 15 Jun 2022 20:02:17 +0000 (13:02 -0700)]
Remove remnant reference to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA
This fixes the nios2 build after commit
de38b2a343e6d64b95c50004943d6107a9e380d0.
Fangrui Song [Wed, 15 Jun 2022 18:29:55 +0000 (11:29 -0700)]
elf: Remove ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA
If an executable has copy relocations for extern protected data, that
can only work if the library containing the definition is built with
assumptions (a) the compiler emits GOT-generating relocations (b) the
linker produces R_*_GLOB_DAT instead of R_*_RELATIVE. Otherwise the
library uses its own definition directly and the executable accesses a
stale copy. Note: the GOT relocations defeat the purpose of protected
visibility as an optimization, but allow rtld to make the executable and
library use the same copy when copy relocations are present, but it
turns out this never worked perfectly.
ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA has strange semantics when both
a.so and b.so define protected var and the executable copy relocates
var: b.so accesses its own copy even with GLOB_DAT. The behavior change
is from commit 62da1e3b00b51383ffa7efc89d8addda0502e107 (x86) and then
copied to nios2 (ae5eae7cfc9c4a8297ff82ec6b794faca1976ecc) and arc
(0e7d930c4c11de896fe807f67fa1eb756c9c1e05).
Without ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA, b.so accesses the copy
relocated data like a.so.
There is now a warning for copy relocation on protected symbol since
commit 7374c02b683b7110b853a32496a619410364d70b. It's extremely
unlikely anyone relies on the ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA
behavior, so let's remove it: this removes a check in the symbol lookup
code.
Noah Goldstein [Tue, 14 Jun 2022 22:37:28 +0000 (15:37 -0700)]
x86: Add sse42 implementation to strcmp's ifunc
This has been missing since the ifuncs were added.
The performance of SSE4.2 is preferable to SSE2.
Measured on Tigerlake with N = 20 runs.
Geometric Mean of all benchmarks SSE4.2 / SSE2: 0.906
Noah Goldstein [Tue, 14 Jun 2022 20:50:11 +0000 (13:50 -0700)]
x86: Fix misordered logic for setting `rep_movsb_stop_threshold`
Move the setting of `rep_movsb_stop_threshold` to after the tunables
have been collected so that the `rep_movsb_stop_threshold` (which
is used to redirect control flow to the non_temporal case) will
use any user value for `non_temporal_threshold` (set using
glibc.cpu.x86_non_temporal_threshold)
Fangrui Song [Tue, 14 Jun 2022 20:07:27 +0000 (13:07 -0700)]
elf: Refine direct extern access diagnostics to protected symbol
Refine commit 349b0441dab375099b1d7f6909c1742286a67da9:
1. Copy relocations for extern protected data do not work properly,
regardless whether GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS is used.
It makes sense to produce a warning unconditionally.
2. Non-zero value of an undefined function symbol may break pointer
equality, but may be benign in many cases (many programs don't take the
address in the shared object then compare it with the address in the
executable). Reword the diagnostic to be clearer.
3. Remove the unneeded condition !(undef_map->l_1_needed &
GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS). If the executable does
not have GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (can only
occur in error cases), the diagnostic should be emitted as well.
When the defining shared object has
GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS, report an error to apply
the intended enforcement.
Stefan Liebler [Fri, 3 Jun 2022 12:52:51 +0000 (14:52 +0200)]
Avoid -Wstringop-overflow= warning in iconv module.
On s390x when compiling with GCC 12, I get this warning:
utf8-utf16-z9.c:
../iconv/loop.c: In function ‘__from_utf8_loop_etf3eh_single’:
../iconv/loop.c:445:22: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
445 | bytebuf[inlen++] = *inptr++;
| ~~~~~~~~~~~~~~~~~^~~~~~~~~~
../iconv/loop.c:381:17: note: at offset 4 into destination object ‘bytebuf’ of size 4
381 | unsigned char bytebuf[MAX_NEEDED_INPUT];
| ^~~~~~~
../iconv/loop.c:445:22: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
445 | bytebuf[inlen++] = *inptr++;
| ~~~~~~~~~~~~~~~~~^~~~~~~~~~
../iconv/loop.c:381:17: note: at offset 5 into destination object ‘bytebuf’ of size 4
381 | unsigned char bytebuf[MAX_NEEDED_INPUT];
| ^~~~~~~
This patch tells the compiler that inend is always behind inptr which
avoids the warning. Note that the SINGLE function is only used to
implement the mb*towc*() or wc*tomb*() functions. Those functions use
inptr and inend pointing to a variable on stack, compute the inend pointer
or explicitly check the arguments which always leads to inptr < inend.
Special notes for backporters (according to Siddhesh Poyarekar):
If someone wants to backport this patch to release branches, they should
also backport the following wcrtomb change. Otherwise the assumptions
made by this patch do not hold.
commit 9bcd12d223a8990254b65e2dada54faa5d2742f3
Author: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date: Fri May 13 19:10:15 2022 +0530
wcrtomb: Make behavior POSIX compliant
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Wilco Dijkstra [Fri, 10 Jun 2022 16:13:29 +0000 (17:13 +0100)]
Add bounds check to __libc_ifunc_impl_list
Add a proper bounds check to __libc_ifunc_impl_list. This makes MAX_IFUNC
redundant and fixes several targets that will write outside the array.
To avoid unnecessary large diffs, pass the maximum in the argument 'i' to
IFUNC_IMPL_ADD - 'max' can be used in new ifunc definitions and existing
ones can be updated if desired.
Passes buildmanyglibc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
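The bounds check described above amounts to refusing to write once the caller-supplied maximum is reached. The sketch below uses a hypothetical struct and function; it is not the IFUNC_IMPL_ADD macro, only an illustration of passing the maximum alongside the running index.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of an implementation list.  */
struct impl_entry
{
  const char *name;
  int usable;
};

/* Append an entry only while COUNT is below MAX, and return the new
   count.  Entries beyond MAX are dropped instead of being written
   outside ARRAY.  */
size_t
impl_list_add (struct impl_entry *array, size_t count, size_t max,
               const char *name, int usable)
{
  if (count < max)
    {
      array[count].name = name;
      array[count].usable = usable;
      count++;
    }
  return count;
}
```

With the limit carried in the call, a separate compile-time constant for the array size is no longer needed by the callers.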
Wilco Dijkstra [Fri, 10 Jun 2022 12:33:26 +0000 (13:33 +0100)]
libio: Avoid RMW of flags2 outside lock (BZ #27842)
Remove an unconditional RMW on flags2 in flockfile - we don't need to change
_IO_FLAGS2_NEED_LOCK since it isn't used in flockfile or funlockfile.
This fixes BZ #27842.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Noah Goldstein [Thu, 9 Jun 2022 16:58:35 +0000 (09:58 -0700)]
x86: Optimize svml_s_tanhf4_core_sse4.S
Optimizations are:
1. Reduce code size (-112 bytes).
2. Remove redundant move instructions.
3. Slightly improve instruction selection/scheduling where
possible.
4. Prefer registers which get short instruction encoding.
5. Reduce rodata size (-4k+ rodata is shared with avx2).
Result is roughly a 15-16% speedup:
Function, New Time, Old Time, New / Old
_ZGVbN4v_tanhf, 3.158, 3.749, 0.842
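
For readers checking the arithmetic in these tables: the quoted percentage appears to be the fraction of run time saved, i.e. 1 - New / Old. A minimal sketch under that assumption (`time_saved_percent` is a hypothetical helper, not part of the benchmark suite):

```c
/* Percentage of run time saved, given the new and old timings. */
double time_saved_percent(double new_time, double old_time)
{
    return (1.0 - new_time / old_time) * 100.0;
}
```

With the timings from the row above (3.158 and 3.749) this lands between 15 and 16 percent, which matches the stated range.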
Noah Goldstein [Thu, 9 Jun 2022 18:16:36 +0000 (11:16 -0700)]
x86: Optimize svml_s_tanhf8_core_avx2.S
Optimizations are:
1. Reduce code size (-81 bytes).
2. Remove redundant move instructions.
3. Slightly improve instruction selection/scheduling where
possible.
4. Prefer registers which get short instruction encoding.
5. Reduce rodata size (-32 bytes).
Result is roughly a 17-18% speedup:
Function, New Time, Old Time, New / Old
_ZGVdN8v_tanhf, 1.977, 2.402, 0.823
Noah Goldstein [Thu, 9 Jun 2022 16:58:33 +0000 (09:58 -0700)]
x86: Add data file that can be shared by tanhf-avx2 and tanhf-sse4
tanhf-avx2 and tanhf-sse4 use the same data tables so we can save
over 4kb using a shared datatable. This does increase the memory
footprint of the sse4 version (as now all the targets are 32 bytes
instead of 16), but generally it seems worth the code size savings.
NB: This patch doesn't do anything itself, it is setup for future
patches.
Noah Goldstein [Thu, 9 Jun 2022 16:58:32 +0000 (09:58 -0700)]
x86: Optimize svml_s_tanhf16_core_avx512.S
Optimizations are:
1. Reduce code size (-67 bytes).
2. Remove redundant move instructions.
3. Slightly improve instruction selection/scheduling where
possible.
4. Reduce rodata usage (-448 bytes).
Result is roughly a 14% speedup:
Function, New Time, Old Time, New / Old
_ZGVeN16v_tanhf, 0.649, 0.752, 0.863
Noah Goldstein [Thu, 9 Jun 2022 16:58:31 +0000 (09:58 -0700)]
x86: Improve svml_s_atanhf4_core_sse4.S
Improvements are:
1. Reduce code size (-62 bytes).
2. Remove redundant move instructions.
3. Slightly improve instruction selection/scheduling where
possible.
4. Prefer registers which get short instruction encoding.
5. Reduce rodata usage (-16 bytes).
The throughput improvement is not significant as the port 0 bottleneck
is unavoidable.
Function, New Time, Old Time, New / Old
_ZGVbN4v_atanhf, 8.821, 8.903, 0.991
Noah Goldstein [Thu, 9 Jun 2022 18:16:35 +0000 (11:16 -0700)]
x86: Improve svml_s_atanhf8_core_avx2.S
Improvements are:
1. Reduce code size (-60 bytes).
2. Remove redundant move instructions.
3. Slightly improve instruction selection/scheduling where
possible.
4. Prefer registers which get short instruction encoding.
5. Shrink rodata usage (-32 bytes).
The throughput improvement is not that significant (3-5%) as the
port 0 bottleneck is unavoidable.
Function, New Time, Old Time, New / Old
_ZGVdN8v_atanhf, 2.799, 2.923, 0.958
Noah Goldstein [Thu, 9 Jun 2022 18:16:34 +0000 (11:16 -0700)]
x86: Improve svml_s_atanhf16_core_avx512.S
Improvements are:
1. Reduce code size (-64 bytes).
2. Remove redundant move instructions.
3. Slightly improve instruction selection/scheduling where
possible.
4. Reduce rodata size ([-128, -188] bytes).
The throughput improvement is not significant as the port 0 bottleneck
is unavoidable.
Function, New Time, Old Time, New / Old
_ZGVeN16v_atanhf, 1.39, 1.408, 0.987
Noah Goldstein [Thu, 9 Jun 2022 04:16:51 +0000 (21:16 -0700)]
x86: Align varshift table to 32-bytes
This ensures the load will never split a cache line.
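
Why 32-byte alignment is enough can be checked with a small model. The sketch below assumes a 64-byte cache line and 16-byte loads at offsets 0..15 from the start of the table; both figures are assumptions for illustration and are not stated in the commit.

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Nonzero if the byte range [addr, addr + len) spans two cache lines. */
int splits_cache_line(uintptr_t addr, size_t len)
{
    return (addr / CACHE_LINE) != ((addr + len - 1) / CACHE_LINE);
}

/* Nonzero if any 16-byte load at offsets 0..15 from a table placed
   at address `base` would split a cache line. */
int any_load_splits(uintptr_t base)
{
    for (size_t off = 0; off < 16; off++)
        if (splits_cache_line(base + off, 16))
            return 1;
    return 0;
}
```

Under these assumptions every load stays inside one 32-byte block when the base is 32-byte aligned, while a base that is only 8- or 16-byte aligned can straddle a line.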
Noah Goldstein [Thu, 9 Jun 2022 00:27:59 +0000 (17:27 -0700)]
x86: Add copyright to strpbrk-c.c
Sam James [Sun, 5 Jun 2022 03:57:10 +0000 (04:57 +0100)]
nss: handle stat failure in check_reload_and_get (BZ #28752)
Skip the chroot test if the database isn't loaded
correctly (because the chroot test uses some
existing DB state).
The __stat64_time64 -> fstatat call can fail if
running under an (aggressive) seccomp filter,
like Firefox seems to use.
This manifested in a crash when using glib built
with FAM support with such a Firefox build.
Suggested-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
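
The general pattern behind this fix is to treat a failing stat call as a normal outcome rather than assuming it succeeds. A minimal sketch using the public POSIX stat interface (`try_stat` is a hypothetical helper; the real change is in check_reload_and_get and is not reproduced here):

```c
#include <sys/stat.h>
#include <stdbool.h>

/* Hypothetical helper: report whether `path` could be stat'ed and,
   if so, fill *out. Callers must not read *out when this returns
   false. */
bool try_stat(const char *path, struct stat *out)
{
    if (stat(path, out) != 0)
        return false;  /* may fail for reasons outside our control,
                          such as a sandbox denying the call */
    return true;
}
```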
Sam James [Sun, 5 Jun 2022 03:57:09 +0000 (04:57 +0100)]
nss: add assert to DB_LOOKUP_FCT (BZ #28752)
It's interesting if we have a null action list,
so an assert is worthwhile.
Suggested-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
Noah Goldstein [Wed, 8 Jun 2022 21:34:59 +0000 (14:34 -0700)]
x86: Fix page cross case in rawmemchr-avx2 [BZ #29234]
commit
6dcbb7d95dded20153b12d76d2f4e0ef0cda4f35
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Mon Jun 6 21:11:33 2022 -0700
x86: Shrink code size of memchr-avx2.S
Changed how the page cross case aligned string (rdi) in
rawmemchr. This was incompatible with how
`L(cross_page_continue)` expected the pointer to be aligned and
would cause rawmemchr to read data that started before the
beginning of the string. What it read was in valid memory
but could count CHAR matches, resulting in an incorrect return
value.
This commit fixes that issue by essentially reverting the changes to
the L(page_cross) case as they didn't really matter.
Test cases added and all pass with the new code (and were confirmed
to fail with the old code).
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
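
The class of bug described above can be modelled in scalar C: when a search reads a whole aligned block that begins before the string, matches in the bytes that precede the string must be discarded. The sketch below is an illustration under assumptions (a 32-byte vector, hypothetical names `match_mask` and `aligned_rawmemchr`), not the assembly from the patch.

```c
#include <stdint.h>
#include <stddef.h>

#define VEC_SIZE 32u  /* assumed vector width in bytes */

/* Bit i of the result is set when block[i] == c. */
uint32_t match_mask(const unsigned char *block, unsigned char c)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < VEC_SIZE; i++)
        if (block[i] == c)
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Search for c starting at s, reading only whole aligned blocks.
   As with rawmemchr, the caller guarantees c occurs at or after s. */
const unsigned char *aligned_rawmemchr(const unsigned char *s,
                                       unsigned char c)
{
    uintptr_t off = (uintptr_t)s % VEC_SIZE;
    const unsigned char *block = s - off;  /* aligned-down start */
    /* Clear the low `off` bits: those bytes precede s. */
    uint32_t mask = match_mask(block, c) >> off << off;
    while (mask == 0) {
        block += VEC_SIZE;
        mask = match_mask(block, c);
    }
    unsigned i = 0;
    while (((mask >> i) & 1u) == 0)
        i++;
    return block + i;
}
```

Dropping the shift pair would let a match located before s be returned, which is the kind of incorrect result the commit message describes.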
Paul E. Murphy [Wed, 1 Jun 2022 16:19:49 +0000 (16:19 +0000)]
nptl_db: disable DT_RELR on libthread_db.so
Some nptl tests inadvertently use the host's gdb to verify
libthread_db.so, which is loaded with the host's runtime. This causes
a couple of test failures when the host glibc does not support DT_RELR.
The simple, though not strictly correct, workaround is to build without DT_RELR
as this library is otherwise likely to load on glibc 2.17 and newer
today.
This allows tst-pthread-gdb-attach{,-static} to continue working
when testing on a gdb loaded with an older glibc.
This avoids a failure in tst-pthread-gdb-attach similar to:
Trying host libthread_db library: .../build/glibc/nptl_db/libthread_db.so.1.
dlopen failed: /lib64/libc.so.6: version `GLIBC_ABI_DT_RELR' not found (required by .../build/glibc/nptl_db/libthread_db.so.1).
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Andreas Schwab [Wed, 8 Jun 2022 13:25:26 +0000 (15:25 +0200)]
elf: add missing newlines in lateglobal test
Adhemerval Zanella [Tue, 31 May 2022 20:13:35 +0000 (17:13 -0300)]
nptl: Fix __libc_cleanup_pop_restore asynchronous restore (BZ#29214)
This was due to a wrong revert done in
404656009b459658.
Checked on x86_64-linux-gnu.
Noah Goldstein [Fri, 3 Jun 2022 23:52:37 +0000 (18:52 -0500)]
x86: ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST expect no transactions
Give fall-through path to `vzeroupper` and taken-path to `vzeroall`.
Generally even on machines with RTM the expectation is the
string-library functions will not be called in transactions.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Tue, 7 Jun 2022 04:11:34 +0000 (21:11 -0700)]
x86: Shrink code size of memchr-evex.S
This is not meant as a performance optimization. The previous code was
far too liberal in aligning targets and wasted code size unnecessarily.
The total code size saving is: 64 bytes
There are no non-negligible changes in the benchmarks.
Geometric Mean of all benchmarks New / Old: 1.000
Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Tue, 7 Jun 2022 04:11:33 +0000 (21:11 -0700)]
x86: Shrink code size of memchr-avx2.S
This is not meant as a performance optimization. The previous code was
far too liberal in aligning targets and wasted code size unnecessarily.
The total code size saving is: 59 bytes
There are no major changes in the benchmarks.
Geometric Mean of all benchmarks New / Old: 0.967
Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Tue, 7 Jun 2022 04:11:32 +0000 (21:11 -0700)]
x86: Optimize memrchr-avx2.S
The new code:
1. prioritizes smaller user-arg lengths more.
2. optimizes target placement more carefully
3. reuses logic more
4. fixes up various inefficiencies in the logic. The biggest
case here is the `lzcnt` logic for checking returns which
saves either a branch or multiple instructions.
The total code size saving is: 306 bytes
Geometric Mean of all benchmarks New / Old: 0.760
Regressions:
There are some regressions. Particularly where the length (user arg
length) is large but the position of the match char is near the
beginning of the string (in first VEC). This case has roughly a
10-20% regression.
This is because the new logic gives the hot path for immediate matches
to shorter lengths (the more common input). This case has roughly
a 15-45% speedup.
Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
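
The role of a leading-zero count in a reverse search can be sketched in portable C. Given a bitmask with bit i set when byte i of a 32-byte vector matches, the last match is at index 31 minus the number of leading zeros. `clz32` and `last_match` are hypothetical names, and the loop stands in for the `lzcnt` instruction; this is not the code from the patch.

```c
#include <stdint.h>
#include <stddef.h>

/* Portable count of leading zero bits in a nonzero 32-bit value. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* mask has bit i set when vec_start[i] matches.
   Returns a pointer to the last match, or NULL when mask is zero. */
const unsigned char *last_match(const unsigned char *vec_start,
                                uint32_t mask)
{
    if (mask == 0)
        return NULL;
    return vec_start + (31 - clz32(mask));
}
```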
Noah Goldstein [Tue, 7 Jun 2022 04:11:31 +0000 (21:11 -0700)]
x86: Optimize memrchr-evex.S
The new code:
1. prioritizes smaller user-arg lengths more.
2. optimizes target placement more carefully
3. reuses logic more
4. fixes up various inefficiencies in the logic. The biggest
case here is the `lzcnt` logic for checking returns which
saves either a branch or multiple instructions.
The total code size saving is: 263 bytes
Geometric Mean of all benchmarks New / Old: 0.755
Regressions:
There are some regressions. Particularly where the length (user arg
length) is large but the position of the match char is near the
beginning of the string (in first VEC). This case has roughly a
20% regression.
This is because the new logic gives the hot path for immediate matches
to shorter lengths (the more common input). This case has roughly
a 35% speedup.
Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Tue, 7 Jun 2022 04:11:30 +0000 (21:11 -0700)]
x86: Optimize memrchr-sse2.S
The new code:
1. prioritizes smaller lengths more.
2. optimizes target placement more carefully.
3. reuses logic more.
4. fixes up various inefficiencies in the logic.
The total code size saving is: 394 bytes
Geometric Mean of all benchmarks New / Old: 0.874
Regressions:
1. The page cross case is now colder, especially re-entry from the
page cross case if a match is not found in the first VEC
(roughly 50%). My general opinion with this patch is this is
acceptable given the "coldness" of this case (less than 4%) and
general performance improvement in the other far more common
cases.
2. There are some regressions 5-15% for medium/large user-arg
lengths that have a match in the first VEC. This is because the
logic was rewritten to optimize finds in the first VEC if the
user-arg length is shorter (where we see roughly 20-50%
performance improvements). It is not always the case this is a
regression. My intuition is some frontend quirk is partially
explaining the data although I haven't been able to find the
root cause.
Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Tue, 7 Jun 2022 04:11:29 +0000 (21:11 -0700)]
Benchtests: Improve memrchr benchmarks
Add a second iteration for memrchr to set `pos` starting from the end
of the buffer.
Previously `pos` was only set relative to the beginning of the
buffer. This isn't really useful for memrchr because the beginning
of the search space is (buf + len).
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
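
The difference between the two iterations can be sketched as an index computation. `match_index` is a hypothetical helper and the exact formula used by the benchmark is an assumption; the point is only that the second iteration measures `pos` back from the end, where memrchr starts scanning.

```c
#include <stddef.h>

/* Index of the match character in a buffer of len bytes.
   from_end == 0: pos counts from the start of the buffer.
   from_end != 0: pos counts back from the last byte.
   Requires pos < len. */
size_t match_index(size_t len, size_t pos, int from_end)
{
    return from_end ? len - 1 - pos : pos;
}
```

With len = 256 and pos = 10, the first iteration places the match 245 bytes away from where the reverse scan begins, the second only 10 bytes away.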
Noah Goldstein [Tue, 7 Jun 2022 04:11:28 +0000 (21:11 -0700)]
x86: Add COND_VZEROUPPER that can replace vzeroupper if no `ret`
The RTM vzeroupper mitigation has no way of replacing an inline
vzeroupper that does not immediately precede a return.
This can be useful when hoisting a vzeroupper to save code size,
for example:
```
L(foo):
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	VZEROUPPER_RETURN
L(bar):
	xorl	%eax, %eax
	VZEROUPPER_RETURN
```
Can become:
```
L(foo):
	COND_VZEROUPPER
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	ret
L(bar):
	xorl	%eax, %eax
	ret
```
This code does not change any existing functionality.
There is no difference in the objdump of libc.so before and after this
patch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Tue, 7 Jun 2022 04:11:27 +0000 (21:11 -0700)]
x86: Create header for VEC classes in x86 strings library
This patch does not touch any existing code and is only meant to be a
tool for future patches so that simple source files can more easily be
maintained to target multiple VEC classes.
There is no difference in the objdump of libc.so before and after this
patch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Matheus Castanho [Tue, 7 Jun 2022 13:27:26 +0000 (10:27 -0300)]
powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197]
__strncpy_power9 initializes VR 18 with zeroes to be used throughout the
code, including when zero-padding the destination string. However, the
v18 reference was mistakenly being used for stxv and stxvl, which take a
VSX vector as operand. The code ended up using the uninitialized VSR 18
register by mistake.
Both occurrences have been changed to use the proper VSX number for VR 18
(i.e. VSR 50).
Tested on powerpc, powerpc64 and powerpc64le.
Signed-off-by: Kewen Lin <linkw@gcc.gnu.org>
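
The register numbering behind this fix: the 32 vector registers (VR 0..31) overlay the upper half of the 64 VSX registers, so VR n is VSR n + 32. A one-line sketch (`vr_to_vsr` is a hypothetical helper name):

```c
/* VSX register number that aliases vector register vr (0..31). */
unsigned vr_to_vsr(unsigned vr)
{
    return vr + 32;
}
```

For VR 18 this gives VSR 50, the operand the corrected stxv and stxvl instructions use.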
Wilco Dijkstra [Tue, 7 Jun 2022 15:45:46 +0000 (16:45 +0100)]
AArch64: Sort makefile entries
Sort makefile entries to reduce conflicts.
Wilco Dijkstra [Tue, 7 Jun 2022 15:44:35 +0000 (16:44 +0100)]
AArch64: Add SVE memcpy
Add an initial SVE memcpy implementation. Copies up to 32 bytes use SVE
vectors which improves the random memcpy benchmark significantly.
Clean up the memcpy and memmove ifunc selectors.
Raghuveer Devulapalli [Mon, 6 Jun 2022 19:17:43 +0000 (12:17 -0700)]
x86_64: Add strstr function with 512-bit EVEX
Adding a 512-bit EVEX version of strstr. The algorithm works as follows:
(1) We spend a few cycles at the beginning to peek into the needle. We
locate an edge in the needle (first occurrence of 2 consecutive distinct
characters) and also store the first 64-bytes into a zmm register.
(2) We search for the edge in the haystack by looking into one cache
line of the haystack at a time. This avoids having to read past a page
boundary which can cause a seg fault.
(3) If an edge is found in the haystack we first compare the first
64-bytes of the needle (already stored in a zmm register) before we
proceed with a full string compare performed byte by byte.
Benchmarking results: (old = strstr_sse2_unaligned, new = strstr_avx512)
Geometric mean of all benchmarks: new / old = 0.66
Difficult skiptable(0) : new / old = 0.02
Difficult skiptable(1) : new / old = 0.01
Difficult 2-way : new / old = 0.25
Difficult testing first 2 : new / old = 1.26
Difficult skiptable(0) : new / old = 0.05
Difficult skiptable(1) : new / old = 0.06
Difficult 2-way : new / old = 0.26
Difficult testing first 2 : new / old = 1.05
Difficult skiptable(0) : new / old = 0.42
Difficult skiptable(1) : new / old = 0.24
Difficult 2-way : new / old = 0.21
Difficult testing first 2 : new / old = 1.04
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
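
The three steps of the algorithm can be sketched in scalar C. This is a plain byte-at-a-time model, not the EVEX implementation: it omits the zmm registers, the cache-line-sized reads and the 64-byte prefix compare, and the names `find_edge` and `edge_strstr` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Step 1: index of the first position where two adjacent needle
   characters differ, or 0 if there is no such position. */
size_t find_edge(const char *needle)
{
    for (size_t i = 0; needle[i] != '\0' && needle[i + 1] != '\0'; i++)
        if (needle[i] != needle[i + 1])
            return i;
    return 0;
}

const char *edge_strstr(const char *hay, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0)
        return hay;
    size_t hlen = strlen(hay);
    if (hlen < nlen)
        return NULL;
    size_t e = find_edge(needle);
    for (size_t i = 0; i + nlen <= hlen; i++) {
        /* Step 2: cheap filter on the edge characters. */
        if (hay[i + e] != needle[e])
            continue;
        if (nlen > 1 && hay[i + e + 1] != needle[e + 1])
            continue;
        /* Step 3: full compare only for surviving candidates. */
        if (memcmp(hay + i, needle, nlen) == 0)
            return hay + i;
    }
    return NULL;
}
```

Filtering on a pair of distinct characters rejects most candidate positions before the full compare, which is the property the vectorized version exploits across 64 bytes at a time.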
Adhemerval Zanella [Mon, 6 Jun 2022 17:41:24 +0000 (14:41 -0300)]
scripts/glibcelf.py: Add PT_AARCH64_MEMTAG_MTE constant
It was added in commit
603e5c8ba7257483c162cabb06eb6f79096429b6.
This caused the elf/tst-glibcelf consistency check to fail.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Dmitriy Fedchenko [Mon, 6 Jun 2022 15:46:14 +0000 (12:46 -0300)]
socket: Fix mistyped define statement in socket/sys/socket.h (BZ #29225)
Joseph Myers [Mon, 6 Jun 2022 14:47:03 +0000 (14:47 +0000)]
Declare timegm for ISO C2X
The next revision of the ISO C standard has added the timegm function
(that was already supported in glibc). Update the feature test
conditionals on its declaration in <time.h> accordingly.
Tested for x86_64.
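
For reference, a short usage sketch of timegm, which converts broken-down UTC time to seconds since the epoch without consulting the local time zone. `utc_seconds` is a hypothetical wrapper, and the `_DEFAULT_SOURCE` define is an assumption about how the declaration is exposed when C2X mode is not in effect.

```c
#define _DEFAULT_SOURCE  /* assumed to expose timegm outside C2X mode */
#include <time.h>

/* Seconds since the epoch for a UTC calendar date and time. */
time_t utc_seconds(int year, int mon, int mday, int hour, int min, int sec)
{
    struct tm tm = {0};
    tm.tm_year = year - 1900;  /* years since 1900 */
    tm.tm_mon = mon - 1;       /* months are 0-based */
    tm.tm_mday = mday;
    tm.tm_hour = hour;
    tm.tm_min = min;
    tm.tm_sec = sec;
    return timegm(&tm);
}
```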
Joseph Myers [Mon, 6 Jun 2022 14:45:34 +0000 (14:45 +0000)]
Add PT_AARCH64_MEMTAG_MTE from Linux 5.18 to elf.h
Linux 5.18 defines a new AArch64 ELF segment type
PT_AARCH64_MEMTAG_MTE; add it to elf.h.
Tested with build-many-glibcs.py for aarch64-linux-gnu.