H.J. Lu [Tue, 29 Mar 2022 21:08:54 +0000 (14:08 -0700)]
elf: Define DT_RELR related macros and types
Fangrui Song [Tue, 26 Apr 2022 16:26:22 +0000 (09:26 -0700)]
elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC
PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage
variables and hidden visibility variables in a shared object (ld.so)
need dynamic relocations (usually R_*_RELATIVE). PI (position
independent) in the macro name is a misnomer: a code sequence using GOT
is typically position-independent as well, but using dynamic relocations
does not meet the requirement.
Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new
ports will define PI_STATIC_AND_HIDDEN. More current ports define
PI_STATIC_AND_HIDDEN than do not, so change the configure
default.
No functional change.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Carlos O'Donell [Tue, 26 Apr 2022 14:52:41 +0000 (10:52 -0400)]
i386: Regenerate ulps
These failures were caught while building glibc master for Fedora
Rawhide which is built with '-mtune=generic -msse2 -mfpmath=sse'
using gcc 11.3 (gcc-11.3.1-2.fc35) on a Cascadelake Intel Xeon
processor.
Florian Weimer [Tue, 26 Apr 2022 12:23:02 +0000 (14:23 +0200)]
dlfcn: Do not use rtld_active () to determine ld.so state (bug 29078)
When audit modules are loaded, ld.so initialization is not yet
complete, and rtld_active () returns false even though ld.so is
mostly working. Instead, the static dlopen hook is used, but that
does not work at all because this is not a static dlopen situation.
Commit
466c1ea15f461edb8e3ffaf5d86d708876343bbf ("dlfcn: Rework
static dlopen hooks") moved the hook pointer into _rtld_global_ro,
which means that separate protection is not needed anymore and the
hook pointer can be checked directly.
The guard for disabling libio vtable hardening in _IO_vtable_check
should stay for now.
Fixes commit
8e1472d2c1e25e6eabc2059170731365f6d5b3d1 ("ld.so:
Examine GLRO to detect inactive loader [BZ #20204]").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Florian Weimer [Tue, 26 Apr 2022 12:22:10 +0000 (14:22 +0200)]
INSTALL: Rephrase --with-default-link documentation
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Fangrui Song [Mon, 25 Apr 2022 17:30:26 +0000 (10:30 -0700)]
elf: Move post-relocation code of _dl_start into _dl_start_final
On non-PI_STATIC_AND_HIDDEN architectures, getting the address of
_rtld_local_ro (for GLRO (dl_final_object)) goes through a GOT entry.
The GOT load may be reordered before self relocation, leading to an
unrelocated/incorrect _rtld_local_ro address.
84e02af1ebc9988126eebe60bf19226cea835623 tickled GCC powerpc32 to
reorder the GOT load before relative relocations, leading to ld.so
crash. This is similar to the m68k jump table reordering issue fixed by
a8e9b5b8079d18116ca69c9797e77804ecf2ee7e.
Move code after self relocation into _dl_start_final to avoid the
reordering. This fixes powerpc32 and may help other architectures when
ELF_DYNAMIC_RELOCATE is simplified in the future.
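The idea of keeping post-relocation work behind a function boundary can be
sketched in plain C. This is only an illustration, assuming the GCC/Clang
`noinline` attribute; `fake_self_relocate`, `fake_start_final` and
`fake_start` are hypothetical names and not glibc's `_dl_start` code.

```c
/* Hypothetical stand-in for data that is only valid after self
   relocation has been applied.  */
static int relocated_value;

/* Hypothetical stand-in for applying ld.so's own relative relocations.  */
static void
fake_self_relocate (void)
{
  relocated_value = 42;
}

/* Kept out of line so that its loads stay inside this function and
   are not scheduled before the call to fake_self_relocate in the
   caller.  */
static int __attribute__ ((noinline))
fake_start_final (void)
{
  return relocated_value;
}

int
fake_start (void)
{
  fake_self_relocate ();
  /* Everything that depends on relocated data lives in the callee.  */
  return fake_start_final ();
}
```

Calling `fake_start ()` relocates first and only then reads the relocated
data inside the non-inlined helper.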
Joan Bruguera [Mon, 11 Apr 2022 17:49:56 +0000 (19:49 +0200)]
misc: Fix rare fortify crash on wchar funcs. [BZ 29030]
If `__glibc_objsize (__o) == (size_t) -1` (i.e. `__o` is unknown size), fortify
checks should pass, and `__whatever_alias` should be called.
Previously, `__glibc_objsize (__o) == (size_t) -1` was explicitly checked, but
on commit
a643f60c53876b, this was moved into `__glibc_safe_or_unknown_len`.
A comment says the -1 case should work as: "The -1 check is redundant because
since it implies that __glibc_safe_len_cond is true.". But this fails when:
* `__s > 1`
* `__osz == -1` (i.e. unknown size at compile time)
* `__l` is big enough
* `__l * __s <= __osz` can be folded to a constant
(I only found this to be true for `mbsrtowcs` and other functions in wchar2.h)
In this case `__l * __s <= __osz` is false, and `__whatever_chk_warn` will be
called by `__glibc_fortify` or `__glibc_fortify_n` and crash the program.
This commit adds the explicit `__osz == -1` check again.
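The failing condition can be modelled in plain C. This is a simplified
sketch, not glibc's macros: `model_safe_len` and
`model_safe_or_unknown_len` are hypothetical names, the length check is
written as a division so the model itself cannot overflow, and the
`__builtin_constant_p` folding that triggers the bug is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the length check alone: true when L elements
   of size S fit into an object of OSZ bytes.  */
static bool
model_safe_len (size_t l, size_t s, size_t osz)
{
  assert (s != 0);
  return l <= osz / s;
}

/* Hypothetical model of the restored behaviour: an unknown object
   size ((size_t) -1) always passes, otherwise the length check
   decides.  */
static bool
model_safe_or_unknown_len (size_t l, size_t s, size_t osz)
{
  return osz == (size_t) -1 || model_safe_len (l, s, osz);
}
```

With `l == (size_t) -1`, `s == 4` and `osz == (size_t) -1`, as in the
`mbsrtowcs` test case, the length check alone is false while the check
with the explicit unknown-size test passes.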
moc crashes on startup due to this, see: https://bugs.archlinux.org/task/74041
Minimal test case (test.c):
#include <wchar.h>
int main (void)
{
const char *hw = "HelloWorld";
mbsrtowcs (NULL, &hw, (size_t)-1, NULL);
return 0;
}
Build with:
gcc -O2 -Wp,-D_FORTIFY_SOURCE=2 test.c -o test && ./test
Output:
*** buffer overflow detected ***: terminated
Fixes: BZ #29030
Signed-off-by: Joan Bruguera <joanbrugueram@gmail.com>
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Fangrui Song [Mon, 25 Apr 2022 08:01:02 +0000 (01:01 -0700)]
elf: Remove unused enum allowmask
Unused since
52a01100ad011293197637e42b5be1a479a2f4ae
("elf: Remove ad-hoc restrictions on dlopen callers [BZ #22787]").
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Florian Weimer [Fri, 22 Apr 2022 17:34:52 +0000 (19:34 +0200)]
scripts/glibcelf.py: Mark as UNSUPPORTED on Python 3.5 and earlier
enum.IntFlag and enum.EnumMeta._missing_ support are not part of
earlier Python versions.
Noah Goldstein [Fri, 22 Apr 2022 01:52:30 +0000 (20:52 -0500)]
x86: Optimize {str|wcs}rchr-evex
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.755
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Fri, 22 Apr 2022 01:52:29 +0000 (20:52 -0500)]
x86: Optimize {str|wcs}rchr-avx2
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.832
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Fri, 22 Apr 2022 01:52:28 +0000 (20:52 -0500)]
x86: Optimize {str|wcs}rchr-sse2
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.741
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Fri, 22 Apr 2022 19:11:59 +0000 (14:11 -0500)]
benchtests: Improve bench-strrchr
1. Use json-lib for printing results.
2. Expose all parameters (previously pos, seek_char, and max_char were
not printed).
3. Add benchmarks that test multiple occurrences of seek_char in the
string.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
H.J. Lu [Fri, 22 Apr 2022 17:53:13 +0000 (10:53 -0700)]
x86-64: Fix SSE2 memcmp and SSSE3 memmove for x32
Clear the upper 32 bits in RDX (memory size) for x32 to fix
FAIL: string/tst-size_t-memcmp
FAIL: string/tst-size_t-memcmp-2
FAIL: string/tst-size_t-memcpy
FAIL: wcsmbs/tst-size_t-wmemcmp
on x32 introduced by
8804157ad9 x86: Optimize memcmp SSE2 in memcmp.S
26b2478322 x86: Reduce code size of mem{move|pcpy|cpy}-ssse3
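On x32, size_t is 32 bits wide but arrives in a 64-bit register whose upper
half is not guaranteed to be zero. The effect of clearing it can be modelled
in C as follows; `clear_upper_32` is a hypothetical helper for illustration,
while the actual fix is done in the assembly sources.

```c
#include <stdint.h>

/* Model of zero-extending the low 32 bits of a 64-bit register value,
   which is what a 32-bit register-to-register move does on x86-64.  */
static uint64_t
clear_upper_32 (uint64_t reg)
{
  return (uint64_t) (uint32_t) reg;
}
```

A register holding 0xdeadbeef00000010 then yields a size of 0x10 instead of
a huge bogus length.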
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Florian Weimer [Mon, 11 Apr 2022 09:30:31 +0000 (11:30 +0200)]
Default to --with-default-link=no (bug 25812)
This is necessary to place the libio vtables into the RELRO segment.
New tests elf/tst-relro-ldso and elf/tst-relro-libc are added to
verify that this is what actually happens.
The new tests fail on ia64 due to lack of (default) RELRO support
in binutils, so they are XFAILed there.
Florian Weimer [Mon, 11 Apr 2022 09:28:08 +0000 (11:28 +0200)]
scripts: Add glibcelf.py module
Hopefully, this will lead to tests that are easier to maintain. The
current approach of parsing readelf -W output using regular expressions
is not necessarily easier than parsing the ELF data directly.
This module is still somewhat incomplete (e.g., coverage of relocation
types and versioning information is missing), but it is sufficient to
perform basic symbol analysis or program header analysis.
The EM_* mapping for architecture-specific constant classes (e.g.,
SttX86_64) is not yet implemented. The classes are defined for the
benefit of elf/tst-glibcelf.py.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Emil Soleyman-Zomalan [Tue, 25 Jan 2022 01:33:10 +0000 (19:33 -0600)]
Add locale for syr_SY
Fangrui Song [Wed, 20 Apr 2022 20:52:45 +0000 (13:52 -0700)]
elf: Move elf_dynamic_do_Rel RTLD_BOOTSTRAP branches outside
elf_dynamic_do_Rel checks RTLD_BOOTSTRAP in several #ifdef branches.
Create an outside RTLD_BOOTSTRAP branch to simplify reasoning about the
function at the cost of a few duplicate lines.
Since dl_naudit is zero in RTLD_BOOTSTRAP code, the RTLD_BOOTSTRAP
branch can avoid _dl_audit_symbind calls to decrease code size.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Fangrui Song [Wed, 20 Apr 2022 17:24:15 +0000 (10:24 -0700)]
m68k: Handle fewer relocations for RTLD_BOOTSTRAP (#BZ29071)
m68k is a non-PI_STATIC_AND_HIDDEN arch which uses a GOT relocation when
loading the address of a jump table. The GOT load may be reordered
before processing R_68K_RELATIVE relocations, leading to an
unrelocated/incorrect jump table, which will cause a crash.
The foolproof approach is to add an optimization barrier (e.g. calling
a non-inlinable function after relative relocations are resolved). That
is non-trivial given the current code structure, so just use the simple
approach to avoid the jump table: handle only the essential relocations
for RTLD_BOOTSTRAP code.
This is based on Andreas Schwab's patch and fixes the ld.so crash on m68k.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Adhemerval Zanella [Wed, 20 Apr 2022 15:01:43 +0000 (12:01 -0300)]
nptl: Fix pthread_cancel cancelhandling atomic operations
The
404656009b reversion did not set up the atomic loop to set the
cancel bits correctly. The fix is essentially what pthread_cancel
did prior to
26cfbb7162ad.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Noah Goldstein [Tue, 19 Apr 2022 22:52:33 +0000 (17:52 -0500)]
x86: Fix missing __wmemcmp def for disable-multiarch build
commit
8804157ad9da39631703b92315460808eac86b0c
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Fri Apr 15 12:27:59 2022 -0500
x86: Optimize memcmp SSE2 in memcmp.S
Only defined wmemcmp and missed __wmemcmp. This commit fixes that by
defining __wmemcmp and setting wmemcmp as a weak alias to __wmemcmp.
Both multiarch and disable-multiarch builds succeed and full xchecks
pass.
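The internal-name-plus-weak-alias pattern can be sketched with the GCC/Clang
`alias` attribute on an ELF target. The names `demo_internal_wmemcmp` and
`demo_wmemcmp` are hypothetical stand-ins for `__wmemcmp` and `wmemcmp`; the
real definitions live in assembly and use glibc's own alias macros.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical internal implementation (stands in for __wmemcmp).  */
int
demo_internal_wmemcmp (const wchar_t *a, const wchar_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}

/* Public name defined as a weak alias of the internal symbol, so both
   names resolve to the same code.  */
extern __typeof (demo_internal_wmemcmp) demo_wmemcmp
  __attribute__ ((weak, alias ("demo_internal_wmemcmp")));
```

Callers inside the library can use the internal name, while the public name
stays overridable because it is weak.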
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Fangrui Song [Tue, 19 Apr 2022 22:52:27 +0000 (15:52 -0700)]
elf: Remove __libc_init_secure
After
73fc4e28b9464f0e13edc719a5372839970e7ddb,
__libc_enable_secure_decided is always 0 and a statically linked
executable may overwrite __libc_enable_secure without considering
AT_SECURE.
The __libc_enable_secure has been correctly initialized in _dl_aux_init,
so just remove __libc_enable_secure_decided and __libc_init_secure.
This allows us to remove some startup_get*id functions from
22b79ed7f413cd980a7af0cf258da5bf82b6d5e5.
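From user code, the same AT_SECURE information can be read from the
auxiliary vector. This is a minimal sketch using the public `getauxval`
interface on Linux; `process_is_secure` is a hypothetical helper and not the
internal `_dl_aux_init` logic.

```c
#include <elf.h>
#include <stdbool.h>
#include <sys/auxv.h>

/* Returns true when the kernel flagged this process as needing
   secure-execution mode (for example a set-user-ID program).  */
static bool
process_is_secure (void)
{
  return getauxval (AT_SECURE) != 0;
}
```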
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Joshua Kinard [Mon, 18 Apr 2022 12:55:08 +0000 (09:55 -0300)]
mips: Fix mips64n32 64 bit time_t stat support (BZ#29069)
Add missing support initially added by
4e8521333bea6e89fcef1020
(which missed n32 stat).
Noah Goldstein [Fri, 15 Apr 2022 17:28:01 +0000 (12:28 -0500)]
x86: Cleanup page cross code in memcmp-avx2-movbe.S
Old code was both inefficient and wasted code size. New code saves
size (-62 bytes) and has comparable or better performance in the page
cross case.
geometric_mean(N=20) of page cross cases New / Original: 0.960
size, align0, align1, ret, New Time/Old Time
1, 4095, 0, 0, 1.001
1, 4095, 0, 1, 0.999
1, 4095, 0, -1, 1.0
2, 4094, 0, 0, 1.0
2, 4094, 0, 1, 1.0
2, 4094, 0, -1, 1.0
3, 4093, 0, 0, 1.0
3, 4093, 0, 1, 1.0
3, 4093, 0, -1, 1.0
4, 4092, 0, 0, 0.987
4, 4092, 0, 1, 1.0
4, 4092, 0, -1, 1.0
5, 4091, 0, 0, 0.984
5, 4091, 0, 1, 1.002
5, 4091, 0, -1, 1.005
6, 4090, 0, 0, 0.993
6, 4090, 0, 1, 1.001
6, 4090, 0, -1, 1.003
7, 4089, 0, 0, 0.991
7, 4089, 0, 1, 1.0
7, 4089, 0, -1, 1.001
8, 4088, 0, 0, 0.875
8, 4088, 0, 1, 0.881
8, 4088, 0, -1, 0.888
9, 4087, 0, 0, 0.872
9, 4087, 0, 1, 0.879
9, 4087, 0, -1, 0.883
10, 4086, 0, 0, 0.878
10, 4086, 0, 1, 0.886
10, 4086, 0, -1, 0.873
11, 4085, 0, 0, 0.878
11, 4085, 0, 1, 0.881
11, 4085, 0, -1, 0.879
12, 4084, 0, 0, 0.873
12, 4084, 0, 1, 0.889
12, 4084, 0, -1, 0.875
13, 4083, 0, 0, 0.873
13, 4083, 0, 1, 0.863
13, 4083, 0, -1, 0.863
14, 4082, 0, 0, 0.838
14, 4082, 0, 1, 0.869
14, 4082, 0, -1, 0.877
15, 4081, 0, 0, 0.841
15, 4081, 0, 1, 0.869
15, 4081, 0, -1, 0.876
16, 4080, 0, 0, 0.988
16, 4080, 0, 1, 0.99
16, 4080, 0, -1, 0.989
17, 4079, 0, 0, 0.978
17, 4079, 0, 1, 0.981
17, 4079, 0, -1, 0.98
18, 4078, 0, 0, 0.981
18, 4078, 0, 1, 0.98
18, 4078, 0, -1, 0.985
19, 4077, 0, 0, 0.977
19, 4077, 0, 1, 0.979
19, 4077, 0, -1, 0.986
20, 4076, 0, 0, 0.977
20, 4076, 0, 1, 0.986
20, 4076, 0, -1, 0.984
21, 4075, 0, 0, 0.977
21, 4075, 0, 1, 0.983
21, 4075, 0, -1, 0.988
22, 4074, 0, 0, 0.983
22, 4074, 0, 1, 0.994
22, 4074, 0, -1, 0.993
23, 4073, 0, 0, 0.98
23, 4073, 0, 1, 0.992
23, 4073, 0, -1, 0.995
24, 4072, 0, 0, 0.989
24, 4072, 0, 1, 0.989
24, 4072, 0, -1, 0.991
25, 4071, 0, 0, 0.99
25, 4071, 0, 1, 0.999
25, 4071, 0, -1, 0.996
26, 4070, 0, 0, 0.993
26, 4070, 0, 1, 0.995
26, 4070, 0, -1, 0.998
27, 4069, 0, 0, 0.993
27, 4069, 0, 1, 0.999
27, 4069, 0, -1, 1.0
28, 4068, 0, 0, 0.997
28, 4068, 0, 1, 1.0
28, 4068, 0, -1, 0.999
29, 4067, 0, 0, 0.996
29, 4067, 0, 1, 0.999
29, 4067, 0, -1, 0.999
30, 4066, 0, 0, 0.991
30, 4066, 0, 1, 1.001
30, 4066, 0, -1, 0.999
31, 4065, 0, 0, 0.988
31, 4065, 0, 1, 0.998
31, 4065, 0, -1, 0.998
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Fri, 15 Apr 2022 17:28:00 +0000 (12:28 -0500)]
x86: Remove memcmp-sse4.S
Code didn't actually use any sse4 instructions since `ptest` was
removed in:
commit
2f9062d7171850451e6044ef78d91ff8c017b9c0
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Nov 10 16:18:56 2021 -0600
x86: Shrink memcmp-sse4.S code size
The new memcmp-sse2 implementation is also faster.
geometric_mean(N=20) of page cross cases SSE2 / SSE4: 0.905
Note there are two regressions preferring SSE2 for Size = 1 and Size =
65.
Size = 1:
size, align0, align1, ret, New Time/Old Time
1, 1, 1, 0, 1.2
1, 1, 1, 1, 1.197
1, 1, 1, -1, 1.2
This is intentional. Size == 1 is significantly less hot based on
profiles of GCC11 and Python3 than sizes [4, 8] (which is made
hotter).
Python3 Size = 1 -> 13.64%
Python3 Size = [4, 8] -> 60.92%
GCC11 Size = 1 -> 1.29%
GCC11 Size = [4, 8] -> 33.86%
size, align0, align1, ret, New Time/Old Time
4, 4, 4, 0, 0.622
4, 4, 4, 1, 0.797
4, 4, 4, -1, 0.805
5, 5, 5, 0, 0.623
5, 5, 5, 1, 0.777
5, 5, 5, -1, 0.802
6, 6, 6, 0, 0.625
6, 6, 6, 1, 0.813
6, 6, 6, -1, 0.788
7, 7, 7, 0, 0.625
7, 7, 7, 1, 0.799
7, 7, 7, -1, 0.795
8, 8, 8, 0, 0.625
8, 8, 8, 1, 0.848
8, 8, 8, -1, 0.914
9, 9, 9, 0, 0.625
Size = 65:
size, align0, align1, ret, New Time/Old Time
65, 0, 0, 0, 1.103
65, 0, 0, 1, 1.216
65, 0, 0, -1, 1.227
65, 65, 0, 0, 1.091
65, 0, 65, 1, 1.19
65, 65, 65, -1, 1.215
This is because A) the checks in range [65, 96] are now unrolled 2x
and B) because smaller values <= 16 are now given a hotter path. By
contrast the SSE4 version has a branch for Size = 80. The unrolled
version gets better performance for returns which need both
comparisons.
size, align0, align1, ret, New Time/Old Time
128, 4, 8, 0, 0.858
128, 4, 8, 1, 0.879
128, 4, 8, -1, 0.888
As well, outside of microbenchmarks, in environments that are not fully
predictable, the branch will have a real cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Fri, 15 Apr 2022 17:27:59 +0000 (12:27 -0500)]
x86: Optimize memcmp SSE2 in memcmp.S
New code saves size (-303 bytes) and has significantly better
performance.
geometric_mean(N=20) of page cross cases New / Original: 0.634
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Adhemerval Zanella [Fri, 18 Mar 2022 13:10:53 +0000 (10:10 -0300)]
misc: Use 64 bit time_t interfaces on syslog
It also handles the highly unlikely case where localtime might return
NULL, in this case only the PRI is set to hopefully instruct the relay
to get the TIMESTAMP (as defined by the RFC).
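The fallback can be sketched as below. This is an illustration only:
`build_syslog_header` is a hypothetical helper, and the "%b %e %T" format is
an assumption chosen to resemble the traditional syslog timestamp.

```c
#include <stdio.h>
#include <time.h>

/* Write "<PRI>TIMESTAMP " into BUF, or only "<PRI>" when no
   broken-down time is available.  Returns the snprintf result.  */
static int
build_syslog_header (char *buf, size_t len, int pri, const struct tm *tm)
{
  char stamp[32];

  /* Without a usable broken-down time, emit only the PRI part.  */
  if (tm == NULL || strftime (stamp, sizeof stamp, "%b %e %T", tm) == 0)
    return snprintf (buf, len, "<%d>", pri);

  return snprintf (buf, len, "<%d>%s ", pri, stamp);
}
```

A caller would pass the result of a `localtime_r` call directly, so a NULL
return from it selects the PRI-only form.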
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Sun, 11 Apr 2021 03:04:01 +0000 (00:04 -0300)]
misc: syslog: Move SYSLOG_NAME to USE_MISC (BZ #16355)
There is no easy solution as described in the first comment of the bug
report, and some code (like busybox) assumes facilitynames existence when
SYSLOG_NAMES is defined (so we can't just remove it as suggested in
comment #2).
So use the easier solution and guard it with __USE_MISC.
Adhemerval Zanella [Tue, 5 Oct 2021 12:58:09 +0000 (09:58 -0300)]
misc: syslog: Use fixed-sized buffer and remove memstream
A fixed-sized buffer is used instead of memstream for messages up to
1024 bytes to avoid the potential BUFSIZ (8K) malloc and free for
each syslog call.
Also, since the buffer size is known, memstream is replaced with a
malloced buffer for larger messages.
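The buffer strategy can be sketched as follows. This is a simplified
illustration under assumed names (`format_msg`, `FIXED_BUF_SIZE`); it is not
the syslog implementation itself.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

enum { FIXED_BUF_SIZE = 1024 };

/* Format into BUF when the message fits; otherwise return a malloc'd
   buffer of exactly the needed size.  The caller frees the result
   only when it differs from BUF.  Returns NULL on failure.  */
static char *
format_msg (char buf[FIXED_BUF_SIZE], const char *fmt, ...)
{
  va_list ap;

  va_start (ap, fmt);
  int len = vsnprintf (buf, FIXED_BUF_SIZE, fmt, ap);
  va_end (ap);
  if (len < 0)
    return NULL;
  if (len < FIXED_BUF_SIZE)
    return buf;

  /* The first pass reported the full length, so allocate once.  */
  char *big = malloc ((size_t) len + 1);
  if (big == NULL)
    return NULL;
  va_start (ap, fmt);
  vsnprintf (big, (size_t) len + 1, fmt, ap);
  va_end (ap);
  return big;
}
```

Short messages never touch the allocator; only messages that do not fit in
the fixed buffer pay for one exact-size allocation.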
Checked on x86_64-linux-gnu.
Adhemerval Zanella [Tue, 5 Oct 2021 12:26:54 +0000 (09:26 -0300)]
misc: syslog: Simplify implementation
Use a temporary buffer for strftime instead of using internal libio
members, simplify fprintf call on the memstream and memory allocation,
use %b instead of %h, use dprintf instead of writev for LOG_PERROR.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Tue, 5 Oct 2021 12:15:19 +0000 (09:15 -0300)]
misc: syslog: Fix indentation and style
And also clean up the headers, no semantic changes.
Adhemerval Zanella [Fri, 9 Apr 2021 11:34:22 +0000 (08:34 -0300)]
misc: Add syslog test
The test covers:
- All possible priorities and facilities through TCP and UDP.
- Same syslog tests for vsyslog.
- Some openlog/syslog/close combinations.
- openlog with LOG_CONS, LOG_PERROR, and LOG_PID.
Internally it is done with a test-container where the main process mimics
the syslog server interface.
The test does not cover multithread and async-signal usage.
Checked on x86_64-linux-gnu.
Adhemerval Zanella [Thu, 8 Apr 2021 20:36:07 +0000 (17:36 -0300)]
support: Add xmkfifo
Wrapper support for mkfifo.
Adhemerval Zanella [Wed, 6 Apr 2022 14:46:46 +0000 (11:46 -0300)]
stdio: Split __get_errname definition from errlist.c
The loader does not need to pull all __get_errlist definitions
and its size is decreased:
Before:
$ size elf/ld.so
text data bss dec hex filename
197774 11024 456 209254 33166 elf/ld.so
After:
$ size elf/ld.so
text data bss dec hex filename
191510 9936 456 201902 314ae elf/ld.so
Checked on x86_64-linux-gnu.
Noah Goldstein [Thu, 14 Apr 2022 16:47:40 +0000 (11:47 -0500)]
x86: Reduce code size of mem{move|pcpy|cpy}-ssse3
The goal is to remove most SSSE3 functions as SSE4, AVX2, and EVEX are
generally preferable. memcpy/memmove is one exception where avoiding
unaligned loads with `palignr` is important for some targets.
This commit replaces memmove-ssse3 with a better optimized and lower
code footprint version. As well it aliases memcpy to memmove.
Aside from this function all other SSSE3 functions should be safe to
remove.
The performance is not changed drastically although shows overall
improvements without any major regressions or gains.
bench-memcpy geometric_mean(N=50) New / Original: 0.957
bench-memcpy-random geometric_mean(N=50) New / Original: 0.912
bench-memcpy-large geometric_mean(N=50) New / Original: 0.892
Benchmarks were run on Zhaoxin KX-6840@2000MHz. See attached numbers
for all results.
More importantly, this saves 7246 bytes of code size in memmove and an
additional 10741 bytes by reusing memmove code for memcpy (total 17987
bytes saved). It also saves an additional 896 bytes of rodata for the
jump table entries.
Noah Goldstein [Thu, 14 Apr 2022 16:47:38 +0000 (11:47 -0500)]
x86: Remove mem{move|cpy}-ssse3-back
With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
SSSE3. As a result it is no longer worth it to keep the SSSE3
versions given the code size cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Thu, 14 Apr 2022 16:47:37 +0000 (11:47 -0500)]
x86: Remove str{p}{n}cpy-ssse3
With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
SSSE3. As a result it is no longer worth it to keep the SSSE3
versions given the code size cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Thu, 14 Apr 2022 16:47:36 +0000 (11:47 -0500)]
x86: Remove str{n}cat-ssse3
With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
SSSE3. As a result it is no longer worth it to keep the SSSE3
versions given the code size cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Thu, 14 Apr 2022 16:47:35 +0000 (11:47 -0500)]
x86: Remove str{n}{case}cmp-ssse3
With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
SSSE3. As a result it is no longer worth it to keep the SSSE3
versions given the code size cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Thu, 14 Apr 2022 16:47:34 +0000 (11:47 -0500)]
x86: Remove {w}memcmp-ssse3
With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
SSSE3. As a result it is no longer worth it to keep the SSSE3
versions given the code size cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Adhemerval Zanella [Wed, 6 Apr 2022 15:24:42 +0000 (12:24 -0300)]
nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029)
Some Linux interfaces never restart after being interrupted by a signal
handler, regardless of the use of SA_RESTART [1]. It means that for
pthread cancellation, if the target thread disables cancellation with
pthread_setcancelstate and calls such interfaces (like poll or select),
it should not see spurious EINTR failures due to the internal SIGCANCEL.
However recent changes made pthread_cancel always send the internal
signal, regardless of the target thread cancellation status or type.
To fix it, the previous semantic is restored, where the cancel signal
is only sent if the target thread has cancellation enabled in
asynchronous mode.
The cancel state and cancel type are moved back to cancelhandling
and atomic operations are used to synchronize between threads. The
patch essentially reverts the following commits:
8c1c0aae20 nptl: Move cancel type out of cancelhandling
2b51742531 nptl: Move cancel state out of cancelhandling
26cfbb7162 nptl: Remove CANCELING_BITMASK
However I changed the atomic operation to follow the internal C11
semantic and removed the MACRO usage, which simplifies the resulting
code a bit (and removes another usage of the old atomic macros).
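A C11-style loop for setting bits in a shared word can be sketched with
<stdatomic.h>. This is an illustration only: glibc uses its own internal
atomic wrappers, and `demo_set_bits` and `DEMO_CANCELED_BIT` are hypothetical
names with an arbitrary bit value.

```c
#include <stdatomic.h>

/* Hypothetical bit value, for illustration only.  */
#define DEMO_CANCELED_BIT 0x04

/* Atomically OR BITS into *WORD and return the previous value.  */
static int
demo_set_bits (atomic_int *word, int bits)
{
  int oldval = atomic_load_explicit (word, memory_order_relaxed);

  /* On failure the compare-exchange stores the current value into
     oldval, so each retry recomputes the new value from fresh state.  */
  while (!atomic_compare_exchange_weak_explicit (word, &oldval,
                                                 oldval | bits,
                                                 memory_order_acq_rel,
                                                 memory_order_relaxed))
    ;
  return oldval;
}
```

Returning the previous value lets the caller decide what to do next, for
example whether a signal needs to be sent at all.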
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
and powerpc64-linux-gnu.
[1] https://man7.org/linux/man-pages/man7/signal.7.html
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Stefan Liebler [Wed, 13 Apr 2022 12:36:09 +0000 (14:36 +0200)]
S390: Add new s390 platform z16.
The new IBM z16 is added to platform string array.
The macro _DL_PLATFORMS_COUNT is incremented.
_dl_hwcaps_subdir is extended by "z16" if HWCAP_S390_VXRS_PDE2
is set. HWCAP_S390_NNPA is not tested in _dl_hwcaps_subdirs_active
as those instructions may be replaced or removed in future.
tst-glibc-hwcaps.c is extended in order to test z16 via new marker5.
A fatal glibc error is dumped if glibc was built with architecture
level set for z16, but run on an older machine. (See dl-hwcap-check.h)
Noah Goldstein [Thu, 14 Apr 2022 00:46:03 +0000 (19:46 -0500)]
Replace {u}int_fast{16|32} with {u}int32_t
On 32-bit machines this has no effect. On 64-bit machines
{u}int_fast{16|32} are set as {u}int64_t which is often not
ideal. Particularly on x86_64 this change both saves code size and
may save instruction cost.
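The size difference is easy to observe with a small table. The sketch below
uses hypothetical struct names; the width of uint_fast16_t is
platform-dependent, so the 2x difference is only expected where it is
64 bits wide, as the message states for 64-bit machines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-entry tables, one per element type.  */
struct table_fast
{
  uint_fast16_t v[256];
};

struct table_fixed
{
  uint32_t v[256];
};

static size_t
table_fast_bytes (void)
{
  return sizeof (struct table_fast);
}

static size_t
table_fixed_bytes (void)
{
  return sizeof (struct table_fixed);
}
```

`table_fixed_bytes ()` is always 1024, while `table_fast_bytes ()` depends on
how the platform defines the fast type.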
Full xcheck passes on x86_64.
Adhemerval Zanella [Fri, 8 Apr 2022 12:57:57 +0000 (09:57 -0300)]
stdlib: Reflow and sort most variable assignments
Florian Weimer [Wed, 13 Apr 2022 12:18:28 +0000 (14:18 +0200)]
elf: Fix memory leak in _dl_find_object_update (bug 29062)
The count can be zero if an object that has already been loaded as
an indirect dependency (so that l_searchlist.r_list in its link
map is still NULL) is promoted to global scope via RTLD_GLOBAL.
Fixes commit
5d28a8962dc ("elf: Add _dl_find_object function").
Samuel Thibault [Tue, 12 Apr 2022 20:16:38 +0000 (22:16 +0200)]
hurd: Define ELIBEXEC
So we can implement it in the exec server.
Samuel Thibault [Tue, 12 Apr 2022 20:14:34 +0000 (22:14 +0200)]
hurd: Fix arbitrary error code
ELIBBAD is Linux-specific.
Carlos O'Donell [Tue, 12 Apr 2022 17:26:10 +0000 (13:26 -0400)]
NEWS: Move PLT tracking slowdown to glibc 2.35.
In commit
063f9ba220f434c7f30dd65c4cff17c0c458a7cf the NEWS section
was accidentally added to the glibc 2.34 NEWS section. The NEWS entry
should have been added to glibc 2.35 which contained the committed
fix. This moves the NEWS entry to correct section.
Szabolcs Nagy [Wed, 6 Apr 2022 15:56:07 +0000 (16:56 +0100)]
Remove _dl_skip_args_internal declaration
It does not seem to be used.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
DJ Delorie [Mon, 11 Apr 2022 19:45:35 +0000 (15:45 -0400)]
test-container: Fix "unused code" warnings on HURD
Comment out bits of code that are only used when we *have* pid
namespaces, to avoid "unused code" warnings.
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-Id: <xno817tnds.fsf@greed.delorie.com>
Noah Goldstein [Mon, 4 Apr 2022 16:56:46 +0000 (11:56 -0500)]
Add .clang-format style file
Went with version >= 11.0 since it covers most of the major features
and should be pretty universally accessible.
There are some issues:
1. indentation of preprocessor directives:
Unfortunately there doesn't appear to be a switch for a separate
'IndentWidth' for preprocessor directives vs. normal code so we
are stuck either not indenting the directives or over-indenting
them. i.e:
Desired:
```
#ifndef A
# define B
#endif
```
Options:
```
#ifndef A
#  define B /* Two spaces instead of one. */
#endif
#ifndef C
#define D /* No spaces. */
#endif
```
Chose to over-indent as it generally seems easier to script
halving all pre-processor indentations than counting the nested
depth and indenting from scratch.
2. concatenation of lines missing semi-colons:
Throughout glibc there are macros used to setup aliasing that are
outside of functions and don't end in semi-colons i.e:
```
libc_hidden_def (__pthread_self)
weak_alias (__pthread_self, pthread_self)
```
clang-format reformats lines like these to:
```
libc_hidden_def (__pthread_self) weak_alias (__pthread_self, pthread_self)
```
which is generally undesirable.
Other than those two big concerns there are certainly some questionable
diffs but for the most part it creates an easy to read and consistent
style.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tom Coldrick [Tue, 5 Apr 2022 09:46:54 +0000 (10:46 +0100)]
manual: Avoid name collision in libm ULP table [BZ #28956]
The 32-bit and 64-bit variants of RISC-V share the same name - "RISC-V"
- when generating the libm error table for the info pages. This
collision, and the way how the table is generated, mean that the values
in the final table for "RISC-V" may be either for the 32- or 64-bit
variant, with no indication as to which.
As an additional side-effect, this makes the build non-reproducible, as
the error table generated is dependent upon the host filesystem
implementation.
To solve this issue, the libm-test-ulps-name files for both variants
have been modified to include their word size, so as to remove the
collision and provide more accurate information in the table.
An alternative proposed was to merge the two variants' ULP values into a
single file, but this would mean that information about error values is
lost, as the two variants are not identical. Some differences are
considerable, notably the values for the exp() function are large.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Alan Modra [Sun, 23 Jan 2022 23:55:49 +0000 (10:25 +1030)]
powerpc: Relocate stinfo->main
start_addresses in sysdeps/powerpc/powerpc64/start.S is historical
baggage that should disappear. Until someone does that, relocating
stinfo->main by hand is one solution to the fact that the field may be
unrelocated at the time it is accessed. This is similar to what is
done for dynamic tags via the D_PTR macro. stinfo->init and
stinfo->fini are zero in both powerpc64/start.S and powerpc32/start.S,
so make it a little more obvious they are unused by passing NULLs to
LIBC_START_MAIN. The makefile change is needed to pick up
elf/dl-static-tls.h from dl-machine.h.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Alan Modra [Sun, 23 Jan 2022 21:28:18 +0000 (07:58 +1030)]
powerpc64: Set up thread register for _dl_relocate_static_pie
libgcc ifunc resolvers that access hwcap via a field in the tcb can't
be called until the thread pointer is set up. Other ifunc resolvers
might need access to at_platform. This patch sets up a fake thread
pointer early to a copy of tcbhead_t. hwcapinfo.c already had local
variables for hwcap and at_platform, replace them with an entire
tcbhead_t. It's not that large and this way we easily ensure hwcap
and at_platform are at the same relative offsets as they are in the
real thread block.
The patch also conditionally disables part of tst-tlsifunc-static,
"bar address read from IFUNC resolver is incorrect". We can't get a
proper address for a thread variable before glibc initialises tls.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Alan Modra [Sat, 22 Jan 2022 01:18:50 +0000 (11:48 +1030)]
powerpc64: Use medium model toc accesses throughout
The PowerPC64 linker edits medium model toc-indirect code to toc-pointer
relative:
addis r9,r2,tc_entry_for_var@toc@ha
ld r9,tc_entry_for_var@toc@l(r9)
becomes
addis r9,r2,(var-.TOC.)@ha
addi r9,r9,(var-.TOC.)@l
when "var" is known to be local to the binary. This isn't done for
small-model toc-indirect code, because "var" is almost guaranteed to
be too far away from .TOC. for a 16-bit signed offset. And, because
the analysis of which .toc entry can be removed becomes much more
complicated in objects that mix code models, they aren't removed if
any small-model toc sequence appears in an object file.
Unfortunately, glibc's build of ld.so smashes the needed objects
together in a ld -r linking stage. This means the GOT/TOC is left
with a whole lot of relative relocations which is untidy, but in
itself is not a serious problem. However, static-pie on powerpc64
bombs due to a segfault caused by one of the small-model accesses
before _dl_relocate_static_pie. (The very first one in rcrt1.o
passing start_addresses in r8 to __libc_start_main.)
So this patch makes all the toc/got accesses in assembly medium code
model, and a couple of functions hidden. By itself this is not
enough to give us working static-pie, but it is useful in isolation to
enable better linker optimisation.
There's a serious problem in libgcc too. libgcc ifuncs access the
AT_HWCAP words stored in the tcb with an offset from the thread
pointer (r13), but r13 isn't set at the time _dl_relocate_static_pie runs.
A followup patch will fix that.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Alan Modra [Sat, 22 Jan 2022 03:02:39 +0000 (13:32 +1030)]
linux: Constify rfv variable in dl_vdso_vsym
Compilers may decide to put the rfv variable in .data rather than on
the stack. It's slightly better to put it in .data.rel.ro.local
instead. Regardless of that, making it const may enable further
optimisations. Found when examining relative relocations (GOT ones
in particular) as part of enabling static-pie for PowerPC64.
Ricardo Bittencourt [Sun, 20 Mar 2022 01:04:42 +0000 (22:04 -0300)]
string: Replace outdated comments in strlen().
Copyright The GNU Toolchain Authors.
The comments on strlen() don't match what the actual code does. They
describe an older algorithm which is no longer in use. This change
replaces the old comments with new ones describing the algorithm used.
I am a first time contributor, and I believe there is no need for
copyright assignment, since the file changed is not in the shared
source files list.
This patch only changes comments, but for safety I have run the tests in
my x64 ubuntu machine, with the following results:
Summary of test results:
5051 PASS
80 UNSUPPORTED
16 XFAIL
6 XPASS
Signed-off-by: Ricardo Bittencourt <bluepenguin@gmail.com>
Stefan Liebler [Thu, 7 Apr 2022 11:59:48 +0000 (13:59 +0200)]
S390: Fix elf/tst-audit25[ab]
If glibc is configured with --disable-default-pie and built on
s390 with -O3, the tests elf/tst-audit25a and elf/tst-audit25b are
failing as there are additional la_symbind lines for free and malloc.
It turns out that those belong to the executable. In fact those are
the PLT-stubs. Furthermore la_symbind is also called for calloc and
realloc symbols, but those belong to libc.
Those functions are not called at all, but dlsym'ed in
elf/dl-minimal.c:
__rtld_malloc_init_real (struct link_map *main_map)
{
...
void *new_calloc = lookup_malloc_symbol (main_map, "calloc", &version);
void *new_free = lookup_malloc_symbol (main_map, "free", &version);
void *new_malloc = lookup_malloc_symbol (main_map, "malloc", &version);
void *new_realloc = lookup_malloc_symbol (main_map, "realloc", &version);
...
}
Therefore, this commit just ignores symbols with the LA_SYMB_DLSYM flag.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
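The filtering described above can be sketched as an audit-module callback. This is only an illustration under stated assumptions, not the code from the test: should_log_symbind is a hypothetical helper, while la_symbind64 and LA_SYMB_DLSYM are the rtld-audit names declared in <link.h> (a 32-bit target would implement la_symbind32 instead).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: a binding is reported only when it did not
   originate from a dlsym-style lookup.  */
int
should_log_symbind (unsigned int flags)
{
  return (flags & LA_SYMB_DLSYM) == 0;
}

/* Audit callback sketch: print the symbol name unless the binding
   carries LA_SYMB_DLSYM, and leave the symbol value unchanged.  */
uintptr_t
la_symbind64 (Elf64_Sym *sym, unsigned int ndx, uintptr_t *refcook,
              uintptr_t *defcook, unsigned int *flags, const char *symname)
{
  if (should_log_symbind (*flags))
    fprintf (stderr, "la_symbind: %s\n", symname);
  return sym->st_value;
}
```

With such a filter, the lookups of calloc, free, malloc and realloc done by __rtld_malloc_init_real would not add lines to the audit output.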
Adhemerval Zanella [Tue, 5 Apr 2022 14:03:42 +0000 (11:03 -0300)]
sparc64: Remove fcopysign{f} implementation
The builtin from generic code generates a similar compliant sequence.
Checked on sparc64-linux-gnu.
Adhemerval Zanella [Tue, 5 Apr 2022 13:31:58 +0000 (10:31 -0300)]
alpha: Remove fcopysign{f} implementation
The generic code already uses builtins.
Adhemerval Zanella [Thu, 7 Apr 2022 17:39:59 +0000 (14:39 -0300)]
math: Use builtin for ldbl-96 copysign
All architectures that use it (x86, ia64, m68k) implement the
builtin.
Checked on x86_64-linux-gnu and ia64-linux-gnu.
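As a rough illustration of what "using the builtin" means, the sketch below wraps the GCC/Clang builtin for long double. The wrapper name copysign_via_builtin is hypothetical and is not the name used in the glibc sources.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical wrapper: return X with the sign of Y, letting the
   compiler expand __builtin_copysignl (a GCC/Clang builtin) inline.  */
long double
copysign_via_builtin (long double x, long double y)
{
  return __builtin_copysignl (x, y);
}
```

The sign of a zero is copied as well, which can be observed with the signbit macro.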
Adhemerval Zanella [Tue, 5 Apr 2022 12:46:31 +0000 (09:46 -0300)]
ia64: Remove fcopysign{f} implementation
The builtin used by generic code generates similar code.
Checked on ia64-linux-gnu.
Adhemerval Zanella [Tue, 5 Apr 2022 12:04:36 +0000 (09:04 -0300)]
x86: Remove fcopysign{f} implementation
The builtin used by generic code generates similar code.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Tue, 5 Apr 2022 13:12:03 +0000 (10:12 -0300)]
powerpc: Remove fcopysign{f} implementation
The builtin and the generic implementation from generic files suffice.
Checked on powerpc64-linux-gnu and powerpc-linux-gnu.
Ilyahoo Proshel [Tue, 5 Apr 2022 11:23:16 +0000 (13:23 +0200)]
Add rif_MA locale [BZ #27781]
Resolves: BZ #27781
Siddhesh Poyarekar [Wed, 6 Apr 2022 15:23:24 +0000 (20:53 +0530)]
tests/string: Drop simple/stupid/builtin tests
In most cases the simple/stupid/builtin functions were in there to
benchmark optimized implementations against. Only in some cases the
functions are used to check expected results.
Remove these tests from IMPL() and only keep them wherever they're
used for a specific purpose, e.g. to generate expected results.
This improves timing of `make subdirs=string` by over a minute and a
half (over 15%) on a Whiskey Lake laptop.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Noah Goldstein <libc-alpha@sourceware.org>
Siddhesh Poyarekar [Wed, 6 Apr 2022 07:15:39 +0000 (12:45 +0530)]
test-memcpy: Actually reverse source and destination
Looks like an oversight in memcpy tests resulted in s2 and s1 not being
swapped for the second iteration of the memcpy test. Fix it. Also fix
a formatting nit.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Adhemerval Zanella [Mon, 4 Apr 2022 12:34:14 +0000 (09:34 -0300)]
benchtests: Only build libmvec benchmarks iff $(build-mathvec) is set
Checked on x86_64-linux-gnu.
Adhemerval Zanella [Wed, 23 Mar 2022 20:40:01 +0000 (17:40 -0300)]
linux: Fix __closefrom_fallback iterates until max int (BZ#28993)
The __closefrom_fallback tries to get an available file descriptor
if the initial open ("/proc/self/fd/", ...) fails. It assumes the
failure would occur only if procfs is not mounted (ENOENT); however, if
the proc file is not accessible (due to some other kernel filtering
such as apparmor), it will iterate over a potentially large file set
issuing close calls.
It should only try the close fallback if open returns EMFILE,
ENFILE, or ENOMEM.
Checked on x86_64-linux-gnu.
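The errno check described above can be sketched as a small predicate. This is an assumption-laden illustration, not the glibc implementation: the function name open_failure_allows_close_fallback is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical predicate: true only when a failed open of
   /proc/self/fd reported resource exhaustion, the cases in which
   closing descriptors one by one may free what a retry needs.  */
bool
open_failure_allows_close_fallback (int err)
{
  return err == EMFILE || err == ENFILE || err == ENOMEM;
}
```

Any other error, such as EACCES from kernel filtering, would then skip the loop of close calls.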
Fangrui Song [Tue, 5 Apr 2022 00:19:07 +0000 (17:19 -0700)]
Remove -z combreloc and HAVE_Z_COMBRELOC
-z combreloc has been the default regardless of the architecture since
binutils commit f4d733664aabd7bd78c82895e030ec9779a92809 (2002). The
configure check added in commit fdde83499a05 (2001) has long been
unneeded.
We can therefore treat HAVE_Z_COMBRELOC as always 1 and delete dead code
paths in dl-machine.h files (many were copied from commit a711b01d34ca
and ee0cb67ec238).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Adhemerval Zanella [Fri, 1 Apr 2022 13:18:19 +0000 (10:18 -0300)]
sparc: Remove s_abs implementations
For sparc64 it is the same as the generic implementation, while for
sparc32 the builtin generates the same code.
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
Adhemerval Zanella [Fri, 1 Apr 2022 12:53:39 +0000 (09:53 -0300)]
ia64: Remove fabs implementations
The generic implementation fixes 5 fabs tests on ia64-linux-gnu:
math/test-double-fabs
math/test-float-fabs
math/test-float32-fabs
math/test-float32x-fabs
math/test-float64-fabs
Checked on ia64-linux-gnu.
Adhemerval Zanella [Fri, 1 Apr 2022 12:52:14 +0000 (09:52 -0300)]
x86: Remove fabs{f} implementation
For x86_64 it is the same as the generic implementation, while for i686
the builtin generates the same code.
Adhemerval Zanella [Fri, 1 Apr 2022 12:51:06 +0000 (09:51 -0300)]
alpha: Remove s_abs implementations
The generic implementation already uses builtins.
DJ Delorie [Tue, 29 Mar 2022 03:53:33 +0000 (23:53 -0400)]
Allow for unprivileged nested containers
If the build itself is run in a container, we may not be able to
fully set up a nested container for test-container testing.
Notable is the mounting of /proc, since it's critical that it
be mounted from within the same PID namespace as its users, and
thus cannot be bind mounted from outside the container like other
mounts.
This patch defaults to using the parent's PID namespace instead of
creating a new one, as this is more likely to be allowed.
If the test needs an isolated PID namespace, it should add the "pidns"
command to its init script.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Szabolcs Nagy [Tue, 29 Mar 2022 08:50:23 +0000 (09:50 +0100)]
Increase the test timeout of some string tests
Recent changes in test-strncasecmp and test-strncmp pushed the run time
of the tests above the 4 minute limit specified in test-string.h on an
arm tester machine.
Siddhesh Poyarekar [Thu, 31 Mar 2022 16:30:58 +0000 (22:00 +0530)]
realpath: Bring back GNU extension on ENOENT and EACCES [BZ #28996]
The GNU extension for realpath states that if the path resolution fails
with ENOENT or EACCES and the resolved buffer is non-NULL, it will
contain part of the path that failed resolution.
Commit 949ad78a189194048df8a253bb31d1d11d919044 broke this when it
omitted the copy on failure. Bring it back partially to continue
supporting this GNU extension.
Resolves: BZ #28996
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
Adhemerval Zanella [Thu, 24 Mar 2022 18:17:12 +0000 (15:17 -0300)]
stdlib: Fix tst-getrandom memcmp call
The idea is to check whether all sizeof (buf) bytes are equal, not only
the first byte.
Checked on x86_64-linux-gnu and i686-linux-gnu.
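The difference between comparing one byte and comparing the whole buffer can be shown with a small helper. This is a sketch, not the test's code; buffers_equal is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when the first LEN bytes of A and B are
   identical.  Passing sizeof of the buffer as LEN compares all of it;
   passing 1 would only look at the first byte.  */
bool
buffers_equal (const void *a, const void *b, size_t len)
{
  return memcmp (a, b, len) == 0;
}
```

Two buffers that share only their first byte compare equal with a length of 1 but differ once the full size is passed.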
Adhemerval Zanella [Thu, 24 Mar 2022 18:22:55 +0000 (15:22 -0300)]
stdlib: Fix tst-rand48.c printf types
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Fri, 25 Mar 2022 11:53:42 +0000 (08:53 -0300)]
elf: Remove unused functions from tst-audit25(a,b)
Adhemerval Zanella [Fri, 25 Mar 2022 12:01:48 +0000 (09:01 -0300)]
nptl: Use libc-diag.h with tst-thread-setspecific
And also use libsupport.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Fri, 25 Mar 2022 14:13:37 +0000 (11:13 -0300)]
crypt: Remove unused variable on cert test
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Fri, 25 Mar 2022 14:16:26 +0000 (11:16 -0300)]
elf: Remove unused variables in tests
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Fri, 25 Mar 2022 14:16:49 +0000 (11:16 -0300)]
elf: Fix wrong fscanf usage on tst-pldd
To take into consideration the extra '\0'.
Checked on x86_64-linux-gnu.
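The sizing rule behind this fix can be sketched as follows. The example uses sscanf rather than fscanf so that it needs no input file, and the names FIELD_WIDTH and read_token are hypothetical; it is not the code from tst-pldd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { FIELD_WIDTH = 7 };

/* Hypothetical helper: read one whitespace-delimited token of at
   most FIELD_WIDTH characters from INPUT into OUT.  OUT needs
   FIELD_WIDTH + 1 bytes because the scanf family stores a
   terminating '\0' that is not counted in the field width.  The
   literal 7 in the format must match FIELD_WIDTH.  Returns the
   number of converted items.  */
int
read_token (const char *input, char out[FIELD_WIDTH + 1])
{
  return sscanf (input, "%7s", out);
}
```

A buffer of exactly FIELD_WIDTH bytes would be overrun by one byte whenever the input token is at least that long.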
Adhemerval Zanella [Fri, 25 Mar 2022 14:25:22 +0000 (11:25 -0300)]
posix: Remove unused variable on tst-_Fork.c
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Fri, 25 Mar 2022 14:26:26 +0000 (11:26 -0300)]
resolv: Initialize loop variable on tst-resolv-trailing
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Fri, 25 Mar 2022 16:57:38 +0000 (13:57 -0300)]
locale: Remove set but unused variable on ld-collate.c
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Mon, 28 Mar 2022 17:55:52 +0000 (14:55 -0300)]
localedata: Fix printf type on tst_mbrtowc
Checked on x86_64-linux-gnu and i686-linux-gnu.
Adhemerval Zanella [Mon, 28 Mar 2022 17:40:55 +0000 (14:40 -0300)]
localedata: Remove unused variables in tests
Checked on x86_64-linux-gnu and i686-linux-gnu.
Noah Goldstein [Fri, 25 Mar 2022 22:13:33 +0000 (17:13 -0500)]
x86: Small improvements for wcslen
Just a few QOL changes.
1. Prefer `add` over `lea` as it has more execution units it can run
on.
2. Don't break macro-fusion between `test` and `jcc`
3. Reduce code size by removing gratuitous padding bytes (-90
bytes).
geometric_mean(N=20) of all benchmarks New / Original: 0.959
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Fri, 25 Mar 2022 22:13:32 +0000 (17:13 -0500)]
x86: Small improvements for wcscpy-ssse3
Just a few small QOL changes.
1. Prefer `add` over `lea` as it has more execution units it can run
on.
2. Don't break macro-fusion between `test` and `jcc`
geometric_mean(N=20) of all benchmarks New / Original: 0.973
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Cristian Rodríguez [Sun, 13 Mar 2022 18:40:24 +0000 (18:40 +0000)]
debug: Improve fdelt_chk error message
It is not a "buffer overflow detected" but an out of range
bit on fd_set
Signed-off-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
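The condition being reported is a descriptor that has no corresponding bit in an fd_set. A minimal sketch of that range check, using a hypothetical helper name (fd_fits_in_fd_set) rather than the fortify routine itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>

/* Hypothetical helper: true when FD can be represented as a bit in
   an fd_set, i.e. it lies in the range [0, FD_SETSIZE).  */
bool
fd_fits_in_fd_set (int fd)
{
  return fd >= 0 && fd < FD_SETSIZE;
}
```

Callers that may hold descriptors at or above FD_SETSIZE can test this before using FD_SET, or use poll instead of select.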
Joseph Myers [Mon, 28 Mar 2022 13:16:48 +0000 (13:16 +0000)]
Add HWCAP2_AFP, HWCAP2_RPRES from Linux 5.17 to AArch64 bits/hwcap.h
Add the new HWCAP2_AFP and HWCAP2_RPRES constants from Linux 5.17.
Tested with build-many-glibcs.py for aarch64-linux-gnu.
Noah Goldstein [Wed, 23 Mar 2022 21:57:46 +0000 (16:57 -0500)]
x86: Remove AVX str{n}casecmp
The rationale is:
1. SSE42 has nearly identical logic so any benefit is minimal (3.4%
regression on Tigerlake using SSE42 versus AVX across the
benchtest suite).
2. AVX2 version covers the majority of targets that previously
preferred it.
3. The targets where AVX would still be best (SnB and IVB) are
becoming outdated.
All in all, the code size saving is worth it.
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Thu, 24 Mar 2022 23:56:13 +0000 (18:56 -0500)]
x86: Add EVEX optimized str{n}casecmp
geometric_mean(N=40) of all benchmarks EVEX / SSE42: .621
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Thu, 24 Mar 2022 23:56:12 +0000 (18:56 -0500)]
x86: Add AVX2 optimized str{n}casecmp
geometric_mean(N=40) of all benchmarks AVX2 / SSE42: .702
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Wed, 23 Mar 2022 21:57:40 +0000 (16:57 -0500)]
string: Expand page cross test cases in test-strncmp.c
Test cases for when both `s1` and `s2` are near the end of a page
were previously missing.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Wed, 23 Mar 2022 21:57:39 +0000 (16:57 -0500)]
string: Expand page cross test cases in test-strcmp.c
Test cases for when both `s1` and `s2` are near the end of a page
were previously missing.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
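One general way to build such a case is to place each string so that it ends on the last byte of a page followed by an inaccessible page. The sketch below illustrates that technique only; it is not the buffer setup used by the glibc string tests, and place_at_page_end is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map two pages, make the second one
   inaccessible, and return a pointer to a copy of STR whose
   terminating '\0' is the last byte of the first page.  Returns NULL
   on failure.  The mapping is intentionally not released in this
   sketch.  */
char *
place_at_page_end (const char *str)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  size_t len = strlen (str) + 1;
  if (len > page)
    return NULL;
  char *base = mmap (NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return NULL;
  if (mprotect (base + page, page, PROT_NONE) != 0)
    {
      munmap (base, 2 * page);
      return NULL;
    }
  char *dst = base + page - len;
  memcpy (dst, str, len);
  return dst;
}
```

Calling strcmp on two strings placed this way faults if the implementation reads past the terminator into the next page, which is the kind of bug these cases are meant to catch.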
Noah Goldstein [Wed, 23 Mar 2022 21:57:38 +0000 (16:57 -0500)]
x86: Optimize str{n}casecmp TOLOWER logic in strcmp-sse42.S
Slightly faster method of doing TOLOWER that saves an
instruction.
Also replace the hard coded 5-byte nop with .p2align 4. On builds with
CET enabled this misaligned entry to strcasecmp.
geometric_mean(N=40) of all benchmarks New / Original: .920
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Noah Goldstein [Wed, 23 Mar 2022 21:57:36 +0000 (16:57 -0500)]
x86: Optimize str{n}casecmp TOLOWER logic in strcmp.S
Slightly faster method of doing TOLOWER that saves an
instruction.
Also replace the hard coded 5-byte nop with .p2align 4. On builds with
CET enabled this misaligned entry to strcasecmp.
geometric_mean(N=40) of all benchmarks New / Original: .894
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>