Adhemerval Zanella [Mon, 30 Aug 2021 17:01:00 +0000 (14:01 -0300)]
malloc: Enable huge page support on main arena
This patch adds huge page support for main arena allocation, enabled
with the tunable glibc.malloc.hugetlb=2. The patch essentially
disables the __glibc_morecore() sbrk() call (similar to what memory
tagging does when sbrk() does not support it) and falls back to the
default page size if the memory allocation fails.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Mon, 30 Aug 2021 13:56:55 +0000 (10:56 -0300)]
malloc: Move MORECORE fallback mmap to sysmalloc_mmap_fallback
So it can be used by the huge page code as well.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Fri, 20 Aug 2021 16:22:35 +0000 (13:22 -0300)]
malloc: Add Huge Page support to arenas
It is enabled by default when glibc.malloc.hugetlb is set to 2 or higher.
It also uses non-configurable minimum and maximum values, currently
set to 1 and 4 times the selected huge page size, respectively.
The arena allocation with huge pages does not use MAP_NORESERVE. As
indicated by the kernel internal documentation [1], the flag might
trigger a SIGBUS on soft page faults if, at memory access time, there
are no pages left in the pool.
On systems without a reserved huge page pool, the tests just stress
the mmap(MAP_HUGETLB) allocation failure path. To improve test
coverage, a pool with some allocated pages must be created.
Checked on x86_64-linux-gnu with no reserved pages, 10 reserved pages
(which trigger mmap(MAP_HUGETLB) failures), and 256 reserved pages
(which do not trigger mmap(MAP_HUGETLB) failures).
[1] https://www.kernel.org/doc/html/v4.18/vm/hugetlbfs_reserv.html#resv-map-modifications
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Mon, 16 Aug 2021 18:08:27 +0000 (15:08 -0300)]
malloc: Add Huge Page support for mmap
With the morecore hook removed, there is no easy way to provide huge
page support with the glibc allocator without resorting to transparent
huge pages. Some users and programs prefer to use huge pages
directly instead of THP for multiple reasons: no splitting or re-merging
by the VM, no TLB shootdowns for running processes, fast allocation
from the reserve pool, no competition with the rest of the processes
(unlike THP), no swapping at all, etc.
This patch extends the 'glibc.malloc.hugetlb' tunable: the value
'2' means to use huge pages directly with the system default size,
while other positive values specify a particular page size, which is
matched against the sizes supported by the system.
Currently, only memory allocated in sysmalloc() is handled; the arenas
still use the default system page size.
To test it, a new rule tests-malloc-hugetlb2 is added, which runs the
added tests with the required GLIBC_TUNABLES setting. On systems without
a reserved huge page pool, the tests just stress the mmap(MAP_HUGETLB)
allocation failure path. To improve test coverage, a pool with some
allocated pages must be created.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Mon, 16 Aug 2021 14:14:20 +0000 (11:14 -0300)]
malloc: Move mmap logic to its own function
So it can be used with different page sizes and flags.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Fri, 13 Aug 2021 13:06:04 +0000 (10:06 -0300)]
malloc: Add THP/madvise support for sbrk
To increase the effectiveness of Transparent Huge Pages in madvise
mode, the large page size is used instead of the default page size for
the sbrk() increment of the main arena.
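The increment change amounts to rounding up to a larger alignment. A sketch with a hypothetical helper, assuming the THP size is a power of two:

```c
#include <stddef.h>

/* Round an sbrk() increment up to a multiple of ALIGN, a power of two.
   Passing the THP size (for example 2 MiB) instead of the base page size
   makes the main arena grow in huge-page-sized steps.  */
static size_t
round_sbrk_increment (size_t incr, size_t align)
{
  return (incr + align - 1) & ~(align - 1);
}
```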
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Adhemerval Zanella [Fri, 13 Aug 2021 11:36:29 +0000 (08:36 -0300)]
malloc: Add madvise support for Transparent Huge Pages
Linux Transparent Huge Pages (THP) currently supports three different
states: 'never', 'madvise', and 'always'. 'never' is
self-explanatory and 'always' enables THP for all anonymous
pages. However, 'madvise' is still the default on some systems, and
in that case THP is only used if the memory range is explicitly
advertised by the program through a madvise(MADV_HUGEPAGE) call.
To enable it, a new tunable is provided, 'glibc.malloc.hugetlb',
where setting it to a value different from 0 enables the madvise call.
This patch issues the madvise(MADV_HUGEPAGE) call after a successful
mmap() call in sysmalloc() with sizes larger than the default huge
page size. The madvise() call is disabled if the system does not support
THP or if the mode is set to "never". On Linux, only one page size is
supported for THP, even if the architecture supports multiple
sizes.
To test it, a new rule tests-malloc-hugetlb1 is added, which runs the
added tests with the required GLIBC_TUNABLES setting.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Florian Weimer [Wed, 15 Dec 2021 15:06:25 +0000 (16:06 +0100)]
powerpc: Use global register variable in <thread_pointer.h>
A local register variable is merely a compiler hint, and so not
appropriate in this context. Move the global register variable into
<thread_pointer.h> and include it from <tls.h>, as there can only
be one global definition for one particular register.
Fixes commit 8dbeb0561eeb876f557ac9eef5721912ec074ea5
("nptl: Add <thread_pointer.h> for defining __thread_pointer").
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Adhemerval Zanella [Thu, 20 May 2021 17:20:18 +0000 (14:20 -0300)]
Use LFS and 64 bit time for installed programs (BZ #15333)
The installed programs are built with a combination of different
values for MODULE_NAME, as below. To enable both Large File Support
and 64-bit time, -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64 are added for
nonlib, nscd, lddlibc4, libresolv, ldconfig, locale_programs,
iconvprogs, libnss_files, libnss_compat, libnss_db, libnss_hesiod,
libutil, libpcprofile, and libSegFault.
nscd/nscd
  nscd/nscd.o MODULE_NAME=nscd
  nscd/connections.o MODULE_NAME=nscd
  nscd/pwdcache.o MODULE_NAME=nscd
  nscd/getpwnam_r.o MODULE_NAME=nscd
  nscd/getpwuid_r.o MODULE_NAME=nscd
  nscd/grpcache.o MODULE_NAME=nscd
  nscd/getgrnam_r.o MODULE_NAME=nscd
  nscd/getgrgid_r.o MODULE_NAME=nscd
  nscd/hstcache.o MODULE_NAME=nscd
  nscd/gethstbyad_r.o MODULE_NAME=nscd
  nscd/gethstbynm3_r.o MODULE_NAME=nscd
  nscd/getsrvbynm_r.o MODULE_NAME=nscd
  nscd/getsrvbypt_r.o MODULE_NAME=nscd
  nscd/servicescache.o MODULE_NAME=nscd
  nscd/dbg_log.o MODULE_NAME=nscd
  nscd/nscd_conf.o MODULE_NAME=nscd
  nscd/nscd_stat.o MODULE_NAME=nscd
  nscd/cache.o MODULE_NAME=nscd
  nscd/mem.o MODULE_NAME=nscd
  nscd/nscd_setup_thread.o MODULE_NAME=nscd
  nscd/xmalloc.o MODULE_NAME=nscd
  nscd/xstrdup.o MODULE_NAME=nscd
  nscd/aicache.o MODULE_NAME=nscd
  nscd/initgrcache.o MODULE_NAME=nscd
  nscd/gai.o MODULE_NAME=nscd
  nscd/res_hconf.o MODULE_NAME=nscd
  nscd/netgroupcache.o MODULE_NAME=nscd
  nscd/cachedumper.o MODULE_NAME=nscd
elf/lddlibc4
  elf/lddlibc4 MODULE_NAME=lddlibc4
elf/pldd
  elf/pldd.o MODULE_NAME=nonlib
  elf/xmalloc.o MODULE_NAME=nonlib
elf/sln
  elf/sln.o MODULE_NAME=nonlib
  elf/static-stubs.o MODULE_NAME=nonlib
elf/sprof MODULE_NAME=nonlib
elf/ldconfig
  elf/ldconfig.o MODULE_NAME=ldconfig
  elf/cache.o MODULE_NAME=nonlib
  elf/readlib.o MODULE_NAME=nonlib
  elf/xmalloc.o MODULE_NAME=nonlib
  elf/xstrdup.o MODULE_NAME=nonlib
  elf/chroot_canon.o MODULE_NAME=nonlib
  elf/static-stubs.o MODULE_NAME=nonlib
  elf/stringtable.o MODULE_NAME=nonlib
io/pwd
  io/pwd.o MODULE_NAME=nonlib
locale/locale
  locale/locale.o MODULE_NAME=locale_programs
  locale/locale-spec.o MODULE_NAME=locale_programs
  locale/charmap-dir.o MODULE_NAME=locale_programs
  locale/simple-hash.o MODULE_NAME=locale_programs
  locale/xmalloc.o MODULE_NAME=locale_programs
  locale/xstrdup.o MODULE_NAME=locale_programs
  locale/record-status.o MODULE_NAME=locale_programs
  locale/xasprintf.o MODULE_NAME=locale_programs
locale/localedef
  locale/localedef.o MODULE_NAME=locale_programs
  locale/ld-ctype.o MODULE_NAME=locale_programs
  locale/ld-messages.o MODULE_NAME=locale_programs
  locale/ld-monetary.o MODULE_NAME=locale_programs
  locale/ld-numeric.o MODULE_NAME=locale_programs
  locale/ld-time.o MODULE_NAME=locale_programs
  locale/ld-paper.o MODULE_NAME=locale_programs
  locale/ld-name.o MODULE_NAME=locale_programs
  locale/ld-address.o MODULE_NAME=locale_programs
  locale/ld-telephone.o MODULE_NAME=locale_programs
  locale/ld-measurement.o MODULE_NAME=locale_programs
  locale/ld-identification.o MODULE_NAME=locale_programs
  locale/ld-collate.o MODULE_NAME=locale_programs
  locale/charmap.o MODULE_NAME=locale_programs
  locale/linereader.o MODULE_NAME=locale_programs
  locale/locfile.o MODULE_NAME=locale_programs
  locale/repertoire.o MODULE_NAME=locale_programs
  locale/locarchive.o MODULE_NAME=locale_programs
  locale/md5.o MODULE_NAME=locale_programs
  locale/charmap-dir.o MODULE_NAME=locale_programs
  locale/simple-hash.o MODULE_NAME=locale_programs
  locale/xmalloc.o MODULE_NAME=locale_programs
  locale/xstrdup.o MODULE_NAME=locale_programs
  locale/record-status.o MODULE_NAME=locale_programs
  locale/xasprintf.o MODULE_NAME=locale_programs
catgets/gencat
  catgets/gencat.o MODULE_NAME=nonlib
  catgets/xmalloc.o MODULE_NAME=nonlib
nss/makedb
  nss/makedb.o MODULE_NAME=nonlib
  nss/xmalloc.o MODULE_NAME=nonlib
  nss/hash-string.o MODULE_NAME=nonlib
nss/getent
  nss/getent.o MODULE_NAME=nonlib
posix/getconf
  posix/getconf.o MODULE_NAME=nonlib
login/utmpdump
  login/utmpdump.o MODULE_NAME=nonlib
debug/pcprofiledump
  debug/pcprofiledump.o MODULE_NAME=nonlib
timezone/zic
  timezone/zic.o MODULE_NAME=nonlib
timezone/zdump
  timezone/zdump.o MODULE_NAME=nonlib
iconv/iconv_prog
  iconv/iconv_prog.o MODULE_NAME=nonlib
  iconv/iconv_charmap.o MODULE_NAME=iconvprogs
  iconv/charmap.o MODULE_NAME=iconvprogs
  iconv/charmap-dir.o MODULE_NAME=iconvprogs
  iconv/linereader.o MODULE_NAME=iconvprogs
  iconv/dummy-repertoire.o MODULE_NAME=iconvprogs
  iconv/simple-hash.o MODULE_NAME=iconvprogs
  iconv/xstrdup.o MODULE_NAME=iconvprogs
  iconv/xmalloc.o MODULE_NAME=iconvprogs
  iconv/record-status.o MODULE_NAME=iconvprogs
iconv/iconvconfig
  iconv/iconvconfig.o MODULE_NAME=nonlib
  iconv/strtab.o MODULE_NAME=iconvprogs
  iconv/xmalloc.o MODULE_NAME=iconvprogs
  iconv/hash-string.o MODULE_NAME=iconvprogs
nss/libnss_files.so MODULE_NAME=libnss_files
nss/libnss_compat.so.2 MODULE_NAME=libnss_compat
nss/libnss_db.so MODULE_NAME=libnss_db
hesiod/libnss_hesiod.so MODULE_NAME=libnss_hesiod
login/libutil.so MODULE_NAME=libutil
debug/libpcprofile.so MODULE_NAME=libpcprofile
debug/libSegFault.so MODULE_NAME=libSegFault
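For a single translation unit, the same configuration as the -D flags above can be sketched as below. The macros must precede every system header; _TIME_BITS=64 is honored by glibc 2.34 and later and is only accepted together with _FILE_OFFSET_BITS=64. `time_t_bits()` is a hypothetical helper.

```c
/* Both macros must appear before any system header is included.  */
#define _FILE_OFFSET_BITS 64
#define _TIME_BITS 64

#include <sys/types.h>
#include <time.h>

/* With LFS, off_t is 64-bit even on 32-bit targets such as i686.  */
_Static_assert (sizeof (off_t) == 8, "off_t must be 64-bit");

/* Width of time_t in bits: 64 once 64-bit time is in effect.  */
static int
time_t_bits (void)
{
  return (int) (sizeof (time_t) * 8);
}
```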
Also, to avoid adding both LFS and 64-bit time support to internal
tests, they are moved to a new 'testsuite-internal' module. It
should be similar to 'nonlib' regarding internal definitions and
linking namespace.
This patch also enables LFS and 64-bit time support for the libsupport
container programs (echo-container, test-container, shell-container,
and true-container).
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
H.J. Lu [Fri, 10 Dec 2021 23:44:46 +0000 (15:44 -0800)]
Support target specific ALIGN for variable alignment test [BZ #28676]
Add <tst-file-align.h> to support target specific ALIGN for variable
alignment test:
1. Alpha: Use 0x10000.
2. MicroBlaze and Nios II: Use 0x8000.
3. All others: Use 0x200000.
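A sketch of how such a header could select the value with the compilers' predefined target macros (`__alpha__`, `__microblaze__`, `__nios2__`). The actual contents of <tst-file-align.h> may differ, and `aligned_var` and `is_aligned()` are hypothetical.

```c
#include <stdint.h>

/* Per-target alignment following the list above.  */
#if defined __alpha__
# define ALIGN 0x10000
#elif defined __microblaze__ || defined __nios2__
# define ALIGN 0x8000
#else
# define ALIGN 0x200000
#endif

/* A variable whose load address a test could check.  */
static int aligned_var __attribute__ ((aligned (ALIGN))) = 1;

static int
is_aligned (const void *p)
{
  return ((uintptr_t) p & (ALIGN - 1)) == 0;
}
```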
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
H.J. Lu [Tue, 14 Dec 2021 15:19:36 +0000 (07:19 -0800)]
NEWS: Document LD_PREFER_MAP_32BIT_EXEC as x86-64 only
H.J. Lu [Mon, 13 Dec 2021 15:17:29 +0000 (07:17 -0800)]
elf: Align argument of __munmap to page size [BZ #28676]
On Linux/x86-64, for elf/tst-align3, we now get
munmap(0x7f88f9401000, 1126424) = 0
instead of
munmap(0x7f1615200018, 544768) = -1 EINVAL (Invalid argument)
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Florian Weimer [Tue, 14 Dec 2021 11:37:44 +0000 (12:37 +0100)]
elf: Use new dependency sorting algorithm by default
The default has to change eventually, and there are no known failures
that require a delay.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Khem Raj [Thu, 2 Dec 2021 07:13:13 +0000 (23:13 -0800)]
intl: Emit no lines in bison generated files
Improve reproducibility:
Do not put any #line preprocessor commands in bison generated files.
These lines contain absolute paths to file locations on
the host build machine.
Signed-off-by: Juro Bystricky <juro.bystricky@intel.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Samuel Thibault [Tue, 14 Dec 2021 07:38:05 +0000 (08:38 +0100)]
hurd: Do not set PIE_UNSUPPORTED
This is now supported.
H.J. Lu [Tue, 14 Dec 2021 00:33:57 +0000 (16:33 -0800)]
NEWS: Move LD_PREFER_MAP_32BIT_EXEC
Move LD_PREFER_MAP_32BIT_EXEC to
Deprecated and removed features, and other changes affecting compatibility:
Samuel Thibault [Tue, 14 Dec 2021 00:01:48 +0000 (01:01 +0100)]
mach: Fix spurious inclusion of stack_chk_fail_local in libmachuser.a
When linking programs statically, stack_chk_fail_local already comes
from libc_nonshared, so we don't need it in lib{mach,hurd}user.a.
H.J. Lu [Wed, 8 Dec 2021 15:02:27 +0000 (07:02 -0800)]
Disable DT_RUNPATH on NSS tests [BZ #28455]
The glibc internal NSS functions should always load NSS modules from
the system. For testing purposes, disable DT_RUNPATH on NSS tests so
that the glibc internal NSS functions can load testing NSS modules
via DT_RPATH.
This partially fixes BZ #28455.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Akila Welihinda [Sun, 12 Dec 2021 18:35:03 +0000 (10:35 -0800)]
sysdeps: Simplify sin Taylor Series calculation
The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes
of regaining some precision as a function of da. However, the
comment says we add the term `-0.5*da*a^2 + 0.5*da` which is
different. This fix updates the comment to reflect the
code and also simplifies the calculation by replacing `a` with `x`
because they always have the same value.
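A first-order expansion in da, combined with the second-order Taylor term of cos, suggests why the code's term rather than the old comment's is the intended correction (a sanity check, not a derivation taken from the source):

```latex
\sin(a + d_a) = \sin a \cos d_a + \cos a \sin d_a
  \approx \sin a + d_a \cos a
  \approx \sin a + d_a \left(1 - \frac{a^2}{2}\right)
  = \sin a - 0.5\, d_a a^2 + d_a
```

The correction is therefore `-0.5*da*a^2 + da`, not `-0.5*da*a^2 + 0.5*da`.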
Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu>
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Adhemerval Zanella [Tue, 6 Apr 2021 17:33:14 +0000 (14:33 -0300)]
math: Remove the error handling wrapper from hypot and hypotf
The error handling is moved to the sysdeps/ieee754 version, with no SVID
support. The compatibility symbol versions still use the wrapper with
SVID error handling around the new code. There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
Only ia64 is unchanged, since it still uses the arch-specific
__libm_error_region in its implementation.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Wilco Dijkstra [Wed, 1 Dec 2021 14:08:14 +0000 (11:08 -0300)]
math: Use fmin/fmax on hypot
This optimizes for architectures that provide fast builtins.
Checked on aarch64-linux-gnu.
Adhemerval Zanella [Wed, 1 Dec 2021 13:57:32 +0000 (10:57 -0300)]
aarch64: Add math-use-builtins-f{max,min}.h
This allows removing the arch-specific implementations.
Adhemerval Zanella [Wed, 1 Dec 2021 13:44:58 +0000 (10:44 -0300)]
math: Add math-use-builtins-fmin.h
This allows an architecture to use the builtin instead of the generic
implementation.
Adhemerval Zanella [Wed, 1 Dec 2021 13:37:44 +0000 (10:37 -0300)]
math: Add math-use-builtins-fmax.h
This allows an architecture to use the builtin instead of the generic
implementation.
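The per-architecture header pattern can be sketched as a compile-time switch between the compiler builtin and a generic fallback. `USE_FMAX_BUILTIN` and `math_fmax()` are hypothetical names here, not necessarily the ones glibc uses.

```c
#include <math.h>

/* A per-architecture header would define this to 1 when the target has a
   fast fmax instruction; 0 selects the generic code.  */
#ifndef USE_FMAX_BUILTIN
# define USE_FMAX_BUILTIN 0
#endif

static double
math_fmax (double x, double y)
{
#if USE_FMAX_BUILTIN
  return __builtin_fmax (x, y);
#else
  /* Generic fallback: a NaN argument is ignored in favor of the other.  */
  if (isnan (x))
    return y;
  if (isnan (y))
    return x;
  return x > y ? x : y;
#endif
}
```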
Adhemerval Zanella [Sun, 4 Apr 2021 02:52:45 +0000 (23:52 -0300)]
math: Remove powerpc e_hypot
The generic implementation shows only slightly worse performance (and is faster on POWER10):
POWER10      reciprocal-throughput    latency
master       8.28478                  13.7253
new hypot    7.21945                  13.1933
POWER9       reciprocal-throughput    latency
master       13.4024                  14.0967
new hypot    14.8479                  15.8061
POWER8       reciprocal-throughput    latency
master       15.5767                  16.8885
new hypot    16.5371                  18.4057
One possible improvement would be to make gcc generate xsmaxdp/xsmindp
for fmax/fmin (it only does so with -ffast-math, while clang does with
the default options).
Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
Adhemerval Zanella [Tue, 6 Apr 2021 15:32:06 +0000 (12:32 -0300)]
i386: Move hypot implementation to C
The generic hypotf is slightly slower, mostly due to the tricks the
assembly uses to optimize the isinf/isnan/issignaling checks. The generic
hypot is much slower, since the optimized implementation uses the i386
default excess precision to issue the operation directly. A similar
implementation is provided instead of using the generic one.
Checked on i686-linux-gnu.
Adhemerval Zanella [Tue, 6 Apr 2021 02:55:55 +0000 (23:55 -0300)]
math: Use an improved algorithm for hypotl (ldbl-128)
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1], using MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' case to avoid spurious underflow
  due to the multiplication, and fix the return value for the upward
  rounding mode.
- Handle the required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With 1e9
random input pairs in the range [LDBL_MIN, LDBL_MAX], the current glibc
implementation shows around 0.05% of results with an error of
1 ulp (453266 results), while the new implementation shows only
0.0001% of the total (1280).
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
Adhemerval Zanella [Mon, 5 Apr 2021 20:28:48 +0000 (17:28 -0300)]
math: Use an improved algorithm for hypotl (ldbl-96)
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1], using MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' case to avoid spurious underflow
  due to the multiplication, and fix the return value for the upward
  rounding mode.
- Handle the required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With 1e8
random input pairs in the range [LDBL_MIN, LDBL_MAX], the current glibc
implementation shows around 0.02% of results with an error of
1 ulp (23158 results), while the new implementation shows only
0.0001% of the total (111).
[1] https://arxiv.org/pdf/1904.09481.pdf
Wilco Dijkstra [Tue, 30 Nov 2021 19:29:25 +0000 (16:29 -0300)]
math: Improve hypot performance with FMA
Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency. The non-fma version has 30% higher throughput and 10%
higher latency.
Max ULP error is 0.949 with fma and 0.792 without fma.
Passes GLIBC testsuite.
Wilco Dijkstra [Mon, 8 Mar 2021 20:07:39 +0000 (17:07 -0300)]
math: Use an improved algorithm for hypot (dbl-64)
This implementation is based on 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1], using MyHypot3 with the
following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' case to avoid spurious underflow
  due to the multiplication, and fix the return value for the upward
  rounding mode.
- Handle the required underflow exception for denormal results.
The main advantage of the new algorithm is its precision: with 1e9
random input pairs in the range [DBL_MIN, DBL_MAX], the current glibc
implementation shows around 0.34% of results with an error of
1 ulp (3424869 results), while the new implementation shows only
0.002% of the total (18851).
The performance results are also only slightly worse than the current
implementation. On x86_64 (Ryzen 5900X) with gcc 12:
Before:
  "hypot": {
    "workload-random": {
      "duration": 3.73319e+09,
      "iterations": 1.12e+08,
      "reciprocal-throughput": 22.8737,
      "latency": 43.7904,
      "max-throughput": 4.37184e+07,
      "min-throughput": 2.28361e+07
    }
  }
After:
  "hypot": {
    "workload-random": {
      "duration": 3.7597e+09,
      "iterations": 9.8e+07,
      "reciprocal-throughput": 23.7547,
      "latency": 52.9739,
      "max-throughput": 4.2097e+07,
      "min-throughput": 1.88772e+07
    }
  }
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
Adhemerval Zanella [Mon, 5 Apr 2021 17:49:47 +0000 (14:49 -0300)]
math: Simplify hypotf implementation
Use a more optimized comparison to check for NaN and infinity, and
add an inlined issignaling implementation for float. With gcc, this
results in 2 FP comparisons.
The file copyright is also changed to use GPL: the implementation was
completely changed by 7c10fd3515f to use double precision instead of
scaling, and this change removes all the GET_FLOAT_WORD usage.
Checked on x86_64-linux-gnu.
Siddhesh Poyarekar [Mon, 13 Dec 2021 04:31:45 +0000 (10:01 +0530)]
Cleanup encoding in comments
Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8
equivalents so that files don't end up with mixed encodings. With this,
all files (except tests that actually test different encodings) have a
single encoding.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Siddhesh Poyarekar [Wed, 8 Dec 2021 05:51:26 +0000 (11:21 +0530)]
Replace --enable-static-pie with --disable-default-pie
Build glibc programs and tests as PIE by default and enable static-pie
automatically if the architecture and toolchain support it.
Also add a new configuration option --disable-default-pie to prevent
building programs as PIE.
Only the following architectures now have PIE disabled by default
because they do not work at the moment. hppa, ia64, alpha and csky
don't work because the linker is unable to handle a pcrel relocation
generated from PIE objects. The microblaze compiler is currently
failing with an ICE. GNU hurd tries to enable static-pie, which does
not work and hence fails. All these targets have default PIE disabled
at the moment and I have left it to the target maintainers to enable PIE
on their targets.
build-many-glibcs runs clean for all targets. I also tested x86_64 on
Fedora and Ubuntu, to verify that the default build as well as
--disable-default-pie work as expected with both system toolchains.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Samuel Thibault [Sat, 11 Dec 2021 22:08:32 +0000 (23:08 +0100)]
hurd: Add rules for static PIE build
This fixes [BZ #28671].
Samuel Thibault [Sat, 11 Dec 2021 23:41:38 +0000 (00:41 +0100)]
hurd: Fix gmon-static
We need to use crt0 for gmon-static too.
H.J. Lu [Fri, 10 Dec 2021 21:00:09 +0000 (13:00 -0800)]
x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]
Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since
the first PT_LOAD segment is no longer executable due to defaulting to
-z separate-code.
This fixes [BZ #28656].
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Florian Weimer [Fri, 10 Dec 2021 20:34:30 +0000 (21:34 +0100)]
elf: Use errcode instead of (unset) errno in rtld_chain_load
H.J. Lu [Thu, 9 Dec 2021 15:01:33 +0000 (07:01 -0800)]
Add a testcase to check alignment of PT_LOAD segment [BZ #28676]
Rongwei Wang [Fri, 10 Dec 2021 12:39:10 +0000 (20:39 +0800)]
elf: Properly align PT_LOAD segments [BZ #28676]
When PT_LOAD segment alignment > the page size, allocate enough space to
ensure that the segment can be properly aligned. This change makes it
simpler for code segments to use huge pages.
This fixes [BZ #28676].
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Florian Weimer [Fri, 10 Dec 2021 15:06:36 +0000 (16:06 +0100)]
elf: Install a symbolic link to ld.so as /usr/bin/ld.so
This makes ld.so features such as --preload, --audit,
and --list-diagnostics more accessible to end users because they
do not need to know the ABI name of the dynamic loader.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Florian Weimer [Fri, 10 Dec 2021 04:14:24 +0000 (05:14 +0100)]
nptl: Add one more barrier to nptl/tst-create1
Without the bar_ctor_finish barrier, it was possible that thread2
re-locked user_lock before ctor had a chance to lock it. ctor then
blocked in its locking operation, xdlopen from the main thread
did not return, and thread2 was stuck waiting in bar_dtor:
thread 1: started.
thread 2: started.
thread 2: locked user_lock.
constructor started: 0.
thread 1: in ctor: started.
thread 3: started.
thread 3: done.
thread 2: unlocked user_lock.
thread 2: locked user_lock.
Fixes the test in commit 83b5323261bb72313bffcf37476c1b8f0847c736
("elf: Avoid deadlock between pthread_create and ctors [BZ #28357]").
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Florian Weimer [Thu, 9 Dec 2021 16:57:11 +0000 (17:57 +0100)]
Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGN
TLS_INIT_TCB_ALIGN is not actually used. TLS_TCB_ALIGN was likely
introduced to support a configuration where the thread pointer
does not have the same alignment as THREAD_SELF. Only ia64 seems to use
that, but for the stack/pointer guard, not for storing tcbhead_t.
Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift
the thread pointer, potentially landing in a different residue class
modulo the alignment, but the changes should not impact that.
In general, given that TLS variables have their own alignment
requirements, having different alignment for the (unshifted) thread
pointer and struct pthread would potentially result in dynamic
offsets, leading to more complexity.
hppa had different values before: __alignof__ (tcbhead_t), which
seems to be 4, and __alignof__ (struct pthread), which was 8
(old default) and is now 32. However, it defines THREAD_SELF as:
  /* Return the thread descriptor for the current thread. */
  # define THREAD_SELF \
    ({ struct pthread *__self; \
       __self = __get_cr27(); \
       __self - 1; \
    })
So the thread pointer points after struct pthread (hence __self - 1),
and they have to have the same alignment on hppa as well.
Similarly, on ia64, the definitions were different. We have:
  # define TLS_PRE_TCB_SIZE \
    (sizeof (struct pthread) \
     + (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t) \
        ? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1) \
           & ~(__alignof__ (struct pthread) - 1)) \
        : 0))
  # define THREAD_SELF \
    ((struct pthread *) ((char *) __thread_self - TLS_PRE_TCB_SIZE))
And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment
(confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c).
On m68k, we have a larger gap between tcbhead_t and struct pthread.
But as far as I can tell, the port is fine with that. The definition
of TCB_OFFSET is sufficient to handle the shifted TCB scenario.
This fixes commit 23c77f60181eb549f11ec2f913b4270af29eee38
("nptl: Increase default TCB alignment to 32").
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
nptl: rseq failure after registration on main thread is fatal
This simplifies the application programming model.
Browser sandboxes have already been fixed:
Sandbox is incompatible with rseq registration
<https://bugzilla.mozilla.org/show_bug.cgi?id=1651701>
Allow rseq in the Linux sandboxes. r=gcp
<https://hg.mozilla.org/mozilla-central/rev/042425712eb1>
Sandbox needs to support rseq system call
<https://bugs.chromium.org/p/chromium/issues/detail?id=1104160>
Linux sandbox: Allow rseq(2)
<https://chromium.googlesource.com/chromium/src.git/+/230675d9ac8f1>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
nptl: Add public rseq symbols and <sys/rseq.h>
The relationship between the thread pointer and the rseq area
is made explicit. The constant offset can be used by JIT compilers
to optimize rseq access (e.g., for really fast sched_getcpu).
Extensibility is provided through __rseq_size and __rseq_flags.
(In the future, the kernel could request a different rseq size
via the auxiliary vector.)
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
nptl: Add glibc.pthread.rseq tunable to control rseq registration
This tunable allows applications to register the rseq area instead
of glibc.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
Linux: Use rseq to accelerate sched_getcpu
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
nptl: Add rseq registration
The rseq area is placed directly into struct pthread. rseq
registration failure is not treated as an error, so it is possible
that threads run with inconsistent registration status.
<sys/rseq.h> is not yet installed as a public header.
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
nptl: Introduce THREAD_GETMEM_VOLATILE
This will be needed for rseq TCB access.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
nptl: Introduce <tcb-access.h> for THREAD_* accessors
These are common between most architectures. Only the x86 targets
are outliers.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Florian Weimer [Thu, 9 Dec 2021 08:49:32 +0000 (09:49 +0100)]
nptl: Add <thread_pointer.h> for defining __thread_pointer
<tls.h> already contains a definition that is quite similar,
but it is not consistent across architectures.
Only architectures for which rseq support is added are covered.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
John David Anglin [Tue, 7 Dec 2021 22:10:20 +0000 (16:10 -0600)]
String: test-memcpy used unaligned types for buffers [BZ 28572]
commit d585ba47fcda99fdf228e3e45a01b11a15efbc5a
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Mon Nov 1 00:49:48 2021 -0500
string: Make tests birdirectional test-memcpy.c
Add tests that had src/dst non 4-byte aligned. Since src/dst are
initialized/compared as uint32_t, a 4-byte aligned type, this can
break on some targets.
Fix the issue by specifying a new non-aligned 4-byte type,
`unaligned_uint32_t`, for src/dst.
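A sketch of such a type with GCC attributes: `aligned (1)` lowers the alignment when used in a typedef, and `may_alias` allows the accesses to alias the underlying byte buffer. `fill_words()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* A 4-byte integer type with 1-byte alignment, so the compiler does not
   assume 4-byte alignment when accessing memory through it.  */
typedef uint32_t __attribute__ ((may_alias, aligned (1))) unaligned_uint32_t;

/* Fill N 32-bit words starting at a possibly unaligned address.  */
static void
fill_words (void *dst, size_t n, uint32_t value)
{
  unaligned_uint32_t *p = dst;
  for (size_t i = 0; i < n; i++)
    p[i] = value;
}
```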
Another alternative is to rely on memcpy/memcmp for
initializing/testing src/dst. Using memcpy for initializing in memcpy
tests, however, could lead to future bugs.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Aurelien Jarno [Sun, 5 Dec 2021 10:51:17 +0000 (11:51 +0100)]
localedef: check magic value on archive load [BZ #28650]
localedef currently blindly trusts the archive header. When passed an
archive file with the wrong endianness, this leads to a segmentation
fault:
$ localedef --big-endian --list-archive /usr/lib/locale/locale-archive
Segmentation fault (core dumped)
When passed non-archive files, assertion failures are reported in the
best case, but sometimes it can lead to a segmentation fault:
$ localedef --list-archive /bin/true
localedef: programs/locarchive.c:1643: show_archive_content: Assertion `used < GET (head->namehash_used)' failed.
Aborted (core dumped)
$ localedef --list-archive /usr/lib/locale/C.utf8/LC_COLLATE
Segmentation fault (core dumped)
This patch improves the user experience by checking the magic value,
which is always written but never checked. It should still be possible
to trigger a segmentation fault with crafted files, but this already
catches many cases.
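A sketch of such a check, with a hypothetical helper that reads the magic with memcpy() and compares it against the expected value before any other header field is used. Reporting a byte-swapped match separately is an extra shown for illustration, not necessarily what localedef does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum magic_status { MAGIC_OK, MAGIC_SWAPPED, MAGIC_BAD };

/* Validate the leading 32-bit magic of an archive header of LEN bytes.  */
static enum magic_status
check_archive_magic (const void *header, size_t len, uint32_t expected)
{
  uint32_t magic;
  if (len < sizeof magic)
    return MAGIC_BAD;
  memcpy (&magic, header, sizeof magic);   /* Avoid unaligned access.  */
  if (magic == expected)
    return MAGIC_OK;
  if (magic == __builtin_bswap32 (expected))
    return MAGIC_SWAPPED;                  /* Other-endian archive.  */
  return MAGIC_BAD;
}
```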
H.J. Lu [Mon, 6 Dec 2021 15:14:12 +0000 (07:14 -0800)]
x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.
Adhemerval Zanella [Fri, 19 Nov 2021 19:58:52 +0000 (16:58 -0300)]
linux: Add generic ioctl implementation
The powerpc port is refactored to use the default implementation.
Adhemerval Zanella [Fri, 19 Nov 2021 18:33:16 +0000 (15:33 -0300)]
linux: Add generic syscall implementation
It also allows removing the hppa-specific implementation and
simplifying the riscv implementation a bit.
Florian Weimer [Mon, 6 Dec 2021 07:01:08 +0000 (08:01 +0100)]
misc, nptl: Remove stray references to __condvar_load_64_relaxed
The function was renamed to __atomic_wide_counter_load_relaxed
in commit 8bd336a00a5311bf7a9e99b3b0e9f01ff5faa74b ("nptl: Extract
<bits/atomic_wide_counter.h> from pthread_cond_common.c").
Florian Weimer [Sun, 5 Dec 2021 12:50:17 +0000 (13:50 +0100)]
csu: Always use __executable_start in gmon-start.c
Current binutils defines __executable_start as the lowest text
address, so using the entry point address as a fallback is no
longer necessary. As a result, overriding <entry.h> is only
necessary if the entry point is not called _start.
The previous approach of defining __ASSEMBLY__ to suppress the
declaration breaks if headers included by <entry.h> are not
compatible with __ASSEMBLY__. This happens with rseq integration
because it is necessary to include kernel headers in more places.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Florian Weimer [Sun, 5 Dec 2021 10:28:34 +0000 (11:28 +0100)]
elf: execve statically linked programs instead of crashing [BZ #28648]
Programs without dynamic dependencies and without a program
interpreter are now run via execve.
Previously, the dynamic linker either crashed while attempting to
read a non-existing dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT
data), or the self-relocation code in the static PIE executable crashed
because the outer dynamic linker had already applied RELRO protection.
<dl-execve.h> is needed because execve is not available in the
dynamic loader on Hurd.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
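The decision described above can be sketched as a predicate over the program headers and the dynamic section. This is a hypothetical helper (sketch_needs_execve), not the dynamic linker's code; it assumes the caller has already located the DT_NULL-terminated PT_DYNAMIC contents, passing NULL when there is none. Note that a static PIE does have PT_DYNAMIC, just no DT_NEEDED entries:

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the program has no program interpreter and no dynamic
   dependencies, i.e. it should simply be run via execve.  */
static bool
sketch_needs_execve (const Elf64_Phdr *phdr, size_t phnum,
                     const Elf64_Dyn *dyn)
{
  assert (phdr != NULL || phnum == 0);
  /* PT_INTERP marks a regular dynamically linked program.  */
  for (size_t i = 0; i < phnum; i++)
    if (phdr[i].p_type == PT_INTERP)
      return false;
  /* Any DT_NEEDED entry is a dynamic dependency.  */
  if (dyn != NULL)
    for (; dyn->d_tag != DT_NULL; dyn++)
      if (dyn->d_tag == DT_NEEDED)
        return false;
  return true;
}
```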
H.J. Lu [Sat, 4 Dec 2021 19:25:53 +0000 (11:25 -0800)]
Add --with-timeoutfactor=NUM to specify TIMEOUTFACTOR
On Ice Lake and Tiger Lake laptops, some test programs time out when
three "make check -j8" runs execute in parallel. Add
--with-timeoutfactor=NUM to specify an integer that scales the timeout
of test programs; it can be overridden by the TIMEOUTFACTOR environment
variable.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
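A minimal sketch of the precedence described above: a configure-time factor that the TIMEOUTFACTOR environment variable overrides. The helper name and the rule that only a positive integer counts as a valid override are assumptions, not taken from glibc's test driver:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>

/* Scale a test's default timeout (seconds).  TIMEOUTFACTOR, when set
   to a positive integer, wins over the configured factor.  */
static long
sketch_scaled_timeout (long timeout, long configured_factor)
{
  long factor = configured_factor;
  const char *env = getenv ("TIMEOUTFACTOR");

  if (env != NULL)
    {
      char *end;
      long value = strtol (env, &end, 10);
      /* Ignore empty, partially numeric, or non-positive values.  */
      if (end != env && *end == '\0' && value > 0)
        factor = value;
    }
  assert (factor > 0);
  return timeout * factor;
}
```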
Noah Goldstein [Fri, 3 Dec 2021 23:29:25 +0000 (15:29 -0800)]
x86-64: Use notl in EVEX strcmp [BZ #28646]
We must use notl %edi here because the lower bits are for CHAR
comparisons that are potentially out of range, and thus can be 0
without indicating a mismatch.
This fixes BZ #28646.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Florian Weimer [Fri, 3 Dec 2021 15:28:07 +0000 (16:28 +0100)]
nptl: Increase default TCB alignment to 32
rseq support will use a 32-byte aligned field in struct pthread,
so the whole struct needs to have at least that alignment.
nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h>
to obtain the fallback definition.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
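Why a single 32-byte aligned member forces the whole struct to 32-byte alignment can be shown with C11 alignas. The types below are hypothetical stand-ins, not the real struct pthread or rseq area; the sketch also shows that such objects need an aligned allocator, since plain malloc only guarantees alignof (max_align_t):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical field that must be 32-byte aligned.  */
struct sketch_rseq_area
{
  alignas (32) uint32_t cpu_id_start;
  uint32_t cpu_id;
};

/* Hypothetical stand-in for struct pthread.  */
struct sketch_pthread
{
  void *self;
  int tid;
  struct sketch_rseq_area rseq_area;
};

/* A struct is at least as aligned as its most-aligned member.  */
static_assert (alignof (struct sketch_pthread) >= 32,
               "alignment propagates from the member");

static struct sketch_pthread *
sketch_alloc_pthread (void)
{
  /* sizeof is always a multiple of the type's alignment, which is
     what aligned_alloc requires of its size argument.  */
  return aligned_alloc (alignof (struct sketch_pthread),
                        sizeof (struct sketch_pthread));
}
```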
Luca Boccassi [Thu, 2 Dec 2021 22:01:29 +0000 (23:01 +0100)]
elf: add definition for ELF_NOTE_FDO and NT_FDO_PACKAGING_METADATA note
As defined at https://systemd.io/COREDUMP_PACKAGE_METADATA/, this note
will be used starting with Fedora 36.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Wilco Dijkstra [Thu, 2 Dec 2021 18:33:26 +0000 (18:33 +0000)]
AArch64: Improve A64FX memcpy
v2 is a complete rewrite of the A64FX memcpy. Performance is improved
by streamlining the code, aligning all large copies and using a single
unrolled loop for all sizes. The code size for memcpy and memmove goes
down from 1796 bytes to 868 bytes. Performance is better in all cases:
bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33%
faster for large sizes, bench-memcpy-walk is 25% faster for small sizes
and 20% for the largest sizes. The geomean of all tests in bench-memcpy
is 5.1% faster, and total time is reduced by 4%.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Wilco Dijkstra [Thu, 2 Dec 2021 18:30:55 +0000 (18:30 +0000)]
AArch64: Optimize memcmp
Rewrite memcmp to improve performance. On small and medium inputs performance
is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per
iteration, which is 30-50% faster depending on the size.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
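The general structure of a fast memcmp (compare wide chunks, and only inspect individual bytes in the chunk that differs or in the tail) can be sketched in portable C. This is an illustration of the idea with 8-byte words, not the AArch64 code, which uses SIMD registers and 64 bytes per iteration for large inputs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int
sketch_memcmp (const void *a, const void *b, size_t n)
{
  const unsigned char *p = a, *q = b;
  size_t i = 0;

  assert (n == 0 || (a != NULL && b != NULL));
  /* Fast path: 8 bytes at a time; memcpy makes the loads safe for
     any alignment.  */
  for (; i + 8 <= n; i += 8)
    {
      uint64_t x, y;
      memcpy (&x, p + i, 8);
      memcpy (&y, q + i, 8);
      if (x != y)
        break;  /* the first difference lies in this chunk */
    }
  /* Byte loop for the differing chunk or the remaining tail.  */
  for (; i < n; i++)
    if (p[i] != q[i])
      return p[i] < q[i] ? -1 : 1;
  return 0;
}
```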
Matheus Castanho [Tue, 26 Oct 2021 13:44:59 +0000 (10:44 -0300)]
powerpc64[le]: Fix CFI and LR save address for asm syscalls [BZ #28532]
Syscalls based on the assembly templates are missing CFI for r31, which gets
clobbered when scv is used, and the info for LR is inaccurate: it is placed
at the wrong LOC and does not use the proper offset. LR was also being saved
to the callee's frame, while the ABI mandates that it be saved to the
caller's frame. These are fixed by this commit.
After this change:
$ readelf -wF libc.so.6 | grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6
00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c
   LOC           CFA      r31   ra
000000000004b9d4 r1+0     u     u
000000000004b9e4 r1+48    u     u
000000000004b9e8 r1+48    c-16  u
000000000004b9fc r1+48    c-16  c+16
000000000004ba08 r1+48    c-16
000000000004ba18 r1+48    u
000000000004ba1c r1+0     u
libc.so.6: file format elf64-powerpcle
Disassembly of section .text:
000000000004b9d4 <kill>:
4b9d4: 1f 00 4c 3c addis r2,r12,31
4b9d8: 2c c3 42 38 addi r2,r2,-15572
4b9dc: 25 00 00 38 li r0,37
4b9e0: d1 ff 21 f8 stdu r1,-48(r1)
4b9e4: 20 00 e1 fb std r31,32(r1)
4b9e8: 98 8f ed eb ld r31,-28776(r13)
4b9ec: 10 00 ff 77 andis. r31,r31,16
4b9f0: 1c 00 82 41 beq 4ba0c <kill+0x38>
4b9f4: a6 02 28 7d mflr r9
4b9f8: 40 00 21 f9 std r9,64(r1)
4b9fc: 01 00 00 44 scv 0
4ba00: 40 00 21 e9 ld r9,64(r1)
4ba04: a6 03 28 7d mtlr r9
4ba08: 08 00 00 48 b 4ba10 <kill+0x3c>
4ba0c: 02 00 00 44 sc
4ba10: 00 00 bf 2e cmpdi cr5,r31,0
4ba14: 20 00 e1 eb ld r31,32(r1)
4ba18: 30 00 21 38 addi r1,r1,48
4ba1c: 18 00 96 41 beq cr5,4ba34 <kill+0x60>
4ba20: 01 f0 20 39 li r9,-4095
4ba24: 40 48 23 7c cmpld r3,r9
4ba28: 20 00 e0 4d bltlr+
4ba2c: d0 00 63 7c neg r3,r3
4ba30: 08 00 00 48 b 4ba38 <kill+0x64>
4ba34: 20 00 e3 4c bnslr+
4ba38: c8 32 fe 4b b 2ed00 <__syscall_error>
...
4ba44: 40 20 0c 00 .long 0xc2040
4ba48: 68 00 00 00 .long 0x68
4ba4c: 06 00 5f 5f rlwnm r31,r26,r0,0,3
4ba50: 6b 69 6c 6c xoris r12,r3,26987
Adhemerval Zanella [Fri, 26 Jun 2020 12:29:11 +0000 (09:29 -0300)]
linux: Implement pipe in terms of __NR_pipe2
The pipe2 syscall was added in Linux 2.6.27 and glibc requires Linux
3.2.0. The patch removes the arch-specific implementations for alpha,
ia64, mips, sh, and sparc, which require a different kernel ABI
than the usual one.
Checked on x86_64-linux-gnu and with a build for the affected ABIs.
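Since pipe (fds) behaves like pipe2 (fds, 0), a single generic wrapper covers every ABI. A minimal sketch using syscall(2) (SYS_pipe2 is the <sys/syscall.h> name for __NR_pipe2); the wrapper name is hypothetical and glibc's internal version uses its own inline-syscall macros instead:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/* pipe expressed in terms of pipe2 with no flags.  */
static int
sketch_pipe (int fds[2])
{
  assert (fds != NULL);
  return syscall (SYS_pipe2, fds, 0);
}
```

The arch-specific variants existed because the old pipe syscall returns the two descriptors in registers on some ABIs; pipe2 writes them to the array on all of them.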
Adhemerval Zanella [Wed, 17 Jun 2020 12:29:07 +0000 (09:29 -0300)]
linux: Implement mremap in C
Variadic function calls in syscalls.list do not work for all ABIs
(for instance, where the arguments are passed on the stack instead of
in registers) and might have underlying issues depending on the
variadic type (for instance, if a 64-bit argument is used).
Checked on x86_64-linux-gnu.
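In C the optional fifth argument can be read with va_arg only when MREMAP_FIXED says it was passed, and every argument then reaches the kernel with a well-defined type. A minimal sketch with a hypothetical wrapper name, going through syscall(2) rather than glibc's internal macros:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static void *
sketch_mremap (void *old_address, size_t old_size, size_t new_size,
               int flags, ...)
{
  void *new_address = NULL;

  /* The new address is only passed (and only read) with
     MREMAP_FIXED.  */
  if (flags & MREMAP_FIXED)
    {
      va_list ap;
      va_start (ap, flags);
      new_address = va_arg (ap, void *);
      va_end (ap);
    }
  /* The kernel takes flags as unsigned long; widen it explicitly.
     On error syscall returns -1, which converts to MAP_FAILED.  */
  return (void *) syscall (SYS_mremap, old_address, old_size, new_size,
                           (unsigned long) flags, new_address);
}
```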
Adhemerval Zanella [Thu, 11 Jun 2020 20:41:16 +0000 (17:41 -0300)]
linux: Add prlimit64 C implementation
The LFS prlimit64 requires an arch-specific implementation in
syscalls.list. Instead add a generic one that handles the
required symbol alias for __RLIM_T_MATCHES_RLIM64_T.
HPPA is the only outlier which requires a different default
symbol.
Checked on x86_64-linux-gnu and with a build for the affected ABIs.
Florian Weimer [Tue, 30 Nov 2021 14:39:17 +0000 (15:39 +0100)]
elf: Include <stdbool.h> in tst-tls20.c
The test uses the bool type.
Florian Weimer [Tue, 30 Nov 2021 13:35:54 +0000 (14:35 +0100)]
elf: Include <stdint.h> in tst-tls20.c
The test uses standard integer types.
Samuel Thibault [Sun, 28 Nov 2021 20:26:25 +0000 (21:26 +0100)]
hurd: Let report-wait use a weak reference to _hurd_itimer_thread
libc.so.0.3 does not seem to need this defined any more.
Adhemerval Zanella [Thu, 25 Nov 2021 12:12:00 +0000 (09:12 -0300)]
linux: Use /proc/stat fallback for __get_nprocs_conf (BZ #28624)
The /proc/stat fallback, used if sysfs is not available, was removed
by commit f13fb81ad3159; reinstate it.
Checked on x86_64-linux-gnu.
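A /proc/stat-based count boils down to counting the per-CPU "cpuN" lines and skipping the aggregate "cpu" line. A minimal sketch with a hypothetical helper that takes a FILE * (so it can be fed /proc/stat or test data); it is not glibc's implementation:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count lines of the form "cpuN ..." (a digit right after "cpu").
   Returns -1 if no such line was found.  */
static int
sketch_count_cpus (FILE *fp)
{
  char *line = NULL;
  size_t cap = 0;
  int count = 0;

  assert (fp != NULL);
  /* getline handles arbitrarily long lines such as "intr ...".  */
  while (getline (&line, &cap, fp) != -1)
    if (strncmp (line, "cpu", 3) == 0
        && isdigit ((unsigned char) line[3]))
      count++;
  free (line);
  return count > 0 ? count : -1;
}
```

On a live system the stream would come from fopen ("/proc/stat", "r").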
Adhemerval Zanella [Thu, 11 Jun 2020 19:49:40 +0000 (16:49 -0300)]
linux: Add fanotify_mark C implementation
Passing 64-bit arguments in syscalls.list is tricky: it requires
reimplementing the expected kernel ABI for each architecture. This
is much better expressed in C code, where we already have macros
for it (SYSCALL_LL64).
Checked on x86_64-linux-gnu.
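What a SYSCALL_LL64-style macro has to get right on 32-bit ABIs is splitting the 64-bit value into two 32-bit register arguments, in the order the ABI's endianness dictates (low word first on little-endian, high word first on big-endian). A hypothetical illustration; it does not model the extra padding register that some ABIs need to keep the pair aligned:

```c
#include <assert.h>
#include <stdint.h>

struct sketch_ll64 { uint32_t first, second; };

/* Split a 64-bit syscall argument into the two 32-bit words passed
   in consecutive argument registers.  */
static struct sketch_ll64
sketch_split_ll64 (uint64_t value, int big_endian)
{
  uint32_t hi = (uint32_t) (value >> 32);
  uint32_t lo = (uint32_t) value;
  struct sketch_ll64 r;

  r.first = big_endian ? hi : lo;
  r.second = big_endian ? lo : hi;
  return r;
}
```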
Adhemerval Zanella [Wed, 24 Nov 2021 15:57:57 +0000 (12:57 -0300)]
linux: Only build fstatat fallback if required
For 32-bit architectures with __ASSUME_STATX there is no need to
build fstatat64_time64_stat.
Checked on i686-linux-gnu.
Paul Eggert [Wed, 24 Nov 2021 22:16:09 +0000 (14:16 -0800)]
regex: fix buffer read overrun in search [BZ#28470]
Problem reported by Benno Schulenberg in:
https://lists.gnu.org/r/bug-gnulib/2021-10/msg00035.html
* posix/regexec.c (re_search_internal): Use better bounds check.
Sunil K Pandey [Sat, 6 Nov 2021 06:29:02 +0000 (23:29 -0700)]
x86-64: Add vector sin/sinf to libmvec microbenchmark
Add vector sin/sinf and input files to libmvec microbenchmark.
libmvec-sin-inputs:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 5.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-sinf-inputs:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 5.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Sat, 6 Nov 2021 06:19:53 +0000 (23:19 -0700)]
x86-64: Add vector pow/powf to libmvec microbenchmark
Add vector pow/powf and input files to libmvec microbenchmark.
libmvec-pow-inputs:
arg1:
90% Normal random distribution
range: (0.0, 256.0)
mean: 0.0
sigma: 32.0
10% uniform random distribution in range (0.0, 256.0)
arg2:
90% Normal random distribution
range: (-127.0, 127.0)
mean: 0.0
sigma: 16.0
10% uniform random distribution in range (-127.0, 127.0)
libmvec-powf-inputs:
arg1:
90% Normal random distribution
range: (0.0f, 100.0f)
mean: 0.0f
sigma: 16.0f
10% uniform random distribution in range (0.0f, 100.0f)
arg2:
90% Normal random distribution
range: (-10.0f, 10.0f)
mean: 0.0f
sigma: 8.0f
10% uniform random distribution in range (-10.0f, 10.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Sat, 6 Nov 2021 06:15:20 +0000 (23:15 -0700)]
x86-64: Add vector log/logf to libmvec microbenchmark
Add vector log/logf and input files to libmvec microbenchmark.
libmvec-log-inputs:
70% Normal random distribution
range: (0.0, DBL_MAX)
mean: 1.0
sigma: 50.0
30% uniform random distribution in range (0.0, DBL_MAX)
libmvec-logf-inputs:
70% Normal random distribution
range: (0.0f, FLT_MAX)
mean: 1.0f
sigma: 50.0f
30% uniform random distribution in range (0.0f, FLT_MAX)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Sat, 6 Nov 2021 06:11:00 +0000 (23:11 -0700)]
x86-64: Add vector exp/expf to libmvec microbenchmark
Add vector exp/expf and input files to libmvec microbenchmark.
libmvec-exp-inputs:
90% Normal random distribution
range: (-708.0, 709.0)
mean: 0.0
sigma: 16.0
10% uniform random distribution in range (-500.0, 500.0)
libmvec-expf-inputs:
90% Normal random distribution
range: (-87.0f, 88.0f)
mean: 0.0f
sigma: 8.0f
10% uniform random distribution in range (-50.0f, 50.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Sunil K Pandey [Fri, 5 Nov 2021 20:44:49 +0000 (13:44 -0700)]
x86-64: Add vector cos/cosf to libmvec microbenchmark
Add vector cos/cosf and input files to libmvec microbenchmark.
libmvec-cos-inputs:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 5.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-cosf-inputs:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 5.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
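The input files above are data, but the shape of a mix such as libmvec-cos-inputs (90% normal with mean 0.0 and sigma 5.0, 10% uniform in (-1000.0, 1000.0)) can be illustrated with a small generator. This is a rough sketch, not how the actual input files were produced: it approximates the normal part with the Irwin-Hall sum of 12 uniforms (to avoid depending on libm), uses rand () only for self-containment, and does not model the range limits some mixes (exp, log) place on the normal part:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_input_mix
{
  double mean, sigma;   /* approximately normal part: 90% of values */
  double ulo, uhi;      /* uniform part: 10% of values */
};

/* Uniform value strictly inside (0, 1).  */
static double
sketch_uniform01 (void)
{
  return (rand () + 1.0) / ((double) RAND_MAX + 2.0);
}

/* Approximately standard normal: sum of 12 uniforms minus 6.  */
static double
sketch_std_normal (void)
{
  double sum = 0.0;
  for (int k = 0; k < 12; k++)
    sum += sketch_uniform01 ();
  return sum - 6.0;
}

/* Every tenth value is uniform, the rest approximately normal.  */
static void
sketch_gen_inputs (double *out, size_t n,
                   const struct sketch_input_mix *mix)
{
  assert (out != NULL && mix != NULL && mix->ulo < mix->uhi);
  for (size_t i = 0; i < n; i++)
    out[i] = (i % 10 == 9)
             ? mix->ulo + (mix->uhi - mix->ulo) * sketch_uniform01 ()
             : mix->mean + mix->sigma * sketch_std_normal ();
}
```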
Adhemerval Zanella [Mon, 8 Nov 2021 13:20:23 +0000 (10:20 -0300)]
io: Refactor close_range and closefrom
Now that Hurd implements both close_range and closefrom (f2c996597d),
we can make close_range() a base ABI, and build the default closefrom()
implementation on top of close_range().
The generic closefrom() implementation based on __getdtablesize() is
moved to the generic close_range(). On Linux it will be overridden by
the auto-generated syscall wrapper, while on Hurd it will be a
system-specific implementation.
closefrom() now calls close_range() and __closefrom_fallback().
Since close_range() does not fail on Hurd, __closefrom_fallback() is an
empty static inline function there, set by __ASSUME_CLOSE_RANGE.
__ASSUME_CLOSE_RANGE also allows optimizing the Linux
__closefrom_fallback() implementation when --enable-kernel=5.9 or
higher is used.
Finally, the Linux-specific tst-close_range.c is moved to io and
enabled by default. The Linuxisms and CLOSE_RANGE_UNSHARE are
guarded so it can be built for Hurd (I have not actually tested it).
Checked on x86_64-linux-gnu, i686-linux-gnu, and with an i686-gnu
build.
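The layering can be sketched as: try close_range over [lowfd, ~0U] first, and fall back to a loop bounded by getdtablesize () if that fails. The function names are hypothetical, not glibc internals, and the raw system call is used only where the kernel headers define SYS_close_range:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Fallback: close every descriptor from lowfd up to the size of the
   descriptor table, ignoring errors such as EBADF for unused slots.  */
static void
sketch_closefrom_fallback (int lowfd)
{
  int maxfd = getdtablesize ();
  for (int fd = lowfd; fd < maxfd; fd++)
    close (fd);
}

/* closefrom on top of close_range: a "last" of ~0U means up to the
   highest possible descriptor.  */
static void
sketch_closefrom (int lowfd)
{
  assert (lowfd >= 0);
#ifdef SYS_close_range
  if (syscall (SYS_close_range, (unsigned int) lowfd, ~0U, 0U) == 0)
    return;
#endif
  /* Headers or kernel without close_range (e.g. ENOSYS).  */
  sketch_closefrom_fallback (lowfd);
}
```

With --enable-kernel=5.9 or later the fallback path could be dropped entirely, which is the optimization __ASSUME_CLOSE_RANGE enables.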
Florian Weimer [Wed, 24 Nov 2021 07:59:54 +0000 (08:59 +0100)]
nptl: Do not set signal mask on second setjmp return [BZ #28607]
__libc_signal_restore_set was in the wrong place: It also ran
when setjmp returned the second time (after pthread_exit or
pthread_cancel). This is observable with blocked pending
signals during thread exit.
Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8
("nptl: Start new threads with all signals blocked [BZ #25098]").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
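The control-flow issue can be modeled with plain setjmp/longjmp: code placed after setjmp runs on both returns unless it sits in the branch for the first (zero) return. A hypothetical model, where sketch_restore_sigmask stands in for __libc_signal_restore_set and longjmp stands in for the unwinding done by pthread_exit or pthread_cancel:

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf sketch_env;
static int sketch_restore_calls;   /* counts simulated mask restores */

static void
sketch_restore_sigmask (void)
{
  sketch_restore_calls++;
}

static void
sketch_start_thread (void)
{
  if (setjmp (sketch_env) == 0)
    {
      /* First return only: restore the mask, run the thread.  */
      sketch_restore_sigmask ();
      /* Simulate pthread_exit unwinding back to the setjmp.  */
      longjmp (sketch_env, 1);
    }
  /* Second return: the exit path must not touch the signal mask.  */
}
```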
Adhemerval Zanella [Sat, 6 Nov 2021 12:38:09 +0000 (09:38 -0300)]
powerpc: Define USE_PPC64_NOTOC iff compiler supports it
The @notoc usage only yields an advantage on ISA 3.1+ machines (power10),
and for ld.bfd only when it also sees pcrel relocations used in the code
(generated if the compiler targets ISA 3.1+). In the bfd case, ISA 3.1+
instructions in stubs are used iff the linker also sees the new
pc-relative relocations (for instance R_PPC64_D34); otherwise it
generates default stubs (ppc64_elf_check_relocs:4700).
This patch also helps with linkers that do not implement this
optimization, since building for an older ISA (such as 3.0 / power9)
will also trigger power10 stub generation if the assembly code uses the
NOTOC macro.
Checked on powerpc64le-linux-gnu.
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Adhemerval Zanella [Fri, 19 Nov 2021 14:06:00 +0000 (11:06 -0300)]
setjmp: Replace jmp_buf-macros.h with jmp_buf-macros.sym
It requires less boilerplate code for newer ports. The _Static_assert
checks from internal setjmp are moved to its own internal test since
setjmp.h is included early by multiple headers (to generate
rtld-sizes.sym).
The riscv jmp_buf-macros.h check is also redundant; it is already
done by riscv configure.ac.
Checked with a build for the affected architectures.
Joseph Myers [Mon, 22 Nov 2021 15:30:12 +0000 (15:30 +0000)]
Update kernel version to 5.15 in tst-mman-consts.py
This patch updates the kernel version in the test tst-mman-consts.py
to 5.15. (There are no new MAP_* constants covered by this test in
5.15 that need any other header changes.)
Tested with build-many-glibcs.py.
Florian Weimer [Mon, 22 Nov 2021 13:41:14 +0000 (14:41 +0100)]
socket: Do not use AF_NETLINK in __opensock
Not all Linux kernels support using interface ioctls with netlink
sockets.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Adhemerval Zanella [Thu, 11 Nov 2021 12:28:21 +0000 (09:28 -0300)]
elf: Move la_activity (LA_ACT_ADD) after _dl_add_to_namespace_list() (BZ #28062)
This ensures that the namespace is not empty.
Checked on x86_64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Joseph Myers [Wed, 17 Nov 2021 14:25:16 +0000 (14:25 +0000)]
Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h
Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add
these constants to bits/socket.h.
Tested for x86_64.
Stafford Horne [Wed, 13 Oct 2021 10:43:30 +0000 (19:43 +0900)]
malloc: Fix malloc debug for 2.35 onwards
The change 1e5a5866cb ("Remove malloc hooks [BZ #23328]") has broken
ports that use GLIBC_2_35, like the new OpenRISC port I am working on.
The libc_malloc_debug.so library used to bring in the debug
infrastructure is currently essentially empty for GLIBC_2_35 ports like
mine, causing the mtrace tests to fail:
cat sysdeps/unix/sysv/linux/or1k/shlib-versions
DEFAULT GLIBC_2.35
ld=ld-linux-or1k.so.1
FAIL: posix/bug-glob2-mem
FAIL: posix/bug-regex14-mem
FAIL: posix/bug-regex2-mem
FAIL: posix/bug-regex21-mem
FAIL: posix/bug-regex31-mem
FAIL: posix/bug-regex36-mem
FAIL: malloc/tst-mtrace.
The issue seems to be with the ifdefs in malloc/malloc-debug.c, which
currently exclude essentially all symbols for ports whose baseline is
2.35 or later. Removing the top-level SHLIB_COMPAT ifdef allows things
to just work.
Fixes: 1e5a5866cb ("Remove malloc hooks [BZ #23328]")
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Florian Weimer [Wed, 17 Nov 2021 11:20:29 +0000 (12:20 +0100)]
elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres
This will be used to deallocate memory allocated using the non-minimal
malloc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Florian Weimer [Wed, 17 Nov 2021 11:20:13 +0000 (12:20 +0100)]
nptl: Extract <bits/atomic_wide_counter.h> from pthread_cond_common.c
And make it an installed header. This addresses a few aliasing
violations (which do not seem to result in miscompilation due to
the use of atomics), and also enables use of wide counters in other
parts of the library.
The debug output in nptl/tst-cond22 has been adjusted to print
the 32-bit values instead because it avoids a big-endian/little-endian
difference.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Sunil K Pandey [Fri, 5 Nov 2021 07:56:47 +0000 (00:56 -0700)]
x86-64: Create microbenchmark infrastructure for libmvec
Add a Python script to generate libmvec microbenchmarks from the input
values for each libmvec function, using a skeleton benchmark template.
It creates double and float benchmarks with vector lengths 1, 2, 4, 8,
and 16 for each libmvec function. Vector length 1 corresponds to the
scalar version of the function and is included for comparison with the
vector versions.
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Adhemerval Zanella [Tue, 16 Nov 2021 18:58:16 +0000 (15:58 -0300)]
elf: hidden visibility for __minimal_malloc functions
Since commit b05fae4d8e34, the __minimal_malloc code is used during
static startup before PIE self-relocation (_dl_relocate_static_pie),
so it requires the same fix that commit 47618209d05a applied to other
objects.
Checked on aarch64, x86_64, and i686 with and without static-pie.
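As a rough illustration, a minimal bump allocator with hidden visibility looks like the sketch below. The names are hypothetical and the code is not glibc's __minimal_malloc. The relevant point is the GCC/Clang visibility attribute: hidden symbols can be bound locally, so references to them typically need no GOT entry, which matters for code that runs before the executable has applied its own relocations:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HIDDEN __attribute__ ((visibility ("hidden")))

static _Alignas (16) unsigned char sketch_arena[4096];
static size_t sketch_used;

/* Hand out 16-byte aligned blocks from a static arena; never frees.  */
SKETCH_HIDDEN void *
sketch_minimal_malloc (size_t n)
{
  size_t rounded = (n + 15) & ~(size_t) 15;

  /* rounded < n catches wrap-around for huge requests.  */
  if (rounded < n || rounded > sizeof sketch_arena - sketch_used)
    return NULL;
  void *p = sketch_arena + sketch_used;
  sketch_used += rounded;
  return p;
}
```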
H.J. Lu [Tue, 16 Nov 2021 00:28:39 +0000 (16:28 -0800)]
elf: Use a temporary file to generate Makefile fragments [BZ #28550]
1. Use a temporary file to generate Makefile fragments for DSO sorting
tests and use -include on them.
2. Add Makefile fragments to postclean-generated so that a "make clean"
removes the autogenerated fragments and a subsequent "make" regenerates
them.
This partially fixes BZ #28550.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
H.J. Lu [Sat, 6 Nov 2021 21:13:27 +0000 (14:13 -0700)]
dso-ordering-test.py: Put all sources in one directory [BZ #28550]
Put all sources for DSO sorting tests in the dso-sort-tests-src directory
and compile test relocatable objects with
$(objpfx)tst-dso-ordering1-dir/tst-dso-ordering1-a.os: $(objpfx)dso-sort-tests-src/tst-dso-ordering1-a.c
	$(compile.c) $(OUTPUT_OPTION)
to avoid random $< values from $(before-compile) when compiling test
relocatable objects with
$(objpfx)%$o: $(objpfx)%.c $(before-compile); $$(compile-command.c)
compile-command.c = $(compile.c) $(OUTPUT_OPTION) $(compile-mkdep-flags)
compile.c = $(CC) $< -c $(CFLAGS) $(CPPFLAGS)
during 3 parallel "make -j 28" builds running at the same time on a
machine with 112 cores.
This partially fixes BZ #28550.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Adhemerval Zanella [Thu, 29 Jul 2021 14:13:57 +0000 (11:13 -0300)]
elf: Move LAV_CURRENT to link_lavcurrent.h
No functional change.
H.J. Lu [Fri, 12 Nov 2021 19:47:42 +0000 (11:47 -0800)]
Move assignment out of the CAS condition
Update
commit 49302b8fdf9103b6fc0a398678668a22fa19574c
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Nov 11 06:54:01 2021 -0800
    Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537]
    Replace boolean CAS with value CAS to avoid the extra load.
and
commit 0b82747dc48d5bf0871bdc6da8cb6eec1256355f
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Nov 11 06:31:51 2021 -0800
    Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537]
    Replace boolean CAS with value CAS to avoid the extra load.
by moving assignment out of the CAS condition.
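A value CAS returns what was in memory, so on failure the caller already knows the current owner and needs no extra load; moving the assignment out of the condition just makes that flow explicit. A sketch with C11 atomics; sketch_cas_val is a hypothetical helper, similar in spirit to glibc's internal value-returning CAS macros but not their definition:

```c
#include <assert.h>
#include <stdatomic.h>

/* Value-returning CAS: returns the value *mem held before the call.
   On failure, C11 writes the observed value into expected.  */
static int
sketch_cas_val (atomic_int *mem, int newval, int oldval)
{
  int expected = oldval;
  atomic_compare_exchange_strong_explicit (mem, &expected, newval,
                                           memory_order_acquire,
                                           memory_order_relaxed);
  return expected;
}

/* Returns 0 if the lock was acquired, otherwise the current owner.  */
static int
sketch_lock_attempt (atomic_int *lock, int id)
{
  /* Assignment first, then the test, instead of
     if ((oldval = sketch_cas_val (...)) != 0).  */
  int oldval = sketch_cas_val (lock, id, 0);
  if (oldval != 0)
    return oldval;   /* no separate atomic_load needed */
  return 0;
}
```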
H.J. Lu [Mon, 8 Nov 2021 15:49:11 +0000 (07:49 -0800)]
Add a comment for --enable-initfini-array [BZ #27945]
Document that --enable-initfini-array is enabled by default in GCC 12,
which can be removed when GCC 12 becomes the minimum requirement.
Stafford Horne [Sun, 19 Sep 2021 21:03:07 +0000 (06:03 +0900)]
tst-tzset: output reason when creating 4GiB file fails
Currently, if creating the temporary file fails, the create_tz_file
function returns NULL. The NULL pointer is then passed to setenv, which
causes a SIGSEGV. Rather than crashing with a SIGSEGV, print a warning
and exit.
H.J. Lu [Wed, 3 Nov 2021 01:33:07 +0000 (18:33 -0700)]
Add LLL_MUTEX_READ_LOCK [BZ #28537]
The CAS instruction is expensive. From the x86 CPU's point of view,
getting a cache line for writing is more expensive than reading it. See
Appendix A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
A full compare-and-swap grabs the cache line in exclusive state and
causes excessive cache line bouncing.
Add LLL_MUTEX_READ_LOCK to do an atomic load and skip the CAS in the
spinlock loop if the compare would fail, reducing cache line bouncing
on contended locks.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
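The read-before-CAS idea can be sketched with C11 atomics: while the lock looks taken, spin on a relaxed load (which can keep the cache line shared) and attempt the CAS only when the lock appears free. This is a generic test-and-test-and-set sketch with hypothetical names, not the LLL_MUTEX_READ_LOCK macro itself:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static bool
sketch_spin_trylock (atomic_int *lock, int max_spins)
{
  assert (max_spins > 0);
  for (int i = 0; i < max_spins; i++)
    {
      /* Cheap read: skip the CAS while it would certainly fail.  */
      if (atomic_load_explicit (lock, memory_order_relaxed) != 0)
        continue;
      int expected = 0;
      if (atomic_compare_exchange_strong_explicit (lock, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed))
        return true;
    }
  /* A real mutex would now block, e.g. on a futex.  */
  return false;
}

static void
sketch_unlock (atomic_int *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
}
```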