Heiko Carstens [Sun, 29 May 2022 16:55:06 +0000 (18:55 +0200)]
s390/uaccess: use symbolic names for inline assembler operands
Make code easier to read by using symbolic names.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Alexander Gordeev [Mon, 23 May 2022 10:38:14 +0000 (12:38 +0200)]
s390/mcck: isolate SIE instruction when setting CIF_MCCK_GUEST flag
Commit
d768bd892fc8 ("s390: add options to change branch prediction
behaviour for the kernel") introduced .Lsie_exit label - supposedly
to fence off SIE instruction. However, the corresponding address
range length .Lsie_crit_mcck_length was not updated, which led to
BPON code potentionally marked with CIF_MCCK_GUEST flag.
Both .Lsie_exit and .Lsie_crit_mcck_length were removed with commit
0b0ed657fe00 ("s390: remove critical section cleanup from entry.S"),
but the issue persisted - currently BPOFF and BPENTER macros might
get wrongly considered by the machine check handler as a guest.
Fixes:
d768bd892fc8 ("s390: add options to change branch prediction behaviour for the kernel")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Christian Borntraeger [Mon, 30 May 2022 09:27:06 +0000 (11:27 +0200)]
s390/mm: use non-quiescing sske for KVM switch to keyed guest
The switch to a keyed guest does not require a classic sske as the other
guest CPUs are not accessing the key before the switch is complete.
By using the NQ SSKE things are faster especially with multiple guests.
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Suggested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220530092706.11637-3-borntraeger@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Christian Borntraeger [Mon, 30 May 2022 09:27:05 +0000 (11:27 +0200)]
s390/gmap: voluntarily schedule during key setting
With large and many guest with storage keys it is possible to create
large latencies or stalls during initial key setting:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 18-....: (2099 ticks this GP) idle=54e/1/0x4000000000000002 softirq=
35598716/
35598716 fqs=998
(t=2100 jiffies g=
155867385 q=20879)
Task dump for CPU 18:
CPU 1/KVM R running task 0 1030947 256019 0x06000004
Call Trace:
sched_show_task
rcu_dump_cpu_stacks
rcu_sched_clock_irq
update_process_times
tick_sched_handle
tick_sched_timer
__hrtimer_run_queues
hrtimer_interrupt
do_IRQ
ext_int_handler
ptep_zap_key
The mmap lock is held during the page walking but since this is a
semaphore scheduling is still possible. Same for the kvm srcu.
To minimize overhead do this on every segment table entry or large page.
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220530092706.11637-2-borntraeger@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Eric Farman [Wed, 25 May 2022 14:40:28 +0000 (16:40 +0200)]
MAINTAINERS: Update s390 virtio-ccw
Add myself to the kernel side of virtio-ccw
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/20220525144028.2714489-2-farman@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Mon, 30 May 2022 09:37:48 +0000 (11:37 +0200)]
s390/kexec: add __GFP_NORETRY to KEXEC_CONTROL_MEMORY_GFP
Avoid invoking the OOM-killer when allocating the control page. This
is the s390 variant of commit
dc5cccacf427 ("kexec: don't invoke
OOM-killer for control page allocation").
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Juerg Haefliger [Wed, 25 May 2022 12:01:51 +0000 (14:01 +0200)]
s390/Kconfig.debug: fix indentation
The convention for indentation seems to be a single tab. Help text is
further indented by an additional two whitespaces. Fix the lines that
violate these rules.
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Link: https://lore.kernel.org/r/20220525120151.39594-1-juerg.haefliger@canonical.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Juerg Haefliger [Wed, 25 May 2022 12:01:40 +0000 (14:01 +0200)]
s390/Kconfig: fix indentation
The convention for indentation seems to be a single tab. Help text is
further indented by an additional two whitespaces. Fix the lines that
violate these rules.
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Link: https://lore.kernel.org/r/20220525120140.39534-1-juerg.haefliger@canonical.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Nico Boehr [Tue, 24 May 2022 13:43:20 +0000 (15:43 +0200)]
s390/perf: obtain sie_block from the right address
Since commit
1179f170b6f0 ("s390: fix fpu restore in entry.S"), the
sie_block pointer is located at empty1[1], but in sie_block() it was
taken from empty1[0].
This leads to a random pointer being dereferenced, possibly causing
system crash.
This problem can be observed when running a simple guest with an endless
loop and recording the cpu-clock event:
sudo perf kvm --guestvmlinux=<guestkernel> --guest top -e cpu-clock
With this fix, the correct guest address is shown.
Fixes:
1179f170b6f0 ("s390: fix fpu restore in entry.S")
Cc: stable@vger.kernel.org
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Mon, 23 May 2022 12:42:40 +0000 (14:42 +0200)]
s390: generate register offsets into pt_regs automatically
Use asm offsets method to generate register offsets into pt_regs,
instead of open-coding at several places.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Fri, 20 May 2022 17:23:48 +0000 (19:23 +0200)]
s390: simplify early program check handler
Due to historic reasons the base program check handler calls a
configurable function. Given that there is only the early program
check handler left, simplify the code by directly calling that
function.
The only other user was removed with commit
d485235b0054 ("s390:
assume diag308 set always works").
Also rename all functions and the asm file to reflect this.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Jann Horn [Tue, 17 May 2022 14:30:47 +0000 (16:30 +0200)]
s390/crypto: fix scatterwalk_unmap() callers in AES-GCM
The argument of scatterwalk_unmap() is supposed to be the void* that was
returned by the previous scatterwalk_map() call.
The s390 AES-GCM implementation was instead passing the pointer to the
struct scatter_walk.
This doesn't actually break anything because scatterwalk_unmap() only uses
its argument under CONFIG_HIGHMEM and ARCH_HAS_FLUSH_ON_KUNMAP.
Fixes:
bf7fa038707c ("s390/crypto: add s390 platform specific aes gcm support.")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517143047.3054498-1-jannh@google.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Mon, 16 May 2022 12:37:47 +0000 (14:37 +0200)]
s390/head: get rid of 31 bit leftovers
Get rid of old 31 bit leftovers within ipl code:
- convert everything to pc relative code
- use 64 bit addressing mode as early as possible
- use 64 bit arithmetics wherever possible
This way the code doesn't look as odd as before anymore.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 11 May 2022 12:05:32 +0000 (14:05 +0200)]
scripts/min-tool-version.sh: raise minimum clang version to 14.0.0 for s390
Before version 14.0.0 llvm's integrated assembler fails to handle some
displacement variants:
arch/s390/purgatory/head.S:108:10: error: invalid operand for instruction
lg %r11,kernel_type-.base_crash(%r13)
Instead of working around this and given that this is already fixed
raise the minimum clang version from 13.0.0 to 14.0.0.
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://reviews.llvm.org/D113341
Link: https://lore.kernel.org/r/20220511120532.2228616-9-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 11 May 2022 12:05:31 +0000 (14:05 +0200)]
s390/boot: do not emit debug info for assembly with llvm's IAS
Commit
ee6d777d3e93 ("s390/decompressor: support extra debug flags")
added extra debug flags, in particular debug info is created,
depending on config options.
With llvm's IAS this causes this compile warning:
arch/s390/boot/head.S:38:1: warning: DWARF2 only supports one section per compilation unit
.section ".head.text","ax"
^
This is a known problem and was addressed with commit
b8a9092330da
("Kbuild: do not emit debug info for assembly with LLVM_IAS=1").
Just do the same for s390 to get rid of this warning.
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220511120532.2228616-8-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 11 May 2022 12:05:30 +0000 (14:05 +0200)]
s390/boot: workaround llvm IAS bug
For at least the mvc and clc instructions llvm's integrated assembler can
generate incorrect code. In particular this happens with decompressor boot
code. The reason seems to be that relocations for the second displacement
of each instruction are at incorrect locations (-/+: gas vs llvm IAS):
mvc __LC_IO_NEW_PSW(16),.Lnewpsw
results in
4: d2 0f 01 f0 00 00 mvc 496(16,%r0),0
- 8: R_390_12 .head.text+0x10
+ 6: R_390_12 .head.text+0x10
and
clc 0(3,%r4),.L_hdr
results in
258: d5 02 40 00 00 00 clc 0(3,%r4),0
- 25c: R_390_12 .head.text+0x324
+ 25a: R_390_12 .head.text+0x324
Workaround this by writing the code in a different way.
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/llvm/llvm-project/issues/55411
Link: https://lore.kernel.org/r/20220511120532.2228616-7-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 11 May 2022 12:05:29 +0000 (14:05 +0200)]
s390/purgatory: workaround llvm's IAS limitations
llvm's integrated assembler cannot handle immediate values which are
calculated with two local labels:
arch/s390/purgatory/head.S:139:11: error: invalid operand for instruction
aghi %r8,-(.base_crash-purgatory_start)
Workaround this by partially rewriting the code.
Link: https://lore.kernel.org/r/20220511120532.2228616-6-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 11 May 2022 12:05:27 +0000 (14:05 +0200)]
s390/entry: workaround llvm's IAS limitations
llvm's integrated assembler cannot handle immediate values which are
calculated with two local labels:
<instantiation>:3:13: error: invalid operand for instruction
clgfi %r14,.Lsie_done - .Lsie_gmap
Workaround this by adding clang specific code which reads the specific
value from memory. Since this code is within the hot paths of the kernel
and adds an additional memory reference, keep the original code, and add
ifdef'ed code.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20220511120532.2228616-5-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 11 May 2022 12:05:26 +0000 (14:05 +0200)]
s390/alternatives: remove padding generation code
clang fails to handle ".if" statements in inline assembly which are heavily
used in the alternatives code.
To work around this remove this code, and enforce that users of
alternatives must specify original and alternative instruction sequences
which have identical sizes. Add a compile time check with two ".org"
statements similar to arm64.
In result not only clang can handle this, but also quite a lot of code can
be removed.
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1356
Link: https://lore.kernel.org/r/20220511120532.2228616-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 11 May 2022 12:05:25 +0000 (14:05 +0200)]
s390/alternatives: provide identical sized orginal/alternative sequences
Explicitly provide identical sized original/alternative instruction
sequences. This way there is no need for the s390 specific alternatives
infrastructure to generate padding sequences.
The code which generates such sequences will be removed with a follow on
patch.
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220511120532.2228616-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Fri, 13 May 2022 10:42:55 +0000 (12:42 +0200)]
s390/cpumf: add new extended counter set for IBM z16
Export the extended counter set counters of the IBM z16 via sysfs.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Fri, 6 May 2022 09:33:19 +0000 (11:33 +0200)]
s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES
gcc 12 does not (always) optimize away code that should only be generated
if parameters are constant and within in a certain range. This depends on
various obscure kernel config options, however in particular
PROFILE_ALL_BRANCHES can trigger this compile error:
In function ‘__atomic_add_const’,
inlined from ‘__preempt_count_add.part.0’ at ./arch/s390/include/asm/preempt.h:50:3:
./arch/s390/include/asm/atomic_ops.h:80:9: error: impossible constraint in ‘asm’
80 | asm volatile( \
| ^~~
Workaround this by simply disabling the optimization for
PROFILE_ALL_BRANCHES, since the kernel will be so slow, that this
optimization won't matter at all.
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sven Schnelle [Tue, 3 May 2022 07:58:33 +0000 (09:58 +0200)]
s390/stp: clock_delta should be signed
clock_delta is declared as unsigned long in various places. However,
the clock sync delta can be negative. This would add a huge positive
offset in clock_sync_global where clock_delta is added to clk.eitod
which is a 72 bit integer. Declare it as signed long to fix this.
Cc: stable@vger.kernel.org
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sven Schnelle [Mon, 2 May 2022 09:12:09 +0000 (11:12 +0200)]
s390/stp: fix todoff size
The size of the TOD offset field in the stp info response is 64 bits.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter [Wed, 4 May 2022 06:23:51 +0000 (08:23 +0200)]
s390/pai: add support for cryptography counters
PMU device driver perf_pai_crypto supports Processor Activity
Instrumentation (PAI), available with IBM z16:
- maps a full page to lowcore address 0x1500.
- uses CR0 bit 13 to turn PAI crypto counting on and off.
- creates a sample with raw data on each context switch out when
at context switch some mapped counters have a value of nonzero.
This device driver only supports CPU wide context, no task context
is allowed.
Support for counting:
- one or more counters can be specified using
perf stat -e pai_crypto/xxx/
where xxx stands for the counter event name. Multiple invocation
of this command is possible. The counter names are listed in
/sys/devices/pai_crypto/events directory.
- one special counters can be specified using
perf stat -e pai_crypto/CRYPTO_ALL/
which returns the sum of all incremented crypto counters.
- one event pai_crypto/CRYPTO_ALL/ is reserved for sampling.
No multiple invocations are possible. The event collects data at
context switch out and saves them in the ring buffer.
Add qpaci assembly instruction to query supported memory mapped crypto
counters. It returns the number of counters (no holes allowed in that
range).
The PAI crypto counter events are system wide and can not be executed
in parallel. Therefore some restrictions documented in function
paicrypt_busy apply.
In particular event CRYPTO_ALL for sampling must run exclusive.
Only counting events can run in parallel.
PAI crypto counter events can not be created when a CPU hot plug
add is processed. This means a CPU hot plug add does not get
the necessary PAI event to record PAI cryptography counter increments
on the newly added CPU. CPU hot plug remove removes the event and
terminates the counting of PAI counters immediately.
Co-developed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220504062351.2954280-3-tmricht@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sven Schnelle [Wed, 4 May 2022 06:23:50 +0000 (08:23 +0200)]
entry: Rename arch_check_user_regs() to arch_enter_from_user_mode()
arch_check_user_regs() is used at the moment to verify that struct pt_regs
contains valid values when entering the kernel from userspace. s390 needs
a place in the generic entry code to modify a cpu data structure when
switching from userspace to kernel mode. As arch_check_user_regs() is
exactly this, rename it to arch_enter_from_user_mode().
When entering the kernel from userspace, arch_check_user_regs() is
used to verify that struct pt_regs contains valid values. Note that
the NMI codepath doesn't call this function. s390 needs a place in the
generic entry code to modify a cpu data structure when switching from
userspace to kernel mode. As arch_check_user_regs() is exactly this,
rename it to arch_enter_from_user_mode().
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20220504062351.2954280-2-tmricht@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 4 May 2022 13:23:39 +0000 (15:23 +0200)]
s390/compat: cleanup compat_linux.h header file
Remove various declarations from former s390 specific compat system
calls which have been removed with commit
fef747bab3c0 ("s390: use
generic UID16 implementation"). While at it clean up the whole small
header file.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Tue, 3 May 2022 12:59:16 +0000 (14:59 +0200)]
s390/entry: remove broken and not needed code
LLVM's integrated assembler reports the following error when compiling
entry.S:
<instantiation>:38:5: error: unknown token in expression
tm %r8,0x0001 # coming from user space?
The correct instruction would have been tmhh instead of tm.
The current code is doing nothing, since (with gas) it get's
translated to a tm instruction which reads from real address 8, which
again contains always zero, and therefore the conditional code is
never executed.
Note that due to the missing displacement gas translates "%r8" into
"8(%r0)".
Also code inspection reveals that this conditional code is not needed.
Therefore remove it.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Mon, 25 Apr 2022 19:24:56 +0000 (21:24 +0200)]
s390/boot: convert parmarea to C
Convert parmarea to C, which makes it much easier to initialize it. No need
to keep offsets in assembler code in sync with struct parmarea anymore.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 27 Apr 2022 02:14:34 +0000 (04:14 +0200)]
s390/boot: convert initial lowcore to C
Convert initial lowcore to C and use proper defines and structures to
initialize it. This should make the z/VM ipl procedure a bit less magic.
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Wed, 27 Apr 2022 02:14:29 +0000 (04:14 +0200)]
s390/ptrace: move short psw definitions to ptrace header file
The short psw definitions are contained in compat header files, however
short psws are not compat specific. Therefore move the definitions to
ptrace header file. This also gets rid of a compat header include in kvm
code.
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Sun, 24 Apr 2022 14:02:21 +0000 (16:02 +0200)]
s390/head: initialize all new psws
Initialize all new psws with disabled wait psws, except for the restart new
psw. This way every unexpected exception, svc, machine check, or interrupt
is handled properly.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Sun, 24 Apr 2022 13:52:01 +0000 (15:52 +0200)]
s390/boot: change initial program check handler to disabled wait psw
The program check handler of the kernel image points to
startup_pgm_check_handler. However an early program check which happens
while loading the kernel image will jump to potentially random code, since
the code of the program check handler is not yet loaded; leading to a
program check loop.
Therefore initialize it to a disabled wait psw and let the startup code set
the proper psw when everything is in memory.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Sun, 24 Apr 2022 13:44:05 +0000 (15:44 +0200)]
s390/head: adjust iplstart entry point
Move iplstart entry point to 0x200 again, instead of the middle of the ipl
code. This way even the comment describing the ccw program is correct
again.
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Sat, 23 Apr 2022 19:31:22 +0000 (21:31 +0200)]
s390/boot: get rid of startup archive
The final kernel image is created by linking decompressor object files with
a startup archive. The startup archive file however does not contain only
optional code and data which can be discarded if not referenced. It also
contains mandatory object data like head.o which must never be discarded,
even if not referenced.
Move the decompresser code and linker script to the boot directory and get
rid of the startup archive so everything is kept during link time.
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Tue, 3 May 2022 08:48:41 +0000 (10:48 +0200)]
s390/vx: remove comments from macros which break LLVM's IAS
LLVM's integrated assembler does not like comments within macros:
<instantiation>:3:19: error: too many positional arguments
GR_NUM b2, 1 /* Base register */
^
Remove them, since they are obvious anyway.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Sun, 1 May 2022 19:05:59 +0000 (21:05 +0200)]
s390/extable: prefer local labels in .set directives
Use local labels in .set directives to avoid potential compile errors
with LTO + clang. See commit
334865b2915c ("x86/extable: Prefer local
labels in .set directives") for further details.
Since s390 doesn't support LTO currently this doesn't fix a real bug
for now, but helps to avoid problems as soon as required pieces have
been added to llvm.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens [Sun, 1 May 2022 18:55:05 +0000 (20:55 +0200)]
s390/nospec: prefer local labels in .set directives
Use local labels in .set directives to avoid potential compile errors
with LTO + clang. See commit
334865b2915c ("x86/extable: Prefer local
labels in .set directives") for further details.
Since s390 doesn't support LTO currently this doesn't fix a real bug
for now, but helps to avoid problems as soon as required pieces have
been added to llvm.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Julia Lawall [Sat, 30 Apr 2022 19:11:19 +0000 (21:11 +0200)]
s390/hypfs: fix typos in comments
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220430191122.8667-5-Julia.Lawall@inria.fr
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Julia Lawall [Sat, 30 Apr 2022 19:11:16 +0000 (21:11 +0200)]
s390/crypto: fix typos in comments
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220430191122.8667-2-Julia.Lawall@inria.fr
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Guilherme G. Piccoli [Wed, 27 Apr 2022 22:49:07 +0000 (19:49 -0300)]
s390/consoles: improve panic notifiers reliability
Currently many console drivers for s390 rely on panic/reboot notifiers
to invoke callbacks on these events. The panic() function disables local
IRQs, secondary CPUs and preemption, so callbacks invoked on panic are
effectively running in atomic context.
Happens that most of these console callbacks from s390 doesn't take the
proper care with regards to atomic context, like taking spinlocks that
might be taken in other function/CPU and hence will cause a lockup
situation.
The goal for this patch is to improve the notifiers reliability, acting
on 4 console drivers, as detailed below:
(1) con3215: changed a regular spinlock to the trylock alternative.
(2) con3270: also changed a regular spinlock to its trylock counterpart,
but here we also have another problem: raw3270_activate_view() takes a
different spinlock. So, we worked a helper to validate if this other lock
is safe to acquire, and if so, raw3270_activate_view() should be safe.
Notice though that there is a functional change here: it's now possible
to continue the notifier code [reaching con3270_wait_write() and
con3270_rebuild_update()] without executing raw3270_activate_view().
(3) sclp: a global lock is used heavily in the functions called from
the notifier, so we added a check here - if the lock is taken already,
we just bail-out, preventing the lockup.
(4) sclp_vt220: same as (3), a lock validation was added to prevent the
potential lockup problem.
Besides (1)-(4), we also removed useless void functions, adding the
code called from the notifier inside its own body, and changed the
priority of such notifiers to execute late, since they are "heavyweight"
for the panic environment, so we aim to reduce risks here.
Changed return values to NOTIFY_DONE as well, the standard one.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Link: https://lore.kernel.org/r/20220427224924.592546-14-gpiccoli@igalia.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Pingfan Liu [Fri, 22 Apr 2022 10:02:12 +0000 (18:02 +0800)]
s390/irq: utilize RCU instead of irq_lock_sparse() in show_msi_interrupt()
As demonstrated by commit
74bdf7815dfb ("genirq: Speedup
show_interrupts()"), irq_desc can be accessed safely in RCU read section.
Hence here resorting to rcu read lock to get rid of irq_lock_sparse().
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Link: https://lore.kernel.org/r/20220422100212.22666-1-kernelfans@gmail.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Ilya Leoshkevich [Tue, 19 Apr 2022 15:40:29 +0000 (17:40 +0200)]
s390: add KCSAN instrumentation to barriers and spinlocks
test_barrier fails on s390 because of the missing KCSAN instrumentation
for several synchronization primitives.
Add it to barriers by defining __mb(), __rmb(), __wmb(), __dma_rmb()
and __dma_wmb(), and letting the common code in asm-generic/barrier.h
do the rest.
Spinlocks require instrumentation only on the unlock path; notify KCSAN
that the CPU cannot move memory accesses outside of the spin lock. In
reality it also cannot move stores inside of it, but this is not
important and can be omitted.
Reported-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Niklas Schnelle [Fri, 25 Feb 2022 08:45:24 +0000 (09:45 +0100)]
s390/pci: add error record for CC 2 retries
Currently it is not detectable from within Linux when PCI instructions
are retried because of a busy condition. Detecting such conditions and
especially how long they lasted can however be quite useful in problem
determination. This patch enables this by adding an s390dbf error log
when a CC 2 is first encountered as well as after the retried
instruction.
Despite being unlikely it may be possible that these added debug
messages drown out important other messages so allow setting the debug
level in zpci_err_insn*() and set their level to 1 so they can be
filtered out if need be.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Niklas Schnelle [Thu, 24 Feb 2022 14:45:33 +0000 (15:45 +0100)]
s390/pci: add PCI access type and length to error records
Currently when a PCI instruction returns a non-zero condition code it
can be very hard to tell from the s390dbf logs what kind of instruction
was executed. In case of PCI memory I/O (MIO) instructions it is even
impossible to tell if we attempted a load, store or block store or how
large the access was because only the address is logged.
Improve this by adding an indicator byte for the instruction type to the
error record and also store the length of the access for MIO
instructions where this can not be deduced from the request.
We use the following indicator values:
- 'l': PCI load
- 's': PCI store
- 'b': PCI store block
- 'L': PCI load (MIO)
- 'S': PCI store (MIO)
- 'B': PCI store block (MIO)
- 'M': MPCIFC
- 'R': RPCIT
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Niklas Schnelle [Fri, 1 Apr 2022 12:04:14 +0000 (14:04 +0200)]
s390/pci: don't log availability events as errors
Availability events are logged in s390dbf in s390dbf/pci_error/hex_ascii
even though they don't indicate an error condition.
They have also become redundant as commit
6526a597a2e85 ("s390/pci: add
simpler s390dbf traces for events") added an s390dbf/pci_msg/sprintf log
entry for availability events which contains all non reserved fields of
struct zpci_ccdf_avail. On the other hand the availability entries in
the error log make it easy to miss actual errors and may even overwrite
error entries if the message buffer wraps.
Thus simply remove the availability events from the error log thereby
establishing the rule that any content in s390dbf/pci_error indicates
some kind of error.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Niklas Schnelle [Fri, 18 Mar 2022 15:25:31 +0000 (16:25 +0100)]
s390/pci: make better use of zpci_dbg() levels
While the zpci_dbg() macro offers a level parameter this is currently
largely unused. The only instance with higher importance than 3 is the
UID checking change debug message which is not actually more important
as the UID uniqueness guarantee is already exposed in sysfs so this
should rather be 3 as well.
On the other hand the "add ..." message which shows what devices are
visible at the lowest level is essential during problem determination.
By setting its level to 1, lowering the debug level can act as a filter
to only show the available functions.
On the error side the default level is set to 6 while all existing
messages are printed at level 0. This is inconsistent and means there is
no room for having messages be invisible on the default level so instead
set the default level to 3 like for errors matching the default for
debug messages.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Huth [Wed, 13 Apr 2022 09:44:16 +0000 (11:44 +0200)]
s390/vfio-ap: remove superfluous MODULE_DEVICE_TABLE declaration
The vfio_ap module tries to register for the vfio_ap bus - but that's
the interface that it provides itself, so this does not make much sense,
thus let's simply drop this statement now.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20220413094416.412114-1-thuth@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sven Schnelle [Wed, 6 Apr 2022 07:17:21 +0000 (09:17 +0200)]
s390/vdso: add vdso randomization
Randomize the address of vdso if randomize_va_space is enabled.
Note that this keeps the vdso address on the same PMD as the stack
to avoid allocating an extra page table just for vdso.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sven Schnelle [Wed, 6 Apr 2022 06:44:49 +0000 (08:44 +0200)]
s390/vdso: map vdso above stack
In the current code vdso is mapped below the stack. This is
problematic when programs mapped to the top of the address space
are allocating a lot of memory, because the heap will clash with
the vdso. To avoid this map the vdso above the stack and move
STACK_TOP so that it all fits into three level paging.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sven Schnelle [Wed, 6 Apr 2022 06:35:26 +0000 (08:35 +0200)]
s390/vdso: move vdso mapping to its own function
This is a preparation patch for adding vdso randomization to s390.
It adds a function vdso_size(), which will be used later in calculating
the STACK_TOP value. It also moves the vdso mapping into a new function
vdso_map(), to keep the code similar to other architectures.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sven Schnelle [Wed, 6 Apr 2022 06:01:24 +0000 (08:01 +0200)]
s390/mmap: increase stack/mmap gap to 128MB
This basically reverts commit
9e78a13bfb16 ("[S390] reduce miminum
gap between stack and mmap_base"). 32MB is not enough space
between stack and mmap for some programs. Given that compat
task aren't common these days, lets revert back to 128MB.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Harald Freudenberger [Mon, 4 Apr 2022 15:12:37 +0000 (17:12 +0200)]
s390/zcrypt: code cleanup
This patch tries to fix as much as possible of the
checkpatch.pl --strict findings:
CHECK: Logical continuations should be on the previous line
CHECK: No space is necessary after a cast
CHECK: Alignment should match open parenthesis
CHECK: 'useable' may be misspelled - perhaps 'usable'?
WARNING: Possible repeated word: 'is'
CHECK: spaces preferred around that '*' (ctx:VxV)
CHECK: Comparison to NULL could be written "!msg"
CHECK: Prefer kzalloc(sizeof(*zc)...) over kzalloc(sizeof(struct...)...)
CHECK: Unnecessary parentheses around resp_type->work
CHECK: Avoid CamelCase: <xcRB>
There is no functional change comming with this patch, only
code cleanup, renaming, whitespaces, indenting, ... but no
semantic change in any way. Also the API (zcrypt and pkey
header file) is semantically unchanged.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Jürgen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Harald Freudenberger [Fri, 1 Apr 2022 14:59:09 +0000 (16:59 +0200)]
s390/zcrypt: cleanup CPRB struct definitions
This patch does a little cleanup on the CPRBX struct
in zcrypt.h and the redundant CPRB struct definition in
zcrypt_msgtype6.c. Especially some of the misleading
fields from the CPRBX struct have been removed.
There is no semantic change coming with this patch.
The field names changed in the XCRB struct are only related
to reserved fields which should never been used.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Jürgen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Harald Freudenberger [Wed, 23 Mar 2022 11:13:32 +0000 (12:13 +0100)]
s390/ap: uevent on apmask/aqpmask change
This patch introduces user space notifications for changes
on the apmask or aqmask attributes. So it could be possible
to write a udev rule to load/unload the vfio_ap kernel module
based on changes of these masks.
On chance of the apmask or aqmask an AP change event will
be produced with an uevent environment variable showing
the new APMASK or AQMASK mask.
So a change on the apmask triggers an uvevent like this:
KERNEL[490.160396] change /devices/ap (ap)
ACTION=change
DEVPATH=/devices/ap
SUBSYSTEM=ap
APMASK=0xffffffdfffffffffffffffffffffffffffffffffffffffffffffffffffffffff
SEQNUM=13367
and a change on the aqmask looks like this:
KERNEL[283.217642] change /devices/ap (ap)
ACTION=change
DEVPATH=/devices/ap
SUBSYSTEM=ap
AQMASK=0xfbffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
SEQNUM=13348
Only real changes to the masks are processed - the old and
new masks are compared and no action is done if the values
are equal (and thus no uevent). The emit of the uevent is
the very last action done when a mask change is processed.
However, there is no guarantee that all unbind/bind actions
caused by the apmask/aqmask changes are completed when the
apmask/aqmask change uevent is received in userspace.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Jürgen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Haowen Bai [Thu, 7 Apr 2022 02:16:47 +0000 (10:16 +0800)]
s390/cio: simplify the calculation of variables
Fix the following coccicheck warnings:
./arch/s390/include/asm/scsw.h:695:47-49: WARNING
!A || A && B is equivalent to !A || B
I apply a readable version just to get rid of a warning.
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Link: https://lore.kernel.org/r/1649297808-5048-1-git-send-email-baihaowen@meizu.com
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Alexander Gordeev [Wed, 30 Mar 2022 17:50:16 +0000 (19:50 +0200)]
s390/smp: sort out physical vs virtual CPU0 lowcore pointer
SPX instruction called from set_prefix() expects physical
address of the lowcore to be installed, but instead the
virtual address is passed.
Note: this does not fix a bug currently, since virtual and
physical addresses are identical.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Harald Freudenberger [Thu, 24 Mar 2022 12:23:53 +0000 (13:23 +0100)]
s390/zcrypt: add display of ASYM master key verification pattern
This patch extends the sysfs attribute mkvps for CCA cards
to show the states and master key verification patterns for
the old, current and new ASYM master key registers.
With this patch now all relevant master key verification
patterns related to a CCA HSM are available with the mkvps
sysfs attribute. This is a requirement for some exploiters
like the kubernetes cex plugin or initrd code needing to
verify the master key verification patterns on HSMs before
use.
A sample output:
cat /sys/devices/ap/card04/04.0005/mkvps
AES NEW: empty 0x0000000000000000
AES CUR: valid 0xe9a49a58cd039bed
AES OLD: valid 0x7d10d17bc8a409c4
APKA NEW: empty 0x0000000000000000
APKA CUR: valid 0x5f2f27aaa2d59b4a
APKA OLD: valid 0x82a5e2cd5030d5ec
ASYM NEW: empty 0x00000000000000000000000000000000
ASYM CUR: valid 0x650c25a89c27e716d0e692b6c83f10e5
ASYM OLD: valid 0xf8ae2acf8bfc57f0a0957c732c16078b
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Jörg Schmidbauer <jschmidb@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Alexander Egorenkov [Fri, 3 Sep 2021 09:08:48 +0000 (11:08 +0200)]
s390/kexec: set end-of-ipl flag in last diag308 call
If the facility IPL-complete-control is present then the last diag308
call made by kexec shall set the end-of-ipl flag in the subcode register
to signal the hypervisor that this is the last diag308 call made by Linux.
Only the diag308 calls made during a regular kexec need to set
the end-of-ipl flag, in all other cases the hypervisor will ignore it.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Alexander Egorenkov [Fri, 3 Sep 2021 07:39:48 +0000 (09:39 +0200)]
s390/sclp: add detection of IPL-complete-control facility
The presence of the IPL-complete-control facility can be derived
from the hypervisor's SCLP info response.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Linus Torvalds [Sun, 24 Apr 2022 21:51:22 +0000 (14:51 -0700)]
Linux 5.18-rc4
Linus Torvalds [Sun, 24 Apr 2022 20:28:06 +0000 (13:28 -0700)]
Merge tag 'sched_urgent_for_v5.18_rc4' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
- Fix a corner case when calculating sched runqueue variables
That fix also removes a check for a zero divisor in the code, without
mentioning it. Vincent clarified that it's ok after I whined about it:
https://lore.kernel.org/all/CAKfTPtD2QEyZ6ADd5WrwETMOX0XOwJGnVddt7VHgfURdqgOS-Q@mail.gmail.com/
* tag 'sched_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/pelt: Fix attach_entity_load_avg() corner case
Linus Torvalds [Sun, 24 Apr 2022 19:11:20 +0000 (12:11 -0700)]
Merge tag 'powerpc-5.18-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Partly revert a change to our timer_interrupt() that caused lockups
with high res timers disabled.
- Fix a bug in KVM TCE handling that could corrupt kernel memory.
- Two commits fixing Power9/Power10 perf alternative event selection.
Thanks to Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic
Barrat, Madhavan Srinivasan, Miguel Ojeda, and Nicholas Piggin.
* tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf: Fix 32bit compile
powerpc/perf: Fix power10 event alternatives
powerpc/perf: Fix power9 event alternatives
KVM: PPC: Fix TCE handling for VFIO
powerpc/time: Always set decrementer in timer_interrupt()
Linus Torvalds [Sun, 24 Apr 2022 19:01:16 +0000 (12:01 -0700)]
Merge tag 'perf_urgent_for_v5.18_rc4' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Add Sapphire Rapids CPU support
- Fix a perf vmalloc-ed buffer mapping error (PERF_USE_VMALLOC in use)
* tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support
perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
Linus Torvalds [Sun, 24 Apr 2022 18:24:48 +0000 (11:24 -0700)]
Merge tag 'edac_urgent_for_v5.18_rc4' of git://git./linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
- Read the reported error count from the proper register on
synopsys_edac
* tag 'edac_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/synopsys: Read the error count from the correct register
Linus Torvalds [Fri, 22 Apr 2022 18:41:38 +0000 (11:41 -0700)]
kvmalloc: use vmalloc_huge for vmalloc allocations
Since commit
559089e0a93d ("vmalloc: replace VM_NO_HUGE_VMAP with
VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc is an
opt-in strategy, because it caused a number of problems that weren't
noticed until x86 enabled it too.
One of the issues was fixed by Nick Piggin in commit
3b8000ae185c
("mm/vmalloc: huge vmalloc backing pages should be split rather than
compound"), but I'm still worried about page protection issues, and
VM_FLUSH_RESET_PERMS in particular.
However, like the hash table allocation case (commit
f2edd118d02d:
"page_alloc: use vmalloc_huge for large system hash"), the use of
kvmalloc() should be safe from any such games, since the returned
pointer might be a SLUB allocation, and as such no user should
reasonably be using it in any odd ways.
We also know that the allocations are fairly large, since it falls back
to the vmalloc case only when a kmalloc() fails. So using a hugepage
mapping seems both safe and relevant.
This patch does show a weakness in the opt-in strategy: since the opt-in
flag is in the 'vm_flags', not the usual gfp_t allocation flags, very
few of the usual interfaces actually expose it.
That's not much of an issue in this case that already used one of the
fairly specialized low-level vmalloc interfaces for the allocation, but
for a lot of other vmalloc() users that might want to opt in, it's going
to be very inconvenient.
We'll either have to fix any compatibility problems, or expose it in the
gfp flags (__GFP_COMP would have made a lot of sense) to allow normal
vmalloc() users to use hugepage mappings. That said, the cases that
really matter were probably already taken care of by the hash tabel
allocation.
Link: https://lore.kernel.org/all/20220415164413.2727220-1-song@kernel.org/
Link: https://lore.kernel.org/all/CAHk-=whao=iosX1s5Z4SF-ZGa-ebAukJoAdUJFk5SPwnofV+Vg@mail.gmail.com/
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Song Liu [Fri, 15 Apr 2022 16:44:11 +0000 (09:44 -0700)]
page_alloc: use vmalloc_huge for large system hash
Use vmalloc_huge() in alloc_large_system_hash() so that large system
hash (>= PMD_SIZE) could benefit from huge pages.
Note that vmalloc_huge only allocates huge pages for systems with
HAVE_ARCH_HUGE_VMALLOC.
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 24 Apr 2022 00:16:10 +0000 (17:16 -0700)]
Merge tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd server fixes from Steve French:
- cap maximum sector size reported to avoid mount problems
- reference count fix
- fix filename rename race
* tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION
ksmbd: increment reference count of parent fp
ksmbd: remove filename in ksmbd_file
Linus Torvalds [Sat, 23 Apr 2022 23:24:30 +0000 (16:24 -0700)]
Merge tag 'arc-5.18-rc4' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Assorted fixes
* tag 'arc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: remove redundant READ_ONCE() in cmpxchg loop
ARC: atomic: cleanup atomic-llsc definitions
arc: drop definitions of pgd_index() and pgd_offset{, _k}() entirely
ARC: dts: align SPI NOR node name with dtschema
ARC: Remove a redundant memset()
ARC: fix typos in comments
ARC: entry: fix syscall_trace_exit argument
Linus Torvalds [Sat, 23 Apr 2022 20:58:18 +0000 (13:58 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One fix for an information leak caused by copying a buffer to
userspace without checking for error first in the sr driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sr: Do not leak information in ioctl
Linus Torvalds [Sat, 23 Apr 2022 20:53:21 +0000 (13:53 -0700)]
Merge tag 'for-linus-5.18-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"A simple cleanup patch and a refcount fix for Xen on Arm"
* tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
arm/xen: Fix some refcount leaks
xen: Convert kmap() to kmap_local_page()
Linus Torvalds [Sat, 23 Apr 2022 16:57:30 +0000 (09:57 -0700)]
Merge tag 'drm-fixes-2022-04-23' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"Maarten was away, so Maxine stepped up and sent me the drm-fixes
merge, so no point leaving it for another week.
The big change is an OF revert around bridge/panels, it may have some
driver fallout, but hopefully this revert gets them shook out in the
next week easier.
Otherwise it's a bunch of locking/refcounts across drivers, a radeon
dma_resv logic fix and some raspberry pi panel fixes.
panel:
- revert of patch that broke panel/bridge issues
dma-buf:
- remove unused header file.
amdgpu:
- partial revert of locking change
radeon:
- fix dma_resv logic inversion
panel:
- pi touchscreen panel init fixes
vc4:
- build fix
- runtime pm refcount fix
vmwgfx:
- refcounting fix"
* tag 'drm-fixes-2022-04-23' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: partial revert "remove ctx->lock" v2
Revert "drm: of: Lookup if child node has panel or bridge"
Revert "drm: of: Properly try all possible cases for bridge/panel detection"
drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
drm/vmwgfx: Fix gem refcounting and memory evictions
drm/vc4: Fix build error when CONFIG_DRM_VC4=y && CONFIG_RASPBERRYPI_FIRMWARE=m
drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare
drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised
dma-buf-map: remove renamed header file
drm/radeon: fix logic inversion in radeon_sync_resv
Linus Torvalds [Sat, 23 Apr 2022 16:52:07 +0000 (09:52 -0700)]
Merge tag 'input-for-v5.18-rc3' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a new set of keycodes to be used by marine navigation systems
- minor fixes to omap4-keypad and cypress-sf drivers
* tag 'input-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: add Marine Navigation Keycodes
Input: omap4-keypad - fix pm_runtime_get_sync() error checking
Input: cypress-sf - register a callback to disable the regulators
Linus Torvalds [Sat, 23 Apr 2022 16:46:44 +0000 (09:46 -0700)]
Merge tag 'block-5.18-2022-04-22' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Just two small regression fixes for bcache"
* tag 'block-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
bcache: fix wrong bdev parameter when calling bio_alloc_clone() in do_bio_hook()
bcache: put bch_bio_map() back to correct location in journal_write_unlocked()
Linus Torvalds [Sat, 23 Apr 2022 16:42:13 +0000 (09:42 -0700)]
Merge tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Just two small fixes - one fixing a potential leak for the iovec for
larger requests added in this cycle, and one fixing a theoretical leak
with CQE_SKIP and IOPOLL"
* tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
io_uring: fix leaks on IOPOLL and CQE_SKIP
io_uring: free iovec if file assignment fails
Linus Torvalds [Sat, 23 Apr 2022 16:36:23 +0000 (09:36 -0700)]
Merge tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix header include for LLVM >= 14 when building with libclang.
- Allow access to 'data_src' for auxtrace in 'perf script' with ARM SPE
perf.data files, fixing processing data with such attributes.
- Fix error message for test case 71 ("Convert perf time to TSC") on
s390, where it is not supported.
* tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf test: Fix error message for test case 71 on s390, where it is not supported
perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
perf script: Always allow field 'data_src' for auxtrace
perf clang: Fix header include for LLVM >= 14
Randy Dunlap [Sat, 23 Apr 2022 03:25:17 +0000 (20:25 -0700)]
sparc: cacheflush_32.h needs struct page
Add a struct page forward declaration to cacheflush_32.h.
Fixes this build warning:
CC drivers/crypto/xilinx/zynqmp-sha.o
In file included from arch/sparc/include/asm/cacheflush.h:11,
from include/linux/cacheflush.h:5,
from drivers/crypto/xilinx/zynqmp-sha.c:6:
arch/sparc/include/asm/cacheflush_32.h:38:37: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
38 | void sparc_flush_page_to_ram(struct page *page);
Exposed by commit
0e03b8fd2936 ("crypto: xilinx - Turn SHA into a
tristate and allow COMPILE_TEST") but not Fixes: that commit because the
underlying problem is older.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: sparclinux@vger.kernel.org
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Airlie [Sat, 23 Apr 2022 05:00:33 +0000 (15:00 +1000)]
Merge tag 'drm-misc-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Two fixes for the raspberrypi panel initialisation, one fix for a logic
inversion in radeon, a build and pm refcounting fix for vc4, two reverts
for drm_of_get_bridge that caused a number of regression and a locking
regression for amdgpu.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220422084403.2xrhf3jusdej5yo4@houat
Linus Torvalds [Sat, 23 Apr 2022 01:18:27 +0000 (18:18 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix some syzbot-detected bugs, as well as other bugs found by I/O
injection testing.
Change ext4's fallocate to consistently drop set[ug]id bits when an
fallocate operation might possibly change the user-visible contents of
a file.
Also, improve handling of potentially invalid values in the the
s_overhead_cluster superblock field to avoid ext4 returning a negative
number of free blocks"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix a potential race while discarding reserved buffers after an abort
ext4: update the cached overhead value in the superblock
ext4: force overhead calculation if the s_overhead_cluster makes no sense
ext4: fix overhead calculation to account for the reserved gdt blocks
ext4, doc: fix incorrect h_reserved size
ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
ext4: fix use-after-free in ext4_search_dir
ext4: fix bug_on in start_this_handle during umount filesystem
ext4: fix symlink file size not match to file content
ext4: fix fallocate to use file_modified to update permissions consistently
Linus Torvalds [Sat, 23 Apr 2022 01:09:49 +0000 (18:09 -0700)]
Merge tag 'ata-5.18-rc4' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single fix to avoid a NULL pointer dereference in the pata_marvell
driver with adapters not supporting DMA, from Zheyu"
* tag 'ata-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_marvell: Check the 'bmdma_addr' beforing reading
Linus Torvalds [Sat, 23 Apr 2022 00:58:36 +0000 (17:58 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The main and larger change here is a workaround for AMD's lack of
cache coherency for encrypted-memory guests.
I have another patch pending, but it's waiting for review from the
architecture maintainers.
RISC-V:
- Remove 's' & 'u' as valid ISA extension
- Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
x86:
- Fix NMI watchdog in guests on AMD
- Fix for SEV cache incoherency issues
- Don't re-acquire SRCU lock in complete_emulated_io()
- Avoid NULL pointer deref if VM creation fails
- Fix race conditions between APICv disabling and vCPU creation
- Bugfixes for disabling of APICv
- Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
selftests:
- Do not use bitfields larger than 32-bits, they differ between GCC
and clang"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: selftests: introduce and use more page size-related constants
kvm: selftests: do not use bitfields larger than 32-bits for PTEs
KVM: SEV: add cache flush to solve SEV cache incoherency issues
KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
KVM: selftests: Silence compiler warning in the kvm_page_table_test
KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog
x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
KVM: SPDX style and spelling fixes
KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled
KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled
KVM: Initialize debugfs_dentry when a VM is created to avoid NULL deref
KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
KVM: RISC-V: Use kvm_vcpu.srcu_idx, drop RISC-V's unnecessary copy
KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()
RISC-V: KVM: Restrict the extensions that can be disabled
RISC-V: KVM: Remove 's' & 'u' as valid ISA extension
Thomas Richter [Wed, 20 Apr 2022 06:29:21 +0000 (08:29 +0200)]
perf test: Fix error message for test case 71 on s390, where it is not supported
Test case 71 'Convert perf time to TSC' is not supported on s390.
Subtest 71.1 is skipped with the correct message, but subtest 71.2 is
not skipped and fails.
The root cause is function evlist__open() called from
test__perf_time_to_tsc(). evlist__open() returns -ENOENT because the
event cycles:u is not supported by the selected PMU, for example
platform s390 on z/VM or an x86_64 virtual machine.
The PMU driver returns -ENOENT in this case. This error is leads to the
failure.
Fix this by returning TEST_SKIP on -ENOENT.
Output before:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: FAILED!
Output after:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: Skip (perf_read_tsc_conversion is not supported)
This also happens on an x86_64 virtual machine:
# uname -m
x86_64
$ ./perf test -F 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : FAILED!
$
Committer testing:
Continues to work on x86_64:
$ perf test 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : Ok
$
Fixes:
290fa68bdc458863 ("perf test tsc: Fix error message when not supported")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chengdong Li <chengdongli@tencent.com>
Cc: chengdongli@tencent.com
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220420062921.1211825-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Thu, 14 Apr 2022 12:32:01 +0000 (20:32 +0800)]
perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
Since commit
bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't report result if the PERF_SAMPLE_DATA_SRC bit is missed in sample
type.
The commit
ffab487052054162 ("perf: arm-spe: Fix perf report
--mem-mode") partially fixes the issue. It adds PERF_SAMPLE_DATA_SRC
bit for Arm SPE event, this allows the perf data file generated by
kernel v5.18-rc1 or later version can be reported properly.
On the other hand, perf tool still fails to be backward compatibility
for a data file recorded by an older version's perf which contains Arm
SPE trace data. This patch is a workaround in reporting phase, when
detects ARM SPE PMU event and without PERF_SAMPLE_DATA_SRC bit, it will
force to set the bit in the sample type and give a warning info.
Fixes:
bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/20220414123201.842754-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Sun, 17 Apr 2022 11:48:37 +0000 (19:48 +0800)]
perf script: Always allow field 'data_src' for auxtrace
If use command 'perf script -F,+data_src' to dump memory samples with
Arm SPE trace data, it reports error:
# perf script -F,+data_src
Samples for 'dummy:u' event do not have DATA_SRC attribute set. Cannot print 'data_src' field.
This is because the 'dummy:u' event is absent DATA_SRC bit in its sample
type, so if a file contains AUX area tracing data then always allow
field 'data_src' to be selected as an option for perf script.
Fixes:
e55ed3423c1bb29f ("perf arm-spe: Synthesize memory event")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220417114837.839896-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Guilherme Amadio [Sat, 16 Apr 2022 07:45:55 +0000 (09:45 +0200)]
perf clang: Fix header include for LLVM >= 14
The header TargetRegistry.h has moved in LLVM/clang 14.
Committer notes:
The problem as noticed when building in ubuntu:22.04:
90 98.61 ubuntu:22.04 : FAIL gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
util/c++/clang.cpp:23:10: fatal error: llvm/Support/TargetRegistry.h: No such file or directory
23 | #include "llvm/Support/TargetRegistry.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Fixed after applying this patch.
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Guilherme Amadio <amadio@gentoo.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://twitter.com/GuilhermeAmadio/status/1514970524232921088
Link: http://lore.kernel.org/lkml/Ylp0M/VYgHOxtcnF@gentoo.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mario Limonciello [Fri, 22 Apr 2022 13:14:52 +0000 (08:14 -0500)]
gpio: Request interrupts after IRQ is initialized
Commit
5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members
before initialization") attempted to fix a race condition that lead to a
NULL pointer, but in the process caused a regression for _AEI/_EVT
declared GPIOs.
This manifests in messages showing deferred probing while trying to
allocate IRQs like so:
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0000 to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x002C to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003D to IRQ, err -517
[ .. more of the same .. ]
The code for walking _AEI doesn't handle deferred probing and so this
leads to non-functional GPIO interrupts.
Fix this issue by moving the call to `acpi_gpiochip_request_interrupts`
to occur after gc->irc.initialized is set.
Fixes:
5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members before initialization")
Link: https://lore.kernel.org/linux-gpio/BL1PR12MB51577A77F000A008AA694675E2EF9@BL1PR12MB5157.namprd12.prod.outlook.com/
Link: https://bugzilla.suse.com/show_bug.cgi?id=1198697
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215850
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1979
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1976
Reported-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Shreeya Patel <shreeya.patel@collabora.com>
Tested-By: Samuel Čavoj <samuel@cavoj.net>
Tested-By: lukeluk498@gmail.com Link:
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-and-tested-by: Takashi Iwai <tiwai@suse.de>
Cc: Shreeya Patel <shreeya.patel@collabora.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 22 Apr 2022 20:53:14 +0000 (13:53 -0700)]
Merge tag 'riscv-for-linus-5.18-rc4' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes Palmer Dabbelt:
- A pair of build fixes for the recent cpuidle driver
- A fix for systems without sv57 that manifests as a crash
early in boot
* tag 'riscv-for-linus-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: cpuidle: fix Kconfig select for RISCV_SBI_CPUIDLE
RISC-V: mm: Fix set_satp_mode() for platform not having Sv57
cpuidle: riscv: support non-SMP config
Linus Torvalds [Fri, 22 Apr 2022 20:49:26 +0000 (13:49 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There's no real pattern to the fixes, but the main one fixes our
pmd_leaf() definition to resolve a NULL dereference on the migration
path.
- Fix PMU event validation in the absence of any event counters
- Fix allmodconfig build using clang in conjunction with binutils
- Fix definitions of pXd_leaf() to handle PROT_NONE entries
- More typo fixes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: fix p?d_leaf()
arm64: fix typos in comments
arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
arm_pmu: Validate single/group leader events
Miaoqian Lin [Wed, 20 Apr 2022 01:49:13 +0000 (01:49 +0000)]
arm/xen: Fix some refcount leaks
The of_find_compatible_node() function returns a node pointer with
refcount incremented, We should use of_node_put() on it when done
Add the missing of_node_put() to release the refcount.
Fixes:
9b08aaa3199a ("ARM: XEN: Move xen_early_init() before efi_init()")
Fixes:
b2371587fe0c ("arm/xen: Read extended regions from DT and init Xen resource")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Linus Torvalds [Fri, 22 Apr 2022 20:31:39 +0000 (13:31 -0700)]
Merge tag 'xarray-5.18a' of git://git.infradead.org/users/willy/xarray
Pull xarray fixes from Matthew Wilcox:
"Syzbot found a nasty race between large page splitting and page
lookup. Details in the commit log, but fortunately it has a reliable
reproducer. I thought it better to send this one to you straight away.
Also fix the test suite build for kmem_cache_alloc_lru()"
* tag 'xarray-5.18a' of git://git.infradead.org/users/willy/xarray:
XArray: Disallow sibling entries of nodes
tools: Add kmem_cache_alloc_lru()
Linus Torvalds [Fri, 22 Apr 2022 20:26:11 +0000 (13:26 -0700)]
Merge tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four fixes, two of them for stable:
- fcollapse fix
- reconnect lock fix
- DFS oops fix
- minor cleanup patch"
* tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: destage any unwritten data to the server before calling copychunk_write
cifs: use correct lock type in cifs_reconnect()
cifs: fix NULL ptr dereference in refresh_mounts()
cifs: Use kzalloc instead of kmalloc/memset
Linus Torvalds [Fri, 22 Apr 2022 20:17:19 +0000 (13:17 -0700)]
Merge tag 'fs.fixes.v5.18-rc4' of git://git./linux/kernel/git/brauner/linux
Pull mount_setattr fix from Christian Brauner:
"The recent cleanup in
e257039f0fc7 ("mount_setattr(): clean the
control flow and calling conventions") switched the mount attribute
codepaths from do-while to for loops as they are more idiomatic when
walking mounts.
However, we did originally choose do-while constructs because if we
request a mount or mount tree to be made read-only we need to hold
writers in the following way: The mount attribute code will grab
lock_mount_hash() and then call mnt_hold_writers() which will
_unconditionally_ set MNT_WRITE_HOLD on the mount.
Any callers that need write access have to call mnt_want_write(). They
will immediately see that MNT_WRITE_HOLD is set on the mount and the
caller will then either spin (on non-preempt-rt) or wait on
lock_mount_hash() (on preempt-rt).
The fact that MNT_WRITE_HOLD is set unconditionally means that once
mnt_hold_writers() returns we need to _always_ pair it with
mnt_unhold_writers() in both the failure and success paths.
The do-while constructs did take care of this. But Al's change to a
for loop in the failure path stops on the first mount we failed to
change mount attributes _without_ going into the loop to call
mnt_unhold_writers().
This in turn means that once we failed to make a mount read-only via
mount_setattr() - i.e. there are already writers on that mount - we
will block any writers indefinitely. Fix this by ensuring that the for
loop always unsets MNT_WRITE_HOLD including the first mount we failed
to change to read-only. Also sprinkle a few comments into the cleanup
code to remind people about what is happening including myself. After
all, I didn't catch it during review.
This is only relevant on mainline and was reported by syzbot. Details
about the syzbot reports are all in the commit message"
* tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: unset MNT_WRITE_HOLD on failure
Linus Torvalds [Fri, 22 Apr 2022 20:11:38 +0000 (13:11 -0700)]
Merge tag 'sound-5.18-rc4' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"At this time, the majority of changes are for pending ASoC fixes while
a few usual HD-audio and USB-audio quirks are found.
Almost all patches are small device-specific fixes, and nothing
worrisome stands out, so far"
* tag 'sound-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (37 commits)
ALSA: hda/realtek: Add quirk for Clevo NP70PNP
ALSA: hda: intel-dsp-config: Add RaptorLake PCI IDs
ALSA: hda/realtek: Enable mute/micmute LEDs and limit mic boost on EliteBook 845/865 G9
ALSA: usb-audio: Clear MIDI port active flag after draining
ALSA: usb-audio: add mapping for MSI MAG X570S Torpedo MAX.
ALSA: hda/i915: Fix one too many pci_dev_put()
ALSA: hda/hdmi: add HDMI codec VID for Raptorlake-P
ALSA: hda/hdmi: fix warning about PCM count when used with SOF
sound/oss/dmasound: fix 'dmasound_setup' defined but not used
firmware: cs_dsp: Fix overrun of unterminated control name string
ASoC: codecs: Fix an error handling path in (rx|tx|va)_macro_probe()
ASoC: Intel: sof_es8336: Add a quirk for Huawei Matebook D15
ASoC: Intel: sof_es8336: add a quirk for headset at mic1 port
ASoC: Intel: sof_es8336: support a separate gpio to control headphone
ASoC: Intel: sof_es8336: simplify speaker gpio naming
ASoC: wm8731: Disable the regulator when probing fails
ASoC: Intel: soc-acpi: correct device endpoints for max98373
ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use
ASoC: SOF: topology: Fix memory leak in sof_control_load()
ASoC: SOF: topology: cleanup dailinks on widget unload
...
Matthew Wilcox (Oracle) [Fri, 22 Apr 2022 17:23:12 +0000 (13:23 -0400)]
XArray: Disallow sibling entries of nodes
There is a race between xas_split() and xas_load() which can result in
the wrong page being returned, and thus data corruption. Fortunately,
it's hard to hit (syzbot took three months to find it) and often guarded
with VM_BUG_ON().
The anatomy of this race is:
thread A thread B
order-9 page is stored at index 0x200
lookup of page at index 0x274
page split starts
load of sibling entry at offset 9
stores nodes at offsets 8-15
load of entry at offset 8
The entry at offset 8 turns out to be a node, and so we descend into it,
and load the page at index 0x234 instead of 0x274. This is hard to fix
on the split side; we could replace the entire node that contains the
order-9 page instead of replacing the eight entries. Fixing it on
the lookup side is easier; just disallow sibling entries that point
to nodes. This cannot ever be a useful thing as the descent would not
know the correct offset to use within the new node.
The test suite continues to pass, but I have not added a new test for
this bug.
Reported-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com
Tested-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com
Fixes:
6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox (Oracle) [Fri, 22 Apr 2022 17:20:21 +0000 (13:20 -0400)]
tools: Add kmem_cache_alloc_lru()
Turn kmem_cache_alloc() into a wrapper around kmem_cache_alloc_lru().
Fixes:
9bbdc0f32409 ("xarray: use kmem_cache_alloc_lru to allocate xa_node")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Li Wang <liwang@redhat.com>
Linus Torvalds [Fri, 22 Apr 2022 17:10:43 +0000 (10:10 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"13 patches.
Subsystems affected by this patch series: mm (memory-failure, memcg,
userfaultfd, hugetlbfs, mremap, oom-kill, kasan, hmm), and kcov"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
kcov: don't generate a warning on vm_insert_page()'s failure
MAINTAINERS: add Vincenzo Frascino to KASAN reviewers
oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
selftest/vm: add skip support to mremap_test
selftest/vm: support xfail in mremap_test
selftest/vm: verify remap destination address in mremap_test
selftest/vm: verify mmap addr in mremap_test
mm, hugetlb: allow for "high" userspace addresses
userfaultfd: mark uffd_wp regardless of VM_WRITE flag
memcg: sync flush only if periodic flush is delayed
mm/memory-failure.c: skip huge_zero_page in memory_failure()
mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
Nicholas Piggin [Fri, 22 Apr 2022 06:01:05 +0000 (16:01 +1000)]
mm/vmalloc: huge vmalloc backing pages should be split rather than compound
Huge vmalloc higher-order backing pages were allocated with __GFP_COMP
in order to allow the sub-pages to be refcounted by callers such as
"remap_vmalloc_page [sic]" (remap_vmalloc_range).
However a similar problem exists for other struct page fields callers
use, for example fb_deferred_io_fault() takes a vmalloc'ed page and
not only refcounts it but uses ->lru, ->mapping, ->index.
This is not compatible with compound sub-pages, and can cause bad page
state issues like
BUG: Bad page state in process swapper/0 pfn:00743
page:(____ptrval____) refcount:0 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x743
flags: 0x7ffff000000000(node=0|zone=0|lastcpupid=0x7ffff)
raw:
007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000
raw:
0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: corrupted mapping in tail page
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810
Call Trace:
dump_stack_lvl+0x74/0xa8 (unreliable)
bad_page+0x12c/0x170
free_tail_pages_check+0xe8/0x190
free_pcp_prepare+0x31c/0x4e0
free_unref_page+0x40/0x1b0
__vunmap+0x1d8/0x420
...
The correct approach is to use split high-order pages for the huge
vmalloc backing. These allow callers to treat them in exactly the same
way as individually-allocated order-0 pages.
Link: https://lore.kernel.org/all/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Muchun Song [Fri, 22 Apr 2022 06:00:33 +0000 (14:00 +0800)]
arm64: mm: fix p?d_leaf()
The pmd_leaf() is used to test a leaf mapped PMD, however, it misses
the PROT_NONE mapped PMD on arm64. Fix it. A real world issue [1]
caused by this was reported by Qian Cai. Also fix pud_leaf().
Link: https://patchwork.kernel.org/comment/24798260/
Fixes:
8aa82df3c123 ("arm64: mm: add p?d_leaf() definitions")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Link: https://lore.kernel.org/r/20220422060033.48711-1-songmuchun@bytedance.com
Signed-off-by: Will Deacon <will@kernel.org>
Linus Torvalds [Fri, 22 Apr 2022 03:10:43 +0000 (20:10 -0700)]
Merge tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Extra quiet after Easter, only have minor i915 and msm pulls. However
I haven't seen a PR from our misc tree in a little while, I've cc'ed
all the suspects. Once that unblocks I expect a bit larger bunch of
patches to arrive.
Otherwise as I said, one msm revert and two i915 fixes.
msm:
- revert iommu change that broke some platforms.
i915:
- Unset enable_psr2_sel_fetch if PSR2 detection fails
- Fix to detect when VRR is turned off from panel settings"
* tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm:
drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails
drm/msm: Revert "drm/msm: Stop using iommu_present()"
drm/i915/display/vrr: Reset VRR capable property on a long hpd
Alistair Popple [Thu, 21 Apr 2022 23:36:10 +0000 (16:36 -0700)]
mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
In some cases it is possible for mmu_interval_notifier_remove() to race
with mn_tree_inv_end() allowing it to return while the notifier data
structure is still in use. Consider the following sequence:
CPU0 - mn_tree_inv_end() CPU1 - mmu_interval_notifier_remove()
----------------------------------- ------------------------------------
spin_lock(subscriptions->lock);
seq = subscriptions->invalidate_seq;
spin_lock(subscriptions->lock); spin_unlock(subscriptions->lock);
subscriptions->invalidate_seq++;
wait_event(invalidate_seq != seq);
return;
interval_tree_remove(interval_sub); kfree(interval_sub);
spin_unlock(subscriptions->lock);
wake_up_all();
As the wait_event() condition is true it will return immediately. This
can lead to use-after-free type errors if the caller frees the data
structure containing the interval notifier subscription while it is
still on a deferred list. Fix this by taking the appropriate lock when
reading invalidate_seq to ensure proper synchronisation.
I observed this whilst running stress testing during some development.
You do have to be pretty unlucky, but it leads to the usual problems of
use-after-free (memory corruption, kernel crash, difficult to diagnose
WARN_ON, etc).
Link: https://lkml.kernel.org/r/20220420043734.476348-1-apopple@nvidia.com
Fixes:
99cb252f5e68 ("mm/mmu_notifier: add an interval tree notifier")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>