platform/kernel/linux-starfive.git
21 months agos390/idle: mark arch_cpu_idle() noinstr
Heiko Carstens [Mon, 6 Feb 2023 13:49:40 +0000 (14:49 +0100)]
s390/idle: mark arch_cpu_idle() noinstr

linux-next commit ("cpuidle: tracing: Warn about !rcu_is_watching()")
adds a new warning which hits on s390's arch_cpu_idle() function:

RCU not on for: arch_cpu_idle+0x0/0x28
WARNING: CPU: 2 PID: 0 at include/linux/trace_recursion.h:162 arch_ftrace_ops_list_func+0x24c/0x258
Modules linked in:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.2.0-rc6-next-20230202 #4
Hardware name: IBM 8561 T01 703 (z/VM 7.3.0)
Krnl PSW : 0404d00180000000 00000000002b55c0 (arch_ftrace_ops_list_func+0x250/0x258)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: c0000000ffffbfff 0000000080000002 0000000000000026 0000000000000000
           0000037ffffe3a28 0000037ffffe3a20 0000000000000000 0000000000000000
           0000000000000000 0000000000f4acf6 00000000001044f0 0000037ffffe3cb0
           0000000000000000 0000000000000000 00000000002b55bc 0000037ffffe3bb8
Krnl Code: 00000000002b55b0c02000840051        larl    %r2,0000000001335652
           00000000002b55b6c0e5fff512d1        brasl   %r14,0000000000157b58
          #00000000002b55bcaf000000            mc      0,0
          >00000000002b55c0a7f4ffe7            brc     15,00000000002b558e
           00000000002b55c4: 0707                bcr     0,%r7
           00000000002b55c6: 0707                bcr     0,%r7
           00000000002b55c8eb6ff0480024        stmg    %r6,%r15,72(%r15)
           00000000002b55ceb90400ef            lgr     %r14,%r15
Call Trace:
 [<00000000002b55c0>] arch_ftrace_ops_list_func+0x250/0x258
([<00000000002b55bc>] arch_ftrace_ops_list_func+0x24c/0x258)
 [<0000000000f5f0fc>] ftrace_common+0x1c/0x20
 [<00000000001044f6>] arch_cpu_idle+0x6/0x28
 [<0000000000f4acf6>] default_idle_call+0x76/0x128
 [<00000000001cc374>] do_idle+0xf4/0x1b0
 [<00000000001cc6ce>] cpu_startup_entry+0x36/0x40
 [<0000000000119d00>] smp_start_secondary+0x140/0x150
 [<0000000000f5d2ae>] restart_int_handler+0x6e/0x90

Mark arch_cpu_idle() noinstr like all other architectures with
CONFIG_ARCH_WANTS_NO_INSTR (should) have it to fix this.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/idle: move idle time accounting to account_idle_time_irq()
Heiko Carstens [Mon, 6 Feb 2023 13:49:39 +0000 (14:49 +0100)]
s390/idle: move idle time accounting to account_idle_time_irq()

There is no reason to do idle time accounting in arch_cpu_idle().
Do idle time accounting in account_idle_time_irq(), where it belongs
to. The accounted values don't change between account_idle_time_irq() and
arch_cpu_idle(); so the result is the same.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agoMerge branch 'cmpxchg_user_key' into features
Heiko Carstens [Thu, 9 Feb 2023 19:10:35 +0000 (20:10 +0100)]
Merge branch 'cmpxchg_user_key' into features

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agowatchdog: diag288_wdt: unify lpar and zvm diag288 helpers
Alexander Egorenkov [Fri, 3 Feb 2023 07:39:58 +0000 (08:39 +0100)]
watchdog: diag288_wdt: unify lpar and zvm diag288 helpers

Change naming of the internal diag288 helper functions
to improve overall readability and reduce confusion:
* Rename __diag288() to diag288().
* Get rid of the misnamed helper __diag288_lpar() that was used not only
  on LPARs but also zVM and KVM systems.
* Rename __diag288_vm() to diag288_str().

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230203073958.1585738-6-egorenar@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agowatchdog: diag288_wdt: de-duplicate diag_stat_inc() calls
Alexander Egorenkov [Fri, 3 Feb 2023 07:39:57 +0000 (08:39 +0100)]
watchdog: diag288_wdt: de-duplicate diag_stat_inc() calls

Call diag_stat_inc() from __diag288() to reduce code duplication.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230203073958.1585738-5-egorenar@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agowatchdog: diag288_wdt: unify command buffer handling for diag288 zvm
Alexander Egorenkov [Fri, 3 Feb 2023 07:39:56 +0000 (08:39 +0100)]
watchdog: diag288_wdt: unify command buffer handling for diag288 zvm

Simplify and de-duplicate code by introducing a common single command
buffer allocated once at initialization. Moreover, simplify the interface
of __diag288_vm() by accepting ASCII strings as the command parameter
and converting it to the EBCDIC format within the function itself.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230203073958.1585738-4-egorenar@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agowatchdog: diag288_wdt: remove power management
Alexander Egorenkov [Fri, 3 Feb 2023 07:39:55 +0000 (08:39 +0100)]
watchdog: diag288_wdt: remove power management

Remove power management because s390 no longer supports hibernation since
commit 394216275c7d ("s390: remove broken hibernate / power management
support").

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230203073958.1585738-3-egorenar@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agowatchdog: diag288_wdt: get rid of register asm
Alexander Egorenkov [Fri, 3 Feb 2023 07:39:54 +0000 (08:39 +0100)]
watchdog: diag288_wdt: get rid of register asm

Using register asm statements has been proven to be very error prone,
especially when using code instrumentation where gcc may add function
calls, which clobbers register contents in an unexpected way.

Therefore, get rid of register asm statements in watchdog code, and make
sure this bug class cannot happen.

Moreover, remove the register r1 from the clobber list because this
register is not changed by DIAG 288.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230203073958.1585738-2-egorenar@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agoMerge branch 'fixes' into features
Heiko Carstens [Mon, 6 Feb 2023 14:13:45 +0000 (15:13 +0100)]
Merge branch 'fixes' into features

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/boot: avoid potential amode31 truncation
Vasily Gorbik [Thu, 2 Feb 2023 18:22:35 +0000 (19:22 +0100)]
s390/boot: avoid potential amode31 truncation

Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/boot: move detect_facilities() after cmd line parsing
Vasily Gorbik [Thu, 2 Feb 2023 18:21:38 +0000 (19:21 +0100)]
s390/boot: move detect_facilities() after cmd line parsing

Facilities setup has to be done after "facilities" command line option
parsing, it might set extra or remove existing facilities bits for
testing purposes.

Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/kasan: avoid mapping KASAN shadow for standby memory
Vasily Gorbik [Mon, 30 Jan 2023 01:11:39 +0000 (02:11 +0100)]
s390/kasan: avoid mapping KASAN shadow for standby memory

KASAN common code is able to handle memory hotplug and create KASAN shadow
memory on a fly. Online memory ranges are available from mem_detect,
use this information to avoid mapping KASAN shadow for standby memory.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/boot: avoid page tables memory in kaslr
Vasily Gorbik [Fri, 27 Jan 2023 16:08:29 +0000 (17:08 +0100)]
s390/boot: avoid page tables memory in kaslr

If kernel is build without KASAN support there is a chance that kernel
image is going to be positioned by KASLR code to overlap with identity
mapping page tables.

When kernel is build with KASAN support enabled memory which
is potentially going to be used for page tables and KASAN
shadow mapping is accounted for in KASLR with the use of
kasan_estimate_memory_needs(). Split this function and introduce
vmem_estimate_memory_needs() to cover decompressor's vmem identity
mapping page tables.

Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/mem_detect: add get_mem_detect_online_total()
Vasily Gorbik [Sun, 29 Jan 2023 23:57:12 +0000 (00:57 +0100)]
s390/mem_detect: add get_mem_detect_online_total()

Add a function to get online memory in total. It is supposed to be used
in the decompressor as well as during early kernel startup.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/mem_detect: handle online memory limit just once
Vasily Gorbik [Sat, 28 Jan 2023 22:55:04 +0000 (23:55 +0100)]
s390/mem_detect: handle online memory limit just once

Introduce mem_detect_truncate() to cut any online memory ranges above
established identity mapping size, so that mem_detect users wouldn't
have to do it over and over again.

Suggested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/boot: fix mem_detect extended area allocation
Vasily Gorbik [Mon, 23 Jan 2023 11:49:47 +0000 (12:49 +0100)]
s390/boot: fix mem_detect extended area allocation

Allocation of mem_detect extended area was not considered neither
in commit 9641b8cc733f ("s390/ipl: read IPL report at early boot")
nor in commit b2d24b97b2a9 ("s390/kernel: add support for kernel address
space layout randomization (KASLR)"). As a result mem_detect extended
theoretically may overlap with ipl report or randomized kernel image
position. But as mem_detect code will allocate extended area only
upon exceeding 255 online regions (which should alternate with offline
memory regions) it is not seen in practice.

To make sure mem_detect extended area does not overlap with ipl report
or randomized kernel position extend usage of "safe_addr". Make initrd
handling and mem_detect extended area allocation code move it further
right and make KASLR takes in into consideration as well.

Fixes: 9641b8cc733f ("s390/ipl: read IPL report at early boot")
Fixes: b2d24b97b2a9 ("s390/kernel: add support for kernel address space layout randomization (KASLR)")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/mem_detect: rely on diag260() if sclp_early_get_memsize() fails
Vasily Gorbik [Fri, 27 Jan 2023 13:57:43 +0000 (14:57 +0100)]
s390/mem_detect: rely on diag260() if sclp_early_get_memsize() fails

In case sclp_early_get_memsize() fails but diag260() succeeds make sure
some sane value is returned. This error scenario is highly unlikely,
but this change makes system able to boot in such case.

Suggested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/diag: make __diag8c_tmp_amode31 static
Heiko Carstens [Thu, 2 Feb 2023 20:47:11 +0000 (21:47 +0100)]
s390/diag: make __diag8c_tmp_amode31 static

Get rid of this sparse warning:

arch/s390/kernel/diag.c:69:29: warning: symbol '__diag8c_tmp_amode31' was not declared. Should it be static?

Fixes: fbaee7464fbb ("s390/tty3270: add support for diag 8c")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/rethook: add local rethook header file
Heiko Carstens [Thu, 2 Feb 2023 20:36:07 +0000 (21:36 +0100)]
s390/rethook: add local rethook header file

Compiling the kernel with CONFIG_KPROBES disabled, but CONFIG_RETHOOK
enabled, results in this sparse warning:

arch/s390/kernel/rethook.c:26:15: warning: no previous prototype for 'arch_rethook_trampoline_callback' [-Wmissing-prototypes]
    26 | unsigned long arch_rethook_trampoline_callback(struct pt_regs *regs)
       |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add a local rethook header file similar to riscv to address this.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1a280f48c0e4 ("s390/kprobes: replace kretprobe with rethook")
Link: https://lore.kernel.org/all/202302030102.69dZIuJk-lkp@intel.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/vmem: remove unnecessary KASAN checks
Vasily Gorbik [Sat, 28 Jan 2023 17:06:40 +0000 (18:06 +0100)]
s390/vmem: remove unnecessary KASAN checks

Kasan shadow memory area has been moved to the end of kernel address
space since commit 9a39abb7c9aa ("s390/boot: simplify and fix kernel
memory layout setup"), therefore skipping any memory ranges above
VMALLOC_START in empty page tables cleanup code already handles
KASAN shadow memory intersection case and explicit checks could be
removed.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/vmem: fix empty page tables cleanup under KASAN
Vasily Gorbik [Sat, 28 Jan 2023 16:35:12 +0000 (17:35 +0100)]
s390/vmem: fix empty page tables cleanup under KASAN

Commit b9ff81003cf1 ("s390/vmem: cleanup empty page tables") introduced
empty page tables cleanup in vmem code, but when the kernel is built
with KASAN enabled the code has no effect due to wrong KASAN shadow
memory intersection condition, which effectively ignores any memory
range below KASAN shadow. Fix intersection condition to make code
work as anticipated.

Fixes: b9ff81003cf1 ("s390/vmem: cleanup empty page tables")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/kasan: update kasan memory layout note
Vasily Gorbik [Fri, 27 Jan 2023 22:27:18 +0000 (23:27 +0100)]
s390/kasan: update kasan memory layout note

Kasan shadow memory area has been moved to the end of kernel address
space since commit 9a39abb7c9aa ("s390/boot: simplify and fix kernel
memory layout setup"). Change kasan memory layout note accordingly.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/mem_detect: fix detect_memory() error handling
Vasily Gorbik [Fri, 27 Jan 2023 13:03:07 +0000 (14:03 +0100)]
s390/mem_detect: fix detect_memory() error handling

Currently if for some reason sclp_early_read_info() fails,
sclp_early_get_memsize() will not set max_physmem_end and it
will stay uninitialized. Any garbage value other than 0 will lead
to detect_memory() taking wrong path or returning a garbage value
as max_physmem_end. To avoid that simply initialize max_physmem_end.

Fixes: 73045a08cf55 ("s390: unify identity mapping limits handling")
Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/hmcdrv: use strscpy() instead of strlcpy()
Heiko Carstens [Tue, 31 Jan 2023 18:50:25 +0000 (19:50 +0100)]
s390/hmcdrv: use strscpy() instead of strlcpy()

Given that strlcpy() is deprecated use strscpy() instead.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/ipl: add loadparm parameter to eckd ipl/reipl data
Sven Schnelle [Sun, 29 Jan 2023 15:39:19 +0000 (16:39 +0100)]
s390/ipl: add loadparm parameter to eckd ipl/reipl data

commit 87fd22e0ae92 ("s390/ipl: add eckd support") missed to add the
loadparm attribute to the new eckd ipl/reipl data.

Fixes: 87fd22e0ae92 ("s390/ipl: add eckd support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/ipl: add DEFINE_GENERIC_LOADPARM()
Sven Schnelle [Sun, 29 Jan 2023 15:36:03 +0000 (16:36 +0100)]
s390/ipl: add DEFINE_GENERIC_LOADPARM()

In the current code each reipl type implements its own pair of loadparm
show/store functions. Add a macro to deduplicate the code a bit.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Fixes: 87fd22e0ae92 ("s390/ipl: add eckd support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/mem_detect: do not update output parameters on failure
Alexander Gordeev [Wed, 25 Jan 2023 18:16:10 +0000 (19:16 +0100)]
s390/mem_detect: do not update output parameters on failure

Function __get_mem_detect_block() resets start and end
output parameters in case of invalid mem_detect array
index is provided. That violates the rule of sparing
the output on fail path and leads e.g to a below anomaly:

for_each_mem_detect_block(i, &start, &end)
continue;

One would expect start and end contain addresses of the
last memory block (if available), but in fact the two
will be reset to zeroes. That is not how an iterator is
expected to work.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cio: introduce locking for register/unregister functions
Vineeth Vijayan [Fri, 11 Nov 2022 12:46:33 +0000 (13:46 +0100)]
s390/cio: introduce locking for register/unregister functions

Unbinding an I/O subchannel with a child-CCW device in disconnected
state sometimes causes a kernel-panic. The race condition was seen
mostly during testing, when setting all the CHPIDs of a device to
offline and at the same time, the unbinding the I/O subchannel driver.

The kernel-panic occurs because of double delete, the I/O subchannel
driver calls device_del on the CCW device while another device_del
invocation for the same device is in-flight.  For instance, disabling
all the CHPIDs will trigger the ccw_device_remove function, which will
call a ccw_device_unregister(), which ends up calling the device_del()
which is asynchronous via cdev's todo workqueue. And unbinding the I/O
subchannel driver calls io_subchannel_remove() function which calls the
ccw_device_unregister() and device_del().

This double delete can be prevented by serializing all CCW device
registration/unregistration calls into the driver core. This patch
introduces a mutex which will be used for this purpose.

Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reported-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/mm,ptdump: avoid Kasan vs Memcpy Real markers swapping
Vasily Gorbik [Tue, 24 Jan 2023 17:08:38 +0000 (18:08 +0100)]
s390/mm,ptdump: avoid Kasan vs Memcpy Real markers swapping

---[ Real Memory Copy Area Start ]---
0x001bfffffffff000-0x001c000000000000         4K PTE I
---[ Kasan Shadow Start ]---
---[ Real Memory Copy Area End ]---
0x001c000000000000-0x001c000200000000         8G PMD RW NX
...
---[ Kasan Shadow End ]---

ptdump does a stable sort of markers. Move kasan markers after
memcpy real to avoid swapping.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/boot: remove pgtable_populate_end
Vasily Gorbik [Tue, 24 Jan 2023 16:03:24 +0000 (17:03 +0100)]
s390/boot: remove pgtable_populate_end

setup_vmem() already calls populate for all online memory regions.
pgtable_populate_end() could be removed.

Also rename pgtable_populate_begin() to pgtable_populate_init().

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/boot: avoid mapping standby memory
Vasily Gorbik [Mon, 23 Jan 2023 14:24:17 +0000 (15:24 +0100)]
s390/boot: avoid mapping standby memory

Commit bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
doesn't consider online memory holes due to potential memory offlining
and erroneously creates pgtables for stand-by memory, which bear RW+X
attribute and trigger a warning:

RANGE                                 SIZE   STATE REMOVABLE BLOCK
0x0000000000000000-0x0000000c3fffffff  49G  online       yes  0-48
0x0000000c40000000-0x0000000c7fffffff   1G offline              49
0x0000000c80000000-0x0000000fffffffff  14G  online       yes 50-63
0x0000001000000000-0x00000013ffffffff  16G offline           64-79

    s390/mm: Found insecure W+X mapping at address 0xc40000000
    WARNING: CPU: 14 PID: 1 at arch/s390/mm/dump_pagetables.c:142 note_page+0x2cc/0x2d8

Map only online memory ranges which fit within identity mapping limit.

Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/decompressor: specify __decompress() buf len to avoid overflow
Vasily Gorbik [Sun, 29 Jan 2023 22:47:23 +0000 (23:47 +0100)]
s390/decompressor: specify __decompress() buf len to avoid overflow

Historically calls to __decompress() didn't specify "out_len" parameter
on many architectures including s390, expecting that no writes beyond
uncompressed kernel image are performed. This has changed since commit
2aa14b1ab2c4 ("zstd: import usptream v1.5.2") which includes zstd library
commit 6a7ede3dfccb ("Reduce size of dctx by reutilizing dst buffer
(#2751)"). Now zstd decompression code might store literal buffer in
the unwritten portion of the destination buffer. Since "out_len" is
not set, it is considered to be unlimited and hence free to use for
optimization needs. On s390 this might corrupt initrd or ipl report
which are often placed right after the decompressor buffer. Luckily the
size of uncompressed kernel image is already known to the decompressor,
so to avoid the problem simply specify it in the "out_len" parameter.

Link: https://github.com/facebook/zstd/commit/6a7ede3dfccb
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Link: https://lore.kernel.org/r/patch-1.thread-41c676.git-41c676c2d153.your-ad-here.call-01675030179-ext-9637@work.hours
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agowatchdog: diag288_wdt: fix __diag288() inline assembly
Alexander Egorenkov [Fri, 27 Jan 2023 13:52:42 +0000 (14:52 +0100)]
watchdog: diag288_wdt: fix __diag288() inline assembly

The DIAG 288 statement consumes an EBCDIC string the address of which is
passed in a register. Use a "memory" clobber to tell the compiler that
memory is accessed within the inline assembly.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agowatchdog: diag288_wdt: do not use stack buffers for hardware data
Alexander Egorenkov [Fri, 27 Jan 2023 13:52:41 +0000 (14:52 +0100)]
watchdog: diag288_wdt: do not use stack buffers for hardware data

With CONFIG_VMAP_STACK=y the stack is allocated from the vmalloc space.
Data passed to a hardware or a hypervisor interface that
requires V=R can no longer be allocated on the stack.

Use kmalloc() to get memory for a diag288 command.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/syscalls: get rid of system call alias functions
Heiko Carstens [Mon, 23 Jan 2023 13:30:46 +0000 (14:30 +0100)]
s390/syscalls: get rid of system call alias functions

bpftrace and friends only consider functions present in
/sys/kernel/tracing/available_filter_functions.

For system calls there is the s390 specific problem that the system call
function itself is present via __se_sys##name() while the system call
itself is wired up via an __s390x_sys##name() alias. The required DWARF
debug information however is only available for the original function, not
the alias, but within available_filter_functions only the functions with
__s390x_ prefix are available. Which means the required DWARF debug
information cannot be found.
While this could be solved via tooling, it is easier to change the s390
specific system call wrapper handling.

Therefore get rid of this alias handling and implement system call wrappers
like most other architectures are doing. In result the implementation
generates the following functions:

long __s390x_sys##name(struct pt_regs *regs)
static inline long __se_sys##name(...)
static inline long __do_sys##name(...)

__s390x_sys##name() is the visible system call function which is also wired
up in the system call table. Its only parameter is a pt_regs variable.

This function calls the corresponding __se_sys##name() function, which has
as many parameters like the system call definition. This function in turn
performs all zero and sign extensions of all system call parameters, taken
from the pt_regs structure, and finally calls __do_sys##name().

__do_sys##name() is the actual inlined system call function implementation.

For all 64 bit system calls there is a 31/32 bit system call function
__s390_sys##name() generated, which handles all system call parameters
correctly as required by compat handling. This function may be wired
up within the compat system call table, unless there exists an
explicit compat system call function, which is then used instead.

Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/syscalls: remove trailing semicolon
Heiko Carstens [Mon, 23 Jan 2023 13:30:45 +0000 (14:30 +0100)]
s390/syscalls: remove trailing semicolon

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/syscalls: move __S390_SYS_STUBx() macro
Heiko Carstens [Mon, 23 Jan 2023 13:30:44 +0000 (14:30 +0100)]
s390/syscalls: move __S390_SYS_STUBx() macro

Move __S390_SYS_STUBx() the end of the CONFIG_COMPAT section, so both
variants (compat and non-compat) are close together and can be easily
compared.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/syscalls: remove __SC_COMPAT_TYPE define
Heiko Carstens [Mon, 23 Jan 2023 13:30:43 +0000 (14:30 +0100)]
s390/syscalls: remove __SC_COMPAT_TYPE define

Remove __SC_COMPAT_TYPE define which is an unused leftover.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/syscalls: remove SYSCALL_METADATA() from compat syscalls
Heiko Carstens [Mon, 23 Jan 2023 13:30:42 +0000 (14:30 +0100)]
s390/syscalls: remove SYSCALL_METADATA() from compat syscalls

SYSCALL_METADATA() is only supposed to be used for non-compat system
calls. Otherwise there would be a name clash.

This also removes the inconsistency that s390 is the only architecture
which uses SYSCALL_METADATA() for compat system calls, and even that only
for compat system calls without parameters. Only two such compat system
calls exist.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390: discard .interp section
Ilya Leoshkevich [Mon, 23 Jan 2023 21:50:32 +0000 (22:50 +0100)]
s390: discard .interp section

When debugging vmlinux with QEMU + GDB, the following GDB error may
occur:

    (gdb) c
    Continuing.
    Warning:
    Cannot insert breakpoint -1.
    Cannot access memory at address 0xffffffffffff95c0

    Command aborted.
    (gdb)

The reason is that, when .interp section is present, GDB tries to
locate the file specified in it in memory and put a number of
breakpoints there (see enable_break() function in gdb/solib-svr4.c).
Sometimes GDB finds a bogus location that matches its heuristics,
fails to set a breakpoint and stops. This makes further debugging
impossible.

The .interp section contains misleading information anyway (vmlinux
does not need ld.so), so fix by discarding it.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_cf: simplify PMC_INIT and PMC_RELEASE usage
Thomas Richter [Tue, 24 Jan 2023 11:20:55 +0000 (12:20 +0100)]
s390/cpum_cf: simplify PMC_INIT and PMC_RELEASE usage

Simplify the use of constants PMC_INIT and PMC_RELEASE.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_cf: merge source files for CPU Measurement counter facility
Thomas Richter [Tue, 24 Jan 2023 11:20:54 +0000 (12:20 +0100)]
s390/cpum_cf: merge source files for CPU Measurement counter facility

With no in-kernel user, the source files can be merged.

Move all functions and the variable definitions to file perf_cpum_cf.c
This file now contains all the necessary functions and definitions
for the CPU Measurement counter facility device driver.

The files cpu_mcf.h and perf_cpum_cf_common.c are deleted.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_cf: remove in-kernel counting facility interface
Thomas Richter [Tue, 24 Jan 2023 11:20:53 +0000 (12:20 +0100)]
s390/cpum_cf: remove in-kernel counting facility interface

Commit 17bebcc68eee ("s390/cpum_cf: Add minimal in-kernel interface for
counter measurements") introduced a small in-kernel interface for CPU
Measurement counter facility.
There are no users of this interface, therefore remove it.

The following functions are removed:
 kernel_cpumcf_alert(),
 kernel_cpumcf_begin(),
 kernel_cpumcf_end(),
 kernel_cpumcf_avail()
there is no need for them anymore.
With the removal of function kernel_cpumcf_alert(), also remove
member alert in struct cpu_cf_events. Its purpose was to counter
measurement alert interrupts for the in-kernel interface.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_cf: move stccm_avail()
Thomas Richter [Tue, 24 Jan 2023 11:20:52 +0000 (12:20 +0100)]
s390/cpum_cf: move stccm_avail()

Function stccm_avail() is defined in a header file and the
only user is one single source file. Move this function to the source
file where it is also used and remove it from the header file.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_cf: move cpum_cf_ctrset_size()
Thomas Richter [Tue, 24 Jan 2023 11:20:51 +0000 (12:20 +0100)]
s390/cpum_cf: move cpum_cf_ctrset_size()

Function cpum_cf_ctrset_size() is defined in one source file and the
only user is in another source file. Move this function to the source
file where it is used and remove its prototype from the header file.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_cf: simplify hw_perf_event_destroy()
Thomas Richter [Tue, 24 Jan 2023 11:20:50 +0000 (12:20 +0100)]
s390/cpum_cf: simplify hw_perf_event_destroy()

To remove an event from the CPU Measurement counter facility
use the lock/unlock scheme as done in event creation. Remove
the atomic_add_unless function to make the code easier.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cache: change type from unsigned long long to unsigned long
Heiko Carstens [Sun, 22 Jan 2023 18:13:03 +0000 (19:13 +0100)]
s390/cache: change type from unsigned long long to unsigned long

The unsigned long long type is a leftover of the 31 bit area.
Get rid of it.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/vfio_ap: increase max wait time for reset verification
Tony Krowiak [Wed, 18 Jan 2023 20:31:11 +0000 (15:31 -0500)]
s390/vfio_ap: increase max wait time for reset verification

Increase the maximum time to wait for verification of a queue reset
operation to 200ms.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20230118203111.529766-7-akrowiak@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/vfio_ap: fix handling of error response codes
Tony Krowiak [Wed, 18 Jan 2023 20:31:10 +0000 (15:31 -0500)]
s390/vfio_ap: fix handling of error response codes

Some response codes returned from the queue reset function are not being
handled correctly; this patch fixes them:

1. Response code 3, AP queue deconfigured: Deconfiguring an AP adapter
   resets all of its queues, so this is handled by indicating the reset
   verification completed successfully.

2. For all response codes other than 0 (normal reset completion), 2
   (queue reset in progress) and 3 (AP deconfigured), the -EIO error will
   be returned from the vfio_ap_mdev_reset_queue() function. In all cases,
   all fields of the status word other than the response code will be
   set to zero, so it makes no sense to check status bits.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20230118203111.529766-6-akrowiak@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/vfio_ap: verify ZAPQ completion after return of response code zero
Tony Krowiak [Wed, 18 Jan 2023 20:31:09 +0000 (15:31 -0500)]
s390/vfio_ap: verify ZAPQ completion after return of response code zero

Verification that the asynchronous ZAPQ function has completed only needs
to be done when the response code indicates the function was successfully
initiated; so, let's call the apq_reset_check function immediately after
the response code zero is returned from the ZAPQ.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20230118203111.529766-5-akrowiak@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/vfio_ap: use TAPQ to verify reset in progress completes
Tony Krowiak [Wed, 18 Jan 2023 20:31:08 +0000 (15:31 -0500)]
s390/vfio_ap: use TAPQ to verify reset in progress completes

To eliminate the repeated calls to the PQAP(ZAPQ) function to verify that
a reset in progress completed successfully and ensure that error response
codes get appropriately logged, let's call the apq_reset_check() function
when the ZAPQ response code indicates that a reset that is already in
progress.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20230118203111.529766-4-akrowiak@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/vfio_ap: check TAPQ response code when waiting for queue reset
Tony Krowiak [Wed, 18 Jan 2023 20:31:07 +0000 (15:31 -0500)]
s390/vfio_ap: check TAPQ response code when waiting for queue reset

The vfio_ap_mdev_reset_queue() function does not check the status
response code returned form the PQAP(TAPQ) function when verifying the
queue's status; consequently, there is no way of knowing whether
verification failed because the wait time was exceeded, or because the
PQAP(TAPQ) failed.

This patch adds a function to check the status response code from the
PQAP(TAPQ) instruction and logs an appropriate message if it fails.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20230118203111.529766-3-akrowiak@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/vfio-ap: verify reset complete in separate function
Tony Krowiak [Wed, 18 Jan 2023 20:31:06 +0000 (15:31 -0500)]
s390/vfio-ap: verify reset complete in separate function

The vfio_ap_mdev_reset_queue() function contains a loop to verify that the
reset successfully completes within 40ms. This patch moves that loop into
a separate function.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20230118203111.529766-2-akrowiak@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/kprobes: replace kretprobe with rethook
Vasily Gorbik [Tue, 17 Jan 2023 13:37:10 +0000 (14:37 +0100)]
s390/kprobes: replace kretprobe with rethook

That's an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes:
Replace kretprobe with rethook on x86") to s390.

Replaces the kretprobe code with rethook on s390. With this patch,
kretprobe on s390 uses the rethook instead of kretprobe specific
trampoline code.

Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_sf: diagnostic sampling buffer setup to handle virtual addresses
Thomas Richter [Fri, 13 Jan 2023 11:29:55 +0000 (12:29 +0100)]
s390/cpum_sf: diagnostic sampling buffer setup to handle virtual addresses

The CPU Measurement Sampling Facility (CPUM_SF) installs large buffers
to save samples collected by hardware. These buffers are organized
as Sample Data Buffer Tables (SDBT) and Sample Data Buffers (SDB).
SDBs contain the samples which are extracted and saved in the perf
ring buffer.
The SDBTs are chained using real addresses and refer to SDBs using
real addresses.

The diagnostic sampling setup uses buffers provided by the process
which invokes perf_event_open system call. The buffers are memory
mapped. The buffers have been allocated by the kernel event
subsystem.

Add proper virtual to phyiscal address translation to the buffer
chaining. The current constraint which requires virtual equals
real address layout is removed.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_sf: rework macro AUX_SDB_NUM_xxx
Thomas Richter [Fri, 13 Jan 2023 11:12:07 +0000 (12:12 +0100)]
s390/cpum_sf: rework macro AUX_SDB_NUM_xxx

Macro AUX_SDB_NUM() has three parameters. The first one is not used.
Remove the first parameter. Also convert the macros to inline functions.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_sf: sampling buffer setup to handle virtual addresses
Thomas Richter [Mon, 16 Jan 2023 13:58:22 +0000 (14:58 +0100)]
s390/cpum_sf: sampling buffer setup to handle virtual addresses

The CPU Measurement Sampling Facility (CPUM_SF) installs large buffers
to save samples collected by hardware. These buffers are organized
as Sample Data Buffer Tables (SDBT) and Sample Data Buffers (SDB).
SDBs contain the samples which are extracted and saved in the perf
ring buffer.
The SDBTs are chained using real addresses and refer to SDBs using
real addresses.

Adds proper virtual to phyiscal address translation to the buffer
chaining. The current constraint which requires virtual equals real
address layout is removed.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_sf: remove debug statements from function setup_pmc_cpu
Thomas Richter [Thu, 5 Jan 2023 13:11:33 +0000 (14:11 +0100)]
s390/cpum_sf: remove debug statements from function setup_pmc_cpu

Remove debug statements from function setup_pmc_cpu().
The debug statement displays a pointer value to a per cpu variable.
This pointer value is printed nowhere else, so it has no use for
cross reference.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_sf: move functions from header file to source file
Thomas Richter [Thu, 12 Jan 2023 14:09:21 +0000 (15:09 +0100)]
s390/cpum_sf: move functions from header file to source file

Some inline helper functions are defined in a header file but used
in only one source file. Move these functions to the source file.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/vmem: use swap() instead of open coding it
Jiapeng Chong [Tue, 17 Jan 2023 06:02:23 +0000 (14:02 +0800)]
s390/vmem: use swap() instead of open coding it

Swap is a function interface that provides exchange function. To avoid
code duplication, we can use swap function.

./arch/s390/mm/vmem.c:680:10-11: WARNING opportunity for swap().

[hca@linux.ibm.com: get rid of all temp variables]
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3786
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230117060223.58583-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cio: evaluate devices with non-operational paths
Vineeth Vijayan [Thu, 17 Nov 2022 06:31:50 +0000 (07:31 +0100)]
s390/cio: evaluate devices with non-operational paths

css_schedule_reprobe() function calls the evaluation for CSS_EVAL_UNREG
which is specific to the idset of unregistered subchannels. This
evaluation was introduced because, previously, if the underlying device
become not-accessible, the subchannel was unregistered. But, in the recent
changes in cio,with the commit '2297791c92d0 s390/cio: dont unregister
subchannel from child-drivers', we no  longer unregister the subchannels
just because of a non-operational device. This allows to have subchannels
without any operational device connected on it. So, a css_schedule_reprobe
function on unregistered subchannel does not have any effect.

Change this functionality to evaluate the subchannels which does not
have a working path to the device. This could be due the erroneous
device or due to the erraneous path. Evaluate based on the values of OPM
and PAM&POM.
Here we introduced a new idset function,to keep I/O subchannels in the
idset when the last seen status indicates that the device has no working
path. A device has no working path if all available paths have been tried
without success.A failed I/O attempt on a path is indicated as a 0 bit
value in the POM mask. By looking at the POM mask bit values of available
paths (1 in PAM) that Linux is supposed to use (1 in vary mask OPM), we
can identify a non-working device as a device where the bit-wise and of
the PAM, POM and OPM mask return 0.

css_schedule_reprobe() is being used by dasd-driver and chsc-cio
component. dasd driver, when it detects a change in the pathgroup, invokes
the re-evaluation of the subchannel. And chsc-cio component upon a CRW
event, (resource accessibility event). In both the cases, it makes much
better sense to re-evalute the subchannel with no-valid path.

Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reported-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/ipl: use kstrtobool() instead of strtobool()
Christophe JAILLET [Sat, 14 Jan 2023 15:08:22 +0000 (16:08 +0100)]
s390/ipl: use kstrtobool() instead of strtobool()

strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.

In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/58a3ed2e21903a93dfd742943b1e6936863ca037.1673708887.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agoMerge branch 'fixes' into features
Heiko Carstens [Tue, 17 Jan 2023 18:12:39 +0000 (19:12 +0100)]
Merge branch 'fixes' into features

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390: workaround invalid gcc-11 out of bounds read warning
Heiko Carstens [Tue, 17 Jan 2023 18:00:59 +0000 (19:00 +0100)]
s390: workaround invalid gcc-11 out of bounds read warning

GCC 11.1.0 and 11.2.0 generate a wrong warning when compiling the
kernel e.g. with allmodconfig:

arch/s390/kernel/setup.c: In function â€˜setup_lowcore_dat_on’:
./include/linux/fortify-string.h:57:33: error: â€˜__builtin_memcpy’ reading 128 bytes from a region of size 0 [-Werror=stringop-overread]
...
arch/s390/kernel/setup.c:526:9: note: in expansion of macro â€˜memcpy’
  526 |         memcpy(abs_lc->cregs_save_area, S390_lowcore.cregs_save_area,
      |         ^~~~~~

This could be addressed by using absolute_pointer() with the
S390_lowcore macro, but this is not a good idea since this generates
worse code for performance critical paths.

Therefore simply use a for loop to copy the array in question and get
rid of the warning.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agodocs/ABI: use linux-s390 list as the main contact
Vineeth Vijayan [Mon, 2 Jan 2023 19:02:53 +0000 (20:02 +0100)]
docs/ABI: use linux-s390 list as the main contact

Remove Cornelia's email address from the file as suggested by her. List
linux-s390 mailing-list address as the primary contact instead.

Link: https://lore.kernel.org/linux-s390/8735d0oiq6.fsf@redhat.com/
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/vfio-ap: fix an error handling path in vfio_ap_mdev_probe_queue()
Christophe JAILLET [Sun, 31 Jul 2022 16:09:14 +0000 (18:09 +0200)]
s390/vfio-ap: fix an error handling path in vfio_ap_mdev_probe_queue()

The commit in Fixes: has switch the order of a sysfs_create_group() and a
kzalloc().

It correctly removed the now useless kfree() but forgot to add a
sysfs_remove_group() in case of (unlikely) memory allocation failure.

Add it now.

Fixes: 260f3ea14138 ("s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/d0c0a35eec4fa87cb7f3910d8ac4dc0f7dc9008a.1659283738.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390: move __amode31_base declaration to proper header file
Heiko Carstens [Sun, 8 Jan 2023 18:08:55 +0000 (19:08 +0100)]
s390: move __amode31_base declaration to proper header file

Move __amode31_base declaration to proper header file to get rid of

arch/s390/boot/startup.c:24:15:
 warning: symbol '__amode31_base' was not declared. Should it be static?

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/mm: allocate Absolute Lowcore Area in decompressor
Alexander Gordeev [Mon, 19 Dec 2022 20:08:27 +0000 (21:08 +0100)]
s390/mm: allocate Absolute Lowcore Area in decompressor

Move Absolute Lowcore Area allocation to the decompressor.
As result, get_abs_lowcore() and put_abs_lowcore() access
brackets become really straight and do not require complex
execution context analysis and LAP and interrupts tackling.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/mm: allocate Real Memory Copy Area in decompressor
Alexander Gordeev [Sun, 11 Dec 2022 07:18:57 +0000 (08:18 +0100)]
s390/mm: allocate Real Memory Copy Area in decompressor

Move Real Memory Copy Area allocation to the decompressor.
As result, memcpy_real() and memcpy_real_iter() movers
become usable since the very moment the kernel starts.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/boot: allow setup of different virtual address types
Alexander Gordeev [Thu, 15 Dec 2022 09:33:52 +0000 (10:33 +0100)]
s390/boot: allow setup of different virtual address types

Currently the decompressor sets up only identity mapping.
Allow adding more address range types as a prerequisite
for allocation of kernel fixed mappings.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/kasan: remove identity mapping support
Alexander Gordeev [Fri, 16 Dec 2022 18:49:23 +0000 (19:49 +0100)]
s390/kasan: remove identity mapping support

The identity mapping is created in the decompressor,
there is no need to have the same functionality in
the kasan setup code. Thus, remove it.

Remove the 4KB pages check for first 1MB since there
is no need to take care of the lowcore pages.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/maccess: remove dead DAT-off code
Alexander Gordeev [Fri, 2 Dec 2022 18:23:11 +0000 (19:23 +0100)]
s390/maccess: remove dead DAT-off code

As the kernel is executed in DAT-on mode only, remove
unnecessary DAT bit check together with the dead code.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/mm: start kernel with DAT enabled
Alexander Gordeev [Tue, 13 Dec 2022 10:35:11 +0000 (11:35 +0100)]
s390/mm: start kernel with DAT enabled

The setup of the kernel virtual address space is spread
throughout the sources, boot stages and config options
like this:

1. The available physical memory regions are queried
   and stored as mem_detect information for later use
   in the decompressor.

2. Based on the physical memory availability the virtual
   memory layout is established in the decompressor;

3. If CONFIG_KASAN is disabled the kernel paging setup
   code populates kernel pgtables and turns DAT mode on.
   It uses the information stored at step [1].

4. If CONFIG_KASAN is enabled the kernel early boot
   kasan setup populates kernel pgtables and turns DAT
   mode on. It uses the information stored at step [1].

   The kasan setup creates early_pg_dir directory and
   directly overwrites swapper_pg_dir entries to make
   shadow memory pages available.

Move the kernel virtual memory setup to the decompressor
and start the kernel with DAT turned on right from the
very first istruction. That completely eliminates the
boot phase when the kernel runs in DAT-off mode, simplies
the overall design and consolidates pgtables setup.

The identity mapping is created in the decompressor, while
kasan shadow mappings are still created by the early boot
kernel code.

Share with decompressor the existing kasan memory allocator.
It decreases the size of a newly requested memory block from
pgalloc_pos and ensures that kernel image is not overwritten.
pgalloc_low and pgalloc_pos pointers are made preserved boot
variables for that.

Use the bootdata infrastructure to setup swapper_pg_dir
and invalid_pg_dir directories used by the kernel later.
The interim early_pg_dir directory established by the
kasan initialization code gets eliminated as result.

As the kernel runs in DAT-on mode only the PSW_KERNEL_BITS
define gets PSW_MASK_DAT bit by default. Additionally, the
setup_lowcore_dat_off() and setup_lowcore_dat_on() routines
get merged, since there is no DAT-off mode stage anymore.

The memory mappings are created with RW+X protection that
allows the early boot code setting up all necessary data
and services for the kernel being booted. Just before the
paging is enabled the memory protection is changed to
RO+X for text, RO+NX for read-only data and RW+NX for
kernel data and the identity mapping.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/boot: detect and enable memory facilities
Alexander Gordeev [Sun, 4 Dec 2022 20:15:41 +0000 (21:15 +0100)]
s390/boot: detect and enable memory facilities

Detect and enable memory facilities which is a
prerequisite for pgtables setup in the decompressor.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/pgtable: add REGION3_KERNEL_EXEC protection
Alexander Gordeev [Fri, 2 Dec 2022 18:07:07 +0000 (19:07 +0100)]
s390/pgtable: add REGION3_KERNEL_EXEC protection

Similar to existing PAGE_KERNEL_EXEC and SEGMENT_KERNEL_EXEC
memory protection add REGION3_KERNEL_EXEC attribute that
could be set on PUD pgtable entries.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/kasan: use set_pXe_bit() for pgtable entries setup
Alexander Gordeev [Fri, 16 Dec 2022 18:07:38 +0000 (19:07 +0100)]
s390/kasan: use set_pXe_bit() for pgtable entries setup

Convert setup of pgtable entries to use set_pXe_bit()
helpers as the preferred way in MM code.

Locally introduce pgprot_clear_bit() helper, which is
strictly speaking a generic function. However, it is
only x86 pgprot_clear_protnone_bits() helper, which
does a similar thing, so do not make it public.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/kasan: cleanup setup of untracked memory pgtables
Alexander Gordeev [Sat, 10 Dec 2022 08:49:04 +0000 (09:49 +0100)]
s390/kasan: cleanup setup of untracked memory pgtables

Avoid duplicate IS_ENABLED(CONFIG_KASAN_VMALLOC) condition check.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/kasan: cleanup setup of zero pgtable
Alexander Gordeev [Fri, 9 Dec 2022 21:09:44 +0000 (22:09 +0100)]
s390/kasan: cleanup setup of zero pgtable

Fix variables initialization coding style and setup zero
pgtable same way region and segment pgtables are set up.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/kasan: sort out physical vs virtual memory confusion
Alexander Gordeev [Tue, 13 Dec 2022 10:31:39 +0000 (11:31 +0100)]
s390/kasan: sort out physical vs virtual memory confusion

The kasan early boot memory allocators operate on pgalloc_pos
and segment_pos physical address pointers, but fail to convert
it to the corresponding virtual pointers.

Currently it is not a problem, since virtual and physical
addresses on s390 are the same. Nevertheless, should they
ever differ, this would cause an invalid pointer access.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/early: fix sclp_early_sccb variable lifetime
Alexander Gordeev [Thu, 15 Dec 2022 07:00:34 +0000 (08:00 +0100)]
s390/early: fix sclp_early_sccb variable lifetime

Commit ada1da31ce34 ("s390/sclp: sort out physical vs
virtual pointers usage") fixed the notion of virtual
address for sclp_early_sccb pointer. However, it did
not take into account that kasan_early_init() can also
output messages and sclp_early_sccb should be adjusted
by the time kasan_early_init() is called.

Currently it is not a problem, since virtual and physical
addresses on s390 are the same. Nevertheless, should they
ever differ, this would cause an invalid pointer access.

Fixes: ada1da31ce34 ("s390/sclp: sort out physical vs virtual pointers usage")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/boot: cleanup decompressor header files
Alexander Gordeev [Thu, 5 May 2022 14:54:54 +0000 (16:54 +0200)]
s390/boot: cleanup decompressor header files

Move declarations to appropriate header files. Instead of cryptic
casting directly assign struct vmlinux_info type to _vmlinux_info
linker script variable - wich it actually is.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390: update defconfigs
Heiko Carstens [Wed, 11 Jan 2023 14:04:47 +0000 (15:04 +0100)]
s390: update defconfigs

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agoKVM: s390: interrupt: use READ_ONCE() before cmpxchg()
Heiko Carstens [Mon, 9 Jan 2023 14:54:56 +0000 (15:54 +0100)]
KVM: s390: interrupt: use READ_ONCE() before cmpxchg()

Use READ_ONCE() before cmpxchg() to prevent that the compiler generates
code that fetches the to be compared old value several times from memory.

Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20230109145456.2895385-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()
Heiko Carstens [Mon, 9 Jan 2023 10:51:20 +0000 (11:51 +0100)]
s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()

Make sure that *ptr__ within arch_this_cpu_to_op_simple() is only
dereferenced once by using READ_ONCE(). Otherwise the compiler could
generate incorrect code.

Cc: <stable@vger.kernel.org>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/cpum_sf: add READ_ONCE() semantics to compare and swap loops
Heiko Carstens [Thu, 5 Jan 2023 14:44:20 +0000 (15:44 +0100)]
s390/cpum_sf: add READ_ONCE() semantics to compare and swap loops

The current cmpxchg_double() loops within the perf hw sampling code do not
have READ_ONCE() semantics to read the old value from memory. This allows
the compiler to generate code which reads the "old" value several times
from memory, which again allows for inconsistencies.

For example:

        /* Reset trailer (using compare-double-and-swap) */
        do {
                te_flags = te->flags & ~SDB_TE_BUFFER_FULL_MASK;
                te_flags |= SDB_TE_ALERT_REQ_MASK;
        } while (!cmpxchg_double(&te->flags, &te->overflow,
                 te->flags, te->overflow,
                 te_flags, 0ULL));

The compiler could generate code where te->flags used within the
cmpxchg_double() call may be refetched from memory and which is not
necessarily identical to the previous read version which was used to
generate te_flags. Which in turn means that an incorrect update could
happen.

Fix this by adding READ_ONCE() semantics to all cmpxchg_double()
loops. Given that READ_ONCE() cannot generate code on s390 which atomically
reads 16 bytes, use a private compare-and-swap-double implementation to
achieve that.

Also replace cmpxchg_double() with the private implementation to be able to
re-use the old value within the loops.

As a side effect this converts the whole code to only use bit fields
to read and modify bits within the hws trailer header.

Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/linux-s390/Y71QJBhNTIatvxUT@osiris/T/#ma14e2a5f7aa8ed4b94b6f9576799b3ad9c60f333
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/archrandom: add missing header include
Heiko Carstens [Sun, 8 Jan 2023 18:13:49 +0000 (19:13 +0100)]
s390/archrandom: add missing header include

Add missing header include to get rid of

arch/s390/crypto/arch_random.c:15:1:
 warning: symbol 's390_arch_random_available' was not declared. Should it be static?
arch/s390/crypto/arch_random.c:17:12:
 warning: symbol 's390_arch_random_counter' was not declared. Should it be static?

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/con3270: move condev definition
Heiko Carstens [Tue, 10 Jan 2023 12:49:23 +0000 (13:49 +0100)]
s390/con3270: move condev definition

Fix this for allmodconfig:

drivers/s390/char/con3270.c:43:24: error: 'condev' defined but not used [-Werror=unused-variable]
 static struct tty3270 *condev;
                        ^~~~~~

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: c17fe081ac1f ("s390/3270: unify con3270 + tty3270")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/kexec: fix ipl report address for kdump
Alexander Egorenkov [Mon, 14 Nov 2022 10:40:08 +0000 (11:40 +0100)]
s390/kexec: fix ipl report address for kdump

This commit addresses the following erroneous situation with file-based
kdump executed on a system with a valid IPL report.

On s390, a kdump kernel, its initrd and IPL report if present are loaded
into a special and reserved on boot memory region - crashkernel. When
a system crashes and kdump was activated before, the purgatory code
is entered first which swaps the crashkernel and [0 - crashkernel size]
memory regions. Only after that the kdump kernel is entered. For this
reason, the pointer to an IPL report in lowcore must point to the IPL report
after the swap and not to the address of the IPL report that was located in
crashkernel memory region before the swap. Failing to do so, makes the
kdump's decompressor try to read memory from the crashkernel memory region
which already contains the production's kernel memory.

The situation described above caused spontaneous kdump failures/hangs
on systems where the Secure IPL is activated because on such systems
an IPL report is always present. In that case kdump's decompressor tried
to parse an IPL report which frequently lead to illegal memory accesses
because an IPL report contains addresses to various data.

Cc: <stable@vger.kernel.org>
Fixes: 99feaa717e55 ("s390/kexec_file: Create ipl report and pass to next kernel")
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agos390/zcrypt: use strscpy() to instead of strncpy()
Xu Panda [Thu, 5 Jan 2023 12:24:34 +0000 (20:24 +0800)]
s390/zcrypt: use strscpy() to instead of strncpy()

The implementation of strscpy() is more robust and safer.
That's now the recommended way to copy NUL-terminated strings.

Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Link: https://lore.kernel.org/r/202301052024349365834@zte.com.cn
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: remove old IDA format restrictions
Eric Farman [Fri, 19 Feb 2021 19:41:49 +0000 (20:41 +0100)]
vfio/ccw: remove old IDA format restrictions

By this point, all the pieces are in place to properly support
a 2K Format-2 IDAL, and to convert a guest Format-1 IDAL to
the 2K Format-2 variety. Let's remove the fence that prohibits
them, and allow a guest to submit them if desired.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: don't group contiguous pages on 2K IDAWs
Eric Farman [Thu, 11 Aug 2022 17:20:54 +0000 (19:20 +0200)]
vfio/ccw: don't group contiguous pages on 2K IDAWs

The vfio_pin_pages() interface allows contiguous pages to be
pinned as a single request, which is great for the 4K pages
that are normally processed. Old IDA formats operate on 2K
chunks, which makes this logic more difficult.

Since these formats are rare, let's just invoke the page
pinning one-at-a-time, instead of trying to group them.
We can rework this code at a later date if needed.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: handle a guest Format-1 IDAL
Eric Farman [Mon, 31 Oct 2022 18:48:29 +0000 (19:48 +0100)]
vfio/ccw: handle a guest Format-1 IDAL

There are two scenarios that need to be addressed here.

First, an ORB that does NOT have the Format-2 IDAL bit set could
have both a direct-addressed CCW and an indirect-data-address CCW
chained together. This means that the IDA CCW will contain a
Format-1 IDAL, and can be easily converted to a 2K Format-2 IDAL.
But it also means that the direct-addressed CCW needs to be
converted to the same 2K Format-2 IDAL for consistency with the
ORB settings.

Secondly, a Format-1 IDAL is comprised of 31-bit addresses.
Thus, we need to cast this IDAL to a pointer of ints while
populating the list of addresses that are sent to vfio.

Since the result of both of these is the use of the 2K IDAL
variants, and the output of vfio-ccw is always a Format-2 IDAL
(in order to use 64-bit addresses), make sure that the correct
control bit gets set in the ORB when these scenarios occur.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: allocate/populate the guest idal
Eric Farman [Fri, 21 Oct 2022 18:53:51 +0000 (20:53 +0200)]
vfio/ccw: allocate/populate the guest idal

Today, we allocate memory for a list of IDAWs, and if the CCW
being processed contains an IDAL we read that data from the guest
into that space. We then copy each IDAW into the pa_iova array,
or fabricate that pa_iova array with a list of addresses based
on a direct-addressed CCW.

Combine the reading of the guest IDAL with the creation of a
pseudo-IDAL for direct-addressed CCWs, so that both CCW types
have a "guest" IDAL that can be populated straight into the
pa_iova array.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: calculate number of IDAWs regardless of format
Eric Farman [Wed, 2 Dec 2020 18:37:24 +0000 (19:37 +0100)]
vfio/ccw: calculate number of IDAWs regardless of format

The idal_nr_words() routine works well for 4K IDAWs, but lost its
ability to handle the old 2K formats with the removal of 31-bit
builds in commit 5a79859ae0f3 ("s390: remove 31 bit support").

Since there's nothing preventing a guest from generating this IDAW
format, let's re-introduce the math for them and use both when
calculating the number of IDAWs based on the bits specified in
the ORB.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: read only one Format-1 IDAW
Eric Farman [Thu, 20 Oct 2022 17:00:14 +0000 (19:00 +0200)]
vfio/ccw: read only one Format-1 IDAW

The intention is to read the first IDAW to determine the starting
location of an I/O operation, knowing that the second and any/all
subsequent IDAWs will be aligned per architecture. But, this read
receives 64-bits of data, which is the size of a Format-2 IDAW.

In the event that Format-1 IDAWs are presented, adjust the size
of the read to 32-bits. The data will end up occupying the upper
word of the target iova variable, so shift it down to the lower
word for use as an address. (By definition, this IDAW format
uses a 31-bit address, so the "sign" bit will always be off and
there is no concern about sign extension.)

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: refactor the idaw counter
Eric Farman [Mon, 31 Oct 2022 18:12:54 +0000 (19:12 +0100)]
vfio/ccw: refactor the idaw counter

The rules of an IDAW are fairly simple: Each one can move no
more than a defined amount of data, must not cross the
boundary defined by that length, and must be aligned to that
length as well. The first IDAW in a list is special, in that
it does not need to adhere to that alignment, but the other
rules still apply. Thus, by reading the first IDAW in a list,
the number of IDAWs that will comprise a data transfer of a
particular size can be calculated.

Let's factor out the reading of that first IDAW with the
logic that calculates the length of the list, to simplify
the rest of the routine that handles the individual IDAWs.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: populate page_array struct inline
Eric Farman [Fri, 21 Oct 2022 15:02:48 +0000 (17:02 +0200)]
vfio/ccw: populate page_array struct inline

There are two possible ways the list of addresses that get passed
to vfio are calculated. One is from a guest IDAL, which would be
an array of (probably) non-contiguous addresses. The other is
built from contiguous pages that follow the starting address
provided by ccw->cda.

page_array_alloc() attempts to simplify things by pre-populating
this array from the starting address, but that's not needed for
a CCW with an IDAL anyway so doesn't need to be in the allocator.
Move it to the caller in the non-IDAL case, since it will be
overwritten when reading the guest IDAL.

Remove the initialization of the pa_page output pointers,
since it won't be explicitly needed for either case.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: pass page count to page_array struct
Eric Farman [Fri, 21 Oct 2022 14:59:13 +0000 (16:59 +0200)]
vfio/ccw: pass page count to page_array struct

The allocation of our page_array struct calculates the number
of 4K pages that would be needed to hold a certain number of
bytes. But, since the number of pages that will be pinned is
also calculated by the length of the IDAL, this logic is
unnecessary. Let's pass that information in directly, and
avoid the math within the allocator.

Also, let's make this two allocations instead of one,
to make it apparent what's happening within here.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: remove unnecessary malloc alignment
Eric Farman [Thu, 22 Oct 2020 14:54:32 +0000 (16:54 +0200)]
vfio/ccw: remove unnecessary malloc alignment

Everything about this allocation is harder than necessary,
since the memory allocation is already aligned to our needs.
Break them apart for readability, instead of doing the
funky arithmetic.

Of the structures that are involved, only ch_ccw needs the
GFP_DMA flag, so the others can be allocated without it.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
22 months agovfio/ccw: simplify CCW chain fetch routines
Eric Farman [Fri, 21 Oct 2022 13:32:30 +0000 (15:32 +0200)]
vfio/ccw: simplify CCW chain fetch routines

The act of processing a fetched CCW has two components:

 1) Process a Transfer-in-channel (TIC) CCW
 2) Process any other CCW

The former needs to look at whether the TIC jumps backwards into
the current channel program or forwards into a new segment,
while the latter just processes the CCW data address itself.

Rather than passing the chain segment and index within it to the
handlers for the above, and requiring each to calculate the
elements it needs, simply pass the needed pointers directly.

For the TIC, that means the CCW being processed and the location
of the entire channel program which holds all segments. For the
other CCWs, the page_array pointer is also needed to perform the
page pinning, etc.

While at it, rename ccwchain_fetch_direct to _ccw, to indicate
what it is. The name "_direct" is historical, when it used to
process a direct-addressed CCW, but IDAs are processed here too.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>