platform/kernel/linux-rpi.git
10 months agoscripts/gdb/slab: add slab support
Kuan-Ying Lee [Tue, 8 Aug 2023 08:30:17 +0000 (16:30 +0800)]
scripts/gdb/slab: add slab support

Add 'lx-slabinfo' and 'lx-slabtrace' support.

This GDB scripts print slabinfo and slabtrace for user
to analyze slab memory usage.

Example output like below:
(gdb) lx-slabinfo
     Pointer       |         name         | active_objs  |   num_objs   | objsize  | objperslab  | pagesperslab
------------------ | -------------------- | ------------ | ------------ | -------- | ----------- | -------------
0xffff0000c59df480 | p9_req_t             | 0            | 0            | 280      | 29          | 2
0xffff0000c59df280 | isp1760_qh           | 0            | 0            | 160      | 25          | 1
0xffff0000c59df080 | isp1760_qtd          | 0            | 0            | 184      | 22          | 1
0xffff0000c59dee80 | isp1760_urb_listite  | 0            | 0            | 136      | 30          | 1
0xffff0000c59dec80 | asd_sas_event        | 0            | 0            | 256      | 32          | 2
0xffff0000c59dea80 | sas_task             | 0            | 0            | 448      | 36          | 4
0xffff0000c59de880 | bio-120              | 18           | 21           | 384      | 21          | 2
0xffff0000c59de680 | io_kiocb             | 0            | 0            | 448      | 36          | 4
0xffff0000c59de480 | bfq_io_cq            | 0            | 0            | 1504     | 21          | 8
0xffff0000c59de280 | bfq_queue            | 0            | 0            | 720      | 22          | 4
0xffff0000c59de080 | mqueue_inode_cache   | 1            | 28           | 1152     | 28          | 8
0xffff0000c59dde80 | v9fs_inode_cache     | 0            | 0            | 832      | 39          | 8
...

(gdb) lx-slabtrace --cache_name kmalloc-1k
63 <tty_register_device_attr+508> waste=16632/264 age=46856/46871/46888 pid=1 cpus=6,
   0xffff800008720240 <__kmem_cache_alloc_node+236>:    mov     x22, x0
   0xffff80000862a4fc <kmalloc_trace+64>:       mov     x21, x0
   0xffff8000095d086c <tty_register_device_attr+508>:   mov     x19, x0
   0xffff8000095d0f98 <tty_register_driver+704>:        cmn     x0, #0x1, lsl #12
   0xffff80000c2677e8 <vty_init+620>:   Cannot access memory at address 0xffff80000c2677e8
   0xffff80000c265a10 <tty_init+276>:   Cannot access memory at address 0xffff80000c265a10
   0xffff80000c26d3c4 <chr_dev_init+204>:       Cannot access memory at address 0xffff80000c26d3c4
   0xffff8000080161d4 <do_one_initcall+176>:    mov     w21, w0
   0xffff80000c1c1b58 <kernel_init_freeable+956>:       Cannot access memory at address 0xffff80000c1c1b58
   0xffff80000acf1334 <kernel_init+36>: bl      0xffff8000081ac040 <async_synchronize_full>
   0xffff800008018d00 <ret_from_fork+16>:       mrs     x28, sp_el0

(gdb) lx-slabtrace --cache_name kmalloc-1k --free
428 <not-available> age=4294958600 pid=0 cpus=0,

Link: https://lkml.kernel.org/r/20230808083020.22254-8-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoscripts/gdb/page_owner: add page owner support
Kuan-Ying Lee [Tue, 8 Aug 2023 08:30:16 +0000 (16:30 +0800)]
scripts/gdb/page_owner: add page owner support

This GDB script prints page owner information for user to analyze the
memory usage or memory corruption issue.

Example output from an aarch64 system:

(gdb) lx-dump-page-owner --pfn 655360
page_owner tracks the page as allocated
Page last allocated via order 0, gfp_mask: 0x8, pid: 1, tgid: 1 ("swapper/0\000\000\000\000\000\000"), ts 1295948880 ns, free_ts 1011852016 ns
PFN: 655360, Flags: 0x3fffc0000000000
   0xffff8000086ab964 <post_alloc_hook+452>:    ldp     x19, x20, [sp, #16]
   0xffff80000862e4e0 <split_map_pages+344>:    cbnz    w22, 0xffff80000862e57c <split_map_pages+500>
   0xffff8000086370c4 <isolate_freepages_range+556>:    mov     x0, x27
   0xffff8000086bc1cc <alloc_contig_range+808>: mov     x24, x0
   0xffff80000877d6d8 <cma_alloc+772>:  mov     w1, w0
   0xffff8000082c8d18 <dma_alloc_from_contiguous+104>:  ldr     x19, [sp, #16]
   0xffff8000082ce0e8 <atomic_pool_expand+208>: mov     x19, x0
   0xffff80000c1e41b4 <__dma_atomic_pool_init+172>:     Cannot access memory at address 0xffff80000c1e41b4
   0xffff80000c1e4298 <dma_atomic_pool_init+92>:        Cannot access memory at address 0xffff80000c1e4298
   0xffff8000080161d4 <do_one_initcall+176>:    mov     w21, w0
   0xffff80000c1c1b50 <kernel_init_freeable+952>:       Cannot access memory at address 0xffff80000c1c1b50
   0xffff80000acf87dc <kernel_init+36>: bl      0xffff8000081ab100 <async_synchronize_full>
   0xffff800008018d00 <ret_from_fork+16>:       mrs     x28, sp_el0
page last free stack trace:
   0xffff8000086a6e8c <free_unref_page_prepare+796>:    mov     w2, w23
   0xffff8000086aee1c <free_unref_page+96>:     tst     w0, #0xff
   0xffff8000086af3f8 <__free_pages+292>:       ldp     x19, x20, [sp, #16]
   0xffff80000c1f3214 <init_cma_reserved_pageblock+220>:        Cannot access memory at address 0xffff80000c1f3214
   0xffff80000c20363c <cma_init_reserved_areas+1284>:   Cannot access memory at address 0xffff80000c20363c
   0xffff8000080161d4 <do_one_initcall+176>:    mov     w21, w0
   0xffff80000c1c1b50 <kernel_init_freeable+952>:       Cannot access memory at address 0xffff80000c1c1b50
   0xffff80000acf87dc <kernel_init+36>: bl      0xffff8000081ab100 <async_synchronize_full>
   0xffff800008018d00 <ret_from_fork+16>:       mrs     x28, sp_el0

Link: https://lkml.kernel.org/r/20230808083020.22254-7-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoscripts/gdb/stackdepot: add stackdepot support
Kuan-Ying Lee [Tue, 8 Aug 2023 08:30:15 +0000 (16:30 +0800)]
scripts/gdb/stackdepot: add stackdepot support

Add support for printing the backtrace of stackdepot handle.

This is the preparation patch for dumping page_owner,
slabtrace usage.

Link: https://lkml.kernel.org/r/20230808083020.22254-6-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoscripts/gdb/aarch64: add aarch64 page operation helper commands and configs
Kuan-Ying Lee [Tue, 8 Aug 2023 08:30:14 +0000 (16:30 +0800)]
scripts/gdb/aarch64: add aarch64 page operation helper commands and configs

1. Move page table debugging from mm.py to pgtable.py.

2. Add aarch64 kernel config and memory constants value.

3. Add below aarch64 page operation helper commands.
   page_to_pfn, page_to_phys, pfn_to_page, page_address,
   virt_to_phys, sym_to_pfn, pfn_to_kaddr, virt_to_page.

4. Only support CONFIG_SPARSEMEM_VMEMMAP=y now.

Link: https://lkml.kernel.org/r/20230808083020.22254-5-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoscripts/gdb/utils: add common type usage
Kuan-Ying Lee [Tue, 8 Aug 2023 08:30:13 +0000 (16:30 +0800)]
scripts/gdb/utils: add common type usage

Since we often use 'unsigned long', 'size_t', 'usigned int'
and 'struct page', we add these common types to utils.

Link: https://lkml.kernel.org/r/20230808083020.22254-4-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoscripts/gdb/modules: add get module text support
Kuan-Ying Lee [Tue, 8 Aug 2023 08:30:12 +0000 (16:30 +0800)]
scripts/gdb/modules: add get module text support

When we get an text address from coredump and we cannot find
this address in vmlinux, it might located in kernel module.

We want to know which kernel module it located in.

This GDB scripts can help us to find the target kernel module.

(gdb) lx-getmod-by-textaddr 0xffff800002d305ac
0xffff800002d305ac is in kasan_test.ko

Link: https://lkml.kernel.org/r/20230808083020.22254-3-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agoscripts/gdb/symbols: add specific ko module load command
Kuan-Ying Lee [Tue, 8 Aug 2023 08:30:11 +0000 (16:30 +0800)]
scripts/gdb/symbols: add specific ko module load command

Patch series "Add GDB memory helper commands", v2.

I've created some GDB commands I think useful when I debug some memory
issues and kernel module issue.

For memory issue, we would like to get slabinfo, slabtrace, page_owner and
vmallocinfo to debug the memory issues.

For module issue, we would like to query kernel module name when we get a
module text address and load module symbol by specific path.

Patch 1-2:
 - Add kernel module related command.
Patch 3-5:
 - Prepares for the memory-related command.
Patch 6-8:
 - Add memory-related commands.

This patch (of 8):

Add lx-symbols <ko_path> command to support add specific
ko module.

Example output like below:
(gdb) lx-symbols mm/kasan/kasan_test.ko
loading @0xffff800002d30000: mm/kasan/kasan_test.ko

Link: https://lkml.kernel.org/r/20230808083020.22254-1-Kuan-Ying.Lee@mediatek.com
Link: https://lkml.kernel.org/r/20230808083020.22254-2-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agocheckpatch: reword long-line warning about commit-msg
Jim Cromie [Tue, 8 Aug 2023 03:30:19 +0000 (21:30 -0600)]
checkpatch: reword long-line warning about commit-msg

Reword the warning to complain about line length 1st, since thats
whats actually tested.

Link: https://lkml.kernel.org/r/20230808033019.21911-3-jim.cromie@gmail.com
Cc: apw@canonical.com
Cc: joe@perches.com
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 months agocheckpatch: special case extern struct in .c
Jim Cromie [Tue, 8 Aug 2023 03:30:18 +0000 (21:30 -0600)]
checkpatch: special case extern struct in .c

"externs should be avoided in .c files" needs an exception for linker
symbols, like those that mark the start, stop of many kernel sections.

Since checkpatch already checks REALNAME to avoid looking at fragments
changing vmlinux.lds.h, add a new else-if block to look at them
instead.  As a simple heuristic, treat all words (in the patch-line)
as possible symbols, to screen later warnings.

For my test case, the possible-symbols included BOUNDED_BY (a macro),
which is extra, but not troublesome - these are just to screen
WARNINGS that might be issued on later fragments (changing .c files)

Where the WARN is issued, precede it with an else-if block to catch
one common extern-in-c use case: "extern struct foo bar[]".  Here we
can at least issue a softer warning, after checking for a match with a
maybe-linker-symbol parsed earlier from the patch.

Though heuristic, it worked for my test-case, allowing both start__,
stop__ $symbol's (wo the prefixes specifically named).  I've coded it
narrowly, it can be expanded later to cover any other expressions.

It does require that the externs in .c's have the additions to
vmlinux.lds.h in the same patch.  And requires vmlinux.lds.h before .c
fragments.

Link: https://lkml.kernel.org/r/20230808033019.21911-2-jim.cromie@gmail.com
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agox86/kernel: increase kcov coverage under arch/x86/kernel folder
Pengfei Xu [Mon, 31 Jul 2023 03:04:18 +0000 (11:04 +0800)]
x86/kernel: increase kcov coverage under arch/x86/kernel folder

Currently kcov instrument is disabled for object files under
arch/x86/kernel folder.

For object files under arch/x86/kernel, actually just disabling the kcov
instrument of files:"head32.o or head64.o and sev.o" could achieve
successful booting and provide kcov coverage for object files that do not
disable kcov instrument.  The additional kcov coverage collected from
arch/x86/kernel folder helps kernel fuzzing efforts to find bugs.

Link to related improvement discussion is below:
https://groups.google.com/g/syzkaller/c/Dsl-RYGCqs8/m/x-tfpTyFBAAJ Related
ticket is as follow: https://bugzilla.kernel.org/show_bug.cgi?id=198443

Link: https://lkml.kernel.org/r/06c0bb7b5f61e5884bf31180e8c122648c752010.1690771380.git.pengfei.xu@intel.com
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Pengfei Xu <pengfei.xu@intel.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: <heng.su@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kees Cook <keescook@google.com>,
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agofs: ocfs2: namei: check return value of ocfs2_add_entry()
Artem Chernyshev [Thu, 3 Aug 2023 14:54:17 +0000 (17:54 +0300)]
fs: ocfs2: namei: check return value of ocfs2_add_entry()

Process result of ocfs2_add_entry() in case we have an error
value.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Link: https://lkml.kernel.org/r/20230803145417.177649-1-artem.chernyshev@red-soft.ru
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Kurt Hackel <kurt.hackel@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agowatchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check()
Douglas Anderson [Fri, 4 Aug 2023 14:00:43 +0000 (07:00 -0700)]
watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check()

After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started storing a `struct cpumask` on the
stack in watchdog_hardlockup_check().  On systems with CONFIG_NR_CPUS set
to 8192 this takes up 1K on the stack.  That triggers warnings with
`CONFIG_FRAME_WARN` set to 1024.

We'll use the new trigger_allbutcpu_cpu_backtrace() to avoid needing to
use a CPU mask at all.

Link: https://lkml.kernel.org/r/20230804065935.v4.2.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid
Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202307310955.pLZDhpnl-lkp@intel.com
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agonmi_backtrace: allow excluding an arbitrary CPU
Douglas Anderson [Fri, 4 Aug 2023 14:00:42 +0000 (07:00 -0700)]
nmi_backtrace: allow excluding an arbitrary CPU

The APIs that allow backtracing across CPUs have always had a way to
exclude the current CPU.  This convenience means callers didn't need to
find a place to allocate a CPU mask just to handle the common case.

Let's extend the API to take a CPU ID to exclude instead of just a
boolean.  This isn't any more complex for the API to handle and allows the
hardlockup detector to exclude a different CPU (the one it already did a
trace for) without needing to find space for a CPU mask.

Arguably, this new API also encourages safer behavior.  Specifically if
the caller wants to avoid tracing the current CPU (maybe because they
already traced the current CPU) this makes it more obvious to the caller
that they need to make sure that the current CPU ID can't change.

[akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub]
Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agorange.h: Move resource API and constant to respective files
Andy Shevchenko [Fri, 4 Aug 2023 06:46:36 +0000 (09:46 +0300)]
range.h: Move resource API and constant to respective files

range.h works with struct range data type. The resource_size_t
is an alien here.

(1) Move cap_resource() implementation into its only user, and
(2) rename and move RESOURCE_SIZE_MAX to limits.h.

Link: https://lkml.kernel.org/r/20230804064636.15368-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agokthread: unexport __kthread_should_park()
Greg Kroah-Hartman [Fri, 4 Aug 2023 06:41:51 +0000 (08:41 +0200)]
kthread: unexport __kthread_should_park()

There are no in-kernel users of __kthread_should_park() so mark it as
static and do not export it.

Link: https://lkml.kernel.org/r/2023080450-handcuff-stump-1d6e@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: John Stultz <jstultz@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: "Arve Hjønnevåg" <arve@android.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Christian Brauner (Microsoft)" <brauner@kernel.org>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Zqiang <qiang1.zhang@intel.com>
Cc: Prathu Baronia <quic_pbaronia@quicinc.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoefs: clean up -Wunused-const-variable= warning
Zhu Wang [Thu, 3 Aug 2023 01:51:03 +0000 (09:51 +0800)]
efs: clean up -Wunused-const-variable= warning

When building with W=1, the following warning occurs.

In file included from fs/efs/super.c:18:0:
fs/efs/efs.h:22:19: warning: `cprt' defined but not used [-Wunused-const-variable=]
 static const char cprt[] = "EFS: "EFS_VERSION" - (c) 1999 Al Smith
<Al.Smith@aeschi.ch.eu.org>";
                   ^~~~
The 'cprt' is not used in any files, we move the copyright statement
into the comment.

Link: https://lkml.kernel.org/r/20230803015103.192985-1-wangzhu9@huawei.com
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Cc: Al Smith <Al.Smith@aeschi.ch.eu.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agodrm/i915: Move abs_diff() to math.h
Andy Shevchenko [Thu, 3 Aug 2023 13:19:18 +0000 (16:19 +0300)]
drm/i915: Move abs_diff() to math.h

abs_diff() belongs to math.h.  Move it there.  This will allow others to
use it.

[andriy.shevchenko@linux.intel.com: add abs_diff() documentation]
Link: https://lkml.kernel.org/r/20230804050934.83223-1-andriy.shevchenko@linux.intel.com
[akpm@linux-foundation.org: fix comment, per Randy]
Link: https://lkml.kernel.org/r/20230803131918.53727-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org> # tty/serial
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> # gpu/ipu-v3
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoocfs2: cluster: fix potential deadlock on &o2net_debug_lock
Chengfeng Ye [Wed, 2 Aug 2023 13:14:36 +0000 (13:14 +0000)]
ocfs2: cluster: fix potential deadlock on &o2net_debug_lock

&o2net_debug_lock is acquired by timer o2net_idle_timer() along the
following call chain.  Thus the acquisition of the lock under process
context should disable bottom half, otherwise deadlock could happen if the
timer happens to preempt the execution while the lock is held in process
context on the same CPU.

<timer interrupt>
        -> o2net_idle_timer()
        -> queue_delayed_work()
        -> sc_put()
        -> sc_kref_release()
        -> o2net_debug_del_sc()
        -> spin_lock(&o2net_debug_lock);

Several lock acquisition of &o2net_debug_lock under process context do not
disable irq or bottom half.  The patch fixes these potential deadlocks
scenerio by using spin_lock_bh() on &o2net_debug_lock.

This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.  x86_64 allmodconfig using gcc shows
no new warning.

Link: https://lkml.kernel.org/r/20230802131436.17765-1-dg573847474@gmail.com
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoocfs2: cluster: fix potential deadlock on &qs->qs_lock
Chengfeng Ye [Wed, 2 Aug 2023 12:38:24 +0000 (12:38 +0000)]
ocfs2: cluster: fix potential deadlock on &qs->qs_lock

&qs->qs_lock is acquired by timer o2net_idle_timer() along the following
call chain.  Thus the acquisition of the lock under process context should
disable bottom half, otherwise deadlock could happen if the timer happens
to preempt the execution while the lock is held in process context on the
same CPU.

<timer interrupt>
        -> o2net_idle_timer()
        -> o2quo_conn_err()
        -> spin_lock(&qs->qs_lock)

Several lock acquisition of &qs->qs_lock under process contex do not
disable irq or bottom half.  The patch fixes these potential deadlocks
scenerio by using spin_lock_bh() on &qs->qs_lock.

This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.  x86_64 allmodconfig using gcc shows
no new warning.

Link: https://lkml.kernel.org/r/20230802123824.15301-1-dg573847474@gmail.com
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoscripts/gdb: fix 'lx-lsmod' show the wrong size
Kuan-Ying Lee [Mon, 10 Jul 2023 09:28:46 +0000 (17:28 +0800)]
scripts/gdb: fix 'lx-lsmod' show the wrong size

'lsmod' shows total core layout size, so we need to sum up all the
sections in core layout in gdb scripts.

/ # lsmod
kasan_test 200704 0 - Live 0xffff80007f640000

Before patch:
(gdb) lx-lsmod
Address            Module                  Size  Used by
0xffff80007f640000 kasan_test             36864  0

After patch:
(gdb) lx-lsmod
Address            Module                  Size  Used by
0xffff80007f640000 kasan_test            200704  0

Link: https://lkml.kernel.org/r/20230710092852.31049-1-Kuan-Ying.Lee@mediatek.com
Fixes: b4aff7513df3 ("scripts/gdb: use mem instead of core_layout to get the module address")
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agolib/bch.c: use bitrev instead of internal logic
John Sanpe [Sun, 30 Jul 2023 08:17:17 +0000 (16:17 +0800)]
lib/bch.c: use bitrev instead of internal logic

Replace internal logic with separate bitrev library.

Link: https://lkml.kernel.org/r/20230730081717.1498217-1-sanpeqf@gmail.com
Signed-off-by: John Sanpe <sanpeqf@gmail.com>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoscripts/gdb: fix lx-symbols command for arm64 LLVM
Koudai Iwahori [Tue, 1 Aug 2023 12:10:52 +0000 (05:10 -0700)]
scripts/gdb: fix lx-symbols command for arm64 LLVM

lx-symbols assumes that module's .text sections is located at
`module->mem[MOD_TEXT].base` and passes it to add-symbol-file command.
However, .text section follows after .plt section in modules built by LLVM
toolchain for arm64 target.  Symbol addresses are skewed in GDB.

Fix this issue by using the address of .text section stored in
`module->sect_attrs`.

Link: https://lkml.kernel.org/r/20230801121052.2475183-1-koudai@google.com
Signed-off-by: Koudai Iwahori <koudai@google.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agogcov: shut up missing prototype warnings for internal stubs
Arnd Bergmann [Tue, 25 Jul 2023 12:23:38 +0000 (14:23 +0200)]
gcov: shut up missing prototype warnings for internal stubs

gcov uses global functions that are called from generated code, but these
have no prototype in a header, which causes a W=1 build warning:

kernel/gcov/gcc_base.c:12:6: error: no previous prototype for '__gcov_init' [-Werror=missing-prototypes]
kernel/gcov/gcc_base.c:40:6: error: no previous prototype for '__gcov_flush' [-Werror=missing-prototypes]
kernel/gcov/gcc_base.c:46:6: error: no previous prototype for '__gcov_merge_add' [-Werror=missing-prototypes]
kernel/gcov/gcc_base.c:52:6: error: no previous prototype for '__gcov_merge_single' [-Werror=missing-prototypes]

Just turn off these warnings unconditionally for the two files that
contain them.

Link: https://lore.kernel.org/all/0820010f-e9dc-779d-7924-49c7df446bce@linux.ibm.com/
Link: https://lkml.kernel.org/r/20230725123042.2269077-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoocfs2: use regular seq_show_option for osb_cluster_stack
Kees Cook [Wed, 26 Jul 2023 21:59:22 +0000 (14:59 -0700)]
ocfs2: use regular seq_show_option for osb_cluster_stack

While cleaning up seq_show_option_n()'s use of strncpy, it was noticed
that the osb_cluster_stack member is always NUL-terminated, so there is no
need to use the special seq_show_option_n() routine.  Replace it with the
standard seq_show_option() routine.

Link: https://lkml.kernel.org/r/20230726215919.never.127-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoocfs2: Use struct_size()
Christophe JAILLET [Sun, 16 Jul 2023 18:48:57 +0000 (20:48 +0200)]
ocfs2: Use struct_size()

Use struct_size() instead of hand-writing it, when allocating a structure
with a flex array.

This is less verbose.

Link: https://lkml.kernel.org/r/9d99ea2090739f816d0dc0c4ebaa42b26fc48a9e.1689533270.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoocfs2: use flexible array in 'struct ocfs2_recovery_map'
Christophe JAILLET [Sun, 16 Jul 2023 18:48:56 +0000 (20:48 +0200)]
ocfs2: use flexible array in 'struct ocfs2_recovery_map'

Turn 'rm_entries' in 'struct ocfs2_recovery_map' into a flexible array.

The advantages are:
   - save the size of a pointer when the new undo structure is allocated
   - avoid some always ugly pointer arithmetic to get the address of
    'rm_entries'
   - avoid an indirection when the array is accessed

While at it, use struct_size() to compute the size of the new undo
structure.

Link: https://lkml.kernel.org/r/c645911ffd2720fce5e344c17de642518cd0db52.1689533270.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agogenetlink: replace custom CONCATENATE() implementation
Andy Shevchenko [Tue, 18 Jul 2023 21:11:47 +0000 (00:11 +0300)]
genetlink: replace custom CONCATENATE() implementation

Replace custom implementation of the macros from args.h.

Link: https://lkml.kernel.org/r/20230718211147.18647-5-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gow <davidgow@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoarm64: smccc: replace custom COUNT_ARGS() & CONCATENATE() implementations
Andy Shevchenko [Tue, 18 Jul 2023 21:11:46 +0000 (00:11 +0300)]
arm64: smccc: replace custom COUNT_ARGS() & CONCATENATE() implementations

Replace custom implementation of the macros from args.h.

Link: https://lkml.kernel.org/r/20230718211147.18647-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gow <davidgow@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agox86/asm: replace custom COUNT_ARGS() & CONCATENATE() implementations
Andy Shevchenko [Tue, 18 Jul 2023 21:11:45 +0000 (00:11 +0300)]
x86/asm: replace custom COUNT_ARGS() & CONCATENATE() implementations

Replace custom implementation of the macros from args.h.

Link: https://lkml.kernel.org/r/20230718211147.18647-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gow <davidgow@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agokernel.h: split out COUNT_ARGS() and CONCATENATE() to args.h
Andy Shevchenko [Tue, 18 Jul 2023 21:11:44 +0000 (00:11 +0300)]
kernel.h: split out COUNT_ARGS() and CONCATENATE() to args.h

Patch series "kernel.h: Split out a couple of macros to args.h", v4.

There are macros in kernel.h that can be used outside of that header.
Split them to args.h and replace open coded variants.

This patch (of 4):

kernel.h is being used as a dump for all kinds of stuff for a long time.
The COUNT_ARGS() and CONCATENATE() macros may be used in some places
without need of the full kernel.h dependency train with it.

Here is the attempt on cleaning it up by splitting out these macros().

While at it, include new header where it's being used.

Link: https://lkml.kernel.org/r/20230718211147.18647-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20230718211147.18647-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [PCI]
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gow <davidgow@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoarch: enable HAS_LTO_CLANG with KASAN and KCOV
Jakob Koschel [Tue, 18 Jul 2023 22:29:12 +0000 (00:29 +0200)]
arch: enable HAS_LTO_CLANG with KASAN and KCOV

Both KASAN and KCOV had issues with LTO_CLANG if DEBUG_INFO is enabled.
With LTO inlinable function calls are required to have debug info if they
are inlined into a function that has debug info.

Starting with LLVM 17 this will be fixed ([1],[2]) and enabling LTO with
KASAN/KCOV and DEBUG_INFO doesn't cause linker errors anymore.

Link: https://github.com/llvm/llvm-project/commit/913f7e93dac67ecff47bade862ba42f27cb68ca9
Link: https://github.com/llvm/llvm-project/commit/4a8b1249306ff11f229320abdeadf0c215a00400
Link: https://lkml.kernel.org/r/20230717-enable-kasan-lto1-v3-1-650e1efc19d1@gmail.com
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jakob Koschel <jkl820.git@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agofs: hfsplus: make extend error rate limited
Colin Ian King [Wed, 19 Jul 2023 12:17:35 +0000 (13:17 +0100)]
fs: hfsplus: make extend error rate limited

Extending a file where there is not enough free space can trigger frequent
extend alloc file error messages and this can easily spam the kernel log.
Make the error message rate limited.

Link: https://lkml.kernel.org/r/20230719121735.2831164-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agolib: error-inject: remove error checking for debugfs_create_dir()
Wang Ming [Wed, 19 Jul 2023 14:44:10 +0000 (14:44 +0000)]
lib: error-inject: remove error checking for debugfs_create_dir()

It is expected that most callers should _ignore_ the errors return by
debugfs_create_dir() in ei_debugfs_init().

Link: https://lkml.kernel.org/r/20230719144355.6720-1-machel@vivo.com
Signed-off-by: Wang Ming <machel@vivo.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agolib: remove error checking for debugfs_create_dir()
Wang Ming [Thu, 13 Jul 2023 08:24:43 +0000 (16:24 +0800)]
lib: remove error checking for debugfs_create_dir()

It is expected that most callers should _ignore_ the errors return by
debugfs_create_dir() in err_inject_init().

Link: https://lkml.kernel.org/r/20230713082455.2415-1-machel@vivo.com
Signed-off-by: Wang Ming <machel@vivo.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agokernel: relay: remove unnecessary NULL values from relay_open_buf
Li kunyu [Thu, 13 Jul 2023 23:44:59 +0000 (07:44 +0800)]
kernel: relay: remove unnecessary NULL values from relay_open_buf

buf is assigned first, so it does not need to initialize the assignment.

Link: https://lkml.kernel.org/r/20230713234459.2908-1-kunyu@nfschina.com
Signed-off-by: Li kunyu <kunyu@nfschina.com>
Reviewed-by: Andrew Morton <akpm@linux-foudation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoremove ARCH_DEFAULT_KEXEC from Kconfig.kexec
Eric DeVolder [Wed, 2 Aug 2023 16:17:50 +0000 (12:17 -0400)]
remove ARCH_DEFAULT_KEXEC from Kconfig.kexec

This patch is a minor cleanup to the series "refactor Kconfig to
consolidate KEXEC and CRASH options".

In that series, a new option ARCH_DEFAULT_KEXEC was introduced in order to
obtain the equivalent behavior of s390 original Kconfig settings for
KEXEC.  As it turns out, this new option did not fully provide the
equivalent behavior, rather a "select KEXEC" did.

As such, the ARCH_DEFAULT_KEXEC is not needed anymore, so remove it.

Link: https://lkml.kernel.org/r/20230802161750.2215-1-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agokexec: rename ARCH_HAS_KEXEC_PURGATORY
Eric DeVolder [Wed, 12 Jul 2023 16:15:45 +0000 (12:15 -0400)]
kexec: rename ARCH_HAS_KEXEC_PURGATORY

The Kconfig refactor to consolidate KEXEC and CRASH options utilized
option names of the form ARCH_SUPPORTS_<option>. Thus rename the
ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow
the same.

Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agosh/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:44 +0000 (12:15 -0400)]
sh/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-14-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agos390/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:43 +0000 (12:15 -0400)]
s390/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-13-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoriscv/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:42 +0000 (12:15 -0400)]
riscv/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-12-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agopowerpc/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:41 +0000 (12:15 -0400)]
powerpc/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-11-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoparisc/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:40 +0000 (12:15 -0400)]
parisc/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-10-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agomips/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:39 +0000 (12:15 -0400)]
mips/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-9-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agom68k/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:38 +0000 (12:15 -0400)]
m68k/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-8-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoloongarch/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:37 +0000 (12:15 -0400)]
loongarch/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-7-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoarm64/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:36 +0000 (12:15 -0400)]
arm64/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-6-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoia64/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:35 +0000 (12:15 -0400)]
ia64/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-5-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoarm/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:34 +0000 (12:15 -0400)]
arm/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-4-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agox86/kexec: refactor for kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:33 +0000 (12:15 -0400)]
x86/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-3-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agokexec: consolidate kexec and crash options into kernel/Kconfig.kexec
Eric DeVolder [Wed, 12 Jul 2023 16:15:32 +0000 (12:15 -0400)]
kexec: consolidate kexec and crash options into kernel/Kconfig.kexec

Patch series "refactor Kconfig to consolidate KEXEC and CRASH options", v6.

The Kconfig is refactored to consolidate KEXEC and CRASH options from
various arch/<arch>/Kconfig files into new file kernel/Kconfig.kexec.

The Kconfig.kexec is now a submenu titled "Kexec and crash features"
located under "General Setup".

The following options are impacted:

 - KEXEC
 - KEXEC_FILE
 - KEXEC_SIG
 - KEXEC_SIG_FORCE
 - KEXEC_IMAGE_VERIFY_SIG
 - KEXEC_BZIMAGE_VERIFY_SIG
 - KEXEC_JUMP
 - CRASH_DUMP

Over time, these options have been copied between Kconfig files and
are very similar to one another, but with slight differences.

The following architectures are impacted by the refactor (because of
use of one or more KEXEC/CRASH options):

 - arm
 - arm64
 - ia64
 - loongarch
 - m68k
 - mips
 - parisc
 - powerpc
 - riscv
 - s390
 - sh
 - x86

More information:

In the patch series "crash: Kernel handling of CPU and memory hot
un/plug"

 https://lore.kernel.org/lkml/20230503224145.7405-1-eric.devolder@oracle.com/

the new kernel feature introduces the config option CRASH_HOTPLUG.

In reviewing, Thomas Gleixner requested that the new config option
not be placed in x86 Kconfig. Rather the option needs a generic/common
home. To Thomas' point, the KEXEC and CRASH options have largely been
duplicated in the various arch/<arch>/Kconfig files, with minor
differences. This kind of proliferation is to be avoid/stopped.

 https://lore.kernel.org/lkml/875y91yv63.ffs@tglx/

To that end, I have refactored the arch Kconfigs so as to consolidate
the various KEXEC and CRASH options. Generally speaking, this work has
the following themes:

- KEXEC and CRASH options are moved into new file kernel/Kconfig.kexec
  - These items from arch/Kconfig:
      CRASH_CORE KEXEC_CORE KEXEC_ELF HAVE_IMA_KEXEC
  - These items from arch/x86/Kconfig form the common options:
      KEXEC KEXEC_FILE KEXEC_SIG KEXEC_SIG_FORCE
      KEXEC_BZIMAGE_VERIFY_SIG KEXEC_JUMP CRASH_DUMP
  - These items from arch/arm64/Kconfig form the common options:
      KEXEC_IMAGE_VERIFY_SIG
  - The crash hotplug series appends CRASH_HOTPLUG to Kconfig.kexec
- The Kconfig.kexec is now a submenu titled "Kexec and crash features"
  and is now listed in "General Setup" submenu from init/Kconfig.
- To control the common options, each has a new ARCH_SUPPORTS_<option>
  option. These gateway options determine whether the common options
  options are valid for the architecture.
- To account for the slight differences in the original architecture
  coding of the common options, each now has a corresponding
  ARCH_SELECTS_<option> which are used to elicit the same side effects
  as the original arch/<arch>/Kconfig files for KEXEC and CRASH options.

An example, 'make menuconfig' illustrating the submenu:

  > General setup > Kexec and crash features
  [*] Enable kexec system call
  [*] Enable kexec file based system call
  [*]   Verify kernel signature during kexec_file_load() syscall
  [ ]     Require a valid signature in kexec_file_load() syscall
  [ ]     Enable bzImage signature verification support
  [*] kexec jump
  [*] kernel crash dumps
  [*]   Update the crash elfcorehdr on system configuration changes

In the process of consolidating the common options, I encountered
slight differences in the coding of these options in several of the
architectures. As a result, I settled on the following solution:

- Each of the common options has a 'depends on ARCH_SUPPORTS_<option>'
  statement. For example, the KEXEC_FILE option has a 'depends on
  ARCH_SUPPORTS_KEXEC_FILE' statement.

  This approach is needed on all common options so as to prevent
  options from appearing for architectures which previously did
  not allow/enable them. For example, arm supports KEXEC but not
  KEXEC_FILE. The arch/arm/Kconfig does not provide
  ARCH_SUPPORTS_KEXEC_FILE and so KEXEC_FILE and related options
  are not available to arm.

- The boolean ARCH_SUPPORTS_<option> in effect allows the arch to
  determine when the feature is allowed.  Archs which don't have the
  feature simply do not provide the corresponding ARCH_SUPPORTS_<option>.
  For each arch, where there previously were KEXEC and/or CRASH
  options, these have been replaced with the corresponding boolean
  ARCH_SUPPORTS_<option>, and an appropriate def_bool statement.

  For example, if the arch supports KEXEC_FILE, then the
  ARCH_SUPPORTS_KEXEC_FILE simply has a 'def_bool y'. This permits
  the KEXEC_FILE option to be available.

  If the arch has a 'depends on' statement in its original coding
  of the option, then that expression becomes part of the def_bool
  expression. For example, arm64 had:

  config KEXEC
    depends on PM_SLEEP_SMP

  and in this solution, this converts to:

  config ARCH_SUPPORTS_KEXEC
    def_bool PM_SLEEP_SMP

- In order to account for the architecture differences in the
  coding for the common options, the ARCH_SELECTS_<option> in the
  arch/<arch>/Kconfig is used. This option has a 'depends on
  <option>' statement to couple it to the main option, and from
  there can insert the differences from the common option and the
  arch original coding of that option.

  For example, a few archs enable CRYPTO and CRYTPO_SHA256 for
  KEXEC_FILE. These require a ARCH_SELECTS_KEXEC_FILE and
  'select CRYPTO' and 'select CRYPTO_SHA256' statements.

Illustrating the option relationships:

For each of the common KEXEC and CRASH options:
 ARCH_SUPPORTS_<option> <- <option> <- ARCH_SELECTS_<option>

 <option>                   # in Kconfig.kexec
 ARCH_SUPPORTS_<option>     # in arch/<arch>/Kconfig, as needed
 ARCH_SELECTS_<option>      # in arch/<arch>/Kconfig, as needed

For example, KEXEC:
 ARCH_SUPPORTS_KEXEC <- KEXEC <- ARCH_SELECTS_KEXEC

 KEXEC                      # in Kconfig.kexec
 ARCH_SUPPORTS_KEXEC        # in arch/<arch>/Kconfig, as needed
 ARCH_SELECTS_KEXEC         # in arch/<arch>/Kconfig, as needed

To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
enabled, and the ARCH_SELECTS_<option> handles side effects (ie.
select statements).

Examples:
A few examples to show the new strategy in action:

===== x86 (minus the help section) =====
Original:
 config KEXEC
    bool "kexec system call"
    select KEXEC_CORE

 config KEXEC_FILE
    bool "kexec file based system call"
    select KEXEC_CORE
    select HAVE_IMA_KEXEC if IMA
    depends on X86_64
    depends on CRYPTO=y
    depends on CRYPTO_SHA256=y

 config ARCH_HAS_KEXEC_PURGATORY
    def_bool KEXEC_FILE

 config KEXEC_SIG
    bool "Verify kernel signature during kexec_file_load() syscall"
    depends on KEXEC_FILE

 config KEXEC_SIG_FORCE
    bool "Require a valid signature in kexec_file_load() syscall"
    depends on KEXEC_SIG

 config KEXEC_BZIMAGE_VERIFY_SIG
    bool "Enable bzImage signature verification support"
    depends on KEXEC_SIG
    depends on SIGNED_PE_FILE_VERIFICATION
    select SYSTEM_TRUSTED_KEYRING

 config CRASH_DUMP
    bool "kernel crash dumps"
    depends on X86_64 || (X86_32 && HIGHMEM)

 config KEXEC_JUMP
    bool "kexec jump"
    depends on KEXEC && HIBERNATION
    help

becomes...
New:
config ARCH_SUPPORTS_KEXEC
    def_bool y

config ARCH_SUPPORTS_KEXEC_FILE
    def_bool X86_64 && CRYPTO && CRYPTO_SHA256

config ARCH_SELECTS_KEXEC_FILE
    def_bool y
    depends on KEXEC_FILE
    select HAVE_IMA_KEXEC if IMA

config ARCH_SUPPORTS_KEXEC_PURGATORY
    def_bool KEXEC_FILE

config ARCH_SUPPORTS_KEXEC_SIG
    def_bool y

config ARCH_SUPPORTS_KEXEC_SIG_FORCE
    def_bool y

config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG
    def_bool y

config ARCH_SUPPORTS_KEXEC_JUMP
    def_bool y

config ARCH_SUPPORTS_CRASH_DUMP
    def_bool X86_64 || (X86_32 && HIGHMEM)

===== powerpc (minus the help section) =====
Original:
 config KEXEC
    bool "kexec system call"
    depends on PPC_BOOK3S || PPC_E500 || (44x && !SMP)
    select KEXEC_CORE

 config KEXEC_FILE
    bool "kexec file based system call"
    select KEXEC_CORE
    select HAVE_IMA_KEXEC if IMA
    select KEXEC_ELF
    depends on PPC64
    depends on CRYPTO=y
    depends on CRYPTO_SHA256=y

 config ARCH_HAS_KEXEC_PURGATORY
    def_bool KEXEC_FILE

 config CRASH_DUMP
    bool "Build a dump capture kernel"
    depends on PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
    select RELOCATABLE if PPC64 || 44x || PPC_85xx

becomes...
New:
config ARCH_SUPPORTS_KEXEC
    def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)

config ARCH_SUPPORTS_KEXEC_FILE
    def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y

config ARCH_SUPPORTS_KEXEC_PURGATORY
    def_bool KEXEC_FILE

config ARCH_SELECTS_KEXEC_FILE
    def_bool y
    depends on KEXEC_FILE
    select KEXEC_ELF
    select HAVE_IMA_KEXEC if IMA

config ARCH_SUPPORTS_CRASH_DUMP
    def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)

config ARCH_SELECTS_CRASH_DUMP
    def_bool y
    depends on CRASH_DUMP
    select RELOCATABLE if PPC64 || 44x || PPC_85xx

Testing Approach and Results

There are 388 config files in the arch/<arch>/configs directories.
For each of these config files, a .config is generated both before and
after this Kconfig series, and checked for equivalence. This approach
allows for a rather rapid check of all architectures and a wide
variety of configs wrt/ KEXEC and CRASH, and avoids requiring
compiling for all architectures and running kernels and run-time
testing.

For each config file, the olddefconfig, allnoconfig and allyesconfig
targets are utilized. In testing the randconfig has revealed problems
as well, but is not used in the before and after equivalence check
since one can not generate the "same" .config for before and after,
even if using the same KCONFIG_SEED since the option list is
different.

As such, the following script steps compare the before and after
of 'make olddefconfig'. The new symbols introduced by this series
are filtered out, but otherwise the config files are PASS only if
they were equivalent, and FAIL otherwise.

The script performs the test by doing the following:

 # Obtain the "golden" .config output for given config file
 # Reset test sandbox
 git checkout master
 git branch -D test_Kconfig
 git checkout -B test_Kconfig master
 make distclean
 # Write out updated config
 cp -f <config file> .config
 make ARCH=<arch> olddefconfig
 # Track each item in .config, LHSB is "golden"
 scoreboard .config

 # Obtain the "changed" .config output for given config file
 # Reset test sandbox
 make distclean
 # Apply this Kconfig series
 git am <this Kconfig series>
 # Write out updated config
 cp -f <config file> .config
 make ARCH=<arch> olddefconfig
 # Track each item in .config, RHSB is "changed"
 scoreboard .config

 # Determine test result
 # Filter-out new symbols introduced by this series
 # Filter-out symbol=n which not in either scoreboard
 # Compare LHSB "golden" and RHSB "changed" scoreboards and issue PASS/FAIL

The script was instrumental during the refactoring of Kconfig as it
continually revealed problems. The end result being that the solution
presented in this series passes all configs as checked by the script,
with the following exceptions:

- arch/ia64/configs/zx1_config with olddefconfig
  This config file has:
  # CONFIG_KEXEC is not set
  CONFIG_CRASH_DUMP=y
  and this refactor now couples KEXEC to CRASH_DUMP, so it is not
  possible to enable CRASH_DUMP without KEXEC.

- arch/sh/configs/* with allyesconfig
  The arch/sh/Kconfig codes CRASH_DUMP as dependent upon BROKEN_ON_MMU
  (which clearly is not meant to be set). This symbol is not provided
  but with the allyesconfig it is set to yes which enables CRASH_DUMP.
  But KEXEC is coded as dependent upon MMU, and is set to no in
  arch/sh/mm/Kconfig, so KEXEC is not enabled.
  This refactor now couples KEXEC to CRASH_DUMP, so it is not
  possible to enable CRASH_DUMP without KEXEC.

While the above exceptions are not equivalent to their original,
the config file produced is valid (and in fact better wrt/ CRASH_DUMP
handling).

This patch (of 14)

The config options for kexec and crash features are consolidated
into new file kernel/Kconfig.kexec. Under the "General Setup" submenu
is a new submenu "Kexec and crash handling". All the kexec and
crash options that were once in the arch-dependent submenu "Processor
type and features" are now consolidated in the new submenu.

The following options are impacted:

 - KEXEC
 - KEXEC_FILE
 - KEXEC_SIG
 - KEXEC_SIG_FORCE
 - KEXEC_BZIMAGE_VERIFY_SIG
 - KEXEC_JUMP
 - CRASH_DUMP

The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP.

Architectures specify support of certain KEXEC and CRASH features with
similarly named new ARCH_SUPPORTS_<option> config options.

Architectures can utilize the new ARCH_SELECTS_<option> config
options to specify additional components when <option> is enabled.

To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
enabled, and the ARCH_SELECTS_<option> handles side effects (ie.
select statements).

Link: https://lkml.kernel.org/r/20230712161545.87870-1-eric.devolder@oracle.com
Link: https://lkml.kernel.org/r/20230712161545.87870-2-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Cc. "H. Peter Anvin" <hpa@zytor.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com> # for x86
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Juerg Haefliger <juerg.haefliger@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Marc Aurèle La France <tsi@tuyoix.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Xin Li <xin3.li@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoirqchip/al-fic: make AL_FIC depend on HAS_IOMEM
Baoquan He [Fri, 7 Jul 2023 13:58:50 +0000 (21:58 +0800)]
irqchip/al-fic: make AL_FIC depend on HAS_IOMEM

On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common than
PCI devices.  Hence it could have a fully functional s390 kernel with
CONFIG_PCI=n, then the relevant iomem mapping functions [including
ioremap(), devm_ioremap(), etc.] are not available.

Here let AL_FIC depend on HAS_IOMEM so that it won't be built
to cause below compiling error if PCI is unset:

------
ld: drivers/irqchip/irq-al-fic.o: in function `al_fic_init_dt':
irq-al-fic.c:(.init.text+0x76): undefined reference to `of_iomap'
ld: irq-al-fic.c:(.init.text+0x4ce): undefined reference to `iounmap'
------

Link: https://lkml.kernel.org/r/20230707135852.24292-7-bhe@redhat.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agonet: altera-tse: make ALTERA_TSE depend on HAS_IOMEM
Baoquan He [Fri, 7 Jul 2023 13:58:49 +0000 (21:58 +0800)]
net: altera-tse: make ALTERA_TSE depend on HAS_IOMEM

On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common than
PCI devices.  Hence it could have a fully functional s390 kernel with
CONFIG_PCI=n, then the relevant iomem mapping functions [including
ioremap(), devm_ioremap(), etc.] are not available.

Here let ALTERA_TSE depend on HAS_IOMEM so that it won't be built to cause
below compiling error if PCI is unset:

------
ERROR: modpost: "devm_ioremap" [drivers/net/ethernet/altera/altera_tse.ko] undefined!
------

Link: https://lkml.kernel.org/r/20230707135852.24292-6-bhe@redhat.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Joyce Ooi <joyce.ooi@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoipc/sem: use flexible array in 'struct sem_undo'
Christophe JAILLET [Sun, 9 Jul 2023 16:12:55 +0000 (18:12 +0200)]
ipc/sem: use flexible array in 'struct sem_undo'

Turn 'semadj' in 'struct sem_undo' into a flexible array.

The advantages are:
   - save the size of a pointer when the new undo structure is allocated
   - avoid some always ugly pointer arithmetic to get the address of semadj
   - avoid an indirection when the array is accessed

While at it, use struct_size() to compute the size of the new undo
structure.

Link: https://lkml.kernel.org/r/1ba993d443ad7e16ac2b1902adab1f05ebdfa454.1688918791.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoacct: replace all non-returning strlcpy with strscpy
Azeem Shaikh [Mon, 10 Jul 2023 01:17:48 +0000 (01:17 +0000)]
acct: replace all non-returning strlcpy with strscpy

strlcpy() reads the entire source buffer first.  This read may exceed the
destination size limit.  This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1].  In an effort
to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89

Link: https://lkml.kernel.org/r/20230710011748.3538624-1-azeemshaikh38@gmail.com
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agosignal: print comm and exe name on fatal signals
Vincent Whitchurch [Fri, 7 Jul 2023 09:29:36 +0000 (11:29 +0200)]
signal: print comm and exe name on fatal signals

Make the print-fatal-signals message more useful by printing the comm
and the exe name for the process which received the fatal signal:

Before:

 potentially unexpected fatal signal 4
 potentially unexpected fatal signal 11

After:

 buggy-program: pool: potentially unexpected fatal signal 4
 some-daemon: gdbus: potentially unexpected fatal signal 11

comm used to be present but was removed in commit 681a90ffe829b8ee25d
("arc, print-fatal-signals: reduce duplicated information") because it's
also included as part of the later stack trace.  Having the comm as part
of the main "unexpected fatal..." print is rather useful though when
analysing logs, and the exe name is also valuable as shown in the
examples above where the comm ends up having some generic name like
"pool".

[akpm@linux-foundation.org: don't include linux/file.h twice]
Link: https://lkml.kernel.org/r/20230707-fatal-comm-v1-1-400363905d5e@axis.com
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoarch/ia64/include: remove CONFIG_IA64_DEBUG_CMPXCHG from uapi header
Thomas Huth [Wed, 26 Apr 2023 06:50:32 +0000 (08:50 +0200)]
arch/ia64/include: remove CONFIG_IA64_DEBUG_CMPXCHG from uapi header

CONFIG_* switches should not be exposed in uapi headers.  The macros that
are defined here are also only useful for the kernel code, so let's move
them to asm/cmpxchg.h instead.

The only two files that are using these macros are the headers
arch/ia64/include/asm/bitops.h and arch/ia64/include/asm/atomic.h and
these include asm/cmpxchg.h via asm/intrinsics.h, so this movement should
not cause any trouble.

Link: https://lkml.kernel.org/r/20230426065032.517693-1-thuth@redhat.com
Signed-off-by: Thomas Huth <thuth@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agolib: replace kmap() with kmap_local_page()
Sumitra Sharma [Sat, 10 Jun 2023 17:57:12 +0000 (10:57 -0700)]
lib: replace kmap() with kmap_local_page()

kmap() has been deprecated in favor of the kmap_local_page() due to high
cost, restricted mapping space, the overhead of a global lock for
synchronization, and making the process sleep in the absence of free
slots.

kmap_local_page() is faster than kmap() and offers thread-local and
CPU-local mappings, take pagefaults in a local kmap region and preserves
preemption by saving the mappings of outgoing tasks and restoring those of
the incoming one during a context switch.

The mappings are kept thread local in the functions “dmirror_do_read”
and “dmirror_do_write” in test_hmm.c

Therefore, replace kmap() with kmap_local_page() and use
mempcy_from/to_page() to avoid open coding kmap_local_page() + memcpy() +
kunmap_local().

Remove the unused variable “tmp”.

Link: https://lkml.kernel.org/r/20230610175712.GA348514@sumitra.com
Signed-off-by: Sumitra Sharma <sumitraartsy@gmail.com>
Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Deepak R Varma <drv@mailo.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoproc: skip proc-empty-vm on anything but amd64 and i386
Alexey Dobriyan [Fri, 30 Jun 2023 18:34:34 +0000 (21:34 +0300)]
proc: skip proc-empty-vm on anything but amd64 and i386

This test is arch specific, requires "munmap everything" primitive.

Link: https://lkml.kernel.org/r/20230630183434.17434-2-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoproc: support proc-empty-vm test on i386
Alexey Dobriyan [Fri, 30 Jun 2023 18:34:33 +0000 (21:34 +0300)]
proc: support proc-empty-vm test on i386

Unmap everything starting from 4GB length until it unmaps, otherwise test
has to detect which virtual memory split kernel is using.

Link: https://lkml.kernel.org/r/20230630183434.17434-1-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agocred: convert printks to pr_<level>
tiozhang [Sun, 25 Jun 2023 03:34:52 +0000 (11:34 +0800)]
cred: convert printks to pr_<level>

Use current logging style.

Link: https://lkml.kernel.org/r/20230625033452.GA22858@didi-ThinkCentre-M930t-N000
Signed-off-by: tiozhang <tiozhang@didiglobal.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Paulo Alcantara <pc@cjr.nz>
Cc: Weiping Zhang <zwp10758@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 months agoLinux 6.5-rc4
Linus Torvalds [Sun, 30 Jul 2023 20:23:47 +0000 (13:23 -0700)]
Linux 6.5-rc4

11 months agoMerge tag 'spi-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Sun, 30 Jul 2023 19:54:31 +0000 (12:54 -0700)]
Merge tag 'spi-fix-v6.5-rc3' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A bunch of fixes for the Qualcomm QSPI driver, fixing multiple issues
  with the newly added DMA mode - it had a number of issues exposed when
  tested in a wider range of use cases, both race condition style issues
  and issues with different inputs to those that had been used in test"

* tag 'spi-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-qcom-qspi: Add mem_ops to avoid PIO for badly sized reads
  spi: spi-qcom-qspi: Fallback to PIO for xfers that aren't multiples of 4 bytes
  spi: spi-qcom-qspi: Add DMA_CHAIN_DONE to ALL_IRQS
  spi: spi-qcom-qspi: Call dma_wmb() after setting up descriptors
  spi: spi-qcom-qspi: Use GFP_ATOMIC flag while allocating for descriptor
  spi: spi-qcom-qspi: Ignore disabled interrupts' status in isr

11 months agoMerge tag 'regulator-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 30 Jul 2023 19:52:05 +0000 (12:52 -0700)]
Merge tag 'regulator-fix-v6.5-rc3' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A couple of small fixes for the the mt6358 driver, fixing error
  reporting and a bootstrapping issue"

* tag 'regulator-fix-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: mt6358: Fix incorrect VCN33 sync error message
  regulator: mt6358: Sync VCN33_* enable status after checking ID

11 months agoMerge tag 'usb-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 30 Jul 2023 18:57:51 +0000 (11:57 -0700)]
Merge tag 'usb-6.5-rc4' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a set of USB driver fixes for 6.5-rc4. Include in here are:

   - new USB serial device ids

   - dwc3 driver fixes for reported issues

   - typec driver fixes for reported problems

   - gadget driver fixes

   - reverts of some problematic USB changes that went into -rc1

  All of these have been in linux-next with no reported problems"

* tag 'usb-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits)
  usb: misc: ehset: fix wrong if condition
  usb: dwc3: pci: skip BYT GPIO lookup table for hardwired phy
  usb: cdns3: fix incorrect calculation of ep_buf_size when more than one config
  usb: gadget: call usb_gadget_check_config() to verify UDC capability
  usb: typec: Use sysfs_emit_at when concatenating the string
  usb: typec: Iterate pds array when showing the pd list
  usb: typec: Set port->pd before adding device for typec_port
  usb: typec: qcom: fix return value check in qcom_pmic_typec_probe()
  Revert "usb: gadget: tegra-xudc: Fix error check in tegra_xudc_powerdomain_init()"
  Revert "usb: xhci: tegra: Fix error check"
  USB: gadget: Fix the memory leak in raw_gadget driver
  usb: gadget: core: remove unbalanced mutex_unlock in usb_gadget_activate
  Revert "usb: dwc3: core: Enable AutoRetry feature in the controller"
  Revert "xhci: add quirk for host controllers that don't update endpoint DCS"
  USB: quirks: add quirk for Focusrite Scarlett
  usb: xhci-mtk: set the dma max_seg_size
  MAINTAINERS: drop invalid usb/cdns3 Reviewer e-mail
  usb: dwc3: don't reset device side if dwc3 was configured as host-only
  usb: typec: ucsi: move typec_set_mode(TYPEC_STATE_SAFE) to ucsi_unregister_partner()
  usb: ohci-at91: Fix the unhandle interrupt when resume
  ...

11 months agoMerge tag 'tty-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 30 Jul 2023 18:51:36 +0000 (11:51 -0700)]
Merge tag 'tty-6.5-rc4' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small TTY and serial driver fixes for 6.5-rc4 for some
  reported problems. Included in here is:

   - TIOCSTI fix for braille readers

   - documentation fix for minor numbers

   - MAINTAINERS update for new serial files in -rc1

   - minor serial driver fixes for reported problems

  All of these have been in linux-next with no reported problems"

* tag 'tty-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250_dw: Preserve original value of DLF register
  tty: serial: sh-sci: Fix sleeping in atomic context
  serial: sifive: Fix sifive_serial_console_setup() section
  Documentation: devices.txt: reconcile serial/ucc_uart minor numers
  MAINTAINERS: Update TTY layer for lists and recently added files
  tty: n_gsm: fix UAF in gsm_cleanup_mux
  TIOCSTI: always enable for CAP_SYS_ADMIN

11 months agoMerge tag 'staging-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 30 Jul 2023 18:47:56 +0000 (11:47 -0700)]
Merge tag 'staging-6.5-rc4' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are three small staging driver fixes for 6.5-rc4 that resolve
  some reported problems. These fixes are:

   - fix for an old bug in the r8712 driver

   - fbtft driver fix for a spi device

   - potential overflow fix in the ks7010 driver

  All of these have been in linux-next with no reported problems"

* tag 'staging-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext()
  staging: fbtft: ili9341: use macro FBTFT_REGISTER_SPI_DRIVER
  staging: r8712: Fix memory leak in _r8712_init_xmit_priv()

11 months agoMerge tag 'char-misc-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 30 Jul 2023 18:44:00 +0000 (11:44 -0700)]
Merge tag 'char-misc-6.5-rc4' of git://git./linux/kernel/git/gregkh/char-misc

Pull char driver and Documentation fixes from Greg KH:
 "Here is a char driver fix and some documentation updates for 6.5-rc4
  that contain the following changes:

   - sram/genalloc bugfix for reported problem

   - security-bugs.rst update based on recent discussions

   - embargoed-hardware-issues minor cleanups and then partial revert
     for the project/company lists

  All of these have been in linux-next for a while with no reported
  problems, and the documentation updates have all been reviewed by the
  relevant developers"

* tag 'char-misc-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc/genalloc: Name subpools by of_node_full_name()
  Documentation: embargoed-hardware-issues.rst: add AMD to the list
  Documentation: embargoed-hardware-issues.rst: clean out empty and unused entries
  Documentation: security-bugs.rst: clarify CVE handling
  Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group

11 months agoMerge tag 'probes-fixes-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 30 Jul 2023 18:27:22 +0000 (11:27 -0700)]
Merge tag 'probes-fixes-v6.5-rc3' of git://git./linux/kernel/git/trace/linux-trace

Pull probe fixes from Masami Hiramatsu:

 - probe-events: add NULL check for some BTF API calls which can return
   error code and NULL.

 - ftrace selftests: check fprobe and kprobe event correctly. This fixes
   a miss condition of the test command.

 - kprobes: do not allow probing functions that start with "__cfi_" or
   "__pfx_" since those are auto generated for kernel CFI and not
   executed.

* tag 'probes-fixes-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  kprobes: Prohibit probing on CFI preamble symbol
  selftests/ftrace: Fix to check fprobe event eneblement
  tracing/probes: Fix to add NULL check for BTF APIs

11 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 30 Jul 2023 18:19:08 +0000 (11:19 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "x86:

   - Do not register IRQ bypass consumer if posted interrupts not
     supported

   - Fix missed device interrupt due to non-atomic update of IRR

   - Use GFP_KERNEL_ACCOUNT for pid_table in ipiv

   - Make VMREAD error path play nice with noinstr

   - x86: Acquire SRCU read lock when handling fastpath MSR writes

   - Support linking rseq tests statically against glibc 2.35+

   - Fix reference count for stats file descriptors

   - Detect userspace setting invalid CR0

  Non-KVM:

   - Remove coccinelle script that has caused multiple confusion
     ("debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE()
     usage", acked by Greg)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: selftests: Expand x86's sregs test to cover illegal CR0 values
  KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest
  KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid
  Revert "debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage"
  KVM: selftests: Verify stats fd is usable after VM fd has been closed
  KVM: selftests: Verify stats fd can be dup()'d and read
  KVM: selftests: Verify userspace can create "redundant" binary stats files
  KVM: selftests: Explicitly free vcpus array in binary stats test
  KVM: selftests: Clean up stats fd in common stats_test() helper
  KVM: selftests: Use pread() to read binary stats header
  KVM: Grab a reference to KVM for VM and vCPU stats file descriptors
  selftests/rseq: Play nice with binaries statically linked against glibc 2.35+
  Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"
  KVM: x86: Acquire SRCU read lock when handling fastpath MSR writes
  KVM: VMX: Use vmread_error() to report VM-Fail in "goto" path
  KVM: VMX: Make VMREAD error path play nice with noinstr
  KVM: x86/irq: Conditionally register IRQ bypass consumer again
  KVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipiv
  KVM: x86: check the kvm_cpu_get_interrupt result before using it
  KVM: x86: VMX: set irr_pending in kvm_apic_update_irr
  ...

11 months agoMerge tag 'locking_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jul 2023 18:12:32 +0000 (11:12 -0700)]
Merge tag 'locking_urgent_for_v6.5_rc4' of git://git./linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:

 - Fix a rtmutex race condition resulting from sharing of the sort key
   between the lock waiters and the PI chain tree (->pi_waiters) of a
   task by giving each tree their own sort key

* tag 'locking_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Fix task->pi_waiters integrity

11 months agoMerge tag 'x86_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jul 2023 18:05:35 +0000 (11:05 -0700)]
Merge tag 'x86_urgent_for_v6.5_rc4' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - AMD's automatic IBRS doesn't enable cross-thread branch target
   injection protection (STIBP) for user processes. Enable STIBP on such
   systems.

 - Do not delete (but put the ref instead) of AMD MCE error thresholding
   sysfs kobjects when destroying them in order not to delete the kernfs
   pointer prematurely

 - Restore annotation in ret_from_fork_asm() in order to fix kthread
   stack unwinding from being marked as unreliable and thus breaking
   livepatching

* tag 'x86_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
  x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks
  x86: Fix kthread unwind

11 months agoMerge tag 'irq_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 Jul 2023 17:59:19 +0000 (10:59 -0700)]
Merge tag 'irq_urgent_for_v6.5_rc4' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Work around an erratum on GIC700, where a race between a CPU handling
   a wake-up interrupt, a change of affinity, and another CPU going to
   sleep can result in a lack of wake-up event on the next interrupt

 - Fix the locking required on a VPE for GICv4

 - Enable Rockchip 3588001 erratum workaround for RK3588S

 - Fix the irq-bcm6345-l1 assumtions of the boot CPU always be the first
   CPU in the system

* tag 'irq_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Workaround for GIC-700 erratum 2941627
  irqchip/gic-v3: Enable Rockchip 3588001 erratum workaround for RK3588S
  irqchip/gic-v4.1: Properly lock VPEs when doing a directLPI invalidation
  irq-bcm6345-l1: Do not assume a fixed block to cpu mapping

11 months agoMerge tag '6.5-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 30 Jul 2023 03:49:13 +0000 (20:49 -0700)]
Merge tag '6.5-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Four small SMB3 client fixes:

   - two reconnect fixes (to address the case where non-default
     iocharset gets incorrectly overridden at reconnect with the
     default charset)

   - fix for NTLMSSP_AUTH request setting a flag incorrectly)

   - Add missing check for invalid tlink (tree connection) in ioctl"

* tag '6.5-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: add missing return value check for cifs_sb_tlink
  smb3: do not set NTLMSSP_VERSION flag for negotiate not auth request
  cifs: fix charset issue in reconnection
  fs/nls: make load_nls() take a const parameter

11 months agoMerge tag 'trace-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Sun, 30 Jul 2023 03:40:43 +0000 (20:40 -0700)]
Merge tag 'trace-v6.5-rc3' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix to /sys/kernel/tracing/per_cpu/cpu*/stats read and entries.

   If a resize shrinks the buffer it clears the read count to notify
   readers that they need to reset. But the read count is also used for
   accounting and this causes the numbers to be off. Instead, create a
   separate variable to use to notify readers to reset.

 - Fix the ref counts of the "soft disable" mode. The wrong value was
   used for testing if soft disable mode should be enabled or disable,
   but instead, just change the logic to do the enable and disable in
   place when the SOFT_MODE is set or cleared.

 - Several kernel-doc fixes

 - Removal of unused external declarations

* tag 'trace-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix warning in trace_buffered_event_disable()
  ftrace: Remove unused extern declarations
  tracing: Fix kernel-doc warnings in trace_seq.c
  tracing: Fix kernel-doc warnings in trace_events_trigger.c
  tracing/synthetic: Fix kernel-doc warnings in trace_events_synth.c
  ring-buffer: Fix kernel-doc warnings in ring_buffer.c
  ring-buffer: Fix wrong stat of cpu_buffer->read

11 months agoarch/*/configs/*defconfig: Replace AUTOFS4_FS by AUTOFS_FS
Sven Joachim [Thu, 27 Jul 2023 20:00:41 +0000 (22:00 +0200)]
arch/*/configs/*defconfig: Replace AUTOFS4_FS by AUTOFS_FS

Commit a2225d931f75 ("autofs: remove left-over autofs4 stubs")
promised the removal of the fs/autofs/Kconfig fragment for AUTOFS4_FS
within a couple of releases, but five years later this still has not
happened yet, and AUTOFS4_FS is still enabled in 63 defconfigs.

Get rid of it mechanically:

   git grep -l CONFIG_AUTOFS4_FS -- '*defconfig' |
       xargs sed -i 's/AUTOFS4_FS/AUTOFS_FS/'

Also just remove the AUTOFS4_FS config option stub.  Anybody who hasn't
regenerated their config file in the last five years will need to just
get the new name right when they do.

Signed-off-by: Sven Joachim <svenjoac@gmx.de>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 months agoMerge tag 'loongarch-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 29 Jul 2023 15:59:25 +0000 (08:59 -0700)]
Merge tag 'loongarch-fixes-6.5-1' of git://git./linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Some bug fixes for build system, builtin cmdline handling, bpf and
  {copy, clear}_user, together with a trivial cleanup"

* tag 'loongarch-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Cleanup __builtin_constant_p() checking for cpu_has_*
  LoongArch: BPF: Fix check condition to call lu32id in move_imm()
  LoongArch: BPF: Enable bpf_probe_read{, str}() on LoongArch
  LoongArch: Fix return value underflow in exception path
  LoongArch: Fix CMDLINE_EXTEND and CMDLINE_BOOTLOADER handling
  LoongArch: Fix module relocation error with binutils 2.41
  LoongArch: Only fiddle with CHECKFLAGS if `need-compiler'

11 months agoKVM: selftests: Expand x86's sregs test to cover illegal CR0 values
Sean Christopherson [Tue, 13 Jun 2023 20:30:37 +0000 (13:30 -0700)]
KVM: selftests: Expand x86's sregs test to cover illegal CR0 values

Add coverage to x86's set_sregs_test to verify KVM rejects vendor-agnostic
illegal CR0 values, i.e. CR0 values whose legality doesn't depend on the
current VMX mode.  KVM historically has neglected to reject bad CR0s from
userspace, i.e. would happily accept a completely bogus CR0 via
KVM_SET_SREGS{2}.

Punt VMX specific subtests to future work, as they would require quite a
bit more effort, and KVM gets coverage for CR0 checks in general through
other means, e.g. KVM-Unit-Tests.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest
Sean Christopherson [Tue, 13 Jun 2023 20:30:36 +0000 (13:30 -0700)]
KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest

Stuff CR0 and/or CR4 to be compliant with a restricted guest if and only
if KVM itself is not configured to utilize unrestricted guests, i.e. don't
stuff CR0/CR4 for a restricted L2 that is running as the guest of an
unrestricted L1.  Any attempt to VM-Enter a restricted guest with invalid
CR0/CR4 values should fail, i.e. in a nested scenario, KVM (as L0) should
never observe a restricted L2 with incompatible CR0/CR4, since nested
VM-Enter from L1 should have failed.

And if KVM does observe an active, restricted L2 with incompatible state,
e.g. due to a KVM bug, fudging CR0/CR4 instead of letting VM-Enter fail
does more harm than good, as KVM will often neglect to undo the side
effects, e.g. won't clear rmode.vm86_active on nested VM-Exit, and thus
the damage can easily spill over to L1.  On the other hand, letting
VM-Enter fail due to bad guest state is more likely to contain the damage
to L2 as KVM relies on hardware to perform most guest state consistency
checks, i.e. KVM needs to be able to reflect a failed nested VM-Enter into
L1 irrespective of (un)restricted guest behavior.

Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Fixes: bddd82d19e2e ("KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid
Sean Christopherson [Tue, 13 Jun 2023 20:30:35 +0000 (13:30 -0700)]
KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid

Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid,
e.g. due to setting bits 63:32, illegal combinations, or to a value that
isn't allowed in VMX (non-)root mode.  The VMX checks in particular are
"fun" as failure to disallow Real Mode for an L2 that is configured with
unrestricted guest disabled, when KVM itself has unrestricted guest
enabled, will result in KVM forcing VM86 mode to virtual Real Mode for
L2, but then fail to unwind the related metadata when synthesizing a
nested VM-Exit back to L1 (which has unrestricted guest enabled).

Opportunistically fix a benign typo in the prototype for is_valid_cr4().

Cc: stable@vger.kernel.org
Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoRevert "debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage"
Sean Christopherson [Wed, 26 Jul 2023 20:29:20 +0000 (13:29 -0700)]
Revert "debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage"

Remove coccinelle's recommendation to use DEFINE_DEBUGFS_ATTRIBUTE()
instead of DEFINE_SIMPLE_ATTRIBUTE().  Regardless of whether or not the
"significant overhead" incurred by debugfs_create_file() is actually
meaningful, warnings from the script have led to a rash of low-quality
patches that have sowed confusion and consumed maintainer time for little
to no benefit.  There have been no less than four attempts to "fix" KVM,
and a quick search on lore shows that KVM is not alone.

This reverts commit 5103068eaca290f890a30aae70085fac44cecaf6.

Link: https://lore.kernel.org/all/87tu2nbnz3.fsf@mpe.ellerman.id.au
Link: https://lore.kernel.org/all/c0b98151-16b6-6d8f-1765-0f7d46682d60@redhat.com
Link: https://lkml.kernel.org/r/20230706072954.4881-1-duminjie%40vivo.com
Link: https://lore.kernel.org/all/Y2FsbufV00jbyF0B@google.com
Link: https://lore.kernel.org/all/Y2ENJJ1YiSg5oHiy@orome
Link: https://lore.kernel.org/all/7560b350e7b23786ce712118a9a504356ff1cca4.camel@kernel.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230726202920.507756-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: selftests: Verify stats fd is usable after VM fd has been closed
Sean Christopherson [Tue, 11 Jul 2023 23:01:31 +0000 (16:01 -0700)]
KVM: selftests: Verify stats fd is usable after VM fd has been closed

Verify that VM and vCPU binary stats files are usable even after userspace
has put its last direct reference to the VM.  This is a regression test
for a UAF bug where KVM didn't gift the stats files a reference to the VM.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230711230131.648752-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: selftests: Verify stats fd can be dup()'d and read
Sean Christopherson [Tue, 11 Jul 2023 23:01:30 +0000 (16:01 -0700)]
KVM: selftests: Verify stats fd can be dup()'d and read

Expand the binary stats test to verify that a stats fd can be dup()'d
and read, to (very) roughly simulate userspace passing around the file.
Adding the dup() test is primarily an intermediate step towards verifying
that userspace can read VM/vCPU stats before _and_ after userspace closes
its copy of the VM fd; the dup() test itself is only mildly interesting.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230711230131.648752-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: selftests: Verify userspace can create "redundant" binary stats files
Sean Christopherson [Tue, 11 Jul 2023 23:01:29 +0000 (16:01 -0700)]
KVM: selftests: Verify userspace can create "redundant" binary stats files

Verify that KVM doesn't artificially limit KVM_GET_STATS_FD to a single
file per VM/vCPU.  There's no known use case for getting multiple stats
fds, but it should work, and more importantly creating multiple files will
make it easier to test that KVM correct manages VM refcounts for stats
files.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230711230131.648752-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: selftests: Explicitly free vcpus array in binary stats test
Sean Christopherson [Tue, 11 Jul 2023 23:01:28 +0000 (16:01 -0700)]
KVM: selftests: Explicitly free vcpus array in binary stats test

Explicitly free the all-encompassing vcpus array in the binary stats test
so that the test is consistent with respect to freeing all dynamically
allocated resources (versus letting them be freed on exit).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230711230131.648752-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: selftests: Clean up stats fd in common stats_test() helper
Sean Christopherson [Tue, 11 Jul 2023 23:01:27 +0000 (16:01 -0700)]
KVM: selftests: Clean up stats fd in common stats_test() helper

Move the stats fd cleanup code into stats_test() and drop the
superfluous vm_stats_test() and vcpu_stats_test() helpers in order to
decouple creation of the stats file from consuming/testing the file
(deduping code is a bonus).  This will make it easier to test various
edge cases related to stats, e.g. that userspace can dup() a stats fd,
that userspace can have multiple stats files for a singleVM/vCPU, etc.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230711230131.648752-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: selftests: Use pread() to read binary stats header
Sean Christopherson [Tue, 11 Jul 2023 23:01:26 +0000 (16:01 -0700)]
KVM: selftests: Use pread() to read binary stats header

Use pread() with an explicit offset when reading the header and the header
name for a binary stats fd so that the common helper and the binary stats
test don't subtly rely on the file effectively being untouched, e.g. to
allow multiple reads of the header, name, etc.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230711230131.648752-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: Grab a reference to KVM for VM and vCPU stats file descriptors
Sean Christopherson [Tue, 11 Jul 2023 23:01:25 +0000 (16:01 -0700)]
KVM: Grab a reference to KVM for VM and vCPU stats file descriptors

Grab a reference to KVM prior to installing VM and vCPU stats file
descriptors to ensure the underlying VM and vCPU objects are not freed
until the last reference to any and all stats fds are dropped.

Note, the stats paths manually invoke fd_install() and so don't need to
grab a reference before creating the file.

Fixes: ce55c049459c ("KVM: stats: Support binary stats retrieval for a VCPU")
Fixes: fcfe1baeddbf ("KVM: stats: Support binary stats retrieval for a VM")
Reported-by: Zheng Zhang <zheng.zhang@email.ucr.edu>
Closes: https://lore.kernel.org/all/CAC_GQSr3xzZaeZt85k_RCBd5kfiOve8qXo7a81Cq53LuVQ5r=Q@mail.gmail.com
Cc: stable@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Message-Id: <20230711230131.648752-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoselftests/rseq: Play nice with binaries statically linked against glibc 2.35+
Sean Christopherson [Fri, 21 Jul 2023 22:33:52 +0000 (15:33 -0700)]
selftests/rseq: Play nice with binaries statically linked against glibc 2.35+

To allow running rseq and KVM's rseq selftests as statically linked
binaries, initialize the various "trampoline" pointers to point directly
at the expect glibc symbols, and skip the dlysm() lookups if the rseq
size is non-zero, i.e. the binary is statically linked *and* the libc
registered its own rseq.

Define weak versions of the symbols so as not to break linking against
libc versions that don't support rseq in any capacity.

The KVM selftests in particular are often statically linked so that they
can be run on targets with very limited runtime environments, i.e. test
machines.

Fixes: 233e667e1ae3 ("selftests/rseq: Uplift rseq selftests for compatibility with glibc-2.35")
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721223352.2333911-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoRevert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"
Sean Christopherson [Fri, 21 Jul 2023 22:43:37 +0000 (15:43 -0700)]
Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"

Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows
dereferencing memslots during WRMSR emulation, drop the requirement that
"next RIP" is valid.  In hindsight, acquiring kvm->srcu would have been a
better fix than avoiding the pastpath, but at the time it was thought that
accessing SRCU-protected data in the fastpath was a one-off edge case.

This reverts commit 5c30e8101e8d5d020b1d7119117889756a6ed713.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721224337.2335137-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: x86: Acquire SRCU read lock when handling fastpath MSR writes
Sean Christopherson [Fri, 21 Jul 2023 22:43:36 +0000 (15:43 -0700)]
KVM: x86: Acquire SRCU read lock when handling fastpath MSR writes

Temporarily acquire kvm->srcu for read when potentially emulating WRMSR in
the VM-Exit fastpath handler, as several of the common helpers used during
emulation expect the caller to provide SRCU protection.  E.g. if the guest
is counting instructions retired, KVM will query the PMU event filter when
stepping over the WRMSR.

  dump_stack+0x85/0xdf
  lockdep_rcu_suspicious+0x109/0x120
  pmc_event_is_allowed+0x165/0x170
  kvm_pmu_trigger_event+0xa5/0x190
  handle_fastpath_set_msr_irqoff+0xca/0x1e0
  svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
  vcpu_enter_guest+0x2108/0x2580

Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this
isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM:
SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid").  Providing
protection for the entirety of WRMSR emulation will allow reverting the
aforementioned commit, and will avoid having to play whack-a-mole when new
uses of SRCU-protected structures are inevitably added in common emulation
helpers.

Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
Reported-by: Greg Thelen <gthelen@google.com>
Reported-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721224337.2335137-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: VMX: Use vmread_error() to report VM-Fail in "goto" path
Sean Christopherson [Fri, 21 Jul 2023 23:56:37 +0000 (16:56 -0700)]
KVM: VMX: Use vmread_error() to report VM-Fail in "goto" path

Use vmread_error() to report VM-Fail on VMREAD for the "asm goto" case,
now that trampoline case has yet another wrapper around vmread_error() to
play nice with instrumentation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721235637.2345403-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: VMX: Make VMREAD error path play nice with noinstr
Sean Christopherson [Fri, 21 Jul 2023 23:56:36 +0000 (16:56 -0700)]
KVM: VMX: Make VMREAD error path play nice with noinstr

Mark vmread_error_trampoline() as noinstr, and add a second trampoline
for the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=n case to enable instrumentation
when handling VM-Fail on VMREAD.  VMREAD is used in various noinstr
flows, e.g. immediately after VM-Exit, and objtool rightly complains that
the call to the error trampoline leaves a no-instrumentation section
without annotating that it's safe to do so.

  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0xc9:
  call to vmread_error_trampoline() leaves .noinstr.text section

Note, strictly speaking, enabling instrumentation in the VM-Fail path
isn't exactly safe, but if VMREAD fails the kernel/system is likely hosed
anyways, and logging that there is a fatal error is more important than
*maybe* encountering slightly unsafe instrumentation.

Reported-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721235637.2345403-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: x86/irq: Conditionally register IRQ bypass consumer again
Like Xu [Mon, 24 Jul 2023 11:12:36 +0000 (19:12 +0800)]
KVM: x86/irq: Conditionally register IRQ bypass consumer again

As was attempted commit 14717e203186 ("kvm: Conditionally register IRQ
bypass consumer"): "if we don't support a mechanism for bypassing IRQs,
don't register as a consumer.  Initially this applied to AMD processors,
but when AVIC support was implemented for assigned devices,
kvm_arch_has_irq_bypass() was always returning true.

We can still skip registering the consumer where enable_apicv
or posted-interrupts capability is unsupported or globally disabled.
This eliminates meaningless dev_info()s when the connect fails
between producer and consumer", such as on Linux hosts where enable_apicv
or posted-interrupts capability is unsupported or globally disabled.

Cc: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Yong He <alexyonghe@tencent.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217379
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20230724111236.76570-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipiv
Peng Hao [Fri, 28 Jul 2023 06:49:48 +0000 (14:49 +0800)]
KVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipiv

The pid_table of ipiv is the persistent memory allocated by
per-vcpu, which should be counted into the memory cgroup.

Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Message-Id: <CAPm50aLxCQ3TQP2Lhc0PX3y00iTRg+mniLBqNDOC=t9CLxMwwA@mail.gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: x86: check the kvm_cpu_get_interrupt result before using it
Maxim Levitsky [Wed, 26 Jul 2023 13:59:45 +0000 (16:59 +0300)]
KVM: x86: check the kvm_cpu_get_interrupt result before using it

The code was blindly assuming that kvm_cpu_get_interrupt never returns -1
when there is a pending interrupt.

While this should be true, a bug in KVM can still cause this.

If -1 is returned, the code before this patch was converting it to 0xFF,
and 0xFF interrupt was injected to the guest, which results in an issue
which was hard to debug.

Add WARN_ON_ONCE to catch this case and skip the injection
if this happens again.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230726135945.260841-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: x86: VMX: set irr_pending in kvm_apic_update_irr
Maxim Levitsky [Wed, 26 Jul 2023 13:59:44 +0000 (16:59 +0300)]
KVM: x86: VMX: set irr_pending in kvm_apic_update_irr

When the APICv is inhibited, the irr_pending optimization is used.

Therefore, when kvm_apic_update_irr sets bits in the IRR,
it must set irr_pending to true as well.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230726135945.260841-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agoKVM: x86: VMX: __kvm_apic_update_irr must update the IRR atomically
Maxim Levitsky [Wed, 26 Jul 2023 13:59:43 +0000 (16:59 +0300)]
KVM: x86: VMX: __kvm_apic_update_irr must update the IRR atomically

If APICv is inhibited, then IPIs from peer vCPUs are done by
atomically setting bits in IRR.

This means, that when __kvm_apic_update_irr copies PIR to IRR,
it has to modify IRR atomically as well.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230726135945.260841-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
11 months agokprobes: Prohibit probing on CFI preamble symbol
Masami Hiramatsu (Google) [Tue, 11 Jul 2023 01:50:47 +0000 (10:50 +0900)]
kprobes: Prohibit probing on CFI preamble symbol

Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those
are used for CFI and not executed. Probing it will break the CFI.

Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
11 months agoMerge tag 'ata-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Sat, 29 Jul 2023 01:31:18 +0000 (18:31 -0700)]
Merge tag 'ata-6.5-rc4' of git://git./linux/kernel/git/dlemoal/libata

Pull ata fixes from Damien Le Moal:

 - Fix error message output in the pata_arasan_cf driver (Minjie)

 - Fix invalid error return in the pata_octeon_cf driver initialization
   (Yingliang)

 - Fix a compilation warning due to a missing static function
   declaration in the pata_ns87415 driver (Arnd)

 - Fix the condition evaluating when to fetch sense data for successful
   completions, which should be done only when command duration limits
   are being used (Niklas)

* tag 'ata-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-core: fix when to fetch sense data for successful commands
  ata: pata_ns87415: mark ns87560_tf_read static
  ata: pata_octeon_cf: fix error return code in octeon_cf_probe()
  ata: pata_arasan_cf: Use dev_err_probe() instead dev_err() in data_xfer()

11 months agotracing: Fix warning in trace_buffered_event_disable()
Zheng Yejian [Wed, 26 Jul 2023 09:58:04 +0000 (17:58 +0800)]
tracing: Fix warning in trace_buffered_event_disable()

Warning happened in trace_buffered_event_disable() at
  WARN_ON_ONCE(!trace_buffered_event_ref)

  Call Trace:
   ? __warn+0xa5/0x1b0
   ? trace_buffered_event_disable+0x189/0x1b0
   __ftrace_event_enable_disable+0x19e/0x3e0
   free_probe_data+0x3b/0xa0
   unregister_ftrace_function_probe_func+0x6b8/0x800
   event_enable_func+0x2f0/0x3d0
   ftrace_process_regex.isra.0+0x12d/0x1b0
   ftrace_filter_write+0xe6/0x140
   vfs_write+0x1c9/0x6f0
   [...]

The cause of the warning is in __ftrace_event_enable_disable(),
trace_buffered_event_enable() was called once while
trace_buffered_event_disable() was called twice.
Reproduction script show as below, for analysis, see the comments:
 ```
 #!/bin/bash

 cd /sys/kernel/tracing/

 # 1. Register a 'disable_event' command, then:
 #    1) SOFT_DISABLED_BIT was set;
 #    2) trace_buffered_event_enable() was called first time;
 echo 'cmdline_proc_show:disable_event:initcall:initcall_finish' > \
     set_ftrace_filter

 # 2. Enable the event registered, then:
 #    1) SOFT_DISABLED_BIT was cleared;
 #    2) trace_buffered_event_disable() was called first time;
 echo 1 > events/initcall/initcall_finish/enable

 # 3. Try to call into cmdline_proc_show(), then SOFT_DISABLED_BIT was
 #    set again!!!
 cat /proc/cmdline

 # 4. Unregister the 'disable_event' command, then:
 #    1) SOFT_DISABLED_BIT was cleared again;
 #    2) trace_buffered_event_disable() was called second time!!!
 echo '!cmdline_proc_show:disable_event:initcall:initcall_finish' > \
     set_ftrace_filter
 ```

To fix it, IIUC, we can change to call trace_buffered_event_enable() at
fist time soft-mode enabled, and call trace_buffered_event_disable() at
last time soft-mode disabled.

Link: https://lore.kernel.org/linux-trace-kernel/20230726095804.920457-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes: 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>