SeongJae Park [Sat, 25 Dec 2021 05:12:54 +0000 (21:12 -0800)]
mm/damon/dbgfs: protect targets destructions with kdamond_lock
DAMON debugfs interface iterates current monitoring targets in
'dbgfs_target_ids_read()' while holding the corresponding
'kdamond_lock'. However, it also destructs the monitoring targets in
'dbgfs_before_terminate()' without holding the lock. This can result in
a use_after_free bug. This commit avoids the race by protecting the
destruction with the corresponding 'kdamond_lock'.
Link: https://lkml.kernel.org/r/20211221094447.2241-1-sj@kernel.org
Reported-by: Sangwoo Bae <sangwoob@amazon.com>
Fixes:
4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [5.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thibaut Sautereau [Sat, 25 Dec 2021 05:12:51 +0000 (21:12 -0800)]
mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid
The second parameter of alloc_pages_exact_nid is the one indicating the
size of memory pointed by the returned pointer.
Link: https://lkml.kernel.org/r/YbjEgwhn4bGblp//@coeus
Fixes:
abd58f38dfb4 ("mm/page_alloc: add __alloc_size attributes for better bounds checking")
Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Levente Polyak <levente@leventepolyak.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 25 Dec 2021 05:12:48 +0000 (21:12 -0800)]
mm: delete unsafe BUG from page_cache_add_speculative()
It is not easily reproducible, but on 5.16-rc I have several times hit
the VM_BUG_ON_PAGE(PageTail(page), page) in
page_cache_add_speculative(): usually from filemap_get_read_batch() for
an ext4 read, yesterday from next_uptodate_page() from
filemap_map_pages() for a shmem fault.
That BUG used to be placed where page_ref_add_unless() had succeeded,
but now it is placed before folio_ref_add_unless() is attempted: that is
not safe, since it is only the acquired reference which makes the page
safe from racing THP collapse or split.
We could keep the BUG, checking PageTail only when
folio_ref_try_add_rcu() has succeeded; but I don't think it adds much
value - just delete it.
Link: https://lkml.kernel.org/r/8b98fc6f-3439-8614-c3f3-945c659a1aba@google.com
Fixes:
020853b6f5ea ("mm: Add folio_try_get_rcu()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Sat, 25 Dec 2021 05:12:45 +0000 (21:12 -0800)]
mm, hwpoison: fix condition in free hugetlb page path
When a memory error hits a tail page of a free hugepage,
__page_handle_poison() is expected to be called to isolate the error in
4kB unit, but it's not called due to the outdated if-condition in
memory_failure_hugetlb(). This loses the chance to isolate the error in
the finer unit, so it's not optimal. Drop the condition.
This "(p != head && TestSetPageHWPoison(head)" condition is based on the
old semantics of PageHWPoison on hugepage (where PG_hwpoison flag was
set on the subpage), so it's not necessray any more. By getting to set
PG_hwpoison on head page for hugepages, concurrent error events on
different subpages in a single hugepage can be prevented by
TestSetPageHWPoison(head) at the beginning of memory_failure_hugetlb().
So dropping the condition should not reopen the race window originally
mentioned in commit
b985194c8c0a ("hwpoison, hugetlb:
lock_page/unlock_page does not match for handling a free hugepage")
[naoya.horiguchi@linux.dev: fix "HardwareCorrupted" counter]
Link: https://lkml.kernel.org/r/20211220084851.GA1460264@u2004
Link: https://lkml.kernel.org/r/20211210110208.879740-1-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Fei Luo <luofei@unicloud.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org> [5.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Sat, 25 Dec 2021 05:12:42 +0000 (21:12 -0800)]
MAINTAINERS: mark more list instances as moderated
Some lists that are moderated are not marked as moderated consistently,
so mark them all as moderated.
Link: https://lkml.kernel.org/r/20211209001330.18558-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Conor Culhane <conor.culhane@silvaco.com>
Cc: Ryder Lee <ryder.lee@mediatek.com>
Cc: Jianjun Wang <jianjun.wang@mediatek.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Philipp Rudo [Sat, 25 Dec 2021 05:12:39 +0000 (21:12 -0800)]
kernel/crash_core: suppress unknown crashkernel parameter warning
When booting with crashkernel= on the kernel command line a warning
similar to
Kernel command line: ro console=ttyS0 crashkernel=256M
Unknown kernel command line parameters "crashkernel=256M", will be passed to user space.
is printed.
This comes from crashkernel= being parsed independent from the kernel
parameter handling mechanism. So the code in init/main.c doesn't know
that crashkernel= is a valid kernel parameter and prints this incorrect
warning.
Suppress the warning by adding a dummy early_param handler for
crashkernel=.
Link: https://lkml.kernel.org/r/20211208133443.6867-1-prudo@redhat.com
Fixes:
86d1919a4fb0 ("init: print out unknown kernel parameters")
Signed-off-by: Philipp Rudo <prudo@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Sat, 25 Dec 2021 05:12:35 +0000 (21:12 -0800)]
mm: mempolicy: fix THP allocations escaping mempolicy restrictions
alloc_pages_vma() may try to allocate THP page on the local NUMA node
first:
page = __alloc_pages_node(hpage_node,
gfp | __GFP_THISNODE | __GFP_NORETRY, order);
And if the allocation fails it retries allowing remote memory:
if (!page && (gfp & __GFP_DIRECT_RECLAIM))
page = __alloc_pages_node(hpage_node,
gfp, order);
However, this retry allocation completely ignores memory policy nodemask
allowing allocation to escape restrictions.
The first appearance of this bug seems to be the commit
ac5b2c18911f
("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").
The bug disappeared later in the commit
89c83fb539f9 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and
reappeared again in slightly different form in the commit
76e654cc91bb
("mm, page_alloc: allow hugepage fallback to remote nodes when
madvised")
Fix this by passing correct nodemask to the __alloc_pages() call.
The demonstration/reproducer of the problem:
$ mount -oremount,size=4G,huge=always /dev/shm/
$ echo always > /sys/kernel/mm/transparent_hugepage/defrag
$ cat mbind_thp.c
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <numaif.h>
#define SIZE 2ULL << 30
int main(int argc, char **argv)
{
int fd;
unsigned long long i;
char *addr;
pid_t pid;
char buf[100];
unsigned long nodemask = 1;
fd = open("/dev/shm/test", O_RDWR|O_CREAT);
assert(fd > 0);
assert(ftruncate(fd, SIZE) == 0);
addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_SHARED, fd, 0);
assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
for (i = 0; i < SIZE; i+=4096) {
addr[i] = 1;
}
pid = getpid();
snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
system(buf);
sleep(10000);
return 0;
}
$ gcc mbind_thp.c -o mbind_thp -lnuma
$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 1918 MB
node 0 free: 1595 MB
node 1 cpus: 1 3
node 1 size: 2014 MB
node 1 free: 1731 MB
node distances:
node 0 1
0: 10 20
1: 20 10
$ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4
Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
Fixes:
ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Baokun Li [Sat, 25 Dec 2021 05:12:32 +0000 (21:12 -0800)]
kfence: fix memory leak when cat kfence objects
Hulk robot reported a kmemleak problem:
unreferenced object 0xffff93d1d8cc02e8 (size 248):
comm "cat", pid 23327, jiffies
4624670141 (age 495992.217s)
hex dump (first 32 bytes):
00 40 85 19 d4 93 ff ff 00 10 00 00 00 00 00 00 .@..............
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
seq_open+0x2a/0x80
full_proxy_open+0x167/0x1e0
do_dentry_open+0x1e1/0x3a0
path_openat+0x961/0xa20
do_filp_open+0xae/0x120
do_sys_openat2+0x216/0x2f0
do_sys_open+0x57/0x80
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff93d419854000 (size 4096):
comm "cat", pid 23327, jiffies
4624670141 (age 495992.217s)
hex dump (first 32 bytes):
6b 66 65 6e 63 65 2d 23 32 35 30 3a 20 30 78 30 kfence-#250: 0x0
30 30 30 30 30 30 30 37 35 34 62 64 61 31 32 2d
0000000754bda12-
backtrace:
seq_read_iter+0x313/0x440
seq_read+0x14b/0x1a0
full_proxy_read+0x56/0x80
vfs_read+0xa5/0x1b0
ksys_read+0xa0/0xf0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
I find that we can easily reproduce this problem with the following
commands:
cat /sys/kernel/debug/kfence/objects
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
The leaked memory is allocated in the stack below:
do_syscall_64
do_sys_open
do_dentry_open
full_proxy_open
seq_open ---> alloc seq_file
vfs_read
full_proxy_read
seq_read
seq_read_iter
traverse ---> alloc seq_buf
And it should have been released in the following process:
do_syscall_64
syscall_exit_to_user_mode
exit_to_user_mode_prepare
task_work_run
____fput
__fput
full_proxy_release ---> free here
However, the release function corresponding to file_operations is not
implemented in kfence. As a result, a memory leak occurs. Therefore,
the solution to this problem is to implement the corresponding release
function.
Link: https://lkml.kernel.org/r/20211206133628.2822545-1-libaokun1@huawei.com
Fixes:
0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 22 Dec 2021 19:39:53 +0000 (11:39 -0800)]
Merge tag 'fixes-2021-12-22' of git://git./linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"Fix memblock_phys_alloc() section mismatch error
There are section mismatch errors when compiler refuses to inline
one-line wrapper memblock_phys_alloc(). Make memblock_phys_alloc()
__always_inline to avoid these mismatch issues"
* tag 'fixes-2021-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: fix memblock_phys_alloc() section mismatch error
Linus Torvalds [Wed, 22 Dec 2021 18:17:16 +0000 (10:17 -0800)]
Merge tag 'for-5.16/parisc-7' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
- Fix a bug in the C code which calculates the relevant futex spinlock
based on the futex virtual address. In some cases a wrong spinlock
(compared to what is calculated in the assembly code path) was
choosen which then can lead to deadlocks.
- The 64-bit kernel missed to clip the LWS number in the
Light-weight-syscall path for 32-bit processes.
- Prevent CPU register dump to show stale value in IIR register on
access rights traps.
- Remove unused ARCH_DEFCONFIG entries.
* tag 'for-5.16/parisc-7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: remove ARCH_DEFCONFIG
parisc: Fix mask used to select futex spinlock
parisc: Correct completer in lws start
parisc: Clear stale IIR value on instruction access rights trap
Linus Torvalds [Wed, 22 Dec 2021 18:11:17 +0000 (10:11 -0800)]
Merge tag 'for-linus-5.16-3' of git://github.com/cminyard/linux-ipmi
Pull IPMI fixes from Corey Minyard:
"Fix some IPMI crashes
Some crash fixes have come in dealing with various error handling
issues. They have sat in next for 5 days or more without issue, and
they are fairly critical"
* tag 'for-linus-5.16-3' of git://github.com/cminyard/linux-ipmi:
ipmi: Fix UAF when uninstall ipmi_si and ipmi_msghandler module
ipmi: fix initialization when workqueue allocation fails
ipmi: bail out if init_srcu_struct fails
ipmi: ssif: initialize ssif_info->client early
Linus Torvalds [Wed, 22 Dec 2021 18:06:32 +0000 (10:06 -0800)]
Merge tag 'tomoyo-pr-
20211222' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1
Pull tomoyo fixes from Tetsuo Handa:
"Two overhead reduction patches for testing/fuzzing environment"
* tag 'tomoyo-pr-
20211222' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
tomoyo: use hweight16() in tomoyo_domain_quota_is_ok()
tomoyo: Check exceeded quota early in tomoyo_domain_quota_is_ok().
Linus Torvalds [Wed, 22 Dec 2021 18:02:08 +0000 (10:02 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression in the qat driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: qat - do not handle PFVF sources for qat_4xxx
Jackie Liu [Fri, 17 Dec 2021 02:07:54 +0000 (10:07 +0800)]
memblock: fix memblock_phys_alloc() section mismatch error
Fix modpost Section mismatch error in memblock_phys_alloc()
[...]
WARNING: modpost: vmlinux.o(.text.unlikely+0x1dcc): Section mismatch in reference
from the function memblock_phys_alloc() to the function .init.text:memblock_phys_alloc_range()
The function memblock_phys_alloc() references
the function __init memblock_phys_alloc_range().
This is often because memblock_phys_alloc lacks a __init
annotation or the annotation of memblock_phys_alloc_range is wrong.
ERROR: modpost: Section mismatches detected.
Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.
[...]
memblock_phys_alloc() is a one-line wrapper, make it __always_inline to
avoid these section mismatches.
Reported-by: k2ci <kernel-bot@kylinos.cn>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
[rppt: slightly massaged changelog ]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20211217020754.2874872-1-liu.yun@linux.dev
Masahiro Yamada [Mon, 13 Dec 2021 02:40:44 +0000 (11:40 +0900)]
parisc: remove ARCH_DEFCONFIG
Commit
2a86f6612164 ("kbuild: use KBUILD_DEFCONFIG as the fallback for
DEFCONFIG_LIST") removed ARCH_DEFCONFIG because it does not make much
sense.
In the same development cycle, Commit
ededa081ed20 ("parisc: Fix
defconfig selection") added ARCH_DEFCONFIG for parisc.
Please use KBUILD_DEFCONFIG in arch/*/Makefile for defconfig selection.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Linus Torvalds [Tue, 21 Dec 2021 20:31:55 +0000 (12:31 -0800)]
Merge tag 'pm-5.16-rc7' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a recent regression causing the loop in dpm_prepare() to become
infinite if one of the device ->prepare() callbacks returns an error"
* tag 'pm-5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Fix error handling in dpm_prepare()
Linus Torvalds [Tue, 21 Dec 2021 20:25:57 +0000 (12:25 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
- Fix for compilation of selftests on non-x86 architectures
- Fix for kvm_run->if_flag on SEV-ES
- Fix for page table use-after-free if yielding during exit_mm()
- Improve behavior when userspace starts a nested guest with invalid
state
- Fix missed wakeup with assigned devices but no VT-d posted interrupts
- Do not tell userspace to save/restore an unsupported PMU MSR
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest state
KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state
KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required
KVM: VMX: Always clear vmx->fail on emulation_required
selftests: KVM: Fix non-x86 compiling
KVM: x86: Always set kvm_run->if_flag
KVM: x86/mmu: Don't advance iterator after restart due to yielding
KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all
John David Anglin [Tue, 21 Dec 2021 18:33:16 +0000 (13:33 -0500)]
parisc: Fix mask used to select futex spinlock
The address bits used to select the futex spinlock need to match those used in
the LWS code in syscall.S. The mask 0x3f8 only selects 7 bits. It should
select 8 bits.
This change fixes the glibc nptl/tst-cond24 and nptl/tst-cond25 tests.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Fixes:
53a42b6324b8 ("parisc: Switch to more fine grained lws locks")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Helge Deller <deller@gmx.de>
John David Anglin [Tue, 21 Dec 2021 18:21:22 +0000 (13:21 -0500)]
parisc: Correct completer in lws start
The completer in the "or,ev %r1,%r30,%r30" instruction is reversed, so we are
not clipping the LWS number when we are called from a 32-bit process (W=0).
We need to nulify the following depdi instruction when the least-significant
bit of %r30 is 1.
If the %r20 register is not clipped, a user process could perform a LWS call
that would branch to an undefined location in the kernel and potentially crash
the machine.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Helge Deller <deller@gmx.de>
Linus Torvalds [Tue, 21 Dec 2021 20:02:36 +0000 (12:02 -0800)]
Merge tag 'nfsd-5.16-3' of git://git./linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
"Address a buffer overrun reported by Anatoly Trosinenko"
* tag 'nfsd-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix READDIR buffer overflow
Sean Christopherson [Tue, 21 Dec 2021 15:37:00 +0000 (10:37 -0500)]
KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
Drop a check that guards triggering a posted interrupt on the currently
running vCPU, and more importantly guards waking the target vCPU if
triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
If a vIRQ is delivered from asynchronous context, the target vCPU can be
the currently running vCPU and can also be blocking, in which case
skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to
be a wake event for the vCPU.
The "do nothing" logic when "vcpu == running_vcpu" mostly works only
because the majority of calls to ->deliver_posted_interrupt(), especially
when using posted interrupts, come from synchronous KVM context. But if
a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ
and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to
use posted interrupts, wake events from the device will be delivered to
KVM from IRQ context, e.g.
vfio_msihandler()
|
|-> eventfd_signal()
|
|-> ...
|
|-> irqfd_wakeup()
|
|->kvm_arch_set_irq_inatomic()
|
|-> kvm_irq_delivery_to_apic_fast()
|
|-> kvm_apic_set_irq()
This also aligns the non-nested and nested usage of triggering posted
interrupts, and will allow for additional cleanups.
Fixes:
379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
Cc: stable@vger.kernel.org
Reported-by: Longpeng (Mike) <longpeng2@huawei.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <
20211208015236.1616697-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Tue, 21 Dec 2021 17:30:32 +0000 (09:30 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- NULL pointer dereference fix in Vivaldi driver (Jiasheng Jiang)
- regression fix for device probing in Holtek driver (Benjamin
Tissoires)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: potential dereference of null pointer
HID: holtek: fix mouse probing
Wu Bo [Tue, 21 Dec 2021 07:00:34 +0000 (15:00 +0800)]
ipmi: Fix UAF when uninstall ipmi_si and ipmi_msghandler module
Hi,
When testing install and uninstall of ipmi_si.ko and ipmi_msghandler.ko,
the system crashed.
The log as follows:
[ 141.087026] BUG: unable to handle kernel paging request at
ffffffffc09b3a5a
[ 141.087241] PGD
8fe4c0d067 P4D
8fe4c0d067 PUD
8fe4c0f067 PMD
103ad89067 PTE 0
[ 141.087464] Oops: 0010 [#1] SMP NOPTI
[ 141.087580] CPU: 67 PID: 668 Comm: kworker/67:1 Kdump: loaded Not tainted 4.18.0.x86_64 #47
[ 141.088009] Workqueue: events 0xffffffffc09b3a40
[ 141.088009] RIP: 0010:0xffffffffc09b3a5a
[ 141.088009] Code: Bad RIP value.
[ 141.088009] RSP: 0018:
ffffb9094e2c3e88 EFLAGS:
00010246
[ 141.088009] RAX:
0000000000000000 RBX:
ffff9abfdb1f04a0 RCX:
0000000000000000
[ 141.088009] RDX:
0000000000000000 RSI:
0000000000000246 RDI:
0000000000000246
[ 141.088009] RBP:
0000000000000000 R08:
ffff9abfffee3cb8 R09:
00000000000002e1
[ 141.088009] R10:
ffffb9094cb73d90 R11:
00000000000f4240 R12:
ffff9abfffee8700
[ 141.088009] R13:
0000000000000000 R14:
ffff9abfdb1f04a0 R15:
ffff9abfdb1f04a8
[ 141.088009] FS:
0000000000000000(0000) GS:
ffff9abfffec0000(0000) knlGS:
0000000000000000
[ 141.088009] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 141.088009] CR2:
ffffffffc09b3a30 CR3:
0000008fe4c0a001 CR4:
00000000007606e0
[ 141.088009] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 141.088009] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 141.088009] PKRU:
55555554
[ 141.088009] Call Trace:
[ 141.088009] ? process_one_work+0x195/0x390
[ 141.088009] ? worker_thread+0x30/0x390
[ 141.088009] ? process_one_work+0x390/0x390
[ 141.088009] ? kthread+0x10d/0x130
[ 141.088009] ? kthread_flush_work_fn+0x10/0x10
[ 141.088009] ? ret_from_fork+0x35/0x40] BUG: unable to handle kernel paging request at
ffffffffc0b28a5a
[ 200.223240] PGD
97fe00d067 P4D
97fe00d067 PUD
97fe00f067 PMD
a580cbf067 PTE 0
[ 200.223464] Oops: 0010 [#1] SMP NOPTI
[ 200.223579] CPU: 63 PID: 664 Comm: kworker/63:1 Kdump: loaded Not tainted 4.18.0.x86_64 #46
[ 200.224008] Workqueue: events 0xffffffffc0b28a40
[ 200.224008] RIP: 0010:0xffffffffc0b28a5a
[ 200.224008] Code: Bad RIP value.
[ 200.224008] RSP: 0018:
ffffbf3c8e2a3e88 EFLAGS:
00010246
[ 200.224008] RAX:
0000000000000000 RBX:
ffffa0799ad6bca0 RCX:
0000000000000000
[ 200.224008] RDX:
0000000000000000 RSI:
0000000000000246 RDI:
0000000000000246
[ 200.224008] RBP:
0000000000000000 R08:
ffff9fe43fde3cb8 R09:
00000000000000d5
[ 200.224008] R10:
ffffbf3c8cb53d90 R11:
00000000000f4240 R12:
ffff9fe43fde8700
[ 200.224008] R13:
0000000000000000 R14:
ffffa0799ad6bca0 R15:
ffffa0799ad6bca8
[ 200.224008] FS:
0000000000000000(0000) GS:
ffff9fe43fdc0000(0000) knlGS:
0000000000000000
[ 200.224008] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 200.224008] CR2:
ffffffffc0b28a30 CR3:
00000097fe00a002 CR4:
00000000007606e0
[ 200.224008] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 200.224008] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 200.224008] PKRU:
55555554
[ 200.224008] Call Trace:
[ 200.224008] ? process_one_work+0x195/0x390
[ 200.224008] ? worker_thread+0x30/0x390
[ 200.224008] ? process_one_work+0x390/0x390
[ 200.224008] ? kthread+0x10d/0x130
[ 200.224008] ? kthread_flush_work_fn+0x10/0x10
[ 200.224008] ? ret_from_fork+0x35/0x40
[ 200.224008] kernel fault(0x1) notification starting on CPU 63
[ 200.224008] kernel fault(0x1) notification finished on CPU 63
[ 200.224008] CR2:
ffffffffc0b28a5a
[ 200.224008] ---[ end trace
c82a412d93f57412 ]---
The reason is as follows:
T1: rmmod ipmi_si.
->ipmi_unregister_smi()
-> ipmi_bmc_unregister()
-> __ipmi_bmc_unregister()
-> kref_put(&bmc->usecount, cleanup_bmc_device);
-> schedule_work(&bmc->remove_work);
T2: rmmod ipmi_msghandler.
ipmi_msghander module uninstalled, and the module space
will be freed.
T3: bmc->remove_work doing cleanup the bmc resource.
-> cleanup_bmc_work()
-> platform_device_unregister(&bmc->pdev);
-> platform_device_del(pdev);
-> device_del(&pdev->dev);
-> kobject_uevent(&dev->kobj, KOBJ_REMOVE);
-> kobject_uevent_env()
-> dev_uevent()
-> if (dev->type && dev->type->name)
'dev->type'(bmc_device_type) pointer space has freed when uninstall
ipmi_msghander module, 'dev->type->name' cause the system crash.
drivers/char/ipmi/ipmi_msghandler.c:
2820 static const struct device_type bmc_device_type = {
2821 .groups = bmc_dev_attr_groups,
2822 };
Steps to reproduce:
Add a time delay in cleanup_bmc_work() function,
and uninstall ipmi_si and ipmi_msghandler module.
2910 static void cleanup_bmc_work(struct work_struct *work)
2911 {
2912 struct bmc_device *bmc = container_of(work, struct bmc_device,
2913 remove_work);
2914 int id = bmc->pdev.id; /* Unregister overwrites id */
2915
2916 msleep(3000); <---
2917 platform_device_unregister(&bmc->pdev);
2918 ida_simple_remove(&ipmi_bmc_ida, id);
2919 }
Use 'remove_work_wq' instead of 'system_wq' to solve this issues.
Fixes:
b2cfd8ab4add ("ipmi: Rework device id and guid handling to catch changing BMCs")
Signed-off-by: Wu Bo <wubo40@huawei.com>
Message-Id: <
1640070034-56671-1-git-send-email-wubo40@huawei.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Linus Torvalds [Tue, 21 Dec 2021 01:26:42 +0000 (17:26 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Last fixes before holidays. Nothing very exciting:
- Work around a HW bug in HNS HIP08
- Recent memory leak regression in qib
- Incorrect use of kfree() for vmalloc memory in hns"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Replace kfree() with kvfree()
IB/qib: Fix memory leak in qib_user_sdma_queue_pkts()
RDMA/hns: Fix RNR retransmission issue for HIP08
Linus Torvalds [Mon, 20 Dec 2021 18:23:19 +0000 (10:23 -0800)]
Merge tag 'spi-fix-v5.16-rc6' of git://git./linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One small fix for a long standing issue with error handling on probe
in the Armada driver"
* tag 'spi-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: change clk_disable_unprepare to clk_unprepare
Linus Torvalds [Mon, 20 Dec 2021 18:13:30 +0000 (10:13 -0800)]
Merge tag 'regulator-fix-v5.16-rc6' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"Binding fix for v5.16
This fixes problems validating DT bindings using op_mode which wasn't
described as it should have been when converting to DT schema"
* tag 'regulator-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: samsung,s5m8767: add missing op_mode to bucks
Linus Torvalds [Mon, 20 Dec 2021 15:42:21 +0000 (07:42 -0800)]
Merge branch 'xsa' of git://git./linux/kernel/git/xen/tip
Merge xen fixes from Juergen Gross:
"Fixes for two issues related to Xen and malicious guests:
- Guest can force the netback driver to hog large amounts of memory
- Denial of Service in other guests due to event storms"
* 'xsa' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/netback: don't queue unlimited number of packages
xen/netback: fix rx queue stall detection
xen/console: harden hvc_xen against event channel storms
xen/netfront: harden netfront against event channel storms
xen/blkfront: harden blkfront against event channel storms
Helge Deller [Wed, 8 Dec 2021 10:06:52 +0000 (11:06 +0100)]
parisc: Clear stale IIR value on instruction access rights trap
When a trap 7 (Instruction access rights) occurs, this means the CPU
couldn't execute an instruction due to missing execute permissions on
the memory region. In this case it seems the CPU didn't even fetched
the instruction from memory and thus did not store it in the cr19 (IIR)
register before calling the trap handler. So, the trap handler will find
some random old stale value in cr19.
This patch simply overwrites the stale IIR value with a constant magic
"bad food" value (0xbaadf00d), in the hope people don't start to try to
understand the various random IIR values in trap 7 dumps.
Noticed-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Sean Christopherson [Tue, 7 Dec 2021 19:30:06 +0000 (19:30 +0000)]
KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest state
Add a selftest to attempt to enter L2 with invalid guests state by
exiting to userspace via I/O from L2, and then using KVM_SET_SREGS to set
invalid guest state (marking TR unusable is arbitrary chosen for its
relative simplicity).
This is a regression test for a bug introduced by commit
c8607e4a086f
("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if
!from_vmentry"), which incorrectly set vmx->fail=true when L2 had invalid
guest state and ultimately triggered a WARN due to nested_vmx_vmexit()
seeing vmx->fail==true while attempting to synthesize a nested VM-Exit.
The is also a functional test to verify that KVM sythesizes TRIPLE_FAULT
for L2, which is somewhat arbitrary behavior, instead of emulating L2.
KVM should never emulate L2 due to invalid guest state, as it's
architecturally impossible for L1 to run an L2 guest with invalid state
as nested VM-Enter should always fail, i.e. L1 needs to do the emulation.
Stuffing state via KVM ioctl() is a non-architctural, out-of-band case,
hence the TRIPLE_FAULT being rather arbitrary.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211207193006.120997-5-seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 7 Dec 2021 19:30:05 +0000 (19:30 +0000)]
KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state
Update the documentation for kvm-intel's emulate_invalid_guest_state to
rectify the description of KVM's default behavior, and to document that
the behavior and thus parameter only applies to L1.
Fixes:
a27685c33acc ("KVM: VMX: Emulate invalid guest state by default")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211207193006.120997-4-seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 7 Dec 2021 19:30:04 +0000 (19:30 +0000)]
KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required
Synthesize a triple fault if L2 guest state is invalid at the time of
VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs
guest state via ioctls(), e.g. KVM_SET_SREGS. KVM should never emulate
invalid guest state, since from L1's perspective, it's architecturally
impossible for L2 to have invalid state while L2 is running in hardware.
E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit
or #GP.
Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that
can trigger this scenario, as nested VM-Enter correctly rejects any
attempt to enter L2 with invalid state.
RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and
behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits
via RSM results in shutdown, i.e. there is precedent for KVM's behavior.
Following AMD's SMRAM layout is important as AMD's layout saves/restores
the descriptor cache information, including CS.RPL and SS.RPL, and also
defines all the fields relevant to invalid guest state as read-only, i.e.
so long as the vCPU had valid state before the SMI, which is guaranteed
for L2, RSM will generate valid state unless SMRAM was modified. Intel's
layout saves/restores only the selector, which means that scenarios where
the selector and cached RPL don't match, e.g. conforming code segments,
would yield invalid guest state. Intel CPUs fudge around this issued by
stuffing SS.RPL and CS.RPL on RSM. Per Intel's SDM on the "Default
Treatment of RSM", paraphrasing for brevity:
IF internal storage indicates that the [CPU was post-VMXON]
THEN
enter VMX operation (root or non-root);
restore VMX-critical state as defined in Section 34.14.1;
set to their fixed values any bits in CR0 and CR4 whose values must
be fixed in VMX operation [unless coming from an unrestricted guest];
IF RFLAGS.VM = 0 AND (in VMX root operation OR the
“unrestricted guest” VM-execution control is 0)
THEN
CS.RPL := SS.DPL;
SS.RPL := SS.DPL;
FI;
restore current VMCS pointer;
FI;
Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM
will sythesize TRIPLE_FAULT in this scenario. KVM's behavior is allowed
as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the
only way for CR0 and/or CR4 to have illegal values is if they were
modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map"
section states "modifying these registers will result in unpredictable
behavior".
KVM's ioctl() behavior is less straightforward. Because KVM allows
ioctls() to be executed in any order, rejecting an ioctl() if it would
result in invalid L2 guest state is not an option as KVM cannot know if
a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or
drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE. Ideally, KVM would
reject KVM_RUN if L2 contained invalid guest state, but that carries the
risk of a false positive, e.g. if RSM loaded invalid guest state and KVM
exited to userspace. Setting a flag/request to detect such a scenario is
undesirable because (a) it's extremely unlikely to add value to KVM as a
whole, and (b) KVM would need to consider ioctl() interactions with such
a flag, e.g. if userspace migrated the vCPU while the flag were set.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211207193006.120997-3-seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 7 Dec 2021 19:30:03 +0000 (19:30 +0000)]
KVM: VMX: Always clear vmx->fail on emulation_required
Revert a relatively recent change that set vmx->fail if the vCPU is in L2
and emulation_required is true, as that behavior is completely bogus.
Setting vmx->fail and synthesizing a VM-Exit is contradictory and wrong:
(a) it's impossible to have both a VM-Fail and VM-Exit
(b) vmcs.EXIT_REASON is not modified on VM-Fail
(c) emulation_required refers to guest state and guest state checks are
always VM-Exits, not VM-Fails.
For KVM specifically, emulation_required is handled before nested exits
in __vmx_handle_exit(), thus setting vmx->fail has no immediate effect,
i.e. KVM calls into handle_invalid_guest_state() and vmx->fail is ignored.
Setting vmx->fail can ultimately result in a WARN in nested_vmx_vmexit()
firing when tearing down the VM as KVM never expects vmx->fail to be set
when L2 is active, KVM always reflects those errors into L1.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 21158 at arch/x86/kvm/vmx/nested.c:4548
nested_vmx_vmexit+0x16bd/0x17e0
arch/x86/kvm/vmx/nested.c:4547
Modules linked in:
CPU: 0 PID: 21158 Comm: syz-executor.1 Not tainted 5.16.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547
Code: <0f> 0b e9 2e f8 ff ff e8 57 b3 5d 00 0f 0b e9 00 f1 ff ff 89 e9 80
Call Trace:
vmx_leave_nested arch/x86/kvm/vmx/nested.c:6220 [inline]
nested_vmx_free_vcpu+0x83/0xc0 arch/x86/kvm/vmx/nested.c:330
vmx_free_vcpu+0x11f/0x2a0 arch/x86/kvm/vmx/vmx.c:6799
kvm_arch_vcpu_destroy+0x6b/0x240 arch/x86/kvm/x86.c:10989
kvm_vcpu_destroy+0x29/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:441
kvm_free_vcpus arch/x86/kvm/x86.c:11426 [inline]
kvm_arch_destroy_vm+0x3ef/0x6b0 arch/x86/kvm/x86.c:11545
kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1189 [inline]
kvm_put_kvm+0x751/0xe40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1220
kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3489
__fput+0x3fc/0x870 fs/file_table.c:280
task_work_run+0x146/0x1c0 kernel/task_work.c:164
exit_task_work include/linux/task_work.h:32 [inline]
do_exit+0x705/0x24f0 kernel/exit.c:832
do_group_exit+0x168/0x2d0 kernel/exit.c:929
get_signal+0x1740/0x2120 kernel/signal.c:2852
arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868
handle_signal_work kernel/entry/common.c:148 [inline]
exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207
__syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300
do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes:
c8607e4a086f ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry")
Reported-by: syzbot+f1d2136db9c80d4733e8@syzkaller.appspotmail.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211207193006.120997-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andrew Jones [Tue, 14 Dec 2021 15:18:42 +0000 (16:18 +0100)]
selftests: KVM: Fix non-x86 compiling
Attempting to compile on a non-x86 architecture fails with
include/kvm_util.h: In function ‘vm_compute_max_gfn’:
include/kvm_util.h:79:21: error: dereferencing pointer to incomplete type ‘struct kvm_vm’
return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
^~
This is because the declaration of struct kvm_vm is in
lib/kvm_util_internal.h as an effort to make it private to
the test lib code. We can still provide arch specific functions,
though, by making the generic function symbols weak. Do that to
fix the compile error.
Fixes:
c8cc43c1eae2 ("selftests: KVM: avoid failures due to reserved HyperTransport region")
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <
20211214151842.848314-1-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Marc Orr [Thu, 9 Dec 2021 15:52:57 +0000 (07:52 -0800)]
KVM: x86: Always set kvm_run->if_flag
The kvm_run struct's if_flag is a part of the userspace/kernel API. The
SEV-ES patches failed to set this flag because it's no longer needed by
QEMU (according to the comment in the source code). However, other
hypervisors may make use of this flag. Therefore, set the flag for
guests with encrypted registers (i.e., with guest_state_protected set).
Fixes:
f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Signed-off-by: Marc Orr <marcorr@google.com>
Message-Id: <
20211209155257.128747-1-marcorr@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Sean Christopherson [Tue, 14 Dec 2021 03:35:28 +0000 (03:35 +0000)]
KVM: x86/mmu: Don't advance iterator after restart due to yielding
After dropping mmu_lock in the TDP MMU, restart the iterator during
tdp_iter_next() and do not advance the iterator. Advancing the iterator
results in skipping the top-level SPTE and all its children, which is
fatal if any of the skipped SPTEs were not visited before yielding.
When zapping all SPTEs, i.e. when min_level == root_level, restarting the
iter and then invoking tdp_iter_next() is always fatal if the current gfn
has as a valid SPTE, as advancing the iterator results in try_step_side()
skipping the current gfn, which wasn't visited before yielding.
Sprinkle WARNs on iter->yielded being true in various helpers that are
often used in conjunction with yielding, and tag the helper with
__must_check to reduce the probabily of improper usage.
Failing to zap a top-level SPTE manifests in one of two ways. If a valid
SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),
the shadow page will be leaked and KVM will WARN accordingly.
WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm]
RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm]
Call Trace:
<TASK>
kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
kvm_destroy_vm+0x162/0x2a0 [kvm]
kvm_vcpu_release+0x34/0x60 [kvm]
__fput+0x82/0x240
task_work_run+0x5c/0x90
do_exit+0x364/0xa10
? futex_unqueue+0x38/0x60
do_group_exit+0x33/0xa0
get_signal+0x155/0x850
arch_do_signal_or_restart+0xed/0x750
exit_to_user_mode_prepare+0xc5/0x120
syscall_exit_to_user_mode+0x1d/0x40
do_syscall_64+0x48/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by
kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of
marking a struct page as dirty/accessed after it has been put back on the
free list. This directly triggers a WARN due to encountering a page with
page_count() == 0, but it can also lead to data corruption and additional
errors in the kernel.
WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171
RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm]
Call Trace:
<TASK>
kvm_set_pfn_dirty+0x120/0x1d0 [kvm]
__handle_changed_spte+0x92e/0xca0 [kvm]
__handle_changed_spte+0x63c/0xca0 [kvm]
__handle_changed_spte+0x63c/0xca0 [kvm]
__handle_changed_spte+0x63c/0xca0 [kvm]
zap_gfn_range+0x549/0x620 [kvm]
kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm]
mmu_free_root_page+0x219/0x2c0 [kvm]
kvm_mmu_free_roots+0x1b4/0x4e0 [kvm]
kvm_mmu_unload+0x1c/0xa0 [kvm]
kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm]
kvm_put_kvm+0x3b1/0x8b0 [kvm]
kvm_vcpu_release+0x4e/0x70 [kvm]
__fput+0x1f7/0x8c0
task_work_run+0xf8/0x1a0
do_exit+0x97b/0x2230
do_group_exit+0xda/0x2a0
get_signal+0x3be/0x1e50
arch_do_signal_or_restart+0x244/0x17f0
exit_to_user_mode_prepare+0xcb/0x120
syscall_exit_to_user_mode+0x1d/0x40
do_syscall_64+0x4d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Note, the underlying bug existed even before commit
1af4a96025b3 ("KVM:
x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to
tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still
incorrectly advance past a top-level entry when yielding on a lower-level
entry. But with respect to leaking shadow pages, the bug was introduced
by yielding before processing the current gfn.
Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or
callers could jump to their "retry" label. The downside of that approach
is that tdp_mmu_iter_cond_resched() _must_ be called before anything else
in the loop, and there's no easy way to enfornce that requirement.
Ideally, KVM would handling the cond_resched() fully within the iterator
macro (the code is actually quite clean) and avoid this entire class of
bugs, but that is extremely difficult do while also supporting yielding
after tdp_mmu_set_spte_atomic() fails. Yielding after failing to set a
SPTE is very desirable as the "owner" of the REMOVED_SPTE isn't strictly
bounded, e.g. if it's zapping a high-level shadow page, the REMOVED_SPTE
may block operations on the SPTE for a significant amount of time.
Fixes:
faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes:
1af4a96025b3 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211214033528.123268-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jiasheng Jiang [Wed, 15 Dec 2021 08:36:05 +0000 (16:36 +0800)]
HID: potential dereference of null pointer
The return value of devm_kzalloc() needs to be checked.
To avoid hdev->dev->driver_data to be null in case of the failure of
alloc.
Fixes:
14c9c014babe ("HID: add vivaldi HID driver")
Cc: stable@vger.kernel.org
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20211215083605.117638-1-jiasheng@iscas.ac.cn
Benjamin Tissoires [Mon, 20 Dec 2021 09:51:20 +0000 (10:51 +0100)]
HID: holtek: fix mouse probing
An overlook from the previous commit: we don't even parse or start the
device, meaning that the device is not presented to user space.
Fixes:
93020953d0fa ("HID: check for valid USB device for many HID drivers")
Cc: stable@vger.kernel.org
Link: https://bugs.archlinux.org/task/73048
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215341
Link: https://lore.kernel.org/r/e4efbf13-bd8d-0370-629b-6c80c0044b15@leemhuis.info/
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Wei Wang [Fri, 17 Dec 2021 12:49:34 +0000 (07:49 -0500)]
KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all
The fixed counter 3 is used for the Topdown metrics, which hasn't been
enabled for KVM guests. Userspace accessing to it will fail as it's not
included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines,
which have this counter.
To reproduce it on ICX+ machines, ./state_test reports:
==== Test Assertion Failure ====
lib/x86_64/processor.c:1078: r == nmsrs
pid=4564 tid=4564 - Argument list too long
1 0x000000000040b1b9: vcpu_save_state at processor.c:1077
2 0x0000000000402478: main at state_test.c:209 (discriminator 6)
3 0x00007fbe21ed5f92: ?? ??:0
4 0x000000000040264d: _start at ??:?
Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c)
With this patch, it works well.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Message-Id: <
20211217124934.32893-1-wei.w.wang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Sun, 19 Dec 2021 22:14:33 +0000 (14:14 -0800)]
Linux 5.16-rc6
Linus Torvalds [Sun, 19 Dec 2021 20:44:03 +0000 (12:44 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Two small fixes, one of which was being worked around in selftests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Retry page fault if MMU reload is pending and root has no sp
KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs
KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
Linus Torvalds [Sun, 19 Dec 2021 20:38:53 +0000 (12:38 -0800)]
Merge tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block
Pull block revert from Jens Axboe:
"It turns out that the fix for not hammering on the delayed work timer
too much caused a performance regression for BFQ, so let's revert the
change for now.
I've got some ideas on how to fix it appropriately, but they should
wait for 5.17"
* tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block:
Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"
Linus Torvalds [Sun, 19 Dec 2021 20:28:46 +0000 (12:28 -0800)]
Merge tag 'irq_urgent_for_v5.16_rc6' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Clear the PCI_MSIX_FLAGS_MASKALL bit too on the error path so that it
is restored to its reset state
- Mask MSI-X vectors late on the init path in order to handle
out-of-spec Marvell NVME devices which apparently look at the MSI-X
mask even when MSI-X is disabled
* tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
PCI/MSI: Mask MSI-X vectors only on success
Linus Torvalds [Sun, 19 Dec 2021 20:23:18 +0000 (12:23 -0800)]
Merge tag 'timers_urgent_for_v5.16_rc6' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Make sure the CLOCK_REALTIME to CLOCK_MONOTONIC offset is never
positive
* tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Really make sure wall_to_monotonic isn't positive
Linus Torvalds [Sun, 19 Dec 2021 20:17:26 +0000 (12:17 -0800)]
Merge tag 'locking_urgent_for_v5.16_rc6' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
- Fix the rtmutex condition checking when the optimistic spinning of a
waiter needs to be terminated
* tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
Linus Torvalds [Sun, 19 Dec 2021 19:46:54 +0000 (11:46 -0800)]
Merge tag 'core_urgent_for_v5.16_rc6' of git://git./linux/kernel/git/tip/tip
Pull signal handlign fix from Borislav Petkov:
- Prevent lock contention on the new sigaltstack lock on the
common-case path, when no changes have been made to the alternative
signal stack.
* tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
signal: Skip the altstack update when not needed
Linus Torvalds [Sun, 19 Dec 2021 19:40:11 +0000 (11:40 -0800)]
Merge tag 'mips-fixes_5.16_3' of git://git./linux/kernel/git/mips/linux
Pull MIPS fix from Thomas Bogendoerfer:
- only enable pci_remap_iospace() for Ralink devices
* tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Only define pci_remap_iospace() for Ralink
Linus Torvalds [Sun, 19 Dec 2021 19:31:14 +0000 (11:31 -0800)]
Merge tag 'powerpc-5.16-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix a recently introduced oops at boot on 85xx in some configurations.
Fix crashes when loading some livepatch modules with
STRICT_MODULE_RWX.
Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni"
* tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/module_64: Fix livepatching for RO modules
powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
Linus Torvalds [Sun, 19 Dec 2021 19:23:02 +0000 (11:23 -0800)]
Merge tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Two cifs/smb3 fixes, one fscache related, and one mount parsing
related for stable"
* tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: sanitize multiple delimiters in prepath
cifs: ignore resource_id while getting fscache super cookie
Sean Christopherson [Thu, 9 Dec 2021 06:05:46 +0000 (06:05 +0000)]
KVM: x86: Retry page fault if MMU reload is pending and root has no sp
Play nice with a NULL shadow page when checking for an obsolete root in
the page fault handler by flagging the page fault as stale if there's no
shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending.
Invalidating memslots, which is the only case where _all_ roots need to
be reloaded, requests all vCPUs to reload their MMUs while holding
mmu_lock for lock.
The "special" roots, e.g. pae_root when KVM uses PAE paging, are not
backed by a shadow page. Running with TDP disabled or with nested NPT
explodes spectaculary due to dereferencing a NULL shadow page pointer.
Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the
root. Zapping shadow pages in response to guest activity, e.g. when the
guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current
vCPU isn't using the affected root. I.e. KVM_REQ_MMU_RELOAD can be seen
with a completely valid root shadow page. This is a bit of a moot point
as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will
be cleaned up in the future.
Fixes:
a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20211209060552.2956723-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 16 Dec 2021 16:52:12 +0000 (17:52 +0100)]
KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs
Host initiated writes to MSR_IA32_PERF_CAPABILITIES should not depend
on guest visible CPUIDs and (incorrect) KVM logic implementing it is
about to change. Also, KVM_SET_CPUID{,2} after KVM_RUN is now forbidden
and causes test to fail.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes:
feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211216165213.338923-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 16 Dec 2021 16:52:13 +0000 (17:52 +0100)]
KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should
not depend on guest visible CPUID entries, even if just to allow
creating/restoring guest MSRs and CPUIDs in any sequence.
Fixes:
27461da31089 ("KVM: x86/pmu: Support full width counting")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20211216165213.338923-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jens Axboe [Sun, 19 Dec 2021 14:58:44 +0000 (07:58 -0700)]
Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"
This reverts commit
cb2ac2912a9ca7d3d26291c511939a41361d2d83.
Alex and the kernel test robot report that this causes a significant
performance regression with BFQ. I can reproduce that result, so let's
revert this one as we're close to -rc6 and we there's no point in trying
to rush a fix.
Link: https://lore.kernel.org/linux-block/1639853092.524jxfaem2.none@localhost/
Link: https://lore.kernel.org/lkml/20211219141852.GH14057@xsang-OptiPlex-9020/
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chuck Lever [Thu, 16 Dec 2021 16:12:11 +0000 (11:12 -0500)]
NFSD: Fix READDIR buffer overflow
If a client sends a READDIR count argument that is too small (say,
zero), then the buffer size calculation in the new init_dirlist
helper functions results in an underflow, allowing the XDR stream
functions to write beyond the actual buffer.
This calculation has always been suspect. NFSD has never sanity-
checked the READDIR count argument, but the old entry encoders
managed the problem correctly.
With the commits below, entry encoding changed, exposing the
underflow to the pointer arithmetic in xdr_reserve_space().
Modern NFS clients attempt to retrieve as much data as possible
for each READDIR request. Also, we have no unit tests that
exercise the behavior of READDIR at the lower bound of @count
values. Thus this case was missed during testing.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Fixes:
f5dcccd647da ("NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream")
Fixes:
7f87fc2d34d4 ("NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Linus Torvalds [Sat, 18 Dec 2021 21:23:55 +0000 (13:23 -0800)]
Merge tag 'tty-5.16-rc6' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are two small tty/serial fixes for 5.16-rc6. They include:
- n_hdlc fix for syzbot reported problem that you were previously
copied on.
- 8250_fintek driver fix that resolved a console problem by removing
a previous change.
Both have been in linux-next with no reported issues"
* tag 'tty-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_fintek: Fix garbled text for console
tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
Linus Torvalds [Sat, 18 Dec 2021 21:16:43 +0000 (13:16 -0800)]
Merge tag 'usb-5.16-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for reported problems.
They include:
- dwc2 driver fixes
- xhci driver fixes
- cdnsp driver fixes
- typec driver fix
- gadget u_ether driver fix
- new quirk additions
- usb gadget endpoint calculation fix
- usb serial new device ids
- revert of a xhci-dbg change that broke early debug booting
All changes, except for the revert, have been in linux-next with no
reported problems. The revert was from yesterday, and it was reported
by the developers affected that it resolved their problem"
* tag 'usb-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "usb: early: convert to readl_poll_timeout_atomic()"
usb: typec: tcpm: fix tcpm unregister port but leave a pending timer
usb: cdnsp: Fix lack of spin_lock_irqsave/spin_lock_restore
USB: NO_LPM quirk Lenovo USB-C to Ethernet Adapher(RTL8153-04)
usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
usb: dwc2: fix STM ID/VBUS detection startup delay in dwc2_driver_probe
USB: gadget: bRequestType is a bitfield, not a enum
USB: serial: option: add Telit FN990 compositions
USB: serial: cp210x: fix CP2105 GPIO registration
usb: cdnsp: Fix incorrect status for control request
usb: cdnsp: Fix issue in cdnsp_log_ep trace event
usb: cdnsp: Fix incorrect calling of cdnsp_died function
usb: xhci-mtk: fix list_del warning when enable list debug
usb: gadget: u_ether: fix race in setting MAC address in setup phase
Linus Torvalds [Sat, 18 Dec 2021 19:53:14 +0000 (11:53 -0800)]
Merge tag 'perf-tools-fixes-for-v5.16-2021-12-18' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix segfaults in 'perf inject' related to usage of unopened files
- The return value of hashmap__new() should be checked using IS_ERR()
* tag 'perf-tools-fixes-for-v5.16-2021-12-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf inject: Fix segfault due to perf_data__fd() without open
perf inject: Fix segfault due to close without open
perf expr: Fix missing check for return value of hashmap__new()
Adrian Hunter [Mon, 13 Dec 2021 08:48:29 +0000 (10:48 +0200)]
perf inject: Fix segfault due to perf_data__fd() without open
The fixed commit attempts to get the output file descriptor even if the
file was never opened e.g.
$ perf record uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
$ perf inject -i perf.data --vm-time-correlation=dry-run
Segmentation fault (core dumped)
$ gdb --quiet perf
Reading symbols from perf...
(gdb) r inject -i perf.data --vm-time-correlation=dry-run
Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
__GI___fileno (fp=0x0) at fileno.c:35
35 fileno.c: No such file or directory.
(gdb) bt
#0 __GI___fileno (fp=0x0) at fileno.c:35
#1 0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72
#2 perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69
#3 cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017
#4 0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313
#5 0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
#6 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
#7 main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539
(gdb)
Fixes:
0ae03893623dd1dd ("perf tools: Pass a fd to perf_file_header__read_pipe()")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211213084829.114772-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 13 Dec 2021 08:48:28 +0000 (10:48 +0200)]
perf inject: Fix segfault due to close without open
The fixed commit attempts to close inject.output even if it was never
opened e.g.
$ perf record uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
$ perf inject -i perf.data --vm-time-correlation=dry-run
Segmentation fault (core dumped)
$ gdb --quiet perf
Reading symbols from perf...
(gdb) r inject -i perf.data --vm-time-correlation=dry-run
Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
48 iofclose.c: No such file or directory.
(gdb) bt
#0 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
#1 0x0000557fc7b74f92 in perf_data__close (data=data@entry=0x7ffcdafa6578) at util/data.c:376
#2 0x0000557fc7a6b807 in cmd_inject (argc=<optimized out>, argv=<optimized out>) at builtin-inject.c:1085
#3 0x0000557fc7ac4783 in run_builtin (p=0x557fc8074878 <commands+600>, argc=4, argv=0x7ffcdafb6a60) at perf.c:313
#4 0x0000557fc7a25d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
#5 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
#6 main (argc=4, argv=0x7ffcdafb6a60) at perf.c:539
(gdb)
Fixes:
02e6246f5364d526 ("perf inject: Close inject.output on exit")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211213084829.114772-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Miaoqian Lin [Sun, 12 Dec 2021 06:25:02 +0000 (06:25 +0000)]
perf expr: Fix missing check for return value of hashmap__new()
The hashmap__new() function may return ERR_PTR(-ENOMEM) when malloc()
fails, add IS_ERR() checking for ctx->ids.
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20211212062504.25841-1-linmq006@gmail.com
[ s/kfree()/free()/ and add missing linux/err.h include ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Zqiang [Fri, 17 Dec 2021 07:42:07 +0000 (15:42 +0800)]
locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
Optimistic spinning needs to be terminated when the spinning waiter is not
longer the top waiter on the lock, but the condition is negated. It
terminates if the waiter is the top waiter, which is defeating the whole
purpose.
Fixes:
c3123c431447 ("locking/rtmutex: Dont dereference waiter lockless")
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211217074207.77425-1-qiang1.zhang@intel.com
Linus Torvalds [Sat, 18 Dec 2021 01:24:53 +0000 (17:24 -0800)]
Merge tag 'libata-5.16-rc6' of git://git./linux/kernel/git/dlemoal/libata
Pull libata fix from Damien Le Moal:
"A single fix for this cycle:
- Check that ATA16 passthrough commands that do not transfer any data
have a DMA direction set to DMA_NONE (From George)"
* tag 'libata-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
libata: if T_LENGTH is zero, dma direction should be DMA_NONE
Linus Torvalds [Sat, 18 Dec 2021 01:19:51 +0000 (17:19 -0800)]
Merge tag 'zonefs-5.16-rc6' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs fixes from Damien Le Moal:
"One fix and one trivial update for rc6:
- Add MODULE_ALIAS_FS to get automatic module loading on mount
(Naohiro)
- Update Damien's email address in the MAINTAINERS file (me)"
* tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
MAITAINERS: Change zonefs maintainer email address
zonefs: add MODULE_ALIAS_FS
Thiago Rafael Becker [Fri, 17 Dec 2021 18:20:22 +0000 (15:20 -0300)]
cifs: sanitize multiple delimiters in prepath
mount.cifs can pass a device with multiple delimiters in it. This will
cause rename(2) to fail with ENOENT.
V2:
- Make sanitize_path more readable.
- Fix multiple delimiters between UNC and prepath.
- Avoid a memory leak if a bad user starts putting a lot of delimiters
in the path on purpose.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2031200
Fixes:
24e0a1eff9e2 ("cifs: switch to new mount api")
Cc: stable@vger.kernel.org # 5.11+
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Shyam Prasad N [Wed, 8 Dec 2021 16:33:19 +0000 (16:33 +0000)]
cifs: ignore resource_id while getting fscache super cookie
We have a cyclic dependency between fscache super cookie
and root inode cookie. The super cookie relies on
tcon->resource_id, which gets populated from the root inode
number. However, fetching the root inode initializes inode
cookie as a child of super cookie, which is yet to be populated.
resource_id is only used as auxdata to check the validity of
super cookie. We can completely avoid setting resource_id to
remove the circular dependency. Since vol creation time and
vol serial numbers are used for auxdata, we should be fine.
Additionally, there will be auxiliary data check for each
inode cookie as well.
Fixes:
5bf91ef03d98 ("cifs: wait for tcon resource_id before getting fscache super")
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Yu Liao [Mon, 13 Dec 2021 13:57:27 +0000 (21:57 +0800)]
timekeeping: Really make sure wall_to_monotonic isn't positive
Even after commit
e1d7ba873555 ("time: Always make sure wall_to_monotonic
isn't positive") it is still possible to make wall_to_monotonic positive
by running the following code:
int main(void)
{
struct timespec time;
clock_gettime(CLOCK_MONOTONIC, &time);
time.tv_nsec = 0;
clock_settime(CLOCK_REALTIME, &time);
return 0;
}
The reason is that the second parameter of timespec64_compare(), ts_delta,
may be unnormalized because the delta is calculated with an open coded
substraction which causes the comparison of tv_sec to yield the wrong
result:
wall_to_monotonic = { .tv_sec = -10, .tv_nsec =
900000000 }
ts_delta = { .tv_sec = -9, .tv_nsec = -
900000000 }
That makes timespec64_compare() claim that wall_to_monotonic < ts_delta,
but actually the result should be wall_to_monotonic > ts_delta.
After normalization, the result of timespec64_compare() is correct because
the tv_sec comparison is not longer misleading:
wall_to_monotonic = { .tv_sec = -10, .tv_nsec =
900000000 }
ts_delta = { .tv_sec = -10, .tv_nsec =
100000000 }
Use timespec64_sub() to ensure that ts_delta is normalized, which fixes the
issue.
Fixes:
e1d7ba873555 ("time: Always make sure wall_to_monotonic isn't positive")
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211213135727.1656662-1-liaoyu15@huawei.com
Linus Torvalds [Fri, 17 Dec 2021 21:55:03 +0000 (13:55 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One driver fix: the pm8001 has never actually worked on a system with
an IOMMU and this fixes that use case"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: pm8001: Fix phys_to_virt() usage on dma_addr_t
Linus Torvalds [Fri, 17 Dec 2021 21:50:58 +0000 (13:50 -0800)]
Merge tag 'for-5.16-rc5-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes, almost all error handling one-liners and for stable.
- regression fix in directory logging items
- regression fix of extent buffer status bits handling after an error
- fix memory leak in error handling path in tree-log
- fix freeing invalid anon device number when handling errors during
subvolume creation
- fix warning when freeing leaf after subvolume creation failure
- fix missing blkdev put in device scan error handling
- fix invalid delayed ref after subvolume creation failure"
* tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix missing blkdev_put() call in btrfs_scan_one_device()
btrfs: fix warning when freeing leaf after subvolume creation failure
btrfs: fix invalid delayed ref after subvolume creation failure
btrfs: check WRITE_ERR when trying to read an extent buffer
btrfs: fix missing last dir item offset update when logging directory
btrfs: fix double free of anon_dev after failure to create subvolume
btrfs: fix memory leak in __add_inode_ref()
Thadeu Lima de Souza Cascardo [Fri, 17 Dec 2021 15:44:10 +0000 (12:44 -0300)]
ipmi: fix initialization when workqueue allocation fails
If the workqueue allocation fails, the driver is marked as not initialized,
and timer and panic_notifier will be left registered.
Instead of removing those when workqueue allocation fails, do the workqueue
initialization before doing it, and cleanup srcu_struct if it fails.
Fixes:
1d49eb91e86e ("ipmi: Move remove_work to dedicated workqueue")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Corey Minyard <cminyard@mvista.com>
Cc: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Cc: stable@vger.kernel.org
Message-Id: <
20211217154410.1228673-2-cascardo@canonical.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Thadeu Lima de Souza Cascardo [Fri, 17 Dec 2021 15:44:09 +0000 (12:44 -0300)]
ipmi: bail out if init_srcu_struct fails
In case, init_srcu_struct fails (because of memory allocation failure), we
might proceed with the driver initialization despite srcu_struct not being
entirely initialized.
Fixes:
913a89f009d9 ("ipmi: Don't initialize anything in the core until something uses it")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org
Message-Id: <
20211217154410.1228673-1-cascardo@canonical.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Linus Torvalds [Fri, 17 Dec 2021 20:03:14 +0000 (12:03 -0800)]
Merge tag 'selinux-pr-
20211217' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"Another small SELinux fix for v5.16 to ensure that we don't block on
memory allocations while holding a spinlock.
This passes all our tests without problem"
* tag 'selinux-pr-
20211217' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix sleeping function called from invalid context
Linus Torvalds [Fri, 17 Dec 2021 19:55:35 +0000 (11:55 -0800)]
Merge tag 'riscv-for-linus-5.16-rc6' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A handful of DT updates for the SiFive HiFive Unmatched, that fix the
regulator handling. These should stop some warning spew.
- A pair of fixes for both the SiFive Hifive Unleashed and Unmatched,
that correctly hook up the MMC card detect signal.
* tag 'riscv-for-linus-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: sifive unmatched: Link the tmp451 with its power supply
riscv: dts: sifive unmatched: Fix regulator for board rev3
riscv: dts: sifive unmatched: Expose the PMIC sub-functions
riscv: dts: sifive unmatched: Expose the board ID eeprom
riscv: dts: sifive unmatched: Name gpio lines
riscv: dts: unmatched: Add gpio card detect to mmc-spi-slot
riscv: dts: unleashed: Add gpio card detect to mmc-spi-slot
Linus Torvalds [Fri, 17 Dec 2021 19:46:07 +0000 (11:46 -0800)]
Merge tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Fix for hammering on the delayed run queue timer (me)
- bcache regression fix for this merge window (Lin)
- Fix a divide-by-zero in the blk-iocost code (Tejun)
* tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block:
bcache: fix NULL pointer reference in cached_dev_detach_finish
block: reduce kblockd_mod_delayed_work_on() CPU consumption
iocost: Fix divide-by-zero on donation from low hweight cgroup
Linus Torvalds [Fri, 17 Dec 2021 19:31:46 +0000 (11:31 -0800)]
Merge tag 'io_uring-5.16-2021-12-17' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"Just a single fix, fixing an issue with the worker creation change
that was merged last week"
* tag 'io_uring-5.16-2021-12-17' of git://git.kernel.dk/linux-block:
io-wq: drop wqe lock before creating new worker
Linus Torvalds [Fri, 17 Dec 2021 19:25:50 +0000 (11:25 -0800)]
Merge tag 'dmaengine-fix-5.16' of git://git./linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"A bunch of driver fixes, notably:
- uninit variable fix for dw-axi-dmac driver
- return value check dw-edma driver
- calling wq quiesce inside spinlock and missed completion for idxd
driver
- mod alias fix for st_fdma driver"
* tag 'dmaengine-fix-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: st_fdma: fix MODULE_ALIAS
dmaengine: idxd: fix missed completion on abort path
dmaengine: ti: k3-udma: Fix smatch warnings
dmaengine: idxd: fix calling wq quiesce inside spinlock
dmaengine: dw-edma: Fix return value check for dma_set_mask_and_coherent()
dmaengine: dw-axi-dmac: Fix uninitialized variable in axi_chan_block_xfer_start()
Linus Torvalds [Fri, 17 Dec 2021 19:07:13 +0000 (11:07 -0800)]
Merge tag 'drm-fixes-2021-12-17-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Mostly amdgpu fixes this week scattered around the driver, otherwise
one i915, one ast, one simpledrm. There is a revert in the fb-helper
for places userspace was using a string that we tried to change.
i915:
- Fix a bound check in the DMC fw load.
ast:
- NULL ptr deref fix
simpledrm:
- pixel clock units fix
fb-helper:
- userspace regression revert
amdgpu:
- Fix RLC register offset
- GMC fix
- Properly cache SMU FW version on Yellow Carp
- Fix missing callback on DCN3.1
- Reset DMCUB before HW init
- Fix for GMC powergating on PCO
- Fix a possible memory leak in GPU metrics table handling on RN"
* tag 'drm-fixes-2021-12-17-1' of git://anongit.freedesktop.org/drm/drm:
drm/amd/pm: fix a potential gpu_metrics_table memory leak
drm/amdgpu: correct the wrong cached state for GMC on PICASSO
drm/amd/display: Reset DMCUB before HW init
drm/amd/display: Set exit_optimized_pwr_state for DCN31
drm/amd/pm: fix reading SMU FW version from amdgpu_firmware_info on YC
drm/amdgpu: don't override default ECO_BITs setting
drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORE
drm/i915/display: Fix an unsigned subtraction which can never be negative.
drm/ast: potential dereference of null pointer
drm: simpledrm: fix wrong unit with pixel clock
Revert "drm/fb-helper: improve DRM fbdev emulation device names"
Rafael J. Wysocki [Thu, 16 Dec 2021 19:30:18 +0000 (20:30 +0100)]
PM: sleep: Fix error handling in dpm_prepare()
Commit
2aa36604e824 ("PM: sleep: Avoid calling put_device() under
dpm_list_mtx") forgot to update the while () loop termination
condition to also break the loop if error is nonzero, which
causes the loop to become infinite if device_prepare() returns
an error for one device.
Add the missing !error check.
Fixes:
2aa36604e824 ("PM: sleep: Avoid calling put_device() under dpm_list_mtx")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Greg Kroah-Hartman [Fri, 17 Dec 2021 15:24:30 +0000 (16:24 +0100)]
Revert "usb: early: convert to readl_poll_timeout_atomic()"
This reverts commit
796eed4b2342c9d6b26c958e92af91253a2390e1.
This change causes boot lockups when using "arlyprintk=xdbc" because
ktime can not be used at this point in time in the boot process. Also,
it is not needed for very small delays like this.
Reported-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Chunfeng Yun <chunfeng.yun@mediatek.com>
Fixes:
796eed4b2342 ("usb: early: convert to readl_poll_timeout_atomic()")
Link: https://lore.kernel.org/r/c2b5c9bb-1b75-bf56-3754-b5b18812d65e@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 17 Dec 2021 09:37:02 +0000 (10:37 +0100)]
Merge tag 'usb-serial-5.16-rc6' of https://git./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 5.16-rc6
Here's a fix for a reported problem in the cp210x gpio-registration code
and some more modem device ids.
All have been in linux-next with no reported issues.
* tag 'usb-serial-5.16-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit FN990 compositions
USB: serial: cp210x: fix CP2105 GPIO registration
Damien Le Moal [Fri, 17 Dec 2021 07:41:17 +0000 (16:41 +0900)]
MAITAINERS: Change zonefs maintainer email address
Update my email address from damien.lemoal@wdc.com to
damien.lemoal@opensource.wdc.com.
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Naohiro Aota [Fri, 17 Dec 2021 06:15:45 +0000 (15:15 +0900)]
zonefs: add MODULE_ALIAS_FS
Add MODULE_ALIAS_FS() to load the module automatically when you do "mount
-t zonefs".
Fixes:
8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable <stable@vger.kernel.org> # 5.6+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:42 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Link the tmp451 with its power supply
Fixes the following probe warning:
lm90 0-004c: Looking up vcc-supply from device tree
lm90 0-004c: Looking up vcc-supply property in node /soc/i2c@
10030000/temperature-sensor@4c failed
lm90 0-004c: supply vcc not found, using dummy regulator
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:41 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Fix regulator for board rev3
The existing values are rejected by the da9063 regulator driver, as they
are unachievable with the declared chip setup (non-merged vcore and bmem
are unable to provide the declared curent).
Fix voltages to match rev3 schematics, which also matches their boot-up
configuration within the chip's available precision.
Declare bcore1/bcore2 and bmem/bio as merged.
Set ldo09 and ldo10 as always-on as their consumers are not declared but
exist.
Drop ldo current limits as there is no current limit feature for these
regulators in the DA9063. Fixes warnings like:
DA9063_LDO3: Operation of current configuration missing
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:39 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Expose the PMIC sub-functions
These sub-functions are available in the chip revision on this board, so
expose them.
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:38 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Expose the board ID eeprom
Mark it as read-only as it is factory-programmed with identifying
information, and no executable nor configuration:
- eth MAC address
- board model (PCB version, BoM version)
- board serial number
Accidental modification would cause misidentification which could brick
the board, so marking read-only seem like both a safe and non-constraining
choice.
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Vincent Pelletier [Tue, 16 Nov 2021 23:57:37 +0000 (23:57 +0000)]
riscv: dts: sifive unmatched: Name gpio lines
Follow the pin descriptions given in the version 3 of the board schematics.
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Dave Airlie [Fri, 17 Dec 2021 05:01:00 +0000 (15:01 +1000)]
Merge tag 'amd-drm-fixes-5.16-2021-12-15' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.16-2021-12-15:
amdgpu:
- Fix RLC register offset
- GMC fix
- Properly cache SMU FW version on Yellow Carp
- Fix missing callback on DCN3.1
- Reset DMCUB before HW init
- Fix for GMC powergating on PCO
- Fix a possible memory leak in GPU metrics table handling on RN
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216035239.5787-1-alexander.deucher@amd.com
Dave Airlie [Fri, 17 Dec 2021 04:16:57 +0000 (14:16 +1000)]
Merge tag 'drm-misc-fixes-2021-12-16-1' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes
One null pointer dereference fix for ast, a pixel clock unit fix for
simpledrm and a user-space regression revert for fb-helper
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216082603.pm6yzlckmxvwnqyv@houat
Giovanni Cabiddu [Wed, 17 Nov 2021 14:30:34 +0000 (14:30 +0000)]
crypto: qat - do not handle PFVF sources for qat_4xxx
The QAT driver does not have support for PFVF interrupts for GEN4
devices, therefore report the vf2pf sources as 0.
This prevents a NULL pointer dereference in the function
adf_msix_isr_ae() if the device triggers a spurious interrupt.
Fixes:
993161d36ab5 ("crypto: qat - fix handling of VF to PF interrupts")
Reported-by: Adam Guerin <adam.guerin@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
George Kennedy [Tue, 14 Dec 2021 14:45:10 +0000 (09:45 -0500)]
libata: if T_LENGTH is zero, dma direction should be DMA_NONE
Avoid data corruption by rejecting pass-through commands where
T_LENGTH is zero (No data is transferred) and the dma direction
is not DMA_NONE.
Cc: <stable@vger.kernel.org>
Reported-by: syzkaller<syzkaller@googlegroups.com>
Signed-off-by: George Kennedy<george.kennedy@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Linus Torvalds [Thu, 16 Dec 2021 23:24:46 +0000 (15:24 -0800)]
Merge tag 'audit-pr-
20211216' of git://git./linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A single patch to fix a problem where the audit queue could grow
unbounded when the audit daemon is forcibly stopped"
* tag 'audit-pr-
20211216' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: improve robustness of the audit queue handling
Linus Torvalds [Thu, 16 Dec 2021 23:02:14 +0000 (15:02 -0800)]
Merge tag 'net-5.16-rc6' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from mac80211, wifi, bpf.
Relatively large batches of fixes from BPF and the WiFi stack, calm in
general networking.
Current release - regressions:
- dpaa2-eth: fix buffer overrun when reporting ethtool statistics
Current release - new code bugs:
- bpf: fix incorrect state pruning for <8B spill/fill
- iavf:
- add missing unlocks in iavf_watchdog_task()
- do not override the adapter state in the watchdog task (again)
- mlxsw: spectrum_router: consolidate MAC profiles when possible
Previous releases - regressions:
- mac80211 fixes:
- rate control, avoid driver crash for retransmitted frames
- regression in SSN handling of addba tx
- a memory leak where sta_info is not freed
- marking TX-during-stop for TX in in_reconfig, prevent stall
- cfg80211: acquire wiphy mutex on regulatory work
- wifi drivers: fix build regressions and LED config dependency
- virtio_net: fix rx_drops stat for small pkts
- dsa: mv88e6xxx: unforce speed & duplex in mac_link_down()
Previous releases - always broken:
- bpf fixes:
- kernel address leakage in atomic fetch
- kernel address leakage in atomic cmpxchg's r0 aux reg
- signed bounds propagation after mov32
- extable fixup offset
- extable address check
- mac80211:
- fix the size used for building probe request
- send ADDBA requests using the tid/queue of the aggregation
session
- agg-tx: don't schedule_and_wake_txq() under sta->lock, avoid
deadlocks
- validate extended element ID is present
- mptcp:
- never allow the PM to close a listener subflow (null-defer)
- clear 'kern' flag from fallback sockets, prevent crash
- fix deadlock in __mptcp_push_pending()
- inet_diag: fix kernel-infoleak for UDP sockets
- xsk: do not sleep in poll() when need_wakeup set
- smc: avoid very long waits in smc_release()
- sch_ets: don't remove idle classes from the round-robin list
- netdevsim:
- zero-initialize memory for bpf map's value, prevent info leak
- don't let user space overwrite read only (max) ethtool parms
- ixgbe: set X550 MDIO speed before talking to PHY
- stmmac:
- fix null-deref in flower deletion w/ VLAN prio Rx steering
- dwmac-rk: fix oob read in rk_gmac_setup
- ice: time stamping fixes
- systemport: add global locking for descriptor life cycle"
* tag 'net-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (89 commits)
bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
selftest/bpf: Add a test that reads various addresses.
bpf: Fix extable address check.
bpf: Fix extable fixup offset.
bpf, selftests: Add test case trying to taint map value pointer
bpf: Make 32->64 bounds propagation slightly more robust
bpf: Fix signed bounds propagation after mov32
sit: do not call ipip6_dev_free() from sit_init_net()
net: systemport: Add global locking for descriptor lifecycle
net/smc: Prevent smc_release() from long blocking
net: Fix double 0x prefix print in SKB dump
virtio_net: fix rx_drops stat for small pkts
dsa: mv88e6xxx: fix debug print for SPEED_UNFORCED
sfc_ef100: potential dereference of null pointer
net: stmmac: dwmac-rk: fix oob read in rk_gmac_setup
net: usb: lan78xx: add Allied Telesis AT29M2-AF
net/packet: rx_owner_map depends on pg_vec
netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
dpaa2-eth: fix ethtool statistics
ixgbe: set X550 MDIO speed before talking to PHY
...
Linus Torvalds [Thu, 16 Dec 2021 22:48:57 +0000 (14:48 -0800)]
Merge tag 'soc-fixes-5.16-3' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There are a number of DT fixes, mostly for mistakes found through
static checking of the dts files again, as well as a couple of minor
changes to address incorrect DT settings.
For i.MX, there is yet another series of devitree changes to update
RGMII delay settings for ethernet, which is an ongoing problem after
some driver changes.
For SoC specific device drivers, a number of smaller fixes came up:
- i.MX SoC identification was incorrectly registered non-i.MX
machines when the driver is built-in
- One fix on imx8m-blk-ctrl driver to get i.MX8MM MIPI reset work
properly
- a few compile fixes for warnings that get in the way of -Werror
- a string overflow in the scpi firmware driver
- a boot failure with FORTIFY_SOURCE on Rockchips machines
- broken error handling in the AMD TEE driver
- a revert for a tegra reset driver commit that broke HDA"
* tag 'soc-fixes-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (25 commits)
soc/tegra: fuse: Fix bitwise vs. logical OR warning
firmware: arm_scpi: Fix string overflow in SCPI genpd driver
soc: imx: Register SoC device only on i.MX boards
soc: imx: imx8m-blk-ctrl: Fix imx8mm mipi reset
ARM: dts: imx6ull-pinfunc: Fix CSI_DATA07__ESAI_TX0 pad name
arm64: dts: imx8mq: remove interconnect property from lcdif
ARM: socfpga: dts: fix qspi node compatible
arm64: dts: apple: add #interrupt-cells property to pinctrl nodes
dt-bindings: i2c: apple,i2c: allow multiple compatibles
arm64: meson: remove COMMON_CLK
arm64: meson: fix dts for JetHub D1
tee: amdtee: fix an IS_ERR() vs NULL bug
arm64: dts: apple: change ethernet0 device type to ethernet
arm64: dts: ten64: remove redundant interrupt declaration for gpio-keys
arm64: dts: rockchip: fix poweroff on helios64
arm64: dts: rockchip: fix audio-supply for Rock Pi 4
arm64: dts: rockchip: fix rk3399-leez-p710 vcc3v3-lan supply
arm64: dts: rockchip: fix rk3308-roc-cc vcc-sd supply
arm64: dts: rockchip: remove mmc-hs400-enhanced-strobe from rk3399-khadas-edge
ARM: rockchip: Use memcpy_toio instead of memcpy on smp bring-up
...
Scott Mayhew [Wed, 15 Dec 2021 21:28:40 +0000 (16:28 -0500)]
selinux: fix sleeping function called from invalid context
selinux_sb_mnt_opts_compat() is called via sget_fc() under the sb_lock
spinlock, so it can't use GFP_KERNEL allocations:
[ 868.565200] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:230
[ 868.568246] in_atomic(): 1, irqs_disabled(): 0,
non_block: 0, pid: 4914, name: mount.nfs
[ 868.569626] preempt_count: 1, expected: 0
[ 868.570215] RCU nest depth: 0, expected: 0
[ 868.570809] Preemption disabled at:
[ 868.570810] [<
0000000000000000>] 0x0
[ 868.571848] CPU: 1 PID: 4914 Comm: mount.nfs Kdump: loaded
Tainted: G W 5.16.0-rc5.
2585cf9dfa #1
[ 868.573273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.14.0-4.fc34 04/01/2014
[ 868.574478] Call Trace:
[ 868.574844] <TASK>
[ 868.575156] dump_stack_lvl+0x34/0x44
[ 868.575692] __might_resched.cold+0xd6/0x10f
[ 868.576308] slab_pre_alloc_hook.constprop.0+0x89/0xf0
[ 868.577046] __kmalloc_track_caller+0x72/0x420
[ 868.577684] ? security_context_to_sid_core+0x48/0x2b0
[ 868.578569] kmemdup_nul+0x22/0x50
[ 868.579108] security_context_to_sid_core+0x48/0x2b0
[ 868.579854] ? _nfs4_proc_pathconf+0xff/0x110 [nfsv4]
[ 868.580742] ? nfs_reconfigure+0x80/0x80 [nfs]
[ 868.581355] security_context_str_to_sid+0x36/0x40
[ 868.581960] selinux_sb_mnt_opts_compat+0xb5/0x1e0
[ 868.582550] ? nfs_reconfigure+0x80/0x80 [nfs]
[ 868.583098] security_sb_mnt_opts_compat+0x2a/0x40
[ 868.583676] nfs_compare_super+0x113/0x220 [nfs]
[ 868.584249] ? nfs_try_mount_request+0x210/0x210 [nfs]
[ 868.584879] sget_fc+0xb5/0x2f0
[ 868.585267] nfs_get_tree_common+0x91/0x4a0 [nfs]
[ 868.585834] vfs_get_tree+0x25/0xb0
[ 868.586241] fc_mount+0xe/0x30
[ 868.586605] do_nfs4_mount+0x130/0x380 [nfsv4]
[ 868.587160] nfs4_try_get_tree+0x47/0xb0 [nfsv4]
[ 868.587724] vfs_get_tree+0x25/0xb0
[ 868.588193] do_new_mount+0x176/0x310
[ 868.588782] __x64_sys_mount+0x103/0x140
[ 868.589388] do_syscall_64+0x3b/0x90
[ 868.589935] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 868.590699] RIP: 0033:0x7f2b371c6c4e
[ 868.591239] Code: 48 8b 0d dd 71 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e
0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00
00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa 71
0e 00 f7 d8 64 89 01 48
[ 868.593810] RSP: 002b:
00007ffc83775d88 EFLAGS:
00000246
ORIG_RAX:
00000000000000a5
[ 868.594691] RAX:
ffffffffffffffda RBX:
00007ffc83775f10 RCX:
00007f2b371c6c4e
[ 868.595504] RDX:
0000555d517247a0 RSI:
0000555d51724700 RDI:
0000555d51724540
[ 868.596317] RBP:
00007ffc83775f10 R08:
0000555d51726890 R09:
0000555d51726890
[ 868.597162] R10:
0000000000000000 R11:
0000000000000246 R12:
0000555d51726890
[ 868.598005] R13:
0000000000000003 R14:
0000555d517246e0 R15:
0000555d511ac925
[ 868.598826] </TASK>
Cc: stable@vger.kernel.org
Fixes:
69c4a42d72eb ("lsm,selinux: add new hook to compare new mount to an existing mount")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
[PM: cleanup/line-wrap the backtrace]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Jakub Kicinski [Thu, 16 Dec 2021 21:06:49 +0000 (13:06 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2021-12-16
We've added 15 non-merge commits during the last 7 day(s) which contain
a total of 12 files changed, 434 insertions(+), 30 deletions(-).
The main changes are:
1) Fix incorrect verifier state pruning behavior for <8B register spill/fill,
from Paul Chaignon.
2) Fix x86-64 JIT's extable handling for fentry/fexit when return pointer
is an ERR_PTR(), from Alexei Starovoitov.
3) Fix 3 different possibilities that BPF verifier missed where unprivileged
could leak kernel addresses, from Daniel Borkmann.
4) Fix xsk's poll behavior under need_wakeup flag, from Magnus Karlsson.
5) Fix an oob-write in test_verifier due to a missed MAX_NR_MAPS bump,
from Kumar Kartikeya Dwivedi.
6) Fix a race in test_btf_skc_cls_ingress selftest, from Martin KaFai Lau.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
selftest/bpf: Add a test that reads various addresses.
bpf: Fix extable address check.
bpf: Fix extable fixup offset.
bpf, selftests: Add test case trying to taint map value pointer
bpf: Make 32->64 bounds propagation slightly more robust
bpf: Fix signed bounds propagation after mov32
bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer
bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg
bpf, selftests: Add test case for atomic fetch on spilled pointer
bpf: Fix kernel address leakage in atomic fetch
selftests/bpf: Fix OOB write in test_verifier
xsk: Do not sleep in poll() when need_wakeup set
selftests/bpf: Tests for state pruning with u32 spill/fill
bpf: Fix incorrect state pruning for <8B spill/fill
====================
Link: https://lore.kernel.org/r/20211216210005.13815-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin KaFai Lau [Thu, 16 Dec 2021 19:16:30 +0000 (11:16 -0800)]
bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
The libbpf CI reported occasional failure in btf_skc_cls_ingress:
test_syncookie:FAIL:Unexpected syncookie states gen_cookie:
80326634 recv_cookie:0
bpf prog error at line 97
"error at line 97" means the bpf prog cannot find the listening socket
when the final ack is received. It then skipped processing
the syncookie in the final ack which then led to "recv_cookie:0".
The problem is the userspace program did not do accept() and went
ahead to close(listen_fd) before the kernel (and the bpf prog) had
a chance to process the final ack.
The fix is to add accept() call so that the userspace will wait for
the kernel to finish processing the final ack first before close()-ing
everything.
Fixes:
9a856cae2217 ("bpf: selftest: Add test_btf_skc_cls_ingress")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211216191630.466151-1-kafai@fb.com
Alexei Starovoitov [Wed, 15 Dec 2021 20:35:34 +0000 (12:35 -0800)]
selftest/bpf: Add a test that reads various addresses.
Add a function to bpf_testmod that returns invalid kernel and user addresses.
Then attach an fexit program to that function that tries to read
memory through these addresses.
This logic checks that bpf_probe_read_kernel and BPF_PROBE_MEM logic is sane.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov [Wed, 15 Dec 2021 03:25:13 +0000 (19:25 -0800)]
bpf: Fix extable address check.
The verifier checks that PTR_TO_BTF_ID pointer is either valid or NULL,
but it cannot distinguish IS_ERR pointer from valid one.
When offset is added to IS_ERR pointer it may become small positive
value which is a user address that is not handled by extable logic
and has to be checked for at the runtime.
Tighten BPF_PROBE_MEM pointer check code to prevent this case.
Fixes:
4c5de127598e ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
Reported-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov [Thu, 16 Dec 2021 02:38:30 +0000 (18:38 -0800)]
bpf: Fix extable fixup offset.
The prog - start_of_ldx is the offset before the faulting ldx to the location
after it, so this will be used to adjust pt_regs->ip for jumping over it and
continuing, and with old temp it would have been fixed up to the wrong offset,
causing crash.
Fixes:
4c5de127598e ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Linus Torvalds [Thu, 16 Dec 2021 19:48:59 +0000 (11:48 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"A single fix for the clk framework that needed some more bake time in
linux-next.
The problem is that two clks being registered at the same time can
lead to a busted clk tree if the parent isn't fully registered by the
time the child finds the parent. We rejigger the place where we mark
the parent as fully registered so that the child can't find the parent
until things are proper"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Don't parent clks until the parent is fully registered
Daniel Borkmann [Wed, 15 Dec 2021 23:48:54 +0000 (23:48 +0000)]
bpf, selftests: Add test case trying to taint map value pointer
Add a test case which tries to taint map value pointer arithmetic into a
unknown scalar with subsequent export through the map.
Before fix:
# ./test_verifier 1186
#1186/u map access: trying to leak tained dst reg FAIL
Unexpected success to load!
verification time 24 usec
stack depth 8
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
#1186/p map access: trying to leak tained dst reg FAIL
Unexpected success to load!
verification time 8 usec
stack depth 8
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
Summary: 0 PASSED, 0 SKIPPED, 2 FAILED
After fix:
# ./test_verifier 1186
#1186/u map access: trying to leak tained dst reg OK
#1186/p map access: trying to leak tained dst reg OK
Summary: 2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>