platform/kernel/linux-starfive.git
21 months agoinit/Kconfig: fix typo (usafe -> unsafe)
Lizzy Fleckenstein [Mon, 9 Jan 2023 20:18:37 +0000 (21:18 +0100)]
init/Kconfig: fix typo (usafe -> unsafe)

Fix the help text for the PRINTK_SAFE_LOG_BUF_SHIFT setting.

Link: https://lkml.kernel.org/r/20230109201837.23873-1-eliasfleckenstein@web.de
Signed-off-by: Lizzy Fleckenstein <eliasfleckenstein@web.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agonommu: fix split_vma() map_count error
Liam Howlett [Mon, 9 Jan 2023 20:58:20 +0000 (20:58 +0000)]
nommu: fix split_vma() map_count error

During the maple tree conversion of nommu, an error in counting the VMAs
was introduced by counting the existing VMA again.  The counting used to
be decremented by one and incremented by two, but now it only increments
by two.  Fix the counting error by moving the increment outside the
setup_vma_to_mm() function to the callers.

Link: https://lkml.kernel.org/r/20230109205809.956325-1-Liam.Howlett@oracle.com
Fixes: 8220543df148 ("nommu: remove uses of VMA linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agonommu: fix do_munmap() error path
Liam Howlett [Mon, 9 Jan 2023 20:57:21 +0000 (20:57 +0000)]
nommu: fix do_munmap() error path

When removing a VMA from the tree fails due to no memory, do not free the
VMA since a reference still exists.

Link: https://lkml.kernel.org/r/20230109205708.956103-1-Liam.Howlett@oracle.com
Fixes: 8220543df148 ("nommu: remove uses of VMA linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agonommu: fix memory leak in do_mmap() error path
Liam Howlett [Mon, 9 Jan 2023 20:55:21 +0000 (20:55 +0000)]
nommu: fix memory leak in do_mmap() error path

The preallocation of the maple tree nodes may leak if the error path to
"error_just_free" is taken.  Fix this by moving the freeing of the maple
tree nodes to a shared location for all error paths.

Link: https://lkml.kernel.org/r/20230109205507.955577-1-Liam.Howlett@oracle.com
Fixes: 8220543df148 ("nommu: remove uses of VMA linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agoMAINTAINERS: update Robert Foss' email address
Robert Foss [Fri, 6 Jan 2023 15:21:51 +0000 (16:21 +0100)]
MAINTAINERS: update Robert Foss' email address

Update the email address for Robert's maintainer entries and fill in
.mailmap accordingly.

Link: https://lkml.kernel.org/r/20230106152151.115648-1-robert.foss@linaro.org
Signed-off-by: Robert Foss <rfoss@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Vasily Averin <vasily.averin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agoproc: fix PIE proc-empty-vm, proc-pid-vm tests
Alexey Dobriyan [Fri, 6 Jan 2023 19:30:14 +0000 (22:30 +0300)]
proc: fix PIE proc-empty-vm, proc-pid-vm tests

vsyscall detection code uses direct call to the beginning of
the vsyscall page:

asm ("call %P0" :: "i" (0xffffffffff600000))

It generates "call rel32" instruction but it is not relocated if binary
is PIE, so binary segfaults into random userspace address and vsyscall
page status is detected incorrectly.

Do more direct:

asm ("call *%rax")

which doesn't do need any relocaltions.

Mark g_vsyscall as volatile for a good measure, I didn't find instruction
setting it to 0. Now the code is obviously correct:

xor eax, eax
mov rdi, rbp
mov rsi, rbp
mov DWORD PTR [rip+0x2d15], eax      # g_vsyscall = 0
mov rax, 0xffffffffff600000
call rax
mov DWORD PTR [rip+0x2d02], 1        # g_vsyscall = 1
mov eax, DWORD PTR ds:0xffffffffff600000
mov DWORD PTR [rip+0x2cf1], 2        # g_vsyscall = 2
mov edi, [rip+0x2ceb]                # exit(g_vsyscall)
call exit

Note: fixed proc-empty-vm test oopses 5.19.0-28-generic kernel
but this is separate story.

Link: https://lkml.kernel.org/r/Y7h2xvzKLg36DSq8@p183
Fixes: 5bc73bb3451b9 ("proc: test how it holds up with mapping'less process")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agomm: update mmap_sem comments to refer to mmap_lock
Lorenzo Stoakes [Sat, 7 Jan 2023 00:00:05 +0000 (00:00 +0000)]
mm: update mmap_sem comments to refer to mmap_lock

The rename from mm->mmap_sem to mm->mmap_lock was performed in commit
da1c55f1b272 ("mmap locking API: rename mmap_sem to mmap_lock") and commit
c1e8d7c6a7a6 ("map locking API: convert mmap_sem comments"), however some
incorrect comments remain.

This patch simply corrects those comments which are obviously incorrect
within mm itself.

Link: https://lkml.kernel.org/r/33fba04389ab63fc4980e7ba5442f521df6dc657.1673048927.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agoinclude/linux/mm: fix release_pages_arg kernel doc comment
SeongJae Park [Fri, 6 Jan 2023 20:33:31 +0000 (20:33 +0000)]
include/linux/mm: fix release_pages_arg kernel doc comment

Commit 449c796768c9 ("mm: teach release_pages() to take an array of
encoded page pointers too") added the kernel doc comment for
release_pages() on top of 'union release_pages_arg', so making 'make
htmldocs' complains as below:

    ./include/linux/mm.h:1268: warning: cannot understand function prototype: 'typedef union '

The kernel doc comment for the function is already on top of the
function's definition in mm/swap.c, and the new comment is actually not
for the function but indeed release_pages_arg.  Fixing the comment to
reflect the intent would be one option.  But, kernel doc cannot parse
the union as below due to the attribute.

    ./include/linux/mm.h:1272: error: Cannot parse struct or union!

Modify the comment to reflect the intent but do not mark it as a kernel
doc comment.

Link: https://lkml.kernel.org/r/20230106203331.127532-1-sj@kernel.org
Fixes: 449c796768c9 ("mm: teach release_pages() to take an array of encoded page pointers too")
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agolib/win_minmax: use /* notation for regular comments
Randy Dunlap [Mon, 2 Jan 2023 21:16:14 +0000 (13:16 -0800)]
lib/win_minmax: use /* notation for regular comments

Don't use kernel-doc "/**" notation for non-kernel-doc comments.
Prevents a kernel-doc warning:

lib/win_minmax.c:31: warning: expecting prototype for lib/minmax.c(). Prototype was for minmax_subwin_update() instead

Link: https://lkml.kernel.org/r/20230102211614.26343-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agokasan: mark kasan_kunit_executing as static
Andrey Konovalov [Wed, 4 Jan 2023 01:09:33 +0000 (02:09 +0100)]
kasan: mark kasan_kunit_executing as static

Mark kasan_kunit_executing as static, as it is only used within
mm/kasan/report.c.

Link: https://lkml.kernel.org/r/f64778a4683b16a73bba72576f73bf4a2b45a82f.1672794398.git.andreyknvl@google.com
Fixes: c8c7016f50c8 ("kasan: fail non-kasan KUnit tests on KASAN reports")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agonilfs2: fix general protection fault in nilfs_btree_insert()
Ryusuke Konishi [Thu, 5 Jan 2023 05:53:56 +0000 (14:53 +0900)]
nilfs2: fix general protection fault in nilfs_btree_insert()

If nilfs2 reads a corrupted disk image and tries to reads a b-tree node
block by calling __nilfs_btree_get_block() against an invalid virtual
block address, it returns -ENOENT because conversion of the virtual block
address to a disk block address fails.  However, this return value is the
same as the internal code that b-tree lookup routines return to indicate
that the block being searched does not exist, so functions that operate on
that b-tree may misbehave.

When nilfs_btree_insert() receives this spurious 'not found' code from
nilfs_btree_do_lookup(), it misunderstands that the 'not found' check was
successful and continues the insert operation using incomplete lookup path
data, causing the following crash:

 general protection fault, probably for non-canonical address
 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
 KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
 ...
 RIP: 0010:nilfs_btree_get_nonroot_node fs/nilfs2/btree.c:418 [inline]
 RIP: 0010:nilfs_btree_prepare_insert fs/nilfs2/btree.c:1077 [inline]
 RIP: 0010:nilfs_btree_insert+0x6d3/0x1c10 fs/nilfs2/btree.c:1238
 Code: bc 24 80 00 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 28 00 74 08 4c 89
 ff e8 4b 02 92 fe 4d 8b 3f 49 83 c7 28 4c 89 f8 48 c1 e8 03 <42> 80 3c
 28 00 74 08 4c 89 ff e8 2e 02 92 fe 4d 8b 3f 49 83 c7 02
 ...
 Call Trace:
 <TASK>
  nilfs_bmap_do_insert fs/nilfs2/bmap.c:121 [inline]
  nilfs_bmap_insert+0x20d/0x360 fs/nilfs2/bmap.c:147
  nilfs_get_block+0x414/0x8d0 fs/nilfs2/inode.c:101
  __block_write_begin_int+0x54c/0x1a80 fs/buffer.c:1991
  __block_write_begin fs/buffer.c:2041 [inline]
  block_write_begin+0x93/0x1e0 fs/buffer.c:2102
  nilfs_write_begin+0x9c/0x110 fs/nilfs2/inode.c:261
  generic_perform_write+0x2e4/0x5e0 mm/filemap.c:3772
  __generic_file_write_iter+0x176/0x400 mm/filemap.c:3900
  generic_file_write_iter+0xab/0x310 mm/filemap.c:3932
  call_write_iter include/linux/fs.h:2186 [inline]
  new_sync_write fs/read_write.c:491 [inline]
  vfs_write+0x7dc/0xc50 fs/read_write.c:584
  ksys_write+0x177/0x2a0 fs/read_write.c:637
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 ...
 </TASK>

This patch fixes the root cause of this problem by replacing the error
code that __nilfs_btree_get_block() returns on block address conversion
failure from -ENOENT to another internal code -EINVAL which means that the
b-tree metadata is corrupted.

By returning -EINVAL, it propagates without glitches, and for all relevant
b-tree operations, functions in the upper bmap layer output an error
message indicating corrupted b-tree metadata via
nilfs_bmap_convert_error(), and code -EIO will be eventually returned as
it should be.

Link: https://lkml.kernel.org/r/000000000000bd89e205f0e38355@google.com
Link: https://lkml.kernel.org/r/20230105055356.8811-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+ede796cecd5296353515@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agoDocs/admin-guide/mm/zswap: remove zsmalloc's lack of writeback warning
Nhat Pham [Fri, 6 Jan 2023 22:00:16 +0000 (14:00 -0800)]
Docs/admin-guide/mm/zswap: remove zsmalloc's lack of writeback warning

Writeback has been implemented for zsmalloc, so this warning no longer
holds.

Link: https://lkml.kernel.org/r/20230106220016.172303-1-nphamcs@gmail.com
Fixes: 9997bc017549a ("zsmalloc: implement writeback mechanism for zsmalloc")
Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agomm/hugetlb: pre-allocate pgtable pages for uffd wr-protects
Peter Xu [Wed, 4 Jan 2023 22:52:05 +0000 (17:52 -0500)]
mm/hugetlb: pre-allocate pgtable pages for uffd wr-protects

Userfaultfd-wp uses pte markers to mark wr-protected pages for both shmem
and hugetlb.  Shmem has pre-allocation ready for markers, but hugetlb path
was overlooked.

Doing so by calling huge_pte_alloc() if the initial pgtable walk fails to
find the huge ptep.  It's possible that huge_pte_alloc() can fail with
high memory pressure, in that case stop the loop immediately and fail
silently.  This is not the most ideal solution but it matches with what we
do with shmem meanwhile it avoids the splat in dmesg.

Link: https://lkml.kernel.org/r/20230104225207.1066932-2-peterx@redhat.com
Fixes: 60dfaad65aa9 ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: James Houghton <jthoughton@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agohugetlb: unshare some PMDs when splitting VMAs
James Houghton [Wed, 4 Jan 2023 23:19:10 +0000 (23:19 +0000)]
hugetlb: unshare some PMDs when splitting VMAs

PMD sharing can only be done in PUD_SIZE-aligned pieces of VMAs; however,
it is possible that HugeTLB VMAs are split without unsharing the PMDs
first.

Without this fix, it is possible to hit the uffd-wp-related WARN_ON_ONCE
in hugetlb_change_protection [1].  The key there is that
hugetlb_unshare_all_pmds will not attempt to unshare PMDs in
non-PUD_SIZE-aligned sections of the VMA.

It might seem ideal to unshare in hugetlb_vm_op_open, but we need to
unshare in both the new and old VMAs, so unsharing in hugetlb_vm_op_split
seems natural.

[1]: https://lore.kernel.org/linux-mm/CADrL8HVeOkj0QH5VZZbRzybNE8CG-tEGFshnA+bG9nMgcWtBSg@mail.gmail.com/

Link: https://lkml.kernel.org/r/20230104231910.1464197-1-jthoughton@google.com
Fixes: 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agomm: fix vma->anon_name memory leak for anonymous shmem VMAs
Suren Baghdasaryan [Thu, 5 Jan 2023 00:02:40 +0000 (16:02 -0800)]
mm: fix vma->anon_name memory leak for anonymous shmem VMAs

free_anon_vma_name() is missing a check for anonymous shmem VMA which
leads to a memory leak due to refcount not being dropped.  Fix this by
calling anon_vma_name_put() unconditionally.  It will free vma->anon_name
whenever it's non-NULL.

Link: https://lkml.kernel.org/r/20230105000241.1450843-1-surenb@google.com
Fixes: d09e8ca6cb93 ("mm: anonymous shared memory naming")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+91edf9178386a07d06a7@syzkaller.appspotmail.com
Cc: Hugh Dickins <hughd@google.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agomm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
Zach O'Keefe [Sat, 24 Dec 2022 08:20:35 +0000 (00:20 -0800)]
mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE

SHMEM_HUGE_DENY is for emergency use by the admin, to disable allocation
of shmem huge pages if, for example, a dangerous bug is found in their
usage: see "deny" in Documentation/mm/transhuge.rst.  An app using
madvise(,,MADV_COLLAPSE) should not be allowed to override it: restore its
precedence over shmem_huge_force.

Restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE.

Link: https://lkml.kernel.org/r/20221224082035.3197140-2-zokeefe@google.com
Fixes: 7c6c6cc4d3a2 ("mm/shmem: add flag to enforce shmem THP in hugepage_vma_check()")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agomm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
Zach O'Keefe [Sat, 24 Dec 2022 08:20:34 +0000 (00:20 -0800)]
mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end

MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained within the bounds supplied
by the user.

At the top of each hugepage iteration we (re)lock mmap_lock and
(re)validate the VMA for eligibility and update variables that might have
changed while mmap_lock was dropped.  One thing that might occur is that
the VMA could be resized, and as such, we refetch vma->vm_end to make sure
we don't collapse past the end of the VMA's new end.

However, it's possible that when refetching vma->vm_end that we expand the
region acted on by MADV_COLLAPSE if vma->vm_end is greater than size+len
supplied by the user.

The consequence here is that we may attempt to collapse more memory than
requested, possibly yielding either "too much success" or "false failure"
user-visible results.  An example of the former is if we MADV_COLLAPSE the
first 4MiB of a 2TiB mmap()'d file, the incorrect refetch would cause the
operation to block for much longer than anticipated as we attempt to
collapse the entire TiB region.  An example of the latter is that applying
MADV_COLLPSE to a 4MiB file mapped to the start of a 6MiB VMA will
successfully collapse the first 4MiB, then incorrectly attempt to collapse
the last hugepage-aligned/sized region -- fail (since readahead/page cache
lookup will fail) -- and report a failure to the user.

I don't believe there is a kernel stability concern here as we always
(re)validate the VMA / region accordingly.  Also as Hugh mentions, the
user-visible effects are: we try to collapse more memory than requested
by the user, and/or failing an operation that should have otherwise
succeeded.  An example is trying to collapse a 4MiB file contained
within a 12MiB VMA.

Don't expand the acted-on region when refetching vma->vm_end.

Link: https://lkml.kernel.org/r/20221224082035.3197140-1-zokeefe@google.com
Fixes: 4d24de9425f7 ("mm: MADV_COLLAPSE: refetch vm_end after reacquiring mmap_lock")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agomm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
David Hildenbrand [Fri, 9 Dec 2022 08:09:12 +0000 (09:09 +0100)]
mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA

Currently, we don't enable writenotify when enabling userfaultfd-wp on a
shared writable mapping (for now only shmem and hugetlb).  The consequence
is that vma->vm_page_prot will still include write permissions, to be set
as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
page migration, ...).

So far, vma->vm_page_prot is assumed to be a safe default, meaning that we
only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
wrprotect).  For example, when enabling softdirty tracking, we enable
writenotify.  With uffd-wp on shared mappings, that changed.  More details
on vma->vm_page_prot semantics were summarized in [1].

This is problematic for uffd-wp: we'd have to manually check for a uffd-wp
PTEs/PMDs and manually write-protect PTEs/PMDs, which is error prone.
Prone to such issues is any code that uses vma->vm_page_prot to set PTE
permissions: primarily pte_modify() and mk_pte().

Instead, let's enable writenotify such that PTEs/PMDs/...  will be mapped
write-protected as default and we will only allow selected PTEs that are
definitely safe to be mapped without write-protection (see
can_change_pte_writable()) to be writable.  In the future, we might want
to enable write-bit recovery -- e.g., can_change_pte_writable() -- at more
locations, for example, also when removing uffd-wp protection.

This fixes two known cases:

(a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
    in uffd-wp not triggering on write access.
(b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
    writable, resulting in uffd-wp not triggering on write access.

Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
without NUMA hinting (which currently doesn't seem to be applicable to
shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA.  On
such a VMA, userfaultfd-wp is currently non-functional.

Note that when enabling userfaultfd-wp, there is no need to walk page
tables to enforce the new default protection for the PTEs: we know that
they cannot be uffd-wp'ed yet, because that can only happen after enabling
uffd-wp for the VMA in general.

Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
accidentally set the write bit -- which would result in uffd-wp not
triggering on later write access.  This commit makes uffd-wp on shmem
behave just like uffd-wp on anonymous memory in that regard, even though,
mixing mprotect with uffd-wp is controversial.

[1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com

Link: https://lkml.kernel.org/r/20221209080912.7968-1-david@redhat.com
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Ives van Hoorne <ives@codesandbox.io>
Debugged-by: Peter Xu <peterx@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agomm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
Hugh Dickins [Thu, 22 Dec 2022 20:41:50 +0000 (12:41 -0800)]
mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma

uprobe_write_opcode() uses collapse_pte_mapped_thp() to restore huge pmd,
when removing a breakpoint from hugepage text: vma->anon_vma is always set
in that case, so undo the prohibition.  And MADV_COLLAPSE ought to be able
to collapse some page tables in a vma which happens to have anon_vma set
from CoWing elsewhere.

Is anon_vma lock required?  Almost not: if any page other than expected
subpage of the non-anon huge page is found in the page table, collapse is
aborted without making any change.  However, it is possible that an anon
page was CoWed from this extent in another mm or vma, in which case a
concurrent lookup might look here: so keep it away while clearing pmd (but
perhaps we shall go back to using pmd_lock() there in future).

Note that collapse_pte_mapped_thp() is exceptional in freeing a page table
without having cleared its ptes: I'm uneasy about that, and had thought
pte_clear()ing appropriate; but exclusive i_mmap lock does fix the
problem, and we would have to move the mmu_notification if clearing those
ptes.

What this fixes is not a dangerous instability.  But I suggest Cc stable
because uprobes "healing" has regressed in that way, so this should follow
8d3c106e19e8 into those stable releases where it was backported (and may
want adjustment there - I'll supply backports as needed).

Link: https://lkml.kernel.org/r/b740c9fb-edba-92ba-59fb-7a5592e5dfc@google.com
Fixes: 8d3c106e19e8 ("mm/khugepaged: take the right locks for page table retraction")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agomm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()
David Hildenbrand [Thu, 22 Dec 2022 20:55:11 +0000 (21:55 +0100)]
mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()

We have to update the uffd-wp SWP PTE bit independent of the type of
migration entry.  Currently, if we're unlucky and we want to install/clear
the uffd-wp bit just while we're migrating a read-only mapped hugetlb
page, we would miss to set/clear the uffd-wp bit.

Further, if we're processing a readable-exclusive migration entry and
neither want to set or clear the uffd-wp bit, we could currently end up
losing the uffd-wp bit.  Note that the same would hold for writable
migrating entries, however, having a writable migration entry with the
uffd-wp bit set would already mean that something went wrong.

Note that the change from !is_readable_migration_entry ->
writable_migration_entry is harmless and actually cleaner, as raised by
Miaohe Lin and discussed in [1].

[1] https://lkml.kernel.org/r/90dd6a93-4500-e0de-2bf0-bf522c311b0c@huawei.com

Link: https://lkml.kernel.org/r/20221222205511.675832-3-david@redhat.com
Fixes: 60dfaad65aa9 ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agomm/hugetlb: fix PTE marker handling in hugetlb_change_protection()
David Hildenbrand [Thu, 22 Dec 2022 20:55:10 +0000 (21:55 +0100)]
mm/hugetlb: fix PTE marker handling in hugetlb_change_protection()

Patch series "mm/hugetlb: uffd-wp fixes for hugetlb_change_protection()".

Playing with virtio-mem and background snapshots (using uffd-wp) on
hugetlb in QEMU, I managed to trigger a VM_BUG_ON().  Looking into the
details, hugetlb_change_protection() seems to not handle uffd-wp correctly
in all cases.

Patch #1 fixes my test case.  I don't have reproducers for patch #2, as it
requires running into migration entries.

I did not yet check in detail yet if !hugetlb code requires similar care.

This patch (of 2):

There are two problematic cases when stumbling over a PTE marker in
hugetlb_change_protection():

(1) We protect an uffd-wp PTE marker a second time using uffd-wp: we will
    end up in the "!huge_pte_none(pte)" case and mess up the PTE marker.

(2) We unprotect a uffd-wp PTE marker: we will similarly end up in the
    "!huge_pte_none(pte)" case even though we cleared the PTE, because
    the "pte" variable is stale. We'll mess up the PTE marker.

For example, if we later stumble over such a "wrongly modified" PTE marker,
we'll treat it like a present PTE that maps some garbage page.

This can, for example, be triggered by mapping a memfd backed by huge
pages, registering uffd-wp, uffd-wp'ing an unmapped page and (a)
uffd-wp'ing it a second time; or (b) uffd-unprotecting it; or (c)
unregistering uffd-wp. Then, ff we trigger fallocate(FALLOC_FL_PUNCH_HOLE)
on that file range, we will run into a VM_BUG_ON:

[  195.039560] page:00000000ba1f2987 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0
[  195.039565] flags: 0x7ffffc0001000(reserved|node=0|zone=0|lastcpupid=0x1fffff)
[  195.039568] raw: 0007ffffc0001000 ffffe742c0000008 ffffe742c0000008 0000000000000000
[  195.039569] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[  195.039569] page dumped because: VM_BUG_ON_PAGE(compound && !PageHead(page))
[  195.039573] ------------[ cut here ]------------
[  195.039574] kernel BUG at mm/rmap.c:1346!
[  195.039579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  195.039581] CPU: 7 PID: 4777 Comm: qemu-system-x86 Not tainted 6.0.12-200.fc36.x86_64 #1
[  195.039583] Hardware name: LENOVO 20WNS1F81N/20WNS1F81N, BIOS N35ET50W (1.50 ) 09/15/2022
[  195.039584] RIP: 0010:page_remove_rmap+0x45b/0x550
[  195.039588] Code: [...]
[  195.039589] RSP: 0018:ffffbc03c3633ba8 EFLAGS: 00010292
[  195.039591] RAX: 0000000000000040 RBX: ffffe742c0000000 RCX: 0000000000000000
[  195.039592] RDX: 0000000000000002 RSI: ffffffff8e7aac1a RDI: 00000000ffffffff
[  195.039592] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffbc03c3633a08
[  195.039593] R10: 0000000000000003 R11: ffffffff8f146328 R12: ffff9b04c42754b0
[  195.039594] R13: ffffffff8fcc6328 R14: ffffbc03c3633c80 R15: ffff9b0484ab9100
[  195.039595] FS:  00007fc7aaf68640(0000) GS:ffff9b0bbf7c0000(0000) knlGS:0000000000000000
[  195.039596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  195.039597] CR2: 000055d402c49110 CR3: 0000000159392003 CR4: 0000000000772ee0
[  195.039598] PKRU: 55555554
[  195.039599] Call Trace:
[  195.039600]  <TASK>
[  195.039602]  __unmap_hugepage_range+0x33b/0x7d0
[  195.039605]  unmap_hugepage_range+0x55/0x70
[  195.039608]  hugetlb_vmdelete_list+0x77/0xa0
[  195.039611]  hugetlbfs_fallocate+0x410/0x550
[  195.039612]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[  195.039616]  vfs_fallocate+0x12e/0x360
[  195.039618]  __x64_sys_fallocate+0x40/0x70
[  195.039620]  do_syscall_64+0x58/0x80
[  195.039623]  ? syscall_exit_to_user_mode+0x17/0x40
[  195.039624]  ? do_syscall_64+0x67/0x80
[  195.039626]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  195.039628] RIP: 0033:0x7fc7b590651f
[  195.039653] Code: [...]
[  195.039654] RSP: 002b:00007fc7aaf66e70 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[  195.039655] RAX: ffffffffffffffda RBX: 0000558ef4b7f370 RCX: 00007fc7b590651f
[  195.039656] RDX: 0000000018000000 RSI: 0000000000000003 RDI: 000000000000000c
[  195.039657] RBP: 0000000008000000 R08: 0000000000000000 R09: 0000000000000073
[  195.039658] R10: 0000000008000000 R11: 0000000000000293 R12: 0000000018000000
[  195.039658] R13: 00007fb8bbe00000 R14: 000000000000000c R15: 0000000000001000
[  195.039661]  </TASK>

Fix it by not going into the "!huge_pte_none(pte)" case if we stumble over
an exclusive marker.  spin_unlock() + continue would get the job done.

However, instead, make it clearer that there are no fall-through
statements: we process each case (hwpoison, migration, marker, !none,
none) and then unlock the page table to continue with the next PTE.  Let's
avoid "continue" statements and use a single spin_unlock() at the end.

Link: https://lkml.kernel.org/r/20221222205511.675832-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221222205511.675832-2-david@redhat.com
Fixes: 60dfaad65aa9 ("mm/hugetlb: allow uffd wr-protect none ptes")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
21 months agopowerpc/64s/hash: Make stress_hpt_timer_fn() static
Yang Yingliang [Wed, 28 Dec 2022 09:36:03 +0000 (17:36 +0800)]
powerpc/64s/hash: Make stress_hpt_timer_fn() static

stress_hpt_timer_fn() is only used in hash_utils.c, make it static.

Fixes: 6b34a099faa1 ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221228093603.3166599-1-yangyingliang@huawei.com
21 months agoMerge tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git.kernel.org/pub/scm...
Linus Torvalds [Wed, 11 Jan 2023 23:12:14 +0000 (17:12 -0600)]
Merge tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Make 'perf kmem' cope with the removal of some
   kmem:kmem_cache_alloc_node and kmem:kmalloc_node in the
   11e9734bcb6a7361 ("mm/slab_common: unify NUMA and UMA version of
   tracepoints") commit, making sure it works with Linux >= 6.2 as well
   as with older kernels where those tracepoints are present.

 - Also make it handle the new "node" kmem:kmalloc and
   kmem:kmem_cache_alloc tracepoint field introduced in that same
   commit.

 - Fix hardware tracing PMU address filter duplicate symbol selection,
   that was preventing to match with static functions with the same name
   present in different object files.

 - Fix regression on what linux/types.h file gets used to build the "BPF
   prologue" 'perf test' entry, the system one lacks the fmode_t
   definition used in this test, so provide that type in the test
   itself.

 - Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1. If the
   user asks for linking with the libbpf package provided by the distro,
   then it has to be >= 0.8.0. Using the libbpf supplied with the kernel
   would be a fallback in that case.

 - Fix the build when libbpf isn't available or explicitly disabled via
   NO_LIBBPF=1.

 - Don't try to install libtraceevent plugins as its not anymore in the
   kernel sources and will thus always fail.

* tag 'perf-tools-fixes-for-v6.2-2-2023-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf auxtrace: Fix address filter duplicate symbol selection
  perf bpf: Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1
  perf build: Fix build error when NO_LIBBPF=1
  perf tools: Don't install libtraceevent plugins as its not anymore in the kernel sources
  perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring
  perf kmem: Support legacy tracepoints
  perf build: Properly guard libbpf includes
  perf tests bpf prologue: Fix bpf-script-test-prologue test compile issue with clang

21 months agoKVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
David Woodhouse [Wed, 11 Jan 2023 18:06:51 +0000 (18:06 +0000)]
KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock

In commit 14243b387137a ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN
and event channel delivery") the clever version of me left some helpful
notes for those who would come after him:

       /*
        * For the irqfd workqueue, using the main kvm->lock mutex is
        * fine since this function is invoked from kvm_set_irq() with
        * no other lock held, no srcu. In future if it will be called
        * directly from a vCPU thread (e.g. on hypercall for an IPI)
        * then it may need to switch to using a leaf-node mutex for
        * serializing the shared_info mapping.
        */
       mutex_lock(&kvm->lock);

In commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
the other version of me ran straight past that comment without reading it,
and introduced a potential deadlock by taking vcpu->mutex and kvm->lock
in the wrong order.

Solve this as originally suggested, by adding a leaf-node lock in the Xen
state rather than using kvm->lock for it.

Fixes: 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-4-dwmw2@infradead.org>
[Rebase, add docs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
21 months agox86/pci: Simplify is_mmconf_reserved() messages
Bjorn Helgaas [Tue, 10 Jan 2023 18:02:42 +0000 (12:02 -0600)]
x86/pci: Simplify is_mmconf_reserved() messages

is_mmconf_reserved() takes a "with_e820" parameter that only determines the
message logged if it finds the MMCONFIG region is reserved.  Pass the
message directly, which will simplify a future patch that adds a new way of
looking for that reservation.  No functional change intended.

Link: https://lore.kernel.org/r/20230110180243.1590045-2-helgaas@kernel.org
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
21 months agodocs/conf.py: Use about.html only in sidebar of alabaster theme
Akira Yokosawa [Tue, 10 Jan 2023 09:47:25 +0000 (18:47 +0900)]
docs/conf.py: Use about.html only in sidebar of alabaster theme

"about.html" is available only for the alabaster theme [1].
Unconditionally putting it to html_sidebars prevents us from
using other themes which respect html_sidebars.

Remove about.html from the initialization and insert it at the
front for the alabaster theme.

Link: [1] https://alabaster.readthedocs.io/en/latest/installation.html#sidebars
Fixes: d5389d3145ef ("docs: Switch the default HTML theme to alabaster")
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/4b162dbe-2a7f-1710-93e0-754cf8680aae@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
21 months agoMerge branch 'master' into mm-stable
Andrew Morton [Wed, 11 Jan 2023 22:01:38 +0000 (14:01 -0800)]
Merge branch 'master' into mm-stable

21 months agos390: update defconfigs
Heiko Carstens [Wed, 11 Jan 2023 14:04:47 +0000 (15:04 +0100)]
s390: update defconfigs

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agoKVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
David Woodhouse [Wed, 11 Jan 2023 18:06:50 +0000 (18:06 +0000)]
KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule

Documentation/virt/kvm/locking.rst tells us that kvm->lock is taken outside
vcpu->mutex. But that doesn't actually happen very often; it's only in
some esoteric cases like migration with AMD SEV. This means that lockdep
usually doesn't notice, and doesn't do its job of keeping us honest.

Ensure that lockdep *always* knows about the ordering of these two locks,
by briefly taking vcpu->mutex in kvm_vm_ioctl_create_vcpu() while kvm->lock
is held.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
21 months agoKVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
David Woodhouse [Wed, 11 Jan 2023 18:06:49 +0000 (18:06 +0000)]
KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()

The kvm_xen_update_runstate_guest() function can be called when the vCPU
is being scheduled out, from a preempt notifier. It *opportunistically*
updates the runstate area in the guest memory, if the gfn_to_pfn_cache
which caches the appropriate address is still valid.

If there is *contention* when it attempts to obtain gpc->lock, then
locking inside the priority inheritance checks may cause a deadlock.
Lockdep reports:

[13890.148997] Chain exists of:
                 &gpc->lock --> &p->pi_lock --> &rq->__lock

[13890.149002]  Possible unsafe locking scenario:

[13890.149003]        CPU0                    CPU1
[13890.149004]        ----                    ----
[13890.149005]   lock(&rq->__lock);
[13890.149007]                                lock(&p->pi_lock);
[13890.149009]                                lock(&rq->__lock);
[13890.149011]   lock(&gpc->lock);
[13890.149013]
                *** DEADLOCK ***

In the general case, if there's contention for a read lock on gpc->lock,
that's going to be because something else is either invalidating or
revalidating the cache. Either way, we've raced with seeing it in an
invalid state, in which case we would have aborted the opportunistic
update anyway.

So in the 'atomic' case when called from the preempt notifier, just
switch to using read_trylock() and avoid the PI handling altogether.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
21 months agoKVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
David Woodhouse [Wed, 11 Jan 2023 18:06:48 +0000 (18:06 +0000)]
KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking

In commit 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate
area") we declared it safe to obtain two gfn_to_pfn_cache locks at the same
time:
/*
 * The guest's runstate_info is split across two pages and we
 * need to hold and validate both GPCs simultaneously. We can
 * declare a lock ordering GPC1 > GPC2 because nothing else
 * takes them more than one at a time.
 */

However, we forgot to tell lockdep. Do so, by setting a subclass on the
first lock before taking the second.

Fixes: 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate area")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-1-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
21 months agoMerge tag 'kvmarm-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmar...
Paolo Bonzini [Wed, 11 Jan 2023 18:31:53 +0000 (13:31 -0500)]
Merge tag 'kvmarm-fixes-6.2-1' of git://git./linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm64 fixes for 6.2, take #1

- Fix the PMCR_EL0 reset value after the PMU rework

- Correctly handle S2 fault triggered by a S1 page table walk
  by not always classifying it as a write, as this breaks on
  R/O memslots

- Document why we cannot exit with KVM_EXIT_MMIO when taking
  a write fault from a S1 PTW on a R/O memslot

- Put the Apple M2 on the naughty step for not being able to
  correctly implement the vgic SEIS feature, just liek the M1
  before it

- Reviewer updates: Alex is stepping down, replaced by Zenghui

21 months agoDocumentation: kvm: fix SRCU locking order docs
Paolo Bonzini [Mon, 9 Jan 2023 11:02:16 +0000 (06:02 -0500)]
Documentation: kvm: fix SRCU locking order docs

kvm->srcu is taken in KVM_RUN and several other vCPU ioctls, therefore
vcpu->mutex is susceptible to the same deadlock that is documented
for kvm->slots_lock.  The same holds for kvm->lock, since kvm->lock
is held outside vcpu->mutex.  Fix the documentation and rearrange it
to highlight the difference between these locks and kvm->slots_arch_lock,
and how kvm->slots_arch_lock can be useful while processing a vmexit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
21 months agoperf auxtrace: Fix address filter duplicate symbol selection
Adrian Hunter [Tue, 10 Jan 2023 18:56:59 +0000 (20:56 +0200)]
perf auxtrace: Fix address filter duplicate symbol selection

When a match has been made to the nth duplicate symbol, return
success not error.

Example:

  Before:

    $ cat file.c
    cat: file.c: No such file or directory
    $ cat file1.c
    #include <stdio.h>

    static void func(void)
    {
            printf("First func\n");
    }

    void other(void);

    int main()
    {
            func();
            other();
            return 0;
    }
    $ cat file2.c
    #include <stdio.h>

    static void func(void)
    {
            printf("Second func\n");
    }

    void other(void)
    {
            func();
    }

    $ gcc -Wall -Wextra -o test file1.c file2.c
    $ perf record -e intel_pt//u --filter 'filter func @ ./test' -- ./test
    Multiple symbols with name 'func'
    #1      0x1149  l       func
                    which is near           main
    #2      0x1179  l       func
                    which is near           other
    Disambiguate symbol name by inserting #n after the name e.g. func #2
    Or select a global symbol by inserting #0 or #g or #G
    Failed to parse address filter: 'filter func @ ./test'
    Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
    Where multiple filters are separated by space or comma.
    $ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test
    Failed to parse address filter: 'filter func #2 @ ./test'
    Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
    Where multiple filters are separated by space or comma.

  After:

    $ perf record -e intel_pt//u --filter 'filter func #2 @ ./test' -- ./test
    First func
    Second func
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.016 MB perf.data ]
    $ perf script --itrace=b -Ftime,flags,ip,sym,addr --ns
    1231062.526977619:   tr strt                               0 [unknown] =>     558495708179 func
    1231062.526977619:   tr end  call               558495708188 func =>     558495708050 _init
    1231062.526979286:   tr strt                               0 [unknown] =>     55849570818d func
    1231062.526979286:   tr end  return             55849570818f func =>     55849570819d other

Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters")
Reported-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230110185659.15979-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
21 months agodrm/i915/gt: Cover rest of SVG unit MCR registers
Gustavo Sousa [Thu, 5 Jan 2023 13:37:01 +0000 (10:37 -0300)]
drm/i915/gt: Cover rest of SVG unit MCR registers

CHICKEN_RASTER_{1,2} got overlooked with the move done in commit
a9e69428b1b4 ("drm/i915: Define MCR registers explicitly"). Registers
from the SVG unit became multicast as of Xe_HP graphics.

BSpec: 66534
Fixes: a9e69428b1b4 ("drm/i915: Define MCR registers explicitly")
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230105133701.19556-1-gustavo.sousa@intel.com
(cherry picked from commit 10903b0a0f4d4964b352fa3df12d3d2ef5fb7a3b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
21 months agoKVM: s390: interrupt: use READ_ONCE() before cmpxchg()
Heiko Carstens [Mon, 9 Jan 2023 14:54:56 +0000 (15:54 +0100)]
KVM: s390: interrupt: use READ_ONCE() before cmpxchg()

Use READ_ONCE() before cmpxchg() to prevent that the compiler generates
code that fetches the to be compared old value several times from memory.

Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20230109145456.2895385-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()
Heiko Carstens [Mon, 9 Jan 2023 10:51:20 +0000 (11:51 +0100)]
s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()

Make sure that *ptr__ within arch_this_cpu_to_op_simple() is only
dereferenced once by using READ_ONCE(). Otherwise the compiler could
generate incorrect code.

Cc: <stable@vger.kernel.org>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agos390/cpum_sf: add READ_ONCE() semantics to compare and swap loops
Heiko Carstens [Thu, 5 Jan 2023 14:44:20 +0000 (15:44 +0100)]
s390/cpum_sf: add READ_ONCE() semantics to compare and swap loops

The current cmpxchg_double() loops within the perf hw sampling code do not
have READ_ONCE() semantics to read the old value from memory. This allows
the compiler to generate code which reads the "old" value several times
from memory, which again allows for inconsistencies.

For example:

        /* Reset trailer (using compare-double-and-swap) */
        do {
                te_flags = te->flags & ~SDB_TE_BUFFER_FULL_MASK;
                te_flags |= SDB_TE_ALERT_REQ_MASK;
        } while (!cmpxchg_double(&te->flags, &te->overflow,
                 te->flags, te->overflow,
                 te_flags, 0ULL));

The compiler could generate code where te->flags used within the
cmpxchg_double() call may be refetched from memory and which is not
necessarily identical to the previous read version which was used to
generate te_flags. Which in turn means that an incorrect update could
happen.

Fix this by adding READ_ONCE() semantics to all cmpxchg_double()
loops. Given that READ_ONCE() cannot generate code on s390 which atomically
reads 16 bytes, use a private compare-and-swap-double implementation to
achieve that.

Also replace cmpxchg_double() with the private implementation to be able to
re-use the old value within the loops.

As a side effect this converts the whole code to only use bit fields
to read and modify bits within the hws trailer header.

Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/linux-s390/Y71QJBhNTIatvxUT@osiris/T/#ma14e2a5f7aa8ed4b94b6f9576799b3ad9c60f333
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agospi: Merge rename of spi-cs-setup-ns DT property
Mark Brown [Wed, 11 Jan 2023 14:15:22 +0000 (14:15 +0000)]
spi: Merge rename of spi-cs-setup-ns DT property

The newly added spi-cs-setup-ns doesn't really fit with the existing
property names for delays, rename it so that it does before it makes it
into a release and becomes ABI.

21 months agospi: spidev: remove debug messages that access spidev->spi without locking
Bartosz Golaszewski [Fri, 6 Jan 2023 10:07:19 +0000 (11:07 +0100)]
spi: spidev: remove debug messages that access spidev->spi without locking

The two debug messages in spidev_open() dereference spidev->spi without
taking the lock and without checking if it's not null. This can lead to
a crash. Drop the messages as they're not needed - the user-space will
get informed about ENOMEM with the syscall return value.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230106100719.196243-2-brgl@bgdev.pl
Signed-off-by: Mark Brown <broonie@kernel.org>
21 months agospi: spidev: fix a race condition when accessing spidev->spi
Bartosz Golaszewski [Fri, 6 Jan 2023 10:07:18 +0000 (11:07 +0100)]
spi: spidev: fix a race condition when accessing spidev->spi

There's a spinlock in place that is taken in file_operations callbacks
whenever we check if spidev->spi is still alive (not null). It's also
taken when spidev->spi is set to NULL in remove().

This however doesn't protect the code against driver unbind event while
one of the syscalls is still in progress. To that end we need a lock taken
continuously as long as we may still access spidev->spi. As both the file
ops and the remove callback are never called from interrupt context, we
can replace the spinlock with a mutex.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://lore.kernel.org/r/20230106100719.196243-1-brgl@bgdev.pl
Signed-off-by: Mark Brown <broonie@kernel.org>
21 months agoMerge tag 'mlx5-fixes-2023-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Wed, 11 Jan 2023 12:55:09 +0000 (12:55 +0000)]
Merge tag 'mlx5-fixes-2023-01-09' of git://git./linux/kernel/git/saeed/linux

mlx5-fixes-2023-01-09

21 months agoipv6: raw: Deduct extension header length in rawv6_push_pending_frames
Herbert Xu [Tue, 10 Jan 2023 00:59:06 +0000 (08:59 +0800)]
ipv6: raw: Deduct extension header length in rawv6_push_pending_frames

The total cork length created by ip6_append_data includes extension
headers, so we must exclude them when comparing them against the
IPV6_CHECKSUM offset which does not include extension headers.

Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Fixes: 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
21 months agodrm/nouveau: Remove file nouveau_fbcon.c
Thomas Zimmermann [Tue, 10 Jan 2023 12:35:26 +0000 (13:35 +0100)]
drm/nouveau: Remove file nouveau_fbcon.c

Commit 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers")
converted nouveau to generic fbdev emulation. The driver's internal
implementation later got accidentally restored during a merge commit.
Remove the file from the driver. No functional changes.

v2:
* point Fixes tag to merge commit (Alex)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 4e291f2f5853 ("Merge tag 'drm-misc-next-2022-11-10-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20230110123526.28770-1-tzimmermann@suse.de
21 months agonet: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()
Clément Léger [Mon, 9 Jan 2023 15:32:23 +0000 (16:32 +0100)]
net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()

If ptp was not enabled due to missing IRQ for instance,
lan966x_ptp_deinit() will dereference NULL pointers.

Fixes: d096459494a8 ("net: lan966x: Add support for ptp clocks")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
21 months agopowerpc/imc-pmu: Fix use of mutex in IRQs disabled section
Kajol Jain [Fri, 6 Jan 2023 06:51:57 +0000 (12:21 +0530)]
powerpc/imc-pmu: Fix use of mutex in IRQs disabled section

Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP
and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event.

Command to trigger the warning:
  # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5

   Performance counter stats for 'sleep 5':

                   0      thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/

         5.002117947 seconds time elapsed

         0.000131000 seconds user
         0.001063000 seconds sys

Below is snippet of the warning in dmesg:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec
  preempt_count: 2, expected: 0
  4 locks held by perf-exec/2869:
   #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90
   #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0
   #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510
   #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510
  irq event stamp: 4806
  hardirqs last  enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0
  hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510
  softirqs last  enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  Call Trace:
    dump_stack_lvl+0x98/0xe0 (unreliable)
    __might_resched+0x2f8/0x310
    __mutex_lock+0x6c/0x13f0
    thread_imc_event_add+0xf4/0x1b0
    event_sched_in+0xe0/0x210
    merge_sched_in+0x1f0/0x600
    visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0
    ctx_flexible_sched_in+0xcc/0x140
    ctx_sched_in+0x20c/0x2a0
    ctx_resched+0x104/0x1c0
    perf_event_exec+0x340/0x510
    begin_new_exec+0x730/0xef0
    load_elf_binary+0x3f8/0x1e10
  ...
  do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0
  WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0
  CPU: 36 PID: 2869 Comm: sleep Tainted: G        W          6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  NIP:  c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670
  REGS: c00000004d2134e0 TRAP: 0700   Tainted: G        W           (6.2.0-rc2-00011-g1247637727f2)
  MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48002824  XER: 00000000
  CFAR: c00000000013fb64 IRQMASK: 1

The above warning triggered because the current imc-pmu code uses mutex
lock in interrupt disabled sections. The function mutex_lock()
internally calls __might_resched(), which will check if IRQs are
disabled and in case IRQs are disabled, it will trigger the warning.

Fix the issue by changing the mutex lock to spinlock.

Fixes: 8f95faaac56c ("powerpc/powernv: Detect and create IMC device")
Reported-by: Michael Petlan <mpetlan@redhat.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
[mpe: Fix comments, trim oops in change log, add reported-by tags]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com
21 months agopowerpc/boot: Fix incorrect version calculation issue in ld_version
Ojaswin Mujoo [Wed, 4 Jan 2023 20:24:37 +0000 (01:54 +0530)]
powerpc/boot: Fix incorrect version calculation issue in ld_version

The ld_version() function computes the wrong version value for certain
ld versions such as the following:

  $ ld --version
  GNU ld (GNU Binutils; SUSE Linux Enterprise 15)
  2.37.20211103-150100.7.37

For input 2.37.20211103, the value computed is 202348030000 which is
higher than the value for a later version like 2.39.0, which is
23900000.

This issue was highlighted because with the above ld version, the
powerpc kernel build started failing with ld error: "unrecognized option
--no-warn-rwx-segments". This was caused due to the recent commit
579aee9fc594 ("powerpc: suppress some linker warnings in recent linker
versions") which added the --no-warn-rwx-segments linker flag if the ld
version is greater than 2.39.

Due to the bug in ld_version(), ld version 2.37.20111103 is wrongly
calculated to be greater than 2.39 and the unsupported flag is added.

To fix it, if version is of the form x.y.z and length(z) == 8, then most
probably it is a date [yyyymmdd] commonly used for release snapshots and
not an actual new version. Hence, ignore the date part replacing it with
0.

Fixes: 579aee9fc594 ("powerpc: suppress some linker warnings in recent linker versions")
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
[mpe: Tweak change log wording/formatting, add Fixes tag]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230104202437.90039-1-ojaswin@linux.ibm.com
21 months agocifs: fix potential memory leaks in session setup
Paulo Alcantara [Tue, 10 Jan 2023 23:35:46 +0000 (20:35 -0300)]
cifs: fix potential memory leaks in session setup

Make sure to free cifs_ses::auth_key.response before allocating it as
we might end up leaking memory in reconnect or mounting.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
21 months agocifs: do not query ifaces on smb1 mounts
Paulo Alcantara [Tue, 10 Jan 2023 22:23:21 +0000 (19:23 -0300)]
cifs: do not query ifaces on smb1 mounts

Users have reported the following error on every 600 seconds
(SMB_INTERFACE_POLL_INTERVAL) when mounting SMB1 shares:

CIFS: VFS: \\srv\share error -5 on ioctl to get interface list

It's supported only by SMB2+, so do not query network interfaces on
SMB1 mounts.

Fixes: 6e1c1c08cdf3 ("cifs: periodically query network interfaces from server")
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
21 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Wed, 11 Jan 2023 02:26:33 +0000 (18:26 -0800)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-09 (ice)

This series contains updates to ice driver only.

Jiasheng Jiang frees allocated cmd_buf if write_buf allocation failed to
prevent memory leak.

Yuan Can adds check, and proper cleanup, of gnss_tty_port allocation call
to avoid memory leaks.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Add check for kzalloc
  ice: Fix potential memory leak in ice_gnss_tty_write()
====================

Link: https://lore.kernel.org/r/20230109225358.3478060-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
21 months agonet: sched: disallow noqueue for qdisc classes
Frederick Lawler [Mon, 9 Jan 2023 16:39:06 +0000 (10:39 -0600)]
net: sched: disallow noqueue for qdisc classes

While experimenting with applying noqueue to a classful queue discipline,
we discovered a NULL pointer dereference in the __dev_queue_xmit()
path that generates a kernel OOPS:

    # dev=enp0s5
    # tc qdisc replace dev $dev root handle 1: htb default 1
    # tc class add dev $dev parent 1: classid 1:1 htb rate 10mbit
    # tc qdisc add dev $dev parent 1:1 handle 10: noqueue
    # ping -I $dev -w 1 -c 1 1.1.1.1

[    2.172856] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    2.173217] #PF: supervisor instruction fetch in kernel mode
...
[    2.178451] Call Trace:
[    2.178577]  <TASK>
[    2.178686]  htb_enqueue+0x1c8/0x370
[    2.178880]  dev_qdisc_enqueue+0x15/0x90
[    2.179093]  __dev_queue_xmit+0x798/0xd00
[    2.179305]  ? _raw_write_lock_bh+0xe/0x30
[    2.179522]  ? __local_bh_enable_ip+0x32/0x70
[    2.179759]  ? ___neigh_create+0x610/0x840
[    2.179968]  ? eth_header+0x21/0xc0
[    2.180144]  ip_finish_output2+0x15e/0x4f0
[    2.180348]  ? dst_output+0x30/0x30
[    2.180525]  ip_push_pending_frames+0x9d/0xb0
[    2.180739]  raw_sendmsg+0x601/0xcb0
[    2.180916]  ? _raw_spin_trylock+0xe/0x50
[    2.181112]  ? _raw_spin_unlock_irqrestore+0x16/0x30
[    2.181354]  ? get_page_from_freelist+0xcd6/0xdf0
[    2.181594]  ? sock_sendmsg+0x56/0x60
[    2.181781]  sock_sendmsg+0x56/0x60
[    2.181958]  __sys_sendto+0xf7/0x160
[    2.182139]  ? handle_mm_fault+0x6e/0x1d0
[    2.182366]  ? do_user_addr_fault+0x1e1/0x660
[    2.182627]  __x64_sys_sendto+0x1b/0x30
[    2.182881]  do_syscall_64+0x38/0x90
[    2.183085]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
...
[    2.187402]  </TASK>

Previously in commit d66d6c3152e8 ("net: sched: register noqueue
qdisc"), NULL was set for the noqueue discipline on noqueue init
so that __dev_queue_xmit() falls through for the noqueue case. This
also sets a bypass of the enqueue NULL check in the
register_qdisc() function for the struct noqueue_disc_ops.

Classful queue disciplines make it past the NULL check in
__dev_queue_xmit() because the discipline is set to htb (in this case),
and then in the call to __dev_xmit_skb(), it calls into htb_enqueue()
which grabs a leaf node for a class and then calls qdisc_enqueue() by
passing in a queue discipline which assumes ->enqueue() is not set to NULL.

Fix this by not allowing classes to be assigned to the noqueue
discipline. Linux TC Notes states that classes cannot be set to
the noqueue discipline. [1] Let's enforce that here.

Links:
1. https://linux-tc-notes.sourceforge.net/tc/doc/sch_noqueue.txt

Fixes: d66d6c3152e8 ("net: sched: register noqueue qdisc")
Cc: stable@vger.kernel.org
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20230109163906.706000-1-fred@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
21 months agodrm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU
Eric Huang [Thu, 5 Jan 2023 19:01:18 +0000 (14:01 -0500)]
drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU

The point bo->kfd_bo is NULL for queue's write pointer BO
when creating queue on mGPU. To avoid using the pointer
fixes the error.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agodrm/amd/pm/smu13: BACO is supported when it's in BACO state
Guchun Chen [Tue, 10 Jan 2023 03:33:44 +0000 (11:33 +0800)]
drm/amd/pm/smu13: BACO is supported when it's in BACO state

This leverages the logic in smu11. No need to talk to SMU to
check BACO enablement as it's in BACO state already.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1
21 months agodrm/amdkfd: Add sync after creating vram bo
Eric Huang [Mon, 9 Jan 2023 19:16:42 +0000 (14:16 -0500)]
drm/amdkfd: Add sync after creating vram bo

There will be data corruption on vram allocated by svm
if the initialization is not complete and application is
writting on the memory. Adding sync to wait for the
initialization completion is to resolve this issue.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
21 months agocifs: fix double free on failed kerberos auth
Paulo Alcantara [Tue, 10 Jan 2023 20:55:20 +0000 (17:55 -0300)]
cifs: fix double free on failed kerberos auth

If session setup failed with kerberos auth, we ended up freeing
cifs_ses::auth_key.response twice in SMB2_auth_kerberos() and
sesInfoFree().

Fix this by zeroing out cifs_ses::auth_key.response after freeing it
in SMB2_auth_kerberos().

Fixes: a4e430c8c8ba ("cifs: replace kfree() with kfree_sensitive() for sensitive data")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
21 months agocifs: remove redundant assignment to the variable match
Colin Ian King [Thu, 5 Jan 2023 13:41:11 +0000 (13:41 +0000)]
cifs: remove redundant assignment to the variable match

The variable match is being assigned a value that is never read, it
is being re-assigned a new value later on. The assignment is redundant
and can be removed.

Cleans up clang scan-build warning:
fs/cifs/dfs_cache.c:1302:2: warning: Value stored to 'match' is never read

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
21 months agoMerge tag 'nfsd-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Tue, 10 Jan 2023 21:03:06 +0000 (15:03 -0600)]
Merge tag 'nfsd-6.2-3' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a race when creating NFSv4 files

 - Revert the use of relaxed bitops

* tag 'nfsd-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Use set_bit(RQ_DROPME)
  Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"
  nfsd: fix handling of cached open files in nfsd4_open codepath

21 months agoMerge tag 'xtensa-20230110' of https://github.com/jcmvbkbc/linux-xtensa
Linus Torvalds [Tue, 10 Jan 2023 20:48:12 +0000 (14:48 -0600)]
Merge tag 'xtensa-20230110' of https://github.com/jcmvbkbc/linux-xtensa

Pull xtensa fixes from Max Filippov:

 - fix xtensa allmodconfig build broken by the kcsan test

 - drop unused members of struct thread_struct

* tag 'xtensa-20230110' of https://github.com/jcmvbkbc/linux-xtensa:
  xtensa: drop unused members of struct thread_struct
  kcsan: test: don't put the expect array on the stack

21 months agoiavf/iavf_main: actually log ->src mask when talking about it
Daniil Tatianin [Tue, 20 Dec 2022 06:32:46 +0000 (09:32 +0300)]
iavf/iavf_main: actually log ->src mask when talking about it

This fixes a copy-paste issue where dev_err would log the dst mask even
though it is clearly talking about src.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Fixes: 0075fa0fadd0 ("i40evf: Add support to apply cloud filters")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
21 months agoigc: Fix PPS delta between two synchronized end-points
Christopher S Hall [Wed, 14 Dec 2022 08:10:38 +0000 (16:10 +0800)]
igc: Fix PPS delta between two synchronized end-points

This patch fix the pulse per second output delta between
two synchronized end-points.

Based on Intel Discrete I225 Software User Manual Section
4.2.15 TimeSync Auxiliary Control Register, ST0[Bit 4] and
ST1[Bit 7] must be set to ensure that clock output will be
toggles based on frequency value defined. This is to ensure
that output of the PPS is aligned with the clock.

How to test:

1) Running time synchronization on both end points.
Ex: ptp4l --step_threshold=1 -m -f gPTP.cfg -i <interface name>

2) Configure PPS output using below command for both end-points
Ex: SDP0 on I225 REV4 SKU variant

./testptp -d /dev/ptp0 -L 0,2
./testptp -d /dev/ptp0 -p 1000000000

3) Measure the output using analyzer for both end-points

Fixes: 87938851b6ef ("igc: enable auxiliary PHC functions for the i225")
Signed-off-by: Christopher S Hall <christopher.s.hall@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
21 months agoixgbe: fix pci device refcount leak
Yang Yingliang [Tue, 29 Nov 2022 01:57:48 +0000 (09:57 +0800)]
ixgbe: fix pci device refcount leak

As the comment of pci_get_domain_bus_and_slot() says, it
returns a PCI device with refcount incremented, when finish
using it, the caller must decrement the reference count by
calling pci_dev_put().

In ixgbe_get_first_secondary_devfn() and ixgbe_x550em_a_has_mii(),
pci_dev_put() is called to avoid leak.

Fixes: 8fa10ef01260 ("ixgbe: register a mdiobus")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
21 months agocpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering
Perry Yuan [Tue, 10 Jan 2023 15:10:29 +0000 (23:10 +0800)]
cpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering

In the amd_pstate_adjust_perf(), there is one cpufreq_cpu_get() call to
increase increments the kobject reference count of policy and make it as
busy. Therefore, a corresponding call to cpufreq_cpu_put() is needed to
decrement the kobject reference count back, it will resolve the kernel
hang issue when unregistering the amd-pstate driver and register the
`amd_pstate_epp` driver instance.

Fixes: 1d215f0319 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Cc: 5.17+ <stable@vger.kernel.org> # 5.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
21 months agoMerge tag 'cpufreq/arm/fixes-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Rafael J. Wysocki [Tue, 10 Jan 2023 19:27:13 +0000 (20:27 +0100)]
Merge tag 'cpufreq/arm/fixes-6.2-rc4' of git://git./linux/kernel/git/vireshk/pm

Pull cpufreq ARM fixes for 6.2-rc4 from Viresh Kumar:

"- Fix double initialization and set suspend-freq for Apple's cpufreq
   driver (Arnd Bergmann and Hector Martin).

 - Fix reading of "reg" property, update cpufreq-dt's blocklist and
   update DT documentation for Qualcomm's cpufreq driver (Konrad Dybcio
   and Krzysztof Kozlowski).

 - Replace 0 with NULL for Armada driver (Miles Chen).

 - Fix potential overflows in CPPC driver (Pierre Gondois).

 - Update blocklist for Tegra234 Soc (Sumit Gupta)."

21 months agoACPI: Fix selecting wrong ACPI fwnode for the iGPU on some Dell laptops
Hans de Goede [Tue, 10 Jan 2023 15:30:28 +0000 (16:30 +0100)]
ACPI: Fix selecting wrong ACPI fwnode for the iGPU on some Dell laptops

The Dell Latitude E6430 both with and without the optional NVidia dGPU
has a bug in its ACPI tables which is causing Linux to assign the wrong
ACPI fwnode / companion to the pci_device for the i915 iGPU.

Specifically under the PCI root bridge there are these 2 ACPI Device()s :

 Scope (_SB.PCI0)
 {
     Device (GFX0)
     {
         Name (_ADR, 0x00020000)  // _ADR: Address
     }

     ...

     Device (VID)
     {
         Name (_ADR, 0x00020000)  // _ADR: Address
         ...

         Method (_DOS, 1, NotSerialized)  // _DOS: Disable Output Switching
         {
             VDP8 = Arg0
             VDP1 (One, VDP8)
         }

         Method (_DOD, 0, NotSerialized)  // _DOD: Display Output Devices
         {
             ...
         }
         ...
     }
 }

The non-functional GFX0 ACPI device is a problem, because this gets
returned as ACPI companion-device by acpi_find_child_device() for the iGPU.

This is a long standing problem and the i915 driver does use the ACPI
companion for some things, but works fine without it.

However since commit 63f534b8bad9 ("ACPI: PCI: Rework acpi_get_pci_dev()")
acpi_get_pci_dev() relies on the physical-node pointer in the acpi_device
and that is set on the wrong acpi_device because of the wrong
acpi_find_child_device() return. This breaks the ACPI video code,
leading to non working backlight control in some cases.

Add a type.backlight flag, mark ACPI video bus devices with this and make
find_child_checks() return a higher score for children with this flag set,
so that it picks the right companion-device.

Fixes: 63f534b8bad9 ("ACPI: PCI: Rework acpi_get_pci_dev()")
Co-developed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: 6.1+ <stable@vger.kernel.org> # 6.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
21 months agoACPI: video: Allow selecting NVidia-WMI-EC or Apple GMUX backlight from the cmdline
Hans de Goede [Mon, 9 Jan 2023 19:18:11 +0000 (20:18 +0100)]
ACPI: video: Allow selecting NVidia-WMI-EC or Apple GMUX backlight from the cmdline

The patches adding NVidia-WMI-EC and Apple GMUX backlight detection
support to acpi_video_get_backlight_type(), forgot to update
acpi_video_parse_cmdline() to allow manually selecting these from
the commandline.

Add support for these to acpi_video_parse_cmdline().

Fixes: fe7aebb40d42 ("ACPI: video: Add Nvidia WMI EC brightness control detection (v3)")
Fixes: 21245df307cb ("ACPI: video: Add Apple GMUX brightness control detection")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
21 months agoACPI: resource: Skip IRQ override on Asus Expertbook B2402CBA
Tamim Khan [Fri, 30 Dec 2022 05:58:39 +0000 (00:58 -0500)]
ACPI: resource: Skip IRQ override on Asus Expertbook B2402CBA

Like the Asus Expertbook B2502CBA and various Asus Vivobook laptops,
the Asus Expertbook B2402CBA has an ACPI DSDT table that describes IRQ 1
as ActiveLow while the kernel overrides it to Edge_High. This prevents the
keyboard from working. To fix this issue, add this laptop to the
skip_override_table so that the kernel does not override IRQ 1.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216864
Tested-by: zelenat <zelenat@gmail.com>
Signed-off-by: Tamim Khan <tamim@fusetak.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
21 months agox86/resctrl: Fix event counts regression in reused RMIDs
Peter Newman [Tue, 20 Dec 2022 16:41:31 +0000 (17:41 +0100)]
x86/resctrl: Fix event counts regression in reused RMIDs

When creating a new monitoring group, the RMID allocated for it may have
been used by a group which was previously removed. In this case, the
hardware counters will have non-zero values which should be deducted
from what is reported in the new group's counts.

resctrl_arch_reset_rmid() initializes the prev_msr value for counters to
0, causing the initial count to be charged to the new group. Resurrect
__rmid_read() and use it to initialize prev_msr correctly.

Unlike before, __rmid_read() checks for error bits in the MSR read so
that callers don't need to.

Fixes: 1d81d15db39c ("x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221220164132.443083-1-peternewman@google.com
21 months agox86/resctrl: Fix task CLOSID/RMID update race
Peter Newman [Tue, 20 Dec 2022 16:11:23 +0000 (17:11 +0100)]
x86/resctrl: Fix task CLOSID/RMID update race

When the user moves a running task to a new rdtgroup using the task's
file interface or by deleting its rdtgroup, the resulting change in
CLOSID/RMID must be immediately propagated to the PQR_ASSOC MSR on the
task(s) CPUs.

x86 allows reordering loads with prior stores, so if the task starts
running between a task_curr() check that the CPU hoisted before the
stores in the CLOSID/RMID update then it can start running with the old
CLOSID/RMID until it is switched again because __rdtgroup_move_task()
failed to determine that it needs to be interrupted to obtain the new
CLOSID/RMID.

Refer to the diagram below:

CPU 0                                   CPU 1
-----                                   -----
__rdtgroup_move_task():
  curr <- t1->cpu->rq->curr
                                        __schedule():
                                          rq->curr <- t1
                                        resctrl_sched_in():
                                          t1->{closid,rmid} -> {1,1}
  t1->{closid,rmid} <- {2,2}
  if (curr == t1) // false
   IPI(t1->cpu)

A similar race impacts rdt_move_group_tasks(), which updates tasks in a
deleted rdtgroup.

In both cases, use smp_mb() to order the task_struct::{closid,rmid}
stores before the loads in task_curr().  In particular, in the
rdt_move_group_tasks() case, simply execute an smp_mb() on every
iteration with a matching task.

It is possible to use a single smp_mb() in rdt_move_group_tasks(), but
this would require two passes and a means of remembering which
task_structs were updated in the first loop. However, benchmarking
results below showed too little performance impact in the simple
approach to justify implementing the two-pass approach.

Times below were collected using `perf stat` to measure the time to
remove a group containing a 1600-task, parallel workload.

CPU: Intel(R) Xeon(R) Platinum P-8136 CPU @ 2.00GHz (112 threads)

  # mkdir /sys/fs/resctrl/test
  # echo $$ > /sys/fs/resctrl/test/tasks
  # perf bench sched messaging -g 40 -l 100000

task-clock time ranges collected using:

  # perf stat rmdir /sys/fs/resctrl/test

Baseline:                     1.54 - 1.60 ms
smp_mb() every matching task: 1.57 - 1.67 ms

  [ bp: Massage commit message. ]

Fixes: ae28d1aae48a ("x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR")
Fixes: 0efc89be9471 ("x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221220161123.432120-1-peternewman@google.com
21 months agoio_uring/fdinfo: include locked hash table in fdinfo output
Jens Axboe [Tue, 10 Jan 2023 17:24:52 +0000 (10:24 -0700)]
io_uring/fdinfo: include locked hash table in fdinfo output

A previous commit split the hash table for polled requests into two
parts, but didn't get the fdinfo output updated. This means that it's
less useful for debugging, as we may think a given request is not pending
poll.

Fix this up by dumping the locked hash table contents too.

Fixes: 9ca9fb24d5fe ("io_uring: mutex locked poll hashing")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
21 months agox86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
Juergen Gross [Tue, 10 Jan 2023 06:54:27 +0000 (07:54 +0100)]
x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case

Since

  72cbc8f04fe2 ("x86/PAT: Have pat_enabled() properly reflect state when running on Xen")

PAT can be enabled without MTRR.

This has resulted in problems e.g. for a SEV-SNP guest running under Hyper-V,
when trying to establish a new mapping via memremap() with WB caching mode, as
pat_x_mtrr_type() will call mtrr_type_lookup(), which in turn is returning
MTRR_TYPE_INVALID due to MTRR being disabled in this configuration.

The result is a mapping with UC- caching, leading to severe performance
degradation.

Fix that by handling MTRR_TYPE_INVALID the same way as MTRR_TYPE_WRBACK
in pat_x_mtrr_type() because MTRR_TYPE_INVALID means MTRRs are disabled.

  [ bp: Massage commit message. ]

Fixes: 72cbc8f04fe2 ("x86/PAT: Have pat_enabled() properly reflect state when running on Xen")
Reported-by: Michael Kelley (LINUX) <mikelley@microsoft.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230110065427.20767-1-jgross@suse.com
21 months agodrm/i915/gt: Reset twice
Chris Wilson [Mon, 12 Dec 2022 16:13:38 +0000 (17:13 +0100)]
drm/i915/gt: Reset twice

After applying an engine reset, on some platforms like Jasperlake, we
occasionally detect that the engine state is not cleared until shortly
after the resume. As we try to resume the engine with volatile internal
state, the first request fails with a spurious CS event (it looks like
it reports a lite-restore to the hung context, instead of the expected
idle->active context switch).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221212161338.1007659-1-andi.shyti@linux.intel.com
(cherry picked from commit 3db9d590557da3aa2c952f2fecd3e9b703dad790)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
21 months agoperf bpf: Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1
Arnaldo Carvalho de Melo [Tue, 10 Jan 2023 14:46:37 +0000 (11:46 -0300)]
perf bpf: Avoid build breakage with libbpf < 0.8.0 + LIBBPF_DYNAMIC=1

In 746bd29e348f99b4 ("perf build: Use tools/lib headers from install
path") we stopped having the tools/lib/ directory from the kernel
sources in the header include path unconditionally, which breaks the
build on systems with older versions of libbpf-devel, in this case 0.7.0
as some of the structures and function declarations present in the newer
version of libbpf included in the kernel sources (tools/lib/bpf) are not
anymore used, just the ones in the system libbpf.

So instead of trying to provide alternative functions when the
libbpf-bpf_program__set_insns feature test fails, fail a
LIBBPF_DYNAMIC=1 build (requesting the use of the system's libbpf) and
emit this build error message:

  $ make LIBBPF_DYNAMIC=1 -C tools/perf
  Makefile.config:593: *** Error: libbpf devel library needs to be >= 0.8.0 to build with LIBBPF_DYNAMIC, update or build statically with the version that comes with the kernel sources.  Stop.
  $

For v6.3 these tests will be revamped and we'll require libbpf 1.0 as a
minimal version for using LIBBPF_DYNAMIC=1, most distros should have it
by now or at v6.3 time.

Fixes: 746bd29e348f99b4 ("perf build: Use tools/lib headers from install path")
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/CAP-5=fVa51_URGsdDFVTzpyGmdDRj_Dj2EKPuDHNQ0BYgMSzUA@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
21 months agoperf build: Fix build error when NO_LIBBPF=1
Ian Rogers [Fri, 6 Jan 2023 15:13:20 +0000 (07:13 -0800)]
perf build: Fix build error when NO_LIBBPF=1

The $(LIBBPF) target should only be a dependency of prepare if the
static version of libbpf is needed. Add a new LIBBPF_STATIC variable
that is set by Makefile.config. Use LIBBPF_STATIC to determine whether
the CFLAGS, etc. need updating and for adding $(LIBBPF) as a prepare
dependency.

As Makefile.config isn't loaded for "clean" as a target, always set
LIBBPF_OUTPUT regardless of whether it is needed for $(LIBBPF). This
is done to minimize conditional logic for $(LIBBPF)-clean.

This issue and an original fix was reported by Mike Leach in:
https://lore.kernel.org/lkml/20230105172243.7238-1-mike.leach@linaro.org/

Fixes: 746bd29e348f99b4 ("perf build: Use tools/lib headers from install path")
Reported-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20230106151320.619514-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
21 months agoperf tools: Don't install libtraceevent plugins as its not anymore in the kernel...
Arnaldo Carvalho de Melo [Mon, 9 Jan 2023 14:59:30 +0000 (11:59 -0300)]
perf tools: Don't install libtraceevent plugins as its not anymore in the kernel sources

While doing 'make -C tools/perf build-test' one can notice error
messages while trying to install libtraceevent plugins, stop doing that
as libtraceevent isn't anymore a homie.

These are the warnings dealt with:

   make_install_prefix_slash_O: make install prefix=/tmp/krava/
    failed to find: /tmp/krava/etc/bash_completion.d/perf
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_cfg80211.so
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_scsi.so
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_xen.so
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_function.so
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_sched_switch.so
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_mac80211.so
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_kvm.so
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_kmem.so
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_hrtimer.so
    failed to find: /tmp/krava/lib64/traceevent/plugins/plugin_jbd2.so

Fixes: 4171925aa9f3f7bf ("tools lib traceevent: Remove libtraceevent")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lore.kernel.org/lkml/Y7xXz+TSpiCbQGjw@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
21 months agoperf kmem: Support field "node" in evsel__process_alloc_event() coping with recent...
Leo Yan [Sun, 8 Jan 2023 06:24:00 +0000 (14:24 +0800)]
perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring

Commit 11e9734bcb6a7361 ("mm/slab_common: unify NUMA and UMA version of
tracepoints") adds the field "node" into the tracepoints 'kmalloc' and
'kmem_cache_alloc', so this patch modifies the event process function to
support the field "node".

If field "node" is detected by checking function evsel__field(), it
stats the cross allocation.

When the "node" value is NUMA_NO_NODE (-1), it means the memory can be
allocated from any memory node, in this case, we don't account it as a
cross allocation.

Fixes: 11e9734bcb6a7361 ("mm/slab_common: unify NUMA and UMA version of tracepoints")
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20230108062400.250690-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
21 months agoperf kmem: Support legacy tracepoints
Leo Yan [Sun, 8 Jan 2023 06:23:59 +0000 (14:23 +0800)]
perf kmem: Support legacy tracepoints

Commit 11e9734bcb6a7361 ("mm/slab_common: unify NUMA and UMA version of
tracepoints") removed tracepoints 'kmalloc_node' and
'kmem_cache_alloc_node', we need to consider the tool should be backward
compatible.

If it detect the tracepoint "kmem:kmalloc_node", this patch enables the
legacy tracepoints, otherwise, it will ignore them.

Fixes: 11e9734bcb6a7361 ("mm/slab_common: unify NUMA and UMA version of tracepoints")
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20230108062400.250690-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
21 months agoperf build: Properly guard libbpf includes
Ian Rogers [Fri, 6 Jan 2023 15:13:19 +0000 (07:13 -0800)]
perf build: Properly guard libbpf includes

Including libbpf header files should be guarded by HAVE_LIBBPF_SUPPORT.
In bpf_counter.h, move the skeleton utilities under HAVE_BPF_SKEL.

Fixes: d6a735ef3277c45f ("perf bpf_counter: Move common functions to bpf_counter.h")
Reported-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20230105172243.7238-1-mike.leach@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
21 months agos390/kexec: fix ipl report address for kdump
Alexander Egorenkov [Mon, 14 Nov 2022 10:40:08 +0000 (11:40 +0100)]
s390/kexec: fix ipl report address for kdump

This commit addresses the following erroneous situation with file-based
kdump executed on a system with a valid IPL report.

On s390, a kdump kernel, its initrd and IPL report if present are loaded
into a special and reserved on boot memory region - crashkernel. When
a system crashes and kdump was activated before, the purgatory code
is entered first which swaps the crashkernel and [0 - crashkernel size]
memory regions. Only after that the kdump kernel is entered. For this
reason, the pointer to an IPL report in lowcore must point to the IPL report
after the swap and not to the address of the IPL report that was located in
crashkernel memory region before the swap. Failing to do so, makes the
kdump's decompressor try to read memory from the crashkernel memory region
which already contains the production's kernel memory.

The situation described above caused spontaneous kdump failures/hangs
on systems where the Secure IPL is activated because on such systems
an IPL report is always present. In that case kdump's decompressor tried
to parse an IPL report which frequently lead to illegal memory accesses
because an IPL report contains addresses to various data.

Cc: <stable@vger.kernel.org>
Fixes: 99feaa717e55 ("s390/kexec_file: Create ipl report and pass to next kernel")
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
21 months agoASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets
Mark Brown [Fri, 6 Jan 2023 23:15:07 +0000 (23:15 +0000)]
ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets

The fsl-asoc-card AC'97 support currently tries to route to Playback and
Capture widgets provided by the AC'97 CODEC. This doesn't work since the
generic AC'97 driver registers with an "AC97" at the front of the stream
and hence widget names, update to reflect reality. It's not clear to me
if or how this ever worked.

Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230106-asoc-udoo-probe-v1-2-a5d7469d4f67@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
21 months agoASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC
Mark Brown [Fri, 6 Jan 2023 23:15:06 +0000 (23:15 +0000)]
ASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC

The SSI driver calls the AC'97 playback and transmit streams "AC97 Playback"
and "AC97 Capture" respectively. This is the same name used by the generic
AC'97 CODEC driver in ASoC, creating confusion for the Freescale ASoC card
when it attempts to use these widgets in routing. Add a "CPU" in the name
like the regular DAIs registered by the driver to disambiguate.

Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230106-asoc-udoo-probe-v1-1-a5d7469d4f67@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
21 months agox86/boot: Avoid using Intel mnemonics in AT&T syntax asm
Peter Zijlstra [Tue, 10 Jan 2023 11:15:40 +0000 (12:15 +0100)]
x86/boot: Avoid using Intel mnemonics in AT&T syntax asm

With 'GNU assembler (GNU Binutils for Debian) 2.39.90.20221231' the
build now reports:

  arch/x86/realmode/rm/../../boot/bioscall.S: Assembler messages:
  arch/x86/realmode/rm/../../boot/bioscall.S:35: Warning: found `movsd'; assuming `movsl' was meant
  arch/x86/realmode/rm/../../boot/bioscall.S:70: Warning: found `movsd'; assuming `movsl' was meant

  arch/x86/boot/bioscall.S: Assembler messages:
  arch/x86/boot/bioscall.S:35: Warning: found `movsd'; assuming `movsl' was meant
  arch/x86/boot/bioscall.S:70: Warning: found `movsd'; assuming `movsl' was meant

Which is due to:

  PR gas/29525

  Note that with the dropped CMPSD and MOVSD Intel Syntax string insn
  templates taking operands, mixed IsString/non-IsString template groups
  (with memory operands) cannot occur anymore. With that
  maybe_adjust_templates() becomes unnecessary (and is hence being
  removed).

More details: https://sourceware.org/bugzilla/show_bug.cgi?id=29525

Borislav Petkov further explains:

  " the particular problem here is is that the 'd' suffix is
    "conflicting" in the sense that you can have SSE mnemonics like movsD %xmm...
    and the same thing also for string ops (which is the case here) so apparently
    the agreement in binutils land is to use the always accepted suffixes 'l' or 'q'
    and phase out 'd' slowly... "

Fixes: 7a734e7dd93b ("x86, setup: "glove box" BIOS calls -- infrastructure")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y71I3Ex2pvIxMpsP@hirez.programming.kicks-ass.net
21 months agoMerge tag '6.2-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Tue, 10 Jan 2023 11:34:13 +0000 (05:34 -0600)]
Merge tag '6.2-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull ksmb server fixes from Steve French:

 - fix possible infinite loop in socket handler

 - fix possible panic in ntlmv2 authentication

 - fix error handling on tree connect

* tag '6.2-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix infinite loop in ksmbd_conn_handler_loop()
  ksmbd: check nt_len to be at least CIFS_ENCPWD_SIZE in ksmbd_decode_ntlmssp_auth_blob
  ksmbd: send proper error response in smb2_tree_connect()

21 months agosh/mm: Fix pmd_t for real
Peter Zijlstra [Tue, 10 Jan 2023 10:44:51 +0000 (11:44 +0100)]
sh/mm: Fix pmd_t for real

Because typing is hard...

Fixes: 0862ff059c9e ("sh/mm: Make pmd_t similar to pte_t")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
21 months agodrm/amdgpu: fix pipeline sync v2
Christian König [Mon, 9 Jan 2023 08:06:03 +0000 (09:06 +0100)]
drm/amdgpu: fix pipeline sync v2

This fixes a potential memory leak of dma_fence objects in the CS code
as well as glitches in firefox because of missing pipeline sync.

v2: use the scheduler instead of the fence context

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2323
Tested-by: Michal Kubecek mkubecek@suse.cz
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230109130120.73389-1-christian.koenig@amd.com
21 months agoocteontx2-pf: Fix resource leakage in VF driver unbind
Hariprasad Kelam [Mon, 9 Jan 2023 06:13:25 +0000 (11:43 +0530)]
octeontx2-pf: Fix resource leakage in VF driver unbind

resources allocated like mcam entries to support the Ntuple feature
and hash tables for the tc feature are not getting freed in driver
unbind. This patch fixes the issue.

Fixes: 2da489432747 ("octeontx2-pf: devlink params support to set mcam entry count")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20230109061325.21395-1-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
21 months agoMerge branch 'selftests-net-isolate-l2_tos_ttl_inherit-sh-in-its-own-netns'
Paolo Abeni [Tue, 10 Jan 2023 09:13:52 +0000 (10:13 +0100)]
Merge branch 'selftests-net-isolate-l2_tos_ttl_inherit-sh-in-its-own-netns'

Guillaume Nault says:

====================
selftests/net: Isolate l2_tos_ttl_inherit.sh in its own netns.

l2_tos_ttl_inherit.sh uses a veth pair to run its tests, but only one
of the veth interfaces runs in a dedicated netns. The other one remains
in the initial namespace where the existing network configuration can
interfere with the setup used for the tests.

Isolate both veth devices in their own netns and ensure everything gets
cleaned up when the script exits.

Link: https://lore.kernel.org/netdev/924f1062-ab59-9b88-3b43-c44e73a30387@alu.unizg.hr/
====================

Link: https://lore.kernel.org/r/cover.1673191942.git.gnault@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
21 months agoselftests/net: l2_tos_ttl_inherit.sh: Ensure environment cleanup on failure.
Guillaume Nault [Sun, 8 Jan 2023 15:45:50 +0000 (16:45 +0100)]
selftests/net: l2_tos_ttl_inherit.sh: Ensure environment cleanup on failure.

Use 'set -e' and an exit handler to stop the script if a command fails
and ensure the test environment is cleaned up in any case. Also, handle
the case where the script is interrupted by SIGINT.

The only command that's expected to fail is 'wait $ping_pid', since
it's killed by the script. Handle this case with '|| true' to make it
play well with 'set -e'.

Finally, return the Kselftest SKIP code (4) when the script breaks
because of an environment problem or a command line failure. The 0 and
1 return codes should now reliably indicate that all tests have been
run (0: all tests run and passed, 1: all tests run but at least one
failed, 4: test script didn't run completely).

Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
21 months agoselftests/net: l2_tos_ttl_inherit.sh: Run tests in their own netns.
Guillaume Nault [Sun, 8 Jan 2023 15:45:46 +0000 (16:45 +0100)]
selftests/net: l2_tos_ttl_inherit.sh: Run tests in their own netns.

This selftest currently runs half in the current namespace and half in
a netns of its own. Therefore, the test can fail if the current
namespace is already configured with incompatible parameters (for
example if it already has a veth0 interface).

Adapt the script to put both ends of the veth pair in their own netns.
Now veth0 is created in NS0 instead of the current namespace, while
veth1 is set up in NS1 (instead of the 'testing' netns).

The user visible netns names are randomised to minimise the risk of
conflicts with already existing namespaces. The cleanup() function
doesn't need to remove the virtual interface anymore: deleting NS0 and
NS1 automatically removes the virtual interfaces they contained.

We can remove $ns, which was only used to run ip commands in the
'testing' netns (let's use the builtin "-netns" option instead).
However, we still need a similar functionality as ping and tcpdump
now need to run in NS0. So we now have $RUN_NS0 for that.

Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
21 months agoselftests/net: l2_tos_ttl_inherit.sh: Set IPv6 addresses with "nodad".
Guillaume Nault [Sun, 8 Jan 2023 15:45:41 +0000 (16:45 +0100)]
selftests/net: l2_tos_ttl_inherit.sh: Set IPv6 addresses with "nodad".

The ping command can run before DAD completes. In that case, ping may
fail and break the selftest.

We don't need DAD here since we're working on isolated device pairs.

Fixes: b690842d12fd ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
21 months agoALSA: hda/hdmi: Add a HP device 0x8715 to force connect list
Adrian Chan [Mon, 9 Jan 2023 21:05:20 +0000 (16:05 -0500)]
ALSA: hda/hdmi: Add a HP device 0x8715 to force connect list

Add the 'HP Engage Flex Mini' device to the force connect list to
enable audio through HDMI.

Signed-off-by: Adrian Chan <adchan@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230109210520.16060-1-adchan@google.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
21 months agoMAINTAINERS: stop nvme matching for nvmem files
Russell King (Oracle) [Tue, 3 Jan 2023 17:02:05 +0000 (17:02 +0000)]
MAINTAINERS: stop nvme matching for nvmem files

The nvme patterns detect all include files starting with nvme, which
also picks up the nvmem subsystem header files. Fix this by using
a more specific pattern.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[hch: switched to a purely inclusive pattern instead of excluding nvmem*]
Signed-off-by: Christoph Hellwig <hch@lst.de>
21 months agonvme: don't allow unprivileged passthrough on partitions
Christoph Hellwig [Sun, 8 Jan 2023 06:56:54 +0000 (07:56 +0100)]
nvme: don't allow unprivileged passthrough on partitions

Passthrough commands can always access the entire device, and thus
submitting them on partitions is an privelege escalation.

In hindsight we should have never allowed any passthrough commands on
partitions, but it's probably too late to change that decision now.

Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
21 months agonvme: replace the "bool vec" arguments with flags in the ioctl path
Christoph Hellwig [Sun, 8 Jan 2023 06:53:03 +0000 (07:53 +0100)]
nvme: replace the "bool vec" arguments with flags in the ioctl path

To prepare for passing down more information, replace the boolean
vec argument with a more extensible flags one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
21 months agonvme: remove __nvme_ioctl
Christoph Hellwig [Sun, 8 Jan 2023 06:45:30 +0000 (07:45 +0100)]
nvme: remove __nvme_ioctl

Open code __nvme_ioctl in the two callers to make future changes that
pass down additional paramters in the ioctl path easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
21 months agonvme-pci: fix error handling in nvme_pci_enable()
Tong Zhang [Thu, 29 Dec 2022 18:37:31 +0000 (10:37 -0800)]
nvme-pci: fix error handling in nvme_pci_enable()

There are two issues in nvme_pci_enable():

 1) If pci_alloc_irq_vectors() fails, device is left enabled. Fix this by
    adding a goto disable statement.
 2) nvme_pci_configure_admin_queue could return -ENODEV, in this case,
    we will need to free IRQ properly.  Otherwise the following warning
    could be triggered:

[    5.286752] WARNING: CPU: 0 PID: 33 at kernel/irq/irqdomain.c:253 irq_domain_remove+0x12d/0x140
[    5.290547] Call Trace:
[    5.290626]  <TASK>
[    5.290695]  msi_remove_device_irq_domain+0xc9/0xf0
[    5.290843]  msi_device_data_release+0x15/0x80
[    5.290978]  release_nodes+0x58/0x90
[    5.293788] WARNING: CPU: 0 PID: 33 at kernel/irq/msi.c:276 msi_device_data_release+0x76/0x80
[    5.297573] Call Trace:
[    5.297651]  <TASK>
[    5.297719]  release_nodes+0x58/0x90
[    5.297831]  devres_release_all+0xef/0x140
[    5.298339]  device_unbind_cleanup+0x11/0xc0
[    5.298479]  really_probe+0x296/0x320

Fixes: a6ee7f19ebfd ("nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable")
Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
21 months agonvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers
Hector Martin [Wed, 4 Jan 2023 10:16:42 +0000 (19:16 +0900)]
nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers

This mirrors the quirk added to Apple Silicon controllers in apple.c.
These controllers do not support the Active NS ID List command and
behave identically to the SoC version judging by existing user
reports/syslogs, so will need the same fix. This quirk reverts
back to NVMe 1.0 behavior and disables the broken commands.

Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues")
Signed-off-by: Hector Martin <marcan@marcan.st>
Tested-by: Orlando Chamberlain <orlandoch.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
21 months agonvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression
Hector Martin [Wed, 4 Jan 2023 09:21:49 +0000 (18:21 +0900)]
nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression

From the get-go, this driver and the ANS syslog have been complaining
about namespace identification. In 6.2-rc1, commit 811f4de0344d ("nvme:
avoid fallback to sequential scan due to transient issues") regressed
the driver by no longer allowing fallback to sequential namespace scans,
leaving us with no namespaces.

It turns out that the real problem is that this controller claiming
NVMe 1.1 compat is treating the CNS field as a binary field, as in NVMe
1.0. This already has a quirk, NVME_QUIRK_IDENTIFY_CNS, so set it for
the controller to fix all this nonsense (including other errors
triggered by other CNS commands).

Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues")
Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver")
Signed-off-by: Hector Martin <marcan@marcan.st>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Christoph Hellwig <hch@lst.de>
21 months agonet/mlx5e: Fix macsec possible null dereference when updating MAC security entity...
Emeel Hakim [Sun, 11 Dec 2022 11:22:23 +0000 (13:22 +0200)]
net/mlx5e: Fix macsec possible null dereference when updating MAC security entity (SecY)

Upon updating MAC security entity (SecY) in hw offload path, the macsec
security association (SA) initialization routine is called. In case of
extended packet number (epn) is enabled the salt and ssci attributes are
retrieved using the MACsec driver rx_sa context which is unavailable when
updating a SecY property such as encoding-sa hence the null dereference.
Fix by using the provided SA to set those attributes.

Fixes: 4411a6c0abd3 ("net/mlx5e: Support MACsec offload extended packet number (EPN)")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
21 months agonet/mlx5e: Fix macsec ssci attribute handling in offload path
Emeel Hakim [Wed, 14 Dec 2022 14:34:13 +0000 (16:34 +0200)]
net/mlx5e: Fix macsec ssci attribute handling in offload path

Currently when macsec offload is set with extended packet number (epn)
enabled, the driver wrongly deduce the short secure channel identifier
(ssci) from the salt instead of the stand alone ssci attribute as it
should, consequently creating a mismatch between the kernel and driver's
ssci values.
Fix by using the ssci value from the relevant attribute.

Fixes: 4411a6c0abd3 ("net/mlx5e: Support MACsec offload extended packet number (EPN)")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
21 months agonet/mlx5: E-switch, Coverity: overlapping copy
Shay Drory [Thu, 15 Dec 2022 13:33:38 +0000 (15:33 +0200)]
net/mlx5: E-switch, Coverity: overlapping copy

When a capability is set via port function caps callbacks, a memcpy() is
performed in which the source and the target are the same address, e.g.:
the copy is redundant. Hence, Remove it.
Discovered by Coverity.

Fixes: 7db98396ef45 ("net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE")
Fixes: e5b9642a33be ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>