review.tizen.org Git - platform/kernel/linux-starfive.git/log

io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr

Rather than assign the user pointer to msghdr->msg_control, assign it
to msghdr->msg_control_user to make sparse happy. They are in a union
so the end result is the same, but let's avoid new sparse warnings and
squash this one.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/net: disable partial retries for recvmsg with cmsg

We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.

Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.

Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/net: clear msg_controllen on partial sendmsg retry

If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.

Cc: stable@vger.kernel.org # 5.10+
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/poll: serialize poll linked timer start with poll removal

We selectively grab the ctx->uring_lock for poll update/removal, but
we really should grab it from the start to fully synchronize with
linked timeouts. Normally this is indeed the case, but if requests
are forced async by the application, we don't fully cover removal
and timer disarm within the uring_lock.

Make this simpler by having consistent locking state for poll removal.

Cc: stable@vger.kernel.org # 6.1+
Reported-by: Querijn Voet <querijnqyn@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: clear current->worker_private on exit

A recent fix stopped clearing PF_IO_WORKER from current->flags on exit,
which meant that we can now call inc/dec running on the worker after it
has been removed if it ends up scheduling in/out as part of exit.

If this happens after an RCU grace period has passed, then the struct
pointed to by current->worker_private may have been freed, and we can
now be accessing memory that is freed.

Ensure this doesn't happen by clearing the task worker_private field.
Both io_wq_worker_running() and io_wq_worker_sleeping() check this
field before going any further, and we don't need any accounting etc
done after this worker has exited.

Fixes: fd37b884003c ("io_uring/io-wq: don't clear PF_IO_WORKER on exit")
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/net: save msghdr->msg_control for retries

If the application sets ->msg_control and we have to later retry this
command, or if it got queued with IOSQE_ASYNC to begin with, then we
need to retain the original msg_control value. This is due to the net
stack overwriting this field with an in-kernel pointer, to copy it
in. Hitting that path for the second time will now fail the copy from
user, as it's attempting to copy from a non-user address.

Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/issues/880
Reported-and-tested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'nios2_fix_v6.4' of git://git./linux/kernel/git/dinguyen/linux

Pull NIOS2 dts fix from Dinh Nguyen:

- Fix tse_mac "max-frame-size" property

* tag 'nios2_fix_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: dts: Fix tse_mac "max-frame-size" property

nios2: dts: Fix tse_mac "max-frame-size" property

The given value of 1518 seems to refer to the layer 2 ethernet frame
size without 802.1Q tag. Actual use of the "max-frame-size" including in
the consumer of the "altr,tse-1.0" compatible is the MTU.

Fixes: 95acd4c7b69c ("nios2: Device tree support")
Fixes: 61c610ec61bb ("nios2: Add Max10 device tree")
Cc: <stable@vger.kernel.org>
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>

Merge tag 'devicetree-fixes-for-6.4-3' of git://git./linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Fix missing of_node_put() in init_overlay_changeset()

- Fix schema for qcom,pmic-mpp "qcom,paired" property

- Fix 'additionalProperties' in silvaco,i3c-master binding

- usage-model.rst: Use documented "arm,primecell" compatible string

- Update Damien Le Moal's email address

- Fixes in Realtek Bluetooth binding

* tag 'devicetree-fixes-for-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: pinctrl: qcom,pmic-mpp: Fix schema for "qcom,paired"
  dt-bindings: i3c: silvaco,i3c-master: fix missing schema restriction
  of: overlay: Fix missing of_node_put() in error case of init_overlay_changeset()
  docs: zh_CN/devicetree: sync usage-model fix
  docs: dt: fix documented Primecell compatible string
  dt-bindings: Change Damien Le Moal's contact email
  dt-bindings: net: realtek-bluetooth: Fix double RTL8723CS in desc
  dt-bindings: net: realtek-bluetooth: Fix RTL8821CS binding

dt-bindings: pinctrl: qcom,pmic-mpp: Fix schema for "qcom,paired"

The "qcom,paired" schema is all wrong. First, it's a list rather than an
object(dictionary). Second, it is missing a required type. The meta-schema
normally catches this, but schemas under "$defs" was not getting checked.
A fix for that is pending.

Fixes: f9a06b810951 ("dt-bindings: pinctrl: qcom,pmic-mpp: Convert qcom pmic mpp bindings to YAML")
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230418150606.1528107-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>

Merge tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git./linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"19 hotfixes. 14 are cc:stable and the remainder address issues which
  were introduced during this development cycle or which were considered
  inappropriate for a backport"

* tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  zswap: do not shrink if cgroup may not zswap
  page cache: fix page_cache_next/prev_miss off by one
  ocfs2: check new file size on fallocate call
  mailmap: add entry for John Keeping
  mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()
  epoll: ep_autoremove_wake_function should use list_del_init_careful
  mm/gup_test: fix ioctl fail for compat task
  nilfs2: reject devices with insufficient block count
  ocfs2: fix use-after-free when unmounting read-only filesystem
  lib/test_vmalloc.c: avoid garbage in page array
  nilfs2: fix possible out-of-bounds segment allocation in resize ioctl
  riscv/purgatory: remove PGO flags
  powerpc/purgatory: remove PGO flags
  x86/purgatory: remove PGO flags
  kexec: support purgatories with .text.hot sections
  mm/uffd: allow vma to merge as much as possible
  mm/uffd: fix vma operation where start addr cuts part of vma
  radix-tree: move declarations to header
  nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()

zswap: do not shrink if cgroup may not zswap

Before storing a page, zswap first checks if the number of stored pages
exceeds the limit specified by memory.zswap.max, for each cgroup in the
hierarchy. If this limit is reached or exceeded, then zswap shrinking is
triggered and short-circuits the store attempt.

However, since the zswap's LRU is not memcg-aware, this can create the
following pathological behavior: the cgroup whose zswap limit is 0 will
evict pages from other cgroups continually, without lowering its own zswap
usage. This means the shrinking will continue until the need for swap
ceases or the pool becomes empty.

As a result of this, we observe a disproportionate amount of zswap
writeback and a perpetually small zswap pool in our experiments, even
though the pool limit is never hit.

More generally, a cgroup might unnecessarily evict pages from other
cgroups before we drive the memcg back below its limit.

This patch fixes the issue by rejecting zswap store attempt without
shrinking the pool when obj_cgroup_may_zswap() returns false.

[akpm@linux-foundation.org: fix return of unintialized value]
[akpm@linux-foundation.org: s/ENOSPC/ENOMEM/]
Link: https://lkml.kernel.org/r/20230530222440.2777700-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20230530232435.3097106-1-nphamcs@gmail.com
Fixes: f4840ccfca25 ("zswap: memcg accounting")
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

page cache: fix page_cache_next/prev_miss off by one

Ackerley Tng reported an issue with hugetlbfs fallocate here[1].  The
issue showed up after the conversion of hugetlb page cache lookup code to
use page_cache_next_miss.  Code in hugetlb fallocate, userfaultfd and GUP
is now using page_cache_next_miss to determine if a page is present the
page cache.  The following statement is used.

present = page_cache_next_miss(mapping, index, 1) != index;

There are two issues with page_cache_next_miss when used in this way.
1) If the passed value for index is equal to the 'wrap-around' value,
   the same index will always be returned.  This wrap-around value is 0,
   so 0 will be returned even if page is present at index 0.
2) If there is no gap in the range passed, the last index in the range
   will be returned.  When passed a range of 1 as above, the passed
   index value will be returned even if the page is present.
The end result is the statement above will NEVER indicate a page is
present in the cache, even if it is.

As noted by Ackerley in [1], users can see this by hugetlb fallocate
incorrectly returning EEXIST if pages are already present in the file.  In
addition, hugetlb pages will not be included in core dumps if they need to
be brought in via GUP.  userfaultfd UFFDIO_COPY also uses this code and
will not notice pages already present in the cache.  It may try to
allocate a new page and potentially return ENOMEM as opposed to EEXIST.

Both page_cache_next_miss and page_cache_prev_miss have similar issues.
Fix by:
- Check for index equal to 'wrap-around' value and do not exit early.
- If no gap is found in range, return index outside range.
- Update function description to say 'wrap-around' value could be
  returned if passed as index.

[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/

Link: https://lkml.kernel.org/r/20230602225747.103865-2-mike.kravetz@oracle.com
Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ocfs2: check new file size on fallocate call

When changing a file size with fallocate() the new size isn't being
checked. In particular, the FSIZE ulimit isn't being checked, which makes
fstest generic/228 fail. Simply adding a call to inode_newsize_ok() fixes
this issue.

Link: https://lkml.kernel.org/r/20230529152645.32680-1-lhenriques@suse.de
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Mark Fasheh <mark@fasheh.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mailmap: add entry for John Keeping

Map my corporate address to my personal one, as I am leaving the
company.

Link: https://lkml.kernel.org/r/20230531144839.1157112-1-john@keeping.me.uk
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()

If 'aggr_interval' is smaller than 'sample_interval', max_nr_accesses in
damon_nr_accesses_to_accesses_bp() becomes zero which leads to divide
error, let's validate the values of them in damon_set_attrs() to fix it,
which similar to others attrs check.

Link: https://lkml.kernel.org/r/20230527032101.167788-1-wangkefeng.wang@huawei.com
Fixes: 2f5bef5a590b ("mm/damon/core: update monitoring results for new monitoring attributes")
Reported-by: syzbot+841a46899768ec7bec67@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=841a46899768ec7bec67
Link: https://lore.kernel.org/damon/00000000000055fc4e05fc975bc2@google.com/
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

epoll: ep_autoremove_wake_function should use list_del_init_careful

autoremove_wake_function uses list_del_init_careful, so should epoll's
more aggressive variant. It only doesn't because it was copied from an
older wait.c rather than the most recent.

[bsegall@google.com: add comment]
Link: https://lkml.kernel.org/r/xm26bki0ulsr.fsf_-_@google.com
Link: https://lkml.kernel.org/r/xm26pm6hvfer.fsf@google.com
Fixes: a16ceb139610 ("epoll: autoremove wakers even more aggressively")
Signed-off-by: Ben Segall <bsegall@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/gup_test: fix ioctl fail for compat task

When tools/testing/selftests/mm/gup_test.c is compiled as 32bit, then run
on arm64 kernel, it reports "ioctl: Inappropriate ioctl for device".

Fix it by filling compat_ioctl in gup_test_fops

Link: https://lkml.kernel.org/r/20230526022125.175728-1-haibo.li@mediatek.com
Signed-off-by: Haibo Li <haibo.li@mediatek.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nilfs2: reject devices with insufficient block count

The current sanity check for nilfs2 geometry information lacks checks for
the number of segments stored in superblocks, so even for device images
that have been destructively truncated or have an unusually high number of
segments, the mount operation may succeed.

This causes out-of-bounds block I/O on file system block reads or log
writes to the segments, the latter in particular causing
"a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to
hang.

Fix this issue by checking the number of segments stored in the superblock
and avoiding mounting devices that can cause out-of-bounds accesses. To
eliminate the possibility of overflow when calculating the number of
blocks required for the device from the number of segments, this also adds
a helper function to calculate the upper bound on the number of segments
and inserts a check using it.

Link: https://lkml.kernel.org/r/20230526021332.3431-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+7d50f1e54a12ba3aeae2@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=7d50f1e54a12ba3aeae2
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ocfs2: fix use-after-free when unmounting read-only filesystem

It's trivial to trigger a use-after-free bug in the ocfs2 quotas code using
fstest generic/452. After a read-only remount, quotas are suspended and
ocfs2_mem_dqinfo is freed through ->ocfs2_local_free_info(). When unmounting
the filesystem, an UAF access to the oinfo will eventually cause a crash.

BUG: KASAN: slab-use-after-free in timer_delete+0x54/0xc0
Read of size 8 at addr ffff8880389a8208 by task umount/669
...
Call Trace:
<TASK>
...
timer_delete+0x54/0xc0
try_to_grab_pending+0x31/0x230
__cancel_work_timer+0x6c/0x270
ocfs2_disable_quotas.isra.0+0x3e/0xf0 [ocfs2]
ocfs2_dismount_volume+0xdd/0x450 [ocfs2]
generic_shutdown_super+0xaa/0x280
kill_block_super+0x46/0x70
deactivate_locked_super+0x4d/0xb0
cleanup_mnt+0x135/0x1f0
...
</TASK>

Allocated by task 632:
kasan_save_stack+0x1c/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x8b/0x90
ocfs2_local_read_info+0xe3/0x9a0 [ocfs2]
dquot_load_quota_sb+0x34b/0x680
dquot_load_quota_inode+0xfe/0x1a0
ocfs2_enable_quotas+0x190/0x2f0 [ocfs2]
ocfs2_fill_super+0x14ef/0x2120 [ocfs2]
mount_bdev+0x1be/0x200
legacy_get_tree+0x6c/0xb0
vfs_get_tree+0x3e/0x110
path_mount+0xa90/0xe10
__x64_sys_mount+0x16f/0x1a0
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 650:
kasan_save_stack+0x1c/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x50
__kasan_slab_free+0xf9/0x150
__kmem_cache_free+0x89/0x180
ocfs2_local_free_info+0x2ba/0x3f0 [ocfs2]
dquot_disable+0x35f/0xa70
ocfs2_susp_quotas.isra.0+0x159/0x1a0 [ocfs2]
ocfs2_remount+0x150/0x580 [ocfs2]
reconfigure_super+0x1a5/0x3a0
path_mount+0xc8a/0xe10
__x64_sys_mount+0x16f/0x1a0
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Link: https://lkml.kernel.org/r/20230522102112.9031-1-lhenriques@suse.de
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/test_vmalloc.c: avoid garbage in page array

It turns out that alloc_pages_bulk_array() does not treat the page_array
parameter as an output parameter, but rather reads the array and skips any
entries that have already been allocated.

This is somewhat unexpected and breaks this test, as we allocate the pages
array uninitialised on the assumption it will be overwritten.

As a result, the test was referencing uninitialised data and causing the
PFN to not be valid and thus a WARN_ON() followed by a null pointer deref
and panic.

In addition, this is an array of pointers not of struct page objects, so we
need only allocate an array with elements of pointer size.

We solve both problems by simply using kcalloc() and referencing
sizeof(struct page *) rather than sizeof(struct page).

Link: https://lkml.kernel.org/r/20230524082424.10022-1-lstoakes@gmail.com
Fixes: 869cb29a61a1 ("lib/test_vmalloc.c: add vm_map_ram()/vm_unmap_ram() test case")
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nilfs2: fix possible out-of-bounds segment allocation in resize ioctl

Syzbot reports that in its stress test for resize ioctl, the log writing
function nilfs_segctor_do_construct hits a WARN_ON in
nilfs_segctor_truncate_segments().

It turned out that there is a problem with the current implementation of
the resize ioctl, which changes the writable range on the device (the
range of allocatable segments) at the end of the resize process.

This order is necessary for file system expansion to avoid corrupting the
superblock at trailing edge. However, in the case of a file system
shrink, if log writes occur after truncating out-of-bounds trailing
segments and before the resize is complete, segments may be allocated from
the truncated space.

The userspace resize tool was fine as it limits the range of allocatable
segments before performing the resize, but it can run into this issue if
the resize ioctl is called alone.

Fix this issue by changing nilfs_sufile_resize() to update the range of
allocatable segments immediately after successful truncation of segment
space in case of file system shrink.

Link: https://lkml.kernel.org/r/20230524094348.3784-1-konishi.ryusuke@gmail.com
Fixes: 4e33f9eab07e ("nilfs2: implement resize ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+33494cd0df2ec2931851@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000005434c405fbbafdc5@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

riscv/purgatory: remove PGO flags

If profile-guided optimization is enabled, the purgatory ends up with
multiple .text sections. This is not supported by kexec and crashes the
system.

Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-4-b05c520b7296@chromium.org
Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Cc: <stable@vger.kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

powerpc/purgatory: remove PGO flags

If profile-guided optimization is enabled, the purgatory ends up with
multiple .text sections. This is not supported by kexec and crashes the
system.

Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-3-b05c520b7296@chromium.org
Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: <stable@vger.kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

x86/purgatory: remove PGO flags

If profile-guided optimization is enabled, the purgatory ends up with
multiple .text sections. This is not supported by kexec and crashes the
system.

Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-2-b05c520b7296@chromium.org
Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Cc: <stable@vger.kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kexec: support purgatories with .text.hot sections

Patch series "kexec: Fix kexec_file_load for llvm16 with PGO", v7.

When upreving llvm I realised that kexec stopped working on my test
platform.

The reason seems to be that due to PGO there are multiple .text sections
on the purgatory, and kexec does not supports that.

This patch (of 4):

Clang16 links the purgatory text in two sections when PGO is in use:

  [ 1] .text             PROGBITS         0000000000000000  00000040
       00000000000011a1  0000000000000000  AX       0     0     16
  [ 2] .rela.text        RELA             0000000000000000  00003498
       0000000000000648  0000000000000018   I      24     1     8
  ...
  [17] .text.hot.        PROGBITS         0000000000000000  00003220
       000000000000020b  0000000000000000  AX       0     0     1
  [18] .rela.text.hot.   RELA             0000000000000000  00004428
       0000000000000078  0000000000000018   I      24    17     8

And both of them have their range [sh_addr ... sh_addr+sh_size] on the
area pointed by `e_entry`.

This causes that image->start is calculated twice, once for .text and
another time for .text.hot. The second calculation leaves image->start
in a random location.

Because of this, the system crashes immediately after:

kexec_core: Starting new kernel

Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-0-b05c520b7296@chromium.org
Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-1-b05c520b7296@chromium.org
Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory")
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Rix <trix@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/uffd: allow vma to merge as much as possible

We used to not pass in the pgoff correctly when register/unregister uffd
regions, it caused incorrect behavior on vma merging and can cause
mergeable vmas being separate after ioctls return.

For example, when we have:

  vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)

Then someone unregisters uffd on range (5-9), it should logically become:

  vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)

But with current code we'll have:

  vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)

This patch allows such merge to happen correctly before ioctl returns.

This behavior seems to have existed since the 1st day of uffd.  Since
pgoff for vma_merge() is only used to identify the possibility of vma
merging, meanwhile here what we did was always passing in a pgoff smaller
than what we should, so there should have no other side effect besides not
merging it.  Let's still tentatively copy stable for this, even though I
don't see anything will go wrong besides vma being split (which is mostly
not user visible).

Link: https://lkml.kernel.org/r/20230517190916.3429499-3-peterx@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/uffd: fix vma operation where start addr cuts part of vma

Patch series "mm/uffd: Fix vma merge/split", v2.

This series contains two patches that fix vma merge/split for userfaultfd
on two separate issues.

Patch 1 fixes a regression since 6.1+ due to something we overlooked when
converting to maple tree apis.  The plan is we use patch 1 to replace the
commit "2f628010799e (mm: userfaultfd: avoid passing an invalid range to
vma_merge())" in mm-hostfixes-unstable tree if possible, so as to bring
uffd vma operations back aligned with the rest code again.

Patch 2 fixes a long standing issue that vma can be left unmerged even if
we can for either uffd register or unregister.

Many thanks to Lorenzo on either noticing this issue from the assert
movement patch, looking at this problem, and also provided a reproducer on
the unmerged vma issue [1].

[1] https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e

This patch (of 2):

It seems vma merging with uffd paths is broken with either
register/unregister, where right now we can feed wrong parameters to
vma_merge() and it's found by recent patch which moved asserts upwards in
vma_merge() by Lorenzo Stoakes:

https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/

It's possible that "start" is contained within vma but not clamped to its
start.  We need to convert this into either "cannot merge" case or "can
merge" case 4 which permits subdivision of prev by assigning vma to prev.
As we loop, each subsequent VMA will be clamped to the start.

This patch will eliminate the report and make sure vma_merge() calls will
become legal again.

One thing to mention is that the "Fixes: 29417d292bd0" below is there only
to help explain where the warning can start to trigger, the real commit to
fix should be 69dbe6daf104.  Commit 29417d292bd0 helps us to identify the
issue, but unfortunately we may want to keep it in Fixes too just to ease
kernel backporters for easier tracking.

Link: https://lkml.kernel.org/r/20230517190916.3429499-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230517190916.3429499-2-peterx@redhat.com
Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Closes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

radix-tree: move declarations to header

The xarray.c file contains the only call to radix_tree_node_rcu_free(),
and it comes with its own extern declaration for it. This means the
function definition causes a missing-prototype warning:

lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes]

Instead, move the declaration for this function to a new header that can
be included by both, and do the same for the radix_tree_node_cachep
variable that has the same underlying problem but does not cause a warning
with gcc.

[zhangpeng.00@bytedance.com: fix building radix tree test suite]
Link: https://lkml.kernel.org/r/20230521095450.21332-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20230516194212.548910-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()

A syzbot fault injection test reported that nilfs_btnode_create_block, a
helper function that allocates a new node block for b-trees, causes a
kernel BUG for disk images where the file system block size is smaller
than the page size.

This was due to unexpected flags on the newly allocated buffer head, and
it turned out to be because the buffer flags were not cleared by
nilfs_btnode_abort_change_key() after an error occurred during a b-tree
update operation and the buffer was later reused in that state.

Fix this issue by using nilfs_btnode_delete() to abandon the unused
preallocated buffer in nilfs_btnode_abort_change_key().

Link: https://lkml.kernel.org/r/20230513102428.10223-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+b0a35a5c1f7e846d3b09@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000d1d6c205ebc4d512@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

io_uring/io-wq: don't clear PF_IO_WORKER on exit

A recent commit gated the core dumping task exit logic on current->flags
remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
This exposed a problem with the io-wq handling of that, which explicitly
clears PF_IO_WORKER before calling do_exit().

The reasons for this manual clear of PF_IO_WORKER is historical, where
io-wq used to potentially trigger a sleep on exit. As the io-wq thread
is exiting, it should not participate any further accounting. But these
days we don't need to rely on current->flags anymore, so we can safely
remove the PF_IO_WORKER clearing.

Reported-by: Zorro Lang <zlang@redhat.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/all/ZIZSPyzReZkGBEFy@dread.disaster.area/
Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-6.4-rc6-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"A  more fixes and regression fixes:

   - in subpage mode, fix crash when repairing metadata at the end of
     a stripe

   - properly enable async discard when remounting from read-only to
     read-write

   - scrub regression fixes:
      - respect read-only scrub when attempting to do a repair
      - fix reporting of found errors, the stats don't get properly
        accounted after a stripe repair"

* tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: scrub: also report errors hit during the initial read
  btrfs: scrub: respect the read-only flag during repair
  btrfs: properly enable async discard when switching from RO->RW
  btrfs: subpage: fix a crash in metadata repair path

Linux 6.4-rc6

Merge tag 'x86_urgent_for_v6.4_rc6' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

- Set up the kernel CS earlier in the boot process in case EFI boots
   the kernel after bypassing the decompressor and the CS descriptor
   used ends up being the EFI one which is not mapped in the identity
   page table, leading to early SEV/SNP guest communication exceptions
   resulting in the guest crashing

* tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed

Merge tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
"Five smb3 server fixes, all also for stable:

   - Fix four slab out of bounds warnings: improve checks for protocol
     id, and for small packet length, and for create context parsing,
     and for negotiate context parsing

   - Fix for incorrect dereferencing POSIX ACLs"

* tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: validate smb request protocol id
  ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loop
  ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR()
  ksmbd: fix out-of-bound read in parse_lease_state()
  ksmbd: fix out-of-bound read in deassemble_neg_contexts()

Merge tag 'i2c-for-6.4-rc6' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Biggest news is that Andi Shyti steps in for maintaining the
  controller drivers. Thank you very much!

  Other than that, one new driver maintainer and the rest is usual
  driver bugfixes. at24 has a Kconfig dependecy fix"

* tag 'i2c-for-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: Add entries for Renesas RZ/V2M I2C driver
  eeprom: at24: also select REGMAP
  i2c: sprd: Delete i2c adapter in .remove's error path
  i2c: mv64xxx: Fix reading invalid status value in atomic mode
  i2c: designware: fix idx_write_cnt in read loop
  i2c: mchp-pci1xxxx: Avoid cast to incompatible function type
  i2c: img-scb: Fix spelling mistake "innacurate" -> "inaccurate"
  MAINTAINERS: Add myself as I2C host drivers maintainer

Merge tag 'soundwire-6.4-fixes' of git://git./linux/kernel/git/vkoul/soundwire

Pull soundwire fixes from Vinod Koul:
"Core fix for missing flag clear, error patch handling in qcom driver
  and BIOS quirk for HP Spectre x360:

   - HP Spectre x360 soundwire DMI quirk

   - Error path handling for qcom driver

   - Core fix for missing clear of alloc_slave_rt"

* tag 'soundwire-6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: stream: Add missing clear of alloc_slave_rt
  soundwire: qcom: add proper error paths in qcom_swrm_startup()
  soundwire: dmi-quirks: add new mapping for HP Spectre x360

Merge tag 'arm-fixes-6.4-2' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"Most of the changes this time are for the Qualcomm Snapdragon
  platforms.

  There are bug fixes for error handling in Qualcomm icc-bwmon,
  rpmh-rsc, ramp_controller and rmtfs driver as well as the AMD tee
  firmware driver and a missing initialization in the Arm ff-a firmware
  driver. The Qualcomm RPMh and EDAC drivers need some rework to work
  correctly on all supported chips.

  The DT fixes include:

   - i.MX8 fixes for gpio, pinmux and clock settings

   - ADS touchscreen gpio polarity settings in several machines

   - Address dtb warnings for caches, panel and input-enable properties
     on Qualcomm platforms

   - Incorrect data on qualcomm platforms fir SA8155P power domains,
     SM8550 LLCC, SC7180-lite SDRAM frequencies and SM8550 soundwire

   - Remoteproc firmware paths are corrected for Sony Xperia 10 IV"

* tag 'arm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (36 commits)
  firmware: arm_ffa: Set handle field to zero in memory descriptor
  ARM: dts: Fix erroneous ADS touchscreen polarities
  arm64: dts: imx8mn-beacon: Fix SPI CS pinmux
  arm64: dts: imx8-ss-dma: assign default clock rate for lpuarts
  arm64: dts: imx8qm-mek: correct GPIOs for USDHC2 CD and WP signals
  EDAC/qcom: Get rid of hardcoded register offsets
  EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
  arm64: dts: qcom: sm8550: Use the correct LLCC register scheme
  dt-bindings: cache: qcom,llcc: Fix SM8550 description
  arm64: dts: qcom: sc7180-lite: Fix SDRAM freq for misidentified sc7180-lite boards
  arm64: dts: qcom: sm8550: use uint16 for Soundwire interval
  soc: qcom: rpmhpd: Add SA8155P power domains
  arm64: dts: qcom: Split out SA8155P and use correct RPMh power domains
  dt-bindings: power: qcom,rpmpd: Add SA8155P
  soc: qcom: Rename ice to qcom_ice to avoid module name conflict
  soc: qcom: rmtfs: Fix error code in probe()
  soc: qcom: ramp_controller: Fix an error handling path in qcom_ramp_controller_probe()
  ARM: dts: at91: sama7g5ek: fix debounce delay property for shdwc
  ARM: at91: pm: fix imbalanced reference counter for ethernet devices
  arm64: dts: qcom: sm6375-pdx225: Fix remoteproc firmware paths
  ...

dt-bindings: i3c: silvaco,i3c-master: fix missing schema restriction

Each device schema must end with unevaluatedProperties: false, if it
references other common schema. Otherwise it would allow any properties
to be listed.

Fixes: b8b0446f1f1a ("dt-bindings: i3c: Describe Silvaco master binding")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230609141107.66128-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>

of: overlay: Fix missing of_node_put() in error case of init_overlay_changeset()

In init_overlay_changeset(), the variable "node" is from
of_get_child_by_name(), and the "node" should be discarded in error case.

Fixes: d1651b03c2df ("of: overlay: add overlay symbols to live device tree")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Link: https://lore.kernel.org/r/20230602020502.11693-1-hayashi.kunihiko@socionext.com
Signed-off-by: Rob Herring <robh@kernel.org>

Merge tag 'block-6.4-2023-06-09' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- Fix an issue with the hardware queue nr_active, causing it to become
   imbalanced (Tian)

- Fix an issue with null_blk not releasing pages if configured as
   memory backed (Nitesh)

- Fix a locking issue in dasd (Jan)

* tag 'block-6.4-2023-06-09' of git://git.kernel.dk/linux:
  s390/dasd: Use correct lock while counting channel queue length
  null_blk: Fix: memory release when memory_backed=1
  blk-mq: fix blk_mq_hw_ctx active request accounting

Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio bug fixes from Michael Tsirkin:
"A bunch of fixes all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  tools/virtio: use canonical ftrace path
  vhost_vdpa: support PACKED when setting-getting vring_base
  vhost: support PACKED when setting-getting vring_base
  vhost: Fix worker hangs due to missed wake up calls
  vhost: Fix crash during early vhost_transport_send_pkt calls
  vhost_net: revert upend_idx only on retriable error
  vhost_vdpa: tell vqs about the negotiated
  vdpa/mlx5: Fix hang when cvq commands are triggered during device unregister
  tools/virtio: Add .gitignore for ringtest
  tools/virtio: Fix arm64 ringtest compilation error
  vduse: avoid empty string for dev name
  vhost: use kzalloc() instead of kmalloc() followed by memset()

Merge tag 'ceph-for-6.4-rc6' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
"A fix for a potential data corruption in differential backup and
  snapshot-based mirroring scenarios in RBD and a reference counting
  fixup to avoid use-after-free in CephFS, all marked for stable"

* tag 'ceph-for-6.4-rc6' of https://github.com/ceph/ceph-client:
  ceph: fix use-after-free bug for inodes when flushing capsnaps
  rbd: get snapshot context after exclusive lock is ensured to be held
  rbd: move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting

s390/dasd: Use correct lock while counting channel queue length

The lock around counting the channel queue length in the BIODASDINFO
ioctl was incorrectly changed to the dasd_block->queue_lock with commit
583d6535cb9d ("dasd: remove dead code"). This can lead to endless list
iterations and a subsequent crash.

The queue_lock is supposed to be used only for queue lists belonging to
dasd_block. For dasd_device related queue lists the ccwdev lock must be
used.

Fix the mentioned issues by correctly using the ccwdev lock instead of
the queue lock.

Fixes: 583d6535cb9d ("dasd: remove dead code")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20230609153750.1258763-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'riscv-for-linus-6.4-rc6' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- A fix to avoid ISA-disallowed privilege mappings that can result from
   WRITE+EXEC mmap requests from userspace.

- A fix for kfence to handle the huge pages.

- A fix to avoid converting misaligned VAs to huge pages.

- ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE has been selected so kprobe
   can understand user pointers.

* tag 'riscv-for-linus-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: fix kprobe __user string arg print fault issue
  riscv: Check the virtual alignment before choosing a map size
  riscv: Fix kfence now that the linear mapping can be backed by PUD/P4D/PGD
  riscv: mm: Ensure prot of VM_WRITE and VM_EXEC must be readable

Merge tag 's390-6.4-3' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

- Avoid linker error for randomly generated config file that has
   CONFIG_BRANCH_PROFILE_NONE enabled and make it similar to riscv, x86
   and also to commit 4bf3ec384edf ("s390: disable branch profiling for
   vdso").

- Currently, if the device is offline and all the channel paths are
   either configured or varied offline, the associated subchannel gets
   unregistered. Don't unregister the subchannel, instead unregister
   offline device.

* tag 's390-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/purgatory: disable branch profiling
  s390/cio: unregister device when the only path is gone

Merge tag 'gpio-fixes-for-v6.4-rc6' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
"Two fixes for the GPIO testing module and one commit making Andy a
  reviewer for the GPIO subsystem:

   - fix a memory corruption bug in gpio-sim

   - fix inconsistencies in user-space configuration of gpio-sim

   - make Andy Shevchenko a reviewer for the GPIO subsystem"

* tag 'gpio-fixes-for-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: add Andy Shevchenko as reviewer for the GPIO subsystem
  gpio: sim: quietly ignore configured lines outside the bank
  gpio: sim: fix memory corruption when adding named lines and unnamed hogs

tools/virtio: use canonical ftrace path

The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

A few spots in tools/virtio still refer to this older debugfs
path, so let's update them to avoid confusion.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Message-Id: <20230215223350.2658616-6-zwisler@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>

vhost_vdpa: support PACKED when setting-getting vring_base

Use the right structs for PACKED or split vqs when setting and
getting the vring base.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230424225031.18947-4-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

vhost: support PACKED when setting-getting vring_base

Use the right structs for PACKED or split vqs when setting and
getting the vring base.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230424225031.18947-3-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

Merge tag 'pinctrl-v6.4-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fix from Linus Walleij:
"A single fix for the Meson driver, nothing else has surfaced so far
this cycle"

* tag 'pinctrl-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: meson-axg: add missing GPIOA_18 gpio group

Merge tag 'sound-6.4-rc6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Lots of small fixes, and almost all are device-specific.

  A few of them are the fixes for the old regressions by the fast kctl
  lookups (introduced around 5.19). Others are ASoC simple-card fixes,
  selftest compile warning fixes, ASoC AMD quirks, various ASoC codec
  fixes as well as usual HD-audio quirks"

* tag 'sound-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
  ALSA: hda/realtek: Enable 4 amplifiers instead of 2 on a HP platform
  ALSA: hda: Fix kctl->id initialization
  ALSA: gus: Fix kctl->id initialization
  ALSA: cmipci: Fix kctl->id initialization
  ALSA: ymfpci: Fix kctl->id initialization
  ALSA: ice1712,ice1724: fix the kcontrol->id initialization
  ALSA: hda/realtek: Add quirk for Clevo NS50AU
  ALSA: hda/realtek: Add quirks for Asus ROG 2024 laptops using CS35L41
  ALSA: hda/realtek: Add "Intel Reference board" and "NUC 13" SSID in the ALC256
  ALSA: hda/realtek: Add Lenovo P3 Tower platform
  ALSA: hda/realtek: Add a quirk for HP Slim Desktop S01
  selftests: alsa: pcm-test: Fix compiler warnings about the format
  ASoC: fsl_sai: Enable BCI bit if SAI works on synchronous mode with BYP asserted
  ASoC: simple-card-utils: fix PCM constraint error check
  ASoC: cs35l56: Remove NULL check from cs35l56_sdw_dai_set_stream()
  ASoC: max98363: limit the number of channel to 1
  ASoC: max98363: Removed 32bit support
  ASoC: mediatek: mt8195: fix use-after-free in driver remove path
  ASoC: mediatek: mt8188: fix use-after-free in driver remove path
  ASoC: amd: yc: Add Thinkpad Neo14 to quirks list for acp6x
  ...

Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fix from Ted Ts'o:
"Fix an ext4 regression which breaks remounting r/w file systems that
  have the quota feature enabled"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: only check dquot_initialize_needed() when debugging
  Revert "ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled"

Merge tag 'at24-fixes-for-v6.4-rc6' of git://git./linux/kernel/git/brgl/linux into i2c/for-current

at24 fixes for v6.4-rc6

- fix a Kconfig issue (we need to select REGMAP, not only REGMAP_I2C)

Merge tag 'imx-fixes-6.4-2' of git://git./linux/kernel/git/shawnguo/linux into arm/fixes

i.MX fixes for 6.4, round 2:

- Fix SPI CS pinmux for the final production version of imx8mn-beacon
  board.
- Fix GPIOs for USDHC2 CD and WP signals on imx8qm-mek board.
- Assign default clock rate for i.MX8 LPUARTs to fix UART failure.

* tag 'imx-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  arm64: dts: imx8mn-beacon: Fix SPI CS pinmux
  arm64: dts: imx8-ss-dma: assign default clock rate for lpuarts
  arm64: dts: imx8qm-mek: correct GPIOs for USDHC2 CD and WP signals

Link: https://lore.kernel.org/r/20230607141312.GU4199@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'drm-fixes-2023-06-09' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Bit busier and a bit more scattered than usual. amdgpu is the main
  one, with ivpu and msm having a few fixes, then i915, exynos, ast,
  lima, radeon with some misc bits, but overall nothing standing out.

  fb-helper:
   - Fill in fb-helper vars more correctly

  amdgpu:
   - S0ix fixes
   - GPU reset fixes
   - SMU13 fixes
   - SMU11 fixes
   - Misc Display fixes
   - Revert RV/RV2/PCO clock counter changes
   - Fix Stoney xclk value
   - Fix reserved vram debug info

  radeon:
   - Fix a potential use after free

  i915:
   - CDCLK voltage fix for ADL-P
   - eDP wake sync pulse fix
   - Two error handling fixes to selftests

  exynos:
   - Fix wrong return in Exynos vidi driver
   - Fix use-after-free issue to Exynos g2d driver

  ast:
   - resume and modeset fixes for ast

  ivpu:
   - Assorted ivpu fixes

  lima:
   - lima context destroy fix

  msm:
   - Fix max segment size to address splat on newer a6xx
   - Disable PSR by default w/ modparam to re-enable, since there still
     seems to be a lingering issue
   - Fix HPD issue
   - Fix issue with unitialized GMU mutex"

* tag 'drm-fixes-2023-06-09' of git://anongit.freedesktop.org/drm/drm: (32 commits)
  drm/msm/a6xx: initialize GMU mutex earlier
  drm/msm/dp: enable HDP plugin/unplugged interrupts at hpd_enable/disable
  accel/ivpu: Fix sporadic VPU boot failure
  accel/ivpu: Do not use mutex_lock_interruptible
  accel/ivpu: Do not trigger extra VPU reset if the VPU is idle
  drm/amd/display: Reduce sdp bw after urgent to 90%
  drm/amdgpu: change reserved vram info print
  drm/amdgpu: fix xclk freq on CHIP_STONEY
  drm/radeon: fix race condition UAF in radeon_gem_set_domain_ioctl
  Revert "drm/amdgpu: switch to golden tsc registers for raven/raven2"
  Revert "drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according to revision id"
  Revert "drm/amdgpu: change the reference clock for raven/raven2"
  drm/amd/display: add ODM case when looking for first split pipe
  drm/amd: Make lack of `ACPI_FADT_LOW_POWER_S0` or `CONFIG_AMD_PMC` louder during suspend path
  drm/amd/pm: conditionally disable pcie lane switching for some sienna_cichlid SKUs
  drm/amd/pm: Fix power context allocation in SMU13
  drm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vram
  drm/amd: Disallow s0ix without BIOS support again
  drm/i915/selftests: Add some missing error propagation
  drm/exynos: fix race condition UAF in exynos_g2d_exec_ioctl
  ...

Merge tag 'cgroup-for-6.4-rc5-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

- Fix css_set reference leaks on fork failures

- Fix CPU hotplug locking in cgroup_transfer_tasks() which is used by
   cgroup1 cpuset

- Doc update

* tag 'cgroup-for-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Documentation: Clarify usage of memory limits
  cgroup: always put cset in cgroup_css_set_put_fork
  cgroup: fix missing cpus_read_{lock,unlock}() in cgroup_transfer_tasks()

Merge tag 'drm-msm-fixes-2023-06-08' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

A few more late fixes for v6.4-rc6

+ Fix max segment size to address splat on newer a6xx
+ Disable PSR by default w/ modparam to re-enable, since there
still seems to be a lingering issue
+ Fix HPD issue
+ Fix issue with unitialized GMU mutex

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGufjVZRNT6YfQ7YUXFC7Cz95wdLF7QHAYkiGfp+3Xc3DQ@mail.gmail.com

drm/msm/a6xx: initialize GMU mutex earlier

Move GMU mutex initialization earlier to make sure that it is always
initialized. a6xx_destroy can be called from ther failure path before
GMU initialization.

This fixes the following backtrace:

------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 0 PID: 58 at kernel/locking/mutex.c:582 __mutex_lock+0x1ec/0x3d0
Modules linked in:
CPU: 0 PID: 58 Comm: kworker/u16:1 Not tainted 6.3.0-rc5-00155-g187c06436519 #565
Hardware name: Qualcomm Technologies, Inc. SM8350 HDK (DT)
Workqueue: events_unbound deferred_probe_work_func
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mutex_lock+0x1ec/0x3d0
lr : __mutex_lock+0x1ec/0x3d0
sp : ffff800008993620
x29: ffff800008993620 x28: 0000000000000002 x27: ffff47b253c52800
x26: 0000000001000606 x25: ffff47b240bb2810 x24: fffffffffffffff4
x23: 0000000000000000 x22: ffffc38bba15ac14 x21: 0000000000000002
x20: ffff800008993690 x19: ffff47b2430cc668 x18: fffffffffffe98f0
x17: 6f74616c75676572 x16: 20796d6d75642067 x15: 0000000000000038
x14: 0000000000000000 x13: ffffc38bbba050b8 x12: 0000000000000666
x11: 0000000000000222 x10: ffffc38bbba603e8 x9 : ffffc38bbba050b8
x8 : 00000000ffffefff x7 : ffffc38bbba5d0b8 x6 : 0000000000000222
x5 : 000000000000bff4 x4 : 40000000fffff222 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff47b240cb1880
Call trace:
__mutex_lock+0x1ec/0x3d0
mutex_lock_nested+0x2c/0x38
a6xx_destroy+0xa0/0x138
a6xx_gpu_init+0x41c/0x618
adreno_bind+0x188/0x290
component_bind_all+0x118/0x248
msm_drm_bind+0x1c0/0x670
try_to_bring_up_aggregate_device+0x164/0x1d0
__component_add+0xa8/0x16c
component_add+0x14/0x20
dsi_dev_attach+0x20/0x2c
dsi_host_attach+0x9c/0x144
devm_mipi_dsi_attach+0x34/0xac
lt9611uxc_attach_dsi.isra.0+0x84/0xfc
lt9611uxc_probe+0x5b8/0x67c
i2c_device_probe+0x1ac/0x358
really_probe+0x148/0x2ac
__driver_probe_device+0x78/0xe0
driver_probe_device+0x3c/0x160
__device_attach_driver+0xb8/0x138
bus_for_each_drv+0x84/0xe0
__device_attach+0x9c/0x188
device_initial_probe+0x14/0x20
bus_probe_device+0xac/0xb0
deferred_probe_work_func+0x8c/0xc8
process_one_work+0x2bc/0x594
worker_thread+0x228/0x438
kthread+0x108/0x10c
ret_from_fork+0x10/0x20
irq event stamp: 299345
hardirqs last enabled at (299345): [<ffffc38bb9ba61e4>] put_cpu_partial+0x1c8/0x22c
hardirqs last disabled at (299344): [<ffffc38bb9ba61dc>] put_cpu_partial+0x1c0/0x22c
softirqs last enabled at (296752): [<ffffc38bb9890434>] _stext+0x434/0x4e8
softirqs last disabled at (296741): [<ffffc38bb989669c>] ____do_softirq+0x10/0x1c
---[ end trace 0000000000000000 ]---

Fixes: 4cd15a3e8b36 ("drm/msm/a6xx: Make GPU destroy a bit safer")
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/531540/
Signed-off-by: Rob Clark <robdclark@chromium.org>

drm/msm/dp: enable HDP plugin/unplugged interrupts at hpd_enable/disable

The internal_hpd flag is set to true by dp_bridge_hpd_enable() and set to
false by dp_bridge_hpd_disable() to handle GPIO pinmuxed into DP controller
case. HDP related interrupts can not be enabled until internal_hpd is set
to true. At current implementation dp_display_config_hpd() will initialize
DP host controller first followed by enabling HDP related interrupts if
internal_hpd was true at that time. Enable HDP related interrupts depends on
internal_hpd status may leave system with DP driver host is in running state
but without HDP related interrupts being enabled. This will prevent external
display from being detected. Eliminated this dependency by moving HDP related
interrupts enable/disable be done at dp_bridge_hpd_enable/disable() directly
regardless of internal_hpd status.

Changes in V3:
-- dp_catalog_ctrl_hpd_enable() and dp_catalog_ctrl_hpd_disable()
-- rewording ocmmit text

Changes in V4:
-- replace dp_display_config_hpd() with dp_display_host_start()
-- move enable_irq() at dp_display_host_start();

Changes in V5:
-- replace dp_display_host_start() with dp_display_host_init()

Changes in V6:
-- squash remove enable_irq() and disable_irq()

Fixes: cd198caddea7 ("drm/msm/dp: Rely on hpd_enable/disable callbacks")
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Tested-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Link: https://lore.kernel.org/r/1684878756-17830-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>

Merge tag 'drm-misc-fixes-2023-06-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v6.4-rc6:
- resume and modeset fixes for ast.
- Fill in fb-helper vars more correctly.
- Assorted ivpu fixes.
- lima context destroy fix.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ea6b88ec-b653-3781-0b68-cd0275c27923@linux.intel.com

Merge tag 'exynos-drm-fixes-for-v6.4-rc6' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes

Two fixups
- Fix wrong return in Exynos vidi driver.
- Fix use-after-free issue to Exynos g2d driver.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230607043148.43303-1-inki.dae@samsung.com

Merge tag 'drm-intel-fixes-2023-06-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

CDCLK voltage fix for ADL-P and eDP wake sync pulse fix.
Two error handling fixes to selftests (to appease static checkers)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZIGUHBz7+LsqN2nm@jlahtine-mobl.ger.corp.intel.com

Merge tag 'amd-drm-fixes-6.4-2023-06-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.4-2023-06-07:

amdgpu:
- S0ix fixes
- GPU reset fixes
- SMU13 fixes
- SMU11 fixes
- Misc Display fixes
- Revert RV/RV2/PCO clock counter changes
- Fix Stoney xclk value
- Fix reserved vram debug info

radeon:
- Fix a potential use after free

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230607213740.7723-1-alexander.deucher@amd.com

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"Two tiny arm64 fixes for -rc6.

  One fixes a build breakage when MAX_ORDER can be nonsensical if
  CONFIG_EXPERT=y and the other fixes the address masking for perf's
  page fault software events so that it is consistent amongst them:

   - Fix build breakage due to bogus MAX_ORDER definitions on !4k pages

   - Avoid masking fault address for perf software events"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block
  arm64: Remove the ARCH_FORCE_MAX_ORDER config input prompt

vhost: Fix worker hangs due to missed wake up calls

We can race where we have added work to the work_list, but
vhost_task_fn has passed that check but not yet set us into
TASK_INTERRUPTIBLE. wake_up_process will see us in TASK_RUNNING and
just return.

This bug was intoduced in commit f9010dbdce91 ("fork, vhost: Use
CLONE_THREAD to fix freezer/ps regression") when I moved the setting
of TASK_INTERRUPTIBLE to simplfy the code and avoid get_signal from
logging warnings about being in the wrong state. This moves the setting
of TASK_INTERRUPTIBLE back to before we test if we need to stop the
task to avoid a possible race there as well. We then have vhost_worker
set TASK_RUNNING if it finds work similar to before.

Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230607192338.6041-3-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: Fix crash during early vhost_transport_send_pkt calls

If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we
can race where:
1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue
2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create.
3. vhost_worker_create will set the dev->worker pointer before setting
the worker->vtsk pointer.
4. thread0's vhost_work_queue will see the dev->worker pointer is
set and try to call vhost_task_wake using not yet set worker->vtsk
pointer.
5. We then crash since vtsk is NULL.

Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker
threads"), we only had the worker pointer so we could just check it to
see if VHOST_SET_OWNER has been done. After that commit we have the
vhost_worker and vhost_task pointer, so we can now hit the bug above.

This patch embeds the vhost_worker in the vhost_dev and moves the work
list initialization back to vhost_dev_init, so we can just check the
worker.vtsk pointer to check if VHOST_SET_OWNER has been done like
before.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230607192338.6041-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: syzbot+d0d442c22fa8db45ff0e@syzkaller.appspotmail.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

vhost_net: revert upend_idx only on retriable error

Fix possible virtqueue used buffers leak and corresponding stuck
in case of temporary -EIO from sendmsg() which is produced by
tun driver while backend device is not up.

In case of no-retriable error and zcopy do not revert upend_idx
to pass packet data (that is update used_idx in corresponding
vhost_zerocopy_signal_used()) as if packet data has been
transferred successfully.

v2: set vq->heads[ubuf->desc].len equal to VHOST_DMA_DONE_LEN
in case of fake successful transmit.

Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru>
Message-Id: <20230424204411.24888-1-asmetanin@yandex-team.ru>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru>
Acked-by: Jason Wang <jasowang@redhat.com>

vhost_vdpa: tell vqs about the negotiated

As is done in the net, iscsi, and vsock vhost support, let the vdpa vqs
know about the features that have been negotiated. This allows vhost
to more safely make decisions based on the features, such as when using
PACKED vs split queues.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230424225031.18947-2-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vdpa/mlx5: Fix hang when cvq commands are triggered during device unregister

Currently the vdpa device is unregistered after the workqueue that
processes vq commands is disabled. However, the device unregister
process can still send commands to the cvq (a vlan delete for example)
which leads to a hang because the handing workqueue has been disabled
and the command never finishes:

[ 2263.095764] rcu: INFO: rcu_sched self-detected stall on CPU
[ 2263.096307] rcu:        9-....: (5250 ticks this GP) idle=dac4/1/0x4000000000000000 softirq=111009/111009 fqs=2544
[ 2263.097154] rcu:        (t=5251 jiffies g=393549 q=347 ncpus=10)
[ 2263.097648] CPU: 9 PID: 94300 Comm: kworker/u20:2 Not tainted 6.3.0-rc6_for_upstream_min_debug_2023_04_14_00_02 #1
[ 2263.098535] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 2263.099481] Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
[ 2263.100143] RIP: 0010:virtnet_send_command+0x109/0x170
[ 2263.100621] Code: 1d df f5 ff 85 c0 78 5c 48 8b 7b 08 e8 d0 c5 f5 ff 84 c0 75 11 eb 22 48 8b 7b 08 e8 01 b7 f5 ff 84 c0 75 15 f3 90 48 8b 7b 08 <48> 8d 74 24 04 e8 8d c5 f5 ff 48 85 c0 74 de 48 8b 83 f8 00 00 00
[ 2263.102148] RSP: 0018:ffff888139cf36e8 EFLAGS: 00000246
[ 2263.102624] RAX: 0000000000000000 RBX: ffff888166bea940 RCX: 0000000000000001
[ 2263.103244] RDX: 0000000000000000 RSI: ffff888139cf36ec RDI: ffff888146763800
[ 2263.103864] RBP: ffff888139cf3710 R08: ffff88810d201000 R09: 0000000000000000
[ 2263.104473] R10: 0000000000000002 R11: 0000000000000003 R12: 0000000000000002
[ 2263.105082] R13: 0000000000000002 R14: ffff888114528400 R15: ffff888166bea000
[ 2263.105689] FS:  0000000000000000(0000) GS:ffff88852cc80000(0000) knlGS:0000000000000000
[ 2263.106404] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2263.106925] CR2: 00007f31f394b000 CR3: 000000010615b006 CR4: 0000000000370ea0
[ 2263.107542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2263.108163] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2263.108769] Call Trace:
[ 2263.109059]  <TASK>
[ 2263.109320]  ? check_preempt_wakeup+0x11f/0x230
[ 2263.109750]  virtnet_vlan_rx_kill_vid+0x5a/0xa0
[ 2263.110180]  vlan_vid_del+0x9c/0x170
[ 2263.110546]  vlan_device_event+0x351/0x760 [8021q]
[ 2263.111004]  raw_notifier_call_chain+0x41/0x60
[ 2263.111426]  dev_close_many+0xcb/0x120
[ 2263.111808]  unregister_netdevice_many_notify+0x130/0x770
[ 2263.112297]  ? wq_worker_running+0xa/0x30
[ 2263.112688]  unregister_netdevice_queue+0x89/0xc0
[ 2263.113128]  unregister_netdev+0x18/0x20
[ 2263.113512]  virtnet_remove+0x4f/0x230
[ 2263.113885]  virtio_dev_remove+0x31/0x70
[ 2263.114273]  device_release_driver_internal+0x18f/0x1f0
[ 2263.114746]  bus_remove_device+0xc6/0x130
[ 2263.115146]  device_del+0x173/0x3c0
[ 2263.115502]  ? kernfs_find_ns+0x35/0xd0
[ 2263.115895]  device_unregister+0x1a/0x60
[ 2263.116279]  unregister_virtio_device+0x11/0x20
[ 2263.116706]  device_release_driver_internal+0x18f/0x1f0
[ 2263.117182]  bus_remove_device+0xc6/0x130
[ 2263.117576]  device_del+0x173/0x3c0
[ 2263.117929]  ? vdpa_dev_remove+0x20/0x20 [vdpa]
[ 2263.118364]  device_unregister+0x1a/0x60
[ 2263.118752]  mlx5_vdpa_dev_del+0x4c/0x80 [mlx5_vdpa]
[ 2263.119232]  vdpa_match_remove+0x21/0x30 [vdpa]
[ 2263.119663]  bus_for_each_dev+0x71/0xc0
[ 2263.120054]  vdpa_mgmtdev_unregister+0x57/0x70 [vdpa]
[ 2263.120520]  mlx5v_remove+0x12/0x20 [mlx5_vdpa]
[ 2263.120953]  auxiliary_bus_remove+0x18/0x30
[ 2263.121356]  device_release_driver_internal+0x18f/0x1f0
[ 2263.121830]  bus_remove_device+0xc6/0x130
[ 2263.122223]  device_del+0x173/0x3c0
[ 2263.122581]  ? devl_param_driverinit_value_get+0x29/0x90
[ 2263.123070]  mlx5_rescan_drivers_locked+0xc4/0x2d0 [mlx5_core]
[ 2263.123633]  mlx5_unregister_device+0x54/0x80 [mlx5_core]
[ 2263.124169]  mlx5_uninit_one+0x54/0x150 [mlx5_core]
[ 2263.124656]  mlx5_sf_dev_remove+0x45/0x90 [mlx5_core]
[ 2263.125153]  auxiliary_bus_remove+0x18/0x30
[ 2263.125560]  device_release_driver_internal+0x18f/0x1f0
[ 2263.126052]  bus_remove_device+0xc6/0x130
[ 2263.126451]  device_del+0x173/0x3c0
[ 2263.126815]  mlx5_sf_dev_remove+0x39/0xf0 [mlx5_core]
[ 2263.127318]  mlx5_sf_dev_state_change_handler+0x178/0x270 [mlx5_core]
[ 2263.127920]  blocking_notifier_call_chain+0x5a/0x80
[ 2263.128379]  mlx5_vhca_state_work_handler+0x151/0x200 [mlx5_core]
[ 2263.128951]  process_one_work+0x1bb/0x3c0
[ 2263.129355]  ? process_one_work+0x3c0/0x3c0
[ 2263.129766]  worker_thread+0x4d/0x3c0
[ 2263.130140]  ? process_one_work+0x3c0/0x3c0
[ 2263.130548]  kthread+0xb9/0xe0
[ 2263.130895]  ? kthread_complete_and_exit+0x20/0x20
[ 2263.131349]  ret_from_fork+0x1f/0x30
[ 2263.131717]  </TASK>

The fix is to disable and destroy the workqueue after the device
unregister. It is expected that vhost will not trigger kicks after
the unregister. But even if it would, the wq is disabled already by
setting the pointer to NULL (done so in the referenced commit).

Fixes: ad6dc1daaf29 ("vdpa/mlx5: Avoid processing works if workqueue was destroyed")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20230516095800.3549932-1-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>

tools/virtio: Add .gitignore for ringtest

Ignore executables for ringtest.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_C121802C93CB4095C6D7D95113442E830A07@qq.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: Fix arm64 ringtest compilation error

Add cpu_relax() for arm64 instead of directly assert(), and add assert.h
header file. Also, add smp_wmb and smp_mb for arm64.

Compilation error as follows, avoid __always_inline undefined.

    $ make
    cc -Wall -pthread -O2 -ggdb -flto -fwhole-program -c -o ring.o ring.c
    In file included from ring.c:10:
    main.h: In function ‘busy_wait’:
    main.h:99:21: warning: implicit declaration of function ‘assert’
    [-Wimplicit-function-declaration]
    99 | #define cpu_relax() assert(0)
        |                     ^~~~~~
    main.h:107:17: note: in expansion of macro ‘cpu_relax’
    107 |                 cpu_relax();
        |                 ^~~~~~~~~
    main.h:12:1: note: ‘assert’ is defined in header ‘<assert.h>’; did you
    forget to ‘#include <assert.h>’?
    11 | #include <stdbool.h>
    +++ |+#include <assert.h>
    12 |
    main.h: At top level:
    main.h:143:23: error: expected ‘;’ before ‘void’
    143 | static __always_inline
        |                       ^
        |                       ;
    144 | void __read_once_size(const volatile void *p, void *res, int
    size)
        | ~~~~
    main.h:158:23: error: expected ‘;’ before ‘void’
    158 | static __always_inline void __write_once_size(volatile void *p,
    void *res, int size)
        |                       ^~~~~
        |                       ;
    make: *** [<builtin>: ring.o] Error 1

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_F53E159DD7925174445D830DA19FACF44B07@qq.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vduse: avoid empty string for dev name

Syzkaller hits a kernel WARN when the first character of the dev name
provided is NULL. Solution is to add a NULL check before calling
cdev_device_add() in vduse_create_dev().

kobject: (0000000072042169): attempted to be registered with empty name!
WARNING: CPU: 0 PID: 112695 at lib/kobject.c:236
Call Trace:
kobject_add_varg linux/src/lib/kobject.c:390 [inline]
kobject_add+0xf6/0x150 linux/src/lib/kobject.c:442
device_add+0x28f/0xc20 linux/src/drivers/base/core.c:2167
cdev_device_add+0x83/0xc0 linux/src/fs/char_dev.c:546
vduse_create_dev linux/src/drivers/vdpa/vdpa_user/vduse_dev.c:2254 [inline]
vduse_ioctl+0x7b5/0xf30 linux/src/drivers/vdpa/vdpa_user/vduse_dev.c:2316
vfs_ioctl linux/src/fs/ioctl.c:47 [inline]
file_ioctl linux/src/fs/ioctl.c:510 [inline]
do_vfs_ioctl+0x14b/0xa80 linux/src/fs/ioctl.c:697
ksys_ioctl+0x7c/0xa0 linux/src/fs/ioctl.c:714
__do_sys_ioctl linux/src/fs/ioctl.c:721 [inline]
__se_sys_ioctl linux/src/fs/ioctl.c:719 [inline]
__x64_sys_ioctl+0x42/0x50 linux/src/fs/ioctl.c:719
do_syscall_64+0x94/0x330 linux/src/arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Cc: "Xie Yongji" <xieyongji@bytedance.com>
Reported-by: Xianjun Zeng <zengxianjun@bytedance.com>
Signed-off-by: Sheng Zhao <sheng.zhao@bytedance.com>
Message-Id: <20230530033626.1266794-1-sheng.zhao@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Cc: "Michael S. Tsirkin"<mst@redhat.com>, "Jason Wang"<jasowang@redhat.com>,
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>

MAINTAINERS: Add entries for Renesas RZ/V2M I2C driver

Add the MAINTAINERS entries for the Renesas RZ/V2M I2C driver.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Acked-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

riscv: fix kprobe __user string arg print fault issue

On riscv qemu platform, when add kprobe event on do_sys_open() to show
filename string arg, it just print fault as follow:

echo 'p:myprobe do_sys_open dfd=$arg1 filename=+0($arg2):string flags=$arg3
mode=$arg4' > kprobe_events

bash-166     [000] ...1.   360.195367: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename=(fault) flags=0x8241 mode=0x1b6

bash-166     [000] ...1.   360.219369: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename=(fault) flags=0x8241 mode=0x1b6

bash-191     [000] ...1.   360.378827: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename=(fault) flags=0x98800 mode=0x0

As riscv do not select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE,
the +0($arg2) addr is processed as a kernel address though it is a
userspace address, cause the above filename=(fault) print. So select
ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE to avoid the issue, after that the
kprobe trace is ok as below:

bash-166     [000] ...1.    96.767641: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename="/dev/null" flags=0x8241 mode=0x1b6

bash-166     [000] ...1.    96.793751: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename="/dev/null" flags=0x8241 mode=0x1b6

bash-177     [000] ...1.    96.962354: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename="/sys/kernel/debug/tracing/events/kprobes/"
flags=0x98800 mode=0x0

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Fixes: 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")
Link: https://lore.kernel.org/r/20230504072910.3742842-1-ruanjinjie@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

Merge tag 'net-6.4-rc6' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from can, wifi, netfilter, bluetooth and ebpf.

  Current release - regressions:

   - bpf: sockmap: avoid potential NULL dereference in
     sk_psock_verdict_data_ready()

   - wifi: iwlwifi: fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()

   - phylink: actually fix ksettings_set() ethtool call

   - eth: dwmac-qcom-ethqos: fix a regression on EMAC < 3

  Current release - new code bugs:

   - wifi: mt76: fix possible NULL pointer dereference in
     mt7996_mac_write_txwi()

  Previous releases - regressions:

   - netfilter: fix NULL pointer dereference in nf_confirm_cthelper

   - wifi: rtw88/rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS

   - openvswitch: fix upcall counter access before allocation

   - bluetooth:
      - fix use-after-free in hci_remove_ltk/hci_remove_irk
      - fix l2cap_disconnect_req deadlock

   - nic: bnxt_en: prevent kernel panic when receiving unexpected
     PHC_UPDATE event

  Previous releases - always broken:

   - core: annotate rfs lockless accesses

   - sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values

   - netfilter: add null check for nla_nest_start_noflag() in
     nft_dump_basechain_hook()

   - bpf: fix UAF in task local storage

   - ipv4: ping_group_range: allow GID from 2147483648 to 4294967294

   - ipv6: rpl: fix route of death.

   - tcp: gso: really support BIG TCP

   - mptcp: fixes for user-space PM address advertisement

   - smc: avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT

   - can: avoid possible use-after-free when j1939_can_rx_register fails

   - batman-adv: fix UaF while rescheduling delayed work

   - eth: qede: fix scheduling while atomic

   - eth: ice: make writes to /dev/gnssX synchronous"

* tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
  bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event
  bnxt_en: Skip firmware fatal error recovery if chip is not accessible
  bnxt_en: Query default VLAN before VNIC setup on a VF
  bnxt_en: Don't issue AP reset during ethtool's reset operation
  bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()
  net: bcmgenet: Fix EEE implementation
  eth: ixgbe: fix the wake condition
  eth: bnxt: fix the wake condition
  lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
  bpf: Add extra path pointer check to d_path helper
  net: sched: fix possible refcount leak in tc_chain_tmplt_add()
  net: sched: act_police: fix sparse errors in tcf_police_dump()
  net: openvswitch: fix upcall counter access before allocation
  net: sched: move rtm_tca_policy declaration to include file
  ice: make writes to /dev/gnssX synchronous
  net: sched: add rcu annotations around qdisc->qdisc_sleeping
  rfs: annotate lockless accesses to RFS sock flow table
  rfs: annotate lockless accesses to sk->sk_rxhash
  virtio_net: use control_buf for coalesce params
  ...

Merge tag 'xfs-6.4-rc5-fixes' of git://git./fs/xfs/xfs-linux

Pull xfs fixes from Dave Chinner:
"These are a set of regression fixes discovered on recent kernels. I
  was hoping to send this to you a week and half ago, but events out of
  my control delayed finalising the changes until early this week.

  Whilst the diffstat looks large for this stage of the merge window, a
  large chunk of it comes from moving the guts of one function from one
  file to another i.e. it's the same code, it is just run in a different
  context where it is safe to hold a specific lock. Otherwise the
  individual changes are relatively small and straigtht forward.

  Summary:

   - Propagate unlinked inode list corruption back up to log recovery
     (regression fix)

   - improve corruption detection for AGFL entries, AGFL indexes and
     XEFI extents (syzkaller fuzzer oops report)

   - Avoid double perag reference release (regression fix)

   - Improve extent merging detection in scrub (regression fix)

   - Fix a new undefined high bit shift (regression fix)

   - Fix for AGF vs inode cluster buffer deadlock (regression fix)"

* tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: collect errors from inodegc for unlinked inode recovery
  xfs: validate block number being freed before adding to xefi
  xfs: validity check agbnos on the AGFL
  xfs: fix agf/agfl verification on v4 filesystems
  xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag()
  xfs: fix broken logic when detecting mergeable bmap records
  xfs: Fix undefined behavior of shift into sign bit
  xfs: fix AGF vs inode cluster buffer deadlock
  xfs: defered work could create precommits
  xfs: restore allocation trylock iteration
  xfs: buffer pins need to hold a buffer reference

ext4: only check dquot_initialize_needed() when debugging

ext4_xattr_block_set() relies on its caller to call dquot_initialize()
on the inode.  To assure that this has happened there are WARN_ON
checks.  Unfortunately, this is subject to false positives if there is
an antagonist thread which is flipping the file system at high rates
between r/o and rw.  So only do the check if EXT4_XATTR_DEBUG is
enabled.

Link: https://lore.kernel.org/r/20230608044056.GA1418535@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Revert "ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled"

This reverts commit a44be64bbecb15a452496f60db6eacfee2b59c79.

Link: https://lore.kernel.org/r/653b3359-2005-21b1-039d-c55ca4cffdcc@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

docs: zh_CN/devicetree: sync usage-model fix

Sync compatibly string fix from the English document.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Reviewed-by: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/b39560250cb58f8cdcfe95791ce5af7455c6e8e3.1682333574.git.baruch@tkos.co.il
Signed-off-by: Rob Herring <robh@kernel.org>

docs: dt: fix documented Primecell compatible string

Only arm,primecell is documented as compatible string for Primecell
peripherals. Current code agrees with that.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Link: https://lore.kernel.org/r/9e137548c4e76e0d8deef6d49460cb37897934ca.1682333574.git.baruch@tkos.co.il
Signed-off-by: Rob Herring <robh@kernel.org>

dt-bindings: Change Damien Le Moal's contact email

Change my email address to dlemoal@kernel.org.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230514222614.115299-1-dlemoal@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>

btrfs: scrub: also report errors hit during the initial read

[BUG]
After the recent scrub rework introduced in commit e02ee89baa66 ("btrfs:
scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"),
btrfs scrub no longer reports repaired errors any more:

  # mkfs.btrfs -f $dev -d DUP
  # mount $dev $mnt
  # xfs_io -f -d -c "pwrite -b 64K -S 0xaa 0 64" $mnt/file
  # umount $dev
  # xfs_io -f -c "pwrite -S 0xff $phy1 64K" $dev # Corrupt the first mirror
  # mount $dev $mnt
  # btrfs scrub start -BR $mnt
  scrub done for 725e7cb7-8a4a-4c77-9f2a-86943619e218
  Scrub started:    Tue Jun  6 14:56:50 2023
  Status:           finished
  Duration:         0:00:00
   data_extents_scrubbed: 2
   tree_extents_scrubbed: 18
   data_bytes_scrubbed: 131072
   tree_bytes_scrubbed: 294912
   read_errors: 0
   csum_errors: 0 <<< No errors here
   verify_errors: 0
         [...]
   uncorrectable_errors: 0
   unverified_errors: 0
   corrected_errors: 16 <<< Only corrected errors
   last_physical: 2723151872

This can confuse btrfs-progs, as it relies on the csum_errors to
determine if there is anything wrong.

While on v6.3.x kernels, the report is different:

csum_errors: 16 <<<
verify_errors: 0
[...]
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 16 <<<

[CAUSE]
In the reworked scrub, we update the scrub progress inside
scrub_stripe_report_errors(), using various bitmaps to update the
result.

For example for csum_errors, we use bitmap_weight() of
stripe->csum_error_bitmap.

Unfortunately at that stage, all error bitmaps (except
init_error_bitmap) are the result of the latest repair attempt, thus if
the stripe is fully repaired, those error bitmaps will all be empty,
resulting the above output mismatch.

To fix this, record the number of errors into stripe->init_nr_*_errors.
Since we don't really care about where those errors are, we only need to
record the number of errors.

Then in scrub_stripe_report_errors(), use those initial numbers to
update the progress other than using the latest error bitmaps.

Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: scrub: respect the read-only flag during repair

[BUG]
With recent scrub rework, the scrub operation no longer respects the
read-only flag passed by "-r" option of "btrfs scrub start" command.

  # mkfs.btrfs -f -d raid1 $dev1 $dev2
  # mount $dev1 $mnt
  # xfs_io -f -d -c "pwrite -b 128K -S 0xaa 0 128k" $mnt/file
  # sync
  # xfs_io -c "pwrite -S 0xff $phy1 64k" $dev1
  # xfs_io -c "pwrite -S 0xff $((phy2 + 65536)) 64k" $dev2
  # mount $dev1 $mnt -o ro
  # btrfs scrub start -BrRd $mnt
  Scrub device $dev1 (id 1) done
  Scrub started:    Tue Jun  6 09:59:14 2023
  Status:           finished
  Duration:         0:00:00
         [...]
   corrected_errors: 16 <<< Still has corrupted sectors
   last_physical: 1372585984

  Scrub device $dev2 (id 2) done
  Scrub started:    Tue Jun  6 09:59:14 2023
  Status:           finished
  Duration:         0:00:00
         [...]
   corrected_errors: 16 <<< Still has corrupted sectors
   last_physical: 1351614464

  # btrfs scrub start -BrRd $mnt
  Scrub device $dev1 (id 1) done
  Scrub started:    Tue Jun  6 10:00:17 2023
  Status:           finished
  Duration:         0:00:00
         [...]
   corrected_errors: 0 <<< No more errors
   last_physical: 1372585984

  Scrub device $dev2 (id 2) done
         [...]
   corrected_errors: 0 <<< No more errors
   last_physical: 1372585984

[CAUSE]
In the newly reworked scrub code, repair is always submitted no matter
if we're doing a read-only scrub.

[FIX]
Fix it by skipping the write submission if the scrub is a read-only one.

Unfortunately for the report part, even for a read-only scrub we will
still report it as corrected errors, as we know it's repairable, even we
won't really submit the write.

Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

soundwire: stream: Add missing clear of alloc_slave_rt

The current path that skips allocating the slave runtime does not clear
the alloc_slave_rt flag, this is clearly incorrect. Add the missing
clear, so the runtime won't be erroneously cleaned up.

Fixes: f3016b891c8c ("soundwire: stream: sdw_stream_add_ functions can be called multiple times")
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230602101140.2040141-1-ckeepax@opensource.cirrus.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>

Merge branch 'bnxt_en-bug-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes

This patchset has the following fixes for bnxt_en:

1. Add missing VNIC ID parameter in the FW message when getting an
updated RSS configuration from the FW.

2. Fix a warning when doing ethtool reset on newer chips.

3. Fix VLAN issue on a VF when a default VLAN is assigned.

4. Fix a problem during DPC (Downstream Port containment) scenario.

5. Fix a NULL pointer dereference when receiving a PTP event from FW.

6. Fix VXLAN/Geneve UDP port delete/add with newer FW.
====================

Link: https://lore.kernel.org/r/20230607075409.228450-1-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks

As per the new udp tunnel framework, drivers which need to know the
details of a port entry (i.e. port type) when it gets deleted should
use the .set_port / .unset_port callbacks.

Implementing the current .udp_tunnel_sync callback would mean that the
deleted tunnel port entry would be all zeros. This used to work on
older firmware because it would not check the input when deleting a
tunnel port. With newer firmware, the delete will now fail and
subsequent tunnel port allocation will fail as a result.

Fixes: 442a35a5a7aa ("bnxt: convert to new udp_tunnel_nic infra")
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event

The firmware can send PHC_RTC_UPDATE async event on a PF that may not
have PTP registered. In such a case, there will be a null pointer
deference for bp->ptp_cfg when we try to handle the event.

Fix it by not registering for this event with the firmware if !bp->ptp_cfg.
Also, check that bp->ptp_cfg is valid before proceeding when we receive
the event.

Fixes: 8bcf6f04d4a5 ("bnxt_en: Handle async event when the PHC is updated in RTC mode")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnxt_en: Skip firmware fatal error recovery if chip is not accessible

Driver starts firmware fatal error recovery by detecting
heartbeat failure or fw reset count register changing. But
these checks are not reliable if the device is not accessible.
This can happen while DPC (Downstream Port containment) is in
progress. Skip firmware fatal recovery if pci_device_is_present()
returns false.

Fixes: acfb50e4e773 ("bnxt_en: Add FW fatal devlink_health_reporter.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnxt_en: Query default VLAN before VNIC setup on a VF

We need to call bnxt_hwrm_func_qcfg() on a VF to query the default
VLAN that may be setup by the PF.  If a default VLAN is enabled,
the VF cannot support VLAN acceleration on the receive side and
the VNIC must be setup to strip out the default VLAN tag.  If a
default VLAN is not enabled, the VF can support VLAN acceleration
on the receive side.  The VNIC should be set up to strip or not
strip the VLAN based on the RX VLAN acceleration setting.

Without this call to determine the default VLAN before calling
bnxt_setup_vnic(), the VNIC may not be set up correctly.  For
example, bnxt_setup_vnic() may set up to strip the VLAN tag based
on stale default VLAN information.  If RX VLAN acceleration is
not enabled, the VLAN tag will be incorrectly stripped and the
RX data path will not work correctly.

Fixes: cf6645f8ebc6 ("bnxt_en: Add function for VF driver to query default VLAN.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnxt_en: Don't issue AP reset during ethtool's reset operation

Only older NIC controller's firmware uses the PROC AP reset type.
Firmware on 5731X/5741X and newer chips does not support this reset
type. When bnxt_reset() issues a series of resets, this PROC AP
reset may actually fail on these newer chips because the firmware
is not ready to accept this unsupported command yet. Avoid this
unnecessary error by skipping this reset type on chips that don't
support it.

Fixes: 7a13240e3718 ("bnxt_en: fix ethtool_reset_flags ABI violations")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()

We must specify the vnic id of the vnic in the input structure of this
firmware message. Otherwise we will get an error from the firmware.

Fixes: 98a4322b70e8 ("bnxt_en: update RSS config using difference algorithm")
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

eeprom: at24: also select REGMAP

Selecting only REGMAP_I2C can leave REGMAP unset, causing build errors,
so also select REGMAP to prevent the build errors.

../drivers/misc/eeprom/at24.c:540:42: warning: 'struct regmap_config' declared inside parameter list will not be visible outside of this definition or declaration
  540 |                                   struct regmap_config *regmap_config)
../drivers/misc/eeprom/at24.c: In function 'at24_make_dummy_client':
../drivers/misc/eeprom/at24.c:552:18: error: implicit declaration of function 'devm_regmap_init_i2c' [-Werror=implicit-function-declaration]
  552 |         regmap = devm_regmap_init_i2c(dummy_client, regmap_config);
../drivers/misc/eeprom/at24.c:552:16: warning: assignment to 'struct regmap *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
  552 |         regmap = devm_regmap_init_i2c(dummy_client, regmap_config);
../drivers/misc/eeprom/at24.c: In function 'at24_probe':
../drivers/misc/eeprom/at24.c:586:16: error: variable 'regmap_config' has initializer but incomplete type
  586 |         struct regmap_config regmap_config = { };
../drivers/misc/eeprom/at24.c:586:30: error: storage size of 'regmap_config' isn't known
  586 |         struct regmap_config regmap_config = { };
../drivers/misc/eeprom/at24.c:586:30: warning: unused variable 'regmap_config' [-Wunused-variable]

Fixes: 5c015258478e ("eeprom: at24: add basic regmap_i2c support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

ceph: fix use-after-free bug for inodes when flushing capsnaps

There is a race between capsnaps flush and removing the inode from
'mdsc->snap_flush_list' list:

   == Thread A ==                     == Thread B ==
ceph_queue_cap_snap()
-> allocate 'capsnapA'
->ihold('&ci->vfs_inode')
->add 'capsnapA' to 'ci->i_cap_snaps'
->add 'ci' to 'mdsc->snap_flush_list'
    ...
   == Thread C ==
ceph_flush_snaps()
->__ceph_flush_snaps()
  ->__send_flush_snap()
                                handle_cap_flushsnap_ack()
                                 ->iput('&ci->vfs_inode')
                                   this also will release 'ci'
                                    ...
      == Thread D ==
                                ceph_handle_snap()
                                 ->flush_snaps()
                                  ->iterate 'mdsc->snap_flush_list'
                                   ->get the stale 'ci'
->remove 'ci' from                ->ihold(&ci->vfs_inode) this
   'mdsc->snap_flush_list'           will WARNING

To fix this we will increase the inode's i_count ref when adding 'ci'
to the 'mdsc->snap_flush_list' list.

[ idryomov: need_put int -> bool ]

Cc: stable@vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2209299
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

accel/ivpu: Fix sporadic VPU boot failure

Wait for AON bit in HOST_SS_CPR_RST_CLR to return 0 before
starting VPUIP power up sequence, otherwise the VPU device
may sporadically fail to boot.

An error in power up sequence is propagated to the runtime
power management - the device will be in an error state
until the VPU driver is reloaded.

Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU")
Cc: stable@vger.kernel.org # 6.3.x
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230607094502.388489-1-stanislaw.gruszka@linux.intel.com

accel/ivpu: Do not use mutex_lock_interruptible

If we get signal when waiting for the mmu->lock we do not invalidate
current MMU configuration that might result in undefined behavior.

Additionally there is little or no benefit on break waiting for
ipc->lock. In current code base, we keep this lock for short periods.

Fixes: 263b2ba5fc93 ("accel/ivpu: Add Intel VPU MMU support")
Reviewed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230525103818.877590-2-stanislaw.gruszka@linux.intel.com

accel/ivpu: Do not trigger extra VPU reset if the VPU is idle

Turning off the PLL and entering D0i3 will reset the VPU so
an explicit IP reset is redundant.
But if the VPU is active, it may interfere with PLL disabling
and to avoid that, we have to issue an additional IP reset
to silence the VPU before turning off the PLL.

Fixes: a8fed6d1e0b9 ("accel/ivpu: Fix power down sequence")
Cc: stable@vger.kernel.org # 6.3.x
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230525103818.877590-1-stanislaw.gruszka@linux.intel.com

Merge tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is a batman-adv bugfix:

- fix a broken sync while rescheduling delayed work,
by Vladislav Efanov

* tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge:
batman-adv: Broken sync while rescheduling delayed work
====================

Link: https://lore.kernel.org/r/20230607155515.548120-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmgenet: Fix EEE implementation

We had a number of short comings:

- EEE must be re-evaluated whenever the state machine detects a link
  change as wight be switching from a link partner with EEE
  enabled/disabled

- tx_lpi_enabled controls whether EEE should be enabled/disabled for the
  transmit path, which applies to the TBUF block

- We do not need to forcibly enable EEE upon system resume, as the PHY
  state machine will trigger a link event that will do that, too

Fixes: 6ef398ea60d9 ("net: bcmgenet: add EEE support")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230606214348.2408018-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: ixgbe: fix the wake condition

Flip the netif_carrier_ok() condition in queue wake logic.
When I moved it to inside __netif_txq_completed_wake()
I missed negating it.

This made the condition ineffective and could probably
lead to crashes.

Fixes: 301f227fc860 ("net: piggy back on the memory barrier in bql when waking queues")
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230607010826.960226-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>