Pavel Begunkov [Tue, 4 Apr 2023 12:39:47 +0000 (13:39 +0100)]
io_uring: don't put nodes under spinlocks
io_req_put_rsrc() doesn't need any locking, so move it out of
a spinlock section in __io_req_complete_post() and adjust helpers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d5b87a5f31270dade6805f7acafc4cc34b84b241.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Tue, 4 Apr 2023 12:39:46 +0000 (13:39 +0100)]
io_uring/rsrc: keep cached refs per node
We cache refs of the current node (i.e. ctx->rsrc_node) in
ctx->rsrc_cached_refs. We'll be moving away from atomics, so move the
cached refs in struct io_rsrc_node for now. It's a prep patch and
shouldn't change anything in practise.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9edc3669c1d71b06c2dca78b2b2b8bb9292738b9.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Tue, 4 Apr 2023 12:39:45 +0000 (13:39 +0100)]
io_uring/rsrc: use non-pcpu refcounts for nodes
One problem with the current rsrc infra is that often updates will
generates lots of rsrc nodes, each carry pcpu refs. That takes quite a
lot of memory, especially if there is a stall, and takes lots of CPU
cycles. Only pcpu allocations takes >50 of CPU with a naive benchmark
updating files in a loop.
Replace pcpu refs with normal refcounting. There is already a hot path
avoiding atomics / refs, but following patches will further improve it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e9ed8a9457b331a26555ff9443afc64cdaab7247.1680576071.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 30 Mar 2023 16:05:31 +0000 (10:05 -0600)]
io_uring: cap io_sqring_entries() at SQ ring size
We already do this manually for the !SQPOLL case, do it in general and
we can also dump the ugly min3() in io_submit_sqes().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 30 Mar 2023 16:03:41 +0000 (10:03 -0600)]
io_uring: rename trace_io_uring_submit_sqe() tracepoint
It has nothing to do with the SQE at this point, it's a request
submission. While in there, get rid of the 'force_nonblock' argument
which is also dead, as we only pass in true.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Mon, 27 Mar 2023 15:38:15 +0000 (16:38 +0100)]
io_uring: encapsulate task_work state
For task works we're passing around a bool pointer for whether the
current ring is locked or not, let's wrap it in a structure, that
will make it more opaque preventing abuse and will also help us
to pass more info in the future if needed.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Mon, 27 Mar 2023 15:38:14 +0000 (16:38 +0100)]
io_uring: remove extra tw trylocks
Before cond_resched()'ing in handle_tw_list() we also drop the current
ring context, and so the next loop iteration will need to pick/pin a new
context and do trylock.
The chunk removed by this patch was intended to be an optimisation
covering exactly this case, i.e. retaking the lock after reschedule, but
in reality it's skipped for the first iteration after resched as
described and will keep hammering the lock if it's contended.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 27 Mar 2023 19:10:21 +0000 (13:10 -0600)]
io_uring/io-wq: drop outdated comment
Since the move to PF_IO_WORKER, we don't juggle memory context manually
anymore. Remove that outdated part of the comment for __io_worker_idle().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Mon, 27 Mar 2023 15:34:48 +0000 (16:34 +0100)]
io_uring: kill unused notif declarations
There are two leftover structures from the notification registration
mechanism that has never been released, kill them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f05f65aebaf8b1b5bf28519a8fdb350e3e7c9ad0.1679924536.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gabriel Krisman Bertazi [Wed, 22 Mar 2023 01:16:28 +0000 (22:16 -0300)]
io-wq: Drop struct io_wqe
Since commit
0654b05e7e65 ("io_uring: One wqe per wq"), we have just a
single io_wqe instance embedded per io_wq. Drop the extra structure in
favor of accessing struct io_wq directly, cleaning up quite a bit of
dereferences and backpointers.
No functional changes intended. Tested with liburing's testsuite
and mmtests performance microbenchmarks. I didn't observe any
performance regressions.
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Gabriel Krisman Bertazi [Wed, 22 Mar 2023 01:16:27 +0000 (22:16 -0300)]
io-wq: Move wq accounting to io_wq
Since we now have a single io_wqe per io_wq instead of per-node, and in
preparation to its removal, move the accounting into the parent
structure.
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 17 Mar 2023 16:42:08 +0000 (10:42 -0600)]
io_uring/kbuf: disallow mapping a badly aligned provided ring buffer
On at least parisc, we have strict requirements on how we virtually map
an address that is shared between the application and the kernel. On
these platforms, IOU_PBUF_RING_MMAP should be used when setting up a
shared ring buffer for provided buffers. If the application is mapping
these pages and asking the kernel to pin+map them as well, then we have
no control over what virtual address we get in the kernel.
For that case, do a sanity check if SHM_COLOUR is defined, and disallow
the mapping request. The application must fall back to using
IOU_PBUF_RING_MMAP for this case, and liburing will do that transparently
with the set of helpers that it has.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Breno Leitao [Thu, 23 Feb 2023 16:43:53 +0000 (08:43 -0800)]
io_uring: Add KASAN support for alloc_caches
Add support for KASAN in the alloc_caches (apoll and netmsg_cache).
Thus, if something touches the unused caches, it will raise a KASAN
warning/exception.
It poisons the object when the object is put to the cache, and unpoisons
it when the object is gotten or freed.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Breno Leitao [Thu, 23 Feb 2023 16:43:52 +0000 (08:43 -0800)]
io_uring: Move from hlist to io_wq_work_node
Having cache entries linked using the hlist format brings no benefit, and
also requires an unnecessary extra pointer address per cache entry.
Use the internal io_wq_work_node single-linked list for the internal
alloc caches (async_msghdr and async_poll)
This is required to be able to use KASAN on cache entries, since we do
not need to touch unused (and poisoned) cache entries when adding more
entries to the list.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230223164353.2839177-2-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Breno Leitao [Fri, 10 Mar 2023 20:11:07 +0000 (12:11 -0800)]
io_uring: One wqe per wq
Right now io_wq allocates one io_wqe per NUMA node. As io_wq is now
bound to a task, the task basically uses only the NUMA local io_wqe, and
almost never changes NUMA nodes, thus, the other wqes are mostly
unused.
Allocate just one io_wqe embedded into io_wq, and uses all possible cpus
(cpu_possible_mask) in the io_wqe->cpumask.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230310201107.4020580-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 14 Mar 2023 17:07:19 +0000 (11:07 -0600)]
io_uring: add support for user mapped provided buffer ring
The ring mapped provided buffer rings rely on the application allocating
the memory for the ring, and then the kernel will map it. This generally
works fine, but runs into issues on some architectures where we need
to be able to ensure that the kernel and application virtual address for
the ring play nicely together. This at least impacts architectures that
set SHM_COLOUR, but potentially also anyone setting SHMLBA.
To use this variant of ring provided buffers, the application need not
allocate any memory for the ring. Instead the kernel will do so, and
the allocation must subsequently call mmap(2) on the ring with the
offset set to:
IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT)
to get a virtual address for the buffer ring. Normally the application
would allocate a suitable piece of memory (and correctly aligned) and
simply pass that in via io_uring_buf_reg.ring_addr and the kernel would
map it.
Outside of the setup differences, the kernel allocate + user mapped
provided buffer ring works exactly the same.
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 14 Mar 2023 17:01:45 +0000 (11:01 -0600)]
io_uring/kbuf: rename struct io_uring_buf_reg 'pad' to'flags'
In preparation for allowing flags to be set for registration, rename
the padding and use it for that.
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 14 Mar 2023 16:59:46 +0000 (10:59 -0600)]
io_uring/kbuf: add buffer_list->is_mapped member
Rather than rely on checking buffer_list->buf_pages or ->buf_nr_pages,
add a separate member that tracks if this is a ring mapped provided
buffer list or not.
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 14 Mar 2023 16:55:50 +0000 (10:55 -0600)]
io_uring/kbuf: move pinning of provided buffer ring into helper
In preparation for allowing the kernel to allocate the provided buffer
rings and have the application mmap it instead, abstract out the
current method of pinning and mapping the user allocated ring.
No functional changes intended in this patch.
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Helge Deller [Thu, 16 Feb 2023 08:09:38 +0000 (09:09 +0100)]
io_uring: Adjust mapping wrt architecture aliasing requirements
Some architectures have memory cache aliasing requirements (e.g. parisc)
if memory is shared between userspace and kernel. This patch fixes the
kernel to return an aliased address when asked by userspace via mmap().
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 7 Mar 2023 16:47:20 +0000 (09:47 -0700)]
io_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it
io_uring hashes writes to a given file/inode so that it can serialize
them. This is useful if the file system needs exclusive access to the
file to perform the write, as otherwise we end up with a ton of io-wq
threads trying to lock the inode at the same time. This can cause
excessive system time.
But if the file system has flagged that it supports parallel O_DIRECT
writes, then there's no need to serialize the writes. Check for that
through FMODE_DIO_PARALLEL_WRITE and don't hash it if we don't need to.
In a basic test of 8 threads writing to a file on XFS on a gen2 Optane,
with each thread writing in 4k chunks, it improves performance from
~1350K IOPS (or ~5290MiB/sec) to ~1410K IOPS (or ~5500MiB/sec).
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 7 Mar 2023 16:40:28 +0000 (09:40 -0700)]
fs: add FMODE_DIO_PARALLEL_WRITE flag
Some filesystems support multiple threads writing to the same file with
O_DIRECT without requiring exclusive access to it. io_uring can use this
hint to avoid serializing dio writes to this inode, instead allowing them
to run in parallel.
XFS and ext4 both fall into this category, so set the flag for both of
them.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 2 Apr 2023 21:29:29 +0000 (14:29 -0700)]
Linux 6.3-rc5
Linus Torvalds [Sun, 2 Apr 2023 17:57:12 +0000 (10:57 -0700)]
Merge tag 'for-6.3-rc4-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- scan block devices in non-exclusive mode to avoid temporary mkfs
failures
- fix race between quota disable and quota assign ioctls
- fix deadlock when aborting transaction during relocation with scrub
- ignore fiemap path cache when there are multiple paths for a node
* tag 'for-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ignore fiemap path cache when there are multiple paths for a node
btrfs: fix deadlock when aborting transaction during relocation with scrub
btrfs: scan device in non-exclusive mode
btrfs: fix race between quota disable and quota assign ioctls
Javier Martinez Canillas [Tue, 7 Feb 2023 10:22:54 +0000 (11:22 +0100)]
Revert "venus: firmware: Correct non-pix start and end addresses"
This reverts commit
a837e5161cff, which broke probing of the venus
driver, at least on the SC7180 SoC HP X2 Chromebook:
qcom-venus aa00000.video-codec: Adding to iommu group 11
qcom-venus aa00000.video-codec: non legacy binding
qcom-venus aa00000.video-codec: failed to reset venus core
qcom-venus: probe of aa00000.video-codec failed with error -110
Matthias Kaehlcke also reported that the same change caused a regression
in SC7180 and sc7280, that prevents AOSS from entering sleep mode during
system suspend. So let's revert this commit for now to fix both issues.
Fixes:
a837e5161cff ("venus: firmware: Correct non-pix start and end addresses")
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 2 Apr 2023 17:10:16 +0000 (10:10 -0700)]
Merge tag 'driver-core-6.3-rc5' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are three small changes for 6.3-rc5 semi-related to driver core
stuff:
- documentation update where we move the security_bugs file to a more
relevant location.
- mdt/spi-nor debugfs memory leak fix that's been floating around for
a long time and acked by the maintainer
- cacheinfo bugfix for a regression in 6.3-rc1
All have been in linux-next with no reported problems"
* tag 'driver-core-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
cacheinfo: Fix LLC is not exported through sysfs
Documentation/security-bugs: move from admin-guide/ to process/
mtd: spi-nor: fix memory leak when using debugfs_lookup()
Linus Torvalds [Sun, 2 Apr 2023 17:01:56 +0000 (10:01 -0700)]
Merge tag 'powerpc-6.3-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a false positive warning in __pte_needs_flush() (with DEBUG_VM=y)
- Fix oops when a PF_IO_WORKER thread tries to core dump
- Don't try to reconfigure VAS when it's disabled
Thanks to Benjamin Gray, Haren Myneni, Jens Axboe, Nathan Lynch, and
Russell Currey.
* tag 'powerpc-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/vas: Ignore VAS update for DLPAR if copy/paste is not enabled
powerpc: Don't try to copy PPR for task with NULL pt_regs
powerpc/64s: Fix __pte_needs_flush() false positive warning
Linus Torvalds [Sat, 1 Apr 2023 21:50:22 +0000 (14:50 -0700)]
Merge tag '6.3-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
"Four cifs/smb3 client (reconnect and DFS related) fixes, including two
for stable:
- DFS oops fix
- DFS reconnect recursion fix
- An SMB1 parallel reconnect fix
- Trivial dead code removal in smb2_reconnect"
* tag '6.3-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: get rid of dead check in smb2_reconnect()
cifs: prevent infinite recursion in CIFSGetDFSRefer()
cifs: avoid races in parallel reconnects in smb1
cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL
Linus Torvalds [Sat, 1 Apr 2023 21:09:51 +0000 (14:09 -0700)]
Merge tag 'input-for-v6.3-rc4' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- fixes to ALPS and Focaltech PS/2 drivers dealing with the breakage of
switching to -funsigned-char
- quirks to i8042 to better handle Lifebook A574/H and TUXEDO devices
- a quirk to Goodix touchscreen driver to handle Yoga Book X90F
- a fix for incorrectly merged patch to xpad game controller driver
* tag 'input-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add TUXEDO devices to i8042 quirk tables for partial fix
Input: alps - fix compatibility with -funsigned-char
Input: focaltech - use explicitly signed char type
Input: xpad - fix incorrectly applied patch for MAP_PROFILE_BUTTON
Input: goodix - add Lenovo Yoga Book X90F to nine_bytes_report DMI table
Input: i8042 - add quirk for Fujitsu Lifebook A574/H
Linus Torvalds [Sat, 1 Apr 2023 16:47:08 +0000 (09:47 -0700)]
Merge tag 'pinctrl-v6.3-2' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some pin control fixes for the v6.3 series.
The most notable and urgent one is probably the AMD fix which affects
AMD laptops, found by the Chromium people.
Summary:
- Fix up the Kconfig options for MediaTek MT7981
- Fix the irq domain name in the AT91-PIO4 driver
- Fix some alternative muxing modes in the Ocelot driver
- Allocate the GPIO numbers dynamically in the STM32 driver
- Disable and mask interrupts on resume in the AMD driver
- Fix a typo in the Qualcomm SM8550 pin control device tree bindings"
* tag 'pinctrl-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
dt-bindings: pinctrl: qcom,sm8550-lpass-lpi: allow input-enabled and bias-bus-hold
pinctrl: amd: Disable and mask interrupts on resume
pinctrl: stm32: use dynamic allocation of GPIO base
pinctrl: ocelot: Fix alt mode for ocelot
pinctrl: at91-pio4: fix domain name assignment
pinctrl: mediatek: fix naming inconsistency
pinctrl: mediatek: add missing options to PINCTRL_MT7981
Linus Torvalds [Sat, 1 Apr 2023 16:25:17 +0000 (09:25 -0700)]
Merge tag 'kbuild-fixes-v6.3-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix linux-headers debian package
- Fix a merge_config.sh error due to a misspelled variable
- Fix modversion for 32-bit build machines
* tag 'kbuild-fixes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
modpost: Fix processing of CRCs on 32-bit build machines
scripts: merge_config: Fix typo in variable name.
kbuild: deb-pkg: set version for linux-headers paths
Linus Torvalds [Sat, 1 Apr 2023 16:17:33 +0000 (09:17 -0700)]
Merge tag 'iommu-fixes-6.3-rc4' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Maintainer update for S390 IOMMU driver
- A fix for the set_platform_dma_ops() call-back in the Exynos
IOMMU driver
- Intel VT-d fixes from Lu Baolu:
- Fix a lockdep splat
- Fix a supplement of the specification
- Fix a warning in perfmon code
* tag 'iommu-fixes-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
iommu/vt-d: Allow zero SAGAW if second-stage not supported
iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()
iommu/exynos: Fix set_platform_dma_ops() callback
MAINTAINERS: Update s390-iommu driver maintainer information
Arnd Bergmann [Tue, 7 Feb 2023 16:13:12 +0000 (17:13 +0100)]
media: i2c: imx290: fix conditional function defintions
The runtime suspend/resume functions are only referenced from the
dev_pm_ops, but they use the old SET_RUNTIME_PM_OPS() helper that
requires a __maybe_unused annotation to avoid a warning:
drivers/media/i2c/imx290.c:1082:12: error: unused function 'imx290_runtime_resume' [-Werror,-Wunused-function]
static int imx290_runtime_resume(struct device *dev)
^
drivers/media/i2c/imx290.c:1090:12: error: unused function 'imx290_runtime_suspend' [-Werror,-Wunused-function]
static int imx290_runtime_suspend(struct device *dev)
^
Convert this to the new RUNTIME_PM_OPS() helper that so this is not
required. To improve this further, also use the pm_ptr() helper that
lets the dev_pm_ops get dropped entirely when CONFIG_PM is disabled.
A related mistake happened in the of_match_ptr() macro here, which like
SET_RUNTIME_PM_OPS() requires the match table to be marked as
__maybe_unused, though I could not reproduce building this without
CONFIG_OF. Remove the of_match_ptr() here as there is no point in
dropping the match table in configurations without CONFIG_OF.
Fixes:
02852c01f654 ("media: i2c: imx290: Initialize runtime PM before subdev")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 31 Mar 2023 20:22:14 +0000 (13:22 -0700)]
Merge tag 'nfs-for-6.3-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
- Fix shutdown of NFS TCP client sockets
- Fix hangs when recovering open state after a server reboot
* tag 'nfs-for-6.3-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: fix shutdown of NFS TCP client socket
NFSv4: Fix hangs when recovering open state after a server reboot
Linus Torvalds [Fri, 31 Mar 2023 20:11:06 +0000 (13:11 -0700)]
Merge tag 'platform-drivers-x86-v6.3-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- Fix a regression in ideapad-laptop which caused the touchpad to stop
working after a suspend/resume on some models
- One other small fix and three hw-id additions
* tag 'platform-drivers-x86-v6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: ideapad-laptop: Stop sending KEY_TOUCHPAD_TOGGLE
platform/x86: asus-nb-wmi: Add quirk_asus_tablet_mode to other ROG Flow X13 models
platform/x86: gigabyte-wmi: add support for X570S AORUS ELITE
platform/x86: gigabyte-wmi: add support for B650 AORUS ELITE AX
platform/x86/intel/pmc: Alder Lake PCH slp_s0_residency fix
Linus Torvalds [Fri, 31 Mar 2023 20:07:01 +0000 (13:07 -0700)]
Merge tag 'pci-v6.3-fixes-1' of git://git./linux/kernel/git/pci/pci
Pull PCI fix from Bjorn Helgaas:
- Fix DesignWare PORT_LINK_CONTROL setup, which was corrupted when the
DT "snps,enable-cdm-check" property was present (Yoshihiro Shimoda)
* tag 'pci-v6.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: dwc: Fix PORT_LINK_CONTROL update when CDM check enabled
Linus Torvalds [Fri, 31 Mar 2023 20:02:34 +0000 (13:02 -0700)]
Merge tag 'regulator-fix-v6.3-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"Deferred probe fix for v6.3.
This fixes a rarely triggered issue where we would treat probe
deferral for clocks as a fatal error in the fixed regulator, causing
it to fail to retry when it should"
* tag 'regulator-fix-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Handle deferred clk
Linus Torvalds [Fri, 31 Mar 2023 19:35:03 +0000 (12:35 -0700)]
Merge tag 'block-6.3-2023-03-30' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- Mark Lexar NM760 as IGNORE_DEV_SUBNQN (Juraj Pecigos)
- Fix a possible UAF when failing to allocate an TCP io queue (Sagi
Grimberg)
- MD pull request via Song:
- Fix a null pointer deference in 6.3-rc (Yu Kuai)
- uevent partition fix (Alyssa)
* tag 'block-6.3-2023-03-30' of git://git.kernel.dk/linux:
nvme-tcp: fix a possible UAF when failing to allocate an io queue
md: fix regression for null-ptr-deference in __md_stop()
nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
loop: LOOP_CONFIGURE: send uevents for partitions
Linus Torvalds [Fri, 31 Mar 2023 19:30:13 +0000 (12:30 -0700)]
Merge tag 'io_uring-6.3-2023-03-30' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix a regression with the poll retry, introduced in this merge window
(me)
- Fix a regression with the alloc cache not decrementing the member
count on removal. Also a regression from this merge window (Pavel)
- Fix race around rsrc node grabbing (Pavel)
* tag 'io_uring-6.3-2023-03-30' of git://git.kernel.dk/linux:
io_uring: fix poll/netmsg alloc caches
io_uring/rsrc: fix rogue rsrc node grabbing
io_uring/poll: clear single/double poll flags on poll arming
Hans de Goede [Thu, 30 Mar 2023 19:46:44 +0000 (21:46 +0200)]
platform/x86: ideapad-laptop: Stop sending KEY_TOUCHPAD_TOGGLE
Commit
5829f8a897e4 ("platform/x86: ideapad-laptop: Send
KEY_TOUCHPAD_TOGGLE on some models") made ideapad-laptop send
KEY_TOUCHPAD_TOGGLE when we receive an ACPI notify with VPC event bit 5 set
and the touchpad-state has not been changed by the EC itself already.
This was done under the assumption that this would be good to do to make
the touchpad-toggle hotkey work on newer models where the EC does not
toggle the touchpad on/off itself (because it is not routed through
the PS/2 controller, but uses I2C).
But it turns out that at least some models, e.g. the Yoga 7-15ITL5 the EC
triggers an ACPI notify with VPC event bit 5 set on resume, which would
now cause a spurious KEY_TOUCHPAD_TOGGLE on resume to which the desktop
environment responds by disabling the touchpad in software, breaking
the touchpad (until manually re-enabled) on resume.
It was never confirmed that sending KEY_TOUCHPAD_TOGGLE actually improves
things on new models and at least some new models like the Yoga 7-15ITL5
don't have a touchpad on/off toggle hotkey at all, while still sending
ACPI notify events with VPC event bit 5 set.
So it seems best to revert the change to send KEY_TOUCHPAD_TOGGLE when
receiving an ACPI notify events with VPC event bit 5 and the touchpad
state as reported by the EC has not changed.
Note this is not a full revert the code to cache the last EC touchpad
state is kept to avoid sending spurious KEY_TOUCHPAD_ON / _OFF events
on resume.
Fixes:
5829f8a897e4 ("platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217234
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230330194644.64628-1-hdegoede@redhat.com
weiliang1503 [Thu, 30 Mar 2023 11:49:43 +0000 (19:49 +0800)]
platform/x86: asus-nb-wmi: Add quirk_asus_tablet_mode to other ROG Flow X13 models
Make quirk_asus_tablet_mode apply on other ROG Flow X13 devices,
which only affects the GV301Q model before.
Signed-off-by: weiliang1503 <weiliang1503@gmail.com>
Link: https://lore.kernel.org/r/20230330114943.15057-1-weiliang1503@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Hans de Goede [Fri, 31 Mar 2023 17:31:48 +0000 (19:31 +0200)]
platform/x86: gigabyte-wmi: add support for X570S AORUS ELITE
Add "X570S AORUS ELITE" to known working boards
Reported-by: Brandon Nielsen <nielsenb@jetfuse.net>
Link: https://lore.kernel.org/r/20230331014902.7864-1-nielsenb@jetfuse.net
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Linus Torvalds [Fri, 31 Mar 2023 17:23:27 +0000 (10:23 -0700)]
Merge tag 'thermal-6.3-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These remove two recently added excessive lockdep assertions from the
sysfs-related thermal code and fix two issues in Intel thermal
drivers.
Specifics:
- Drop two lockdep assertions producing false positive warnings from
the sysfs-related thermal core code (Rafael Wysocki)
- Fix handling of two recently added module parameters in the Intel
powerclamp thermal driver (David Arcari)
- Fix one more deadlock in the int340x thermal driver (Srinivas
Pandruvada)"
* tag 'thermal-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: powerclamp: Fix cpumask and max_idle module parameters
thermal: intel: int340x: processor_thermal: Fix additional deadlock
thermal: core: Drop excessive lockdep_assert_held() calls
Linus Torvalds [Fri, 31 Mar 2023 17:18:56 +0000 (10:18 -0700)]
Merge tag 'acpi-6.3-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a recent regression related to the handling of ACPI notifications
that made it more likely for ACPI driver callbacks to be invoked in an
unexpected order and NULL pointers can be dereferenced as a result or
similar.
The fix is to modify the global ACPI notification handler so it does
not invoke driver callbacks at all and allow the device-level
notification handlers to receive "system" notifications (for the
drivers that want to receive them)"
* tag 'acpi-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: bus: Rework system-level device notification handling
Linus Torvalds [Fri, 31 Mar 2023 17:15:17 +0000 (10:15 -0700)]
Merge tag 'riscv-for-linus-6.3-rc5' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for FPU probing in XIP kernels
- Always enable the alternative framework for non-XIP kernels
* tag 'riscv-for-linus-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
Linus Torvalds [Fri, 31 Mar 2023 17:12:07 +0000 (10:12 -0700)]
Merge tag 'mips-fixes_6.3_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fix from Thomas Bogendoerfer:
"Fix to avoid crash on BCM6358 platforms"
* tag 'mips-fixes_6.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mips: bmips: BCM6358: disable RAC flush for TP1
Rafael J. Wysocki [Fri, 31 Mar 2023 10:02:46 +0000 (12:02 +0200)]
Merge branch 'thermal-intel-fixes'
Merge Intel thermal driver fixes for 6.3-rc5:
- Fix handling of two recently added module parameters in the Intel
powerclamp thermal driver (David Arcari).
- Fix one more deadlock in the int340x thermal driver (Srinivas
Pandruvada).
* thermal-intel-fixes:
thermal: intel: powerclamp: Fix cpumask and max_idle module parameters
thermal: intel: int340x: processor_thermal: Fix additional deadlock
Kan Liang [Wed, 29 Mar 2023 13:47:21 +0000 (21:47 +0800)]
iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
A warning can be triggered when hotplug CPU 0.
$ echo 0 > /sys/devices/system/cpu/cpu0/online
------------[ cut here ]------------
Voluntary context switch within RCU read-side critical section!
WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318
rcu_note_context_switch+0x4f4/0x580
RIP: 0010:rcu_note_context_switch+0x4f4/0x580
Call Trace:
<TASK>
? perf_event_update_userpage+0x104/0x150
__schedule+0x8d/0x960
? perf_event_set_state.part.82+0x11/0x50
schedule+0x44/0xb0
schedule_timeout+0x226/0x310
? __perf_event_disable+0x64/0x1a0
? _raw_spin_unlock+0x14/0x30
wait_for_completion+0x94/0x130
__wait_rcu_gp+0x108/0x130
synchronize_rcu+0x67/0x70
? invoke_rcu_core+0xb0/0xb0
? __bpf_trace_rcu_stall_warning+0x10/0x10
perf_pmu_migrate_context+0x121/0x370
iommu_pmu_cpu_offline+0x6a/0xa0
? iommu_pmu_del+0x1e0/0x1e0
cpuhp_invoke_callback+0x129/0x510
cpuhp_thread_fun+0x94/0x150
smpboot_thread_fn+0x183/0x220
? sort_range+0x20/0x20
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
---[ end trace
0000000000000000 ]---
The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(),
when migrating a PMU to a new CPU. However, the current for_each_iommu()
is within RCU read-side critical section.
Two methods were considered to fix the issue.
- Use the dmar_global_lock to replace the RCU read lock when going
through the drhd list. But it triggers a lockdep warning.
- Use the cpuhp_setup_state_multi() to set up a dedicated state for each
IOMMU PMU. The lock can be avoided.
The latter method is implemented in this patch. Since each IOMMU PMU has
a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track
the state. The state can be dynamically allocated now. Remove the
CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE.
Fixes:
46284c6ceb5e ("iommu/vt-d: Support cpumask for IOMMU perfmon")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230328182028.1366416-1-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230329134721.469447-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Wed, 29 Mar 2023 13:47:20 +0000 (21:47 +0800)]
iommu/vt-d: Allow zero SAGAW if second-stage not supported
The VT-d spec states (in section 11.4.2) that hardware implementations
reporting second-stage translation support (SSTS) field as Clear also
report the SAGAW field as 0. Fix an inappropriate check in alloc_iommu().
Fixes:
792fb43ce2c9 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default")
Suggested-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230318024824.124542-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Wed, 29 Mar 2023 13:47:19 +0000 (21:47 +0800)]
iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()
The global rwsem dmar_global_lock was introduced by commit
3a5670e8ac932
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.
Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
the DMAR global data structures are not touched there. Remove it to avoid
below lockdep warning.
======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2 #468 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
at: __irq_domain_alloc_irqs+0x3b/0xa0
but task is already holding lock:
ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
at: intel_iommu_init+0x58e/0x880
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (dmar_global_lock){++++}-{3:3}:
lock_acquire+0xd6/0x320
down_read+0x42/0x180
intel_irq_remapping_alloc+0xad/0x750
mp_irqdomain_alloc+0xb8/0x2b0
irq_domain_alloc_irqs_locked+0x12f/0x2d0
__irq_domain_alloc_irqs+0x56/0xa0
alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
mp_map_pin_to_irq+0x1dc/0x330
setup_IO_APIC+0x128/0x210
apic_intr_mode_init+0x67/0x110
x86_late_time_init+0x24/0x40
start_kernel+0x41e/0x7e0
secondary_startup_64_no_verify+0xe0/0xeb
-> #0 (&domain->mutex){+.+.}-{3:3}:
check_prevs_add+0x160/0xef0
__lock_acquire+0x147d/0x1950
lock_acquire+0xd6/0x320
__mutex_lock+0x9c/0xfc0
__irq_domain_alloc_irqs+0x3b/0xa0
dmar_alloc_hwirq+0x9e/0x120
iommu_pmu_register+0x11d/0x200
intel_iommu_init+0x5de/0x880
pci_iommu_init+0x12/0x40
do_one_initcall+0x65/0x350
kernel_init_freeable+0x3ca/0x610
kernel_init+0x1a/0x140
ret_from_fork+0x29/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(dmar_global_lock);
lock(&domain->mutex);
lock(dmar_global_lock);
lock(&domain->mutex);
*** DEADLOCK ***
Fixes:
9dbb8e3452ab ("irqdomain: Switch to per-domain locking")
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Tested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230314051836.23817-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Jens Axboe [Fri, 31 Mar 2023 02:29:47 +0000 (20:29 -0600)]
Merge tag 'md-fixes-2023-03-29' of https://git./linux/kernel/git/song/md into block-6.3
Pull MD fix from Song.
* tag 'md-fixes-2023-03-29' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: fix regression for null-ptr-deference in __md_stop()
Linus Torvalds [Thu, 30 Mar 2023 23:09:37 +0000 (16:09 -0700)]
Merge tag 'dma-mapping-6.3-2023-03-31' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- fix for swiotlb deadlock due to wrong alignment checks (GuoRui.Yu,
Petr Tesarik)
* tag 'dma-mapping-6.3-2023-03-31' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix slot alignment checks
swiotlb: use wrap_area_index() instead of open-coding it
swiotlb: fix the deadlock in swiotlb_do_find_slots
Paulo Alcantara [Wed, 29 Mar 2023 20:14:23 +0000 (17:14 -0300)]
cifs: get rid of dead check in smb2_reconnect()
The SMB2_IOCTL check in the switch statement will never be true as we
return earlier from smb2_reconnect() if @smb2_command == SMB2_IOCTL.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Paulo Alcantara [Wed, 29 Mar 2023 20:14:22 +0000 (17:14 -0300)]
cifs: prevent infinite recursion in CIFSGetDFSRefer()
We can't call smb_init() in CIFSGetDFSRefer() as cifs_reconnect_tcon()
may end up calling CIFSGetDFSRefer() again to get new DFS referrals
and thus causing an infinite recursion.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
Paulo Alcantara [Wed, 29 Mar 2023 20:14:21 +0000 (17:14 -0300)]
cifs: avoid races in parallel reconnects in smb1
Prevent multiple threads of doing negotiate, session setup and tree
connect by holding @ses->session_mutex in cifs_reconnect_tcon() while
reconnecting session and tcon.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Thu, 30 Mar 2023 22:52:45 +0000 (15:52 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four small fixes, three in drivers. The core fix is yet another
attempt to insulate us from UFS devices' weird behaviour for VPD
pages"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpt3sas: Don't print sense pool info twice
scsi: core: Improve scsi_vpd_inquiry() checks
scsi: megaraid_sas: Fix crash after a double completion
scsi: megaraid_sas: Fix fw_crash_buffer_show()
Jens Axboe [Thu, 30 Mar 2023 22:39:04 +0000 (16:39 -0600)]
Merge tag 'nvme-6.3-2023-03-31' of git://git.infradead.org/nvme into block-6.3
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.3
- mark Lexar NM760 as IGNORE_DEV_SUBNQN (Juraj Pecigos)
- fix a possible UAF when failing to allocate an TCP io queue
(Sagi Grimberg)"
* tag 'nvme-6.3-2023-03-31' of git://git.infradead.org/nvme:
nvme-tcp: fix a possible UAF when failing to allocate an io queue
nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
David Disseldorp [Wed, 29 Mar 2023 20:24:06 +0000 (22:24 +0200)]
cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL
When compiled with CONFIG_CIFS_DFS_UPCALL disabled, cifs_dfs_d_automount
is NULL. cifs.ko logic for mapping CIFS_FATTR_DFS_REFERRAL attributes to
S_AUTOMOUNT and corresponding dentry flags is retained regardless of
CONFIG_CIFS_DFS_UPCALL, leading to a NULL pointer dereference in
VFS follow_automount() when traversing a DFS referral link:
BUG: kernel NULL pointer dereference, address:
0000000000000000
...
Call Trace:
<TASK>
__traverse_mounts+0xb5/0x220
? cifs_revalidate_mapping+0x65/0xc0 [cifs]
step_into+0x195/0x610
? lookup_fast+0xe2/0xf0
path_lookupat+0x64/0x140
filename_lookup+0xc2/0x140
? __create_object+0x299/0x380
? kmem_cache_alloc+0x119/0x220
? user_path_at_empty+0x31/0x50
user_path_at_empty+0x31/0x50
__x64_sys_chdir+0x2a/0xd0
? exit_to_user_mode_prepare+0xca/0x100
do_syscall_64+0x42/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
This fix adds an inline cifs_dfs_d_automount() {return -EREMOTE} handler
when CONFIG_CIFS_DFS_UPCALL is disabled. An alternative would be to
avoid flagging S_AUTOMOUNT, etc. without CONFIG_CIFS_DFS_UPCALL. This
approach was chosen as it provides more control over the error path.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Thu, 30 Mar 2023 21:05:21 +0000 (14:05 -0700)]
Merge tag 'net-6.3-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN and WPAN.
Still quite a few bugs from this release. This pull is a bit smaller
because major subtrees went into the previous one. Or maybe people
took spring break off?
Current release - regressions:
- phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement
Current release - new code bugs:
- eth: wangxun: fix vector length of interrupt cause
- vsock/loopback: consistently protect the packet queue with
sk_buff_head.lock
- virtio/vsock: fix header length on skb merging
- wpan: ca8210: fix unsigned mac_len comparison with zero
Previous releases - regressions:
- eth: stmmac: don't reject VLANs when IFF_PROMISC is set
- eth: smsc911x: avoid PHY being resumed when interface is not up
- eth: mtk_eth_soc: fix tx throughput regression with direct 1G links
- eth: bnx2x: use the right build_skb() helper after core rework
- wwan: iosm: fix 7560 modem crash on use on unsupported channel
Previous releases - always broken:
- eth: sfc: don't overwrite offload features at NIC reset
- eth: r8169: fix RTL8168H and RTL8107E rx crc error
- can: j1939: prevent deadlock by moving j1939_sk_errqueue()
- virt: vmxnet3: use GRO callback when UPT is enabled
- virt: xen: don't do grant copy across page boundary
- phy: dp83869: fix default value for tx-/rx-internal-delay
- dsa: ksz8: fix multiple issues with ksz8_fdb_dump
- eth: mvpp2: fix classification/RSS of VLAN and fragmented packets
- eth: mtk_eth_soc: fix flow block refcounting logic
Misc:
- constify fwnode pointers in SFP handling"
* tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
net: ethernet: mtk_eth_soc: fix flow block refcounting logic
net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
net: dsa: sync unicast and multicast addresses for VLAN filters too
net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
xen/netback: use same error messages for same errors
test/vsock: new skbuff appending test
virtio/vsock: WARN_ONCE() for invalid state of socket
virtio/vsock: fix header length on skb merging
bnxt_en: Add missing 200G link speed reporting
bnxt_en: Fix typo in PCI id to device description string mapping
bnxt_en: Fix reporting of test result in ethtool selftest
i40e: fix registers dump after run ethtool adapter self test
bnx2x: use the right build_skb() helper
net: ipa: compute DMA pool size properly
net: wwan: iosm: fixes 7560 modem crash
net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
ice: add profile conflict check for AVF FDIR
...
Linus Torvalds [Thu, 30 Mar 2023 20:58:12 +0000 (13:58 -0700)]
Merge tag 'for-6.3/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix two DM core bugs in the code that handles splitting "abnormal" IO
(discards, write same and secure erase) and issuing that IO to the
correct underlying devices (and offsets within those devices).
* tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix __send_duplicate_bios() to always allow for splitting IO
dm: fix improper splitting for abnormal bios
Linus Torvalds [Thu, 30 Mar 2023 20:38:27 +0000 (13:38 -0700)]
Merge tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"Two regression fixes in here, otherwise just the usual stuff:
- i915 fixes for color mgmt, psr, lmem flush, hibernate oops, and
more
- amdgpu: dp mst and hibernate regression fix
- etnaviv: revert fdinfo support (incl drm/sched revert), leak fix
- misc ivpu fixes, nouveau backlight, drm buddy allocator 32bit
fixes"
* tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm: (27 commits)
Revert "drm/scheduler: track GPU active time per entity"
Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
drm/etnaviv: fix reference leak when mmaping imported buffer
drm/amdgpu: allow more APUs to do mode2 reset when go to S4
drm/amd/display: Take FEC Overhead into Timeslot Calculation
drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
drm: test: Fix 32-bit issue in drm_buddy_test
drm: buddy_allocator: Fix buddy allocator init on 32-bit systems
drm/nouveau/kms: Fix backlight registration
drm/i915/perf: Drop wakeref on GuC RC error
drm/i915/dpt: Treat the DPT BO as a framebuffer
drm/i915/gem: Flush lmem contents after construction
drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
drm/i915: Disable DC states for all commits
drm/i915: Workaround ICL CSC_MODE sticky arming
drm/i915: Add a .color_post_update() hook
drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk
drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
drm/i915/pmu: Use functions common with sysfs to read actual freq
accel/ivpu: Fix IPC buffer header status field value
...
Mike Snitzer [Thu, 30 Mar 2023 19:09:29 +0000 (15:09 -0400)]
dm: fix __send_duplicate_bios() to always allow for splitting IO
Commit
7dd76d1feec70 ("dm: improve bio splitting and associated IO
accounting") only called setup_split_accounting() from
__send_duplicate_bios() if a single bio were being issued. But the case
where duplicate bios are issued must call it too.
Otherwise the bio won't be split and resubmitted (via recursion through
block core back to DM) to submit the later portions of a bio (which may
map to an entirely different target).
For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Before (broken, discards the first striped target's devices twice):
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
device-mapper: striped: target_stripe=0, bdev=7:0, start=2049 len=22528
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=22528
After (works as expected):
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
device-mapper: striped: target_stripe=0, bdev=7:2, start=2048 len=22528
device-mapper: striped: target_stripe=1, bdev=7:3, start=2048 len=22528
Fixes:
7dd76d1feec70 ("dm: improve bio splitting and associated IO accounting")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Thu, 30 Mar 2023 18:56:38 +0000 (14:56 -0400)]
dm: fix improper splitting for abnormal bios
"Abnormal" bios include discards, write zeroes and secure erase. By no
longer passing the calculated 'len' pointer, commit
7dd06a2548b2 ("dm:
allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
senseless approach to disallowing dm_accept_partial_bio() from working
for duplicate bios processed using __send_duplicate_bios().
It inadvertently and incorrectly stopped the use of 'len' when
initializing a target's io (in alloc_tio). As such the resulting tio
could address more area of a device than it should.
For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Before this fix:
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
After this fix;
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
Fixes:
7dd06a2548b2 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Felix Fietkau [Thu, 30 Mar 2023 12:08:40 +0000 (14:08 +0200)]
net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
The cache needs to be flushed to ensure that the hardware stops offloading
the flow immediately.
Fixes:
33fc42de3327 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-3-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Thu, 30 Mar 2023 12:08:39 +0000 (14:08 +0200)]
net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
Check for skb metadata in order to detect the case where the DSA header
is not present.
Fixes:
2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-2-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Thu, 30 Mar 2023 12:08:38 +0000 (14:08 +0200)]
net: ethernet: mtk_eth_soc: fix flow block refcounting logic
Since we call flow_block_cb_decref on FLOW_BLOCK_UNBIND, we also need to
call flow_block_cb_incref for a newly allocated cb.
Also fix the accidentally inverted refcount check on unbind.
Fixes:
502e84e2382d ("net: ethernet: mtk_eth_soc: add flow offloading support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Wed, 29 Mar 2023 12:11:17 +0000 (13:11 +0100)]
net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
Reported on the Turris forum, mvneta provokes kernel warnings in the
architecture DMA mapping code when mvneta_setup_txqs() fails to
allocate memory. This happens because when mvneta_cleanup_txqs() is
called in the mvneta_stop() path, we leave pointers in the structure
that have been freed.
Then on mvneta_open(), we call mvneta_setup_txqs(), which starts
allocating memory. On memory allocation failure, mvneta_cleanup_txqs()
will walk all the queues freeing any non-NULL pointers - which includes
pointers that were previously freed in mvneta_stop().
Fix this by setting these pointers to NULL to prevent double-freeing
of the same memory.
Fixes:
2adb719d74f6 ("net: mvneta: Implement software TSO")
Link: https://forum.turris.cz/t/random-kernel-exceptions-on-hbl-tos-7-0/18865/8
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1phUe5-00EieL-7q@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 29 Mar 2023 15:18:21 +0000 (18:18 +0300)]
net: dsa: sync unicast and multicast addresses for VLAN filters too
If certain conditions are met, DSA can install all necessary MAC
addresses on the CPU ports as FDB entries and disable flooding towards
the CPU (we call this RX filtering).
There is one corner case where this does not work.
ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
ip link set swp0 master br0 && ip link set swp0 up
ip link add link swp0 name swp0.100 type vlan id 100
ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100
Traffic through swp0.100 is broken, because the bridge turns on VLAN
filtering in the swp0 port (causing RX packets to be classified to the
FDB database corresponding to the VID from their 802.1Q header), and
although the 8021q module does call dev_uc_add() towards the real
device, that API is VLAN-unaware, so it only contains the MAC address,
not the VID; and DSA's current implementation of ndo_set_rx_mode() is
only for VID 0 (corresponding to FDB entries which are installed in an
FDB database which is only hit when the port is VLAN-unaware).
It's interesting to understand why the bridge does not turn on
IFF_PROMISC for its swp0 bridge port, and it may appear at first glance
that this is a regression caused by the logic in commit
2796d0c648c9
("bridge: Automatically manage port promiscuous mode."). After all,
a bridge port needs to have IFF_PROMISC by its very nature - it needs to
receive and forward frames with a MAC DA different from the bridge
ports' MAC addresses.
While that may be true, when the bridge is VLAN-aware *and* it has a
single port, there is no real reason to enable promiscuity even if that
is an automatic port, with flooding and learning (there is nowhere for
packets to go except to the BR_FDB_LOCAL entries), and this is how the
corner case appears. Adding a second automatic interface to the bridge
would make swp0 promisc as well, and would mask the corner case.
Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't
pass a VLAN ID), the only way to address that problem is to install host
FDB entries for the cartesian product of RX filtering MAC addresses and
VLAN RX filters.
Fixes:
7569459a52c9 ("net: dsa: manage flooding on the CPU ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steffen Bätz [Wed, 29 Mar 2023 15:01:40 +0000 (12:01 -0300)]
net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
Do not set the MV88E6XXX_PORT_CTL0_IGMP_MLD_SNOOP bit on CPU or DSA ports.
This allows the host CPU port to be a regular IGMP listener by sending out
IGMP Membership Reports, which would otherwise not be forwarded by the
mv88exxx chip, but directly looped back to the CPU port itself.
Fixes:
54d792f257c6 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
Signed-off-by: Steffen Bätz <steffen@innosonix.de>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329150140.701559-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Vetter [Thu, 30 Mar 2023 18:15:06 +0000 (20:15 +0200)]
Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes
- revert gpu time fdinfo support
- reference leak fix on imported buffers
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/de8e08c2599ec0e22456ae36e9757b9ff14c2124.camel@pengutronix.de
David Arcari [Thu, 30 Mar 2023 13:42:18 +0000 (09:42 -0400)]
thermal: intel: powerclamp: Fix cpumask and max_idle module parameters
When cpumask is specified as a module parameter the value is
overwritten by the module init routine. This can easily be fixed
by checking to see if the mask has already been allocated in the
init routine.
When max_idle is specified as a module parameter a panic will occur.
The problem is that the idle_injection_cpu_mask is not allocated until
the module init routine executes. This can easily be fixed by allocating
the cpumask if it's not already allocated.
Fixes:
ebf519710218 ("thermal: intel: powerclamp: Add two module parameters")
Signed-off-by: David Arcari <darcari@redhat.com>
Reviewed-by: Srinivas Pandruvada<srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Daniel Vetter [Thu, 30 Mar 2023 17:59:06 +0000 (19:59 +0200)]
Merge tag 'amd-drm-fixes-6.3-2023-03-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.3-2023-03-30:
amdgpu:
- Hibernation regression fix
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230330153859.18332-1-alexander.deucher@amd.com
Daniel Vetter [Thu, 30 Mar 2023 16:56:52 +0000 (18:56 +0200)]
Merge tag 'drm-misc-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Short summary of fixes pull:
* various ivpu fixes
* fix nouveau backlight registration
* fix buddy allocator in 32-bit systems
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230330141006.GA22908@linux-uq9g
Daniel Vetter [Thu, 30 Mar 2023 16:26:05 +0000 (18:26 +0200)]
Merge tag 'amd-drm-fixes-6.3-2023-03-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.3-2023-03-29:
amdgpu:
- Two DP MST fixes
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230329220059.7622-1-alexander.deucher@amd.com
Daniel Vetter [Thu, 30 Mar 2023 16:07:12 +0000 (18:07 +0200)]
Merge tag 'drm-intel-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v6.3-rc5:
- Fix PMU support by reusing functions with sysfs
- Fix a number of issues related to color, PSR and arm/noarm
- Fix state check related to ICL PHY ownership check in TC-cold state
- Flush lmem contents after construction
- Fix hibernate oops related to DPT BO
- Fix perf stream error path wakeref balance
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87355m4gtm.fsf@intel.com
Linus Torvalds [Thu, 30 Mar 2023 16:04:04 +0000 (09:04 -0700)]
Merge tag 'sound-6.3-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes:
- A potential deadlock fix for USB-audio, involving some change in
PCM core side
- A regression fix for probes of USB-audio devices with the
vendor-specific PCM format bits
- Two regression fixes for the old YMFPCI driver
- A few HD-audio quirks as usual"
* tag 'sound-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Z
ALSA: ymfpci: Fix BUG_ON in probe function
ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
ALSA: usb-audio: Fix regression on detection of Roland VS-100
ALSA: hda/realtek: Fix support for Dell Precision 3260
ALSA: usb-audio: Fix recursive locking at XRUN during syncing
ALSA: hda/conexant: Partial revert of a quirk for Lenovo
ALSA: hda/realtek: Add quirks for some Clevo laptops
Linus Torvalds [Thu, 30 Mar 2023 16:00:17 +0000 (09:00 -0700)]
Merge tag 'zonefs-6.3-rc5' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs fixes from Damien Le Moal:
- Make sure to always invalidate the last page of an inode straddling
inode->i_size to avoid data inconsistencies with appended data when
the device zone write granularity does not match the page size.
- Do not propagate iomap -ENOBLK error to userspace and use -EBUSY
instead.
* tag 'zonefs-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space
zonefs: Always invalidate last cached page on append write
Lucas Stach [Thu, 30 Mar 2023 15:35:13 +0000 (17:35 +0200)]
Revert "drm/scheduler: track GPU active time per entity"
This reverts commit
df622729ddbf as it introduces a use-after-free,
which isn't easy to fix without going back to the design drawing board.
Reported-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Lucas Stach [Thu, 30 Mar 2023 15:33:27 +0000 (17:33 +0200)]
Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
This reverts commit
97804a133c68, as it builds on top of
df622729ddbf
("drm/scheduler: track GPU active time per entity") which needs to be
reverted, as it introduces a use-after-free.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Lucas Stach [Fri, 24 Feb 2023 17:21:54 +0000 (18:21 +0100)]
drm/etnaviv: fix reference leak when mmaping imported buffer
drm_gem_prime_mmap() takes a reference on the GEM object, but before that
drm_gem_mmap_obj() already takes a reference, which will be leaked as only
one reference is dropped when the mapping is closed. Drop the extra
reference when dma_buf_mmap() succeeds.
Cc: stable@vger.kernel.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tim Huang [Thu, 30 Mar 2023 02:33:02 +0000 (10:33 +0800)]
drm/amdgpu: allow more APUs to do mode2 reset when go to S4
Skip mode2 reset only for IMU enabled APUs when do S4.
This patch is to fix the regression issue
https://gitlab.freedesktop.org/drm/amd/-/issues/2483
It is generated by commit
b589626674de ("drm/amdgpu: skip ASIC reset
for APUs when go to S4").
Fixes:
b589626674de ("drm/amdgpu: skip ASIC reset for APUs when go to S4")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2483
Tested-by: Yuan Perry <Perry.Yuan@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Juergen Gross [Wed, 29 Mar 2023 08:02:59 +0000 (10:02 +0200)]
xen/netback: use same error messages for same errors
Issue the same error message in case an illegal page boundary crossing
has been detected in both cases where this is tested.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20230329080259.14823-1-jgross@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pavel Begunkov [Thu, 30 Mar 2023 12:52:38 +0000 (06:52 -0600)]
io_uring: fix poll/netmsg alloc caches
We increase cache->nr_cached when we free into the cache but don't
decrease when we take from it, so in some time we'll get an empty
cache with cache->nr_cached larger than IO_ALLOC_CACHE_MAX, that fails
io_alloc_cache_put() and effectively disables caching.
Fixes:
9b797a37c4bd8 ("io_uring: add abstraction around apoll cache")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Thu, 30 Mar 2023 00:47:58 +0000 (09:47 +0900)]
zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space
The call to invalidate_inode_pages2_range() in __iomap_dio_rw() may
fail, in which case -ENOTBLK is returned and this error code is
propagated back to user space trhough iomap_dio_rw() ->
zonefs_file_dio_write() return chain. This error code is fairly obscure
and may confuse the user. Avoid this and be consistent with the behavior
of zonefs_file_dio_append() for similar invalidate_inode_pages2_range()
errors by returning -EBUSY to user space when iomap_dio_rw() returns
-ENOTBLK.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Fixes:
8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Damien Le Moal [Wed, 29 Mar 2023 04:16:01 +0000 (13:16 +0900)]
zonefs: Always invalidate last cached page on append write
When a direct append write is executed, the append offset may correspond
to the last page of a sequential file inode which might have been cached
already by buffered reads, page faults with mmap-read or non-direct
readahead. To ensure that the on-disk and cached data is consistant for
such last cached page, make sure to always invalidate it in
zonefs_file_dio_append(). If the invalidation fails, return -EBUSY to
userspace to differentiate from IO errors.
This invalidation will always be a no-op when the FS block size (device
zone write granularity) is equal to the page size (e.g. 4K).
Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Fixes:
02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Paolo Abeni [Thu, 30 Mar 2023 08:47:51 +0000 (10:47 +0200)]
Merge branch 'fix-header-length-on-skb-merging'
Arseniy Krasnov says:
====================
fix header length on skb merging
this patchset fixes appending newly arrived skbuff to the last skbuff of
the socket's queue during rx path. Problem fires when we are trying to
append data to skbuff which was already processed in dequeue callback
at least once. Dequeue callback calls function 'skb_pull()' which changes
'skb->len'. In current implementation 'skb->len' is used to update length
in header of last skbuff after new data was copied to it. This is bug,
because value in header is used to calculate 'rx_bytes'/'fwd_cnt' and
thus must be constant during skbuff lifetime. Here is example, we have
two skbuffs: skb0 with length 10 and skb1 with length 4.
1) skb0 arrives, hdr->len == skb->len == 10, rx_bytes == 10
2) Read 3 bytes from skb0, skb->len == 7, hdr->len == 10, rx_bytes == 10
3) skb1 arrives, hdr->len == skb->len == 4, rx_bytes == 14
4) Append skb1 to skb0, skb0 now has skb->len == 11, hdr->len == 11.
But value of 11 in header is invalid.
5) Read whole skb0, update rx_bytes by 11 from skb0's header.
6) At this moment rx_bytes == 3, but socket's queue is empty.
This bug starts to fire since:
commit
077706165717 ("virtio/vsock: don't use skbuff state to account credit")
In fact, it presents before, but didn't triggered due to a little bit
buggy implementation of credit calculation logic. So i'll use Fixes tag
for it.
I really forgot about this branch in rx path when implemented patch
077706165717.
This patchset contains 3 patches:
1) Fix itself.
2) Patch with WARN_ONCE() to catch such problems in future.
3) Patch with test which triggers skb appending logic. It looks like
simple test with several 'send()' and 'recv()', but it checks, that
skbuff appending works ok.
====================
Link: https://lore.kernel.org/r/0683cc6e-5130-484c-1105-ef2eb792d355@sberdevices.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arseniy Krasnov [Tue, 28 Mar 2023 11:33:07 +0000 (14:33 +0300)]
test/vsock: new skbuff appending test
This adds test which checks case when data of newly received skbuff is
appended to the last skbuff in the socket's queue. It looks like simple
test with 'send()' and 'recv()', but internally it triggers logic which
appends one received skbuff to another. Test checks that this feature
works correctly.
This test is actual only for virtio transport.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arseniy Krasnov [Tue, 28 Mar 2023 11:32:12 +0000 (14:32 +0300)]
virtio/vsock: WARN_ONCE() for invalid state of socket
This adds WARN_ONCE() and return from stream dequeue callback when
socket's queue is empty, but 'rx_bytes' still non-zero. This allows
the detection of potential bugs due to packet merging (see previous
patch).
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arseniy Krasnov [Tue, 28 Mar 2023 11:31:28 +0000 (14:31 +0300)]
virtio/vsock: fix header length on skb merging
This fixes appending newly arrived skbuff to the last skbuff of the
socket's queue. Problem fires when we are trying to append data to skbuff
which was already processed in dequeue callback at least once. Dequeue
callback calls function 'skb_pull()' which changes 'skb->len'. In current
implementation 'skb->len' is used to update length in header of the last
skbuff after new data was copied to it. This is bug, because value in
header is used to calculate 'rx_bytes'/'fwd_cnt' and thus must be not
be changed during skbuff's lifetime.
Bug starts to fire since:
commit
077706165717
("virtio/vsock: don't use skbuff state to account credit")
It presents before, but didn't triggered due to a little bit buggy
implementation of credit calculation logic. So use Fixes tag for it.
Fixes:
077706165717 ("virtio/vsock: don't use skbuff state to account credit")
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 30 Mar 2023 04:48:18 +0000 (21:48 -0700)]
Merge branch 'bnxt_en-3-bug-fixes'
Michael Chan says:
====================
bnxt_en: 3 Bug fixes
This series contains 3 small bug fixes covering ethtool self test, PCI
ID string typos, and some missing 200G link speed ethtool reporting logic.
====================
Link: https://lore.kernel.org/r/20230329013021.5205-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Wed, 29 Mar 2023 01:30:21 +0000 (18:30 -0700)]
bnxt_en: Add missing 200G link speed reporting
bnxt_fw_to_ethtool_speed() is missing the case statement for 200G
link speed reported by firmware. As a result, ethtool will report
unknown speed when the firmware reports 200G link speed.
Fixes:
532262ba3b84 ("bnxt_en: ethtool: support PAM4 link speeds up to 200G")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalesh AP [Wed, 29 Mar 2023 01:30:20 +0000 (18:30 -0700)]
bnxt_en: Fix typo in PCI id to device description string mapping
Fix 57502 and 57508 NPAR description string entries. The typos
caused these devices to not match up with lspci output.
Fixes:
49c98421e6ab ("bnxt_en: Add PCI IDs for 57500 series NPAR devices.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalesh AP [Wed, 29 Mar 2023 01:30:19 +0000 (18:30 -0700)]
bnxt_en: Fix reporting of test result in ethtool selftest
When the selftest command fails, driver is not reporting the failure
by updating the "test->flags" when bnxt_close_nic() fails.
Fixes:
eb51365846bc ("bnxt_en: Add basic ethtool -t selftest support.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Radoslaw Tyl [Tue, 28 Mar 2023 17:26:59 +0000 (10:26 -0700)]
i40e: fix registers dump after run ethtool adapter self test
Fix invalid registers dump from ethtool -d ethX after adapter self test
by ethtool -t ethY. It causes invalid data display.
The problem was caused by overwriting i40e_reg_list[].elements
which is common for ethtool self test and dump.
Fixes:
22dd9ae8afcc ("i40e: Rework register diagnostic")
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230328172659.3906413-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 30 Mar 2023 04:46:18 +0000 (21:46 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-28 (ice)
This series contains updates to ice driver only.
Jesse fixes mismatched header documentation reported when building with
W=1.
Brett restricts setting of VSI context to only applicable fields for the
given ICE_AQ_VSI_PROP_Q_OPT_VALID bit.
Junfeng adds check when adding Flow Director filters that conflict with
existing filter rules.
Jakob Koschel adds interim variable for iterating to prevent possible
misuse after looping.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
ice: add profile conflict check for AVF FDIR
ice: Fix ice_cfg_rdma_fltr() to only update relevant fields
ice: fix W=1 headers mismatch
====================
Link: https://lore.kernel.org/r/20230328172035.3904953-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 30 Mar 2023 04:41:12 +0000 (21:41 -0700)]
Merge tag 'ieee802154-for-net-2023-03-29' of git://git./linux/kernel/git/wpan/wpan
Stefan Schmidt says:
====================
ieee802154 for net 2023-03-29
Two small fixes this time.
Dongliang Mu removed an unnecessary null pointer check.
Harshit Mogalapalli fixed an int comparison unsigned against signed from a
recent other fix in the ca8210 driver.
* tag 'ieee802154-for-net-2023-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
net: ieee802154: remove an unnecessary null pointer check
ca8210: Fix unsigned mac_len comparison with zero in ca8210_skb_tx()
====================
Link: https://lore.kernel.org/r/20230329064541.2147400-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 29 Mar 2023 00:00:13 +0000 (17:00 -0700)]
bnx2x: use the right build_skb() helper
build_skb() no longer accepts slab buffers. Since slab use is fairly
uncommon we prefer the drivers to call a separate slab_build_skb()
function appropriately.
bnx2x uses the old semantics where size of 0 meant buffer from slab.
It sets the fp->rx_frag_size to 0 for MTUs which don't fit in a page.
It needs to call slab_build_skb().
This fixes the WARN_ONCE() of incorrect API use seen with bnx2x.
Reported-by: Thomas Voegtle <tv@lio96.de>
Link: https://lore.kernel.org/all/b8f295e4-ba57-8bfb-7d9c-9d62a498a727@lio96.de/
Fixes:
ce098da1497c ("skbuff: Introduce slab_build_skb()")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230329000013.2734957-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 28 Mar 2023 16:27:51 +0000 (11:27 -0500)]
net: ipa: compute DMA pool size properly
In gsi_trans_pool_init_dma(), the total size of a pool of memory
used for DMA transactions is calculated. However the calculation is
done incorrectly.
For 4KB pages, this total size is currently always more than one
page, and as a result, the calculation produces a positive (though
incorrect) total size. The code still works in this case; we just
end up with fewer DMA pool entries than we intended.
Bjorn Andersson tested booting a kernel with 16KB pages, and hit a
null pointer derereference in sg_alloc_append_table_from_pages(),
descending from gsi_trans_pool_init_dma(). The cause of this was
that a 16KB total size was going to be allocated, and with 16KB
pages the order of that allocation is 0. The total_size calculation
yielded 0, which eventually led to the crash.
Correcting the total_size calculation fixes the problem.
Reported-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Tested-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Fixes:
9dd441e4ed57 ("soc: qcom: ipa: GSI transactions")
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230328162751.2861791-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sagi Grimberg [Mon, 20 Mar 2023 13:33:34 +0000 (15:33 +0200)]
nvme-tcp: fix a possible UAF when failing to allocate an io queue
When we allocate a nvme-tcp queue, we set the data_ready callback before
we actually need to use it. This creates the potential that if a stray
controller sends us data on the socket before we connect, we can trigger
the io_work and start consuming the socket.
In this case reported: we failed to allocate one of the io queues, and
as we start releasing the queues that we already allocated, we get
a UAF [1] from the io_work which is running before it should really.
Fix this by setting the socket ops callbacks only before we start the
queue, so that we can't accidentally schedule the io_work in the
initialization phase before the queue started. While we are at it,
rename nvme_tcp_restore_sock_calls to pair with nvme_tcp_setup_sock_ops.
[1]:
[16802.107284] nvme nvme4: starting error recovery
[16802.109166] nvme nvme4: Reconnecting in 10 seconds...
[16812.173535] nvme nvme4: failed to connect socket: -111
[16812.173745] nvme nvme4: Failed reconnect attempt 1
[16812.173747] nvme nvme4: Reconnecting in 10 seconds...
[16822.413555] nvme nvme4: failed to connect socket: -111
[16822.413762] nvme nvme4: Failed reconnect attempt 2
[16822.413765] nvme nvme4: Reconnecting in 10 seconds...
[16832.661274] nvme nvme4: creating 32 I/O queues.
[16833.919887] BUG: kernel NULL pointer dereference, address:
0000000000000088
[16833.920068] nvme nvme4: Failed reconnect attempt 3
[16833.920094] #PF: supervisor write access in kernel mode
[16833.920261] nvme nvme4: Reconnecting in 10 seconds...
[16833.920368] #PF: error_code(0x0002) - not-present page
[16833.921086] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
[16833.921191] RIP: 0010:_raw_spin_lock_bh+0x17/0x30
...
[16833.923138] Call Trace:
[16833.923271] <TASK>
[16833.923402] lock_sock_nested+0x1e/0x50
[16833.923545] nvme_tcp_try_recv+0x40/0xa0 [nvme_tcp]
[16833.923685] nvme_tcp_io_work+0x68/0xa0 [nvme_tcp]
[16833.923824] process_one_work+0x1e8/0x390
[16833.923969] worker_thread+0x53/0x3d0
[16833.924104] ? process_one_work+0x390/0x390
[16833.924240] kthread+0x124/0x150
[16833.924376] ? set_kthread_struct+0x50/0x50
[16833.924518] ret_from_fork+0x1f/0x30
[16833.924655] </TASK>
Reported-by: Yanjun Zhang <zhangyanjun@cestc.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Yanjun Zhang <zhangyanjun@cestc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fangzhi Zuo [Wed, 1 Mar 2023 02:34:58 +0000 (21:34 -0500)]
drm/amd/display: Take FEC Overhead into Timeslot Calculation
8b/10b encoding needs to add 3% fec overhead into the pbn.
In the Synapcis Cascaded MST hub, the first stage MST branch device
needs the information to determine the timeslot count for the
second stage MST branch device. Missing this overhead will leads to
insufficient timeslot allocation.
Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>