review.tizen.org Git - platform/kernel/linux-starfive.git/log

projects / platform / kernel / linux-starfive.git / log

Eli Cohen [Tue, 21 Mar 2023 11:28:08 +0000 (13:28 +0200)]

vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by default

Following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF.

Current firmware versions show degradation in packet rate when using
MRG_RXBUF. Users who favor memory saving over packet rate could enable
this feature but we want to keep it off by default.

One can still enable it when creating the vdpa device using vdpa tool by
providing features that include it.

For example:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 device_features 0x300cb982b

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230321112809.221432-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>

commit | commitdiff | tree

Feng Liu [Wed, 15 Mar 2023 18:54:56 +0000 (20:54 +0200)]

virtio_ring: Allow non power of 2 sizes for packed virtqueue

According to the Virtio Specification, the Queue Size parameter of a
virtqueue corresponds to the maximum number of descriptors in that
queue, and it does not have to be a power of 2 for packed virtqueues.
However, the virtio_pci_modern driver enforced a power of 2 check for
virtqueue sizes, which is unnecessary and restrictive for packed
virtuqueue.

Split virtqueue still needs to check the virtqueue size is power_of_2
which has been done in vring_alloc_queue_split of the virtio_ring layer.

To validate this change, we tested various virtqueue sizes for packed
rings, including 128, 256, 512, 100, 200, 500, and 1000, with
CONFIG_PAGE_POISONING enabled, and all tests passed successfully.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Message-Id: <20230315185458.11638-2-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Mike Christie [Tue, 21 Mar 2023 02:06:24 +0000 (21:06 -0500)]

vhost-scsi: Reduce vhost_scsi_mutex use

We on longer need to hold the vhost_scsi_mutex the entire time we
set/clear the endpoint. The tv_tpg_mutex handles tpg accesses not related
to the tpg list, the port link/unlink functions use the tv_tpg_mutex while
accessing the tpg->vhost_scsi pointer, vhost_scsi_do_plug will no longer
queue events after the virtqueue's backend has been cleared and flushed,
and we don't drop our refcount to the tpg until after we have stopped
cmds and wait for outstanding cmds to complete.

This moves the vhost_scsi_mutex use to it's documented use of being used
to access the tpg list. We then don't need to hold it while a flush is
being performed causing other device's vhost_scsi_set_endpoint
and vhost_scsi_make_tpg/vhost_scsi_drop_tpg calls to have to wait on a
flakey device.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-8-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Mike Christie [Tue, 21 Mar 2023 02:06:23 +0000 (21:06 -0500)]

vhost-scsi: Drop vhost_scsi_mutex use in port callouts

We are using the vhost_scsi_mutex to make sure vhost_scsi_port_link and
vhost_scsi_port_unlink see if vhost_scsi_clear_endpoint has cleared
tpg->vhost_scsi and it can't be freed while they are using.

However, we currently set the tpg->vhost_scsi pointer while holding
tv_tpg_mutex. So, we can just hold that while calling
vhost_scsi_hotplug/hotunplug. We then don't need to hold the
vhost_scsi_mutex while vhost_scsi_clear_endpoint is holding it and doing
a flush which could cause the LUN map/unmap to have to wait on another
device's flush.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-7-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Mike Christie [Tue, 21 Mar 2023 02:06:22 +0000 (21:06 -0500)]

vhost-scsi: Check for a cleared backend before queueing an event

We currenly hold the vhost_scsi_mutex while clearing the endpoint and
while performing vhost_scsi_do_plug, so tpg->vhost_scsi can't be freed
from uder us, and to make sure anything queued is handled by the
full call in vhost_scsi_clear_endpoint.

This patch removes the need for the vhost_scsi_mutex for the latter
case. In the next patches, we won't hold the vhost_scsi_mutex while
flushing so this patch adds a check for the clearing of the virtqueue
from vhost_scsi_clear_endpoint. We then know that once
vhost_scsi_clear_endpoint has cleared the backend that no new events
will be queued, and the flush after the vhost_vq_set_backend(vq, NULL)
call will see everything that's been queued to that point. So the flush
will then handle all events without the need for the vhost_scsi_mutex.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-6-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Mike Christie [Tue, 21 Mar 2023 02:06:21 +0000 (21:06 -0500)]

vhost-scsi: Drop device mutex use in vhost_scsi_do_plug

We don't need the device mutex in vhost_scsi_do_plug because:
1. we have the vhost_scsi_mutex so the tpg->vhost_scsi pointer will not
change on us and the vhost_scsi can't be freed from under us if it was
set.
2. vhost_scsi_clear_endpoint will stop the virtqueues and flush them while
holding the vhost_scsi_mutex so we know once vhost_scsi_clear_endpoint
has completed that vhost_scsi_do_plug can't send new events and any
queued ones have completed.

So this patch drops the device mutex use in vhost_scsi_do_plug.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-5-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Mike Christie [Tue, 21 Mar 2023 02:06:20 +0000 (21:06 -0500)]

vhost-scsi: Delay releasing our refcount on the tpg

We currently hold the vhost_scsi_mutex the entire time we are running
vhost_scsi_clear_endpoint. One of the reasons for this is that it prevents
userspace from being able to free the se_tpg from under us after we have
called target_undepend_item. However, it forces management operations for
for other devices to have to wait on a flakey device's:

vhost_scsi_clear_endpoint -> vhost_scsi_flush()

call which can which can take a long time.

This moves the target_undepend_item call and the tpg unsetup code to after
we have stopped new IO from starting up and after we have waited on
running IO. We can then release our refcount on the tpg and session
knowing our device is no longer accessing them. We can then drop the
vhost_scsi_mutex use during thee flush call in later patches in this set,
when we have removed other reasons for holding it.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-4-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Feng Liu [Fri, 10 Mar 2023 05:34:28 +0000 (07:34 +0200)]

virtio_ring: Use const to annotate read-only pointer params

Add const to make the read-only pointer parameters clear, similar to
many existing functions.

To implement this change, the commit also introduces the use of
`container_of_const` to implement `to_vvq`, which ensures the const-ness
of read-only parameters and avoids accidental modification of their
members.

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Message-Id: <20230310053428.3376-4-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Feng Liu [Fri, 10 Mar 2023 05:34:27 +0000 (07:34 +0200)]

virtio_ring: Avoid using inline for small functions

According to kernel coding style [1], defining inline functions is not
necessary and beneficial for simple functions. Hence clean up the code
by removing the inline keyword.

It is verified with GCC 12.2.0, the generated code with/without inline
is same. Additionally tested with pktgen and iperf, and verified the
result, the pps test results are the same in the cases of with/without
inline.

Iperf and pps of pktgen for virtio-net didn't change before and after
the change.

[1]
https://www.kernel.org/doc/html/v6.2-rc3/process/coding-style.html#the-inline-disease

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20230310053428.3376-3-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Rong Tao [Thu, 9 Mar 2023 08:42:50 +0000 (16:42 +0800)]

tools/virtio: virtio_test -h,--help should return directly

When we get help information, we should return directly, and we should not
execute test cases. Move the exit() directly into the help() function and
remove it from case '?'.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_822CEBEB925205EA1573541CD1C2604F4805@qq.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Rong Tao [Thu, 9 Mar 2023 06:13:20 +0000 (14:13 +0800)]

tools/virtio: virtio_test: Fix indentation

Replace eight spaces with Tab.

Signed-off-by: Rong Tao <rtoax@foxmail.com>
Message-Id: <tencent_89579C514BC4020324A1A4ACA44B5B95BB07@qq.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Christophe JAILLET [Sat, 18 Feb 2023 08:10:31 +0000 (09:10 +0100)]

virtio: Reorder fields in 'struct virtqueue'

Group some variables based on their sizes to reduce hole and avoid padding.
On x86_64, this shrinks the size of 'struct virtqueue'
from 72 to 68 bytes.

It saves a few bytes of memory.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Message-Id: <8f3d2e49270a2158717e15008e7ed7228196ba02.1676707807.git.christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Lafreniere <peter@n8pjl.ca>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

commit | commitdiff | tree

Jacob Keller [Mon, 27 Feb 2023 21:41:27 +0000 (13:41 -0800)]

vhost: use struct_size and size_add to compute flex array sizes

The vhost_get_avail_size and vhost_get_used_size functions compute the size
of structures with flexible array members with an additional 2 bytes if the
VIRTIO_RING_F_EVENT_IDX feature flag is set. Convert these functions to use
struct_size() and size_add() instead of coding the calculation by hand.

This ensures that the calculations will saturate at SIZE_MAX rather than
overflowing.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Message-Id: <20230227214127.3678392-1-jacob.e.keller@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Eli Cohen [Mon, 17 Apr 2023 11:03:43 +0000 (14:03 +0300)]

vdpa/mlx5: Avoid losing link state updates

Current code ignores link state updates if VIRTIO_NET_F_STATUS was not
negotiated. However, link state updates could be received before feature
negotiation was completed , therefore causing link state events to be
lost, possibly leaving the link state down.

Modify the code so link state notifier is registered after DRIVER_OK was
negotiated and carry the registration only if
VIRTIO_NET_F_STATUS was negotiated. Unregister the notifier when the
device is reset.

Fixes: 033779a708f0 ("vdpa/mlx5: make MTU/STATUS presence conditional on feature bits")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230417110343.138319-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Apr 2023 22:23:53 +0000 (15:23 -0700)]

Linux 6.3-rc7

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Apr 2023 17:33:43 +0000 (10:33 -0700)]

Merge tag 'sched_urgent_for_v6.3_rc7' of git://git./linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:

- Do not pull tasks to the local scheduling group if its average load
is higher than the average system load

* tag 'sched_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix imbalance overflow

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Apr 2023 17:28:29 +0000 (10:28 -0700)]

Merge tag 'x86_urgent_for_v6.3_rc7' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

- Drop __init annotation from two rtc functions which get called after
boot is done, in order to prevent a crash

* tag 'x86_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/rtc: Remove __init for runtime functions

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Apr 2023 16:55:18 +0000 (09:55 -0700)]

Merge tag 'powerpc-6.3-5' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:

- A fix for NUMA distance handling in the pseries SCM (pmem) driver.

Thanks to Aneesh Kumar K.V.

* tag 'powerpc-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Update the NUMA distance table for the target node

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Apr 2023 16:46:32 +0000 (09:46 -0700)]

Merge tag 'kbuild-fixes-v6.3-3' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Drop debug info from purgatory objects again

- Document that kernel.org provides prebuilt LLVM toolchains

- Give up handling untracked files for source package builds

- Avoid creating corrupted cpio when KBUILD_BUILD_TIMESTAMP is given
   with a pre-epoch data.

- Change panic_show_mem() to a macro to handle variable-length argument

- Compress tarballs on-the-fly again

* tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: do not create intermediate *.tar for tar packages
  kbuild: do not create intermediate *.tar for source tarballs
  kbuild: merge cmd_archive_linux and cmd_archive_perf
  init/initramfs: Fix argument forwarding to panic() in panic_show_mem()
  initramfs: Check negative timestamp to prevent broken cpio archive
  kbuild: give up untracked files for source package builds
  Documentation/llvm: Add a note about prebuilt kernel.org toolchains
  purgatory: fix disabling debug info

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Apr 2023 16:39:55 +0000 (09:39 -0700)]

Merge tag '6.3-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd

Pull ksmbd server fix from Steve French:
"smb311 server preauth integrity negotiate context parsing fix (check
for out of bounds access)"

* tag '6.3-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd:
ksmbd: avoid out of bounds access in decode_preauth_ctxt()

commit | commitdiff | tree

Masahiro Yamada [Fri, 7 Apr 2023 10:16:29 +0000 (19:16 +0900)]

kbuild: do not create intermediate *.tar for tar packages

Commit 05e96e96a315 ("kbuild: use git-archive for source package
creation") split the compression as a separate step to factor out
the common build rules.

With the previous commit, we got back to the situation where source
tarballs are compressed on-the-fly.
There is no reason to keep the separate compression rules.

Generate the comressed tar packages directly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>

commit | commitdiff | tree

Masahiro Yamada [Fri, 7 Apr 2023 10:16:28 +0000 (19:16 +0900)]

kbuild: do not create intermediate *.tar for source tarballs

Since commit 05e96e96a315 ("kbuild: use git-archive for source package
creation"), a source tarball is created in two steps; create *.tar file
then compress it. I split the compression as a separate rule because I
just thought 'git archive' supported only gzip.

For other compression algorithms, I could pipe the two commands:

$ git archive HEAD | xz > linux.tar.xz

I read git-archive(1) carefully, and I realized GIT had provided a
more elegant way:

$ git -c tar.tar.xz.command=xz archive -o linux.tar.xz HEAD

This commit uses 'tar.tar.*.command' configuration to specify the
compression backend so we can compress a source tarball on-the-fly.

GIT commit 767cf4579f0e ("archive: implement configurable tar filters")
is more than a decade old, so it should be available on almost all build
environments.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>

commit | commitdiff | tree

Masahiro Yamada [Fri, 7 Apr 2023 10:16:27 +0000 (19:16 +0900)]

kbuild: merge cmd_archive_linux and cmd_archive_perf

The two commands, cmd_archive_linux and cmd_archive_perf, are similar.
Merge them to make it easier to add more changes to the git-archive
command.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>

commit | commitdiff | tree

Benjamin Gray [Mon, 20 Mar 2023 23:05:34 +0000 (10:05 +1100)]

init/initramfs: Fix argument forwarding to panic() in panic_show_mem()

Forwarding variadic argument lists can't be done by passing a va_list
to a function with signature foo(...) (as panic() has). It ends up
interpreting the va_list itself as a single argument instead of
iterating it. printf() happily accepts it of course, leading to corrupt
output.

Convert panic_show_mem() to a macro to allow forwarding the arguments.
The function is trivial enough that it's easier than trying to introduce
a vpanic() variant.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

commit | commitdiff | tree

Benjamin Gray [Mon, 20 Mar 2023 04:08:38 +0000 (15:08 +1100)]

initramfs: Check negative timestamp to prevent broken cpio archive

Similar to commit 4c9d410f32b3 ("initramfs: Check timestamp to prevent
broken cpio archive"), except asserts that the timestamp is
non-negative. This can happen when the KBUILD_BUILD_TIMESTAMP is a value
before UNIX epoch, which may be set when making reproducible builds that
don't want to look like they use a valid date.

While support for dates before 1970 might not be supported, this is more
about preventing undetected CPIO corruption. The printf's use a minimum
length format specifier, and will happily make the field longer than 8
characters if they need to.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Sun, 16 Apr 2023 01:37:51 +0000 (18:37 -0700)]

Merge tag '6.3-rc6-smb311-client-negcontext-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fix from Steve French:
"Small client fix for better checking for smb311 negotiate context
overflows, also marked for stable"

* tag '6.3-rc6-smb311-client-negcontext-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix negotiate context parsing

commit | commitdiff | tree

Linus Torvalds [Sat, 15 Apr 2023 23:55:09 +0000 (16:55 -0700)]

Merge tag 'ubifs-for-linus-6.3-rc7' of git://git./linux/kernel/git/rw/ubifs

Pull UBI fixes from Richard Weinberger:

- Fix failure to attach when vid_hdr offset equals the (sub)page size

- Fix for a deadlock in UBI's worker thread

* tag 'ubifs-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
ubi: Fix deadlock caused by recursively holding work_sem

commit | commitdiff | tree

David Disseldorp [Thu, 6 Apr 2023 22:34:11 +0000 (00:34 +0200)]

cifs: fix negotiate context parsing

smb311_decode_neg_context() doesn't properly check against SMB packet
boundaries prior to accessing individual negotiate context entries. This
is due to the length check omitting the eight byte smb2_neg_context
header, as well as incorrect decrementing of len_of_ctxts.

Fixes: 5100d8a3fe03 ("SMB311: Improve checking of negotiate security contexts")
Reported-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>

commit | commitdiff | tree

Linus Torvalds [Sat, 15 Apr 2023 18:06:49 +0000 (11:06 -0700)]

Merge tag 'i2c-for-6.3-rc7' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Just two driver fixes"

* tag 'i2c-for-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ocores: generate stop condition after timeout in polling mode
i2c: mchp-pci1xxxx: Update Timing registers

commit | commitdiff | tree

Linus Torvalds [Sat, 15 Apr 2023 17:49:47 +0000 (10:49 -0700)]

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
"One small fix to SCSI Enclosure Services to fix a regression caused by
another recent fix"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ses: Handle enclosure with just a primary component gracefully

commit | commitdiff | tree

Linus Torvalds [Sat, 15 Apr 2023 17:39:18 +0000 (10:39 -0700)]

Merge tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linux

Pull block fix from Jens Axboe:
"A single NVMe quirk entry addition"

* tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linux:
nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD

commit | commitdiff | tree

Linus Torvalds [Sat, 15 Apr 2023 17:29:53 +0000 (10:29 -0700)]

Merge tag 'io_uring-6.3-2023-04-14' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
"Just a small tweak to when task_work needs redirection, marked for
stable as well"

* tag 'io_uring-6.3-2023-04-14' of git://git.kernel.dk/linux:
io_uring: complete request via task work in case of DEFER_TASKRUN

commit | commitdiff | tree

Linus Torvalds [Fri, 14 Apr 2023 17:44:48 +0000 (10:44 -0700)]

Merge tag 'riscv-for-linus-6.3-rc7' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- A fix for a missing fence when generating the NOMMU sigreturn
   trampoline

- A set of fixes for early DTB handling of reserved memory nodes

* tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: No need to relocate the dtb as it lies in the fixmap region
  riscv: Do not set initial_boot_params to the linear address of the dtb
  riscv: Move early dtb mapping into the fixmap region
  riscv: add icache flush for nommu sigreturn trampoline

commit | commitdiff | tree

Linus Torvalds [Fri, 14 Apr 2023 17:37:07 +0000 (10:37 -0700)]

Merge tag 'acpi-6.3-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These add two ACPI-related quirks:

   - Add a quirk to force StorageD3Enable on AMD Picasso systems (Mario
     Limonciello)

   - Add an ACPI IRQ override quirk for ASUS ExpertBook B1502CBA (Paul
     Menzel)"

* tag 'acpi-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CBA
  ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable

commit | commitdiff | tree

Linus Torvalds [Fri, 14 Apr 2023 17:25:30 +0000 (10:25 -0700)]

Merge tag 'pm-6.3-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Make the amd-pstate cpufreq driver take all of the possible
  combinations of the 'old' and 'new' status values correctly while
  changing the operation mode via sysfs (Wyes Karny)"

* tag 'pm-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  amd-pstate: Fix amd_pstate mode switch

commit | commitdiff | tree

Linus Torvalds [Fri, 14 Apr 2023 17:19:18 +0000 (10:19 -0700)]

Merge tag 'thermal-6.3-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull thermal control fix from Rafael Wysocki:
"Modify the Intel thermal throttling code to avoid updating unsupported
  status clearing mask bits which causes the kernel to complain about
  unchecked MSR access (Srinivas Pandruvada)"

* tag 'thermal-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: Avoid updating unsupported THERM_STATUS_CLEAR mask bits

commit | commitdiff | tree

Linus Torvalds [Fri, 14 Apr 2023 17:13:54 +0000 (10:13 -0700)]

Merge tag 'sound-6.3-rc7' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes.

  At this time, quite a few fixes for the old PCI drivers are found.
  Although they are not regression fixes, I took these as they are
  materials for stable kernels.

  In addition, a couple of regression fixes and another couple of
  HD-audio quirks are included"

* tag 'sound-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/hdmi: disable KAE for Intel DG2
  ALSA: hda/realtek: Add quirks for Lenovo Z13/Z16 Gen2
  ALSA: hda: patch_realtek: add quirk for Asus N7601ZM
  ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
  ALSA: emu10k1: don't create old pass-through playback device on Audigy
  ALSA: emu10k1: fix capture interrupt handler unlinking
  ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
  ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
  ALSA: i2c/cs8427: fix iec958 mixer control deactivation

commit | commitdiff | tree

Linus Torvalds [Fri, 14 Apr 2023 17:06:50 +0000 (10:06 -0700)]

Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"We had a fairly slow cycle on the rc side this time, here are the
  accumulated fixes, mostly in drivers:

   - irdma should not generate extra completions during flushing

   - Fix several memory leaks

   - Do not get confused in irdma's iwarp mode if IPv6 is present

   - Correct a link speed calculation in mlx5

   - Increase the EQ/WQ limits on erdma as they are too small for big
     applications

   - Use the right math for erdma's inline mtt feature

   - Make erdma probing more robust to boot time ordering differences

   - Fix a KMSAN crash in CMA due to uninitialized qkey"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/core: Fix GID entry ref leak when create_ah fails
  RDMA/cma: Allow UD qp_type to join multicast only
  RDMA/erdma: Defer probing if netdevice can not be found
  RDMA/erdma: Inline mtt entries into WQE if supported
  RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192
  RDMA/erdma: Fix some typos
  IB/mlx5: Add support for 400G_8X lane speed
  RDMA/irdma: Add ipv4 check to irdma_find_listener()
  RDMA/irdma: Increase iWARP CM default rexmit count
  RDMA/irdma: Fix memory leak of PBLE objects
  RDMA/irdma: Do not generate SW completions for NOPs

commit | commitdiff | tree

Rafael J. Wysocki [Fri, 14 Apr 2023 13:15:32 +0000 (15:15 +0200)]

Merge branch 'acpi-x86'

Merge a quirk to force StorageD3Enable on AMD Picasso systems (Mario
Limonciello).

* acpi-x86:
ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable

commit | commitdiff | tree

Ming Lei [Fri, 14 Apr 2023 07:53:13 +0000 (15:53 +0800)]

io_uring: complete request via task work in case of DEFER_TASKRUN

So far io_req_complete_post() only covers DEFER_TASKRUN by completing
request via task work when the request is completed from IOWQ.

However, uring command could be completed from any context, and if io
uring is setup with DEFER_TASKRUN, the command is required to be
completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
can't be wakeup, and may hang forever.

The issue can be observed on removing ublk device, but turns out it is
one generic issue for uring command & DEFER_TASKRUN, so solve it in
io_uring core code.

Fixes: e6aeb2721d3b ("io_uring: complete all requests in task context")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@kernel.dk/
Reported-by: Jens Axboe <axboe@kernel.dk>
Cc: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Jens Axboe [Fri, 14 Apr 2023 12:29:00 +0000 (06:29 -0600)]

Merge branch 'nvme-6.3' of git://git.infradead.org/nvme into block-6.3

Pull NVMe fix from Christoph.

* 'nvme-6.3' of git://git.infradead.org/nvme:
nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD

commit | commitdiff | tree

Kai Vehmanen [Thu, 13 Apr 2023 19:11:53 +0000 (22:11 +0300)]

ALSA: hda/hdmi: disable KAE for Intel DG2

Use of keep-alive (KAE) has resulted in loss of audio on some A750/770
cards as the transition from keep-alive to stream playback is not
working as expected. As there is limited benefit of the new KAE mode
on discrete cards, revert back to older silent-stream implementation
on these systems.

Cc: stable@vger.kernel.org
Fixes: 15175a4f2bbb ("ALSA: hda/hdmi: add keep-alive support for ADL-P and DG2")
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8307
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/20230413191153.3692049-1-kai.vehmanen@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Duy Truong [Fri, 14 Apr 2023 00:55:48 +0000 (17:55 -0700)]

nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD

Added a quirk to fix the TeamGroup T-Force Cardea Zero Z330 SSDs reporting
duplicate NGUIDs.

Signed-off-by: Duy Truong <dory@dory.moe>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>

commit | commitdiff | tree

Alexandre Ghiti [Wed, 29 Mar 2023 08:19:32 +0000 (10:19 +0200)]

riscv: No need to relocate the dtb as it lies in the fixmap region

We used to access the dtb via its linear mapping address but now that the
dtb early mapping was moved in the fixmap region, we can keep using this
address since it is present in swapper_pg_dir, and remove the dtb
relocation.

Note that the relocation was wrong anyway since early_memremap() is
restricted to 256K whereas the maximum fdt size is 2MB.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230329081932.79831-4-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

commit | commitdiff | tree

Alexandre Ghiti [Wed, 29 Mar 2023 08:19:31 +0000 (10:19 +0200)]

riscv: Do not set initial_boot_params to the linear address of the dtb

early_init_dt_verify() is already called in parse_dtb() and since the dtb
address does not change anymore (it is now in the fixmap region), no need
to reset initial_boot_params by calling early_init_dt_verify() again.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329081932.79831-3-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

commit | commitdiff | tree

Alexandre Ghiti [Wed, 29 Mar 2023 08:19:30 +0000 (10:19 +0200)]

riscv: Move early dtb mapping into the fixmap region

riscv establishes 2 virtual mappings:

- early_pg_dir maps the kernel which allows to discover the system
memory
- swapper_pg_dir installs the final mapping (linear mapping included)

We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
mapping was not carried over in swapper_pg_dir. It happens that
early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is
setup otherwise we could allocate reserved memory defined in the dtb.
And this function initializes reserved_mem variable with addresses that
lie in the early_pg_dir dtb mapping: when those addresses are reused
with swapper_pg_dir, this mapping does not exist and then we trap.

The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem()
must be called before swapper_pg_dir is set up otherwise we could
allocate in reserved memory defined in the dtb.

So move the dtb mapping in the fixmap region which is established in
early_pg_dir and handed over to swapper_pg_dir.

Fixes: 922b0375fc93 ("riscv: Fix memblock reservation for device tree blob")
Fixes: 8f3a2b4a96dc ("RISC-V: Move DT mapping outof fixmap")
Fixes: 50e63dd8ed92 ("riscv: fix reserved memory setup")
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/all/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230329081932.79831-2-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Apr 2023 23:28:33 +0000 (16:28 -0700)]

Merge tag 'cgroup-for-6.3-rc6-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"This is a relatively big pull request this late in the cycle but the
  major contributor is the cpuset bug which is rather significant:

   - Fix several cpuset bugs including one where it wasn't applying the
     target cgroup when tasks are created with CLONE_INTO_CGROUP

  With a few smaller fixes:

   - Fix inversed locking order in cgroup1 freezer implementation

   - Fix garbage cpu.stat::core_sched.forceidle_usec reporting in the
     root cgroup"

* tag 'cgroup-for-6.3-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Make cpuset_attach_task() skip subpartitions CPUs for top_cpuset
  cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods
  cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
  cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
  cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
  cgroup/cpuset: Fix partition root's cpuset.cpus update bug
  cgroup: fix display of forceidle time at root

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Apr 2023 23:16:33 +0000 (16:16 -0700)]

Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"A few more clk driver fixes:

   - Set the max_register member of the spreadtrum regmap so that reads
     don't go off the end of the I/O space

   - Avoid a clk parent error in the i.MX imx6ul driver when the
     selector is unknown

   - Fix an oops due to REGCACHE_NONE usage by the Renesas 9-series
     driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: rs9: Fix suspend/resume
  clk: imx6ul: fix "failed to get parent" error
  clk: sprd: set max_register according to mapping range

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Apr 2023 22:33:04 +0000 (15:33 -0700)]

Merge tag 'net-6.3-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, and bluetooth.

  Not all that quiet given spring celebrations, but "current" fixes are
  thinning out, which is encouraging. One outstanding regression in the
  mlx5 driver when using old FW, not blocking but we're pushing for a
  fix.

  Current release - new code bugs:

   - eth: enetc: workaround for unresponsive pMAC after receiving
     express traffic

  Previous releases - regressions:

   - rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the
     pid/seq fields 0 for backward compatibility

  Previous releases - always broken:

   - sctp: fix a potential overflow in sctp_ifwdtsn_skip

   - mptcp:
      - use mptcp_schedule_work instead of open-coding it and make the
        worker check stricter, to avoid scheduling work on closed
        sockets
      - fix NULL pointer dereference on fastopen early fallback

   - skbuff: fix memory corruption due to a race between skb coalescing
     and releasing clones confusing page_pool reference counting

   - bonding: fix neighbor solicitation validation on backup slaves

   - bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp

   - bpf: arm64: fixed a BTI error on returning to patched function

   - openvswitch: fix race on port output leading to inf loop

   - sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
     returning a different errno than expected

   - phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove

   - Bluetooth: fix printing errors if LE Connection times out

   - Bluetooth: assorted UaF, deadlock and data race fixes

   - eth: macb: fix memory corruption in extended buffer descriptor mode

  Misc:

   - adjust the XDP Rx flow hash API to also include the protocol layers
     over which the hash was computed"

* tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
  mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
  veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
  mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
  xdp: rss hash types representation
  selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
  skbuff: Fix a race between coalescing and releasing SKBs
  net: macb: fix a memory corruption in extended buffer descriptor mode
  selftests: add the missing CONFIG_IP_SCTP in net config
  udp6: fix potential access to stale information
  selftests: openvswitch: adjust datapath NL message declaration
  selftests: mptcp: userspace pm: uniform verify events
  mptcp: fix NULL pointer dereference on fastopen early fallback
  mptcp: stricter state check in mptcp_worker
  mptcp: use mptcp_schedule_work instead of open-coding it
  net: enetc: workaround for unresponsive pMAC after receiving express traffic
  sctp: fix a potential overflow in sctp_ifwdtsn_skip
  net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
  rtnetlink: Restore RTM_NEW/DELLINK notification behavior
  net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes
  ...

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Apr 2023 22:21:56 +0000 (15:21 -0700)]

Merge tag 'devicetree-fixes-for-6.2-3' of git://git./linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Fix interaction between fw_devlink and DT overlays causing devices to
   not be probed

- Fix the compatible string for loongson,cpu-interrupt-controller

* tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  treewide: Fix probing of devices in DT overlays
  dt-bindings: interrupt-controller: loongarch: Fix mismatched compatible

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Apr 2023 22:17:59 +0000 (15:17 -0700)]

Merge tag 'pinctrl-v6.3-3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fix from Linus Walleij:
"This is just a revert of the AMD fix, because the fix broke some
laptops. We are working on a proper solution"

* tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
Revert "pinctrl: amd: Disable and mask interrupts on resume"

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Apr 2023 21:58:55 +0000 (14:58 -0700)]

Merge tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:

- two fbcon regressions

- amdgpu: dp mst, smu13

- i915: dual link dsi for tgl+

- armada, nouveau, drm/sched, fbmem

* tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm:
  fbcon: set_con2fb_map needs to set con2fb_map!
  fbcon: Fix error paths in set_con2fb_map
  drm/amd/pm: correct the pcie link state check for SMU13
  drm/amd/pm: correct SMU13.0.7 max shader clock reporting
  drm/amd/pm: correct SMU13.0.7 pstate profiling clock settings
  drm/amd/display: Pass the right info to drm_dp_remove_payload
  drm/armada: Fix a potential double free in an error handling path
  fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
  drm/nouveau/fb: add missing sysmen flush callbacks
  drm/i915/dsi: fix DSS CTL register offsets for TGL+
  drm/scheduler: Fix UAF race in drm_sched_entity_push_job()

commit | commitdiff | tree

Jakub Kicinski [Thu, 13 Apr 2023 20:04:44 +0000 (13:04 -0700)]

Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-04-13

We've added 6 non-merge commits during the last 1 day(s) which contain
a total of 14 files changed, 205 insertions(+), 38 deletions(-).

The main changes are:

1) One late straggler fix on the XDP hints side which fixes
   bpf_xdp_metadata_rx_hash kfunc API before the release goes out
   in order to provide information on the RSS hash type,
   from Jesper Dangaard Brouer.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
  mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
  veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
  mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
  xdp: rss hash types representation
  selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
====================

Link: https://lore.kernel.org/r/20230413192939.10202-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

David Disseldorp [Thu, 13 Apr 2023 14:49:57 +0000 (23:49 +0900)]

ksmbd: avoid out of bounds access in decode_preauth_ctxt()

Confirm that the accessed pneg_ctxt->HashAlgorithms address sits within
the SMB request boundary; deassemble_neg_contexts() only checks that the
eight byte smb2_neg_context header + (client controlled) DataLength are
within the packet boundary, which is insufficient.

Checking for sizeof(struct smb2_preauth_neg_context) is overkill given
that the type currently assumes SMB311_SALT_SIZE bytes of trailing Salt.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

commit | commitdiff | tree

Daniel Vetter [Thu, 13 Apr 2023 18:47:58 +0000 (20:47 +0200)]

Merge tag 'drm-misc-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

* armada: Fix double free
* fb: Clear FB_ACTIVATE_KD_TEXT in ioctl
* nouveau: Add missing callbacks
* scheduler: Fix use-after-free error

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230413184233.GA8148@linux-uq9g

commit | commitdiff | tree

Alexei Starovoitov [Thu, 13 Apr 2023 18:05:49 +0000 (11:05 -0700)]

Merge branch 'XDP-hints: change RX-hash kfunc bpf_xdp_metadata_rx_hash'

Jesper Dangaard Brouer says:

====================

Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value,
but doesn't provide information on the RSS hash type (part of 6.3-rc).

This patchset proposal is to change the function call signature via adding
a pointer value argument for providing the RSS hash type.

Patchset also removes all bpf_printk's from xdp_hw_metadata program
that we expect driver developers to use. Instead counters are introduced
for relaying e.g. skip and fail info.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Jesper Dangaard Brouer [Wed, 12 Apr 2023 19:49:00 +0000 (21:49 +0200)]

selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg

Update BPF selftests to use the new RSS type argument for kfunc
bpf_xdp_metadata_rx_hash.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132894068.340624.8914711185697163690.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Jesper Dangaard Brouer [Wed, 12 Apr 2023 19:48:55 +0000 (21:48 +0200)]

mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type

Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via matching individual Completion Queue Entry (CQE) status bits.

Fixes: ab46182d0dcb ("net/mlx4_en: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132893562.340624.12779118462402031248.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Jesper Dangaard Brouer [Wed, 12 Apr 2023 19:48:50 +0000 (21:48 +0200)]

veth: bpf_xdp_metadata_rx_hash add xdp rss hash type

Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type.

The veth driver currently only support XDP-hints based on SKB code path.
The SKB have lost information about the RSS hash type, by compressing
the information down to a single bitfield skb->l4_hash, that only knows
if this was a L4 hash value.

In preparation for veth, the xdp_rss_hash_type have an L4 indication
bit that allow us to return a meaningful L4 indication when working
with SKB based packets.

Fixes: 306531f0249f ("veth: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132893055.340624.16209448340644513469.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Jesper Dangaard Brouer [Wed, 12 Apr 2023 19:48:45 +0000 (21:48 +0200)]

mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type

Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via mapping table.

The mlx5 hardware can also identify and RSS hash IPSEC. This indicate
hash includes SPI (Security Parameters Index) as part of IPSEC hash.

Extend xdp core enum xdp_rss_hash_type with IPSEC hash type.

Fixes: bc8d405b1ba9 ("net/mlx5e: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892548.340624.11185734579430124869.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Jesper Dangaard Brouer [Wed, 12 Apr 2023 19:48:40 +0000 (21:48 +0200)]

xdp: rss hash types representation

The RSS hash type specifies what portion of packet data NIC hardware used
when calculating RSS hash value. The RSS types are focused on Internet
traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
primarily TCP vs UDP, but some hardware supports SCTP.

Hardware RSS types are differently encoded for each hardware NIC. Most
hardware represent RSS hash type as a number. Determining L3 vs L4 often
requires a mapping table as there often isn't a pattern or sorting
according to ISO layer.

The patch introduce a XDP RSS hash type (enum xdp_rss_hash_type) that
contains both BITs for the L3/L4 types, and combinations to be used by
drivers for their mapping tables. The enum xdp_rss_type_bits get exposed
to BPF via BTF, and it is up to the BPF-programmer to match using these
defines.

This proposal change the kfunc API bpf_xdp_metadata_rx_hash() adding
a pointer value argument for provide the RSS hash type.
Change signature for all xmo_rx_hash calls in drivers to make it compile.

The RSS type implementations for each driver comes as separate patches.

Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Jesper Dangaard Brouer [Wed, 12 Apr 2023 19:48:35 +0000 (21:48 +0200)]

selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters

The tool xdp_hw_metadata can be used by driver developers
implementing XDP-hints metadata kfuncs.

Remove all bpf_printk calls, as the tool already transfers all the
XDP-hints related information via metadata area to AF_XDP
userspace process.

Add counters for providing remaining information about failure and
skipped packet events.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Vetter [Wed, 12 Apr 2023 15:31:46 +0000 (17:31 +0200)]

fbcon: set_con2fb_map needs to set con2fb_map!

I got really badly confused in d443d9386472 ("fbcon: move more common
code into fb_open()") because we set the con2fb_map before the failure
points, which didn't look good.

But in trying to fix that I moved the assignment into the wrong path -
we need to do it for _all_ vc we take over, not just the first one
(which additionally requires the call to con2fb_acquire_newinfo).

I've figured this out because of a KASAN bug report, where the
fbcon_registered_fb and fbcon_display arrays went out of sync in
fbcon_mode_deleted() because the con2fb_map pointed at the old
fb_info, but the modes and everything was updated for the new one.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Fixes: d443d9386472 ("fbcon: move more common code into fb_open()")
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Xingyuan Mo <hdthky0@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.19+

commit | commitdiff | tree

Daniel Vetter [Wed, 12 Apr 2023 15:23:49 +0000 (17:23 +0200)]

fbcon: Fix error paths in set_con2fb_map

This is a regressoin introduced in b07db3958485 ("fbcon: Ditch error
handling for con2fb_release_oldinfo"). I failed to realize what the if
(!err) checks. The mentioned commit was dropping the
con2fb_release_oldinfo() return value but the if (!err) was also
checking whether the con2fb_acquire_newinfo() function call above
failed or not.

Fix this with an early return statement.

Note that there's still a difference compared to the orginal state of
the code, the below lines are now also skipped on error:

if (!search_fb_in_map(info_idx))
info_idx = newidx;

These are only needed when we've actually thrown out an old fb_info
from the console mappings, which only happens later on.

Also move the fbcon_add_cursor_work() call into the same if block,
it's all protected by console_lock so doesn't matter when we set up
the blinking cursor delayed work anyway. This further simplifies the
control flow and allows us to ditch the found local variable.

v2: Clarify commit message (Javier)

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Fixes: b07db3958485 ("fbcon: Ditch error handling for con2fb_release_oldinfo")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Xingyuan Mo <hdthky0@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.19+

commit | commitdiff | tree

Liang Chen [Thu, 13 Apr 2023 09:03:53 +0000 (17:03 +0800)]

skbuff: Fix a race between coalescing and releasing SKBs

Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
recycling") allowed coalescing to proceed with non page pool page and page
pool page when @from is cloned, i.e.

to->pp_recycle    --> false
from->pp_recycle  --> true
skb_cloned(from)  --> true

However, it actually requires skb_cloned(@from) to hold true until
coalescing finishes in this situation. If the other cloned SKB is
released while the merging is in process, from_shinfo->nr_frags will be
set to 0 toward the end of the function, causing the increment of frag
page _refcount to be unexpectedly skipped resulting in inconsistent
reference counts. Later when SKB(@to) is released, it frees the page
directly even though the page pool page is still in use, leading to
use-after-free or double-free errors. So it should be prohibited.

The double-free error message below prompted us to investigate:
BUG: Bad page state in process swapper/1  pfn:0e0d1
page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
index:0x2 pfn:0xe0d1
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount

CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
Call Trace:
<IRQ>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_pcp_prepare+0x260/0x2f0
free_unref_page+0x20/0x1c0
skb_release_data+0x10b/0x1a0
napi_consume_skb+0x56/0x150
net_rx_action+0xf0/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xc8/0x2b1
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x82/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0xb/0x20

Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230413090353.14448-1-liangchen.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Roman Gushchin [Wed, 12 Apr 2023 23:21:44 +0000 (16:21 -0700)]

net: macb: fix a memory corruption in extended buffer descriptor mode

For quite some time we were chasing a bug which looked like a sudden
permanent failure of networking and mmc on some of our devices.
The bug was very sensitive to any software changes and even more to
any kernel debug options.

Finally we got a setup where the problem was reproducible with
CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma:

[   16.992082] ------------[ cut here ]------------
[   16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes]
[   17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900
[   17.018977] Modules linked in: xxxxx
[   17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28
[   17.045345] Hardware name: xxxxx
[   17.049528] pstate: 60000005 (nZCv daif -PAN -UAO)
[   17.054322] pc : check_unmap+0x6a0/0x900
[   17.058243] lr : check_unmap+0x6a0/0x900
[   17.062163] sp : ffffffc010003c40
[   17.065470] x29: ffffffc010003c40 x28: 000000004000c03c
[   17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800
[   17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8
[   17.081407] x23: 0000000000000000 x22: ffffffc010a08750
[   17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000
[   17.092032] x19: 0000000875e3e244 x18: 0000000000000010
[   17.097343] x17: 0000000000000000 x16: 0000000000000000
[   17.102647] x15: ffffff8879e4a988 x14: 0720072007200720
[   17.107959] x13: 0720072007200720 x12: 0720072007200720
[   17.113261] x11: 0720072007200720 x10: 0720072007200720
[   17.118565] x9 : 0720072007200720 x8 : 000000000000022d
[   17.123869] x7 : 0000000000000015 x6 : 0000000000000098
[   17.129173] x5 : 0000000000000000 x4 : 0000000000000000
[   17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370
[   17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000
[   17.145082] Call trace:
[   17.147524]  check_unmap+0x6a0/0x900
[   17.151091]  debug_dma_unmap_page+0x88/0x90
[   17.155266]  gem_rx+0x114/0x2f0
[   17.158396]  macb_poll+0x58/0x100
[   17.161705]  net_rx_action+0x118/0x400
[   17.165445]  __do_softirq+0x138/0x36c
[   17.169100]  irq_exit+0x98/0xc0
[   17.172234]  __handle_domain_irq+0x64/0xc0
[   17.176320]  gic_handle_irq+0x5c/0xc0
[   17.179974]  el1_irq+0xb8/0x140
[   17.183109]  xiic_process+0x5c/0xe30
[   17.186677]  irq_thread_fn+0x28/0x90
[   17.190244]  irq_thread+0x208/0x2a0
[   17.193724]  kthread+0x130/0x140
[   17.196945]  ret_from_fork+0x10/0x20
[   17.200510] ---[ end trace 7240980785f81d6f ]---

[  237.021490] ------------[ cut here ]------------
[  237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b
[  237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240
[  237.041802] Modules linked in: xxxxx
[  237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.4.0 #28
[  237.068941] Hardware name: xxxxx
[  237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO)
[  237.077900] pc : add_dma_entry+0x214/0x240
[  237.081986] lr : add_dma_entry+0x214/0x240
[  237.086072] sp : ffffffc010003c30
[  237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00
[  237.094683] x27: 0000000000000180 x26: ffffff8878e387c0
[  237.099987] x25: 0000000000000002 x24: 0000000000000000
[  237.105290] x23: 000000000000003b x22: ffffffc010a0fa00
[  237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600
[  237.115897] x19: 00000000ffffffef x18: 0000000000000010
[  237.121201] x17: 0000000000000000 x16: 0000000000000000
[  237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720
[  237.131807] x13: 0720072007200720 x12: 0720072007200720
[  237.137111] x11: 0720072007200720 x10: 0720072007200720
[  237.142415] x9 : 0720072007200720 x8 : 0000000000000259
[  237.147718] x7 : 0000000000000001 x6 : 0000000000000000
[  237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001
[  237.158325] x3 : 0000000000000006 x2 : 0000000000000007
[  237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000
[  237.168932] Call trace:
[  237.171373]  add_dma_entry+0x214/0x240
[  237.175115]  debug_dma_map_page+0xf8/0x120
[  237.179203]  gem_rx_refill+0x190/0x280
[  237.182942]  gem_rx+0x224/0x2f0
[  237.186075]  macb_poll+0x58/0x100
[  237.189384]  net_rx_action+0x118/0x400
[  237.193125]  __do_softirq+0x138/0x36c
[  237.196780]  irq_exit+0x98/0xc0
[  237.199914]  __handle_domain_irq+0x64/0xc0
[  237.204000]  gic_handle_irq+0x5c/0xc0
[  237.207654]  el1_irq+0xb8/0x140
[  237.210789]  arch_cpu_idle+0x40/0x200
[  237.214444]  default_idle_call+0x18/0x30
[  237.218359]  do_idle+0x200/0x280
[  237.221578]  cpu_startup_entry+0x20/0x30
[  237.225493]  rest_init+0xe4/0xf0
[  237.228713]  arch_call_rest_init+0xc/0x14
[  237.232714]  start_kernel+0x47c/0x4a8
[  237.236367] ---[ end trace 7240980785f81d70 ]---

Lars was fast to find an explanation: according to the datasheet
bit 2 of the rx buffer descriptor entry has a different meaning in the
extended mode:
  Address [2] of beginning of buffer, or
  in extended buffer descriptor mode (DMA configuration register [28] = 1),
  indicates a valid timestamp in the buffer descriptor entry.

The macb driver didn't mask this bit while getting an address and it
eventually caused a memory corruption and a dma failure.

The problem is resolved by explicitly clearing the problematic bit
if hw timestamping is used.

Fixes: 7b4296148066 ("net: macb: Add support for PTP timestamps in DMA descriptors")
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Co-developed-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230412232144.770336-1-roman.gushchin@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Xin Long [Wed, 12 Apr 2023 15:13:06 +0000 (11:13 -0400)]

selftests: add the missing CONFIG_IP_SCTP in net config

The selftest sctp_vrf needs CONFIG_IP_SCTP set in config
when building the kernel, so add it.

Fixes: a61bd7b9fef3 ("selftests: add a selftest for sctp vrf")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/61dddebc4d2dd98fe7fb145e24d4b2430e42b572.1681312386.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Eric Dumazet [Wed, 12 Apr 2023 13:03:08 +0000 (13:03 +0000)]

udp6: fix potential access to stale information

lena wang reported an issue caused by udpv6_sendmsg()
mangling msg->msg_name and msg->msg_namelen, which
are later read from ____sys_sendmsg() :

/*
* If this is sendmmsg() and sending to current destination address was
* successful, remember it.
*/
if (used_address && err >= 0) {
used_address->name_len = msg_sys->msg_namelen;
if (msg_sys->msg_name)
memcpy(&used_address->name, msg_sys->msg_name,
used_address->name_len);
}

udpv6_sendmsg() wants to pretend the remote address family
is AF_INET in order to call udp_sendmsg().

A fix would be to modify the address in-place, instead
of using a local variable, but this could have other side effects.

Instead, restore initial values before we return from udpv6_sendmsg().

Fixes: c71d8ebe7a44 ("net: Fix security_socket_sendmsg() bypass problem.")
Reported-by: lena wang <lena.wang@mediatek.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20230412130308.1202254-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Aaron Conole [Wed, 12 Apr 2023 11:58:28 +0000 (07:58 -0400)]

selftests: openvswitch: adjust datapath NL message declaration

The netlink message for creating a new datapath takes an array
of ports for the PID creation. This shouldn't cause much issue
but correct it for future cases where we need to do decode of
datapath information that could include the per-cpu PID map.

Fixes: 25f16c873fb1 ("selftests: add openvswitch selftest suite")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230412115828.3991806-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Jakub Kicinski [Thu, 13 Apr 2023 16:59:00 +0000 (09:59 -0700)]

Merge branch 'mptcp-more-fixes-for-6-3'

Matthieu Baerts says:

====================
mptcp: more fixes for 6.3

Patch 1 avoids scheduling the MPTCP worker on a closed socket on some
edge cases. It fixes issues that can be visible from v5.11.

Patch 2 makes sure the MPTCP worker doesn't try to manipulate
disconnected sockets. This is also a fix for an issue that can be
visible from v5.11.

Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
and an early fallback is done. A fix for v6.2.

Patch 4 improves the stability of the userspace PM selftest for a
subtest added in v6.2.
====================

Link: https://lore.kernel.org/r/20230411-upstream-net-20230411-mptcp-fixes-v1-0-ca540f3ef986@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Matthieu Baerts [Tue, 11 Apr 2023 20:42:12 +0000 (22:42 +0200)]

selftests: mptcp: userspace pm: uniform verify events

Simply adding a "sleep" before checking something is usually not a good
idea because the time that has been picked can not be enough or too
much. The best is to wait for events with a timeout.

In this selftest, 'sleep 0.5' is used more than 40 times. It is always
used before calling a 'verify_*' function except for this
verify_listener_events which has been added later.

At the end, using all these 'sleep 0.5' seems to work: the slow CIs
don't complain so far. Also because it doesn't take too much time, we
can just add two more 'sleep 0.5' to uniform what is done before calling
a 'verify_*' function. For the same reasons, we can also delay a bigger
refactoring to replace all these 'sleep 0.5' by functions waiting for
events instead of waiting for a fix time and hope for the best.

Fixes: 6c73008aa301 ("selftests: mptcp: listener test for userspace PM")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Paolo Abeni [Tue, 11 Apr 2023 20:42:11 +0000 (22:42 +0200)]

mptcp: fix NULL pointer dereference on fastopen early fallback

In case of early fallback to TCP, subflow_syn_recv_sock() deletes
the subflow context before returning the newly allocated sock to
the caller.

The fastopen path does not cope with the above unconditionally
dereferencing the subflow context.

Fixes: 36b122baf6a8 ("mptcp: add subflow_v(4,6)_send_synack()")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Paolo Abeni [Tue, 11 Apr 2023 20:42:10 +0000 (22:42 +0200)]

mptcp: stricter state check in mptcp_worker

As reported by Christoph, the mptcp protocol can run the
worker when the relevant msk socket is in an unexpected state:

connect()
// incoming reset + fastclose
// the mptcp worker is scheduled
mptcp_disconnect()
// msk is now CLOSED
listen()
mptcp_worker()

Leading to the following splat:

divide error: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21 Comm: kworker/1:0 Not tainted 6.3.0-rc1-gde5e8fd0123c #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:__tcp_select_window+0x22c/0x4b0 net/ipv4/tcp_output.c:3018
RSP: 0018:ffffc900000b3c98 EFLAGS: 00010293
RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8214ce97 RDI: 0000000000000004
RBP: 000000000000ffd7 R08: 0000000000000004 R09: 0000000000010000
R10: 000000000000ffd7 R11: ffff888005afa148 R12: 000000000000ffd7
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000405270 CR3: 000000003011e006 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_select_window net/ipv4/tcp_output.c:262 [inline]
__tcp_transmit_skb+0x356/0x1280 net/ipv4/tcp_output.c:1345
tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
tcp_send_active_reset+0x13e/0x320 net/ipv4/tcp_output.c:3459
mptcp_check_fastclose net/mptcp/protocol.c:2530 [inline]
mptcp_worker+0x6c7/0x800 net/mptcp/protocol.c:2705
process_one_work+0x3bd/0x950 kernel/workqueue.c:2390
worker_thread+0x5b/0x610 kernel/workqueue.c:2537
kthread+0x138/0x170 kernel/kthread.c:376
ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
</TASK>

This change addresses the issue explicitly checking for bad states
before running the mptcp worker.

Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/374
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Paolo Abeni [Tue, 11 Apr 2023 20:42:09 +0000 (22:42 +0200)]

mptcp: use mptcp_schedule_work instead of open-coding it

Beyond reducing code duplication this also avoids scheduling
the mptcp_worker on a closed socket on some edge scenarios.

The addressed issue is actually older than the blamed commit
below, but this fix needs it as a pre-requisite.

Fixes: ba8f48f7a4d7 ("mptcp: introduce mptcp_schedule_work")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Gregor Herburger [Thu, 13 Apr 2023 09:37:37 +0000 (11:37 +0200)]

i2c: ocores: generate stop condition after timeout in polling mode

In polling mode, no stop condition is generated after a timeout. This
causes SCL to remain low and thereby block the bus. If this happens
during a transfer it can cause slaves to misinterpret the subsequent
transfer and return wrong values.

To solve this, pass the ETIMEDOUT error up from ocores_process_polling()
instead of setting STATE_ERROR directly. The caller is adjusted to call
ocores_process_timeout() on error both in polling and in IRQ mode, which
will set STATE_ERROR and generate a stop condition.

Fixes: 69c8c0c0efa8 ("i2c: ocores: add polling interface")
Signed-off-by: Gregor Herburger <gregor.herburger@tq-group.com>
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Acked-by: Peter Korsgaard <peter@korsgaard.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Federico Vaga <federico.vaga@cern.ch>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

commit | commitdiff | tree

Saravanan Vajravel [Sat, 1 Apr 2023 06:34:24 +0000 (23:34 -0700)]

RDMA/core: Fix GID entry ref leak when create_ah fails

If AH create request fails, release sgid_attr to avoid GID entry
referrence leak reported while releasing GID table

Fixes: 1a1f460ff151 ("RDMA: Hold the sgid_attr inside the struct ib_ah/qp")
Link: https://lore.kernel.org/r/20230401063424.342204-1-saravanan.vajravel@broadcom.com
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

commit | commitdiff | tree

Matija Glavinic Pecotic [Thu, 6 Apr 2023 06:26:52 +0000 (08:26 +0200)]

x86/rtc: Remove __init for runtime functions

set_rtc_noop(), get_rtc_noop() are after booting, therefore their __init
annotation is wrong.

A crash was observed on an x86 platform where CMOS RTC is unused and
disabled via device tree. set_rtc_noop() was invoked from ntp:
sync_hw_clock(), although CONFIG_RTC_SYSTOHC=n, however sync_cmos_clock()
doesn't honour that.

  Workqueue: events_power_efficient sync_hw_clock
  RIP: 0010:set_rtc_noop
  Call Trace:
   update_persistent_clock64
   sync_hw_clock

Fix this by dropping the __init annotation from set/get_rtc_noop().

Fixes: c311ed6183f4 ("x86/init: Allow DT configured systems to disable RTC at boot time")
Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/59f7ceb1-446b-1d3d-0bc8-1f0ee94b1e18@nokia.com

commit | commitdiff | tree

Daniel Vetter [Thu, 13 Apr 2023 12:24:44 +0000 (14:24 +0200)]

Merge tag 'drm-intel-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v6.3-rc7:
- Fix dual link DSI for TGL+

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/877cugckzu.fsf@intel.com

commit | commitdiff | tree

Vladimir Oltean [Tue, 11 Apr 2023 19:26:45 +0000 (22:26 +0300)]

net: enetc: workaround for unresponsive pMAC after receiving express traffic

I have observed an issue where the RX direction of the LS1028A ENETC pMAC
seems unresponsive. The minimal procedure to reproduce the issue is:

1. Connect ENETC port 0 with a loopback RJ45 cable to one of the Felix
   switch ports (0).

2. Bring the ports up (MAC Merge layer is not enabled on either end).

3. Send a large quantity of unidirectional (express) traffic from Felix
   to ENETC. I tried altering frame size and frame count, and it doesn't
   appear to be specific to either of them, but rather, to the quantity
   of octets received. Lowering the frame count, the minimum quantity of
   packets to reproduce relatively consistently seems to be around 37000
   frames at 1514 octets (w/o FCS) each.

4. Using ethtool --set-mm, enable the pMAC in the Felix and in the ENETC
   ports, in both RX and TX directions, and with verification on both
   ends.

5. Wait for verification to complete on both sides.

6. Configure a traffic class as preemptible on both ends.

7. Send some packets again.

The issue is at step 5, where the verification process of ENETC ends
(meaning that Felix responds with an SMD-R and ENETC sees the response),
but the verification process of Felix never ends (it remains VERIFYING).

If step 3 is skipped or if ENETC receives less traffic than
approximately that threshold, the test runs all the way through
(verification succeeds on both ends, preemptible traffic passes fine).

If, between step 4 and 5, the step below is also introduced:

4.1. Disable and re-enable PM0_COMMAND_CONFIG bit RX_EN

then again, the sequence of steps runs all the way through, and
verification succeeds, even if there was the previous RX traffic
injected into ENETC.

Traffic sent *by* the ENETC port prior to enabling the MAC Merge layer
does not seem to influence the verification result, only received
traffic does.

The LS1028A manual does not mention any relationship between
PM0_COMMAND_CONFIG and MMCSR, and the hardware people don't seem to
know for now either.

The bit that is toggled to work around the issue is also toggled
by enetc_mac_enable(), called from phylink's mac_link_down() and
mac_link_up() methods - which is how the workaround was found:
verification would work after a link down/up.

Fixes: c7b9e8086902 ("net: enetc: add support for MAC Merge layer")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230411192645.1896048-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Xin Long [Mon, 10 Apr 2023 19:43:30 +0000 (15:43 -0400)]

sctp: fix a potential overflow in sctp_ifwdtsn_skip

Currently, when traversing ifwdtsn skips with _sctp_walk_ifwdtsn, it only
checks the pos against the end of the chunk. However, the data left for
the last pos may be < sizeof(struct sctp_ifwdtsn_skip), and dereference
it as struct sctp_ifwdtsn_skip may cause coverflow.

This patch fixes it by checking the pos against "the end of the chunk -
sizeof(struct sctp_ifwdtsn_skip)" in sctp_ifwdtsn_skip, similar to
sctp_fwdtsn_skip.

Fixes: 0fc2ea922c8a ("sctp: implement validate_ftsn for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/2a71bffcd80b4f2c61fac6d344bb2f11c8fd74f7.1681155810.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Daniel Vetter [Thu, 13 Apr 2023 07:57:19 +0000 (09:57 +0200)]

Merge tag 'amd-drm-fixes-6.3-2023-04-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.3-2023-04-12:

amdgpu:
- SMU13 fixes
- DP MST fix

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230412215637.7881-1-alexander.deucher@amd.com

commit | commitdiff | tree

Ziyang Xuan [Mon, 10 Apr 2023 01:23:52 +0000 (09:23 +0800)]

net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()

Syzbot reported a bug as following:

=====================================================
BUG: KMSAN: uninit-value in qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230
qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230
qrtr_endpoint_post+0xf85/0x11b0 net/qrtr/af_qrtr.c:519
qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108
call_write_iter include/linux/fs.h:2189 [inline]
aio_write+0x63a/0x950 fs/aio.c:1600
io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
__do_sys_io_submit fs/aio.c:2078 [inline]
__se_sys_io_submit+0x293/0x770 fs/aio.c:2048
__x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Uninit was created at:
slab_post_alloc_hook mm/slab.h:766 [inline]
slab_alloc_node mm/slub.c:3452 [inline]
__kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
__do_kmalloc_node mm/slab_common.c:967 [inline]
__kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988
kmalloc_reserve net/core/skbuff.c:492 [inline]
__alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565
__netdev_alloc_skb+0x120/0x7d0 net/core/skbuff.c:630
qrtr_endpoint_post+0xbd/0x11b0 net/qrtr/af_qrtr.c:446
qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108
call_write_iter include/linux/fs.h:2189 [inline]
aio_write+0x63a/0x950 fs/aio.c:1600
io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
__do_sys_io_submit fs/aio.c:2078 [inline]
__se_sys_io_submit+0x293/0x770 fs/aio.c:2048
__x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

It is because that skb->len requires at least sizeof(struct qrtr_ctrl_pkt)
in qrtr_tx_resume(). And skb->len equals to size in qrtr_endpoint_post().
But size is less than sizeof(struct qrtr_ctrl_pkt) when qrtr_cb->type
equals to QRTR_TYPE_RESUME_TX in qrtr_endpoint_post() under the syzbot
scenario. This triggers the uninit variable access bug.

Add size check when qrtr_cb->type equals to QRTR_TYPE_RESUME_TX in
qrtr_endpoint_post() to fix the bug.

Fixes: 5fdeb0d372ab ("net: qrtr: Implement outgoing flow control")
Reported-by: syzbot+4436c9630a45820fda76@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=c14607f0963d27d5a3d5f4c8639b500909e43540
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230410012352.3997823-1-william.xuanziyang@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Stefan Binding [Wed, 12 Apr 2023 16:05:31 +0000 (17:05 +0100)]

ALSA: hda/realtek: Add quirks for Lenovo Z13/Z16 Gen2

These Lenovo laptops use Realtek HDA codec combined with
2xCS35L41 Amplifiers using I2C with External Boost.

Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230412160531.182007-1-sbinding@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Martin Willi [Tue, 11 Apr 2023 07:43:19 +0000 (09:43 +0200)]

rtnetlink: Restore RTM_NEW/DELLINK notification behavior

The commits referenced below allows userspace to use the NLM_F_ECHO flag
for RTM_NEW/DELLINK operations to receive unicast notifications for the
affected link. Prior to these changes, applications may have relied on
multicast notifications to learn the same information without specifying
the NLM_F_ECHO flag.

For such applications, the mentioned commits changed the behavior for
requests not using NLM_F_ECHO. Multicast notifications are still received,
but now use the portid of the requester and the sequence number of the
request instead of zero values used previously. For the application, this
message may be unexpected and likely handled as a response to the
NLM_F_ACKed request, especially if it uses the same socket to handle
requests and notifications.

To fix existing applications relying on the old notification behavior,
set the portid and sequence number in the notification only if the
request included the NLM_F_ECHO flag. This restores the old behavior
for applications not using it, but allows unicasted notifications for
others.

Fixes: f3a63cce1b4f ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link")
Fixes: d88e136cab37 ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_newlink_create")
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230411074319.24133-1-martin@strongswan.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Apr 2023 00:26:00 +0000 (17:26 -0700)]

Merge tag 'for-linus-2023041201' of git://git./linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- kernel panic fix for intel-ish-hid driver (Tanu Malhotra)

- buffer overflow fix in hid-sensor-custom driver (Todd Brandt)

- two device specific quirks (Alessandro Manca, Philippe Troin)

* tag 'for-linus-2023041201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: intel-ish-hid: Fix kernel panic during warm reset
  HID: hid-sensor-custom: Fix buffer overrun in device name
  HID: topre: Add support for 87 keys Realforce R2
  HID: add HP 13t-aw100 & 14t-ea100 digitizer battery quirks

commit | commitdiff | tree

Linus Torvalds [Thu, 13 Apr 2023 00:20:55 +0000 (17:20 -0700)]

Merge tag 'dmaengine-fix-6.3' of git://git./linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:
"A couple of fixes in apple driver, core and kernedoc fix for dmaengine
  subsystem:

   - apple admac driver fixes for current_tx, src_addr_widths and
     global' interrupt flags handling

   - xdma kerneldoc fix

   - core fix for use of devm_add_action_or_reset"

* tag 'dmaengine-fix-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: apple-admac: Fix 'current_tx' not getting freed
  dmaengine: apple-admac: Set src_addr_widths capability
  dmaengine: apple-admac: Handle 'global' interrupt flags
  dmaengine: xilinx: xdma: Fix some kernel-doc comments
  dmaengine: Actually use devm_add_action_or_reset()

commit | commitdiff | tree

Evan Quan [Fri, 7 Apr 2023 09:12:15 +0000 (17:12 +0800)]

drm/amd/pm: correct the pcie link state check for SMU13

Update the driver implementations to fit those data exposed
by PMFW.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

commit | commitdiff | tree

Horatio Zhang [Thu, 6 Apr 2023 05:32:14 +0000 (13:32 +0800)]

drm/amd/pm: correct SMU13.0.7 max shader clock reporting

Correct the max shader clock reporting on SMU
13.0.7.

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

commit | commitdiff | tree

Horatio Zhang [Thu, 6 Apr 2023 03:17:38 +0000 (11:17 +0800)]

drm/amd/pm: correct SMU13.0.7 pstate profiling clock settings

Correct the pstate standard/peak profiling mode clock
settings for SMU13.0.7.

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

commit | commitdiff | tree

Wayne Lin [Fri, 17 Feb 2023 05:26:56 +0000 (13:26 +0800)]

drm/amd/display: Pass the right info to drm_dp_remove_payload

[Why & How]
drm_dp_remove_payload() interface was changed. Correct amdgpu dm code
to pass the right parameter to the drm helper function.

Reviewed-by: Jerry Zuo <Jerry.Zuo@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

commit | commitdiff | tree

Waiman Long [Tue, 11 Apr 2023 13:36:00 +0000 (09:36 -0400)]

cgroup/cpuset: Make cpuset_attach_task() skip subpartitions CPUs for top_cpuset

It is found that attaching a task to the top_cpuset does not currently
ignore CPUs allocated to subpartitions in cpuset_attach_task(). So the
code is changed to fix that.

Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Waiman Long [Tue, 11 Apr 2023 13:35:59 +0000 (09:35 -0400)]

cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods

In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept
new tasks. It is too late to check that in cpuset_fork(). So we need
to add the cpuset_can_fork() and cpuset_cancel_fork() methods to
pre-check it before we can allow attachment to a different cpuset.

We also need to set the attach_in_progress flag to alert other code
that a new task is going to be added to the cpuset.

Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups")
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Waiman Long [Tue, 11 Apr 2023 13:35:58 +0000 (09:35 -0400)]

cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly

By default, the clone(2) syscall spawn a child process into the same
cgroup as its parent. With the use of the CLONE_INTO_CGROUP flag
introduced by commit ef2c41cf38a7 ("clone3: allow spawning processes
into cgroups"), the child will be spawned into a different cgroup which
is somewhat similar to writing the child's tid into "cgroup.threads".

The current cpuset_fork() method does not properly handle the
CLONE_INTO_CGROUP case where the cpuset of the child may be different
from that of its parent. Update the cpuset_fork() method to treat the
CLONE_INTO_CGROUP case similar to cpuset_attach().

Since the newly cloned task has not been running yet, its actual
memory usage isn't known. So it is not necessary to make change to mm
in cpuset_fork().

Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups")
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Waiman Long [Tue, 11 Apr 2023 13:35:57 +0000 (09:35 -0400)]

cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()

After a successful cpuset_can_attach() call which increments the
attach_in_progress flag, either cpuset_cancel_attach() or cpuset_attach()
will be called later. In cpuset_attach(), tasks in cpuset_attach_wq,
if present, will be woken up at the end. That is not the case in
cpuset_cancel_attach(). So missed wakeup is possible if the attach
operation is somehow cancelled. Fix that by doing the wakeup in
cpuset_cancel_attach() as well.

Fixes: e44193d39e8d ("cpuset: let hotplug propagation work wait for task attaching")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

Tetsuo Handa [Wed, 5 Apr 2023 13:15:32 +0000 (22:15 +0900)]

cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex

syzbot is reporting circular locking dependency between cpu_hotplug_lock
and freezer_mutex, for commit f5d39b020809 ("freezer,sched: Rewrite core
freezer logic") replaced atomic_inc() in freezer_apply_state() with
static_branch_inc() which holds cpu_hotplug_lock.

cpu_hotplug_lock => cgroup_threadgroup_rwsem => freezer_mutex

  cgroup_file_write() {
    cgroup_procs_write() {
      __cgroup_procs_write() {
        cgroup_procs_write_start() {
          cgroup_attach_lock() {
            cpus_read_lock() {
              percpu_down_read(&cpu_hotplug_lock);
            }
            percpu_down_write(&cgroup_threadgroup_rwsem);
          }
        }
        cgroup_attach_task() {
          cgroup_migrate() {
            cgroup_migrate_execute() {
              freezer_attach() {
                mutex_lock(&freezer_mutex);
                (...snipped...)
              }
            }
          }
        }
        (...snipped...)
      }
    }
  }

freezer_mutex => cpu_hotplug_lock

  cgroup_file_write() {
    freezer_write() {
      freezer_change_state() {
        mutex_lock(&freezer_mutex);
        freezer_apply_state() {
          static_branch_inc(&freezer_active) {
            static_key_slow_inc() {
              cpus_read_lock();
              static_key_slow_inc_cpuslocked();
              cpus_read_unlock();
            }
          }
        }
        mutex_unlock(&freezer_mutex);
      }
    }
  }

Swap locking order by moving cpus_read_lock() in freezer_apply_state()
to before mutex_lock(&freezer_mutex) in freezer_change_state().

Reported-by: syzbot <syzbot+c39682e86c9d84152f93@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=c39682e86c9d84152f93
Suggested-by: Hillf Danton <hdanton@sina.com>
Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

commit | commitdiff | tree

David Howells [Wed, 12 Apr 2023 12:18:57 +0000 (13:18 +0100)]

netfs: Fix netfs_extract_iter_to_sg() for ITER_UBUF/IOVEC

Fix netfs_extract_iter_to_sg() for ITER_UBUF and ITER_IOVEC to set the
size of the page to the part of the page extracted, not the remaining
amount of data in the extracted page array at that point.

This doesn't yet affect anything as cifs, the only current user, only
passes in non-user-backed iterators.

Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: Steve French <sfrench@samba.org>
Cc: Shyam Prasad N <nspmangalore@gmail.com>
Cc: Rohith Surabattula <rohiths.msft@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Vincent Guittot [Tue, 11 Apr 2023 09:06:11 +0000 (11:06 +0200)]

sched/fair: Fix imbalance overflow

When local group is fully busy but its average load is above system load,
computing the imbalance will overflow and local group is not the best
target for pulling this load.

Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
Reported-by: Tingjia Cao <tjcao980311@gmail.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingjia Cao <tjcao980311@gmail.com>
Link: https://lore.kernel.org/lkml/CABcWv9_DAhVBOq2=W=2ypKE9dKM5s2DvoV8-U0+GDwwuKZ89jQ@mail.gmail.com/T/

commit | commitdiff | tree

Maarten Lankhorst [Wed, 12 Apr 2023 10:01:32 +0000 (12:01 +0200)]

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

We were stuck on rc2, should at least attempt to track drm-fixes
slightly.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

commit | commitdiff | tree

Rob Herring [Mon, 10 Apr 2023 23:27:19 +0000 (18:27 -0500)]

net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes

TI CPSW uses of_platform_* functions which are declared in of_platform.h.
of_platform.h gets implicitly included by of_device.h, but that is going
to be removed soon. Nothing else depends on of_device.h so it can be
dropped. of_platform.h also implicitly includes platform_device.h, so
add an explicit include for it, too.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Harshit Mogalapalli [Sat, 8 Apr 2023 19:43:21 +0000 (12:43 -0700)]

net: wwan: iosm: Fix error handling path in ipc_pcie_probe()

Smatch reports:
drivers/net/wwan/iosm/iosm_ipc_pcie.c:298 ipc_pcie_probe()
warn: missing unwind goto?

When dma_set_mask fails it directly returns without disabling pci
device and freeing ipc_pcie. Fix this my calling a correct goto label

As dma_set_mask returns either 0 or -EIO, we can use a goto label, as
it finally returns -EIO.

Add a set_mask_fail goto label which stands consistent with other goto
labels in this function..

Fixes: 035e3befc191 ("net: wwan: iosm: fix driver not working with INTEL_IOMMU disabled")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Domain: System / Kernel;

RSS Atom