platform/kernel/linux-starfive.git
15 months agovfio: Move device_del() before waiting for the last vfio_device registration refcount
Yi Liu [Tue, 18 Jul 2023 13:55:42 +0000 (06:55 -0700)]
vfio: Move device_del() before waiting for the last vfio_device registration refcount

device_del() destroys the vfio-dev/vfioX under the sysfs for vfio_device.
There is no reason to keep it while the device is going to be unregistered.

This movement is also a preparation for adding vfio_device cdev. Kernel
should remove the cdev node of the vfio_device to avoid new registration
refcount increment while the device is going to be unregistered.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-18-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Move vfio_device_group_unregister() to be the first operation in unregister
Yi Liu [Tue, 18 Jul 2023 13:55:41 +0000 (06:55 -0700)]
vfio: Move vfio_device_group_unregister() to be the first operation in unregister

This avoids endless vfio_device refcount increment by userspace, which
would keep blocking the vfio_unregister_group_dev().

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-17-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio-iommufd: Add detach_ioas support for emulated VFIO devices
Yi Liu [Tue, 18 Jul 2023 13:55:40 +0000 (06:55 -0700)]
vfio-iommufd: Add detach_ioas support for emulated VFIO devices

This prepares for adding DETACH ioctl for emulated VFIO devices.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-16-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agoiommufd/device: Add iommufd_access_detach() API
Nicolin Chen [Tue, 18 Jul 2023 13:55:39 +0000 (06:55 -0700)]
iommufd/device: Add iommufd_access_detach() API

Previously, the detach routine is only done by the destroy(). And it was
called by vfio_iommufd_emulated_unbind() when the device runs close(), so
all the mappings in iopt were cleaned in that setup, when the call trace
reaches this detach() routine.

Now, there's a need of a detach uAPI, meaning that it does not only need
a new iommufd_access_detach() API, but also requires access->ops->unmap()
call as a cleanup. So add one.

However, leaving that unprotected can introduce some potential of a race
condition during the pin_/unpin_pages() call, where access->ioas->iopt is
getting referenced. So, add an ioas_lock to protect the context of iopt
referencings.

Also, to allow the iommufd_access_unpin_pages() callback to happen via
this unmap() call, add an ioas_unpin pointer, so the unpin routine won't
be affected by the "access->ioas = NULL" trick.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-15-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio-iommufd: Add detach_ioas support for physical VFIO devices
Yi Liu [Tue, 18 Jul 2023 13:55:38 +0000 (06:55 -0700)]
vfio-iommufd: Add detach_ioas support for physical VFIO devices

This prepares for adding DETACH ioctl for physical VFIO devices.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-14-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Record devid in vfio_device_file
Yi Liu [Tue, 18 Jul 2023 13:55:37 +0000 (06:55 -0700)]
vfio: Record devid in vfio_device_file

.bind_iommufd() will generate an ID to represent this bond, which is
needed by userspace for further usage. Store devid in vfio_device_file
to avoid passing the pointer in multiple places.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-13-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio-iommufd: Split bind/attach into two steps
Yi Liu [Tue, 18 Jul 2023 13:55:36 +0000 (06:55 -0700)]
vfio-iommufd: Split bind/attach into two steps

This aligns the bind/attach logic with the coming vfio device cdev support.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-12-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()
Yi Liu [Tue, 18 Jul 2023 13:55:35 +0000 (06:55 -0700)]
vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()

This moves the noiommu compat validation logic into vfio_df_group_open().
This is more consistent with what will be done in vfio device cdev path.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-11-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Make vfio_df_open() single open for device cdev path
Yi Liu [Tue, 18 Jul 2023 13:55:34 +0000 (06:55 -0700)]
vfio: Make vfio_df_open() single open for device cdev path

VFIO group has historically allowed multi-open of the device FD. This
was made secure because the "open" was executed via an ioctl to the
group FD which is itself only single open.

However, no known use of multiple device FDs today. It is kind of a
strange thing to do because new device FDs can naturally be created
via dup().

When we implement the new device uAPI (only used in cdev path) there is
no natural way to allow the device itself from being multi-opened in a
secure manner. Without the group FD we cannot prove the security context
of the opener.

Thus, when moving to the new uAPI we block the ability of opening
a device multiple times. Given old group path still allows it we store
a vfio_group pointer in struct vfio_device_file to differentiate.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-10-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Add cdev_device_open_cnt to vfio_group
Yi Liu [Tue, 18 Jul 2023 13:55:33 +0000 (06:55 -0700)]
vfio: Add cdev_device_open_cnt to vfio_group

This is for counting the devices that are opened via the cdev path. This
count is increased and decreased by the cdev path. The group path checks
it to achieve exclusion with the cdev path. With this, only one path
(group path or cdev path) will claim DMA ownership. This avoids scenarios
in which devices within the same group may be opened via different paths.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-9-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Block device access via device fd until device is opened
Yi Liu [Tue, 18 Jul 2023 13:55:32 +0000 (06:55 -0700)]
vfio: Block device access via device fd until device is opened

Allow the vfio_device file to be in a state where the device FD is
opened but the device cannot be used by userspace (i.e. its .open_device()
hasn't been called). This inbetween state is not used when the device
FD is spawned from the group FD, however when we create the device FD
directly by opening a cdev it will be opened in the blocked state.

The reason for the inbetween state is that userspace only gets a FD but
doesn't gain access permission until binding the FD to an iommufd. So in
the blocked state, only the bind operation is allowed. Completing bind
will allow user to further access the device.

This is implemented by adding a flag in struct vfio_device_file to mark
the blocked state and using a simple smp_load_acquire() to obtain the
flag value and serialize all the device setup with the thread accessing
this device.

Following this lockless scheme, it can safely handle the device FD
unbound->bound but it cannot handle bound->unbound. To allow this we'd
need to add a lock on all the vfio ioctls which seems costly. So once
device FD is bound, it remains bound until the FD is closed.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-8-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Pass struct vfio_device_file * to vfio_device_open/close()
Yi Liu [Tue, 18 Jul 2023 13:55:31 +0000 (06:55 -0700)]
vfio: Pass struct vfio_device_file * to vfio_device_open/close()

This avoids passing too much parameters in multiple functions. Per the
input parameter change, rename the function to be vfio_df_open/close().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-7-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agokvm/vfio: Accept vfio device file from userspace
Yi Liu [Tue, 18 Jul 2023 13:55:30 +0000 (06:55 -0700)]
kvm/vfio: Accept vfio device file from userspace

This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
Old userspace uses KVM_DEV_VFIO_GROUP* works as well.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-6-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agokvm/vfio: Prepare for accepting vfio device fd
Yi Liu [Tue, 18 Jul 2023 13:55:29 +0000 (06:55 -0700)]
kvm/vfio: Prepare for accepting vfio device fd

This renames kvm_vfio_group related helpers to prepare for accepting
vfio device fd. No functional change is intended.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-5-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Accept vfio device file in the KVM facing kAPI
Yi Liu [Tue, 18 Jul 2023 13:55:28 +0000 (06:55 -0700)]
vfio: Accept vfio device file in the KVM facing kAPI

This makes the vfio file kAPIs to accept vfio device files, also a
preparation for vfio device cdev support.

For the kvm set with vfio device file, kvm pointer is stored in struct
vfio_device_file, and use kvm_ref_lock to protect kvm set and kvm
pointer usage within VFIO. This kvm pointer will be set to vfio_device
after device file is bound to iommufd in the cdev path.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-4-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Refine vfio file kAPIs for KVM
Yi Liu [Tue, 18 Jul 2023 13:55:27 +0000 (06:55 -0700)]
vfio: Refine vfio file kAPIs for KVM

This prepares for making the below kAPIs to accept both group file
and device file instead of only vfio group file.

  bool vfio_file_enforced_coherent(struct file *file);
  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-3-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Allocate per device file structure
Yi Liu [Tue, 18 Jul 2023 13:55:26 +0000 (06:55 -0700)]
vfio: Allocate per device file structure

This is preparation for adding vfio device cdev support. vfio device
cdev requires:
1) A per device file memory to store the kvm pointer set by KVM. It will
   be propagated to vfio_device:kvm after the device cdev file is bound
   to an iommufd.
2) A mechanism to block device access through device cdev fd before it
   is bound to an iommufd.

To address the above requirements, this adds a per device file structure
named vfio_device_file. For now, it's only a wrapper of struct vfio_device
pointer. Other fields will be added to this per file structure in future
commits.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-2-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
Yi Liu [Tue, 18 Jul 2023 10:55:42 +0000 (03:55 -0700)]
vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET

This is the way user to invoke hot-reset for the devices opened by cdev
interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
hot-reset for cdev devices.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-11-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio/pci: Copy hot-reset device info to userspace in the devices loop
Yi Liu [Tue, 18 Jul 2023 10:55:41 +0000 (03:55 -0700)]
vfio/pci: Copy hot-reset device info to userspace in the devices loop

This copies the vfio_pci_dependent_device to userspace during looping each
affected device for reporting vfio_pci_hot_reset_info. This avoids counting
the affected devices and allocating a potential large buffer to store the
vfio_pci_dependent_device of all the affected devices before copying them
to userspace.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230718105542.4138-10-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
Yi Liu [Tue, 18 Jul 2023 10:55:40 +0000 (03:55 -0700)]
vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx
of the cdev device to check the ownership of the other affected devices.

When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
the values returned are IOMMUFD devids rather than group IDs as used when
accessing vfio devices through the conventional vfio group interface.
Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported
in this mode if all of the devices affected by the hot-reset are owned by
either virtue of being directly bound to the same iommufd context as the
calling device, or implicitly owned via a shared IOMMU group.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-9-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Add helper to search vfio_device in a dev_set
Yi Liu [Tue, 18 Jul 2023 10:55:39 +0000 (03:55 -0700)]
vfio: Add helper to search vfio_device in a dev_set

There are drivers that need to search vfio_device within a given dev_set.
e.g. vfio-pci. So add a helper.

vfio_pci_is_device_in_set() now returns -EBUSY in commit a882c16a2b7e
("vfio/pci: Change vfio_pci_try_bus_reset() to use the dev_set") where
it was trying to preserve the return of vfio_pci_try_zap_and_vma_lock_cb().
However, it makes more sense to return -ENODEV.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-8-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio: Mark cdev usage in vfio_device
Yi Liu [Tue, 18 Jul 2023 10:55:38 +0000 (03:55 -0700)]
vfio: Mark cdev usage in vfio_device

This can be used to differentiate whether to report group_id or devid in
the revised VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. At this moment, no
cdev path yet, so the vfio_device_cdev_opened() helper always returns false.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-7-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agoiommufd: Add helper to retrieve iommufd_ctx and devid
Yi Liu [Tue, 18 Jul 2023 10:55:37 +0000 (03:55 -0700)]
iommufd: Add helper to retrieve iommufd_ctx and devid

This is needed by the vfio-pci driver to report affected devices in the
hot-reset for a given device.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-6-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agoiommufd: Add iommufd_ctx_has_group()
Yi Liu [Tue, 18 Jul 2023 10:55:36 +0000 (03:55 -0700)]
iommufd: Add iommufd_ctx_has_group()

This adds the helper to check if any device within the given iommu_group
has been bound with the iommufd_ctx. This is helpful for the checking on
device ownership for the devices which have not been bound but cannot be
bound to any other iommufd_ctx as the iommu_group has been bound.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-5-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agoiommufd: Reserve all negative IDs in the iommufd xarray
Yi Liu [Tue, 18 Jul 2023 10:55:35 +0000 (03:55 -0700)]
iommufd: Reserve all negative IDs in the iommufd xarray

With this reservation, IOMMUFD users can encode the negative IDs for
specific purposes. e.g. VFIO needs two reserved values to tell userspace
the ID returned is not valid but has other meaning.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-4-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio/pci: Move the existing hot reset logic to be a helper
Yi Liu [Tue, 18 Jul 2023 10:55:34 +0000 (03:55 -0700)]
vfio/pci: Move the existing hot reset logic to be a helper

This prepares to add another method for hot reset. The major hot reset logic
are moved to vfio_pci_ioctl_pci_hot_reset_groups().

No functional change is intended.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-3-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agovfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
Yi Liu [Tue, 18 Jul 2023 10:55:33 +0000 (03:55 -0700)]
vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()

This suits more on what the code does.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-2-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
15 months agoLinux 6.5-rc3
Linus Torvalds [Sun, 23 Jul 2023 22:24:10 +0000 (15:24 -0700)]
Linux 6.5-rc3

15 months agoMerge tag 'trace-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Sun, 23 Jul 2023 22:19:14 +0000 (15:19 -0700)]
Merge tag 'trace-v6.5-rc2' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Swapping the ring buffer for snapshotting (for things like irqsoff)
   can crash if the ring buffer is being resized. Disable swapping when
   this happens. The missed swap will be reported to the tracer

 - Report error if the histogram fails to be created due to an error in
   adding a histogram variable, in event_hist_trigger_parse()

 - Remove unused declaration of tracing_map_set_field_descr()

* tag 'trace-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/histograms: Return an error if we fail to add histogram to hist_vars list
  ring-buffer: Do not swap cpu_buffer during resize process
  tracing: Remove unused extern declaration tracing_map_set_field_descr()

15 months agoMerge tag 'kbuild-fixes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahi...
Linus Torvalds [Sun, 23 Jul 2023 21:55:41 +0000 (14:55 -0700)]
Merge tag 'kbuild-fixes-v6.5' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix stale help text in gconfig

 - Support *.S files in compile_commands.json

 - Flatten KBUILD_CFLAGS

 - Fix external module builds with Rust so that temporary files are
   created in the modules directories instead of the kernel tree

* tag 'kbuild-fixes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: rust: avoid creating temporary files
  kbuild: flatten KBUILD_CFLAGS
  gen_compile_commands: add assembly files to compilation database
  kconfig: gconfig: correct program name in help text
  kconfig: gconfig: drop the Show Debug Info help text

15 months agokbuild: rust: avoid creating temporary files
Miguel Ojeda [Sun, 23 Jul 2023 14:21:28 +0000 (16:21 +0200)]
kbuild: rust: avoid creating temporary files

`rustc` outputs by default the temporary files (i.e. the ones saved
by `-Csave-temps`, such as `*.rcgu*` files) in the current working
directory when `-o` and `--out-dir` are not given (even if
`--emit=x=path` is given, i.e. it does not use those for temporaries).

Since out-of-tree modules are compiled from the `linux` tree,
`rustc` then tries to create them there, which may not be accessible.

Thus pass `--out-dir` explicitly, even if it is just for the temporary
files.

Similarly, do so for Rust host programs too.

Reported-by: Raphael Nestler <raphael.nestler@gmail.com>
Closes: https://github.com/Rust-for-Linux/linux/issues/1015
Reported-by: Andrea Righi <andrea.righi@canonical.com>
Tested-by: Raphael Nestler <raphael.nestler@gmail.com> # non-hostprogs
Tested-by: Andrea Righi <andrea.righi@canonical.com> # non-hostprogs
Fixes: 295d8398c67e ("kbuild: specify output names separately for each emission type from rustc")
Cc: stable@vger.kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 23 Jul 2023 17:44:38 +0000 (10:44 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Avoid pKVM finalization if KVM initialization fails

   - Add missing BTI instructions in the hypervisor, fixing an early
     boot failure on BTI systems

   - Handle MMU notifiers correctly for non hugepage-aligned memslots

   - Work around a bug in the architecture where hypervisor timer
     controls have UNKNOWN behavior under nested virt

   - Disable preemption in kvm_arch_hardware_enable(), fixing a kernel
     BUG in cpu hotplug resulting from per-CPU accessor sanity checking

   - Make WFI emulation on GICv4 systems robust w.r.t. preemption,
     consistently requesting a doorbell interrupt on vcpu_put()

   - Uphold RES0 sysreg behavior when emulating older PMU versions

   - Avoid macro expansion when initializing PMU register names,
     ensuring the tracepoints pretty-print the sysreg

  s390:

   - Two fixes for asynchronous destroy

  x86 fixes will come early next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: s390: pv: fix index value of replaced ASCE
  KVM: s390: pv: simplify shutdown and fix race
  KVM: arm64: Fix the name of sys_reg_desc related to PMU
  KVM: arm64: Correctly handle RES0 bits PMEVTYPER<n>_EL0.evtCount
  KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption
  KVM: arm64: Add missing BTI instructions
  KVM: arm64: Correctly handle page aging notifiers for unaligned memslot
  KVM: arm64: Disable preemption in kvm_arch_hardware_enable()
  KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm
  KVM: arm64: timers: Use CNTHCTL_EL2 when setting non-CNTKCTL_EL1 bits

15 months agoMerge tag 'ext4_for_linus-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 23 Jul 2023 17:21:49 +0000 (10:21 -0700)]
Merge tag 'ext4_for_linus-6.5-rc3' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Bug and regression fixes for 6.5-rc3 for ext4's mballoc and jbd2's
  checkpoint code"

* tag 'ext4_for_linus-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix rbtree traversal bug in ext4_mb_use_preallocated
  ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()
  ext4: correct inline offset when handling xattrs in inode body
  jbd2: remove __journal_try_to_free_buffer()
  jbd2: fix a race when checking checkpoint buffer busy
  jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
  jbd2: remove journal_clean_one_cp_list()
  jbd2: remove t_checkpoint_io_list
  jbd2: recheck chechpointing non-dirty buffer

15 months agoMerge tag '6.5-rc2-smb3-client-fixes-ver2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 23 Jul 2023 17:16:44 +0000 (10:16 -0700)]
Merge tag '6.5-rc2-smb3-client-fixes-ver2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:
 "Add minor debugging improvement.

  The change improves ability to read a network trace to debug problems
  on encrypted connections which are very common (e.g. using wireshark
  or tcpdump).

  That works today with tools like 'smbinfo keys /mnt/file' but requires
  passing in a filename on the mount (see e.g. [1]), but it often makes
  more sense to just pass in the mount point path (ie a directory not a
  filename).

  So this fix was needed to debug some types of problems (an obvious
  example is on an encrypted connection failing operations on an empty
  share or with no files in the root of the directory) - so you can
  simply pass in the 'smbinfo keys <mntpoint>' and get the information
  that wireshark needs"

Link: https://wiki.samba.org/index.php/Wireshark_Decryption
* tag '6.5-rc2-smb3-client-fixes-ver2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number for cifs.ko
  cifs: allow dumping keys for directories too

15 months agoMerge tag 'kvm-s390-master-6.5-1' of https://git.kernel.org/pub/scm/linux/kernel...
Paolo Bonzini [Sun, 23 Jul 2023 16:50:30 +0000 (12:50 -0400)]
Merge tag 'kvm-s390-master-6.5-1' of https://git./linux/kernel/git/kvms390/linux into HEAD

Two fixes for asynchronous destroy

15 months agoMerge tag 'kvmarm-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmar...
Paolo Bonzini [Sun, 23 Jul 2023 16:50:14 +0000 (12:50 -0400)]
Merge tag 'kvmarm-fixes-6.5-1' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.5, part #1

 - Avoid pKVM finalization if KVM initialization fails

 - Add missing BTI instructions in the hypervisor, fixing an early boot
   failure on BTI systems

 - Handle MMU notifiers correctly for non hugepage-aligned memslots

 - Work around a bug in the architecture where hypervisor timer controls
   have UNKNOWN behavior under nested virt.

 - Disable preemption in kvm_arch_hardware_enable(), fixing a kernel BUG
   in cpu hotplug resulting from per-CPU accessor sanity checking.

 - Make WFI emulation on GICv4 systems robust w.r.t. preemption,
   consistently requesting a doorbell interrupt on vcpu_put()

 - Uphold RES0 sysreg behavior when emulating older PMU versions

 - Avoid macro expansion when initializing PMU register names, ensuring
   the tracepoints pretty-print the sysreg.

15 months agotracing/histograms: Return an error if we fail to add histogram to hist_vars list
Mohamed Khalfella [Fri, 14 Jul 2023 20:33:41 +0000 (20:33 +0000)]
tracing/histograms: Return an error if we fail to add histogram to hist_vars list

Commit 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if
they have referenced variables") added a check to fail histogram creation
if save_hist_vars() failed to add histogram to hist_vars list. But the
commit failed to set ret to failed return code before jumping to
unregister histogram, fix it.

Link: https://lore.kernel.org/linux-trace-kernel/20230714203341.51396-1-mkhalfella@purestorage.com
Cc: stable@vger.kernel.org
Fixes: 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
15 months agoring-buffer: Do not swap cpu_buffer during resize process
Chen Lin [Wed, 19 Jul 2023 07:58:47 +0000 (15:58 +0800)]
ring-buffer: Do not swap cpu_buffer during resize process

When ring_buffer_swap_cpu was called during resize process,
the cpu buffer was swapped in the middle, resulting in incorrect state.
Continuing to run in the wrong state will result in oops.

This issue can be easily reproduced using the following two scripts:
/tmp # cat test1.sh
//#! /bin/sh
for i in `seq 0 100000`
do
         echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb
         sleep 0.5
         echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb
         sleep 0.5
done
/tmp # cat test2.sh
//#! /bin/sh
for i in `seq 0 100000`
do
        echo irqsoff > /sys/kernel/debug/tracing/current_tracer
        sleep 1
        echo nop > /sys/kernel/debug/tracing/current_tracer
        sleep 1
done
/tmp # ./test1.sh &
/tmp # ./test2.sh &

A typical oops log is as follows, sometimes with other different oops logs.

[  231.711293] WARNING: CPU: 0 PID: 9 at kernel/trace/ring_buffer.c:2026 rb_update_pages+0x378/0x3f8
[  231.713375] Modules linked in:
[  231.714735] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 #15
[  231.716750] Hardware name: linux,dummy-virt (DT)
[  231.718152] Workqueue: events update_pages_handler
[  231.719714] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  231.721171] pc : rb_update_pages+0x378/0x3f8
[  231.722212] lr : rb_update_pages+0x25c/0x3f8
[  231.723248] sp : ffff800082b9bd50
[  231.724169] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[  231.726102] x26: 0000000000000001 x25: fffffffffffff010 x24: 0000000000000ff0
[  231.728122] x23: ffff0000c3a0b600 x22: ffff0000c3a0b5c0 x21: fffffffffffffe0a
[  231.730203] x20: ffff0000c3a0b600 x19: ffff0000c0102400 x18: 0000000000000000
[  231.732329] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe7aa8510
[  231.734212] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
[  231.736291] x11: ffff8000826998a8 x10: ffff800082b9baf0 x9 : ffff800081137558
[  231.738195] x8 : fffffc00030e82c8 x7 : 0000000000000000 x6 : 0000000000000001
[  231.740192] x5 : ffff0000ffbafe00 x4 : 0000000000000000 x3 : 0000000000000000
[  231.742118] x2 : 00000000000006aa x1 : 0000000000000001 x0 : ffff0000c0007208
[  231.744196] Call trace:
[  231.744892]  rb_update_pages+0x378/0x3f8
[  231.745893]  update_pages_handler+0x1c/0x38
[  231.746893]  process_one_work+0x1f0/0x468
[  231.747852]  worker_thread+0x54/0x410
[  231.748737]  kthread+0x124/0x138
[  231.749549]  ret_from_fork+0x10/0x20
[  231.750434] ---[ end trace 0000000000000000 ]---
[  233.720486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  233.721696] Mem abort info:
[  233.721935]   ESR = 0x0000000096000004
[  233.722283]   EC = 0x25: DABT (current EL), IL = 32 bits
[  233.722596]   SET = 0, FnV = 0
[  233.722805]   EA = 0, S1PTW = 0
[  233.723026]   FSC = 0x04: level 0 translation fault
[  233.723458] Data abort info:
[  233.723734]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  233.724176]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  233.724589]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  233.725075] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104943000
[  233.725592] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  233.726231] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  233.726720] Modules linked in:
[  233.727007] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 #15
[  233.727777] Hardware name: linux,dummy-virt (DT)
[  233.728225] Workqueue: events update_pages_handler
[  233.728655] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  233.729054] pc : rb_update_pages+0x1a8/0x3f8
[  233.729334] lr : rb_update_pages+0x154/0x3f8
[  233.729592] sp : ffff800082b9bd50
[  233.729792] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[  233.730220] x26: 0000000000000000 x25: ffff800082a8b840 x24: ffff0000c0102418
[  233.730653] x23: 0000000000000000 x22: fffffc000304c880 x21: 0000000000000003
[  233.731105] x20: 00000000000001f4 x19: ffff0000c0102400 x18: ffff800082fcbc58
[  233.731727] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000001
[  233.732282] x14: ffff8000825fe0c8 x13: 0000000000000001 x12: 0000000000000000
[  233.732709] x11: ffff8000826998a8 x10: 0000000000000ae0 x9 : ffff8000801b760c
[  233.733148] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000c03298c0
[  233.733553] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000
[  233.733972] x2 : ffff0000c3a0b600 x1 : 0000000000000000 x0 : 0000000000000000
[  233.734418] Call trace:
[  233.734593]  rb_update_pages+0x1a8/0x3f8
[  233.734853]  update_pages_handler+0x1c/0x38
[  233.735148]  process_one_work+0x1f0/0x468
[  233.735525]  worker_thread+0x54/0x410
[  233.735852]  kthread+0x124/0x138
[  233.736064]  ret_from_fork+0x10/0x20
[  233.736387] Code: 92400000 910006b5 aa000021 aa0303f7 (f9400060)
[  233.736959] ---[ end trace 0000000000000000 ]---

After analysis, the seq of the error is as follows [1-5]:

int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
int cpu_id)
{
for_each_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];
//1. get cpu_buffer, aka cpu_buffer(A)
...
...
schedule_work_on(cpu,
 &cpu_buffer->update_pages_work);
//2. 'update_pages_work' is queue on 'cpu', cpu_buffer(A) is passed to
// update_pages_handler, do the update process, set 'update_done' in
// complete(&cpu_buffer->update_done) and to wakeup resize process.
//---->
//3. Just at this moment, ring_buffer_swap_cpu is triggered,
//cpu_buffer(A) be swaped to cpu_buffer(B), the max_buffer.
//ring_buffer_swap_cpu is called as the 'Call trace' below.

Call trace:
 dump_backtrace+0x0/0x2f8
 show_stack+0x18/0x28
 dump_stack+0x12c/0x188
 ring_buffer_swap_cpu+0x2f8/0x328
 update_max_tr_single+0x180/0x210
 check_critical_timing+0x2b4/0x2c8
 tracer_hardirqs_on+0x1c0/0x200
 trace_hardirqs_on+0xec/0x378
 el0_svc_common+0x64/0x260
 do_el0_svc+0x90/0xf8
 el0_svc+0x20/0x30
 el0_sync_handler+0xb0/0xb8
 el0_sync+0x180/0x1c0
//<----

/* wait for all the updates to complete */
for_each_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];
//4. get cpu_buffer, cpu_buffer(B) is used in the following process,
//the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong.
//for example, cpu_buffer(A)->update_done will leave be set 1, and will
//not 'wait_for_completion' at the next resize round.
  if (!cpu_buffer->nr_pages_to_update)
continue;

if (cpu_online(cpu))
wait_for_completion(&cpu_buffer->update_done);
cpu_buffer->nr_pages_to_update = 0;
}
...
}
//5. the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong,
//Continuing to run in the wrong state, then oops occurs.

Link: https://lore.kernel.org/linux-trace-kernel/202307191558478409990@zte.com.cn
Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
15 months agotracing: Remove unused extern declaration tracing_map_set_field_descr()
YueHaibing [Sat, 22 Jul 2023 03:21:23 +0000 (11:21 +0800)]
tracing: Remove unused extern declaration tracing_map_set_field_descr()

Since commit 08d43a5fa063 ("tracing: Add lock-free tracing_map"),
this is never used, so can be removed.

Link: https://lore.kernel.org/linux-trace-kernel/20230722032123.24664-1-yuehaibing@huawei.com
Cc: <mhiramat@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
15 months agokbuild: flatten KBUILD_CFLAGS
Alexey Dobriyan [Thu, 13 Jul 2023 18:52:28 +0000 (21:52 +0300)]
kbuild: flatten KBUILD_CFLAGS

Make it slightly easier to see which compiler options are added and
removed (and not worry about column limit too!).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agogen_compile_commands: add assembly files to compilation database
Benjamin Gray [Wed, 19 Jul 2023 03:19:12 +0000 (13:19 +1000)]
gen_compile_commands: add assembly files to compilation database

Like C source files, tooling can find it useful to have the assembly
source file compilation recorded.

The .S extension appears to used across all architectures.

Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
15 months agoext4: fix rbtree traversal bug in ext4_mb_use_preallocated
Ojaswin Mujoo [Sat, 22 Jul 2023 17:15:24 +0000 (22:45 +0530)]
ext4: fix rbtree traversal bug in ext4_mb_use_preallocated

During allocations, while looking for preallocations(PA) in the per
inode rbtree, we can't do a direct traversal of the tree because
ext4_mb_discard_group_preallocation() can paralelly mark the pa deleted
and that can cause direct traversal to skip some entries. This was
leading to a BUG_ON() being hit [1] when we missed a PA that could satisfy
our request and ultimately tried to create a new PA that would overlap
with the missed one.

To makes sure we handle that case while still keeping the performance of
the rbtree, we make use of the fact that the only pa that could possibly
overlap the original goal start is the one that satisfies the below
conditions:

  1. It must have it's logical start immediately to the left of
  (ie less than) original logical start.

  2. It must not be deleted

To find this pa we use the following traversal method:

1. Descend into the rbtree normally to find the immediate neighboring
PA. Here we keep descending irrespective of if the PA is deleted or if
it overlaps with our request etc. The goal is to find an immediately
adjacent PA.

2. If the found PA is on right of original goal, use rb_prev() to find
the left adjacent PA.

3. Check if this PA is deleted and keep moving left with rb_prev() until
a non deleted PA is found.

4. This is the PA we are looking for. Now we can check if it can satisfy
the original request and proceed accordingly.

This approach also takes care of having deleted PAs in the tree.

(While we are at it, also fix a possible overflow bug in calculating the
end of a PA)

[1] https://lore.kernel.org/linux-ext4/CA+G9fYv2FRpLqBZf34ZinR8bU2_ZRAUOjKAD3+tKRFaEQHtt8Q@mail.gmail.com/

Cc: stable@kernel.org # 6.4
Fixes: 3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com
Tested-by: Ritesh Harjani (IBM) ritesh.list@gmail.com
Link: https://lore.kernel.org/r/edd2efda6a83e6343c5ace9deea44813e71dbe20.1690045963.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
15 months agoext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()
Ojaswin Mujoo [Fri, 9 Jun 2023 10:34:03 +0000 (16:04 +0530)]
ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()

In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).

After all the processing:

order = 1 order below goal len
min_order = maximum of the three:-
             - order - trim_order
             - 1 order below B2C(s_stripe)
             - 1 order above original len

Cc: stable@kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
15 months agoext4: correct inline offset when handling xattrs in inode body
Eric Whitney [Mon, 22 May 2023 18:15:20 +0000 (14:15 -0400)]
ext4: correct inline offset when handling xattrs in inode body

When run on a file system where the inline_data feature has been
enabled, xfstests generic/269, generic/270, and generic/476 cause ext4
to emit error messages indicating that inline directory entries are
corrupted.  This occurs because the inline offset used to locate
inline directory entries in the inode body is not updated when an
xattr in that shared region is deleted and the region is shifted in
memory to recover the space it occupied.  If the deleted xattr precedes
the system.data attribute, which points to the inline directory entries,
that attribute will be moved further up in the region.  The inline
offset continues to point to whatever is located in system.data's former
location, with unfortunate effects when used to access directory entries
or (presumably) inline data in the inode body.

Cc: stable@kernel.org
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20230522181520.1570360-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
15 months agoMerge tag 'powerpc-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 23 Jul 2023 02:32:00 +0000 (19:32 -0700)]
Merge tag 'powerpc-6.5-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Reinstate support for little endian ELFv1 binaries, which it turns
   out still exist in the wild.

 - Revert a change which used asm goto for WARN_ON/__WARN_FLAGS, as it
   lead to dead code generation and seemed to trigger compiler bugs in
   some edge cases.

 - Fix a deadlock in the pseries VAS code, between live migration and
   the driver's mmap handler.

 - Disable KCOV instrumentation in the powerpc KASAN code.

Thanks to Andrew Donnellan, Benjamin Gray, Christophe Leroy, Haren
Myneni, Russell Currey, and Uwe Kleine-König.

* tag 'powerpc-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  Revert "powerpc/64s: Remove support for ELFv1 little endian userspace"
  powerpc/kasan: Disable KCOV in KASAN code
  powerpc/512x: lpbfifo: Convert to platform remove callback returning void
  powerpc/crypto: Add gitignore for generated P10 AES/GCM .S files
  Revert "powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto"
  powerpc/pseries/vas: Hold mmap_mutex after mmap lock during window close

15 months agocifs: update internal module version number for cifs.ko
Steve French [Thu, 20 Jul 2023 13:30:32 +0000 (08:30 -0500)]
cifs: update internal module version number for cifs.ko

From 2.43 to 2.44

Signed-off-by: Steve French <stfrench@microsoft.com>
15 months agocifs: allow dumping keys for directories too
Shyam Prasad N [Fri, 16 Jun 2023 10:37:46 +0000 (10:37 +0000)]
cifs: allow dumping keys for directories too

Dumping the enc/dec keys is a session wide operation.
And it should not matter if the ioctl was run on
a regular file or a directory.

Currently, we obtain the tcon pointer from the
cifs file handle. But since there's no dir open call
in cifs, this is not populated for dirs.

This change allows dumping of session keys using ioctl
even for directories. To do this, we'll now get the
tcon pointer from the superblock, and not from the file
handle.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
15 months agoMerge tag 's390-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 22 Jul 2023 18:24:03 +0000 (11:24 -0700)]
Merge tag 's390-6.5-3' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - Fix per vma lock fault handling: add missing !(fault & VM_FAULT_ERROR)
   check to fault handler to prevent error handling for return values
   that don't indicate an error

 - Use kfree_sensitive() instead of kfree() in paes crypto code to clear
   memory that may contain keys before freeing it

 - Fix reply buffer size calculation for CCA replies in zcrypt device
   driver

* tag 's390-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: fix reply buffer calculations for CCA replies
  s390/crypto: use kfree_sensitive() instead of kfree()
  s390/mm: fix per vma lock fault handling

15 months agoMerge tag 'block-6.5-2023-07-21' of git://git.kernel.dk/linux
Linus Torvalds [Sat, 22 Jul 2023 18:05:15 +0000 (11:05 -0700)]
Merge tag 'block-6.5-2023-07-21' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - Fix for loop regressions (Mauricio)

 - Fix a potential stall with batched wakeups in sbitmap (David)

 - Fix for stall with recursive plug flushes (Ross)

 - Skip accounting of empty requests for blk-iocost (Chengming)

 - Remove a dead field in struct blk_mq_hw_ctx (Chengming)

* tag 'block-6.5-2023-07-21' of git://git.kernel.dk/linux:
  loop: do not enforce max_loop hard limit by (new) default
  loop: deprecate autoloading callback loop_probe()
  sbitmap: fix batching wakeup
  blk-iocost: skip empty flush bio in iocost
  blk-mq: delete dead struct blk_mq_hw_ctx->queued field
  blk-mq: Fix stall due to recursive flush plug

15 months agoMerge tag 'io_uring-6.5-2023-07-21' of git://git.kernel.dk/linux
Linus Torvalds [Sat, 22 Jul 2023 17:46:30 +0000 (10:46 -0700)]
Merge tag 'io_uring-6.5-2023-07-21' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix for io-wq not always honoring REQ_F_NOWAIT, if it was set and
   punted directly (eg via DRAIN) (me)

 - Capability check fix (Ondrej)

 - Regression fix for the mmap changes that went into 6.4, which
   apparently broke IA64 (Helge)

* tag 'io_uring-6.5-2023-07-21' of git://git.kernel.dk/linux:
  ia64: mmap: Consider pgoff when searching for free mapping
  io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()
  io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq
  io_uring: don't audit the capability check in io_uring_create()

15 months agoMerge tag 'devicetree-fixes-for-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 22 Jul 2023 17:28:22 +0000 (10:28 -0700)]
Merge tag 'devicetree-fixes-for-6.5-1' of git://git./linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Fix moortec,mr75203 schema usage of 'multipleOf' keyword

 - Fix regression in systems depending on "of-display" device name

 - Build fix for s390 with CONFIG_PCI=n and OF_EARLY_FLATTREE=y

 - Drop two obsolete serial .txt bindings

* tag 'devicetree-fixes-for-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: serial: Remove obsolete nxp,lpc1850-uart.txt
  dt-bindings: serial: Remove obsolete cavium-uart.txt
  dt-bindings: hwmon: moortec,mr75203: fix multipleOf for coefficients
  of: Preserve "of-display" device name for compatibility
  of: make OF_EARLY_FLATTREE depend on HAS_IOMEM

15 months agoMerge tag 'regmap-fix-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 22 Jul 2023 17:20:56 +0000 (10:20 -0700)]
Merge tag 'regmap-fix-v6.5-rc2' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
 "Three fixes here:

   - The issues with accounting for register and padding length on raw
     buses turn out to be quite widespread in custom buses.

     In order to avoid disturbing anything drop the initial fixes and
     fall back to a point fix in the SMBus code where the issue was
     originally noticed, a more substantial refactoring of the API which
     ensures that all buses make the same assumptions will follow.

   - The generic regcache code had been forcing on async I/O which did
     not work with the new maple tree sync code when used with SPI.

     Since that was mainly for the rbtree cache and the assumptions
     about hardware that drove the choice are probably not true any more
     fix this by pushing the enablement of async down into the rbtree
     code.

     This probably also makes cache syncs for systems faster though it's
     not the point.

   - The test code was triggering use of the rbtree and maple tree
     caches with dynamic allocation of nodes since all the testing is
     with RAM backed caches with no I/O performance issues.

     Just disable the locking in the tests to avoid triggering warnings
     when allocation debugging is turned on, it's not really what's
     being tested"

* tag 'regmap-fix-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: Disable locking for RBTREE and MAPLE unit tests
  regcache: Push async I/O request down into the rbtree cache
  regmap: Account for register length in SMBus I/O limits
  regmap: Drop initial version of maximum transfer length fixes

15 months agoMerge tag 'gpio-fixes-for-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 22 Jul 2023 17:14:04 +0000 (10:14 -0700)]
Merge tag 'gpio-fixes-for-v6.5-rc3' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - fix initial value handling for output-only pins in gpio-tps68470

 - fix two resource leaks in gpio-mvebu

* tag 'gpio-fixes-for-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: mvebu: fix irq domain leak
  gpio: mvebu: Make use of devm_pwmchip_add
  gpio: tps68470: Make tps68470_gpio_output() always set the initial value

15 months agodt-bindings: serial: Remove obsolete nxp,lpc1850-uart.txt
Rob Herring [Fri, 7 Jul 2023 22:16:06 +0000 (16:16 -0600)]
dt-bindings: serial: Remove obsolete nxp,lpc1850-uart.txt

nxp,lpc1850-uart.txt binding is already covered by 8250.yaml, so remove
it.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230707221607.1064888-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
15 months agodt-bindings: serial: Remove obsolete cavium-uart.txt
Rob Herring [Fri, 7 Jul 2023 22:16:02 +0000 (16:16 -0600)]
dt-bindings: serial: Remove obsolete cavium-uart.txt

cavium-uart.txt binding is already covered by 8250.yaml, so remove it.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230707221602.1063972-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
15 months agoloop: do not enforce max_loop hard limit by (new) default
Mauricio Faria de Oliveira [Thu, 20 Jul 2023 14:30:33 +0000 (11:30 -0300)]
loop: do not enforce max_loop hard limit by (new) default

Problem:

The max_loop parameter is used for 2 different purposes:

1) initial number of loop devices to pre-create on init
2) maximum number of loop devices to add on access/open()

Historically, its default value (zero) caused 1) to create non-zero
number of devices (CONFIG_BLK_DEV_LOOP_MIN_COUNT), and no hard limit on
2) to add devices with autoloading.

However, the default value changed in commit 85c50197716c ("loop: Fix
the max_loop commandline argument treatment when it is set to 0") to
CONFIG_BLK_DEV_LOOP_MIN_COUNT, for max_loop=0 not to pre-create devices.

That does improve 1), but unfortunately it breaks 2), as the default
behavior changed from no-limit to hard-limit.

Example:

For example, this userspace code broke for N >= CONFIG, if the user
relied on the default value 0 for max_loop:

    mknod("/dev/loopN");
    open("/dev/loopN");  // now fails with ENXIO

Though affected users may "fix" it with (loop.)max_loop=0, this means to
require a kernel parameter change on stable kernel update (that commit
Fixes: an old commit in stable).

Solution:

The original semantics for the default value in 2) can be applied if the
parameter is not set (ie, default behavior).

This still keeps the intended function in 1) and 2) if set, and that
commit's intended improvement in 1) if max_loop=0.

Before 85c50197716c:
  - default:     1) CONFIG devices   2) no limit
  - max_loop=0:  1) CONFIG devices   2) no limit
  - max_loop=X:  1) X devices        2) X limit

After 85c50197716c:
  - default:     1) CONFIG devices   2) CONFIG limit (*)
  - max_loop=0:  1) 0 devices (*)    2) no limit
  - max_loop=X:  1) X devices        2) X limit

This commit:
  - default:     1) CONFIG devices   2) no limit (*)
  - max_loop=0:  1) 0 devices        2) no limit
  - max_loop=X:  1) X devices        2) X limit

Future:

The issue/regression from that commit only affects code under the
CONFIG_BLOCK_LEGACY_AUTOLOAD deprecation guard, thus the fix too is
contained under it.

Once that deprecated functionality/code is removed, the purpose 2) of
max_loop (hard limit) is no longer in use, so the module parameter
description can be changed then.

Tests:

Linux 6.4-rc7
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_BLOCK_LEGACY_AUTOLOAD=y

- default (original)

# ls -1 /dev/loop*
/dev/loop-control
/dev/loop0
...
/dev/loop7

# ./test-loop
open: /dev/loop8: No such device or address

- default (patched)

# ls -1 /dev/loop*
/dev/loop-control
/dev/loop0
...
/dev/loop7

# ./test-loop
#

- max_loop=0 (original & patched):

# ls -1 /dev/loop*
/dev/loop-control

# ./test-loop
#

- max_loop=8 (original & patched):

# ls -1 /dev/loop*
/dev/loop-control
/dev/loop0
...
/dev/loop7

# ./test-loop
open: /dev/loop8: No such device or address

- max_loop=0 (patched; CONFIG_BLOCK_LEGACY_AUTOLOAD is not set)

# ls -1 /dev/loop*
/dev/loop-control

# ./test-loop
open: /dev/loop8: No such device or address

Fixes: 85c50197716c ("loop: Fix the max_loop commandline argument treatment when it is set to 0")
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230720143033.841001-3-mfo@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
15 months agoloop: deprecate autoloading callback loop_probe()
Mauricio Faria de Oliveira [Thu, 20 Jul 2023 14:30:32 +0000 (11:30 -0300)]
loop: deprecate autoloading callback loop_probe()

The 'probe' callback in __register_blkdev() is only used under the
CONFIG_BLOCK_LEGACY_AUTOLOAD deprecation guard.

The loop_probe() function is only used for that callback, so guard it
too, accordingly.

See commit fbdee71bb5d8 ("block: deprecate autoloading based on dev_t").

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230720143033.841001-2-mfo@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
15 months agosbitmap: fix batching wakeup
David Jeffery [Fri, 21 Jul 2023 09:57:15 +0000 (17:57 +0800)]
sbitmap: fix batching wakeup

Current code supposes that it is enough to provide forward progress by
just waking up one wait queue after one completion batch is done.

Unfortunately this way isn't enough, cause waiter can be added to wait
queue just after it is woken up.

Follows one example(64 depth, wake_batch is 8)

1) all 64 tags are active

2) in each wait queue, there is only one single waiter

3) each time one completion batch(8 completions) wakes up just one
   waiter in each wait queue, then immediately one new sleeper is added
   to this wait queue

4) after 64 completions, 8 waiters are wakeup, and there are still 8
   waiters in each wait queue

5) after another 8 active tags are completed, only one waiter can be
   wakeup, and the other 7 can't be waken up anymore.

Turns out it isn't easy to fix this problem, so simply wakeup enough
waiters for single batch.

Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20230721095715.232728-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
15 months agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 21 Jul 2023 17:24:21 +0000 (10:24 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "I've picked up a handful of arm64 fixes while Catalin's been away, so
  here they are. Below is the usual summary, but we have basically have
  two cleanups, a fix for an SME crash and a fix for hibernation:

   - Fix saving of SME state after SVE vector length is changed

   - Fix sparse warnings for missing vDSO function prototypes

   - Fix hibernation resume path when kfence is enabled

   - Fix field names for the HFGxTR_EL2 register"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
  arm64: vdso: Clear common make C=2 warnings
  arm64: mm: Make hibernation aware of KFENCE
  arm64: Fix HFGxTR_EL2 field naming

15 months agoMerge tag 'pm-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 21 Jul 2023 17:16:20 +0000 (10:16 -0700)]
Merge tag 'pm-6.5-rc3' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "Revert three recent intel_idle commits that introduced a functional
  issue, included a coding mistake and have been questioned at the
  design level"

* tag 'pm-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "intel_idle: Add support for using intel_idle in a VM guest using just hlt"
  Revert "intel_idle: Add a "Long HLT" C1 state for the VM guest mode"
  Revert "intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()"

15 months agoMerge tag 'sound-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 21 Jul 2023 17:10:18 +0000 (10:10 -0700)]
Merge tag 'sound-6.5-rc3' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A pile of fixes that have been gathered since the previous pull. Most
  of changes are device-specific, and nothing looks too scary.

   - A memory leak fix in ALSA sequencer code in 6.5-rc

   - Many fixes for ASoC Qualcomm CODEC drivers, covering SoundWire
     probe problems

   - A series of ASoC AMD fixes

   - A few fixes and cleanups of selftest stuff

   - HD-audio codec fixes and quirks for Clevo, HP, Lenovo, Dell"

* tag 'sound-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits)
  ALSA: hda/realtek: Add support for DELL Oasis 13/14/16 laptops
  ALSA: hda/realtek: Fix generic fixup definition for cs35l41 amp
  ALSA: hda/realtek: Enable Mute LED on HP Laptop 15s-eq2xxx
  selftests: ALSA: Add test-pcmtest-driver to .gitignore
  ALSA: hda/realtek: Add quirk for Clevo NS70AU
  ASoC: fsl_sai: Disable bit clock with transmitter
  ALSA: seq: Fix memory leak at error path in snd_seq_create_port()
  ASoC: SOF: ipc3-dtrace: uninitialized data in dfsentry_trace_filter_write()
  ASoC: cs42l51: fix driver to properly autoload with automatic module loading
  MAINTAINERS: Redo addition of ssm3515 to APPLE SOUND
  ASoC: rt5640: Fix the issue of speaker noise
  ALSA: hda/realtek - remove 3k pull low procedure
  selftests: ALSA: Fix fclose on an already fclosed file pointer
  ALSA: pcmtest: Don't use static storage to track per device data
  ALSA: pcmtest: Convert to platform remove callback returning void
  ASoC: dt-bindings: audio-graph-card2: Drop incomplete example
  ASoC: dt-bindings: Update maintainer email id
  ASoC: amd: ps: Fix extraneous error messages
  ASoC: fsl_sai: Revert "ASoC: fsl_sai: Enable MCTL_MCLK_EN bit for master mode"
  ASoC: codecs: SND_SOC_WCD934X should select REGMAP_IRQ
  ...

15 months agoMerge tag 'fbdev-for-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Fri, 21 Jul 2023 17:00:09 +0000 (10:00 -0700)]
Merge tag 'fbdev-for-6.5-rc3' of git://git./linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes and cleanups from Helge Deller:
 "Just the usual bunch of code cleanups in various drivers, this time
  mostly in vgacon and imxfb:

   - Code cleanup in vgacon (Jiri Slaby)

   - Explicitly include correct DT includes (Rob Herring)

   - imxfb code cleanup (Yangtao Li, Martin Kaiser)

   - kyrofb: make arrays const and smaller (Colin Ian King)

   - ep93xx-fb: return value check fix (Yuanjun Gong)

   - au1200fb: add missing IRQ check (Zhang Shurong)"

* tag 'fbdev-for-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: Explicitly include correct DT includes
  fbdev: ep93xx-fb: fix return value check in ep93xxfb_probe
  fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe
  fbdev: kyro: make some const read-only arrays static and reduce type size
  fbcon: remove unused display (p) from fbcon_redraw()
  sticon: make sticon_set_def_font() void and remove op parameter
  vgacon: cache vc_cell_height in vgacon_cursor()
  vgacon: let vgacon_doresize() return void
  vgacon: remove unused xpos from vgacon_set_cursor_size()
  vgacon: remove unneeded forward declarations
  vgacon: switch vgacon_scrolldelta() and vgacon_restore_screen()
  fbdev: imxfb: remove unneeded labels
  fbdev: imxfb: Convert to devm_platform_ioremap_resource()
  fbdev: imxfb: Convert to devm_kmalloc_array()
  fbdev: imxfb: Removed unneeded release_mem_region
  fbdev: imxfb: switch to DEFINE_SIMPLE_DEV_PM_OPS
  fbdev: imxfb: warn about invalid left/right margin

15 months agodrm/atomic: Fix potential use-after-free in nonblocking commits
Daniel Vetter [Fri, 21 Jul 2023 13:58:38 +0000 (15:58 +0200)]
drm/atomic: Fix potential use-after-free in nonblocking commits

This requires a bit of background.  Properly done a modeset driver's
unload/remove sequence should be

drm_dev_unplug();
drm_atomic_helper_shutdown();
drm_dev_put();

The trouble is that the drm_dev_unplugged() checks are by design racy,
they do not synchronize against all outstanding ioctl.  This is because
those ioctl could block forever (both for modeset and for driver
specific ioctls), leading to deadlocks in hotunplug.  Instead the code
sections that touch the hardware need to be annotated with
drm_dev_enter/exit, to avoid accessing hardware resources after the
unload/remove has finished.

To avoid use-after-free issues all the involved userspace visible
objects are supposed to hold a reference on the underlying drm_device,
like drm_file does.

The issue now is that we missed one, the atomic modeset ioctl can be run
in a nonblocking fashion, and in that case it cannot rely on the implied
drm_device reference provided by the ioctl calling context.  This can
result in a use-after-free if an nonblocking atomic commit is carefully
raced against a driver unload.

Fix this by unconditionally grabbing a drm_device reference for any
drm_atomic_state structures.  Strictly speaking this isn't required for
blocking commits and TEST_ONLY calls, but it's the simpler approach.

Thanks to shanzhulig for the initial idea of grabbing an unconditional
reference, I just added comments, a condensed commit message and fixed a
minor potential issue in where exactly we drop the final reference.

Reported-by: shanzhulig <shanzhulig@gmail.com>
Suggested-by: shanzhulig <shanzhulig@gmail.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
15 months agoia64: mmap: Consider pgoff when searching for free mapping
Helge Deller [Fri, 21 Jul 2023 15:24:32 +0000 (17:24 +0200)]
ia64: mmap: Consider pgoff when searching for free mapping

IA64 is the only architecture which does not consider the pgoff value when
searching for a possible free memory region with vm_unmapped_area().
Adding this seems to have no negative side effect on IA64, so add it now
to make IA64 consistent with all other architectures.

Cc: stable@vger.kernel.org # 6.4
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-ia64@vger.kernel.org
Link: https://lore.kernel.org/r/20230721152432.196382-3-deller@gmx.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
15 months agoio_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()
Helge Deller [Fri, 21 Jul 2023 15:24:31 +0000 (17:24 +0200)]
io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()

The io_uring testcase is broken on IA-64 since commit d808459b2e31
("io_uring: Adjust mapping wrt architecture aliasing requirements").

The reason is, that this commit introduced an own architecture
independend get_unmapped_area() search algorithm which finds on IA-64 a
memory region which is outside of the regular memory region used for
shared userspace mappings and which can't be used on that platform
due to aliasing.

To avoid similar problems on IA-64 and other platforms in the future,
it's better to switch back to the architecture-provided
get_unmapped_area() function and adjust the needed input parameters
before the call. Beside fixing the issue, the function now becomes
easier to understand and maintain.

This patch has been successfully tested with the io_uring testcase on
physical x86-64, ppc64le, IA-64 and PA-RISC machines. On PA-RISC the LTP
mmmap testcases did not report any regressions.

Cc: stable@vger.kernel.org # 6.4
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Fixes: d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements")
Link: https://lore.kernel.org/r/20230721152432.196382-2-deller@gmx.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
15 months agoarm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
Mark Brown [Thu, 20 Jul 2023 18:38:58 +0000 (19:38 +0100)]
arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes

When we reconfigure the SVE vector length we discard the backing storage
for the SVE vectors and then reallocate on next SVE use, leaving the SME
specific state alone. This means that we do not enable SME traps if they
were already disabled. That means that userspace code can enter streaming
mode without trapping, putting the task in a state where if we try to save
the state of the task we will fault.

Since the ABI does not specify that changing the SVE vector length disturbs
SME state, and since SVE code may not be aware of SME code in the process,
we shouldn't simply discard any ZA state. Instead immediately reallocate
the storage for SVE, and disable SME if we change the SVE vector length
while there is no SME state active.

Disabling SME traps on SVE vector length changes would make the overall
code more complex since we would have a state where we have valid SME state
stored but might get a SME trap.

Fixes: 9e4ab6c89109 ("arm64/sme: Implement vector length configuration prctl()s")
Reported-by: David Spickett <David.Spickett@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230720-arm64-fix-sve-sme-vl-change-v2-1-8eea06b82d57@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
15 months agoMerge tag 'drm-fixes-2023-07-21' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 21 Jul 2023 03:35:38 +0000 (20:35 -0700)]
Merge tag 'drm-fixes-2023-07-21' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Mostly amdgpu fixes, a couple of i915 fixes, some nouveau and then a
  few misc accel and other fixes.

  client:
   - memory leak fix

  dma-buf:
   - memory leak fix

  qaic:
   - bound check fixes
   - map_user_pages leak
   - int overflow fixes

  habanalabs:
   - debugfs stub helper

  nouveau:
   - aux event slot fixes
   - anx9805 cards fixes

  i915:
   - Add sentinel to xehp_oa_b_counters
   - Revert "drm/i915: use localized __diag_ignore_all() instead of per
     file"

  amdgpu:
   - More PCIe DPM fixes for Intel platforms
   - DCN3.0.1 fixes
   - Virtual display timer fix
   - Async flip fix
   - SMU13 clock reporting fixes
   - Add missing PSP firmware declaration
   - DP MST fix
   - DCN3.1.x fixes
   - Slab out of bounds fix"

* tag 'drm-fixes-2023-07-21' of git://anongit.freedesktop.org/drm/drm: (31 commits)
  accel/habanalabs: add more debugfs stub helpers
  drm/nouveau/kms/nv50-: init hpd_irq_lock for PIOR DP
  drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts
  drm/nouveau/i2c: fix number of aux event slots
  drm/amdgpu: use a macro to define no xcp partition case
  drm/amdgpu/vm: use the same xcp_id from root PD
  drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_create
  drm/amdgpu: Allocate root PD on correct partition
  drm/amd/display: Keep PHY active for DP displays on DCN31
  drm/amd/display: Prevent vtotal from being set to 0
  drm/amd/display: Disable MPC split by default on special asic
  drm/amd/display: check TG is non-null before checking if enabled
  drm/amd/display: Add polling method to handle MST reply packet
  drm/amd/display: Clean up errors & warnings in amdgpu_dm.c
  drm/amdgpu: Allow the initramfs generator to include psp_13_0_6_ta
  drm/amdgpu/pm: make mclk consistent for smu 13.0.7
  drm/amdgpu/pm: make gfxclock consistent for sienna cichlid
  drm/amd/display: only accept async flips for fast updates
  drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel
  drm/amd/display: add DCN301 specific logic for OTG programming
  ...

15 months agoMerge tag 'amd-drm-fixes-6.5-2023-07-20' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 21 Jul 2023 02:16:41 +0000 (12:16 +1000)]
Merge tag 'amd-drm-fixes-6.5-2023-07-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.5-2023-07-20:

amdgpu:
- More PCIe DPM fixes for Intel platforms
- DCN3.0.1 fixes
- Virtual display timer fix
- Async flip fix
- SMU13 clock reporting fixes
- Add missing PSP firmware declaration
- DP MST fix
- DCN3.1.x fixes
- Slab out of bounds fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720133456.7826-1-alexander.deucher@amd.com
15 months agoMerge tag 'drm-intel-fixes-2023-07-20' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Fri, 21 Jul 2023 02:15:09 +0000 (12:15 +1000)]
Merge tag 'drm-intel-fixes-2023-07-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Add sentinel to xehp_oa_b_counters [perf] (Andrzej Hajda)
- Revert "drm/i915: use localized __diag_ignore_all() instead of per file" (Jani Nikula)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZLjuwhLhwab5B7RY@tursulin-desk
15 months agoMerge tag 'drm-misc-fixes-2023-07-20' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Fri, 21 Jul 2023 02:02:31 +0000 (12:02 +1000)]
Merge tag 'drm-misc-fixes-2023-07-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Memory leak fixes in drm/client, memory access/leak fixes for
accel/qaic, another leak fix in dma-buf and three nouveau fixes around
hotplugging.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fmj5nok7zggux2lcpdtls2iknweba54wfc6o4zxq6i6s3dgi2r@7z3eawwhyhen
15 months agoMerge tag 'ata-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Fri, 21 Jul 2023 02:10:50 +0000 (19:10 -0700)]
Merge tag 'ata-6.5-rc3' of git://git./linux/kernel/git/dlemoal/libata

Pull ata fix from Damien Le Moal:

 - Add missing MODULE_DESCRIPTION() in the many of the protocol modules
   for the pata_parport driver to avoid compilation warnings with "make
   W=1".

* tag 'ata-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: pata_parport: Add missing protocol modules description

15 months agoMerge tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 20 Jul 2023 21:46:39 +0000 (14:46 -0700)]
Merge tag 'net-6.5-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from BPF, netfilter, bluetooth and CAN.

  Current release - regressions:

   - eth: r8169: multiple fixes for PCIe ASPM-related problems

   - vrf: fix RCU lockdep splat in output path

  Previous releases - regressions:

   - gso: fall back to SW segmenting with GSO_UDP_L4 dodgy bit set

   - dsa: mv88e6xxx: do a final check before timing out when polling

   - nf_tables: fix sleep in atomic in nft_chain_validate

  Previous releases - always broken:

   - sched: fix undoing tcf_bind_filter() in multiple classifiers

   - bpf, arm64: fix BTI type used for freplace attached functions

   - can: gs_usb: fix time stamp counter initialization

   - nft_set_pipapo: fix improper element removal (leading to UAF)

  Misc:

   - net: support STP on bridge in non-root netns, STP prevents packet
     loops so not supporting it results in freezing systems of
     unsuspecting users, and in turn very upset noises being made

   - fix kdoc warnings

   - annotate various bits of TCP state to prevent data races"

* tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
  net: phy: prevent stale pointer dereference in phy_init()
  tcp: annotate data-races around fastopenq.max_qlen
  tcp: annotate data-races around icsk->icsk_user_timeout
  tcp: annotate data-races around tp->notsent_lowat
  tcp: annotate data-races around rskq_defer_accept
  tcp: annotate data-races around tp->linger2
  tcp: annotate data-races around icsk->icsk_syn_retries
  tcp: annotate data-races around tp->keepalive_probes
  tcp: annotate data-races around tp->keepalive_intvl
  tcp: annotate data-races around tp->keepalive_time
  tcp: annotate data-races around tp->tsoffset
  tcp: annotate data-races around tp->tcp_tx_delay
  Bluetooth: MGMT: Use correct address for memcpy()
  Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014
  Bluetooth: SCO: fix sco_conn related locking and validity issues
  Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link
  Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()
  Bluetooth: coredump: fix building with coredump disabled
  Bluetooth: ISO: fix iso_conn related locking and validity issues
  Bluetooth: hci_event: call disconnect callback before deleting conn
  ...

15 months agoblk-iocost: skip empty flush bio in iocost
Chengming Zhou [Thu, 20 Jul 2023 12:14:41 +0000 (20:14 +0800)]
blk-iocost: skip empty flush bio in iocost

The flush bio may have data, may have no data (empty flush), we couldn't
calculate cost for empty flush bio. So we'd better just skip it for now.

Another side effect is that empty flush bio's bio_end_sector() is 0, cause
iocg->cursor reset to 0, may break the cost calculation of other bios.

This isn't good enough, since flush bio still consume the device bandwidth,
but flush request is special, can be merged randomly in the flush state
machine, we don't know how to calculate cost for it for now.

Its completion time also has flaws, which may include the pre-flush or
post-flush completion time, but I don't know if we need to fix that and
how to fix it.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230720121441.1408522-1-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
15 months agoMerge tag 'for-net-2023-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Thu, 20 Jul 2023 19:57:55 +0000 (12:57 -0700)]
Merge tag 'for-net-2023-07-20' of git://git./linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix building with coredump disabled
 - Fix use-after-free in hci_remove_adv_monitor
 - Use RCU for hci_conn_params and iterate safely in hci_sync
 - Fix locking issues on ISO and SCO
 - Fix bluetooth on Intel Macbook 2014

* tag 'for-net-2023-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: MGMT: Use correct address for memcpy()
  Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014
  Bluetooth: SCO: fix sco_conn related locking and validity issues
  Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link
  Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()
  Bluetooth: coredump: fix building with coredump disabled
  Bluetooth: ISO: fix iso_conn related locking and validity issues
  Bluetooth: hci_event: call disconnect callback before deleting conn
  Bluetooth: use RCU for hci_conn_params and iterate safely in hci_sync
====================

Link: https://lore.kernel.org/r/20230720190201.446469-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoMerge tag 'nf-23-07-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 20 Jul 2023 19:54:21 +0000 (12:54 -0700)]
Merge tag 'nf-23-07-20' of https://git./linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
Netfilter fixes for net:

The following patchset contains Netfilter fixes for net:

1. Fix spurious -EEXIST error from userspace due to
   padding holes, this was broken since 4.9 days
   when 'ignore duplicate entries on insert' feature was
   added.

2. Fix a sched-while-atomic bug, present since 5.19.

3. Properly remove elements if they lack an "end range".
   nft userspace always sets an end range attribute, even
   when its the same as the start, but the abi doesn't
   have such a restriction. Always broken since it was
   added in 5.6, all three from myself.

4 + 5: Bound chain needs to be skipped in netns release
   and on rule flush paths, from Pablo Neira.

* tag 'nf-23-07-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: skip bound chain on rule flush
  netfilter: nf_tables: skip bound chain in netns release path
  netfilter: nft_set_pipapo: fix improper element removal
  netfilter: nf_tables: can't schedule in nft_chain_validate
  netfilter: nf_tables: fix spurious set element insertion failure
====================

Link: https://lore.kernel.org/r/20230720165143.30208-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agonet: phy: prevent stale pointer dereference in phy_init()
Vladimir Oltean [Thu, 20 Jul 2023 00:02:31 +0000 (03:02 +0300)]
net: phy: prevent stale pointer dereference in phy_init()

mdio_bus_init() and phy_driver_register() both have error paths, and if
those are ever hit, ethtool will have a stale pointer to the
phy_ethtool_phy_ops stub structure, which references memory from a
module that failed to load (phylib).

It is probably hard to force an error in this code path even manually,
but the error teardown path of phy_init() should be the same as
phy_exit(), which is now simply not the case.

Fixes: 55d8f053ce1b ("net: phy: Register ethtool PHY operations")
Link: https://lore.kernel.org/netdev/ZLaiJ4G6TaJYGJyU@shell.armlinux.org.uk/
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230720000231.1939689-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoMerge branch 'tcp-add-missing-annotations'
Jakub Kicinski [Thu, 20 Jul 2023 19:34:24 +0000 (12:34 -0700)]
Merge branch 'tcp-add-missing-annotations'

Eric Dumazet says:

====================
tcp: add missing annotations

This series was inspired by one syzbot (KCSAN) report.

do_tcp_getsockopt() does not lock the socket, we need to
annotate most of the reads there (and other places as well).

This is a first round, another series will come later.
====================

Link: https://lore.kernel.org/r/20230719212857.3943972-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around fastopenq.max_qlen
Eric Dumazet [Wed, 19 Jul 2023 21:28:57 +0000 (21:28 +0000)]
tcp: annotate data-races around fastopenq.max_qlen

This field can be read locklessly.

Fixes: 1536e2857bd3 ("tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-12-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around icsk->icsk_user_timeout
Eric Dumazet [Wed, 19 Jul 2023 21:28:56 +0000 (21:28 +0000)]
tcp: annotate data-races around icsk->icsk_user_timeout

This field can be read locklessly from do_tcp_getsockopt()

Fixes: dca43c75e7e5 ("tcp: Add TCP_USER_TIMEOUT socket option.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-11-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around tp->notsent_lowat
Eric Dumazet [Wed, 19 Jul 2023 21:28:55 +0000 (21:28 +0000)]
tcp: annotate data-races around tp->notsent_lowat

tp->notsent_lowat can be read locklessly from do_tcp_getsockopt()
and tcp_poll().

Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-10-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around rskq_defer_accept
Eric Dumazet [Wed, 19 Jul 2023 21:28:54 +0000 (21:28 +0000)]
tcp: annotate data-races around rskq_defer_accept

do_tcp_getsockopt() reads rskq_defer_accept while another cpu
might change its value.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around tp->linger2
Eric Dumazet [Wed, 19 Jul 2023 21:28:53 +0000 (21:28 +0000)]
tcp: annotate data-races around tp->linger2

do_tcp_getsockopt() reads tp->linger2 while another cpu
might change its value.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around icsk->icsk_syn_retries
Eric Dumazet [Wed, 19 Jul 2023 21:28:52 +0000 (21:28 +0000)]
tcp: annotate data-races around icsk->icsk_syn_retries

do_tcp_getsockopt() and reqsk_timer_handler() read
icsk->icsk_syn_retries while another cpu might change its value.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around tp->keepalive_probes
Eric Dumazet [Wed, 19 Jul 2023 21:28:51 +0000 (21:28 +0000)]
tcp: annotate data-races around tp->keepalive_probes

do_tcp_getsockopt() reads tp->keepalive_probes while another cpu
might change its value.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around tp->keepalive_intvl
Eric Dumazet [Wed, 19 Jul 2023 21:28:50 +0000 (21:28 +0000)]
tcp: annotate data-races around tp->keepalive_intvl

do_tcp_getsockopt() reads tp->keepalive_intvl while another cpu
might change its value.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around tp->keepalive_time
Eric Dumazet [Wed, 19 Jul 2023 21:28:49 +0000 (21:28 +0000)]
tcp: annotate data-races around tp->keepalive_time

do_tcp_getsockopt() reads tp->keepalive_time while another cpu
might change its value.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around tp->tsoffset
Eric Dumazet [Wed, 19 Jul 2023 21:28:48 +0000 (21:28 +0000)]
tcp: annotate data-races around tp->tsoffset

do_tcp_getsockopt() reads tp->tsoffset while another cpu
might change its value.

Fixes: 93be6ce0e91b ("tcp: set and get per-socket timestamp")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agotcp: annotate data-races around tp->tcp_tx_delay
Eric Dumazet [Wed, 19 Jul 2023 21:28:47 +0000 (21:28 +0000)]
tcp: annotate data-races around tp->tcp_tx_delay

do_tcp_getsockopt() reads tp->tcp_tx_delay while another cpu
might change its value.

Fixes: a842fe1425cb ("tcp: add optional per socket transmit delay")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230719212857.3943972-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
15 months agoblk-mq: delete dead struct blk_mq_hw_ctx->queued field
Chengming Zhou [Thu, 20 Jul 2023 09:55:12 +0000 (17:55 +0800)]
blk-mq: delete dead struct blk_mq_hw_ctx->queued field

This counter is not used anywhere, so delete it.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230720095512.1403123-1-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
15 months agoio_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq
Jens Axboe [Thu, 20 Jul 2023 19:16:53 +0000 (13:16 -0600)]
io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq

io-wq assumes that an issue is blocking, but it may not be if the
request type has asked for a non-blocking attempt. If we get
-EAGAIN for that case, then we need to treat it as a final result
and not retry or arm poll for it.

Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/issues/897
Signed-off-by: Jens Axboe <axboe@kernel.dk>
15 months agoBluetooth: MGMT: Use correct address for memcpy()
Andy Shevchenko [Mon, 17 Jul 2023 09:32:14 +0000 (12:32 +0300)]
Bluetooth: MGMT: Use correct address for memcpy()

In function ‘fortify_memcpy_chk’,
    inlined from ‘get_conn_info_complete’ at net/bluetooth/mgmt.c:7281:2:
include/linux/fortify-string.h:592:25: error: call to
‘__read_overflow2_field’ declared with attribute warning: detected read
beyond size of field (2nd parameter); maybe use struct_group()?
[-Werror=attribute-warning]
  592 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

This is due to the wrong member is used for memcpy(). Use correct one.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
15 months agoBluetooth: btusb: Fix bluetooth on Intel Macbook 2014
Tomasz Moń [Thu, 13 Jul 2023 10:25:14 +0000 (12:25 +0200)]
Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014

Commit c13380a55522 ("Bluetooth: btusb: Do not require hardcoded
interface numbers") inadvertedly broke bluetooth on Intel Macbook 2014.
The intention was to keep behavior intact when BTUSB_IFNUM_2 is set and
otherwise allow any interface numbers. The problem is that the new logic
condition omits the case where bInterfaceNumber is 0.

Fix BTUSB_IFNUM_2 handling by allowing both interface number 0 and 2
when the flag is set.

Fixes: c13380a55522 ("Bluetooth: btusb: Do not require hardcoded interface numbers")
Reported-by: John Holland <johnbholland@icloud.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217651
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Tested-by: John Holland<johnbholland@icloud.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
15 months agoBluetooth: SCO: fix sco_conn related locking and validity issues
Pauli Virtanen [Mon, 10 Jul 2023 16:48:19 +0000 (19:48 +0300)]
Bluetooth: SCO: fix sco_conn related locking and validity issues

Operations that check/update sk_state and access conn should hold
lock_sock, otherwise they can race.

The order of taking locks is hci_dev_lock > lock_sock > sco_conn_lock,
which is how it is in connect/disconnect_cfm -> sco_conn_del ->
sco_chan_del.

Fix locking in sco_connect to take lock_sock around updating sk_state
and conn.

sco_conn_del must not occur during sco_connect, as it frees the
sco_conn. Hold hdev->lock longer to prevent that.

sco_conn_add shall return sco_conn with valid hcon. Make it so also when
reusing an old SCO connection waiting for disconnect timeout (see
__sco_sock_close where conn->hcon is set to NULL).

This should not reintroduce the issue fixed in the earlier
commit 9a8ec9e8ebb5 ("Bluetooth: SCO: Fix possible circular locking
dependency on sco_connect_cfm"), the relevant fix of releasing lock_sock
in sco_sock_connect before acquiring hdev->lock is retained.

These changes mirror similar fixes earlier in ISO sockets.

Fixes: 9a8ec9e8ebb5 ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
15 months agoBluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link
Siddh Raman Pant [Tue, 11 Jul 2023 13:13:53 +0000 (18:43 +0530)]
Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link

hci_connect_sco currently returns NULL when there is no link (i.e. when
hci_conn_link() returns NULL).

sco_connect() expects an ERR_PTR in case of any error (see line 266 in
sco.c). Thus, hcon set as NULL passes through to sco_conn_add(), which
tries to get hcon->hdev, resulting in dereferencing a NULL pointer as
reported by syzkaller.

The same issue exists for iso_connect_cis() calling hci_connect_cis().

Thus, make hci_connect_sco() and hci_connect_cis() return ERR_PTR
instead of NULL.

Reported-and-tested-by: syzbot+37acd5d80d00d609d233@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=37acd5d80d00d609d233
Fixes: 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
15 months agoBluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()
Douglas Anderson [Fri, 30 Jun 2023 22:33:14 +0000 (15:33 -0700)]
Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()

KASAN reports that there's a use-after-free in
hci_remove_adv_monitor(). Trawling through the disassembly, you can
see that the complaint is from the access in bt_dev_dbg() under the
HCI_ADV_MONITOR_EXT_MSFT case. The problem case happens because
msft_remove_monitor() can end up freeing the monitor
structure. Specifically:
  hci_remove_adv_monitor() ->
  msft_remove_monitor() ->
  msft_remove_monitor_sync() ->
  msft_le_cancel_monitor_advertisement_cb() ->
  hci_free_adv_monitor()

Let's fix the problem by just stashing the relevant data when it's
still valid.

Fixes: 7cf5c2978f23 ("Bluetooth: hci_sync: Refactor remove Adv Monitor")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
15 months agoBluetooth: coredump: fix building with coredump disabled
Arnd Bergmann [Mon, 3 Jul 2023 11:30:48 +0000 (13:30 +0200)]
Bluetooth: coredump: fix building with coredump disabled

The btmtk driver uses an IS_ENABLED() check to conditionally compile
the coredump support, but this fails to build because the hdev->dump
member is in an #ifdef:

drivers/bluetooth/btmtk.c: In function 'btmtk_process_coredump':
drivers/bluetooth/btmtk.c:386:30: error: 'struct hci_dev' has no member named 'dump'
  386 |   schedule_delayed_work(&hdev->dump.dump_timeout,
      |                              ^~

The struct member doesn't really make a huge difference in the total size,
so just remove the #ifdef around it to avoid adding similar checks
around each user.

Fixes: 872f8c253cb9e ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support")
Fixes: 9695ef876fd12 ("Bluetooth: Add support for hci devcoredump")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
15 months agoBluetooth: ISO: fix iso_conn related locking and validity issues
Pauli Virtanen [Sun, 18 Jun 2023 22:04:33 +0000 (01:04 +0300)]
Bluetooth: ISO: fix iso_conn related locking and validity issues

sk->sk_state indicates whether iso_pi(sk)->conn is valid. Operations
that check/update sk_state and access conn should hold lock_sock,
otherwise they can race.

The order of taking locks is hci_dev_lock > lock_sock > iso_conn_lock,
which is how it is in connect/disconnect_cfm -> iso_conn_del ->
iso_chan_del.

Fix locking in iso_connect_cis/bis and sendmsg/recvmsg to take lock_sock
around updating sk_state and conn.

iso_conn_del must not occur during iso_connect_cis/bis, as it frees the
iso_conn. Hold hdev->lock longer to prevent that.

This should not reintroduce the issue fixed in commit 241f51931c35
("Bluetooth: ISO: Avoid circular locking dependency"), since the we
acquire locks in order. We retain the fix in iso_sock_connect to release
lock_sock before iso_connect_* acquires hdev->lock.

Similarly for commit 6a5ad251b7cd ("Bluetooth: ISO: Fix possible
circular locking dependency"). We retain the fix in iso_conn_ready to
not acquire iso_conn_lock before lock_sock.

iso_conn_add shall return iso_conn with valid hcon. Make it so also when
reusing an old CIS connection waiting for disconnect timeout (see
__iso_sock_close where conn->hcon is set to NULL).

Trace with iso_conn_del after iso_chan_add in iso_connect_cis:
===============================================================
iso_sock_create:771: sock 00000000be9b69b7
iso_sock_init:693: sk 000000004dff667e
iso_sock_bind:827: sk 000000004dff667e 70:1a:b8:98:ff:a2 type 1
iso_sock_setsockopt:1289: sk 000000004dff667e
iso_sock_setsockopt:1289: sk 000000004dff667e
iso_sock_setsockopt:1289: sk 000000004dff667e
iso_sock_connect:875: sk 000000004dff667e
iso_connect_cis:353: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da
hci_get_route:1199: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da
hci_conn_add:1005: hci0 dst 28:3d:c2:4a:7e:da
iso_conn_add:140: hcon 000000007b65d182 conn 00000000daf8625e
__iso_chan_add:214: conn 00000000daf8625e
iso_connect_cfm:1700: hcon 000000007b65d182 bdaddr 28:3d:c2:4a:7e:da status 12
iso_conn_del:187: hcon 000000007b65d182 conn 00000000daf8625e, err 16
iso_sock_clear_timer:117: sock 000000004dff667e state 3
    <Note: sk_state is BT_BOUND (3), so iso_connect_cis is still
    running at this point>
iso_chan_del:153: sk 000000004dff667e, conn 00000000daf8625e, err 16
hci_conn_del:1151: hci0 hcon 000000007b65d182 handle 65535
hci_conn_unlink:1102: hci0: hcon 000000007b65d182
hci_chan_list_flush:2780: hcon 000000007b65d182
iso_sock_getsockopt:1376: sk 000000004dff667e
iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e
iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e
iso_sock_getsockopt:1376: sk 000000004dff667e
iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e
iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e
iso_sock_shutdown:1434: sock 00000000be9b69b7, sk 000000004dff667e, how 1
__iso_sock_close:632: sk 000000004dff667e state 5 socket 00000000be9b69b7
     <Note: sk_state is BT_CONNECT (5), even though iso_chan_del sets
     BT_CLOSED (6). Only iso_connect_cis sets it to BT_CONNECT, so it
     must be that iso_chan_del occurred between iso_chan_add and end of
     iso_connect_cis.>
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 8000000006467067 P4D 8000000006467067 PUD 3f5f067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
RIP: 0010:__iso_sock_close (net/bluetooth/iso.c:664) bluetooth
===============================================================

Trace with iso_conn_del before iso_chan_add in iso_connect_cis:
===============================================================
iso_connect_cis:356: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da
...
iso_conn_add:140: hcon 0000000093bc551f conn 00000000768ae504
hci_dev_put:1487: hci0 orig refcnt 21
hci_event_packet:7607: hci0: event 0x0e
hci_cmd_complete_evt:4231: hci0: opcode 0x2062
hci_cc_le_set_cig_params:3846: hci0: status 0x07
hci_sent_cmd_data:3107: hci0 opcode 0x2062
iso_connect_cfm:1703: hcon 0000000093bc551f bdaddr 28:3d:c2:4a:7e:da status 7
iso_conn_del:187: hcon 0000000093bc551f conn 00000000768ae504, err 12
hci_conn_del:1151: hci0 hcon 0000000093bc551f handle 65535
hci_conn_unlink:1102: hci0: hcon 0000000093bc551f
hci_chan_list_flush:2780: hcon 0000000093bc551f
__iso_chan_add:214: conn 00000000768ae504
    <Note: this conn was already freed in iso_conn_del above>
iso_sock_clear_timer:117: sock 0000000098323f95 state 3
general protection fault, probably for non-canonical address 0x30b29c630930aec8: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1920 Comm: bluetoothd Tainted: G            E      6.3.0-rc7+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
RIP: 0010:detach_if_pending+0x28/0xd0
Code: 90 90 0f 1f 44 00 00 48 8b 47 08 48 85 c0 0f 84 ad 00 00 00 55 89 d5 53 48 83 3f 00 48 89 fb 74 7d 66 90 48 8b 03 48 8b 53 08 <>
RSP: 0018:ffffb90841a67d08 EFLAGS: 00010007
RAX: 0000000000000000 RBX: ffff9141bd5061b8 RCX: 0000000000000000
RDX: 30b29c630930aec8 RSI: ffff9141fdd21e80 RDI: ffff9141bd5061b8
RBP: 0000000000000001 R08: 0000000000000000 R09: ffffb90841a67b88
R10: 0000000000000003 R11: ffffffff8613f558 R12: ffff9141fdd21e80
R13: 0000000000000000 R14: ffff9141b5976010 R15: ffff914185755338
FS:  00007f45768bd840(0000) GS:ffff9141fdd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000619000424074 CR3: 0000000009f5e005 CR4: 0000000000170ee0
Call Trace:
 <TASK>
 timer_delete+0x48/0x80
 try_to_grab_pending+0xdf/0x170
 __cancel_work+0x37/0xb0
 iso_connect_cis+0x141/0x400 [bluetooth]
===============================================================

Trace with NULL conn->hcon in state BT_CONNECT:
===============================================================
__iso_sock_close:619: sk 00000000f7c71fc5 state 1 socket 00000000d90c5fe5
...
__iso_sock_close:619: sk 00000000f7c71fc5 state 8 socket 00000000d90c5fe5
iso_chan_del:153: sk 00000000f7c71fc5, conn 0000000022c03a7e, err 104
...
iso_sock_connect:862: sk 00000000129b56c3
iso_connect_cis:348: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7d:2a
hci_get_route:1199: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7d:2a
hci_dev_hold:1495: hci0 orig refcnt 19
__iso_chan_add:214: conn 0000000022c03a7e
    <Note: reusing old conn>
iso_sock_clear_timer:117: sock 00000000129b56c3 state 3
...
iso_sock_ready:1485: sk 00000000129b56c3
...
iso_sock_sendmsg:1077: sock 00000000e5013966, sk 00000000129b56c3
BUG: kernel NULL pointer dereference, address: 00000000000006a8
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1403 Comm: wireplumber Tainted: G            E      6.3.0-rc7+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
RIP: 0010:iso_sock_sendmsg+0x63/0x2a0 [bluetooth]
===============================================================

Fixes: 241f51931c35 ("Bluetooth: ISO: Avoid circular locking dependency")
Fixes: 6a5ad251b7cd ("Bluetooth: ISO: Fix possible circular locking dependency")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
15 months agoBluetooth: hci_event: call disconnect callback before deleting conn
Pauli Virtanen [Sun, 18 Jun 2023 22:04:32 +0000 (01:04 +0300)]
Bluetooth: hci_event: call disconnect callback before deleting conn

In hci_cs_disconnect, we do hci_conn_del even if disconnection failed.

ISO, L2CAP and SCO connections refer to the hci_conn without
hci_conn_get, so disconn_cfm must be called so they can clean up their
conn, otherwise use-after-free occurs.

ISO:
==========================================================
iso_sock_connect:880: sk 00000000eabd6557
iso_connect_cis:356: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da
...
iso_conn_add:140: hcon 000000001696f1fd conn 00000000b6251073
hci_dev_put:1487: hci0 orig refcnt 17
__iso_chan_add:214: conn 00000000b6251073
iso_sock_clear_timer:117: sock 00000000eabd6557 state 3
...
hci_rx_work:4085: hci0 Event packet
hci_event_packet:7601: hci0: event 0x0f
hci_cmd_status_evt:4346: hci0: opcode 0x0406
hci_cs_disconnect:2760: hci0: status 0x0c
hci_sent_cmd_data:3107: hci0 opcode 0x0406
hci_conn_del:1151: hci0 hcon 000000001696f1fd handle 2560
hci_conn_unlink:1102: hci0: hcon 000000001696f1fd
hci_conn_drop:1451: hcon 00000000d8521aaf orig refcnt 2
hci_chan_list_flush:2780: hcon 000000001696f1fd
hci_dev_put:1487: hci0 orig refcnt 21
hci_dev_put:1487: hci0 orig refcnt 20
hci_req_cmd_complete:3978: opcode 0x0406 status 0x0c
... <no iso_* activity on sk/conn> ...
iso_sock_sendmsg:1098: sock 00000000dea5e2e0, sk 00000000eabd6557
BUG: kernel NULL pointer dereference, address: 0000000000000668
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
RIP: 0010:iso_sock_sendmsg (net/bluetooth/iso.c:1112) bluetooth
==========================================================

L2CAP:
==================================================================
hci_cmd_status_evt:4359: hci0: opcode 0x0406
hci_cs_disconnect:2760: hci0: status 0x0c
hci_sent_cmd_data:3085: hci0 opcode 0x0406
hci_conn_del:1151: hci0 hcon ffff88800c999000 handle 3585
hci_conn_unlink:1102: hci0: hcon ffff88800c999000
hci_chan_list_flush:2780: hcon ffff88800c999000
hci_chan_del:2761: hci0 hcon ffff88800c999000 chan ffff888018ddd280
...
BUG: KASAN: slab-use-after-free in hci_send_acl+0x2d/0x540 [bluetooth]
Read of size 8 at addr ffff888018ddd298 by task bluetoothd/1175

CPU: 0 PID: 1175 Comm: bluetoothd Tainted: G            E      6.4.0-rc4+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x5b/0x90
 print_report+0xcf/0x670
 ? __virt_addr_valid+0xf8/0x180
 ? hci_send_acl+0x2d/0x540 [bluetooth]
 kasan_report+0xa8/0xe0
 ? hci_send_acl+0x2d/0x540 [bluetooth]
 hci_send_acl+0x2d/0x540 [bluetooth]
 ? __pfx___lock_acquire+0x10/0x10
 l2cap_chan_send+0x1fd/0x1300 [bluetooth]
 ? l2cap_sock_sendmsg+0xf2/0x170 [bluetooth]
 ? __pfx_l2cap_chan_send+0x10/0x10 [bluetooth]
 ? lock_release+0x1d5/0x3c0
 ? mark_held_locks+0x1a/0x90
 l2cap_sock_sendmsg+0x100/0x170 [bluetooth]
 sock_write_iter+0x275/0x280
 ? __pfx_sock_write_iter+0x10/0x10
 ? __pfx___lock_acquire+0x10/0x10
 do_iter_readv_writev+0x176/0x220
 ? __pfx_do_iter_readv_writev+0x10/0x10
 ? find_held_lock+0x83/0xa0
 ? selinux_file_permission+0x13e/0x210
 do_iter_write+0xda/0x340
 vfs_writev+0x1b4/0x400
 ? __pfx_vfs_writev+0x10/0x10
 ? __seccomp_filter+0x112/0x750
 ? populate_seccomp_data+0x182/0x220
 ? __fget_light+0xdf/0x100
 ? do_writev+0x19d/0x210
 do_writev+0x19d/0x210
 ? __pfx_do_writev+0x10/0x10
 ? mark_held_locks+0x1a/0x90
 do_syscall_64+0x60/0x90
 ? lockdep_hardirqs_on_prepare+0x149/0x210
 ? do_syscall_64+0x6c/0x90
 ? lockdep_hardirqs_on_prepare+0x149/0x210
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7ff45cb23e64
Code: 15 d1 1f 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d 9d a7 0d 00 00 74 13 b8 14 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
RSP: 002b:00007fff21ae09b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007ff45cb23e64
RDX: 0000000000000001 RSI: 00007fff21ae0aa0 RDI: 0000000000000017
RBP: 00007fff21ae0aa0 R08: 000000000095a8a0 R09: 0000607000053f40
R10: 0000000000000001 R11: 0000000000000202 R12: 00007fff21ae0ac0
R13: 00000fffe435c150 R14: 00007fff21ae0a80 R15: 000060f000000040
 </TASK>

Allocated by task 771:
 kasan_save_stack+0x33/0x60
 kasan_set_track+0x25/0x30
 __kasan_kmalloc+0xaa/0xb0
 hci_chan_create+0x67/0x1b0 [bluetooth]
 l2cap_conn_add.part.0+0x17/0x590 [bluetooth]
 l2cap_connect_cfm+0x266/0x6b0 [bluetooth]
 hci_le_remote_feat_complete_evt+0x167/0x310 [bluetooth]
 hci_event_packet+0x38d/0x800 [bluetooth]
 hci_rx_work+0x287/0xb20 [bluetooth]
 process_one_work+0x4f7/0x970
 worker_thread+0x8f/0x620
 kthread+0x17f/0x1c0
 ret_from_fork+0x2c/0x50

Freed by task 771:
 kasan_save_stack+0x33/0x60
 kasan_set_track+0x25/0x30
 kasan_save_free_info+0x2e/0x50
 ____kasan_slab_free+0x169/0x1c0
 slab_free_freelist_hook+0x9e/0x1c0
 __kmem_cache_free+0xc0/0x310
 hci_chan_list_flush+0x46/0x90 [bluetooth]
 hci_conn_cleanup+0x7d/0x330 [bluetooth]
 hci_cs_disconnect+0x35d/0x530 [bluetooth]
 hci_cmd_status_evt+0xef/0x2b0 [bluetooth]
 hci_event_packet+0x38d/0x800 [bluetooth]
 hci_rx_work+0x287/0xb20 [bluetooth]
 process_one_work+0x4f7/0x970
 worker_thread+0x8f/0x620
 kthread+0x17f/0x1c0
 ret_from_fork+0x2c/0x50
==================================================================

Fixes: b8d290525e39 ("Bluetooth: clean up connection in hci_cs_disconnect")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
15 months agoBluetooth: use RCU for hci_conn_params and iterate safely in hci_sync
Pauli Virtanen [Sun, 18 Jun 2023 22:04:31 +0000 (01:04 +0300)]
Bluetooth: use RCU for hci_conn_params and iterate safely in hci_sync

hci_update_accept_list_sync iterates over hdev->pend_le_conns and
hdev->pend_le_reports, and waits for controller events in the loop body,
without holding hdev lock.

Meanwhile, these lists and the items may be modified e.g. by
le_scan_cleanup. This can invalidate the list cursor or any other item
in the list, resulting to invalid behavior (eg use-after-free).

Use RCU for the hci_conn_params action lists. Since the loop bodies in
hci_sync block and we cannot use RCU or hdev->lock for the whole loop,
copy list items first and then iterate on the copy. Only the flags field
is written from elsewhere, so READ_ONCE/WRITE_ONCE should guarantee we
read valid values.

Free params everywhere with hci_conn_params_free so the cleanup is
guaranteed to be done properly.

This fixes the following, which can be triggered e.g. by BlueZ new
mgmt-tester case "Add + Remove Device Nowait - Success", or by changing
hci_le_set_cig_params to always return false, and running iso-tester:

==================================================================
BUG: KASAN: slab-use-after-free in hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841)
Read of size 8 at addr ffff888001265018 by task kworker/u3:0/32

Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
dump_stack_lvl (./arch/x86/include/asm/irqflags.h:134 lib/dump_stack.c:107)
print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
? __virt_addr_valid (./include/linux/mmzone.h:1915 ./include/linux/mmzone.h:2011 arch/x86/mm/physaddr.c:65)
? hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841)
kasan_report (mm/kasan/report.c:538)
? hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841)
hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841)
? __pfx_hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2780)
? mutex_lock (kernel/locking/mutex.c:282)
? __pfx_mutex_lock (kernel/locking/mutex.c:282)
? __pfx_mutex_unlock (kernel/locking/mutex.c:538)
? __pfx_update_passive_scan_sync (net/bluetooth/hci_sync.c:2861)
hci_cmd_sync_work (net/bluetooth/hci_sync.c:306)
process_one_work (./arch/x86/include/asm/preempt.h:27 kernel/workqueue.c:2399)
worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2538)
? __pfx_worker_thread (kernel/workqueue.c:2480)
kthread (kernel/kthread.c:376)
? __pfx_kthread (kernel/kthread.c:331)
ret_from_fork (arch/x86/entry/entry_64.S:314)
</TASK>

Allocated by task 31:
kasan_save_stack (mm/kasan/common.c:46)
kasan_set_track (mm/kasan/common.c:52)
__kasan_kmalloc (mm/kasan/common.c:374 mm/kasan/common.c:383)
hci_conn_params_add (./include/linux/slab.h:580 ./include/linux/slab.h:720 net/bluetooth/hci_core.c:2277)
hci_connect_le_scan (net/bluetooth/hci_conn.c:1419 net/bluetooth/hci_conn.c:1589)
hci_connect_cis (net/bluetooth/hci_conn.c:2266)
iso_connect_cis (net/bluetooth/iso.c:390)
iso_sock_connect (net/bluetooth/iso.c:899)
__sys_connect (net/socket.c:2003 net/socket.c:2020)
__x64_sys_connect (net/socket.c:2027)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

Freed by task 15:
kasan_save_stack (mm/kasan/common.c:46)
kasan_set_track (mm/kasan/common.c:52)
kasan_save_free_info (mm/kasan/generic.c:523)
__kasan_slab_free (mm/kasan/common.c:238 mm/kasan/common.c:200 mm/kasan/common.c:244)
__kmem_cache_free (mm/slub.c:1807 mm/slub.c:3787 mm/slub.c:3800)
hci_conn_params_del (net/bluetooth/hci_core.c:2323)
le_scan_cleanup (net/bluetooth/hci_conn.c:202)
process_one_work (./arch/x86/include/asm/preempt.h:27 kernel/workqueue.c:2399)
worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2538)
kthread (kernel/kthread.c:376)
ret_from_fork (arch/x86/entry/entry_64.S:314)
==================================================================

Fixes: e8907f76544f ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
15 months agoMerge tag 'iomap-6.5-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 20 Jul 2023 17:10:02 +0000 (10:10 -0700)]
Merge tag 'iomap-6.5-fixes-1' of git://git./fs/xfs/xfs-linux

Pull iomap fix from Darrick Wong:
 "Fix partial write regression.

  It turns out that fstests doesn't have any test coverage for short
  writes, but LTP does. Fortunately, this was caught right after -rc1
  was tagged.

  Summary:

   - Fix a bug wherein a failed write could clobber short write status"

* tag 'iomap-6.5-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: micro optimize the ki_pos assignment in iomap_file_buffered_write
  iomap: fix a regression for partial write errors