Xuan Zhuo [Mon, 1 Aug 2022 06:38:59 +0000 (14:38 +0800)]
virtio_net: split free_unused_bufs()
This patch separates two functions for freeing sq buf and rq buf from
free_unused_bufs().
When enabling/disabling an individual tx/rx queue is supported in the future,
it will be necessary to recover sq bufs and rq bufs separately.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-40-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:58 +0000 (14:38 +0800)]
virtio_net: get ringparam by virtqueue_get_vring_max_size()
Use virtqueue_get_vring_max_size() in virtnet_get_ringparam() to set
tx_max_pending and rx_max_pending.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-39-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:57 +0000 (14:38 +0800)]
virtio_net: set the default max ring size by find_vqs()
Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx,
rx at the same time.
                          | rx/tx ring size
-------------------------------------------
speed == UNKNOWN or < 10G | 1024
speed < 40G               | 4096
speed >= 40G              | 8192
Call virtnet_update_settings() once before calling init_vqs() to update
speed.
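The mapping in the table above can be sketched as a small C function. This is an illustration only, not the driver's code: the function name and the SKETCH_SPEED_UNKNOWN constant are hypothetical, and speeds are assumed to be in Mbit/s as ethtool reports them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an "unknown speed" marker (assumption). */
#define SKETCH_SPEED_UNKNOWN (-1)

/*
 * Hypothetical helper: pick the default maximum rx/tx ring size from the
 * link speed, following the table in the commit message.
 * speed_mbps is assumed to be in Mbit/s (10G == 10000).
 */
uint32_t sketch_default_ring_size(int32_t speed_mbps)
{
	if (speed_mbps == SKETCH_SPEED_UNKNOWN || speed_mbps < 10000)
		return 1024;
	if (speed_mbps < 40000)
		return 4096;
	return 8192;
}
```

For example, under these assumptions a 25G link would get 4096 entries and a 100G link 8192.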
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-38-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:56 +0000 (14:38 +0800)]
virtio: add helper virtio_find_vqs_ctx_size()
Introduce helper virtio_find_vqs_ctx_size() to call find_vqs and specify
the maximum size of each vq ring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-37-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:55 +0000 (14:38 +0800)]
virtio_mmio: support the arg sizes of find_vqs()
Virtio MMIO supports the new parameter sizes of find_vqs().
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-36-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:54 +0000 (14:38 +0800)]
virtio_pci: support the arg sizes of find_vqs()
Virtio PCI supports the new parameter sizes of find_vqs().
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-35-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:53 +0000 (14:38 +0800)]
virtio: find_vqs() add arg sizes
find_vqs() gains a new parameter, sizes, to specify the size of each vq
vring.
NULL as sizes means that all queues in find_vqs() use the maximum size.
A value of 0 in the array means that the corresponding queue uses
the maximum size.
In the split scenario, size is the largest size to use: because allocation
may be limited by memory, the virtio core may try a smaller size.
The size is a power of 2.
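The rules for sizes can be sketched as follows. This is a minimal illustration, not the virtio core's implementation: both helper names are hypothetical, alloc_limit is an invented stand-in for "the largest ring that memory allows", and halving is an assumption about how a smaller size is tried.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: resolve the requested ring size for queue `index`.
 * A NULL array or a 0 entry both mean "use the maximum size".
 */
uint32_t sketch_requested_size(const uint32_t *sizes, size_t index,
			       uint32_t max_size)
{
	if (sizes == NULL || sizes[index] == 0)
		return max_size;
	return sizes[index];
}

/*
 * Hypothetical helper for the split case: treat `size` as an upper bound
 * and halve it (assumption) until it fits under `alloc_limit`, so the
 * result stays a power of 2. Returns 0 if nothing fits.
 */
uint32_t sketch_fit_split_size(uint32_t size, uint32_t alloc_limit)
{
	/* The commit message states that the size is a power of 2. */
	assert(size != 0 && (size & (size - 1)) == 0);

	while (size > alloc_limit)
		size /= 2;
	return size;
}
```

With a device maximum of 256, a sizes array of {0, 64} would resolve to 256 for the first queue and 64 for the second.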
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-34-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:52 +0000 (14:38 +0800)]
virtio_pci: support VIRTIO_F_RING_RESET
This patch implements virtio pci support for QUEUE RESET.
Performing reset on a queue is divided into these steps:
1. notify the device to reset the queue
2. recycle the buffer submitted
3. reset the vring (may re-alloc)
4. mmap vring to device, and enable the queue
This patch implements virtio_reset_vq(), virtio_enable_resetq() in the
pci scenario.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-33-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:51 +0000 (14:38 +0800)]
virtio_pci: extract the logic of active vq for modern pci
Introduce vp_active_vq() to configure the vring on the backend after the vq
attaches the vring, and to configure the vq vector if necessary.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-32-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:50 +0000 (14:38 +0800)]
virtio_pci: introduce helper to get/set queue reset
Introduce new helpers to implement queue reset and get queue reset
status.
https://github.com/oasis-tcs/virtio-spec/issues/124
https://github.com/oasis-tcs/virtio-spec/issues/139
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-31-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:49 +0000 (14:38 +0800)]
virtio_pci: struct virtio_pci_common_cfg add queue_reset
Add queue_reset in virtio_pci_modern_common_cfg.
https://github.com/oasis-tcs/virtio-spec/issues/124
https://github.com/oasis-tcs/virtio-spec/issues/139
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-30-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:48 +0000 (14:38 +0800)]
virtio_ring: struct virtqueue introduce reset
Introduce a new member reset to the structure virtqueue to determine
whether the current vq is in the reset state. Subsequent patches will
use it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-29-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:47 +0000 (14:38 +0800)]
virtio: queue_reset: add VIRTIO_F_RING_RESET
Add VIRTIO_F_RING_RESET, which comes from:
https://github.com/oasis-tcs/virtio-spec/issues/124
https://github.com/oasis-tcs/virtio-spec/issues/139
This feature indicates that the driver can reset a queue individually.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-28-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:46 +0000 (14:38 +0800)]
virtio: allow to unbreak/break virtqueue individually
This patch allows the newly introduced
__virtqueue_break()/__virtqueue_unbreak() to break/unbreak the
virtqueue.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-27-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:45 +0000 (14:38 +0800)]
virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
Add queue_notify_data in struct virtio_pci_common_cfg, which comes from
here https://github.com/oasis-tcs/virtio-spec/issues/89
In order not to affect the API, add a dedicated structure struct
virtio_pci_modern_common_cfg to virtio_pci_modern.h.
Since I want to add queue_reset after queue_notify_data, I submitted
this patch first.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-26-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:44 +0000 (14:38 +0800)]
virtio_ring: introduce virtqueue_resize()
Introduce virtqueue_resize() to implement the resize of vring.
Based on this, the driver can dynamically adjust the size of the vring.
For example: ethtool -G.
virtqueue_resize() implements resize on top of the vq reset function. If
allocating a new vring fails, it gives up the resize and keeps using
the original vring.
During this process, if re-enabling the reset vq fails, the vq can no
longer be used, although the probability of this situation is low.
The parameter recycle is used to recycle buffers that are no longer
used.
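The "keep the original vring if allocation fails" behaviour can be sketched in userspace C. This is only an illustration of the fallback pattern, not virtqueue_resize() itself: struct sketch_ring, sketch_resize() and the injectable allocator are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified ring: just a size and a descriptor array. */
struct sketch_ring {
	uint32_t num;
	uint64_t *desc;
};

/* Allocator signature matching calloc(), so failures can be injected. */
typedef void *(*sketch_alloc_fn)(size_t nmemb, size_t size);

/* Allocator that always fails, used to exercise the fallback path. */
void *sketch_alloc_fail(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}

/*
 * Allocate the new ring first; only release the old one once the new
 * allocation has succeeded. On failure the ring is left untouched.
 */
int sketch_resize(struct sketch_ring *ring, uint32_t new_num,
		  sketch_alloc_fn alloc)
{
	uint64_t *fresh;

	if (new_num == 0)
		return -EINVAL;

	fresh = alloc(new_num, sizeof(*fresh));
	if (!fresh)
		return -ENOMEM;	/* old ring->desc and ring->num still valid */

	free(ring->desc);
	ring->desc = fresh;
	ring->num = new_num;
	return 0;
}
```

Passing calloc as the allocator exercises the success path, while sketch_alloc_fail shows that a failed resize leaves the ring's size and storage unchanged.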
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-25-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:43 +0000 (14:38 +0800)]
virtio_ring: packed: introduce virtqueue_resize_packed()
virtio ring packed supports resize.
Only after the new vring is successfully allocated based on the new num
will we release the old vring. Whenever an error is returned, the vq
still points to the old vring.
In the case of an error, re-initialize (by virtqueue_reinit_packed()) the
virtqueue to ensure that the vring can be used.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-24-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:42 +0000 (14:38 +0800)]
virtio_ring: packed: introduce virtqueue_reinit_packed()
Introduce a function to initialize vq without allocating new ring,
desc_state, desc_extra.
Subsequent patches will call this function after reset vq to
reinitialize vq.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-23-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:41 +0000 (14:38 +0800)]
virtio_ring: packed: extract the logic of attach vring
Separate the logic of attaching the vring; a subsequent patch will call it
separately.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-22-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:40 +0000 (14:38 +0800)]
virtio_ring: packed: extract the logic of vring init
Separate the logic of initializing vring, and subsequent patches will
call it separately.
This function completes the variable initialization of the packed vring.
Together with the attach logic, it constitutes the initialization of the
vring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-21-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:39 +0000 (14:38 +0800)]
virtio_ring: packed: extract the logic of alloc state and extra
Separate the logic for alloc desc_state and desc_extra, which will
be called separately by subsequent patches.
Use struct vring_packed to pass desc_state, desc_extra.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-20-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:38 +0000 (14:38 +0800)]
virtio_ring: packed: extract the logic of alloc queue
Separate the logic of packed to create vring queue.
This feature is required for subsequent virtqueue reset vring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-19-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:37 +0000 (14:38 +0800)]
virtio_ring: packed: introduce vring_free_packed
Free the structure struct vring_virtqueue_packed.
Subsequent patches require it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-18-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:36 +0000 (14:38 +0800)]
virtio_ring: split: introduce virtqueue_resize_split()
virtio ring split supports resize.
Only after the new vring is successfully allocated based on the new num
will we release the old vring. Whenever an error is returned, the vq
still points to the old vring.
In the case of an error, re-initialize (by virtqueue_reinit_split()) the
virtqueue to ensure that the vring can be used.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-17-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:35 +0000 (14:38 +0800)]
virtio_ring: split: reserve vring_align, may_reduce_num
In vring_alloc_queue_split(), save vring_align and may_reduce_num in the
structure vring_virtqueue_split. They are used to create a new vring when
implementing resize.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-16-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:34 +0000 (14:38 +0800)]
virtio_ring: split: introduce virtqueue_reinit_split()
Introduce a function to initialize vq without allocating new ring,
desc_state, desc_extra.
Subsequent patches will call this function after reset vq to
reinitialize vq.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-15-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:33 +0000 (14:38 +0800)]
virtio_ring: split: extract the logic of attach vring
Separate the logic of attaching the vring; subsequent patches will call it
separately.
virtqueue_vring_init_split() completes the initialization of other
variables of vring split. We can directly use
vq->split = *vring_split to complete attach.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-14-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:32 +0000 (14:38 +0800)]
virtio_ring: split: extract the logic of vring init
Separate the logic of initializing vring, and subsequent patches will
call it separately.
This function completes the variable initialization of the split vring.
Together with the attach logic, it constitutes the initialization of the
vring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-13-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:31 +0000 (14:38 +0800)]
virtio_ring: split: extract the logic of alloc state and extra
Separate the logic of creating desc_state, desc_extra, and subsequent
patches will call it independently.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-12-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:30 +0000 (14:38 +0800)]
virtio_ring: split: extract the logic of alloc queue
Separate the logic of split to create vring queue.
This feature is required for subsequent virtqueue reset vring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-11-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:29 +0000 (14:38 +0800)]
virtio_ring: split: introduce vring_free_split()
Free the structure struct vring_virtqueue_split.
Subsequent patches require it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-10-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:28 +0000 (14:38 +0800)]
virtio_ring: split: __vring_new_virtqueue() accept struct vring_virtqueue_split
__vring_new_virtqueue() instead accepts struct vring_virtqueue_split.
The purpose of this is to pass more information into
__vring_new_virtqueue() to make the code simpler and the structure
cleaner.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-9-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:27 +0000 (14:38 +0800)]
virtio_ring: split: stop __vring_new_virtqueue as export symbol
There is currently only one place to reference __vring_new_virtqueue()
directly from the outside of virtio core. And here vring_new_virtqueue()
can be used instead.
Subsequent patches will modify __vring_new_virtqueue, so stop it as an
export symbol for now.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-8-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:26 +0000 (14:38 +0800)]
virtio_ring: introduce virtqueue_init()
Separate the logic of virtqueue initialization. These variables should
be reset during reset.
This logic can be called independently when implementing resize/reset
later.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-7-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:25 +0000 (14:38 +0800)]
virtio_ring: split vring_virtqueue
Separate the two inline structures(split and packed) from the structure
vring_virtqueue.
In this way, we can use these two structures later to pass parameters
and retain temporary variables.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-6-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:24 +0000 (14:38 +0800)]
virtio_ring: extract the logic of freeing vring
Introduce vring_free() to free the vring of vq.
Subsequent patches will use vring_free() alone.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-5-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:23 +0000 (14:38 +0800)]
virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset
Added documentation for virtqueue_detach_unused_buf, allowing it to be
called on queue reset.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-4-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:22 +0000 (14:38 +0800)]
virtio: struct virtio_config_ops add callbacks for queue_reset
reset can be divided into the following four steps (example):
1. transport: notify the device to reset the queue
2. vring: recycle the buffer submitted
3. vring: reset/resize the vring (may re-alloc)
4. transport: mmap vring to device, and enable the queue
In order to support queue reset, add two callbacks in struct
virtio_config_ops to implement steps 1 and 4.
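The split between transport steps (1 and 4) and vring steps (2 and 3) can be sketched with mock types. Everything here is hypothetical and simplified for illustration: struct sketch_vq, struct sketch_config_ops, the callback names reset_queue/enable_queue and the mock transport are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified queue state. */
struct sketch_vq {
	uint32_t num;		/* current ring size */
	uint32_t in_flight;	/* buffers submitted but not yet recycled */
	bool enabled;
};

/* Hypothetical pair of transport callbacks covering steps 1 and 4. */
struct sketch_config_ops {
	int (*reset_queue)(struct sketch_vq *vq);
	int (*enable_queue)(struct sketch_vq *vq);
};

/* Mock transport: only flips the enabled flag. */
static int sketch_transport_reset(struct sketch_vq *vq)
{
	vq->enabled = false;
	return 0;
}

static int sketch_transport_enable(struct sketch_vq *vq)
{
	vq->enabled = true;
	return 0;
}

const struct sketch_config_ops sketch_ops = {
	.reset_queue = sketch_transport_reset,
	.enable_queue = sketch_transport_enable,
};

/* Run the four steps in order; stop early if the transport reset fails. */
int sketch_queue_reset(struct sketch_vq *vq,
		       const struct sketch_config_ops *ops, uint32_t new_num)
{
	int err;

	err = ops->reset_queue(vq);	/* 1. transport: reset the queue */
	if (err)
		return err;

	vq->in_flight = 0;		/* 2. vring: recycle submitted buffers */
	vq->num = new_num;		/* 3. vring: reset/resize the ring */

	return ops->enable_queue(vq);	/* 4. transport: re-enable the queue */
}
```

Keeping steps 1 and 4 behind callbacks is what lets each transport (PCI, MMIO, ...) supply its own implementation while the vring code handles steps 2 and 3.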
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-3-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:21 +0000 (14:38 +0800)]
virtio: record the maximum queue num supported by the device.
virtio-net can display the maximum (supported by hardware) ring size in
ethtool -g eth0.
When the subsequent patch implements vring reset, it can judge whether
the ring size passed by the driver is legal based on this.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-2-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Fri, 10 Jun 2022 09:47:37 +0000 (11:47 +0200)]
drivers/virtio: Clarify CONFIG_VIRTIO_MEM for unsupported architectures
Let's make it clearer that simply unlocking CONFIG_VIRTIO_MEM on an
architecture is most probably not sufficient to have it working as
expected.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220610094737.65254-1-david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Minghao Xue [Fri, 10 Jun 2022 08:58:27 +0000 (16:58 +0800)]
virtio_mmio: add support to set IRQ of a virtio device as wakeup source
Based on the virtio_mmio wakeup flag in the device tree, set the device's IRQ
as a wakeup source during virtqueue initialization.
Signed-off-by: Minghao Xue <quic_mingxue@quicinc.com>
Message-Id: <1654851507-13891-3-git-send-email-quic_mingxue@quicinc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Minghao Xue [Fri, 10 Jun 2022 08:58:26 +0000 (16:58 +0800)]
dt-bindings: virtio: mmio: add optional wakeup-source property
Some systems want to set the interrupt of virtio_mmio device
as a wakeup source. On such systems, we'll use the existence
of the "wakeup-source" property as a signal of requirement.
Signed-off-by: Minghao Xue <quic_mingxue@quicinc.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <1654851507-13891-2-git-send-email-quic_mingxue@quicinc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Robin Murphy [Wed, 8 Jun 2022 11:48:26 +0000 (12:48 +0100)]
vdpa: Use device_iommu_capable()
Use the new interface to check the capability for our device
specifically.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Message-Id: <548e316fa282ce513fabb991a4c4d92258062eb5.1654688822.git.robin.murphy@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Michael S. Tsirkin [Thu, 30 Jun 2022 19:10:57 +0000 (15:10 -0400)]
virtio: VIRTIO_HARDEN_NOTIFICATION is broken
This option doesn't really work and breaks too many drivers.
Not yet sure what's the right thing to do, for now
let's make sure randconfig isn't broken by this.
Fixes: c346dae4f3fb ("virtio: disable notification hardening by default")
Cc: "Jason Wang" <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Jason Wang [Tue, 28 Jun 2022 08:34:30 +0000 (16:34 +0800)]
virtio_pmem: set device ready in probe()
The NVDIMM region could be available before the virtio_device_ready()
that is called by virtio_dev_probe(). This means the driver tries to
use the device before DRIVER_OK, which violates the spec. Fix this by
setting the device ready before nvdimm_pmem_region_create().
Note that this means virtio_pmem_host_ack() could be triggered
before the creation of the nd region. This is safe since the pmem_lock
has been initialized, and virtio_pmem_host_ack() validates whether any
available buffer has been added before.
Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
Acked-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220628083430.61856-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jason Wang [Tue, 28 Jun 2022 08:34:29 +0000 (16:34 +0800)]
virtio_pmem: initialize provider_data through nd_region_desc
We used to initialize the provider_data manually after
nvdimm_pmem_region_create(). This seems to be racy if the flush is
issued before the initialization of provider_data[1]. Fix this by
initializing the provider_data through nd_region_desc to make sure the
provider_data is ready after the pmem is created.
[1]:
[ 80.152281] nd_pmem namespace0.0: unable to guarantee persistence of writes
[ 92.393956] BUG: kernel NULL pointer dereference, address: 0000000000000318
[ 92.394551] #PF: supervisor read access in kernel mode
[ 92.394955] #PF: error_code(0x0000) - not-present page
[ 92.395365] PGD 0 P4D 0
[ 92.395566] Oops: 0000 [#1] PREEMPT SMP PTI
[ 92.395867] CPU: 2 PID: 506 Comm: mkfs.ext4 Not tainted 5.19.0-rc1+ #453
[ 92.396365] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 92.397178] RIP: 0010:virtio_pmem_flush+0x2f/0x1f0
[ 92.397521] Code: 55 41 54 55 53 48 81 ec a0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 98 00 00 00 31 c0 48 8b 87 78 03 00 00 48 89 04 24 <48> 8b 98 18 03 00 00 e8 85 bf 6b 00 ba 58 00 00 00 be c0 0c 00 00
[ 92.398982] RSP: 0018:ffff9a7380aefc88 EFLAGS: 00010246
[ 92.399349] RAX: 0000000000000000 RBX: ffff8e77c3f86f00 RCX: 0000000000000000
[ 92.399833] RDX: ffffffffad4ea720 RSI: ffff8e77c41e39c0 RDI: ffff8e77c41c5c00
[ 92.400388] RBP: ffff8e77c41e39c0 R08: ffff8e77c19f0600 R09: 0000000000000000
[ 92.400874] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e77c0814e28
[ 92.401364] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8e77c41e39c0
[ 92.401849] FS: 00007f3cd75b2780(0000) GS:ffff8e7937d00000(0000) knlGS:0000000000000000
[ 92.402423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.402821] CR2: 0000000000000318 CR3: 0000000103c80002 CR4: 0000000000370ee0
[ 92.403307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.403793] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.404278] Call Trace:
[ 92.404481] <TASK>
[ 92.404654] ? mempool_alloc+0x5d/0x160
[ 92.404939] ? terminate_walk+0x5f/0xf0
[ 92.405226] ? bio_alloc_bioset+0xbb/0x3f0
[ 92.405525] async_pmem_flush+0x17/0x80
[ 92.405806] nvdimm_flush+0x11/0x30
[ 92.406067] pmem_submit_bio+0x1e9/0x200
[ 92.406354] __submit_bio+0x80/0x120
[ 92.406621] submit_bio_noacct_nocheck+0xdc/0x2a0
[ 92.406958] submit_bio_wait+0x4e/0x80
[ 92.407234] blkdev_issue_flush+0x31/0x50
[ 92.407526] ? punt_bios_to_rescuer+0x230/0x230
[ 92.407852] blkdev_fsync+0x1e/0x30
[ 92.408112] do_fsync+0x33/0x70
[ 92.408354] __x64_sys_fsync+0xb/0x10
[ 92.408625] do_syscall_64+0x43/0x90
[ 92.408895] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 92.409257] RIP: 0033:0x7f3cd76c6c44
Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
Acked-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220628083430.61856-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Fri, 24 Jun 2022 07:56:56 +0000 (09:56 +0200)]
vringh: iterate on iotlb_translate to handle large translations
iotlb_translate() can return -ENOBUFS if the bio_vec is not big enough
to contain all the ranges for translation.
This can happen for example if the VMM maps a large bounce buffer,
without using hugepages, that requires more than 16 ranges to translate
the addresses.
To handle this case, let's extend iotlb_translate() to also return the
number of bytes successfully translated.
copy_from_iotlb()/copy_to_iotlb() now loop, calling iotlb_translate()
repeatedly until the translation is complete.
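The "translate what fits, then call again" pattern can be sketched in plain C. This is a simplified model, not the vringh code: sketch_translate() and sketch_copy() are hypothetical, the 16-range limit is taken from the message above, and the one-range-per-4-KiB-page layout is an assumption made so the example is self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RANGES  16	/* ranges one call can hold (from the message) */
#define SKETCH_RANGE_BYTES 4096	/* assumption: one range per 4 KiB page */

/*
 * Hypothetical translate step: cover as much of [addr, addr + len) as fits
 * in SKETCH_MAX_RANGES ranges. *translated always receives the number of
 * bytes covered. Returns the number of ranges used, or -ENOBUFS if the
 * request did not fit completely.
 */
int sketch_translate(uint64_t addr, size_t len, size_t *translated)
{
	size_t done = 0;
	int ranges = 0;

	while (done < len && ranges < SKETCH_MAX_RANGES) {
		size_t in_page = SKETCH_RANGE_BYTES -
				 (size_t)((addr + done) % SKETCH_RANGE_BYTES);
		size_t chunk = len - done < in_page ? len - done : in_page;

		done += chunk;
		ranges++;
	}

	*translated = done;
	return done < len ? -ENOBUFS : ranges;
}

/*
 * Hypothetical caller: keep translating until the whole length is covered.
 * Returns the number of translate calls that were needed, or a negative
 * errno for any error other than -ENOBUFS.
 */
int sketch_copy(uint64_t addr, size_t len)
{
	int calls = 0;

	while (len > 0) {
		size_t translated;
		int ret = sketch_translate(addr, len, &translated);

		if (ret < 0 && ret != -ENOBUFS)
			return ret;

		/* A real copy would move `translated` bytes here. */
		addr += translated;
		len -= translated;
		calls++;
	}
	return calls;
}
```

Because every call with a non-zero length covers at least one byte, the loop always makes progress and terminates.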
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220624075656.13997-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Fri, 24 Jun 2022 02:55:45 +0000 (10:55 +0800)]
virtio_ring: remove the arg vq of vring_alloc_desc_extra()
The parameter vq of vring_alloc_desc_extra() is useless. This patch
removes this parameter.
Subsequent patches will call this function to avoid passing useless
arguments.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220624025621.128843-6-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Fri, 24 Jun 2022 02:55:41 +0000 (10:55 +0800)]
remoteproc: rename len of rproc_vring to num
Rename the member len in the structure rproc_vring to num, and remove 'in
bytes' from its comment. The comment is misleading because the member actually
refers to the size of the virtio vring to be created; the unit is not
bytes.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20220624025621.128843-2-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Sun, 31 Jul 2022 21:03:01 +0000 (14:03 -0700)]
Linux 5.19
Linus Torvalds [Sun, 31 Jul 2022 16:52:20 +0000 (09:52 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"One-liner fix of a NULL pointer deref in the Allwinner clk driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: Fix H6 RTC clock definition
Linus Torvalds [Sun, 31 Jul 2022 16:26:53 +0000 (09:26 -0700)]
Merge tag 'x86_urgent_for_v5.19' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Update the 'mitigations=' kernel param documentation
- Check the IBPB feature flag before enabling IBPB in firmware calls
because cloud vendors' fantasy when it comes to creating guest
configurations is unlimited
- Unexport sev_es_ghcb_hv_call() before 5.19 releases now that HyperV
doesn't need it anymore
- Remove dead CONFIG_* items
* tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"
x86/configs: Update configs in x86_debug.config
Linus Torvalds [Sun, 31 Jul 2022 16:21:13 +0000 (09:21 -0700)]
Merge tag 'locking_urgent_for_v5.19' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
- Avoid rwsem lockups in certain situations when handling the handoff
bit
* tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
Linus Torvalds [Sun, 31 Jul 2022 16:12:58 +0000 (09:12 -0700)]
Merge tag 'edac_urgent_for_v5.19' of git://git./linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Relax the condition under which the DIMM label in ghes_edac is set in
order to accommodate an HPE BIOS which sets only the device but not
the bank
- Two forgotten fixes to synopsys_edac when handling error interrupts
* tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/ghes: Set the DIMM label unconditionally
EDAC/synopsys: Re-enable the error interrupts on v3 hw
EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hw
Linus Torvalds [Sun, 31 Jul 2022 00:24:16 +0000 (17:24 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Last set of ARM fixes for 5.19:
- fix for MAX_DMA_ADDRESS overflow
- fix for find_*_bit performing an out of bounds memory access"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: findbit: fix overflowing offset
ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
Waiman Long [Wed, 22 Jun 2022 20:04:19 +0000 (16:04 -0400)]
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
With commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent"), the writer that sets the handoff bit can be interrupted
out without clearing the bit if the wait queue isn't empty. This disables
reader and writer optimistic lock spinning and stealing.
Now if a non-first writer in the queue is somehow woken up or a new
waiter enters the slowpath, it can't acquire the lock. This was not the
case before commit d257cc8cb8d5, as the writer that set the handoff bit
would clear it when exiting via the out_nolock path. The new behavior is less
efficient, as the busy rwsem stays in an unlocked state for a longer time.
In some cases, this new behavior may cause lockups as shown in [1] and
[2].
This patch allows a non-first writer to ignore the handoff bit if it
is not originally set or initiated by the first waiter. This patch is
shown to be effective in fixing the lockup problem reported in [1].
[1] https://lore.kernel.org/lkml/20220617134325.GC30825@techsingularity.net/
[2] https://lore.kernel.org/lkml/3f02975c-1a9d-be20-32cf-f1d8e3dfafcc@oracle.com/
Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20220622200419.778799-1-longman@redhat.com
Linus Torvalds [Sat, 30 Jul 2022 04:02:35 +0000 (21:02 -0700)]
Merge tag 'mm-hotfixes-stable-2022-07-29' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Two hotfixes, both cc:stable"
* tag 'mm-hotfixes-stable-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/hmm: fault non-owner device private entries
page_alloc: fix invalid watermark check on a negative value
Linus Torvalds [Fri, 29 Jul 2022 23:07:35 +0000 (16:07 -0700)]
Merge tag 'block-5.19-2022-07-29' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Just a single fix for NVMe, yet another quirk addition"
* tag 'block-5.19-2022-07-29' of git://git.kernel.dk/linux-block:
nvme-pci: Crucial P2 has bogus namespace ids
Linus Torvalds [Fri, 29 Jul 2022 20:25:31 +0000 (13:25 -0700)]
Merge tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"Maxime had the dog^Wmailing list server eat his homework^Wmisc pull
request.
Two more small fixes, one in nouveau svm code and the other in
simpledrm.
nouveau:
- page migration fix
simpledrm:
- fix mode_valid return value"
* tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm:
nouveau/svm: Fix to migrate all requested pages
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
Dave Airlie [Fri, 29 Jul 2022 20:09:48 +0000 (06:09 +1000)]
Merge tag 'drm-misc-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
One fix to fix simpledrm mode_valid return value, and one for page
migration in nouveau
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220729094514.sfzhc3gqjgwgal62@penduick
Linus Torvalds [Fri, 29 Jul 2022 20:07:03 +0000 (13:07 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four fixes, three in drivers.
The two biggest fixes are ufs and the remaining driver and core fix
are small and obvious (and the core fix is low risk)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix a race condition related to device management
scsi: core: Fix warning in scsi_alloc_sgtables()
scsi: ufs: host: Hold reference returned by of_parse_phandle()
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
Eiichi Tsukata [Thu, 28 Jul 2022 04:39:07 +0000 (04:39 +0000)]
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
Updates descriptions for "mitigations=off" and "mitigations=auto,nosmt"
with the respective retbleed= settings.
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: corbet@lwn.net
Link: https://lore.kernel.org/r/20220728043907.165688-1-eiichi.tsukata@nutanix.com
Ralph Campbell [Mon, 25 Jul 2022 18:36:14 +0000 (11:36 -0700)]
mm/hmm: fault non-owner device private entries
If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner page is used
to determine if the device private page should not be faulted in.
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram() to
fault in the page.
For example, if a page is migrated to GPU private memory and an RDMA
fault-capable NIC tries to read the migrated page, without this patch it will
get an error. With this patch, the page will be migrated back to system
memory and the NIC will be able to read the data.
Link: https://lkml.kernel.org/r/20220727000837.4128709-2-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20220725183615.4118795-2-rcampbell@nvidia.com
Fixes: 08ddddda667b ("mm/hmm: check the device private page owner in hmm_range_fault()")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reported-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jaewon Kim [Mon, 25 Jul 2022 09:52:12 +0000 (18:52 +0900)]
page_alloc: fix invalid watermark check on a negative value
There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.
This is a bug where zone_watermark_fast returns true even when the free
page count is very low. Commit f27ce0e14088 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast path to consider the
highatomic reserve. But it did not handle the negative value case, which
can happen when the reserved_highatomic pageblock is bigger than the
actual free page count.
If the watermark is considered ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and eventually free pages may become depleted except for the highatomic free.
Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd do not free pages quickly.
Handle the negative case by using MIN.
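The clamping idea can be sketched in plain C. This is a simplified model and not the kernel code: the function names, the parameter list and the choice of an unsigned watermark are assumptions made for illustration. It shows how an unclamped subtraction lets a negative "usable free" value pass the check, while subtracting at most what is actually free does not.

```c
#include <stdbool.h>

/* Illustrative model only: names and types are assumptions, not kernel code. */

/* Unclamped check: free_pages - reserved may go negative. */
bool watermark_ok_unclamped(long free_pages, long reserved,
			    unsigned long mark)
{
	long usable_free = free_pages - reserved;

	/* A negative long becomes a huge value once treated as unsigned. */
	return (unsigned long)usable_free > mark;
}

/* Clamped check: subtract at most what is actually free (the MIN step). */
bool watermark_ok_clamped(long free_pages, long reserved,
			  unsigned long mark)
{
	long usable_free = free_pages;

	usable_free -= (usable_free < reserved) ? usable_free : reserved;
	return (unsigned long)usable_free > mark;
}
```

With 10 free pages, 50 reserved and a mark of 32, the unclamped form reports the watermark as met, whereas the clamped form does not.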
Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
Fixes: f27ce0e14088 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Reported-by: GyeongHwan Hong <gh21.hong@samsung.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Fri, 29 Jul 2022 18:26:28 +0000 (11:26 -0700)]
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix addresses for bss symbols, describing variables used in resolving
data access in tools such as 'perf c2c' and 'perf mem'.
- Skip symbols if SHF_ALLOC flag is not set, a technique used for
listing deprecated symbols; their addresses are zero, so they are not useful.
- Remove undefined behavior from bpf_perf_object__next() when dealing
with an empty bpf_objects_list list.
- Make an ARM CoreSight disasm script work with both python2 and
python3.
- Sync x86's cpufeatures header with the kernel sources.
* tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf bpf: Remove undefined behavior from bpf_perf_object__next()
perf symbol: Skip symbols if SHF_ALLOC flag is not set
perf symbol: Correct address for bss symbols
perf scripts python: Let script to be python2 compliant
tools headers cpufeatures: Sync with the kernel sources
Linus Torvalds [Fri, 29 Jul 2022 18:20:40 +0000 (11:20 -0700)]
Merge tag 'wq-for-5.19-rc8-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"Just one commit to suppress a spurious warning added during the 5.19
cycle"
* tag 'wq-for-5.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Avoid a false warning in unbind_workers()
Linus Torvalds [Fri, 29 Jul 2022 17:57:26 +0000 (10:57 -0700)]
Merge tag 'pm-5.19-rc9' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Make some false positive RCU splats resulting from a recent intel_idle
driver change go away (Waiman Long)"
* tag 'pm-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: Fix false positive RCU splats due to incorrect hardirqs state
Lai Jiangshan [Fri, 29 Jul 2022 09:44:38 +0000 (17:44 +0800)]
workqueue: Avoid a false warning in unbind_workers()
Calling set_cpus_allowed_ptr() with wq_unbound_cpumask can fail
and trigger a false warning.
Use cpu_possible_mask instead when wq_unbound_cpumask has no active CPUs.
It is very easy to trigger the warning:
Set wq_unbound_cpumask to a small set of CPUs.
Offline all the CPUs of wq_unbound_cpumask.
Offline an extra CPU and trigger the warning.
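The fallback described above can be modelled with plain bitmasks. This is a minimal sketch under stated assumptions: cpu_mask_t and pick_unbind_mask() are hypothetical names, and one bit per CPU stands in for the kernel's struct cpumask.

```c
#include <stdint.h>

/* One bit per CPU; a toy stand-in for struct cpumask (assumption). */
typedef uint64_t cpu_mask_t;

/*
 * Pick the affinity mask for workers of an offlined CPU pool:
 * prefer the unbound mask, but fall back to all possible CPUs
 * when none of the unbound CPUs is still active.
 */
cpu_mask_t pick_unbind_mask(cpu_mask_t unbound, cpu_mask_t active,
			    cpu_mask_t possible)
{
	if (unbound & active)
		return unbound;
	return possible;
}
```

In the reproduction steps, once every CPU in the unbound mask is offline the intersection with the active mask is empty, so the sketch returns the possible mask instead of a mask that cannot be applied.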
Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Linus Torvalds [Fri, 29 Jul 2022 17:46:03 +0000 (10:46 -0700)]
Merge tag 'riscv-for-linus-5.19-rc9' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
"A build fix for 'make vdso_install' that avoids an issue trying to
install the compat VDSO"
* tag 'riscv-for-linus-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: compat: vdso: Fix vdso_install target
Linus Torvalds [Fri, 29 Jul 2022 17:10:30 +0000 (10:10 -0700)]
Merge tag 'loongarch-fixes-5.19-5' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- Fix cache size calculation, stack protection attributes, ptrace's
fpr_set and "ROM Size" in boardinfo
- Some cleanups and improvements of assembly
- Some cleanups of unused code and useless code
* tag 'loongarch-fixes-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix wrong "ROM Size" of boardinfo
LoongArch: Fix missing fcsr in ptrace's fpr_set
LoongArch: Fix shared cache size calculation
LoongArch: Disable executable stack by default
LoongArch: Remove unused variables
LoongArch: Remove clock setting during cpu hotplug stage
LoongArch: Remove useless header compiler.h
LoongArch: Remove several syntactic sugar macros for branches
LoongArch: Re-tab the assembly files
LoongArch: Simplify "BGT foo, zero" with BGTZ
LoongArch: Simplify "BLT foo, zero" with BLTZ
LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ
LoongArch: Use the "move" pseudo-instruction where applicable
LoongArch: Use the "jr" pseudo-instruction where applicable
LoongArch: Use ABI names of registers where appropriate
Linus Torvalds [Fri, 29 Jul 2022 16:57:07 +0000 (09:57 -0700)]
Merge tag 'powerpc-5.19-6' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Re-enable the new amdgpu display engine for powerpc, as long as the
compiler is correctly configured.
- Disable stack variable initialisation in prom_init to fix GCC 12
allmodconfig.
Thanks to Dan Horák and Sudip Mukherjee.
* tag 'powerpc-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
drm/amdgpu: Re-enable DCN for 64-bit powerpc
powerpc/64s: Disable stack variable initialisation for prom_init
Tiezhu Yang [Thu, 21 Jul 2022 09:53:01 +0000 (17:53 +0800)]
LoongArch: Fix wrong "ROM Size" of boardinfo
We can see the "ROM Size" is different in the following outputs:
[root@linux loongson]# cat /sys/firmware/loongson/boardinfo
BIOS Information
Vendor : Loongson
Version : vUDK2018-LoongArch-V2.0.pre-beta8
ROM Size : 63 KB
Release Date : 06/15/2022
Board Information
Manufacturer : Loongson
Board Name : Loongson-LS3A5000-7A1000-1w-A2101
Family : LOONGSON64
[root@linux loongson]# dmidecode | head -11
...
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: Loongson
Version: vUDK2018-LoongArch-V2.0.pre-beta8
Release Date: 06/15/2022
ROM Size: 4 MB
According to "BIOS Information (Type 0) structure" in the SMBIOS
Reference Specification [1], it shows 64K * (n+1) is the size of
the physical device containing the BIOS if the size is less than
16M.
Additionally, we can see the related code in dmidecode [2]:
u64 s = { .l = (code1 + 1) << 6 };
So the output of dmidecode is correct and the output of boardinfo
is wrong; fix it.
By the way, at present there is no need to handle sizes of 16M or
greater on LoongArch, because the ROM is usually 4M or 8M, which is
sufficient.
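The size formula quoted above, 64K * (n+1), can be written as a small helper. This is only a sketch: bios_rom_size_kb() is a hypothetical name, and it covers only the sub-16M encoding discussed in this message.

```c
#include <stdint.h>

/*
 * SMBIOS Type 0 "BIOS ROM Size" byte n encodes 64K * (n + 1)
 * for devices smaller than 16M. Returns the size in KB.
 */
uint32_t bios_rom_size_kb(uint8_t n)
{
	return 64u * ((uint32_t)n + 1u);
}
```

A raw byte of 63 gives 64 * 64 = 4096 KB, that is 4 MB, which agrees with the dmidecode output; the "63 KB" shown by boardinfo is consistent with the raw byte being printed unconverted.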
[1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.6.0.pdf
[2] https://git.savannah.nongnu.org/cgit/dmidecode.git/tree/dmidecode.c#n347
Fixes: 628c3bb40e9a ("LoongArch: Add boot and setup routines")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Qi Hu [Thu, 14 Jul 2022 06:25:50 +0000 (14:25 +0800)]
LoongArch: Fix missing fcsr in ptrace's fpr_set
In file ptrace.c, function fpr_set does not copy fcsr data from ubuf
to kbuf. That's the reason why fcsr cannot be modified by ptrace.
This patch fixes this problem and allows users to modify the fcsr
using ptrace.
Co-developed-by: Xu Li <lixu@loongson.cn>
Signed-off-by: Qi Hu <huqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Wed, 13 Jul 2022 10:00:41 +0000 (18:00 +0800)]
LoongArch: Fix shared cache size calculation
The current calculation of shared cache size uses the node (die) scope,
but we want 'lscpu' to show the shared cache size of the whole package
for multi-die chips (e.g., Loongson-3C5000L, which contains 4 dies in
one package). So fix it by multiplying by nodes_per_package.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Tue, 26 Jul 2022 12:43:11 +0000 (20:43 +0800)]
LoongArch: Disable executable stack by default
Disable executable stack for LoongArch by default, as all modern
architectures do.
Reported-by: Andreas Schwab <schwab@suse.de>
Suggested-by: WANG Xuerui <git@xen0n.name>
Link: https://sourceware.org/pipermail/binutils/2022-July/121992.html
Tested-by: WANG Xuerui <git@xen0n.name>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Wed, 20 Jul 2022 07:21:52 +0000 (15:21 +0800)]
LoongArch: Remove unused variables
Some variables are never used or referenced; this patch
removes these variables and makes the code cleaner.
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Wed, 20 Jul 2022 07:21:51 +0000 (15:21 +0800)]
LoongArch: Remove clock setting during cpu hotplug stage
On a physical machine we can save power by disabling the clock of a
hot-removed cpu. However, as different platforms require different methods to
configure clocks, the code is platform-specific, and probably belongs in
firmware/pmu or the cpu regulator, rather than generic arch/loongarch code.
Also, there is no such register on the QEMU virt machine since
clock/frequency regulation is not emulated.
This patch removes the hard-coded clock register accesses in the generic
LoongArch cpu hotplug flow.
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Jun Yi [Thu, 21 Jul 2022 11:10:49 +0000 (19:10 +0800)]
LoongArch: Remove useless header compiler.h
The content of LoongArch's compiler.h is trivial, with some definitions not
used anywhere, so inline the definitions and remove the header.
Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:15 +0000 (23:57 +0800)]
LoongArch: Remove several syntactic sugar macros for branches
These syntactic sugars have been supported by upstream binutils from the
beginning, so no need to patch them locally.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:22 +0000 (23:57 +0800)]
LoongArch: Re-tab the assembly files
Reflow the *.S files for better stylistic consistency, namely hard tabs
after mnemonic position, and vertical alignment of the first operand
with hard tabs. Tab width is obviously 8. Some pre-existing intra-block
vertical alignments are preserved.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:21 +0000 (23:57 +0800)]
LoongArch: Simplify "BGT foo, zero" with BGTZ
Support for the syntactic sugar is present in upstream binutils port
from the beginning. Use it for shorter lines and better consistency.
Generated code should be identical.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:20 +0000 (23:57 +0800)]
LoongArch: Simplify "BLT foo, zero" with BLTZ
Support for the syntactic sugar is present in upstream binutils port
from the beginning. Use it for shorter lines and better consistency.
Generated code should be identical.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:19 +0000 (23:57 +0800)]
LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ
While B{EQ,NE}Z and B{EQ,NE} are different instructions, and the vastly
expanded range for branch destination does not really matter in the few
cases touched, use the B{EQ,NE}Z where possible for shorter lines and
better consistency (e.g. some places used "BEQ foo, zero", while some
used "BEQ zero, foo").
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:18 +0000 (23:57 +0800)]
LoongArch: Use the "move" pseudo-instruction where applicable
Some of the assembly code in the LoongArch port likely originated
from a time when the assembler did not support pseudo-instructions like
"move" or "jr", so the desugared form was used and readability suffers
(to a minor degree) as a result.
As the upstream toolchain supports these pseudo-instructions from the
beginning, migrate the existing few usages to them for better
readability.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:17 +0000 (23:57 +0800)]
LoongArch: Use the "jr" pseudo-instruction where applicable
Some of the assembly code in the LoongArch port likely originated
from a time when the assembler did not support pseudo-instructions like
"move" or "jr", so the desugared form was used and readability suffers
(to a minor degree) as a result.
As the upstream toolchain supports these pseudo-instructions from the
beginning, migrate the existing few usages to them for better
readability.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:16 +0000 (23:57 +0800)]
LoongArch: Use ABI names of registers where appropriate
Some of the assembly in the LoongArch port seems to come from a
prehistoric time, when the assembler didn't even have support for the
ABI names we have all come to know and love, and thus used raw register
numbers, which hampered readability.
The usages are found with a regex match inside arch/loongarch, then
manually adjusted for those non-definitions.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Russell King (Oracle) [Tue, 26 Jul 2022 22:51:48 +0000 (23:51 +0100)]
ARM: findbit: fix overflowing offset
When offset is larger than the size of the bit array, we should not
attempt to access the array as we can perform an access beyond the
end of the array. Fix this by changing the pre-condition.
Using "cmp r2, r1; bhs ..." covers us for the size == 0 case, since
this will always take the branch when r1 is zero, irrespective of
the value of r2. This means we can fix this bug without adding any
additional code!
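The pre-condition can be expressed in portable C as a single unsigned comparison. This is a model of the idea only, not the ARM assembly being patched; find_next_bit_sketch() and BITS_PER_WORD are names chosen for illustration.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the first set bit at or after 'offset',
 * or 'size' if there is none.
 */
size_t find_next_bit_sketch(const unsigned long *bits, size_t size,
			    size_t offset)
{
	size_t i;

	/* Unsigned "offset >= size" also covers size == 0. */
	if (offset >= size)
		return size;

	for (i = offset; i < size; i++) {
		if (bits[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return i;
	}
	return size;
}
```

Because both operands are unsigned, the early return is taken for any offset when size is zero, so the array is never touched in that case.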
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Thadeu Lima de Souza Cascardo [Thu, 28 Jul 2022 12:26:02 +0000 (09:26 -0300)]
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Some cloud hypervisors do not provide IBPB on very recent processors,
including AMD processors affected by Retbleed.
Using IBPB before firmware calls on such systems would cause a GPF at boot
like the one below. Do not enable such calls when IBPB support is not
present.
EFI Variables Facility v0.08 2004-May-17
general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: efi_rts_wq efi_call_rts
RIP: 0010:efi_call_rts
Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48
RSP: 0018:ffffb373800d7e38 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000006 RCX: 0000000000000049
RDX: 0000000000000000 RSI: ffff94fbc19d8fe0 RDI: ffff94fbc1b2b300
RBP: ffffb373800d7e70 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000000b R11: 000000000000000b R12: ffffb3738001fd78
R13: ffff94fbc2fcfc00 R14: ffffb3738001fd80 R15: 0000000000000048
FS: 0000000000000000(0000) GS:ffff94fc3da00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff94fc30201000 CR3: 000000006f610000 CR4: 00000000000406f0
Call Trace:
<TASK>
? __wake_up
process_one_work
worker_thread
? rescuer_thread
kthread
? kthread_complete_and_exit
ret_from_fork
</TASK>
Modules linked in:
Fixes: 28a99e95f55c ("x86/amd: Use IBPB for firmware calls")
Reported-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220728122602.2500509-1-cascardo@canonical.com
Linus Torvalds [Fri, 29 Jul 2022 03:34:59 +0000 (20:34 -0700)]
Merge tag 'drm-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm
Pull drm fix from Dave Airlie:
"Quiet extra week, just a single fix for i915 workaround with execlist
backend.
i915:
- Further reset robustness improvements for execlists [Wa_22011802037]"
* tag 'drm-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm:
drm/i915/reset: Add additional steps for Wa_22011802037 for execlist backend
Dave Airlie [Fri, 29 Jul 2022 01:39:13 +0000 (11:39 +1000)]
Merge tag 'drm-intel-fixes-2022-07-28-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Further reset robustness improvements for execlists [Wa_22011802037] (Umesh Nerlige Ramappa)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YuJIWaEbKcs/q0NY@tursulin-desk
Alistair Popple [Wed, 20 Jul 2022 06:27:45 +0000 (16:27 +1000)]
nouveau/svm: Fix to migrate all requested pages
Users may request that pages from an OpenCL SVM allocation be migrated
to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
nouveau_dmem_migrate_vma() to do the migration. If the total range to be
migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the
starting address means that only the first chunk will get migrated.
Fix the calculation so that the entire range will get migrated if
possible.
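The chunked walk can be sketched as follows. The message does not show the exact form of the typo, so this only illustrates the intended loop shape: the start of each chunk must advance to the end of the previous one. migrate_range_sketch() is a hypothetical name, and the loop accumulates lengths where real code would migrate pages.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk [start, end) in chunks of at most 'chunk' bytes and return the
 * number of bytes covered.
 */
size_t migrate_range_sketch(size_t start, size_t end, size_t chunk)
{
	size_t covered = 0;
	size_t cur = start;

	assert(chunk > 0);

	while (cur < end) {
		size_t next = (end - cur > chunk) ? cur + chunk : end;

		covered += next - cur;	/* stand-in for migrating [cur, next) */
		cur = next;		/* advance the start for the next chunk */
	}
	return covered;
}
```

If the start were not advanced correctly, only the first chunk would be covered; with the advance in place the returned length equals the whole requested range.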
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Fixes: e3d8b0890469 ("drm/nouveau/svm: map pages after migration")
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220720062745.960701-1-apopple@nvidia.com
Cc: <stable@vger.kernel.org> # v5.8+
Linus Torvalds [Thu, 28 Jul 2022 18:54:59 +0000 (11:54 -0700)]
Merge tag 'net-5.19-final' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter, no known blockers for
the release.
Current release - regressions:
- wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop(), fix
taking the lock before it's initialized
- Bluetooth: mgmt: fix double free on error path
Current release - new code bugs:
- eth: ice: fix tunnel checksum offload with fragmented traffic
Previous releases - regressions:
- tcp: md5: fix IPv4-mapped support after refactoring, don't take the
pure v6 path
- Revert "tcp: change pingpong threshold to 3", improving detection
of interactive sessions
- mld: fix netdev refcount leak in mld_{query | report}_work() due to
a race
- Bluetooth:
- always set event mask on suspend, avoid early wake ups
- L2CAP: fix use-after-free caused by l2cap_chan_put
- bridge: do not send empty IFLA_AF_SPEC attribute
Previous releases - always broken:
- ping6: fix memleak in ipv6_renew_options()
- sctp: prevent null-deref caused by over-eager error paths
- virtio-net: fix the race between refill work and close, resulting
in NAPI scheduled after close and a BUG()
- macsec:
- fix three netlink parsing bugs
- avoid breaking the device state on invalid change requests
- fix a memleak in another error path
Misc:
- dt-bindings: net: ethernet-controller: rework 'fixed-link' schema
- two more batches of sysctl data race adornment"
* tag 'net-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
stmmac: dwmac-mediatek: fix resource leak in probe
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
net: ping6: Fix memleak in ipv6_renew_options().
net/funeth: Fix fun_xdp_tx() and XDP packet reclaim
sctp: leave the err path free in sctp_stream_init to sctp_stream_free
sfc: disable softirqs for ptp TX
ptp: ocp: Select CRC16 in the Kconfig.
tcp: md5: fix IPv4-mapped support
virtio-net: fix the race between refill work and close
mptcp: Do not return EINPROGRESS when subflow creation succeeds
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Bluetooth: Always set event mask on suspend
Bluetooth: mgmt: Fix double free on error path
wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
ice: do not setup vlan for loopback VSI
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: Fix VSIs unable to share unicast MAC
ice: Fix tunnel checksum offload with fragmented traffic
ice: Fix max VLANs available for VF
netfilter: nft_queue: only allow supported familes and hooks
...
Dan Carpenter [Thu, 28 Jul 2022 11:52:09 +0000 (14:52 +0300)]
stmmac: dwmac-mediatek: fix resource leak in probe
If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt()
before returning. Otherwise it is a resource leak.
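The unwind pattern behind this fix can be sketched generically. All helpers below are hypothetical stand-ins, not the stmmac API: acquire_config() plays the role of the earlier setup step and release_config() the role of the cleanup that must run when a later step fails.

```c
/* Hypothetical stand-ins; only the unwind order is the point. */
int resources_held;

static int acquire_config(void)
{
	resources_held++;
	return 0;
}

static void release_config(void)
{
	resources_held--;
}

static int configure_clocks(int fail)
{
	return fail ? -1 : 0;
}

int probe_sketch(int clocks_fail)
{
	int ret;

	ret = acquire_config();
	if (ret)
		return ret;

	ret = configure_clocks(clocks_fail);
	if (ret)
		goto err_release_config;	/* do not return directly here */

	return 0;

err_release_config:
	release_config();
	return ret;
}
```

A failed clock configuration leaves resources_held at zero, while a successful probe keeps the resource held for the lifetime of the device.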
Fixes: fa4b3ca60e80 ("stmmac: dwmac-mediatek: fix clock issue")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ziyang Xuan [Thu, 28 Jul 2022 01:33:07 +0000 (09:33 +0800)]
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
Changing a net device's MTU to a value smaller than IPV6_MIN_MTU, or
unregistering the device while a route is being matched, may trigger a
null-ptr-deref bug for ip6_ptr with some probability, as follows.
=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr 0000000000000308 by task ping6/263
CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
Call trace:
dump_backtrace+0x1a8/0x230
show_stack+0x20/0x70
dump_stack_lvl+0x68/0x84
print_report+0xc4/0x120
kasan_report+0x84/0x120
__asan_load4+0x94/0xd0
find_match.part.0+0x70/0x134
__find_rr_leaf+0x408/0x470
fib6_table_lookup+0x264/0x540
ip6_pol_route+0xf4/0x260
ip6_pol_route_output+0x58/0x70
fib6_rule_lookup+0x1a8/0x330
ip6_route_output_flags_noref+0xd8/0x1a0
ip6_route_output_flags+0x58/0x160
ip6_dst_lookup_tail+0x5b4/0x85c
ip6_dst_lookup_flow+0x98/0x120
rawv6_sendmsg+0x49c/0xc70
inet_sendmsg+0x68/0x94
Reproducer as follows:
Firstly, prepare conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1
Secondly, execute the following two commands in two ssh windows
respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done
$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done
This happens because ip6_ptr is first set to NULL in addrconf_ifdown(), and
then ip6_ignore_linkdown() accesses ip6_ptr directly without a NULL check.
cpu0                    cpu1
fib6_table_lookup
__find_rr_leaf
                        addrconf_notify [ NETDEV_CHANGEMTU ]
                        addrconf_ifdown
                        RCU_INIT_POINTER(dev->ip6_ptr, NULL)
find_match
ip6_ignore_linkdown
So add a NULL check for ip6_ptr before using it in ip6_ignore_linkdown() to
fix the null-ptr-deref bug.
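The shape of such a check can be sketched with toy structures. The struct names, the field and the value returned for a missing pointer are assumptions for illustration; the real code also reads the pointer under RCU, which this userspace sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures (assumptions for illustration). */
struct inet6_dev_sketch {
	bool ignore_routes_with_linkdown;
};

struct net_device_sketch {
	struct inet6_dev_sketch *ip6_ptr;	/* may be set to NULL concurrently */
};

/* Read the pointer once, then test it before dereferencing. */
bool ip6_ignore_linkdown_sketch(const struct net_device_sketch *dev)
{
	const struct inet6_dev_sketch *idev = dev->ip6_ptr;

	if (!idev)
		return true;	/* assumed default when IPv6 state is gone */
	return idev->ignore_routes_with_linkdown;
}
```

Loading the pointer into a local before the test matters: checking dev->ip6_ptr and then dereferencing it again would leave the same window open.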
Fixes: dcd1f572954f ("net/ipv6: Remove fib6_idev")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Thu, 28 Jul 2022 01:22:20 +0000 (18:22 -0700)]
net: ping6: Fix memleak in ipv6_renew_options().
When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy(). As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
struct ipv6_sr_hdr *hdr;
char data[24] = {0};
int fd;
hdr = (struct ipv6_sr_hdr *)data;
hdr->hdrlen = 2;
hdr->type = IPV6_SRCRT_TYPE_4;
fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
close(fd);
To fix memory leaks, let's add a destroy function.
Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range. The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0). Thus, the local DoS does
not succeed until we change the default value. However, at least
Ubuntu/Fedora/RHEL loosen it.
$ cat /usr/lib/sysctl.d/50-default.conf
...
-net.ipv4.ping_group_range = 0 2147483647
Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.
setsockopt
IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
IPV6_HOPOPTS (inet6_sk(sk)->opt)
IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
IPV6_RTHDR (inet6_sk(sk)->opt)
IPV6_DSTOPTS (inet6_sk(sk)->opt)
IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)
getsockopt
IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)
For the record, I left a different splat from syzbot's one.
unreferenced object 0xffff888006270c60 (size 96):
comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
hex dump (first 32 bytes):
01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
[<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
[<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
[<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
[<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
[<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Jul 2022 09:31:12 +0000 (10:31 +0100)]
watch_queue: Fix missing locking in add_watch_to_object()
If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.
KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.
While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list. The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.
Fix this by locking the watcher list in add_watch_to_object().
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: keyrings@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 28 Jul 2022 09:31:06 +0000 (10:31 +0100)]
watch_queue: Fix missing rcu annotation
Since __post_watch_notification() walks wlist->watchers with only the
RCU read lock held, we need to use RCU methods to add to the list (we
already use RCU methods to remove from the list).
Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
hlist_add_head() for that list.
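Why the add side needs a publishing primitive can be sketched with C11 atomics. This is a userspace analogy, not the kernel's hlist or RCU API: toy_watch, toy_watch_list, toy_add_head_publish() and toy_count() are hypothetical names, and a release store stands in for the ordering an RCU-aware insert provides to lockless readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_watch {
    int id;
    struct toy_watch *_Atomic next;
};

struct toy_watch_list {
    struct toy_watch *_Atomic first;
};

/* Writer side. Assumes the caller serialises writers (lock not modelled). */
static void toy_add_head_publish(struct toy_watch_list *wl, struct toy_watch *w)
{
    struct toy_watch *old;

    assert(wl && w);
    /* Finish initialising the node before it becomes reachable. */
    old = atomic_load_explicit(&wl->first, memory_order_relaxed);
    atomic_store_explicit(&w->next, old, memory_order_relaxed);
    /* Release store: a reader that observes w also observes w->id, w->next. */
    atomic_store_explicit(&wl->first, w, memory_order_release);
}

/* Reader side: lockless walk that returns the number of nodes seen. */
static int toy_count(struct toy_watch_list *wl)
{
    int n = 0;
    struct toy_watch *w;

    for (w = atomic_load_explicit(&wl->first, memory_order_acquire); w;
         w = atomic_load_explicit(&w->next, memory_order_acquire))
        n++;
    return n;
}
```

A plain store to the list head would give a concurrent reader no guarantee that the new node's fields are visible yet, which is the gap the non-RCU insert leaves open.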
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dimitris Michailidis [Tue, 26 Jul 2022 21:59:23 +0000 (14:59 -0700)]
net/funeth: Fix fun_xdp_tx() and XDP packet reclaim
The current implementation of fun_xdp_tx(), used for XDP_TX, is
incorrect in that it takes an address/length pair and later releases it
with page_frag_free(). It is OK for XDP_TX but the same code is used by
ndo_xdp_xmit. In that case it loses the XDP memory type and releases the
packet incorrectly for some of the types. Assorted breakage follows.
Change fun_xdp_tx() to take xdp_frame and rely on xdp_return_frame() in
reclaim.
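The loss of the memory type can be sketched with a toy model. All names here (toy_mem_type, toy_frame, toy_stats and both reclaim helpers) are hypothetical and only count which release path is taken; they are not the funeth or XDP core code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy memory types; the real XDP memory model has more variants. */
enum toy_mem_type { TOY_MEM_PAGE_FRAG, TOY_MEM_PAGE_POOL };

struct toy_frame {
    void *data;
    unsigned int len;
    enum toy_mem_type mem_type;
};

struct toy_stats {
    int frag_frees;     /* released as a page fragment */
    int pool_returns;   /* returned to a page pool */
};

/* Reclaim that only kept address/length: the type is gone, so every
 * buffer is released as if it were a page fragment. */
static void toy_reclaim_addr_len(struct toy_stats *st, void *data,
                                 unsigned int len)
{
    (void)data;
    (void)len;
    st->frag_frees++;
}

/* Reclaim that kept the whole frame: dispatch on the recorded type. */
static void toy_reclaim_frame(struct toy_stats *st, const struct toy_frame *f)
{
    assert(st && f);
    switch (f->mem_type) {
    case TOY_MEM_PAGE_FRAG:
        st->frag_frees++;
        break;
    case TOY_MEM_PAGE_POOL:
        st->pool_returns++;
        break;
    }
}
```

For a page-pool frame, the address/length variant takes the wrong release path, while the frame-based variant takes the matching one.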
Fixes: db37bc177dae ("net/funeth: add the data path")
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Link: https://lore.kernel.org/r/20220726215923.7887-1-dmichail@fungible.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 28 Jul 2022 02:56:28 +0000 (19:56 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-07-26
This series contains updates to ice driver only.
Przemyslaw corrects accounting for VF VLANs to allow for the correct number
of VLANs for an untrusted VF. He also corrects an issue with checksum offload
on VXLAN tunnels.
Ani allows for two VSIs to share the same MAC address.
Maciej corrects checked bits for descriptor completion of loopback
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: do not setup vlan for loopback VSI
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: Fix VSIs unable to share unicast MAC
ice: Fix tunnel checksum offload with fragmented traffic
ice: Fix max VLANs available for VF
====================
Link: https://lore.kernel.org/r/20220726204646.2171589-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Mon, 25 Jul 2022 22:11:06 +0000 (18:11 -0400)]
sctp: leave the err path free in sctp_stream_init to sctp_stream_free
A NULL pointer dereference was reported by Wei Chen:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:__list_del_entry_valid+0x26/0x80
Call Trace:
<TASK>
sctp_sched_dequeue_common+0x1c/0x90
sctp_sched_prio_dequeue+0x67/0x80
__sctp_outq_teardown+0x299/0x380
sctp_outq_free+0x15/0x20
sctp_association_free+0xc3/0x440
sctp_do_sm+0x1ca7/0x2210
sctp_assoc_bh_rcv+0x1f6/0x340
This happens when calling sctp_sendmsg() without connecting to the server
first. In this case, a data chunk is already queued in the client's send
queue when the INIT_ACK from the server is processed in sctp_process_init(),
which calls sctp_stream_init() to allocate stream_in. If allocating stream_in
fails, all of stream_out is freed in sctp_stream_init()'s err path. Then,
when the asoc is freed, dequeuing this data chunk crashes because stream_out
is missing.
Since stream_out can't be freed before all data has been dequeued from the
send queue, this patch fixes the issue by moving the err path freeing of
stream_out/in from sctp_stream_init() to sctp_stream_free(), which is
eventually called when freeing the asoc in sctp_association_free(). This fix
also makes the code in sctp_process_init() clearer.
Note that when sctp_stream_init() fails in sctp_association_init(),
sctp_association_free() will not be called, so in that case it should
go to the 'stream_free' err path to free the stream instead of 'fail_init'.
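The ownership rule described above can be sketched in userspace: a failed init leaves what it already allocated in place, and a single free routine releases everything later. The names toy_stream, toy_stream_init() and toy_stream_free(), as well as the fail_in switch used to simulate an allocation failure, are hypothetical and not the SCTP code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_stream {
    int *out;   /* models stream_out */
    int *in;    /* models stream_in */
};

/* fail_in simulates an allocation failure for the inbound array. */
static int toy_stream_init(struct toy_stream *s, size_t outcnt, size_t incnt,
                           bool fail_in)
{
    assert(s && outcnt > 0 && incnt > 0);
    s->out = calloc(outcnt, sizeof(*s->out));
    if (!s->out)
        return -1;
    s->in = fail_in ? NULL : calloc(incnt, sizeof(*s->in));
    if (!s->in)
        return -1;  /* keep s->out: queued data may still refer to it */
    return 0;
}

/* Single release point; safe to call on partially initialised state. */
static void toy_stream_free(struct toy_stream *s)
{
    free(s->out);
    free(s->in);
    s->out = NULL;
    s->in = NULL;
}
```

After a failed init the outbound array is still valid, so anything queued against it can be drained before toy_stream_free() runs; a caller that never reaches the normal teardown has to call toy_stream_free() itself.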
Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>