Parav Pandit [Thu, 12 Nov 2020 06:39:59 +0000 (08:39 +0200)]
vdpa: Add missing comment for virtqueue count
Add missing comment for number of virtqueue.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20201112064005.349268-2-parav@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Enrico Weigelt, metux IT consult [Wed, 2 Dec 2020 11:19:31 +0000 (12:19 +0100)]
uapi: virtio_ids: add missing device type IDs from OASIS spec
The OASIS virtio spec (1.1) defines several IDs that aren't reflected
in the header yet. Fixing this by adding the missing IDs, even though
they're not yet used by the kernel yet.
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Link: https://lore.kernel.org/r/20201202111931.31953-2-info@metux.net
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Enrico Weigelt, metux IT consult [Wed, 2 Dec 2020 11:19:30 +0000 (12:19 +0100)]
uapi: virtio_ids.h: consistent indentions
Fixing the differing indentions to be consistent and properly aligned.
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Link: https://lore.kernel.org/r/20201202111931.31953-1-info@metux.net
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Zhang Changzhong [Fri, 4 Dec 2020 08:43:30 +0000 (16:43 +0800)]
vhost scsi: fix error return code in vhost_scsi_set_endpoint()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes:
25b98b64e284 ("vhost scsi: alloc cmds per vq instead of session")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1607071411-33484-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Dan Carpenter [Fri, 4 Dec 2020 14:23:36 +0000 (17:23 +0300)]
virtio_ring: Fix two use after free bugs
The "vq" struct is added to the "vdev->vqs" list prematurely. If we
encounter an error later in the function then the "vq" is freed, but
since it is still on the list that could lead to a use after free bug.
Fixes:
cbeedb72b97a ("virtio_ring: allocate desc state for split ring separately")
Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de>
Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8pGaG/zkI3jk8mk@mwanda
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Dan Carpenter [Fri, 4 Dec 2020 14:23:16 +0000 (17:23 +0300)]
virtio_net: Fix error code in probe()
Set a negative error code intead of returning success if the MTU has
been changed to something invalid.
Fixes:
fe36cbe0671e ("virtio_net: clear MTU when out of range")
Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de>
Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8pGVJSeeCdII1Ys@mwanda
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Dan Carpenter [Fri, 4 Dec 2020 14:23:00 +0000 (17:23 +0300)]
virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
There is a copy and paste bug in the error handling of this code and
it uses "ring_dma_addr" three times instead of "device_event_dma_addr"
and "driver_event_dma_addr".
Fixes:
1ce9e6055fa0 (" virtio_ring: introduce packed ring support")
Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de>
Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8pGRJlEzyn+04u2@mwanda
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Peng Fan [Wed, 9 Dec 2020 08:42:05 +0000 (16:42 +0800)]
tools/virtio: add barrier for aarch64
Add barrier for aarch64 for cross compiling, and most are from Linux Kernel.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20201209084205.24062-4-peng.fan@oss.nxp.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Peng Fan [Wed, 9 Dec 2020 08:42:04 +0000 (16:42 +0800)]
tools/virtio: add krealloc_array
krealloc_array is used in drivers/vhost/vringh.c, add it to avoid build
failure.
Drop WARN_ON_ONCE, because duplicated with the one in bug.h
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20201209084205.24062-3-peng.fan@oss.nxp.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Peng Fan [Wed, 9 Dec 2020 08:42:03 +0000 (16:42 +0800)]
tools/virtio: include asm/bug.h
WARN_ON is used in drivers/vhost/vringh.c, to avoid build failure,
need include asm/bug.h
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20201209084205.24062-2-peng.fan@oss.nxp.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Eli Cohen [Wed, 9 Dec 2020 14:00:04 +0000 (16:00 +0200)]
vdpa/mlx5: Use write memory barrier after updating CQ index
Make sure to put dma write memory barrier after updating CQ consumer
index so the hardware knows that there are available CQE slots in the
queue.
Failure to do this can cause the update of the RX doorbell record to get
updated before the CQ consumer index resulting in CQ overrun.
Fixes:
1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20201209140004.15892-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Max Gurtovoy [Tue, 15 Dec 2020 14:42:56 +0000 (15:42 +0100)]
vdpa: split vdpasim to core and net modules
Introduce new vdpa_sim_net and vdpa_sim (core) drivers. This is a
preparation for adding a vdpa simulator module for block devices.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
[sgarzare: various cleanups/fixes]
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-19-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:55 +0000 (15:42 +0100)]
vdpa_sim: split vdpasim_virtqueue's iov field in out_iov and in_iov
vringh_getdesc_iotlb() manages 2 iovs for writable and readable
descriptors. This is very useful for the block device, where for
each request we have both types of descriptor.
Let's split the vdpasim_virtqueue's iov field in out_iov and
in_iov to use them with vringh_getdesc_iotlb().
We are using VIRTIO terminology for "out" (readable by the device)
and "in" (writable by the device) descriptors.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-18-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:54 +0000 (15:42 +0100)]
vdpa_sim: make vdpasim->buffer size configurable
Allow each device to specify the size of the buffer allocated
in vdpa_sim.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-17-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:53 +0000 (15:42 +0100)]
vdpa_sim: use kvmalloc to allocate vdpasim->buffer
The next patch will make the buffer size configurable from each
device.
Since the buffer could be larger than a page, we use kvmalloc()
instead of kmalloc().
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-16-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:52 +0000 (15:42 +0100)]
vdpa_sim: set vringh notify callback
Instead of calling the vq callback directly, we can leverage the
vringh_notify() function, adding vdpasim_vq_notify() and setting it
in the vringh notify callback.
Suggested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-15-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:51 +0000 (15:42 +0100)]
vdpa_sim: add set_config callback in vdpasim_dev_attr
The set_config callback can be used by the device to parse the
config structure modified by the driver.
The callback will be invoked, if set, in vdpasim_set_config() after
copying bytes from caller buffer into vdpasim->config buffer.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-14-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:50 +0000 (15:42 +0100)]
vdpa_sim: add get_config callback in vdpasim_dev_attr
The get_config callback can be used by the device to fill the
config structure.
The callback will be invoked in vdpasim_get_config() before copying
bytes into caller buffer.
Move vDPA-net config updates from vdpasim_set_features() in the
new vdpasim_net_get_config() callback.
This is safe since in vdpa_get_config() we already check that
.set_features() callback is called before .get_config().
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-13-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:49 +0000 (15:42 +0100)]
vdpa_sim: make 'config' generic and usable for any device type
Add new 'config_size' attribute in 'vdpasim_dev_attr' and allocates
'config' dynamically to support any device types.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-12-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:48 +0000 (15:42 +0100)]
vdpa_sim: store parsed MAC address in a buffer
As preparation for the next patches, we store the MAC address,
parsed during the vdpasim_create(), in a buffer that will be used
to fill 'config' together with other configurations.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-11-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:47 +0000 (15:42 +0100)]
vdpa_sim: add work_fn in vdpasim_dev_attr
Rename vdpasim_work() in vdpasim_net_work() and add it to
the vdpasim_dev_attr structure.
Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-10-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:46 +0000 (15:42 +0100)]
vdpa_sim: add supported_features field in vdpasim_dev_attr
Introduce a new VDPASIM_FEATURES macro with the generic features
supported by the vDPA simulator, and VDPASIM_NET_FEATURES macro with
vDPA-net features.
Add 'supported_features' field in vdpasim_dev_attr, to allow devices
to specify their features.
Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-9-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:45 +0000 (15:42 +0100)]
vdpa_sim: add device id field in vdpasim_dev_attr
Remove VDPASIM_DEVICE_ID macro and add 'id' field in vdpasim_dev_attr,
that will be returned by vdpasim_get_device_id().
Use VIRTIO_ID_NET for vDPA-net simulator device id.
Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-8-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:44 +0000 (15:42 +0100)]
vdpa_sim: add struct vdpasim_dev_attr for device attributes
vdpasim_dev_attr will contain device specific attributes. We starting
moving the number of virtqueues (i.e. nvqs) to vdpasim_dev_attr.
vdpasim_create() creates a new vDPA simulator following the device
attributes defined in the vdpasim_dev_attr parameter.
Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-7-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:43 +0000 (15:42 +0100)]
vdpa_sim: rename vdpasim_config_ops variables
These variables store generic callbacks used by the vDPA simulator
core, so we can remove the 'net' word in their names.
Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-6-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:42 +0000 (15:42 +0100)]
vdpa_sim: make IOTLB entries limit configurable
Some devices may require a higher limit for the number of IOTLB
entries, so let's make it configurable through a module parameter.
By default, it's initialized with the current limit (2048).
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-5-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Max Gurtovoy [Tue, 15 Dec 2020 14:42:41 +0000 (15:42 +0100)]
vdpa_sim: remove hard-coded virtq count
Add a new attribute that will define the number of virt queues to be
created for the vdpasim device.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
[sgarzare: replace kmalloc_array() with kcalloc()]
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-4-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:40 +0000 (15:42 +0100)]
vdpa_sim: remove unnecessary headers inclusion
Some headers are not necessary, so let's remove them to do
some cleaning.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-3-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefano Garzarella [Tue, 15 Dec 2020 14:42:39 +0000 (15:42 +0100)]
vdpa: remove unnecessary 'default n' in Kconfig entries
'default n' is not necessary since it is already the default when
nothing is specified.
Suggested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201215144256.155342-2-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Christophe JAILLET [Sun, 29 Nov 2020 12:54:34 +0000 (13:54 +0100)]
vdpa: ifcvf: Use dma_set_mask_and_coherent to simplify code
'pci_set_dma_mask()' + 'pci_set_consistent_dma_mask()' can be replaced by
an equivalent 'dma_set_mask_and_coherent()' which is much less verbose.
While at it, fix a typo (s/confiugration/configuration)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20201129125434.1462638-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tian Tao [Wed, 11 Nov 2020 01:14:48 +0000 (09:14 +0800)]
vhost_vdpa: switch to vmemdup_user()
Replace opencoded alloc and copy with vmemdup_user()
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Link: https://lore.kernel.org/r/1605057288-60400-1-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:15 +0000 (14:38 +0100)]
virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
Let's add a safe mechanism to unplug memory, avoiding long/endless loops
when trying to offline memory - similar to in SBM.
Fake-offline all memory (via alloc_contig_range()) before trying to
offline+remove it. Use this mode as default, but allow to enable the other
mode explicitly (which could give better memory hotunplug guarantees in
some environments).
The "unsafe" mode can be enabled e.g., via virtio_mem.bbm_safe_unplug=0
on the cmdline.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-30-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:14 +0000 (14:38 +0100)]
virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
Let's try to unplug completely offline big blocks first. Then, (if
enabled via unplug_offline) try to offline and remove whole big blocks.
No locking necessary - we can deal with concurrent onlining/offlining
just fine.
Note1: This is sub-optimal and might be dangerous in some environments: we
could end up in an infinite loop when offlining (e.g., long-term pinnings),
similar as with DIMMs. We'll introduce safe memory hotunplug via
fake-offlining next, and use this basic mode only when explicitly enabled.
Note2: Without ZONE_MOVABLE, memory unplug will be extremely unreliable
with bigger block sizes.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-29-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:13 +0000 (14:38 +0100)]
mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
virtio-mem soon wants to use offline_and_remove_memory() memory that
exceeds a single Linux memory block (memory_block_size_bytes()). Let's
remove that restriction.
Let's remember the old state and try to restore that if anything goes
wrong. While re-onlining can, in general, fail, it's highly unlikely to
happen (usually only when a notifier fails to allocate memory, and these
are rather rare).
This will be used by virtio-mem to offline+remove memory ranges that are
bigger than a single memory block - for example, with a device block
size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
block size of 128MB.
While we could compress the state into 2 bit, using 8 bit is much
easier.
This handling is similar, but different to acpi_scan_try_to_offline():
a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
optimization is still relevant - it should only apply to ZONE_NORMAL
(where we have no guarantees). If relevant, we can always add it.
b) acpi_scan_try_to_offline() simply onlines all memory in case
something goes wrong. It doesn't restore previous online type. Let's do
that, so we won't overwrite what e.g., user space configured.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
David Hildenbrand [Thu, 12 Nov 2020 13:38:12 +0000 (14:38 +0100)]
virtio-mem: allow to force Big Block Mode (BBM) and set the big block size
Let's allow to force BBM, even if subblocks would be possible. Take care
of properly calculating the first big block id, because the start
address might no longer be aligned to the big block size.
Also, allow to manually configure the size of Big Blocks.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-27-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:11 +0000 (14:38 +0100)]
virtio-mem: Big Block Mode (BBM) memory hotplug
Currently, we do not support device block sizes that exceed the Linux
memory block size. For example, having a device block size of 1 GiB (e.g.,
gigantic pages in the hypervisor) won't work with 128 MiB Linux memory
blocks.
Let's implement Big Block Mode (BBM), whereby we add/remove at least
one Linux memory block at a time. With a 1 GiB device block size, a Big
Block (BB) will cover 8 Linux memory blocks.
We'll keep registering the online_page_callback machinery, it will be used
for safe memory hotunplug in BBM next.
Note: BBM is properly prepared for variable-sized Linux memory
blocks that we might see in the future. So we won't care how many Linux
memory blocks a big block actually spans, and how the memory notifier is
called.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-26-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:10 +0000 (14:38 +0100)]
virtio-mem: factor out adding/removing memory from Linux
Let's use wrappers for the low-level functions that dev_dbg/dev_warn
and work on addr + size, such that we can reuse them for adding/removing
in other granularity.
We only warn when adding memory failed, because that's something to pay
attention to. We won't warn when removing failed, we'll reuse that in
racy context soon (and we do have proper BUG_ON() statements in the
current cases where it must never happen).
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-25-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:09 +0000 (14:38 +0100)]
virtio-mem: memory notifier callbacks are specific to Sub Block Mode (SBM)
Let's rename accordingly.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-24-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:08 +0000 (14:38 +0100)]
virito-mem: existing (un)plug functions are specific to Sub Block Mode (SBM)
Let's rename them accordingly. virtio_mem_plug_request() and
virtio_mem_unplug_request() will be handled separately.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-23-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:07 +0000 (14:38 +0100)]
virtio-mem: memory block ids are specific to Sub Block Mode (SBM)
Let's move first_mb_id/next_mb_id/last_usable_mb_id accordingly.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-22-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:06 +0000 (14:38 +0100)]
virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block Mode (SBM)
Let's rename to "sbs_per_mb" and "sb_size" and move accordingly.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-21-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:05 +0000 (14:38 +0100)]
virito-mem: subblock states are specific to Sub Block Mode (SBM)
Let's rename and move accordingly. While at it, rename sb_bitmap to
"sb_states".
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-20-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:04 +0000 (14:38 +0100)]
virtio-mem: memory block states are specific to Sub Block Mode (SBM)
let's use a new "sbm" sub-struct to hold SBM-specific state and rename +
move applicable definitions, functions, and variables (related to
memory block states).
While at it:
- Drop the "_STATE" part from memory block states
- Rename "nb_mb_state" to "mb_count"
- "set_mb_state" / "get_mb_state" vs. "mb_set_state" / "mb_get_state"
- Don't use lengthy "enum virtio_mem_smb_mb_state", simply use "uint8_t"
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-19-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:03 +0000 (14:38 +0100)]
virito-mem: document Sub Block Mode (SBM)
Let's add some documentation for the current mode - Sub Block Mode (SBM) -
to prepare for a new mode - Big Block Mode (BBM).
Follow-up patches will properly factor out the existing Sub Block Mode
(SBM) and implement Big Block Mode (BBM).
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-18-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:02 +0000 (14:38 +0100)]
virtio-mem: generalize handling when memory is getting onlined deferred
We don't want to add too much memory when it's not getting onlined
immediately, to avoid running OOM. Generalize the handling, to avoid
making use of memory block states. Use a threshold of 1 GiB for now.
Properly adjust the offline size when adding/removing memory. As we are
not always protected by a lock when touching the offline size, use an
atomic64_t. We don't care about races (e.g., someone offlining memory
while we are adding more), only about consistent values.
(1 GiB needs a memmap of ~16MiB - which sounds reasonable even for
setups with little boot memory and (possibly) one virtio-mem device per
node)
We don't want to retrigger when onlining is caused immediately by our
action (e.g., adding memory which immediately gets onlined), so use a
flag to indicate if the workqueue is active and use that as an
indicator whether to trigger a retry. This will also be especially relevant
for Big Block Mode (BBM), whereby we might re-online memory in case
offlining of another memory block failed.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-17-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:01 +0000 (14:38 +0100)]
virtio-mem: don't always trigger the workqueue when offlining memory
Let's trigger from offlining code only when we're not allowed to unplug
online memory. Handle the other case (memmap possibly freeing up another
memory block) when actually removing memory. We now also properly handle
the case when removing already offline memory blocks via
virtio_mem_mb_remove(). When removing via virtio_mem_remove(), when
unloading the driver, virtio_mem_retry() is a NOP and safe to use.
While at it, move retry handling when offlining out of
virtio_mem_notify_offline(), to share it with Big Block Mode (BBM)
soon.
This is a preparation for Big Block Mode (BBM), whereby we can see some
temporary offlining of memory blocks without actually making progress.
Imagine you have a Big Block that spans to Linux memory blocks. Assume
the first Linux memory blocks has no unmovable data on it. When we would
call offline_and_remove_memory() on the big block, we would
1. Try to offline the first block. Works, notifiers triggered.
virtio_mem_retry() called.
2. Try to offline the second block. Does not work.
3. Re-online first block.
4. Exit to main loop, exit workqueue.
5. Retry immediately (due to virtio_mem_retry()), go to 1.
The result are endless retries.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-16-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:38:00 +0000 (14:38 +0100)]
virtio-mem: drop last_mb_id
No longer used, let's drop it.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-15-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:59 +0000 (14:37 +0100)]
virtio-mem: generalize virtio_mem_overlaps_range()
Avoid using memory block ids. While at it, use uint64_t for
address/size.
This is a preparation for Big Block Mode (BBM).
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-14-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:58 +0000 (14:37 +0100)]
virtio-mem: generalize virtio_mem_owned_mb()
Avoid using memory block ids. Rename it to virtio_mem_contains_range().
This is a preparation for Big Block Mode (BBM).
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-13-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:57 +0000 (14:37 +0100)]
virtio-mem: generalize check for added memory
Let's check by traversing busy system RAM resources instead, to avoid
relying on memory block states.
Don't use walk_system_ram_range(), as that works on pages and we want to
use the bare addresses we have easily at hand.
This is a preparation for Big Block Mode (BBM), which won't have memory
block states.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-12-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:56 +0000 (14:37 +0100)]
virtio-mem: retry fake-offlining via alloc_contig_range() on ZONE_MOVABLE
ZONE_MOVABLE is supposed to give some guarantees, yet,
alloc_contig_range() isn't prepared to properly deal with some racy
cases properly (e.g., temporary page pinning when exiting processed, PCP).
Retry 5 times for now. There is certainly room for improvement in the
future.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-11-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:55 +0000 (14:37 +0100)]
virtio-mem: factor out handling of fake-offline pages in memory notifier
Let's factor out the core pieces and place the implementation next to
virtio_mem_fake_offline(). We'll reuse this functionality soon.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-10-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:54 +0000 (14:37 +0100)]
virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
... which now matches virtio_mem_fake_online(). We'll reuse this
functionality soon.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-9-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:53 +0000 (14:37 +0100)]
virtio-mem: print debug messages from virtio_mem_send_*_request()
Let's move the existing dev_dbg() into the functions, print if something
went wrong, and also print for virtio_mem_send_unplug_all_request().
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-8-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:52 +0000 (14:37 +0100)]
virtio-mem: factor out calculation of the bit number within the subblock bitmap
The calculation is already complicated enough, let's limit it to one
location.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-7-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:51 +0000 (14:37 +0100)]
virtio-mem: use "unsigned long" for nr_pages when fake onlining/offlining
No harm done, but let's be consistent.
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-6-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:50 +0000 (14:37 +0100)]
virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
We can drop rc2, we don't actually need the value.
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-5-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:49 +0000 (14:37 +0100)]
virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling
Let's use pageblock_nr_pages and MAX_ORDER_NR_PAGES instead where
possible to simplify.
Add a comment why we have that restriction for now.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-4-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:48 +0000 (14:37 +0100)]
virtio-mem: more precise calculation in virtio_mem_mb_state_prepare_next_mb()
We actually need one byte less (next_mb_id is exclusive, first_mb_id is
inclusive). While at it, compact the code.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-3-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
David Hildenbrand [Thu, 12 Nov 2020 13:37:47 +0000 (14:37 +0100)]
virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
Let's determine the target nid only once in case we have none specified -
usually, we'll end up with node 0 either way.
Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201112133815.13332-2-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Sun, 13 Dec 2020 22:41:30 +0000 (14:41 -0800)]
Linux 5.10
Linus Torvalds [Sun, 13 Dec 2020 19:31:19 +0000 (11:31 -0800)]
Merge tag 'x86-urgent-2020-12-13' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of x86 and membarrier fixes:
- Correct a few problems in the x86 and the generic membarrier
implementation. Small corrections for assumptions about visibility
which have turned out not to be true.
- Make the PAT bits for memory encryption correct vs 4K and 2M/1G
page table entries as they are at a different location.
- Fix a concurrency issue in the the local bandwidth readout of
resource control leading to incorrect values
- Fix the ordering of allocating a vector for an interrupt. The order
missed to respect the provided cpumask when the first attempt of
allocating node local in the mask fails. It then tries the node
instead of trying the full provided mask first. This leads to
erroneous error messages and breaking the (user) supplied affinity
request. Reorder it.
- Make the INT3 padding detection in optprobe work correctly"
* tag 'x86-urgent-2020-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Fix optprobe to detect INT3 padding correctly
x86/apic/vector: Fix ordering in vector assignment
x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
x86/mm/mem_encrypt: Fix definition of PMD_FLAGS_DEC_WP
membarrier: Execute SYNC_CORE on the calling thread
membarrier: Explicitly sync remote cores when SYNC_CORE is requested
membarrier: Add an actual barrier before rseq_preempt()
x86/membarrier: Get rid of a dubious optimization
Linus Torvalds [Sun, 13 Dec 2020 18:36:23 +0000 (10:36 -0800)]
Merge tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"This should be it for 5.10.
Mike and Song looked into the warning case, and thankfully it appears
the fix was pretty trivial - we can just change the md device chunk
type to unsigned int to get rid of it. They cannot currently be < 0,
and nobody is checking for that either.
We're reverting the discard changes as the corruption reports came in
very late, and there's just no time to attempt to deal with it at this
point. Reverting the changes in question is the right call for 5.10"
* tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block:
md: change mddev 'chunk_sectors' from int to unsigned
Revert "md: add md_submit_discard_bio() for submitting discard bio"
Revert "md/raid10: extend r10bio devs to raid disks"
Revert "md/raid10: pull codes that wait for blocked dev into one function"
Revert "md/raid10: improve raid10 discard request"
Revert "md/raid10: improve discard request for far layout"
Revert "dm raid: remove unnecessary discard limits for raid10"
Linus Torvalds [Sat, 12 Dec 2020 20:57:12 +0000 (12:57 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five small fixes. Four in drivers:
- hisi_sas: fix internal queue timeout
- be2iscsi: revert a prior fix causing problems
- bnx2i: add missing dependency
- storvsc: late arriving revert of a problem fix
and one in the core.
The core one is a minor change to stop paying attention to the busy
count when returning out of resources because there's a race window
where the queue might not restart due to missing returning I/O"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
Revert "scsi: storvsc: Validate length of incoming packet in storvsc_on_channel_callback()"
scsi: hisi_sas: Select a suitable queue for internal I/Os
scsi: core: Fix race between handling STS_RESOURCE and completion
scsi: be2iscsi: Revert "Fix a theoretical leak in beiscsi_create_eqs()"
scsi: bnx2i: Requires MMU
Linus Torvalds [Sat, 12 Dec 2020 20:47:46 +0000 (12:47 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"Bugfix for the AT24 EEPROM driver"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
misc: eeprom: at24: fix NVMEM name with custom AT24 device name
Linus Torvalds [Sat, 12 Dec 2020 18:08:16 +0000 (10:08 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes for ARM, x86 and tools"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
tools/kvm_stat: Exempt time-based counters
KVM: mmu: Fix SPTE encoding of MMIO generation upper half
kvm: x86/mmu: Use cpuid to determine max gfn
kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit()
selftests: kvm/set_memory_region_test: Fix race in move region test
KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
KVM: arm64: Fix handling of merging tables into a block entry
KVM: arm64: Fix memory leak on stage2 update of a valid PTE
Linus Torvalds [Sat, 12 Dec 2020 18:02:03 +0000 (10:02 -0800)]
Merge tag 'for-linus-5.10c-rc8-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"A short series fixing a regression introduced in 5.9 for running as
Xen dom0 on a system with NVMe backed storage"
* tag 'for-linus-5.10c-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: don't use page->lru for ZONE_DEVICE memory
xen: add helpers for caching grant mapping pages
Linus Torvalds [Sat, 12 Dec 2020 17:50:26 +0000 (09:50 -0800)]
Merge tag 'riscv-for-linus-5.10-rc8' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
"Just one fix. It's nothing critical, just a randconfig that wasn't
building. That said, it does seem pretty safe and is technically a
regression so I'm sending it along for 5.10:
- define get_cycles64() all the time, as it's used by most
configurations"
* tag 'riscv-for-linus-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Define get_cycles64() regardless of M-mode
Linus Torvalds [Sat, 12 Dec 2020 17:45:01 +0000 (09:45 -0800)]
Merge tag 'io_uring-5.10-2020-12-11' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two fixes in here, fixing issues introduced in this merge window"
* tag 'io_uring-5.10-2020-12-11' of git://git.kernel.dk/linux-block:
io_uring: fix file leak on error path of io ctx creation
io_uring: fix mis-seting personality's creds
Linus Torvalds [Sat, 12 Dec 2020 17:41:33 +0000 (09:41 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for cm109 stomping on its own control URB if it tries to toggle
buzzer immediately after userspace opens input device (found by
syzcaller)
- another fix for Raydium touchscreens that do not like splitting
command transfers
- quirks for i8042, soc_button_array, and goodix drivers to make them
work better with certain hardware.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: goodix - add upside-down quirk for Teclast X98 Pro tablet
Input: cm109 - do not stomp on control URB
Input: i8042 - add Acer laptops to the i8042 reset list
Input: cros_ec_keyb - send 'scancodes' in addition to key events
Input: soc_button_array - add Lenovo Yoga Tablet2 1051L to the dmi_use_low_level_irq list
Input: raydium_ts_i2c - do not split tx transactions
Mike Snitzer [Sat, 12 Dec 2020 16:55:37 +0000 (11:55 -0500)]
md: change mddev 'chunk_sectors' from int to unsigned
Commit
e2782f560c29 ("Revert "dm raid: remove unnecessary discard
limits for raid10"") exposed compiler warnings introduced by commit
e0910c8e4f87 ("dm raid: fix discard limits for raid1 and raid10"):
In file included from ./include/linux/kernel.h:14,
from ./include/asm-generic/bug.h:20,
from ./arch/x86/include/asm/bug.h:93,
from ./include/linux/bug.h:5,
from ./include/linux/mmdebug.h:5,
from ./include/linux/gfp.h:5,
from ./include/linux/slab.h:15,
from drivers/md/dm-raid.c:8:
drivers/md/dm-raid.c: In function ‘raid_io_hints’:
./include/linux/minmax.h:18:28: warning: comparison of distinct pointer types lacks a cast
(!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
^~
./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’
(__typecheck(x, y) && __no_side_effects(x, y))
^~~~~~~~~~~
./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’
__builtin_choose_expr(__safe_cmp(x, y), \
^~~~~~~~~~
./include/linux/minmax.h:51:19: note: in expansion of macro ‘__careful_cmp’
#define min(x, y) __careful_cmp(x, y, <)
^~~~~~~~~~~~~
./include/linux/minmax.h:84:39: note: in expansion of macro ‘min’
__x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
^~~
drivers/md/dm-raid.c:3739:33: note: in expansion of macro ‘min_not_zero’
limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors,
^~~~~~~~~~~~
Fix this by changing the chunk_sectors member of 'struct mddev' from
int to 'unsigned int' to match the type used for the 'chunk_sectors'
member of 'struct queue_limits'. Various MD code still uses 'int' but
none of it appears to ever make use of signed int; and storing
positive signed int in unsigned is perfectly safe.
Reported-by: Song Liu <songliubraving@fb.com>
Fixes:
e2782f560c29 ("Revert "dm raid: remove unnecessary discard limits for raid10"")
Fixes:
e0910c8e4f87 ("dm raid: fix discard limits for raid1 and raid10")
Cc: stable@vger,kernel.org # e0910c8e4f87 was marked for stable@
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Masami Hiramatsu [Fri, 11 Dec 2020 07:04:17 +0000 (16:04 +0900)]
x86/kprobes: Fix optprobe to detect INT3 padding correctly
Commit
7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
changed the padding bytes between functions from NOP to INT3. However,
when optprobe decodes a target function it finds INT3 and gives up the
jump optimization.
Instead of giving up any INT3 detection, check whether the rest of the
bytes to the end of the function are INT3. If all of them are INT3,
those come from the linker. In that case, continue the optprobe jump
optimization.
[ bp: Massage commit message. ]
Fixes:
7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
Reported-by: Adam Zabrocki <pi3@pi3.com.pl>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160767025681.3880685.16021570341428835411.stgit@devnote2
Simon Beginn [Sat, 12 Dec 2020 00:17:32 +0000 (16:17 -0800)]
Input: goodix - add upside-down quirk for Teclast X98 Pro tablet
The touchscreen on the Teclast x98 Pro is also mounted upside-down in
relation to the display orientation.
Signed-off-by: Simon Beginn <linux@simonmicro.de>
Signed-off-by: Bastien Nocera <hadess@hadess.net>
Link: https://lore.kernel.org/r/20201117004253.27A5A27EFD@localhost
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Stefan Raspl [Tue, 8 Dec 2020 21:08:29 +0000 (22:08 +0100)]
tools/kvm_stat: Exempt time-based counters
The new counters halt_poll_success_ns and halt_poll_fail_ns do not count
events. Instead they provide a time, and mess up our statistics. Therefore,
we should exclude them.
Removal is currently implemented with an exempt list. If more counters like
these appear, we can think about a more general rule like excluding all
fields name "*_ns", in case that's a standing convention.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Tested-and-reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <
20201208210829.101324-1-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Maciej S. Szmigiero [Sat, 5 Dec 2020 00:48:08 +0000 (01:48 +0100)]
KVM: mmu: Fix SPTE encoding of MMIO generation upper half
Commit
cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
cleaned up the computation of MMIO generation SPTE masks, however it
introduced a bug how the upper part was encoded:
SPTE bits 52-61 were supposed to contain bits 10-19 of the current
generation number, however a missing shift encoded bits 1-10 there instead
(mostly duplicating the lower part of the encoded generation number that
then consisted of bits 1-9).
In the meantime, the upper part was shrunk by one bit and moved by
subsequent commits to become an upper half of the encoded generation number
(bits 9-17 of bits 0-17 encoded in a SPTE).
In addition to the above, commit
56871d444bc4 ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation")
has changed the SPTE bit range assigned to encode the generation number and
the total number of bits encoded but did not update them in the comment
attached to their defines, nor in the KVM MMU doc.
Let's do it here, too, since it is too trivial thing to warrant a separate
commit.
Fixes:
cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <
156700708db2a5296c5ed7a8b9ac71f1e9765c85.
1607129096.git.maciej.szmigiero@oracle.com>
Cc: stable@vger.kernel.org
[Reorganize macros so that everything is computed from the bit ranges. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 11 Dec 2020 22:29:46 +0000 (14:29 -0800)]
Merge tag 'mtd/fixes-for-5.10-rc8' of git://git./linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Second series of fixes for raw NAND drivers initiated because of a
rework of the ECC engine subsystem.
The location of the DT parsing logic got moved, breaking several
drivers which in fact were not doing the ECC engine initialization at
the right place.
These drivers have been fixed by enforcing a particular ECC engine
type and algorithm, software Hamming, while the algorithm may be
overwritten by a DT property. This merge request fixes this in the
xway, socrates, plat_nand, pasemi, orion, mpc5121, gpio, au1550 and
ams-delta controller drivers"
* tag 'mtd/fixes-for-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: xway: Do not force a particular software ECC engine
mtd: rawnand: socrates: Do not force a particular software ECC engine
mtd: rawnand: plat_nand: Do not force a particular software ECC engine
mtd: rawnand: pasemi: Do not force a particular software ECC engine
mtd: rawnand: orion: Do not force a particular software ECC engine
mtd: rawnand: mpc5121: Do not force a particular software ECC engine
mtd: rawnand: gpio: Do not force a particular software ECC engine
mtd: rawnand: au1550: Do not force a particular software ECC engine
mtd: rawnand: ams-delta: Do not force a particular software ECC engine
Linus Torvalds [Fri, 11 Dec 2020 22:26:17 +0000 (14:26 -0800)]
Merge tag 'mmc-v5.10-rc4-2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes:
MMC core:
- Fixup condition for CMD13 polling for RPMB requests
MMC host:
- mtk-sd: Fix system suspend/resume support for CQHCI
- mtd-sd: Extend SDIO IRQ fix to more variants
- sdhci-of-arasan: Fix clock registration error for Keem Bay SOC
- tmio: Bring HW to a sane state after a power off"
* tag 'mmc-v5.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: mediatek: mark PM functions as __maybe_unused
mmc: block: Fixup condition for CMD13 polling for RPMB requests
mmc: tmio: improve bringing HW to a sane state with MMC_POWER_OFF
mmc: sdhci-of-arasan: Fix clock registration error for Keem Bay SOC
mmc: mediatek: Extend recheck_sdio_irq fix to more variants
mmc: mediatek: Fix system suspend/resume support for CQHCI
Wolfram Sang [Fri, 11 Dec 2020 22:23:30 +0000 (23:23 +0100)]
Merge tag 'at24-fixes-for-v5.10' of git://git./linux/kernel/git/brgl/linux into i2c/for-current
at24 fixes for v5.10
- fix NVMEM name with custom AT24 device name
Linus Torvalds [Fri, 11 Dec 2020 22:22:42 +0000 (14:22 -0800)]
Merge tag 'zonefs-5.10-rc7' of git://git./linux/kernel/git/dlemoal/zonefs
Pull zonefs fix from Damien Le Moal:
"A single patch in this pull request to fix a BIO and page reference
leak when writing sequential zone files"
* tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: fix page reference and BIO leak
Andrii Nakryiko [Fri, 11 Dec 2020 21:36:25 +0000 (22:36 +0100)]
bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers
Remove bpf_ prefix, which causes these helpers to be reported in verifier
dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets
fix it as long as it is still possible before UAPI freezes on these helpers.
Fixes:
eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 11 Dec 2020 22:10:51 +0000 (14:10 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"8 patches.
Subsystems affected by this patch series: proc, selftests, kbuild, and
mm (pagecache, kasan, hugetlb)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/hugetlb: clear compound_nr before freeing gigantic pages
kasan: fix object remaining in offline per-cpu quarantine
elfcore: fix building with clang
initramfs: fix clang build failure
kbuild: avoid static_assert for genksyms
selftest/fpu: avoid clang warning
proc: use untagged_addr() for pagemap_read addresses
revert "mm/filemap: add static for function __add_to_page_cache_locked"
Gerald Schaefer [Fri, 11 Dec 2020 21:36:53 +0000 (13:36 -0800)]
mm/hugetlb: clear compound_nr before freeing gigantic pages
Commit
1378a5ee451a ("mm: store compound_nr as well as compound_order")
added compound_nr counter to first tail struct page, overlaying with
page->mapping. The overlay itself is fine, but while freeing gigantic
hugepages via free_contig_range(), a "bad page" check will trigger for
non-NULL page->mapping on the first tail page:
BUG: Bad page state in process bash pfn:380001
page:
00000000c35f0856 refcount:0 mapcount:0 mapping:
00000000126b68aa index:0x0 pfn:0x380001
aops:0x0
flags: 0x3ffff00000000000()
raw:
3ffff00000000000 0000000000000100 0000000000000122 0000000100000000
raw:
0000000000000000 0000000000000000 ffffffff00000000 0000000000000000
page dumped because: non-NULL mapping
Modules linked in:
CPU: 6 PID: 616 Comm: bash Not tainted 5.10.0-rc7-next-
20201208 #1
Hardware name: IBM 3906 M03 703 (LPAR)
Call Trace:
show_stack+0x6e/0xe8
dump_stack+0x90/0xc8
bad_page+0xd6/0x130
free_pcppages_bulk+0x26a/0x800
free_unref_page+0x6e/0x90
free_contig_range+0x94/0xe8
update_and_free_page+0x1c4/0x2c8
free_pool_huge_page+0x11e/0x138
set_max_huge_pages+0x228/0x300
nr_hugepages_store_common+0xb8/0x130
kernfs_fop_write+0xd2/0x218
vfs_write+0xb0/0x2b8
ksys_write+0xac/0xe0
system_call+0xe6/0x288
Disabling lock debugging due to kernel taint
This is because only the compound_order is cleared in
destroy_compound_gigantic_page(), and compound_nr is set to
1U << order == 1 for order 0 in set_compound_order(page, 0).
Fix this by explicitly clearing compound_nr for first tail page after
calling set_compound_order(page, 0).
Link: https://lkml.kernel.org/r/20201208182813.66391-2-gerald.schaefer@linux.ibm.com
Fixes:
1378a5ee451a ("mm: store compound_nr as well as compound_order")
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kuan-Ying Lee [Fri, 11 Dec 2020 21:36:49 +0000 (13:36 -0800)]
kasan: fix object remaining in offline per-cpu quarantine
We hit this issue in our internal test. When enabling generic kasan, a
kfree()'d object is put into per-cpu quarantine first. If the cpu goes
offline, object still remains in the per-cpu quarantine. If we call
kmem_cache_destroy() now, slub will report "Objects remaining" error.
=============================================================================
BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200
CPU: 3 PID: 176 Comm: cat Tainted: G B 5.10.0-rc1-00007-g4525c8781ec0-dirty #10
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2b0
show_stack+0x18/0x68
dump_stack+0xfc/0x168
slab_err+0xac/0xd4
__kmem_cache_shutdown+0x1e4/0x3c8
kmem_cache_destroy+0x68/0x130
test_version_show+0x84/0xf0
module_attr_show+0x40/0x60
sysfs_kf_seq_show+0x128/0x1c0
kernfs_seq_show+0xa0/0xb8
seq_read+0x1f0/0x7e8
kernfs_fop_read+0x70/0x338
vfs_read+0xe4/0x250
ksys_read+0xc8/0x180
__arm64_sys_read+0x44/0x58
el0_svc_common.constprop.0+0xac/0x228
do_el0_svc+0x38/0xa0
el0_sync_handler+0x170/0x178
el0_sync+0x174/0x180
INFO: Object 0x(____ptrval____) @offset=15848
INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172
stack_trace_save+0x9c/0xd0
set_track+0x64/0xf0
alloc_debug_processing+0x104/0x1a0
___slab_alloc+0x628/0x648
__slab_alloc.isra.0+0x2c/0x58
kmem_cache_alloc+0x560/0x588
test_version_show+0x98/0xf0
module_attr_show+0x40/0x60
sysfs_kf_seq_show+0x128/0x1c0
kernfs_seq_show+0xa0/0xb8
seq_read+0x1f0/0x7e8
kernfs_fop_read+0x70/0x338
vfs_read+0xe4/0x250
ksys_read+0xc8/0x180
__arm64_sys_read+0x44/0x58
el0_svc_common.constprop.0+0xac/0x228
kmem_cache_destroy test_module_slab: Slab cache still has objects
Register a cpu hotplug function to remove all objects in the offline
per-cpu quarantine when cpu is going offline. Set a per-cpu variable to
indicate this cpu is offline.
[qiang.zhang@windriver.com: fix slab double free when cpu-hotplug]
Link: https://lkml.kernel.org/r/20201204102206.20237-1-qiang.zhang@windriver.com
Link: https://lkml.kernel.org/r/1606895585-17382-2-git-send-email-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Guangye Yang <guangye.yang@mediatek.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Qian Cai <qcai@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Fri, 11 Dec 2020 21:36:46 +0000 (13:36 -0800)]
elfcore: fix building with clang
kernel/elfcore.c only contains weak symbols, which triggers a bug with
clang in combination with recordmcount:
Cannot find symbol for section 2: .text.
kernel/elfcore.o: failed
Move the empty stubs into linux/elfcore.h as inline functions. As only
two architectures use these, just use the architecture specific Kconfig
symbols to key off the declaration.
Link: https://lkml.kernel.org/r/20201204165742.3815221-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Barret Rhoden <brho@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Fri, 11 Dec 2020 21:36:42 +0000 (13:36 -0800)]
initramfs: fix clang build failure
There is only one function in init/initramfs.c that is in the .text
section, and it is marked __weak. When building with clang-12 and the
integrated assembler, this leads to a bug with recordmcount:
./scripts/recordmcount "init/initramfs.o"
Cannot find symbol for section 2: .text.
init/initramfs.o: failed
I'm not quite sure what exactly goes wrong, but I notice that this
function is only ever called from an __init function, and normally
inlined. Marking it __init as well is clearly correct and it leads to
recordmcount no longer complaining.
Link: https://lkml.kernel.org/r/20201204165742.3815221-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Barret Rhoden <brho@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Fri, 11 Dec 2020 21:36:38 +0000 (13:36 -0800)]
kbuild: avoid static_assert for genksyms
genksyms does not know or care about the _Static_assert() built-in, and
sometimes falls back to ignoring the later symbols, which causes
undefined behavior such as
WARNING: modpost: EXPORT symbol "ethtool_set_ethtool_phy_ops" [vmlinux] version generation failed, symbol will not be versioned.
ld: net/ethtool/common.o: relocation R_AARCH64_ABS32 against `__crc_ethtool_set_ethtool_phy_ops' can not be used when making a shared object
net/ethtool/common.o:(_ftrace_annotated_branch+0x0): dangerous relocation: unsupported relocation
Redefine static_assert for genksyms to avoid that.
Link: https://lkml.kernel.org/r/20201203230955.1482058-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Fri, 11 Dec 2020 21:36:35 +0000 (13:36 -0800)]
selftest/fpu: avoid clang warning
With extra warnings enabled, clang complains about the redundant
-mhard-float argument:
clang: error: argument unused during compilation: '-mhard-float' [-Werror,-Wunused-command-line-argument]
Move this into the gcc-only part of the Makefile.
Link: https://lkml.kernel.org/r/20201203223652.1320700-1-arnd@kernel.org
Fixes:
4185b3b92792 ("selftests/fpu: Add an FPU selftest")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Petteri Aimonen <jpa@git.mail.kapsi.fi>
Cc: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miles Chen [Fri, 11 Dec 2020 21:36:31 +0000 (13:36 -0800)]
proc: use untagged_addr() for pagemap_read addresses
When we try to visit the pagemap of a tagged userspace pointer, we find
that the start_vaddr is not correct because of the tag.
To fix it, we should untag the userspace pointers in pagemap_read().
I tested with 5.10-rc4 and the issue remains.
Explanation from Catalin in [1]:
"Arguably, that's a user-space bug since tagged file offsets were never
supported. In this case it's not even a tag at bit 56 as per the arm64
tagged address ABI but rather down to bit 47. You could say that the
problem is caused by the C library (malloc()) or whoever created the
tagged vaddr and passed it to this function. It's not a kernel
regression as we've never supported it.
Now, pagemap is a special case where the offset is usually not
generated as a classic file offset but rather derived by shifting a
user virtual address. I guess we can make a concession for pagemap
(only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)"
My test code is based on [2]:
A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8
userspace program:
uint64 OsLayer::VirtualToPhysical(void *vaddr) {
uint64 frame, paddr, pfnmask, pagemask;
int pagesize = sysconf(_SC_PAGESIZE);
off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0
int fd = open(kPagemapPath, O_RDONLY);
...
if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) {
int err = errno;
string errtxt = ErrorString(err);
if (fd >= 0)
close(fd);
return 0;
}
...
}
kernel fs/proc/task_mmu.c:
static ssize_t pagemap_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
...
src = *ppos;
svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54
start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000
end_vaddr = mm->task_size;
/* watch out for wraparound */
// svpfn == 0xb400007662f54
// (mm->task_size >> PAGE) == 0x8000000
if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4
start_vaddr = end_vaddr;
ret = 0;
while (count && (start_vaddr < end_vaddr)) { // we cannot visit correct entry because start_vaddr is set to end_vaddr
int len;
unsigned long end;
...
}
...
}
[1] https://lore.kernel.org/patchwork/patch/1343258/
[2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158
Link: https://lkml.kernel.org/r/20201204024347.8295-1-miles.chen@mediatek.com
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
Cc: <stable@vger.kernel.org> [5.4-]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 11 Dec 2020 21:36:27 +0000 (13:36 -0800)]
revert "mm/filemap: add static for function __add_to_page_cache_locked"
Revert commit
3351b16af494 ("mm/filemap: add static for function
__add_to_page_cache_locked") due to incompatibility with
ALLOW_ERROR_INJECTION which result in build errors.
Link: https://lkml.kernel.org/r/CAADnVQJ6tmzBXvtroBuEH6QA0H+q7yaSKxrVvVxhqr3KBZdEXg@mail.gmail.com
Tested-by: Justin Forbes <jmforbes@linuxtx.org>
Tested-by: Greg Thelen <gthelen@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Torokhov [Thu, 10 Dec 2020 04:13:24 +0000 (20:13 -0800)]
Input: cm109 - do not stomp on control URB
We need to make sure we are not stomping on the control URB that was
issued when opening the device when attempting to toggle buzzer.
To do that we need to mark it as pending in cm109_open().
Reported-and-tested-by: syzbot+150f793ac5bc18eee150@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Miquel Raynal [Thu, 3 Dec 2020 19:03:40 +0000 (20:03 +0100)]
mtd: rawnand: xway: Do not force a particular software ECC engine
Originally, commit
d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.
Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.
Add the necessary logic to be sure Hamming keeps being only a default.
Fixes:
d525914b5bd8 ("mtd: rawnand: xway: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-10-miquel.raynal@bootlin.com
Miquel Raynal [Thu, 3 Dec 2020 19:03:39 +0000 (20:03 +0100)]
mtd: rawnand: socrates: Do not force a particular software ECC engine
Originally, commit
d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.
Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.
Add the necessary logic to be sure Hamming keeps being only a default.
Fixes:
b36bf0a0fe5d ("mtd: rawnand: socrates: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-9-miquel.raynal@bootlin.com
Miquel Raynal [Thu, 3 Dec 2020 19:03:38 +0000 (20:03 +0100)]
mtd: rawnand: plat_nand: Do not force a particular software ECC engine
Originally, commit
d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.
Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.
Add the necessary logic to be sure Hamming keeps being only a default.
Fixes:
612e048e6aab ("mtd: rawnand: plat_nand: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-8-miquel.raynal@bootlin.com
Miquel Raynal [Thu, 3 Dec 2020 19:03:37 +0000 (20:03 +0100)]
mtd: rawnand: pasemi: Do not force a particular software ECC engine
Originally, commit
d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.
Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.
Add the necessary logic to be sure Hamming keeps being only a default.
Fixes:
8fc6f1f042b2 ("mtd: rawnand: pasemi: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-7-miquel.raynal@bootlin.com
Miquel Raynal [Thu, 3 Dec 2020 19:03:36 +0000 (20:03 +0100)]
mtd: rawnand: orion: Do not force a particular software ECC engine
Originally, commit
d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.
Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.
Add the necessary logic to be sure Hamming keeps being only a default.
Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Fixes:
553508cec2e8 ("mtd: rawnand: orion: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-6-miquel.raynal@bootlin.com
Miquel Raynal [Thu, 3 Dec 2020 19:03:35 +0000 (20:03 +0100)]
mtd: rawnand: mpc5121: Do not force a particular software ECC engine
Originally, commit
d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.
Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.
Add the necessary logic to be sure Hamming keeps being only a default.
Fixes:
6dd09f775b72 ("mtd: rawnand: mpc5121: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-5-miquel.raynal@bootlin.com
Miquel Raynal [Thu, 3 Dec 2020 19:03:34 +0000 (20:03 +0100)]
mtd: rawnand: gpio: Do not force a particular software ECC engine
Originally, commit
d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.
Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.
Add the necessary logic to be sure Hamming keeps being only a default.
Fixes:
f6341f6448e0 ("mtd: rawnand: gpio: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-4-miquel.raynal@bootlin.com
Miquel Raynal [Thu, 3 Dec 2020 19:03:33 +0000 (20:03 +0100)]
mtd: rawnand: au1550: Do not force a particular software ECC engine
Originally, commit
d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.
Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.
Add the necessary logic to be sure Hamming keeps being only a default.
Fixes:
dbffc8ccdf3a ("mtd: rawnand: au1550: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-3-miquel.raynal@bootlin.com
Miquel Raynal [Thu, 3 Dec 2020 19:03:32 +0000 (20:03 +0100)]
mtd: rawnand: ams-delta: Do not force a particular software ECC engine
Originally, commit
d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.
Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.
Add the necessary logic to be sure Hamming keeps being only a default.
Fixes:
59d93473323a ("mtd: rawnand: ams-delta: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201203190340.15522-2-miquel.raynal@bootlin.com
Linus Torvalds [Fri, 11 Dec 2020 18:25:04 +0000 (10:25 -0800)]
Merge tag 'pinctrl-v5.10-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here is a late set of pin control fixes for v5.10, most concern some
minor and major issues found in the Intel drivers. Some are so hairy
that I have no idea what is going on there, but luckily the maintainer
knows what's up.
We also have an interesting fix for AMD, which makes AMD-based laptops
more stable IIUC.
Summary:
- Fix up some SPI group and a register offset on Intel Jasperlake
- Set default bias on Intel Merrifield
- Preserve debouncing on Intel Baytrail
- Stop .set_type() irqchip callback in the AMD driver from fiddling
with the debounce filter
- Fix access to GPIO banks that are pass-thru on the Aspeed
- Fix a fix for the Intel pin control driver to disable Rx/Tx when
requesting a UART line as GPIO"
* tag 'pinctrl-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: intel: Actually disable Tx and Rx buffers on GPIO request
pinctrl: aspeed: Fix GPIO requests on pass-through banks
pinctrl: amd: remove debounce filter setting in IRQ type setting
pinctrl: baytrail: Avoid clearing debounce value when turning it off
pinctrl: merrifield: Set default bias in case no particular value given
pinctrl: jasperlake: Fix HOSTSW_OWN offset
pinctrl: jasperlake: Unhide SPI group of pins