Christoph Hellwig [Wed, 31 May 2023 12:55:33 +0000 (14:55 +0200)]
mtd: block2mtd: factor the early block device open logic into a helper
Simplify add_device a bit by splitting out the cumbersome early boot logic
into a separate helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230531125535.676098-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:32 +0000 (14:55 +0200)]
PM: hibernate: don't use early_lookup_bdev in resume_store
resume_store is a sysfs attribute written during normal kernel runtime,
and it should not use the early_lookup_bdev API that bypasses all normal
path based permission checking, and might cause problems with certain
container environments renaming devices.
Switch to lookup_bdev, which does a normal path lookup instead, and fall
back to trying to parse a numeric dev_t just like early_lookup_bdev did.
Note that this strictly speaking changes the kernel ABI as the PARTUUID=
and PARTLABEL= style syntax is now not available during a running
systems. They never were intended for that, but this breaks things
we'll have to figure out a way to make them available again. But if
avoidable in any way I'd rather avoid that.
Fixes:
421a5fa1a6cf ("PM / hibernate: use name_to_dev_t to parse resume")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:31 +0000 (14:55 +0200)]
dm: only call early_lookup_bdev from early boot context
early_lookup_bdev is supposed to only be called from the early boot
code, but dm_get_device calls it as a general fallback when lookup_bdev
fails, which is problematic because early_lookup_bdev bypasses all normal
path based permission checking, and might cause problems with certain
container environments renaming devices.
Switch to only call early_lookup_bdev when dm is built-in and the system
state in not running yet. This means it is still available when tables
are constructed by dm-init.c from the kernel command line, but not
otherwise.
Note that this strictly speaking changes the kernel ABI as the PARTUUID=
and PARTLABEL= style syntax is now not available during a running
systems. They never were intended for that, but this breaks things
we'll have to figure out a way to make them available again. But if
avoidable in any way I'd rather avoid that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-21-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:30 +0000 (14:55 +0200)]
dm: remove dm_get_dev_t
Open code dm_get_dev_t in the only remaining caller, and propagate the
exact error code from lookup_bdev and early_lookup_bdev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:29 +0000 (14:55 +0200)]
dm: open code dm_get_dev_t in dm_init_init
dm_init_init is called from early boot code, and thus lookup_bdev
will never succeed. Just open code that call to early_lookup_bdev
instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:28 +0000 (14:55 +0200)]
dm-snap: simplify the origin_dev == cow_dev check in snapshot_ctr
Use the block_device acquired in dm_get_device for the check instead
of doing an extra lookup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:27 +0000 (14:55 +0200)]
block: move more code to early-lookup.c
blk_lookup_devt is only used by code in early-lookup.c, so move it
there.
printk_all_partitions and it's helper bdevt_str are only used by the
early init code in init/do_mounts.c, so they should go there as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:26 +0000 (14:55 +0200)]
block: move the code to do early boot lookup of block devices to block/
Create a new block/early-lookup.c to keep the early block device lookup
code instead of having this code sit with the early mount code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:25 +0000 (14:55 +0200)]
init: clear root_wait on all invalid root= strings
Instead of only clearing root_wait in devt_from_partuuid when the UUID
format was invalid, do that in parse_root_device for all strings that
failed to parse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:24 +0000 (14:55 +0200)]
init: improve the name_to_dev_t interface
name_to_dev_t has a very misleading name, that doesn't make clear
it should only be used by the early init code, and also has a bad
calling convention that doesn't allow returning different kinds of
errors. Rename it to early_lookup_bdev to make the use case clear,
and return an errno, where -EINVAL means the string could not be
parsed, and -ENODEV means it the string was valid, but there was
no device found for it.
Also stub out the whole call for !CONFIG_BLOCK as all the non-block
root cases are always covered in the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:23 +0000 (14:55 +0200)]
init: move the nfs/cifs/ram special cases out of name_to_dev_t
The nfs/cifs/ram special case only needs to be parsed once, and only in
the boot code. Move them out of name_to_dev_t and into
prepare_namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:22 +0000 (14:55 +0200)]
init: factor the root_wait logic in prepare_namespace into a helper
The root_wait logic is a bit obsfucated right now. Expand it and move it
into a helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:21 +0000 (14:55 +0200)]
init: handle ubi/mtd root mounting like all other root types
Assign a Root_Generic magic value for UBI/MTD root and handle the root
mounting in mount_root like all other root types. Besides making the
code more clear this also means that UBI/MTD root can be used together
with an initrd (not that anyone should care).
Also factor parsing of the root name into a helper now that it can
be easily done and will get more complicated with subsequent patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:20 +0000 (14:55 +0200)]
init: don't remove the /dev/ prefix from error messages
Remove the code that drops the /dev/ prefix from root_device_name, which
is only used for error messages when mounting the root device fails.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:19 +0000 (14:55 +0200)]
init: pass root_device_name explicitly
Instead of declaring root_device_name as a global variable pass it as an
argument to the functions using it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:18 +0000 (14:55 +0200)]
init: refactor mount_root
Provide stubs for all the lower level mount helpers, and just switch
on ROOT_DEV in the main function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:17 +0000 (14:55 +0200)]
init: rename mount_block_root to mount_root_generic
mount_block_root is also used to mount non-block file systems, so give
it a better name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:16 +0000 (14:55 +0200)]
init: remove pointless Root_* values
Remove all unused defines, and just use the expanded versions for
the SCSI disk majors.
I've decided to keep Root_RAM0 even if it could be expanded as there
is a lot of special casing for it in the init code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:15 +0000 (14:55 +0200)]
PM: hibernate: move finding the resume device out of software_resume
software_resume can be called either from an init call in the boot code,
or from sysfs once the system has finished booting, and the two
invocation methods this can't race with each other.
For the latter case we did just parse the suspend device manually, while
the former might not have one. Split software_resume so that the search
only happens for the boot case, which also means the special lockdep
nesting annotation can go away as the system transition mutex can be
taken a little later and doesn't have the sysfs locking nest inside it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:14 +0000 (14:55 +0200)]
PM: hibernate: remove the global snapshot_test variable
Passing call dependent variable in global variables is a huge
antipattern. Fix it up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:13 +0000 (14:55 +0200)]
PM: hibernate: factor out a helper to find the resume device
Split the logic to find the resume device out software_resume and into
a separate helper to start unwindig the convoluted goto logic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230531125535.676098-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 31 May 2023 12:55:12 +0000 (14:55 +0200)]
driver core: return bool from driver_probe_done
bool is the most sensible return value for a yes/no return. Also
add __init as this funtion is only called from the early boot code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20230531125535.676098-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:59 +0000 (11:44 +0200)]
ext4: wire up the ->mark_dead holder operation for log devices
Implement a set of holder_ops that shut down the file system when the
block device used as log device is removed undeneath the file system.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:58 +0000 (11:44 +0200)]
ext4: wire up sops->shutdown
Wire up the shutdown method to shut down the file system when the
underlying block device is marked dead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:57 +0000 (11:44 +0200)]
ext4: split ext4_shutdown
Split ext4_shutdown into a low-level helper that will be reused for
implementing the shutdown super operation and a wrapper for the ioctl
handling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:56 +0000 (11:44 +0200)]
xfs: wire up the ->mark_dead holder operation for log and RT devices
Implement a set of holder_ops that shut down the file system when the
block device used as log or RT device is removed undeneath the file
system.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:55 +0000 (11:44 +0200)]
xfs: wire up sops->shutdown
Wire up the shutdown method to shut down the file system when the
underlying block device is marked dead. Add a new message to
clearly distinguish this shutdown reason from other shutdowns.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:54 +0000 (11:44 +0200)]
fs: add a method to shut down the file system
Add a new ->shutdown super operation that can be used to tell the file
system to shut down, and call it from newly created holder ops when the
block device under a file system shuts down.
This only covers the main block device for "simple" file systems using
get_tree_bdev / mount_bdev. File systems their own get_tree method
or opening additional devices will need to set up their own
blk_holder_ops.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:53 +0000 (11:44 +0200)]
block: add a mark_dead holder operation
Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead
to notify the holder that the block device it is using has been marked dead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:52 +0000 (11:44 +0200)]
block: introduce holder ops
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
installed in the block_device for exclusive claims. It will be used to
allow the block layer to call back into the user of the block device for
thing like notification of a removed device or a device resize.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:51 +0000 (11:44 +0200)]
block: remove blk_drop_partitions
There is only a single caller left, so fold the loop into that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:50 +0000 (11:44 +0200)]
block: delete partitions later in del_gendisk
Delay dropping the block_devices for partitions in del_gendisk until
after the call to blk_mark_disk_dead, so that we can implementat
notification of removed devices in blk_mark_disk_dead.
This requires splitting a lower-level drop_partition helper out of
delete_partition and using that from del_gendisk, while having a
common loop for the whole device and partitions that calls
remove_inode_hash, fsync_bdev and __invalidate_device before the
call to blk_mark_disk_dead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:49 +0000 (11:44 +0200)]
block: unhash the inode earlier in delete_partition
Move the call to remove_inode_hash to the beginning of delete_partition,
as we want to prevent opening a block_device that is about to be removed
ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:48 +0000 (11:44 +0200)]
block: avoid repeated work in blk_mark_disk_dead
Check if GD_DEAD is already set in blk_mark_disk_dead, and don't
duplicate the work already done.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:47 +0000 (11:44 +0200)]
block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk
blk_mark_disk_dead does very similar work a a section of del_gendisk:
- set the GD_DEAD flag
- set the capacity to zero
- start a queue drain
but del_gendisk also sets QUEUE_FLAG_DYING on the queue if it is owned by
the disk, sets the capacity to zero before starting the drain, and both
with sending a uevent and kernel message for this fake capacity change.
Move the exact logic from the more heavily used del_gendisk into
blk_mark_disk_dead and then call blk_mark_disk_dead from del_gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:46 +0000 (11:44 +0200)]
block: turn bdev_lock into a mutex
There is no reason for this lock to spin, and being able to sleep under
it will come in handy soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:45 +0000 (11:44 +0200)]
block: refactor bd_may_claim
The long if/else chain obsfucates the actual logic. Tidy it up to be
more structured. Also drop the whole argument, as it can be trivially
derived from bdev using bdev_whole, and having the bdev_whole in the
function makes it easier to follow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 09:44:44 +0000 (11:44 +0200)]
block: factor out a bd_end_claim helper from blkdev_put
Move all the logic to release an exclusive claim into a helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Thu, 1 Jun 2023 15:16:46 +0000 (17:16 +0200)]
drbd: stop defining __KERNEL_SYSCALLS__
__KERNEL_SYSCALLS__ hasn't been needed since Linux 2.6.19 so stop
defining it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20230601151646.1386867-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Sat, 3 Jun 2023 04:06:01 +0000 (12:06 +0800)]
ublk: add control command of UBLK_U_CMD_GET_FEATURES
Add control command of UBLK_U_CMD_GET_FEATURES for returning driver's
feature set or capability.
This way can simplify userspace for maintaining compatibility because
userspace doesn't need to send command to one device for querying driver
feature set any more. Such as, with the queried feature set, userspace
can choose to use:
- UBLK_CMD_GET_DEV_INFO2 or UBLK_CMD_GET_DEV_INFO,
- UBLK_U_CMD_* or UBLK_CMD_*
Userspace code:
https://github.com/ming1/ubdsrv/commits/features-cmd
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230603040601.775227-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Azeem Shaikh [Tue, 30 May 2023 15:56:08 +0000 (15:56 +0000)]
block: Replace all non-returning strlcpy with strscpy
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230530155608.272266-1-azeemshaikh38@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Yu Kuai [Wed, 31 May 2023 07:34:35 +0000 (15:34 +0800)]
blk-ioc: protect ioc_destroy_icq() by 'queue_lock'
Currently, icq is tracked by both request_queue(icq->q_node) and
task(icq->ioc_node), and ioc_clear_queue() from elevator exit is not
safe because it can access the list without protection:
ioc_clear_queue ioc_release_fn
lock queue_lock
list_splice
/* move queue list to a local list */
unlock queue_lock
/*
* lock is released, the local list
* can be accessed through task exit.
*/
lock ioc->lock
while (!hlist_empty)
icq = hlist_entry
lock queue_lock
ioc_destroy_icq
delete icq->ioc_node
while (!list_empty)
icq = list_entry() list_del icq->q_node
/*
* This is not protected by any lock,
* list_entry concurrent with list_del
* is not safe.
*/
unlock queue_lock
unlock ioc->lock
Fix this problem by protecting list 'icq->q_node' by queue_lock from
ioc_clear_queue().
Reported-and-tested-by: Pradeep Pragallapati <quic_pragalla@quicinc.com>
Link: https://lore.kernel.org/lkml/20230517084434.18932-1-quic_pragalla@quicinc.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531073435.2923422-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:43 +0000 (04:50 -0700)]
block: mark bio_add_folio as __must_check
Now that all callers of bio_add_folio() check the return value, mark it as
__must_check.
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/381360a45ac3684120cfbe1e07685e9c36256e47.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:42 +0000 (04:50 -0700)]
fs: iomap: use bio_add_folio_nofail where possible
When the iomap buffered-io code can't add a folio to a bio, it allocates a
new bio and adds the folio to that one. This is done using bio_add_folio(),
but doesn't check for errors.
As adding a folio to a newly created bio can't fail, use the newly
introduced bio_add_folio_nofail() function.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/58fa893c24c67340a63323f09a179fefdca07f2a.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:41 +0000 (04:50 -0700)]
block: add bio_add_folio_nofail
Just like for bio_add_pages() add a no-fail variant for bio_add_folio().
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/924dff4077812804398ef84128fb920507fa4be1.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:40 +0000 (04:50 -0700)]
block: mark bio_add_page as __must_check
Now that all users of bio_add_page check for the return value, mark
bio_add_page as __must_check.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/7ae4a902e08fe2e90c012ee07aeb35d4aae28373.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:39 +0000 (04:50 -0700)]
dm-crypt: use __bio_add_page to add single page to clone bio
crypt_alloc_buffer() already allocates enough entries in the clone bio's
vector, so adding a page to the bio can't fail. Use __bio_add_page() to
reflect this.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/f9a4dee5e81389fd70ffc442da01006538e55aca.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:38 +0000 (04:50 -0700)]
md: raid1: check if adding pages to resync bio fails
Check if adding pages to resync bio fails and if bail out.
As the comment above suggests this cannot happen, WARN if it actually
happens. Technically __bio_add_pages() would be sufficient here, but
asserting the pages actually get added to the bio is preferred.
This way we can mark bio_add_pages as __must_check.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/33aea4c271220dc9bcab58c4b7bec478c1511142.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:37 +0000 (04:50 -0700)]
md: raid1: use __bio_add_page for adding single page to bio
The sync request code uses bio_add_page() to add a page to a newly created bio.
bio_add_page() can fail, but the return value is never checked.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/6cf7f66c6e646231200d025dfd5f2d3ae75c8fe5.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:36 +0000 (04:50 -0700)]
md: check for failure when adding pages in alloc_behind_master_bio
alloc_behind_master_bio() can possibly add multiple pages to a bio, but it
is not checking for the return value of bio_add_page() if adding really
succeeded.
Check if the page adding succeeded and if not bail out.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/827aa12d44ebf3f50b41b47f5cedc0f80179f2c1.1685532726.git.johannes.thumshirn@wdc.com
[axboe: fold in s/free_page/put_page fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:35 +0000 (04:50 -0700)]
floppy: use __bio_add_page for adding single page to bio
The floppy code uses bio_add_page() to add a page to a newly created bio.
bio_add_page() can fail, but the return value is never checked.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/33c445a3b431270c72d9be03d5da1b08ae983920.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:34 +0000 (04:50 -0700)]
zram: use __bio_add_page for adding single page to bio
The zram writeback code uses bio_add_page() to add a page to a newly
created bio. bio_add_page() can fail, but the return value is never
checked.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/cfd141dd7773315879a126f2aa81b7f698bc0e10.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:33 +0000 (04:50 -0700)]
zonefs: use __bio_add_page for adding single page to bio
The zonefs superblock reading code uses bio_add_page() to add a page to a
newly created bio. bio_add_page() can fail, but the return value is
never checked.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/04c9978ccaa0fc9871cd4248356638d98daccf0c.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:32 +0000 (04:50 -0700)]
gfs2: use __bio_add_page for adding single page to bio
The GFS2 superblock reading code uses bio_add_page() to add a page to a
newly created bio. bio_add_page() can fail, but the return value is never
checked.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/087c67d4e4973f949d3519c1e4822784ce583c5a.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:31 +0000 (04:50 -0700)]
jfs: logmgr: use __bio_add_page to add single page to bio
The JFS IO code uses bio_add_page() to add a page to a newly created bio.
bio_add_page() can fail, but the return value is never checked.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/9fb5ed86d19f6e0b6f64dfc4109a48ff8ff24497.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:30 +0000 (04:50 -0700)]
md: raid5: use __bio_add_page to add single page to new bio
The raid5-ppl submission code uses bio_add_page() to add a page to a
newly created bio. bio_add_page() can fail, but the return value is never
checked. For adding consecutive pages, the return is actually checked and
a new bio is allocated if adding the page fails.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/27e6bcd762354bff74602e89159cdd12ae3d1fa9.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:29 +0000 (04:50 -0700)]
md: raid5-log: use __bio_add_page to add single page
The raid5 log metadata submission code uses bio_add_page() to add a page
to a newly created bio. bio_add_page() can fail, but the return value is
never checked.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/832a810d6c9e71f88b0a39cb076a8c70e8bcb821.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:28 +0000 (04:50 -0700)]
md: use __bio_add_page to add single page
The md-raid superblock writing code uses bio_add_page() to add a page to a
newly created bio. bio_add_page() can fail, but the return value is never
checked.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Signed-of_-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/ca196f5e650e318106dbb4496eb6cbac4bc800bd.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:27 +0000 (04:50 -0700)]
fs: buffer: use __bio_add_page to add single page to bio
The buffer_head submission code uses bio_add_page() to add a page to a
newly created bio. bio_add_page() can fail, but the return value is never
checked.
Use __bio_add_page() as adding a single page to a newly created bio is
guaranteed to succeed.
This brings us a step closer to marking bio_add_page() as __must_check.
Reviewed-by: Gou Hao <gouhao@uniontech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/84ff2dcbe81b258a73ad900adb5266e208b61a4d.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:26 +0000 (04:50 -0700)]
dm: dm-zoned: use __bio_add_page for adding single metadata page
dm-zoned uses bio_add_page() for adding a single page to a freshly created
metadata bio.
Use __bio_add_page() instead as adding a single page to a new bio is
always guaranteed to succeed.
This brings us a step closer to marking bio_add_page() __must_check
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/55a0c8dad7550379647873b579dc7cfbe0191f96.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:25 +0000 (04:50 -0700)]
drbd: use __bio_add_page to add page to bio
The drbd code only adds a single page to a newly created bio. So use
__bio_add_page() to add the page which is guaranteed to succeed in this
case.
This brings us closer to marking bio_add_page() as __must_check.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/435007afac14f3766455559059d21843771fae53.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Wed, 31 May 2023 11:50:24 +0000 (04:50 -0700)]
swap: use __bio_add_page to add page to bio
The swap code only adds a single page to a newly created bio. So use
__bio_add_page() to add the page which is guaranteed to succeed in this
case.
This brings us closer to marking bio_add_page() as __must_check.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/5bdafd9de806b2dab92302b30eb7a3a5f10c37d9.1685532726.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Fri, 26 May 2023 21:41:42 +0000 (22:41 +0100)]
block: Use iov_iter_extract_pages() and page pinning in direct-io.c
Change the old block-based direct-I/O code to use iov_iter_extract_pages()
to pin user pages or leave kernel pages unpinned rather than taking refs
when submitting bios.
This makes use of the preceding patches to not take pins on the zero page
(thereby allowing insertion of zero pages in with pinned pages) and to get
additional pins on pages, allowing an extracted page to be used in multiple
bios without having to re-extract it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@infradead.org>
cc: David Hildenbrand <david@redhat.com>
cc: Lorenzo Stoakes <lstoakes@gmail.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: Jason Gunthorpe <jgg@nvidia.com>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: Hillf Danton <hdanton@sina.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230526214142.958751-4-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Fri, 26 May 2023 21:41:41 +0000 (22:41 +0100)]
mm: Provide a function to get an additional pin on a page
Provide a function to get an additional pin on a page that we already have
a pin on. This will be used in fs/direct-io.c when dispatching multiple
bios to a page we've extracted from a user-backed iter rather than redoing
the extraction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@infradead.org>
cc: David Hildenbrand <david@redhat.com>
cc: Lorenzo Stoakes <lstoakes@gmail.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: Jason Gunthorpe <jgg@nvidia.com>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: Hillf Danton <hdanton@sina.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20230526214142.958751-3-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Fri, 26 May 2023 21:41:40 +0000 (22:41 +0100)]
mm: Don't pin ZERO_PAGE in pin_user_pages()
Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
to it from the page tables and make unpin_user_page*() correspondingly
ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a
zero page's refcount as we're only allowed ~2 million pins on it -
something that userspace can conceivably trigger.
Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@infradead.org>
cc: David Hildenbrand <david@redhat.com>
cc: Lorenzo Stoakes <lstoakes@gmail.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: Jason Gunthorpe <jgg@nvidia.com>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: Hillf Danton <hdanton@sina.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20230526214142.958751-2-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Thomas Weißschuh [Tue, 30 May 2023 17:10:00 +0000 (19:10 +0200)]
block: constify the whole_disk device_attribute
The struct is never modified so it can be const.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20230419-const-partition-v3-4-4e14e48be367@weissschuh.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Thomas Weißschuh [Tue, 30 May 2023 17:09:59 +0000 (19:09 +0200)]
block: constify struct part_attr_group
The struct is never modified so it can be const.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20230419-const-partition-v3-3-4e14e48be367@weissschuh.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Thomas Weißschuh [Tue, 30 May 2023 17:09:58 +0000 (19:09 +0200)]
block: constify struct part_type part_type
The struct is never modified so it can be const.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20230419-const-partition-v3-2-4e14e48be367@weissschuh.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Thomas Weißschuh [Tue, 30 May 2023 17:09:57 +0000 (19:09 +0200)]
block: constify partition prober array
The array is never modified so it can be const.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202304191640.SkNk7kVN-lkp@intel.com/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20230419-const-partition-v3-1-4e14e48be367@weissschuh.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 20:57:44 +0000 (21:57 +0100)]
block: convert bio_map_user_iov to use iov_iter_extract_pages
This will pin pages or leave them unaltered rather than getting a ref on
them as appropriate to the iterator.
The pages need to be pinned for DIO rather than having refs taken on them
to prevent VM copy-on-write from malfunctioning during a concurrent fork()
(the result of the I/O could otherwise end up being visible to/affected by
the child process).
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230522205744.2825689-7-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 20:57:43 +0000 (21:57 +0100)]
block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
This will pin pages or leave them unaltered rather than getting a ref on
them as appropriate to the iterator.
The pages need to be pinned for DIO rather than having refs taken on them to
prevent VM copy-on-write from malfunctioning during a concurrent fork() (the
result of the I/O could otherwise end up being affected by/visible to the
child process).
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230522205744.2825689-6-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 20:57:42 +0000 (21:57 +0100)]
block: Add BIO_PAGE_PINNED and associated infrastructure
Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned
(FOLL_PIN) and that the pin will need removing.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230522205744.2825689-5-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Mon, 22 May 2023 20:57:41 +0000 (21:57 +0100)]
block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic
Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted
meaning is only set when a page reference has been acquired that needs to
be released by bio_release_pages().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jan Kara <jack@suse.cz>
cc: Matthew Wilcox <willy@infradead.org>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-block@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230522205744.2825689-4-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 20:57:40 +0000 (21:57 +0100)]
block: Fix bio_flagged() so that gcc can better optimise it
Fix bio_flagged() so that multiple instances of it, such as:
if (bio_flagged(bio, BIO_PAGE_REFFED) ||
bio_flagged(bio, BIO_PAGE_PINNED))
can be combined by the gcc optimiser into a single test in assembly
(arguably, this is a compiler optimisation issue[1]).
The missed optimisation stems from bio_flagged() comparing the result of
the bitwise-AND to zero. This results in an out-of-line bio_release_page()
being compiled to something like:
<+0>: mov 0x14(%rdi),%eax
<+3>: test $0x1,%al
<+5>: jne 0xffffffff816dac53 <bio_release_pages+11>
<+7>: test $0x2,%al
<+9>: je 0xffffffff816dac5c <bio_release_pages+20>
<+11>: movzbl %sil,%esi
<+15>: jmp 0xffffffff816daba1 <__bio_release_pages>
<+20>: jmp 0xffffffff81d0b800 <__x86_return_thunk>
However, the test is superfluous as the return type is bool. Removing it
results in:
<+0>: testb $0x3,0x14(%rdi)
<+4>: je 0xffffffff816e4af4 <bio_release_pages+15>
<+6>: movzbl %sil,%esi
<+10>: jmp 0xffffffff816dab7c <__bio_release_pages>
<+15>: jmp 0xffffffff81d0b7c0 <__x86_return_thunk>
instead.
Also, the MOVZBL instruction looks unnecessary[2] - I think it's just
're-booling' the mark_dirty parameter.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-block@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371
Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230522205744.2825689-3-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 20:57:39 +0000 (21:57 +0100)]
iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
ZERO_PAGE can't go away, no need to hold an extra reference.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230522205744.2825689-2-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 24 May 2023 14:42:22 +0000 (08:42 -0600)]
Merge branch 'for-6.5/splice' into for-6.5/block
Merge splice bits as subsequent block cleanups and improvements for DIO
depend on them.
* for-6.5/splice: (31 commits)
splice: kdoc for filemap_splice_read() and copy_splice_read()
iov_iter: Kill ITER_PIPE
splice: Remove generic_file_splice_read()
splice: Use filemap_splice_read() instead of generic_file_splice_read()
cifs: Use filemap_splice_read()
trace: Convert trace/seq to use copy_splice_read()
zonefs: Provide a splice-read wrapper
xfs: Provide a splice-read wrapper
orangefs: Provide a splice-read wrapper
ocfs2: Provide a splice-read wrapper
ntfs3: Provide a splice-read wrapper
nfs: Provide a splice-read wrapper
f2fs: Provide a splice-read wrapper
ext4: Provide a splice-read wrapper
ecryptfs: Provide a splice-read wrapper
ceph: Provide a splice-read wrapper
afs: Provide a splice-read wrapper
9p: Add splice_read wrapper
net: Make sock_splice_read() use copy_splice_read() by default
tty, proc, kernfs, random: Use copy_splice_read()
...
David Howells [Mon, 22 May 2023 13:50:18 +0000 (14:50 +0100)]
splice: kdoc for filemap_splice_read() and copy_splice_read()
Provide kerneldoc comments for filemap_splice_read() and
copy_splice_read().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Steve French <smfrench@gmail.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230522135018.2742245-32-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:17 +0000 (14:50 +0100)]
iov_iter: Kill ITER_PIPE
The ITER_PIPE-type iterator was only used by generic_file_splice_read() and
that has been replaced and removed. This leaves ITER_PIPE unused - so
remove it too.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230522135018.2742245-31-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:16 +0000 (14:50 +0100)]
splice: Remove generic_file_splice_read()
Remove generic_file_splice_read() as it has been replaced with calls to
filemap_splice_read() and copy_splice_read().
With this, ITER_PIPE is no longer used.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Steve French <smfrench@gmail.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230522135018.2742245-30-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:15 +0000 (14:50 +0100)]
splice: Use filemap_splice_read() instead of generic_file_splice_read()
Replace pointers to generic_file_splice_read() with calls to
filemap_splice_read().
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230522135018.2742245-29-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:14 +0000 (14:50 +0100)]
cifs: Use filemap_splice_read()
Make cifs use filemap_splice_read() rather than doing its own version of
generic_file_splice_read().
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Steve French <smfrench@gmail.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230522135018.2742245-28-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:13 +0000 (14:50 +0100)]
trace: Convert trace/seq to use copy_splice_read()
For the splice from the trace seq buffer, just use copy_splice_read().
In the future, something better can probably be done by gifting pages from
seq->buf into the pipe, but that would require changing seq->buf into a
vmap over an array of pages.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Steven Rostedt <rostedt@goodmis.org>
cc: Masami Hiramatsu <mhiramat@kernel.org>
cc: linux-kernel@vger.kernel.org
cc: linux-trace-kernel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-27-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:12 +0000 (14:50 +0100)]
zonefs: Provide a splice-read wrapper
Provide a splice_read wrapper for zonefs. This does some checks before
proceeding and locks the inode across the call to filemap_splice_read() and
a size check in case of truncation. Splicing from direct I/O is handled by
the caller.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Darrick J. Wong <djwong@kernel.org>
cc: linux-xfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230522135018.2742245-26-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:11 +0000 (14:50 +0100)]
xfs: Provide a splice-read wrapper
Provide a splice_read wrapper for XFS. This does a stat count and a
shutdown check before proceeding, then emits a new trace line and locks the
inode across the call to filemap_splice_read() and adds to the stats
afterwards. Splicing from direct I/O or DAX is handled by the caller.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Darrick J. Wong <djwong@kernel.org>
cc: linux-xfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-25-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:10 +0000 (14:50 +0100)]
orangefs: Provide a splice-read wrapper
Provide a splice_read wrapper for ocfs2. This increments the read stats
and then locks the inode across the call to filemap_splice_read() and a
revalidation of the mapping. Splicing from direct I/O is done by the
caller.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Mike Marshall <hubcap@omnibond.com>
cc: Martin Brandenburg <martin@omnibond.com>
cc: devel@lists.orangefs.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-24-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:09 +0000 (14:50 +0100)]
ocfs2: Provide a splice-read wrapper
Provide a splice_read wrapper for ocfs2. This emits trace lines and does
an atime lock/update before calling filemap_splice_read(). Splicing from
direct I/O is handled by the caller.
A couple of new tracepoints are added for this purpose.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Mark Fasheh <mark@fasheh.com>
cc: Joel Becker <jlbec@evilplan.org>
cc: ocfs2-devel@oss.oracle.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-23-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:08 +0000 (14:50 +0100)]
ntfs3: Provide a splice-read wrapper
Provide a splice_read wrapper for NTFS3 to perform various checks before
allowing the operation to proceed.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
cc: ntfs3@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-22-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:07 +0000 (14:50 +0100)]
nfs: Provide a splice-read wrapper
Provide a splice_read wrapper for NFS. This locks the inode around
filemap_splice_read() and revalidates the mapping. Splicing from direct
I/O is handled by the caller.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna@kernel.org>
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-21-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:06 +0000 (14:50 +0100)]
f2fs: Provide a splice-read wrapper
Provide a splice_read wrapper for f2fs. This does some checks and tracing
before calling filemap_splice_read() and will update the iostats
afterwards. Direct I/O is handled by the caller.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jaegeuk Kim <jaegeuk@kernel.org>
cc: Chao Yu <chao@kernel.org>
cc: linux-f2fs-devel@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-20-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:05 +0000 (14:50 +0100)]
ext4: Provide a splice-read wrapper
Provide a splice_read wrapper for Ext4. This does the inode shutdown check
before proceeding. Splicing from DAX files and O_DIRECT fds is handled by
the caller.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Andreas Dilger <adilger.kernel@dilger.ca>
cc: linux-ext4@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-19-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:04 +0000 (14:50 +0100)]
ecryptfs: Provide a splice-read wrapper
Provide a splice_read wrapper for ecryptfs to update the access time on the
lower file after the operation. Splicing from a direct I/O fd will update
the access time when ->read_iter() is called.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Tyler Hicks <code@tyhicks.com>
cc: ecryptfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-18-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:03 +0000 (14:50 +0100)]
ceph: Provide a splice-read wrapper
Provide a splice_read wrapper for Ceph. This does the inode shutdown check
before proceeding and jumps to copy_splice_read() if the file has inline
data or is a synchronous file.
We try and get FILE_RD and either FILE_CACHE and/or FILE_LAZYIO caps and
hold them across filemap_splice_read(). If we fail to get FILE_CACHE or
FILE_LAZYIO capabilities, we use copy_splice_read() instead.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-17-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:02 +0000 (14:50 +0100)]
afs: Provide a splice-read wrapper
Provide a splice_read wrapper for AFS to call afs_validate() before going
into generic_file_splice_read() so that we're likely to have a callback
promise from the server.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-16-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:01 +0000 (14:50 +0100)]
9p: Add splice_read wrapper
Add a splice_read wrapper for 9p. We should use copy_splice_read() if
9PL_DIRECT is set and filemap_splice_read() otherwise. Note that this
doesn't seem to be particularly related to O_DIRECT.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: v9fs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-15-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:50:00 +0000 (14:50 +0100)]
net: Make sock_splice_read() use copy_splice_read() by default
Make sock_splice_read() use copy_splice_read() by default as
file_splice_read() will return immediately with 0 as a socket has no
pagecache and is a zero-size file.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: netdev@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-14-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:49:59 +0000 (14:49 +0100)]
tty, proc, kernfs, random: Use copy_splice_read()
Use copy_splice_read() for tty, procfs, kernfs and random files rather
than going through generic_file_splice_read() as they just copy the file
into the output buffer and don't splice pages. This avoids the need for
them to have a ->read_folio() to satisfy filemap_splice_read().
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Arnd Bergmann <arnd@arndb.de>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-13-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:49:58 +0000 (14:49 +0100)]
coda: Implement splice-read
Implement splice-read for coda by passing the request down a layer rather
than going through generic_file_splice_read() which is going to be changed
to assume that ->read_folio() is present on buffered files.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: coda@cs.cmu.edu
cc: codalist@coda.cs.cmu.edu
cc: linux-unionfs@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-12-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:49:57 +0000 (14:49 +0100)]
overlayfs: Implement splice-read
Implement splice-read for overlayfs by passing the request down a layer
rather than going through generic_file_splice_read() which is going to be
changed to assume that ->read_folio() is present on buffered files.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Christian Brauner <brauner@kernel.org>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Amir Goldstein <amir73il@gmail.com>
cc: linux-unionfs@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-11-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:49:56 +0000 (14:49 +0100)]
shmem: Implement splice-read
The new filemap_splice_read() has an implicit expectation via
filemap_get_pages() that ->read_folio() exists if ->readahead() doesn't
fully populate the pagecache of the file it is reading from[1], potentially
leading to a jump to NULL if this doesn't exist. shmem, however, (and by
extension, tmpfs, ramfs and rootfs), doesn't have ->read_folio(),
Work around this by equipping shmem with its own splice-read
implementation, based on filemap_splice_read(), but able to paste in
zero_page when there's a page missing.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Daniel Golle <daniel@makrotopia.org>
cc: Guenter Roeck <groeck7@gmail.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Hugh Dickins <hughd@google.com>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/Y+pdHFFTk1TTEBsO@makrotopia.org/
Link: https://lore.kernel.org/r/20230522135018.2742245-10-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Howells [Mon, 22 May 2023 13:49:55 +0000 (14:49 +0100)]
splice: Make splice from a DAX file use copy_splice_read()
Make a read splice from a DAX file go directly to copy_splice_read() to do
the reading as filemap_splice_read() is unlikely to find any pagecache to
splice.
I think this affects only erofs, Ext2, Ext4, fuse and XFS.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-erofs@lists.ozlabs.org
cc: linux-ext4@vger.kernel.org
cc: linux-xfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-9-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>