Christoph Hellwig [Fri, 27 Mar 2020 08:30:08 +0000 (09:30 +0100)]
block: add a blk_mq_init_queue_data helper
This allows a driver to pass a queuedata member before ->init_hctx is
called. null_blk currently open codes this logic, but I'd rather have
it in the core to ease future maintainance.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 27 Mar 2020 08:07:17 +0000 (09:07 +0100)]
block: move the ->devnode callback to struct block_device_operations
There really isn't any good reason to stash a method directly into
struct gendisk. Move it together with the other block device
operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 25 Mar 2020 15:48:42 +0000 (16:48 +0100)]
block: move the part_stat* helpers from genhd.h to a new header
These macros are just used by a few files. Move them out of genhd.h,
which is included everywhere into a new standalone header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 25 Mar 2020 15:48:41 +0000 (16:48 +0100)]
block: move block layer internals out of include/linux/genhd.h
None of this needs to be exposed to drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 25 Mar 2020 15:48:40 +0000 (16:48 +0100)]
block: move guard_bio_eod to bio.c
This is bio layer functionality and not related to buffer heads.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 25 Mar 2020 15:48:39 +0000 (16:48 +0100)]
block: unexport get_gendisk
get_gendisk is not used by any modular code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 25 Mar 2020 15:48:38 +0000 (16:48 +0100)]
block: unexport disk_map_sector_rcu
disk_map_sector_rcu is not used by any modular code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 25 Mar 2020 15:48:37 +0000 (16:48 +0100)]
block: unexport disk_get_part
disk_get_part is not used by any modular code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 25 Mar 2020 15:48:36 +0000 (16:48 +0100)]
block: mark part_in_flight and part_in_flight_rw static
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 25 Mar 2020 15:48:35 +0000 (16:48 +0100)]
block: mark block_depr static
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Johannes Thumshirn [Tue, 24 Mar 2020 15:24:44 +0000 (00:24 +0900)]
block: factor out requeue handling from dispatch code
Factor out the requeue handling from the dispatch code, this will make
subsequent addition of different requeueing schemes easier.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Konstantin Khlebnikov [Wed, 25 Mar 2020 13:07:08 +0000 (16:07 +0300)]
block/diskstats: replace time_in_queue with sum of request times
Column "time_in_queue" in diskstats is supposed to show total waiting time
of all requests. I.e. value should be equal to the sum of times from other
columns. But this is not true, because column "time_in_queue" is counted
separately in jiffies rather than in nanoseconds as other times.
This patch removes redundant counter for "time_in_queue" and shows total
time of read, write, discard and flush requests.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Konstantin Khlebnikov [Wed, 25 Mar 2020 13:07:06 +0000 (16:07 +0300)]
block/diskstats: accumulate all per-cpu counters in one pass
Reading /proc/diskstats iterates over all cpus for summing each field.
It's faster to sum all fields in one pass.
Hammering /proc/diskstats with fio shows 2x performance improvement:
fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \
--size=1k --bs=1k --fallocate=none --create_on_open=1 \
--time_based=1 --runtime=10 --invalidate=0 --group_report
JOBS=1 JOBS=10
Before: 7k iops 64k iops
After: 18k iops 120k iops
Also this way code is more compact:
add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346)
Function old new delta
part_stat_read_all - 194 +194
diskstats_show 1344 631 -713
part_stat_show 1219 392 -827
Total: Before=
14966947, After=
14965601, chg -0.01%
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Konstantin Khlebnikov [Wed, 25 Mar 2020 13:07:04 +0000 (16:07 +0300)]
block/diskstats: more accurate approximation of io_ticks for slow disks
Currently io_ticks is approximated by adding one at each start and end of
requests if jiffies counter has changed. This works perfectly for requests
shorter than a jiffy or if one of requests starts/ends at each jiffy.
If disk executes just one request at a time and they are longer than two
jiffies then only first and last jiffies will be accounted.
Fix is simple: at the end of request add up into io_ticks jiffies passed
since last update rather than just one jiffy.
Example: common HDD executes random read 4k requests around 12ms.
fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
iostat -x 10 sdb
Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch:
Before:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0,00 0,00 82,60 0,00 330,40 0,00 8,00 0,96 12,09 12,09 0,00 1,02 8,43
After:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0,00 0,00 82,50 0,00 330,00 0,00 8,00 1,00 12,10 12,10 0,00 12,12 99,99
Now io_ticks does not loose time between start and end of requests, but
for queue-depth > 1 some I/O time between adjacent starts might be lost.
For load estimation "%util" is not as useful as average queue length,
but it clearly shows how often disk queue is completely empty.
Fixes:
5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:30 +0000 (08:25 +0100)]
block: merge partition-generic.c and check.c
Merge block/partition-generic.c and block/partitions/check.c into
a single block/partitions/core.c as the content is closely related
and both files are tiny.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:29 +0000 (08:25 +0100)]
block: move the various x86 Unix label formats out of genhd.h
All these are just used in block/partitions/msdos.c, so move them out of the
genhd.h driver included by every driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:28 +0000 (08:25 +0100)]
partitions/msdos: remove LINUX_SWAP_PARTITION
Just always use NEW_SOLARIS_X86_PARTITION and explain the situation,
as that is less confusing than two names for a single value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:27 +0000 (08:25 +0100)]
block: move the *_PARTITION enum out of genhd.h
The enum containing the *_PARTITION symbolic names is only relevant
for the partition parser. More specifically most values are MSDOS
partition table system indicators and thus should go straight into
msdos.c. One value is only used by the sun partition parser, and the
sun and sgi partition parsers use the same value as the x86 Linux
RAID indicator to also indicate RAID autodetection. Duplicate them
in sun.c and sgi.c given that the different partition types use
entirely different values otherwise.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:26 +0000 (08:25 +0100)]
block: move struct partition out of genhd.h
struct partition is the on-disk format of a MSDOS partition table entry.
Move it out of genhd.h into a new msdos_partition.h header and give it
a msdos_ prefix to avoid confusion.
Also move the magic number from block/partitions/msdos.h to the new
header so that it can be used by the SCSI drivers looking at the DOS
partition tables.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:25 +0000 (08:25 +0100)]
block: remove block/partitions/sun.h
Just move the two defines to block/partitions/sun.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:24 +0000 (08:25 +0100)]
block: remove block/partitions/sgi.h
Just move the single define to block/partitions/sgi.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:23 +0000 (08:25 +0100)]
block: remove block/partitions/osf.h
Just move the single define to block/partitions/osf.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:22 +0000 (08:25 +0100)]
block: remove block/partitions/karma.h
Just move the single define to block/partitions/karma.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:21 +0000 (08:25 +0100)]
block: declare all partition detection routines in check.h
There is no good reason to include one header per partition type in
core.c. Instead move the prototypes for the detection routins to
check.h, and remove all now empty headers in block/partitions/.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:20 +0000 (08:25 +0100)]
block: remove warn_no_part
The warn_no_part is initialized to 1 and never changed. Remove
it and execute the code keyed off from it unconditionally.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:19 +0000 (08:25 +0100)]
block: cleanup how md_autodetect_dev is called
Add a new include/linux/raid/detect.h header to declare the
md_autodetect_dev prototype which can be shared between md and
the partition code. Then use IS_BUILTIN to call it instead of the
ifdef magic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:18 +0000 (08:25 +0100)]
block: unexport read_dev_sector and put_dev_sector
read_dev_sector and put_dev_sector are now only used by the partition
parsing code. Remove the export for read_dev_sector and merge it into
the only caller. Clean the mess up a bit by using goto labels and
the SECTOR_SHIFT constant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:17 +0000 (08:25 +0100)]
scsi: simplify scsi_partsize
Call scsi_bios_ptable from scsi_partsize instead of requiring boilerplate
code in the callers. Also switch the calling convention to match that
of the ->bios_param instances calling this function, and use true/false
for the return value instead of the weird -1 convention.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:16 +0000 (08:25 +0100)]
scsi: move scsicam_bios_param to the end of scsicam.c
This avoids the need for a forward declaration and generally keeps the
file in the lower level first, high level last order.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:15 +0000 (08:25 +0100)]
scsi: simplify scsi_bios_ptable
Use read_mapping_page and kmemdup instead of the odd read_dev_sector and
put_dev_sector helpers from the partitioning code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:14 +0000 (08:25 +0100)]
block: remove alloc_part_info and free_part_info
There isn't any good reason not to simply open code the allocation and
freeing of the partition_meta_info structure. Especially as one of
the branches in alloc_part_info is entirely dead code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:13 +0000 (08:25 +0100)]
block: move sysfs methods shared by disks and partitions to genhd.c
Move the sysfs _show methods that are used both on the full disk and
partition nodes to genhd.c instead of hiding them in the partitioning
code. Also move the declaration for these methods to block/blk.h so
that we don't expose them to drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:12 +0000 (08:25 +0100)]
block: move disk_name and related helpers out of partition-generic.c
Thes functions aren't really related to partition support, so move them
to a more suitable place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:11 +0000 (08:25 +0100)]
block: remove __bdevname
There is no good reason for __bdevname to exist. Just open code
printing the string in the callers. For three of them the format
string can be trivially merged into existing printk statements,
and in init/do_mounts.c we can at least do the scnprintf once at
the start of the function, and unconditional of CONFIG_BLOCK to
make the output for tiny configfs a little more helpful.
Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 24 Mar 2020 07:25:10 +0000 (08:25 +0100)]
block: remove the blk_lookup_devt export
This function is only used by init/do_mounts.c, which can't be modular.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Sat, 21 Mar 2020 09:45:21 +0000 (10:45 +0100)]
block, bfq: invoke flush_idle_tree after reparent_active_queues in pd_offline
In bfq_pd_offline(), the function bfq_flush_idle_tree() is invoked to
flush the rb tree that contains all idle entities belonging to the pd
(cgroup) being destroyed. In particular, bfq_flush_idle_tree() is
invoked before bfq_reparent_active_queues(). Yet the latter may happen
to add some entities to the idle tree. It happens if, in some of the
calls to bfq_bfqq_move() performed by bfq_reparent_active_queues(),
the queue to move is empty and gets expired.
This commit simply reverses the invocation order between
bfq_flush_idle_tree() and bfq_reparent_active_queues().
Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Sat, 21 Mar 2020 09:45:20 +0000 (10:45 +0100)]
block, bfq: make reparent_leaf_entity actually work only on leaf entities
bfq_reparent_leaf_entity() reparents the input leaf entity (a leaf
entity represents just a bfq_queue in an entity tree). Yet, the input
entity is guaranteed to always be a leaf entity only in two-level
entity trees. In this respect, because of the error fixed by
commit
14afc5936197 ("block, bfq: fix overwrite of bfq_group pointer
in bfq_find_set_group()"), all (wrongly collapsed) entity trees happened
to actually have only two levels. After the latter commit, this does not
hold any longer.
This commit fixes this problem by modifying
bfq_reparent_leaf_entity(), so that it searches an active leaf entity
down the path that stems from the input entity. Such a leaf entity is
guaranteed to exist when bfq_reparent_leaf_entity() is invoked.
Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Sat, 21 Mar 2020 09:45:19 +0000 (10:45 +0100)]
block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup
A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The
goal of this put is to release a process reference to a bfq_queue. But
process-reference releases may trigger also some extra operation, and,
to this goal, are handled through bfq_release_process_ref(). So, turn
the invocation of bfq_put_queue() into an invocation of
bfq_release_process_ref().
Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Sat, 21 Mar 2020 09:45:18 +0000 (10:45 +0100)]
block, bfq: move forward the getting of an extra ref in bfq_bfqq_move
Commit
ecedd3d7e199 ("block, bfq: get extra ref to prevent a queue
from being freed during a group move") gets an extra reference to a
bfq_queue before possibly deactivating it (temporarily), in
bfq_bfqq_move(). This prevents the bfq_queue from disappearing before
being reactivated in its new group.
Yet, the bfq_queue may also be expired (i.e., its service may be
stopped) before the bfq_queue is deactivated. And also an expiration
may lead to a premature freeing. This commit fixes this issue by
simply moving forward the getting of the extra reference already
introduced by commit
ecedd3d7e199 ("block, bfq: get extra ref to
prevent a queue from being freed during a group move").
Reported-by: cki-project@redhat.com
Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zhiqiang Liu [Thu, 19 Mar 2020 11:18:13 +0000 (19:18 +0800)]
block, bfq: fix use-after-free in bfq_idle_slice_timer_body
In bfq_idle_slice_timer func, bfqq = bfqd->in_service_queue is
not in bfqd-lock critical section. The bfqq, which is not
equal to NULL in bfq_idle_slice_timer, may be freed after passing
to bfq_idle_slice_timer_body. So we will access the freed memory.
In addition, considering the bfqq may be in race, we should
firstly check whether bfqq is in service before doing something
on it in bfq_idle_slice_timer_body func. If the bfqq in race is
not in service, it means the bfqq has been expired through
__bfq_bfqq_expire func, and wait_request flags has been cleared in
__bfq_bfqd_reset_in_service func. So we do not need to re-clear the
wait_request of bfqq which is not in service.
KASAN log is given as follows:
[13058.354613] ==================================================================
[13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
[13058.354644] Read of size 8 at addr
ffffa02cf3e63f78 by task fork13/19767
[13058.354646]
[13058.354655] CPU: 96 PID: 19767 Comm: fork13
[13058.354661] Call trace:
[13058.354667] dump_backtrace+0x0/0x310
[13058.354672] show_stack+0x28/0x38
[13058.354681] dump_stack+0xd8/0x108
[13058.354687] print_address_description+0x68/0x2d0
[13058.354690] kasan_report+0x124/0x2e0
[13058.354697] __asan_load8+0x88/0xb0
[13058.354702] bfq_idle_slice_timer+0xac/0x290
[13058.354707] __hrtimer_run_queues+0x298/0x8b8
[13058.354710] hrtimer_interrupt+0x1b8/0x678
[13058.354716] arch_timer_handler_phys+0x4c/0x78
[13058.354722] handle_percpu_devid_irq+0xf0/0x558
[13058.354731] generic_handle_irq+0x50/0x70
[13058.354735] __handle_domain_irq+0x94/0x110
[13058.354739] gic_handle_irq+0x8c/0x1b0
[13058.354742] el1_irq+0xb8/0x140
[13058.354748] do_wp_page+0x260/0xe28
[13058.354752] __handle_mm_fault+0x8ec/0x9b0
[13058.354756] handle_mm_fault+0x280/0x460
[13058.354762] do_page_fault+0x3ec/0x890
[13058.354765] do_mem_abort+0xc0/0x1b0
[13058.354768] el0_da+0x24/0x28
[13058.354770]
[13058.354773] Allocated by task 19731:
[13058.354780] kasan_kmalloc+0xe0/0x190
[13058.354784] kasan_slab_alloc+0x14/0x20
[13058.354788] kmem_cache_alloc_node+0x130/0x440
[13058.354793] bfq_get_queue+0x138/0x858
[13058.354797] bfq_get_bfqq_handle_split+0xd4/0x328
[13058.354801] bfq_init_rq+0x1f4/0x1180
[13058.354806] bfq_insert_requests+0x264/0x1c98
[13058.354811] blk_mq_sched_insert_requests+0x1c4/0x488
[13058.354818] blk_mq_flush_plug_list+0x2d4/0x6e0
[13058.354826] blk_flush_plug_list+0x230/0x548
[13058.354830] blk_finish_plug+0x60/0x80
[13058.354838] read_pages+0xec/0x2c0
[13058.354842] __do_page_cache_readahead+0x374/0x438
[13058.354846] ondemand_readahead+0x24c/0x6b0
[13058.354851] page_cache_sync_readahead+0x17c/0x2f8
[13058.354858] generic_file_buffered_read+0x588/0xc58
[13058.354862] generic_file_read_iter+0x1b4/0x278
[13058.354965] ext4_file_read_iter+0xa8/0x1d8 [ext4]
[13058.354972] __vfs_read+0x238/0x320
[13058.354976] vfs_read+0xbc/0x1c0
[13058.354980] ksys_read+0xdc/0x1b8
[13058.354984] __arm64_sys_read+0x50/0x60
[13058.354990] el0_svc_common+0xb4/0x1d8
[13058.354994] el0_svc_handler+0x50/0xa8
[13058.354998] el0_svc+0x8/0xc
[13058.354999]
[13058.355001] Freed by task 19731:
[13058.355007] __kasan_slab_free+0x120/0x228
[13058.355010] kasan_slab_free+0x10/0x18
[13058.355014] kmem_cache_free+0x288/0x3f0
[13058.355018] bfq_put_queue+0x134/0x208
[13058.355022] bfq_exit_icq_bfqq+0x164/0x348
[13058.355026] bfq_exit_icq+0x28/0x40
[13058.355030] ioc_exit_icq+0xa0/0x150
[13058.355035] put_io_context_active+0x250/0x438
[13058.355038] exit_io_context+0xd0/0x138
[13058.355045] do_exit+0x734/0xc58
[13058.355050] do_group_exit+0x78/0x220
[13058.355054] __wake_up_parent+0x0/0x50
[13058.355058] el0_svc_common+0xb4/0x1d8
[13058.355062] el0_svc_handler+0x50/0xa8
[13058.355066] el0_svc+0x8/0xc
[13058.355067]
[13058.355071] The buggy address belongs to the object at
ffffa02cf3e63e70#012 which belongs to the cache bfq_queue of size 464
[13058.355075] The buggy address is located 264 bytes inside of#012 464-byte region [
ffffa02cf3e63e70,
ffffa02cf3e64040)
[13058.355077] The buggy address belongs to the page:
[13058.355083] page:
ffff7e80b3cf9800 count:1 mapcount:0 mapping:
ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
[13058.366175] flags: 0x2ffffe0000008100(slab|head)
[13058.370781] raw:
2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780
[13058.370787] raw:
ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000
[13058.370789] page dumped because: kasan: bad access detected
[13058.370791]
[13058.370792] Memory state around the buggy address:
[13058.370797]
ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
[13058.370801]
ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370805] >
ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370808] ^
[13058.370811]
ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370815]
ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[13058.370817] ==================================================================
[13058.370820] Disabling lock debugging due to kernel taint
Here, we directly pass the bfqd to bfq_idle_slice_timer_body func.
--
V2->V3: rewrite the comment as suggested by Paolo Valente
V1->V2: add one comment, and add Fixes and Reported-by tag.
Fixes:
aee69d78d ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reported-by: Wang Wang <wangwang2@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Feilong Lin <linfeilong@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Balbir Singh [Fri, 13 Mar 2020 05:30:09 +0000 (05:30 +0000)]
scsi: Convert to use set_capacity_revalidate_and_notify
block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE
notifications via uevents. This notification is newly added to scsi sd.
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Balbir Singh [Fri, 13 Mar 2020 05:30:08 +0000 (05:30 +0000)]
nvme: Convert to use set_capacity_revalidate_and_notify
block/genhd provides set_capacity_revalidate_and_notify() for
sending RESIZE notifications via uevents. This notification is
newly added to NVME devices
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Balbir Singh [Fri, 13 Mar 2020 05:30:07 +0000 (05:30 +0000)]
xen-blkfront.c: Convert to use set_capacity_revalidate_and_notify
block/genhd provides set_capacity_revalidate_and_notify() for
sending RESIZE notifications via uevents.
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Balbir Singh [Fri, 13 Mar 2020 05:30:06 +0000 (05:30 +0000)]
virtio_blk.c: Convert to use set_capacity_revalidate_and_notify
block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE
notifications via uevents.
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Balbir Singh [Fri, 13 Mar 2020 05:30:05 +0000 (05:30 +0000)]
block/genhd: Notify udev about capacity change
Allow block/genhd to notify user space (via udev) about disk size changes
using a new helper set_capacity_revalidate_and_notify(), which is a wrapper
on top of set_capacity(). set_capacity_revalidate_and_notify() will only
notify via udev if the current capacity or the target capacity is not zero
and iff the capacity changes.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Someswarudu Sangaraju <ssomesh@amazon.com>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 18 Mar 2020 03:43:36 +0000 (11:43 +0800)]
block: Prevent hung_check firing during long sync IO
submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which
may take long time to complete, as Salman mentioned, 4K BLKSECDISCARD
takes up to 100 second on some devices. Also any block I/O operation
that occurs after the BLKSECDISCARD is submitted will also potentially
be affected by the hung task timeouts.
Another report is that task hang can be observed when running mkfs
over raid10 which takes a small max discard sectors limit because
of chunk size.
So prevent hung_check from firing by taking same approach used
in blk_execute_rq(), and the wake-up interval is set as half the
hung_check timer period, which keeps overhead low enough.
Cc: Salman Qazi <sqazi@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Link: https://lkml.org/lkml/2020/2/12/1193
Reported-by: Salman Qazi <sqazi@google.com>
Reviewed-by: Jesse Barnes <jsbarnes@google.com>
Reviewed-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 18 Mar 2020 08:12:06 +0000 (09:12 +0100)]
block: fix a device invalidation regression
Historically we only set the capacity to zero for devices that support
partitions (independ of actually having partitions created). Doing that
is rather inconsistent, but changing it broke legacy udisks polling for
legacy ide-cdrom devices. Use the crude a crude check for devices that
either are non-removable or partitionable to get the sane behavior for
most device while not breaking userspace for this particular setup.
Fixes:
a1548b674403 ("block: move rescan_partitions to fs/block_dev.c")
Reported-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Alexey Dobriyan [Wed, 12 Feb 2020 17:40:27 +0000 (20:40 +0300)]
block, zoned: fix integer overflow with BLKRESETZONE et al
Check for overflow in addition before checking for end-of-block-device.
Steps to reproduce:
#define _GNU_SOURCE 1
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
typedef unsigned long long __u64;
struct blk_zone_range {
__u64 sector;
__u64 nr_sectors;
};
#define BLKRESETZONE _IOW(0x12, 131, struct blk_zone_range)
int main(void)
{
int fd = open("/dev/nullb0", O_RDWR|O_DIRECT);
struct blk_zone_range zr = {4096, 0xfffffffffffff000ULL};
ioctl(fd, BLKRESETZONE, &zr);
return 0;
}
BUG: KASAN: null-ptr-deref in submit_bio_wait+0x74/0xe0
Write of size 8 at addr
0000000000000040 by task a.out/1590
CPU: 8 PID: 1590 Comm: a.out Not tainted 5.6.0-rc1-00019-g359c92c02bfa #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
Call Trace:
dump_stack+0x76/0xa0
__kasan_report.cold+0x5/0x3e
kasan_report+0xe/0x20
submit_bio_wait+0x74/0xe0
blkdev_zone_mgmt+0x26f/0x2a0
blkdev_zone_mgmt_ioctl+0x14b/0x1b0
blkdev_ioctl+0xb28/0xe60
block_ioctl+0x69/0x80
ksys_ioctl+0x3af/0xa50
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Weiping Zhang [Thu, 27 Feb 2020 01:38:46 +0000 (09:38 +0800)]
blk-iocost: remove duplicated lines in comments
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Revanth Rajashekar [Tue, 3 Mar 2020 19:17:00 +0000 (12:17 -0700)]
block: sed-opal: Change the check condition for regular session validity
This patch changes the check condition for the validity/authentication
of the session.
1. The Host Session Number(HSN) in the response should match the HSN for
the session.
2. The TPER Session Number(TSN) can never be less than 4096 for a regular
session.
Reference:
Section 3.2.2.1 of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage_Opal_SSC_Application_Note_1-00_1-00-Final.pdf
Section 3.3.7.1.1 of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage_Architecture_Core_Spec_v2.01_r1.00.pdf
Co-developed-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stephen Kitt [Sat, 7 Mar 2020 14:56:59 +0000 (15:56 +0100)]
block: Document genhd capability flags
The kernel documentation includes a brief section about genhd
capabilities, but it turns out that the only documented
capability (GENHD_FL_MEDIA_CHANGE_NOTIFY) isn't used any more.
This patch removes that flag, and documents the rest, based on my
understanding of the current uses of these flags in the kernel. The
documentation is kept in the header file, alongside the declarations,
in the hope that it will be kept up-to-date in future; the kernel
documentation is changed to include the documentation generated from
the header file.
Because the ultimate goal is to provide some end-user
documentation (or end-administrator documentation), the comments are
perhaps more user-oriented than might be expected. Since the values
are shown to users in hexadecimal, the documentation lists them in
hexadecimal, and the constant declarations are adjusted to match.
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 9 Mar 2020 21:41:38 +0000 (22:41 +0100)]
block: cleanup comment for blk_flush_complete_seq
Remove the comment about return value, since it is not valid after
commit
404b8f5a03d84 ("block: cleanup kick/queued handling").
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 9 Mar 2020 21:41:37 +0000 (22:41 +0100)]
block: remove unneeded argument from blk_alloc_flush_queue
Remove 'q' from arguments since it is not used anymore after
commit
7e992f847a08e ("block: remove non mq parts from the
flush code").
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 9 Mar 2020 21:41:36 +0000 (22:41 +0100)]
block: cleanup for _blk/blk_rq_prep_clone
Both cmd and sense had been moved to scsi_request, so remove
the related comments to avoid confusion.
And as Bart suggested, move _blk_rq_prep_clone into the only
caller (blk_rq_prep_clone).
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 9 Mar 2020 21:41:35 +0000 (22:41 +0100)]
block: remove redundant setting of QUEUE_FLAG_DYING
Previously, blk_cleanup_queue has called blk_set_queue_dying to set the
flag, no need to do it again.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 9 Mar 2020 21:41:34 +0000 (22:41 +0100)]
block: use bio_{wouldblock,io}_error in direct_make_request
Use the two functions to simplify code.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Mon, 9 Mar 2020 21:41:33 +0000 (22:41 +0100)]
block: fix comment for blk_cloned_rq_check_limits
Since the later description mentioned "checked against the new queue
limits", so make the change to avoid confusion.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sahitya Tummala [Wed, 11 Mar 2020 10:37:50 +0000 (16:07 +0530)]
block: Fix use-after-free issue accessing struct io_cq
There is a potential race between ioc_release_fn() and
ioc_clear_queue() as shown below, due to which below kernel
crash is observed. It also can result into use-after-free
issue.
context#1: context#2:
ioc_release_fn() __ioc_clear_queue() gets the same icq
->spin_lock(&ioc->lock); ->spin_lock(&ioc->lock);
->ioc_destroy_icq(icq);
->list_del_init(&icq->q_node);
->call_rcu(&icq->__rcu_head,
icq_free_icq_rcu);
->spin_unlock(&ioc->lock);
->ioc_destroy_icq(icq);
->hlist_del_init(&icq->ioc_node);
This results into below crash as this memory
is now used by icq->__rcu_head in context#1.
There is a chance that icq could be free'd
as well.
22150.386550: <6> Unable to handle kernel write to read-only memory
at virtual address
ffffffaa8d31ca50
...
Call trace:
22150.607350: <2> ioc_destroy_icq+0x44/0x110
22150.611202: <2> ioc_clear_queue+0xac/0x148
22150.615056: <2> blk_cleanup_queue+0x11c/0x1a0
22150.619174: <2> __scsi_remove_device+0xdc/0x128
22150.623465: <2> scsi_forget_host+0x2c/0x78
22150.627315: <2> scsi_remove_host+0x7c/0x2a0
22150.631257: <2> usb_stor_disconnect+0x74/0xc8
22150.635371: <2> usb_unbind_interface+0xc8/0x278
22150.639665: <2> device_release_driver_internal+0x198/0x250
22150.644897: <2> device_release_driver+0x24/0x30
22150.649176: <2> bus_remove_device+0xec/0x140
22150.653204: <2> device_del+0x270/0x460
22150.656712: <2> usb_disable_device+0x120/0x390
22150.660918: <2> usb_disconnect+0xf4/0x2e0
22150.664684: <2> hub_event+0xd70/0x17e8
22150.668197: <2> process_one_work+0x210/0x480
22150.672222: <2> worker_thread+0x32c/0x4c8
Fix this by adding a new ICQ_DESTROYED flag in ioc_destroy_icq() to
indicate this icq is once marked as destroyed. Also, ensure
__ioc_clear_queue() is accessing icq within rcu_read_lock/unlock so
that icq doesn't get free'd up while it is still using it.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Co-developed-by: Pradeep P V K <ppvk@codeaurora.org>
Signed-off-by: Pradeep P V K <ppvk@codeaurora.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Tue, 10 Mar 2020 04:26:23 +0000 (21:26 -0700)]
null_blk: Add support for init_hctx() fault injection
This makes it possible to test the error path in blk_mq_realloc_hw_ctxs()
and also several error paths in null_blk.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Tue, 10 Mar 2020 04:26:22 +0000 (21:26 -0700)]
null_blk: Handle null_add_dev() failures properly
If null_add_dev() fails then null_del_dev() is called with a NULL argument.
Make null_del_dev() handle this scenario correctly. This patch fixes the
following KASAN complaint:
null-ptr-deref in null_del_dev+0x28/0x280 [null_blk]
Read of size 8 at addr
0000000000000000 by task find/1062
Call Trace:
dump_stack+0xa5/0xe6
__kasan_report.cold+0x65/0x99
kasan_report+0x16/0x20
__asan_load8+0x58/0x90
null_del_dev+0x28/0x280 [null_blk]
nullb_group_drop_item+0x7e/0xa0 [null_blk]
client_drop_item+0x53/0x80 [configfs]
configfs_rmdir+0x395/0x4e0 [configfs]
vfs_rmdir+0xb6/0x220
do_rmdir+0x238/0x2c0
__x64_sys_unlinkat+0x75/0x90
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Tue, 10 Mar 2020 04:26:21 +0000 (21:26 -0700)]
null_blk: Fix the null_add_dev() error path
If null_add_dev() fails, clear dev->nullb.
This patch fixes the following KASAN complaint:
BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
Read of size 8 at addr
ffff88803280fc30 by task check/8409
Call Trace:
dump_stack+0xa5/0xe6
print_address_description.constprop.0+0x26/0x260
__kasan_report.cold+0x7b/0x99
kasan_report+0x16/0x20
__asan_load8+0x58/0x90
nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x145/0x2c0
ksys_write+0xd7/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7ff370926317
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:
00007fff2dd2da48 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
0000000000000002 RCX:
00007ff370926317
RDX:
0000000000000002 RSI:
0000559437ef23f0 RDI:
0000000000000001
RBP:
0000559437ef23f0 R08:
000000000000000a R09:
0000000000000001
R10:
0000559436703471 R11:
0000000000000246 R12:
0000000000000002
R13:
00007ff370a006a0 R14:
00007ff370a014a0 R15:
00007ff370a008a0
Allocated by task 8409:
save_stack+0x23/0x90
__kasan_kmalloc.constprop.0+0xcf/0xe0
kasan_kmalloc+0xd/0x10
kmem_cache_alloc_node_trace+0x129/0x4c0
null_add_dev+0x24a/0xe90 [null_blk]
nullb_device_power_store+0x1b6/0x270 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x145/0x2c0
ksys_write+0xd7/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 8409:
save_stack+0x23/0x90
__kasan_slab_free+0x112/0x160
kasan_slab_free+0x12/0x20
kfree+0xdf/0x250
null_add_dev+0xaf3/0xe90 [null_blk]
nullb_device_power_store+0x1b6/0x270 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x145/0x2c0
ksys_write+0xd7/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes:
2984c8684f96 ("nullb: factor disk parameters")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Tue, 10 Mar 2020 04:26:20 +0000 (21:26 -0700)]
null_blk: Fix changing the number of hardware queues
Instead of initializing null_blk hardware queues explicitly after the
request queue has been created, provide .init_hctx() and .exit_hctx()
callback functions. The latter functions are not only called during
request queue allocation but also when the number of hardware queues
changes. Allocate nr_cpu_ids queues during initialization to support
increasing the number of hardware queues above the initial hardware
queue count.
This change fixes increasing the number of hardware queues above the
initial number of hardware queues and also keeps nullb->nr_queues in
sync with the number of hardware queues.
Fixes:
45919fbfe1c4 ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Tue, 10 Mar 2020 04:26:19 +0000 (21:26 -0700)]
null_blk: Suppress an UBSAN complaint triggered when setting 'memory_backed'
Although it is not clear to me why UBSAN complains when 'memory_backed'
is set, this patch suppresses the UBSAN complaint that is triggered when
setting that configfs attribute.
UBSAN: Undefined behaviour in drivers/block/null_blk_main.c:327:1
load of value 16 is not a valid value for type '_Bool'
CPU: 2 PID: 8396 Comm: check Not tainted 5.6.0-rc1-dbg+ #14
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0xa5/0xe6
ubsan_epilogue+0x9/0x26
__ubsan_handle_load_invalid_value+0x6d/0x76
nullb_device_memory_backed_store.cold+0x2c/0x38 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x145/0x2c0
ksys_write+0xd7/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Tue, 10 Mar 2020 04:26:18 +0000 (21:26 -0700)]
blk-mq: Fix a recently introduced regression in blk_mq_realloc_hw_ctxs()
q->nr_hw_queues must only be updated once it is known that
blk_mq_realloc_hw_ctxs() has succeeded. Otherwise it can happen that
reallocation fails and that q->nr_hw_queues is larger than the number of
allocated hardware queues. This patch fixes the following crash if
increasing the number of hardware queues fails:
BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x775/0x810
Write of size 8 at addr
0000000000000118 by task check/977
CPU: 3 PID: 977 Comm: check Not tainted 5.6.0-rc1-dbg+ #8
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0xa5/0xe6
__kasan_report.cold+0x65/0x99
kasan_report+0x16/0x20
check_memory_region+0x140/0x1b0
memset+0x28/0x40
blk_mq_map_swqueue+0x775/0x810
blk_mq_update_nr_hw_queues+0x468/0x710
nullb_device_submit_queues_store+0xf7/0x1a0 [null_blk]
configfs_write_file+0x1c4/0x250 [configfs]
__vfs_write+0x4c/0x90
vfs_write+0x145/0x2c0
ksys_write+0xd7/0x180
__x64_sys_write+0x47/0x50
do_syscall_64+0x6f/0x2f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes:
ac0d6b926e74 ("block: Reduce the amount of memory required per request queue")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Tue, 10 Mar 2020 04:26:17 +0000 (21:26 -0700)]
blk-mq: Keep set->nr_hw_queues and set->map[].nr_queues in sync
blk_mq_map_queues() and multiple .map_queues() implementations expect that
set->map[HCTX_TYPE_DEFAULT].nr_queues is set to the number of hardware
queues. Hence set .nr_queues before calling these functions. This patch
fixes the following kernel warning:
WARNING: CPU: 0 PID: 2501 at include/linux/cpumask.h:137
Call Trace:
blk_mq_run_hw_queue+0x19d/0x350 block/blk-mq.c:1508
blk_mq_run_hw_queues+0x112/0x1a0 block/blk-mq.c:1525
blk_mq_requeue_work+0x502/0x780 block/blk-mq.c:775
process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x361/0x430 kernel/kthread.c:255
Fixes:
ed76e329d74a ("blk-mq: abstract out queue map") # v5.0
Reported-by: syzbot+d44e1b26ce5c3e77458d@syzkaller.appspotmail.com
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Tue, 10 Mar 2020 04:26:16 +0000 (21:26 -0700)]
blk-mq: Fix a comment in include/linux/blk-mq.h
The 'hctx_list' member of struct blk_mq_hw_ctx is not a list head but
instead an entry in q->unused_hctx_list. Fix the comment above this
struct member.
Fixes:
d386732bc142 ("blk-mq: fill header with kernel-doc")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Mon, 9 Mar 2020 00:44:44 +0000 (17:44 -0700)]
Linux 5.6-rc5
Linus Torvalds [Mon, 9 Mar 2020 00:36:22 +0000 (17:36 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"We've been accruing these for a couple of weeks, so the batch is a bit
bigger than usual.
Largest delta is due to a led-bl driver that is added -- there was a
miscommunication before the merge window and the driver didn't make it
in. Due to this, the platforms needing it regressed. At this point, it
seemed easier to add the new driver than unwind the changes.
Besides that, there are a handful of various fixes:
- AMD tee memory leak fix
- A handful of fixlets for i.MX SCU communication
- A few maintainers woke up and realized DEBUG_FS had been missing
for a while, so a few updates of that.
... and the usual collection of smaller fixes to various platforms"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (37 commits)
ARM: socfpga_defconfig: Add back DEBUG_FS
arm64: dts: socfpga: agilex: Fix gmac compatible
ARM: bcm2835_defconfig: Explicitly restore CONFIG_DEBUG_FS
arm64: dts: meson: fix gxm-khadas-vim2 wifi
arm64: dts: meson-sm1-sei610: add missing interrupt-names
ARM: meson: Drop unneeded select of COMMON_CLK
ARM: dts: bcm2711: Add pcie0 alias
ARM: dts: bcm283x: Add missing properties to the PWR LED
tee: amdtee: fix memory leak in amdtee_open_session()
ARM: OMAP2+: Fix compile if CONFIG_HAVE_ARM_SMCCC is not set
arm: dts: dra76x: Fix mmc3 max-frequency
ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes
bus: ti-sysc: Fix 1-wire reset quirk
ARM: dts: r8a7779: Remove deprecated "renesas, rcar-sata" compatible value
soc: imx-scu: Align imx sc msg structs to 4
firmware: imx: Align imx_sc_msg_req_cpu_start to 4
firmware: imx: scu-pd: Align imx sc msg structs to 4
firmware: imx: misc: Align imx sc msg structs to 4
firmware: imx: scu: Ensure sequential TX
ARM: dts: imx7-colibri: Fix frequency for sd/mmc
...
Linus Torvalds [Mon, 9 Mar 2020 00:33:52 +0000 (17:33 -0700)]
Merge tag 'edac_urgent-2020-03-08' of git://git./linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
"Error reporting fix for synopsys_edac: do not overwrite partial
decoded error message (Sherry Sun)"
* tag 'edac_urgent-2020-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/synopsys: Do not print an error with back-to-back snprintf() calls
Linus Torvalds [Sun, 8 Mar 2020 15:49:44 +0000 (10:49 -0500)]
Merge tag 'char-misc-5.6-rc5' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are four small char/misc driver fixes for reported issues for
5.6-rc5.
These fixes are:
- binder fix for a potential use-after-free problem found (took two
tries to get it right)
- interconnect core fix
- altera-stapl driver fix
All four of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: prevent UAF for binderfs devices II
interconnect: Handle memory allocation errors
altera-stapl: altera_get_note: prevent write beyond end of 'key'
binder: prevent UAF for binderfs devices
Linus Torvalds [Sun, 8 Mar 2020 15:39:40 +0000 (10:39 -0500)]
Merge tag 'driver-core-5.6-rc5' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs fixes from Greg KH:
"Here are four small driver core / debugfs patches for 5.6-rc3:
- debugfs api cleanup now that all debugfs_create_regset32() callers
have been fixed up. This was waiting until after the -rc1 merge as
these fixes came in through different trees
- driver core sync state fixes based on reports of minor issues found
in the feature
All of these have been in linux-next with no reported issues"
* tag 'driver-core-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Skip unnecessary work when device doesn't have sync_state()
driver core: Add dev_has_sync_state()
driver core: Call sync_state() even if supplier has no consumers
debugfs: remove return value of debugfs_create_regset32()
Linus Torvalds [Sun, 8 Mar 2020 15:35:04 +0000 (10:35 -0500)]
Merge tag 'tty-5.6-rc5' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/serial fixes for 5.6-rc5
Just some small serial driver fixes, and a vt core fixup, full details
are:
- vt fixes for issues found by syzbot
- serdev fix for Apple boxes
- fsl_lpuart serial driver fixes
- MAINTAINER update for incorrect serial files
- new device ids for 8250_exar driver
- mvebu-uart fix
All of these have been in linux-next with no reported issues"
* tag 'tty-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: fsl_lpuart: free IDs allocated by IDA
Revert "tty: serial: fsl_lpuart: drop EARLYCON_DECLARE"
serdev: Fix detection of UART devices on Apple machines.
MAINTAINERS: Add missed files related to Synopsys DesignWare UART
serial: 8250_exar: add support for ACCES cards
tty:serial:mvebu-uart:fix a wrong return
vt: selection, push sel_lock up
vt: selection, push console lock down
Linus Torvalds [Sun, 8 Mar 2020 15:32:23 +0000 (10:32 -0500)]
Merge tag 'usb-5.6-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB/PHY fixes from Greg KH:
"Here are some small USB and PHY driver fixes for reported issues for
5.6-rc5.
Included in here are:
- phy driver fixes
- new USB quirks
- USB cdns3 gadget driver fixes
- USB hub core fixes
All of these have been in linux-next with no reported issues"
* tag 'usb-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: dwc3: gadget: Update chain bit correctly when using sg list
usb: core: port: do error out if usb_autopm_get_interface() fails
usb: core: hub: do error out if usb_autopm_get_interface() fails
usb: core: hub: fix unhandled return by employing a void function
usb: storage: Add quirk for Samsung Fit flash
usb: quirks: add NO_LPM quirk for Logitech Screen Share
usb: usb251xb: fix regulator probe and error handling
phy: allwinner: Fix GENMASK misuse
usb: cdns3: gadget: toggle cycle bit before reset endpoint
usb: cdns3: gadget: link trb should point to next request
phy: mapphone-mdm6600: Fix timeouts by adding wake-up handling
phy: brcm-sata: Correct MDIO operations for 40nm platforms
phy: ti: gmii-sel: do not fail in case of gmii
phy: ti: gmii-sel: fix set of copy-paste errors
phy: core: Fix phy_get() to not return error on link creation failure
phy: mapphone-mdm6600: Fix write timeouts with shorter GPIO toggle interval
Linus Torvalds [Sun, 8 Mar 2020 01:52:55 +0000 (19:52 -0600)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Nothing particularly exciting, some small ODP regressions from the mmu
notifier rework, another bunch of syzkaller fixes, and a bug fix for a
botched syzkaller fix in the first rc pull request.
- Fix busted syzkaller fix in 'get_new_pps' - this turned out to
crash on certain HW configurations
- Bug fixes for various missed things in error unwinds
- Add a missing rcu_read_lock annotation in hfi/qib
- Fix two ODP related regressions from the recent mmu notifier
changes
- Several more syzkaller bugs in siw, RDMA netlink, verbs and iwcm
- Revert an old patch in CMA as it is now shown to not be allocating
port numbers properly"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/iwcm: Fix iwcm work deallocation
RDMA/siw: Fix failure handling during device creation
RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing
RDMA/odp: Ensure the mm is still alive before creating an implicit child
RDMA/core: Fix protection fault in ib_mr_pool_destroy
IB/mlx5: Fix implicit ODP race
IB/hfi1, qib: Ensure RCU is locked when accessing list
RDMA/core: Fix pkey and port assignment in get_new_pps
RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
RDMA/rw: Fix error flow during RDMA context initialization
RDMA/core: Fix use of logical OR in get_new_pps
Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
Linus Torvalds [Sat, 7 Mar 2020 20:20:29 +0000 (14:20 -0600)]
Merge tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Here are a few io_uring fixes that should go into this release. This
contains:
- Removal of (now) unused io_wq_flush() and associated flag (Pavel)
- Fix cancelation lockup with linked timeouts (Pavel)
- Fix for potential use-after-free when freeing percpu ref for fixed
file sets
- io-wq cancelation fixups (Pavel)"
* tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block:
io_uring: fix lockup with timeouts
io_uring: free fixed_file_data after RCU grace period
io-wq: remove io_wq_flush and IO_WQ_WORK_INTERNAL
io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
Linus Torvalds [Sat, 7 Mar 2020 20:14:38 +0000 (14:14 -0600)]
Merge tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Here are a few fixes that should go into this release. This contains:
- Revert of a bad bcache patch from this merge window
- Removed unused function (Daniel)
- Fixup for the blktrace fix from Jan from this release (Cengiz)
- Fix of deeper level bfqq overwrite in BFQ (Carlo)"
* tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block:
block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
blktrace: fix dereference after null check
Revert "bcache: ignore pending signals when creating gc and allocator thread"
block: Remove used kblockd_schedule_work_on()
Linus Torvalds [Sat, 7 Mar 2020 18:00:13 +0000 (12:00 -0600)]
Merge tag 'media/v5.6-2' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a fix for the media controller links in both hantro driver and in
v4l2-mem2mem core
- some fixes for the pulse8-cec driver
- vicodec: handle alpha channel for RGB32 formats, as it may be used
- mc-entity.c: fix handling of pad flags
* tag 'media/v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: hantro: Fix broken media controller links
media: mc-entity.c: use & to check pad flags, not ==
media: v4l2-mem2mem.c: fix broken links
media: vicodec: process all 4 components for RGB32 formats
media: pulse8-cec: close serio in disconnect, not adap_free
media: pulse8-cec: INIT_DELAYED_WORK was called too late
Pavel Begunkov [Fri, 6 Mar 2020 22:15:22 +0000 (01:15 +0300)]
io_uring: fix lockup with timeouts
There is a recipe to deadlock the kernel: submit a timeout sqe with a
linked_timeout (e.g. test_single_link_timeout_ception() from liburing),
and SIGKILL the process.
Then, io_kill_timeouts() takes @ctx->completion_lock, but the timeout
isn't flagged with REQ_F_COMP_LOCKED, and will try to double grab it
during io_put_free() to cancel the linked timeout. Probably, the same
can happen with another io_kill_timeout() call site, that is
io_commit_cqring().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sat, 7 Mar 2020 14:12:47 +0000 (08:12 -0600)]
Merge tag 's390-5.6-5' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix panic in gup_fast on large pud by providing an implementation of
pud_write. This has been overlooked during migration to common gup
code.
- Fix unexpected write combining on PCI stores.
* tag 's390-5.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: Fix unexpected write combine on resource
s390/mm: fix panic in gup_fast on large pud
Linus Torvalds [Sat, 7 Mar 2020 14:10:34 +0000 (08:10 -0600)]
Merge tag 'powerpc-5.6-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.6:
- One fix for a recent regression to our breakpoint/watchpoint code.
- Another fix for our KUAP support, this time a missing annotation in
a rarely used path in signal handling.
- A fix for our handling of a CPU feature that effects the PMU, when
booting guests in some configurations.
- A minor fix to our linker script to explicitly include the .BTF
section.
Thanks to: Christophe Leroy, Desnes A. Nunes do Rosario, Leonardo
Bras, Naveen N. Rao, Ravi Bangoria, Stefan Berger"
* tag 'powerpc-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fix missing KUAP disable in flush_coherent_icache()
powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems
powerpc: Include .BTF section
powerpc/watchpoint: Don't call dar_within_range() for Book3S
Linus Torvalds [Sat, 7 Mar 2020 14:04:54 +0000 (08:04 -0600)]
Merge tag 'for-linus-5.6b-rc5-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Four fixes and a small cleanup patch:
- two fixes by Dongli Zhang fixing races in the xenbus driver
- two fixes by me fixing issues introduced in 5.6
- a small cleanup by Gustavo Silva replacing a zero-length array with
a flexible-array"
* tag 'for-linus-5.6b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/blkfront: fix ring info addressing
xen/xenbus: fix locking
xenbus: req->err should be updated before req->state
xenbus: req->body should be updated before req->state
xen: Replace zero-length array with flexible-array member
Linus Torvalds [Sat, 7 Mar 2020 14:01:43 +0000 (08:01 -0600)]
Merge tag 'for-linus-2020-03-07' of gitolite.pub/scm/linux/kernel/git/brauner/linux
Pull thread fixes from Christian Brauner:
"Here are a few hopefully uncontroversial fixes:
- Use RCU_INIT_POINTER() when initializing rcu protected members in
task_struct to fix sparse warnings.
- Add pidfd_fdinfo_test binary to .gitignore file"
* tag 'for-linus-2020-03-07' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
selftests: pidfd: Add pidfd_fdinfo_test in .gitignore
exit: Fix Sparse errors and warnings
fork: Use RCU_INIT_POINTER() instead of rcu_access_pointer()
Linus Torvalds [Sat, 7 Mar 2020 13:59:30 +0000 (07:59 -0600)]
Merge tag 'sound-5.6-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The regular "bump-in-the-middle" updates, containing mostly ASoC-
related fixes at this time. All changes are reasonably small.
A few entries are for ASoC and ALSA core parts (DAPM, PCM, topology)
for followups of the recent changes and potential buffer overflow by
snprintf(), while the rest are (both new and old) device-specific
fixes for Intel, meson, tas2562, rt1015, as well as the usual HD-audio
quirks"
* tag 'sound-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
ALSA: sgio2audio: Remove usage of dropped hw_params/hw_free functions
ALSA: hda/realtek - Enable the headset of ASUS B9450FA with ALC294
ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master
ALSA: hda/realtek - Add Headset Button supported for ThinkPad X1
ALSA: hda/realtek - Add Headset Mic supported
ASoC: wm8741: Fix typo in Kconfig prompt
ASoC: stm32: sai: manage rebind issue
ASoC: SOF: Fix snd_sof_ipc_stream_posn()
ASoC: rt1015: modify pre-divider for sysclk
ASoC: rt1015: add operation callback function for rt1015_dai[]
ASoC: soc-component: tidyup snd_soc_pcm_component_sync_stop()
ASoC: dapm: Correct DAPM handling of active widgets during shutdown
ASoC: tas2562: Fix sample rate error message
ASoC: Intel: Skylake: Fix available clock counter incrementation
ASoC: soc-pcm/soc-compress: don't use snd_soc_dapm_stream_stop()
ASoC: meson: g12a: add tohdmitx reset
ASoC: pcm512x: Fix unbalanced regulator enable call in probe error path
ASoC: soc-core: fix for_rtd_codec_dai_rollback() macro
ASoC: topology: Fix memleak in soc_tplg_manifest_load()
ASoC: topology: Fix memleak in soc_tplg_link_elems_load()
...
Takashi Iwai [Sat, 7 Mar 2020 06:24:36 +0000 (07:24 +0100)]
Merge tag 'asoc-fix-v5.6-rc4' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.6
More fixes that have arrived since the merge window, spread out all
over. There's a few things like the operation callback addition for
rt1015 and the meson reset addition which add small new bits of
functionality to fix non-working systems, they're all very small and for
parts of newly added functionality.
Linus Torvalds [Fri, 6 Mar 2020 23:03:37 +0000 (17:03 -0600)]
Merge tag 'linux-kselftest-5.6-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
"This consists of a cleanup patch to undo changes to global .gitignore
that added selftests/lkdtm objects and add them to a local
selftests/lkdtm/.gitignore.
Summary of Linus's comments on local vs. global gitignore scope:
- Keep local gitignore patterns in local files.
- Put only global gitignore patterns in the top-level gitignore file.
Local scope keeps things much better separated. It also incidentally
means that if a directory gets renamed, the gitignore file continues
to work unless in the case of renaming the actual files themselves
that are named in the gitignore"
* tag 'linux-kselftest-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftest/lkdtm: Use local .gitignore
Linus Torvalds [Fri, 6 Mar 2020 22:38:33 +0000 (16:38 -0600)]
Merge tag 'riscv-for-linus-5.6-rc5' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"This contains a handful of fixes that I would like to target for 5.6:
- A pair of fixes to module loading, which we hope solve the last of
the issues with module text being loaded too sparsely for our call
relocations.
- A Kconfig fix that disallows selecting memory models not supported
by NOMMU.
- A series of Kconfig updates to ease selecting the drivers necessary
to run on QEMU's virt platform.
- DTS updates for SiFive's HiFive Unleashed.
- A fix to our seccomp support that avoids mangling restartable
syscalls"
* tag 'riscv-for-linus-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: fix seccomp reject syscall code path
riscv: dts: Add GPIO reboot method to HiFive Unleashed DTS file
RISC-V: Select Goldfish RTC driver for QEMU virt machine
RISC-V: Select SYSCON Reboot and Poweroff for QEMU virt machine
RISC-V: Enable QEMU virt machine support in defconfigs
RISC-V: Add kconfig option for QEMU virt machine
riscv: Fix range looking for kernel image memblock
riscv: Force flat memory model with no-mmu
riscv: Change code model of module to medany to improve data accessing
riscv: avoid the PIC offset of static percpu data in module beyond 2G limits
Jonathan Neuschäfer [Fri, 6 Mar 2020 22:13:11 +0000 (23:13 +0100)]
parse-maintainers: Mark as executable
This makes the script more convenient to run.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 6 Mar 2020 22:11:34 +0000 (16:11 -0600)]
Merge tag 'devicetree-fixes-for-5.6-3' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
"Another batch of DT fixes. I think this should be the last of it, but
sending pull requests seems to cause people to send more fixes.
Summary:
- Fixes for warnings introduced by hierarchical PSCI binding changes
- Fixes for broken doc references due to DT schema conversions
- Several grammar and typo fixes
- Fix a bunch of dtc warnings in examples"
* tag 'devicetree-fixes-for-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: arm: Fixup the DT bindings for hierarchical PSCI states
dt-bindings: power: Extend nodename pattern for power-domain providers
MAINTAINERS: update ALLWINNER CPUFREQ DRIVER entry
dt-bindings: bus: Drop empty compatible string in example
dt-bindings: power: Convert domain-idle-states bindings to json-schema
dt-bindings: arm: Fix cpu compatibles in the hierarchical example for PSCI
dt-bindings: arm: Correct links to idle states definitions
dt-bindings: mfd: Fix typo in file name of twl-familly.txt
dt-bindings: mfd: tps65910: Improve grammar
dt-bindings: mfd: zii,rave-sp: Fix a typo ("onborad")
dt-bindings: arm: fsl: fix APF6Dev compatible
dt-bindings: Fix dtc warnings in examples
docs: dt: fix several broken doc references
docs: dt: fix several broken references due to renames
MAINTAINERS: clean up PCIE DRIVER FOR CAVIUM THUNDERX
Linus Torvalds [Fri, 6 Mar 2020 22:08:48 +0000 (16:08 -0600)]
Merge tag 'drm-fixes-2020-03-06-1' of git://anongit.freedesktop.org/drm/drm
Pull vgacon fix from Daniel Vetter:
"One vgacon input check for stable"
* tag 'drm-fixes-2020-03-06-1' of git://anongit.freedesktop.org/drm/drm:
vgacon: Fix a UAF in vgacon_invert_region
Linus Torvalds [Fri, 6 Mar 2020 20:56:46 +0000 (14:56 -0600)]
Merge tag 'for-5.6-rc4-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One fixup for DIO when in use with the new checksums, a missed case
where the checksum size was still assuming u32"
* tag 'for-5.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix RAID direct I/O reads with alternate csums
Linus Torvalds [Fri, 6 Mar 2020 20:55:27 +0000 (14:55 -0600)]
Merge tag 'filelock-v5.6-1' of git://git./linux/kernel/git/jlayton/linux
Pull file locking fixes from Jeff Layton:
"Just a couple of late-breaking patches for the file locking code. The
second patch (from yangerkun) fixes a rather nasty looking potential
use-after-free that should go to stable.
The other patch could technically wait for 5.7, but it's fairly
innocuous so I figured we might as well take it"
* tag 'filelock-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: fix a potential use-after-free problem when wakeup a waiter
fcntl: Distribute switch variables for initialization
Linus Torvalds [Fri, 6 Mar 2020 20:50:16 +0000 (14:50 -0600)]
Merge tag 'spi-fix-v5.6-rc4' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A selection of small fixes, mostly for drivers, that have arrived
since the merge window. None of them are earth shattering in
themselves but all useful for affected systems"
* tag 'spi-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi_register_controller(): free bus id on error paths
spi: bcm63xx-hsspi: Really keep pll clk enabled
spi: atmel-quadspi: fix possible MMIO window size overrun
spi/zynqmp: remove entry that causes a cs glitch
spi: pxa2xx: Add CS control clock quirk
spi: spidev: Fix CS polarity if GPIO descriptors are used
spi: qup: call spi_qup_pm_resume_runtime before suspending
spi: spi-omap2-mcspi: Support probe deferral for DMA channels
spi: spi-omap2-mcspi: Handle DMA size restriction on AM65x
Linus Torvalds [Fri, 6 Mar 2020 20:48:30 +0000 (14:48 -0600)]
Merge tag 'regulator-fix-v5.6-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of small fixes, one for a minor issue in the stm32-vrefbuf
driver and a documentation fix in the Qualcomm code"
* tag 'regulator-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: stm32-vrefbuf: fix a possible overshoot when re-enabling
regulator: qcom_spmi: Fix docs for PM8004
Linus Torvalds [Fri, 6 Mar 2020 20:47:06 +0000 (14:47 -0600)]
Merge tag 'hwmon-for-v5.6-rc5' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix an error return in the adt7462 driver, bad voltage limits reported
by the xdpe12284 driver, and a broken documentation reference in the
adm1177 driver documentation"
* tag 'hwmon-for-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT()
hwmon: (pmbus/xdpe12284) Add callback for vout limits conversion
docs: adm1177: fix a broken reference
Linus Torvalds [Fri, 6 Mar 2020 20:35:47 +0000 (14:35 -0600)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here are another three arm64 fixes for 5.6, all pretty minor. Main
thing is fixing a silly bug in the fsl_imx8_ddr PMU driver where we
would zero the counters when disabling them.
- Fix misreporting of ASID limit when KPTI is enabled
- Fix busted NULL pointer checks for GICC structure in ACPI PMU code
- Avoid nobbling the "fsl_imx8_ddr" PMU counters when disabling them"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: context: Fix ASID limit in boot messages
drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer
drivers/perf: fsl_imx8_ddr: Correct the CLEAR bit definition
Zhang Xiaoxu [Wed, 4 Mar 2020 02:24:29 +0000 (10:24 +0800)]
vgacon: Fix a UAF in vgacon_invert_region
When syzkaller tests, there is a UAF:
BUG: KASan: use after free in vgacon_invert_region+0x9d/0x110 at addr
ffff880000100000
Read of size 2 by task syz-executor.1/16489
page:
ffffea0000004000 count:0 mapcount:-127 mapping: (null)
index:0x0
page flags: 0xfffff00000000()
page dumped because: kasan: bad access detected
CPU: 1 PID: 16489 Comm: syz-executor.1 Not tainted
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
[<
ffffffffb119f309>] dump_stack+0x1e/0x20
[<
ffffffffb04af957>] kasan_report+0x577/0x950
[<
ffffffffb04ae652>] __asan_load2+0x62/0x80
[<
ffffffffb090f26d>] vgacon_invert_region+0x9d/0x110
[<
ffffffffb0a39d95>] invert_screen+0xe5/0x470
[<
ffffffffb0a21dcb>] set_selection+0x44b/0x12f0
[<
ffffffffb0a3bfae>] tioclinux+0xee/0x490
[<
ffffffffb0a1d114>] vt_ioctl+0xff4/0x2670
[<
ffffffffb0a0089a>] tty_ioctl+0x46a/0x1a10
[<
ffffffffb052db3d>] do_vfs_ioctl+0x5bd/0xc40
[<
ffffffffb052e2f2>] SyS_ioctl+0x132/0x170
[<
ffffffffb11c9b1b>] system_call_fastpath+0x22/0x27
Memory state around the buggy address:
ffff8800000fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
ffff8800000fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00
>
ffff880000100000: ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff
It can be reproduce in the linux mainline by the program:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <linux/vt.h>
struct tiocl_selection {
unsigned short xs; /* X start */
unsigned short ys; /* Y start */
unsigned short xe; /* X end */
unsigned short ye; /* Y end */
unsigned short sel_mode; /* selection mode */
};
#define TIOCL_SETSEL 2
struct tiocl {
unsigned char type;
unsigned char pad;
struct tiocl_selection sel;
};
int main()
{
int fd = 0;
const char *dev = "/dev/char/4:1";
struct vt_consize v = {0};
struct tiocl tioc = {0};
fd = open(dev, O_RDWR, 0);
v.v_rows = 3346;
ioctl(fd, VT_RESIZEX, &v);
tioc.type = TIOCL_SETSEL;
ioctl(fd, TIOCLINUX, &tioc);
return 0;
}
When resize the screen, update the 'vc->vc_size_row' to the new_row_size,
but when 'set_origin' in 'vgacon_set_origin', vgacon use 'vga_vram_base'
for 'vc_origin' and 'vc_visible_origin', not 'vc_screenbuf'. It maybe
smaller than 'vc_screenbuf'. When TIOCLINUX, use the new_row_size to calc
the offset, it maybe larger than the vga_vram_size in vgacon driver, then
bad access.
Also, if set an larger screenbuf firstly, then set an more larger
screenbuf, when copy old_origin to new_origin, a bad access may happen.
So, If the screen size larger than vga_vram, resize screen should be
failed. This alse fix CVE-2020-8649 and CVE-2020-8647.
Linus pointed out that overflow checking seems absent. We're saved by
the existing bounds checks in vc_do_resize() with rather strict
limits:
if (cols > VC_RESIZE_MAXCOL || lines > VC_RESIZE_MAXROW)
return -EINVAL;
Fixes:
0aec4867dca14 ("[PATCH] SVGATextMode fix")
Reference: CVE-2020-8647 and CVE-2020-8649
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
[danvet: augment commit message to point out overflow safety]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304022429.37738-1-zhangxiaoxu5@huawei.com
Ulf Hansson [Tue, 3 Mar 2020 15:07:47 +0000 (16:07 +0100)]
dt-bindings: arm: Fixup the DT bindings for hierarchical PSCI states
The hierarchical topology with power-domain should be described through
child nodes, rather than as currently described in the PSCI root node. Fix
this by adding a patternProperties with a corresponding reference to the
power-domain DT binding.
Additionally, update the example to conform to the new pattern, but also to
the adjusted domain-idle-state DT binding.
Fixes:
a3f048b5424e ("dt: psci: Update DT bindings to support hierarchical PSCI states")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[robh: Add missing allOf, tweak power-domain node name]
Signed-off-by: Rob Herring <robh@kernel.org>
Ulf Hansson [Tue, 3 Mar 2020 15:07:46 +0000 (16:07 +0100)]
dt-bindings: power: Extend nodename pattern for power-domain providers
The existing binding requires the nodename to have a '@', which is a bit
limiting for the wider use case. Therefore, let's extend the pattern to
allow either '@' or '-'.
Fixes:
a3f048b5424e ("dt: psci: Update DT bindings to support hierarchical PSCI states")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[robh: drop example change]
Signed-off-by: Rob Herring <robh@kernel.org>
Jens Axboe [Wed, 4 Mar 2020 14:25:50 +0000 (07:25 -0700)]
io_uring: free fixed_file_data after RCU grace period
The percpu refcount protects this structure, and we can have an atomic
switch in progress when exiting. This makes it unsafe to just free the
struct normally, and can trigger the following KASAN warning:
BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
Read of size 1 at addr
ffff888181a19a30 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x76/0xa0
print_address_description.constprop.0+0x3b/0x60
? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
__kasan_report.cold+0x1a/0x3d
? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
rcu_core+0x370/0x830
? percpu_ref_exit+0x50/0x50
? rcu_note_context_switch+0x7b0/0x7b0
? run_rebalance_domains+0x11d/0x140
__do_softirq+0x10a/0x3e9
irq_exit+0xd5/0xe0
smp_apic_timer_interrupt+0x86/0x200
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:default_idle+0x26/0x1f0
Fix this by punting the final exit and free of the struct to RCU, then
we know that it's safe to do so. Jann suggested the approach of using a
double rcu callback to achieve this. It's important that we do a nested
call_rcu() callback, as otherwise the free could be ordered before the
atomic switch, even if the latter was already queued.
Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com
Suggested-by: Jann Horn <jannh@google.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
yangerkun [Wed, 4 Mar 2020 07:25:56 +0000 (15:25 +0800)]
locks: fix a potential use-after-free problem when wakeup a waiter
'
16306a61d3b7 ("fs/locks: always delete_block after waiting.")' add the
logic to check waiter->fl_blocker without blocked_lock_lock. And it will
trigger a UAF when we try to wakeup some waiter:
Thread 1 has create a write flock a on file, and now thread 2 try to
unlock and delete flock a, thread 3 try to add flock b on the same file.
Thread2 Thread3
flock syscall(create flock b)
...flock_lock_inode_wait
flock_lock_inode(will insert
our fl_blocked_member list
to flock a's fl_blocked_requests)
sleep
flock syscall(unlock)
...flock_lock_inode_wait
locks_delete_lock_ctx
...__locks_wake_up_blocks
__locks_delete_blocks(
b->fl_blocker = NULL)
...
break by a signal
locks_delete_block
b->fl_blocker == NULL &&
list_empty(&b->fl_blocked_requests)
success, return directly
locks_free_lock b
wake_up(&b->fl_waiter)
trigger UAF
Fix it by remove this logic, and this patch may also fix CVE-2019-19769.
Cc: stable@vger.kernel.org
Fixes:
16306a61d3b7 ("fs/locks: always delete_block after waiting.")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>