Alessio Balsini [Wed, 7 Aug 2019 00:48:28 +0000 (01:48 +0100)]
loop: Add LOOP_SET_DIRECT_IO to compat ioctl
Enabling Direct I/O with loop devices helps reducing memory usage by
avoiding double caching. 32 bit applications running on 64 bits systems
are currently not able to request direct I/O because is missing from the
lo_compat_ioctl.
This patch fixes the compatibility issue mentioned above by exporting
LOOP_SET_DIRECT_IO as additional lo_compat_ioctl() entry.
The input argument for this ioctl is a single long converted to a 1-bit
boolean, so compatibility is preserved.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zhou Wang [Wed, 24 Jul 2019 03:54:23 +0000 (11:54 +0800)]
lib: scatterlist: Fix to support no mapped sg
In function sg_split, the second sg_calculate_split will return -EINVAL
when in_mapped_nents is 0.
Indeed there is no need to do second sg_calculate_split and sg_split_mapped
when in_mapped_nents is 0, as in_mapped_nents indicates no mapped entry in
original sgl.
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
YueHaibing [Wed, 7 Aug 2019 13:18:47 +0000 (21:18 +0800)]
lightnvm: remove set but not used variables 'data_len' and 'rq_len'
drivers/lightnvm/pblk-read.c: In function pblk_submit_read_gc:
drivers/lightnvm/pblk-read.c:423:6: warning: variable data_len set but not used [-Wunused-but-set-variable]
drivers/lightnvm/pblk-recovery.c: In function pblk_recov_scan_oob:
drivers/lightnvm/pblk-recovery.c:368:15: warning: variable rq_len set but not used [-Wunused-but-set-variable]
They are not used since commit
48e5da725581 ("lightnvm:
move metadata mapping to lower level driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 7 Aug 2019 18:26:53 +0000 (12:26 -0600)]
Merge branch 'md-next' of https://github.com/liu-song-6/linux into for-5.4/block
Pull MD changes from Song.
* 'md-next' of https://github.com/liu-song-6/linux:
raid1: factor out a common routine to handle the completion of sync write
md: don't call spare_active in md_reap_sync_thread if all member devices can't work
md: don't set In_sync if array is frozen
md: allow last device to be forcibly removed from RAID1/RAID10.
md: Convert to use int_pow()
md/raid10: end bio when the device faulty
md/raid1: end bio when the device faulty
md/raid6: Set R5_ReadError when there is read failure on parity disk
raid1: use an int as the return value of raise_barrier()
Hou Tao [Sat, 27 Jul 2019 06:02:58 +0000 (14:02 +0800)]
raid1: factor out a common routine to handle the completion of sync write
It's just code clean-up.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Wed, 24 Jul 2019 09:09:21 +0000 (11:09 +0200)]
md: don't call spare_active in md_reap_sync_thread if all member devices can't work
When add one disk to array, the md_reap_sync_thread is responsible
to activate the spare and set In_sync flag for the new member in
spare_active().
But if raid1 has one member disk A, and disk B is added to the array.
Then we offline A before all the datas are synchronized from A to B,
obviously B doesn't have the latest data as A, but B is still marked
with In_sync flag.
So let's not call spare_active under the condition, otherwise B is
still showed with 'U' state which is not correct.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Wed, 24 Jul 2019 09:09:20 +0000 (11:09 +0200)]
md: don't set In_sync if array is frozen
When a disk is added to array, the following path is called in mdadm.
Manage_subdevs -> sysfs_freeze_array
-> Manage_add
-> sysfs_set_str(&info, NULL, "sync_action","idle")
Then from kernel side, Manage_add invokes the path (add_new_disk ->
validate_super = super_1_validate) to set In_sync flag.
Since In_sync means "device is in_sync with rest of array", and the new
added disk need to resync thread to help the synchronization of data.
And md_reap_sync_thread would call spare_active to set In_sync for the
new added disk finally. So don't set In_sync if array is in frozen.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Wed, 24 Jul 2019 09:09:19 +0000 (11:09 +0200)]
md: allow last device to be forcibly removed from RAID1/RAID10.
When the 'last' device in a RAID1 or RAID10 reports an error,
we do not mark it as failed. This would serve little purpose
as there is no risk of losing data beyond that which is obviously
lost (as there is with RAID5), and there could be other sectors
on the device which are readable, and only readable from this device.
This in general this maximises access to data.
However the current implementation also stops an admin from removing
the last device by direct action. This is rarely useful, but in many
case is not harmful and can make automation easier by removing special
cases.
Also, if an attempt to write metadata fails the device must be marked
as faulty, else an infinite loop will result, attempting to update
the metadata on all non-faulty devices.
So add 'fail_last_dev' member to 'struct mddev', then we can bypasses
the 'last disk' checks for RAID1 and RAID10, and control the behavior
per array by change sysfs node.
Signed-off-by: NeilBrown <neilb@suse.de>
[add sysfs node for fail_last_dev by Guoqing]
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Andy Shevchenko [Tue, 23 Jul 2019 20:41:55 +0000 (23:41 +0300)]
md: Convert to use int_pow()
Instead of linear approach to calculate power of 10, use generic int_pow()
which does it better.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Yufen Yu [Fri, 19 Jul 2019 05:48:47 +0000 (13:48 +0800)]
md/raid10: end bio when the device faulty
Just like raid1, we do not queue write error bio to retry write
and acknowlege badblocks, when the device is faulty.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Yufen Yu [Fri, 19 Jul 2019 05:48:46 +0000 (13:48 +0800)]
md/raid1: end bio when the device faulty
When write bio return error, it would be added to conf->retry_list
and wait for raid1d thread to retry write and acknowledge badblocks.
In narrow_write_error(), the error bio will be split in the unit of
badblock shift (such as one sector) and raid1d thread issues them
one by one. Until all of the splited bio has finished, raid1d thread
can go on processing other things, which is time consuming.
But, there is a scene for error handling that is not necessary.
When the device has been set faulty, flush_bio_list() may end
bios in pending_bio_list with error status. Since these bios
has not been issued to the device actually, error handlding to
retry write and acknowledge badblocks make no sense.
Even without that scene, when the device is faulty, badblocks info
can not be written out to the device. Thus, we also no need to
handle the error IO.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Xiao Ni [Mon, 8 Jul 2019 02:14:32 +0000 (10:14 +0800)]
md/raid6: Set R5_ReadError when there is read failure on parity disk
7471fb77ce4d ("md/raid6: Fix anomily when recovering a single device in
RAID6.") avoids rereading P when it can be computed from other members.
However, this misses the chance to re-write the right data to P. This
patch sets R5_ReadError if the re-read fails.
Also, when re-read is skipped, we also missed the chance to reset
rdev->read_errors to 0. It can fail the disk when there are many read
errors on P member disk (other disks don't have read error)
V2: upper layer read request don't read parity/Q data. So there is no
need to consider such situation.
This is Reported-by: kbuild test robot <lkp@intel.com>
Fixes:
7471fb77ce4d ("md/raid6: Fix anomily when recovering a single device in RAID6.")
Cc: <stable@vger.kernel.org> #4.4+
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Hou Tao [Tue, 2 Jul 2019 14:35:48 +0000 (22:35 +0800)]
raid1: use an int as the return value of raise_barrier()
Using a sector_t as the return value is misleading, because
raise_barrier() only return 0 or -EINTR.
Also add comments for the return values of raise_barrier().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Hans Holmberg [Wed, 31 Jul 2019 09:41:36 +0000 (11:41 +0200)]
block: stop exporting bio_map_kern
Now that there no module users left of bio_map_kern, stop exporting the
symbol.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hans Holmberg <hans@owltronix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hans Holmberg [Wed, 31 Jul 2019 09:41:35 +0000 (11:41 +0200)]
lightnvm: pblk: use kvmalloc for metadata
There is no reason now not to use kvmalloc, so replace the internal
metadata allocation scheme.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hans Holmberg <hans@owltronix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hans Holmberg [Wed, 31 Jul 2019 09:41:34 +0000 (11:41 +0200)]
lightnvm: move metadata mapping to lower level driver
Now that blk_rq_map_kern can map both kmem and vmem, move internal
metadata mapping down to the lower level driver.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hans Holmberg <hans@owltronix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hans Holmberg [Wed, 31 Jul 2019 09:41:33 +0000 (11:41 +0200)]
lightnvm: remove nvm_submit_io_sync_fn
Move the redundant sync handling interface and wait for a completion in
the lightnvm core instead.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hans Holmberg <hans@owltronix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Thu, 25 Jul 2019 09:41:46 +0000 (17:41 +0800)]
blk-mq: balance mapping between present CPUs and queues
Spread queues among present CPUs first, then building mapping on other
non-present CPUs.
So we can minimize count of dead queues which are mapped by un-present
CPUs only. Then bad IO performance can be avoided by unbalanced mapping
between present CPUs and queues.
The similar policy has been applied on Managed IRQ affinity.
Cc: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Thu, 25 Jul 2019 02:05:00 +0000 (10:05 +0800)]
scsi: implement .cleanup_rq callback
Implement .cleanup_rq() callback for freeing driver private part
of the request. Then we can avoid to leak this part if the request isn't
completed by SCSI, and freed by blk-mq or upper layer(such as dm-rq) finally.
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes:
396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Thu, 25 Jul 2019 02:04:59 +0000 (10:04 +0800)]
blk-mq: add callback of .cleanup_rq
SCSI maintains its own driver private data hooked off of each SCSI
request, and the pridate data won't be freed after scsi_queue_rq()
returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver
(e.g. dm-rq) may need to retry these SCSI requests, before SCSI has
fully dispatched them, due to a lower level SCSI driver's resource
limitation identified in scsi_queue_rq(). Currently SCSI's per-request
private data is leaked when the upper layer driver (dm-rq) frees and
then retries these requests in response to BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE returns from scsi_queue_rq().
This usecase is so specialized that it doesn't warrant training an
existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to
account for freeing its driver private data -- doing so would add an
extra branch for handling a special case that all other consumers of
SCSI (and blk-mq) won't ever need to worry about.
So the most pragmatic way forward is to delegate freeing SCSI driver
private data to the upper layer driver (dm-rq). Do so by adding
new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method
from dm-rq. A following commit will implement the .cleanup_rq() hook
in scsi_mq_ops.
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes:
396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chaitanya Kulkarni [Thu, 1 Aug 2019 17:26:38 +0000 (10:26 -0700)]
null_blk: implement REQ_OP_ZONE_RESET_ALL
This patch implements newly introduced zone reset all operation for
null_blk driver.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chaitanya Kulkarni [Thu, 1 Aug 2019 17:26:37 +0000 (10:26 -0700)]
scsi: implement REQ_OP_ZONE_RESET_ALL
This patch implements the zone reset all operation for sd_zbc.c. We add
a new boolean parameter for the sd_zbc_setup_reset_cmd() to indicate
REQ_OP_ZONE_RESET_ALL command setup. Along with that we add support in
the completion path for the zone reset all.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chaitanya Kulkarni [Thu, 1 Aug 2019 17:26:36 +0000 (10:26 -0700)]
blk-zoned: implement REQ_OP_ZONE_RESET_ALL
This implements REQ_OP_ZONE_RESET_ALL as a special case of the block
device zone reset operations where we just simply issue bio with the
newly introduced req op.
We issue this req op when the number of sectors is equal to the device's
partition's number of sectors and device has no partitions.
We also add support so that blk_op_str() can print the new reset-all
zone operation.
This patch also adds a generic make request check for newly
introduced REQ_OP_ZONE_RESET_ALL req_opf. We simply return error
when queue is zoned and reset-all flag is not set for
REQ_OP_ZONE_RESET_ALL.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chaitanya Kulkarni [Thu, 1 Aug 2019 17:26:35 +0000 (10:26 -0700)]
block: add req op to reset all zones and flag
This patch introduces a new request operation REQ_OP_ZONE_RESET_ALL.
This is useful for the applications like mkfs where it needs to reset
all the zones present on the underlying block device. As part for this
patch we also introduce new QUEUE_FLAG_ZONE_RESETALL which indicates the
queue zone reset all capability and corresponding helper macro.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 1 Aug 2019 22:39:55 +0000 (15:39 -0700)]
block: Fix a comment in blk_cleanup_queue()
Change a reference to the legacy block layer into a reference to blk-mq.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 1 Aug 2019 22:39:07 +0000 (15:39 -0700)]
block: Fix spelling in the header above blkg_lookup()
See also commit
8f4236d9008b ("block: remove QUEUE_FLAG_BYPASS and ->bypass") # v5.0.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 1 Aug 2019 22:50:44 +0000 (15:50 -0700)]
block: Improve physical block alignment of split bios
Consider the following example:
* The logical block size is 4 KB.
* The physical block size is 8 KB.
* max_sectors equals (16 KB >> 9) sectors.
* A non-aligned 4 KB and an aligned 64 KB bio are merged into a single
non-aligned 68 KB bio.
The current behavior is to split such a bio into (16 KB + 16 KB + 16 KB
+ 16 KB + 4 KB). The start of none of these five bio's is aligned to a
physical block boundary.
This patch ensures that such a bio is split into four aligned and
one non-aligned bio instead of being split into five non-aligned bios.
This improves performance because most block devices can handle aligned
requests faster than non-aligned requests.
Since the physical block size is larger than or equal to the logical
block size, this patch preserves the guarantee that the returned
value is a multiple of the logical block size.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 1 Aug 2019 22:50:43 +0000 (15:50 -0700)]
block: Simplify blk_bio_segment_split()
Move the max_sectors check into bvec_split_segs() such that a single
call to that function can do all the necessary checks. This patch
optimizes the fast path further, namely if a bvec fits in a page.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 1 Aug 2019 22:50:42 +0000 (15:50 -0700)]
block: Simplify bvec_split_segs()
Simplify this function by by removing two if-tests. Other than requiring
that the @sectors pointer is not NULL, this patch does not change the
behavior of bvec_split_segs().
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 1 Aug 2019 22:50:41 +0000 (15:50 -0700)]
block: Document the bio splitting functions
Since what the bio splitting functions do is nontrivial, document these
functions.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 1 Aug 2019 22:50:40 +0000 (15:50 -0700)]
block: Declare several function pointer arguments 'const'
Make it clear to the compiler and also to humans that the functions
that query request queue properties do not modify any member of the
request_queue data structure.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 24 Jul 2019 03:48:43 +0000 (11:48 +0800)]
blk-mq: remove blk_mq_complete_request_sync
blk_mq_tagset_wait_completed_request() has been applied for waiting
for completed request's fn, so not necessary to use
blk_mq_complete_request_sync() any more.
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 24 Jul 2019 03:48:42 +0000 (11:48 +0800)]
nvme: wait until all completed request's complete fn is called
When aborting in-flight request for recovering controller, we have
to make sure that queue's complete function is called on completed
request before moving on. Otherwise, for example, the warning of
WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be
triggered on nvme-rdma.
Fix this issue by using blk_mq_tagset_wait_completed_request.
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 24 Jul 2019 03:48:41 +0000 (11:48 +0800)]
nvme: don't abort completed request in nvme_cancel_request
Before aborting in-flight requests, all IO queues and their interrupts
have been shutdown. However, request's completion function may not be
done yet because it can be scheduled to run via IPI.
So don't abort one request if it is marked as completed, otherwise
we may abort one normal completed request.
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 24 Jul 2019 03:48:40 +0000 (11:48 +0800)]
blk-mq: introduce blk_mq_tagset_wait_completed_request()
blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but doesn't provide any way to synchronize the request's complete
fn. The current queue freeze interface can't provide the synchonization
because aborted requests stay at blk-mq queues during EH.
In some driver's EH(such as NVMe), hardware queue's resource may be freed &
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.
Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 24 Jul 2019 03:48:39 +0000 (11:48 +0800)]
blk-mq: introduce blk_mq_request_completed()
NVMe needs this function to decide if one request to be aborted has
been completed in normal IO path already.
So introduce it.
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Mon, 5 Aug 2019 01:40:12 +0000 (18:40 -0700)]
Linux 5.3-rc3
Linus Torvalds [Sun, 4 Aug 2019 23:39:07 +0000 (16:39 -0700)]
Merge tag 'tpmdd-next-
20190805' of git://git.infradead.org/users/jjs/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Two bug fixes that did not make into my first pull request"
* tag 'tpmdd-next-
20190805' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm: tpm_ibm_vtpm: Fix unallocated banks
tpm: Fix null pointer dereference on chip register error path
Linus Torvalds [Sun, 4 Aug 2019 23:37:08 +0000 (16:37 -0700)]
Merge tag 'mtd/fixes-for-5.3-rc3' of git://git./linux/kernel/git/mtd/linux
Pull MTD fixes from Miquel Raynal:
"NAND:
- Fix Micron driver as some chips enable internal ECC correction
during their discovery while they advertize they do not have any.
Hyperbus:
- Restrict the build to only ARM64 SoCs (and compile testing) which
is what should have been done since the beginning.
- Fix Kconfig issue by selection something instead of implying it"
* tag 'mtd/fixes-for-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: hyperbus: Add hardware dependency to AM654 driver
mtd: hyperbus: Kconfig: Fix HBMC_AM654 dependencies
mtd: rawnand: micron: handle on-die "ECC-off" devices correctly
Nayna Jain [Thu, 11 Jul 2019 16:13:35 +0000 (12:13 -0400)]
tpm: tpm_ibm_vtpm: Fix unallocated banks
The nr_allocated_banks and allocated banks are initialized as part of
tpm_chip_register. Currently, this is done as part of auto startup
function. However, some drivers, like the ibm vtpm driver, do not run
auto startup during initialization. This results in uninitialized memory
issue and causes a kernel panic during boot.
This patch moves the pcr allocation outside the auto startup function
into tpm_chip_register. This ensures that allocated banks are initialized
in any case.
Fixes:
879b589210a9 ("tpm: retrieve digest size of unknown algorithms with PCR read")
Reported-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Michal Suchánek <msuchanek@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Milan Broz [Thu, 4 Jul 2019 07:26:15 +0000 (09:26 +0200)]
tpm: Fix null pointer dereference on chip register error path
If clk_enable is not defined and chip initialization
is canceled code hits null dereference.
Easily reproducible with vTPM init fail:
swtpm chardev --tpmstate dir=nonexistent_dir --tpm2 --vtpm-proxy
BUG: kernel NULL pointer dereference, address:
00000000
...
Call Trace:
tpm_chip_start+0x9d/0xa0 [tpm]
tpm_chip_register+0x10/0x1a0 [tpm]
vtpm_proxy_work+0x11/0x30 [tpm_vtpm_proxy]
process_one_work+0x214/0x5a0
worker_thread+0x134/0x3e0
? process_one_work+0x5a0/0x5a0
kthread+0xd4/0x100
? process_one_work+0x5a0/0x5a0
? kthread_park+0x90/0x90
ret_from_fork+0x19/0x24
Fixes:
719b7d81f204 ("tpm: introduce tpm_chip_start() and tpm_chip_stop()")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Linus Torvalds [Sun, 4 Aug 2019 17:30:47 +0000 (10:30 -0700)]
Merge tag 'powerpc-5.3-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.3:
- Wire up the new clone3 syscall.
- A fix for the PAPR SCM nvdimm driver, to fix a crash when firmware
gives us a device that's attached to a non-online NUMA node.
- A fix for a boot failure on 32-bit with KASAN enabled.
- Three fixes for implicit fall through warnings, some of which are
errors for us due to -Werror.
Thanks to: Aneesh Kumar K.V, Christophe Leroy, Kees Cook, Santosh
Sivaraj, Stephen Rothwell"
* tag 'powerpc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kasan: fix early boot failure on PPC32
drivers/macintosh/smu.c: Mark expected switch fall-through
powerpc/spe: Mark expected switch fall-throughs
powerpc/nvdimm: Pick nearby online node if the device node is not online
powerpc/kvm: Fall through switch case explicitly
powerpc: Wire up clone3 syscall
Geert Uytterhoeven [Mon, 29 Jul 2019 17:56:58 +0000 (19:56 +0200)]
MAINTAINERS: Add Geert as Renesas SoC Co-Maintainer
At the end of the v5.3 upstream kernel development cycle, Simon will be
stepping down from his role as Renesas SoC maintainer. Starting with
the v5.4 development cycle, Geert is taking over this role.
Add Geert as a co-maintainer, and add his git repository and branch.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 4 Aug 2019 17:16:30 +0000 (10:16 -0700)]
Merge tag 'kbuild-fixes-v5.3-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- detect missing missing "WITH Linux-syscall-note" for uapi headers
- fix needless rebuild when using Clang
- fix false-positive cc-option in Kconfig when using Clang
- avoid including corrupted .*.cmd files in the modpost stage
- fix warning of 'make vmlinux'
- fix {m,n,x,g}config to not generate the broken .config on the second
save operation.
- some trivial Makefile fixes
* tag 'kbuild-fixes-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: Clear "written" flag to avoid data loss
kbuild: Check for unknown options with cc-option usage in Kconfig and clang
lib/raid6: fix unnecessary rebuild of vpermxor*.c
kbuild: modpost: do not parse unnecessary rules for vmlinux modpost
kbuild: modpost: remove unnecessary dependency for __modpost
kbuild: modpost: handle KBUILD_EXTRA_SYMBOLS only for external modules
kbuild: modpost: include .*.cmd files only when targets exist
kbuild: initialize CLANG_FLAGS correctly in the top Makefile
kbuild: detect missing "WITH Linux-syscall-note" for uapi headers
Linus Torvalds [Sun, 4 Aug 2019 17:02:13 +0000 (10:02 -0700)]
Merge tag 'safesetid-maintainers-correction-5.3-rc2' of git://github.com/micah-morton/linux
Pull SafeSetID maintainer update from Micah Morton:
"Add entry in MAINTAINERS file for SafeSetID LSM"
* tag 'safesetid-maintainers-correction-5.3-rc2' of git://github.com/micah-morton/linux:
Add entry in MAINTAINERS file for SafeSetID LSM
M. Vefa Bicakci [Sat, 3 Aug 2019 10:02:12 +0000 (06:02 -0400)]
kconfig: Clear "written" flag to avoid data loss
Prior to this commit, starting nconfig, xconfig or gconfig, and saving
the .config file more than once caused data loss, where a .config file
that contained only comments would be written to disk starting from the
second save operation.
This bug manifests itself because the SYMBOL_WRITTEN flag is never
cleared after the first call to conf_write, and subsequent calls to
conf_write then skip all of the configuration symbols due to the
SYMBOL_WRITTEN flag being set.
This commit resolves this issue by clearing the SYMBOL_WRITTEN flag
from all symbols before conf_write returns.
Fixes:
8e2442a5f86e ("kconfig: fix missing choice values in auto.conf")
Cc: linux-stable <stable@vger.kernel.org> # 4.19+
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Linus Torvalds [Sun, 4 Aug 2019 01:50:52 +0000 (18:50 -0700)]
Merge tag 'xtensa-
20190803' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa fix from Max Filippov:
"Fix build for xtensa cores with coprocessors that was broken by
entry/return abstraction patch"
* tag 'xtensa-
20190803' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: fix build for cores with coprocessors
Linus Torvalds [Sat, 3 Aug 2019 19:56:34 +0000 (12:56 -0700)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A set of driver fixes for the I2C subsystem"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: s3c2410: Mark expected switch fall-through
i2c: at91: fix clk_offset for sama5d2
i2c: at91: disable TXRDY interrupt after sending data
i2c: iproc: Fix i2c master read more than 63 bytes
eeprom: at24: make spd world-readable again
Linus Torvalds [Sat, 3 Aug 2019 17:58:46 +0000 (10:58 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:
"A set of updates for perf tools and documentation:
perf header:
- Prevent a division by zero
- Deal with an uninitialized warning proper
libbpf:
- Fix the missiong __WORDSIZE definition for musl & al
UAPI headers:
- Synchronize kernel headers
Documentation:
- Fix the memory units for perf.data size"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
libbpf: fix missing __WORDSIZE definition
perf tools: Fix perf.data documentation units for memory size
perf header: Fix use of unitialized value warning
perf header: Fix divide by zero error if f_header.attr_size==0
tools headers UAPI: Sync if_link.h with the kernel
tools headers UAPI: Sync sched.h with the kernel
tools headers UAPI: Sync usbdevice_fs.h with the kernels to get new ioctl
tools perf beauty: Fix usbdevfs_ioctl table generator to handle _IOC()
tools headers UAPI: Update tools's copy of drm.h headers
tools headers UAPI: Update tools's copy of mman.h headers
tools headers UAPI: Update tools's copy of kvm.h headers
tools include UAPI: Sync x86's syscalls_64.tbl and generic unistd.h to pick up clone3 and pidfd_open
Linus Torvalds [Sat, 3 Aug 2019 17:51:29 +0000 (10:51 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull vdso timer fixes from Thomas Gleixner:
"A series of commits to deal with the regression caused by the generic
VDSO implementation.
The usage of clock_gettime64() for 32bit compat fallback syscalls
caused seccomp filters to kill innocent processes because they only
allow clock_gettime().
Handle the compat syscalls with clock_gettime() as before, which is
not a functional problem for the VDSO as the legacy compat application
interface is not y2038 safe anyway. It's just extra fallback code
which needs to be implemented on every architecture.
It's opt in for now so that it does not break the compile of already
converted architectures in linux-next. Once these are fixed, the
#ifdeffery goes away.
So much for trying to be smart and reuse code..."
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64: compat: vdso: Use legacy syscalls as fallback
x86/vdso/32: Use 32bit syscall fallback
lib/vdso/32: Provide legacy syscall fallbacks
lib/vdso: Move fallback invocation to the callers
lib/vdso/32: Remove inconsistent NULL pointer checks
Linus Torvalds [Sat, 3 Aug 2019 17:49:45 +0000 (10:49 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A small bunch of fixes from the irqchip department:
- Fix a couple of UAF on error paths (RZA1, GICv3 ITS)
- Fix iMX GPCv2 trigger setting
- Add missing of_node_put() on error path in MBIGEN
- Add another bunch of /* fall-through */ to silence warnings"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/renesas-rza1: Fix an use-after-free in rza1_irqc_probe()
irqchip/irq-imx-gpcv2: Forward irq type to parent
irqchip/irq-mbigen: Add of_node_put() before return
irqchip/gic-v3-its: Free unused vpt_page when alloc vpe table fail
irqchip/gic-v3: Mark expected switch fall-through
Linus Torvalds [Sat, 3 Aug 2019 17:43:44 +0000 (10:43 -0700)]
Merge tag 'xfs-5.3-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- Avoid leaking kernel stack contents to userspace
- Fix a potential null pointer dereference in the dabtree scrub code
* tag 'xfs-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Fix possible null-pointer dereferences in xchk_da_btree_block_check_sibling()
xfs: fix stack contents leakage in the v1 inumber ioctls
Linus Torvalds [Sat, 3 Aug 2019 16:20:49 +0000 (09:20 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"17 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
drivers/acpi/scan.c: document why we don't need the device_hotplug_lock
memremap: move from kernel/ to mm/
lib/test_meminit.c: use GFP_ATOMIC in RCU critical section
asm-generic: fix -Wtype-limits compiler warnings
cgroup: kselftest: relax fs_spec checks
mm/memory_hotplug.c: remove unneeded return for void function
mm/migrate.c: initialize pud_entry in migrate_vma()
coredump: split pipe command whitespace before expanding template
page flags: prioritize kasan bits over last-cpuid
ubsan: build ubsan.c more conservatively
kasan: remove clang version check for KASAN_STACK
mm: compaction: avoid 100% CPU usage during compaction when a task is killed
mm: migrate: fix reference check race between __find_get_block() and migration
mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
ocfs2: remove set but not used variable 'last_hash'
Revert "kmemleak: allow to coexist with fault injection"
kernel/signal.c: fix a kernel-doc markup
Linus Torvalds [Sat, 3 Aug 2019 15:59:11 +0000 (08:59 -0700)]
Merge tag 'riscv/for-v5.3-rc3' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"Three minor RISC-V-related changes for v5.3-rc3:
- Add build ID to VDSO builds to avoid a double-free in perf when
libelf isn't used
- Align the RV64 defconfig to the output of "make savedefconfig" so
subsequent defconfig patches don't get out of hand
- Drop a superfluous DT property from the FU540 SoC DT data (since it
must be already set in board data that includes it)"
* tag 'riscv/for-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: defconfig: align RV64 defconfig to the output of "make savedefconfig"
riscv: dts: fu540-c000: drop "timebase-frequency"
riscv: Fix perf record without libelf support
David Hildenbrand [Sat, 3 Aug 2019 04:49:29 +0000 (21:49 -0700)]
drivers/acpi/scan.c: document why we don't need the device_hotplug_lock
Let's document why the lock is not needed in acpi_scan_init(), right now
this is not really obvious.
[akpm@linux-foundation.org: fix tpyo]
Link: http://lkml.kernel.org/r/20190731135306.31524-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Sat, 3 Aug 2019 04:49:26 +0000 (21:49 -0700)]
memremap: move from kernel/ to mm/
memremap.c implements MM functionality for ZONE_DEVICE, so it really
should be in the mm/ directory, not the kernel/ one.
Link: http://lkml.kernel.org/r/20190722094143.18387-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Potapenko [Sat, 3 Aug 2019 04:49:22 +0000 (21:49 -0700)]
lib/test_meminit.c: use GFP_ATOMIC in RCU critical section
kmalloc() shouldn't sleep while in RCU critical section, therefore use
GFP_ATOMIC instead of GFP_KERNEL.
The bug was spotted by the 0day kernel testing robot.
Link: http://lkml.kernel.org/r/20190725121703.210874-1-glider@google.com
Fixes:
7e659650cbda ("lib: introduce test_meminit module")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Sat, 3 Aug 2019 04:49:19 +0000 (21:49 -0700)]
asm-generic: fix -Wtype-limits compiler warnings
Commit
d66acc39c7ce ("bitops: Optimise get_order()") introduced a
compilation warning because "rx_frag_size" is an "ushort" while
PAGE_SHIFT here is 16.
The commit changed the get_order() to be a multi-line macro where
compilers insist to check all statements in the macro even when
__builtin_constant_p(rx_frag_size) will return false as "rx_frag_size"
is a module parameter.
In file included from ./arch/powerpc/include/asm/page_64.h:107,
from ./arch/powerpc/include/asm/page.h:242,
from ./arch/powerpc/include/asm/mmu.h:132,
from ./arch/powerpc/include/asm/lppaca.h:47,
from ./arch/powerpc/include/asm/paca.h:17,
from ./arch/powerpc/include/asm/current.h:13,
from ./include/linux/thread_info.h:21,
from ./arch/powerpc/include/asm/processor.h:39,
from ./include/linux/prefetch.h:15,
from drivers/net/ethernet/emulex/benet/be_main.c:14:
drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create':
./include/asm-generic/getorder.h:54:9: warning: comparison is always
true due to limited range of data type [-Wtype-limits]
(((n) < (1UL << PAGE_SHIFT)) ? 0 : \
^
drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion
of macro 'get_order'
adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE;
^~~~~~~~~
Fix it by moving all of this multi-line macro into a proper function,
and killing __get_order() off.
[akpm@linux-foundation.org: remove __get_order() altogether]
[cai@lca.pw: v2]
Link: http://lkml.kernel.org/r/1564000166-31428-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/1563914986-26502-1-git-send-email-cai@lca.pw
Fixes:
d66acc39c7ce ("bitops: Optimise get_order()")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Bill Wendling <morbo@google.com>
Cc: James Y Knight <jyknight@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Down [Sat, 3 Aug 2019 04:49:15 +0000 (21:49 -0700)]
cgroup: kselftest: relax fs_spec checks
On my laptop most memcg kselftests were being skipped because it claimed
cgroup v2 hierarchy wasn't mounted, but this isn't correct. Instead, it
seems current systemd HEAD mounts it with the name "cgroup2" instead of
"cgroup":
% grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
I can't think of a reason to need to check fs_spec explicitly
since it's arbitrary, so we can just rely on fs_vfstype.
After these changes, `make TARGETS=cgroup kselftest` actually runs the
cgroup v2 tests in more cases.
Link: http://lkml.kernel.org/r/20190723210737.GA487@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weitao Hou [Sat, 3 Aug 2019 04:49:12 +0000 (21:49 -0700)]
mm/memory_hotplug.c: remove unneeded return for void function
return is unneeded in void function
Link: http://lkml.kernel.org/r/20190723130814.21826-1-houweitaoo@gmail.com
Signed-off-by: Weitao Hou <houweitaoo@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralph Campbell [Sat, 3 Aug 2019 04:49:08 +0000 (21:49 -0700)]
mm/migrate.c: initialize pud_entry in migrate_vma()
When CONFIG_MIGRATE_VMA_HELPER is enabled, migrate_vma() calls
migrate_vma_collect() which initializes a struct mm_walk but didn't
initialize mm_walk.pud_entry. (Found by code inspection) Use a C
structure initialization to make sure it is set to NULL.
Link: http://lkml.kernel.org/r/20190719233225.12243-1-rcampbell@nvidia.com
Fixes:
8763cb45ab967 ("mm/migrate: new memory migration helper for use with device memory")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Wise [Sat, 3 Aug 2019 04:49:05 +0000 (21:49 -0700)]
coredump: split pipe command whitespace before expanding template
Save the offsets of the start of each argument to avoid having to update
pointers to each argument after every corename krealloc and to avoid
having to duplicate the memory for the dump command.
Executable names containing spaces were previously being expanded from
%e or %E and then split in the middle of the filename. This is
incorrect behaviour since an argument list can represent arguments with
spaces.
The splitting could lead to extra arguments being passed to the core
dump handler that it might have interpreted as options or ignored
completely.
Core dump handlers that are not aware of this Linux kernel issue will be
using %e or %E without considering that it may be split and so they will
be vulnerable to processes with spaces in their names breaking their
argument list. If their internals are otherwise well written, such as
if they are written in shell but quote arguments, they will work better
after this change than before. If they are not well written, then there
is a slight chance of breakage depending on the details of the code but
they will already be fairly broken by the split filenames.
Core dump handlers that are aware of this Linux kernel issue will be
placing %e or %E as the last item in their core_pattern and then
aggregating all of the remaining arguments into one, separated by
spaces. Alternatively they will be obtaining the filename via other
methods. Both of these will be compatible with the new arrangement.
A side effect from this change is that unknown template types (for
example %z) result in an empty argument to the dump handler instead of
the argument being dropped. This is a desired change as:
It is easier for dump handlers to process empty arguments than dropped
ones, especially if they are written in shell or don't pass each
template item with a preceding command-line option in order to
differentiate between individual template types. Most core_patterns in
the wild do not use options so they can confuse different template types
(especially numeric ones) if an earlier one gets dropped in old kernels.
If the kernel introduces a new template type and a core_pattern uses it,
the core dump handler might not expect that the argument can be dropped
in old kernels.
For example, this can result in security issues when %d is dropped in
old kernels. This happened with the corekeeper package in Debian and
resulted in the interface between corekeeper and Linux having to be
rewritten to use command-line options to differentiate between template
types.
The core_pattern for most core dump handlers is written by the handler
author who would generally not insert unknown template types so this
change should be compatible with all the core dump handlers that exist.
Link: http://lkml.kernel.org/r/20190528051142.24939-1-pabs3@bonedaddy.net
Fixes:
74aadce98605 ("core_pattern: allow passing of arguments to user mode helper when core_pattern is a pipe")
Signed-off-by: Paul Wise <pabs3@bonedaddy.net>
Reported-by: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398]
Reported-by: Paul Wise <pabs3@bonedaddy.net> [https://lore.kernel.org/linux-fsdevel/c8b7ecb8508895bf4adb62a748e2ea2c71854597.camel@bonedaddy.net/]
Suggested-by: Jakub Wilk <jwilk@jwilk.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Sat, 3 Aug 2019 04:49:02 +0000 (21:49 -0700)]
page flags: prioritize kasan bits over last-cpuid
ARM64 randdconfig builds regularly run into a build error, especially
when NUMA_BALANCING and SPARSEMEM are enabled but not SPARSEMEM_VMEMMAP:
#error "KASAN: not enough bits in page flags for tag"
The last-cpuid bits are already contitional on the available space, so
the result of the calculation is a bit random on whether they were
already left out or not.
Adding the kasan tag bits before last-cpuid makes it much more likely to
end up with a successful build here, and should be reliable for
randconfig at least, as long as that does not randomize NR_CPUS or
NODES_SHIFT but uses the defaults.
In order for the modified check to not trigger in the x86 vdso32 code
where all constants are wrong (building with -m32), enclose all the
definitions with an #ifdef.
[arnd@arndb.de: build fix]
Link: http://lkml.kernel.org/r/CAK8P3a3Mno1SWTcuAOT0Wa9VS15pdU6EfnkxLbDpyS55yO04+g@mail.gmail.com
Link: http://lkml.kernel.org/r/20190722115520.3743282-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190618095347.3850490-1-arnd@arndb.de/
Fixes:
2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Sat, 3 Aug 2019 04:48:58 +0000 (21:48 -0700)]
ubsan: build ubsan.c more conservatively
objtool points out several conditions that it does not like, depending
on the combination with other configuration options and compiler
variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file
as we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is
mixed with the gcc specific -fno-conserve-stack option. According to
Andrey Ryabinin, that option is not even needed, dropping it here fixes
the stackprotector issue.
Link: http://lkml.kernel.org/r/20190722125139.1335385-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Fixes:
d08965a27e84 ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Sat, 3 Aug 2019 04:48:54 +0000 (21:48 -0700)]
kasan: remove clang version check for KASAN_STACK
asan-stack mode still uses dangerously large kernel stacks of tens of
kilobytes in some drivers, and it does not seem that anyone is working
on the clang bug.
Turn it off for all clang versions to prevent users from accidentally
enabling it once they update to clang-9, and to help automated build
testing with clang-9.
Link: https://bugs.llvm.org/show_bug.cgi?id=38809
Link: http://lkml.kernel.org/r/20190719200347.2596375-1-arnd@arndb.de
Fixes:
6baec880d7a5 ("kasan: turn off asan-stack for clang-8 and earlier")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Sat, 3 Aug 2019 04:48:51 +0000 (21:48 -0700)]
mm: compaction: avoid 100% CPU usage during compaction when a task is killed
"howaboutsynergy" reported via kernel buzilla number 204165 that
compact_zone_order was consuming 100% CPU during a stress test for
prolonged periods of time. Specifically the following command, which
should exit in 10 seconds, was taking an excessive time to finish while
the CPU was pegged at 100%.
stress -m 220 --vm-bytes
1000000000 --timeout 10
Tracing indicated a pattern as follows
stress-3923 [007] 519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
Note that compaction is entered in rapid succession while scanning and
isolating nothing. The problem is that when a task that is compacting
receives a fatal signal, it retries indefinitely instead of exiting
while making no progress as a fatal signal is pending.
It's not easy to trigger this condition although enabling zswap helps on
the basis that the timing is altered. A very small window has to be hit
for the problem to occur (signal delivered while compacting and
isolating a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).
This was reproduced locally -- 16G single socket system, 8G swap, 30%
zswap configured, vm-bytes
22000000000 using Colin Kings stress-ng
implementation from github running in a loop until the problem hits).
Tracing recorded the problem occurring almost 200K times in a short
window. With this patch, the problem hit 4 times but the task existed
normally instead of consuming CPU.
This problem has existed for some time but it was made worse by commit
cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as
contention"). Before that commit, if the same condition was hit then
locks would be quickly contended and compaction would exit that way.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
Link: http://lkml.kernel.org/r/20190718085708.GE24383@techsingularity.net
Fixes:
cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as contention")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [5.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Sat, 3 Aug 2019 04:48:47 +0000 (21:48 -0700)]
mm: migrate: fix reference check race between __find_get_block() and migration
buffer_migrate_page_norefs() can race with bh users in the following
way:
CPU1 CPU2
buffer_migrate_page_norefs()
buffer_migrate_lock_buffers()
checks bh refs
spin_unlock(&mapping->private_lock)
__find_get_block()
spin_lock(&mapping->private_lock)
grab bh ref
spin_unlock(&mapping->private_lock)
move page do bh work
This can result in various issues like lost updates to buffers (i.e.
metadata corruption) or use after free issues for the old page.
This patch closes the race by holding mapping->private_lock while the
mapping is being moved to a new page. Ordinarily, a reference can be
taken outside of the private_lock using the per-cpu BH LRU but the
references are checked and the LRU invalidated if necessary. The
private_lock is held once the references are known so the buffer lookup
slow path will spin on the private_lock. Between the page lock and
private_lock, it should be impossible for other references to be
acquired and updates to happen during the migration.
A user had reported data corruption issues on a distribution kernel with
a similar page migration implementation as mainline. The data
corruption could not be reproduced with this patch applied. A small
number of migration-intensive tests were run and no performance problems
were noted.
[mgorman@techsingularity.net: Changelog, removed tracing]
Link: http://lkml.kernel.org/r/20190718090238.GF24383@techsingularity.net
Fixes:
89cb0888ca14 "mm: migrate: provide buffer_migrate_page_norefs()"
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Sat, 3 Aug 2019 04:48:44 +0000 (21:48 -0700)]
mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
Shakeel Butt reported premature oom on kernel with
"cgroup_disable=memory" since mem_cgroup_is_root() returns false even
though memcg is actually NULL. The drop_caches is also broken.
It is because commit
aeed1d325d42 ("mm/vmscan.c: generalize
shrink_slab() calls in shrink_node()") removed the !memcg check before
!mem_cgroup_is_root(). And, surprisingly root memcg is allocated even
though memory cgroup is disabled by kernel boot parameter.
Add mem_cgroup_disabled() check to make reclaimer work as expected.
Link: http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.alibaba.com
Fixes:
aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Hadrava <had@kam.mff.cuni.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YueHaibing [Sat, 3 Aug 2019 04:48:40 +0000 (21:48 -0700)]
ocfs2: remove set but not used variable 'last_hash'
Fixes gcc '-Wunused-but-set-variable' warning:
fs/ocfs2/xattr.c: In function ocfs2_xattr_bucket_find:
fs/ocfs2/xattr.c:3828:6: warning: variable last_hash set but not used [-Wunused-but-set-variable]
It's never used and can be removed.
Link: http://lkml.kernel.org/r/20190716132110.34836-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Sat, 3 Aug 2019 04:48:37 +0000 (21:48 -0700)]
Revert "kmemleak: allow to coexist with fault injection"
When running ltp's oom test with kmemleak enabled, the below warning was
triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
passed in:
WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
Modules linked in: loop dax_pmem dax_pmem_core ip_tables x_tables xfs virtio_net net_failover virtio_blk failover ata_generic virtio_pci virtio_ring virtio libata
CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-
20190710+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
...
kmemleak_alloc+0x4e/0xb0
kmem_cache_alloc+0x2a7/0x3e0
mempool_alloc_slab+0x2d/0x40
mempool_alloc+0x118/0x2b0
bio_alloc_bioset+0x19d/0x350
get_swap_bio+0x80/0x230
__swap_writepage+0x5ff/0xb20
The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, however kmemleak
has __GFP_NOFAIL set all the time due to
d9570ee3bd1d4f2 ("kmemleak:
allow to coexist with fault injection"). But, it doesn't make any sense
to have __GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM specified at the same
time.
According to the discussion on the mailing list, the commit should be
reverted for short term solution. Catalin Marinas would follow up with
a better solution for longer term.
The failure rate of kmemleak metadata allocation may increase in some
circumstances, but this should be expected side effect.
Link: http://lkml.kernel.org/r/1563299431-111710-1-git-send-email-yang.shi@linux.alibaba.com
Fixes:
d9570ee3bd1d4f2 ("kmemleak: allow to coexist with fault injection")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mauro Carvalho Chehab [Sat, 3 Aug 2019 04:48:33 +0000 (21:48 -0700)]
kernel/signal.c: fix a kernel-doc markup
The kernel-doc parser doesn't handle expressions with %foo*. Instead,
when an asterisk should be part of a constant, it uses an alternative
notation: `foo*`.
Link: http://lkml.kernel.org/r/7f18c2e0b5e39e6b7eb55ddeb043b8b260b49f2d.1563361575.git.mchehab+samsung@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 3 Aug 2019 01:53:51 +0000 (18:53 -0700)]
Merge tag 'drm-fixes-2019-08-02-1' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Daniel Vetter:
"Dave sends his pull, everyone realizes they've been asleep at the
wheel and hits send on their own pulls :-/
Normally I'd just ignore these all because w/e for me and Dave. But
this time around the latecomers also included drm-intel-fixes, which
failed to send out a -fixes pull thus far for this release (screwed up
vacation coverage, despite that 2/3 maintainers were around ... they
all look appropriately guilty), and that really is overdue to get
landed.
And since I had to do a pull request anyway I pulled the other two
late ones too.
intel fixes (didn't have any ever since the main merge window pull):
- gvt fixes (2 cc: stable)
- fix gpu reset vs mm-shrinker vs wakeup fun (needed a few patches)
- two gem locking fixes (one cc: stable)
- pile of misc fixes all over with minor impact, 6 cc: stable, others
from this window
exynos:
- misc minor fixes
misc:
- some build/Kconfig fixes
- regression fix for vm scalability perf test which seems to mostly
exercise dmesg/console logging ...
- the vgem cache flush fix for arm64 broke the world on x86, so
that's reverted again
* tag 'drm-fixes-2019-08-02-1' of git://anongit.freedesktop.org/drm/drm: (42 commits)
Revert "drm/vgem: fix cache synchronization on arm/arm64"
drm/exynos: fix missing decrement of retry counter
drm/exynos: add CONFIG_MMU dependency
drm/exynos: remove redundant assignment to pointer 'node'
drm/exynos: using dev_get_drvdata directly
drm/bochs: Use shadow buffer for bochs framebuffer console
drm/fb-helper: Instanciate shadow FB if configured in device's mode_config
drm/fb-helper: Map DRM client buffer only when required
drm/client: Support unmapping of DRM client buffers
drm/i915: Only recover active engines
drm/i915: Add a wakeref getter for iff the wakeref is already active
drm/i915: Lift intel_engines_resume() to callers
drm/vgem: fix cache synchronization on arm/arm64
drm/i810: Use CONFIG_PREEMPTION
drm/bridge: tc358764: Fix build error
drm/bridge: lvds-encoder: Fix build error while CONFIG_DRM_KMS_HELPER=m
drm/i915/gvt: Adding ppgtt to GVT GEM context after shadow pdps settled.
drm/i915/gvt: grab runtime pm first for forcewake use
drm/i915/gvt: fix incorrect cache entry for guest page mapping
drm/i915/gvt: Checking workload's gma earlier
...
Linus Torvalds [Sat, 3 Aug 2019 01:40:49 +0000 (18:40 -0700)]
Merge tag 'selinux-pr-
20190801' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"One more small fix for a potential memory leak in an error path"
* tag 'selinux-pr-
20190801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix memory leak in policydb_init()
Jean Delvare [Wed, 31 Jul 2019 08:07:06 +0000 (10:07 +0200)]
mtd: hyperbus: Add hardware dependency to AM654 driver
The hbmc-am654 driver is for the TI AM654, which is an ARM64 SoC, so
don't propose this driver on other architectures unless
build-testing.
Fixes:
b07079f1642c ("mtd: hyperbus: Add driver for TI's HyperBus memory controller")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Vignesh Raghavendra [Fri, 19 Jul 2019 08:29:12 +0000 (13:59 +0530)]
mtd: hyperbus: Kconfig: Fix HBMC_AM654 dependencies
On x86_64, when CONFIG_OF is not disabled:
WARNING: unmet direct dependencies detected for MUX_MMIO
Depends on [n]: MULTIPLEXER [=y] && (OF [=n] || COMPILE_TEST [=n])
Selected by [y]:
- HBMC_AM654 [=y] && MTD [=y] && MTD_HYPERBUS [=y]
due to
config HBMC_AM654
tristate "HyperBus controller driver for AM65x SoC"
select MULTIPLEXER
select MUX_MMIO
Fix this by making HBMC_AM654 imply MUX_MMIO instead of select so
that dependencies are taken care of. MUX_MMIO is optional for
functioning of driver.
Fixes:
b07079f1642c ("mtd: hyperbus: Add driver for TI's HyperBus memory controller")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Marco Felsch [Tue, 30 Jul 2019 13:44:07 +0000 (15:44 +0200)]
mtd: rawnand: micron: handle on-die "ECC-off" devices correctly
Some devices are not supposed to support on-die ECC but experience
shows that internal ECC machinery can actually be enabled through the
"SET FEATURE (EFh)" command, even if a read of the "READ ID Parameter
Tables" returns that it is not.
Currently, the driver checks the "READ ID Parameter" field directly
after having enabled the feature. If the check fails it returns
immediately but leaves the ECC on. When using buggy chips like
MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
go through the on-die ECC, confusing the host controller which is
supposed to be the one handling correction.
To address this in a common way we need to turn off the on-die ECC
directly after reading the "READ ID Parameter" and before checking the
"ECC status".
Cc: stable@vger.kernel.org
Fixes:
dbc44edbf833 ("mtd: rawnand: micron: Fix on-die ECC detection logic")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Linus Torvalds [Fri, 2 Aug 2019 22:26:48 +0000 (15:26 -0700)]
Merge tag 'for-linus-5.3a-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- a small cleanup
- a fix for a build error on ARM with some configs
- a fix of a patch for the Xen gntdev driver
- three patches for fixing a potential problem in the swiotlb-xen
driver which Konrad was fine with me carrying them through the Xen
tree
* tag 'for-linus-5.3a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/swiotlb: remember having called xen_create_contiguous_region()
xen/swiotlb: simplify range_straddles_page_boundary()
xen/swiotlb: fix condition for calling xen_destroy_contiguous_region()
xen: avoid link error on ARM
xen/gntdev.c: Replace vm_map_pages() with vm_map_pages_zero()
xen/pciback: remove set but not used variable 'old_state'
Linus Torvalds [Fri, 2 Aug 2019 22:23:27 +0000 (15:23 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Update the compat layer to allow single-byte watchpoints on all
addresses (similar to the native support)
- arm_pmu: fix the restoration of the counters on the
CPU_PM_ENTER_FAILED path
- Fix build regression with vDSO and Makefile not stripping
CROSS_COMPILE_COMPAT
- Fix the CTR_EL0 (cache type register) sanitisation on heterogeneous
machines (e.g. big.LITTLE)
- Fix the interrupt controller priority mask value when pseudo-NMIs are
enabled
- arm64 kprobes fixes: recovering of the PSTATE.D flag in the
single-step exception handler, NOKPROBE annotations for
unwind_frame() and walk_stackframe(), remove unneeded
rcu_read_lock/unlock from debug handlers
- Several gcc fall-through warnings
- Unused variable warnings
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Make debug exception handlers visible from RCU
arm64: kprobes: Recover pstate.D in single-step exception handler
arm64/mm: fix variable 'tag' set but not used
arm64/mm: fix variable 'pud' set but not used
arm64: Remove unneeded rcu_read_lock from debug handlers
arm64: unwind: Prohibit probing on return_address()
arm64: Lower priority mask for GIC_PRIO_IRQON
arm64/efi: fix variable 'si' set but not used
arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG}
arm64: vdso: Fix Makefile regression
arm64: module: Mark expected switch fall-through
arm64: smp: Mark expected switch fall-through
arm64: hw_breakpoint: Fix warnings about implicit fallthrough
drivers/perf: arm_pmu: Fix failure path in PM notifier
arm64: compat: Allow single-byte watchpoints on all addresses
Linus Torvalds [Fri, 2 Aug 2019 22:18:51 +0000 (15:18 -0700)]
Merge branch 'parisc-5.3-4' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"A few small fixes for the parisc architecture:
- Fix fall-through warnings in parisc math emu code
- Fix vmlinuz linking failure with debug-enabled kernels
- Fix a race condition in kernel live-patching code
- Add missing archclean Makefile target & defconfig adjustments"
* 'parisc-5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Add archclean Makefile target
parisc: Strip debug info from kernel before creating compressed vmlinuz
parisc: Fix build of compressed kernel even with debug enabled
parisc: fix race condition in patching code
parisc: rename default_defconfig to defconfig
parisc: Fix fall-through warnings in fpudispatch.c
parisc: Mark expected switch fall-throughs in fault.c
Linus Torvalds [Fri, 2 Aug 2019 22:13:27 +0000 (15:13 -0700)]
Merge tag 's390-5.3-4' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Default configs updates
- Minor qdio cleanup
- Sparse warnings fixes
- Implicit-fallthrough warnings fixes
* tag 's390-5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: adjust switch fall through comments for -Wimplicit-fallthrough
vfio-ccw: make vfio_ccw_async_region_ops static
s390/3215: add switch fall through comment for -Wimplicit-fallthrough
s390/tape: add fallthrough annotations
s390/mm: add fallthrough annotations
s390/mm: make gmap_test_and_clear_dirty_pmd static
s390/kexec: add missing include to machine_kexec_reloc.c
s390/perf: make cf_diag_csd static
s390/lib: add missing include
s390/boot: add missing declarations and includes
s390: update configs
s390: clean up qdio.h
Linus Torvalds [Fri, 2 Aug 2019 21:46:33 +0000 (14:46 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Seven fixes to four drivers with no core changes.
The mpt3sas one is theoretical until we get a CPU that goes up to 64
bits physical, the qla2xxx one fixes an oops in a driver
initialization error leg and the others are mostly cosmetic"
[ The fcoe patches may be worth highlighting - they may be "just"
cleanups, but they simplify and fix the odd fc_rport_priv structure
handling rules so that the new gcc-9 warnings about memset crossing
structure boundaries are gone.
The old code was hard for humans to understand too, and really
confused the compiler sanity checks - Linus ]
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix possible fcport null-pointer dereferences
scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA
scsi: hpsa: remove printing internal cdb on tag collision
scsi: hpsa: correct scsi command status issue after reset
scsi: fcoe: pass in fcoe_rport structure instead of fc_rport_priv
scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure
scsi: libfc: Whitespace cleanup in libfc.h
Linus Torvalds [Fri, 2 Aug 2019 21:31:26 +0000 (14:31 -0700)]
Merge tag 'for-linus-
20190802' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Here's a small collection of fixes that should go into this series.
This contains:
- io_uring potential use-after-free fix (Jackie)
- loop regression fix (Jan)
- O_DIRECT fragmented bio regression fix (Damien)
- Mark Denis as the new floppy maintainer (Denis)
- ataflop switch fall-through annotation (Gustavo)
- libata zpodd overflow fix (Kees)
- libata ahci deferred probe fix (Miquel)
- nbd invalidation BUG_ON() fix (Munehisa)
- dasd endless loop fix (Stefan)"
* tag 'for-linus-
20190802' of git://git.kernel.dk/linux-block:
s390/dasd: fix endless loop after read unit address configuration
block: Fix __blkdev_direct_IO() for bio fragments
MAINTAINERS: floppy: take over maintainership
nbd: replace kill_bdev() with __invalidate_device() again
ata: libahci: do not complain in case of deferred probe
io_uring: fix KASAN use after free in io_sq_wq_submit_work
loop: Fix mount(2) failure due to race with LOOP_SET_FD
libata: zpodd: Fix small read overflow in zpodd_get_mech_type()
ataflop: Mark expected switch fall-through
Linus Torvalds [Fri, 2 Aug 2019 21:28:40 +0000 (14:28 -0700)]
Merge tag 'for-5.3/dm-fixes-1' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Fix NULL pointer and various whitespace issues with DM's recent DAX
code changes from commit in 5.3 merge"
* tag 'for-5.3/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: fix various whitespace issues with recent DAX code
dm table: fix dax_dev NULL dereference in device_synchronous()
Linus Torvalds [Fri, 2 Aug 2019 21:23:24 +0000 (14:23 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"Here's our second -rc pull request. Nothing particularly special in
this one. The client removal deadlock fix is kindy tricky, but we had
multiple eyes on it and no one could find a fault in it. A couple
Spectre V1 fixes too. Otherwise, all just normal -rc fodder:
- A couple Spectre V1 fixes (umad, hfi1)
- Fix a tricky deadlock in the rdma core code with refcounting
instead of locks (client removal patches)
- Build errors (hns)
- Fix a scheduling while atomic issue (mlx5)
- Use after free fix (mad)
- Fix error path return code (hns)
- Null deref fix (siw_crypto_hash)
- A few other misc. minor fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Fix error return code in hns_roce_v1_rsv_lp_qp()
RDMA/mlx5: Release locks during notifier unregister
IB/hfi1: Fix Spectre v1 vulnerability
IB/mad: Fix use-after-free in ib mad completion handling
RDMA/restrack: Track driver QP types in resource tracker
IB/mlx5: Fix MR registration flow to use UMR properly
RDMA/devices: Remove the lock around remove_client_context
RDMA/devices: Do not deadlock during client removal
IB/core: Add mitigation for Spectre V1
Do not dereference 'siw_crypto_shash' before checking
RDMA/qedr: Fix the hca_type and hca_rev returned in device attributes
RDMA/hns: Fix build error
Linus Torvalds [Fri, 2 Aug 2019 21:19:41 +0000 (14:19 -0700)]
Merge tag 'for-5.3-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- tiny race window during 2 transactions aborting at the same time can
accidentally lead to a commit
- regression fix, possible deadlock during fiemap
- fix for an old bug when incremental send can fail on a file that has
been deduplicated in a special way
* tag 'for-5.3-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix deadlock between fiemap and transaction commits
Btrfs: fix race leading to fs corruption after transaction abort
Btrfs: fix incremental send failure after deduplication
Linus Torvalds [Fri, 2 Aug 2019 16:02:58 +0000 (09:02 -0700)]
Merge tag 'gfs2-v5.3-rc2.fixes' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
"Fix gfs2 cluster coherency bug"
* tag 'gfs2-v5.3-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Inode dirtying fix
Linus Torvalds [Fri, 2 Aug 2019 15:55:28 +0000 (08:55 -0700)]
Merge tag 'pm-5.3-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix recent regression affecting ACPI device power management"
* tag 'pm-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Fix regression in acpi_device_set_power()
Linus Torvalds [Fri, 2 Aug 2019 15:53:34 +0000 (08:53 -0700)]
Merge tag 'sound-5.3-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
- A further fix for syzcaller issues with USB-audio, addressing NULL
dereference that was introduced by the recent fix
- Avoid a long delay at boot with HD-audio when i915 module was built
but not installed, found on some Debian systems
- A fix of small race window at PCM draining
* tag 'sound-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix gpf in snd_usb_pipe_sanity_check
ALSA: pcm: fix lost wakeup event scenarios in snd_pcm_drain
ALSA: hda: Fix 1-minute detection delay when i915 module is not available
Linus Torvalds [Fri, 2 Aug 2019 15:50:37 +0000 (08:50 -0700)]
Merge tag 'drm-fixes-2019-08-02' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Thanks to Daniel for handling the email the last couple of weeks, flus
and break-ins combined to derail me. Surprised nothing materialised
today to take me out again.
Just more amdgpu navi fixes, msm fixes and a single nouveau regression
fix:
amdgpu:
- navi10 temperature and pstate fixes
- vcn dynamic power management fix
- CS ioctl error handling fix
- debugfs info leak fix
- amdkfd VegaM fix
msm:
- dma sync call fix
- mdp5 dsi command mode fix
- fall-through fixes
- disabled GPU fix
nouveau:
- regression fix for displayport MST support"
* tag 'drm-fixes-2019-08-02' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau: Only release VCPI slots on mode changes
drm: msm: Fix add_gpu_components
drm/msm: Annotate intentional switch statement fall throughs
drm/msm: add support for per-CRTC max_vblank_count on mdp5
drm/msm: Use the correct dma_sync calls in msm_gem
drm/amd/powerplay: correct UVD/VCE/VCN power status retrieval
drm/amd/powerplay: correct Navi10 VCN powergate control (v2)
drm/amd/powerplay: support VCN powergate status retrieval for SW SMU
drm/amd/powerplay: support VCN powergate status retrieval on Raven
drm/amd/powerplay: add new sensor type for VCN powergate status
drm/amdgpu: fix a potential information leaking bug
drm/amdgpu: fix error handling in amdgpu_cs_process_fence_dep
drm/amd/powerplay: enable SW SMU reset functionality
drm/amd/powerplay: fix null pointer dereference around dpm state relates
drm/amdgpu/powerplay: use proper revision id for navi
drm/amd/powerplay: fix temperature granularity error in smu11
drm/amd/powerplay: add callback function of get_thermal_temperature_range
drm/amdkfd: Fix byte align on VegaM
Linus Torvalds [Fri, 2 Aug 2019 15:47:28 +0000 (08:47 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A few fixes for code that came in during the merge window or that
started getting exercised differently this time around:
- Select regmap MMIO kconfig in spreadtrum driver to avoid compile
errors
- Complete kerneldoc on devm_clk_bulk_get_optional()
- Register an essential clk earlier on mediatek mt8183 SoCs so the
clocksource driver can use it
- Fix divisor math in the at91 driver
- Plug a race in Renesas reset control logic"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: renesas: cpg-mssr: Fix reset control race condition
clk: sprd: Select REGMAP_MMIO to avoid compile errors
clk: mediatek: mt8183: Register 13MHz clock earlier for clocksource
clk: Add missing documentation of devm_clk_bulk_get_optional() argument
clk: at91: generated: Truncate divisor to GENERATED_MAX_DIV + 1
Linus Torvalds [Fri, 2 Aug 2019 15:44:33 +0000 (08:44 -0700)]
Merge tag 'arm-swiotlb-5.3' of git://git.infradead.org/users/hch/dma-mapping
Pull arm swiotlb support from Christoph Hellwig:
"This fixes a cascade of regressions that originally started with the
addition of the ia64 port, but only got fatal once we removed most
uses of block layer bounce buffering in Linux 4.18.
The reason is that while the original i386/PAE code that was the first
architecture that supported > 4GB of memory without an iommu decided
to leave bounce buffering to the subsystems, which in those days just
mean block and networking as no one else consumed arbitrary userspace
memory.
Later with ia64, x86_64 and other ports we assumed that either an
iommu or something that fakes it up ("software IOTLB" in beautiful
Intel speak) is present and that subsystems can rely on that for
dealing with addressing limitations in devices. Except that the ARM
LPAE scheme that added larger physical address to 32-bit ARM did not
follow that scheme and thus only worked by chance and only for block
and networking I/O directly to highmem.
Long story, short fix - add swiotlb support to arm when build for LPAE
platforms, which actuallys turns out to be pretty trivial with the
modern dma-direct / swiotlb code to fix the Linux 4.18-ish regression"
* tag 'arm-swiotlb-5.3' of git://git.infradead.org/users/hch/dma-mapping:
arm: use swiotlb for bounce buffering on LPAE configs
dma-mapping: check pfn validity in dma_common_{mmap,get_sgtable}
Linus Torvalds [Fri, 2 Aug 2019 15:41:11 +0000 (08:41 -0700)]
Merge tag 'dma-mapping-5.3-3' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping regression fixes from Christoph Hellwig:
"Two related regression fixes for changes from this merge window to fix
alignment issues introduced in the CMA allocation rework (Nicolin
Chen)"
* tag 'dma-mapping-5.3-3' of git://git.infradead.org/users/hch/dma-mapping:
dma-contiguous: page-align the size in dma_free_contiguous()
dma-contiguous: do not overwrite align in dma_alloc_contiguous()
Daniel Vetter [Fri, 2 Aug 2019 15:10:16 +0000 (17:10 +0200)]
Merge tag 'exynos-drm-fixes-for-v5.3-rc3' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes
- Two cleanup patches
. use dev_get_drvdata for readability instead of platform_get_drvdata
. remove redundant assignment to node.
- Two fixup patches
. fix undefined reference to 'vmf_insert_mixed' with NOMMU configuration.
. fix potential infinite spin issue by decrementing 'retry' variable in
scaler_reset function of exynos_drm_scaler.c
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1564734791-745-1-git-send-email-inki.dae@samsung.com
Chris Wilson [Thu, 1 Aug 2019 12:44:58 +0000 (13:44 +0100)]
Revert "drm/vgem: fix cache synchronization on arm/arm64"
commit
7e9e5ead55be ("drm/vgem: fix cache synchronization on arm/arm64")
broke all of the !llc i915-vgem coherency tests in CI, and left the HW
very, very unhappy (which is even more scary).
Fixes:
7e9e5ead55be ("drm/vgem: fix cache synchronization on arm/arm64")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Sean Paul <seanpaul@chromium.org>
Acked-by: Sean Paul <sean@poorly.run>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190801124458.24949-1-chris@chris-wilson.co.uk
Daniel Vetter [Fri, 2 Aug 2019 15:03:04 +0000 (17:03 +0200)]
Merge tag 'drm-misc-fixes-2019-08-02' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.3-rc3:
- Fix some build errors in drm/bridge.
- Do not build i810 on CONFIG_PREEMPTION.
- Fix cache sync on arm in vgem.
- Allow mapping fb in drm_client only when required, and use it to fix bochs fbdev.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/af0dc371-16e0-cee8-0d71-4824d44aa973@linux.intel.com
Vasily Gorbik [Sun, 28 Jul 2019 23:23:46 +0000 (01:23 +0200)]
s390/zcrypt: adjust switch fall through comments for -Wimplicit-fallthrough
Silence the following warnings when built with -Wimplicit-fallthrough=3
enabled by default since 5.3-rc2:
In file included from ./include/linux/preempt.h:11,
from ./include/linux/spinlock.h:51,
from ./include/linux/mmzone.h:8,
from ./include/linux/gfp.h:6,
from ./include/linux/slab.h:15,
from drivers/s390/crypto/ap_queue.c:13:
drivers/s390/crypto/ap_queue.c: In function 'ap_sm_recv':
./include/linux/list.h:577:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
577 | for (pos = list_first_entry(head, typeof(*pos), member); \
| ^~~
drivers/s390/crypto/ap_queue.c:147:3: note: in expansion of macro 'list_for_each_entry'
147 | list_for_each_entry(ap_msg, &aq->pendingq, list) {
| ^~~~~~~~~~~~~~~~~~~
drivers/s390/crypto/ap_queue.c:155:2: note: here
155 | case AP_RESPONSE_NO_PENDING_REPLY:
| ^~~~
drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_ep11_xcrb':
drivers/s390/crypto/zcrypt_msgtype6.c:871:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
871 | if (msg->cprbx.cprb_ver_id == 0x04)
| ^
drivers/s390/crypto/zcrypt_msgtype6.c:874:2: note: here
874 | default: /* Unknown response type, this should NEVER EVER happen */
| ^~~~~~~
drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_rng':
drivers/s390/crypto/zcrypt_msgtype6.c:901:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
901 | if (msg->cprbx.cprb_ver_id == 0x02)
| ^
drivers/s390/crypto/zcrypt_msgtype6.c:907:2: note: here
907 | default: /* Unknown response type, this should NEVER EVER happen */
| ^~~~~~~
drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_xcrb':
drivers/s390/crypto/zcrypt_msgtype6.c:838:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
838 | if (msg->cprbx.cprb_ver_id == 0x02)
| ^
drivers/s390/crypto/zcrypt_msgtype6.c:844:2: note: here
844 | default: /* Unknown response type, this should NEVER EVER happen */
| ^~~~~~~
drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_ica':
drivers/s390/crypto/zcrypt_msgtype6.c:801:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
801 | if (msg->cprbx.cprb_ver_id == 0x02)
| ^
drivers/s390/crypto/zcrypt_msgtype6.c:808:2: note: here
808 | default: /* Unknown response type, this should NEVER EVER happen */
| ^~~~~~~
Acked-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Masami Hiramatsu [Thu, 1 Aug 2019 14:36:14 +0000 (23:36 +0900)]
arm64: Make debug exception handlers visible from RCU
Make debug exceptions visible from RCU so that synchronize_rcu()
correctly track the debug exception handler.
This also introduces sanity checks for user-mode exceptions as same
as x86's ist_enter()/ist_exit().
The debug exception can interrupt in idle task. For example, it warns
if we put a kprobe on a function called from idle task as below.
The warning message showed that the rcu_read_lock() caused this
problem. But actually, this means the RCU is lost the context which
is already in NMI/IRQ.
/sys/kernel/debug/tracing # echo p default_idle_call >> kprobe_events
/sys/kernel/debug/tracing # echo 1 > events/kprobes/enable
/sys/kernel/debug/tracing # [ 135.122237]
[ 135.125035] =============================
[ 135.125310] WARNING: suspicious RCU usage
[ 135.125581] 5.2.0-08445-g9187c508bdc7 #20 Not tainted
[ 135.125904] -----------------------------
[ 135.126205] include/linux/rcupdate.h:594 rcu_read_lock() used illegally while idle!
[ 135.126839]
[ 135.126839] other info that might help us debug this:
[ 135.126839]
[ 135.127410]
[ 135.127410] RCU used illegally from idle CPU!
[ 135.127410] rcu_scheduler_active = 2, debug_locks = 1
[ 135.128114] RCU used illegally from extended quiescent state!
[ 135.128555] 1 lock held by swapper/0/0:
[ 135.128944] #0: (____ptrval____) (rcu_read_lock){....}, at: call_break_hook+0x0/0x178
[ 135.130499]
[ 135.130499] stack backtrace:
[ 135.131192] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-08445-g9187c508bdc7 #20
[ 135.131841] Hardware name: linux,dummy-virt (DT)
[ 135.132224] Call trace:
[ 135.132491] dump_backtrace+0x0/0x140
[ 135.132806] show_stack+0x24/0x30
[ 135.133133] dump_stack+0xc4/0x10c
[ 135.133726] lockdep_rcu_suspicious+0xf8/0x108
[ 135.134171] call_break_hook+0x170/0x178
[ 135.134486] brk_handler+0x28/0x68
[ 135.134792] do_debug_exception+0x90/0x150
[ 135.135051] el1_dbg+0x18/0x8c
[ 135.135260] default_idle_call+0x0/0x44
[ 135.135516] cpu_startup_entry+0x2c/0x30
[ 135.135815] rest_init+0x1b0/0x280
[ 135.136044] arch_call_rest_init+0x14/0x1c
[ 135.136305] start_kernel+0x4d4/0x500
[ 135.136597]
So make debug exception visible to RCU can fix this warning.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Masami Hiramatsu [Thu, 1 Aug 2019 14:25:49 +0000 (23:25 +0900)]
arm64: kprobes: Recover pstate.D in single-step exception handler
kprobes manipulates the interrupted PSTATE for single step, and
doesn't restore it. Thus, if we put a kprobe where the pstate.D
(debug) masked, the mask will be cleared after the kprobe hits.
Moreover, in the most complicated case, this can lead a kernel
crash with below message when a nested kprobe hits.
[ 152.118921] Unexpected kernel single-step exception at EL1
When the 1st kprobe hits, do_debug_exception() will be called.
At this point, debug exception (= pstate.D) must be masked (=1).
But if another kprobes hits before single-step of the first kprobe
(e.g. inside user pre_handler), it unmask the debug exception
(pstate.D = 0) and return.
Then, when the 1st kprobe setting up single-step, it saves current
DAIF, mask DAIF, enable single-step, and restore DAIF.
However, since "D" flag in DAIF is cleared by the 2nd kprobe, the
single-step exception happens soon after restoring DAIF.
This has been introduced by commit
7419333fa15e ("arm64: kprobe:
Always clear pstate.D in breakpoint exception handler")
To solve this issue, this stores all DAIF bits and restore it
after single stepping.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes:
7419333fa15e ("arm64: kprobe: Always clear pstate.D in breakpoint exception handler")
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Daniel Vetter [Fri, 2 Aug 2019 09:31:20 +0000 (11:31 +0200)]
Merge tag 'drm-intel-fixes-2019-08-02' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.3-rc3:
- GVT fixes
- Fix TBT aux powerwell
- Fix PSR2 training pattern duration
- Fix memory leak in runtime wakeref tracking
- Fix ICL memory bandwidth issue preventing planes from being enabled
- Fix OA mux configuration delays for accurate performance data
- Fix VLV/CHV DP audio cdclk frequency requirements
- Fix register whitelisting to fix a number of GL & Vulkan CTS tests
- Fix ICL perf register offsets
- Fix Gen11 Sampler Prefetch workaround, impacting dEQP tests
- Fix various gen2 tracepoints
- A number of GEM locking fixes addressing lockdep issues
- Fix idle engine reset, recover only active engines
- Fix incorrect MCR programming
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87d0hnncgo.fsf@intel.com
Colin Ian King [Mon, 22 Jul 2019 22:25:35 +0000 (23:25 +0100)]
drm/exynos: fix missing decrement of retry counter
Currently the retry counter is not being decremented, leading to a
potential infinite spin if the scalar_reads don't change state.
Addresses-Coverity: ("Infinite loop")
Fixes:
280e54c9f614 ("drm/exynos: scaler: Reset hardware before starting the operation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>