platform/kernel/linux-starfive.git
7 years agoMerge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-linus
Jens Axboe [Thu, 8 Jun 2017 14:33:45 +0000 (08:33 -0600)]
Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-linus

Christoph writes:

"A few NVMe fixes for 4.12-rc, PCIe reset fixes and APST fixes, a
 RDMA reconnect fix, two FC fixes and a general controller removal fix."

7 years agoFix loop device flush before configure v3
James Wang [Thu, 8 Jun 2017 06:52:51 +0000 (14:52 +0800)]
Fix loop device flush before configure v3

While installing SLES-12 (based on v4.4), I found that the installer
will stall for 60+ seconds during LVM disk scan.  The root cause was
determined to be the removal of a bound device check in loop_flush()
by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq").

Restoring this check, examining ->lo_state as set by loop_set_fd()
eliminates the bad behavior.

Test method:
modprobe loop max_loop=64
dd if=/dev/zero of=disk bs=512 count=200K
for((i=0;i<4;i++))do losetup -f disk; done
mkfs.ext4 -F /dev/loop0
for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
for f in `ls /dev/loop[0-9]*|sort`; do \
echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
done

Test output:  stock          patched
/dev/loop0    18.1217e-05    8.3842e-05
/dev/loop1     6.1114e-05    0.000147979
/dev/loop10    0.414701      0.000116564
/dev/loop11    0.7474        6.7942e-05
/dev/loop12    0.747986      8.9082e-05
/dev/loop13    0.746532      7.4799e-05
/dev/loop14    0.480041      9.3926e-05
/dev/loop15    1.26453       7.2522e-05

Note that from loop10 onward, the device is not mounted, yet the
stock kernel consumes several orders of magnitude more wall time
than it does for a mounted device.
(Thanks for Mike Galbraith <efault@gmx.de>, give a changelog review.)

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: James Wang <jnwang@suse.com>
Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq")
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-throttle: set default latency baseline for harddisk
Shaohua Li [Tue, 6 Jun 2017 19:40:43 +0000 (12:40 -0700)]
blk-throttle: set default latency baseline for harddisk

hard disk IO latency varies a lot depending on spindle move. The latency
range could be from several microseconds to several milliseconds. It's
pretty hard to get the baseline latency used by io.low.

We will use a different stragety here. The idea is only using IO with
spindle move to determine if cgroup IO is in good state. For HD, if io
latency is small (< 1ms), we ignore the IO. Such IO is likely from
sequential IO, and is helpless to help determine if a cgroup's IO is
impacted by other cgroups. With this, we only account IO with big
latency. Then we can choose a hardcoded baseline latency for HD (4ms,
which is typical IO latency with seek).  With all these settings, the
io.low latency works for both HD and SSD.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-throttle: fix NULL pointer dereference in throtl_schedule_pending_timer
Joseph Qi [Wed, 7 Jun 2017 03:36:14 +0000 (11:36 +0800)]
blk-throttle: fix NULL pointer dereference in throtl_schedule_pending_timer

I have encountered a NULL pointer dereference in
throtl_schedule_pending_timer:
  [  413.735396] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
  [  413.735535] IP: [<ffffffff812ebbbf>] throtl_schedule_pending_timer+0x3f/0x210
  [  413.735643] PGD 22c8cf067 PUD 22cb34067 PMD 0
  [  413.735713] Oops: 0000 [#1] SMP
  ......

This is caused by the following case:
  blk_throtl_bio
    throtl_schedule_next_dispatch  <= sq is top level one without parent
      throtl_schedule_pending_timer
        sq_to_tg(sq)->td->throtl_slice  <= sq_to_tg(sq) returns NULL

Fix it by using sq_to_td instead of sq_to_tg(sq)->td, which will always
return a valid td.

Fixes: 297e3d854784 ("blk-throttle: make throtl_slice tunable")
Signed-off-by: Joseph Qi <qijiang.qj@alibaba-inc.com>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agonvme: relax APST default max latency to 100ms
Kai-Heng Feng [Wed, 7 Jun 2017 07:25:43 +0000 (15:25 +0800)]
nvme: relax APST default max latency to 100ms

Christoph Hellwig suggests we should to make APST work out of the box.
Hence relax the the default max latency to make them able to enter
deepest power state on default.

Here are id-ctrl excerpts from two high latency NVMes:

vid     : 0x14a4
ssvid   : 0x1b4b
mn      : CX2-GB1024-Q11 NVMe LITEON 1024GB
ps    3 : mp:0.1000W non-operational enlat:5000 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0100W non-operational enlat:50000 exlat:100000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

vid     : 0x15b7
ssvid   : 0x1b4b
mn      : A400 NVMe SanDisk 512GB
ps    3 : mp:0.0500W non-operational enlat:51000 exlat:10000 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps    4 : mp:0.0055W non-operational enlat:1000000 exlat:100000 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
7 years agonvme: only consider exit latency when choosing useful non-op power states
Kai-Heng Feng [Wed, 7 Jun 2017 07:25:42 +0000 (15:25 +0800)]
nvme: only consider exit latency when choosing useful non-op power states

When a NVMe is in non-op states, the latency is exlat.
The latency will be enlat + exlat only when the NVMe tries to transit
from operational state right atfer it begins to transit to
non-operational state, which should be a rare case.

Therefore, as Andy Lutomirski suggests, use exlat only when deciding power
states to trainsit to.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
7 years agonvme-fc: fix missing put reference on controller create failure
James Smart [Mon, 5 Jun 2017 22:03:42 +0000 (15:03 -0700)]
nvme-fc: fix missing put reference on controller create failure

The failure case, of a create controller request, called
nvme_uninit_ctrl() but didn't do a put to allow the nvme
controller to be deleted.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
7 years agonvme-fc: on lldd/transport io error, terminate association
James Smart [Fri, 2 Jun 2017 05:54:21 +0000 (22:54 -0700)]
nvme-fc: on lldd/transport io error, terminate association

Per FC-NVME, when lldd or transport detects an i/o error, the
connection must be terminated, which in turn requires the association
to be termianted.  Currently the transport simply creates a nvme
completion status of transport error and returns the io. The FC-NVME
spec makes the mandate as initiator and host, depending on the error,
can get out of sync on outstanding io counts (sqhd/sqtail).

Implement the association teardown on lldd or transport detected
errors.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
7 years agonvme-rdma: fast fail incoming requests while we reconnect
Sagi Grimberg [Mon, 5 Jun 2017 17:35:56 +0000 (20:35 +0300)]
nvme-rdma: fast fail incoming requests while we reconnect

When we encounter an transport/controller errors, error recovery
kicks in which performs:
1. stops io/admin queues
2. moves transport queues out of LIVE state
3. fast fail pending io
4. schedule periodic reconnects.

But we also need to fast fail incoming IO taht enters after we
already scheduled. Given that our queue is not LIVE anymore, simply
restart the request queues to fail in .queue_rq

Reported-by: Alex Turin <alex@vastdata.com>
Reported-by: shahar.salzman <shahar.salzman@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
7 years agonvme-pci: fix multiple ctrl removal scheduling
Rakesh Pandit [Mon, 5 Jun 2017 11:43:11 +0000 (14:43 +0300)]
nvme-pci: fix multiple ctrl removal scheduling

Commit c5f6ce97c1210 tries to address multiple resets but fails as
work_busy doesn't involve any synchronization and can fail.  This is
reproducible easily as can be seen by WARNING below which is triggered
with line:

WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)

Allowing multiple resets can result in multiple controller removal as
well if different conditions inside nvme_reset_work fail and which
might deadlock on device_release_driver.

[  480.327007] WARNING: CPU: 3 PID: 150 at drivers/nvme/host/pci.c:1900 nvme_reset_work+0x36c/0xec0
[  480.327008] Modules linked in: rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast...
[  480.327044]  btusb videobuf2_core ghash_clmulni_intel snd_hwdep cfg80211 acer_wmi hci_uart..
[  480.327065] CPU: 3 PID: 150 Comm: kworker/u16:2 Not tainted 4.12.0-rc1+ #13
[  480.327065] Hardware name: Acer Predator G9-591/Mustang_SLS, BIOS V1.10 03/03/2016
[  480.327066] Workqueue: nvme nvme_reset_work
[  480.327067] task: ffff880498ad8000 task.stack: ffffc90002218000
[  480.327068] RIP: 0010:nvme_reset_work+0x36c/0xec0
[  480.327069] RSP: 0018:ffffc9000221bdb8 EFLAGS: 00010246
[  480.327070] RAX: 0000000000460000 RBX: ffff880498a98128 RCX: dead000000000200
[  480.327070] RDX: 0000000000000001 RSI: ffff8804b1028020 RDI: ffff880498a98128
[  480.327071] RBP: ffffc9000221be50 R08: 0000000000000000 R09: 0000000000000000
[  480.327071] R10: ffffc90001963ce8 R11: 000000000000020d R12: ffff880498a98000
[  480.327072] R13: ffff880498a53500 R14: ffff880498a98130 R15: ffff880498a98128
[  480.327072] FS:  0000000000000000(0000) GS:ffff8804c1cc0000(0000) knlGS:0000000000000000
[  480.327073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  480.327074] CR2: 00007ffcf3c37f78 CR3: 0000000001e09000 CR4: 00000000003406e0
[  480.327074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  480.327075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  480.327075] Call Trace:
[  480.327079]  ? __switch_to+0x227/0x400
[  480.327081]  process_one_work+0x18c/0x3a0
[  480.327082]  worker_thread+0x4e/0x3b0
[  480.327084]  kthread+0x109/0x140
[  480.327085]  ? process_one_work+0x3a0/0x3a0
[  480.327087]  ? kthread_park+0x60/0x60
[  480.327102]  ret_from_fork+0x2c/0x40
[  480.327103] Code: e8 5a dc ff ff 85 c0 41 89 c1 0f.....

This patch addresses the problem by using state of controller to
decide whether reset should be queued or not as state change is
synchronizated using controller spinlock.  Also cancel_work_sync is
used to make sure remove cancels the reset_work and waits for it to
finish.  This patch also changes return value from -ENODEV to more
appropriate -EBUSY if nvme_reset fails to change state.

Fixes: c5f6ce97c1210 ("nvme: don't schedule multiple resets")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
7 years agonvme: fix hang in remove path
Ming Lei [Fri, 2 Jun 2017 08:32:08 +0000 (16:32 +0800)]
nvme: fix hang in remove path

We need to start admin queues too in nvme_kill_queues()
for avoiding hang in remove path[1].

This patch is very similar with 806f026f9b901eaf(nvme: use
blk_mq_start_hw_queues() in nvme_kill_queues()).

[1] hang stack trace
[<ffffffff813c9716>] blk_execute_rq+0x56/0x80
[<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
[<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
[<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
[<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
[<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
[<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
[<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
[<ffffffff8156d951>] device_del+0x101/0x320
[<ffffffff8156db8a>] device_unregister+0x1a/0x60
[<ffffffff8156dc4c>] device_destroy+0x3c/0x50
[<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
[<ffffffff815d4858>] nvme_remove+0x78/0x110
[<ffffffff81452b69>] pci_device_remove+0x39/0xb0
[<ffffffff81572935>] device_release_driver_internal+0x155/0x210
[<ffffffff81572a02>] device_release_driver+0x12/0x20
[<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
[<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
[<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
[<ffffffff810c5ac9>] kthread+0x109/0x140
[<ffffffff8185800c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

Fixes: c5552fde102fc("nvme: Enable autonomous power state transitions")
Reported-by: Rakesh Pandit <rakesh@tuxera.com>
Tested-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
7 years agoelevator: fix truncation of icq_cache_name
Eric Biggers [Sat, 3 Jun 2017 03:35:51 +0000 (20:35 -0700)]
elevator: fix truncation of icq_cache_name

gcc 7.1 reports the following warning:

    block/elevator.c: In function ‘elv_register’:
    block/elevator.c:898:5: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
         "%s_io_cq", e->elevator_name);
         ^~~~~~~~~~
    block/elevator.c:897:3: note: ‘snprintf’ output between 7 and 22 bytes into a destination of size 21
       snprintf(e->icq_cache_name, sizeof(e->icq_cache_name),
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         "%s_io_cq", e->elevator_name);
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The bug is that the name of the icq_cache is 6 characters longer than
the elevator name, but only ELV_NAME_MAX + 5 characters were reserved
for it --- so in the case of a maximum-length elevator name, the 'q'
character in "_io_cq" would be truncated by snprintf().  Fix it by
reserving ELV_NAME_MAX + 6 characters instead.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-mq: fix direct issue
Ming Lei [Tue, 6 Jun 2017 15:22:00 +0000 (23:22 +0800)]
blk-mq: fix direct issue

If queue is stopped, we shouldn't dispatch request into driver and
hardware, unfortunately the check is removed in bd166ef183c2(blk-mq-sched:
add framework for MQ capable IO schedulers).

This patch fixes the issue by moving the check back into
__blk_mq_try_issue_directly().

This patch fixes request use-after-free[1][2] during canceling requets
of NVMe in nvme_dev_disable(), which can be triggered easily during
NVMe reset & remove test.

[1] oops kernel log when CONFIG_BLK_DEV_INTEGRITY is on
[  103.412969] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[  103.412980] IP: bio_integrity_advance+0x48/0xf0
[  103.412981] PGD 275a88067
[  103.412981] P4D 275a88067
[  103.412982] PUD 276c43067
[  103.412983] PMD 0
[  103.412984]
[  103.412986] Oops: 0000 [#1] SMP
[  103.412989] Modules linked in: vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd ipmi_ssif iTCO_wdt iTCO_vendor_support mxm_wmi glue_helper dcdbas ipmi_si mei_me pcspkr mei sg ipmi_devintf lpc_ich ipmi_msghandler shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel nvme ahci nvme_core libahci libata tg3 i2c_core megaraid_sas ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
[  103.413035] CPU: 0 PID: 102 Comm: kworker/0:2 Not tainted 4.11.0+ #1
[  103.413036] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  103.413041] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[  103.413043] task: ffff9cc8775c8000 task.stack: ffffc033c252c000
[  103.413045] RIP: 0010:bio_integrity_advance+0x48/0xf0
[  103.413046] RSP: 0018:ffffc033c252fc10 EFLAGS: 00010202
[  103.413048] RAX: 0000000000000000 RBX: ffff9cc8720a8cc0 RCX: ffff9cca72958240
[  103.413049] RDX: ffff9cca72958000 RSI: 0000000000000008 RDI: ffff9cc872537f00
[  103.413049] RBP: ffffc033c252fc28 R08: 0000000000000000 R09: ffffffffb963a0d5
[  103.413050] R10: 000000000000063e R11: 0000000000000000 R12: ffff9cc8720a8d18
[  103.413051] R13: 0000000000001000 R14: ffff9cc872682e00 R15: 00000000fffffffb
[  103.413053] FS:  0000000000000000(0000) GS:ffff9cc877c00000(0000) knlGS:0000000000000000
[  103.413054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  103.413055] CR2: 000000000000000a CR3: 0000000276c41000 CR4: 00000000001406f0
[  103.413056] Call Trace:
[  103.413063]  bio_advance+0x2a/0xe0
[  103.413067]  blk_update_request+0x76/0x330
[  103.413072]  blk_mq_end_request+0x1a/0x70
[  103.413074]  blk_mq_dispatch_rq_list+0x370/0x410
[  103.413076]  ? blk_mq_flush_busy_ctxs+0x94/0xe0
[  103.413080]  blk_mq_sched_dispatch_requests+0x173/0x1a0
[  103.413083]  __blk_mq_run_hw_queue+0x8e/0xa0
[  103.413085]  __blk_mq_delay_run_hw_queue+0x9d/0xa0
[  103.413088]  blk_mq_start_hw_queue+0x17/0x20
[  103.413090]  blk_mq_start_hw_queues+0x32/0x50
[  103.413095]  nvme_kill_queues+0x54/0x80 [nvme_core]
[  103.413097]  nvme_remove_dead_ctrl_work+0x1f/0x40 [nvme]
[  103.413103]  process_one_work+0x149/0x360
[  103.413105]  worker_thread+0x4d/0x3c0
[  103.413109]  kthread+0x109/0x140
[  103.413111]  ? rescuer_thread+0x380/0x380
[  103.413113]  ? kthread_park+0x60/0x60
[  103.413120]  ret_from_fork+0x2c/0x40
[  103.413121] Code: 08 4c 8b 63 50 48 8b 80 80 00 00 00 48 8b 90 d0 03 00 00 31 c0 48 83 ba 40 02 00 00 00 48 8d 8a 40 02 00 00 48 0f 45 c1 c1 ee 09 <0f> b6 48 0a 0f b6 40 09 41 89 f5 83 e9 09 41 d3 ed 44 0f af e8
[  103.413145] RIP: bio_integrity_advance+0x48/0xf0 RSP: ffffc033c252fc10
[  103.413146] CR2: 000000000000000a
[  103.413157] ---[ end trace cd6875d16eb5a11e ]---
[  103.455368] Kernel panic - not syncing: Fatal exception
[  103.459826] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  103.850916] ---[ end Kernel panic - not syncing: Fatal exception
[  103.857637] sched: Unexpected reschedule of offline CPU#1!
[  103.863762] ------------[ cut here ]------------

[2] kernel hang in blk_mq_freeze_queue_wait() when CONFIG_BLK_DEV_INTEGRITY is off
[  247.129825] INFO: task nvme-test:1772 blocked for more than 120 seconds.
[  247.137311]       Not tainted 4.12.0-rc2.upstream+ #4
[  247.142954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.151704] Call Trace:
[  247.154445]  __schedule+0x28a/0x880
[  247.158341]  schedule+0x36/0x80
[  247.161850]  blk_mq_freeze_queue_wait+0x4b/0xb0
[  247.166913]  ? remove_wait_queue+0x60/0x60
[  247.171485]  blk_freeze_queue+0x1a/0x20
[  247.175770]  blk_cleanup_queue+0x7f/0x140
[  247.180252]  nvme_ns_remove+0xa3/0xb0 [nvme_core]
[  247.185503]  nvme_remove_namespaces+0x32/0x50 [nvme_core]
[  247.191532]  nvme_uninit_ctrl+0x2d/0xa0 [nvme_core]
[  247.196977]  nvme_remove+0x70/0x110 [nvme]
[  247.201545]  pci_device_remove+0x39/0xc0
[  247.205927]  device_release_driver_internal+0x141/0x200
[  247.211761]  device_release_driver+0x12/0x20
[  247.216531]  pci_stop_bus_device+0x8c/0xa0
[  247.221104]  pci_stop_and_remove_bus_device_locked+0x1a/0x30
[  247.227420]  remove_store+0x7c/0x90
[  247.231320]  dev_attr_store+0x18/0x30
[  247.235409]  sysfs_kf_write+0x3a/0x50
[  247.239497]  kernfs_fop_write+0xff/0x180
[  247.243867]  __vfs_write+0x37/0x160
[  247.247757]  ? selinux_file_permission+0xe5/0x120
[  247.253011]  ? security_file_permission+0x3b/0xc0
[  247.258260]  vfs_write+0xb2/0x1b0
[  247.261964]  ? syscall_trace_enter+0x1d0/0x2b0
[  247.266924]  SyS_write+0x55/0xc0
[  247.270540]  do_syscall_64+0x67/0x150
[  247.274636]  entry_SYSCALL64_slow_path+0x25/0x25
[  247.279794] RIP: 0033:0x7f5c96740840
[  247.283785] RSP: 002b:00007ffd00e87ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  247.292238] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5c96740840
[  247.300194] RDX: 0000000000000002 RSI: 00007f5c97060000 RDI: 0000000000000001
[  247.308159] RBP: 00007f5c97060000 R08: 000000000000000a R09: 00007f5c97059740
[  247.316123] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f5c96a14400
[  247.324087] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
[  370.016340] INFO: task nvme-test:1772 blocked for more than 120 seconds.

Fixes: 12d70958a2e8(blk-mq: don't fail allocating driver tag for stopped hw queue)
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-mq: pass correct hctx to blk_mq_try_issue_directly
Ming Lei [Tue, 6 Jun 2017 15:21:59 +0000 (23:21 +0800)]
blk-mq: pass correct hctx to blk_mq_try_issue_directly

When direct issue is done on request picked up from plug list,
the hctx need to be updated with the actual hw queue, otherwise
wrong hctx is used and may hurt performance, especially when
wrong SRCU readlock is acquired/released

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agobio-integrity: Do not allocate integrity context for bio w/o data
Dmitry Monakhov [Wed, 10 May 2017 15:20:44 +0000 (19:20 +0400)]
bio-integrity: Do not allocate integrity context for bio w/o data

If bio has no data, such as ones from blkdev_issue_flush(),
then we have nothing to protect.

This patch prevent bugon like follows:

kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
kernel BUG at mm/slab.c:2773!
invalid opcode: 0000 [#1] SMP
Modules linked in: bcache
CPU: 0 PID: 4428 Comm: xfs_io Tainted: G        W       4.11.0-rc4-ext4-00041-g2ef0043-dirty #43
Hardware name: Virtuozzo KVM, BIOS seabios-1.7.5-11.vz7.4 04/01/2014
task: ffff880137786440 task.stack: ffffc90000ba8000
RIP: 0010:kfree_debugcheck+0x25/0x2a
RSP: 0018:ffffc90000babde0 EFLAGS: 00010082
RAX: 0000000000000034 RBX: ac1fa1d106742a5a RCX: 0000000000000007
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013f3ccb40
RBP: ffffc90000babde8 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000fcb76420 R11: 00000000725172ed R12: 0000000000000282
R13: ffffffff8150e766 R14: ffff88013a145e00 R15: 0000000000000001
FS:  00007fb09384bf40(0000) GS:ffff88013f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd0172f9e40 CR3: 0000000137fa9000 CR4: 00000000000006f0
Call Trace:
 kfree+0xc8/0x1b3
 bio_integrity_free+0xc3/0x16b
 bio_free+0x25/0x66
 bio_put+0x14/0x26
 blkdev_issue_flush+0x7a/0x85
 blkdev_fsync+0x35/0x42
 vfs_fsync_range+0x8e/0x9f
 vfs_fsync+0x1c/0x1e
 do_fsync+0x31/0x4a
 SyS_fsync+0x10/0x14
 entry_SYSCALL_64_fastpath+0x1f/0xc2

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoMerge tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Fri, 2 Jun 2017 19:29:03 +0000 (12:29 -0700)]
Merge tag 'xfs-4.12-fixes-3' of git://git./fs/xfs/xfs-linux

Pull XFS fix from Darrick Wong:
 "I've one more bugfix for you for 4.12-rc4: Fix an unmount hang due to
  a race in io buffer accounting"

* tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: use ->b_state to fix buffer I/O accounting release race

7 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 2 Jun 2017 19:06:27 +0000 (12:06 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "ACPI-related fixes for arm64:

   - GICC MADT entry validity check fix

   - Skip IRQ registration with pmu=off in an ACPI guest

   - struct acpi_pci_root_ops freeing on error path"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
  drivers/perf: arm_pmu_acpi: avoid perf IRQ init when guest PMU is off
  ARM64: PCI: Fix struct acpi_pci_root_ops allocation failure path

7 years agoMerge tag 'ceph-for-4.12-rc4' of git://github.com/ceph/ceph-client
Linus Torvalds [Fri, 2 Jun 2017 19:03:07 +0000 (12:03 -0700)]
Merge tag 'ceph-for-4.12-rc4' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A small fix for rbd FALLOC_FL_ZERO_RANGE/PUNCH_HOLE handling breakage
  introduced in -rc1"

* tag 'ceph-for-4.12-rc4' of git://github.com/ceph/ceph-client:
  rbd: implement REQ_OP_WRITE_ZEROES

7 years agoMerge tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 2 Jun 2017 18:50:37 +0000 (11:50 -0700)]
Merge tag 'for-4.12/dm-fixes-3' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - a DM verity fix for a mode when no salt is used

 - a fix to DM to account for the possibility that PREFLUSH or FUA are
   used without the SYNC flag if the underlying storage doesn't have a
   volatile write-cache

 - a DM ioctl memory allocation flag fix to use __GFP_HIGH to allow
   emergency forward progress (by using memory reserves as last resort)

 - a small DM integrity cleanup to use kvmalloc() instead of duplicating
   the same

* tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: make flush bios explicitly sync
  dm ioctl: restore __GFP_HIGH in copy_params()
  dm integrity: use kvmalloc() instead of dm_integrity_kvmalloc()
  dm verity: fix no salt use case

7 years agoMerge tag 'md/4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Linus Torvalds [Fri, 2 Jun 2017 18:47:24 +0000 (11:47 -0700)]
Merge tag 'md/4.12-rc4' of git://git./linux/kernel/git/shli/md

Pull MD fixes from Shaohua Li:
 "Several patches for MD. One notable is making flush bios sync, others
  fix small issues"

* tag 'md/4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md: Make flush bios explicitely sync
  md: report sector of stripes with check mismatches
  md: uuid debug statement now in processor byte order.
  md-cluster: fix potential lock issue in add_new_disk

7 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 2 Jun 2017 18:44:46 +0000 (11:44 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A set of fixes that should go into the next -rc. This contains:

   - A use-after-free in the request_list exit for the legacy IO path,
     from Bart.

   - A fix for CFQ, fixing a recent regression with the conversion to
     higher resolution timing for iops mode. From Hou Tao.

   - A single fix for nbd, split in two patches, fixing a leak of a data
     structure.

   - A regression fix from Keith, ensuring that callers of
     blk_mq_update_nr_hw_queues() hold the right lock"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: Avoid that blk_exit_rl() triggers a use-after-free
  cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
  blk-mq: Take tagset lock when updating hw queues
  nbd: don't leak nbd_config
  nbd: nbd_reset() call in nbd_dev_add() is redundant

7 years agoMerge tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 2 Jun 2017 18:32:38 +0000 (11:32 -0700)]
Merge tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux

Pull drm displayport quirk support:
 "DP quirk for usb c dongles.

  As mentioned I have a separate request for fixing a regression, but
  also keeping the broken hw working, for certain USB-C DP adapters they
  require a minimised n/m parameters, but an attempt to do this
  generically has failed, we need to quirk these specific adapters.
  However doing it generically regressed some eDP panels.

  This pull adds the infrastructure and a quirk for the adapter"

* tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Detect USB-C specific dongles before reducing M and N
  drm/dp: start a DPCD based DP sink/branch device quirk database
  drm/i915: use drm DP helper to read DPCD desc
  drm/dp: add helper for reading DP sink/branch device desc from DPCD

7 years agoMerge tag 'sound-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 2 Jun 2017 16:40:47 +0000 (09:40 -0700)]
Merge tag 'sound-4.12-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "This contains the fixes for a few reported regression for HD-audio and
  USB-audio. All small, trivial, and boring"

* tag 'sound-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix applying MSI dual-codec mobo quirk
  ALSA: usb: Avoid VLA in mixer_us16x08.c
  ALSA: usb: Fix a typo in Tascam US-16x08 mixer element
  Revert "ALSA: usb-audio: purge needless variable length array"

7 years agoMerge tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Fri, 2 Jun 2017 16:26:42 +0000 (09:26 -0700)]
Merge tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "Here is the dmaengine fixes request for 4.12. Fixes bunch of issues in
  the driver, npthing exciting though..

   - mv_xor_v2 driver fixes for handling descriptors, tx_submit
     implementation, removing interrupt coalescing and setting DMA mask
     properly

   - fix usb-dmac DMAOR AE bit definition

   - fix ep93xx start buffer from BASE0 and not drain the transfers in
     terminate_all

   - fix rcar-dmac to use right descriptor pointer for residue
     calculation

   - pl330 fix warn for irq freeup"

* tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: pl330: fix warning in pl330_remove
  rcar-dmac: fixup descriptor pointer for descriptor mode
  dmaengine: ep93xx: Don't drain the transfers in terminate_all()
  dmaengine: ep93xx: Always start from BASE0
  dmaengine: usb-dmac: Fix DMAOR AE bit definition
  dmaengine: mv_xor_v2: set DMA mask to 40 bits
  dmaengine: mv_xor_v2: remove interrupt coalescing
  dmaengine: mv_xor_v2: fix tx_submit() implementation
  dmaengine: mv_xor_v2: enable XOR engine after its configuration
  dmaengine: mv_xor_v2: do not use descriptors not acked by async_tx
  dmaengine: mv_xor_v2: properly handle wrapping in the array of HW descriptors
  dmaengine: mv_xor_v2: handle mv_xor_v2_prep_sw_desc() error properly

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Linus Torvalds [Fri, 2 Jun 2017 16:23:56 +0000 (09:23 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:

 - corner-case oops fixes for Asus and Wacom drivers from Carlo Caione
   and Jason Gerecke

 - power management fix (reported on SIS0817 touchscreen) for i2c-hid
   devices from Hans de Goede

 - device-id-specific fixes and quirks from Hans de Goede, Diego Elio
   Pettenò and Che-Liang Chiou

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: asus: Stop underlying hardware on remove
  HID: i2c: Call acpi_device_fix_up_power for ACPI-enumerated devices
  HID: asus: Add support for T100 keyboard
  HID: elecom: extend to fix the descriptor for DEFT trackballs
  HID: magicmouse: Set multi-touch keybits for Magic Mouse
  HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livep...
Linus Torvalds [Fri, 2 Jun 2017 15:59:17 +0000 (08:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/livepatching

Pull livepatching fix from Jiri Kosina:
 "Kconfig dependency fix for livepatching infrastructure from Miroslav
  Benes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS

7 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 2 Jun 2017 15:53:42 +0000 (08:53 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - revert a broken PAT commit that broke a number of systems

   - fix two preemptability warnings/bugs that can trigger under certain
     circumstances, in the debug code and in the microcode loader"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT"
  x86/debug/32: Convert a smp_processor_id() call to raw to avoid DEBUG_PREEMPT warning
  x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix preemptibility bug

7 years agoMerge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 2 Jun 2017 15:51:53 +0000 (08:51 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:
 "Misc fixes:

   - three boot crash fixes for uncommon configurations

   - silence a boot warning under virtualization

   - plus a GCC 7 related (harmless) build warning fix"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot
  x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled
  x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map
  efi: Remove duplicate 'const' specifiers
  efi: Don't issue error message when booted under Xen

7 years agoARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
Lorenzo Pieralisi [Fri, 26 May 2017 16:40:02 +0000 (17:40 +0100)]
ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation

The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes
muster from an ACPI specification standpoint. Current macro detects the
MADT GICC entry length through ACPI firmware version (it changed from 76
to 80 bytes in the transition from ACPI 5.1 to ACPI 6.0 specification)
but always uses (erroneously) the ACPICA (latest) struct (ie struct
acpi_madt_generic_interrupt - that is 80-bytes long) length to check if
the current GICC entry memory record exceeds the MADT table end in
memory as defined by the MADT table header itself, which may result in
false negatives depending on the ACPI firmware version and how the MADT
entries are laid out in memory (ie on ACPI 5.1 firmware MADT GICC
entries are 76 bytes long, so by adding 80 to a GICC entry start address
in memory the resulting address may well be past the actual MADT end,
triggering a false negative).

Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks
and update them to always use the firmware version specific MADT GICC
entry length in order to carry out boundary checks.

Fixes: b6cfb277378e ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro")
Reported-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Al Stone <ahs3@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
7 years agoHID: asus: Stop underlying hardware on remove
Carlo Caione [Tue, 30 May 2017 20:39:46 +0000 (22:39 +0200)]
HID: asus: Stop underlying hardware on remove

We are missing a call to hid_hw_stop() on the remove hook.
Among other things this is causing an Oops when (re-)starting GNOME /
upowerd / ... after the module has been already rmmod-ed.

Signed-off-by: Carlo Caione <carlo@endlessm.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
7 years agodmaengine: pl330: fix warning in pl330_remove
Jean-Philippe Brucker [Thu, 1 Jun 2017 18:22:01 +0000 (19:22 +0100)]
dmaengine: pl330: fix warning in pl330_remove

When removing a device with less than 9 IRQs (AMBA_NR_IRQS), we'll get a
big WARN_ON from devres.c because pl330_remove calls devm_free_irqs for
unallocated irqs. Similarly to pl330_probe, check that IRQ number is
present before calling devm_free_irq.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
7 years agoMerge tag 'topic/dp-quirks-2017-05-31' of git://anongit.freedesktop.org/git/drm-intel...
Dave Airlie [Fri, 2 Jun 2017 02:57:32 +0000 (12:57 +1000)]
Merge tag 'topic/dp-quirks-2017-05-31' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes

DP sink specific quirks

* tag 'topic/dp-quirks-2017-05-31' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: Detect USB-C specific dongles before reducing M and N
  drm/dp: start a DPCD based DP sink/branch device quirk database
  drm/i915: use drm DP helper to read DPCD desc
  drm/dp: add helper for reading DP sink/branch device desc from DPCD

7 years agoMerge tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Thu, 1 Jun 2017 23:24:48 +0000 (16:24 -0700)]
Merge tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Revert patch accidentally included in the merge window pull request,
  and fix a crash that was likely a result of buggy client behavior"

* tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux:
  nfsd4: fix null dereference on replay
  nfsd: Revert "nfsd: check for oversized NFSv2/v3 arguments"

7 years agoMerge tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 1 Jun 2017 23:17:42 +0000 (16:17 -0700)]
Merge tag 'gcc-plugins-v4.12-rc4' of git://git./linux/kernel/git/kees/linux

Pull gcc-plugin prepwork from Kees Cook:
 "Use designated initializers for mtk-vcodec, powerplay, amdgpu, and
  sgi-xp. Use ERR_CAST() to avoid cross-structure cast in ocf2, ntfs,
  and NFS.

  Christoph Hellwig recommended that I send these fixes now, rather than
  waiting for the v4.13 merge window. These are all initializer and cast
  fixes needed for the future randstruct plugin that haven't been picked
  up by the respective maintainers"

* tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mtk-vcodec: Use designated initializers
  drm/amd/powerplay: Use designated initializers
  drm/amdgpu: Use designated initializers
  sgi-xp: Use designated initializers
  ocfs2: Use ERR_CAST() to avoid cross-structure cast
  ntfs: Use ERR_CAST() to avoid cross-structure cast
  NFS: Use ERR_CAST() to avoid cross-structure cast

7 years agoblock: Avoid that blk_exit_rl() triggers a use-after-free
Bart Van Assche [Wed, 31 May 2017 21:43:45 +0000 (14:43 -0700)]
block: Avoid that blk_exit_rl() triggers a use-after-free

Since the introduction of .init_rq_fn() and .exit_rq_fn() it is
essential that the memory allocated for struct request_queue
stays around until all blk_exit_rl() calls have finished. Hence
make blk_init_rl() take a reference on struct request_queue.

This patch fixes the following crash:

general protection fault: 0000 [#2] SMP
CPU: 3 PID: 28 Comm: ksoftirqd/3 Tainted: G      D         4.12.0-rc2-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
task: ffff88013a108040 task.stack: ffffc9000071c000
RIP: 0010:free_request_size+0x1a/0x30
RSP: 0018:ffffc9000071fd38 EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: ffff880067362a88 RCX: 0000000000000003
RDX: ffff880067464178 RSI: ffff880067362a88 RDI: ffff880135ea4418
RBP: ffffc9000071fd40 R08: 0000000000000000 R09: 0000000100180009
R10: ffffc9000071fd38 R11: ffffffff81110800 R12: ffff88006752d3d8
R13: ffff88006752d3d8 R14: ffff88013a108040 R15: 000000000000000a
FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa8ec1edb00 CR3: 0000000138ee8000 CR4: 00000000001406e0
Call Trace:
 mempool_destroy.part.10+0x21/0x40
 mempool_destroy+0xe/0x10
 blk_exit_rl+0x12/0x20
 blkg_free+0x4d/0xa0
 __blkg_release_rcu+0x59/0x170
 rcu_process_callbacks+0x260/0x4e0
 __do_softirq+0x116/0x250
 smpboot_thread_fn+0x123/0x1e0
 kthread+0x109/0x140
 ret_from_fork+0x31/0x40

Fixes: commit e9c787e65c0c ("scsi: allocate scsi_cmnd structures as part of struct request")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Thu, 1 Jun 2017 17:48:09 +0000 (10:48 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Many small x86 bug fixes: SVM segment registers access rights, nested
  VMX, preempt notifiers, LAPIC virtual wire mode, NMI injection"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix nmi injection failure when vcpu got blocked
  KVM: SVM: do not zero out segment attributes if segment is unusable or not present
  KVM: SVM: ignore type when setting segment registers
  KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging
  KVM: x86: Fix virtual wire mode
  KVM: nVMX: Fix handling of lmsw instruction
  KVM: X86: Fix preempt the preemption timer cancel

7 years agoMerge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Linus Torvalds [Thu, 1 Jun 2017 17:45:27 +0000 (10:45 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs

Pull Reiserfs and GFS2 fixes from Jan Kara:
 "Fixes to GFS2 & Reiserfs for the fallout of the recent WRITE_FUA
  cleanup from Christoph.

  Fixes for other filesystems were already merged by respective
  maintainers."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: Make flush bios explicitely sync
  gfs2: Make flush bios explicitely sync

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Linus Torvalds [Thu, 1 Jun 2017 17:40:41 +0000 (10:40 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
 "Here are the target-pending fixes for v4.12-rc4:

   - ibmviscsis ABORT_TASK handling fixes that missed the v4.12 merge
     window. (Bryant Ly and Michael Cyr)

   - Re-add a target-core check enforcing WRITE overflow reject that was
     relaxed in v4.3, to avoid unsupported iscsi-target immediate data
     overflow. (nab)

   - Fix a target-core-user OOPs during device removal. (MNC + Bryant
     Ly)

   - Fix a long standing iscsi-target potential issue where kthread exit
     did not wait for kthread_should_stop(). (Jiang Yi)

   - Fix a iscsi-target v3.12.y regression OOPs involving initial login
     PDU processing during asynchronous TCP connection close. (MNC +
     nab)

  This is a little larger than usual for an -rc4, primarily due to the
  iscsi-target v3.12.y regression OOPs bug-fix.

  However, it's an important patch as MNC + Hannes where both able to
  trigger it using a reduced iscsi initiator login timeout combined with
  a backend taking a long time to complete I/Os during iscsi login
  driven session reinstatement"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: Always wait for kthread_should_stop() before kthread exit
  iscsi-target: Fix initial login PDU asynchronous socket close OOPs
  tcmu: fix crash during device removal
  target: Re-add check to reject control WRITEs with overflow data
  ibmvscsis: Fix the incorrect req_lim_delta
  ibmvscsis: Clear left-over abort_cmd pointers

7 years agoRevert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT"
Ingo Molnar [Thu, 1 Jun 2017 13:52:23 +0000 (15:52 +0200)]
Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT"

This reverts commit cbed27cdf0e3f7ea3b2259e86b9e34df02be3fe4.

As Andy Lutomirski observed:

 "I think this patch is bogus. pat_enabled() sure looks like it's
  supposed to return true if PAT is *enabled*, and these days PAT is
  'enabled' even if there's no HW PAT support."

Reported-by: Bernhard Held <berny156@gmx.de>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: stable@vger.kernel.org # v4.2+
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoKVM: x86: Fix nmi injection failure when vcpu got blocked
ZhuangYanying [Fri, 26 May 2017 05:16:48 +0000 (13:16 +0800)]
KVM: x86: Fix nmi injection failure when vcpu got blocked

When spin_lock_irqsave() deadlock occurs inside the guest, vcpu threads,
other than the lock-holding one, would enter into S state because of
pvspinlock. Then inject NMI via libvirt API "inject-nmi", the NMI could
not be injected into vm.

The reason is:
1 It sets nmi_queued to 1 when calling ioctl KVM_NMI in qemu, and sets
cpu->kvm_vcpu_dirty to true in do_inject_external_nmi() meanwhile.
2 It sets nmi_queued to 0 in process_nmi(), before entering guest, because
cpu->kvm_vcpu_dirty is true.

It's not enough just to check nmi_queued to decide whether to stay in
vcpu_block() or not. NMI should be injected immediately at any situation.
Add checking nmi_pending, and testing KVM_REQ_NMI replaces nmi_queued
in vm_vcpu_has_events().

Do the same change for SMIs.

Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoKVM: SVM: do not zero out segment attributes if segment is unusable or not present
Roman Pen [Thu, 1 Jun 2017 08:55:03 +0000 (10:55 +0200)]
KVM: SVM: do not zero out segment attributes if segment is unusable or not present

This is a fix for the problem [1], where VMCB.CPL was set to 0 and interrupt
was taken on userspace stack.  The root cause lies in the specific AMD CPU
behaviour which manifests itself as unusable segment attributes on SYSRET.
The corresponding work around for the kernel is the following:

61f01dd941ba ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue")

In other turn virtualization side treated unusable segment incorrectly and
restored CPL from SS attributes, which were zeroed out few lines above.

In current patch it is assured only that P bit is cleared in VMCB.save state
and segment attributes are not zeroed out if segment is not presented or is
unusable, therefore CPL can be safely restored from DPL field.

This is only one part of the fix, since QEMU side should be fixed accordingly
not to zero out attributes on its side.  Corresponding patch will follow.

[1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Mikhail Sennikovskii <mikhail.sennikovskii@profitbricks.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoALSA: hda - Fix applying MSI dual-codec mobo quirk
Takashi Iwai [Thu, 1 Jun 2017 07:35:30 +0000 (09:35 +0200)]
ALSA: hda - Fix applying MSI dual-codec mobo quirk

The previous commit [63691587f7b0: ALSA: hda - Apply dual-codec quirk
for MSI Z270-Gaming mobo] attempted to apply the existing dual-codec
quirk for a MSI mobo.  But it turned out that this isn't applied
properly due to the MSI-vendor quirk before this entry.  I overlooked
such two MSI entries just because they were put in the wrong position,
although we have a list ordered by PCI SSID numbers.

This patch fixes it by rearranging the unordered entries.

Fixes: 63691587f7b0 ("ALSA: hda - Apply dual-codec quirk for MSI Z270-Gaming mobo")
Reported-by: Rudolf Schmidt <info@rudolfschmidt.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 years agoMerge tag 'drm-fixes-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Thu, 1 Jun 2017 04:53:49 +0000 (21:53 -0700)]
Merge tag 'drm-fixes-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "This is the main set of fixes for rc4, one amdgpu fix, some exynos
  regression fixes, some msm fixes and some i915 and GVT fixes.

  I've got a second regression fix for some DP chips that might be a
  bit large, but I think we'd like to land it now, I'll send it along
  tomorrow, once you are happy with this set"

* tag 'drm-fixes-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux: (24 commits)
  drm/amdgpu: Program ring for vce instance 1 at its register space
  drm/exynos: clean up description of exynos_drm_crtc
  drm/exynos: dsi: Remove bridge node reference in removal
  drm/exynos: dsi: Fix the parse_dt function
  drm/exynos: Merge pre/postclose hooks
  drm/msm: Fix the check for the command size
  drm/msm: Take the mutex before calling msm_gem_new_impl
  drm/msm: for array in-fences, check if all backing fences are from our own context before waiting
  drm/msm: constify irq_domain_ops
  drm/msm/mdp5: release hwpipe(s) for unused planes
  drm/msm: Reuse dma_fence_release.
  drm/msm: Expose our reservation object when exporting a dmabuf.
  drm/msm/gpu: check legacy clk names in get_clocks()
  drm/msm/mdp5: use __drm_atomic_helper_plane_duplicate_state()
  drm/msm: select PM_OPP
  drm/i915: Stop pretending to mask/unmask LPE audio interrupts
  drm/i915/selftests: Silence compiler warning in igt_ctx_exec
  Revert "drm/i915: Restore lost "Initialized i915" welcome message"
  drm/i915/gvt: clean up unsubmited workloads before destroying kmem cache
  drm/i915/gvt: Disable compression workaround for Gen9
  ...

7 years agoMerge tag 'exynos-drm-fixes-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel...
Dave Airlie [Thu, 1 Jun 2017 02:07:48 +0000 (12:07 +1000)]
Merge tag 'exynos-drm-fixes-for-v4.12' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes

- Fix a regression to description of exynos_drm_crtc
- Remove preclose hook of Exynos
  . This was a exynos change of the patch series[1] merged already.
- Fix one dt broken issue
- Make sure to release bridge_node of Exynos MIPI-DSI driver.

[1] https://lists.freedesktop.org/archives/dri-devel/2017-March/135111.html

* tag 'exynos-drm-fixes-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
  drm/exynos: clean up description of exynos_drm_crtc
  drm/exynos: dsi: Remove bridge node reference in removal
  drm/exynos: dsi: Fix the parse_dt function
  drm/exynos: Merge pre/postclose hooks

7 years agoMerge branch 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Thu, 1 Jun 2017 02:07:18 +0000 (12:07 +1000)]
Merge branch 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

* 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: Program ring for vce instance 1 at its register space

7 years agoMerge branch 'msm-fixes-4.12-rc4' of git://people.freedesktop.org/~robclark/linux...
Dave Airlie [Thu, 1 Jun 2017 02:06:34 +0000 (12:06 +1000)]
Merge branch 'msm-fixes-4.12-rc4' of git://people.freedesktop.org/~robclark/linux into drm-fixes

a few fixes for 4.12..

* 'msm-fixes-4.12-rc4' of git://people.freedesktop.org/~robclark/linux:
  drm/msm: Fix the check for the command size
  drm/msm: Take the mutex before calling msm_gem_new_impl
  drm/msm: for array in-fences, check if all backing fences are from our own context before waiting
  drm/msm: constify irq_domain_ops
  drm/msm/mdp5: release hwpipe(s) for unused planes
  drm/msm: Reuse dma_fence_release.
  drm/msm: Expose our reservation object when exporting a dmabuf.
  drm/msm/gpu: check legacy clk names in get_clocks()
  drm/msm/mdp5: use __drm_atomic_helper_plane_duplicate_state()
  drm/msm: select PM_OPP

7 years agoMerge tag 'drm-intel-fixes-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel...
Dave Airlie [Thu, 1 Jun 2017 01:53:34 +0000 (11:53 +1000)]
Merge tag 'drm-intel-fixes-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes

drm/i915 fixes for v4.12-rc4

* tag 'drm-intel-fixes-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: Stop pretending to mask/unmask LPE audio interrupts
  drm/i915/selftests: Silence compiler warning in igt_ctx_exec
  Revert "drm/i915: Restore lost "Initialized i915" welcome message"
  drm/i915/gvt: clean up unsubmited workloads before destroying kmem cache
  drm/i915/gvt: Disable compression workaround for Gen9
  drm/i915: set initialised only when init_context callback is NULL
  drm/i915: Fix new -Wint-in-bool-context gcc compiler warning
  drm/i915: use vma->size for appgtt allocate_va_range
  drm/i915: Do not sync RCU during shrinking

7 years agoiscsi-target: Always wait for kthread_should_stop() before kthread exit
Jiang Yi [Tue, 16 May 2017 09:57:55 +0000 (17:57 +0800)]
iscsi-target: Always wait for kthread_should_stop() before kthread exit

There are three timing problems in the kthread usages of iscsi_target_mod:

 - np_thread of struct iscsi_np
 - rx_thread and tx_thread of struct iscsi_conn

In iscsit_close_connection(), it calls

 send_sig(SIGINT, conn->tx_thread, 1);
 kthread_stop(conn->tx_thread);

In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
SIGINT the kthread will exit without checking the return value of
kthread_should_stop().

So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
and kthread_stop(...), the kthread_stop() will try to stop an already
stopped kthread.

This is invalid according to the documentation of kthread_stop().

(Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
 early iscsi_target_rx_thread failure case - nab)

Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
7 years agoiscsi-target: Fix initial login PDU asynchronous socket close OOPs
Nicholas Bellinger [Thu, 25 May 2017 04:47:09 +0000 (21:47 -0700)]
iscsi-target: Fix initial login PDU asynchronous socket close OOPs

This patch fixes a OOPs originally introduced by:

   commit bb048357dad6d604520c91586334c9c230366a14
   Author: Nicholas Bellinger <nab@linux-iscsi.org>
   Date:   Thu Sep 5 14:54:04 2013 -0700

   iscsi-target: Add sk->sk_state_change to cleanup after TCP failure

which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.

To address this issue, this patch makes the following changes.

First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.

Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().

The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed.  For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.

Finally, to handle to case where iscsi_target_sk_state_change() is
called after the initial PDU procesing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.

Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Reported-by: Hannes Reinecke <hare@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Varun Prakash <varun@chelsio.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
7 years agodrm/amdgpu: Program ring for vce instance 1 at its register space
Leo Liu [Mon, 29 May 2017 17:13:59 +0000 (13:13 -0400)]
drm/amdgpu: Program ring for vce instance 1 at its register space

We need program ring buffer on instance 1 register space domain,
when only if instance 1 available, with two instances or instance 0,
and we need only program instance 0 regsiter space domain for ring.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 years agomd: Make flush bios explicitely sync
Jan Kara [Wed, 31 May 2017 07:44:33 +0000 (09:44 +0200)]
md: Make flush bios explicitely sync

Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

CC: linux-raid@vger.kernel.org
CC: Shaohua Li <shli@kernel.org>
Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shaohua Li <shli@fb.com>
7 years agoMerge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszer...
Linus Torvalds [Wed, 31 May 2017 15:29:02 +0000 (08:29 -0700)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "Fix regressions:

   - missing CONFIG_EXPORTFS dependency

   - failure if upper fs doesn't support xattr

   - bad error cleanup

  This also adds the concept of "impure" directories complementing the
  "origin" marking introduced in -rc1. Together they enable getting
  consistent st_ino and d_ino for directory listings.

  And there's a bug fix and a cleanup as well"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: filter trusted xattr for non-admin
  ovl: mark upper merge dir with type origin entries "impure"
  ovl: mark upper dir with type origin entries "impure"
  ovl: remove unused arg from ovl_lookup_temp()
  ovl: handle rename when upper doesn't support xattr
  ovl: don't fail copy-up if upper doesn't support xattr
  ovl: check on mount time if upper fs supports setting xattr
  ovl: fix creds leak in copy up error path
  ovl: select EXPORTFS

7 years agocfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
Hou Tao [Wed, 1 Mar 2017 01:02:33 +0000 (09:02 +0800)]
cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode

When adding a cfq_group into the cfq service tree, we use CFQ_IDLE_DELAY
as the delay of cfq_group's vdisktime if there have been other cfq_groups
already.

When cfq is under iops mode, commit 9a7f38c42c2b ("cfq-iosched: Convert
from jiffies to nanoseconds") could result in a large iops delay and
lead to an abnormal io schedule delay for the added cfq_group. To fix
it, we just need to revert to the old CFQ_IDLE_DELAY value: HZ / 5
when iops mode is enabled.

Despite having the same value, the delay of a cfq_queue in idle class
and the delay of cfq_group are different things, so I define two new
macros for the delay of a cfq_group under time-slice mode and iops mode.

Fixes: 9a7f38c42c2b ("cfq-iosched: Convert from jiffies to nanoseconds")
Cc: <stable@vger.kernel.org> # 4.8+
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoxfs: use ->b_state to fix buffer I/O accounting release race
Brian Foster [Wed, 31 May 2017 15:22:52 +0000 (08:22 -0700)]
xfs: use ->b_state to fix buffer I/O accounting release race

We've had user reports of unmount hangs in xfs_wait_buftarg() that
analysis shows is due to btp->bt_io_count == -1. bt_io_count
represents the count of in-flight asynchronous buffers and thus
should always be >= 0. xfs_wait_buftarg() waits for this value to
stabilize to zero in order to ensure that all untracked (with
respect to the lru) buffers have completed I/O processing before
unmount proceeds to tear down in-core data structures.

The value of -1 implies an I/O accounting decrement race. Indeed,
the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele()
(where the buffer lock is no longer held) means that bp->b_flags can
be updated from an unsafe context. While a user-level reproducer is
currently not available, some intrusive hacks to run racing buffer
lookups/ioacct/releases from multiple threads was used to
successfully manufacture this problem.

Existing callers do not expect to acquire the buffer lock from
xfs_buf_rele(). Therefore, we can not safely update ->b_flags from
this context. It turns out that we already have separate buffer
state bits and associated serialization for dealing with buffer LRU
state in the form of ->b_state and ->b_lock. Therefore, replace the
_XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O
accounting wrappers appropriately and make sure they are used with
the correct locking. This ensures that buffer in-flight state can be
modified at buffer release time without racing with modifications
from a buffer lock holder.

Fixes: 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against unmount")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
7 years agodm: make flush bios explicitly sync
Jan Kara [Wed, 31 May 2017 07:44:32 +0000 (09:44 +0200)]
dm: make flush bios explicitly sync

Commit b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as
synchronous") removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions.

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

Fixes: b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous")
Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
7 years agoALSA: usb: Avoid VLA in mixer_us16x08.c
Takashi Iwai [Tue, 30 May 2017 21:24:45 +0000 (23:24 +0200)]
ALSA: usb: Avoid VLA in mixer_us16x08.c

This is another attempt to work around the VLA used in
mixer_us16x08.c.  Basically the temporary array is used individually
for two cases, and we can declare locally in each block, instead of
hackish max() usage.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 years agoALSA: usb: Fix a typo in Tascam US-16x08 mixer element
Takashi Iwai [Tue, 30 May 2017 21:21:07 +0000 (23:21 +0200)]
ALSA: usb: Fix a typo in Tascam US-16x08 mixer element

A mixer element created in a quirk for Tascam US-16x08 contains a
typo: it should be "EQ MidLow Q" instead of "EQ MidQLow Q".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195875
Fixes: d2bb390a2081 ("ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 years agoRevert "ALSA: usb-audio: purge needless variable length array"
Takashi Iwai [Tue, 30 May 2017 07:23:41 +0000 (09:23 +0200)]
Revert "ALSA: usb-audio: purge needless variable length array"

This reverts commit 89b593c30e83 ("ALSA: usb-audio: purge needless
variable length array").  The patch turned out to cause a severe
regression, triggering an Oops at snd_usb_ctl_msg().  It was overseen
that snd_usb_ctl_msg() writes back the response to the given buffer,
while the patch changed it to a read-only const buffer.  (One should
always double-check when an extra pointer cast is present...)

As a simple fix, just revert the affected commit.  It was merely a
cleanup.  Although it brings VLA again, it's clearer as a fix.  We'll
address the VLA later in another patch.

Fixes: 89b593c30e83 ("ALSA: usb-audio: purge needless variable length array")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195875
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 years ago"Yes, people use FOLL_FORCE ;)"
Linus Torvalds [Tue, 30 May 2017 19:38:59 +0000 (12:38 -0700)]
"Yes, people use FOLL_FORCE ;)"

This effectively reverts commit 8ee74a91ac30 ("proc: try to remove use
of FOLL_FORCE entirely")

It turns out that people do depend on FOLL_FORCE for the /proc/<pid>/mem
case, and we're talking not just debuggers. Talking to the affected people, the use-cases are:

Keno Fischer:
 "We used these semantics as a hardening mechanism in the julia JIT. By
  opening /proc/self/mem and using these semantics, we could avoid
  needing RWX pages, or a dual mapping approach. We do have fallbacks to
  these other methods (though getting EIO here actually causes an assert
  in released versions - we'll updated that to make sure to take the
  fall back in that case).

  Nevertheless the /proc/self/mem approach was our favored approach
  because it a) Required an attacker to be able to execute syscalls
  which is a taller order than getting memory write and b) didn't double
  the virtual address space requirements (as a dual mapping approach
  would).

  I think in general this feature is very useful for anybody who needs
  to precisely control the execution of some other process. Various
  debuggers (gdb/lldb/rr) certainly fall into that category, but there's
  another class of such processes (wine, various emulators) which may
  want to do that kind of thing.

  Now, I suspect most of these will have the other process under ptrace
  control, so maybe allowing (same_mm || ptraced) would be ok, but at
  least for the sandbox/remote-jit use case, it would be perfectly
  reasonable to not have the jit server be a ptracer"

Robert O'Callahan:
 "We write to readonly code and data mappings via /proc/.../mem in lots
  of different situations, particularly when we're adjusting program
  state during replay to match the recorded execution.

  Like Julia, we can add workarounds, but they could be expensive."

so not only do people use FOLL_FORCE for both reads and writes, but they
use it for both the local mm and remote mm.

With these comments in mind, we likely also cannot add the "are we
actively ptracing" check either, so this keeps the new code organization
and does not do a real revert that would add back the original comment
about "Maybe we should limit FOLL_FORCE to actual ptrace users?"

Reported-by: Keno Fischer <keno@juliacomputing.com>
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoblk-mq: Take tagset lock when updating hw queues
Keith Busch [Tue, 30 May 2017 18:39:11 +0000 (14:39 -0400)]
blk-mq: Take tagset lock when updating hw queues

The tagset lock needs to be held when iterating the tag_list, so a
lockdep assert was added when updating number of hardware queues. The
drivers calling this API, however, were unaware of the new requirement,
so are failing the assertion.

This patch takes the lock within the blk-mq function so the drivers do
not have to be modified in order to be safe.

Fixes: 705cda97e ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Reported-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoKVM: SVM: ignore type when setting segment registers
Gioh Kim [Tue, 30 May 2017 13:24:45 +0000 (15:24 +0200)]
KVM: SVM: ignore type when setting segment registers

Commit 19bca6ab75d8 ("KVM: SVM: Fix cross vendor migration issue with
unusable bit") added checking type when setting unusable.
So unusable can be set if present is 0 OR type is 0.
According to the AMD processor manual, long mode ignores the type value
in segment descriptor. And type can be 0 if it is read-only data segment.
Therefore type value is not related to unusable flag.

This patch is based on linux-next v4.12.0-rc3.

Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoKVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging
Radim Krčmář [Fri, 19 May 2017 13:48:51 +0000 (15:48 +0200)]
KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging

kvm_skip_emulated_instruction() will return 0 if userspace is
single-stepping the guest.

kvm_skip_emulated_instruction() uses return status convention of exit
handler: 0 means "exit to userspace" and 1 means "continue vm entries".
The problem is that nested_vmx_check_vmptr() return status means
something else: 0 is ok, 1 is error.

This means we would continue executing after a failure.  Static checker
noticed it because vmptr was not initialized.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 6affcbedcac7 ("KVM: x86: Add kvm_skip_emulated_instruction and use it.")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: don't leak nbd_config
Ilya Dryomov [Tue, 23 May 2017 15:49:55 +0000 (17:49 +0200)]
nbd: don't leak nbd_config

nbd_config is allocated in nbd_alloc_config(), but never freed.

Fixes: 5ea8d10802ec ("nbd: separate out the config information")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agonbd: nbd_reset() call in nbd_dev_add() is redundant
Ilya Dryomov [Tue, 23 May 2017 15:49:54 +0000 (17:49 +0200)]
nbd: nbd_reset() call in nbd_dev_add() is redundant

There is nothing to clear -- nbd_device has just been allocated.
Fold nbd_reset() into its other caller, nbd_config_put().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agodrivers/perf: arm_pmu_acpi: avoid perf IRQ init when guest PMU is off
Wei Huang [Tue, 30 May 2017 10:56:22 +0000 (11:56 +0100)]
drivers/perf: arm_pmu_acpi: avoid perf IRQ init when guest PMU is off

We saw perf IRQ init failures when running Linux kernel in an ACPI
guest without PMU (i.e. pmu=off). This is because perf IRQ is not
present when pmu=off, but arm_pmu_acpi still tries to register
or unregister GSI. This patch addresses the problem by checking
gicc->performance_interrupt. If it is 0, which is the value set
by qemu when pmu=off, we skip the IRQ register/unregister process.

[    4.069470] bc00: 0000000000040b00 ffff0000089db190
[    4.070267] [<ffff000008134f80>] enable_percpu_irq+0xdc/0xe4
[    4.071192] [<ffff000008667cc4>] arm_perf_starting_cpu+0x108/0x10c
[    4.072200] [<ffff0000080cbdd4>] cpuhp_invoke_callback+0x14c/0x4ac
[    4.073210] [<ffff0000080ccd3c>] cpuhp_thread_fun+0xd4/0x11c
[    4.074132] [<ffff0000080f1394>] smpboot_thread_fn+0x1b4/0x1c4
[    4.075081] [<ffff0000080ec90c>] kthread+0x10c/0x138
[    4.075921] [<ffff0000080833c0>] ret_from_fork+0x10/0x50
[    4.076947] genirq: Setting trigger mode 4 for irq 43 failed
(gic_set_type+0x0/0x74)

Signed-off-by: Wei Huang <wei@redhat.com>
[will: add comment justifying deviation from ACPI spec, removed redundant hunk]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
7 years agorcar-dmac: fixup descriptor pointer for descriptor mode
Kuninori Morimoto [Tue, 23 May 2017 07:08:43 +0000 (07:08 +0000)]
rcar-dmac: fixup descriptor pointer for descriptor mode

In descriptor mode, the descriptor running pointer is not maintained
by the interrupt handler, thus, driver finds the running descriptor
from the descriptor pointer field in the CHCRB register.
But, CHCRB::DPTR indicates *next* descriptor pointer, not current.
Thus, The residue calculation will be missed. This patch fixup it.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
7 years agoMerge tag 'pinctrl-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Mon, 29 May 2017 17:05:19 +0000 (10:05 -0700)]
Merge tag 'pinctrl-v4.12-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Here is an overdue pull request for pin control fixes, the most
  prominent feature is to make Intel Chromebooks (and I suspect any
  other Cherryview-based Intel thing) happy again, which we really want
  to see.

  There is a patch hitting drivers/firmware/* that I was uncertain to
  who actually manages, but I got Andy Shevchenko's and Dmitry Torokov's
  review tags on it and I trust them both 100% to do the right thing for
  Intel platform drivers.

  Summary:

   - Make a few Intel Chromebooks with Cherryview DMI firmware work
     smoothly.

   - A fix for some bogus allocations in the generic group management
     code.

   - Some GPIO descriptor lookup table stubs. Merged through the pin
     control tree for administrative reasons.

   - Revert the "bi-directional" and "output-enable" generic properties:
     we need more discussions around this. It seems other SoCs are using
     input/output gate enablement and these terms are not correct.

   - Fix mux and drive strength atomically in the MXS driver.

   - Fix the SPDIF function on sunxi A83T.

   - OF table terminators and other small fixes"

* tag 'pinctrl-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sunxi: Fix SPDIF function name for A83T
  pinctrl: mxs: atomically switch mux and drive strength config
  pinctrl: cherryview: Extend the Chromebook DMI quirk to Intel_Strago systems
  firmware: dmi: Add DMI_PRODUCT_FAMILY identification string
  pinctrl: core: Fix warning by removing bogus code
  gpiolib: Add stubs for gpiod lookup table interface
  Revert "pinctrl: generic: Add bi-directional and output-enable"
  pinctrl: cherryview: Add terminate entry for dmi_system_id tables

7 years agokthread: fix boot hang (regression) on MIPS/OpenRISC
Vegard Nossum [Mon, 29 May 2017 07:22:07 +0000 (09:22 +0200)]
kthread: fix boot hang (regression) on MIPS/OpenRISC

This fixes a regression in commit 4d6501dce079 where I didn't notice
that MIPS and OpenRISC were reinitialising p->{set,clear}_child_tid to
NULL after our initialisation in copy_process().

We can simply get rid of the arch-specific initialisation here since it
is now always done in copy_process() before hitting copy_thread{,_tls}().

Review notes:

 - As far as I can tell, copy_process() is the only user of
   copy_thread_tls(), which is the only caller of copy_thread() for
   architectures that don't implement copy_thread_tls().

 - After this patch, there is no arch-specific code touching
   p->set_child_tid or p->clear_child_tid whatsoever.

 - It may look like MIPS/OpenRISC wanted to always have these fields be
   NULL, but that's not true, as copy_process() would unconditionally
   set them again _after_ calling copy_thread_tls() before commit
   4d6501dce079.

Fixes: 4d6501dce079c1eb6bf0b1d8f528a5e81770109e ("kthread: Fix use-after-free if kthread fork fails")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net> # MIPS only
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Jamie Iles <jamie.iles@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoovl: filter trusted xattr for non-admin
Miklos Szeredi [Mon, 29 May 2017 13:15:27 +0000 (15:15 +0200)]
ovl: filter trusted xattr for non-admin

Filesystems filter out extended attributes in the "trusted." domain for
unprivlieged callers.

Overlay calls underlying filesystem's method with elevated privs, so need
to do the filtering in overlayfs too.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
7 years agoHID: i2c: Call acpi_device_fix_up_power for ACPI-enumerated devices
Hans de Goede [Tue, 9 May 2017 08:04:36 +0000 (10:04 +0200)]
HID: i2c: Call acpi_device_fix_up_power for ACPI-enumerated devices

For ACPI devices which do not have a _PSC method, the ACPI subsys cannot
query their initial state at boot, so these devices are assumed to have
been put in D0 by the BIOS, but for touchscreens that is not always true.

This commit adds a call to acpi_device_fix_up_power to explicitly put
devices without a _PSC method into D0 state (for devices with a _PSC
method it is a nop). Note we only need to do this on probe, after a
resume the ACPI subsys knows the device is in D3 and will properly
put it in D0.

This fixes the SIS0817 i2c-hid touchscreen on a Peaq C1010 2-in-1
device failing to probe with a "hid_descr_cmd failed" error.

Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
7 years agodrm/i915: Detect USB-C specific dongles before reducing M and N
Jani Nikula [Thu, 18 May 2017 11:10:25 +0000 (14:10 +0300)]
drm/i915: Detect USB-C specific dongles before reducing M and N

The Analogix 7737 DP to HDMI converter requires reduced M and N values
when to operate correctly at HBR2. We tried to reduce the M/N values for
all devices in commit 9a86cda07af2 ("drm/i915/dp: reduce link M/N
parameters"), but that regressed some other sinks. Detect this IC by its
OUI value of 0x0022B9 via the DPCD quirk list, and only reduce the M/N
values for that.

v2 by Jani: Rebased on the DP quirk database

v3 by Jani: Rebased on the reworked DP quirk database

v4 by Jani: Improve commit message (Daniel)

Fixes: 9a86cda07af2 ("drm/i915/dp: reduce link M/N parameters")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93578
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100755
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/2d2e30f8f47d3f28c9b74ca2612336a54585c3ec.1495105635.git.jani.nikula@intel.com
7 years agodrm/dp: start a DPCD based DP sink/branch device quirk database
Jani Nikula [Thu, 18 May 2017 11:10:24 +0000 (14:10 +0300)]
drm/dp: start a DPCD based DP sink/branch device quirk database

Face the fact, there are Display Port sink and branch devices out there
in the wild that don't follow the Display Port specifications, or they
have bugs, or just otherwise require special treatment. Start a common
quirk database the drivers can query based on the DP device
identification. At least for now, we leave the workarounds for the
drivers to implement as they see fit.

For starters, add a branch device that can't handle full 24-bit main
link Mdiv and Ndiv main link attributes properly. Naturally, the
workaround of reducing main link attributes for all devices ended up in
regressions for other devices. So here we are.

v2: Rebase on DRM DP desc read helpers

v3: Fix the OUI memcmp blunder (Clint)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Clint Taylor <clinton.a.taylor@intel.com>
Cc: Adam Jackson <ajax@redhat.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Tested-by: Clinton Taylor <clinton.a.taylor@intel.com>
Reviewed-by: Clinton Taylor <clinton.a.taylor@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> # v2
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/91ec198dd95258dbf3bee2f6be739e0da73b4fdd.1495105635.git.jani.nikula@intel.com
7 years agodrm/i915: use drm DP helper to read DPCD desc
Jani Nikula [Thu, 18 May 2017 11:10:23 +0000 (14:10 +0300)]
drm/i915: use drm DP helper to read DPCD desc

Switch to using the common DP helpers instead of using our own.

v2: also remove leftover struct intel_dp_desc (Daniel)

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
7 years agodrm/dp: add helper for reading DP sink/branch device desc from DPCD
Jani Nikula [Thu, 18 May 2017 11:10:22 +0000 (14:10 +0300)]
drm/dp: add helper for reading DP sink/branch device desc from DPCD

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/acba54da7d80eafea9e59a893e27e3c31028c0ba.1495105635.git.jani.nikula@intel.com
7 years agoovl: mark upper merge dir with type origin entries "impure"
Amir Goldstein [Wed, 24 May 2017 12:29:33 +0000 (15:29 +0300)]
ovl: mark upper merge dir with type origin entries "impure"

An upper dir is marked "impure" to let ovl_iterate() know that this
directory may contain non pure upper entries whose d_ino may need to be
read from the origin inode.

We already mark a non-merge dir "impure" when moving a non-pure child
entry inside it, to let ovl_iterate() know not to iterate the non-merge
dir directly.

Mark also a merge dir "impure" when moving a non-pure child entry inside
it and when copying up a child entry inside it.

This can be used to optimize ovl_iterate() to perform a "pure merge" of
upper and lower directories, merging the content of the directories,
without having to read d_ino from origin inodes.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
7 years agorbd: implement REQ_OP_WRITE_ZEROES
Ilya Dryomov [Mon, 22 May 2017 17:59:24 +0000 (19:59 +0200)]
rbd: implement REQ_OP_WRITE_ZEROES

Commit 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
following commit 48920ff2a5a9 ("block: remove the discard_zeroes_data
flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.

rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
release either some or all blocks depending on whether the zeroing
request is rbd_obj_bytes() aligned.  This is how we currently implement
discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
now.  Caveats:

- REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
  current implementations - nvme and loop

- there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
  is hence less helpful than blk_bio_discard_split(), but this can (and
  should) be fixed on the rbd side

In the future we will split these into two code paths to respect
REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
released on discard.

Fixes: 93c1defedcae ("rbd: remove the discard_zeroes_data flag")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 years agox86/debug/32: Convert a smp_processor_id() call to raw to avoid DEBUG_PREEMPT warning
Borislav Petkov [Sun, 28 May 2017 09:03:42 +0000 (11:03 +0200)]
x86/debug/32: Convert a smp_processor_id() call to raw to avoid DEBUG_PREEMPT warning

... to raw_smp_processor_id() to not trip the

  BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1

check. The reasoning behind it is that __warn() already uses the raw_
variants but the show_regs() path on 32-bit doesn't.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170528092212.fiod7kygpjm23m3o@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agox86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix preemptibility bug
Borislav Petkov [Sun, 28 May 2017 20:04:14 +0000 (22:04 +0200)]
x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix preemptibility bug

With CONFIG_DEBUG_PREEMPT enabled, I get:

  BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
  caller is debug_smp_processor_id
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc2+ #2
  Call Trace:
   dump_stack
   check_preemption_disabled
   debug_smp_processor_id
   save_microcode_in_initrd_amd
   ? microcode_init
   save_microcode_in_initrd
   ...

because, well, it says it above, we're using smp_processor_id() in
preemptible code.

But passing the CPU number is not really needed. It is only used to
determine whether we're on the BSP, and, if so, to save the microcode
patch for early loading.

 [ We don't absolutely need to do it on the BSP but we do that
   customarily there. ]

Instead, convert that function parameter to a boolean which denotes
whether the patch should be saved or not, thereby avoiding the use of
smp_processor_id() in preemptible code.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170528200414.31305-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agodrm/exynos: clean up description of exynos_drm_crtc
Inki Dae [Mon, 29 May 2017 00:59:05 +0000 (09:59 +0900)]
drm/exynos: clean up description of exynos_drm_crtc

This patch removes unnecessary descriptions on
exynos_drm_crtc structure and adds one description
which specifies what pipe_clk member does.

pipe_clk support had been added by below patch without any description,
 drm/exynos: add support for pipeline clock to the framework
Commit-id : f26b9343f582f44ec920474d71b4b2220b1ed9a8

Signed-off-by: Inki Dae <inki.dae@samsung.com>
7 years agodrm/exynos: dsi: Remove bridge node reference in removal
Hoegeun Kwon [Fri, 26 May 2017 01:02:01 +0000 (10:02 +0900)]
drm/exynos: dsi: Remove bridge node reference in removal

Since bridge node is referenced during in the probe, it should be
released on removal.

Suggested-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
7 years agodrm/exynos: dsi: Fix the parse_dt function
Hoegeun Kwon [Thu, 13 Apr 2017 06:05:26 +0000 (15:05 +0900)]
drm/exynos: dsi: Fix the parse_dt function

The dsi + panel is a parental relationship, so OF grpah is not needed.
Therefore, the current dsi_parse_dt function will throw an error,
because there is no linked OF graph for the case fimd + dsi + panel.

Parse the Pll burst and esc clock frequency properties in dsi_parse_dt()
and create a bridge_node only if there is an OF graph associated with dsi.

Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Andi Shyti <andi.shyti@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
7 years agodrm/exynos: Merge pre/postclose hooks
Daniel Vetter [Wed, 8 Mar 2017 14:12:53 +0000 (15:12 +0100)]
drm/exynos: Merge pre/postclose hooks

Again no apparent explanation for the split except hysterical raisins.

Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
7 years agoLinux 4.12-rc3
Linus Torvalds [Mon, 29 May 2017 00:20:53 +0000 (17:20 -0700)]
Linux 4.12-rc3

7 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux...
Linus Torvalds [Sun, 28 May 2017 23:18:27 +0000 (16:18 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal

Pull thermal SoC management fixes from Eduardo Valentin:

 - fixes to TI SoC driver, Broadcom, qoriq

 - small sparse warning fix on thermal core

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: broadcom: ns-thermal: default on iProc SoCs
  ti-soc-thermal: Fix a typo in a comment line
  ti-soc-thermal: Delete error messages for failed memory allocations in ti_bandgap_build()
  ti-soc-thermal: Use devm_kcalloc() in ti_bandgap_build()
  thermal: core: make thermal_emergency_poweroff static
  thermal: qoriq: remove useless call for of_thermal_get_trip_points()

7 years agomtk-vcodec: Use designated initializers
Kees Cook [Sat, 6 May 2017 08:10:06 +0000 (01:10 -0700)]
mtk-vcodec: Use designated initializers

The randstruct plugin requires designated initializers for structures
that are entirely function pointers.

Cc: Wu-Cheng Li <wuchengli@chromium.org>
Cc: Tiffany Lin <tiffany.lin@mediatek.com>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
7 years agodrm/amd/powerplay: Use designated initializers
Kees Cook [Sat, 6 May 2017 08:09:00 +0000 (01:09 -0700)]
drm/amd/powerplay: Use designated initializers

The randstruct plugin requires designated initializers for structures
that are entirely function pointers.

Cc: Christian König <christian.koenig@amd.com>
Cc: Eric Huang <JinHuiEric.Huang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
7 years agodrm/amdgpu: Use designated initializers
Kees Cook [Sat, 6 May 2017 07:54:07 +0000 (00:54 -0700)]
drm/amdgpu: Use designated initializers

The randstruct plugin requires structures that are entirely function
pointers be initialized using designated initializers.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
7 years agosgi-xp: Use designated initializers
Kees Cook [Wed, 5 Apr 2017 05:07:10 +0000 (22:07 -0700)]
sgi-xp: Use designated initializers

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. In this case, no initializers
are needed (they can be NULL initialized and callers adjusted to check
for NULL, which is more efficient than an indirect call).

Cc: Robin Holt <robinmholt@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
7 years agoocfs2: Use ERR_CAST() to avoid cross-structure cast
Kees Cook [Mon, 8 May 2017 21:49:27 +0000 (14:49 -0700)]
ocfs2: Use ERR_CAST() to avoid cross-structure cast

When trying to propagate an error result, the error return path attempts
to retain the error, but does this with an open cast across very different
types, which the upcoming structure layout randomization plugin flags as
being potentially dangerous in the face of randomization. This is a false
positive, but what this code actually wants to do is use ERR_CAST() to
retain the error value.

Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
7 years agontfs: Use ERR_CAST() to avoid cross-structure cast
Kees Cook [Mon, 8 May 2017 21:45:26 +0000 (14:45 -0700)]
ntfs: Use ERR_CAST() to avoid cross-structure cast

When trying to propagate an error result, the error return path attempts
to retain the error, but does this with an open cast across very different
types, which the upcoming structure layout randomization plugin flags as
being potentially dangerous in the face of randomization. This is a false
positive, but what this code actually wants to do is use ERR_CAST() to
retain the error value.

Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
7 years agoNFS: Use ERR_CAST() to avoid cross-structure cast
Kees Cook [Wed, 5 Apr 2017 00:08:42 +0000 (17:08 -0700)]
NFS: Use ERR_CAST() to avoid cross-structure cast

When the call to nfs_devname() fails, the error path attempts to retain
the error via the mnt variable, but this requires a cast across very
different types (char * to struct vfsmount *), which the upcoming
structure layout randomization plugin flags as being potentially
dangerous in the face of randomization. This is a false positive, but
what this code actually wants to do is retain the error value, so this
patch explicitly sets it, instead of using what seems to be an
unexpected cast.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 years agoefi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot
Dave Young [Fri, 26 May 2017 11:36:51 +0000 (12:36 +0100)]
efi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot

Sabrina Dubroca reported an early panic:

  BUG: unable to handle kernel paging request at ffffffffff240001
  IP: efi_bgrt_init+0xdc/0x134

  [...]

  ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

... which was introduced by:

  7b0a911478c7 ("efi/x86: Move the EFI BGRT init code to early init code")

The cause is that on this machine the firmware provides the EFI ACPI BGRT
table even on legacy non-EFI bootups - which table should be EFI only.

The garbage BGRT data causes the efi_bgrt_init() panic.

Add a check to skip efi_bgrt_init() in case non-EFI bootup to work around
this firmware bug.

Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: <stable@vger.kernel.org> # v4.11+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 7b0a911478c7 ("efi/x86: Move the EFI BGRT init code to early init code")
Link: http://lkml.kernel.org/r/20170526113652.21339-6-matt@codeblueprint.co.uk
[ Rewrote the changelog to be more readable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agox86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled
Baoquan He [Fri, 26 May 2017 11:36:50 +0000 (12:36 +0100)]
x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled

For EFI with the 'efi=old_map' kernel option specified, the kernel will panic
when KASLR is enabled:

  BUG: unable to handle kernel paging request at 000000007febd57e
  IP: 0x7febd57e
  PGD 1025a067
  PUD 0

  Oops: 0010 [#1] SMP
  Call Trace:
   efi_enter_virtual_mode()
   start_kernel()
   x86_64_start_reservations()
   x86_64_start_kernel()
   start_cpu()

The root cause is that the identity mapping is not built correctly
in the 'efi=old_map' case.

On 'nokaslr' kernels, PAGE_OFFSET is 0xffff880000000000 which is PGDIR_SIZE
aligned. We can borrow the PUD table from the direct mappings safely. Given a
physical address X, we have pud_index(X) == pud_index(__va(X)).

However, on KASLR kernels, PAGE_OFFSET is PUD_SIZE aligned. For a given physical
address X, pud_index(X) != pud_index(__va(X)). We can't just copy the PGD entry
from direct mapping to build identity mapping, instead we need to copy the
PUD entries one by one from the direct mapping.

Fix it.

Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Frank Ramsay <frank.ramsay@hpe.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170526113652.21339-5-matt@codeblueprint.co.uk
[ Fixed and reworded the changelog and code comments to be more readable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agox86/efi: Disable runtime services on kexec kernel if booted with efi=old_map
Sai Praneeth [Fri, 26 May 2017 11:36:49 +0000 (12:36 +0100)]
x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map

Booting kexec kernel with "efi=old_map" in kernel command line hits
kernel panic as shown below.

 BUG: unable to handle kernel paging request at ffff88007fe78070
 IP: virt_efi_set_variable.part.7+0x63/0x1b0
 PGD 7ea28067
 PUD 7ea2b067
 PMD 7ea2d067
 PTE 0
 [...]
 Call Trace:
  virt_efi_set_variable()
  efi_delete_dummy_variable()
  efi_enter_virtual_mode()
  start_kernel()
  x86_64_start_reservations()
  x86_64_start_kernel()
  start_cpu()

[ efi=old_map was never intended to work with kexec. The problem with
  using efi=old_map is that the virtual addresses are assigned from the
  memory region used by other kernel mappings; vmalloc() space.
  Potentially there could be collisions when booting kexec if something
  else is mapped at the virtual address we allocated for runtime service
  regions in the initial boot - Matt Fleming ]

Since kexec was never intended to work with efi=old_map, disable
runtime services in kexec if booted with efi=old_map, so that we don't
panic.

Tested-by: Lee Chun-Yi <jlee@suse.com>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170526113652.21339-4-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoefi: Remove duplicate 'const' specifiers
Arnd Bergmann [Fri, 26 May 2017 11:36:48 +0000 (12:36 +0100)]
efi: Remove duplicate 'const' specifiers

gcc-7 shows these harmless warnings:

  drivers/firmware/efi/libstub/secureboot.c:19:27: error: duplicate 'const' declaration specifier [-Werror=duplicate-decl-specifier]
   static const efi_char16_t const efi_SecureBoot_name[] = {
  drivers/firmware/efi/libstub/secureboot.c:22:27: error: duplicate 'const' declaration specifier [-Werror=duplicate-decl-specifier]

Removing one of the specifiers gives us the expected behavior.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: de8cb458625c ("efi: Get and store the secure boot status")
Link: http://lkml.kernel.org/r/20170526113652.21339-3-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoefi: Don't issue error message when booted under Xen
Juergen Gross [Fri, 26 May 2017 11:36:47 +0000 (12:36 +0100)]
efi: Don't issue error message when booted under Xen

When booted as Xen dom0 there won't be an EFI memmap allocated. Avoid
issuing an error message in this case:

  [    0.144079] efi: Failed to allocate new EFI memmap

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170526113652.21339-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agodrm/msm: Fix the check for the command size
Jordan Crouse [Mon, 8 May 2017 20:34:58 +0000 (14:34 -0600)]
drm/msm: Fix the check for the command size

The overrun check for the size of submitted commands is off by one.
It should allow the offset plus the size to be equal to the
size of the memory object when the command stream is very tightly
constructed.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agodrm/msm: Take the mutex before calling msm_gem_new_impl
Jordan Crouse [Mon, 8 May 2017 20:34:57 +0000 (14:34 -0600)]
drm/msm: Take the mutex before calling msm_gem_new_impl

Amongst its other duties, msm_gem_new_impl adds the newly created
GEM object to the shared inactive list which may also be actively
modifiying the list during submission.  All the paths to modify
the list are protected by the mutex except for the one through
msm_gem_import which can end up causing list corruption.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[add extra WARN_ON(!mutex_is_locked(&dev->struct_mutex))]
Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agodrm/msm: for array in-fences, check if all backing fences are from our own context...
Philipp Zabel [Fri, 17 Mar 2017 18:38:40 +0000 (19:38 +0100)]
drm/msm: for array in-fences, check if all backing fences are from our own context before waiting

Use the dma_fence_match_context helper to check if all backing fences
are from our own context, in which case we don't have to wait.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Gustavo Padovan <gustavo.padovan@collabora.com>
[rebased on code-motion]
Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agodrm/msm: constify irq_domain_ops
Tobias Klauser [Wed, 24 May 2017 16:12:19 +0000 (18:12 +0200)]
drm/msm: constify irq_domain_ops

struct irq_domain_ops is not modified, so it can be made const.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Rob Clark <robdclark@gmail.com>