Omar Sandoval [Thu, 10 May 2018 00:29:24 +0000 (17:29 -0700)]
sbitmap: warn if using smaller shallow depth than was setup
Make sure the user passed the right value to
sbitmap_queue_min_shallow_depth().
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 May 2018 19:55:14 +0000 (13:55 -0600)]
kyber-iosched: update shallow depth when setting up hardware queue
We don't expect the async depth to be smaller than the wake batch
count for sbitmap, but just in case, inform sbitmap of what shallow
depth kyber may use.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 May 2018 21:26:55 +0000 (15:26 -0600)]
bfq-iosched: update shallow depth to smallest one used
If our shallow depth is smaller than the wake batching of sbitmap,
we can introduce hangs. Ensure that sbitmap knows how low we'll go.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Omar Sandoval [Thu, 10 May 2018 00:16:31 +0000 (17:16 -0700)]
sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow()
The sbitmap queue wake batch is calculated such that once allocations
start blocking, all of the bits which are already allocated must be
enough to fulfill the batch counters of all of the waitqueues. However,
the shallow allocation depth can break this invariant, since we block
before our full depth is being utilized. Add
sbitmap_queue_min_shallow_depth(), which saves the minimum shallow depth
the sbq will use, and update sbq_calc_wake_batch() to take it into
account.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 May 2018 21:25:22 +0000 (15:25 -0600)]
bfq-iosched: remove unused variable
bfqd->sb_shift was attempted used as a cache for the sbitmap queue
shift, but we don't need it, as it never changes. Kill it with fire.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 May 2018 19:27:21 +0000 (13:27 -0600)]
bfq: calculate shallow depths at init time
It doesn't change, so don't put it in the per-IO hot path.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 May 2018 19:12:10 +0000 (13:12 -0600)]
bfq-iosched: don't worry about reserved tags in limit_depth
Reserved tags are used for error handling, we don't need to
care about them for regular IO. The core won't call us for these
anyway.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 May 2018 19:28:50 +0000 (13:28 -0600)]
blk-mq: don't call into depth limiting for reserved tags
It's not useful, they are internal and/or error handling recovery
commands.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Fri, 4 May 2018 17:17:01 +0000 (19:17 +0200)]
block, bfq: postpone rq preparation to insert or merge
When invoked for an I/O request rq, the prepare_request hook of bfq
increments reference counters in the destination bfq_queue for rq. In
this respect, after this hook has been invoked, rq may still be
transformed into a request with no icq attached, i.e., for bfq, a
request not associated with any bfq_queue. No further hook is invoked
to signal this tranformation to bfq (in general, to the destination
elevator for rq). This leads bfq into an inconsistent state, because
bfq has no chance to correctly lower these counters back. This
inconsistency may in its turn cause incorrect scheduling and hangs. It
certainly causes memory leaks, by making it impossible for bfq to free
the involved bfq_queue.
On the bright side, no transformation can still happen for rq after rq
has been inserted into bfq, or merged with another, already inserted,
request. Exploiting this fact, this commit addresses the above issue
by delaying the preparation of an I/O request to when the request is
inserted or merged.
This change also gives a performance bonus: a lock-contention point
gets removed. To prepare a request, bfq needs to hold its scheduler
lock. After postponing request preparation to insertion or merging, no
lock needs to be grabbed any longer in the prepare_request hook, while
the lock already taken to perform insertion or merging is used to
preparare the request as well.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christophe JAILLET [Thu, 10 May 2018 07:27:31 +0000 (09:27 +0200)]
mtip32xx: Fix an error handling path in 'mtip_pci_probe()'
Branch to the right label in the error handling path in order to keep it
logical.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
SeongJae Park [Thu, 3 May 2018 09:53:26 +0000 (18:53 +0900)]
brd: Mark as non-rotational
This commit sets QUEUE_FLAG_NONROT and clears up QUEUE_FLAG_ADD_RANDOM
to mark the ramdisks as non-rotational device.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Omar Sandoval [Wed, 9 May 2018 09:08:53 +0000 (02:08 -0700)]
block: consolidate struct request timestamp fields
Currently, struct request has four timestamp fields:
- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq
These can all be consolidated into one start time and one I/O start
time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
request depending on the kernel config.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Omar Sandoval [Wed, 9 May 2018 09:08:52 +0000 (02:08 -0700)]
block: move blk_stat_add() to __blk_mq_end_request()
We want this next to blk_account_io_done() for the next change so that
we can call ktime_get() only once for both.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Omar Sandoval [Wed, 9 May 2018 09:08:51 +0000 (02:08 -0700)]
block: use ktime_get_ns() instead of sched_clock() for cfq and bfq
cfq and bfq have some internal fields that use sched_clock() which can
trivially use ktime_get_ns() instead. Their timestamp fields in struct
request can also use ktime_get_ns(), which resolves the 8 year old
comment added by commit
28f4197e5d47 ("block: disable preemption before
using sched_clock()").
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Omar Sandoval [Wed, 9 May 2018 09:08:50 +0000 (02:08 -0700)]
block: get rid of struct blk_issue_stat
struct blk_issue_stat squashes three things into one u64:
- The time the driver started working on a request
- The original size of the request (for the io.low controller)
- Flags for writeback throttling
It turns out that on x86_64, we have a 4 byte hole in struct request
which we can fill with the non-timestamp fields from blk_issue_stat,
simplifying things quite a bit.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Omar Sandoval [Wed, 9 May 2018 09:08:49 +0000 (02:08 -0700)]
block: replace bio->bi_issue_stat with bio-specific type
struct blk_issue_stat is going away, and bio->bi_issue_stat doesn't even
use the blk-stats interface, so we can provide a separate implementation
specific for bios. The helpers work the same way as the blk-stats
helpers.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Omar Sandoval [Wed, 9 May 2018 09:08:48 +0000 (02:08 -0700)]
block: pass struct request instead of struct blk_issue_stat to wbt
issue_stat is going to go away, so first make writeback throttling take
the containing request, update the internal wbt helpers accordingly, and
change rwb->sync_cookie to be the request pointer instead of the
issue_stat pointer. No functional change.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Omar Sandoval [Wed, 9 May 2018 09:08:47 +0000 (02:08 -0700)]
block: move some wbt helpers to blk-wbt.c
A few helpers are only used from blk-wbt.c, so move them there, and put
wbt_track() behind the CONFIG_BLK_WBT typedef. This is in preparation
for changing how the wbt flags are tracked.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 7 May 2018 16:03:23 +0000 (10:03 -0600)]
blk-wbt: throttle discards like background writes
Throttle discards like we would any background write. Discards should
be background activity, so if they are impacting foreground IO, then
we will throttle them down.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 7 May 2018 15:57:08 +0000 (09:57 -0600)]
blk-wbt: pass in enum wbt_flags to get_rq_wait()
This is in preparation for having more write queues, in which
case we would have needed to pass in more information than just
a simple 'is_kswapd' boolean.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 3 May 2018 15:14:57 +0000 (09:14 -0600)]
blk-wbt: account any writing command as a write
We currently special case WRITE and FLUSH, but we should really
just include any command with the write bit set. This ensures
that we account DISCARD.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 May 2018 21:09:41 +0000 (15:09 -0600)]
block: break discard submissions into the user defined size
Don't build discards bigger than what the user asked for, if the
user decided to limit the size by writing to 'discard_max_bytes'.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tetsuo Handa [Fri, 4 May 2018 16:58:09 +0000 (10:58 -0600)]
loop: remember whether sysfs_create_group() was done
syzbot is hitting WARN() triggered by memory allocation fault
injection [1] because loop module is calling sysfs_remove_group()
when sysfs_create_group() failed.
Fix this by remembering whether sysfs_create_group() succeeded.
[1] https://syzkaller.appspot.com/bug?id=
3f86c0edf75c86d2633aeb9dd69eccc70bc7e90b
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+9f03168400f56df89dbc6f1751f4458fe739ff29@syzkaller.appspotmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Renamed sysfs_ready -> sysfs_inited.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Thomas Gleixner [Fri, 4 May 2018 14:32:47 +0000 (16:32 +0200)]
block: Shorten interrupt disabled regions
Commit
9c40cef2b799 ("sched: Move blk_schedule_flush_plug() out of
__schedule()") moved the blk_schedule_flush_plug() call out of the
interrupt/preempt disabled region in the scheduler. This allows to replace
local_irq_save/restore(flags) by local_irq_disable/enable() in
blk_flush_plug_list().
But it makes more sense to disable interrupts explicitly when the request
queue is locked end reenable them when the request to is unlocked. This
shortens the interrupt disabled section which is important when the plug
list contains requests for more than one queue. The comment which claims
that disabling interrupts around the loop is misleading as the called
functions can reenable interrupts unconditionally anyway and obfuscates the
scope badly:
local_irq_save(flags);
spin_lock(q->queue_lock);
...
queue_unplugged(q...);
scsi_request_fn();
spin_unlock_irq(q->queue_lock);
-------------------^^^ ????
spin_lock_irq(q->queue_lock);
spin_unlock(q->queue_lock);
local_irq_restore(flags);
Aside of that the detached interrupt disabling is a constant pain for
PREEMPT_RT as it requires patching and special casing when RT is enabled
while with the spin_*_irq() variants this happens automatically.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110622174919.025446432@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Anna-Maria Gleixner [Fri, 4 May 2018 14:32:46 +0000 (16:32 +0200)]
block: Remove redundant WARN_ON()
Commit
2fff8a924d4c ("block: Check locking assumptions at runtime") added a
lockdep_assert_held(q->queue_lock) which makes the WARN_ON() redundant
because lockdep will detect and warn about context violations.
The unconditional WARN_ON() does not provide real additional value, so it
can be removed.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sebastian Andrzej Siewior [Fri, 4 May 2018 14:32:45 +0000 (16:32 +0200)]
block: don't disable interrupts during kmap_atomic()
bounce_copy_vec() disables interrupts around kmap_atomic(). This is a
leftover from the old kmap_atomic() implementation which relied on fixed
mapping slots, so the caller had to make sure that the same slot could not
be reused from an interrupting context.
kmap_atomic() was changed to dynamic slots long ago and commit
1ec9c5ddc17a
("include/linux/highmem.h: remove the second argument of k[un]map_atomic()")
removed the slot assignements, but the callers were not checked for now
redundant interrupt disabling.
Remove the conditional interrupt disable.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Florian La Roche [Sun, 6 May 2018 17:34:07 +0000 (19:34 +0200)]
Fix typo in comment.
CONFIG_PRREMPT -> CONFIG_PREEMPT
Signed-off-by: Florian La Roche <Florian.LaRoche@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 7 May 2018 15:33:29 +0000 (05:33 -1000)]
Merge tag 'devicetree-fixes-for-4.17' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
- fix path to display timing binding
- fix some typos in interrupt-names and clock-names
- fix a resource leak on overlay removal
- add missing documentation for R8A77965 DMA, serial, and net
- cleanup sunxi pinctrl description
- add Kieback & Peter GmbH vendor prefix
* tag 'devicetree-fixes-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: panel: lvds: Fix path to display timing bindings
dt-bindings: mvebu-uart: DT fix s/interrupts-names/interrupt-names/
dt-bindings: meson-uart: DT fix s/clocks-names/clock-names/
of: overlay: Stop leaking resources on overlay removal
dtc: checks: drop warning for missing PCI bridge bus-range
dt-bindings: dmaengine: rcar-dmac: document R8A77965 support
dt-bindings: serial: sh-sci: Add support for r8a77965 (H)SCIF
dt-bindings: net: ravb: Add support for r8a77965 SoC
dt-bindings: pinctrl: sunxi: Fix reference to driver
doc: Add vendor prefix for Kieback & Peter GmbH
Linus Torvalds [Mon, 7 May 2018 02:57:38 +0000 (16:57 -1000)]
Linux 4.17-rc4
Linus Torvalds [Sun, 6 May 2018 15:46:29 +0000 (05:46 -1000)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pll KVM fixes from Radim Krčmář:
"ARM:
- Fix proxying of GICv2 CPU interface accesses
- Fix crash when switching to BE
- Track source vcpu git GICv2 SGIs
- Fix an outdated bit of documentation
x86:
- Speed up injection of expired timers (for stable)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: remove APIC Timer periodic/oneshot spikes
arm64: vgic-v2: Fix proxying of cpuif access
KVM: arm/arm64: vgic_init: Cleanup reference to process_maintenance
KVM: arm64: Fix order of vcpu_write_sys_reg() arguments
KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGI
Linus Torvalds [Sun, 6 May 2018 15:42:24 +0000 (05:42 -1000)]
Merge tag 'iommu-fixes-v4.17-rc4' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- fix a compile warning in the AMD IOMMU driver with irq remapping
disabled
- fix for VT-d interrupt remapping and invalidation size (caused a
BUG_ON when trying to invalidate more than 4GB)
- build fix and a regression fix for broken graphics with old DTS for
the rockchip iommu driver
- a revert in the PCI window reservation code which fixes a regression
with VFIO.
* tag 'iommu-fixes-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: rockchip: fix building without CONFIG_OF
iommu/vt-d: Use WARN_ON_ONCE instead of BUG_ON in qi_flush_dev_iotlb()
iommu/vt-d: fix shift-out-of-bounds in bug checking
iommu/dma: Move PCI window region reservation back into dma specific path.
iommu/rockchip: Make clock handling optional
iommu/amd: Hide unused iommu_table_lock
iommu/vt-d: Fix usage of force parameter in intel_ir_reconfigure_irte()
Linus Torvalds [Sun, 6 May 2018 15:37:24 +0000 (05:37 -1000)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"Unbreak the CPUID CPUID_8000_0008_EBX reload which got dropped when
the evaluation of physical and virtual bits which uses the same CPUID
leaf was moved out of get_cpu_cap()"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Restore CPUID_8000_0008_EBX reload
Linus Torvalds [Sun, 6 May 2018 15:35:23 +0000 (05:35 -1000)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull clocksource fixes from Thomas Gleixner:
"The recent addition of the early TSC clocksource breaks on machines
which have an unstable TSC because in case that TSC is disabled, then
the clocksource selection logic falls back to the early TSC which is
obviously bogus.
That also unearthed a few robustness issues in the clocksource
derating code which are addressed as well"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Rework stale comment
clocksource: Consistent de-rate when marking unstable
x86/tsc: Fix mark_tsc_unstable()
clocksource: Initialize cs->wd_list
clocksource: Allow clocksource_mark_unstable() on unregistered clocksources
x86/tsc: Always unregister clocksource_tsc_early
Linus Torvalds [Sun, 6 May 2018 15:34:06 +0000 (05:34 -1000)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix to prevent false positives in the spurious interrupt
detector when more than a single demultiplex register is evaluated in
the Qualcom irq combiner driver"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/qcom: Fix check for spurious interrupts
Linus Torvalds [Sun, 6 May 2018 03:30:58 +0000 (17:30 -1000)]
Merge tag 'platform-drivers-x86-v4.17-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Darren Hart:
- We missed a case in the Dell config dependencies resulting in a
possible bad configuration, resolve it by giving up on trying to keep
DELL_LAPTOP visible in the menu and make it depend on DELL_SMBIOS.
- Fix a null pointer dereference at module unload for the asus-wireless
driver.
* tag 'platform-drivers-x86-v4.17-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: Kconfig: Fix dell-laptop dependency chain.
platform/x86: asus-wireless: Fix NULL pointer dereference
Linus Torvalds [Sun, 6 May 2018 03:28:08 +0000 (17:28 -1000)]
Merge tag 'usb-4.17-rc4' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB driver fixes for 4.17-rc4.
The majority of them are some USB gadget fixes that missed my last
pull request. The "largest" patch in here is a fix for the old visor
driver that syzbot found 6 months or so ago and I finally remembered
to fix it.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "usb: host: ehci: Use dma_pool_zalloc()"
usb: typec: tps6598x: handle block reads separately with plain-I2C adapters
usb: typec: tcpm: Release the role mux when exiting
USB: Accept bulk endpoints with 1024-byte maxpacket
xhci: Fix use-after-free in xhci_free_virt_device
USB: serial: visor: handle potential invalid device configuration
USB: serial: option: adding support for ublox R410M
usb: musb: trace: fix NULL pointer dereference in musb_g_tx()
usb: musb: host: fix potential NULL pointer dereference
usb: gadget: composite Allow for larger configuration descriptors
usb: dwc3: gadget: Fix list_del corruption in dwc3_ep_dequeue
usb: dwc3: gadget: dwc3_gadget_del_and_unmap_request() can be static
usb: dwc2: pci: Fix error return code in dwc2_pci_probe()
usb: dwc2: WA for Full speed ISOC IN in DDMA mode.
usb: dwc2: dwc2_vbus_supply_init: fix error check
usb: gadget: f_phonet: fix pn_net_xmit()'s return type
Anthoine Bourgeois [Sun, 29 Apr 2018 22:05:58 +0000 (22:05 +0000)]
KVM: x86: remove APIC Timer periodic/oneshot spikes
Since the commit "
8003c9ae204e: add APIC Timer periodic/oneshot mode VMX
preemption timer support", a Windows 10 guest has some erratic timer
spikes.
Here the results on a 150000 times 1ms timer without any load:
Before
8003c9ae204e | After
8003c9ae204e
Max 1834us | 86000us
Mean 1100us | 1021us
Deviation 59us | 149us
Here the results on a 150000 times 1ms timer with a cpu-z stress test:
Before
8003c9ae204e | After
8003c9ae204e
Max 32000us | 140000us
Mean 1006us | 1997us
Deviation 140us | 11095us
The root cause of the problem is starting hrtimer with an expiry time
already in the past can take more than 20 milliseconds to trigger the
timer function. It can be solved by forward such past timers
immediately, rather than submitting them to hrtimer_start().
In case the timer is periodic, update the target expiration and call
hrtimer_start with it.
v2: Check if the tsc deadline is already expired. Thank you Mika.
v3: Execute the past timers immediately rather than submitting them to
hrtimer_start().
v4: Rearm the periodic timer with advance_periodic_target_expiration() a
simpler version of set_target_expiration(). Thank you Paolo.
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
8003c9ae204e ("KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Radim Krčmář [Sat, 5 May 2018 21:05:31 +0000 (23:05 +0200)]
Merge tag 'kvmarm-fixes-for-4.17-2' of git://git./linux/kernel/git/kvmarm/kvmarm
KVM/arm fixes for 4.17, take #2
- Fix proxying of GICv2 CPU interface accesses
- Fix crash when switching to BE
- Track source vcpu git GICv2 SGIs
- Fix an outdated bit of documentation
Linus Torvalds [Sat, 5 May 2018 07:15:25 +0000 (21:15 -1000)]
Merge tag 'kbuild-fixes-v4.17' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- remove state comment in modpost
- extend MAINTAINERS entry to cover modpost and more makefiles
- fix missed building of SANCOV gcc-plugin
- replace left-over 'bison' with $(YACC)
- display short log when generating parer of genksyms
* tag 'kbuild-fixes-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
genksyms: fix typo in parse.tab.{c,h} generation rules
kbuild: replace hardcoded bison in cmd_bison_h with $(YACC)
gcc-plugins: fix build condition of SANCOV plugin
MAINTAINERS: Update Kbuild entry with a few paths
modpost: delete stale comment
Linus Torvalds [Sat, 5 May 2018 07:12:06 +0000 (21:12 -1000)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes froom Stephen Boyd:
"A handful of fixes for the stm32mp1 clk driver came in during the
merge window for the driver that got merged in the merge window.
Plus a warning fix for unused PM ops and a couple fixes for the meson
clk driver clk names that went unnoticed with the regmap rework.
There's also another fix in here for the mux rounding flag which
wasn't doing what it said it did, but now it does"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: meson: meson8b: fix meson8b_cpu_clk parent clock name
clk: meson: meson8b: fix meson8b_fclk_div3_div clock name
clk: meson: drop meson_aoclk_gate_regmap_ops
clk: meson: honor CLK_MUX_ROUND_CLOSEST in clk_regmap
clk: honor CLK_MUX_ROUND_CLOSEST in generic clk mux
clk: cs2000: mark resume function as __maybe_unused
clk: stm32mp1: remove ck_apb_dbg clock
clk: stm32mp1: set stgen_k clock as critical
clk: stm32mp1: add missing tzc2 clock
clk: stm32mp1: fix SAI3 & SAI4 clocks
clk: stm32mp1: remove unused dfsdm_src[] const
clk: stm32mp1: add missing static
Linus Torvalds [Sat, 5 May 2018 07:07:43 +0000 (21:07 -1000)]
Merge tag 'rproc-v4.17-1' of git://github.com/andersson/remoteproc
Pull remoteproc and rpmsg fixes from Bjorn Andersson:
- fix screw-up when reversing boolean for rproc_stop()
- add missing OF node refcounting dereferences
- add missing MODULE_ALIAS in rpmsg_char
* tag 'rproc-v4.17-1' of git://github.com/andersson/remoteproc:
rpmsg: added MODULE_ALIAS for rpmsg_char
remoteproc: qcom: Fix potential device node leaks
remoteproc: fix crashed parameter logic on stop call
Linus Torvalds [Sat, 5 May 2018 07:05:12 +0000 (21:05 -1000)]
Merge tag 'drm-fixes-for-v4.17-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"vmwgfx, i915, vc4, vga dac fixes.
This seems eerily quiet, so I expect it will explode next week or
something.
One i915 model firmware, two vmwgfx fixes, one vc4 fix and one bridge
leak fix"
* tag 'drm-fixes-for-v4.17-rc4' of git://people.freedesktop.org/~airlied/linux:
drm/bridge: vga-dac: Fix edid memory leak
drm/vc4: Make sure vc4_bo_{inc,dec}_usecnt() calls are balanced
drm/i915/glk: Add MODULE_FIRMWARE for Geminilake
drm/vmwgfx: Fix a buffer object leak
drm/vmwgfx: Clean up fbdev modeset locking
Linus Torvalds [Sat, 5 May 2018 06:57:28 +0000 (20:57 -1000)]
Merge tag 'trace-v4.17-rc1-3' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Some of the files in the tracing directory show file mode 0444 when
they are writable by root. To fix the confusion, they should be 0644.
Note, either case root can still write to them.
Zhengyuan asked why I never applied that patch (the first one is from
2014!). I simply forgot about it. /me lowers head in shame"
* tag 'trace-v4.17-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix the file mode of stack tracer
ftrace: Have set_graph_* files have normal file modes
Linus Torvalds [Sat, 5 May 2018 06:51:10 +0000 (20:51 -1000)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"This is our first pull request of the rc cycle. It's not that it's
been overly quiet, we were just waiting on a few things before sending
this off.
For instance, the 6 patch series from Intel for the hfi1 driver had
actually been pulled in on Tuesday for a Wednesday pull request, only
to have Jason notice something I missed, so we held off for some
testing, and then on Thursday had to respin the series because the
very first patch needed a minor fix (unnecessary cast is all).
There is a sizable hns patch series in here, as well as a reasonably
largish hfi1 patch series, then all of the lines of uapi updates are
just the change to the new official Linux-OpenIB SPDX tag (a bunch of
our files had what amounts to a BSD-2-Clause + MIT Warranty statement
as their license as a result of the initial code submission years ago,
and the SPDX folks decided it was unique enough to warrant a unique
tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4
and core/cache/cma updates to round out the bunch.
None of it was overly large by itself, but in the 2 1/2 weeks we've
been collecting patches, it has added up :-/.
As best I can tell, it's been through 0day (I got a notice about my
last for-next push, but not for my for-rc push, but Jason seems to
think that failure messages are prioritized and success messages not
so much). It's also been through linux-next. And yes, we did notice in
the context portion of the CMA query gid fix patch that there is a
dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage
and remove it anywhere we can.
Summary:
- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)
- SPDX license tag cleanups (new tag Linux-OpenIB)
- RoCE GID fixes related to default GIDs
- Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
mlx4, mlx5, and hfi1 (medium batch)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
RDMA/cma: Do not query GID during QP state transition to RTR
IB/mlx4: Fix integer overflow when calculating optimal MTT size
IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
IB/hfi1: Fix loss of BECN with AHG
IB/hfi1 Use correct type for num_user_context
IB/hfi1: Fix handling of FECN marked multicast packet
IB/core: Make ib_mad_client_id atomic
iw_cxgb4: Atomically flush per QP HW CQEs
IB/uverbs: Fix kernel crash during MR deregistration flow
IB/uverbs: Prevent reregistration of DM_MR to regular MR
RDMA/mlx4: Add missed RSS hash inner header flag
RDMA/hns: Fix a couple misspellings
RDMA/hns: Submit bad wr
RDMA/hns: Update assignment method for owner field of send wqe
RDMA/hns: Adjust the order of cleanup hem table
RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
RDMA/hns: Remove some unnecessary attr_mask judgement
RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
...
Linus Torvalds [Sat, 5 May 2018 06:41:44 +0000 (20:41 -1000)]
Merge tag 'for-linus-
20180504' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes that should to into this release. This contains:
- Set of bcache fixes from Coly, fixing regression in patches that
went into this series.
- Set of NVMe fixes by way of Keith.
- Set of bdi related fixes, one from Jan and two from Tetsuo Handa,
fixing various issues around device addition/removal.
- Two block inflight fixes from Omar, fixing issues around the
transition to using tags for blk-mq inflight accounting that we
did a few releases ago"
* tag 'for-linus-
20180504' of git://git.kernel.dk/linux-block:
bdi: Fix oops in wb_workfn()
nvmet: switch loopback target state to connecting when resetting
nvme/multipath: Fix multipath disabled naming collisions
nvme/multipath: Disable runtime writable enabling parameter
nvme: Set integrity flag for user passthrough commands
nvme: fix potential memory leak in option parsing
bdi: Fix use after free bug in debugfs_remove()
bdi: wake up concurrent wb_shutdown() callers.
bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
bcache: set dc->io_disable to true in conditional_stop_bcache_device()
bcache: add wait_for_kthread_stop() in bch_allocator_thread()
bcache: count backing device I/O error for writeback I/O
bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
bcache: store disk name in struct cache and struct cached_dev
blk-mq: fix sysfs inflight counter
blk-mq: count allocated but not started requests in iostats inflight
Linus Torvalds [Sat, 5 May 2018 06:36:50 +0000 (20:36 -1000)]
Merge tag 'xfs-4.17-fixes-2' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"I've got one more bug fix for xfs for 4.17-rc4, which caps the amount
of data we try to handle in one dedupe request so that userspace can't
livelock the kernel.
This series has been run through a full xfstests run during the week
and through a quick xfstests run against this morning's master, with
no ajor failures reported.
Summary:
- Cap the maximum length of a deduplication request at MAX_RW_COUNT/2
to avoid kernel livelock due to excessively large IO requests"
* tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: cap the length of deduplication requests
Linus Torvalds [Sat, 5 May 2018 06:32:18 +0000 (20:32 -1000)]
Merge tag 'for-4.17-rc3-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Two regression fixes and one fix for stable"
* tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: send, fix missing truncate for inode with prealloc extent past eof
btrfs: Take trans lock before access running trans in check_delayed_ref
btrfs: Fix wrong first_key parameter in replace_path
Mauro Rossi [Tue, 24 Apr 2018 11:08:18 +0000 (20:08 +0900)]
genksyms: fix typo in parse.tab.{c,h} generation rules
'quet' is replaced by 'quiet' in scripts/genksyms/Makefile
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Tue, 24 Apr 2018 11:07:13 +0000 (20:07 +0900)]
kbuild: replace hardcoded bison in cmd_bison_h with $(YACC)
Commit
73a4f6dbe70a ("kbuild: add LEX and YACC variables") missed to
update cmd_bison_h somehow.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Fri, 13 Apr 2018 05:06:10 +0000 (14:06 +0900)]
gcc-plugins: fix build condition of SANCOV plugin
Since commit
d677a4d60193 ("Makefile: support flag
-fsanitizer-coverage=trace-cmp"), you miss to build the SANCOV
plugin under some circumstances.
CONFIG_KCOV=y
CONFIG_KCOV_ENABLE_COMPARISONS=y
Your compiler does not support -fsanitize-coverage=trace-pc
Your compiler does not support -fsanitize-coverage=trace-cmp
Under this condition, $(CFLAGS_KCOV) is not empty but contains a
space, so the following ifeq-conditional is false.
ifeq ($(CFLAGS_KCOV),)
Then, scripts/Makefile.gcc-plugins misses to add sancov_plugin.so to
gcc-plugin-y while the SANCOV plugin is necessary as an alternative
means.
Fixes: d677a4d60193 ("Makefile: support flag -fsanitizer-coverage=trace-cmp")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Rasmus Villemoes [Thu, 22 Mar 2018 20:58:27 +0000 (21:58 +0100)]
MAINTAINERS: Update Kbuild entry with a few paths
I managed to send some modpost patches to old addresses of both
Masahiro and Michal, and omitted linux-kbuild from cc, because my
tried and trusted scripts/get_maintainer wrapper failed me. Add the
modpost directory to the MAINTAINERS entry, and while at it make the
Makefile glob match scripts/Makefile itself, and add one matching the
Kbuild.include file as well.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Greg Kroah-Hartman [Fri, 4 May 2018 21:38:32 +0000 (14:38 -0700)]
Merge tag 'usb-serial-4.17-rc4' of https://git./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for v4.17-rc4
Here's a fix for a long-standing issue in the visor driver, which could
have security implications. Included is also a new modem device id.
Both commits have been in linux-next for a couple of days with no
reported issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
Greg Kroah-Hartman [Fri, 4 May 2018 21:35:12 +0000 (14:35 -0700)]
Revert "usb: host: ehci: Use dma_pool_zalloc()"
This reverts commit
22072e83ebd510fb6a090aef9d65ccfda9b1e7e4 as it is
broken.
Alan writes:
What you can't see just from reading the patch is that in both
cases (ehci->itd_pool and ehci->sitd_pool) there are two
allocation paths -- the two branches of an "if" statement -- and
only one of the paths calls dma_pool_[z]alloc. However, the
memset is needed for both paths, and so it can't be eliminated.
Given that it must be present, there's no advantage to calling
dma_pool_zalloc rather than dma_pool_alloc.
Reported-by: Erick Cafferata <erick@cafferata.me>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mario Limonciello [Fri, 20 Apr 2018 17:42:11 +0000 (12:42 -0500)]
platform/x86: Kconfig: Fix dell-laptop dependency chain.
As reported by Randy Dunlap:
>> WARNING: unmet direct dependencies detected for DELL_SMBIOS
>> Depends on [m]: X86 [=y] && X86_PLATFORM_DEVICES [=y]
>> && (DCDBAS [=m] ||
>> DCDBAS [=m]=n) && (ACPI_WMI [=n] || ACPI_WMI [=n]=n)
>> Selected by [y]:
>> - DELL_LAPTOP [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y]
>> && DMI [=y]
>> && BACKLIGHT_CLASS_DEVICE [=y] && (ACPI_VIDEO [=n] ||
>> ACPI_VIDEO [=n]=n)
>> && (RFKILL [=n] || RFKILL [=n]=n) && SERIO_I8042 [=y]
>>
Right now it's possible to set dell laptop to compile in but this
causes dell-smbios to compile in which breaks if dcdbas is a module.
Dell laptop shouldn't select dell-smbios anymore, but depend on it.
Fixes: 32d7b19bad96 (platform/x86: dell-smbios: Resolve dependency error on DCDBAS)
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Cc: stable@vger.kernel.org
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
João Paulo Rechi Vita [Thu, 19 Apr 2018 14:04:34 +0000 (07:04 -0700)]
platform/x86: asus-wireless: Fix NULL pointer dereference
When the module is removed the led workqueue is destroyed in the remove
callback, before the led device is unregistered from the led subsystem.
This leads to a NULL pointer derefence when the led device is
unregistered automatically later as part of the module removal cleanup.
Bellow is the backtrace showing the problem.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: __queue_work+0x8c/0x410
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
Modules linked in: ccm edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 joydev crypto_simd asus_nb_wmi glue_helper uvcvideo snd_hda_codec_conexant snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel asus_wmi snd_hda_codec cryptd snd_hda_core sparse_keymap videobuf2_vmalloc arc4 videobuf2_memops snd_hwdep input_leds videobuf2_v4l2 ath9k psmouse videobuf2_core videodev ath9k_common snd_pcm ath9k_hw media fam15h_power ath k10temp snd_timer mac80211 i2c_piix4 r8169 mii mac_hid cfg80211 asus_wireless(-) snd soundcore wmi shpchp 8250_dw ip_tables x_tables amdkfd amd_iommu_v2 amdgpu radeon chash i2c_algo_bit drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ahci ttm libahci drm video
CPU: 3 PID: 2177 Comm: rmmod Not tainted 4.15.0-5-generic #6+dev94.b4287e5bem1-Endless
Hardware name: ASUSTeK COMPUTER INC. X555DG/X555DG, BIOS 5.011 05/05/2015
RIP: 0010:__queue_work+0x8c/0x410
RSP: 0018:
ffffbe8cc249fcd8 EFLAGS:
00010086
RAX:
ffff992ac6810800 RBX:
0000000000000000 RCX:
0000000000000008
RDX:
0000000000000000 RSI:
0000000000000008 RDI:
ffff992ac6400e18
RBP:
ffffbe8cc249fd18 R08:
ffff992ac6400db0 R09:
0000000000000000
R10:
0000000000000040 R11:
ffff992ac6400dd8 R12:
0000000000002000
R13:
ffff992abd762e00 R14:
ffff992abd763e38 R15:
000000000001ebe0
FS:
00007f318203e700(0000) GS:
ffff992aced80000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000000 CR3:
00000001c720e000 CR4:
00000000001406e0
Call Trace:
queue_work_on+0x38/0x40
led_state_set+0x2c/0x40 [asus_wireless]
led_set_brightness_nopm+0x14/0x40
led_set_brightness+0x37/0x60
led_trigger_set+0xfc/0x1d0
led_classdev_unregister+0x32/0xd0
devm_led_classdev_release+0x11/0x20
release_nodes+0x109/0x1f0
devres_release_all+0x3c/0x50
device_release_driver_internal+0x16d/0x220
driver_detach+0x3f/0x80
bus_remove_driver+0x55/0xd0
driver_unregister+0x2c/0x40
acpi_bus_unregister_driver+0x15/0x20
asus_wireless_driver_exit+0x10/0xb7c [asus_wireless]
SyS_delete_module+0x1da/0x2b0
entry_SYSCALL_64_fastpath+0x24/0x87
RIP: 0033:0x7f3181b65fd7
RSP: 002b:
00007ffe74bcbe18 EFLAGS:
00000206 ORIG_RAX:
00000000000000b0
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f3181b65fd7
RDX:
000000000000000a RSI:
0000000000000800 RDI:
0000555ea2559258
RBP:
0000555ea25591f0 R08:
00007ffe74bcad91 R09:
000000000000000a
R10:
0000000000000000 R11:
0000000000000206 R12:
0000000000000003
R13:
00007ffe74bcae00 R14:
0000000000000000 R15:
0000555ea25591f0
Code: 01 00 00 02 0f 85 7d 01 00 00 48 63 45 d4 48 c7 c6 00 f4 fa 87 49 8b 9d 08 01 00 00 48 03 1c c6 4c 89 f7 e8 87 fb ff ff 48 85 c0 <48> 8b 3b 0f 84 c5 01 00 00 48 39 f8 0f 84 bc 01 00 00 48 89 c7
RIP: __queue_work+0x8c/0x410 RSP:
ffffbe8cc249fcd8
CR2:
0000000000000000
---[ end trace
7aa4f4a232e9c39c ]---
Unregistering the led device on the remove callback before destroying the
workqueue avoids this problem.
https://bugzilla.kernel.org/show_bug.cgi?id=196097
Reported-by: Dun Hum <bitter.taste@gmx.com>
Cc: stable@vger.kernel.org
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Linus Torvalds [Fri, 4 May 2018 15:47:21 +0000 (05:47 -1000)]
Merge tag 'for-linus-4.17-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen cleanup from Juergen Gross:
"One cleanup to remove VLAs from the kernel"
* tag 'for-linus-4.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: Remove use of VLAs
James Morse [Fri, 4 May 2018 15:19:24 +0000 (16:19 +0100)]
arm64: vgic-v2: Fix proxying of cpuif access
Proxying the cpuif accesses at EL2 makes use of vcpu_data_guest_to_host
and co, which check the endianness, which call into vcpu_read_sys_reg...
which isn't mapped at EL2 (it was inlined before, and got moved OoL
with the VHE optimizations).
The result is of course a nice panic. Let's add some specialized
cruft to keep the broken platforms that require this hack alive.
But, this code used vcpu_data_guest_to_host(), which expected us to
write the value to host memory, instead we have trapped the guest's
read or write to an mmio-device, and are about to replay it using the
host's readl()/writel() which also perform swabbing based on the host
endianness. This goes wrong when both host and guest are big-endian,
as readl()/writel() will undo the guest's swabbing, causing the
big-endian value to be written to device-memory.
What needs doing?
A big-endian guest will have pre-swabbed data before storing, undo this.
If its necessary for the host, writel() will re-swab it.
For a read a big-endian guest expects to swab the data after the load.
The hosts's readl() will correct for host endianness, giving us the
device-memory's value in the register. For a big-endian guest, swab it
as if we'd only done the load.
For a little-endian guest, nothing needs doing as readl()/writel() leave
the correct device-memory value in registers.
Tested on Juno with that rarest of things: a big-endian 64K host.
Based on a patch from Marc Zyngier.
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Fixes: bf8feb39642b ("arm64: KVM: vgic-v2: Add GICV access from HYP")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Valentin Schneider [Wed, 2 May 2018 10:53:03 +0000 (11:53 +0100)]
KVM: arm/arm64: vgic_init: Cleanup reference to process_maintenance
One comment still mentioned process_maintenance operations after
commit
af0614991ab6 ("KVM: arm/arm64: vgic: Get rid of unnecessary
process_maintenance operation")
Update the comment to point to vgic_fold_lr_state instead, which
is where maintenance interrupts are taken care of.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
James Morse [Wed, 2 May 2018 11:17:02 +0000 (12:17 +0100)]
KVM: arm64: Fix order of vcpu_write_sys_reg() arguments
A typo in kvm_vcpu_set_be()'s call:
| vcpu_write_sys_reg(vcpu, SCTLR_EL1, sctlr)
causes us to use the 32bit register value as an index into the sys_reg[]
array, and sail off the end of the linear map when we try to bring up
big-endian secondaries.
| Unable to handle kernel paging request at virtual address
ffff80098b982c00
| Mem abort info:
| ESR = 0x96000045
| Exception class = DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| Data abort info:
| ISV = 0, ISS = 0x00000045
| CM = 0, WnR = 1
| swapper pgtable: 4k pages, 48-bit VAs, pgdp =
000000002ea0571a
| [
ffff80098b982c00] pgd=
00000009ffff8803, pud=
0000000000000000
| Internal error: Oops:
96000045 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 2 PID: 1561 Comm: kvm-vcpu-0 Not tainted
4.17.0-rc3-00001-ga912e2261ca6-dirty #1323
| Hardware name: ARM Juno development board (r1) (DT)
| pstate:
60000005 (nZCv daif -PAN -UAO)
| pc : vcpu_write_sys_reg+0x50/0x134
| lr : vcpu_write_sys_reg+0x50/0x134
| Process kvm-vcpu-0 (pid: 1561, stack limit = 0x000000006df4728b)
| Call trace:
| vcpu_write_sys_reg+0x50/0x134
| kvm_psci_vcpu_on+0x14c/0x150
| kvm_psci_0_2_call+0x244/0x2a4
| kvm_hvc_call_handler+0x1cc/0x258
| handle_hvc+0x20/0x3c
| handle_exit+0x130/0x1ec
| kvm_arch_vcpu_ioctl_run+0x340/0x614
| kvm_vcpu_ioctl+0x4d0/0x840
| do_vfs_ioctl+0xc8/0x8d0
| ksys_ioctl+0x78/0xa8
| sys_ioctl+0xc/0x18
| el0_svc_naked+0x30/0x34
| Code:
73620291 604d00b0 00201891 1ab10194 (
957a33f8)
|---[ end trace
4b4a4f9628596602 ]---
Fix the order of the arguments.
Fixes: 8d404c4c24613 ("KVM: arm64: Rewrite system register accessors to read/write functions")
CC: Christoffer Dall <cdall@cs.columbia.edu>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Linus Torvalds [Fri, 4 May 2018 15:44:50 +0000 (05:44 -1000)]
Merge tag 'pm-4.17-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"This fixes a regression from the 4.14 cycle in the CPPC cpufreq driver
causing it to use an incorrect transition delay value which leads to a
very high rate of frequency change requests when the schedutil
governor is in use (Prashanth Prakash)"
* tag 'pm-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq / CPPC: Set platform specific transition_delay_us
Linus Torvalds [Fri, 4 May 2018 15:43:33 +0000 (05:43 -1000)]
Merge tag 'acpi-4.17-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"This fixes an ACPICA utilities (acpidump) build regression from the
4.16 cycle by setting LD in the CFLAGS passed to the linker to $(CC)
again (Jiri Slaby)"
* tag 'acpi-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
tools: power/acpi, revert to LD = gcc
Linus Torvalds [Fri, 4 May 2018 15:38:51 +0000 (05:38 -1000)]
Merge tag 'media/v4.17-4' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a trivial one-line fix addressing a PTR_ERR() getting value from a
wrong var at imx driver
- a patch changing my e-mail at the Kernel tree to mchehab@kernel.org.
no code changes
* tag 'media/v4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
MAINTAINERS & files: Canonize the e-mails I use at files
media: imx-media-csi: Fix inconsistent IS_ERR and PTR_ERR
Linus Torvalds [Fri, 4 May 2018 15:37:22 +0000 (05:37 -1000)]
Merge tag 'sound-4.17-rc4' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes, all deserved for stable.
Two are about core API fixes for the bugs that were triggered by
ever-growing fuzzers, while others are driver-specific fixes"
* tag 'sound-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Check PCM state at xfern compat ioctl
ALSA: aloop: Add missing cable lock to ctl API callbacks
ALSA: dice: fix kernel NULL pointer dereference due to invalid calculation for array index
ALSA: seq: Fix races at MIDI encoding in snd_virmidi_output_trigger()
ALSA: hda - Fix incorrect usage of IS_REACHABLE()
Mauro Carvalho Chehab [Wed, 25 Apr 2018 09:34:48 +0000 (05:34 -0400)]
MAINTAINERS & files: Canonize the e-mails I use at files
From now on, I'll start using my @kernel.org as my development e-mail.
As such, let's remove the entries that point to the old
mchehab@s-opensource.com at MAINTAINERS file.
For the files written with a copyright with mchehab@s-opensource,
let's keep Samsung on their names, using mchehab+samsung@kernel.org,
in order to keep pointing to my employer, with sponsors the work.
For the files written before I join Samsung (on July, 4 2013),
let's just use mchehab@kernel.org.
For bug reports, we can simply point to just kernel.org, as
this will reach my mchehab+samsung inbox anyway.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Brian Warner <brian.warner@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
From: Gustavo A. R. Silva [Mon, 16 Apr 2018 17:28:56 +0000 (13:28 -0400)]
media: imx-media-csi: Fix inconsistent IS_ERR and PTR_ERR
Fix inconsistent IS_ERR and PTR_ERR in imx_csi_probe.
The proper pointer to be passed as argument is pinctrl
instead of priv->vdev.
This issue was detected with the help of Coccinelle.
Fixes: 52e17089d185 ("media: imx: Don't initialize vars that won't be used")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Jiri Slaby [Tue, 24 Apr 2018 07:43:44 +0000 (09:43 +0200)]
tools: power/acpi, revert to LD = gcc
Commit
7ed1c1901fe5 (tools: fix cross-compile var clobbering) removed
setting of LD to $(CROSS_COMPILE)gcc. This broke build of acpica
(acpidump) in power/acpi:
ld: unrecognized option '-D_LINUX'
The tools pass CFLAGS to the linker (incl. -D_LINUX), so revert this
particular change and let LD be $(CC) again. Note that the old behaviour
was a bit different, it used $(CROSS_COMPILE)gcc which was eliminated by
the commit
7ed1c1901fe5. We use $(CC) for that reason.
Fixes: 7ed1c1901fe5 (tools: fix cross-compile var clobbering)
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: 4.16+ <stable@vger.kernel.org> # 4.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Fri, 4 May 2018 05:26:51 +0000 (19:26 -1000)]
Merge tag 'linux-kselftest-4.17-rc4' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This Kselftest update for 4.17-rc4 consists of a fix for a syntax
error in the script that runs selftests. Mathieu Desnoyers found this
bug in the script on systems running GNU Make 3.8 or older"
* tag 'linux-kselftest-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: Fix lib.mk run_tests target shell script
Linus Torvalds [Fri, 4 May 2018 04:57:03 +0000 (18:57 -1000)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Various sockmap fixes from John Fastabend (pinned map handling,
blocking in recvmsg, double page put, error handling during redirect
failures, etc.)
2) Fix dead code handling in x86-64 JIT, from Gianluca Borello.
3) Missing device put in RDS IB code, from Dag Moxnes.
4) Don't process fast open during repair mode in TCP< from Yuchung
Cheng.
5) Move address/port comparison fixes in SCTP, from Xin Long.
6) Handle add a bond slave's master into a bridge properly, from
Hangbin Liu.
7) IPv6 multipath code can operate on unitialized memory due to an
assumption that the icmp header is in the linear SKB area. Fix from
Eric Dumazet.
8) Don't invoke do_tcp_sendpages() recursively via TLS, from Dave
Watson.
9) Fix memory leaks in x86-64 JIT, from Daniel Borkmann.
10) RDS leaks kernel memory to userspace, from Eric Dumazet.
11) DCCP can invoke a tasklet on a freed socket, take a refcount. Also
from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
dccp: fix tasklet usage
smc: fix sendpage() call
net/smc: handle unregistered buffers
net/smc: call consolidation
qed: fix spelling mistake: "offloded" -> "offloaded"
net/mlx5e: fix spelling mistake: "loobpack" -> "loopback"
tcp: restore autocorking
rds: do not leak kernel memory to user land
qmi_wwan: do not steal interfaces from class drivers
ipv4: fix fnhe usage by non-cached routes
bpf: sockmap, fix error handling in redirect failures
bpf: sockmap, zero sg_size on error when buffer is released
bpf: sockmap, fix scatterlist update on error path in send with apply
net_sched: fq: take care of throttled flows before reuse
ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6"
bpf, x64: fix memleak when not converging on calls
bpf, x64: fix memleak when not converging after image
net/smc: restrict non-blocking connect finish
8139too: Use disable_irq_nosync() in rtl8139_poll_controller()
sctp: fix the issue that the cookie-ack with auth can't get processed
...
Linus Torvalds [Fri, 4 May 2018 04:31:19 +0000 (18:31 -1000)]
Merge branch 'parisc-4.17-4' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Fix two section mismatches, convert to read_persistent_clock64(), add
further documentation regarding the HPMC crash handler and make
bzImage the default build target"
* 'parisc-4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix section mismatches
parisc: drivers.c: Fix section mismatches
parisc: time: Convert read_persistent_clock() to read_persistent_clock64()
parisc: Document rules regarding checksum of HPMC handler
parisc: Make bzImage default build target
Dave Airlie [Fri, 4 May 2018 00:03:27 +0000 (10:03 +1000)]
Merge branch 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Two fixes for now, one for a long standing problem uncovered by a commit
in the 4.17 merge window, one for a regression introduced by a previous
bugfix, Cc'd stable.
* 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Fix a buffer object leak
drm/vmwgfx: Clean up fbdev modeset locking
Jan Kara [Thu, 3 May 2018 16:26:26 +0000 (18:26 +0200)]
bdi: Fix oops in wb_workfn()
Syzbot has reported that it can hit a NULL pointer dereference in
wb_workfn() due to wb->bdi->dev being NULL. This indicates that
wb_workfn() was called for an already unregistered bdi which should not
happen as wb_shutdown() called from bdi_unregister() should make sure
all pending writeback works are completed before bdi is unregistered.
Except that wb_workfn() itself can requeue the work with:
mod_delayed_work(bdi_wq, &wb->dwork, 0);
and if this happens while wb_shutdown() is waiting in:
flush_delayed_work(&wb->dwork);
the dwork can get executed after wb_shutdown() has finished and
bdi_unregister() has cleared wb->bdi->dev.
Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
the necessary precautions against racing with bdi unregistration.
CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
CC: Tejun Heo <tj@kernel.org>
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Parav Pandit [Wed, 2 May 2018 10:18:59 +0000 (13:18 +0300)]
RDMA/cma: Do not query GID during QP state transition to RTR
When commit [1] was added, SGID was queried to derive the SMAC address.
Then, later on during a refactor [2], SMAC was no longer needed. However,
the now useless GID query remained. Then during additional code changes
later on, the GID query was being done in such a way that it caused iWARP
queries to start breaking. Remove the useless GID query and resolve the
iWARP breakage at the same time.
This is discussed in [3].
[1] commit
dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
[2] commit
5c266b2304fb ("IB/cm: Remove the usage of smac and vid of qp_attr and cm_av")
[3] https://www.spinics.net/lists/linux-rdma/msg63951.html
Suggested-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Wed, 2 May 2018 10:04:25 +0000 (13:04 +0300)]
IB/mlx4: Fix integer overflow when calculating optimal MTT size
When the kernel was compiled using the UBSAN option,
we saw the following stack trace:
[ 1184.827917] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx4/mr.c:349:27
[ 1184.828114] signed integer overflow:
[ 1184.828247] -
2147483648 - 1 cannot be represented in type 'int'
The problem was caused by calling round_up in procedure
mlx4_ib_umem_calc_optimal_mtt_size (on line 349, as noted in the stack
trace) with the second parameter (1 << block_shift) (which is an int).
The second parameter should have been (1ULL << block_shift) (which
is an unsigned long long).
(1 << block_shift) is treated by the compiler as an int (because 1 is
an integer).
Now, local variable block_shift is initialized to 31.
If block_shift is 31, 1 << block_shift is 1 << 31 = 0x80000000=-
214748368.
This is the most negative int value.
Inside the round_up macro, there is a cast applied to ((1 << 31) - 1).
However, this cast is applied AFTER ((1 << 31) - 1) is calculated.
Since (1 << 31) is treated as an int, we get the negative overflow
identified by UBSAN in the process of calculating ((1 << 31) - 1).
The fix is to change (1 << block_shift) to (1ULL << block_shift) on
line 349.
Fixes: 9901abf58368 ("IB/mlx4: Use optimal numbers of MTT entries")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Tue, 1 May 2018 12:36:13 +0000 (05:36 -0700)]
IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
When IRQ affinity is set and the interrupt type is unknown, a cpu
mask allocated within the function is never freed. Fix this memory
leak by allocating memory within the scope where it is used.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Tue, 1 May 2018 12:36:06 +0000 (05:36 -0700)]
IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
When allocating device data, if there's an allocation failure, the
already allocated memory won't be freed such as per-cpu counters.
Fix memory leaks in exception path by creating a common reentrant
clean up function hfi1_clean_devdata() to be used at driver unload
time and device data allocation failure.
To accomplish this, free_platform_config() and clean_up_i2c() are
changed to be reentrant to remove dependencies when they are called
in different order. This helps avoid NULL pointer dereferences
introduced by this patch if those two functions weren't reentrant.
In addition, set dd->int_counter, dd->rcv_limit,
dd->send_schedule and dd->tx_opstats to NULL after they're freed in
hfi1_clean_devdata(), so that hfi1_clean_devdata() is fully reentrant.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Tue, 1 May 2018 12:35:58 +0000 (05:35 -0700)]
IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
When an invalid num_vls is used as a module parameter, the code
execution follows an exception path where the macro dd_dev_err()
expects dd->pcidev->dev not to be NULL in hfi1_init_dd(). This
causes a NULL pointer dereference.
Fix hfi1_init_dd() by initializing dd->pcidev and dd->pcidev->dev
earlier in the code. If a dd exists, then dd->pcidev and
dd->pcidev->dev always exists.
BUG: unable to handle kernel NULL pointer dereference
at
00000000000000f0
IP: __dev_printk+0x15/0x90
Workqueue: events work_for_cpu_fn
RIP: 0010:__dev_printk+0x15/0x90
Call Trace:
dev_err+0x6c/0x90
? hfi1_init_pportdata+0x38d/0x3f0 [hfi1]
hfi1_init_dd+0xdd/0x2530 [hfi1]
? pci_conf1_read+0xb2/0xf0
? pci_read_config_word.part.9+0x64/0x80
? pci_conf1_write+0xb0/0xf0
? pcie_capability_clear_and_set_word+0x57/0x80
init_one+0x141/0x490 [hfi1]
local_pci_probe+0x3f/0xa0
work_for_cpu_fn+0x10/0x20
process_one_work+0x152/0x350
worker_thread+0x1cf/0x3e0
kthread+0xf5/0x130
? max_active_store+0x80/0x80
? kthread_bind+0x10/0x10
? do_syscall_64+0x6e/0x1a0
? SyS_exit_group+0x10/0x10
ret_from_fork+0x35/0x40
Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Tue, 1 May 2018 12:35:51 +0000 (05:35 -0700)]
IB/hfi1: Fix loss of BECN with AHG
AHG may be armed to use the stored header, which by design is limited
to edits in the PSN/A 32 bit word (bth2).
When the code is trying to send a BECN, the use of the stored header
will lose the BECN bit.
Fix by avoiding AHG when getting ready to send a BECN. This is
accomplished by always claiming the packet is not a middle packet which
is an AHG precursor. BECNs are not a normal case and this should not
hurt AHG optimizations.
Cc: <stable@vger.kernel.org> # 4.14.x
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Michael J. Ruhl [Tue, 1 May 2018 12:35:43 +0000 (05:35 -0700)]
IB/hfi1 Use correct type for num_user_context
The module parameter num_user_context is defined as 'int' and
defaults to -1. The module_param_named() says that it is uint.
Correct module_param_named() type information and update the modinfo
text to reflect the default value.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Tue, 1 May 2018 12:35:36 +0000 (05:35 -0700)]
IB/hfi1: Fix handling of FECN marked multicast packet
The code for handling a marked UD packet unconditionally returns the
dlid in the header of the FECN marked packet. This is not correct
for multicast packets where the DLID is in the multicast range.
The subsequent attempt to send the CNP with the multicast lid will
cause the chip to halt the ack send context because the source
lid doesn't match the chip programming. The send context will
be halted and flush any other pending packets in the pio ring causing
the CNP to not be sent.
A part of investigating the fix, it was determined that the 16B work
broke the FECN routine badly with inconsistent use of 16 bit and 32 bits
types for lids and pkeys. Since the port's source lid was correctly 32
bits the type mixmatches need to be dealt with at the same time as
fixing the CNP header issue.
Fix these issues by:
- Using the ports lid for as the SLID for responding to FECN marked UD
packets
- Insure pkey is always 16 bit in this and subordinate routines
- Insure lids are 32 bits in this and subordinate routines
Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 88733e3b8450 ("IB/hfi1: Add 16B UD support")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Eric Dumazet [Thu, 3 May 2018 16:39:20 +0000 (09:39 -0700)]
dccp: fix tasklet usage
syzbot reported a crash in tasklet_action_common() caused by dccp.
dccp needs to make sure socket wont disappear before tasklet handler
has completed.
This patch takes a reference on the socket when arming the tasklet,
and moves the sock_put() from dccp_write_xmit_timer() to dccp_write_xmitlet()
kernel BUG at kernel/softirq.c:514!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.17.0-rc3+ #30
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515
RSP: 0018:
ffff8801d9b3faf8 EFLAGS:
00010246
dccp_close: ABORT with 65423 bytes unread
RAX:
1ffff1003b367f6b RBX:
ffff8801daf1f3f0 RCX:
0000000000000000
RDX:
ffff8801cf895498 RSI:
0000000000000004 RDI:
0000000000000000
RBP:
ffff8801d9b3fc40 R08:
ffffed0039f12a95 R09:
ffffed0039f12a94
dccp_close: ABORT with 65423 bytes unread
R10:
ffffed0039f12a94 R11:
ffff8801cf8954a3 R12:
0000000000000000
R13:
ffff8801d9b3fc18 R14:
dffffc0000000000 R15:
ffff8801cf895490
FS:
0000000000000000(0000) GS:
ffff8801daf00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000001b2bc28000 CR3:
00000001a08a9000 CR4:
00000000001406e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
tasklet_action+0x1d/0x20 kernel/softirq.c:533
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
dccp_close: ABORT with 65423 bytes unread
run_ksoftirqd+0x86/0x100 kernel/softirq.c:646
smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
kthread+0x345/0x410 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Code: 48 8b 85 e8 fe ff ff 48 8b 95 f0 fe ff ff e9 94 fb ff ff 48 89 95 f0 fe ff ff e8 81 53 6e 00 48 8b 95 f0 fe ff ff e9 62 fb ff ff <0f> 0b 48 89 cf 48 89 8d e8 fe ff ff e8 64 53 6e 00 48 8b 8d e8
RIP: tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515 RSP:
ffff8801d9b3faf8
Fixes: dc841e30eaea ("dccp: Extend CCID packet dequeueing interface")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 May 2018 18:47:32 +0000 (14:47 -0400)]
Merge branch 'smc-fixes'
Ursula Braun says:
====================
net/smc: fixes 2018/05/03
here are smc fixes for 2 problems:
* receive buffers in SMC must be registered. If registration fails
these buffers must not be kept within the link group for reuse.
Patch 1 is a preparational patch; patch 2 contains the fix.
* sendpage: do not hold the sock lock when calling kernel_sendpage()
or sock_no_sendpage()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Raspl [Thu, 3 May 2018 15:57:39 +0000 (17:57 +0200)]
smc: fix sendpage() call
The sendpage() call grabs the sock lock before calling the default
implementation - which tries to grab it once again.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Thu, 3 May 2018 15:57:38 +0000 (17:57 +0200)]
net/smc: handle unregistered buffers
When smc_wr_reg_send() fails then tag (regerr) the affected buffer and
free it in smc_buf_unuse().
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Thu, 3 May 2018 15:57:37 +0000 (17:57 +0200)]
net/smc: call consolidation
Consolidate the call to smc_wr_reg_send() in a new function.
No functional changes.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 3 May 2018 15:19:32 +0000 (16:19 +0100)]
qed: fix spelling mistake: "offloded" -> "offloaded"
Trivial fix to spelling mistake in DP_NOTICE message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heikki Krogerus [Wed, 25 Apr 2018 14:22:09 +0000 (17:22 +0300)]
usb: typec: tps6598x: handle block reads separately with plain-I2C adapters
If the I2C adapter that the PD controller is attached to
does not support SMBus protocol, the driver needs to handle
block reads separately. The first byte returned in block
read protocol will show the total number of bytes. It needs
to be stripped away.
This is handled separately in the driver only because right
now we have no way of requesting the used protocol with
regmap-i2c. This is in practice a workaround for what is
really a problem in regmap-i2c. The other option would have
been to register custom regmap, or not use regmap at all,
however, since the solution is very simple, I choose to use
it in this case for convenience. It is easy to remove once
we figure out how to handle this kind of cases in
regmap-i2c.
Fixes: 0a4c005bd171 ("usb: typec: driver for TI TPS6598x USB Power Delivery controllers")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heikki Krogerus [Mon, 30 Apr 2018 12:41:56 +0000 (15:41 +0300)]
usb: typec: tcpm: Release the role mux when exiting
The ref count for the USB role switch device must be
released after we are done using the switch.
Fixes: c6962c29729c ("usb: typec: tcpm: Set USB role switch to device mode when configured as such")
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alan Stern [Thu, 3 May 2018 15:04:48 +0000 (11:04 -0400)]
USB: Accept bulk endpoints with 1024-byte maxpacket
Some non-compliant high-speed USB devices have bulk endpoints with a
1024-byte maxpacket size. Although such endpoints don't work with
xHCI host controllers, they do work with EHCI controllers. We used to
accept these invalid sizes (with a warning), but we no longer do
because of an unintentional change introduced by commit
aed9d65ac327
("USB: validate wMaxPacketValue entries in endpoint descriptors").
This patch restores the old behavior, so that people with these
peculiar devices can use them without patching their kernels by hand.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Elvinas <elvinas@veikia.lt>
Fixes: aed9d65ac327 ("USB: validate wMaxPacketValue entries in endpoint descriptors")
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin Ian King [Thu, 3 May 2018 09:12:53 +0000 (10:12 +0100)]
net/mlx5e: fix spelling mistake: "loobpack" -> "loopback"
Trivial fix to spelling mistake in netdev_err error message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 3 May 2018 16:27:39 +0000 (06:27 -1000)]
Merge tag 'dma-mapping-4.17-4' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
"Fix an incorrect warning selection introduced in the last merge
window"
* tag 'dma-mapping-4.17-4' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix inversed DMA_ATTR_NO_WARN test
Zhengyuan Liu [Thu, 8 Feb 2018 01:41:53 +0000 (09:41 +0800)]
tracing: Fix the file mode of stack tracer
It looks weird that the stack_trace_filter file can be written by root
but shows that it does not have write permission by ll command.
Link: http://lkml.kernel.org/r/1518054113-28096-1-git-send-email-liuzhengyuan@kylinos.cn
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Mathias Nyman [Thu, 3 May 2018 14:30:07 +0000 (17:30 +0300)]
xhci: Fix use-after-free in xhci_free_virt_device
KASAN found a use-after-free in xhci_free_virt_device+0x33b/0x38e
where xhci_free_virt_device() sets slot id to 0 if udev exists:
if (dev->udev && dev->udev->slot_id)
dev->udev->slot_id = 0;
dev->udev will be true even if udev is freed because dev->udev is
not set to NULL.
set dev->udev pointer to NULL in xhci_free_dev()
The original patch went to stable so this fix needs to be applied
there as well.
Fixes: a400efe455f7 ("xhci: zero usb device slot_id member when disabling and freeing a xhci slot")
Cc: <stable@vger.kernel.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chen LinX [Wed, 3 Sep 2014 06:31:09 +0000 (14:31 +0800)]
ftrace: Have set_graph_* files have normal file modes
The set_graph_function and set_graph_notrace file mode should be 0644
instead of 0444 as they are writeable. Note, the mode appears to be ignored
regardless, but they should at least look sane.
Link: http://lkml.kernel.org/r/1409725869-4501-1-git-send-email-linx.z.chen@intel.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Chen LinX <linx.z.chen@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Johannes Thumshirn [Thu, 3 May 2018 15:00:35 +0000 (17:00 +0200)]
nvmet: switch loopback target state to connecting when resetting
After commit
bb06ec31452f ("nvme: expand nvmf_check_if_ready checks")
resetting of the loopback nvme target failed as we forgot to switch
it's state to NVME_CTRL_CONNECTING before we reconnect the admin
queues. Therefore the checks in nvmf_check_if_ready() choose to go to
the reject_io case and thus we couldn't sent out an identify
controller command to reconnect.
Change the controller state to NVME_CTRL_CONNECTING after tearing down
the old connection and before re-establishing the connection.
Fixes: bb06ec31452f ("nvme: expand nvmf_check_if_ready checks")
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Thu, 26 Apr 2018 20:22:41 +0000 (14:22 -0600)]
nvme/multipath: Fix multipath disabled naming collisions
When CONFIG_NVME_MULTIPATH is set, but we're not using nvme to multipath,
namespaces with multiple paths were not creating unique names due to
reusing the same instance number from the namespace's head.
This patch fixes this by falling back to the non-multipath naming method
when the parameter disabled using multipath.
Reported-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Thu, 26 Apr 2018 20:24:29 +0000 (14:24 -0600)]
nvme/multipath: Disable runtime writable enabling parameter
We can't allow the user to change multipath settings at runtime, as this
will create naming conflicts due to the different naming schemes used
for each mode.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Tue, 17 Apr 2018 20:42:44 +0000 (14:42 -0600)]
nvme: Set integrity flag for user passthrough commands
If the command a separate metadata buffer attached, the request needs
to have the integrity flag set so the driver knows to map it.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chengguang Xu [Sat, 14 Apr 2018 12:06:19 +0000 (20:06 +0800)]
nvme: fix potential memory leak in option parsing
When specifying same string type option several times,
current option parsing may cause memory leak. Hence,
call kfree for previous one in this case.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tetsuo Handa [Mon, 23 Apr 2018 02:21:03 +0000 (11:21 +0900)]
bdi: Fix use after free bug in debugfs_remove()
syzbot is reporting use after free bug in debugfs_remove() [1].
This is because fault injection made memory allocation for
debugfs_create_file() from bdi_debug_register() from bdi_register_va()
fail and continued with setting WB_registered. But when debugfs_remove()
is called from debugfs_remove(bdi->debug_dir) from bdi_debug_unregister()
from bdi_unregister() from release_bdi() because WB_registered was set
by bdi_register_va(), IS_ERR_OR_NULL(bdi->debug_dir) == false despite
debugfs_remove(bdi->debug_dir) was already called from bdi_register_va().
Fix this by making IS_ERR_OR_NULL(bdi->debug_dir) == true.
[1] https://syzkaller.appspot.com/bug?id=
5ab4efd91a96dcea9b68104f159adf4af2a6dfc1
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+049cb4ae097049dac137@syzkaller.appspotmail.com>
Fixes: 97f07697932e6faf ("bdi: convert bdi_debug_register to int")
Cc: weiping zhang <zhangweiping@didichuxing.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Eric Dumazet [Thu, 3 May 2018 03:25:13 +0000 (20:25 -0700)]
tcp: restore autocorking
When adding rb-tree for TCP retransmit queue, we inadvertently broke
TCP autocorking.
tcp_should_autocork() should really check if the rtx queue is not empty.
Tested:
Before the fix :
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
540000 262144 500 10.00 2682.85 2.47 1.59 3.618 2.329
TcpExtTCPAutoCorking 33 0.0
// Same test, but forcing TCP_NODELAY
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
540000 262144 500 10.00 1408.75 2.44 2.96 6.802 8.259
TcpExtTCPAutoCorking 1 0.0
After the fix :
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
540000 262144 500 10.00 5472.46 2.45 1.43 1.761 1.027
TcpExtTCPAutoCorking 361293 0.0
// With TCP_NODELAY option
$ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
540000 262144 500 10.00 5454.96 2.46 1.63 1.775 1.174
TcpExtTCPAutoCorking 315448 0.0
Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michael Wenig <mwenig@vmware.com>
Tested-by: Michael Wenig <mwenig@vmware.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michael Wenig <mwenig@vmware.com>
Tested-by: Michael Wenig <mwenig@vmware.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>