review.tizen.org Git - platform/kernel/linux-rpi.git/log

drm/msm/dsi: Use correct pm_runtime_put variant during host_init

The DSI runtime PM suspend/resume callbacks check whether
msm_host->cfg_hnd is non-NULL before trying to enable the bus clocks.
This is done to accommodate early calls to these functions that may
happen before the bus clocks are even initialized.

Calling pm_runtime_put_autosuspend() in dsi_host_init() can result in
racy behaviour since msm_host->cfg_hnd is set very soon after. If the
suspend callback happens too late, we end up trying to disable clocks
that were never enabled, resulting in a bunch of WARN_ON splats.

Use pm_runtime_put_sync() so that the suspend callback is called
immediately.

Reported-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>

Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- fix crashes in skcipher/shash from zero-length input.

- fix softirq GFP_KERNEL allocation in shash_setkey_unaligned.

- error path bug fix in xts create function.

- fix compiler warning regressions in axis and stm32

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: shash - Fix zero-length shash ahash digest crash
  crypto: skcipher - Fix crash on zero-length input
  crypto: shash - Fix a sleep-in-atomic bug in shash_setkey_unaligned
  crypto: xts - Fix an error handling path in 'create()'
  crypto: stm32 - Try to fix hash padding
  crypto: axis - hide an unused variable

Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/livepatching

Pull livepatching fix from Jiri Kosina:

- bugfix for handling of coming modules (incorrect handling of failure)
from Joe Lawrence

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: unpatch all klp_objects if klp_module_coming fails

ecryptfs: fix dereference of NULL user_key_payload

In eCryptfs, we failed to verify that the authentication token keys are
not revoked before dereferencing their payloads, which is problematic
because the payload of a revoked key is NULL. request_key() *does* skip
revoked keys, but there is still a window where the key can be revoked
before we acquire the key semaphore.

Fix it by updating ecryptfs_get_key_payload_data() to return
-EKEYREVOKED if the key payload is NULL. For completeness we check this
for "encrypted" keys as well as "user" keys, although encrypted keys
cannot be revoked currently.

Alternatively we could use key_validate(), but since we'll also need to
fix ecryptfs_get_key_payload_data() to validate the payload length, it
seems appropriate to just check the payload pointer.

Fixes: 237fead61998 ("[PATCH] ecryptfs: fs/Makefile and fs/Kconfig")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org> [v2.6.19+]
Cc: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>

fscrypt: fix dereference of NULL user_key_payload

When an fscrypt-encrypted file is opened, we request the file's master
key from the keyrings service as a logon key, then access its payload.
However, a revoked key has a NULL payload, and we failed to check for
this. request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.

Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.

Fixes: 88bd6ccdcdd6 ("ext4 crypto: add encryption key management facilities")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org> [v4.1+]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>

lib/digsig: fix dereference of NULL user_key_payload

digsig_verify() requests a user key, then accesses its payload.
However, a revoked key has a NULL payload, and we failed to check for
this. request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.

Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.

Fixes: 051dbb918c7f ("crypto: digital signature verification support")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org> [v3.3+]
Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>

FS-Cache: fix dereference of NULL user_key_payload

When the file /proc/fs/fscache/objects (available with
CONFIG_FSCACHE_OBJECT_LIST=y) is opened, we request a user key with
description "fscache:objlist", then access its payload. However, a
revoked key has a NULL payload, and we failed to check for this.
request_key() *does* skip revoked keys, but there is still a window
where the key can be revoked before we access its payload.

Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.

Fixes: 4fbf4291aa15 ("FS-Cache: Allow the current state of all objects to be dumped")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org> [v2.6.32+]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>

Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:

- fix for potential out-of-bounds memory access (found by fuzzing,
   likely requires specially crafted device to trigger) by Jaejoong Kim

- two new device IDs for elecom driver from Alex Manoussakis

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: hid-elecom: extend to fix descriptor for HUGE trackball
  HID: usbhid: fix out-of-bounds bug

Merge tag 'sound-4.14-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"It's been a busy week for defending the attacks from fuzzer people.

  This contains various USB-audio driver fixes and sequencer core fixes
  spotted by syzkaller and other fuzzer, as well as one quirk for a
  Plantronics USB audio device"

* tag 'sound-4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: caiaq: Fix stray URB at probe error path
  ALSA: seq: Fix use-after-free at creating a port
  ALSA: usb-audio: Kill stray URB at exiting
  ALSA: line6: Fix leftover URB at error-path during probe
  ALSA: line6: Fix NULL dereference at podhd_disconnect()
  ALSA: line6: Fix missing initialization before error path
  ALSA: seq: Fix copy_from_user() call inside lock
  ALSA: usb-audio: Add sample rate quirk for Plantronics P610

Merge branch 'waitid-fix'

Merge waitid() fix from Kees Cook.

I'd have hoped that the unsafe_{get|put}_user() naming would have
avoided these kinds of stupid bugs, but no such luck.

* waitid-fix:
waitid(): Add missing access_ok() checks

x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping

SKX stepping-3 fixed the TSC_DEADLINE issue in a different ucode
version number than stepping-4. Linux needs to know this stepping-3
specific version number to also enable the TSC_DEADLINE on stepping-3.

The steppings and ucode versions are documented in the SKX BIOS update:
https://downloadmirror.intel.com/26978/eng/ReleaseNotes_R00.01.0004.txt

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/60f2bbf7cf617e212b522e663f84225bfebc50e5.1507756305.git.len.brown@intel.com

x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors

Commit 594a30fb1242 ("x86/apic: Silence "FW_BUG TSC_DEADLINE disabled
due to Errata" on CPUs without the feature", 2017-08-30) was also about
silencing the warning on VirtualBox; however, KVM does expose the TSC
deadline timer, and it's virtualized so that it is immune from CPU errata.

Therefore, booting 4.13 with "-cpu Haswell" shows this in the logs:

     [    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata;
                    please update microcode to version: 0xb2 (or later)

Even if you had a hypervisor that does _not_ virtualize the TSC deadline
and rather exposes the hardware one, it should be the hypervisors task
to update microcode and possibly hide the flag from CPUID.  So just
hide the message when running on _any_ hypervisor, not just those that
do not support the TSC deadline timer.

The older check still makes sense, so keep it.

Fixes: bd9240a18e ("x86/apic: Add TSC_DEADLINE quirk due to errata")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1507630377-54471-1-git-send-email-pbonzini@redhat.com

drm/msm: fix return value check in _msm_gem_kernel_new()

In case of error, the function msm_gem_get_vaddr() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().

Fixes: 8223286d62e2 ("drm/msm: Add a helper function for in-kernel
buffer allocations")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>

drm/msm: use proper memory barriers for updating tail/head

Fixes intermittent corruption of cmdstream dump.

Signed-off-by: Rob Clark <robdclark@gmail.com>

drm/msm/mdp5: add missing max size for 8x74 v1

This should have same max width as v2.

Signed-off-by: Rob Clark <robdclark@gmail.com>

KEYS: encrypted: fix dereference of NULL user_key_payload

A key of type "encrypted" references a "master key" which is used to
encrypt and decrypt the encrypted key's payload. However, when we
accessed the master key's payload, we failed to handle the case where
the master key has been revoked, which sets the payload pointer to NULL.
Note that request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.

Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.

This was an issue for master keys of type "user" only. Master keys can
also be of type "trusted", but those cannot be revoked.

Fixes: 7e70cb497850 ("keys: add new key-type encrypted")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org> [v2.6.38+]
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: David Safford <safford@us.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>

drm/amdgpu: fix placement flags in amdgpu_ttm_bind

Otherwise we lose the NO_EVICT flag and can try to evict pinned BOs.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node()

Stack trace output during a stress test:
[    4.310049] Freeing initrd memory: 22592K
[    4.310646] rtas_flash: no firmware flash support
[    4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
[    4.313465] cpuhp/64 cpuset=/ mems_allowed=0
[    4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
[    4.313588] Call Trace:
[    4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
[    4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
[    4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
[    4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
[    4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
[    4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
[    4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
[    4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
[    4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
[    4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
[    4.314313] Mem-Info:
[    4.314356] active_anon:0 inactive_anon:0 isolated_anon:0

core_imc_mem_init() at system boot use alloc_pages_node() to get memory
and alloc_pages_node() throws this stack dump when tried to allocate
memory from a node which has no memory behind it. Add a ___GFP_NOWARN
flag in allocation request as a fix.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Venkat R.B <venkatb3@in.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/perf: Fix for core/nest imc call trace on cpuhotplug

Nest/core pmu units are enabled only when it is used. A reference count is
maintained for the events which uses the nest/core pmu units. Currently in
*_imc_counters_release function a WARN() is used for notification of any
underflow of ref count.

The case where event ref count hit a negative value is, when perf session is
started, followed by offlining of all cpus in a given core.
i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the
ref->count to zero, if the current cpu which is about to offline is the last
cpu in a given core and make an OPAL call to disable the engine in that core.
And on perf session termination, perf->destroy (core_imc_counters_release) will
first decrement the ref->count for this core and based on the ref->count value
an opal call is made to disable the core-imc engine.
Now, since cpuhotplug path already clears the ref->count for core and disabled
the engine, perf->destroy() decrementing again at event termination make it
negative which in turn fires the WARN_ON. The same happens for nest units.

Add a check to see if the reference count is alreday zero, before decrementing
the count, so that the ref count will not hit a negative value.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

MAINTAINERS: Add Paul Mackerras as maintainer for KVM/powerpc

Paul is handling almost all of the powerpc related KVM patches nowadays,
so he should be mentioned in the MAINTAINERS file accordingly.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit

When KVM emulates an exit from L2 to L1, it loads L1 CR4 into the
guest CR4. Before this CR4 loading, the guest CR4 refers to L2
CR4. Because these two CR4's are in different levels of guest, we
should vmx_set_cr4() rather than kvm_set_cr4() here. The latter, which
is used to handle guest writes to its CR4, checks the guest change to
CR4 and may fail if the change is invalid.

The failure may cause trouble. Consider we start
  a L1 guest with non-zero L1 PCID in use,
     (i.e. L1 CR4.PCIDE == 1 && L1 CR3.PCID != 0)
and
  a L2 guest with L2 PCID disabled,
     (i.e. L2 CR4.PCIDE == 0)
and following events may happen:

1. If kvm_set_cr4() is used in load_vmcs12_host_state() to load L1 CR4
   into guest CR4 (in VMCS01) for L2 to L1 exit, it will fail because
   of PCID check. As a result, the guest CR4 recorded in L0 KVM (i.e.
   vcpu->arch.cr4) is left to the value of L2 CR4.

2. Later, if L1 attempts to change its CR4, e.g., clearing VMXE bit,
   kvm_set_cr4() in L0 KVM will think L1 also wants to enable PCID,
   because the wrong L2 CR4 is used by L0 KVM as L1 CR4. As L1
   CR3.PCID != 0, L0 KVM will inject GP to L1 guest.

Fixes: 4704d0befb072 ("KVM: nVMX: Exiting from L2 to L1")
Cc: qemu-stable@nongnu.org
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'fixes-for-v4.14-rc5' of git://git./linux/kernel/git/balbi/usb into usb-linus

Felipe writes:

USB: fixes for v4.14-rc5

A deadlock fix in dummy-hcd; Fixing a use-after-free bug in composite;
Renesas got another fix for DMA programming (this time around a fix
for receiving ZLP); Tegra PHY got a suspend fix; A memory leak on our
configfs ABI got plugged.

Other than these, a couple other minor fixes on usbtest.

iommu/exynos: Remove initconst attribute to avoid potential kernel oops

Exynos SYSMMU registers standard platform device with sysmmu_of_match
table, what means that this table is accessed every time a new platform
device is registered in a system. This might happen also after the boot,
so the table must not be attributed as initconst to avoid potential kernel
oops caused by access to freed memory.

Fixes: 6b21a5db3642 ("iommu/exynos: Support for device tree")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

Merge tag 'drm-misc-fixes-2017-10-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Core Changes:
- sync_file: Fix race in SYNC_IOC_FILE_INFO (John)
- atomic_helper: Give up reference taken in suspend helper (Jeffy)

Cc: John Einar Reitan <john.reitan@arm.com>
Cc: Jeffy Chen <jeffy.chen@rock-chips.com>
* tag 'drm-misc-fixes-2017-10-11' of git://anongit.freedesktop.org/drm/drm-misc:
sync_file: Return consistent status in SYNC_IOC_FILE_INFO
drm/atomic: Unref duplicated drm_atomic_state in drm_atomic_helper_resume()

ACPI: properties: Fix __acpi_node_get_property_reference() return codes

Fix more return codes for device property: Align return codes of
__acpi_node_get_property_reference().

In particular, what was missed previously:

-EPROTO could be returned in certain cases, now -EINVAL;
-EINVAL was returned if the property was not found, now -ENOENT;
-EINVAL was returned also if the index was higher than the number of
entries in a package, now -ENOENT.

Reported-by: Hyungwoo Yang <hyungwoo.yang@intel.com>
Fixes: 3e3119d3088f (device property: Introduce fwnode_property_get_reference_args)
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Tested-by: Hyungwoo Yang <hyungwoo.yang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: properties: Align return codes of __acpi_node_get_property_reference()

acpi_fwnode_get_reference_args(), the function implementing ACPI
support for fwnode_property_get_reference_args(), returns directly
error codes from __acpi_node_get_property_reference(). The latter
uses different error codes than the OF implementation. In particular,
the OF implementation uses -ENOENT to indicate that the property is
not found, a reference entry is empty and there are no more
references.

Document and align the error codes for property for
fwnode_property_get_reference_args() so that they match with
of_parse_phandle_with_args().

Fixes: 3e3119d3088f (device property: Introduce fwnode_property_get_reference_args)
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge tag 'rpmsg-v4.14-fixes' of git://github.com/andersson/remoteproc

Pull rpmsg fixes from Bjorn Andersson:
"This corrects two mistakes in the Qualcomm GLINK SMEM driver"

* tag 'rpmsg-v4.14-fixes' of git://github.com/andersson/remoteproc:
rpmsg: glink: Fix memory leak in qcom_glink_alloc_intent()
rpmsg: glink: Unlock on error in qcom_glink_request_intent()

Merge tag 'rproc-v4.14-fixes' of git://github.com/andersson/remoteproc

Pull remoteproc fixes from Bjorn Andersson:
"This fixes a couple of issues in the imx_rproc driver and corrects the
  Kconfig dependencies of the Qualcomm remoteproc drivers"

* tag 'rproc-v4.14-fixes' of git://github.com/andersson/remoteproc:
  remoteproc: imx_rproc: fix return value check in imx_rproc_addr_init()
  remoteproc: qcom: fix RPMSG_QCOM_GLINK_SMEM dependencies
  remoteproc: imx_rproc: fix a couple off by one bugs

scsi: fc: check for rport presence in fc_block_scsi_eh

Coverity-scan recently found a possible NULL pointer dereference in
fc_block_scsi_eh() as starget_to_rport() either returns the rport for
the startget or NULL.

While it is rather unlikely to have fc_block_scsi_eh() called without an
rport associated it's a good idea to catch potential misuses of the API
gracefully.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix uninitialized work element

Fixes following stack trace

kernel: Call Trace:
kernel: dump_stack+0x63/0x84
kernel: __warn+0xd1/0xf0
kernel: warn_slowpath_null+0x1d/0x20
kernel: __queue_work+0x37a/0x420
kernel: queue_work_on+0x27/0x40
kernel: queue_work+0x14/0x20 [qla2xxx]
kernel: schedule_work+0x13/0x20 [qla2xxx]
kernel: qla2x00_post_work+0xab/0xb0 [qla2xxx]
kernel: qla2x00_post_aen_work+0x3b/0x50 [qla2xxx]
kernel: qla2x00_async_event+0x20d/0x15d0 [qla2xxx]
kernel: ? lock_timer_base+0x7d/0xa0
kernel: qla24xx_intr_handler+0x1da/0x310 [qla2xxx]
kernel: qla2x00_poll+0x36/0x60 [qla2xxx]
kernel: qla2x00_mailbox_command+0x659/0xec0 [qla2xxx]
kernel: ? proc_create_data+0x7a/0xd0
kernel: qla25xx_init_rsp_que+0x15b/0x240 [qla2xxx]
kernel: ? request_irq+0x14/0x20 [qla2xxx]
kernel: qla25xx_create_rsp_que+0x256/0x3c0 [qla2xxx]
kernel: qla2xxx_create_qpair+0x2af/0x5b0 [qla2xxx]
kernel: qla2x00_probe_one+0x1107/0x1c30 [qla2xxx]

Fixes: ec7193e26055 ("qla2xxx: Fix delayed response to command for loop mode/direct connect.")
Cc: <stable@vger.kernel.org> # 4.13
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

remoteproc: imx_rproc: fix return value check in imx_rproc_addr_init()

In case of error, the function devm_ioremap() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should
be replaced with NULL test.

Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

drm/i915/bios: parse DDI ports also for CHV for HDMI DDC pin and DP AUX channel

While technically CHV isn't DDI, we do look at the VBT based DDI port
info for HDMI DDC pin and DP AUX channel. (We call these "alternate",
but they're really just something that aren't platform defaults.)

In commit e4ab73a13291 ("drm/i915: Respect alternate_ddc_pin for all DDI
ports") Ville writes, "IIRC there may be CHV system that might actually
need this."

I'm not sure why there couldn't be even more platforms that need this,
but start conservative, and parse the info for CHV in addition to DDI.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100553
Reported-by: Marek Wilczewski <mw@3cte.pl>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/d0815082cb98487618429b62414854137049b888.1506586821.git.jani.nikula@intel.com
(cherry picked from commit 348e4058ebf53904e817eec7a1b25327143c2ed2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

scsi: libiscsi: fix shifting of DID_REQUEUE host byte

The SCSI host byte should be shifted left by 16 in order to have
scsi_decide_disposition() do the right thing (.i.e. requeue the
command).

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 661134ad3765 ("[SCSI] libiscsi, bnx2i: make bound ep check common")
Cc: Lee Duncan <lduncan@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Bart Van Assche <Bart.VanAssche@sandisk.com>
Cc: Chris Leech <cleech@redhat.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

xfs: handle error if xfs_btree_get_bufs fails

Jason reported that a corrupted filesystem failed to replay
the log with a metadata block out of bounds warning:

XFS (dm-2): _xfs_buf_find: Block out of range: block 0x80270fff8, EOFS 0x9c40000

_xfs_buf_find() and xfs_btree_get_bufs() return NULL if
that happens, and then when xfs_alloc_fix_freelist() calls
xfs_trans_binval() on that NULL bp, we oops with:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8

We don't handle _xfs_buf_find errors very well, every
caller higher up the stack gets to guess at why it failed.
But we should at least handle it somehow, so return
EFSCORRUPTED here.

Reported-by: Jason L Tibbitts III <tibbs@math.uh.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: reinit btree pointer on attr tree inactivation walk

xfs_attr3_root_inactive() walks the attr fork tree to invalidate the
associated blocks. xfs_attr3_node_inactive() recursively descends
from internal blocks to leaf blocks, caching block address values
along the way to revisit parent blocks, locate the next entry and
descend down that branch of the tree.

The code that attempts to reread the parent block is unsafe because
it assumes that the local xfs_da_node_entry pointer remains valid
after an xfs_trans_brelse() and re-read of the parent buffer. Under
heavy memory pressure, it is possible that the buffer has been
reclaimed and reallocated by the time the parent block is reread.
This means that 'btree' can point to an invalid memory address, lead
to a random/garbage value for child_fsb and cause the subsequent
read of the attr fork to go off the rails and return a NULL buffer
for an attr fork offset that is most likely not allocated.

Note that this problem can be manufactured by setting
XFS_ATTR_BTREE_REF to 0 to prevent LRU caching of attr buffers,
creating a file with a multi-level attr fork and removing it to
trigger inactivation.

To address this problem, reinit the node/btree pointers to the
parent buffer after it has been re-read. This ensures btree points
to a valid record and allows the walk to proceed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: Fix bool initialization/comparison

Bool initializations should use true and false. Bool tests don't need
comparisons.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: don't change inode mode if ACL update fails

If we get ENOSPC half way through setting the ACL, the inode mode
can still be changed even though the ACL does not exist. Reorder the
operation to only change the mode of the inode if the ACL is set
correctly.

Whilst this does not fix the problem with crash consistency (that requires
attribute addition to be a deferred op) it does prevent ENOSPC and other
non-fatal errors setting an xattr to be handled sanely.

This fixes xfstests generic/449.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: move more RT specific code under CONFIG_XFS_RT

Various utility functions and interfaces that iterate internal
devices try to reference the realtime device even when RT support is
not compiled into the kernel.

Make sure this code is excluded from the CONFIG_XFS_RT=n build,
and where appropriate stub functions to return fatal errors if
they ever get called when RT support is not present.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: Don't log uninitialised fields in inode structures

Prevent kmemcheck from throwing warnings about reading uninitialised
memory when formatting inodes into the incore log buffer. There are
several issues here - we don't always log all the fields in the
inode log format item, and we never log the inode the
di_next_unlinked field.

In the case of the inode log format item, this is exacerbated
by the old xfs_inode_log_format structure padding issue. Hence make
the padded, 64 bit aligned version of the structure the one we always
use for formatting the log and get rid of the 64 bit variant. This
means we'll always log the 64-bit version and so recovery only needs
to convert from the unpadded 32 bit version from older 32 bit
kernels.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

media: dvb_frontend: only use kref after initialized

As reported by Laurent, when a DVB frontend need to register
two drivers (e. g. a tuner and a demod), if the second driver
fails to register (for example because it was not compiled),
the error handling logic frees the frontend by calling
dvb_frontend_detach(). That used to work fine, but changeset
1f862a68df24 ("[media] dvb_frontend: move kref to struct dvb_frontend")
added a kref at struct dvb_frontend. So, now, instead of just
freeing the data, the error handling do a kref_put().

That works fine only after dvb_register_frontend() succeeds.

While it would be possible to add a helper function that
would be initializing earlier the kref, that would require
changing every single DVB frontend on non-trivial ways, and
would make frontends different than other drivers.

So, instead of doing that, let's focus on the real issue:
only call kref_put() after kref_init(). That's easy to
check, as, when the dvb frontend is successfuly registered,
it will allocate its own private struct. So, if such
struct is allocated, it means that it is safe to use
kref_put(). If not, then nobody is using yet the frontend,
and it is safe to just deallocate it.

Fixes: 1f862a68df24 ("[media] dvb_frontend: move kref to struct dvb_frontend")
Reported-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

9p: set page uptodate when required in write_end()

Commit 77469c3f570 prevented setting the page as uptodate when we wrote
the right amount of data, fix that.

Fixes: 77469c3f570 ("9p: saner ->write_end() on failing copy into non-uptodate page")
Reviewed-by: Jan Kara <jack@suse.com>
Signed-off-by: Alexander Levin <alexander.levin@verizon.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'gpio-v4.14-2' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"Here are some smallish GPIO fixes for v4.14. Like with pin control:
  some build/Kconfig noise and one serious bug in a specific driver.

   - Three Kconfig/build warning fixes

   - A fix for lost edge IRQs in the OMAP driver"

* tag 'gpio-v4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: omap: Fix lost edge interrupts
  gpio: omap: omap_gpio_show_rev is not __init
  gpio: acpi: work around false-positive -Wstring-overflow warning
  gpio: thunderx: select IRQ_DOMAIN_HIERARCHY instead of depends on

Merge tag 'pinctrl-v4.14-3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"Two small things and a slightly larger thing in the Intel Cherryview.

   - Fix two build problems

   - Fix a regression on the Intel Cherryview interrupt path"

* tag 'pinctrl-v4.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: cherryview: fix issues caused by dynamic gpio irqs mapping
  pinctrl/amd: Fix build dependency on pinmux code
  pinctrl: bcm2835: fix build warning in bcm2835_gpio_irq_handle_bank

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"Fairly old DIO bug caught by Andreas (3.10+) and several slightly
  younger blk_rq_map_user_iov() bugs, both on map and copy codepaths
  (Vitaly and me)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  bio_copy_user_iov(): don't ignore ->iov_offset
  more bio_map_user_iov() leak fixes
  fix unbalanced page refcounting in bio_map_user_iov
  direct-io: Prevent NULL pointer access in submit_page_section

x86/mm: Disable various instrumentations of mm/mem_encrypt.c and mm/tlb.c

Some routines in mem_encrypt.c are called very early in the boot process,
e.g. sme_enable(). When CONFIG_KCOV=y is defined the resulting code added
to sme_enable() (and others) for KCOV instrumentation results in a kernel
crash. Disable the KCOV instrumentation for mem_encrypt.c by adding
KCOV_INSTRUMENT_mem_encrypt.o := n to arch/x86/mm/Makefile.

In order to avoid other possible early boot issues, model mem_encrypt.c
after head64.c in regards to tools. In addition to disabling KCOV as
stated above and a previous patch that disables branch profiling, also
remove the "-pg" CFLAG if CONFIG_FUNCTION_TRACER is enabled and set
KASAN_SANITIZE to "n", each of which are done on a file basis.

Reported-by: kernel test robot <lkp@01.org>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171010194504.18887.38053.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

ALSA: caiaq: Fix stray URB at probe error path

caiaq driver doesn't kill the URB properly at its error path during
the probe, which may lead to a use-after-free error later. This patch
addresses it.

Reported-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

HID: hid-elecom: extend to fix descriptor for HUGE trackball

In addition to DEFT, Elecom introduced a larger trackball called HUGE, in
both wired (M-HT1URBK) and wireless (M-HT1DRBK) versions. It has the same
buttons and behavior as the DEFT. This patch adds the two relevant USB IDs
to enable operation of the three Fn buttons on the top of the device.

Cc: Diego Elio Petteno <flameeyes@flameeyes.eu>
Signed-off-by: Alex Manoussakis <amanou@gnu.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

HID: usbhid: fix out-of-bounds bug

The hid descriptor identifies the length and type of subordinate
descriptors for a device. If the received hid descriptor is smaller than
the size of the struct hid_descriptor, it is possible to cause
out-of-bounds.

In addition, if bNumDescriptors of the hid descriptor have an incorrect
value, this can also cause out-of-bounds while approaching hdesc->desc[n].

So check the size of hid descriptor and bNumDescriptors.

BUG: KASAN: slab-out-of-bounds in usbhid_parse+0x9b1/0xa20
Read of size 1 at addr ffff88006c5f8edf by task kworker/1:2/1261

CPU: 1 PID: 1261 Comm: kworker/1:2 Not tainted
4.14.0-rc1-42251-gebb2c2437d80 #169
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x292/0x395 lib/dump_stack.c:52
print_address_description+0x78/0x280 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351
kasan_report+0x22f/0x340 mm/kasan/report.c:409
__asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:427
usbhid_parse+0x9b1/0xa20 drivers/hid/usbhid/hid-core.c:1004
hid_add_device+0x16b/0xb30 drivers/hid/hid-core.c:2944
usbhid_probe+0xc28/0x1100 drivers/hid/usbhid/hid-core.c:1369
usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
really_probe drivers/base/dd.c:413
driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
__device_attach_driver+0x230/0x290 drivers/base/dd.c:653
bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
__device_attach+0x26e/0x3d0 drivers/base/dd.c:710
device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
device_add+0xd0b/0x1660 drivers/base/core.c:1835
usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932
generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174
usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266
really_probe drivers/base/dd.c:413
driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
__device_attach_driver+0x230/0x290 drivers/base/dd.c:653
bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
__device_attach+0x26e/0x3d0 drivers/base/dd.c:710
device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
device_add+0xd0b/0x1660 drivers/base/core.c:1835
usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457
hub_port_connect drivers/usb/core/hub.c:4903
hub_port_connect_change drivers/usb/core/hub.c:5009
port_event drivers/usb/core/hub.c:5115
hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
worker_thread+0x221/0x1850 kernel/workqueue.c:2253
kthread+0x3a1/0x470 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

Cc: stable@vger.kernel.org
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Jaejoong Kim <climbbb.kim@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

livepatch: unpatch all klp_objects if klp_module_coming fails

When an incoming module is considered for livepatching by
klp_module_coming(), it iterates over multiple patches and multiple
kernel objects in this order:

list_for_each_entry(patch, &klp_patches, list) {
klp_for_each_object(patch, obj) {

which means that if one of the kernel objects fails to patch,
klp_module_coming()'s error path needs to unpatch and cleanup any kernel
objects that were already patched by a previous patch.

Reported-by: Miroslav Benes <mbenes@suse.cz>
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

usb: usbtest: fix NULL pointer dereference

If the usbtest driver encounters a device with an IN bulk endpoint but
no OUT bulk endpoint, it will try to dereference a NULL pointer
(out->desc.bEndpointAddress). The problem can be solved by adding a
missing test.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

usb: gadget: configfs: Fix memory leak of interface directory data

Kmemleak checking configuration reports a memory leak in
usb_os_desc_prepare_interf_dir function when rndis function
instance is freed and then allocated again. For example, this
happens with FunctionFS driver with RNDIS function enabled
when "ffs-test" test application is run several times in a row.

The data for intermediate "os_desc" group for interface directories
is allocated as a single VLA chunk and (after a change of default
groups handling) is not ever freed and actually not stored anywhere
besides inside a list of default groups of a parent group.

The fix is to make usb_os_desc_prepare_interf_dir function return
a pointer to allocated data (as a pointer to the first VLA item)
instead of (an unused) integer and to make the caller component
(currently the only one is RNDIS function) responsible for storing
the pointer and freeing the memory when appropriate.

Fixes: 1ae1602de028 ("configfs: switch ->default groups to a linked list")
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options

KASAN enabled configuration reports an error

    BUG: KASAN: use-after-free in usb_composite_overwrite_options+...
                [libcomposite] at addr ...
    Read of size 1 by task ...

when some driver is un-bound and then bound again.
For example, this happens with FunctionFS driver when "ffs-test"
test application is run several times in a row.

If the driver has empty manufacturer ID string in initial static data,
it is then replaced with generated string. After driver unbinding
the generated string is freed, but the driver data still keep that
pointer. And if the driver is then bound again, that pointer
is re-used for string emptiness check.

The fix is to clean up the driver string data upon its unbinding
to drop the pointer to freed memory.

Fixes: cc2683c318a5 ("usb: gadget: Provide a default implementation of default manufacturer string")
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

usb: misc: usbtest: Fix overflow in usbtest_do_ioctl()

There used to be a test against "if (param->sglen > MAX_SGLEN)" but it
was removed during a refactor. It leads to an integer overflow and a
stack overflow in test_queue() if we try to create a too large urbs[]
array on the stack.

There is a second integer overflow in test_queue() as well if
"param->iterations" is too high. I don't immediately see that it's
harmful but I've added a check to prevent it and silence the static
checker warning.

Fixes: 18fc4ebdc705 ("usb: misc: usbtest: Remove timeval usage")
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet

The DREQE bit of the DnFIFOSEL should be set to 1 after the DE bit of
USB-DMAC on R-Car SoCs is set to 1 after the USB-DMAC received a
zero-length packet. Otherwise, a transfer completion interruption
of USB-DMAC doesn't happen. Even if the driver changes the sequence,
normal operations (transmit/receive without zero-length packet) will
not cause any side-effects. So, this patch fixes the sequence anyway.

Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
[shimoda: revise the commit log]
Fixes: e73a9891b3a1 ("usb: renesas_usbhs: add DMAEngine support")
Cc: <stable@vger.kernel.org> # v3.1+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

USB: dummy-hcd: Fix deadlock caused by disconnect detection

The dummy-hcd driver calls the gadget driver's disconnect callback
under the wrong conditions.  It should invoke the callback when Vbus
power is turned off, but instead it does so when the D+ pullup is
turned off.

This can cause a deadlock in the composite core when a gadget driver
is unregistered:

[   88.361471] ============================================
[   88.362014] WARNING: possible recursive locking detected
[   88.362580] 4.14.0-rc2+ #9 Not tainted
[   88.363010] --------------------------------------------
[   88.363561] v4l_id/526 is trying to acquire lock:
[   88.364062]  (&(&cdev->lock)->rlock){....}, at: [<ffffffffa0547e03>] composite_disconnect+0x43/0x100 [libcomposite]
[   88.365051]
[   88.365051] but task is already holding lock:
[   88.365826]  (&(&cdev->lock)->rlock){....}, at: [<ffffffffa0547b09>] usb_function_deactivate+0x29/0x80 [libcomposite]
[   88.366858]
[   88.366858] other info that might help us debug this:
[   88.368301]  Possible unsafe locking scenario:
[   88.368301]
[   88.369304]        CPU0
[   88.369701]        ----
[   88.370101]   lock(&(&cdev->lock)->rlock);
[   88.370623]   lock(&(&cdev->lock)->rlock);
[   88.371145]
[   88.371145]  *** DEADLOCK ***
[   88.371145]
[   88.372211]  May be due to missing lock nesting notation
[   88.372211]
[   88.373191] 2 locks held by v4l_id/526:
[   88.373715]  #0:  (&(&cdev->lock)->rlock){....}, at: [<ffffffffa0547b09>] usb_function_deactivate+0x29/0x80 [libcomposite]
[   88.374814]  #1:  (&(&dum_hcd->dum->lock)->rlock){....}, at: [<ffffffffa05bd48d>] dummy_pullup+0x7d/0xf0 [dummy_hcd]
[   88.376289]
[   88.376289] stack backtrace:
[   88.377726] CPU: 0 PID: 526 Comm: v4l_id Not tainted 4.14.0-rc2+ #9
[   88.378557] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   88.379504] Call Trace:
[   88.380019]  dump_stack+0x86/0xc7
[   88.380605]  __lock_acquire+0x841/0x1120
[   88.381252]  lock_acquire+0xd5/0x1c0
[   88.381865]  ? composite_disconnect+0x43/0x100 [libcomposite]
[   88.382668]  _raw_spin_lock_irqsave+0x40/0x54
[   88.383357]  ? composite_disconnect+0x43/0x100 [libcomposite]
[   88.384290]  composite_disconnect+0x43/0x100 [libcomposite]
[   88.385490]  set_link_state+0x2d4/0x3c0 [dummy_hcd]
[   88.386436]  dummy_pullup+0xa7/0xf0 [dummy_hcd]
[   88.387195]  usb_gadget_disconnect+0xd8/0x160 [udc_core]
[   88.387990]  usb_gadget_deactivate+0xd3/0x160 [udc_core]
[   88.388793]  usb_function_deactivate+0x64/0x80 [libcomposite]
[   88.389628]  uvc_function_disconnect+0x1e/0x40 [usb_f_uvc]

This patch changes the code to test the port-power status bit rather
than the port-connect status bit when deciding whether to isue the
callback.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: David Tulloh <david@tulloh.id.au>
CC: <stable@vger.kernel.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

usb: phy: tegra: Fix phy suspend for UDC

Commit dfebb5f43a78 ("usb: chipidea: Add support for Tegra20/30/114/124")
added UDC support for Tegra but with UDC support enabled, is was found
that Tegra30, Tegra114 and Tegra124 would hang on entry to suspend.

The hang occurred during the suspend of the USB PHY when the Tegra PHY
driver attempted to disable the PHY clock. The problem is that before
the Tegra PHY driver is suspended, the chipidea driver already disabled
the PHY clock and when the Tegra PHY driver suspended, it could not read
DEVLC register and caused the device to hang.

The Tegra USB PHY driver is used by both the Tegra EHCI driver and now
the chipidea UDC driver and so simply removing the disabling of the PHY
clock from the USB PHY driver would not work for the Tegra EHCI driver.
Fortunately, the status of the USB PHY clock can be read from the
USB_SUSP_CTRL register and therefore, to workaround this issue, simply
poll the register prior to disabling the clock in USB PHY driver to see
if clock gating has already been initiated. Please note that it can take
a few uS for the clock to disable and so simply reading this status
register once on entry is not sufficient.

Similarly when turning on the PHY clock, it is possible that the clock
is already enabled or in the process of being enabled, and so check for
this when enabling the PHY.

Please note that no issues are seen with Tegra20 because it has a slightly
different PHY to Tegra30/114/124.

Fixes: dfebb5f43a78 ("usb: chipidea: Add support for Tegra20/30/114/124")
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>

gpu: ipu-v3: pre: implement workaround for ERR009624

The PRE has a bug where a software write to the CTRL register can block
the setting of the ENABLE bit by the hardware in auto repeat mode. When
this happens the PRE will fail to handle new jobs. To work around this
software must not write to CTRL register when the PRE store engine is
inside the unsafe window, where a hardware update to the ENABLE bit
may happen.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
[p.zabel@pengutronix.de: rebased before PRE tiled prefetch support]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>

gpu: ipu-v3: prg: wait for double buffers to be filled on channel startup

Wait for both double buffer to be filled when first starting a channel.
This makes channel startup a lot more reliable, probably because it allows
the internal state machine to settle before the requests from the IPU are
coming in.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
[p.zabel@pengutronix.de: rebased before switch to runtime PM]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>

gpu: ipu-v3: Allow channel burst locking on i.MX6 only

The IDMAC_LOCK_EN registers on i.MX51 have a different layout, and on
i.MX53 enabling the lock feature causes bursts to get lost. Restrict
enabling the burst lock feature to i.MX6.

Reported-by: Patrick Brünn <P.Bruenn@beckhoff.com>
Fixes: 790cb4c7c954 ("drm/imx: lock scanout transfers for consecutive bursts")
Tested-by: Patrick Brünn <P.Bruenn@beckhoff.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>

ALSA: seq: Fix use-after-free at creating a port

There is a potential race window opened at creating and deleting a
port via ioctl, as spotted by fuzzing.  snd_seq_create_port() creates
a port object and returns its pointer, but it doesn't take the
refcount, thus it can be deleted immediately by another thread.
Meanwhile, snd_seq_ioctl_create_port() still calls the function
snd_seq_system_client_ev_port_start() with the created port object
that is being deleted, and this triggers use-after-free like:

BUG: KASAN: use-after-free in snd_seq_ioctl_create_port+0x504/0x630 [snd_seq] at addr ffff8801f2241cb1
=============================================================================
BUG kmalloc-512 (Tainted: G    B          ): kasan: bad access detected
-----------------------------------------------------------------------------
INFO: Allocated in snd_seq_create_port+0x94/0x9b0 [snd_seq] age=1 cpu=3 pid=4511
___slab_alloc+0x425/0x460
__slab_alloc+0x20/0x40
   kmem_cache_alloc_trace+0x150/0x190
snd_seq_create_port+0x94/0x9b0 [snd_seq]
snd_seq_ioctl_create_port+0xd1/0x630 [snd_seq]
snd_seq_do_ioctl+0x11c/0x190 [snd_seq]
snd_seq_ioctl+0x40/0x80 [snd_seq]
do_vfs_ioctl+0x54b/0xda0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x16/0x75
INFO: Freed in port_delete+0x136/0x1a0 [snd_seq] age=1 cpu=2 pid=4717
__slab_free+0x204/0x310
kfree+0x15f/0x180
port_delete+0x136/0x1a0 [snd_seq]
snd_seq_delete_port+0x235/0x350 [snd_seq]
snd_seq_ioctl_delete_port+0xc8/0x180 [snd_seq]
snd_seq_do_ioctl+0x11c/0x190 [snd_seq]
snd_seq_ioctl+0x40/0x80 [snd_seq]
do_vfs_ioctl+0x54b/0xda0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x16/0x75
Call Trace:
  [<ffffffff81b03781>] dump_stack+0x63/0x82
  [<ffffffff81531b3b>] print_trailer+0xfb/0x160
  [<ffffffff81536db4>] object_err+0x34/0x40
  [<ffffffff815392d3>] kasan_report.part.2+0x223/0x520
  [<ffffffffa07aadf4>] ? snd_seq_ioctl_create_port+0x504/0x630 [snd_seq]
  [<ffffffff815395fe>] __asan_report_load1_noabort+0x2e/0x30
  [<ffffffffa07aadf4>] snd_seq_ioctl_create_port+0x504/0x630 [snd_seq]
  [<ffffffffa07aa8f0>] ? snd_seq_ioctl_delete_port+0x180/0x180 [snd_seq]
  [<ffffffff8136be50>] ? taskstats_exit+0xbc0/0xbc0
  [<ffffffffa07abc5c>] snd_seq_do_ioctl+0x11c/0x190 [snd_seq]
  [<ffffffffa07abd10>] snd_seq_ioctl+0x40/0x80 [snd_seq]
  [<ffffffff8136d433>] ? acct_account_cputime+0x63/0x80
  [<ffffffff815b515b>] do_vfs_ioctl+0x54b/0xda0
  .....

We may fix this in a few different ways, and in this patch, it's fixed
simply by taking the refcount properly at snd_seq_create_port() and
letting the caller unref the object after use.  Also, there is another
potential use-after-free by sprintf() call in snd_seq_create_port(),
and this is moved inside the lock.

This fix covers CVE-2017-15265.

Reported-and-tested-by: Michael23 Yu <ycqzsy@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

bio_copy_user_iov(): don't ignore ->iov_offset

Since "block: support large requests in blk_rq_map_user_iov" we
started to call it with partially drained iter; that works fine
on the write side, but reads create a copy of iter for completion
time. And that needs to take the possibility of ->iov_iter != 0
into account...

Cc: stable@vger.kernel.org #v4.5+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

more bio_map_user_iov() leak fixes

we need to take care of failure exit as well - pages already
in bio should be dropped by analogue of bio_unmap_pages(),
since their refcounts had been bumped only once per reference
in bio.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix unbalanced page refcounting in bio_map_user_iov

bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
IO vector has small consecutive buffers belonging to the same page.
bio_add_pc_page merges them into one, but the page reference is never
dropped.

Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

direct-io: Prevent NULL pointer access in submit_page_section

In the code added to function submit_page_section by commit b1058b981,
sdio->bio can currently be NULL when calling dio_bio_submit. This then
leads to a NULL pointer access in dio_bio_submit, so check for a NULL
bio in submit_page_section before trying to submit it instead.

Fixes xfstest generic/250 on gfs2.

Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

PCI: aardvark: Move to struct pci_host_bridge IRQ mapping functions

struct pci_host_bridge gained hooks to map/swizzle IRQs, so that the IRQ
mapping can be done automatically by PCI core code through the
pci_assign_irq() function instead of resorting to arch-specific
implementation callbacks to carry out the same task which force PCI host
bridge drivers implementation to implement per-arch kludges to carry out a
task that is inherently architecture agnostic.

Commit 769b461fc0c0 ("arm64: PCI: Drop DT IRQ allocation from
pcibios_alloc_irq()") was assuming all PCI host controller drivers had been
converted to use ->map_irq(), but that wasn't the case: pci-aardvark had
not been converted. Due to this, it broke the support for legacy PCI
interrupts when using the pci-aardvark driver (used on Marvell Armada 3720
platforms).

In order to fix this, we make sure the ->map_irq and ->swizzle_irq fields
of pci_host_bridge are properly filled in.

Fixes: 769b461fc0c0 ("arm64: PCI: Drop DT IRQ allocation from pcibios_alloc_irq()")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.13+

Revert "PCI: tegra: Do not allocate MSI target memory"

This reverts commit d7bd554f27c942e6b8b54100b4044f9be1038edf.

It turns out that Tegra20 has a bug in the implementation of the MSI
target address register (which is worked around by the existence of the
struct tegra_pcie_soc.msi_base_shift parameter) that restricts the MSI
target memory to the lower 32 bits of physical memory on that particular
generation. The offending patch causes a regression on TrimSlice, which
is a Tegra20-based device and has a PCI network interface card.

An initial, simpler fix was to change the MSI target address for Tegra20
only, but it was pointed out that the offending commit also prevents the
use of 32-bit only MSI capable devices, even on later chips. Technically
this was never guaranteed to work with the prior code in the first place
because the allocated page could have resided beyond the 4 GiB boundary,
but it is still possible that this could've introduced a regression.

The proper fix that was settled on is to select a fixed address within
the lowest 32 bits of physical address space that is otherwise unused,
but testing of that patch has provided mixed results that are not fully
understood yet.

Given all of the above and the relative urgency to get this fixed in
v4.13, revert the offending commit until a universal fix is found.

Fixes: d7bd554f27c9 ("PCI: tegra: Do not allocate MSI target memory")
Reported-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Reported-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # 4.13.x

Merge tag 'seccomp-v4.14-rc5' of git://git./linux/kernel/git/kees/linux

Pull seccomp fixlet from Kees Cook:
"Minor seccomp fix for v4.14-rc5. I debated sending this at all for
  v4.14, but since it fixes a minor issue in the prior fix, which also
  went to -stable, it seemed better to just get all of it cleaned up
  right now.

   - fix missed "static" to avoid Sparse warning (Colin King)"

* tag 'seccomp-v4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: make function __get_seccomp_filter static

Merge tag 'nfsd-4.14-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fix from Bruce Fields:
"One fix for a 4.14 regression, and one minor fix to the MAINTAINERs
  file. (I was weirdly flattered by the idea that lots of random people
  suddenly seemed to think Jeff and I were VFS experts. Turns out it was
  just a typo)"

* tag 'nfsd-4.14-1' of git://linux-nfs.org/~bfields/linux:
  nfsd4: define nfsd4_secinfo_no_name_release()
  MAINTAINERS: associate linux/fs.h with VFS instead of file locking

seccomp: make function __get_seccomp_filter static

The function __get_seccomp_filter is local to the source and does
not need to be in global scope, so make it static.

Cleans up sparse warning:
symbol '__get_seccomp_filter' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Fixes: 66a733ea6b61 ("seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>

remoteproc: qcom: fix RPMSG_QCOM_GLINK_SMEM dependencies

When RPMSG_QCOM_GLINK_SMEM=m and one driver causes the qcom_common.c file
to be compiled as built-in, we get a link error:

drivers/remoteproc/qcom_common.o: In function `glink_subdev_remove':
qcom_common.c:(.text+0x130): undefined reference to `qcom_glink_smem_unregister'
qcom_common.c:(.text+0x130): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `qcom_glink_smem_unregister'
drivers/remoteproc/qcom_common.o: In function `glink_subdev_probe':
qcom_common.c:(.text+0x160): undefined reference to `qcom_glink_smem_register'
qcom_common.c:(.text+0x160): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `qcom_glink_smem_register'

Out of the three PIL driver instances, QCOM_ADSP_PIL already has a
Kconfig dependency to prevent this from happening, but the other two
do not. This adds the same dependency there.

Fixes: eea07023e6d9 ("remoteproc: qcom: adsp: Allow defining GLINK edge")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

remoteproc: imx_rproc: fix a couple off by one bugs

The priv->mem[] array has IMX7D_RPROC_MEM_MAX elements so the > should
be >= to avoid writing one element beyond the end of the array.

Fixes: a0ff4aa6f010 ("remoteproc: imx_rproc: add a NXP/Freescale imx_rproc driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

rpmsg: glink: Fix memory leak in qcom_glink_alloc_intent()

We need to free "intent" and "intent->data" on a couple error paths.

Fixes: 933b45da5d1d ("rpmsg: glink: Add support for TX intents")
Acked-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

rpmsg: glink: Unlock on error in qcom_glink_request_intent()

If qcom_glink_tx() fails, then we need to unlock before returning the
error code.

Fixes: 27b9c5b66b23 ("rpmsg: glink: Request for intents when unavailable")
Acked-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Merge tag 'f2fs-for-4.14-rc5' of git://git./linux/kernel/git/jaegeuk/f2fs

Pull f2fs fix from Jaegeuk Kim:
"This contains one bug fix which causes a kernel panic during fstrim
introduced in 4.14-rc1"

* tag 'f2fs-for-4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: fix potential panic during fstrim

Merge tag 'linux-kselftest-4.14-rc5-fixes' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

- fix for x86: sysret_ss_attrs test build failure preventing the x86
   tests from running

- fix mqueue: fix regression in silencing test run output

* tag 'linux-kselftest-4.14-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: mqueue: fix regression in silencing output from RUN_TESTS
  selftests: x86: sysret_ss_attrs doesn't build on a PIE build

iommu/amd: Do not disable SWIOTLB if SME is active

When SME memory encryption is active it will rely on SWIOTLB to handle
DMA for devices that cannot support the addressing requirements of
having the encryption mask set in the physical address. The IOMMU
currently disables SWIOTLB if it is not running in passthrough mode.
This is not desired as non-PCI devices attempting DMA may fail. Update
the code to check if SME is active and not disable SWIOTLB.

Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

Merge tag 'perf-urgent-for-mingo-4.14-20171010' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Unbreak 'perf record' for arm/arm64 with events with explicit PMU (Mark Rutland)

- Add missing separator for "perf script -F ip,brstack" (and brstackoff) (Mark Santaniello)

- One line, comment only, sync kernel ABI header with tooling header (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

crypto: shash - Fix zero-length shash ahash digest crash

The shash ahash digest adaptor function may crash if given a
zero-length input together with a null SG list. This is because
it tries to read the SG list before looking at the length.

This patch fixes it by checking the length first.

Cc: <stable@vger.kernel.org>
Reported-by: Stephan Müller<smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Stephan Müller <smueller@chronox.de>

quota: Generate warnings for DQUOT_SPACE_NOFAIL allocations

Eryu has reported that since commit 7b9ca4c61bc2 "quota: Reduce
contention on dq_data_lock" test generic/233 occasionally fails. This is
caused by the fact that since that commit we don't generate warning and
set grace time for quota allocations that have DQUOT_SPACE_NOFAIL set
(these are for example some metadata allocations in ext4). We need these
allocations to behave regularly wrt warning generation and grace time
setting so fix the code to return to the original behavior.

Reported-and-tested-by: Eryu Guan <eguan@redhat.com>
CC: stable@vger.kernel.org
Fixes: 7b9ca4c61bc278b771fb57d6290a31ab1fc7fdac
Signed-off-by: Jan Kara <jack@suse.cz>

KVM: MMU: always terminate page walks at level 1

is_last_gpte() is not equivalent to the pseudo-code given in commit
6bb69c9b69c31 ("KVM: MMU: simplify last_pte_bitmap") because an incorrect
value of last_nonleaf_level may override the result even if level == 1.

It is critical for is_last_gpte() to return true on level == 1 to
terminate page walks. Otherwise memory corruption may occur as level
is used as an index to various data structures throughout the page
walking code. Even though the actual bug would be wherever the MMU is
initialized (as in the previous patch), be defensive and ensure here
that is_last_gpte() returns the correct value.

This patch is also enough to fix CVE-2017-12188.

Fixes: 6bb69c9b69c315200ddc2bc79aee14c0184cf5b2
Cc: stable@vger.kernel.org
Cc: Andy Honig <ahonig@google.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
[Panic if walk_addr_generic gets an incorrect level; this is a serious
bug and it's not worth a WARN_ON where the recovery path might hide
further exploitable issues; suggested by Andrew Honig. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: update last_nonleaf_level when initializing nested EPT

The function updates context->root_level but didn't call
update_last_nonleaf_level so the previous and potentially wrong value
was used for page walks. For example, a zero value of last_nonleaf_level
would allow a potential out-of-bounds access in arch/x86/mmu/paging_tmpl.h's
walk_addr_generic function (CVE-2017-12188).

Fixes: 155a97a3d7c78b46cef6f1a973c831bc5a4f82bb
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

xen/vcpu: Use a unified name about cpu hotplug state for pv and pvhvm

As xen_cpuhp_setup is called by PV and PVHVM, the name of "x86/xen/hvm_guest"
is confusing.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

ALSA: usb-audio: Kill stray URB at exiting

USB-audio driver may leave a stray URB for the mixer interrupt when it
exits by some error during probe.  This leads to a use-after-free
error as spotted by syzkaller like:
  ==================================================================
  BUG: KASAN: use-after-free in snd_usb_mixer_interrupt+0x604/0x6f0
  Call Trace:
   <IRQ>
   __dump_stack lib/dump_stack.c:16
   dump_stack+0x292/0x395 lib/dump_stack.c:52
   print_address_description+0x78/0x280 mm/kasan/report.c:252
   kasan_report_error mm/kasan/report.c:351
   kasan_report+0x23d/0x350 mm/kasan/report.c:409
   __asan_report_load8_noabort+0x19/0x20 mm/kasan/report.c:430
   snd_usb_mixer_interrupt+0x604/0x6f0 sound/usb/mixer.c:2490
   __usb_hcd_giveback_urb+0x2e0/0x650 drivers/usb/core/hcd.c:1779
   ....

  Allocated by task 1484:
   save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
   save_stack+0x43/0xd0 mm/kasan/kasan.c:447
   set_track mm/kasan/kasan.c:459
   kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
   kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772
   kmalloc ./include/linux/slab.h:493
   kzalloc ./include/linux/slab.h:666
   snd_usb_create_mixer+0x145/0x1010 sound/usb/mixer.c:2540
   create_standard_mixer_quirk+0x58/0x80 sound/usb/quirks.c:516
   snd_usb_create_quirk+0x92/0x100 sound/usb/quirks.c:560
   create_composite_quirk+0x1c4/0x3e0 sound/usb/quirks.c:59
   snd_usb_create_quirk+0x92/0x100 sound/usb/quirks.c:560
   usb_audio_probe+0x1040/0x2c10 sound/usb/card.c:618
   ....

  Freed by task 1484:
   save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
   save_stack+0x43/0xd0 mm/kasan/kasan.c:447
   set_track mm/kasan/kasan.c:459
   kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524
   slab_free_hook mm/slub.c:1390
   slab_free_freelist_hook mm/slub.c:1412
   slab_free mm/slub.c:2988
   kfree+0xf6/0x2f0 mm/slub.c:3919
   snd_usb_mixer_free+0x11a/0x160 sound/usb/mixer.c:2244
   snd_usb_mixer_dev_free+0x36/0x50 sound/usb/mixer.c:2250
   __snd_device_free+0x1ff/0x380 sound/core/device.c:91
   snd_device_free_all+0x8f/0xe0 sound/core/device.c:244
   snd_card_do_free sound/core/init.c:461
   release_card_device+0x47/0x170 sound/core/init.c:181
   device_release+0x13f/0x210 drivers/base/core.c:814
   ....

Actually such a URB is killed properly at disconnection when the
device gets probed successfully, and what we need is to apply it for
the error-path, too.

In this patch, we apply snd_usb_mixer_disconnect() at releasing.
Also introduce a new flag, disconnected, to struct usb_mixer_interface
for not performing the disconnection procedure twice.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

x86/hyperv: Fix hypercalls with extended CPU ranges for TLB flushing

Do not consider the fixed size of hv_vp_set when passing the variable
header size to hv_do_rep_hypercall().

The Hyper-V hypervisor specification states that for a hypercall with a
variable header only the size of the variable portion should be supplied
via the input control.

For HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX/LIST_EX calls that means the
fixed portion of hv_vp_set should not be considered.

That fixes random failures of some applications that are unexpectedly
killed with SIGBUS or SIGSEGV.

Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: Josh Poulson <jopoulso@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Fixes: 628f54cc6451 ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")
Link: http://lkml.kernel.org/r/1507210469-29065-1-git-send-email-marcelo.cerri@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/hyperv: Don't use percpu areas for pcpu_flush/pcpu_flush_ex structures

hv_do_hypercall() does virt_to_phys() translation and with some configs
(CONFIG_SLAB) this doesn't work for percpu areas, we pass wrong memory to
hypervisor and get #GP. We could use working slow_virt_to_phys() instead
but doing so kills the performance.

Move pcpu_flush/pcpu_flush_ex structures out of percpu areas and
allocate memory on first call. The additional level of indirection gives
us a small performance penalty, in future we may consider introducing
hypercall functions which avoid virt_to_phys() conversion and cache
physical addresses of pcpu_flush/pcpu_flush_ex structures somewhere.

Reported-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20171005113924.28021-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs

hv_flush_pcpu_ex structures are not cleared between calls for performance
reasons (they're variable size up to PAGE_SIZE each) but we must clear
hv_vp_set.bank_contents part of it to avoid flushing unneeded vCPUs. The
rest of the structure is formed correctly.

To do the clearing in an efficient way stash the maximum possible vCPU
number (this may differ from Linux CPU id).

Reported-by: Jork Loeser <Jork.Loeser@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20171006154854.18092-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/x86/intel/uncore: Fix memory leaks on allocation failures

Currently if an allocation fails then the error return paths
don't free up any currently allocated pmus[].boxes and pmus causing
a memory leak. Add an error clean up exit path that frees these
objects.

Detected by CoverityScan, CID#711632 ("Resource Leak")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Fixes: 087bfbb03269 ("perf/x86: Add generic Intel uncore PMU support")
Link: http://lkml.kernel.org/r/20171009172655.6132-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/unwind: Disable unwinder warnings on 32-bit

x86-32 doesn't have stack validation, so in most cases it doesn't make
sense to warn about bad frame pointers.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a69658760800bf281e6353248c23e0fa0acf5230.1507597785.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/unwind: Align stack pointer in unwinder dump

When printing the unwinder dump, the stack pointer could be unaligned,
for one of two reasons:

- stack corruption; or

- GCC created an unaligned stack.

There's no way for the unwinder to tell the difference between the two,
so we have to assume one or the other.  GCC unaligned stacks are very
rare, and have only been spotted before GCC 5.  Presumably, if we're
doing an unwinder stack dump, stack corruption is more likely than a
GCC unaligned stack.  So always align the stack before starting the
dump.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2f540c515946ab09ed267e1a1d6421202a0cce08.1507597785.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/unwind: Use MSB for frame pointer encoding on 32-bit

On x86-32, Tetsuo Handa and Fengguang Wu reported unwinder warnings
like:

  WARNING: kernel stack regs at f60bb9c8 in swapper:1 has bad 'bp' value 0ba00000

And also there were some stack dumps with a bunch of unreliable '?'
symbols after an apic_timer_interrupt symbol, meaning the unwinder got
confused when it tried to read the regs.

The cause of those issues is that, with GCC 4.8 (and possibly older),
there are cases where GCC misaligns the stack pointer in a leaf function
for no apparent reason:

  c124a388 <acpi_rs_move_data>:
  c124a388:       55                      push   %ebp
  c124a389:       89 e5                   mov    %esp,%ebp
  c124a38b:       57                      push   %edi
  c124a38c:       56                      push   %esi
  c124a38d:       89 d6                   mov    %edx,%esi
  c124a38f:       53                      push   %ebx
  c124a390:       31 db                   xor    %ebx,%ebx
  c124a392:       83 ec 03                sub    $0x3,%esp
  ...
  c124a3e3:       83 c4 03                add    $0x3,%esp
  c124a3e6:       5b                      pop    %ebx
  c124a3e7:       5e                      pop    %esi
  c124a3e8:       5f                      pop    %edi
  c124a3e9:       5d                      pop    %ebp
  c124a3ea:       c3                      ret

If an interrupt occurs in such a function, the regs on the stack will be
unaligned, which breaks the frame pointer encoding assumption.  So on
32-bit, use the MSB instead of the LSB to encode the regs.

This isn't an issue on 64-bit, because interrupts align the stack before
writing to it.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/279a26996a482ca716605c7dbc7f2db9d8d91e81.1507597785.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/unwind: Fix dereference of untrusted pointer

Tetsuo Handa and Fengguang Wu reported a panic in the unwinder:

  BUG: unable to handle kernel NULL pointer dereference at 000001f2
  IP: update_stack_state+0xd4/0x340
  *pde = 00000000

  Oops: 0000 [#1] PREEMPT SMP
  CPU: 0 PID: 18728 Comm: 01-cpu-hotplug Not tainted 4.13.0-rc4-00170-gb09be67 #592
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
  task: bb0b53c0 task.stack: bb3ac000
  EIP: update_stack_state+0xd4/0x340
  EFLAGS: 00010002 CPU: 0
  EAX: 0000a570 EBX: bb3adccb ECX: 0000f401 EDX: 0000a570
  ESI: 00000001 EDI: 000001ba EBP: bb3adc6b ESP: bb3adc3f
   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  CR0: 80050033 CR2: 000001f2 CR3: 0b3a7000 CR4: 00140690
  DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
  DR6: fffe0ff0 DR7: 00000400
  Call Trace:
   ? unwind_next_frame+0xea/0x400
   ? __unwind_start+0xf5/0x180
   ? __save_stack_trace+0x81/0x160
   ? save_stack_trace+0x20/0x30
   ? __lock_acquire+0xfa5/0x12f0
   ? lock_acquire+0x1c2/0x230
   ? tick_periodic+0x3a/0xf0
   ? _raw_spin_lock+0x42/0x50
   ? tick_periodic+0x3a/0xf0
   ? tick_periodic+0x3a/0xf0
   ? debug_smp_processor_id+0x12/0x20
   ? tick_handle_periodic+0x23/0xc0
   ? local_apic_timer_interrupt+0x63/0x70
   ? smp_trace_apic_timer_interrupt+0x235/0x6a0
   ? trace_apic_timer_interrupt+0x37/0x3c
   ? strrchr+0x23/0x50
  Code: 0f 95 c1 89 c7 89 45 e4 0f b6 c1 89 c6 89 45 dc 8b 04 85 98 cb 74 bc 88 4d e3 89 45 f0 83 c0 01 84 c9 89 04 b5 98 cb 74 bc 74 3b <8b> 47 38 8b 57 34 c6 43 1d 01 25 00 00 02 00 83 e2 03 09 d0 83
  EIP: update_stack_state+0xd4/0x340 SS:ESP: 0068:bb3adc3f
  CR2: 00000000000001f2
  ---[ end trace 0d147fd4aba8ff50 ]---
  Kernel panic - not syncing: Fatal exception in interrupt

On x86-32, after decoding a frame pointer to get a regs address,
regs_size() dereferences the regs pointer when it checks regs->cs to see
if the regs are user mode.  This is dangerous because it's possible that
what looks like a decoded frame pointer is actually a corrupt value, and
we don't want the unwinder to make things worse.

Instead of calling regs_size() on an unsafe pointer, just assume they're
kernel regs to start with.  Later, once it's safe to access the regs, we
can do the user mode check and corresponding safety check for the
remaining two regs.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 5ed8d8bb38c5 ("x86/unwind: Move common code into update_stack_state()")
Link: http://lkml.kernel.org/r/7f95b9a6993dec7674b3f3ab3dcd3294f7b9644d.1507597785.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

powerpc: Don't call lockdep_assert_cpus_held() from arch_update_cpu_topology()

It turns out that not all paths calling arch_update_cpu_topology() hold
cpu_hotplug_lock, but that's OK because those paths can't race with
any concurrent hotplug events.

Warnings were reported with the following trace:

  lockdep_assert_cpus_held
  arch_update_cpu_topology
  sched_init_domains
  sched_init_smp
  kernel_init_freeable
  kernel_init
  ret_from_kernel_thread

Which is safe because it's called early in boot when hotplug is not
live yet.

And also this trace:

  lockdep_assert_cpus_held
  arch_update_cpu_topology
  partition_sched_domains
  cpuset_update_active_cpus
  sched_cpu_deactivate
  cpuhp_invoke_callback
  cpuhp_down_callbacks
  cpuhp_thread_fun
  smpboot_thread_fn
  kthread
  ret_from_kernel_thread

Which is safe because it's called as part of CPU hotplug, so although
we don't hold the CPU hotplug lock, there is another thread driving
the CPU hotplug operation which does hold the lock, and there is no
race.

Thanks to tglx for deciphering it for us.

Fixes: 3e401f7a2e51 ("powerpc: Only obtain cpu_hotplug_lock if called by rtasd")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

powerpc/lib/sstep: Fix count leading zeros instructions

According to the GCC documentation, the behaviour of __builtin_clz()
and __builtin_clzl() is undefined if the value of the input argument
is zero. Without handling this special case, these builtins have been
used for emulating the following instructions:
* Count Leading Zeros Word (cntlzw[.])
* Count Leading Zeros Doubleword (cntlzd[.])

This fixes the emulated behaviour of these instructions by adding an
additional check for this special case.

Fixes: 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs")
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

sched/core: Ensure load_balance() respects the active_mask

While load_balance() masks the source CPUs against active_mask, it had
a hole against the destination CPU. Ensure the destination CPU is also
part of the 'domain-mask & active-mask' set.

Reported-by: Levin, Alexander (Sasha Levin) <alexander.levin@verizon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/core: Address more wake_affine() regressions

The trivial wake_affine_idle() implementation is very good for a
number of workloads, but it comes apart at the moment there are no
idle CPUs left, IOW. the overloaded case.

hackbench:

NO_WA_WEIGHT WA_WEIGHT

hackbench-20  : 7.362717561 seconds 6.450509391 seconds

(win)

netperf:

  NO_WA_WEIGHT WA_WEIGHT

TCP_SENDFILE-1 : Avg: 54524.6 Avg: 52224.3
TCP_SENDFILE-10 : Avg: 48185.2          Avg: 46504.3
TCP_SENDFILE-20 : Avg: 29031.2          Avg: 28610.3
TCP_SENDFILE-40 : Avg: 9819.72          Avg: 9253.12
TCP_SENDFILE-80 : Avg: 5355.3           Avg: 4687.4

TCP_STREAM-1 : Avg: 41448.3          Avg: 42254
TCP_STREAM-10 : Avg: 24123.2          Avg: 25847.9
TCP_STREAM-20 : Avg: 15834.5          Avg: 18374.4
TCP_STREAM-40 : Avg: 5583.91          Avg: 5599.57
TCP_STREAM-80 : Avg: 2329.66          Avg: 2726.41

TCP_RR-1 : Avg: 80473.5          Avg: 82638.8
TCP_RR-10 : Avg: 72660.5          Avg: 73265.1
TCP_RR-20 : Avg: 52607.1          Avg: 52634.5
TCP_RR-40 : Avg: 57199.2          Avg: 56302.3
TCP_RR-80 : Avg: 25330.3          Avg: 26867.9

UDP_RR-1 : Avg: 108266           Avg: 107844
UDP_RR-10 : Avg: 95480            Avg: 95245.2
UDP_RR-20 : Avg: 68770.8          Avg: 68673.7
UDP_RR-40 : Avg: 76231            Avg: 75419.1
UDP_RR-80 : Avg: 34578.3          Avg: 35639.1

UDP_STREAM-1 : Avg: 64684.3          Avg: 66606
UDP_STREAM-10 : Avg: 52701.2          Avg: 52959.5
UDP_STREAM-20 : Avg: 30376.4          Avg: 29704
UDP_STREAM-40 : Avg: 15685.8          Avg: 15266.5
UDP_STREAM-80 : Avg: 8415.13          Avg: 7388.97

(wins and losses)

sysbench:

    NO_WA_WEIGHT WA_WEIGHT

sysbench-mysql-2  :  2135.17 per sec. 2142.51 per sec.
sysbench-mysql-5  :  4809.68 per sec.            4800.19 per sec.
sysbench-mysql-10 :  9158.59 per sec.            9157.05 per sec.
sysbench-mysql-20 : 14570.70 per sec.           14543.55 per sec.
sysbench-mysql-40 : 22130.56 per sec.           22184.82 per sec.
sysbench-mysql-80 : 20995.56 per sec.           21904.18 per sec.

sysbench-psql-2   :  1679.58 per sec.            1705.06 per sec.
sysbench-psql-5   :  3797.69 per sec.            3879.93 per sec.
sysbench-psql-10  :  7253.22 per sec.            7258.06 per sec.
sysbench-psql-20  : 11166.75 per sec.           11220.00 per sec.
sysbench-psql-40  : 17277.28 per sec.           17359.78 per sec.
sysbench-psql-80  : 17112.44 per sec.           17221.16 per sec.

(increase on the top end)

tbench:

NO_WA_WEIGHT

Throughput 685.211 MB/sec   2 clients   2 procs  max_latency=0.123 ms
Throughput 1596.64 MB/sec   5 clients   5 procs  max_latency=0.119 ms
Throughput 2985.47 MB/sec  10 clients  10 procs  max_latency=0.262 ms
Throughput 4521.15 MB/sec  20 clients  20 procs  max_latency=0.506 ms
Throughput 9438.1  MB/sec  40 clients  40 procs  max_latency=2.052 ms
Throughput 8210.5  MB/sec  80 clients  80 procs  max_latency=8.310 ms

WA_WEIGHT

Throughput 697.292 MB/sec   2 clients   2 procs  max_latency=0.127 ms
Throughput 1596.48 MB/sec   5 clients   5 procs  max_latency=0.080 ms
Throughput 2975.22 MB/sec  10 clients  10 procs  max_latency=0.254 ms
Throughput 4575.14 MB/sec  20 clients  20 procs  max_latency=0.502 ms
Throughput 9468.65 MB/sec  40 clients  40 procs  max_latency=2.069 ms
Throughput 8631.73 MB/sec  80 clients  80 procs  max_latency=8.605 ms

(increase on the top end)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/core: Fix wake_affine() performance regression

Eric reported a sysbench regression against commit:

  3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()")

Similarly, Rik was looking at the NAS-lu.C benchmark, which regressed
against his v3.10 enterprise kernel.

PRE (current tip/master):

ivb-ep sysbench:

   2: [30 secs]     transactions:                        64110  (2136.94 per sec.)
   5: [30 secs]     transactions:                        143644 (4787.99 per sec.)
  10: [30 secs]     transactions:                        274298 (9142.93 per sec.)
  20: [30 secs]     transactions:                        418683 (13955.45 per sec.)
  40: [30 secs]     transactions:                        320731 (10690.15 per sec.)
  80: [30 secs]     transactions:                        355096 (11834.28 per sec.)

hsw-ex NAS:

OMP_PROC_BIND/lu.C.x_threads_144_run_1.log: Time in seconds =                    18.01
OMP_PROC_BIND/lu.C.x_threads_144_run_2.log: Time in seconds =                    17.89
OMP_PROC_BIND/lu.C.x_threads_144_run_3.log: Time in seconds =                    17.93
lu.C.x_threads_144_run_1.log: Time in seconds =                   434.68
lu.C.x_threads_144_run_2.log: Time in seconds =                   405.36
lu.C.x_threads_144_run_3.log: Time in seconds =                   433.83

POST (+patch):

ivb-ep sysbench:

   2: [30 secs]     transactions:                        64494  (2149.75 per sec.)
   5: [30 secs]     transactions:                        145114 (4836.99 per sec.)
  10: [30 secs]     transactions:                        278311 (9276.69 per sec.)
  20: [30 secs]     transactions:                        437169 (14571.60 per sec.)
  40: [30 secs]     transactions:                        669837 (22326.73 per sec.)
  80: [30 secs]     transactions:                        631739 (21055.88 per sec.)

hsw-ex NAS:

lu.C.x_threads_144_run_1.log: Time in seconds =                    23.36
lu.C.x_threads_144_run_2.log: Time in seconds =                    22.96
lu.C.x_threads_144_run_3.log: Time in seconds =                    22.52

This patch takes out all the shiny wake_affine() stuff and goes back to
utter basics. Between the two CPUs involved with the wakeup (the CPU
doing the wakeup and the CPU we ran on previously) pick the CPU we can
run on _now_.

This restores much of the regressions against the older kernels,
but leaves some ground in the overloaded case. The default-enabled
WA_WEIGHT (which will be introduced in the next patch) is an attempt
to address the overloaded situation.

Reported-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jinpuwang@gmail.com
Cc: vcaputo@pengaru.com
Fixes: 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()")
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/core: Fix cgroup time when scheduling descendants

Update cgroup time when an event is scheduled in by descendants.

Reviewed-and-tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: leilei.lin <leilei.lin@alibaba-inc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: brendan.d.gregg@gmail.com
Cc: yang_oliver@hotmail.com
Link: http://lkml.kernel.org/r/CALPjY3mkHiekRkRECzMi9G-bjUQOvOjVBAqxmWkTzc-g+0LwMg@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf/core: Avoid freeing static PMU contexts when PMU is unregistered

Since commit:

1fd7e4169954 ("perf/core: Remove perf_cpu_context::unique_pmu")

... when a PMU is unregistered then its associated ->pmu_cpu_context is
unconditionally freed. Whilst this is fine for dynamically allocated
context types (i.e. those registered using perf_invalid_context), this
causes a problem for sharing of static contexts such as
perf_{sw,hw}_context, which are used by multiple built-in PMUs and
effectively have a global lifetime.

Whilst testing the ARM SPE driver, which must use perf_sw_context to
support per-task AUX tracing, unregistering the driver as a result of a
module unload resulted in:

Unable to handle kernel NULL pointer dereference at virtual address 00000038
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in: [last unloaded: arm_spe_pmu]
PC is at ctx_resched+0x38/0xe8
LR is at perf_event_exec+0x20c/0x278
[...]
ctx_resched+0x38/0xe8
perf_event_exec+0x20c/0x278
setup_new_exec+0x88/0x118
load_elf_binary+0x26c/0x109c
search_binary_handler+0x90/0x298
do_execveat_common.isra.14+0x540/0x618
SyS_execve+0x38/0x48

since the software context has been freed and the ctx.pmu->pmu_disable_count
field has been set to NULL.

This patch fixes the problem by avoiding the freeing of static PMU contexts
altogether. Whilst the sharing of dynamic contexts is questionable, this
actually requires the caller to share their context pointer explicitly
and so the burden is on them to manage the object lifetime.

Reported-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 1fd7e4169954 ("perf/core: Remove perf_cpu_context::unique_pmu")
Link: http://lkml.kernel.org/r/1507040450-7730-1-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

locking/selftest: Avoid false BUG report

The work-around for the expected failure is providing another failure :/

Only when CONFIG_PROVE_LOCKING=y do we increment unexpected_testcase_failures,
so only then do we need to decrement, otherwise we'll end up with a negative
number and that will again trigger a BUG (printout, not crash).

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d82fed752942 ("locking/lockdep/selftests: Fix mixed read-write ABBA tests")
Signed-off-by: Ingo Molnar <mingo@kernel.org>

locking/lockdep: Fix stacktrace mess

There is some complication between check_prevs_add() and
check_prev_add() wrt. saving stack traces. The problem is that we want
to be frugal with saving stack traces, since it consumes static
resources.

We'll only know in check_prev_add() if we need the trace, but we can
call into it multiple times. So we want to do on-demand and re-use.

A further complication is that check_prev_add() can drop graph_lock
and mess with our static resources.

In any case, the current state; after commit:

ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")

is that we'll assume the trace contains valid data once
check_prev_add() returns '2'. However, as noted by Josh, this is
false, check_prev_add() can return '2' before having saved a trace,
this then result in the possibility of using uninitialized data.
Testing, as reported by Wu, shows a NULL deref.

So simplify.

Since the graph_lock() thing is a debug path that hasn't
really been used in a long while, take it out back and avoid the
head-ache.

Further initialize the stack_trace to a known 'empty' state; as long
as nr_entries == 0, nothing should deref entries. We can then use the
'entries == NULL' test for a valid trace / on-demand saving.

Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
Signed-off-by: Ingo Molnar <mingo@kernel.org>