platform/kernel/linux-rpi.git
Rex Zhu [Tue, 18 Sep 2018 10:07:54 +0000 (18:07 +0800)]
drm/amd/pp: Honour DC's clock limits on Rv

Honour the display's request for minimum engine and memory clocks.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
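The rule in this commit can be sketched as follows, assuming each of the engine and memory clocks has a floor chosen by power management and a floor requested by the display. `struct clk_floor` and `effective_min_khz()` are hypothetical names, not the Raven hwmgr code; the point is only that the floor actually applied is the larger of the two.

```c
#include <stdint.h>

/* Hypothetical pair of clock floors in kHz; these names are illustrative
 * and do not come from the amdgpu powerplay code. */
struct clk_floor {
	uint32_t pp_min_khz;      /* minimum chosen by power management */
	uint32_t display_min_khz; /* minimum requested by the display (DC) */
};

/* Honour the display's request: never go below what DC asked for. */
uint32_t effective_min_khz(const struct clk_floor *f)
{
	return f->display_min_khz > f->pp_min_khz ? f->display_min_khz
						  : f->pp_min_khz;
}
```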
Rex Zhu [Fri, 14 Sep 2018 03:32:52 +0000 (11:32 +0800)]
drm/amd/dc: Trigger set power state task when display configuration changes

Revert "drm/amd/display: Remove call to amdgpu_pm_compute_clocks"

This reverts commit dcd473770e86517543691bdb227103d6c781cd0a.

When the display configuration changes, DC needs to pass the changes
to powerplay and also trigger a power state task.
amdgpu_pm_compute_clocks is the interface that sets the power state task,
whether dpm or powerplay is enabled.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Thu, 27 Sep 2018 01:06:46 +0000 (11:06 +1000)]
BackMerge v4.19-rc5 into drm-next

Sean Paul requested an -rc5 backmerge for some sun4i fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 27 Sep 2018 01:00:06 +0000 (11:00 +1000)]
Merge tag 'drm-hisilicon-next-2018-09-26' of github.com:xin3liang/linux into drm-next

- A fix for a crash found in recent linux-next, from John Garry
- One sparse warning fix from Souptick Joarder
- Some xxx_unref cleanup from Thomas Zimmermann

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Xinliang Liu <xinliang.liu@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/CAGd==04mXPMjVZ3=cM8r+DSQNM6zy7Anc4T2OsHjZgSsazBTPQ@mail.gmail.com
Dave Airlie [Thu, 27 Sep 2018 00:54:54 +0000 (10:54 +1000)]
Merge tag 'du-next-20180925' of git://linuxtv.org/pinchartl/media into drm-next

R-Car DU support for the D3 and E3 SoCs (v4.20)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/3289904.RCOHkcp7u8@avalon
Thomas Zimmermann [Tue, 31 Jul 2018 06:33:05 +0000 (08:33 +0200)]
drm/hisilicon: Replace ttm_bo_unref with ttm_bo_put

The function ttm_bo_put releases a reference to a TTM buffer object. The
function's name is more aligned to the Linux kernel convention of naming
ref-counting function _get and _put.

A call to ttm_bo_unref takes the address of the TTM BO's pointer and
clears the pointer's value to NULL. This is not necessary in most cases and
is sometimes even worked around by the calling code. A call to ttm_bo_put
only releases the reference without clearing the pointer.

The current behaviour of clearing the pointer is kept in the calling code,
but should be removed in a later patch where it is not required.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
Signed-off-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
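The difference between the two release styles described in this commit can be sketched in plain C. `struct obj`, `obj_create()`, `obj_put()` and `obj_unref()` are hypothetical stand-ins for a TTM buffer object, ttm_bo_put() and ttm_bo_unref(); TTM itself counts references with a kref, which this sketch replaces with a bare integer.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a TTM buffer object. */
struct obj {
	int refcount;
};

struct obj *obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		o->refcount = 1;
	return o;
}

/* _put style: drop one reference and free on the last one.
 * The caller's pointer is left untouched. */
void obj_put(struct obj *o)
{
	if (o && --o->refcount == 0)
		free(o);
}

/* Old _unref style: take the address of the caller's pointer, drop the
 * reference and clear the pointer to NULL. */
void obj_unref(struct obj **p)
{
	obj_put(*p);
	*p = NULL;
}
```

A call site converted from `obj_unref(&p)` to `obj_put(p)` that still relies on the pointer being cleared has to write `obj_put(p); p = NULL;` itself, which is the behaviour the commit keeps in the calling code for now.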
Thomas Zimmermann [Fri, 13 Jul 2018 08:48:24 +0000 (10:48 +0200)]
drm/hisilicon: Replace drm_dev_unref with drm_dev_put

This patch unifies the naming of DRM functions for reference counting
of struct drm_device. The resulting code is more aligned with the rest
of the Linux kernel interfaces.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
Signed-off-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
Souptick Joarder [Mon, 6 Aug 2018 14:49:01 +0000 (20:19 +0530)]
gpu/drm/hisilicon: Convert drm_atomic_helper_suspend/resume()

Convert drm_atomic_helper_suspend/resume() to use
drm_mode_config_helper_suspend/resume().

Also fix one sparse warning by making hibmc_drm_interrupt
static.

Signed-off-by: Ajit Negi <ajitn.linux@gmail.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
Signed-off-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
John Garry [Fri, 21 Sep 2018 17:25:27 +0000 (01:25 +0800)]
drm/hisilicon: hibmc: Use HUAWEI PCI vendor ID macro

Switch to using the Huawei PCI vendor ID macro from pci_ids.h.

In addition, use PCI_VDEVICE() instead of open coding the device ID entry.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
Signed-off-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
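For readers unfamiliar with PCI_VDEVICE(), the userspace re-creation below shows what it saves compared with an open-coded entry. The struct layout and macro are simplified re-creations of the kernel's `struct pci_device_id` (include/linux/mod_devicetable.h) and PCI_VDEVICE() (include/linux/pci.h), not the headers themselves. 0x19e5 is Huawei's PCI vendor ID; the device ID 0x1711 is used purely for illustration.

```c
#include <stdint.h>

/* Simplified userspace copies; not the kernel headers. */
#define PCI_ANY_ID (~0u)
#define PCI_VENDOR_ID_HUAWEI 0x19e5

struct pci_device_id {
	uint32_t vendor, device;       /* vendor and device ID, or PCI_ANY_ID */
	uint32_t subvendor, subdevice; /* subsystem IDs, or PCI_ANY_ID */
	uint32_t class, class_mask;    /* class match, unused here */
	unsigned long driver_data;
};

/* Vendor ID looked up by name, any subsystem, class and class_mask 0. */
#define PCI_VDEVICE(vend, dev) \
	.vendor = PCI_VENDOR_ID_##vend, .device = (dev), \
	.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, 0, 0

/* Open-coded entry with a magic vendor number. */
const struct pci_device_id ids_open_coded[] = {
	{ 0x19e5, 0x1711, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
	{ 0 } /* terminator */
};

/* The same entry using the vendor macro and PCI_VDEVICE(). */
const struct pci_device_id ids_macro[] = {
	{ PCI_VDEVICE(HUAWEI, 0x1711) },
	{ 0 } /* terminator */
};
```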
John Garry [Fri, 21 Sep 2018 17:25:26 +0000 (01:25 +0800)]
drm/hisilicon: hibmc: Don't overwrite fb helper surface depth

Currently the driver overwrites the surface depth provided by the fb
helper, producing an invalid bpp/surface depth combination.

This has been exposed by commit 70109354fed2 ("drm: Reject unknown legacy
bpp and depth for drm_mode_addfb ioctl"), which now causes the driver to
fail to probe.

Fix by not overwriting the surface depth.

Fixes: d1667b86795a ("drm/hisilicon/hibmc: Add support for frame buffer")
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
Signed-off-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
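The kind of check that now rejects the driver's combination can be sketched as a lookup of legacy bpp/depth pairs. The table below is a simplified model of the pairs that drm_mode_legacy_fb_format() translates into a pixel format; `legacy_bpp_depth_valid()` is a hypothetical helper, and the kernel source remains the authority on the exact list.

```c
#include <stdbool.h>

/* Simplified model of the legacy (bpp, depth) pairs DRM maps to a pixel
 * format; any other combination is treated as unknown and rejected. */
bool legacy_bpp_depth_valid(unsigned int bpp, unsigned int depth)
{
	switch (bpp) {
	case 8:
		return depth == 8;
	case 16:
		return depth == 15 || depth == 16;
	case 24:
		return depth == 24;
	case 32:
		return depth == 24 || depth == 30 || depth == 32;
	default:
		return false;
	}
}
```

Under this model, keeping the helper-provided depth leaves the pair valid, while overwriting it with a depth that does not belong to the chosen bpp (for example depth 32 with 16 bpp) makes the addfb path fail.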
John Garry [Fri, 21 Sep 2018 17:25:25 +0000 (01:25 +0800)]
drm/hisilicon: hibmc: Do not carry error code in HiBMC framebuffer pointer

In hibmc_drm_fb_create(), when the call to hibmc_framebuffer_init() fails
with an error, do not store the error code in the HiBMC device framebuffer
pointer: that pointer is later checked for a non-NULL value in
hibmc_fbdev_destroy(), where the intention is to check for a valid
framebuffer pointer.

This fixes the following crash:
[    9.699791] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001a
[    9.708672] Mem abort info:
[    9.711489]   ESR = 0x96000004
[    9.714570]   Exception class = DABT (current EL), IL = 32 bits
[    9.720551]   SET = 0, FnV = 0
[    9.723631]   EA = 0, S1PTW = 0
[    9.726799] Data abort info:
[    9.729702]   ISV = 0, ISS = 0x00000004
[    9.733573]   CM = 0, WnR = 0
[    9.736566] [000000000000001a] user address but active_mm is swapper
[    9.742987] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[    9.748614] Modules linked in:
[    9.751694] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G        W         4.19.0-rc4-next-20180920-00001-g9b0012c #322
[    9.762681] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
[    9.771915] Workqueue: events work_for_cpu_fn
[    9.776312] pstate: 60000005 (nZCv daif -PAN -UAO)
[    9.781150] pc : drm_mode_object_put+0x0/0x20
[    9.785547] lr : hibmc_fbdev_fini+0x40/0x58
[    9.789767] sp : ffff00000af1bcf0
[    9.793108] x29: ffff00000af1bcf0 x28: 0000000000000000
[    9.798473] x27: 0000000000000000 x26: ffff000008f66630
[    9.803838] x25: 0000000000000000 x24: ffff0000095abb98
[    9.809203] x23: ffff8017db92fe00 x22: ffff8017d2b13000
[    9.814568] x21: ffffffffffffffea x20: ffff8017d2f80018
[    9.819933] x19: ffff8017d28a0018 x18: ffffffffffffffff
[    9.825297] x17: 0000000000000000 x16: 0000000000000000
[    9.830662] x15: ffff0000092296c8 x14: ffff00008939970f
[    9.836026] x13: ffff00000939971d x12: ffff000009229940
[    9.841391] x11: ffff0000085f8fc0 x10: ffff00000af1b9a0
[    9.846756] x9 : 000000000000000d x8 : 6620657a696c6169
[    9.852121] x7 : ffff8017d3340580 x6 : ffff8017d4168000
[    9.857486] x5 : 0000000000000000 x4 : ffff8017db92fb20
[    9.862850] x3 : 0000000000002690 x2 : ffff8017d3340480
[    9.868214] x1 : 0000000000000028 x0 : 0000000000000002
[    9.873580] Process kworker/16:1 (pid: 293, stack limit = 0x(____ptrval____))
[    9.880788] Call trace:
[    9.883252]  drm_mode_object_put+0x0/0x20
[    9.887297]  hibmc_unload+0x1c/0x80
[    9.890815]  hibmc_pci_probe+0x170/0x3c8
[    9.894773]  local_pci_probe+0x3c/0xb0
[    9.898555]  work_for_cpu_fn+0x18/0x28
[    9.902337]  process_one_work+0x1e0/0x318
[    9.906382]  worker_thread+0x228/0x450
[    9.910164]  kthread+0x128/0x130
[    9.913418]  ret_from_fork+0x10/0x18
[    9.917024] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
[    9.923180] ---[ end trace 2695ffa0af5be375 ]---

Fixes: d1667b86795a ("drm/hisilicon/hibmc: Add support for frame buffer")
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
Signed-off-by: Xinliang Liu <z.liuxinliang@hisilicon.com>
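The bug class fixed here is easy to reproduce in userspace. The helpers below are simplified re-creations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() from include/linux/err.h; `struct fake_fb`, `struct fake_dev` and the two `fb_create_*()` functions are hypothetical stand-ins for the HiBMC code, with `result` playing the role of hibmc_framebuffer_init()'s return value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified copies of the <linux/err.h> helpers: small negative errno
 * values are encoded in the top page of the address space. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_fb { int id; };
struct fake_dev { struct fake_fb *fb; }; /* NULL means "no framebuffer" */

/* Buggy pattern: an ERR_PTR() value lands in dev->fb, so a later
 * "if (dev->fb)" check in the teardown path sees a non-NULL pointer
 * and dereferences it. */
void fb_create_buggy(struct fake_dev *dev, struct fake_fb *result)
{
	dev->fb = result;
}

/* Fixed pattern: keep the error local and only store valid pointers. */
int fb_create_fixed(struct fake_dev *dev, struct fake_fb *result)
{
	if (IS_ERR(result))
		return (int)PTR_ERR(result);
	dev->fb = result;
	return 0;
}
```

Here -22 is -EINVAL; its 64-bit encoding, 0xffffffffffffffea, is the value visible in register x21 of the oops above.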
Ulrich Hecht [Tue, 14 Aug 2018 13:49:56 +0000 (15:49 +0200)]
drm: rcar-du: Add r8a77990 and r8a77995 device support

Add support for the R-Car D3 (R8A77995) and E3 (R8A77990) SoCs to the
R-Car DU driver. The two SoCs instantiate compatible DUs, so a single
information structure is enough.

Signed-off-by: Ulrich Hecht <uli+renesas@fpond.eu>
[Add support for R8A77990]
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Laurent Pinchart [Wed, 22 Aug 2018 13:21:33 +0000 (16:21 +0300)]
drm: rcar-du: Don't use TV sync mode when not supported by the hardware

The official way to stop the display is to clear the display enable
(DEN) bit in the DSYSR register, but that operates at a group level and
affects the two channels in the group. To disable channels selectively,
the driver uses TV sync mode that stops display operation on the channel
and turns output signals into inputs.

While TV sync mode is available in all DU models currently supported,
the D3 and E3 DUs don't support it. We will thus need to find an
alternative way to turn channels off.

In the meantime, make the switch to TV sync mode conditional on the
availability of the feature, to avoid writing an invalid value to the
DSYSR register. When the feature is unavailable the display output will
turn blank as all planes are disabled when stopping the CRTC.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
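The conditional described in this commit can be sketched as a pure function on the register value. The bit positions and the `FEATURE_TVM_SYNC` flag below are illustrative assumptions, not copied from the R-Car DU driver or the hardware manual, and `dsysr_for_stop()` is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative bit layout for the TV sync mode (TVM) field; the real
 * values live in the R-Car DU register definitions. */
#define DSYSR_TVM_MASK   (3u << 6)
#define DSYSR_TVM_MASTER (0u << 6)
#define DSYSR_TVM_TVSYNC (2u << 6)

/* Hypothetical per-model feature bit. */
#define FEATURE_TVM_SYNC (1u << 0)

/* Return the DSYSR value to write when stopping a single channel. */
uint32_t dsysr_for_stop(uint32_t dsysr, uint32_t features)
{
	/* No TV sync support (D3/E3): leave TVM untouched rather than write
	 * an invalid value; the output blanks because planes are disabled. */
	if (!(features & FEATURE_TVM_SYNC))
		return dsysr;

	return (dsysr & ~DSYSR_TVM_MASK) | DSYSR_TVM_TVSYNC;
}
```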
Laurent Pinchart [Wed, 22 Aug 2018 13:05:02 +0000 (16:05 +0300)]
drm: rcar-du: Cache DSYSR value to ensure known initial value

DSYSR is a DU channel register that also contains group fields. It is
thus written to by both the group and CRTC code, using read-update-write
sequences. As the register isn't initialized explicitly at startup time,
this can lead to invalid or otherwise unexpected values being written to
some of the fields if they have been modified by the firmware or just
not reset properly.

To fix this we can write a fully known value to the DSYSR register when
turning a channel's functional clock on. However, the mix of group and
channel fields complicates this. A simpler solution is to cache the
register and initialize the cached value to the desired hardware
defaults.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
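A minimal sketch of the caching approach, assuming a simulated register (`hw_dsysr`) in place of MMIO; `struct channel`, `channel_init()` and `dsysr_clr_set()` are hypothetical names. Every read-update-write goes through the cache, so whatever the firmware left in the register never leaks into the value written back.

```c
#include <stdint.h>

/* Hypothetical channel: hw_dsysr simulates the MMIO register, while
 * cached_dsysr is the software copy all updates go through. */
struct channel {
	uint32_t hw_dsysr;
	uint32_t cached_dsysr;
};

/* Start from known defaults instead of whatever firmware left behind. */
void channel_init(struct channel *ch, uint32_t defaults)
{
	ch->cached_dsysr = defaults;
}

/* Clear then set bits in the cached value and write it out in full. */
void dsysr_clr_set(struct channel *ch, uint32_t clr, uint32_t set)
{
	ch->cached_dsysr = (ch->cached_dsysr & ~clr) | set;
	ch->hw_dsysr = ch->cached_dsysr;
}
```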
Laurent Pinchart [Tue, 21 Aug 2018 21:01:07 +0000 (00:01 +0300)]
drm: rcar-du: Enable configurable DPAD0 routing on Gen3

All Gen3 SoCs supported so far have a fixed association between DPAD0
and DU channels, which led to hardcoding that association when writing
the corresponding hardware register. The D3 and E3 will break that
mechanism as DPAD0 can be dynamically connected to either DU0 or DU1.

Make DPAD0 routing dynamic on Gen3. To ensure a valid hardware
configuration when the DU starts without the RGB output enabled, DPAD0
is associated at initialization time to the first DU channel that it can
be connected to. This makes no change on Gen2 as all Gen2 SoCs can
connect DPAD0 to DU0, which is the current implicit default value.

As the DPAD0 source is always 0 when a single source is possible on
Gen2, we can also simplify the Gen2 code in the same function to remove
a conditional check.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
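Associating DPAD0 with "the first DU channel that it can be connected to" amounts to finding the lowest set bit in a mask of possible channels. The sketch assumes such a mask exists for the DPAD0 output; `first_possible_channel()` is a hypothetical helper, not the driver's code.

```c
/* Return the index of the lowest DU channel set in a bitmask of channels
 * an output can be routed to, or -1 if the mask is empty. */
int first_possible_channel(unsigned int possible_channels)
{
	for (unsigned int i = 0; i < sizeof(possible_channels) * 8; i++)
		if (possible_channels & (1u << i))
			return (int)i;
	return -1;
}
```

With a mask of 0x3 (DU0 or DU1, as on D3 and E3) the sketch selects DU0, and with a single-bit mask such as 0x1 it returns that bit's index, so a DPAD0 that can only reach DU0 keeps its current default.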
Laurent Pinchart [Tue, 21 Aug 2018 18:31:04 +0000 (21:31 +0300)]
drm: rcar-du: Use LVDS PLL clock as dot clock when possible

On selected SoCs, the DU can use the clock output by the LVDS encoder
PLL as its input dot clock. This feature is optional, but on the D3 and
E3 SoCs it is often the only way to obtain a precise dot clock frequency,
as the other available clocks (CPG-generated clock and external clock)
usually have fixed rates.

Add a DU model information field to describe which DU channels can use
the LVDS PLL output clock as their input clock, and configure clock
routing accordingly.

This feature is available on H2, M2-W, M2-N, D3 and E3 SoCs, with D3 and
E3 being the primary targets. It is left disabled in this commit, and
will be enabled per-SoC after careful testing.

At the hardware level, clock routing is configured at runtime in two
steps, first selecting an internal dot clock between the LVDS PLL clock
and the external DOTCLKIN clock, and then selecting between the internal
dot clock and the CPG-generated clock. The first part requires stopping
the whole DU group in order for the change to take effect, thus causing
flickering on the screen. For this reason we currently hardcode the
clock source to the LVDS PLL clock if available, and allow flicker-free
selection of the external DOTCLKIN clock or CPG-generated clock
otherwise. A more dynamic clock selection process can be implemented
later if the need arises.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
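The selection policy in this commit can be sketched as follows. `struct dot_clk_in`, the `enum dot_clk_src` values and `pick_dot_clock()` are hypothetical, and choosing between DOTCLKIN and the CPG clock by smallest rate error is an assumption made for illustration; the commit only says that the two can be selected without flicker.

```c
#include <stdbool.h>

enum dot_clk_src { CLK_LVDS_PLL, CLK_DOTCLKIN, CLK_CPG };

/* Hypothetical inputs; all rates in Hz. */
struct dot_clk_in {
	bool lvds_pll_usable;        /* channel may take the LVDS PLL output */
	bool dotclkin_present;       /* an external DOTCLKIN clock is wired */
	unsigned long dotclkin_rate; /* fixed external rate */
	unsigned long cpg_rate;      /* closest rate the CPG can provide */
	unsigned long target_rate;   /* requested dot clock */
};

static unsigned long rate_error(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

enum dot_clk_src pick_dot_clock(const struct dot_clk_in *in)
{
	/* Hardcoded preference: changing the first-stage mux would require
	 * stopping the whole DU group, so the LVDS PLL wins when usable. */
	if (in->lvds_pll_usable)
		return CLK_LVDS_PLL;

	/* The second stage can switch without flicker; pick the closer
	 * rate (an assumed criterion for this sketch). */
	if (in->dotclkin_present &&
	    rate_error(in->dotclkin_rate, in->target_rate) <
	    rate_error(in->cpg_rate, in->target_rate))
		return CLK_DOTCLKIN;
	return CLK_CPG;
}
```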
Laurent Pinchart [Fri, 14 Jul 2017 00:26:17 +0000 (03:26 +0300)]
drm: rcar-du: Perform the initial CRTC setup from rcar_du_crtc_get()

The rcar_du_crtc_get() function is always immediately followed by a call
to rcar_du_crtc_setup(). Call the latter from the former to simplify the
code, and add a comment to explain how the get and put calls are
balanced.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
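How get and put stay balanced once setup moves into the get path can be sketched with a use counter. `struct crtc`, `crtc_get()`, `crtc_put()` and `crtc_setup()` are hypothetical stand-ins for the rcar_du_crtc functions, with plain integers in place of clock handling and register programming.

```c
/* Hypothetical CRTC state; the counters exist only to make the
 * get/put balance observable. */
struct crtc {
	unsigned int use_count; /* number of outstanding get calls */
	int clock_on;           /* stands in for the functional clock */
	int setup_calls;        /* how many times the initial setup ran */
};

/* Stand-in for the initial hardware setup. */
static void crtc_setup(struct crtc *c)
{
	c->setup_calls++;
}

/* The first get turns the clock on and performs the initial setup;
 * nested gets only bump the counter. */
void crtc_get(struct crtc *c)
{
	if (c->use_count++ > 0)
		return;
	c->clock_on = 1;
	crtc_setup(c);
}

/* Each put balances one get; the last put turns the clock off. */
void crtc_put(struct crtc *c)
{
	if (--c->use_count > 0)
		return;
	c->clock_on = 0;
}
```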
Laurent Pinchart [Tue, 21 Aug 2018 15:06:50 +0000 (18:06 +0300)]
drm: rcar-du: lvds: D3/E3 support

The LVDS encoders in the D3 and E3 SoCs differ significantly from those
in the other R-Car Gen3 family members:

- The LVDS PLL architecture is more complex and requires computing PLL
  parameters manually.
- The PLL uses external clocks as inputs, which need to be retrieved
  from DT.
- In addition to the different PLL setup, the startup sequence has
  changed *again* (it seems someone had trouble making up their mind).

Supporting all this requires DT bindings extensions for external clocks,
brand new PLL setup code, and a few quirks to handle the differences in
the startup sequence.

The implementation doesn't support all hardware features yet, namely

- Using the LV[01] clocks generated by the CPG as PLL input.
- Providing the LVDS PLL clock to the DU for use with the RGB output.

Those features can be added later when the need arises.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
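Computing PLL parameters manually usually means searching divider combinations for the output rate closest to the target. The sketch below does an exhaustive search over a generic fout = fin * n / m / e structure; the formula, the divider ranges and the VCO limits are placeholders, not the D3/E3 LVDS PLL's actual constraints, and `pll_search()` is a hypothetical helper.

```c
#include <stdint.h>

/* Placeholder limits for a generic PLL; not the D3/E3 values. */
#define M_MIN 1
#define M_MAX 8
#define N_MIN 1
#define N_MAX 128
#define E_MIN 1
#define E_MAX 32
#define VCO_MIN 900000000ULL
#define VCO_MAX 1800000000ULL

struct pll_params {
	unsigned int m, n, e; /* input divider, multiplier, output divider */
	uint64_t rate;        /* resulting output rate in Hz */
};

/* Exhaustively search fout = fin * n / m / e for the rate closest to
 * target. Returns 0 on success, or -1 if no combination keeps the VCO
 * inside its allowed range. */
int pll_search(uint64_t fin, uint64_t target, struct pll_params *best)
{
	uint64_t best_err = UINT64_MAX;

	for (unsigned int m = M_MIN; m <= M_MAX; m++) {
		for (unsigned int n = N_MIN; n <= N_MAX; n++) {
			uint64_t vco = fin * n / m;

			if (vco < VCO_MIN || vco > VCO_MAX)
				continue;
			for (unsigned int e = E_MIN; e <= E_MAX; e++) {
				uint64_t rate = vco / e;
				uint64_t err = rate > target ? rate - target
							     : target - rate;

				if (err < best_err) {
					best_err = err;
					best->m = m;
					best->n = n;
					best->e = e;
					best->rate = rate;
				}
			}
		}
	}
	return best_err == UINT64_MAX ? -1 : 0;
}
```

For a 24 MHz input and a 148.5 MHz target, n = 99, m = 2 and e = 8 give an exact match with a 1188 MHz VCO, so the search finds a zero-error result.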
Laurent Pinchart [Wed, 22 Aug 2018 14:04:06 +0000 (17:04 +0300)]
drm: bridge: thc63: Restrict modes based on hardware operating frequency

The THC63LVD1024 is restricted to a pixel clock frequency in the range
of 8 to 135 MHz. Implement the bridge .mode_valid() operation
accordingly.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Tested-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
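The range check is small enough to show directly. `struct display_mode` below is a minimal stand-in for `struct drm_display_mode`, whose `clock` field holds the pixel clock in kHz; the MODE_* names mirror DRM's mode status values, and `thc63_mode_valid()` is a hypothetical free function rather than the driver's actual bridge callback.

```c
/* Minimal stand-in for struct drm_display_mode: pixel clock in kHz. */
struct display_mode {
	int clock;
};

/* Mirrors a subset of DRM's mode status values. */
enum mode_status { MODE_OK, MODE_CLOCK_LOW, MODE_CLOCK_HIGH };

/* THC63LVD1024 operating range: 8 MHz to 135 MHz pixel clock. */
enum mode_status thc63_mode_valid(const struct display_mode *mode)
{
	if (mode->clock < 8000)
		return MODE_CLOCK_LOW;
	if (mode->clock > 135000)
		return MODE_CLOCK_HIGH;
	return MODE_OK;
}
```

For example, a 1280x720@60 mode at 74.25 MHz passes, while a 1920x1080@60 mode at 148.5 MHz is rejected as too fast for this receiver.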
Laurent Pinchart [Wed, 22 Aug 2018 12:27:16 +0000 (15:27 +0300)]
dt-bindings: display: renesas: lvds: Add EXTAL and DU_DOTCLKIN clocks

On the D3 and E3 SoCs, the LVDS encoder can derive its internal pixel
clock from an externally supplied clock, either through the EXTAL pin or
through one of the DU_DOTCLKINx pins. Add corresponding clocks to the DT
bindings.

To retain backward compatibility with DTs that don't specify the
clock-names property, the functional clock must always be specified
first, and the clock-names property is optional when only the functional
clock is specified.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Laurent Pinchart [Mon, 20 Aug 2018 14:12:49 +0000 (17:12 +0300)]
dt-bindings: display: renesas: lvds: Document r8a77990 bindings

The E3 (r8a77990) supports two LVDS channels. Extend the binding to
support them.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Laurent Pinchart [Mon, 20 Aug 2018 14:07:25 +0000 (17:07 +0300)]
dt-bindings: display: renesas: du: Document r8a77990 bindings

Document the E3 (r8a77990) SoC in the R-Car DU bindings.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu>
Greg Kroah-Hartman [Sun, 23 Sep 2018 17:15:18 +0000 (19:15 +0200)]
Linux 4.19-rc5

Greg Kroah-Hartman [Sun, 23 Sep 2018 15:19:27 +0000 (17:19 +0200)]
Merge tag 'mfd-fixes-4.19' of git://git./linux/kernel/git/lee/mfd

Lee writes:
  "MFD fixes for v4.19
   - Fix Dialog DA9063 regulator constraints issue causing failure in
     probe
   - Fix OMAP Device Tree compatible strings to match DT"

* tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: omap-usb-host: Fix dts probe of children
  mfd: da9063: Fix DT probing with constraints

Greg Kroah-Hartman [Sun, 23 Sep 2018 11:32:19 +0000 (13:32 +0200)]
Merge tag 'for-linus-4.19d-rc5-tag' of git://git./linux/kernel/git/xen/tip

Juergen writes:
  "xen:
   Two small fixes for xen drivers."

* tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: issue warning message when out of grant maptrack entries
  xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code

Greg Kroah-Hartman [Sun, 23 Sep 2018 06:33:28 +0000 (08:33 +0200)]
Merge tag 'for-linus-20180922' of git://git.kernel.dk/linux-block

Jens writes:
  "Just a single fix in this pull request, fixing a regression in
  /proc/diskstats caused by the unification of timestamps."

* tag 'for-linus-20180922' of git://git.kernel.dk/linux-block:
  block: use nanosecond resolution for iostat

Greg Kroah-Hartman [Sun, 23 Sep 2018 06:10:12 +0000 (08:10 +0200)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Thomas writes:
  "A set of fixes for x86:

   - Resolve the kvmclock regression on AMD systems with memory
     encryption enabled. The rework of the kvmclock memory allocation
     during early boot results in encrypted storage, which is not
     shareable with the hypervisor. Create a new section for this data
     which is mapped unencrypted and take care that the later
     allocations for shared kvmclock memory are unencrypted as well.

   - Fix the build regression in the paravirt code introduced by the
     recent spectre v2 updates.

   - Ensure that the initial static page tables cover the fixmap space
     correctly so early console always works. This worked so far by
     chance, but recent modifications to the fixmap layout can -
     depending on kernel configuration - move the relevant entries to a
     different place which is not covered by the initial static page
     tables.

   - Address the regressions and issues which got introduced with the
     recent extensions to the Intel Resource Director Technology code.

   - Update maintainer entries to document reality"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Expand static page table for fixmap space
  MAINTAINERS: Add X86 MM entry
  x86/intel_rdt: Add Reinette as co-maintainer for RDT
  MAINTAINERS: Add Borislav to the x86 maintainers
  x86/paravirt: Fix some warning messages
  x86/intel_rdt: Fix incorrect loop end condition
  x86/intel_rdt: Fix exclusive mode handling of MBA resource
  x86/intel_rdt: Fix incorrect loop end condition
  x86/intel_rdt: Do not allow pseudo-locking of MBA resource
  x86/intel_rdt: Fix unchecked MSR access
  x86/intel_rdt: Fix invalid mode warning when multiple resources are managed
  x86/intel_rdt: Global closid helper to support future fixes
  x86/intel_rdt: Fix size reporting of MBA resource
  x86/intel_rdt: Fix data type in parsing callbacks
  x86/kvm: Use __bss_decrypted attribute in shared variables
  x86/mm: Add .bss..decrypted section to hold shared variables

Greg Kroah-Hartman [Sun, 23 Sep 2018 06:09:16 +0000 (08:09 +0200)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Thomas writes:
  "- Provide a strerror_r wrapper so lib/bpf can be built on systems
     without _GNU_SOURCE
   - Unbreak the man page generator when building out of tree"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf Documentation: Fix out-of-tree asciidoctor man page generation
  tools lib bpf: Provide wrapper for strerror_r to build in !_GNU_SOURCE systems

Greg Kroah-Hartman [Sun, 23 Sep 2018 06:06:54 +0000 (08:06 +0200)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Thomas writes:
  "Make the EFI arm stub device tree loader default on to unbreak
  existing EFI boot loaders which do not have DTB support."

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub/arm: default EFI_ARMSTUB_DTB_LOADER to y

Omar Sandoval [Fri, 21 Sep 2018 23:44:34 +0000 (16:44 -0700)]
block: use nanosecond resolution for iostat

Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
updating properly on 4.18. This is because we started using ktime to
track elapsed time, and we convert nanoseconds to jiffies when we update
the partition counter. However, this gets rounded down, so any I/Os that
take less than a jiffy are not accounted for. Previously in this case,
the value of jiffies would sometimes increment while we were doing I/O,
so at least some I/Os were accounted for.

Let's convert the stats to use nanoseconds internally. We still report
milliseconds as before, now more accurately than ever. The value is
still truncated to 32 bits for backwards compatibility.

Fixes: 522a777566f5 ("block: consolidate struct request timestamp fields")
Cc: stable@vger.kernel.org
Reported-by: Klaus Kusche <klaus.kusche@computerix.info>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
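The accounting loss described in this commit can be reproduced with a few lines of arithmetic. In the sketch below, `SKETCH_HZ` = 250 is an assumed tick rate (the real value depends on the kernel configuration), and the two functions are hypothetical simplifications: one converts each I/O's duration to jiffies before summing, the other sums nanoseconds and converts to milliseconds only when reporting, truncating the result to 32 bits.

```c
#include <stdint.h>

#define SKETCH_HZ     250ULL          /* assumed tick rate: 1 jiffy = 4 ms */
#define NSEC_PER_SEC  1000000000ULL
#define NSEC_PER_MSEC 1000000ULL

/* Old behaviour (simplified): each I/O's duration is converted to
 * jiffies, rounding down, before it is added to the counter. */
uint64_t busy_jiffies_truncated(const uint64_t *io_ns, int count)
{
	uint64_t total = 0;

	for (int i = 0; i < count; i++)
		total += io_ns[i] * SKETCH_HZ / NSEC_PER_SEC;
	return total;
}

/* New behaviour (simplified): accumulate nanoseconds and convert to
 * milliseconds only when reporting, truncated to 32 bits. */
uint32_t busy_ms_from_ns(const uint64_t *io_ns, int count)
{
	uint64_t total_ns = 0;

	for (int i = 0; i < count; i++)
		total_ns += io_ns[i];
	return (uint32_t)(total_ns / NSEC_PER_MSEC);
}
```

With 1,000 I/Os of 1 ms each, the first function reports 0 ticks because every 1 ms duration rounds down to zero 4 ms jiffies, while the second reports 1000 ms.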
Greg Kroah-Hartman [Fri, 21 Sep 2018 18:01:16 +0000 (20:01 +0200)]
Merge tag 'pinctrl-v4.19-3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Linus writes:
  "Pin control fixes for v4.19:
   - Two fixes for the Intel pin controllers that cause
     problems on laptops."

* tag 'pinctrl-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: intel: Do pin translation in other GPIO operations as well
  pinctrl: cannonlake: Fix gpio base for GPP-E

Greg Kroah-Hartman [Fri, 21 Sep 2018 14:21:42 +0000 (16:21 +0200)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Paolo writes:
  "It's mostly small bugfixes and cleanups, mostly around x86 nested
   virtualization.  One important change, not related to nested
   virtualization, is that the ability for the guest kernel to trap
   CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is
   now masked by default.  This is because the feature is detected
   through an MSR; a very bad idea that Intel seems to like more and
   more.  Some applications choke if the other fields of that MSR are
   not initialized as on real hardware, hence we have to disable the
   whole MSR by default, as was the case before Linux 4.12."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
  KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
  kvm: selftests: Add platform_info_test
  KVM: x86: Control guest reads of MSR_PLATFORM_INFO
  KVM: x86: Turbo bits in MSR_PLATFORM_INFO
  nVMX x86: Check VPID value on vmentry of L2 guests
  nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
  KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
  KVM: VMX: check nested state and CR4.VMXE against SMM
  kvm: x86: make kvm_{load|put}_guest_fpu() static
  x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
  KVM: VMX: use preemption timer to force immediate VMExit
  KVM: VMX: modify preemption timer bit only when arming timer
  KVM: VMX: immediately mark preemption timer expired only for zero value
  KVM: SVM: Switch to bitmap_zalloc()
  KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
  kvm: selftests: use -pthread instead of -lpthread
  KVM: x86: don't reset root in kvm_mmu_setup()
  kvm: mmu: Don't read PDPTEs when paging is not enabled
  x86/kvm/lapic: always disable MMIO interface in x2APIC mode
  KVM: s390: Make huge pages unavailable in ucontrol VMs
  ...

Greg Kroah-Hartman [Fri, 21 Sep 2018 13:29:44 +0000 (15:29 +0200)]
Merge tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs

Richard writes:
  "This pull request contains fixes for UBIFS:
   - A wrong UBIFS assertion in mount code
   - Fix for a NULL pointer deref in mount code
   - Revert of a bad fix for xattrs"

* tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs:
  Revert "ubifs: xattr: Don't operate on deleted inodes"
  ubifs: drop false positive assertion
  ubifs: Check for name being NULL while mounting

Greg Kroah-Hartman [Fri, 21 Sep 2018 07:41:05 +0000 (09:41 +0200)]
Merge tag 'for-linus-20180920' of git://git.kernel.dk/linux-block

Jens writes:
  "Storage fixes for 4.19-rc5

  - Fix for leaking kernel pointer in floppy ioctl (Andy Whitcroft)

  - NVMe pull request from Christoph, and a single ANA log page fix
    (Hannes)

  - Regression fix for libata qd32 support, where we trigger an illegal
    active command transition. This fixes a CD-ROM detection issue that
    was reported, but could also trigger premature completion of the
    internal tag (me)"

* tag 'for-linus-20180920' of git://git.kernel.dk/linux-block:
  floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
  libata: mask swap internal and hardware tag
  nvme: count all ANA groups for ANA Log page

Greg Kroah-Hartman [Fri, 21 Sep 2018 07:11:18 +0000 (09:11 +0200)]
Merge tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drm

David writes:
  "drm fixes for 4.19-rc5:

   - core: fix debugfs for atomic, fix the check for atomic for
     non-modesetting drivers
   - amdgpu: adds a new PCI id, some kfd fixes and a sdma fix
   - i915: a bunch of GVT fixes.
   - vc4: scaling fix
   - vmwgfx: modesetting fixes and an old buffer eviction fix
   - udl: framebuffer destruction fix
   - sun4i: disable on R40 fix until next kernel
   - pl111: NULL termination on table fix"

* tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drm: (21 commits)
  drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs
  drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9
  drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
  drm/vmwgfx: Fix buffer object eviction
  drm/vmwgfx: Don't impose STDU limits on framebuffer size
  drm/vmwgfx: limit mode size for all display unit to texture_max
  drm/vmwgfx: limit screen size to stdu_max during check_modeset
  drm/vmwgfx: don't check for old_crtc_state enable status
  drm/amdgpu: add new polaris pci id
  drm: sun4i: drop second PLL from A64 HDMI PHY
  drm: fix drm_drv_uses_atomic_modeset on non modesetting drivers.
  drm/i915/gvt: clear ggtt entries when destroy vgpu
  drm/i915/gvt: request srcu_read_lock before checking if one gfn is valid
  drm/i915/gvt: Add GEN9_CLKGATE_DIS_4 to default BXT mmio handler
  drm/i915/gvt: Init PHY related registers for BXT
  drm/atomic: Use drm_drv_uses_atomic_modeset() for debugfs creation
  drm/fb-helper: Remove set but not used variable 'connector_funcs'
  drm: udl: Destroy framebuffer only if it was initialized
  drm/sun4i: Remove R40 display pipeline compatibles
  drm/pl111: Make sure of_device_id tables are NULL terminated
  ...

Dave Airlie [Thu, 20 Sep 2018 23:52:34 +0000 (09:52 +1000)]
Merge branch 'drm-next-4.20' of git://people.freedesktop.org/~agd5f/linux into drm-next

This is a new pull for drm-next on top of last week's with the following
changes:
- Fixed 64 bit divide
- Fixed vram type on vega20
- Misc vega20 fixes
- Misc DC fixes
- Fix GDS/GWS/OA domain handling

Previous changes from last week:
amdgpu/kfd:
- Picasso (new APU) support
- Raven2 (new APU) support
- Vega20 enablement
- ACP powergating improvements
- Add ABGR/XBGR display support
- VCN JPEG engine support
- Initial xGMI support
- Use load balancing for engine scheduling
- Lots of new documentation
- Rework and clean up i2c and aux handling in DC
- Add DP YCbCr 4:2:0 support in DC
- Add DMCU firmware loading for Raven (used for ABM and PSR)
- New debugfs features in DC
- LVDS support in DC
- Implement wave kill for gfx/compute (lightweight reset for shaders)
- Use AGP aperture to avoid gart mappings when possible
- GPUVM performance improvements
- Bulk moves for more efficient GPUVM LRU handling
- Merge amdgpu and amdkfd into one module
- Enable gfxoff and stutter mode on Raven
- Misc cleanups

Scheduler:
- Load balancing support
- Bug fixes

ttm:
- Bulk move functionality
- Bug fixes

radeon:
- Misc cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920150438.12693-1-alexander.deucher@amd.com
6 years agoMerge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Thu, 20 Sep 2018 23:52:21 +0000 (09:52 +1000)]
Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

A few fixes for 4.19:
- Add a new polaris pci id
- KFD fixes for raven and gfx7

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920155850.5455-1-alexander.deucher@amd.com
6 years agoMerge branch 'vmwgfx-fixes-4.19' of git://people.freedesktop.org/~thomash/linux into...
Dave Airlie [Thu, 20 Sep 2018 23:50:46 +0000 (09:50 +1000)]
Merge branch 'vmwgfx-fixes-4.19' of git://people.freedesktop.org/~thomash/linux into drm-fixes

A couple of modesetting fixes and a fix for a long-standing buffer-eviction
problem cc'd stable.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thellstrom@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920063935.35492-1-thellstrom@vmware.com
6 years agox86/mm: Expand static page table for fixmap space
Feng Tang [Thu, 20 Sep 2018 02:58:28 +0000 (10:58 +0800)]
x86/mm: Expand static page table for fixmap space

We hit a kernel panic when enabling earlycon, because the fixmap
address used by earlycon is not statically set up.

Currently the static fixmap setup in head_64.S only covers 2M of virtual
address space, while the fixmap can actually span up to 4M with some
kernel configurations, e.g. when VSYSCALL emulation is disabled.

So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2,
and add a build time check to ensure that the fixmap is covered by the
initial static page tables.
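A minimal C sketch of the kind of build-time check described above, assuming x86_64's 4K pages and 512 entries per page table, so each statically set up PMD-sized table maps 2M. FIXMAP_PMD_NUM is taken from the commit; the SKETCH_* constants, including a hypothetical 3M fixmap footprint, are illustrative stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative x86_64 stand-ins (assumptions, not the kernel headers). */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_PTRS_PER_PTE 512UL
#define SKETCH_PMD_SIZE     (SKETCH_PAGE_SIZE * SKETCH_PTRS_PER_PTE) /* 2M */

/* Value from the commit: two statically set up PMD-sized page tables. */
#define FIXMAP_PMD_NUM      2UL

/* Hypothetical fixmap footprint for one kernel configuration (3M). */
#define SKETCH_FIXADDR_SIZE (3UL * 1024 * 1024)

/* Build-time check: compilation fails if the fixmap outgrows the tables. */
static_assert(SKETCH_FIXADDR_SIZE <= FIXMAP_PMD_NUM * SKETCH_PMD_SIZE,
              "fixmap is not covered by the static page tables");

/* Bytes of virtual address space covered by the static fixmap tables. */
static unsigned long static_fixmap_coverage(void)
{
    return FIXMAP_PMD_NUM * SKETCH_PMD_SIZE;
}
```

If the hypothetical fixmap size is raised above 4M, the static_assert stops the build, which is the point of checking at build time rather than discovering the gap as an early-boot panic.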

Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: kernel test robot <rong.a.chen@intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts)
Cc: H Peter Anvin <hpa@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
6 years agoocfs2: fix ocfs2 read block panic
Junxiao Bi [Thu, 20 Sep 2018 19:22:51 +0000 (12:22 -0700)]
ocfs2: fix ocfs2 read block panic

While reading a block, an I/O error may be returned because of an
underlying storage issue; in that case, BH_NeedsValidate was left set in
the buffer head. The next time the same block is read, if it has already
been linked into the journal, the following panic is triggered.

[203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342!
[203748.702533] invalid opcode: 0000 [#1] SMP
[203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod
[203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2
[203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015
[203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000
[203748.703088] RIP: 0010:[<ffffffffa05e9f09>]  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.703130] RSP: 0018:ffff88006ff4b818  EFLAGS: 00010206
[203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000
[203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe
[203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0
[203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000
[203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000
[203748.705871] FS:  00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000
[203748.706370] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670
[203748.707124] Stack:
[203748.707371]  ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001
[203748.707885]  0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00
[203748.708399]  00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000
[203748.708915] Call Trace:
[203748.709175]  [<ffffffffa0609f52>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]
[203748.709680]  [<ffffffffa05eca00>] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2]
[203748.710185]  [<ffffffffa05ec0cb>] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2]
[203748.710691]  [<ffffffffa05f0fbf>] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2]
[203748.711204]  [<ffffffffa065660f>] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2]
[203748.711716]  [<ffffffffa05f4f3a>] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2]
[203748.712227]  [<ffffffffa05f442e>] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2]
[203748.712737]  [<ffffffffa061b2f2>] ocfs2_mknod+0x4b2/0x1370 [ocfs2]
[203748.713003]  [<ffffffffa061c385>] ocfs2_create+0x65/0x170 [ocfs2]
[203748.713263]  [<ffffffff8121714b>] vfs_create+0xdb/0x150
[203748.713518]  [<ffffffff8121b225>] do_last+0x815/0x1210
[203748.713772]  [<ffffffff812192e9>] ? path_init+0xb9/0x450
[203748.714123]  [<ffffffff8121bca0>] path_openat+0x80/0x600
[203748.714378]  [<ffffffff811bcd45>] ? handle_pte_fault+0xd15/0x1620
[203748.714634]  [<ffffffff8121d7ba>] do_filp_open+0x3a/0xb0
[203748.714888]  [<ffffffff8122a767>] ? __alloc_fd+0xa7/0x130
[203748.715143]  [<ffffffff81209ffc>] do_sys_open+0x12c/0x220
[203748.715403]  [<ffffffff81026ddb>] ? syscall_trace_enter_phase1+0x11b/0x180
[203748.715668]  [<ffffffff816f0c9f>] ? system_call_after_swapgs+0xe9/0x190
[203748.715928]  [<ffffffff8120a10e>] SyS_open+0x1e/0x20
[203748.716184]  [<ffffffff816f0d5e>] system_call_fastpath+0x18/0xd7
[203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff <0f> 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10
[203748.717505] RIP  [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2]
[203748.717775]  RSP <ffff88006ff4b818>
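The failure mode described above can be modelled in a few lines of C. This is a hypothetical sketch, not the ocfs2 code: struct sketch_bh and sketch_read_block() are invented names, and the two booleans stand in for the buffer head's uptodate and BH_NeedsValidate state. The point is the error path: the validation flag is cleared before returning -EIO, so a later read of the same block does not find it stale.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical buffer-head model; not the kernel's struct buffer_head. */
struct sketch_bh {
    bool uptodate;        /* read completed successfully */
    bool needs_validate;  /* models BH_NeedsValidate */
};

/* One simulated read attempt; io_ok is the outcome of the I/O. */
static int sketch_read_block(struct sketch_bh *bh, bool io_ok)
{
    bh->needs_validate = true;   /* block must be validated after the read */
    bh->uptodate = io_ok;

    if (!bh->uptodate) {
        /* The fix: drop the stale flag before bailing out with -EIO. */
        bh->needs_validate = false;
        return -EIO;
    }

    /* Success path: validation would run here, then the flag is cleared. */
    bh->needs_validate = false;
    return 0;
}
```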

Joseph previously reported a similar panic.
Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html
Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agomm: slowly shrink slabs with a relatively small number of objects
Roman Gushchin [Thu, 20 Sep 2018 19:22:46 +0000 (12:22 -0700)]
mm: slowly shrink slabs with a relatively small number of objects

9092c71bb724 ("mm: use sc->priority for slab shrink targets") changed the
way that the target slab pressure is calculated and made it
priority-based:

    delta = freeable >> priority;
    delta *= 4;
    do_div(delta, shrinker->seeks);

The problem is that at the default priority (12) no pressure is
applied at all if the number of potentially reclaimable objects is less
than 4096 (1<<12).
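The arithmetic is easy to check with a small C transcription of the snippet quoted above. old_scan_delta() is a hypothetical helper name, and seeks is assumed to be 2, the kernel's DEFAULT_SEEKS. At priority 12, 4095 freeable objects yield a scan target of 0, while 4096 objects yield 2.

```c
#include <assert.h>

/*
 * Priority-based scan target, transcribed from the snippet above.
 * The kernel uses do_div(); plain division is equivalent here.
 */
static unsigned long long old_scan_delta(unsigned long long freeable,
                                         int priority, unsigned int seeks)
{
    unsigned long long delta = freeable >> priority;

    delta *= 4;
    return delta / seeks;
}
```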

This causes the last objects on slab caches of no longer used cgroups to
(almost) never get reclaimed.  It's obviously a waste of memory.

It can be especially painful if these stale objects are holding a
reference to a dying cgroup.  Slab LRU lists are reparented on memcg
offlining, but corresponding objects are still holding a reference to the
dying cgroup.  If we don't scan these objects, the dying cgroup can't go
away.  Most likely, the parent cgroup doesn't have any directly charged
objects, only remaining objects from dying children cgroups.  So it can
easily hold a reference to hundreds of dying cgroups.

If there are no big spikes in memory pressure, and new memory cgroups are
created and destroyed periodically, this causes the number of dying
cgroups to grow steadily, causing a slow-ish and hard-to-detect memory
"leak".  It's not a real leak, as the memory can eventually be reclaimed,
but in real life that may never happen.  I've seen hosts with a
steadily climbing number of dying cgroups, which showed no sign of
decline for months, despite the host being loaded with a production
workload.

It is an obvious waste of memory, and to prevent it, let's apply a minimal
pressure even on small shrinker lists.  E.g.  if there are freeable
objects, let's scan at least min(freeable, scan_batch) objects.
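A hedged sketch of the minimal-pressure rule described above, written as plain C rather than the kernel's vmscan code: the priority-based target is computed as before, then raised to at least min(freeable, batch_size). new_scan_delta() and its parameters are illustrative names; batch_size stands in for the shrinker's scan batch.

```c
#include <assert.h>

static unsigned long long min_ull(unsigned long long a, unsigned long long b)
{
    return a < b ? a : b;
}

static unsigned long long max_ull(unsigned long long a, unsigned long long b)
{
    return a > b ? a : b;
}

static unsigned long long new_scan_delta(unsigned long long freeable,
                                         int priority, unsigned int seeks,
                                         unsigned long long batch_size)
{
    /* Priority-based target, unchanged from the old calculation. */
    unsigned long long delta = freeable >> priority;

    delta *= 4;
    delta /= seeks;

    /* Minimal pressure: scan at least min(freeable, batch_size) objects. */
    return max_ull(delta, min_ull(freeable, batch_size));
}
```

With priority 12, seeks 2 and a batch of 128, a list of 50 objects is now scanned in full and a list of 1000 objects gets 128 scanned, where the old formula returned 0 for both; large lists are unaffected because the priority-based term already dominates.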

This fix significantly improves the chance of a dying cgroup being
reclaimed, and together with some previous patches stops the steady growth
in the number of dying cgroups on some of our hosts.

Link: http://lkml.kernel.org/r/20180905230759.12236-1-guro@fb.com
Fixes: 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agokernel/sys.c: remove duplicated include
YueHaibing [Thu, 20 Sep 2018 19:22:43 +0000 (12:22 -0700)]
kernel/sys.c: remove duplicated include

Link: http://lkml.kernel.org/r/20180821133424.18716-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agomm: shmem.c: Correctly annotate new inodes for lockdep
Joel Fernandes (Google) [Thu, 20 Sep 2018 19:22:39 +0000 (12:22 -0700)]
mm: shmem.c: Correctly annotate new inodes for lockdep

Directories and inodes don't necessarily need to be in the same lockdep
class.  For example, hugetlbfs splits them out too, to prevent false
positives in lockdep.  Annotate correctly after new inode creation.  If
it's a directory inode, it will be put into a different class.

This should fix a lockdep splat reported by syzbot:

> ======================================================
> WARNING: possible circular locking dependency detected
> 4.18.0-rc8-next-20180810+ #36 Not tainted
> ------------------------------------------------------
> syz-executor900/4483 is trying to acquire lock:
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock
> include/linux/fs.h:765 [inline]
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at:
> shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>
> but task is already holding lock:
> 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
> drivers/staging/android/ashmem.c:448
>
> which lock already depends on the new lock.
>
> -> #2 (ashmem_mutex){+.+.}:
>        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
>        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
>        call_mmap include/linux/fs.h:1844 [inline]
>        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
>        do_mmap+0xa10/0x1220 mm/mmap.c:1535
>        do_mmap_pgoff include/linux/mm.h:2298 [inline]
>        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
>        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
>        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
>        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #1 (&mm->mmap_sem){++++}:
>        __might_fault+0x155/0x1e0 mm/memory.c:4568
>        _copy_to_user+0x30/0x110 lib/usercopy.c:25
>        copy_to_user include/linux/uaccess.h:155 [inline]
>        filldir+0x1ea/0x3a0 fs/readdir.c:196
>        dir_emit_dot include/linux/fs.h:3464 [inline]
>        dir_emit_dots include/linux/fs.h:3475 [inline]
>        dcache_readdir+0x13a/0x620 fs/libfs.c:193
>        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
>        __do_sys_getdents fs/readdir.c:231 [inline]
>        __se_sys_getdents fs/readdir.c:212 [inline]
>        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (&sb->s_type->i_mutex_key#9){++++}:
>        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
>        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
>        inode_lock include/linux/fs.h:765 [inline]
>        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
>        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
>        vfs_ioctl fs/ioctl.c:46 [inline]
>        file_ioctl fs/ioctl.c:501 [inline]
>        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
>        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
>        __do_sys_ioctl fs/ioctl.c:709 [inline]
>        __se_sys_ioctl fs/ioctl.c:707 [inline]
>        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
>   &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(ashmem_mutex);
>                                lock(&mm->mmap_sem);
>                                lock(ashmem_mutex);
>   lock(&sb->s_type->i_mutex_key#9);
>
>  *** DEADLOCK ***
>
> 1 lock held by syz-executor900/4483:
>  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
> ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448

Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Suggested-by: NeilBrown <neilb@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agofs/proc/kcore.c: fix invalid memory access in multi-page read optimization
Dominique Martinet [Thu, 20 Sep 2018 19:22:35 +0000 (12:22 -0700)]
fs/proc/kcore.c: fix invalid memory access in multi-page read optimization

The 'm' kcore_list item could point to kclist_head, and it is incorrect to
look at m->addr / m->size in this case.

There is no choice but to run through the list of entries for every
address if we did not find any entry in the previous iteration.

Reset 'm' to NULL in that case at Omar Sandoval's suggestion.
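To illustrate why the sentinel must not be dereferenced, here is a hypothetical C model of a lookup with a cached cursor over a circular list whose head is a sentinel, in the style of kclist_head. struct sketch_entry and find_entry() are invented for illustration; the relevant behaviour is that a failed walk returns NULL instead of the sentinel, so the next lookup falls back to a full scan rather than reading the sentinel's addr/size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry on a circular list with a sentinel head. */
struct sketch_entry {
    struct sketch_entry *next;
    unsigned long addr, size;
};

static int covers(const struct sketch_entry *e, unsigned long addr)
{
    return addr >= e->addr && addr < e->addr + e->size;
}

/* Try the cached entry first, otherwise walk the whole list. */
static struct sketch_entry *find_entry(struct sketch_entry *head,
                                       struct sketch_entry *cached,
                                       unsigned long addr)
{
    struct sketch_entry *m;

    if (cached && covers(cached, addr))
        return cached;

    for (m = head->next; m != head; m = m->next)
        if (covers(m, addr))
            return m;

    /* The walk ended on the sentinel: report "not found", not the head. */
    return NULL;
}
```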

[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/1536100702-28706-1-git-send-email-asmadeus@codewreck.org
Fixes: bf991c2231117 ("proc/kcore: optimize multiple page reads")
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agomm: disable deferred struct page for 32-bit arches
Pasha Tatashin [Thu, 20 Sep 2018 19:22:30 +0000 (12:22 -0700)]
mm: disable deferred struct page for 32-bit arches

Deferred struct page init is needed only on systems with a large amount of
physical memory to improve boot performance.  32-bit systems do not
benefit from this feature.

Jiri reported a problem where deferred struct pages do not work well with
x86-32:

[    0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.036269] Initializing CPU#0
[    0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
[    0.038459] page:f6780000 is uninitialized and poisoned
[    0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
[    0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
[    0.040038] ------------[ cut here ]------------
[    0.040399] kernel BUG at include/linux/page-flags.h:293!
[    0.040823] invalid opcode: 0000 [#1] SMP PTI
[    0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
[    0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[    0.042496] EIP: free_highmem_page+0x64/0x80
[    0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
[    0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
[    0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
[    0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
[    0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
[    0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.046913] DR6: fffe0ff0 DR7: 00000400
[    0.047220] Call Trace:
[    0.047419]  add_highpages_with_active_regions+0xbd/0x10d
[    0.047854]  set_highmem_pages_init+0x5b/0x71
[    0.048202]  mem_init+0x2b/0x1e8
[    0.048460]  start_kernel+0x1d2/0x425
[    0.048757]  i386_start_kernel+0x93/0x97
[    0.049073]  startup_32_smp+0x164/0x168
[    0.049379] Modules linked in:
[    0.049626] ---[ end trace 337949378db0abbb ]---

We free highmem pages before their struct pages are initialized:

mem_init()
 set_highmem_pages_init()
  add_highpages_with_active_regions()
   free_highmem_page()
    .. Access uninitialized struct page here..

Because there is no reason to have this feature on 32-bit systems, just
disable it.

Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
Fixes: 2e3ca40f03bb ("mm: relax deferred struct page requirements")
Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agofork: report pid exhaustion correctly
KJ Tsanaktsidis [Thu, 20 Sep 2018 19:22:25 +0000 (12:22 -0700)]
fork: report pid exhaustion correctly

Make the clone and fork syscalls return EAGAIN when the limit on the
number of pids (/proc/sys/kernel/pid_max) is exceeded.

Currently, when the pid_max limit is exceeded, the kernel will return
ENOSPC from the fork and clone syscalls.  This is contrary to the
documented behaviour, which explicitly calls out the pid_max case as one
where EAGAIN should be returned.  It also leads to really confusing error
messages in userspace programs which will complain about a lack of disk
space when they fail to create processes/threads for this reason.

This error is being returned because alloc_pid() uses the IDR API to find
a new pid; when there are none available, idr_alloc_cyclic() returns
-ENOSPC, and this is being propagated back to userspace.

This behaviour has been broken before, and was explicitly fixed in
commit 35f71bc0a09a ("fork: report pid reservation failure properly"),
so I think -EAGAIN is definitely the right thing to return in this case.
The current behaviour dates from commit 95846ecf9dac ("pid:
replace pid bitmap implementation with IDR AIP") and was, I believe,
unintentional.

This patch has no impact on the case where allocating a pid fails because
the child reaper for the namespace is dead; that case will still return
-ENOMEM.
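A minimal C sketch of the error mapping described above; pid_alloc_error() is a hypothetical helper, not the kernel's alloc_pid(). Only -ENOSPC from the ID allocator is turned into -EAGAIN; other errors, such as the -ENOMEM case for a dead child reaper, pass through unchanged.

```c
#include <assert.h>
#include <errno.h>

/*
 * nr is the allocator's result: a new pid on success, a negative errno
 * on failure. Running out of IDs is reported as EAGAIN, as documented
 * for fork()/clone() when pid_max is reached.
 */
static int pid_alloc_error(int nr)
{
    if (nr >= 0)
        return nr;
    return nr == -ENOSPC ? -EAGAIN : nr;
}
```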

Link: http://lkml.kernel.org/r/20180903111016.46461-1-ktsanaktsidis@zendesk.com
Fixes: 95846ecf9dac ("pid: replace pid bitmap implementation with IDR AIP")
Signed-off-by: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoMAINTAINERS: Add X86 MM entry
Thomas Gleixner [Wed, 19 Sep 2018 12:33:14 +0000 (14:33 +0200)]
MAINTAINERS: Add X86 MM entry

Dave, Andy and Peter are de facto overseeing the mm parts of X86. Add an
explicit maintainers entry.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
6 years agox86/intel_rdt: Add Reinette as co-maintainer for RDT
Fenghua Yu [Thu, 20 Sep 2018 19:37:08 +0000 (12:37 -0700)]
x86/intel_rdt: Add Reinette as co-maintainer for RDT

Reinette Chatre is doing a great job on enabling pseudo-locking and other
features in RDT. Add her as co-maintainer for RDT.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1537472228-221799-1-git-send-email-fenghua.yu@intel.com
6 years agoRevert "ubifs: xattr: Don't operate on deleted inodes"
Richard Weinberger [Sun, 16 Sep 2018 21:57:35 +0000 (23:57 +0200)]
Revert "ubifs: xattr: Don't operate on deleted inodes"

This reverts commit 11a6fc3dc743e22fb50f2196ec55bee5140d3c52.
UBIFS wants to assert that xattr operations are only issued on files
with positive link count. The said patch made these operations return
-ENOENT for unlinked files such that the asserts would no longer trigger.
This was wrong since xattr operations are perfectly fine on unlinked
files.
Instead the assertions need to be fixed/removed.

Cc: <stable@vger.kernel.org>
Fixes: 11a6fc3dc743 ("ubifs: xattr: Don't operate on deleted inodes")
Reported-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoubifs: drop false positive assertion
Sascha Hauer [Wed, 12 Sep 2018 12:51:38 +0000 (14:51 +0200)]
ubifs: drop false positive assertion

The following sequence triggers

ubifs_assert(c, c->lst.taken_empty_lebs > 0);

at the end of ubifs_remount_fs():

mount -t ubifs /dev/ubi0_0 /mnt
echo 1 > /sys/kernel/debug/ubifs/ubi0_0/ro_error
umount /mnt
mount -t ubifs -o ro /dev/ubix_y /mnt
mount -o remount,ro /mnt

The resulting

UBIFS assert failed in ubifs_remount_fs at 1878 (pid 161)

is a false positive. In the case above c->lst.taken_empty_lebs has
never been changed from its initial zero value. It is only changed
when the deferred recovery is done.

Fix this by doing the assertion only when recovery has already been
done.
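The reasoning can be captured in a short C sketch. struct sketch_ubifs and remount_invariant_ok() are hypothetical; need_recovery models "deferred recovery still pending". While recovery is pending the counter is still at its initial zero, so the check is skipped instead of reporting a false positive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the UBIFS state involved in the check. */
struct sketch_ubifs {
    bool need_recovery;     /* deferred recovery still pending */
    int taken_empty_lebs;   /* stays 0 until recovery runs */
};

/* Returns true if the remount-time invariant holds or does not apply yet. */
static bool remount_invariant_ok(const struct sketch_ubifs *c)
{
    if (c->need_recovery)
        return true;        /* counter not meaningful before recovery */
    return c->taken_empty_lebs > 0;
}
```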

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoubifs: Check for name being NULL while mounting
Richard Weinberger [Mon, 3 Sep 2018 21:06:23 +0000 (23:06 +0200)]
ubifs: Check for name being NULL while mounting

The requested device name can be NULL or an empty string.
Check for that and refuse to continue. UBIFS has to do this manually
since we cannot use mount_bdev(), which checks for this condition.
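A minimal sketch of the check, assuming the caller reports a bad name with -EINVAL (the exact error code is an assumption here); check_dev_name() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Reject a missing or empty device name before trying to open it. */
static int check_dev_name(const char *name)
{
    if (!name || !*name)
        return -EINVAL;
    return 0;
}
```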

Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system")
Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com
Signed-off-by: Richard Weinberger <richard@nod.at>
6 years agoKVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
Liran Alon [Sun, 16 Sep 2018 11:28:20 +0000 (14:28 +0300)]
KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs

The handlers of IOCTLs in kvm_arch_vcpu_ioctl() are expected to set
their return value in the "r" local var and break out of the switch block
when they encounter an error.
This is because vcpu_load() is called before the switch block, which
has a matching vcpu_put() cleanup afterwards.

However, the KVM_{GET,SET}_NESTED_STATE IOCTL handlers just return
immediately on error without performing the above-mentioned cleanup.

Thus, change these handlers to behave as expected.
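A self-contained C model of the pattern the handlers are expected to follow. sketch_vcpu_load()/sketch_vcpu_put() are hypothetical stand-ins for vcpu_load()/vcpu_put() that only count, and the ioctl numbers are invented. Because the error path breaks instead of returning, the counter is back to zero after every call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical load counter standing in for vcpu_load()/vcpu_put(). */
static int loaded;

static void sketch_vcpu_load(void) { loaded++; }
static void sketch_vcpu_put(void)  { loaded--; }

enum { SKETCH_GET_STATE = 1, SKETCH_SET_STATE = 2 };

static long sketch_vcpu_ioctl(unsigned int ioctl, int copy_ok)
{
    long r;

    sketch_vcpu_load();
    switch (ioctl) {
    case SKETCH_GET_STATE:
    case SKETCH_SET_STATE:
        r = -EFAULT;
        if (!copy_ok)
            break;      /* error: fall out to the common cleanup */
        r = 0;
        break;
    default:
        r = -EINVAL;
        break;
    }
    sketch_vcpu_put();  /* always reached; a "return" above would skip it */
    return r;
}
```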

Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")

Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agodrm/amdkfd: Fix ATS capablity was not reported correctly on some APUs
Yong Zhao [Thu, 13 Sep 2018 01:42:20 +0000 (21:42 -0400)]
drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs

Because CRAT_CU_FLAGS_IOMMU_PRESENT is not set in the CRAT of some BIOSes,
we need to work around this.

For future compatibility, we also overwrite the capability bit according
to the value of needs_iommu_device.
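A hedged C sketch of deriving the capability bit from needs_iommu_device rather than from the CRAT flag. The bit position SKETCH_CAP_ATS_PRESENT and fixup_ats_capability() are hypothetical; the real mask is defined in amdkfd's headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for the ATS capability. */
#define SKETCH_CAP_ATS_PRESENT (1u << 8)

/*
 * Overwrite the ATS bit from the driver's own knowledge instead of
 * trusting a firmware flag that some BIOSes leave unset.
 */
static uint32_t fixup_ats_capability(uint32_t capability,
                                     bool needs_iommu_device)
{
    if (needs_iommu_device)
        capability |= SKETCH_CAP_ATS_PRESENT;
    else
        capability &= ~SKETCH_CAP_ATS_PRESENT;
    return capability;
}
```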

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agodrm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9
Yong Zhao [Thu, 13 Sep 2018 01:42:19 +0000 (21:42 -0400)]
drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9

CWSR fails on Raven if the control stack is MTYPE_UC, which is used
for regular GART mappings. As a workaround we map it using MTYPE_NC.

The MEC firmware expects the control stack at one page offset from the
start of the MQD so it is part of the MQD allocation on GFXv9. AMDGPU
added a memory allocation flag just for this purpose.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agodrm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
Amber Lin [Thu, 13 Sep 2018 01:42:18 +0000 (21:42 -0400)]
drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7

A wrong register bit was examined when checking SDMA status, so it reports
false failures. This typo only appears on gfx_v7. gfx_v8 checks the correct
bit.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agopinctrl: intel: Do pin translation in other GPIO operations as well
Mika Westerberg [Tue, 18 Sep 2018 15:36:21 +0000 (18:36 +0300)]
pinctrl: intel: Do pin translation in other GPIO operations as well

For some reason I thought GPIOLIB handles translation from GPIO ranges
to pinctrl pins but it turns out not to be the case. This means that
when GPIO operations are performed on a pin controller that has a custom
GPIO base, such as Cannon Lake and Ice Lake, an incorrect pin number gets
used internally.

Fix this in the same way we did for lock/unlock IRQ operations and
translate the GPIO number to pin before using it.
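A minimal C sketch of range-based GPIO-to-pin translation. struct sketch_gpio_range and sketch_gpio_to_pin() are hypothetical simplifications, assuming each range maps a contiguous block of GPIO numbers onto a contiguous block of pins; with a custom GPIO base the two numbers differ, which is why the untranslated value selects the wrong pin.

```c
#include <assert.h>
#include <stddef.h>

/* GPIO numbers [gpio_base, gpio_base + npins) map to pins starting at pin_base. */
struct sketch_gpio_range {
    unsigned int gpio_base;
    unsigned int pin_base;
    unsigned int npins;
};

/* Translate a GPIO offset into a pin number; -1 if no range covers it. */
static int sketch_gpio_to_pin(const struct sketch_gpio_range *ranges,
                              size_t nranges, unsigned int offset)
{
    for (size_t i = 0; i < nranges; i++) {
        const struct sketch_gpio_range *r = &ranges[i];

        if (offset >= r->gpio_base && offset < r->gpio_base + r->npins)
            return (int)(r->pin_base + (offset - r->gpio_base));
    }
    return -1;
}
```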

Fixes: a60eac3239f0 ("pinctrl: intel: Allow custom GPIO base for pad groups")
Reported-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
6 years agoMerge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus
Jens Axboe [Thu, 20 Sep 2018 15:10:38 +0000 (09:10 -0600)]
Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus

Pull NVMe fix from Christoph.

* 'nvme-4.19' of git://git.infradead.org/nvme:
  nvme: count all ANA groups for ANA Log page

6 years agofloppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
Andy Whitcroft [Thu, 20 Sep 2018 15:09:48 +0000 (09:09 -0600)]
floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl

The final field of a floppy_struct is the field "name", which is a pointer
to a string in kernel memory.  The kernel pointer should not be copied to
user memory.  The FDGETPRM ioctl copies a floppy_struct to user memory,
including this "name" field.  This pointer cannot be used by the user
and it will leak a kernel address to user-space, which will reveal the
location of kernel code and data and undermine KASLR protection.

Model this code after the compat ioctl which copies the returned data
to a previously cleared temporary structure on the stack (excluding the
name pointer) and copy out to userspace from there.  As we already have
an inparam union with an appropriate member and that memory is already
cleared even for read-only calls, make use of that as a temporary store.
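A hedged C sketch of the copy-out pattern: zero a temporary, copy only the fields before the pointer using offsetof(), and hand the temporary to the copy-to-user step. struct sketch_floppy is a hypothetical, trimmed-down stand-in for struct floppy_struct, and sketch_prepare_fdgetprm() is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout mirroring the problem: a pointer as the last field. */
struct sketch_floppy {
    unsigned int size, sect, head, track;
    const char *name;   /* kernel pointer that must not reach user space */
};

/* Fill out with everything before name; name stays zeroed. */
static void sketch_prepare_fdgetprm(struct sketch_floppy *out,
                                    const struct sketch_floppy *in)
{
    memset(out, 0, sizeof(*out));
    memcpy(out, in, offsetof(struct sketch_floppy, name));
}
```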

Based on an initial patch by Brian Belleville.

CVE-2018-7755
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Broke up long line.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agolibata: mask swap internal and hardware tag
Jens Axboe [Thu, 20 Sep 2018 14:30:55 +0000 (08:30 -0600)]
libata: mask swap internal and hardware tag

When we're comparing the hardware completion mask passed in from the
driver with the internal tag pending mask, we need to account for the
fact that the internal tag is different from the hardware tag. If not,
then we can end up either prematurely completing the internal tag (since
it's not set in the hw mask), or simply flag an error:

ata2: illegal qc_active transition (100000000->00000001)

If the internal tag is set, then swap that with the hardware tag in this
case before comparing with what the hardware reports.
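The tag swap can be sketched in plain C as below. The numbers are assumptions read off the error line above (100000000->00000001): the internal command is taken to occupy driver tag 32 but hardware tag 0. SKETCH_INTERNAL_TAG, hw_to_driver_mask(), completed_tags() and illegal_transition() are illustrative names, not the libata API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: driver-side tag of the internal command, which the
 * hardware reports as tag 0 (see the log line above). */
#define SKETCH_INTERNAL_TAG 32

/* Move hardware bit 0 up to the internal tag position when the internal
 * command is the one pending, so both masks use driver tag numbering. */
static uint64_t hw_to_driver_mask(uint64_t pending, uint64_t hw_active)
{
	if (pending & (1ULL << SKETCH_INTERNAL_TAG)) {
		hw_active |= (hw_active & 1ULL) << SKETCH_INTERNAL_TAG;
		hw_active &= ~1ULL;
	}
	return hw_active;
}

/* Tags that were pending but are no longer reported active. */
static uint64_t completed_tags(uint64_t pending, uint64_t hw_active)
{
	return pending & ~hw_to_driver_mask(pending, hw_active);
}

/* A tag reported active that was never pending is an illegal transition. */
static bool illegal_transition(uint64_t pending, uint64_t hw_active)
{
	return (hw_to_driver_mask(pending, hw_active) & ~pending) != 0;
}
```

The sketch assumes the internal command runs exclusively, so no other command can own hardware tag 0 while driver tag 32 is pending.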

Fixes: 28361c403683 ("libata: add extra internal command")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201151
Cc: stable@vger.kernel.org
Reported-by: Paul Sbarra <sbarra.paul@gmail.com>
Tested-by: Paul Sbarra <sbarra.paul@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoCompiler Attributes: naked can be shared
Miguel Ojeda [Tue, 18 Sep 2018 16:55:42 +0000 (18:55 +0200)]
Compiler Attributes: naked can be shared

The naked attribute is supported by at least gcc >= 4.6 (for ARM,
which is the only current user), gcc >= 8 (for x86), clang >= 3.1
and icc >= 13. See https://godbolt.org/z/350Dyc

Therefore, move it out of compiler-gcc.h so that the definition
is shared by all compilers.

This also fixes Clang support for ARM32 --- 815f0ddb346c
("include/linux/compiler*.h: make compiler-*.h mutually exclusive").

Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Eli Friedman <efriedma@codeaurora.org>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-sparse@vger.kernel.org
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoCompiler Attributes: naked was fixed in gcc 4.6
Miguel Ojeda [Tue, 18 Sep 2018 16:55:41 +0000 (18:55 +0200)]
Compiler Attributes: naked was fixed in gcc 4.6

Commit 9c695203a7dd ("compiler-gcc.h: gcc-4.5 needs noclone
and noinline on __naked functions") added noinline and noclone
as a workaround for a gcc 4.5 bug, which was resolved in 4.6.0.

Since the minimum supported gcc version is now 4.6,
we can clean it up.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44290
and https://godbolt.org/z/h6NMIL

Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Eli Friedman <efriedma@codeaurora.org>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-sparse@vger.kernel.org
Tested-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 years agoMerge tag 'mtd/fixes-for-4.19-rc5' of git://git.infradead.org/linux-mtd
Greg Kroah-Hartman [Thu, 20 Sep 2018 09:25:20 +0000 (11:25 +0200)]
Merge tag 'mtd/fixes-for-4.19-rc5' of git://git.infradead.org/linux-mtd

Boris writes:
  "- Fixes a bug in the ->read/write_reg() implementation of the m25p80
     driver
   - Make sure of_node_get/put() calls are balanced in the partition
     parsing code
   - Fix a race in the denali NAND controller driver
   - Fix false positive WARN_ON() in the marvell NAND controller driver"

* tag 'mtd/fixes-for-4.19-rc5' of git://git.infradead.org/linux-mtd:
  mtd: devices: m25p80: Make sure the buffer passed in op is DMA-able
  mtd: partitions: fix unbalanced of_node_get/put()
  mtd: rawnand: denali: fix a race condition when DMA is kicked
  mtd: rawnand: marvell: prevent harmless warnings

6 years agoMerge tag 'sound-4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Greg Kroah-Hartman [Thu, 20 Sep 2018 07:50:49 +0000 (09:50 +0200)]
Merge tag 'sound-4.19-rc5' of git://git./linux/kernel/git/tiwai/sound

Takashi writes:
  "sound fixes for 4.19-rc5

   Here comes a collection of various fixes, mostly for stable-tree
   or regression fixes.

   Two relatively high LOCs are about the (rather simple) conversion of
   uapi integer types in topology API, and a regression fix about HDMI
   hotplug notification on AMD HD-audio.  The rest are all small
   individual fixes like ASoC Intel Skylake race condition, minor
   uninitialized page leak in emu10k1 ioctl, Firewire audio error paths,
   and so on."

* tag 'sound-4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits)
  ALSA: fireworks: fix memory leak of response buffer at error path
  ALSA: oxfw: fix memory leak of discovered stream formats at error path
  ALSA: oxfw: fix memory leak for model-dependent data at error path
  ALSA: bebob: fix memory leak for M-Audio FW1814 and ProjectMix I/O at error path
  ALSA: hda - Enable runtime PM only for discrete GPU
  ALSA: oxfw: fix memory leak of private data
  ALSA: firewire-tascam: fix memory leak of private data
  ALSA: firewire-digi00x: fix memory leak of private data
  sound: don't call skl_init_chip() to reset intel skl soc
  sound: enable interrupt after dma buffer initialization
  Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
  ALSA: emu10k1: fix possible info leak to userspace on SNDRV_EMU10K1_IOCTL_INFO
  ASoC: cs4265: fix MMTLR Data switch control
  ASoC: AMD: Ensure reset bit is cleared before configuring
  ALSA: fireface: fix memory leak in ff400_switch_fetching_mode()
  ALSA: bebob: use address returned by kmalloc() instead of kernel stack for streaming DMA mapping
  ASoC: rsnd: don't fallback to PIO mode when -EPROBE_DEFER
  ASoC: rsnd: adg: care clock-frequency size
  ASoC: uniphier: change status to orphan
  ASoC: rsnd: fixup not to call clk_get/set under non-atomic
  ...

6 years agodrm/vmwgfx: Fix buffer object eviction
Thomas Hellstrom [Fri, 14 Sep 2018 07:24:19 +0000 (09:24 +0200)]
drm/vmwgfx: Fix buffer object eviction

Commit 19be55701071 ("drm/ttm: add operation ctx to ttm_bo_validate v2")
introduced a regression where the vmwgfx driver refused to evict a
buffer that was still busy instead of waiting for it to become idle.

Fix this.

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
6 years agodrm/vmwgfx: Don't impose STDU limits on framebuffer size
Deepak Rawat [Thu, 13 Sep 2018 10:46:10 +0000 (12:46 +0200)]
drm/vmwgfx: Don't impose STDU limits on framebuffer size

If a framebuffer is larger than the STDU limits, we create bounce surfaces
that are within those limits.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
6 years agodrm/vmwgfx: limit mode size for all display unit to texture_max
Deepak Rawat [Thu, 13 Sep 2018 10:44:13 +0000 (12:44 +0200)]
drm/vmwgfx: limit mode size for all display unit to texture_max

For all display units, limit the mode size exposed to texture_max_width/
height, as this is the maximum framebuffer size that the virtual device
can create.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
6 years agodrm/vmwgfx: limit screen size to stdu_max during check_modeset
Deepak Rawat [Thu, 13 Sep 2018 10:34:37 +0000 (12:34 +0200)]
drm/vmwgfx: limit screen size to stdu_max during check_modeset

For STDU, the individual screen target size is limited by the
SVGA_REG_SCREENTARGET_MAX_WIDTH/HEIGHT registers, so add that limit
during atomic check_modeset.

An additional limit is placed in the update_layout ioctl to avoid
requesting layouts that current user-space typically can't support.
Also modified the comments to reflect the current limitations on topology.
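A rough sketch of the per-screen check, assuming the two register values have already been read into a struct. sketch_stdu_limits, sketch_screen and stdu_layout_fits() are hypothetical names and do not come from the vmwgfx driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the values read from
 * SVGA_REG_SCREENTARGET_MAX_WIDTH/HEIGHT. */
struct sketch_stdu_limits {
	unsigned int max_width;
	unsigned int max_height;
};

/* Hypothetical description of one enabled screen in the new topology. */
struct sketch_screen {
	unsigned int width;
	unsigned int height;
};

/* Fail the check if any screen exceeds the screen target limits. */
static bool stdu_layout_fits(const struct sketch_screen *screens, size_t n,
			     const struct sketch_stdu_limits *lim)
{
	for (size_t i = 0; i < n; i++) {
		if (screens[i].width > lim->max_width ||
		    screens[i].height > lim->max_height)
			return false;
	}
	return true;
}
```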

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
6 years agodrm/vmwgfx: don't check for old_crtc_state enable status
Deepak Rawat [Thu, 13 Sep 2018 10:33:49 +0000 (12:33 +0200)]
drm/vmwgfx: don't check for old_crtc_state enable status

During the atomic check that prepares the new topology, there is no need
to check whether old_crtc_state was enabled. Checking it causes
atomic_check to fail, because due to connector routing a crtc can be in
atomic_state even if its enable status did not change.

Detected this issue with an igt run.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
6 years agoMerge tag 'du-next-20180914' of git://linuxtv.org/pinchartl/media into drm-next
Dave Airlie [Thu, 20 Sep 2018 04:12:01 +0000 (14:12 +1000)]
Merge tag 'du-next-20180914' of git://linuxtv.org/pinchartl/media into drm-next

R-Car DU changes for v4.20

The pull request mostly contains updates to the R-Car DU driver, notably
support for interlaced modes on Gen3 hardware, support for the LVDS output on
R8A77980, and a set of miscellaneous bug fixes. There are also two SPDX
conversion patches for the drm shmobile and panel-lvds drivers, as well as an
update to MAINTAINERS to add Kieran Bingham as a co-maintainer for the DU
driver.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/3273568.LdoAI77IYW@avalon
6 years agodrm/amdgpu: add new polaris pci id
Alex Deucher [Tue, 18 Sep 2018 20:28:24 +0000 (15:28 -0500)]
drm/amdgpu: add new polaris pci id

Add new pci id.

Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
6 years agodrm/amdgpu: Exclude MM engines for vega20 virtual device
Frank Min [Thu, 26 Apr 2018 19:45:50 +0000 (03:45 +0800)]
drm/amdgpu: Exclude MM engines for vega20 virtual device

Temporarily disable the UVD/VCE blocks if the device is a virtual device.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agodrm/amdgpu: add vega20 sriov capability detection
Frank Min [Thu, 26 Apr 2018 19:44:11 +0000 (03:44 +0800)]
drm/amdgpu: add vega20 sriov capability detection

Add sriov capability detection for vega20, so that the driver can check
whether the device is a virtual device.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agoMerge tag 'drm-misc-next-2018-09-19' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Thu, 20 Sep 2018 00:14:59 +0000 (10:14 +1000)]
Merge tag 'drm-misc-next-2018-09-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 4.20:

UAPI Changes:
- None

Cross-subsystem Changes:
- None

Core Changes:
- Allow drivers to disable features with per-device granularity (Ville)
- Use EOPNOTSUPP when iface/feature is unsupported instead of
  EINVAL/errno soup (Chris)
- Simplify M/N DP quirk by using constant N to limit size of M/N (Shawn)
- add quirk for LG LP140WF6-SPM1 eDP panel (Shawn)

Driver Changes:
- i915/amdgpu: Disable DRIVER_ATOMIC for older/unsupported devices (Ville)
- sun4i: add support for R40 HDMI PHY (Icenowy)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Icenowy Zheng <icenowy@aosc.io>
Cc: Lee, Shawn C <shawn.c.lee@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20180919200218.GA186644@art_vandelay
6 years agoMerge tag 'drm-intel-fixes-2018-09-19' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Thu, 20 Sep 2018 00:01:46 +0000 (10:01 +1000)]
Merge tag 'drm-intel-fixes-2018-09-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

Only fixes coming from gvt containing "Two more BXT fixes from Colin,
one srcu locking fix and one fix for GGTT clear when destroy vGPU."

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180919151915.GA6309@intel.com
6 years agoMerge tag 'drm-misc-fixes-2018-09-19' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Thu, 20 Sep 2018 00:00:31 +0000 (10:00 +1000)]
Merge tag 'drm-misc-fixes-2018-09-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v4.19-rc5:
- Fix crash in vgem in drm_drv_uses_atomic_modeset.
- Allow atomic drivers that don't set DRIVER_ATOMIC to create debugfs entries.
- Fix compiler warning for unused connector_funcs.
- Fix null pointer deref on UDL unplug.
- Disable DRM support for sun4i's R40 for now.
  (Not all patches went in for v4.19, so it has to wait a cycle.)
- NULL-terminate the of_device_id table in pl111.
- Make sure vc4 NV12 planar format works when displaying an unscaled fb.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/dda393bb-f13f-8d36-711b-cacfc578e5a3@linux.intel.com
6 years agokvm: selftests: Add platform_info_test
Drew Schmitt [Mon, 20 Aug 2018 17:32:16 +0000 (10:32 -0700)]
kvm: selftests: Add platform_info_test

Test guest access to MSR_PLATFORM_INFO when the capability is enabled
or disabled.

Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: x86: Control guest reads of MSR_PLATFORM_INFO
Drew Schmitt [Mon, 20 Aug 2018 17:32:15 +0000 (10:32 -0700)]
KVM: x86: Control guest reads of MSR_PLATFORM_INFO

Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
to reads of MSR_PLATFORM_INFO.

Disabling access to reads of this MSR gives userspace the control to "expose"
this platform-dependent information to guests in a clear way. As it exists
today, guests that read this MSR would get unpopulated information if userspace
hadn't already set it (and prior to this patch series, only the CPUID faulting
information could have been populated). This existing interface could be
confusing if guests don't handle the potential for incorrect/incomplete
information gracefully (e.g. zero reported for base frequency).
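The gating can be modelled roughly as below. This is a sketch under assumptions, not KVM code: sketch_vcpu and read_platform_info() are hypothetical, a refused guest read is represented by returning -1 (assumed to surface to the guest as a fault), and host-initiated reads are assumed to stay allowed because the capability only concerns guest access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM/per-vCPU state. */
struct sketch_vcpu {
	bool guest_can_read_platform_info; /* toggled via the new capability */
	uint64_t platform_info;            /* value populated by userspace */
};

/*
 * Returns 0 and stores the MSR value on success, or -1 when a guest
 * read is refused. Host-initiated reads are always served (assumption).
 */
static int read_platform_info(const struct sketch_vcpu *v, bool host_initiated,
			      uint64_t *data)
{
	if (!host_initiated && !v->guest_can_read_platform_info)
		return -1;
	*data = v->platform_info;
	return 0;
}
```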

Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: x86: Turbo bits in MSR_PLATFORM_INFO
Drew Schmitt [Mon, 20 Aug 2018 17:32:14 +0000 (10:32 -0700)]
KVM: x86: Turbo bits in MSR_PLATFORM_INFO

Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only
the CPUID faulting bit was settable; now any bit in MSR_PLATFORM_INFO is
settable. This can be used, for example, to convey frequency information
about the platform on which the guest is running.

Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agonVMX x86: Check VPID value on vmentry of L2 guests
Krish Sadhukhan [Tue, 4 Sep 2018 18:42:58 +0000 (14:42 -0400)]
nVMX x86: Check VPID value on vmentry of L2 guests

According to section "Checks on VMX Controls" in Intel SDM vol 3C, the
following check needs to be enforced on vmentry of L2 guests:

    If the 'enable VPID' VM-execution control is 1, the value of the
    VPID VM-execution control field must not be 0000H.
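The rule reduces to a single predicate. A minimal sketch, assuming a hypothetical struct that carries only the two fields involved (sketch_vmcs12 and vpid_controls_valid() are illustrative names, not KVM's):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of vmcs12 relevant to this check. */
struct sketch_vmcs12 {
	bool enable_vpid;              /* "enable VPID" VM-execution control */
	uint16_t virtual_processor_id; /* VPID VM-execution control field */
};

/* True if the controls pass the SDM check quoted above. */
static bool vpid_controls_valid(const struct sketch_vmcs12 *v)
{
	return !(v->enable_vpid && v->virtual_processor_id == 0x0000);
}
```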

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agonVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
Krish Sadhukhan [Fri, 24 Aug 2018 00:03:03 +0000 (20:03 -0400)]
nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2

According to section "Checks on VMX Controls" in Intel SDM vol 3C,
the following check needs to be enforced on vmentry of L2 guests:

   - Bits 5:0 of the posted-interrupt descriptor address are all 0.
   - The posted-interrupt descriptor address does not set any bits
     beyond the processor's physical-address width.
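Both conditions can be expressed as bit tests on the address. In this sketch, pi_desc_addr_valid() is a hypothetical helper and maxphyaddr (the physical-address width, reported in CPUID.80000008H:EAX[7:0]) is simply passed in as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check a posted-interrupt descriptor address against the two rules. */
static bool pi_desc_addr_valid(uint64_t addr, unsigned int maxphyaddr)
{
	/* Bits 5:0 must be zero, i.e. the descriptor is 64-byte aligned. */
	if (addr & 0x3fULL)
		return false;
	/* No bit at or above the physical-address width may be set. */
	if (maxphyaddr < 64 && (addr >> maxphyaddr) != 0)
		return false;
	return true;
}
```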

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
Liran Alon [Tue, 4 Sep 2018 07:56:52 +0000 (10:56 +0300)]
KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv

In case L1 does not intercept L2 HLT or enters L2 in HLT activity-state,
it is possible for a vCPU to be blocked while it is in guest-mode.

According to Intel SDM 26.6.5 Interrupt-Window Exiting and
Virtual-Interrupt Delivery: "These events wake the logical processor
if it just entered the HLT state because of a VM entry".
Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending
deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU
should be woken from the HLT state and injected with the interrupt.

In addition, if while the vCPU is blocked (while it is in guest-mode),
it receives a nested posted-interrupt, then the vCPU should also be
woken and injected with the posted interrupt.

To handle these cases, this patch enhances kvm_vcpu_has_events() to also
check if there is a pending interrupt in L2 virtual APICv provided by
L1. That is, it evaluates if there is a pending virtual interrupt for L2
by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1
Evaluation of Pending Interrupts.
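The priority comparison compares only the priority class, bits 7:4, of the two values. A minimal sketch with a hypothetical helper name; RVI is the low byte of the guest interrupt status field, and where VPPR is read from is left outside the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* A virtual interrupt is pending when RVI[7:4] > VPPR[7:4]. */
static bool virtual_intr_pending(uint8_t rvi, uint8_t vppr)
{
	return (rvi & 0xf0) > (vppr & 0xf0);
}
```

Vectors within the same priority class as VPPR (for example RVI 0x2f against VPPR 0x20) do not count as pending.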

Note that this also handles the case of a nested posted-interrupt, since
RVI is updated in vmx_complete_nested_posted_interrupt(), which is
called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() ->
kvm_vcpu_running() -> vmx_check_nested_events() ->
vmx_complete_nested_posted_interrupt().

Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: VMX: check nested state and CR4.VMXE against SMM
Paolo Bonzini [Tue, 18 Sep 2018 13:19:17 +0000 (15:19 +0200)]
KVM: VMX: check nested state and CR4.VMXE against SMM

VMX cannot be enabled under SMM, so check for this when CR4 is set and
when the nested virtualization state is restored.

This should fix some WARNs reported by syzkaller, mostly around
alloc_shadow_vmcs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agokvm: x86: make kvm_{load|put}_guest_fpu() static
Sebastian Andrzej Siewior [Wed, 12 Sep 2018 13:33:45 +0000 (15:33 +0200)]
kvm: x86: make kvm_{load|put}_guest_fpu() static

The functions
kvm_load_guest_fpu()
kvm_put_guest_fpu()

are only used locally, so make them static. This also requires moving
both functions, because they are used before their definitions.
Those functions were exported (via EXPORT_SYMBOL) before commit
e5bb40251a920 ("KVM: Drop kvm_{load,put}_guest_fpu() exports").

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agox86/hyper-v: rename ipi_arg_{ex,non_ex} structures
Vitaly Kuznetsov [Mon, 27 Aug 2018 16:48:57 +0000 (18:48 +0200)]
x86/hyper-v: rename ipi_arg_{ex,non_ex} structures

These structures are going to be used from KVM code so let's make
their names reflect their Hyper-V origin.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: VMX: use preemption timer to force immediate VMExit
Sean Christopherson [Mon, 27 Aug 2018 22:21:12 +0000 (15:21 -0700)]
KVM: VMX: use preemption timer to force immediate VMExit

A VMX preemption timer value of '0' is guaranteed to cause a VMExit
prior to the CPU executing any instructions in the guest.  Use the
preemption timer (if it's supported) to trigger immediate VMExit
in place of the current method of sending a self-IPI.  This ensures
that pending VMExit injection to L1 occurs prior to executing any
instructions in the guest (regardless of nesting level).

When deferring VMExit injection, KVM generates an immediate VMExit
from the (possibly nested) guest by sending itself an IPI.  Because
hardware interrupts are blocked prior to VMEnter and are unblocked
(in hardware) after VMEnter, this results in taking a VMExit(INTR)
before any guest instruction is executed.  But, as this approach
relies on the IPI being received before VMEnter executes, it only
works as intended when KVM is running as L0.  Because there are no
architectural guarantees regarding when IPIs are delivered, when
running nested, the INTR may "arrive" long after L2 is running, e.g.
L0 KVM doesn't force an immediate switch to L1 to deliver an INTR.

For the most part, this unintended delay is not an issue since the
events being injected to L1 also do not have architectural guarantees
regarding their timing.  The notable exception is the VMX preemption
timer[1], which is architecturally guaranteed to cause a VMExit prior
to executing any instructions in the guest if the timer value is '0'
at VMEnter.  Specifically, the delay in injecting the VMExit causes
the preemption timer KVM unit test to fail when run in a nested guest.

Note: this approach is viable even on CPUs with a broken preemption
timer, as broken in this context only means the timer counts at the
wrong rate.  There are no known errata affecting timer value of '0'.

[1] I/O SMIs also have guarantees on when they arrive, but I have
    no idea if/how those are emulated in KVM.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Use a hook for SVM instead of leaving the default in x86.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: VMX: modify preemption timer bit only when arming timer
Sean Christopherson [Mon, 27 Aug 2018 22:21:11 +0000 (15:21 -0700)]
KVM: VMX: modify preemption timer bit only when arming timer

Provide a singular location where the VMX preemption timer bit is
set/cleared so that future usages of the preemption timer can ensure
the VMCS bit is up-to-date without having to modify unrelated code
paths.  For example, the preemption timer can be used to force an
immediate VMExit.  Cache the status of the timer to avoid redundant
VMREAD and VMWRITE, e.g. if the timer stays armed across multiple
VMEnters/VMExits.
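The single set/clear point with a cached state can be sketched as follows. The real control is the "activate VMX-preemption timer" bit in the pin-based VM-execution controls; here the VMCS access is replaced by a write counter so the saving is visible. sketch_vmx and sketch_set_preemption_timer() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model: cached control bit plus a count of simulated
 * VMWRITEs to the pin-based controls. */
struct sketch_vmx {
	bool timer_armed;          /* cached state of the control bit */
	unsigned int vmcs_writes;  /* number of simulated VMWRITEs */
};

/* The one place where the preemption timer control is set or cleared. */
static void sketch_set_preemption_timer(struct sketch_vmx *vmx, bool arm)
{
	if (vmx->timer_armed == arm)
		return; /* already in the requested state: skip the VMWRITE */
	vmx->timer_armed = arm;
	vmx->vmcs_writes++;
}
```

Keeping the timer armed across several VMEnters then costs one write instead of one per entry.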

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: VMX: immediately mark preemption timer expired only for zero value
Sean Christopherson [Mon, 27 Aug 2018 22:21:10 +0000 (15:21 -0700)]
KVM: VMX: immediately mark preemption timer expired only for zero value

A VMX preemption timer value of '0' at the time of VMEnter is
architecturally guaranteed to cause a VMExit prior to the CPU
executing any instructions in the guest.  This architectural
definition is in place to ensure that a previously expired timer
is correctly recognized by the CPU as it is possible for the timer
to reach zero and not trigger a VMexit due to a higher priority
VMExit being signalled instead, e.g. a pending #DB that morphs into
a VMExit.

Whether by design or coincidence, commit f4124500c2c1 ("KVM: nVMX:
Fully emulate preemption timer") special cased timer values of '0'
and '1' to ensure prompt delivery of the VMExit.  Unlike '0', a
timer value of '1' has no architectural guarantees regarding
when it is delivered.

Modify the timer emulation to trigger immediate VMExit if and only
if the timer value is '0', and document precisely why '0' is special.
Do this even if calibration of the virtual TSC failed, i.e. VMExit
will occur immediately regardless of the frequency of the timer.
Making only '0' a special case gives KVM leeway to be more aggressive
in ensuring the VMExit is injected prior to executing instructions in
the nested guest, and also eliminates any ambiguity as to why '1' is
a special case, e.g. why wasn't the threshold for a "short timeout"
set to 10, 100, 1000, etc...
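The '0'-only special case might be sketched as below. The helper name is hypothetical, and several details are assumptions: the emulated timer is taken to tick once every 2^5 TSC cycles (SKETCH_TIMER_RATE is not read from any header), tsc_khz is the guest's virtual TSC frequency with 0 meaning calibration failed, and in that uncalibrated, non-zero case the timer is modelled as simply never being started.

```c
#include <stdint.h>

/* Assumption: the emulated timer ticks once every 2^5 TSC cycles. */
#define SKETCH_TIMER_RATE 5

/*
 * Convert a vmcs12 preemption timer value into a host timeout in ns.
 * 0 means "inject the VMExit immediately"; only a timer value of 0
 * takes that path, and it does so even when tsc_khz is 0.
 */
static uint64_t sketch_timer_timeout_ns(uint32_t value, uint32_t tsc_khz)
{
	if (value == 0)
		return 0;
	if (tsc_khz == 0)
		return UINT64_MAX; /* assumed: timer never started */
	/* ticks -> TSC cycles -> ns: cycles * 10^6 / kHz */
	return ((uint64_t)value << SKETCH_TIMER_RATE) * 1000000ULL / tsc_khz;
}
```

With a 1 GHz virtual TSC, a value of 1 yields a 32 ns timeout rather than an immediate exit.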

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: SVM: Switch to bitmap_zalloc()
Andy Shevchenko [Thu, 30 Aug 2018 11:49:59 +0000 (14:49 +0300)]
KVM: SVM: Switch to bitmap_zalloc()

Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that, it returns a pointer of bitmap type instead of an opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM/MMU: Fix comment in walk_shadow_page_lockless_end()
Tianyu Lan [Fri, 7 Sep 2018 05:45:02 +0000 (05:45 +0000)]
KVM/MMU: Fix comment in walk_shadow_page_lockless_end()

kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page().
This patch fixes the comment accordingly.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agokvm: selftests: use -pthread instead of -lpthread
Lei Yang [Wed, 29 Aug 2018 07:04:08 +0000 (15:04 +0800)]
kvm: selftests: use -pthread instead of -lpthread

I ran into the following error:

testing/selftests/kvm/dirty_log_test.c:285: undefined reference to `pthread_create'
testing/selftests/kvm/dirty_log_test.c:297: undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status

My gcc version is 4.8.4.
"-pthread" should work everywhere.

Signed-off-by: Lei Yang <Lei.Yang@windriver.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: x86: don't reset root in kvm_mmu_setup()
Wei Yang [Fri, 7 Sep 2018 11:59:47 +0000 (19:59 +0800)]
KVM: x86: don't reset root in kvm_mmu_setup()

Here is the code path which shows that kvm_mmu_setup() is invoked after
kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path,
root_hpa and prev_roots are guaranteed to be invalid, so it is not
necessary to reset them again.

    kvm_vm_ioctl_create_vcpu()
        kvm_arch_vcpu_create()
            vmx_create_vcpu()
                kvm_vcpu_init()
                    kvm_arch_vcpu_init()
                        kvm_mmu_create()
        kvm_arch_vcpu_setup()
            kvm_mmu_setup()
                kvm_init_mmu()

This patch sets reset_roots to false in kvm_mmu_setup().

Fixes: 50c28f21d045dde8c52548f8482d456b3f0956f5
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agokvm: mmu: Don't read PDPTEs when paging is not enabled
Junaid Shahid [Thu, 9 Aug 2018 00:45:24 +0000 (17:45 -0700)]
kvm: mmu: Don't read PDPTEs when paging is not enabled

kvm should not attempt to read guest PDPTEs when CR0.PG = 0 and
CR4.PAE = 1.
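A sketch of the mode test that decides whether PDPTEs should be read at all. The macro and function names are local to the sketch. The EFER.LMA term goes beyond the message above: it follows the architectural definition of PAE paging (CR0.PG = 1, CR4.PAE = 1, not in IA-32e mode).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CR0_PG   (1ULL << 31)
#define SKETCH_CR4_PAE  (1ULL << 5)
#define SKETCH_EFER_LMA (1ULL << 10)

/*
 * PDPTEs are only meaningful in PAE paging mode. With CR0.PG = 0 they
 * must not be read, whatever the value of CR4.PAE.
 */
static bool should_load_pdptes(uint64_t cr0, uint64_t cr4, uint64_t efer)
{
	return (cr0 & SKETCH_CR0_PG) && (cr4 & SKETCH_CR4_PAE) &&
	       !(efer & SKETCH_EFER_LMA);
}
```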

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agox86/kvm/lapic: always disable MMIO interface in x2APIC mode
Vitaly Kuznetsov [Thu, 2 Aug 2018 15:08:16 +0000 (17:08 +0200)]
x86/kvm/lapic: always disable MMIO interface in x2APIC mode

When VMX is used with flexpriority disabled (because it is unsupported or
was disabled with a module parameter), the MMIO interface to the lAPIC is
still available in x2APIC mode while it shouldn't be (kvm-unit-tests):

PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
FAIL: apic_disable: *0xfee00030: 50014

The issue appears because we basically do nothing while switching to
x2APIC mode when APIC access page is not used. apic_mmio_{read,write}
only check if the lAPIC is disabled before proceeding to the actual write.

When APIC access is virtualized, we correctly manipulate the VMX controls
in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes
in x2APIC mode so there's no issue.

Disabling MMIO interface seems to be easy. The question is: what do we
do with these reads and writes? If we add apic_x2apic_mode() check to
apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will
go to userspace. When lAPIC is in kernel, Qemu uses this interface to
inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This
somehow works with disabled lAPIC but when we're in xAPIC mode we will
get a real injected MSI from every write to lAPIC. Not good.

The simplest solution seems to be to just ignore writes to the region
and return ~0 for all reads when we're in x2APIC mode. This is what this
patch does. However, this approach is inconsistent with what currently
happens when flexpriority is enabled: we allocate APIC access page and
create KVM memory region so in x2APIC mode all reads and writes go to
this pre-allocated page which is, btw, the same for all vCPUs.
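A minimal model of the behaviour chosen here: ignore writes and return ~0 for reads while in x2APIC mode. sketch_lapic and its two accessors are hypothetical stand-ins for KVM's apic_mmio_{read,write}, whose actual code this sketch does not reproduce; the range check is added only to keep the model memory-safe.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified lAPIC model. */
struct sketch_lapic {
	bool x2apic_mode;
	uint8_t regs[0x400]; /* backing store for the xAPIC register window */
};

/* MMIO reads return all ones in x2APIC mode (or when out of range). */
static uint32_t sketch_mmio_read32(const struct sketch_lapic *apic, uint32_t off)
{
	uint32_t val;

	if (apic->x2apic_mode || off > sizeof(apic->regs) - sizeof(val))
		return ~0u;
	memcpy(&val, &apic->regs[off], sizeof(val));
	return val;
}

/* MMIO writes are silently ignored in x2APIC mode (or when out of range). */
static void sketch_mmio_write32(struct sketch_lapic *apic, uint32_t off,
				uint32_t val)
{
	if (apic->x2apic_mode || off > sizeof(apic->regs) - sizeof(val))
		return;
	memcpy(&apic->regs[off], &val, sizeof(val));
}
```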

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoMerge tag 'hwmon-for-linus-v4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Wed, 19 Sep 2018 20:59:30 +0000 (22:59 +0200)]
Merge tag 'hwmon-for-linus-v4.19-rc5' of git://git./linux/kernel/git/groeck/linux-staging

Guenter writes:
   "Various bug fixes for nct6775 driver"

6 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Greg Kroah-Hartman [Wed, 19 Sep 2018 20:34:22 +0000 (22:34 +0200)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

James writes:
  "SCSI fixes on 20180919

   A couple of small but important fixes, one affecting big endian and
   the other fixing a BUG_ON in scatterlist processing.

Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>"
6 years agodrm/amdgpu: move reserving GDS/GWS/OA into common code
Christian König [Fri, 14 Sep 2018 19:08:57 +0000 (21:08 +0200)]
drm/amdgpu: move reserving GDS/GWS/OA into common code

We don't need that in the per ASIC code.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agodrm/amdgpu: initialize GDS/GWS/OA domains even when they are zero sized
Christian König [Fri, 14 Sep 2018 18:59:27 +0000 (20:59 +0200)]
drm/amdgpu: initialize GDS/GWS/OA domains even when they are zero sized

Stops crashing on SI.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agodrm/amdgpu: fix up GDS/GWS/OA shifting
Christian König [Fri, 14 Sep 2018 14:06:31 +0000 (16:06 +0200)]
drm/amdgpu: fix up GDS/GWS/OA shifting

That only worked by pure coincidence. Completely remove the shifting and
always apply the correct PAGE_SHIFT.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agodrm/amdgpu: fix shadow BO restoring
Christian König [Tue, 11 Sep 2018 09:50:57 +0000 (11:50 +0200)]
drm/amdgpu: fix shadow BO restoring

Don't grab the reservation lock any more and simplify the handling quite
a bit.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 years agodrm/amdgpu: always recover VRAM during GPU recovery
Christian König [Tue, 11 Sep 2018 08:36:16 +0000 (10:36 +0200)]
drm/amdgpu: always recover VRAM during GPU recovery

It shouldn't add much overhead and we should make sure that critical
VRAM content is always restored.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>