review.tizen.org Git - platform/kernel/linux-rpi.git/log

PCI: Equalize hotplug memory and io for occupied and empty slots

Currently, a hotplug bridge will be given hpmemsize additional memory
and hpiosize additional io if available, in order to satisfy any future
hotplug allocation requirements.

These calculations don't consider the current memory/io size of the
hotplug bridge/slot, so hotplug bridges/slots which have downstream
devices will be allocated their current allocation in addition to the
hpmemsize value.

This makes for possibly undesirable results with a mix of unoccupied and
occupied slots (ex, with hpmemsize=2M):

  02:03.0 PCI bridge: <-- Occupied
  Memory behind bridge: d6200000-d64fffff [size=3M]
  02:04.0 PCI bridge: <-- Unoccupied
  Memory behind bridge: d6500000-d66fffff [size=2M]

This change considers the current allocation size when using the
hpmemsize/hpiosize parameters to make the reservations predictable for
the mix of unoccupied and occupied slots:

  02:03.0 PCI bridge: <-- Occupied
  Memory behind bridge: d6200000-d63fffff [size=2M]
  02:04.0 PCI bridge: <-- Unoccupied
  Memory behind bridge: d6400000-d65fffff [size=2M]

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

PCI / ACPI: Whitelist D3 for more PCIe hotplug ports

In order to have better power management for Thunderbolt PCIe chains,
Windows enables power management for native PCIe hotplug ports if there is
the following ACPI _DSD attached to the root port:

  Name (_DSD, Package () {
      ToUUID ("6211e2c0-58a3-4af3-90e1-927a4e0c55a4"),
      Package () {
          Package () {"HotPlugSupportInD3", 1}
      }
  })

This is also documented in:

  https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3

Do the same in Linux by introducing new firmware PM callback
(->bridge_d3()) and then implement it for ACPI based systems so that the
above property is checked.

There is one catch, though. The initial pci_dev->bridge_d3 is set before
the root port has ACPI companion bound (the device is not added to the PCI
bus either) so we need to look up the ACPI companion manually in that case
in acpi_pci_bridge_d3().

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI / property: Allow multiple property compatible _DSD entries

It is possible to have _DSD entries where the data is compatible with
device properties format but are using different GUID for various reasons.
In addition to that there can be many such _DSD entries for a single device
such as for PCIe root port used to host a Thunderbolt hierarchy:

    Scope (\_SB.PCI0.RP21)
    {
        Name (_DSD, Package () {
            ToUUID ("6211e2c0-58a3-4af3-90e1-927a4e0c55a4"),
            Package () {
                Package () {"HotPlugSupportInD3", 1}
            },

            ToUUID ("efcc06cc-73ac-4bc3-bff0-76143807c389"),
            Package () {
                Package () {"ExternalFacingPort", 1},
                Package () {"UID", 0 }
            }
        })
    }

More information about these new _DSD entries can be found in:

  https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports

To make these available for drivers via unified device property APIs,
modify ACPI property core so that it supports multiple _DSD entries
organized in a linked list. We also store GUID of each _DSD entry in struct
acpi_device_properties in case there is need to differentiate between
entries. The supported GUIDs are then listed in prp_guids array.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>

PCI/PME: Implement runtime PM callbacks

Basically we need to do the same steps than what we do when system sleep is
entered and disable PME interrupt when the root port is runtime suspended.
This prevents spurious wakeups immediately when the port is transitioned
into D3cold.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PCI: pciehp: Implement runtime PM callbacks

Basically we need to do the same thing when runtime suspending than with
system sleep so re-use those operations here. This makes sure hotplug
interrupt does not trigger immediately when the link goes down.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PCI/portdrv: Add runtime PM hooks for port service drivers

When PCIe port is runtime suspended/resumed some extra steps might be
needed to be executed from the port service driver side. For instance we
may need to disable PCIe hotplug interrupt to prevent it from triggering
immediately when PCIe link to the downstream component goes down.

To make the above possible add optional ->runtime_suspend() and
->runtime_resume() callbacks to struct pcie_port_service_driver and call
them for each port service in runtime suspend/resume callbacks of portdrv.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: adjust "slot->state" for 5790a9c78e78 ("PCI: pciehp: Unify
controller and slot structs")]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PCI/portdrv: Resume upon exit from system suspend if left runtime suspended

Currently we try to keep PCIe ports runtime suspended over system suspend
if possible. This mostly happens when entering suspend-to-idle because
there is no need to re-configure wake settings.

This causes problems if the parent port goes into D3cold and it gets
resumed upon exit from system suspend. This may happen for example if the
port is part of PCIe switch and the same switch is connected to a PCIe
endpoint that needs to be resumed. The way exit from D3cold works according
PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold reset is
signaled. After this the device is in D0unitialized state keeping PME
context if it supports wake from D3cold.

The problem occurs when a PCIe hotplug port is left suspended and the
parent port goes into D3cold and back to D0: the port keeps its PME context
but since everything else is reset back to defaults (D0unitialized) it is
not set to detect hotplug events anymore.

For this reason change the PCIe portdrv power management logic so that it
is fine to keep the port runtime suspended over system suspend but it needs
to be resumed upon exit to make sure it gets properly re-initialized.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

PCI: pciehp: Do not handle events if interrupts are masked

PCIe native hotplug shares MSI vector with native PME so the interrupt
handler might get called even the hotplug interrupt is masked. In that case
we should not handle any events because the interrupt was not meant for us.

Modify the PCIe hotplug interrupt handler to check this accordingly and
bail out if it finds out that the interrupt was not about hotplug.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>

PCI: pciehp: Disable hotplug interrupt during suspend

When PCIe hotplug port is transitioned into D3hot, the link to the
downstream component will go down. If hotplug interrupt generation is
enabled when that happens, it will trigger immediately, waking up the
system and bringing the link back up.

To prevent this, disable hotplug interrupt generation when system suspend
is entered. This does not prevent wakeup from low power states according
to PCIe 4.0 spec section 6.7.3.4:

  Software enables a hot-plug event to generate a wakeup event by
  enabling software notification of the event as described in Section
  6.7.3.1. Note that in order for software to disable interrupt generation
  while keeping wakeup generation enabled, the Hot-Plug Interrupt Enable
  bit must be cleared.

So as long as we have set the slot event mask accordingly, wakeup should
work even if slot interrupt is disabled. The port should trigger wake and
then send PME to the root port when the PCIe hierarchy is brought back up.

Limit this to systems using native PME mechanism to make sure older Apple
systems depending on commit e3354628c376 ("PCI: pciehp: Support interrupts
sent from D3hot") still continue working.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PCI / ACPI: Enable wake automatically for power managed bridges

We enable power management automatically for bridges where
pci_bridge_d3_possible() returns true. However, these bridges may have
ACPI methods such as _DSW that need to be called before D3 entry. For
example in Lenovo Thinkpad X1 Carbon 6th _DSW method is used to prepare
D3cold for the PCIe root port hosting Thunderbolt chain. Because wake is
not enabled _DSW method is never called and the port does not enter
D3cold properly consuming more power than necessary.

Users can work this around by writing "enabled" to "wakeup" sysfs file
under the device in question but that is not something an ordinary user
is expected to do.

Since we already automatically enable power management for PCIe ports
with ->bridge_d3 set extend that to enable wake for them as well,
assuming the port has any ACPI wakeup related objects implemented in the
namespace (adev->wakeup.flags.valid is true). This ensures the necessary
ACPI methods get called at appropriate times and allows the root port in
Thinkpad X1 Carbon 6th to go into D3cold.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PCI: Do not skip power-managed bridges in pci_enable_wake()

Commit baecc470d5fd ("PCI / PM: Skip bridges in pci_enable_wake()") changed
pci_enable_wake() so that all bridges are skipped when wakeup is enabled
(or disabled) with the reasoning that bridges can only signal wakeup on
behalf of their subordinate devices.

However, there are bridges that can signal wakeup themselves. For example
PCIe downstream and root ports supporting hotplug may signal wakeup upon
hotplug event.

For this reason change pci_enable_wake() so that it skips all bridges
except those that we power manage (->bridge_d3 is set). Those are the ones
that can go into low power states and may need to signal wakeup.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PCI: Make link active reporting detection generic

The spec has timing requirements when waiting for a link to become active
after a conventional reset. Implement those hard delays when waiting for
an active link so pciehp and dpc drivers don't need to duplicate this.

For devices that don't support data link layer active reporting, wait the
fixed time recommended by the PCIe spec.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI: Unify device inaccessible

Bring surprise removals and permanent failures together so we no longer
need separate flags. The implementation enforces that error handling will
not be able to override a surprise removal's permanent channel failure.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI/ERR: Always report current recovery status for udev

A device still participates in error recovery even if it doesn't have
the error callbacks.

Always provide the status for user event watchers.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI/ERR: Simplify broadcast callouts

There is no point in having a generic broadcast function if it needs to
have special cases for each callback it broadcasts.

Abstract the error broadcast to only the necessary information and removes
the now unnecessary helper to walk the bus.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI/ERR: Run error recovery callbacks for all affected devices

If an Endpoint reported an error with ERR_FATAL, we previously ran driver
error recovery callbacks only for the Endpoint's driver.  But if we reset a
Link to recover from the error, all downstream components are affected,
including the Endpoint, any multi-function peers, and children of those
peers.

Initiate the Link reset from the deepest Downstream Port that is
reliable, and call the error recovery callbacks for all its children.

If a Downstream Port (including a Root Port) reports an error, we assume
the Port itself is reliable and we need to reset its downstream Link.  In
all other cases (Switch Upstream Ports, Endpoints, Bridges, etc), we assume
the Link leading to the component needs to be reset, so we initiate the
reset at the parent Downstream Port.

This allows two other clean-ups.  First, we currently only use a Link
reset, which can only be initiated using a Downstream Port, so we can
remove checks for Endpoints.  Second, the Downstream Port where we initiate
the Link reset is reliable (unlike components downstream from it), so the
special cases for error detect and resume are no longer necessary.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI/ERR: Handle fatal error recovery

We don't need to be paranoid about the topology changing while handling an
error. If the device has changed in a hotplug capable slot, we can rely on
the presence detection handling to react to a changing topology.

Restore the fatal error handling behavior that existed before merging DPC
with AER with 7e9084b36740 ("PCI/AER: Handle ERR_FATAL with removal and
re-enumeration of devices").

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI/ERR: Use slot reset if available

The secondary bus reset may have link side effects that a hotplug capable
port may incorrectly react to. Use the slot specific reset for hotplug
ports, fixing the undesirable link down-up handling during error
recovering.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: fold in
https://lore.kernel.org/linux-pci/20180926152326.14821-1-keith.busch@intel.com
for issue reported by Stephen Rothwell <sfr@canb.auug.org.au>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI/AER: Don't read upstream ports below fatal errors

The AER driver has never read the config space of an endpoint that reported
a fatal error because the link to that device is considered unreliable.

An ERR_FATAL from an upstream port almost certainly indicates an error on
its upstream link, so we can't expect to reliably read its config space for
the same reason.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI/AER: Take reference on error devices

Error handling may be running in parallel with a hot removal. Reference
count the device during AER handling so the device can not be freed while
AER wants to reference it.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI/DPC: Save and restore config state

This patch provides DPC save and restore capabilities. This is necessary
for the driver to observe DPC events in the event the configuration space
needs to be restored after a reset.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI: portdrv: Restore PCI config state on slot reset

The port's config space may be cleared after a link reset, which wipes out
the bridge's bus and memory windows. Restore the config space that was
saved during probe so we can access downstream devices.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI: portdrv: Initialize service drivers directly

The PCI port driver saves the PCI state after initializing the device with
the applicable service devices.  This was, however, before the service
drivers were even registered because PCI probe happens before the
device_initcall initialized those service drivers.  The config space state
that the services set up were not being saved.  The end result would cause
PCI devices to not react to events that the drivers think they did if the
PCI state ever needed to be restored.

Fix this by changing the service drivers from using the init calls to
having the portdrv driver calling the services directly.  This will get the
state saved as desired, while making the relationship between the port
driver and the services under it more explicit in the code.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

PCI: hotplug: Document TODOs

While refactoring the PCI hotplug core's API, I noticed a significant
amount of technical debt in some of the hotplug drivers.  Document the
issues that caught my eye for starters.

I do not have hardware at my disposal that utilizes the listed drivers
and I think that's a prerequisite to work on them to ensure that no
regressions sneak in.  But some of this hardware is so old that it may be
hard to come by.  Obviously, it is fine to support old hardware, but the
drivers need to be maintained.

If noone steps up, perhaps we should consider sunsetting a few drivers
by moving them to staging.  Based on my findings, ibmphp would be the
first candidate.  I've found it fairly difficult to apply my API
refactorings to it and have listed some obvious bugs in the driver.
cpqphp is also in need of a modernization and would be a second
candidate for relegation to staging.

shpchp was introduced in the same commit as pciehp but hasn't benefited
from the same amount of refactoring due to the decline of conventional
PCI's relevance.  Yet hardware supporting it may be more prevalent than
for the proprietary hotplug methods.

Per Documentation/process/2.Process.rst, "a TODO file should be present"
for drivers in staging.  The file introduced by the present commit may
serve as a basis for this.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Dan Zink <dan.zink@hpe.com>
Cc: Prarit Bhargava <prarit@redhat.com>

PCI: hotplug: Embed hotplug_slot

When the PCI hotplug core and its first user, cpqphp, were introduced in
February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot
struct for its internal use plus a hotplug_slot struct to be registered
with the hotplug core and linked the two with pointers:
https://git.kernel.org/tglx/history/c/a8a2069f432c

Nowadays, the predominant pattern in the tree is to embed ("subclass")
such structures in one another and cast to the containing struct with
container_of().  But it wasn't until July 2002 that container_of() was
introduced with historic commit ec4f214232cf:
https://git.kernel.org/tglx/history/c/ec4f214232cf

pnv_php, introduced in 2016, did the right thing and embedded struct
hotplug_slot in its internal struct pnv_php_slot, but all other drivers
cargo-culted cpqphp's design and linked separate structs with pointers.

Embedding structs is preferrable to linking them with pointers because
it requires fewer allocations, thereby reducing overhead and simplifying
error paths.  Casting an embedded struct to the containing struct
becomes a cheap subtraction rather than a dereference.  And having fewer
pointers reduces the risk of them pointing nowhere either accidentally
or due to an attack.

Convert all drivers to embed struct hotplug_slot in their internal slot
struct.  The "private" pointer in struct hotplug_slot thereby becomes
unused, so drop it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa*
Acked-by: Sebastian Ott <sebott@linux.ibm.com> # drivers/pci/hotplug/s390*
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>

PCI: hotplug: Drop hotplug_slot_info

Ever since the PCI hotplug core was introduced in 2002, drivers had to
allocate and register a struct hotplug_slot_info for every slot:
https://git.kernel.org/tglx/history/c/a8a2069f432c

Apparently the idea was that drivers furnish the hotplug core with an
up-to-date card presence status, power status, latch status and
attention indicator status as well as notify the hotplug core of changes
thereof.  However only 4 out of 12 hotplug drivers bother to notify the
hotplug core with pci_hp_change_slot_info() and the hotplug core never
made any use of the information:  There is just a single macro in
pci_hotplug_core.c, GET_STATUS(), which uses the hotplug_slot_info if
the driver lacks the corresponding callback in hotplug_slot_ops.  The
macro is called when the user reads the attribute via sysfs.

Now, if the callback isn't defined, the attribute isn't exposed in sysfs
in the first place (see e.g. has_power_file()).  There are only two
situations when the hotplug_slot_info would actually be accessed:

* If the driver defines ->enable_slot or ->disable_slot but not
  ->get_power_status.

* If the driver defines ->set_attention_status but not
  ->get_attention_status.

There is no driver doing the former and just a single driver doing the
latter, namely pnv_php.c.  Amend it with a ->get_attention_status
callback.  With that, the hotplug_slot_info becomes completely unused by
the PCI hotplug core.  But a few drivers use it internally as a cache:

cpcihp uses it to cache the latch_status and adapter_status.
cpqhp uses it to cache the adapter_status.
pnv_php and rpaphp use it to cache the attention_status.
shpchp uses it to cache all four values.

Amend these drivers to cache the information in their private slot
struct.  shpchp's slot struct already contains members to cache the
power_status and adapter_status, so additional members are only needed
for the other two values.  In the case of cpqphp, the cached value is
only accessed in a single place, so instead of caching it, read the
current value from the hardware.

Caution:  acpiphp, cpci, cpqhp, shpchp, asus-wmi and eeepc-laptop
populate the hotplug_slot_info with initial values on probe.  That code
is herewith removed.  There is a theoretical chance that the code has
side effects without which the driver fails to function, e.g. if the
ACPI method to read the adapter status needs to be executed at least
once on probe.  That seems unlikely to me, still maintainers should
review the changes carefully for this possibility.

Rafael adds: "I'm not aware of any case in which it will break anything,
[...] but if that happens, it may be necessary to add the execution of
the control methods in question directly to the initialization part."

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa*
Acked-by: Sebastian Ott <sebott@linux.ibm.com> # drivers/pci/hotplug/s390*
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>

PCI: hotplug: Constify hotplug_slot_ops

Hotplug drivers cannot declare their hotplug_slot_ops const, making them
attractive targets for attackers, because upon registration of a hotplug
slot, __pci_hp_initialize() writes to the "owner" and "mod_name" members
in that struct.

Fix by moving these members to struct hotplug_slot and constify every
driver's hotplug_slot_ops except for pciehp.

pciehp constructs its hotplug_slot_ops at runtime based on the PCIe
port's capabilities, hence cannot declare them const. It can be
converted to __write_rarely once that's mainlined:
http://www.openwall.com/lists/kernel-hardening/2016/11/16/3

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa*
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>

PCI: pciehp: Reshuffle controller struct for clarity

The members in pciehp's controller struct are arranged in a seemingly
arbitrary order and have grown to an amount that I no longer consider
easily graspable by contributors.

Sort the members into 5 rubrics:
* Slot Capabilities register and quirks
* Slot Control register access
* Slot Status register event handling
* state machine
* hotplug core interface

Obviously, this is just my personal bikeshed color and if anyone has a
better idea, please come forward. Any ordering will do as long as the
information is presented in a manageable manner.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

PCI: pciehp: Rename controller struct members for clarity

Of the members which were just moved from pciehp's slot struct to the
controller struct, rename "lock" to "state_lock" and rename "work" to
"button_work" for clarity. Perform the rename separately to the
unification of the two structs per Sinan's request.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sinan Kaya <okaya@kernel.org>

PCI: pciehp: Unify controller and slot structs

pciehp was originally introduced together with shpchp in a single
commit, c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI Express hot-plug
drivers"):
https://git.kernel.org/tglx/history/c/c16b4b14d980

shpchp supports up to 31 slots per controller, hence uses separate slot
and controller structs. pciehp has a 1:1 relationship between slot and
controller and therefore never required this separation. Nevertheless,
because much of the code had been copy-pasted between the two drivers,
pciehp likewise uses separate structs to this very day.

The artificial separation of data structures adds unnecessary complexity
and bloat to pciehp and requires constantly chasing pointers at runtime.

Simplify the driver by merging struct slot into struct controller.
Merge the slot constructor pcie_init_slot() and the destructor
pcie_cleanup_slot() into the controller counterparts.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

PCI: pciehp: Tolerate Presence Detect hardwired to zero

The WiGig Bus Extension (WBE) specification allows tunneling PCIe over
IEEE 802.11.  A product implementing this spec is the wil6210 from
Wilocity (now part of Qualcomm Atheros).  It integrates a PCIe switch
with a wireless network adapter:

  00.0-+              [1ae9:0101]  Upstream Port
       +-00.0-+       [1ae9:0200]  Downstream Port
       |      +-00.0  [168c:0034]  Atheros AR9462 Wireless Network Adapter
       +-02.0         [1ae9:0201]  Downstream Port
       +-03.0         [1ae9:0201]  Downstream Port

Wirelessly attached devices presumably appear below the hotplug ports
with device ID [1ae9:0201].  Oddly, the Downstream Port [1ae9:0200]
leading to the wireless network adapter is likewise Hotplug Capable,
but has its Presence Detect State bit hardwired to zero.  Even if the
Link Active bit is set, Presence Detect is zero, so this cannot be
caused by in-band presence detection but only by broken hardware.

pciehp assumes an empty slot if Presence Detect State is zero,
regardless of Link Active being one.  Consequently, up until v4.18 it
removes the wireless network adapter in pciehp_resume().  From v4.19 it
already does so in pciehp_probe().

Be lenient towards broken hardware and assume the slot is occupied if
Link Active is set:  Introduce pciehp_card_present_or_link_active()
and use it in lieu of pciehp_get_adapter_status() everywhere, except
in pciehp_handle_presence_or_link_change() whose log messages depend
on which of Presence Detect State or Link Active is set.

Remove the Presence Detect State check from __pciehp_enable_slot()
because it is only called if either of Presence Detect State or Link
Active is set.

Caution: There is a possibility that broken hardware exists which has
working Presence Detect but hardwires Link Active to one.  On such
hardware the slot will now incorrectly be considered always occupied.
If such hardware is discovered, this commit can be rolled back and a
quirk can be added which sets is_hotplug_bridge = 0 for [1ae9:0200].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200839
Reported-and-tested-by: David Yang <mmyangfl@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rajat Jain <rajatja@google.com>
Cc: Ashok Raj <ashok.raj@intel.com>

PCI: pciehp: Drop hotplug_slot_ops wrappers

pciehp's ->enable_slot, ->disable_slot, ->get_attention_status and
->reset_slot callbacks are currently implemented by wrapper functions
that do nothing else but call down to a backend function. The backends
are not called from anywhere else, so drop the wrappers and use the
backends directly as callbacks, thereby shaving off a few lines of
unnecessary code.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

PCI: pciehp: Drop unnecessary includes

Drop the following includes from pciehp source files which no longer use
any of the included symbols:

* <linux/sched/signal.h> in pciehp.h
  <linux/signal.h> in pciehp_hpc.c
  Added by commit de25968cc87c ("fix more missing includes") to
  accommodate for a call to signal_pending().
  The call was removed by commit 262303fe329a ("pciehp: fix wait command
  completion").

* <linux/interrupt.h> in pciehp_core.c
  Added by historic commit f308a2dfbe63 ("PCI: add PCI Express Port Bus
  Driver subsystem") to accommodate for a call to free_irq():
  https://git.kernel.org/tglx/history/c/f308a2dfbe63
  The call was removed by commit 407f452b05f9 ("pciehp: remove
  unnecessary free_irq").

* <linux/time.h> in pciehp_core.c and pciehp_hpc.c
  Added by commit 34d03419f03b ("PCIEHP: Add Electro Mechanical
  Interlock (EMI) support to the PCIE hotplug driver."),
  which was reverted by commit bd3d99c17039 ("PCI: Remove untested
  Electromechanical Interlock (EMI) support in pciehp.").

* <linux/module.h> in pciehp_ctrl.c, pciehp_hpc.c and pciehp_pci.c
  Added by historic commit c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI
  Express hot-plug drivers"):
  https://git.kernel.org/tglx/history/c/c16b4b14d980
  Module-related symbols were neither used back then in those files,
  nor are they used today.

* <linux/slab.h> in pciehp_ctrl.c
  Added by commit 5a0e3ad6af86 ("include cleanup: Update gfp.h and
  slab.h includes to prepare for breaking implicit slab.h inclusion from
  percpu.h") to accommodate for calls to kmalloc().
  The calls were removed by commit 0e94916e6091 ("PCI: pciehp: Handle
  events synchronously").

* "../pci.h" in pciehp_ctrl.c
  Added by historic commit 67f4660b72f2 ("PCI: ASPM patch for") to
  accommodate for usage of the global variable pcie_mch_quirk:
  https://git.kernel.org/tglx/history/c/67f4660b72f2
  The global variable was removed by commit 0ba379ec0fb1 ("PCI: Simplify
  hotplug mch quirk").

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

PCI: pciehp: Differentiate between surprise and safe removal

When removing PCI devices below a hotplug bridge, pciehp marks them as
disconnected if the card is no longer present in the slot or it quiesces
them if the card is still present (by disabling INTx interrupts, bus
mastering and SERR# reporting).

To detect whether the card is still present, pciehp checks the Presence
Detect State bit in the Slot Status register.  The problem with this
approach is that even if the card is present, the link to it may be
down, and it that case it would be better to mark the devices as
disconnected instead of trying to quiesce them.  Moreover, if the card
in the slot was quickly replaced by another one, the Presence Detect
State bit would be set, yet trying to quiesce the new card's devices
would be wrong and the correct thing to do is to mark the previous
card's devices as disconnected.

Instead of looking at the Presence Detect State bit, it is better to
differentiate whether the card was surprise removed versus safely
removed (via sysfs or an Attention Button press).  On surprise removal,
the devices should be marked as disconnected, whereas on safe removal it
is correct to quiesce the devices.

The knowledge whether a surprise removal or a safe removal is at hand
does exist further up in the call stack:  A surprise removal is
initiated by pciehp_handle_presence_or_link_change(), a safe removal by
pciehp_handle_disable_request().

Pass that information down to pciehp_unconfigure_device() and use it in
lieu of the Presence Detect State bit.  While there, add kernel-doc to
pciehp_unconfigure_device() and pciehp_configure_device().

Tested-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <keith.busch@intel.com>

PCI: Simplify disconnected marking

Commit 89ee9f768003 ("PCI: Add device disconnected state") iterates over
the devices on a parent bus, marks each as disconnected, then marks
each device's children as disconnected using pci_walk_bus().

The same can be achieved more succinctly by calling pci_walk_bus() on
the parent bus.  Moreover, this does not need to wait until acquiring
pci_lock_rescan_remove(), so move it out of that critical section.

The critical section in err.c contains a pci_dev_get() / pci_dev_put()
pair which was apparently copy-pasted from pciehp_pci.c.  In the latter
it serves the purpose of holding the struct pci_dev in place until the
Command register is updated.  err.c doesn't do anything like that, hence
the pair is unnecessary.  Remove it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Oza Pawandeep <poza@codeaurora.org>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Linux 4.19-rc4

Code of Conduct: Let's revamp it.

The Code of Conflict is not achieving its implicit goal of fostering
civility and the spirit of 'be excellent to each other'. Explicit
guidelines have demonstrated success in other projects and other areas
of the kernel.

Here is a Code of Conduct statement for the wider kernel. It is based
on the Contributor Covenant as described at www.contributor-covenant.org

From this point forward, we should abide by these rules in order to help
make the kernel community a welcoming environment to participate in.

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Olof Johansson <olof@lxom.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingol Molnar:
"Misc fixes:

   - EFI crash fix

   - Xen PV fixes

   - do not allow PTI on 2-level 32-bit kernels for now

   - documentation fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/APM: Fix build warning when PROC_FS is not enabled
  Revert "x86/mm/legacy: Populate the user page-table with user pgd's"
  x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3
  x86/xen: Disable CPU0 hotplug for Xen PV
  x86/EISA: Don't probe EISA bus for Xen PV guests
  x86/doc: Fix Documentation/x86/earlyprintk.txt

Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Misc fixes: various scheduler metrics corner case fixes, a
  sched_features deadlock fix, and a topology fix for certain NUMA
  systems"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix kernel-doc notation warning
  sched/fair: Fix load_balance redo for !imbalance
  sched/fair: Fix scale_rt_capacity() for SMT
  sched/fair: Fix vruntime_normalized() for remote non-migration wakeup
  sched/pelt: Fix update_blocked_averages() for RT and DL classes
  sched/topology: Set correct NUMA topology type
  sched/debug: Fix potential deadlock when writing to sched_features

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also breakpoint and x86 PMU driver fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  perf tools: Fix maps__find_symbol_by_name()
  tools headers uapi: Update tools's copy of linux/if_link.h
  tools headers uapi: Update tools's copy of linux/vhost.h
  tools headers uapi: Update tools's copies of kvm headers
  tools headers uapi: Update tools's copy of drm/drm.h
  tools headers uapi: Update tools's copy of asm-generic/unistd.h
  tools headers uapi: Update tools's copy of linux/perf_event.h
  perf/core: Force USER_DS when recording user stack data
  perf/UAPI: Clearly mark __PERF_SAMPLE_CALLCHAIN_EARLY as internal use
  perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs
  perf annotate: Fix parsing aarch64 branch instructions after objdump update
  perf probe powerpc: Ignore SyS symbols irrespective of endianness
  perf event-parse: Use fixed size string for comms
  perf util: Fix bad memory access in trace info.
  perf tools: Streamline bpf examples and headers installation
  perf evsel: Fix potential null pointer dereference in perf_evsel__new_idx()
  perf arm64: Fix include path for asm-generic/unistd.h
  perf/hw_breakpoint: Simplify breakpoint enable in perf_event_modify_breakpoint
  perf/hw_breakpoint: Enable breakpoint in modify_user_hw_breakpoint
  perf/hw_breakpoint: Remove superfluous bp->attr.disabled = 0
  ...

Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:
"Misc fixes: liblockdep fixes and ww_mutex fixes"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/ww_mutex: Fix spelling mistake "cylic" -> "cyclic"
  locking/lockdep: Delete unnecessary #include
  tools/lib/lockdep: Add dummy task_struct state member
  tools/lib/lockdep: Add empty nmi.h
  tools/lib/lockdep: Update Sasha Levin email to MSFT
  jump_label: Fix typo in warning message
  locking/mutex: Fix mutex debug call and ww_mutex documentation

x86/APM: Fix build warning when PROC_FS is not enabled

Fix build warning in apm_32.c when CONFIG_PROC_FS is not enabled:

../arch/x86/kernel/apm_32.c:1643:12: warning: 'proc_apm_show' defined but not used [-Wunused-function]
static int proc_apm_show(struct seq_file *m, void *v)

Fixes: 3f3942aca6da ("proc: introduce proc_create_single{,_data}")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jiri Kosina <jikos@kernel.org>
Link: https://lkml.kernel.org/r/be39ac12-44c2-4715-247f-4dcc3c525b8b@infradead.org

Merge tag '4.19-rc3-smb3-cifs' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Fixes for four CIFS/SMB3 potential pointer overflow issues, one minor
  build fix, and a build warning cleanup"

* tag '4.19-rc3-smb3-cifs' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: read overflow in is_valid_oplock_break()
  cifs: integer overflow in in SMB2_ioctl()
  CIFS: fix wrapping bugs in num_entries()
  cifs: prevent integer overflow in nxt_dir_entry()
  fs/cifs: require sha512
  fs/cifs: suppress a string overflow warning

Merge tag 'nfs-for-4.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
"These are a handful of fixes for problems that Trond found. Patch #1
  and #3 have the same name, a second issue was found after applying the
  first patch.

  Stable bugfixes:
   - v4.17+: Fix tracepoint Oops in initiate_file_draining()
   - v4.11+: Fix an infinite loop on I/O

  Other fixes:
   - Return errors if a waiting layoutget is killed
   - Don't open code clearing of delegation state"

* tag 'nfs-for-4.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Don't open code clearing of delegation state
  NFSv4.1 fix infinite loop on I/O.
  NFSv4: Fix a tracepoint Oops in initiate_file_draining()
  pNFS: Ensure we return the error if someone kills a waiting layoutget
  NFSv4: Fix a tracepoint Oops in initiate_file_draining()

Merge tag 'trace-v4.19-rc3' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"This fixes an issue with the build system caused by a change that
  modifies CC_FLAGS_FTRACE. The issue is that it breaks the dependencies
  and causes "make targz-pkg" to rebuild the entire world"

* tag 'trace-v4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/Makefile: Fix handling redefinition of CC_FLAGS_FTRACE

Merge tag 'devicetree-fixes-for-4.19-2' of git://git./linux/kernel/git/robh/linux

Pull DeviceTree fix from Rob Herring:
"One regression for a 20 year old PowerMac:

   - Fix a regression on systems having a DT without any phandles which
     happens on a PowerMac G3"

* tag 'devicetree-fixes-for-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: fix phandle cache creation for DTs with no phandles

Merge tag 'for-linus-4.19c-rc4-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"This contains some minor cleanups and fixes:

   - a new knob for controlling scrubbing of pages returned by the Xen
     balloon driver to the Xen hypervisor to address a boot performance
     issue seen in large guests booted pre-ballooned

   - a fix of a regression in the gntdev driver which made it impossible
     to use fully virtualized guests (HVM guests) with a 4.19 based dom0

   - a fix in Xen cpu hotplug functionality which could be triggered by
     wrong admin commands (setting number of active vcpus to 0)

  One further note: the patches have all been under test for several
  days in another branch. This branch has been rebased in order to avoid
  merge conflicts"

* tag 'for-linus-4.19c-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/gntdev: fix up blockable calls to mn_invl_range_start
  xen: fix GCC warning and remove duplicate EVTCHN_ROW/EVTCHN_COL usage
  xen: avoid crash in disable_hotplug_cpu
  xen/balloon: add runtime control for scrubbing ballooned out pages
  xen/manage: don't complain about an empty value in control/sysrq node

Merge tag 'xtensa-20180914' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa fixes and cleanups from Max Filippov:

- don't allocate memory in platform_setup as the memory allocator is
   not initialized at that point yet;

- remove unnecessary ifeq KBUILD_SRC from arch/xtensa/Makefile;

- enable SG chaining in arch/xtensa/Kconfig.

* tag 'xtensa-20180914' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: enable SG chaining in Kconfig
  xtensa: remove unnecessary KBUILD_SRC ifeq conditional
  xtensa: ISS: don't allocate memory in platform_setup

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"The trickle of arm64 fixes continues to come in.

  Nothing that's the end of the world, but we've got a fix for PCI IO
  port accesses, an accidental naked "asm goto" and a fix to the
  vmcoreinfo PT_NOTE merged this time around which we'd like to get
  sorted before it becomes ABI.

   - Fix ioport_map() mapping the wrong physical address for some I/O
     BARs

   - Remove direct use of "asm goto", since some compilers don't like
     that

   - Ensure kimage_voffset is always present in vmcoreinfo PT_NOTE"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  asm-generic: io: Fix ioport_map() for !CONFIG_GENERIC_IOMAP && CONFIG_INDIRECT_PIO
  arm64: kernel: arch_crash_save_vmcoreinfo() should depend on CONFIG_CRASH_CORE
  arm64: jump_label.h: use asm_volatile_goto macro instead of "asm goto"

NFS: Don't open code clearing of delegation state

Add a helper for the case when the nfs4 open state has been set to use
a delegation stateid, and we want to revert to using the open stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4.1 fix infinite loop on I/O.

The previous fix broke recovery of delegated stateids because it assumes
that if we did not mark the delegation as suspect, then the delegation has
effectively been revoked, and so it removes that delegation irrespectively
of whether or not it is valid and still in use. While this is "mostly
harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning
in an infinite loop while complaining that we're using an invalid stateid
(in this case the all-zero stateid).

What we rather want to do here is ensure that the delegation is always
correctly marked as needing testing when that is the case. So we want
to close the loophole offered by nfs4_schedule_stateid_recovery(),
which marks the state as needing to be reclaimed, but not the
delegation that may be backing it.

Fixes: 0e3d3e5df07dc ("NFSv4.1 fix infinite loop on IO BAD_STATEID error")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4: Fix a tracepoint Oops in initiate_file_draining()

Now that the value of 'ino' can be NULL or an ERR_PTR(), we need to
change the test in the tracepoint.

Fixes: ce5624f7e6675 ("NFSv4: Return NFS4ERR_DELAY when a layout fails...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

pNFS: Ensure we return the error if someone kills a waiting layoutget

If someone interrupts a wait on one or more outstanding layoutgets in
pnfs_update_layout() then return the ERESTARTSYS/EINTR error.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4: Fix a tracepoint Oops in initiate_file_draining()

Now that the value of 'ino' can be NULL or an ERR_PTR(), we need to
change the test in the tracepoint.

Fixes: ce5624f7e6675 ("NFSv4: Return NFS4ERR_DELAY when a layout fails...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Merge tag 'dmaengine-fix-4.19-rc4' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fix from Vinod Koul:
"Fix the mic_x100_dma driver to use devm_kzalloc for driver memory, so
  that it is freed properly when it unregisters from dmaengine using
  managed API"

* tag 'dmaengine-fix-4.19-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: mic_x100_dma: use devm_kzalloc to fix an issue

Merge tag 'usb-4.19-rc4' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for -rc4.

  The usual suspects of gadget, xhci, and dwc2/3 are in here, along with
  some reverts of reported problem changes, and a number of build
  documentation warning fixes. Full details are in the shortlog.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
  Revert "cdc-acm: implement put_char() and flush_chars()"
  usb: Change usb_of_get_companion_dev() place to usb/common
  usb: xhci: fix interrupt transfer error happened on MTK platforms
  usb: cdc-wdm: Fix a sleep-in-atomic-context bug in service_outstanding_interrupt()
  usb: misc: uss720: Fix two sleep-in-atomic-context bugs
  usb: host: u132-hcd: Fix a sleep-in-atomic-context bug in u132_get_frame()
  usb: Avoid use-after-free by flushing endpoints early in usb_set_interface()
  linux/mod_devicetable.h: fix kernel-doc missing notation for typec_device_id
  usb/typec: fix kernel-doc notation warning for typec_match_altmode
  usb: Don't die twice if PCI xhci host is not responding in resume
  usb: mtu3: fix error of xhci port id when enable U3 dual role
  usb: uas: add support for more quirk flags
  USB: Add quirk to support DJI CineSSD
  usb: typec: fix kernel-doc parameter warning
  usb/dwc3/gadget: fix kernel-doc parameter warning
  USB: yurex: Check for truncation in yurex_read()
  USB: yurex: Fix buffer over-read in yurex_write()
  usb: host: xhci-plat: Iterate over parent nodes for finding quirks
  xhci: Fix use after free for URB cancellation on a reallocated endpoint
  USB: add quirk for WORLDE Controller KS49 or Prodipe MIDI 49C USB controller
  ...

Merge tag 'tty-4.19-rc4' of git://git./linux/kernel/git/gregkh/tty

Pull tty fixes from Greg KH:
"Here are three small HVC tty driver fixes to resolve a reported
  regression from 4.19-rc1.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: hvc: hvc_write() fix break condition
  tty: hvc: hvc_poll() fix read loop batching
  tty: hvc: hvc_poll() fix read loop hang

Merge tag 'staging-4.19-rc4' of git://git./linux/kernel/git/gregkh/staging

Pull staging/IIO driver fixes from Greg KH:
"Here are a few small staging and iio driver fixes for -rc4.

  Nothing major, just a few small bugfixes for some reported issues, and
  a MAINTAINERS file update for the fbtft drivers.

  We also re-enable the building of the erofs filesystem as the XArray
  patches that were causing it to break never got merged in the -rc1
  cycle, so there's no reason it can't be turned back on for now. The
  problem that was previously there is now being handled in the Xarray
  tree at the moment, so it will not hit us again in the future.

  All of these patches have been in linux-next with no reported issues"

* tag 'staging-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: vboxvideo: Change address of scanout buffer on page-flip
  staging: vboxvideo: Fix IRQs no longer working
  staging: gasket: TODO: re-implement using UIO
  staging/fbtft: Update TODO and mailing lists
  staging: erofs: rename superblock flags (MS_xyz -> SB_xyz)
  iio: imu: st_lsm6dsx: take into account ts samples in wm configuration
  Revert "iio: temperature: maxim_thermocouple: add MAX31856 part"
  Revert "staging: erofs: disable compiling temporarile"
  MAINTAINERS: Switch a maintainer for drivers/staging/gasket
  staging: wilc1000: revert "fix TODO to compile spi and sdio components in single module"

Merge tag 'char-misc-4.19-rc4' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are a small handful of char/misc driver fixes for 4.19-rc4.

  All of them are simple, resolving reported problems in a few drivers.
  Full details are in the shortlog.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  firmware: Fix security issue with request_firmware_into_buf()
  vmbus: don't return values for uninitalized channels
  fpga: dfl: fme: fix return value check in in pr_mgmt_init()
  misc: hmc6352: fix potential Spectre v1
  Tools: hv: Fix a bug in the key delete code
  misc: ibmvsm: Fix wrong assignment of return code
  android: binder: fix the race mmap and alloc_new_buf_locked
  mei: bus: need to unlink client before freeing
  mei: bus: fix hw module get/put balance
  mei: fix use-after-free in mei_cl_write
  mei: ignore not found client in the enumeration

Revert "x86/mm/legacy: Populate the user page-table with user pgd's"

This reverts commit 1f40a46cf47c12d93a5ad9dccd82bd36ff8f956a.

It turned out that this patch is not sufficient to enable PTI on 32 bit
systems with legacy 2-level page-tables. In this paging mode the huge-page
PTEs are in the top-level page-table directory, where also the mirroring to
the user-space page-table happens. So every huge PTE exits twice, in the
kernel and in the user page-table.

That means that accessed/dirty bits need to be fetched from two PTEs in
this mode to be safe, but this is not trivial to implement because it needs
changes to generic code just for the sake of enabling PTI with 32-bit
legacy paging. As all systems that need PTI should support PAE anyway,
remove support for PTI when 32-bit legacy paging is used.

Fixes: 7757d607c6b3 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org

xen/gntdev: fix up blockable calls to mn_invl_range_start

Patch series "mmu_notifiers follow ups".

Tetsuo has noticed some fallouts from 93065ac753e4 ("mm, oom: distinguish
blockable mode for mmu notifiers").  One of them has been fixed and picked
up by AMD/DRM maintainer [1].  XEN issue is fixed by patch 1.  I have also
clarified expectations about blockable semantic of invalidate_range_end.
Finally the last patch removes MMU_INVALIDATE_DOES_NOT_BLOCK which is no
longer used nor needed.

[1] http://lkml.kernel.org/r/20180824135257.GU29735@dhcp22.suse.cz

This patch (of 3):

93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") has
introduced blockable parameter to all mmu_notifiers and the notifier has
to back off when called in !blockable case and it could block down the
road.

The above commit implemented that for mn_invl_range_start but both
in_range checks are done unconditionally regardless of the blockable mode
and as such they would fail all the time for regular calls.  Fix this by
checking blockable parameter as well.

Once we are there we can remove the stale TODO.  The lock has to be
sleepable because we wait for completion down in gnttab_unmap_refs_sync.

Link: http://lkml.kernel.org/r/20180827112623.8992-2-mhocko@kernel.org
Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

xen: fix GCC warning and remove duplicate EVTCHN_ROW/EVTCHN_COL usage

This patch removes duplicate macro useage in events_base.c.

It also fixes gcc warning:
variable ‘col’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Joshua Abraham <j.abraham1776@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

xen: avoid crash in disable_hotplug_cpu

The command 'xl vcpu-set 0 0', issued in dom0, will crash dom0:

BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 65 Comm: xenwatch Not tainted 4.19.0-rc2-1.ga9462db-default #1 openSUSE Tumbleweed (unreleased)
Hardware name: Intel Corporation S5520UR/S5520UR, BIOS S5500.86B.01.00.0050.050620101605 05/06/2010
RIP: e030:device_offline+0x9/0xb0
Code: 77 24 00 e9 ce fe ff ff 48 8b 13 e9 68 ff ff ff 48 8b 13 e9 29 ff ff ff 48 8b 13 e9 ea fe ff ff 90 66 66 66 66 90 41 54 55 53 <f6> 87 d8 02 00 00 01 0f 85 88 00 00 00 48 c7 c2 20 09 60 81 31 f6
RSP: e02b:ffffc90040f27e80 EFLAGS: 00010203
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8801f3800000 RSI: ffffc90040f27e70 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff820e47b3 R09: 0000000000000000
R10: 0000000000007ff0 R11: 0000000000000000 R12: ffffffff822e6d30
R13: dead000000000200 R14: dead000000000100 R15: ffffffff8158b4e0
FS: 00007ffa595158c0(0000) GS:ffff8801f39c0000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002d8 CR3: 00000001d9602000 CR4: 0000000000002660
Call Trace:
handle_vcpu_hotplug_event+0xb5/0xc0
xenwatch_thread+0x80/0x140
? wait_woken+0x80/0x80
kthread+0x112/0x130
? kthread_create_worker_on_cpu+0x40/0x40
ret_from_fork+0x3a/0x50

This happens because handle_vcpu_hotplug_event is called twice. In the
first iteration cpu_present is still true, in the second iteration
cpu_present is false which causes get_cpu_device to return NULL.
In case of cpu#0, cpu_online is apparently always true.

Fix this crash by checking if the cpu can be hotplugged, which is false
for a cpu that was just removed.

Also check if the cpu was actually offlined by device_remove, otherwise
leave the cpu_present state as it is.

Rearrange to code to do all work with device_hotplug_lock held.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

xen/balloon: add runtime control for scrubbing ballooned out pages

Scrubbing pages on initial balloon down can take some time, especially
in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
started with memory= significantly lower than maxmem=, all the extra
pages will be scrubbed before returning to Xen. But since most of them
weren't used at all at that point, Xen needs to populate them first
(from populate-on-demand pool). In nested virt case (Xen inside KVM)
this slows down the guest boot by 15-30s with just 1.5GB needed to be
returned to Xen.

Add runtime parameter to enable/disable it, to allow initially disabling
scrubbing, then enable it back during boot (for example in initramfs).
Such usage relies on assumption that a) most pages ballooned out during
initial boot weren't used at all, and b) even if they were, very few
secrets are in the guest at that time (before any serious userspace
kicks in).
Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
enabled by default), controlling default value for the new runtime
switch.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

xen/manage: don't complain about an empty value in control/sysrq node

When guest receives a sysrq request from the host it acknowledges it by
writing '\0' to control/sysrq xenstore node. This, however, make xenstore
watch fire again but xenbus_scanf() fails to parse empty value with "%c"
format string:

sysrq: SysRq : Emergency Sync
Emergency Sync complete
xen:manage: Error -34 reading sysrq code in control/sysrq

Ignore -ERANGE the same way we already ignore -ENOENT, empty value in
control/sysrq is totally legal.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

asm-generic: io: Fix ioport_map() for !CONFIG_GENERIC_IOMAP && CONFIG_INDIRECT_PIO

The !CONFIG_GENERIC_IOMAP version of ioport_map uses MMIO_UPPER_LIMIT to
prevent users from making I/O accesses outside the expected I/O range -
however it erroneously treats MMIO_UPPER_LIMIT as a mask which is
contradictory to its other users.

The introduction of CONFIG_INDIRECT_PIO, which subtracts an arbitrary
amount from IO_SPACE_LIMIT to form MMIO_UPPER_LIMIT, results in ioport_map
mangling the given port rather than capping it.

We address this by aligning more closely with the CONFIG_GENERIC_IOMAP
implementation of ioport_map by using the comparison operator and
returning NULL where the port exceeds MMIO_UPPER_LIMIT. Though note that
we preserve the existing behavior of masking with IO_SPACE_LIMIT such that
we don't break existing buggy drivers that somehow rely on this masking.

Fixes: 5745392e0c2b ("PCI: Apply the new generic I/O management on PCI IO hosts")
Reported-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

Merge tag 'printk-for-4.19-rc4' of git://git./linux/kernel/git/pmladek/printk

Pull printk fix from Petr Mladek:
"Revert a commit that caused "quiet", "debug", and "loglevel" early
parameters to be ignored for early boot messages"

* tag 'printk-for-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
Revert "printk: make sure to print log on console."

Merge tag 'ovl-fixes-4.19-rc4' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
"This fixes a regression in the recent file stacking update, reported
  and fixed by Amir Goldstein. The fix is fairly trivial, but involves
  adding a fadvise() f_op and the associated churn in the vfs. As
  discussed on -fsdevel, there are other possible uses for this method,
  than allowing proper stacking for overlays.

  And there's one other fix for a syzkaller detected oops"

* tag 'ovl-fixes-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix oopses in ovl_fill_super() failure paths
  ovl: add ovl_fadvise()
  vfs: implement readahead(2) using POSIX_FADV_WILLNEED
  vfs: add the fadvise() file operation
  Documentation/filesystems: update documentation of file_operations
  ovl: fix GPF in swapfile_activate of file from overlayfs over xfs
  ovl: respect FIEMAP_FLAG_SYNC flag

Merge tag 'for-linus-20180913' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Three fixes that should go into this series. This contains:

   - Increase number of policies supported by blk-cgroup.

     With blk-iolatency, we now have four in kernel, but we had a hard
     limit of three...

   - Fix regression in null_blk, where the zoned supported broke
     queue_mode=0 (bio based).

   - NVMe pull request, with a single fix for an issue in the rdma code"

* tag 'for-linus-20180913' of git://git.kernel.dk/linux-block:
  null_blk: fix zoned support for non-rq based operation
  blk-cgroup: increase number of supported policies
  nvmet-rdma: fix possible bogus dereference under heavy load

Merge tag 'for-4.19/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- DM verity fix for crash due to using vmalloc'd buffers with the
   asynchronous crypto hadsh API.

- Fix to both DM crypt and DM integrity targets to discontinue using
   CRYPTO_TFM_REQ_MAY_SLEEP because its use of GFP_KERNEL can lead to
   deadlock by recursing back into a filesystem.

- Various DM raid fixes related to reshape and rebuild races.

- Fix for DM thin-provisioning to avoid data corruption that was a
   side-effect of needing to abort DM thin metadata transaction due to
   running out of metadata space. Fix is to reserve a small amount of
   metadata space so that once it is used the DM thin-pool can finish
   its active transaction before switching to read-only mode.

* tag 'for-4.19/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm thin metadata: try to avoid ever aborting transactions
  dm raid: bump target version, update comments and documentation
  dm raid: fix RAID leg rebuild errors
  dm raid: fix rebuild of specific devices by updating superblock
  dm raid: fix stripe adding reshape deadlock
  dm raid: fix reshape race on small devices
  dm: disable CRYPTO_TFM_REQ_MAY_SLEEP to fix a GFP_KERNEL recursion deadlock
  dm verity: fix crash on bufio buffer that was allocated with vmalloc

Merge tag 'drm-fixes-2018-09-14' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"This is the general drm fixes pull for rc4.

  i915:
   - Two GVT fixes (one for the mm reference issue you pointed out)
   - Gen 2 video playback fix
   - IPS timeout error suppression on Broadwell

  amdgpu:
   - Small memory leak
   - SR-IOV reset
   - locking fix
   - updated SDMA golden registers

  nouveau:
   - Remove some leftover debugging"

* tag 'drm-fixes-2018-09-14' of git://anongit.freedesktop.org/drm/drm:
  drm/nouveau/devinit: fix warning when PMU/PRE_OS is missing
  drm/amdgpu: fix error handling in amdgpu_cs_user_fence_chunk
  drm/i915/overlay: Allocate physical registers from stolen
  drm/amdgpu: move PSP init prior to IH in gpu reset
  drm/amdgpu: Fix SDMA hang in prt mode v2
  drm/amdgpu: fix amdgpu_mn_unlock() in the CS error path
  drm/i915/bdw: Increase IPS disable timeout to 100ms
  drm/i915/gvt: Fix the incorrect length of child_device_config issue
  drm/i915/gvt: Fix life cycle reference on KVM mm

Merge tag 'pstore-v4.19-rc4' of git://git./linux/kernel/git/kees/linux

Pull pstore fix from Kees Cook:
"This fixes a 6 year old pstore bug that everyone just got lucky in
  avoiding, likely due only using page-aligned persistent ram regions:

   - Handle page-vs-byte offset handling between iomap and vmap (Bin Yang)"

* tag 'pstore-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore: Fix incorrect persistent ram buffer mapping

Merge tag 'mmc-v4.19-rc2' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC host fixes from Ulf Hansson:

- meson-mx-sdio: Fix OF child-node lookup

- omap_hsmmc: Fix wakeirq handling on removal

* tag 'mmc-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: meson-mx-sdio: fix OF child-node lookup
mmc: omap_hsmmc: fix wakeirq handling on removal

Merge tag 'pinctrl-v4.19-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

- A complicated IRQ fix for the MSM driver (see commit)

- Fix the group/function check in the Ingenic driver

- Deal with a possible NULL pointer dereference in the Madera driver

* tag 'pinctrl-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: madera: Fix possible NULL pointer with pdata config
  pinctrl: ingenic: Fix group & function error checking
  pinctrl: msm: Really mask level interrupts to prevent latching

Merge branch 'for-4.19-fixes' of git://git./linux/kernel/git/tj/percpu

Pull percpu maintainership update from Tejun Heo:
"This updates the MAINTAINERS file to transfer the percpu tree
  maintainership to Dennis Zhou.

  Dennis rewrote a good portion of the percpu allocator, knows most of
  percpu related code, is already listed as a co-maintainer, has been
  reliable, and now sits right behind me. I'll keep reviewing and
  involved with percpu stuff and am sure that Dennis will soon make a
  better maintainer than I ever was"

* 'for-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  MAINTAINERS: Make Dennis the percpu tree maintainer

Merge branch 'for-linus' of git://git./linux/kernel/git/rkuo/linux-hexagon-kernel

Pull hexagon fixes from Richard Kuo:
"Some fixes for compile warnings"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel:
hexagon: modify ffs() and fls() to return int
arch/hexagon: fix kernel/dma.c build warning

Merge tag 's390-4.19-3' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:

- One fix for the zcrypt driver to correctly handle incomplete
   encryption/decryption operations.

- A cleanup for the aqmask/apmask parsing to avoid variable length
   arrays on the stack.

* tag 's390-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: remove VLA usage from the AP bus
  s390/crypto: Fix return code checking in cbc_paes_crypt()

mm: get rid of vmacache_flush_all() entirely

Jann Horn points out that the vmacache_flush_all() function is not only
potentially expensive, it's buggy too.  It also happens to be entirely
unnecessary, because the sequence number overflow case can be avoided by
simply making the sequence number be 64-bit.  That doesn't even grow the
data structures in question, because the other adjacent fields are
already 64-bit.

So simplify the whole thing by just making the sequence number overflow
case go away entirely, which gets rid of all the complications and makes
the code faster too.  Win-win.

[ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics
  also just goes away entirely with this ]

Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Will Deacon <will.deacon@arm.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'linux-4.19' of git://github.com/skeggsb/linux into drm-fixes

One more nouveau fix to remove some debug warnings.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <bskeggs@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CABDvA==GF63dy8a9j611=-0x8G6FRu7uC-ZQypsLO_hqV4OAcA@mail.gmail.com

Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

A few fixes for 4.19:
- Fix a small memory leak
- SR-IOV reset fix
- Fix locking in MMU-notifier error path
- Updated SDMA golden settings to fix a PRT hang

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180912154735.2683-1-alexander.deucher@amd.com

Merge tag 'drm-intel-fixes-2018-09-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

This contains a regression fix for video playbacks on gen 2 hardware,
a IPS timeout error suppression on Broadwell and GVT bucked with
"Most critical one is to fix KVM's mm reference when we access guest memory,
issue was raised by Linus [1], and another one with virtual opregion fix."

[1] - https://lists.freedesktop.org/archives/intel-gvt-dev/2018-August/004130.html

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180911223229.GA30328@intel.com

MAINTAINERS: Make Dennis the percpu tree maintainer

Dennis rewrote a significant portion of the percpu allocator and has
shown that he can respond in a timely and helpful manner when issues
are reported against percpu allocator.

Let's make Dennis the percpu tree maintainer.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Christoph Lameter <cl@linux.com>

pstore: Fix incorrect persistent ram buffer mapping

persistent_ram_vmap() returns the page start vaddr.
persistent_ram_iomap() supports non-page-aligned mapping.

persistent_ram_buffer_map() always adds offset-in-page to the vaddr
returned from these two functions, which causes incorrect mapping of
non-page-aligned persistent ram buffer.

By default ftrace_size is 4096 and max_ftrace_cnt is nr_cpu_ids. Without
this patch, the zone_sz in ramoops_init_przs() is 4096/nr_cpu_ids which
might not be page aligned. If the offset-in-page > 2048, the vaddr will be
in next page. If the next page is not mapped, it will cause kernel panic:

[    0.074231] BUG: unable to handle kernel paging request at ffffa19e0081b000
...
[    0.075000] RIP: 0010:persistent_ram_new+0x1f8/0x39f
...
[    0.075000] Call Trace:
[    0.075000]  ramoops_init_przs.part.10.constprop.15+0x105/0x260
[    0.075000]  ramoops_probe+0x232/0x3a0
[    0.075000]  platform_drv_probe+0x3e/0xa0
[    0.075000]  driver_probe_device+0x2cd/0x400
[    0.075000]  __driver_attach+0xe4/0x110
[    0.075000]  ? driver_probe_device+0x400/0x400
[    0.075000]  bus_for_each_dev+0x70/0xa0
[    0.075000]  driver_attach+0x1e/0x20
[    0.075000]  bus_add_driver+0x159/0x230
[    0.075000]  ? do_early_param+0x95/0x95
[    0.075000]  driver_register+0x70/0xc0
[    0.075000]  ? init_pstore_fs+0x4d/0x4d
[    0.075000]  __platform_driver_register+0x36/0x40
[    0.075000]  ramoops_init+0x12f/0x131
[    0.075000]  do_one_initcall+0x4d/0x12c
[    0.075000]  ? do_early_param+0x95/0x95
[    0.075000]  kernel_init_freeable+0x19b/0x222
[    0.075000]  ? rest_init+0xbb/0xbb
[    0.075000]  kernel_init+0xe/0xfc
[    0.075000]  ret_from_fork+0x3a/0x50

Signed-off-by: Bin Yang <bin.yang@intel.com>
[kees: add comments describing the mapping differences, updated commit log]
Fixes: 24c3d2f342ed ("staging: android: persistent_ram: Make it possible to use memory outside of bootmem")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>

Merge tag 'pci-v4.19-fixes-1' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

- Add Tyrel Datwyler as maintainer for PPC64 RPA hotplug (Tyrel
   Datwyler)

- Add Gustavo Pimentel as DesignWare PCI maintainer (Joao Pinto)

- Fix a Switchtec Spectre v1 vulnerability (Gustavo A. R. Silva)

- Revert an unnecessary Intel 300 ACS quirk (Mika Westerberg)

- Fix pciehp hot-add/powerfault detection that left indicators in wrong
   state (Keith Busch)

- Fix pci_reset_bus() logic error (Dennis Dalessandro)

- Revert IB/hfi1 PCI reset change that caused a deadlock (Dennis
   Dalessandro)

- Allow enabling PASID on Root Complex Integrated Endpoints (Felix
   Kuehling)

* tag 'pci-v4.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix enabling of PASID on RC integrated endpoints
  IB/hfi1,PCI: Allow bus reset while probing
  PCI: Fix faulty logic in pci_reset_bus()
  PCI: pciehp: Fix hot-add vs powerfault detection order
  switchtec: Fix Spectre v1 vulnerability
  Revert "PCI: Add ACS quirk for Intel 300 series"
  MAINTAINERS: Add Gustavo Pimentel as DesignWare PCI maintainer
  MAINTAINERS: Add entries for PPC64 RPA PCI hotplug drivers

Merge tag 'for-linus-4.19' of git://github.com/cminyard/linux-ipmi

Pull IPMI bugfixes from Corey Minyard:
"A few fixes that came around or after the merge window, except for
  commit cd2315d471f4 ("ipmi: kcs_bmc: don't change device name") which
  is for a driver that very few people use, and those people need the
  change"

* tag 'for-linus-4.19' of git://github.com/cminyard/linux-ipmi:
  ipmi: Fix NULL pointer dereference in ssif_probe
  ipmi: Fix I2C client removal in the SSIF driver
  ipmi: Move BT capabilities detection to the detect call
  ipmi: Rework SMI registration failure
  ipmi: kcs_bmc: don't change device name

Merge tag 'drm-fixes-2018-09-12' of git://anongit.freedesktop.org/drm/drm

Pull drm nouveau fixes from Dave Airlie:
"I'm sending this separately as it's a bit larger than I generally like
  for one driver, but it does contain a bunch of make my nvidia laptop
  not die (runpm) and a bunch to make my docking station and monitor
  display stuff (mst) fixes.

  Lyude has spent a lot of time on these, and we are putting the fixes
  into distro kernels as well asap, as it helps a bunch of standard
  Lenovo laptops, so I'm fairly happy things are better than they were
  before these patches, but I decided to split them out just for
  clarification"

* tag 'drm-fixes-2018-09-12' of git://anongit.freedesktop.org/drm/drm:
  drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP panels
  drm/nouveau/disp: fix DP disable race
  drm/nouveau/disp: move eDP panel power handling
  drm/nouveau/disp: remove unused struct member
  drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOS
  drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointer
  drm/nouveau: fix oops in client init failure path
  drm/nouveau: Fix nouveau_connector_ddc_detect()
  drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unload
  drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early
  drm/nouveau: Reset MST branching unit before enabling
  drm/nouveau: Only write DP_MSTM_CTRL when needed
  drm/nouveau: Remove useless poll_enable() call in drm_load()
  drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()
  drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()
  drm/nouveau: Fix deadlocks in nouveau_connector_detect()
  drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
  drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests
  drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()
  drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix up several Kconfig dependencies in netfilter, from Martin Willi
    and Florian Westphal.

2) Memory leak in be2net driver, from Petr Oros.

3) Memory leak in E-Switch handling of mlx5 driver, from Raed Salem.

4) mlx5_attach_interface needs to check for errors, from Huy Nguyen.

5) tipc_release() needs to orphan the sock, from Cong Wang.

6) Need to program TxConfig register after TX/RX is enabled in r8169
    driver, not beforehand, from Maciej S. Szmigiero.

7) Handle 64K PAGE_SIZE properly in ena driver, from Netanel Belgazal.

8) Fix crash regression in ip_do_fragment(), from Taehee Yoo.

9) syzbot can create conditions where kernel log is flooded with
    synflood warnings due to creation of many listening sockets, fix
    that. From Willem de Bruijn.

10) Fix RCU issues in rds socket layer, from Cong Wang.

11) Fix vlan matching in nfp driver, from Pieter Jansen van Vuuren.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
  nfp: flower: reject tunnel encap with ipv6 outer headers for offloading
  nfp: flower: fix vlan match by checking both vlan id and vlan pcp
  tipc: check return value of __tipc_dump_start()
  s390/qeth: don't dump past end of unknown HW header
  s390/qeth: use vzalloc for QUERY OAT buffer
  s390/qeth: switch on SG by default for IQD devices
  s390/qeth: indicate error when netdev allocation fails
  rds: fix two RCU related problems
  r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED
  erspan: fix error handling for erspan tunnel
  erspan: return PACKET_REJECT when the appropriate tunnel is not found
  tcp: rate limit synflood warnings further
  MIPS: lantiq: dma: add dev pointer
  netfilter: xt_hashlimit: use s->file instead of s->private
  netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT
  netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to type
  netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT
  netfilter: conntrack: reset tcp maxwin on re-register
  qmi_wwan: Support dynamic config on Quectel EP06
  ethernet: renesas: convert to SPDX identifiers
  ...

drm/nouveau/devinit: fix warning when PMU/PRE_OS is missing

Messed up when sending pull request and sent an outdated version of
previous patch, this fixes it up to remove warnings.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

null_blk: fix zoned support for non-rq based operation

The supported added for zones in null_blk seem to assume that only rq
based operation is possible. But this depends on the queue_mode setting,
if this is set to 0, then cmd->bio is what we need to be operating on.
Right now any attempt to load null_blk with queue_mode=0 will
insta-crash, since cmd->rq is NULL and null_handle_cmd() assumes it to
always be set.

Make the zoned code deal with bio's instead, or pass in the
appropriate sector/nr_sectors instead.

Fixes: ca4b2a011948 ("null_blk: add zone support")
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cifs: read overflow in is_valid_oplock_break()

We need to verify that the "data_offset" is within bounds.

Reported-by: Dr Silvio Cesare of InfoSect <silvio.cesare@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

Merge branch 'nfp-flower-fixes'

Jakub Kicinski says:

====================
nfp: flower: fixes for flower offload

Two fixes for flower matching and tunnel encap. Pieter fixes
VLAN matching if the entire VLAN id is masked out and match
is only performed on the PCP field. Louis adds validation of
tunnel flags for encap, most importantly we should not offload
actions on IPv6 tunnels if it's not supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: reject tunnel encap with ipv6 outer headers for offloading

This fixes a bug where ipv6 tunnels would report that it is
getting offloaded to hardware but would actually be rejected
by hardware.

Fixes: b27d6a95a70d ("nfp: compile flower vxlan tunnel set actions")
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: fix vlan match by checking both vlan id and vlan pcp

Previously we only checked if the vlan id field is present when trying
to match a vlan tag. The vlan id and vlan pcp field should be treated
independently.

Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: check return value of __tipc_dump_start()

When __tipc_dump_start() fails with running out of memory,
we have no reason to continue, especially we should avoid
calling tipc_dump_done().

Fixes: 8f5c5fcf3533 ("tipc: call start and done ops directly in __tipc_nl_compat_dumpit()")
Reported-and-tested-by: syzbot+3f8324abccfbf8c74a9f@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qeth-fixes'

Julian Wiedmann says:

====================
s390/qeth: fixes 2018-09-12

please apply the following qeth fixes for -net.

Patch 1 resolves a regression in an error path, while patch 2 enables
the SG support by default that was newly introduced with 4.19.
Patch 3 takes care of a longstanding problem with large-order
allocations, and patch 4 fixes a potential out-of-bounds access.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: don't dump past end of unknown HW header

For inbound data with an unsupported HW header format, only dump the
actual HW header. We have no idea how much payload follows it, and what
it contains. Worst case, we dump past the end of the Inbound Buffer and
access whatever is located next in memory.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: use vzalloc for QUERY OAT buffer

qeth_query_oat_command() currently allocates the kernel buffer for
the SIOC_QETH_QUERY_OAT ioctl with kzalloc. So on systems with
fragmented memory, large allocations may fail (eg. the qethqoat tool by
default uses 132KB).

Solve this issue by using vzalloc, backing the allocation with
non-contiguous memory.

Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: switch on SG by default for IQD devices

Scatter-gather transmit brings a nice performance boost. Considering the
rather large MTU sizes at play, it's also totally the Right Thing To Do.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: indicate error when netdev allocation fails

Bailing out on allocation error is nice, but we also need to tell the
ccwgroup core that creating the qeth groupdev failed.

Fixes: d3d1b205e89f ("s390/qeth: allocate netdevice early")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3

Commit eeb89e2bb1ac ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()")
moved loading the fixmap in efi_call_phys_epilog() after load_cr3() since
it was assumed to be more logical.

Turns out this is incorrect: In efi_call_phys_prolog(), the gdt with its
physical address is loaded first, and when the %cr3 is reloaded in _epilog
from initial_page_table to swapper_pg_dir again the gdt is no longer
mapped. This results in a triple fault if an interrupt occurs after
load_cr3() and before load_fixmap_gdt(0). Calling load_fixmap_gdt(0) first
restores the execution order prior to commit eeb89e2bb1ac and fixes the
problem.

Fixes: eeb89e2bb1ac ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: linux-efi@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1536689892-21538-1-git-send-email-linux@roeck-us.net