review.tizen.org Git - platform/kernel/linux-rpi.git/log

Merge tag 'arch-timer-errata' of git://git./linux/kernel/git/maz/arm-platforms into clockevents/4.12

arm64 arch timer workaround series, including the base patches
that will also go via the arm64 tree.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

clocksource/drivers/fttmr010: Refactor to handle clock

The plain Faraday FTTMR010 timer needs a clock to figure out its
tick rate, and the gemini reads it directly from the system
controller set-up. Split the init function and add two paths for
the two compatible-strings. We only support clocking using PCLK
because of lack of documentation on how EXTCLK works.

The Gemini still works like before, but we can also support a
generic, clock-based version.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

clocksource/drivers/gemini: Rename Gemini timer to Faraday

After some research it turns out that the "Gemini" timer is
actually a generic IP block from Faraday Technology named
FTTMR010, so as to not make things too confusing we need to
rename the driver and its symbols to make sense.

The implementation remains the same in this patch but we fix
the copy-paste error in the timer name "nomadik_mtu" as we're
at it.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

clocksource: Augment bindings for Faraday timer

It turns out that the Cortina Gemini timer block is just a
standard IP block from Faraday Technology named FTTMR010.

In order to make things clear and understandable, we rename the
bindings with a Faraday compatible as primary and the Cortina
gemini as a more specific case.

For the plain Faraday timer we require two clock references,
while the Gemini can keep it's syscon lookup pattern.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>

ARM: dts: rockchip: disable arm-global-timer for rk3188

The clocksource and the sched_clock provided by the arm_global_timer
are quite unstable because their rates depend on the cpu frequency.

On the other side, the arm_global_timer has a higher rating than the
rockchip_timer, it will be selected by default by the time framework
while we want to use the stable rockchip clocksource.

Let's disable the arm_global_timer in order to have the rockchip
clocksource selected by default.

Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>

ARM: dts: rockchip: Add timer entries to rk3188 SoC

The patch add two timers to all rk3188 based boards.

The first timer is from alive subsystem and it act as a backup
for the local timers at sleep time. It act the same as other
SoC rockchip timers already present in kernel.

The second timer is from CPU subsystem and act as replacement
for the arm-global-timer clocksource and sched clock. It run
at stable frequency 24MHz.

Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>

clocksource/drivers/rockchip_timer: Implement clocksource timer

The clock supplying the arm-global-timer on the rk3188 is coming from the
the cpu clock itself and thus changes its rate everytime cpufreq adjusts
the cpu frequency making this timer unsuitable as a stable clocksource
and sched clock.

The rk3188, rk3288 and following socs share a separate timer block already
handled by the rockchip-timer driver. Therefore adapt this driver to also
be able to act as clocksource and sched clock on rk3188.

In order to test clocksource you can run following commands and check
how much time it take in real. On rk3188 it take about ~45 seconds.

cpufreq-set -f 1.6GHZ
date; sleep 60; date

In order to use the patch you need to declare two timers in the dts
file. The first timer will be initialized as clockevent provider
and the second one as clocksource. The clockevent must be from
alive subsystem as it used as backup for the local timers at sleep
time.

The patch does not break compatibility with older device tree files.
The older device tree files contain only one timer. The timer
will be initialized as clockevent, as expected.

rk3288 (and probably anything newer) is irrelevant to this patch,
as it has the arch timer interface. This patch may be useful
for Cortex-A9/A5 based parts.

Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

ARM: dts: rockchip: Update compatible property for rk322x timer

Property set to '"rockchip,rk3228-timer", "rockchip,rk3288-timer"'
to match devicetree bindings.

Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Suggested-by: Heiko Stübner <heiko@sntech.de>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

dt-bindings: Clarify compatible property for rockchip timers

Make all properties description in form '"rockchip,<chip>-timer",
"rockchip,rk3288-timer"' for all chips found in linux kernel.

Suggested-by: Heiko Stübner <heiko@sntech.de>
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

clocksource: Add missing line break to error messages

Printing with pr_* functions requires adding line break manually.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

clocksource/drivers/orion: Add delay_timer implementation

Add an implementation for the ARM delay timer, which is used for
udelay(). This provides less CPU dependent and more accurate delays -
the CPU loop on Marvell Dove appears to calibrate to around 6% too
short.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

clocksource/drivers/orion: Read clock rate once

Rather than reading the clock rate three times, read it once - we are
about to add a fourth usage.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

arm64: arch_timer: Add HISILICON_ERRATUM_161010101 ACPI matching data

In order to deal with ACPI enabled platforms suffering from the
HISILICON_ERRATUM_161010101, let's add the required OEM data that
allow the workaround to be enabled.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: dann frazier <dann.frazier@canonical.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Allow erratum matching with ACPI OEM information

Just as we're able to identify a broken platform using some DT
information, let's enable a way to spot the offenders with ACPI.

The difference is that we can only match on some OEM info instead
of implementation-specific properties. So in order to avoid the
insane multiplication of errata structures, we allow an array
of OEM descriptions to be attached to an erratum structure.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: dann frazier <dann.frazier@canonical.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Workaround for Cortex-A73 erratum 858921

Cortex-A73 (all versions) counter read can return a wrong value
when the counter crosses a 32bit boundary.

The workaround involves performing the read twice, and to return
one or the other depending on whether a transition has taken place.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Enable CNTVCT_EL0 trap if workaround is enabled

Userspace being allowed to use read CNTVCT_EL0 anytime (and not
only in the VDSO), we need to enable trapping whenever a cntvct
workaround is enabled on a given CPU.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Save cntkctl_el1 as a per-cpu variable

As we're about to allow per CPU cntkctl_el1 configuration, we cannot
rely on the register value to be common when performing power
management.

Let's turn saved_cntkctl into a per-cpu variable.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Move clocksource_counter and co around

In order to access clocksource_counter from the errata handling code,
move it (together with the related structures and functions) towards
the top of the file.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs

Instead of applying a CPU-specific workaround to all CPUs in the system,
allow it to only affect a subset of them (typical big-little case).

This is done by turning the erratum pointer into a per-CPU variable.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Make workaround methods optional

Not all errata need to workaround all access types. Allow them to
be optional.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Rework the set_next_event workarounds

The way we work around errata affecting set_next_event is not very
nice, at it imposes this workaround on errata that do not need it.

Add new workaround hooks and let the existing workarounds use them.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Get rid of erratum_workaround_set_sne

Let's move the handling of workarounds affecting set_next_event
to the affected function, instead of overriding the pointers
as an afterthough. Yes, this is an extra indirection on the
erratum handling path, but the HW is busted anyway.

This will allow for some more flexibility later.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Move arch_timer_reg_read/write around

As we're about to move things around, let's start with the low
level read/write functions. This allows us to use these functions
in the errata handling code without having to use forward declaration
of static functions.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Add erratum handler for CPU-specific capability

Should we ever have a workaround for an erratum that is detected using
a capability and affecting a particular CPU, it'd be nice to have
a way to probe them directly.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: arch_timer: Add infrastructure for multiple erratum detection methods

We're currently stuck with DT when it comes to handling errata, which
is pretty restrictive. In order to make things more flexible, let's
introduce an infrastructure that could support alternative discovery
methods. No change in functionality.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: cpu_errata: Add capability to advertise Cortex-A73 erratum 858921

In order to work around Cortex-A73 erratum 858921 in a subsequent
patch, add the required capability that advertise the erratum.

As the configuration option it depends on is not present yet,
this has no immediate effect.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: cpu_errata: Allow an erratum to be match for all revisions of a core

Some minor erratum may not be fixed in further revisions of a core,
leading to a situation where the workaround needs to be updated each
time an updated core is released.

Introduce a MIDR_ALL_VERSIONS match helper that will work for all
versions of that MIDR, once and for all.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: Define Cortex-A73 MIDR

As we're about to introduce a new workaround that is specific to
Cortex-A73, let's define the coresponding MIDR.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: Add CNTVCT_EL0 trap handler

Since people seem to make a point in breaking the userspace visible
counter, we have no choice but to trap the access. Add the required
handler.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: Allow checking of a CPU-local erratum

this_cpu_has_cap() only checks the feature array, and not the errata
one. In order to be able to check for a CPU-local erratum, allow it
to inspect the latter as well.

This is consistent with cpus_have_cap()'s behaviour, which includes
errata already.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

timekeeping: Remove pointless conversion to bool

interp_forward is type bool so assignment from a logical operation directly
is sufficient.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1490382215-30505-1-git-send-email-der.herr@hofr.at
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merge branch 'fortglx/4.12/time' of https://git.linaro.org/people/john.stultz/linux into timers/core

Pull timekeeping changes from John Stultz:

Main changes are the initial steps of Nicoli's work to make the clockevent
timers be corrected for NTP adjustments. Then a few smaller fixes that
I've queued, and adding Stephen Boyd to the maintainers list for
timekeeping.

sysrq: Reset the watchdog timers while displaying high-resolution timers

On systems with a large number of CPUs, running sysrq-<q> can cause
watchdog timeouts.  There are two slow sections of code in the sysrq-<q>
path in timer_list.c.

1. print_active_timers() - This function is called by print_cpu() and
   contains a slow goto loop.  On a machine with hundreds of CPUs, this
   loop took approximately 100ms for the first CPU in a NUMA node.
   (Subsequent CPUs in the same node ran much quicker.)  The total time
   to print all of the CPUs is ultimately long enough to trigger the
   soft lockup watchdog.

2. print_tickdevice() - This function outputs a large amount of textual
   information.  This function also took approximately 100ms per CPU.

Since sysrq-<q> is not a performance critical path, there should be no
harm in touching the nmi watchdog in both slow sections above.  Touching
it in just one location was insufficient on systems with hundreds of
CPUs as occasional timeouts were still observed during testing.

This issue was observed on an Oracle T7 machine with 128 CPUs, but I
anticipate it may affect other systems with similarly large numbers of
CPUs.

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

timers, sched_clock: Update timeout for clock wrap

The scheduler clock framework may not use the correct timeout for the clock
wrap. This happens when a new clock driver calls sched_clock_register()
after the kernel called sched_clock_postinit(). In this case the clock wrap
timeout is too long thus sched_clock_poll() is called too late and the clock
already wrapped.

On my ARM system the scheduler was no longer scheduling any other task than
the idle task because the sched_clock() wrapped.

Signed-off-by: David Engraf <david.engraf@sysgo.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

MAINTAINERS: Add Stephen Boyd as timekeeping reviewer

After showing expertise and presenting on the timekeeping
subsystem at ELC[1], Stephen clearly should be included in
the maintainer list.

[1] https://www.youtube.com/watch?v=Puv4mW55bF8

Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>

clockevents: Make clockevents_config() static

A clockevent device's rate should be configured before or at registration
and changed afterwards through clockevents_update_freq() only.

For the configuration at registration, we already have
clockevents_config_and_register().

Right now, there are no clockevents_config() users outside of the
clockevents core.

To mitigiate the risk of drivers errorneously reconfiguring their rates
through clockevents_config() *after* device registration, make
clockevents_config() static.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

clocksource: h8300_timer8: Don't reset rate in ->set_state_oneshot()

With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, h8300_timer8 violates this requirement in that it registers its
clockevent device with the correct rate, but resets its ->mult and ->rate
values in timer8_clock_event_start(), called from its ->set_state_oneshot()
function.

It seems like
commit 4633f4cac85a ("clocksource/drivers/h8300: Cleanup startup and
remove module code."),
which introduced the rate initialization at registration, missed to remove
the manual setting of ->mult and ->shift from timer8_clock_event_start().

Purge the setting of ->mult, ->shift, ->min_delta_ns and ->max_delta_ns
from timer8_clock_event_start().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

clocksource: em_sti: Compute rate before registration

With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, em_sti violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final rate through
clockevents_config() called from its ->set_state_oneshot().

This patch moves the setting of the clockevent device's rate to its
registration.

I checked all current em_sti users in arch/arm/mach-shmobile and right now,
none of them changes any rate in any clock tree relevant to em_sti after
their respective time_init(). Since all em_sti instances are created after
time_init(), none of them should ever observe any clock rate changes.

- Determine the ->rate value in em_sti_probe() at device probing rather
than at first usage.
- Set the clockevent device's rate at its registration.
- Although not strictly necessary for the upcoming clockevent core changes,
set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

clocksource: em_sti: Split clock prepare and enable steps

Currently, the em_sti driver prepares and enables the needed clock in
em_sti_enable(), potentially called through its clockevent device's
->set_state_oneshot().

However, the clk_prepare() step may sleep whereas tick_program_event() and
thus, ->set_state_oneshot(), can be called in atomic context.

Split the clk_prepare_enable() in em_sti_enable() into two steps:
- prepare the clock at device probing via clk_prepare()
- and enable it in em_sti_enable() via clk_enable().
Slightly reorder resource initialization in em_sti_probe() in order to
facilitate error handling in later patches.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

clocksource: sh_tmu: Compute rate before registration again

With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, sh_tmu violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final rate through
clockevents_config() called from its ->set_state_oneshot() and
->set_state_periodic() functions respectively.

This patch moves the setting of the clockevent device's rate to its
registration.

Note that there has been some back and forth regarding this question with
respect to the clocksource also provided by this driver:
  commit 66f49121ffa4 ("clocksource: sh_tmu: compute mult and shift before
                        registration")
moves the rate determination from the clocksource's ->enable() function to
before its registration. OTOH, the later
  commit 0aeac458d9eb ("clocksource: sh_tmu: __clocksource_updatefreq_hz()
                        update")
basically reverts this, saying
  "Without this patch the old code uses clocksource_register() together
   with a hack that assumes a never changing clock rate."

However, I checked all current sh_tmu users in arch/sh as well as in
arch/arm/mach-shmobile carefully and right now, none of them changes any
rate in any clock tree relevant to sh_tmu after their respective
time_init(). Since all sh_tmu instances are created after time_init(), none
of them should ever observe any clock rate changes.

What's more, both, a clocksource as well as a clockevent device, can
immediately get selected for use at their registration and thus, enabled
at this point already. So it's probably safer to assume a "never changing
clock rate" here.

- Move the struct sh_tmu_channel's ->rate member to struct sh_tmu_device:
  it's a property of the underlying clock which is in turn specific to
  the sh_tmu_device.
- Determine the ->rate value in sh_tmu_setup() at device probing rather
  than at first usage.
- Set the clockevent device's rate at its registration.
- Although not strictly necessary for the upcoming clockevent core changes,
  set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

clocksource: sh_cmt: Compute rate before registration again

With the upcoming NTP correction related rate adjustments to be implemented
in the clockevents core, the latter needs to get informed about every rate
change of a clockevent device made after its registration.

Currently, sh_cmt violates this requirement in that it registers its
clockevent device with a dummy rate and sets its final ->mult and ->shift
values from its ->set_state_oneshot() and ->set_state_periodic() functions
respectively.

This patch moves the setting of the clockevent device's ->mult and ->shift
values to before its registration.

Note that there has been some back and forth regarding this question with
respect to the clocksource also provided by this driver:
  commit f4d7c3565c16 ("clocksource: sh_cmt: compute mult and shift before
                        registration")
moves the rate determination from the clocksource's ->enable() function to
before its registration. OTOH, the later
  commit 3593f5fe40a1 ("clocksource: sh_cmt: __clocksource_updatefreq_hz()
                        update")
basically reverts this, saying
  "Without this patch the old code uses clocksource_register() together
   with a hack that assumes a never changing clock rate."

However, I checked all current sh_cmt users in arch/sh as well as in
arch/arm/mach-shmobile carefully and right now, none of them changes any
rate in any clock tree relevant to sh_cmt after their respective
time_init(). Since all sh_cmt instances are created after time_init(), none
of them should ever observe any clock rate changes.

What's more, both, a clocksource as well as a clockevent device, can
immediately get selected for use at their registration and thus, enabled
at this point already. So it's probably safer to assume a "never changing
clock rate" here.

- Move the struct sh_cmt_channel's ->rate member to struct sh_cmt_device:
  it's a property of the underlying clock which is in turn specific to
  the sh_cmt_device.
- Determine the ->rate value in sh_cmt_setup() at device probing rather
  than at first usage.
- Set the clockevent device's ->mult and ->shift values right before its
  registration.
- Although not strictly necessary for the upcoming clockevent core changes,
  set the clocksource's rate at its registration for consistency.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

Linux 4.11-rc3

mm/swap: don't BUG_ON() due to uninitialized swap slot cache

This BUG_ON() triggered for me once at shutdown, and I don't see a
reason for the check. The code correctly checks whether the swap slot
cache is usable or not, so an uninitialized swap slot cache is not
actually problematic afaik.

I've temporarily just switched the BUG_ON() to a WARN_ON_ONCE(), since
I'm not sure why that seemingly pointless check was there. I suspect
the real fix is to just remove it entirely, but for now we'll warn about
it but not bring the machine down.

Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'powerpc-4.11-5' of git://git./linux/kernel/git/powerpc/linux

Pull more powerpc fixes from Michael Ellerman:
"A couple of minor powerpc fixes for 4.11:

   - wire up statx() syscall

   - don't print a warning on memory hotplug when HPT resizing isn't
     available

  Thanks to: David Gibson, Chandan Rajendra"

* tag 'powerpc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Don't give a warning when HPT resizing isn't available
  powerpc: Wire up statx() syscall

Merge branch 'parisc-4.11-2' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:

- Mikulas Patocka added support for R_PARISC_SECREL32 relocations in
   modules with CONFIG_MODVERSIONS.

- Dave Anglin optimized the cache flushing for vmap ranges.

- Arvind Yadav provided a fix for a potential NULL pointer dereference
   in the parisc perf code (and some code cleanups).

- I wired up the new statx system call, fixed some compiler warnings
   with the access_ok() macro and fixed shutdown code to really halt a
   system at shutdown instead of crashing & rebooting.

* 'parisc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix system shutdown halt
  parisc: perf: Fix potential NULL pointer dereference
  parisc: Avoid compiler warnings with access_ok()
  parisc: Wire up statx system call
  parisc: Optimize flush_kernel_vmap_range and invalidate_kernel_vmap_range
  parisc: support R_PARISC_SECREL32 relocation in modules

Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
"The bulk of the changes are in qla2xxx target driver code to address
  various issues found during Cavium/QLogic's internal testing (stable
  CC's included), along with a few other stability and smaller
  miscellaneous improvements.

  There are also a couple of different patch sets from Mike Christie,
  which have been a result of his work to use target-core ALUA logic
  together with tcm-user backend driver.

  Finally, a patch to address some long standing issues with
  pass-through SCSI export of TYPE_TAPE + TYPE_MEDIUM_CHANGER devices,
  which will make folks using physical (or virtual) magnetic tape happy"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (28 commits)
  qla2xxx: Update driver version to 9.00.00.00-k
  qla2xxx: Fix delayed response to command for loop mode/direct connect.
  qla2xxx: Change scsi host lookup method.
  qla2xxx: Add DebugFS node to display Port Database
  qla2xxx: Use IOCB interface to submit non-critical MBX.
  qla2xxx: Add async new target notification
  qla2xxx: Export DIF stats via debugfs
  qla2xxx: Improve T10-DIF/PI handling in driver.
  qla2xxx: Allow relogin to proceed if remote login did not finish
  qla2xxx: Fix sess_lock & hardware_lock lock order problem.
  qla2xxx: Fix inadequate lock protection for ABTS.
  qla2xxx: Fix request queue corruption.
  qla2xxx: Fix memory leak for abts processing
  qla2xxx: Allow vref count to timeout on vport delete.
  tcmu: Convert cmd_time_out into backend device attribute
  tcmu: make cmd timeout configurable
  tcmu: add helper to check if dev was configured
  target: fix race during implicit transition work flushes
  target: allow userspace to set state to transitioning
  target: fix ALUA transition timeout handling
  ...

Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull device-dax fixes from Dan Williams:
"The device-dax driver was not being careful to handle falling back to
  smaller fault-granularity sizes.

  The driver already fails fault attempts that are smaller than the
  device's alignment, but it also needs to handle the cases where a
  larger page mapping could be established. For simplicity of the
  immediate fix the implementation just signals VM_FAULT_FALLBACK until
  fault-size == device-alignment.

  One fix is for -stable to address pmd-to-pte fallback from the
  original implementation, another fix is for the new (introduced in
  4.11-rc1) pud-to-pmd regression, and a typo fix comes along for the
  ride.

  These have received a build success notification from the kbuild
  robot"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: fix debug output typo
  device-dax: fix pud fault fallback handling
  device-dax: fix pmd/pte fault fallback handling

qla2xxx: Update driver version to 9.00.00.00-k

Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Fix delayed response to command for loop mode/direct connect.

Current driver wait for FW to be in the ready state before
processing in-coming commands. For Arbitrated Loop or
Point-to- Point (not switch), FW Ready state can take a while.
FW will transition to ready state after all Nports have been
logged in. In the mean time, certain initiators have completed
the login and starts IO. Driver needs to start processing all
queues if FW is already started.

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Change scsi host lookup method.

For target mode, when new scsi command arrive, driver first performs
a look up of the SCSI Host. The current look up method is based on
the ALPA portion of the NPort ID. For Cisco switch, the ALPA can
not be used as the index. Instead, the new search method is based
on the full value of the Nport_ID via btree lib.

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Add DebugFS node to display Port Database

Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Use IOCB interface to submit non-critical MBX.

The Mailbox interface is currently over subscribed. We like
to reserve the Mailbox interface for the chip managment and
link initialization. Any non essential Mailbox command will
be routed through the IOCB interface. The IOCB interface is
able to absorb more commands.

Following commands are being routed through IOCB interface

- Get ID List (007Ch)
- Get Port DB (0064h)
- Get Link Priv Stats (006Dh)

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Add async new target notification

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Export DIF stats via debugfs

Signed-off-by: Anil Gurumurthy <anil.gurumurthy@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Improve T10-DIF/PI handling in driver.

Add routines to support T10 DIF tag.

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Anil Gurumurthy <anil.gurumurthy@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Allow relogin to proceed if remote login did not finish

If the remote port have started the login process, then the
PLOGI and PRLI should be back to back. Driver will allow
the remote port to complete the process. For the case where
the remote port decide to back off from sending PRLI, this
local port sets an expiration timer for the PRLI. Once the
expiration time passes, the relogin retry logic is allowed
to go through and perform login with the remote port.

Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Fix sess_lock & hardware_lock lock order problem.

The main lock that needs to be held for CMD or TMR submission
to upper layer is the sess_lock. The sess_lock is used to
serialize cmd submission and session deletion. The addition
of hardware_lock being held is not necessary. This patch removes
hardware_lock dependency from CMD/TMR submission.

Use hardware_lock only for error response in this case.

Path1
       CPU0                    CPU1
       ----                    ----
  lock(&(&ha->tgt.sess_lock)->rlock);
                               lock(&(&ha->hardware_lock)->rlock);
                               lock(&(&ha->tgt.sess_lock)->rlock);
  lock(&(&ha->hardware_lock)->rlock);

Path2/deadlock
*** DEADLOCK ***
Call Trace:
dump_stack+0x85/0xc2
print_circular_bug+0x1e3/0x250
__lock_acquire+0x1425/0x1620
lock_acquire+0xbf/0x210
_raw_spin_lock_irqsave+0x53/0x70
qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
process_one_work+0x1f4/0x6e0

Cc: <stable@vger.kernel.org>
Cc: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reported-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Fix inadequate lock protection for ABTS.

Normally, ABTS is sent to Target Core as Task MGMT command.
In the case of error, qla2xxx needs to send response, hardware_lock
is required to prevent request queue corruption.

Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Fix request queue corruption.

When FW notify driver or driver detects low FW resource,
driver tries to send out Busy SCSI Status to tell Initiator
side to back off. During the send process, the lock was not held.

Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Fix memory leak for abts processing

Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

qla2xxx: Allow vref count to timeout on vport delete.

Cc: <stable@vger.kernel.org>
Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcmu: Convert cmd_time_out into backend device attribute

Instead of putting cmd_time_out under ../target/core/user_0/foo/control,
which has historically been used by parameters needed for initial
backend device configuration, go ahead and move cmd_time_out into
a backend device attribute.

In order to do this, tcmu_module_init() has been updated to create
a local struct configfs_attribute **tcmu_attrs, that is based upon
the existing passthrough_attrib_attrs along with the new cmd_time_out
attribute. Once **tcm_attrs has been setup, go ahead and point
it at tcmu_ops->tb_dev_attrib_attrs so it's picked up by target-core.

Also following MNC's previous change, ->cmd_time_out is stored in
milliseconds but exposed via configfs in seconds. Also, note this
patch restricts the modification of ->cmd_time_out to before +
after the TCMU device has been configured, but not while it has
active fabric exports.

Cc: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcmu: make cmd timeout configurable

A single daemon could implement multiple types of devices
using multuple types of real devices that may not support
restarting from crashes and/or handling tcmu timeouts. This
makes the cmd timeout configurable, so handlers that do not
support it can turn if off for now.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcmu: add helper to check if dev was configured

This adds a helper to check if the dev was configured. It
will be used in the next patch to prevent updates to some
config settings after the device has been setup.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux

Pull OpenRISC fixes from Stafford Horne:
"OpenRISC fixes for build issues that were exposed by kbuild robots
  after 4.11 merge. All from allmodconfig builds. This includes:

   - bug in the handling of 8-byte get_user() calls

   - module build failure due to multile missing symbol exports"

* tag 'openrisc-for-linus' of git://github.com/openrisc/linux:
  openrisc: Export symbols needed by modules
  openrisc: fix issue handling 8 byte get_user calls
  openrisc: xchg: fix `computed is not used` warning

target: fix race during implicit transition work flushes

This fixes the following races:

1. core_alua_do_transition_tg_pt could have read
tg_pt_gp_alua_access_state and gone into this if chunk:

if (!explicit &&
atomic_read(&tg_pt_gp->tg_pt_gp_alua_access_state) ==
ALUA_ACCESS_STATE_TRANSITION) {

and then core_alua_do_transition_tg_pt_work could update the
state. core_alua_do_transition_tg_pt would then only set
tg_pt_gp_alua_pending_state and the tg_pt_gp_alua_access_state would
not get updated with the second calls state.

2. core_alua_do_transition_tg_pt could be setting
tg_pt_gp_transition_complete while the tg_pt_gp_transition_work
is already completing. core_alua_do_transition_tg_pt then waits on the
completion that will never be called.

To handle these issues, we just call flush_work which will return when
core_alua_do_transition_tg_pt_work has completed so there is no need
to do the complete/wait. And, if core_alua_do_transition_tg_pt_work
was running, instead of trying to sneak in the state change, we just
schedule up another core_alua_do_transition_tg_pt_work call.

Note that this does not handle a possible race where there are multiple
threads call core_alua_do_transition_tg_pt at the same time. I think
we need a mutex in target_tg_pt_gp_alua_access_state_store.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: allow userspace to set state to transitioning

Userspace target_core_user handlers like tcmu-runner may want to set the
ALUA state to transitioning while it does implicit transitions. This
patch allows that state when set from configfs.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: fix ALUA transition timeout handling

The implicit transition time tells initiators the min time
to wait before timing out a transition. We currently schedule
the transition to occur in tg_pt_gp_implicit_trans_secs
seconds so there is no room for delays. If
core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata
needs to write out info to a remote file, then the initiator can
easily time out the operation.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Use system workqueue for ALUA transitions

If tcmu-runner is processing a STPG and needs to change the kernel's
ALUA state then we cannot use the same work queue for task management
requests and ALUA transitions, because we could deadlock. The problem
occurs when a STPG times out before tcmu-runner is able to
call into target_tg_pt_gp_alua_access_state_store->
core_alua_do_port_transition -> core_alua_do_transition_tg_pt ->
queue_work. In this case, the tmr is on the work queue waiting for
the STPG to complete, but the STPG transition is now queued behind
the waiting tmr.

Note:
This bug will also be fixed by this patch:
http://www.spinics.net/lists/target-devel/msg14560.html
which switches the tmr code to use the system workqueues.

For both, I am not sure if we need a dedicated workqueue since
it is not a performance path and I do not think we need WQ_MEM_RECLAIM
to make forward progress to free up memory like the block layer does.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: fail ALUA transitions for pscsi

We do not setup the LU group for pscsi devices, so if you write
a state to alua_access_state that will cause a transition you will
get a NULL pointer dereference.

This patch will fail attempts to try and transition the path
for backend devices that set the TRANSPORT_FLAG_PASSTHROUGH_ALUA
flag.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: allow ALUA setup for some passthrough backends

This patch allows passthrough backends to use the core/base LIO
ALUA setup and state checks, but still handle the execution of
commands.

This will allow the target_core_user module to execute STPG and RTPG
in userspace, and not have to duplicate the ALUA state checks, path
information (needed so we can check if command is executable on
specific paths) and setup (rtslib sets/updates the configfs ALUA
interface like it does for iblock or file).

For STPG, the target_core_user userspace daemon, tcmu-runner will
still execute the STPG, and to update the core/base LIO state it
will use the existing configfs interface. For RTPG, tcmu-runner
will loop over configfs and/or cache the state.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcmu: return on first Opt parse failure

We only were returing failure if the last opt to be parsed failed.
This has a return failure when we first detect a failure.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcmu: allow hw_max_sectors greater than 128

tcmu hard codes the hw_max_sectors to 128 which is a litle small.
Userspace uses the max_sectors to report the optimal IO size and
some initiators perform better with larger IOs (open-iscsi seems
to do better with 256 to 512 depending on the test).

(Fix do not display hw max sectors twice - MNC)

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Drop pointless tfo->check_stop_free check

All in-tree fabric drivers provide a tfo->check_stop_free(),
so there is no need to do the extra check within existing
transport_cmd_check_stop_to_fabric() code.

Just to be sure, add a check in target_fabric_tf_ops_check()
to notify any out-of-tree drivers that might be missing it.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

parisc: Fix system shutdown halt

On those parisc machines which don't provide a software power off
function, the system currently kills the init process at the end of a
shutdown and unexpectedly restarts insteads of halting.
Fix it by adding a loop which will not return.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.9+

parisc: perf: Fix potential NULL pointer dereference

Fix potential NULL pointer dereference and clean up
coding style errors (code indent, trailing whitespaces).

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>

Merge branch 'smp-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull CPU hotplug fix from Thomas Gleixner:
"A single fix preventing the concurrent execution of the CPU hotplug
  callback install/invocation machinery. Long standing bug caused by a
  massive brain slip of that Gleixner dude, which went unnoticed for
  almost a year"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Serialize callback invocations proper

Merge tag 'pm-4.11-rc3' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix a few more intel_pstate issues and one small issue in the
  cpufreq core.

  Specifics:

   - Fix breakage in the intel_pstate's debugfs interface for PID
     controller tuning (Rafael Wysocki)

   - Fix computations related to P-state limits in intel_pstate to avoid
     excessive rounding errors leading to visible inaccuracies (Srinivas
     Pandruvada, Rafael Wysocki)

   - Add a missing newline to a message printed by one function in the
     cpufreq core and clean up that function (Rafael Wysocki)"

* tag 'pm-4.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Fix and clean up show_cpuinfo_cur_freq()
  cpufreq: intel_pstate: Avoid percentages in limits-related computations
  cpufreq: intel_pstate: Correct frequency setting in the HWP mode
  cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()

Merge branches 'pm-cpufreq-fixes' and 'intel_pstate-fixes'

* pm-cpufreq-fixes:
  cpufreq: Fix and clean up show_cpuinfo_cur_freq()

* intel_pstate-fixes:
  cpufreq: intel_pstate: Avoid percentages in limits-related computations
  cpufreq: intel_pstate: Correct frequency setting in the HWP mode
  cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()

Merge tag 'nfs-for-4.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
"We have a handful of stable fixes to fix kernel warnings and other
  bugs that have been around for a while. We've also found a few other
  reference counting bugs and memory leaks since the initial 4.11 pull.

  Stable Bugfixes:
   - Fix decrementing nrequests in NFS v4.2 COPY to fix kernel warnings
   - Prevent a double free in async nfs4_exchange_id()
   - Squelch a kbuild sparse complaint for xprtrdma

  Other Bugfixes:
   - Fix a typo (NFS_ATTR_FATTR_GROUP_NAME) that causes a memory leak
   - Fix a reference leak that causes kernel warnings
   - Make nfs4_cb_sv_ops static to fix a sparse warning
   - Respect a server's max size in CREATE_SESSION
   - Handle errors from nfs4_pnfs_ds_connect
   - Flexfiles layout shouldn't mark devices as unavailable"

* tag 'nfs-for-4.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  pNFS/flexfiles: never nfs4_mark_deviceid_unavailable
  pNFS: return status from nfs4_pnfs_ds_connect
  NFSv4.1 respect server's max size in CREATE_SESSION
  NFS prevent double free in async nfs4_exchange_id
  nfs: make nfs4_cb_sv_ops static
  xprtrdma: Squelch kbuild sparse complaint
  NFS: fix the fault nrequests decreasing for nfs_inode COPY
  NFSv4: fix a reference leak caused WARNING messages
  nfs4: fix a typo of NFS_ATTR_FATTR_GROUP_NAME

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"An assorted pile of fixes along with some hardware enablement:

   - a fix for a KASAN / branch profiling related boot failure

   - some more fallout of the PUD rework

   - a fix for the Always Running Timer which is not initialized when
     the TSC frequency is known at boot time (via MSR/CPUID)

   - a resource leak fix for the RDT filesystem

   - another unwinder corner case fixup

   - removal of the warning for duplicate NMI handlers because there are
     legitimate cases where more than one handler can be registered at
     the last level

   - make a function static - found by sparse

   - a set of updates for the Intel MID platform which got delayed due
     to merge ordering constraints. It's hardware enablement for a non
     mainstream platform, so there is no risk"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mpx: Make unnecessarily global function static
  x86/intel_rdt: Put group node in rdtgroup_kn_unlock
  x86/unwind: Fix last frame check for aligned function stacks
  mm, x86: Fix native_pud_clear build error
  x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
  x86/platform/intel-mid: Add power button support for Merrifield
  x86/platform/intel-mid: Use common power off sequence
  x86/platform: Remove warning message for duplicate NMI handlers
  x86/tsc: Fix ART for TSC_KNOWN_FREQ
  x86/platform/intel-mid: Correct MSI IRQ line for watchdog device

Merge branch 'x86-acpi-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 acpi fixes from Thomas Gleixner:
"This update deals with the fallout of the recent work to make
  cpuid/node mappings persistent.

  It turned out that the boot time ACPI based mapping tripped over ACPI
  inconsistencies and caused regressions. It's partially reverted and
  the fragile part replaced by an implementation which makes the mapping
  persistent when a CPU goes online for the first time"

* 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  acpi/processor: Check for duplicate processor ids at hotplug time
  acpi/processor: Implement DEVICE operator for processor enumeration
  x86/acpi: Restore the order of CPU IDs
  Revert"x86/acpi: Enable MADT APIs to return disabled apicids"
  Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
"A set of perf related fixes:

   - fix a CR4.PCE propagation issue caused by usage of mm instead of
     active_mm and therefore propagated the wrong value.

   - perf core fixes, which plug a use-after-free issue and make the
     event inheritance on fork more robust.

   - a tooling fix for symbol handling"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf symbols: Fix symbols__fixup_end heuristic for corner cases
  x86/perf: Clarify why x86_pmu_event_mapped() isn't racy
  x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
  perf/core: Better explain the inherit magic
  perf/core: Simplify perf_event_free_task()
  perf/core: Fix event inheritance on fork()
  perf/core: Fix use-after-free in perf_release()

Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
"From the scheduler departement:

   - a bunch of sched deadline related fixes which deal with various
     buglets and corner cases.

   - two fixes for the loadavg spikes which are caused by the delayed
     NOHZ accounting"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Use deadline instead of period when calculating overflow
  sched/deadline: Throttle a constrained deadline task activated after the deadline
  sched/deadline: Make sure the replenishment timer fires in the next period
  sched/loadavg: Use {READ,WRITE}_ONCE() for sample window
  sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
  sched/deadline: Add missing update_rq_clock() in dl_task_timer()

Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
"Three fixes related to locking:

   - fix a SIGKILL issue for RWSEM_GENERIC_SPINLOCK which has been fixed
     for the XCHGADD variant already

   - plug a potential use after free in the futex code

   - prevent leaking a held spinlock in an futex error handling code
     path"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Fix down_write_killable() for CONFIG_RWSEM_GENERIC_SPINLOCK=y
  futex: Add missing error handling to FUTEX_REQUEUE_PI
  futex: Fix potential use-after-free in FUTEX_REQUEUE_PI

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"Just a simple revert of a new sched_clock implementation which turned
out to be buggy"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "clocksource/drivers/tcb_clksrc: Use 32 bit tcb as sched_clock"

pNFS/flexfiles: never nfs4_mark_deviceid_unavailable

The flexfiles layout should never mark a device unavailable.

Move nfs4_mark_deviceid_unavailable out of nfs4_pnfs_ds_connect and call
directly from files layout where it's still needed.

The flexfiles driver still handles marked devices in error paths, but will
now print a rate limited warning.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

pNFS: return status from nfs4_pnfs_ds_connect

The nfs4_pnfs_ds_connect path can call rpc_create which can fail or it
can wait on another context to reach the same failure.

This checks that the rpc_create succeeded and returns the error to the
caller.

When an error is returned, both the files and flexfiles layouts will return
NULL from _prepare_ds(). The flexfiles layout will also return the layout
with the error NFS4ERR_NXIO.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4.1 respect server's max size in CREATE_SESSION

Currently client doesn't respect max sizes server returns in CREATE_SESSION.
nfs4_session_set_rwsize() gets called and server->rsize, server->wsize are 0
so they never get set to the sizes returned by the server.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS prevent double free in async nfs4_exchange_id

Since rpc_task is async, the release function should be called which
will free the impl_id, scope, and owner.

Trond pointed at 2 more problems:
-- use of client pointer after free in the nfs4_exchangeid_release() function
-- cl_count mismatch if rpc_run_task() isn't run

Fixes: 8d89bd70bc9 ("NFS setup async exchange_id")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # 4.9
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

nfs: make nfs4_cb_sv_ops static

Fixes the following sparse warning:

fs/nfs/callback.c:235:21: warning: symbol 'nfs4_cb_sv_ops' was not
declared. Should it be static?

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Squelch kbuild sparse complaint

New complaint from kbuild for 4.9.y:

net/sunrpc/xprtrdma/verbs.c:489:19: sparse: incompatible types in
comparison expression (different type sizes)

verbs.c:
489 max_sge = min(ia->ri_device->attrs.max_sge, RPCRDMA_MAX_SEND_SGES);

I can't reproduce this running sparse here. Likewise, "make W=1
net/sunrpc/xprtrdma/verbs.o" never indicated any issue.

A little poking suggests that because the range of its values is
small, gcc can make the actual width of RPCRDMA_MAX_SEND_SGES
smaller than the width of an unsigned integer.

Fixes: 16f906d66cd7 ("xprtrdma: Reduce required number of send SGEs")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: fix the fault nrequests decreasing for nfs_inode COPY

The nfs_commit_file for NFSv4.2's COPY operation goes through
the commit path for normal WRITE, but without increase nrequests,
so, the nrequests decreased in nfs_commit_release_pages is fault.
After that, the nrequests will be wrong.

[ 5670.299881] ------------[ cut here ]------------
[ 5670.300295] WARNING: CPU: 0 PID: 27656 at fs/nfs/inode.c:127 nfs_clear_inode+0x66/0x90 [nfs]
[ 5670.300558] Modules linked in: nfsv4(E) nfs(E) fscache(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event ppdev f2fs coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_ens1371 intel_rapl_perf gameport snd_ac97_codec vmw_balloon ac97_bus snd_seq snd_pcm joydev snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm e1000 crc32c_intel mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: fscache]
[ 5670.302925] CPU: 0 PID: 27656 Comm: umount.nfs4 Tainted: G        W   E   4.11.0-rc1+ #519
[ 5670.303292] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 5670.304094] Call Trace:
[ 5670.304510]  dump_stack+0x63/0x86
[ 5670.304917]  __warn+0xcb/0xf0
[ 5670.305276]  warn_slowpath_null+0x1d/0x20
[ 5670.305661]  nfs_clear_inode+0x66/0x90 [nfs]
[ 5670.306093]  nfs4_evict_inode+0x61/0x70 [nfsv4]
[ 5670.306480]  evict+0xbb/0x1c0
[ 5670.306888]  dispose_list+0x4d/0x70
[ 5670.307233]  evict_inodes+0x178/0x1a0
[ 5670.307579]  generic_shutdown_super+0x44/0xf0
[ 5670.307985]  nfs_kill_super+0x21/0x40 [nfs]
[ 5670.308325]  deactivate_locked_super+0x43/0x70
[ 5670.308698]  deactivate_super+0x5a/0x60
[ 5670.309036]  cleanup_mnt+0x3f/0x90
[ 5670.309407]  __cleanup_mnt+0x12/0x20
[ 5670.309837]  task_work_run+0x80/0xa0
[ 5670.310162]  exit_to_usermode_loop+0x89/0x90
[ 5670.310497]  syscall_return_slowpath+0xaa/0xb0
[ 5670.310875]  entry_SYSCALL_64_fastpath+0xa7/0xa9
[ 5670.311197] RIP: 0033:0x7f1bb3617fe7
[ 5670.311545] RSP: 002b:00007ffecbabb828 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
[ 5670.311906] RAX: 0000000000000000 RBX: 0000000001dca1f0 RCX: 00007f1bb3617fe7
[ 5670.312239] RDX: 000000000000000c RSI: 0000000000000001 RDI: 0000000001dc83c0
[ 5670.312653] RBP: 0000000001dc83c0 R08: 0000000000000001 R09: 0000000000000000
[ 5670.312998] R10: 0000000000000755 R11: 0000000000000206 R12: 00007ffecbabc66a
[ 5670.313335] R13: 0000000001dc83a0 R14: 0000000000000000 R15: 0000000000000000
[ 5670.313758] ---[ end trace bf4bfe7764e4eb40 ]---

Cc: linux-kernel@vger.kernel.org
Fixes: 67911c8f18 ("NFS: Add nfs_commit_file()")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Merge tag 'afs-20170316' of git://git./linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
"Fixes to the AFS filesystem in the kernel.

  They fix a variety of bugs. These include some issues fixed for
  consistency with other AFS implementations:

   - handle AFS mode bits better

   - use the client mtime rather than the server mtime in the protocol

   - handle the server returning more or less data than was requested in
     a FetchData call

   - distinguish mountpoints from symlinks based on the mode bits rather
     than preemptively reading every symlink to find out what it
     actually represents

  One other notable change for the user is that files are now flushed on
  close analogously with other network filesystems"

* tag 'afs-20170316' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (28 commits)
  afs: Don't wait for page writeback with the page lock held
  afs: ->writepage() shouldn't call clear_page_dirty_for_io()
  afs: Fix abort on signal while waiting for call completion
  afs: Fix an off-by-one error in afs_send_pages()
  afs: Fix afs_kill_pages()
  afs: Fix page leak in afs_write_begin()
  afs: Don't set PG_error on local EINTR or ENOMEM when filling a page
  afs: Populate and use client modification time
  afs: Better abort and net error handling
  afs: Invalid op ID should abort with RXGEN_OPCODE
  afs: Fix the maths in afs_fs_store_data()
  afs: Use a bvec rather than a kvec in afs_send_pages()
  afs: Make struct afs_read::remain 64-bit
  afs: Fix AFS read bug
  afs: Prevent callback expiry timer overflow
  afs: Migrate vlocation fields to 64-bit
  afs: security: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
  afs: inode: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
  afs: Distinguish mountpoints from symlinks by file mode alone
  afs: Flush outstanding writes when an fd is closed
  ...

Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fix from Russell King:
"Just one change to add the statx syscall this time around"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: wire up statx syscall

Merge tag 'for-linus-4.11b-rc3-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
"A minor fix for using the appropriate refcount_t instead of atomic_t"

* tag 'for-linus-4.11b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
drivers, xen: convert grant_map.users from atomic_t to refcount_t

Merge tag 'drm-fixes-for-v4.11-rc3' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Bunch of fixes across the drivers, in a St Patrick's day pull request
  (please turn terminal colors to green on black or black on green for
  full effect).

  On the arm side, tilcdc, omap and malidp got fixes, while amd has some
  powermanagement fixes, and intel has a set of fixes across the driver.

  Nothing seems to bad or scary at this point"

* tag 'drm-fixes-for-v4.11-rc3' of git://people.freedesktop.org/~airlied/linux: (27 commits)
  drm/amd/amdgpu:  Fix debugfs reg read/write address width
  drm/amdgpu/si: add dpm quirk for Oland
  drm/radeon/si: add dpm quirk for Oland
  drm: amd: remove broken include path
  drm/amd/powerplay: fix copy error in smu7_clockpoweragting.c
  drm/tilcdc: Set framebuffer DMA address to HW only if CRTC is enabled
  drm/tilcdc: Fix hardcoded fail-return value in tilcdc_crtc_create()
  drm/i915: Fix forcewake active domain tracking
  drm/i915: Nuke skl_update_plane debug message from the pipe update critical section
  drm/i915: use correct node for handling cache domain eviction
  uapi: fix drm/omap_drm.h userspace compilation errors
  drm/omap: fix dmabuf mmap for dma_alloc'ed buffers
  drm/amdgpu: fix parser init error path to avoid crash in parser fini
  drm/amd/amdgpu: Disable GFX_PG on Carrizo until compute issues solved
  drm: mali-dp: Fix smart layer not going to composition
  drm: mali-dp: Remove mclk rate management
  drm/i915: Drain the freed state from the tail of the next commit
  drm/i915: Nuke debug messages from the pipe update critical section
  drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl
  drm/i915: Store a permanent error in obj->mm.pages
  ...

hrtimer: Remove hrtimer_peek_ahead_timers() leftovers

This function was removed in commit c6eb3f70d448 (hrtimer: Get rid of
hrtimer softirq, 2015-04-14) but the prototype wasn't ever deleted.

Delete it now.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/20170317010814.2591-1-sboyd@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merge tag 'perf-urgent-for-mingo-4.11-20170317' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fix from Arnaldo Carvalho de Melo:

- Fix symbols__fixup_end heuristic for corner cases, such as JITted eBPF
programs, that are loaded at page aligned addresses, just after the
kernel proper (Daniel Borkmann)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf symbols: Fix symbols__fixup_end heuristic for corner cases

The current symbols__fixup_end() heuristic for the last entry in the rb
tree is suboptimal as it leads to not being able to recognize the symbol
in the call graph in a couple of corner cases, for example:

i) If the symbol has a start address (f.e. exposed via kallsyms)
    that is at a page boundary, then the roundup(curr->start, 4096)
    for the last entry will result in curr->start == curr->end with
    a symbol length of zero.

ii) If the symbol has a start address that is shortly before a page
    boundary, then also here, curr->end - curr->start will just be
    very few bytes, where it's unrealistic that we could perform a
    match against.

Instead, change the heuristic to roundup(curr->start, 4096) + 4096, so
that we can catch such corner cases and have a better chance to find
that specific symbol. It's still just best effort as the real end of the
symbol is unknown to us (and could even be at a larger offset than the
current range), but better than the current situation.

Alexei reported that he recently run into case i) with a JITed eBPF
program (these are all page aligned) as the last symbol which wasn't
properly shown in the call graph (while other eBPF program symbols in
the rb tree were displayed correctly). Since this is a generic issue,
lets try to improve the heuristic a bit.

Reported-and-Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 2e538c4a1847 ("perf tools: Improve kernel/modules symbol lookup")
Link: http://lkml.kernel.org/r/bb5c80d27743be6f12afc68405f1956a330e1bc9.1489614365.git.daniel@iogearbox.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>