Arvind Yadav [Mon, 13 Nov 2017 13:53:49 +0000 (19:23 +0530)]
irqchip/gic-v3: pr_err() strings should end with newlines
pr_err() messages should end with a new-line to avoid other messages
being concatenated.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Arvind Yadav [Mon, 13 Nov 2017 13:53:48 +0000 (19:23 +0530)]
irqchip/s3c24xx: pr_err() strings should end with newlines
pr_err() messages should end with a new-line to avoid other messages
being concatenated.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Johan Hovold [Sat, 11 Nov 2017 16:51:25 +0000 (17:51 +0100)]
irqchip/gic-v3: Fix ppi-partitions lookup
Fix child-node lookup during initialisation, which ended up searching
the whole device tree depth-first starting at the parent rather than
just matching on its children.
To make things worse, the parent gic node was prematurely freed, while
the ppi-partitions node was leaked.
Fixes:
e3825ba1af3a ("irqchip/gic-v3: Add support for partitioned PPIs")
Cc: stable <stable@vger.kernel.org> # 4.7
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Fri, 10 Nov 2017 09:00:41 +0000 (09:00 +0000)]
irqchip/gic-v4: Clear IRQ_DISABLE_UNLAZY again if mapping fails
Should the call to irq_set_vcpu_affinity() fail at map time,
we should restore the normal lazy-disable behaviour instead
of staying with the eager disable that GICv4 requires.
Reported-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Thu, 9 Nov 2017 14:17:59 +0000 (14:17 +0000)]
genirq: Track whether the trigger type has been set
When requesting a shared interrupt, we assume that the firmware
support code (DT or ACPI) has called irqd_set_trigger_type
already, so that we can retrieve it and check that the requester
is being reasonnable.
Unfortunately, we still have non-DT, non-ACPI systems around,
and these guys won't call irqd_set_trigger_type before requesting
the interrupt. The consequence is that we fail the request that
would have worked before.
We can either chase all these use cases (boring), or address it
in core code (easier). Let's have a per-irq_desc flag that
indicates whether irqd_set_trigger_type has been called, and
let's just check it when checking for a shared interrupt.
If it hasn't been set, just take whatever the interrupt
requester asks.
Fixes:
382bd4de6182 ("genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs")
Cc: stable@vger.kernel.org
Reported-and-tested-by: Petr Cvek <petrcvekcz@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Matt Redfearn [Thu, 9 Nov 2017 11:02:45 +0000 (11:02 +0000)]
irqchip: mips-gic: Print warning if inherited GIC base is used
If the physical address of the GIC resource cannot be read from device
tree, then the code falls back to reading it from the gcr_gic_base
register. Hopefully this has been set to a sane value by the bootloader
or some platform code, but is defined by the hardware manual to have
"undefined" reset state. Using it as the address at which the GIC will
be mapped into physical memory space can therefore be risky if it has
not been initialised, since it may result in the GIC being mapped to an
effectively random address anywhere in physical memory, where it might
conflict with peripherals or RAM and lead to weird crashes.
Since a "sane value" is very platform specific because it is particular
to the platform's memory map, it is difficult to test for. At the very
least, a warning message should be printed in the case that we trust the
inherited value.
Reported-by: Amit Kama <amit.kama@satixfy.com>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Reviewed-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Matt Redfearn [Thu, 9 Nov 2017 11:02:44 +0000 (11:02 +0000)]
irqchip/mips-gic: Add pr_fmt and reword pr_* messages
Several messages from the MIPS GIC driver include the text "GIC", but
the format is not standard. Add a pr_fmt of "irq-mips-gic: " and reword
the messages now that they will be prefixed with the driver name.
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Reviewed-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ludovic Barre [Mon, 6 Nov 2017 17:03:36 +0000 (18:03 +0100)]
irqchip/stm32: Move the wakeup on interrupt mask
Move irq_set_wake on interrupt mask, needed to wake up from
low power mode as the event mask is not able to do so.
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ludovic Barre [Mon, 6 Nov 2017 17:03:35 +0000 (18:03 +0100)]
irqchip/stm32: Fix initial values
-After cold boot, imr default value depends on hardware configuration.
-After hot reboot the registers must be cleared to avoid residue.
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ludovic Barre [Mon, 6 Nov 2017 17:03:34 +0000 (18:03 +0100)]
irqchip/stm32: Add stm32h7 support
stm32h7 has up to 96 inputs
(3 banks of 32 inputs max).
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ludovic Barre [Mon, 6 Nov 2017 17:03:33 +0000 (18:03 +0100)]
dt-bindings/interrupt-controllers: Add compatible string for stm32h7
This patch updates stm32-exti documentation with stm32h7-exti
compatible string.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ludovic Barre [Mon, 6 Nov 2017 17:03:32 +0000 (18:03 +0100)]
irqchip/stm32: Add multi-bank management
-Prepare to manage multi-bank of external interrupts
(N banks of 32 inputs).
-Prepare to manage registers offsets by compatible
(registers offsets could be different follow per stm32 platform).
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ludovic Barre [Mon, 6 Nov 2017 17:03:31 +0000 (18:03 +0100)]
irqchip/stm32: Select GENERIC_IRQ_CHIP
This patch adds GENERIC_IRQ_CHIP to stm32 exti
config.
Signed-off-by: Ludovic Barre <ludovic.barre@st.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ard Biesheuvel [Mon, 6 Nov 2017 18:34:37 +0000 (18:34 +0000)]
irqchip/exiu: Add support for Socionext Synquacer EXIU controller
The Socionext Synquacer SoC has an external interrupt unit (EXIU)
that forwards a block of 32 configurable input lines to 32 adjacent
level-high type GICv3 SPIs.
The EXIU has per-interrupt level/edge and polarity controls, and
mask bits that keep the outgoing lines de-asserted, even though
the controller may still latch interrupt conditions that occur
while the line is masked.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ard Biesheuvel [Mon, 6 Nov 2017 18:34:36 +0000 (18:34 +0000)]
dt-bindings: Add description of Socionext EXIU interrupt controller
Add a description of the External Interrupt Unit (EXIU) interrupt
controller as found on the Socionext SynQuacer SoC.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Tue, 7 Nov 2017 10:04:38 +0000 (10:04 +0000)]
irqchip/gic-v3-its: Fix VPE activate callback return value
its_vpe_irq_domain_activate should always return 0. Really. There
is not a single case why it wouldn't. So this "return true;" is
really a copy/paste issue that got revealed now that we actually
check the return value of the activate method.
Brown paper bag day.
Fixes:
2247e1bf7063 ("irqchip/gic-v3-its: Limit scope of VPE mapping to be per ITS")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Paul Burton [Tue, 31 Oct 2017 16:41:51 +0000 (09:41 -0700)]
irqchip: mips-gic: Make IPI bitmaps static
We have 2 bitmaps used to keep track of interrupts dedicated to IPIs in
the MIPS GIC irqchip driver. These bitmaps are only used from the one
compilation unit of that driver, and so can be made static. Do so in
order to avoid polluting the symbol table & global namespace.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Paul Burton [Tue, 31 Oct 2017 16:41:50 +0000 (09:41 -0700)]
irqchip: mips-gic: Share register writes in gic_set_type()
The gic_set_type() function included writes to the MIPS GIC polarity,
trigger & dual-trigger registers in each case of a switch statement
determining the IRQs type. This is all well & good when we only have a
single cluster & thus a single GIC whose register we want to update. It
will lead to significant duplication once we have multi-cluster support
& multiple GICs to update.
Refactor this such that we determine values for the polarity, trigger &
dual-trigger registers and then have a single set of register writes
following the switch statement. This will allow us to write the same
values to each GIC in a multi-cluster system in a later patch, rather
than needing to duplicate more register writes in each case.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Paul Burton [Tue, 31 Oct 2017 16:41:49 +0000 (09:41 -0700)]
irqchip: mips-gic: Remove gic_vpes variable
Following the past few patches nothing uses the gic_vpes variable any
longer. Remove the dead code.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Paul Burton [Tue, 31 Oct 2017 16:41:48 +0000 (09:41 -0700)]
irqchip: mips-gic: Use num_possible_cpus() to reserve IPIs
Reserving a number of IPIs based upon the number of VPs reported by the
GIC makes little sense for a few reasons:
- The kernel may have been configured with NR_CPUS less than the number
of VPs in the cluster, in which case using gic_vpes causes us to
reserve more interrupts for IPIs than we will possibly use.
- If a kernel is configured without support for multi-threading & runs
on a system with multi-threading & multiple VPs per core then we'll
similarly reserve more interrupts for IPIs than we will possibly use.
- In systems with multiple clusters the GIC can only provide us with
the number of VPs in its cluster, not across all clusters. In this
case we'll reserve fewer interrupts for IPIs than we need.
Fix these issues by using num_possible_cpus() instead, which in all
cases is actually indicative of how many IPIs we may need.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Paul Burton [Tue, 31 Oct 2017 16:41:47 +0000 (09:41 -0700)]
irqchip: mips-gic: Configure EIC when CPUs come online
Rather than configuring EIC mode for all CPUs during boot, configure it
locally on each when they come online. This will become important with
multi-cluster support, since clusters may be powered on & off (for
example via hotplug) and would lose the EIC configuration when powered
off.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Paul Burton [Tue, 31 Oct 2017 16:41:46 +0000 (09:41 -0700)]
irqchip: mips-gic: Mask local interrupts when CPUs come online
We currently walk through the range 0..gic_vpes-1, expecting these
values all to be valid Linux CPU numbers to provide to mips_cm_vp_id(),
and masking all routable local interrupts during boot. This approach has
a few drawbacks:
- In multi-cluster systems we won't have access to all CPU's GIC local
registers when the driver is probed, since clusters (and their GICs)
may be powered down at this point & only brought online later.
- In multi-cluster systems we may power down clusters at runtime, for
example if we offline all CPUs within it via hotplug, and the
cluster's GIC may lose state. We therefore need to reinitialise it
when powering back up, which this approach does not take into
account.
- The range 0..gic_vpes-1 may not all be valid Linux CPU numbers, for
example if we run a kernel configured to support fewer CPUs than the
system it is running on actually has. In this case we'll get garbage
values from mips_cm_vp_id() as we read past the end of the cpu_data
array.
Fix this and simplify the code somewhat by writing an all-bits-set
value to the VP-local reset mask register when a CPU is brought online,
before any local interrupts are configured for it. This removes the need
for us to access all CPUs during driver probe, removing all of the
problems described above.
In the name of simplicity we drop the checks for routability of
interrupts and simply clear the mask bits for all interrupts. Bits for
non-routable local interrupts will have no effect so there's no point
performing extra work to avoid modifying them.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Paul Burton [Tue, 31 Oct 2017 16:41:45 +0000 (09:41 -0700)]
irqchip: mips-gic: Use irq_cpu_online to (un)mask all-VP(E) IRQs
The gic_all_vpes_local_irq_controller chip currently attempts to operate
on all CPUs/VPs in the system when masking or unmasking an interrupt.
This has a few drawbacks:
- In multi-cluster systems we may not always have access to all CPUs in
the system. When all CPUs in a cluster are powered down that
cluster's GIC may also power down, in which case we cannot configure
its state.
- Relatedly, if we power down a cluster after having configured
interrupts for CPUs within it then the cluster's GIC may lose state &
we need to reconfigure it. The current approach doesn't take this
into account.
- It's wasteful if we run Linux on fewer VPs than are present in the
system. For example if we run a uniprocessor kernel on CPU0 of a
system with 16 CPUs then there's no point in us configuring CPUs
1-15.
- The implementation is also lacking in that it expects the range
0..gic_vpes-1 to represent valid Linux CPU numbers which may not
always be the case - for example if we run on a system with more VPs
than the kernel is configured to support.
Fix all of these issues by only configuring the affected interrupts for
CPUs which are online at the time, and recording the configuration in a
new struct gic_all_vpes_chip_data for later use by CPUs being brought
online. We register a CPU hotplug state (reusing
CPUHP_AP_IRQ_GIC_STARTING which the ARM GIC driver uses, and which seems
suitably generic for reuse with the MIPS GIC) and execute
irq_cpu_online() in order to configure the interrupts on the newly
onlined CPU.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Paul Burton [Tue, 31 Oct 2017 16:41:44 +0000 (09:41 -0700)]
irqchip: mips-gic: Inline gic_local_irq_domain_map()
The gic_local_irq_domain_map() function has only one callsite in
gic_irq_domain_map(), and the split between the two functions makes it
unclear that they duplicate calculations & checks.
Inline gic_local_irq_domain_map() into gic_irq_domain_map() in order to
clean this up. Doing this makes the following small issues obvious, and
the patch tidies them up:
- Both functions used GIC_HWIRQ_TO_LOCAL() to convert a hwirq number to
a local IRQ number. We now only do this once. Although the compiler
ought to have optimised this away before anyway, the change leaves us
with less duplicate code.
- gic_local_irq_domain_map() had a check for invalid local interrupt
numbers (intr > GIC_LOCAL_INT_FDC). This condition can never occur
because any hwirq higher than those used for local interrupts is a
shared interrupt, which gic_irq_domain_map() already handles
separately. We therefore remove this check.
- The decision of whether to map the interrupt to gic_cpu_pin or
timer_cpu_pin can be handled within the existing switch statement in
gic_irq_domain_map(), shortening the code a little.
The change additionally prepares us nicely for the following patch of
the series which would otherwise need to duplicate the check for whether
a local interrupt should be percpu_devid or just percpu (ie. the switch
statement from gic_irq_domain_map()) in gic_local_irq_domain_map().
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Martin Blumenstingl [Sun, 29 Oct 2017 23:05:21 +0000 (00:05 +0100)]
irqchip/meson-gpio: add support for Meson8 SoCs
Meson8 uses the same GPIO interrupt controller IP block as the other
Meson SoCs. A total of 134 pins can be spied on, which is the sum of:
- 22 pins on bank GPIOX
- 17 pins on bank GPIOY
- 30 pins on bank GPIODV
- 10 pins on bank GPIOH
- 15 pins on bank GPIOZ
- 7 pins on bank CARD
- 19 pins on bank BOOT
- 14 pins in the AO domain
Acked-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Dou Liyang [Mon, 30 Oct 2017 02:15:00 +0000 (10:15 +0800)]
irqdomain: Update the comments of fwnode field of irq_domain structure
Commit:
f110711a6053 ("irqdomain: Convert irqdomain-%3Eof_node to fwnode")
converted of_node field to fwnode, but didn't update its comments.
Update it.
Fixes:
f110711a6053 ("irqdomain: Convert irqdomain-%3Eof_node to fwnode")
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Fri, 27 Oct 2017 08:34:22 +0000 (10:34 +0200)]
irqchip/gic: Deal with broken firmware exposing only 4kB of GICv2 CPU interface
There is a lot of broken firmware out there that don't really
expose the information the kernel requires when it comes with dealing
with GICv2:
(1) Firmware that only describes the first 4kB of GICv2
(2) Firmware that describe 128kB of CPU interface, while
the usable portion of the address space is between
60 and 68kB
So far, we only deal with (2). But we have platforms exhibiting
behaviour (1), resulting in two sub-cases:
(a) The GIC is occupying 8kB, as required by the GICv2 architecture
(b) It is actually spread 128kB, and this is likely to be a version
of (2)
This patch tries to work around both (a) and (b) by poking at
the outside of the described memory region, and try to work out
what is actually there. This is of course unsafe, and should
only be enabled if there is no way to otherwise fix the DT provided
by the firmware (we provide a "irqchip.gicv2_force_probe" option
to that effect).
Note that for the time being, we restrict ourselves to GICv2
implementations provided by ARM, since there I have no knowledge
of an alternative implementations. This could be relaxed if such
an implementation comes to light on a broken platform.
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Thu, 26 Oct 2017 09:44:07 +0000 (10:44 +0100)]
irqchip/gic-v3-its: Setup VLPI properties at map time
So far, we require the hypervisor to update the VLPI properties
once the the VLPI mapping has been established. While this
makes it easy for the ITS driver, it creates a window where
an incoming interrupt can be delivered with an unknown set
of properties. Not very nice.
Instead, let's add a "properties" field to the mapping structure,
and use that to configure the VLPI before it actually gets mapped.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Thu, 2 Nov 2017 15:53:54 +0000 (15:53 +0000)]
Merge tag 'v4.14-rc3' into irq/irqchip-4.15
Required merge to get mainline irqchip updates.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Wei Yongjun [Wed, 11 Oct 2017 10:50:23 +0000 (10:50 +0000)]
irqchip/aspeed-i2c-ic: Fix return value check in aspeed_i2c_ic_of_init()
In case of error, the function of_iomap() returns NULL pointer not
ERR_PTR(). The IS_ERR() test in the return value check should be
replaced with NULL test..
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Jerome Brunet [Mon, 18 Sep 2017 13:46:10 +0000 (15:46 +0200)]
irqchip/meson: Add support for gpio interrupt controller
Add support for the interrupt gpio controller found on Amlogic's meson
SoC family.
This controller is a separate controller from the gpio controller. It is
able to spy on the SoC pad. It is essentially a 256 to 8 router with a
filtering block to select level or edge and polarity. The number of actual
mappable inputs depends on the SoC.
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Jerome Brunet [Mon, 18 Sep 2017 13:46:09 +0000 (15:46 +0200)]
dt-bindings: interrupt-controller: Add DT binding for meson GPIO interrupt controller
This commit adds the device tree bindings description for Amlogic's GPIO
interrupt controller available on the meson8b, gxbb and gxl SoC families
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Thu, 19 Oct 2017 09:11:34 +0000 (10:11 +0100)]
irqchip/gic-v3-its: Update effective affinity on VPE mapping
When setting the affinity of a VPE (either because we map or move
it), make sure the effective affinity is correctly reported back
to the core kernel.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Mon, 9 Oct 2017 12:17:43 +0000 (13:17 +0100)]
irqchip/gic-v3-its: Only send VINVALL to a single ITS
Sending VINVALL to all ITSs is completely pointless, as all
we're trying to achieve is to tell the redistributor that
the property table for this VPE should be invalidated.
Let's issue the command on the first valid ITS and be done with it.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Sun, 8 Oct 2017 17:50:36 +0000 (18:50 +0100)]
irqchip/gic-v3-its: Limit scope of VPE mapping to be per ITS
So far, we map all VPEs on all ITSs. While this is not wrong,
this is quite a big hammer, as moving a VPE around requires
all ITSs to be synchronized. Needles to say, this is an
expensive proposition.
Instead, let's switch to a mode where we issue VMAPP commands
only on ITSs that are actually involved in reporting interrupts
to the given VM.
For that purpose, we refcount the number of interrupts are are
mapped for this VM on each ITS, performing the map/unmap
operations as required. It then allows us to use this refcount
to only issue VMOVP to the ITSs that need to know about this
VM.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Sun, 8 Oct 2017 17:46:39 +0000 (18:46 +0100)]
irqchip/gic-v3-its: Make its_send_vmapp operate on a single ITS
Currently, its_send_vmapp operates on all ITSs. As we're about
to try and limit the amount of commands we send to ITSs that are
not involved in dealing with a given VM, let's redefine that
primitive so that it takes a target ITS as a parameter.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Sun, 8 Oct 2017 14:16:09 +0000 (15:16 +0100)]
irqchip/gic-v3-its: Make its_send_vinvall operate on a single ITS
Currently, its_send_vinvall operates on all ITSs. As we're about
to try and limit the amount of commands we send to ITSs that are
not involved in dealing with a given VM, let's redefine that
primitive so that it takes a target ITS as a parameter.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Sun, 8 Oct 2017 17:48:06 +0000 (18:48 +0100)]
irqchip/gic-v3-its: Make GICv4_ITS_LIST_MAX globally available
As we're about to make use of the maximum number of ITSs in
a GICv4 system, let's make this value global (and rename it to
GICv4_ITS_LIST_MAX).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Sun, 8 Oct 2017 17:44:42 +0000 (18:44 +0100)]
irqchip/gic-v3-its: Track per-ITS list number
At boot time, we enumerate all the GICv4-capable ITSs, and build
a mask of the available ITSs. Take this opportunity to store
the ITS number in the its_node structure so that we can use it
at a later time.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Fri, 28 Jul 2017 20:20:37 +0000 (21:20 +0100)]
irqchip/gic-v3-its: Workaround HiSilicon Hip07 redistributor addressing
The ITSes on the Hip07 (as present in the Huawei D05) are broken when
it comes to addressing the redistributors, and need to be explicitely
told to address the VLPI page instead of the redistributor base address.
So let's add yet another quirk, fixing up the target address
in the command stream.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Fri, 28 Jul 2017 20:16:58 +0000 (21:16 +0100)]
irqchip/gic-v3-its: Pass its_node pointer to each command builder
In order to be able to issue command variants depending on
how broken an ITS is, let's pass the its pointer to all
command building primitives.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Fri, 4 Aug 2017 16:45:50 +0000 (17:45 +0100)]
irqchip/gic-v3-its: Add post-mortem info on command timeout
If the ITS stops processing commands, we're pretty much toasted
as we cannot update the configuration anymore (and we're not
even sure that the ITS still translates interrups).
If that happens, let's dump some basic information about the
state of affairs before moving on.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ard Biesheuvel [Tue, 17 Oct 2017 16:55:56 +0000 (17:55 +0100)]
irqchip/gic-v3: Add workaround for Synquacer pre-ITS
The Socionext Synquacer SoC's implementation of GICv3 has a so-called
'pre-ITS', which maps 32-bit writes targeted at a separate window of
size '4 << device_id_bits' onto writes to GITS_TRANSLATER with device
ID taken from bits [device_id_bits + 1:2] of the window offset.
Writes that target GITS_TRANSLATER directly are reported as originating
from device ID #0.
So add a workaround for this. Given that this breaks isolation, clear
the IRQ_DOMAIN_FLAG_MSI_REMAP flag as well.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ard Biesheuvel [Tue, 17 Oct 2017 16:55:55 +0000 (17:55 +0100)]
irqchip/gic: Make quirks matching conditional on init return value
As it turns out, the IIDR is not sufficient to distinguish between GICv3
implementations when it comes to enabling quirks. So update the prototype
of the init() hook to return a bool, and interpret a 'false' return value
as no match, in which case the 'enabling workaround' log message should
not be printed.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Ard Biesheuvel [Tue, 17 Oct 2017 16:55:54 +0000 (17:55 +0100)]
irqchip/gic-v3: Probe device ID space before quirks handling
Before adding another SoC whose device ID space deviates from the
value presented in the GIC ID registers, let's slightly refactor
the code so that the ID registers are probed before that quirks
handling executes. This allows us to move the device ID override
into the quirk handler itself.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Shanker Donthineni [Fri, 6 Oct 2017 15:24:00 +0000 (10:24 -0500)]
irqchip/gic-v3: Add support for Range Selector (RS) feature
A new feature Range Selector (RS) has been added to GIC specification
in order to support more than 16 CPUs at affinity level 0. New fields
are introduced in SGI system registers (ICC_SGI0R_EL1, ICC_SGI1R_EL1
and ICC_ASGI1R_EL1) to relax an artificial limit of 16 at level 0.
- A new RSS field in ICC_CTLR_EL3, ICC_CTLR_EL1 and ICV_CTLR_EL1:
[18] - Range Selector Support (RSS)
0b0 = Targeted SGIs with affinity level 0 values of 0-15 are supported.
0b1 = Targeted SGIs with affinity level 0 values of 0-255 are supported.
- A new RS field in ICC_SGI0R_EL1, ICC_SGI1R_EL1 and ICC_ASGI1R_EL1:
[47:44] - RangeSelector (RS) which group of 16 TargetList[n] field
TargetList[n] represents aff0 value ((RS*16)+n)
When ICC_CTLR_EL3.RSS==0 or ICC_CTLR_EL1.RSS==0, RS is RES0.
- A new RSS field in GICD_TYPER:
[26] - Range Selector Support (RSS)
0b0 = Targeted SGIs with affinity level 0 values of 0-15 are supported.
0b1 = Targeted SGIs with affinity level 0 values of 0-255 are supported.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Doug Berger [Tue, 19 Sep 2017 01:00:00 +0000 (18:00 -0700)]
irqchip/brcmstb-l2: Add support for the BCM7271 L2 controller
Add the initialization of the generic irq chip for the BCM7271 L2
interrupt controller. This controller only supports level
interrupts and uses the "brcm,bcm7271-l2-intc" compatibility
string.
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Doug Berger [Tue, 19 Sep 2017 00:59:59 +0000 (17:59 -0700)]
irqchip/brcmstb-l2: Abstract register accesses
Added register block offsets to the brcmstb_l2_intc_data structure
for the status and mask registers to support reading the active
interupts in an abstracted way. It seems like an irq_chip method
should have been provided for this, but it's not there yet.
Abstracted the implementation of the handler, suspend, and resume
functions to not use any hard coded register offsets.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Doug Berger [Tue, 19 Sep 2017 00:59:58 +0000 (17:59 -0700)]
irqchip/brcmstb-l2: Remove some processing from the handler
Saving the generic chip pointer in the brcmstb_l2_intc_data prevents
the need to call irq_get_domain_generic_chip(). Also don't need to
save parent_irq and base there since local variables in the
brcmstb_l2_intc_of_init() function are just as good.
The handle_edge_irq flow or chained_irq_enter takes care of the
acknowledgment of the interrupt so it is redundant to clear it in
brcmstb_l2_intc_irq_handle().
irq_linear_revmap() is a fast path equivalent of irq_find_mapping()
that is appropriate to use for domain controllers of this type.
Defining irq_mask_ack is slightly more efficient than just
implementing irq_mask and irq_ack separately.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Geert Uytterhoeven [Wed, 4 Oct 2017 12:17:58 +0000 (14:17 +0200)]
irqchip/renesas-intc-irqpin: Use of_device_get_match_data() helper
Use the of_device_get_match_data() helper instead of open coding.
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Geert Uytterhoeven [Fri, 6 Oct 2017 11:51:32 +0000 (13:51 +0200)]
dt-bindings: irqchip: renesas-irqc: Document R-Car M3-W, V3M, D3 support
Document support for the Interrupt Controller for Externel Devices
(INTC-EX) in the Renesas M3-W (r8a7796), V3M (r8a77970), and D3
(r8a77995) SoCs.
No driver update is needed.
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Masahiro Yamada [Fri, 22 Sep 2017 12:20:41 +0000 (21:20 +0900)]
irqdomain: Add __rcu annotations to radix tree slot
Fix different address spaces warning of sparse.
kernel/irq/irqdomain.c:1463:14: warning: incorrect type in assignment (different address spaces)
kernel/irq/irqdomain.c:1463:14: expected void **slot
kernel/irq/irqdomain.c:1463:14: got void [noderef] <asn:4>**
kernel/irq/irqdomain.c:1465:66: warning: incorrect type in argument 2 (different address spaces)
kernel/irq/irqdomain.c:1465:66: expected void [noderef] <asn:4>**slot
kernel/irq/irqdomain.c:1465:66: got void **slot
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Masahiro Yamada [Thu, 5 Oct 2017 01:44:54 +0000 (10:44 +0900)]
irqdomain: Move revmap_trees_mutex to struct irq_domain
The revmap_trees_mutex protects domain->revmap_tree. There is no
need to make it global because it is allowed to modify revmap_tree
of two different domains concurrently. Having said that, this would
not be a actual bottleneck because the interrupt map/unmap does not
occur quite often. Rather, the motivation is to tidy up the code
from a data structure point of view.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Randy Dunlap [Mon, 16 Oct 2017 18:04:33 +0000 (11:04 -0700)]
irqchip: Add Kconfig menu
Add a menu for IRQ chip drivers. This makes the Device drivers menu be more
consistent (listing "subsystems" instead of specific options) and makes the
IRQCHIP options be listed in expected places for 'make menu|xconfig'.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: https://lkml.kernel.org/r/3db7385a-c6a1-5c93-0797-6f4b6b2b2cde@infradead.org
Ladislav Michl [Mon, 16 Oct 2017 16:13:03 +0000 (18:13 +0200)]
irqchip/irq-omap-intc: Do not statically initialize variables
omap_nr_pending and omap_nr_irqs variables are initialized
right at the beginning of intc_of_init function, so there's
no need to statically initialize them.
Signed-off-by: Ladislav Michl <ladis@linux-mips.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: linux-omap@vger.kernel.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: https://lkml.kernel.org/r/20171016161303.veumgcd3xom5c54r@lenoch
Ladislav Michl [Mon, 16 Oct 2017 16:04:22 +0000 (18:04 +0200)]
irqchip/irq-omap-intc: Remove omap3_init_irq()
All mach-omap2 variants are device tree only now, so this function is dead
code. Remove it.
Signed-off-by: Ladislav Michl <ladis@linux-mips.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Lindgren <tony@atomide.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-omap@vger.kernel.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: https://lkml.kernel.org/r/20171016160422.uu2i7vvrgy7cc4aw@lenoch
Linus Torvalds [Sun, 1 Oct 2017 21:54:54 +0000 (14:54 -0700)]
Linux 4.14-rc3
Linus Torvalds [Sun, 1 Oct 2017 20:55:32 +0000 (13:55 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"This contains the following fixes and improvements:
- Avoid dereferencing an unprotected VMA pointer in the fault signal
generation code
- Fix inline asm call constraints for GCC 4.4
- Use existing register variable to retrieve the stack pointer
instead of forcing the compiler to create another indirect access
which results in excessive extra 'mov %rsp, %<dst>' instructions
- Disable branch profiling for the memory encryption code to prevent
an early boot crash
- Fix a sparse warning caused by casting the __user annotation in
__get_user_asm_u64() away
- Fix an off by one error in the loop termination of the error patch
in the x86 sysfs init code
- Add missing CPU IDs to various Intel specific drivers to enable the
functionality on recent hardware
- More (init) constification in the numachip code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Use register variable to get stack pointer value
x86/mm: Disable branch profiling in mem_encrypt.c
x86/asm: Fix inline asm call constraints for GCC 4.4
perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
perf/x86/intel/rapl: Add missing CPU IDs
perf/x86/msr: Add missing CPU IDs
perf/x86/intel/cstate: Add missing CPU IDs
x86: Don't cast away the __user in __get_user_asm_u64()
x86/sysfs: Fix off-by-one error in loop termination
x86/mm: Fix fault error path using unsafe vma pointer
x86/numachip: Add const and __initconst to numachip2_clockevent
Linus Torvalds [Sun, 1 Oct 2017 20:03:16 +0000 (13:03 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"This adds a new timer wheel function which is required for the
conversion of the timer callback function from the 'unsigned long
data' argument to 'struct timer_list *timer'. This conversion has two
benefits:
1) It makes struct timer_list smaller
2) Many callers hand in a pointer to the timer or to the structure
containing the timer, which happens via type casting both at setup
and in the callback. This change gets rid of the typecasts.
Once the conversion is complete, which is planned for 4.15, the old
setup function and the intermediate typecast in the new setup function
go away along with the data field in struct timer_list.
Merging this now into mainline allows a smooth queueing of the actual
conversion in the affected maintainer trees without creating
dependencies"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
um/time: Fixup namespace collision
timer: Prepare to change timer callback argument type
Linus Torvalds [Sun, 1 Oct 2017 19:34:42 +0000 (12:34 -0700)]
Merge branch 'smp-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull smp/hotplug fixes from Thomas Gleixner:
"This addresses the fallout of the new lockdep mechanism which covers
completions in the CPU hotplug code.
The lockdep splats are false positives, but there is no way to
annotate that reliably. The solution is to split the completions for
CPU up and down, which requires some reshuffling of the failure
rollback handling as well"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp/hotplug: Hotplug state fail injection
smp/hotplug: Differentiate the AP completion between up and down
smp/hotplug: Differentiate the AP-work lockdep class between up and down
smp/hotplug: Callback vs state-machine consistency
smp/hotplug: Rewrite AP state machine core
smp/hotplug: Allow external multi-instance rollback
smp/hotplug: Add state diagram
Linus Torvalds [Sun, 1 Oct 2017 19:10:02 +0000 (12:10 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"The scheduler pull request comes with the following updates:
- Prevent a divide by zero issue by validating the input value of
sysctl_sched_time_avg
- Make task state printing consistent all over the place and have
explicit state characters for IDLE and PARKED so they wont be
displayed as 'D' state which confuses tools"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/sysctl: Check user input value of sysctl_sched_time_avg
sched/debug: Add explicit TASK_PARKED printing
sched/debug: Ignore TASK_IDLE for SysRq-W
sched/debug: Add explicit TASK_IDLE printing
sched/tracing: Use common task-state helpers
sched/tracing: Fix trace_sched_switch task-state printing
sched/debug: Remove unused variable
sched/debug: Convert TASK_state to hex
sched/debug: Implement consistent task-state printing
Linus Torvalds [Sun, 1 Oct 2017 19:06:31 +0000 (12:06 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
- Prevent a division by zero in the perf aux buffer handling
- Sync kernel headers with perf tool headers
- Fix a build failure in the syscalltbl code
- Make the debug messages of perf report --call-graph work correctly
- Make sure that all required perf files are in the MANIFEST for
container builds
- Fix the atrr.exclude kernel handling so it respects the
perf_event_paranoid and the user permissions
- Make perf test on s390x work correctly
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/aux: Only update ->aux_wakeup in non-overwrite mode
perf test: Fix vmlinux failure on s390x part 2
perf test: Fix vmlinux failure on s390x
perf tools: Fix syscalltbl build failure
perf report: Fix debug messages with --call-graph option
perf evsel: Fix attr.exclude_kernel setting for default cycles:p
tools include: Sync kernel ABI headers with tooling headers
perf tools: Get all of tools/{arch,include}/ in the MANIFEST
Linus Torvalds [Sun, 1 Oct 2017 19:02:47 +0000 (12:02 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two fixes for locking:
- Plug a hole the pi_stat->owner serialization which was changed
recently and failed to fixup two usage sites.
- Prevent reordering of the rwsem_has_spinner() check vs the
decrement of rwsem count in up_write() which causes a missed
wakeup"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem-xadd: Fix missed wakeup due to reordering of load
futex: Fix pi_state->owner serialization
Linus Torvalds [Sun, 1 Oct 2017 19:00:56 +0000 (12:00 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- Add a missing NULL pointer check in free_irq()
- Fix a memory leak/memory corruption in the generic irq chip
- Add missing rcu annotations for radix tree access
- Use ffs instead of fls when extracting data from a chip register in
the MIPS GIC irq driver
- Fix the unmasking of IPI interrupts in the MIPS GIC driver so they
end up at the target CPU and not at CPU0
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/generic-chip: Don't replace domain's name
irqdomain: Add __rcu annotations to radix tree accessors
irqchip/mips-gic: Use effective affinity to unmask
irqchip/mips-gic: Fix shifts to extract register fields
genirq: Check __free_irq() return value for NULL
Linus Torvalds [Sun, 1 Oct 2017 18:12:29 +0000 (11:12 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull objtool fixes from Thomas Gleixner:
"Two small fixes for objtool:
- Support frame pointer setup via 'lea (%rsp), %rbp' which was not
yet supported and caused build warnings
- Disable unreacahble warnings for GCC4.4 and older to avoid false
positives caused by the compiler itself"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Support unoptimized frame pointer setup
objtool: Skip unreachable warnings for GCC 4.4 and older
Linus Torvalds [Sat, 30 Sep 2017 19:52:32 +0000 (12:52 -0700)]
Merge tag 'mtd/fixes-for-4.14-rc3' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Boris Brezillon:
- Fix partition alignment check in mtdcore.c
- Fix a buffer overflow in the Atmel NAND driver
* tag 'mtd/fixes-for-4.14-rc3' of git://git.infradead.org/linux-mtd:
mtd: nand: atmel: fix buffer overflow in atmel_pmecc_user
mtd: Fix partition alignment check on multi-erasesize devices
Linus Torvalds [Sat, 30 Sep 2017 19:50:56 +0000 (12:50 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Eight mostly minor fixes for recently discovered issues in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ILLEGAL REQUEST + ASC==27 => target failure
scsi: aacraid: Add a small delay after IOP reset
scsi: scsi_transport_fc: Also check for NOTPRESENT in fc_remote_port_add()
scsi: scsi_transport_fc: set scsi_target_id upon rescan
scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
scsi: aacraid: error: testing array offset 'bus' after use
scsi: lpfc: Don't return internal MBXERR_ERROR code from probe function
scsi: aacraid: Fix 2T+ drives on SmartIOC-2000
Linus Torvalds [Sat, 30 Sep 2017 02:35:41 +0000 (19:35 -0700)]
Merge tag 'platform-drivers-x86-v4.14-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform drivers fix from Darren Hart:
"Newly discovered species of fujitsu laptops break some assumptions
about ACPI device pairings.
fujitsu-laptop: Don't oops when FUJ02E3 is not present"
* tag 'platform-drivers-x86-v4.14-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: fujitsu-laptop: Don't oops when FUJ02E3 is not presnt
Linus Torvalds [Sat, 30 Sep 2017 02:33:32 +0000 (19:33 -0700)]
Merge tag 'led_fixes-4.14-rc3' of git://git./linux/kernel/git/j.anaszewski/linux-leds
Pull LED fixes from Jacek Anaszewski:
"Four fixes for the as3645a LED flash controller and one update to
MAINTAINERS"
* tag 'led_fixes-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
MAINTAINERS: Add entry for MediaTek PMIC LED driver
as3645a: Unregister indicator LED on device unbind
as3645a: Use integer numbers for parsing LEDs
dt: bindings: as3645a: Use LED number to refer to LEDs
as3645a: Use ams,input-max-microamp as documented in DT bindings
Linus Torvalds [Fri, 29 Sep 2017 19:59:59 +0000 (12:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull waitid fix from Al Viro:
"Fix infoleak in waitid()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix infoleak in waitid(2)
Linus Torvalds [Fri, 29 Sep 2017 19:57:35 +0000 (12:57 -0700)]
Merge branch 'for-4.14-rc3' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We've collected a bunch of isolated fixes, for crashes, user-visible
behaviour or missing bits from other subsystem cleanups from the past.
The overall number is not small but I was not able to make it
significantly smaller. Most of the patches are supposed to go to
stable"
* 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: log csums for all modified extents
Btrfs: fix unexpected result when dio reading corrupted blocks
btrfs: Report error on removing qgroup if del_qgroup_item fails
Btrfs: skip checksum when reading compressed data if some IO have failed
Btrfs: fix kernel oops while reading compressed data
Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block
Btrfs: do not backup tree roots when fsync
btrfs: remove BTRFS_FS_QUOTA_DISABLING flag
btrfs: propagate error to btrfs_cmp_data_prepare caller
btrfs: prevent to set invalid default subvolid
Btrfs: send: fix error number for unknown inode types
btrfs: fix NULL pointer dereference from free_reloc_roots()
btrfs: finish ordered extent cleaning if no progress is found
btrfs: clear ordered flag on cleaning up ordered extents
Btrfs: fix incorrect {node,sector}size endianness from BTRFS_IOC_FS_INFO
Btrfs: do not reset bio->bi_ops while writing bio
Btrfs: use the new helper wbc_to_write_flags
Linus Torvalds [Fri, 29 Sep 2017 19:55:33 +0000 (12:55 -0700)]
Merge tag 'md/4.14-rc3' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"A few fixes for MD. Mainly fix a problem introduced in 4.13, which we
retry bio for some code paths but not all in some situations"
* tag 'md/4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/raid5: cap worker count
dm-raid: fix a race condition in request handling
md: fix a race condition for flush request handling
md: separate request handling
Linus Torvalds [Fri, 29 Sep 2017 19:46:13 +0000 (12:46 -0700)]
Merge tag 'pci-v4.14-fixes-3' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- fix CONFIG_PCI=n build error (introduced in v4.14-rc1) (Geert
Uytterhoeven)
- fix a race in sysfs driver_override store/show (Nicolai Stange)
* tag 'pci-v4.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Fix race condition with driver_override
PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n build
Linus Torvalds [Fri, 29 Sep 2017 19:43:36 +0000 (12:43 -0700)]
Merge tag 'drm-fixes-for-v4.14-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Regular fixes pull, some amdkfd, amdgpu, etnaviv, sun4i, qxl, tegra
fixes.
I've got an outstanding pull for i915 but it wasn't on an rc2 base so
I wanted to ship these out first, I might get to it before rc3 or I
might not"
* tag 'drm-fixes-for-v4.14-rc3' of git://people.freedesktop.org/~airlied/linux:
drm/tegra: trace: Fix path to include
qxl: fix framebuffer unpinning
drm/sun4i: cec: Enable back CEC-pin framework
drm/amdkfd: Print event limit messages only once per process
drm/amdkfd: Fix kernel-queue wrapping bugs
drm/amdkfd: Fix incorrect destroy_mqd parameter
drm/radeon: disable hard reset in hibernate for APUs
drm/amdgpu: revert tile table update for oland
etnaviv: fix gem object list corruption
etnaviv: fix submit error path
qxl: fix primary surface handling
drm/amdkfd: check for null dev to avoid a null pointer dereference
Linus Torvalds [Fri, 29 Sep 2017 19:37:07 +0000 (12:37 -0700)]
Merge tag 'iommu-fixes-v4.14-rc2' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- A comment fix for 'struct iommu_ops'
- Format string fixes for AMD IOMMU, unfortunatly I missed that during
review.
- Limit mediatek physical addresses to 32 bit for v7s to fix a warning
triggered in io-page-table code.
- Fix dma-sync in io-pgtable-arm-v7s code
* tag 'iommu-fixes-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix comment for iommu_ops.map_sg
iommu/amd: pr_err() strings should end with newlines
iommu/mediatek: Limit the physical address in 32bit for v7s
iommu/io-pgtable-arm-v7s: Need dma-sync while there is no QUIRK_NO_DMA
Linus Torvalds [Fri, 29 Sep 2017 19:31:35 +0000 (12:31 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- SPsel register initialisation on reset as the architecture defines
its state as unknown
- Use READ_ONCE when dereferencing pmd_t pointers to avoid race
conditions in page_vma_mapped_walk() (or fast GUP) with concurrent
modifications of the page table
- Avoid invoking the mm fault handling code for kernel addresses (check
against TASK_SIZE) which would otherwise result in calling
might_sleep() in atomic context
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fault: Route pte translation faults via do_translation_fault
arm64: mm: Use READ_ONCE when dereferencing pointer to pte table
arm64: Make sure SPsel is always set
Linus Torvalds [Fri, 29 Sep 2017 19:24:28 +0000 (12:24 -0700)]
Merge tag 'for-linus-4.14c-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- avoid a warning when compiling with clang
- consider read-only bits in xen-pciback when writing to a BAR
- fix a boot crash of pv-domains
* tag 'for-linus-4.14c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping
xen-pciback: relax BAR sizing write value check
x86/xen: clean up clang build warning
Linus Torvalds [Fri, 29 Sep 2017 19:18:55 +0000 (12:18 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Mixed bugfixes. Perhaps the most interesting one is a latent bug that
was finally triggered by PCID support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm/x86: Handle async PF in RCU read-side critical sections
KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume
KVM: VMX: use cmpxchg64
KVM: VMX: simplify and fix vmx_vcpu_pi_load
KVM: VMX: avoid double list add with VT-d posted interrupts
KVM: VMX: extract __pi_post_block
KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception
KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
Al Viro [Fri, 29 Sep 2017 17:43:15 +0000 (13:43 -0400)]
fix infoleak in waitid(2)
kernel_waitid() can return a PID, an error or 0. rusage is filled in the first
case and waitid(2) rusage should've been copied out exactly in that case, *not*
whenever kernel_waitid() has not returned an error. Compat variant shares that
braino; none of kernel_wait4() callers do, so the below ought to fix it.
Reported-and-tested-by: Alexander Potapenko <glider@google.com>
Fixes:
ce72a16fa705 ("wait4(2)/waitid(2): separate copying rusage to userland")
Cc: stable@vger.kernel.org # v4.13
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Andrey Ryabinin [Fri, 29 Sep 2017 14:15:36 +0000 (17:15 +0300)]
x86/asm: Use register variable to get stack pointer value
Currently we use current_stack_pointer() function to get the value
of the stack pointer register. Since commit:
f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
... we have a stack register variable declared. It can be used instead of
current_stack_pointer() function which allows to optimize away some
excessive "mov %rsp, %<dst>" instructions:
-mov %rsp,%rdx
-sub %rdx,%rax
-cmp $0x3fff,%rax
-ja
ffffffff810722fd <ist_begin_non_atomic+0x2d>
+sub %rsp,%rax
+cmp $0x3fff,%rax
+ja
ffffffff810722fa <ist_begin_non_atomic+0x2a>
Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer
and use it instead of the removed function.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tom Lendacky [Fri, 29 Sep 2017 16:24:19 +0000 (11:24 -0500)]
x86/mm: Disable branch profiling in mem_encrypt.c
Some routines in mem_encrypt.c are called very early in the boot process,
e.g. sme_encrypt_kernel(). When CONFIG_TRACE_BRANCH_PROFILING=y is defined
the resulting branch profiling associated with the check to see if SME is
active results in a kernel crash. Disable branch profiling for
mem_encrypt.c by defining DISABLE_BRANCH_PROFILING before including any
header files.
Reported-by: kernel test robot <lkp@01.org>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170929162419.6016.53390.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Fri, 29 Sep 2017 17:31:46 +0000 (19:31 +0200)]
Merge tag 'perf-urgent-for-mingo-4.14-
20170928' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix syscalltbl build failure (Akemi Yagi)
- Fix attr.exclude_kernel setting for default cycles:p, this time for
!root with kernel.perf_event_paranoid = -1 (Arnaldo Carvalho de Melo)
- Sync kernel ABI headers with tooling headers (Ingo Molnar)
- Remove misleading debug messages with --call-graph option (Mengting Zhang)
- Revert vmlinux symbol resolution patches for s390x (Thomas Richter)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Fri, 29 Sep 2017 17:26:35 +0000 (10:26 -0700)]
Merge branch 'fixes-v4.14-rc3' of git://git./linux/kernel/git/jmorris/linux-security
Pull keys fixes from James Morris:
"Notable here is a rewrite of big_key crypto by Jason Donenfeld to
address some issues in the original code.
From Jason's commit log:
"This started out as just replacing the use of crypto/rng with
get_random_bytes_wait, so that we wouldn't use bad randomness at
boot time. But, upon looking further, it appears that there were
even deeper underlying cryptographic problems, and that this seems
to have been committed with very little crypto review. So, I rewrote
the whole thing, trying to keep to the conventions introduced by the
previous author, to fix these cryptographic flaws."
There has been positive review of the new code by Eric Biggers and
Herbert Xu, and it passes basic testing via the keyutils test suite.
Eric also manually tested it.
Generally speaking, we likely need to improve the amount of crypto
review for kernel crypto users including keys (I'll post a note
separately to ksummit-discuss)"
* 'fixes-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
security/keys: rewrite all of big_key crypto
security/keys: properly zero out sensitive key material in big_key
KEYS: use kmemdup() in request_key_auth_new()
KEYS: restrict /proc/keys by credentials at open time
KEYS: reset parent each time before searching key_user_tree
KEYS: prevent KEYCTL_READ on negative key
KEYS: prevent creating a different user's keyrings
KEYS: fix writing past end of user-supplied buffer in keyring_read()
KEYS: fix key refcount leak in keyctl_read_key()
KEYS: fix key refcount leak in keyctl_assume_authority()
KEYS: don't revoke uninstantiated key in request_key_auth_new()
KEYS: fix cred refcount leak in request_key_auth_new()
Will Deacon [Fri, 29 Sep 2017 11:27:41 +0000 (12:27 +0100)]
arm64: fault: Route pte translation faults via do_translation_fault
We currently route pte translation faults via do_page_fault, which elides
the address check against TASK_SIZE before invoking the mm fault handling
code. However, this can cause issues with the path walking code in
conjunction with our word-at-a-time implementation because
load_unaligned_zeropad can end up faulting in kernel space if it reads
across a page boundary and runs into a page fault (e.g. by attempting to
read from a guard region).
In the case of such a fault, load_unaligned_zeropad has registered a
fixup to shift the valid data and pad with zeroes, however the abort is
reported as a level 3 translation fault and we dispatch it straight to
do_page_fault, despite it being a kernel address. This results in calling
a sleeping function from atomic context:
BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313
in_atomic(): 0, irqs_disabled(): 0, pid: 10290
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[...]
[<
ffffff8e016cd0cc>] ___might_sleep+0x134/0x144
[<
ffffff8e016cd158>] __might_sleep+0x7c/0x8c
[<
ffffff8e016977f0>] do_page_fault+0x140/0x330
[<
ffffff8e01681328>] do_mem_abort+0x54/0xb0
Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0)
[...]
[<
ffffff8e016844fc>] el1_da+0x18/0x78
[<
ffffff8e017f399c>] path_parentat+0x44/0x88
[<
ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8
[<
ffffff8e017f5044>] filename_create+0x4c/0x128
[<
ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8
[<
ffffff8e01684e30>] el0_svc_naked+0x24/0x28
Code:
36380080 d5384100 f9400800 9402566d (
d4210000)
---[ end trace
2d01889f2bca9b9f ]---
Fix this by dispatching all translation faults to do_translation_faults,
which avoids invoking the page fault logic for faults on kernel addresses.
Cc: <stable@vger.kernel.org>
Reported-by: Ankit Jain <ankijain@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Fri, 29 Sep 2017 10:29:55 +0000 (11:29 +0100)]
arm64: mm: Use READ_ONCE when dereferencing pointer to pte table
On kernels built with support for transparent huge pages, different CPUs
can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
and they must take care to use READ_ONCE to avoid value tearing or caching
of stale values by the compiler. Unfortunately, these functions call into
our pgtable macros, which don't use READ_ONCE, and compiler caching has
been observed to cause the following crash during ext4 writeback:
PC is at check_pte+0x20/0x170
LR is at page_vma_mapped_walk+0x2e0/0x540
[...]
Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
Call trace:
[<
ffff000008233328>] check_pte+0x20/0x170
[<
ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540
[<
ffff000008234adc>] page_mkclean_one+0xac/0x278
[<
ffff000008234d98>] rmap_walk_file+0xf0/0x238
[<
ffff000008236e74>] rmap_walk+0x64/0xa0
[<
ffff0000082370c8>] page_mkclean+0x90/0xa8
[<
ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8
[<
ffff00000832f984>] mpage_submit_page+0x34/0x98
[<
ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170
[<
ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8
[<
ffff00000833530c>] ext4_writepages+0x484/0xe30
[<
ffff0000081f6ab4>] do_writepages+0x44/0xe8
[<
ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110
[<
ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8
[<
ffff000008324310>] ext4_sync_file+0x80/0x4b8
[<
ffff0000082bd434>] vfs_fsync_range+0x64/0xc0
[<
ffff0000082332b4>] SyS_msync+0x194/0x1e8
This is because page_vma_mapped_walk loads the PMD twice before calling
pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
(where it sees a valid table pointer due to a concurrent pmd_populate).
However, the compiler inlines everything and caches the first value in
a register, which is subsequently used in pte_offset_phys which returns
a junk pointer that is later dereferenced when attempting to access the
relevant pte.
This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
that a stale value is not used. Whilst this is a point fix for a known
failure (and simple to backport), a full fix moving all of our page table
accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
page_vma_mapped_walk is in the works for a future kernel release.
Cc: Jon Masters <jcm@redhat.com>
Cc: Timur Tabi <timur@codeaurora.org>
Cc: <stable@vger.kernel.org>
Fixes:
f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Boqun Feng [Fri, 29 Sep 2017 11:01:45 +0000 (19:01 +0800)]
kvm/x86: Handle async PF in RCU read-side critical sections
Sasha Levin reported a WARNING:
| WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
| rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
| WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
| rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
...
| CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-
20170908+ #246
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
| 1.10.1-1ubuntu1 04/01/2014
| Call Trace:
...
| RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
| RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
| RSP: 0018:
ffff88003b2debc8 EFLAGS:
00010002
| RAX:
0000000000000001 RBX:
1ffff1000765bd85 RCX:
0000000000000000
| RDX:
1ffff100075d7882 RSI:
ffffffffb5c7da20 RDI:
ffff88003aebc410
| RBP:
ffff88003b2def30 R08:
dffffc0000000000 R09:
0000000000000001
| R10:
0000000000000000 R11:
0000000000000000 R12:
ffff88003b2def08
| R13:
0000000000000000 R14:
ffff88003aebc040 R15:
ffff88003aebc040
| __schedule+0x201/0x2240 kernel/sched/core.c:3292
| schedule+0x113/0x460 kernel/sched/core.c:3421
| kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
| do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
| async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
| RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
| RSP: 0018:
ffff88003b2df520 EFLAGS:
00010283
| RAX:
000000000000003f RBX:
ffffffffb5d1e141 RCX:
ffff88003b2df670
| RDX:
0000000000000001 RSI:
dffffc0000000000 RDI:
ffffffffb5d1e140
| RBP:
ffff88003b2df560 R08:
dffffc0000000000 R09:
0000000000000000
| R10:
ffff88003b2df718 R11:
0000000000000000 R12:
ffff88003b2df5d8
| R13:
0000000000000064 R14:
ffffffffb5d1e140 R15:
0000000000000000
| vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
| sprintf+0xbe/0xf0 lib/vsprintf.c:2386
| proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
| get_link fs/namei.c:1047 [inline]
| link_path_walk+0x1041/0x1490 fs/namei.c:2127
...
This happened when the host hit a page fault, and delivered it as in an
async page fault, while the guest was in an RCU read-side critical
section. The guest then tries to reschedule in kvm_async_pf_task_wait(),
but rcu_preempt_note_context_switch() would treat the reschedule as a
sleep in RCU read-side critical section, which is not allowed (even in
preemptible RCU). Thus the WARN.
To cure this, make kvm_async_pf_task_wait() go to the halt path if the
PF happens in a RCU read-side critical section.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Fri, 29 Sep 2017 01:16:44 +0000 (18:16 -0700)]
KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume
------------[ cut here ]------------
WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G W OE 4.13.0+ #17
RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
Call Trace:
? emulator_read_emulated+0x15/0x20 [kvm]
? segmented_read+0xae/0xf0 [kvm]
vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
x86_emulate_instruction+0x733/0x810 [kvm]
vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
? kvm_arch_vcpu_load+0x62/0x230 [kvm]
kvm_vcpu_ioctl+0x340/0x700 [kvm]
? kvm_vcpu_ioctl+0x340/0x700 [kvm]
? __fget+0xfc/0x210
do_vfs_ioctl+0xa4/0x6a0
? __fget+0x11d/0x210
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc2
A nested #PF is triggered during L0 emulating instruction for L2. However, it
doesn't consider we should not break L1's vmlauch/vmresme. This patch fixes
it by queuing the #PF exception instead ,requesting an immediate VM exit from
L2 and keeping the exception for L1 pending for a subsequent nested VM exit.
This should actually work all the time, making vmx_inject_page_fault_nested
totally unnecessary. However, that's not working yet, so this patch can work
around the issue in the meanwhile.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ethan Zhao [Mon, 4 Sep 2017 05:59:34 +0000 (13:59 +0800)]
sched/sysctl: Check user input value of sysctl_sched_time_avg
System will hang if user set sysctl_sched_time_avg to 0:
[root@XXX ~]# sysctl kernel.sched_time_avg_ms=0
Stack traceback for pid 0
0xffff883f6406c600 0 0 1 3 R 0xffff883f6406cf50 *swapper/3
ffff883f7ccc3ae8 0000000000000018 ffffffff810c4dd0 0000000000000000
0000000000017800 ffff883f7ccc3d78 0000000000000003 ffff883f7ccc3bf8
ffffffff810c4fc9 ffff883f7ccc3c08 00000000810c5043 ffff883f7ccc3c08
Call Trace:
<IRQ> [<
ffffffff810c4dd0>] ? update_group_capacity+0x110/0x200
[<
ffffffff810c4fc9>] ? update_sd_lb_stats+0x109/0x600
[<
ffffffff810c5507>] ? find_busiest_group+0x47/0x530
[<
ffffffff810c5b84>] ? load_balance+0x194/0x900
[<
ffffffff810ad5ca>] ? update_rq_clock.part.83+0x1a/0xe0
[<
ffffffff810c6d42>] ? rebalance_domains+0x152/0x290
[<
ffffffff810c6f5c>] ? run_rebalance_domains+0xdc/0x1d0
[<
ffffffff8108a75b>] ? __do_softirq+0xfb/0x320
[<
ffffffff8108ac85>] ? irq_exit+0x125/0x130
[<
ffffffff810b3a17>] ? scheduler_ipi+0x97/0x160
[<
ffffffff81052709>] ? smp_reschedule_interrupt+0x29/0x30
[<
ffffffff8173a1be>] ? reschedule_interrupt+0x6e/0x80
<EOI> [<
ffffffff815bc83c>] ? cpuidle_enter_state+0xcc/0x230
[<
ffffffff815bc80c>] ? cpuidle_enter_state+0x9c/0x230
[<
ffffffff815bc9d7>] ? cpuidle_enter+0x17/0x20
[<
ffffffff810cd6dc>] ? cpu_startup_entry+0x38c/0x420
[<
ffffffff81053373>] ? start_secondary+0x173/0x1e0
Because divide-by-zero error happens in function:
update_group_capacity()
update_cpu_capacity()
scale_rt_capacity()
{
...
total = sched_avg_period() + delta;
used = div_u64(avg, total);
...
}
To fix this issue, check user input value of sysctl_sched_time_avg, keep
it unchanged when hitting invalid input, and set the minimum limit of
sysctl_sched_time_avg to 1 ms.
Reported-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: ethan.kernel@gmail.com
Cc: keescook@chromium.org
Cc: mcgrof@kernel.org
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1504504774-18253-1-git-send-email-ethan.zhao@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Thu, 28 Sep 2017 21:58:26 +0000 (16:58 -0500)]
x86/asm: Fix inline asm call constraints for GCC 4.4
The kernel test bot (run by Xiaolong Ye) reported that the following commit:
f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
is causing double faults in a kernel compiled with GCC 4.4.
Linus subsequently diagnosed the crash pattern and the buggy commit and found that
the issue is with this code:
register unsigned int __asm_call_sp asm("esp");
#define ASM_CALL_CONSTRAINT "+r" (__asm_call_sp)
Even on a 64-bit kernel, it's using ESP instead of RSP. That causes GCC
to produce the following bogus code:
ffffffff8147461d: 89 e0 mov %esp,%eax
ffffffff8147461f: 4c 89 f7 mov %r14,%rdi
ffffffff81474622: 4c 89 fe mov %r15,%rsi
ffffffff81474625: ba 20 00 00 00 mov $0x20,%edx
ffffffff8147462a: 89 c4 mov %eax,%esp
ffffffff8147462c: e8 bf 52 05 00 callq
ffffffff814c98f0 <copy_user_generic_unrolled>
Despite the absurdity of it backing up and restoring the stack pointer
for no reason, the bug is actually the fact that it's only backing up
and restoring the lower 32 bits of the stack pointer. The upper 32 bits
are getting cleared out, corrupting the stack pointer.
So change the '__asm_call_sp' register variable to be associated with
the actual full-size stack pointer.
This also requires changing the __ASM_SEL() macro to be based on the
actual compiled arch size, rather than the CONFIG value, because
CONFIG_X86_64 compiles some files with '-m32' (e.g., realmode and vdso).
Otherwise Clang fails to build the kernel because it complains about the
use of a 64-bit register (RSP) in a 32-bit file.
Reported-and-Bisected-and-Tested-by: kernel test robot <xiaolong.ye@intel.com>
Diagnosed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
Link: http://lkml.kernel.org/r/20170928215826.6sdpmwtkiydiytim@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 22 Sep 2017 16:37:28 +0000 (18:37 +0200)]
sched/debug: Add explicit TASK_PARKED printing
Currently TASK_PARKED is masqueraded as TASK_INTERRUPTIBLE, give it
its own print state because it will not in fact get woken by regular
wakeups and is a long-term state.
This requires moving TASK_PARKED into the TASK_REPORT mask, and since
that latter needs to be a contiguous bitmask, we need to shuffle the
bits around a bit.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 22 Sep 2017 16:32:41 +0000 (18:32 +0200)]
sched/debug: Ignore TASK_IDLE for SysRq-W
Markus reported that tasks in TASK_IDLE state are reported by SysRq-W,
which results in undesirable clutter.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 22 Sep 2017 16:30:40 +0000 (18:30 +0200)]
sched/debug: Add explicit TASK_IDLE printing
Markus reported that kthreads that idle using TASK_IDLE instead of
TASK_INTERRUPTIBLE are reported in as TASK_UNINTERRUPTIBLE and things
like htop mark those red.
This is undesirable, so add an explicit state for TASK_IDLE.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 22 Sep 2017 16:23:31 +0000 (18:23 +0200)]
sched/tracing: Use common task-state helpers
Remove yet another task-state char instance.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Prateek Sood [Thu, 7 Sep 2017 14:30:58 +0000 (20:00 +0530)]
locking/rwsem-xadd: Fix missed wakeup due to reordering of load
If a spinner is present, there is a chance that the load of
rwsem_has_spinner() in rwsem_wake() can be reordered with
respect to decrement of rwsem count in __up_write() leading
to wakeup being missed:
spinning writer up_write caller
--------------- -----------------------
[S] osq_unlock() [L] osq
spin_lock(wait_lock)
sem->count=0xFFFFFFFF00000001
+0xFFFFFFFF00000000
count=sem->count
MB
sem->count=0xFFFFFFFE00000001
-0xFFFFFFFF00000001
spin_trylock(wait_lock)
return
rwsem_try_write_lock(count)
spin_unlock(wait_lock)
schedule()
Reordering of atomic_long_sub_return_release() in __up_write()
and rwsem_has_spinner() in rwsem_wake() can cause missing of
wakeup in up_write() context. In spinning writer, sem->count
and local variable count is 0XFFFFFFFE00000001. It would result
in rwsem_try_write_lock() failing to acquire rwsem and spinning
writer going to sleep in rwsem_down_write_failed().
The smp_rmb() will make sure that the spinner state is
consulted after sem->count is updated in up_write context.
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: longman@redhat.com
Cc: parri.andrea@gmail.com
Cc: sramana@codeaurora.org
Link: http://lkml.kernel.org/r/1504794658-15397-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 22 Sep 2017 16:19:53 +0000 (18:19 +0200)]
sched/tracing: Fix trace_sched_switch task-state printing
Convert trace_sched_switch to use the common task-state helpers and
fix the "X" and "Z" order, possibly they ended up in the wrong order
because TASK_REPORT has them in the wrong order too.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 22 Sep 2017 16:14:08 +0000 (18:14 +0200)]
sched/debug: Remove unused variable
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 22 Sep 2017 16:13:36 +0000 (18:13 +0200)]
sched/debug: Convert TASK_state to hex
Bit patterns are easier in hex.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 22 Sep 2017 16:09:26 +0000 (18:09 +0200)]
sched/debug: Implement consistent task-state printing
Currently get_task_state() and task_state_to_char() report different
states, create a number of common helpers and unify the reported state
space.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Thomas Gleixner [Fri, 29 Sep 2017 08:07:44 +0000 (10:07 +0200)]
um/time: Fixup namespace collision
The new timer_setup() function for struct timer_list collides with a
private um function. Rename it.
Fixes:
686fef928bba ("timer: Prepare to change timer callback argument type")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Kees Cook <keescook@chromium.org>
Alexander Shishkin [Wed, 6 Sep 2017 16:08:11 +0000 (19:08 +0300)]
perf/aux: Only update ->aux_wakeup in non-overwrite mode
The following commit:
d9a50b0256 ("perf/aux: Ensure aux_wakeup represents most recent wakeup index")
changed the AUX wakeup position calculation to rounddown(), which causes
a division-by-zero in AUX overwrite mode (aka "snapshot mode").
The zero denominator results from the fact that perf record doesn't set
aux_watermark to anything, in which case the kernel will set it to half
the AUX buffer size, but only for non-overwrite mode. In the overwrite
mode aux_watermark stays zero.
The good news is that, AUX overwrite mode, wakeups don't happen and
related bookkeeping is not relevant, so we can simply forego the whole
wakeup updates.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/20170906160811.16510-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>