Juergen Gross [Fri, 4 Sep 2015 12:05:51 +0000 (14:05 +0200)]
xen: switch extra memory accounting to use pfns
Instead of using physical addresses for accounting of extra memory
areas available for ballooning switch to pfns as this is much less
error prone regarding partial pages.
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 4 Sep 2015 12:18:08 +0000 (14:18 +0200)]
xen: limit memory to architectural maximum
When a pv-domain (including dom0) is started it tries to size it's
p2m list according to the maximum possible memory amount it ever can
achieve. Limit the initial maximum memory size to the architectural
limit of the hardware in order to avoid overflows during remapping
of memory.
This problem will occur when dom0 is started with an initial memory
size being a multiple of 1GB, but without specifying it's maximum
memory size. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen.
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Wed, 19 Aug 2015 16:53:11 +0000 (18:53 +0200)]
xen: avoid another early crash of memory limited dom0
Commit
b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory occurring
only on some hardware.
The problem arises in case dom0 is started with initial memory and
maximum memory being the same. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen. If all
of this is true and the E820 map of the machine is sparse (some areas
are not covered) then the machine might crash early in the boot
process.
An example E820 map triggering the problem looks like this:
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cf7fafff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cf7fb000-0x00000000cf95ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cf960000-0x00000000cfb62fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfb63000-0x00000000cfd14fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cfd15000-0x00000000cfd61fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfd62000-0x00000000cfd6cfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000cfd6d000-0x00000000cfd6ffff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfd70000-0x00000000cfd70fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cfd71000-0x00000000cfea8fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfea9000-0x00000000cfeb9fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfeba000-0x00000000cfecafff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfecb000-0x00000000cfecbfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfecc000-0x00000000cfedbfff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfedc000-0x00000000cfedcfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfedd000-0x00000000cfeddfff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfede000-0x00000000cfee3fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfee4000-0x00000000cfef6fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000cfef7000-0x00000000cfefffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000020effffff] usable
In this case the area a0000-dffff isn't present in the map. This will
confuse the memory setup of the domain when remapping the memory from
such holes to populated areas.
To avoid the problem the accounting of to be remapped memory has to
count such holes in the E820 map as well.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Wed, 19 Aug 2015 16:52:34 +0000 (18:52 +0200)]
xen: avoid early crash of memory limited dom0
Commit
b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory.
The problem arises in case dom0 is started with initial memory and
maximum memory being the same and exactly a multiple of 1 GB. The
kernel must be configured without CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
for the problem to happen. In this case it will crash very early
during boot due to the virtual mapped p2m list not being large
enough to be able to remap any memory:
(XEN) Freed 304kB init memory.
mapping kernel into physical memory
about to get started...
(XEN) traps.c:459:d0v0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at
ffff82d080229a93 create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.5.2-pre x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<
ffffffff81d120cb>]
(XEN) RFLAGS:
0000000000000206 EM: 1 CONTEXT: pv guest (d0v0)
(XEN) rax:
ffffffff81db2000 rbx:
000000004d000000 rcx:
0000000000000000
(XEN) rdx:
000000004d000000 rsi:
0000000000063000 rdi:
000000004d063000
(XEN) rbp:
ffffffff81c03d78 rsp:
ffffffff81c03d28 r8:
0000000000023000
(XEN) r9:
00000001040ff000 r10:
0000000000007ff0 r11:
0000000000000000
(XEN) r12:
0000000000063000 r13:
000000000004d000 r14:
0000000000000063
(XEN) r15:
0000000000000063 cr0:
0000000080050033 cr4:
00000000000006f0
(XEN) cr3:
0000000105c0f000 cr2:
ffffc90000268000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=
ffffffff81c03d28:
(XEN)
0000000000000000 0000000000000000 ffffffff81d120cb 000000010000e030
(XEN)
0000000000010006 ffffffff81c03d68 000000000000e02b ffffffffffffffff
(XEN)
0000000000000063 000000000004d063 ffffffff81c03de8 ffffffff81d130a7
(XEN)
ffffffff81c03de8 000000000004d000 00000001040ff000 0000000000105db1
(XEN)
00000001040ff001 000000000004d062 ffff8800092d6ff8 0000000002027000
(XEN)
ffff8800094d8340 ffff8800092d6ff8 00003ffffffff000 ffff8800092d7ff8
(XEN)
ffffffff81c03e48 ffffffff81d13c43 ffff8800094d8000 ffff8800094d9000
(XEN)
0000000000000000 ffff8800092d6000 00000000092d6000 000000004cfbf000
(XEN)
00000000092d6000 00000000052d5442 0000000000000000 0000000000000000
(XEN)
ffffffff81c03ed8 ffffffff81d185c1 0000000000000000 0000000000000000
(XEN)
ffffffff81c03e78 ffffffff810f8ca4 ffffffff81c03ed8 ffffffff8171a15d
(XEN)
0000000000000010 ffffffff81c03ee8 0000000000000000 0000000000000000
(XEN)
ffffffff81f0e402 ffffffffffffffff ffffffff81dae900 0000000000000000
(XEN)
0000000000000000 0000000000000000 ffffffff81c03f28 ffffffff81d0cf0f
(XEN)
0000000000000000 0000000000000000 0000000000000000 ffffffff81db82e0
(XEN)
0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)
ffffffff81c03f38 ffffffff81d0c603 ffffffff81c03ff8 ffffffff81d11c86
(XEN)
0300000100000032 0000000000000005 0000000000000020 0000000000000000
(XEN)
0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)
0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
This can be avoided by allocating aneough space for the p2m to cover
the maximum memory of dom0 plus the identity mapped holes required
for PCI space, BIOS etc.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Julien Grall [Fri, 7 Aug 2015 16:34:34 +0000 (17:34 +0100)]
arm/xen: Remove helpers which are PV specific
ARM guests are always HVM. The current implementation is assuming a 1:1
mapping which is only true for DOM0 and may not be at all in the future.
Furthermore, all the helpers but arbitrary_virt_to_machine are used in
x86 specific code (or only compiled for).
The helper arbitrary_virt_to_machine is only used in PV specific code.
Therefore we should never call the function.
Add a BUG() in this helper and drop all the others.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:38 +0000 (16:34 -0400)]
xen/x86: Don't try to set PCE bit in CR4
Since VPMU code emulates RDPMC instruction with RDMSR and because hypervisor
does not emulate it there is no reason to try setting CR4's PCE bit (and the
hypervisor will warn on seeing it set).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:37 +0000 (16:34 -0400)]
xen/PMU: PMU emulation code
Add PMU emulation code that runs when we are processing a PMU interrupt.
This code will allow us not to trap to hypervisor on each MSR/LVTPC access
(of which there may be quite a few in the handler).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:36 +0000 (16:34 -0400)]
xen/PMU: Intercept PMU-related MSR and APIC accesses
Provide interfaces for recognizing accesses to PMU-related MSRs and
LVTPC APIC and process these accesses in Xen PMU code.
(The interrupt handler performs XENPMU_flush right away in the beginning
since no PMU emulation is available. It will be added with a later patch).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:35 +0000 (16:34 -0400)]
xen/PMU: Describe vendor-specific PMU registers
AMD and Intel PMU register initialization and helpers that determine
whether a register belongs to PMU.
This and some of subsequent PMU emulation code is somewhat similar to
Xen's PMU implementation.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:34 +0000 (16:34 -0400)]
xen/PMU: Initialization code for Xen PMU
Map shared data structure that will hold CPU registers, VPMU context,
V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills
this information in its handler and passes it to the guest for further
processing.
Set up PMU VIRQ.
Now that perf infrastructure will assume that PMU is available on a PV
guest we need to be careful and make sure that accesses via RDPMC
instruction don't cause fatal traps by the hypervisor. Provide a nop
RDPMC handler.
For the same reason avoid issuing a warning on a write to APIC's LVTPC.
Both of these will be made functional in later patches.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:33 +0000 (16:34 -0400)]
xen/PMU: Sysfs interface for setting Xen PMU mode
Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Boris Ostrovsky [Mon, 10 Aug 2015 20:34:32 +0000 (16:34 -0400)]
xen: xensyms support
Export Xen symbols to dom0 via /proc/xen/xensyms (similar to
/proc/kallsyms).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:37 +0000 (06:51 +0200)]
xen: remove no longer needed p2m.h
Cleanup by removing arch/x86/xen/p2m.h as it isn't needed any more.
Most definitions in this file are used in p2m.c only. Move those into
p2m.c.
set_phys_range_identity() is already declared in
arch/x86/include/asm/xen/page.h, add __init annotation there.
MAX_REMAP_RANGES isn't used at all, just delete it.
The only define left is P2M_PER_PAGE which is moved to page.h as well.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:36 +0000 (06:51 +0200)]
xen: allow more than 512 GB of RAM for 64 bit pv-domains
64 bit pv-domains under Xen are limited to 512 GB of RAM today. The
main reason has been the 3 level p2m tree, which was replaced by the
virtual mapped linear p2m list. Parallel to the p2m list which is
being used by the kernel itself there is a 3 level mfn tree for usage
by the Xen tools and eventually for crash dump analysis. For this tree
the linear p2m list can serve as a replacement, too. As the kernel
can't know whether the tools are capable of dealing with the p2m list
instead of the mfn tree, the limit of 512 GB can't be dropped in all
cases.
This patch replaces the hard limit by a kernel parameter which tells
the kernel to obey the 512 GB limit or not. The default is selected by
a configuration parameter which specifies whether the 512 GB limit
should be active per default for domUs (domain save/restore/migration
and crash dump analysis are affected).
Memory above the domain limit is returned to the hypervisor instead of
being identity mapped, which was wrong anyway.
The kernel configuration parameter to specify the maximum size of a
domain can be deleted, as it is not relevant any more.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:35 +0000 (06:51 +0200)]
xen: move p2m list if conflicting with e820 map
Check whether the hypervisor supplied p2m list is placed at a location
which is conflicting with the target E820 map. If this is the case
relocate it to a new area unused up to now and compliant to the E820
map.
As the p2m list might by huge (up to several GB) and is required to be
mapped virtually, set up a temporary mapping for the copied list.
For pvh domains just delete the p2m related information from start
info instead of reserving the p2m memory, as we don't need it at all.
For 32 bit kernels adjust the memblock_reserve() parameters in order
to cover the page tables only. This requires to memblock_reserve() the
start_info page on it's own.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:34 +0000 (06:51 +0200)]
xen: add explicit memblock_reserve() calls for special pages
Some special pages containing interfaces to xen are being reserved
implicitly only today. The memblock_reserve() call to reserve them is
meant to reserve the p2m list supplied by xen. It is just reserving
not only the p2m list itself, but some more pages up to the start of
the xen built page tables.
To be able to move the p2m list to another pfn range, which is needed
for support of huge RAM, this memblock_reserve() must be split up to
cover all affected reserved pages explicitly.
The affected pages are:
- start_info page
- xenstore ring (might be missing, mfn is 0 in this case)
- console ring (not for initial domain)
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:33 +0000 (06:51 +0200)]
mm: provide early_memremap_ro to establish read-only mapping
During early boot as Xen pv domain the kernel needs to map some page
tables supplied by the hypervisor read only. This is needed to be
able to relocate some data structures conflicting with the physical
memory map especially on systems with huge RAM (above 512GB).
Provide the function early_memremap_ro() to provide this read only
mapping.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:32 +0000 (06:51 +0200)]
xen: check for initrd conflicting with e820 map
Check whether the initrd is placed at a location which is conflicting
with the target E820 map. If this is the case relocate it to a new
area unused up to now and compliant to the E820 map.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:31 +0000 (06:51 +0200)]
xen: check pre-allocated page tables for conflict with memory map
Check whether the page tables built by the domain builder are at
memory addresses which are in conflict with the target memory map.
If this is the case just panic instead of running into problems
later.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:30 +0000 (06:51 +0200)]
xen: check for kernel memory conflicting with memory layout
Checks whether the pre-allocated memory of the loaded kernel is in
conflict with the target memory map. If this is the case, just panic
instead of run into problems later, as there is nothing we can do
to repair this situation.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:29 +0000 (06:51 +0200)]
xen: find unused contiguous memory area
For being able to relocate pre-allocated data areas like initrd or
p2m list it is mandatory to find a contiguous memory area which is
not yet in use and doesn't conflict with the memory map we want to
be in effect.
In case such an area is found reserve it at once as this will be
required to be done in any case.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:28 +0000 (06:51 +0200)]
xen: check memory area against e820 map
Provide a service routine to check a physical memory area against the
E820 map. The routine will return false if the complete area is RAM
according to the E820 map and true otherwise.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:27 +0000 (06:51 +0200)]
xen: split counting of extra memory pages from remapping
Memory pages in the initial memory setup done by the Xen hypervisor
conflicting with the target E820 map are remapped. In order to do this
those pages are counted and remapped in xen_set_identity_and_remap().
Split the counting from the remapping operation to be able to setup
the needed memory sizes in time but doing the remap operation at a
later time. This enables us to simplify the interface to
xen_set_identity_and_remap() as the number of remapped and released
pages is no longer needed here.
Finally move the remapping further down to prepare relocating
conflicting memory contents before the memory might be clobbered by
xen_set_identity_and_remap(). This requires to not destroy the Xen
E820 map when the one for the system is being constructed.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:26 +0000 (06:51 +0200)]
xen: move static e820 map to global scope
Instead of using a function local static e820 map in xen_memory_setup()
and calling various functions in the same source with the map as a
parameter use a map directly accessible by all functions in the source.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:25 +0000 (06:51 +0200)]
xen: eliminate scalability issues from initial mapping setup
Direct Xen to place the initial P->M table outside of the initial
mapping, as otherwise the 1G (implementation) / 2G (theoretical)
restriction on the size of the initial mapping limits the amount
of memory a domain can be handed initially.
As the initial P->M table is copied rather early during boot to
domain private memory and it's initial virtual mapping is dropped,
the easiest way to avoid virtual address conflicts with other
addresses in the kernel is to use a user address area for the
virtual address of the initial P->M table. This allows us to just
throw away the page tables of the initial mapping after the copy
without having to care about address invalidation.
It should be noted that this patch won't enable a pv-domain to USE
more than 512 GB of RAM. It just enables it to be started with a
P->M table covering more memory. This is especially important for
being able to boot a Dom0 on a system with more than 512 GB memory.
Signed-off-by: Juergen Gross <jgross@suse.com>
Based-on-patch-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:24 +0000 (06:51 +0200)]
xen: don't build mfn tree if tools don't need it
In case the Xen tools indicate they don't need the p2m 3 level tree
as they support the virtual mapped linear p2m list, just omit building
the tree.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:23 +0000 (06:51 +0200)]
xen: save linear p2m list address in shared info structure
The virtual address of the linear p2m list should be stored in the
shared info structure read by the Xen tools to be able to support
64 bit pv-domains larger than 512 GB. Additionally the linear p2m
list interface includes a generation count which is changed prior
to and after each mapping change of the p2m list. Reading the
generation count the Xen tools can detect changes of the mappings
and re-read the p2m list eventually.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Fri, 17 Jul 2015 04:51:22 +0000 (06:51 +0200)]
xen: sync with xen headers
Use the newest headers from the xen tree to get some new structure
layouts.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Julien Grall [Mon, 3 Aug 2015 09:50:55 +0000 (09:50 +0000)]
arm/xen: Drop the definition of xen_pci_platform_unplug
The commit
6f6c15ef912465b3aaafe709f39bd6026a8b3e72 "xen/pvhvm: Remove
the xen_platform_pci int." makes the x86 version of
xen_pci_platform_unplug static.
Therefore we don't need anymore to define a dummy xen_pci_platform_unplug
for ARM.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Julien Grall [Tue, 28 Jul 2015 09:10:42 +0000 (10:10 +0100)]
xen/events: Support event channel rebind on ARM
Currently, the event channel rebind code is gated with the presence of
the vector callback.
The virtual interrupt controller on ARM has the concept of per-CPU
interrupt (PPI) which allow us to support per-VCPU event channel.
Therefore there is no need of vector callback for ARM.
Xen is already using a free PPI to notify the guest VCPU of an event.
Furthermore, the xen code initialization in Linux (see
arch/arm/xen/enlighten.c) is requesting correctly a per-CPU IRQ.
Introduce new helper xen_support_evtchn_rebind to allow architecture
decide whether rebind an event is support or not. It will always return
true on ARM and keep the same behavior on x86.
This is also allow us to drop the usage of xen_have_vector_callback
entirely in the ARM code.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Bob Liu [Mon, 13 Jul 2015 09:55:24 +0000 (17:55 +0800)]
xen-blkfront: convert to blk-mq APIs
Note: This patch is based on original work of Arianna's internship for
GNOME's Outreach Program for Women.
Only one hardware queue is used now, so there is no significant
performance change
The legacy non-mq code is deleted completely which is the same as other
drivers like virtio, mtip, and nvme.
Also dropped one unnecessary holding of info->io_lock when calling
blk_mq_stop_hw_queues().
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@fb.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Konstantin Khlebnikov [Wed, 15 Jul 2015 09:52:01 +0000 (12:52 +0300)]
xen/preempt: use need_resched() instead of should_resched()
This code is used only when CONFIG_PREEMPT=n and only in non-atomic
context: xen_in_preemptible_hcall is set only in
privcmd_ioctl_hypercall(). Thus preempt_count is zero and
should_resched() is equal to need_resched().
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Colin Ian King [Thu, 16 Jul 2015 19:34:42 +0000 (20:34 +0100)]
x86/xen: fix non-ANSI declaration of xen_has_pv_devices()
xen_has_pv_devices() has no parameters, so use the normal void
parameter convention to make it match the prototype in the header file
include/xen/platform_pci.h.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Linus Torvalds [Sun, 16 Aug 2015 23:34:13 +0000 (16:34 -0700)]
Linux 4.2-rc7
Linus Torvalds [Sun, 16 Aug 2015 22:44:33 +0000 (15:44 -0700)]
Merge tag 'armsoc-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A smallish batch of fixes, a little more than expected this late, but
all fixes are contained to their platforms and seem reasonably low
risk:
- a somewhat large SMP fix for ux500 that still seemed warranted to
include here
- OMAP DT fixes for pbias regulator specification that broke due to
some DT reshuffling
- PCIe IRQ routing bugfix for i.MX
- networking fixes for keystone
- runtime PM for OMAP GPMC
- a couple of error path bug fixes for exynos"
* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: keystone: Fix the mdio bindings by moving it to soc specific file
ARM: dts: keystone: fix the clock node for mdio
memory: omap-gpmc: Don't try to save uninitialized GPMC context
ARM: imx6: correct i.MX6 PCIe interrupt routing
ARM: ux500: add an SMP enablement type and move cpu nodes
ARM: dts: dra7: Fix broken pbias device creation
ARM: dts: OMAP5: Fix broken pbias device creation
ARM: dts: OMAP4: Fix broken pbias device creation
ARM: dts: omap243x: Fix broken pbias device creation
ARM: EXYNOS: fix double of_node_put() on error path
ARM: EXYNOS: Fix potentian kfree() of ro memory
Linus Torvalds [Sun, 16 Aug 2015 22:39:31 +0000 (15:39 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS bugfix from Ralf Baechle:
"Only a single MIPS fix - the math when invoking syscall_trace_enter
was wrong"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Fix seccomp syscall argument for MIPS64
Linus Torvalds [Sun, 16 Aug 2015 22:11:25 +0000 (15:11 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Merge x86 fixes from Ingo Molnar:
"Two followup fixes related to the previous LDT fix"
Also applied a further FPU emulation fix from Andy Lutomirski to the
branch before actually merging it.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
x86/ldt: Further fix FPU emulation
x86/ldt: Correct FPU emulation access to LDT
x86/ldt: Correct LDT access in single stepping logic
Andy Lutomirski [Fri, 14 Aug 2015 22:02:55 +0000 (15:02 -0700)]
x86/ldt: Further fix FPU emulation
The previous fix confused a selector with a segment prefix. Fix it.
Compile-tested only.
Cc: stable@vger.kernel.org
Cc: Juergen Gross <jgross@suse.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes:
4809146b86c3 ("x86/ldt: Correct FPU emulation access to LDT")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jann Horn [Sun, 16 Aug 2015 18:27:01 +0000 (20:27 +0200)]
fs/fuse: fix ioctl type confusion
fuse_dev_ioctl() performed fuse_get_dev() on a user-supplied fd,
leading to a type confusion issue. Fix it by checking file->f_op.
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Olof Johansson [Sun, 16 Aug 2015 19:29:57 +0000 (21:29 +0200)]
Merge tag 'keystone-dts-late-fixes-v2' of git://git./linux/kernel/git/ssantosh/linux-keystone into fixes
ARM: Couple of Keysyone MDIO DTS fixes for 4.2-rc6+
These are necessary to get the NIC card working on all Keystone
EVMs. Couple of boards are broken without these two fixes.
* tag 'keystone-dts-late-fixes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone:
ARM: dts: keystone: Fix the mdio bindings by moving it to soc specific file
ARM: dts: keystone: fix the clock node for mdio
Signed-off-by: Olof Johansson <olof@lixom.net>
Markos Chandras [Thu, 13 Aug 2015 07:47:59 +0000 (08:47 +0100)]
MIPS: Fix seccomp syscall argument for MIPS64
Commit
4c21b8fd8f14 ("MIPS: seccomp: Handle indirect system calls (o32)")
fixed indirect system calls on O32 but it also introduced a bug for MIPS64
where it erroneously modified the v0 (syscall) register with the assumption
that the sycall offset hasn't been taken into consideration. This breaks
seccomp on MIPS64 n64 and n32 ABIs. We fix this by replacing the addition
with a move instruction.
Fixes:
4c21b8fd8f14 ("MIPS: seccomp: Handle indirect system calls (o32)")
Cc: <stable@vger.kernel.org> # 3.15+
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10951/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Linus Torvalds [Sat, 15 Aug 2015 20:54:53 +0000 (13:54 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This has two libfc fixes for bugs causing rare crashes, one iscsi fix
for a potential hang on shutdown, and a fix for an I/O blocksize issue
which caused a regression"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
sd: Fix maximum I/O size for BLOCK_PC requests
libfc: Fix fc_fcp_cleanup_each_cmd()
libfc: Fix fc_exch_recv_req() error path
libiscsi: Fix host busy blocking during connection teardown
Linus Torvalds [Sat, 15 Aug 2015 00:27:52 +0000 (17:27 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Just two very small & simple patches"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST
KVM: x86: zero IDT limit on entry to SMM
Linus Torvalds [Sat, 15 Aug 2015 00:05:26 +0000 (17:05 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
Update maintainers for DRM STI driver
mm: cma: mark cma_bitmap_maxno() inline in header
zram: fix pool name truncation
memory-hotplug: fix wrong edge when hot add a new node
.mailmap: Andrey Ryabinin has moved
ipc/sem.c: update/correct memory barriers
mm/hwpoison: fix panic due to split huge zero page
ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()
ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held
mm/hwpoison: fix page refcount of unknown non LRU page
Linus Torvalds [Fri, 14 Aug 2015 23:10:04 +0000 (16:10 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clock fix from Stephen Boyd:
"A one-liner for a regression found in the PXA clock driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: pxa: pxa3xx: fix CKEN register access
Benjamin Gaignard [Fri, 14 Aug 2015 22:35:24 +0000 (15:35 -0700)]
Update maintainers for DRM STI driver
Add Vincent Abriou and myself as maintainers.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Vincent Abriou <vincent.abriou@st.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gregory Fong [Fri, 14 Aug 2015 22:35:21 +0000 (15:35 -0700)]
mm: cma: mark cma_bitmap_maxno() inline in header
cma_bitmap_maxno() was marked as static and not static inline, which can
cause warnings about this function not being used if this file is included
in a file that does not call that function, and violates the conventions
used elsewhere. The two options are to move the function implementation
back to mm/cma.c or make it inline here, and it's simple enough for the
latter to make sense.
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Fri, 14 Aug 2015 22:35:19 +0000 (15:35 -0700)]
zram: fix pool name truncation
zram_meta_alloc() constructs a pool name for zs_create_pool() call as
snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);
However, it defines pool name buffer to be only 8 bytes long (minus
trailing zero), which means that we can have only 1000 pool names: zram0
-- zram999.
With CONFIG_ZSMALLOC_STAT enabled an attempt to create a device zram1000
can fail if device zram100 already exists, because snprintf() will
truncate new pool name to zram100 and pass it debugfs_create_dir(),
causing:
debugfs dir <zram100> creation failed
zram: Error creating memory pool
... and so on.
Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
divice_id. We construct zram%d name earlier and keep it as a ->disk_name,
no need to snprintf() it again.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xishi Qiu [Fri, 14 Aug 2015 22:35:16 +0000 (15:35 -0700)]
memory-hotplug: fix wrong edge when hot add a new node
When we add a new node, the edge of memory may be wrong.
e.g. system has 4 nodes, and node3 is movable, node3 mem:[24G-32G],
1. hotremove the node3,
2. then hotadd node3 with a part of memory, mem:[26G-30G],
3. call hotadd_new_pgdat()
free_area_init_node()
get_pfn_range_for_nid()
4. it will return wrong start_pfn and end_pfn, because we have not
update the memblock.
This patch also fixes a BUG_ON during hot-addition, please see
http://marc.info/?l=linux-kernel&m=
142961156129456&w=2
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Fri, 14 Aug 2015 22:35:13 +0000 (15:35 -0700)]
.mailmap: Andrey Ryabinin has moved
Update my email address.
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Fri, 14 Aug 2015 22:35:10 +0000 (15:35 -0700)]
ipc/sem.c: update/correct memory barriers
sem_lock() did not properly pair memory barriers:
!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.
As no primitive exists inside <include/spinlock.h> and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.
With regards to -stable:
The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability). The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Fri, 14 Aug 2015 22:35:08 +0000 (15:35 -0700)]
mm/hwpoison: fix panic due to split huge zero page
Bug:
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:1957!
invalid opcode: 0000 [#1] SMP
Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 snd_hda_codec_realtek snd_hda_codec_generic nfsv4 dns_re
CPU: 2 PID: 2576 Comm: test_huge Not tainted 4.2.0-rc5-mm1+ #27
Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
task:
ffff880204e3d600 ti:
ffff8800db16c000 task.ti:
ffff8800db16c000
RIP: split_huge_page_to_list+0xdb/0x120
Call Trace:
memory_failure+0x32e/0x7c0
madvise_hwpoison+0x8b/0x160
SyS_madvise+0x40/0x240
? do_page_fault+0x37/0x90
entry_SYSCALL_64_fastpath+0x12/0x71
Code: ff f0 41 ff 4c 24 30 74 0d 31 c0 48 83 c4 08 5b 41 5c 41 5d c9 c3 4c 89 e7 e8 e2 58 fd ff 48 83 c4 08 31 c0
RIP split_huge_page_to_list+0xdb/0x120
RSP <
ffff8800db16fde8>
---[ end trace
aee7ce0df8e44076 ]---
Testcase:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>
#include <string.h>
#define MB 1024*1024
int main(void)
{
char *mem;
posix_memalign((void **)&mem, 2 * MB, 200 * MB);
madvise(mem, 200 * MB, MADV_HWPOISON);
free(mem);
return 0;
}
Huge zero page is allocated if page fault w/o FAULT_FLAG_WRITE flag.
The get_user_pages_fast() which called in madvise_hwpoison() will get
huge zero page if the page is not allocated before. Huge zero page is a
tranparent huge page, however, it is not an anonymous page.
memory_failure will split the huge zero page and trigger
BUG_ON(is_huge_zero_page(page));
After commit
98ed2b0052e6 ("mm/memory-failure: give up error handling
for non-tail-refcounted thp"), memory_failure will not catch non anon
thp from madvise_hwpoison path and this bug occur.
Fix it by catching non anon thp in memory_failure in order to not split
huge zero page in madvise_hwpoison path.
After this patch:
Injecting memory failure for page 0x202800 at 0x7fd8ae800000
MCE: 0x202800: non anonymous thp
[...]
[akpm@linux-foundation.org: remove second split, per Wanpeng]
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Herton R. Krzesinski [Fri, 14 Aug 2015 22:35:05 +0000 (15:35 -0700)]
ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()
After we acquire the sma->sem_perm lock in exit_sem(), we are protected
against a racing IPC_RMID operation. Also at that point, we are the last
user of sem_undo_list. Therefore it isn't required that we acquire or use
ulp->lock.
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Herton R. Krzesinski [Fri, 14 Aug 2015 22:35:02 +0000 (15:35 -0700)]
ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).
For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:
Slab corruption (Not tainted): kmalloc-64 start=
ffff88003b45c1c0, len=64
000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b kkkkkkkk.kkkkkkk
010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff ....kkkk........
Prev obj: start=
ffff88003b45c180, len=64
000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ
010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff ...........7....
Next obj: start=
ffff88003b45c200, len=64
000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ
010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff ........h).<....
BUG: spinlock wrong CPU on CPU#2, test/18028
general protection fault: 0000 [#1] SMP
Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
RIP: spin_dump+0x53/0xc0
Call Trace:
spin_bug+0x30/0x40
do_raw_spin_unlock+0x71/0xa0
_raw_spin_unlock+0xe/0x10
freeary+0x82/0x2a0
? _raw_spin_lock+0xe/0x10
semctl_down.clone.0+0xce/0x160
? __do_page_fault+0x19a/0x430
? __audit_syscall_entry+0xa8/0x100
SyS_semctl+0x236/0x2c0
? syscall_trace_leave+0xde/0x130
entry_SYSCALL_64_fastpath+0x12/0x71
Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
RIP [<
ffffffff810d6053>] spin_dump+0x53/0xc0
RSP <
ffff88003750fd68>
---[ end trace
783ebb76612867a0 ]---
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
CPU: 3 PID: 18053 Comm: test Tainted: G D 4.2.0-rc5+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
RIP: native_read_tsc+0x0/0x20
Call Trace:
? delay_tsc+0x40/0x70
__delay+0xf/0x20
do_raw_spin_lock+0x96/0x140
_raw_spin_lock+0xe/0x10
sem_lock_and_putref+0x11/0x70
SYSC_semtimedop+0x7bf/0x960
? handle_mm_fault+0xbf6/0x1880
? dequeue_task_fair+0x79/0x4a0
? __do_page_fault+0x19a/0x430
? kfree_debugcheck+0x16/0x40
? __do_page_fault+0x19a/0x430
? __audit_syscall_entry+0xa8/0x100
? do_audit_syscall_entry+0x66/0x70
? syscall_trace_enter_phase1+0x139/0x160
SyS_semtimedop+0xe/0x10
SyS_semop+0x10/0x20
entry_SYSCALL_64_fastpath+0x12/0x71
Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
Kernel panic - not syncing: softlockup: hung tasks
I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).
The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).
After the patch I'm unable to reproduce the problem using the test case
[1].
[1] Test case used below:
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <errno.h>
#define NSEM 1
#define NSET 5
int sid[NSET];
void thread()
{
struct sembuf op;
int s;
uid_t pid = getuid();
s = rand() % NSET;
op.sem_num = pid % NSEM;
op.sem_op = 1;
op.sem_flg = SEM_UNDO;
semop(sid[s], &op, 1);
exit(EXIT_SUCCESS);
}
void create_set()
{
int i, j;
pid_t p;
union {
int val;
struct semid_ds *buf;
unsigned short int *array;
struct seminfo *__buf;
} un;
/* Create and initialize semaphore set */
for (i = 0; i < NSET; i++) {
sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
if (sid[i] < 0) {
perror("semget");
exit(EXIT_FAILURE);
}
}
un.val = 0;
for (i = 0; i < NSET; i++) {
for (j = 0; j < NSEM; j++) {
if (semctl(sid[i], j, SETVAL, un) < 0)
perror("semctl");
}
}
/* Launch threads that operate on semaphore set */
for (i = 0; i < NSEM * NSET * NSET; i++) {
p = fork();
if (p < 0)
perror("fork");
if (p == 0)
thread();
}
/* Free semaphore set */
for (i = 0; i < NSET; i++) {
if (semctl(sid[i], NSEM, IPC_RMID))
perror("IPC_RMID");
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
int main(int argc, char **argv)
{
pid_t p;
srand(time(NULL));
while (1) {
p = fork();
if (p < 0) {
perror("fork");
exit(EXIT_FAILURE);
}
if (p == 0) {
create_set();
goto end;
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
end:
return 0;
}
[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Fri, 14 Aug 2015 22:34:59 +0000 (15:34 -0700)]
mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held
Hugetlbfs pages will get a refcount in get_any_page() or
madvise_hwpoison() if soft offlining through madvise. The refcount which
is held by the soft offline path should be released if we fail to isolate
hugetlbfs pages.
Fix it by reducing the refcount for both isolation success and failure.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Fri, 14 Aug 2015 22:34:56 +0000 (15:34 -0700)]
mm/hwpoison: fix page refcount of unknown non LRU page
After trying to drain pages from pagevec/pageset, we try to get reference
count of the page again, however, the reference count of the page is not
reduced if the page is still not on LRU list.
Fix it by adding the put_page() to drop the page reference which is from
__get_any_page().
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 14 Aug 2015 18:06:43 +0000 (11:06 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"A single clocksource driver suspend/resume fix"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clockevents/drivers/sh_cmt: Only perform clocksource suspend/resume if enabled
Linus Torvalds [Fri, 14 Aug 2015 17:57:16 +0000 (10:57 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes: PMU driver corner cases, tooling fixes, and an 'AUX'
(Intel PT) race related core fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler
perf/x86/intel: Fix memory leak on hot-plug allocation fail
perf: Fix PERF_EVENT_IOC_PERIOD migration race
perf: Fix double-free of the AUX buffer
perf: Fix fasync handling on inherited events
perf tools: Fix test build error when bindir contains double slash
perf stat: Fix transaction lenght metrics
perf: Fix running time accounting
Linus Torvalds [Fri, 14 Aug 2015 17:45:23 +0000 (10:45 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"A single fix for a locking self-test crash"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/pvqspinlock: Fix kernel panic in locking-selftest
Linus Torvalds [Fri, 14 Aug 2015 17:39:32 +0000 (10:39 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Back from holidays, found these in the cracks: one nouveau revert, one
vmwgfx locking fix and a bunch of exynos fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
Revert "drm/nouveau/fifo/gk104: kick channels when deactivating them"
drm/vmwgfx: Fix execbuf locking issues
drm/exynos/fimc: fix runtime pm support
drm/exynos/mixer: always update INT_EN cache
drm/exynos/mixer: correct vsync configuration sequence
drm/exynos/mixer: fix interrupt clearing
drm/exynos/hdmi: fix edid memory leak
drm/exynos: gsc: fix wrong bitwise operation for swap detection
Alexandre Courbot [Wed, 12 Aug 2015 04:17:38 +0000 (13:17 +0900)]
Revert "drm/nouveau/fifo/gk104: kick channels when deactivating them"
This reverts commit
1addc1264852
This commit seems to cause crashes in gk104_fifo_intr_runlist() by
returning 0xbad0da00 when register 0x2a00 is read. Since this commit was
intended for GM20B which is not completely supported yet, let's revert
it for the time being.
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Afzal Mohammed <afzal.mohd.ma@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Thomas Hellstrom [Wed, 12 Aug 2015 05:31:17 +0000 (22:31 -0700)]
drm/vmwgfx: Fix execbuf locking issues
This addresses two issues that cause problems with viewperf maya-03 in
situation with memory pressure.
The first issue causes attempts to unreserve buffers if batched
reservation fails due to, for example, a signal pending. While previously
the ttm_eu api was resistant against this type of error, it is no longer
and the lockdep code will complain about attempting to unreserve buffers
that are not reserved. The issue is resolved by avoid calling
ttm_eu_backoff_reservation in the buffer reserve error path.
The second issue is that the binding_mutex may be held when user-space
fence objects are created and hence during memory reclaims. This may cause
recursive attempts to grab the binding mutex. The issue is resolved by not
holding the binding mutex across fence creation and submission.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 13 Aug 2015 23:47:07 +0000 (09:47 +1000)]
Merge branch 'exynos-drm-fixes' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes
This pull request fixes memory leak and some issues related to
mixer and gscaler driver issues.
* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos/fimc: fix runtime pm support
drm/exynos/mixer: always update INT_EN cache
drm/exynos/mixer: correct vsync configuration sequence
drm/exynos/mixer: fix interrupt clearing
drm/exynos/hdmi: fix edid memory leak
drm/exynos: gsc: fix wrong bitwise operation for swap detection
Linus Torvalds [Thu, 13 Aug 2015 23:34:56 +0000 (16:34 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Another few small ARM fixes, mostly addressing some VDSO issues"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8410/1: VDSO: fix coarse clock monotonicity regression
ARM: 8409/1: Mark ret_fast_syscall as a function
ARM: 8408/1: Fix the secondary_startup function in Big Endian case
ARM: 8405/1: VDSO: fix regression with toolchains lacking ld.bfd executable
Linus Torvalds [Thu, 13 Aug 2015 23:19:44 +0000 (16:19 -0700)]
x86: fix error handling for 32-bit compat out-of-range system call numbers
Commit
3f5159a9221f ("x86/asm/entry/32: Update -ENOSYS handling to match
the 64-bit logic") broke the ENOSYS handling for the 32-bit compat case.
The proper error return value was never loaded into %rax, except if
things just happened to go through the audit paths, which ended up
reloading the return value.
This moves the loading or %rax into the normal system call path, just to
make sure the error case triggers it. It's kind of sad, since it adds a
useless instruction to reload the register to the fast path, but it's
not like that single load from the stack is going to be noticeable.
Reported-by: David Drysdale <drysdale@google.com>
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 13 Aug 2015 20:52:46 +0000 (13:52 -0700)]
Merge tag 'dm-4.2-fixes-5' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- two stable fixes for corruption seen in a snapshot of thinp metadata;
metadata snapshots aren't widely used but help provide a consistent
view of the metadata associated with an active thin-pool.
- a dm-cache fix for the 4.2 "default" policy switch from "mq" to "smq"
* tag 'dm-4.2-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache policy smq: move 'dm-cache-default' module alias to SMQ
dm btree: add ref counting ops for the leaves of top level btrees
dm thin metadata: delete btrees when releasing metadata snapshot
Linus Torvalds [Thu, 13 Aug 2015 20:44:32 +0000 (13:44 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull xen block driver fixes from Jens Axboe:
"A few small bug fixes for xen-blk{front,back} that have been sitting
over my vacation"
* 'for-linus' of git://git.kernel.dk/linux-block:
xen-blkback: replace work_pending with work_busy in purge_persistent_gnt()
xen-blkfront: don't add indirect pages to list when !feature_persistent
xen-blkfront: introduce blkfront_gather_backend_features()
Linus Torvalds [Thu, 13 Aug 2015 20:36:22 +0000 (13:36 -0700)]
Merge tag 'for-linus-4.2-rc6-tag' of git://git./linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- revert a fix from 4.2-rc5 that was causing lots of WARNING spam.
- fix a memory leak affecting backends in HVM guests.
- fix PV domU hang with certain configurations.
* tag 'for-linus-4.2-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
Revert "xen/events/fifo: Handle linked events when closing a port"
x86/xen: build "Xen PV" APIC driver for domU as well
Linus Torvalds [Thu, 13 Aug 2015 15:25:20 +0000 (08:25 -0700)]
Revert x86 sigcontext cleanups
This reverts commits
9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs'
from sigcontext") and
c6f2062935c8 ("x86/signal/64: Fix SS handling for
signals delivered to 64-bit programs").
They were cleanups, but they break dosemu by changing the signal return
behavior (and removing 'fs' and 'gs' from the sigcontext struct - while
not actually changing any behavior - causes build problems).
Reported-and-tested-by: Stas Sergeev <stsp@list.ru>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 13 Aug 2015 17:46:39 +0000 (10:46 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Workaround hw bug when acquiring PCI bos ownership of iwlwifi
devices, from Emmanuel Grumbach.
2) Falling back to vmalloc in conntrack should not emit a warning, from
Pablo Neira Ayuso.
3) Fix NULL deref when rtlwifi driver is used as an AP, from Luis
Felipe Dominguez Vega.
4) Rocker doesn't free netdev on device removal, from Ido Schimmel.
5) UDP multicast early sock demux has route handling races, from Eric
Dumazet.
6) Fix L4 checksum handling in openvswitch, from Glenn Griffin.
7) Fix use-after-free in skb_set_peeked, from Herbert Xu.
8) Don't advertize NETIF_F_FRAGLIST in virtio_net driver, this can lead
to fraglists longer than the driver can support. From Jason Wang.
9) Fix mlx5 on non-4k-pagesize systems, from Carol L Soto.
10) Fix interrupt storm in bna driver, from Ivan Vecera.
11) Don't propagate -EBUSY from netlink_insert(), from Daniel Borkmann.
12) Fix inet request sock leak, from Eric Dumazet.
13) Fix TX interrupt masking and marking in TX descriptors of fs_enet
driver, from LEROY Christophe.
14) Get rid of rule optimizer in gianfar driver, it's buggy and unlikely
to get fixed any time soon. From Jakub Kicinski
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
cosa: missing error code on failure in probe()
gianfar: remove faulty filer optimizer
gianfar: correct list membership accounting
gianfar: correct filer table writing
bonding: Gratuitous ARP gets dropped when first slave added
net: dsa: Do not override PHY interface if already configured
net: fs_enet: mask interrupts for TX partial frames.
net: fs_enet: explicitly remove I flag on TX partial frames
inet: fix possible request socket leak
inet: fix races with reqsk timers
mkiss: Fix error handling in mkiss_open()
bnx2x: Free NVRAM lock at end of each page
bnx2x: Prevent null pointer dereference on SKB release
cxgb4: missing curly braces in t4_setup_debugfs()
net-timestamp: Update skb_complete_tx_timestamp comment
ipv6: don't reject link-local nexthop on other interface
netlink: make sure -EBUSY won't escape from netlink_insert
bna: fix interrupts storm caused by erroneous packets
net: mvpp2: replace TX coalescing interrupts with hrtimer
net: mvpp2: enable proper per-CPU TX buffers unmapping
...
Linus Torvalds [Thu, 13 Aug 2015 17:22:11 +0000 (10:22 -0700)]
Merge tag 'edac_fix_for_4.2' of git://git./linux/kernel/git/bp/bp
Pull EDAC fix from Borislav Petkov:
"A ppc4xx_edac fix for accessing ->csrows properly. This driver was
missed during the conversion a couple of years ago"
* tag 'edac_fix_for_4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, ppc4xx: Access mci->csrows array elements properly
Murali Karicheri [Mon, 10 Aug 2015 03:06:27 +0000 (20:06 -0700)]
ARM: dts: keystone: Fix the mdio bindings by moving it to soc specific file
Currently mdio bindings are defined in keystone.dtsi and this results
in incorrect unit address for the node on K2E and K2L SoCs. Fix this
by moving them to SoC specific DTS file.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Murali Karicheri [Mon, 10 Aug 2015 03:06:27 +0000 (20:06 -0700)]
ARM: dts: keystone: fix the clock node for mdio
Currently the MDIO clock is pointing to clkpa instead of clkcpgmac.
MDIO is part of the ethss and the clock should be clkcpgmac.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Olof Johansson [Thu, 13 Aug 2015 12:27:12 +0000 (14:27 +0200)]
Merge tag 'omap-for-v4.2/fixes-rc6' of git://git./linux/kernel/git/tmlind/linux-omap into fixes
Fix a NULL pointer exception for omap GPMC bus code if probe fails.
* tag 'omap-for-v4.2/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
memory: omap-gpmc: Don't try to save uninitialized GPMC context
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Thu, 13 Aug 2015 10:24:55 +0000 (12:24 +0200)]
Merge tag 'imx-fixes-4.2-3' of git://git./linux/kernel/git/shawnguo/linux into fixes
The i.MX fixes for 4.2, 3rd round:
- Fix i.MX6 PCIe interrupt routing which gets missed from stacked IRQ
domain conversion. The PCIe wakeup support is currently broken
because of this.
* tag 'imx-fixes-4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx6: correct i.MX6 PCIe interrupt routing
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Thu, 13 Aug 2015 10:18:45 +0000 (12:18 +0200)]
Merge tag 'samsung-mach-fixes-4.2' of https://github.com/krzk/linux into fixes
Two fixes for bugs in Exynos power domain error exit path:
1. kfree() of read-only memory (name of power domain returned
by kstrdup_const()),
2. Doubled of_node_put() leading to invalid ref count for OF node.
* tag 'samsung-mach-fixes-4.2' of https://github.com/krzk/linux:
ARM: EXYNOS: fix double of_node_put() on error path
ARM: EXYNOS: Fix potentian kfree() of ro memory
Signed-off-by: Olof Johansson <olof@lixom.net>
Michael Walle [Tue, 21 Jul 2015 09:00:53 +0000 (11:00 +0200)]
EDAC, ppc4xx: Access mci->csrows array elements properly
The commit
de3910eb79ac ("edac: change the mem allocation scheme to
make Documentation/kobject.txt happy")
changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.
Signed-off-by: Michael Walle <michael@walle.cc>
Cc: <stable@vger.kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
Signed-off-by: Borislav Petkov <bp@suse.de>
Dan Carpenter [Wed, 12 Aug 2015 21:08:01 +0000 (00:08 +0300)]
cosa: missing error code on failure in probe()
If register_hdlc_device() fails, the current code returns 0 but we
should return an error code instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Aug 2015 21:47:06 +0000 (14:47 -0700)]
Merge branch 'gianfar-fixes'
Jakub Kicinski says:
====================
gianfar: filer changes
respinning with examples as requested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 12 Aug 2015 00:41:57 +0000 (02:41 +0200)]
gianfar: remove faulty filer optimizer
Current filer rule optimization is broken in several ways:
(1) Can perform reads/writes beyond end of allocated tables.
(gianfar_ethtool.c:1326).
(2) It breaks badly for rules with more than 2 specifiers
(e.g. matching ip, port, tos).
Example:
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1 tos 1 action 1
Added rule with ID 254
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2 tos 2 action 9
Added rule with ID 253
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3 tos 3 action 17
Added rule with ID 252
# ./filer_decode /sys/kernel/debug/gfar1/filer_raw
00: MASK ==
00000210 AND Q:00 ctrl:
00000080 prop:
00000210
01: FPR ==
00000210 AND CLE Q:00 ctrl:
00000281 prop:
00000210
02: MASK ==
ffffffff AND Q:00 ctrl:
00000080 prop:
ffffffff
03: DPT ==
00000003 AND Q:00 ctrl:
0000008e prop:
00000003
04: TOS ==
00000003 AND Q:00 ctrl:
0000008a prop:
00000003
05: DIA ==
0a000003 AND Q:11 ctrl:
0000448c prop:
0a000003
06: DPT ==
00000002 AND Q:00 ctrl:
0000008e prop:
00000002
07: TOS ==
00000002 AND Q:00 ctrl:
0000008a prop:
00000002
08: DIA ==
0a000002 AND Q:09 ctrl:
0000248c prop:
0a000002
09: DIA ==
0a000001 AND Q:00 ctrl:
0000008c prop:
0a000001
0a: DPT ==
00000001 AND Q:00 ctrl:
0000008e prop:
00000001
0b: TOS ==
00000001 CLE Q:01 ctrl:
0000060a prop:
00000001
ff: MASK >=
00000000 Q:00 ctrl:
00000020 prop:
00000000
(Entire cluster gets AND-ed together).
(3) We observed that the masking rules it generates do not
play well with clustering on P2020. Only first rule
of the cluster would ever fire. Given that optimizer
relies heavily on masking this is very hard to fix.
Example:
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1 action 1
Added rule with ID 254
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2 action 9
Added rule with ID 253
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3 action 17
Added rule with ID 252
# ./filer_decode /sys/kernel/debug/gfar1/filer_raw
00: MASK ==
00000210 AND Q:00 ctrl:
00000080 prop:
00000210
01: FPR ==
00000210 AND CLE Q:00 ctrl:
00000281 prop:
00000210
02: MASK ==
ffffffff AND Q:00 ctrl:
00000080 prop:
ffffffff
03: DPT ==
00000003 AND Q:00 ctrl:
0000008e prop:
00000003
04: DIA ==
0a000003 Q:11 ctrl:
0000440c prop:
0a000003
05: DPT ==
00000002 AND Q:00 ctrl:
0000008e prop:
00000002
06: DIA ==
0a000002 Q:09 ctrl:
0000240c prop:
0a000002
07: DIA ==
0a000001 AND Q:00 ctrl:
0000008c prop:
0a000001
08: DPT ==
00000001 CLE Q:01 ctrl:
0000060e prop:
00000001
ff: MASK >=
00000000 Q:00 ctrl:
00000020 prop:
00000000
Which looks correct according to the spec but only the first
(eth id 252)/last added rule for 10.0.0.3 will ever trigger.
As if filer did not treat the AND CLE as cluster start but
also kept AND-ing the rules. We found no errata covering this.
The fact that nobody noticed (2) or (3) makes me think
that this feature is not very widely used and we should just
remove it.
Reported-by: Aleksander Dutkowski <adutkowski@gmail.com>
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Acked-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 12 Aug 2015 00:41:56 +0000 (02:41 +0200)]
gianfar: correct list membership accounting
At a cost of one line let's make sure .count is correct
when calling gfar_process_filer_changes().
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 12 Aug 2015 00:41:55 +0000 (02:41 +0200)]
gianfar: correct filer table writing
MAX_FILER_IDX is the last usable index. Using less-than
will already guarantee that one entry for catch-all rule
will be left, no need to subtract 1 here.
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Venkatsubra [Tue, 11 Aug 2015 14:57:23 +0000 (07:57 -0700)]
bonding: Gratuitous ARP gets dropped when first slave added
When the first slave is added (such as during bootup) the first
gratuitous ARP gets dropped. We don't see this drop during a failover.
The packet gets dropped in qdisc (noop_enqueue).
The fix is to delay the sending of gratuitous ARPs till the bond dev's
carrier is present.
It can also be worked around by setting num_grat_arp to more than 1.
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 8 Aug 2015 19:58:57 +0000 (12:58 -0700)]
net: dsa: Do not override PHY interface if already configured
In case we need to divert reads/writes using the slave MII bus, we may have
already fetched a valid PHY interface property from Device Tree, and that
mode is used by the PHY driver to make configuration decisions.
If we could not fetch the "phy-mode" property, we will assign p->phy_interface
to PHY_INTERFACE_MODE_NA, such that we can actually check for that condition as
to whether or not we should override the interface value.
Fixes:
19334920eaf7 ("net: dsa: Set valid phy interface type")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin K. Petersen [Tue, 23 Jun 2015 16:13:59 +0000 (12:13 -0400)]
sd: Fix maximum I/O size for BLOCK_PC requests
Commit
bcdb247c6b6a ("sd: Limit transfer length") clamped the maximum
size of an I/O request to the MAXIMUM TRANSFER LENGTH field in the BLOCK
LIMITS VPD. This had the unfortunate effect of also limiting the maximum
size of non-filesystem requests sent to the device through sg/bsg.
Avoid using blk_queue_max_hw_sectors() and set the max_sectors queue
limit directly.
Also update the comment in blk_limits_max_hw_sectors() to clarify that
max_hw_sectors defines the limit for the I/O controller only.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Brian King <brking@linux.vnet.ibm.com>
Tested-by: Brian King <brking@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Linus Torvalds [Wed, 12 Aug 2015 18:25:01 +0000 (11:25 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix coarse clock monotonicity (VDSO timestamp off by one jiffy
compared to the syscall one)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: VDSO: fix coarse clock monotonicity regression
Bart Van Assche [Fri, 5 Jun 2015 21:20:51 +0000 (14:20 -0700)]
libfc: Fix fc_fcp_cleanup_each_cmd()
Since fc_fcp_cleanup_cmd() can sleep this function must not
be called while holding a spinlock. This patch avoids that
fc_fcp_cleanup_each_cmd() triggers the following bug:
BUG: scheduling while atomic: sg_reset/1512/0x00000202
1 lock held by sg_reset/1512:
#0: (&(&fsp->scsi_pkt_lock)->rlock){+.-...}, at: [<
ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
Preemption disabled at:[<
ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
Call Trace:
[<
ffffffff816c612c>] dump_stack+0x4f/0x7b
[<
ffffffff810828bc>] __schedule_bug+0x6c/0xd0
[<
ffffffff816c87aa>] __schedule+0x71a/0xa10
[<
ffffffff816c8ad2>] schedule+0x32/0x80
[<
ffffffffc0217eac>] fc_seq_set_resp+0xac/0x100 [libfc]
[<
ffffffffc0218b11>] fc_exch_done+0x41/0x60 [libfc]
[<
ffffffffc0225cff>] fc_fcp_cleanup_each_cmd.isra.21+0xcf/0x150 [libfc]
[<
ffffffffc0225f43>] fc_eh_device_reset+0x1c3/0x270 [libfc]
[<
ffffffff814a2cc9>] scsi_try_bus_device_reset+0x29/0x60
[<
ffffffff814a3908>] scsi_ioctl_reset+0x258/0x2d0
[<
ffffffff814a2650>] scsi_ioctl+0x150/0x440
[<
ffffffff814b3a9d>] sd_ioctl+0xad/0x120
[<
ffffffff8132f266>] blkdev_ioctl+0x1b6/0x810
[<
ffffffff811da608>] block_ioctl+0x38/0x40
[<
ffffffff811b4e08>] do_vfs_ioctl+0x2f8/0x530
[<
ffffffff811b50c1>] SyS_ioctl+0x81/0xa0
[<
ffffffff816cf8b2>] system_call_fastpath+0x16/0x7a
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Bart Van Assche [Fri, 5 Jun 2015 21:20:46 +0000 (14:20 -0700)]
libfc: Fix fc_exch_recv_req() error path
Due to patch "libfc: Do not invoke the response handler after
fc_exch_done()" (commit ID
7030fd62) the lport_recv() call
in fc_exch_recv_req() is passed a dangling pointer. Avoid this
by moving the fc_frame_free() call from fc_invoke_resp() to its
callers. This patch fixes the following crash:
general protection fault: 0000 [#3] PREEMPT SMP
RIP: fc_lport_recv_req+0x72/0x280 [libfc]
Call Trace:
fc_exch_recv+0x642/0xde0 [libfc]
fcoe_percpu_receive_thread+0x46a/0x5ed [fcoe]
kthread+0x10a/0x120
ret_from_fork+0x42/0x70
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Linus Torvalds [Wed, 12 Aug 2015 18:13:54 +0000 (11:13 -0700)]
Merge branch 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux
Pull amd drm fixes from Alex Deucher:
"Dave is on vacation at the moment, so please pull these radeon and
amdgpu fixes directly.
Just a few minor things for 4.2:
- add a new radeon pci id
- fix a power management regression in amdgpu
- fix HEVC command buffer validation in amdgpu"
* 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: add new OLAND pci id
Revert "drm/amdgpu: Configure doorbell to maximum slots"
drm/amdgpu: add context buffer size check for HEVC
John Soni Jose [Wed, 24 Jun 2015 01:11:58 +0000 (06:41 +0530)]
libiscsi: Fix host busy blocking during connection teardown
In case of hw iscsi offload, an host can have N-number of active
connections. There can be IO's running on some connections which
make host->host_busy always TRUE. Now if logout from a connection
is tried then the code gets into an infinite loop as host->host_busy
is always TRUE.
iscsi_conn_teardown(....)
{
.........
/*
* Block until all in-progress commands for this connection
* time out or fail.
*/
for (;;) {
spin_lock_irqsave(session->host->host_lock, flags);
if (!atomic_read(&session->host->host_busy)) { /* OK for ERL == 0 */
spin_unlock_irqrestore(session->host->host_lock, flags);
break;
}
spin_unlock_irqrestore(session->host->host_lock, flags);
msleep_interruptible(500);
iscsi_conn_printk(KERN_INFO, conn, "iscsi conn_destroy(): "
"host_busy %d host_failed %d\n",
atomic_read(&session->host->host_busy),
session->host->host_failed);
................
...............
}
}
This is not an issue with software-iscsi/iser as each cxn is a separate
host.
Fix:
Acquiring eh_mutex in iscsi_conn_teardown() before setting
session->state = ISCSI_STATE_TERMINATE.
Signed-off-by: John Soni Jose <sony.john@avagotech.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Chris Leech <cleech@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Alex Deucher [Mon, 10 Aug 2015 19:28:49 +0000 (15:28 -0400)]
drm/radeon: add new OLAND pci id
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Alex Deucher [Mon, 10 Aug 2015 15:08:31 +0000 (11:08 -0400)]
Revert "drm/amdgpu: Configure doorbell to maximum slots"
This reverts commit
78ad5cdd21f0d614983fc397338944e797ec70b9.
This commit breaks dpm and suspend/resume on CZ.
Boyuan Zhang [Wed, 5 Aug 2015 18:03:48 +0000 (14:03 -0400)]
drm/amdgpu: add context buffer size check for HEVC
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Linus Torvalds [Wed, 12 Aug 2015 16:06:39 +0000 (09:06 -0700)]
Merge tag 'regmap-fix-v4.2-rc6' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"regmap: Fix handling of present bits on rbtree cache block resize
When expanding a cache block we use krealloc() to resize the register
present bitmap without initialising the newly allocated data (the
original code was written for kzalloc()). Add an appropraite memset()
to fix that"
* tag 'regmap-fix-v4.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: regcache-rbtree: Clean new present bits on present bitmap resize
Yi Zhang [Wed, 12 Aug 2015 11:22:43 +0000 (19:22 +0800)]
dm cache policy smq: move 'dm-cache-default' module alias to SMQ
When creating dm-cache with the default policy, it will call
request_module("dm-cache-default") to register the default policy.
But the "dm-cache-default" alias was left referring to the MQ policy.
Fix this by moving the module alias to SMQ.
Fixes:
bccab6a0 (dm cache: switch the "default" cache replacement policy from mq to smq)
Signed-off-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Joe Thornber [Wed, 12 Aug 2015 14:12:09 +0000 (15:12 +0100)]
dm btree: add ref counting ops for the leaves of top level btrees
When using nested btrees, the top leaves of the top levels contain
block addresses for the root of the next tree down. If we shadow a
shared leaf node the leaf values (sub tree roots) should be incremented
accordingly.
This is only an issue if there is metadata sharing in the top levels.
Which only occurs if metadata snapshots are being used (as is possible
with dm-thinp). And could result in a block from the thinp metadata
snap being reused early, thus corrupting the thinp metadata snap.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Joe Thornber [Wed, 12 Aug 2015 14:10:21 +0000 (15:10 +0100)]
dm thin metadata: delete btrees when releasing metadata snapshot
The device details and mapping trees were just being decremented
before. Now btree_del() is called to do a deep delete.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Matt Fleming [Thu, 6 Aug 2015 12:12:43 +0000 (13:12 +0100)]
perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler
Tony reports that booting his 144-cpu machine with maxcpus=10 triggers
the following WARN_ON():
[ 21.045727] WARNING: CPU: 8 PID: 647 at arch/x86/kernel/cpu/perf_event_intel_cqm.c:1267 intel_cqm_cpu_prepare+0x75/0x90()
[ 21.045744] CPU: 8 PID: 647 Comm: systemd-udevd Not tainted 4.2.0-rc4 #1
[ 21.045745] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0066.R00.
1506021730 06/02/2015
[ 21.045747]
0000000000000000 0000000082771b09 ffff880856333ba8 ffffffff81669b67
[ 21.045748]
0000000000000000 0000000000000000 ffff880856333be8 ffffffff8107b02a
[ 21.045750]
ffff88085b789800 ffff88085f68a020 ffffffff819e2470 000000000000000a
[ 21.045750] Call Trace:
[ 21.045757] [<
ffffffff81669b67>] dump_stack+0x45/0x57
[ 21.045759] [<
ffffffff8107b02a>] warn_slowpath_common+0x8a/0xc0
[ 21.045761] [<
ffffffff8107b15a>] warn_slowpath_null+0x1a/0x20
[ 21.045762] [<
ffffffff81036725>] intel_cqm_cpu_prepare+0x75/0x90
[ 21.045764] [<
ffffffff81036872>] intel_cqm_cpu_notifier+0x42/0x160
[ 21.045767] [<
ffffffff8109a33d>] notifier_call_chain+0x4d/0x80
[ 21.045769] [<
ffffffff8109a44e>] __raw_notifier_call_chain+0xe/0x10
[ 21.045770] [<
ffffffff8107b538>] _cpu_up+0xe8/0x190
[ 21.045771] [<
ffffffff8107b65a>] cpu_up+0x7a/0xa0
[ 21.045774] [<
ffffffff8165e920>] cpu_subsys_online+0x40/0x90
[ 21.045777] [<
ffffffff81433b37>] device_online+0x67/0x90
[ 21.045778] [<
ffffffff81433bea>] online_store+0x8a/0xa0
[ 21.045782] [<
ffffffff81430e78>] dev_attr_store+0x18/0x30
[ 21.045785] [<
ffffffff8126b6ba>] sysfs_kf_write+0x3a/0x50
[ 21.045786] [<
ffffffff8126ad40>] kernfs_fop_write+0x120/0x170
[ 21.045789] [<
ffffffff811f0b77>] __vfs_write+0x37/0x100
[ 21.045791] [<
ffffffff811f38b8>] ? __sb_start_write+0x58/0x110
[ 21.045795] [<
ffffffff81296d2d>] ? security_file_permission+0x3d/0xc0
[ 21.045796] [<
ffffffff811f1279>] vfs_write+0xa9/0x190
[ 21.045797] [<
ffffffff811f2075>] SyS_write+0x55/0xc0
[ 21.045800] [<
ffffffff81067300>] ? do_page_fault+0x30/0x80
[ 21.045804] [<
ffffffff816709ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 21.045805] ---[ end trace
fe228b836d8af405 ]---
The root cause is that CPU_UP_PREPARE is completely the wrong notifier
action from which to access cpu_data(), because smp_store_cpu_info()
won't have been executed by the target CPU at that point, which in turn
means that ->x86_cache_max_rmid and ->x86_cache_occ_scale haven't been
filled out.
Instead let's invoke our handler from CPU_STARTING and rename it
appropriately.
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Link: http://lkml.kernel.org/r/1438863163-14083-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 10 Aug 2015 12:17:34 +0000 (14:17 +0200)]
perf/x86/intel: Fix memory leak on hot-plug allocation fail
We fail to free the shared_regs allocation if the constraint_list
allocation fails.
Cure this and be more consistent in NULL-ing the pointers after free.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 4 Aug 2015 17:22:49 +0000 (19:22 +0200)]
perf: Fix PERF_EVENT_IOC_PERIOD migration race
I ran the perf fuzzer, which triggered some WARN()s which are due to
trying to stop/restart an event on the wrong CPU.
Use the normal IPI pattern to ensure we run the code on the correct CPU.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
bad7192b842c ("perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the period")
Signed-off-by: Ingo Molnar <mingo@kernel.org>