Daniel Axtens [Fri, 14 Aug 2015 07:41:24 +0000 (17:41 +1000)]
cxl: Don't remove AFUs/vPHBs in cxl_reset
If the driver doesn't participate in EEH, the AFUs will be removed
by cxl_remove, which will be invoked by EEH.
If the driver does particpate in EEH, the vPHB needs to stick around
so that the it can particpate.
In both cases, we shouldn't remove the AFU/vPHB.
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 14 Aug 2015 07:41:23 +0000 (17:41 +1000)]
cxl: Refactor AFU init/teardown
As with an adapter, some aspects of initialisation are done only once
in the lifetime of an AFU: for example, allocating memory, or setting
up sysfs/debugfs files.
However, we may want to be able to do some parts of the initialisation
multiple times: for example, in error recovery we want to be able to
tear down and then re-map IO memory and IRQs.
Therefore, refactor AFU init/teardown as follows.
- Create two new functions: 'cxl_configure_afu', and its pair
'cxl_deconfigure_afu'. As with the adapter functions,
these (de)configure resources that do not need to last the entire
lifetime of the AFU.
- Allocating and releasing memory remain the task of 'cxl_alloc_afu'
and 'cxl_release_afu'.
- Once-only functions that do not involve allocating/releasing memory
stay in the overarching 'cxl_init_afu'/'cxl_remove_afu' pair.
However, the task of picking an AFU mode and activating it has been
broken out.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 14 Aug 2015 07:41:22 +0000 (17:41 +1000)]
cxl: Refactor adaptor init/teardown
Some aspects of initialisation are done only once in the lifetime of
an adapter: for example, allocating memory for the adapter,
allocating the adapter number, or setting up sysfs/debugfs files.
However, we may want to be able to do some parts of the
initialisation multiple times: for example, in error recovery we
want to be able to tear down and then re-map IO memory and IRQs.
Therefore, refactor CXL init/teardown as follows.
- Keep the overarching functions 'cxl_init_adapter' and its pair,
'cxl_remove_adapter'.
- Move all 'once only' allocation/freeing steps to the existing
'cxl_alloc_adapter' function, and its pair 'cxl_release_adapter'
(This involves moving allocation of the adapter number out of
cxl_init_adapter.)
- Create two new functions: 'cxl_configure_adapter', and its pair
'cxl_deconfigure_adapter'. These two functions 'wire up' the
hardware --- they (de)configure resources that do not need to
last the entire lifetime of the adapter
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 14 Aug 2015 07:41:21 +0000 (17:41 +1000)]
cxl: Clean up adapter MMIO unmap path.
- MMIO pointer unmapping is guarded by a null pointer check.
However, iounmap doesn't null the pointer, just invalidate it.
Therefore, explicitly null the pointer after unmapping.
- afu_desc_mmio also needs to be unmapped.
- PCI regions are allocated in cxl_map_adapter_regs.
Therefore they should be released in unmap, not elsewhere.
Acked-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 14 Aug 2015 07:41:20 +0000 (17:41 +1000)]
cxl: Make IRQ release idempotent
Check if an IRQ is mapped before releasing it.
This will simplify future EEH code by allowing unconditional unmapping
of IRQs.
Acked-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 14 Aug 2015 07:41:19 +0000 (17:41 +1000)]
cxl: Allocate and release the SPA with the AFU
Previously the SPA was allocated and freed upon entering and leaving
AFU-directed mode. This causes some issues for error recovery - contexts
hold a pointer inside the SPA, and they may persist after the AFU has
been detached.
We would ideally like to allocate the SPA when the AFU is allocated, and
release it until the AFU is released. However, we don't know how big the
SPA needs to be until we read the AFU descriptor.
Therefore, restructure the code:
- Allocate the SPA only once, on the first attach.
- Release the SPA only when the entire AFU is being released (not
detached). Guard the release with a NULL check, so we don't free
if it was never allocated (e.g. dedicated mode)
Acked-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 14 Aug 2015 07:41:18 +0000 (17:41 +1000)]
cxl: Drop commands if the PCI channel is not in normal state
If the PCI channel has gone down, don't attempt to poke the hardware.
We need to guard every time cxl_whatever_(read|write) is called. This
is because a call to those functions will dereference an offset into an
mmio register, and the mmio mappings get invalidated in the EEH
teardown.
Check in the read/write functions in the header.
We give them the same semantics as usual PCI operations:
- a write to a channel that is down is ignored.
- a read from a channel that is down returns all fs.
Also, we try to access the MMIO space of a vPHB device as part of the
PCI disable path. Because that's a read that bypasses most of our usual
checks, we handle it explicitly.
As far as user visible warnings go:
- Check link state in file ops, return -EIO if down.
- Be reasonably quiet if there's an error in a teardown path,
or when we already know the hardware is going down.
- Throw a big WARN if someone tries to start a CXL operation
while the card is down. This gives a useful stacktrace for
debugging whatever is doing that.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 14 Aug 2015 07:41:17 +0000 (17:41 +1000)]
cxl: Convert MMIO read/write macros to inline functions
We're about to make these more complex, so make them functions
first.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 14 Aug 2015 06:03:19 +0000 (16:03 +1000)]
powerpc/eeh: Probe after unbalanced kref check
In the complete hotplug case, EEH PEs are supposed to be released
and set to NULL. Normally, this is done by eeh_remove_device(),
which is called from pcibios_release_device().
However, if something is holding a kref to the device, it will not
be released, and the PE will remain. eeh_add_device_late() has
a check for this which will explictly destroy the PE in this case.
This check in eeh_add_device_late() occurs after a call to
eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
which will exit without probing if there is an existing PE.
This means that on PowerNV, devices with outstanding krefs will not
be rediscovered by EEH correctly after a complete hotplug. This is
affecting CXL (CAPI) devices in the field.
Put the probe after the kref check so that the PE is destroyed
and affected devices are correctly rediscovered by EEH.
Fixes:
d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug")
Cc: stable@vger.kernel.org
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gautham R. Shenoy [Wed, 5 Aug 2015 07:08:31 +0000 (12:38 +0530)]
powerpc: Add an inline function to update POWER8 HID0
Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
prescribes that updates to HID0 be preceded by a SYNC instruction and
followed by an ISYNC instruction (Page 91).
Create an inline function name update_power8_hid0() which follows this
recipe and invoke it from the static split core path.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Thu, 6 Aug 2015 13:05:07 +0000 (18:35 +0530)]
powerpc/prom: Use DRCONF flags while processing detected LMBs
Replace hard coded values with existing DRCONF flags while procesing
detected LMBs from the device tree. Does not change any functionality.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Wed, 29 Jul 2015 07:10:04 +0000 (12:40 +0530)]
powerpc/xmon: Drop the valid variable completely in dump_segments()
The value of 'valid' is always zero when 'esid' is zero, and if 'esid'
is non-zero then the value of 'valid' is irrelevant because we are using
logical or in the if expression.
In fact 'valid' can be dropped completely from dump_segments() by
simply doing the check with SLB_ESID_V directly in the if.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Wed, 29 Jul 2015 07:10:03 +0000 (12:40 +0530)]
powerpc/prom: Simplify the logic to fetch SLB size
The code to fetch the SLB size from the device tree wants to first look
for "slb-size" and then if that's not found "ibm,slb-size".
We can simplify the code by looking for the properties and then if we
find one of them we set mmu_slb_size.
We also change the function name from check_cpu_slb_size() to
init_mmu_slb_size() as the function doesn't check anything, it only
initialises mmu_slb_size.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Wed, 29 Jul 2015 07:10:02 +0000 (12:40 +0530)]
powerpc/slb: Add documentation on runtime patching of SLB encoding
This patch adds some documentation to patch_slb_encoding() explaining
how it works.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Update change log and mention the signedness of the immediate]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Wed, 29 Jul 2015 07:09:59 +0000 (12:39 +0530)]
powerpc/slb: Rename all the 'slot' occurrences to 'entry'
The SLB code uses 'slot' and 'entry' interchangeably, change it to always
use 'entry'.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Wed, 29 Jul 2015 07:09:58 +0000 (12:39 +0530)]
powerpc/slb: Remove a duplicate extern variable
This patch just removes one redundant entry for one extern variable
'slb_compare_rr_to_size' from the scope. This patch does not change
any functionality.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 7 Aug 2015 03:18:20 +0000 (13:18 +1000)]
cxl: sparse: Silence iomem warning in debugfs file creation
An IO address, tagged with __iomem, is passed to debugfs_create_file
as private data. This requires that it be cast to void *. The cast
drops the __iomem annotation and so creates a sparse warning:
drivers/misc/cxl/debugfs.c:51:57: warning: cast removes address space of expression
The address space marker is added back in the file operations
(fops_io_u64).
Silence the warning with __force.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 7 Aug 2015 03:18:18 +0000 (13:18 +1000)]
cxl: sparse: Make declarations static
A few declarations were identified by sparse as needing to be static:
drivers/misc/cxl/irq.c:408:6: warning: symbol 'afu_irq_name_free' was not declared. Should it be static?
drivers/misc/cxl/irq.c:467:6: warning: symbol 'afu_register_hwirqs' was not declared. Should it be static?
drivers/misc/cxl/file.c:254:6: warning: symbol 'afu_compat_ioctl' was not declared. Should it be static?
drivers/misc/cxl/file.c:399:30: warning: symbol 'afu_master_fops' was not declared. Should it be static?
Make them static.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Fri, 7 Aug 2015 03:18:17 +0000 (13:18 +1000)]
cxl: Compile with -Werror
It's a good idea, and it brings us in line with the rest of arch/powerpc.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Naveen N. Rao [Fri, 24 Apr 2015 08:54:44 +0000 (14:24 +0530)]
powerpc/ftrace: add powerpc timebase as a trace clock source
Add a new powerpc-specific trace clock using the timebase register,
similar to x86-tsc. This gives us
- a fast, monotonic, hardware clock source for trace entries, and
- a clock that can be used to correlate events across cpus as well as across
hypervisor and guests.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wei Yongjun [Thu, 16 Apr 2015 12:18:50 +0000 (20:18 +0800)]
powerpc/4xx: Fix return value check in hsta_msi_probe()
In case of error, the functions platform_get_resource() and kmalloc()
returns NULL not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Bolle [Fri, 31 Jul 2015 12:14:20 +0000 (14:14 +0200)]
windfarm: remove three exported but unused functions
wf_find_control(), wf_find_sensor(), and wf_is_overtemp() are exported
but unused. Remove these three functions.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Bolle [Fri, 31 Jul 2015 12:12:20 +0000 (14:12 +0200)]
windfarm: make wf_critical_overtemp() static
wf_critical_overtemp() is exported. But nothing uses that export.
That's unsurprising because there's no header that defines it. Stop
exporting that function and make it static.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Bolle [Fri, 31 Jul 2015 12:08:58 +0000 (14:08 +0200)]
windfarm: decrement client count when unregistering
wf_unregister_client() increments the client count when a client
unregisters. That is obviously incorrect. Decrement that client count
instead.
Fixes:
75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Joe Perches [Mon, 29 Jun 2015 21:30:39 +0000 (14:30 -0700)]
powerpc: Remove redundant breaks
break; break; isn't useful.
Remove one.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Kevin Hao [Fri, 12 Jun 2015 02:26:37 +0000 (10:26 +0800)]
powerpc: pci: use %pR for printing struct resource
Use %pR to simplify the debug code. This also make the debug info more
readable.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
[mpe: Unsplit multi-line printk strings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Wed, 29 Jul 2015 04:07:22 +0000 (14:07 +1000)]
cxl: Don't ignore add_process_element() result when attaching context
Currently when attaching a context in dedicated mode, we ignore the
result of add_process_element(), which could potentially fail.
If add_process_element() returns an error, pass it back to the caller.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Mahesh Salgaonkar [Tue, 4 Aug 2015 11:18:56 +0000 (16:48 +0530)]
powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable HMI.
Invoke new opal_cec_reboot2() call with reboot type
OPAL_REBOOT_PLATFORM_ERROR (for unrecoverable HMI interrupts) to inform
BMC/OCC about this error, so that BMC can collect relevant data for error
analysis and decide what component to de-configure before rebooting.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Mahesh Salgaonkar [Fri, 31 Jul 2015 15:54:38 +0000 (21:24 +0530)]
powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine check errors.
On non-recoverable MCE errors in kernel space, Linux kernel panics
and system reboots. On BMC based system opal-prd runs as a daemon
in the host. Hence, kernel crash may prevent opal-prd to detect and
analyze this MCE error. This may land us in a situation where the faulty
memory never gets de-configured and Linux would keep hitting same MCE error
again and again. If this happens in early stage of kernel initialization,
then Linux will keep crashing and rebooting in a loop.
This patch fixes this issue by invoking new opal_cec_reboot2() call with
reboot type OPAL_REBOOT_PLATFORM_ERROR to inform BMC/OCC about this
error, so that BMC can collect relevant data for error analysis and
decide what component to de-configure before rebooting.
This patch is dependent on OPAL patchset posted on skiboot mailing list
at https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html that
introduces opal_cec_reboot2() opal call.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Mahesh Salgaonkar [Tue, 5 May 2015 08:05:43 +0000 (13:35 +0530)]
powerpc/powernv: Pull all HMI events before panic.
In the event of unrecovered HMI the existing code panics as soon as
it receives the first unrecovered HMI event. This makes host to report
partial information about HMIs before panic. There may be more errors
which would have caused the HMI and hence more HMI event would have been
generated waiting to be pulled by host. This patch implements a logic to
pull and display all the HMI event before going down panic path.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Mahesh Salgaonkar [Tue, 5 May 2015 08:04:58 +0000 (13:34 +0530)]
powerpc/powernv: display reason for Malfunction Alert HMI.
The V2 version of HMI event now carries additional information for
Malfunction Alert. It now contains error information about CORE and NX
checkstop. This patch checks and displays the check stop reason before
panic.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 23 Jul 2015 10:21:11 +0000 (20:21 +1000)]
selftests/seccomp: Add powerpc support
Wire up the syscall number and regs so the tests work on powerpc.
With the powerpc kernel support just merged, all tests pass on ppc64,
ppc64 (compat), ppc64le, ppc, ppc64e and ppc64e (compat).
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 23 Jul 2015 10:21:10 +0000 (20:21 +1000)]
selftests/seccomp: Make seccomp tests work on big endian
The seccomp_bpf test uses BPF_LD|BPF_W|BPF_ABS to load 32-bit values
from seccomp_data->args. On big endian machines this will load the high
word of the argument, which is not what the test wants.
Borrow a hack from samples/seccomp/bpf-helper.h which changes the offset
on big endian to account for this.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Thu, 23 Jul 2015 10:21:09 +0000 (20:21 +1000)]
powerpc/kernel: Enable seccomp filter
This commit enables seccomp filter on powerpc, now that we have all the
necessary pieces in place.
To support seccomp's desire to modify the syscall return value under
some circumstances, we use a different ABI to the ptrace ABI. That is we
use r3 as the syscall return value, and orig_gpr3 is the first syscall
parameter.
This means the seccomp code, or a ptracer via SECCOMP_RET_TRACE, will
see -ENOSYS preloaded in r3. This is identical to the behaviour on x86,
and allows seccomp or the ptracer to either leave the -ENOSYS or change
it to something else, as well as rejecting or not the syscall by
modifying r0.
If seccomp does not reject the syscall, we restore the register state to
match what ptrace and audit expect, ie. r3 is the first syscall
parameter again. We do this restore using orig_gpr3, which may have been
modified by seccomp, which allows seccomp to modify the first syscall
paramater and allow the syscall to proceed.
We need to #ifdef the the additional handling of r3 for seccomp, so move
it all out of line.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Thu, 23 Jul 2015 10:21:08 +0000 (20:21 +1000)]
powerpc/kernel: Add SIG_SYS support for compat tasks
SIG_SYS was added in commit
a0727e8ce513 "signal, x86: add SIGSYS info
and make it synchronous."
Because we use the asm-generic struct siginfo, we got support for
SIG_SYS for free as part of that commit.
However there was no compat handling added for powerpc. That means we've
been advertising the existence of signfo._sifields._sigsys to compat
tasks, but not actually filling in the fields correctly.
Luckily it looks like no one has noticed, presumably because the only
user of SIGSYS in the kernel is seccomp filter, which we don't support
yet.
So before we enable seccomp filter, add compat handling for SIGSYS.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Thu, 23 Jul 2015 10:21:07 +0000 (20:21 +1000)]
powerpc: Change syscall_get_nr() to return int
The documentation for syscall_get_nr() in asm-generic says:
Note this returns int even on 64-bit machines. Only 32 bits of
system call number can be meaningful. If the actual arch value
is 64 bits, this truncates to 32 bits so 0xffffffff means -1.
However our implementation was never updated to reflect this.
Generally it's not important, but there is once case where it matters.
For seccomp filter with SECCOMP_RET_TRACE, the tracer will set
regs->gpr[0] to -1 to reject the syscall. When the task is a compat
task, this means we end up with 0xffffffff in r0 because ptrace will
zero extend the 32-bit value.
If syscall_get_nr() returns an unsigned long, then a 64-bit kernel will
see a positive value in r0 and will incorrectly allow the syscall
through seccomp.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Thu, 23 Jul 2015 10:21:06 +0000 (20:21 +1000)]
powerpc: Use orig_gpr3 in syscall_get_arguments()
Currently syscall_get_arguments() is used by syscall tracepoints, and
collect_syscall() which is used in some debugging as well as
/proc/pid/syscall.
The current implementation just copies regs->gpr[3 .. 5] out, which is
fine for all the current use cases.
When we enable seccomp filter, that will also start using
syscall_get_arguments(). However for seccomp filter we want to use r3
as the return value of the syscall, and orig_gpr3 as the first
parameter. This will allow seccomp to modify the return value in r3.
To support this we need to modify syscall_get_arguments() to return
orig_gpr3 instead of r3. This is safe for all uses because orig_gpr3
always contains the r3 value that was passed to the syscall. We store it
in the syscall entry path and never modify it.
Update syscall_set_arguments() while we're here, even though it's never
used.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Thu, 23 Jul 2015 10:21:05 +0000 (20:21 +1000)]
powerpc: Rework syscall_get_arguments() so there is only one loop
Currently syscall_get_arguments() has two loops, one for compat and one
for regular tasks. In prepartion for the next patch, which changes which
registers we use, switch it to only have one loop, so we only have one
place to update.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Thu, 23 Jul 2015 10:21:04 +0000 (20:21 +1000)]
powerpc: Don't negate error in syscall_set_return_value()
Currently the only caller of syscall_set_return_value() is seccomp
filter, which is not enabled on powerpc.
This means we have not noticed that our implementation of
syscall_set_return_value() negates error, even though the value passed
in is already negative.
So remove the negation in syscall_set_return_value(), and expect the
caller to do it like all other implementations do.
Also add a comment about the ccr handling.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Thu, 23 Jul 2015 10:21:03 +0000 (20:21 +1000)]
powerpc: Drop unused syscall_get_error()
syscall_get_error() is unused, and never has been.
It's also probably wrong, as it negates r3 before returning it, but that
depends on what the caller is expecting.
It also doesn't deal with compat, and doesn't deal with TIF_NOERROR.
Although we could fix those, until it has a caller and it's clear what
semantics the caller wants it's just untested code. So drop it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Thu, 23 Jul 2015 10:21:02 +0000 (20:21 +1000)]
powerpc/kernel: Change the do_syscall_trace_enter() API
The API for calling do_syscall_trace_enter() is currently sensible
enough, it just returns the (modified) syscall number.
However once we enable seccomp filter it will get more complicated. When
seccomp filter runs, the seccomp kernel code (via SECCOMP_RET_ERRNO), or
a ptracer (via SECCOMP_RET_TRACE), may reject the syscall and *may* or may
*not* set a return value in r3.
That means the assembler that calls do_syscall_trace_enter() can not
blindly return ENOSYS, it needs to only return ENOSYS if a return value
has not already been set.
There is no way to implement that logic with the current API. So change
the do_syscall_trace_enter() API to make it deal with the return code
juggling, and the assembler can then just return whatever return code it
is given.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Thu, 23 Jul 2015 10:21:01 +0000 (20:21 +1000)]
powerpc/kernel: Switch to using MAX_ERRNO
Currently on powerpc we have our own #define for the highest (negative)
errno value, called _LAST_ERRNO. This is defined to be 516, for reasons
which are not clear.
The generic code, and x86, use MAX_ERRNO, which is defined to be 4095.
In particular seccomp uses MAX_ERRNO to restrict the value that a
seccomp filter can return.
Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp
tracer wanting to return 600, expecting it to be seen as an error, would
instead find on powerpc that userspace sees a successful syscall with a
return value of 600.
To avoid this inconsistency, switch powerpc to use MAX_ERRNO.
We are somewhat confident that generic syscalls that can return a
non-error value above negative MAX_ERRNO have already been updated to
use force_successful_syscall_return().
I have also checked all the powerpc specific syscalls, and believe that
none of them expect to return a non-error value between -MAX_ERRNO and
-516. So this change should be safe ...
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Anshuman Khandual [Tue, 30 Jun 2015 08:20:28 +0000 (13:50 +0530)]
powerpc/perf: Change type of the bhrb_users variable
This patch just changes data type of bhrb_users variable from
int to unsigned int because it never contains a negative value.
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Sukadev Bhattiprolu [Wed, 15 Jul 2015 03:01:49 +0000 (20:01 -0700)]
powerpc/perf/hv-24x7: Simplify extracting counter from result buffer
Simplify code that extracts a 24x7 counter from the HCALL's result buffer.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Sukadev Bhattiprolu [Wed, 15 Jul 2015 03:01:48 +0000 (20:01 -0700)]
powerpc/perf/hv-24x7: Whitespace - fix parameter alignment
Fix parameter alignment to be consistent with coding style.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Fri, 17 Jul 2015 10:11:43 +0000 (20:11 +1000)]
powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*
The hardware RNG on POWER8 and POWER7+ can be relatively slow, since
it can only supply one 64-bit value per microsecond. Currently we
read it in arch_get_random_long(), but that slows down reading from
/dev/urandom since the code in random.c calls arch_get_random_long()
for every longword read from /dev/urandom.
Since the hardware RNG supplies high-quality entropy on every read, it
matches the semantics of arch_get_random_seed_long() better than those
of arch_get_random_long(). Therefore this commit makes the code use
the POWER8/7+ hardware RNG only for arch_get_random_seed_{long,int}
and not for arch_get_random_{long,int}.
This won't affect any other PowerPC-based platforms because none of
them currently support a hardware RNG. To make it clear that the
ppc_md function pointer is used for arch_get_random_seed_*, we rename
it from get_random_long to get_random_seed.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Thomas Huth [Fri, 17 Jul 2015 10:46:58 +0000 (12:46 +0200)]
powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers
The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:
BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
Call Trace:
[
c00000007ffe7b90] [
c000000000807670] dump_stack+0xa0/0xdc (unreliable)
[
c00000007ffe7bc0] [
c0000000000e1f14] ___might_sleep+0x134/0x180
[
c00000007ffe7c20] [
c00000000002aec0] rtas_busy_delay+0x30/0xd0
[
c00000007ffe7c50] [
c00000000002bde4] rtas_get_sensor+0x74/0xe0
[
c00000007ffe7ce0] [
c000000000083264] ras_epow_interrupt+0x44/0x450
[
c00000007ffe7d90] [
c000000000120260] handle_irq_event_percpu+0xa0/0x300
[
c00000007ffe7e70] [
c000000000120524] handle_irq_event+0x64/0xc0
[
c00000007ffe7eb0] [
c000000000124dbc] handle_fasteoi_irq+0xec/0x260
[
c00000007ffe7ef0] [
c00000000011f4f0] generic_handle_irq+0x50/0x80
[
c00000007ffe7f20] [
c000000000010f3c] __do_irq+0x8c/0x200
[
c00000007ffe7f90] [
c0000000000236cc] call_do_irq+0x14/0x24
[
c00000007e6f39e0] [
c000000000011144] do_IRQ+0x94/0x110
[
c00000007e6f3a30] [
c000000000002594] hardware_interrupt_common+0x114/0x180
Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.
The EPOW sensor is defined to be "fast" in sPAPR - mpe.
Fixes:
587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Thomas Huth [Wed, 22 Jul 2015 16:56:47 +0000 (18:56 +0200)]
powerpc/rtas: Replace magic values with defines
rtas.h already has some nice #defines for RTAS return status
codes - let's use them instead of hard-coded "magic" values!
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Tue, 12 May 2015 07:05:32 +0000 (17:05 +1000)]
powerpc/eeh: Dump PHB diag-data for non-existing PE
When detecting EEH error on non-existing PE, including the reserved
one, the PE is simply unfrozen without dumping the PHB diag-data,
which is useful for locating the root cause of the EEH error. The
patch dumps the PHB diag-data when non-existing PE reports error.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Tue, 12 May 2015 07:05:22 +0000 (17:05 +1000)]
powerpc/eeh: Fix wrong printed PE number
On LE kernel, the non-existing PE number in BE format derived from
skiboot firmware isn't converted to LE format properly as following
kernel log indicates:
EEH: Clear non-existing PHB#4-PE#
200000000000000
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Mon, 20 Jul 2015 02:58:43 +0000 (08:28 +0530)]
powerpc/signal: Add helper function to fetch quad word aligned pointer
This patch adds one helper function 'sigcontext_vmx_regs' which computes
quad word aligned pointer for 'vmx_reserve' array element in sigcontext
structure making the code more readable.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Reword comment and fix build for CONFIG_ALTIVEC=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Mon, 6 Jul 2015 10:25:33 +0000 (15:55 +0530)]
powerpc/signal: Fix confusing header documentation in sigcontext.h
Commit
ce48b2100785 "powerpc: Add VSX context save/restore, ptrace and
signal support" expanded the 'vmx_reserve' array element to contain 101
double words, but the comment block above was not updated.
Also reorder the constants in the array size declaration to reflect the
logic mentioned in the comment block above. This change helps in
explaining how the HW registers are represented in the array. But no
functional change.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Reworded change log and added whitespace around +'s]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anshuman Khandual [Mon, 6 Jul 2015 10:54:10 +0000 (16:24 +0530)]
powerpc/tm: Drop tm_orig_msr from thread_struct
Currently tm_orig_msr is getting used during process context switch only.
Then there is ckpt_regs which saves the checkpointed userspace context
The MSR slot contained in ckpt_regs structure can be used during process
context switch instead of tm_orig_msr, thus allowing us to drop it from
thread_struct structure. This patch does that change.
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Johannes Thumshirn [Thu, 9 Jul 2015 07:39:42 +0000 (09:39 +0200)]
cxl: Destroy afu->contexts_idr on release of an afu
Destroy afu->contexts_idr on release of an afu, reclaiming the allocated
memory.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Johannes Thumshirn [Wed, 8 Jul 2015 15:14:36 +0000 (17:14 +0200)]
cxl: Destroy cxl_adapter_idr on module_exit
Destroy cxl_adapter_idr on module exit, reclaiming the allocated memory.
This was detected by the following semantic patch (written by Luis Rodriguez
<mcgrof@suse.com>)
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@
module_init(init);
@ defines_module_exit @
identifier exit;
@@
module_exit(exit);
@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@
DEFINE_IDR(idr);
@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
idr_destroy(&idr);
...
}
@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
+idr_destroy(&idr);
}
</SmPL>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Vipin K Parashar [Wed, 8 Jul 2015 11:06:01 +0000 (16:36 +0530)]
powerpc/powernv: Add poweroff (EPOW, DPO) events support for PowerNV platform
This patch adds support for OPAL EPOW (Environmental and Power Warnings)
and DPO (Delayed Power Off) events for the PowerNV platform. These events
are generated on FSP (Flexible Service Processor) based systems. EPOW
events are generated due to various critical system conditions that
require system shutdown. A few examples of these conditions are high
ambient temperature or system running on UPS power with low UPS battery.
DPO event is generated in response to admin initiated system shutdown
request. Upon receipt of EPOW and DPO events the host kernel invokes
orderly_poweroff() for performing graceful system shutdown.
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Acked-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Tue, 23 Jun 2015 07:01:13 +0000 (17:01 +1000)]
powerpc/powernv: Unfreeze VF PE on releasing it
When releasing PE for SRIOV VF, the PE is forced to be frozen
wrongly. When the same PE is picked for another VF, it won't
work anyhow. The patch fixes the issue by unfreezing, not
freezing the VF PE when releasing it.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Mon, 22 Jun 2015 03:45:47 +0000 (13:45 +1000)]
powerpc/powernv: Include VF PE in PELTV of PF PE
The PELTV of PF PE should include VF PE, which is missed by current
code, so that the VF PE is frozen automatically when freezing PF PE.
The patch fixes the PELTV of PF PE to include VF PE.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Fri, 19 Jun 2015 02:26:19 +0000 (12:26 +1000)]
powerpc/powernv: Pick M64 PEs based on BARs
On PHB3, PE might be reserved in advance to reflect the M64 segments
consumed by the PE according to M64 BARs (exclude VF BARs) of the PCI
devices included in the PE. The PE is picked based on M64 BARs instead
of the bridge's M64 windows, which might include VF BARs. Otherwise,
wrong PE could be picked.
The patch calculates the used M64 segments and PE numbers according to
the M64 BARs, excluding VF BARs, of PCI devices in one particular PE,
instead of the bridge's M64 windows. Then the right PE number is picked.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Fri, 19 Jun 2015 02:26:18 +0000 (12:26 +1000)]
powerpc/powernv: Boolean argument for pnv_ioda_setup_bus_PE()
The patch changes the type of last argument of pnv_ioda_setup_bus_PE()
and phb::pick_m64_pe() to boolean. No functional change.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Fri, 19 Jun 2015 02:26:17 +0000 (12:26 +1000)]
powerpc/powernv: Reserve M64 PEs based on BARs
On PHB3, some PEs might be reserved in advance to reflect the M64
segments consumed by those PEs. We're reserving PEs based on the
M64 window of root port, which might contain VF BAR. The PEs for
VFs are allocated dynamically, not reserved based on the consumed
M64 segments. So the M64 window of root port isn't reliable for
the task. Instead, we go through M64 BARs (VF BARs excluded) of
PCI devices under the specified root bus and reserve PEs accordingly,
as the patch does.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gavin Shan [Fri, 19 Jun 2015 02:26:16 +0000 (12:26 +1000)]
powerpc/powernv: Allow to reserve one PE for multiple times
The PE numbers are reserved according to root port's M64 window,
which is aligned to M64 segment finely. So one PE shouldn't be
reserved for multiple times. We will reserve PE numbers according
to the M64 BARs of PCI device in subsequent patches, which aren't
aligned to M64 segment size finely. It means one particular PE
could be reserved for multiple times.
The patch allows one PE to be reserved for multiple times and we
print the warning message at debugging level.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Tue, 7 Jul 2015 03:56:59 +0000 (13:56 +1000)]
powerpc: Remove mtmsrd(), use existing mtmsr()
mtmsr() does the right thing on 32bit and 64bit, so use it everywhere.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Fri, 29 Aug 2014 07:01:43 +0000 (17:01 +1000)]
powerpc: Add macros for the ibm_architecture_vec[] lengths
The encoding of the lengths in the ibm_architecture_vec array is
"interesting" to say the least. It's non-obvious how the number of bytes
we provide relates to the length value.
In fact we already got it wrong once, see
11e9ed43ca8a "Fix up
ibm_architecture_vec definition".
So add some macros to make it (hopefully) clearer. These at least have
the property that the integer present in the code is equal to the number
of bytes that follows it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Benjamin Herrenschmidt [Wed, 24 Jun 2015 05:25:31 +0000 (15:25 +1000)]
powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask
This patch adds the ability to the DMA direct ops to fallback to the IOMMU
ops for coherent alloc/free if the coherent mask of the device isn't
suitable for accessing the direct DMA space and the device also happens
to have an active IOMMU table.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 24 Jun 2015 05:25:27 +0000 (15:25 +1000)]
powerpc/iommu: Cleanup setting of DMA base/offset
Now that the table and the offset can co-exist, we no longer need
to flip/flop, we can just establish both once at boot time.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 24 Jun 2015 05:25:22 +0000 (15:25 +1000)]
powerpc/iommu: Remove dma_data union
To support "hybrid" DMA ops in a subsequent patch, we will need both
a direct DMA offset and an iommu pointer. Those are currently exclusive
(a union), so change them to be separate fields.
While there, also type iommu_table_base properly and make exist only
on CONFIG_PPC64 since it's not referenced on 32-bit at all.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rasmus Villemoes [Thu, 11 Jun 2015 11:27:52 +0000 (13:27 +0200)]
cxl: use more common format specifier
A precision of 16 (%.16llx) has the same effect as a field width of 16
along with passing the 0 flag (%016llx), but the latter is much more
common in the kernel tree. Update cxl to use that.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rasmus Villemoes [Thu, 11 Jun 2015 11:27:51 +0000 (13:27 +0200)]
cxl: Add explicit precision specifiers
C99 says that a precision given as simply '.' with no following digits
or * should be interpreted as 0. The kernel's printf implementation,
however, treats this case as if the precision was omitted. C99 also
says that if both the precision and value are 0, no digits should be
printed. Even if the kernel followed C99 to the letter, I don't think
that would be particularly useful in these cases. For consistency with
most other format strings in the file, use an explicit precision of 16
and add a 0x prefix.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Sun, 12 Jul 2015 22:10:30 +0000 (15:10 -0700)]
Linux 4.2-rc2
Linus Torvalds [Sun, 12 Jul 2015 22:00:20 +0000 (15:00 -0700)]
Revert "drm/i915: Use crtc_state->active in primary check_plane func"
This reverts commit
dec4f799d0a4c9edae20512fa60b0a36f3299ca2.
Jörg Otte reports a NULL pointder dereference due to this commit, as
'crtc_state' very much can be NULL:
crtc_state = state->base.state ?
intel_atomic_get_crtc_state(state->base.state, intel_crtc) : NULL;
So the change to test 'crtc_state->base.active' cannot possibly be
correct as-is.
There may be some other minimal fix (like just checking crtc_state for
NULL), but I'm just reverting it now for the rc2 release, and people
like Daniel Vetter who actually know this code will figure out what the
right solution is in the longer term.
Reported-and-bisected-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
CC: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 12 Jul 2015 21:09:36 +0000 (14:09 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
"Fixes for this cycle regression in overlayfs and a couple of
long-standing (== all the way back to 2.6.12, at least) bugs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
freeing unlinked file indefinitely delayed
fix a braino in ovl_d_select_inode()
9p: don't leave a half-initialized inode sitting around
Linus Torvalds [Sun, 12 Jul 2015 20:55:24 +0000 (13:55 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"A fair number of 4.2 fixes also because Markos opened the flood gates.
- Patch up the math used calculate the location for the page bitmap.
- The FDC (Not what you think, FDC stands for Fast Debug Channel) IRQ
around was causing issues on non-Malta platforms, so move the code
to a Malta specific location.
- A spelling fix replicated through several files.
- Fix to the emulation of an R2 instruction for R6 cores.
- Fix the JR emulation for R6.
- Further patching of mindless 64 bit issues.
- Ensure the kernel won't crash on CPUs with L2 caches with >= 8
ways.
- Use compat_sys_getsockopt for O32 ABI on 64 bit kernels.
- Fix cache flushing for multithreaded cores.
- A build fix"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: O32: Use compat_sys_getsockopt.
MIPS: c-r4k: Extend way_string array
MIPS: Pistachio: Support CDMM & Fast Debug Channel
MIPS: Malta: Make GIC FDC IRQ workaround Malta specific
MIPS: c-r4k: Fix cache flushing for MT cores
Revert "MIPS: Kconfig: Disable SMP/CPS for 64-bit"
MIPS: cps-vec: Use macros for various arithmetics and memory operations
MIPS: kernel: cps-vec: Replace KSEG0 with CKSEG0
MIPS: kernel: cps-vec: Use ta0-ta3 pseudo-registers for 64-bit
MIPS: kernel: cps-vec: Replace mips32r2 ISA level with mips64r2
MIPS: kernel: cps-vec: Replace 'la' macro with PTR_LA
MIPS: kernel: smp-cps: Fix 64-bit compatibility errors due to pointer casting
MIPS: Fix erroneous JR emulation for MIPS R6
MIPS: Fix branch emulation for BLTC and BGEC instructions
MIPS: kernel: traps: Fix broken indentation
MIPS: bootmem: Don't use memory holes for page bitmap
MIPS: O32: Do not handle require 32 bytes from the stack to be readable.
MIPS, CPUFREQ: Fix spelling of Institute.
MIPS: Lemote 2F: Fix build caused by recent mass rename.
Linus Torvalds [Sun, 12 Jul 2015 17:02:38 +0000 (10:02 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- the high latency PIT detection fix, which slipped through the cracks
for rc1
- a regression fix for the early printk mechanism
- the x86 part to plug irq/vector related hotplug races
- move the allocation of the espfix pages on cpu hotplug to non atomic
context. The current code triggers a might_sleep() warning.
- a series of KASAN fixes addressing boot crashes and usability
- a trivial typo fix for Kconfig help text
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kconfig: Fix typo in the CONFIG_CMDLINE_BOOL help text
x86/irq: Retrieve irq data after locking irq_desc
x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
x86/irq: Plug irq vector hotplug race
x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8'
x86/espfix: Init espfix on the boot CPU side
x86/espfix: Add 'cpu' parameter to init_espfix_ap()
x86/kasan: Move KASAN_SHADOW_OFFSET to the arch Kconfig
x86/kasan: Add message about KASAN being initialized
x86/kasan: Fix boot crash on AMD processors
x86/kasan: Flush TLBs after switching CR3
x86/kasan: Fix KASAN shadow region page tables
x86/init: Clear 'init_level4_pgt' earlier
x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()
Linus Torvalds [Sun, 12 Jul 2015 16:36:59 +0000 (09:36 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"This update from the timer departement contains:
- A series of patches which address a shortcoming in the tick
broadcast code.
If the broadcast device is not available or an hrtimer emulated
broadcast device, some of the original assumptions lead to boot
failures. I rather plugged all of the corner cases instead of only
addressing the issue reported, so the change got a little larger.
Has been extensivly tested on x86 and arm.
- Get rid of the last holdouts using do_posix_clock_monotonic_gettime()
- A regression fix for the imx clocksource driver
- An update to the new state callbacks mechanism for clockevents.
This is required to simplify the conversion, which will take place
in 4.3"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/broadcast: Prevent NULL pointer dereference
time: Get rid of do_posix_clock_monotonic_gettime
cris: Replace do_posix_clock_monotonic_gettime()
tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
tick/broadcast: Handle spurious interrupts gracefully
tick/broadcast: Check for hrtimer broadcast active early
tick/broadcast: Return busy when IPI is pending
tick/broadcast: Return busy if periodic mode and hrtimer broadcast
tick/broadcast: Move the check for periodic mode inside state handling
tick/broadcast: Prevent deep idle if no broadcast device available
tick/broadcast: Make idle check independent from mode and config
tick/broadcast: Sanity check the shutdown of the local clock_event
tick/broadcast: Prevent hrtimer recursion
clockevents: Allow set-state callbacks to be optional
clocksource/imx: Define clocksource for mx27
Linus Torvalds [Sun, 12 Jul 2015 16:15:02 +0000 (09:15 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for a cpu hotplug race vs. interrupt descriptors:
Prevent irq setup/teardown across the cpu starting/dying parts of cpu
hotplug so that the starting/dying cpu has a stable view of the
descriptor space. This has been an issue for all architectures in the
cpu dying phase, where interrupts are migrated away from the dying
cpu. In the starting phase its mostly a x86 issue vs the vector space
update"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hotplug: Prevent alloc/free of irq descriptors during cpu up/down
Al Viro [Wed, 8 Jul 2015 01:42:38 +0000 (02:42 +0100)]
freeing unlinked file indefinitely delayed
Normally opening a file, unlinking it and then closing will have
the inode freed upon close() (provided that it's not otherwise busy and
has no remaining links, of course). However, there's one case where that
does *not* happen. Namely, if you open it by fhandle with cold dcache,
then unlink() and close().
In normal case you get d_delete() in unlink(2) notice that dentry
is busy and unhash it; on the final dput() it will be forcibly evicted from
dcache, triggering iput() and inode removal. In this case, though, we end
up with *two* dentries - disconnected (created by open-by-fhandle) and
regular one (used by unlink()). The latter will have its reference to inode
dropped just fine, but the former will not - it's considered hashed (it
is on the ->s_anon list), so it will stay around until the memory pressure
will finally do it in. As the result, we have the final iput() delayed
indefinitely. It's trivial to reproduce -
void flush_dcache(void)
{
system("mount -o remount,rw /");
}
static char buf[20 * 1024 * 1024];
main()
{
int fd;
union {
struct file_handle f;
char buf[MAX_HANDLE_SZ];
} x;
int m;
x.f.handle_bytes = sizeof(x);
chdir("/root");
mkdir("foo", 0700);
fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
close(fd);
name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
flush_dcache();
fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
unlink("foo/bar");
write(fd, buf, sizeof(buf));
system("df ."); /* 20Mb eaten */
close(fd);
system("df ."); /* should've freed those 20Mb */
flush_dcache();
system("df ."); /* should be the same as #2 */
}
will spit out something like
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 283282 21692 93% /
- inode gets freed only when dentry is finally evicted (here we trigger
than by remount; normally it would've happened in response to memory
pressure hell knows when).
Cc: stable@vger.kernel.org # v2.6.38+; earlier ones need s/kill_it/unhash_it/
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 Jul 2015 14:39:45 +0000 (10:39 -0400)]
fix a braino in ovl_d_select_inode()
when opening a directory we want the overlayfs inode, not one from
the topmost layer.
Reported-By: Andrey Jr. Melnikov <temnota.am@gmail.com>
Tested-By: Andrey Jr. Melnikov <temnota.am@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 Jul 2015 14:34:29 +0000 (10:34 -0400)]
9p: don't leave a half-initialized inode sitting around
Cc: stable@vger.kernel.org # all branches
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sun, 12 Jul 2015 03:44:31 +0000 (20:44 -0700)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/djbw/nvdimm
Pull libnvdimm fixes from Dan Williams:
"1) Fixes for a handful of smatch reports (Thanks Dan C.!) and minor
bug fixes (patches 1-6)
2) Correctness fixes to the BLK-mode nvdimm driver (patches 7-10).
Granted these are slightly large for a -rc update. They have been
out for review in one form or another since the end of May and were
deferred from the merge window while we settled on the "PMEM API"
for the PMEM-mode nvdimm driver (ie memremap_pmem, memcpy_to_pmem,
and wmb_pmem).
Now that those apis are merged we implement them in the BLK driver
to guarantee that mmio aperture moves stay ordered with respect to
incoming read/write requests, and that writes are flushed through
those mmio-windows and platform-buffers to be persistent on media.
These pass the sub-system unit tests with the updates to
tools/testing/nvdimm, and have received a successful build-report from
the kbuild robot (468 configs).
With acks from Rafael for the touches to drivers/acpi/"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm:
nfit: add support for NVDIMM "latch" flag
nfit: update block I/O path to use PMEM API
tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
tools/testing/nvdimm: fix return code for unimplemented commands
tools/testing/nvdimm: mock ioremap_wt
pmem: add maintainer for include/linux/pmem.h
nfit: fix smatch "use after null check" report
nvdimm: Fix return value of nvdimm_bus_init() if class_create() fails
libnvdimm: smatch cleanups in __nd_ioctl
sparse: fix misplaced __pmem definition
Linus Torvalds [Sat, 11 Jul 2015 18:24:15 +0000 (11:24 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Mostly slight adjusments for new drivers, but also one core fix for
which finally the dependencies are now available as well"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: Mark instantiated device nodes with OF_POPULATE
i2c: jz4780: Fix return value if probe fails
i2c: xgene-slimpro: Fix missing mbox_free_channel call in probe error path
i2c: I2C_MT65XX should depend on HAS_DMA
Linus Torvalds [Sat, 11 Jul 2015 18:16:04 +0000 (11:16 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"A fix (revert) for a recent regression in Synaptics driver and a fix
for Elan i2c touchpad driver"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: synaptics - allocate 3 slots to keep stability in image sensors"
Input: elan_i2c - change the hover event from MT to ST
Linus Torvalds [Sat, 11 Jul 2015 18:08:21 +0000 (11:08 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A small set of fixes for problems found by smatch in new drivers that
we added this rc and a handful of driver fixes that came in during the
merge window"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
drivers: clk: st: Incorrect register offset used for lock_status
clk: mediatek: mt8173: Fix enabling of critical clocks
drivers: clk: st: Fix mux bit-setting for Cortex A9 clocks
drivers: clk: st: Add CLK_GET_RATE_NOCACHE flag to clocks
drivers: clk: st: Fix flexgen lock init
drivers: clk: st: Fix FSYN channel values
drivers: clk: st: Remove unused code
clk: qcom: Use parent rate when set rate to pixel RCG clock
clk: at91: do not leak resources
clk: stm32: Fix out-by-one error path in the index lookup
clk: iproc: fix bit manipulation arithmetic
clk: iproc: fix memory leak from clock name
Linus Torvalds [Sat, 11 Jul 2015 18:02:51 +0000 (11:02 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A bunch of fixes for radeon, intel, omap and one amdkfd fix.
Radeon fixes are all over, but it does fix some cursor corruption
across suspend/resume. i915 should fix the second warn you were
seeing, so let us know if not. omap is a bunch of small fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (28 commits)
drm/radeon: disable vce init on cayman (v2)
drm/amdgpu: fix timeout calculation
drm/radeon: check if BO_VA is set before adding it to the invalidation list
drm/radeon: allways add the VM clear duplicate
Revert "Revert "drm/radeon: dont switch vt on suspend""
drm/radeon: Fold radeon_set_cursor() into radeon_show_cursor()
drm/radeon: unpin cursor BOs on suspend and pin them again on resume (v2)
drm/radeon: Clean up reference counting and pinning of the cursor BOs
drm/amdkfd: validate pdd where it acquired first
Revert "drm/i915: Allocate context objects from stolen"
drm/i915: Declare the swizzling unknown for L-shaped configurations
drm/radeon: fix underflow in r600_cp_dispatch_texture()
drm/radeon: default to 2048 MB GART size on SI+
drm/radeon: fix HDP flushing
drm/radeon: use RCU query for GEM_BUSY syscall
drm/amdgpu: Handle irqs only based on irq ring, not irq status regs.
drm/radeon: Handle irqs only based on irq ring, not irq status regs.
drm/i915: Use crtc_state->active in primary check_plane func
drm/i915: Check crtc->active in intel_crtc_disable_planes
drm/i915: Restore all GGTT VMAs on resume
...
Linus Torvalds [Sat, 11 Jul 2015 17:38:10 +0000 (10:38 -0700)]
Merge branch 'for-linus2' of git://git./linux/kernel/git/jmorris/linux-security
Pull selinux fixes from James Morris.
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: fix mprotect PROT_EXEC regression caused by mm change
selinux: don't waste ebitmap space when importing NetLabel categories
Linus Torvalds [Sat, 11 Jul 2015 17:26:34 +0000 (10:26 -0700)]
Merge branch 'for-linus-4.2' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This is an assortment of fixes. Most of the commits are from Filipe
(fsync, the inode allocation cache and a few others). Mark kicked in
a series fixing corners in the extent sharing ioctls, and everyone
else fixed up on assorted other problems"
* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix wrong check for btrfs_force_chunk_alloc()
Btrfs: fix warning of bytes_may_use
Btrfs: fix hang when failing to submit bio of directIO
Btrfs: fix a comment in inode.c:evict_inode_truncate_pages()
Btrfs: fix memory corruption on failure to submit bio for direct IO
btrfs: don't update mtime/ctime on deduped inodes
btrfs: allow dedupe of same inode
btrfs: fix deadlock with extent-same and readpage
btrfs: pass unaligned length to btrfs_cmp_data()
Btrfs: fix fsync after truncate when no_holes feature is enabled
Btrfs: fix fsync xattr loss in the fast fsync path
Btrfs: fix fsync data loss after append write
Btrfs: fix crash on close_ctree() if cleaner starts new transaction
Btrfs: fix race between caching kthread and returning inode to inode cache
Btrfs: use kmem_cache_free when freeing entry in inode cache
Btrfs: fix race between balance and unused block group deletion
btrfs: add error handling for scrub_workers_get()
btrfs: cleanup noused initialization of dev in btrfs_end_bio()
btrfs: qgroup: allow user to clear the limitation on qgroup
Linus Torvalds [Sat, 11 Jul 2015 17:20:36 +0000 (10:20 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Kevin Hilman:
"A fairly random colletion of fixes based on -rc1 for OMAP, sunxi and
prima2 as well as a few arm64-specific DT fixes.
This series also includes a late to support a new Allwinner (sunxi)
SoC, but since it's rather simple and isolated to the
platform-specific code, it's included it for this -rc"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: dts: add device tree for ARM SMM-A53x2 on LogicTile Express 20MG
arm: dts: vexpress: add missing CCI PMU device node to TC2
arm: dts: vexpress: describe all PMUs in TC2 dts
GICv3: Add ITS entry to THUNDER dts
arm64: dts: Add poweroff button device node for APM X-Gene platform
ARM: dts: am4372.dtsi: disable rfbi
ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2
ARM: dts: am4372: Add emif node
Revert "ARM: dts: am335x-boneblack: disable RTC-only sleep"
ARM: sunxi: Enable simplefb in the defconfig
ARM: Remove deprecated symbol from defconfig files
ARM: sunxi: Add Machine support for A33
ARM: sunxi: Introduce Allwinner H3 support
Documentation: sunxi: Update Allwinner SoC documentation
ARM: prima2: move to use REGMAP APIs for rtciobrg
ARM: dts: atlas7: add pinctrl and gpio descriptions
ARM: OMAP2+: Remove unnessary return statement from the void function, omap2_show_dma_caps
memory: omap-gpmc: Fix parsing of devices
Thomas Gleixner [Sat, 11 Jul 2015 12:26:34 +0000 (14:26 +0200)]
tick/broadcast: Prevent NULL pointer dereference
Dan reported that the recent changes to the broadcast code introduced
a potential NULL dereference.
Add the proper check.
Fixes:
e0454311903d "tick/broadcast: Sanity check the shutdown of the local clock_event"
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Fri, 10 Jul 2015 23:54:37 +0000 (16:54 -0700)]
Merge branch 'parisc-4.2-1' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"We have one important patch from Dave Anglin and myself which fixes
PTE/TLB race conditions which caused random segmentation faults on our
debian buildd servers, and one patch from Alex Ivanov which speeds up
the graphical text console on the STI framebuffer driver"
* 'parisc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results
stifb: Implement hardware accelerated copyarea
James Morris [Fri, 10 Jul 2015 23:13:45 +0000 (09:13 +1000)]
Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/selinux into for-linus2
Stephen Smalley [Fri, 10 Jul 2015 13:40:59 +0000 (09:40 -0400)]
selinux: fix mprotect PROT_EXEC regression caused by mm change
commit
66fc13039422ba7df2d01a8ee0873e4ef965b50b ("mm: shmem_zero_setup
skip security check and lockdep conflict with XFS") caused a regression
for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
shared anonymous mappings. However, even before that regression, the
checking on such mprotect PROT_EXEC calls was inconsistent with the
checking on a mmap PROT_EXEC call for a shared anonymous mapping. On a
mmap, the security hook is passed a NULL file and knows it is dealing
with an anonymous mapping and therefore applies an execmem check and no
file checks. On a mprotect, the security hook is passed a vma with a
non-NULL vm_file (as this was set from the internally-created shmem
file during mmap) and therefore applies the file-based execute check
and no execmem check. Since the aforementioned commit now marks the
shmem zero inode with the S_PRIVATE flag, the file checks are disabled
and we have no checking at all on mprotect PROT_EXEC. Add a test to
the mprotect hook logic for such private inodes, and apply an execmem
check in that case. This makes the mmap and mprotect checking
consistent for shared anonymous mappings, as well as for /dev/zero and
ashmem.
Cc: <stable@vger.kernel.org> # 4.1.x
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Linus Torvalds [Fri, 10 Jul 2015 19:49:56 +0000 (12:49 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes and clean-up from Catalin Marinas:
- ACPI fix when checking the validity of the GICC MADT subtable
- handle debug exceptions in the el*_inv exception entries
- remove pointless register assignment in two compat syscall wrappers
- unnecessary include path
- defconfig update
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: entry32: remove pointless register assignment
arm64: entry: handle debug exceptions in el*_inv
arm64: Keep the ARM64 Kconfig selects sorted
ACPI / ARM64 : use the new BAD_MADT_GICC_ENTRY macro
ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro
arm64: defconfig: Add Ceva ahci to the defconfig
arm64: remove another unnecessary libfdt include path
John David Anglin [Wed, 1 Jul 2015 21:18:37 +0000 (17:18 -0400)]
parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results
The increased use of pdtlb/pitlb instructions seemed to increase the
frequency of random segmentation faults building packages. Further, we
had a number of cases where TLB inserts would repeatedly fail and all
forward progress would stop. The Haskell ghc package caused a lot of
trouble in this area. The final indication of a race in pte handling was
this syslog entry on sibaris (C8000):
swap_free: Unused swap offset entry
00000004
BUG: Bad page map in process mysqld pte:
00000100 pmd:
019bbec5
addr:
00000000ec464000 vm_flags:
00100073 anon_vma:
0000000221023828 mapping: (null) index:ec464
CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1
Backtrace:
[<
0000000040173eb0>] show_stack+0x20/0x38
[<
0000000040444424>] dump_stack+0x9c/0x110
[<
00000000402a0d38>] print_bad_pte+0x1a8/0x278
[<
00000000402a28b8>] unmap_single_vma+0x3d8/0x770
[<
00000000402a4090>] zap_page_range+0xf0/0x198
[<
00000000402ba2a4>] SyS_madvise+0x404/0x8c0
Note that the pte value is 0 except for the accessed bit 0x100. This bit
shouldn't be set without the present bit.
It should be noted that the madvise system call is probably a trigger for many
of the random segmentation faults.
In looking at the kernel code, I found the following problems:
1) The pte_clear define didn't take TLB lock when clearing a pte.
2) We didn't test pte present bit inside lock in exception support.
3) The pte and tlb locks needed to merged in order to ensure consistency
between page table and TLB. This also has the effect of serializing TLB
broadcasts on SMP systems.
The attached change implements the above and a few other tweaks to try
to improve performance. Based on the timing code, TLB purges are very
slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it
beneficial to test the split_tlb variable to avoid duplicate purges.
Probably, all PA 2.0 machines have combined TLBs.
I dropped using __flush_tlb_range in flush_tlb_mm as I realized all
applications and most threads have a stack size that is too large to
make this useful. I added some comments to this effect.
Since implementing 1 through 3, I haven't had any random segmentation
faults on mx3210 (rp3440) in about one week of building code and running
as a Debian buildd.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Helge Deller <deller@gmx.de>
Alex Ivanov [Mon, 15 Jun 2015 05:50:45 +0000 (08:50 +0300)]
stifb: Implement hardware accelerated copyarea
This patch adds hardware assisted scrolling. The code is based upon the
following investigation: https://parisc.wiki.kernel.org/index.php/NGLE#Blitter
A simple 'time ls -la /usr/bin' test shows 1.6x speed increase over soft
copy and 2.3x increase over FBINFO_READS_FAST (prefer soft copy over
screen redraw) on Artist framebuffer.
Signed-off-by: Alex Ivanov <lausgans@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Linus Torvalds [Fri, 10 Jul 2015 19:16:59 +0000 (12:16 -0700)]
Merge tag 'powerpc-4.2-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- opal-prd mmap fix from Vaidy
- set kernel taint for MCEs from Daniel
- alignment exception description from Anton
- ppc4xx_hsta_msi build fix from Daniel
- opal-elog interrupt fix from Alistair
- core_idle_state race fix from Shreyas
- hv-24x7 lockdep fix from Sukadev
- multiple cxl fixes from Daniel, Ian, Mikey & Maninder
- update MAINTAINERS to point at shared tree
* tag 'powerpc-4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
cxl: Check if afu is not null in cxl_slbia
powerpc: Update MAINTAINERS to point at shared tree
powerpc/perf/24x7: Fix lockdep warning
cxl: Fix off by one error allowing subsequent mmap page to be accessed
cxl: Fail mmap if requested mapping is larger than assigned problem state area
cxl: Fix refcounting in kernel API
powerpc/powernv: Fix race in updating core_idle_state
powerpc/powernv: Fix opal-elog interrupt handler
powerpc/ppc4xx_hsta_msi: Include ppc-pci.h to fix reference to hose_list
powerpc: Add plain English description for alignment exception oopses
cxl: Test the correct mmio space before unmapping
powerpc: Set the correct kernel taint on machine check errors
cxl/vphb.c: Use phb pointer after NULL check
powerpc/powernv: Fix vma page prot flags in opal-prd driver
Ross Zwisler [Fri, 10 Jul 2015 17:06:14 +0000 (11:06 -0600)]
nfit: add support for NVDIMM "latch" flag
Add support in the NFIT BLK I/O path for the "latch" flag
defined in the "Get Block NVDIMM Flags" _DSM function:
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
This flag requires the driver to read back the command register after it
is written in the block I/O path. This ensures that the hardware has
fully processed the new command and moved the aperture appropriately.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Ross Zwisler [Fri, 10 Jul 2015 17:06:13 +0000 (11:06 -0600)]
nfit: update block I/O path to use PMEM API
Update the nfit block I/O path to use the new PMEM API and to adhere to
the read/write flows outlined in the "NVDIMM Block Window Driver
Writer's Guide":
http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
This includes adding support for targeted NVDIMM flushes called "flush
hints" in the ACPI 6.0 specification:
http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
For performance and media durability the mapping for a BLK aperture is
moved to a write-combining mapping which is consistent with
memcpy_to_pmem() and wmb_blk().
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 10 Jul 2015 18:07:03 +0000 (14:07 -0400)]
tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
In preparation for fixing the BLK path to properly use "directed
pcommit" enable the unit test infrastructure to emit mock "flush"
tables. Writes to these flush addresses trigger a memory controller to
flush its internal buffers to persistent media, similar to the x86
"pcommit" instruction.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 10 Jul 2015 17:06:12 +0000 (11:06 -0600)]
tools/testing/nvdimm: fix return code for unimplemented commands
The implementation for the new "DIMM Flags" DSM relies on the -ENOTTY
return code to indicate that the flags are unimplimented and to fall
back to a safe default. As is the -ENXIO error code erroneoously
indicates to fail enabling a BLK region.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 10 Jul 2015 17:06:11 +0000 (11:06 -0600)]
tools/testing/nvdimm: mock ioremap_wt
In the 4.2-rc1 merge the default_memremap_pmem() implementation switched
from ioremap_nocache() to ioremap_wt(). Add it to the list of mocked
routines to restore the ability to run the unit tests.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>