Benjamin Herrenschmidt [Wed, 16 Aug 2017 06:01:14 +0000 (16:01 +1000)]
powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
VSX uses a combination of the old vector registers, the old FP
registers and new "second halves" of the FP registers.
Thus when we need to see the VSX state in the thread struct
(flush_vsx_to_thread()) or when we'll use the VSX in the kernel
(enable_kernel_vsx()) we need to ensure they are all flushed into
the thread struct if either of them is individually enabled.
Unfortunately we only tested if the whole VSX was enabled, not if they
were individually enabled.
Fixes: 72cd7b44bc99 ("powerpc: Uncomment and make enable_kernel_vsx() routine available")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 9 Aug 2017 12:41:26 +0000 (22:41 +1000)]
powerpc/watchdog: add locking around init/exit functions
When CPUs start and stop the watchdog, they manipulate shared data
that is normally protected by the lock. Other CPUs can be running
concurrently at this time, so it's a good idea to use locking here
to be on the safe side.
Remove the barrier which is undocumented and didn't do anything.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 9 Aug 2017 12:41:25 +0000 (22:41 +1000)]
powerpc/watchdog: Fix marking of stuck CPUs
When the SMP detector finds other CPUs stuck, it iterates over
them and marks them as stuck. This pulls them out of the pending
mask and allows the detector to continue with remaining good
CPUs (if nmi_watchdog=panic is not enabled).
The code to dothat was buggy because when setting a CPU stuck,
if the pending mask became empty, it resets it to keep the
watchdog running. However the iterator will continue to run
over the new pending mask and mark remaining good CPUs sas stuck.
Fix this by doing it with cpumask bitwise operations.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 9 Aug 2017 12:41:24 +0000 (22:41 +1000)]
powerpc/watchdog: Fix final-check recovered case
When the watchdog decides to panic, it takes the lock and double
checks everything (to avoid races with the CPU being unstuck or
panic()ed by something else).
The exit label was misplaced and would result in all-CPUs backtrace
and watchdog panic even in the case that the condition was found to be
resolved.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 9 Aug 2017 12:41:23 +0000 (22:41 +1000)]
powerpc/watchdog: Moderate touch_nmi_watchdog overhead
Some code can go into a tight loop calling touch_nmi_watchdog (e.g.,
stop_machine CPU hotplug code). This can cause contention on watchdog
locks particularly if all CPUs with watchdog enabled are spinning in
the loops.
Avoid this storm of activity by running the watchdog timer callback
from this path if we have exceeded the timer period since it was last
run.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 9 Aug 2017 12:41:22 +0000 (22:41 +1000)]
powerpc/watchdog: Improve watchdog lock primitive
- Hard-disable interrupts before taking the lock, which prevents
soft-NMI re-entrancy and therefore can prevent deadlocks.
- Use raw_ variants of local_irq_disable to avoid irq debugging.
- When the lock is contended, spin at low SMT priority, using
loads only, and with interrupts enabled (where possible).
Some stalls have been noticed at high loads that go away with improved
locking. There should not be so much locking contention in the first
place (which is addressed in a subsequent patch), but locking should
still be improved.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 9 Aug 2017 12:41:21 +0000 (22:41 +1000)]
powerpc: NMI IPI improve lock primitive
When the NMI IPI lock is contended, spin at low SMT priority, using
loads only, and with interrupts enabled (where possible). This
improves behaviour under high contention (e.g., a system crash when
a number of CPUs are trying to enter the debugger).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 9 Aug 2017 10:57:55 +0000 (20:57 +1000)]
powerpc/configs: Re-enable HARD/SOFT lockup detectors
In commit
05a4a9527931 ("kernel/watchdog: split up config options"),
CONFIG_LOCKUP_DETECTOR was split into two separate config options,
CONFIG_HARDLOCKUP_DETECTOR and CONFIG_SOFTLOCKUP_DETECTOR.
Our defconfigs still have CONFIG_LOCKUP_DETECTOR=y, but that is no longer
user selectable, and we don't mention the new options, so we end up with
none of them enabled.
So update the defconfigs to turn on the new SOFT and HARD options, the
end result being the same as what we had previously.
Fixes: 05a4a9527931 ("kernel/watchdog: split up config options")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gautham R. Shenoy [Tue, 8 Aug 2017 08:43:15 +0000 (14:13 +0530)]
powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails
Currently, we use the opal call opal_slw_set_reg() to inform the
Sleep-Winkle Engine (SLW) to restore the contents of some of the
Hypervisor state on wakeup from deep idle states that lose full
hypervisor context (characterized by the flag
OPAL_PM_LOSE_FULL_CONTEXT).
However, the current code has a bug in that if opal_slw_set_reg()
fails, we don't disable the use of these deep states (winkle on
POWER8, stop4 onwards on POWER9).
This patch fixes this bug by ensuring that if programing the
sleep-winkle engine to restore the hypervisor states in
pnv_save_sprs_for_deep_states() fails, then we exclude such states by
clearing the OPAL_PM_LOSE_FULL_CONTEXT flag from
supported_cpuidle_states. As a result POWER8 will be prevented from
using winkle for CPU-Hotplug, and POWER9 will put the offlined CPUs to
the default stop state when available.
Further, we ensure in the initialization of the cpuidle-powernv driver
to only include those states whose flags are present in
supported_cpuidle_states, thereby skipping OPAL_PM_LOSE_FULL_CONTEXT
states when they have been disabled due to stop-api failure.
Fixes: 1e1601b38e6 ("powerpc/powernv/idle: Restore SPRs for deep idle
states via stop API.")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 7 Aug 2017 11:25:01 +0000 (21:25 +1000)]
Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"
This reverts commit
bc4f65e4cf9d6cc43e0e9ba0b8648cf9201cd55f.
As reported by Andreas, this commit is causing unrecoverable SLB misses in the
system call exit path:
Unrecoverable exception 4100 at
c00000000000a1ec
Oops: Unrecoverable exception, sig: 6 [#1]
SMP NR_CPUS=2 PowerMac
...
CPU: 0 PID: 18626 Comm: rm Not tainted 4.13.0-rc3 #1
task:
c00000018335e080 task.stack:
c000000139e50000
NIP:
c00000000000a1ec LR:
c00000000000a118 CTR:
0000000000000000
REGS:
c000000139e53bb0 TRAP: 4100 Not tainted (4.13.0-rc3)
MSR:
9000000000001030 <SF,HV,ME,IR,DR> CR:
24000044 XER:
20000000 SOFTE: 1
GPR00:
0000000000000000 c000000139e53e30 c000000000abb500 fffffffffffffffe
GPR04:
c0000001eb866298 0000000000000000 0000000000000000 c00000018335e080
GPR08:
900000000000d032 0000000000000000 0000000000000002 fffffffffffff001
GPR12:
c000000139e50000 c00000000ffff000 00003fffa8c0dca0 00003fffa8c0dc88
GPR16:
0000000010000000 0000000000000001 00003fffa8c0eaa0 0000000000000000
GPR20:
00003fffa8c27528 00003fffa8c27b00 0000000000000000 0000000000000000
GPR24:
00003fffa8c0d918 00003ffff1b3efa0 00003fffa8c26d68 0000000000000000
GPR28:
00003fffa8c249e8 00003fffa8c263d0 00003fffa8c27550 00003ffff1b3ef10
NIP [
c00000000000a1ec] system_call_exit+0xc0/0x21c
LR [
c00000000000a118] system_call+0x58/0x6c
Call Trace:
[
c000000139e53e30] [
c00000000000a118] system_call+0x58/0x6c (unreliable)
Instruction dump:
64a51000 7c6300d0 f8a101a0 4bffff9c 3c000000 60000006 780007c6 64000000
60000000 7c004039 4082001c e8ed0170 <
88070b78>
88c70b79 7c003214 2c200000
This is caused by us trying to load THREAD_LOAD_FP with MSR_RI=0, and taking an
SLB miss on the thread struct.
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Diagnosed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 1 Aug 2017 13:59:28 +0000 (23:59 +1000)]
powerpc/64: Fix __check_irq_replay missing decrementer interrupt
If the decrementer wraps again and de-asserts the decrementer
exception while hard-disabled, __check_irq_replay() has a test to
notice the wrap when interrupts are re-enabled.
The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS
flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked
because the decrementer interrupt was always the first one checked
after clearing the hard disable flag, but HMI check was moved ahead of
that, which introduced this bug.
This can cause a missed decrementer interrupt if we soft-disable
interrupts then take an HMI which is recorded in irq_happened, then
hard-disable interrupts for > 4s to wrap the decrementer.
Fixes: e0e0d6b7390b ("powerpc/64: Replay hypervisor maintenance interrupt first")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 20 Jul 2017 01:53:22 +0000 (11:53 +1000)]
powerpc/perf: POWER9 PMU stops after idle workaround
POWER9 DD2 PMU can stop after a state-loss idle in some conditions.
A solution is to set then clear MMCRA[60] after wake from state-loss
idle. MMCRA[60] is a non-architected bit, see the user manual for
details.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Sergei Shtylyov [Sat, 29 Jul 2017 19:52:09 +0000 (22:52 +0300)]
powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
of_irq_to_resource() has recently been fixed to return negative error #'s
along with 0 in case of failure, however the Freescale MPC832x RDB board
code still only regards 0 as a failure indication -- fix it up.
Fixes: 7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Sat, 29 Jul 2017 12:50:27 +0000 (22:50 +1000)]
powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
The watchdog soft-NMI exception stack setup loads a stack pointer
twice, which is an obvious error. It ends up using the system reset
interrupt (true-NMI) stack, which is also a bug because the watchdog
could be preempted by a system reset interrupt that overwrites the
NMI stack.
Change the soft-NMI to use the "emergency stack". The current kernel
stack is not used, because of the longer-term goal to prevent
asynchronous stack access using soft-disable.
Fixes: 2104180a5369 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 31 Jul 2017 10:20:29 +0000 (20:20 +1000)]
Merge tag 'v4.13-rc1' into fixes
The fixes branch is based off a random pre-rc1 commit, because we had
some fixes that needed to go in before rc1 was released.
However we now need to fix some code that went in after that point, but
before rc1, so merge rc1 to get that code into fixes so we can fix it!
Alistair Popple [Wed, 26 Jul 2017 05:26:40 +0000 (15:26 +1000)]
powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
Commit
8e3f1b1d8255 ("powerpc/powernv/pci: Enable 64-bit devices to access
>4GB DMA space") introduced the ability for PCI device drivers to request a
DMA mask between 64 and 32 bits and actually get a mask greater than
32-bits. However currently if certain machine configuration dependent
conditions are not meet the code silently falls back to a 32-bit mask.
This makes it hard for device drivers to detect which mask they actually
got. Instead we should return an error when the request could not be
fulfilled which allows drivers to either fallback or implement other
workarounds as documented in DMA-API-HOWTO.txt.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 26 Jul 2017 13:19:04 +0000 (23:19 +1000)]
powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
Historically the boot wrapper was always built 32-bit big endian, even
for 64-bit kernels. That was because old firmwares didn't necessarily
support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile
uses CROSS32CC for compilation.
However when we added 64-bit little endian support, we also added
support for building the boot wrapper 64-bit. However we kept using
CROSS32CC, because in most cases it is just CC and everything works.
However if the user doesn't specify CROSS32_COMPILE (which no one ever
does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC
becomes just "gcc". On native systems that is probably OK, but if we're
cross building it definitely isn't, leading to eg:
gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c
gcc: error: unrecognized argument in option ‘-mabi=elfv2’
gcc: error: unrecognized command line option ‘-mlittle-endian’
make: *** [zImage] Error 2
To fix it, stop using CROSS32CC, because we may or may not be building
32-bit. Instead setup a BOOTCC, which defaults to CC, and only use
CROSS32_COMPILE if it's set and we're building for 32-bit.
Fixes: 147c05168fc8 ("powerpc/boot: Add support for 64bit little endian wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Michael Ellerman [Thu, 27 Jul 2017 13:23:37 +0000 (23:23 +1000)]
powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
In smp_cpus_done() we need to call smp_ops->setup_cpu() for the boot
CPU, which means it has to run *on* the boot CPU.
In the past we ensured it ran on the boot CPU by changing the CPU
affinity mask of current directly. That was removed in commit
6d11b87d55eb ("powerpc/smp: Replace open coded task affinity logic"),
and replaced with a work queue call.
Unfortunately using a work queue leads to a lockdep warning, now that
the CPU hotplug lock is a regular semaphore:
======================================================
WARNING: possible circular locking dependency detected
...
kworker/0:1/971 is trying to acquire lock:
(cpu_hotplug_lock.rw_sem){++++++}, at: [<
c000000000100974>] apply_workqueue_attrs+0x34/0xa0
but task is already holding lock:
((&wfc.work)){+.+.+.}, at: [<
c0000000000fdb2c>] process_one_work+0x25c/0x800
...
CPU0 CPU1
---- ----
lock((&wfc.work));
lock(cpu_hotplug_lock.rw_sem);
lock((&wfc.work));
lock(cpu_hotplug_lock.rw_sem);
Although the deadlock can't happen in practice, because
smp_cpus_done() only runs in early boot before CPU hotplug is allowed,
lockdep can't tell that.
Luckily in commit
8fb12156b8db ("init: Pin init task to the boot CPU,
initially") tglx changed the generic code to pin init to the boot CPU
to begin with. The unpinning of init from the boot CPU happens in
sched_init_smp(), which is called after smp_cpus_done().
So smp_cpus_done() is always called on the boot CPU, which means we
don't need the work queue call at all - and the lockdep warning goes
away.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Gustavo Romero [Wed, 19 Jul 2017 05:44:13 +0000 (01:44 -0400)]
powerpc/tm: Fix saving of TM SPRs in core dump
Currently flush_tmregs_to_thread() does not save the TM SPRs (TFHAR,
TFIAR, TEXASR) to the thread struct, unless the process is currently
inside a suspended transaction.
If the process is core dumping, and the TM SPRs have changed since the
last time the process was context switched, then we will save stale
values of the TM SPRs to the core dump.
Fix it by saving the live register state to the thread struct in that
case.
Fixes: 08e1c01d6aed ("powerpc/ptrace: Enable support for TM SPR state")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Oliver O'Halloran [Thu, 27 Jul 2017 15:35:53 +0000 (01:35 +1000)]
powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
The Radix MMU translation tree as defined in ISA v3.0 contains two
different types of entry, directories and leaves. Leaves are
identified by _PAGE_PTE being set.
The formats of the two entries are different, with the directory
entries containing no spare bits for use by software. In particular
the bit we use for _PAGE_DEVMAP is not reserved for software, and is
part of the NLB (Next Level Base) field, essentially the address of
the next level in the tree.
Note that the Linux pte_t is not == _PAGE_PTE. A huge page pmd
entry (or devmap!) is also a leaf and so has _PAGE_PTE set, even
though we use a pmd_t for it in Linux.
The fix is to ensure that the pmd/pte_devmap() confirm they are
looking at a leaf entry (_PAGE_PTE) as well as checking _PAGE_DEVMAP.
Fixes: ebd31197931d ("powerpc/mm: Add devmap support for ppc64")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Add a comment in the code and flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Aneesh Kumar K.V [Sat, 17 Jun 2017 14:30:55 +0000 (20:00 +0530)]
powerpc/mm/hash: Free the subpage_prot_table correctly
Fixes: dad6f37c2602e ("powerpc: subpage_protect: Increase the array size to take care of 64TB")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Wed, 26 Jul 2017 05:00:42 +0000 (15:00 +1000)]
powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain
In commit
efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions
for modules"), we added an ld version check early in the powerpc
top-level Makefile.
Because the Makefile runs before the kernel config is setup, the
checks for CONFIG_CPU_LITTLE_ENDIAN etc. all take the default case. So
we end up configuring ld for 32-bit big endian.
That would be OK, except that for historical (or perhaps no) reason,
we use 'override LD' to add the endian flags to the LD variable
itself, rather than the normal approach of adding them to LDFLAGS.
The end result is that when we check the ld version we run it as:
$(CROSS_COMPILE)ld -EB -m elf32ppc --version
This often works, unless you are using a 64-bit only and/or little
endian only, toolchain. In which case you see something like:
$ make defconfig
powerpc64le-linux-ld: unrecognised emulation mode: elf32ppc
Supported emulations: elf64lppc elf32lppc elf32lppclinux elf32lppcsim
/bin/sh: 1: [: -ge: unexpected operator
The proper fix is to stop using 'override LD', but that will require a
fair bit of testing. Instead we can fix it for now just by reordering
the Makefile to do the version check earlier.
Fixes: efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions for modules")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Laurent Vivier [Fri, 21 Jul 2017 14:51:39 +0000 (16:51 +0200)]
powerpc/pseries: Fix of_node_put() underflow during reconfig remove
As for commit
68baf692c435 ("powerpc/pseries: Fix of_node_put()
underflow during DLPAR remove"), the call to of_node_put() must be
removed from pSeries_reconfig_remove_node().
dlpar_detach_node() and pSeries_reconfig_remove_node() both call
of_detach_node(), and thus the node should not be released in both
cases.
Fixes: 0829f6d1f69e ("of: device_node kobject lifecycle fixes")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Mon, 24 Jul 2017 04:26:06 +0000 (14:26 +1000)]
powerpc/mm/radix: Workaround prefetch issue with KVM
There's a somewhat architectural issue with Radix MMU and KVM.
When coming out of a guest with AIL (Alternate Interrupt Location, ie,
MMU enabled), we start executing hypervisor code with the PID register
still containing whatever the guest has been using.
The problem is that the CPU can (and will) then start prefetching or
speculatively load from whatever host context has that same PID (if
any), thus bringing translations for that context into the TLB, which
Linux doesn't know about.
This can cause stale translations and subsequent crashes.
Fixing this in a way that is neither racy nor a huge performance
impact is difficult. We could just make the host invalidations always
use broadcast forms but that would hurt single threaded programs for
example.
We chose to fix it instead by partitioning the PID space between guest
and host. This is possible because today Linux only use 19 out of the
20 bits of PID space, so existing guests will work if we make the host
use the top half of the 20 bits space.
We additionally add support for a property to indicate to Linux the
size of the PID register which will be useful if we eventually have
processors with a larger PID space available.
There is still an issue with malicious guests purposefully setting the
PID register to a value in the hosts PID range. Hopefully future HW
can prevent that, but in the meantime, we handle it with a pair of
kludges:
- On the way out of a guest, before we clear the current VCPU in the
PACA, we check the PID and if it's outside of the permitted range
we flush the TLB for that PID.
- When context switching, if the mm is "new" on that CPU (the
corresponding bit was set for the first time in the mm cpumask), we
check if any sibling thread is in KVM (has a non-NULL VCPU pointer
in the PACA). If that is the case, we also flush the PID for that
CPU (core).
This second part is needed to handle the case where a process is
migrated (or starts a new pthread) on a sibling thread of the CPU
coming out of KVM, as there's a window where stale translations can
exist before we detect it and flush them out.
A future optimization could be added by keeping track of whether the
PID has ever been used and avoid doing that for completely fresh PIDs.
We could similarily mark PIDs that have been the subject of a global
invalidation as "fresh". But for now this will do.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
unneeded include of kvm_book3s_asm.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Fri, 14 Jul 2017 06:51:23 +0000 (16:51 +1000)]
powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y
Currently even with STRICT_KERNEL_RWX we leave the __init text marked
executable after init, which is bad.
Add a hook to mark it NX (no-execute) before we free it, and implement
it for radix and hash.
Note that we use __init_end as the end address, not _einittext,
because overlaps_kernel_text() uses __init_end, because there are
additional executable sections other than .init.text between
__init_begin and __init_end.
Tested on radix and hash with:
0:mon> p $__init_begin
*** 400 exception occurred
Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Fri, 14 Jul 2017 06:51:22 +0000 (16:51 +1000)]
powerpc/mm/hash: Refactor hash__mark_rodata_ro()
Move the core logic into a helper, so we can use it for changing other
permissions.
We also change the logic to align start down, and end up. This means
calling the function with a range will expand that range to be at
least 1 mmu_linear_psize page in size. We need that so we can use it
on __init_begin ... __init_end which is not a full page in size.
This should always work for _stext/__init_begin, because we align
__init_begin to _stext + 16M in the linker script.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Fri, 14 Jul 2017 06:51:21 +0000 (16:51 +1000)]
powerpc/mm/radix: Refactor radix__mark_rodata_ro()
Move the core logic into a helper, so we can use it for changing permissions
other than _PAGE_WRITE.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 18 Jul 2017 05:32:44 +0000 (15:32 +1000)]
powerpc/64s: Fix hypercall entry clobbering r12 input
A previous optimisation incorrectly assumed the PAPR hcall does
not use r12, and clobbers it upon entry. In fact it is used as
an input. This can result in KVM guests crashing (observed with
PR KVM).
Instead of using r12 to save r13, tihs patch saves r13 in ctr.
This is more costly, but not as slow as using the SPRG.
Fixes: acd7d8cef0153 ("powerpc/64s: Optimize hypercall/syscall entry")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Mon, 10 Jul 2017 06:19:38 +0000 (16:19 +1000)]
powerpc/perf: Avoid spurious PMU interrupts after idle
POWER9 DD2 can see spurious PMU interrupts after state-loss idle in
some conditions.
A solution is to save and reload MMCR0 over state-loss idle.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 17 Jul 2017 11:19:01 +0000 (21:19 +1000)]
powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
In commit
1c0eaf0f56d6 ("powerpc/powernv: Tell OPAL about our MMU mode
on POWER9"), we added additional flags to the OPAL call to configure
CPUs at boot.
These flags only work on Power9 firmwares, and worse can cause boot
failures on Power8 machines, so we check for CPU_FTR_ARCH_300 (aka POWER9)
before adding the extra flags.
Unfortunately we forgot that opal_configure_cores() is called before
the CPU feature checks are dynamically patched, meaning the check
always returns true.
We definitely need to do something to make the CPU feature checks less
prone to bugs like this, but for now the minimal fix is to use
early_cpu_has_feature().
Reported-and-tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Fixes: 1c0eaf0f56d6 ("powerpc/powernv: Tell OPAL about our MMU mode on POWER9")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Sat, 15 Jul 2017 22:22:10 +0000 (15:22 -0700)]
Linux v4.13-rc1
Linus Torvalds [Sat, 15 Jul 2017 19:58:58 +0000 (12:58 -0700)]
Merge tag 'standardize-docs' of git://git.lwn.net/linux
Pull documentation format standardization from Jonathan Corbet:
"This series converts a number of top-level documents to the RST format
without incorporating them into the Sphinx tree. The hope is to bring
some uniformity to kernel documentation and, perhaps more importantly,
have our existing docs serve as an example of the desired formatting
for those that will be added later.
Mauro has gone through and fixed up a lot of top-level documentation
files to make them conform to the RST format, but without moving or
renaming them in any way. This will help when we incorporate the ones
we want to keep into the Sphinx doctree, but the real purpose is to
bring a bit of uniformity to our documentation and let the top-level
docs serve as examples for those writing new ones"
* tag 'standardize-docs' of git://git.lwn.net/linux: (84 commits)
docs: kprobes.txt: Fix whitespacing
tee.txt: standardize document format
cgroup-v2.txt: standardize document format
dell_rbu.txt: standardize document format
zorro.txt: standardize document format
xz.txt: standardize document format
xillybus.txt: standardize document format
vfio.txt: standardize document format
vfio-mediated-device.txt: standardize document format
unaligned-memory-access.txt: standardize document format
this_cpu_ops.txt: standardize document format
svga.txt: standardize document format
static-keys.txt: standardize document format
smsc_ece1099.txt: standardize document format
SM501.txt: standardize document format
siphash.txt: standardize document format
sgi-ioc4.txt: standardize document format
SAK.txt: standardize document format
rpmsg.txt: standardize document format
robust-futexes.txt: standardize document format
...
Linus Torvalds [Sat, 15 Jul 2017 19:44:02 +0000 (12:44 -0700)]
Merge tag 'random_for_linus' of git://git./linux/kernel/git/tytso/random
Pull random updates from Ted Ts'o:
"Add wait_for_random_bytes() and get_random_*_wait() functions so that
callers can more safely get random bytes if they can block until the
CRNG is initialized.
Also print a warning if get_random_*() is called before the CRNG is
initialized. By default, only one single-line warning will be printed
per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
warning will be printed for each function which tries to get random
bytes before the CRNG is initialized. This can get spammy for certain
architecture types, so it is not enabled by default"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: reorder READ_ONCE() in get_random_uXX
random: suppress spammy warnings about unseeded randomness
random: warn when kernel uses unseeded randomness
net/route: use get_random_int for random counter
net/neighbor: use get_random_u32 for 32-bit hash random
rhashtable: use get_random_u32 for hash_rnd
ceph: ensure RNG is seeded before using
iscsi: ensure RNG is seeded before use
cifs: use get_random_u32 for 32-bit lock random
random: add get_random_{bytes,u32,u64,int,long,once}_wait family
random: add wait_for_random_bytes() API
Linus Torvalds [Sat, 15 Jul 2017 19:00:42 +0000 (12:00 -0700)]
Merge branch 'work.mount' of git://git./linux/kernel/git/viro/vfs
Pull ->s_options removal from Al Viro:
"Preparations for fsmount/fsopen stuff (coming next cycle). Everything
gets moved to explicit ->show_options(), killing ->s_options off +
some cosmetic bits around fs/namespace.c and friends. Basically, the
stuff needed to work with fsmount series with minimum of conflicts
with other work.
It's not strictly required for this merge window, but it would reduce
the PITA during the coming cycle, so it would be nice to have those
bits and pieces out of the way"
* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
isofs: Fix isofs_show_options()
VFS: Kill off s_options and helpers
orangefs: Implement show_options
9p: Implement show_options
isofs: Implement show_options
afs: Implement show_options
affs: Implement show_options
befs: Implement show_options
spufs: Implement show_options
bpf: Implement show_options
ramfs: Implement show_options
pstore: Implement show_options
omfs: Implement show_options
hugetlbfs: Implement show_options
VFS: Don't use save/replace_mount_options if not using generic_show_options
VFS: Provide empty name qstr
VFS: Make get_filesystem() return the affected filesystem
VFS: Clean up whitespace in fs/namespace.c and fs/super.c
Provide a function to create a NUL-terminated string from unterminated data
Linus Torvalds [Sat, 15 Jul 2017 18:47:27 +0000 (11:47 -0700)]
Merge branch 'work.__copy_to_user' of git://git./linux/kernel/git/viro/vfs
Pull more __copy_.._user elimination from Al Viro.
* 'work.__copy_to_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
drm_dp_aux_dev: switch to read_iter/write_iter
Linus Torvalds [Sat, 15 Jul 2017 18:17:52 +0000 (11:17 -0700)]
Merge branch 'work.uaccess-unaligned' of git://git./linux/kernel/git/viro/vfs
Pull uacess-unaligned removal from Al Viro:
"That stuff had just one user, and an exotic one, at that - binfmt_flat
on arm and m68k"
* 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kill {__,}{get,put}_user_unaligned()
binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail
Linus Torvalds [Sat, 15 Jul 2017 18:06:17 +0000 (11:06 -0700)]
Merge branch 'misc.compat' of git://git./linux/kernel/git/viro/vfs
Pull network field-by-field copy-in updates from Al Viro:
"This part of the misc compat queue was held back for review from
networking folks and since davem has jus ACKed those..."
* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
get_compat_bpf_fprog(): don't copyin field-by-field
get_compat_msghdr(): get rid of field-by-field copyin
copy_msghdr_from_user(): get rid of field-by-field copyin
Linus Torvalds [Sat, 15 Jul 2017 17:59:54 +0000 (10:59 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"Boston platform support:
- Document DT bindings
- Add CLK driver for board clocks
CM:
- Avoid per-core locking with CM3 & higher
- WARN on attempt to lock invalid VP, not BUG
CPS:
- Select CONFIG_SYS_SUPPORTS_SCHED_SMT for MIPSr6
- Prevent multi-core with dcache aliasing
- Handle cores not powering down more gracefully
- Handle spurious VP starts more gracefully
DSP:
- Add lwx & lhx missaligned access support
eBPF:
- Add MIPS support along with many supporting change to add the
required infrastructure
Generic arch code:
- Misc sysmips MIPS_ATOMIC_SET fixes
- Drop duplicate HAVE_SYSCALL_TRACEPOINTS
- Negate error syscall return in trace
- Correct forced syscall errors
- Traced negative syscalls should return -ENOSYS
- Allow samples/bpf/tracex5 to access syscall arguments for sane
traces
- Cleanup from old Kconfig options in defconfigs
- Fix PREF instruction usage by memcpy for MIPS R6
- Fix various special cases in the FPU eulation
- Fix some special cases in MIPS16e2 support
- Fix MIPS I ISA /proc/cpuinfo reporting
- Sort MIPS Kconfig alphabetically
- Fix minimum alignment requirement of IRQ stack as required by
ABI / GCC
- Fix special cases in the module loader
- Perform post-DMA cache flushes on systems with MAARs
- Probe the I6500 CPU
- Cleanup cmpxchg and add support for 1 and 2 byte operations
- Use queued read/write locks (qrwlock)
- Use queued spinlocks (qspinlock)
- Add CPU shared FTLB feature detection
- Handle tlbex-tlbp race condition
- Allow storing pgd in C0_CONTEXT for MIPSr6
- Use current_cpu_type() in m4kc_tlbp_war()
- Support Boston in the generic kernel
Generic platform:
- yamon-dt: Pull YAMON DT shim code out of SEAD-3 board
- yamon-dt: Support > 256MB of RAM
- yamon-dt: Use serial* rather than uart* aliases
- Abstract FDT fixup application
- Set RTC_ALWAYS_BCD to 0
- Add a MAINTAINERS entry
core kernel:
- qspinlock.c: include linux/prefetch.h
Loongson 3:
- Add support
Perf:
- Add I6500 support
SEAD-3:
- Remove GIC timer from DT
- Set interrupt-parent per-device, not at root node
- Fix GIC interrupt specifiers
SMP:
- Skip IPI setup if we only have a single CPU
VDSO:
- Make comment match reality
- Improvements to time code in VDSO"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (86 commits)
locking/qspinlock: Include linux/prefetch.h
MIPS: Fix MIPS I ISA /proc/cpuinfo reporting
MIPS: Fix minimum alignment requirement of IRQ stack
MIPS: generic: Support MIPS Boston development boards
MIPS: DTS: img: Don't attempt to build-in all .dtb files
clk: boston: Add a driver for MIPS Boston board clocks
dt-bindings: Document img,boston-clock binding
MIPS: Traced negative syscalls should return -ENOSYS
MIPS: Correct forced syscall errors
MIPS: Negate error syscall return in trace
MIPS: Drop duplicate HAVE_SYSCALL_TRACEPOINTS select
MIPS16e2: Provide feature overrides for non-MIPS16 systems
MIPS: MIPS16e2: Report ASE presence in /proc/cpuinfo
MIPS: MIPS16e2: Subdecode extended LWSP/SWSP instructions
MIPS: MIPS16e2: Identify ASE presence
MIPS: VDSO: Fix a mismatch between comment and preprocessor constant
MIPS: VDSO: Add implementation of gettimeofday() fallback
MIPS: VDSO: Add implementation of clock_gettime() fallback
MIPS: VDSO: Fix conversions in do_monotonic()/do_monotonic_coarse()
MIPS: Use current_cpu_type() in m4kc_tlbp_war()
...
Linus Torvalds [Sat, 15 Jul 2017 17:49:33 +0000 (10:49 -0700)]
Merge branch 'for-linus-4.13-rc1' of git://git./linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
"Mostly fixes for UML:
- First round of fixes for PTRACE_GETRESET/SETREGSET
- A printf vs printk cleanup
- Minor improvements"
* 'for-linus-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Correctly check for PTRACE_GETRESET/SETREGSET
um: v2: Use generic NOTES macro
um: Add kerneldoc for userspace_tramp() and start_userspace()
um: Add kerneldoc for segv_handler
um: stub-data.h: remove superfluous include
um: userspace - be more verbose in ptrace set regs error
um: add dummy ioremap and iounmap functions
um: Allow building and running on older hosts
um: Avoid longjmp/setjmp symbol clashes with libpthread.a
um: console: Ignore console= option
um: Use os_warn to print out pre-boot warning/error messages
um: Add os_warn() for pre-boot warning/error messages
um: Use os_info for the messages on normal path
um: Add os_info() for pre-boot information messages
um: Use printk instead of printf in make_uml_dir
Linus Torvalds [Sat, 15 Jul 2017 17:46:14 +0000 (10:46 -0700)]
Merge tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifs
Pull UBIFS updates from Richard Weinberger:
- Updates and fixes for the file encryption mode
- Minor improvements
- Random fixes
* tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifs:
ubifs: Set double hash cookie also for RENAME_EXCHANGE
ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs
ubifs: Don't leak kernel memory to the MTD
ubifs: Change gfp flags in page allocation for bulk read
ubifs: Fix oops when remounting with no_bulk_read.
ubifs: Fail commit if TNC is obviously inconsistent
ubifs: allow userspace to map mounts to volumes
ubifs: Wire-up statx() support
ubifs: Remove dead code from ubifs_get_link()
ubifs: Massage debug prints wrt. fscrypt
ubifs: Add assert to dent_key_init()
ubifs: Fix unlink code wrt. double hash lookups
ubifs: Fix data node size for truncating uncompressed nodes
ubifs: Don't encrypt special files on creation
ubifs: Fix memory leak in RENAME_WHITEOUT error path in do_rename
ubifs: Fix inode data budget in ubifs_mknod
ubifs: Correctly evict xattr inodes
ubifs: Unexport ubifs_inode_slab
ubifs: don't bother checking for encryption key in ->mmap()
ubifs: require key for truncate(2) of encrypted file
Linus Torvalds [Sat, 15 Jul 2017 17:18:16 +0000 (10:18 -0700)]
Merge tag 'kvm-4.13-2' of git://git./virt/kvm/kvm
Pull more KVM updates from Radim Krčmář:
"Second batch of KVM updates for v4.13
Common:
- add uevents for VM creation/destruction
- annotate and properly access RCU-protected objects
s390:
- rename IOCTL added in the first v4.13 merge
x86:
- emulate VMLOAD VMSAVE feature in SVM
- support paravirtual asynchronous page fault while nested
- add Hyper-V userspace interfaces for better migration
- improve master clock corner cases
- extend internal error reporting after EPT misconfig
- correct single-stepping of emulated instructions in SVM
- handle MCE during VM entry
- fix nVMX VM entry checks and nVMX VMCS shadowing"
* tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
kvm: x86: hyperv: make VP_INDEX managed by userspace
KVM: async_pf: Let guest support delivery of async_pf from guest mode
KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2
KVM: x86: make backwards_tsc_observed a per-VM variable
KVM: trigger uevents when creating or destroying a VM
KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
KVM: SVM: Rename lbr_ctl field in the vmcb control area
KVM: SVM: Prepare for new bit definition in lbr_ctl
KVM: SVM: handle singlestep exception when skipping emulated instructions
KVM: x86: take slots_lock in kvm_free_pit
KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition
kvm: vmx: Properly handle machine check during VM-entry
KVM: x86: update master clock before computing kvmclock_offset
kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields
kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls
kvm: nVMX: Validate the I/O bitmaps on nested VM-entry
...
Sebastian Andrzej Siewior [Fri, 30 Jun 2017 14:37:13 +0000 (16:37 +0200)]
random: reorder READ_ONCE() in get_random_uXX
Avoid the READ_ONCE in commit
4a072c71f49b ("random: silence compiler
warnings and fix race") if we can leave the function after
arch_get_random_XXX().
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Thu, 8 Jun 2017 08:16:59 +0000 (04:16 -0400)]
random: suppress spammy warnings about unseeded randomness
Unfortunately, on some models of some architectures getting a fully
seeded CRNG is extremely difficult, and so this can result in dmesg
getting spammed for a surprisingly long time. This is really bad from
a security perspective, and so architecture maintainers really need to
do what they can to get the CRNG seeded sooner after the system is
booted. However, users can't do anything actionble to address this,
and spamming the kernel messages log will only just annoy people.
For developers who want to work on improving this situation,
CONFIG_WARN_UNSEEDED_RANDOM has been renamed to
CONFIG_WARN_ALL_UNSEEDED_RANDOM. By default the kernel will always
print the first use of unseeded randomness. This way, hopefully the
security obsessed will be happy that there is _some_ indication when
the kernel boots there may be a potential issue with that architecture
or subarchitecture. To see all uses of unseeded randomness,
developers can enable CONFIG_WARN_ALL_UNSEEDED_RANDOM.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sat, 15 Jul 2017 05:57:32 +0000 (22:57 -0700)]
Merge tag 'xfs-4.13-merge-6' of git://git./fs/xfs/xfs-linux
Pull XFS fixes from Darrick Wong:
"Largely debugging and regression fixes.
- Add some locking assertions for the _ilock helpers.
- Revert the XFS_QMOPT_NOLOCK patch; after discussion with hch the
online fsck patch that would have needed it has been redesigned and
no longer needs it.
- Fix behavioral regression of SEEK_HOLE/DATA with negative offsets
to match 4.12-era XFS behavior"
* tag 'xfs-4.13-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: in iomap seek_{hole,data}, return -ENXIO for negative offsets
Revert "xfs: grab dquots without taking the ilock"
xfs: assert locking precondition in xfs_readlink_bmap_ilocked
xfs: assert locking precondіtion in xfs_attr_list_int_ilocked
xfs: fixup xfs_attr_get_ilocked
Linus Torvalds [Sat, 15 Jul 2017 05:55:52 +0000 (22:55 -0700)]
Merge branch 'for-4.13-part2' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We've identified and fixed a silent corruption (introduced by code in
the first pull), a fixup after the blk_status_t merge and two fixes to
incremental send that Filipe has been hunting for some time"
* 'for-4.13-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix unexpected return value of bio_readpage_error
btrfs: btrfs_create_repair_bio never fails, skip error handling
btrfs: cloned bios must not be iterated by bio_for_each_segment_all
Btrfs: fix write corruption due to bio cloning on raid5/6
Btrfs: incremental send, fix invalid memory access
Btrfs: incremental send, fix invalid path for link commands
Linus Torvalds [Sat, 15 Jul 2017 05:53:37 +0000 (22:53 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull a few more input updates from Dmitry Torokhov:
- multi-touch handling for Xen
- fix for long-standing bug causing crashes in i8042 on boot
- change to gpio_keys to better handle key presses during system state
transition
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - fix crash at boot time
Input: gpio_keys - handle the missing key press event in resume phase
Input: xen-kbdfront - add multi-touch support
Linus Torvalds [Sat, 15 Jul 2017 05:49:50 +0000 (22:49 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- fix new compiler warnings in cavium
- set post-op IV properly in caam (this fixes chaining)
- fix potential use-after-free in atmel in case of EBUSY
- fix sleeping in softirq path in chcr
- disable buggy sha1-avx2 driver (may overread and page fault)
- fix use-after-free on signals in caam
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: cavium - make several functions static
crypto: chcr - Avoid algo allocation in softirq.
crypto: caam - properly set IV after {en,de}crypt
crypto: atmel - only treat EBUSY as transient if backlog
crypto: af_alg - Avoid sock_graft call warning
crypto: caam - fix signals handling
crypto: sha1-ssse3 - Disable avx2
Linus Torvalds [Sat, 15 Jul 2017 05:39:35 +0000 (22:39 -0700)]
Merge tag 'devprop-fix-4.13-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull device properties framework fix from Rafael Wysocki:
"This fixes a problem with bool properties that could be seen as "true"
when the property was not present at all by adding a special helper
for bool properties with checks for all of the requisute conditions
(Sakari Ailus)"
* tag 'devprop-fix-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
device property: Introduce fwnode_call_bool_op() for ops that return bool
Linus Torvalds [Sat, 15 Jul 2017 05:27:13 +0000 (22:27 -0700)]
Merge tag 'acpi-fixes-4.13-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix the return value of an IRQ mapping routine in the ACPI core,
fix an EC driver issue causing abnormal fan behavior after system
resume on some systems and add quirks for ACPI device objects that
need to be treated as "always present" to work around bogus
implementations of the _STA control method.
Specifics:
- Fix the return value of acpi_gsi_to_irq() to make the GSI to IRQ
mapping work on the Mustang (ARM64) platform (Mark Salter).
- Fix an EC driver issue that causes fans to behave abnormally after
system resume on some systems which turns out to be related to
switching over the EC into the polling mode during the noirq stages
of system suspend and resume (Lv Zheng).
- Add quirks for ACPI device objects that need to be treated as
"always present", because their _STA methods are designed to work
around Windows driver bugs and return garbage from our perspective
(Hans de Goede)"
* tag 'acpi-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / x86: Add KIOX000A accelerometer on GPD win to always_present_ids array
ACPI / x86: Add Dell Venue 11 Pro 7130 touchscreen to always_present_ids
ACPI / x86: Allow matching always_present_id array entries by DMI
Revert "ACPI / EC: Enable event freeze mode..." to fix a regression
ACPI / EC: Drop EC noirq hooks to fix a regression
ACPI / irq: Fix return code of acpi_gsi_to_irq()
Linus Torvalds [Sat, 15 Jul 2017 05:24:25 +0000 (22:24 -0700)]
Merge tag 'pm-fixes-4.13-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a recently exposed issue in the PCI device wakeup code and
one older problem related to PCI device wakeup that has been reported
recently, modify one more piece of computations in intel_pstate to get
rid of a rounding error, fix a possible race in the schedutil cpufreq
governor, fix the device PM QoS sysfs interface to correctly handle
invalid user input, fix return values of two probe routines in devfreq
drivers and constify an attribute_group structure in devfreq.
Specifics:
- Avoid clearing the PCI PME Enable bit for devices as a result of
config space restoration which confuses AML executed afterward and
causes wakeup events to be lost on some systems (Rafael Wysocki).
- Fix the native PCIe PME interrupts handling in the cases when the
PME IRQ is set up as a system wakeup one so that runtime PM remote
wakeup works as expected after system resume on systems where that
happens (Rafael Wysocki).
- Fix the device PM QoS sysfs interface to handle invalid user input
correctly instead of using an unititialized variable value as the
latency tolerance for the device at hand (Dan Carpenter).
- Get rid of one more rounding error from intel_pstate computations
(Srinivas Pandruvada).
- Fix the schedutil cpufreq governor to prevent it from possibly
accessing unititialized data structures from governor callbacks in
some cases on systems when multiple CPUs share a single cpufreq
policy object (Vikram Mulukutla).
- Fix the return values of probe routines in two devfreq drivers
(Gustavo Silva).
- Constify an attribute_group structure in devfreq (Arvind Yadav)"
* tag 'pm-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI / PM: Fix native PME handling during system suspend/resume
PCI / PM: Restore PME Enable after config space restoration
cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race
PM / QoS: return -EINVAL for bogus strings
cpufreq: intel_pstate: Fix ratio setting for min_perf_pct
PM / devfreq: constify attribute_group structures.
PM / devfreq: tegra: fix error return code in tegra_devfreq_probe()
PM / devfreq: rk3399_dmc: fix error return code in rk3399_dmcfreq_probe()
Linus Torvalds [Sat, 15 Jul 2017 04:57:25 +0000 (21:57 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge even more updates from Andrew Morton:
- a few leftovers
- fault-injector rework
- add a module loader test driver
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kmod: throttle kmod thread limit
kmod: add test driver to stress test the module loader
MAINTAINERS: give kmod some maintainer love
xtensa: use generic fb.h
fault-inject: add /proc/<pid>/fail-nth
fault-inject: simplify access check for fail-nth
fault-inject: make fail-nth read/write interface symmetric
fault-inject: parse as natural 1-based value for fail-nth write interface
fault-inject: automatically detect the number base for fail-nth write interface
kernel/watchdog.c: use better pr_fmt prefix
MAINTAINERS: move the befs tree to kernel.org
lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
mm: fix overflow check in expand_upwards()
Daniel Micay [Fri, 14 Jul 2017 21:28:12 +0000 (17:28 -0400)]
replace incorrect strscpy use in FORTIFY_SOURCE
Using strscpy was wrong because FORTIFY_SOURCE is passing the maximum
possible size of the outermost object, but strscpy defines the count
parameter as the exact buffer size, so this could copy past the end of
the source. This would still be wrong with the planned usage of
__builtin_object_size(p, 1) for intra-object overflow checks since it's
the maximum possible size of the specified object with no guarantee of
it being that large.
Reuse of the fortified functions like this currently makes the runtime
error reporting less precise but that can be improved later on.
Noticed by Dave Jones and KASAN.
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 15 Jul 2017 04:50:50 +0000 (21:50 -0700)]
Merge git://git./linux/kernel/git/cmetcalf/linux-tile
Pull arch/tile updates from Chris Metcalf:
"This adds support for an <arch/intreg.h> to help with removing
__need_xxx #defines from glibc, and removes some dead code in
arch/tile/mm/init.c"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
mm, tile: drop arch_{add,remove}_memory
tile: prefer <arch/intreg.h> to __need_int_reg_t
Linus Torvalds [Fri, 14 Jul 2017 22:33:15 +0000 (15:33 -0700)]
Merge tag 'powerpc-4.13-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Nothing that really stands out, just a bunch of fixes that have come
in in the last couple of weeks.
None of these are actually fixes for code that is new in 4.13. It's
roughly half older bugs, with fixes going to stable, and half
fixes/updates for Power9.
Thanks to: Aneesh Kumar K.V, Anton Blanchard, Balbir Singh, Benjamin
Herrenschmidt, Madhavan Srinivasan, Michael Neuling, Nicholas Piggin,
Oliver O'Halloran"
* tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Fix atomic64_inc_not_zero() to return an int
powerpc: Fix emulation of mfocrf in emulate_step()
powerpc: Fix emulation of mcrf in emulate_step()
powerpc/perf: Add POWER9 alternate PM_RUN_CYC and PM_RUN_INST_CMPL events
powerpc/perf: Fix SDAR_MODE value for continous sampling on Power9
powerpc/asm: Mark cr0 as clobbered in mftb()
powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9
powerpc/mm/radix: Synchronize updates to the process table
powerpc/mm/radix: Properly clear process table entry
powerpc/powernv: Tell OPAL about our MMU mode on POWER9
powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR
Luis R. Rodriguez [Fri, 14 Jul 2017 21:50:11 +0000 (14:50 -0700)]
kmod: throttle kmod thread limit
If we reach the limit of modprobe_limit threads running the next
request_module() call will fail. The original reason for adding a kill
was to do away with possible issues with in old circumstances which would
create a recursive series of request_module() calls.
We can do better than just be super aggressive and reject calls once we've
reached the limit by simply making pending callers wait until the
threshold has been reduced, and then throttling them in, one by one.
This throttling enables requests over the kmod concurrent limit to be
processed once a pending request completes. Only the first item queued up
to wait is woken up. The assumption here is once a task is woken it will
have no other option to also kick the queue to check if there are more
pending tasks -- regardless of whether or not it was successful.
By throttling and processing only max kmod concurrent tasks we ensure we
avoid unexpected fatal request_module() calls, and we keep memory
consumption on module loading to a minimum.
With x86_64 qemu, with 4 cores, 4 GiB of RAM it takes the following run
time to run both tests:
time ./kmod.sh -t 0008
real 0m16.366s
user 0m0.883s
sys 0m8.916s
time ./kmod.sh -t 0009
real 0m50.803s
user 0m0.791s
sys 0m9.852s
Link: http://lkml.kernel.org/r/20170628223155.26472-4-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis R. Rodriguez [Fri, 14 Jul 2017 21:50:08 +0000 (14:50 -0700)]
kmod: add test driver to stress test the module loader
This adds a new stress test driver for kmod: the kernel module loader.
The new stress test driver, test_kmod, is only enabled as a module right
now. It should be possible to load this as built-in and load tests
early (refer to the force_init_test module parameter), however since a
lot of test can get a system out of memory fast we leave this disabled
for now.
Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
fast with this test driver.
The test_kmod driver exposes API knobs for us to fine tune simple
request_module() and get_fs_type() calls. Since these API calls only
allow each one parameter a test driver for these is rather simple.
Other factors that can help out test driver though are the number of
calls we issue and knowing current limitations of each. This exposes
configuration as much as possible through userspace to be able to build
tests directly from userspace.
Since it allows multiple misc devices its will eventually (once we add a
knob to let us create new devices at will) also be possible to perform
more tests in parallel, provided you have enough memory.
We only enable tests we know work as of right now.
Demo screenshots:
# tools/testing/selftests/kmod/kmod.sh
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0002_driver: OK! - loading kmod test
kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0002_fs: OK! - loading kmod test
kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0003: OK! - loading kmod test
kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0004: OK! - loading kmod test
kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
XXX: add test restult for 0007
Test completed
You can also request for specific tests:
# tools/testing/selftests/kmod/kmod.sh -t 0001
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
Test completed
Lastly, the current available number of tests:
# tools/testing/selftests/kmod/kmod.sh --help
Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
Valid tests: 0001-0009
0001 - Simple test - 1 thread for empty string
0002 - Simple test - 1 thread for modules/filesystems that do not exist
0003 - Simple test - 1 thread for get_fs_type() only
0004 - Simple test - 2 threads for get_fs_type() only
0005 - multithreaded tests with default setup - request_module() only
0006 - multithreaded tests with default setup - get_fs_type() only
0007 - multithreaded tests with default setup test request_module() and get_fs_type()
0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
The following test cases currently fail, as such they are not currently
enabled by default:
# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009
To be sure to run them as intended please unload both of the modules:
o test_module
o xfs
And ensure they are not loaded on your system prior to testing them. If
you use these paritions for your rootfs you can change the default test
driver used for get_fs_type() by exporting it into your environment. For
example of other test defaults you can override refer to kmod.sh
allow_user_defaults().
Behind the scenes this is how we fine tune at a test case prior to
hitting a trigger to run it:
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
Finally to trigger:
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
The kmod.sh script uses the above constructs to build different test cases.
A bit of interpretation of the current failures follows, first two
premises:
a) When request_module() is used userspace figures out an optimized
version of module order for us. Once it finds the modules it needs, as
per depmod symbol dep map, it will finit_module() the respective
modules which are needed for the original request_module() request.
b) We have an optimization in place whereby if a kernel uses
request_module() on a module already loaded we never bother userspace
as the module already is loaded. This is all handled by kernel/kmod.c.
A few things to consider to help identify root causes of issues:
0) kmod 19 has a broken heuristic for modules being assumed to be
built-in to your kernel and will return 0 even though request_module()
failed. Upgrade to a newer version of kmod.
1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
not for "xfs". The optimization in kernel described in b) fails to
catch if we have a lot of consecutive get_fs_type() calls. The reason
is the optimization in place does not look for aliases. This means two
consecutive get_fs_type() calls will bump kmod_concurrent, whereas
request_module() will not.
This one explanation why test case 0009 fails at least once for
get_fs_type().
2) If a module fails to load --- for whatever reason (kmod_concurrent
limit reached, file not yet present due to rootfs switch, out of
memory) we have a period of time during which module request for the
same name either with request_module() or get_fs_type() will *also*
fail to load even if the file for the module is ready.
This explains why *multiple* NULLs are possible on test 0009.
3) finit_module() consumes quite a bit of memory.
4) Filesystems typically also have more dependent modules than other
modules, its important to note though that even though a get_fs_type()
call does not incur additional kmod_concurrent bumps, since userspace
loads dependencies it finds it needs via finit_module_fd(), it *will*
take much more memory to load a module with a lot of dependencies.
Because of 3) and 4) we will easily run into out of memory failures with
certain tests. For instance test 0006 fails on qemu with 1024 MiB of RAM.
It panics a box after reaping all userspace processes and still not
having enough memory to reap.
[arnd@arndb.de: add dependencies for test module]
Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis R. Rodriguez [Fri, 14 Jul 2017 21:50:05 +0000 (14:50 -0700)]
MAINTAINERS: give kmod some maintainer love
As suggested by Jessica, I've been actively working on kmod, so might as
well reflect its maintained status.
Changes are expected to go through akpm's tree.
Link: http://lkml.kernel.org/r/20170628223155.26472-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Suggested-by: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobias Klauser [Fri, 14 Jul 2017 21:50:03 +0000 (14:50 -0700)]
xtensa: use generic fb.h
The arch uses a verbatim copy of the asm-generic version and does not
add any own implementations to the header, so use asm-generic/fb.h
instead of duplicating code.
Link: http://lkml.kernel.org/r/20170517083545.2115-1-tklauser@distanz.ch
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:50:00 +0000 (14:50 -0700)]
fault-inject: add /proc/<pid>/fail-nth
fail-nth interface is only created in /proc/self/task/<current-tid>/.
This change also adds it in /proc/<pid>/.
This makes shell based tool a bit simpler.
$ bash -c "builtin echo 100 > /proc/self/fail-nth && exec ls /"
Link: http://lkml.kernel.org/r/1491490561-10485-6-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:57 +0000 (14:49 -0700)]
fault-inject: simplify access check for fail-nth
The fail-nth file is created with 0666 and the access is permitted if
and only if the task is current.
This file is owned by the currnet user. So we can create it with 0644
and allow the owner to write it. This enables to watch the status of
task->fail_nth from another processes.
[akinobu.mita@gmail.com: don't convert unsigned type value as signed int]
Link: http://lkml.kernel.org/r/1492444483-9239-1-git-send-email-akinobu.mita@gmail.com
[akinobu.mita@gmail.com: avoid unwanted data race to task->fail_nth]
Link: http://lkml.kernel.org/r/1499962492-8931-1-git-send-email-akinobu.mita@gmail.com
Link: http://lkml.kernel.org/r/1491490561-10485-5-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:54 +0000 (14:49 -0700)]
fault-inject: make fail-nth read/write interface symmetric
The read interface for fail-nth looks a bit odd. Read from this file
returns "NYYYY..." or "YYYYY..." (this makes me surprise when cat this
file). Because there is no EOF condition. The first character
indicates current->fail_nth is zero or not, and then current->fail_nth
is reset to zero.
Just returning task->fail_nth value is more natural to understand.
Link: http://lkml.kernel.org/r/1491490561-10485-4-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:52 +0000 (14:49 -0700)]
fault-inject: parse as natural 1-based value for fail-nth write interface
The value written to fail-nth file is parsed as 0-based. Parsing as
one-based is more natural to understand and it enables to cancel the
previous setup by simply writing '0'.
This change also converts task->fail_nth from signed to unsigned int.
Link: http://lkml.kernel.org/r/1491490561-10485-3-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:49 +0000 (14:49 -0700)]
fault-inject: automatically detect the number base for fail-nth write interface
Automatically detect the number base to use when writing to fail-nth
file instead of always parsing as a decimal number.
Link: http://lkml.kernel.org/r/1491490561-10485-2-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kefeng Wang [Fri, 14 Jul 2017 21:49:46 +0000 (14:49 -0700)]
kernel/watchdog.c: use better pr_fmt prefix
After commit
73ce0511c436 ("kernel/watchdog.c: move hardlockup
detector to separate file"), 'NMI watchdog' is inappropriate in
kernel/watchdog.c, using 'watchdog' only.
Link: http://lkml.kernel.org/r/1499928642-48983-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis de Bethencourt [Fri, 14 Jul 2017 21:49:44 +0000 (14:49 -0700)]
MAINTAINERS: move the befs tree to kernel.org
Update the location of the befs git tree and my email address.
Link: http://lkml.kernel.org/r/20170709110012.2991-1-luisbg@kernel.org
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Ellerman [Fri, 14 Jul 2017 21:49:41 +0000 (14:49 -0700)]
lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
atomic64_inc_not_zero() returns a "truth value" which in C is
traditionally an int. That means callers are likely to expect the
result will fit in an int.
If an implementation returns a "true" value which does not fit in an
int, then there's a possibility that callers will truncate it when they
store it in an int.
In fact this happened in practice, see commit
966d2b04e070
("percpu-refcount: fix reference leak during percpu-atomic transition").
So add a test that the result fits in an int, even when the input
doesn't. This catches the case where an implementation just passes the
non-zero input value out as the result.
Link: http://lkml.kernel.org/r/1499775133-1231-1-git-send-email-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Helge Deller [Fri, 14 Jul 2017 21:49:38 +0000 (14:49 -0700)]
mm: fix overflow check in expand_upwards()
Jörn Engel noticed that the expand_upwards() function might not return
-ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and
if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE.
Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa
which all define TASK_SIZE as 0xffffffff, but since none of those have
an upwards-growing stack we currently have no actual issue.
Nevertheless let's fix this just in case any of the architectures with
an upward-growing stack (currently parisc, metag and partly ia64) define
TASK_SIZE similar.
Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box
Fixes: bd726c90b6b8 ("Allow stack to grow up to address space limit")
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Jörn Engel <joern@purestorage.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Mon, 26 Jun 2017 11:49:04 +0000 (13:49 +0200)]
ubifs: Set double hash cookie also for RENAME_EXCHANGE
We developed RENAME_EXCHANGE and UBIFS_FLG_DOUBLE_HASH more or less in
parallel and this case was forgotten. :-(
Cc: stable@vger.kernel.org
Fixes: d63d61c16972 ("ubifs: Implement UBIFS_FLG_DOUBLE_HASH")
Signed-off-by: Richard Weinberger <richard@nod.at>
Xiaolei Li [Fri, 23 Jun 2017 02:37:23 +0000 (10:37 +0800)]
ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs
The inode is not locked in init_xattrs when creating a new inode.
Without this patch, there will occurs assert when booting or creating
a new file, if the kernel config CONFIG_SECURITY_SMACK is enabled.
Log likes:
UBIFS assert failed in ubifs_xattr_set at 298 (pid 1156)
CPU: 1 PID: 1156 Comm: ldconfig Tainted: G S
4.12.0-rc1-207440-g1e70b02 #2
Hardware name: MediaTek MT2712 evaluation board (DT)
Call trace:
[<
ffff000008088538>] dump_backtrace+0x0/0x238
[<
ffff000008088834>] show_stack+0x14/0x20
[<
ffff0000083d98d4>] dump_stack+0x9c/0xc0
[<
ffff00000835d524>] ubifs_xattr_set+0x374/0x5e0
[<
ffff00000835d7ec>] init_xattrs+0x5c/0xb8
[<
ffff000008385788>] security_inode_init_security+0x110/0x190
[<
ffff00000835e058>] ubifs_init_security+0x30/0x68
[<
ffff00000833ada0>] ubifs_mkdir+0x100/0x200
[<
ffff00000820669c>] vfs_mkdir+0x11c/0x1b8
[<
ffff00000820b73c>] SyS_mkdirat+0x74/0xd0
[<
ffff000008082f8c>] __sys_trace_return+0x0/0x4
Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Fri, 16 Jun 2017 14:21:44 +0000 (16:21 +0200)]
ubifs: Don't leak kernel memory to the MTD
When UBIFS prepares data structures which will be written to the MTD it
ensues that their lengths are multiple of 8. Since it uses kmalloc() the
padded bytes are left uninitialized and we leak a few bytes of kernel
memory to the MTD.
To make sure that all bytes are initialized, let's switch to kzalloc().
Kzalloc() is fine in this case because the buffers are not huge and in
the IO path the performance bottleneck is anyway the MTD.
Cc: stable@vger.kernel.org
Fixes: 1e51764a3c2a ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Hyunchul Lee [Wed, 14 Jun 2017 00:31:49 +0000 (09:31 +0900)]
ubifs: Change gfp flags in page allocation for bulk read
In low memory situations, page allocations for bulk read
can kill applications for reclaiming memory, and print an
failure message when allocations are failed.
Because bulk read is just an optimization, we don't have
to do these and can stop page allocations.
Though this siutation happens rarely, add __GFP_NORETRY
to prevent from excessive memory reclaim and killing
applications, and __GFP_WARN to suppress this failure
message.
For this, Use readahead_gfp_mask for gfp flags when
allocating pages.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
karam.lee [Mon, 12 Jun 2017 01:46:31 +0000 (10:46 +0900)]
ubifs: Fix oops when remounting with no_bulk_read.
When remounting with the no_bulk_read option,
there is a problem accessing the "bulk_read buffer(bu.buf)"
which has already been freed.
If the bulk_read option is enabled,
ubifs_tnc_bulk_read uses the pre-allocated bu.buf.
While bu.buf is being used by ubifs_tnc_bulk_read,
remounting with no_bulk_read frees bu.buf.
So I added code to check the use of "bu.buf" to avoid this situation.
------
I tested as follows(kernel v3.18) :
Use the script to repeat "no_bulk_read <-> bulk_read"
remount.sh
#!/bin/sh
while true do;
mount -o remount,no_bulk_read ${MOUNT_POINT};
sleep 1;
mount -o remount,bulk_read ${MOUNT_POINT};
sleep 1;
done
Perform read operation
cat ${MOUNT_POINT}/* > /dev/null
The problem is reproduced immediately.
[ 234.256845][kernel.0]Internal error: Oops: 17 [#1] PREEMPT ARM
[ 234.258557][kernel.0]CPU: 0 PID: 2752 Comm: cat Tainted: G W O 3.18.31+ #51
[ 234.259531][kernel.0]task:
cbff8580 ti:
cbd66000 task.ti:
cbd66000
[ 234.260306][kernel.0]PC is at validate_data_node+0x10/0x264
[ 234.260994][kernel.0]LR is at ubifs_tnc_bulk_read+0x388/0x3ec
[ 234.261712][kernel.0]pc : [<
c01d98fc>] lr : [<
c01dc300>] psr:
80000013
[ 234.261712][kernel.0]sp :
cbd67ba0 ip :
00000001 fp :
00000000
[ 234.263337][kernel.0]r10:
cd3e0260 r9 :
c0df2008 r8 :
00000000
[ 234.264087][kernel.0]r7 :
cd3e0000 r6 :
00000000 r5 :
cd3e0278 r4 :
cd3e0000
[ 234.264999][kernel.0]r3 :
00000003 r2 :
cd3e0280 r1 :
00000000 r0 :
cd3e0000
[ 234.265910][kernel.0]Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 234.266896][kernel.0]Control:
10c53c7d Table:
8c40c059 DAC:
00000015
[ 234.267711][kernel.0]Process cat (pid: 2752, stack limit = 0xcbd66400)
[ 234.268525][kernel.0]Stack: (0xcbd67ba0 to 0xcbd68000)
[ 234.269169][kernel.0]7ba0:
cd7c3940 c03d8650 0001bfe0 00002ab2 00000000 cbd67c5c cbd67c58 0001bfe0
[ 234.270287][kernel.0]7bc0:
cd3e0000 00002ab2 0001bfe0 00000014 cbd66000 cd3e0260 00000000 c01d6660
[ 234.271403][kernel.0]7be0:
00002ab2 00000000 c82a5800 ffffffff cd3e0298 cd3e0278 00000000 cd3e0000
[ 234.272520][kernel.0]7c00:
00000000 00000000 cd3e0260 c01dc300 00002ab2 00000000 60000013 d663affa
[ 234.273639][kernel.0]7c20:
cd3e01f0 cd3e01f0 60000013 c09397ec 00000000 cd3e0278 00002ab2 00000000
[ 234.274755][kernel.0]7c40:
cd3e0000 c01dbf48 00000014 00000003 00000160 00000015 00000004 d663affa
[ 234.275874][kernel.0]7c60:
ccdaa978 cd3e0278 cd3e0000 cf32a5f4 ccdaa820 00000044 cbd66000 cd3e0260
[ 234.276992][kernel.0]7c80:
00000003 c01cec84 ccdaa8dc cbd67cc4 cbd67ec0 00000010 ccdaa978 00000000
[ 234.278108][kernel.0]7ca0:
0000015e ccdaa8dc 00000000 00000000 cf32a5d0 00000000 0000015f ccdaa8dc
[ 234.279228][kernel.0]7cc0:
00000000 c8488300 0009e5a4 0000000e cbd66000 0000015e cf32a5f4 c0113c04
[ 234.280346][kernel.0]7ce0:
0000009f 0000003c c00098c4 ffffffff 00001000 00000000 000000ad 00000010
[ 234.281463][kernel.0]7d00:
00000038 cd68f580 00000150 c8488360 00000000 cbd67d30 cbd67d70 0000000e
[ 234.282579][kernel.0]7d20:
00000010 00000000 c0951874 c0112a9c cf379b60 cf379b84 cf379890 cf3798b4
[ 234.283699][kernel.0]7d40:
cf379578 cf37959c cf379380 cf3793a4 cf3790b0 cf3790d4 cf378fd8 cf378ffc
[ 234.284814][kernel.0]7d60:
cf378f48 cf378f6c cf32a5f4 cf32a5d0 00000000 00001000 00000018 00000000
[ 234.285932][kernel.0]7d80:
00001000 c0050da4 00000000 00001000 cec04c00 00000000 00001000 c0e11328
[ 234.287049][kernel.0]7da0:
00000000 00001000 cbd66000 00000000 00001000 c0012a60 00000000 00001000
[ 234.288166][kernel.0]7dc0:
cbd67dd4 00000000 00001000 80000013 00000000 00001000 cd68f580 00000000
[ 234.289285][kernel.0]7de0:
00001000 c915d600 00000000 00001000 cbd67e48 00000000 00001000 00000018
[ 234.290402][kernel.0]7e00:
00000000 00001000 00000000 00000000 00001000 c915d768 c915d768 c0113550
[ 234.291522][kernel.0]7e20:
cd68f580 cbd67e48 cd68f580 cb6713c0 00010000 000ac5a4 00000000 001fc5a4
[ 234.292637][kernel.0]7e40:
00000000 c8488300 cbd67ec0 00eb0000 cd68f580 c0113ee4 00000000 cbd67ec0
[ 234.293754][kernel.0]7e60:
cd68f580 c8488300 cbd67ec0 00eb0000 cd68f580 00150000 c8488300 00eb0000
[ 234.294874][kernel.0]7e80:
00010000 c0112fd0 00000000 cbd67ec0 cd68f580 00150000 00000000 cd68f580
[ 234.295991][kernel.0]7ea0:
cbd67ef0 c011308c 00000000 00000002 cd768850 00010000 00000000 c01133fc
[ 234.297110][kernel.0]7ec0:
00150000 00000000 cbd67f50 00000000 00000000 cb6713c0 01000000 cbd67f48
[ 234.298226][kernel.0]7ee0:
cbd67f50 c8488300 00000000 c0113204 00010000 01000000 00000000 cb6713c0
[ 234.299342][kernel.0]7f00:
00150000 00000000 cbd67f50 00000000 00000000 00000000 00000000 00000000
[ 234.300462][kernel.0]7f20:
cbd67f50 01000000 01000000 cb6713c0 c8488300 c00ebba8 01000000 00000000
[ 234.301577][kernel.0]7f40:
c8488300 cb6713c0 00000000 00000000 00000000 00000000 ccdaa820 00000000
[ 234.302697][kernel.0]7f60:
00000000 01000000 00000003 00000001 cbd66000 00000000 00000001 c00ec678
[ 234.303813][kernel.0]7f80:
00000000 00000200 00000000 01000000 01000000 00000000 00000000 000000ef
[ 234.304933][kernel.0]7fa0:
c000e904 c000e780 01000000 00000000 00000001 00000003 00000000 01000000
[ 234.306049][kernel.0]7fc0:
01000000 00000000 00000000 000000ef 00000001 00000003 01000000 00000001
[ 234.307165][kernel.0]7fe0:
00000000 beafb78c 0000ad08 00128d1c 60000010 00000001 00000000 00000000
[ 234.308292][kernel.0][<
c01d98fc>] (validate_data_node) from [<
c01dc300>] (ubifs_tnc_bulk_read+0x388/0x3ec)
[ 234.309493][kernel.0][<
c01dc300>] (ubifs_tnc_bulk_read) from [<
c01cec84>] (ubifs_readpage+0x1dc/0x46c)
[ 234.310656][kernel.0][<
c01cec84>] (ubifs_readpage) from [<
c0113c04>] (__generic_file_splice_read+0x29c/0x4cc)
[ 234.311890][kernel.0][<
c0113c04>] (__generic_file_splice_read) from [<
c0113ee4>] (generic_file_splice_read+0xb0/0xf4)
[ 234.313214][kernel.0][<
c0113ee4>] (generic_file_splice_read) from [<
c0112fd0>] (do_splice_to+0x68/0x7c)
[ 234.314386][kernel.0][<
c0112fd0>] (do_splice_to) from [<
c011308c>] (splice_direct_to_actor+0xa8/0x190)
[ 234.315544][kernel.0][<
c011308c>] (splice_direct_to_actor) from [<
c0113204>] (do_splice_direct+0x90/0xb8)
[ 234.316741][kernel.0][<
c0113204>] (do_splice_direct) from [<
c00ebba8>] (do_sendfile+0x17c/0x2b8)
[ 234.317838][kernel.0][<
c00ebba8>] (do_sendfile) from [<
c00ec678>] (SyS_sendfile64+0xc4/0xcc)
[ 234.318890][kernel.0][<
c00ec678>] (SyS_sendfile64) from [<
c000e780>] (ret_fast_syscall+0x0/0x38)
[ 234.319983][kernel.0]Code:
e92d47f0 e24dd050 e59f9228 e1a04000 (
e5d18014)
Signed-off-by: karam.lee <karam.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 7 Jun 2017 21:33:35 +0000 (23:33 +0200)]
ubifs: Fail commit if TNC is obviously inconsistent
A reference to LEB 0 or with length 0 in the TNC
is never correct and could be caused by a memory corruption.
Don't write such a bad index node to the MTD.
Instead fail the commit which will turn UBIFS into read-only mode.
This is less painful than having the bad reference on the MTD
from where UBFIS has no chance to recover.
Signed-off-by: Richard Weinberger <richard@nod.at>
Rabin Vincent [Wed, 31 May 2017 09:40:27 +0000 (11:40 +0200)]
ubifs: allow userspace to map mounts to volumes
There currently appears to be no way for userspace to find out the
underlying volume number for a mounted ubifs file system, since ubifs
uses anonymous block devices. The volume name is present in
/proc/mounts but UBI volumes can be renamed after the volume has been
mounted.
To remedy this, show the UBI number and UBI volume number as part of the
options visible under /proc/mounts.
Also, accept and ignore the ubi= vol= options if they are used mounting
(patch from Richard Weinberger).
# mount -t ubifs ubi:baz x
# mount
ubi:baz on /root/x type ubifs (rw,relatime,ubi=0,vol=2)
# ubirename /dev/ubi0 baz bazz
# mount
ubi:baz on /root/x type ubifs (rw,relatime,ubi=0,vol=2)
# ubinfo -d 0 -n 2
Volume ID: 2 (on ubi0)
Type: dynamic
Alignment: 1
Size: 67 LEBs (
1063424 bytes, 1.0 MiB)
State: OK
Name: bazz
Character device major/minor: 254:3
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Sat, 20 May 2017 22:16:26 +0000 (00:16 +0200)]
ubifs: Wire-up statx() support
statx() can report what flags a file has, expose flags that UBIFS
supports. Especially STATX_ATTR_COMPRESSED and STATX_ATTR_ENCRYPTED
can be interesting for userspace.
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 17 May 2017 08:36:49 +0000 (10:36 +0200)]
ubifs: Remove dead code from ubifs_get_link()
We check the length already, no need to check later
again for an empty string.
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 17 May 2017 08:36:48 +0000 (10:36 +0200)]
ubifs: Massage debug prints wrt. fscrypt
If file names are encrypted we can no longer print them.
That's why we have to change these prints or remove them completely.
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 17 May 2017 08:36:47 +0000 (10:36 +0200)]
ubifs: Add assert to dent_key_init()
...to make sure that we don't use it for double hashed lookups
instead of dent_key_init_hash().
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Wed, 17 May 2017 08:36:46 +0000 (10:36 +0200)]
ubifs: Fix unlink code wrt. double hash lookups
When removing an encrypted file with a long name and without having
the key we have to be able to locate and remove the directory entry
via a double hash. This corner case was simply forgotten.
Fixes: 528e3d178f25 ("ubifs: Add full hash lookup support")
Reported-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
David Oberhollenzer [Wed, 17 May 2017 08:36:45 +0000 (10:36 +0200)]
ubifs: Fix data node size for truncating uncompressed nodes
Currently, the function truncate_data_node only updates the
destination data node size if compression is used. For
uncompressed nodes, the old length is incorrectly retained.
This patch makes sure that the length is correctly set when
compression is disabled.
Fixes: 7799953b34d1 ("ubifs: Implement encrypt/decrypt for all IO")
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
David Gstir [Wed, 17 May 2017 11:36:16 +0000 (13:36 +0200)]
ubifs: Don't encrypt special files on creation
When a new inode is created, we check if the containing folder has a encryption
policy set and inherit that. This should however only be done for regular
files, links and subdirectories. Not for sockes fifos etc.
Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto")
Cc: stable@vger.kernel.org
Signed-off-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Hyunchul Lee [Tue, 16 May 2017 23:58:02 +0000 (08:58 +0900)]
ubifs: Fix memory leak in RENAME_WHITEOUT error path in do_rename
in RENAME_WHITEOUT error path, fscrypt_name should be freed.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Hyunchul Lee [Tue, 16 May 2017 23:57:18 +0000 (08:57 +0900)]
ubifs: Fix inode data budget in ubifs_mknod
Assign inode data budget to budget request correctly.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Tue, 16 May 2017 22:20:27 +0000 (00:20 +0200)]
ubifs: Correctly evict xattr inodes
UBIFS handles extended attributes just like files, as consequence of
that, they also have inodes.
Therefore UBIFS does all the inode machinery also for xattrs. Since new
inodes have i_nlink of 1, a file or xattr inode will be evicted
if i_nlink goes down to 0 after an unlink. UBIFS assumes this model also
for xattrs, which is not correct.
One can create a file "foo" with xattr "user.test". By reading
"user.test" an inode will be created, and by deleting "user.test" it
will get evicted later. The assumption breaks if the file "foo", which
hosts the xattrs, will be removed. VFS nor UBIFS does not remove each
xattr via ubifs_xattr_remove(), it just removes the host inode from
the TNC and all underlying xattr nodes too and the inode will remain
in the cache and wastes memory.
To solve this problem, remove xattr inodes from the VFS inode cache in
ubifs_xattr_remove() to make sure that they get evicted.
Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system")
Cc: <stable@vger.kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Richard Weinberger [Tue, 16 May 2017 22:20:26 +0000 (00:20 +0200)]
ubifs: Unexport ubifs_inode_slab
This SLAB is only being used in super.c, there is no need to expose
it into the global namespace.
Signed-off-by: Richard Weinberger <richard@nod.at>
Linus Torvalds [Fri, 14 Jul 2017 20:31:52 +0000 (13:31 -0700)]
Merge tag 'ntb-4.13' of git://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"The major change in the series is a rework of the NTB infrastructure
to all for IDT hardware to be supported (and resulting fallout from
that). There are also a few clean-ups, etc.
New IDT NTB driver and changes to the NTB infrastructure to allow for
this different kind of NTB HW, some style fixes (per Greg KH
recommendation), and some ntb_test tweaks"
* tag 'ntb-4.13' of git://github.com/jonmason/ntb:
ntb_netdev: set the net_device's parent
ntb: Add error path/handling to Debug FS entry creation
ntb: Add more debugfs support for ntb_perf testing options
ntb: Remove debug-fs variables from the context structure
ntb: Add a module option to control affinity of DMA channels
NTB: Add IDT 89HPESxNTx PCIe-switches support
ntb_hw_intel: Style fixes: open code macros that just obfuscate code
ntb_hw_amd: Style fixes: open code macros that just obfuscate code
NTB: Add ntb.h comments
NTB: Add PCIe Gen4 link speed
NTB: Add new Memory Windows API documentation
NTB: Add Messaging NTB API
NTB: Alter Scratchpads API to support multi-ports devices
NTB: Alter MW API to support multi-ports devices
NTB: Alter link-state API to support multi-port devices
NTB: Add indexed ports NTB API
NTB: Make link-state API being declared first
NTB: ntb_test: add parameter for doorbell bitmask
NTB: ntb_test: modprobe on remote host
Linus Torvalds [Fri, 14 Jul 2017 20:12:32 +0000 (13:12 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:
- Improve thermal cpu_cooling interaction with cpufreq core.
The cpu_cooling driver is designed to use CPU frequency scaling to
avoid high thermal states for a platform. But it wasn't glued really
well with cpufreq core.
For example clipped-cpus is copied from the policy structure and its
much better to use the policy->cpus (or related_cpus) fields directly
as they may have got updated. Not that things were broken before this
series, but they can be optimized a bit more.
This series tries to improve interactions between cpufreq core and
cpu_cooling driver and does some fixes/cleanups to the cpu_cooling
driver. (Viresh Kumar)
- A couple of fixes and cleanups in thermal core and imx, hisilicon,
bcm_2835, int340x thermal drivers. (Arvind Yadav, Dan Carpenter,
Sumeet Pawnikar, Srinivas Pandruvada, Willy WOLFF)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (24 commits)
thermal: bcm2835: fix an error code in probe()
thermal: hisilicon: Handle return value of clk_prepare_enable
thermal: imx: Handle return value of clk_prepare_enable
thermal: int340x: check for sensor when PTYP is missing
Thermal/int340x: Fix few typos and kernel-doc style
thermal: fix source code documentation for parameters
thermal: cpu_cooling: Replace kmalloc with kmalloc_array
thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device
thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power()
thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev
thermal: cpu_cooling: get_level() can't fail
thermal: cpu_cooling: create structure for idle time stats
thermal: cpu_cooling: merge frequency and power tables
thermal: cpu_cooling: get rid of 'allowed_cpus'
thermal: cpu_cooling: OPPs are registered for all CPUs
thermal: cpu_cooling: store cpufreq policy
cpufreq: create cpufreq_table_count_valid_entries()
thermal: cpu_cooling: use cpufreq_policy to register cooling device
thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state()
thermal: cpu_cooling: remove cpufreq_cooling_get_level()
...
Linus Torvalds [Fri, 14 Jul 2017 20:10:06 +0000 (13:10 -0700)]
Merge tag 'mmc-v4.13-2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a couple of mmc fixes intended for v4.13 rc1.
MMC core:
- Restore some behaviour of MMC_IOC_MULTI_CMD commands
- Fix using un-initialized variable in mmc_blk_issue_drv_op()
- Fix mmc block queue cleanup
MMC host:
- sdhci-acpi: Workaround conflict with PCI wifi on GPD Win handheld
- tmio-mmc: Fix bad pointer math"
* tag 'mmc-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: tmio-mmc: fix bad pointer math
mmc: block: Prevent new req entering queue after its cleanup
mmc: block: Let MMC_IOC_MULTI_CMD return zero again for zero entries
mmc: block: Initialize ret in mmc_blk_issue_drv_op() for MMC_DRV_OP_IOCTL
mmc: sdhci-acpi: Workaround conflict with PCI wifi on GPD Win handheld
Mauro Carvalho Chehab [Wed, 12 Jul 2017 13:03:09 +0000 (10:03 -0300)]
docs: kprobes.txt: Fix whitespacing
The notes at the end of this file start with a blank space,
instead of a blank line, violating ReST format.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 12 Jul 2017 13:06:20 +0000 (10:06 -0300)]
tee.txt: standardize document format
Each text file under Documentation follows a different format. Some
doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- adjust identation of titles;
- mark ascii artwork as a literal block;
- adjust references.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Sun, 14 May 2017 11:48:40 +0000 (08:48 -0300)]
cgroup-v2.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- Comment the internal index;
- Use :Date: and :Author: for authorship;
- Mark titles;
- Mark literal blocks;
- Adjust witespaces;
- Mark notes;
- Use table notation for the existing tables.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 17 May 2017 17:25:05 +0000 (14:25 -0300)]
dell_rbu.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx.
Currently, the document is completely unformatted. Add
titles, do indentation, mark literal blocks.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 17 May 2017 12:55:46 +0000 (09:55 -0300)]
zorro.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- Use right marks for titles;
- Use authorship marks;
- Mark literals and literal blocks;
- Use autonumbered list for references.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 17 May 2017 12:54:22 +0000 (09:54 -0300)]
xz.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- Use marks for titles;
- Adjust indentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 17 May 2017 12:45:31 +0000 (09:45 -0300)]
xillybus.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- Adjust indentation;
- Mark authorship;
- Comment internal contents table;
- Mark literal blocks;
- Don't use all-upercase titles.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 17 May 2017 12:38:00 +0000 (09:38 -0300)]
vfio.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- adjust title marks;
- use footnote marks;
- mark literal blocks;
- adjust identation.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 17 May 2017 12:26:06 +0000 (09:26 -0300)]
vfio-mediated-device.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
In this specific document, the title, copyright and authorship
are added as if it were a C file!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- convert document preambule to the proper format;
- mark literal blocks;
- adjust identation;
- use numbered lists for references.
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 17 May 2017 12:16:19 +0000 (09:16 -0300)]
unaligned-memory-access.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- promote document title one level;
- use markups for authorship and put it at the beginning;
- mark literal blocks;
- adjust identation.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 17 May 2017 12:10:48 +0000 (09:10 -0300)]
this_cpu_ops.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- promote document title one level;
- mark literal blocks;
- move authorship to the beginning of the file and use markups.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mauro Carvalho Chehab [Wed, 17 May 2017 12:00:17 +0000 (09:00 -0300)]
svga.txt: standardize document format
Each text file under Documentation follows a different
format. Some doesn't even have titles!
Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- Use standard notation for titles;
- Use the note mark;
- mark literal blocks;
- adjust identation;
- mark the table.
Acked-By: Martin Mares <mj@ucw.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>