Martin Schwidefsky [Thu, 12 Oct 2017 11:24:46 +0000 (13:24 +0200)]
s390/ctl_reg: move control register definitions to ctl_reg.h
The nmi.h header has some constant defines for control register bits.
These definitions should really be located in ctl_reg.h. Move and
rename the defines.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Thu, 12 Oct 2017 11:24:45 +0000 (13:24 +0200)]
s390/ctl_reg: use decoding unions in update_cr_regs
Add a decoding union for the bits in control registers 2 and use
'union ctlreg0' and 'union ctlreg2' in update_cr_regs to improve
readability.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Fri, 15 Sep 2017 14:24:31 +0000 (16:24 +0200)]
s390/nmi: use smp_emergency_stop instead of smp_send_stop
The smp_send_stop() function can be called from s390_handle_damage
while DAT is off. This happens if a machine check indicates that
kernel gprs or control registers can not be restored. The function
smp_send_stop reenables DAT via __load_psw_mask. That should work
for the case of lost kernel gprs and the system will do the expected
stop of all CPUs. But if control registers are lost, in particular
CR13 with the home space ASCE, interesting secondary crashes may
occur.
Make smp_emergency_stop callable from nmi.c and remove the cpumask
argument. Replace the smp_send_stop call with smp_emergency_stop in
the s390_handle_damage function.
In addition add notrace and NOKPROBE_SYMBOL annotations for all
functions required for the emergency shutdown.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Fri, 13 Oct 2017 10:59:22 +0000 (12:59 +0200)]
s390/vdso: move boot_vdso_data to vdso.c
The boot_vdso_data variable is related to the vdso code, the magic of the
initial vdso area for the early boot and the replacement of it in vdso_init
should all be put into vdso.c.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Vasily Gorbik [Thu, 12 Oct 2017 11:01:47 +0000 (13:01 +0200)]
s390/spinlock: use cpu alternatives to enable niai instruction
Enable niai instruction in the spinlock code at run-time for machines
on which facility 49 is available (zEC12 and newer).
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Vasily Gorbik [Thu, 12 Oct 2017 11:01:47 +0000 (13:01 +0200)]
s390: introduce CPU alternatives
Implement CPU alternatives, which allows to optionally patch newer
instructions at runtime, based on CPU facilities availability.
A new kernel boot parameter "noaltinstr" disables patching.
Current implementation is derived from x86 alternatives. Although
ideal instructions padding (when altinstr is longer then oldinstr)
is added at compile time, and no oldinstr nops optimization has to be
done at runtime. Also couple of compile time sanity checks are done:
1. oldinstr and altinstr must be <= 254 bytes long,
2. oldinstr and altinstr must not have an odd length.
alternative(oldinstr, altinstr, facility);
alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
Both compile time and runtime padding consists of either 6/4/2 bytes nop
or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
.altinstructions and .altinstr_replacement sections are part of
__init_begin : __init_end region and are freed after initialization.
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Thu, 12 Oct 2017 11:42:54 +0000 (13:42 +0200)]
s390/dasd: remove unused debug macros
Get rid of unused wrapper macros around debug_sprintf_exception.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Thu, 12 Oct 2017 11:57:26 +0000 (13:57 +0200)]
s390/debug: only write data once
debug_event_common memsets the active debug entry with zeros to
prevent stale data leakage. This is overwritten with the actual
debug data in the next step. Only write zeros to that part of the
debug entry that's not used by new debug data.
Micro benchmarks show a 2-10% reduction of cpu cycles with this
approach.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Mon, 9 Oct 2017 15:49:38 +0000 (17:49 +0200)]
s390/debug: improve debug_event
debug_event currently truncates the data if used with a size larger than
the buf_size of the debug feature. For lots of callers of this function,
wrappers have been implemented that loop until all data is handled.
Move that functionality into debug_event_common and get rid of the wrappers.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Philipp Rudo [Tue, 17 Oct 2017 10:28:08 +0000 (12:28 +0200)]
s390/kexec: Fix checksum validation return code for kdump
Before kexec boots to a crash kernel it checks whether the image in memory
changed after load. This is done by the function kdump_csum_valid, which
returns true, i.e. an int != 0, on success and 0 otherwise. In other words
when kdump_csum_valid returns an error code it means that the validation
succeeded. This is not only counterintuitive but also produces the wrong
result if the kernel was build without CONFIG_CRASH_DUMP. Fix this by
making kdump_csum_valid return a bool.
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Mon, 16 Oct 2017 10:43:12 +0000 (12:43 +0200)]
Merge tag 'vfio-ccw-
20171016' of git://git./linux/kernel/git/kvms390/vfio-ccw into features
Pull vfio-ccw update from Cornelia Huck:
"Some improvements in data handling for vfio-ccw."
Dong Jia Shi [Wed, 11 Oct 2017 02:38:22 +0000 (04:38 +0200)]
vfio: ccw: validate the count field of a ccw before pinning
If the count field of a ccw is zero, there is no need to
try to pin page(s) for it. Let's check the count value
before starting pinning operations.
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <
20171011023822.42948-3-bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Dong Jia Shi [Wed, 11 Oct 2017 02:38:21 +0000 (04:38 +0200)]
vfio: ccw: bypass bad idaw address when fetching IDAL ccws
We currently return the same error code (-EFAULT) to indicate two
different error cases:
1. a bug in vfio-ccw implementation has been found.
2. a buggy channel program has been detected.
This brings difficulty for userland program (specifically Qemu) to
handle.
Let's use -EFAULT to only indicate the first case. For the second
case, we simply hand over the ccws to lower level for further
handling.
Notice:
Once a bad idaw address is detected, the current behavior is to
suppress the ssch. With this fix, the channel program will be
accepted, and part of the channel program (the part ahead of
the bad idaw) could possibly be executed by the device before
I/O conclusion.
Suggested-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <
20171011023822.42948-2-bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Martin Schwidefsky [Tue, 4 Jul 2017 05:40:04 +0000 (07:40 +0200)]
s390: update defconfig
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 13 Oct 2017 07:06:29 +0000 (09:06 +0200)]
s390/debug: adjust coding style
The debug feature code hasn't been touched in ages and the code also
looks like this. Therefore clean up the code so it looks a bit more
like current coding style.
There is no functional change - actually I made also sure that the
generated code with performance_defconfig is identical.
A diff of old vs new with "objdump -d" is empty.
The code is still not checkpatch clean, but that was not the goal.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Vasyl Gomonovych [Wed, 11 Oct 2017 19:52:46 +0000 (21:52 +0200)]
s390/pkey: fix kzalloc-simple.cocci warnings
drivers/s390/crypto/pkey_api.c:128:11-18: WARNING:
kzalloc should be used for cprbmem, instead of kmalloc/memset
Use kzalloc rather than kmalloc followed by memset with 0
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 6 Oct 2017 11:17:10 +0000 (13:17 +0200)]
s390/kprobes: remove KPROBE_SWAP_INST state
For an unknown reason the s390 kprobes instruction replacement
function modifies the kprobe_status of the current CPU to
KPROBE_SWAP_INST. This was supposed to catch traps that happened
during instruction patching. Such a fault is not supposed to happen,
and silently discarding such a fault is certainly also not what we
want. In fact s390 is the only architecture which has this odd piece
of code.
Just remove this and behave like all other architectures. This was
pointed out by Jens Remus.
Reported-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Wed, 4 Oct 2017 17:27:09 +0000 (19:27 +0200)]
s390: cleanup string ops prototypes
Just some trivial changes like removing the extern keyword from the
header file, renaming arguments to match the man pages, and whitespace
removal.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Wed, 4 Oct 2017 17:27:08 +0000 (19:27 +0200)]
s390: optimize memset implementation
Like for the memset16/32/64 variants avoid that subsequent mvc
instructions depend on each other since that might have negative
performance impacts.
This patch is currently hardly relevant since at least gcc 7.1
generates only inline memset code and not a single memset call.
However there is no reason to not provide an optimized version
just in case gcc generates memset calls again, like it did in
the past.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Wed, 4 Oct 2017 17:27:07 +0000 (19:27 +0200)]
s390/mm: use memset64 instead of clear_table
Use memset64 instead of the (now) open-coded variant clear_table.
Performance wise there is no difference.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Wed, 4 Oct 2017 17:27:05 +0000 (19:27 +0200)]
s390: implement memset16, memset32 & memset64
Provide fast versions of the new memset variants. E.g. the generic
memset64 is ten times slower than the optimized version if used on a
whole page.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Mon, 9 Oct 2017 09:16:49 +0000 (11:16 +0200)]
Merge branch 'sthyi' into features
Add the store-hypervisor-information code into features
using a tip branch for parallel merging into the KVM tree.
QingFeng Hao [Fri, 29 Sep 2017 10:41:52 +0000 (12:41 +0200)]
s390/sthyi: add s390_sthyi system call
Add a syscall of s390_sthyi to implement STHYI instruction in LPAR
which reuses the implementation for KVM by Janosch Frank -
commit
95ca2cb57985 ("KVM: s390: Add sthyi emulation").
STHYI(Store Hypervisor Information) is an emulated z/VM instruction that
provides a guest with basic information about the layers it is running
on. This includes information about the cpu configuration of both the
machine and the lpar, as well as their names, machine model and
machine type. This information enables an application to determine the
maximum capacity of CPs and IFLs available to software.
For the arguments of s390_sthyi, code shall be 0 and flags is reserved for
future use, info is the output argument to store the required hypervisor
info.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
QingFeng Hao [Fri, 29 Sep 2017 10:41:51 +0000 (12:41 +0200)]
s390/sthyi: add cache to store hypervisor info
STHYI requires extensive locking in the higher hypervisors and is
very computational/memory expensive. Therefore we cache the retrieved
hypervisor info whose valid period is 1s with mutex to allow concurrent
access. rw semaphore can't benefit here due to cache line bounce.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
QingFeng Hao [Fri, 29 Sep 2017 10:41:50 +0000 (12:41 +0200)]
s390/sthyi: reorganize sthyi implementation
As we need to support sthyi instruction on LPAR too, move the common code
to kernel part and kvm related code to intercept.c for better reuse.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Sat, 30 Sep 2017 08:54:31 +0000 (10:54 +0200)]
s390: use generic rwsem implementation
We never optimized our rwsem inline assemblies to make use of the new
atomic instructions. The generic rwsem implementation implicitly makes
use of the new instructions, since it implements the required rwsem
primitives with atomic operations, which we did optimize.
However even when compiling for old architectures the generic variant
still generates better code. So it's time to simply remove our old
code and switch to the generic implementation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Wed, 27 Sep 2017 07:45:00 +0000 (09:45 +0200)]
s390/disassembler: add new z14 instructions
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 26 Sep 2017 14:15:06 +0000 (16:15 +0200)]
s390/disassembler: add missing z13 instructions
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 26 Sep 2017 15:06:32 +0000 (17:06 +0200)]
s390/disassembler: add sthyi instruction
This instruction came with a z/VM extension and not with a specific
machine generation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 26 Sep 2017 07:16:53 +0000 (09:16 +0200)]
s390/disassembler: remove double instructions
Remove a couple of instructions that are listed twice.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 26 Sep 2017 11:36:01 +0000 (13:36 +0200)]
s390/disassembler: fix LRDFU format
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 26 Sep 2017 07:16:48 +0000 (09:16 +0200)]
s390/disassembler: add missing end marker for e7 table
The e7 opcode table does not have an end marker. Hence when trying to
find an unknown e7 instruction the code will access memory behind the
table until it finds something that matches the opcode, or the kernel
crashes, whatever comes first.
This affects not only the in-kernel disassembler but also uprobes and
kprobes which refuse to set a probe on unknown instructions, and
therefore search the opcode tables to figure out if instructions are
known or not.
Cc: <stable@vger.kernel.org> # v3.18+
Fixes:
3585cb0280654 ("s390/disassembler: add vector instructions")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Thu, 28 Sep 2017 06:05:13 +0000 (08:05 +0200)]
s390/virtio: simplify Makefile
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Thomas Huth [Wed, 27 Sep 2017 11:42:01 +0000 (13:42 +0200)]
s390/virtio: remove the old KVM virtio transport
There is no recent user space application available anymore which still
supports this old virtio transport. Additionally, commit
3b2fbb3f06ef
("virtio/s390: deprecate old transport") introduced a deprecation message
in the driver, and apparently nobody complained so far that it is still
required. So let's simply remove it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Julian Wiedmann [Thu, 14 Sep 2017 07:52:32 +0000 (09:52 +0200)]
s390/ccwgroup: tie a ccwgroup driver to its ccw driver
When grouping devices, the ccwgroup core only checks whether all of the
devices are bound to the same ccw_driver. It has no means of checking
if the requesting ccwgroup driver actually supports this device type.
qeth implements its own device matching in qeth_core_probe_device(),
while ctcm and lcs currently have no sanity-checking at all.
Enable ccwgroup drivers to optionally defer the device type checking to
the ccwgroup core, by specifying their supported ccw_driver.
This allows us drop the device type matching from qeth, and improves
the robustness of ctcm and lcs.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Harald Freudenberger [Mon, 18 Sep 2017 10:48:09 +0000 (12:48 +0200)]
s390/crypto: add s390 platform specific aes gcm support.
This patch introduces gcm(aes) support into the aes_s390 kernel module.
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Patrick Steuer [Mon, 18 Sep 2017 10:48:08 +0000 (12:48 +0200)]
s390/crypto: add inline assembly for KMA instruction to cpacf.h
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Fri, 24 Mar 2017 16:32:23 +0000 (17:32 +0100)]
s390/rwlock: introduce rwlock wait queueing
Like the common queued rwlock code the s390 implementation uses the
queued spinlock code on a spinlock_t embedded in the rwlock_t to achieve
the queueing. The encoding of the rwlock_t differs though, the counter
field in the rwlock_t is split into two parts. The upper two bytes hold
the write bit and the write wait counter, the lower two bytes hold the
read counter.
The arch_read_lock operation works exactly like the common qrwlock but
the enqueue operation for a writer follows a diffent logic. After the
failed inline try to get the rwlock in write, the writer first increases
the write wait counter, acquires the wait spin_lock for the queueing,
and then loops until there are no readers and the write bit is zero.
Without the write wait counter a CPU that just released the rwlock
could immediately reacquire the lock in the inline code, bypassing all
outstanding read and write waiters. For s390 this would cause massive
imbalances in favour of writers in case of a contended rwlock.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Fri, 24 Mar 2017 16:25:02 +0000 (17:25 +0100)]
s390/spinlock: introduce spinlock wait queueing
The queued spinlock code for s390 follows the principles of the common
code qspinlock implementation but with a few notable differences.
The format of the spinlock_t locking word differs, s390 needs to store
the logical CPU number of the lock holder in the spinlock_t to be able
to use the diagnose 9c directed yield hypervisor call.
The inline code sequences for spin_lock and spin_unlock are nice and
short. The inline portion of a spin_lock now typically looks like this:
lhi %r0,0 # 0 indicates an empty lock
l %r1,0x3a0 # CPU number + 1 from lowcore
cs %r0,%r1,<some_lock> # lock operation
jnz call_wait # on failure call wait function
locked:
...
call_wait:
la %r2,<some_lock>
brasl %r14,arch_spin_lock_wait
j locked
A spin_unlock is as simple as before:
lhi %r0,0
sth %r0,2(%r2) # unlock operation
After a CPU has queued itself it may not enable interrupts again for the
arch_spin_lock_flags() variant. The arch_spin_lock_wait_flags wait function
is removed.
To improve performance the code implements opportunistic lock stealing.
If the wait function finds a spinlock_t that indicates that the lock is
free but there are queued waiters, the CPU may steal the lock up to three
times without queueing itself. The lock stealing update the steal counter
in the lock word to prevent more than 3 steals. The counter is reset at
the time the CPU next in the queue successfully takes the lock.
While the queued spinlocks improve performance in a system with dedicated
CPUs, in a virtualized environment with continuously overcommitted CPUs
the queued spinlocks can have a negative effect on performance. This
is due to the fact that a queued CPU that is preempted by the hypervisor
will block the queue at some point even without holding the lock. With
the classic spinlock it does not matter if a CPU is preempted that waits
for the lock. Therefore use the queued spinlock code only if the system
runs with dedicated CPUs and fall back to classic spinlocks when running
with shared CPUs.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Sun, 4 Dec 2016 13:36:04 +0000 (14:36 +0100)]
s390/spinlock: use the cpu number +1 as spinlock value
The queued spinlock code will come out simpler if the encoding of
the CPU that holds the spinlock is (cpu+1) instead of (~cpu).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Fri, 22 Sep 2017 12:17:41 +0000 (14:17 +0200)]
s390/topology: add detection of dedicated vs shared CPUs
The topology information returned by STSI 15.x.x contains a flag
if the CPUs of a topology-list are dedicated or shared. Make this
information available if the machine provides topology information.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Himanshu Jha [Sun, 24 Sep 2017 12:00:14 +0000 (17:30 +0530)]
s390/sclp: Use setup_timer and mod_timer
Use setup_timer and mod_timer API instead of structure assignments.
This is done using Coccinelle and semantic patch used
for this as follows:
@@
expression x,y,z,a,b;
@@
-init_timer (&x);
+setup_timer (&x, y, z);
+mod_timer (&a, b);
-x.function = y;
-x.data = z;
-x.expires = b;
-add_timer(&a);
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Thu, 21 Sep 2017 12:43:10 +0000 (14:43 +0200)]
s390/cpumf: remove superfluous nr_cpumask_bits check
Paul Burton reported that the nr_cpumask_bits check
within cpumsf_pmu_event_init() is not necessary.
Actually there is already a prior check within
perf_event_alloc(). Therefore remove the check.
Reported-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Harald Freudenberger [Tue, 12 Sep 2017 05:04:26 +0000 (07:04 +0200)]
s390/zcrypt: Explicitly check input data length.
The function to prepare MEX type 50 ap messages did
not explicitly check for the data length in case of
data > 512 bytes. Instead the function assumes the
boundary check done in the ioctl function will always
reject requests with invalid data length values.
However, screening just the function code may give the
illusion, that there may be a gap which could be
exploited by userspace for buffer overwrite attacks.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Tue, 12 Sep 2017 09:21:00 +0000 (11:21 +0200)]
s390/cmf: use tod_to_ns()
Instead of open coding tod clock to ns conversions use the timex helper.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 8 Sep 2017 19:01:38 +0000 (21:01 +0200)]
s390/cmf: avg_utilization
All (but one) cmf related sysfs attributes have been converted to read
directly from the measurement block using cmf_read. This is not
possible for the avg_utilization attribute since this is an aggregation
of several values for which cmf_read only returns average values.
Move the computation of the utilization value to the cmf_read interface
such that it can use the raw data.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Thu, 7 Sep 2017 11:18:40 +0000 (13:18 +0200)]
s390/cmf: read from hw buffer
To ensure data consistency when reading from the channel measurement
block we wait for the subchannel to become idle and copy the whole
block. This is unnecessary when all we do is export the individual
values via sysfs. Read the values for sysfs export directly from
the measurement block.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Wed, 6 Sep 2017 08:44:23 +0000 (10:44 +0200)]
s390/cmf: simplify copy_block
cmf_copy_block tries to ensure data consistency by copying the
channel measurement block twice and comparing the data. This was
needed on very old machines only. Nowadays we guarantee
consistency by copying the data when the channel is idle.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Wed, 6 Sep 2017 17:05:29 +0000 (19:05 +0200)]
s390/cmf: simplify cmb_copy_wait
No need for refcounting - the data can be on stack. Also change
the locking in this function to only use spin_lock_irq (the
function waits, thus it's never called from IRQ context).
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Wed, 6 Sep 2017 11:43:20 +0000 (13:43 +0200)]
s390/cmf: simplify set_schib_wait
No need for refcounting - the data can be on stack.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Tue, 5 Sep 2017 12:22:48 +0000 (14:22 +0200)]
s390/cmf: set_schib_wait add timeout
When enabling channel measurement fails with a busy condition we wait
for the next interrupt to arrive before we retry the operation. For
devices which usually don't create interrupts we wait forever.
Although the waiting is done interruptible that behavior is not
expected and confused some users. Abort the operation after a 10s
timeout.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Jean Delvare [Mon, 18 Sep 2017 09:13:13 +0000 (11:13 +0200)]
s390/char: fix cdev_add usage
Function cdev_add does set cdev->dev, so there is no point in setting
it prior to calling this function.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Johannes Thumshirn [Thu, 14 Sep 2017 12:11:15 +0000 (14:11 +0200)]
samples/kprobes: Add s390 case in kprobe example module
Add info prints in sample kprobe handlers for S/390
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Alice Frosi [Thu, 14 Sep 2017 10:36:03 +0000 (12:36 +0200)]
s390/ptrace: add runtime instrumention register get/set
Add runtime instrumention register get and set which allows to read
and modify the runtime instrumention control block.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Alice Frosi [Thu, 14 Sep 2017 10:35:45 +0000 (12:35 +0200)]
s390/runtime_instrumentation: clean up struct runtime_instr_cb
Update runtime_instr_cb structure to be consistent with the runtime
instrumentation documentation.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Tue, 12 Sep 2017 11:49:57 +0000 (13:49 +0200)]
s390: add support for FORTIFY_SOURCE
This is the quite trivial backend for s390 which is required to enable
FORTIFY_SOURCE support.
See commit
6974f0c4555e ("include/linux/string.h: add the option of
fortified string.h functions") for more details.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Mon, 11 Sep 2017 09:24:23 +0000 (11:24 +0200)]
s390: get rid of exit_thread()
exit_thread() is empty now. Therefore remove it and get rid of a
pointless branch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Mon, 11 Sep 2017 09:24:23 +0000 (11:24 +0200)]
s390/guarded storage: simplify task exit handling
Free data structures required for guarded storage from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.
In addition this allows to get rid of exit_thread() in a later patch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Mon, 11 Sep 2017 09:24:23 +0000 (11:24 +0200)]
s390/ptrace: fix guarded storage regset handling
If the guarded storage regset for current is supposed to be changed,
the regset from user space is copied directly into the guarded storage
control block.
If then the process gets scheduled away while the control block is
being copied and before the new control block has been loaded, the
result is random: the process can be scheduled away due to a page
fault or preemption. If that happens the already copied parts will be
overwritten by save_gs_cb(), called from switch_to().
Avoid this by copying the data to a temporary buffer on the stack and
do the actual update with preemption disabled.
Fixes:
f5bbd7219891 ("s390/ptrace: guarded storage regset for the current task")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Mon, 11 Sep 2017 09:24:22 +0000 (11:24 +0200)]
s390/guarded storage: fix possible memory corruption
For PREEMPT enabled kernels the guarded storage (GS) code contains a
possible use-after-free bug. If a task that makes use of GS exits, it
will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via exit_thread().
If exit_thread_gs() gets preempted after the GS control block of the
task has been freed but before the pointer to it is set to NULL, then
save_gs_cb(), called from switch_to(), will write to already freed
memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes:
916cda1aa1b4 ("s390: add a system call for guarded storage")
Cc: <stable@vger.kernel.org> # v4.12+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Mon, 11 Sep 2017 09:24:22 +0000 (11:24 +0200)]
s390/runtime instrumentation: simplify task exit handling
Free data structures required for runtime instrumentation from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.
In addition this allows to get rid of exit_thread() in a later patch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Mon, 11 Sep 2017 09:24:22 +0000 (11:24 +0200)]
s390/runtime instrumention: fix possible memory corruption
For PREEMPT enabled kernels the runtime instrumentation (RI) code
contains a possible use-after-free bug. If a task that makes use of RI
exits, it will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via
exit_thread(). If exit_thread_runtime_instr() gets preempted after the
RI control block of the task has been freed but before the pointer to
it is set to NULL, then save_ri_cb(), called from switch_to(), will
write to already freed memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes:
e4b8b3f33fca ("s390: add support for runtime instrumentation")
Cc: <stable@vger.kernel.org> # v3.7+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Mon, 11 Sep 2017 09:24:22 +0000 (11:24 +0200)]
s390: convert release_thread() into a static inline function
release_thread() is an empty function that gets called on every task
exit. Move the function to a header file and force inlining of it, so
that the compiler can optimize it away instead of generating a
pointless function call.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Linus Torvalds [Wed, 27 Sep 2017 19:22:12 +0000 (12:22 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs
Pull quota and isofs fixes from Jan Kara:
"Two quota fixes (fallout of the quota locking changes) and an isofs
build fix"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Fix quota corruption with generic/232 test
isofs: fix build regression
quota: add missing lock into __dquot_transfer()
Linus Torvalds [Wed, 27 Sep 2017 17:51:08 +0000 (10:51 -0700)]
Merge tag 'linux-kselftest-4.14-rc3-fixes' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This update consists of:
- fixes to several existing tests
- a test for regression introduced by
b9470c27607b ("inet: kill
smallest_size and smallest_port")
- seccomp support for glibc 2.26 siginfo_t.h
- fixes to kselftest framework and tests to run make O=dir use-case
- fixes to silence unnecessary test output to de-clutter test results"
* tag 'linux-kselftest-4.14-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (28 commits)
selftests: timers: set-timer-lat: Fix hang when testing unsupported alarms
selftests: timers: set-timer-lat: fix hang when std out/err are redirected
selftests/memfd: correct run_tests.sh permission
selftests/seccomp: Support glibc 2.26 siginfo_t.h
selftests: futex: Makefile: fix for loops in targets to run silently
selftests: Makefile: fix for loops in targets to run silently
selftests: mqueue: Use full path to run tests from Makefile
selftests: futex: copy sub-dir test scripts for make O=dir run
selftests: lib.mk: copy test scripts and test files for make O=dir run
selftests: sync: kselftest and kselftest-clean fail for make O=dir case
selftests: sync: use TEST_CUSTOM_PROGS instead of TEST_PROGS
selftests: lib.mk: add TEST_CUSTOM_PROGS to allow custom test run/install
selftests: watchdog: fix to use TEST_GEN_PROGS and remove clean
selftests: lib.mk: fix test executable status check to use full path
selftests: Makefile: clear LDFLAGS for make O=dir use-case
selftests: lib.mk: kselftest and kselftest-clean fail for make O=dir case
Makefile: kselftest and kselftest-clean fail for make O=dir case
selftests/net: msg_zerocopy enable build with older kernel headers
selftests: actually run the various net selftests
selftest: add a reuseaddr test
...
Linus Torvalds [Wed, 27 Sep 2017 15:33:26 +0000 (08:33 -0700)]
Merge branch 'x86-fpu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fpu fixes and cleanups from Ingo Molnar:
"This is _way_ more cleanups than fixes, but the bugs were subtle and
hard to hit, and the primary reason for them existing was the
unnecessary historical complexity of some of the x86/fpu interfaces.
The first bunch of commits clean up and simplify the xstate user copy
handling functions, in reaction to the collective head-scratching
about the xstate user-copy handling code that leads up to the fix for
this SkyLake xstate handling bug:
0852b374173b: x86/fpu: Add FPU state copying quirk to handle XRSTOR failure on Intel Skylake CPUs
The cleanups don't change any functionality, they just (hopefully)
make it all clearer, more consistent, more debuggable and more robust.
Note that most of the linecount increase comes from these commits,
where we better split the user/kernel copy logic by having more
variants, instead repeated fragile patterns of:
if (kbuf) {
memcpy(kbuf + pos, data, copy);
} else {
if (__copy_to_user(ubuf + pos, data, copy))
return -EFAULT;
}
The next bunch of commits simplify the FPU state-machine to get rid of
old lazy-FPU idiosyncrasies - a defensive simplification to make all
the code easier to review and fix. No change in functionality.
Then there's a couple of additional debugging tweaks: static checker
warning fix and move an FPU related warning to under WARN_ON_FPU(),
followed by another bunch of commits that represent a finegrained
split-up of the fixes from Eric Biggers to handle weird xstate bits
properly.
I did this finegrained split-up because some of these fixes also
impact the ABI for weird xstate handling, for which we'd like to have
good bisection results, should they cause any problems. (We also had
one regression with the more monolithic fixes, so splitting it all up
sounded prudent for robustness reasons as well.)
About the whole series: the commits up to
03eaec81ac09 have been in
-next for months - but I've recently rebased them to remove a state
machine clean-up commit that was objected to, and to make it more
bisectable - so technically it's a new, rebased tree.
Robustness history: this series had some regressions along the way,
and all reported regressions have been fixed. All but one of the
regressions manifested itself as easy to report warnings. The previous
version of this latest series was also in linux-next, with one
(warning-only) regression reported which is fixed in the latest
version.
Barring last minute brown paper bag bugs (and the commits are now
older by a day which I'd hope helps paperbag reduction), I'm
reasonably confident about its general robustness.
Famous last words ..."
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/fpu: Use using_compacted_format() instead of open coded X86_FEATURE_XSAVES
x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_user_to_xstate()
x86/fpu: Eliminate the 'xfeatures' local variable in copy_user_to_xstate()
x86/fpu: Copy the full header in copy_user_to_xstate()
x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_kernel_to_xstate()
x86/fpu: Eliminate the 'xfeatures' local variable in copy_kernel_to_xstate()
x86/fpu: Copy the full state_header in copy_kernel_to_xstate()
x86/fpu: Use validate_xstate_header() to validate the xstate_header in __fpu__restore_sig()
x86/fpu: Use validate_xstate_header() to validate the xstate_header in xstateregs_set()
x86/fpu: Introduce validate_xstate_header()
x86/fpu: Rename fpu__activate_fpstate_read/write() to fpu__prepare_[read|write]()
x86/fpu: Rename fpu__activate_curr() to fpu__initialize()
x86/fpu: Simplify and speed up fpu__copy()
x86/fpu: Fix stale comments about lazy FPU logic
x86/fpu: Rename fpu::fpstate_active to fpu::initialized
x86/fpu: Remove fpu__current_fpstate_write_begin/end()
x86/fpu: Fix fpu__activate_fpstate_read() and update comments
x86/fpu: Reinitialize FPU registers if restoring FPU state fails
x86/fpu: Don't let userspace set bogus xcomp_bv
x86/fpu: Turn WARN_ON() in context switch into WARN_ON_FPU()
...
Jan Kara [Tue, 26 Sep 2017 08:36:05 +0000 (10:36 +0200)]
quota: Fix quota corruption with generic/232 test
Eric has reported that since commit
d2faa415166b "quota: Do not acquire
dqio_sem for dquot overwrites in v2 format" test generic/232
occasionally fails due to quota information being incorrect. Indeed that
commit was too eager to remove dqio_sem completely from the path that
just overwrites quota structure with updated information. Although that
is innocent on its own, another process that inserts new quota structure
to the same block can perform read-modify-write cycle of that block thus
effectively discarding quota information update if they race in a wrong
way.
Fix the problem by acquiring dqio_sem for reading for overwrites of
quota structure. Note that it *is* possible to completely avoid taking
dqio_sem in the overwrite path however that will require modifying path
inserting / deleting quota structures to avoid RMW cycles of the full
block and for now it is not clear whether it is worth the hassle.
Fixes:
d2faa415166b2883428efa92f451774ef44373ac
Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Linus Torvalds [Tue, 26 Sep 2017 23:54:22 +0000 (16:54 -0700)]
Merge tag 'mmc-v4.14-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- sdhci-pci: Fix voltage switch for some Intel host controllers
- tmio: remove broken and noisy debug macro
* tag 'mmc-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-pci: Fix voltage switch for some Intel host controllers
mmc: tmio: remove broken and noisy debug macro
Andreas Gruenbacher [Mon, 25 Sep 2017 10:23:03 +0000 (12:23 +0200)]
vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets
In generic_file_llseek_size, return -ENXIO for negative offsets as well
as offsets beyond EOF. This affects filesystems which don't implement
SEEK_HOLE / SEEK_DATA internally, possibly because they don't support
holes.
Fixes xfstest generic/448.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Tue, 26 Sep 2017 08:17:43 +0000 (10:17 +0200)]
Merge branch 'WIP.x86/fpu' into x86/fpu, because it's ready
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:13 +0000 (12:59 +0200)]
x86/fpu: Use using_compacted_format() instead of open coded X86_FEATURE_XSAVES
This is the canonical method to use.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-11-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:12 +0000 (12:59 +0200)]
x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_user_to_xstate()
Tighten the checks in copy_user_to_xstate().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-10-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:11 +0000 (12:59 +0200)]
x86/fpu: Eliminate the 'xfeatures' local variable in copy_user_to_xstate()
We now have this field in hdr.xfeatures.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-9-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:10 +0000 (12:59 +0200)]
x86/fpu: Copy the full header in copy_user_to_xstate()
This is in preparation to verify the full xstate header as supplied by user-space.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-8-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:09 +0000 (12:59 +0200)]
x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_kernel_to_xstate()
Tighten the checks in copy_kernel_to_xstate().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-7-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:08 +0000 (12:59 +0200)]
x86/fpu: Eliminate the 'xfeatures' local variable in copy_kernel_to_xstate()
We have this information in the xstate_header.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-6-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:07 +0000 (12:59 +0200)]
x86/fpu: Copy the full state_header in copy_kernel_to_xstate()
This is in preparation to verify the full xstate header as supplied by user-space.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-5-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:06 +0000 (12:59 +0200)]
x86/fpu: Use validate_xstate_header() to validate the xstate_header in __fpu__restore_sig()
Tighten the checks in __fpu__restore_sig() and update comments.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-4-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:05 +0000 (12:59 +0200)]
x86/fpu: Use validate_xstate_header() to validate the xstate_header in xstateregs_set()
Tighten the checks in xstateregs_set().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-3-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Eric Biggers [Sun, 24 Sep 2017 10:59:04 +0000 (12:59 +0200)]
x86/fpu: Introduce validate_xstate_header()
Move validation of user-supplied xstate_header into a helper function,
in preparation of calling it from both the ptrace and sigreturn syscall
paths.
The new function also considers it to be an error if *any* reserved bits
are set, whereas before we were just clearing most of them silently.
This should reduce the chance of bugs that fail to correctly validate
user-supplied XSAVE areas. It also will expose any broken userspace
programs that set the other reserved bits; this is desirable because
such programs will lose compatibility with future CPUs and kernels if
those bits are ever used for anything. (There shouldn't be any such
programs, and in fact in the case where the compacted format is in use
we were already validating xfeatures. But you never know...)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-2-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sat, 23 Sep 2017 11:37:45 +0000 (13:37 +0200)]
x86/fpu: Rename fpu__activate_fpstate_read/write() to fpu__prepare_[read|write]()
As per the new nomenclature we don't 'activate' the FPU state
anymore, we initialize it. So drop the _activate_fpstate name
from these functions, which were a bit of a mouthful anyway,
and name them:
fpu__prepare_read()
fpu__prepare_write()
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sat, 23 Sep 2017 13:00:15 +0000 (15:00 +0200)]
x86/fpu: Rename fpu__activate_curr() to fpu__initialize()
Rename this function to better express that it's all about
initializing the FPU state of a task which goes hand in hand
with the fpu::initialized field.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sat, 23 Sep 2017 13:00:14 +0000 (15:00 +0200)]
x86/fpu: Simplify and speed up fpu__copy()
fpu__copy() has a preempt_disable()/enable() pair, which it had to do to
be able to atomically unlazy the current task when doing an FNSAVE.
But we don't unlazy tasks anymore, we always do direct saves/restores of
FPU context.
So remove both the unnecessary critical section, and update the comments.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-32-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sat, 23 Sep 2017 13:00:13 +0000 (15:00 +0200)]
x86/fpu: Fix stale comments about lazy FPU logic
We don't do any lazy restore anymore, what we have are two pieces of optimization:
- no-FPU tasks that don't save/restore the FPU context (kernel threads are such)
- cached FPU registers maintained via the fpu->last_cpu field. This means that
if an FPU task context switches to a non-FPU task then we can maintain the
FPU registers as an in-FPU copies (cache), and skip the restoration of them
once we switch back to the original FPU-using task.
Update all the comments that still referred to old 'lazy' and 'unlazy' concepts.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-31-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Tue, 26 Sep 2017 07:43:36 +0000 (09:43 +0200)]
x86/fpu: Rename fpu::fpstate_active to fpu::initialized
The x86 FPU code used to have a complex state machine where both the FPU
registers and the FPU state context could be 'active' (or inactive)
independently of each other - which enabled features like lazy FPU restore.
Much of this complexity is gone in the current code: now we basically can
have FPU-less tasks (kernel threads) that don't use (and save/restore) FPU
state at all, plus full FPU users that save/restore directly with no laziness
whatsoever.
But the fpu::fpstate_active still carries bits of the old complexity - meanwhile
this flag has become a simple flag that shows whether the FPU context saving
area in the thread struct is initialized and used, or not.
Rename it to fpu::initialized to express this simplicity in the name as well.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-30-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sat, 23 Sep 2017 13:00:11 +0000 (15:00 +0200)]
x86/fpu: Remove fpu__current_fpstate_write_begin/end()
These functions are not used anymore, so remove them.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-29-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sat, 23 Sep 2017 13:00:10 +0000 (15:00 +0200)]
x86/fpu: Fix fpu__activate_fpstate_read() and update comments
fpu__activate_fpstate_read() can be called for the current task
when coredumping - or for stopped tasks when ptrace-ing.
Implement this properly in the code and update the comments.
This also fixes an incorrect (but harmless) warning introduced by
one of the earlier patches.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-28-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Tue, 26 Sep 2017 01:24:14 +0000 (18:24 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull compat fix from Al Viro:
"I really wish gcc warned about conversions from pointer to function
into void *..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix a typo in put_compat_shm_info()
Al Viro [Tue, 26 Sep 2017 00:38:45 +0000 (20:38 -0400)]
fix a typo in put_compat_shm_info()
"uip" misspelled as "up"; unfortunately, the latter happens to be
a function and gcc is happy to convert it to void *...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Mon, 25 Sep 2017 22:46:04 +0000 (15:46 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Two sets of NVMe pull requests from Christoph:
- Fixes for the Fibre Channel host/target to fix spec compliance
- Allow a zero keep alive timeout
- Make the debug printk for broken SGLs work better
- Fix queue zeroing during initialization
- Set of RDMA and FC fixes
- Target div-by-zero fix
- bsg double-free fix.
- ndb unknown ioctl fix from Josef.
- Buffered vs O_DIRECT page cache inconsistency fix. Has been floating
around for a long time, well reviewed. From Lukas.
- brd overflow fix from Mikulas.
- Fix for a loop regression in this merge window, where using a union
for two members of the loop_cmd turned out to be a really bad idea.
From Omar.
- Fix for an iostat regression fix in this series, using the wrong API
to get at the block queue. From Shaohua.
- Fix for a potential blktrace delection deadlock. From Waiman.
* 'for-linus' of git://git.kernel.dk/linux-block: (30 commits)
nvme-fcloop: fix port deletes and callbacks
nvmet-fc: sync header templates with comments
nvmet-fc: ensure target queue id within range.
nvmet-fc: on port remove call put outside lock
nvme-rdma: don't fully stop the controller in error recovery
nvme-rdma: give up reconnect if state change fails
nvme-core: Use nvme_wq to queue async events and fw activation
nvme: fix sqhd reference when admin queue connect fails
block: fix a crash caused by wrong API
fs: Fix page cache inconsistency when mixing buffered and AIO DIO
nvmet: implement valid sqhd values in completions
nvme-fabrics: Allow 0 as KATO value
nvme: allow timed-out ios to retry
nvme: stop aer posting if controller state not live
nvme-pci: Print invalid SGL only once
nvme-pci: initialize queue memory before interrupts
nvmet-fc: fix failing max io queue connections
nvme-fc: use transport-specific sgl format
nvme: add transport SGL definitions
nvme.h: remove FC transport-specific error values
...
Linus Torvalds [Mon, 25 Sep 2017 22:41:56 +0000 (15:41 -0700)]
Merge tag 'gfs2-for-linus-4.14-rc3' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Bob Peterson:
"GFS2: Fix an old regression in GFS2's debugfs interface
This fixes a regression introduced by commit
88ffbf3e037e ("GFS2: Use
resizable hash table for glocks"). The regression caused the glock dump
in debugfs to not report all the glocks, which makes debugging
extremely difficult"
* tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix debugfs glocks dump
Linus Torvalds [Mon, 25 Sep 2017 22:37:19 +0000 (15:37 -0700)]
Merge tag 'microblaze-4.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze
Pull Microblaze fixes from Michal Simek:
- Kbuild fix
- use vma_pages
- setup default little endians
* tag 'microblaze-4.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze:
arch: change default endian for microblaze
microblaze: Cocci spatch "vma_pages"
microblaze: Add missing kvm_para.h to Kbuild
Linus Torvalds [Mon, 25 Sep 2017 22:22:31 +0000 (15:22 -0700)]
Merge tag 'trace-v4.14-rc1-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Stack tracing and RCU has been having issues with each other and
lockdep has been pointing out constant problems.
The changes have been going into the stack tracer, but it has been
discovered that the problem isn't with the stack tracer itself, but it
is with calling save_stack_trace() from within the internals of RCU.
The stack tracer is the one that can trigger the issue the easiest,
but examining the problem further, it could also happen from a WARN()
in the wrong place, or even if an NMI happened in this area and it did
an rcu_read_lock().
The critical area is where RCU is not watching. Which can happen while
going to and from idle, or bringing up or taking down a CPU.
The final fix was to put the protection in kernel_text_address() as it
is the one that requires RCU to be watching while doing the stack
trace.
To make this work properly, Paul had to allow rcu_irq_enter() happen
after rcu_nmi_enter(). This should have been done anyway, since an NMI
can page fault (reading vmalloc area), and a page fault triggers
rcu_irq_enter().
One patch is just a consolidation of code so that the fix only needed
to be done in one location"
* tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Remove RCU work arounds from stack tracer
extable: Enable RCU if it is not watching in kernel_text_address()
extable: Consolidate *kernel_text_address() functions
rcu: Allow for page faults in NMI handlers
James Smart [Tue, 19 Sep 2017 21:01:50 +0000 (14:01 -0700)]
nvme-fcloop: fix port deletes and callbacks
Now that there are potentially long delays between when a remoteport or
targetport delete calls is made and when the callback occurs (dev_loss_tmo
timeout), no longer block in the delete routines and move the final nport
puts to the callbacks.
Moved the fcloop_nport_get/put/free routines to avoid forward declarations.
Ensure port_info structs used in registrations are nulled in case fields
are not set (ex: devloss_tmo values).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Wed, 20 Sep 2017 18:07:26 +0000 (11:07 -0700)]
nvmet-fc: sync header templates with comments
Comments were incorrect:
- defer_rcv was in host port template. moved to target port template
- Added Mandatory statements for target port template items
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Tue, 19 Sep 2017 23:33:56 +0000 (16:33 -0700)]
nvmet-fc: ensure target queue id within range.
When searching for queue id's ensure they are within the expected range.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Tue, 19 Sep 2017 22:13:11 +0000 (15:13 -0700)]
nvmet-fc: on port remove call put outside lock
Avoid calling the put routine, as it may traverse to free routines while
holding the target lock.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sagi Grimberg [Thu, 21 Sep 2017 14:01:38 +0000 (17:01 +0300)]
nvme-rdma: don't fully stop the controller in error recovery
By calling nvme_stop_ctrl on a already failed controller will wait for the
scan work to complete (only by identify timeout expiration which is 60
seconds). This is unnecessary when we already know that the controller has
failed.
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sagi Grimberg [Thu, 21 Sep 2017 14:01:37 +0000 (17:01 +0300)]
nvme-rdma: give up reconnect if state change fails
If we failed to transition to state LIVE after a successful reconnect,
then controller deletion already started. In this case there is no
point moving forward with reconnect.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sagi Grimberg [Thu, 21 Sep 2017 14:01:36 +0000 (17:01 +0300)]
nvme-core: Use nvme_wq to queue async events and fw activation
async_event_work might race as it is executed from two different
workqueues at the moment.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>