Alexey Dobriyan [Fri, 19 Jan 2018 00:34:05 +0000 (16:34 -0800)]
proc: fix coredump vs read /proc/*/stat race
do_task_stat() accesses IP and SP of a task without bumping reference
count of a stack (which became an entity with independent lifetime at
some point).
Steps to reproduce:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void)
{
setrlimit(RLIMIT_CORE, &(struct rlimit){});
while (1) {
char buf[64];
char buf2[4096];
pid_t pid;
int fd;
pid = fork();
if (pid == 0) {
*(volatile int *)0 = 0;
}
snprintf(buf, sizeof(buf), "/proc/%u/stat", pid);
fd = open(buf, O_RDONLY);
read(fd, buf2, sizeof(buf2));
close(fd);
waitpid(pid, NULL, 0);
}
return 0;
}
BUG: unable to handle kernel paging request at
0000000000003fd8
IP: do_task_stat+0x8b4/0xaf0
PGD
800000003d73e067 P4D
800000003d73e067 PUD
3d558067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1417 Comm: a.out Not tainted 4.15.0-rc8-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
RIP: 0010:do_task_stat+0x8b4/0xaf0
Call Trace:
proc_single_show+0x43/0x70
seq_read+0xe6/0x3b0
__vfs_read+0x1e/0x120
vfs_read+0x84/0x110
SyS_read+0x3d/0xa0
entry_SYSCALL_64_fastpath+0x13/0x6c
RIP: 0033:0x7f4d7928cba0
RSP: 002b:
00007ffddb245158 EFLAGS:
00000246
Code: 03 b7 a0 01 00 00 4c 8b 4c 24 70 4c 8b 44 24 78 4c 89 74 24 18 e9 91 f9 ff ff f6 45 4d 02 0f 84 fd f7 ff ff 48 8b 45 40 48 89 ef <48> 8b 80 d8 3f 00 00 48 89 44 24 20 e8 9b 97 eb ff 48 89 44 24
RIP: do_task_stat+0x8b4/0xaf0 RSP:
ffffc90000607cc8
CR2:
0000000000003fd8
John Ogness said: for my tests I added an else case to verify that the
race is hit and correctly mitigated.
Link: http://lkml.kernel.org/r/20180116175054.GA11513@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: "Kohli, Gaurav" <gkohli@codeaurora.org>
Tested-by: John Ogness <john.ogness@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xi Kangjie [Fri, 19 Jan 2018 00:34:00 +0000 (16:34 -0800)]
scripts/gdb/linux/tasks.py: fix get_thread_info
Since kernel 4.9, the thread_info has been moved into task_struct, no
longer locates at the bottom of kernel stack.
See commits
c65eacbe290b ("sched/core: Allow putting thread_info into
task_struct") and
15f4eae70d36 ("x86: Move thread_info into
task_struct").
Before fix:
(gdb) set $current = $lx_current()
(gdb) p $lx_thread_info($current)
$1 = {flags =
1470918301}
(gdb) p $current.thread_info
$2 = {flags =
2147483648}
After fix:
(gdb) p $lx_thread_info($current)
$1 = {flags =
2147483648}
(gdb) p $current.thread_info
$2 = {flags =
2147483648}
Link: http://lkml.kernel.org/r/20180118210159.17223-1-imxikangjie@gmail.com
Fixes: 15f4eae70d36 ("x86: Move thread_info into task_struct")
Signed-off-by: Xi Kangjie <imxikangjie@gmail.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Kieran Bingham <kbingham@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Deacon [Fri, 19 Jan 2018 00:33:57 +0000 (16:33 -0800)]
scripts/decodecode: fix decoding for AArch64 (arm64) instructions
There are a couple of problems with the decodecode script and arm64:
1. AArch64 objdump refuses to disassemble .4byte directives as instructions,
insisting that they are data values and displaying them as:
a94153f3 .word 0xa94153f3 <-- trapping instruction
This is resolved by using the .inst directive instead.
2. Disassembly of branch instructions attempts to provide the target as
an offset from a symbol, e.g.:
0:
34000082 cbz w2, 10 <.text+0x10>
however this falls foul of the grep -v, which matches lines containing
".text" and ends up removing all branch instructions from the dump.
This patch resolves both issues by using the .inst directive for 4-byte
quantities on arm64 and stripping the resulting binaries (as is done on
arm already) to remove the mapping symbols.
Link: http://lkml.kernel.org/r/1506596147-23630-1-git-send-email-will.deacon@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Fri, 19 Jan 2018 00:33:53 +0000 (16:33 -0800)]
mm/page_owner.c: remove drain_all_pages from init_early_allocated_pages
When setting page_owner = on, the following warning can be seen in the
boot log:
WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:2537 drain_all_pages+0x171/0x1a0
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc7-next-
20180109-1-default+ #7
Hardware name: Dell Inc. Latitude E7470/0T6HHJ, BIOS 1.11.3 11/09/2016
RIP: 0010:drain_all_pages+0x171/0x1a0
Call Trace:
init_page_owner+0x4e/0x260
start_kernel+0x3e6/0x4a6
? set_init_arg+0x55/0x55
secondary_startup_64+0xa5/0xb0
Code: c5 ed ff 89 df 48 c7 c6 20 3b 71 82 e8 f9 4b 52 00 3b 05 d7 0b f8 00 89 c3 72 d5 5b 5d 41 5
This warning is shown because we are calling drain_all_pages() in
init_early_allocated_pages(), but mm_percpu_wq is not up yet, it is being
set up later on in kernel_init_freeable() -> init_mm_internals().
Link: http://lkml.kernel.org/r/20180109153921.GA13070@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@techadventures.net>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ayush Mittal <ayush.m@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Fri, 19 Jan 2018 00:33:50 +0000 (16:33 -0800)]
mm/memory.c: release locked page in do_swap_page()
James reported a bug in swap paging-in from his testing. It is that
do_swap_page doesn't release locked page so system hang-up happens due
to a deadlock on PG_locked.
It was introduced by
0bcac06f27d7 ("mm, swap: skip swapcache for swapin
of synchronous device") because I missed swap cache hit places to update
swapcache variable to work well with other logics against swapcache in
do_swap_page.
This patch fixes it.
Debugged by James Bottomley.
Link: http://lkml.kernel.org/r/<1514407817.4169.4.camel@HansenPartnership.com>
Link: http://lkml.kernel.org/r/20180102235606.GA19438@bbox
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 18 Jan 2018 18:57:59 +0000 (10:57 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"These are the ARM BPF fixes as discussed earlier this week"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: net: bpf: clarify tail_call index
ARM: net: bpf: fix LDX instructions
ARM: net: bpf: fix register saving
ARM: net: bpf: correct stack layout documentation
ARM: net: bpf: move stack documentation
ARM: net: bpf: fix stack alignment
ARM: net: bpf: fix tail call jumps
ARM: net: bpf: avoid 'bx' instruction on non-Thumb capable CPUs
Linus Torvalds [Thu, 18 Jan 2018 18:54:52 +0000 (10:54 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull two NVMe fixes from Jens Axboe:
"Two important fixes for the sgl support for nvme that is new in this
release"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme-pci: take sglist coalescing in dma_map_sg into account
nvme-pci: check segement valid for SGL use
Linus Torvalds [Thu, 18 Jan 2018 18:49:26 +0000 (10:49 -0800)]
Merge tag 'mmc-v4.15-rc2-3' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
"sdhci-esdhc-imx: Fixup clock to make i.MX53 Loco (IMX53QSB) boot
again"
* tag 'mmc-v4.15-rc2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-esdhc-imx: Fix i.MX53 eSDHCv3 clock
Linus Torvalds [Thu, 18 Jan 2018 17:50:24 +0000 (09:50 -0800)]
Merge tag 'gpio-v4.15-5' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fix from Linus Walleij:
"This is the (hopefully) last GPIO fix for v4.15, fixing the bit
fiddling in the MMIO GPIO driver.
Again the especially endowed screwer-upper who has been open coding
bit fiddling is yours truly"
* tag 'gpio-v4.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mmio: Also read bits that are zero
Christoph Hellwig [Wed, 17 Jan 2018 21:04:38 +0000 (22:04 +0100)]
nvme-pci: take sglist coalescing in dma_map_sg into account
Some iommu implementations can merge physically and/or virtually
contiguous segments inside sg_map_dma. The NVMe SGL support does not take
this into account and will warn because of falling off a loop. Pass the
number of mapped segments to nvme_pci_setup_sgls so that the SGL setup
can take the number of mapped segments into account.
Reported-by: Fangjian (Turing) <f.fangjian@huawei.com>
Fixes: a7a7cbe3 ("nvme-pci: add SGL support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@rimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keith Busch [Wed, 17 Jan 2018 21:04:37 +0000 (22:04 +0100)]
nvme-pci: check segement valid for SGL use
The driver needs to verify there is a payload with a command before
seeing if it should use SGLs to map it.
Fixes: 955b1b5a00ba ("nvme-pci: move use_sgl initialization to nvme_init_iod()")
Reported-by: Paul Menzel <pmenzel+linux-nvme@molgen.mpg.de>
Reviewed-by: Paul Menzel <pmenzel+linux-nvme@molgen.mpg.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Wed, 17 Jan 2018 20:30:06 +0000 (12:30 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- A rather involved set of memory hardware encryption fixes to
support the early loading of microcode files via the initrd. These
are larger than what we normally take at such a late -rc stage, but
there are two mitigating factors: 1) much of the changes are
limited to the SME code itself 2) being able to early load
microcode has increased importance in the post-Meltdown/Spectre
era.
- An IRQ vector allocator fix
- An Intel RDT driver use-after-free fix
- An APIC driver bug fix/revert to make certain older systems boot
again
- A pkeys ABI fix
- TSC calibration fixes
- A kdump fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/vector: Fix off by one in error path
x86/intel_rdt/cqm: Prevent use after free
x86/mm: Encrypt the initrd earlier for BSP microcode update
x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
x86/mm: Centralize PMD flags in sme_encrypt_kernel()
x86/mm: Use a struct to reduce parameters for SME PGD mapping
x86/mm: Clean up register saving in the __enc_copy() assembly code
x86/idt: Mark IDT tables __initconst
Revert "x86/apic: Remove init_bsp_APIC()"
x86/mm/pkeys: Fix fill_sig_info_pkey
x86/tsc: Print tsc_khz, when it differs from cpu_khz
x86/tsc: Fix erroneous TSC rate on Skylake Xeon
x86/tsc: Future-proof native_calibrate_tsc()
kdump: Write the correct address of mem_section into vmcoreinfo
Linus Torvalds [Wed, 17 Jan 2018 20:28:22 +0000 (12:28 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"A delayacct statistics correctness fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
delayacct: Account blkio completion on the correct task
Linus Torvalds [Wed, 17 Jan 2018 20:26:37 +0000 (12:26 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 perf fix from Ingo Molnar:
"An Intel RAPL events fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Fix Haswell and Broadwell server RAPL event
Linus Torvalds [Wed, 17 Jan 2018 20:24:42 +0000 (12:24 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two futex fixes: a input parameters robustness fix, and futex race
fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Prevent overflow by strengthen input validation
futex: Avoid violating the 10th rule of futex
Linus Torvalds [Wed, 17 Jan 2018 19:54:56 +0000 (11:54 -0800)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 pti bits and fixes from Thomas Gleixner:
"This last update contains:
- An objtool fix to prevent a segfault with the gold linker by
changing the invocation order. That's not just for gold, it's a
general robustness improvement.
- An improved error message for objtool which spares tearing hairs.
- Make KASAN fail loudly if there is not enough memory instead of
oopsing at some random place later
- RSB fill on context switch to prevent RSB underflow and speculation
through other units.
- Make the retpoline/RSB functionality work reliably for both Intel
and AMD
- Add retpoline to the module version magic so mismatch can be
detected
- A small (non-fix) update for cpufeatures which prevents cpu feature
clashing for the upcoming extra mitigation bits to ease
backporting"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
module: Add retpoline tag to VERMAGIC
x86/cpufeature: Move processor tracing out of scattered features
objtool: Improve error message for bad file argument
objtool: Fix seg fault with gold linker
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
x86/retpoline: Fill RSB on context switch for affected CPUs
x86/kasan: Panic if there is not enough memory to boot
Linus Torvalds [Wed, 17 Jan 2018 19:43:42 +0000 (11:43 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A one-liner fix which prevents deferrable timers becoming stale when
the system does not switch into NOHZ mode"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Unconditionally check deferrable base
Russell King [Sat, 13 Jan 2018 12:11:26 +0000 (12:11 +0000)]
ARM: net: bpf: clarify tail_call index
As per
90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT"), the index used
for array lookup is defined to be 32-bit wide. Update a misleading
comment that suggests it is 64-bit wide.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Russell King [Sat, 13 Jan 2018 21:06:16 +0000 (21:06 +0000)]
ARM: net: bpf: fix LDX instructions
When the source and destination register are identical, our JIT does not
generate correct code, which leads to kernel oopses.
Fix this by (a) generating more efficient code, and (b) making use of
the temporary earlier if we will overwrite the address register.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Russell King [Sat, 13 Jan 2018 22:38:18 +0000 (22:38 +0000)]
ARM: net: bpf: fix register saving
When an eBPF program tail-calls another eBPF program, it enters it after
the prologue to avoid having complex stack manipulations. This can lead
to kernel oopses, and similar.
Resolve this by always using a fixed stack layout, a CPU register frame
pointer, and using this when reloading registers before returning.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Russell King [Sat, 13 Jan 2018 22:51:27 +0000 (22:51 +0000)]
ARM: net: bpf: correct stack layout documentation
The stack layout documentation incorrectly suggests that the BPF JIT
scratch space starts immediately below BPF_FP. This is not correct,
so let's fix the documentation to reflect reality.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Russell King [Sat, 13 Jan 2018 21:26:14 +0000 (21:26 +0000)]
ARM: net: bpf: move stack documentation
Move the stack documentation towards the top of the file, where it's
relevant for things like the register layout.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Russell King [Sat, 13 Jan 2018 16:10:07 +0000 (16:10 +0000)]
ARM: net: bpf: fix stack alignment
As per
2dede2d8e925 ("ARM EABI: stack pointer must be 64-bit aligned
after a CPU exception") the stack should be aligned to a 64-bit boundary
on EABI systems. Ensure that the eBPF JIT appropraitely aligns the
stack.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Russell King [Sat, 13 Jan 2018 11:39:54 +0000 (11:39 +0000)]
ARM: net: bpf: fix tail call jumps
When a tail call fails, it is documented that the tail call should
continue execution at the following instruction. An example tail call
sequence is:
12: (85) call bpf_tail_call#12
13: (b7) r0 = 0
14: (95) exit
The ARM assembler for the tail call in this case ends up branching to
instruction 14 instead of instruction 13, resulting in the BPF filter
returning a non-zero value:
178: ldr r8, [sp, #588] ; insn 12
17c: ldr r6, [r8, r6]
180: ldr r8, [sp, #580]
184: cmp r8, r6
188: bcs 0x1e8
18c: ldr r6, [sp, #524]
190: ldr r7, [sp, #528]
194: cmp r7, #0
198: cmpeq r6, #32
19c: bhi 0x1e8
1a0: adds r6, r6, #1
1a4: adc r7, r7, #0
1a8: str r6, [sp, #524]
1ac: str r7, [sp, #528]
1b0: mov r6, #104
1b4: ldr r8, [sp, #588]
1b8: add r6, r8, r6
1bc: ldr r8, [sp, #580]
1c0: lsl r7, r8, #2
1c4: ldr r6, [r6, r7]
1c8: cmp r6, #0
1cc: beq 0x1e8
1d0: mov r8, #32
1d4: ldr r6, [r6, r8]
1d8: add r6, r6, #44
1dc: bx r6
1e0: mov r0, #0 ; insn 13
1e4: mov r1, #0
1e8: add sp, sp, #596 ; insn 14
1ec: pop {r4, r5, r6, r7, r8, sl, pc}
For other sequences, the tail call could end up branching midway through
the following BPF instructions, or maybe off the end of the function,
leading to unknown behaviours.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Russell King [Sat, 13 Jan 2018 11:35:15 +0000 (11:35 +0000)]
ARM: net: bpf: avoid 'bx' instruction on non-Thumb capable CPUs
Avoid the 'bx' instruction on CPUs that have no support for Thumb and
thus do not implement this instruction by moving the generation of this
opcode to a separate function that selects between:
bx reg
and
mov pc, reg
according to the capabilities of the CPU.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Thomas Gleixner [Tue, 16 Jan 2018 11:20:18 +0000 (12:20 +0100)]
x86/apic/vector: Fix off by one in error path
Keith reported the following warning:
WARNING: CPU: 28 PID: 1420 at kernel/irq/matrix.c:222 irq_matrix_remove_managed+0x10f/0x120
x86_vector_free_irqs+0xa1/0x180
x86_vector_alloc_irqs+0x1e4/0x3a0
msi_domain_alloc+0x62/0x130
The reason for this is that if the vector allocation fails the error
handling code tries to free the failed vector as well, which causes the
above imbalance warning to trigger.
Adjust the error path to handle this correctly.
Fixes: b5dc8e6c21e7 ("x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors")
Reported-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161217300.1823@nanos
Thomas Gleixner [Tue, 16 Jan 2018 18:59:59 +0000 (19:59 +0100)]
x86/intel_rdt/cqm: Prevent use after free
intel_rdt_iffline_cpu() -> domain_remove_cpu() frees memory first and then
proceeds accessing it.
BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
Read of size 8 at addr
ffff883ff7c1e780 by task cpuhp/31/195
find_first_bit+0x1f/0x80
has_busy_rmid+0x47/0x70
intel_rdt_offline_cpu+0x4b4/0x510
Freed by task 195:
kfree+0x94/0x1a0
intel_rdt_offline_cpu+0x17d/0x510
Do the teardown first and then free memory.
Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Peter Zilstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "Roderick W. Smith" <rod.smith@canonical.com>
Cc: 1733662@bugs.launchpad.net
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos
Andi Kleen [Tue, 16 Jan 2018 20:52:28 +0000 (12:52 -0800)]
module: Add retpoline tag to VERMAGIC
Add a marker for retpoline to the module VERMAGIC. This catches the case
when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
making it insecure.
It doesn't handle the case when retpoline has been runtime disabled. Even
in this case the match of the retcompile status will be enforced. This
implies that even with retpoline run time disabled all modules loaded need
to be recompiled.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Cc: torvalds@linux-foundation.org
Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
Paolo Bonzini [Tue, 16 Jan 2018 15:42:25 +0000 (16:42 +0100)]
x86/cpufeature: Move processor tracing out of scattered features
Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX),
so do not duplicate it in the scattered features word.
Besides being more tidy, this will be useful for KVM when it presents
processor tracing to the guests. KVM selects host features that are
supported by both the host kernel (depending on command line options,
CPU errata, or whatever) and KVM. Whenever a full feature word exists,
KVM's code is written in the expectation that the CPUID bit number
matches the X86_FEATURE_* bit number, but this is not the case for
X86_FEATURE_INTEL_PT.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1516117345-34561-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Wed, 17 Jan 2018 00:47:40 +0000 (16:47 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"We had a few more items creep up over the last week. Given we are in
-rc8, these are obviously limited to bugs that have a big downside and
for which we are certain of the fix.
The first is a straight up oops bug that all you have to do is read
the code to see it's a guaranteed 100% oops bug.
The second is a use-after-free issue. We get away lucky if the queue
we are shutting down is empty, but if it isn't, we can end up oopsing.
We really need to drain the queue before destroying it.
The final one is an issue with bad user input causing us to access our
port array out of bounds. While fixing the array out of bounds issue,
it was noticed that the original code did the same thing twice (the
call to rdma_ah_set_port_num()), so its removal is not balanced by a
readd elsewhere, it was already where it needed to be in addition to
where it didn't need to be.
Summary:
- Oops fix in hfi1 driver
- use-after-free issue in iser-target
- use of user supplied array index without proper checking"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Fix out-of-bound access while querying AH
IB/hfi1: Prevent a NULL dereference
iser-target: Fix possible use-after-free in connection establishment error
Linus Walleij [Tue, 16 Jan 2018 08:51:51 +0000 (09:51 +0100)]
gpio: mmio: Also read bits that are zero
The code for .get_multiple() has bugs:
1. The simple .get_multiple() just reads a register, masks it
and sets the return value. This is not correct: we only want to
assign values (whether 0 or 1) to the bits that are set in the
mask. Fix this by using &= ~mask to clear all bits in the mask
and then |= val & mask to set the corresponding bits from the
read.
2. The bgpio_get_multiple_be() call has a similar problem: it
uses the |= operator to set the bits, so only the bits in the
mask are affected, but it misses to clear all returned bits
from the mask initially, so some bits will be returned
erroneously set to 1.
3. The bgpio_get_set_multiple() again fails to clear the bits
from the mask.
4. find_next_bit() wasn't handled correctly, use a totally
different approach for one function and change the other
function to follow the design pattern of assigning the first
bit to -1, then use bit + 1 in the for loop and < num_iterations
as break condition.
Fixes: 80057cb417b2 ("gpio-mmio: Use the new .get_multiple() callback")
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Reported-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Torvalds [Tue, 16 Jan 2018 20:45:30 +0000 (12:45 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Two read past end of buffer fixes in AF_KEY, from Eric Biggers.
2) Memory leak in key_notify_policy(), from Steffen Klassert.
3) Fix overflow with bpf arrays, from Daniel Borkmann.
4) Fix RDMA regression with mlx5 due to mlx5 no longer using
pci_irq_get_affinity(), from Saeed Mahameed.
5) Missing RCU read locking in nl80211_send_iface() when it calls
ieee80211_bss_get_ie(), from Dominik Brodowski.
6) cfg80211 should check dev_set_name()'s return value, from Johannes
Berg.
7) Missing module license tag in 9p protocol, from Stephen Hemminger.
8) Fix crash due to too small MTU in udp ipv6 sendmsg, from Mike
Maloney.
9) Fix endless loop in netlink extack code, from David Ahern.
10) TLS socket layer sets inverted error codes, resulting in an endless
loop. From Robert Hering.
11) Revert openvswitch erspan tunnel support, it's mis-designed and we
need to kill it before it goes into a real release. From William Tu.
12) Fix lan78xx failures in full speed USB mode, from Yuiko Oshino.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
net, sched: fix panic when updating miniq {b,q}stats
qed: Fix potential use-after-free in qed_spq_post()
nfp: use the correct index for link speed table
lan78xx: Fix failure in USB Full Speed
sctp: do not allow the v4 socket to bind a v4mapped v6 address
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
sctp: reinit stream if stream outcnt has been change by sinit in sendmsg
ibmvnic: Fix pending MAC address changes
netlink: extack: avoid parenthesized string constant warning
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
net: Allow neigh contructor functions ability to modify the primary_key
sh_eth: fix dumping ARSTR
Revert "openvswitch: Add erspan tunnel support."
net/tls: Fix inverted error codes to avoid endless loop
ipv6: ip6_make_skb() needs to clear cork.base.dst
sctp: avoid compiler warning on implicit fallthru
net: ipv4: Make "ip route get" match iif lo rules again.
netlink: extack needs to be reset each time through loop
tipc: fix a memory leak in tipc_nl_node_get_link()
ipv6: fix udpv6 sendmsg crash caused by too small MTU
...
Linus Torvalds [Tue, 16 Jan 2018 20:13:52 +0000 (12:13 -0800)]
Merge tag 'sound-4.15' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A few small last-minute fixes that should sneak into 4.15:
- remove a spurious WARN_ON() triggered by syzkaller
- fix for ioctl races in ALSA sequencer
- two trivial HD-audio fixup entries"
* tag 'sound-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: seq: Make ioctls race-free
ALSA: pcm: Remove yet superfluous WARN_ON()
ALSA: hda - Apply the existing quirk to iMac 14,1
ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant
Linus Torvalds [Tue, 16 Jan 2018 20:09:36 +0000 (12:09 -0800)]
Merge tag 'trace-v4.15-rc4-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Bring back context level recursive protection in ring buffer.
The simpler counter protection failed, due to a path when tracing
with trace_clock_global() as it could not be reentrant and depended
on the ring buffer recursive protection to keep that from happening.
- Prevent branch profiling when FORTIFY_SOURCE is enabled.
It causes 50 - 60 MB in warning messages. Branch profiling should
never be run on production systems, so there's no reason that it
needs to be enabled with FORTIFY_SOURCE.
* tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y
ring-buffer: Bring back context level recursive checks
Daniel Borkmann [Mon, 15 Jan 2018 22:12:09 +0000 (23:12 +0100)]
net, sched: fix panic when updating miniq {b,q}stats
While working on fixing another bug, I ran into the following panic
on arm64 by simply attaching clsact qdisc, adding a filter and running
traffic on ingress to it:
[...]
[ 178.188591] Unable to handle kernel read from unreadable memory at virtual address
810fb501f000
[ 178.197314] Mem abort info:
[ 178.200121] ESR = 0x96000004
[ 178.203168] Exception class = DABT (current EL), IL = 32 bits
[ 178.209095] SET = 0, FnV = 0
[ 178.212157] EA = 0, S1PTW = 0
[ 178.215288] Data abort info:
[ 178.218175] ISV = 0, ISS = 0x00000004
[ 178.222019] CM = 0, WnR = 0
[ 178.224997] user pgtable: 4k pages, 48-bit VAs, pgd =
0000000023cb3f33
[ 178.231531] [
0000810fb501f000] *pgd=
0000000000000000
[ 178.236508] Internal error: Oops:
96000004 [#1] SMP
[...]
[ 178.311855] CPU: 73 PID: 2497 Comm: ping Tainted: G W 4.15.0-rc7+ #5
[ 178.319413] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017
[ 178.326887] pstate:
60400005 (nZCv daif +PAN -UAO)
[ 178.331685] pc : __netif_receive_skb_core+0x49c/0xac8
[ 178.336728] lr : __netif_receive_skb+0x28/0x78
[ 178.341161] sp :
ffff00002344b750
[ 178.344465] x29:
ffff00002344b750 x28:
ffff810fbdfd0580
[ 178.349769] x27:
0000000000000000 x26:
ffff000009378000
[...]
[ 178.418715] x1 :
0000000000000054 x0 :
0000000000000000
[ 178.424020] Process ping (pid: 2497, stack limit = 0x000000009f0a3ff4)
[ 178.430537] Call trace:
[ 178.432976] __netif_receive_skb_core+0x49c/0xac8
[ 178.437670] __netif_receive_skb+0x28/0x78
[ 178.441757] process_backlog+0x9c/0x160
[ 178.445584] net_rx_action+0x2f8/0x3f0
[...]
Reason is that sch_ingress and sch_clsact are doing mini_qdisc_pair_init()
which sets up miniq pointers to cpu_{b,q}stats from the underlying qdisc.
Problem is that this cannot work since they are actually set up right after
the qdisc ->init() callback in qdisc_create(), so first packet going into
sch_handle_ingress() tries to call mini_qdisc_bstats_cpu_update() and we
therefore panic.
In order to fix this, allocation of {b,q}stats needs to happen before we
call into ->init(). In net-next, there's already such option through commit
d59f5ffa59d8 ("net: sched: a dflt qdisc may be used with per cpu stats").
However, the bug needs to be fixed in net still for 4.15. Thus, include
these bits to reduce any merge churn and reuse the static_flags field to
set TCQ_F_CPUSTATS, and remove the allocation from qdisc_create() since
there is no other user left. Prashant Bhole ran into the same issue but
for net-next, thus adding him below as well as co-author. Same issue was
also reported by Sandipan Das when using bcc.
Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath")
Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2018-January/001190.html
Reported-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Co-authored-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Co-authored-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roland Dreier [Mon, 15 Jan 2018 20:24:49 +0000 (12:24 -0800)]
qed: Fix potential use-after-free in qed_spq_post()
We need to check if p_ent->comp_mode is QED_SPQ_MODE_EBLOCK before
calling qed_spq_add_entry(). The test is fine is the mode is EBLOCK,
but if it isn't then qed_spq_add_entry() might kfree(p_ent).
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 15 Jan 2018 19:47:53 +0000 (11:47 -0800)]
nfp: use the correct index for link speed table
sts variable is holding link speed as well as state. We should
be using ls to index into ls_to_ethtool.
Fixes: 265aeb511bd5 ("nfp: add support for .get_link_ksettings()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuiko Oshino [Mon, 15 Jan 2018 18:24:28 +0000 (13:24 -0500)]
lan78xx: Fix failure in USB Full Speed
Fix initialize the uninitialized tx_qlen to an appropriate value when USB
Full Speed is used.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Jan 2018 19:28:14 +0000 (14:28 -0500)]
Merge tag 'mac80211-for-davem-2018-01-15' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
More fixes:
* hwsim:
- properly flush deletion works at module unload
- validate # of channels passed from userspace
* cfg80211:
- fix RCU locking regression
- initialize on-stack channel data for nl80211 event
- check dev_set_name() return value
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 15 Jan 2018 09:02:00 +0000 (17:02 +0800)]
sctp: do not allow the v4 socket to bind a v4mapped v6 address
The check in sctp_sockaddr_af is not robust enough to forbid binding a
v4mapped v6 addr on a v4 socket.
The worse thing is that v4 socket's bind_verify would not convert this
v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
socket bound a v6 addr.
This patch is to fix it by doing the common sa.sa_family check first,
then AF_INET check for v4mapped v6 addrs.
Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.")
Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 15 Jan 2018 09:01:36 +0000 (17:01 +0800)]
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
After commit
cea0cc80a677 ("sctp: use the right sk after waking up from
wait_buf sleep"), it may change to lock another sk if the asoc has been
peeled off in sctp_wait_for_sndbuf.
However, the asoc's new sk could be already closed elsewhere, as it's in
the sendmsg context of the old sk that can't avoid the new sk's closing.
If the sk's last one refcnt is held by this asoc, later on after putting
this asoc, the new sk will be freed, while under it's own lock.
This patch is to revert that commit, but fix the old issue by returning
error under the old sk's lock.
Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep")
Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 15 Jan 2018 09:01:19 +0000 (17:01 +0800)]
sctp: reinit stream if stream outcnt has been change by sinit in sendmsg
After introducing sctp_stream structure, sctp uses stream->outcnt as the
out stream nums instead of c.sinit_num_ostreams.
However when users use sinit in cmsg, it only updates c.sinit_num_ostreams
in sctp_sendmsg. At that moment, stream->outcnt is still using previous
value. If it's value is not updated, the sinit_num_ostreams of sinit could
not really work.
This patch is to fix it by updating stream->outcnt and reiniting stream
if stream outcnt has been change by sinit in sendmsg.
Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 11 Jan 2018 01:39:52 +0000 (19:39 -0600)]
ibmvnic: Fix pending MAC address changes
Due to architecture limitations, the IBM VNIC client driver is unable
to perform MAC address changes unless the device has "logged in" to
its backing device. Currently, pending MAC changes are handled before
login, resulting in an error and failure to change the MAC address.
Moving that chunk to the end of the ibmvnic_login function, when we are
sure that it was successful, fixes that.
The MAC address can be changed when the device is up or down, so
only check if the device is in a "PROBED" state before setting the
MAC address.
Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Josh Snyder [Mon, 18 Dec 2017 16:15:10 +0000 (16:15 +0000)]
delayacct: Account blkio completion on the correct task
Before commit:
e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
delayacct_blkio_end() was called after context-switching into the task which
completed I/O.
This resulted in double counting: the task would account a delay both waiting
for I/O and for time spent in the runqueue.
With
e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up().
In ttwu, we have not yet context-switched. This is more correct, in that
the delay accounting ends when the I/O is complete.
But delayacct_blkio_end() relies on 'get_current()', and we have not yet
context-switched into the task whose I/O completed. This results in the
wrong task having its delay accounting statistics updated.
Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(),
so that it can update the statistics of the correct task.
Signed-off-by: Josh Snyder <joshs@netflix.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Brendan Gregg <bgregg@netflix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-block@vger.kernel.org
Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
Link: http://lkml.kernel.org/r/1513613712-571-1-git-send-email-joshs@netflix.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tom Lendacky [Wed, 10 Jan 2018 19:26:34 +0000 (13:26 -0600)]
x86/mm: Encrypt the initrd earlier for BSP microcode update
Currently the BSP microcode update code examines the initrd very early
in the boot process. If SME is active, the initrd is treated as being
encrypted but it has not been encrypted (in place) yet. Update the
early boot code that encrypts the kernel to also encrypt the initrd so
that early BSP microcode updates work.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192634.6026.10452.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tom Lendacky [Wed, 10 Jan 2018 19:26:26 +0000 (13:26 -0600)]
x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
In preparation for encrypting more than just the kernel, the encryption
support in sme_encrypt_kernel() needs to support 4KB page aligned
encryption instead of just 2MB large page aligned encryption.
Update the routines that populate the PGD to support non-2MB aligned
addresses. This is done by creating PTE page tables for the start
and end portion of the address range that fall outside of the 2MB
alignment. This results in, at most, two extra pages to hold the
PTE entries for each mapping of a range.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192626.6026.75387.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tom Lendacky [Wed, 10 Jan 2018 19:26:16 +0000 (13:26 -0600)]
x86/mm: Centralize PMD flags in sme_encrypt_kernel()
In preparation for encrypting more than just the kernel during early
boot processing, centralize the use of the PMD flag settings based
on the type of mapping desired. When 4KB aligned encryption is added,
this will allow either PTE flags or large page PMD flags to be used
without requiring the caller to adjust.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192615.6026.14767.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tom Lendacky [Wed, 10 Jan 2018 19:26:05 +0000 (13:26 -0600)]
x86/mm: Use a struct to reduce parameters for SME PGD mapping
In preparation for follow-on patches, combine the PGD mapping parameters
into a struct to reduce the number of function arguments and allow for
direct updating of the next pagetable mapping area pointer.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192605.6026.96206.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tom Lendacky [Wed, 10 Jan 2018 19:25:56 +0000 (13:25 -0600)]
x86/mm: Clean up register saving in the __enc_copy() assembly code
Clean up the use of PUSH and POP and when registers are saved in the
__enc_copy() assembly function in order to improve the readability of the code.
Move parameter register saving into general purpose registers earlier
in the code and move all the pushes to the beginning of the function
with corresponding pops at the end.
We do this to prepare fixes.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192556.6026.74187.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Mon, 15 Jan 2018 14:17:08 +0000 (08:17 -0600)]
objtool: Improve error message for bad file argument
If a nonexistent file is supplied to objtool, it complains with a
non-helpful error:
open: No such file or directory
Improve it to:
objtool: Can't open 'foo': No such file or directory
Reported-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/406a3d00a21225eee2819844048e17f68523ccf6.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Mon, 15 Jan 2018 14:17:07 +0000 (08:17 -0600)]
objtool: Fix seg fault with gold linker
Objtool segfaults when the gold linker is used with
CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y.
With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before
being passed to objtool. The gold linker seems to strip unused ELF
symbols by default, which confuses objtool and causes the seg fault when
it's trying to generate ORC metadata.
Objtool should really be running immediately after GCC anyway, without a
linker call in between. Change the makefile ordering so that objtool is
called before the linker.
Reported-and-tested-by: Markus <M4rkusXXL@web.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Link: http://lkml.kernel.org/r/355f04da33581f4a3bf82e5b512973624a1e23a2.1516025651.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Leon Romanovsky [Fri, 12 Jan 2018 05:58:39 +0000 (07:58 +0200)]
RDMA/mlx5: Fix out-of-bound access while querying AH
The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.
==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr
ffff880019ae2268 by task ibv_rc_pingpong/409
CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted
4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0xe9/0x18f
print_address_description+0xa2/0x350
kasan_report+0x3a5/0x400
to_rdma_ah_attr+0xa8/0x3b0
mlx5_ib_query_qp+0xd35/0x1330
ib_query_qp+0x8a/0xb0
ib_uverbs_query_qp+0x237/0x7f0
ib_uverbs_write+0x617/0xd80
__vfs_write+0xf7/0x500
vfs_write+0x149/0x310
SyS_write+0xca/0x190
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x7fe9c7a275a0
RSP: 002b:
00007ffee5498738 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
00007fe9c7ce4b00 RCX:
00007fe9c7a275a0
RDX:
0000000000000018 RSI:
00007ffee5498800 RDI:
0000000000000003
RBP:
000055d0c8d3f010 R08:
00007ffee5498800 R09:
0000000000000018
R10:
00000000000000ba R11:
0000000000000246 R12:
0000000000008000
R13:
0000000000004fb0 R14:
000055d0c8d3f050 R15:
00007ffee5498560
Allocated by task 1:
__kmalloc+0x3f9/0x430
alloc_mad_private+0x25/0x50
ib_mad_post_receive_mads+0x204/0xa60
ib_mad_init_device+0xa59/0x1020
ib_register_device+0x83a/0xbc0
mlx5_ib_add+0x50e/0x5c0
mlx5_add_device+0x142/0x410
mlx5_register_interface+0x18f/0x210
mlx5_ib_init+0x56/0x63
do_one_initcall+0x15b/0x270
kernel_init_freeable+0x2d8/0x3d0
kernel_init+0x14/0x190
ret_from_fork+0x24/0x30
Freed by task 0:
(stack is not available)
The buggy address belongs to the object at
ffff880019ae2000
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 104 bytes to the right of
512-byte region [
ffff880019ae2000,
ffff880019ae2200)
The buggy address belongs to the page:
page:
000000005d674e18 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw:
4000000000008100 0000000000000000 0000000000000000 00000001000c000c
raw:
dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>
ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint
Cc: <stable@vger.kernel.org>
Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Johannes Berg [Mon, 15 Jan 2018 11:42:25 +0000 (12:42 +0100)]
netlink: extack: avoid parenthesized string constant warning
NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
in newer versions of gcc:
warning: array initialized from parenthesized string constant
Just remove the parentheses, they're not needed in this context since
anyway since there can be no operator precendence issues or similar.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 Jan 2018 19:53:44 +0000 (14:53 -0500)]
Merge branch 'ipv4-Make-neigh-lookup-keys-for-loopback-point-to-point-devices-be-INADDR_ANY'
Jim Westfall says:
====================
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
This used to be the previous behavior in older kernels but became broken in
a263b3093641f (ipv4: Make neigh lookups directly in output packet path)
and then later removed because it was broken in
0bb4087cbec0 (ipv4: Fix neigh
lookup keying over loopback/point-to-point devices)
Not having this results in there being an arp entry for every remote ip
address that the device talks to. Given a fairly active device it can
cause the arp table to become huge and/or having to add/purge large number
of entires to keep within table size thresholds.
$ ip -4 neigh show nud noarp | grep tun | wc -l
55850
$ lnstat -k arp_cache:entries,arp_cache:allocs,arp_cache:destroys -c 10
arp_cach|arp_cach|arp_cach|
entries| allocs|destroys|
81493|
620166816|
620126069|
101867| 10186| 0|
113854| 5993| 0|
118773| 2459| 0|
27937| 18579| 63998|
39256| 5659| 0|
56231| 8487| 0|
65602| 4685| 0|
79697| 7047| 0|
90733| 5517| 0|
v2:
- fixes coding style issues
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jim Westfall [Sun, 14 Jan 2018 12:18:51 +0000 (04:18 -0800)]
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.
This used the be the old behavior but became broken in
a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in
0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.
Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jim Westfall [Sun, 14 Jan 2018 12:18:50 +0000 (04:18 -0800)]
net: Allow neigh contructor functions ability to modify the primary_key
Use n->primary_key instead of pkey to account for the possibility that a neigh
constructor function may have modified the primary_key value.
Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 13 Jan 2018 17:22:01 +0000 (20:22 +0300)]
sh_eth: fix dumping ARSTR
ARSTR is always located at the start of the TSU register region, thus
using add_reg() instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
causes EDMR or EDSR (depending on the register layout) to be dumped instead
of ARSTR. Use the correct condition/macro there...
Fixes: 6b4b4fead342 ("sh_eth: Implement ethtool register dump operations")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Fri, 12 Jan 2018 20:29:22 +0000 (12:29 -0800)]
Revert "openvswitch: Add erspan tunnel support."
This reverts commit
ceaa001a170e43608854d5290a48064f57b565ed.
The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
as a nested attribute to support all ERSPAN v1 and v2's fields.
The current attr is a be32 supporting only one field. Thus, this
patch reverts it and later patch will redo it using nested attr.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: Pravin Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
r.hering@avm.de [Fri, 12 Jan 2018 14:42:06 +0000 (15:42 +0100)]
net/tls: Fix inverted error codes to avoid endless loop
sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().
Signed-off-by: Robert Hering <r.hering@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 12 Jan 2018 06:31:18 +0000 (22:31 -0800)]
ipv6: ip6_make_skb() needs to clear cork.base.dst
In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :
If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.
Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Mon, 15 Jan 2018 19:07:27 +0000 (11:07 -0800)]
tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y
I regularly get 50 MB - 60 MB files during kernel randconfig builds.
These large files mostly contain (many repeats of; e.g., 124,594):
In file included from ../include/linux/string.h:6:0,
from ../include/linux/uuid.h:20,
from ../include/linux/mod_devicetable.h:13,
from ../scripts/mod/devicetable-offsets.c:3:
../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default]
______f = { \
^
../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
^
../include/linux/string.h:425:2: note: in expansion of macro 'if'
if (p_size == (size_t)-1 && q_size == (size_t)-1)
^
This only happens when CONFIG_FORTIFY_SOURCE=y and
CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if
FORTIFY_SOURCE=y.
Link: http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2750@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Marcelo Ricardo Leitner [Thu, 11 Jan 2018 16:22:06 +0000 (14:22 -0200)]
sctp: avoid compiler warning on implicit fallthru
These fall-through are expected.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Thu, 11 Jan 2018 09:36:26 +0000 (18:36 +0900)]
net: ipv4: Make "ip route get" match iif lo rules again.
Commit
3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
versions of route lookup") broke "ip route get" in the presence
of rules that specify iif lo.
Host-originated traffic always has iif lo, because
ip_route_output_key_hash and ip6_route_output_flags set the flow
iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
convenient way to select only originated traffic and not forwarded
traffic.
inet_rtm_getroute used to match these rules correctly because
even though it sets the flow iif to 0, it called
ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
But now that it calls ip_route_output_key_hash_rcu, the ifindex
will remain 0 and not match the iif lo in the rule. As a result,
"ip route get" will return ENETUNREACH.
Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 10 Jan 2018 21:00:39 +0000 (13:00 -0800)]
netlink: extack needs to be reset each time through loop
syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
The problem is that netlink_rcv_skb loops over the skb repeatedly invoking
the callback and without resetting the extack leaving potentially stale
data. Initializing each time through avoids the WARN_ON.
Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting")
Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 10 Jan 2018 20:50:25 +0000 (12:50 -0800)]
tipc: fix a memory leak in tipc_nl_node_get_link()
When tipc_node_find_by_name() fails, the nlmsg is not
freed.
While on it, switch to a goto label to properly
free it.
Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Maloney [Wed, 10 Jan 2018 17:45:10 +0000 (12:45 -0500)]
ipv6: fix udpv6 sendmsg crash caused by too small MTU
The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers. A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU. For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.
Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.
Found by syzkaller:
kernel BUG at ./include/linux/skbuff.h:2064!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task:
ffff8801d0b68580 task.stack:
ffff8801ac6b8000
RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
RSP: 0018:
ffff8801ac6bf570 EFLAGS:
00010216
RAX:
0000000000010000 RBX:
0000000000000028 RCX:
ffffc90003cce000
RDX:
00000000000001b8 RSI:
ffffffff839df06f RDI:
ffff8801d9478ca0
RBP:
ffff8801ac6bf780 R08:
ffff8801cc3f1dbc R09:
0000000000000000
R10:
ffff8801ac6bf7a0 R11:
43cb4b7b1948a9e7 R12:
ffff8801cc3f1dc8
R13:
ffff8801cc3f1d40 R14:
0000000000001036 R15:
dffffc0000000000
FS:
00007f43d740c700(0000) GS:
ffff8801dc100000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f7834984000 CR3:
00000001d79b9000 CR4:
00000000001406e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
ip6_finish_skb include/net/ipv6.h:911 [inline]
udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x352/0x5a0 net/socket.c:1750
SyS_sendto+0x40/0x50 net/socket.c:1718
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:
00007f43d740bc08 EFLAGS:
00000216 ORIG_RAX:
000000000000002c
RAX:
ffffffffffffffda RBX:
00000000007180a8 RCX:
00000000004512e9
RDX:
000000000000002e RSI:
0000000020d08000 RDI:
0000000000000005
RBP:
0000000000000086 R08:
00000000209c1000 R09:
000000000000001c
R10:
0000000000040800 R11:
0000000000000216 R12:
00000000004b9c69
R13:
00000000ffffffff R14:
0000000000000005 R15:
00000000202c2000
Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP:
ffff8801ac6bf570
RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP:
ffff8801ac6bf570
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Mike Maloney <maloney@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 10 Jan 2018 16:30:22 +0000 (17:30 +0100)]
net: cs89x0: add MODULE_LICENSE
This driver lacks a MODULE_LICENSE tag, leading to a Kbuild warning:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/cirrus/cs89x0.o
This adds license, author, and description according to the
comment block at the start of the file.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Wed, 10 Jan 2018 15:24:45 +0000 (16:24 +0100)]
ppp: unlock all_ppp_mutex before registering device
ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
needs to lock pn->all_ppp_mutex. Therefore we mustn't call
register_netdevice() with pn->all_ppp_mutex already locked, or we'd
deadlock in case register_netdevice() fails and calls .ndo_uninit().
Fortunately, we can unlock pn->all_ppp_mutex before calling
register_netdevice(). This lock protects pn->units_idr, which isn't
used in the device registration process.
However, keeping pn->all_ppp_mutex locked during device registration
did ensure that no device in transient state would be published in
pn->units_idr. In practice, unlocking it before calling
register_netdevice() doesn't change this property: ppp_unit_register()
is called with 'ppp_mutex' locked and all searches done in
pn->units_idr hold this lock too.
Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
Reported-and-tested-by: syzbot+367889b9c9e279219175@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Wed, 10 Jan 2018 14:03:05 +0000 (16:03 +0200)]
ptr_ring: document usage around __ptr_ring_peek
This explains why is the net usage of __ptr_ring_peek
actually ok without locks.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Mon, 8 Jan 2018 16:23:18 +0000 (08:23 -0800)]
9p: add missing module license for xen transport
The 9P of Xen module is missing required license and module information.
See https://bugzilla.kernel.org/show_bug.cgi?id=198109
Reported-by: Alan Bartlett <ajb@elrepo.org>
Fixes: 868eb122739a ("xen/9pfs: introduce Xen 9pfs transport driver")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven Rostedt (VMware) [Mon, 15 Jan 2018 15:47:09 +0000 (10:47 -0500)]
ring-buffer: Bring back context level recursive checks
Commit
1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be
simpler") replaced the context level recursion checks with a simple counter.
This would prevent the ring buffer code from recursively calling itself more
than the max number of contexts that exist (Normal, softirq, irq, nmi). But
this change caused a lockup in a specific case, which was during suspend and
resume using a global clock. Adding a stack dump to see where this occurred,
the issue was in the trace global clock itself:
trace_buffer_lock_reserve+0x1c/0x50
__trace_graph_entry+0x2d/0x90
trace_graph_entry+0xe8/0x200
prepare_ftrace_return+0x69/0xc0
ftrace_graph_caller+0x78/0xa8
queued_spin_lock_slowpath+0x5/0x1d0
trace_clock_global+0xb0/0xc0
ring_buffer_lock_reserve+0xf9/0x390
The function graph tracer traced queued_spin_lock_slowpath that was called
by trace_clock_global. This pointed out that the trace_clock_global() is not
reentrant, as it takes a spin lock. It depended on the ring buffer recursive
lock from letting that happen.
By removing the context detection and adding just a max number of allowable
recursions, it allowed the trace_clock_global() to be entered again and try
to retake the spinlock it already held, causing a deadlock.
Fixes: 1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler")
Reported-by: David Weinehall <david.weinehall@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Benoît Thébaudeau [Sun, 14 Jan 2018 18:43:05 +0000 (19:43 +0100)]
mmc: sdhci-esdhc-imx: Fix i.MX53 eSDHCv3 clock
Commit
5143c953a786 ("mmc: sdhci-esdhc-imx: Allow all supported
prescaler values") made it possible to set SYSCTL.SDCLKFS to 0 in SDR
mode, thus bypassing the SD clock frequency prescaler, in order to be
able to get higher SD clock frequencies in some contexts. However, that
commit missed the fact that this value is illegal on the eSDHCv3
instance of the i.MX53. This seems to be the only exception on i.MX,
this value being legal even for the eSDHCv2 instances of the i.MX53.
Fix this issue by changing the minimum prescaler value if the i.MX53
eSDHCv3 is detected. According to the i.MX53 reference manual, if
DLLCTRL[10] can be set, then the controller is eSDHCv3, else it is
eSDHCv2.
This commit fixes the following issue, which was preventing the i.MX53
Loco (IMX53QSB) board from booting Linux 4.15.0-rc5:
[ 1.882668] mmcblk1: error -84 transferring data, sector 2048, nr 8, cmd response 0x900, card status 0xc00
[ 2.002255] mmcblk1: error -84 transferring data, sector 2050, nr 6, cmd response 0x900, card status 0xc00
[ 12.645056] mmc1: Timeout waiting for hardware interrupt.
[ 12.650473] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 12.656921] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00001201
[ 12.663366] mmc1: sdhci: Blk size: 0x00000004 | Blk cnt: 0x00000000
[ 12.669813] mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
[ 12.676258] mmc1: sdhci: Present: 0x01f8028f | Host ctl: 0x00000013
[ 12.682703] mmc1: sdhci: Power: 0x00000002 | Blk gap: 0x00000000
[ 12.689148] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x0000003f
[ 12.695594] mmc1: sdhci: Timeout: 0x0000008e | Int stat: 0x00000000
[ 12.702039] mmc1: sdhci: Int enab: 0x107f004b | Sig enab: 0x107f004b
[ 12.708485] mmc1: sdhci: AC12 err: 0x00000000 | Slot int: 0x00001201
[ 12.714930] mmc1: sdhci: Caps: 0x07eb0000 | Caps_1: 0x08100810
[ 12.721375] mmc1: sdhci: Cmd: 0x0000163a | Max curr: 0x00000000
[ 12.727821] mmc1: sdhci: Resp[0]: 0x00000920 | Resp[1]: 0x00000000
[ 12.734265] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
[ 12.740709] mmc1: sdhci: Host ctl2: 0x00000000
[ 12.745157] mmc1: sdhci: ADMA Err: 0x00000001 | ADMA Ptr: 0xc8049200
[ 12.751601] mmc1: sdhci: ============================================
[ 12.758110] print_req_error: I/O error, dev mmcblk1, sector 2050
[ 12.764135] Buffer I/O error on dev mmcblk1p1, logical block 0, lost sync page write
[ 12.775163] EXT4-fs (mmcblk1p1): mounted filesystem without journal. Opts: (null)
[ 12.782746] VFS: Mounted root (ext4 filesystem) on device 179:9.
[ 12.789151] mmcblk1: response CRC error sending SET_BLOCK_COUNT command, card status 0x900
Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau.dev@gmail.com>
Reported-by: Wladimir J. van der Laan <laanwj@gmail.com>
Tested-by: Wladimir J. van der Laan <laanwj@gmail.com>
Fixes: 5143c953a786 ("mmc: sdhci-esdhc-imx: Allow all supported prescaler values")
Cc: <stable@vger.kernel.org> # v4.13+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Johannes Berg [Mon, 15 Jan 2018 08:58:27 +0000 (09:58 +0100)]
cfg80211: check dev_set_name() return value
syzbot reported a warning from rfkill_alloc(), and after a while
I think that the reason is that it was doing fault injection and
the dev_set_name() failed, leaving the name NULL, and we didn't
check the return value and got to rfkill_alloc() with a NULL name.
Since we really don't want a NULL name, we ought to check the
return value.
Fixes: fb28ad35906a ("net: struct device - replace bus_id with dev_name(), dev_set_name()")
Reported-by: syzbot+1ddfb3357e1d7bb5b5d3@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 15 Jan 2018 08:32:36 +0000 (09:32 +0100)]
mac80211_hwsim: validate number of different channels
When creating a new radio on the fly, hwsim allows this
to be done with an arbitrary number of channels, but
cfg80211 only supports a limited number of simultaneous
channels, leading to a warning.
Fix this by validating the number - this requires moving
the define for the maximum out to a visible header file.
Reported-by: syzbot+8dd9051ff19940290931@syzkaller.appspotmail.com
Fixes: b59ec8dd4394 ("mac80211_hwsim: fix number of channels in interface combinations")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Benjamin Beichler [Wed, 10 Jan 2018 16:42:51 +0000 (17:42 +0100)]
mac80211_hwsim: add workqueue to wait for deferred radio deletion on mod unload
When closing multiple wmediumd instances with many radios and try to
unload the mac80211_hwsim module, it may happen that the work items live
longer than the module. To wait especially for this deletion work items,
add a work queue, otherwise flush_scheduled_work would be necessary.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Dominik Brodowski [Mon, 15 Jan 2018 07:12:15 +0000 (08:12 +0100)]
nl80211: take RCU read lock when calling ieee80211_bss_get_ie()
As ieee80211_bss_get_ie() derefences an RCU to return ssid_ie, both
the call to this function and any operation on this variable need
protection by the RCU read lock.
Fixes: 44905265bc15 ("nl80211: don't expose wdev->ssid for most interfaces")
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 15 Jan 2018 08:12:05 +0000 (09:12 +0100)]
cfg80211: fully initialize old channel for event
Paul reported that he got a report about undefined behaviour
that seems to me to originate in using uninitialized memory
when the channel structure here is used in the event code in
nl80211 later.
He never reported whether this fixed it, and I wasn't able
to trigger this so far, but we should do the right thing and
fully initialize the on-stack structure anyway.
Reported-by: Paul Menzel <pmenzel+linux-wireless@molgen.mpg.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tom Lendacky [Sat, 13 Jan 2018 23:27:30 +0000 (17:27 -0600)]
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
The PAUSE instruction is currently used in the retpoline and RSB filling
macros as a speculation trap. The use of PAUSE was originally suggested
because it showed a very, very small difference in the amount of
cycles/time used to execute the retpoline as compared to LFENCE. On AMD,
the PAUSE instruction is not a serializing instruction, so the pause/jmp
loop will use excess power as it is speculated over waiting for return
to mispredict to the correct target.
The RSB filling macro is applicable to AMD, and, if software is unable to
verify that LFENCE is serializing on AMD (possible when running under a
hypervisor), the generic retpoline support will be used and, so, is also
applicable to AMD. Keep the current usage of PAUSE for Intel, but add an
LFENCE instruction to the speculation trap for AMD.
The same sequence has been adopted by GCC for the GCC generated retpolines.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Kees Cook <keescook@google.com>
Link: https://lkml.kernel.org/r/20180113232730.31060.36287.stgit@tlendack-t1.amdoffice.net
David Woodhouse [Fri, 12 Jan 2018 17:49:25 +0000 (17:49 +0000)]
x86/retpoline: Fill RSB on context switch for affected CPUs
On context switch from a shallow call stack to a deeper one, as the CPU
does 'ret' up the deeper side it may encounter RSB entries (predictions for
where the 'ret' goes to) which were populated in userspace.
This is problematic if neither SMEP nor KPTI (the latter of which marks
userspace pages as NX for the kernel) are active, as malicious code in
userspace may then be executed speculatively.
Overwrite the CPU's return prediction stack with calls which are predicted
to return to an infinite loop, to "capture" speculation if this
happens. This is required both for retpoline, and also in conjunction with
IBRS for !SMEP && !KPTI.
On Skylake+ the problem is slightly different, and an *underflow* of the
RSB may cause errant branch predictions to occur. So there it's not so much
overwrite, as *filling* the RSB to attempt to prevent it getting
empty. This is only a partial solution for Skylake+ since there are many
other conditions which may result in the RSB becoming empty. The full
solution on Skylake+ is to use IBRS, which will prevent the problem even
when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
required on context switch.
[ tglx: Added missing vendor check and slighty massaged comments and
changelog ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
Andrey Ryabinin [Wed, 10 Jan 2018 15:36:02 +0000 (18:36 +0300)]
x86/kasan: Panic if there is not enough memory to boot
Currently KASAN doesn't panic in case it don't have enough memory
to boot. Instead, it crashes in some random place:
kernel BUG at arch/x86/mm/physaddr.c:27!
RIP: 0010:__phys_addr+0x268/0x276
Call Trace:
kasan_populate_shadow+0x3f2/0x497
kasan_init+0x12e/0x2b2
setup_arch+0x2825/0x2a2c
start_kernel+0xc8/0x15f4
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x72/0x75
secondary_startup_64+0xa5/0xb0
Use memblock_virt_alloc_try_nid() for allocations without failure
fallback. It will panic with an out of memory message.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: lkp@01.org
Link: https://lkml.kernel.org/r/20180110153602.18919-1-aryabinin@virtuozzo.com
Linus Torvalds [Sun, 14 Jan 2018 23:32:30 +0000 (15:32 -0800)]
Linux 4.15-rc8
Linus Torvalds [Sun, 14 Jan 2018 23:30:02 +0000 (15:30 -0800)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixlet from Thomas Gleixner.
Remove a warning about lack of compiler support for retpoline that most
people can't do anything about, so it just annoys them needlessly.
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Remove compile time warning
Linus Torvalds [Sun, 14 Jan 2018 23:03:17 +0000 (15:03 -0800)]
Merge tag 'powerpc-4.15-7' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for an oops at boot if we take a hotplug interrupt before we
are ready to handle it.
The bulk is patches to implement mitigation for Meltdown, see the
change logs for more details.
Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon
Masters, Jose Ricardo Ziviani, David Gibson"
* tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: Check device-tree for RFI flush settings
powerpc/pseries: Query hypervisor for RFI flush settings
powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
powerpc/64s: Add support for RFI flush of L1-D cache
powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
powerpc/64s: Simple RFI macro conversions
powerpc/64: Add macros for annotating the destination of rfid/hrfid
powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ
Thomas Gleixner [Sun, 14 Jan 2018 22:19:49 +0000 (23:19 +0100)]
timers: Unconditionally check deferrable base
When the timer base is checked for expired timers then the deferrable base
must be checked as well. This was missed when making the deferrable base
independent of base::nohz_active.
Fixes: ced6d5c11d3e ("timers: Use deferrable base independent of base::nohz_active")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Cc: rt@linutronix.de
Thomas Gleixner [Sun, 14 Jan 2018 21:13:29 +0000 (22:13 +0100)]
x86/retpoline: Remove compile time warning
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
does not have retpoline support. Linus rationale for this is:
It's wrong because it will just make people turn off RETPOLINE, and the
asm updates - and return stack clearing - that are independent of the
compiler are likely the most important parts because they are likely the
ones easiest to target.
And it's annoying because most people won't be able to do anything about
it. The number of people building their own compiler? Very small. So if
their distro hasn't got a compiler yet (and pretty much nobody does), the
warning is just annoying crap.
It is already properly reported as part of the sysfs interface. The
compile-time warning only encourages bad things.
Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support")
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
Andi Kleen [Fri, 22 Dec 2017 00:18:21 +0000 (16:18 -0800)]
x86/idt: Mark IDT tables __initconst
const variables must use __initconst, not __initdata.
Fix this up for the IDT tables, which got it consistently wrong.
Fixes: 16bc18d895ce ("x86/idt: Move 32-bit idt_descr to C code")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171222001821.2157-7-andi@firstfloor.org
Linus Torvalds [Sun, 14 Jan 2018 18:22:45 +0000 (10:22 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull NVMe fix from Jens Axboe:
"Just a single fix for nvme over fabrics that should go into 4.15"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme-fabrics: initialize default host->id in nvmf_host_default()
Li Jinyue [Thu, 14 Dec 2017 09:04:54 +0000 (17:04 +0800)]
futex: Prevent overflow by strengthen input validation
UBSAN reports signed integer overflow in kernel/futex.c:
UBSAN: Undefined behaviour in kernel/futex.c:2041:18
signed integer overflow:
0 - -
2147483648 cannot be represented in type 'int'
Add a sanity check to catch negative values of nr_wake and nr_requeue.
Signed-off-by: Li Jinyue <lijinyue@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: dvhart@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com
Linus Torvalds [Sun, 14 Jan 2018 17:51:25 +0000 (09:51 -0800)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
"This contains:
- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.
- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non trivial and will
be worked on.
- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared
- removal of dead code in the PTI pagetable setup functions
- add PTI documentation
- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.
- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.
- the initial spectre_v2 mitigations, aka retpoline:
+ The necessary ASM thunk and compiler support
+ The ASM variants of retpoline and the conversion of affected ASM
code
+ Make LFENCE serializing on AMD so it can be used as speculation
trap
+ The RSB fill after vmexit
- initial objtool support for retpoline
As I said in the status mail this is the most of the set of patches
which should go into 4.15 except two straight forward patches still on
hold:
- the retpoline add on of LFENCE which waits for ACKs
- the RSB fill after context switch
Both should be ready to go early next week and with that we'll have
covered the major holes of spectre_v2 and go back to normality"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...
Peter Zijlstra [Fri, 8 Dec 2017 12:49:39 +0000 (13:49 +0100)]
futex: Avoid violating the 10th rule of futex
Julia reported futex state corruption in the following scenario:
waiter waker stealer (prio > waiter)
futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
timeout=[N ms])
futex_wait_requeue_pi()
futex_wait_queue_me()
freezable_schedule()
<scheduled out>
futex(LOCK_PI, uaddr2)
futex(CMP_REQUEUE_PI, uaddr,
uaddr2, 1, 0)
/* requeues waiter to uaddr2 */
futex(UNLOCK_PI, uaddr2)
wake_futex_pi()
cmp_futex_value_locked(uaddr2, waiter)
wake_up_q()
<woken by waker>
<hrtimer_wakeup() fires,
clears sleeper->task>
futex(LOCK_PI, uaddr2)
__rt_mutex_start_proxy_lock()
try_to_take_rt_mutex() /* steals lock */
rt_mutex_set_owner(lock, stealer)
<preempted>
<scheduled in>
rt_mutex_wait_proxy_lock()
__rt_mutex_slowlock()
try_to_take_rt_mutex() /* fails, lock held by stealer */
if (timeout && !timeout->task)
return -ETIMEDOUT;
fixup_owner()
/* lock wasn't acquired, so,
fixup_pi_state_owner skipped */
return -ETIMEDOUT;
/* At this point, we've returned -ETIMEDOUT to userspace, but the
* futex word shows waiter to be the owner, and the pi_mutex has
* stealer as the owner */
futex_lock(LOCK_PI, uaddr2)
-> bails with EDEADLK, futex word says we're owner.
And suggested that what commit:
73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:
16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")
changed all the locking rules. Even without that, the sequence:
- if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
- locked = 1;
- goto out;
- }
- raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
- owner = rt_mutex_owner(&q->pi_state->pi_mutex);
- if (!owner)
- owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
- raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
- ret = fixup_pi_state_owner(uaddr, q, owner);
already suggests there were races; otherwise we'd never have to look
at next_owner.
So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.
Luckily the values can only change away and not to the value we're
testing, this means we can do a speculative test and double check once
we have the wait_lock.
Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-by: Julia Cartwright <julia@ni.com>
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Julia Cartwright <julia@ni.com>
Tested-by: Gratian Crisan <gratian.crisan@ni.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65rrc@hirez.programming.kicks-ass.net
David S. Miller [Sun, 14 Jan 2018 16:01:33 +0000 (11:01 -0500)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2018-01-13
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Follow-up fix to the recent BPF out-of-bounds speculation
fix that prevents max_entries overflows and an undefined
behavior on 32 bit archs on index_mask calculation, from
Daniel.
2) Reject unsupported BPF_ARSH opcode in 32 bit ALU mode that
was otherwise throwing an unknown opcode warning in the
interpreter, from Daniel.
3) Typo fix in one of the user facing verbose() messages that
was added during the BPF out-of-bounds speculation fix,
from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ville Syrjälä [Tue, 28 Nov 2017 14:53:50 +0000 (16:53 +0200)]
Revert "x86/apic: Remove init_bsp_APIC()"
This reverts commit
b371ae0d4a194b178817b0edfb6a7395c7aec37a. It causes
boot hangs on old P3/P4 systems when the local APIC is enforced in UP mode.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/20171128145350.21560-1-ville.syrjala@linux.intel.com
Eric W. Biederman [Fri, 12 Jan 2018 20:31:35 +0000 (14:31 -0600)]
x86/mm/pkeys: Fix fill_sig_info_pkey
SEGV_PKUERR is a signal specific si_code which happens to have the same
numeric value as several others: BUS_MCEERR_AR, ILL_ILLTRP, FPE_FLTOVF,
TRAP_HWBKPT, CLD_TRAPPED, POLL_ERR, SEGV_THREAD_ID, as such it is not safe
to just test the si_code the signal number must also be tested to prevent a
false positive in fill_sig_info_pkey.
This error was by inspection, and BUS_MCEERR_AR appears to be a real
candidate for confusion. So pass in si_signo and check for SIG_SEGV to
verify that it is actually a SEGV_PKUERR
Fixes: 019132ff3daf ("x86/mm/pkeys: Fill in pkey field in siginfo")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180112203135.4669-2-ebiederm@xmission.com
Len Brown [Fri, 22 Dec 2017 05:27:56 +0000 (00:27 -0500)]
x86/tsc: Print tsc_khz, when it differs from cpu_khz
If CPU and TSC frequency are the same the printout of the CPU frequency is
valid for the TSC as well:
tsc: Detected 2900.000 MHz processor
If the TSC frequency is different there is no information in dmesg. Add a
conditional printout:
tsc: Detected 2904.000 MHz TSC
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/537b342debcd8e8aebc8d631015dcdf9f9ba8a26.1513920414.git.len.brown@intel.com
Len Brown [Fri, 22 Dec 2017 05:27:55 +0000 (00:27 -0500)]
x86/tsc: Fix erroneous TSC rate on Skylake Xeon
The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
problematic:
- SKX workstations (with same model # as server variants) use a 24 MHz
crystal. This results in a -4.0% time drift rate on SKX workstations.
- SKX servers subject the crystal to an EMI reduction circuit that reduces its
actual frequency by (approximately) -0.25%. This results in -1 second per
10 minute time drift as compared to network time.
This issue can also trigger a timer and power problem, on configurations
that use the LAPIC timer (versus the TSC deadline timer). Clock ticks
scheduled with the LAPIC timer arrive a few usec before the time they are
expected (according to the slow TSC). This causes Linux to poll-idle, when
it should be in an idle power saving state. The idle and clock code do not
graciously recover from this error, sometimes resulting in significant
polling and measurable power impact.
Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz,
and the TSC refined calibration will update tsc_khz to correct for the
difference.
[ tglx: Sanitized change log ]
Fixes: 6baf3d61821f ("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com
Len Brown [Fri, 22 Dec 2017 05:27:54 +0000 (00:27 -0500)]
x86/tsc: Future-proof native_calibrate_tsc()
If the crystal frequency cannot be determined via CPUID(15).crystal_khz or
the built-in table then native_calibrate_tsc() will still set the
X86_FEATURE_TSC_KNOWN_FREQ flag which prevents the refined TSC calibration.
As a consequence such systems use cpu_khz for the TSC frequency which is
incorrect when cpu_khz != tsc_khz resulting in time drift.
Return early when the crystal frequency cannot be retrieved without setting
the X86_FEATURE_TSC_KNOWN_FREQ flag. This ensures that the refined TSC
calibration is invoked.
[ tglx: Steam-blastered changelog. Sigh ]
Fixes: 4ca4df0b7eb0 ("x86/tsc: Mark TSC frequency determined by CPUID as known")
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: Bin Gao <bin.gao@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/0fe2503aa7d7fc69137141fc705541a78101d2b9.1513920414.git.len.brown@intel.com
Peter Zijlstra [Sun, 14 Jan 2018 10:27:13 +0000 (11:27 +0100)]
x86,perf: Disable intel_bts when PTI
The intel_bts driver does not use the 'normal' BTS buffer which is exposed
through the cpu_entry_area but instead uses the memory allocated for the
perf AUX buffer.
This obviously comes apart when using PTI because then the kernel mapping;
which includes that AUX buffer memory; disappears. Fixing this requires to
expose a mapping which is visible in all context and that's not trivial.
As a quick fix disable this driver when PTI is enabled to prevent
malfunction.
Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: greg@kroah.com
Cc: hughd@google.com
Cc: luto@amacapital.net
Cc: Vince Weaver <vince@deater.net>
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net
W. Trevor King [Fri, 12 Jan 2018 23:24:59 +0000 (15:24 -0800)]
security/Kconfig: Correct the Documentation reference for PTI
When the config option for PTI was added a reference to documentation was
added as well. But the documentation did not exist at that point. The final
documentation has a different file name.
Fix it up to point to the proper file.
Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig")
Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/3009cc8ccbddcd897ec1e0cb6dda524929de0d14.1515799398.git.wking@tremily.us
Thomas Gleixner [Sat, 13 Jan 2018 23:23:57 +0000 (00:23 +0100)]
x86/pti: Fix !PCID and sanitize defines
The switch to the user space page tables in the low level ASM code sets
unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base
address of the page directory to the user part, bit 11 is switching the
PCID to the PCID associated with the user page tables.
This fails on a machine which lacks PCID support because bit 11 is set in
CR3. Bit 11 is reserved when PCID is inactive.
While the Intel SDM claims that the reserved bits are ignored when PCID is
disabled, the AMD APM states that they should be cleared.
This went unnoticed as the AMD APM was not checked when the code was
developed and reviewed and test systems with Intel CPUs never failed to
boot. The report is against a Centos 6 host where the guest fails to boot,
so it's not yet clear whether this is a virt issue or can happen on real
hardware too, but thats irrelevant as the AMD APM clearly ask for clearing
the reserved bits.
Make sure that on non PCID machines bit 11 is not set by the page table
switching code.
Andy suggested to rename the related bits and masks so they are clearly
describing what they should be used for, which is done as well for clarity.
That split could have been done with alternatives but the macro hell is
horrible and ugly. This can be done on top if someone cares to remove the
extra orq. For now it's a straight forward fix.
Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
Linus Torvalds [Sat, 13 Jan 2018 22:10:32 +0000 (14:10 -0800)]
Merge tag 'usb-4.15-rc8' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes and device ids for 4.15-rc8
Nothing major, small fixes for various devices, some resolutions for
bugs found by fuzzers, and the usual handful of new device ids.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Documentation: usb: fix typo in UVC gadgetfs config command
usb: misc: usb3503: make sure reset is low for at least 100us
uas: ignore UAS for Norelsys NS1068(X) chips
USB: UDC core: fix double-free in usb_add_gadget_udc_release
USB: fix usbmon BUG trigger
usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
usbip: remove kernel addresses from usb device and urb debug msgs
usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
USB: serial: cp210x: add new device ID ELV ALC 8xxx
USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ