Jisheng Zhang [Sun, 30 Oct 2022 12:45:17 +0000 (20:45 +0800)]
riscv: fix race when vmap stack overflow
Currently, when detecting vmap stack overflow, riscv firstly switches
to the so called shadow stack, then use this shadow stack to call the
get_overflow_stack() to get the overflow stack. However, there's
a race here if two or more harts use the same shadow stack at the same
time.
To solve this race, we introduce spin_shadow_stack atomic var, which
will be swap between its own address and 0 in atomic way, when the
var is set, it means the shadow_stack is being used; when the var
is cleared, it means the shadow_stack isn't being used.
Fixes:
31da94c25aea ("riscv: add VMAP_STACK overflow detection")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@kernel.org
[Palmer: Add AQ to the swap, and also some comments.]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Tong Tiangen [Mon, 21 Jun 2021 03:28:55 +0000 (11:28 +0800)]
riscv: add VMAP_STACK overflow detection
This patch adds stack overflow detection to riscv, usable when
CONFIG_VMAP_STACK=y.
Overflow is detected in kernel exception entry(kernel/entry.S), if the
kernel stack is overflow and been detected, the overflow handler is
invoked on a per-cpu overflow stack. This approach preserves GPRs and
the original exception information.
The overflow detect is performed before any attempt is made to access
the stack and the principle of stack overflow detection: kernel stacks
are aligned to double their size, enabling overflow to be detected with
a single bit test. For example, a 16K stack is aligned to 32K, ensuring
that bit 14 of the SP must be zero. On an overflow (or underflow), this
bit is flipped. Thus, overflow (of less than the size of the stack) can
be detected by testing whether this bit is set.
This gives us a useful error message on stack overflow, as can be
trigger with the LKDTM overflow test:
[ 388.053267] lkdtm: Performing direct entry EXHAUST_STACK
[ 388.053663] lkdtm: Calling function with 1024 frame size to depth 32 ...
[ 388.054016] lkdtm: loop 32/32 ...
[ 388.054186] lkdtm: loop 31/32 ...
[ 388.054491] lkdtm: loop 30/32 ...
[ 388.054672] lkdtm: loop 29/32 ...
[ 388.054859] lkdtm: loop 28/32 ...
[ 388.055010] lkdtm: loop 27/32 ...
[ 388.055163] lkdtm: loop 26/32 ...
[ 388.055309] lkdtm: loop 25/32 ...
[ 388.055481] lkdtm: loop 24/32 ...
[ 388.055653] lkdtm: loop 23/32 ...
[ 388.055837] lkdtm: loop 22/32 ...
[ 388.056015] lkdtm: loop 21/32 ...
[ 388.056188] lkdtm: loop 20/32 ...
[ 388.058145] Insufficient stack space to handle exception!
[ 388.058153] Task stack: [0xffffffd014260000..0xffffffd014264000]
[ 388.058160] Overflow stack: [0xffffffe1f8d2c220..0xffffffe1f8d2d220]
[ 388.058168] CPU: 0 PID: 89 Comm: bash Not tainted 5.12.0-rc8-dirty #90
[ 388.058175] Hardware name: riscv-virtio,qemu (DT)
[ 388.058187] epc : number+0x32/0x2c0
[ 388.058247] ra : vsnprintf+0x2ae/0x3f0
[ 388.058255] epc :
ffffffe0002d38f6 ra :
ffffffe0002d814e sp :
ffffffd01425ffc0
[ 388.058263] gp :
ffffffe0012e4010 tp :
ffffffe08014da00 t0 :
ffffffd0142606e8
[ 388.058271] t1 :
0000000000000000 t2 :
0000000000000000 s0 :
ffffffd014260070
[ 388.058303] s1 :
ffffffd014260158 a0 :
ffffffd01426015e a1 :
ffffffd014260158
[ 388.058311] a2 :
0000000000000013 a3 :
ffff0a01ffffff10 a4 :
ffffffe000c398e0
[ 388.058319] a5 :
511b02ec65f3e300 a6 :
0000000000a1749a a7 :
0000000000000000
[ 388.058327] s2 :
ffffffff000000ff s3 :
00000000ffff0a01 s4 :
ffffffe0012e50a8
[ 388.058335] s5 :
0000000000ffff0a s6 :
ffffffe0012e50a8 s7 :
ffffffe000da1cc0
[ 388.058343] s8 :
ffffffffffffffff s9 :
ffffffd0142602b0 s10:
ffffffd0142602a8
[ 388.058351] s11:
ffffffd01426015e t3 :
00000000000f0000 t4 :
ffffffffffffffff
[ 388.058359] t5 :
000000000000002f t6 :
ffffffd014260158
[ 388.058366] status:
0000000000000100 badaddr:
ffffffd01425fff8 cause:
000000000000000f
[ 388.058374] Kernel panic - not syncing: Kernel stack overflow
[ 388.058381] CPU: 0 PID: 89 Comm: bash Not tainted 5.12.0-rc8-dirty #90
[ 388.058387] Hardware name: riscv-virtio,qemu (DT)
[ 388.058393] Call Trace:
[ 388.058400] [<
ffffffe000004944>] walk_stackframe+0x0/0xce
[ 388.058406] [<
ffffffe0006f0b28>] dump_backtrace+0x38/0x46
[ 388.058412] [<
ffffffe0006f0b46>] show_stack+0x10/0x18
[ 388.058418] [<
ffffffe0006f3690>] dump_stack+0x74/0x8e
[ 388.058424] [<
ffffffe0006f0d52>] panic+0xfc/0x2b2
[ 388.058430] [<
ffffffe0006f0acc>] print_trace_address+0x0/0x24
[ 388.058436] [<
ffffffe0002d814e>] vsnprintf+0x2ae/0x3f0
[ 388.058956] SMP: stopping secondary CPUs
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jeff Xie [Sun, 20 Jun 2021 12:01:51 +0000 (20:01 +0800)]
riscv: ptrace: add argn syntax
This enables ftrace kprobe events to access kernel function
arguments via $argN syntax.
Signed-off-by: Jeff Xie <huan.xie@suse.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Nanyong Sun [Thu, 17 Jun 2021 09:58:31 +0000 (17:58 +0800)]
riscv: mm: fix build errors caused by mk_pmd()
With "riscv: mm: add THP support on 64-bit", mk_pmd() function
introduce build errors,
1.build with CONFIG_ARCH_RV32I=y:
arch/riscv/include/asm/pgtable.h: In function 'mk_pmd':
arch/riscv/include/asm/pgtable.h:513:9: error: implicit declaration of function 'pfn_pmd';
did you mean 'pfn_pgd'? [-Werror=implicit-function-declaration]
2.build with CONFIG_SPARSEMEM=y && CONFIG_SPARSEMEM_VMEMMAP=n
arch/riscv/include/asm/pgtable.h: In function 'mk_pmd':
include/asm-generic/memory_model.h:64:14: error: implicit declaration of function 'page_to_section';
did you mean 'present_section'? [-Werror=implicit-function-declaration]
Move the definition of mk_pmd to pgtable-64.h to fix the first error.
Use macro definition instead of inline function for mk_pmd
to fix the second problem. It is similar to the mk_pte macro.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Alexandre Ghiti [Thu, 17 Jun 2021 13:53:07 +0000 (15:53 +0200)]
riscv: Introduce structure that group all variables regarding kernel mapping
We have a lot of variables that are used to hold kernel mapping addresses,
offsets between physical and virtual mappings and some others used for XIP
kernels: they are all defined at different places in mm/init.c, so group
them into a single structure with, for some of them, more explicit and concise
names.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Palmer Dabbelt [Thu, 1 Jul 2021 04:50:32 +0000 (21:50 -0700)]
Merge branch 'riscv-wx-mappings' into for-next
This contains both the short-term fix for the W+X boot mappings and the
larger cleanup.
* riscv-wx-mappings:
riscv: Map the kernel with correct permissions the first time
riscv: Introduce set_kernel_memory helper
riscv: Simplify xip and !xip kernel address conversion macros
riscv: Remove CONFIG_PHYS_RAM_BASE_FIXED
riscv: mm: Fix W+X mappings at boot
Alexandre Ghiti [Thu, 24 Jun 2021 12:00:41 +0000 (14:00 +0200)]
riscv: Map the kernel with correct permissions the first time
For 64-bit kernels, we map all the kernel with write and execute
permissions and afterwards remove writability from text and executability
from data.
For 32-bit kernels, the kernel mapping resides in the linear mapping, so we
map all the linear mapping as writable and executable and afterwards we
remove those properties for unused memory and kernel mapping as
described above.
Change this behavior to directly map the kernel with correct permissions
and avoid going through the whole mapping to fix the permissions.
At the same time, this fixes an issue introduced by commit
2bfc6cd81bd1
("riscv: Move kernel mapping outside of linear mapping") as reported
here https://github.com/starfive-tech/linux/issues/17.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Alexandre Ghiti [Thu, 24 Jun 2021 12:00:40 +0000 (14:00 +0200)]
riscv: Introduce set_kernel_memory helper
This helper should be used for setting permissions to the kernel
mapping as it takes pointers as arguments and then avoids explicit cast
to unsigned long needed for the set_memory_* API.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Liu Shixin [Tue, 15 Jun 2021 03:07:34 +0000 (11:07 +0800)]
riscv: Enable KFENCE for riscv64
Add architecture specific implementation details for KFENCE and enable
KFENCE for the riscv64 architecture. In particular, this implements the
required interface in <asm/kfence.h>.
KFENCE requires that attributes for pages from its memory pool can
individually be set. Therefore, force the kfence pool to be mapped at
page granularity.
Testing this patch using the testcases in kfence_test.c and all passed.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Palmer Dabbelt [Sat, 12 Jun 2021 03:40:42 +0000 (20:40 -0700)]
RISC-V: Use asm-generic for {in,out}{bwlq}
The asm-generic implementation is functionally identical to the RISC-V
version.
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Guo Ren [Sun, 6 Jun 2021 15:20:50 +0000 (17:20 +0200)]
riscv: add ASID-based tlbflushing methods
Implement optimized version of the tlb flushing routines for systems
using ASIDs. These are behind the use_asid_allocator static branch to
not affect existing systems not using ASIDs.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
[hch: rebased on top of previous cleanups, use the same algorithm as
the non-ASID based code for local vs global flushes, keep functions
as local as possible]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Christoph Hellwig [Sun, 6 Jun 2021 15:20:49 +0000 (17:20 +0200)]
riscv: pass the mm_struct to __sbi_tlb_flush_range
Move the call mm_cpumask from the callers into __sbi_tlb_flush_range to
reduce a bit of duplicate code and prepare for future changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Kefeng Wang [Wed, 2 Jun 2021 08:55:17 +0000 (16:55 +0800)]
riscv: Add mem kernel parameter support
The memblock_enforce_memory_limit() could change the memblock
range, so move the dram_end assignment after it in bootmem_init(),
then support mem= cmdline.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Alexandre Ghiti [Fri, 4 Jun 2021 11:49:48 +0000 (13:49 +0200)]
riscv: Simplify xip and !xip kernel address conversion macros
To simplify the kernel address conversion code, make the same definition of
kernel_mapping_pa_to_va and kernel_mapping_va_to_pa compatible for both xip
and !xip kernel by defining XIP_OFFSET to 0 in !xip kernel.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Alexandre Ghiti [Fri, 4 Jun 2021 11:49:47 +0000 (13:49 +0200)]
riscv: Remove CONFIG_PHYS_RAM_BASE_FIXED
Make the physical RAM base address available for all kernels, not only
XIP kernels as it will allow to simplify address conversions macros.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Kefeng Wang [Wed, 2 Jun 2021 08:55:16 +0000 (16:55 +0800)]
riscv: Only initialize swiotlb when necessary
The SWIOTLB buffer is not needed unless the physical address space
is beyond the limit of dma, only initialize swiotlb when swiotlb_force
is true or not all system memory is DMA-able.
Also move the swiotlb_init() into mem_init().
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Vitaly Wool [Tue, 8 Jun 2021 08:21:27 +0000 (10:21 +0200)]
riscv: fix typo in init.c
Commit
010623568222 introduced a typo in "__initdata" spelling
which led to build breakage for XIP. Fix that.
Fixes:
010623568222 ("riscv: mm: init: Consolidate vars, functions")
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Guo Ren [Sun, 30 May 2021 04:53:27 +0000 (04:53 +0000)]
riscv: Cleanup unused functions
These functions haven't been used, so just remove them. The patch
has been tested with riscv.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Kefeng Wang [Sat, 29 May 2021 11:15:34 +0000 (19:15 +0800)]
riscv: mm: Use better bitmap_zalloc()
Use better bitmap_zalloc() to allocate bitmap.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Bixuan Cui [Sat, 29 May 2021 08:06:57 +0000 (16:06 +0800)]
riscv: fix build error when CONFIG_SMP is disabled
Fix build error when disable CONFIG_SMP:
mm/pgtable-generic.o: In function `.L19':
pgtable-generic.c:(.text+0x42): undefined reference to `flush_pmd_tlb_range'
mm/pgtable-generic.o: In function `pmdp_huge_clear_flush':
pgtable-generic.c:(.text+0x6c): undefined reference to `flush_pmd_tlb_range'
mm/pgtable-generic.o: In function `pmdp_invalidate':
pgtable-generic.c:(.text+0x162): undefined reference to `flush_pmd_tlb_range'
Fixes:
e88b333142e4 ("riscv: mm: add THP support on 64-bit")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Acked-by: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Sun, 16 May 2021 09:00:38 +0000 (17:00 +0800)]
riscv: mm: Fix W+X mappings at boot
When the kernel mapping was moved the last 2GB of the address space,
(__va(PFN_PHYS(max_low_pfn))) is much smaller than the .data section
start address, the last set_memory_nx() in protect_kernel_text_data()
will fail, thus the .data section is still mapped as W+X. This results
in below W+X mapping waring at boot. Fix it by passing the correct
.data section page num to the set_memory_nx().
[ 0.396516] ------------[ cut here ]------------
[ 0.396889] riscv/mm: Found insecure W+X mapping at address (____ptrval____)/0xffffffff80c00000
[ 0.398347] WARNING: CPU: 0 PID: 1 at arch/riscv/mm/ptdump.c:258 note_page+0x244/0x24a
[ 0.398964] Modules linked in:
[ 0.399459] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1+ #14
[ 0.400003] Hardware name: riscv-virtio,qemu (DT)
[ 0.400591] epc : note_page+0x244/0x24a
[ 0.401368] ra : note_page+0x244/0x24a
[ 0.401772] epc :
ffffffff80007c86 ra :
ffffffff80007c86 sp :
ffffffe000e7bc30
[ 0.402304] gp :
ffffffff80caae88 tp :
ffffffe000e70000 t0 :
ffffffff80cb80cf
[ 0.402800] t1 :
ffffffff80cb80c0 t2 :
0000000000000000 s0 :
ffffffe000e7bc80
[ 0.403310] s1 :
ffffffe000e7bde8 a0 :
0000000000000053 a1 :
ffffffff80c83ff0
[ 0.403805] a2 :
0000000000000010 a3 :
0000000000000000 a4 :
6c7e7a5137233100
[ 0.404298] a5 :
6c7e7a5137233100 a6 :
0000000000000030 a7 :
ffffffffffffffff
[ 0.404849] s2 :
ffffffff80e00000 s3 :
0000000040000000 s4 :
0000000000000000
[ 0.405393] s5 :
0000000000000000 s6 :
0000000000000003 s7 :
ffffffe000e7bd48
[ 0.405935] s8 :
ffffffff81000000 s9 :
ffffffffc0000000 s10:
ffffffe000e7bd48
[ 0.406476] s11:
0000000000001000 t3 :
0000000000000072 t4 :
ffffffffffffffff
[ 0.407016] t5 :
0000000000000002 t6 :
ffffffe000e7b978
[ 0.407435] status:
0000000000000120 badaddr:
0000000000000000 cause:
0000000000000003
[ 0.408052] Call Trace:
[ 0.408343] [<
ffffffff80007c86>] note_page+0x244/0x24a
[ 0.408855] [<
ffffffff8010c5a6>] ptdump_hole+0x14/0x1e
[ 0.409263] [<
ffffffff800f65c6>] walk_pgd_range+0x2a0/0x376
[ 0.409690] [<
ffffffff800f6828>] walk_page_range_novma+0x4e/0x6e
[ 0.410146] [<
ffffffff8010c5f8>] ptdump_walk_pgd+0x48/0x78
[ 0.410570] [<
ffffffff80007d66>] ptdump_check_wx+0xb4/0xf8
[ 0.410990] [<
ffffffff80006738>] mark_rodata_ro+0x26/0x2e
[ 0.411407] [<
ffffffff8031961e>] kernel_init+0x44/0x108
[ 0.411814] [<
ffffffff80002312>] ret_from_exception+0x0/0xc
[ 0.412309] ---[ end trace
7ec3459f2547ea83 ]---
[ 0.413141] Checked W+X mappings: failed, 512 W+X pages found
Fixes:
2bfc6cd81bd17e43 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Guo Ren [Wed, 26 May 2021 05:49:20 +0000 (05:49 +0000)]
riscv: Use global mappings for kernel pages
We map kernel pages into all addresses spages, so they can be marked as
global. This allows hardware to avoid flushing the kernel mappings when
moving between address spaces.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[Palmer: commit text]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Randy Dunlap [Tue, 25 May 2021 00:13:02 +0000 (17:13 -0700)]
riscv: TRANSPARENT_HUGEPAGE: depends on MMU
Fix a Kconfig warning and many build errors:
WARNING: unmet direct dependencies detected for COMPACTION
Depends on [n]: MMU [=n]
Selected by [y]:
- TRANSPARENT_HUGEPAGE [=y] && HAVE_ARCH_TRANSPARENT_HUGEPAGE [=y]
and the subseqent thousands of build errors and warnings.
Fixes:
e88b333142e4 ("riscv: mm: add THP support on 64-bit")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Sun, 16 May 2021 13:15:56 +0000 (21:15 +0800)]
riscv: mm: init: Consolidate vars, functions
Consolidate the following items in init.c
Staticize global vars as much as possible;
Add __initdata mark if the global var isn't needed after init
Add __init mark if the func isn't needed after init
Add __ro_after_init if the global var is read only after init
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Sun, 16 May 2021 12:59:42 +0000 (20:59 +0800)]
riscv: Add __init section marker to some functions again
These functions are not needed after booting, so mark them as __init
to move them to the __init section.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Sun, 18 Apr 2021 16:29:19 +0000 (00:29 +0800)]
riscv: kprobes: Remove redundant kprobe_step_ctx
Inspired by commit
ba090f9cafd5 ("arm64: kprobes: Remove redundant
kprobe_step_ctx"), the ss_pending and match_addr of kprobe_step_ctx
are redundant because those can be replaced by KPROBE_HIT_SS and
&cur_kprobe->ainsn.api.insn[0] + GET_INSN_LENGTH(cur->opcode)
respectively.
Remove the kprobe_step_ctx to simplify the code.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Wed, 12 May 2021 14:55:45 +0000 (22:55 +0800)]
riscv: Turn has_fpu into a static key if FPU=y
The has_fpu check sits at hot code path: switch_to(). Currently, has_fpu
is a bool variable if FPU=y, switch_to() checks it each time, we can
optimize out this check by turning the has_fpu into a static key.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Tue, 11 May 2021 17:42:31 +0000 (01:42 +0800)]
riscv: Optimize switch_mm by passing "cpu" to flush_icache_deferred()
Directly passing the cpu to flush_icache_deferred() rather than calling
smp_processor_id() again.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[Palmer: drop the QEMU performance numbers, and update the comment]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Kefeng Wang [Fri, 14 May 2021 09:49:08 +0000 (17:49 +0800)]
riscv: mm: Drop redundant _sdata and _edata declaration
The _sdata/_edata is already in sections.h, drop redundant
declaration.
Also move _xiprom/_exiprom declarations at the beginning of
the file, cleanup one CONFIG_XIP_KERNEL.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Kefeng Wang [Mon, 10 May 2021 11:42:22 +0000 (19:42 +0800)]
riscv: Move setup_bootmem into paging_init
Make setup_bootmem() static.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Stanislaw Kardach [Mon, 12 Apr 2021 11:10:12 +0000 (13:10 +0200)]
riscv: enable generic PCI resource mapping
Enable the PCI resource mapping on RISC-V using the generic framework.
This allows userspace applications to mmap PCI resources using
/sys/devices/pci*/*/resource* interface.
The mmap has been tested with Intel x520-DA2 NIC card on a HiFive
Unmatched board (SiFive FU740 SoC).
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Fri, 7 May 2021 14:19:59 +0000 (22:19 +0800)]
riscv: mm: Remove setup_zero_page()
The empty_zero_page sits at .bss..page_aligned section, so will be
cleared to zero during clearing bss, we don't need to clear it again.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Jisheng Zhang [Fri, 16 Apr 2021 16:37:22 +0000 (00:37 +0800)]
riscv: mremap speedup - enable HAVE_MOVE_PUD and HAVE_MOVE_PMD
HAVE_MOVE_PUD enables remapping pages at the PUD level if both the source
and destination addresses are PUD-aligned.
HAVE_MOVE_PMD does similar speedup on the PMD level.
With HAVE_MOVE_PUD enabled, there is about a 143x improvement on qemu
With HAVE_MOVE_PMD enabled, there is about a 5x improvement on qemu
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Nanyong Sun [Fri, 30 Apr 2021 08:28:50 +0000 (16:28 +0800)]
riscv: mm: add THP support on 64-bit
Bring Transparent HugePage support to riscv. A
transparent huge page is always represented as a pmd.
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Nanyong Sun [Fri, 30 Apr 2021 08:28:49 +0000 (16:28 +0800)]
riscv: mm: add param stride for __sbi_tlb_flush_range
Add a parameter: stride for __sbi_tlb_flush_range(),
represent the page stride between the address of start and end.
Normally, the stride is PAGE_SIZE, and when flush huge page
address, the stride can be the huge page size such as:PMD_SIZE,
then it only need to flush one tlb entry if the address range
within PMD_SIZE.
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Nanyong Sun [Fri, 30 Apr 2021 08:28:48 +0000 (16:28 +0800)]
riscv: mm: make pmd_bad() check leaf condition
In the definition in Documentation/vm/arch_pgtable_helpers.rst,
pmd_bad() means test a non-table mapped PMD, so it should also
return true when it is a leaf page.
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Nanyong Sun [Fri, 30 Apr 2021 08:28:47 +0000 (16:28 +0800)]
riscv: mm: add _PAGE_LEAF macro
In riscv, a page table entry is leaf when any bit of read, write,
or execute bit is set. So add a macro:_PAGE_LEAF instead of
(_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC), which is frequently used
to determine if it is a leaf page. This make code easier to read,
without any functional change.
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Linus Torvalds [Sun, 9 May 2021 21:17:44 +0000 (14:17 -0700)]
Linux 5.13-rc1
Linus Torvalds [Sun, 9 May 2021 21:03:33 +0000 (14:03 -0700)]
fbmem: fix horribly incorrect placement of __maybe_unused
Commit
b9d79e4ca4ff ("fbmem: Mark proc_fb_seq_ops as __maybe_unused")
places the '__maybe_unused' in an entirely incorrect location between
the "struct" keyword and the structure name.
It's a wonder that gcc accepts that silently, but clang quite reasonably
warns about it:
drivers/video/fbdev/core/fbmem.c:736:21: warning: attribute declaration must precede definition [-Wignored-attributes]
static const struct __maybe_unused seq_operations proc_fb_seq_ops = {
^
Fix it.
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 9 May 2021 20:42:39 +0000 (13:42 -0700)]
Merge tag 'drm-next-2021-05-10' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Bit later than usual, I queued them all up on Friday then promptly
forgot to write the pull request email. This is mainly amdgpu fixes,
with some radeon/msm/fbdev and one i915 gvt fix thrown in.
amdgpu:
- MPO hang workaround
- Fix for concurrent VM flushes on vega/navi
- dcefclk is not adjustable on navi1x and newer
- MST HPD debugfs fix
- Suspend/resumes fixes
- Register VGA clients late in case driver fails to load
- Fix GEM leak in user framebuffer create
- Add support for polaris12 with 32 bit memory interface
- Fix duplicate cursor issue when using overlay
- Fix corruption with tiled surfaces on VCN3
- Add BO size and stride check to fix BO size verification
radeon:
- Fix off-by-one in power state parsing
- Fix possible memory leak in power state parsing
msm:
- NULL ptr dereference fix
fbdev:
- procfs disabled warning fix
i915:
- gvt: Fix a possible division by zero in vgpu display rate
calculation"
* tag 'drm-next-2021-05-10' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: Use device specific BO size & stride check.
drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.
drm/amd/pm: initialize variable
drm/radeon: Avoid power table parsing memory leaks
drm/radeon: Fix off-by-one power_state index heap overwrite
drm/amd/display: Fix two cursor duplication when using overlay
drm/amdgpu: add new MC firmware for Polaris12 32bit ASIC
fbmem: Mark proc_fb_seq_ops as __maybe_unused
drm/msm/dpu: Delete bonkers code
drm/i915/gvt: Prevent divided by zero when calculating refresh rate
amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create
drm/amdgpu: Register VGA clients after init can no longer fail
drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown
drm/amdgpu: fix r initial values
drm/amd/display: fix wrong statement in mst hpd debugfs
amdgpu/pm: set pp_dpm_dcefclk to readonly on NAVI10 and newer gpus
amdgpu/pm: Prevent force of DCEFCLK on NAVI10 and SIENNA_CICHLID
drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2
drm/amd/display: Reject non-zero src_y and src_x for video planes
Linus Torvalds [Sun, 9 May 2021 20:25:14 +0000 (13:25 -0700)]
Merge tag 'block-5.13-2021-05-09' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Turns out the bio max size change still has issues, so let's get it
reverted for 5.13-rc1. We'll shake out the issues there and defer it
to 5.14 instead"
* tag 'block-5.13-2021-05-09' of git://git.kernel.dk/linux-block:
Revert "bio: limit bio max size"
Linus Torvalds [Sun, 9 May 2021 20:19:29 +0000 (13:19 -0700)]
Merge tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three small SMB3 chmultichannel related changesets (also for stable)
from the SMB3 test event this week.
The other fixes are still in review/testing"
* tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6:
smb3: if max_channels set to more than one channel request multichannel
smb3: do not attempt multichannel to server which does not support it
smb3: when mounting with multichannel include it in requested capabilities
Linus Torvalds [Sun, 9 May 2021 20:14:34 +0000 (13:14 -0700)]
Merge tag 'sched-urgent-2021-05-09' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"A set of scheduler updates:
- Prevent PSI state corruption when schedule() races with cgroup
move.
A recent commit combined two PSI callbacks to reduce the number of
cgroup tree updates, but missed that schedule() can drop rq::lock
for load balancing, which opens the race window for
cgroup_move_task() which then observes half updated state.
The fix is to solely use task::ps_flags instead of looking at the
potentially mismatching scheduler state
- Prevent an out-of-bounds access in uclamp caused bu a rounding
division which can lead to an off-by-one error exceeding the
buckets array size.
- Prevent unfairness caused by missing load decay when a task is
attached to a cfs runqueue.
The old load of the task was attached to the runqueue and never
removed. Fix it by enforcing the load update through the hierarchy
for unthrottled run queue instances.
- A documentation fix fot the 'sched_verbose' command line option"
* tag 'sched-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix unfairness caused by missing load decay
sched: Fix out-of-bound access in uclamp
psi: Fix psi state corruption when schedule() races with cgroup move
sched,doc: sched_debug_verbose cmdline should be sched_verbose
Linus Torvalds [Sun, 9 May 2021 20:07:03 +0000 (13:07 -0700)]
Merge tag 'locking-urgent-2021-05-09' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"A set of locking related fixes and updates:
- Two fixes for the futex syscall related to the timeout handling.
FUTEX_LOCK_PI does not support the FUTEX_CLOCK_REALTIME bit and
because it's not set the time namespace adjustment for clock
MONOTONIC is applied wrongly.
FUTEX_WAIT cannot support the FUTEX_CLOCK_REALTIME bit because its
always a relative timeout.
- Cleanups in the futex syscall entry points which became obvious
when the two timeout handling bugs were fixed.
- Cleanup of queued_write_lock_slowpath() as suggested by Linus
- Fixup of the smp_call_function_single_async() prototype"
* tag 'locking-urgent-2021-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Make syscall entry points less convoluted
futex: Get rid of the val2 conditional dance
futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI
Revert
337f13046ff0 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
locking/qrwlock: Cleanup queued_write_lock_slowpath()
smp: Fix smp_call_function_single_async prototype
Linus Torvalds [Sun, 9 May 2021 20:00:26 +0000 (13:00 -0700)]
Merge tag 'perf_urgent_for_v5.13_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 perf fix from Borislav Petkov:
"Handle power-gating of AMD IOMMU perf counters properly when they are
used"
* tag 'perf_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
Linus Torvalds [Sun, 9 May 2021 19:52:25 +0000 (12:52 -0700)]
Merge tag 'x86_urgent_for_v5.13_rc1' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"A bunch of things accumulated for x86 in the last two weeks:
- Fix guest vtime accounting so that ticks happening while the guest
is running can also be accounted to it. Along with a consolidation
to the guest-specific context tracking helpers.
- Provide for the host NMI handler running after a VMX VMEXIT to be
able to run on the kernel stack correctly.
- Initialize MSR_TSC_AUX when RDPID is supported and not RDTSCP (virt
relevant - real hw supports both)
- A code generation improvement to TASK_SIZE_MAX through the use of
alternatives
- The usual misc and related cleanups and improvements"
* tag 'x86_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM: x86: Consolidate guest enter/exit logic to common helpers
context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain
context_tracking: Consolidate guest enter/exit wrappers
sched/vtime: Move guest enter/exit vtime accounting to vtime.h
sched/vtime: Move vtime accounting external declarations above inlines
KVM: x86: Defer vtime accounting 'til after IRQ handling
context_tracking: Move guest exit vtime accounting to separate helpers
context_tracking: Move guest exit context tracking to separate helpers
KVM/VMX: Invoke NMI non-IST entry instead of IST entry
x86/cpu: Remove write_tsc() and write_rdtscp_aux() wrappers
x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
x86/resctrl: Fix init const confusion
x86: Delete UD0, UD1 traces
x86/smpboot: Remove duplicate includes
x86/cpu: Use alternative to generate the TASK_SIZE_MAX constant
Jens Axboe [Sun, 9 May 2021 03:49:48 +0000 (21:49 -0600)]
Revert "bio: limit bio max size"
This reverts commit
cd2c7545ae1beac3b6aae033c7f31193b3255946.
Alex reports that the commit causes corruption with LUKS on ext4. Revert
it for now so that this can be investigated properly.
Link: https://lore.kernel.org/linux-block/1620493841.bxdq8r5haw.none@localhost/
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sat, 8 May 2021 18:52:37 +0000 (11:52 -0700)]
Merge tag 'riscv-for-linus-5.13-mw1' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid over-allocating the kernel's mapping on !MMU systems,
which could lead to up to 2MiB of lost memory
- The SiFive address extension errata only manifest on rv64, they are
now disabled on rv32 where they are unnecessary
- A pair of late-landing cleanups
* tag 'riscv-for-linus-5.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: remove unused handle_exception symbol
riscv: Consistify protect_kernel_linear_mapping_text_rodata() use
riscv: enable SiFive errata CIP-453 and CIP-1200 Kconfig only if CONFIG_64BIT=y
riscv: Only extend kernel reservation if mapped read-only
Linus Torvalds [Sat, 8 May 2021 18:30:22 +0000 (11:30 -0700)]
drm/i915/display: fix compiler warning about array overrun
intel_dp_check_mst_status() uses a 14-byte array to read the DPRX Event
Status Indicator data, but then passes that buffer at offset 10 off as
an argument to drm_dp_channel_eq_ok().
End result: there are only 4 bytes remaining of the buffer, yet
drm_dp_channel_eq_ok() wants a 6-byte buffer. gcc-11 correctly warns
about this case:
drivers/gpu/drm/i915/display/intel_dp.c: In function ‘intel_dp_check_mst_status’:
drivers/gpu/drm/i915/display/intel_dp.c:3491:22: warning: ‘drm_dp_channel_eq_ok’ reading 6 bytes from a region of size 4 [-Wstringop-overread]
3491 | !drm_dp_channel_eq_ok(&esi[10], intel_dp->lane_count)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/display/intel_dp.c:3491:22: note: referencing argument 1 of type ‘const u8 *’ {aka ‘const unsigned char *’}
In file included from drivers/gpu/drm/i915/display/intel_dp.c:38:
include/drm/drm_dp_helper.h:1466:6: note: in a call to function ‘drm_dp_channel_eq_ok’
1466 | bool drm_dp_channel_eq_ok(const u8 link_status[DP_LINK_STATUS_SIZE],
| ^~~~~~~~~~~~~~~~~~~~
6:14 elapsed
This commit just extends the original array by 2 zero-initialized bytes,
avoiding the warning.
There may be some underlying bug in here that caused this confusion, but
this is at least no worse than the existing situation that could use
random data off the stack.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 8 May 2021 17:44:36 +0000 (10:44 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is a set of minor fixes in various drivers (qla2xxx, ufs,
scsi_debug, lpfc) one doc fix and a fairly large update to the fnic
driver to remove the open coded iteration functions in favour of the
scsi provided ones"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Use scsi_host_busy_iter() to traverse commands
scsi: fnic: Kill 'exclude_id' argument to fnic_cleanup_io()
scsi: scsi_debug: Fix cmd_per_lun, set to max_queue
scsi: ufs: core: Narrow down fast path in system suspend path
scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend
scsi: ufs: core: Do not put UFS power into LPM if link is broken
scsi: qla2xxx: Prevent PRLI in target mode
scsi: qla2xxx: Add marginal path handling support
scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found
scsi: ufs: core: Fix a typo in ufs-sysfs.c
scsi: lpfc: Fix bad memory access during VPD DUMP mailbox command
scsi: lpfc: Fix DMA virtual address ptr assignment in bsg
scsi: lpfc: Fix illegal memory access on Abort IOCBs
scsi: blk-mq: Fix build warning when making htmldocs
Linus Torvalds [Sat, 8 May 2021 17:00:11 +0000 (10:00 -0700)]
Merge tag 'kbuild-v5.13-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- Convert sh and sparc to use generic shell scripts to generate the
syscall headers
- refactor .gitignore files
- Update kernel/config_data.gz only when the content of the .config
is really changed, which avoids the unneeded re-link of vmlinux
- move "remove stale files" workarounds to scripts/remove-stale-files
- suppress unused-but-set-variable warnings by default for Clang
as well
- fix locale setting LANG=C to LC_ALL=C
- improve 'make distclean'
- always keep intermediate objects from scripts/link-vmlinux.sh
- move IF_ENABLED out of <linux/kconfig.h> to make it self-contained
- misc cleanups
* tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h>
kbuild: Don't remove link-vmlinux temporary files on exit/signal
kbuild: remove the unneeded comments for external module builds
kbuild: make distclean remove tag files in sub-directories
kbuild: make distclean work against $(objtree) instead of $(srctree)
kbuild: refactor modname-multi by using suffix-search
kbuild: refactor fdtoverlay rule
kbuild: parameterize the .o part of suffix-search
arch: use cross_compiling to check whether it is a cross build or not
kbuild: remove ARCH=sh64 support from top Makefile
.gitignore: prefix local generated files with a slash
kbuild: replace LANG=C with LC_ALL=C
Makefile: Move -Wno-unused-but-set-variable out of GCC only block
kbuild: add a script to remove stale generated files
kbuild: update config_data.gz only when the content of .config is changed
.gitignore: ignore only top-level modules.builtin
.gitignore: move tags and TAGS close to other tag files
kernel/.gitgnore: remove stale timeconst.h and hz.bc
usr/include: refactor .gitignore
genksyms: fix stale comment
...
Steve French [Sat, 8 May 2021 00:33:51 +0000 (19:33 -0500)]
smb3: if max_channels set to more than one channel request multichannel
Mounting with "multichannel" is obviously implied if user requested
more than one channel on mount (ie mount parm max_channels>1).
Currently both have to be specified. Fix that so that if max_channels
is greater than 1 on mount, enable multichannel rather than silently
falling back to non-multichannel.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
Cc: <stable@vger.kernel.org> # v5.11+
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Steve French [Sat, 8 May 2021 01:00:41 +0000 (20:00 -0500)]
smb3: do not attempt multichannel to server which does not support it
We were ignoring CAP_MULTI_CHANNEL in the server response - if the
server doesn't support multichannel we should not be attempting it.
See MS-SMB2 section 3.2.5.2
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Sat, 8 May 2021 15:49:54 +0000 (08:49 -0700)]
Merge tag 'powerpc-5.13-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc updates and fixes from Michael Ellerman:
"A bit of a mixture of things, tying up some loose ends.
There's the removal of the nvlink code, which dependend on a commit in
the vfio tree. Then the enablement of huge vmalloc which was in next
for a few weeks but got dropped due to conflicts. And there's also a
few fixes.
Summary:
- Remove the nvlink support now that it's only user has been removed.
- Enable huge vmalloc mappings for Radix MMU (P9).
- Fix KVM conversion to gfn-based MMU notifier callbacks.
- Fix a kexec/kdump crash with hot plugged CPUs.
- Fix boot failure on 32-bit with CONFIG_STACKPROTECTOR.
- Restore alphabetic order of the selects under CONFIG_PPC.
Thanks to: Christophe Leroy, Christoph Hellwig, Nicholas Piggin,
Sandipan Das, and Sourabh Jain"
* tag 'powerpc-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks
powerpc/kconfig: Restore alphabetic order of the selects under CONFIG_PPC
powerpc/32: Fix boot failure with CONFIG_STACKPROTECTOR
powerpc/powernv/memtrace: Fix dcache flushing
powerpc/kexec_file: Use current CPU info while setting up FDT
powerpc/64s/radix: Enable huge vmalloc mappings
powerpc/powernv: remove the nvlink support
Steve French [Fri, 7 May 2021 23:24:11 +0000 (18:24 -0500)]
smb3: when mounting with multichannel include it in requested capabilities
In the SMB3/SMB3.1.1 negotiate protocol request, we are supposed to
advertise CAP_MULTICHANNEL capability when establishing multiple
channels has been requested by the user doing the mount. See MS-SMB2
sections 2.2.3 and 3.2.5.2
Without setting it there is some risk that multichannel could fail
if the server interpreted the field strictly.
Reviewed-By: Tom Talpey <tom@talpey.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Sat, 8 May 2021 15:31:46 +0000 (08:31 -0700)]
Merge tag 'net-5.13-rc1' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.13-rc1, including fixes from bpf, can and
netfilter trees. Self-contained fixes, nothing risky.
Current release - new code bugs:
- dsa: ksz: fix a few bugs found by static-checker in the new driver
- stmmac: fix frame preemption handshake not triggering after
interface restart
Previous releases - regressions:
- make nla_strcmp handle more then one trailing null character
- fix stack OOB reads while fragmenting IPv4 packets in openvswitch
and net/sched
- sctp: do asoc update earlier in sctp_sf_do_dupcook_a
- sctp: delay auto_asconf init until binding the first addr
- stmmac: clear receive all(RA) bit when promiscuous mode is off
- can: mcp251x: fix resume from sleep before interface was brought up
Previous releases - always broken:
- bpf: fix leakage of uninitialized bpf stack under speculation
- bpf: fix masking negation logic upon negative dst register
- netfilter: don't assume that skb_header_pointer() will never fail
- only allow init netns to set default tcp cong to a restricted algo
- xsk: fix xp_aligned_validate_desc() when len == chunk_size to avoid
false positive errors
- ethtool: fix missing NLM_F_MULTI flag when dumping
- can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
- sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
- bridge: fix NULL-deref caused by a races between assigning
rx_handler_data and setting the IFF_BRIDGE_PORT bit
Latecomer:
- seg6: add counters support for SRv6 Behaviors"
* tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
atm: firestream: Use fallthrough pseudo-keyword
net: stmmac: Do not enable RX FIFO overflow interrupts
mptcp: fix splat when closing unaccepted socket
i40e: Remove LLDP frame filters
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
i40e: fix the restart auto-negotiation after FEC modified
i40e: Fix use-after-free in i40e_client_subtask()
i40e: fix broken XDP support
netfilter: nftables: avoid potential overflows on 32bit arches
netfilter: nftables: avoid overflows in nft_hash_buckets()
tcp: Specify cmsgbuf is user pointer for receive zerocopy.
mlxsw: spectrum_mr: Update egress RIF list before route's action
net: ipa: fix inter-EE IRQ register definitions
can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
can: mcp251x: fix resume from sleep before interface was brought up
can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe
netfilter: nftables: Fix a memleak from userdata error path in new objects
netfilter: remove BUG_ON() after skb_header_pointer()
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
...
Masahiro Yamada [Wed, 5 May 2021 17:45:15 +0000 (02:45 +0900)]
linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h>
<linux/kconfig.h> is included from all the kernel-space source files,
including C, assembly, linker scripts. It is intended to contain a
minimal set of macros to evaluate CONFIG options.
IF_ENABLED() is an intruder here because (x ? y : z) is C code, which
should not be included from assembly files or linker scripts.
Also, <linux/kconfig.h> is no longer self-contained because NULL is
defined in <linux/stddef.h>.
Move IF_ENABLED() out to <linux/kernel.h> as PTR_IF(). PTF_IF()
takes the general boolean expression instead of a CONFIG option
so that it fits better in <linux/kernel.h>.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Michael Ellerman [Sat, 8 May 2021 11:12:55 +0000 (21:12 +1000)]
Merge branch 'master' into next
Merge master back into next, this allows us to resolve some conflicts in
arch/powerpc/Kconfig, and also re-sort the symbols under config PPC so
that they are in alphabetical order again.
Jakub Kicinski [Fri, 7 May 2021 23:10:12 +0000 (16:10 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Add SECMARK revision 1 to fix incorrect layout that prevents
from remove rule with this target, from Phil Sutter.
2) Fix pernet exit path spat in arptables, from Florian Westphal.
3) Missing rcu_read_unlock() for unknown nfnetlink callbacks,
reported by syzbot, from Eric Dumazet.
4) Missing check for skb_header_pointer() NULL pointer in
nfnetlink_osf.
5) Remove BUG_ON() after skb_header_pointer() from packet path
in several conntrack helper and the TCP tracker.
6) Fix memleak in the new object error path of userdata.
7) Avoid overflows in nft_hash_buckets(), reported by syzbot,
also from Eric.
8) Avoid overflows in 32bit arches, from Eric.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nftables: avoid potential overflows on 32bit arches
netfilter: nftables: avoid overflows in nft_hash_buckets()
netfilter: nftables: Fix a memleak from userdata error path in new objects
netfilter: remove BUG_ON() after skb_header_pointer()
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
netfilter: nfnetlink: add a missing rcu_read_unlock()
netfilter: arptables: use pernet ops struct during unregister
netfilter: xt_SECMARK: add new revision to fix structure layout
====================
Link: https://lore.kernel.org/r/20210507174739.1850-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 7 May 2021 23:04:22 +0000 (16:04 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-queue
Nguyen, Anthony L says:
====================
Intel Wired LAN Driver Updates 2021-05-07
This series contains updates to i40e driver only.
Magnus fixes XDP by adding and correcting checks that were caused by a
previous commit which introduced a new variable but did not account for
it in all paths.
Yunjian Wang adds a return in an error path to prevent reading a freed
pointer.
Jaroslaw forces link reset when changing FEC so that changes take
affect.
Mateusz fixes PHY types for 2.5G and 5G as there is a differentiation on
PHY identifiers based on operation.
Arkadiusz removes filtering of LLDP frames for software DCB as this is
preventing them from being properly transmitted.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Remove LLDP frame filters
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
i40e: fix the restart auto-negotiation after FEC modified
i40e: Fix use-after-free in i40e_client_subtask()
i40e: fix broken XDP support
====================
Link: https://lore.kernel.org/r/20210507164151.2878147-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Ming Chen [Fri, 7 May 2021 12:38:43 +0000 (20:38 +0800)]
atm: firestream: Use fallthrough pseudo-keyword
Add pseudo-keyword macro fallthrough[1]
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Wei Ming Chen <jj251510319013@gmail.com>
Link: https://lore.kernel.org/r/20210507123843.10602-1-jj251510319013@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yannick Vignon [Thu, 6 May 2021 14:33:12 +0000 (16:33 +0200)]
net: stmmac: Do not enable RX FIFO overflow interrupts
The RX FIFO overflows when the system is not able to process all received
packets and they start accumulating (first in the DMA queue in memory,
then in the FIFO). An interrupt is then raised for each overflowing packet
and handled in stmmac_interrupt(). This is counter-productive, since it
brings the system (or more likely, one CPU core) to its knees to process
the FIFO overflow interrupts.
stmmac_interrupt() handles overflow interrupts by writing the rx tail ptr
into the corresponding hardware register (according to the MAC spec, this
has the effect of restarting the MAC DMA). However, without freeing any rx
descriptors, the DMA stops right away, and another overflow interrupt is
raised as the FIFO overflows again. Since the DMA is already restarted at
the end of stmmac_rx_refill() after freeing descriptors, disabling FIFO
overflow interrupts and the corresponding handling code has no side effect,
and eliminates the interrupt storm when the RX FIFO overflows.
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Link: https://lore.kernel.org/r/20210506143312.20784-1-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 7 May 2021 00:16:38 +0000 (17:16 -0700)]
mptcp: fix splat when closing unaccepted socket
If userspace exits before calling accept() on a listener that had at least
one new connection ready, we get:
Attempt to release TCP socket in state 8
This happens because the mptcp socket gets cloned when the TCP connection
is ready, but the socket is never exposed to userspace.
The client additionally sends a DATA_FIN, which brings connection into
CLOSE_WAIT state. This in turn prevents the orphan+state reset fixup
in mptcp_sock_destruct() from doing its job.
Fixes:
3721b9b64676b ("mptcp: Track received DATA_FIN sequence number and add related helpers")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/185
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20210507001638.225468-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 7 May 2021 21:49:18 +0000 (14:49 -0700)]
Merge tag 'tag-chrome-platform-for-v5.13' of git://git./linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Benson Leung:
"cros_ec_typec:
- Changes around DP mode check, hard reset, tracking port change.
cros_ec misc:
- wilco_ec: Convert stream-like files from nonseekable to stream open
- cros_usbpd_notify: Listen to EC_HSOT_EVENT_USB_MUX host event
- fix format warning in cros_ec_typec"
* tag 'tag-chrome-platform-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_lpc: Use DEFINE_MUTEX() for mutex lock
platform/chrome: cros_usbpd_notify: Listen to EC_HOST_EVENT_USB_MUX host event
platform/chrome: cros_ec_typec: Add DP mode check
platform/chrome: cros_ec_typec: Handle hard reset
platform/chrome: cros_ec: Add Type C hard reset
platform/chrome: cros_ec_typec: Track port role
platform/chrome: cros_ec_typec: fix clang -Wformat warning
platform/chrome: cros_ec_typec: Check for device within remove function
platform/chrome: wilco_ec: convert stream-like files from nonseekable_open -> stream_open
Linus Torvalds [Fri, 7 May 2021 20:06:34 +0000 (13:06 -0700)]
Merge tag 'i3c/for-5.13' of git://git./linux/kernel/git/i3c/linux
Pull i3cupdates from Alexandre Belloni:
"Fix i3c_master_register error path"
* tag 'i3c/for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
Revert "i3c master: fix missing destroy_workqueue() on error in i3c_master_register"
dt-bindings: i3c: Fix silvaco,i3c-master-v1 compatible string
i3c: master: svc: remove redundant assignment to cmd->read_len
Linus Torvalds [Fri, 7 May 2021 19:11:05 +0000 (12:11 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"A mix of fixes and clean-ups that turned up too late for the first
pull request:
- Restore terminal stack frame records. Their previous removal caused
traces which cross secondary_start_kernel to terminate one entry
too late, with a spurious "0" entry.
- Fix boot warning with pseudo-NMI due to the way we manipulate the
PMR register.
- ACPI fixes: avoid corruption of interrupt mappings on watchdog
probe failure (GTDT), prevent unregistering of GIC SGIs.
- Force SPARSEMEM_VMEMMAP as the only memory model, it saves with
having to test all the other combinations.
- Documentation fixes and updates: tagged address ABI exceptions on
brk/mmap/mremap(), event stream frequency, update booting
requirements on the configuration of traps"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kernel: Update the stale comment
arm64: Fix the documented event stream frequency
arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
arm64: Explicitly document boot requirements for SVE
arm64: Explicitly require that FPSIMD instructions do not trap
arm64: Relax booting requirements for configuration of traps
arm64: cpufeatures: use min and max
arm64: stacktrace: restore terminal records
arm64/vdso: Discard .note.gnu.property sections in vDSO
arm64: doc: Add brk/mmap/mremap() to the Tagged Address ABI Exceptions
psci: Remove unneeded semicolon
ACPI: irq: Prevent unregistering of GIC SGIs
ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
arm64: Show three registers per line
arm64: remove HAVE_DEBUG_BUGVERBOSE
arm64: alternative: simplify passing alt_region
arm64: Force SPARSEMEM_VMEMMAP as the only memory management model
arm64: vdso32: drop -no-integrated-as flag
Linus Torvalds [Fri, 7 May 2021 18:40:18 +0000 (11:40 -0700)]
Merge tag 'sound-fix-5.13-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few device-specific HD-audio and USB-audio fixes"
* tag 'sound-fix-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: generic: change the DAC ctl name for LO+SPK or LO+HP
ALSA: hda/realtek: Add fixup for HP OMEN laptop
ALSA: hda/realtek: Fix speaker amp on HP Envy AiO 32
ALSA: hda/realtek: Fix silent headphone output on ASUS UX430UA
ALSA: usb-audio: Add dB range mapping for Sennheiser Communications Headset PC 8
ALSA: hda/realtek: ALC285 Thinkpad jack pin quirk is unreachable
Linus Torvalds [Fri, 7 May 2021 18:35:12 +0000 (11:35 -0700)]
Merge tag 'block-5.13-2021-05-07' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- dasd spelling fixes (Bhaskar)
- Limit bio max size on multi-page bvecs to the hardware limit, to
avoid overly large bio's (and hence latencies). Originally queued for
the merge window, but needed a fix and was dropped from the initial
pull (Changheun)
- NVMe pull request (Christoph):
- reset the bdev to ns head when failover (Daniel Wagner)
- remove unsupported command noise (Keith Busch)
- misc passthrough improvements (Kanchan Joshi)
- fix controller ioctl through ns_head (Minwoo Im)
- fix controller timeouts during reset (Tao Chiu)
- rnbd fixes/cleanups (Gioh, Md, Dima)
- Fix iov_iter re-expansion (yangerkun)
* tag 'block-5.13-2021-05-07' of git://git.kernel.dk/linux-block:
block: reexpand iov_iter after read/write
nvmet: remove unsupported command noise
nvme-multipath: reset bdev to ns head when failover
nvme-pci: fix controller reset hang when racing with nvme_timeout
nvme: move the fabrics queue ready check routines to core
nvme: avoid memset for passthrough requests
nvme: add nvme_get_ns helper
nvme: fix controller ioctl through ns_head
bio: limit bio max size
RDMA/rtrs: fix uninitialized symbol 'cnt'
s390: dasd: Mundane spelling fixes
block/rnbd: Remove all likely and unlikely
block/rnbd-clt: Check the return value of the function rtrs_clt_query
block/rnbd: Fix style issues
block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t
Linus Torvalds [Fri, 7 May 2021 18:29:23 +0000 (11:29 -0700)]
Merge tag 'io_uring-5.13-2021-05-07' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Mostly fixes for merge window merged code. In detail:
- Error case memory leak fixes (Colin, Zqiang)
- Add the tools/io_uring/ to the list of maintained files (Lukas)
- Set of fixes for the modified buffer registration API (Pavel)
- Sanitize io thread setup on x86 (Stefan)
- Ensure we truncate transfer count for registered buffers (Thadeu)"
* tag 'io_uring-5.13-2021-05-07' of git://git.kernel.dk/linux-block:
x86/process: setup io_threads more like normal user space threads
MAINTAINERS: add io_uring tool to IO_URING
io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers
io_uring: Fix memory leak in io_sqe_buffers_register()
io_uring: Fix premature return from loop and memory leak
io_uring: fix unchecked error in switch_start()
io_uring: allow empty slots for reg buffers
io_uring: add more build check for uapi
io_uring: dont overlap internal and user req flags
io_uring: fix drain with rsrc CQEs
Linus Torvalds [Fri, 7 May 2021 18:23:41 +0000 (11:23 -0700)]
Merge tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- Add validation of the UDP retrans parameter to prevent shift
out-of-bounds
- Don't discard pNFS layout segments that are marked for return
Bugfixes:
- Fix a NULL dereference crash in xprt_complete_bc_request() when the
NFSv4.1 server misbehaves.
- Fix the handling of NFS READDIR cookie verifiers
- Sundry fixes to ensure attribute revalidation works correctly when
the server does not return post-op attributes.
- nfs4_bitmask_adjust() must not change the server global bitmasks
- Fix major timeout handling in the RPC code.
- NFSv4.2 fallocate() fixes.
- Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling
- Copy offload attribute revalidation fixes
- Fix an incorrect filehandle size check in the pNFS flexfiles driver
- Fix several RDMA transport setup/teardown races
- Fix several RDMA queue wrapping issues
- Fix a misplaced memory read barrier in sunrpc's call_decode()
Features:
- Micro optimisation of the TCP transmission queue using TCP_CORK
- statx() performance improvements by further splitting up the
tracking of invalid cached file metadata.
- Support the NFSv4.2 'change_attr_type' attribute and use it to
optimise handling of change attribute updates"
* tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (85 commits)
xprtrdma: Fix a NULL dereference in frwr_unmap_sync()
sunrpc: Fix misplaced barrier in call_decode
NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code.
xprtrdma: Move fr_mr field to struct rpcrdma_mr
xprtrdma: Move the Work Request union to struct rpcrdma_mr
xprtrdma: Move fr_linv_done field to struct rpcrdma_mr
xprtrdma: Move cqe to struct rpcrdma_mr
xprtrdma: Move fr_cid to struct rpcrdma_mr
xprtrdma: Remove the RPC/RDMA QP event handler
xprtrdma: Don't display r_xprt memory addresses in tracepoints
xprtrdma: Add an rpcrdma_mr_completion_class
xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation
xprtrdma: Avoid Send Queue wrapping
xprtrdma: Do not wake RPC consumer on a failed LocalInv
xprtrdma: Do not recycle MR after FastReg/LocalInv flushes
xprtrdma: Clarify use of barrier in frwr_wc_localinv_done()
xprtrdma: Rename frwr_release_mr()
xprtrdma: rpcrdma_mr_pop() already does list_del_init()
xprtrdma: Delete rpcrdma_recv_buffer_put()
xprtrdma: Fix cwnd update ordering
...
Linus Torvalds [Fri, 7 May 2021 18:18:52 +0000 (11:18 -0700)]
Merge tag '9p-for-5.13-rc1' of git://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
"An error handling fix and constification"
* tag '9p-for-5.13-rc1' of git://github.com/martinetd/linux:
fs: 9p: fix v9fs_file_open writeback fid error check
9p: Constify static struct v9fs_attr_group
Arkadiusz Kubalewski [Fri, 16 Apr 2021 21:43:57 +0000 (23:43 +0200)]
i40e: Remove LLDP frame filters
Remove filters from being setup in case of software DCB and allow the
LLDP frames to be properly transmitted to the wire.
It is not possible to transmit the LLDP frame out of the port, if they
are filtered by control VSI. This prohibits software LLDP agent
properly communicate its DCB capabilities to the neighbors.
Fixes:
4b208eaa8078 ("i40e: Add init and default config of software based DCB")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Mateusz Palczewski [Tue, 13 Apr 2021 14:43:07 +0000 (14:43 +0000)]
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
Unlike other supported adapters, 2.5G and 5G use different
PHY type identifiers for reading/writing PHY settings
and for reading link status. This commit introduces
separate PHY identifiers for these two operation types.
Fixes:
2e45d3f4677a ("i40e: Add support for X710 B/P & SFP+ cards")
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jaroslaw Gawin [Tue, 13 Apr 2021 14:19:40 +0000 (14:19 +0000)]
i40e: fix the restart auto-negotiation after FEC modified
When FEC mode was changed the link didn't know it because
the link was not reset and new parameters were not negotiated.
Set a flag 'I40E_AQ_PHY_ENABLE_ATOMIC_LINK' in 'abilities'
to restart the link and make it run with the new settings.
Fixes:
1d96340196f1 ("i40e: Add support FEC configuration for Fortville 25G")
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Yunjian Wang [Mon, 12 Apr 2021 14:41:18 +0000 (22:41 +0800)]
i40e: Fix use-after-free in i40e_client_subtask()
Currently the call to i40e_client_del_instance frees the object
pf->cinst, however pf->cinst->lan_info is being accessed after
the free. Fix this by adding the missing return.
Addresses-Coverity: ("Read from pointer after free")
Fixes:
7b0b1a6d0ac9 ("i40e: Disable iWARP VSI PETCP_ENA flag on netdev down events")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Magnus Karlsson [Mon, 26 Apr 2021 11:14:01 +0000 (13:14 +0200)]
i40e: fix broken XDP support
Commit
12738ac4754e ("i40e: Fix sparse errors in i40e_txrx.c") broke
XDP support in the i40e driver. That commit was fixing a sparse error
in the code by introducing a new variable xdp_res instead of
overloading this into the skb pointer. The problem is that the code
later uses the skb pointer in if statements and these where not
extended to also test for the new xdp_res variable. Fix this by adding
the correct tests for xdp_res in these places.
The skb pointer was used to store the result of the XDP program by
overloading the results in the error pointer
ERR_PTR(-result). Therefore, the allocation failure test that used to
only test for !skb now need to be extended to also consider !xdp_res.
i40e_cleanup_headers() had a check that based on the skb value being
an error pointer, i.e. a result from the XDP program != XDP_PASS, and
if so start to process a new packet immediately, instead of populating
skb fields and sending the skb to the stack. This check is not needed
anymore, since we have added an explicit test for xdp_res being set
and if so just do continue to pick the next packet from the NIC.
Fixes:
12738ac4754e ("i40e: Fix sparse errors in i40e_txrx.c")
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Eric Dumazet [Thu, 6 May 2021 12:53:50 +0000 (05:53 -0700)]
netfilter: nftables: avoid potential overflows on 32bit arches
User space could ask for very large hash tables, we need to make sure
our size computations wont overflow.
nf_tables_newset() needs to double check the u64 size
will fit into size_t field.
Fixes:
0ed6389c483d ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Dumazet [Thu, 6 May 2021 12:53:23 +0000 (05:53 -0700)]
netfilter: nftables: avoid overflows in nft_hash_buckets()
Number of buckets being stored in 32bit variables, we have to
ensure that no overflows occur in nft_hash_buckets()
syzbot injected a size == 0x40000000 and reported:
UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 1 PID: 29539 Comm: syz-executor.4 Not tainted 5.12.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
__roundup_pow_of_two include/linux/log2.h:57 [inline]
nft_hash_buckets net/netfilter/nft_set_hash.c:411 [inline]
nft_hash_estimate.cold+0x19/0x1e net/netfilter/nft_set_hash.c:652
nft_select_set_ops net/netfilter/nf_tables_api.c:3586 [inline]
nf_tables_newset+0xe62/0x3110 net/netfilter/nf_tables_api.c:4322
nfnetlink_rcv_batch+0xa09/0x24b0 net/netfilter/nfnetlink.c:488
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:612 [inline]
nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:630
netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:674
____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
___sys_sendmsg+0xf3/0x170 net/socket.c:2404
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
Fixes:
0ed6389c483d ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Fri, 7 May 2021 07:34:51 +0000 (00:34 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
"This is everything else from -mm for this merge window.
90 patches.
Subsystems affected by this patch series: mm (cleanups and slub),
alpha, procfs, sysctl, misc, core-kernel, bitmap, lib, compat,
checkpatch, epoll, isofs, nilfs2, hpfs, exit, fork, kexec, gcov,
panic, delayacct, gdb, resource, selftests, async, initramfs, ipc,
drivers/char, and spelling"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (90 commits)
mm: fix typos in comments
mm: fix typos in comments
treewide: remove editor modelines and cruft
ipc/sem.c: spelling fix
fs: fat: fix spelling typo of values
kernel/sys.c: fix typo
kernel/up.c: fix typo
kernel/user_namespace.c: fix typos
kernel/umh.c: fix some spelling mistakes
include/linux/pgtable.h: few spelling fixes
mm/slab.c: fix spelling mistake "disired" -> "desired"
scripts/spelling.txt: add "overflw"
scripts/spelling.txt: Add "diabled" typo
scripts/spelling.txt: add "overlfow"
arm: print alloc free paths for address in registers
mm/vmalloc: remove vwrite()
mm: remove xlate_dev_kmem_ptr()
drivers/char: remove /dev/kmem for good
mm: fix some typos and code style problems
ipc/sem.c: mundane typo fixes
...
Lu Jialin [Fri, 7 May 2021 01:06:50 +0000 (18:06 -0700)]
mm: fix typos in comments
succed -> succeed in mm/hugetlb.c
wil -> will in mm/mempolicy.c
wit -> with in mm/page_alloc.c
Retruns -> Returns in mm/page_vma_mapped.c
confict -> conflict in mm/secretmem.c
No functionality changed.
Link: https://lkml.kernel.org/r/20210408140027.60623-1-lujialin4@huawei.com
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 7 May 2021 01:06:47 +0000 (18:06 -0700)]
mm: fix typos in comments
Fix ~94 single-word typos in locking code comments, plus a few
very obvious grammar mistakes.
Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com
Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 7 May 2021 01:06:44 +0000 (18:06 -0700)]
treewide: remove editor modelines and cruft
The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."
I recently receive a patch to explicitly add a new one.
Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favoriate editor setups.
It is even nicer if scripts/checkpatch.pl can check it.
If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.
[1] https://lore.kernel.org/lkml/
20200703073143.423557-1-danny@kdrag0n.dev/
Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bhaskar Chowdhury [Fri, 7 May 2021 01:06:41 +0000 (18:06 -0700)]
ipc/sem.c: spelling fix
s/purpuse/purpose/
Link: https://lkml.kernel.org/r/20210319221432.26631-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dingsenjie [Fri, 7 May 2021 01:06:39 +0000 (18:06 -0700)]
fs: fat: fix spelling typo of values
vaules -> values
Link: https://lkml.kernel.org/r/20210302034817.30384-1-dingsenjie@163.com
Signed-off-by: dingsenjie <dingsenjie@yulong.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xiaofeng Cao [Fri, 7 May 2021 01:06:36 +0000 (18:06 -0700)]
kernel/sys.c: fix typo
change 'infite' to 'infinite'
change 'concurent' to 'concurrent'
change 'memvers' to 'members'
change 'decendants' to 'descendants'
change 'argumets' to 'arguments'
Link: https://lkml.kernel.org/r/20210316112904.10661-1-cxfcosmos@gmail.com
Signed-off-by: Xiaofeng Cao <caoxiaofeng@yulong.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bhaskar Chowdhury [Fri, 7 May 2021 01:06:33 +0000 (18:06 -0700)]
kernel/up.c: fix typo
s/condtions/conditions/
Link: https://lkml.kernel.org/r/20210317032732.3260835-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xiaofeng Cao [Fri, 7 May 2021 01:06:30 +0000 (18:06 -0700)]
kernel/user_namespace.c: fix typos
change 'verifing' to 'verifying'
change 'certaint' to 'certain'
change 'approprpiate' to 'appropriate'
Link: https://lkml.kernel.org/r/20210317100129.12440-1-caoxiaofeng@yulong.com
Signed-off-by: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhouchuangao [Fri, 7 May 2021 01:06:27 +0000 (18:06 -0700)]
kernel/umh.c: fix some spelling mistakes
Fix some spelling mistakes, and modify the order of the parameter comments
to be consistent with the order of the parameters passed to the function.
Link: https://lkml.kernel.org/r/1615636139-4076-1-git-send-email-zhouchuangao@vivo.com
Signed-off-by: zhouchuangao <zhouchuangao@vivo.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bhaskar Chowdhury [Fri, 7 May 2021 01:06:24 +0000 (18:06 -0700)]
include/linux/pgtable.h: few spelling fixes
Few spelling fixes throughout the file.
Link: https://lkml.kernel.org/r/20210318201404.6380-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Fri, 7 May 2021 01:06:21 +0000 (18:06 -0700)]
mm/slab.c: fix spelling mistake "disired" -> "desired"
There is a spelling mistake in a comment. Fix it.
Link: https://lkml.kernel.org/r/20210317094158.5762-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drew Fustini [Fri, 7 May 2021 01:06:18 +0000 (18:06 -0700)]
scripts/spelling.txt: add "overflw"
Add typo "overflw" for "overflow". This typo was found and fixed in
drivers/clocksource/timer-pistachio.c.
Link: https://lore.kernel.org/lkml/20210305090315.384547-1-drew@beagleboard.org/
Link: https://lkml.kernel.org/r/20210305095151.388182-1-drew@beagleboard.org
Signed-off-by: Drew Fustini <drew@beagleboard.org>
Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zuoqilin [Fri, 7 May 2021 01:06:15 +0000 (18:06 -0700)]
scripts/spelling.txt: Add "diabled" typo
Increase "diabled" spelling error check.
Link: https://lkml.kernel.org/r/20210304070106.2313-1-zuoqilin1@163.com
Signed-off-by: zuoqilin <zuoqilin@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drew Fustini [Fri, 7 May 2021 01:06:12 +0000 (18:06 -0700)]
scripts/spelling.txt: add "overlfow"
Add typo "overlfow" for "overflow". This typo was found and fixed in
net/sctp/tsnmap.c.
Link: https://lore.kernel.org/netdev/20210304055548.56829-1-drew@beagleboard.org/
Link: https://lkml.kernel.org/r/20210304072657.64577-1-drew@beagleboard.org
Signed-off-by: Drew Fustini <drew@beagleboard.org>
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maninder Singh [Fri, 7 May 2021 01:06:09 +0000 (18:06 -0700)]
arm: print alloc free paths for address in registers
In case of a use after free kernel oops, the freeing path of the object
is required to debug futher. In most of cases the object address is
present in one of the registers.
Thus check the register's address and if it belongs to slab, print its
alloc and free path.
e.g. in the below issue register r6 belongs to slab, and a use after
free issue occurred on one of its dereferenced values:
Unable to handle kernel paging request at virtual address
6b6b6b6f
....
pc : [<
c0538afc>] lr : [<
c0465674>] psr:
60000013
sp :
c8927d40 ip :
ffffefff fp :
c8aa8020
r10:
c8927e10 r9 :
00000001 r8 :
00400cc0
r7 :
00000000 r6 :
c8ab0180 r5 :
c1804a80 r4 :
c8aa8008
r3 :
c1a5661c r2 :
00000000 r1 :
6b6b6b6b r0 :
c139bf48
.....
Register r6 information: slab kmalloc-64 start
c8ab0140 data offset 64 pointer offset 0 size 64 allocated at meminfo_proc_show+0x40/0x4fc
meminfo_proc_show+0x40/0x4fc
seq_read_iter+0x18c/0x4c4
proc_reg_read_iter+0x84/0xac
generic_file_splice_read+0xe8/0x17c
splice_direct_to_actor+0xb8/0x290
do_splice_direct+0xa0/0xe0
do_sendfile+0x2d0/0x438
sys_sendfile64+0x12c/0x140
ret_fast_syscall+0x0/0x58
0xbeeacde4
Free path:
meminfo_proc_show+0x5c/0x4fc
seq_read_iter+0x18c/0x4c4
proc_reg_read_iter+0x84/0xac
generic_file_splice_read+0xe8/0x17c
splice_direct_to_actor+0xb8/0x290
do_splice_direct+0xa0/0xe0
do_sendfile+0x2d0/0x438
sys_sendfile64+0x12c/0x140
ret_fast_syscall+0x0/0x58
0xbeeacde4
Link: https://lkml.kernel.org/r/1615891032-29160-3-git-send-email-maninder1.s@samsung.com
Co-developed-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Fri, 7 May 2021 01:06:06 +0000 (18:06 -0700)]
mm/vmalloc: remove vwrite()
The last user (/dev/kmem) is gone. Let's drop it.
Link: https://lkml.kernel.org/r/20210324102351.6932-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: huang ying <huang.ying.caritas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Fri, 7 May 2021 01:06:01 +0000 (18:06 -0700)]
mm: remove xlate_dev_kmem_ptr()
Since /dev/kmem has been removed, let's remove the xlate_dev_kmem_ptr()
leftovers.
Link: https://lkml.kernel.org/r/20210324102351.6932-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Fri, 7 May 2021 01:05:55 +0000 (18:05 -0700)]
drivers/char: remove /dev/kmem for good
Patch series "drivers/char: remove /dev/kmem for good".
Exploring /dev/kmem and /dev/mem in the context of memory hot(un)plug and
memory ballooning, I started questioning the existence of /dev/kmem.
Comparing it with the /proc/kcore implementation, it does not seem to be
able to deal with things like
a) Pages unmapped from the direct mapping (e.g., to be used by secretmem)
-> kern_addr_valid(). virt_addr_valid() is not sufficient.
b) Special cases like gart aperture memory that is not to be touched
-> mem_pfn_is_ram()
Unless I am missing something, it's at least broken in some cases and might
fault/crash the machine.
Looks like its existence has been questioned before in 2005 and 2010 [1],
after ~11 additional years, it might make sense to revive the discussion.
CONFIG_DEVKMEM is only enabled in a single defconfig (on purpose or by
mistake?). All distributions disable it: in Ubuntu it has been disabled
for more than 10 years, in Debian since 2.6.31, in Fedora at least
starting with FC3, in RHEL starting with RHEL4, in SUSE starting from
15sp2, and OpenSUSE has it disabled as well.
1) /dev/kmem was popular for rootkits [2] before it got disabled
basically everywhere. Ubuntu documents [3] "There is no modern user of
/dev/kmem any more beyond attackers using it to load kernel rootkits.".
RHEL documents in a BZ [5] "it served no practical purpose other than to
serve as a potential security problem or to enable binary module drivers
to access structures/functions they shouldn't be touching"
2) /proc/kcore is a decent interface to have a controlled way to read
kernel memory for debugging puposes. (will need some extensions to
deal with memory offlining/unplug, memory ballooning, and poisoned
pages, though)
3) It might be useful for corner case debugging [1]. KDB/KGDB might be a
better fit, especially, to write random memory; harder to shoot
yourself into the foot.
4) "Kernel Memory Editor" [4] hasn't seen any updates since 2000 and seems
to be incompatible with 64bit [1]. For educational purposes,
/proc/kcore might be used to monitor value updates -- or older
kernels can be used.
5) It's broken on arm64, and therefore, completely disabled there.
Looks like it's essentially unused and has been replaced by better
suited interfaces for individual tasks (/proc/kcore, KDB/KGDB). Let's
just remove it.
[1] https://lwn.net/Articles/147901/
[2] https://www.linuxjournal.com/article/10505
[3] https://wiki.ubuntu.com/Security/Features#A.2Fdev.2Fkmem_disabled
[4] https://sourceforge.net/projects/kme/
[5] https://bugzilla.redhat.com/show_bug.cgi?id=154796
Link: https://lkml.kernel.org/r/20210324102351.6932-1-david@redhat.com
Link: https://lkml.kernel.org/r/20210324102351.6932-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Alexander A. Klimov" <grandmaster@al2klimov.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Corentin Labbe <clabbe@baylibre.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Gregory Clement <gregory.clement@bootlin.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hillf Danton <hdanton@sina.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: James Troup <james.troup@canonical.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kairui Song <kasong@redhat.com>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: openrisc@lists.librecores.org
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Pavel Machek (CIP)" <pavel@denx.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: sparclinux@vger.kernel.org
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Theodore Dubois <tblodt@icloud.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: William Cohen <wcohen@redhat.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shijie Luo [Fri, 7 May 2021 01:05:51 +0000 (18:05 -0700)]
mm: fix some typos and code style problems
fix some typos and code style problems in mm.
gfp.h: s/MAXNODES/MAX_NUMNODES
mmzone.h: s/then/than
rmap.c: s/__vma_split()/__vma_adjust()
swap.c: s/__mod_zone_page_stat/__mod_zone_page_state, s/is is/is
swap_state.c: s/whoes/whose
z3fold.c: code style problem fix in z3fold_unregister_migration
zsmalloc.c: s/of/or, s/give/given
Link: https://lkml.kernel.org/r/20210419083057.64820-1-luoshijie1@huawei.com
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bhaskar Chowdhury [Fri, 7 May 2021 01:05:48 +0000 (18:05 -0700)]
ipc/sem.c: mundane typo fixes
s/runtine/runtime/
s/AQUIRE/ACQUIRE/
s/seperately/separately/
s/wont/won\'t/
s/succesfull/successful/
Link: https://lkml.kernel.org/r/20210326022240.26375-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 7 May 2021 01:05:45 +0000 (18:05 -0700)]
modules: add CONFIG_MODPROBE_PATH
Allow the developer to specifiy the initial value of the modprobe_path[]
string. This can be used to set it to the empty string initially, thus
effectively disabling request_module() during early boot until userspace
writes a new value via the /proc/sys/kernel/modprobe interface. [1]
When building a custom kernel (often for an embedded target), it's normal
to build everything into the kernel that is needed for booting, and indeed
the initramfs often contains no modules at all, so every such
request_module() done before userspace init has mounted the real rootfs is
a waste of time.
This is particularly useful when combined with the previous patch, which
made the initramfs unpacking asynchronous - for that to work, it had to
make any usermodehelper call wait for the unpacking to finish before
attempting to invoke the userspace helper. By eliminating all such
(known-to-be-futile) calls of usermodehelper, the initramfs unpacking and
the {device,late}_initcalls can proceed in parallel for much longer.
For a relatively slow ppc board I'm working on, the two patches combined
lead to 0.2s faster boot - but more importantly, the fact that the
initramfs unpacking proceeds completely in the background while devices
get probed means I get to handle the gpio watchdog in time without getting
reset.
[1] __request_module() already has an early -ENOENT return when
modprobe_path is the empty string.
Link: https://lkml.kernel.org/r/20210313212528.2956377-3-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>