platform/kernel/linux-rpi.git
3 years agodt-bindings: add resets property to dw-apb-timer
Damien Le Moal [Wed, 10 Feb 2021 05:02:22 +0000 (14:02 +0900)]
dt-bindings: add resets property to dw-apb-timer

The Synopsis DesignWare APB timer driver
(drivers/clocksource/dw_apb_timer_of.c) indirectly uses the resets
property of its node as it executes the function of_reset_control_get().
Make sure that this property is documented in
timer/snps,dw-apb-timer.yaml to avoid make dtbs_check warnings.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: fix sifive gpio properties
Damien Le Moal [Wed, 10 Feb 2021 05:02:21 +0000 (14:02 +0900)]
dt-bindings: fix sifive gpio properties

The sifive gpio IP block supports up to 32 GPIOs. Reflect that in the
interrupts property description and maxItems. Also add the standard
ngpios property to describe the number of GPIOs available on the
implementation.

Also add the "canaan,k210-gpiohs" compatible string to indicate the use
of this gpio controller in the Canaan Kendryte K210 SoC. If this
compatible string is used, do not define the clocks property as
required as the K210 SoC does not have a software controllable clock
for the Sifive gpio IP block.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: update sifive uart compatible string
Damien Le Moal [Wed, 10 Feb 2021 05:02:20 +0000 (14:02 +0900)]
dt-bindings: update sifive uart compatible string

Add the compatible string "canaan,k210-uarths" to the sifive uart
bindings to indicate the use of this IP block in the Canaan Kendryte
K210 SoC.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: update sifive clint compatible string
Damien Le Moal [Wed, 10 Feb 2021 05:02:19 +0000 (14:02 +0900)]
dt-bindings: update sifive clint compatible string

Add the "canaan,k210-clint" compatible string to the Sifive clint
bindings to indicate the use of the "sifive,clint0" IP block in the
Canaan Kendryte K210 SoC. The description of the compatible string
property is also updated to reflect this addition.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: update sifive plic compatible string
Damien Le Moal [Wed, 10 Feb 2021 05:02:18 +0000 (14:02 +0900)]
dt-bindings: update sifive plic compatible string

Add the compatible string "canaan,k210-plic" to the Sifive plic bindings
to indicate the use of the "sifive,plic-1.0.0" IP block in the Canaan
Kendryte K210 SoC. The description is also updated to reflect this
change, that is, that SoCs from other vendors may also use this plic
implementation.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: update risc-v cpu properties
Damien Le Moal [Wed, 10 Feb 2021 05:02:17 +0000 (14:02 +0900)]
dt-bindings: update risc-v cpu properties

The Canaan Kendryte K210 SoC CPU cores are based on a rocket chip
version using a draft verion of the RISC-V ISA specifications. To avoid
any confusion with CPU cores using stable specifications, add the
compatible string "canaan,k210" for this SoC CPU cores.

Also add the "riscv,none" value to the mmu-type property to allow a DT
to indicate that the CPU being described does not have an MMU or that
it has an MMU that is not usable (which is the case for the K210 SoC).

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: add Canaan boards compatible strings
Damien Le Moal [Wed, 10 Feb 2021 05:02:16 +0000 (14:02 +0900)]
dt-bindings: add Canaan boards compatible strings

Introduce the file riscv/canaan.yaml to document compatible strings
related to the Canaan Kendryte K210 SoC. The compatible string
"canaan,kendryte-k210" used to indicate the use of this SoC to the
early SoC init code is added. This new file also defines the compatible
strings of all supported boards based on this SoC.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: update MAINTAINERS file
Damien Le Moal [Wed, 10 Feb 2021 05:02:15 +0000 (14:02 +0900)]
dt-bindings: update MAINTAINERS file

Add a reference to the Canaan K210 system controller driver bindings
file Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml
in the MAINTAINERS file entry for this driver.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoclk: Add RISC-V Canaan Kendryte K210 clock driver
Damien Le Moal [Wed, 10 Feb 2021 05:02:14 +0000 (14:02 +0900)]
clk: Add RISC-V Canaan Kendryte K210 clock driver

Add a clock provider driver for the Canaan Kendryte K210 RISC-V SoC.
This new driver with the compatible string "canaan,k210-clk" implements
support for the full clock structure of the K210 SoC. Since it is
required for the correct operation of the SoC, this driver is
selected by default for compilation when the SOC_CANAAN option is
selected.

With this change, the k210-sysctl driver is turned into a simple
platform driver which enables its power bus clock and triggers
populating its child nodes. The sysctl driver retains the SOC early
initialization code, but the implementation now relies on the new
function k210_clk_early_init() provided by the new clk-k210 driver.

The clock structure implemented and many of the coding ideas for the
driver come from the work by Sean Anderson on the K210 support for the
U-Boot project.

Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: linux-clk@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoRISC-V: Add a non-void return for sbi v02 functions
Atish Patra [Thu, 4 Feb 2021 05:26:43 +0000 (21:26 -0800)]
RISC-V: Add a non-void return for sbi v02 functions

SBI v0.2 functions can return an error code from SBI implementation.
We are already processing the SBI error code and coverts it to the Linux
error code.

Propagate to the error code to the caller as well. As of now, kvm is the
only user of these error codes.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoRISC-V: Implement ASID allocator
Anup Patel [Wed, 3 Feb 2021 09:49:07 +0000 (15:19 +0530)]
RISC-V: Implement ASID allocator

Currently, we do local TLB flush on every MM switch. This is very harsh on
performance because we are forcing page table walks after every MM switch.

This patch implements ASID allocator for assigning an ASID to a MM context.
The number of ASIDs are limited in HW so we create a logical entity named
CONTEXTID for assigning to MM context. The lower bits of CONTEXTID are ASID
and upper bits are VERSION number. The number of usable ASID bits supported
by HW are detected at boot-time by writing 1s to ASID bits in SATP CSR.

We allocate new CONTEXTID on first MM switch for a MM context where the
ASID is allocated from an ASID bitmap and VERSION is provide by an atomic
counter. At time of allocating new CONTEXTID, if we run out of available
ASIDs then:
1. We flush the ASID bitmap
2. Increment current VERSION atomic counter
3. Re-allocate ASID from ASID bitmap
4. Flush TLB on all CPUs
5. Try CONTEXTID re-assignment on all CPUs

Please note that we don't use ASID #0 because it is used at boot-time by
all CPUs for initial MM context. Also, newly created context is always
assigned CONTEXTID #0 (i.e. VERSION #0 and ASID #0) which is an invalid
context in our implementation.

Using above approach, we have virtually infinite CONTEXTIDs on-top-of
limited number of HW ASIDs. This approach is inspired from ASID allocator
used for Linux ARM/ARM64 but we have adapted it for RISC-V. Overall, this
ASID allocator helps us reduce rate of local TLB flushes on every CPU
thereby increasing performance.

This patch is tested on QEMU virt machine, Spike and SiFive Unleashed
board. On QEMU virt machine, we see some (3-5% approx) performance
improvement with SW emulated TLBs provided by QEMU. Unfortunately,
the ASID bits of the SATP CSR are not implemented on Spike and SiFive
Unleashed board so we don't see any change in performance. On real HW
having all ASID bits implemented, the performance gains will be much
more due improved sharing of TLB among different processes.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoarch_numa: fix common code printing of phys_addr_t
Randy Dunlap [Thu, 28 Jan 2021 03:55:33 +0000 (19:55 -0800)]
arch_numa: fix common code printing of phys_addr_t

Fix build warnings in the arch_numa common code:

../include/linux/kern_levels.h:5:18: warning: format '%Lx' expects argument of type 'long long unsigned int', but argument 3 has type 'phys_addr_t' {aka 'unsigned int'} [-Wformat=]
../drivers/base/arch_numa.c:360:56: note: format string is defined here
  360 |    pr_warn("Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
../drivers/base/arch_numa.c:435:39: note: format string is defined here
  435 |  pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n", start, end - 1);

Fixes: ae3c107cd8be ("numa: Move numa implementation to common code")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoRISC-V: remove unneeded semicolon
Chengyang Fan [Mon, 25 Jan 2021 11:23:47 +0000 (19:23 +0800)]
RISC-V: remove unneeded semicolon

Remove a superfluous semicolon after function definition.

Signed-off-by: Chengyang Fan <cy.fan@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoRISC-V: probes: Treat the instruction stream as host-endian
Palmer Dabbelt [Sat, 23 Jan 2021 03:34:29 +0000 (19:34 -0800)]
RISC-V: probes: Treat the instruction stream as host-endian

Neither of these are actually correct: the instruction stream is defined
(for versions of the ISA manual newer than 2.2) as a stream of 16-bit
little-endian parcels, which is different than just being little-endian.
In theory we should represent this as a type, but we don't have any
concrete plans for the big endian stuff so it doesn't seem worth the
time -- we've got variants of this all over the place.

Instead I'm just dropping the unnecessary type conversion, which is a
NOP on LE systems but causes an sparse error as the types are all mixed
up.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agopinctrl: Add RISC-V Canaan Kendryte K210 FPIOA driver
Damien Le Moal [Tue, 12 Jan 2021 00:58:40 +0000 (09:58 +0900)]
pinctrl: Add RISC-V Canaan Kendryte K210 FPIOA driver

Add the pinctrl-k210.c pinctrl driver for the Canaan Kendryte K210
field programmable IO array (FPIOA) to allow configuring the SoC pin
functions. The K210 has 48 programmable pins which can take any of 256
possible functions.

This patch is inspired from the k210 pinctrl driver for the u-boot
project and contains many direct contributions from Sean Anderson.

The MAINTAINERS file is updated, adding the entry "CANAAN/KENDRYTE K210
SOC FPIOA DRIVER" with myself listed as maintainer for this driver.

Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoarch/riscv:fix typo in a comment in arch/riscv/kernel/image-vars.h
tangchunyou [Thu, 21 Jan 2021 01:55:13 +0000 (09:55 +0800)]
arch/riscv:fix typo in a comment in arch/riscv/kernel/image-vars.h

"kerne" -> "kernel"

Signed-off-by: WenZhang <zhangwen@yulong.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv/kasan: add KASAN_VMALLOC support
Nylon Chen [Sat, 16 Jan 2021 05:58:35 +0000 (13:58 +0800)]
riscv/kasan: add KASAN_VMALLOC support

It references to x86/s390 architecture.

So, it doesn't map the early shadow page to cover VMALLOC space.

Prepopulate top level page table for the range that would otherwise be
empty.

lower levels are filled dynamically upon memory allocation while
booting.

Signed-off-by: Nylon Chen <nylon7@andestech.com>
Signed-off-by: Nick Hu <nickhu@andestech.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Covert to reserve_initrd_mem()
Kefeng Wang [Fri, 15 Jan 2021 05:46:06 +0000 (13:46 +0800)]
riscv: Covert to reserve_initrd_mem()

Covert to the generic reserve_initrd_mem() function.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoinitramfs: Provide a common initrd reserve function
Kefeng Wang [Fri, 15 Jan 2021 05:46:04 +0000 (13:46 +0800)]
initramfs: Provide a common initrd reserve function

Some architectures(eg, ARM and riscv) have similar logic to
check and reserve the memory of initrd, let's provide a common
function reserve_initrd_mem() to reduce duplicated code.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoinitrd: Add the preprocessor guard in initrd.h
Kefeng Wang [Fri, 15 Jan 2021 05:46:03 +0000 (13:46 +0800)]
initrd: Add the preprocessor guard in initrd.h

Add the preprocessor guard in initrd.h to prevent possible
build error from the multiple inclusion of same header file
multiple time.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: add BUILTIN_DTB support for MMU-enabled targets
Vitaly Wool [Fri, 15 Jan 2021 23:49:48 +0000 (01:49 +0200)]
riscv: add BUILTIN_DTB support for MMU-enabled targets

Sometimes, especially in a production system we may not want to
use a "smart bootloader" like u-boot to load kernel, ramdisk and
device tree from a filesystem on eMMC, but rather load the kernel
from a NAND partition and just run it as soon as we can, and in
this case it is convenient to have device tree compiled into the
kernel binary. Since this case is not limited to MMU-less systems,
let's support it for these which have MMU enabled too.

While at it, provide __dtb_start as a parameter to setup_vm() in
BUILTIN_DTB case, so we don't have to duplicate BUILTIN_DTB specific
processing in MMU-enabled and MMU-disabled versions of setup_vm().

Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv/stacktrace: Fix stack output without ra on the stack top
Chen Huang [Mon, 11 Jan 2021 12:40:14 +0000 (20:40 +0800)]
riscv/stacktrace: Fix stack output without ra on the stack top

When a function doesn't have a callee, then it will not
push ra into the stack, such as lkdtm_BUG() function,

addi sp,sp,-16
sd s0,8(sp)
addi s0,sp,16
ebreak

The struct stackframe use {fp,ra} to get information from
stack, if walk_stackframe() with pr_regs, we will obtain
wrong value and bad stacktrace,

[<ffffffe00066c56c>] lkdtm_BUG+0x6/0x8
---[ end trace 18da3fbdf08e25d5 ]---

Correct the next fp and pc, after that, full stacktrace
shown as expects,

[<ffffffe00066c56c>] lkdtm_BUG+0x6/0x8
[<ffffffe0008b24a4>] lkdtm_do_action+0x14/0x1c
[<ffffffe00066c372>] direct_entry+0xc0/0x10a
[<ffffffe000439f86>] full_proxy_write+0x42/0x6a
[<ffffffe000309626>] vfs_write+0x7e/0x214
[<ffffffe00030992a>] ksys_write+0x98/0xc0
[<ffffffe000309960>] sys_write+0xe/0x16
[<ffffffe0002014bc>] ret_from_syscall+0x0/0x2
---[ end trace 61917f3d9a9fadcd ]---

Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Improve __show_regs
Kefeng Wang [Mon, 11 Jan 2021 12:40:13 +0000 (20:40 +0800)]
riscv: Improve __show_regs

Show the function symbols of epc and ra to improve the
readability of crash reports, and align the printing
formats about the raw epc value.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Add dump stack in show_regs
Kefeng Wang [Mon, 11 Jan 2021 12:40:12 +0000 (20:40 +0800)]
riscv: Add dump stack in show_regs

Like commit 1149aad10b1e ("arm64: Add dump_backtrace() in show_regs"),
dump the stack in riscv show_regs as common code expects.

Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Enable per-task stack canaries
Guo Ren [Thu, 17 Dec 2020 16:29:18 +0000 (16:29 +0000)]
riscv: Enable per-task stack canaries

This enables the use of per-task stack canary values if GCC has
support for emitting the stack canary reference relative to the
value of tp, which holds the task struct pointer in the riscv
kernel.

After compare arm64 and x86 implementations, seems arm64's is more
flexible and readable. The key point is how gcc get the offset of
stack_canary from gs/el0_sp.

x86: Use a fix offset from gs, not flexible.

struct fixed_percpu_data {
/*
 * GCC hardcodes the stack canary as %gs:40.  Since the
 * irq_stack is the object at %gs:0, we reserve the bottom
 * 48 bytes of the irq stack for the canary.
 */
char            gs_base[40]; // :(
unsigned long   stack_canary;
};

arm64: Use -mstack-protector-guard-offset & guard-reg
gcc options:
-mstack-protector-guard=sysreg
-mstack-protector-guard-reg=sp_el0
-mstack-protector-guard-offset=xxx

riscv: Use -mstack-protector-guard-offset & guard-reg
gcc options:
-mstack-protector-guard=tls
-mstack-protector-guard-reg=tp
-mstack-protector-guard-offset=xxx

 GCC's implementation has been merged:
 commit c931e8d5a96463427040b0d11f9c4352ac22b2b0
 Author: Cooper Qu <cooper.qu@linux.alibaba.com>
 Date:   Mon Jul 13 16:15:08 2020 +0800

     RISC-V: Add support for TLS stack protector canary access

In the end, these codes are inserted by gcc before return:

*  0xffffffe00020b396 <+120>:   ld      a5,1008(tp) # 0x3f0
*  0xffffffe00020b39a <+124>:   xor     a5,a5,a4
*  0xffffffe00020b39c <+126>:   mv      a0,s5
*  0xffffffe00020b39e <+128>:   bnez    a5,0xffffffe00020b61c <_do_fork+766>
   0xffffffe00020b3a2 <+132>:   ld      ra,136(sp)
   0xffffffe00020b3a4 <+134>:   ld      s0,128(sp)
   0xffffffe00020b3a6 <+136>:   ld      s1,120(sp)
   0xffffffe00020b3a8 <+138>:   ld      s2,112(sp)
   0xffffffe00020b3aa <+140>:   ld      s3,104(sp)
   0xffffffe00020b3ac <+142>:   ld      s4,96(sp)
   0xffffffe00020b3ae <+144>:   ld      s5,88(sp)
   0xffffffe00020b3b0 <+146>:   ld      s6,80(sp)
   0xffffffe00020b3b2 <+148>:   ld      s7,72(sp)
   0xffffffe00020b3b4 <+150>:   addi    sp,sp,144
   0xffffffe00020b3b6 <+152>:   ret
   ...
*  0xffffffe00020b61c <+766>:   auipc   ra,0x7f8
*  0xffffffe00020b620 <+770>:   jalr    -1764(ra) # 0xffffffe000a02f38 <__stack_chk_fail>

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Cooper Qu <cooper.qu@linux.alibaba.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Add support for function error injection
Guo Ren [Thu, 17 Dec 2020 16:01:45 +0000 (16:01 +0000)]
riscv: Add support for function error injection

Inspired by the commit 42d038c4fb00 ("arm64: Add support for function
error injection"), this patch supports function error injection for
riscv.

This patch mainly support two functions: one is regs_set_return_value()
which is used to overwrite the return value; the another function is
override_function_with_return() which is to override the probed
function returning and jump to its caller.

Test log:
 cd /sys/kernel/debug/fail_function
 echo sys_clone > inject
 echo 100 > probability
 echo 1 > interval
 ls /
[  313.176875] FAULT_INJECTION: forcing a failure.
[  313.176875] name fail_function, interval 1, probability 100, space 0, times 1
[  313.184357] CPU: 0 PID: 87 Comm: sh Not tainted 5.8.0-rc5-00007-g6a758cc #117
[  313.187616] Call Trace:
[  313.189100] [<ffffffe0002036b6>] walk_stackframe+0x0/0xc2
[  313.191626] [<ffffffe00020395c>] show_stack+0x40/0x4c
[  313.193927] [<ffffffe000556c60>] dump_stack+0x7c/0x96
[  313.194795] [<ffffffe0005522e8>] should_fail+0x140/0x142
[  313.195923] [<ffffffe000299ffc>] fei_kprobe_handler+0x2c/0x5a
[  313.197687] [<ffffffe0009e2ec4>] kprobe_breakpoint_handler+0xb4/0x18a
[  313.200054] [<ffffffe00020357e>] do_trap_break+0x36/0xca
[  313.202147] [<ffffffe000201bca>] ret_from_exception+0x0/0xc
[  313.204556] [<ffffffe000201bbc>] ret_from_syscall+0x0/0x2
-sh: can't fork: Invalid argument

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Add uprobes supported
Guo Ren [Thu, 17 Dec 2020 16:01:44 +0000 (16:01 +0000)]
riscv: Add uprobes supported

This patch adds support for uprobes on riscv architecture.

Just like kprobe, it support single-step and simulate instructions.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Add KPROBES_ON_FTRACE supported
Guo Ren [Thu, 17 Dec 2020 16:01:43 +0000 (16:01 +0000)]
riscv: Add KPROBES_ON_FTRACE supported

This patch adds support for kprobes on ftrace call sites to avoids
much of the overhead with regular kprobes. Try it with simple
steps:

 echo 'p:myprobe sys_clone a0=%a0 a1=%a1 stack_val=+4($stack)' > /sys/kernel/de
bug/tracing/kprobe_events
 echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
 cat /sys/kernel/debug/tracing/trace
 tracer: nop

 entries-in-buffer/entries-written: 1/1   #P:1

                                _-----=> irqs-off
                               / _----=> need-resched
                              | / _---=> hardirq/softirq
                              || / _--=> preempt-depth
                              ||| /     delay
           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
              | |         |   ||||      |         |
              sh-92      [000] ....   369.899962: myprobe: (sys_clone+0x0/0x28) a0=0x1200011 a1=0x0 stack_val=0x201c20ffffffe0
 cat /sys/kernel/debug/kprobes/list
ffffffe00020b584  k  sys_clone+0x0    [FTRACE]
                                       ^^^^^^

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Add kprobes supported
Guo Ren [Thu, 17 Dec 2020 16:01:42 +0000 (16:01 +0000)]
riscv: Add kprobes supported

This patch enables "kprobe & kretprobe" to work with ftrace
interface. It utilized software breakpoint as single-step
mechanism.

Some instructions which can't be single-step executed must be
simulated in kernel execution slot, such as: branch, jal, auipc,
la ...

Some instructions should be rejected for probing and we use a
blacklist to filter, such as: ecall, ebreak, ...

We use ebreak & c.ebreak to replace origin instruction and the
kprobe handler prepares an executable memory slot for out-of-line
execution with a copy of the original instruction being probed.
In execution slot we add ebreak behind original instruction to
simulate a single-setp mechanism.

The patch is based on packi's work [1] and csky's work [2].
 - The kprobes_trampoline.S is all from packi's patch
 - The single-step mechanism is new designed for riscv without hw
   single-step trap
 - The simulation codes are from csky
 - Frankly, all codes refer to other archs' implementation

 [1] https://lore.kernel.org/linux-riscv/20181113195804.22825-1-me@packi.ch/
 [2] https://lore.kernel.org/linux-csky/20200403044150.20562-9-guoren@kernel.org/

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Patrick Stählin <me@packi.ch>
Signed-off-by: Patrick Stählin <me@packi.ch>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Patrick Stählin <me@packi.ch>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT
Guo Ren [Thu, 17 Dec 2020 16:01:41 +0000 (16:01 +0000)]
riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT

This patch changes the current detour mechanism of dynamic ftrace
which has been discussed during LPC 2020 RISCV-MC [1].

Before the patch, we used mcount for detour:
<funca>:
addi sp,sp,-16
sd   ra,8(sp)
sd   s0,0(sp)
addi s0,sp,16
mv   a5,ra
mv   a0,a5
auipc ra,0x0 -> nop
jalr  -296(ra) <_mcount@plt> ->nop
...

After the patch, we use nop call site area for detour:
<funca>:
nop -> REG_S ra, -SZREG(sp)
nop -> auipc ra, 0x?
nop -> jalr ?(ra)
nop -> REG_L ra, -SZREG(sp)
...

The mcount mechanism is mixed with gcc function prologue which is
not very clear. The patchable function entry just put 16 bytes nop
before the front of the function prologue which could be filled
with a separated detour mechanism.

[1] https://www.linuxplumbersconf.org/event/7/contributions/807/

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Fixup patch_text panic in ftrace
Guo Ren [Thu, 17 Dec 2020 16:01:40 +0000 (16:01 +0000)]
riscv: Fixup patch_text panic in ftrace

Just like arm64, we can't trace the function in the patch_text path.

Here is the bug log:

[   45.234334] Unable to handle kernel paging request at virtual address ffffffd38ae80900
[   45.242313] Oops [#1]
[   45.244600] Modules linked in:
[   45.247678] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.9.0-00025-g9b7db83-dirty #215
[   45.255797] epc: ffffffe00021689a ra : ffffffe00021718e sp : ffffffe01afabb58
[   45.262955]  gp : ffffffe00136afa0 tp : ffffffe01af94d00 t0 : 0000000000000002
[   45.270200]  t1 : 0000000000000000 t2 : 0000000000000001 s0 : ffffffe01afabc08
[   45.277443]  s1 : ffffffe0013718a8 a0 : 0000000000000000 a1 : ffffffe01afabba8
[   45.284686]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : c4c16ad38ae80900
[   45.291929]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
[   45.299173]  s2 : 0000000000000001 s3 : ffffffe000206a60 s4 : ffffffe000206a60
[   45.306415]  s5 : 00000000000009ec s6 : ffffffe0013718a8 s7 : c4c16ad38ae80900
[   45.313658]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000001
[   45.320902]  s11: 0000000000000003 t3 : 0000000000000001 t4 : ffffffffd192fe79
[   45.328144]  t5 : ffffffffb8f80000 t6 : 0000000000040000
[   45.333472] status: 0000000200000100 badaddr: ffffffd38ae80900 cause: 000000000000000f
[   45.341514] ---[ end trace d95102172248fdcf ]---
[   45.346176] note: migration/0[11] exited with preempt_count 1

(gdb) x /2i $pc
=> 0xffffffe00021689a <__do_proc_dointvec+196>: sd      zero,0(s7)
   0xffffffe00021689e <__do_proc_dointvec+200>: li      s11,0

(gdb) bt
0  __do_proc_dointvec (tbl_data=0x0, table=0xffffffe01afabba8,
write=0, buffer=0x0, lenp=0x7bf897061f9a0800, ppos=0x4, conv=0x0,
data=0x52464e43) at kernel/sysctl.c:581
1  0xffffffe00021718e in do_proc_dointvec (data=<optimized out>,
conv=<optimized out>, ppos=<optimized out>, lenp=<optimized out>,
buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
at kernel/sysctl.c:964
2  proc_dointvec_minmax (ppos=<optimized out>, lenp=<optimized out>,
buffer=<optimized out>, write=<optimized out>, table=<optimized out>)
at kernel/sysctl.c:964
3  proc_do_static_key (table=<optimized out>, write=1, buffer=0x0,
lenp=0x0, ppos=0x7bf897061f9a0800) at kernel/sysctl.c:1643
4  0xffffffe000206792 in ftrace_make_call (rec=<optimized out>,
addr=<optimized out>) at arch/riscv/kernel/ftrace.c:109
5  0xffffffe0002c9c04 in __ftrace_replace_code
(rec=0xffffffe01ae40c30, enable=3) at kernel/trace/ftrace.c:2503
6  0xffffffe0002ca0b2 in ftrace_replace_code (mod_flags=<optimized
out>) at kernel/trace/ftrace.c:2530
7  0xffffffe0002ca26a in ftrace_modify_all_code (command=5) at
kernel/trace/ftrace.c:2677
8  0xffffffe0002ca30e in __ftrace_modify_code (data=<optimized out>)
at kernel/trace/ftrace.c:2703
9  0xffffffe0002c13b0 in multi_cpu_stop (data=0x0) at kernel/stop_machine.c:224
10 0xffffffe0002c0fde in cpu_stopper_thread (cpu=<optimized out>) at
kernel/stop_machine.c:491
11 0xffffffe0002343de in smpboot_thread_fn (data=0x0) at kernel/smpboot.c:165
12 0xffffffe00022f8b4 in kthread (_create=0xffffffe01af0c040) at
kernel/kthread.c:292
13 0xffffffe000201fac in handle_exception () at arch/riscv/kernel/entry.S:236

   0xffffffe00020678a <+114>:   auipc   ra,0xffffe
   0xffffffe00020678e <+118>:   jalr    -118(ra) # 0xffffffe000204714 <patch_text_nosync>
   0xffffffe000206792 <+122>:   snez    a0,a0

(gdb) disassemble patch_text_nosync
Dump of assembler code for function patch_text_nosync:
   0xffffffe000204714 <+0>:     addi    sp,sp,-32
   0xffffffe000204716 <+2>:     sd      s0,16(sp)
   0xffffffe000204718 <+4>:     sd      ra,24(sp)
   0xffffffe00020471a <+6>:     addi    s0,sp,32
   0xffffffe00020471c <+8>:     auipc   ra,0x0
   0xffffffe000204720 <+12>:    jalr    -384(ra) # 0xffffffe00020459c <patch_insn_write>
   0xffffffe000204724 <+16>:    beqz    a0,0xffffffe00020472e <patch_text_nosync+26>
   0xffffffe000204726 <+18>:    ld      ra,24(sp)
   0xffffffe000204728 <+20>:    ld      s0,16(sp)
   0xffffffe00020472a <+22>:    addi    sp,sp,32
   0xffffffe00020472c <+24>:    ret
   0xffffffe00020472e <+26>:    sd      a0,-24(s0)
   0xffffffe000204732 <+30>:    auipc   ra,0x4
   0xffffffe000204736 <+34>:    jalr    -1464(ra) # 0xffffffe00020817a <flush_icache_all>
   0xffffffe00020473a <+38>:    ld      a0,-24(s0)
   0xffffffe00020473e <+42>:    ld      ra,24(sp)
   0xffffffe000204740 <+44>:    ld      s0,16(sp)
   0xffffffe000204742 <+46>:    addi    sp,sp,32
   0xffffffe000204744 <+48>:    ret

(gdb) disassemble flush_icache_all-4
Dump of assembler code for function flush_icache_all:
   0xffffffe00020817a <+0>:     addi    sp,sp,-8
   0xffffffe00020817c <+2>:     sd      ra,0(sp)
   0xffffffe00020817e <+4>:     auipc   ra,0xfffff
   0xffffffe000208182 <+8>:     jalr    -1822(ra) # 0xffffffe000206a60 <ftrace_caller>
   0xffffffe000208186 <+12>:    ld      ra,0(sp)
   0xffffffe000208188 <+14>:    addi    sp,sp,8
   0xffffffe00020818a <+0>:     addi    sp,sp,-16
   0xffffffe00020818c <+2>:     sd      s0,0(sp)
   0xffffffe00020818e <+4>:     sd      ra,8(sp)
   0xffffffe000208190 <+6>:     addi    s0,sp,16
   0xffffffe000208192 <+8>:     li      a0,0
   0xffffffe000208194 <+10>:    auipc   ra,0xfffff
   0xffffffe000208198 <+14>:    jalr    -410(ra) # 0xffffffe000206ffa <sbi_remote_fence_i>
   0xffffffe00020819c <+18>:    ld      s0,0(sp)
   0xffffffe00020819e <+20>:    ld      ra,8(sp)
   0xffffffe0002081a0 <+22>:    addi    sp,sp,16
   0xffffffe0002081a2 <+24>:    ret

(gdb) frame 5
(rec=0xffffffe01ae40c30, enable=3) at kernel/trace/ftrace.c:2503
2503                    return ftrace_make_call(rec, ftrace_addr);
(gdb) p /x rec->ip
$2 = 0xffffffe00020817a -> flush_icache_all !

When we modified flush_icache_all's patchable-entry with ftrace_caller:
 - Insert ftrace_caller at flush_icache_all prologue.
 - Call flush_icache_all to sync I/Dcache, but flush_icache_all is
just we modified by half.

Link: https://lore.kernel.org/linux-riscv/CAJF2gTT=oDWesWe0JVWvTpGi60-gpbNhYLdFWN_5EbyeqoEDdw@mail.gmail.com/T/#t
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Fixup wrong ftrace remove cflag
Guo Ren [Thu, 17 Dec 2020 16:01:39 +0000 (16:01 +0000)]
riscv: Fixup wrong ftrace remove cflag

We must use $(CC_FLAGS_FTRACE) instead of directly using -pg. It
will cause -fpatchable-function-entry error.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Fixup compile error BUILD_BUG_ON failed
Guo Ren [Thu, 17 Dec 2020 16:01:38 +0000 (16:01 +0000)]
riscv: Fixup compile error BUILD_BUG_ON failed

Unfortunately, the current code couldn't be compiled:

  CC      arch/riscv/kernel/patch.o
In file included from ./include/linux/kernel.h:11,
                 from ./include/linux/list.h:9,
                 from ./include/linux/preempt.h:11,
                 from ./include/linux/spinlock.h:51,
                 from arch/riscv/kernel/patch.c:6:
In function â€˜fix_to_virt’,
    inlined from â€˜patch_map’ at arch/riscv/kernel/patch.c:37:17:
./include/linux/compiler.h:392:38: error: call to â€˜__compiletime_assert_205’ declared with attribute error: BUILD_BUG_ON failed: idx >= __end_of_fixed_addresses
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                      ^
./include/linux/compiler.h:373:4: note: in definition of macro â€˜__compiletime_assert’
    prefix ## suffix();    \
    ^~~~~~
./include/linux/compiler.h:392:2: note: in expansion of macro â€˜_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro â€˜compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:2: note: in expansion of macro â€˜BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  ^~~~~~~~~~~~~~~~
./include/asm-generic/fixmap.h:32:2: note: in expansion of macro â€˜BUILD_BUG_ON’
  BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
  ^~~~~~~~~~~~

Because fix_to_virt(, idx) needs a const value, not a dynamic variable of
reg-a0 or BUILD_BUG_ON failed with "idx >= __end_of_fixed_addresses".

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoRISC-V: Implement ptrace regs and stack API
Patrick Stählin [Thu, 17 Dec 2020 16:01:37 +0000 (16:01 +0000)]
RISC-V: Implement ptrace regs and stack API

Needed for kprobes support. Copied and adapted from arm64 code.

Guo Ren fixup pt_regs type for linux-5.8-rc1.

Signed-off-by: Patrick Stählin <me@packi.ch>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Add machine name to kernel boot log and stack dump output
Kefeng Wang [Wed, 25 Nov 2020 11:44:15 +0000 (19:44 +0800)]
riscv: Add machine name to kernel boot log and stack dump output

Add the machine name to kernel boot-up log, and install
the machine name to stack dump for DT boot mode.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Add numa support for riscv64 platform
Atish Patra [Thu, 19 Nov 2020 00:38:29 +0000 (16:38 -0800)]
riscv: Add numa support for riscv64 platform

Use the generic numa implementation to add NUMA support for RISC-V.
This is based on Greentime's patch[1] but modified to use generic NUMA
implementation and few more fixes.

[1] https://lkml.org/lkml/2020/1/10/233

Co-developed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Add support pte_protnone and pmd_protnone if CONFIG_NUMA_BALANCING
Greentime Hu [Thu, 19 Nov 2020 00:38:28 +0000 (16:38 -0800)]
riscv: Add support pte_protnone and pmd_protnone if CONFIG_NUMA_BALANCING

These two functions are used to distinguish between PROT_NONENUMA
protections and hinting fault protections.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Separate memory init from paging init
Atish Patra [Thu, 19 Nov 2020 00:38:27 +0000 (16:38 -0800)]
riscv: Separate memory init from paging init

Currently, we perform some memory init functions in paging init. But,
that will be an issue for NUMA support where DT needs to be flattened
before numa initialization and memblock_present can only be called
after numa initialization.

Move memory initialization related functions to a separate function.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agonuma: Move numa implementation to common code
Atish Patra [Thu, 19 Nov 2020 00:38:26 +0000 (16:38 -0800)]
numa: Move numa implementation to common code

ARM64 numa implementation is generic enough that RISC-V can reuse that
implementation with very minor cosmetic changes. This will help both
ARM64 and RISC-V in terms of maintanace and feature improvement

Move the numa implementation code to common directory so that both ISAs
can reuse this. This doesn't introduce any function changes for ARM64.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoarm64, numa: Change the numa init functions name to be generic
Atish Patra [Thu, 19 Nov 2020 00:38:25 +0000 (16:38 -0800)]
arm64, numa: Change the numa init functions name to be generic

This is a preparatory patch for unifying numa implementation between
ARM64 & RISC-V. As the numa implementation will be moved to generic
code, rename the arm64 related functions to a generic one.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Add Canaan Kendryte K210 reset controller
Damien Le Moal [Sun, 13 Dec 2020 13:50:47 +0000 (22:50 +0900)]
riscv: Add Canaan Kendryte K210 reset controller

Add a reset controller driver for the Canaan Kendryte K210 SoC. This
driver relies on its syscon compatible parent node (sysctl) for its
register mapping. Default this driver compilation to y when the
SOC_CANAAN option is selected.

The MAINTAINERS file is updated, adding the entry "CANAAN/KENDRYTE K210
SOC RESET CONTROLLER DRIVER" with myself listed as maintainer for this
driver.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: pinctrl: Document canaan,k210-fpioa bindings
Damien Le Moal [Sun, 13 Dec 2020 13:50:44 +0000 (22:50 +0900)]
dt-bindings: pinctrl: Document canaan,k210-fpioa bindings

Document the device tree bindings for the Canaan Kendryte K210 SoC
Fully Programmable IO Array (FPIOA) pinctrl driver in
Documentation/devicetree/bindings/pinctrl/canaan,k210-fpioa.yaml. The
new header file include/dt-bindings/pinctrl/k210-fpioa.h is added to
define all 256 possible pin functions of the SoC IO pins, as well as
macros simplifying the definition of pin functions in a device tree.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: reset: Document canaan,k210-rst bindings
Damien Le Moal [Sun, 13 Dec 2020 13:50:43 +0000 (22:50 +0900)]
dt-bindings: reset: Document canaan,k210-rst bindings

Document the device tree bindings for the Canaan Kendryte K210 SoC
reset controller driver in
Documentation/devicetree/bindings/reset/canaan,k210-rst.yaml. The header
file include/dt-bindings/reset/k210-rst.h is added to define all
possible reset lines of the SoC.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-binding: mfd: Document canaan,k210-sysctl bindings
Damien Le Moal [Sun, 13 Dec 2020 13:50:45 +0000 (22:50 +0900)]
dt-binding: mfd: Document canaan,k210-sysctl bindings

Document the device tree bindings of the Canaan Kendryte K210 SoC
system controller driver in
Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: cleanup Canaan Kendryte K210 sysctl driver
Damien Le Moal [Sun, 13 Dec 2020 13:50:40 +0000 (22:50 +0900)]
riscv: cleanup Canaan Kendryte K210 sysctl driver

Introduce the header file include/soc/canaan/k210-sysctl.h to have a
common definition of the Canaan Kendryte K210 SoC system controller
registers. Simplify the k210 system controller driver code by removing
unused register bits definition. The MAINTAINERS file is updated,
adding the entry "CANAAN/KENDRYTE K210 SOC SYSTEM CONTROLLER DRIVER"
with myself listed as maintainer for this driver.
This is a preparatory patch for introducing the K210 clock driver. No
functional changes are introduced.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Fix Canaan Kendryte K210 device tree
Damien Le Moal [Sun, 13 Dec 2020 13:50:39 +0000 (22:50 +0900)]
riscv: Fix Canaan Kendryte K210 device tree

Remove the clocks property from the cpu and clint nodes as these are
ignored. Also remove the clock-frequency property from the cpu nodes as
riscv relies on the timebase-frequency property.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Use vendor name for K210 SoC support
Damien Le Moal [Sun, 13 Dec 2020 13:50:38 +0000 (22:50 +0900)]
riscv: Use vendor name for K210 SoC support

Rename configuration options and directories related to the Kendryte
K210 SoC to use the SoC vendor name (canaan) instead of the "kendryte"
branding name.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Fix builtin DTB handling
Damien Le Moal [Sun, 13 Dec 2020 13:50:37 +0000 (22:50 +0900)]
riscv: Fix builtin DTB handling

All SiPeed K210 MAIX boards have the exact same vendor, arch and
implementation IDs, preventing differentiation to select the correct
device tree to use through the SOC_BUILTIN_DTB_DECLARE() macro. This
result in this macro to be useless and mandates changing the code of
the sysctl driver to change the builtin device tree suitable for the
target board.

Fix this problem by removing the SOC_BUILTIN_DTB_DECLARE() macro since
it is used only for the K210 support. The code searching the builtin
DTBs using the vendor, arch an implementation IDs is also removed.
Support for builtin DTB falls back to the simpler and more traditional
handling of builtin DTB using the CONFIG_BUILTIN_DTB option, similarly
to other architectures.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: dts: add initial board data for the SiFive HiFive Unmatched
Yash Shah [Tue, 8 Dec 2020 04:55:41 +0000 (10:25 +0530)]
riscv: dts: add initial board data for the SiFive HiFive Unmatched

Add initial board data for the SiFive HiFive Unmatched A00.
This patch is dependent on Zong's Patchset[0].

[0]: https://lore.kernel.org/linux-riscv/20201130082330.77268-4-zong.li@sifive.com/T/#u

Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: riscv: Update YAML doc to support SiFive HiFive Unmatched board
Yash Shah [Tue, 8 Dec 2020 04:55:40 +0000 (10:25 +0530)]
dt-bindings: riscv: Update YAML doc to support SiFive HiFive Unmatched board

Add new compatible strings to the YAML DT binding document to support
SiFive's HiFive Unmatched board

Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: dts: add initial support for the SiFive FU740-C000 SoC
Yash Shah [Tue, 8 Dec 2020 04:55:39 +0000 (10:25 +0530)]
riscv: dts: add initial support for the SiFive FU740-C000 SoC

Add initial support for the SiFive FU540-C000 SoC. FU740-C000 is built
around the SiFIve U7 Core Complex and a TileLink interconnect.

This file is expected to grow as more device drivers are added to the
kernel.

Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: gpio: Update DT binding docs to support SiFive FU740 SoC
Yash Shah [Tue, 8 Dec 2020 04:55:37 +0000 (10:25 +0530)]
dt-bindings: gpio: Update DT binding docs to support SiFive FU740 SoC

Add new compatible strings to the DT binding documents to support SiFive
FU740-C000.

Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: pwm: Update DT binding docs to support SiFive FU740 SoC
Yash Shah [Tue, 8 Dec 2020 04:55:35 +0000 (10:25 +0530)]
dt-bindings: pwm: Update DT binding docs to support SiFive FU740 SoC

Add new compatible strings to the DT binding documents to support SiFive
FU740-C000.

Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: riscv: Update DT binding docs to support SiFive FU740 SoC
Yash Shah [Tue, 8 Dec 2020 04:55:33 +0000 (10:25 +0530)]
dt-bindings: riscv: Update DT binding docs to support SiFive FU740 SoC

Add new compatible strings in cpus.yaml to support the E71 and U74 CPU
cores ("harts") that are present on FU740-C000 SoC.

Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoRISC-V: sifive_l2_cache: Update L2 cache driver to support SiFive FU740
Yash Shah [Thu, 10 Dec 2020 10:28:03 +0000 (15:58 +0530)]
RISC-V: sifive_l2_cache: Update L2 cache driver to support SiFive FU740

SiFive FU740 has 4 ECC interrupt sources as compared to 3 in FU540.
Update the L2 cache controller driver to support this additional
interrupt in case of FU740-C000 chip.

Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agodt-bindings: riscv: Update l2 cache DT documentation to add support for SiFive FU740
Yash Shah [Thu, 10 Dec 2020 10:28:02 +0000 (15:58 +0530)]
dt-bindings: riscv: Update l2 cache DT documentation to add support for SiFive FU740

The L2 cache controller in SiFive FU740 has 4 ECC interrupt sources as
compared to 3 in FU540. Update the DT documentation accordingly with
"compatible" and "interrupt" property changes.

Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv/mm: Prevent kernel module to access user memory without uaccess routines
Eric Lin [Fri, 4 Dec 2020 05:42:59 +0000 (13:42 +0800)]
riscv/mm: Prevent kernel module to access user memory without uaccess routines

We found this issue in an legacy out-of-tree kernel module
which didn't properly access user space pointer by get/put_user().
Such an illegal access loops in the page fault handler.
To resolve this, let it die here.

Signed-off-by: Eric Lin <tesheng@andestech.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv/mm: Introduce a die_kernel_fault() helper function
Eric Lin [Fri, 4 Dec 2020 05:42:58 +0000 (13:42 +0800)]
riscv/mm: Introduce a die_kernel_fault() helper function

Like arm64, this patch adds a die_kernel_fault() helper
to ensure the same semantics for the different kernel faults.

Signed-off-by: Eric Lin <tesheng@andestech.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Cleanup sbi function stubs when RISCV_SBI disabled
Kefeng Wang [Thu, 26 Nov 2020 02:40:38 +0000 (10:40 +0800)]
riscv: Cleanup sbi function stubs when RISCV_SBI disabled

Fix sbi_init() function declaration mismatch between RISCV_SBI
enable and disable, as it always returned 0, make it void function.

Drop some stubs which won't be used if RISCV_SBI disabled.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoLinux 5.11-rc2
Linus Torvalds [Sun, 3 Jan 2021 23:55:30 +0000 (15:55 -0800)]
Linux 5.11-rc2

3 years agoMerge tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 2 Jan 2021 20:22:46 +0000 (12:22 -0800)]
Merge tag 's390-5.11-3' of git://git./linux/kernel/git/s390/linux

Pull s390 cleanups from Vasily Gorbik:
 "Update defconfigs and sort config select list"

* tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/Kconfig: sort config S390 select list once again
  s390: update defconfigs

3 years agoMerge tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Sat, 2 Jan 2021 19:53:05 +0000 (11:53 -0800)]
Merge tag 'pm-5.11-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a crash in intel_pstate during resume from suspend-to-RAM
  that may occur after recent changes and two resource leaks in error
  paths in the operating performance points (OPP) framework, add a new
  C-states table to intel_idle and update the cpuidle MAINTAINERS entry
  to cover the governors too.

  Specifics:

   - Fix recently introduced crash in the intel_pstate driver that
     occurs if scale-invariance is disabled during resume from
     suspend-to-RAM due to inconsistent changes of APERF or MPERF MSR
     values made by the platform firmware (Rafael Wysocki).

   - Fix a memory leak and add a missing clk_put() in error paths in the
     OPP framework (Quanyang Wang, Viresh Kumar).

   - Add new C-states table for SnowRidge processors to the intel_idle
     driver (Artem Bityutskiy).

   - Update the MAINTAINERS entry for cpuidle to make it clear that the
     governors are covered by it too (Lukas Bulwahn)"

* tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_idle: add SnowRidge C-state table
  cpufreq: intel_pstate: Fix fast-switch fallback path
  opp: Call the missing clk_put() on error
  opp: fix memory leak in _allocate_opp_table
  MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK

3 years agoMerge branches 'pm-cpufreq' and 'pm-cpuidle'
Rafael J. Wysocki [Sat, 2 Jan 2021 09:16:32 +0000 (10:16 +0100)]
Merge branches 'pm-cpufreq' and 'pm-cpuidle'

* pm-cpufreq:
  cpufreq: intel_pstate: Fix fast-switch fallback path

* pm-cpuidle:
  intel_idle: add SnowRidge C-state table
  MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK

3 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 1 Jan 2021 20:58:07 +0000 (12:58 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi).

  The big core two fixes are for power management ("block: Do not accept
  any requests while suspended" and "block: Fix a race in the runtime
  power management code") which finally sorts out the resume problems
  we've occasionally been having.

  To make the resume fix, there are seven necessary precursors which
  effectively renames REQ_PREEMPT to REQ_PM, so every "special" request
  in block is automatically a power management exempt one.

  All of the non-PM preempt cases are removed except for the one in the
  SCSI Parallel Interface (spi) domain validation which is a genuine
  case where we have to run requests at high priority to validate the
  bus so this becomes an autopm get/put protected request"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits)
  scsi: cxgb4i: Fix TLS dependency
  scsi: ufs: Un-inline ufshcd_vops_device_reset function
  scsi: ufs: Re-enable WriteBooster after device reset
  scsi: ufs-mediatek: Use correct path to fix compile error
  scsi: mpt3sas: Signedness bug in _base_get_diag_triggers()
  scsi: block: Do not accept any requests while suspended
  scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
  scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
  scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
  scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
  scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
  scsi: block: Introduce BLK_MQ_REQ_PM
  scsi: block: Fix a race in the runtime power management code
  scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
  scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
  scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
  scsi: ufs-pci: Fix restore from S4 for Intel controllers
  scsi: ufs-mediatek: Keep VCC always-on for specific devices
  scsi: ufs: Allow regulators being always-on
  scsi: ufs: Clear UAC for RPMB after ufshcd resets
  ...

3 years agoMerge tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 1 Jan 2021 20:49:09 +0000 (12:49 -0800)]
Merge tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Two minor block fixes from this last week that should go into 5.11:

   - Add missing NOWAIT debugfs definition (Andres)

   - Fix kerneldoc warning introduced this merge window (Randy)"

* tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
  block: add debugfs stanza for QUEUE_FLAG_NOWAIT
  fs: block_dev.c: fix kernel-doc warnings from struct block_device changes

3 years agoMerge tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 1 Jan 2021 20:29:49 +0000 (12:29 -0800)]
Merge tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few fixes that should go into 5.11, all marked for stable as well:

   - Fix issue around identity COW'ing and users that share a ring
     across processes

   - Fix a hang associated with unregistering fixed files (Pavel)

   - Move the 'process is exiting' cancelation a bit earlier, so
     task_works aren't affected by it (Pavel)"

* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
  kernel/io_uring: cancel io_uring before task works
  io_uring: fix io_sqe_files_unregister() hangs
  io_uring: add a helper for setting a ref node
  io_uring: don't assume mm is constant across submits

3 years agodepmod: handle the case of /sbin/depmod without /sbin in PATH
Linus Torvalds [Mon, 28 Dec 2020 19:40:22 +0000 (11:40 -0800)]
depmod: handle the case of /sbin/depmod without /sbin in PATH

Commit 436e980e2ed5 ("kbuild: don't hardcode depmod path") stopped
hard-coding the path of depmod, but in the process caused trouble for
distributions that had that /sbin location, but didn't have it in the
PATH (generally because /sbin is limited to the super-user path).

Work around it for now by just adding /sbin to the end of PATH in the
depmod.sh script.

Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokernel/io_uring: cancel io_uring before task works
Pavel Begunkov [Wed, 30 Dec 2020 21:34:16 +0000 (21:34 +0000)]
kernel/io_uring: cancel io_uring before task works

For cancelling io_uring requests it needs either to be able to run
currently enqueued task_works or having it shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.

Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix io_sqe_files_unregister() hangs
Pavel Begunkov [Wed, 30 Dec 2020 21:34:15 +0000 (21:34 +0000)]
io_uring: fix io_sqe_files_unregister() hangs

io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes,
however requests keeping them may never complete, e.g. because of some
userspace dependency. Make sure it's interruptible otherwise it would
hang forever.

Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add a helper for setting a ref node
Pavel Begunkov [Wed, 30 Dec 2020 21:34:14 +0000 (21:34 +0000)]
io_uring: add a helper for setting a ref node

Setting a new reference node to a file data is not trivial, don't repeat
it, add and use a helper.

Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client
Linus Torvalds [Wed, 30 Dec 2020 20:02:12 +0000 (12:02 -0800)]
Merge tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A fix for an edge case in MClientRequest encoding and a couple of
  trivial fixups for the new msgr2 support"

* tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client:
  libceph: add __maybe_unused to DEFINE_MSGR2_FEATURE
  libceph: align session_key and con_secret to 16 bytes
  libceph: fix auth_signature buffer allocation in secure mode
  ceph: reencode gid_list when reconnecting

3 years agointel_idle: add SnowRidge C-state table
Artem Bityutskiy [Sun, 27 Dec 2020 10:11:16 +0000 (12:11 +0200)]
intel_idle: add SnowRidge C-state table

Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.

The following has been changed.

 1. C1E latency changed from 10us to 15us. It was measured using the
    open source "wult" tool (the "nic" method, 15us is the 99.99th
    percentile).

 2. C1E power break even changed from 20us to 25us, which may result
    in less C1E residency in some workloads.

 3. C6 latency changed from 50us to 130us. Measured the same way as C1E.

The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
table commentary about this.

On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agocpufreq: intel_pstate: Fix fast-switch fallback path
Rafael J. Wysocki [Tue, 29 Dec 2020 17:08:18 +0000 (18:08 +0100)]
cpufreq: intel_pstate: Fix fast-switch fallback path

When sugov_update_single_perf() falls back to the "frequency"
path due to the missing scale-invariance, it will call
cpufreq_driver_fast_switch() via sugov_fast_switch()
and the driver's ->fast_switch() callback will be invoked,
so it must not be NULL.

However, after commit a365ab6b9dfb ("cpufreq: intel_pstate: Implement
the ->adjust_perf() callback") intel_pstate sets ->fast_switch() to
NULL when it is going to use intel_cpufreq_adjust_perf(), which is a
mistake, because on x86 the scale-invariance may be turned off
dynamically, so modify it to retain the original ->adjust_perf()
callback pointer.

Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
3 years agoMerge branch 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Rafael J. Wysocki [Wed, 30 Dec 2020 17:19:34 +0000 (18:19 +0100)]
Merge branch 'opp/linux-next' of git://git./linux/kernel/git/vireshk/pm

Pull operating performance points (OPP) framework fixes for 5.11-rc2
from Viresh Kumar:

"This contains two patches to fix freeing of resources in error paths."

* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  opp: Call the missing clk_put() on error
  opp: fix memory leak in _allocate_opp_table

3 years agos390/Kconfig: sort config S390 select list once again
Heiko Carstens [Mon, 28 Dec 2020 08:51:57 +0000 (09:51 +0100)]
s390/Kconfig: sort config S390 select list once again

...and add comments at the top and bottom.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
3 years agos390: update defconfigs
Heiko Carstens [Wed, 18 Nov 2020 20:23:33 +0000 (21:23 +0100)]
s390: update defconfigs

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
3 years agoblock: add debugfs stanza for QUEUE_FLAG_NOWAIT
Andres Freund [Mon, 28 Dec 2020 19:27:18 +0000 (11:27 -0800)]
block: add debugfs stanza for QUEUE_FLAG_NOWAIT

This was missed in 021a24460dc2. Leads to the numeric value of
QUEUE_FLAG_NOWAIT (i.e. 29) showing up in
/sys/kernel/debug/block/*/state.

Fixes: 021a24460dc28e7412aecfae89f60e1847e685c0
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agofs: block_dev.c: fix kernel-doc warnings from struct block_device changes
Randy Dunlap [Tue, 29 Dec 2020 03:47:06 +0000 (19:47 -0800)]
fs: block_dev.c: fix kernel-doc warnings from struct block_device changes

Fix new kernel-doc warnings in fs/block_dev.c:

../fs/block_dev.c:1066: warning: Excess function parameter 'whole' description in 'bd_abort_claiming'
../fs/block_dev.c:1837: warning: Function parameter or member 'dev' not described in 'lookup_bdev'

Fixes: 4e7b5671c6a8 ("block: remove i_bdev")
Fixes: 37c3fc9abb25 ("block: simplify the block device claiming interface")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Tue, 29 Dec 2020 23:45:49 +0000 (15:45 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "16 patches

  Subsystems affected by this patch series: mm (selftests, hugetlb,
  pagecache, mremap, kasan, and slub), kbuild, checkpatch, misc, and
  lib"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: slub: call account_slab_page() after slab page initialization
  zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
  lib/zlib: fix inflating zlib streams on s390
  lib/genalloc: fix the overflow when size is too big
  kdev_t: always inline major/minor helper functions
  sizes.h: add SZ_8G/SZ_16G/SZ_32G macros
  local64.h: make <asm/local64.h> mandatory
  kasan: fix null pointer dereference in kasan_record_aux_stack
  mm: generalise COW SMC TLB flushing race comment
  mm/mremap.c: fix extent calculation
  mm: memmap defer init doesn't work as expected
  mm: add prototype for __add_to_page_cache_locked()
  checkpatch: prefer strscpy to strlcpy
  Revert "kbuild: avoid static_assert for genksyms"
  mm/hugetlb: fix deadlock in hugetlb_cow error path
  selftests/vm: fix building protection keys test

3 years agomm: slub: call account_slab_page() after slab page initialization
Roman Gushchin [Tue, 29 Dec 2020 23:15:07 +0000 (15:15 -0800)]
mm: slub: call account_slab_page() after slab page initialization

It's convenient to have page->objects initialized before calling into
account_slab_page().  In particular, this information can be used to
pre-alloc the obj_cgroup vector.

Let's call account_slab_page() a bit later, after the initialization of
page->objects.

This commit doesn't bring any functional change, but is required for
further optimizations.

[akpm@linux-foundation.org: undo changes needed by forthcoming mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch]

Link: https://lkml.kernel.org/r/20201110195753.530157-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agozlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
Randy Dunlap [Tue, 29 Dec 2020 23:15:04 +0000 (15:15 -0800)]
zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c

In commit 11fb479ff5d9 ("zlib: export S390 symbols for zlib modules"), I
added EXPORT_SYMBOL()s to dfltcc_inflate.c but then Mikhail said that
these should probably be in dfltcc_syms.c with the other
EXPORT_SYMBOL()s.

However, that is contrary to the current kernel style, which places
EXPORT_SYMBOL() immediately after the function that it applies to, so
move all EXPORT_SYMBOL()s to their respective function locations and
drop the dfltcc_syms.c file.  Also move MODULE_LICENSE() from the
deleted file to dfltcc.c.

[rdunlap@infradead.org: remove dfltcc_syms.o from Makefile]
Link: https://lkml.kernel.org/r/20201227171837.15492-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/20201219052530.28461-1-rdunlap@infradead.org
Fixes: 11fb479ff5d9 ("zlib: export S390 symbols for zlib modules")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Zaslonko Mikhail <zaslonko@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib/zlib: fix inflating zlib streams on s390
Ilya Leoshkevich [Tue, 29 Dec 2020 23:15:01 +0000 (15:15 -0800)]
lib/zlib: fix inflating zlib streams on s390

Decompressing zlib streams on s390 fails with "incorrect data check"
error.

Userspace zlib checks inflate_state.flags in order to byteswap checksums
only for zlib streams, and s390 hardware inflate code, which was ported
from there, tries to match this behavior.  At the same time, kernel zlib
does not use inflate_state.flags, so it contains essentially random
values.  For many use cases either zlib stream is zeroed out or checksum
is not used, so this problem is masked, but at least SquashFS is still
affected.

Fix by always passing a checksum to and from the hardware as is, which
matches zlib_inflate()'s expectations.

Link: https://lkml.kernel.org/r/20201215155551.894884-1-iii@linux.ibm.com
Fixes: 126196100063 ("lib/zlib: add s390 hardware support for kernel zlib_inflate")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: <stable@vger.kernel.org> [5.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolib/genalloc: fix the overflow when size is too big
Huang Shijie [Tue, 29 Dec 2020 23:14:58 +0000 (15:14 -0800)]
lib/genalloc: fix the overflow when size is too big

Some graphic card has very big memory on chip, such as 32G bytes.

In the following case, it will cause overflow:

    pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
    ret = gen_pool_add(pool, 0x1000000, SZ_32G, NUMA_NO_NODE);

    va = gen_pool_alloc(pool, SZ_4G);

The overflow occurs in gen_pool_alloc_algo_owner():

....
size = nbits << order;
....

The @nbits is "int" type, so it will overflow.
Then the gen_pool_avail() will return the wrong value.

This patch converts some "int" to "unsigned long", and
changes the compare code in while.

Link: https://lkml.kernel.org/r/20201229060657.3389-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Reported-by: Shi Jiasheng <jiasheng.shi@iluvatar.ai>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokdev_t: always inline major/minor helper functions
Josh Poimboeuf [Tue, 29 Dec 2020 23:14:55 +0000 (15:14 -0800)]
kdev_t: always inline major/minor helper functions

Silly GCC doesn't always inline these trivial functions.

Fixes the following warning:

  arch/x86/kernel/sys_ia32.o: warning: objtool: cp_stat64()+0xd8: call to new_encode_dev() with UACCESS enabled

Link: https://lkml.kernel.org/r/984353b44a4484d86ba9f73884b7306232e25e30.1608737428.git.jpoimboe@redhat.com
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> [build-tested]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agosizes.h: add SZ_8G/SZ_16G/SZ_32G macros
Huang Shijie [Tue, 29 Dec 2020 23:14:52 +0000 (15:14 -0800)]
sizes.h: add SZ_8G/SZ_16G/SZ_32G macros

Add these macros, since we can use them in drivers.

Link: https://lkml.kernel.org/r/20201229072819.11183-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolocal64.h: make <asm/local64.h> mandatory
Randy Dunlap [Tue, 29 Dec 2020 23:14:49 +0000 (15:14 -0800)]
local64.h: make <asm/local64.h> mandatory

Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and
remove all arch/*/include/asm/local64.h arch-specific files since they
only #include <asm-generic/local64.h>.

This fixes build errors on arch/c6x/ and arch/nios2/ for
block/blk-iocost.c.

Build-tested on 21 of 25 arch-es.  (tools problems on the others)

Yes, we could even rename <asm-generic/local64.h> to
<linux/local64.h> and change all #includes to use
<linux/local64.h> instead.

Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agokasan: fix null pointer dereference in kasan_record_aux_stack
Walter Wu [Tue, 29 Dec 2020 23:14:46 +0000 (15:14 -0800)]
kasan: fix null pointer dereference in kasan_record_aux_stack

Syzbot reported the following [1]:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 2d993067 P4D 2d993067 PUD 19a3c067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 3852 Comm: kworker/1:2 Not tainted 5.10.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events free_ipc
  RIP: 0010:kasan_record_aux_stack+0x77/0xb0

Add null checking slab object from kasan_get_alloc_meta() in order to
avoid null pointer dereference.

[1] https://syzkaller.appspot.com/x/log.txt?x=10a82a50d00000

Link: https://lkml.kernel.org/r/20201228080018.23041-1-walter-zh.wu@mediatek.com
Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: generalise COW SMC TLB flushing race comment
Nicholas Piggin [Tue, 29 Dec 2020 23:14:43 +0000 (15:14 -0800)]
mm: generalise COW SMC TLB flushing race comment

I'm not sure if I'm completely missing something here, but AFAIKS the
reference to the mysterious "COW SMC race" confuses the issue.  The
original changelog and mailing list thread didn't help me either.

This SMC race is where the problem was detected, but isn't the general
problem bigger and more obvious: that the new PTE could be picked up at
any time by any TLB while entries for the old PTE exist in other TLBs
before the TLB flush takes effect?

The case where the iTLB and dTLB of a CPU are pointing at different pages
is an interesting one but follows from the general problem.

The other (minor) thing with the comment I think it makes it a bit clearer
to say what the old code was doing (i.e., it avoids the race as opposed to
what?).

References: 4ce072f1faf29 ("mm: fix a race condition under SMC + COW")
Link: https://lkml.kernel.org/r/20201215121119.351650-1-npiggin@gmail.com
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/mremap.c: fix extent calculation
Kalesh Singh [Tue, 29 Dec 2020 23:14:40 +0000 (15:14 -0800)]
mm/mremap.c: fix extent calculation

When `next < old_addr`, `next - old_addr` arithmetic underflows causing
`extent` to be incorrect.

Make `extent` the smaller of `next - old_addr` or `old_end - old_addr`.

Link: https://lkml.kernel.org/r/20201219170433.2418867-1-kaleshsingh@google.com
Fixes: c49dd34018026 ("mm: speedup mremap on 1GB or larger regions")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: memmap defer init doesn't work as expected
Baoquan He [Tue, 29 Dec 2020 23:14:37 +0000 (15:14 -0800)]
mm: memmap defer init doesn't work as expected

VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb376 ("mm: memmap_init:
iterate over memblock regions rather that check each PFN") causing it.

Before the commit:

  [0.033176] Normal zone: 1445888 pages used for memmap
  [0.033176] Normal zone: 89391104 pages, LIFO batch:63
  [0.035851] ACPI: PM-Timer IO Port: 0x448

With commit

  [0.026874] Normal zone: 1445888 pages used for memmap
  [0.026875] Normal zone: 89391104 pages, LIFO batch:63
  [2.028450] ACPI: PM-Timer IO Port: 0x448

The root cause is the current memmap defer init doesn't work as expected.

Before, memmap_init_zone() was used to do memmap init of one whole zone,
to initialize all low zones of one numa node, but defer memmap init of
the last zone in that numa node.  However, since commit 73a6e474cb376,
function memmap_init() is adapted to iterater over memblock regions
inside one zone, then call memmap_init_zone() to do memmap init for each
region.

E.g, on VMware's system, the memory layout is as below, there are two
memory regions in node 2.  The current code will mistakenly initialize the
whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to
iniatialize only one memmory section on the 2nd region [mem
0x10000000000-0x1033fffffff].  In fact, we only expect to see that there's
only one memory section's memmap initialized.  That's why more time is
costed at the time.

[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]

Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge
whether defer need be taken in zone wide.

Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: add prototype for __add_to_page_cache_locked()
Souptick Joarder [Tue, 29 Dec 2020 23:14:34 +0000 (15:14 -0800)]
mm: add prototype for __add_to_page_cache_locked()

Otherwise it causes a gcc warning:

  mm/filemap.c:830:14: warning: no previous prototype for `__add_to_page_cache_locked' [-Wmissing-prototypes]

A previous attempt to make this function static led to compilation
errors when CONFIG_DEBUG_INFO_BTF is enabled because
__add_to_page_cache_locked() is referred to by BPF code.

Adding a prototype will silence the warning.

Link: https://lkml.kernel.org/r/1608693702-4665-1-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocheckpatch: prefer strscpy to strlcpy
Joe Perches [Tue, 29 Dec 2020 23:14:31 +0000 (15:14 -0800)]
checkpatch: prefer strscpy to strlcpy

Prefer strscpy over the deprecated strlcpy function.

Link: https://lkml.kernel.org/r/19fe91084890e2c16fe56f960de6c570a93fa99b.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoRevert "kbuild: avoid static_assert for genksyms"
Masahiro Yamada [Tue, 29 Dec 2020 23:14:28 +0000 (15:14 -0800)]
Revert "kbuild: avoid static_assert for genksyms"

This reverts commit 14dc3983b5dff513a90bd5a8cc90acaf7867c3d0.

Macro Elver had sent a fix proper fix earlier, and also pointed out
corner cases:
 "I guess what you propose is simpler, but might still have corner cases
  where we still get warnings. In particular, if some file (for whatever
  reason) does not include build_bug.h and uses a raw _Static_assert(),
  then we still get warnings. E.g. I see 1 user of raw _Static_assert()
  (drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h )."

I believe the raw use of _Static_assert() should be allowed, so this
should be fixed in genksyms.

Even after commit 14dc3983b5df ("kbuild: avoid static_assert for
genksyms"), I confirmed the following test code emits the warning.

  ---------------->8----------------
  #include <linux/export.h>

  _Static_assert((1 ?: 0), "");

  void foo(void) { }
  EXPORT_SYMBOL(foo);
  ---------------->8----------------

  WARNING: modpost: EXPORT symbol "foo" [vmlinux] version generation failed, symbol will not be versioned.

Now that commit 869b91992bce ("genksyms: Ignore module scoped
_Static_assert()") fixed this issue properly, the workaround should
be reverted.

Link: https://lkml.org/lkml/2020/12/10/845
Link: https://lkml.kernel.org/r/20201219183911.181442-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/hugetlb: fix deadlock in hugetlb_cow error path
Mike Kravetz [Tue, 29 Dec 2020 23:14:25 +0000 (15:14 -0800)]
mm/hugetlb: fix deadlock in hugetlb_cow error path

syzbot reported the deadlock here [1].  The issue is in hugetlb cow
error handling when there are not enough huge pages for the faulting
task which took the original reservation.  It is possible that other
(child) tasks could have consumed pages associated with the reservation.
In this case, we want the task which took the original reservation to
succeed.  So, we unmap any associated pages in children so that they can
be used by the faulting task that owns the reservation.

The unmapping code needs to hold i_mmap_rwsem in write mode.  However,
due to commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd
sharing synchronization") we are already holding i_mmap_rwsem in read
mode when hugetlb_cow is called.

Technically, i_mmap_rwsem does not need to be held in read mode for COW
mappings as they can not share pmd's.  Modifying the fault code to not
take i_mmap_rwsem in read mode for COW (and other non-sharable) mappings
is too involved for a stable fix.

Instead, we simply drop the hugetlb_fault_mutex and i_mmap_rwsem before
unmapping.  This is OK as it is technically not needed.  They are
reacquired after unmapping as expected by calling code.  Since this is
done in an uncommon error path, the overhead of dropping and reacquiring
mutexes is acceptable.

While making changes, remove redundant BUG_ON after unmap_ref_private.

[1] https://lkml.kernel.org/r/000000000000b73ccc05b5cf8558@google.com

Link: https://lkml.kernel.org/r/4c5781b8-3b00-761e-c0c7-c5edebb6ec1a@oracle.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: syzbot+5eee4145df3c15e96625@syzkaller.appspotmail.com
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoselftests/vm: fix building protection keys test
Harish [Tue, 29 Dec 2020 23:14:22 +0000 (15:14 -0800)]
selftests/vm: fix building protection keys test

Commit d8cbe8bfa7d ("tools/testing/selftests/vm: fix build error") tried
to include a ARCH check for powerpc, however ARCH is not defined in the
Makefile before including lib.mk.  This makes test building to skip on
both x86 and powerpc.

Fix the arch check by replacing it using machine type as it is already
defined and used in the test.

Link: https://lkml.kernel.org/r/20201215100402.257376-1-harish@linux.ibm.com
Fixes: d8cbe8bfa7d ("tools/testing/selftests/vm: fix build error")
Signed-off-by: Harish <harish@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoio_uring: don't assume mm is constant across submits
Jens Axboe [Tue, 29 Dec 2020 17:50:46 +0000 (10:50 -0700)]
io_uring: don't assume mm is constant across submits

If we COW the identity, we assume that ->mm never changes. But this
isn't true of multiple processes end up sharing the ring. Hence treat
id->mm like like any other process compontent when it comes to the
identity mapping. This is pretty trivial, just moving the existing grab
into io_grab_identity(), and including a check for the match.

Cc: stable@vger.kernel.org # 5.10
Fixes: 1e6fa5216a0e ("io_uring: COW io_identity on mismatch")
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>:
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>:
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'for-5.11/dm-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Mon, 28 Dec 2020 21:32:16 +0000 (13:32 -0800)]
Merge tag 'for-5.11/dm-fix' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fix from Mike Snitzer:
 "Revert WQ_SYSFS change that broke reencryption (and all other
  functionality that requires reloading a dm-crypt DM table)"

* tag 'for-5.11/dm-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  Revert "dm crypt: export sysfs of kcryptd workqueue"

3 years agoRevert "dm crypt: export sysfs of kcryptd workqueue"
Mike Snitzer [Mon, 28 Dec 2020 21:13:52 +0000 (16:13 -0500)]
Revert "dm crypt: export sysfs of kcryptd workqueue"

This reverts commit a2b8b2d975673b1a50ab0bcce5d146b9335edfad.

WQ_SYSFS breaks the ability to reload a DM table due to sysfs kobject
collision (due to active and inactive table). Given lack of
demonstrated need for exposing this workqueue via sysfs: revert
exposing it.

Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
3 years agolibceph: add __maybe_unused to DEFINE_MSGR2_FEATURE
Ilya Dryomov [Wed, 16 Dec 2020 17:40:05 +0000 (18:40 +0100)]
libceph: add __maybe_unused to DEFINE_MSGR2_FEATURE

Avoid -Wunused-const-variable warnings for "make W=1".

Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years agolibceph: align session_key and con_secret to 16 bytes
Ilya Dryomov [Tue, 15 Dec 2020 15:49:07 +0000 (16:49 +0100)]
libceph: align session_key and con_secret to 16 bytes

crypto_shash_setkey() and crypto_aead_setkey() will do a (small)
GFP_ATOMIC allocation to align the key if it isn't suitably aligned.
It's not a big deal, but at the same time easy to avoid.

The actual alignment requirement is dynamic, queryable with
crypto_shash_alignmask() and crypto_aead_alignmask(), but shouldn't
be stricter than 16 bytes for our algorithms.

Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>