Andrew Jones [Wed, 26 Apr 2023 14:13:32 +0000 (16:13 +0200)]
RISC-V: hwprobe: There can only be one first
Only capture the first cpu_id in order for the comparison
below to be of any use.
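
The pattern being fixed can be modelled outside the kernel. This is a minimal Python sketch, not the kernel's C code; the function names and the MISMATCH sentinel are assumptions made for illustration. It shows why the reference ID has to be captured once, on the first CPU only.

```python
MISMATCH = -1  # hypothetical sentinel meaning "the CPUs disagree"

def common_id_buggy(cpu_ids):
    """'first' is never cleared, so the reference follows every CPU."""
    first = True
    ref = MISMATCH
    for cpu_id in cpu_ids:
        if first:
            ref = cpu_id          # overwritten on every iteration
        if ref != cpu_id:         # can never be true
            return MISMATCH
    return ref

def common_id_fixed(cpu_ids):
    """Capture only the first ID, then compare the others against it."""
    first = True
    ref = MISMATCH
    for cpu_id in cpu_ids:
        if first:
            ref = cpu_id
            first = False
        if ref != cpu_id:
            return MISMATCH
    return ref
```

With differing IDs the buggy variant silently reports the last CPU's value, while the fixed variant reports the mismatch.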
Fixes: ea3de9ce8aa2 ("RISC-V: Add a syscall for HW probing")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230426141333.10063-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Mon, 24 Apr 2023 09:23:13 +0000 (11:23 +0200)]
riscv: Allow to downgrade paging mode from the command line
Add two early command line parameters that allow downgrading the satp mode
(using the same naming as x86):
- "no5lvl": use a 4-level page table (down from sv57 to sv48)
- "no4lvl": use a 3-level page table (down from sv57/sv48 to sv39)
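
The effect of the two parameters can be sketched as a small Python model. The function name and the way the command line is tokenised are assumptions for illustration; only the mapping from parameter to paging mode comes from the list above.

```python
MODES = ["sv39", "sv48", "sv57"]  # ordered from 3-level to 5-level page tables

def effective_satp_mode(hw_mode, cmdline):
    """Return the paging mode after applying no5lvl/no4lvl (hypothetical helper)."""
    params = cmdline.split()
    limit = "sv57"
    if "no5lvl" in params:
        limit = "sv48"            # at most a 4-level page table
    if "no4lvl" in params:
        limit = "sv39"            # at most a 3-level page table
    # A parameter can only downgrade: never go above what the hardware supports.
    return MODES[min(MODES.index(hw_mode), MODES.index(limit))]
```

For example, `effective_satp_mode("sv57", "console=ttyS0 no5lvl")` yields `"sv48"`, while the same parameter on sv39 hardware leaves the mode unchanged.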
Note that going through the device tree to get the kernel command line
works with ACPI too since the efi stub creates a device tree anyway with
the command line.
In KASAN kernels, we can't use the libfdt that early in the boot process
since we are not ready to execute instrumented functions. So instead of
using the "generic" libfdt, we compile our own versions of those functions
that are not instrumented and that are prefixed so that they do not
conflict with the generic ones. We also need the non-instrumented versions
of the string functions and the prefixed versions of memcpy/memmove.
This is largely inspired by commit aacd149b6238 ("arm64: head: avoid
relocating the kernel twice for KASLR"), from which I removed compilation
flags that were not relevant to RISC-V at the moment (LTO, SCS). Also
note that we have to link with -z norelro to prevent ld.lld from throwing a
warning with the new .got sections, as in commit 311bea3cb9ee ("arm64:
link with -z norelro for LLD or aarch64-elf").
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230424092313.178699-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Conor Dooley [Mon, 24 Apr 2023 17:05:43 +0000 (18:05 +0100)]
dt-bindings: riscv: add sv57 mmu-type
Dumping the dtb from new versions of QEMU warns that sv57 is an
undocumented mmu-type. The kernel has supported sv57 for about a year,
so bring it into the fold.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230424-rival-habitual-478567c516f0@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Evan Green [Thu, 20 Apr 2023 19:49:34 +0000 (12:49 -0700)]
RISC-V: hwprobe: Remove __init on probe_vendor_features()
probe_vendor_features() is now called from smp_callin(), which is not
__init code and runs during cpu hotplug events. Remove the
__init_or_module decoration from it and the functions it calls to avoid
walking into outer space.
Fixes: 62a31d6e38bd ("RISC-V: hwprobe: Support probing of misaligned access performance")
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230420194934.1871356-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Wed, 19 Apr 2023 14:47:45 +0000 (07:47 -0700)]
Merge patch series "Introduce 64b relocatable kernel"
Alexandre Ghiti <alexghiti@rivosinc.com> says:
After multiple attempts, this patchset is now based on the fact that the
64b kernel mapping was moved outside the linear mapping.
The first patch allows building relocatable kernels, but the option is not
selected by default. That patch is a requirement for KASLR.
The second and third patches take advantage of an already existing powerpc
script that checks relocations at compile time, and use it for riscv.
* b4-shazam-merge:
riscv: Use --emit-relocs in order to move .rela.dyn in init
riscv: Check relocations at compile time
powerpc: Move script to check relocations at compile time in scripts/
riscv: Introduce CONFIG_RELOCATABLE
riscv: Move .rela.dyn outside of init to avoid empty relocations
riscv: Prepare EFI header for relocatable kernels
Link: https://lore.kernel.org/r/20230329045329.64565-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Wed, 29 Mar 2023 04:53:29 +0000 (06:53 +0200)]
riscv: Use --emit-relocs in order to move .rela.dyn in init
To circumvent an issue where placing the relocations inside the init
sections produces empty relocations, use --emit-relocs. But to avoid
carrying those relocations in vmlinux, use an intermediate
vmlinux.relocs file which is a copy of vmlinux *before* stripping its
relocations.
Suggested-by: Björn Töpel <bjorn@kernel.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Wed, 29 Mar 2023 04:53:28 +0000 (06:53 +0200)]
riscv: Check relocations at compile time
Relocating kernel at runtime is done very early in the boot process, so
it is not convenient to check for relocations there and react in case a
relocation was not expected.
There exists a script in scripts/ that extracts the relocations from
vmlinux that is then used at postlink to check the relocations.
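
A rough Python sketch of such a check follows. It is not the script in scripts/; the record format ("OFFSET TYPE VALUE", as printed by `objdump -R`) and the allowed set are assumptions made for illustration.

```python
# Assumption: R_RISCV_RELATIVE is the only type the early relocation code expects.
ALLOWED = {"R_RISCV_RELATIVE"}

def unexpected_relocations(lines, allowed=ALLOWED):
    """Return the relocation records whose type is not in the allowed set."""
    bad = []
    for line in lines:
        fields = line.split()
        # Relocation records have the type in the second column; skip headers.
        if len(fields) >= 2 and fields[1].startswith("R_"):
            if fields[1] not in allowed:
                bad.append(line.strip())
    return bad
```

A build step could fail (or warn) whenever the returned list is not empty, which surfaces the problem long before boot.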
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230329045329.64565-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Wed, 29 Mar 2023 04:53:27 +0000 (06:53 +0200)]
powerpc: Move script to check relocations at compile time in scripts/
Relocating kernel at runtime is done very early in the boot process, so
it is not convenient to check for relocations there and react in case a
relocation was not expected.
The powerpc architecture has a script that checks for such unexpected
relocations at compile time: extract the common logic to scripts/
so that other architectures can take advantage of it.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lore.kernel.org/r/20230329045329.64565-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Wed, 29 Mar 2023 04:53:26 +0000 (06:53 +0200)]
riscv: Introduce CONFIG_RELOCATABLE
This config allows compiling a 64b kernel as PIE and relocating it to
any virtual address at runtime: this paves the way to KASLR.
Runtime relocation is possible since relocation metadata is embedded into
the kernel.
Note that relocating at runtime introduces an overhead even if the
kernel is loaded at the same address it was linked at, and that the compiler
options are those used on arm64, which uses the same RELA relocation
format.
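
To make the runtime step concrete, here is a simplified Python model of applying RELA entries of type R_RISCV_RELATIVE to an in-memory image. It is a sketch under assumptions, not the kernel's relocation code: the function name, the flat `bytearray` image and the base addresses are all illustrative.

```python
import struct

R_RISCV_RELATIVE = 3              # relocation type number from the RISC-V ELF psABI
# Elf64_Rela: r_offset, r_info, r_addend. The addend is read as unsigned here
# because all arithmetic below is done modulo 2**64.
RELA = struct.Struct("<QQQ")
MASK64 = (1 << 64) - 1

def relocate(image, rela_table, link_base, load_base):
    """Patch 'image' in place; return the number of relocations applied."""
    delta = (load_base - link_base) & MASK64
    applied = 0
    for r_offset, r_info, r_addend in RELA.iter_unpack(rela_table):
        if (r_info & 0xFFFFFFFF) != R_RISCV_RELATIVE:
            raise ValueError("unexpected relocation type")
        slot = r_offset - link_base          # position of the slot inside the image
        struct.pack_into("<Q", image, slot, (r_addend + delta) & MASK64)
        applied += 1
    return applied
```

Even when `load_base == link_base` the loop still walks every entry, which is the overhead mentioned above.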
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Wed, 29 Mar 2023 04:53:25 +0000 (06:53 +0200)]
riscv: Move .rela.dyn outside of init to avoid empty relocations
This is a preparatory patch for relocatable kernels: .rela.dyn should be
in .init but doing so actually produces empty relocations, so this should
be a temporary commit until we find a solution.
This issue was reported here [1].
[1] https://lore.kernel.org/all/4a6fc7a3-9697-a49b-0941-97f32194b0d7@ghiti.fr/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Wed, 29 Mar 2023 04:53:24 +0000 (06:53 +0200)]
riscv: Prepare EFI header for relocatable kernels
ld does not handle relocations correctly, as explained in [1].
A fix for that was proposed by Nelson there, but we have to support older
toolchains and therefore provide this fix.
Note that llvm does not need this fix and is therefore excluded.
[1] https://sourceware.org/pipermail/binutils/2023-March/126690.html
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Wed, 19 Apr 2023 14:24:56 +0000 (07:24 -0700)]
Merge patch series "RISC-V kasan rework"
Alexandre Ghiti <alexghiti@rivosinc.com> says:
As described in patch 2, our current kasan implementation is intricate,
so I tried to simplify the implementation and mimic what arm64/x86 are
doing.
In addition it fixes UEFI bootflow with a kasan kernel and kasan inline
instrumentation: all kasan configurations were tested on a large ubuntu
kernel with success with KASAN_KUNIT_TEST and KASAN_MODULE_TEST.
inline ubuntu config + uefi:
sv39: OK
sv48: OK
sv57: OK
outline ubuntu config + uefi:
sv39: OK
sv48: OK
sv57: OK
Actually 1 test always fails with KASAN_KUNIT_TEST that I have to check:
KASAN failure expected in "set_bit(nr, addr)", but none occurred
Note that Palmer recently proposed to remove COMMAND_LINE_SIZE from the
userspace abi
https://lore.kernel.org/lkml/20221211061358.28035-1-palmer@rivosinc.com/T/
so that we can finally increase the command line to fit all kasan kernel
parameters.
All of this should hopefully fix the syzkaller riscv build that has been
failing for a few months now, any test is appreciated and if I can help
in any way, please ask.
* b4-shazam-merge:
riscv: Unconditionnally select KASAN_VMALLOC if KASAN
riscv: Fix ptdump when KASAN is enabled
riscv: Fix EFI stub usage of KASAN instrumented strcmp function
riscv: Move DTB_EARLY_BASE_VA to the kernel address space
riscv: Rework kasan population functions
riscv: Split early and final KASAN population functions
Link: https://lore.kernel.org/r/20230203075232.274282-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 3 Feb 2023 07:52:32 +0000 (08:52 +0100)]
riscv: Unconditionnally select KASAN_VMALLOC if KASAN
If KASAN is enabled, VMAP_STACK depends on KASAN_VMALLOC so enable
KASAN_VMALLOC with KASAN so that we can enable VMAP_STACK by default.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 3 Feb 2023 07:52:31 +0000 (08:52 +0100)]
riscv: Fix ptdump when KASAN is enabled
The KASAN shadow region was moved next to the kernel mapping but the
ptdump code was not updated and it appears to break the dump of the kernel
page table, so fix this by moving the KASAN shadow region in ptdump.
Fixes: f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 3 Feb 2023 07:52:30 +0000 (08:52 +0100)]
riscv: Fix EFI stub usage of KASAN instrumented strcmp function
The EFI stub must not use any KASAN instrumented code as the kernel
proper did not initialize the thread pointer and the mapping for the
KASAN shadow region.
Avoid using the generic strcmp function, instead use the one in
drivers/firmware/efi/libstub/string.c.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 3 Feb 2023 07:52:29 +0000 (08:52 +0100)]
riscv: Move DTB_EARLY_BASE_VA to the kernel address space
The early virtual address should lie in the kernel address space for
inline kasan instrumentation to succeed, otherwise kasan tries to
dereference an address that does not exist in the address space (since
kasan only maps *kernel* address space, not the userspace).
Simply use the very first address of the kernel address space for the
early fdt mapping.
It allowed an Ubuntu kernel to boot successfully with inline
instrumentation.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 3 Feb 2023 07:52:28 +0000 (08:52 +0100)]
riscv: Rework kasan population functions
Our previous kasan population implementation used to have the final kasan
shadow region mapped with kasan_early_shadow_page, because we did not clean
the early mapping and then we had to populate the kasan region "in-place"
which made the code cumbersome.
So now we clear the early mapping, establish a temporary mapping while we
populate the kasan shadow region with just the kernel regions that will
be used.
This new version uses the "generic" way of going through a page table
that may be folded at runtime (avoid the XXX_next macros).
It was tested with outline instrumentation on an Ubuntu kernel
configuration successfully.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 3 Feb 2023 07:52:27 +0000 (08:52 +0100)]
riscv: Split early and final KASAN population functions
This is preliminary work that makes the code more understandable.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Wed, 19 Apr 2023 03:43:07 +0000 (20:43 -0700)]
Merge patch series "riscv: Use PUD/P4D/PGD pages for the linear mapping"
Alexandre Ghiti <alexghiti@rivosinc.com> says:
This patchset intends to improve tlb utilization by using hugepages for
the linear mapping.
As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
take care of isolating the kernel text and rodata so that they are not
mapped with a PUD mapping which would then assign wrong permissions to
the whole region: it is achieved the same way as arm64 by using the
memblock nomap API which isolates those regions and re-merge them afterwards
thus avoiding any issue with the system resources tree creation.
 arch/riscv/include/asm/page.h |  19 ++++++-
 arch/riscv/mm/init.c          | 102 ++++++++++++++++++++++++++--------
 arch/riscv/mm/physaddr.c      |  16 ++++++
 drivers/of/fdt.c              |  11 ++--
 4 files changed, 118 insertions(+), 30 deletions(-)
* b4-shazam-merge:
riscv: Use PUD/P4D/PGD pages for the linear mapping
riscv: Move the linear mapping creation in its own function
riscv: Get rid of riscv_pfn_base variable
Link: https://lore.kernel.org/r/20230324155421.271544-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 24 Mar 2023 15:54:21 +0000 (16:54 +0100)]
riscv: Use PUD/P4D/PGD pages for the linear mapping
During the early page table creation, we used to set the mapping for
PAGE_OFFSET to the kernel load address: but the kernel load address is
always offset by PMD_SIZE, which makes it impossible to use PUD/P4D/PGD
pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
PAGE_OFFSET is).
But actually we don't have to establish this mapping (ie set va_pa_offset)
that early in the boot process because:
- first, setup_vm installs a temporary kernel mapping and among other
things, discovers the system memory,
- then, setup_vm_final creates the final kernel mapping and takes
advantage of the discovered system memory to create the linear
mapping.
During the first phase, we don't know the start of the system memory and
then until the second phase is finished, we can't use the linear mapping at
all and phys_to_virt/virt_to_phys translations must not be used because it
would result in a different translation from the 'real' one once the final
mapping is installed.
So here we simply delay the initialization of va_pa_offset to after the
system memory discovery. But to make sure no one uses the linear mapping
before then, we add a guard in the DEBUG_VIRTUAL config.
Finally we can use PUD/P4D/PGD hugepages when possible, which will result
in a better TLB utilization.
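
The alignment argument can be checked with a short Python sketch. The sizes are those of sv39 (4 KiB pages, 2 MiB PMD, 1 GiB PUD); the helper name and the PAGE_OFFSET, RAM_BASE and LOAD_ADDR values are illustrative assumptions, not values taken from this series.

```python
PAGE_SIZE = 1 << 12   # 4 KiB
PMD_SIZE = 1 << 21    # 2 MiB
PUD_SIZE = 1 << 30    # 1 GiB (P4D/PGD levels omitted to keep the sketch short)

def best_map_size(pa, va, size):
    """Largest page size usable for a mapping starting at (pa, va)."""
    for map_size in (PUD_SIZE, PMD_SIZE):
        if pa % map_size == 0 and va % map_size == 0 and size >= map_size:
            return map_size
    return PAGE_SIZE

PAGE_OFFSET = 0xffffffd800000000   # illustrative, PUD-aligned virtual base
RAM_BASE = 0x80000000              # illustrative, PUD-aligned start of memory
LOAD_ADDR = RAM_BASE + PMD_SIZE    # kernel load address, offset by PMD_SIZE

old = best_map_size(LOAD_ADDR, PAGE_OFFSET, PUD_SIZE)  # limited to PMD_SIZE
new = best_map_size(RAM_BASE, PAGE_OFFSET, PUD_SIZE)   # PUD_SIZE becomes usable
```

Mapping PAGE_OFFSET to the load address caps the mapping at 2 MiB pages, whereas mapping it to the start of memory lets the 1 GiB size be selected.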
Note that:
- this does not apply to rv32 as the kernel mapping lies in the linear
mapping.
- we rely on the firmware to protect itself using PMP.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Rob Herring <robh@kernel.org> # DT bits
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 24 Mar 2023 15:54:20 +0000 (16:54 +0100)]
riscv: Move the linear mapping creation in its own function
No change intended, it just splits the linear mapping creation from
setup_vm_final: this prepares for upcoming additions to the linear
mapping creation.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Fri, 24 Mar 2023 15:54:19 +0000 (16:54 +0100)]
riscv: Get rid of riscv_pfn_base variable
Use phys_ram_base directly instead: riscv_pfn_base is just the pfn of
the address contained in phys_ram_base.
Even if there is no functional change intended in this patch, actually
setting phys_ram_base that early changes the behaviour of
kernel_mapping_pa_to_va during the early boot: phys_ram_base used to be
zero before this patch and now it is set to the physical start address of
the kernel. But it does not break the conversion of a kernel physical
address into a virtual address since kernel_mapping_pa_to_va should only
be used on kernel physical addresses, i.e. addresses greater than the
physical start address of the kernel.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Conor Dooley [Wed, 5 Apr 2023 10:21:10 +0000 (11:21 +0100)]
RISC-V: align ISA extension Kconfig help text with each other
Other extensions only capitalise the first letter in the text visible
in Kconfig menus, and provide a short comment about the extension's
meaning. Do the same for Svnapot & Svpbmt.
The precedent for capitalisation in the Kconfig text was set by Zicbom
& sorta followed for Zicboz. The RVI styling used for multi-letter
extensions only capitalises the first letter, so do the same here.
If nothing else, my OCD likes it when the extensions follow a consistent
pattern.
While editing one of the lines, reformat the "spelling" of 64-bit.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230405-pucker-cogwheel-3a999a94a2f2@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Song Shuai [Fri, 10 Mar 2023 11:03:36 +0000 (19:03 +0800)]
riscv: Kconfig: enable SCHED_MC kconfig
RISC-V now builds the sched domain based on the simple possible map.
Enable SCHED_MC so that it is built based on cpu_coregroup_mask(),
which also takes care of NUMA and of cores with an LLC.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230310110336.970985-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Song Shuai [Thu, 23 Mar 2023 12:39:24 +0000 (20:39 +0800)]
riscv: export cpu/freq invariant to scheduler
RISC-V now manages CPU topology using arch_topology, which provides
CPU capacity and frequency related interfaces to access the cpu/freq
invariant on possibly heterogeneous or DVFS-enabled platforms.
Add a topology.h file that exports the arch_topology interfaces,
replacing the scheduler's constant-based cpu/freq invariant accounting.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Ley Foon Tan <lftan@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230323123924.3032174-1-suagrfillet@gmail.com
[Palmer: Fix the whitespace issues.]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Tue, 18 Apr 2023 23:01:19 +0000 (16:01 -0700)]
Merge patch series "RISC-V Hardware Probing User Interface"
Evan Green <evan@rivosinc.com> says:
There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.
Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].
I was asked about the performance delta between this and something like
sysfs. I created a small test program and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: 0.0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.
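
The percentages follow directly from the figures quoted above; a small Python calculation makes the arithmetic explicit (the variable and function names are only for illustration).

```python
SYSFS_READ_US = 3.8          # open()+read()+close() of one sysfs file
VDSO_CALL_US = 0.0091        # riscv_hwprobe() vDSO with no syscall
FORK_EXEC_WAIT_US = 4800.0   # 4.8 ms fork/exec/wait loop

def overhead_percent(cost_us, baseline_us=FORK_EXEC_WAIT_US):
    """Cost expressed as a percentage of the process-start baseline."""
    return 100.0 * cost_us / baseline_us

four_sysfs_reads = overhead_percent(4 * SYSFS_READ_US)   # roughly 0.3 %
one_vdso_call = overhead_percent(VDSO_CALL_US)           # far below 0.001 %
```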
[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050
* b4-shazam-merge:
RISC-V: Add hwprobe vDSO function and data
selftests: Test the new RISC-V hwprobe interface
RISC-V: hwprobe: Support probing of misaligned access performance
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
RISC-V: Add a syscall for HW probing
RISC-V: Move struct riscv_cpuinfo to new header
Link: https://lore.kernel.org/r/20230407231103.2622178-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Evan Green [Fri, 7 Apr 2023 23:11:03 +0000 (16:11 -0700)]
RISC-V: Add hwprobe vDSO function and data
Add a vDSO function __vdso_riscv_hwprobe, which can sit in front of the
riscv_hwprobe syscall and answer common queries. We stash a copy of
static answers for the "all CPUs" case in the vDSO data page. This data
is private to the vDSO, so we can decide later to change what's stored
there or under what conditions we defer to the syscall. Currently all
data can be discovered at boot, so the vDSO function answers all queries
when the cpumask is set to the "all CPUs" hint.
There's also a boolean in the data that lets the vDSO function know that
all CPUs are the same. In that case, the vDSO will also answer queries
for arbitrary CPU masks in addition to the "all CPUs" hint.
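
The decision the vDSO function makes can be sketched as a Python predicate. This is a model under assumptions, not the vDSO source: the parameter names, the representation of the "all CPUs" hint as an empty CPU set with a null pointer, and the rule that non-zero flags defer to the syscall are my reading of the interface.

```python
def answered_by_vdso(cpu_count, cpus, flags, homogeneous_cpus):
    """Return True if the query can be served from the vDSO data page.

    cpus=None models a NULL cpu set pointer (assumption: together with
    cpu_count == 0 this is the "all CPUs" hint).
    """
    all_cpus_hint = cpu_count == 0 and cpus is None
    if flags != 0:
        return False              # assumption: anything unusual goes to the syscall
    # Cached answers are valid for "all CPUs", or for any mask when every
    # CPU is known to be identical.
    return all_cpus_hint or homogeneous_cpus
```

On a homogeneous system every query with zero flags stays in userspace; on a heterogeneous one only the "all CPUs" form does.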
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-7-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Evan Green [Fri, 7 Apr 2023 23:11:02 +0000 (16:11 -0700)]
selftests: Test the new RISC-V hwprobe interface
This adds a test for the recently added RISC-V interface for probing
hardware capabilities. It happens to be the first selftest we have for
RISC-V, so I've added some infrastructure for those as well.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-6-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Evan Green [Fri, 7 Apr 2023 23:11:01 +0000 (16:11 -0700)]
RISC-V: hwprobe: Support probing of misaligned access performance
This allows userspace to select various routines to use based on the
performance of misaligned access on the target hardware.
Rather than adding DT bindings, this change taps into the alternatives
mechanism used to probe CPU errata. Add a new function pointer alongside
the vendor-specific errata_patch_func() that probes for desirable errata
(otherwise known as "features"). Unlike the errata_patch_func(), this
function is called on each CPU as it comes up, so it can save
feature information per-CPU.
The T-head C906 has fast unaligned access, both as defined by GCC [1],
and in performing a basic benchmark, which determined that byte copies
are >50% slower than a misaligned word copy of the same data size (source
for this test at [2]):
bytecopy size f000 count 50000 offset 0 took 31664899 us
wordcopy size f000 count 50000 offset 0 took 5180919 us
wordcopy size f000 count 50000 offset 1 took 13416949 us
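
For reference, the relative slowdowns implied by these three measurements can be computed as follows (a plain Python calculation; the names are illustrative).

```python
BYTECOPY_US = 31664899             # byte copy, offset 0
WORDCOPY_ALIGNED_US = 5180919      # word copy, offset 0
WORDCOPY_MISALIGNED_US = 13416949  # word copy, offset 1

def percent_slower(slow, fast):
    """How much longer 'slow' takes than 'fast', in percent."""
    return 100.0 * (slow - fast) / fast

# Byte copy versus misaligned word copy: about 136 %, i.e. well above 50 %.
byte_vs_misaligned = percent_slower(BYTECOPY_US, WORDCOPY_MISALIGNED_US)
# Misaligned versus aligned word copy: about 159 %.
misaligned_vs_aligned = percent_slower(WORDCOPY_MISALIGNED_US, WORDCOPY_ALIGNED_US)
```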
[1] https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.cc#L353
[2] https://pastebin.com/EPXvDHSW
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-5-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Evan Green [Fri, 7 Apr 2023 23:11:00 +0000 (16:11 -0700)]
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
We have an implicit set of base behaviors that userspace depends on,
which are mostly defined in various ISA specifications.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-4-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Evan Green [Fri, 7 Apr 2023 23:10:59 +0000 (16:10 -0700)]
RISC-V: Add a syscall for HW probing
We don't have enough space for these all in ELF_HWCAP{,2} and there's no
system call that quite does this, so let's just provide an arch-specific
one to probe for hardware capabilities. This currently just provides
m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
the future.
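
The key-value interface can be illustrated with a small Python model of the pair buffer. It assumes each pair is a signed 64-bit key followed by an unsigned 64-bit value, that the three ID keys are numbered 0 to 2, and that an unrecognised key is reported back as -1 with a zero value; the `probe` helper and the ID value used in the usage note are made up for the example.

```python
import struct

PAIR = struct.Struct("<qQ")   # assumed layout: s64 key, u64 value

KEY_MVENDORID, KEY_MARCHID, KEY_MIMPID = 0, 1, 2   # assumed key numbering

def probe(request, known):
    """Fill in the values for known keys; mark unknown keys with key = -1."""
    out = bytearray(request)
    for i, (key, _value) in enumerate(PAIR.iter_unpack(request)):
        if key in known:
            PAIR.pack_into(out, i * PAIR.size, key, known[key])
        else:
            PAIR.pack_into(out, i * PAIR.size, -1, 0)
    return bytes(out)
```

A caller builds the request with `PAIR.pack(KEY_MVENDORID, 0)` for each key of interest and reads the answers back with `PAIR.iter_unpack()`. Because unknown keys are flagged rather than rejected, new keys can be added later without breaking older callers.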
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Evan Green [Fri, 7 Apr 2023 23:10:58 +0000 (16:10 -0700)]
RISC-V: Move struct riscv_cpuinfo to new header
In preparation for tracking and exposing microarchitectural details to
userspace (like whether or not unaligned accesses are fast), move the
riscv_cpuinfo struct out to its own new cpufeatures.h header. It will
need to be used by more than just cpu.c.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Song Shuai [Wed, 8 Mar 2023 06:47:34 +0000 (14:47 +0800)]
Revert "riscv: Set more data to cacheinfo"
This reverts commit baf7cbd94b5688f167443a2cc3dcea3300132099.
There are some duplicate cache attribute populations executed
in both ci_leaf_init() and later cache_setup_properties().
Revert commit baf7cbd94b56 ("riscv: Set more data to cacheinfo")
to set up only the level and type attributes at this early place.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230308064734.512457-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Björn Töpel [Mon, 3 Apr 2023 06:52:07 +0000 (08:52 +0200)]
riscv: entry: Save a0 prior syscall_enter_from_user_mode()
The RISC-V calling convention passes the first argument, and the
return value, in the a0 register. For this reason, the a0 register
needs some extra care: when handling syscalls, the a0 register is
saved into regs->orig_a0, so a0 can be properly restored for,
e.g., interrupted syscalls.
This functionality was broken with the introduction of the generic
entry patches. Here, a0 was saved into orig_a0 after calling
syscall_enter_from_user_mode(), which can change regs->a0 for some
paths, incorrectly restoring a0.
This is resolved by saving a0 prior to making the
syscall_enter_from_user_mode() call.
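
The ordering problem can be shown with a toy Python model in which the registers are a dictionary. The stand-in for syscall_enter_from_user_mode() and the value it writes are hypothetical; the point is only the order of the two steps.

```python
def enter_hook(regs):
    """Stand-in for syscall_enter_from_user_mode(); assume this path rewrites a0."""
    regs["a0"] = -38   # hypothetical value written by some entry path

def handle_syscall_broken(regs):
    enter_hook(regs)
    regs["orig_a0"] = regs["a0"]   # too late: the original argument is gone
    return regs

def handle_syscall_fixed(regs):
    regs["orig_a0"] = regs["a0"]   # save the first argument before anything runs
    enter_hook(regs)
    return regs
```

With the fixed ordering, `orig_a0` still holds the caller's first argument, so a restarted syscall sees the right value.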
Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230403065207.1070974-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Conor Dooley [Fri, 24 Mar 2023 12:12:41 +0000 (12:12 +0000)]
RISC-V: convert new selectors of RISCV_ALTERNATIVE to dependencies
for-next contains two additional extensions that select
RISCV_ALTERNATIVE. RISCV_ALTERNATIVE no longer needs to be selected by
individual config options as it is now selected for !XIP_KERNEL builds
by the top level RISCV option.
These extensions rely on the alternative framework, so convert the
"select"s to "depends on"s instead.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230324121240.3594777-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Wed, 29 Mar 2023 18:56:08 +0000 (11:56 -0700)]
Merge patch series "RISC-V: Fixes for riscv_has_extension[un]likely()'s alternative dependency"
Conor Dooley <conor.dooley@microchip.com> says:
Here's my attempt at fixing both the use of an FPU on XIP kernels and
the issue that Jason ran into where CONFIG_FPU, which needs the
alternatives framework for has_fpu() checks, could be enabled without
the alternatives actually being present.
For the former, a "slow" fallback that does not use alternatives is
added to riscv_has_extension_[un]likely() that can be used with XIP.
Obviously, we want to make use of Jisheng's alternatives based approach
where possible, so any users of riscv_has_extension_[un]likely() will
want to make sure that they select RISCV_ALTERNATIVE.
If they don't however, they'll hit the fallback path which (should,
sparing a silly mistake from me!) behave in the same way, thus
succeeding silently. Sounds like a
To prevent "depends on !XIP_KERNEL; select RISCV_ALTERNATIVE" spreading
like the plague through the various places that want to check for the
presence of extensions, and sidestep the potential silent "success"
mentioned above, all users of RISCV_ALTERNATIVE are converted from selects
to dependencies, with the option being selected for all !XIP_KERNEL
builds.
I know that the VDSO was a key place that Jisheng wanted to use the new
helper rather than static branches, and I think the fallback path
should not cause issues there.
See the thread at [1] for the prior discussion.
1 - https://lore.kernel.org/linux-riscv/20230128172856.3814-1-jszhang@kernel.org/T/#m21390d570997145d31dd8bb95002fd61f99c6573
[Palmer: these were also merged into fixes, but there's a cleanup that
depends on the merge so I'm taking it into for-next as well.]
* b4-shazam-merge:
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
Link: https://lore.kernel.org/r/20230324100538.3514663-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* commit '1ee7fc3f4d0a93831a20d5566f203d5ad6d44de8':
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
Conor Dooley [Fri, 24 Mar 2023 10:05:39 +0000 (10:05 +0000)]
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
When moving switch_to's has_fpu() over to using
riscv_has_extension_likely() rather than static branches, the FPU code
gained a dependency on the alternatives framework.
That dependency has now been removed, as riscv_has_extension_likely() now
contains a fallback path, using __riscv_isa_extension_available(), but
if CONFIG_RISCV_ALTERNATIVE isn't selected when CONFIG_FPU is, has_fpu()
checks will not benefit from the "fast path" that the alternatives
framework provides.
We want to ensure that alternatives are available whenever
riscv_has_extension_[un]likely() is used, rather than silently falling
back to the slow path, but rather than rely on selecting
RISCV_ALTERNATIVE in the myriad of locations that may use
riscv_has_extension_[un]likely(), select it (almost) always instead by
adding it to the main RISCV config entry.
xip kernels cannot make use of the alternatives framework, so it is not
enabled for those configurations, although this is the status quo.
All current sites that select RISCV_ALTERNATIVE are converted to
dependencies on the option instead. The explicit dependencies on
!XIP_KERNEL can be dropped, as RISCV_ALTERNATIVE is not user selectable.
Fixes: 702e64550b12 ("riscv: fpu: switch has_fpu() to riscv_has_extension_likely()")
Link: https://lore.kernel.org/all/ZBruFRwt3rUVngPu@zx2c4.com/
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20230324100538.3514663-3-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Conor Dooley [Fri, 24 Mar 2023 10:05:38 +0000 (10:05 +0000)]
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
The has_fpu() check, which in turn calls riscv_has_extension_likely(),
relies on alternatives to figure out whether the system has an FPU.
As a result, it will malfunction on XIP kernels, as they do not support
the alternatives mechanism.
When alternatives support is not present, fall back to using
__riscv_isa_extension_available() in riscv_has_extension_[un]likely()
instead, which handily takes the same argument, so that kernels
that do not support alternatives can accurately report the presence of
FPU support.
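A minimal user-space sketch of the fallback idea follows. All names prefixed with sketch_ or SKETCH_ are hypothetical; the bitmap test merely stands in for __riscv_isa_extension_available(), and the patched fast path is deliberately not modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_ISA_EXT_MAX   64
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_ISA_WORDS \
	((SKETCH_ISA_EXT_MAX + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Assumption: 0 models a kernel without alternatives, e.g. an XIP kernel. */
#define SKETCH_HAVE_ALTERNATIVE 0

/* Hypothetical ISA bitmap, filled in once while probing the hardware. */
unsigned long sketch_isa_bitmap[SKETCH_ISA_WORDS];

void sketch_set_extension(unsigned int ext)
{
	assert(ext < SKETCH_ISA_EXT_MAX);
	sketch_isa_bitmap[ext / SKETCH_BITS_PER_LONG] |=
		1UL << (ext % SKETCH_BITS_PER_LONG);
}

/* Slow path: a plain bitmap lookup on every call. */
bool sketch_isa_extension_available(unsigned int ext)
{
	assert(ext < SKETCH_ISA_EXT_MAX);
	return (sketch_isa_bitmap[ext / SKETCH_BITS_PER_LONG] >>
		(ext % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Same argument for both paths, so callers do not need to change. */
bool sketch_has_extension_likely(unsigned int ext)
{
#if SKETCH_HAVE_ALTERNATIVE
#error "the patched fast path is outside the scope of this sketch"
#else
	return sketch_isa_extension_available(ext);
#endif
}
```

The point of the design is that the helper keeps one signature: callers get a correct answer either way, and only the cost of the check differs.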
Fixes: 702e64550b12 ("riscv: fpu: switch has_fpu() to riscv_has_extension_likely()")
Link: https://lore.kernel.org/all/ad445951-3d13-4644-94d9-e0989cda39c3@spud/
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20230324100538.3514663-2-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Mon, 27 Mar 2023 23:27:49 +0000 (16:27 -0700)]
Merge patch series "Add RISC-V 32 NOMMU support"
Jesse Taube <mr.bossman075@gmail.com> says:
This patch-set aims to add NOMMU support to RV32.
Many people want to build simple emulators or HDL
models of RISC-V; this patch set makes it possible to
run Linux on them.
Yimin Gu is the original author of this set.
Submitted here:
https://lists.buildroot.org/pipermail/buildroot/2022-November/656134.html
Though Jesse T rewrote the Dconf.
* b4-shazam-merge:
riscv: configs: Add nommu PHONY defconfig for RV32
riscv: Kconfig: Allow RV32 to build with no MMU
Link: https://lore.kernel.org/r/20230301002657.352637-1-Mr.Bossman075@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jesse Taube [Wed, 1 Mar 2023 00:26:57 +0000 (19:26 -0500)]
riscv: configs: Add nommu PHONY defconfig for RV32
32-bit RISC-V can be configured to run without an MMU. Introduce the
rv32_nommu_virt_defconfig .PHONY target, that is based on
nommu_virt_defconfig. This is similar to how rv32_defconfig
is based on "defconfig".
Suggested-by: Conor Dooley <conor@kernel.org>
Signed-off-by: Jesse Taube <Mr.Bossman075@gmail.com>
Cc: Yimin Gu <ustcymgu@gmail.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230301002657.352637-4-Mr.Bossman075@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Yimin Gu [Wed, 1 Mar 2023 00:26:56 +0000 (19:26 -0500)]
riscv: Kconfig: Allow RV32 to build with no MMU
Some 32-bit RISC-V cores do not have an MMU, and the kernel should be
able to build for them. This patch enables RV32 to be built with
no MMU support.
Signed-off-by: Yimin Gu <ustcymgu@gmail.com>
CC: Jesse Taube <Mr.Bossman075@gmail.com>
Tested-by: Waldemar Brodkorb <wbx@openadk.org>
Signed-off-by: Jesse Taube <Mr.Bossman075@gmail.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230301002657.352637-3-Mr.Bossman075@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Thu, 23 Mar 2023 15:47:05 +0000 (08:47 -0700)]
Merge patch series "riscv: Add GENERIC_ENTRY support"
guoren@kernel.org <guoren@kernel.org> says:
From: Guo Ren <guoren@linux.alibaba.com>
The patches convert riscv to use the generic entry infrastructure from
kernel/entry/*. They also optimize entry.S with new macros and merge
ret_from_kernel_thread into ret_from_fork.
* b4-shazam-merge:
riscv: entry: Consolidate general regs saving/restoring
riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork
riscv: entry: Remove extra level wrappers of trace_hardirqs_{on,off}
riscv: entry: Convert to generic entry
riscv: entry: Add noinstr to prevent instrumentation inserted
riscv: ptrace: Remove duplicate operation
Link: https://lore.kernel.org/r/20230222033021.983168-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang [Wed, 22 Feb 2023 03:30:21 +0000 (22:30 -0500)]
riscv: entry: Consolidate general regs saving/restoring
Consolidate the saving/restoring GPs (except zero, ra, sp, gp,
tp and t0) into save_from_x6_to_x31/restore_from_x6_to_x31 macros.
No functional change intended.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-8-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang [Wed, 22 Feb 2023 03:30:20 +0000 (22:30 -0500)]
riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork
ret_from_kernel_thread() behaves similarly to ret_from_fork(); the
only difference is whether fn(arg) is called. This can be decided by
testing whether fn is NULL, i.e. whether s0 is 0. Many architectures
have done the same thing, and it makes entry.S cleaner.
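The consolidation can be pictured in C, although the real change lives in assembly. This is a sketch under assumptions: sketch_ret_from_fork(), sketch_return_to_user() and sketch_worker() are hypothetical names, and the counter only exists to make the shared exit path observable.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_thread_fn)(void *arg);

/* Counts how often the shared exit path was taken. */
int sketch_user_returns;

/* Stand-in for the common "return to user mode" tail. */
void sketch_return_to_user(void)
{
	sketch_user_returns++;
}

/* Example kernel-thread body: bumps the int that arg points at. */
int sketch_worker(void *arg)
{
	assert(arg);
	*(int *)arg += 1;
	return 0;
}

/*
 * One entry point for both cases. A NULL fn (s0 == 0 in the assembly)
 * means a user task; a non-NULL fn is a kernel thread whose function
 * runs first. Both then share the same exit path.
 */
void sketch_ret_from_fork(sketch_thread_fn fn, void *arg)
{
	if (fn)
		fn(arg);
	sketch_return_to_user();
}
```

Calling sketch_ret_from_fork(sketch_worker, &counter) runs the worker once before the common tail, while sketch_ret_from_fork(NULL, NULL) goes straight to the tail.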
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-7-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang [Wed, 22 Feb 2023 03:30:19 +0000 (22:30 -0500)]
riscv: entry: Remove extra level wrappers of trace_hardirqs_{on,off}
Since riscv is converted to generic entry, there's no need for the
extra wrappers of trace_hardirqs_{on,off}.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-6-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Guo Ren [Wed, 22 Feb 2023 03:30:18 +0000 (22:30 -0500)]
riscv: entry: Convert to generic entry
This patch converts riscv to use the generic entry infrastructure from
kernel/entry/*. The generic entry makes maintainers' work easier and
the code more elegant. Here are the changes:
- Clearer entry.S with handle_exception and ret_from_exception
- Get rid of complex custom signal implementation
- Move syscall procedure from assembly to C, which is much more
readable.
- Connect ret_from_fork & ret_from_kernel_thread to generic entry.
- Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode
- Use the standard preemption code instead of custom
Suggested-by: Huacai Chen <chenhuacai@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Yipeng Zou <zouyipeng@huawei.com>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Link: https://lore.kernel.org/r/20230222033021.983168-5-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Guo Ren [Wed, 22 Feb 2023 03:30:17 +0000 (22:30 -0500)]
riscv: entry: Add noinstr to prevent instrumentation inserted
Without noinstr the compiler is free to insert instrumentation (think
all the k*SAN, KCov, GCov, ftrace etc..) which can call code we're not
yet ready to run this early in the entry path, for instance it could
rely on RCU which isn't on yet, or expect lockdep state. (by peterz)
Link: https://lore.kernel.org/linux-riscv/YxcQ6NoPf3AH0EXe@hirez.programming.kicks-ass.net/
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-4-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Guo Ren [Wed, 22 Feb 2023 03:30:16 +0000 (22:30 -0500)]
riscv: ptrace: Remove duplicate operation
TIF_SYSCALL_TRACE is controlled by common code; see
kernel/ptrace.c and include/linux/thread_info.h.
clear_task_syscall_work(child, SYSCALL_TRACE);
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230222033021.983168-3-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Wed, 15 Mar 2023 14:11:08 +0000 (07:11 -0700)]
Merge patch series "RISC-V: Apply Zicboz to clear_page"
Andrew Jones <ajones@ventanamicro.com> says:
When the Zicboz extension is available we can more rapidly zero naturally
aligned Zicboz block sized chunks of memory. As pages are always page
aligned and are larger than any Zicboz block size will be, then
clear_page() appears to be a good candidate for the extension. While cycle
count and energy consumption should also be considered, we can be pretty
certain that implementing clear_page() with the Zicboz extension is a win
by comparing the new dynamic instruction count with its current count[1].
Doing so we see that the new count is just over a quarter of the old count
(see patch6's commit message for more details).
For those of you who reviewed v1[2], you may be looking for the memset()
patches. As pointed out in v1, and a couple follow-up emails, it's not
clear that patching memset() is a win yet. When I get a chance to test
on real hardware with a comprehensive benchmark collection then I can
post the memset() patches separately (assuming the benchmarks show it's
worthwhile).
* b4-shazam-merge:
RISC-V: KVM: Expose Zicboz to the guest
RISC-V: KVM: Provide UAPI for Zicboz block size
RISC-V: Use Zicboz in clear_page when available
RISC-V: cpufeatures: Put the upper 16 bits of patch ID to work
RISC-V: Add Zicboz detection and block size parsing
dt-bindings: riscv: Document cboz-block-size
RISC-V: Factor out body of riscv_init_cbom_blocksize loop
RISC-V: alternatives: Support patching multiple insns in assembly
Link: https://lore.kernel.org/r/20230224162631.405473-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 16:26:31 +0000 (17:26 +0100)]
RISC-V: KVM: Expose Zicboz to the guest
Guests may use the cbo.zero instruction when the CPU has the Zicboz
extension and the hypervisor sets henvcfg.CBZE.
Add Zicboz support for KVM guests which may be enabled and
disabled from KVM userspace using the ISA extension ONE_REG API.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230224162631.405473-9-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 16:26:30 +0000 (17:26 +0100)]
RISC-V: KVM: Provide UAPI for Zicboz block size
We're about to allow guests to use the Zicboz extension. KVM
userspace needs to know the cache block size in order to
properly advertise it to the guest. Provide a virtual config
register for userspace to get it with the GET_ONE_REG API, but
setting it cannot be supported, so disallow SET_ONE_REG.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230224162631.405473-8-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 16:26:29 +0000 (17:26 +0100)]
RISC-V: Use Zicboz in clear_page when available
Using memset() to zero a 4K page takes 563 total instructions, where
20 are branches. clear_page(), with Zicboz and a 64 byte block size,
takes 169 total instructions, where 4 are branches and 33 are nops.
Even though the block size is a variable, thanks to alternatives, we
can still implement a Duff device without having to do any preliminary
calculations. This is achieved by using the alternatives' cpufeature
value (the upper 16 bits of patch_id). The value used is the maximum
zicboz block size order accepted at the patch site. This enables us
to stop patching / unrolling when 4K bytes have been zeroed (we would
loop and continue after 4K if the page size were larger).
For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
only loop a few times and larger block sizes to not loop at all. Since
cbo.zero doesn't take an offset, we also need an 'add' after each
instruction, making the loop body 112 to 160 bytes. Hopefully this
is small enough to not cause icache misses.
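The arithmetic behind the unrolling can be checked with a short sketch. The helper names are hypothetical and nothing here issues cbo.zero; it only counts how many block-zeroing steps a 4K page needs and how many passes through a 16-times unrolled body that implies.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_UNROLL    16u

/* Number of block-zeroing steps needed to cover one page. */
unsigned int sketch_blocks_per_page(unsigned int block_size)
{
	/* Block sizes are assumed to be powers of two no larger than a page. */
	assert(block_size != 0);
	assert((block_size & (block_size - 1)) == 0);
	assert(block_size <= SKETCH_PAGE_SIZE);
	return SKETCH_PAGE_SIZE / block_size;
}

/*
 * Passes through a body unrolled SKETCH_UNROLL times. One pass means the
 * loop never branches back; with fewer than 16 blocks the unrolled body
 * is simply cut short.
 */
unsigned int sketch_loop_passes(unsigned int block_size)
{
	unsigned int blocks = sketch_blocks_per_page(block_size);

	return (blocks + SKETCH_UNROLL - 1) / SKETCH_UNROLL;
}
```

Under these assumptions a 64 byte block size needs 64 steps in 4 passes, 128 bytes needs 32 steps in 2 passes, and 256 bytes or more fits in a single pass.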
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 16:26:28 +0000 (17:26 +0100)]
RISC-V: cpufeatures: Put the upper 16 bits of patch ID to work
cpufeature IDs are consecutive integers starting at 26, so a 32-bit
patch ID allows an aircraft carrier load of feature IDs. Repurposing
the upper 16 bits still leaves a boat load of feature IDs and gains
16 bits which may be used to control patching on a per patch-site
basis.
This will be initially used in Zicboz's application to clear_page(),
as Zicboz's block size must also be considered. In that case, the
upper 16-bit value's role will be to convey the maximum block size
which the Zicboz clear_page() implementation supports.
cpufeature patch sites which need to check for the existence or
absence of other cpufeatures may also be able to make use of this.
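As a rough illustration of the split, the sketch below packs a feature ID into the lower 16 bits and a per-site value into the upper 16 bits of a 32-bit patch ID. The function names are hypothetical, and the "maximum block size order" check is an assumption about how a patch site might use the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Lower 16 bits: which cpufeature the patch site is about. */
uint16_t sketch_patch_feature_id(uint32_t patch_id)
{
	return (uint16_t)(patch_id & 0xffffu);
}

/* Upper 16 bits: a value interpreted per patch site. */
uint16_t sketch_patch_value(uint32_t patch_id)
{
	return (uint16_t)(patch_id >> 16);
}

uint32_t sketch_make_patch_id(uint16_t feature_id, uint16_t value)
{
	return ((uint32_t)value << 16) | feature_id;
}

/*
 * Assumed use: the value is the largest block size order (log2 of the
 * size in bytes) the site accepts, so patching applies only when the
 * probed order does not exceed it.
 */
bool sketch_site_applies(uint32_t patch_id, unsigned int block_size_order)
{
	assert(block_size_order < 32);
	return block_size_order <= sketch_patch_value(patch_id);
}
```

For example, sketch_make_patch_id(30, 6) describes a site for feature 30 that accepts block sizes up to 64 bytes, so an order of 6 applies and an order of 7 does not.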
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-6-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 16:26:27 +0000 (17:26 +0100)]
RISC-V: Add Zicboz detection and block size parsing
Parse "riscv,cboz-block-size" from the DT by piggybacking on Zicbom's
riscv_init_cbom_blocksize(). Additionally check the DT for the presence
of the "zicboz" extension and, when it's present, validate the parsed
cboz block size as we do Zicbom's cbom block size with
riscv_isa_extension_check().
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-5-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 16:26:26 +0000 (17:26 +0100)]
dt-bindings: riscv: Document cboz-block-size
The Zicboz operation (cbo.zero) operates on a block-size defined
for the cpu-core. While we already have the riscv,cbom-block-size
property, it only provides the block size for Zicbom operations.
Even though it's likely Zicboz and Zicbom will use the same size,
that's not required by the specification. Create another property
specifically for Zicboz.
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230224162631.405473-4-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 16:26:25 +0000 (17:26 +0100)]
RISC-V: Factor out body of riscv_init_cbom_blocksize loop
Refactor riscv_init_cbom_blocksize() to prepare for it to be used
for both cbom block size and cboz block size.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 16:26:24 +0000 (17:26 +0100)]
RISC-V: alternatives: Support patching multiple insns in assembly
As pointed out in commit d374a16539b1 ("RISC-V: fix compile error
from deduplicated __ALTERNATIVE_CFG_2"), we need quotes around
parameters passed to macros within macros to avoid spaces being
interpreted as separators. ALT_NEW_CONTENT was trying to handle
this by defining new_c as a vararg, but this isn't sufficient
for calling ALTERNATIVE() from assembly with multiple instructions
in the new/old sequences. Remove the vararg "hack" and use quotes.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Wed, 15 Mar 2023 03:51:34 +0000 (20:51 -0700)]
Merge patch series "riscv: alternative/cpufeature related cleanups"
Andrew Jones <ajones@ventanamicro.com> says:
This series has no intended functional change. These cleanups were
found while renaming errata_id to patch_id in order to better
convey that its purpose is larger than errata (it's also for
cpufeatures).
* b4-shazam-merge:
riscv: cpufeature: Drop errata_list.h and other unused includes
riscv: lib: Include hwcap.h directly
riscv: alternatives: Rename errata_id to patch_id
riscv: alternatives: Remove unnecessary define and unused struct
riscv: Rename Kconfig.erratas to Kconfig.errata
riscv: Clarify RISCV_ALTERNATIVE help text
Link: https://lore.kernel.org/r/20230224154601.88163-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 15:46:01 +0000 (16:46 +0100)]
riscv: cpufeature: Drop errata_list.h and other unused includes
Drop errata_list.h, since cpufeature.c includes hwcap.h directly to
get cpufeature IDs. And, while there, prune the rest of the unused
includes too.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 15:46:00 +0000 (16:46 +0100)]
riscv: lib: Include hwcap.h directly
When using alternatives for cpufeatures we should include hwcap.h
directly, rather than through errata_list.h. Opportunistically drop
an unused include too.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-6-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 15:45:59 +0000 (16:45 +0100)]
riscv: alternatives: Rename errata_id to patch_id
Alternatives are used for both errata and cpufeatures. Use a more
generic name, 'patch_id', as in "ID of code patching site", to
avoid confusion when alternatives are used for cpufeatures.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-5-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 15:45:58 +0000 (16:45 +0100)]
riscv: alternatives: Remove unnecessary define and unused struct
A define and a struct were introduced with commit 6f4eea90465a
("riscv: Introduce alternative mechanism to apply errata solution"),
which introduced alternatives to RISC-V. The define is used for
an arbitrary string length, specific to sifive errata, so just use
the number directly there instead. The struct has never been used,
so remove it.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-4-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 15:45:57 +0000 (16:45 +0100)]
riscv: Rename Kconfig.erratas to Kconfig.errata
Errata is already plural for erratum. Rename it to make the
grammar gooder.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones [Fri, 24 Feb 2023 15:45:56 +0000 (16:45 +0100)]
riscv: Clarify RISCV_ALTERNATIVE help text
Clarify RISCV_ALTERNATIVE's help text by pointing out that code
patching is not only done at boot time, but also module load time.
Also point out that this is the minimal possible overhead.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230224154601.88163-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Thu, 9 Mar 2023 23:46:40 +0000 (15:46 -0800)]
Merge patch series "riscv, mm: detect svnapot cpu support at runtime"
Qinglin Pan <panqinglin00@gmail.com> says:
Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K
page. This patch set is for using Svnapot in hugetlb fs and huge vmap.
This patchset adds a Kconfig item for using Svnapot in
"Platform type"->"SVNAPOT extension support". Its default value is on,
and it can be turned off to keep the kernel from detecting Svnapot
hardware support and using it.
Tested on:
- qemu rv64 with "Svnapot support" off and svnapot=true.
- qemu rv64 with "Svnapot support" on and svnapot=true.
- qemu rv64 with "Svnapot support" off and svnapot=false.
- qemu rv64 with "Svnapot support" on and svnapot=false.
* b4-shazam-merge:
riscv: mm: support Svnapot in huge vmap
riscv: mm: support Svnapot in hugetlb page
riscv: mm: modify pte format for Svnapot
Link: https://lore.kernel.org/r/20230209131647.17245-1-panqinglin00@gmail.com
[Palmer: fix up the feature ordering in the merge]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Qinglin Pan [Thu, 9 Feb 2023 13:16:47 +0000 (21:16 +0800)]
riscv: mm: support Svnapot in huge vmap
As HAVE_ARCH_HUGE_VMAP and HAVE_ARCH_HUGE_VMALLOC are supported, we can
implement arch_vmap_pte_range_map_size and arch_vmap_pte_supported_shift
for Svnapot to support huge vmap at the napot size.
It can be tested by huge vmap used in pci driver. Huge vmalloc with svnapot
can be tested by test_vmalloc with [1] applied, and probe this
module to run fix_size_alloc_test with use_huge true.
[1] https://lore.kernel.org/all/20221212055657.698420-1-panqinglin2020@iscas.ac.cn/
Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230209131647.17245-4-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Qinglin Pan [Thu, 9 Feb 2023 13:16:46 +0000 (21:16 +0800)]
riscv: mm: support Svnapot in hugetlb page
Svnapot can be used to support 64KB hugetlb pages, so it can become a new
option when using hugetlbfs. Add a basic implementation of hugetlb pages,
and support 64KB as a size in it by using Svnapot.
To test, boot the kernel with a command line containing "default_hugepagesz=64K
hugepagesz=64K hugepages=20" and run a simple test like this:
tools/testing/selftests/vm/map_hugetlb 1 16
It should pass.
Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230209131647.17245-3-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Qinglin Pan [Thu, 9 Feb 2023 13:16:45 +0000 (21:16 +0800)]
riscv: mm: modify pte format for Svnapot
Add an alternative to enable/disable Svnapot support; it is enabled
when "svnapot" is in the "riscv,isa" field of the fdt and the SVNAPOT compile
option is set. It influences the behavior of has_svnapot. All code
dependent on Svnapot should first make sure that has_svnapot returns true.
Modify the PTE definition for Svnapot, and create some functions in pgtable.h
to mark a PTE as napot and check if it is a Svnapot PTE. So far, only the
64KB napot size is supported in the spec, so some macros have only a 64KB version.
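A sketch of what marking a PTE as a 64KB napot entry could look like is given below. It assumes the Sv39/Sv48 layout, with the PPN starting at bit 10 and the N bit at bit 63, and the 64KB encoding where the low four PPN bits read 0b1000. The names are hypothetical and are not the helpers added to pgtable.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PFN_SHIFT  10                   /* PPN field starts here */
#define SKETCH_PAGE_NAPOT (UINT64_C(1) << 63)  /* the N bit */
#define SKETCH_NAPOT_MASK (UINT64_C(0xf) << SKETCH_PFN_SHIFT)
#define SKETCH_NAPOT_64K  (UINT64_C(0x8) << SKETCH_PFN_SHIFT)

/* True when the N bit marks this PTE as part of a napot region. */
bool sketch_pte_napot(uint64_t pte)
{
	return (pte & SKETCH_PAGE_NAPOT) != 0;
}

/*
 * Turn a PTE whose PFN is 16-page aligned into a 64KB napot PTE:
 * set N and put 0b1000 into the low four PPN bits.
 */
uint64_t sketch_pte_mknapot_64k(uint64_t pte)
{
	/* A 64KB region covers 16 pages, so the low four PPN bits must be 0. */
	assert((pte & SKETCH_NAPOT_MASK) == 0);
	return (pte & ~SKETCH_NAPOT_MASK) | SKETCH_NAPOT_64K | SKETCH_PAGE_NAPOT;
}
```

Permission and status bits below bit 10 are left untouched, so the same helper works for any leaf PTE that satisfies the alignment precondition.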
Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230209131647.17245-2-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Linus Torvalds [Sun, 5 Mar 2023 22:52:03 +0000 (14:52 -0800)]
Linux 6.3-rc1
Linus Torvalds [Sat, 4 Mar 2023 21:35:43 +0000 (13:35 -0800)]
cpumask: re-introduce constant-sized cpumask optimizations
Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
in the cpumask operations potentially becoming hugely less efficient,
because suddenly the cpumask was always considered to be variable-sized.
The optimization was then later added back in a limited form by commit
6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
FORCE_NR_CPUS option is not useful in a generic kernel and more of a
special case for embedded situations with fixed hardware.
Instead, just re-introduce the optimization, with some changes.
Instead of depending on CPUMASK_OFFSTACK being false, and then always
using the full constant cpumask width, this introduces three different
cpumask "sizes":
- the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.
This is used for situations where we should use the exact size.
- the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
fits in a single word and the bitmap operations thus end up able
to trigger the "small_const_nbits()" optimizations.
This is used for the operations that have optimized single-word
cases that get inlined, notably the bit find and scanning functions.
- the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
is a sufficiently small constant that makes simple "copy" and
"clear" operations more efficient.
This is arbitrarily set at four words or less.
As an example of this situation, without this fixed size optimization,
cpumask_clear() will generate code like
movl nr_cpu_ids(%rip), %edx
addq $63, %rdx
shrq $3, %rdx
andl $-8, %edx
callq memset@PLT
on x86-64, because it would calculate the "exact" number of longwords
that need to be cleared.
In contrast, with this patch, using a MAX_CPU of 64 (which is quite a
reasonable value to use), the above becomes a single
movq $0,cpumask
instruction instead, because instead of caring to figure out exactly how
many CPU's the system has, it just knows that the cpumask will be a
single word and can just clear it all.
Note that this does end up tightening the rules a bit from the original
version in another way: operations that set bits in the cpumask are now
limited to the actual nr_cpu_ids limit, whereas we used to do the
nr_cpumask_bits thing almost everywhere in the cpumask code.
But if you just clear bits, or scan for bits, we can use the simpler
compile-time constants.
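The three sizes described above can be sketched as a small selection function. This is an illustration only: in the kernel the choice is made at compile time, and struct sketch_cpumask_sizes and sketch_pick_sizes() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

struct sketch_cpumask_sizes {
	unsigned int exact; /* always the runtime CPU count */
	unsigned int small; /* used by bit find and scan helpers */
	unsigned int large; /* used by simple copy and clear helpers */
};

/*
 * nr_cpus_config models the NR_CPUS build-time constant, nr_cpu_ids the
 * number of CPU ids actually in use at runtime.
 */
struct sketch_cpumask_sizes sketch_pick_sizes(unsigned int nr_cpus_config,
					      unsigned int nr_cpu_ids)
{
	struct sketch_cpumask_sizes s = { nr_cpu_ids, nr_cpu_ids, nr_cpu_ids };

	assert(nr_cpu_ids >= 1 && nr_cpu_ids <= nr_cpus_config);

	/* Fits in one word: scans can use the constant. */
	if (nr_cpus_config <= SKETCH_BITS_PER_LONG)
		s.small = nr_cpus_config;

	/* Four words or less: copy and clear can use the constant. */
	if (nr_cpus_config <= 4 * SKETCH_BITS_PER_LONG)
		s.large = nr_cpus_config;

	return s;
}
```

With a very large configured maximum, all three sizes fall back to the runtime count, which corresponds to the MAXSMP case mentioned in the message.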
In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
which were not useful, and which fundamentally have to be limited to
'nr_cpu_ids'. Better remove them now than have somebody introduce use
of them later.
Of course, on x86-64 with MAXSMP there is no sane small compile-time
constant for the cpumask sizes, and we end up using the actual CPU bits,
and will generate the above kind of horrors regardless. Please don't
use MAXSMP unless you really expect to have machines with thousands of
cores.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 5 Mar 2023 19:32:30 +0000 (11:32 -0800)]
Merge tag 'v6.3-p2' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression in the caam driver"
* tag 'v6.3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: caam - Fix edesc/iv ordering mixup
Linus Torvalds [Sun, 5 Mar 2023 19:27:48 +0000 (11:27 -0800)]
Merge tag 'x86-urgent-2023-03-05' of git://git./linux/kernel/git/tip/tip
Pull x86 updates from Thomas Gleixner:
"A small set of updates for x86:
- Return -EIO instead of success when the certificate buffer for SEV
guests is not large enough
- Allow STIBP to be enabled with legacy IBRS. Legacy IBRS is cleared
on return to userspace for performance reasons, but that leaves user
space vulnerable to cross-thread attacks which STIBP prevents.
Update the documentation accordingly"
* tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
virt/sev-guest: Return -EIO if certificate buffer is not large enough
Documentation/hw-vuln: Document the interaction between IBRS and STIBP
x86/speculation: Allow enabling STIBP with legacy IBRS
Linus Torvalds [Sun, 5 Mar 2023 19:19:16 +0000 (11:19 -0800)]
Merge tag 'irq-urgent-2023-03-05' of git://git./linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"A set of updates for the interrupt subsystem:
- Prevent possible NULL pointer dereferences in
irq_data_get_affinity_mask() and irq_domain_create_hierarchy()
- Take the per device MSI lock before invoking code which relies on
it being held
- Make sure that MSI descriptors are unreferenced before freeing
them. This was overlooked when the platform MSI code was converted
to use core infrastructure and results in a false positive warning
- Remove dead code in the MSI subsystem
- Clarify the documentation for pci_msix_free_irq()
- More kobj_type constification"
* tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi, platform-msi: Ensure that MSI descriptors are unreferenced
genirq/msi: Drop dead domain name assignment
irqdomain: Add missing NULL pointer check in irq_domain_create_hierarchy()
genirq/irqdesc: Make kobj_type structures constant
PCI/MSI: Clarify usage of pci_msix_free_irq()
genirq/msi: Take the per-device MSI lock before validating the control structure
genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask()
Linus Torvalds [Sun, 5 Mar 2023 19:11:52 +0000 (11:11 -0800)]
Merge tag 'pull-misc' of git://git./linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:
"Adding Christian Brauner as VFS co-maintainer"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Adding VFS co-maintainer
Linus Torvalds [Sun, 5 Mar 2023 19:07:58 +0000 (11:07 -0800)]
Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs
Pull VM_FAULT_RETRY fixes from Al Viro:
"Some of the page fault handlers do not deal with the following case
correctly:
- handle_mm_fault() has returned VM_FAULT_RETRY
- there is a pending fatal signal
- fault had happened in kernel mode
Correct action in such case is not "return unconditionally" - fatal
signals are handled only upon return to userland and something like
copy_to_user() would end up retrying the faulting instruction and
triggering the same fault again and again.
What we need to do in such a case is to make the caller treat that as
a failed uaccess attempt - handle the exception if there is an exception
handler for the faulting instruction, or oops if there isn't one.
Over the years some architectures had been fixed and now are handling
that case properly; some still do not. This series should fix the
remaining ones.
Status:
- m68k, riscv, hexagon, parisc: tested/acked by maintainers.
- alpha, sparc32, sparc64: tested locally - bug has been reproduced
on the unpatched kernel and verified to be fixed by this series.
- ia64, microblaze, nios2, openrisc: build, but otherwise completely
untested"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
openrisc: fix livelock in uaccess
nios2: fix livelock in uaccess
microblaze: fix livelock in uaccess
ia64: fix livelock in uaccess
sparc: fix livelock in uaccess
alpha: fix livelock in uaccess
parisc: fix livelock in uaccess
hexagon: fix livelock in uaccess
riscv: fix livelock in uaccess
m68k: fix livelock in uaccess
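The rule described in the pull message above can be sketched as a small decision function. This is a simplified model under stated assumptions, not any architecture's actual fault handler: the enum, its values and the function name are hypothetical, and the three booleans stand in for "handle_mm_fault() returned VM_FAULT_RETRY", "a fatal signal is pending" and "the fault came from user mode".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for this sketch; not kernel identifiers. */
enum fault_action {
	FAULT_DONE,		/* fault handled, resume normally */
	FAULT_RETRY_AGAIN,	/* retry the fault */
	FAULT_RETURN_TO_USER,	/* plain return; signal handled on the way out */
	FAULT_KERNEL_FIXUP,	/* failed uaccess: exception fixup, or oops */
};

static enum fault_action after_handle_mm_fault(bool got_retry,
					       bool fatal_signal_pending,
					       bool user_mode)
{
	if (got_retry && fatal_signal_pending) {
		/*
		 * Returning unconditionally is only safe for user mode.
		 * In kernel mode the faulting instruction would simply
		 * run again and fault again, forever.
		 */
		return user_mode ? FAULT_RETURN_TO_USER : FAULT_KERNEL_FIXUP;
	}
	return got_retry ? FAULT_RETRY_AGAIN : FAULT_DONE;
}

/* A few checks documenting the expected outcomes of the sketch. */
static void fault_sketch_examples(void)
{
	assert(after_handle_mm_fault(true, true, true) == FAULT_RETURN_TO_USER);
	assert(after_handle_mm_fault(true, true, false) == FAULT_KERNEL_FIXUP);
	assert(after_handle_mm_fault(true, false, false) == FAULT_RETRY_AGAIN);
}
```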
Masahiro Yamada [Sun, 16 Oct 2022 18:23:49 +0000 (03:23 +0900)]
Remove Intel compiler support
include/linux/compiler-intel.h had no update in the past 3 years.
We often forget about the third C compiler to build the kernel.
For example, commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO")
only mentioned GCC and Clang.
init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC,
and nobody has reported any issue.
I guess the Intel Compiler support is broken, and nobody cares
about it.
Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is
deprecated:
$ icc -v
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is
deprecated and will be removed from product release in the second half
of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended
compiler moving forward. Please transition to use this compiler. Use
'-diag-disable=10441' to disable this message.
icc version 2021.7.0 (gcc version 12.1.0 compatibility)
Arnd Bergmann provided a link to the article, "Intel C/C++ compilers
complete adoption of LLVM".
lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept
untouched for better sync with https://github.com/facebook/zstd
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 5 Mar 2023 01:27:29 +0000 (20:27 -0500)]
Adding VFS co-maintainer
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 4 Mar 2023 22:48:29 +0000 (14:48 -0800)]
Merge tag 'i2c-for-6.3-rc1-part2' of git://git./linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
"Some improvements/fixes for the newly added GXP driver and a Kconfig
dependency fix"
* tag 'i2c-for-6.3-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: gxp: fix an error code in probe
i2c: gxp: return proper error on address NACK
i2c: gxp: remove "empty" switch statement
i2c: Disable I2C_APPLE when I2C_PASEMI is a builtin
Linus Torvalds [Sat, 4 Mar 2023 22:03:27 +0000 (14:03 -0800)]
mm: avoid gcc complaint about pointer casting
The migration code ends up temporarily stashing information of the wrong
type in unused fields of the newly allocated destination folio. That
all works fine, but gcc does complain about the pointer type mis-use:
mm/migrate.c: In function ‘__migrate_folio_extract’:
mm/migrate.c:1050:20: note: randstruct: casting between randomized structure pointer types (ssa): ‘struct anon_vma’ and ‘struct address_space’
1050 | *anon_vmap = (void *)dst->mapping;
| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
and gcc is actually right to complain since it really doesn't understand
that this is a very temporary special case where this is ok.
This could be fixed in different ways by just obfuscating the assignment
sufficiently that gcc doesn't see what is going on, but the truly
"proper C" way to do this is by explicitly using a union.
Using unions for type conversions like this is normally hugely ugly and
syntactically nasty, but this really is one of the few cases where we
want to make it clear that we're not doing type conversion, we're really
re-using the value bit-for-bit just using another type.
IOW, this should not become a common pattern, but in this one case using
that odd union is probably the best way to document to the compiler what
is conceptually going on here.
[ Side note: there are valid cases where we convert pointers to other
pointer types, notably the whole "folio vs page" situation, where the
types actually have fundamental commonalities.
The fact that the gcc note is limited to just randomized structures
means that we don't see equivalent warnings for those cases, but it
might also mean that we miss other cases where we do play these kinds
of dodgy games, and this kind of explicit conversion might be a good
idea. ]
I verified that at least for an allmodconfig build on x86-64, this
generates the exact same code, apart from line numbers and assembler
comment changes.
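A minimal userspace sketch of the union approach follows. The structure and function names are hypothetical stand-ins for illustration, not the code in mm/migrate.c; the only point is that the value is stored and read back bit-for-bit through a union rather than through a pointer cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structures live in the kernel's mm code. */
struct anon_vma_sketch { int unused; };
struct address_space_sketch { int unused; };

/* One pointer-sized slot viewed as either type: bit reuse, not conversion. */
union mapping_slot {
	struct address_space_sketch *mapping;
	struct anon_vma_sketch *anon_vma;
};

struct folio_sketch {
	struct address_space_sketch *mapping;
};

/* Temporarily stash an anon_vma pointer in the otherwise unused field. */
static void stash_anon_vma(struct folio_sketch *dst, struct anon_vma_sketch *av)
{
	union mapping_slot slot = { .anon_vma = av };

	assert(dst != NULL);
	dst->mapping = slot.mapping;
}

/* Read the stashed pointer back and reset the field. */
static struct anon_vma_sketch *extract_anon_vma(struct folio_sketch *dst)
{
	union mapping_slot slot = { .mapping = dst->mapping };

	dst->mapping = NULL;
	return slot.anon_vma;
}
```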
Fixes: 64c8902ed441 ("migrate_pages: split unmap_and_move() to _unmap() and _move()")
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 4 Mar 2023 21:32:50 +0000 (13:32 -0800)]
Merge tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes.
Eight are for MM and seven are for other parts of the kernel. Seven
are cc:stable and eight address post-6.3 issues or were judged
unsuitable for -stable backporting"
* tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: map Dikshita Agarwal's old address to his current one
mailmap: map Vikash Garodia's old address to his current one
fs/cramfs/inode.c: initialize file_ra_state
fs: hfsplus: fix UAF issue in hfsplus_put_super
panic: fix the panic_print NMI backtrace setting
lib: parser: update documentation for match_NUMBER functions
kasan, x86: don't rename memintrinsics in uninstrumented files
kasan: test: fix test for new meminstrinsic instrumentation
kasan: treat meminstrinsic as builtins in uninstrumented files
kasan: emit different calls for instrumentable memintrinsics
ocfs2: fix non-auto defrag path not working issue
ocfs2: fix defrag path triggering jbd2 ASSERT
mailmap: map Georgi Djakov's old Linaro address to his current one
mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON
lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH
mm/damon/paddr: fix missing folio_put()
mm/mremap: fix dup_anon_vma() in vma_merge() case 4
Linus Torvalds [Sat, 4 Mar 2023 19:20:42 +0000 (11:20 -0800)]
Merge tag 'powerpc-6.3-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Drop orphaned VAS MAINTAINERS entry
- Fix build errors with clang and KCSAN
- Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION together
with recordmcount
Thanks to Nathan Chancellor.
* tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Avoid dead code/data elimination when using recordmcount
powerpc/vmlinux.lds: Add .text.asan/tsan sections
powerpc: Drop orphaned VAS MAINTAINERS entry
Linus Torvalds [Sat, 4 Mar 2023 18:53:59 +0000 (10:53 -0800)]
Merge tag 'sound-fix-6.3-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of various small fixes that have been gathered since the
last PR.
The majority of changes are for ASoC, and there is a small change in
ASoC PCM core, but the rest are all for driver-specific fixes /
quirks / updates"
* tag 'sound-fix-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits)
ALSA: ice1712: Delete unreachable code in aureon_add_controls()
ALSA: ice1712: Do not left ice->gpio_mutex locked in aureon_add_controls()
ALSA: hda/realtek: Add quirk for HP EliteDesk 800 G6 Tower PC
ALSA: hda/realtek: Improve support for Dell Precision 3260
ASoC: mediatek: mt8195: add missing initialization
ASoC: mediatek: mt8188: add missing initialization
ASoC: amd: yc: Add DMI entries to support HP OMEN 16-n0xxx (8A43)
ASoC: zl38060 add gpiolib dependency
ASoC: sam9g20ek: Disable capture unless building with microphone input
ASoC: mt8192: Fix range for sidetone positive gain
ASoC: mt8192: Report an error if when an invalid sidetone gain is written
ASoC: mt8192: Fix event generation for controls
ASoC: mt8192: Remove spammy log messages
ASoC: mchp-pdmc: fix poc noise at capture startup
ASoC: dt-bindings: sama7g5-pdmc: add microchip,startup-delay-us binding
ASoC: soc-pcm: add option to start DMA after DAI
ASoC: mt8183: Fix event generation for I2S DAI operations
ASoC: mt8183: Remove spammy logging from I2S DAI driver
ASoC: mt6358: Remove undefined HPx Mux enumeration values
ASoC: mt6358: Validate Wake on Voice 2 writes
...
Linus Torvalds [Sat, 4 Mar 2023 00:33:28 +0000 (16:33 -0800)]
Merge tag 'for-v6.3-part2' of git://git./linux/kernel/git/sre/linux-power-supply
Pull more power supply updates from Sebastian Reichel:
- Fix DT binding for Richtek RT9467
- Fix a NULL pointer check in the power-supply core
- Document meaning of absent "present" property
* tag 'for-v6.3-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
dt-bindings: power: supply: Revise Richtek RT9467 compatible name
ABI: testing: sysfs-class-power: Document absence of "present" property
power: supply: fix null pointer check order in __power_supply_register
Linus Torvalds [Sat, 4 Mar 2023 00:26:43 +0000 (16:26 -0800)]
Merge tag '6.3-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
- xfstest generic/208 fix (memory leak)
- minor netfs fix (to address smatch warning)
- a DFS fix for stable
- a reconnect race fix
- two multichannel fixes
- RDMA (smbdirect) fix
- two additional writeback fixes from David
* tag '6.3-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix memory leak in direct I/O
cifs: prevent data race in cifs_reconnect_tcon()
cifs: improve checking of DFS links over STATUS_OBJECT_NAME_INVALID
iov: Fix netfs_extract_user_to_sg()
cifs: Fix cifs_write_back_from_locked_folio()
cifs: reuse cifs_match_ipaddr for comparison of dstaddr too
cifs: match even the scope id for ipv6 addresses
cifs: Fix an uninitialised variable
cifs: Add some missing xas_retry() calls
Linus Torvalds [Thu, 2 Mar 2023 23:49:44 +0000 (15:49 -0800)]
umh: simplify the capability pointer logic
The usermodehelper code uses two fake pointers for the two capability
cases: CAP_BSET for reading and writing 'usermodehelper_bset', and
CAP_PI to read and write 'usermodehelper_inheritable'.
This seems to be a completely unnecessary indirection, since we could
instead just use the pointers themselves, and never have to do any "if
this then that" kind of logic.
So just get rid of the fake pointer values, and use the real pointer
values instead.
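As a rough illustration of the before and after, here is a userspace sketch. All names in it are hypothetical (cap_sketch, table_entry, the _TAG macros and both lookup helpers); it does not reproduce kernel/umh.c, it only contrasts decoding a fake tag pointer with storing the real pointer in the table entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two usermodehelper capability sets. */
struct cap_sketch { unsigned long long val; };
static struct cap_sketch bset_sketch = { 0x1 };
static struct cap_sketch inheritable_sketch = { 0x2 };

/* Hypothetical table entry carrying an opaque data pointer. */
struct table_entry { void *data; };

/* Old style: fake pointer values that have to be decoded on every use. */
#define CAP_BSET_TAG ((void *)1)
#define CAP_PI_TAG   ((void *)2)

static struct cap_sketch *lookup_tagged(const struct table_entry *t)
{
	if (t->data == CAP_BSET_TAG)
		return &bset_sketch;
	if (t->data == CAP_PI_TAG)
		return &inheritable_sketch;
	return NULL;
}

/* New style: the entry already points at the object, no decoding needed. */
static struct cap_sketch *lookup_direct(const struct table_entry *t)
{
	assert(t->data != NULL);
	return t->data;
}
```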
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 3 Mar 2023 23:00:28 +0000 (15:00 -0800)]
Merge tag 'cocci-for-6.3' of git://git./linux/kernel/git/jlawall/linux
Pull coccinelle updates from Julia Lawall:
"Changes in make coccicheck and improve a semantic patch
This makes a couple of changes in make coccicheck related to shell
commands.
It also updates the api/atomic_as_refcounter semantic patch to include
WARNING in the output message, as done in other cases"
* tag 'cocci-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
scripts: coccicheck: Use /usr/bin/env
scripts: coccicheck: Avoid warning about spurious escape
coccinelle: api/atomic_as_refcounter: include message type in output
Linus Torvalds [Fri, 3 Mar 2023 22:51:15 +0000 (14:51 -0800)]
Merge tag 'rust-fixes-6.3-rc1' of https://github.com/Rust-for-Linux/linux
Pull Rust fix from Miguel Ojeda:
"A single build error fix: there was a change during the merge window
to a C header parsed by the Rust bindings generator, introducing a
type that it does not handle well.
The fix tells the generator to treat the type as opaque (for now)"
* tag 'rust-fixes-6.3-rc1' of https://github.com/Rust-for-Linux/linux:
rust: bindgen: Add `alt_instr` as opaque type
Linus Torvalds [Fri, 3 Mar 2023 22:41:50 +0000 (14:41 -0800)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"Updates that missed the first pull, mostly because of needing more
soak time.
Driver updates (zfcp, ufs, mpi3mr, plus two ipr bug fixes), an
enclosure services (ses) update (mostly bug fixes) and other minor bug
fixes and changes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
scsi: zfcp: Trace when request remove fails after qdio send fails
scsi: zfcp: Change the type of all fsf request id fields and variables to u64
scsi: zfcp: Make the type for accessing request hashtable buckets size_t
scsi: ufs: core: Simplify ufshcd_execute_start_stop()
scsi: ufs: core: Rely on the block layer for setting RQF_PM
scsi: core: Extend struct scsi_exec_args
scsi: lpfc: Fix double word in comments
scsi: core: Remove the /proc/scsi/${proc_name} directory earlier
scsi: core: Fix a source code comment
scsi: cxgbi: Remove unneeded version.h include
scsi: qedi: Remove unneeded version.h include
scsi: mpi3mr: Remove unneeded version.h include
scsi: mpi3mr: Fix missing mrioc->evtack_cmds initialization
scsi: mpi3mr: Use number of bits to manage bitmap sizes
scsi: mpi3mr: Remove unnecessary memcpy() to alltgt_info->dmi
scsi: mpi3mr: Fix issues in mpi3mr_get_all_tgt_info()
scsi: mpi3mr: Fix an issue found by KASAN
scsi: mpi3mr: Replace 1-element array with flex-array
scsi: ipr: Work around fortify-string warning
scsi: ipr: Make ipr_probe_ioa_part2() return void
...
Dan Carpenter [Mon, 27 Feb 2023 10:06:33 +0000 (13:06 +0300)]
i2c: gxp: fix an error code in probe
This is passing IS_ERR() instead of PTR_ERR() so instead of an error
code it prints and returns the number 1.
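The bug class can be shown with userspace stand-ins for the kernel's error-pointer helpers. The _sketch functions below are assumptions written for this illustration (they mimic ERR_PTR(), IS_ERR() and PTR_ERR() but are not the kernel macros), and probe_buggy()/probe_fixed() are hypothetical, not the gxp driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr_sketch(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO_SKETCH);
	return (void *)(intptr_t)error;
}

/* True if the pointer falls in the range reserved for error codes. */
static bool is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Decode the errno value stored in an error pointer. */
static long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Buggy: passes on the boolean test result, so the "error code" is 1. */
static int probe_buggy(void *res)
{
	if (is_err_sketch(res))
		return is_err_sketch(res);
	return 0;
}

/* Fixed: passes on the errno value carried by the pointer. */
static int probe_fixed(void *res)
{
	if (is_err_sketch(res))
		return (int)ptr_err_sketch(res);
	return 0;
}
```

With an error pointer built from -ENOMEM, the buggy variant returns 1 while the fixed one returns -ENOMEM.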
Fixes: 4a55ed6f89f5 ("i2c: Add GXP SoC I2C Controller")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Nick Hawkins <nick.hawkins@hpe.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Wolfram Sang [Mon, 20 Feb 2023 14:40:59 +0000 (15:40 +0100)]
i2c: gxp: return proper error on address NACK
According to Documentation/i2c/fault-codes.rst, NACK after sending an
address should be -ENXIO.
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Wolfram Sang [Fri, 17 Feb 2023 22:13:30 +0000 (23:13 +0100)]
i2c: gxp: remove "empty" switch statement
There used to be error messages which had to go. Now, it only consists
of 'break's, so it can go.
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Benjamin Gray [Mon, 27 Feb 2023 23:33:17 +0000 (10:33 +1100)]
i2c: Disable I2C_APPLE when I2C_PASEMI is a builtin
The ppc64le_allmodconfig sets I2C_PASEMI=y and leaves COMPILE_TEST to
default to y and I2C_APPLE to default to m, running into a known
incompatible configuration that breaks the build [1]. Specifically,
a common dependency (i2c-pasemi-core.o in this case) cannot be used by
both builtin and module consumers.
Disable I2C_APPLE when I2C_PASEMI is a builtin to prevent this.
[1]: https://lore.kernel.org/all/202112061809.XT99aPrf-lkp@intel.com
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Linus Torvalds [Fri, 3 Mar 2023 18:41:59 +0000 (10:41 -0800)]
Merge tag 'thermal-6.3-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"These fix two issues in the Intel thermal control drivers.
Specifics:
- Fix an error pointer dereference in the quark_dts Intel thermal
driver (Dan Carpenter)
- Fix the intel_bxt_pmic_thermal driver Kconfig entry to select
REGMAP which is not user-visible instead of depending on it (Randy
Dunlap)"
* tag 'thermal-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: BXT_PMIC: select REGMAP instead of depending on it
thermal: intel: quark_dts: fix error pointer dereference
Linus Torvalds [Fri, 3 Mar 2023 18:36:01 +0000 (10:36 -0800)]
Merge tag 'acpi-6.3-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These update ACPI quirks for some x86 platforms and add an IRQ
override quirk for one more system.
Specifics:
- Add an ACPI IRQ override quirk for Asus Expertbook B2402FBA
(Vojtech Hejsek)
- Drop a suspend-to-idle quirk for HP Elitebook G9 that is not needed
any more after a firmware update (Mario Limonciello)
- Add all Cezanne systems to the list for forcing StorageD3Enable,
because they all need the same quirk (Mario Limonciello)"
* tag 'acpi-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: x86: utils: Add Cezanne to the list for forcing StorageD3Enable
ACPI: x86: Drop quirk for HP Elitebook
ACPI: resource: Skip IRQ override on Asus Expertbook B2402FBA
Linus Torvalds [Fri, 3 Mar 2023 18:30:58 +0000 (10:30 -0800)]
Merge tag 'pm-6.3-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These update power capping (new hardware support and cleanup) and
cpufreq (bug fixes, cleanups and intel_pstate adjustment for a new
platform).
Specifics:
- Fix error handling in the apple-soc cpufreq driver (Dan Carpenter)
- Change the log level of a message in the amd-pstate cpufreq driver
so it is more visible to users (Kai-Heng Feng)
- Adjust the balance_performance EPP value for Sapphire Rapids in the
intel_pstate cpufreq driver (Srinivas Pandruvada)
- Remove MODULE_LICENSE from 3 pieces of non-modular code (Nick
Alcock)
- Make a read-only kobj_type structure in the schedutil cpufreq
governor constant (Thomas Weißschuh)
- Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL
power capping driver (Sumeet Pawnikar)"
* tag 'pm-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: apple-soc: Fix an IS_ERR() vs NULL check
powercap: remove MODULE_LICENSE in non-modules
cpufreq: intel_pstate: remove MODULE_LICENSE in non-modules
powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC
cpufreq: amd-pstate: remove MODULE_LICENSE in non-modules
cpufreq: schedutil: make kobj_type structure constant
cpufreq: amd-pstate: Let user know amd-pstate is disabled
cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire Rapids
Linus Torvalds [Fri, 3 Mar 2023 18:25:29 +0000 (10:25 -0800)]
Merge tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
"Here's a set of fixes/changes that didn't make the first cut, either
because they got queued before I sent the early merge request, or
fixes that came in afterwards. In detail:
- Don't set MSG_NOSIGNAL on recv/recvmsg opcodes, as AF_PACKET will
error out (David)
- Fix for spurious poll wakeups (me)
- Fix for a file leak for buffered reads in certain conditions
(Joseph)
- Don't allow registered buffers of mixed types (Pavel)
- Improve handling of huge pages for registered buffers (Pavel)
- Provided buffer ring size calculation fix (Wojciech)
- Minor cleanups (me)"
* tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linux:
io_uring/poll: don't pass in wake func to io_init_poll_iocb()
io_uring: fix fget leak when fs don't support nowait buffered read
io_uring/poll: allow some retries for poll triggering spuriously
io_uring: remove MSG_NOSIGNAL from recvmsg
io_uring/rsrc: always initialize 'folio' to NULL
io_uring/rsrc: optimise registered huge pages
io_uring/rsrc: optimise single entry advance
io_uring/rsrc: disallow multi-source reg buffers
io_uring: remove unused wq_list_merge
io_uring: fix size calculation when registering buf ring
io_uring/rsrc: fix a comment in io_import_fixed()
io_uring: rename 'in_idle' to 'in_cancel'
io_uring: consolidate the put_ref-and-return section of adding work
Linus Torvalds [Fri, 3 Mar 2023 18:21:39 +0000 (10:21 -0800)]
Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- Don't access released socket during error recovery (Akinobu
Mita)
- Bring back auto-removal of deleted namespaces during sequential
scan (Christoph Hellwig)
- Fix an error code in nvme_auth_process_dhchap_challenge (Dan
Carpenter)
- Show well known discovery name (Daniel Wagner)
- Add a missing endianness conversion in effects masking (Keith
Busch)
- Fix for a regression introduced in blk-rq-qos during init in this
merge window (Breno)
- Reorder a few fields in struct blk_mq_tag_set, eliminating a few
holes and shrinking it (Christophe)
- Remove redundant bdev_get_queue() NULL checks (Juhyung)
- Add sed-opal single user mode support flag (Luca)
- Remove SQE128 check in ublk as it isn't needed, saving some memory
(Ming)
- Op specific segment checking for cloned requests (Uday)
- Exclusive open partition scan fixes (Yu)
- Loop offset/size checking before assigning them in the device (Zhong)
- Bio polling fixes (me)
* tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux:
blk-mq: enforce op-specific segment limits in blk_insert_cloned_request
nvme-fabrics: show well known discovery name
nvme-tcp: don't access released socket during error recovery
nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()
nvme: bring back auto-removal of deleted namespaces during sequential scan
blk-iocost: Pass gendisk to ioc_refresh_params
nvme: fix sparse warning on effects masking
block: be a bit more careful in checking for NULL bdev while polling
block: clear bio->bi_bdev when putting a bio back in the cache
loop: loop_set_status_from_info() check before assignment
ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd
block: remove more NULL checks after bdev_get_queue()
blk-mq: Reorder fields in 'struct blk_mq_tag_set'
block: fix scan partition for exclusively open device again
block: Revert "block: Do not reread partition table on exclusively open device"
sed-opal: add support flag for SUM in status ioctl
Linus Torvalds [Fri, 3 Mar 2023 18:17:44 +0000 (10:17 -0800)]
Merge tag 'ata-6.3-fix' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
- Revert commit 104ff59af73a ("ata: ahci: Add Tiger Lake UP{3,4} AHCI
controller") as it is causing serious regressions (failure to boot)
on some laptops
* tag 'ata-6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: ahci: Revert "ata: ahci: Add Tiger Lake UP{3,4} AHCI controller"
Rafael J. Wysocki [Fri, 3 Mar 2023 17:45:53 +0000 (18:45 +0100)]
Merge branches 'acpi-pm' and 'acpi-x86'
Merge additional ACPI quirks for x86 systems:
- Drop a suspend-to-idle quirk for HP Elitebook G9 that is not needed
any more after a firmware update (Mario Limonciello).
- Add all Cezanne systems to the list for forcing StorageD3Enable,
because they all need the same quirk (Mario Limonciello).
* acpi-pm:
ACPI: x86: Drop quirk for HP Elitebook
* acpi-x86:
ACPI: x86: utils: Add Cezanne to the list for forcing StorageD3Enable
Linus Torvalds [Fri, 3 Mar 2023 17:38:01 +0000 (09:38 -0800)]
Merge tag 's390-6.3-2' of git://git./linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Add empty command line parameter handling stubs to kernel for all
command line parameters which are handled in the decompressor. This
avoids invalid "Unknown kernel command line parameters" messages from
the kernel, and also prevents them from being incorrectly passed to
user space. This has already caused confusion, therefore add the empty
stubs
- Add missing phys_to_virt() handling to machine check handler
- Introduce and use a union to be used for zcrypt inline assemblies.
This makes sure that only a register wide member of the union is
passed as input and output parameter to inline assemblies, while
usual C code uses other members of the union to access bit fields of
it
- Add and use a READ_ONCE_ALIGNED_128() macro, which can be used to
atomically read a 128-bit value from memory. This replaces the
(mis-)use of the 128-bit cmpxchg operation to do the same in cpum_sf
code. Currently gcc does not generate the used lpq instruction if
__READ_ONCE() is used for aligned 128-bit accesses, therefore use
this s390 specific helper
- Simplify machine check handler code if a task needs to be killed
because of e.g. register corruption due to a machine malfunction
- Perform CPU reset to clear pending interrupts and TLB entries on an
already stopped target CPU before delegating work to it
- Generate arch/s390/boot/vmlinux.map link map for the decompressor,
when CONFIG_VMLINUX_MAP is enabled for debugging purposes
- Fix segment type handling for dcssblk devices. It incorrectly always
returned type "READ/WRITE" even for read-only segments, which can
result in a kernel panic if somebody tries to write to a read-only
device
- Sort config S390 select list again
- Fix two kprobe reenter bugs revealed by a recently added kprobe kunit
test
* tag 's390-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kprobes: fix current_kprobe never cleared after kprobes reenter
s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler
s390/Kconfig: sort config S390 select list again
s390/extmem: return correct segment type in __segment_load()
s390/decompressor: add link map saving
s390/smp: perform cpu reset before delegating work to target cpu
s390/mcck: cleanup user process termination path
s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg
s390/rwonce: add READ_ONCE_ALIGNED_128() macro
s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg union
s390/nmi: fix virtual-physical address confusion
s390/setup: do not complain about parameters handled in decompressor