platform/kernel/linux-rpi.git
2 years agoKVM: arm64: Extract ESR_ELx.EC only
Mark Rutland [Wed, 3 Nov 2021 11:05:45 +0000 (11:05 +0000)]
KVM: arm64: Extract ESR_ELx.EC only

Since ARMv8.0 the upper 32 bits of ESR_ELx have been RES0, and recently
some of the upper bits gained a meaning and can be non-zero. For
example, when FEAT_LS64 is implemented, ESR_ELx[36:32] contain ISS2,
which for an ST64BV or ST64BV0 can be non-zero. This can be seen in ARM
DDI 0487G.b, page D13-3145, section D13.2.37.

Generally, we must not rely on RES0 bit remaining zero in future, and
when extracting ESR_ELx.EC we must mask out all other bits.

All C code uses the ESR_ELx_EC() macro, which masks out the irrelevant
bits, and therefore no alterations are required to C code to avoid
consuming irrelevant bits.

In a couple of places the KVM assembly extracts ESR_ELx.EC using LSR on
an X register, and so could in theory consume previously RES0 bits. In
both cases this is for comparison with EC values ESR_ELx_EC_HVC32 and
ESR_ELx_EC_HVC64, for which the upper bits of ESR_ELx must currently be
zero, but this could change in future.

This patch adjusts the KVM vectors to use UBFX rather than LSR to
extract ESR_ELx.EC, ensuring these are robust to future additions to
ESR_ELx.

Cc: stable@vger.kernel.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211103110545.4613-1-mark.rutland@arm.com
2 years agoMerge branch kvm/selftests/memslot into kvmarm-master/next
Marc Zyngier [Thu, 21 Oct 2021 10:40:03 +0000 (11:40 +0100)]
Merge branch kvm/selftests/memslot into kvmarm-master/next

* kvm/selftests/memslot:
  : .
  : Enable KVM memslot selftests on arm64, making them less
  : x86 specific.
  : .
  KVM: selftests: Build the memslot tests for arm64
  KVM: selftests: Make memslot_perf_test arch independent

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: selftests: Build the memslot tests for arm64
Ricardo Koller [Tue, 7 Sep 2021 18:09:57 +0000 (11:09 -0700)]
KVM: selftests: Build the memslot tests for arm64

Add memslot_perf_test and memslot_modification_stress_test to the list
of aarch64 selftests.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907180957.609966-3-ricarkol@google.com
2 years agoKVM: selftests: Make memslot_perf_test arch independent
Ricardo Koller [Tue, 7 Sep 2021 18:09:56 +0000 (11:09 -0700)]
KVM: selftests: Make memslot_perf_test arch independent

memslot_perf_test uses ucalls for synchronization between guest and
host. Ucalls API is architecture independent: tests do not need to know
details like what kind of exit they generate on a specific arch.  More
specifically, there is no need to check whether an exit is KVM_EXIT_IO
in x86 for the host to know that the exit is ucall related, as
get_ucall() already makes that check.

Change memslot_perf_test to not require specifying what exit does a
ucall generate. Also add a missing ucall_init.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907180957.609966-2-ricarkol@google.com
2 years agoMerge branch kvm-arm64/pkvm/fixed-features into kvmarm-master/next
Marc Zyngier [Mon, 18 Oct 2021 16:20:50 +0000 (17:20 +0100)]
Merge branch kvm-arm64/pkvm/fixed-features into kvmarm-master/next

* kvm-arm64/pkvm/fixed-features: (22 commits)
  : .
  : Add the pKVM fixed feature that allows a bunch of exceptions
  : to either be forbidden or be easily handled at EL2.
  : .
  KVM: arm64: pkvm: Give priority to standard traps over pvm handling
  KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array()
  KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around
  KVM: arm64: pkvm: Consolidate include files
  KVM: arm64: pkvm: Preserve pending SError on exit from AArch32
  KVM: arm64: pkvm: Handle GICv3 traps as required
  KVM: arm64: pkvm: Drop sysregs that should never be routed to the host
  KVM: arm64: pkvm: Drop AArch32-specific registers
  KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI
  KVM: arm64: pkvm: Use a single function to expose all id-regs
  KVM: arm64: Fix early exit ptrauth handling
  KVM: arm64: Handle protected guests at 32 bits
  KVM: arm64: Trap access to pVM restricted features
  KVM: arm64: Move sanitized copies of CPU features
  KVM: arm64: Initialize trap registers for protected VMs
  KVM: arm64: Add handlers for protected VM System Registers
  KVM: arm64: Simplify masking out MTE in feature id reg
  KVM: arm64: Add missing field descriptor for MDCR_EL2
  KVM: arm64: Pass struct kvm to per-EC handlers
  KVM: arm64: Move early handlers to per-EC handlers
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: pkvm: Give priority to standard traps over pvm handling
Marc Zyngier [Wed, 13 Oct 2021 12:03:46 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Give priority to standard traps over pvm handling

Checking for pvm handling first means that we cannot handle ptrauth
traps or apply any of the workarounds (GICv3 or TX2 #219).

Flip the order around.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-12-maz@kernel.org
2 years agoKVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array()
Marc Zyngier [Wed, 13 Oct 2021 12:03:45 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array()

Passing a VM pointer around is odd, and results in extra work on
VHE. Follow the rest of the design that uses the vcpu instead, and
let the nVHE code look into the struct kvm as required.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-11-maz@kernel.org
2 years agoKVM: arm64: pkvm: Move kvm_handle_pvm_restricted around
Marc Zyngier [Wed, 13 Oct 2021 12:03:44 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around

Place kvm_handle_pvm_restricted() next to its little friends such
as kvm_handle_pvm_sysreg().

This allows to make inject_undef64() static.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-10-maz@kernel.org
2 years agoKVM: arm64: pkvm: Consolidate include files
Marc Zyngier [Wed, 13 Oct 2021 12:03:43 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Consolidate include files

kvm_fixed_config.h is pkvm specific, and would be better placed
near its users. At the same time, include/nvhe/sys_regs.h is now
almost empty.

Merge the two into arch/arm64/kvm/hyp/include/nvhe/fixed_config.h.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-9-maz@kernel.org
2 years agoKVM: arm64: pkvm: Preserve pending SError on exit from AArch32
Marc Zyngier [Wed, 13 Oct 2021 12:03:42 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Preserve pending SError on exit from AArch32

Don't drop a potential SError when a guest gets caught red-handed
running AArch32 code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-8-maz@kernel.org
2 years agoKVM: arm64: pkvm: Handle GICv3 traps as required
Marc Zyngier [Wed, 13 Oct 2021 12:03:41 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Handle GICv3 traps as required

Forward accesses to the ICV_*SGI*_EL1 registers to EL1, and
emulate ICV_SRE_EL1 by returning a fixed value.

This should be enough to support GICv3 in a protected guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-7-maz@kernel.org
2 years agoKVM: arm64: pkvm: Drop sysregs that should never be routed to the host
Marc Zyngier [Wed, 13 Oct 2021 12:03:40 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Drop sysregs that should never be routed to the host

A bunch of system registers (most of them MM related) should never
trap to the host under any circumstance. Keep them close to our chest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-6-maz@kernel.org
2 years agoKVM: arm64: pkvm: Drop AArch32-specific registers
Marc Zyngier [Wed, 13 Oct 2021 12:03:39 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Drop AArch32-specific registers

All the SYS_*32_EL2 registers are AArch32-specific. Since we forbid
AArch32, there is no need to handle those in any way.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-5-maz@kernel.org
2 years agoKVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI
Marc Zyngier [Wed, 13 Oct 2021 12:03:38 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI

The ERR*/ERX* registers should be handled as RAZ/WI, and there
should be no need to involve EL1 for that.

Add a helper that handles such registers, and repaint the sysreg
table to declare these registers as RAZ/WI.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-4-maz@kernel.org
2 years agoKVM: arm64: pkvm: Use a single function to expose all id-regs
Marc Zyngier [Wed, 13 Oct 2021 12:03:37 +0000 (13:03 +0100)]
KVM: arm64: pkvm: Use a single function to expose all id-regs

Rather than exposing a whole set of helper functions to retrieve
individual ID registers, use the existing decoding tree and expose
a single helper instead.

This allow a number of functions to be made static, and we now
have a single entry point to maintain.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-3-maz@kernel.org
2 years agoKVM: arm64: Fix early exit ptrauth handling
Marc Zyngier [Wed, 13 Oct 2021 12:03:36 +0000 (13:03 +0100)]
KVM: arm64: Fix early exit ptrauth handling

The previous rework of the early exit code to provide an EC-based
decoding tree missed the fact that we have two trap paths for
ptrauth: the instructions (EC_PAC) and the sysregs (EC_SYS64).

Rework the handlers to call the ptrauth handling code on both
paths.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-2-maz@kernel.org
2 years agoMerge branch kvm-arm64/memory-accounting into kvmarm-master/next
Marc Zyngier [Sun, 17 Oct 2021 10:29:36 +0000 (11:29 +0100)]
Merge branch kvm-arm64/memory-accounting into kvmarm-master/next

* kvm-arm64/memory-accounting:
  : .
  : Sprinkle a bunch of GFP_KERNEL_ACCOUNT all over the code base
  : to better track memory allocation made on behalf of a VM.
  : .
  KVM: arm64: Add memcg accounting to KVM allocations
  KVM: arm64: vgic: Add memcg accounting to vgic allocations

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: Add memcg accounting to KVM allocations
Jia He [Tue, 7 Sep 2021 12:31:12 +0000 (20:31 +0800)]
KVM: arm64: Add memcg accounting to KVM allocations

Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM
allocations"), it would be better to make arm64 KVM consistent with
common kvm codes.

The memory allocations of VM scope should be charged into VM process
cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT.

There remain a few cases since these allocations are global, not in VM
scope.

Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907123112.10232-3-justin.he@arm.com
2 years agoKVM: arm64: vgic: Add memcg accounting to vgic allocations
Jia He [Tue, 7 Sep 2021 12:31:11 +0000 (20:31 +0800)]
KVM: arm64: vgic: Add memcg accounting to vgic allocations

Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM
allocations"), it would be better to make arm64 vgic consistent with
common kvm codes.

The memory allocations of VM scope should be charged into VM process
cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT.

There remain a few cases since these allocations are global, not in VM
scope.

Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907123112.10232-2-justin.he@arm.com
2 years agoMerge branch kvm-arm64/selftest/timer into kvmarm-master/next
Marc Zyngier [Sun, 17 Oct 2021 10:19:42 +0000 (11:19 +0100)]
Merge branch kvm-arm64/selftest/timer into kvmarm-master/next

* kvm-arm64/selftest/timer:
  : .
  : Add a set of selftests for the KVM/arm64 timer emulation.
  : Comes with a minimal GICv3 infrastructure.
  : .
  KVM: arm64: selftests: arch_timer: Support vCPU migration
  KVM: arm64: selftests: Add arch_timer test
  KVM: arm64: selftests: Add host support for vGIC
  KVM: arm64: selftests: Add basic GICv3 support
  KVM: arm64: selftests: Add light-weight spinlock support
  KVM: arm64: selftests: Add guest support to get the vcpuid
  KVM: arm64: selftests: Maintain consistency for vcpuid type
  KVM: arm64: selftests: Add support to disable and enable local IRQs
  KVM: arm64: selftests: Add basic support to generate delays
  KVM: arm64: selftests: Add basic support for arch_timers
  KVM: arm64: selftests: Add support for cpu_relax
  KVM: arm64: selftests: Introduce ARM64_SYS_KVM_REG
  tools: arm64: Import sysreg.h
  KVM: arm64: selftests: Add MMIO readl/writel support

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: selftests: arch_timer: Support vCPU migration
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:39 +0000 (23:34 +0000)]
KVM: arm64: selftests: arch_timer: Support vCPU migration

Since the timer stack (hardware and KVM) is per-CPU, there
are potential chances for races to occur when the scheduler
decides to migrate a vCPU thread to a different physical CPU.
Hence, include an option to stress-test this part as well by
forcing the vCPUs to migrate across physical CPUs in the
system at a particular rate.

Originally, the bug for the fix with commit 3134cc8beb69d0d
("KVM: arm64: vgic: Resample HW pending state on deactivation")
was discovered using arch_timer test with vCPU migrations and
can be easily reproduced.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-16-rananta@google.com
2 years agoKVM: arm64: selftests: Add arch_timer test
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:38 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add arch_timer test

Add a KVM selftest to validate the arch_timer functionality.
Primarily, the test sets up periodic timer interrupts and
validates the basic architectural expectations upon its receipt.

The test provides command-line options to configure the period
of the timer, number of iterations, and number of vCPUs.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-15-rananta@google.com
2 years agoKVM: arm64: selftests: Add host support for vGIC
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:37 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add host support for vGIC

Implement a simple library to perform vGIC-v3 setup
from a host point of view. This includes creating a
vGIC device, setting up distributor and redistributor
attributes, and mapping the guest physical addresses.

The definition of REDIST_REGION_ATTR_ADDR is taken from
aarch64/vgic_init test. Hence, replace the definition
by including vgic.h in the test file.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-14-rananta@google.com
2 years agoKVM: arm64: selftests: Add basic GICv3 support
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:36 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add basic GICv3 support

Add basic support for ARM Generic Interrupt Controller v3.
The support provides guests to setup interrupts.

The work is inspired from kvm-unit-tests and the kernel's
GIC driver (drivers/irqchip/irq-gic-v3.c).

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-13-rananta@google.com
2 years agoKVM: arm64: selftests: Add light-weight spinlock support
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:35 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add light-weight spinlock support

Add a simpler version of spinlock support for ARM64 for
the guests to use.

The implementation is loosely based on the spinlock
implementation in kvm-unit-tests.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-12-rananta@google.com
2 years agoKVM: arm64: selftests: Add guest support to get the vcpuid
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:34 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add guest support to get the vcpuid

At times, such as when in the interrupt handler, the guest wants
to get the vcpuid that it's running on to pull the per-cpu private
data. As a result, introduce guest_get_vcpuid() that returns the
vcpuid of the calling vcpu. The interface is architecture
independent, but defined only for arm64 as of now.

Suggested-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-11-rananta@google.com
2 years agoKVM: arm64: selftests: Maintain consistency for vcpuid type
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:33 +0000 (23:34 +0000)]
KVM: arm64: selftests: Maintain consistency for vcpuid type

The prototype of aarch64_vcpu_setup() accepts vcpuid as
'int', while the rest of the aarch64 (and struct vcpu)
carries it as 'uint32_t'. Hence, change the prototype
to make it consistent throughout the board.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-10-rananta@google.com
2 years agoKVM: arm64: selftests: Add support to disable and enable local IRQs
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:32 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add support to disable and enable local IRQs

Add functions local_irq_enable() and local_irq_disable() to
enable and disable the IRQs from the guest, respectively.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-9-rananta@google.com
2 years agoKVM: arm64: selftests: Add basic support to generate delays
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:31 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add basic support to generate delays

Add udelay() support to generate a delay in the guest.

The routines are derived and simplified from kernel's
arch/arm64/lib/delay.c.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-8-rananta@google.com
2 years agoKVM: arm64: selftests: Add basic support for arch_timers
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:30 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add basic support for arch_timers

Add a minimalistic library support to access the virtual timers,
that can be used for simple timing functionalities, such as
introducing delays in the guest.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-7-rananta@google.com
2 years agoKVM: arm64: selftests: Add support for cpu_relax
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:29 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add support for cpu_relax

Implement the guest helper routine, cpu_relax(), to yield
the processor to other tasks.

The function was derived from
arch/arm64/include/asm/vdso/processor.h.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-6-rananta@google.com
2 years agoKVM: arm64: selftests: Introduce ARM64_SYS_KVM_REG
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:28 +0000 (23:34 +0000)]
KVM: arm64: selftests: Introduce ARM64_SYS_KVM_REG

With the inclusion of sysreg.h, that brings in system register
encodings, it would be redundant to re-define register encodings
again in processor.h to use it with ARM64_SYS_REG for the KVM
functions such as set_reg() or get_reg(). Hence, add helper macro,
ARM64_SYS_KVM_REG, that converts SYS_* definitions in sysreg.h
into ARM64_SYS_REG definitions.

Also replace all the users of ARM64_SYS_REG, relying on
the encodings created in processor.h, with ARM64_SYS_KVM_REG and
remove the definitions.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-5-rananta@google.com
2 years agotools: arm64: Import sysreg.h
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:26 +0000 (23:34 +0000)]
tools: arm64: Import sysreg.h

Bring-in the kernel's arch/arm64/include/asm/sysreg.h
into tools/ for arm64 to make use of all the standard
register definitions in consistence with the kernel.

Make use of the register read/write definitions from
sysreg.h, instead of the existing definitions. A syntax
correction is needed for the files that use write_sysreg()
to make it compliant with the new (kernel's) syntax.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
[maz: squashed two commits in order to keep the series bisectable]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-3-rananta@google.com
Link: https://lore.kernel.org/r/20211007233439.1826892-4-rananta@google.com
2 years agoKVM: arm64: selftests: Add MMIO readl/writel support
Raghavendra Rao Ananta [Thu, 7 Oct 2021 23:34:25 +0000 (23:34 +0000)]
KVM: arm64: selftests: Add MMIO readl/writel support

Define the readl() and writel() functions for the guests to
access (4-byte) the MMIO region.

The routines, and their dependents, are inspired from the kernel's
arch/arm64/include/asm/io.h and arch/arm64/include/asm/barrier.h.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211007233439.1826892-2-rananta@google.com
2 years agoMerge branch kvm-arm64/vgic-fixes-5.16 into kvmarm-master/next
Marc Zyngier [Sun, 17 Oct 2021 10:10:14 +0000 (11:10 +0100)]
Merge branch kvm-arm64/vgic-fixes-5.16 into kvmarm-master/next

* kvm-arm64/vgic-fixes-5.16:
  : .
  : Multiple updates to the GICv3 emulation in order to better support
  : the dreadful Apple M1 that only implements half of it, and in a
  : broken way...
  : .
  KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode
  KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS
  KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible
  KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors
  KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode
Marc Zyngier [Sun, 10 Oct 2021 15:09:10 +0000 (16:09 +0100)]
KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode

Having realised that a virtual LPI does transition through an active
state that does not exist on bare metal, align the CPU interface
emulation with the behaviour specified in the architecture pseudocode.

The LPIs now transition to active on IAR read, and to inactive on
EOI write. Special care is taken not to increment the EOIcount for
an LPI that isn't present in the LRs.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010150910.2911495-6-maz@kernel.org
2 years agoKVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS
Marc Zyngier [Sun, 10 Oct 2021 15:09:09 +0000 (16:09 +0100)]
KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS

Since we are trapping all sysreg accesses when ICH_VTR_EL2.SEIS
is set, and that we never deliver an SError when emulating
any of the GICv3 sysregs, don't advertise ICC_CTLR_EL1.SEIS.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010150910.2911495-5-maz@kernel.org
2 years agoKVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible
Marc Zyngier [Sun, 10 Oct 2021 15:09:08 +0000 (16:09 +0100)]
KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible

On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg
accesses from the guest. From a performance perspective, this is OK
as long as the guest doesn't hammer the GICv3 CPU interface.

In most cases, this is fine, unless the guest actively uses
priorities and switches PMR_EL1 very often. Which is exactly what
happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1.
In these condition, the performance plumets as we hit PMR each time
we mask/unmask interrupts. Not good.

There is however an opportunity for improvement. Careful reading
of the architecture specification indicates that the only GICv3
sysreg belonging to the common group (which contains the SGI
registers, PMR, DIR, CTLR and RPR) that is allowed to generate
a SError is DIR. Everything else is safe.

It is thus possible to substitute the trapping of all the common
group with just that of DIR if it supported by the implementation.
Yes, that's yet another optional bit of the architecture.
So let's just do that, as it leads to some impressive result on
the M1:

Without this change:
bash-5.1# /host/home/maz/hackbench 100 process 1000
Running with 100*40 (== 4000) tasks.
Time: 56.596

With this change:
bash-5.1# /host/home/maz/hackbench 100 process 1000
Running with 100*40 (== 4000) tasks.
Time: 8.649

which is a pretty convincing result.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org
2 years agoKVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors
Marc Zyngier [Sun, 10 Oct 2021 15:09:07 +0000 (16:09 +0100)]
KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors

The infamous M1 has a feature nobody else ever implemented,
in the form of the "GIC locally generated SError interrupts",
also known as SEIS for short.

These SErrors are generated when a guest does something that violates
the GIC state machine. It would have been simpler to just *ignore*
the damned thing, but that's not what this HW does. Oh well.

This part of of the architecture is also amazingly under-specified.
There is a whole 10 lines that describe the feature in a spec that
is 930 pages long, and some of these lines are factually wrong.
Oh, and it is deprecated, so the insentive to clarify it is low.

Now, the spec says that this should be a *virtual* SError when
HCR_EL2.AMO is set. As it turns out, that's not always the case
on this CPU, and the SError sometimes fires on the host as a
physical SError. Goodbye, cruel world. This clearly is a HW bug,
and it means that a guest can easily take the host down, on demand.

Thankfully, we have seen systems that were just as broken in the
past, and we have the perfect vaccine for it.

Apple M1, please meet the Cavium ThunderX workaround. All your
GIC accesses will be trapped, sanitised, and emulated. Only the
signalling aspect of the HW will be used. It won't be super speedy,
but it will at least be safe. You're most welcome.

Given that this has only ever been seen on this single implementation,
that the spec is unclear at best and that we cannot trust it to ever
be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
being set.

Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org
2 years agoKVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3
Marc Zyngier [Sun, 10 Oct 2021 15:09:06 +0000 (16:09 +0100)]
KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3

Until now, we always let ID_AA64PFR0_EL1.GIC reflect the value
visible on the host, even if we were running a GICv2-enabled VM
on a GICv3+compat host.

That's fine, but we also now have the case of a host that does not
expose ID_AA64PFR0_EL1.GIC==1 despite having a vGIC. Yes, this is
confusing. Thank you M1.

Let's go back to first principles and expose ID_AA64PFR0_EL1.GIC=1
when a GICv3 is exposed to the guest. This also hides a GICv4.1
CPU interface from the guest which has no business knowing about
the v4.1 extension.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010150910.2911495-2-maz@kernel.org
2 years agoMerge branch kvm-arm64/misc-5.16 into kvmarm-master/next
Marc Zyngier [Tue, 12 Oct 2021 14:49:14 +0000 (15:49 +0100)]
Merge branch kvm-arm64/misc-5.16 into kvmarm-master/next

* kvm-arm64/misc-5.16:
  : .
  : - Allow KVM to be disabled from the command-line
  : - Clean up CONFIG_KVM vs CONFIG_HAVE_KVM
  : - Fix endianess evaluation on MMIO access from EL0
  : .
  KVM: arm64: Fix reporting of endianess when the access originates at EL0

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: Fix reporting of endianess when the access originates at EL0
Marc Zyngier [Tue, 12 Oct 2021 11:23:12 +0000 (12:23 +0100)]
KVM: arm64: Fix reporting of endianess when the access originates at EL0

We currently check SCTLR_EL1.EE when computing the address of
a faulting guest access. However, the fault could have occured at
EL0, in which case the right bit to check would be SCTLR_EL1.E0E.

This is pretty unlikely to cause any issue in practice: You'd have
to have a guest with a LE EL1 and a BE EL0 (or the other way around),
and have mapped a device into the EL0 page tables.

Good luck with that!

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20211012112312.1247467-1-maz@kernel.org
2 years agoMerge branch kvm-arm64/pkvm/restrict-hypercalls into kvmarm-master/next
Marc Zyngier [Mon, 11 Oct 2021 15:55:03 +0000 (16:55 +0100)]
Merge branch kvm-arm64/pkvm/restrict-hypercalls into kvmarm-master/next

* kvm-arm64/pkvm/restrict-hypercalls:
  : .
  : Restrict the use of some hypercalls as well as kexec once
  : the protected KVM mode has been initialised.
  : .
  Documentation: admin-guide: Document side effects when pKVM is enabled

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoDocumentation: admin-guide: Document side effects when pKVM is enabled
Alexandru Elisei [Mon, 11 Oct 2021 15:38:35 +0000 (16:38 +0100)]
Documentation: admin-guide: Document side effects when pKVM is enabled

Recent changes to KVM for arm64 has made it impossible for the host to
hibernate or use kexec when protected mode is enabled via the kernel
command line.

There are people who rely on kexec (for example, developers who use kexec
as a quick way to test a new kernel), let's document this change in
behaviour, so it doesn't catch them by surprise and we have a place to
point people to if it does.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211011153835.291147-1-alexandru.elisei@arm.com
2 years agoKVM: arm64: Handle protected guests at 32 bits
Fuad Tabba [Sun, 10 Oct 2021 14:56:36 +0000 (15:56 +0100)]
KVM: arm64: Handle protected guests at 32 bits

Protected KVM does not support protected AArch32 guests. However,
it is possible for the guest to force run AArch32, potentially
causing problems. Add an extra check so that if the hypervisor
catches the guest doing that, it can prevent the guest from
running again by resetting vcpu->arch.target and returning
ARM_EXCEPTION_IL.

If this were to happen, The VMM can try and fix it by re-
initializing the vcpu with KVM_ARM_VCPU_INIT, however, this is
likely not possible for protected VMs.

Adapted from commit 22f553842b14 ("KVM: arm64: Handle Asymmetric
AArch32 systems")

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-12-tabba@google.com
2 years agoKVM: arm64: Trap access to pVM restricted features
Fuad Tabba [Sun, 10 Oct 2021 14:56:35 +0000 (15:56 +0100)]
KVM: arm64: Trap access to pVM restricted features

Trap accesses to restricted features for VMs running in protected
mode.

Access to feature registers are emulated, and only supported
features are exposed to protected VMs.

Accesses to restricted registers as well as restricted
instructions are trapped, and an undefined exception is injected
into the protected guests, i.e., with EC = 0x0 (unknown reason).
This EC is the one used, according to the Arm Architecture
Reference Manual, for unallocated or undefined system registers
or instructions.

Only affects the functionality of protected VMs. Otherwise,
should not affect non-protected VMs when KVM is running in
protected mode.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-11-tabba@google.com
2 years agoKVM: arm64: Move sanitized copies of CPU features
Fuad Tabba [Sun, 10 Oct 2021 14:56:34 +0000 (15:56 +0100)]
KVM: arm64: Move sanitized copies of CPU features

Move the sanitized copies of the CPU feature registers to the
recently created sys_regs.c. This consolidates all copies in a
more relevant file.

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-10-tabba@google.com
2 years agoKVM: arm64: Initialize trap registers for protected VMs
Fuad Tabba [Sun, 10 Oct 2021 14:56:33 +0000 (15:56 +0100)]
KVM: arm64: Initialize trap registers for protected VMs

Protected VMs have more restricted features that need to be
trapped. Moreover, the host should not be trusted to set the
appropriate trapping registers and their values.

Initialize the trapping registers, i.e., hcr_el2, mdcr_el2, and
cptr_el2 at EL2 for protected guests, based on the values of the
guest's feature id registers.

No functional change intended as trap handlers introduced in the
previous patch are still not hooked in to the guest exit
handlers.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-9-tabba@google.com
2 years agoKVM: arm64: Add handlers for protected VM System Registers
Fuad Tabba [Sun, 10 Oct 2021 14:56:32 +0000 (15:56 +0100)]
KVM: arm64: Add handlers for protected VM System Registers

Add system register handlers for protected VMs. These cover Sys64
registers (including feature id registers), and debug.

No functional change intended as these are not hooked in yet to
the guest exit handlers introduced earlier. So when trapping is
triggered, the exit handlers let the host handle it, as before.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-8-tabba@google.com
2 years agoKVM: arm64: Simplify masking out MTE in feature id reg
Fuad Tabba [Sun, 10 Oct 2021 14:56:31 +0000 (15:56 +0100)]
KVM: arm64: Simplify masking out MTE in feature id reg

Simplify code for hiding MTE support in feature id register when
MTE is not enabled/supported by KVM.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-7-tabba@google.com
2 years agoKVM: arm64: Add missing field descriptor for MDCR_EL2
Fuad Tabba [Sun, 10 Oct 2021 14:56:30 +0000 (15:56 +0100)]
KVM: arm64: Add missing field descriptor for MDCR_EL2

It's not currently used. Added for completeness.

No functional change intended.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-6-tabba@google.com
2 years agoKVM: arm64: Pass struct kvm to per-EC handlers
Fuad Tabba [Sun, 10 Oct 2021 14:56:29 +0000 (15:56 +0100)]
KVM: arm64: Pass struct kvm to per-EC handlers

We need struct kvm to check for protected VMs to be able to pick
the right handlers for them in subsequent patches.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-5-tabba@google.com
2 years agoKVM: arm64: Move early handlers to per-EC handlers
Marc Zyngier [Sun, 10 Oct 2021 14:56:28 +0000 (15:56 +0100)]
KVM: arm64: Move early handlers to per-EC handlers

Simplify the early exception handling by slicing the gigantic decoding
tree into a more manageable set of functions, similar to what we have
in handle_exit.c.

This will also make the structure reusable for pKVM's own early exit
handling.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211010145636.1950948-4-tabba@google.com
2 years agoKVM: arm64: Don't include switch.h into nvhe/kvm-main.c
Marc Zyngier [Sun, 10 Oct 2021 14:56:27 +0000 (15:56 +0100)]
KVM: arm64: Don't include switch.h into nvhe/kvm-main.c

hyp-main.c includes switch.h while it only requires adjust-pc.h.
Fix it to remove an unnecessary dependency.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211010145636.1950948-3-tabba@google.com
2 years agoKVM: arm64: Move __get_fault_info() and co into their own include file
Marc Zyngier [Sun, 10 Oct 2021 14:56:26 +0000 (15:56 +0100)]
KVM: arm64: Move __get_fault_info() and co into their own include file

In order to avoid including the whole of the switching helpers
in unrelated files, move the __get_fault_info() and related helpers
into their own include file.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211010145636.1950948-2-tabba@google.com
2 years agoMerge branch kvm-arm64/raz-sysregs into kvmarm-master/next
Marc Zyngier [Mon, 11 Oct 2021 13:15:53 +0000 (14:15 +0100)]
Merge branch kvm-arm64/raz-sysregs into kvmarm-master/next

* kvm-arm64/raz-sysregs:
  : .
  : Simplify the handling of RAZ register, removing pointless indirections.
  : .
  KVM: arm64: Replace get_raz_id_reg() with get_raz_reg()
  KVM: arm64: Use get_raz_reg() for userspace reads of PMSWINC_EL0
  KVM: arm64: Return early from read_id_reg() if register is RAZ

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: Replace get_raz_id_reg() with get_raz_reg()
Alexandru Elisei [Mon, 11 Oct 2021 10:58:40 +0000 (11:58 +0100)]
KVM: arm64: Replace get_raz_id_reg() with get_raz_reg()

Reading a RAZ ID register isn't different from reading any other RAZ
register, so get rid of get_raz_id_reg() and replace it with get_raz_reg(),
which does the same thing, but does it without going through two layers of
indirection.

No functional change.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211011105840.155815-4-alexandru.elisei@arm.com
2 years agoKVM: arm64: Use get_raz_reg() for userspace reads of PMSWINC_EL0
Alexandru Elisei [Mon, 11 Oct 2021 10:58:39 +0000 (11:58 +0100)]
KVM: arm64: Use get_raz_reg() for userspace reads of PMSWINC_EL0

PMSWINC_EL0 is a write-only register and was initially part of the VCPU
register state, but was later removed in commit 7a3ba3095a32 ("KVM:
arm64: Remove PMSWINC_EL0 shadow register"). To prevent regressions, the
register was kept accessible from userspace as Read-As-Zero (RAZ).

The read function that is used to handle userspace reads of this
register is get_raz_id_reg(), which, while technically correct, as it
returns 0, it is not semantically correct, as PMSWINC_EL0 is not an ID
register as the function name suggests.

Add a new function, get_raz_reg(), to use it as the accessor for
PMSWINC_EL0, as to not conflate get_raz_id_reg() to handle other types
of registers.

No functional change intended.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211011105840.155815-3-alexandru.elisei@arm.com
2 years agoKVM: arm64: Return early from read_id_reg() if register is RAZ
Alexandru Elisei [Mon, 11 Oct 2021 10:58:38 +0000 (11:58 +0100)]
KVM: arm64: Return early from read_id_reg() if register is RAZ

If read_id_reg() is called for an ID register which is Read-As-Zero (RAZ),
it initializes the return value to zero, then goes through a list of
registers which require special handling before returning the final value.

By not returning as soon as it checks that the register should be RAZ, the
function creates the opportunity for bugs, if, for example, a patch changes
a register to RAZ (like has happened with PMSWINC_EL0 in commit
11663111cd49), but doesn't remove the special handling from read_id_reg();
or if a register is RAZ in certain situations, but readable in others.

Return early to make it impossible for a RAZ register to be anything other
than zero.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211011105840.155815-2-alexandru.elisei@arm.com
2 years agoMerge branch kvm-arm64/misc-5.16 into kvmarm-master/next
Marc Zyngier [Mon, 11 Oct 2021 09:14:38 +0000 (10:14 +0100)]
Merge branch kvm-arm64/misc-5.16 into kvmarm-master/next

* kvm-arm64/misc-5.16:
  : .
  : - Allow KVM to be disabled from the command-line
  : - Clean up CONFIG_KVM vs CONFIG_HAVE_KVM
  : .
  KVM: arm64: Depend on HAVE_KVM instead of OF
  KVM: arm64: Unconditionally include generic KVM's Kconfig
  KVM: arm64: Allow KVM to be disabled from the command line

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: Depend on HAVE_KVM instead of OF
Sean Christopherson [Tue, 21 Sep 2021 22:22:31 +0000 (15:22 -0700)]
KVM: arm64: Depend on HAVE_KVM instead of OF

Select HAVE_KVM at all times on arm64, as the OF requirement is
always there (even in the case of an ACPI system, we still depend
on some of the OF infrastructure), and won't fo away.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Will Deacon <will@kernel.org>
[maz: Drop the "HAVE_KVM if OF" dependency, as OF is always there on arm64,
 new commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210921222231.518092-3-seanjc@google.com
2 years agoKVM: arm64: Unconditionally include generic KVM's Kconfig
Sean Christopherson [Tue, 21 Sep 2021 22:22:30 +0000 (15:22 -0700)]
KVM: arm64: Unconditionally include generic KVM's Kconfig

Unconditionally "source" the generic KVM Kconfig instead of wrapping it
with KVM=y.  A future patch will select HAVE_KVM so that referencing
HAVE_KVM in common kernel code doesn't break, and because KVM=y and
HAVE_KVM=n is weird.  Source the generic KVM Kconfig unconditionally so
that HAVE_KVM and KVM don't end up with a circular dependency.

Note, all but one of generic KVM's "configs" are of the HAVE_XYZ nature,
and the one outlier correctly takes a dependency on CONFIG_KVM, i.e. the
generic Kconfig is intended to be included unconditionally.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
[maz: made NVHE_EL2_DEBUG depend on KVM]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210921222231.518092-2-seanjc@google.com
2 years agoKVM: arm64: Allow KVM to be disabled from the command line
Marc Zyngier [Fri, 1 Oct 2021 17:05:53 +0000 (18:05 +0100)]
KVM: arm64: Allow KVM to be disabled from the command line

Although KVM can be compiled out of the kernel, it cannot be disabled
at runtime. Allow this possibility by introducing a new mode that
will prevent KVM from initialising.

This is useful in the (limited) circumstances where you don't want
KVM to be available (what is wrong with you?), or when you want
to install another hypervisor instead (good luck with that).

Reviewed-by: David Brazdil <dbrazdil@google.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Scull <ascull@google.com>
Link: https://lore.kernel.org/r/20211001170553.3062988-1-maz@kernel.org
2 years agoMerge branch kvm-arm64/vgic-ipa-checks into kvmarm-master/next
Marc Zyngier [Mon, 11 Oct 2021 08:43:06 +0000 (09:43 +0100)]
Merge branch kvm-arm64/vgic-ipa-checks into kvmarm-master/next

* kvm-arm64/vgic-ipa-checks:
  : .
  : Add extra checks to prevent ther various GIC regions to land
  : outside of the IPA space (and tests to verify that it works).
  : .
  KVM: arm64: selftests: Add init ITS device test
  KVM: arm64: selftests: Add test for legacy GICv3 REDIST base partially above IPA range
  KVM: arm64: selftests: Add tests for GIC redist/cpuif partially above IPA range
  KVM: arm64: selftests: Add some tests for GICv2 in vgic_init
  KVM: arm64: selftests: Make vgic_init/vm_gic_create version agnostic
  KVM: arm64: selftests: Make vgic_init gic version agnostic
  KVM: arm64: vgic: Drop vgic_check_ioaddr()
  KVM: arm64: vgic-v3: Check ITS region is not above the VM IPA size
  KVM: arm64: vgic-v2: Check cpu interface region is not above the VM IPA size
  KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size
  kvm: arm64: vgic: Introduce vgic_check_iorange

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: selftests: Add init ITS device test
Ricardo Koller [Tue, 5 Oct 2021 01:19:21 +0000 (18:19 -0700)]
KVM: arm64: selftests: Add init ITS device test

Add some ITS device init tests: general KVM device tests (address not
defined already, address aligned) and tests for the ITS region being
within the addressable IPA range.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-12-ricarkol@google.com
2 years agoKVM: arm64: selftests: Add test for legacy GICv3 REDIST base partially above IPA...
Ricardo Koller [Tue, 5 Oct 2021 01:19:20 +0000 (18:19 -0700)]
KVM: arm64: selftests: Add test for legacy GICv3 REDIST base partially above IPA range

Add a new test into vgic_init which checks that the first vcpu fails to
run if there is not sufficient REDIST space below the addressable IPA
range.  This only applies to the KVM_VGIC_V3_ADDR_TYPE_REDIST legacy API
as the required REDIST space is not know when setting the DIST region.

Note that using the REDIST_REGION API results in a different check at
first vcpu run: that the number of redist regions is enough for all
vcpus. And there is already a test for that case in, the first step of
test_v3_new_redist_regions.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-11-ricarkol@google.com
2 years agoKVM: arm64: selftests: Add tests for GIC redist/cpuif partially above IPA range
Ricardo Koller [Tue, 5 Oct 2021 01:19:19 +0000 (18:19 -0700)]
KVM: arm64: selftests: Add tests for GIC redist/cpuif partially above IPA range

Add tests for checking that KVM returns the right error when trying to
set GICv2 CPU interfaces or GICv3 Redistributors partially above the
addressable IPA range. Also tighten the IPA range by replacing
KVM_CAP_ARM_VM_IPA_SIZE with the IPA range currently configured for the
guest (i.e., the default).

The check for the GICv3 redistributor created using the REDIST legacy
API is not sufficient as this new test only checks the check done using
vcpus already created when setting the base. The next commit will add
the missing test which verifies that the KVM check is done at first vcpu
run.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-10-ricarkol@google.com
2 years agoKVM: arm64: selftests: Add some tests for GICv2 in vgic_init
Ricardo Koller [Tue, 5 Oct 2021 01:19:18 +0000 (18:19 -0700)]
KVM: arm64: selftests: Add some tests for GICv2 in vgic_init

Add some GICv2 tests: general KVM device tests and DIST/CPUIF overlap
tests.  Do this by making test_vcpus_then_vgic and test_vgic_then_vcpus
in vgic_init GIC version agnostic.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-9-ricarkol@google.com
2 years agoKVM: arm64: selftests: Make vgic_init/vm_gic_create version agnostic
Ricardo Koller [Tue, 5 Oct 2021 01:19:17 +0000 (18:19 -0700)]
KVM: arm64: selftests: Make vgic_init/vm_gic_create version agnostic

Make vm_gic_create GIC version agnostic in the vgic_init test. Also
add a nr_vcpus arg into it instead of defaulting to NR_VCPUS.

No functional change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-8-ricarkol@google.com
2 years agoKVM: arm64: selftests: Make vgic_init gic version agnostic
Ricardo Koller [Tue, 5 Oct 2021 01:19:16 +0000 (18:19 -0700)]
KVM: arm64: selftests: Make vgic_init gic version agnostic

As a preparation for the next commits which will add some tests for
GICv2, make aarch64/vgic_init GIC version agnostic. Add a new generic
run_tests function(gic_dev_type) that starts all applicable tests using
GICv3 or GICv2. GICv2 tests are attempted if GICv3 is not available in
the system. There are currently no GICv2 tests, but the test passes now
in GICv2 systems.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-7-ricarkol@google.com
2 years agoKVM: arm64: vgic: Drop vgic_check_ioaddr()
Ricardo Koller [Tue, 5 Oct 2021 01:19:15 +0000 (18:19 -0700)]
KVM: arm64: vgic: Drop vgic_check_ioaddr()

There are no more users of vgic_check_ioaddr(). Move its checks to
vgic_check_iorange() and then remove it.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-6-ricarkol@google.com
2 years agoKVM: arm64: vgic-v3: Check ITS region is not above the VM IPA size
Ricardo Koller [Tue, 5 Oct 2021 01:19:14 +0000 (18:19 -0700)]
KVM: arm64: vgic-v3: Check ITS region is not above the VM IPA size

Verify that the ITS region does not extend beyond the VM-specified IPA
range (phys_size).

  base + size > phys_size AND base < phys_size

Add the missing check into vgic_its_set_attr() which is called when
setting the region.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-5-ricarkol@google.com
2 years agoKVM: arm64: vgic-v2: Check cpu interface region is not above the VM IPA size
Ricardo Koller [Tue, 5 Oct 2021 01:19:13 +0000 (18:19 -0700)]
KVM: arm64: vgic-v2: Check cpu interface region is not above the VM IPA size

Verify that the GICv2 CPU interface does not extend beyond the
VM-specified IPA range (phys_size).

  base + size > phys_size AND base < phys_size

Add the missing check into kvm_vgic_addr() which is called when setting
the region. This patch also enables some superfluous checks for the
distributor (vgic_check_ioaddr was enough as alignment == size for the
distributors).

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-4-ricarkol@google.com
2 years agoKVM: arm64: vgic-v3: Check redist region is not above the VM IPA size
Ricardo Koller [Tue, 5 Oct 2021 01:19:12 +0000 (18:19 -0700)]
KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size

Verify that the redistributor regions do not extend beyond the
VM-specified IPA range (phys_size). This can happen when using
KVM_VGIC_V3_ADDR_TYPE_REDIST or KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS
with:

  base + size > phys_size AND base < phys_size

Add the missing check into vgic_v3_alloc_redist_region() which is called
when setting the regions, and into vgic_v3_check_base() which is called
when attempting the first vcpu-run. The vcpu-run check does not apply to
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS because the regions size is known
before the first vcpu-run. Note that using the REDIST_REGIONS API
results in a different check, which already exists, at first vcpu run:
that the number of redist regions is enough for all vcpus.

Finally, this patch also enables some extra tests in
vgic_v3_alloc_redist_region() by calculating "size" early for the legacy
redist api: like checking that the REDIST region can fit all the already
created vcpus.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-3-ricarkol@google.com
2 years agokvm: arm64: vgic: Introduce vgic_check_iorange
Ricardo Koller [Tue, 5 Oct 2021 01:19:11 +0000 (18:19 -0700)]
kvm: arm64: vgic: Introduce vgic_check_iorange

Add the new vgic_check_iorange helper that checks that an iorange is
sane: the start address and size have valid alignments, the range is
within the addressable PA range, start+size doesn't overflow, and the
start wasn't already defined.

No functional change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-2-ricarkol@google.com
2 years agoMerge branch kvm-arm64/pkvm/restrict-hypercalls into kvmarm-master/next
Marc Zyngier [Mon, 11 Oct 2021 08:21:37 +0000 (09:21 +0100)]
Merge branch kvm-arm64/pkvm/restrict-hypercalls into kvmarm-master/next

* kvm-arm64/pkvm/restrict-hypercalls:
  : .
  : Restrict the use of some hypercalls as well as kexec once
  : the protected KVM mode has been initialised.
  : .
  KVM: arm64: Disable privileged hypercalls after pKVM finalisation
  KVM: arm64: Prevent re-finalisation of pKVM for a given CPU
  KVM: arm64: Propagate errors from __pkvm_prot_finalize hypercall
  KVM: arm64: Reject stub hypercalls after pKVM has been initialised
  arm64: Prevent kexec and hibernation if is_protected_kvm_enabled()
  KVM: arm64: Turn __KVM_HOST_SMCCC_FUNC_* into an enum (mostly)

Signed-off-by: Marc Zyngier <maz@kernel.org>
2 years agoKVM: arm64: Disable privileged hypercalls after pKVM finalisation
Will Deacon [Fri, 8 Oct 2021 13:58:39 +0000 (14:58 +0100)]
KVM: arm64: Disable privileged hypercalls after pKVM finalisation

After pKVM has been 'finalised' using the __pkvm_prot_finalize hypercall,
the calling CPU will have a Stage-2 translation enabled to prevent access
to memory pages owned by EL2.

Although this forms a significant part of the process to deprivilege the
host kernel, we also need to ensure that the hypercall interface is
reduced so that the EL2 code cannot, for example, be re-initialised using
a new set of vectors.

Re-order the hypercalls so that only a suffix remains available after
finalisation of pKVM.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-7-will@kernel.org
2 years agoKVM: arm64: Prevent re-finalisation of pKVM for a given CPU
Will Deacon [Fri, 8 Oct 2021 13:58:38 +0000 (14:58 +0100)]
KVM: arm64: Prevent re-finalisation of pKVM for a given CPU

__pkvm_prot_finalize() completes the deprivilege of the host when pKVM
is in use by installing a stage-2 translation table for the calling CPU.

Issuing the hypercall multiple times for a given CPU makes little sense,
but in such a case just return early with -EPERM rather than go through
the whole page-table dance again.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-6-will@kernel.org
2 years agoKVM: arm64: Propagate errors from __pkvm_prot_finalize hypercall
Will Deacon [Fri, 8 Oct 2021 13:58:37 +0000 (14:58 +0100)]
KVM: arm64: Propagate errors from __pkvm_prot_finalize hypercall

If the __pkvm_prot_finalize hypercall returns an error, we WARN but fail
to propagate the failure code back to kvm_arch_init().

Pass a pointer to a zero-initialised return variable so that failure
to finalise the pKVM protections on a host CPU can be reported back to
KVM.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-5-will@kernel.org
2 years agoKVM: arm64: Reject stub hypercalls after pKVM has been initialised
Will Deacon [Fri, 8 Oct 2021 13:58:36 +0000 (14:58 +0100)]
KVM: arm64: Reject stub hypercalls after pKVM has been initialised

The stub hypercalls provide mechanisms to reset and replace the EL2 code,
so uninstall them once pKVM has been initialised in order to ensure the
integrity of the hypervisor code.

To ensure pKVM initialisation remains functional, split cpu_hyp_reinit()
into two helper functions to separate usage of the stub from usage of
pkvm hypercalls either side of __pkvm_init on the boot CPU.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-4-will@kernel.org
2 years agoarm64: Prevent kexec and hibernation if is_protected_kvm_enabled()
Will Deacon [Fri, 8 Oct 2021 13:58:35 +0000 (14:58 +0100)]
arm64: Prevent kexec and hibernation if is_protected_kvm_enabled()

When pKVM is enabled, the hypervisor code at EL2 and its data structures
are inaccessible to the host kernel and cannot be torn down or replaced
as this would defeat the integrity properies which pKVM aims to provide.
Furthermore, the ABI between the host and EL2 is flexible and private to
whatever the current implementation of KVM requires and so booting a new
kernel with an old EL2 component is very likely to end in disaster.

In preparation for uninstalling the hyp stub calls which are relied upon
to reset EL2, disable kexec and hibernation in the host when protected
KVM is enabled.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-3-will@kernel.org
2 years agoKVM: arm64: Turn __KVM_HOST_SMCCC_FUNC_* into an enum (mostly)
Marc Zyngier [Fri, 8 Oct 2021 13:58:34 +0000 (14:58 +0100)]
KVM: arm64: Turn __KVM_HOST_SMCCC_FUNC_* into an enum (mostly)

__KVM_HOST_SMCCC_FUNC_* is a royal pain, as there is a fair amount
of churn around these #defines, and we avoid making it an enum
only for the sake of the early init, low level code that requires
__KVM_HOST_SMCCC_FUNC___kvm_hyp_init to be usable from assembly.

Let's be brave and turn everything but this symbol into an enum,
using a bit of arithmetic to avoid any overlap.

Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/877depq9gw.wl-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-2-will@kernel.org
2 years agoLinux 5.15-rc4
Linus Torvalds [Sun, 3 Oct 2021 21:08:47 +0000 (14:08 -0700)]
Linux 5.15-rc4

2 years agoelf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings
Chen Jingwen [Tue, 28 Sep 2021 12:56:57 +0000 (20:56 +0800)]
elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings

In commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
load_elf_interp.

Unfortunately, this will cause kernel to fail to start with:

    1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
    Failed to execute /init (error -17)

The reason is that the elf interpreter (ld.so) has overlapping segments.

  readelf -l ld-2.31.so
  Program Headers:
    Type           Offset             VirtAddr           PhysAddr
                   FileSiz            MemSiz              Flags  Align
    LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                   0x000000000002c94c 0x000000000002c94c  R E    0x10000
    LOAD           0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
                   0x00000000000021e8 0x0000000000002320  RW     0x10000
    LOAD           0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
                   0x00000000000011ac 0x0000000000001328  RW     0x10000

The reason for this problem is the same as described in commit
ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments").

Not only executable binaries, elf interpreters (e.g. ld.so) can have
overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go
back to MAP_FIXED in load_elf_interp.

Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
Cc: <stable@vger.kernel.org> # v4.19
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 3 Oct 2021 20:56:53 +0000 (13:56 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix a number of ext4 bugs in fast_commit, inline data, and delayed
  allocation.

  Also fix error handling code paths in ext4_dx_readdir() and
  ext4_fill_super().

  Finally, avoid a grabbing a journal head in the delayed allocation
  write in the common cases where we are overwriting a pre-existing
  block or appending to an inode"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: recheck buffer uptodate bit under buffer lock
  ext4: fix potential infinite loop in ext4_dx_readdir()
  ext4: flush s_error_work before journal destroy in ext4_fill_super
  ext4: fix loff_t overflow in ext4_max_bitmap_size()
  ext4: fix reserved space counter leakage
  ext4: limit the number of blocks in one ADD_RANGE TLV
  ext4: enforce buffer head state assertion in ext4_da_map_blocks
  ext4: remove extent cache entries when truncating inline data
  ext4: drop unnecessary journal handle in delalloc write
  ext4: factor out write end code of inline file
  ext4: correct the error path of ext4_write_inline_data_end()
  ext4: check and update i_disksize properly
  ext4: add error checking to ext4_ext_replay_set_iblocks()

2 years agoobjtool: print out the symbol type when complaining about it
Linus Torvalds [Sun, 3 Oct 2021 20:45:48 +0000 (13:45 -0700)]
objtool: print out the symbol type when complaining about it

The objtool warning that the kvm instruction emulation code triggered
wasn't very useful:

    arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception

in that it helpfully tells you which symbol name it had trouble figuring
out the relocation for, but it doesn't actually say what the unknown
symbol type was that triggered it all.

In this case it was because of missing type information (type 0, aka
STT_NOTYPE), but on the whole it really should just have printed that
out as part of the message.

Because if this warning triggers, that's very much the first thing you
want to know - why did reloc2sec_off() return failure for that symbol?

So rather than just saying you can't handle some type of symbol without
saying what the type _was_, just print out the type number too.

Fixes: 24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agokvm: fix objtool relocation warning
Linus Torvalds [Sun, 3 Oct 2021 20:34:19 +0000 (13:34 -0700)]
kvm: fix objtool relocation warning

The recent change to make objtool aware of more symbol relocation types
(commit 24ff65257375: "objtool: Teach get_alt_entry() about more
relocation types") also added another check, and resulted in this
objtool warning when building kvm on x86:

    arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception

The reason seems to be that kvm_fastop_exception() is marked as a global
symbol, which causes the relocation to ke kept around for objtool.  And
at the same time, the kvm_fastop_exception definition (which is done as
an inline asm statement) doesn't actually set the type of the global,
which then makes objtool unhappy.

The minimal fix is to just not mark kvm_fastop_exception as being a
global symbol.  It's only used in that one compilation unit anyway, so
it was always pointless.  That's how all the other local exception table
labels are done.

I'm not entirely happy about the kinds of games that the kvm code plays
with doing its own exception handling, and the fact that it confused
objtool is most definitely a symptom of the code being a bit too subtle
and ad-hoc.  But at least this trivial one-liner makes objtool no longer
upset about what is going on.

Fixes: 24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types")
Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
Cc: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'char-misc-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sun, 3 Oct 2021 18:19:02 +0000 (11:19 -0700)]
Merge tag 'char-misc-5.15-rc4' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small misc driver fixes for 5.15-rc4. They are in two
  "groups":

   - ipack driver fixes for issues found by Johan Hovold

   - interconnect driver fixes for reported problems

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  ipack: ipoctal: fix module reference leak
  ipack: ipoctal: fix missing allocation-failure check
  ipack: ipoctal: fix tty-registration error handling
  ipack: ipoctal: fix tty registration race
  ipack: ipoctal: fix stack information leak
  interconnect: qcom: sdm660: Add missing a2noc qos clocks
  dt-bindings: interconnect: sdm660: Add missing a2noc qos clocks
  interconnect: qcom: sdm660: Correct NOC_QOS_PRIORITY shift and mask
  interconnect: qcom: sdm660: Fix id of slv_cnoc_mnoc_cfg

2 years agoMerge tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 3 Oct 2021 18:10:09 +0000 (11:10 -0700)]
Merge tag 'driver-core-5.15-rc4' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some driver core and kernfs fixes for reported issues for
  5.15-rc4. These fixes include:

   - kernfs positive dentry bugfix

   - debugfs_create_file_size error path fix

   - cpumask sysfs file bugfix to preserve the user/kernel abi (has been
     reported multiple times.)

   - devlink fixes for mdiobus devices as reported by the subsystem
     maintainers.

  Also included in here are some devlink debugging changes to make it
  easier for people to report problems when asked. They have already
  helped with the mdiobus and other subsystems reporting issues.

  All of these have been linux-next for a while with no reported issues"

* tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: also call kernfs_set_rev() for positive dentry
  driver core: Add debug logs when fwnode links are added/deleted
  driver core: Create __fwnode_link_del() helper function
  driver core: Set deferred probe reason when deferred by driver core
  net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
  driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  driver core: fw_devlink: Improve handling of cyclic dependencies
  cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf
  debugfs: debugfs_create_file_size(): use IS_ERR to check for error

2 years agoMerge tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 3 Oct 2021 17:49:00 +0000 (10:49 -0700)]
Merge tag 'sched_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Tell the compiler to always inline is_percpu_thread()

 - Make sure tunable_scaling buffer is null-terminated after an update
   in sysfs

 - Fix LTP named regression due to cgroup list ordering

* tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Always inline is_percpu_thread()
  sched/fair: Null terminate buffer when updating tunable_scaling
  sched/fair: Add ancestors of unthrottled undecayed cfs_rq

2 years agoMerge tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 3 Oct 2021 17:32:27 +0000 (10:32 -0700)]
Merge tag 'perf_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Make sure the destroy callback is reset when a event initialization
   fails

 - Update the event constraints for Icelake

 - Make sure the active time of an event is updated even for inactive
   events

* tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: fix userpage->time_enabled of inactive events
  perf/x86/intel: Update event constraints for ICX
  perf/x86: Reset destroy callback on event init failure

2 years agoMerge tag 'objtool_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 3 Oct 2021 17:23:54 +0000 (10:23 -0700)]
Merge tag 'objtool_urgent_for_v5.15_rc4' of git://git./linux/kernel/git/tip/tip

Pull objtool fix from Borislav Petkov:

 - Handle symbol relocations properly due to changes in the toolchains
   which remove section symbols now

* tag 'objtool_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Teach get_alt_entry() about more relocation types

2 years agoMerge tag 'hwmon-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 3 Oct 2021 00:51:01 +0000 (17:51 -0700)]
Merge tag 'hwmon-for-v5.15-rc4' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Fixed various potential NULL pointer accesses in w8379* drivers

 - Improved error handling, fault reporting, and fixed rounding in
   thmp421 driver

 - Fixed error handling in ltc2947 driver

 - Added missing attribute to pmbus/mp2975 driver

 - Fixed attribute values in pbus/ibm-cffps, occ, and mlxreg-fan
   drivers

 - Removed unused residual code from k10temp driver

* tag 'hwmon-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary structure field
  hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary structure field
  hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary structure field
  hwmon: (pmbus/mp2975) Add missed POUT attribute for page 1 mp2975 controller
  hwmon: (pmbus/ibm-cffps) max_power_out swap changes
  hwmon: (occ) Fix P10 VRM temp sensors
  hwmon: (ltc2947) Properly handle errors when looking for the external clock
  hwmon: (tmp421) fix rounding for negative values
  hwmon: (tmp421) report /PVLD condition as fault
  hwmon: (tmp421) handle I2C errors
  hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced from sysfs
  hwmon: (k10temp) Remove residues of current and voltage

2 years agoMerge tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Sun, 3 Oct 2021 00:43:54 +0000 (17:43 -0700)]
Merge tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server fixes from Steve French:
 "Eleven fixes for the ksmbd kernel server, mostly security related:

   - an important fix for disabling weak NTLMv1 authentication

   - seven security (improved buffer overflow checks) fixes

   - fix for wrong infolevel struct used in some getattr/setattr paths

   - two small documentation fixes"

* tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: missing check for NULL in convert_to_nt_pathname()
  ksmbd: fix transform header validation
  ksmbd: add buffer validation for SMB2_CREATE_CONTEXT
  ksmbd: add validation in smb2 negotiate
  ksmbd: add request buffer validation in smb2_set_info
  ksmbd: use correct basic info level in set_file_basic_info()
  ksmbd: remove NTLMv1 authentication
  ksmbd: fix documentation for 2 functions
  MAINTAINERS: rename cifs_common to smbfs_common in cifs and ksmbd entry
  ksmbd: fix invalid request buffer access in compound
  ksmbd: remove RFC1002 check in smb2 request

2 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 2 Oct 2021 19:56:03 +0000 (12:56 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Five fairly minor fixes and spelling updates, all in drivers. Even
  though the ufs fix is in tracing, it's a potentially exploitable use
  beyond end of array bug"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: csiostor: Add module softdep on cxgb4
  scsi: qla2xxx: Fix excessive messages during device logout
  scsi: virtio_scsi: Fix spelling mistake "Unsupport" -> "Unsupported"
  scsi: ses: Fix unsigned comparison with less than zero
  scsi: ufs: Fix illegal offset in UPIU event trace

2 years agoMerge tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 2 Oct 2021 18:00:36 +0000 (11:00 -0700)]
Merge tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few block fixes for this release:

   - Revert a BFQ commit that causes breakage for people. Unfortunately
     it was auto-selected for stable as well, so now 5.14.7 suffers from
     it too. Hopefully stable will pick up this revert quickly too, so
     we can remove the issue on that end as well.

   - Add a quirk for Apple NVMe controllers, which due to their
     non-compliance broke due to the introduction of command sequences
     (Keith)

   - Use shifts in nbd, fixing a __divdi3 issue (Nick)"

* tag 'block-5.15-2021-10-01' of git://git.kernel.dk/linux-block:
  nbd: use shifts rather than multiplies
  Revert "block, bfq: honor already-setup queue merges"
  nvme: add command id quirk for apple controllers

2 years agoMerge tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 2 Oct 2021 17:26:19 +0000 (10:26 -0700)]
Merge tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two fixes in here:

   - The signal issue that was discussed start of this week (me).

   - Kill dead fasync support in io_uring. Looks like it was broken
     since io_uring was initially merged, and given that nobody has ever
     complained about it, let's just kill it (Pavel)"

* tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block:
  io_uring: kill fasync
  io-wq: exclusively gate signal based exit on get_signal() return

2 years agoMerge tag 'libnvdimm-fixes-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 2 Oct 2021 17:08:35 +0000 (10:08 -0700)]
Merge tag 'libnvdimm-fixes-5.15-rc4' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
 "A fix for a regression added this cycle in the pmem driver, and for a
  long standing bug for failed NUMA node lookups on ARM64.

  This has appeared in -next for several days with no reported issues.

  Summary:

   - Fix a regression that caused the sysfs ABI for pmem block devices
     to not be registered. This fails the nvdimm unit tests and dax
     xfstests.

   - Fix numa node lookups for dax-kmem memory (device-dax memory
     assigned to the page allocator) on ARM64"

* tag 'libnvdimm-fixes-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nvdimm/pmem: fix creating the dax group
  ACPI: NFIT: Use fallback node id when numa info in NFIT table is incorrect

2 years agocachefiles: Fix oops in trace_cachefiles_mark_buried due to NULL object
Dave Wysochanski [Fri, 1 Oct 2021 14:37:31 +0000 (15:37 +0100)]
cachefiles: Fix oops in trace_cachefiles_mark_buried due to NULL object

In cachefiles_mark_object_buried, the dentry in question may not have an
owner, and thus our cachefiles_object pointer may be NULL when calling
the tracepoint, in which case we will also not have a valid debug_id to
print in the tracepoint.

Check for NULL object in the tracepoint and if so, just set debug_id to
MAX_UINT as was done in 2908f5e101e3 ("fscache: Add a cookie debug ID
and use that in traces").

This fixes the following oops:

    FS-Cache: Cache "mycache" added (type cachefiles)
    CacheFiles: File cache on vdc registered
    ...
    Workqueue: fscache_object fscache_object_work_func [fscache]
    RIP: 0010:trace_event_raw_event_cachefiles_mark_buried+0x4e/0xa0 [cachefiles]
    ....
    Call Trace:
     cachefiles_mark_object_buried+0xa5/0xb0 [cachefiles]
     cachefiles_bury_object+0x270/0x430 [cachefiles]
     cachefiles_walk_to_object+0x195/0x9c0 [cachefiles]
     cachefiles_lookup_object+0x5a/0xc0 [cachefiles]
     fscache_look_up_object+0xd7/0x160 [fscache]
     fscache_object_work_func+0xb2/0x340 [fscache]
     process_one_work+0x1f1/0x390
     worker_thread+0x53/0x3e0
     kthread+0x127/0x150

Fixes: 2908f5e101e3 ("fscache: Add a cookie debug ID and use that in traces")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agodrm/i915: fix blank screen booting crashes
Hugh Dickins [Sat, 2 Oct 2021 10:17:29 +0000 (03:17 -0700)]
drm/i915: fix blank screen booting crashes

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and suprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: ffffffff814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>