review.tizen.org Git - platform/kernel/linux-rpi.git/log

commit | commitdiff | tree

Paolo Bonzini [Thu, 9 Dec 2021 19:10:04 +0000 (14:10 -0500)]

KVM: x86: avoid out of bounds indices for fixed performance counters

Because IceLake has 4 fixed performance counters but KVM only
supports 3, it is possible for reprogram_fixed_counters to pass
to reprogram_fixed_counter an index that is out of bounds for the
fixed_pmc_events array.

Ultimately intel_find_fixed_event, which is the only place that uses
fixed_pmc_events, handles this correctly because it checks against the
size of fixed_pmc_events anyway. Every other place operates on the
fixed_counters[] array which is sized according to INTEL_PMC_MAX_FIXED.
However, it is cleaner if the unsupported performance counters are culled
early on in reprogram_fixed_counters.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
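
The culling described above can be sketched roughly as follows. This is an illustrative Python model, not the kernel code: the event names in FIXED_PMC_EVENTS and the ctrl_changed callback are assumptions made for the example; only the idea of skipping indices beyond the size of the event table comes from the commit message.

```python
# Illustrative model only. The table contents are assumed; the commit only
# states that KVM supports 3 fixed counters.
FIXED_PMC_EVENTS = ["instructions", "cpu_cycles", "ref_cycles"]


def reprogram_fixed_counters(nr_hw_fixed_counters, ctrl_changed):
    """Return the indices that would be handed to reprogram_fixed_counter().

    ctrl_changed is a hypothetical callback telling whether the control
    field of counter idx differs from its previous value.
    """
    reprogrammed = []
    for idx in range(nr_hw_fixed_counters):
        if not ctrl_changed(idx):
            continue
        # Cull counters that have no entry in the event table, so the
        # callee never sees an out-of-bounds index.
        if idx >= len(FIXED_PMC_EVENTS):
            continue
        reprogrammed.append(idx)
    return reprogrammed


# A CPU exposing 4 fixed counters, with every control field changed.
print(reprogram_fixed_counters(4, lambda idx: True))
```

With four hardware counters and three table entries, index 3 is dropped before any lookup happens.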

commit | commitdiff | tree

Lai Jiangshan [Thu, 16 Dec 2021 02:19:38 +0000 (10:19 +0800)]

KVM: VMX: Mark VCPU_EXREG_CR3 dirty when !CR0_PG -> CR0_PG if EPT + !URG

When !CR0_PG -> CR0_PG, vcpu->arch.cr3 becomes active, but GUEST_CR3 is
still vmx->ept_identity_map_addr if EPT + !URG. So VCPU_EXREG_CR3 is
considered to be dirty and GUEST_CR3 needs to be updated in this case.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211216021938.11752-4-jiangshanlai@gmail.com>
Fixes: c62c7bd4f95b ("KVM: VMX: Update vmcs.GUEST_CR3 only when the guest CR3 is dirty")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Lai Jiangshan [Thu, 16 Dec 2021 02:19:37 +0000 (10:19 +0800)]

KVM: x86/mmu: Reconstruct shadow page root if the guest PDPTEs is changed

For shadow paging, the page table needs to be reconstructed before the
next VMENTER if the guest PDPTEs have changed.

But not all paths that call load_pdptrs() will cause the page tables to be
reconstructed. Normally, kvm_mmu_reset_context() and kvm_mmu_free_roots()
are used to launch later reconstruction.

Commit d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and
CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs()
when changing CR0.CD and CR0.NW.

Commit 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current
PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after
load_pdptrs() when rewriting CR3 with the same value.

Commit a91a7c709600 ("KVM: X86: Don't reset mmu context when
toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
load_pdptrs() when changing CR4.PGE.

Guests like Linux keep the PDPTEs unchanged for every instance of a
page table, so the missing reconstruction causes no problems for Linux
guests.

Fixes: d81135a57aa6("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
Fixes: 21823fbda552("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
Fixes: a91a7c709600("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211216021938.11752-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Lai Jiangshan [Thu, 16 Dec 2021 02:19:36 +0000 (10:19 +0800)]

KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()

The host CR3 in the vcpu thread can only be changed when scheduling,
so commit 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
changed vmx.c to only save it in vmx_prepare_switch_to_guest().

However, it also has to be synced in vmx_sync_vmcs_host_state() when switching VMCS.
vmx_set_host_fs_gs() is called in both places, so rename it to
vmx_set_vmcs_host_state() and make it update HOST_CR3.

Fixes: 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211216021938.11752-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Paolo Bonzini [Fri, 10 Dec 2021 23:13:37 +0000 (18:13 -0500)]

Revert "KVM: X86: Update mmu->pdptrs only when it is changed"

This reverts commit 24cd19a28cb7174df502162641d6e1e12e7ffbd9.
Sean Christopherson reports:

"Commit 24cd19a28cb7 ('KVM: X86: Update mmu->pdptrs only when it is
changed') breaks nested VMs with EPT in L0 and PAE shadow paging in L2.
Reproducing is trivial, just disable EPT in L1 and run a VM. I haven't
investigating how it breaks things."

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Peter Gonda [Wed, 8 Dec 2021 19:16:42 +0000 (11:16 -0800)]

selftests: KVM: sev_migrate_tests: Add mirror command tests

Add tests to confirm that mirror VMs can run only the correct subset of
commands.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Message-Id: <20211208191642.3792819-4-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Peter Gonda [Wed, 8 Dec 2021 19:16:41 +0000 (11:16 -0800)]

selftests: KVM: sev_migrate_tests: Fix sev_ioctl()

The TEST_ASSERT in sev_ioctl() was allowing errors because it checked
that the return value was good OR the FW error code was OK. This
TEST_ASSERT should require that both values (i.e. AND) are OK. Remove
the LAUNCH_START from the mirror VM, because this call correctly fails:
mirror VMs cannot issue this command. Currently, issues with the PSP
driver functions mean the firmware error is not always reset to
SEV_RET_SUCCESS when a call is successful. Mainly, sev_platform_init()
doesn't correctly set the fw error if the platform has already been
initialized.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Message-Id: <20211208191642.3792819-3-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
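
The difference between the two checks can be sketched as below. This is a minimal Python model, not the selftest code: the function names are hypothetical, and SEV_RET_SUCCESS is assumed to be 0 for the purpose of the example.

```python
SEV_RET_SUCCESS = 0  # assumed value of the firmware success code


def check_with_or(ret, fw_error):
    """Old behaviour as described: passes if either value looks fine."""
    return ret == 0 or fw_error == SEV_RET_SUCCESS


def check_with_and(ret, fw_error):
    """Fixed behaviour as described: both values must look fine."""
    return ret == 0 and fw_error == SEV_RET_SUCCESS


# A failed ioctl (ret == -1) that left the firmware error at "success".
failed_call = (-1, SEV_RET_SUCCESS)
print(check_with_or(*failed_call))   # the failure goes unnoticed
print(check_with_and(*failed_call))  # the failure is caught
```

The OR form hides a failed ioctl whenever the firmware error field happens to read as success, which is the case the commit message describes.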

commit | commitdiff | tree

Peter Gonda [Wed, 8 Dec 2021 19:16:40 +0000 (11:16 -0800)]

selftests: KVM: sev_migrate_tests: Fix test_sev_mirror()

Mirrors should not be able to call LAUNCH_START. Remove the call on the
mirror to correct the test before fixing sev_ioctl() to correctly assert
on this failed ioctl.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Message-Id: <20211208191642.3792819-2-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Paolo Bonzini [Fri, 7 Jan 2022 15:43:02 +0000 (10:43 -0500)]

Merge tag 'kvm-riscv-5.17-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 5.17, take #1

- Use common KVM implementation of MMU memory caches
- SBI v0.2 support for Guest
- Initial KVM selftests support
- Fix to avoid spurious virtual interrupts after clearing hideleg CSR
- Update email address for Anup and Atish

commit | commitdiff | tree

Paolo Bonzini [Fri, 7 Jan 2022 15:42:19 +0000 (10:42 -0500)]

Merge tag 'kvmarm-5.17' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.17

- Simplification of the 'vcpu first run' by integrating it into
  KVM's 'pid change' flow

- Refactoring of the FP and SVE state tracking, also leading to
  a simpler state and less shared data between EL1 and EL2 in
  the nVHE case

- Tidy up the header file usage for the nvhe hyp object

- New HYP unsharing mechanism, finally allowing pages to be
  unmapped from the Stage-1 EL2 page-tables

- Various pKVM cleanups around refcounting and sharing

- A couple of vgic fixes for bugs that would trigger once
  the vcpu xarray rework is merged, but not sooner

- Add minimal support for ARMv8.7's PMU extension

- Rework kvm_pgtable initialisation ahead of the NV work

- New selftest for IRQ injection

- Teach selftests about the lack of default IPA space and
  page sizes

- Expand sysreg selftest to deal with Pointer Authentication

- The usual bunch of cleanups and doc update

commit | commitdiff | tree

Anup Patel [Mon, 3 Jan 2022 13:24:58 +0000 (18:54 +0530)]

MAINTAINERS: Update Anup's email address

I no longer work at Western Digital, so update my email address to my
personal one and add entries to .mailmap as well.

Signed-off-by: Anup Patel <anup@brainfault.org>
Acked-by: Atish Patra <atishp@rivosinc.com>

commit | commitdiff | tree

Vincent Chen [Mon, 27 Dec 2021 03:05:14 +0000 (11:05 +0800)]

KVM: RISC-V: Avoid spurious virtual interrupts after clearing hideleg CSR

When the last VM is terminated, the host kernel will invoke function
hardware_disable_nolock() on each CPU to disable the related virtualization
functions. Here, RISC-V currently only clears hideleg CSR and hedeleg CSR.
This behavior will cause the host kernel to receive spurious interrupts if
hvip CSR has pending interrupts and the corresponding enable bits in vsie
CSR are asserted. To avoid it, hvip CSR and vsie CSR must be cleared
before clearing hideleg CSR.

Fixes: 99cdc6c18c2d ("RISC-V: Add initial skeletal KVM support")
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

commit | commitdiff | tree

Anup Patel [Tue, 5 Oct 2021 12:39:56 +0000 (18:09 +0530)]

KVM: selftests: Add initial support for RISC-V 64-bit

We add initial support for RISC-V 64-bit in KVM selftests, with which
we can cross-compile and run arch-independent tests such as:
demand_paging_test
dirty_log_test
kvm_create_max_vcpus
kvm_page_table_test
set_memory_region_test
kvm_binary_stats_test

All VM guest modes defined in kvm_util.h require at least a 48-bit
guest virtual address, so to use the KVM RISC-V selftests the hardware
needs to support at least the Sv48 MMU for the guest (i.e. VS-mode).

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>

commit | commitdiff | tree

Anup Patel [Fri, 26 Nov 2021 13:03:45 +0000 (18:33 +0530)]

KVM: selftests: Add EXTRA_CFLAGS in top-level Makefile

We add EXTRA_CFLAGS to the common CFLAGS of top-level Makefile which will
allow users to pass additional compile-time flags such as "-static".

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-and-tested-by: Sean Christopherson <seanjc@google.com>

commit | commitdiff | tree

Anup Patel [Fri, 26 Nov 2021 11:35:51 +0000 (17:05 +0530)]

RISC-V: KVM: Add VM capability to allow userspace get GPA bits

The number of GPA bits supported for a RISC-V Guest/VM is based on the
MMU mode used by the G-stage translation. KVM RISC-V detects and uses
the best possible MMU mode for the G-stage in kvm_arch_init().

We add a generic VM capability KVM_CAP_VM_GPA_BITS which can be used by
the KVM userspace to get the number of GPA (guest physical address) bits
supported for a Guest/VM.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>

commit | commitdiff | tree

Anup Patel [Fri, 26 Nov 2021 05:18:41 +0000 (10:48 +0530)]

RISC-V: KVM: Forward SBI experimental and vendor extensions

The SBI experimental extension space is for temporary (or experimental)
stuff whereas SBI vendor extension space is for hardware vendor specific
stuff. Both these SBI extension spaces won't be standardized by the SBI
specification so let's blindly forward such SBI calls to the userspace.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>
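
A rough sketch of the forwarding decision follows. It is a Python model under assumptions: the two extension ID ranges are taken from the SBI specification as recalled here and should be checked against it, and the function name is hypothetical.

```python
# Extension ID ranges as recalled from the SBI specification (assumption).
SBI_EXT_EXPERIMENTAL = range(0x08000000, 0x09000000)
SBI_EXT_VENDOR = range(0x09000000, 0x0A000000)


def should_forward_to_userspace(ext_id):
    """Hypothetical helper: True for calls the kernel does not interpret."""
    return ext_id in SBI_EXT_EXPERIMENTAL or ext_id in SBI_EXT_VENDOR


for ext_id in (0x10, 0x08000001, 0x09FFFFFF):
    print(hex(ext_id), should_forward_to_userspace(ext_id))
```

Anything inside either range is passed through without inspection; IDs outside them (such as 0x10 in the loop above) would be left to in-kernel handling.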

commit | commitdiff | tree

Jisheng Zhang [Sun, 28 Nov 2021 16:07:39 +0000 (00:07 +0800)]

RISC-V: KVM: make kvm_riscv_vcpu_fp_clean() static

There are no users outside vcpu_fp.c so make kvm_riscv_vcpu_fp_clean()
static.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

commit | commitdiff | tree

Atish Patra [Thu, 2 Dec 2021 23:58:23 +0000 (15:58 -0800)]

MAINTAINERS: Update Atish's email address

I am no longer employed by Western Digital. Update my email address to
my personal one and add entries to .mailmap as well.

Signed-off-by: Atish Patra <atishp@atishpatra.org>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

commit | commitdiff | tree

Atish Patra [Thu, 18 Nov 2021 08:39:12 +0000 (00:39 -0800)]

RISC-V: KVM: Add SBI HSM extension in KVM

The SBI HSM extension allows the OS to start/stop harts at any time. It
also allows ordered booting of harts instead of random booting.

Implement the SBI HSM extension and designate vcpu 0 as the boot vcpu.
All other (non-zero, non-booting) vcpus should be brought up by an OS
that implements the HSM extension. If the guest OS doesn't implement
the HSM extension, only a single vcpu will be available to the OS.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

commit | commitdiff | tree

Atish Patra [Thu, 18 Nov 2021 08:39:11 +0000 (00:39 -0800)]

RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2

The SBI v0.2 contains some of the improved versions of required v0.1
extensions such as remote fence, timer and IPI.

This patch implements those extensions.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

commit | commitdiff | tree

Atish Patra [Thu, 18 Nov 2021 08:39:10 +0000 (00:39 -0800)]

RISC-V: KVM: Add SBI v0.2 base extension

The SBI v0.2 base extension is defined to allow backward compatibility
and probing of future extensions. This is also the only mandatory SBI
extension that must be implemented by SBI implementors.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

commit | commitdiff | tree

Atish Patra [Thu, 18 Nov 2021 08:39:09 +0000 (00:39 -0800)]

RISC-V: KVM: Reorganize SBI code by moving SBI v0.1 to its own file

With SBI v0.2, there may be more SBI extensions in future. It makes more
sense to group related extensions in separate files. Guest kernel will
choose appropriate SBI version dynamically.

Move the existing implementation to a separate file so that it can be
removed in future without much conflict.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

commit | commitdiff | tree

Atish Patra [Thu, 18 Nov 2021 08:39:08 +0000 (00:39 -0800)]

RISC-V: KVM: Mark the existing SBI implementation as v0.1

The existing SBI implementation follows the v0.1 specification. The
latest specification allows more scalability and performance
improvements.

Rename the existing implementation as v0.1 and provide a way
to allow future extensions.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>

commit | commitdiff | tree

Sean Christopherson [Thu, 4 Nov 2021 16:41:07 +0000 (16:41 +0000)]

KVM: RISC-V: Use common KVM implementation of MMU memory caches

Use common KVM's implementation of the MMU memory caches, which for all
intents and purposes is semantically identical to RISC-V's version, the
only difference being that the common implementation will fall back to an
atomic allocation if there's a KVM bug that triggers a cache underflow.

RISC-V appears to have based its MMU code on arm64 before the conversion
to the common caches in commit c1a33aebe91d ("KVM: arm64: Use common KVM
implementation of MMU memory caches"), despite having also copy-pasted
the definition of KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE in kvm_types.h.

Opportunistically drop the superfluous wrapper
kvm_riscv_stage2_flush_cache(), whose name is very, very confusing as
"cache flush" in the context of MMU code almost always refers to flushing
hardware caches, not freeing unused software objects.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
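
The fallback behaviour mentioned above can be illustrated with a small model. This is a Python sketch under assumptions: the class and method names are hypothetical, and a counter stands in for the "atomic allocation" the common implementation is said to fall back to on underflow.

```python
class MemoryCache:
    """Toy model of a pre-filled object cache with an underflow fallback."""

    def __init__(self, allocate):
        self._allocate = allocate        # callable producing a new object
        self._objects = []
        self.fallback_allocations = 0    # stands in for atomic allocations

    def topup(self, minimum):
        # Fill the cache ahead of time, where allocation may sleep.
        while len(self._objects) < minimum:
            self._objects.append(self._allocate())

    def alloc(self):
        if not self._objects:
            # Underflow: the caller did not top up enough. Fall back to a
            # direct allocation instead of failing outright.
            self.fallback_allocations += 1
            return self._allocate()
        return self._objects.pop()

    def free_all(self):
        # Freeing unused software objects, not flushing a hardware cache.
        self._objects.clear()


cache = MemoryCache(object)
cache.topup(2)
for _ in range(3):
    cache.alloc()
print(cache.fallback_allocations)
```

Three allocations against a cache topped up to two objects leave the fallback counter at one.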

commit | commitdiff | tree

Marc Zyngier [Tue, 4 Jan 2022 17:16:15 +0000 (17:16 +0000)]

Merge branch kvm-arm64/misc-5.17 into kvmarm-master/next

* kvm-arm64/misc-5.17:
  : .
  : Misc fixes and improvements:
  : - Add minimal support for ARMv8.7's PMU extension
  : - Constify kvm_io_gic_ops
  : - Drop kvm_is_transparent_hugepage() prototype
  : - Drop unused workaround_flags field
  : - Rework kvm_pgtable initialisation
  : - Documentation fixes
  : - Replace open-coded SCTLR_EL1.EE usage with its defined macro
  : - Sysreg list selftest update to handle PAuth
  : - Include cleanups
  : .
  KVM: arm64: vgic: Replace kernel.h with the necessary inclusions
  KVM: arm64: Fix comment typo in kvm_vcpu_finalize_sve()
  KVM: arm64: selftests: get-reg-list: Add pauth configuration
  KVM: arm64: Fix comment on barrier in kvm_psci_vcpu_on()
  KVM: arm64: Fix comment for kvm_reset_vcpu()
  KVM: arm64: Use defined value for SCTLR_ELx_EE
  KVM: arm64: Rework kvm_pgtable initialisation
  KVM: arm64: Drop unused workaround_flags vcpu field

Signed-off-by: Marc Zyngier <maz@kernel.org>

commit | commitdiff | tree

Andy Shevchenko [Tue, 4 Jan 2022 15:19:40 +0000 (17:19 +0200)]

KVM: arm64: vgic: Replace kernel.h with the necessary inclusions

arm_vgic.h does not require all the stuff that kernel.h provides.
Replace kernel.h inclusion with the list of what is really being used.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220104151940.55399-1-andriy.shevchenko@linux.intel.com

commit | commitdiff | tree

Marc Zyngier [Tue, 4 Jan 2022 14:03:43 +0000 (14:03 +0000)]

Merge branch kvm-arm64/selftest/irq-injection into kvmarm-master/next

* kvm-arm64/selftest/irq-injection:
  : .
  : New tests from Ricardo Koller:
  : "This series adds a new test, aarch64/vgic-irq, that validates the injection of
  : different types of IRQs from userspace using various methods and configurations"
  : .
  KVM: selftests: aarch64: Add test for restoring active IRQs
  KVM: selftests: aarch64: Add ISPENDR write tests in vgic_irq
  KVM: selftests: aarch64: Add tests for IRQFD in vgic_irq
  KVM: selftests: Add IRQ GSI routing library functions
  KVM: selftests: aarch64: Add test_inject_fail to vgic_irq
  KVM: selftests: aarch64: Add tests for LEVEL_INFO in vgic_irq
  KVM: selftests: aarch64: Level-sensitive interrupts tests in vgic_irq
  KVM: selftests: aarch64: Add preemption tests in vgic_irq
  KVM: selftests: aarch64: Cmdline arg to set EOI mode in vgic_irq
  KVM: selftests: aarch64: Cmdline arg to set number of IRQs in vgic_irq test
  KVM: selftests: aarch64: Abstract the injection functions in vgic_irq
  KVM: selftests: aarch64: Add vgic_irq to test userspace IRQ injection
  KVM: selftests: aarch64: Add vGIC library functions to deal with vIRQ state
  KVM: selftests: Add kvm_irq_line library function
  KVM: selftests: aarch64: Add GICv3 register accessor library functions
  KVM: selftests: aarch64: Add function for accessing GICv3 dist and redist registers
  KVM: selftests: aarch64: Move gic_v3.h to shared headers

Signed-off-by: Marc Zyngier <maz@kernel.org>

commit | commitdiff | tree

Marc Zyngier [Tue, 4 Jan 2022 14:03:26 +0000 (14:03 +0000)]

Merge branch kvm-arm64/selftest/ipa into kvmarm-master/next

* kvm-arm64/selftest/ipa:
  : .
  : Expand the KVM/arm64 selftest infrastructure to discover
  : supported page sizes at runtime, support 16kB pages, and
  : find out about the original M1 stupidly small IPA space.
  : .
  KVM: selftests: arm64: Add support for various modes with 16kB page size
  KVM: selftests: arm64: Add support for VM_MODE_P36V48_{4K,64K}
  KVM: selftests: arm64: Rework TCR_EL1 configuration
  KVM: selftests: arm64: Check for supported page sizes
  KVM: selftests: arm64: Introduce a variable default IPA size
  KVM: selftests: arm64: Initialise default guest mode at test startup time

Signed-off-by: Marc Zyngier <maz@kernel.org>

commit | commitdiff | tree

Zenghui Yu [Thu, 30 Dec 2021 14:15:35 +0000 (22:15 +0800)]

KVM: arm64: Fix comment typo in kvm_vcpu_finalize_sve()

kvm_arm_init_arch_resources() was renamed to kvm_arm_init_sve() in
commit a3be836df7cb ("KVM: arm/arm64: Demote
kvm_arm_init_arch_resources() to just set up SVE"). Fix the function
name in comment of kvm_vcpu_finalize_sve().

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211230141535.1389-1-yuzenghui@huawei.com

commit | commitdiff | tree

Marc Zyngier [Tue, 28 Dec 2021 12:14:14 +0000 (12:14 +0000)]

KVM: arm64: selftests: get-reg-list: Add pauth configuration

The get-reg-list test ignores the Pointer Authentication features,
which is a shame now that we have relatively common HW with this feature.

Define two new configurations (with and without PMU) that exercise the
KVM capabilities.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20211228121414.1013250-1-maz@kernel.org

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:39:06 +0000 (18:39 -0800)]

KVM: selftests: aarch64: Add test for restoring active IRQs

Add a test that restores multiple IRQs in active state. It does so by
writing into ISACTIVER from the guest and using KVM ioctls. This test
tries to emulate what would happen during a live migration: restore
active IRQs.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-18-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:39:05 +0000 (18:39 -0800)]

KVM: selftests: aarch64: Add ISPENDR write tests in vgic_irq

Add injection tests that use writing into the ISPENDR register (to mark
IRQs as pending). This is typically used by migration code.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-17-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:39:04 +0000 (18:39 -0800)]

KVM: selftests: aarch64: Add tests for IRQFD in vgic_irq

Add injection tests for the KVM_IRQFD ioctl into vgic_irq.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-16-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:39:03 +0000 (18:39 -0800)]

KVM: selftests: Add IRQ GSI routing library functions

Add an architecture independent wrapper function for creating and
writing IRQ GSI routing tables. Also add a function to add irqchip
entries.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-15-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:39:02 +0000 (18:39 -0800)]

KVM: selftests: aarch64: Add test_inject_fail to vgic_irq

Add tests for failed injections to vgic_irq. This tests that KVM can
handle bogus IRQ numbers.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-14-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:39:01 +0000 (18:39 -0800)]

KVM: selftests: aarch64: Add tests for LEVEL_INFO in vgic_irq

Add injection tests for the LEVEL_INFO ioctl (level-sensitive specific)
into vgic_irq.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-13-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:39:00 +0000 (18:39 -0800)]

KVM: selftests: aarch64: Level-sensitive interrupts tests in vgic_irq

Add a cmdline arg for using level-sensitive interrupts (vs the default
edge-triggered). Then move the handler into a generic handler function
that takes the type of interrupt (level vs. edge) as an arg. When
handling level-sensitive interrupts it sets the line to low after
acknowledging the IRQ.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-12-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:59 +0000 (18:38 -0800)]

KVM: selftests: aarch64: Add preemption tests in vgic_irq

Add tests for IRQ preemption (having more than one activated IRQ at the
same time). This test injects multiple concurrent IRQs and handles them
without handling the actual exceptions. This is done by masking
interrupts for the whole test.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-11-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:58 +0000 (18:38 -0800)]

KVM: selftests: aarch64: Cmdline arg to set EOI mode in vgic_irq

Add a new cmdline arg to set the EOI mode for all vgic_irq tests. This
specifies whether a write to EOIR will deactivate IRQs or not.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-10-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:57 +0000 (18:38 -0800)]

KVM: selftests: aarch64: Cmdline arg to set number of IRQs in vgic_irq test

Add the ability to specify the number of vIRQs exposed by KVM (arg
defaults to 64). Then extend the KVM_IRQ_LINE test by injecting all
available SPIs at once (specified by the nr-irqs arg). As a bonus,
inject all SGIs at once as well.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-9-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:56 +0000 (18:38 -0800)]

KVM: selftests: aarch64: Abstract the injection functions in vgic_irq

Build an abstraction around the injection functions, so the preparation
and checking around the actual injection can be shared between tests.
All functions are stored as pointers in arrays of kvm_inject_desc's
which include the pointer and what kind of interrupts they can inject.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-8-ricarkol@google.com
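
The abstraction described above might look roughly like this. It is a Python sketch, not the selftest's C code: InjectDesc, the two stand-in injection functions, and the interrupt kinds assigned to each are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

SGI, PPI, SPI = "sgi", "ppi", "spi"


@dataclass(frozen=True)
class InjectDesc:
    """A function pointer plus the kinds of interrupt it can inject."""
    name: str
    inject: Callable[[int], str]
    kinds: FrozenSet[str]


def inject_via_irq_line(intid):
    return f"KVM_IRQ_LINE({intid})"   # stand-in for the real ioctl call


def inject_via_ispendr(intid):
    return f"ISPENDR({intid})"        # stand-in for the register write


# Which kinds each method supports is assumed here, not taken from the test.
INJECT_FNS = [
    InjectDesc("irq_line", inject_via_irq_line, frozenset({SPI})),
    InjectDesc("ispendr", inject_via_ispendr, frozenset({SGI, PPI, SPI})),
]


def run_injections(kind, intid):
    results = []
    for desc in INJECT_FNS:
        if kind not in desc.kinds:
            continue
        # Shared preparation would go here.
        results.append(desc.inject(intid))
        # Shared checking would go here.
    return results


print(run_injections(SPI, 40))
```

Keeping the descriptors in one array lets the preparation and checking code run once per entry, regardless of how the interrupt is actually injected.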

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:55 +0000 (18:38 -0800)]

KVM: selftests: aarch64: Add vgic_irq to test userspace IRQ injection

Add a new KVM selftest, vgic_irq, for testing userspace IRQ injection. This
particular test injects an SPI using KVM_IRQ_LINE on GICv3 and verifies
that the IRQ is handled in the guest. The next commits will add more
types of IRQs and different modes.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-7-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:54 +0000 (18:38 -0800)]

KVM: selftests: aarch64: Add vGIC library functions to deal with vIRQ state

Add a set of library functions for userspace code in selftests to deal
with vIRQ state (i.e., ioctl wrappers).

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-6-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:53 +0000 (18:38 -0800)]

KVM: selftests: Add kvm_irq_line library function

Add an architecture independent wrapper function for the KVM_IRQ_LINE
ioctl.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-5-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:52 +0000 (18:38 -0800)]

KVM: selftests: aarch64: Add GICv3 register accessor library functions

Add library functions for accessing GICv3 registers: DIR, PMR, CTLR,
ISACTIVER, ISPENDR.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-4-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:51 +0000 (18:38 -0800)]

KVM: selftests: aarch64: Add function for accessing GICv3 dist and redist registers

Add a generic library function for reading and writing GICv3 distributor
and redistributor registers. Then adapt some functions to use it; more
will come and use it in the next commit.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-3-ricarkol@google.com

commit | commitdiff | tree

Ricardo Koller [Tue, 9 Nov 2021 02:38:50 +0000 (18:38 -0800)]

KVM: selftests: aarch64: Move gic_v3.h to shared headers

Move gic_v3.h to the shared headers location. There are some definitions
that will be used in the vgic-irq test.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211109023906.1091208-2-ricarkol@google.com

commit | commitdiff | tree

Marc Zyngier [Mon, 27 Dec 2021 12:48:09 +0000 (12:48 +0000)]

KVM: selftests: arm64: Add support for various modes with 16kB page size

The 16kB page size is not a popular choice, due to only a few CPUs
actually implementing support for it. However, it can lead to some
interesting performance improvements given the right uarch choices.

Add support for this page size for various PA/VA combinations.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20211227124809.1335409-7-maz@kernel.org

commit | commitdiff | tree

Marc Zyngier [Mon, 27 Dec 2021 12:48:08 +0000 (12:48 +0000)]

KVM: selftests: arm64: Add support for VM_MODE_P36V48_{4K,64K}

Some of the arm64 systems out there have an IPA space that is
positively tiny. Nonetheless, they make great KVM hosts.

Add support for 36bit IPA with 4kB pages, which makes
some of the fruity machines happy. Whilst we're at it, add support
for 64kB pages as well, though these boxes have no support for it.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211227124809.1335409-6-maz@kernel.org

commit | commitdiff | tree

Marc Zyngier [Mon, 27 Dec 2021 12:48:07 +0000 (12:48 +0000)]

KVM: selftests: arm64: Rework TCR_EL1 configuration

The current way we initialise TCR_EL1 is a bit cumbersome, as
we mix setting TG0 and IPS in the same switch statement.

Split it into two statements (one for the base granule size, and
another for the IPA size), allowing new modes to be added in a
more elegant way.

No functional change intended.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20211227124809.1335409-5-maz@kernel.org
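
The split described above can be sketched roughly as follows. This is a
minimal illustration rather than the selftest code: struct guest_mode, enum
granule and tcr_el1_for_mode() are hypothetical names, and only the TG0 and
IPS fields of TCR_EL1 are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode description; not the enum used by the selftests. */
enum granule { GRANULE_4K, GRANULE_16K, GRANULE_64K };

struct guest_mode {
	enum granule granule;
	unsigned int pa_bits;	/* 36, 40, 48 or 52 in this sketch */
};

uint64_t tcr_el1_for_mode(const struct guest_mode *mode)
{
	uint64_t tcr = 0;

	/* First statement: base granule size, TCR_EL1.TG0 at bits [15:14]. */
	switch (mode->granule) {
	case GRANULE_4K:
		tcr |= 0ULL << 14;
		break;
	case GRANULE_64K:
		tcr |= 1ULL << 14;
		break;
	case GRANULE_16K:
		tcr |= 2ULL << 14;
		break;
	}

	/* Second statement: address size, TCR_EL1.IPS at bits [34:32]. */
	switch (mode->pa_bits) {
	case 36:
		tcr |= 1ULL << 32;
		break;
	case 40:
		tcr |= 2ULL << 32;
		break;
	case 48:
		tcr |= 5ULL << 32;
		break;
	case 52:
		tcr |= 6ULL << 32;
		break;
	default:
		assert(!"unsupported PA size in this sketch");
	}

	return tcr;
}
```

With the two concerns separated, adding a new granule or a new address size
touches only one of the two statements.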

commit | commitdiff | tree

Marc Zyngier [Mon, 27 Dec 2021 12:48:06 +0000 (12:48 +0000)]

KVM: selftests: arm64: Check for supported page sizes

Just as arm64 implementations don't necessarily support all IPA
ranges, they don't all support the same page sizes either. Fun.

Create a dummy VM to snapshot the page sizes supported by the
host, and filter the supported modes.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20211227124809.1335409-4-maz@kernel.org

commit | commitdiff | tree

Marc Zyngier [Mon, 27 Dec 2021 12:48:05 +0000 (12:48 +0000)]

KVM: selftests: arm64: Introduce a variable default IPA size

Contrary to popular belief, there is no such thing as a default
IPA size on arm64. Anything goes, and implementations are the
usual Wild West.

The selftest infrastructure defaults to 40bit IPA, which obviously
doesn't work for some systems out there.

Turn VM_MODE_DEFAULT from a constant into a variable, and let
guest_modes_append_default() populate it, depending on what
the HW can do. In order to preserve the current behaviour, we
still pick 40bits IPA as the default if it is available, and
the largest supported IPA space otherwise.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20211227124809.1335409-3-maz@kernel.org
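
The selection rule in the last paragraph can be written down in a few lines.
This is only a sketch: pick_default_ipa_bits() and its ipa_limit parameter
are hypothetical, and it assumes the host reports a single upper bound on
the IPA size with every smaller size also available.

```c
#include <assert.h>

/* Hypothetical helper: ipa_limit is the largest IPA size, in bits, that the
 * host reports. Keep the historical 40-bit default when the host can do it,
 * otherwise fall back to the largest size on offer. */
unsigned int pick_default_ipa_bits(unsigned int ipa_limit)
{
	assert(ipa_limit > 0);
	return ipa_limit >= 40 ? 40 : ipa_limit;
}
```

A host with a 48-bit limit keeps the 40-bit default, while a host limited to
36 bits gets 36.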

commit | commitdiff | tree

Marc Zyngier [Mon, 27 Dec 2021 12:48:04 +0000 (12:48 +0000)]

KVM: selftests: arm64: Initialise default guest mode at test startup time

As we are going to add support for a variable default mode on arm64,
let's make sure it is set up first by using a constructor that gets
called before the actual test runs.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20211227124809.1335409-2-maz@kernel.org
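
For readers unfamiliar with the mechanism, the constructor approach might
look like the sketch below. It relies on the GCC/Clang constructor
attribute; default_mode, init_default_mode() and get_default_mode() are
hypothetical names, and the value 40 is a placeholder for whatever the
probing code would compute.

```c
#include <assert.h>

/* Hypothetical stand-in for the default guest mode; -1 means "not set". */
static int default_mode = -1;

/* GCC/Clang extension: this function runs before main() is entered. */
__attribute__((constructor))
static void init_default_mode(void)
{
	default_mode = 40;	/* placeholder for a value probed from the host */
}

int get_default_mode(void)
{
	/* The constructor must have run by the time any test code gets here. */
	assert(default_mode != -1);
	return default_mode;
}
```

Test code can then read the default without an explicit initialisation call.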

commit | commitdiff | tree

Paolo Bonzini [Tue, 21 Dec 2021 17:59:53 +0000 (12:59 -0500)]

Merge tag 'kvm-s390-next-5.17-1' of git://git./linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fix and cleanup

- fix sigp sense/start/stop/inconsistency
- cleanups

commit | commitdiff | tree

Paolo Bonzini [Tue, 21 Dec 2021 17:51:09 +0000 (12:51 -0500)]

Merge remote-tracking branch 'kvm/master' into HEAD

Pick commit fdba608f15e2 ("KVM: VMX: Wake vCPU when delivering posted
IRQ even if vCPU == this vCPU"). In addition to fixing a bug, it
also aligns the non-nested and nested usage of triggering posted
interrupts, allowing for additional cleanups.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Sean Christopherson [Tue, 21 Dec 2021 15:37:00 +0000 (10:37 -0500)]

KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU

Drop a check that guards triggering a posted interrupt on the currently
running vCPU, and more importantly guards waking the target vCPU if
triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
If a vIRQ is delivered from asynchronous context, the target vCPU can be
the currently running vCPU and can also be blocking, in which case
skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to
be a wake event for the vCPU.

The "do nothing" logic when "vcpu == running_vcpu" mostly works only
because the majority of calls to ->deliver_posted_interrupt(), especially
when using posted interrupts, come from synchronous KVM context.  But if
a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ
and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to
use posted interrupts, wake events from the device will be delivered to
KVM from IRQ context, e.g.

  vfio_msihandler()
  |
  |-> eventfd_signal()
      |
      |-> ...
          |
          |->  irqfd_wakeup()
               |
               |->kvm_arch_set_irq_inatomic()
                  |
                  |-> kvm_irq_delivery_to_apic_fast()
                      |
                      |-> kvm_apic_set_irq()

This also aligns the non-nested and nested usage of triggering posted
interrupts, and will allow for additional cleanups.

Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
Cc: stable@vger.kernel.org
Reported-by: Longpeng (Mike) <longpeng2@huawei.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211208015236.1616697-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
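
The behaviour after this change can be modelled with a toy example. Nothing
below is KVM code: struct toy_vcpu and both functions are hypothetical, and
the sketch only shows that the wake-up no longer depends on whether the
target is the currently running vCPU.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vcpu {		/* hypothetical model, not struct kvm_vcpu */
	bool in_guest_mode;	/* vCPU is executing guest code */
	bool pending_irq;	/* stands in for the posted-interrupt descriptor */
	bool woken;		/* set when a wake-up was issued */
};

/* A notification only reaches a vCPU that is in guest mode. */
bool toy_trigger_posted_interrupt(const struct toy_vcpu *vcpu)
{
	return vcpu->in_guest_mode;
}

/* Record the interrupt, then wake the target whenever the notification
 * could not be sent. There is deliberately no "target == current vCPU"
 * shortcut here. */
void toy_deliver_posted_interrupt(struct toy_vcpu *vcpu)
{
	assert(vcpu);
	vcpu->pending_irq = true;
	if (!toy_trigger_posted_interrupt(vcpu))
		vcpu->woken = true;
}
```

A blocking vCPU that receives an interrupt from IRQ context is therefore
always woken, even if it happens to be the vCPU loaded on the current pCPU.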

commit | commitdiff | tree

Fuad Tabba [Wed, 8 Dec 2021 19:32:57 +0000 (19:32 +0000)]

KVM: arm64: Fix comment on barrier in kvm_psci_vcpu_on()

The barrier is there for power_off rather than power_state.
Probably a typo in commit 358b28f09f0ab074 ("arm/arm64: KVM: Allow
a VCPU to fully reset itself").

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208193257.667613-3-tabba@google.com

commit | commitdiff | tree

Fuad Tabba [Wed, 8 Dec 2021 19:32:56 +0000 (19:32 +0000)]

KVM: arm64: Fix comment for kvm_reset_vcpu()

The comment for kvm_reset_vcpu() refers to the sysreg table as
being the table above, probably because of the code extracted at
commit f4672752c321ea36 ("arm64: KVM: virtual CPU reset").

Fix the comment to remove the potentially confusing reference.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208193257.667613-2-tabba@google.com

commit | commitdiff | tree

Fuad Tabba [Wed, 8 Dec 2021 19:28:10 +0000 (19:28 +0000)]

KVM: arm64: Use defined value for SCTLR_ELx_EE

Replace the hardcoded value with the existing definition.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208192810.657360-1-tabba@google.com

commit | commitdiff | tree

Sean Christopherson [Tue, 7 Dec 2021 19:30:06 +0000 (19:30 +0000)]

KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest state

Add a selftest to attempt to enter L2 with invalid guest state by
exiting to userspace via I/O from L2, and then using KVM_SET_SREGS to set
invalid guest state (marking TR unusable is arbitrarily chosen for its
relative simplicity).

This is a regression test for a bug introduced by commit c8607e4a086f
("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if
!from_vmentry"), which incorrectly set vmx->fail=true when L2 had invalid
guest state and ultimately triggered a WARN due to nested_vmx_vmexit()
seeing vmx->fail==true while attempting to synthesize a nested VM-Exit.

This is also a functional test to verify that KVM synthesizes TRIPLE_FAULT
for L2, which is somewhat arbitrary behavior, instead of emulating L2.
KVM should never emulate L2 due to invalid guest state, as it's
architecturally impossible for L1 to run an L2 guest with invalid state
as nested VM-Enter should always fail, i.e. L1 needs to do the emulation.
Stuffing state via KVM ioctl() is a non-architectural, out-of-band case,
hence the TRIPLE_FAULT being rather arbitrary.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211207193006.120997-5-seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Sean Christopherson [Tue, 7 Dec 2021 19:30:05 +0000 (19:30 +0000)]

KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state

Update the documentation for kvm-intel's emulate_invalid_guest_state to
rectify the description of KVM's default behavior, and to document that
the behavior and thus parameter only applies to L1.

Fixes: a27685c33acc ("KVM: VMX: Emulate invalid guest state by default")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211207193006.120997-4-seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Sean Christopherson [Tue, 7 Dec 2021 19:30:04 +0000 (19:30 +0000)]

KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required

Synthesize a triple fault if L2 guest state is invalid at the time of
VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs
guest state via ioctls(), e.g. KVM_SET_SREGS.  KVM should never emulate
invalid guest state, since from L1's perspective, it's architecturally
impossible for L2 to have invalid state while L2 is running in hardware.
E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit
or #GP.

Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that
can trigger this scenario, as nested VM-Enter correctly rejects any
attempt to enter L2 with invalid state.

RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and
behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits
via RSM results in shutdown, i.e. there is precedent for KVM's behavior.
Following AMD's SMRAM layout is important as AMD's layout saves/restores
the descriptor cache information, including CS.RPL and SS.RPL, and also
defines all the fields relevant to invalid guest state as read-only, i.e.
so long as the vCPU had valid state before the SMI, which is guaranteed
for L2, RSM will generate valid state unless SMRAM was modified.  Intel's
layout saves/restores only the selector, which means that scenarios where
the selector and cached RPL don't match, e.g. conforming code segments,
would yield invalid guest state.  Intel CPUs fudge around this issued by
stuffing SS.RPL and CS.RPL on RSM.  Per Intel's SDM on the "Default
Treatment of RSM", paraphrasing for brevity:

  IF internal storage indicates that the [CPU was post-VMXON]
  THEN
     enter VMX operation (root or non-root);
     restore VMX-critical state as defined in Section 34.14.1;
     set to their fixed values any bits in CR0 and CR4 whose values must
     be fixed in VMX operation [unless coming from an unrestricted guest];
     IF RFLAGS.VM = 0 AND (in VMX root operation OR the
        “unrestricted guest” VM-execution control is 0)
     THEN
       CS.RPL := SS.DPL;
       SS.RPL := SS.DPL;
     FI;
     restore current VMCS pointer;
  FI;

Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM
will synthesize TRIPLE_FAULT in this scenario.  KVM's behavior is allowed
as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the
only way for CR0 and/or CR4 to have illegal values is if they were
modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map"
section states "modifying these registers will result in unpredictable
behavior".

KVM's ioctl() behavior is less straightforward.  Because KVM allows
ioctls() to be executed in any order, rejecting an ioctl() if it would
result in invalid L2 guest state is not an option as KVM cannot know if
a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or
drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE.  Ideally, KVM would
reject KVM_RUN if L2 contained invalid guest state, but that carries the
risk of a false positive, e.g. if RSM loaded invalid guest state and KVM
exited to userspace.  Setting a flag/request to detect such a scenario is
undesirable because (a) it's extremely unlikely to add value to KVM as a
whole, and (b) KVM would need to consider ioctl() interactions with such
a flag, e.g. if userspace migrated the vCPU while the flag were set.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211207193006.120997-3-seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Sean Christopherson [Tue, 7 Dec 2021 19:30:03 +0000 (19:30 +0000)]

KVM: VMX: Always clear vmx->fail on emulation_required

Revert a relatively recent change that set vmx->fail if the vCPU is in L2
and emulation_required is true, as that behavior is completely bogus.
Setting vmx->fail and synthesizing a VM-Exit is contradictory and wrong:

  (a) it's impossible to have both a VM-Fail and VM-Exit
  (b) vmcs.EXIT_REASON is not modified on VM-Fail
  (c) emulation_required refers to guest state and guest state checks are
      always VM-Exits, not VM-Fails.

For KVM specifically, emulation_required is handled before nested exits
in __vmx_handle_exit(), thus setting vmx->fail has no immediate effect,
i.e. KVM calls into handle_invalid_guest_state() and vmx->fail is ignored.
Setting vmx->fail can ultimately result in a WARN in nested_vmx_vmexit()
firing when tearing down the VM as KVM never expects vmx->fail to be set
when L2 is active; KVM always reflects those errors into L1.

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 21158 at arch/x86/kvm/vmx/nested.c:4548
                                nested_vmx_vmexit+0x16bd/0x17e0
                                arch/x86/kvm/vmx/nested.c:4547
  Modules linked in:
  CPU: 0 PID: 21158 Comm: syz-executor.1 Not tainted 5.16.0-rc3-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547
  Code: <0f> 0b e9 2e f8 ff ff e8 57 b3 5d 00 0f 0b e9 00 f1 ff ff 89 e9 80
  Call Trace:
   vmx_leave_nested arch/x86/kvm/vmx/nested.c:6220 [inline]
   nested_vmx_free_vcpu+0x83/0xc0 arch/x86/kvm/vmx/nested.c:330
   vmx_free_vcpu+0x11f/0x2a0 arch/x86/kvm/vmx/vmx.c:6799
   kvm_arch_vcpu_destroy+0x6b/0x240 arch/x86/kvm/x86.c:10989
   kvm_vcpu_destroy+0x29/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:441
   kvm_free_vcpus arch/x86/kvm/x86.c:11426 [inline]
   kvm_arch_destroy_vm+0x3ef/0x6b0 arch/x86/kvm/x86.c:11545
   kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1189 [inline]
   kvm_put_kvm+0x751/0xe40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1220
   kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3489
   __fput+0x3fc/0x870 fs/file_table.c:280
   task_work_run+0x146/0x1c0 kernel/task_work.c:164
   exit_task_work include/linux/task_work.h:32 [inline]
   do_exit+0x705/0x24f0 kernel/exit.c:832
   do_group_exit+0x168/0x2d0 kernel/exit.c:929
   get_signal+0x1740/0x2120 kernel/signal.c:2852
   arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868
   handle_signal_work kernel/entry/common.c:148 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
   exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207
   __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
   syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300
   do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: c8607e4a086f ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry")
Reported-by: syzbot+f1d2136db9c80d4733e8@syzkaller.appspotmail.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211207193006.120997-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Andrew Jones [Tue, 14 Dec 2021 15:18:42 +0000 (16:18 +0100)]

selftests: KVM: Fix non-x86 compiling

Attempting to compile on a non-x86 architecture fails with

include/kvm_util.h: In function ‘vm_compute_max_gfn’:
include/kvm_util.h:79:21: error: dereferencing pointer to incomplete type ‘struct kvm_vm’
return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
^~

This is because the declaration of struct kvm_vm is in
lib/kvm_util_internal.h as an effort to make it private to
the test lib code. We can still provide arch specific functions,
though, by making the generic function symbols weak. Do that to
fix the compile error.

Fixes: c8cc43c1eae2 ("selftests: KVM: avoid failures due to reserved HyperTransport region")
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20211214151842.848314-1-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
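
The weak-symbol technique mentioned above might be sketched like this. The
function name and the formula come from the error message quoted in the
commit; struct vm_geometry is a hypothetical stand-in for the private
struct kvm_vm, and the weak attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

struct vm_geometry {		/* hypothetical stand-in for struct kvm_vm */
	unsigned int pa_bits;
	unsigned int page_shift;
};

/* Generic version. Marked weak so that an architecture-specific file can
 * provide a strong definition with the same name, which the linker then
 * prefers over this one. */
__attribute__((weak))
uint64_t vm_compute_max_gfn(const struct vm_geometry *vm)
{
	assert(vm->pa_bits < 64 && vm->page_shift <= vm->pa_bits);
	return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
}
```

Because the definition lives in a .c file rather than in a header, callers
never need to see the layout of the structure.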

commit | commitdiff | tree

Marc Orr [Thu, 9 Dec 2021 15:52:57 +0000 (07:52 -0800)]

KVM: x86: Always set kvm_run->if_flag

The kvm_run struct's if_flag is a part of the userspace/kernel API. The
SEV-ES patches failed to set this flag because it's no longer needed by
QEMU (according to the comment in the source code). However, other
hypervisors may make use of this flag. Therefore, set the flag for
guests with encrypted registers (i.e., with guest_state_protected set).

Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Signed-off-by: Marc Orr <marcorr@google.com>
Message-Id: <20211209155257.128747-1-marcorr@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

commit | commitdiff | tree

Sean Christopherson [Tue, 14 Dec 2021 03:35:28 +0000 (03:35 +0000)]

KVM: x86/mmu: Don't advance iterator after restart due to yielding

After dropping mmu_lock in the TDP MMU, restart the iterator during
tdp_iter_next() and do not advance the iterator.  Advancing the iterator
results in skipping the top-level SPTE and all its children, which is
fatal if any of the skipped SPTEs were not visited before yielding.

When zapping all SPTEs, i.e. when min_level == root_level, restarting the
iter and then invoking tdp_iter_next() is always fatal if the current gfn
has a valid SPTE, as advancing the iterator results in try_step_side()
skipping the current gfn, which wasn't visited before yielding.

Sprinkle WARNs on iter->yielded being true in various helpers that are
often used in conjunction with yielding, and tag the helper with
__must_check to reduce the probability of improper usage.

Failing to zap a top-level SPTE manifests in one of two ways.  If a valid
SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),
the shadow page will be leaked and KVM will WARN accordingly.

  WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm]
  RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm]
  Call Trace:
   <TASK>
   kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
   kvm_destroy_vm+0x162/0x2a0 [kvm]
   kvm_vcpu_release+0x34/0x60 [kvm]
   __fput+0x82/0x240
   task_work_run+0x5c/0x90
   do_exit+0x364/0xa10
   ? futex_unqueue+0x38/0x60
   do_group_exit+0x33/0xa0
   get_signal+0x155/0x850
   arch_do_signal_or_restart+0xed/0x750
   exit_to_user_mode_prepare+0xc5/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x48/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by
kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of
marking a struct page as dirty/accessed after it has been put back on the
free list.  This directly triggers a WARN due to encountering a page with
page_count() == 0, but it can also lead to data corruption and additional
errors in the kernel.

  WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171
  RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm]
  Call Trace:
   <TASK>
   kvm_set_pfn_dirty+0x120/0x1d0 [kvm]
   __handle_changed_spte+0x92e/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   zap_gfn_range+0x549/0x620 [kvm]
   kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm]
   mmu_free_root_page+0x219/0x2c0 [kvm]
   kvm_mmu_free_roots+0x1b4/0x4e0 [kvm]
   kvm_mmu_unload+0x1c/0xa0 [kvm]
   kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm]
   kvm_put_kvm+0x3b1/0x8b0 [kvm]
   kvm_vcpu_release+0x4e/0x70 [kvm]
   __fput+0x1f7/0x8c0
   task_work_run+0xf8/0x1a0
   do_exit+0x97b/0x2230
   do_group_exit+0xda/0x2a0
   get_signal+0x3be/0x1e50
   arch_do_signal_or_restart+0x244/0x17f0
   exit_to_user_mode_prepare+0xcb/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x4d/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Note, the underlying bug existed even before commit 1af4a96025b3 ("KVM:
x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to
tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still
incorrectly advance past a top-level entry when yielding on a lower-level
entry.  But with respect to leaking shadow pages, the bug was introduced
by yielding before processing the current gfn.

Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or
callers could jump to their "retry" label.  The downside of that approach
is that tdp_mmu_iter_cond_resched() _must_ be called before anything else
in the loop, and there's no easy way to enforce that requirement.

Ideally, KVM would handle the cond_resched() fully within the iterator
macro (the code is actually quite clean) and avoid this entire class of
bugs, but that is extremely difficult to do while also supporting yielding
after tdp_mmu_set_spte_atomic() fails.  Yielding after failing to set a
SPTE is very desirable as the "owner" of the REMOVED_SPTE isn't strictly
bounded, e.g. if it's zapping a high-level shadow page, the REMOVED_SPTE
may block operations on the SPTE for a significant amount of time.

Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 1af4a96025b3 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211214033528.123268-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
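
The restart-without-advance rule can be illustrated with a toy iterator over
a flat array. This is a deliberately simplified sketch: struct toy_iter and
the toy_* functions are hypothetical, and there is no page-table hierarchy
here, only the yielded flag and its effect on the next step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_iter {	/* hypothetical, far simpler than struct tdp_iter */
	size_t cur;	/* index of the entry the loop body sees */
	size_t nr;	/* number of entries */
	bool yielded;	/* set when the walker dropped the lock */
};

/* Called by the loop body before it has processed iter->cur. */
void toy_iter_yield(struct toy_iter *iter)
{
	assert(!iter->yielded);	/* mirrors the idea of WARNing on misuse */
	iter->yielded = true;
}

/* Returns true while iter->cur refers to a valid entry. */
bool toy_iter_next(struct toy_iter *iter)
{
	if (iter->yielded) {
		/* Restart on the same entry: it has not been processed yet. */
		iter->yielded = false;
		return iter->cur < iter->nr;
	}
	iter->cur++;
	return iter->cur < iter->nr;
}

/* Visit every entry once, yielding just before entry yield_at. */
size_t toy_walk(size_t nr, size_t yield_at)
{
	struct toy_iter iter = { .cur = 0, .nr = nr, .yielded = false };
	size_t visited = 0;
	bool did_yield = false;

	for (bool ok = nr > 0; ok; ok = toy_iter_next(&iter)) {
		if (!did_yield && iter.cur == yield_at) {
			did_yield = true;
			toy_iter_yield(&iter);
			continue;	/* entry not processed yet */
		}
		visited++;
	}
	return visited;
}
```

If toy_iter_next() advanced unconditionally, the entry at yield_at would be
skipped and toy_walk() would report one visit too few.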

commit | commitdiff | tree

Wei Wang [Fri, 17 Dec 2021 12:49:34 +0000 (07:49 -0500)]

KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all

The fixed counter 3 is used for the Topdown metrics, which hasn't been
enabled for KVM guests. Userspace accesses to it will fail as it's not
included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines,
which have this counter.

To reproduce it on ICX+ machines, ./state_test reports:
==== Test Assertion Failure ====
lib/x86_64/processor.c:1078: r == nmsrs
pid=4564 tid=4564 - Argument list too long
1  0x000000000040b1b9: vcpu_save_state at processor.c:1077
2  0x0000000000402478: main at state_test.c:209 (discriminator 6)
3  0x00007fbe21ed5f92: ?? ??:0
4  0x000000000040264d: _start at ??:?
Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c)

With this patch, it works well.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Message-Id: <20211217124934.32893-1-wei.w.wang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Sean Christopherson [Thu, 9 Dec 2021 06:05:46 +0000 (06:05 +0000)]

KVM: x86: Retry page fault if MMU reload is pending and root has no sp

Play nice with a NULL shadow page when checking for an obsolete root in
the page fault handler by flagging the page fault as stale if there's no
shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending.
Invalidating memslots, which is the only case where _all_ roots need to
be reloaded, requests all vCPUs to reload their MMUs while holding
mmu_lock for write.

The "special" roots, e.g. pae_root when KVM uses PAE paging, are not
backed by a shadow page.  Running with TDP disabled or with nested NPT
explodes spectacularly due to dereferencing a NULL shadow page pointer.

Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the
root.  Zapping shadow pages in response to guest activity, e.g. when the
guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current
vCPU isn't using the affected root.  I.e. KVM_REQ_MMU_RELOAD can be seen
with a completely valid root shadow page.  This is a bit of a moot point
as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will
be cleaned up in the future.

Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211209060552.2956723-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 16 Dec 2021 16:52:12 +0000 (17:52 +0100)]

KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs

Host initiated writes to MSR_IA32_PERF_CAPABILITIES should not depend
on guest visible CPUIDs and (incorrect) KVM logic implementing it is
about to change. Also, KVM_SET_CPUID{,2} after KVM_RUN is now forbidden
and causes the test to fail.

Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211216165213.338923-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Vitaly Kuznetsov [Thu, 16 Dec 2021 16:52:13 +0000 (17:52 +0100)]

KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES

The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should
not depend on guest visible CPUID entries, even if just to allow
creating/restoring guest MSRs and CPUIDs in any sequence.

Fixes: 27461da31089 ("KVM: x86/pmu: Support full width counting")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211216165213.338923-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Paolo Bonzini [Sun, 19 Dec 2021 14:27:21 +0000 (15:27 +0100)]

Merge branch 'topic/ppc-kvm' of https://git./linux/kernel/git/powerpc/linux into HEAD

Fix conflicts between memslot overhaul and commit 511d25d6b789f ("KVM:
PPC: Book3S: Suppress warnings when allocating too big memory slots")
from the powerpc tree.

commit | commitdiff | tree

Eric Farman [Mon, 13 Dec 2021 21:05:50 +0000 (22:05 +0100)]

KVM: s390: Clarify SIGP orders versus STOP/RESTART

With KVM_CAP_S390_USER_SIGP, there are only five Signal Processor
orders (CONDITIONAL EMERGENCY SIGNAL, EMERGENCY SIGNAL, EXTERNAL CALL,
SENSE, and SENSE RUNNING STATUS) which are intended for frequent use
and thus are processed in-kernel. The remainder are sent to userspace
with the KVM_CAP_S390_USER_SIGP capability. Of those, three orders
(RESTART, STOP, and STOP AND STORE STATUS) have the potential to
inject work back into the kernel, and thus are asynchronous.

Let's look for those pending IRQs when processing one of the in-kernel
SIGP orders, and return BUSY (CC2) if one is in process. This is in
agreement with the Principles of Operation, which states that only one
order can be "active" on a CPU at a time.

Cc: stable@vger.kernel.org
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20211213210550.856213-2-farman@linux.ibm.com
[borntraeger@linux.ibm.com: add stable tag]
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
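
The check described in the second paragraph could be modelled as below. All
names are hypothetical and only a subset of orders is shown; the sketch
assumes condition code 2 means BUSY, as stated above, and uses -1 as an
arbitrary marker for "forward to userspace".

```c
#include <assert.h>
#include <stdbool.h>

enum toy_order {		/* hypothetical subset of SIGP orders */
	TOY_SENSE,
	TOY_EXTERNAL_CALL,
	TOY_EMERGENCY_SIGNAL,
	TOY_STOP,
	TOY_RESTART,
};

enum toy_result { TOY_CC_ACCEPTED = 0, TOY_CC_BUSY = 2, TOY_TO_USERSPACE = -1 };

struct toy_cpu {
	bool stop_or_restart_pending;	/* an asynchronous order is in flight */
};

bool toy_order_is_in_kernel(enum toy_order order)
{
	return order == TOY_SENSE || order == TOY_EXTERNAL_CALL ||
	       order == TOY_EMERGENCY_SIGNAL;
}

enum toy_result toy_handle_sigp(const struct toy_cpu *dst, enum toy_order order)
{
	assert(dst);
	if (!toy_order_is_in_kernel(order))
		return TOY_TO_USERSPACE;
	/* Only one order may be active on the destination CPU at a time. */
	if (dst->stop_or_restart_pending)
		return TOY_CC_BUSY;
	return TOY_CC_ACCEPTED;
}
```

An in-kernel order aimed at a CPU with a pending STOP or RESTART reports
BUSY, so the sender retries instead of racing with the asynchronous order.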

commit | commitdiff | tree

Janosch Frank [Mon, 7 Jun 2021 08:07:13 +0000 (08:07 +0000)]

s390: uv: Add offset comments to UV query struct and fix naming

Changes to the struct are easier to manage with offset comments so
let's add some. And now that we know that the last struct member has
the wrong name let's also fix this.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

commit | commitdiff | tree

Janis Schoetterl-Glausch [Fri, 26 Nov 2021 16:45:49 +0000 (17:45 +0100)]

KVM: s390: gaccess: Cleanup access to guest pages

Introduce a helper function for guest frame access.

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <20211126164549.7046-4-scgl@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>

commit | commitdiff | tree

Janis Schoetterl-Glausch [Fri, 26 Nov 2021 16:45:48 +0000 (17:45 +0100)]

KVM: s390: gaccess: Refactor access address range check

Do not round down the first address to the page boundary, just translate
it normally, which gives the value we care about in the first place.
Given this, translating a single address is just the special case of
translating a range spanning a single page.

Make the output optional, so the function can be used to just check a
range.

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <20211126164549.7046-3-scgl@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>

commit | commitdiff | tree

Janis Schoetterl-Glausch [Fri, 26 Nov 2021 16:45:47 +0000 (17:45 +0100)]

KVM: s390: gaccess: Refactor gpa and length calculation

Improve readability by renaming the length variable and
not calculating the offset manually.

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <20211126164549.7046-2-scgl@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
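
As an illustration of computing the per-page length from the in-page offset,
a sketch along these lines may help. fragment_len(), pages_spanned() and
TOY_PAGE_SIZE are hypothetical names chosen for this example and the page
size is assumed to be 4kB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/* Number of bytes that can be accessed starting at gpa without crossing
 * a page boundary, capped at len. */
size_t fragment_len(uint64_t gpa, size_t len)
{
	size_t room = TOY_PAGE_SIZE - (size_t)(gpa & (TOY_PAGE_SIZE - 1));

	assert(len > 0);
	return len < room ? len : room;
}

/* Number of pages touched by the range [gpa, gpa + len). */
size_t pages_spanned(uint64_t gpa, size_t len)
{
	size_t pages = 0;

	while (len > 0) {
		size_t chunk = fragment_len(gpa, len);

		gpa += chunk;
		len -= chunk;
		pages++;
	}
	return pages;
}
```

The first address is never rounded down; only the first fragment is shorter
than a page when the range starts mid-page.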

commit | commitdiff | tree

Marc Zyngier [Mon, 29 Nov 2021 20:00:45 +0000 (20:00 +0000)]

KVM: arm64: Rework kvm_pgtable initialisation

Ganapatrao reported that the kvm_pgtable->mmu pointer is more or
less hardcoded to the main S2 mmu structure, while the nested
code needs it to point to other instances (as we have one instance
per nested context).

Rework the initialisation of the kvm_pgtable structure so that
this assumption doesn't hold true anymore. This requires some
minor changes to the order in which things are initialised
(the mmu->arch pointer being the critical one).

Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211129200150.351436-5-maz@kernel.org

commit | commitdiff | tree

Marc Zyngier [Thu, 16 Dec 2021 13:06:09 +0000 (13:06 +0000)]

Merge branch kvm-arm64/pkvm-hyp-sharing into kvmarm-master/next

* kvm-arm64/pkvm-hyp-sharing:
  : .
  : Series from Quentin Perret, implementing HYP page share/unshare:
  :
  : This series implements an unshare hypercall at EL2 in nVHE
  : protected mode, and makes use of it to unmap guest-specific
  : data-structures from EL2 stage-1 during guest tear-down.
  : Crucially, the implementation of the share and unshare
  : routines use page refcounts in the host kernel to avoid
  : accidentally unmapping data-structures that overlap a common
  : page.
  : [...]
  : .
  KVM: arm64: pkvm: Unshare guest structs during teardown
  KVM: arm64: Expose unshare hypercall to the host
  KVM: arm64: Implement do_unshare() helper for unsharing memory
  KVM: arm64: Implement __pkvm_host_share_hyp() using do_share()
  KVM: arm64: Implement do_share() helper for sharing memory
  KVM: arm64: Introduce wrappers for host and hyp spin lock accessors
  KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages
  KVM: arm64: pkvm: Refcount the pages shared with EL2
  KVM: arm64: Introduce kvm_share_hyp()
  KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
  KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table
  KVM: arm64: Fixup hyp stage-1 refcount
  KVM: arm64: Refcount hyp stage-1 pgtable pages
  KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator

Signed-off-by: Marc Zyngier <maz@kernel.org>

commit | commitdiff | tree

Quentin Perret [Wed, 15 Dec 2021 16:12:31 +0000 (16:12 +0000)]

KVM: arm64: pkvm: Unshare guest structs during teardown

Make use of the newly introduced unshare hypercall during guest teardown
to unmap guest-related data structures from the hyp stage-1.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-15-qperret@google.com

commit | commitdiff | tree

Will Deacon [Wed, 15 Dec 2021 16:12:30 +0000 (16:12 +0000)]

KVM: arm64: Expose unshare hypercall to the host

Introduce an unshare hypercall which can be used to unmap memory from
the hypervisor stage-1 in nVHE protected mode. This will be useful to
update the EL2 ownership state of pages during guest teardown, and
avoids keeping dangling mappings to unreferenced portions of memory.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-14-qperret@google.com

commit | commitdiff | tree

Will Deacon [Wed, 15 Dec 2021 16:12:29 +0000 (16:12 +0000)]

KVM: arm64: Implement do_unshare() helper for unsharing memory

Tearing down a previously shared memory region results in the borrower
losing access to the underlying pages and returning them to the "owned"
state in the owner.

Implement a do_unshare() helper, along the same lines as do_share(), to
provide this functionality for the host-to-hyp case.

Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-13-qperret@google.com

commit | commitdiff | tree

Will Deacon [Wed, 15 Dec 2021 16:12:28 +0000 (16:12 +0000)]

KVM: arm64: Implement __pkvm_host_share_hyp() using do_share()

__pkvm_host_share_hyp() shares memory between the host and the
hypervisor so implement it as an invocation of the new do_share()
mechanism.

Note that double-sharing is no longer permitted (as this allows us to
reduce the number of page-table walks significantly), but is thankfully
no longer relied upon by the host.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-12-qperret@google.com

commit | commitdiff | tree

Will Deacon [Wed, 15 Dec 2021 16:12:27 +0000 (16:12 +0000)]

KVM: arm64: Implement do_share() helper for sharing memory

By default, protected KVM isolates memory pages so that they are
accessible only to their owner: be it the host kernel, the hypervisor
at EL2 or (in future) the guest. Establishing shared-memory regions
between these components therefore involves a transition for each page
so that the owner can share memory with a borrower under a certain set
of permissions.

Introduce a do_share() helper for safely sharing a memory region between
two components. Currently, only host-to-hyp sharing is implemented, but
the code is easily extended to handle other combinations and the
permission checks for each component are reusable.
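
The transition described above can be sketched as a check-then-commit
routine. This is a minimal illustration, not the kernel implementation:
the `component` struct, the `PAGE_*` state names, the fixed `NR_PAGES`
array and the `init_component()` helper are all hypothetical stand-ins.
The owner must hold every page in the "owned" state and the borrower
must have no page there; only then are both sides updated.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES 8

/* Hypothetical per-page ownership states, for illustration only. */
enum page_state {
	PAGE_OWNED,
	PAGE_SHARED_OWNED,
	PAGE_SHARED_BORROWED,
	PAGE_NOPAGE,
};

/* Hypothetical view of one component (host, hyp, ...) over the pages. */
struct component {
	enum page_state state[NR_PAGES];
};

static void init_component(struct component *c, enum page_state s)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		c->state[i] = s;
}

/* Return 1 if every page in [start, start + nr) is in state `want`. */
static int range_in_state(const struct component *c, size_t start,
			  size_t nr, enum page_state want)
{
	assert(start + nr <= NR_PAGES);
	for (size_t i = start; i < start + nr; i++)
		if (c->state[i] != want)
			return 0;
	return 1;
}

/*
 * Share [start, start + nr) from owner to borrower. All checks run
 * before any state is changed, so a failure leaves both sides intact.
 * Sharing an already shared page fails the owner check.
 */
static int do_share(struct component *owner, struct component *borrower,
		    size_t start, size_t nr)
{
	if (start > NR_PAGES || nr > NR_PAGES - start)
		return -1;
	if (!range_in_state(owner, start, nr, PAGE_OWNED))
		return -1;
	if (!range_in_state(borrower, start, nr, PAGE_NOPAGE))
		return -1;

	for (size_t i = start; i < start + nr; i++) {
		owner->state[i] = PAGE_SHARED_OWNED;
		borrower->state[i] = PAGE_SHARED_BORROWED;
	}
	return 0;
}
```

Separating the per-component checks from the commit step is what makes
the checks reusable for other owner/borrower pairs in this sketch.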

Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-11-qperret@google.com

commit | commitdiff | tree

Will Deacon [Wed, 15 Dec 2021 16:12:26 +0000 (16:12 +0000)]

KVM: arm64: Introduce wrappers for host and hyp spin lock accessors

In preparation for adding additional locked sections for manipulating
page-tables at EL2, introduce some simple wrappers around the host and
hypervisor locks so that it's a bit easier to read and a bit more difficult
to take the wrong lock (or even take them in the wrong order).
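
A rough sketch of the idea, assuming a toy spinlock built on C11
atomics. The `struct spinlock` type, the wrapper names and the combined
`lock_host_and_hyp()` helper are hypothetical and only illustrate how
named wrappers can pin down both the lock and the acquisition order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock for illustration; not the hypervisor's lock type. */
struct spinlock {
	atomic_bool locked;
};

static struct spinlock host_lock;
static struct spinlock hyp_lock;

static void spin_lock(struct spinlock *l)
{
	bool expected = false;

	/* Spin until we flip the flag from false to true. */
	while (!atomic_compare_exchange_weak(&l->locked, &expected, true))
		expected = false;
}

static void spin_unlock(struct spinlock *l)
{
	/* Unlocking a lock that is not held is a caller bug. */
	assert(atomic_load(&l->locked));
	atomic_store(&l->locked, false);
}

static bool spin_is_locked(struct spinlock *l)
{
	return atomic_load(&l->locked);
}

/* Named wrappers: callers never pick a raw lock by hand. */
static void host_lock_component(void)   { spin_lock(&host_lock); }
static void host_unlock_component(void) { spin_unlock(&host_lock); }
static void hyp_lock_component(void)    { spin_lock(&hyp_lock); }
static void hyp_unlock_component(void)  { spin_unlock(&hyp_lock); }

/* Hypothetical helper: one fixed order, host first and then hyp. */
static void lock_host_and_hyp(void)
{
	host_lock_component();
	hyp_lock_component();
}

/* Release in the reverse order of acquisition. */
static void unlock_host_and_hyp(void)
{
	hyp_unlock_component();
	host_unlock_component();
}
```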

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-10-qperret@google.com

commit | commitdiff | tree

Will Deacon [Wed, 15 Dec 2021 16:12:25 +0000 (16:12 +0000)]

KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages

Explicitly name the combination of SW0 | SW1 as reserved in the pte and
introduce a new PKVM_NOPAGE meta-state which, although not directly
stored in the software bits of the pte, can be used to represent an
entry for which there is no underlying page. This is distinct from an
invalid pte, as stage-2 identity mappings for the host are created
lazily and so an invalid pte there is the same as a valid mapping for
the purposes of ownership information.

This state will be used for permission checking during page transitions
in later patches.
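
The encoding can be pictured roughly as follows. The bit positions
chosen for SW0 and SW1, the `STATE_*` names and the helper functions are
assumptions made for this sketch. The point is that the "no page" value
lies outside the software-bit mask, so it can be returned by a lookup
but never written into a pte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed positions of the two software bits; illustrative only. */
#define PTE_SW0      (UINT64_C(1) << 55)
#define PTE_SW1      (UINT64_C(1) << 56)
#define PTE_SW_MASK  (PTE_SW0 | PTE_SW1)

/* Hypothetical state values stored in the software bits. */
#define STATE_OWNED            UINT64_C(0)
#define STATE_SHARED_OWNED     PTE_SW0
#define STATE_SHARED_BORROWED  PTE_SW1
#define STATE_RESERVED         (PTE_SW0 | PTE_SW1)

/* Meta-state: deliberately outside PTE_SW_MASK, never stored. */
#define STATE_NOPAGE           UINT64_C(1)

/* Extract the ownership state from the software bits of a pte. */
static uint64_t pte_get_state(uint64_t pte)
{
	return pte & PTE_SW_MASK;
}

/* A state may be stored only if it uses SW bits and is not reserved. */
static bool state_fits_in_pte(uint64_t state)
{
	return (state & ~PTE_SW_MASK) == 0 && state != STATE_RESERVED;
}

/* Replace the software bits of a pte with the given state. */
static uint64_t pte_set_state(uint64_t pte, uint64_t state)
{
	assert(state_fits_in_pte(state));
	return (pte & ~PTE_SW_MASK) | state;
}
```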

Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-9-qperret@google.com

commit | commitdiff | tree

Quentin Perret [Wed, 15 Dec 2021 16:12:24 +0000 (16:12 +0000)]

KVM: arm64: pkvm: Refcount the pages shared with EL2

In order to simplify the page tracking infrastructure at EL2 in nVHE
protected mode, move the responsibility of refcounting pages that are
shared multiple times to the host. In order to do so, let's create a
red-black tree tracking all the PFNs that have been shared, along with
a refcount.
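
A simplified sketch of host-side tracking follows. It swaps the
red-black tree for a plain unbalanced binary search tree to stay short,
keeps nodes around once their count drops to zero, and uses hypothetical
names throughout (`shared_pfn`, `share_pfn()`, `unshare_pfn()`). The
return value tells the caller whether a hypercall would be needed: only
the first share and the last unshare of a PFN reach EL2.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tracking node: one per PFN ever shared. */
struct shared_pfn {
	uint64_t pfn;
	unsigned int refcount;
	struct shared_pfn *left, *right;
};

static struct shared_pfn *shared_root;

/* Find the node for pfn; optionally create it with refcount 0. */
static struct shared_pfn *lookup_pfn(uint64_t pfn, int create)
{
	struct shared_pfn **link = &shared_root;

	while (*link) {
		if (pfn == (*link)->pfn)
			return *link;
		link = pfn < (*link)->pfn ? &(*link)->left : &(*link)->right;
	}
	if (!create)
		return NULL;

	*link = calloc(1, sizeof(**link));
	if (*link)
		(*link)->pfn = pfn;
	return *link;
}

/*
 * Returns 1 if this is the first share (a share hypercall is needed),
 * 0 if the page was already shared, -1 on allocation failure.
 */
static int share_pfn(uint64_t pfn)
{
	struct shared_pfn *n = lookup_pfn(pfn, 1);

	if (!n)
		return -1;
	assert(n->refcount < UINT_MAX);
	return n->refcount++ == 0;
}

/*
 * Returns 1 if the last reference was dropped (an unshare hypercall is
 * needed), 0 if the page is still shared, -1 if it was not shared.
 */
static int unshare_pfn(uint64_t pfn)
{
	struct shared_pfn *n = lookup_pfn(pfn, 0);

	if (!n || n->refcount == 0)
		return -1;
	return --n->refcount == 0;
}
```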

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-8-qperret@google.com

commit | commitdiff | tree

Quentin Perret [Wed, 15 Dec 2021 16:12:23 +0000 (16:12 +0000)]

KVM: arm64: Introduce kvm_share_hyp()

The create_hyp_mappings() function can currently be called at any point
in time. However, its behaviour in protected mode changes widely
depending on when it is being called. Prior to KVM init, it is used to
create the temporary page-table used to bring-up the hypervisor, and
later on it is transparently turned into a 'share' hypercall when the
kernel has lost control over the hypervisor stage-1. In order to prepare
the ground for also unsharing pages with the hypervisor during guest
teardown, introduce a kvm_share_hyp() function to make it clear in which
places a share hypercall should be expected, as we will soon need a
matching unshare hypercall in all those places.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-7-qperret@google.com

commit | commitdiff | tree

Will Deacon [Wed, 15 Dec 2021 16:12:22 +0000 (16:12 +0000)]

KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2

Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor
stage-1 mappings at EL2.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-6-qperret@google.com

commit | commitdiff | tree

Will Deacon [Wed, 15 Dec 2021 16:12:21 +0000 (16:12 +0000)]

KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table

kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback
being provided by the memory-management operations for the page-table.

Wire up this callback for the hypervisor stage-1 page-table.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-5-qperret@google.com

commit | commitdiff | tree

Quentin Perret [Wed, 15 Dec 2021 16:12:20 +0000 (16:12 +0000)]

KVM: arm64: Fixup hyp stage-1 refcount

In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix-up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-4-qperret@google.com

commit | commitdiff | tree

Quentin Perret [Wed, 15 Dec 2021 16:12:19 +0000 (16:12 +0000)]

KVM: arm64: Refcount hyp stage-1 pgtable pages

To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-3-qperret@google.com

commit | commitdiff | tree

Quentin Perret [Wed, 15 Dec 2021 16:12:18 +0000 (16:12 +0000)]

KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator

In nVHE protected mode, the EL2 code uses a temporary allocator during
boot while re-creating its stage-1 page-table. Unfortunately, the
hyp_vmemmap is not ready to use at this stage, so refcounting pages
is not possible. That is not currently a problem because hyp stage-1
mappings are never removed, which implies refcounting of page-table
pages is unnecessary.

In preparation for allowing hypervisor stage-1 mappings to be removed,
provide stub implementations for {get,put}_page() in the early allocator.
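
To illustrate why stubs are enough here, consider a callback table that
page-table code calls unconditionally. The `pgtable_mm_ops` struct, the
`page` type and every function below are hypothetical. An early
allocator can plug in no-op callbacks, while a later allocator plugs in
real refcounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor with a reference count. */
struct page {
	int refcount;
};

/* Hypothetical memory-management callbacks used by page-table code. */
struct pgtable_mm_ops {
	void (*get_page)(void *addr);
	void (*put_page)(void *addr);
};

/* Early allocator: no page metadata yet, so the stubs do nothing. */
static void early_get_page(void *addr) { (void)addr; }
static void early_put_page(void *addr) { (void)addr; }

/* Later allocator: page metadata is available, so count references. */
static void counting_get_page(void *addr)
{
	((struct page *)addr)->refcount++;
}

static void counting_put_page(void *addr)
{
	((struct page *)addr)->refcount--;
}

static const struct pgtable_mm_ops early_ops = {
	.get_page = early_get_page,
	.put_page = early_put_page,
};

static const struct pgtable_mm_ops counting_ops = {
	.get_page = counting_get_page,
	.put_page = counting_put_page,
};

/* Page-table code calls the callbacks without checking which set it has. */
static void map_entry(const struct pgtable_mm_ops *ops, void *table_page)
{
	assert(ops && ops->get_page);
	ops->get_page(table_page);
}

static void unmap_entry(const struct pgtable_mm_ops *ops, void *table_page)
{
	assert(ops && ops->put_page);
	ops->put_page(table_page);
}
```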

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211215161232.1480836-2-qperret@google.com

commit | commitdiff | tree

Marc Zyngier [Thu, 16 Dec 2021 12:54:12 +0000 (12:54 +0000)]

Merge branch kvm-arm64/vgic-fixes-5.17 into kvmarm-master/next

* kvm-arm64/vgic-fixes-5.17:
  : .
  : A few vgic fixes:
  : - Harden vgic-v3 error handling paths against signed vs unsigned
  :   comparison that will happen once the xarray-based vcpus are in
  : - Demote userspace-triggered console output to kvm_debug()
  : .
  KVM: arm64: vgic: Demote userspace-triggered console prints to kvm_debug()
  KVM: arm64: vgic-v3: Fix vcpu index comparison

Signed-off-by: Marc Zyngier <maz@kernel.org>

commit | commitdiff | tree

Marc Zyngier [Thu, 16 Dec 2021 10:45:07 +0000 (10:45 +0000)]

KVM: arm64: vgic: Demote userspace-triggered console prints to kvm_debug()

Running the KVM selftests results in these messages being dumped
in the kernel console:

[  188.051073] kvm [469]: VGIC redist and dist frames overlap
[  188.056820] kvm [469]: VGIC redist and dist frames overlap
[  188.076199] kvm [469]: VGIC redist and dist frames overlap

Being able to trigger this from userspace is definitely not on,
so demote these warnings to kvm_debug().

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211216104507.1482017-1-maz@kernel.org

commit | commitdiff | tree

Marc Zyngier [Thu, 16 Dec 2021 10:45:26 +0000 (10:45 +0000)]

KVM: arm64: vgic-v3: Fix vcpu index comparison

When handling an error at the point where we try and register
all the redistributors, we unregister all the previously
registered frames by counting down from the failing index.

However, the way the code is written relies on that index
being a signed value, which won't be true once we switch to
an xarray-based vcpu set.

Since this code is pretty awkward in the first place, and the
failure mode is hard to spot, rewrite this loop to iterate
over the vcpus upwards rather than downwards.
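
The pitfall can be shown with a small stand-alone sketch. Everything
here is hypothetical (`register_frame()`, `register_all()`, the
`registered[]` array); it only demonstrates the loop shape. With an
unsigned index, a countdown guarded by `c >= 0` is always true and wraps
around instead of terminating, whereas an upward walk over `[0, c)` is
safe for either signedness.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VCPUS 4

/* Hypothetical stand-in for per-vcpu registration state. */
static int registered[NR_VCPUS];

/* Register frame i; fails on purpose when i == fail_at. */
static int register_frame(size_t i, size_t fail_at)
{
	assert(i < NR_VCPUS);
	if (i == fail_at)
		return -1;
	registered[i] = 1;
	return 0;
}

/*
 * Register all frames. On failure at index c, unregister the frames
 * [0, c) by iterating upwards. The downward form
 *     for (c--; c >= 0; c--)
 * only terminates if c is signed.
 */
static int register_all(size_t fail_at)
{
	size_t c, i;

	for (c = 0; c < NR_VCPUS; c++) {
		if (register_frame(c, fail_at)) {
			for (i = 0; i < c; i++)
				registered[i] = 0;
			return -1;
		}
	}
	return 0;
}

/* Count how many frames are currently registered. */
static size_t count_registered(void)
{
	size_t i, n = 0;

	for (i = 0; i < NR_VCPUS; i++)
		n += registered[i] != 0;
	return n;
}
```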

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211216104526.1482124-1-maz@kernel.org

commit | commitdiff | tree

Marc Zyngier [Wed, 15 Dec 2021 14:21:23 +0000 (14:21 +0000)]

Merge branch kvm-arm64/pkvm-cleanups-5.17 into kvmarm-master/next

* kvm-arm64/pkvm-cleanups-5.17:
  : .
  : pKVM cleanups from Quentin Perret:
  :
  : This series is a collection of various fixes and cleanups for KVM/arm64
  : when running in nVHE protected mode. The first two patches are real
  : fixes/improvements, the following two are minor cleanups, and the last
  : two help satisfy my paranoia so they're certainly optional.
  : .
  KVM: arm64: pkvm: Make kvm_host_owns_hyp_mappings() robust to VHE
  KVM: arm64: pkvm: Stub io map functions
  KVM: arm64: Make __io_map_base static
  KVM: arm64: Make the hyp memory pool static
  KVM: arm64: pkvm: Disable GICv2 support
  KVM: arm64: pkvm: Fix hyp_pool max order

Signed-off-by: Marc Zyngier <maz@kernel.org>

commit | commitdiff | tree

Quentin Perret [Wed, 8 Dec 2021 15:22:59 +0000 (15:22 +0000)]

KVM: arm64: pkvm: Make kvm_host_owns_hyp_mappings() robust to VHE

The kvm_host_owns_hyp_mappings() function should return true if and only
if the host kernel is responsible for creating the hypervisor stage-1
mappings. That is only possible in standard non-VHE mode, or during boot
in protected nVHE mode. But either way, none of this makes sense in VHE,
so make sure to catch this case as well, hence making the function
return sensible values in any context (VHE or not).
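
The decision described above amounts to a small truth table. The sketch
below models it with a hypothetical `kvm_mode` struct in place of the
kernel's real mode checks; the field and function names are assumptions
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the relevant mode bits. */
struct kvm_mode {
	bool vhe;            /* kernel itself runs at EL2 */
	bool protected_mode; /* nVHE protected mode is enabled */
	bool hyp_finalised;  /* hypervisor has taken over its stage-1 */
};

/*
 * True only when the host kernel creates the hypervisor stage-1
 * mappings: standard nVHE, or protected nVHE while still booting.
 */
static bool host_owns_hyp_mappings(const struct kvm_mode *m)
{
	assert(m != NULL);

	/* In VHE there is no separate hyp stage-1 for the host to own. */
	if (m->vhe)
		return false;

	/* Protected mode after boot: the hypervisor owns its mappings. */
	if (m->protected_mode && m->hyp_finalised)
		return false;

	return true;
}
```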

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-7-qperret@google.com

commit | commitdiff | tree

Quentin Perret [Wed, 8 Dec 2021 15:22:58 +0000 (15:22 +0000)]

KVM: arm64: pkvm: Stub io map functions

Now that GICv2 is disabled in nVHE protected mode there should be no
other reason for the host to use create_hyp_io_mappings() or
kvm_phys_addr_ioremap(). Add sanity checks to make sure that assumption
remains true looking forward.

Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-6-qperret@google.com

commit | commitdiff | tree

Quentin Perret [Wed, 8 Dec 2021 15:22:57 +0000 (15:22 +0000)]

KVM: arm64: Make __io_map_base static

The __io_map_base variable is used at EL2 to track the end of the
hypervisor's "private" VA range in nVHE protected mode. However it
doesn't need to be used outside of mm.c, so let's make it static to keep
all the hyp VA allocation logic in one place.

Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-5-qperret@google.com

commit | commitdiff | tree

Quentin Perret [Wed, 8 Dec 2021 15:22:56 +0000 (15:22 +0000)]

KVM: arm64: Make the hyp memory pool static

The hyp memory pool struct is sized to fit exactly the needs of the
hypervisor stage-1 page-table allocator, so it is important it is not
used for anything else. As it is currently used only from setup.c,
reduce its visibility by marking it static.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Andrew Walbran <qwandor@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-4-qperret@google.com
