platform/kernel/linux-rpi.git
20 months ago KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:25 +0000 (15:54 +0100)]
KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test

Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the Partition
assist page.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-48-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: evmcs_test: Introduce L2 TLB flush test
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:24 +0000 (15:54 +0100)]
KVM: selftests: evmcs_test: Introduce L2 TLB flush test

Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the
Partition assist page.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-47-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Introduce rdmsr_from_l2() and use it for MSR-Bitmap tests
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:23 +0000 (15:54 +0100)]
KVM: selftests: Introduce rdmsr_from_l2() and use it for MSR-Bitmap tests

Hyper-V MSR-Bitmap tests do RDMSR from L2 to exit to L1. While 'evmcs_test'
correctly clobbers all GPRs (which are not preserved), 'hyperv_svm_test'
does not. Introduce a more generic rdmsr_from_l2() to avoid code
duplication and remove hardcoding of MSRs.  Do not put it in common code
because it is really just a selftests bug rather than a processor
feature that requires it.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-46-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Stuff RAX/RCX with 'safe' values in vmmcall()/vmcall()
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:22 +0000 (15:54 +0100)]
KVM: selftests: Stuff RAX/RCX with 'safe' values in vmmcall()/vmcall()

vmmcall()/vmcall() are used to exit from L2 to L1 and no concrete hypercall
ABI is currently followed. With the introduction of Hyper-V L2 TLB flush
it becomes (theoretically) possible that L0 will take responsibility for
handling the call and no L1 exit will happen. Prevent this by stuffing RAX
(KVM ABI) and RCX (Hyper-V ABI) with 'safe' values.

While on it, convert vmmcall() to 'static inline', make it set up a stack
frame and move it to include/x86_64/svm_util.h.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-45-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Allocate Hyper-V partition assist page
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:21 +0000 (15:54 +0100)]
KVM: selftests: Allocate Hyper-V partition assist page

In preparation for testing Hyper-V L2 TLB flush hypercalls, allocate
the so-called Partition assist page.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-44-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Create a vendor independent helper to allocate Hyper-V specific test...
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:20 +0000 (15:54 +0100)]
KVM: selftests: Create a vendor independent helper to allocate Hyper-V specific test pages

There's no need to pollute VMX and SVM code with Hyper-V specific
stuff and allocate Hyper-V specific test pages for all tests, as only
a few really need them. Create a dedicated struct and an allocation
helper.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-43-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Split off load_evmcs() from load_vmcs()
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:19 +0000 (15:54 +0100)]
KVM: selftests: Split off load_evmcs() from load_vmcs()

In preparation for putting Hyper-V specific test pages into a dedicated
struct, move eVMCS load logic from load_vmcs(). Tests call load_vmcs()
directly and the only one which needs 'enlightened' version is
evmcs_test so there's not much gain in having this merged.

Temporarily pass both GPA and HVA to load_evmcs().

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-42-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:18 +0000 (15:54 +0100)]
KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h

The Hyper-V VP assist page is not eVMCS specific, it is also used for
enlightened nSVM. Move the code to a vendor-neutral place.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-41-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Sync 'struct hv_vp_assist_page' definition with hyperv-tlfs.h
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:17 +0000 (15:54 +0100)]
KVM: selftests: Sync 'struct hv_vp_assist_page' definition with hyperv-tlfs.h

'struct hv_vp_assist_page' definition doesn't match TLFS. Also, define
'struct hv_nested_enlightenments_control' and use it instead of opaque
'__u64'.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-40-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with hyperv-tlfs.h
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:16 +0000 (15:54 +0100)]
KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with hyperv-tlfs.h

'struct hv_enlightened_vmcs' definition in selftests is not '__packed'
and so we rely on the compiler doing the right padding. This is not
obvious so it seems beneficial to use the same definition as in kernel.
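
As a rough illustration of why the missing '__packed' matters, the sketch
below compares a naturally aligned struct with a packed one. The struct
names and fields are hypothetical and are not taken from
'struct hv_enlightened_vmcs'; the attribute syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the compiler is free to pad between 'a' and 'b'. */
struct demo_natural {
	uint16_t a;
	uint64_t b;
};

/* Same fields, but packed: no padding is inserted anywhere. */
struct demo_packed {
	uint16_t a;
	uint64_t b;
} __attribute__((packed));

/* The packed layout is fixed regardless of the ABI's alignment rules. */
static_assert(sizeof(struct demo_packed) == 10, "packed struct has no padding");
static_assert(offsetof(struct demo_packed, b) == 2, "b follows a directly");

/* The natural layout can never be smaller than the packed one. */
static_assert(sizeof(struct demo_natural) >= sizeof(struct demo_packed),
	      "natural layout may contain padding");
```

With the packed variant the field offsets follow from the field sizes alone,
which is what a structure shared with a hypervisor needs.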

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-39-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Hyper-V PV TLB flush selftest
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:15 +0000 (15:54 +0100)]
KVM: selftests: Hyper-V PV TLB flush selftest

Introduce a selftest for Hyper-V PV TLB flush hypercalls
(HvFlushVirtualAddressSpace/HvFlushVirtualAddressSpaceEx,
HvFlushVirtualAddressList/HvFlushVirtualAddressListEx).

The test creates one 'sender' vCPU and two 'worker' vCPUs which busy-loop,
reading from a certain GVA and checking the observed value. The sender
vCPU swaps the data page with another page filled with a different value.
The expectation for workers is also altered. Without a TLB flush on worker
vCPUs, they may continue to observe the old value. To guard against accidental
TLB flushes for worker vCPUs, the test is repeated 100 times.

Hyper-V TLB flush hypercalls are tested in both 'normal' and 'XMM
fast' modes.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-38-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Export vm_vaddr_unused_gap() to make it possible to request unmapped...
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:13 +0000 (15:54 +0100)]
KVM: selftests: Export vm_vaddr_unused_gap() to make it possible to request unmapped ranges

Currently, tests can only request a new vaddr range by using
vm_vaddr_alloc()/vm_vaddr_alloc_page()/vm_vaddr_alloc_pages() but
these functions allocate and map physical pages too. Make it possible
to request an unmapped range too.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-36-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Fill in vm->vpages_mapped bitmap in virt_map() too
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:12 +0000 (15:54 +0100)]
KVM: selftests: Fill in vm->vpages_mapped bitmap in virt_map() too

Similar to vm_vaddr_alloc(), virt_map() needs to reflect the mapping
in vm->vpages_mapped.

While on it, remove unneeded code wrapping in vm_vaddr_alloc().

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-35-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Hyper-V PV IPI selftest
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:11 +0000 (15:54 +0100)]
KVM: selftests: Hyper-V PV IPI selftest

Introduce a selftest for Hyper-V PV IPI hypercalls
(HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx).

The test creates one 'sender' vCPU and two 'receiver' vCPUs and then
issues various combinations of send IPI hypercalls in both 'normal'
and 'fast' (with XMM input where necessary) mode. Later, the test
checks whether IPIs were delivered to the expected destination vCPU[s].

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-34-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Move the function doing Hyper-V hypercall to a common header
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:10 +0000 (15:54 +0100)]
KVM: selftests: Move the function doing Hyper-V hypercall to a common header

All Hyper-V specific tests issuing hypercalls need this.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-33-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Move HYPERV_LINUX_OS_ID definition to a common header
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:09 +0000 (15:54 +0100)]
KVM: selftests: Move HYPERV_LINUX_OS_ID definition to a common header

HYPERV_LINUX_OS_ID needs to be written to HV_X64_MSR_GUEST_OS_ID by
each Hyper-V specific selftest.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-32-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: selftests: Better XMM read/write helpers
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:08 +0000 (15:54 +0100)]
KVM: selftests: Better XMM read/write helpers

set_xmm()/get_xmm() helpers are fairly useless as they only read 64 bits
from 128-bit registers. Moreover, these helpers are not used. Borrow
_kvm_read_sse_reg()/_kvm_write_sse_reg() from KVM limiting them to
XMM0-XMM8 for now.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-31-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: Expose Hyper-V L2 TLB flush feature
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:07 +0000 (15:54 +0100)]
KVM: x86: Expose Hyper-V L2 TLB flush feature

With both nSVM and nVMX implementations in place, KVM can now expose
Hyper-V L2 TLB flush feature to userspace.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-30-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: nSVM: hyper-v: Enable L2 TLB flush
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:06 +0000 (15:54 +0100)]
KVM: nSVM: hyper-v: Enable L2 TLB flush

Implement Hyper-V L2 TLB flush for nSVM. The feature needs to be enabled
both in extended 'nested controls' in VMCB and VP assist page.
According to Hyper-V TLFS, synthetic vmexit to L1 is performed with
- HV_SVM_EXITCODE_ENL exit_code.
- HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH exit_info_1.

Note: VP assist page is cached in 'struct kvm_vcpu_hv' so
recalc_intercepts() doesn't need to read from guest's memory. KVM
needs to update the cache upon each VMRUN and after svm_set_nested_state
(svm_get_nested_state_pages()) to handle the case when the guest got
migrated while L2 was running.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-29-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: Make kvm_hv_get_assist_page() return 0/-errno
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:05 +0000 (15:54 +0100)]
KVM: x86: Make kvm_hv_get_assist_page() return 0/-errno

Convert kvm_hv_get_assist_page() to return 'int' and propagate possible
errors from kvm_read_guest_cached().

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-28-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: nVMX: hyper-v: Enable L2 TLB flush
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:04 +0000 (15:54 +0100)]
KVM: nVMX: hyper-v: Enable L2 TLB flush

Enable L2 TLB flush feature on nVMX when:
- Enlightened VMCS is in use.
- The feature flag is enabled in eVMCS.
- The feature flag is enabled in partition assist page.

Perform synthetic vmexit to L1 after processing TLB flush call upon
request (HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH).

Note: nested_evmcs_l2_tlb_flush_enabled() uses cached VP assist page copy
which gets updated from nested_vmx_handle_enlightened_vmptrld(). This is
also guaranteed to happen post migration with eVMCS backed L2 running.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-27-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: nVMX: hyper-v: Cache VP assist page in 'struct kvm_vcpu_hv'
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:03 +0000 (15:54 +0100)]
KVM: nVMX: hyper-v: Cache VP assist page in 'struct kvm_vcpu_hv'

In preparation for enabling L2 TLB flush, cache the VP assist page in
'struct kvm_vcpu_hv'. While on it, rename nested_enlightened_vmentry()
to nested_get_evmptr() and make it return eVMCS GPA directly.

No functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-26-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Introduce fast guest_hv_cpuid_has_l2_tlb_flush() check
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:02 +0000 (15:54 +0100)]
KVM: x86: hyper-v: Introduce fast guest_hv_cpuid_has_l2_tlb_flush() check

Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
from L2 in L0 to process L2 TLB flush requests.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-25-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: L2 TLB flush
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:01 +0000 (15:54 +0100)]
KVM: x86: hyper-v: L2 TLB flush

Handle L2 TLB flush requests by going through all vCPUs and checking
whether there are vCPUs running the same VM_ID with a VP_ID specified
in the requests. Perform synthetic exit to L1 upon finish.

Note, while checking VM_ID/VP_ID of running vCPUs seems to be a bit
racy, we count on the fact that KVM flushes the whole L2 VPID upon
transition. Also, KVM_REQ_HV_TLB_FLUSH request needs to be done upon
transition between L1 and L2 to make sure all pending requests are
always processed.

For reference, Hyper-V TLFS refers to the feature as "Direct
Virtual Flush".

Note, nVMX/nSVM code does not handle VMCALL/VMMCALL from L2 yet.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-24-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall()
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:54:00 +0000 (15:54 +0100)]
KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall()

The newly introduced helper checks whether vCPU is performing a
Hyper-V TLB flush hypercall. This is required to filter out L2 TLB
flush hypercalls for processing.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-23-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: Introduce .hv_inject_synthetic_vmexit_post_tlb_flush() nested hook
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:59 +0000 (15:53 +0100)]
KVM: x86: Introduce .hv_inject_synthetic_vmexit_post_tlb_flush() nested hook

Hyper-V supports injecting synthetic L2->L1 exit after performing
L2 TLB flush operation but the procedure is vendor specific. Introduce
.hv_inject_synthetic_vmexit_post_tlb_flush nested hook for it.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-22-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:58 +0000 (15:53 +0100)]
KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id

Similar to nVMX, KVM needs to know L2's VM_ID/VP_ID and Partition
assist page address to handle L2 TLB flush requests.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-21-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:57 +0000 (15:53 +0100)]
KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use

To handle L2 TLB flush requests, KVM needs to keep track of L2's VM_ID/
VP_IDs which are set by L1 hypervisor. 'Partition assist page' address is
also needed to handle post-flush exit to L1 upon request.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-20-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on...
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:56 +0000 (15:53 +0100)]
KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'

To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUS as L1
may use vCPU overcommit for L2. To avoid growing on-stack allocation, make
'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
dynamically.

Note: sparse_set_to_vcpu_mask() can't currently be used to handle L2
requests as KVM does not keep L2 VM_ID -> L2 VCPU_ID -> L1 vCPU mappings,
i.e. its vp_bitmap array is still bounded by the number of L1 vCPUs and so
can remain an on-stack allocation.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-19-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Create a separate fifo for L2 TLB flush
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:55 +0000 (15:53 +0100)]
KVM: x86: hyper-v: Create a separate fifo for L2 TLB flush

To handle L2 TLB flush requests, KVM needs to use a separate fifo from
regular (L1) Hyper-V TLB flush requests: e.g. when a request to flush
something in L2 is made, the target vCPU can transition from L2 to L1,
receive a request to flush a GVA for L1 and then try to re-enter L2.
The first request needs to be processed at this point. Similarly,
requests to flush GVAs in L1 must wait until L2 exits to L1.

No functional change as KVM doesn't handle L2 TLB flush requests from
L2 yet.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-18-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi()
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:54 +0000 (15:53 +0100)]
KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi()

Get rid of on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi()
for a smaller number of vCPUs in the request. When Hyper-V TLB flush
is in use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to
send IPIs to a large number of vCPUs (and are rarely used in general).

Introduce hv_is_vp_in_sparse_set() to directly check if the specified
VP_ID is present in sparse vCPU set.
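
A minimal userspace sketch of such a direct membership check is shown
below. It assumes the sparse set layout described in this series (64 banks
of 64 vCPUs, with only the valid banks stored back to back); the function
name vp_in_sparse_set() is hypothetical, and __builtin_popcountll() is a
GCC/Clang builtin standing in for a kernel bit-counting helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HV_MAX_SPARSE_VCPU_BANKS 64
#define HV_VCPUS_PER_SPARSE_BANK 64

/*
 * Return true if vp_id is set in a sparse vCPU set. 'sparse_banks' holds
 * one 64-bit bank per bit set in 'valid_bank_mask', in ascending order.
 */
static bool vp_in_sparse_set(uint32_t vp_id, uint64_t valid_bank_mask,
			     const uint64_t *sparse_banks)
{
	unsigned int bank = vp_id / HV_VCPUS_PER_SPARSE_BANK;
	unsigned int bit = vp_id % HV_VCPUS_PER_SPARSE_BANK;
	unsigned int idx;

	assert(sparse_banks != NULL);

	/* The bank holding this VP_ID must be present in the set at all. */
	if (bank >= HV_MAX_SPARSE_VCPU_BANKS ||
	    !(valid_bank_mask & (1ULL << bank)))
		return false;

	/* Array index = number of valid banks that precede this one. */
	idx = (unsigned int)__builtin_popcountll(valid_bank_mask &
						 ((1ULL << bank) - 1));

	return (sparse_banks[idx] & (1ULL << bit)) != 0;
}
```

Checking one VP_ID this way avoids expanding the whole set into a vCPU
mask, which is the on-stack allocation the change removes.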

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-17-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead...
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:53 +0000 (15:53 +0100)]
KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw '64'

It may not be clear where the '64' limit for the maximum sparse
bank number comes from, so use the HV_MAX_SPARSE_VCPU_BANKS define instead.
Use HV_VCPUS_PER_SPARSE_BANK in KVM_HV_MAX_SPARSE_VCPU_SET_BITS's
definition. Opportunistically adjust the comment around BUILD_BUG_ON().

No functional change.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-16-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:52 +0000 (15:53 +0100)]
x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants

It may not be clear where the magical '64' value used in
__cpumask_to_vpset() comes from. Moreover, '64' means both the maximum
sparse bank number as well as the number of vCPUs per bank. Add defines
to make things clear. These defines are also going to be used by KVM.

No functional change.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-15-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:51 +0000 (15:53 +0100)]
KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs

To handle L2 TLB flush requests, KVM needs to translate the specified
L2 GPA to L1 GPA to read hypercall arguments from there.

No functional change as KVM doesn't handle VMCALL/VMMCALL from L2 yet.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-14-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:50 +0000 (15:53 +0100)]
KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercalls

The extended GVA ranges support bit seems to indicate whether the lower 12
bits of a GVA can be used to specify up to 4095 additional consecutive
GVAs to flush. This is somewhat described in TLFS.

Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX}
requests by flushing the whole VPID so technically, extended GVA
ranges were already supported. As such requests are handled more
gently now, advertising support for extended ranges starts making
sense to reduce the size of TLB flush requests.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-13-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:49 +0000 (15:53 +0100)]
KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently

Currently, HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls are handled
the exact same way as HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE{,EX}: by
flushing the whole VPID and this is sub-optimal. Switch to handling
these requests with 'flush_tlb_gva()' hooks instead. Use the newly
introduced TLB flush fifo to queue the requests.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-12-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Add helper to read hypercall data for array
Sean Christopherson [Tue, 1 Nov 2022 14:53:48 +0000 (15:53 +0100)]
KVM: x86: hyper-v: Add helper to read hypercall data for array

Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for
reading a guest-provided array can be reused in the future, e.g. for
getting a list of virtual addresses whose TLB entries need to be flushed.

Opportunistically swap the order of the data and XMM adjustment so that
the XMM/gpa offsets are bundled together.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-11-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Introduce TLB flush fifo
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:47 +0000 (15:53 +0100)]
KVM: x86: hyper-v: Introduce TLB flush fifo

To allow flushing individual GVAs instead of always flushing the whole
VPID, a per-vCPU structure to pass the requests is needed. Use standard
'kfifo' to queue two types of entries: individual GVA (GFN + up to 4095
following GFNs in the lower 12 bits) and 'flush all'.

The size of the fifo is arbitrarily set to '16'.

Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now and
kvm_hv_vcpu_flush_tlb() doesn't actually read the fifo; it just resets the
queue before returning -EOPNOTSUPP (which triggers a full TLB flush), so
the functional change is very small but the infrastructure is prepared
to handle individual GVA flush requests.
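
The entry format described above can be sketched as follows. This is only
an illustration under assumptions: 4 KiB pages, an all-ones sentinel for
'flush all', and the names flush_entry_pack(), flush_entry_expand() and
DEMO_FLUSH_ALL_ENTRY are hypothetical rather than taken from this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Assumed sentinel meaning "flush everything"; hypothetical value. */
#define DEMO_FLUSH_ALL_ENTRY UINT64_MAX

/* Pack a page-aligned GVA plus the number of additional following pages. */
static uint64_t flush_entry_pack(uint64_t gva, unsigned int extra_pages)
{
	assert(extra_pages < DEMO_PAGE_SIZE); /* must fit in the low 12 bits */
	return (gva & DEMO_PAGE_MASK) | extra_pages;
}

/*
 * Expand one entry into individual page addresses. Returns the number of
 * addresses written to 'out' (at most 'max'), or 0 for a 'flush all' entry.
 */
static size_t flush_entry_expand(uint64_t entry, uint64_t *out, size_t max)
{
	uint64_t base, count;
	size_t i;

	if (entry == DEMO_FLUSH_ALL_ENTRY)
		return 0;

	base = entry & DEMO_PAGE_MASK;
	count = (entry & ~DEMO_PAGE_MASK) + 1; /* base page + extra pages */

	for (i = 0; i < count && i < max; i++)
		out[i] = base + i * DEMO_PAGE_SIZE;

	return i;
}
```

A consumer draining the fifo would fall back to a full flush whenever it
sees the 'flush all' entry, and otherwise flush each expanded address.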

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-10-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:46 +0000 (15:53 +0100)]
KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

In preparation for implementing fine-grained Hyper-V TLB flush and
L2 TLB flush, resurrect dedicated KVM_REQ_HV_TLB_FLUSH request bit. As
KVM_REQ_TLB_FLUSH_GUEST is a stronger operation, clear KVM_REQ_HV_TLB_FLUSH
request in kvm_vcpu_flush_tlb_guest().

The flush itself is temporarily handled by kvm_vcpu_flush_tlb_guest().

No functional change intended.
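
The interplay between the two requests can be sketched in userspace as
below. This is a simplified model, not the KVM code: the struct, the bit
values and every demo_* name are hypothetical, and the processing order is
an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request bits standing in for the real request numbers. */
#define DEMO_REQ_TLB_FLUSH_GUEST (1u << 0)
#define DEMO_REQ_HV_TLB_FLUSH    (1u << 1)

struct demo_vcpu {
	uint32_t requests;         /* pending request bits */
	unsigned int full_flushes; /* number of full guest flushes performed */
};

static void demo_make_request(struct demo_vcpu *v, uint32_t req)
{
	assert(v != NULL);
	v->requests |= req;
}

/* Test-and-clear, mirroring the usual "check request" pattern. */
static bool demo_check_request(struct demo_vcpu *v, uint32_t req)
{
	if (!(v->requests & req))
		return false;
	v->requests &= ~req;
	return true;
}

/* A full guest flush is the stronger operation, so it also satisfies
 * (and therefore clears) any pending Hyper-V flush request. */
static void demo_flush_tlb_guest(struct demo_vcpu *v)
{
	v->full_flushes++;
	v->requests &= ~DEMO_REQ_HV_TLB_FLUSH;
}

static void demo_process_requests(struct demo_vcpu *v)
{
	if (demo_check_request(v, DEMO_REQ_TLB_FLUSH_GUEST))
		demo_flush_tlb_guest(v);
	/* For now the Hyper-V request is served by the same full flush. */
	if (demo_check_request(v, DEMO_REQ_HV_TLB_FLUSH))
		demo_flush_tlb_guest(v);
}
```

When both requests are pending, only one flush runs, which is the point of
clearing the weaker request inside the stronger handler.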

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-9-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months ago KVM: x86: Move clearing of TLB_FLUSH_CURRENT to kvm_vcpu_flush_tlb_all()
Sean Christopherson [Tue, 1 Nov 2022 14:53:45 +0000 (15:53 +0100)]
KVM: x86: Move clearing of TLB_FLUSH_CURRENT to kvm_vcpu_flush_tlb_all()

Clear KVM_REQ_TLB_FLUSH_CURRENT in kvm_vcpu_flush_tlb_all() instead of in
its sole caller that processes KVM_REQ_TLB_FLUSH.  Regardless of why/when
kvm_vcpu_flush_tlb_all() is called, flushing "all" TLB entries also
flushes "current" TLB entries.

Ideally, there will never be another caller of kvm_vcpu_flush_tlb_all(),
and moving the handling "requires" extra work to document the ordering
requirement, but future Hyper-V paravirt TLB flushing support will add
similar logic for flush "guest" (Hyper-V can flush a subset of "guest"
entries).  And in the Hyper-V case, KVM needs to do more than just clear
the request, the queue of GPAs to flush also needs to be purged, and doing
it all only in the request path is undesirable as kvm_vcpu_flush_tlb_guest()
does have multiple callers (though it's unlikely KVM's paravirt TLB flush
will coincide with Hyper-V's paravirt TLB flush).

Move the logic even though it adds extra "work" so that KVM will be
consistent with how flush requests are processed when the Hyper-V support
lands.

No functional change intended.
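
A minimal sketch of the idea, assuming a toy request bitmap. The fa_*
names are hypothetical, not KVM's. The flush-all helper itself drops the
pending "current" request, so any caller gets that behavior, and the
request loop has to process "all" before "current".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request bits for illustration only. */
#define FA_REQ_TLB_FLUSH          (UINT32_C(1) << 0)
#define FA_REQ_TLB_FLUSH_CURRENT  (UINT32_C(1) << 1)

struct fa_vcpu {
	uint32_t requests;
	unsigned int all_flushes;
	unsigned int current_flushes;
};

/* Test-and-clear a single request bit. */
static bool fa_check_request(struct fa_vcpu *vcpu, uint32_t req)
{
	if (!(vcpu->requests & req))
		return false;
	vcpu->requests &= ~req;
	return true;
}

static void fa_flush_tlb_all(struct fa_vcpu *vcpu)
{
	vcpu->all_flushes++;
	/* Flushing "all" entries also flushes "current" entries. */
	vcpu->requests &= ~FA_REQ_TLB_FLUSH_CURRENT;
}

static void fa_flush_tlb_current(struct fa_vcpu *vcpu)
{
	vcpu->current_flushes++;
}

/* Ordering requirement: handle "all" before "current". */
static void fa_process_requests(struct fa_vcpu *vcpu)
{
	if (fa_check_request(vcpu, FA_REQ_TLB_FLUSH))
		fa_flush_tlb_all(vcpu);
	if (fa_check_request(vcpu, FA_REQ_TLB_FLUSH_CURRENT))
		fa_flush_tlb_current(vcpu);
}
```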

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: VMX: Rename "vmx/evmcs.{ch}" to "vmx/hyperv.{ch}"
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:44 +0000 (15:53 +0100)]
KVM: VMX: Rename "vmx/evmcs.{ch}" to "vmx/hyperv.{ch}"

To conform with SVM, rename VMX specific Hyper-V files from "evmcs.{ch}"
to "hyperv.{ch}". While Enlightened VMCS is the lion's share of these
files, some stuff (e.g. enlightened MSR bitmap, the upcoming Hyper-V
L2 TLB flush, ...) goes beyond that.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'
Vitaly Kuznetsov [Tue, 1 Nov 2022 14:53:43 +0000 (15:53 +0100)]
KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'

To make terminology between Hyper-V-on-KVM and KVM-on-Hyper-V consistent,
rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'. The change
eliminates the use of confusing 'direct' and adds the missing underscore.

No functional change.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agox86/hyperv: KVM: Rename "hv_enlightenments" to "hv_vmcb_enlightenments"
Sean Christopherson [Tue, 1 Nov 2022 14:53:42 +0000 (15:53 +0100)]
x86/hyperv: KVM: Rename "hv_enlightenments" to "hv_vmcb_enlightenments"

Now that KVM isn't littered with "struct hv_enlightenments" casts, rename
the struct to "hv_vmcb_enlightenments" to highlight the fact that the
struct is specifically for SVM's VMCB.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: SVM: Add a proper field for Hyper-V VMCB enlightenments
Sean Christopherson [Tue, 1 Nov 2022 14:53:41 +0000 (15:53 +0100)]
KVM: SVM: Add a proper field for Hyper-V VMCB enlightenments

Add a union to provide hv_enlightenments side-by-side with the sw_reserved
bytes that Hyper-V's enlightenments overlay.  Casting sw_reserved
everywhere is messy, confusing, and unnecessarily unsafe.

No functional change intended.
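
A sketch of the overlay, using shortened, hypothetical stand-ins for the
real structures (field names and the 32-byte size are assumptions made
for illustration). The anonymous union gives typed access to the same
bytes, and static asserts catch a layout mismatch at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the Hyper-V enlightenments layout. */
struct demo_hv_enlightenments {
	uint32_t control;
	uint32_t hv_vp_id;
	uint64_t hv_vm_id;
	uint64_t partition_assist_page;
	uint64_t reserved;
};

/* Hypothetical control area; only the tail matters for this sketch. */
struct demo_control_area {
	uint64_t other_fields[4];
	union {
		struct demo_hv_enlightenments hv_enlightenments;
		uint8_t reserved_sw[32];
	};
};

/* The overlay must exactly cover the software-reserved bytes. */
static_assert(sizeof(struct demo_hv_enlightenments) == 32,
	      "enlightenments must fit the reserved area");
static_assert(offsetof(struct demo_control_area, hv_enlightenments) ==
	      offsetof(struct demo_control_area, reserved_sw),
	      "union members must start at the same offset");

/* Typed access: no cast of reserved_sw needed. */
static void demo_set_vp_id(struct demo_control_area *ca, uint32_t id)
{
	ca->hv_enlightenments.hv_vp_id = id;
}
```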

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: selftests: Move "struct hv_enlightenments" to x86_64/svm.h
Sean Christopherson [Tue, 1 Nov 2022 14:53:40 +0000 (15:53 +0100)]
KVM: selftests: Move "struct hv_enlightenments" to x86_64/svm.h

Move Hyper-V's VMCB "struct hv_enlightenments" to the svm.h header so
that the struct can be referenced in "struct vmcb_control_area".
Alternatively, a dedicated header for SVM+Hyper-V could be added, a la
x86_64/evmcs.h, but it doesn't appear that Hyper-V will end up needing
a wholesale replacement for the VMCB.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agox86/hyperv: Move VMCB enlightenment definitions to hyperv-tlfs.h
Sean Christopherson [Tue, 1 Nov 2022 14:53:39 +0000 (15:53 +0100)]
x86/hyperv: Move VMCB enlightenment definitions to hyperv-tlfs.h

Move Hyper-V's VMCB enlightenment definitions to the TLFS header; the
definitions come directly from the TLFS[*], not from KVM.

No functional change intended.

[*] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_svm_enlightened_vmcb_fields

[vitaly: rename VMCB_HV_ -> HV_VMCB_ to match the rest of
hyperv-tlfs.h, keep svm/hyperv.h]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86: avoid memslot check in NX hugepage recovery if it cannot succeed
Paolo Bonzini [Thu, 17 Nov 2022 17:25:02 +0000 (12:25 -0500)]
KVM: x86: avoid memslot check in NX hugepage recovery if it cannot succeed

Since gfn_to_memslot() is relatively expensive, it helps to
skip it if the memslot cannot possibly have dirty logging
enabled.  In order to do this, add to struct kvm a counter
of the number of log-page memslots.  While the correct value
can only be read with slots_lock taken, the NX recovery thread
is content with using an approximate value.  Therefore, the
counter is an atomic_t.
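
A userspace sketch of the approach, using C11 atomics in place of the
kernel's atomic_t. All nxdemo_* names are hypothetical, and the boolean
argument merely stands in for the result of the expensive memslot lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct nxdemo_vm {
	atomic_int nr_dirty_log_slots; /* hypothetical counter name */
	unsigned int slot_lookups;     /* counts "expensive" lookups */
};

/* Writers would hold a lock in real code; only the counter is shown. */
static void nxdemo_set_dirty_logging(struct nxdemo_vm *vm, bool enable)
{
	if (enable)
		atomic_fetch_add(&vm->nr_dirty_log_slots, 1);
	else
		atomic_fetch_sub(&vm->nr_dirty_log_slots, 1);
}

/* Stand-in for the lookup; "logged" is the answer it would return. */
static bool nxdemo_lookup_slot(struct nxdemo_vm *vm, bool logged)
{
	vm->slot_lookups++;
	return logged;
}

/* Returns true if recovery of this page should be skipped. */
static bool nxdemo_skip_recovery(struct nxdemo_vm *vm, bool logged)
{
	/* Lockless, possibly stale read: fine for a heuristic. */
	if (atomic_load(&vm->nr_dirty_log_slots) == 0)
		return false;
	return nxdemo_lookup_slot(vm, logged);
}
```

While no slot has dirty logging enabled, nxdemo_skip_recovery() never
performs the lookup at all.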

Based on https://lore.kernel.org/kvm/20221027200316.2221027-2-dmatlack@google.com/
by David Matlack.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoMerge branch 'kvm-svm-harden' into HEAD
Paolo Bonzini [Thu, 17 Nov 2022 16:50:23 +0000 (11:50 -0500)]
Merge branch 'kvm-svm-harden' into HEAD

This fixes three issues in nested SVM:

1) in the shutdown_interception() vmexit handler we call kvm_vcpu_reset().
However, if running nested and L1 doesn't intercept shutdown, the function
resets vcpu->arch.hflags without properly leaving the nested state.
This leaves the vCPU in an inconsistent state and later triggers a kernel
panic in SVM code.  The same bug can likely be triggered by sending INIT
via local apic to a vCPU which runs a nested guest.

On VMX we are lucky that the issue can't happen because VMX always
intercepts triple faults, thus triple fault in L2 will always be
redirected to L1.  Plus, handle_triple_fault() doesn't reset the vCPU.
INIT IPI can't happen on VMX either because INIT events are masked while
in VMX mode.

Secondarily, KVM doesn't honour SHUTDOWN intercept bit of L1 on SVM.
A normal hypervisor should always intercept SHUTDOWN, a unit test on
the other hand might want to not do so.

Finally, the guest can trigger a non-rate-limited kernel printk on SVM,
which is fixed as well.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86: remove exit_int_info warning in svm_handle_exit
Maxim Levitsky [Thu, 3 Nov 2022 14:13:51 +0000 (16:13 +0200)]
KVM: x86: remove exit_int_info warning in svm_handle_exit

It is valid to receive an external interrupt and have a broken IDT entry,
which will lead to #GP with exit_int_info that will contain the index of
the IDT entry (i.e. any value).

Other exceptions can happen as well, like #NP or #SS
(if stack switch fails).

Thus this warning can be user triggered and has very little value.

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-10-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: selftests: add svm part to triple_fault_test
Maxim Levitsky [Thu, 3 Nov 2022 14:13:50 +0000 (16:13 +0200)]
KVM: selftests: add svm part to triple_fault_test

Add a SVM implementation to triple_fault_test to test that
emulated/injected shutdown works.

Since, unlike VMX, SVM allows the hypervisor to avoid intercepting
shutdown in the guest, don't intercept shutdown in order to test that
KVM supports this correctly.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86: allow L1 to not intercept triple fault
Maxim Levitsky [Thu, 3 Nov 2022 14:13:49 +0000 (16:13 +0200)]
KVM: x86: allow L1 to not intercept triple fault

This is an SVM correctness fix: although a sane L1 would intercept the
SHUTDOWN event, it doesn't have to, so we have to honour this.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agokvm: selftests: add svm nested shutdown test
Maxim Levitsky [Thu, 3 Nov 2022 14:13:48 +0000 (16:13 +0200)]
kvm: selftests: add svm nested shutdown test

Add a test that checks that on SVM, if L1 doesn't intercept SHUTDOWN,
then L2 crashes L1 and doesn't crash L2.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: selftests: move idt_entry to header
Maxim Levitsky [Thu, 3 Nov 2022 14:13:47 +0000 (16:13 +0200)]
KVM: selftests: move idt_entry to header

struct idt_entry will be used for a test which will break IDT on purpose.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86: forcibly leave nested mode on vCPU reset
Maxim Levitsky [Thu, 3 Nov 2022 14:13:46 +0000 (16:13 +0200)]
KVM: x86: forcibly leave nested mode on vCPU reset

While not obvious, kvm_vcpu_reset() leaves the nested mode by clearing
'vcpu->arch.hflags' but it does so without all the required housekeeping.

On SVM, it is possible to have a vCPU reset while in guest mode because,
unlike VMX, on SVM, INITs are not latched in SVM non-root mode and, in
addition to that, L1 doesn't have to intercept triple fault, which should
also trigger L1's reset if it happens in L2 while L1 didn't intercept it.

If one of the above conditions happens, KVM will continue to use vmcb02
while not being in guest mode.

Later the IA32_EFER will be cleared which will lead to freeing of the
nested guest state which will (correctly) free the vmcb02, but since
KVM still uses it (incorrectly) this will lead to a use after free
and kernel crash.

This issue is assigned CVE-2022-3344
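
A toy model of the hazard and of the fix, with hypothetical rst_* names
that only loosely follow the description above: the reset path leaves
nested mode with full housekeeping (switching back to vmcb01), so a
later free of the nested state cannot release a VMCB that is still in
use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct rst_vmcb { int id; };

struct rst_vcpu {
	struct rst_vmcb *vmcb01;   /* L1 state */
	struct rst_vmcb *vmcb02;   /* L2 state, part of the nested state */
	struct rst_vmcb *current;  /* VMCB the vCPU would run with */
	bool guest_mode;
};

static void rst_enter_nested(struct rst_vcpu *vcpu)
{
	vcpu->guest_mode = true;
	vcpu->current = vcpu->vmcb02;
}

/* Full housekeeping: clear the flag AND switch back to vmcb01. */
static void rst_leave_nested(struct rst_vcpu *vcpu)
{
	if (!vcpu->guest_mode)
		return;
	vcpu->guest_mode = false;
	vcpu->current = vcpu->vmcb01;
}

static void rst_vcpu_reset(struct rst_vcpu *vcpu)
{
	/* Clearing only the flag would leave "current" on vmcb02. */
	rst_leave_nested(vcpu);
}

static void rst_free_nested(struct rst_vcpu *vcpu)
{
	/* Freeing vmcb02 while it is current would be a use after free. */
	assert(vcpu->current == vcpu->vmcb01);
	free(vcpu->vmcb02);
	vcpu->vmcb02 = NULL;
}
```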

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86: add kvm_leave_nested
Maxim Levitsky [Thu, 3 Nov 2022 14:13:45 +0000 (16:13 +0200)]
KVM: x86: add kvm_leave_nested

Add kvm_leave_nested, which wraps a call to nested_ops->leave_nested
in a function.

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use
Maxim Levitsky [Thu, 3 Nov 2022 14:13:44 +0000 (16:13 +0200)]
KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use

Make sure that KVM uses vmcb01 before freeing nested state, and warn if
that is not the case.

This is a minimal fix for CVE-2022-3344 that makes the kernel print a
warning instead of panicking.

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86: nSVM: leave nested mode on vCPU free
Maxim Levitsky [Thu, 3 Nov 2022 14:13:43 +0000 (16:13 +0200)]
KVM: x86: nSVM: leave nested mode on vCPU free

If the VM was terminated while nested, we free the nested state
while the vCPU still is in nested mode.

Soon a warning will be added for this condition.

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages
David Matlack [Thu, 3 Nov 2022 20:44:21 +0000 (13:44 -0700)]
KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages

Do not recover (i.e. zap) an NX Huge Page that is being dirty tracked,
as it will just be faulted back in at the same 4KiB granularity when
accessed by a vCPU. This may need to be changed if KVM ever supports
2MiB (or larger) dirty tracking granularity, or faulting huge pages
during dirty tracking for reads/executes. However for now, these zaps
are entirely wasteful.

In order to check if this commit increases the CPU usage of the NX
recovery worker thread I used a modified version of execute_perf_test
[1] that supports splitting guest memory into multiple slots and reports
/proc/pid/schedstat:se.sum_exec_runtime for the NX recovery worker just
before tearing down the VM. The goal was to force a large number of NX
Huge Page recoveries and see if the recovery worker used any more CPU.

Test Setup:

  echo 1000 > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
  echo 10 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio

Test Command:

  ./execute_perf_test -v64 -s anonymous_hugetlb_1gb -x 16 -o

        | kvm-nx-lpage-re:se.sum_exec_runtime      |
        | ---------------------------------------- |
Run     | Before             | After               |
------- | ------------------ | ------------------- |
1       | 730.084105         | 724.375314          |
2       | 728.751339         | 740.581988          |
3       | 736.264720         | 757.078163          |

Comparing the median results, this commit results in about a 1% increase
in CPU usage of the NX recovery worker when testing a VM with 16 slots.
However, the effect is negligible with the default halving time of NX
pages, which is 1 hour rather than 10 seconds given by period_ms = 1000,
ratio = 10.
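
The median comparison can be reproduced with a few lines of C. This is
only a sketch of the arithmetic; the med_* helper names are made up for
the example.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort() comparator for doubles. */
static int med_cmp(const void *a, const void *b)
{
	double x = *(const double *)a;
	double y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Median of an odd-length array; sorts the array in place. */
static double med_median(double *v, size_t n)
{
	qsort(v, n, sizeof(*v), med_cmp);
	return v[n / 2];
}

/* Relative change from "before" to "after", in percent. */
static double med_percent_increase(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For the three runs in the table, the medians are 730.084105 (before) and
740.581988 (after), an increase of roughly 1.4%.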

[1] https://lore.kernel.org/kvm/20221019234050.3919566-2-dmatlack@google.com/

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20221103204421.1146958-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86/mmu: simplify kvm_tdp_mmu_map flow when guest has to retry
Paolo Bonzini [Thu, 17 Nov 2022 16:05:51 +0000 (11:05 -0500)]
KVM: x86/mmu: simplify kvm_tdp_mmu_map flow when guest has to retry

A removed SPTE is never present, hence the "if" in kvm_tdp_mmu_map
only fails in the exact same conditions that the earlier loop
tested in order to issue a "break". So, instead of checking the
condition twice (upper level SPTEs could not be created or were frozen),
just exit the loop with a goto, the usual poor man's C replacement for
RAII early returns.

While at it, do not use the "ret" variable for return values of
functions that do not return a RET_PF_* enum.  This is clearer
and also makes it possible to initialize ret to RET_PF_RETRY.
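
The resulting control flow has roughly the following shape. This is a
sketch with hypothetical map_* names, not the actual kvm_tdp_mmu_map():
"ret" starts out as the retry value, helper status codes live in their
own variable, and a single goto replaces the second check after the loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the RET_PF_* values. */
enum map_ret { MAP_RET_RETRY, MAP_RET_FIXED };

/* Returns 0 on success, nonzero on failure: deliberately not a map_ret. */
static int map_install_level(const bool *level_ok, int level)
{
	return level_ok[level] ? 0 : -1;
}

static enum map_ret map_walk(const bool *level_ok, int nr_levels)
{
	enum map_ret ret = MAP_RET_RETRY; /* default: guest has to retry */
	int level;

	for (level = 0; level < nr_levels; level++) {
		/* Helper status is kept out of "ret". */
		int err = map_install_level(level_ok, level);

		if (err)
			goto retry; /* no re-check after the loop */
	}

	ret = MAP_RET_FIXED;
retry:
	return ret;
}
```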

Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoKVM: x86/mmu: Split huge pages mapped by the TDP MMU on fault
David Matlack [Wed, 9 Nov 2022 18:59:05 +0000 (10:59 -0800)]
KVM: x86/mmu: Split huge pages mapped by the TDP MMU on fault

Now that the TDP MMU has a mechanism to split huge pages, use it in the
fault path when a huge page needs to be replaced with a mapping at a
lower level.

This change reduces the negative performance impact of NX HugePages.
Prior to this change if a vCPU executed from a huge page and NX
HugePages was enabled, the vCPU would take a fault, zap the huge page,
and map the faulting address at 4KiB with execute permissions
enabled. The rest of the memory would be left *unmapped* and have to be
faulted back in by the guest upon access (read, write, or execute). If the
guest is backed by 1GiB pages, a single execute instruction can zap an entire
GiB of its physical address space.
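
To put a number on that, the following sketch (with made-up zap_* names)
counts the 4KiB pages that remain unmapped after a huge page is zapped
and only the faulting 4KiB page is mapped back.

```c
#include <assert.h>
#include <stdint.h>

#define ZAP_SZ_4K (UINT64_C(1) << 12)
#define ZAP_SZ_2M (UINT64_C(1) << 21)
#define ZAP_SZ_1G (UINT64_C(1) << 30)

/* 4KiB pages left unmapped after zapping and remapping one 4KiB page. */
static uint64_t zap_pages_left_unmapped(uint64_t huge_size)
{
	return huge_size / ZAP_SZ_4K - 1;
}
```

For a 1GiB huge page that is 262143 pages, each of which costs another
fault when the guest next touches it.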

For example, it can take a VM longer to execute from its memory than to
populate that memory in the first place:

$ ./execute_perf_test -s anonymous_hugetlb_1gb -v96

Populating memory             : 2.748378795s
Executing from memory         : 2.899670885s

With this change, such faults split the huge page instead of zapping it,
which avoids the non-present faults on the rest of the huge page:

$ ./execute_perf_test -s anonymous_hugetlb_1gb -v96

Populating memory             : 2.729544474s
Executing from memory         : 0.111965688s   <---

This change also reduces the performance impact of dirty logging when
eager_page_split=N. eager_page_split=N (abbreviated "eps=N" below) can
be desirable for read-heavy workloads, as it avoids allocating memory to
split huge pages that are never written and avoids increasing the TLB
miss cost on reads of those pages.

             | Config: ept=Y, tdp_mmu=Y, 5% writes           |
             | Iteration 1 dirty memory time                 |
             | --------------------------------------------- |
vCPU Count   | eps=N (Before) | eps=N (After) | eps=Y        |
------------ | -------------- | ------------- | ------------ |
2            | 0.332305091s   | 0.019615027s  | 0.006108211s |
4            | 0.353096020s   | 0.019452131s  | 0.006214670s |
8            | 0.453938562s   | 0.019748246s  | 0.006610997s |
16           | 0.719095024s   | 0.019972171s  | 0.007757889s |
32           | 1.698727124s   | 0.021361615s  | 0.012274432s |
64           | 2.630673582s   | 0.031122014s  | 0.016994683s |
96           | 3.016535213s   | 0.062608739s  | 0.044760838s |

Eager page splitting remains beneficial for write-heavy workloads, but
the gap is now reduced.

             | Config: ept=Y, tdp_mmu=Y, 100% writes         |
             | Iteration 1 dirty memory time                 |
             | --------------------------------------------- |
vCPU Count   | eps=N (Before) | eps=N (After) | eps=Y        |
------------ | -------------- | ------------- | ------------ |
2            | 0.317710329s   | 0.296204596s  | 0.058689782s |
4            | 0.337102375s   | 0.299841017s  | 0.060343076s |
8            | 0.386025681s   | 0.297274460s  | 0.060399702s |
16           | 0.791462524s   | 0.298942578s  | 0.062508699s |
32           | 1.719646014s   | 0.313101996s  | 0.075984855s |
64           | 2.527973150s   | 0.455779206s  | 0.079789363s |
96           | 2.681123208s   | 0.673778787s  | 0.165386739s |

Further study is needed to determine if the remaining gap is acceptable
for customer workloads or if eager_page_split=N still requires a-priori
knowledge of the VM workload, especially when considering these costs
extrapolated out to large VMs with e.g. 416 vCPUs and 12TB RAM.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20221109185905.486172-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
20 months agoMerge tag 'kvm-selftests-6.2-1' of https://github.com/kvm-x86/linux into HEAD
Paolo Bonzini [Thu, 17 Nov 2022 14:03:55 +0000 (09:03 -0500)]
Merge tag 'kvm-selftests-6.2-1' of https://github.com/kvm-x86/linux into HEAD

KVM selftests updates for 6.2

perf_util:
 - Add support for pinning vCPUs in dirty_log_perf_test.
 - Add a lightweight pseudo RNG for guest use, and use it to randomize
   the access pattern and write vs. read percentage in the so called
   "perf util" tests.
 - Rename the so called "perf_util" framework to "memstress".

ucall:
 - Add a common pool-based ucall implementation (code dedup and pre-work
   for running SEV (and beyond) guests in selftests).
 - Fix an issue in ARM's single-step test when using the new pool-based
   implementation; LDREX/STREX don't play nice with single-step exceptions.

init:
 - Provide a common constructor and arch hook, which will eventually be
   used by x86 to automatically select the right hypercall (AMD vs. Intel).

x86:
 - Clean up x86's page table management.
 - Clean up and enhance the "smaller maxphyaddr" test, and add a related
   test to cover generic emulation failure.
 - Clean up the nEPT support checks.
 - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
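
The perf_util item above mentions a lightweight pseudo RNG without
naming the generator. As an illustration of what a guest-friendly
generator can look like, here is the well-known Park-Miller "minimal
standard" LCG (multiplier 48271, modulus 2^31 - 1). Whether the series
uses exactly this generator is an assumption, and the demo_rng names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_rng {
	uint32_t state;
};

/* Seed must be in [1, 2^31 - 2]; a zero state would stay zero forever. */
static struct demo_rng demo_rng_new(uint32_t seed)
{
	struct demo_rng rng = { .state = seed };

	assert(seed != 0 && seed < UINT32_C(0x7fffffff));
	return rng;
}

/* One Park-Miller step; needs no libc, floating point, or division unit. */
static uint32_t demo_rng_next(struct demo_rng *rng)
{
	uint64_t next = (uint64_t)rng->state * 48271u % UINT32_C(0x7fffffff);

	rng->state = (uint32_t)next;
	return rng->state;
}

/* Picks "write" for roughly write_percent out of every 100 accesses. */
static bool demo_rng_is_write(struct demo_rng *rng, unsigned int write_percent)
{
	return demo_rng_next(rng) % 100 < write_percent;
}
```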

20 months agoKVM: selftests: Check for KVM nEPT support using "feature" MSRs
David Matlack [Wed, 16 Nov 2022 20:42:28 +0000 (12:42 -0800)]
KVM: selftests: Check for KVM nEPT support using "feature" MSRs

When checking for nEPT support in KVM, use kvm_get_feature_msr() instead
of vcpu_get_msr() to retrieve KVM's default TRUE_PROCBASED_CTLS and
PROCBASED_CTLS2 MSR values, i.e. don't require a VM+vCPU to query nEPT
support.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20220927165209.930904-1-dmatlack@google.com
[sean: rebase on merged code, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Assert in prepare_eptp() that nEPT is supported
David Matlack [Wed, 16 Nov 2022 20:46:31 +0000 (12:46 -0800)]
KVM: selftests: Assert in prepare_eptp() that nEPT is supported

Now that a VM isn't needed to check for nEPT support, assert that KVM
supports nEPT in prepare_eptp() instead of skipping the test, and push
the TEST_REQUIRE() check out to individual tests.  The require+assert are
somewhat redundant and will incur some amount of ongoing maintenance
burden, but placing the "require" logic in the test makes it easier to
find/understand a test's requirements and in this case, provides a very
strong hint that the test cares about nEPT.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20220927165209.930904-1-dmatlack@google.com
[sean: rebase on merged code, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Drop helpers for getting specific KVM supported CPUID entry
Sean Christopherson [Thu, 6 Oct 2022 00:51:25 +0000 (00:51 +0000)]
KVM: selftests: Drop helpers for getting specific KVM supported CPUID entry

Drop kvm_get_supported_cpuid_entry() and its inner helper now that all
known usage can use X86_FEATURE_*, X86_PROPERTY_*, X86_PMU_FEATURE_*, or
the dedicated Family/Model helpers.  Providing "raw" access to CPUID
leafs is undesirable as it encourages open coding CPUID checks, which is
often error prone and not self-documenting.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-13-seanjc@google.com
20 months agoKVM: selftests: Add and use KVM helpers for x86 Family and Model
Sean Christopherson [Thu, 6 Oct 2022 00:51:24 +0000 (00:51 +0000)]
KVM: selftests: Add and use KVM helpers for x86 Family and Model

Add KVM variants of the x86 Family and Model helpers, and use them in the
PMU event filter test.  Open code the retrieval of KVM's supported CPUID
entry 0x1.0 in anticipation of dropping kvm_get_supported_cpuid_entry().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-12-seanjc@google.com
20 months agoKVM: selftests: Add dedicated helpers for getting x86 Family and Model
Sean Christopherson [Thu, 6 Oct 2022 00:51:23 +0000 (00:51 +0000)]
KVM: selftests: Add dedicated helpers for getting x86 Family and Model

Add dedicated helpers for getting x86's Family and Model, which are the
last holdouts that "need" raw access to CPUID information.  FMS info is
a mess and requires not only splicing together multiple values, but
also doing so conditionally in the Family case.

Provide wrappers to reduce the odds of copy+paste errors, but mostly to
allow for the eventual removal of kvm_get_supported_cpuid_entry().
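
For reference, the splicing looks roughly like the sketch below, which
decodes the EAX output of CPUID leaf 0x1. The fms_* names are
hypothetical, and splicing in the extended model unconditionally is a
simplification; the architecture manuals qualify when those bits are
meaningful.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Family: base family in EAX[11:8]; the extended family in EAX[27:20]
 * is added only when the base family is 0xF.
 */
static unsigned int fms_family(uint32_t eax)
{
	unsigned int family = (eax >> 8) & 0xf;

	if (family == 0xf)
		family += (eax >> 20) & 0xff;

	return family;
}

/* Model: extended model EAX[19:16] placed above base model EAX[7:4]. */
static unsigned int fms_model(uint32_t eax)
{
	return ((eax >> 12) & 0xf0) | ((eax >> 4) & 0x0f);
}
```

For example, EAX = 0x000906EA decodes to family 0x6, model 0x9E, and
EAX = 0x00870F10 decodes to family 0x17, model 0x71.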

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-11-seanjc@google.com
20 months agoKVM: selftests: Add PMU feature framework, use in PMU event filter test
Sean Christopherson [Thu, 6 Oct 2022 00:51:22 +0000 (00:51 +0000)]
KVM: selftests: Add PMU feature framework, use in PMU event filter test

Add an X86_PMU_FEATURE_* framework to simplify probing architectural
events on Intel PMUs, which require checking the length of a bit vector
and the _absence_ of a "feature" bit.  Add helpers for both KVM and
"this CPU", and use the newfangled magic (along with X86_PROPERTY_*)
to clean up pmu_event_filter_test.

No functional change intended.
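
The check being wrapped is roughly the following, sketched against raw
CPUID.0xA register values. The layout used (EAX[31:24] holds the length
of the EBX bit vector, and a set EBX bit means the event is NOT
available) follows Intel's architectural PMU enumeration; the function
name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * An architectural event is usable only if its bit lies within the
 * enumerated vector length AND the "unavailable" bit in EBX is clear.
 */
static bool pmu_event_available(uint32_t eax, uint32_t ebx,
				unsigned int event_bit)
{
	unsigned int vector_len = (eax >> 24) & 0xff;

	if (event_bit >= 32)
		return false;

	return event_bit < vector_len &&
	       !(ebx & (UINT32_C(1) << event_bit));
}
```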

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-10-seanjc@google.com
20 months agoKVM: selftests: Convert vmx_pmu_caps_test to use X86_PROPERTY_*
Sean Christopherson [Thu, 6 Oct 2022 00:51:21 +0000 (00:51 +0000)]
KVM: selftests: Convert vmx_pmu_caps_test to use X86_PROPERTY_*

Add X86_PROPERTY_PMU_VERSION and use it in vmx_pmu_caps_test to replace
open coded versions of the same functionality.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-9-seanjc@google.com
20 months agoKVM: selftests: Convert AMX test to use X86_PROPRETY_XXX
Sean Christopherson [Thu, 6 Oct 2022 00:51:20 +0000 (00:51 +0000)]
KVM: selftests: Convert AMX test to use X86_PROPRETY_XXX

Add and use x86 "properties" for the myriad AMX CPUID values that are
validated by the AMX test.  Drop most of the test's single-usage
helpers so that the asserts more precisely capture what check failed.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-8-seanjc@google.com
20 months agoKVM: selftests: Add kvm_cpu_*() support for X86_PROPERTY_*
Sean Christopherson [Thu, 6 Oct 2022 00:51:19 +0000 (00:51 +0000)]
KVM: selftests: Add kvm_cpu_*() support for X86_PROPERTY_*

Extend X86_PROPERTY_* support to KVM, i.e. add kvm_cpu_property() and
kvm_cpu_has_p(), and use the new helpers in kvm_get_cpu_address_width().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-7-seanjc@google.com
20 months agoKVM: selftests: Refactor kvm_cpuid_has() to prep for X86_PROPERTY_* support
Sean Christopherson [Thu, 6 Oct 2022 00:51:18 +0000 (00:51 +0000)]
KVM: selftests: Refactor kvm_cpuid_has() to prep for X86_PROPERTY_* support

Refactor kvm_cpuid_has() to prepare for extending X86_PROPERTY_* support
to KVM as well as "this CPU".

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-6-seanjc@google.com
20 months agoKVM: selftests: Use X86_PROPERTY_MAX_KVM_LEAF in CPUID test
Sean Christopherson [Thu, 6 Oct 2022 00:51:17 +0000 (00:51 +0000)]
KVM: selftests: Use X86_PROPERTY_MAX_KVM_LEAF in CPUID test

Use X86_PROPERTY_MAX_KVM_LEAF to replace the equivalent open coded check
on KVM's maximum paravirt CPUID leaf.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-5-seanjc@google.com
20 months agoKVM: selftests: Add X86_PROPERTY_* framework to retrieve CPUID values
Sean Christopherson [Thu, 6 Oct 2022 00:51:16 +0000 (00:51 +0000)]
KVM: selftests: Add X86_PROPERTY_* framework to retrieve CPUID values

Introduce X86_PROPERTY_* to allow retrieving values/properties from CPUID
leafs, e.g. MAXPHYADDR from CPUID.0x80000008.  Use the same core code as
X86_FEATURE_*, the primary difference is that properties are multi-bit
values, whereas features enumerate a single bit.

Add this_cpu_has_p() to allow querying whether or not a property exists
based on the maximum leaf associated with the property, e.g. MAXPHYADDR
doesn't exist if the max leaf for 0x8000_xxxx is less than 0x8000_0008.

Use the new property infrastructure in vm_compute_max_gfn() to prove
that the code works as intended.  Future patches will convert additional
selftests code.
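
A minimal sketch of the concept, not the selftests implementation: a
property is described by its leaf, register, and bit range, and it
"exists" only if the maximum leaf of its range reaches that leaf. All
prop_* names are hypothetical, and CPUID output is supplied as a table
so the example runs on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum prop_reg { PROP_EAX, PROP_EBX, PROP_ECX, PROP_EDX };

/* Where a multi-bit value lives: leaf, register, inclusive bit range. */
struct prop_desc {
	uint32_t function;
	enum prop_reg reg;
	uint8_t lo_bit;
	uint8_t hi_bit;
};

/* Snapshot of one CPUID leaf (EAX, EBX, ECX, EDX). */
struct prop_leaf {
	uint32_t function;
	uint32_t regs[4];
};

static const struct prop_desc PROP_MAX_PHY_ADDR = {
	.function = 0x80000008u, .reg = PROP_EAX, .lo_bit = 0, .hi_bit = 7,
};

static const struct prop_leaf *prop_find(const struct prop_leaf *leaves,
					 size_t n, uint32_t function)
{
	for (size_t i = 0; i < n; i++) {
		if (leaves[i].function == function)
			return &leaves[i];
	}
	return NULL;
}

/* The max leaf of a range is reported in EAX of the range's base leaf. */
static bool prop_exists(const struct prop_leaf *leaves, size_t n,
			const struct prop_desc *p)
{
	const struct prop_leaf *base =
		prop_find(leaves, n, p->function & 0xc0000000u);

	return base && base->regs[PROP_EAX] >= p->function;
}

/* Extracts bits hi_bit:lo_bit; returns 0 if the leaf is missing. */
static uint32_t prop_get(const struct prop_leaf *leaves, size_t n,
			 const struct prop_desc *p)
{
	const struct prop_leaf *leaf = prop_find(leaves, n, p->function);
	uint32_t width = (uint32_t)p->hi_bit - p->lo_bit + 1;
	uint32_t mask = width == 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1;

	return leaf ? (leaf->regs[p->reg] >> p->lo_bit) & mask : 0;
}
```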

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-4-seanjc@google.com
20 months agoKVM: selftests: Refactor X86_FEATURE_* framework to prep for X86_PROPERTY_*
Sean Christopherson [Thu, 6 Oct 2022 00:51:15 +0000 (00:51 +0000)]
KVM: selftests: Refactor X86_FEATURE_* framework to prep for X86_PROPERTY_*

Refactor the X86_FEATURE_* framework to prepare for extending the core
logic to support "properties".  The "feature" framework allows querying a
single CPUID bit to detect the presence of a feature; the "property"
framework will extend the idea to allow querying a value, i.e. to get a
value that is a set of contiguous bits in a CPUID leaf.

Opportunistically add static asserts to ensure features are fully defined
at compile time, and to try and catch mistakes in the definition of
features.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-3-seanjc@google.com
20 months agoKVM: selftests: Add X86_FEATURE_PAE and use it calc "fallback" MAXPHYADDR
Sean Christopherson [Thu, 6 Oct 2022 00:51:14 +0000 (00:51 +0000)]
KVM: selftests: Add X86_FEATURE_PAE and use it calc "fallback" MAXPHYADDR

Add X86_FEATURE_PAE and use it to guesstimate the MAXPHYADDR when the
MAXPHYADDR CPUID entry isn't supported.

No functional change intended.
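
A sketch of such a fallback, assuming the conventional guesses of 36
physical address bits with PAE and 32 without; those values and the
function name are assumptions for the example, not taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * has_maxphyaddr_leaf: whether CPUID.0x80000008 is enumerated.
 * eax_80000008: EAX of that leaf (ignored when the leaf is absent).
 */
static unsigned int pae_guess_maxphyaddr(bool has_maxphyaddr_leaf,
					 uint32_t eax_80000008, bool has_pae)
{
	if (has_maxphyaddr_leaf)
		return eax_80000008 & 0xff; /* MAXPHYADDR is EAX[7:0] */

	return has_pae ? 36 : 32;
}
```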

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-2-seanjc@google.com
20 months agoKVM: selftests: Add a test for KVM_CAP_EXIT_ON_EMULATION_FAILURE
David Matlack [Wed, 2 Nov 2022 18:46:54 +0000 (11:46 -0700)]
KVM: selftests: Add a test for KVM_CAP_EXIT_ON_EMULATION_FAILURE

Add a selftest to exercise the KVM_CAP_EXIT_ON_EMULATION_FAILURE
capability.

This capability is also exercised through
smaller_maxphyaddr_emulation_test, but that test requires
allow_smaller_maxphyaddr=Y, which is off by default on Intel when ept=Y
and unconditionally disabled on AMD when npt=Y. This new test ensures
that KVM_CAP_EXIT_ON_EMULATION_FAILURE is exercised independent of
allow_smaller_maxphyaddr.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-11-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Expect #PF(RSVD) when TDP is disabled
David Matlack [Wed, 2 Nov 2022 18:46:53 +0000 (11:46 -0700)]
KVM: selftests: Expect #PF(RSVD) when TDP is disabled

Change smaller_maxphyaddr_emulation_test to expect a #PF(RSVD), rather
than an emulation failure, when TDP is disabled. KVM only needs to
emulate instructions to emulate a smaller guest.MAXPHYADDR when TDP is
enabled.

Fixes: 39bbcc3a4e39 ("selftests: kvm: Allows userspace to handle emulation errors.")
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-10-dmatlack@google.com
[sean: massage comment to talk about having to emulate due to MAXPHYADDR]
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Provide error code as a KVM_ASM_SAFE() output
Sean Christopherson [Wed, 2 Nov 2022 18:46:52 +0000 (11:46 -0700)]
KVM: selftests: Provide error code as a KVM_ASM_SAFE() output

Provide the error code on a fault in KVM_ASM_SAFE(), e.g. to allow tests
to assert that #PF generates the correct error code without needing to
manually install a #PF handler.  Use r10 as the scratch register for the
error code, as it's already clobbered by the asm blob (loaded with the
RIP of the to-be-executed instruction).  Deliberately load the output
"error_code" even in the non-faulting path so that error_code is always
initialized with deterministic data (the aforementioned RIP), i.e. to
ensure a selftest won't end up with uninitialized consumption regardless
of how KVM_ASM_SAFE() is used.

Don't clear r10 in the non-faulting case and instead load error code with
the RIP (see above).  The error code is valid if and only if an exception
occurs, and '0' isn't necessarily a better "invalid" value, e.g. '0'
could result in false passes for a buggy test.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-9-dmatlack@google.com
20 months agoKVM: selftests: Avoid JMP in non-faulting path of KVM_ASM_SAFE()
Sean Christopherson [Wed, 2 Nov 2022 18:46:51 +0000 (11:46 -0700)]
KVM: selftests: Avoid JMP in non-faulting path of KVM_ASM_SAFE()

Clear R9 in the non-faulting path of KVM_ASM_SAFE() and fall through to a
common load of "vector" to effectively load "vector" with '0'.  Doing so
reduces the code footprint of the asm blob, reduces the runtime overhead
of the non-faulting path (when "vector" is stored in a register), and
ensures that additional output constraints that are valid if and only if
a fault occurs are loaded even in the non-faulting case.

A future patch will add a 64-bit output for the error code, and if its
output is not explicitly loaded with _something_, the user of the asm
blob can end up technically consuming uninitialized data.  Using a
common path to load the output constraints will allow using an existing
scratch register, e.g. r10, to hold the error code in the faulting path,
while also guaranteeing the error code is initialized with deterministic
data in the non-faulting path (r10 is loaded with the RIP of the
to-be-executed instruction).

Consuming the error code when a fault doesn't occur would obviously be a
test bug, but there's no guarantee the compiler will detect uninitialized
consumption.  And conversely, it's theoretically possible that the
compiler might throw a false positive on uninitialized data, e.g. if the
compiler can't determine that the non-faulting path won't touch the error
code.

Alternatively, the error code could be explicitly loaded in the
non-faulting path, but loading a 64-bit memory|register output operand
with an explicit value requires a sign-extended "MOV imm32, r/m64",
which isn't exactly straightforward and has a largish code footprint.
And loading the error code with what is effectively garbage (from a
scratch register) avoids having to choose an arbitrary value for the
non-faulting case.

Opportunistically remove a rogue asterisk in the block comment.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-8-dmatlack@google.com
20 months agoKVM: selftests: Copy KVM PFERR masks into selftests
David Matlack [Wed, 2 Nov 2022 18:46:50 +0000 (11:46 -0700)]
KVM: selftests: Copy KVM PFERR masks into selftests

Copy KVM's macros for page fault error masks into processor.h so they
can be used in selftests.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-7-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: x86/mmu: Use BIT{,_ULL}() for PFERR masks
David Matlack [Wed, 2 Nov 2022 18:46:49 +0000 (11:46 -0700)]
KVM: x86/mmu: Use BIT{,_ULL}() for PFERR masks

Use the preferred BIT() and BIT_ULL() to construct the PFERR masks
rather than open-coding the bit shifting.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-6-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
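
As an illustration of the BIT()/BIT_ULL() conversion described in the commit above, here is a minimal sketch. The BIT() and BIT_ULL() definitions are local stand-ins for the kernel helpers, only a subset of the masks is shown, and assert_pf_rsvd() is a hypothetical helper that is not part of KVM or the selftests. The bit positions follow the architectural x86 #PF error code layout.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BIT() and BIT_ULL() helpers. */
#define BIT(nr)		(1UL << (nr))
#define BIT_ULL(nr)	(1ULL << (nr))

/* Low bits of the x86 page fault error code. */
#define PFERR_PRESENT_BIT	0
#define PFERR_WRITE_BIT		1
#define PFERR_USER_BIT		2
#define PFERR_RSVD_BIT		3
#define PFERR_FETCH_BIT		4

#define PFERR_PRESENT_MASK	BIT(PFERR_PRESENT_BIT)
#define PFERR_WRITE_MASK	BIT(PFERR_WRITE_BIT)
#define PFERR_USER_MASK		BIT(PFERR_USER_BIT)
#define PFERR_RSVD_MASK		BIT(PFERR_RSVD_BIT)
#define PFERR_FETCH_MASK	BIT(PFERR_FETCH_BIT)

/* Bits at or above 32 need BIT_ULL() so the constant is always 64 bits wide. */
#define PFERR_GUEST_FINAL_BIT	32
#define PFERR_GUEST_FINAL_MASK	BIT_ULL(PFERR_GUEST_FINAL_BIT)

/* Hypothetical helper: check that a fault was a reserved-bit #PF. */
static void assert_pf_rsvd(uint64_t error_code)
{
	assert(error_code & PFERR_RSVD_MASK);
}
```

A test that expects #PF(RSVD) could then compare the reported error code against PFERR_RSVD_MASK instead of a magic number.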
20 months agoKVM: selftests: Move flds instruction emulation failure handling to header
David Matlack [Wed, 2 Nov 2022 18:46:48 +0000 (11:46 -0700)]
KVM: selftests: Move flds instruction emulation failure handling to header

Move the flds instruction emulation failure handling code to a header
so it can be re-used in an upcoming test.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-5-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Delete dead ucall code
David Matlack [Wed, 2 Nov 2022 18:46:47 +0000 (11:46 -0700)]
KVM: selftests: Delete dead ucall code

Delete a bunch of code related to ucall handling from
smaller_maxphyaddr_emulation_test. The only thing
smaller_maxphyaddr_emulation_test needs to check is that the vCPU exits
with UCALL_DONE after the second vcpu_run().

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-4-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Explicitly require instructions bytes
David Matlack [Wed, 2 Nov 2022 18:46:46 +0000 (11:46 -0700)]
KVM: selftests: Explicitly require instructions bytes

Hard-code the flds instruction and assert the exact instruction bytes
are present in run->emulation_failure. The test already requires the
instruction bytes to be present because that's the only way the test
will advance the RIP past the flds and get to GUEST_DONE().

Note that KVM does not necessarily return exactly 2 bytes in
run->emulation_failure since it may not know the exact instruction
length in all cases. So just assert that
run->emulation_failure.insn_size is at least 2.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-3-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
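
The check described in the commit above can be sketched roughly as follows. This is not the selftest code: struct emulation_failure_info is a hypothetical stand-in for the relevant fields of run->emulation_failure, skip_flds() is an invented name, and the byte values assume the operand is (%rax), for which "flds" encodes as d9 00.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of run->emulation_failure used here. */
struct emulation_failure_info {
	uint8_t insn_size;
	uint8_t insn_bytes[15];
};

/* Assumption: "flds (%rax)", i.e. opcode 0xd9 with ModRM byte 0x00. */
#define FLDS_INSN_SIZE	2

/*
 * Verify that the reported bytes start with the flds instruction and return
 * the RIP of the next instruction.  KVM may report more than two bytes, so
 * only a lower bound on insn_size is asserted.
 */
static uint64_t skip_flds(const struct emulation_failure_info *ef, uint64_t rip)
{
	assert(ef->insn_size >= FLDS_INSN_SIZE);
	assert(ef->insn_bytes[0] == 0xd9);
	assert(ef->insn_bytes[1] == 0x00);

	return rip + FLDS_INSN_SIZE;
}
```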
20 months agoKVM: selftests: Rename emulator_error_test to smaller_maxphyaddr_emulation_test
David Matlack [Wed, 2 Nov 2022 18:46:45 +0000 (11:46 -0700)]
KVM: selftests: Rename emulator_error_test to smaller_maxphyaddr_emulation_test

Rename emulator_error_test to smaller_maxphyaddr_emulation_test and
update the comment at the top of the file to document that this is
explicitly a test to validate that KVM emulates instructions in response
to an EPT violation when emulating a smaller MAXPHYADDR.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-2-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Don't assume vcpu->id is '0' in xAPIC state test
Gautam Menghani [Mon, 17 Oct 2022 17:58:19 +0000 (23:28 +0530)]
KVM: selftests: Don't assume vcpu->id is '0' in xAPIC state test

In xapic_state_test's test_icr(), explicitly skip iterations that would
match vcpu->id instead of assuming vcpu->id is '0', so that IPIs are
correctly sent to non-existent vCPUs.

Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/kvm/YyoZr9rXSSMEtdh5@google.com
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Link: https://lore.kernel.org/r/20221017175819.12672-1-gautammenghani201@gmail.com
[sean: massage shortlog and changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Add arch specific post vm creation hook
Vishal Annapurve [Tue, 15 Nov 2022 21:38:45 +0000 (21:38 +0000)]
KVM: selftests: Add arch specific post vm creation hook

Add arch specific API kvm_arch_vm_post_create to perform any required setup
after VM creation.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20221115213845.3348210-4-vannapurve@google.com
[sean: place x86's implementation by vm_arch_vcpu_add()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Add arch specific initialization
Vishal Annapurve [Tue, 15 Nov 2022 21:38:44 +0000 (21:38 +0000)]
KVM: selftests: Add arch specific initialization

Introduce arch specific API: kvm_selftest_arch_init to allow each arch to
handle initialization before running any selftest logic.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20221115213845.3348210-3-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: move common startup logic to kvm_util.c
Vishal Annapurve [Tue, 15 Nov 2022 21:38:43 +0000 (21:38 +0000)]
KVM: selftests: move common startup logic to kvm_util.c

Consolidate common startup logic in one place by implementing a single
setup function with __attribute((constructor)) for all selftests within
kvm_util.c.

This allows moving logic like:
        /* Tell stdout not to buffer its content */
        setbuf(stdout, NULL);
to a single file for all selftests.

This will also allow any setup required at entry in the future to be done
in a common main function.

Link: https://lore.kernel.org/lkml/Ywa9T+jKUpaHLu%2Fl@google.com
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20221115213845.3348210-2-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
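
The constructor approach from the commit above can be sketched like this. The names common_selftest_init(), common_init_done and require_common_init() are hypothetical and are not the names used in kvm_util.c. The sketch only shows how a GCC/Clang constructor runs shared setup before main().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag so callers can confirm the common setup ran. */
static int common_init_done;

/* Runs before main() in every binary that links this file. */
static void __attribute__((constructor)) common_selftest_init(void)
{
	/* Tell stdout not to buffer its content */
	setbuf(stdout, NULL);

	common_init_done = 1;
}

/* Hypothetical helper: a test can call this to check the setup happened. */
static void require_common_init(void)
{
	assert(common_init_done);
}
```

Because the constructor lives in a file that every test links against, individual tests no longer need to repeat the setbuf() call in their own main().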
20 months agoKVM: selftests: Play nice with huge pages when getting PTEs/GPAs
Sean Christopherson [Thu, 6 Oct 2022 00:45:12 +0000 (00:45 +0000)]
KVM: selftests: Play nice with huge pages when getting PTEs/GPAs

Play nice with huge pages when getting PTEs and translating GVAs to GPAs;
there's no reason to disallow using huge pages in selftests.  Use
PG_LEVEL_NONE to indicate that the caller doesn't care about the mapping
level and just wants to get the pte+level.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-8-seanjc@google.com
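
To illustrate why the mapping level matters when translating a GVA to a GPA, here is a minimal sketch under stated assumptions. The enum, the macros and gva_to_gpa() are hypothetical and do not reproduce the selftest helpers. The sketch assumes standard x86-64 paging, where a leaf maps 4K, 2M or 1G and each level adds 9 bits of page offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical level encoding; only leaf-capable levels are modeled. */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M, PG_LEVEL_1G };

#define PAGE_SHIFT		12
/* Each level adds 9 bits of offset: 4K => 12, 2M => 21, 1G => 30. */
#define PG_LEVEL_SHIFT(level)	((((level) - 1) * 9) + PAGE_SHIFT)
#define PG_LEVEL_SIZE(level)	(1ULL << PG_LEVEL_SHIFT(level))
/* Physical address bits 51:12 of a page table entry. */
#define PTE_ADDR_MASK		0x000ffffffffff000ULL

/*
 * Combine the frame address from the leaf PTE with the offset bits of the
 * GVA.  The number of offset bits depends on the level of the leaf.
 */
static uint64_t gva_to_gpa(uint64_t leaf_pte, enum pg_level level, uint64_t gva)
{
	uint64_t offset_mask = PG_LEVEL_SIZE(level) - 1;

	assert(level >= PG_LEVEL_4K && level <= PG_LEVEL_1G);

	return (leaf_pte & PTE_ADDR_MASK & ~offset_mask) | (gva & offset_mask);
}
```

For a 2M mapping the low 21 bits of the GVA carry over unchanged, whereas a walker that assumes 4K pages would keep only the low 12 bits and compute the wrong GPA.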
20 months agoKVM: selftests: Use vm_get_page_table_entry() in addr_arch_gva2gpa()
Sean Christopherson [Thu, 6 Oct 2022 00:45:11 +0000 (00:45 +0000)]
KVM: selftests: Use vm_get_page_table_entry() in addr_arch_gva2gpa()

Use vm_get_page_table_entry() in addr_arch_gva2gpa() to get the leaf PTE
instead of manually walking page tables.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-7-seanjc@google.com
20 months agoKVM: selftests: Use virt_get_pte() when getting PTE pointer
Sean Christopherson [Thu, 6 Oct 2022 00:45:10 +0000 (00:45 +0000)]
KVM: selftests: Use virt_get_pte() when getting PTE pointer

Use virt_get_pte() in vm_get_page_table_entry() instead of open coding
equivalent code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-6-seanjc@google.com
20 months agoKVM: selftests: Verify parent PTE is PRESENT when getting child PTE
Sean Christopherson [Thu, 6 Oct 2022 00:45:09 +0000 (00:45 +0000)]
KVM: selftests: Verify parent PTE is PRESENT when getting child PTE

Verify the parent PTE is PRESENT when getting a child via virt_get_pte()
so that the helper can be used for getting PTEs/GPAs without losing
sanity checks that the walker isn't wandering into the weeds.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-5-seanjc@google.com
20 months agoKVM: selftests: Remove useless shifts when creating guest page tables
Sean Christopherson [Thu, 6 Oct 2022 00:45:08 +0000 (00:45 +0000)]
KVM: selftests: Remove useless shifts when creating guest page tables

Remove the pointless shift from GPA=>GFN and immediately back to
GFN=>GPA when creating guest page tables.  Ignore the other walkers
that have a similar pattern for the moment, they will be converted
to use virt_get_pte() in the near future.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-4-seanjc@google.com
20 months agoKVM: selftests: Drop reserved bit checks from PTE accessor
Sean Christopherson [Thu, 6 Oct 2022 00:45:07 +0000 (00:45 +0000)]
KVM: selftests: Drop reserved bit checks from PTE accessor

Drop the reserved bit checks from the helper to retrieve a PTE.  There's
very little value in sanity checking the constructed page tables as any
bugs will quickly be noticed in the form of an unexpected #PF.  The checks
also place unnecessary restrictions on the usage of the helpers, e.g. if
a test _wanted_ to set reserved bits for whatever reason.

Removing the NX check in particular allows for the removal of the @vcpu
param, which will in turn allow the helper to be reused nearly verbatim
for addr_gva2gpa().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-3-seanjc@google.com
20 months agoKVM: selftests: Drop helpers to read/write page table entries
Sean Christopherson [Thu, 6 Oct 2022 00:45:06 +0000 (00:45 +0000)]
KVM: selftests: Drop helpers to read/write page table entries

Drop vm_{g,s}et_page_table_entry() and instead expose the "inner"
helper (was _vm_get_page_table_entry()) that returns a _pointer_ to the
PTE, i.e. let tests directly modify PTEs instead of bouncing through
helpers that just make life difficult.

Opportunistically use BIT_ULL() in emulator_error_test, and use the
MAXPHYADDR define to set the "rogue" GPA bit instead of open coding the
same value.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-2-seanjc@google.com
20 months agoKVM: selftests: Fix spelling mistake "begining" -> "beginning"
Colin Ian King [Wed, 28 Sep 2022 21:34:58 +0000 (22:34 +0100)]
KVM: selftests: Fix spelling mistake "begining" -> "beginning"

There is a spelling mistake in an assert message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220928213458.64089-1-colin.i.king@gmail.com
[sean: fix an ironic typo in the changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
20 months agoKVM: selftests: Add ucall pool based implementation
Peter Gonda [Thu, 6 Oct 2022 00:34:09 +0000 (00:34 +0000)]
KVM: selftests: Add ucall pool based implementation

To play nice with guests whose stack memory is encrypted, e.g. AMD SEV,
introduce a new "ucall pool" implementation that passes the ucall struct
via dedicated memory (which can be mapped shared, a.k.a. plain text).

Because not all architectures have access to the vCPU index in the guest,
use a bitmap with atomic accesses to track which entries in the pool are
free/used.  A list+lock could also work in theory, but synchronizing the
individual pointers to the guest would be a mess.

Note, there's no need to rewalk the bitmap to ensure success.  If all
vCPUs are simply allocating, success is guaranteed because there are
enough entries for all vCPUs.  If one or more vCPUs are freeing and then
reallocating, success is guaranteed because vCPUs _always_ walk the
bitmap from 0=>N; if a vCPU frees an entry and then wins a race to
re-allocate, then either it will consume the entry it just freed (bit is
the first free bit), or the losing vCPU is guaranteed to see the freed
bit (winner consumes an earlier bit, which the loser hasn't yet visited).

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Peter Gonda <pgonda@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-8-seanjc@google.com
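
The bitmap-based allocation described in the commit above can be sketched with C11 atomics. This is not the selftest implementation: struct pool, pool_alloc(), pool_free() and POOL_SIZE are hypothetical names, and the pool is limited to a single unsigned long worth of bits for brevity. The sketch shows the 0=>N walk with an atomic test-and-set per bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SIZE	8	/* hypothetical: one entry per possible vCPU */

struct pool_entry {
	int cmd;
};

struct pool {
	atomic_ulong in_use;	/* bit i set => entries[i] is allocated */
	struct pool_entry entries[POOL_SIZE];
};

/* Walk the bitmap from 0 to N and claim the first free bit atomically. */
static struct pool_entry *pool_alloc(struct pool *p)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		unsigned long bit = 1UL << i;

		/* atomic_fetch_or() returns the old value: test-and-set. */
		if (!(atomic_fetch_or(&p->in_use, bit) & bit))
			return &p->entries[i];
	}

	return NULL;
}

/* Release an entry by atomically clearing its bit. */
static void pool_free(struct pool *p, struct pool_entry *entry)
{
	size_t i = (size_t)(entry - p->entries);

	assert(i < POOL_SIZE);
	atomic_fetch_and(&p->in_use, ~(1UL << i));
}
```

Because every allocator scans from bit 0, a single pass is enough as long as the pool has at least as many entries as there are concurrent users, which matches the reasoning in the commit message.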
20 months agoKVM: selftests: Drop now-unnecessary ucall_uninit()
Sean Christopherson [Thu, 6 Oct 2022 00:34:08 +0000 (00:34 +0000)]
KVM: selftests: Drop now-unnecessary ucall_uninit()

Drop ucall_uninit() and ucall_arch_uninit() now that ARM doesn't modify
the host's copy of ucall_exit_mmio_addr, i.e. now that there's no need to
reset the pointer before potentially creating a new VM.  The few calls to
ucall_uninit() are all immediately followed by kvm_vm_free(), and that is
likely always going to hold true, i.e. it's extremely unlikely a test
will want to effectively disable ucall in the middle of a test.

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-7-seanjc@google.com
20 months agoKVM: selftests: Make arm64's MMIO ucall multi-VM friendly
Sean Christopherson [Thu, 6 Oct 2022 00:34:07 +0000 (00:34 +0000)]
KVM: selftests: Make arm64's MMIO ucall multi-VM friendly

Fix a mostly-theoretical bug where ARM's ucall MMIO setup could result in
different VMs stomping on each other by clobbering the global pointer.

Fix the most obvious issue by saving the MMIO gpa into the VM.

A more subtle bug is that creating VMs in parallel (on multiple tasks)
could result in a VM using the wrong address.  Synchronizing a global to
a guest effectively snapshots the value on a per-VM basis, i.e. the
"global" is already prepped to work with multiple VMs, but setting the
global in the host is not thread-safe.  To fix that bug, add
write_guest_global() to allow stuffing a VM's copy of a "global" without
modifying the host value.

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-6-seanjc@google.com