review.tizen.org Git - platform/kernel/linux-starfive.git/log

KVM: x86: hyper-v: Fix 'using uninitialized value' Coverity warning

In kvm_hv_flush_tlb(), 'data_offset' and 'consumed_xmm_halves' variables
are used in a mutually exclusive way: in 'hc->fast' we count in 'XMM
halves' and increase 'data_offset' otherwise. Coverity discovered, that in
one case both variables are incremented unconditionally. This doesn't seem
to cause any issues as the only user of 'data_offset'/'consumed_xmm_halves'
data is kvm_hv_get_tlb_flush_entries() -> kvm_hv_get_hc_data() which also
takes into account 'hc->fast' but is still worth fixing.

To make things explicit, put 'data_offset' and 'consumed_xmm_halves' to
'struct kvm_hv_hcall' as a union and use at call sites. This allows to
remove explicit 'data_offset'/'consumed_xmm_halves' parameters from
kvm_hv_get_hc_data()/kvm_get_sparse_vp_set()/kvm_hv_get_tlb_flush_entries()
helpers.

Note: 'struct kvm_hv_hcall' is allocated on stack in kvm_hv_hypercall() and
is not zeroed, consumers are supposed to initialize the appropriate field
if needed.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1527764 ("Uninitialized variables")
Fixes: 260970862c88 ("KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221208102700.959630-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: ioapic: Fix level-triggered EOI and userspace I/OAPIC reconfigure race

When scanning userspace I/OAPIC entries, intercept EOI for level-triggered
IRQs if the current vCPU has a pending and/or in-service IRQ for the
vector in its local API, even if the vCPU doesn't match the new entry's
destination. This fixes a race between userspace I/OAPIC reconfiguration
and IRQ delivery that results in the vector's bit being left set in the
remote IRR due to the eventual EOI not being forwarded to the userspace
I/OAPIC.

Commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC
reconfigure race") fixed the in-kernel IOAPIC, but not the userspace
IOAPIC configuration, which has a similar race.

Fixes: 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race")

Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221208094415.12723-1-attofari@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/pmu: Prevent zero period event from being repeatedly released

The current vPMU can reuse the same pmc->perf_event for the same
hardware event via pmc_pause/resume_counter(), but this optimization
does not apply to a portion of the TSX events (e.g., "event=0x3c,in_tx=1,
in_tx_cp=1"), where event->attr.sample_period is legally zero at creation,
thus making the perf call to perf_event_period() meaningless (no need to
adjust sample period in this case), and instead causing such reusable
perf_events to be repeatedly released and created.

Avoid releasing zero sample_period events by checking is_sampling_event()
to follow the previously enable/disable optimization.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20221207071506.15733-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Add proper ReST tables for userspace MSR exits/flags

Add ReST formatting to the set of userspace MSR exits/flags so that the
resulting HTML docs generate a table instead of malformed gunk. This
also fixes a warning that was introduced by a recent cleanup of the
relevant documentation (yay copy+paste).

>> Documentation/virt/kvm/api.rst:7287: WARNING: Block quote ends
without a blank line; unexpected unindent.

Fixes: 1ae099540e8c ("KVM: x86: Allow deflecting unknown MSR accesses to user space")
Fixes: 1f158147181b ("KVM: x86: Clean up KVM_CAP_X86_USER_SPACE_MSR documentation")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221207000959.2035098-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge remote-tracking branch 'kvm/queue' into HEAD

x86 Xen-for-KVM:

* Allow the Xen runstate information to cross a page boundary

* Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured

* add support for 32-bit guests in SCHEDOP_poll

x86 fixes:

* One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

* Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
   years back when eliminating unnecessary barriers when switching between
   vmcs01 and vmcs02.

* Clean up the MSR filter docs.

* Clean up vmread_error_trampoline() to make it more obvious that params
  must be passed on the stack, even for x86-64.

* Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
  of the current guest CPUID.

* Fudge around a race with TSC refinement that results in KVM incorrectly
  thinking a guest needs TSC scaling when running on a CPU with a
  constant TSC, but no hardware-enumerated TSC frequency.

* Advertise (on AMD) that the SMM_CTL MSR is not supported

* Remove unnecessary exports

Selftests:

* Fix an inverted check in the access tracking perf test, and restore
  support for asserting that there aren't too many idle pages when
  running on bare metal.

* Fix an ordering issue in the AMX test introduced by recent conversions
  to use kvm_cpu_has(), and harden the code to guard against similar bugs
  in the future.  Anything that tiggers caching of KVM's supported CPUID,
  kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
  the caching occurs before the test opts in via prctl().

* Fix build errors that occur in certain setups (unsure exactly what is
  unique about the problematic setup) due to glibc overriding
  static_assert() to a variant that requires a custom message.

* Introduce actual atomics for clear/set_bit() in selftests

Documentation:

* Remove deleted ioctls from documentation

* Various fixes

KVM: selftests: Allocate ucall pool from MEM_REGION_DATA

MEM_REGION_TEST_DATA is meant to hold data explicitly used by a
selftest, not implicit allocations due to the selftests infrastructure.
Allocate the ucall pool from MEM_REGION_DATA much like the rest of the
selftests library allocations.

Fixes: 426729b2cf2e ("KVM: selftests: Add ucall pool based implementation")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221207214809.489070-5-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm64: selftests: Align VA space allocator with TTBR0

An interesting feature of the Arm architecture is that the stage-1 MMU
supports two distinct VA regions, controlled by TTBR{0,1}_EL1. As KVM
selftests on arm64 only uses TTBR0_EL1, the VA space is constrained to
[0, 2^(va_bits-1)). This is different from other architectures that
allow for addressing low and high regions of the VA space from a single
page table.

KVM selftests' VA space allocator presumes the valid address range is
split between low and high memory based the MSB, which of course is a
poor match for arm64's TTBR0 region.

Allow architectures to override the default VA space layout. Make use of
the override to align vpages_valid with the behavior of TTBR0 on arm64.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221207214809.489070-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvmarm-6.2' of https://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.2

- Enable the per-vcpu dirty-ring tracking mechanism, together with an
  option to keep the good old dirty log around for pages that are
  dirtied by something other than a vcpu.

- Switch to the relaxed parallel fault handling, using RCU to delay
  page table reclaim and giving better performance under load.

- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
  option, which multi-process VMMs such as crosvm rely on.

- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
  to have its own view of a vcpu, keeping that state private.

- Add support for the PMUv3p5 architecture revision, bringing support
  for 64bit counters on systems that support it, and fix the
  no-quite-compliant CHAIN-ed counter support for the machines that
  actually exist out there.

- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
  only) as a prefix of the oncoming support for 4kB and 16kB pages.

- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
  stage-2 faults and access tracking. You name it, we got it, we
  probably broke it.

- Pick a small set of documentation and spelling fixes, because no
  good merge window would be complete without those.

As a side effect, this tag also drags:

- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
  series

- A shared branch with the arm64 tree that repaints all the system
  registers to match the ARM ARM's naming, and resulting in
  interesting conflicts

Merge remote-tracking branch 'arm64/for-next/sysregs' into kvmarm-master/next

Merge arm64's sysreg repainting branch to avoid too many
ugly conflicts...

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/misc-6.2 into kvmarm-master/next

* kvm-arm64/misc-6.2:
  : .
  : Misc fixes for 6.2:
  :
  : - Fix formatting for the pvtime documentation
  :
  : - Fix a comment in the VHE-specific Makefile
  : .
  KVM: arm64: Fix typo in comment
  KVM: arm64: Fix pvtime documentation

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/pmu-unchained into kvmarm-master/next

* kvm-arm64/pmu-unchained:
  : .
  : PMUv3 fixes and improvements:
  :
  : - Make the CHAIN event handling strictly follow the architecture
  :
  : - Add support for PMUv3p5 (64bit counters all the way)
  :
  : - Various fixes and cleanups
  : .
  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
  KVM: arm64: PMU: Sanitise PMCR_EL0.LP on first vcpu run
  KVM: arm64: PMU: Simplify PMCR_EL0 reset handling
  KVM: arm64: PMU: Replace version number '0' with ID_AA64DFR0_EL1_PMUVer_NI
  KVM: arm64: PMU: Make kvm_pmc the main data structure
  KVM: arm64: PMU: Simplify vcpu computation on perf overflow notification
  KVM: arm64: PMU: Allow PMUv3p5 to be exposed to the guest
  KVM: arm64: PMU: Implement PMUv3p5 long counter support
  KVM: arm64: PMU: Allow ID_DFR0_EL1.PerfMon to be set from userspace
  KVM: arm64: PMU: Allow ID_AA64DFR0_EL1.PMUver to be set from userspace
  KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation
  KVM: arm64: PMU: Do not let AArch32 change the counters' top 32 bits
  KVM: arm64: PMU: Simplify setting a counter to a specific value
  KVM: arm64: PMU: Add counter_index_to_*reg() helpers
  KVM: arm64: PMU: Only narrow counters that are not 64bit wide
  KVM: arm64: PMU: Narrow the overflow checking when required
  KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
  KVM: arm64: PMU: Always advertise the CHAIN event
  KVM: arm64: PMU: Align chained counter implementation with architecture pseudocode
  arm64: Add ID_DFR0_EL1.PerfMon values for PMUv3p7 and IMP_DEF

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/mte-map-shared into kvmarm-master/next

* kvm-arm64/mte-map-shared:
  : .
  : Update the MTE support to allow the VMM to use shared mappings
  : to back the memslots exposed to MTE-enabled guests.
  :
  : Patches courtesy of Catalin Marinas and Peter Collingbourne.
  : .
  : Fix a number of issues with MTE, such as races on the tags
  : being initialised vs the PG_mte_tagged flag as well as the
  : lack of support for VM_SHARED when KVM is involved.
  :
  : Patches from Catalin Marinas and Peter Collingbourne.
  : .
  Documentation: document the ABI changes for KVM_CAP_ARM_MTE
  KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled
  KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled
  arm64: mte: Lock a page for MTE tag initialisation
  mm: Add PG_arch_3 page flag
  KVM: arm64: Simplify the sanitise_mte_tags() logic
  arm64: mte: Fix/clarify the PG_mte_tagged semantics
  mm: Do not enable PG_arch_2 for all 64-bit architectures

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/pkvm-vcpu-state into kvmarm-master/next

* kvm-arm64/pkvm-vcpu-state: (25 commits)
  : .
  : Large drop of pKVM patches from Will Deacon and co, adding
  : a private vm/vcpu state at EL2, managed independently from
  : the EL1 state. From the cover letter:
  :
  : "This is version six of the pKVM EL2 state series, extending the pKVM
  : hypervisor code so that it can dynamically instantiate and manage VM
  : data structures without the host being able to access them directly.
  : These structures consist of a hyp VM, a set of hyp vCPUs and the stage-2
  : page-table for the MMU. The pages used to hold the hypervisor structures
  : are returned to the host when the VM is destroyed."
  : .
  KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run()
  KVM: arm64: Don't unnecessarily map host kernel sections at EL2
  KVM: arm64: Explicitly map 'kvm_vgic_global_state' at EL2
  KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2
  KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host
  KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache
  KVM: arm64: Instantiate guest stage-2 page-tables at EL2
  KVM: arm64: Consolidate stage-2 initialisation into a single function
  KVM: arm64: Add generic hyp_memcache helpers
  KVM: arm64: Provide I-cache invalidation by virtual address at EL2
  KVM: arm64: Initialise hypervisor copies of host symbols unconditionally
  KVM: arm64: Add per-cpu fixmap infrastructure at EL2
  KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1
  KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  KVM: arm64: Rename 'host_kvm' to 'host_mmu'
  KVM: arm64: Add hyp_spinlock_t static initializer
  KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h
  KVM: arm64: Add helpers to pin memory shared with the hypervisor at EL2
  KVM: arm64: Prevent the donation of no-map pages
  KVM: arm64: Implement do_donate() helper for donating memory
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/parallel-faults into kvmarm-master/next

* kvm-arm64/parallel-faults:
  : .
  : Parallel stage-2 fault handling, courtesy of Oliver Upton.
  : From the cover letter:
  :
  : "Presently KVM only takes a read lock for stage 2 faults if it believes
  : the fault can be fixed by relaxing permissions on a PTE (write unprotect
  : for dirty logging). Otherwise, stage 2 faults grab the write lock, which
  : predictably can pile up all the vCPUs in a sufficiently large VM.
  :
  : Like the TDP MMU for x86, this series loosens the locking around
  : manipulations of the stage 2 page tables to allow parallel faults. RCU
  : and atomics are exploited to safely build/destroy the stage 2 page
  : tables in light of multiple software observers."
  : .
  KVM: arm64: Reject shared table walks in the hyp code
  KVM: arm64: Don't acquire RCU read lock for exclusive table walks
  KVM: arm64: Take a pointer to walker data in kvm_dereference_pteref()
  KVM: arm64: Handle stage-2 faults in parallel
  KVM: arm64: Make table->block changes parallel-aware
  KVM: arm64: Make leaf->leaf PTE changes parallel-aware
  KVM: arm64: Make block->table PTE changes parallel-aware
  KVM: arm64: Split init and set for table PTE
  KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
  KVM: arm64: Protect stage-2 traversal with RCU
  KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  KVM: arm64: Use an opaque type for pteps
  KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
  KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data
  KVM: arm64: Pass mm_ops through the visitor context
  KVM: arm64: Stash observed pte value in visitor context
  KVM: arm64: Combine visitor arguments into a context structure

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/dirty-ring into kvmarm-master/next

* kvm-arm64/dirty-ring:
  : .
  : Add support for the "per-vcpu dirty-ring tracking with a bitmap
  : and sprinkles on top", courtesy of Gavin Shan.
  :
  : This branch drags the kvmarm-fixes-6.1-3 tag which was already
  : merged in 6.1-rc4 so that the branch is in a working state.
  : .
  KVM: Push dirty information unconditionally to backup bitmap
  KVM: selftests: Automate choosing dirty ring size in dirty_log_test
  KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
  KVM: selftests: Use host page size to map ring buffer in dirty_log_test
  KVM: arm64: Enable ring-based dirty memory tracking
  KVM: Support dirty ring in conjunction with bitmap
  KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
  KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/52bit-fixes into kvmarm-master/next

* kvm-arm64/52bit-fixes:
  : .
  : 52bit PA fixes, courtesy of Ryan Roberts. From the cover letter:
  :
  : "I've been adding support for FEAT_LPA2 to KVM and as part of that work have been
  : testing various (84) configurations of HW, host and guest kernels on FVP. This
  : has thrown up a couple of pre-existing bugs, for which the fixes are provided."
  : .
  KVM: arm64: Fix benign bug with incorrect use of VA_BITS
  KVM: arm64: Fix PAR_TO_HPFAR() to work independently of PA_BITS.
  KVM: arm64: Fix kvm init failure when mode!=vhe and VA_BITS=52.

Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Fix benign bug with incorrect use of VA_BITS

get_user_mapping_size() uses kvm's pgtable library to walk a user space
page table created by the kernel, and in doing so, passes metadata
that the library needs, including ia_bits, which defines the size of the
input address.

For the case where the kernel is compiled for 52 VA bits but runs on HW
that does not support LVA, it will fall back to 48 VA bits at runtime.
Therefore we must use vabits_actual rather than VA_BITS to get the true
address size.

This is benign in the current code base because the pgtable library only
uses it for error checking.

Fixes: 6011cf68c885 ("KVM: arm64: Walk userspace page tables to compute the THP mapping size")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221205114031.3972780-1-ryan.roberts@arm.com

Merge branch kvm-arm64/selftest/access-tracking into kvmarm-master/next

* kvm-arm64/selftest/access-tracking:
  : .
  : Small series to add support for arm64 to access_tracking_perf_test and
  : correct a couple bugs along the way.
  :
  : Patches courtesy of Oliver Upton.
  : .
  KVM: selftests: Build access_tracking_perf_test for arm64
  KVM: selftests: Have perf_test_util signal when to stop vCPUs

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/selftest/s2-faults into kvmarm-master/next

* kvm-arm64/selftest/s2-faults:
  : .
  : New KVM/arm64 selftests exercising various sorts of S2 faults, courtesy
  : of Ricardo Koller. From the cover letter:
  :
  : "This series adds a new aarch64 selftest for testing stage 2 fault handling
  : for various combinations of guest accesses (e.g., write, S1PTW), backing
  : sources (e.g., anon), and types of faults (e.g., read on hugetlbfs with a
  : hole, write on a readonly memslot). Each test tries a different combination
  : and then checks that the access results in the right behavior (e.g., uffd
  : faults with the right address and write/read flag). [...]"
  : .
  KVM: selftests: aarch64: Add mix of tests into page_fault_test
  KVM: selftests: aarch64: Add readonly memslot tests into page_fault_test
  KVM: selftests: aarch64: Add dirty logging tests into page_fault_test
  KVM: selftests: aarch64: Add userfaultfd tests into page_fault_test
  KVM: selftests: aarch64: Add aarch64/page_fault_test
  KVM: selftests: Use the right memslot for code, page-tables, and data allocations
  KVM: selftests: Fix alignment in virt_arch_pgd_alloc() and vm_vaddr_alloc()
  KVM: selftests: Add vm->memslots[] and enum kvm_mem_region_type
  KVM: selftests: Stash backing_src_type in struct userspace_mem_region
  tools: Copy bitfield.h from the kernel sources
  KVM: selftests: aarch64: Construct DEFAULT_MAIR_EL1 using sysreg.h macros
  KVM: selftests: Add missing close and munmap in __vm_mem_region_delete()
  KVM: selftests: aarch64: Add virt_get_pte_hva() library function
  KVM: selftests: Add a userfaultfd library

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/selftest/linked-bps into kvmarm-master/next

* kvm-arm64/selftest/linked-bps:
  : .
  : Additional selftests for the arm64 breakpoints/watchpoints,
  : courtesy of Reiji Watanabe. From the cover letter:
  :
  : "This series adds test cases for linked {break,watch}points to the
  : debug-exceptions test, and expands {break,watch}point tests to
  : use non-zero {break,watch}points (the current test always uses
  : {break,watch}point#0)."
  : .
  KVM: arm64: selftests: Test with every breakpoint/watchpoint
  KVM: arm64: selftests: Add a test case for a linked watchpoint
  KVM: arm64: selftests: Add a test case for a linked breakpoint
  KVM: arm64: selftests: Change debug_version() to take ID_AA64DFR0_EL1
  KVM: arm64: selftests: Stop unnecessary test stage tracking of debug-exceptions
  KVM: arm64: selftests: Add helpers to enable debug exceptions
  KVM: arm64: selftests: Remove the hard-coded {b,w}pn#0 from debug-exceptions
  KVM: arm64: selftests: Add write_dbg{b,w}{c,v}r helpers in debug-exceptions
  KVM: arm64: selftests: Use FIELD_GET() to extract ID register fields

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/selftest/memslot-fixes into kvmarm-master/next

* kvm-arm64/selftest/memslot-fixes:
  : .
  : KVM memslot selftest fixes for non-4kB page sizes, courtesy
  : of Gavin Shan. From the cover letter:
  :
  : "kvm/selftests/memslots_perf_test doesn't work with 64KB-page-size-host
  : and 4KB-page-size-guest on aarch64. In the implementation, the host and
  : guest page size have been hardcoded to 4KB. It's ovbiously not working
  : on aarch64 which supports 4KB, 16KB, 64KB individually on host and guest.
  :
  : This series tries to fix it. After the series is applied, the test runs
  : successfully with 64KB-page-size-host and 4KB-page-size-guest."
  : .
  KVM: selftests: memslot_perf_test: Report optimal memory slots
  KVM: selftests: memslot_perf_test: Consolidate memory
  KVM: selftests: memslot_perf_test: Support variable guest page size
  KVM: selftests: memslot_perf_test: Probe memory slots for once
  KVM: selftests: memslot_perf_test: Consolidate loop conditions in prepare_vm()
  KVM: selftests: memslot_perf_test: Use data->nslots in prepare_vm()

Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow

Fix the bogus masking when computing the period of a 64bit counter
with 32bit overflow. It really should be treated like a 32bit counter
for the purpose of the period.

Reported-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/Y4jbosgHbUDI0WF4@google.com

Merge branch 'gpc-fixes' of git://git.infradead.org/users/dwmw2/linux into HEAD

Pull Xen-for-KVM changes from David Woodhouse:

* add support for 32-bit guests in SCHEDOP_poll

* the rest of the gfn-to-pfn cache API cleanup

"I still haven't reinstated the last of those patches to make gpc->len
immutable."

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Advertise that the SMM_CTL MSR is not supported

CPUID.80000021H:EAX[bit 9] indicates that the SMM_CTL MSR (0xc0010116) is
not supported. This defeature can be advertised by KVM_GET_SUPPORTED_CPUID
regardless of whether or not the host enumerates it; currently it will be
included only if the host enumerates at least leaf 8000001DH, due to a
preexisting bug in QEMU that KVM has to work around (commit f751d8eac176,
"KVM: x86: work around QEMU issue with synthetic CPUID leaves", 2022-04-29).

Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20221007221644.138355-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: remove unnecessary exports

Several symbols are not used by vendor modules but still exported.
Removing them ensures that new coupling between kvm.ko and kvm-*.ko
is noticed and reviewed.

Co-developed-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"

There is a spelling mistake in some help text. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Message-Id: <20221201091354.1613652-1-colin.i.king@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tools: KVM: selftests: Convert clear/set_bit() to actual atomics

Convert {clear,set}_bit() to atomics as KVM's ucall implementation relies
on clear_bit() being atomic, they are defined in atomic.h, and the same
helpers in the kernel proper are atomic.

KVM's ucall infrastructure is the only user of clear_bit() in tools/, and
there are no true set_bit() users. tools/testing/nvdimm/ does make heavy
use of set_bit(), but that code builds into a kernel module of sorts, i.e.
pulls in all of the kernel's header and so is already getting the kernel's
atomic set_bit().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tools: Drop "atomic_" prefix from atomic test_and_set_bit()

Drop the "atomic_" prefix from tools' atomic_test_and_set_bit() to
match the kernel nomenclature where test_and_set_bit() is atomic,
and __test_and_set_bit() provides the non-atomic variant.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers

Drop tools' non-atomic test_and_set_bit() and test_and_clear_bit() helpers
now that all users are gone. The names will be claimed in the future for
atomic versions.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests

Use the dedicated non-atomic helpers for {clear,set}_bit() and their
test variants, i.e. the double-underscore versions. Depsite being
defined in atomic.h, and despite the kernel versions being atomic in the
kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move
to the double-underscore versions so that the versions that are expected
to be atomic (for kernel developers) can be made atomic without affecting
users that don't want atomic operations.

Leave the usage in ucall_free() as-is, it's the one place in tools/ that
actually wants/needs atomic behavior.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

perf tools: Use dedicated non-atomic clear/set bit helpers

Use the dedicated non-atomic helpers for {clear,set}_bit() and their
test variants, i.e. the double-underscore versions. Depsite being
defined in atomic.h, and despite the kernel versions being atomic in the
kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move
to the double-underscore versions so that the versions that are expected
to be atomic (for kernel developers) can be made atomic without affecting
users that don't want atomic operations.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Message-Id: <20221119013450.2643007-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers

Take @bit as an unsigned long instead of a signed int in clear_bit() and
set_bit() so that they match the double-underscore versions, __clear_bit()
and __set_bit().  This will allow converting users that really don't want
atomic operations to the double-underscores without introducing a
functional change, which will in turn allow making {clear,set}_bit()
atomic (as advertised).

Practically speaking, this _should_ have no functional impact.  KVM's
selftests usage is either hardcoded (Hyper-V tests) or is artificially
limited (arch_timer test and dirty_log test).  In KVM, dirty_log test is
the only mildly interesting case as it's use indirectly restricted to
unsigned 32-bit values, but in theory it could generate a negative value
when cast to a signed int.  But in that case, taking an "unsigned long"
is actually a bug fix.

Perf's usage is more difficult to audit, but any code that is affected
by the switch is likely already broken.  perf_header__{set,clear}_feat()
and perf_file_header__read() effectively use only hardcoded enums with
small, positive values, atom_new() passes an unsigned long, but its value
is capped at 128 via NR_ATOM_PER_PAGE, etc...

The only real potential for breakage is in the perf flows that take a
"cpu", but it's unlikely perf is subtly relying on a negative index into
bitmaps, e.g. "cpu" can be "-1", but only as "not valid" placeholder.

Note, tools/testing/nvdimm/ makes heavy use of set_bit(), but that code
builds into a kernel module of sorts, i.e. pulls in all of the kernel's
header and so is getting the kernel's atomic set_bit().  The NVDIMM test
usage of atomics is likely unnecessary, e.g. ndtest_dimm_register() sets
bits in a local variable, but that's neither here nor there as far as
this change is concerned.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm64: selftests: Enable single-step without a "full" ucall()

Add a new ucall hook, GUEST_UCALL_NONE(), to allow tests to make ucalls
without allocating a ucall struct, and use it to enable single-step
in ARM's debug-exceptions test. Like the disable single-step path, the
enabling path also needs to ensure that no exclusive access sequences are
attempted after enabling single-step, as the exclusive monitor is cleared
on ERET from the debug exception taken to EL2.

The test currently "works" because clear_bit() isn't actually an atomic
operation... yet.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself

When a VM reboots itself, the reset process will result in
an ioctl(KVM_SET_LAPIC, ...) to disable x2APIC mode and set
the xAPIC id of the vCPU to its default value, which is the
vCPU id.

That will be handled in KVM as follows:

     kvm_vcpu_ioctl_set_lapic
       kvm_apic_set_state
  kvm_lapic_set_base  =>  disable X2APIC mode
    kvm_apic_state_fixup
      kvm_lapic_xapic_id_updated
        kvm_xapic_id(apic) != apic->vcpu->vcpu_id
kvm_set_apicv_inhibit(APICV_INHIBIT_REASON_APIC_ID_MODIFIED)
   memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s))  => update APIC_ID

When kvm_apic_set_state invokes kvm_lapic_set_base to disable
x2APIC mode, the old 32-bit x2APIC id is still present rather
than the 8-bit xAPIC id.  kvm_lapic_xapic_id_updated will set the
APICV_INHIBIT_REASON_APIC_ID_MODIFIED bit and disable APICv/x2AVIC.

Instead, kvm_lapic_xapic_id_updated must be called after APIC_ID is
changed.

In fact, this fixes another small issue in the code in that
potential changes to a vCPU's xAPIC ID need not be tracked for
KVM_GET_LAPIC.

Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
Signed-off-by: Yuan ZhaoXiong <yuanzhaoxiong@baidu.com>
Message-Id: <1669984574-32692-1-git-send-email-yuanzhaoxiong@baidu.com>
Cc: stable@vger.kernel.org
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Remove stale comment about KVM_REQ_UNHALT

Remove a comment about KVM_REQ_UNHALT being set by kvm_vcpu_check_block()
that was missed when KVM_REQ_UNHALT was dropped.

Fixes: c59fb1275838 ("KVM: remove KVM_REQ_UNHALT")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221201220433.31366-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-x86-fixes-6.2-1' of https://github.com/kvm-x86/linux into HEAD

Misc KVM x86 fixes and cleanups for 6.2:

- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
   years back when eliminating unnecessary barriers when switching between
   vmcs01 and vmcs02.

- Clean up the MSR filter docs.

- Clean up vmread_error_trampoline() to make it more obvious that params
   must be passed on the stack, even for x86-64.

- Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
   of the current guest CPUID.

- Fudge around a race with TSC refinement that results in KVM incorrectly
   thinking a guest needs TSC scaling when running on a CPU with a
   constant TSC, but no hardware-enumerated TSC frequency.

Merge tag 'kvm-selftests-6.2-2' of https://github.com/kvm-x86/linux into HEAD

KVM selftests fixes for 6.2

- Fix an inverted check in the access tracking perf test, and restore
   support for asserting that there aren't too many idle pages when
   running on bare metal.

- Fix an ordering issue in the AMX test introduced by recent conversions
   to use kvm_cpu_has(), and harden the code to guard against similar bugs
   in the future.  Anything that tiggers caching of KVM's supported CPUID,
   kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
   the caching occurs before the test opts in via prctl().

- Fix build errors that occur in certain setups (unsure exactly what is
   unique about the problematic setup) due to glibc overriding
   static_assert() to a variant that requires a custom message.

KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR

The ioctls are missing an architecture property that is present in others.

Suggested-by: Sergio Lopez Pascual <slp@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-5-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Reference to kvm_userspace_memory_region in doc and comments

There are still references to the removed kvm_memory_region data structure
but the doc and comments should mention struct kvm_userspace_memory_region
instead, since that is what's used by the ioctl that replaced the old one
and this data structure support the same set of flags.

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-4-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl

The documentation says that the ioctl has been deprecated, but it has been
actually removed and the remaining references are just left overs.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-3-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Delete all references to removed KVM_SET_MEMORY_REGION ioctl

The documentation says that the ioctl has been deprecated, but it has been
actually removed and the remaining references are just left overs.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-2-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Define and use a custom static assert in lib headers

Define and use kvm_static_assert() in the common KVM selftests headers to
provide deterministic behavior, and to allow creating static asserts
without dummy messages.

The kernel's static_assert() makes the message param optional, and on the
surface, tools/include/linux/build_bug.h appears to follow suit.  However,
glibc may override static_assert() and redefine it as a direct alias of
_Static_assert(), which makes the message parameter mandatory.  This leads
to non-deterministic behavior as KVM selftests code that utilizes
static_assert() without a custom message may or not compile depending on
the order of includes.  E.g. recently added asserts in
x86_64/processor.h fail on some systems with errors like

  In file included from lib/memstress.c:11:0:
  include/x86_64/processor.h: In function ‘this_cpu_has_p’:
  include/x86_64/processor.h:193:34: error: expected ‘,’ before ‘)’ token
    static_assert(low_bit < high_bit);     \
                                    ^
due to _Static_assert() expecting a comma before a message.  The "message
optional" version of static_assert() uses macro magic to strip away the
comma when presented with empty an __VA_ARGS__

  #ifndef static_assert
  #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
  #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
  #endif // static_assert

and effectively generates "_Static_assert(expr, #expr)".

The incompatible version of static_assert() gets defined by this snippet
in /usr/include/assert.h:

  #if defined __USE_ISOC11 && !defined __cplusplus
  # undef static_assert
  # define static_assert _Static_assert
  #endif

which yields "_Static_assert(expr)" and thus fails as above.

KVM selftests don't actually care about using C11, but __USE_ISOC11 gets
defined because of _GNU_SOURCE, which many tests do #define.  _GNU_SOURCE
triggers a massive pile of defines in /usr/include/features.h, including
_ISOC11_SOURCE:

  /* If _GNU_SOURCE was defined by the user, turn on all the other features.  */
  #ifdef _GNU_SOURCE
  # undef  _ISOC95_SOURCE
  # define _ISOC95_SOURCE 1
  # undef  _ISOC99_SOURCE
  # define _ISOC99_SOURCE 1
  # undef  _ISOC11_SOURCE
  # define _ISOC11_SOURCE 1
  # undef  _POSIX_SOURCE
  # define _POSIX_SOURCE  1
  # undef  _POSIX_C_SOURCE
  # define _POSIX_C_SOURCE        200809L
  # undef  _XOPEN_SOURCE
  # define _XOPEN_SOURCE  700
  # undef  _XOPEN_SOURCE_EXTENDED
  # define _XOPEN_SOURCE_EXTENDED 1
  # undef  _LARGEFILE64_SOURCE
  # define _LARGEFILE64_SOURCE    1
  # undef  _DEFAULT_SOURCE
  # define _DEFAULT_SOURCE        1
  # undef  _ATFILE_SOURCE
  # define _ATFILE_SOURCE 1
  #endif

which further down in /usr/include/features.h leads to:

  /* This is to enable the ISO C11 extension.  */
  #if (defined _ISOC11_SOURCE \
       || (defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L))
  # define __USE_ISOC11   1
  #endif

To make matters worse, /usr/include/assert.h doesn't guard against
multiple inclusion by turning itself into a nop, but instead #undefs a
few macros and continues on.  As a result, it's all but impossible to
ensure the "message optional" version of static_assert() will actually be
used, e.g. explicitly including assert.h and #undef'ing static_assert()
doesn't work as a later inclusion of assert.h will again redefine its
version.

  #ifdef  _ASSERT_H

  # undef _ASSERT_H
  # undef assert
  # undef __ASSERT_VOID_CAST

  # ifdef __USE_GNU
  #  undef assert_perror
  # endif

  #endif /* assert.h      */

  #define _ASSERT_H       1
  #include <features.h>

Fixes: fcba483e8246 ("KVM: selftests: Sanity check input to ioctls() at build time")
Fixes: ee3795536664 ("KVM: selftests: Refactor X86_FEATURE_* framework to prep for X86_PROPERTY_*")
Fixes: 53a7dc0f215e ("KVM: selftests: Add X86_PROPERTY_* framework to retrieve CPUID values")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221122013309.1872347-1-seanjc@google.com

KVM: selftests: Do kvm_cpu_has() checks before creating VM+vCPU

Move the AMX test's kvm_cpu_has() checks before creating the VM+vCPU,
there are no dependencies between the two operations. Opportunistically
add a comment to call out that enabling off-by-default XSAVE-managed
features must be done before KVM_GET_SUPPORTED_CPUID is cached.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-5-seanjc@google.com

KVM: selftests: Disallow "get supported CPUID" before REQ_XCOMP_GUEST_PERM

Disallow using kvm_get_supported_cpuid() and thus caching KVM's supported
CPUID info before enabling XSAVE-managed features that are off-by-default
and must be enabled by ARCH_REQ_XCOMP_GUEST_PERM. Caching the supported
CPUID before all XSAVE features are enabled can result in false negatives
due to testing features that were cached before they were enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-4-seanjc@google.com

KVM: selftests: Move __vm_xsave_require_permission() below CPUID helpers

Move __vm_xsave_require_permission() below the CPUID helpers so that a
future change can reference the cached result of KVM_GET_SUPPORTED_CPUID
while keeping the definition of the variable close to its intended user,
kvm_get_supported_cpuid().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-3-seanjc@google.com

KVM: selftests: Move XFD CPUID checking out of __vm_xsave_require_permission()

Move the kvm_cpu_has() check on X86_FEATURE_XFD out of the helper to
enable off-by-default XSAVE-managed features and into the one test that
currenty requires XFD (XFeature Disable) support. kvm_cpu_has() uses
kvm_get_supported_cpuid() and thus caches KVM_GET_SUPPORTED_CPUID, and so
using kvm_cpu_has() before ARCH_REQ_XCOMP_GUEST_PERM effectively results
in the test caching stale values, e.g. subsequent checks on AMX_TILE will
get false negatives.

Although off-by-default features are nonsensical without XFD, checking
for XFD virtualization prior to enabling such features isn't strictly
required.

Signed-off-by: Lei Wang <lei4.wang@intel.com>
Fixes: 7fbb653e01fd ("KVM: selftests: Check KVM's supported CPUID, not host CPUID, for XFD")
Link: https://lore.kernel.org/r/20221125023839.315207-1-lei4.wang@intel.com
[sean: add Fixes, reword changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-2-seanjc@google.com

KVM: selftests: Restore assert for non-nested VMs in access tracking test

Restore the assert (on x86-64) that <10% of pages are still idle when NOT
running as a nested VM in the access tracking test. The original assert
was converted to a "warning" to avoid false failures when running the
test in a VM, but the non-nested case does not suffer from the same
"infinite TLB size" issue.

Using the HYPERVISOR flag isn't infallible as VMMs aren't strictly
required to enumerate the "feature" in CPUID, but practically speaking
anyone that is running KVM selftests in VMs is going to be using a VMM
and hypervisor that sets the HYPERVISOR flag.

Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221129175300.4052283-3-seanjc@google.com

KVM: selftests: Fix inverted "warning" in access tracking perf test

Warn if the number of idle pages is greater than or equal to 10% of the
total number of pages, not if the percentage of idle pages is less than
10%. The original code asserted that less than 10% of pages were still
idle, but the check got inverted when the assert was converted to a
warning.

Opportunistically clean up the warning; selftests are 64-bit only, there
is no need to use "%PRIu64" instead of "%lu".

Fixes: 6336a810db5c ("KVM: selftests: replace assertion with warning in access_tracking_perf_test")
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221129175300.4052283-2-seanjc@google.com

arm64/sysreg: Remove duplicate definitions from asm/sysreg.h

With the new-fangled generation of asm/sysreg-defs.h, some definitions
have ended up being duplicated between the two files.

Remove these duplicate definitions, and consolidate the naming for
GMID_EL1_BS_WIDTH.

Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation

Convert ID_DFR1_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-39-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation

Convert ID_DFR0_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221130171637.718182-38-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation

Convert ID_AFR0_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-37-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation

Convert ID_MMFR5_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-36-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert MVFR2_EL1 to automatic generation

Convert MVFR2_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-35-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert MVFR1_EL1 to automatic generation

Convert MVFR1_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-34-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert MVFR0_EL1 to automatic generation

Convert MVFR0_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-33-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation

Convert ID_PFR2_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-32-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation

Convert ID_PFR1_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-31-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation

Convert ID_PFR0_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221130171637.718182-30-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_ISAR6_EL1 to automatic generation

Convert ID_ISAR6_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-29-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_ISAR5_EL1 to automatic generation

Convert ID_ISAR5_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-28-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_ISAR4_EL1 to automatic generation

Convert ID_ISAR4_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-27-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_ISAR3_EL1 to automatic generation

Convert ID_ISAR3_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-26-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_ISAR2_EL1 to automatic generation

Convert ID_ISAR2_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-25-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_ISAR1_EL1 to automatic generation

Convert ID_ISAR1_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-24-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_ISAR0_EL1 to automatic generation

Convert ID_ISAR0_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-23-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_MMFR4_EL1 to automatic generation

Convert ID_MMFR4_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-22-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_MMFR3_EL1 to automatic generation

Convert ID_MMFR3_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-21-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_MMFR2_EL1 to automatic generation

Convert ID_MMFR2_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-20-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_MMFR1_EL1 to automatic generation

Convert ID_MMFR1_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-19-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert ID_MMFR0_EL1 to automatic generation

Convert ID_MMFR0_EL1 to be automatically generated as per DDI0487I.a,
no functional changes.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-18-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Extend the maximum width of a register and symbol name

32bit has multiple values for its id registers, as extra properties
were added to the CPUs. Some of these end up having long names, which
exceed the fixed 48 character column that the sysreg awk script generates.

For example, the ID_MMFR1_EL1.L1Hvd field has an encoding whose natural
name would be 'invalidate Iside only'. Using this causes compile errors
as the script generates the following:
#define ID_MMFR1_EL1_L1Hvd_INVALIDATE_ISIDE_ONLYUL(0b0001)

Add a few extra characters.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-17-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for MVFR2_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the MVFR2_EL1 register use lower-case for feature
names where the arm-arm does the same.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-16-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for MVFR1_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the MVFR1_EL1 register use lower-case for feature
names where the arm-arm does the same.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-15-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for MVFR0_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the MVFR0_EL1 register use lower-case for feature
names where the arm-arm does the same.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-14-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_DFR1_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_DFR1_EL1 register have an _EL1 suffix.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-13-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_DFR0_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_DFR0_EL1 register have an _EL1 suffix,
and use lower-case for feature names where the arm-arm does the same.

The arm-arm has feature names for some of the ID_DFR0_EL1.PerMon encodings.
Use these feature names in preference to the '8_4' indication of the
architecture version they were introduced in.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-12-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_PFR2_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_PFR2_EL1 register have an _EL1 suffix.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-11-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_PFR1_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_PFR1_EL1 register have an _EL1 suffix,
and use lower case in feature names where the arm-arm does the same.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-10-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_PFR0_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_PFR0_EL1 register have an _EL1 suffix,
and use lower case in feature names where the arm-arm does the same.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-9-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_ISAR6_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_ISAR6_EL1 register have an _EL1 suffix.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-8-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_ISAR5_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_ISAR5_EL1 register have an _EL1 suffix.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-7-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_ISAR4_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_ISAR4_EL1 register have an _EL1 suffix,
and use lower-case for feature names where the arm-arm does the same.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-6-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_ISAR0_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_ISAR0_EL1 register have an _EL1 suffix,
and use lower-case for feature names where the arm-arm does the same.

To functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-5-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_MMFR5_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_MMFR5_EL1 register have an _EL1 suffix.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-4-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_MMFR4_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates.

Ensure symbols for the ID_MMFR4_EL1 register have an _EL1 suffix,
and use lower case in feature names where the arm-arm does the same.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-3-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Standardise naming for ID_MMFR0_EL1

To convert the 32bit id registers to use the sysreg generation, they
must first have a regular pattern, to match the symbols the script
generates. The scripts would like to follow exactly what is in the
arm-arm, which uses lower case for some of these feature names.

Ensure symbols for the ID_MMFR0_EL1 register have an _EL1 suffix,
and use lower case in feature names where the arm-arm does the same.

No functional change.

Signed-off-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20221130171637.718182-2-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

KVM: x86: Use current rather than snapshotted TSC frequency if it is constant

Don't snapshot tsc_khz into per-cpu cpu_tsc_khz if the host TSC is
constant, in which case the actual TSC frequency will never change and thus
capturing TSC during initialization is unnecessary, KVM can simply use
tsc_khz.  This value is snapshotted from
kvm_timer_init->kvmclock_cpu_online->tsc_khz_changed(NULL)

On CPUs with constant TSC, but not a hardware-specified TSC frequency,
snapshotting cpu_tsc_khz and using that to set a VM's target TSC frequency
can lead to VM to think its TSC frequency is not what it actually is if
refining the TSC completes after KVM snapshots tsc_khz.  The actual
frequency never changes, only the kernel's calculation of what that
frequency is changes.

Ideally, KVM would not be able to race with TSC refinement, or would have
a hook into tsc_refine_calibration_work() to get an alert when refinement
is complete.  Avoiding the race altogether isn't practical as refinement
takes a relative eternity; it's deliberately put on a work queue outside of
the normal boot sequence to avoid unnecessarily delaying boot.

Adding a hook is doable, but somewhat gross due to KVM's ability to be
built as a module.  And if the TSC is constant, which is likely the case
for every VMX/SVM-capable CPU produced in the last decade, the race can be
hit if and only if userspace is able to create a VM before TSC refinement
completes; refinement is slow, but not that slow.

For now, punt on a proper fix, as not taking a snapshot can help some uses
cases and not taking a snapshot is arguably correct irrespective of the
race with refinement.

Signed-off-by: Anton Romanov <romanton@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220608183525.1143682-1-romanton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: selftests: Verify userspace can stuff IA32_FEATURE_CONTROL at will

Verify the KVM allows userspace to set all supported bits in the
IA32_FEATURE_CONTROL MSR irrespective of the current guest CPUID, and
that all unsupported bits are rejected.

Throw the testcase into vmx_msrs_test even though it's not technically a
VMX MSR; it's close enough, and the most frequently feature controlled by
the MSR is VMX.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220607232353.3375324-4-seanjc@google.com

KVM: VMX: Move MSR_IA32_FEAT_CTL.LOCKED check into "is valid" helper

Move the check on IA32_FEATURE_CONTROL being locked, i.e. read-only from
the guest, into the helper to check the overall validity of the incoming
value. Opportunistically rename the helper to make it clear that it
returns a bool.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220607232353.3375324-3-seanjc@google.com

KVM: VMX: Allow userspace to set all supported FEATURE_CONTROL bits

Allow userspace to set all supported bits in MSR IA32_FEATURE_CONTROL
irrespective of the guest CPUID model, e.g. via KVM_SET_MSRS. KVM's ABI
is that userspace is allowed to set MSRs before CPUID, i.e. can set MSRs
to values that would fault according to the guest CPUID model.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220607232353.3375324-2-seanjc@google.com

KVM: VMX: Make vmread_error_trampoline() uncallable from C code

Declare vmread_error_trampoline() as an opaque symbol so that it cannot
be called from C code, at least not without some serious fudging. The
trampoline always passes parameters on the stack so that the inline
VMREAD sequence doesn't need to clobber registers. regparm(0) was
originally added to document the stack behavior, but it ended up being
confusing because regparm(0) is a nop for 64-bit targets.

Opportunustically wrap the trampoline and its declaration in #ifdeffery
to make it even harder to invoke incorrectly, to document why it exists,
and so that it's not left behind if/when CONFIG_CC_HAS_ASM_GOTO_OUTPUT
is true for all supported toolchains.

No functional change intended.

Cc: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220928232015.745948-1-seanjc@google.com

KVM: nVMX: Reword comments about generating nested CR0/4 read shadows

Reword the comments that (attempt to) document nVMX's overrides of the
CR0/4 read shadows for L2 after calling vmx_set_cr0/4(). The important
behavior that needs to be documented is that KVM needs to override the
shadows to account for L1's masks even though the shadows are set by the
common helpers (and that setting the shadows first would result in the
correct shadows being clobbered).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220831000721.4066617-1-seanjc@google.com

KVM: x86: Clean up KVM_CAP_X86_USER_SPACE_MSR documentation

Clean up the KVM_CAP_X86_USER_SPACE_MSR documentation to eliminate
misleading and/or inconsistent verbiage, and to actually document what
accesses are intercepted by which flags.

  - s/will/may since not all #GPs are guaranteed to be intercepted
  - s/deflect/intercept to align with common KVM terminology
  - s/user space/userspace to align with the majority of KVM docs
  - Avoid using "trap" terminology, as KVM exits to userspace _before_
    stepping, i.e. doesn't exhibit trap-like behavior
  - Actually document the flags

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220831001706.4075399-4-seanjc@google.com

KVM: x86: Reword MSR filtering docs to more precisely define behavior

Reword the MSR filtering documentatiion to more precisely define the
behavior of filtering using common virtualization terminology.

  - Explicitly document KVM's behavior when an MSR is denied
  - s/handled/allowed as there is no guarantee KVM will "handle" the
    MSR access
  - Drop the "fall back" terminology, which incorrectly suggests that
    there is existing KVM behavior to fall back to
  - Fix an off-by-one error in the range (the end is exclusive)
  - Call out the interaction between MSR filtering and
    KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER
  - Delete the redundant paragraph on what '0' and '1' in the bitmap
    means, it's covered by the sections on KVM_MSR_FILTER_{READ,WRITE}
  - Delete the clause on x2APIC MSR behavior depending on APIC base, this
    is covered by stating that KVM follows architectural behavior when
    emulating/virtualizing MSR accesses

Reported-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220831001706.4075399-3-seanjc@google.com

KVM: x86: Delete documentation for READ|WRITE in KVM_X86_SET_MSR_FILTER

Delete the paragraph that describes the behavior when both
KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE are set for a range. There is
nothing special about KVM's handling of this combination, whereas
explicitly documenting the combination suggests that there is some magic
behavior the user needs to be aware of.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220831001706.4075399-2-seanjc@google.com

KVM: VMX: Execute IBPB on emulated VM-exit when guest has IBRS

According to Intel's document on Indirect Branch Restricted
Speculation, "Enabling IBRS does not prevent software from controlling
the predicted targets of indirect branches of unrelated software
executed later at the same predictor mode (for example, between two
different user applications, or two different virtual machines). Such
isolation can be ensured through use of the Indirect Branch Predictor
Barrier (IBPB) command." This applies to both basic and enhanced IBRS.

Since L1 and L2 VMs share hardware predictor modes (guest-user and
guest-kernel), hardware IBRS is not sufficient to virtualize
IBRS. (The way that basic IBRS is implemented on pre-eIBRS parts,
hardware IBRS is actually sufficient in practice, even though it isn't
sufficient architecturally.)

For virtual CPUs that support IBRS, add an indirect branch prediction
barrier on emulated VM-exit, to ensure that the predicted targets of
indirect branches executed in L1 cannot be controlled by software that
was executed in L2.

Since we typically don't intercept guest writes to IA32_SPEC_CTRL,
perform the IBPB at emulated VM-exit regardless of the current
IA32_SPEC_CTRL.IBRS value, even though the IBPB could technically be
deferred until L1 sets IA32_SPEC_CTRL.IBRS, if IA32_SPEC_CTRL.IBRS is
clear at emulated VM-exit.

This is CVE-2022-2196.

Fixes: 5c911beff20a ("KVM: nVMX: Skip IBPB when switching between vmcs01 and vmcs02")
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221019213620.1953281-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: VMX: Guest usage of IA32_SPEC_CTRL is likely

At this point in time, most guests (in the default, out-of-the-box
configuration) are likely to use IA32_SPEC_CTRL. Therefore, drop the
compiler hint that it is unlikely for KVM to be intercepting WRMSR of
IA32_SPEC_CTRL.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221019213620.1953281-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: nVMX: Inject #GP, not #UD, if "generic" VMXON CR0/CR4 check fails

Inject #GP for if VMXON is attempting with a CR0/CR4 that fails the
generic "is CRx valid" check, but passes the CR4.VMXE check, and do the
generic checks _after_ handling the post-VMXON VM-Fail.

The CR4.VMXE check, and all other #UD cases, are special pre-conditions
that are enforced prior to pivoting on the current VMX mode, i.e. occur
before interception if VMXON is attempted in VMX non-root mode.

All other CR0/CR4 checks generate #GP and effectively have lower priority
than the post-VMXON check.

Per the SDM:

    IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ...
        THEN #UD;
    ELSIF not in VMX operation
        THEN
            IF (CPL > 0) or (in A20M mode) or
            (the values of CR0 and CR4 are not supported in VMX operation)
                THEN #GP(0);
    ELSIF in VMX non-root operation
        THEN VMexit;
    ELSIF CPL > 0
        THEN #GP(0);
    ELSE VMfail("VMXON executed in VMX root operation");
    FI;

which, if re-written without ELSIF, yields:

    IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ...
        THEN #UD

    IF in VMX non-root operation
        THEN VMexit;

    IF CPL > 0
        THEN #GP(0)

    IF in VMX operation
        THEN VMfail("VMXON executed in VMX root operation");

    IF (in A20M mode) or
       (the values of CR0 and CR4 are not supported in VMX operation)
                THEN #GP(0);

Note, KVM unconditionally forwards VMXON VM-Exits that occur in L2 to L1,
i.e. there is no need to check the vCPU is not in VMX non-root mode.  Add
a comment to explain why unconditionally forwarding such exits is
functionally correct.

Reported-by: Eric Li <ercli@ucdavis.edu>
Fixes: c7d855c2aff2 ("KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006001956.329314-1-seanjc@google.com

KVM: SVM: Replace kmap_atomic() with kmap_local_page()

The use of kmap_atomic() is being deprecated in favor of
kmap_local_page()[1].

The main difference between atomic and local mappings is that local
mappings don't disable page faults or preemption.

There're 2 reasons we can use kmap_local_page() here:
1. SEV is 64-bit only and kmap_local_page() doesn't disable migration in
this case, but here the function clflush_cache_range() uses CLFLUSHOPT
instruction to flush, and on x86 CLFLUSHOPT is not CPU-local and flushes
the page out of the entire cache hierarchy on all CPUs (APM volume 3,
chapter 3, CLFLUSHOPT). So there's no need to disable preemption to ensure
CPU-local.
2. clflush_cache_range() doesn't need to disable pagefault and the mapping
is still valid even if sleeps. This is also true for sched out/in when
preempted.

In addition, though kmap_local_page() is a thin wrapper around
page_address() on 64-bit, kmap_local_page() should still be used here in
preference to page_address() since page_address() isn't suitable to be used
in a generic function (like sev_clflush_pages()) where the page passed in
is not easy to determine the source of allocation. Keeping the kmap* API in
place means it can be used for things other than highmem mappings[2].

Therefore, sev_clflush_pages() is a function that should use
kmap_local_page() in place of kmap_atomic().

Convert the calls of kmap_atomic() / kunmap_atomic() to kmap_local_page() /
kunmap_local().

[1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com
[2]: https://lore.kernel.org/lkml/5d667258-b58b-3d28-3609-e7914c99b31b@intel.com/

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220928092748.463631-1-zhao1.liu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>