Sean Christopherson [Wed, 30 Nov 2022 23:09:18 +0000 (23:09 +0000)]
KVM: x86: Unify pr_fmt to use module name for all KVM modules
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code. In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems.
Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.
Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.
Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <
20221130230934.1014142-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:17 +0000 (23:09 +0000)]
KVM: x86: Use KBUILD_MODNAME to specify vendor module name
Use KBUILD_MODNAME to specify the vendor module name instead of manually
writing out the name to make it a bit more obvious that the name isn't
completely arbitrary. A future patch will also use KBUILD_MODNAME to
define pr_fmt, at which point using KBUILD_MODNAME for kvm_x86_ops.name
further reinforces the intended usage of kvm_x86_ops.name.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-34-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:16 +0000 (23:09 +0000)]
KVM: Drop kvm_arch_check_processor_compat() hook
Drop kvm_arch_check_processor_compat() and its support code now that all
architecture implementations are nops.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390
Acked-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <
20221130230934.1014142-33-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:15 +0000 (23:09 +0000)]
KVM: x86: Do CPU compatibility checks in x86 code
Move the CPU compatibility checks to pure x86 code, i.e. drop x86's use
of the common kvm_x86_check_cpu_compat() arch hook. x86 is the only
architecture that "needs" to do per-CPU compatibility checks, moving
the logic to x86 will allow dropping the common code, and will also
give x86 more control over when/how the compatibility checks are
performed, e.g. TDX will need to enable hardware (do VMXON) in order to
perform compatibility checks.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <
20221130230934.1014142-32-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:14 +0000 (23:09 +0000)]
KVM: VMX: Make VMCS configuration/capabilities structs read-only after init
Tag vmcs_config and vmx_capability structs as __init, the canonical
configuration is generated during hardware_setup() and must never be
modified after that point.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-31-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:13 +0000 (23:09 +0000)]
KVM: Drop kvm_arch_{init,exit}() hooks
Drop kvm_arch_init() and kvm_arch_exit() now that all implementations
are nops.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <
20221130230934.1014142-30-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:12 +0000 (23:09 +0000)]
KVM: s390: Mark __kvm_s390_init() and its descendants as __init
Tag __kvm_s390_init() and its unique helpers as __init. These functions
are only ever called during module_init(), but could not be tagged
accordingly while they were invoked from the common kvm_arch_init(),
which is not __init because of x86.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <
20221130230934.1014142-29-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:11 +0000 (23:09 +0000)]
KVM: s390: Do s390 specific init without bouncing through kvm_init()
Move the guts of kvm_arch_init() into a new helper, __kvm_s390_init(),
and invoke the new helper directly from kvm_s390_init() instead of
bouncing through kvm_init(). Invoking kvm_arch_init() is the very
first action performed by kvm_init(), i.e. this is a glorified nop.
Moving setup to __kvm_s390_init() will allow tagging more functions as
__init, and emptying kvm_arch_init() will allow dropping the hook
entirely once all architecture implementations are nops.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <
20221130230934.1014142-28-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:10 +0000 (23:09 +0000)]
KVM: PPC: Move processor compatibility check to module init
Move KVM PPC's compatibility checks to their respective module_init()
hooks, there's no need to wait until KVM's common compat check, nor is
there a need to perform the check on every CPU (provided by common KVM's
hook), as the compatibility checks operate on global data.
arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
arch/powerpc/kvm/book3s.c: return 0
arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
strcmp(cur_cpu_spec->cpu_name, "e5500")
strcmp(cur_cpu_spec->cpu_name, "e6500")
Cc: Fabiano Rosas <farosas@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Message-Id: <
20221130230934.1014142-27-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:09 +0000 (23:09 +0000)]
KVM: RISC-V: Tag init functions and data with __init, __ro_after_init
Now that KVM setup is handled directly in riscv_kvm_init(), tag functions
and data that are used/set only during init with __init/__ro_after_init.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <
20221130230934.1014142-26-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:08 +0000 (23:09 +0000)]
KVM: RISC-V: Do arch init directly in riscv_kvm_init()
Fold the guts of kvm_arch_init() into riscv_kvm_init() instead of
bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is a
glorified nop as invoking kvm_arch_init() is the very first action
performed by kvm_init().
Moving setup to riscv_kvm_init(), which is tagged __init, will allow
tagging more functions and data with __init and __ro_after_init. And
emptying kvm_arch_init() will allow dropping the hook entirely once all
architecture implementations are nops.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <
20221130230934.1014142-25-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:07 +0000 (23:09 +0000)]
KVM: MIPS: Register die notifier prior to kvm_init()
Call kvm_init() only after _all_ setup is complete, as kvm_init() exposes
/dev/kvm to userspace and thus allows userspace to create VMs (and call
other ioctls).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <
20221130230934.1014142-24-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:06 +0000 (23:09 +0000)]
KVM: MIPS: Setup VZ emulation? directly from kvm_mips_init()
Invoke kvm_mips_emulation_init() directly from kvm_mips_init() instead
of bouncing through kvm_init()=>kvm_arch_init(). Functionally, this is
a glorified nop as invoking kvm_arch_init() is the very first action
performed by kvm_init().
Emptying kvm_arch_init() will allow dropping the hook entirely once all
architecture implementations are nops.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <
20221130230934.1014142-23-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:05 +0000 (23:09 +0000)]
KVM: MIPS: Hardcode callbacks to hardware virtualization extensions
Now that KVM no longer supports trap-and-emulate (see commit
45c7e8af4a5e
"MIPS: Remove KVM_TE support"), hardcode the MIPS callbacks to the
virtualization callbacks.
Harcoding the callbacks eliminates the technically-unnecessary check on
non-NULL kvm_mips_callbacks in kvm_arch_init(). MIPS has never supported
multiple in-tree modules, i.e. barring an out-of-tree module, where
copying and renaming kvm.ko counts as "out-of-tree", KVM could never
encounter a non-NULL set of callbacks during module init.
The callback check is also subtly broken, as it is not thread safe,
i.e. if there were multiple modules, loading both concurrently would
create a race between checking and setting kvm_mips_callbacks.
Given that out-of-tree shenanigans are not the kernel's responsibility,
hardcode the callbacks to simplify the code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <
20221130230934.1014142-22-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:04 +0000 (23:09 +0000)]
KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init
Tag kvm_arm_init() and its unique helper as __init, and tag data that is
only ever modified under the kvm_arm_init() umbrella as read-only after
init.
Opportunistically name the boolean param in kvm_timer_hyp_init()'s
prototype to match its definition.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-21-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:03 +0000 (23:09 +0000)]
KVM: arm64: Do arm/arch initialization without bouncing through kvm_init()
Do arm/arch specific initialization directly in arm's module_init(), now
called kvm_arm_init(), instead of bouncing through kvm_init() to reach
kvm_arch_init(). Invoking kvm_arch_init() is the very first action
performed by kvm_init(), so from a initialization perspective this is a
glorified nop.
Avoiding kvm_arch_init() also fixes a mostly benign bug as kvm_arch_exit()
doesn't properly unwind if a later stage of kvm_init() fails. While the
soon-to-be-deleted comment about compiling as a module being unsupported
is correct, kvm_arch_exit() can still be called by kvm_init() if any step
after the call to kvm_arch_init() succeeds.
Add a FIXME to call out that pKVM initialization isn't unwound if
kvm_init() fails, which is a pre-existing problem inherited from
kvm_arch_exit().
Making kvm_arch_init() a nop will also allow dropping kvm_arch_init() and
kvm_arch_exit() entirely once all other architectures follow suit.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:02 +0000 (23:09 +0000)]
KVM: arm64: Unregister perf callbacks if hypervisor finalization fails
Undo everything done by init_subsystems() if a later initialization step
fails, i.e. unregister perf callbacks in addition to unregistering the
power management notifier.
Fixes:
bfa79a805454 ("KVM: arm64: Elevate hypervisor mappings creation at EL2")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-19-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:09:01 +0000 (23:09 +0000)]
KVM: arm64: Free hypervisor allocations if vector slot init fails
Teardown hypervisor mode if vector slot setup fails in order to avoid
leaking any allocations done by init_hyp_mode().
Fixes:
b881cdce77b4 ("KVM: arm64: Allocate hyp vectors statically")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Marc Zyngier [Wed, 30 Nov 2022 23:09:00 +0000 (23:09 +0000)]
KVM: arm64: Simplify the CPUHP logic
For a number of historical reasons, the KVM/arm64 hotplug setup is pretty
complicated, and we have two extra CPUHP notifiers for vGIC and timers.
It looks pretty pointless, and gets in the way of further changes.
So let's just expose some helpers that can be called from the core
CPUHP callback, and get rid of everything else.
This gives us the opportunity to drop a useless notifier entry,
as well as tidy-up the timer enable/disable, which was a bit odd.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:59 +0000 (23:08 +0000)]
KVM: x86: Serialize vendor module initialization (hardware setup)
Acquire a new mutex, vendor_module_lock, in kvm_x86_vendor_init() while
doing hardware setup to ensure that concurrent calls are fully serialized.
KVM rejects attempts to load vendor modules if a different module has
already been loaded, but doesn't handle the case where multiple vendor
modules are loaded at the same time, and module_init() doesn't run under
the global module_mutex.
Note, in practice, this is likely a benign bug as no platform exists that
supports both SVM and VMX, i.e. barring a weird VM setup, one of the
vendor modules is guaranteed to fail a support check before modifying
common KVM state.
Alternatively, KVM could perform an atomic CMPXCHG on .hardware_enable,
but that comes with its own ugliness as it would require setting
.hardware_enable before success is guaranteed, e.g. attempting to load
the "wrong" could result in spurious failure to load the "right" module.
Introduce a new mutex as using kvm_lock is extremely deadlock prone due
to kvm_lock being taken under cpus_write_lock(), and in the future, under
under cpus_read_lock(). Any operation that takes cpus_read_lock() while
holding kvm_lock would potentially deadlock, e.g. kvm_timer_init() takes
cpus_read_lock() to register a callback. In theory, KVM could avoid
such problematic paths, i.e. do less setup under kvm_lock, but avoiding
all calls to cpus_read_lock() is subtly difficult and thus fragile. E.g.
updating static calls also acquires cpus_read_lock().
Inverting the lock ordering, i.e. always taking kvm_lock outside
cpus_read_lock(), is not a viable option as kvm_lock is taken in various
callbacks that may be invoked under cpus_read_lock(), e.g. x86's
kvmclock_cpufreq_notifier().
The lockdep splat below is dependent on future patches to take
cpus_read_lock() in hardware_enable_all(), but as above, deadlock is
already is already possible.
======================================================
WARNING: possible circular locking dependency detected
6.0.0-smp--
7ec93244f194-init2 #27 Tainted: G O
------------------------------------------------------
stable/251833 is trying to acquire lock:
ffffffffc097ea28 (kvm_lock){+.+.}-{3:3}, at: hardware_enable_all+0x1f/0xc0 [kvm]
but task is already holding lock:
ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (cpu_hotplug_lock){++++}-{0:0}:
cpus_read_lock+0x2a/0xa0
__cpuhp_setup_state+0x2b/0x60
__kvm_x86_vendor_init+0x16a/0x1870 [kvm]
kvm_x86_vendor_init+0x23/0x40 [kvm]
0xffffffffc0a4d02b
do_one_initcall+0x110/0x200
do_init_module+0x4f/0x250
load_module+0x1730/0x18f0
__se_sys_finit_module+0xca/0x100
__x64_sys_finit_module+0x1d/0x20
do_syscall_64+0x3d/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #0 (kvm_lock){+.+.}-{3:3}:
__lock_acquire+0x16f4/0x30d0
lock_acquire+0xb2/0x190
__mutex_lock+0x98/0x6f0
mutex_lock_nested+0x1b/0x20
hardware_enable_all+0x1f/0xc0 [kvm]
kvm_dev_ioctl+0x45e/0x930 [kvm]
__se_sys_ioctl+0x77/0xc0
__x64_sys_ioctl+0x1d/0x20
do_syscall_64+0x3d/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(cpu_hotplug_lock);
lock(kvm_lock);
lock(cpu_hotplug_lock);
lock(kvm_lock);
*** DEADLOCK ***
1 lock held by stable/251833:
#0:
ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:58 +0000 (23:08 +0000)]
KVM: VMX: Do _all_ initialization before exposing /dev/kvm to userspace
Call kvm_init() only after _all_ setup is complete, as kvm_init() exposes
/dev/kvm to userspace and thus allows userspace to create VMs (and call
other ioctls). E.g. KVM will encounter a NULL pointer when attempting to
add a vCPU to the per-CPU loaded_vmcss_on_cpu list if userspace is able to
create a VM before vmx_init() configures said list.
BUG: kernel NULL pointer dereference, address:
0000000000000008
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP
CPU: 6 PID: 1143 Comm: stable Not tainted 6.0.0-rc7+ #988
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:vmx_vcpu_load_vmcs+0x68/0x230 [kvm_intel]
<TASK>
vmx_vcpu_load+0x16/0x60 [kvm_intel]
kvm_arch_vcpu_load+0x32/0x1f0 [kvm]
vcpu_load+0x2f/0x40 [kvm]
kvm_arch_vcpu_create+0x231/0x310 [kvm]
kvm_vm_ioctl+0x79f/0xe10 [kvm]
? handle_mm_fault+0xb1/0x220
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x2b/0x50
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f5a6b05743b
</TASK>
Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel(+) kvm irqbypass
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:57 +0000 (23:08 +0000)]
KVM: x86: Move guts of kvm_arch_init() to standalone helper
Move the guts of kvm_arch_init() to a new helper, kvm_x86_vendor_init(),
so that VMX can do _all_ arch and vendor initialization before calling
kvm_init(). Calling kvm_init() must be the _very_ last step during init,
as kvm_init() exposes /dev/kvm to userspace, i.e. allows creating VMs.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:56 +0000 (23:08 +0000)]
KVM: VMX: Move Hyper-V eVMCS initialization to helper
Move Hyper-V's eVMCS initialization to a dedicated helper to clean up
vmx_init(), and add a comment to call out that the Hyper-V init code
doesn't need to be unwound if vmx_init() ultimately fails.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20221130230934.1014142-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:55 +0000 (23:08 +0000)]
KVM: VMX: Don't bother disabling eVMCS static key on module exit
Don't disable the eVMCS static key on module exit, kvm_intel.ko owns the
key so there can't possibly be users after the kvm_intel.ko is unloaded,
at least not without much bigger issues.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:54 +0000 (23:08 +0000)]
KVM: VMX: Reset eVMCS controls in VP assist page during hardware disabling
Reset the eVMCS controls in the per-CPU VP assist page during hardware
disabling instead of waiting until kvm-intel's module exit. The controls
are activated if and only if KVM creates a VM, i.e. don't need to be
reset if hardware is never enabled.
Doing the reset during hardware disabling will naturally fix a potential
NULL pointer deref bug once KVM disables CPU hotplug while enabling and
disabling hardware (which is necessary to fix a variety of bugs). If the
kernel is running as the root partition, the VP assist page is unmapped
during CPU hot unplug, and so KVM's clearing of the eVMCS controls needs
to occur with CPU hot(un)plug disabled, otherwise KVM could attempt to
write to a CPU's VP assist page after it's unmapped.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20221130230934.1014142-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:53 +0000 (23:08 +0000)]
KVM: Drop arch hardware (un)setup hooks
Drop kvm_arch_hardware_setup() and kvm_arch_hardware_unsetup() now that
all implementations are nops.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <
20221130230934.1014142-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:52 +0000 (23:08 +0000)]
KVM: x86: Move hardware setup/unsetup to init/exit
Now that kvm_arch_hardware_setup() is called immediately after
kvm_arch_init(), fold the guts of kvm_arch_hardware_(un)setup() into
kvm_arch_{init,exit}() as a step towards dropping one of the hooks.
To avoid having to unwind various setup, e.g registration of several
notifiers, slot in the vendor hardware setup before the registration of
said notifiers and callbacks. Introducing a functional change while
moving code is less than ideal, but the alternative is adding a pile of
unwinding code, which is much more error prone, e.g. several attempts to
move the setup code verbatim all introduced bugs.
Add a comment to document that kvm_ops_update() is effectively the point
of no return, e.g. it sets the kvm_x86_ops.hardware_enable canary and so
needs to be unwound.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:51 +0000 (23:08 +0000)]
KVM: x86: Do timer initialization after XCR0 configuration
Move kvm_arch_init()'s call to kvm_timer_init() down a few lines below
the XCR0 configuration code. A future patch will move hardware setup
into kvm_arch_init() and slot in vendor hardware setup before the call
to kvm_timer_init() so that timer initialization (among other stuff)
doesn't need to be unwound if vendor setup fails. XCR0 setup on the
other hand needs to happen before vendor hardware setup.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:50 +0000 (23:08 +0000)]
KVM: s390: Move hardware setup/unsetup to init/exit
Now that kvm_arch_hardware_setup() is called immediately after
kvm_arch_init(), fold the guts of kvm_arch_hardware_(un)setup() into
kvm_arch_{init,exit}() as a step towards dropping one of the hooks.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <
20221130230934.1014142-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:49 +0000 (23:08 +0000)]
KVM: s390: Unwind kvm_arch_init() piece-by-piece() if a step fails
In preparation for folding kvm_arch_hardware_setup() into kvm_arch_init(),
unwind initialization one step at a time instead of simply calling
kvm_arch_exit(). Using kvm_arch_exit() regardless of which initialization
step failed relies on all affected state playing nice with being undone
even if said state wasn't first setup. That holds true for state that is
currently configured by kvm_arch_init(), but not for state that's handled
by kvm_arch_hardware_setup(), e.g. calling gmap_unregister_pte_notifier()
without first registering a notifier would result in list corruption due
to attempting to delete an entry that was never added to the list.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <
20221130230934.1014142-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:48 +0000 (23:08 +0000)]
KVM: Teardown VFIO ops earlier in kvm_exit()
Move the call to kvm_vfio_ops_exit() further up kvm_exit() to try and
bring some amount of symmetry to the setup order in kvm_init(), and more
importantly so that the arch hooks are invoked dead last by kvm_exit().
This will allow arch code to move away from the arch hooks without any
change in ordering between arch code and common code in kvm_exit().
That kvm_vfio_ops_exit() is called last appears to be 100% arbitrary. It
was bolted on after the fact by commit
571ee1b68598 ("kvm: vfio: fix
unregister kvm_device_ops of vfio"). The nullified kvm_device_ops_table
is also local to kvm_main.c and is used only when there are active VMs,
so unless arch code is doing something truly bizarre, nullifying the
table earlier in kvm_exit() is little more than a nop.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <
20221130230934.1014142-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:47 +0000 (23:08 +0000)]
KVM: Allocate cpus_hardware_enabled after arch hardware setup
Allocate cpus_hardware_enabled after arch hardware setup so that arch
"init" and "hardware setup" are called back-to-back and thus can be
combined in a future patch. cpus_hardware_enabled is never used before
kvm_create_vm(), i.e. doesn't have a dependency with hardware setup and
only needs to be allocated before /dev/kvm is exposed to userspace.
Free the object before the arch hooks are invoked to maintain symmetry,
and so that arch code can move away from the hooks without having to
worry about ordering changes.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <
20221130230934.1014142-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:46 +0000 (23:08 +0000)]
KVM: Initialize IRQ FD after arch hardware setup
Move initialization of KVM's IRQ FD workqueue below arch hardware setup
as a step towards consolidating arch "init" and "hardware setup", and
eventually towards dropping the hooks entirely. There is no dependency
on the workqueue being created before hardware setup, the workqueue is
used only when destroying VMs, i.e. only needs to be created before
/dev/kvm is exposed to userspace.
Move the destruction of the workqueue before the arch hooks to maintain
symmetry, and so that arch code can move away from the hooks without
having to worry about ordering changes.
Reword the comment about kvm_irqfd_init() needing to come after
kvm_arch_init() to call out that kvm_arch_init() must come before common
KVM does _anything_, as x86 very subtly relies on that behavior to deal
with multiple calls to kvm_init(), e.g. if userspace attempts to load
kvm_amd.ko and kvm_intel.ko. Tag the code with a FIXME, as x86's subtle
requirement is gross, and invoking an arch callback as the very first
action in a helper that is called only from arch code is silly.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 30 Nov 2022 23:08:45 +0000 (23:08 +0000)]
KVM: Register /dev/kvm as the _very_ last thing during initialization
Register /dev/kvm, i.e. expose KVM to userspace, only after all other
setup has completed. Once /dev/kvm is exposed, userspace can start
invoking KVM ioctls, creating VMs, etc... If userspace creates a VM
before KVM is done with its configuration, bad things may happen, e.g.
KVM will fail to properly migrate vCPU state if a VM is created before
KVM has registered preemption notifiers.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221130230934.1014142-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 29 Dec 2022 20:35:48 +0000 (15:35 -0500)]
Merge branch 'kvm-late-6.1' into HEAD
x86:
* Change tdp_mmu to a read-only parameter
* Separate TDP and shadow MMU page fault paths
* Enable Hyper-V invariant TSC control
selftests:
* Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 13 Oct 2022 09:58:49 +0000 (11:58 +0200)]
KVM: selftests: Test Hyper-V invariant TSC control
Add a test for the newly introduced Hyper-V invariant TSC control feature:
- HV_X64_MSR_TSC_INVARIANT_CONTROL is not available without
HV_ACCESS_TSC_INVARIANT CPUID bit set and available with it.
- BIT(0) of HV_X64_MSR_TSC_INVARIANT_CONTROL controls the filtering of
architectural invariant TSC (CPUID.80000007H:EDX[8]) bit.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221013095849.705943-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 13 Oct 2022 09:58:48 +0000 (11:58 +0200)]
KVM: selftests: Test that values written to Hyper-V MSRs are preserved
Enhance 'hyperv_features' selftest by adding a check that KVM
preserves values written to PV MSRs. Two MSRs are, however, 'special':
- HV_X64_MSR_EOI as it is a 'write-only' MSR,
- HV_X64_MSR_RESET as it always reads as '0'.
The later doesn't require any special handling right now because the
test never writes anything besides '0' to the MSR, leave a TODO node
about the fact.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221013095849.705943-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 13 Oct 2022 09:58:47 +0000 (11:58 +0200)]
KVM: selftests: Convert hyperv_features test to using KVM_X86_CPU_FEATURE()
hyperv_features test needs to set certain CPUID bits in Hyper-V feature
leaves but instead of open coding this, common KVM_X86_CPU_FEATURE()
infrastructure can be used.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221013095849.705943-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 13 Oct 2022 09:58:46 +0000 (11:58 +0200)]
KVM: selftests: Rename 'msr->available' to 'msr->fault_exepected' in hyperv_features test
It may not be clear what 'msr->available' means. The test actually
checks that accessing the particular MSR doesn't cause #GP, rename
the variable accordingly.
While on it, use 'true'/'false' instead of '1'/'0' for 'write'/
'fault_expected' as these are boolean.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221013095849.705943-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 13 Oct 2022 09:58:45 +0000 (11:58 +0200)]
KVM: x86: Hyper-V invariant TSC control
Normally, genuine Hyper-V doesn't expose architectural invariant TSC
(CPUID.80000007H:EDX[8]) to its guests by default. A special PV MSR
(HV_X64_MSR_TSC_INVARIANT_CONTROL, 0x40000118) and corresponding CPUID
feature bit (CPUID.0x40000003.EAX[15]) were introduced. When bit 0 of the
PV MSR is set, invariant TSC bit starts to show up in CPUID. When the
feature is exposed to Hyper-V guests, reenlightenment becomes unneeded.
Add the feature to KVM. Keep CPUID output intact when the feature
wasn't exposed to L1 and implement the required logic for hiding
invariant TSC when the feature was exposed and invariant TSC control
MSR wasn't written to. Copy genuine Hyper-V behavior and forbid to
disable the feature once it was enabled.
For the reference, for linux guests, support for the feature was added
in commit
dce7cd62754b ("x86/hyperv: Allow guests to enable InvariantTSC").
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221013095849.705943-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 13 Oct 2022 09:58:44 +0000 (11:58 +0200)]
KVM: x86: Add a KVM-only leaf for CPUID_8000_0007_EDX
CPUID_8000_0007_EDX may come handy when X86_FEATURE_CONSTANT_TSC
needs to be checked.
No functional change intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221013095849.705943-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Thu, 13 Oct 2022 09:58:43 +0000 (11:58 +0200)]
x86/hyperv: Add HV_EXPOSE_INVARIANT_TSC define
Avoid open coding BIT(0) of HV_X64_MSR_TSC_INVARIANT_CONTROL by adding
a dedicated define. While there's only one user at this moment, the
upcoming KVM implementation of Hyper-V Invariant TSC feature will need
to use it as well.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221013095849.705943-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 12 Oct 2022 18:16:58 +0000 (18:16 +0000)]
KVM: x86/mmu: Pivot on "TDP MMU enabled" when handling direct page faults
When handling direct page faults, pivot on the TDP MMU being globally
enabled instead of checking if the target MMU is a TDP MMU. Now that the
TDP MMU is all-or-nothing, if the TDP MMU is enabled, KVM will reach
direct_page_fault() if and only if the MMU is a TDP MMU. When TDP is
enabled (obviously required for the TDP MMU), only non-nested TDP page
faults reach direct_page_fault(), i.e. nonpaging MMUs are impossible, as
NPT requires paging to be enabled and EPT faults use ept_page_fault().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221012181702.3663607-8-seanjc@google.com>
[Use tdp_mmu_enabled variable. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 12 Oct 2022 18:16:59 +0000 (18:16 +0000)]
KVM: x86/mmu: Pivot on "TDP MMU enabled" to check if active MMU is TDP MMU
Simplify and optimize the logic for detecting if the current/active MMU
is a TDP MMU. If the TDP MMU is globally enabled, then the active MMU is
a TDP MMU if it is direct. When TDP is enabled, so called nonpaging MMUs
are never used as the only form of shadow paging KVM uses is for nested
TDP, and the active MMU can't be direct in that case.
Rename the helper and take the vCPU instead of an arbitrary MMU, as
nonpaging MMUs can show up in the walk_mmu if L1 is using nested TDP and
L2 has paging disabled. Taking the vCPU has the added bonus of cleaning
up the callers, all of which check the current MMU but wrap code that
consumes the vCPU.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221012181702.3663607-9-seanjc@google.com>
[Use tdp_mmu_enabled variable. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 12 Oct 2022 18:17:00 +0000 (18:17 +0000)]
KVM: x86/mmu: Replace open coded usage of tdp_mmu_page with is_tdp_mmu_page()
Use is_tdp_mmu_page() instead of querying sp->tdp_mmu_page directly so
that all users benefit if KVM ever finds a way to optimize the logic.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221012181702.3663607-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:46 +0000 (10:35 -0700)]
KVM: x86/mmu: Rename __direct_map() to direct_map()
Rename __direct_map() to direct_map() since the leading underscores are
unnecessary. This also makes the page fault handler names more
consistent: kvm_tdp_mmu_page_fault() calls kvm_tdp_mmu_map() and
direct_page_fault() calls direct_map().
Opportunistically make some trivial cleanups to comments that had to be
modified anyway since they mentioned __direct_map(). Specifically, use
"()" when referring to functions, and include kvm_tdp_mmu_map() among
the various callers of disallowed_hugepage_adjust().
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:45 +0000 (10:35 -0700)]
KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults
Stop calling make_mmu_pages_available() when handling TDP MMU faults.
The TDP MMU does not participate in the "available MMU pages" tracking
and limiting so calling this function is unnecessary work when handling
TDP MMU faults.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:44 +0000 (10:35 -0700)]
KVM: x86/mmu: Split out TDP MMU page fault handling
Split out the page fault handling for the TDP MMU to a separate
function. This creates some duplicate code, but makes the TDP MMU fault
handler simpler to read by eliminating branches and will enable future
cleanups by allowing the TDP MMU and non-TDP MMU fault paths to diverge.
Only compile in the TDP MMU fault handler for 64-bit builds since
kvm_tdp_mmu_map() does not exist in 32-bit builds.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-9-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:43 +0000 (10:35 -0700)]
KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs
Move the initialization of fault.{gfn,slot} earlier in the page fault
handling code for fully direct MMUs. This will enable a future commit to
split out TDP MMU page fault handling without needing to duplicate the
initialization of these 2 fields.
Opportunistically take advantage of the fact that fault.gfn is
initialized in kvm_tdp_page_fault() rather than recomputing it from
fault->addr.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-8-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:42 +0000 (10:35 -0700)]
KVM: x86/mmu: Handle no-slot faults in kvm_faultin_pfn()
Handle faults on GFNs that do not have a backing memslot in
kvm_faultin_pfn() and drop handle_abnormal_pfn(). This eliminates
duplicate code in the various page fault handlers.
Opportunistically tweak the comment about handling gfn > host.MAXPHYADDR
to reflect that the effect of returning RET_PF_EMULATE at that point is
to avoid creating an MMIO SPTE for such GFNs.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:41 +0000 (10:35 -0700)]
KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling
Pass the kvm_page_fault struct down to kvm_handle_error_pfn() to avoid a
memslot lookup when handling KVM_PFN_ERR_HWPOISON. Opportunistically
move the gfn_to_hva_memslot() call and @current down into
kvm_send_hwpoison_signal() to cut down on line lengths.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:40 +0000 (10:35 -0700)]
KVM: x86/mmu: Handle error PFNs in kvm_faultin_pfn()
Handle error PFNs in kvm_faultin_pfn() rather than relying on the caller
to invoke handle_abnormal_pfn() after kvm_faultin_pfn().
Opportunistically rename kvm_handle_bad_page() to kvm_handle_error_pfn()
to make it more consistent with is_error_pfn().
This commit moves KVM closer to being able to drop
handle_abnormal_pfn(), which will reduce the amount of duplicate code in
the various page fault handlers.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:39 +0000 (10:35 -0700)]
KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()
Grab mmu_invalidate_seq in kvm_faultin_pfn() and stash it in struct
kvm_page_fault. The eliminates duplicate code and reduces the amount of
parameters needed for is_page_fault_stale().
Preemptively split out __kvm_faultin_pfn() to a separate function for
use in subsequent commits.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:38 +0000 (10:35 -0700)]
KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled
Move kvm_mmu_{init,uninit}_tdp_mmu() behind tdp_mmu_enabled. This makes
these functions consistent with the rest of the calls into the TDP MMU
from mmu.c, and which is now possible since tdp_mmu_enabled is only
modified when the x86 vendor module is loaded. i.e. It will never change
during the lifetime of a VM.
This change also enabled removing the stub definitions for 32-bit KVM,
as the compiler will just optimize the calls out like it does for all
the other TDP MMU functions.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Wed, 21 Sep 2022 17:35:37 +0000 (10:35 -0700)]
KVM: x86/mmu: Change tdp_mmu to a read-only parameter
Change tdp_mmu to a read-only parameter and drop the per-vm
tdp_mmu_enabled. For 32-bit KVM, make tdp_mmu_enabled a macro that is
always false so that the compiler can continue omitting cals to the TDP
MMU.
The TDP MMU was introduced in 5.10 and has been enabled by default since
5.15. At this point there are no known functionality gaps between the
TDP MMU and the shadow MMU, and the TDP MMU uses less memory and scales
better with the number of vCPUs. In other words, there is no good reason
to disable the TDP MMU on a live system.
Purposely do not drop tdp_mmu=N support (i.e. do not force 64-bit KVM to
always use the TDP MMU) since tdp_mmu=N is still used to get test
coverage of KVM's shadow MMU TDP support, which is used in 32-bit KVM.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <
20220921173546.2674386-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Thomas Huth [Tue, 4 Oct 2022 09:31:31 +0000 (11:31 +0200)]
KVM: selftests: x86: Use TAP interface in the tsc_msrs_test
Let's add some output here so that the user has some feedback
about what is being run.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <
20221004093131.40392-4-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Thomas Huth [Tue, 4 Oct 2022 09:31:29 +0000 (11:31 +0200)]
KVM: selftests: Use TAP interface in the kvm_binary_stats_test
The kvm_binary_stats_test test currently does not have any output (unless
one of the TEST_ASSERT statement fails), so it's hard to say for a user
how far it did proceed already. Thus let's make this a little bit more
user-friendly and include some TAP output via the kselftest.h interface.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Message-Id: <
20221004093131.40392-2-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Lai Jiangshan [Mon, 12 Dec 2022 09:01:06 +0000 (17:01 +0800)]
kvm: x86/mmu: Warn on linking when sp->unsync_children
Since the commit
65855ed8b034 ("KVM: X86: Synchronize the shadow
pagetable before link it"), no sp would be linked with
sp->unsync_children = 1.
So make it WARN if it is the case.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <
20221212090106.378206-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Fri, 4 Nov 2022 14:47:08 +0000 (15:47 +0100)]
KVM: VMX: Resurrect vmcs_conf sanitization for KVM-on-Hyper-V
Commit
9bcb90650e31 ("KVM: VMX: Get rid of eVMCS specific VMX controls
sanitization") dropped 'vmcs_conf' sanitization for KVM-on-Hyper-V because
there's no known Hyper-V version which would expose a feature
unsupported in eVMCS in VMX feature MSRs. This works well for all
currently existing Hyper-V version, however, future Hyper-V versions
may add features which are supported by KVM and are currently missing
in eVMCSv1 definition (e.g. APIC virtualization, PML,...). When this
happens, existing KVMs will get broken. With the inverted 'unsupported
by eVMCSv1' checks, we can resurrect vmcs_conf sanitization and make
KVM future proof.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20221104144708.435865-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Fri, 4 Nov 2022 14:47:07 +0000 (15:47 +0100)]
KVM: nVMX: Prepare to sanitize tertiary execution controls with eVMCS
In preparation to restoring vmcs_conf sanitization for KVM-on-Hyper-V,
(and for completeness) add tertiary VM-execution controls to
'evmcs_supported_ctrls'.
No functional change intended as KVM doesn't yet expose
MSR_IA32_VMX_PROCBASED_CTLS3 to its guests.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20221104144708.435865-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Fri, 4 Nov 2022 14:47:06 +0000 (15:47 +0100)]
KVM: nVMX: Invert 'unsupported by eVMCSv1' check
When a new feature gets implemented in KVM, EVMCS1_UNSUPPORTED_* defines
need to be adjusted to avoid the situation when the feature is exposed
to the guest but there's no corresponding eVMCS field[s] for it. This
is not obvious and fragile. Invert 'unsupported by eVMCSv1' check and
make it 'supported by eVMCSv1' instead, this way it's much harder to
make a mistake. New features will get added to EVMCS1_SUPPORTED_*
defines when the corresponding fields are added to eVMCS definition.
No functional change intended. EVMCS1_SUPPORTED_* defines are composed
by taking KVM_{REQUIRED,OPTIONAL}_VMX_ defines and filtering out what
was previously known as EVMCS1_UNSUPPORTED_*.
From all the controls, SECONDARY_EXEC_TSC_SCALING requires special
handling as it's actually present in eVMCSv1 definition but is not
currently supported for Hyper-V-on-KVM, just for KVM-on-Hyper-V. As
evmcs_supported_ctrls will be used for both scenarios, just add it
there instead of EVMCS1_SUPPORTED_2NDEXEC.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20221104144708.435865-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vitaly Kuznetsov [Fri, 4 Nov 2022 14:47:05 +0000 (15:47 +0100)]
KVM: nVMX: Sanitize primary processor-based VM-execution controls with eVMCS too
The only unsupported primary processor-based VM-execution control at the
moment is CPU_BASED_ACTIVATE_TERTIARY_CONTROLS and KVM doesn't expose it
in nested VMX feature MSRs anyway (see nested_vmx_setup_ctls_msrs())
but in preparation to inverting "unsupported with eVMCS" checks (and
for completeness) it's better to sanitize MSR_IA32_VMX_PROCBASED_CTLS/
MSR_IA32_VMX_TRUE_PROCBASED_CTLS too.
No functional change intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <
20221104144708.435865-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 30 Nov 2022 18:11:47 +0000 (13:11 -0500)]
KVM: selftests: restore special vmmcall code layout needed by the harness
Commit
8fda37cf3d41 ("KVM: selftests: Stuff RAX/RCX with 'safe' values
in vmmcall()/vmcall()", 2022-11-21) broke the svm_nested_soft_inject_test
because it placed a "pop rbp" instruction after vmmcall. While this is
correct and mimics what is done in the VMX case, this particular test
expects a ud2 instruction right after the vmmcall, so that it can skip
over it in the L1 part of the test.
Inline a suitably-modified version of vmmcall() to restore the
functionality of the test.
Fixes:
8fda37cf3d41 ("KVM: selftests: Stuff RAX/RCX with 'safe' values in vmmcall()/vmcall()"
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <
20221130181147.9911-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 28 Dec 2022 11:00:22 +0000 (06:00 -0500)]
Documentation: kvm: clarify SRCU locking order
Currently only the locking order of SRCU vs kvm->slots_arch_lock
and kvm->slots_lock is documented. Extend this to kvm->lock
since Xen emulation got it terribly wrong.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 28 Dec 2022 10:33:41 +0000 (05:33 -0500)]
KVM: x86: fix deadlock for KVM_XEN_EVTCHN_RESET
While KVM_XEN_EVTCHN_RESET is usually called with no vCPUs running,
if that happened it could cause a deadlock. This is due to
kvm_xen_eventfd_reset() doing a synchronize_srcu() inside
a kvm->lock critical section.
To avoid this, first collect all the evtchnfd objects in an
array and free all of them once the kvm->lock critical section
is over and th SRCU grace period has expired.
Reported-by: Michal Luczaj <mhal@rbox.co>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 26 Dec 2022 12:03:20 +0000 (12:03 +0000)]
KVM: x86/xen: Documentation updates and clarifications
Most notably, the KVM_XEN_EVTCHN_RESET feature had escaped documentation
entirely. Along with how to turn most stuff off on SHUTDOWN_soft_reset.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20221226120320.1125390-6-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 26 Dec 2022 12:03:19 +0000 (12:03 +0000)]
KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapi
These are (uint64_t)-1 magic values are a userspace ABI, allowing the
shared info pages and other enlightenments to be disabled. This isn't
a Xen ABI because Xen doesn't let the guest turn these off except with
the full SHUTDOWN_soft_reset mechanism. Under KVM, the userspace VMM is
expected to handle soft reset, and tear down the kernel parts of the
enlightenments accordingly.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20221226120320.1125390-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Michal Luczaj [Mon, 26 Dec 2022 12:03:18 +0000 (12:03 +0000)]
KVM: x86/xen: Simplify eventfd IOCTLs
Port number is validated in kvm_xen_setattr_evtchn().
Remove superfluous checks in kvm_xen_eventfd_assign() and
kvm_xen_eventfd_update().
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <
20221222203021.1944101-3-mhal@rbox.co>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20221226120320.1125390-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 26 Dec 2022 12:03:17 +0000 (12:03 +0000)]
KVM: x86/xen: Fix SRCU/RCU usage in readers of evtchn_ports
The evtchnfd structure itself must be protected by either kvm->lock or
SRCU. Use the former in kvm_xen_eventfd_update(), since the lock is
being taken anyway; kvm_xen_hcall_evtchn_send() instead is a reader and
does not need kvm->lock, and is called in SRCU critical section from the
kvm_x86_handle_exit function.
It is also important to use rcu_read_{lock,unlock}() in
kvm_xen_hcall_evtchn_send(), because idr_remove() will *not*
use synchronize_srcu() to wait for readers to complete.
Remove a superfluous if (kvm) check before calling synchronize_srcu()
in kvm_xen_eventfd_deassign() where kvm has been dereferenced already.
Co-developed-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20221226120320.1125390-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Woodhouse [Mon, 26 Dec 2022 12:03:16 +0000 (12:03 +0000)]
KVM: x86/xen: Use kvm_read_guest_virt() instead of open-coding it badly
In particular, we shouldn't assume that being contiguous in guest virtual
address space means being contiguous in guest *physical* address space.
In dropping the manual calls to kvm_mmu_gva_to_gpa_system(), also drop
the srcu_read_lock() that was around them. All call sites are reached
from kvm_xen_hypercall() which is called from the handle_exit function
with the read lock already held.
536395260 ("KVM: x86/xen: handle PV timers oneshot mode")
1a65105a5 ("KVM: x86/xen: handle PV spinlocks slowpath")
Fixes:
2fd6df2f2 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20221226120320.1125390-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Michal Luczaj [Mon, 26 Dec 2022 12:03:15 +0000 (12:03 +0000)]
KVM: x86/xen: Fix memory leak in kvm_xen_write_hypercall_page()
Release page irrespectively of kvm_vcpu_write_guest() return value.
Suggested-by: Paul Durrant <paul@xen.org>
Fixes:
23200b7a30de ("KVM: x86/xen: intercept xen hypercalls if enabled")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <
20221220151454.712165-1-mhal@rbox.co>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <
20221226120320.1125390-1-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Wed, 7 Dec 2022 00:36:37 +0000 (00:36 +0000)]
KVM: Delete extra block of "};" in the KVM API documentation
Delete an extra block of code/documentation that snuck in when KVM's
documentation was converted to ReST format.
Fixes:
106ee47dc633 ("docs: kvm: Convert api.txt to ReST format")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221207003637.2041211-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Lai Jiangshan [Wed, 7 Dec 2022 12:05:05 +0000 (20:05 +0800)]
kvm: x86/mmu: Remove duplicated "be split" in spte.h
"be split be split" -> "be split"
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <
20221207120505.9175-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Lai Jiangshan [Wed, 7 Dec 2022 12:06:16 +0000 (20:06 +0800)]
kvm: Remove the unused macro KVM_MMU_READ_{,UN}LOCK()
No code is using KVM_MMU_READ_LOCK() or KVM_MMU_READ_UNLOCK(). They
used to be in virt/kvm/pfncache.c:
KVM_MMU_READ_LOCK(kvm);
retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
KVM_MMU_READ_UNLOCK(kvm);
However, since
58cd407ca4c6 ("KVM: Fix multiple races in gfn=>pfn cache
refresh", 2022-05-25) the code is only relying on the MMU notifier's
invalidation count and sequence number.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <
20221207120617.9409-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Lukas Bulwahn [Mon, 5 Dec 2022 08:20:44 +0000 (09:20 +0100)]
MAINTAINERS: adjust entry after renaming the vmx hyperv files
Commit
a789aeba4196 ("KVM: VMX: Rename "vmx/evmcs.{ch}" to
"vmx/hyperv.{ch}"") renames the VMX specific Hyper-V files, but does not
adjust the entry in MAINTAINERS.
Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.
Repair this file reference in KVM X86 HYPER-V (KVM/hyper-v).
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Fixes:
a789aeba4196 ("KVM: VMX: Rename "vmx/evmcs.{ch}" to "vmx/hyperv.{ch}"")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221205082044.10141-1-lukas.bulwahn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver Upton [Fri, 9 Dec 2022 01:53:02 +0000 (01:53 +0000)]
KVM: selftests: Mark correct page as mapped in virt_map()
The loop marks vaddr as mapped after incrementing it by page size,
thereby marking the *next* page as mapped. Set the bit in vpages_mapped
first instead.
Fixes:
56fc7732031d ("KVM: selftests: Fill in vm->vpages_mapped bitmap in virt_map() too")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <
20221209015307.1781352-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver Upton [Fri, 9 Dec 2022 01:53:04 +0000 (01:53 +0000)]
KVM: arm64: selftests: Don't identity map the ucall MMIO hole
Currently the ucall MMIO hole is placed immediately after slot0, which
is a relatively safe address in the PA space. However, it is possible
that the same address has already been used for something else (like the
guest program image) in the VA space. At least in my own testing,
building the vgic_irq test with clang leads to the MMIO hole appearing
underneath gicv3_ops.
Stop identity mapping the MMIO hole and instead find an unused VA to map
to it. Yet another subtle detail of the KVM selftests library is that
virt_pg_map() does not update vm->vpages_mapped. Switch over to
virt_map() instead to guarantee that the chosen VA isn't to something
else.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <
20221209015307.1781352-6-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 12 Dec 2022 10:36:53 +0000 (05:36 -0500)]
KVM: selftests: document the default implementation of vm_vaddr_populate_bitmap
Explain the meaning of the bit manipulations of vm_vaddr_populate_bitmap.
These correspond to the "canonical addresses" of x86 and other
architectures, but that is not obvious.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Fri, 9 Dec 2022 20:55:44 +0000 (12:55 -0800)]
KVM: selftests: Use magic value to signal ucall_alloc() failure
Use a magic value to signal a ucall_alloc() failure instead of simply
doing GUEST_ASSERT(). GUEST_ASSERT() relies on ucall_alloc() and so a
failure puts the guest into an infinite loop.
Use -1 as the magic value, as a real ucall struct should never wrap.
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:50 +0000 (00:16 +0000)]
KVM: selftests: Disable "gnu-variable-sized-type-not-at-end" warning
Disable gnu-variable-sized-type-not-at-end so that tests and libraries
can create overlays of variable sized arrays at the end of structs when
using a fixed number of entries, e.g. to get/set a single MSR.
It's possible to fudge around the warning, e.g. by defining a custom
struct that hardcodes the number of entries, but that is a burden for
both developers and readers of the code.
lib/x86_64/processor.c:664:19: warning: field 'header' with variable sized type 'struct kvm_msrs'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct kvm_msrs header;
^
lib/x86_64/processor.c:772:19: warning: field 'header' with variable sized type 'struct kvm_msrs'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct kvm_msrs header;
^
lib/x86_64/processor.c:787:19: warning: field 'header' with variable sized type 'struct kvm_msrs'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct kvm_msrs header;
^
3 warnings generated.
x86_64/hyperv_tlb_flush.c:54:18: warning: field 'hv_vp_set' with variable sized type 'struct hv_vpset'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct hv_vpset hv_vp_set;
^
1 warning generated.
x86_64/xen_shinfo_test.c:137:25: warning: field 'info' with variable sized type 'struct kvm_irq_routing'
not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct kvm_irq_routing info;
^
1 warning generated.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221213001653.3852042-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:49 +0000 (00:16 +0000)]
KVM: selftests: Include lib.mk before consuming $(CC)
Include lib.mk before consuming $(CC) and document that lib.mk overwrites
$(CC) unless make was invoked with -e or $(CC) was specified after make
(which makes the environment override the Makefile). Including lib.mk
after using it for probing, e.g. for -no-pie, can lead to weirdness.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221213001653.3852042-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:48 +0000 (00:16 +0000)]
KVM: selftests: Explicitly disable builtins for mem*() overrides
Explicitly disable the compiler's builtin memcmp(), memcpy(), and
memset(). Because only lib/string_override.c is built with -ffreestanding,
the compiler reserves the right to do what it wants and can try to link the
non-freestanding code to its own crud.
/usr/bin/x86_64-linux-gnu-ld: /lib/x86_64-linux-gnu/libc.a(memcmp.o): in function `memcmp_ifunc':
(.text+0x0): multiple definition of `memcmp'; tools/testing/selftests/kvm/lib/string_override.o:
tools/testing/selftests/kvm/lib/string_override.c:15: first defined here
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes:
6b6f71484bf4 ("KVM: selftests: Implement memcmp(), memcpy(), and memset() for guest use")
Reported-by: Aaron Lewis <aaronlewis@google.com>
Reported-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221213001653.3852042-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:47 +0000 (00:16 +0000)]
KVM: selftests: Probe -no-pie with actual CFLAGS used to compile
Probe -no-pie with the actual set of CFLAGS used to compile the tests,
clang whines about -no-pie being unused if the tests are compiled with
-static.
clang: warning: argument unused during compilation: '-no-pie'
[-Wunused-command-line-argument]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221213001653.3852042-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:46 +0000 (00:16 +0000)]
KVM: selftests: Use proper function prototypes in probing code
Make the main() functions in the probing code proper prototypes so that
compiling the probing code with more strict flags won't generate false
negatives.
<stdin>:1:5: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221213001653.3852042-8-seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:45 +0000 (00:16 +0000)]
KVM: selftests: Rename UNAME_M to ARCH_DIR, fill explicitly for x86
Rename UNAME_M to ARCH_DIR and explicitly set it directly for x86. At
this point, the name of the arch directory really doesn't have anything
to do with `uname -m`, and UNAME_M is unnecessarily confusing given that
its purpose is purely to identify the arch specific directory.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221213001653.3852042-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:44 +0000 (00:16 +0000)]
KVM: selftests: Fix a typo in x86-64's kvm_get_cpu_address_width()
Fix a == vs. = typo in kvm_get_cpu_address_width() that results in
@pa_bits being left unset if the CPU doesn't support enumerating its
MAX_PHY_ADDR. Flagged by clang's unusued-value warning.
lib/x86_64/processor.c:1034:51: warning: expression result unused [-Wunused-value]
*pa_bits == kvm_cpu_has(X86_FEATURE_PAE) ? 36 : 32;
Fixes:
3bd396353d18 ("KVM: selftests: Add X86_FEATURE_PAE and use it calc "fallback" MAXPHYADDR")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <
20221213001653.3852042-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:43 +0000 (00:16 +0000)]
KVM: selftests: Use pattern matching in .gitignore
Use pattern matching to exclude everything except .c, .h, .S, and .sh
files from Git. Manually adding every test target has an absurd
maintenance cost, is comically error prone, and leads to bikeshedding
over whether or not the targets should be listed in alphabetical order.
Deliberately do not include the one-off assets, e.g. config, settings,
.gitignore itself, etc as Git doesn't ignore files that are already in
the repository. Adding the one-off assets won't prevent mistakes where
developers forget to --force add files that don't match the "allowed".
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221213001653.3852042-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:42 +0000 (00:16 +0000)]
KVM: selftests: Fix divide-by-zero bug in memslot_perf_test
Check that the number of pages per slot is non-zero in get_max_slots()
prior to computing the remaining number of pages. clang generates code
that uses an actual DIV for calculating the remaining, which causes a #DE
if the total number of pages is less than the number of slots.
traps: memslot_perf_te[97611] trap divide error ip:4030c4 sp:
7ffd18ae58f0
error:0 in memslot_perf_test[401000+cb000]
Fixes:
a69170c65acd ("KVM: selftests: memslot_perf_test: Report optimal memory slots")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <
20221213001653.3852042-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:41 +0000 (00:16 +0000)]
KVM: selftests: Delete dead code in x86_64/vmx_tsc_adjust_test.c
Delete an unused struct definition in x86_64/vmx_tsc_adjust_test.c.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20221213001653.3852042-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Tue, 13 Dec 2022 00:16:40 +0000 (00:16 +0000)]
KVM: selftests: Define literal to asm constraint in aarch64 as unsigned long
Define a literal '0' asm input constraint to aarch64/page_fault_test's
guest_cas() as an unsigned long to make clang happy.
tools/testing/selftests/kvm/aarch64/page_fault_test.c:120:16: error:
value size does not match register size specified by the constraint
and modifier [-Werror,-Wasm-operand-widths]
:: "r" (0), "r" (TEST_DATA), "r" (guest_test_memory));
^
tools/testing/selftests/kvm/aarch64/page_fault_test.c:119:15: note:
use constraint modifier "w"
"casal %0, %1, [%2]\n"
^~
%w0
Fixes:
35c581015712 ("KVM: selftests: aarch64: Add aarch64/page_fault_test")
Cc: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <
20221213001653.3852042-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Sun, 25 Dec 2022 21:41:39 +0000 (13:41 -0800)]
Linux 6.2-rc1
Steven Rostedt (Google) [Tue, 20 Dec 2022 18:45:19 +0000 (13:45 -0500)]
treewide: Convert del_timer*() to timer_shutdown*()
Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called "shutdown". After a timer is set to this state, then it can no
longer be re-armed.
The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed. It also ignores any locations where
the timer->function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.
This was created by using a coccinelle script and the following
commands:
$ cat timer.cocci
@@
expression ptr, slab;
identifier timer, rfield;
@@
(
- del_timer(&ptr->timer);
+ timer_shutdown(&ptr->timer);
|
- del_timer_sync(&ptr->timer);
+ timer_shutdown_sync(&ptr->timer);
)
... when strict
when != ptr->timer
(
kfree_rcu(ptr, rfield);
|
kmem_cache_free(slab, ptr);
|
kfree(ptr);
)
$ spatch timer.cocci . > /tmp/t.patch
$ patch -p1 < /tmp/t.patch
Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ]
Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ]
Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 23 Dec 2022 22:44:08 +0000 (14:44 -0800)]
Merge tag 'spi-fix-v6.2-rc1' of git://git./linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One driver specific change here which handles the case where a SPI
device for some reason tries to change the bus speed during a message
on fsl_spi hardware, this should be very unusual"
* tag 'spi-fix-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fsl_spi: Don't change speed while chipselect is active
Linus Torvalds [Fri, 23 Dec 2022 22:38:00 +0000 (14:38 -0800)]
Merge tag 'regulator-fix-v6.2-rc1' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Two core fixes here, one for a long standing race which some Qualcomm
systems have started triggering with their UFS driver and another
fixing a problem with supply lookup introduced by the fixes for devm
related use after free issues that were introduced in this merge
window"
* tag 'regulator-fix-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: fix deadlock on regulator enable
regulator: core: Fix resolve supply lookup issue
Linus Torvalds [Fri, 23 Dec 2022 21:56:41 +0000 (13:56 -0800)]
Merge tag 'coccinelle-6.2' of git://git./linux/kernel/git/jlawall/linux
Pull coccicheck update from Julia Lawall:
"Modernize use of grep in coccicheck:
Use 'grep -E' instead of 'egrep'"
* tag 'coccinelle-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
scripts: coccicheck: use "grep -E" instead of "egrep"
Linus Torvalds [Fri, 23 Dec 2022 20:00:24 +0000 (12:00 -0800)]
Merge tag 'hardening-v6.2-rc1-fixes' of git://git./linux/kernel/git/kees/linux
Pull kernel hardening fixes from Kees Cook:
- Fix CFI failure with KASAN (Sami Tolvanen)
- Fix LKDTM + CFI under GCC 7 and 8 (Kristina Martsenko)
- Limit CONFIG_ZERO_CALL_USED_REGS to Clang > 15.0.6 (Nathan
Chancellor)
- Ignore "contents" argument in LoadPin's LSM hook handling
- Fix paste-o in /sys/kernel/warn_count API docs
- Use READ_ONCE() consistently for oops/warn limit reading
* tag 'hardening-v6.2-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
cfi: Fix CFI failure with KASAN
exit: Use READ_ONCE() for all oops/warn limit reads
security: Restrict CONFIG_ZERO_CALL_USED_REGS to gcc or clang > 15.0.6
lkdtm: cfi: Make PAC test work with GCC 7 and 8
docs: Fix path paste-o for /sys/kernel/warn_count
LoadPin: Ignore the "contents" argument of the LSM hooks
Linus Torvalds [Fri, 23 Dec 2022 19:55:54 +0000 (11:55 -0800)]
Merge tag 'pstore-v6.2-rc1-fixes' of git://git./linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:
- Switch pmsg_lock to an rt_mutex to avoid priority inversion (John
Stultz)
- Correctly assign mem_type property (Luca Stefani)
* tag 'pstore-v6.2-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore: Properly assign mem_type property
pstore: Make sure CONFIG_PSTORE_PMSG selects CONFIG_RT_MUTEXES
pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion
Linus Torvalds [Fri, 23 Dec 2022 19:44:20 +0000 (11:44 -0800)]
Merge tag 'dma-mapping-2022-12-23' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"Fix up the sound code to not pass __GFP_COMP to the non-coherent DMA
allocator, as it copes with that just as badly as the coherent
allocator, and then add a check to make sure no one passes the flag
ever again"
* tag 'dma-mapping-2022-12-23' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: reject GFP_COMP for noncoherent allocations
ALSA: memalloc: don't use GFP_COMP for non-coherent dma allocations
Linus Torvalds [Fri, 23 Dec 2022 19:39:18 +0000 (11:39 -0800)]
Merge tag '9p-for-6.2-rc1' of https://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
- improve p9_check_errors to check buffer size instead of msize when
possible (e.g. not zero-copy)
- some more syzbot and KCSAN fixes
- minor headers include cleanup
* tag '9p-for-6.2-rc1' of https://github.com/martinetd/linux:
9p/client: fix data race on req->status
net/9p: fix response size check in p9_check_errors()
net/9p: distinguish zero-copy requests
9p/xen: do not memcpy header into req->rc
9p: set req refcount to zero to avoid uninitialized usage
9p/net: Remove unneeded idr.h #include
9p/fs: Remove unneeded idr.h #include
Linus Torvalds [Fri, 23 Dec 2022 19:15:48 +0000 (11:15 -0800)]
Merge tag 'sound-6.2-rc1-2' of git://git./linux/kernel/git/tiwai/sound
Pull more sound updates from Takashi Iwai:
"A few more updates for 6.2: most of changes are about ASoC
device-specific fixes.
- Lots of ASoC Intel AVS extensions and refactoring
- Quirks for ASoC Intel SOF as well as regression fixes
- ASoC Mediatek and Rockchip fixes
- Intel HD-audio HDMI workarounds
- Usual HD- and USB-audio device-specific quirks"
* tag 'sound-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (54 commits)
ALSA: usb-audio: Add new quirk FIXED_RATE for JBL Quantum810 Wireless
ALSA: azt3328: Remove the unused function snd_azf3328_codec_outl()
ASoC: lochnagar: Fix unused lochnagar_of_match warning
ASoC: Intel: Add HP Stream 8 to bytcr_rt5640.c
ASoC: SOF: mediatek: initialize panic_info to zero
ASoC: rt5670: Remove unbalanced pm_runtime_put()
ASoC: Intel: bytcr_rt5640: Add quirk for the Advantech MICA-071 tablet
ASoC: Intel: soc-acpi: update codec addr on 0C11/0C4F product
ASoC: rockchip: spdif: Add missing clk_disable_unprepare() in rk_spdif_runtime_resume()
ASoC: wm8994: Fix potential deadlock
ASoC: mediatek: mt8195: add sof be ops to check audio active
ASoC: SOF: Revert: "core: unregister clients and machine drivers in .shutdown"
ASoC: SOF: Intel: pci-tgl: unblock S5 entry if DMA stop has failed"
ALSA: hda/hdmi: fix stream-id config keep-alive for rt suspend
ALSA: hda/hdmi: set default audio parameters for KAE silent-stream
ALSA: hda/hdmi: fix i915 silent stream programming flow
ALSA: hda: Error out if invalid stream is being setup
ASoC: dt-bindings: fsl-sai: Reinstate i.MX93 SAI compatible string
ASoC: soc-pcm.c: Clear DAIs parameters after stream_active is updated
ASoC: codecs: wcd-clsh: Remove the unused function
...