KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter
author Sean Christopherson <seanjc@google.com>
Tue, 10 Aug 2021 14:45:26 +0000 (07:45 -0700)
committer Paolo Bonzini <pbonzini@redhat.com>
Fri, 13 Aug 2021 07:35:17 +0000 (03:35 -0400)
Clear nested.pi_pending on nested VM-Enter even if L2 will run without
posted interrupts enabled.  If nested.pi_pending is left set from a
previous L2, vmx_complete_nested_posted_interrupt() will pick up the
stale flag and exit to userspace with an "internal emulation error" due
to the new L2 not having a valid nested.pi_desc.
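
The failure sequence described above can be illustrated with a small
userspace model. This is only a sketch, not the kernel code: struct
toy_nested, the toy_* helpers, and the use of -ENXIO as the stand-in for
the exit to userspace are all assumptions made for illustration. The
"unconditional" parameter selects between the old behaviour (clear the
flag only when the new L2 uses posted interrupts) and the behaviour this
patch introduces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the relevant nested state; not the kernel's struct. */
struct toy_nested {
	bool pi_pending;     /* a posted interrupt is flagged for L2 */
	const void *pi_desc; /* PI descriptor, NULL if L2 runs without PI */
};

/*
 * Hypothetical model of nested VM-Enter. With unconditional == false the
 * flag is cleared only when the new L2 uses posted interrupts (old flow);
 * with unconditional == true it is always cleared (flow after this patch).
 */
static void toy_enter_l2(struct toy_nested *n, bool l2_has_pi,
			 const void *desc, bool unconditional)
{
	if (unconditional || l2_has_pi)
		n->pi_pending = false;
	n->pi_desc = l2_has_pi ? desc : NULL;
}

/*
 * Hypothetical model of completing a nested posted interrupt: a pending
 * flag without a descriptor is treated as an error.
 */
static int toy_complete_posted_interrupt(struct toy_nested *n)
{
	if (!n->pi_pending)
		return 0;
	if (!n->pi_desc)
		return -ENXIO; /* stands in for the exit to userspace */
	n->pi_pending = false;
	return 0;
}

/* Run two back-to-back L2 guests and report what the second one sees. */
static int toy_run_scenario(bool unconditional)
{
	static const int fake_desc = 0;
	struct toy_nested n = { .pi_pending = false, .pi_desc = NULL };

	/* First L2 uses posted interrupts; one is flagged but never processed. */
	toy_enter_l2(&n, true, &fake_desc, unconditional);
	n.pi_pending = true;
	assert(n.pi_desc != NULL);

	/* Second L2 runs without posted interrupts, so it has no descriptor. */
	toy_enter_l2(&n, false, NULL, unconditional);
	return toy_complete_posted_interrupt(&n);
}
```

In this model, toy_run_scenario(false) hits the error path because the
stale flag survives into an L2 that has no descriptor, while
toy_run_scenario(true) returns 0 because the flag is cleared on every
entry.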

Arguably, vmx_complete_nested_posted_interrupt() should first check for
posted interrupts being enabled, but it's also completely reasonable that
KVM wouldn't screw up a fundamental flag.  Not to mention that the mere
existence of nested.pi_pending is a long-standing bug as KVM shouldn't
move the posted interrupt out of the IRR until it's actually processed,
e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit
with a "pending" posted interrupt.  Fixing the mess is a future problem.

Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI
descriptor as an error, this was a benign bug as the null PI descriptor
effectively served as a check on PI not being enabled.  Even then, the
new flow did not become problematic until KVM started checking the result
of kvm_check_nested_events().

Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing")
Fixes: 966eefb89657 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable")
Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210810144526.2662272-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
arch/x86/kvm/vmx/nested.c

index 264a9f4c91792e52b035ce8bac8aafc4de75bf25..bc6327950657ea1a74b4e6579de8534d93df41ac 100644
@@ -2187,12 +2187,11 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
                         ~PIN_BASED_VMX_PREEMPTION_TIMER);
 
        /* Posted interrupts setting is only taken from vmcs12.  */
-       if (nested_cpu_has_posted_intr(vmcs12)) {
+       vmx->nested.pi_pending = false;
+       if (nested_cpu_has_posted_intr(vmcs12))
                vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv;
-               vmx->nested.pi_pending = false;
-       } else {
+       else
                exec_control &= ~PIN_BASED_POSTED_INTR;
-       }
        pin_controls_set(vmx, exec_control);
 
        /*