Wei Yongjun [Tue, 9 Feb 2010 02:41:56 +0000 (10:41 +0800)]
KVM: ia64: destroy ioapic device if fail to setup default irq routing
If KVM_CREATE_IRQCHIP fail due to kvm_setup_default_irq_routing(),
ioapic device is not destroyed and kvm->arch.vioapic is not set to
NULL, this may cause KVM_GET_IRQCHIP and KVM_SET_IRQCHIP access to
unexcepted memory.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 9 Feb 2010 02:33:03 +0000 (10:33 +0800)]
KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl
If we fail to init ioapic device or the fail to setup the default irq
routing, the device register by kvm_create_pic() and kvm_ioapic_init()
remain unregister. This patch fixed to do this.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 9 Feb 2010 02:31:09 +0000 (10:31 +0800)]
KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure
kvm->arch.vioapic should be NULL in case of kvm_ioapic_init() failure
due to cannot register io dev.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Mon, 8 Feb 2010 09:03:51 +0000 (17:03 +0800)]
KVM: PIT: unregister kvm irq notifier if fail to create pit
If fail to create pit, we should unregister kvm irq notifier
which register in kvm_create_pit().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Tue, 9 Feb 2010 08:41:53 +0000 (16:41 +0800)]
KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT
Following the new SDM. Now the bit is named "Ignore PAT memory type".
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 31 Dec 2009 10:10:16 +0000 (12:10 +0200)]
KVM: MMU: Add tracepoint for guest page aging
Signed-off-by: Avi Kivity <avi@redhat.com>
Jochen Maes [Mon, 8 Feb 2010 10:29:33 +0000 (11:29 +0100)]
KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c
Fixed 2 codestyle issues in virt/kvm/coalesced_mmio.c
Signed-off-by: Jochen Maes <jochen.maes@sejo.be>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Fri, 5 Feb 2010 08:52:46 +0000 (17:52 +0900)]
KVM: Remove redundant reading of rax on OUT instructions
kvm_emulate_pio() and complete_pio() both read out the
RAX register value and copy it to a place into which
the value read out from the port will be copied later.
This patch removes this redundancy.
/*** snippet from arch/x86/kvm/x86.c ***/
int complete_pio(struct kvm_vcpu *vcpu)
{
...
if (!io->string) {
if (io->in) {
val = kvm_register_read(vcpu, VCPU_REGS_RAX);
memcpy(&val, vcpu->arch.pio_data, io->size);
kvm_register_write(vcpu, VCPU_REGS_RAX, val);
}
...
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Rik van Riel [Wed, 3 Feb 2010 21:11:03 +0000 (16:11 -0500)]
KVM: VMX: emulate accessed bit for EPT
Currently KVM pretends that pages with EPT mappings never got
accessed. This has some side effects in the VM, like swapping
out actively used guest pages and needlessly breaking up actively
used hugepages.
We can avoid those very costly side effects by emulating the
accessed bit for EPT PTEs, which should only be slightly costly
because pages pass through page_referenced infrequently.
TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().
This seems to help prevent KVM guests from being swapped out when
they should not on my system.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Thu, 28 Jan 2010 11:37:56 +0000 (12:37 +0100)]
KVM: Introduce kvm_host_page_size
This patch introduces a generic function to find out the
host page size for a given gfn. This function is needed by
the kvm iommu code. This patch also simplifies the x86
host_mapping_level function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Julia Lawall [Sat, 6 Feb 2010 08:43:03 +0000 (09:43 +0100)]
KVM: VMX: Remove redundant test in vmx_set_efer()
msr was tested above, so the second test is not needed.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r@
expression *x;
expression e;
identifier l;
@@
if (x == NULL || ...) {
... when forall
return ...; }
... when != goto l;
when != x = e
when != &x
*x == NULL
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joe Perches [Tue, 2 Feb 2010 07:22:07 +0000 (23:22 -0800)]
KVM: ia64: Fix string literal continuation lines
String constants that are continued on subsequent lines with \
are not good.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 7 Feb 2010 09:56:52 +0000 (11:56 +0200)]
KVM: VMX: Wire up .fpu_activate() callback
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Mon, 1 Feb 2010 13:11:52 +0000 (22:11 +0900)]
KVM: fix kvm_fix_hypercall() to return X86EMUL_*
This patch fixes kvm_fix_hypercall() to propagate X86EMUL_*
info generated by emulator_write_emulated() to its callers:
suggested by Marcelo.
The effect of this is x86_emulate_insn() will begin to handle
the page faults which occur in emulator_write_emulated():
this should be OK because emulator_write_emulated_onepage()
always injects page fault when emulator_write_emulated()
returns X86EMUL_PROPAGATE_FAULT.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Mon, 1 Feb 2010 13:11:04 +0000 (22:11 +0900)]
KVM: fix load_guest_segment_descriptor() to return X86EMUL_*
This patch fixes load_guest_segment_descriptor() to return
X86EMUL_PROPAGATE_FAULT when it tries to access the descriptor
table beyond the limit of it: suggested by Marcelo.
I have checked current callers of this helper function,
- kvm_load_segment_descriptor()
- kvm_task_switch()
and confirmed that this patch will change nothing in the
upper layers if we do not change the handling of this
return value from load_guest_segment_descriptor().
Next step: Although fixing the kvm_task_switch() to handle the
propagated faults properly seems difficult, and maybe not worth
it because TSS is not used commonly these days, we can fix
kvm_load_segment_descriptor(). By doing so, the injected #GP
becomes possible to be handled by the guest. The only problem
for this is how to differentiate this fault from the page faults
generated by kvm_read_guest_virt(). We may have to split this
function to achive this goal.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zhai, Edwin [Fri, 29 Jan 2010 06:38:44 +0000 (14:38 +0800)]
KVM: enable PCI multiple-segments for pass-through device
Enable optional parameter (default 0) - PCI segment (or domain) besides
BDF, when assigning PCI device to guest.
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gui Jianfeng [Fri, 29 Jan 2010 07:36:59 +0000 (15:36 +0800)]
KVM: VMX: Remove redundant check in vm_need_virtualize_apic_accesses()
flexpriority_enabled implies cpu_has_vmx_virtualize_apic_accesses() returning
true, so we don't need this check here.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 25 Jan 2010 17:47:02 +0000 (19:47 +0200)]
KVM: Trace failed msr reads and writes
Record failed msrs reads and writes, and the fact that they failed as well.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 25 Jan 2010 17:36:03 +0000 (19:36 +0200)]
KVM: Fix msr trace
- data is 64 bits wide, not unsigned long
- rw is confusingly named
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 25 Jan 2010 10:01:04 +0000 (12:01 +0200)]
KVM: mark segments accessed on HW task switch
On HW task switch newly loaded segments should me marked as accessed.
Reported-by: Lorenzo Martignoni <martignlo@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Sun, 24 Jan 2010 14:26:40 +0000 (16:26 +0200)]
KVM: VMX: Pass cr0.mp through to the guest when the fpu is active
When cr0.mp is clear, the guest doesn't expect a #NM in response to
a WAIT instruction. Because we always keep cr0.mp set, it will get
a #NM, and potentially be confused.
Fix by keeping cr0.mp set only when the fpu is inactive, and passing
it through when inactive.
Reported-by: Lorenzo Martignoni <martignlo@gmail.com>
Analyzed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Liu Yu [Fri, 22 Jan 2010 11:36:53 +0000 (19:36 +0800)]
KVM: PPC E500: fix tlbcfg emulation
commit
55fb1027c1cf9797dbdeab48180da530e81b1c39 doesn't update tlbcfg correctly.
Fix it.
And since guest OS likes 'fixed' hardware,
initialize tlbcfg everytime when guest access is useless.
So move this part to init code.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Liu Yu [Fri, 22 Jan 2010 10:50:30 +0000 (18:50 +0800)]
KVM: PPC: Add PVR/PIR init for E500
commit
513579e3a391a3874c478a8493080822069976e8 change the way
we emulate PVR/PIR,
which left PVR/PIR uninitialized on E500, and make guest puzzled.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Liu Yu [Fri, 22 Jan 2010 10:50:29 +0000 (18:50 +0800)]
KVM: PPC E500: Add register l1csr0 emulation
Latest kernel start to access l1csr0 to contron L1.
We just tell guest no operation is on going.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Fri, 22 Jan 2010 08:55:05 +0000 (16:55 +0800)]
KVM: MMU: Remove some useless code from alloc_mmu_pages()
If we fail to alloc page for vcpu->arch.mmu.pae_root, call to
free_mmu_pages() is unnecessary, which just do free the page
malloc for vcpu->arch.mmu.pae_root.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:52 +0000 (15:31 +0200)]
KVM: trace guest fpu loads and unloads
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:51 +0000 (15:31 +0200)]
KVM: Optimize kvm_read_cr[04]_bits()
'mask' is always a constant, so we can check whether it includes a bit that
might be owned by the guest very cheaply, and avoid the decache call. Saves
a few hundred bytes of module text.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:50 +0000 (15:31 +0200)]
KVM: Rename vcpu->shadow_efer to efer
None of the other registers have the shadow_ prefix.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:49 +0000 (15:31 +0200)]
KVM: Move cr0/cr4/efer related helpers to x86.h
They have more general scope than the mmu.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:48 +0000 (15:31 +0200)]
KVM: Add a helper for checking if the guest is in protected mode
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:47 +0000 (15:31 +0200)]
KVM: Activate fpu on clts
Assume that if the guest executes clts, it knows what it's doing, and load the
guest fpu to prevent an #NM exception.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:46 +0000 (15:31 +0200)]
KVM: Drop kvm_{load,put}_guest_fpu() exports
Not used anymore.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:45 +0000 (15:31 +0200)]
KVM: Allow kvm_load_guest_fpu() even when !vcpu->fpu_active
This allows accessing the guest fpu from the instruction emulator, as well as
being symmetric with kvm_put_guest_fpu().
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 21 Jan 2010 13:28:46 +0000 (15:28 +0200)]
KVM: x86: fix checking of cr0 validity
Move to/from Control Registers chapter of Intel SDM says. "Reserved bits
in CR0 remain clear after any load of those registers; attempts to set
them have no impact". Control Register chapter says "Bits 63:32 of CR0 are
reserved and must be written with zeros. Writing a nonzero value to any
of the upper 32 bits results in a general-protection exception, #GP(0)."
This patch tries to implement this twisted logic.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reported-by: Lorenzo Martignoni <martignlo@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Thu, 21 Jan 2010 08:20:04 +0000 (16:20 +0800)]
KVM: Fix kvm_coalesced_mmio_ring duplicate allocation
The commit
0953ca73 "KVM: Simplify coalesced mmio initialization"
allocate kvm_coalesced_mmio_ring in the kvm_coalesced_mmio_init(), but
didn't discard the original allocation...
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: SVM: Trap all debug register accesses
To enable proper debug register emulation under all conditions, trap
access to all DR0..7. This may be optimized later on.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: SVM: Clean up and enhance mov dr emulation
Enhance mov dr instruction emulation used by SVM so that it properly
handles dr4/5: alias to dr6/7 if cr4.de is cleared. Otherwise return
EMULATE_FAIL which will let our only possible caller in that scenario,
ud_interception, re-inject UD.
We do not need to inject faults, SVM does this for us (exceptions take
precedence over instruction interceptions). For the same reason, the
value overflow checks can be removed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: VMX: Clean up DR6 emulation
As we trap all debug register accesses, we do not need to switch real
DR6 at all. Clean up update_exception_bitmap at this chance, too.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: VMX: Fix emulation of DR4 and DR5
Make sure DR4 and DR5 are aliased to DR6 and DR7, respectively, if
CR4.DE is not set.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: VMX: Fix exceptions of mov to dr
Injecting GP without an error code is a bad idea (causes unhandled guest
exits). Moreover, we must not skip the instruction if we injected an
exception.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Wed, 20 Jan 2010 07:47:21 +0000 (16:47 +0900)]
KVM: x86: Use macros for x86_emulate_ops to avoid future mistakes
The return values from x86_emulate_ops are defined
in kvm_emulate.h as macros X86EMUL_*.
But in emulate.c, we are comparing the return values
from these ops with 0 to check if they're X86EMUL_CONTINUE
or not: X86EMUL_CONTINUE is defined as 0 now.
To avoid possible mistakes in the future, this patch
substitutes "X86EMUL_CONTINUE" for "0" that are being
compared with the return values from x86_emulate_ops.
We think that there are more places we should use these
macros, but the meanings of rc values in x86_emulate_insn()
were not so clear at a glance. If we use proper macros in
this function, we would be able to follow the flow of each
emulation more easily and, maybe, more securely.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Tue, 19 Jan 2010 14:45:23 +0000 (12:45 -0200)]
KVM: fix cleanup_srcu_struct on vm destruction
cleanup_srcu_struct on VM destruction remains broken:
BUG: unable to handle kernel paging request at
ffffffffffffffff
IP: [<
ffffffff802533d2>] srcu_read_lock+0x16/0x21
RIP: 0010:[<
ffffffff802533d2>] [<
ffffffff802533d2>] srcu_read_lock+0x16/0x21
Call Trace:
[<
ffffffffa05354c4>] kvm_arch_vcpu_uninit+0x1b/0x48 [kvm]
[<
ffffffffa05339c6>] kvm_vcpu_uninit+0x9/0x15 [kvm]
[<
ffffffffa0569f7d>] vmx_free_vcpu+0x7f/0x8f [kvm_intel]
[<
ffffffffa05357b5>] kvm_arch_destroy_vm+0x78/0x111 [kvm]
[<
ffffffffa053315b>] kvm_put_kvm+0xd4/0xfe [kvm]
Move it to kvm_arch_destroy_vm.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Gleb Natapov [Tue, 19 Jan 2010 13:06:38 +0000 (15:06 +0200)]
KVM: fix Hyper-V hypercall warnings and wrong mask value
Fix compilation warnings and wrong mask value.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Tue, 19 Jan 2010 09:43:21 +0000 (17:43 +0800)]
KVM: VMX: Remove emulation failure report
As Avi noted:
>There are two problems with the kernel failure report. First, it
>doesn't report enough data - registers, surrounding instructions, etc.
>that are needed to explain what is going on. Second, it can flood
>dmesg, which is a pretty bad thing to do.
So we remove the emulation failure report in handle_invalid_guest_state(),
and would inspected the guest using userspace tool in the future.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 18 Jan 2010 11:26:34 +0000 (13:26 +0200)]
KVM: export <asm/hyperv.h>
Needed by <asm/kvm_para.h>.
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Mon, 18 Jan 2010 09:45:10 +0000 (18:45 +0900)]
KVM: rename is_writeble_pte() to is_writable_pte()
There are two spellings of "writable" in
arch/x86/kvm/mmu.c and paging_tmpl.h .
This patch renames is_writeble_pte() to is_writable_pte()
and makes grepping easy.
New name is consistent with the definition of itself:
return pte & PT_WRITABLE_MASK;
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:24 +0000 (15:51 +0200)]
KVM: Implement NotifyLongSpinWait HYPER-V hypercall
Windows issues this hypercall after guest was spinning on a spinlock
for too many iterations.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:23 +0000 (15:51 +0200)]
KVM: Add HYPER-V apic access MSRs
Implement HYPER-V apic MSRs. Spec defines three MSRs that speed-up
access to EOI/TPR/ICR apic registers for PV guests.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:22 +0000 (15:51 +0200)]
KVM: Implement bare minimum of HYPER-V MSRs
Minimum HYPER-V implementation should have GUEST_OS_ID, HYPERCALL and
VP_INDEX MSRs.
[avi: fix build on i386]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:21 +0000 (15:51 +0200)]
KVM: Add HYPER-V header file
Provide HYPER-V related defines that will be used by following patches.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:14 +0000 (14:49 +0100)]
KVM: PPC: Move Shadow MSR calculation to function
We keep a copy of the MSR around that we use when we go into the guest context.
That copy is basically the normal process MSR flags OR some allowed guest
specified MSR flags. We also AND the external providers into this, so we get
traps on FPU usage when we haven't activated it on the host yet.
Currently this calculation is part of the set_msr function that we use whenever
we set the guest MSR value. With the external providers, we also have the case
that we don't modify the guest's MSR, but only want to update the shadow MSR.
So let's move the shadow MSR parts to a separate function that we then use
whenever we only need to update it. That way we don't accidently kvm_vcpu_block
within a preempt notifier context.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:13 +0000 (14:49 +0100)]
KVM: PPC: Keep SRR1 flags around in shadow_msr
SRR1 stores more information that just the MSR value. It also stores
valuable information about the type of interrupt we received, for
example whether the storage interrupt we just got was because of a
missing htab entry or not.
We use that information to speed up the exit path.
Now if we get preempted before we can interpret the shadow_msr values,
we get into vcpu_put which then calls the MSR handler, which then sets
all the SRR1 information bits in shadow_msr to 0. Great.
So let's preserve the SRR1 specific bits in shadow_msr whenever we set
the MSR. They don't hurt.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:12 +0000 (14:49 +0100)]
KVM: PPC: Fix initial GPR settings
Commit
7d01b4c3ed2bb33ceaf2d270cb4831a67a76b51b introduced PACA backed vcpu
values. With this patch, when a userspace app was setting GPRs before it was
actually first loaded, the set values get discarded.
This is because vcpu_load loads them from the vcpu backing store that we use
whenever we're not owning the PACA.
That behavior is not really a major problem, because we don't need it for
qemu. Other users (like kvmctl) do have problems with it though, so let's
better do it right.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:11 +0000 (14:49 +0100)]
KVM: PPC: Add support for FPU/Altivec/VSX
When our guest starts using either the FPU, Altivec or VSX we need to make
sure Linux knows about it and sneak into its process switching code
accordingly.
This patch makes accesses to the above parts of the system work inside the
VM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:10 +0000 (14:49 +0100)]
KVM: PPC: Add helper functions to call real mode loaders
Linux contains quite some bits of code to load FPU, Altivec and VSX lazily for
a task. It calls those bits in real mode, coming from an interrupt handler.
For KVM we better reuse those, so let's wrap a bit of trampoline magic around
them and then we can call them from normal module code.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:09 +0000 (14:49 +0100)]
KVM: PPC: Export __giveup_vsx
We need to explicitly only giveup VSX in KVM, so let's export that
specific function to module space.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Roel Kluin [Thu, 14 Jan 2010 17:05:58 +0000 (18:05 +0100)]
KVM: ia64: remove redundant kvm_get_exit_data() NULL tests
kvm_get_exit_data() cannot return a NULL pointer.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 10 Jan 2010 10:19:20 +0000 (12:19 +0200)]
KVM: SVM: Lazy fpu with npt
Now that we can allow the guest to play with cr0 when the fpu is loaded,
we can enable lazy fpu when npt is in use.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 6 Jan 2010 08:55:27 +0000 (10:55 +0200)]
KVM: SVM: Selective cr0 intercept
If two conditions apply:
- no bits outside TS and EM differ between the host and guest cr0
- the fpu is active
then we can activate the selective cr0 write intercept and drop the
unconditional cr0 read and write intercept, and allow the guest to run
with the host fpu state. This reduces cr0 exits due to guest fpu management
while the guest fpu is loaded.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 10 Jan 2010 10:14:04 +0000 (12:14 +0200)]
KVM: SVM: Restore unconditional cr0 intercept under npt
Currently we don't intercept cr0 at all when npt is enabled. This improves
performance but requires us to activate the fpu at all times.
Remove this behaviour in preparation for adding selective cr0 intercepts.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 7 Jan 2010 11:16:08 +0000 (13:16 +0200)]
KVM: SVM: Initialize fpu_active in init_vmcb()
init_vmcb() sets up the intercepts as if the fpu is active, so initialize it
there. This avoids an INIT from setting up intercepts inconsistent with
fpu_active.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 6 Jan 2010 11:13:01 +0000 (13:13 +0200)]
KVM: SVM: Fix SVM_CR0_SELECTIVE_MASK
Instead of selecting TS and MP as the comments say, the macro included TS and
PE. Luckily the macro is unused now, but fix in order to save a few hours of
debugging from anyone who attempts to use it.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 6 Jan 2010 17:10:22 +0000 (19:10 +0200)]
KVM: Set cr0.et when the guest writes cr0
Follow the hardware.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 30 Dec 2009 16:07:40 +0000 (18:07 +0200)]
KVM: VMX: Give the guest ownership of cr0.ts when the fpu is active
If the guest fpu is loaded, there is nothing interesing about cr0.ts; let
the guest play with it as it will. This makes context switches between fpu
intensive guest processes faster, as we won't trap the clts and cr0 write
instructions.
[marcelo: fix cr0 read shadow update on fpu deactivation; kills F8 install]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Wed, 30 Dec 2009 10:40:26 +0000 (12:40 +0200)]
KVM: Lazify fpu activation and deactivation
Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep
it loaded until the next heavyweight exit (where we are forced to unload it).
This reduces unnecessary exits.
We also defer fpu activation on clts; while clts signals the intent to use the
fpu, we can't be sure the guest will actually use it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 29 Dec 2009 16:43:06 +0000 (18:43 +0200)]
KVM: VMX: Allow the guest to own some cr0 bits
We will use this later to give the guest ownership of cr0.ts.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 29 Dec 2009 16:07:30 +0000 (18:07 +0200)]
KVM: Replace read accesses of vcpu->arch.cr0 by an accessor
Since we'd like to allow the guest to own a few bits of cr0 at times, we need
to know when we access those bits.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 29 Dec 2009 15:33:58 +0000 (17:33 +0200)]
KVM: VMX: trace clts and lmsw instructions as cr accesses
clts writes cr0.ts; lmsw writes cr0[0:15] - record that in ftrace.
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Sun, 10 Jan 2010 02:27:47 +0000 (03:27 +0100)]
KVM: PPC: Make large pages work
An SLB entry contains two pieces of information related to size:
1) PTE size
2) SLB size
The L bit defines the PTE be "large" (usually means 16MB),
SLB_VSID_B_1T defines that the SLB should span 1 GB instead of the
default 256MB.
Apparently I messed things up and just put those two in one box,
shaked it heavily and came up with the current code which handles
large pages incorrectly, because it also treats large page SLB entries
as "1TB" segment entries.
This patch splits those two features apart, making Linux guests boot
even when they have > 256MB.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Sun, 10 Jan 2010 02:27:32 +0000 (03:27 +0100)]
KVM: PPC: Pass through program interrupts
When we get a program interrupt in guest kernel mode, we try to emulate the
instruction.
If that doesn't fail, we report to the user and try again - at the exact same
instruction pointer. So if the guest kernel really does trigger an invalid
instruction, we loop forever.
So let's better go and forward program exceptions to the guest when we don't
know the instruction we're supposed to emulate.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:09 +0000 (02:58 +0100)]
KVM: PPC: Pass program interrupt flags to the guest
When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:08 +0000 (02:58 +0100)]
KVM: PPC: Fix HID5 setting code
The code to unset HID5.dcbz32 is broken.
This patch makes it do the right rotate magic.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:07 +0000 (02:58 +0100)]
KVM: PPC: Emulate trap SRR1 flags properly
Book3S needs some flags in SRR1 to get to know details about an interrupt.
One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.
This patch implements above behavior, making WARN_ON behave like WARN_ON.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:06 +0000 (02:58 +0100)]
KVM: PPC: Call SLB patching code in interrupt safe manner
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.
To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:
A small helper in linear mapped memory that does mtmsr with IR=0 and
then RFIs info the actual handler.
Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:05 +0000 (02:58 +0100)]
KVM: PPC: Get rid of unnecessary RFI
Using an RFI in IR=1 is dangerous. We need to set two SRRs and then do an RFI
without getting interrupted at all, because every interrupt could potentially
overwrite the SRR values.
Fortunately, we don't need to RFI in at least this particular case of the code,
so we can just replace it with an mtmsr and b.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:04 +0000 (02:58 +0100)]
KVM: PPC: Implement 'skip instruction' mode
To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.
Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB which makes us go in the Linux page fault handler which totally
breaks because we still use the guest's SLB entries.
To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.
That way a potentially bad lwz doesn't trigger any faults and we can later
on interpret the invalid instruction we fetched as "fetch didn't work".
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:03 +0000 (02:58 +0100)]
KVM: PPC: Use PACA backed shadow vcpu
We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.
After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.
That way we can drastically improve the readability of the code, make it
less racy and less complex.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:02 +0000 (02:58 +0100)]
KVM: PPC: Add helpers for CR, XER
We now have helpers for the GPRs, so let's also add some for CR and XER.
Having them in the PACA simplifies code a lot, as we don't need to care
about where to store CC or not to overflow any integers.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:01 +0000 (02:58 +0100)]
KVM: PPC: Use accessor functions for GPR access
All code in PPC KVM currently accesses gprs in the vcpu struct directly.
While there's nothing wrong with that wrt the current way gprs are stored
and loaded, it doesn't suffice for the PACA acceleration that will follow
in this patchset.
So let's just create little wrapper inline functions that we call whenever
a GPR needs to be read from or written to. The compiled code shouldn't really
change at all for now.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Wed, 6 Jan 2010 08:55:23 +0000 (17:55 +0900)]
KVM: Fix the explanation of write_emulated
The explanation of write_emulated is confused with
that of read_emulated. This patch fix it.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Tue, 5 Jan 2010 11:02:29 +0000 (19:02 +0800)]
KVM: VMX: Enable EPT 1GB page support
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Tue, 5 Jan 2010 11:02:27 +0000 (19:02 +0800)]
KVM: x86: Rename gb_page_enable() to get_lpage_level() in kvm_x86_ops
Then the callback can provide the maximum supported large page level, which
is more flexible.
Also move the gb page support into x86_64 specific.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Tue, 5 Jan 2010 11:02:26 +0000 (19:02 +0800)]
KVM: x86: Moving PT_*_LEVEL to mmu.h
We can use them in x86.c and vmx.c now...
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alexander Graf [Mon, 4 Jan 2010 21:19:25 +0000 (22:19 +0100)]
KVM: PPC: Enable lightweight exits again
The PowerPC C ABI defines that registers r14-r31 need to be preserved across
function calls. Since our exit handler is written in C, we can make use of that
and don't need to reload r14-r31 on every entry/exit cycle.
This technique is also used in the BookE code and is called "lightweight exits"
there. To follow the tradition, it's called the same in Book3S.
So far this optimization was disabled though, as the code didn't do what it was
expected to do, but failed to work.
This patch fixes and enables lightweight exits again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alexander Graf [Mon, 4 Jan 2010 21:19:22 +0000 (22:19 +0100)]
KVM: PPC: Fix typo in rebolting code
When we're loading bolted entries into the SLB again, we're checking if an
entry is in use and only slbmte it when it is.
Unfortunately, the check always goes to the skip label of the first entry,
resulting in an endless loop when it actually gets triggered.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 28 Dec 2009 12:08:30 +0000 (14:08 +0200)]
KVM: avoid taking ioapic mutex for non-ioapic EOIs
When the guest acknowledges an interrupt, it sends an EOI message to the local
apic, which broadcasts it to the ioapic. To handle the EOI, we need to take
the ioapic mutex.
On large guests, this causes a lot of contention on this mutex. Since large
guests usually don't route interrupts via the ioapic (they use msi instead),
this is completely unnecessary.
Avoid taking the mutex by introducing a handled_vectors bitmap. Before taking
the mutex, check if the ioapic was actually responsible for the acked vector.
If not, we can return early.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 28 Dec 2009 14:06:35 +0000 (16:06 +0200)]
KVM: Fill out ftrace exit reason strings
Some exit reasons missed their strings; fill out the table.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Sun, 27 Dec 2009 15:00:46 +0000 (17:00 +0200)]
KVM: Bump maximum vcpu count to 64
With slots_lock converted to rcu, the entire kvm hotpath on modern processors
(with npt or ept) now scales beautifully. Increase the maximum vcpu count to
64 to reflect this.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:26 +0000 (14:35 -0200)]
KVM: convert slots_lock to a mutex
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:25 +0000 (14:35 -0200)]
KVM: switch vcpu context to use SRCU
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:24 +0000 (14:35 -0200)]
KVM: convert io_bus to SRCU
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:23 +0000 (14:35 -0200)]
KVM: x86: switch kvm_set_memory_alias to SRCU update
Using a similar two-step procedure as for memslots.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:22 +0000 (14:35 -0200)]
KVM: use SRCU for dirty log
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:21 +0000 (14:35 -0200)]
KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update
Use two steps for memslot deletion: mark the slot invalid (which stops
instantiation of new shadow pages for that slot, but allows destruction),
then instantiate the new empty slot.
Also simplifies kvm_handle_hva locking.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:20 +0000 (14:35 -0200)]
KVM: use gfn_to_pfn_memslot in kvm_iommu_map_pages
So its possible to iommu map a memslot before making it visible to
kvm.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:19 +0000 (14:35 -0200)]
KVM: introduce gfn_to_pfn_memslot
Which takes a memslot pointer instead of using kvm->memslots.
To be used by SRCU convertion later.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:18 +0000 (14:35 -0200)]
KVM: split kvm_arch_set_memory_region into prepare and commit
Required for SRCU convertion later.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:17 +0000 (14:35 -0200)]
KVM: modify alias layout in x86s struct kvm_arch
Have a pointer to an allocated region inside x86's kvm_arch.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Wed, 23 Dec 2009 16:35:16 +0000 (14:35 -0200)]
KVM: modify memslots layout in struct kvm
Have a pointer to an allocated region inside struct kvm.
[alex: fix ppc book 3s]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wu Fengguang [Thu, 24 Dec 2009 01:04:16 +0000 (09:04 +0800)]
KVM: trivial document fixes
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>