Joerg Roedel [Fri, 15 Jan 2010 13:41:15 +0000 (14:41 +0100)]
x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes
This patch changes the old map_size parameter of alloc_pte
to a page_size parameter which can be used more easily to
alloc a pte for intermediate page sizes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Joerg Roedel [Mon, 11 Jan 2010 15:38:18 +0000 (16:38 +0100)]
kvm: Change kvm_iommu_map_pages to map large pages
This patch changes the implementation of of
kvm_iommu_map_pages to map the pages with the host page size
into the io virtual address space.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-By: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 20 Jan 2010 16:17:37 +0000 (17:17 +0100)]
VT-d: Change {un}map_range functions to implement {un}map interface
This patch changes the iommu-api functions for mapping and
unmapping page ranges to use the new page-size based
interface. This allows to remove the range based functions
later.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Joerg Roedel [Thu, 21 Jan 2010 15:32:27 +0000 (16:32 +0100)]
iommu-api: Add ->{un}map callbacks to iommu_ops
This patch adds new callbacks for mapping and unmapping
pages to the iommu_ops structure. These callbacks are aware
of page sizes which makes them different to the
->{un}map_range callbacks.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Joerg Roedel [Fri, 8 Jan 2010 12:35:09 +0000 (13:35 +0100)]
iommu-api: Add iommu_map and iommu_unmap functions
These two functions provide support for mapping and
unmapping physical addresses to io virtual addresses. The
difference to the iommu_(un)map_range() is that the new
functions take a gfp_order parameter instead of a size. This
allows the IOMMU backend implementations to detect easier if
a given range can be mapped by larger page sizes.
These new functions should replace the old ones in the long
term.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Joerg Roedel [Wed, 20 Jan 2010 13:52:23 +0000 (14:52 +0100)]
iommu-api: Rename ->{un}map function pointers to ->{un}map_range
The new function pointer names match better with the
top-level functions of the iommu-api which are using them.
Main intention of this change is to make the ->{un}map
pointer names free for two new mapping functions.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:57 +0000 (17:47 +0100)]
KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
This marks the guest single-step API improvement of
94fe45da and
91586a3b with a capability flag to allow reliable detection by user
space.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: stable@kernel.org (2.6.33)
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:53 +0000 (17:47 +0100)]
KVM: VMX: Update instruction length on intercepted BP
We intercept #BP while in guest debugging mode. As VM exits due to
intercepted exceptions do not necessarily come with valid
idt_vectoring, we have to update event_exit_inst_len explicitly in such
cases. At least in the absence of migration, this ensures that
re-injections of #BP will find and use the correct instruction length.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: stable@kernel.org (2.6.32, 2.6.33)
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Thu, 18 Feb 2010 10:15:02 +0000 (12:15 +0200)]
KVM: Fix emulate_sys[call, enter, exit]()'s fault handling
This patch fixes emulate_syscall(), emulate_sysenter() and
emulate_sysexit() to handle injected faults properly.
Even though original code injects faults in these functions,
we cannot handle these unless we use the different return
value from the UNHANDLEABLE case. So this patch use X86EMUL_*
codes instead of -1 and 0 and makes x86_emulate_insn() to
handle these propagated faults.
Be sure that, in x86_emulate_insn(), goto cannot_emulate and
goto done with rc equals X86EMUL_UNHANDLEABLE have same effect.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 18 Feb 2010 10:15:01 +0000 (12:15 +0200)]
KVM: Fix segment descriptor loading
Add proper error and permission checking. This patch also change task
switching code to load segment selectors before segment descriptors, like
SDM requires, otherwise permission checking during segment descriptor
loading will be incorrect.
Cc: stable@kernel.org (2.6.33, 2.6.32)
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Thu, 18 Feb 2010 10:15:00 +0000 (12:15 +0200)]
KVM: Fix load_guest_segment_descriptor() to inject page fault
This patch injects page fault when reading descriptor in
load_guest_segment_descriptor() fails with FAULT.
Effects of this injection: This function is used by
kvm_load_segment_descriptor() which is necessary for the
following instructions:
- mov seg,r/m16
- jmp far
- pop ?s
This patch makes it possible to emulate the page faults
generated by these instructions. But be sure that unless
we change the kvm_load_segment_descriptor()'s ret value
propagation this patch has no effect.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 18 Feb 2010 10:14:59 +0000 (12:14 +0200)]
KVM: x86 emulator: Forbid modifying CS segment register by mov instruction
Inject #UD if guest attempts to do so. This is in accordance to Intel
SDM.
Cc: stable@kernel.org (2.6.33, 2.6.32)
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 18 Feb 2010 09:25:22 +0000 (11:25 +0200)]
KVM: Convert kvm->requests_lock to raw_spinlock_t
The code relies on kvm->requests_lock inhibiting preemption.
Noted by Jan Kiszka.
Signed-off-by: Avi Kivity <avi@redhat.com>
Thomas Gleixner [Wed, 17 Feb 2010 14:00:41 +0000 (14:00 +0000)]
KVM: Convert i8254/i8259 locks to raw_spinlocks
The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
them to raw_spinlock. No change for !RT kernels.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 11 Feb 2010 10:41:10 +0000 (12:41 +0200)]
KVM: x86 emulator: disallow opcode 82 in 64-bit mode
Instructions with opcode 82 are not valid in 64 bit mode.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Thu, 11 Feb 2010 03:12:07 +0000 (11:12 +0800)]
KVM: x86 emulator: code style cleanup
Just remove redundant semicolon.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 26 Jan 2010 14:30:06 +0000 (16:30 +0200)]
KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
These features are unused by modern userspace and can go away. Paravirt
mmu needs to stay a little longer for live migration.
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 10 Feb 2010 12:21:36 +0000 (14:21 +0200)]
KVM: x86 emulator: Add LOCK prefix validity checking
Instructions which are not allowed to have LOCK prefix should
generate #UD if one is used.
[avi: fold opcode 82 fix from another patch]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 10 Feb 2010 12:21:35 +0000 (14:21 +0200)]
KVM: x86 emulator: Check CPL level during privilege instruction emulation
Add CPL checking in case emulator is tricked into emulating
privilege instruction from userspace.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 10 Feb 2010 12:21:34 +0000 (14:21 +0200)]
KVM: x86 emulator: Fix popf emulation
POPF behaves differently depending on current CPU mode. Emulate correct
logic to prevent guest from changing flags that it can't change otherwise.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 10 Feb 2010 12:21:33 +0000 (14:21 +0200)]
KVM: x86 emulator: Check IOPL level during io instruction emulation
Make emulator check that vcpu is allowed to execute IN, INS, OUT,
OUTS, CLI, STI.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 10 Feb 2010 12:21:32 +0000 (14:21 +0200)]
KVM: x86 emulator: fix memory access during x86 emulation
Currently when x86 emulator needs to access memory, page walk is done with
broadest permission possible, so if emulated instruction was executed
by userspace process it can still access kernel memory. Fix that by
providing correct memory access to page walker during emulation.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 10 Feb 2010 12:21:31 +0000 (14:21 +0200)]
KVM: x86 emulator: Add Virtual-8086 mode of emulation
For some instructions CPU behaves differently for real-mode and
virtual 8086. Let emulator know which mode cpu is in, so it will
not poke into vcpu state directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 10 Feb 2010 12:21:30 +0000 (14:21 +0200)]
KVM: x86 emulator: Add group9 instruction decoding
Use groups mechanism to decode 0F C7 instructions.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 10 Feb 2010 12:21:29 +0000 (14:21 +0200)]
KVM: x86 emulator: Add group8 instruction decoding
Use groups mechanism to decode 0F BA instructions.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Michael S. Tsirkin [Wed, 13 Jan 2010 17:12:39 +0000 (19:12 +0200)]
KVM: do not store wqh in irqfd
wqh is unused, so we do not need to store it in irqfd anymore
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Liu Yu [Tue, 2 Feb 2010 11:44:35 +0000 (19:44 +0800)]
KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest
Old method prematurely sets ESR and DEAR.
Move this part after we decide to inject interrupt,
which is more like hardware behave.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 9 Feb 2010 02:41:56 +0000 (10:41 +0800)]
KVM: ia64: destroy ioapic device if fail to setup default irq routing
If KVM_CREATE_IRQCHIP fail due to kvm_setup_default_irq_routing(),
ioapic device is not destroyed and kvm->arch.vioapic is not set to
NULL, this may cause KVM_GET_IRQCHIP and KVM_SET_IRQCHIP access to
unexcepted memory.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 9 Feb 2010 02:33:03 +0000 (10:33 +0800)]
KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl
If we fail to init ioapic device or the fail to setup the default irq
routing, the device register by kvm_create_pic() and kvm_ioapic_init()
remain unregister. This patch fixed to do this.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 9 Feb 2010 02:31:09 +0000 (10:31 +0800)]
KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure
kvm->arch.vioapic should be NULL in case of kvm_ioapic_init() failure
due to cannot register io dev.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Mon, 8 Feb 2010 09:03:51 +0000 (17:03 +0800)]
KVM: PIT: unregister kvm irq notifier if fail to create pit
If fail to create pit, we should unregister kvm irq notifier
which register in kvm_create_pit().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Tue, 9 Feb 2010 08:41:53 +0000 (16:41 +0800)]
KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT
Following the new SDM. Now the bit is named "Ignore PAT memory type".
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 31 Dec 2009 10:10:16 +0000 (12:10 +0200)]
KVM: MMU: Add tracepoint for guest page aging
Signed-off-by: Avi Kivity <avi@redhat.com>
Jochen Maes [Mon, 8 Feb 2010 10:29:33 +0000 (11:29 +0100)]
KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c
Fixed 2 codestyle issues in virt/kvm/coalesced_mmio.c
Signed-off-by: Jochen Maes <jochen.maes@sejo.be>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Fri, 5 Feb 2010 08:52:46 +0000 (17:52 +0900)]
KVM: Remove redundant reading of rax on OUT instructions
kvm_emulate_pio() and complete_pio() both read out the
RAX register value and copy it to a place into which
the value read out from the port will be copied later.
This patch removes this redundancy.
/*** snippet from arch/x86/kvm/x86.c ***/
int complete_pio(struct kvm_vcpu *vcpu)
{
...
if (!io->string) {
if (io->in) {
val = kvm_register_read(vcpu, VCPU_REGS_RAX);
memcpy(&val, vcpu->arch.pio_data, io->size);
kvm_register_write(vcpu, VCPU_REGS_RAX, val);
}
...
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Rik van Riel [Wed, 3 Feb 2010 21:11:03 +0000 (16:11 -0500)]
KVM: VMX: emulate accessed bit for EPT
Currently KVM pretends that pages with EPT mappings never got
accessed. This has some side effects in the VM, like swapping
out actively used guest pages and needlessly breaking up actively
used hugepages.
We can avoid those very costly side effects by emulating the
accessed bit for EPT PTEs, which should only be slightly costly
because pages pass through page_referenced infrequently.
TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().
This seems to help prevent KVM guests from being swapped out when
they should not on my system.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Thu, 28 Jan 2010 11:37:56 +0000 (12:37 +0100)]
KVM: Introduce kvm_host_page_size
This patch introduces a generic function to find out the
host page size for a given gfn. This function is needed by
the kvm iommu code. This patch also simplifies the x86
host_mapping_level function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Julia Lawall [Sat, 6 Feb 2010 08:43:03 +0000 (09:43 +0100)]
KVM: VMX: Remove redundant test in vmx_set_efer()
msr was tested above, so the second test is not needed.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r@
expression *x;
expression e;
identifier l;
@@
if (x == NULL || ...) {
... when forall
return ...; }
... when != goto l;
when != x = e
when != &x
*x == NULL
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joe Perches [Tue, 2 Feb 2010 07:22:07 +0000 (23:22 -0800)]
KVM: ia64: Fix string literal continuation lines
String constants that are continued on subsequent lines with \
are not good.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 7 Feb 2010 09:56:52 +0000 (11:56 +0200)]
KVM: VMX: Wire up .fpu_activate() callback
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Mon, 1 Feb 2010 13:11:52 +0000 (22:11 +0900)]
KVM: fix kvm_fix_hypercall() to return X86EMUL_*
This patch fixes kvm_fix_hypercall() to propagate X86EMUL_*
info generated by emulator_write_emulated() to its callers:
suggested by Marcelo.
The effect of this is x86_emulate_insn() will begin to handle
the page faults which occur in emulator_write_emulated():
this should be OK because emulator_write_emulated_onepage()
always injects page fault when emulator_write_emulated()
returns X86EMUL_PROPAGATE_FAULT.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Mon, 1 Feb 2010 13:11:04 +0000 (22:11 +0900)]
KVM: fix load_guest_segment_descriptor() to return X86EMUL_*
This patch fixes load_guest_segment_descriptor() to return
X86EMUL_PROPAGATE_FAULT when it tries to access the descriptor
table beyond the limit of it: suggested by Marcelo.
I have checked current callers of this helper function,
- kvm_load_segment_descriptor()
- kvm_task_switch()
and confirmed that this patch will change nothing in the
upper layers if we do not change the handling of this
return value from load_guest_segment_descriptor().
Next step: Although fixing the kvm_task_switch() to handle the
propagated faults properly seems difficult, and maybe not worth
it because TSS is not used commonly these days, we can fix
kvm_load_segment_descriptor(). By doing so, the injected #GP
becomes possible to be handled by the guest. The only problem
for this is how to differentiate this fault from the page faults
generated by kvm_read_guest_virt(). We may have to split this
function to achive this goal.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zhai, Edwin [Fri, 29 Jan 2010 06:38:44 +0000 (14:38 +0800)]
KVM: enable PCI multiple-segments for pass-through device
Enable optional parameter (default 0) - PCI segment (or domain) besides
BDF, when assigning PCI device to guest.
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gui Jianfeng [Fri, 29 Jan 2010 07:36:59 +0000 (15:36 +0800)]
KVM: VMX: Remove redundant check in vm_need_virtualize_apic_accesses()
flexpriority_enabled implies cpu_has_vmx_virtualize_apic_accesses() returning
true, so we don't need this check here.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 25 Jan 2010 17:47:02 +0000 (19:47 +0200)]
KVM: Trace failed msr reads and writes
Record failed msrs reads and writes, and the fact that they failed as well.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 25 Jan 2010 17:36:03 +0000 (19:36 +0200)]
KVM: Fix msr trace
- data is 64 bits wide, not unsigned long
- rw is confusingly named
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 25 Jan 2010 10:01:04 +0000 (12:01 +0200)]
KVM: mark segments accessed on HW task switch
On HW task switch newly loaded segments should me marked as accessed.
Reported-by: Lorenzo Martignoni <martignlo@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Sun, 24 Jan 2010 14:26:40 +0000 (16:26 +0200)]
KVM: VMX: Pass cr0.mp through to the guest when the fpu is active
When cr0.mp is clear, the guest doesn't expect a #NM in response to
a WAIT instruction. Because we always keep cr0.mp set, it will get
a #NM, and potentially be confused.
Fix by keeping cr0.mp set only when the fpu is inactive, and passing
it through when inactive.
Reported-by: Lorenzo Martignoni <martignlo@gmail.com>
Analyzed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Liu Yu [Fri, 22 Jan 2010 11:36:53 +0000 (19:36 +0800)]
KVM: PPC E500: fix tlbcfg emulation
commit
55fb1027c1cf9797dbdeab48180da530e81b1c39 doesn't update tlbcfg correctly.
Fix it.
And since guest OS likes 'fixed' hardware,
initialize tlbcfg everytime when guest access is useless.
So move this part to init code.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Liu Yu [Fri, 22 Jan 2010 10:50:30 +0000 (18:50 +0800)]
KVM: PPC: Add PVR/PIR init for E500
commit
513579e3a391a3874c478a8493080822069976e8 change the way
we emulate PVR/PIR,
which left PVR/PIR uninitialized on E500, and make guest puzzled.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Liu Yu [Fri, 22 Jan 2010 10:50:29 +0000 (18:50 +0800)]
KVM: PPC E500: Add register l1csr0 emulation
Latest kernel start to access l1csr0 to contron L1.
We just tell guest no operation is on going.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Fri, 22 Jan 2010 08:55:05 +0000 (16:55 +0800)]
KVM: MMU: Remove some useless code from alloc_mmu_pages()
If we fail to alloc page for vcpu->arch.mmu.pae_root, call to
free_mmu_pages() is unnecessary, which just do free the page
malloc for vcpu->arch.mmu.pae_root.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:52 +0000 (15:31 +0200)]
KVM: trace guest fpu loads and unloads
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:51 +0000 (15:31 +0200)]
KVM: Optimize kvm_read_cr[04]_bits()
'mask' is always a constant, so we can check whether it includes a bit that
might be owned by the guest very cheaply, and avoid the decache call. Saves
a few hundred bytes of module text.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:50 +0000 (15:31 +0200)]
KVM: Rename vcpu->shadow_efer to efer
None of the other registers have the shadow_ prefix.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:49 +0000 (15:31 +0200)]
KVM: Move cr0/cr4/efer related helpers to x86.h
They have more general scope than the mmu.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:48 +0000 (15:31 +0200)]
KVM: Add a helper for checking if the guest is in protected mode
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:47 +0000 (15:31 +0200)]
KVM: Activate fpu on clts
Assume that if the guest executes clts, it knows what it's doing, and load the
guest fpu to prevent an #NM exception.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:46 +0000 (15:31 +0200)]
KVM: Drop kvm_{load,put}_guest_fpu() exports
Not used anymore.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 21 Jan 2010 13:31:45 +0000 (15:31 +0200)]
KVM: Allow kvm_load_guest_fpu() even when !vcpu->fpu_active
This allows accessing the guest fpu from the instruction emulator, as well as
being symmetric with kvm_put_guest_fpu().
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 21 Jan 2010 13:28:46 +0000 (15:28 +0200)]
KVM: x86: fix checking of cr0 validity
Move to/from Control Registers chapter of Intel SDM says. "Reserved bits
in CR0 remain clear after any load of those registers; attempts to set
them have no impact". Control Register chapter says "Bits 63:32 of CR0 are
reserved and must be written with zeros. Writing a nonzero value to any
of the upper 32 bits results in a general-protection exception, #GP(0)."
This patch tries to implement this twisted logic.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reported-by: Lorenzo Martignoni <martignlo@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Thu, 21 Jan 2010 08:20:04 +0000 (16:20 +0800)]
KVM: Fix kvm_coalesced_mmio_ring duplicate allocation
The commit
0953ca73 "KVM: Simplify coalesced mmio initialization"
allocate kvm_coalesced_mmio_ring in the kvm_coalesced_mmio_init(), but
didn't discard the original allocation...
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: SVM: Trap all debug register accesses
To enable proper debug register emulation under all conditions, trap
access to all DR0..7. This may be optimized later on.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: SVM: Clean up and enhance mov dr emulation
Enhance mov dr instruction emulation used by SVM so that it properly
handles dr4/5: alias to dr6/7 if cr4.de is cleared. Otherwise return
EMULATE_FAIL which will let our only possible caller in that scenario,
ud_interception, re-inject UD.
We do not need to inject faults, SVM does this for us (exceptions take
precedence over instruction interceptions). For the same reason, the
value overflow checks can be removed.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: VMX: Clean up DR6 emulation
As we trap all debug register accesses, we do not need to switch real
DR6 at all. Clean up update_exception_bitmap at this chance, too.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: VMX: Fix emulation of DR4 and DR5
Make sure DR4 and DR5 are aliased to DR6 and DR7, respectively, if
CR4.DE is not set.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jan Kiszka [Wed, 20 Jan 2010 17:20:20 +0000 (18:20 +0100)]
KVM: VMX: Fix exceptions of mov to dr
Injecting GP without an error code is a bad idea (causes unhandled guest
exits). Moreover, we must not skip the instruction if we injected an
exception.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Wed, 20 Jan 2010 07:47:21 +0000 (16:47 +0900)]
KVM: x86: Use macros for x86_emulate_ops to avoid future mistakes
The return values from x86_emulate_ops are defined
in kvm_emulate.h as macros X86EMUL_*.
But in emulate.c, we are comparing the return values
from these ops with 0 to check if they're X86EMUL_CONTINUE
or not: X86EMUL_CONTINUE is defined as 0 now.
To avoid possible mistakes in the future, this patch
substitutes "X86EMUL_CONTINUE" for "0" that are being
compared with the return values from x86_emulate_ops.
We think that there are more places we should use these
macros, but the meanings of rc values in x86_emulate_insn()
were not so clear at a glance. If we use proper macros in
this function, we would be able to follow the flow of each
emulation more easily and, maybe, more securely.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Marcelo Tosatti [Tue, 19 Jan 2010 14:45:23 +0000 (12:45 -0200)]
KVM: fix cleanup_srcu_struct on vm destruction
cleanup_srcu_struct on VM destruction remains broken:
BUG: unable to handle kernel paging request at
ffffffffffffffff
IP: [<
ffffffff802533d2>] srcu_read_lock+0x16/0x21
RIP: 0010:[<
ffffffff802533d2>] [<
ffffffff802533d2>] srcu_read_lock+0x16/0x21
Call Trace:
[<
ffffffffa05354c4>] kvm_arch_vcpu_uninit+0x1b/0x48 [kvm]
[<
ffffffffa05339c6>] kvm_vcpu_uninit+0x9/0x15 [kvm]
[<
ffffffffa0569f7d>] vmx_free_vcpu+0x7f/0x8f [kvm_intel]
[<
ffffffffa05357b5>] kvm_arch_destroy_vm+0x78/0x111 [kvm]
[<
ffffffffa053315b>] kvm_put_kvm+0xd4/0xfe [kvm]
Move it to kvm_arch_destroy_vm.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Gleb Natapov [Tue, 19 Jan 2010 13:06:38 +0000 (15:06 +0200)]
KVM: fix Hyper-V hypercall warnings and wrong mask value
Fix compilation warnings and wrong mask value.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Sheng Yang [Tue, 19 Jan 2010 09:43:21 +0000 (17:43 +0800)]
KVM: VMX: Remove emulation failure report
As Avi noted:
>There are two problems with the kernel failure report. First, it
>doesn't report enough data - registers, surrounding instructions, etc.
>that are needed to explain what is going on. Second, it can flood
>dmesg, which is a pretty bad thing to do.
So we remove the emulation failure report in handle_invalid_guest_state(),
and would inspected the guest using userspace tool in the future.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 18 Jan 2010 11:26:34 +0000 (13:26 +0200)]
KVM: export <asm/hyperv.h>
Needed by <asm/kvm_para.h>.
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Mon, 18 Jan 2010 09:45:10 +0000 (18:45 +0900)]
KVM: rename is_writeble_pte() to is_writable_pte()
There are two spellings of "writable" in
arch/x86/kvm/mmu.c and paging_tmpl.h .
This patch renames is_writeble_pte() to is_writable_pte()
and makes grepping easy.
New name is consistent with the definition of itself:
return pte & PT_WRITABLE_MASK;
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:24 +0000 (15:51 +0200)]
KVM: Implement NotifyLongSpinWait HYPER-V hypercall
Windows issues this hypercall after guest was spinning on a spinlock
for too many iterations.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:23 +0000 (15:51 +0200)]
KVM: Add HYPER-V apic access MSRs
Implement HYPER-V apic MSRs. Spec defines three MSRs that speed-up
access to EOI/TPR/ICR apic registers for PV guests.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:22 +0000 (15:51 +0200)]
KVM: Implement bare minimum of HYPER-V MSRs
Minimum HYPER-V implementation should have GUEST_OS_ID, HYPERCALL and
VP_INDEX MSRs.
[avi: fix build on i386]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 17 Jan 2010 13:51:21 +0000 (15:51 +0200)]
KVM: Add HYPER-V header file
Provide HYPER-V related defines that will be used by following patches.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:14 +0000 (14:49 +0100)]
KVM: PPC: Move Shadow MSR calculation to function
We keep a copy of the MSR around that we use when we go into the guest context.
That copy is basically the normal process MSR flags OR some allowed guest
specified MSR flags. We also AND the external providers into this, so we get
traps on FPU usage when we haven't activated it on the host yet.
Currently this calculation is part of the set_msr function that we use whenever
we set the guest MSR value. With the external providers, we also have the case
that we don't modify the guest's MSR, but only want to update the shadow MSR.
So let's move the shadow MSR parts to a separate function that we then use
whenever we only need to update it. That way we don't accidently kvm_vcpu_block
within a preempt notifier context.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:13 +0000 (14:49 +0100)]
KVM: PPC: Keep SRR1 flags around in shadow_msr
SRR1 stores more information that just the MSR value. It also stores
valuable information about the type of interrupt we received, for
example whether the storage interrupt we just got was because of a
missing htab entry or not.
We use that information to speed up the exit path.
Now if we get preempted before we can interpret the shadow_msr values,
we get into vcpu_put which then calls the MSR handler, which then sets
all the SRR1 information bits in shadow_msr to 0. Great.
So let's preserve the SRR1 specific bits in shadow_msr whenever we set
the MSR. They don't hurt.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:12 +0000 (14:49 +0100)]
KVM: PPC: Fix initial GPR settings
Commit
7d01b4c3ed2bb33ceaf2d270cb4831a67a76b51b introduced PACA backed vcpu
values. With this patch, when a userspace app was setting GPRs before it was
actually first loaded, the set values get discarded.
This is because vcpu_load loads them from the vcpu backing store that we use
whenever we're not owning the PACA.
That behavior is not really a major problem, because we don't need it for
qemu. Other users (like kvmctl) do have problems with it though, so let's
better do it right.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:11 +0000 (14:49 +0100)]
KVM: PPC: Add support for FPU/Altivec/VSX
When our guest starts using either the FPU, Altivec or VSX we need to make
sure Linux knows about it and sneak into its process switching code
accordingly.
This patch makes accesses to the above parts of the system work inside the
VM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:10 +0000 (14:49 +0100)]
KVM: PPC: Add helper functions to call real mode loaders
Linux contains quite some bits of code to load FPU, Altivec and VSX lazily for
a task. It calls those bits in real mode, coming from an interrupt handler.
For KVM we better reuse those, so let's wrap a bit of trampoline magic around
them and then we can call them from normal module code.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 15 Jan 2010 13:49:09 +0000 (14:49 +0100)]
KVM: PPC: Export __giveup_vsx
We need to explicitly only giveup VSX in KVM, so let's export that
specific function to module space.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Roel Kluin [Thu, 14 Jan 2010 17:05:58 +0000 (18:05 +0100)]
KVM: ia64: remove redundant kvm_get_exit_data() NULL tests
kvm_get_exit_data() cannot return a NULL pointer.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 10 Jan 2010 10:19:20 +0000 (12:19 +0200)]
KVM: SVM: Lazy fpu with npt
Now that we can allow the guest to play with cr0 when the fpu is loaded,
we can enable lazy fpu when npt is in use.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 6 Jan 2010 08:55:27 +0000 (10:55 +0200)]
KVM: SVM: Selective cr0 intercept
If two conditions apply:
- no bits outside TS and EM differ between the host and guest cr0
- the fpu is active
then we can activate the selective cr0 write intercept and drop the
unconditional cr0 read and write intercept, and allow the guest to run
with the host fpu state. This reduces cr0 exits due to guest fpu management
while the guest fpu is loaded.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 10 Jan 2010 10:14:04 +0000 (12:14 +0200)]
KVM: SVM: Restore unconditional cr0 intercept under npt
Currently we don't intercept cr0 at all when npt is enabled. This improves
performance but requires us to activate the fpu at all times.
Remove this behaviour in preparation for adding selective cr0 intercepts.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 7 Jan 2010 11:16:08 +0000 (13:16 +0200)]
KVM: SVM: Initialize fpu_active in init_vmcb()
init_vmcb() sets up the intercepts as if the fpu is active, so initialize it
there. This avoids an INIT from setting up intercepts inconsistent with
fpu_active.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 6 Jan 2010 11:13:01 +0000 (13:13 +0200)]
KVM: SVM: Fix SVM_CR0_SELECTIVE_MASK
Instead of selecting TS and MP as the comments say, the macro included TS and
PE. Luckily the macro is unused now, but fix in order to save a few hours of
debugging from anyone who attempts to use it.
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 6 Jan 2010 17:10:22 +0000 (19:10 +0200)]
KVM: Set cr0.et when the guest writes cr0
Follow the hardware.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 30 Dec 2009 16:07:40 +0000 (18:07 +0200)]
KVM: VMX: Give the guest ownership of cr0.ts when the fpu is active
If the guest fpu is loaded, there is nothing interesing about cr0.ts; let
the guest play with it as it will. This makes context switches between fpu
intensive guest processes faster, as we won't trap the clts and cr0 write
instructions.
[marcelo: fix cr0 read shadow update on fpu deactivation; kills F8 install]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Wed, 30 Dec 2009 10:40:26 +0000 (12:40 +0200)]
KVM: Lazify fpu activation and deactivation
Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep
it loaded until the next heavyweight exit (where we are forced to unload it).
This reduces unnecessary exits.
We also defer fpu activation on clts; while clts signals the intent to use the
fpu, we can't be sure the guest will actually use it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 29 Dec 2009 16:43:06 +0000 (18:43 +0200)]
KVM: VMX: Allow the guest to own some cr0 bits
We will use this later to give the guest ownership of cr0.ts.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 29 Dec 2009 16:07:30 +0000 (18:07 +0200)]
KVM: Replace read accesses of vcpu->arch.cr0 by an accessor
Since we'd like to allow the guest to own a few bits of cr0 at times, we need
to know when we access those bits.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 29 Dec 2009 15:33:58 +0000 (17:33 +0200)]
KVM: VMX: trace clts and lmsw instructions as cr accesses
clts writes cr0.ts; lmsw writes cr0[0:15] - record that in ftrace.
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Sun, 10 Jan 2010 02:27:47 +0000 (03:27 +0100)]
KVM: PPC: Make large pages work
An SLB entry contains two pieces of information related to size:
1) PTE size
2) SLB size
The L bit defines the PTE be "large" (usually means 16MB),
SLB_VSID_B_1T defines that the SLB should span 1 GB instead of the
default 256MB.
Apparently I messed things up and just put those two in one box,
shaked it heavily and came up with the current code which handles
large pages incorrectly, because it also treats large page SLB entries
as "1TB" segment entries.
This patch splits those two features apart, making Linux guests boot
even when they have > 256MB.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Sun, 10 Jan 2010 02:27:32 +0000 (03:27 +0100)]
KVM: PPC: Pass through program interrupts
When we get a program interrupt in guest kernel mode, we try to emulate the
instruction.
If that doesn't fail, we report to the user and try again - at the exact same
instruction pointer. So if the guest kernel really does trigger an invalid
instruction, we loop forever.
So let's better go and forward program exceptions to the guest when we don't
know the instruction we're supposed to emulate.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:09 +0000 (02:58 +0100)]
KVM: PPC: Pass program interrupt flags to the guest
When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:08 +0000 (02:58 +0100)]
KVM: PPC: Fix HID5 setting code
The code to unset HID5.dcbz32 is broken.
This patch makes it do the right rotate magic.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 8 Jan 2010 01:58:07 +0000 (02:58 +0100)]
KVM: PPC: Emulate trap SRR1 flags properly
Book3S needs some flags in SRR1 to get to know details about an interrupt.
One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.
This patch implements above behavior, making WARN_ON behave like WARN_ON.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>