kernel/kernel-generic.git
16 years agoKVM: Move apic timer migration away from critical section
Avi Kivity [Wed, 16 Jan 2008 10:49:30 +0000 (12:49 +0200)]
KVM: Move apic timer migration away from critical section

Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port.  Move migration to the regular
vcpu execution path, triggered by a new bitflag.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
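
The deferral pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: the bit name REQ_MIGRATE_TIMER, the struct layout and the function names are hypothetical, and the "migration" is reduced to recording a cpu number.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bit; the commit only says "a new bitflag". */
#define REQ_MIGRATE_TIMER 0

struct vcpu {
    atomic_ulong requests;  /* pending work for the vcpu execution path */
    int timer_cpu;          /* cpu the (simulated) apic timer lives on */
};

/* Called from the critical section: only records that work is needed. */
static void vcpu_sched_in(struct vcpu *v)
{
    atomic_fetch_or(&v->requests, 1UL << REQ_MIGRATE_TIMER);
}

/* Atomically clear a request bit and report whether it was set. */
static bool test_and_clear_request(struct vcpu *v, int bit)
{
    unsigned long mask = 1UL << bit;
    return (atomic_fetch_and(&v->requests, ~mask) & mask) != 0;
}

/* Regular execution path: the expensive migration happens here. */
static void vcpu_enter_guest(struct vcpu *v, int this_cpu)
{
    assert(this_cpu >= 0);
    if (test_and_clear_request(v, REQ_MIGRATE_TIMER))
        v->timer_cpu = this_cpu;  /* stand-in for migrating the timer */
}
```

The critical section stays short because it only sets a bit; the costly step runs later, where sleeping and preemption are acceptable.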
16 years agoKVM: Put kvm_para.h include outside __KERNEL__
Glauber de Oliveira Costa [Tue, 15 Jan 2008 15:10:15 +0000 (13:10 -0200)]
KVM: Put kvm_para.h include outside __KERNEL__

kvm_para.h potentially contains definitions that are to be used by userspace,
so it should not be included inside the __KERNEL__ block. To protect its own
data structures, kvm_para.h already includes its own __KERNEL__ block.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Acked-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Fix unbounded preemption latency
Avi Kivity [Tue, 15 Jan 2008 16:27:32 +0000 (18:27 +0200)]
KVM: Fix unbounded preemption latency

When preparing to enter the guest, if an interrupt comes in while
preemption is disabled but interrupts are still enabled, we miss a
preemption point.  Fix by explicitly checking whether we need to
reschedule.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Initialize the mmu caches only after verifying cpu support
Avi Kivity [Sun, 13 Jan 2008 11:23:56 +0000 (13:23 +0200)]
KVM: Initialize the mmu caches only after verifying cpu support

Otherwise we re-initialize the mmu caches, which will fail since the
caches are already registered, which will cause us to deinitialize said caches.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Fix dirty page setting for pages removed from rmap
Izik Eidus [Sat, 12 Jan 2008 21:49:09 +0000 (23:49 +0200)]
KVM: MMU: Fix dirty page setting for pages removed from rmap

Right now rmap_remove won't set the page as dirty if the shadow pte
pointing to this page had write access and then became read-only.
This patch fixes that by setting the page as dirty when an spte changes
from write to read-only access.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move kvm_fpu to asm-x86/kvm.h
Christian Ehrhardt [Tue, 8 Jan 2008 07:04:50 +0000 (08:04 +0100)]
KVM: Portability: Move kvm_fpu to asm-x86/kvm.h

This patch moves kvm_fpu to asm-x86/kvm.h to allow every architecture to
define its own representation used for KVM_GET_FPU/KVM_SET_FPU.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: Only allow VMCALL/VMMCALL trapped by #UD
Sheng Yang [Wed, 2 Jan 2008 06:49:22 +0000 (14:49 +0800)]
KVM: x86 emulator: Only allow VMCALL/VMMCALL trapped by #UD

When executing a test program called "crashme", we found that the KVM guest
cannot survive more than ten seconds before encountering a kernel panic. The
basic concept of "crashme" is to generate random assembly code and try to
execute it.

After some fixes to the emulator's instruction validity checks, we found it
hard to get the current emulator to handle invalid instructions correctly,
because the #UD trap for hypercall patching caused trouble. The problem is
that if the opcode itself is valid, but the combination of opcode and
modrm_reg is invalid, and one operand of the opcode is in memory (SrcMem or
DstMem), the emulator fetches the memory operand first rather than checking
validity, and may encounter an error there. For example,
".byte 0xfe, 0x34, 0xcd" has this problem.

In the patch, we simply check whether the invalid opcode is vmcall/vmmcall;
if it is not, we return from emulate_instruction() and inject a #UD into the
guest. With the patch, the guest has been running for more than 12 hours.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Merge shadow level check in FNAME(fetch)
Dong, Eddie [Wed, 2 Jan 2008 06:29:08 +0000 (14:29 +0800)]
KVM: MMU: Merge shadow level check in FNAME(fetch)

Remove the redundant level check when fetching
shadow pte for present & non-present spte.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Move kvm_free_some_pages() into critical section
Avi Kivity [Mon, 31 Dec 2007 13:27:49 +0000 (15:27 +0200)]
KVM: MMU: Move kvm_free_some_pages() into critical section

If some other cpu steals mmu pages between our check and an attempt to
allocate, we can run out of mmu pages.  Fix by moving the check into the
same critical section as the allocation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
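
A minimal sketch of why the check and the allocation need to share one critical section. Everything here is hypothetical (the pool struct, the spinlock built from atomic_flag, the function names); it is not the kernel code, only the locking shape the message describes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct page_pool {
    atomic_flag lock;   /* stand-in for the lock protecting mmu pages */
    int free_pages;
    int reclaimable;    /* pages that could be freed on demand */
};

static void pool_lock(struct page_pool *p)
{
    while (atomic_flag_test_and_set(&p->lock))
        ;  /* spin until the lock is released */
}

static void pool_unlock(struct page_pool *p)
{
    atomic_flag_clear(&p->lock);
}

/* Free pages until at least min_free are available. Caller holds the lock. */
static void free_some_pages(struct page_pool *p, int min_free)
{
    while (p->free_pages < min_free && p->reclaimable > 0) {
        p->reclaimable--;
        p->free_pages++;
    }
}

/* Check and allocate under the same lock, so no other cpu can take
 * the page between the two steps. */
static bool alloc_page_checked(struct page_pool *p, int min_free)
{
    bool ok = false;

    pool_lock(p);
    free_some_pages(p, min_free);
    if (p->free_pages > 0) {
        p->free_pages--;
        ok = true;
    }
    pool_unlock(p);
    return ok;
}
```

If free_some_pages() ran before pool_lock(), another thread could consume the freed page in the gap, which is the failure the commit message describes.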
16 years agoKVM: MMU: Switch to mmu spinlock
Marcelo Tosatti [Fri, 21 Dec 2007 00:18:26 +0000 (19:18 -0500)]
KVM: MMU: Switch to mmu spinlock

Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.

Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte()
Avi Kivity [Sun, 30 Dec 2007 10:29:05 +0000 (12:29 +0200)]
KVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte()

Since gfn_to_page() is a sleeping function, and we want to make the core mmu
spinlocked, we need to pass the page from the walker context (which can sleep)
to the shadow context (which cannot).

[marcelo: avoid recursive locking of mmap_sem]

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Add kvm_read_guest_atomic()
Marcelo Tosatti [Fri, 21 Dec 2007 00:18:23 +0000 (19:18 -0500)]
KVM: Add kvm_read_guest_atomic()

In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Concurrent guest walkers
Marcelo Tosatti [Fri, 21 Dec 2007 00:18:22 +0000 (19:18 -0500)]
KVM: MMU: Concurrent guest walkers

Do not hold kvm->lock mutex across the entire pagefault code,
only acquire it in places where it is necessary, such as mmu
hash list, active list, rmap and parent pte handling.

Allow concurrent guest walkers by switching walk_addr() to use
mmap_sem in read-mode.

And get rid of the lockless __gfn_to_page.

[avi: move kvm_mmu_pte_write() locking inside the function]
[avi: add locking for real mode]
[avi: fix cmpxchg locking]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Disable vapic support on Intel machines with FlexPriority
Avi Kivity [Wed, 26 Dec 2007 11:57:04 +0000 (13:57 +0200)]
KVM: Disable vapic support on Intel machines with FlexPriority

FlexPriority accelerates the tpr without any patching.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Accelerated apic support
Avi Kivity [Thu, 25 Oct 2007 14:52:32 +0000 (16:52 +0200)]
KVM: Accelerated apic support

This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).

Signed-off-by: Avi Kivity <avi@qumranet.com>
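
The condition in the parenthesis above can be written down as a small predicate. This is a sketch, not the actual guest/host protocol: the function name is hypothetical, and it assumes the usual x86 convention that an interrupt's priority class is the upper four bits of its vector and that it is deliverable only when that class exceeds the TPR's class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a guest tpr write must be reported to the host.
 * highest_pending_vector is -1 when nothing is pending.
 */
static bool tpr_write_needs_exit(int highest_pending_vector, uint8_t new_tpr)
{
    if (highest_pending_vector < 0)
        return false;  /* nothing pending: lowering the tpr unmasks nothing */

    /* Compare priority classes (bits 7:4 of vector and tpr). */
    return (highest_pending_vector >> 4) > (new_tpr >> 4);
}
```

For example, with vector 0x51 pending, writing a tpr of 0x50 keeps it masked and needs no exit, while writing 0x40 unmasks it and does.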
16 years agoKVM: local APIC TPR access reporting facility
Avi Kivity [Mon, 22 Oct 2007 14:50:39 +0000 (16:50 +0200)]
KVM: local APIC TPR access reporting facility

Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel.  This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Print data for unimplemented wrmsr
Avi Kivity [Wed, 19 Dec 2007 10:02:40 +0000 (12:02 +0200)]
KVM: Print data for unimplemented wrmsr

This can help diagnose what the guest is trying to do.  In many cases
we can get away with partial emulation of msrs.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Add cache miss statistic
Avi Kivity [Tue, 18 Dec 2007 17:47:18 +0000 (19:47 +0200)]
KVM: MMU: Add cache miss statistic

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Coalesce remote tlb flushes
Eddie Dong [Mon, 17 Dec 2007 22:08:27 +0000 (06:08 +0800)]
KVM: MMU: Coalesce remote tlb flushes

Host side TLB flush can be merged together if multiple
spte need to be write-protected.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
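
A sketch of the coalescing idea, with hypothetical names: instead of flushing after each write-protected spte, remember that a flush is needed and issue one at the end. remote_tlb_flush() is a counter here, standing in for the real cross-cpu flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_WRITABLE (1ULL << 1)

static int flush_count;  /* counts simulated remote flushes */

/* Hypothetical stand-in for the expensive cross-cpu TLB flush. */
static void remote_tlb_flush(void)
{
    flush_count++;
}

/* Write-protect every spte in the array, flushing at most once. */
static void write_protect_all(uint64_t *sptes, size_t n)
{
    bool need_flush = false;

    for (size_t i = 0; i < n; i++) {
        if (sptes[i] & PTE_WRITABLE) {
            sptes[i] &= ~PTE_WRITABLE;
            need_flush = true;  /* defer the flush */
        }
    }
    if (need_flush)
        remote_tlb_flush();
}
```

A pass that changes nothing issues no flush at all, and a pass that changes many sptes still issues only one.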
16 years agoKVM: Expose ioapic to ia64 save/restore APIs
Zhang Xiantao [Mon, 17 Dec 2007 12:27:27 +0000 (20:27 +0800)]
KVM: Expose ioapic to ia64 save/restore APIs

IA64 also needs to see ioapic structure in irqchip.

Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Move kvm_vcpu_kick() to x86.c
Zhang Xiantao [Mon, 17 Dec 2007 06:21:40 +0000 (14:21 +0800)]
KVM: Move kvm_vcpu_kick() to x86.c

Move kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declaration in <linux/kvm_host.h>.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Move ioapic code to common directory.
Zhang Xiantao [Mon, 17 Dec 2007 06:16:14 +0000 (14:16 +0800)]
KVM: Move ioapic code to common directory.

Move ioapic code to common, since IA64 also needs it.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Move irqchip declarations into new ioapic.h and lapic.h
Zhang Xiantao [Mon, 17 Dec 2007 05:59:56 +0000 (13:59 +0800)]
KVM: Move irqchip declarations into new ioapic.h and lapic.h

This allows reuse of ioapic in ia64.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Move drivers/kvm/* to virt/kvm/
Avi Kivity [Sun, 16 Dec 2007 09:13:16 +0000 (11:13 +0200)]
KVM: Move drivers/kvm/* to virt/kvm/

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Move arch dependent files to new directory arch/x86/kvm/
Avi Kivity [Sun, 16 Dec 2007 09:02:48 +0000 (11:02 +0200)]
KVM: Move arch dependent files to new directory arch/x86/kvm/

This paves the way for multiple architecture support.  Note that while
ioapic.c could potentially be shared with ia64, it is also moved.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: VMX: Add printk_ratelimit in vmx_intr_assist
Ryan Harper [Thu, 13 Dec 2007 16:21:10 +0000 (10:21 -0600)]
KVM: VMX: Add printk_ratelimit in vmx_intr_assist

Add a printk_ratelimit check in front of printk.  This prevents spamming
of the message during a 32-bit Ubuntu 6.06 server install.  Previously, it
would hang during the partition formatting stage.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
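
To illustrate the guard pattern, here is a simple fixed-window rate limiter in userspace C. It is an assumption-laden sketch: the struct and function names are hypothetical, time is passed in explicitly, and the kernel's printk_ratelimit() uses its own policy rather than this one.

```c
#include <assert.h>
#include <stdbool.h>

struct ratelimit {
    long interval;      /* window length, in arbitrary time units */
    int burst;          /* messages allowed per window */
    long window_start;  /* start of the current window */
    int printed;        /* messages emitted in the current window */
};

/* Return true if a message may be printed at time `now`. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;  /* open a new window */
        rl->printed = 0;
    }
    if (rl->printed >= rl->burst)
        return false;            /* suppress: budget exhausted */
    rl->printed++;
    return true;
}
```

A caller would wrap the noisy message as `if (ratelimit_ok(&rl, now)) print(...)`, so a message that fires on every interrupt cannot stall the install by flooding the console.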
16 years agoKVM: Portability: Move kvm_vm_stat to x86.h
Zhang Xiantao [Fri, 14 Dec 2007 02:23:23 +0000 (10:23 +0800)]
KVM: Portability: Move kvm_vm_stat to x86.h

This patch moves kvm_vm_stat to x86.h, and every arch
can define its own kvm_vm_stat in $arch.h

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move round_robin_prev_vcpu and tss_addr to kvm_arch
Zhang Xiantao [Fri, 14 Dec 2007 02:20:16 +0000 (10:20 +0800)]
KVM: Portability: Move round_robin_prev_vcpu and tss_addr to kvm_arch

This patch moves two fields, round_robin_prev_vcpu and tss_addr, to kvm_arch.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: move vpic and vioapic to kvm_arch
Zhang Xiantao [Fri, 14 Dec 2007 02:17:34 +0000 (10:17 +0800)]
KVM: Portability: move vpic and vioapic to kvm_arch

This patch moves two fields, vpic and vioapic, to kvm_arch.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move mmu-related fields to kvm_arch
Zhang Xiantao [Fri, 14 Dec 2007 02:01:48 +0000 (10:01 +0800)]
KVM: Portability: Move mmu-related fields to kvm_arch

This patch moves mmu-related fields to kvm_arch.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move memslot aliases to new struct kvm_arch
Zhang Xiantao [Fri, 14 Dec 2007 01:54:20 +0000 (09:54 +0800)]
KVM: Portability: Move memslot aliases to new struct kvm_arch

This patch creates kvm_arch to hold arch-specific kvm fields
and moves the fields naliases and aliases to kvm_arch.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move kvm_vcpu_stat to x86.h
Zhang Xiantao [Fri, 14 Dec 2007 01:49:26 +0000 (09:49 +0800)]
KVM: Portability: Move kvm_vcpu_stat to x86.h

This patch moves kvm_vcpu_stat to x86.h, so every
arch can define its own kvm_vcpu_stat structure.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Expand the KVM_VCPU_COMM in kvm_vcpu structure.
Zhang Xiantao [Fri, 14 Dec 2007 01:45:31 +0000 (09:45 +0800)]
KVM: Portability: Expand the KVM_VCPU_COMM in kvm_vcpu structure.

This patch removes the KVM_VCPU_COMM macro, which originally held the
common kvm_vcpu fields.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move kvm_vcpu definition back to kvm.h
Zhang Xiantao [Fri, 14 Dec 2007 01:41:22 +0000 (09:41 +0800)]
KVM: Portability: Move kvm_vcpu definition back to kvm.h

This patch moves the kvm_vcpu definition to kvm.h, and finally
kvm.h includes x86.h.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Split mmu-related static inline functions to mmu.h
Zhang Xiantao [Fri, 14 Dec 2007 01:35:10 +0000 (09:35 +0800)]
KVM: Portability: Split mmu-related static inline functions to mmu.h

Since these functions need to know the details of the kvm or kvm_vcpu structure,
they can't be put in x86.h.  Create mmu.h to hold them.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Introduce kvm_vcpu_arch
Zhang Xiantao [Thu, 13 Dec 2007 15:50:52 +0000 (23:50 +0800)]
KVM: Portability: Introduce kvm_vcpu_arch

Move all the architecture-specific fields in kvm_vcpu into a new struct
kvm_vcpu_arch.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move kvm{pic,ioapic} accesors to x86 specific code
Zhang Xiantao [Tue, 11 Dec 2007 12:36:00 +0000 (20:36 +0800)]
KVM: Portability: Move kvm{pic,ioapic} accesors to x86 specific code

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: emulated cmpxchg8b should be atomic on i386
Marcelo Tosatti [Wed, 12 Dec 2007 15:46:12 +0000 (10:46 -0500)]
KVM: MMU: emulated cmpxchg8b should be atomic on i386

Emulate cmpxchg8b atomically on i386. This is required to prevent a guest
pte walker from seeing a split write.

[avi: make it compile]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: support writing 0 to K8 performance counter control registers
Joerg Roedel [Tue, 11 Dec 2007 14:36:57 +0000 (15:36 +0100)]
KVM: SVM: support writing 0 to K8 performance counter control registers

This lets SVM ignore writes of the value 0 to the performance counter control
registers.  Thus enabling them will still fail in the guest, but a write of 0
which keeps them disabled is accepted.  This is required to boot Windows
Vista 64bit.

[avi: avoid fall-thru in switch statement]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
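
The accept-zero policy can be sketched as a switch over the MSR index. The function name is hypothetical and the handler is reduced to its return code. The MSR addresses are the K7/K8 event-select registers quoted from memory, so treat them as an assumption and check them against the kernel's MSR definitions.

```c
#include <assert.h>
#include <stdint.h>

/* K7/K8 event-select (performance counter control) MSRs.
 * Addresses quoted from memory: an assumption, verify before use. */
#define MSR_K7_EVNTSEL0 0xc0010000u
#define MSR_K7_EVNTSEL1 0xc0010001u
#define MSR_K7_EVNTSEL2 0xc0010002u
#define MSR_K7_EVNTSEL3 0xc0010003u

/* Return 0 if the write is accepted, nonzero if it must fail in the guest. */
static int set_msr_sketch(uint32_t msr, uint64_t data)
{
    switch (msr) {
    case MSR_K7_EVNTSEL0:
    case MSR_K7_EVNTSEL1:
    case MSR_K7_EVNTSEL2:
    case MSR_K7_EVNTSEL3:
        if (data != 0)
            return 1;  /* enabling a counter is still unsupported */
        return 0;      /* writing 0 keeps it disabled: accept silently */
    default:
        return 1;      /* other MSRs are out of scope for this sketch */
    }
}
```

Each case returns explicitly instead of falling through.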
16 years agoKVM: LAPIC: minor debugging compile fix
Joerg Roedel [Wed, 12 Dec 2007 11:37:24 +0000 (12:37 +0100)]
KVM: LAPIC: minor debugging compile fix

This patch fixes a compile error of the LAPIC code with APIC debugging enabled.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Fix SMP shadow instantiation race
Marcelo Tosatti [Wed, 12 Dec 2007 00:12:27 +0000 (19:12 -0500)]
KVM: MMU: Fix SMP shadow instantiation race

There is a race where VCPU0 is shadowing a pagetable entry while VCPU1
is updating it, which results in a stale shadow copy.

Fix that by comparing the contents of the cached guest pte with the
current guest pte after write-protecting the guest pagetable.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
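
The ordering of the fix (write-protect first, then re-check the cached value) can be sketched as follows. The struct, the function name and the trivial gpte-to-spte mapping are hypothetical simplifications; only the order of the two steps reflects the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GPT_ENTRIES 512

struct guest_table {
    uint64_t entries[GPT_ENTRIES];
    bool write_protected;  /* once set, guest writes are trapped */
};

/*
 * Try to instantiate a shadow entry from a gpte value read earlier.
 * Returns false if the guest changed the entry in the meantime, in
 * which case the caller should re-read the gpte and retry.
 */
static bool shadow_entry(struct guest_table *gt, unsigned idx,
                         uint64_t cached_gpte, uint64_t *spte_out)
{
    assert(idx < GPT_ENTRIES);

    /* Step 1: write-protect, so later guest updates are trapped. */
    gt->write_protected = true;

    /* Step 2: only now compare; an update that raced with us shows up here. */
    if (gt->entries[idx] != cached_gpte)
        return false;

    *spte_out = cached_gpte;  /* simplified: real code derives the spte */
    return true;
}
```

Comparing before write-protecting would leave a window in which the other vcpu's update goes unnoticed.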
16 years agoKVM: SVM: Exit to userspace if write to cr8 and not using in-kernel apic
Joerg Roedel [Thu, 6 Dec 2007 20:02:25 +0000 (21:02 +0100)]
KVM: SVM: Exit to userspace if write to cr8 and not using in-kernel apic

With this patch KVM on SVM will exit to userspace if the guest writes to CR8
and the in-kernel APIC is disabled.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Use mmu_set_spte() for real-mode shadows
Avi Kivity [Sun, 9 Dec 2007 16:43:00 +0000 (18:43 +0200)]
KVM: MMU: Use mmu_set_spte() for real-mode shadows

In addition to removing some duplicated code, this also handles the unlikely
case of real-mode code updating a guest page table.  This can happen when
one vcpu (in real mode) touches a second vcpu's (in protected mode) page
tables, or if a vcpu switches to real mode, touches page tables, and switches
back.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Adjust mmu_set_spte() debug code for gpte removal
Avi Kivity [Sun, 9 Dec 2007 16:39:41 +0000 (18:39 +0200)]
KVM: MMU: Adjust mmu_set_spte() debug code for gpte removal

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Move set_pte() into guest paging mode independent code
Avi Kivity [Sun, 9 Dec 2007 15:40:31 +0000 (17:40 +0200)]
KVM: MMU: Move set_pte() into guest paging mode independent code

As set_pte() no longer references either a gpte or the guest walker, we can
move it out of paging mode dependent code (which compiles twice and is
generally nasty).

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Remove walker argument to set_pte()
Avi Kivity [Sun, 9 Dec 2007 15:33:46 +0000 (17:33 +0200)]
KVM: MMU: Remove walker argument to set_pte()

Unused.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Pass pte dirty flag to set_pte() instead of calculating it on-site
Avi Kivity [Sun, 9 Dec 2007 15:32:30 +0000 (17:32 +0200)]
KVM: MMU: Pass pte dirty flag to set_pte() instead of calculating it on-site

This allows us to remove its dependency on pt_element_t.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: No need to pick up nx bit from guest pte
Avi Kivity [Sun, 9 Dec 2007 15:27:52 +0000 (17:27 +0200)]
KVM: MMU: No need to pick up nx bit from guest pte

We already set it according to cumulative access permissions.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Fix inherited permissions for emulated guest pte updates
Avi Kivity [Sun, 9 Dec 2007 15:00:02 +0000 (17:00 +0200)]
KVM: MMU: Fix inherited permissions for emulated guest pte updates

When we emulate a guest pte write, we fail to apply the correct inherited
permissions from the parent ptes.  Now that we store inherited permissions
in the shadow page, we can use that to update the pte permissions correctly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Move pte access calculation into a helper function
Avi Kivity [Sun, 9 Dec 2007 14:52:56 +0000 (16:52 +0200)]
KVM: MMU: Move pte access calculation into a helper function

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Set nx bit correctly on shadow ptes
Avi Kivity [Sun, 9 Dec 2007 14:37:36 +0000 (16:37 +0200)]
KVM: MMU: Set nx bit correctly on shadow ptes

While the page table walker correctly generates a guest page fault
if a guest tries to execute a non-executable page, the shadow code does
not mark it non-executable.  This means that if a guest accesses an nx
page first with a read access, then subsequent code fetch accesses will
succeed.

Fix by setting the nx bit on shadow ptes.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Simplify calculation of pte access
Avi Kivity [Sun, 9 Dec 2007 14:15:46 +0000 (16:15 +0200)]
KVM: MMU: Simplify calculation of pte access

The nx bit is awkwardly placed in the 63rd bit position; furthermore it
has a reversed meaning compared to the other bits, which means we can't use
a bitwise and to calculate compounded access masks.

So, we simplify things by creating a new 3-bit exec/write/user access word,
and doing all calculations in that.

Signed-off-by: Avi Kivity <avi@qumranet.com>
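
A sketch of the 3-bit access word. The ACC_* names and helper functions are illustrative assumptions; the pte bit positions (writable = bit 1, user = bit 2, nx = bit 63) are the architectural x86 ones. Once nx is flipped into a positive "exec" bit, cumulative permissions along a walk are a plain bitwise AND.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 pte bits. */
#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_NX       (1ULL << 63)

/* Hypothetical 3-bit access word: every bit means "allowed". */
#define ACC_EXEC  1u
#define ACC_WRITE 2u
#define ACC_USER  4u
#define ACC_ALL   (ACC_EXEC | ACC_WRITE | ACC_USER)

/* Translate one pte into the access word, inverting nx into exec. */
static unsigned pte_access(uint64_t pte)
{
    unsigned access = 0;

    if (pte & PTE_WRITABLE)
        access |= ACC_WRITE;
    if (pte & PTE_USER)
        access |= ACC_USER;
    if (!(pte & PTE_NX))
        access |= ACC_EXEC;
    return access;
}

/* Cumulative permissions of a walk: AND over all levels. */
static unsigned walk_access(const uint64_t *ptes, int levels)
{
    unsigned access = ACC_ALL;

    assert(levels > 0);
    for (int i = 0; i < levels; i++)
        access &= pte_access(ptes[i]);
    return access;
}
```

With the raw pte bits the same computation would need an AND for writable/user and an OR for nx, which is the awkwardness the message refers to.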
16 years agoKVM: MMU: Use cmpxchg for pte updates on walk_addr()
Marcelo Tosatti [Fri, 7 Dec 2007 12:56:58 +0000 (07:56 -0500)]
KVM: MMU: Use cmpxchg for pte updates on walk_addr()

In preparation for multi-threaded guest pte walking, use cmpxchg()
when updating guest pte's. This guarantees that the assignment of the
dirty bit can't be lost if two CPU's are faulting the same address
simultaneously.

[avi: fix kunmap_atomic() parameters]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
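
The lost-update problem and its cmpxchg fix can be illustrated with C11 atomics in userspace. This is a sketch with hypothetical function names, not the walk_addr() code; the accessed and dirty bit positions (5 and 6) are the architectural x86 ones.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)

/*
 * Set `bits` only if the pte still holds the value we walked with.
 * Returns false when another cpu changed it first; the caller can
 * then redo its walk with the fresh value.
 */
static bool pte_set_bits(_Atomic uint64_t *ptep, uint64_t seen, uint64_t bits)
{
    return atomic_compare_exchange_strong(ptep, &seen, seen | bits);
}

/* Variant that retries until the bits are set; returns the new value. */
static uint64_t pte_mark(_Atomic uint64_t *ptep, uint64_t bits)
{
    uint64_t old = atomic_load(ptep);

    /* On failure `old` is refreshed with the current value. */
    while (!atomic_compare_exchange_weak(ptep, &old, old | bits))
        ;
    return old | bits;
}
```

A plain read-modify-write (`*ptep = seen | bits`) would overwrite whatever the other cpu stored in between, which is how a dirty bit could get lost.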
16 years agoKVM: SVM: Trap access to the cr8 register
Avi Kivity [Thu, 6 Dec 2007 17:50:00 +0000 (19:50 +0200)]
KVM: SVM: Trap access to the cr8 register

Later we may be able to use the virtual tpr feature, but for now,
just trap it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: Fix stack instructions on 64-bit mode
Avi Kivity [Thu, 6 Dec 2007 16:14:14 +0000 (18:14 +0200)]
KVM: x86 emulator: Fix stack instructions on 64-bit mode

Stack instructions are always 64-bit on 64-bit mode; many of the
emulated stack instructions did not take that into account.  Fix by
adding a 'Stack' bitflag and setting the operand size appropriately
during the decode stage (except for 'push r/m', which is in a group
with a few other instructions, so it gets its own treatment).

This fixes random crashes on Vista x64.

Signed-off-by: Avi Kivity <avi@qumranet.com>
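
A sketch of how a decode-stage flag can force the operand size. The flag values, the mode enum and the function are hypothetical; only the 'Stack' flag name and the rule stated above (stack instructions are 64-bit in 64-bit mode) come from the message.

```c
#include <assert.h>

/* Hypothetical per-opcode decode flags. */
#define ByteOp (1u << 0)  /* 8-bit operands */
#define Stack  (1u << 1)  /* instruction implicitly uses the stack */

enum cpu_mode { MODE_PROT32, MODE_PROT64 };

/* Pick the operand size in bytes during decode. */
static int operand_bytes(enum cpu_mode mode, unsigned flags, int default_bytes)
{
    if (flags & ByteOp)
        return 1;
    if (mode == MODE_PROT64 && (flags & Stack))
        return 8;  /* stack operations ignore the 32-bit default */
    return default_bytes;
}
```

Deciding this once in the decode stage means the individual push/pop handlers do not each need their own 64-bit special case.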
16 years agoKVM: SVM: Emulate read/write access to cr8
Joerg Roedel [Thu, 6 Dec 2007 14:46:52 +0000 (15:46 +0100)]
KVM: SVM: Emulate read/write access to cr8

This patch adds code to emulate the access to the cr8 register to the x86
instruction emulator in kvm.  This is needed on svm, where there is no
hardware decode for control register access.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: VMX: Avoid exit when setting cr8 if the local apic is in the kernel
Avi Kivity [Thu, 6 Dec 2007 14:32:45 +0000 (16:32 +0200)]
KVM: VMX: Avoid exit when setting cr8 if the local apic is in the kernel

With apic in userspace, we must exit to userspace after a cr8 write in order
to update the tpr.  But if the apic is in the kernel, the exit is unnecessary.

Noticed by Joerg Roedel.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: fix eflags preparation for emulation
Avi Kivity [Thu, 6 Dec 2007 14:15:02 +0000 (16:15 +0200)]
KVM: x86 emulator: fix eflags preparation for emulation

We prepare eflags for the emulated instruction, then clobber it with an 'andl'.
Fix by popping eflags as the last thing in the sequence.

Patch taken from Xen (16143:959b4b92b6bf)

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Use generalized exception queue for injecting #UD
Avi Kivity [Sun, 25 Nov 2007 13:22:50 +0000 (15:22 +0200)]
KVM: Use generalized exception queue for injecting #UD

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Replace #GP injection by the generalized exception queue
Avi Kivity [Sun, 25 Nov 2007 12:12:03 +0000 (14:12 +0200)]
KVM: Replace #GP injection by the generalized exception queue

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Replace page fault injection by the generalized exception queue
Avi Kivity [Sun, 25 Nov 2007 12:04:58 +0000 (14:04 +0200)]
KVM: Replace page fault injection by the generalized exception queue

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Generalize exception injection mechanism
Avi Kivity [Sun, 25 Nov 2007 11:41:11 +0000 (13:41 +0200)]
KVM: Generalize exception injection mechanism

Instead of each subarch doing its own thing, add an API for queuing an
injection, and manage failed exception injection centrally (i.e., if
an injection failed due to a shadow page fault, we need to requeue it).

Signed-off-by: Avi Kivity <avi@qumranet.com>
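
One way to picture such a queue is a single pending slot that is cleared only after a successful injection. All names below are hypothetical and the arch backend is replaced by two stub functions; the real mechanism detects failure on the next exit rather than through a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct exception_queue {
    bool pending;
    bool has_error_code;
    uint8_t vector;
    uint32_t error_code;
};

/* Queue an exception without an error code (e.g. #UD, vector 6). */
static void queue_exception(struct exception_queue *q, uint8_t vector)
{
    q->pending = true;
    q->has_error_code = false;
    q->vector = vector;
    q->error_code = 0;
}

/* Queue an exception with an error code (e.g. #PF, vector 14). */
static void queue_exception_e(struct exception_queue *q, uint8_t vector,
                              uint32_t error_code)
{
    q->pending = true;
    q->has_error_code = true;
    q->vector = vector;
    q->error_code = error_code;
}

/* Arch backend hook: returns true if the injection went through. */
typedef bool (*inject_fn)(const struct exception_queue *q);

/* Stub backends for illustration only. */
static bool inject_ok(const struct exception_queue *q) { (void)q; return true; }
static bool inject_faults(const struct exception_queue *q) { (void)q; return false; }

/* Common code: deliver the pending exception, keep it queued on failure. */
static void deliver_pending(struct exception_queue *q, inject_fn inject)
{
    if (!q->pending)
        return;
    if (inject(q))
        q->pending = false;
    /* On failure the entry stays queued, so the next guest entry retries. */
}
```

Because the retry logic lives in deliver_pending(), each backend only has to report success or failure.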
16 years agoKVM: MMU: Remove unused prev_shadow_ent variable from fetch()
Marcelo Tosatti [Tue, 4 Dec 2007 18:42:16 +0000 (13:42 -0500)]
KVM: MMU: Remove unused prev_shadow_ent variable from fetch()

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Convert KVM from ->nopage() to ->fault()
npiggin@suse.de [Wed, 5 Dec 2007 07:15:52 +0000 (18:15 +1100)]
KVM: Convert KVM from ->nopage() to ->fault()

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: kvm-devel@lists.sourceforge.net
Cc: avi@qumranet.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Create kvm_arch_vcpu_runnable() function
Hollis Blanchard [Mon, 3 Dec 2007 22:15:26 +0000 (16:15 -0600)]
KVM: Portability: Create kvm_arch_vcpu_runnable() function

This abstracts the detail of x86 hlt and INIT modes into a function.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Stop including x86-specific headers in kvm_main.c
Hollis Blanchard [Mon, 3 Dec 2007 21:30:25 +0000 (15:30 -0600)]
KVM: Portability: Stop including x86-specific headers in kvm_main.c

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move IO device definitions to its own header file
Hollis Blanchard [Mon, 3 Dec 2007 21:30:24 +0000 (15:30 -0600)]
KVM: Portability: Move IO device definitions to its own header file

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move address types to their own header file
Hollis Blanchard [Mon, 3 Dec 2007 21:30:23 +0000 (15:30 -0600)]
KVM: Portability: Move address types to their own header file

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Extend ioapic code to support iosapic
Zhang Xiantao [Sun, 2 Dec 2007 14:53:07 +0000 (22:53 +0800)]
KVM: Extend ioapic code to support iosapic

iosapic supports an additional mmio EOI register compared to ioapic.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Replace dest_Lowest_Prio and dest_Fixed with self-defined macros
Zhang Xiantao [Sun, 2 Dec 2007 14:49:09 +0000 (22:49 +0800)]
KVM: Replace dest_Lowest_Prio and dest_Fixed with self-defined macros

Change
  dest_Lowest_Prio -> IOAPIC_LOWEST_PRIORITY
  dest_Fixed -> IOAPIC_FIXED

The original names are x86 specific, while the ioapic code will be reused
for ia64.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Replace kvm_lapic with kvm_vcpu in ioapic/lapic interface
Zhang Xiantao [Sun, 2 Dec 2007 14:35:57 +0000 (22:35 +0800)]
KVM: Replace kvm_lapic with kvm_vcpu in ioapic/lapic interface

This patch replaces lapic structure with kvm_vcpu in ioapic.c, making ioapic
independent of the local apic, as required by ia64.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: SVM: Remove KVM specific defines for MSR_EFER
Carlo Marcelo Arenas Belon [Sat, 1 Dec 2007 12:17:11 +0000 (06:17 -0600)]
KVM: SVM: Remove KVM specific defines for MSR_EFER

This patch removes the KVM specific defines for MSR_EFER that were being used
in the svm support file and migrates all references to use instead the ones
from the kernel headers that are used everywhere else and that have the same
values.

Signed-off-by: Carlo Marcelo Arenas Belon <carenas@sajinet.com.pe>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Export include/linux/kvm.h only if $ARCH actually supports KVM
Avi Kivity [Sun, 2 Dec 2007 08:50:06 +0000 (10:50 +0200)]
KVM: Export include/linux/kvm.h only if $ARCH actually supports KVM

Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
includes, not existing.  Rather than add a zillion <asm/kvm.h>s, export kvm.h
only if the arch actually supports it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Correct kvm_init() error paths not freeing bad_pge.
Zhang Xiantao [Thu, 29 Nov 2007 07:35:39 +0000 (15:35 +0800)]
KVM: Correct kvm_init() error paths not freeing bad_pge.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Portability: Move KVM_INTERRUPT vcpu ioctl to x86.c
Zhang Xiantao [Tue, 20 Nov 2007 20:36:41 +0000 (04:36 +0800)]
KVM: Portability: Move KVM_INTERRUPT vcpu ioctl to x86.c

Other archs don't need it.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: unify four switch statements into two
Avi Kivity [Tue, 27 Nov 2007 17:30:56 +0000 (19:30 +0200)]
KVM: x86 emulator: unify four switch statements into two

Unify the special instruction switch with the regular instruction switch,
and the two byte special instruction switch with the regular two byte
instruction switch.  That makes it much easier to find an instruction or
the place an instruction needs to be added in.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: unify two switches
Avi Kivity [Tue, 27 Nov 2007 17:14:21 +0000 (19:14 +0200)]
KVM: x86 emulator: unify two switches

The rep prefix cleanup left two switch () statements next to each other.
Unify them.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: Move rep processing before instruction execution
Avi Kivity [Tue, 27 Nov 2007 17:05:37 +0000 (19:05 +0200)]
KVM: x86 emulator: Move rep processing before instruction execution

Currently rep processing is handled somewhere in the middle of instruction
processing.  Move it to a sensible place.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Add ifdef in irqchip struct for x86 only structures
Jerone Young [Mon, 26 Nov 2007 14:33:53 +0000 (08:33 -0600)]
KVM: Add ifdef in irqchip struct for x86 only structures

This patch fixes a small issue where the structures:
kvm_pic_state
kvm_ioapic_state

are defined inside x86-specific code and may or may not
be defined in any way for other architectures. As a result,
one cannot compile userspace apps (e.g. libkvm)
for other archs, since a size cannot be determined for these
structures.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
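
A minimal sketch of the ifdef pattern the irqchip commit above describes,
assuming a union that reserves a fixed amount of space and only names the
x86-only members when building for x86. The names irqchip_sketch,
pic_state_sketch and SKETCH_ARCH_X86 are hypothetical and are not taken from
the patch itself.

```c
#include <stdint.h>

/* Hypothetical x86-only state; on other architectures the type does not exist. */
#ifdef SKETCH_ARCH_X86
struct pic_state_sketch {
    uint8_t irr, imr, isr;
};
#endif

struct irqchip_sketch {
    uint32_t chip_id;
    uint32_t pad;
    union {
        char dummy[512];              /* reserves space on every architecture */
#ifdef SKETCH_ARCH_X86
        struct pic_state_sketch pic;  /* only visible where it is defined */
#endif
    } chip;
};
```

Because the reserved array fixes the size of the union, userspace can compute
sizeof(struct irqchip_sketch) whether or not the x86-only member is present.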
16 years agoKVM: x86 emulator: cmps instruction
Guillaume Thouvenin [Mon, 26 Nov 2007 12:49:09 +0000 (13:49 +0100)]
KVM: x86 emulator: cmps instruction

Add emulation for the cmps instruction.  This lets OpenBSD boot on kvm.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: Rename 'cr2' to 'memop'
Sheng Yang [Fri, 16 Nov 2007 08:29:15 +0000 (16:29 +0800)]
KVM: x86 emulator: Rename 'cr2' to 'memop'

Previous patches have removed the dependency on cr2; we can now stop passing
it to the emulator and rename uses to 'memop'.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: mark pages that were inserted to the shadow pages table as accessed
Izik Eidus [Mon, 26 Nov 2007 12:08:14 +0000 (14:08 +0200)]
KVM: MMU: mark pages that were inserted to the shadow pages table as accessed

Mark guest pages as accessed when removed from the shadow page tables for
better lru processing.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Remove misleading check for mmio during event injection
Avi Kivity [Sun, 25 Nov 2007 15:45:31 +0000 (17:45 +0200)]
KVM: Remove misleading check for mmio during event injection

mmio was already handled in kvm_arch_vcpu_ioctl_run(), so no need to check
again.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: x86 emulator: address size and operand size overrides are sticky
Avi Kivity [Thu, 22 Nov 2007 12:16:12 +0000 (14:16 +0200)]
KVM: x86 emulator: address size and operand size overrides are sticky

Current implementation is to toggle, which is incorrect.  Patch ported from
corresponding Xen code.

Signed-off-by: Avi Kivity <avi@qumranet.com>
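
The difference between toggling and sticky overrides in the commit above can
be sketched as follows, assuming a 32-bit code segment where the default
operand size is 4 bytes and the 0x66 prefix selects 2 bytes. The helper
op_bytes_after_prefixes() and its sticky parameter are hypothetical and exist
only to contrast the two behaviors.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: operand size in bytes after scanning the prefix
 * bytes of one instruction in a 32-bit code segment.
 *
 * sticky == false: every 0x66 flips between 4 and 2 bytes (toggling).
 * sticky == true:  any number of 0x66 prefixes selects 2 bytes.
 */
static unsigned op_bytes_after_prefixes(const uint8_t *prefixes, size_t n,
                                        bool sticky)
{
    const unsigned def_op_bytes = 4;
    unsigned op_bytes = def_op_bytes;

    for (size_t i = 0; i < n; i++) {
        if (prefixes[i] != 0x66)
            continue;
        if (sticky)
            op_bytes = def_op_bytes ^ 6;  /* always derived from the default */
        else
            op_bytes ^= 6;                /* derived from the current value */
    }
    return op_bytes;
}
```

With a single 0x66 prefix both variants give 2 bytes. They only diverge when
the prefix is repeated: the toggling variant flips back to 4 bytes, while the
sticky variant stays at 2.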
16 years agoKVM: x86 emulator: Make a distinction between repeat prefixes F3 and F2
Guillaume Thouvenin [Thu, 22 Nov 2007 10:32:09 +0000 (11:32 +0100)]
KVM: x86 emulator: Make a distinction between repeat prefixes F3 and F2

cmps and scas instructions accept repeat prefixes F3 and F2. So in
order to emulate those prefixed instructions we need to be able to know
if prefixes are REP/REPE/REPZ or REPNE/REPNZ. Currently kvm doesn't make
this distinction. This patch introduces this distinction.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
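
A minimal sketch of why the F3/F2 distinction in the commit above matters for
cmps and scas: the two prefixes terminate the repeat loop on opposite values
of the zero flag. The enum and function names are hypothetical and do not
reflect the emulator's actual data structures.

```c
#include <stdbool.h>
#include <stdint.h>

enum rep_kind_sketch { REP_NONE, REP_REPE, REP_REPNE };

/* Classify a prefix byte: F3 is REP/REPE/REPZ, F2 is REPNE/REPNZ. */
static enum rep_kind_sketch classify_rep_prefix(uint8_t byte)
{
    switch (byte) {
    case 0xf3:
        return REP_REPE;
    case 0xf2:
        return REP_REPNE;
    default:
        return REP_NONE;
    }
}

/*
 * Decide whether a cmps/scas loop runs another iteration.
 * remaining: count register value after this iteration's decrement.
 * zf:        zero flag produced by this iteration's comparison.
 */
static bool rep_continues(enum rep_kind_sketch kind, uint64_t remaining, bool zf)
{
    if (kind == REP_NONE || remaining == 0)
        return false;
    /* REPE repeats while equal (ZF set); REPNE repeats while not equal. */
    return kind == REP_REPE ? zf : !zf;
}
```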
16 years agoKVM: Portability: Move unalias_gfn to arch dependent file
Zhang Xiantao [Thu, 22 Nov 2007 03:20:33 +0000 (11:20 +0800)]
KVM: Portability: Move unalias_gfn to arch dependent file

Non-x86 archs don't need this mechanism. Move it to arch, and
keep its interface in common.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: VMX: Remove the secondary execute control dependency on irqchip
Sheng Yang [Wed, 21 Nov 2007 06:33:25 +0000 (14:33 +0800)]
KVM: VMX: Remove the secondary execute control dependency on irqchip

The state of SECONDARY_VM_EXEC_CONTROL shouldn't depend on the in-kernel IRQ
chip; this patch fixes this.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Enhance guest cpuid management
Dan Kenigsberg [Wed, 21 Nov 2007 15:10:04 +0000 (17:10 +0200)]
KVM: Enhance guest cpuid management

The current cpuid management suffers from several problems, which inhibit
passing through the host feature set to the guest:

 - No way to tell which features the host supports

  While some features can be supported with no changes to kvm, others
  need explicit support.  That means kvm needs to vet the feature set
  before it is passed to the guest.

 - No support for indexed or stateful cpuid entries

  Some cpuid entries depend on ecx as well as on eax, or on internal
  state in the processor (running cpuid multiple times with the same
  input returns different output).  The current cpuid machinery only
  supports keying on eax.

 - No support for save/restore/migrate

  The internal state above needs to be exposed to userspace so it can
  be saved or migrated.

This patch adds extended cpuid support by means of three new ioctls:

 - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
   supports

 - KVM_SET_CPUID2: sets the vcpu's cpuid table

 - KVM_GET_CPUID2: gets the vcpu's cpuid table, including hidden state

[avi: fix original KVM_SET_CPUID not removing nx on non-nx hosts as it did
      before]

Signed-off-by: Dan Kenigsberg <danken@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
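
The indexed lookup that the cpuid commit above asks for can be sketched as a
table keyed on both eax (the function) and, where it matters, ecx (the index).
This is an illustration under assumptions: cpuid_entry_sketch, its
index_significant field and find_cpuid_entry() are hypothetical names, not the
structures or ioctl payloads the patch defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; not the layout used by the patch. */
struct cpuid_entry_sketch {
    uint32_t function;        /* input eax */
    uint32_t index;           /* input ecx, used only if index_significant */
    bool index_significant;
    uint32_t eax, ebx, ecx, edx;
};

/* Return the entry matching (function, index), or NULL if there is none. */
static const struct cpuid_entry_sketch *
find_cpuid_entry(const struct cpuid_entry_sketch *table, size_t n,
                 uint32_t function, uint32_t index)
{
    for (size_t i = 0; i < n; i++) {
        const struct cpuid_entry_sketch *e = &table[i];

        if (e->function != function)
            continue;
        /* Entries keyed on eax alone match any ecx value. */
        if (!e->index_significant || e->index == index)
            return e;
    }
    return NULL;
}
```

A table keyed only on eax cannot hold more than one entry for a function such
as leaf 4, whose output depends on ecx; the extra key is what makes such
entries representable.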
16 years agoKVM: Disallow fork() and similar games when using a VM
Avi Kivity [Wed, 21 Nov 2007 14:41:05 +0000 (16:41 +0200)]
KVM: Disallow fork() and similar games when using a VM

We don't want the meaning of guest userspace changing under our feet.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Rename 'release_page'
Avi Kivity [Wed, 21 Nov 2007 13:32:41 +0000 (15:32 +0200)]
KVM: MMU: Rename 'release_page'

Rename the awkwardly named variable.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Rename variables of type 'struct kvm_mmu_page *'
Avi Kivity [Wed, 21 Nov 2007 13:28:32 +0000 (15:28 +0200)]
KVM: MMU: Rename variables of type 'struct kvm_mmu_page *'

These are traditionally named 'page', but even more traditionally, that name
is reserved for variables that point to a 'struct page'.  Rename them to 'sp'
(for "shadow page").

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: Remove gpa_to_hpa()
Avi Kivity [Wed, 21 Nov 2007 13:01:44 +0000 (15:01 +0200)]
KVM: Remove gpa_to_hpa()

Converting last uses along the way.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Remove gva_to_hpa()
Avi Kivity [Wed, 21 Nov 2007 12:57:44 +0000 (14:57 +0200)]
KVM: MMU: Remove gva_to_hpa()

No longer used.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Simplify nonpaging_map()
Avi Kivity [Wed, 21 Nov 2007 12:54:16 +0000 (14:54 +0200)]
KVM: MMU: Simplify nonpaging_map()

Instead of passing an hpa, pass a regular struct page.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Introduce gfn_to_gpa()
Avi Kivity [Wed, 21 Nov 2007 12:44:45 +0000 (14:44 +0200)]
KVM: MMU: Introduce gfn_to_gpa()

Converting a frame number to an address is tricky since the data type changes
size.  Introduce a function to do it.  This fixes an actual bug when
accessing guest ptes.

Signed-off-by: Avi Kivity <avi@qumranet.com>
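
A minimal sketch of the pitfall the gfn_to_gpa() commit above describes,
assuming a 32-bit frame number type and a 64-bit address type (as would be the
case on a 32-bit host). The typedefs, SKETCH_PAGE_SHIFT and the deliberately
wrong gfn_to_gpa_truncating() are illustrative assumptions, not code from the
patch.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

typedef uint32_t gfn_t;   /* assumption: frame numbers are 32 bits wide */
typedef uint64_t gpa_t;   /* guest physical addresses are 64 bits wide */

/* Widen first, then shift, so high address bits are preserved. */
static gpa_t gfn_to_gpa(gfn_t gfn)
{
    return (gpa_t)gfn << SKETCH_PAGE_SHIFT;
}

/* Buggy variant: the shift happens in 32 bits and the top bits are lost. */
static gpa_t gfn_to_gpa_truncating(gfn_t gfn)
{
    return gfn << SKETCH_PAGE_SHIFT;
}
```

For frame number 0x100000 (the page at 4 GiB) the widening version yields
0x100000000, while the truncating version yields 0. Putting the cast inside
one helper avoids repeating it at every call site.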
16 years agoKVM: MMU: Adjust page_header_update_slot() to accept a gfn instead of a gpa
Avi Kivity [Wed, 21 Nov 2007 12:20:22 +0000 (14:20 +0200)]
KVM: MMU: Adjust page_header_update_slot() to accept a gfn instead of a gpa

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Merge set_pte() and set_pte_common()
Avi Kivity [Wed, 21 Nov 2007 12:16:30 +0000 (14:16 +0200)]
KVM: MMU: Merge set_pte() and set_pte_common()

Since set_pte() is now the only caller of set_pte_common(), merge the two
functions.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Remove set_pde()
Avi Kivity [Wed, 21 Nov 2007 12:11:49 +0000 (14:11 +0200)]
KVM: MMU: Remove set_pde()

It is now identical to set_pte().

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Remove extra gaddr parameter from set_pte_common()
Avi Kivity [Wed, 21 Nov 2007 12:08:40 +0000 (14:08 +0200)]
KVM: MMU: Remove extra gaddr parameter from set_pte_common()

Similar information is available in the gfn parameter, so use that.

Signed-off-by: Avi Kivity <avi@qumranet.com>
16 years agoKVM: MMU: Move pse36 handling to the guest walker
Avi Kivity [Wed, 21 Nov 2007 11:54:47 +0000 (13:54 +0200)]
KVM: MMU: Move pse36 handling to the guest walker

Signed-off-by: Avi Kivity <avi@qumranet.com>