Xiao Guangrong [Mon, 30 Aug 2010 10:24:10 +0000 (18:24 +0800)]
KVM: MMU: move audit to a separate file
Move the audit code from arch/x86/kvm/mmu.c to arch/x86/kvm/mmu_audit.c
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Mon, 30 Aug 2010 10:22:53 +0000 (18:22 +0800)]
KVM: MMU: support disable/enable mmu audit dynamically
Add a r/w module parameter named 'mmu_audit' that enables or disables
auditing at run time:
enable:
echo 1 > /sys/module/kvm/parameters/mmu_audit
disable:
echo 0 > /sys/module/kvm/parameters/mmu_audit
This patch does not change the logic.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jes Sorensen [Wed, 1 Sep 2010 09:42:04 +0000 (11:42 +0200)]
KVM: Fix guest kernel crash on MSR_K7_CLK_CTL
MSR_K7_CLK_CTL is an MSR that is no longer documented and is only relevant
on old AMD CPU models. This change returns the value that the Linux kernel
expects, so that the guest avoids writing back the MSR,
and it ignores all writes to the MSR.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 30 Aug 2010 09:18:24 +0000 (12:18 +0300)]
KVM: i8259: Make ICW1 conform to spec
ICW1 is not a full reset; instead, it resets a limited number of registers
in the PIC. Change ICW1 emulation to only reset those registers.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 30 Aug 2010 14:12:28 +0000 (17:12 +0300)]
KVM: x86 emulator: clean up control flow in x86_emulate_insn()
x86_emulate_insn() is full of things like
    if (rc != X86EMUL_CONTINUE)
        goto done;
    break;
Consolidate all of those at the end of the switch statement.
Signed-off-by: Avi Kivity <avi@redhat.com>
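The control-flow cleanup described in the commit above can be illustrated with a small user-space sketch. This is not the emulator's code: emulate_one(), the two handlers and the opcode values are hypothetical, and only the X86EMUL_CONTINUE check mirrors the message.

```c
enum { X86EMUL_CONTINUE = 0, X86EMUL_UNHANDLEABLE = 1 };

/* Hypothetical per-opcode handlers standing in for emulator helpers. */
int handle_ok(int *state)   { *state += 1; return X86EMUL_CONTINUE; }
int handle_fail(int *state) { (void)state; return X86EMUL_UNHANDLEABLE; }

/* Returns 0 on success, -1 if a handler reported an error. */
int emulate_one(int opcode, int *state)
{
    int rc = X86EMUL_CONTINUE;

    switch (opcode) {
    case 0x50:
        rc = handle_ok(state);
        break;              /* no per-case "if (rc != ...) goto done;" */
    case 0xff:
        rc = handle_fail(state);
        break;
    default:
        break;
    }

    /* Single consolidated check after the switch statement. */
    if (rc != X86EMUL_CONTINUE)
        goto done;

    *state += 100;          /* stands in for the writeback step */
    return 0;
done:
    return -1;
}
```

Each case only assigns rc and breaks, so adding a new opcode no longer requires repeating the error check.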
Avi Kivity [Tue, 3 Aug 2010 12:05:46 +0000 (15:05 +0300)]
KVM: x86 emulator: fix group 11 decoding for reg != 0
These are all undefined.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 3 Aug 2010 11:46:56 +0000 (14:46 +0300)]
KVM: x86 emulator: use single stage decoding for mov instructions
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 1 Sep 2010 07:23:35 +0000 (10:23 +0300)]
KVM: Don't save/restore MSR_IA32_PERF_STATUS
It is read/only; restoring it only results in annoying messages.
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Tue, 31 Aug 2010 22:13:14 +0000 (19:13 -0300)]
KVM: SVM: init_vmcb should reset vcpu->efer
Otherwise EFER_LMA bit is retained across a SIPI reset.
Fixes guest cpu onlining.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Tue, 31 Aug 2010 22:13:13 +0000 (19:13 -0300)]
KVM: SVM: reset mmu context in init_vmcb
Since commit aad827034e419fa no mmu reinitialization is performed
via init_vmcb.
Zero vcpu->arch.cr0 and pass the reset value as a parameter to
kvm_set_cr0.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 30 Aug 2010 07:46:56 +0000 (10:46 +0300)]
KVM: Fix pio trace direction
out = write, in = read, not the other way round.
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 28 Aug 2010 11:25:09 +0000 (19:25 +0800)]
KVM: MMU: remove count_rmaps()
Nothing is checked in count_rmaps(), so remove it
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 28 Aug 2010 11:24:13 +0000 (19:24 +0800)]
KVM: MMU: rewrite audit_mappings_page() function
There is a bug in this function: we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in
atomic context (kvm_mmu_audit() is called under the protection of the mmu_lock spinlock).
This patch fixes it by:
- introducing gfn_to_pfn_atomic() in place of gfn_to_pfn()
- getting the mapping gfn from kvm_mmu_page_get_gfn()
It also adds a check for 'notrap' ptes in unsync/direct sps.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 28 Aug 2010 11:22:46 +0000 (19:22 +0800)]
KVM: MMU: fix wrong not write protected sp report
The audit code currently reports some sps as not write protected; this is just a
bug in audit_write_protection(), because it:
- checks invalid sps, which do not need to be write protected
- uses an uninitialized local variable ('gfn')
- calls kvm_mmu_audit() outside of mmu_lock's protection
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 28 Aug 2010 11:20:47 +0000 (19:20 +0800)]
KVM: MMU: check rmap for every spte
Read-only sptes also have a reverse mapping, so fix the code to check them,
and rename the function to match what it does.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Sat, 28 Aug 2010 11:19:42 +0000 (19:19 +0800)]
KVM: MMU: fix compile warning in audit code
fix:
arch/x86/kvm/mmu.c: In function ‘kvm_mmu_unprotect_page’:
arch/x86/kvm/mmu.c:1741: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c:1745: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_unshadow’:
arch/x86/kvm/mmu.c:1761: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘set_spte’:
arch/x86/kvm/mmu.c:2005: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_set_spte’:
arch/x86/kvm/mmu.c:2033: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 7 has type ‘gfn_t’
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jason Wang [Fri, 27 Aug 2010 09:15:06 +0000 (17:15 +0800)]
KVM: pit: Do not check pending pit timer in vcpu thread
PIT interrupt injection is done by a workqueue, so there is no need to check
for a pending PIT timer in the vcpu thread, which could lead to unnecessary
unblocking of the vcpu.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Mon, 30 Aug 2010 10:01:56 +0000 (12:01 +0200)]
KVM: PPC: Fix CONFIG_KVM_GUEST && !CONFIG_KVM case
When CONFIG_KVM_GUEST is selected, but CONFIG_KVM is not, we were missing
some defines in asm-offsets.c and included too many headers at other places.
This patch makes the above configuration work.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 15:34:55 +0000 (18:34 +0300)]
KVM: x86 emulator: simplify ALU opcode block decode further
The ALU opcode block is very regular; introduce D6ALU() to define decode
flags for 6 instructions at a time.
Suggested by Paolo Bonzini.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 10:38:03 +0000 (13:38 +0300)]
KVM: Fix build error due to 64-bit division in nsec_to_cycles()
Use do_div() instead.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
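As background for the fix above: on 32-bit builds the kernel does not link the libgcc helper that a plain 64-bit "/" needs, which is why do_div() is used. The following user-space sketch mimics the shape of that conversion. do_div_sketch() is a stand-in written for this example (the kernel's do_div() is a macro that modifies its first argument in place and returns the remainder), and passing tsc_khz as a parameter is an assumption made to keep the example self-contained.

```c
#include <stdint.h>

#define USEC_PER_SEC_SKETCH 1000000u

/* Stand-in for do_div(): divide *n in place by a 32-bit divisor and
 * return the remainder. */
uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/* cycles = nsec * tsc_khz / 10^6 (kHz * ns = cycles * 10^6). */
uint64_t nsec_to_cycles_sketch(uint64_t nsec, uint32_t tsc_khz)
{
    uint64_t ret = nsec * tsc_khz;
    do_div_sketch(&ret, USEC_PER_SEC_SKETCH);
    return ret;
}
```

With a 2 GHz TSC (tsc_khz = 2000000), 1000 ns converts to 2000 cycles.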
Avi Kivity [Thu, 26 Aug 2010 08:59:01 +0000 (11:59 +0300)]
KVM: x86 emulator: trap and propagate #DE from DIV and IDIV
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 08:59:00 +0000 (11:59 +0300)]
KVM: x86 emulator: add macros for executing instructions that may trap
Like DIV and IDIV.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 08:56:13 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes 0F 00-FF
Use the new byte/word dual opcode decode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 08:56:12 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes E0-FF
Use the new byte/word dual opcode decode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 08:56:11 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes C0-DF
Use the new byte/word dual opcode decode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 08:56:10 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes A0-AF
Use the new byte/word dual opcode decode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 08:56:09 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify instruction decode flags for opcodes 80-8F
Use the new byte/word dual opcode decode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 08:56:08 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify string instruction decode flags
Use the new byte/word dual opcode decode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 08:56:07 +0000 (11:56 +0300)]
KVM: x86 emulator: simplify ALU block (opcodes 00-3F) decode flags
Use the new byte/word dual opcode decode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 26 Aug 2010 08:56:06 +0000 (11:56 +0300)]
KVM: x86 emulator: support byte/word opcode pairs
Many x86 instructions come in byte and word variants distinguished with bit
0 of the opcode. Add macros to aid in defining them.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
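A minimal sketch of the byte/word pairing idea from the commit above. The flag values, the D2BV_SKETCH name and the table contents are assumptions made for illustration; only the idea of one macro emitting the byte entry followed by the word entry comes from the message.

```c
#include <stdint.h>

/* Hypothetical decode flags; the real emulator defines its own values. */
#define ByteOp  (1u << 0)   /* operand is 8 bits wide */
#define DstMem  (1u << 1)
#define SrcReg  (1u << 2)

/* Emit two adjacent table entries: the byte form (opcode bit 0 clear)
 * followed by the word form (opcode bit 0 set). */
#define D2BV_SKETCH(flags)  ((flags) | ByteOp), (flags)

const uint32_t decode_sketch[] = {
    /* 0x00, 0x01 */ D2BV_SKETCH(DstMem | SrcReg),
    /* 0x02, 0x03 */ D2BV_SKETCH(SrcReg),
};
```

Two macro invocations fill four table slots, and the ByteOp flag can no longer be forgotten on the even opcode.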
Avi Kivity [Thu, 26 Aug 2010 08:06:15 +0000 (11:06 +0300)]
KVM: x86 emulator: refuse SrcMemFAddr (e.g. LDS) with register operand
SrcMemFAddr is not defined with the modrm operand designating a register
instead of a memory address.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Wed, 25 Aug 2010 09:47:43 +0000 (12:47 +0300)]
KVM: x86 emulator: get rid of "restart" in emulation context.
x86_emulate_insn() will return 1 if the instruction can be restarted
without re-entering the guest.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Wed, 25 Aug 2010 09:47:42 +0000 (12:47 +0300)]
KVM: x86 emulator: move string instruction completion check into separate function
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Wed, 25 Aug 2010 09:47:41 +0000 (12:47 +0300)]
KVM: x86 emulator: Rename variable that shadows another local variable.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Wed, 25 Aug 2010 06:10:53 +0000 (14:10 +0800)]
KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alexander Graf [Tue, 24 Aug 2010 13:48:52 +0000 (15:48 +0200)]
KVM: S390: Export kvm_virtio.h
As suggested by Christian, we should expose headers to user space with
information that might be valuable there. The s390 virtio interface is
one of those cases. It defines an ABI between hypervisor and guest, so
it should be exposed to user space.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alexander Graf [Tue, 24 Aug 2010 13:48:51 +0000 (15:48 +0200)]
KVM: S390: Add virtio hotplug add support
The one big missing feature in s390-virtio was hotplugging. This is no more.
This patch implements hotplug add support, so you can add new devices
to the guest on the fly.
Keep in mind that this needs a patch for qemu to actually leverage the
functionality.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Alexander Graf [Tue, 24 Aug 2010 13:48:50 +0000 (15:48 +0200)]
KVM: S390: take a full byte as ext_param indicator
Currently the ext_param field only distinguishes between "config change" and
"vring interrupt". We can do a lot more with it though, so let's enable a
full byte of possible values and move the constants to #defines while at it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Sun, 22 Aug 2010 11:13:33 +0000 (19:13 +0800)]
KVM: MMU: combine guest pte read between fetch and pte prefetch
Combine guest pte read between guest pte check in the fetch path and pte prefetch
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Sun, 22 Aug 2010 11:12:48 +0000 (19:12 +0800)]
KVM: MMU: prefetch ptes when intercepted guest #PF
Prefetch ptes when a guest #PF is intercepted, to avoid #PFs on later
accesses.
If we meet any failure in the prefetch path, we exit it and do
not try other ptes, to avoid it becoming a heavy path.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Sun, 22 Aug 2010 11:11:43 +0000 (19:11 +0800)]
KVM: MMU: introduce gfn_to_page_many_atomic() function
Introduce this function to get the pages of consecutive gfns; it reduces
gup's overhead and is used by a later patch.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Sun, 22 Aug 2010 11:10:28 +0000 (19:10 +0800)]
KVM: MMU: introduce hva_to_pfn_atomic function
Introduce hva_to_pfn_atomic(); it is the fast path and can be used in atomic
context. A later patch will use it.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Sun, 22 Aug 2010 11:08:57 +0000 (19:08 +0800)]
export __get_user_pages_fast() function
This function is used by KVM to pin a process's pages in atomic context.
Define a 'weak' function for architectures that do not support it.
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:33 +0000 (22:07 -1000)]
KVM: x86: Add timekeeping documentation
Basic informational document about x86 timekeeping and how KVM
is affected.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:30 +0000 (22:07 -1000)]
KVM: x86: Fix a possible backwards warp of kvmclock
Kernel time, which advances in discrete steps may progress much slower
than TSC. As a result, when kvmclock is adjusted to a new base, the
apparent time to the guest, which runs at a much higher, nsec scaled
rate based on the current TSC, may have already been observed to have
a larger value (kernel_ns + scaled tsc) than the value to which we are
setting it (kernel_ns + 0).
We must instead compute the clock as potentially observed by the guest
for kernel_ns to make sure it does not go backwards.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
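The reasoning in the commit above can be sketched as follows: before publishing a new base, work out the largest time the guest could already have observed from the previous base, and never go below it. The structure, its field names and the simplified "nanoseconds per 1000 cycles" scale are assumptions for this example, not the kernel's data layout.

```c
#include <stdint.h>

/* Hypothetical snapshot of the previous clock update. */
struct clock_sketch {
    uint64_t last_kernel_ns;  /* base published at the previous update */
    uint64_t tsc_timestamp;   /* TSC value paired with that base */
    uint64_t last_guest_tsc;  /* latest TSC the guest may have read */
    uint64_t ns_per_kcycle;   /* simplified scale: ns per 1000 cycles */
};

/* Pick a base that is never behind what the guest may have computed as
 * (old base + scaled TSC delta). */
uint64_t next_clock_base(const struct clock_sketch *c, uint64_t kernel_ns)
{
    uint64_t cycles = c->last_guest_tsc - c->tsc_timestamp;
    uint64_t max_seen = c->last_kernel_ns + cycles * c->ns_per_kcycle / 1000;

    return max_seen > kernel_ns ? max_seen : kernel_ns;
}
```

If the kernel clock has only advanced to 1500 ns while the guest could already have seen 2000 ns, the sketch returns 2000 so that guest time does not step backwards.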
Zachary Amsden [Fri, 20 Aug 2010 08:07:29 +0000 (22:07 -1000)]
x86: pvclock: Move scale_delta into common header
The scale_delta function for shift / multiply with 31-bit
precision moves to a common header so it can be used by both
kernel and kvm module.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
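For readers unfamiliar with the shift / multiply scaling mentioned above, here is a portable user-space sketch. The name scale_delta_sketch() and the split multiplication are choices made for this example; the kernel helper uses architecture-specific code to obtain the same (delta * mul_frac) >> 32 result.

```c
#include <stdint.h>

/* Scale delta by 2^shift, then by the 32-bit fixed-point fraction
 * mul_frac / 2^32. */
uint64_t scale_delta_sketch(uint64_t delta, uint32_t mul_frac, int shift)
{
    uint64_t hi, lo;

    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* (delta * mul_frac) >> 32 computed without a 128-bit type:
     * delta = hi * 2^32 + lo, so the result is hi * mul_frac plus the
     * upper half of lo * mul_frac. */
    hi = delta >> 32;
    lo = delta & 0xffffffffu;
    return hi * mul_frac + ((lo * mul_frac) >> 32);
}
```

With mul_frac = 0x80000000 (one half) and shift = 0, a delta of 1000 scales to 500.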
Zachary Amsden [Fri, 20 Aug 2010 08:07:28 +0000 (22:07 -1000)]
KVM: x86: Add clock sync request to hardware enable
If there are active VCPUs which are marked as belonging to
a particular hardware CPU, request a clock sync for them when
enabling hardware; the TSC could be desynchronized on a newly
arriving CPU, and we need to recompute the guests' system time
relative to boot after a suspend event.
This covers both cases.
Note that it is acceptable to take the spinlock, as either
no other tasks will be running and no locks held (BSP after
resume), or other tasks will be guaranteed to drop the lock
relatively quickly (AP on CPU_STARTING).
Noting we now get clock synchronization requests for VCPUs
which are starting up (or restarting), it is tempting to
attempt to remove the arch/x86/kvm/x86.c CPU hot-notifiers
at this time, however it is not correct to do so; they are
required for systems with non-constant TSC as the frequency
may not be known immediately after the processor has started
until the cpufreq driver has had a chance to run and query
the chipset.
Updated: implement better locking semantics for hardware_enable
Removed the hack of dropping and retaking the lock by adding the
semantic that we always hold kvm_lock when hardware_enable is
called. The one place that doesn't need to worry about it is
resume: when resuming a frozen CPU, the spinlock won't be taken.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:26 +0000 (22:07 -1000)]
KVM: x86: Robust TSC compensation
Make the match of TSC find TSC writes that are close to each other
instead of perfectly identical; this allows the compensator to also
work in migration / suspend scenarios.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
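The "close to each other instead of perfectly identical" test above could look roughly like the sketch below. The function name and both tolerance parameters are assumptions; the commit message does not state the thresholds that were actually chosen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Treat two guest TSC writes as a match if the written values differ by
 * less than max_cycle_diff and were made less than max_elapsed_ns apart. */
bool tsc_writes_match(uint64_t prev_write, uint64_t new_write,
                      uint64_t elapsed_ns, uint64_t max_cycle_diff,
                      uint64_t max_elapsed_ns)
{
    /* Absolute difference without relying on signed overflow. */
    uint64_t diff = new_write > prev_write ? new_write - prev_write
                                           : prev_write - new_write;

    return diff < max_cycle_diff && elapsed_ns < max_elapsed_ns;
}
```

A strict equality check is the special case max_cycle_diff = 1; widening it lets values restored after migration or suspend still count as a match.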
Zachary Amsden [Fri, 20 Aug 2010 08:07:25 +0000 (22:07 -1000)]
KVM: x86: Add helper functions for time computation
Add a helper function to compute the kernel time and convert nanoseconds
back to CPU specific cycles. Note that these must not be called in preemptible
context, as that would mean the kernel could enter software suspend state,
which would cause non-atomic operation.
Also, convert the KVM_SET_CLOCK / KVM_GET_CLOCK ioctls to use the kernel
time helper, these should be bootbased as well.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:24 +0000 (22:07 -1000)]
KVM: x86: Fix deep C-state TSC desynchronization
When CPUs with unstable TSCs enter deep C-state, TSC may stop
running. This causes us to require resynchronization. Since
we can't tell when this may potentially happen, we assume the
worst by forcing re-compensation for it at every point the VCPU
task is descheduled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:23 +0000 (22:07 -1000)]
KVM: x86: Unify TSC logic
Move the TSC control logic from the vendor backends into x86.c
by adding adjust_tsc_offset to x86 ops. Now all TSC decisions
can be done in one place.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:22 +0000 (22:07 -1000)]
KVM: x86: Warn about unstable TSC
If creating an SMP guest with unstable host TSC, issue a warning
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:21 +0000 (22:07 -1000)]
KVM: x86: Make cpu_tsc_khz updates use local CPU
This simplifies much of the init code; we can now simply always
call tsc_khz_changed, optionally passing it a new value, or letting
it figure out the existing value (while interrupts are disabled, and
thus, by inference from the rule, not raceful against CPU hotplug or
frequency updates, which will issue IPIs to the local CPU to perform
this very same task).
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:20 +0000 (22:07 -1000)]
KVM: x86: TSC reset compensation
Attempt to synchronize TSCs which are reset to the same value. In the
case of a reliable hardware TSC, we can just re-use the same offset, but
on non-reliable hardware, we can get closer by adjusting the offset to
match the elapsed time.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:17 +0000 (22:07 -1000)]
KVM: x86: Move TSC offset writes to common code
Also, ensure that the storing of the offset and the reading of the TSC
are never preempted by taking a spinlock. While the lock is overkill
now, it is useful later in this patch series.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:16 +0000 (22:07 -1000)]
KVM: x86: Convert TSC writes to TSC offset writes
Change svm / vmx to be the same internally and write TSC offset
instead of bare TSC in helper functions. Isolated as a single
patch to contain code movement.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Zachary Amsden [Fri, 20 Aug 2010 08:07:15 +0000 (22:07 -1000)]
KVM: x86: Drop vm_init_tsc
This is used only by the VMX code, and is not done properly;
if the TSC is indeed backwards, it is out of sync, and will
need proper handling in the logic at each and every CPU change.
For now, drop this test during init as misguided.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Mon, 23 Aug 2010 08:13:15 +0000 (16:13 +0800)]
KVM: MMU: fix missing percpu counter destroy
Commit ad05c88266b4cce1c820928ce8a0fb7690912ba1
(KVM: create aggregate kvm_total_used_mmu_pages value)
introduced the percpu counter kvm_total_used_mmu_pages but never
destroys it; this may cause an oops on rmmod followed by modprobe.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiaotian Feng [Tue, 24 Aug 2010 02:31:07 +0000 (10:31 +0800)]
KVM: MMU: fix regression from rework mmu_shrink() code
After the latest kvm mmu_shrink code rework, kvm->arch.n_used_mmu_pages/
kvm->arch.n_max_mmu_pages are changed in kvm_mmu_free_page/kvm_mmu_alloc_page, which are
called by kvm_mmu_commit_zap_page. So kvm->arch.n_used_mmu_pages or
kvm_mmu_available_pages(vcpu->kvm) is unchanged after kvm_mmu_prepare_zap_page().
This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages to loop forever.
Moving kvm_mmu_commit_zap_page makes the while loop behave normally.
Reported-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Tested-by: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
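The regression above comes down to a loop whose exit condition depends on a counter that only the commit step updates. A toy model, with hypothetical types and function names and assuming the commit call is placed inside the loop:

```c
/* Toy model: "used" only drops when queued pages are committed. */
struct vm_sketch {
    unsigned used;      /* pages currently accounted as used */
    unsigned pending;   /* pages queued by prepare, not yet freed */
};

/* Prepare only queues a page; the "used" counter is untouched. */
void prepare_zap(struct vm_sketch *vm) { vm->pending++; }

/* Commit frees the queued pages and updates the counter. */
void commit_zap(struct vm_sketch *vm)
{
    vm->used -= vm->pending;
    vm->pending = 0;
}

/* Returns the number of loop iterations needed to reach the goal. */
unsigned shrink_to(struct vm_sketch *vm, unsigned goal)
{
    unsigned iterations = 0;

    while (vm->used > goal) {
        prepare_zap(vm);
        commit_zap(vm);   /* inside the loop, so the condition can change */
        iterations++;
    }
    return iterations;
}
```

If commit_zap() were called only after the loop, vm->used would never change inside it and the loop would not terminate.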
Wei Yongjun [Thu, 19 Aug 2010 06:25:48 +0000 (14:25 +0800)]
KVM: x86 emulator: add JrCXZ instruction emulation
Add JrCXZ instruction emulation (opcode 0xe3)
Used by FreeBSD boot loader.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Mon, 23 Aug 2010 06:56:54 +0000 (14:56 +0800)]
KVM: x86 emulator: add LDS/LES/LFS/LGS/LSS instruction emulation
Add LDS/LES/LFS/LGS/LSS instruction emulation.
(opcode 0xc4, 0xc5, 0x0f 0xb2, 0x0f 0xb4~0xb5)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Dave Hansen [Fri, 20 Aug 2010 01:11:37 +0000 (18:11 -0700)]
KVM: create aggregate kvm_total_used_mmu_pages value
Of slab shrinkers, the VM code says:
* Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
* querying the cache size, so a fastpath for that case is appropriate.
and it *means* it. Look at how it calls the shrinkers:
nr_before = (*shrinker->shrink)(0, gfp_mask);
shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
So, if you do anything stupid in your shrinker, the VM will doubly
punish you.
The mmu_shrink() function takes the global kvm_lock, then acquires
every VM's kvm->mmu_lock in sequence. If we have 100 VMs, then
we're going to take 101 locks. We do it twice, so each call takes
202 locks. If we're under memory pressure, we can have each cpu
trying to do this. It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.
This is guaranteed to optimize at least half of those lock
acquisitions away. It removes the need to take any of the locks
when simply trying to count objects.
A 'percpu_counter' can be a large object, but we only have one
of these for the entire system. There are not any better
alternatives at the moment, especially ones that handle CPU
hotplug.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
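A rough user-space model of the fast path described above: when nr_to_scan is 0 the shrinker only reports a size, so it can read an aggregate counter and take no locks. The plain global counter here stands in for the percpu_counter, and the lock accounting is purely illustrative; all names are hypothetical.

```c
/* Stand-ins for the aggregate counter and for lock acquisitions. */
static long total_used_pages_sketch;
static int locks_taken_sketch;

/* Called wherever a page is allocated (+1) or freed (-1). */
void account_pages(long delta) { total_used_pages_sketch += delta; }

int locks_taken(void) { return locks_taken_sketch; }

/* Returns the cache size; only takes a lock when asked to scan. */
long mmu_shrink_sketch(int nr_to_scan)
{
    long freed;

    if (nr_to_scan == 0)
        return total_used_pages_sketch;   /* size query: no locks */

    locks_taken_sketch++;                 /* stands in for the real locking */
    freed = nr_to_scan < total_used_pages_sketch ? nr_to_scan
                                                 : total_used_pages_sketch;
    total_used_pages_sketch -= freed;
    return total_used_pages_sketch;
}
```

Since the VM calls the shrinker once with nr_to_scan == 0 before each real scan, making that query lock-free removes the locking from at least half of the calls.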
Dave Hansen [Fri, 20 Aug 2010 01:11:28 +0000 (18:11 -0700)]
KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages
Doing this makes the code much more readable. That's
borne out by the fact that this patch removes code. "used"
also happens to be the number that we need to return back to
the slab code when our shrinker gets called. Keeping this
value as opposed to free makes the next patch simpler.
So, 'struct kvm' is kzalloc()'d. 'struct kvm_arch' is a
structure member (and not a pointer) of 'struct kvm'. That
means they start out zeroed. I _think_ they get initialized
properly by kvm_mmu_change_mmu_pages(). But, that only happens
via kvm ioctls.
Another benefit of storing 'used' instead of 'free' is
that the values are consistent from the moment the structure is
allocated: no negative "used" value.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
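To illustrate why tracking 'used' is consistent from allocation time, here is a sketch in which the available count is derived rather than stored. The structure and helper names are hypothetical, and the subtraction is an assumption about how such a helper could be written.

```c
/* Hypothetical subset of the per-VM MMU accounting fields. */
struct kvm_arch_sketch {
    unsigned int n_used_mmu_pages;
    unsigned int n_max_mmu_pages;
};

/* Derive "available" from the stored values instead of keeping a
 * separate free counter in sync. */
unsigned int mmu_available_pages_sketch(const struct kvm_arch_sketch *a)
{
    return a->n_max_mmu_pages - a->n_used_mmu_pages;
}
```

A zero-initialized structure yields used = 0, max = 0 and available = 0, which is already a consistent state before any ioctl sets the limit.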
Dave Hansen [Fri, 20 Aug 2010 01:11:14 +0000 (18:11 -0700)]
KVM: rename x86 kvm->arch.n_alloc_mmu_pages
arch.n_alloc_mmu_pages is a poor choice of name. This value truly
means, "the number of pages which _may_ be allocated". But,
reading the name, "n_alloc_mmu_pages" implies "the number of allocated
mmu pages", which is dead wrong.
It's really the high watermark, so let's give it a name to match:
n_max_mmu_pages. This change will make the next few patches
much more obvious and easy to read.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Dave Hansen [Fri, 20 Aug 2010 01:11:05 +0000 (18:11 -0700)]
KVM: abstract kvm x86 mmu->n_free_mmu_pages
"free" is a poor name for this value. In this context, it means,
"the number of mmu pages which this kvm instance should be able to
allocate." But "free" implies much more: that the objects are there
and ready for use. "available" is a much better description, especially
when you see how it is calculated.
In this patch, we abstract its use into a function. We'll soon
replace the function's contents by calculating the value in a
different way.
All of the reads of n_free_mmu_pages are taken care of in this
patch. The modification sites will be handled in a patch
later in the series.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 19 Aug 2010 12:13:00 +0000 (15:13 +0300)]
KVM: x86 emulator: implement CWD (opcode 99)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 16:29:33 +0000 (19:29 +0300)]
KVM: x86 emulator: implement IMUL REG, R/M, IMM (opcode 69)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 16:25:28 +0000 (19:25 +0300)]
KVM: x86 emulator: add Src2Imm decoding
Needed for 3-operand IMUL.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 16:20:21 +0000 (19:20 +0300)]
KVM: x86 emulator: consolidate immediate decode into a function
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 15:54:34 +0000 (18:54 +0300)]
KVM: x86 emulator: implement RDTSC (opcode 0F 31)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 15:53:43 +0000 (18:53 +0300)]
KVM: x86 emulator: remove SrcImplicit
Useless.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 15:31:43 +0000 (18:31 +0300)]
KVM: x86 emulator: implement IMUL REG, R/M (opcode 0F AF)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 15:25:25 +0000 (18:25 +0300)]
KVM: x86 emulator: implement IMUL REG, R/M, imm8 (opcode 6B)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 12:12:09 +0000 (15:12 +0300)]
KVM: x86 emulator: implement RET imm16 (opcode C2)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 12:11:24 +0000 (15:11 +0300)]
KVM: x86 emulator: add SrcImmU16 operand type
Used for RET NEAR instructions.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 11:51:45 +0000 (14:51 +0300)]
KVM: x86 emulator: implement CALL FAR (FF /3)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Wed, 18 Aug 2010 11:16:35 +0000 (14:16 +0300)]
KVM: x86 emulator: implement DAS (opcode 2F)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 16 Aug 2010 14:50:56 +0000 (17:50 +0300)]
KVM: x86 emulator: Use a register for ____emulate_2op() destination
Most x86 two operand instructions allow the destination to be a memory operand,
but IMUL (for example) requires that the destination be a register. Change
____emulate_2op() to take a register for both source and destination so we
can invoke IMUL.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 16 Aug 2010 14:49:52 +0000 (17:49 +0300)]
KVM: x86 emulator: pass destination type to ____emulate_2op()
We'll need it later so we can use a register for the destination.
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Wed, 18 Aug 2010 08:38:21 +0000 (16:38 +0800)]
KVM: x86 emulator: add LOOP/LOOPcc instruction emulation
Add LOOP/LOOPcc instruction emulation (opcode 0xe0~0xe2).
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Wed, 18 Aug 2010 08:43:13 +0000 (16:43 +0800)]
KVM: x86 emulator: add CBW/CWDE/CDQE instruction emulation
Add CBW/CWDE/CDQE instruction emulation (opcode 0x98).
Used by FreeBSD's boot loader.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
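For reference, opcode 0x98 sign-extends the lower half of the accumulator into the full operand size. The sketch below models that on a 64-bit value, assuming 64-bit-mode register semantics (a 32-bit result clears the upper half of RAX). The function name is hypothetical, and the narrowing casts rely on the usual two's-complement conversion behaviour of mainstream compilers.

```c
#include <stdint.h>

/* op_bytes selects the variant: 2 = CBW, 4 = CWDE, 8 = CDQE.
 * Returns the new value of the 64-bit accumulator. */
uint64_t sign_extend_acc(uint64_t rax, int op_bytes)
{
    switch (op_bytes) {
    case 2:  /* CBW: AL -> AX, bits above AX are preserved */
        return (rax & ~0xffffULL) | (uint16_t)(int16_t)(int8_t)rax;
    case 4:  /* CWDE: AX -> EAX, upper 32 bits become zero */
        return (uint32_t)(int32_t)(int16_t)rax;
    case 8:  /* CDQE: EAX -> RAX */
        return (uint64_t)(int64_t)(int32_t)rax;
    default:
        return rax;
    }
}
```

For example, CBW turns AL = 0x80 into AX = 0xff80 while leaving the rest of the register alone.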
Avi Kivity [Tue, 17 Aug 2010 08:22:17 +0000 (11:22 +0300)]
KVM: x86 emulator: fix REPZ/REPNZ termination condition
EFLAGS.ZF needs to be checked after each iteration, not before.
Signed-off-by: Avi Kivity <avi@redhat.com>
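The termination rule fixed above can be shown with a user-space model of REPE CMPSB. The function and its parameters are hypothetical; the point is that ZF is tested only after an iteration has executed, so a stale ZF value on entry does not stop the loop early.

```c
#include <stddef.h>

/* Model of REPE CMPSB: compare up to *count bytes and stop after the
 * first mismatch. zf_in is the flag value on entry; *count plays the
 * role of rCX. Returns the final ZF. */
int repe_cmpsb_sketch(const char *a, const char *b, size_t *count, int zf_in)
{
    int zf = zf_in;
    size_t i = 0;

    while (*count != 0) {       /* the count is checked before an iteration */
        zf = (a[i] == b[i]);    /* the iteration itself sets ZF */
        i++;
        (*count)--;
        if (!zf)                /* REPE: ZF is checked after the iteration */
            break;
    }
    return zf;
}
```

Entering with ZF clear and comparing "abXd" against "abYd" runs three iterations and stops with one byte left; checking ZF before the first iteration would have run none.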
Avi Kivity [Tue, 17 Aug 2010 08:20:37 +0000 (11:20 +0300)]
KVM: x86 emulator: implement SCAS (opcodes AE, AF)
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 17 Aug 2010 08:17:51 +0000 (11:17 +0300)]
KVM: x86 emulator: fix INTn emulation not pushing EFLAGS and CS
emulate_push() only schedules a push; it doesn't actually push anything.
Call writeback() to flush out the write.
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 03:46:12 +0000 (11:46 +0800)]
KVM: x86 emulator: remove dup code of in/out instruction
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 03:45:12 +0000 (11:45 +0800)]
KVM: x86 emulator: change OUT instruction to use dst instead of src
Change the OUT instruction to use dst instead of src, so we can
reuse that code for all OUT instructions.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 03:36:51 +0000 (11:36 +0800)]
KVM: x86 emulator: introduce DstImmUByte for dst operand decode
Introduce DstImmUByte for dst operand decode, which
will be used for the OUT instruction.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 07:36:36 +0000 (15:36 +0800)]
KVM: x86 emulator: remove useless label from x86_emulate_insn()
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Fri, 6 Aug 2010 09:10:07 +0000 (17:10 +0800)]
KVM: x86 emulator: add setcc instruction emulation
Add setcc instruction emulation (opcode 0x0f 0x90~0x9f)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jiri Kosina [Mon, 16 Aug 2010 15:51:20 +0000 (17:51 +0200)]
KVM: x86: explain 'no-kvmclock' kernel parameter
The no-kvmclock kernel parameter is missing its explanation in
Documentation/kernel-parameters.txt. Add it.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 17 Aug 2010 01:19:34 +0000 (09:19 +0800)]
KVM: x86 emulator: add XADD instruction emulation
Add XADD instruction emulation (opcode 0x0f 0xc0~0xc1)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 17 Aug 2010 01:17:30 +0000 (09:17 +0800)]
KVM: x86 emulator: put register operand write back to a function
Introduce function write_register_operand() to write back the
register operand.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 17 Aug 2010 02:08:52 +0000 (10:08 +0800)]
KVM: PPC: fix leakage of error page in kvmppc_patch_dcbz()
Add kvm_release_page_clean() after is_error_page() to avoid
leakage of error page.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Sun, 15 Aug 2010 21:47:01 +0000 (00:47 +0300)]
KVM: Separate emulation context initialization in a separate function
The code for initializing the emulation context is duplicated at two
locations (emulate_instruction() and kvm_task_switch()). Move it
into a separate function and call it from both.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 10 Aug 2010 05:48:22 +0000 (13:48 +0800)]
KVM: x86 emulator: add bsf/bsr instruction emulation
Add bsf/bsr instruction emulation (opcode 0x0f 0xbc~0xbd)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Sun, 8 Aug 2010 18:11:38 +0000 (21:11 +0300)]
KVM: x86 emulator: Fix emulate_grp3 return values
This patch lets emulate_grp3() return X86EMUL_* return codes instead
of hardcoded ones.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mohammed Gamal [Sun, 8 Aug 2010 18:11:37 +0000 (21:11 +0300)]
KVM: x86 emulator: Add unary mul, imul, div, and idiv instructions
This adds unary mul, imul, div, and idiv instructions (group 3 r/m 4-7).
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Mon, 9 Aug 2010 03:39:14 +0000 (11:39 +0800)]
KVM: x86 emulator: mark group 8 instruction as BitOp
Mark group 8 instructions as BitOp, so we can share the
code for adjusting the source operand.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Mon, 9 Aug 2010 03:37:37 +0000 (11:37 +0800)]
KVM: x86 emulator: do not adjust the address for immediate source
Adjust the dst address for a register source, but do not adjust the
address for an immediate source.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Mon, 9 Aug 2010 03:34:56 +0000 (11:34 +0800)]
KVM: x86 emulator: fix negative bit offset BitOp instruction emulation
If the bit offset operand is a negative number, a BitOp instruction
returns the wrong value. This patch fixes it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
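For a register bit offset, BT-style instructions address memory relative to the base operand, and the offset is signed. The sketch below shows one way to split such an offset into an operand-aligned byte displacement and a bit index inside the operand. The type and function names are hypothetical, and this illustrates the arithmetic rather than the patch itself.

```c
/* Result of splitting a signed bit offset. */
struct bit_addr_sketch {
    long byte_disp;   /* displacement to add to the memory address */
    unsigned bit;     /* bit index inside the addressed operand */
};

/* op_bytes is the destination operand size in bytes (2, 4 or 8). */
struct bit_addr_sketch split_bit_offset(long bit_offset, unsigned op_bytes)
{
    long bits = (long)op_bytes * 8;
    struct bit_addr_sketch r;

    /* Index within the operand: always 0..bits-1, even when negative. */
    r.bit = (unsigned)(((bit_offset % bits) + bits) % bits);
    /* The remainder is a multiple of the operand width, so dividing by 8
     * is exact and no rounding direction matters. */
    r.byte_disp = (bit_offset - (long)r.bit) / 8;
    return r;
}
```

With a 4-byte operand, a bit offset of -1 selects bit 31 of the dword located 4 bytes below the base address.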