Alexander Graf [Wed, 24 Mar 2010 20:48:30 +0000 (21:48 +0100)]
KVM: PPC: Add OSI hypercall interface
MOL uses its own hypercall interface to call back into userspace when
the guest wants to do something.
So let's implement that as an exit reason, specify it with a CAP and
only really use it when userspace wants us to.
The only user of it so far is MOL.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:29 +0000 (21:48 +0100)]
KVM: Add support for enabling capabilities per-vcpu
Some times we don't want all capabilities to be available to all
our vcpus. One example for that is the OSI interface, implemented
in the next patch.
In order to have a generic mechanism in how to enable capabilities
individually, this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:28 +0000 (21:48 +0100)]
KVM: PPC: Implement alignment interrupt
Mac OS X has some applications - namely the Finder - that require alignment
interrupts to work properly. So we need to implement them.
But the spec for 970 and 750 also looks different. While 750 requires the
DSISR and DAR fields to reflect some instruction bits (DSISR) and the fault
address (DAR), the 970 declares this as an optional feature. So we need
to reconstruct DSISR and DAR manually.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:27 +0000 (21:48 +0100)]
KVM: PPC: Implement emulation for lbzux and lhax
We get MMIOs with the weirdest instructions. But every time we do,
we need to improve our emulator to implement them.
So let's do that - this time it's lbzux and lhax's round.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:26 +0000 (21:48 +0100)]
KVM: PPC: Make XER load 32 bit
We have a 32 bit value in the PACA to store XER in. We also do an stw
when storing XER in there. But then we load it with ld, completely
screwing it up on every entry.
Welcome to the Big Endian world.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:25 +0000 (21:48 +0100)]
KVM: PPC: Implement BAT reads
BATs can't only be written to, you can also read them out!
So let's implement emulation for reading BAT values again.
While at it, I also made BAT setting flush the segment cache,
so we're absolutely sure there's no MMU state left when writing
BATs.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:24 +0000 (21:48 +0100)]
KVM: PPC: Implement mfsr emulation
We emulate the mfsrin instruction already, that passes the SR number
in a register value. But we lacked support for mfsr that encoded the
SR number in the opcode.
So let's implement it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:23 +0000 (21:48 +0100)]
KVM: PPC: Load VCPU for register fetching
When trying to read or store vcpu register data, we should also make
sure the vcpu is actually loaded, so we're 100% sure we get the correct
values.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:22 +0000 (21:48 +0100)]
KVM: PPC: Don't reload FPU with invalid values
When the guest activates the FPU, we load it up. That's fine when
it wasn't activated before on the host, but if it was we end up
reloading FPU values from last time the FPU was deactivated on the
host without writing the proper values back to the vcpu struct.
This patch checks if the FPU is enabled already and if so just doesn't
bother activating it, making FPU operations survive guest context switches.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:21 +0000 (21:48 +0100)]
KVM: PPC: Split instruction reading out
The current check_ext function reads the instruction and then does
the checking. Let's split the reading out so we can reuse it for
different functions.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:20 +0000 (21:48 +0100)]
KVM: PPC: Book3S_32 guest MMU fixes
This patch makes the VSID of mapped pages always reflecting all special cases
we have, like split mode.
It also changes the tlbie mask to 0x0ffff000 according to the spec. The mask
we used before was incorrect.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:19 +0000 (21:48 +0100)]
KVM: PPC: Make DSISR 32 bits wide
DSISR is only defined as 32 bits wide. So let's reflect that in the
structs too.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:18 +0000 (21:48 +0100)]
KVM: PPC: Allow userspace to unset the IRQ line
Userspace can tell us that it wants to trigger an interrupt. But
so far it can't tell us that it wants to stop triggering one.
So let's interpret the parameter to the ioctl that we have anyways
to tell us if we want to raise or lower the interrupt line.
Signed-off-by: Alexander Graf <agraf@suse.de>
v2 -> v3:
- Add CAP for unset irq
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Wed, 24 Mar 2010 20:48:17 +0000 (21:48 +0100)]
KVM: PPC: Ensure split mode works
On PowerPC we can go into MMU Split Mode. That means that either
data relocation is on but instruction relocation is off or vice
versa.
That mode didn't work properly, as we weren't always flushing
entries when going into a new split mode, potentially mapping
different code or data that we're supposed to.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 25 Mar 2010 10:27:30 +0000 (12:27 +0200)]
KVM: Document KVM_SET_TSS_ADDR
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 25 Mar 2010 10:16:48 +0000 (12:16 +0200)]
KVM: Document KVM_SET_USER_MEMORY_REGION
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 14 Mar 2010 08:16:40 +0000 (10:16 +0200)]
KVM: MMU: Disassociate direct maps from guest levels
Direct maps are linear translations for a section of memory, used for
real mode or with large pages. As such, they are independent of the guest
levels.
Teach the mmu about this by making page->role.glevels = 0 for direct maps.
This allows direct maps to be shared among real mode and the various paging
modes.
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Fri, 19 Mar 2010 09:58:53 +0000 (17:58 +0800)]
KVM: MMU: check reserved bits only if CR4.PSE=1 or CR4.PAE=1
- Check reserved bits only if CR4.PAE=1 or CR4.PSE=1 when guest #PF occurs
- Fix a typo in reset_rsvds_bits_mask()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Tue, 23 Mar 2010 17:15:53 +0000 (14:15 -0300)]
KVM: x86: document KVM_REQ_PENDING_TIMER usage
Document that KVM_REQ_PENDING_TIMER is implicitly used during guest
entry.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 21 Mar 2010 14:58:36 +0000 (16:58 +0200)]
KVM: x86 emulator: fix unlocked CMPXCHG8B emulation
When CMPXCHG8B is executed without LOCK prefix it is racy. Preserve this
behaviour in emulator too.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 21 Mar 2010 11:08:21 +0000 (13:08 +0200)]
KVM: x86 emulator: add decoding of CMPXCHG8B dst operand
Decode CMPXCHG8B destination operand in decoding stage. Fixes regression
introduced by "If LOCK prefix is used dest arg should be memory" commit.
This commit relies on dst operand be decoded at the beginning of an
instruction emulation.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 21 Mar 2010 11:08:20 +0000 (13:08 +0200)]
KVM: x86 emulator: commit rflags as part of registers commit
Make sure that rflags is committed only after successful instruction
emulation.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Sat, 20 Mar 2010 09:14:13 +0000 (10:14 +0100)]
KVM: x86: Fix 32-bit build breakage due to typo
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:28 +0000 (15:20 +0200)]
KVM: small kvm_arch_vcpu_ioctl_run() cleanup.
Unify all conditions that get us back into emulator after returning from
userspace.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:27 +0000 (15:20 +0200)]
KVM: x86 emulator: introduce pio in string read ahead.
To optimize "rep ins" instruction do IO in big chunks ahead of time
instead of doing it only when required during instruction emulation.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:26 +0000 (15:20 +0200)]
KVM: x86 emulator: restart string instruction without going back to a guest.
Currently when string instruction is only partially complete we go back
to a guest mode, guest tries to reexecute instruction and exits again
and at this point emulation continues. Avoid all of this by restarting
instruction without going back to a guest mode, but return to a guest
mode each 1024 iterations to allow interrupt injection. Pending
exception causes immediate guest entry too.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:25 +0000 (15:20 +0200)]
KVM: x86 emulator: remove saved_eip
c->eip is never written back in case of emulation failure, so no need to
set it to old value.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:24 +0000 (15:20 +0200)]
KVM: x86 emulator: Move string pio emulation into emulator.c
Currently emulation is done outside of emulator so things like doing
ins/outs to/from mmio are broken it also makes it hard (if not impossible)
to implement single stepping in the future. The implementation in this
patch is not efficient since it exits to userspace for each IO while
previous implementation did 'ins' in batches. Further patch that
implements pio in string read ahead address this problem.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:23 +0000 (15:20 +0200)]
KVM: x86 emulator: fix in/out emulation.
in/out emulation is broken now. The breakage is different depending
on where IO device resides. If it is in userspace emulator reports
emulation failure since it incorrectly interprets kvm_emulate_pio()
return value. If IO device is in the kernel emulation of 'in' will do
nothing since kvm_emulate_pio() stores result directly into vcpu
registers, so emulator will overwrite result of emulation during
commit of shadowed register.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:22 +0000 (15:20 +0200)]
KVM: x86 emulator: during rep emulation decrement ECX only if emulation succeeded
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:21 +0000 (15:20 +0200)]
KVM: x86 emulator: add decoding of X,Y parameters from Intel SDM
Add decoding of X,Y parameters from Intel SDM which are used by string
instruction to specify source and destination. Use this new decoding
to implement movs, cmps, stos, lods in a generic way.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:20 +0000 (15:20 +0200)]
KVM: x86 emulator: populate OP_MEM operand during decoding.
All struct operand fields are initialized during decoding for all
operand types except OP_MEM, but there is no reason for that. Move
OP_MEM operand initialization into decoding stage for consistency.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:19 +0000 (15:20 +0200)]
KVM: Use task switch from emulator.c
Remove old task switch code from x86.c
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:18 +0000 (15:20 +0200)]
KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor()
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:17 +0000 (15:20 +0200)]
KVM: x86 emulator: Emulate task switch in emulator.c
Implement emulation of 16/32 bit task switch in emulator.c
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:16 +0000 (15:20 +0200)]
KVM: x86 emulator: Provide more callbacks for x86 emulator.
Provide get_cached_descriptor(), set_cached_descriptor(),
get_segment_selector(), set_segment_selector(), get_gdt(),
write_std() callbacks.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:15 +0000 (15:20 +0200)]
KVM: x86 emulator: cleanup grp3 return value
When x86_emulate_insn() does not know how to emulate instruction it
exits via cannot_emulate label in all cases except when emulating
grp3. Fix that.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:14 +0000 (15:20 +0200)]
KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
If LOCK prefix is used dest arg should be memory, otherwise instruction
should generate #UD.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:13 +0000 (15:20 +0200)]
KVM: x86 emulator: do not call writeback if msr access fails.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:12 +0000 (15:20 +0200)]
KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations
Return X86EMUL_PROPAGATE_FAULT is fault was injected. Also inject #UD
for those instruction when appropriate.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:11 +0000 (15:20 +0200)]
KVM: x86 emulator: fix mov dr to inject #UD when needed.
If CR4.DE=1 access to registers DR4/DR5 cause #UD.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:10 +0000 (15:20 +0200)]
KVM: x86 emulator: inject #UD on access to non-existing CR
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:09 +0000 (15:20 +0200)]
KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.
Resent spec says that for 0f (20|21|22|23) the 2 bits in the mod field
are ignored. Interestingly enough older spec says that 11 is only valid
encoding.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:08 +0000 (15:20 +0200)]
KVM: x86 emulator: fix 0f 01 /5 emulation
It is undefined and should generate #UD.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:07 +0000 (15:20 +0200)]
KVM: x86 emulator: fix mov r/m, sreg emulation.
mov r/m, sreg generates #UD ins sreg is incorrect.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:06 +0000 (15:20 +0200)]
KVM: Provide current eip as part of emulator context.
Eliminate the need to call back into KVM to get it from emulator.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:05 +0000 (15:20 +0200)]
KVM: Provide x86_emulate_ctxt callback to get current cpl
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:04 +0000 (15:20 +0200)]
KVM: remove realmode_lmsw function.
Use (get|set)_cr callback to emulate lmsw inside emulator.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 18 Mar 2010 13:20:03 +0000 (15:20 +0200)]
KVM: Provide callback to get/set control registers in emulator ops.
Use this callback instead of directly call kvm function. Also rename
realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
to do with real mode.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Takuya Yoshikawa [Mon, 15 Mar 2010 13:13:30 +0000 (22:13 +0900)]
KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handling
kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
mmio ring page and dev even after it has freed them.
Also, if this function fails, though it might be rare, it seems to be
suggesting the system's serious state: so we'd better stop the works
following the kvm_creat_vm().
This patch clears these problems.
We move the coalesced mmio's initialization out of kvm_create_vm().
This seems to be natural because it includes a registration which
can be done only when vm is successfully created.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gui Jianfeng [Mon, 15 Mar 2010 09:29:09 +0000 (17:29 +0800)]
KVM: VMX: change to use bool return values
Make use of bool as return values, and remove some useless
bool value converting. Thanks Avi to point this out.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 15 Mar 2010 14:38:31 +0000 (16:38 +0200)]
KVM: Remove pointer to rflags from realmode_set_cr parameters.
Mov reg, cr instruction doesn't change flags in any meaningful way, so
no need to update rflags after instruction execution.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 15 Mar 2010 14:38:30 +0000 (16:38 +0200)]
KVM: x86 emulator: check return value against correct define
Check return value against correct define instead of open code
the value.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 15 Mar 2010 14:38:29 +0000 (16:38 +0200)]
KVM: x86 emulator: fix RCX access during rep emulation
During rep emulation access length to RCX depends on current address
mode.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Mon, 15 Mar 2010 14:38:28 +0000 (16:38 +0200)]
KVM: x86 emulator: Fix DstAcc decoding.
Set correct operation length. Add RAX (64bit) handling.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 15 Mar 2010 11:59:57 +0000 (13:59 +0200)]
KVM: MMU: Reinstate pte prefetch on invlpg
Commit
fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:
"The processor may create entries in paging-structure caches for
translations required for prefetches and for accesses that are a
result of speculative execution that would never actually occur
in the executed code path."
And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.
Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter. If a race occured, do not install the pte.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 15 Mar 2010 11:59:56 +0000 (13:59 +0200)]
KVM: MMU: Do not instantiate nontrapping spte on unsync page
The update_pte() path currently uses a nontrapping spte when a nonpresent
(or nonaccessed) gpte is written. This is fine since at present it is only
used on sync pages. However, on an unsync page this will cause an endless
fault loop as the guest is under no obligation to invlpg a gpte that
transitions from nonpresent to present.
Needed for the next patch which reinstates update_pte() on invlpg.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 15 Mar 2010 11:59:55 +0000 (13:59 +0200)]
KVM: Don't follow an atomic operation by a non-atomic one
Currently emulated atomic operations are immediately followed by a non-atomic
operation, so that kvm_mmu_pte_write() can be invoked. This updates the mmu
but undoes the whole point of doing things atomically.
Fix by only performing the atomic operation and the mmu update, and avoiding
the non-atomic write.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 15 Mar 2010 11:59:54 +0000 (13:59 +0200)]
KVM: Make locked operations truly atomic
Once upon a time, locked operations were emulated while holding the mmu mutex.
Since mmu pages were write protected, it was safe to emulate the writes in
a non-atomic manner, since there could be no other writer, either in the
guest or in the kernel.
These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page
to writes by the guest. This may cause corruption of guest page tables.
Fix by using an atomic cmpxchg for these operations.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 15 Mar 2010 11:59:53 +0000 (13:59 +0200)]
KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()
kvm_mmu_pte_write() reads guest ptes in two different occasions, both to
allow a 32-bit pae guest to update a pte with 4-byte writes. Consolidate
these into a single read, which also allows us to consolidate another read
from an invlpg speculating a gpte into the shadow page table.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
jing zhang [Sat, 13 Mar 2010 07:00:45 +0000 (15:00 +0800)]
KVM: fix assigned_device_enable_host_msix error handling
Free IRQ's and disable MSIX upon failure.
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Jing Zhang <zj.barak@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Fri, 12 Mar 2010 04:59:06 +0000 (12:59 +0800)]
KVM: fix the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO failure
This patch change the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
from -EINVAL to -ENXIO if no coalesced mmio dev exists.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Fri, 12 Mar 2010 02:11:15 +0000 (10:11 +0800)]
KVM: ia64: fix the error of ioctl KVM_IRQ_LINE if no irq chip
If no irq chip in kernel, ioctl KVM_IRQ_LINE will return -EFAULT.
But I see in other place such as KVM_[GET|SET]IRQCHIP, -ENXIO is
return. So this patch used -ENXIO instead of -EFAULT.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Fri, 12 Mar 2010 02:09:45 +0000 (10:09 +0800)]
KVM: x86: fix the error of ioctl KVM_IRQ_LINE if no irq chip
If no irq chip in kernel, ioctl KVM_IRQ_LINE will return -EFAULT.
But I see in other place such as KVM_[GET|SET]IRQCHIP, -ENXIO is
return. So this patch used -ENXIO instead of -EFAULT.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Fri, 12 Mar 2010 00:45:39 +0000 (08:45 +0800)]
KVM: ia64: fix the error code of ioctl KVM_IA64_VCPU_GET_STACK failure
The ioctl KVM_IA64_VCPU_GET_STACK does not set the error code if
copy_to_user() fail, and 0 will be return, we should use -EFAULT
instead of 0 in this case, so this patch fixed it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Fri, 5 Mar 2010 04:11:48 +0000 (12:11 +0800)]
KVM: x86: Use native_store_idt() instead of kvm_get_idt()
This patch use generic linux function native_store_idt()
instead of kvm_get_idt(), and also removed the useless
function kvm_get_idt().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 11 Mar 2010 11:01:59 +0000 (13:01 +0200)]
KVM: Trace exception injection
Often an exception can help point out where things start to go wrong.
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 11 Mar 2010 08:50:44 +0000 (10:50 +0200)]
KVM: Move kvm_exit tracepoint rip reading inside tracepoint
Reading rip is expensive on vmx, so move it inside the tracepoint so we only
incur the cost if tracing is enabled.
Signed-off-by: Avi Kivity <avi@redhat.com>
Minchan Kim [Wed, 10 Mar 2010 14:31:22 +0000 (23:31 +0900)]
KVM: remove redundant initialization of page->private
The prep_new_page() in page allocator calls set_page_private(page, 0).
So we don't need to reinitialize private of page.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Avi Kivity<avi@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Xiao Guangrong [Wed, 10 Mar 2010 11:00:43 +0000 (19:00 +0800)]
KVM: cleanup kvm trace
This patch does:
- no need call tracepoint_synchronize_unregister() when kvm module
is unloaded since ftrace can handle it
- cleanup ftrace's macro
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 9 Mar 2010 06:13:43 +0000 (14:13 +0800)]
KVM: PPC: Do not create debugfs if fail to create vcpu
If fail to create the vcpu, we should not create the debugfs
for it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Alexander Graf <agraf@suse.de>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Wei Yongjun [Tue, 9 Mar 2010 06:37:53 +0000 (14:37 +0800)]
KVM: s390: Fix possible memory leak of in kvm_arch_vcpu_create()
This patch fixed possible memory leak in kvm_arch_vcpu_create()
under s390, which would happen when kvm_arch_vcpu_create() fails.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Wed, 3 Mar 2010 15:53:05 +0000 (17:53 +0200)]
KVM: x86 emulator mark VMMCALL and LMSW as privileged
LMSW is present in both group tables. It was marked privileged only in
one of them. Intel analog of VMMCALL is already marked privileged.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Joerg Roedel [Mon, 1 Mar 2010 14:34:40 +0000 (15:34 +0100)]
KVM: SVM: Ignore lower 12 bit of nested msrpm_pa
These bits are ignored by the hardware too. Implement this
for nested svm too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Joerg Roedel [Mon, 1 Mar 2010 14:34:39 +0000 (15:34 +0100)]
KVM; SVM: Add correct handling of nested iopm
This patch adds the correct handling of the nested io
permission bitmap. Old behavior was to not lookup the port
in the iopm but only reinject an io intercept to the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Joerg Roedel [Mon, 1 Mar 2010 14:34:38 +0000 (15:34 +0100)]
KVM: SVM: Use svm_msrpm_offset in nested_svm_exit_handled_msr
There is a generic function now to calculate msrpm offsets.
Use that function in nested_svm_exit_handled_msr() remove
the duplicate logic (which had a bug anyway).
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Joerg Roedel [Mon, 1 Mar 2010 14:34:37 +0000 (15:34 +0100)]
KVM: SVM: Optimize nested svm msrpm merging
This patch optimizes the way the msrpm of the host and the
guest are merged. The old code merged the 2 msrpm pages
completly. This code needed to touch 24kb of memory for that
operation. The optimized variant this patch introduces
merges only the parts where the host msrpm may contain zero
bits. This reduces the amount of memory which is touched to
48 bytes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Joerg Roedel [Mon, 1 Mar 2010 14:34:36 +0000 (15:34 +0100)]
KVM: SVM: Introduce direct access msr list
This patch introduces a list with all msrs a guest might
have direct access to and changes the svm_vcpu_init_msrpm
function to use this list.
It also adds a check to set_msr_interception which triggers
a warning if a developer changes a msr intercept that is not
in the list.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Joerg Roedel [Mon, 1 Mar 2010 14:34:35 +0000 (15:34 +0100)]
KVM: SVM: Move msrpm offset calculation to seperate function
The algorithm to find the offset in the msrpm for a given
msr is needed at other places too. Move that logic to its
own function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Joerg Roedel [Mon, 1 Mar 2010 14:34:34 +0000 (15:34 +0100)]
KVM: SVM: Return correct values in nested_svm_exit_handled_msr
The nested_svm_exit_handled_msr() returned an bool which is
a bug. I worked by accident because the exected integer
return values match with the true and false values. This
patch changes the return value to int and let the function
return the correct values.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Andrea Gelmini [Sat, 27 Feb 2010 16:51:43 +0000 (17:51 +0100)]
KVM: arch/x86/kvm/kvm_timer.h checkpatch cleanup
arch/x86/kvm/kvm_timer.h:13: ERROR: code indent should use tabs where possible
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Gleb Natapov [Thu, 25 Feb 2010 14:36:43 +0000 (16:36 +0200)]
KVM: x86 emulator: Implement jmp far opcode ff/5
Implement jmp far opcode ff/5. It is used by multiboot loader.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 25 Feb 2010 14:36:42 +0000 (16:36 +0200)]
KVM: x86 emulator: Add decoding of 16bit second in memory argument
Add decoding of Ep type of argument used by callf/jmpf.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 25 Feb 2010 10:43:09 +0000 (12:43 +0200)]
KVM: move segment_base() into vmx.c
segment_base() is used only by vmx so move it there.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 25 Feb 2010 10:43:08 +0000 (12:43 +0200)]
KVM: fix segment_base() error checking
fix segment_base() to properly check for null segment selector and
avoid accessing NULL pointer if ldt selector in null.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 25 Feb 2010 10:43:07 +0000 (12:43 +0200)]
KVM: Drop kvm_get_gdt() in favor of generic linux function
Linux now has native_store_gdt() to do the same. Use it instead of
kvm local version.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Thu, 25 Feb 2010 02:33:19 +0000 (11:33 +0900)]
KVM: update gfn_to_hva() to use gfn_to_hva_memslot()
Marcelo introduced gfn_to_hva_memslot() when he implemented
gfn_to_pfn_memslot(). Let's use this for gfn_to_hva() too.
Note: also remove parentheses next to return as checkpatch said to do.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:19 +0000 (18:59 +0100)]
KVM: SVM: Clear exit_info for injected INTR exits
When injecting an vmexit.intr into the nested hypervisor
there might be leftover values in the exit_info fields.
Clear them to not confuse nested hypervisors.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:18 +0000 (18:59 +0100)]
KVM: SVM: Handle nested selective_cr0 intercept correctly
If we have the following situation with nested svm:
1. Host KVM intercepts cr0 writes
2. Guest hypervisor intercepts only selective cr0 writes
Then we get an cr0 write intercept which is handled on the
host. But that intercepts may actually be a selective cr0
intercept for the guest. This patch checks for this
condition and injects a selective cr0 intercept if needed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:17 +0000 (18:59 +0100)]
KVM: x86: Don't set arch.cr0 in kvm_set_cr0
The vcpu->arch.cr0 variable is already set in the
architecture specific set_cr0 callbacks. There is no need to
set it in the common code.
This allows the architecture code to keep the old arch.cr0
value if it wants. This is required for nested svm to decide
if a selective_cr0 exit needs to be injected.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:16 +0000 (18:59 +0100)]
KVM: SVM: Ignore write of hwcr.ignne
Hyper-V as a guest wants to write this bit. This patch
ignores it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:15 +0000 (18:59 +0100)]
KVM: SVM: Implement emulation of vm_cr msr
This patch implements the emulation of the vm_cr msr for
nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:14 +0000 (18:59 +0100)]
KVM: SVM: Add kvm_nested_intercepts tracepoint
This patch adds a tracepoint to get information about the
most important intercept bitmasks from the nested vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:13 +0000 (18:59 +0100)]
KVM: SVM: Restore tracing of nested vmcb address
A recent change broke tracing of the nested vmcb address. It
was reported as 0 all the time. This patch fixes it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:12 +0000 (18:59 +0100)]
KVM: SVM: Check for nested intercepts on NMI injection
This patch implements the NMI intercept checking for nested
svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:11 +0000 (18:59 +0100)]
KVM: SVM: Reset MMU on nested_svm_vmrun for NPT too
Without resetting the MMU the gva_to_pga function will not
work reliably when the vcpu is running in nested context.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:10 +0000 (18:59 +0100)]
KVM: SVM: Coding style cleanup
This patch removes whitespace errors, fixes comment formats
and most of checkpatch warnings. Now vim does not show
c-space-errors anymore.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:59 +0000 (17:47 +0100)]
KVM: x86: Preserve injected TF across emulation
Call directly into the vendor services for getting/setting rflags in
emulate_instruction to ensure injected TF survives the emulation.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:58 +0000 (17:47 +0100)]
KVM: x86: Drop RF manipulation for guest single-stepping
RF is not required for injecting TF as the latter will trigger only
after an instruction execution anyway. So do not touch RF when arming or
disarming guest single-step mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:56 +0000 (17:47 +0100)]
KVM: SVM: Emulate nRIP feature when reinjecting INT3
When in guest debugging mode, we have to reinject those #BP software
exceptions that are caused by guest-injected INT3. As older AMD
processors do not support the required nRIP VMCB field, try to emulate
it by moving RIP past the instruction on exception injection. Fix it up
again in case the injection failed and we were able to catch this. This
does not work for unintercepted faults, but it is better than doing
nothing.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>