Joerg Roedel [Wed, 24 Feb 2010 17:59:17 +0000 (18:59 +0100)]
KVM: x86: Don't set arch.cr0 in kvm_set_cr0
The vcpu->arch.cr0 variable is already set in the
architecture specific set_cr0 callbacks. There is no need to
set it in the common code.
This allows the architecture code to keep the old arch.cr0
value if it wants. This is required for nested svm to decide
if a selective_cr0 exit needs to be injected.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:16 +0000 (18:59 +0100)]
KVM: SVM: Ignore write of hwcr.ignne
Hyper-V as a guest wants to write this bit. This patch
ignores it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:15 +0000 (18:59 +0100)]
KVM: SVM: Implement emulation of vm_cr msr
This patch implements the emulation of the vm_cr msr for
nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:14 +0000 (18:59 +0100)]
KVM: SVM: Add kvm_nested_intercepts tracepoint
This patch adds a tracepoint to get information about the
most important intercept bitmasks from the nested vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:13 +0000 (18:59 +0100)]
KVM: SVM: Restore tracing of nested vmcb address
A recent change broke tracing of the nested vmcb address. It
was reported as 0 all the time. This patch fixes it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:12 +0000 (18:59 +0100)]
KVM: SVM: Check for nested intercepts on NMI injection
This patch implements the NMI intercept checking for nested
svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:11 +0000 (18:59 +0100)]
KVM: SVM: Reset MMU on nested_svm_vmrun for NPT too
Without resetting the MMU the gva_to_pga function will not
work reliably when the vcpu is running in nested context.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:10 +0000 (18:59 +0100)]
KVM: SVM: Coding style cleanup
This patch removes whitespace errors, fixes comment formats
and most of checkpatch warnings. Now vim does not show
c-space-errors anymore.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:59 +0000 (17:47 +0100)]
KVM: x86: Preserve injected TF across emulation
Call directly into the vendor services for getting/setting rflags in
emulate_instruction to ensure injected TF survives the emulation.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:58 +0000 (17:47 +0100)]
KVM: x86: Drop RF manipulation for guest single-stepping
RF is not required for injecting TF as the latter will trigger only
after an instruction execution anyway. So do not touch RF when arming or
disarming guest single-step mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:56 +0000 (17:47 +0100)]
KVM: SVM: Emulate nRIP feature when reinjecting INT3
When in guest debugging mode, we have to reinject those #BP software
exceptions that are caused by guest-injected INT3. As older AMD
processors do not support the required nRIP VMCB field, try to emulate
it by moving RIP past the instruction on exception injection. Fix it up
again in case the injection failed and we were able to catch this. This
does not work for unintercepted faults, but it is better than doing
nothing.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:55 +0000 (17:47 +0100)]
KVM: x86: Add kvm_is_linear_rip
Based on Gleb's suggestion: Add a helper kvm_is_linear_rip that matches
a given linear RIP against the current one. Use this for guest
single-stepping, more users will follow.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:54 +0000 (17:47 +0100)]
KVM: SVM: Move svm_queue_exception
Move svm_queue_exception past skip_emulated_instruction to allow calling
it later on.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Wed, 24 Feb 2010 09:41:58 +0000 (10:41 +0100)]
KVM: x86: Kick VCPU outside PIC lock again
This restores the deferred VCPU kicking before
956f97cf. We need this
over -rt as wake_up* requires non-atomic context in this configuration.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Mon, 22 Feb 2010 15:52:14 +0000 (16:52 +0100)]
KVM: PPC: Destory timer on vcpu destruction
When we destory a vcpu, we should also make sure to kill all pending
timers that could still be up. When not doing this, hrtimers might
dereference null pointers trying to call our code.
This patch fixes spontanious kernel panics seen after closing VMs.
Signed-off-by: Alexander Graf <alex@csgraf.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Mon, 22 Feb 2010 15:52:08 +0000 (16:52 +0100)]
KVM: PPC: Memset vcpu to zeros
While converting the kzalloc we used to allocate our vcpu struct to
vmalloc, I forgot to memset the contents to zeros. That broke quite
a lot.
This patch memsets it to zero again.
Signed-off-by: Alexander Graf <alex@csgraf.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Mon, 15 Feb 2010 09:45:43 +0000 (10:45 +0100)]
KVM: x86: Add support for saving&restoring debug registers
So far user space was not able to save and restore debug registers for
migration or after reset. Plug this hole.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Fri, 19 Feb 2010 18:38:07 +0000 (19:38 +0100)]
KVM: x86: Save&restore interrupt shadow mask
The interrupt shadow created by STI or MOV-SS-like operations is part of
the VCPU state and must be preserved across migration. Transfer it in
the spare padding field of kvm_vcpu_events.interrupt.
As a side effect we now have to make vmx_set_interrupt_shadow robust
against both shadow types being set. Give MOV SS a higher priority and
skip STI in that case to avoid that VMX throws a fault on next entry.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Mon, 15 Feb 2010 09:45:41 +0000 (10:45 +0100)]
KVM: x86: Do not return soft events in vcpu_events
To avoid that user space migrates a pending software exception or
interrupt, mask them out on KVM_GET_VCPU_EVENTS. Without this, user
space would try to reinject them, and we would have to reconstruct the
proper instruction length for VMX event injection. Now the pending event
will be reinjected via executing the triggering instruction again.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:01 +0000 (16:23 +0100)]
KVM: SVM: Fix wrong interrupt injection in enable_irq_windows
The nested_svm_intr() function does not execute the vmexit
anymore. Therefore we may still be in the nested state after
that function ran. This patch changes the nested_svm_intr()
function to return wether the irq window could be enabled.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Sun, 21 Feb 2010 13:00:47 +0000 (15:00 +0200)]
KVM: drop unneeded kvm_run check in emulate_instruction()
vcpu->run is initialized on vcpu creation and can never be NULL
here.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 11:24:33 +0000 (12:24 +0100)]
KVM: PPC: Allocate vcpu struct using vmalloc
We used to use get_free_pages to allocate our vcpu struct. Unfortunately
that call failed on me several times after my machine had a big enough
uptime, as memory became too fragmented by then.
Fortunately, we don't need it to be page aligned any more! We can just
vmalloc it and everything's great.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:47 +0000 (11:00 +0100)]
KVM: PPC: Simplify kvmppc_load_up_(FPU|VMX|VSX)
We don't need as complex code. I had some thinkos while writing it, figuring
I needed to support PPC32 paths on PPC64 which would have required DR=0, but
everything just runs fine with DR=1.
So let's make the functions simple C call wrappers that reserve some space on
the stack for the respective functions to clobber.
Fixes out-of-RMA-access (and thus guest FPU loading) on the PS3.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:46 +0000 (11:00 +0100)]
KVM: PPC: Enable use of secondary htab bucket
We had code to make use of the secondary htab buckets, but kept that
disabled because it was unstable when I put it in.
I checked again if that's still the case and apparently it was only
exposing some instability that was there anyways before. I haven't
seen any badness related to usage of secondary htab entries so far.
This should speed up guest memory allocations by quite a bit, because
we now have more space to put PTEs in.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:45 +0000 (11:00 +0100)]
KVM: PPC: Add capability for paired singles
We need to tell userspace that we can emulate paired single instructions.
So let's add a capability export.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:44 +0000 (11:00 +0100)]
KVM: PPC: Implement Paired Single emulation
The one big thing about the Gekko is paired singles.
Paired singles are an extension to the instruction set, that adds 32 single
precision floating point registers (qprs), some SPRs to modify the behavior
of paired singled operations and instructions to deal with qprs to the
instruction set.
Unfortunately, it also changes semantics of existing operations that affect
single values in FPRs. In most cases they get mirrored to the coresponding
QPR.
Thanks to that we need to emulate all FPU operations and all the new paired
single operations too.
In order to achieve that, we use the just introduced FPU call helpers to
call the real FPU whenever the guest wants to modify an FPR. Additionally
we also fix up the QPR values along the way.
That way we can execute paired single FPU operations without implementing a
soft fpu.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:43 +0000 (11:00 +0100)]
KVM: PPC: Enable program interrupt to do MMIO
When we get a program interrupt we usually don't expect it to perform an
MMIO operation. But why not? When we emulate paired singles, we can end
up loading or storing to an MMIO address - and the handling of those
happens in the program interrupt handler.
So let's teach the program interrupt handler how to deal with EMULATE_MMIO.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:42 +0000 (11:00 +0100)]
KVM: PPC: Add helpers to modify ppc fields
The PowerPC specification always lists bits from MSB to LSB. That is
really confusing when you're trying to write C code, because it fits
in pretty badly with the normal (1 << xx) schemes.
So I came up with some nice wrappers that allow to get and set fields
in a u64 with bit numbers exactly as given in the spec. That makes the
code in KVM and the spec easier comparable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:41 +0000 (11:00 +0100)]
KVM: PPC: Fix error in BAT assignment
BATs didn't work. Well, they did, but only up to BAT3. As soon as we
came to BAT4 the offset calculation was screwed up and we ended up
overwriting BAT0-3.
Fortunately, Linux hasn't been using BAT4+. It's still a good
idea to write correct code though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:40 +0000 (11:00 +0100)]
KVM: PPC: Add helpers to call FPU instructions
To emulate paired single instructions, we need to be able to call FPU
operations from within the kernel. Since we don't want gcc to spill
arbitrary FPU code everywhere, we tell it to use a soft fpu.
Since we know we can really call the FPU in safe areas, let's also add
some calls that we can later use to actually execute real world FPU
operations on the host's FPU.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:39 +0000 (11:00 +0100)]
KVM: PPC: Make ext giveup non-static
We need to call the ext giveup handlers from code outside of book3s.c.
So let's make it non-static.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:38 +0000 (11:00 +0100)]
KVM: PPC: Make software load/store return eaddr
The Book3S KVM implementation contains some helper functions to load and store
data from and to virtual addresses.
Unfortunately, this helper used to keep the physical address it so nicely
found out for us to itself. So let's change that and make it return the
physical address it resolved.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:37 +0000 (11:00 +0100)]
KVM: PPC: Implement mtsr instruction emulation
The Book3S_32 specifications allows for two instructions to modify segment
registers: mtsrin and mtsr.
Most normal operating systems use mtsrin, because it allows to define which
segment it wants to change using a register. But since I was trying to run
an embedded guest, it turned out to be using mtsr with hardcoded values.
So let's also emulate mtsr. It's a valid instruction after all.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:36 +0000 (11:00 +0100)]
KVM: PPC: Fix typo in book3s_32 debug code
There's a typo in the debug ifdef of the book3s_32 mmu emulation. While trying
to debug something I stumbled across that and wanted to save anyone after me
(or myself later) from having to debug that again.
So let's fix the ifdef.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:35 +0000 (11:00 +0100)]
KVM: PPC: Preload FPU when possible
There are some situations when we're pretty sure the guest will use the
FPU soon. So we can save the churn of going into the guest, finding out
it does want to use the FPU and going out again.
This patch adds preloading of the FPU when it's reasonable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:34 +0000 (11:00 +0100)]
KVM: PPC: Combine extension interrupt handlers
When we for example get an Altivec interrupt, but our guest doesn't support
altivec, we need to inject a program interrupt, not an altivec interrupt.
The same goes for paired singles. When an altivec interrupt arrives, we're
pretty sure we need to emulate the instruction because it's a paired single
operation.
So let's make all the ext handlers aware that they need to jump to the
program interrupt handler when an extension interrupt arrives that
was not supposed to arrive for the guest CPU.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:33 +0000 (11:00 +0100)]
KVM: PPC: Add Gekko SPRs
The Gekko has some SPR values that differ from other PPC core values and
also some additional ones.
Let's add support for them in our mfspr/mtspr emulator.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:32 +0000 (11:00 +0100)]
KVM: PPC: Add hidden flag for paired singles
The Gekko implements an extension called paired singles. When the guest wants
to use that extension, we need to make sure we're not running the host FPU,
because all FPU instructions need to get emulated to accomodate for additional
operations that occur.
This patch adds an hflag to track if we're in paired single mode or not.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:31 +0000 (11:00 +0100)]
KVM: PPC: Add AGAIN type for emulation return
Emulation of an instruction can have different outcomes. It can succeed,
fail, require MMIO, do funky BookE stuff - or it can just realize something's
odd and will be fixed the next time around.
Exactly that is what EMULATE_AGAIN means. Using that flag we can now tell
the caller that nothing happened, but we still want to go back to the
guest and see what happens next time we come around.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:30 +0000 (11:00 +0100)]
KVM: PPC: Teach MMIO Signedness
The guest I was trying to get to run uses the LHA and LHAU instructions.
Those instructions basically do a load, but also sign extend the result.
Since we need to fill our registers by hand when doing MMIO, we also need
to sign extend manually.
This patch implements sign extended MMIO and the LHA(U) instructions.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:29 +0000 (11:00 +0100)]
KVM: PPC: Enable MMIO to do 64 bits, fprs and qprs
Right now MMIO access can only happen for GPRs and is at most 32 bit wide.
That's actually enough for almost all types of hardware out there.
Unfortunately, the guest I was using used FPU writes to MMIO regions, so
it ended up writing 64 bit MMIOs using FPRs and QPRs.
So let's add code to handle those odd cases too.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:28 +0000 (11:00 +0100)]
KVM: PPC: Make fpscr 64-bit
Modern PowerPCs have a 64 bit wide FPSCR register. Let's accomodate for that
and make it 64 bits in our vcpu struct too.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:27 +0000 (11:00 +0100)]
KVM: PPC: Add QPR registers
The Gekko has GPRs, SPRs and FPRs like normal PowerPC codes, but
it also has QPRs which are basically single precision only FPU registers
that get used when in paired single mode.
The following patches depend on them being around, so let's add the
definitions early.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:09 +0000 (16:23 +0100)]
KVM: SVM: Remove newlines from nested trace points
The tracing infrastructure adds its own newlines. Remove
them from the trace point printk format strings.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:08 +0000 (16:23 +0100)]
KVM: SVM: Make lazy FPU switching work with nested svm
The new lazy fpu switching code may disable cr0 intercepts
when running nested. This is a bug because the nested
hypervisor may still want to intercept cr0 which will break
in this situation. This patch fixes this issue and makes
lazy fpu switching working with nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:07 +0000 (16:23 +0100)]
KVM: SVM: Activate nested state only when guest state is complete
Certain functions called during the emulated world switch
behave differently when the vcpu is running nested. This is
not the expected behavior during a world switch emulation.
This patch ensures that the nested state is activated only
if the vcpu is completly in nested state.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:06 +0000 (16:23 +0100)]
KVM: SVM: Don't sync nested cr8 to lapic and back
This patch makes syncing of the guest tpr to the lapic
conditional on !nested. Otherwise a nested guest using the
TPR could freeze the guest.
Another important change this patch introduces is that the
cr8 intercept bits are no longer ORed at vmrun emulation if
the guest sets VINTR_MASKING in its VMCB. The reason is that
nested cr8 accesses need alway be handled by the nested
hypervisor because they change the shadow version of the
tpr.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:05 +0000 (16:23 +0100)]
KVM: SVM: Fix nested msr intercept handling
The nested_svm_exit_handled_msr() function maps only one
page of the guests msr permission bitmap. This patch changes
the code to use kvm_read_guest to fix the bug.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:04 +0000 (16:23 +0100)]
KVM: SVM: Annotate nested_svm_map with might_sleep()
The nested_svm_map() function can sleep and must not be
called from atomic context. So annotate that function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:03 +0000 (16:23 +0100)]
KVM: SVM: Sync all control registers on nested vmexit
Currently the vmexit emulation does not sync control
registers were the access is typically intercepted by the
nested hypervisor. But we can not count on that intercepts
to sync these registers too and make the code
architecturally more correct.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:02 +0000 (16:23 +0100)]
KVM: SVM: Fix schedule-while-atomic on nested exception handling
Move the actual vmexit routine out of code that runs with
irqs and preemption disabled.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:00 +0000 (16:23 +0100)]
KVM: SVM: Don't use kmap_atomic in nested_svm_map
Use of kmap_atomic disables preemption but if we run in
shadow-shadow mode the vmrun emulation executes kvm_set_cr3
which might sleep or fault. So use kmap instead for
nested_svm_map.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Fri, 12 Feb 2010 07:02:54 +0000 (16:02 +0900)]
KVM: remove redundant prototype of load_pdptrs()
This patch removes redundant prototype of load_pdptrs().
I found load_pdptrs() twice in kvm_host.h. Let's remove one.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Fri, 12 Feb 2010 07:00:55 +0000 (16:00 +0900)]
KVM: x86 emulator: Fix x86_emulate_insn() not to use the variable rc for non-X86EMUL values
This patch makes non-X86EMUL_* family functions not to use
the variable rc.
Be sure that this changes nothing but makes the purpose of
the variable rc clearer.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Fri, 12 Feb 2010 06:57:56 +0000 (15:57 +0900)]
KVM: x86 emulator: X86EMUL macro replacements: x86_emulate_insn() and its helpers
This patch just replaces integer values used inside
x86_emulate_insn() and its helper functions to X86EMUL_*.
The purpose of this is to make it clear what will happen
when the variable rc is compared to X86EMUL_* at the end
of x86_emulate_insn().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Fri, 12 Feb 2010 06:53:59 +0000 (15:53 +0900)]
KVM: x86 emulator: X86EMUL macro replacements: from do_fetch_insn_byte() to x86_decode_insn()
This patch just replaces the integer values used inside x86's
decode functions to X86EMUL_*.
By this patch, it becomes clearer that we are using X86EMUL_*
value propagated from ops->read_std() in do_fetch_insn_byte().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Thu, 11 Feb 2010 12:43:14 +0000 (14:43 +0200)]
KVM: inject #UD in 64bit mode from instruction that are not valid there
Some instruction are obsolete in a long mode. Inject #UD.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Tue, 16 Feb 2010 08:51:48 +0000 (10:51 +0200)]
KVM: use desc_ptr struct instead of kvm private descriptor_table
x86 arch defines desc_ptr for idt/gdt pointers, no need to define
another structure in kvm code.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Sat, 13 Feb 2010 18:10:26 +0000 (16:10 -0200)]
KVM: add doc note about PIO/MMIO completion API
Document that partially emulated instructions leave the guest state
inconsistent, and that the kernel will complete operations before
checking for pending signals.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Linus Torvalds [Wed, 21 Apr 2010 19:33:12 +0000 (12:33 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/gerg/m68knommu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: allow 4 coldfire serial ports
m68knommu: fix coldfire tcdrain
m68knommu: remove a duplicate vector setting line for 68360
Fix m68k-uclinux's rt_sigreturn trampoline
m68knommu: correct the CC flags for Coldfire M5272 targets
uclinux: error message when FLAT reloc symbol is invalid, v2
Linus Torvalds [Wed, 21 Apr 2010 19:31:52 +0000 (12:31 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
mc13783-regulator: fix a memory leak in mc13783_regulator_remove
regulator: Let drivers know when they use the stub API
Linus Torvalds [Wed, 21 Apr 2010 19:31:12 +0000 (12:31 -0700)]
Merge git://git./linux/kernel/git/joern/logfs
* git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
[LogFS] Split large truncated into smaller chunks
[LogFS] Set s_bdi
[LogFS] Prevent mempool_destroy NULL pointer dereference
[LogFS] Move assertion
[LogFS] Plug 8 byte information leak
[LogFS] Prevent memory corruption on large deletes
[LogFS] Remove unused method
Fix trivial conflict with added header includes in fs/logfs/super.c
Linus Torvalds [Wed, 21 Apr 2010 19:30:07 +0000 (12:30 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
jfs: add jfs specific ->setattr call
jfs: fix diAllocExt error in resizing filesystem
jfs_dmap.[ch]: trivial typo fix: s/heigth/height/g
Linus Torvalds [Wed, 21 Apr 2010 19:29:46 +0000 (12:29 -0700)]
Merge branch 'kvm-updates/2.6.34' of git://git./virt/kvm/kvm
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix TSS size check for 16-bit tasks
KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()
KVM: Increase NR_IOBUS_DEVS limit to 200
KVM: fix the handling of dirty bitmaps to avoid overflows
KVM: MMU: fix kvm_mmu_zap_page() and its calling path
KVM: VMX: Save/restore rflags.vm correctly in real mode
KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
KVM: take srcu lock before call to complete_pio()
Linus Torvalds [Wed, 21 Apr 2010 19:28:44 +0000 (12:28 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md/raid5: allow for more than 2^31 chunks.
David Howells [Wed, 21 Apr 2010 11:01:23 +0000 (12:01 +0100)]
AFS: Don't pass error value to page_cache_release() in error handling
In the error handling in afs_mntpt_do_automount(), we pass an error
pointer to page_cache_release() if read_mapping_page() failed. Instead,
we should extend the gotos around the error handling we don't need.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kiszka [Wed, 14 Apr 2010 14:57:11 +0000 (16:57 +0200)]
KVM: x86: Fix TSS size check for 16-bit tasks
A 16-bit TSS is only 44 bytes long. So make sure to test for the correct
size on task switch.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Lai Jiangshan [Tue, 20 Apr 2010 06:29:29 +0000 (14:29 +0800)]
KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()
I got this dmesg due to srcu_read_lock() is missing in
kvm_mmu_notifier_release().
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/3100:
#0: (rcu_read_lock){.+.+..}, at: [<
ffffffff810d73dc>] __mmu_notifier_release+0x38/0xdf
#1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<
ffffffffa0130a6a>] kvm_mmu_zap_all+0x21/0x5e [kvm]
stack backtrace:
Pid: 3100, comm: qemu-system-x86 Not tainted 2.6.34-rc3-22949-gbc8a97a-dirty #2
Call Trace:
[<
ffffffff8106afd9>] lockdep_rcu_dereference+0xaa/0xb3
[<
ffffffffa0123a89>] unalias_gfn+0x56/0xab [kvm]
[<
ffffffffa0119600>] gfn_to_memslot+0x16/0x25 [kvm]
[<
ffffffffa012ffca>] gfn_to_rmap+0x17/0x6e [kvm]
[<
ffffffffa01300c1>] rmap_remove+0xa0/0x19d [kvm]
[<
ffffffffa0130649>] kvm_mmu_zap_page+0x109/0x34d [kvm]
[<
ffffffffa0130a7e>] kvm_mmu_zap_all+0x35/0x5e [kvm]
[<
ffffffffa0122870>] kvm_arch_flush_shadow+0x16/0x22 [kvm]
[<
ffffffffa01189e0>] kvm_mmu_notifier_release+0x15/0x17 [kvm]
[<
ffffffff810d742c>] __mmu_notifier_release+0x88/0xdf
[<
ffffffff810d73dc>] ? __mmu_notifier_release+0x38/0xdf
[<
ffffffff81040848>] ? exit_mm+0xe0/0x115
[<
ffffffff810c2cb0>] exit_mmap+0x2c/0x17e
[<
ffffffff8103c472>] mmput+0x2d/0xd4
[<
ffffffff81040870>] exit_mm+0x108/0x115
[...]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Philippe De Muyter [Thu, 18 Mar 2010 10:37:13 +0000 (11:37 +0100)]
m68knommu: allow 4 coldfire serial ports
Fix driver/serial/mcf.c for 4-ports coldfire's (e.g. MCF5484).
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Philippe De Muyter [Fri, 2 Apr 2010 15:56:08 +0000 (17:56 +0200)]
m68knommu: fix coldfire tcdrain
Fix tcdrain on coldfire uarts.
Currently with coldfire uarts tcdrain returns without waiting for txempty,
because (tx)fifosize is 0. Fix that and call uart_update_timeout when
setting the baud rate, otherwise tcdrain will wait for an half our :)
Also constify mcf_uart_ops.
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Greg Ungerer [Fri, 19 Feb 2010 01:27:37 +0000 (11:27 +1000)]
m68knommu: remove a duplicate vector setting line for 68360
Remove a duplicate vector setting line for the 68360 interrupt
setup. Pointed out by Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Maxim Kuvyrkov [Tue, 22 Sep 2009 21:25:44 +0000 (01:25 +0400)]
Fix m68k-uclinux's rt_sigreturn trampoline
Signed-off-by: Maxim Kuvyrkov <maxim@codesourcery.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Philip Nye [Tue, 12 Jan 2010 00:18:03 +0000 (10:18 +1000)]
m68knommu: correct the CC flags for Coldfire M5272 targets
Signed-off-by: Philip Nye <philipn@engarts.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Jun Sun [Fri, 1 Jan 2010 01:28:52 +0000 (17:28 -0800)]
uclinux: error message when FLAT reloc symbol is invalid, v2
This patch fixes a cosmetic error in printk. Text segment and data/bss
segment are allocated from two different areas. It is not meaningful to
give the diff between them in the error reporting messages.
Signed-off-by: Jun Sun <jsun@junsun.net>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Joern Engel [Tue, 20 Apr 2010 19:44:10 +0000 (21:44 +0200)]
[LogFS] Split large truncated into smaller chunks
Truncate would do an almost limitless amount of work without invoking
the garbage collector in between. Split it up into more manageable,
though still large, chunks.
Signed-off-by: Joern Engel <joern@logfs.org>
Linus Torvalds [Tue, 20 Apr 2010 16:39:40 +0000 (09:39 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Convert __DQUOT_PARANOIA symbol to standard config option
Jan Kara [Mon, 19 Apr 2010 14:47:20 +0000 (16:47 +0200)]
quota: Convert __DQUOT_PARANOIA symbol to standard config option
Make __DQUOT_PARANOIA define from the old days a standard config option
and turn it off by default.
This gets rid of a quota warning about writes before quota is turned on
for systems with ext4 root filesystem. Currently there's no way to legally
solve this because /etc/mtab has to be written before quota is turned on
on most systems.
Signed-off-by: Jan Kara <jack@suse.cz>
Linus Torvalds [Tue, 20 Apr 2010 16:21:19 +0000 (09:21 -0700)]
Merge branch 'urgent' of git://git./linux/kernel/git/brodo/pcmcia-2.6
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
pcmcia: fix error handling in cm4000_cs.c
drivers/pcmcia: Add missing local_irq_restore
serial_cs: MD55x support (PCMCIA GPRS/EDGE modem) (kernel 2.6.33)
pcmcia: avoid late calls to pccard_validate_cis
pcmcia: fix ioport size calculation in rsrc_nonstatic
pcmcia: re-start on MFC override
pcmcia: fix io_probe due to parent (PCI) resources
pcmcia: use previously assigned IRQ for all card functions
Linus Torvalds [Tue, 20 Apr 2010 16:20:55 +0000 (09:20 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Fix hardirq tracing in trap return path.
sparc64: Use correct pt_regs in decode_access_size() error paths.
sparc64: Fix PREEMPT_ACTIVE value.
sparc64: Run NMIs on the hardirq stack.
sparc64: Allocate sufficient stack space in ftrace stubs.
sparc: Fix forgotten kmemleak headers inclusion
Linus Torvalds [Tue, 20 Apr 2010 16:20:23 +0000 (09:20 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix unsafe frame rewinding with hot regs fetching
Linus Torvalds [Tue, 20 Apr 2010 16:20:11 +0000 (09:20 -0700)]
Merge branch 'drm-linus' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: delay vblank cleanup until after driver unload
Christoph Hellwig [Tue, 20 Apr 2010 03:31:02 +0000 (05:31 +0200)]
x86: correctly wire up the newuname system call
Before commit
e28cbf22933d0c0ccaf3c4c27a1a263b41f73859 ("improve
sys_newuname() for compat architectures") 64-bit x86 had a private
implementation of sys_uname which was just called sys_uname, which other
architectures used for the old uname.
Due to some merge issues with the uname refactoring patches we ended up
calling the old uname version for both the old and new system call
slots, which lead to the domainname filed never be set which caused
failures with libnss_nis.
Reported-and-tested-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sridhar Samudrala [Tue, 30 Mar 2010 23:48:25 +0000 (16:48 -0700)]
KVM: Increase NR_IOBUS_DEVS limit to 200
This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than 1 virtio-net device using vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from rx/tx queues.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Mon, 12 Apr 2010 10:35:35 +0000 (19:35 +0900)]
KVM: fix the handling of dirty bitmaps to avoid overflows
Int is not long enough to store the size of a dirty bitmap.
This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.
Note: in mark_page_dirty(), we have to consider the fact that
__set_bit() takes the offset as int, not long.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Xiao Guangrong [Fri, 16 Apr 2010 08:34:42 +0000 (16:34 +0800)]
KVM: MMU: fix kvm_mmu_zap_page() and its calling path
This patch fix:
- calculate zapped page number properly in mmu_zap_unsync_children()
- calculate freeed page number properly kvm_mmu_change_mmu_pages()
- if zapped children page it shoud restart hlist walking
KVM-Stable-Tag.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Thu, 8 Apr 2010 15:19:35 +0000 (18:19 +0300)]
KVM: VMX: Save/restore rflags.vm correctly in real mode
Currently we set eflags.vm unconditionally when entering real mode emulation
through virtual-8086 mode, and clear it unconditionally when we enter protected
mode. The means that the following sequence
KVM_SET_REGS (rflags.vm=1)
KVM_SET_SREGS (cr0.pe=1)
Ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().
Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
reads and writes to those bits access a shadow register instead of the actual
register.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Andre Przywara [Wed, 24 Mar 2010 16:46:42 +0000 (17:46 +0100)]
KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE related MSR. KVM can only cope with all
zeros or all ones, so it will inject a #GP into the guest, which
will let it panic.
So lets add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Thu, 11 Mar 2010 10:20:03 +0000 (12:20 +0200)]
KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
These are guest-triggerable.
Signed-off-by: Avi Kivity <avi@redhat.com>
Takuya Yoshikawa [Tue, 9 Mar 2010 05:55:19 +0000 (14:55 +0900)]
KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
svm_create_vcpu() does not free the pages allocated during the creation
when it fails to complete the allocations. This patch fixes it.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Tue, 9 Mar 2010 10:01:10 +0000 (12:01 +0200)]
KVM: take srcu lock before call to complete_pio()
complete_pio() may use slot table which is protected by srcu.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
David S. Miller [Tue, 20 Apr 2010 07:48:37 +0000 (00:48 -0700)]
sparc64: Fix hardirq tracing in trap return path.
We can overflow the hardirq stack if we set the %pil here
so early, just let the normal control flow do it.
This is fine as we are allowed to do the actual IRQ enable
at any point after we call trace_hardirqs_on.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Barnes [Fri, 26 Mar 2010 18:07:16 +0000 (11:07 -0700)]
drm: delay vblank cleanup until after driver unload
Drivers may use vblank calls now (e.g. drm_vblank_off) in their unload
paths, so don't clean up the vblank related structures until after
driver unload.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
NeilBrown [Tue, 20 Apr 2010 04:13:34 +0000 (14:13 +1000)]
md/raid5: allow for more than 2^31 chunks.
With many large drives and small chunk sizes it is possible
to create a RAID5 with more than 2^31 chunks. Make sure this
works.
Reported-by: Brett King <king.br@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
Linus Torvalds [Mon, 19 Apr 2010 23:29:56 +0000 (16:29 -0700)]
Linux 2.6.34-rc5
Rik van Riel [Wed, 14 Apr 2010 21:59:28 +0000 (17:59 -0400)]
rmap: add exclusively owned pages to the newest anon_vma
The recent anon_vma fixes cause many anonymous pages to end up
in the parent process anon_vma, even when the page is exclusively
owned by the current process.
Adding exclusively owned anonymous pages to the top anon_vma
reduces rmap scanning overhead, especially in workloads with
forking servers.
This patch adds a parameter to __page_set_anon_rmap that can
be used to indicate whether or not the added page is exclusively
owned by the current process.
Pages added through page_add_new_anon_rmap are exclusively
owned by the current process, and can be added to the top
anon_vma.
Pages added through page_add_anon_rmap can be either shared
or exclusively owned, so we do the conservative thing and
add it to the oldest anon_vma.
A next step would be to add the exclusive parameter to
page_add_anon_rmap, to be used from functions where we do
know for sure whether a page is exclusively owned.
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Lightly-tested-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
[ Edited to look nicer - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 19 Apr 2010 21:20:32 +0000 (14:20 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Turn lower lookup error messages into debug messages
eCryptfs: Copy lower directory inode times and size on link
ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode
ecryptfs: fix error code for missing xattrs in lower fs
eCryptfs: Decrypt symlink target for stat size
eCryptfs: Strip metadata in xattr flag in encrypted view
eCryptfs: Clear buffer before reading in metadata xattr
eCryptfs: Rename ecryptfs_crypt_stat.num_header_bytes_at_front
eCryptfs: Fix metadata in xattr feature regression
David S. Miller [Mon, 19 Apr 2010 20:46:48 +0000 (13:46 -0700)]
sparc64: Use correct pt_regs in decode_access_size() error paths.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tyler Hicks [Thu, 25 Mar 2010 16:16:56 +0000 (11:16 -0500)]
eCryptfs: Turn lower lookup error messages into debug messages
Vaugue warnings about ENAMETOOLONG errors when looking up an encrypted
file name have caused many users to become concerned about their data.
Since this is a rather harmless condition, I'm moving this warning to
only be printed when the ecryptfs_verbosity module param is 1.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Tyler Hicks [Tue, 23 Mar 2010 23:09:02 +0000 (18:09 -0500)]
eCryptfs: Copy lower directory inode times and size on link
The timestamps and size of a lower inode involved in a link() call was
being copied to the upper parent inode. Instead, we should be
copying lower parent inode's timestamps and size to the upper parent
inode. I discovered this bug using the POSIX test suite at Tuxera.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Jeff Mahoney [Fri, 19 Mar 2010 19:35:46 +0000 (15:35 -0400)]
ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode
Since tmpfs has no persistent storage, it pins all its dentries in memory
so they have d_count=1 when other file systems would have d_count=0.
->lookup is only used to create new dentries. If the caller doesn't
instantiate it, it's freed immediately at dput(). ->readdir reads
directly from the dcache and depends on the dentries being hashed.
When an ecryptfs mount is mounted, it associates the lower file and dentry
with the ecryptfs files as they're accessed. When it's umounted and
destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
d_drop's the lower_dentries. Commit
4981e081 added this and a d_delete in
2008 and several months later commit
caeeeecf removed the d_delete. I
believe the d_drop() needs to be removed as well.
The d_drop effectively hides any file that has been accessed via ecryptfs
from the underlying tmpfs since it depends on it being hashed for it to
be accessible. I've removed the d_drop on my development node and see no
ill effects with basic testing on both tmpfs and persistent storage.
As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
BUGs on umount. This is due to the dentries being unhashed.
tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
the reference pinning the dentry. It skips unhashed and negative dentries,
but shrink_dcache_for_umount_subtree doesn't. Since those dentries
still have an elevated d_count, we get a BUG().
This patch removes the d_drop call and fixes both issues.
This issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=567887
Reported-by: Árpád Bíró <biroa@demasz.hu>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>