Avi Kivity [Mon, 16 Jun 2008 04:23:17 +0000 (21:23 -0700)]
KVM: x86 emulator: simplify sib decoding
Instead of using sparse switches, use simpler if/else sequences.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 16 Jun 2008 04:13:41 +0000 (21:13 -0700)]
KVM: x86 emulator: handle undecoded rex.b with r/m = 5 in certain cases
x86_64 does not decode rex.b in certain cases, where the r/m field = 5.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Mohammed Gamal [Sun, 15 Jun 2008 16:37:38 +0000 (19:37 +0300)]
KVM: x86 emulator: emulate nop and xchg reg, acc (opcodes 0x90 - 0x97)
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Fri, 13 Jun 2008 19:45:42 +0000 (22:45 +0300)]
KVM: Use printk_rlimit() instead of reporting emulation failures just once
Emulation failure reports are useful, so allow more than one per the lifetime
of the module.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Tan, Li [Fri, 23 May 2008 06:54:09 +0000 (14:54 +0800)]
KVM: Support mixed endian machines
Currently kvmtrace is not portable. This will prevent from copying a
trace file from big-endian target to little-endian workstation for analysis.
In the patch, kernel outputs metadata containing a magic number to trace
log, and changes 64-bit words to be u64 instead of a pair of u32s.
Signed-off-by: Tan Li <li.tan@intel.com>
Acked-by: Jerone Young <jyoung5@us.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Glauber Costa [Tue, 10 Jun 2008 13:46:53 +0000 (10:46 -0300)]
KVM: Do not calculate linear rip in emulation failure report
If we're not gonna do anything (case in which failure is already
reported), we do not need to even bother with calculating the linear rip.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Marcelo Tosatti [Wed, 11 Jun 2008 22:52:53 +0000 (19:52 -0300)]
KVM: only abort guest entry if timer count goes from 0->1
Only abort guest entry if the timer count went from 0->1, since for 1->2
or larger the bit will either be set already or a timer irq will have
been injected.
Using atomic_inc_and_test() for it also introduces an SMP barrier
to the LAPIC version (thought it was unecessary because of timer
migration, but guest can be scheduled to a different pCPU between exit
and kvm_vcpu_block(), so there is the possibility for a race).
Noticed by Avi.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Fri, 30 May 2008 14:05:57 +0000 (16:05 +0200)]
KVM: Add coalesced MMIO support (ia64 part)
This patch enables coalesced MMIO for ia64 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.
[akpm: fix compile error on ia64]
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Fri, 30 May 2008 14:05:56 +0000 (16:05 +0200)]
KVM: Add coalesced MMIO support (powerpc part)
This patch enables coalesced MMIO for powerpc architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Fri, 30 May 2008 14:05:55 +0000 (16:05 +0200)]
KVM: Add coalesced MMIO support (x86 part)
This patch enables coalesced MMIO for x86 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Fri, 30 May 2008 14:05:54 +0000 (16:05 +0200)]
KVM: Add coalesced MMIO support (common part)
This patch adds all needed structures to coalesce MMIOs.
Until an architecture uses it, it is not compiled.
Coalesced MMIO introduces two ioctl() to define where are the MMIO zones that
can be coalesced:
- KVM_REGISTER_COALESCED_MMIO registers a coalesced MMIO zone.
It requests one parameter (struct kvm_coalesced_mmio_zone) which defines
a memory area where MMIOs can be coalesced until the next switch to
user space. The maximum number of MMIO zones is KVM_COALESCED_MMIO_ZONE_MAX.
- KVM_UNREGISTER_COALESCED_MMIO cancels all registered zones inside
the given bounds (bounds are also given by struct kvm_coalesced_mmio_zone).
The userspace client can check kernel coalesced MMIO availability by asking
ioctl(KVM_CHECK_EXTENSION) for the KVM_CAP_COALESCED_MMIO capability.
The ioctl() call to KVM_CAP_COALESCED_MMIO will return 0 if not supported,
or the page offset where will be stored the ring buffer.
The page offset depends on the architecture.
After an ioctl(KVM_RUN), the first page of the KVM memory mapped points to
a kvm_run structure. The offset given by KVM_CAP_COALESCED_MMIO is
an offset to the coalesced MMIO ring expressed in PAGE_SIZE relatively
to the address of the start of th kvm_run structure. The MMIO ring buffer
is defined by the structure kvm_coalesced_mmio_ring.
[akio: fix oops during guest shutdown]
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Laurent Vivier [Fri, 30 May 2008 14:05:53 +0000 (16:05 +0200)]
KVM: kvm_io_device: extend in_range() to manage len and write attribute
Modify member in_range() of structure kvm_io_device to pass length and the type
of the I/O (write or read).
This modification allows to use kvm_io_device with coalesced MMIO.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 29 May 2008 11:56:28 +0000 (14:56 +0300)]
KVM: MMU: Avoid page prefetch on SVM
SVM cannot benefit from page prefetching since guest page fault bypass
cannot by made to work there. Avoid accessing the guest page table in
this case.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 29 May 2008 11:55:03 +0000 (14:55 +0300)]
KVM: MMU: Move nonpaging_prefetch_page()
In preparation for next patch. No code change.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 29 May 2008 11:38:38 +0000 (14:38 +0300)]
KVM: x86 emulator: implement 'push imm' (opcode 0x68)
Encountered in FC6 boot sequence, now that we don't force ss.rpl = 0 during
the protected mode transition. Not really necessary, but nice to have.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 29 May 2008 11:26:29 +0000 (14:26 +0300)]
KVM: x86 emulator: simplify push imm8 emulation
Instead of fetching the data explicitly, use SrcImmByte.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 29 May 2008 11:20:16 +0000 (14:20 +0300)]
KVM: MMU: Optimize prefetch_page()
Instead of reading each pte individually, read 256 bytes worth of ptes and
batch process them.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Guillaume Thouvenin [Tue, 27 May 2008 13:13:28 +0000 (15:13 +0200)]
KVM: x86 emulator: Add support for mov r, sreg (0x8c) instruction
Add support for mov r, sreg (0x8c) instruction
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Guillaume Thouvenin [Tue, 27 May 2008 12:49:15 +0000 (14:49 +0200)]
KVM: x86 emulator: Add support for mov seg, r (0x8e) instruction
Add support for mov r, sreg (0x8c) instruction.
[avi: drop the sreg decoding table in favor of 1:1 encoding]
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Guillaume Thouvenin [Tue, 27 May 2008 08:19:16 +0000 (10:19 +0200)]
KVM: x86 emulator: adds support to mov r,imm (opcode 0xb8) instruction
Add support to mov r, imm (0xb8) instruction.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Guillaume Thouvenin [Tue, 27 May 2008 08:19:08 +0000 (10:19 +0200)]
KVM: x86 emulator: add support for jmp far 0xea
Add support for jmp far (opcode 0xea) instruction.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Guillaume Thouvenin [Tue, 27 May 2008 08:22:20 +0000 (10:22 +0200)]
KVM: x86 emulator: Update c->dst.bytes in decode instruction
Update c->dst.bytes in decode instruction instead of instruction
itself. It's needed because if c->dst.bytes is equal to 0, the
instruction is not emulated.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Guillaume Thouvenin [Tue, 27 May 2008 08:18:46 +0000 (10:18 +0200)]
KVM: Prefixes segment functions that will be exported with "kvm_"
Prefixes functions that will be exported with kvm_.
We also prefixed set_segment() even if it still static
to be coherent.
signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 26 May 2008 17:06:35 +0000 (20:06 +0300)]
KVM: MTRR support
Add emulation for the memory type range registers, needed by VMware esx 3.5,
and by pci device assignment.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 27 May 2008 13:26:01 +0000 (16:26 +0300)]
KVM: Order segment register constants in the same way as cpu operand encoding
This can be used to simplify the x86 instruction decoder.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Sheng Yang [Thu, 15 May 2008 10:23:25 +0000 (18:23 +0800)]
KVM: VMX: Enable NMI with in-kernel irqchip
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Sheng Yang [Thu, 15 May 2008 01:52:48 +0000 (09:52 +0800)]
KVM: IOAPIC/LAPIC: Enable NMI support
[avi: fix ia64 build breakage]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Sun, 25 May 2008 11:38:15 +0000 (14:38 +0300)]
KVM: Remove unnecessary ->decache_regs() call
Since we aren't modifying any register, there's no need to decache
the register state.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 13 May 2008 13:29:20 +0000 (16:29 +0300)]
KVM: Remove decache_vcpus_on_cpu() and related callbacks
Obsoleted by the vmx-specific per-cpu list.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 13 May 2008 13:22:47 +0000 (16:22 +0300)]
KVM: VMX: Add list of potentially locally cached vcpus
VMX hardware can cache the contents of a vcpu's vmcs. This cache needs
to be flushed when migrating a vcpu to another cpu, or (which is the case
that interests us here) when disabling hardware virtualization on a cpu.
The current implementation of decaching iterates over the list of all vcpus,
picks the ones that are potentially cached on the cpu that is being offlined,
and flushes the cache. The problem is that it uses mutex_trylock() to gain
exclusive access to the vcpu, which fires off a (benign) warning about using
the mutex in an interrupt context.
To avoid this, and to make things generally nicer, add a new per-cpu list
of potentially cached vcus. This makes the decaching code much simpler. The
list is vmx-specific since other hardware doesn't have this issue.
[andrea: fix crash on suspend/resume]
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Tue, 13 May 2008 10:23:38 +0000 (13:23 +0300)]
KVM: Handle virtualization instruction #UD faults during reboot
KVM turns off hardware virtualization extensions during reboot, in order
to disassociate the memory used by the virtualization extensions from the
processor, and in order to have the system in a consistent state.
Unfortunately virtual machines may still be running while this goes on,
and once virtualization extensions are turned off, any virtulization
instruction will #UD on execution.
Fix by adding an exception handler to virtualization instructions; if we get
an exception during reboot, we simply spin waiting for the reset to complete.
If it's a true exception, BUG() so we can have our stack trace.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Thu, 15 May 2008 10:51:35 +0000 (13:51 +0300)]
KVM: MMU: Fix false flooding when a pte points to page table
The KVM MMU tries to detect when a speculative pte update is not actually
used by demand fault, by checking the accessed bit of the shadow pte. If
the shadow pte has not been accessed, we deem that page table flooded and
remove the shadow page table, allowing further pte updates to proceed
without emulation.
However, if the pte itself points at a page table and only used for write
operations, the accessed bit will never be set since all access will happen
through the emulator.
This is exactly what happens with kscand on old (2.4.x) HIGHMEM kernels.
The kernel points a kmap_atomic() pte at a page table, and then
proceeds with read-modify-write operations to look at the dirty and accessed
bits. We get a false flood trigger on the kmap ptes, which results in the
mmu spending all its time setting up and tearing down shadows.
Fix by setting the shadow accessed bit on emulated accesses.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi Kivity [Mon, 12 May 2008 16:25:43 +0000 (19:25 +0300)]
KVM: VMX: Trivial vmcs_write64() code simplification
Signed-off-by: Avi Kivity <avi@qumranet.com>
Chris Lalancette [Mon, 5 May 2008 17:05:16 +0000 (13:05 -0400)]
KVM: SVM: Fake MSR_K7 performance counters
Attached is a patch that fixes a guest crash when booting older Linux kernels.
The problem stems from the fact that we are currently emulating
MSR_K7_EVNTSEL[0-3], but not emulating MSR_K7_PERFCTR[0-3]. Because of this,
setup_k7_watchdog() in the Linux kernel receives a GPF when it attempts to
write into MSR_K7_PERFCTR, which causes an OOPs.
The patch fixes it by just "fake" emulating the appropriate MSRs, throwing
away the data in the process. This causes the NMI watchdog to not actually
work, but it's not such a big deal in a virtualized environment.
When we get a write to one of these counters, we printk_ratelimit() a warning.
I decided to print it out for all writes, even if the data is 0; it doesn't
seem to make sense to me to special case when data == 0.
Tested by myself on a RHEL-4 guest, and Joerg Roedel on a Windows XP 64-bit
guest.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Aurelien Jarno [Fri, 2 May 2008 15:02:23 +0000 (17:02 +0200)]
KVM: PIT: support mode 3
The in-kernel PIT emulation ignores pending timers if operating
under mode 3, which for example Hurd uses.
This mode should output a square wave, high for (N+1)/2 counts and low
for (N-1)/2 counts. As we only care about the resulting interrupts, the
period is N, and mode 3 is the same as mode 2 with regard to
interrupts.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Anthony Liguori [Wed, 30 Apr 2008 20:37:07 +0000 (15:37 -0500)]
KVM: Handle vma regions with no backing page
This patch allows VMAs that contain no backing page to be used for guest
memory. This is useful for assigning mmio regions to a guest.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Wed, 30 Apr 2008 15:56:04 +0000 (17:56 +0200)]
KVM: SVM: add tracing support for TDP page faults
To distinguish between real page faults and nested page faults they should be
traced as different events. This is implemented by this patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Wed, 30 Apr 2008 15:56:03 +0000 (17:56 +0200)]
KVM: SVM: add missing kvmtrace markers
This patch adds the missing kvmtrace markers to the svm
module of kvm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Wed, 30 Apr 2008 15:56:02 +0000 (17:56 +0200)]
KVM: add missing kvmtrace bits
This patch adds some kvmtrace bits to the generic x86 code
where it is instrumented from SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Wed, 30 Apr 2008 15:56:01 +0000 (17:56 +0200)]
KVM: SVM: implement dedicated INTR exit handler
With an exit handler for INTR intercepts its possible to account them using
kvmtrace.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Wed, 30 Apr 2008 15:56:00 +0000 (17:56 +0200)]
KVM: SVM: implement dedicated NMI exit handler
With an exit handler for NMI intercepts its possible to account them using
kvmtrace.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Joerg Roedel [Wed, 30 Apr 2008 15:55:59 +0000 (17:55 +0200)]
KVM: VMX: move APIC_ACCESS trace entry to generic code
This patch moves the trace entry for APIC accesses from the VMX code to the
generic lapic code. This way APIC accesses from SVM will also be traced.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Harvey Harrison [Sun, 27 Apr 2008 19:14:13 +0000 (12:14 -0700)]
KVM: add statics were possible, function definition in lapic.h
Noticed by sparse:
arch/x86/kvm/vmx.c:1583:6: warning: symbol 'vmx_disable_intercept_for_msr' was not declared. Should it be static?
arch/x86/kvm/x86.c:3406:5: warning: symbol 'kvm_task_switch_16' was not declared. Should it be static?
arch/x86/kvm/x86.c:3429:5: warning: symbol 'kvm_task_switch_32' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1968:6: warning: symbol 'kvm_mmu_remove_one_alloc_mmu_page' was not declared. Should it be static?
arch/x86/kvm/mmu.c:2014:6: warning: symbol 'mmu_destroy_caches' was not declared. Should it be static?
arch/x86/kvm/lapic.c:862:5: warning: symbol 'kvm_lapic_get_base' was not declared. Should it be static?
arch/x86/kvm/i8254.c:94:5: warning: symbol 'pit_get_gate' was not declared. Should it be static?
arch/x86/kvm/i8254.c:196:5: warning: symbol '__pit_timer_fn' was not declared. Should it be static?
arch/x86/kvm/i8254.c:561:6: warning: symbol '__inject_pit_timer_intr' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Christian Borntraeger [Mon, 21 Apr 2008 11:48:24 +0000 (13:48 +0200)]
KVM: remove long -> void *user -> long cast
kvm_dev_ioctl casts the arg value to void __user *, just to recast it
again to long. This seems unnecessary.
According to objdump the binary code on x86 is unchanged by this patch.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Linus Torvalds [Thu, 17 Jul 2008 17:55:51 +0000 (10:55 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
[PATCH] ocfs2: fix oops in mmap_truncate testing
configfs: call drop_link() to cleanup after create_link() failure
configfs: Allow ->make_item() and ->make_group() to return detailed errors.
configfs: Fix failing mkdir() making racing rmdir() fail
configfs: Fix deadlock with racing rmdir() and rename()
configfs: Make configfs_new_dirent() return error code instead of NULL
configfs: Protect configfs_dirent s_links list mutations
configfs: Introduce configfs_dirent_lock
ocfs2: Don't snprintf() without a format.
ocfs2: Fix CONFIG_OCFS2_DEBUG_FS #ifdefs
ocfs2/net: Silence build warnings on sparc64
ocfs2: Handle error during journal load
ocfs2: Silence an error message in ocfs2_file_aio_read()
ocfs2: use simple_read_from_buffer()
ocfs2: fix printk format warnings with OCFS2_FS_STATS=n
[PATCH 2/2] ocfs2: Instrument fs cluster locks
[PATCH 1/2] ocfs2: Add CONFIG_OCFS2_FS_STATS config option
Linus Torvalds [Thu, 17 Jul 2008 17:55:07 +0000 (10:55 -0700)]
Merge git://git./linux/kernel/git/brodo/pcmcia-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-fixes-2.6:
pcmcia: ide-cs: Remove outdated comment
pcmcia: fix cisinfo_t removal
pcmcia: fix return value in cm4000_cs.c
Linus Torvalds [Thu, 17 Jul 2008 17:38:59 +0000 (10:38 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix asm/e820.h for userspace inclusion
x86: fix numaq_tsc_disable
x86: fix kernel_physical_mapping_init() for large x86 systems
Linus Torvalds [Thu, 17 Jul 2008 17:37:10 +0000 (10:37 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ftrace: do not trace library functions
ftrace: do not trace scheduler functions
ftrace: fix lockup with MAXSMP
ftrace: fix merge buglet
Rusty Russell [Tue, 15 Jul 2008 05:02:27 +0000 (15:02 +1000)]
x86: fix asm/e820.h for userspace inclusion
asm-x86/e820.h is included from userspace. 'x86: make e820.c to have
common functions' (
b79cd8f1268bab57ff85b19d131f7f23deab2dee) broke it:
make -C Documentation/lguest
cc -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include
lguest.c -lz -o lguest
In file included from ../../include/asm-x86/bootparam.h:8,
from lguest.c:45:
../../include/asm/e820.h:66: error: expected ‘)’ before ‘start’
../../include/asm/e820.h:67: error: expected ‘)’ before ‘start’
../../include/asm/e820.h:68: error: expected ‘)’ before ‘start’
../../include/asm/e820.h:72: error: expected ‘=’, ‘,’, ‘;’, ‘asm’
or ‘__attribute__’ before ‘e820_update_range’
...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Tue, 15 Jul 2008 06:29:01 +0000 (23:29 -0700)]
x86: fix numaq_tsc_disable
fix:
arch/x86/kernel/numaq_32.c: In function ‘numaq_tsc_disable’:
arch/x86/kernel/numaq_32.c:99: warning: ‘return’ with a value, in function returning void
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 17 Jul 2008 17:24:56 +0000 (19:24 +0200)]
Merge branch 'linus' into x86/urgent
Takashi Iwai [Thu, 17 Jul 2008 16:09:12 +0000 (18:09 +0200)]
fix build error of arch/ia64/kvm/*
Fix calls of smp_call_function*() in arch/ia64/kvm for recent API
changes.
CC [M] arch/ia64/kvm/kvm-ia64.o
arch/ia64/kvm/kvm-ia64.c: In function 'handle_global_purge':
arch/ia64/kvm/kvm-ia64.c:398: error: too many arguments to function 'smp_call_function_single'
arch/ia64/kvm/kvm-ia64.c: In function 'kvm_vcpu_kick':
arch/ia64/kvm/kvm-ia64.c:1696: error: too many arguments to function 'smp_call_function_single'
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 17 Jul 2008 16:15:23 +0000 (09:15 -0700)]
Merge branch 'ptrace-cleanup' of git://git./linux/kernel/git/frob/linux-2.6-utrace
* 'ptrace-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace:
fix dangling zombie when new parent ignores children
do_wait: return security_task_wait() error code in place of -ECHILD
ptrace children revamp
do_wait reorganization
David Woodhouse [Thu, 17 Jul 2008 06:44:32 +0000 (23:44 -0700)]
Update scripts/Makefile.fwinst to cope with older make
Also fix unwanted rebuilds of the firmware/ihex2fw tool by including
the .ihex2fw.cmd file when present.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Reported-and-tested-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 17 Jul 2008 16:05:38 +0000 (09:05 -0700)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] dasd: use -EOPNOTSUPP instead of -ENOTSUPP
[S390] qdio: new qdio driver.
[S390] cio: Export chsc_error_from_response().
[S390] vmur: Fix return code handling.
[S390] Fix stacktrace compile bug.
[S390] Increase default warning stacksize.
[S390] dasd: Fix cleanup in dasd_{fba,diag}_check_characteristics().
[S390] chsc headers userspace cleanup
[S390] dasd: fix unsolicited SIM handling.
[S390] zfcpdump: Make SCSI disk dump tool recognize storage holes
Grant Likely [Thu, 17 Jul 2008 07:06:55 +0000 (01:06 -0600)]
Fix collateral damage to top level Makefile
The patch named "powerpc/mpc5121: Add clock driver", also contained
an unrelated and bogus change to the top-level makefile. This patch
backs out the bad bit.
SHA1 of offending patch:
137e95906e294913fab02162e8a1948ade49acb5)
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Repented-by: John Rigby <jrigby@freescale.com>
[ Heh. Normally I pick these out from the diffstats, but I guess
I've grown to trust the ppc tree too much ;) - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Thu, 17 Jul 2008 15:40:48 +0000 (17:40 +0200)]
ftrace: do not trace library functions
make function tracing more robust: do not trace library functions.
We've already got a sizable list of exceptions:
ifdef CONFIG_FTRACE
# Do not profile string.o, since it may be used in early boot or vdso
CFLAGS_REMOVE_string.o = -pg
# Also do not profile any debug utilities
CFLAGS_REMOVE_spinlock_debug.o = -pg
CFLAGS_REMOVE_list_debug.o = -pg
CFLAGS_REMOVE_debugobjects.o = -pg
CFLAGS_REMOVE_find_next_bit.o = -pg
CFLAGS_REMOVE_cpumask.o = -pg
CFLAGS_REMOVE_bitmap.o = -pg
endif
... and the pattern has been that random library functionality showed
up in ftrace's critical path (outside of its recursion check), causing
hard to debug lockups.
So be a bit defensive about it and exclude all lib/*.o functions by
default. It's not that they are overly interesting for tracing purposes
anyway. Specific ones can still be traced, in an opt-in manner.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Tue, 15 Apr 2008 20:39:31 +0000 (22:39 +0200)]
ftrace: do not trace scheduler functions
do not trace scheduler functions - it's still a bit fragile
and can lock up with:
http://redhat.com/~mingo/misc/config-Thu_Jul_17_13_34_52_CEST_2008
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 17 Jul 2008 15:38:17 +0000 (17:38 +0200)]
ftrace: fix lockup with MAXSMP
MAXSMP brings in lots of use of various bitops in smp_processor_id()
and friends - causing ftrace to lock up during bootup:
calling anon_inode_init+0x0/0x130
initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs
calling acpi_event_init+0x0/0x57
[ hard hang ]
So exclude the bitops facilities from tracing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stefan Haberland [Thu, 17 Jul 2008 15:16:49 +0000 (17:16 +0200)]
[S390] dasd: use -EOPNOTSUPP instead of -ENOTSUPP
return value -ENOTSUPP is not valid in userspace context, use
-EOPNOTSUPP instead
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Jan Glauber [Thu, 17 Jul 2008 15:16:48 +0000 (17:16 +0200)]
[S390] qdio: new qdio driver.
List of major changes:
- split qdio driver into several files
- seperation of thin interrupt code
- improved handling for multiple thin interrupt devices
- inbound and outbound processing now always runs in tasklet context
- significant less tasklet schedules per interrupt needed
- merged qebsm with non-qebsm handling
- cleanup qdio interface and added kerneldoc
- coding style
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Reviewed-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cornelia Huck [Thu, 17 Jul 2008 15:16:47 +0000 (17:16 +0200)]
[S390] cio: Export chsc_error_from_response().
Make chsc_error_from_response() available to chsc callers outside
of chsc.c (namely qdio) to avoid duplicating error checking code.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Frank Munzert [Thu, 17 Jul 2008 15:16:46 +0000 (17:16 +0200)]
[S390] vmur: Fix return code handling.
Use -EOPNOTSUPP instead of -ENOTSUPP.
Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Thu, 17 Jul 2008 15:16:45 +0000 (17:16 +0200)]
[S390] Fix stacktrace compile bug.
Add missing module.h include to fix this:
CC arch/s390/kernel/stacktrace.o
arch/s390/kernel/stacktrace.c:84: warning: data definition has no type or storage class
arch/s390/kernel/stacktrace.c:84: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/s390/kernel/stacktrace.c:84: warning: parameter names (without types) in function declaration
arch/s390/kernel/stacktrace.c:97: warning: data definition has no type or storage class
arch/s390/kernel/stacktrace.c:97: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/s390/kernel/stacktrace.c:97: warning: parameter names (without types) in function declaration
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Thu, 17 Jul 2008 15:16:44 +0000 (17:16 +0200)]
[S390] Increase default warning stacksize.
Compiling a kernel with allmodconfig or allyesconfig results in tons
of gcc warnings, because the default maximum stacksize from which on
gcc will emit a warning is just 256 bytes.
Increase this to 2048, so these warnings don't distract from the real
warnings that we need to watch at.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cornelia Huck [Thu, 17 Jul 2008 15:16:43 +0000 (17:16 +0200)]
[S390] dasd: Fix cleanup in dasd_{fba,diag}_check_characteristics().
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Adrian Bunk [Thu, 17 Jul 2008 15:16:42 +0000 (17:16 +0200)]
[S390] chsc headers userspace cleanup
Kernel headers shouldn't expose functions to userspace.
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Haberland [Thu, 17 Jul 2008 15:16:41 +0000 (17:16 +0200)]
[S390] dasd: fix unsolicited SIM handling.
Add missing schedule_bh and check that there is 32 bit sense data.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Frank Munzert [Thu, 17 Jul 2008 15:16:40 +0000 (17:16 +0200)]
[S390] zfcpdump: Make SCSI disk dump tool recognize storage holes
The kernel part of zfcpdump establishes a new debugfs file zcore/memmap
which exports information on memory layout (start address and length of each
memory chunk) to its userspace counterpart.
Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Ingo Molnar [Thu, 17 Jul 2008 11:26:50 +0000 (13:26 +0200)]
ftrace: fix merge buglet
-tip testing found a bootup hang here:
initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs
calling acpi_event_init+0x0/0x57
the bootup should have continued with:
initcall acpi_event_init+0x0/0x57 returned 0 after 45 msecs
but it hung hard there instead.
bisection led to this commit:
| commit
5806b81ac1c0c52665b91723fd4146a4f86e386b
| Merge: d14c8a6... 6712e29...
| Author: Ingo Molnar <mingo@elte.hu>
| Date: Mon Jul 14 16:11:52 2008 +0200
| Merge branch 'auto-ftrace-next' into tracing/for-linus
turns out that i made this mistake in the merge:
ifdef CONFIG_FTRACE
# Do not profile debug utilities
CFLAGS_REMOVE_tsc_64.o = -pg
CFLAGS_REMOVE_tsc_32.o = -pg
those two files got unified meanwhile - so the dont-profile annotation
got lost. The proper rule is:
CFLAGS_REMOVE_tsc.o = -pg
i guess this could have been caught sooner if the CFLAGS_REMOVE* kbuild
rule aborted the build if it met a target that does not exist anymore?
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Roland McGrath [Wed, 9 Apr 2008 06:12:30 +0000 (23:12 -0700)]
fix dangling zombie when new parent ignores children
This fixes an arcane bug that we think was a regression introduced
by commit
b2b2cbc4b2a2f389442549399a993a8306420baf. When a parent
ignores SIGCHLD (or uses SA_NOCLDWAIT), its children would self-reap
but they don't if it's using ptrace on them. When the parent thread
later exits and ceases to ptrace a child but leaves other live
threads in the parent's thread group, any zombie children are left
dangling. The fix makes them self-reap then, as they would have
done earlier if ptrace had not been in use.
Signed-off-by: Roland McGrath <roland@redhat.com>
Roland McGrath [Mon, 31 Mar 2008 01:41:25 +0000 (18:41 -0700)]
do_wait: return security_task_wait() error code in place of -ECHILD
This reverts the effect of commit
f2cc3eb133baa2e9dc8efd40f417106b2ee520f3
"do_wait: fix security checks". That change reverted the effect of commit
73243284463a761e04d69d22c7516b2be7de096c. The rationale for the original
commit still stands. The inconsistent treatment of children hidden by
ptrace was an unintended omission in the original change and in no way
invalidates its purpose.
This makes do_wait return the error returned by security_task_wait()
(usually -EACCES) in place of -ECHILD when there are some children the
caller would be able to wait for if not for the permission failure. A
permission error will give the user a clue to look for security policy
problems, rather than for mysterious wait bugs.
Signed-off-by: Roland McGrath <roland@redhat.com>
Roland McGrath [Tue, 25 Mar 2008 01:36:23 +0000 (18:36 -0700)]
ptrace children revamp
ptrace no longer fiddles with the children/sibling links, and the
old ptrace_children list is gone. Now ptrace, whether of one's own
children or another's via PTRACE_ATTACH, just uses the new ptraced
list instead.
There should be no user-visible difference that matters. The only
change is the order in which do_wait() sees multiple stopped
children and stopped ptrace attachees. Since wait_task_stopped()
was changed earlier so it no longer reorders the children list, we
already know this won't cause any new problems.
Signed-off-by: Roland McGrath <roland@redhat.com>
Roland McGrath [Thu, 20 Mar 2008 02:24:59 +0000 (19:24 -0700)]
do_wait reorganization
This breaks out the guts of do_wait into three subfunctions.
The control flow is less nonobvious without so much goto.
do_wait_thread and ptrace_do_wait contain the main work of the outer loop.
wait_consider_task contains the main work of the inner loop.
Signed-off-by: Roland McGrath <roland@redhat.com>
Chandra Seetharaman [Thu, 17 Jul 2008 00:35:08 +0000 (17:35 -0700)]
scsi_dh: Verify "dev" is a sdev before accessing it.
Before accessing the device data structure in hardware handlers,
make sure it is a indeed a sdev device.
Yinghai Lu <yhlu.kernel@gmail.com> found the bug on Jul 16, 2008,
and later tested/verified the following fix.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 17 Jul 2008 00:25:46 +0000 (17:25 -0700)]
Merge branch 'linux-next' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
Revert "x86/PCI: ACPI based PCI gap calculation"
PCI: remove unnecessary volatile in PCIe hotplug struct controller
x86/PCI: ACPI based PCI gap calculation
PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
PCI PM: Fix pci_prepare_to_sleep
x86/PCI: Fix PCI config space for domains > 0
Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
PCI: Simplify PCI device PM code
PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
PCI ACPI: Rework PCI handling of wake-up
ACPI: Introduce new device wakeup flag 'prepared'
ACPI: Introduce acpi_device_sleep_wake function
PCI: rework pci_set_power_state function to call platform first
PCI: Introduce platform_pci_power_manageable function
ACPI: Introduce acpi_bus_power_manageable function
PCI: make pci_name use dev_name
PCI: handle pci_name() being const
PCI: add stub for pci_set_consistent_dma_mask()
PCI: remove unused arch pcibios_update_resource() functions
PCI: fix pci_setup_device()'s sprinting into a const buffer
...
Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
Jesse Barnes [Wed, 16 Jul 2008 23:21:47 +0000 (16:21 -0700)]
Revert "x86/PCI: ACPI based PCI gap calculation"
This reverts commit
809d9a8f93bd8504dcc34b16bbfdfd1a8c9bb1ed.
This one isn't quite ready for prime time. It needs more testing and
additional feedback from the ACPI guys.
Coly Li [Mon, 30 Jun 2008 10:45:45 +0000 (18:45 +0800)]
[PATCH] ocfs2: fix oops in mmap_truncate testing
This patch fixes a mmap_truncate bug which was found by ocfs2 test suite.
In an ocfs2 cluster more than 1 node, run program mmap_truncate, which races
mmap writes and truncates from multiple processes. While the test is
running, a stat from another node forces writeout, causing an oops in
ocfs2_get_block() because it sees a buffer to write which isn't allocated.
This patch fixed the bug by clear dirty and uptodate bits in buffer, leave
the buffer unmapped and return.
Fix is suggested by Mark Fasheh, and I code up the patch.
Signed-off-by: Coly Li <coyli@suse.de>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Linus Torvalds [Wed, 16 Jul 2008 22:11:07 +0000 (15:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: (68 commits)
sdio_uart: Fix SDIO break control to now return success or an error
mmc: host driver for Ricoh Bay1Controllers
sdio: sdio_io.c Fix sparse warnings
sdio: fix the use of hard coded timeout value.
mmc: OLPC: update vdd/powerup quirk comment
mmc: fix spares errors of sdhci.c
mmc: remove multiwrite capability
wbsd: fix bad dma_addr_t conversion
atmel-mci: Driver for Atmel on-chip MMC controllers
mmc: fix sdio_io sparse errors
mmc: wbsd.c fix shadowing of 'dma' variable
MMC: S3C24XX: Refuse incorrectly aligned transfers
MMC: S3C24XX: Add maintainer entry
MMC: S3C24XX: Update error debugging.
MMC: S3C24XX: Add media presence test to request handling.
MMC: S3C24XX: Fix use of msecs where jiffies are needed
MMC: S3C24XX: Add MODULE_ALIAS() entries for the platform devices
MMC: S3C24XX: Fix s3c2410_dma_request() return code check.
MMC: S3C24XX: Allow card-detect on non-IRQ capable pin
MMC: S3C24XX: Ensure host->mrq->data is valid
...
Manually fixed up bogus executable bits on drivers/mmc/core/sdio_io.c
and include/linux/mmc/sdio_func.h when merging.
Linus Torvalds [Wed, 16 Jul 2008 22:02:57 +0000 (15:02 -0700)]
Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6
* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6:
UBIFS: include to compilation
UBIFS: add new flash file system
UBIFS: add brief documentation
MAINTAINERS: add UBIFS section
do_mounts: allow UBI root device name
VFS: export sync_sb_inodes
VFS: move inode_lock into sync_sb_inodes
Linus Torvalds [Wed, 16 Jul 2008 21:53:54 +0000 (14:53 -0700)]
Merge git://git./linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (76 commits)
IDE: Report errors during drive reset back to user space
Update documentation of HDIO_DRIVE_RESET ioctl
IDE: Remove unused code
IDE: Fix HDIO_DRIVE_RESET handling
hd.c: remove the #include <linux/mc146818rtc.h>
update the BLK_DEV_HD help text
move ide/legacy/hd.c to drivers/block/
ide/legacy/hd.c: use late_initcall()
remove BLK_DEV_HD_ONLY
ide: endian annotations in ide-floppy.c
ide-floppy: zero out the whole struct ide_atapi_pc on init
ide-floppy: fold idefloppy_create_test_unit_ready_cmd into idefloppy_open
ide-cd: move request prep chunk from cdrom_do_newpc_cont to rq issue path
ide-cd: move request prep from cdrom_start_rw_cont to rq issue path
ide-cd: move request prep from cdrom_start_seek_continuation to rq issue path
ide-cd: fold cdrom_start_seek into ide_cd_do_request
ide-cd: simplify request issuing path
ide-cd: mv ide_do_rw_cdrom ide_cd_do_request
ide-cd: cdrom_start_seek: remove unused argument block
ide-cd: ide_do_rw_cdrom: add the catch-all bad request case to the if-else block
...
Linus Torvalds [Wed, 16 Jul 2008 21:52:12 +0000 (14:52 -0700)]
Merge branch 'release-2.6.27' of git://git./linux/kernel/git/ak/linux-acpi-merge-2.6
* 'release-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-acpi-merge-2.6: (87 commits)
Fix FADT parsing
Add the ability to reset the machine using the RESET_REG in ACPI's FADT table.
ACPI: use dev_printk when possible
PNPACPI: add support for HP vendor-specific CCSR descriptors
PNP: avoid legacy IDE IRQs
PNP: convert resource options to single linked list
ISAPNP: handle independent options following dependent ones
PNP: remove extra 0x100 bit from option priority
PNP: support optional IRQ resources
PNP: rename pnp_register_*_resource() local variables
PNPACPI: ignore _PRS interrupt numbers larger than PNP_IRQ_NR
PNP: centralize resource option allocations
PNP: remove redundant pnp_can_configure() check
PNP: make resource assignment functions return 0 (success) or -EBUSY (failure)
PNP: in debug resource dump, make empty list obvious
PNP: improve resource assignment debug
PNP: increase I/O port & memory option address sizes
PNP: introduce pnp_irq_mask_t typedef
PNP: make resource option structures private to PNP subsystem
PNP: define PNP-specific IORESOURCE_IO_* flags alongside IRQ, DMA, MEM
...
Martin K. Petersen [Wed, 16 Jul 2008 20:09:06 +0000 (16:09 -0400)]
block: Trivial fix for blk_integrity_rq()
Fail integrity check gracefully when request does not have a bio
attached (BLOCK_PC).
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 16 Jul 2008 21:49:49 +0000 (14:49 -0700)]
Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (82 commits)
NFSv4: Remove BKL from the nfsv4 state recovery
SUNRPC: Remove the BKL from the callback functions
NFS: Remove BKL from the readdir code
NFS: Remove BKL from the symlink code
NFS: Remove BKL from the sillydelete operations
NFS: Remove the BKL from the rename, rmdir and unlink operations
NFS: Remove BKL from NFS lookup code
NFS: Remove the BKL from nfs_link()
NFS: Remove the BKL from the inode creation operations
NFS: Remove BKL usage from open()
NFS: Remove BKL usage from the write path
NFS: Remove the BKL from the permission checking code
NFS: Remove attribute update related BKL references
NFS: Remove BKL requirement from attribute updates
NFS: Protect inode->i_nlink updates using inode->i_lock
nfs: set correct fl_len in nlmclnt_test()
SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon
SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier
SUNRPC: None of rpcb_create's callers wants a privileged source port
SUNRPC: Introduce a specific rpcb_create for contacting localhost
...
Jan Beulich [Wed, 16 Jul 2008 21:27:08 +0000 (23:27 +0200)]
Fix FADT parsing
The (1.0 inherited) separate length fields in the FADT are byte granular.
Further, PM1a/b may have distinct lengths and live in distinct address spaces.
acpi_tb_convert_fadt() should account for all of these conditions.
Apart from these changes I'm puzzled by the fact that, not just for
acpi_gbl_xpm1{a,b}_enable, acpi_hw_low_level_{read,write}() get an explicit
size passed rather than using the size found in the passed GAS. What happens
on a platform that defines PM1{a,b} wider than 16 bits? Of course,
acpi_hw_low_level_{read,write}() at present are entirely un-prepared to deal
with sizes other than 8, 16, or 32, not to speak of a non-zero bit_offset or
access_width...
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Aaron Durbin [Wed, 16 Jul 2008 21:27:08 +0000 (23:27 +0200)]
Add the ability to reset the machine using the RESET_REG in ACPI's FADT table.
Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bjorn Helgaas [Fri, 27 Jun 2008 14:45:39 +0000 (08:45 -0600)]
ACPI: use dev_printk when possible
Convert printks to use dev_printk(). The most obvious change will
be messages like this:
-ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 31 (level, low) -> IRQ 31
+cciss 0000:00:04.0: PCI INT A -> GSI 31 (level, low) -> IRQ 31
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:19 +0000 (16:57 -0600)]
PNPACPI: add support for HP vendor-specific CCSR descriptors
The HP CCSR descriptor describes MMIO address space that should appear
as a MEM resource. This patch adds support for parsing these descriptors
in the _CRS data.
The visible effect of this is that these MEM resources will appear
in /sys/devices/pnp0/.../resources, which means that "lspnp -v" will
report it, user applications can use this to locate device CSR space,
and kernel drivers can use the normal PNP resource accessors to
locate them.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:18 +0000 (16:57 -0600)]
PNP: avoid legacy IDE IRQs
If an IDE controller is in compatibility mode, it expects to use
IRQs 14 and 15, so PNP should avoid them.
This patch should resolve this problem report:
parallel driver grabs IRQ14 preventing legacy SFF ATA controller from working
https://bugzilla.novell.com/show_bug.cgi?id=375836
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:17 +0000 (16:57 -0600)]
PNP: convert resource options to single linked list
ISAPNP, PNPBIOS, and ACPI describe the "possible resource settings" of
a device, i.e., the possibilities an OS bus driver has when it assigns
I/O port, MMIO, and other resources to the device.
PNP used to maintain this "possible resource setting" information in
one independent option structure and a list of dependent option
structures for each device. Each of these option structures had lists
of I/O, memory, IRQ, and DMA resources, for example:
dev
independent options
ind-io0 -> ind-io1 ...
ind-mem0 -> ind-mem1 ...
...
dependent option set 0
dep0-io0 -> dep0-io1 ...
dep0-mem0 -> dep0-mem1 ...
...
dependent option set 1
dep1-io0 -> dep1-io1 ...
dep1-mem0 -> dep1-mem1 ...
...
...
This data structure was designed for ISAPNP, where the OS configures
device resource settings by writing directly to configuration
registers. The OS can write the registers in arbitrary order much
like it writes PCI BARs.
However, for PNPBIOS and ACPI devices, the OS uses firmware interfaces
that perform device configuration, and it is important to pass the
desired settings to those interfaces in the correct order. The OS
learns the correct order by using firmware interfaces that return the
"current resource settings" and "possible resource settings," but the
option structures above doesn't store the ordering information.
This patch replaces the independent and dependent lists with a single
list of options. For example, a device might have possible resource
settings like this:
dev
options
ind-io0 -> dep0-io0 -> dep1->io0 -> ind-io1 ...
All the possible settings are in the same list, in the order they
come from the firmware "possible resource settings" list. Each entry
is tagged with an independent/dependent flag. Dependent entries also
have a "set number" and an optional priority value. All dependent
entries must be assigned from the same set. For example, the OS can
use all the entries from dependent set 0, or all the entries from
dependent set 1, but it cannot mix entries from set 0 with entries
from set 1.
Prior to this patch PNP didn't keep track of the order of this list,
and it assigned all independent options first, then all dependent
ones. Using the example above, that resulted in a "desired
configuration" list like this:
ind->io0 -> ind->io1 -> depN-io0 ...
instead of the list the firmware expects, which looks like this:
ind->io0 -> depN-io0 -> ind-io1 ...
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:16 +0000 (16:57 -0600)]
ISAPNP: handle independent options following dependent ones
The ISAPNP spec recommends that independent options precede
dependent ones, but this is not actually required. The current
ISAPNP code incorrectly puts such trailing independent options
at the end of the last dependent option list.
This patch fixes that bug by resetting the current option list
to the independent list when we see an "End Dependent Functions"
tag. PNPBIOS and PNPACPI handle this the same way.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:15 +0000 (16:57 -0600)]
PNP: remove extra 0x100 bit from option priority
When building resource options, ISAPNP and PNPBIOS set the priority
to something like "0x100 | PNP_RES_PRIORITY_ACCEPTABLE", but we
immediately mask off the 0x100 again in pnp_build_option(), so that
bit looks superfluous.
Thanks to Rene Herman <rene.herman@gmail.com> for pointing this out.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:14 +0000 (16:57 -0600)]
PNP: support optional IRQ resources
This patch adds an IORESOURCE_IRQ_OPTIONAL flag for use when
assigning resources to a device. If the flag is set and we are
unable to assign an IRQ to the device, we can leave the IRQ
disabled but allow the overall resource allocation to succeed.
Some devices request an IRQ, but can run without an IRQ
(possibly with degraded performance). This flag lets us run
the device without the IRQ instead of just leaving the
device disabled.
This is a reimplementation of this previous change by Rene
Herman <rene.herman@gmail.com>:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=
3b73a223661ed137c5d3d2635f954382e94f5a43
I reimplemented this for two reasons:
- to prepare for converting all resource options into a single linked
list, as opposed to the per-resource-type lists we have now, and
- to preserve the order and number of resource options.
In PNPBIOS and ACPI, we configure a device by giving firmware a
list of resource assignments. It is important that this list
has exactly the same number of resources, in the same order,
as the "template" list we got from the firmware in the first
place.
The problem of a sound card MPU401 being left disabled for want of
an IRQ was reported by Uwe Bugla <uwe.bugla@gmx.de>.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:13 +0000 (16:57 -0600)]
PNP: rename pnp_register_*_resource() local variables
No functional change; just rename "data" to something more
descriptive.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:12 +0000 (16:57 -0600)]
PNPACPI: ignore _PRS interrupt numbers larger than PNP_IRQ_NR
ACPI Extended Interrupt Descriptors can encode 32-bit interrupt
numbers, so an interrupt number may exceed the size of the bitmap
we use to track possible IRQ settings.
To avoid corrupting memory, complain and ignore too-large interrupt
numbers.
There's similar code in pnpacpi_parse_irq_option(), but I didn't
change that because the small IRQ descriptor can only encode
IRQs 0-15, which do not exceed bitmap size.
In the future, we could handle IRQ numbers greater than PNP_IRQ_NR
by replacing the bitmap with a table or list.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:11 +0000 (16:57 -0600)]
PNP: centralize resource option allocations
This patch moves all the option allocations (pnp_mem, pnp_port, etc)
into the pnp_register_{mem,port,irq,dma}_resource() functions. This
will make it easier to rework the option data structures.
The non-trivial part of this patch is the IRQ handling. The backends
have to allocate a local pnp_irq_mask_t bitmap, populate it, and pass
a pointer to pnp_register_irq_resource().
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:10 +0000 (16:57 -0600)]
PNP: remove redundant pnp_can_configure() check
pnp_assign_resources() is static and the only caller checks
pnp_can_configure() before calling it, so no need to do it
again.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:09 +0000 (16:57 -0600)]
PNP: make resource assignment functions return 0 (success) or -EBUSY (failure)
This patch doesn't change any behavior; it just makes the return
values more conventional.
This changes pnp_assign_dma() from a void function to one that
returns an int, just like the other assignment functions. For
now, at least, pnp_assign_dma() always returns 0 (success), so
it appears to never fail, just like before.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:08 +0000 (16:57 -0600)]
PNP: in debug resource dump, make empty list obvious
If the resource list is empty, say that explicitly. Previously,
it was confusing because often the heading was followed by zero
resource lines, then some "add resource" lines from auto-assignment,
so the "add" lines looked like current resources.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Bjorn Helgaas [Fri, 27 Jun 2008 22:57:07 +0000 (16:57 -0600)]
PNP: improve resource assignment debug
When we fail to assign an I/O or MEM resource, include the min/max
in the debug output to help match it with the options.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>