Yinghai Lu [Wed, 10 Feb 2010 09:20:30 +0000 (01:20 -0800)]
x86: Add find_fw_memmap_area
... so we can move early_res up.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-27-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:29 +0000 (01:20 -0800)]
Move round_up/down to kernel.h
... in preparation of moving early_res to kernel/.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-26-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:28 +0000 (01:20 -0800)]
x86: Make 32bit support NO_BOOTMEM
Let's make 32bit consistent with 64bit.
-v2: Andrew pointed out for 32bit that we should use -1ULL
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-25-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:27 +0000 (01:20 -0800)]
early_res: Enhance check_and_double_early_res
... to make it always try to start from low at first.
This makes it less likely for early_memtest to reserve a bad range, in
particular it puts new early_res in a range that is already tested.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-24-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:26 +0000 (01:20 -0800)]
x86: Move back find_e820_area to e820.c
Makes early_res.c more clean, so later could move it to /kernel.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-23-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:25 +0000 (01:20 -0800)]
x86: Add find_early_area_size
Prepare to move bck find_e820_area_size back to e820.c.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-22-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:24 +0000 (01:20 -0800)]
x86: Separate early_res related code from e820.c
... to make e820.c smaller.
-v2: fix 32bit compiling with MAX_DMA32_PFN
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-21-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:23 +0000 (01:20 -0800)]
x86: Move bios page reserve early to head32/64.c
So prepare to make one more clean of early_res.c.
-v2: don't need to reserve first page in early_res
because we already mark that in e820 as reserved already.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-20-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:22 +0000 (01:20 -0800)]
Yinghai Lu [Wed, 10 Feb 2010 09:20:21 +0000 (01:20 -0800)]
sparsemem: Put usemap for one node together
Could save some buffer space instead of applying one by one.
Could help that system that is going to use early_res instead of bootmem
less entries in early_res make search more faster on system with more memory.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-18-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:20 +0000 (01:20 -0800)]
x86: Make 64 bit use early_res instead of bootmem before slab
Finally we can use early_res to replace bootmem for x86_64 now.
Still can use CONFIG_NO_BOOTMEM to enable it or not.
-v2: fix 32bit compiling about MAX_DMA32_PFN
-v3: folded bug fix from LKML message below
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
4B747239.4070907@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:19 +0000 (01:20 -0800)]
x86: Only call dma32_reserve_bootmem 64bit !CONFIG_NUMA
64bit NUMA already make enough space under 4G with new early_node_mem.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-16-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:18 +0000 (01:20 -0800)]
x86: Make early_node_mem get mem > 4 GB if possible
So we could put pgdata for the node high, and later sparse
vmmap will get the section nr that need.
With this patch will make <4 GB ram not use a sparse vmmap.
before this patch, will get, before swiotlb try get bootmem
[ 0.000000] nid=1 start=0 end=2080000 aligned=1
[ 0.000000] free [10 - 96]
[ 0.000000] free [b12 - 1000]
[ 0.000000] free [359f - 38a3]
[ 0.000000] free [38b5 - 3a00]
[ 0.000000] free [41e01 - 42000]
[ 0.000000] free [73dde - 73e00]
[ 0.000000] free [73fdd - 74000]
[ 0.000000] free [741dd - 74200]
[ 0.000000] free [743dd - 74400]
[ 0.000000] free [745dd - 74600]
[ 0.000000] free [747dd - 74800]
[ 0.000000] free [749dd - 74a00]
[ 0.000000] free [74bdd - 74c00]
[ 0.000000] free [74ddd - 74e00]
[ 0.000000] free [74fdd - 75000]
[ 0.000000] free [751dd - 75200]
[ 0.000000] free [753dd - 75400]
[ 0.000000] free [755dd - 75600]
[ 0.000000] free [757dd - 75800]
[ 0.000000] free [759dd - 75a00]
[ 0.000000] free [75bdd - 7bf5f]
[ 0.000000] free [7f730 - 7f750]
[ 0.000000] free [100000 - 2080000]
[ 0.000000] total free 1f87170
[ 93.301474] Placing 64MB software IO TLB between
ffff880075bdd000 -
ffff880079bdd000
[ 93.311814] software IO TLB at phys 0x75bdd000 - 0x79bdd000
with this patch will get: before swiotlb try get bootmem
[ 0.000000] nid=1 start=0 end=2080000 aligned=1
[ 0.000000] free [a - 96]
[ 0.000000] free [702 - 1000]
[ 0.000000] free [359f - 3600]
[ 0.000000] free [37de - 3800]
[ 0.000000] free [39dd - 3a00]
[ 0.000000] free [3bdd - 3c00]
[ 0.000000] free [3ddd - 3e00]
[ 0.000000] free [3fdd - 4000]
[ 0.000000] free [41dd - 4200]
[ 0.000000] free [43dd - 4400]
[ 0.000000] free [45dd - 4600]
[ 0.000000] free [47dd - 4800]
[ 0.000000] free [49dd - 4a00]
[ 0.000000] free [4bdd - 4c00]
[ 0.000000] free [4ddd - 4e00]
[ 0.000000] free [4fdd - 5000]
[ 0.000000] free [51dd - 5200]
[ 0.000000] free [53dd - 5400]
[ 0.000000] free [55dd - 7bf5f]
[ 0.000000] free [7f730 - 7f750]
[ 0.000000] free [100428 - 100600]
[ 0.000000] free [13ea01 - 13ec00]
[ 0.000000] free [170800 - 2080000]
[ 0.000000] total free 1f87170
[ 92.689485] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 92.699799] Placing 64MB software IO TLB between
ffff8800055dd000 -
ffff8800095dd000
[ 92.710916] software IO TLB at phys 0x55dd000 - 0x95dd000
so will get enough space below 4G, aka pfn 0x100000
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-15-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:17 +0000 (01:20 -0800)]
x86: Dynamically increase early_res array size
Use early_res_count to track the num, and use find_e820 to get a new
buffer, then copy from the old to the new one.
Also, clear early_res to prevent later invalid usage.
-v2 _check_and_double_early_res should take new start
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-14-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:16 +0000 (01:20 -0800)]
x86: Introduce max_early_res and early_res_count
To prepare allocate early res array from fine_e820_area.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-13-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:15 +0000 (01:20 -0800)]
x86: Call early_res_to_bootmem one time
Simplify setup_node_mem: don't use bootmem from other node, instead
just find_e820_area in early_node_mem.
This keeps the boundary between early_res and boot mem more clear, and
lets us only call early_res_to_bootmem() one time instead of for all
nodes.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-12-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:14 +0000 (01:20 -0800)]
x86: Print out RAM buffer information
So we can check that early in the bootlog.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-11-git-send-email-yinghai@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:13 +0000 (01:20 -0800)]
x86: Change range end to start+size
So make interface more consistent with early_res.
Later we can share some code with early_res.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-10-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:12 +0000 (01:20 -0800)]
x86/pci: Enable pci root res read out for 32bit too
Should be good for 32bit too.
-v3: cast res->start
-v4: according to Linus, to use %pR instead of cast
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-9-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:11 +0000 (01:20 -0800)]
x86/pci: Add cap_resource()
Prepare for 32bit pci root bus
-v2: hpa said we should compare with (resource_size_t)~0
-v3: according to Linus to use MAX_RESOURCE instead.
also need need to put related patches together
-v4: according to Andrew, use min in cap_resource()
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-8-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:10 +0000 (01:20 -0800)]
x86/pci: Use u64 instead of size_t in amd_bus.c
Prepare to enable it for 32bit.
-v2: remove not needed cast
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-7-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:09 +0000 (01:20 -0800)]
x86/pci: AMD one chain system to use pci read out res
Found MSI amd k8 based laptops is hiding [0x70000000, 0x80000000) RAM
from e820.
enable amd one chain even for all.
-v2: use bool for found, according to Andrew
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-6-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:08 +0000 (01:20 -0800)]
x86/pci: Use resource_size_t in update_res
Prepare to enable 32bit intel and amd bus.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-5-git-send-email-yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 10 Feb 2010 09:20:07 +0000 (01:20 -0800)]
x86: Move range related operation to one file
We have almost the same code for mtrr cleanup and amd_bus checkup, and
this code will also be used in replacing bootmem with early_res,
so try to move them together and reuse it from different parts.
Also rename update_range to subtract_range as that is what the
function is actually doing.
-v2: update comments as Christoph requested
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <
1265793639-15071-4-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
H. Peter Anvin [Thu, 11 Feb 2010 01:45:09 +0000 (17:45 -0800)]
ibmphp: Rename add_range() to add_bus_range() to avoid conflict
Rename add_range() to add_bus_range() to avoid conflict with the
naming of the generic range manipulation functions.
LKML-Reference: <
1265793639-15071-4-git-send-email-yinghai@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
H. Peter Anvin [Thu, 11 Feb 2010 00:55:28 +0000 (16:55 -0800)]
Merge remote branch 'linus/master' into x86/bootmem
Linus Torvalds [Wed, 10 Feb 2010 15:34:46 +0000 (07:34 -0800)]
Merge branch 'i2c-for-linus' of git://git./linux/kernel/git/jdelvare/staging
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c-tiny-usb: Fix on big-endian systems
Linus Torvalds [Wed, 10 Feb 2010 15:19:07 +0000 (07:19 -0800)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] Fix struct _lowcore layout.
[S390] qdio: prevent call trace if CHPID is offline
[S390] qdio: continue polling for buffer state ERROR
Linus Torvalds [Wed, 10 Feb 2010 15:18:15 +0000 (07:18 -0800)]
Merge branch 'kvm-updates/2.6.33' of git://git./virt/kvm/kvm
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PIT: control word is write-only
kvmclock: count total_sleep_time when updating guest clock
Export the symbol of getboottime and mmonotonic_to_bootbased
Linus Torvalds [Wed, 10 Feb 2010 15:17:54 +0000 (07:17 -0800)]
Merge git://git./linux/kernel/git/hskinnemoen/avr32-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
avr32: clean up memory allocation in at32_add_device_mci
arch/avr32: Fix build failure for avr32 caused by typo
Linus Torvalds [Wed, 10 Feb 2010 15:16:58 +0000 (07:16 -0800)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix address masking bug in hpte_need_flush()
Linus Torvalds [Wed, 10 Feb 2010 15:16:44 +0000 (07:16 -0800)]
Merge git://git./linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: fix dentry hash calculation for case-insensitive mounts
[CIFS] Don't cache timestamps on utimes due to coarse granularity
[CIFS] Maximum username length check in session setup does not match
cifs: fix length calculation for converted unicode readdir names
[CIFS] Add support for TCP_NODELAY
Linus Torvalds [Wed, 10 Feb 2010 15:15:21 +0000 (07:15 -0800)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
drivers/net: Correct NULL test
MAINTAINERS: networking drivers - Add git net-next tree
net/sched: Fix module name in Kconfig
cxgb3: fix GRO checksum check
dst: call cond_resched() in dst_gc_task()
netfilter: nf_conntrack: fix hash resizing with namespaces
netfilter: xtables: compat out of scope fix
netfilter: nf_conntrack: restrict runtime expect hashsize modifications
netfilter: nf_conntrack: per netns nf_conntrack_cachep
netfilter: nf_conntrack: fix memory corruption with multiple namespaces
Bluetooth: Keep a copy of each HID device's report descriptor
pktgen: Fix freezing problem
igb: make certain to reassign legacy interrupt vectors after reset
irda: add missing BKL in irnet_ppp ioctl
irda: unbalanced lock_kernel in irnet_ppp
ixgbe: Fix return of invalid txq
ixgbe: Fix ixgbe_tx_map error path
netxen: protect resource cleanup by rtnl lock
netxen: fix tx timeout recovery for NX2031 chip
Bluetooth: Enter active mode before establishing a SCO link.
...
David Gibson [Mon, 8 Feb 2010 20:09:03 +0000 (20:09 +0000)]
powerpc: Fix address masking bug in hpte_need_flush()
Commit
f71dc176aa06359681c30ba6877ffccab6fba3a6 'Make
hpte_need_flush() correctly mask for multiple page sizes' introduced
bug, which is triggered when a kernel with a 64k base page size is run
on a system whose hardware does not 64k hash PTEs. In this case, we
emulate 64k pages with multiple 4k hash PTEs, however in
hpte_need_flush() we incorrectly only mask the hardware page size from
the address, instead of the logical page size. This causes things to
go wrong when we later attempt to iterate through the hardware
subpages of the logical page.
This patch corrects the error. It has been tested on pSeries bare
metal by Michael Neuling.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Wed, 10 Feb 2010 01:01:26 +0000 (17:01 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: fix some lockdep issues between md and sysfs.
md: fix 'degraded' calculation when starting a reshape.
NeilBrown [Tue, 9 Feb 2010 05:34:14 +0000 (16:34 +1100)]
md: fix some lockdep issues between md and sysfs.
======
This fix is related to
http://bugzilla.kernel.org/show_bug.cgi?id=15142
but does not address that exact issue.
======
sysfs does like attributes being removed while they are being accessed
(i.e. read or written) and waits for the access to complete.
As accessing some md attributes takes the same lock that is held while
removing those attributes a deadlock can occur.
This patch addresses 3 issues in md that could lead to this deadlock.
Two relate to calling flush_scheduled_work while the lock is held.
This is probably a bad idea in general and as we use schedule_work to
delete various sysfs objects it is particularly bad.
In one case flush_scheduled_work is called from md_alloc (called by
md_probe) called from do_md_run which holds the lock. This call is
only present to ensure that ->gendisk is set. However we can be sure
that gendisk is always set (though possibly we couldn't when that code
was originally written. This is because do_md_run is called in three
different contexts:
1/ from md_ioctl. This requires that md_open has succeeded, and it
fails if ->gendisk is not set.
2/ from writing a sysfs attribute. This can only happen if the
mddev has been registered in sysfs which happens in md_alloc
after ->gendisk has been set.
3/ from autorun_array which is only called by autorun_devices, which
checks for ->gendisk to be set before calling autorun_array.
So the call to md_probe in do_md_run can be removed, and the check on
->gendisk can also go.
In the other case flush_scheduled_work is being called in do_md_stop,
purportedly to wait for all md_delayed_delete calls (which delete the
component rdevs) to complete. However there really isn't any need to
wait for them - they have already been disconnected in all important
ways.
The third issue is that raid5->stop() removes some attribute names
while the lock is held. There is already some infrastructure in place
to delay attribute removal until after the lock is released (using
schedule_work). So extend that infrastructure to remove the
raid5_attrs_group.
This does not address all lockdep issues related to the sysfs
"s_active" lock. The rest can be address by splitting that lockdep
context between symlinks and non-symlinks which hopefully will happen.
Signed-off-by: NeilBrown <neilb@suse.de>
Linus Torvalds [Tue, 9 Feb 2010 19:19:06 +0000 (11:19 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix p9_client_destroy unconditional calling v9fs_put_trans
9p: fix memory leak in v9fs_parse_options()
9p: Fix the kernel crash on a failed mount
9p: fix option parsing
9p: Include fsync support for 9p client
net/9p: fix statsize inside twstat
net/9p: fail when user specifies a transport which we can't find
net/9p: fix virtio transport to correctly update status on connect
Marcelo Tosatti [Fri, 29 Jan 2010 19:28:41 +0000 (17:28 -0200)]
KVM: PIT: control word is write-only
PIT control word (address 0x43) is write-only, reads are undefined.
Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jason Wang [Wed, 27 Jan 2010 11:13:49 +0000 (19:13 +0800)]
kvmclock: count total_sleep_time when updating guest clock
Current kvm wallclock does not consider the total_sleep_time which could cause
wrong wallclock in guest after host suspend/resume. This patch solve
this issue by counting total_sleep_time to get the correct host boot time.
Cc: stable@kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Jason Wang [Wed, 27 Jan 2010 11:13:40 +0000 (19:13 +0800)]
Export the symbol of getboottime and mmonotonic_to_bootbased
Export getboottime and monotonic_to_bootbased in order to let them
could be used by following patch.
Cc: stable@kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Heiko Carstens [Tue, 9 Feb 2010 08:46:09 +0000 (09:46 +0100)]
[S390] Fix struct _lowcore layout.
Offsets and sizes are wrong for 32 bit.
Got broken with
866ba284 "[S390] cleanup lowcore.h".
Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Jan Glauber [Tue, 9 Feb 2010 08:46:08 +0000 (09:46 +0100)]
[S390] qdio: prevent call trace if CHPID is offline
If a CHPID is offline during a device shutdown the ccw_device_halt|clear
may fail and the qdio device stays in state STOPPED until the shutdown is
finished. If an interrupt occurs before the device is set to INACTIVE
the STOPPED state triggers a WARN_ON in the interrupt handler.
Prevent this WARN_ON by catching the STOPPED state in the interrupt
handler.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Ursula Braun [Tue, 9 Feb 2010 08:46:07 +0000 (09:46 +0100)]
[S390] qdio: continue polling for buffer state ERROR
Inbound traffic handling may hang if next buffer to check is in
state ERROR, polling is stopped and the final check for further
available inbound buffers disregards buffers in state ERROR.
This patch includes state ERROR when checking availability of
more inbound buffers.
Cc: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
David S. Miller [Tue, 9 Feb 2010 06:45:56 +0000 (22:45 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/holtmann/bluetooth-2.6
Julia Lawall [Tue, 9 Feb 2010 06:44:18 +0000 (22:44 -0800)]
drivers/net: Correct NULL test
Test the value that was just allocated rather than the previously tested one.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r@
expression *x;
expression e;
identifier l;
@@
if (x == NULL || ...) {
... when forall
return ...; }
... when != goto l;
when != x = e
when != &x
*x == NULL
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 9 Feb 2010 06:42:40 +0000 (22:42 -0800)]
MAINTAINERS: networking drivers - Add git net-next tree
During the rc period, patches that are not bugfixes
should be done using the net-next tree.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Luebbe [Tue, 9 Feb 2010 06:41:44 +0000 (22:41 -0800)]
net/sched: Fix module name in Kconfig
The action modules have been prefixed with 'act_', but the Kconfig
description was not changed.
Signed-off-by: Jan Luebbe <jluebbe@debian.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Divy Le Ray [Tue, 9 Feb 2010 06:37:24 +0000 (22:37 -0800)]
cxgb3: fix GRO checksum check
Verify the HW checksum state for frames handed to GRO processing.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NeilBrown [Tue, 9 Feb 2010 01:31:47 +0000 (12:31 +1100)]
md: fix 'degraded' calculation when starting a reshape.
This code was written long ago when it was not possible to
reshape a degraded array. Now it is so the current level of
degraded-ness needs to be taken in to account. Also newly addded
devices should only reduce degradedness if they are deemed to be
in-sync.
In particular, if you convert a RAID5 to a RAID6, and increase the
number of devices at the same time, then the 5->6 conversion will
make the array degraded so the current code will produce a wrong
value for 'degraded' - "-1" to be precise.
If the reshape runs to completion end_reshape will calculate a correct
new value for 'degraded', but if a device fails during the reshape an
incorrect decision might be made based on the incorrect value of
"degraded".
This patch is suitable for 2.6.32-stable and if they are still open,
2.6.31-stable and 2.6.30-stable as well.
Cc: stable@kernel.org
Reported-by: Michael Evans <mjevans1983@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Linus Torvalds [Tue, 9 Feb 2010 01:08:01 +0000 (17:08 -0800)]
Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux:
Revert "nfsd4: fix error return when pseudoroot missing"
Eric Van Hensbergen [Tue, 9 Feb 2010 00:18:34 +0000 (18:18 -0600)]
9p: fix p9_client_destroy unconditional calling v9fs_put_trans
restructure client create code to handle error cases better and
only cleanup initialized portions of the stack.
Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Linus Torvalds [Tue, 9 Feb 2010 00:05:50 +0000 (16:05 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/jlbec/ocfs2
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2/cluster: Make o2net connect messages KERN_NOTICE
ocfs2/dlm: Fix printing of lockname
ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map()
ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node
ocfs2: Plugs race between the dc thread and an unlock ast message
ocfs2: Remove overzealous BUG_ON during blocked lock processing
ocfs2: Do not downconvert if the lock level is already compatible
ocfs2: Prevent a livelock in dlmglue
ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast
ocfs2: Use compat_ptr in reflink_arguments.
ocfs2/dlm: Handle EAGAIN for compatibility - v2
ocfs2: Add parenthesis to wrap the check for O_DIRECT.
ocfs2: Only bug out when page size is larger than cluster size.
ocfs2: Fix memory overflow in cow_by_page.
ocfs2/dlm: Print more messages during lock migration
ocfs2/dlm: Ignore LVBs of locks in the Blocked list
ocfs2/trivial: Remove trailing whitespaces
ocfs2: fix a misleading variable name
ocfs2: Sync max_inline_data_with_xattr from tools.
ocfs2: Fix refcnt leak on ocfs2_fast_follow_link() error path
Eric Van Hensbergen [Mon, 8 Feb 2010 23:59:34 +0000 (17:59 -0600)]
9p: fix memory leak in v9fs_parse_options()
If match_strdup() fail this function exits without freeing the options string.
Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Sigend-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Aneesh Kumar K.V [Mon, 8 Feb 2010 11:50:32 +0000 (11:50 +0000)]
9p: Fix the kernel crash on a failed mount
The patch fix the crash repoted below
[ 15.149907] BUG: unable to handle kernel NULL pointer dereference at
00000001
[ 15.150806] IP: [<
c140b886>] p9_virtio_close+0x18/0x24
.....
....
[ 15.150806] Call Trace:
[ 15.150806] [<
c1408e78>] ? p9_client_destroy+0x3f/0x163
[ 15.150806] [<
c1409342>] ? p9_client_create+0x25f/0x270
[ 15.150806] [<
c1063b72>] ? trace_hardirqs_on+0xb/0xd
[ 15.150806] [<
c11ed4e8>] ? match_token+0x64/0x164
[ 15.150806] [<
c1175e8d>] ? v9fs_session_init+0x2f1/0x3c8
[ 15.150806] [<
c109cfc9>] ? kmem_cache_alloc+0x98/0xb8
[ 15.150806] [<
c1063b72>] ? trace_hardirqs_on+0xb/0xd
[ 15.150806] [<
c1173dd1>] ? v9fs_get_sb+0x47/0x1e8
[ 15.150806] [<
c1173dea>] ? v9fs_get_sb+0x60/0x1e8
[ 15.150806] [<
c10a2e77>] ? vfs_kern_mount+0x81/0x11a
[ 15.150806] [<
c10a2f55>] ? do_kern_mount+0x33/0xbe
[ 15.150806] [<
c10b40b9>] ? do_mount+0x654/0x6b3
[ 15.150806] [<
c1038949>] ? do_page_fault+0x0/0x284
[ 15.150806] [<
c10b28ec>] ? copy_mount_options+0x73/0xd2
[ 15.150806] [<
c10b4179>] ? sys_mount+0x61/0x94
[ 15.150806] [<
c14284e9>] ? syscall_call+0x7/0xb
....
[ 15.203562] ---[ end trace
1dd159357709eb4b ]---
[
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Eric Dumazet [Mon, 8 Feb 2010 23:00:39 +0000 (15:00 -0800)]
dst: call cond_resched() in dst_gc_task()
Kernel bugzilla #15239
On some workloads, it is quite possible to get a huge dst list to
process in dst_gc_task(), and trigger soft lockup detection.
Fix is to call cond_resched(), as we run in process context.
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Van Hensbergen [Mon, 8 Feb 2010 22:23:23 +0000 (16:23 -0600)]
9p: fix option parsing
Options pointer is being moved before calling kfree() which seems
to cause problems. This uses a separate pointer to track and free
original allocation.
Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>w
M. Mohan Kumar [Mon, 8 Feb 2010 21:36:48 +0000 (15:36 -0600)]
9p: Include fsync support for 9p client
Implement the fsync in the client side by marking stat field values to 'don't touch' so that server may
interpret it as a request to guarantee that the contents of the associated file are committed to stable
storage before the Rwstat message is returned.
Without this patch, calling fsync on a 9p file results in "Invalid argument" error. Please check the attached
C program.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Acked-by: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Linus Torvalds [Mon, 8 Feb 2010 21:33:31 +0000 (13:33 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Fix ondemand to not request targets outside policy limits
[CPUFREQ] Fix use after free of struct powernow_k8_data
[CPUFREQ] fix default value for ondemand governor
Sunil Mushran [Fri, 5 Feb 2010 23:41:23 +0000 (15:41 -0800)]
ocfs2/cluster: Make o2net connect messages KERN_NOTICE
Connect and disconnect messages are more than informational as they are required
during root cause analysis for failures. This patch changes them from KERN_INFO
to KERN_NOTICE.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Faseh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Sunil Mushran [Sat, 6 Feb 2010 01:55:56 +0000 (17:55 -0800)]
ocfs2/dlm: Fix printing of lockname
The debug call printing the name of the lock resource was chopping
off the last character. This patch fixes the problem.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
J. Bruce Fields [Mon, 8 Feb 2010 18:42:26 +0000 (13:42 -0500)]
Revert "nfsd4: fix error return when pseudoroot missing"
Commit
f39bde24b275ddc45d fixed the error return from PUTROOTFH in the
case where there is no pseudofilesystem.
This is really a case we shouldn't hit on a correctly configured server:
in the absence of a root filehandle, there's no point accepting version
4 NFS rpc calls at all.
But the shared responsibility between kernel and userspace here means
the kernel on its own can't eliminate the possiblity of this happening.
And we have indeed gotten this wrong in distro's, so new client-side
mount code that attempts to negotiate v4 by default first has to work
around this case.
Therefore when commit
f39bde24b275ddc45d arrived at roughly the same
time as the new v4-default mount code, which explicitly checked only for
the previous error, the result was previously fine mounts suddenly
failing.
We'll fix both sides for now: revert the error change, and make the
client-side mount workaround more robust.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Eric Van Hensbergen [Sat, 16 Jan 2010 01:01:56 +0000 (19:01 -0600)]
net/9p: fix statsize inside twstat
stat structures contain a size prefix. In our twstat messages
we were including the size of the size prefix in the prefix, which is not
what the protocol wants, and Inferno servers would complain.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Eric Van Hensbergen [Sat, 16 Jan 2010 01:01:10 +0000 (19:01 -0600)]
net/9p: fail when user specifies a transport which we can't find
If the user specifies a transport and we can't find it, we failed back
to the default trainsport silently. This patch will make the code
complain more loudly and return an error code.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Eric Van Hensbergen [Sat, 16 Jan 2010 00:54:03 +0000 (18:54 -0600)]
net/9p: fix virtio transport to correctly update status on connect
The 9p virtio transport was not updating its connection status correctly
preventing it from being able to mount the server.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Patrick McHardy [Mon, 8 Feb 2010 19:18:07 +0000 (11:18 -0800)]
netfilter: nf_conntrack: fix hash resizing with namespaces
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.
Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instanciating a
new namespace. Additionally restrict hash resizing to init_net for
now as other namespaces are not handled currently.
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Mon, 8 Feb 2010 19:17:43 +0000 (11:17 -0800)]
netfilter: xtables: compat out of scope fix
As per C99 6.2.4(2) when temporary table data goes out of scope,
the behaviour is undefined:
if (compat) {
struct foo tmp;
...
private = &tmp;
}
[dereference private]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Alexey Dobriyan [Mon, 8 Feb 2010 19:17:22 +0000 (11:17 -0800)]
netfilter: nf_conntrack: restrict runtime expect hashsize modifications
Expectation hashtable size was simply glued to a variable with no code
to rehash expectations, so it was a bug to allow writing to it.
Make "expect_hashsize" readonly.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Eric Dumazet [Mon, 8 Feb 2010 19:16:56 +0000 (11:16 -0800)]
netfilter: nf_conntrack: per netns nf_conntrack_cachep
nf_conntrack_cachep is currently shared by all netns instances, but
because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
If we use a shared slab cache, one object can instantly flight between
one hash table (netns ONE) to another one (netns TWO), and concurrent
reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
can be fooled without notice, because no RCU grace period has to be
observed between object freeing and its reuse.
We dont have this problem with UDP/TCP slab caches because TCP/UDP
hashtables are global to the machine (and each object has a pointer to
its netns).
If we use per netns conntrack hash tables, we also *must* use per netns
conntrack slab caches, to guarantee an object can not escape from one
namespace to another one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
[Patrick: added unique slab name allocation]
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Mon, 8 Feb 2010 19:16:26 +0000 (11:16 -0800)]
netfilter: nf_conntrack: fix memory corruption with multiple namespaces
As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked"
conntrack, which is located in the data section, might be accidentally
freed when a new namespace is instantiated while the untracked conntrack
is attached to a skb because the reference count it re-initialized.
The best fix would be to use a seperate untracked conntrack per
namespace since it includes a namespace pointer. Unfortunately this is
not possible without larger changes since the namespace is not easily
available everywhere we need it. For now move the untracked conntrack
initialization to the init_net setup function to make sure the reference
count is not re-initialized and handle cleanup in the init_net cleanup
function to make sure namespaces can exit properly while the untracked
conntrack is in use in other namespaces.
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 8 Feb 2010 19:07:10 +0000 (11:07 -0800)]
Merge branch 'v4l_for_linus' of git://linuxtv.org/fixes
* 'v4l_for_linus' of git://linuxtv.org/fixes:
V4L/DVB: dvb-core: fix initialization of feeds list in demux filter
V4L/DVB: dvb_demux: Don't use vmalloc at dvb_dmx_swfilter_packet
V4L/DVB: Fix the risk of an oops at dvb_dmx_release
Linus Torvalds [Mon, 8 Feb 2010 18:10:36 +0000 (10:10 -0800)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Invalidate dcache before enabling it
Linus Torvalds [Mon, 8 Feb 2010 18:10:18 +0000 (10:10 -0800)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pseries: Fix kexec regression caused by CPPR tracking
Linus Torvalds [Mon, 8 Feb 2010 18:09:55 +0000 (10:09 -0800)]
Merge branch 'sh/for-2.6.33' of git://git./linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Remove superfluous setup_frame_reg call
sh: Don't continue unwinding across interrupts
sh: Setup frame pointer in handle_exception path
sh: Correct the offset of the return address in ret_from_exception
usb: r8a66597-hcd: Fix up spinlock recursion in root hub polling.
usb: r8a66597-hcd: Flush the D-cache for the pipe-in transfer buffers.
Jeff Layton [Fri, 5 Feb 2010 18:30:36 +0000 (13:30 -0500)]
cifs: fix dentry hash calculation for case-insensitive mounts
case-insensitive mounts shouldn't use full_name_hash(). Make sure we
use the parent dentry's d_hash routine when one is set.
Reported-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Steve French [Mon, 8 Feb 2010 17:39:58 +0000 (17:39 +0000)]
[CIFS] Don't cache timestamps on utimes due to coarse granularity
force revalidate of the file when any of the timestamps are set since
some filesytem types do not have finer granularity timestamps and
we can not always detect which file systems round timestamps down
to determine whether we can cache the mtime on setattr
samba bugzilla 3775
Acked-by: Shirish Pargaonkar <sharishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Francesco Lavra [Sun, 7 Feb 2010 12:49:58 +0000 (09:49 -0300)]
V4L/DVB: dvb-core: fix initialization of feeds list in demux filter
A DVB demultiplexer device can be used to set up either a PES filter or
a section filter. In the former case, the ts field of the feed union of
struct dmxdev_filter is used, in the latter case the sec field of the
same union is used.
The ts field is a struct list_head, and is currently initialized in the
open() method of the demux device. When for a given demuxer a section
filter is set up, the sec field is played with, thus if a PES filter
needs to be set up after that the ts field will be corrupted, causing a
kernel oops.
This fix moves the list head initialization to
dvb_dmxdev_pes_filter_set(), so that the ts field is properly
initialized every time a PES filter is set up.
Signed-off-by: Francesco Lavra <francescolavra@interfree.it>
Cc: stable <stable@kernel.org>
Reviewed-by: Andy Walls <awalls@radix.net>
Tested-by: hermann pitton <hermann-pitton@arcor.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Mon, 1 Feb 2010 14:50:42 +0000 (11:50 -0300)]
V4L/DVB: dvb_demux: Don't use vmalloc at dvb_dmx_swfilter_packet
As dvb_dmx_swfilter_packet() is protected by a spinlock, it shouldn't sleep.
However, vmalloc() may call sleep. So, move the initialization of
dvb_demux::cnt_storage field to a better place.
Reviewed-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Mauro Carvalho Chehab [Mon, 1 Feb 2010 13:35:22 +0000 (10:35 -0300)]
V4L/DVB: Fix the risk of an oops at dvb_dmx_release
dvb_dmx_init tries to allocate virtual memory for 2 pointers: filter and feed.
If the second vmalloc fails, filter is freed, but the pointer keeps pointing
to the old place. Later, when dvb_dmx_release() is called, it will try to
free an already freed memory, causing an OOPS.
Reviewed-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Michal Simek [Mon, 1 Feb 2010 11:15:58 +0000 (12:15 +0100)]
microblaze: Invalidate dcache before enabling it
We found that on write-trough kernel is necessary to do that invalidation.
One WB is possible to use invalidation too.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Mark Nelson [Sun, 7 Feb 2010 16:45:12 +0000 (16:45 +0000)]
powerpc/pseries: Fix kexec regression caused by CPPR tracking
The code to track the CPPR values added by commit
49bd3647134ea47420067aea8d1401e722bf2aac ("powerpc/pseries: Track previous
CPPR values to correctly EOI interrupts") broke kexec on pseries because
the kexec code in xics.c calls xics_set_cpu_priority() before the IPI has
been EOI'ed. This wasn't a problem previously but it now triggers a BUG_ON
in xics_set_cpu_priority() because os_cppr->index isn't 0.
Fix this problem by setting the index on the CPPR stack to 0 before calling
xics_set_cpu_priority() in xics_teardown_cpu().
Also make it clear that we only want to set the priority when there's just
one CPPR value in the stack, and enforce it by updating the value of
os_cppr->stack[0] rather than os_cppr->stack[os_cppr->index].
While we're at it change the BUG_ON to a WARN_ON.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Matt Fleming [Sat, 30 Jan 2010 17:37:25 +0000 (17:37 +0000)]
sh: Remove superfluous setup_frame_reg call
There's no need to setup the frame pointer again in
call_handle_tlbmiss. The frame pointer will already have been setup in
handle_interrupt.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Matt Fleming [Sat, 30 Jan 2010 17:36:20 +0000 (17:36 +0000)]
sh: Don't continue unwinding across interrupts
Unfortunately, due to poor DWARF info in current toolchains, unwinding
through interrutps cannot be done reliably. The problem is that the
DWARF info for function epilogues is wrong.
Take this standard epilogue sequence,
80003cc4: e3 6f mov r14,r15
80003cc6: 26 4f lds.l @r15+,pr
80003cc8: f6 6e mov.l @r15+,r14
<---- interrupt here
80003cca: f6 6b mov.l @r15+,r11
80003ccc: f6 6a mov.l @r15+,r10
80003cce: f6 69 mov.l @r15+,r9
80003cd0: 0b 00 rts
If we take an interrupt at the highlighted point, the DWARF info will
bogusly claim that the return address can be found at some offset from
the frame pointer, even though the frame pointer was just restored. The
worst part is if the unwinder finds a text address at the bogus stack
address - unwinding will continue, for a bit, until it finally comes
across an unexpected address on the stack and blows up.
The only solution is to stop unwinding once we've calculated the
function that was executing when the interrupt occurred. This PC can be
easily calculated from pt_regs->pc.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Matt Fleming [Wed, 27 Jan 2010 20:44:59 +0000 (20:44 +0000)]
sh: Setup frame pointer in handle_exception path
In order to allow the DWARF unwinder to unwind through exceptions we
need to setup the frame pointer register (r14).
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Matt Fleming [Wed, 27 Jan 2010 20:05:20 +0000 (20:05 +0000)]
sh: Correct the offset of the return address in ret_from_exception
The address that ret_from_exception and ret_from_irq will return to is
found in the stack slot for SPC, not PR. This error was causing the
DWARF unwinder to pick up the wrong return address on the stack and then
unwind using the unwind tables for the wrong function.
While I'm here I might as well add CFI annotations for the other
registers since they could be useful when unwinding.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Linus Torvalds [Sun, 7 Feb 2010 19:18:28 +0000 (11:18 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Take ima_file_free() to proper place.
ima: rename PATH_CHECK to FILE_CHECK
ima: rename ima_path_check to ima_file_check
ima: initialize ima before inodes can be allocated
fix ima breakage
Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb
befs: fix leak
Linus Torvalds [Sun, 7 Feb 2010 18:11:23 +0000 (10:11 -0800)]
Fix race in tty_fasync() properly
This reverts commit
703625118069 ("tty: fix race in tty_fasync") and
commit
b04da8bfdfbb ("fnctl: f_modown should call write_lock_irqsave/
restore") that tried to fix up some of the fallout but was incomplete.
It turns out that we really cannot hold 'tty->ctrl_lock' over calling
__f_setown, because not only did that cause problems with interrupt
disables (which the second commit fixed), it also causes a potential
ABBA deadlock due to lock ordering.
Thanks to Tetsuo Handa for following up on the issue, and running
lockdep to show the problem. It goes roughly like this:
- f_getown gets filp->f_owner.lock for reading without interrupts
disabled, so an interrupt that happens while that lock is held can
cause a lockdep chain from f_owner.lock -> sighand->siglock.
- at the same time, the tty->ctrl_lock -> f_owner.lock chain that
commit
703625118069 introduced, together with the pre-existing
sighand->siglock -> tty->ctrl_lock chain means that we have a lock
dependency the other way too.
So instead of extending tty->ctrl_lock over the whole __f_setown() call,
we now just take a reference to the 'pid' structure while holding the
lock, and then release it after having done the __f_setown. That still
guarantees that 'struct pid' won't go away from under us, which is all
we really ever needed.
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Américo Wang <xiyou.wangcong@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Sun, 7 Feb 2010 08:07:29 +0000 (03:07 -0500)]
Take ima_file_free() to proper place.
Hooks: Just Say No.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mimi Zohar [Tue, 26 Jan 2010 22:02:41 +0000 (17:02 -0500)]
ima: rename PATH_CHECK to FILE_CHECK
With the movement of the ima hooks functions were renamed from *path* to
*file* since they always deal with struct file. This patch renames some of
the ima internal flags to make them consistent with the rest of the code.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mimi Zohar [Tue, 26 Jan 2010 22:02:40 +0000 (17:02 -0500)]
ima: rename ima_path_check to ima_file_check
ima_path_check actually deals with files! call it ima_file_check instead.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Eric Paris [Wed, 9 Dec 2009 20:29:01 +0000 (15:29 -0500)]
ima: initialize ima before inodes can be allocated
ima wants to create an inode information struct (iint) when inodes are
allocated. This means that at least the part of ima which does this
allocation (the allocation is filled with information later) should
before any inodes are created. To accomplish this we split the ima
initialization routine placing the kmem cache allocator inside a
security_initcall() function. Since this makes use of radix trees we also
need to make sure that is initialized before security_initcall().
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mimi Zohar [Wed, 20 Jan 2010 20:35:41 +0000 (15:35 -0500)]
fix ima breakage
The "Untangling ima mess, part 2 with counters" patch messed
up the counters. Based on conversations with Al Viro, this patch
streamlines ima_path_check() by removing the counter maintaince.
The counters are now updated independently, from measuring the file,
in __dentry_open() and alloc_file() by calling ima_counts_get().
ima_path_check() is called from nfsd and do_filp_open().
It also did not measure all files that should have been measured.
Reason: ima_path_check() got bogus value passed as mask.
[AV: mea culpa]
[AV: add missing nfsd bits]
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 26 Jan 2010 10:43:08 +0000 (05:43 -0500)]
Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jun'ichi Nomura [Fri, 29 Jan 2010 00:56:22 +0000 (09:56 +0900)]
freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb
Thanks Thomas and Christoph for testing and review.
I removed 'smp_wmb()' before up_write from the previous patch,
since up_write() should have necessary ordering constraints.
(I.e. the change of s_frozen is visible to others after up_write)
I'm quite sure the change is harmless but if you are uncomfortable
with Tested-by/Reviewed-by on the modified patch, please remove them.
If MS_RDONLY, freeze_bdev should just up_write(s_umount) instead of
deactivate_locked_super().
Also, keep sb->s_frozen consistent so that remount can check the frozen state.
Otherwise a crash reported here can happen:
http://lkml.org/lkml/2010/1/16/37
http://lkml.org/lkml/2010/1/28/53
This patch should be applied for 2.6.32 stable series, too.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Thomas Backlund <tmb@mandriva.org>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 29 Jan 2010 03:11:38 +0000 (22:11 -0500)]
befs: fix leak
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 6 Feb 2010 22:17:12 +0000 (14:17 -0800)]
Linux 2.6.33-rc7
Linus Torvalds [Sat, 6 Feb 2010 21:02:31 +0000 (13:02 -0800)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (w83781d) Request I/O ports individually for probing
hwmon: (lm78) Request I/O ports individually for probing
hwmon: (adt7462) Wrong ADT7462_VOLT_COUNT
Linus Torvalds [Sat, 6 Feb 2010 21:01:39 +0000 (13:01 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: Fix leak of relocs along do_execbuffer error path
drm/i915: slow acpi_lid_open() causes flickering - V2
drm/i915: Disable SR when more than one pipe is enabled
drm/i915: page flip support for Ironlake
drm/i915: Fix the incorrect DMI string for Samsung SX20S laptop
drm/i915: Add support for SDVO composite TV
drm/i915: don't trigger ironlake vblank interrupt at irq install
drm/i915: handle non-flip pending case when unpinning the scanout buffer
drm/i915: Fix the device info of Pineview
drm/i915: enable vblank interrupt on ironlake
drm/i915: Prevent use of uninitialized pointers along error path.
drm/i915: disable hotplug detect before Ironlake CRT detect
Linus Torvalds [Sat, 6 Feb 2010 00:16:50 +0000 (16:16 -0800)]
Fix potential crash with sys_move_pages
We incorrectly depended on the 'node_state/node_isset()' functions
testing the node range, rather than checking it explicitly. That's not
reliable, even if it might often happen to work. So do the proper
explicit test.
Reported-by: Marcus Meissner <meissner@suse.de>
Acked-and-tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steve French [Sat, 6 Feb 2010 07:08:53 +0000 (07:08 +0000)]
[CIFS] Maximum username length check in session setup does not match
Fix length check reported by D. Binderman (see below)
d binderman <dcb314@hotmail.com> wrote:
>
> I just ran the sourceforge tool cppcheck over the source code of the
> new Linux kernel 2.6.33-rc6
>
> It said
>
> [./cifs/sess.c:250]: (error) Buffer access out-of-bounds
May turn out to be harmless, but best to be safe. Note max
username length is defined to 32 due to Linux (Windows
maximum is 20).
Signed-off-by: Steve French <sfrench@us.ibm.com>
Jeff Layton [Fri, 5 Feb 2010 18:14:00 +0000 (13:14 -0500)]
cifs: fix length calculation for converted unicode readdir names
cifs_from_ucs2 returns the length of the converted name, including the
length of the NULL terminator. We don't want to include the NULL
terminator in the dentry name length however since that'll throw off the
hash calculation for the dentry cache.
I believe that this is the root cause of several problems that have
cropped up recently that seem to be papered over with the "noserverino"
mount option. More confirmation of that would be good, but this is
clearly a bug and it fixes at least one reproducible problem that
was reported.
This patch fixes at least this reproducer in this kernel.org bug:
http://bugzilla.kernel.org/show_bug.cgi?id=15088#c12
Reported-by: Bjorn Tore Sund <bjorn.sund@it.uib.no>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Steve French <sfrench@us.ibm.com>