Christoph Hellwig [Sat, 18 Jul 2009 22:15:01 +0000 (18:15 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_get
xfs_attr_rmtval_get is always called with i_lock held, but i_lock is taken
in reclaim context so all allocations under it must avoid recursions into
the filesystem.
Reported by the new reclaim context tracing in lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
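The rule behind this series of NOFS conversions can be sketched in userspace C. This is only an illustration: the SKETCH_* names and bit values are made up, and the assert merely stands in for the lockdep report. In the kernel the corresponding masks are GFP_KERNEL and GFP_NOFS, where GFP_NOFS is GFP_KERNEL without the __GFP_FS bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
enum {
	SKETCH_GFP_WAIT = 1u << 0,	/* allocator may sleep */
	SKETCH_GFP_IO   = 1u << 1,	/* reclaim may start block I/O */
	SKETCH_GFP_FS   = 1u << 2	/* reclaim may call into the filesystem */
};

#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOFS   (SKETCH_GFP_WAIT | SKETCH_GFP_IO)

/* True if reclaim triggered by this allocation may re-enter the filesystem. */
static bool sketch_may_enter_fs(unsigned int gfp)
{
	return (gfp & SKETCH_GFP_FS) != 0;
}

/* Pick the allocation mask based on whether i_lock is held. */
static unsigned int sketch_alloc_flags(bool holds_i_lock)
{
	return holds_i_lock ? SKETCH_GFP_NOFS : SKETCH_GFP_KERNEL;
}

/* Stand-in for the lockdep check: no FS recursion while i_lock is held. */
static void sketch_check_alloc(bool holds_i_lock, unsigned int gfp)
{
	assert(!(holds_i_lock && sketch_may_enter_fs(gfp)));
}
```

Calling sketch_check_alloc(true, sketch_alloc_flags(true)) passes, while sketch_check_alloc(true, SKETCH_GFP_KERNEL) would trip the assert, which is the situation these patches remove.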
Christoph Hellwig [Sat, 18 Jul 2009 22:15:00 +0000 (18:15 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_readlink_bmap
xfs_readlink_bmap is called with i_lock held, but i_lock is taken in
reclaim context so all allocations under it must avoid recursions into
the filesystem.
Reported by the new reclaim context tracing in lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:59 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_set
xfs_attr_rmtval_set is always called with i_lock held, and i_lock is taken
in reclaim context so all allocations under it must avoid recursions into
the filesystem.
Reported by the new reclaim context tracing in lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:58 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_buf_associate_memory
xfs_buf_associate_memory is used for setting up the spare buffer for the
log wrap case in xlog_sync which can happen under i_lock when called from
xfs_fsync. The i_lock mutex is taken in reclaim context so all allocations
under it must avoid recursions into the filesystem. There are a couple
more uses of xfs_buf_associate_memory in the log recovery code that are
also affected by this, but I'd rather keep the code simple than passing on
a gfp_mask argument. Longer term we should just stop requiring the memory
allocation in xlog_sync by some smaller rework of the buffer layer.
Reported by the new reclaim context tracing in lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:57 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_dir_cilookup_result
xfs_dir_cilookup_result is always called with i_lock held, but i_lock is taken
in reclaim context so all allocations under it must avoid recursions into the
filesystem.
Reported by the new reclaim context tracing in lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:56 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_da_buf_make
i_lock is taken in the reclaim context so all allocations under it
must avoid recursions into the filesystem.
Reported by the new reclaim context tracing in lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:55 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_da_state_alloc
xfs_da_state_alloc is always called with i_lock held, but i_lock is taken in
reclaim context so all allocations under it must avoid recursions into the
filesystem.
Reported by the new reclaim context tracing in lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:54 +0000 (18:14 -0400)]
xfs: switch to NOFS allocation under i_lock in xfs_getbmap
xfs_getbmap allocates memory with i_lock held, but i_lock is taken in
reclaim context so all allocations under it must avoid recursions into
the filesystem.
Reported by the new reclaim context tracing in lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Sat, 18 Jul 2009 22:14:53 +0000 (18:14 -0400)]
xfs: avoid memory allocation under m_peraglock in growfs code
Allocate the memory for the larger m_perag array before taking the
per-AG lock as the per-AG lock can be taken under the i_lock which
can be taken from reclaim context.
Reported by the new reclaim context tracing in lockdep.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
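The "allocate first, then lock" ordering can be sketched as below. This is a userspace model under assumptions: struct sketch_mount, its fields and the boolean "lock" are hypothetical stand-ins for the mount structure, m_perag and m_peraglock, and the assert in the allocator plays the role of the lockdep check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct sketch_mount {
	bool   peraglock_held;	/* stand-in for m_peraglock */
	int   *perag;		/* stand-in for the m_perag array */
	size_t agcount;
};

/* Allocation is only legal while the per-AG lock is NOT held. */
static void *sketch_alloc(const struct sketch_mount *mp, size_t bytes)
{
	assert(!mp->peraglock_held);
	return calloc(1, bytes);
}

/* Grow the per-AG array: allocate up front, then swap under the lock. */
static int sketch_grow(struct sketch_mount *mp, size_t new_agcount)
{
	int *bigger;

	if (new_agcount <= mp->agcount)
		return 0;

	bigger = sketch_alloc(mp, new_agcount * sizeof(*bigger));
	if (!bigger)
		return -1;

	mp->peraglock_held = true;		/* "take" the lock */
	if (mp->agcount)
		memcpy(bigger, mp->perag, mp->agcount * sizeof(*bigger));
	free(mp->perag);
	mp->perag = bigger;
	mp->agcount = new_agcount;
	mp->peraglock_held = false;		/* "drop" the lock */
	return 0;
}
```

Only the copy and pointer swap happen inside the locked region, so nothing in that region can enter memory reclaim.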
Eric Sandeen [Fri, 31 Jul 2009 05:02:17 +0000 (00:02 -0500)]
xfs: bump up nr_to_write in xfs_vm_writepage
VM calculation for nr_to_write seems off. Bump it way
up, this gets simple streaming writes zippy again.
To be reviewed again after Jens' writeback changes.
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Cc: Chris Mason <chris.mason@oracle.com>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Julia Lawall [Mon, 27 Jul 2009 16:15:25 +0000 (18:15 +0200)]
fs/xfs: Correct redundant test
bp was tested for NULL a few lines before, followed by a return, and there
is no intervening modification of its value.
A simplified version of the semantic match that finds this problem is as
follows: (http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
expression E;
position p1,p2;
@@
if (x == NULL || ...) { ... when forall
return ...; }
... when != \(x=E\|x--\|x++\|--x\|++x\|x-=E\|x+=E\|x|=E\|x&=E\|&x\)
(
*x == NULL
|
*x != NULL
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Eric Sandeen [Mon, 27 Jul 2009 02:52:01 +0000 (21:52 -0500)]
xfs: reduce bmv_count in xfs_vn_fiemap
commit 6321e3ed2acf3ee9643cdd403e1c88605d7944ba caused
the full bmv_count's worth of getbmapx structures to get
allocated; telling it to do MAXEXTNUM was a bit insane,
resulting in ENOMEM every time.
Chop it down to something reasonable, the number of slots
in the caller's input buffer. If this is too large the
caller may get ENOMEM but the reason should not be a
mystery, and they can try again with something smaller.
We add 1 to the value because in the normal getbmap
world, bmv_count includes the header and xfs_getbmap does:
    nex = bmv->bmv_count - 1;
    if (nex <= 0)
        return XFS_ERROR(EINVAL);
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
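The off-by-one in the fiemap fix above can be checked with a small userspace sketch. The function names are hypothetical and the sketch returns a negative errno, unlike XFS_ERROR(); only the arithmetic (bmv_count counts the header plus the extent slots) is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* bmv_count as getbmap expects it: one header slot plus the caller's slots. */
static int sketch_count_for_slots(int caller_slots)
{
	return caller_slots + 1;
}

/* Mirror of the check quoted above: the header does not hold an extent. */
static int sketch_extent_slots(int bmv_count, int *nex_out)
{
	int nex = bmv_count - 1;

	assert(nex_out != NULL);
	if (nex <= 0)
		return -EINVAL;
	*nex_out = nex;
	return 0;
}
```

With a 32-slot input buffer, sketch_count_for_slots(32) gives 33 and sketch_extent_slots() hands back 32 usable extents; without the +1 the caller would lose one slot, and a 1-slot buffer would be rejected outright.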
Eric Sandeen [Sun, 5 Jul 2009 17:23:35 +0000 (12:23 -0500)]
xfs: remove XFS_INO64_OFFSET
Commit a19d9f887d81106d52cacbc9930207b487e07e0e removed the
ino64 option but left the XFS_INO64_OFFSET define it used
in place - just remove it.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Eric Sandeen [Fri, 3 Jul 2009 02:35:43 +0000 (21:35 -0500)]
un-static xfs_read_agf
CONFIG_XFS_DEBUG builds still need xfs_read_agf to be
non-static, oops.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Eric Sandeen [Thu, 2 Jul 2009 05:09:33 +0000 (00:09 -0500)]
xfs: add more statics & drop some unused functions
A lot more functions could be made static, but they need
forward declarations; this does some easy ones, and also
found a few unused functions in the process.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Fri, 12 Jun 2009 15:19:11 +0000 (11:19 -0400)]
xfs: fix small mismerge in xfs_vn_mknod
Indentation got messed up when merging the current_umask changes with
the generic ACL support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Christoph Hellwig [Fri, 12 Jun 2009 15:34:55 +0000 (11:34 -0400)]
xfs: fix warnings with CONFIG_XFS_QUOTA disabled
Fix warnings about uninitialized dquot variables by making sure
xfs_qm_vop_dqalloc touches them even when quotas are disabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Felix Blyakher [Thu, 11 Jun 2009 22:07:28 +0000 (17:07 -0500)]
xfs: fix freeing memory in xfs_getbmap()
Regression from commit 28e211700a81b0a934b6c7a4b8e7dda843634d2f.
Need to free temporary buffer allocated in xfs_getbmap().
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Hedi Berriche <hedi@sgi.com>
Reported-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Felix Blyakher [Thu, 11 Jun 2009 21:56:49 +0000 (16:56 -0500)]
Merge branch 'master' of git://git./fs/xfs/xfs
Felix Blyakher [Wed, 10 Jun 2009 22:07:47 +0000 (17:07 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/torvalds/linux-2.6
Christoph Hellwig [Wed, 10 Jun 2009 15:07:47 +0000 (17:07 +0200)]
xfs: use generic Posix ACL code
This patch rips out the XFS ACL handling code and uses the generic
fs/posix_acl.c code instead. The ondisk format is of course left
unchanged.
This also introduces the same ACL caching all other Linux filesystems do
by adding pointers to the acl and default acl in struct xfs_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Linus Torvalds [Wed, 10 Jun 2009 03:05:27 +0000 (20:05 -0700)]
Linux 2.6.30
Peter Botha [Wed, 10 Jun 2009 00:16:32 +0000 (17:16 -0700)]
char: mxser, fix ISA board lookup
There's a bug in the mxser kernel module that still appears in the
2.6.29.4 kernel.
mxser_get_ISA_conf takes an I/O address as its first argument; by passing
the logical NOT (!) of the ioaddr, you're effectively passing 0, which
means it won't be able to talk to an ISA card. I have tested this, and
removing the ! fixes the problem.
Cc: "Peter Botha" <peterb@goldcircle.co.za>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
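Why the stray ! breaks the lookup can be shown in a few lines of userspace C. The probe function and the card address below are hypothetical; the only fact relied on is that in C the ! operator maps any non-zero value to 0.

```c
#include <assert.h>

#define SKETCH_CARD_IOADDR 0x180u	/* hypothetical ISA port of the board */

/* Hypothetical probe: reports 1 only when asked about the right port. */
static int sketch_get_isa_conf(unsigned int ioaddr)
{
	assert(ioaddr <= 0xffffu);	/* ISA port addresses fit in 16 bits */
	return ioaddr == SKETCH_CARD_IOADDR;
}
```

sketch_get_isa_conf(!SKETCH_CARD_IOADDR) probes port 0 and finds nothing, while sketch_get_isa_conf(SKETCH_CARD_IOADDR) finds the board.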
Jan Kara [Tue, 9 Jun 2009 23:26:26 +0000 (16:26 -0700)]
jbd: fix race in buffer processing in commit code
In commit code, we scan buffers attached to a transaction. During this
scan, we sometimes have to drop j_list_lock and then we recheck whether
the journal buffer head didn't get freed by journal_try_to_free_buffers().
But checking for buffer_jbd(bh) isn't enough because a new journal head
could get attached to our buffer head. So add a check whether the journal
head remained the same and whether it's still at the same transaction and
list.
This is a nasty bug and can cause problems like memory corruption (use after
free) or trigger various assertions in JBD code (observed).
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
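The strengthened recheck described in the jbd fix can be sketched as follows. The structures and names are simplified, hypothetical stand-ins for the buffer head and journal head; the point is only that "a journal head is attached" is a weaker test than "the same journal head is attached, on the same transaction and list".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_jh {
	int transaction_id;	/* which transaction the head belongs to */
	int list_id;		/* which per-transaction list it sits on */
};

struct sketch_bh {
	struct sketch_jh *jh;	/* NULL when no journal head is attached */
};

/*
 * To be called after the list lock has been dropped and retaken:
 * is the buffer still in the state we saw before dropping the lock?
 */
static bool sketch_still_ours(const struct sketch_bh *bh,
			      const struct sketch_jh *seen,
			      int transaction, int list)
{
	assert(bh != NULL);
	return bh->jh != NULL &&		/* the old, insufficient test */
	       bh->jh == seen &&		/* same journal head */
	       bh->jh->transaction_id == transaction &&
	       bh->jh->list_id == list;
}
```

The transaction and list comparisons also cover the case where a freed journal head is reallocated at the same address, which a pointer comparison alone would miss.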
Ian Kent [Tue, 9 Jun 2009 23:26:24 +0000 (16:26 -0700)]
autofs4: remove hashed check in validate_wait()
The recent ->lookup() deadlock correction required the directory inode
mutex to be dropped while waiting for expire completion. We were
concerned about side effects from this change and one has been identified.
I saw several error messages.
They cause autofs to become quite confused and don't really point to the
actual problem.
Things like:
handle_packet_missing_direct:1376: can't find map entry for (43,1827932)
which is usually totally fatal (although in this case it wouldn't be
except that I treat it as such because it normally is).
do_mount_direct: direct trigger not valid or already mounted
/test/nested/g3c/s1/ss1
which is recoverable, however if this problem is at play it can cause
autofs to become quite confused as to the dependencies in the mount tree
because mount triggers end up mounted multiple times. It's hard to
accurately check for this over mounting case and automount shouldn't need
to if the kernel module is doing its job.
There was one other message, similar in consequence to this last one but I
can't locate a log example just now.
When checking if a mount has already completed prior to adding a new mount
request to the wait queue we check if the dentry is hashed and, if so, if
it is a mount point. But, if a mount successfully completed while we
slept on the wait queue mutex the dentry must exist for the mount to have
completed so the test is not really needed.
Mounts can also be done on top of a global root dentry, so for the above
case, where a mount request completes and the wait queue entry has already
been removed, the hashed test returning false can cause an incorrect
callback to the daemon. Also, d_mountpoint() is not sufficient to check
if a mount has completed for the multi-mount case when we don't have a
real mount at the base of the tree.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Tue, 9 Jun 2009 23:26:23 +0000 (16:26 -0700)]
shm: fix unused warnings on nommu
The massive nommu update (8feae131) resulted in these warnings:
ipc/shm.c: In function `sys_shmdt':
ipc/shm.c:974: warning: unused variable `size'
ipc/shm.c:972: warning: unused variable `next'
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 9 Jun 2009 15:48:32 +0000 (08:48 -0700)]
Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
kvm: fix kvm reboot crash when MAXSMP is used
cpumask: alloc zeroed cpumask for static cpumask_var_ts
cpumask: introduce zalloc_cpumask_var
Linus Torvalds [Tue, 9 Jun 2009 15:47:43 +0000 (08:47 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
bsg: setting rq->bio to NULL
Linus Torvalds [Tue, 9 Jun 2009 15:47:27 +0000 (08:47 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
cls_cgroup: Fix oops when user send improperly 'tc filter add' request
r8169: fix crash when large packets are received
Linus Torvalds [Tue, 9 Jun 2009 15:41:22 +0000 (08:41 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md/raid5: fix bug in reshape code when chunk_size decreases.
md/raid5 - avoid deadlocks in get_active_stripe during reshape
md/raid5: use conf->raid_disks in preference to mddev->raid_disk
FUJITA Tomonori [Tue, 9 Jun 2009 13:17:37 +0000 (15:17 +0200)]
bsg: setting rq->bio to NULL
Due to commit 1cd96c242a829d52f7a5ae98f554ca9775429685 ("block: WARN
in __blk_put_request() for potential bio leak"), BSG SMP requests get
the false warnings:
WARNING: at block/blk-core.c:1068 __blk_put_request+0x52/0xc0()
This sets rq->bio to NULL to avoid those false warnings.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Avi Kivity [Sat, 6 Jun 2009 21:52:35 +0000 (14:52 -0700)]
kvm: fix kvm reboot crash when MAXSMP is used
On one system, a crash was found during reboot with kvm and MAXSMP enabled:
Sending all processes the KILL signal... done
Please stand by while rebooting the system...
[ 1721.856538] md: stopping all md devices.
[ 1722.852139] kvm: exiting hardware virtualization
[ 1722.854601] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1722.872219] IP: [<ffffffff8102c6b6>] hardware_disable+0x4c/0xb4
[ 1722.877955] PGD 0
[ 1722.880042] Oops: 0000 [#1] SMP
[ 1722.892548] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:2:0/0:2:0:0/vendor
[ 1722.900977] CPU 9
[ 1722.912606] Modules linked in:
[ 1722.914226] Pid: 0, comm: swapper Not tainted 2.6.30-rc7-tip-01843-g2305324-dirty #299 ...
[ 1722.932589] RIP: 0010:[<ffffffff8102c6b6>] [<ffffffff8102c6b6>] hardware_disable+0x4c/0xb4
[ 1722.942709] RSP: 0018:ffffc900010b6ed8 EFLAGS: 00010046
[ 1722.956121] RAX: 0000000000000000 RBX: ffffc9000e253140 RCX: 0000000000000009
[ 1722.972202] RDX: 000000000000b020 RSI: ffffc900010c3220 RDI: ffffffffffffd790
[ 1722.977399] RBP: ffffc900010b6f08 R08: 0000000000000000 R09: 0000000000000000
[ 1722.995149] R10: 00000000000004b8 R11: 966912b6c78fddbd R12: 0000000000000009
[ 1723.011551] R13: 000000000000b020 R14: 0000000000000009 R15: 0000000000000000
[ 1723.019898] FS: 0000000000000000(0000) GS:ffffc900010b3000(0000) knlGS:0000000000000000
[ 1723.034389] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 1723.041164] CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006e0
[ 1723.056192] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1723.072546] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1723.080562] Process swapper (pid: 0, threadinfo ffff88107e464000, task ffff88047e5a2550)
[ 1723.096144] Stack:
[ 1723.099071] 0000000000000046 ffffc9000e253168 966912b6c78fddbd ffffc9000e253140
[ 1723.115471] ffff880c7d4304d0 ffffc9000e253168 ffffc900010b6f28 ffffffff81011022
[ 1723.132428] ffffc900010b6f48 966912b6c78fddbd ffffc900010b6f48 ffffffff8100b83b
[ 1723.141973] Call Trace:
[ 1723.142981] <IRQ> <0> [<ffffffff81011022>] kvm_arch_hardware_disable+0x26/0x3c
[ 1723.158153] [<ffffffff8100b83b>] hardware_disable+0x3f/0x55
[ 1723.172168] [<ffffffff810b95f6>] generic_smp_call_function_interrupt+0x76/0x13c
[ 1723.178836] [<ffffffff8104cbea>] smp_call_function_interrupt+0x3a/0x5e
[ 1723.194689] [<ffffffff81035bf3>] call_function_interrupt+0x13/0x20
[ 1723.199750] <EOI> <0> [<ffffffff814ad3b4>] ? acpi_idle_enter_c1+0xd3/0xf4
[ 1723.217508] [<ffffffff814ad3ae>] ? acpi_idle_enter_c1+0xcd/0xf4
[ 1723.232172] [<ffffffff814ad4bc>] ? acpi_idle_enter_bm+0xe7/0x2ce
[ 1723.235141] [<ffffffff81a8d93f>] ? __atomic_notifier_call_chain+0x0/0xac
[ 1723.253381] [<ffffffff818c3dff>] ? menu_select+0x58/0xd2
[ 1723.258179] [<ffffffff818c2c9d>] ? cpuidle_idle_call+0xa4/0xf3
[ 1723.272828] [<ffffffff81034085>] ? cpu_idle+0xb8/0x101
[ 1723.277085] [<ffffffff81a80163>] ? start_secondary+0x1bc/0x1d7
[ 1723.293708] Code: b0 00 00 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 48 8b 04 cd 30 ee 27 82 49 89 cc 49 89 d5 48 8b 04 10 48 8d b8 90 d7 ff ff <48> 8b 87 70 28 00 00 48 8d 98 90 d7 ff ff eb 16 e8 e9 fe ff ff
[ 1723.335524] RIP [<ffffffff8102c6b6>] hardware_disable+0x4c/0xb4
[ 1723.342076] RSP <ffffc900010b6ed8>
[ 1723.352021] CR2: 0000000000000000
[ 1723.354348] ---[ end trace e2aec53dae150aa1 ]---
It turns out that we need to clear cpus_hardware_enabled in that case.
Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Yinghai Lu [Sat, 6 Jun 2009 21:51:36 +0000 (14:51 -0700)]
cpumask: alloc zeroed cpumask for static cpumask_var_ts
These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already. Avoid surprises when MAXSMP is enabled.
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Yinghai Lu [Sat, 6 Jun 2009 21:50:36 +0000 (14:50 -0700)]
cpumask: introduce zalloc_cpumask_var
This lets callers get a cpumask_var_t that has already been cleared with
cpumask_clear().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
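The idea of a zeroing allocator can be sketched in userspace as "allocate, then clear". The structure and function names below are hypothetical and only mimic the shape of the cpumask helpers; the real implementation is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MASK_WORDS 4

struct sketch_cpumask {
	unsigned long bits[SKETCH_MASK_WORDS];
};

/* Plain allocation: contents are indeterminate, like an off-stack mask. */
static bool sketch_alloc_mask(struct sketch_cpumask **m)
{
	assert(m != NULL);
	*m = malloc(sizeof(**m));
	return *m != NULL;
}

/* Zeroing variant: allocate, then clear every bit before handing it out. */
static bool sketch_zalloc_mask(struct sketch_cpumask **m)
{
	if (!sketch_alloc_mask(m))
		return false;
	memset(*m, 0, sizeof(**m));
	return true;
}
```

A statically allocated mask starts out zeroed, whereas a dynamically allocated one does not, so code that silently relied on the static behaviour needs the zeroing variant once the mask is allocated off-stack.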
Minoru Usui [Tue, 9 Jun 2009 11:03:09 +0000 (04:03 -0700)]
cls_cgroup: Fix oops when user send improperly 'tc filter add' request
I found a bug in cls_cgroup_change() in cls_cgroup.c.
cls_cgroup_change() expects tca[TCA_OPTIONS] to be set properly from user space,
but tc in iproute2-2.6.29-1 (which I used) doesn't set it.
The current source code of tc in git does set tca[TCA_OPTIONS]:
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
If we always use the newest iproute2 from git when we use cls_cgroup,
we probably won't hit this oops.
But I think the kernel shouldn't panic regardless of the user program's behaviour.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 Jun 2009 11:01:02 +0000 (04:01 -0700)]
r8169: fix crash when large packets are received
Michael Tokarev reported receiving a large packet could crash
a machine with RTL8169 NIC.
( original thread at http://lkml.org/lkml/2009/6/8/192 )
The problem is that this driver tells the NIC that frames up to 16383 bytes
can be received, but provides skbs to the rx ring allocated with
smaller sizes (1536 bytes when the standard 1500-byte MTU is used).
When a frame larger than what was allocated by the driver is received,
the DMA transfer can run past the end of the buffer and corrupt
kernel memory.
The fix is to tell the NIC the maximum size a frame can be.
This bug is very old (before the introduction of git, linux-2.6.10), and
should be backported to stable versions.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
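The mismatch can be illustrated with the two numbers from the commit message. This is a userspace sketch with hypothetical helper names, not the driver's code: the limit programmed into the NIC must never exceed the size of the buffers actually placed on the rx ring.

```c
#include <assert.h>
#include <stdbool.h>

/* Would a DMA write of frame_len bytes stay inside the rx buffer? */
static bool sketch_frame_fits(unsigned int frame_len, unsigned int rx_buf_size)
{
	return frame_len <= rx_buf_size;
}

/* Limit to program into the NIC: the smaller of buffer size and hw maximum. */
static unsigned int sketch_rx_max_size(unsigned int rx_buf_size,
				       unsigned int hw_limit)
{
	assert(rx_buf_size > 0);
	return rx_buf_size < hw_limit ? rx_buf_size : hw_limit;
}
```

With 1536-byte buffers and a 16383-byte hardware limit, a 9000-byte frame does not fit, so the limit handed to the NIC should be 1536 rather than 16383.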
NeilBrown [Tue, 9 Jun 2009 06:32:22 +0000 (16:32 +1000)]
md/raid5: fix bug in reshape code when chunk_size decreases.
Now that we support changing the chunksize, we calculate
"reshape_sectors" to be the max of the number of sectors in the old
and new chunk sizes.
However, there is one place where we still use 'chunksize'
rather than 'reshape_sectors'.
This causes a reshape that reduces the size of chunks to freeze.
Signed-off-by: NeilBrown <neilb@suse.de>
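The calculation described above is a plain maximum; a minimal sketch with hypothetical parameter names:

```c
#include <assert.h>

/* Step size for a reshape: the larger of the old and new chunk, in sectors. */
static unsigned int sketch_reshape_sectors(unsigned int old_chunk_sectors,
					   unsigned int new_chunk_sectors)
{
	assert(old_chunk_sectors > 0 && new_chunk_sectors > 0);
	return old_chunk_sectors > new_chunk_sectors ? old_chunk_sectors
						     : new_chunk_sectors;
}
```

When the chunk size shrinks, say from 128 to 64 sectors, the step stays at 128. Code that still used the new chunk size alone would disagree with the rest of the reshape logic about how far each step advances.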
NeilBrown [Tue, 9 Jun 2009 04:39:59 +0000 (14:39 +1000)]
md/raid5 - avoid deadlocks in get_active_stripe during reshape
md has functionality to 'quiesce' an array so that all pending
IO completes and no new IO starts. This is used to achieve a
stable state before making internal changes.
Currently this quiescing applies equally to normal IO, resync
IO, and reshape IO.
However there is a problem with applying it to reshape IO.
Reshape can have multiple 'stripe_heads' that must be active together.
If the quiesce comes between allocating the first and the last of
such a collection, then we deadlock, as the last will not be allocated
until the quiesce is lifted, the quiesce will not be lifted until the
first (which has been allocated) gets used, and that first cannot be
used until the last is allocated.
It is not necessary to inhibit reshape IO when a quiesce is
requested. Those places in the code that require a full quiesce will
ensure the reshape thread is not running at all.
So allow reshape requests to get access to new stripe_heads without
being blocked by a 'quiesce'.
This only affects in-place reshapes (i.e. where the array does not
grow or shrink) and these are only newly supported. So this patch is
not needed in earlier kernels.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 9 Jun 2009 04:30:31 +0000 (14:30 +1000)]
md/raid5: use conf->raid_disks in preference to mddev->raid_disk
mddev->raid_disks can be changed at any time by a request from
user-space. It is a suggestion as to what number of raid_disks is
desired.
conf->raid_disks can only be changed by the raid5 module with suitable
locks in place. It is a statement as to the current number of
raid_disks.
There are two places where the latter should be used, but the former
is used. This can lead to a crash when reshaping an array.
This patch changes mddev-> to conf->
Signed-off-by: NeilBrown <neilb@suse.de>
Linus Torvalds [Mon, 8 Jun 2009 19:31:53 +0000 (12:31 -0700)]
async: Fix lack of boot-time console due to insufficient synchronization
Our async work synchronization was broken by "async: make sure
independent async domains can't accidentally entangle" (commit
d5a877e8dd409d8c702986d06485c374b705d340), because it would report
the wrong lowest active async ID when there was both running and
pending async work.
This caused things like not being able to read the root filesystem,
resulting in missing console devices and inability to run 'init',
causing a boot-time panic.
This fixes it by properly returning the lowest pending async ID: if
there is any running async work, that will have a lower ID than any
pending work, and we should _not_ look at the pending work list.
There were alternative patches from Jaswinder and James, but this one
also cleans up the code by removing the pointless 'ret' variable and
the unnecessary testing for an empty list around 'for_each_entry()' (if
the list is empty, the for_each_entry() thing just won't execute).
Fixes-bug: http://bugzilla.kernel.org/show_bug.cgi?id=13474
Reported-and-tested-by: Chris Clayton <chris2553@googlemail.com>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
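The rule stated above (running work always has a lower ID than pending work, so the pending list is only consulted when nothing is running) can be sketched with arrays instead of kernel lists. Names and types here are hypothetical, and both arrays are assumed to be sorted by ascending ID.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long sketch_cookie_t;

/*
 * Lowest async ID still in progress. If nothing is running or pending,
 * next_cookie acts as the "infinity" value.
 */
static sketch_cookie_t
sketch_lowest_in_progress(const sketch_cookie_t *running, size_t nr_running,
			  const sketch_cookie_t *pending, size_t nr_pending,
			  sketch_cookie_t next_cookie)
{
	assert(nr_running == 0 || running != NULL);
	assert(nr_pending == 0 || pending != NULL);

	if (nr_running)
		return running[0];	/* do not look at pending work */
	if (nr_pending)
		return pending[0];
	return next_cookie;
}
```

With running IDs {3, 4} and pending IDs {5, 6}, the answer is 3; reporting 5 instead would let a waiter proceed while work items 3 and 4 were still executing.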
Linus Torvalds [Mon, 8 Jun 2009 16:22:53 +0000 (09:22 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
MIPS: Outline udelay and fix a few issues.
MIPS: ioctl.h: Fix headers_check warnings
MIPS: Cobalt: PCI bus is always required to obtain the board ID
MIPS: Kconfig: Remove "Support for" from Cavium system type
MIPS: Sibyte: Honor CONFIG_CMDLINE
SSB: BCM47xx: Export ssb_watchdog_timer_set
Alan Cox [Mon, 8 Jun 2009 11:31:00 +0000 (12:31 +0100)]
pata_netcell: Fix typo
The previous patch submission had a typo I didn't catch but Bartlomiej
noted. Guess this proves the point about any patch being risky late in an rc.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 8 Jun 2009 16:05:48 +0000 (09:05 -0700)]
Merge branch 'kvm-updates/2.6.30' of git://git./virt/kvm/kvm
* 'kvm-updates/2.6.30' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Explicity initialize cpus_hardware_enabled
Linus Torvalds [Mon, 8 Jun 2009 16:04:55 +0000 (09:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/bart/ide-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
pdc202xx_old: fix resetproc() method
pdc202xx_old: fix 'pdc20246_dma_ops'
Ralf Baechle [Sat, 28 Feb 2009 09:44:28 +0000 (09:44 +0000)]
MIPS: Outline udelay and fix a few issues.
Outlining fixes the issue where on certain CPUs such as the R10000 family
the delay loop would need an extra cycle if it overlaps a cacheline
boundary.
The rewrite also fixes build errors with GCC 4.4, which was changed in
a way incompatible with the kernel's inline assembly.
Relying on pure C for computation of the delay value removes the need for
explicit. The price we pay is a slight slowdown of the computation - to
be fixed on another day.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Jaswinder Singh Rajput [Thu, 4 Jun 2009 12:35:49 +0000 (18:05 +0530)]
MIPS: ioctl.h: Fix headers_check warnings
Make ioctl.h compatible with asm-generic/ioctl.h and userspace,
fixing the following 'make headers_check' warning:
usr/include/asm-mips/ioctl.h:64: extern's make no sense in userspace
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yoichi Yuasa [Tue, 2 Jun 2009 14:17:07 +0000 (23:17 +0900)]
MIPS: Cobalt: PCI bus is always required to obtain the board ID
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Yoichi Yuasa [Tue, 2 Jun 2009 14:15:10 +0000 (23:15 +0900)]
MIPS: Kconfig: Remove "Support for" from Cavium system type
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Tue, 2 Jun 2009 18:05:28 +0000 (19:05 +0100)]
MIPS: Sibyte: Honor CONFIG_CMDLINE
Original patch by Imre Kaloz <kaloz@openwrt.org>.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Matthieu Castet [Fri, 22 May 2009 20:25:04 +0000 (22:25 +0200)]
SSB: BCM47xx: Export ssb_watchdog_timer_set
This patch exports ssb_watchdog_timer_set so that it can be used in a Linux
watchdog driver.
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Acked-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Linus Torvalds [Mon, 8 Jun 2009 15:29:31 +0000 (08:29 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 5543/1: arm: serial amba: add missing declaration in serial.h
[ARM] pxa: fix pxa27x_udc default pullup GPIO
[ARM] pxa/imote2: fix UCAM sensor board ADC model number
mx[23]: don't put clock lookups in __initdata
fix oops when using console=ttymxcN with N > 0
[ARM] ARMv7 errata: only apply fixes when running on applicable CPU
[ARM] 5534/1: kmalloc must return a cache line aligned buffer
Linus Torvalds [Mon, 8 Jun 2009 14:53:59 +0000 (07:53 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
sdhci-of: Fix the wrong accessor to HOSTVER register
mvsdio: fix config failure with some high speed SDHC cards
mvsdio: ignore high speed timing requests from the core
mmc/omap: Use disable_irq_nosync() from within irq handlers.
sdhci-of: Add fsl,esdhc as a valid compatible to bind against
mvsdio: allow automatic loading when modular
mxcmmc: Fix missing return value checking in DMA setup code.
mxcmmc : Reset the SDHC hardware if software timeout occurs.
omap_hsmmc: Trivial fix for a typo in comment
mxcmmc: decrease minimum frequency to make MMC cards work
Christoph Hellwig [Mon, 8 Jun 2009 13:37:16 +0000 (15:37 +0200)]
xfs: remove SYNC_BDFLUSH
SYNC_BDFLUSH is a leftover from IRIX and rather misnamed for today's
code. Make xfs_sync_fsdata and xfs_dq_sync use the SYNC_TRYLOCK flag
for not blocking on logs just as the inode sync code already does.
For xfs_sync_fsdata it's a trivial 1:1 replacement, but for xfs_qm_sync
I use the opportunity to decouple the non-blocking lock case from the
different flushing modes, similar to the inode sync code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Christoph Hellwig [Mon, 8 Jun 2009 13:37:11 +0000 (15:37 +0200)]
xfs: remove SYNC_IOWAIT
We want to wait for all I/O to finish when we do data integrity syncs. So
there is no reason to keep SYNC_WAIT separate from SYNC_IOWAIT. This
causes a little change in behaviour for the ENOSPC flushing code which now
does a second submission and wait of buffered I/O, but that should finish
ASAP as we already did an asynchronous writeout earlier.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Christoph Hellwig [Mon, 8 Jun 2009 13:35:48 +0000 (15:35 +0200)]
xfs: split xfs_sync_inodes
xfs_sync_inodes is used to write back either file data or inode metadata.
In general we always do these separately, except for one fishy case in
xfs_fs_put_super that does both. So separate xfs_sync_inodes into
separate xfs_sync_data and xfs_sync_attr functions. In xfs_fs_put_super
we first call the data sync and then the attr sync as that was the previous
order. The moved log force in that path doesn't make a difference because
we will force the log again as part of the real unmount process.
The filesystem readonly checks are not performed by the new function but
instead moved into the callers, given that most callers already have it
further up in the stack. Also add debug checks that we do not pass in
incorrect flags in the new xfs_sync_data and xfs_sync_attr function and
fix the one place that did pass in a wrong flag.
Also remove a comment mentioning xfs_sync_inodes that has been incorrect
for a while because we always take either the iolock or ilock in the
sync path these days.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Christoph Hellwig [Mon, 8 Jun 2009 13:35:27 +0000 (15:35 +0200)]
xfs: use generic inode iterator in xfs_qm_dqrele_all_inodes
Use xfs_inode_ag_iterator instead of opencoding the inode walk in the
quota code. Mark xfs_inode_ag_iterator and xfs_sync_inode_valid non-static
to allow using them from the quota code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Mon, 8 Jun 2009 13:35:14 +0000 (15:35 +0200)]
xfs: introduce a per-ag inode iterator
Given that we walk across the per-ag inode lists so often, it makes sense to
introduce an iterator for this.
Convert the sync and reclaim code to use this new iterator, quota code will
follow in the next patch.
Also change xfs_reclaim_inode to return -EAGAIN instead of 1 for an inode
already under reclaim. This simplifies the AG iterator and doesn't
matter for the only other caller.
[hch: merged the lookup and execute callbacks back into one to get the
pag_ici_lock locking correct and simplify the code flow]
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
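
The iterator idea in the commit above can be sketched in user space. This is a minimal illustration only, assuming a flat array in place of the per-AG radix tree; `toy_inode`, `toy_reclaim`, and `toy_ag_iterator` are hypothetical names, not the XFS functions, and the negative-errno convention is an assumption. It shows why a distinct -EAGAIN return lets the walker treat "inode busy, skip it" separately from real errors.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an in-core inode; not the XFS structure. */
struct toy_inode {
    int under_reclaim;  /* nonzero if someone else is already reclaiming it */
    int visited;        /* set by the callback when work was done */
};

/* Callback type: returns 0 on success or a negative errno. */
typedef int (*toy_exec_fn)(struct toy_inode *ip);

/* Example callback: refuse inodes that are already under reclaim. */
int toy_reclaim(struct toy_inode *ip)
{
    if (ip->under_reclaim)
        return -EAGAIN;     /* busy: ask the iterator to skip this inode */
    ip->visited = 1;
    return 0;
}

/*
 * Walk every inode, run the callback, and count how many were skipped
 * with -EAGAIN. Any other error is remembered and returned at the end.
 */
int toy_ag_iterator(struct toy_inode *inodes, size_t n,
                    toy_exec_fn exec, size_t *skipped)
{
    int last_error = 0;

    assert(skipped != NULL);
    *skipped = 0;
    for (size_t i = 0; i < n; i++) {
        int error = exec(&inodes[i]);

        if (error == -EAGAIN) {
            (*skipped)++;   /* not a failure, just try again later */
            continue;
        }
        if (error)
            last_error = error;
    }
    return last_error;
}
```

A caller could rerun the walk while `skipped` is nonzero; whether and how the real iterator retries is not stated in the commit message.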
Dave Chinner [Mon, 8 Jun 2009 13:35:12 +0000 (15:35 +0200)]
xfs: remove unused parameter from xfs_reclaim_inodes
The noblock parameter of xfs_reclaim_inodes is only ever set to zero. Remove
it and all the conditional code that is never executed.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Mon, 8 Jun 2009 13:35:07 +0000 (15:35 +0200)]
xfs: factor out inode validation for sync
Separate the validation of inodes found by the radix
tree walk from the radix tree lookup.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Christoph Hellwig [Mon, 8 Jun 2009 13:35:05 +0000 (15:35 +0200)]
xfs: split inode flushing from xfs_sync_inodes_ag
In many cases we only want to sync inode metadata. Split out the inode
flushing into a separate helper to prepare factoring the inode sync code.
Based on a patch from Dave Chinner, but redone to keep the current behaviour
exactly and leave changes to the flushing logic to another patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Mon, 8 Jun 2009 13:35:03 +0000 (15:35 +0200)]
xfs: split inode data writeback from xfs_sync_inodes_ag
In many cases we only want to sync inode data. Start splitting the inode sync
into data sync and inode sync by factoring out the inode data flush.
[hch: minor cleanups]
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Christoph Hellwig [Mon, 8 Jun 2009 13:33:32 +0000 (15:33 +0200)]
xfs: kill xfs_qmops
Kill the quota ops function vector and replace it with direct calls or
stubs in the CONFIG_XFS_QUOTA=n case.
Make sure we check XFS_IS_QUOTA_RUNNING in the right spots. We can remove
a number of those checks because the XFS_TRANS_DQ_DIRTY flag can't be set
otherwise.
This brings us back closer to the way this code worked in IRIX and earlier
Linux versions, but we keep a lot of the more useful factoring of common
code.
Eventually we should also kill xfs_qm_bhv.c, but that's left for a later
patch.
Reduces the size of the source code by about 250 lines and the size of
XFS module by about 1.5 kilobytes with quotas enabled:
   text    data     bss     dec     hex filename
 615957    2960    3848  622765   980ad fs/xfs/xfs.o
 617231    3152    3848  624231   98667 fs/xfs/xfs.o.old
Fallout:
- xfs_qm_dqattach is split into xfs_qm_dqattach_locked which expects
the inode locked and xfs_qm_dqattach which does the locking around it,
thus removing XFS_QMOPT_ILOCKED.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Christoph Hellwig [Mon, 8 Jun 2009 13:33:21 +0000 (15:33 +0200)]
xfs: validate quota log items during log recovery
Arkadiusz has seen really strange crashes in xfs_qm_dqcheck that
I can only explain by a log item being too small to actually fit the
xfs_dqblk_t we're dereferencing all over xfs_qm_dqcheck. So add
graceful checks for NULL or too small quota items to the log recovery
code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
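
The kind of check described above can be sketched as follows. This is a hedged user-space illustration, not the recovery code itself: `toy_dqblk` and `toy_log_region` are hypothetical, cut-down stand-ins for the on-disk quota block and the logged region. The point is simply to reject a NULL or undersized region before anything dereferences it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record; the real xfs_dqblk_t has more fields. */
struct toy_dqblk {
    uint16_t magic;
    uint8_t  version;
    uint8_t  flags;
    uint32_t id;
};

/* Hypothetical logged region: a pointer plus its recorded length. */
struct toy_log_region {
    const void *addr;
    size_t      len;
};

/* Return 1 if the region can safely be read as a toy_dqblk, else 0. */
int toy_dquot_region_ok(const struct toy_log_region *r)
{
    assert(r != NULL);
    if (r->addr == NULL)
        return 0;   /* nothing was logged */
    if (r->len < sizeof(struct toy_dqblk))
        return 0;   /* too small to hold the record */
    return 1;
}
```

Callers would skip or report the item when this returns 0 instead of passing it on to the consistency checker.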
Christoph Hellwig [Mon, 8 Jun 2009 13:32:59 +0000 (15:32 +0200)]
xfs: update max log size
Commit a6634fba3dec4a92f0a2c4e30c80b634c0576ad5 in xfsprogs increased the
maximum log size supported by mkfs. Merge the changes back into xfs_fs.h
so that growfs enforces the same limit and the headers are in sync.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Avi Kivity [Sat, 6 Jun 2009 09:34:39 +0000 (12:34 +0300)]
KVM: Explicitly initialize cpus_hardware_enabled
Under CONFIG_MAXSMP, cpus_hardware_enabled is allocated from the heap and
not statically initialized. This causes a crash on reboot when kvm thinks
vmx is enabled on random nonexistent cpus and accesses nonexistent percpu
lists.
Fix by explicitly clearing the variable.
Cc: stable@kernel.org
Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
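
The failure mode above is the usual difference between static storage, which starts out zeroed, and heap storage, which does not. A minimal user-space sketch, assuming a plain bitmap in place of the kernel's cpumask type; `TOY_NR_CPUS`, `toy_alloc_cleared_mask`, and `toy_cpu_isset` are hypothetical names and this is not the kernel's cpumask API:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NR_CPUS 4096  /* hypothetical; large, as a MAXSMP-style build would be */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_MASK_LONGS \
    ((TOY_NR_CPUS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/*
 * Allocate a CPU bitmap and clear it explicitly. malloc() alone leaves
 * the contents indeterminate, so stale bits could look like enabled CPUs.
 */
unsigned long *toy_alloc_cleared_mask(void)
{
    unsigned long *mask = malloc(TOY_MASK_LONGS * sizeof(*mask));

    if (mask)
        memset(mask, 0, TOY_MASK_LONGS * sizeof(*mask));
    return mask;
}

/* Test whether a given CPU's bit is set in the mask. */
int toy_cpu_isset(const unsigned long *mask, unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    return (int)((mask[cpu / TOY_BITS_PER_LONG] >>
                  (cpu % TOY_BITS_PER_LONG)) & 1UL);
}
```

With the explicit clear, a later walk over "enabled" CPUs visits none until bits are actually set.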
Alessandro Rubini [Sat, 6 Jun 2009 09:17:57 +0000 (10:17 +0100)]
[ARM] 5543/1: arm: serial amba: add missing declaration in serial.h
This header is sometimes included in the uncompress stage to get
register values, but no <linux/amba/bus.h> can be included there.
So declare "struct amba_device" here before using it in a prototype.
Signed-off-by: Alessandro Rubini <rubini@unipv.it>
Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Sergei Shtylyov [Sun, 7 Jun 2009 11:52:50 +0000 (13:52 +0200)]
pdc202xx_old: fix resetproc() method
pdc202xx_reset() calls pdc202xx_reset_host() twice, for both channels, while
that function actually twiddles the single, shared software reset bit -- the
net effect is a duplicated reset and horrendous 4 second delay happening not
only on a channel reset but also when dma_lost_irq() and dma_clear() methods
are called. Fold pdc202xx_reset_host() into pdc202xx_reset(), fix printk(),
and move it before the actual reset...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Sergei Shtylyov [Sun, 7 Jun 2009 11:52:50 +0000 (13:52 +0200)]
pdc202xx_old: fix 'pdc20246_dma_ops'
Commit ac95beedf8bc97b24f9540d4da9952f07221c023 (ide: add struct ide_port_ops
(take 2)) erroneously converted the driver's dma_timeout() and dma_lost_irq()
methods to call the driver's resetproc() method regardless of whether it was
defined for this specific controller while it hadn't been defined and hence
called for PDC20246. So the dma_clear() method, the successor of dma_timeout(),
shouldn't exist and the dma_lost_irq() method should be standard for PDC20246.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Linus Torvalds [Sat, 6 Jun 2009 21:33:54 +0000 (14:33 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/pci: fix mmconfig detection with 32bit near 4g
PCI: use fixed-up device class when configuring device
Hugh Dickins [Sat, 6 Jun 2009 20:18:09 +0000 (21:18 +0100)]
integrity: fix IMA inode leak
CONFIG_IMA=y inode activity leaks iint_cache and radix_tree_node objects
until the system runs out of memory. Nowhere is calling ima_inode_free()
a.k.a. ima_iint_delete(). Fix that by calling it from destroy_inode().
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 6 Jun 2009 19:18:14 +0000 (12:18 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
ext3/4 with synchronous writes gets wedged by Postfix
Fix nobh_truncate_page() to not pass stack garbage to get_block()
Linus Torvalds [Sat, 6 Jun 2009 19:17:03 +0000 (12:17 -0700)]
Merge branch 'upstream-linus2' of git://git./linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] pata_ali: Use IGN_SIMPLEX
Al Viro [Wed, 13 May 2009 18:13:40 +0000 (19:13 +0100)]
ext3/4 with synchronous writes gets wedged by Postfix
OK, that's probably the easiest way to do that, as much as I don't like it...
Since iget() et al. will not accept I_FREEING (will wait to go away
and restart), and since we'd better have serialization between new/free
on fs data structures anyway, we can afford simply skipping I_FREEING
et al. in insert_inode_locked().
We do that from new_inode, so it won't race with free_inode in any interesting
ways and it won't race with iget (of any origin; nfsd or in case of fs
corruption a lookup) since both still will wait for I_LOCK.
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Tested-by: David Watson <dbwatson@ukfsn.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Theodore Ts'o [Tue, 12 May 2009 11:37:56 +0000 (07:37 -0400)]
Fix nobh_truncate_page() to not pass stack garbage to get_block()
The nobh_truncate_page() function is used by ext2, exofs, and jfs. Of
these three, only ext2 and jfs's get_block() function pays attention
to bh->b_size --- which is normally always the filesystem blocksize
except when the get_block() function is called by either
mpage_readpage(), mpage_readpages(), or the direct I/O routines in
fs/direct-io.c.
Unfortunately, nobh_truncate_page() does not initialize map_bh before
calling the filesystem-supplied get_block() function. So ext2 and jfs
will try to calculate the number of blocks to map by taking stack
garbage and shifting it right by inode->i_blkbits. This should be
*mostly* harmless (except the filesystem will do some unneeded work)
unless the stack garbage is less than filesystem's blocksize, in which
case maxblocks will be zero, and the attempt to find out whether or
not the filesystem has a hole at a given logical block will fail, and
the page cache entry might not get zero'ed out.
Also if the stack garbage in map_bh->state happens to have the
BH_Mapped bit set, there could be an attempt to call readpage() on a
non-existent page, which could cause nobh_truncate_page() to return an
error when it should not.
Fix this by initializing map_bh->state and map_bh->size.
Fortunately, it's probably fairly unlikely that ext2 and jfs users
mount with nobh these days.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
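
The block-count arithmetic described above can be illustrated with a small sketch. It assumes a cut-down structure with only the two fields mentioned in the message; `toy_bh`, `toy_max_blocks`, and `toy_init_map_bh` are hypothetical names, not kernel functions. A size smaller than one block shifts down to zero blocks, which is the case the message warns about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down buffer head with only the two fields discussed. */
struct toy_bh {
    unsigned long b_state;
    size_t        b_size;
};

/* How many blocks a get_block()-style callback would be asked to map. */
unsigned long toy_max_blocks(const struct toy_bh *bh, unsigned int blkbits)
{
    assert(bh != NULL);
    return (unsigned long)(bh->b_size >> blkbits);
}

/* Initialize as the fix describes: no state bits, a size of one block. */
void toy_init_map_bh(struct toy_bh *bh, unsigned int blkbits)
{
    bh->b_state = 0;
    bh->b_size = (size_t)1 << blkbits;
}
```

For a 4096-byte block size (blkbits of 12), an uninitialized size of 100 yields zero blocks, while the initialized structure yields exactly one.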
Alan Cox [Wed, 13 May 2009 14:02:27 +0000 (15:02 +0100)]
[libata] pata_ali: Use IGN_SIMPLEX
Some ALi devices report simplex if they have been disabled and re-enabled, and
restoring the byte does not work. Ignore it - the needed supporting logic is
already present for the SATA ULi ports.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Linus Torvalds [Fri, 5 Jun 2009 18:54:28 +0000 (11:54 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: Fix oops and use after free during space balancing
Btrfs: set device->total_disk_bytes when adding new device
Kevin Hilman [Fri, 5 Jun 2009 17:48:08 +0000 (18:48 +0100)]
mtd: davinci nand: update clock naming
DaVinci clock support has been updated in mainline.
Update clock names accordingly.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 5 Jun 2009 18:53:44 +0000 (11:53 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ata_piix: Add HP Compaq nc6000 to the broken poweroff list
ahci: add warning messages for hp laptops with broken suspend
pata_efar: fix PIO2 underclocking
pata_legacy: wait for async probing
Ville Syrjala [Mon, 18 May 2009 22:37:44 +0000 (01:37 +0300)]
ata_piix: Add HP Compaq nc6000 to the broken poweroff list
HP Compaq nc6000 suffers from the double disk spindown issue.
Add it to the broken poweroff DMI list.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Sat, 30 May 2009 11:50:12 +0000 (20:50 +0900)]
ahci: add warning messages for hp laptops with broken suspend
Harddisks on HP dv[4-6] and HDX18 fail to come online after resume on
earlier BIOSen. Fortunately, HP recently released BIOS updates for
all machines to fix the issue. Detect old BIOSen, warn the user to
update BIOS on boot and suspend attempts and fail suspend.
Kudos to all the bug reporters.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kernel.org@epperson.homelinux.net
Cc: emisca@gmail.com
Cc: Gadi Cohen <dragon@wastelands.net>
Cc: Paul Swanson <paul@procursa.com>
Cc: s@ourada.org
Cc: Trevor Davenport <trevor.davenport@gmail.com>
Cc: corruptor1972 <steven_tierney@yahoo.co.uk>
Cc: Victoria Wilson <mail@vwilson.co.uk>
Cc: khiraly <khiraly.list@gmail.com>
Cc: Sean <wollombi@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Sergei Shtylyov [Mon, 1 Jun 2009 19:42:10 +0000 (22:42 +0300)]
pata_efar: fix PIO2 underclocking
Fix the PIO mode 2 using mode 0 timings -- this driver should enable the
fast timing bank starting with PIO2, just like the PIIX/ICH drivers do.
Also, fix/rephrase some comments while at it.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
James Bottomley [Fri, 5 Jun 2009 14:41:39 +0000 (10:41 -0400)]
pata_legacy: wait for async probing
The basic problem here is that pata_legacy attaches the host, sees if it found
any devices and detaches it if none were found. With async probing, it's not
waiting until discovery is finished before deciding it has no devices and
trying the detach leading to this warning:
ata1: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14
------------[ cut here ]------------
WARNING: at drivers/ata/libata-core.c:6222 ata_host_detach+0x75/0x90()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.30-rc7 #1
Call Trace:
[<c01fbb05>] ? ata_host_detach+0x75/0x90
[<c01fbb05>] ? ata_host_detach+0x75/0x90
[<c01139b5>] ? warn_slowpath_common+0x45/0x80
[<c01139fa>] ? warn_slowpath_null+0xa/0x10
[<c01fbb05>] ? ata_host_detach+0x75/0x90
[<c02f40e0>] ? legacy_init+0x44e/0x87f
[<c02f3c92>] ? legacy_init+0x0/0x87f
[<c0101021>] ? _stext+0x21/0x140
[<c01890ff>] ? proc_register+0x2f/0x190
[<c018938c>] ? create_proc_entry+0x5c/0xc0
[<c0135ebe>] ? register_irq_proc+0x6e/0x90
[<c02e6484>] ? kernel_init+0x6e/0xbf
[<c02e6416>] ? kernel_init+0x0/0xbf
[<c01031d7>] ? kernel_thread_helper+0x7/0x10
---[ end trace ef1ee36e873ae3a0 ]---
Because it detaches before the probe is complete.
One way to fix it would be to put an async_synchronize_full() before looking
for devices, which this patch does. A better way might be to separate libata
into its own domain and only wait for that.
Reported-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Linus Torvalds [Fri, 5 Jun 2009 17:46:48 +0000 (10:46 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] powernow-k8: check space_id of _PCT registers to be FFH
Dave Jones [Fri, 5 Jun 2009 16:37:07 +0000 (12:37 -0400)]
[CPUFREQ] powernow-k8: check space_id of _PCT registers to be FFH
The powernow-k8 driver checks to see that the Performance Control/Status
Registers are declared as FFH (functional fixed hardware) by the BIOS.
However, this check got broken in the commit:
0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f
[CPUFREQ] checkpatch cleanups for powernow-k8
Fix based on an original patch from Naga Chumbalkar.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Linus Torvalds [Fri, 5 Jun 2009 17:21:52 +0000 (10:21 -0700)]
Revert "drm: don't associate _DRM_DRIVER maps with a master"
This reverts commit 6c51d1cfa0a370b48a157163340190cf5fd2346b, which
apparently causes DRI initialization failures on Radeons.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Requested-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 5 Jun 2009 10:56:18 +0000 (11:56 +0100)]
ivtv: Fix PCI DMA direction
The ivtv stream buffers may be for receive or for send but the attached
sg handle is always destined cpu->device. We flush it correctly but the
allocation is wrongly done with the same type as the buffers.
See bug: http://bugzilla.kernel.org/show_bug.cgi?id=13385
(Note this doesn't close the bug - it fixes the ivtv part and in turn
the logging next shows up some rather alarming DMA sg list warnings in
libata)
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Thu, 4 Jun 2009 23:29:09 +0000 (16:29 -0700)]
ptrace: revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic"
Commit 95a3540da9c81a5987be810e1d9a83640a366bd5 ("ptrace_detach: the wrong
wakeup breaks the ERESTARTxxx logic") removed the "extra"
wake_up_process() from ptrace_detach(), but as Jan pointed out this breaks
the compatibility.
I believe the changelog is right and this wake_up() is wrong in many
ways, but GDB assumes that ptrace(PTRACE_DETACH, child, 0, 0) always
wakes up the tracee.
Despite the fact this breaks SIGNAL_STOP_STOPPED/group_stop_count logic,
and despite the fact this wake_up_process() can break another
assumption: PTRACE_DETACH with SIGSTOP should leave the tracee in
TASK_STOPPED case. Because the untraced child can dequeue SIGSTOP and
call do_signal_stop() before ptrace_detach() calls wake_up_process().
Revert this change for now. We need some fixes even if we want to keep
the current behaviour, but these fixes are not for 2.6.30.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Thu, 4 Jun 2009 23:29:08 +0000 (16:29 -0700)]
kbuild: fix detection of CONFIG_FRAME_WARN=0
The checking of CONFIG_FRAME_WARN in the top level Makefile forgot to
actually dereference the variable thus leading to an always true check.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Thu, 4 Jun 2009 23:29:07 +0000 (16:29 -0700)]
ptrace: tracehook_report_clone: fix false positives
The "trace || CLONE_PTRACE" check in tracehook_report_clone() is not right:
- If the untraced task does clone(CLONE_PTRACE) the new child is not traced,
we must not queue SIGSTOP.
- If we forked the traced task, but the tracer exits and untraces both the
forking task and the new child (after copy_process() drops tasklist_lock),
we should not queue SIGSTOP too.
Change the code to check task_ptrace() != 0 instead. This is still racy, but
the race is harmless.
We can race with another tracer attaching to this child, or the tracer can
exit and detach in parallel. But given that we didn't do wake_up_new_task()
yet, the child must have the pending SIGSTOP anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 4 Jun 2009 22:23:51 +0000 (15:23 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: Remove a bad BUG_ON in the fence management code.
Linus Torvalds [Thu, 4 Jun 2009 22:23:39 +0000 (15:23 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: ignore EDID with really tiny modes.
drm: don't associate _DRM_DRIVER maps with a master
drm/i915: intel_lvds.c fix section mismatch
drm: Hook up DPMS property handling in drm_crtc.c. Add drm_helper_connector_dpms.
drm: set permissions on edid file to 0444
drm: add newlines to text sysfs files
drm/radeon: fix ring free alignment calculations
drm: fix irq naming for kms drivers.
Salman Qazi [Thu, 4 Jun 2009 22:20:39 +0000 (15:20 -0700)]
drivers/char/mem.c: avoid OOM lockup during large reads from /dev/zero
While running 20 parallel instances of dd as follows:
    #!/bin/bash
    for i in `seq 1 20`; do
        dd if=/dev/zero of=/export/hda3/dd_$i bs=1073741824 count=1 &
    done
    wait
on a 16G machine, we noticed that rather than just killing the processes,
the entire kernel went down. Stracing dd reveals that it first does an
mmap2, which makes 1GB worth of zero page mappings. Then it performs a
read on those pages from /dev/zero, and finally it performs a write.
The machine died during the reads. Looking at the code, it was noticed
that /dev/zero's read operation had been changed by
557ed1fa2620dc119adb86b34c614e152a629a80 ("remove ZERO_PAGE") from giving
zero page mappings to actually zeroing the page.
The zeroing of the pages causes physical pages to be allocated to the
process. But, when the process exhausts all the memory that it can, the
kernel cannot kill it, as it is still in the kernel mode allocating more
memory. Consequently, the kernel eventually crashes.
To fix this, I propose that when a fatal signal is pending during
/dev/zero read operation, we simply return and let the user process die.
Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Modified error return and comment trivially. - Linus]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
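
The proposed behaviour can be sketched in user space as a chunked zeroing loop that polls for a pending fatal signal. This is an illustration under assumptions: the predicate is passed in as a callback, the chunk size is arbitrary, and the return convention (bytes done so far, or -EINTR if nothing was done) is a guess for the sketch rather than what the commit finally used. `toy_read_zero` and `toy_pending_after` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_CHUNK 4096  /* hypothetical chunk size, one page on many systems */

/* Hypothetical predicate: returns nonzero once a fatal signal is pending. */
typedef int (*toy_fatal_pending_fn)(void *ctx);

/* Example predicate: reports "pending" once *ctx polls have been used up. */
int toy_pending_after(void *ctx)
{
    int *polls_left = ctx;

    if (*polls_left == 0)
        return 1;
    (*polls_left)--;
    return 0;
}

/*
 * Zero 'count' bytes of 'buf' one chunk at a time, checking for a pending
 * fatal signal before each chunk. Returns the number of bytes zeroed, or
 * -EINTR if interrupted before any byte was written.
 */
long toy_read_zero(char *buf, size_t count,
                   toy_fatal_pending_fn pending, void *ctx)
{
    size_t written = 0;

    assert(buf != NULL && pending != NULL);
    while (written < count) {
        size_t chunk = count - written;

        if (chunk > TOY_CHUNK)
            chunk = TOY_CHUNK;
        if (pending(ctx))
            return written ? (long)written : -EINTR;
        memset(buf + written, 0, chunk);
        written += chunk;
    }
    return (long)written;
}
```

The early return is what lets a process that is being killed stop consuming memory instead of finishing a 1GB read first.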
Chris Mason [Thu, 4 Jun 2009 19:34:51 +0000 (15:34 -0400)]
Btrfs: Fix oops and use after free during space balancing
The btrfs allocator uses list_for_each to walk the available block
groups when searching for free blocks. It starts off with a hint
to help find the best block group for a given allocation.
The hint is resolved into a block group, but we don't properly check
to make sure the block group we find isn't in the middle of being
freed due to filesystem shrinking or balancing. If it is being
freed, the list pointers in it are bogus and can't be trusted. But,
the code happily goes along and uses them in the list_for_each loop,
leading to all kinds of fun.
The fix used here is to check to make sure the block group we find really
is on the list before we use it. list_del_init is used when removing
it from the list, so we can do a proper check.
The allocation clustering code has a similar bug where it will trust
the block group in the current free space cluster. If our allocation
flags have changed (going from single spindle dup to raid1 for example)
because the drives in the FS have changed, we're not allowed to use
the old block group any more.
The fix used here is to check the current cluster against the
current allocation flags.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
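
The "is it really on the list" check relies on a property of list_del_init: the removed node is re-pointed at itself, so an emptiness test on the node tells whether it is still linked. A minimal user-space sketch modelled on the kernel's list_head; the `toy_` names, including `toy_block_group` and `toy_hint_usable`, are hypothetical and not the Btrfs code:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct toy_list {
    struct toy_list *next, *prev;
};

void toy_list_init(struct toy_list *h)
{
    h->next = h;
    h->prev = h;
}

int toy_list_empty(const struct toy_list *h)
{
    return h->next == h;
}

void toy_list_add_tail(struct toy_list *item, struct toy_list *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Unlink the node and point it back at itself, so it tests as unlinked. */
void toy_list_del_init(struct toy_list *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    toy_list_init(item);
}

/* Hypothetical block group carrying a list linkage. */
struct toy_block_group {
    struct toy_list link;
    int id;
};

/* A hinted group may start a list walk only while it is still linked. */
int toy_hint_usable(const struct toy_block_group *bg)
{
    assert(bg != NULL);
    return !toy_list_empty(&bg->link);
}
```

A plain delete that leaves stale or poisoned pointers would not support this test, which is why the message stresses that list_del_init is used on removal.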
Rusty Russell [Wed, 3 Jun 2009 05:22:24 +0000 (14:52 +0930)]
lguest: fix 'unhandled trap 13' with CONFIG_CC_STACKPROTECTOR
We don't set up the canary; let's disable stack protector on boot.c so
we can get into lguest_init, then set it up. As a side effect,
switch_to_new_gdt() sets up %fs for us properly too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Thu, 4 Jun 2009 16:02:58 +0000 (17:02 +0100)]
Merge branch 'fix' of git://git./linux/kernel/git/ycmiao/pxa-linux-2.6
Yan Zheng [Thu, 4 Jun 2009 13:23:50 +0000 (09:23 -0400)]
Btrfs: set device->total_disk_bytes when adding new device
It was not being properly initialized, and so the size saved to
disk was not correct.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Eric Anholt [Thu, 4 Jun 2009 11:18:14 +0000 (11:18 +0000)]
drm/i915: Remove a bad BUG_ON in the fence management code.
This could be triggered by a gtt mapping fault on 965 that decides to
remove the fence from another object that happens to be active currently.
Since the other object doesn't rely on the fence reg for its execution, we
don't wait for it to finish. We'll soon be not waiting on 915 most of the
time as well, so just drop the BUG_ON.
Signed-off-by: Eric Anholt <eric@anholt.net>
Russell King [Thu, 4 Jun 2009 11:27:18 +0000 (12:27 +0100)]
Merge branch 'for-rmk' of git://git.pengutronix.de/git/imx/linux-2.6
Yinghai Lu [Wed, 3 Jun 2009 07:13:13 +0000 (00:13 -0700)]
x86/pci: fix mmconfig detection with 32bit near 4g
Pascal reported and bisected a commit:
| x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case
which broke one system.
ACPI: Using IOAPIC for interrupt routing
PCI: MCFG configuration 0: base f0000000 segment 0 buses 0 - 255
PCI: MCFG area at f0000000 reserved in ACPI motherboard resources
PCI: Using MMCONFIG for extended config space
it didn't have
PCI: updated MCFG configuration 0: base f0000000 segment 0 buses 0 - 63
anymore, and tries to use 0xf0000000 - 0xffffffff for mmconfig.
For 32bit, mcfg_res->end could be 32bit only (if 64bit resources aren't used)
So use end - 1 to pass the value in mcfg->end to avoid overflow.
We don't need to worry about the e820 path, they are always 64 bit.
Reported-by: Pascal Terjan <pterjan@mandriva.com>
Bisected-by: Pascal Terjan <pterjan@mandriva.com>
Tested-by: Pascal Terjan <pterjan@mandriva.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
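
The overflow above is plain 32-bit arithmetic. With 1 MiB of config space per bus, 256 buses starting at 0xf0000000 end exactly at the 4 GiB boundary, so an exclusive end address wraps to 0 while an inclusive one (end - 1) stays representable. A small sketch with hypothetical helper names, not the kernel's resource code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Exclusive end: one past the last byte. In 32-bit arithmetic this wraps
 * to 0 when the region runs right up to the 4 GiB boundary.
 */
uint32_t toy_end_exclusive(uint32_t start, uint32_t size)
{
    return start + size;
}

/* Inclusive end: address of the last byte, which always fits in 32 bits. */
uint32_t toy_end_inclusive(uint32_t start, uint32_t size)
{
    assert(size > 0);
    return start + size - 1;
}
```

For start 0xf0000000 and size 0x10000000 the exclusive end is 0 and the inclusive end is 0xffffffff; for the 64-bus case (size 0x04000000) the inclusive end is 0xf3ffffff.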
Yu Zhao [Wed, 27 May 2009 16:25:05 +0000 (00:25 +0800)]
PCI: use fixed-up device class when configuring device
The device class may be changed after the fixup, so re-read the class
value from pci_dev when configuring the device. Otherwise some devices
such as JMicron SATA controller won't work.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>