Theodore Ts'o [Wed, 30 Sep 2009 04:32:42 +0000 (00:32 -0400)]
ext4: Use tracepoints for mb_history trace file
The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.
By ripping out the mb_history code and replacing it with ftrace
tracepoints, and we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 29 Sep 2009 19:51:30 +0000 (15:51 -0400)]
ext4, jbd2: Drop unneeded printks at mount and unmount time
There are a number of kernel printk's which are printed when an ext4
filesystem is mounted and unmounted. Disable them to economize space
in the system logs. In addition, disabling the mballoc stats by
default saves a number of unneeded atomic operations for every block
allocation or deallocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Curt Wohlgemuth [Tue, 29 Sep 2009 15:01:03 +0000 (11:01 -0400)]
ext4: Handle nested ext4_journal_start/stop calls without a journal
This patch fixes a problem with handling nested calls to
ext4_journal_start/ext4_journal_stop, when there is no journal present.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Curt Wohlgemuth [Tue, 29 Sep 2009 20:06:01 +0000 (16:06 -0400)]
ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
This patch a problem that ext4_dirty_inode() was not calling
ext4_mark_inode_dirty() if the current_handle is not valid, which it
is the case in no journal mode.
It also removes a test for non-matching transaction which can never
happen.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Frank Mayhar [Tue, 29 Sep 2009 14:07:47 +0000 (10:07 -0400)]
ext4: Avoid updating the inode table bh twice in no journal mode
This is a cleanup of commit 91ac6f4. Since ext4_mark_inode_dirty()
has already called ext4_mark_iloc_dirty(), which in turn calls
ext4_do_update_inode(), it's not necessary to have ext4_write_inode()
call ext4_do_update_inode() in no journal mode. Indeed, it would be
duplicated work.
Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 28 Sep 2009 19:58:29 +0000 (15:58 -0400)]
ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
Move the check to make sure the original and donor inodes are
different earlier, to avoid a potential deadlock by trying to lock the
same inode twice.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mingming Cao [Mon, 28 Sep 2009 19:48:29 +0000 (15:48 -0400)]
ext4: async direct IO for holes and fallocate support
For async direct IO that covers holes or fallocate, the end_io
callback function now queued the convertion work on workqueue but
don't flush the work rightaway as it might take too long to afford.
But when fsync is called after all the data is completed, user expects
the metadata also being updated before fsync returns.
Thus we need to flush the conversion work when fsync() is called.
This patch keep track of a listed of completed async direct io that
has a work queued on workqueue. When fsync() is called, it will go
through the list and do the conversion.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Mingming Cao [Mon, 28 Sep 2009 19:48:41 +0000 (15:48 -0400)]
ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
Currently the DIO VFS code passes create = 0 when writing to the
middle of file. It does this to avoid block allocation for holes, so
as not to expose stale data out when there is a parallel buffered read
(which does not hold the i_mutex lock). Direct I/O writes into holes
falls back to buffered IO for this reason.
Since preallocated extents are treated as holes when doing a
get_block() look up (buffer is not mapped), direct IO over fallocate
also falls back to buffered IO. Thus ext4 actually silently falls
back to buffered IO in above two cases, which is undesirable.
To fix this, this patch creates unitialized extents when a direct I/O
write into holes in sparse files, and registering an end_io callback which
converts the uninitialized extent to an initialized extent after the
I/O is completed.
Singed-Off-By: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mingming Cao [Mon, 28 Sep 2009 19:49:08 +0000 (15:49 -0400)]
ext4: Split uninitialized extents for direct I/O
When writing into an unitialized extent via direct I/O, and the direct
I/O doesn't exactly cover the unitialized extent, split the extent
into uninitialized and initialized extents before submitting the I/O.
This avoids needing to deal with an ENOSPC error in the end_io
callback that gets used for direct I/O.
When the IO is complete, the written extent will be marked as initialized.
Singed-Off-By: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mingming Cao [Mon, 28 Sep 2009 19:49:52 +0000 (15:49 -0400)]
ext4: release reserved quota when block reservation for delalloc retry
ext4_da_reserve_space() can reserve quota blocks multiple times if
ext4_claim_free_blocks() fail and we retry the allocation. We should
release the quota reservation before restarting.
Bug found by Jan Kara.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 29 Sep 2009 17:31:31 +0000 (13:31 -0400)]
ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
Work around problems in the writeback code to force out writebacks in
larger chunks than just 4mb, which is just too small. This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time. So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks in one inode before allowing the writeback code to move on to
another inode. We add a a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128mb per
inode.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 28 Sep 2009 04:06:20 +0000 (00:06 -0400)]
ext4: Fix hueristic which avoids group preallocation for closed files
The hueristic was designed to avoid using locality group preallocation
when writing the last segment of a closed file. Fix it by move
setting size to the maximum of size and isize until after we check
whether size == isize.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 26 Sep 2009 21:43:59 +0000 (17:43 -0400)]
ext4: Use ext4_msg() for ext4_da_writepage() errors
This allows the user to see what filesystem was involved with a
particular ext4_da_writepage() error. Also, use KERN_CRIT which is
more appropriate than KERN_EMERG.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Tue, 29 Sep 2009 19:59:34 +0000 (15:59 -0400)]
ext4: Update documentation about quota mount options
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Linus Torvalds [Fri, 25 Sep 2009 16:27:30 +0000 (09:27 -0700)]
Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
writeback: writeback_inodes_sb() should use bdi_start_writeback()
writeback: don't delay inodes redirtied by a fast dirtier
writeback: make the super_block pinning more efficient
writeback: don't resort for a single super_block in move_expired_inodes()
writeback: move inodes from one super_block together
writeback: get rid to incorrect references to pdflush in comments
writeback: improve readability of the wb_writeback() continue/break logic
writeback: cleanup writeback_single_inode()
writeback: kupdate writeback shall not stop when more io is possible
writeback: stop background writeback when below background threshold
writeback: balance_dirty_pages() shall write more than dirtied pages
fs: Fix busyloop in wb_writeback()
Jens Axboe [Fri, 25 Sep 2009 15:15:03 +0000 (17:15 +0200)]
writeback: writeback_inodes_sb() should use bdi_start_writeback()
Pointless to iterate other devices looking for a super, when
we have a bdi mapping.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Wu Fengguang [Fri, 25 Sep 2009 04:04:10 +0000 (06:04 +0200)]
writeback: don't delay inodes redirtied by a fast dirtier
Debug traces show that in per-bdi writeback, the inode under writeback
almost always get redirtied by a busy dirtier. We used to call
redirty_tail() in this case, which could delay inode for up to 30s.
This is unacceptable because it now happens so frequently for plain cp/dd,
that the accumulated delays could make writeback of big files very slow.
So let's distinguish between data redirty and metadata only redirty.
The first one is caused by a busy dirtier, while the latter one could
happen in XFS, NFS, etc. when they are doing delalloc or updating isize.
The inode being busy dirtied will now be requeued for next io, while
the inode being redirtied by fs will continue to be delayed to avoid
repeated IO.
CC: Jan Kara <jack@suse.cz>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Dave Chinner <david@fromorbit.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Thu, 24 Sep 2009 13:25:11 +0000 (15:25 +0200)]
writeback: make the super_block pinning more efficient
Currently we pin the inode->i_sb for every single inode. This
increases cache traffic on sb->s_umount sem. Lets instead
cache the inode sb pin state and keep the super_block pinned
for as long as keep writing out inodes from the same
super_block.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Thu, 24 Sep 2009 13:12:57 +0000 (15:12 +0200)]
writeback: don't resort for a single super_block in move_expired_inodes()
If we only moved inodes from a single super_block to the temporary
list, there's no point in doing a resort for multiple super_blocks.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Shaohua Li [Thu, 24 Sep 2009 12:42:33 +0000 (14:42 +0200)]
writeback: move inodes from one super_block together
__mark_inode_dirty adds inode to wb dirty list in random order. If a disk has
several partitions, writeback might keep spindle moving between partitions.
To reduce the move, better write big chunk of one partition and then move to
another. Inodes from one fs usually are in one partion, so idealy move indoes
from one fs together should reduce spindle move. This patch tries to address
this. Before per-bdi writeback is added, the behavior is write indoes
from one fs first and then another, so the patch restores previous behavior.
The loop in the patch is a bit ugly, should we add a dirty list for each
superblock in bdi_writeback?
Test in a two partition disk with attached fio script shows about 3% ~ 6%
improvement.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 23 Sep 2009 17:37:09 +0000 (19:37 +0200)]
writeback: get rid to incorrect references to pdflush in comments
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 23 Sep 2009 17:32:26 +0000 (19:32 +0200)]
writeback: improve readability of the wb_writeback() continue/break logic
And throw some comments in there, too.
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Wu Fengguang [Wed, 23 Sep 2009 12:33:42 +0000 (20:33 +0800)]
writeback: cleanup writeback_single_inode()
Make the if-else straight in writeback_single_inode().
No behavior change.
Cc: Jan Kara <jack@suse.cz>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Wu Fengguang [Wed, 23 Sep 2009 12:33:41 +0000 (20:33 +0800)]
writeback: kupdate writeback shall not stop when more io is possible
Fix the kupdate case, which disregards wbc.more_io and stop writeback
prematurely even when there are more inodes to be synced.
wbc.more_io should always be respected.
Also remove the pages_skipped check. It will set when some page(s) of some
inode(s) cannot be written for now. Such inodes will be delayed for a while.
This variable has nothing to do with whether there are other writeable inodes.
CC: Jan Kara <jack@suse.cz>
CC: Dave Chinner <david@fromorbit.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Wu Fengguang [Wed, 23 Sep 2009 12:33:40 +0000 (20:33 +0800)]
writeback: stop background writeback when below background threshold
Treat bdi_start_writeback(0) as a special request to do background write,
and stop such work when we are below the background dirty threshold.
Also simplify the (nr_pages <= 0) checks. Since we already pass in
nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't
need to worry about it being decreased to zero.
Reported-by: Richard Kennedy <richard@rsk.demon.co.uk>
CC: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Wu Fengguang [Wed, 23 Sep 2009 13:56:00 +0000 (21:56 +0800)]
writeback: balance_dirty_pages() shall write more than dirtied pages
Some filesystem may choose to write much more than ratelimit_pages
before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
determine number to write based on real number of dirtied pages.
Otherwise it is possible that
loop {
btrfs_file_write(): dirty 1024 pages
balance_dirty_pages(): write up to 48 pages (= ratelimit_pages * 1.5)
}
in which the writeback rate cannot keep up with dirty rate, and the
dirty pages go all the way beyond dirty_thresh.
The increased write_chunk may make the dirtier more bumpy.
So filesystems shall be take care not to dirty too much at
a time (eg. > 4MB) without checking the ratelimit.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jan Kara [Wed, 16 Sep 2009 17:22:48 +0000 (19:22 +0200)]
fs: Fix busyloop in wb_writeback()
If all inodes are under writeback (e.g. in case when there's only one inode
with dirty pages), wb_writeback() with WB_SYNC_NONE work basically degrades
to busylooping until I_SYNC flags of the inode is cleared. Fix the problem by
waiting on I_SYNC flags of an inode on b_more_io list in case we failed to
write anything.
Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Linus Torvalds [Fri, 25 Sep 2009 14:44:14 +0000 (07:44 -0700)]
Merge git://git./linux/kernel/git/lethal/sh-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh_mobile_ceu_camera: fix compile breakage, caused by a bad merge
sh: Add support DMA Engine to SH7780
sh: Add support DMA Engine to SH7722
sh: enable onenand support in kfr2r09 defconfig.
sh: update defconfigs.
sh: add FSI driver support for ms7724se
sh: Fix up uninitialized variable use caught by gcc 4.4.
sh: Handle unaligned 16-bit instructions on SH-2A.
sh: mach-ecovec24: Add active low setting for sh_eth
sh: includecheck fix: dwarf.c
Linus Torvalds [Fri, 25 Sep 2009 14:24:42 +0000 (07:24 -0700)]
Merge git://git./linux/kernel/git/wim/linux-2.6-watchdog
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] Add support for the Avionic Design Xanthos watchdog timer.
Linus Torvalds [Fri, 25 Sep 2009 14:22:11 +0000 (07:22 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (94 commits)
genetlink: fix netns vs. netlink table locking (2)
3c59x: Get rid of "Trying to free already-free IRQ"
tunnel: eliminate recursion field
ems_pci: fix size of CAN controllers BAR mapping for CPC-PCI v2
net: fix htmldocs sunrpc, clnt.c
Phonet: error on broadcast sending (unimplemented)
Phonet: fix race for port number in concurrent bind()
pktgen: better scheduler friendliness
pktgen: T_TERMINATE flag is unused
ipv4: check optlen for IP_MULTICAST_IF option
ath9k: Initialize txgain and rxgain for newer AR9287 chipsets.
iwlagn: fix panic in iwl{5000,4965}_rx_reply_tx
ath9k: Fix RFKILL bugs
drivers/net/wireless: Use usb_endpoint_dir_out
cfg80211: don't overwrite privacy setting
wl12xx: fix kconfig/link errors
rt2x00: fix the definition of rt2x00crypto_rx_insert_iv
iwlwifi: reduce noise when skb allocation fails
iwlwifi: do not send sync command while holding spinlock
mac80211: fix DTIM setting
...
Thierry Reding [Wed, 23 Sep 2009 11:49:52 +0000 (13:49 +0200)]
[WATCHDOG] Add support for the Avionic Design Xanthos watchdog timer.
This patch adds support for the watchdog timer on Avionic Design Xanthos
boards.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Guennadi Liakhovetski [Fri, 25 Sep 2009 04:36:51 +0000 (13:36 +0900)]
sh_mobile_ceu_camera: fix compile breakage, caused by a bad merge
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Nobuhiro Iwamatsu [Thu, 12 Mar 2009 07:31:45 +0000 (07:31 +0000)]
sh: Add support DMA Engine to SH7780
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Nobuhiro Iwamatsu [Thu, 12 Mar 2009 07:31:41 +0000 (07:31 +0000)]
sh: Add support DMA Engine to SH7722
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Paul Mundt [Fri, 25 Sep 2009 03:15:15 +0000 (12:15 +0900)]
Merge branch 'master' of git://git./linux/kernel/git/torvalds/linux-2.6
Paul Mundt [Fri, 25 Sep 2009 02:55:07 +0000 (11:55 +0900)]
sh: enable onenand support in kfr2r09 defconfig.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Paul Mundt [Fri, 25 Sep 2009 02:53:02 +0000 (11:53 +0900)]
sh: update defconfigs.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Kuninori Morimoto [Fri, 21 Aug 2009 01:24:54 +0000 (01:24 +0000)]
sh: add FSI driver support for ms7724se
Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Linus Torvalds [Fri, 25 Sep 2009 00:25:09 +0000 (17:25 -0700)]
Merge branch 'for-linus' of git://linux-m32r.org/git/takata/linux-2.6_dev
* 'for-linus' of git://www.linux-m32r.org/git/takata/linux-2.6_dev:
m32r: Cleanup linker script using new linker script macros.
m32r: Move the spi_stack_top and spu_stack_top into .init.data section.
m32r: Remove unused .altinstructions and .exit.* code from linker script.
m32r: Move GET_THREAD_INFO definition out of asm/thread_info.h.
m32r: Define THREAD_SIZE only once.
m32r: make PAGE_SIZE available to assembly.
Linus Torvalds [Fri, 25 Sep 2009 00:22:31 +0000 (17:22 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
Fix build of cpm_uart due to core changes
powerpc/8xx: Fix regression introduced by cache coherency rewrite
powerpc/4xx: Fix erroneous xmon warning on PowerPC 4xx
powerpc/mm: Fix 40x and 8xx vs. _PAGE_SPECIAL
powerpc: Cleanup linker script using new linker script macros.
powerpc: Fix ibm,client-architecture-support printout
powerpc: Increase NODES_SHIFT on 64bit from 4 to 8
powerpc/perf_counter: Fix vdso detection
powerpc: Move 64bit heap above 1TB on machines with 1TB segments
powerpc: Change archdata dma_data to a union
powerpc: Rename get_dma_direct_offset get_dma_offset
powerpc/mm: Remove duplicated #include
powerpc/book3e-64: Remove duplicated #include
powerpc: Check for unsupported relocs when using CONFIG_RELOCATABLE
powerpc/pmc: Don't access lppaca on Book3E
powerpc: kmalloc failure ignored in vio_build_iommu_table()
hvc_console: Provide (un)locked version for hvc_resize()
David Howells [Thu, 24 Sep 2009 11:33:48 +0000 (12:33 +0100)]
NOMMU: Ignore mmap() address param as it is a hint
Ignore the address parameter given to NOMMU mmap() as it is a hint, rather
than giving an error if it's non-zero. MAP_FIXED still gets an error.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 24 Sep 2009 11:33:32 +0000 (12:33 +0100)]
NOMMU: Fallback for is_vmalloc_or_module_addr() should be inline
The NOMMU fallback for is_vmalloc_or_module_addr() should be static inline,
not just static, in linux/mm.h.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 24 Sep 2009 14:13:10 +0000 (15:13 +0100)]
NOMMU: Fix MAP_PRIVATE mmap() of objects where the data can be mapped directly
Fix MAP_PRIVATE mmap() of files and devices where the data in the backing store
might be mapped directly. Use the BDI_CAP_MAP_DIRECT capability flag to govern
whether or not we should be trying to map a file directly. This can be used to
determine whether or not a region has been filled in at the point where we call
do_mmap_shared() or do_mmap_private().
The BDI_CAP_MAP_DIRECT capability flag is cleared by validate_mmap_request() if
there's any reason we can't use it. It's also cleared in do_mmap_pgoff() if
f_op->get_unmapped_area() fails.
Without this fix, attempting to run a program from a RomFS image on a
non-mappable MTD partition results in a BUG as the kernel attempts XIP, and
this can be caught in gdb:
Program received signal SIGABRT, Aborted.
0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547
(gdb) bt
#0 0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547
#1 0xc005f168 in do_mmap_pgoff (file=0xc31a6620, addr=<value optimized out>, len=3808, prot=3, flags=6146, pgoff=0) at mm/nommu.c:1373
#2 0xc00a96b8 in elf_fdpic_map_file (params=0xc33fbbec, file=0xc31a6620, mm=0xc31bef60, what=0xc0213144 "executable") at mm.h:1145
#3 0xc00aa8b4 in load_elf_fdpic_binary (bprm=0xc316cb00, regs=<value optimized out>) at fs/binfmt_elf_fdpic.c:343
#4 0xc006b588 in search_binary_handler (bprm=0x6, regs=0xc33fbce0) at fs/exec.c:1234
#5 0xc006c648 in do_execve (filename=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460, regs=0xc33fbce0) at fs/exec.c:1356
#6 0xc0008cf0 in sys_execve (name=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460) at arch/frv/kernel/process.c:263
#7 0xc00075dc in __syscall_call () at arch/frv/kernel/entry.S:897
Note that this fix does the following commit differently:
commit
a190887b58c32d19c2eee007c5eb8faa970a69ba
Author: David Howells <dhowells@redhat.com>
Date: Sat Sep 5 11:17:07 2009 -0700
nommu: fix error handling in do_mmap_pgoff()
Reported-by: Graff Yang <graff.yang@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 24 Sep 2009 14:12:00 +0000 (15:12 +0100)]
FRV: Flash mappings for the MB93090-MB00 motherboard
Flash mappings for the MB93090-MB00 evaluation motherboard.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geoffrey Thomas [Thu, 24 Sep 2009 14:36:26 +0000 (10:36 -0400)]
alpha: Clean up linker script using new linker script macros.
Note that .data.page_aligned and .data.cacheline_aligned are now after
_data; it was probably a bug that they were before it.
Also, some explicit ALIGN(8)'s between various initcall sections were
removed; this should be harmless as the implicit alignment of
initcall_t was already 8.
Signed-off-by: Geoffrey Thomas <geofft@ksplice.com>
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:25 +0000 (10:36 -0400)]
alpha: use .data.init_task instead of .data.init_thread.
alpha is the only architecture that uses the section name
.data.init_thread instead of .data.init_task. So convert alpha to use
.data.init_task like everything else.
.data.init_task does not need a separate output section; this change
also moves it into the .data output section.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: linux-alpha@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:24 +0000 (10:36 -0400)]
powerpc: Cleanup linker script using new linker script macros.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:23 +0000 (10:36 -0400)]
blackfin: Cleanup linker script using new linker script macros.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: uclinux-dist-devel@blackfin.uclinux.org
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:22 +0000 (10:36 -0400)]
mn10300: Clean up linker script using higher-level macros.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:21 +0000 (10:36 -0400)]
h8300: Cleanup linker script using new linker script macros.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:20 +0000 (10:36 -0400)]
um: Clean up linker script using standard macros.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:19 +0000 (10:36 -0400)]
xtensa: Cleanup linker script using new linker script macros.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:18 +0000 (10:36 -0400)]
parisc: Remove useless altinstructions code copied from x86.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:17 +0000 (10:36 -0400)]
parisc: Clean up linker script using new linker script macros.
This patch has the (likely harmless) side effect of moving
.data.init_task inside the _edata.
It also changes the alignment of .data.init_task from 16384 to
THREAD_SIZE, which can in some configurations be larger than 16384. I
believe that this change fixes a potential bug on those
configurations.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tim Abbott [Thu, 24 Sep 2009 14:36:16 +0000 (10:36 -0400)]
Optimize the ordering of sections in RW_DATA_SECTION.
The old RW_DATA_SECTION had INIT_TASK_DATA (which was
more-than-PAGE_SIZE-aligned), followed by a bunch of small alignment
stuff, followed by more PAGE_SIZE-aligned stuff, so you wasted memory
in the middle of .data re-aligning back up to PAGE_SIZE.
This patch sorts the sections by alignment requirements, which should
pack them essentially optimally.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 24 Sep 2009 21:47:45 +0000 (14:47 -0700)]
hugetlb_file_setup(): use C, not cpp
Why macros are always wrong:
mm/mmap.c: In function 'do_mmap_pgoff':
mm/mmap.c:953: warning: unused variable 'user'
also, move a couple of struct forward-decls outside `#ifdef
CONFIG_HUGETLB_PAGE' - it's pointless and frequently harmful to make these
conditional (eg, this patch needed `struct user_struct').
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 24 Sep 2009 21:47:43 +0000 (14:47 -0700)]
procfs: disable per-task stack usage on NOMMU
It needs walk_page_range().
Reported-by: Michal Simek <monstr@monstr.eu>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 25 Sep 2009 00:10:17 +0000 (17:10 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Prevent lower dentry from going negative during unlink
eCryptfs: Propagate vfs_read and vfs_write return codes
eCryptfs: Validate global auth tok keys
eCryptfs: Filename encryption only supports password auth tokens
eCryptfs: Check for O_RDONLY lower inodes when opening lower files
eCryptfs: Handle unrecognized tag 3 cipher codes
ecryptfs: improved dependency checking and reporting
eCryptfs: Fix lockdep-reported AB-BA mutex issue
ecryptfs: Remove unneeded locking that triggers lockdep false positives
Linus Torvalds [Fri, 25 Sep 2009 00:08:56 +0000 (17:08 -0700)]
Merge branch 'for-linus' of git://repo.or.cz/cris-mirror
* 'for-linus' of git://repo.or.cz/cris-mirror:
CRIS: Cleanup linker script using new linker script macros.
ARRAY_SIZE changes
CRIS: convert to asm-generic/hardirq.h
CRISv10: Don't autonegotiate if autonegotiation is off
CRIS: fix defconfig build failure
CRIS: add pgprot_noncached
Linus Torvalds [Fri, 25 Sep 2009 00:06:51 +0000 (17:06 -0700)]
Merge branch 'for-linus' of /home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (103 commits)
ARM: 5719/1: [AT91] Fix AC97 breakage
ARM: 5721/1: MMCI enable the use of a regulator
ARM: 5720/1: Move MMCI header to amba include dir
ARM: 5718/1: Sane busids for RealView board components
ARM: 5715/1: Make kprobes unregistration SMP safe
ARM: 5711/1: locomo.c: CodingStyle cleanups
ARM: 5710/1: at91: add AC97 support to at91sam9rl and at91sam9rlek board
ARM: 5709/1: at91: add AC97 support to at91sam9g45 series and at91sam9m10g45ek board
ARM: 5621/1: at91/dmaengine: integration of at_hdmac driver in at91sam9g45 series
ARM: 5620/1: at91/dmaengine: integration of at_hdmac driver in at91sam9rl
ARM: Add support for checking access permissions on prefetch aborts
ARM: Separate out access error checking
ARM: Ensure correct might_sleep() check in pagefault path
ARM: Update page fault handling for new OOM techniques
ARM: Provide definitions and helpers for decoding the FSR register
ARM: 5712/1: SA1100: initialise spinlock in DMA code
ARM: s3c: fix check of index into s3c_gpios[]
ARM: spitz: fix touchscreen max presure
ARM: STMP3xxx: deallocation with negative index of descriptors[]
Thumb-2: Correctly handle undefined instructions in the kernel
...
Linus Torvalds [Fri, 25 Sep 2009 00:06:16 +0000 (17:06 -0700)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs[23] tcp breakage in mount with binary options
net: fix htmldocs sunrpc, clnt.c
Linus Torvalds [Fri, 25 Sep 2009 00:06:01 +0000 (17:06 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB: Don't turn on carrier for a non-active port
IB/mthca: Fix access to freed memory in catastrophic event handling
mlx4_core: Pass cache line size to device FW
RDMA/nes: Remove duplicate .ndo_set_mac_address field initialization
IB/mad: Fix lock-lock-timer deadlock in RMPP code
Johannes Berg [Thu, 24 Sep 2009 22:44:05 +0000 (15:44 -0700)]
genetlink: fix netns vs. netlink table locking (2)
Similar to commit
d136f1bd366fdb7e747ca7e0218171e7a00a98a5,
there's a bug when unregistering a generic netlink family,
which is caught by the might_sleep() added in that commit:
BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
2 locks held by rmmod/1510:
#0: (genl_mutex){+.+.+.}, at: [<
ffffffff8138283b>] genl_unregister_family+0x2b/0x130
#1: (rcu_read_lock){.+.+..}, at: [<
ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120
Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
Call Trace:
[<
ffffffff81044ff9>] __might_sleep+0x119/0x150
[<
ffffffff81380501>] netlink_table_grab+0x21/0x100
[<
ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60
[<
ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120
[<
ffffffff81382866>] genl_unregister_family+0x56/0x130
[<
ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211]
[<
ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211]
Fix in the same way by grabbing the netlink table lock
before doing rcu_read_lock().
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anton Vorontsov [Thu, 24 Sep 2009 08:31:52 +0000 (08:31 +0000)]
3c59x: Get rid of "Trying to free already-free IRQ"
Following trace pops up if we try to suspend with 3c59x ethernet NIC
brought down:
root@b1:~# ifconfig eth16 down
root@b1:~# echo mem > /sys/power/state
...
3c59x 0000:00:10.0: suspend
3c59x 0000:00:10.0: PME# disabled
Trying to free already-free IRQ 48
------------[ cut here ]------------
Badness at
c00554e4 [verbose debug info unavailable]
NIP:
c00554e4 LR:
c00554e4 CTR:
c019a098
REGS:
c7975c60 TRAP: 0700 Not tainted (2.6.31-rc4)
MSR:
00021032 <ME,CE,IR,DR> CR:
28242422 XER:
20000000
TASK =
c79cb0c0[1746] 'bash' THREAD:
c7974000
...
NIP [
c00554e4] __free_irq+0x108/0x1b0
LR [
c00554e4] __free_irq+0x108/0x1b0
Call Trace:
[
c7975d10] [
c00554e4] __free_irq+0x108/0x1b0 (unreliable)
[
c7975d30] [
c005559c] free_irq+0x10/0x24
[
c7975d40] [
c01e21ec] vortex_suspend+0x70/0xc4
[
c7975d60] [
c017e584] pci_legacy_suspend+0x58/0x100
This is because the driver manages interrupts without checking for
netif_running().
Though, there are few other issues with suspend/resume in this driver.
The intention of calling free_irq() in suspend() was to avoid any
possible spurious interrupts (see commit
5b039e681b8c5f30aac9cc04385
"3c59x PM fixes"). But,
- On resume, the driver was requesting IRQ just after pci_set_master(),
but before vortex_up() (which actually resets 3c59x chips).
- Issuing free_irq() on a shared IRQ doesn't guarantee that a buggy
HW won't trigger spurious interrupts in another driver that
requested the same interrupt. So, if we want to protect from
unexpected interrupts, then on suspend we should issue disable_irq(),
not free_irq().
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 23 Sep 2009 10:28:33 +0000 (10:28 +0000)]
tunnel: eliminate recursion field
It seems recursion field from "struct ip_tunnel" is not anymore needed.
recursion prevention is done at the upper level (in dev_queue_xmit()),
since we use HARD_TX_LOCK protection for tunnels.
This avoids a cache line ping pong on "struct ip_tunnel" : This structure
should be now mostly read on xmit and receive paths.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sebastian Haas [Thu, 24 Sep 2009 03:55:05 +0000 (03:55 +0000)]
ems_pci: fix size of CAN controllers BAR mapping for CPC-PCI v2
The driver mapped only 128 bytes of the CAN controller address space when a
CPC-PCI v2 was detected (incl. CPC-104P). This patch will fix it by always
mapping the whole address space (4096 bytes on all boards) of the
corresponding PCI BAR.
Signed-off-by: Sebastian Haas <haas@ems-wuensche.com>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jaswinder Singh Rajput [Thu, 24 Sep 2009 02:19:41 +0000 (02:19 +0000)]
net: fix htmldocs sunrpc, clnt.c
DOCPROC Documentation/DocBook/networking.xml
Warning(net/sunrpc/clnt.c:647): No description found for parameter 'req'
Warning(net/sunrpc/clnt.c:647): No description found for parameter 'tk_ops'
Warning(net/sunrpc/clnt.c:647): Excess function parameter 'ops' description in 'rpc_run_bc_task'
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Wed, 23 Sep 2009 03:17:11 +0000 (03:17 +0000)]
Phonet: error on broadcast sending (unimplemented)
If we ever implement this, then we can stop returning an error.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Wed, 23 Sep 2009 03:17:10 +0000 (03:17 +0000)]
Phonet: fix race for port number in concurrent bind()
Allocating a port number to a socket and hashing that socket shall be
an atomic operation with regards to other port allocation. Otherwise,
we could allocate a port that is already being allocated to another
socket.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Tue, 22 Sep 2009 19:41:43 +0000 (19:41 +0000)]
pktgen: better scheduler friendliness
Previous update did not resched in inner loop causing watchdogs.
Rewrite inner loop to:
* account for delays better with less clock calls
* more accurate timing of delay:
- only delay if packet was successfully sent
- if delay is 100ns and it takes 10ns to build packet then
account for that
* use wait_event_interruptible_timeout rather than open coding it.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Tue, 22 Sep 2009 19:41:42 +0000 (19:41 +0000)]
pktgen: T_TERMINATE flag is unused
Get rid of unused flag bit.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shan Wei [Tue, 22 Sep 2009 15:41:10 +0000 (15:41 +0000)]
ipv4: check optlen for IP_MULTICAST_IF option
Due to man page of setsockopt, if optlen is not valid, kernel should return
-EINVAL. But a simple testcase as following, errno is 0, which means setsockopt
is successful.
addr.s_addr = inet_addr("192.1.2.3");
setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF, &addr, 1);
printf("errno is %d\n", errno);
Xiaotian Feng(dfeng@redhat.com) caught the bug. We fix it firstly checking
the availability of optlen and then dealing with the logic like other options.
Reported-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 24 Sep 2009 22:13:11 +0000 (15:13 -0700)]
Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
drivers/staging/Kconfig
drivers/staging/Makefile
drivers/staging/cpc-usb/TODO
drivers/staging/cpc-usb/cpc-usb_drv.c
drivers/staging/cpc-usb/cpc.h
drivers/staging/cpc-usb/cpc_int.h
drivers/staging/cpc-usb/cpcusb.h
Russell King [Thu, 24 Sep 2009 20:22:33 +0000 (21:22 +0100)]
Merge branch 'origin' into for-linus
Conflicts:
MAINTAINERS
Roland Dreier [Thu, 24 Sep 2009 19:43:08 +0000 (12:43 -0700)]
Merge branches 'ipoib', 'mad', 'mlx4', 'mthca' and 'nes' into for-linus
Moni Shoua [Thu, 24 Sep 2009 19:01:05 +0000 (12:01 -0700)]
IPoIB: Don't turn on carrier for a non-active port
Multicast joins can succeed even if the IB port is down. This happens
when the SM runs on the same port with the requesting port. However,
IPoIB calls netif_carrier_on() when the join of the broadcast group
succeeds, without caring about the state of the IB port. The result
is an IPoIB interface in RUNNING state but without an active IB port
to support it.
If a bonding interface uses this IPoIB interface as a slave it might
not detect that this slave is almost useless and failover
functionality will be damaged. The fix checks the state of the IB
port in the carrier_task before calling netif_carrier_on().
Adresses: https://bugs.openfabrics.org/show_bug.cgi?id=1726
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Al Viro [Thu, 24 Sep 2009 18:58:42 +0000 (14:58 -0400)]
nfs[23] tcp breakage in mount with binary options
We forget to set nfs_server.protocol in tcp case when old-style binary
options are passed to mount. The thing remains zero and never validated
afterwards. As the result, we hit BUG in fs/nfs/client.c:588.
Breakage has been introduced in NFS: Add nfs_alloc_parsed_mount_data
merged yesterday...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jaswinder Singh Rajput [Thu, 24 Sep 2009 18:58:42 +0000 (14:58 -0400)]
net: fix htmldocs sunrpc, clnt.c
DOCPROC Documentation/DocBook/networking.xml
Warning(net/sunrpc/clnt.c:647): No description found for parameter 'req'
Warning(net/sunrpc/clnt.c:647): No description found for parameter 'tk_ops'
Warning(net/sunrpc/clnt.c:647): Excess function parameter 'ops' description in 'rpc_run_bc_task'
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jack Morgenstein [Thu, 24 Sep 2009 18:55:41 +0000 (11:55 -0700)]
IB/mthca: Fix access to freed memory in catastrophic event handling
catas_reset() uses a pointer to mthca_dev, but mthca_dev is not valid
after the call to __mthca_restart_one().
Based on a similar patch for mlx4 (
634354d7, "mlx4: Fix access to
freed memory") by Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Eli Cohen [Thu, 24 Sep 2009 18:03:03 +0000 (11:03 -0700)]
mlx4_core: Pass cache line size to device FW
ConnectX can work more efficiently if the CPU cache line size is passed
to it with the INIT_HCA firmware command.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Julia Lawall [Thu, 24 Sep 2009 17:59:34 +0000 (10:59 -0700)]
RDMA/nes: Remove duplicate .ndo_set_mac_address field initialization
The definition of nes_netdev_ops has initializations of a local function
and eth_mac_addr for its ndo_set_mac_address field. This change uses only
the local function.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
identifier I, s, fld;
position p0,p;
expression E;
@@
struct I s =@p0 { ... .fld@p = E, ...};
@s@
identifier I, s, r.fld;
position r.p0,p;
expression E;
@@
struct I s =@p0 { ... .fld@p = E, ...};
@script:python@
p0 << r.p0;
fld << r.fld;
ps << s.p;
pr << r.p;
@@
if int(ps[0].line)!=int(pr[0].line) or int(ps[0].column)!=int(pr[0].column):
cocci.print_main(fld,p0)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Linus Torvalds [Thu, 24 Sep 2009 17:30:41 +0000 (10:30 -0700)]
Merge branch 'drm-intel-next' of git://git./linux/kernel/git/anholt/drm-intel
* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (57 commits)
drm/i915: Handle ERESTARTSYS during page fault
drm/i915: Warn before mmaping a purgeable buffer.
drm/i915: Track purged state.
drm/i915: Remove eviction debug spam
drm/i915: Immediately discard any backing storage for uneeded objects
drm/i915: Do not mis-classify clean objects as purgeable
drm/i915: Whitespace correction for madv
drm/i915: BUG_ON page refleak during unbind
drm/i915: Search harder for a reusable object
drm/i915: Clean up evict from list.
drm/i915: Add tracepoints
drm/i915: framebuffer compression for GM45+
drm/i915: split display functions by chip type
drm/i915: Skip the sanity checks if the current relocation is valid
drm/i915: Check that the relocation points to within the target
drm/i915: correct FBC update when pipe base update occurs
drm/i915: blacklist Acer AspireOne lid status
ACPI: make ACPI button funcs no-ops if not built in
drm/i915: prevent FIFO calculation overflows on 32 bits with high dotclocks
drm/i915: intel_display.c handle latency variable efficiently
...
Fix up trivial conflicts in drivers/gpu/drm/i915/{i915_dma.c|i915_drv.h}
Linus Torvalds [Thu, 24 Sep 2009 16:57:08 +0000 (09:57 -0700)]
Merge branch 'linux-next' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
x86/PCI: make 32 bit NUMA node array int, not unsigned char
x86/PCI: default pcibus cpumask to all cpus if it lacks affinity
MAINTAINTERS: remove hotplug driver entries
PCI: pciehp: remove slot capabilities definitions
PCI: pciehp: remove error message definitions
PCI: pciehp: remove number field
PCI: pciehp: remove hpc_ops
PCI: pciehp: remove pci_dev field
PCI: pciehp: remove crit_sect mutex
PCI: pciehp: remove slot_bus field
PCI: pciehp: remove first_slot field
PCI: pciehp: remove slot_device_offset field
PCI: pciehp: remove hp_slot field
PCI: pciehp: remove device field
PCI: pciehp: remove bus field
PCI: pciehp: remove slot_num_inc field
PCI: pciehp: remove num_slots field
PCI: pciehp: remove slot_list field
PCI: fix VGA arbiter header file
PCI: Disable AER with pci=nomsi
...
Fixed up trivial conflicts in MAINTAINERS
Linus Torvalds [Thu, 24 Sep 2009 16:04:24 +0000 (09:04 -0700)]
Merge branch 'cputime' of git://git390.marist.edu/linux-2.6
* 'cputime' of git://git390.marist.edu/pub/scm/linux-2.6:
[PATCH] Fix idle time field in /proc/uptime
Linus Torvalds [Thu, 24 Sep 2009 16:01:44 +0000 (09:01 -0700)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (24 commits)
microblaze: Disable heartbeat/enable emaclite in defconfigs
microblaze: Support simpleImage.dts make target
microblaze: Fix _start symbol to physical address
microblaze: Use LOAD_OFFSET macro to get correct LMA for all sections
microblaze: Create the LOAD_OFFSET macro used to compute VMA vs LMA offsets
microblaze: Copy ppc asm-compat.h for clean handling of constants in asm and C
microblaze: Actually show KiB rather than pages in "Freeing initrd memory:"
microblaze: Support ptrace syscall tracing.
microblaze: Updated CPU version and FPGA family codes in PVR
microblaze: Generate correct signal and siginfo for integer div-by-zero
microblaze: Don't be noisy when userspace causes hardware exceptions
microblaze: Remove ipc.h file which points to non-existing asm-generic file
microblaze: Clear sticky FSR register after generating exception signals
microblaze: Ensure CPU usermode is set on new userspace processes
microblaze: Use correct kbuild variable KBUILD_CFLAGS
microblaze: Save and restore msr in hw exception
microblaze: Add architectural support for USB EHCI host controllers
microblaze: Implement include/asm/syscall.h.
microblaze: Improve checking mechanism for MSR instruction
microblaze: Add checking mechanism for MSR instruction
...
Linus Torvalds [Thu, 24 Sep 2009 16:01:05 +0000 (09:01 -0700)]
Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
module: don't call percpu_modfree on NULL pointer.
module: fix memory leak when load fails after srcversion/version allocated
module: preferred way to use MODULE_AUTHOR
param: allow whitespace as kernel parameter separator
module: reduce string table for loaded modules (v2)
module: reduce symbol table for loaded modules (v2)
Linus Torvalds [Thu, 24 Sep 2009 15:57:29 +0000 (08:57 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (42 commits)
Btrfs: hash the btree inode during fill_super
Btrfs: relocate file extents in clusters
Btrfs: don't rename file into dummy directory
Btrfs: check size of inode backref before adding hardlink
Btrfs: fix releasepage to avoid unlocking extents we haven't locked
Btrfs: Fix test_range_bit for whole file extents
Btrfs: fix errors handling cached state in set/clear_extent_bit
Btrfs: fix early enospc during balancing
Btrfs: deal with NULL space info
Btrfs: account for space used by the super mirrors
Btrfs: fix extent entry threshold calculation
Btrfs: remove dead code
Btrfs: fix bitmap size tracking
Btrfs: don't keep retrying a block group if we fail to allocate a cluster
Btrfs: make balance code choose more wisely when relocating
Btrfs: fix arithmetic error in clone ioctl
Btrfs: add snapshot/subvolume destroy ioctl
Btrfs: change how subvolumes are organized
Btrfs: do not reuse objectid of deleted snapshot/subvol
Btrfs: speed up snapshot dropping
...
Linus Torvalds [Thu, 24 Sep 2009 15:32:11 +0000 (08:32 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
truncate: use new helpers
truncate: new helpers
fs: fix overflow in sys_mount() for in-kernel calls
fs: Make unload_nls() NULL pointer safe
freeze_bdev: grab active reference to frozen superblocks
freeze_bdev: kill bd_mount_sem
exofs: remove BKL from super operations
fs/romfs: correct error-handling code
vfs: seq_file: add helpers for data filling
vfs: remove redundant position check in do_sendfile
vfs: change sb->s_maxbytes to a loff_t
vfs: explicitly cast s_maxbytes in fiemap_check_ranges
libfs: return error code on failed attr set
seq_file: return a negative error code when seq_path_root() fails.
vfs: optimize touch_time() too
vfs: optimization for touch_atime()
vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it
fs/inode.c: add dev-id and inode number for debugging in init_special_inode()
libfs: make simple_read_from_buffer conventional
Linus Torvalds [Thu, 24 Sep 2009 15:31:04 +0000 (08:31 -0700)]
Merge git://git./linux/kernel/git/viro/audit-current
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
lsm: Use a compressed IPv6 string format in audit events
Audit: send signal info if selinux is disabled
Audit: rearrange audit_context to save 16 bytes per struct
Audit: reorganize struct audit_watch to save 8 bytes
Rusty Russell [Fri, 25 Sep 2009 06:32:59 +0000 (00:32 -0600)]
module: don't call percpu_modfree on NULL pointer.
The general one handles NULL, the static obsolescent
(CONFIG_HAVE_LEGACY_PER_CPU_AREA) one in module.c doesn't; Eric's
commit
720eba31 assumed it did, and various frobbings since then kept
that assumption.
All other callers in module.c all protect it with an if; this effectively
does the same as free_init is only goto if we fail percpu_modalloc().
Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Rusty Russell [Fri, 25 Sep 2009 06:32:58 +0000 (00:32 -0600)]
module: fix memory leak when load fails after srcversion/version allocated
Normally the twisty paths of sysfs will free the attributes, but not if
we fail before we hook it into sysfs (which is the last thing we do in
load_module).
(This sysfs code is a turd, no doubt there are other issues lurking too).
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Johannes Berg [Fri, 25 Sep 2009 06:32:58 +0000 (00:32 -0600)]
module: preferred way to use MODULE_AUTHOR
For the longest time now we've been using multiple MODULE_AUTHOR()
statements when a module has more than one author, but the comment here
disagrees.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Peter Oberparleiter [Mon, 6 Jul 2009 15:11:22 +0000 (17:11 +0200)]
param: allow whitespace as kernel parameter separator
Some boot mechanisms require that kernel parameters are stored in a
separate file which is loaded to memory without further processing
(e.g. the "Load from FTP" method on s390). When such a file contains
newline characters, the kernel parameter preceding the newline might
not be correctly parsed (due to the newline being stuck to the end of
the actual parameter value) which can lead to boot failures.
This patch improves kernel command line usability in such a situation
by allowing generic whitespace characters as separators between kernel
parameters.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Jan Beulich [Mon, 6 Jul 2009 13:51:44 +0000 (14:51 +0100)]
module: reduce string table for loaded modules (v2)
Also remove all parts of the string table (referenced by the symbol
table) that are not needed for kallsyms use (i.e. which were only
referenced by symbols discarded by the previous patch, or not
referenced at all for whatever reason).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Jan Beulich [Mon, 6 Jul 2009 13:50:42 +0000 (14:50 +0100)]
module: reduce symbol table for loaded modules (v2)
Discard all symbols not interesting for kallsyms use: absolute,
section, and in the common case (!KALLSYMS_ALL) data ones.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Linus Torvalds [Thu, 24 Sep 2009 14:55:29 +0000 (07:55 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (97 commits)
md: raid-1/10: fix RW bits manipulation
md: remove unnecessary memset from multipath.
md: report device as congested when suspended
md: Improve name of threads created by md_register_thread
md: remove sparse warnings about lock context.
md: remove sparse waring "symbol xxx shadows an earlier one"
async_tx/raid6: add missing dma_unmap calls to the async fail case
ioat3: fix uninitialized var warnings
drivers/dma/ioat/dma_v2.c: fix warnings
raid6test: fix stack overflow
ioat2: clarify ring size limits
md/raid6: cleanup ops_run_compute6_2
md/raid6: eliminate BUG_ON with side effect
dca: module load should not be an error message
ioat: driver version 4.0
dca: registering requesters in multiple dca domains
async_tx: remove HIGHMEM64G restriction
dmaengine: sh: Add Support SuperH DMA Engine driver
dmaengine: Move all map_sg/unmap_sg for slave channel to its client
fsldma: Add DMA_SLAVE support
...
Linus Torvalds [Thu, 24 Sep 2009 14:54:16 +0000 (07:54 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
V4L/DVB (13039): dib0700: not building CONFIG_DVB_TUNER_DIB0070 breaks compilation
V4L/DVB (13038): dvbdev: Remove an anoying/uneeded warning
V4L/DVB (13037): go7007: Revert compatibility code added at the wrong place
media: video: Fix build in saa7164
Linus Torvalds [Thu, 24 Sep 2009 14:53:22 +0000 (07:53 -0700)]
Merge branch 'hwpoison' of git://git./linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...
Andrew Morton [Wed, 23 Sep 2009 22:57:43 +0000 (15:57 -0700)]
drivers/usb/serial/sierra.c: fix CONFIG_PM=n build
drivers/usb/serial/sierra.c: In function 'sierra_suspend':
drivers/usb/serial/sierra.c:936: error: 'struct usb_device' has no member named 'auto_pm'
Repairs
commit
e6929a9020acbeb04d9a3ad9a88234c15be808fd
Author: Oliver Neukum <oliver@neukum.org>
Date: Fri Sep 4 23:19:53 2009 +0200
USB: support for autosuspend in sierra while online
Cc: Greg KH <greg@kroah.com>
Cc: Oliver Neukum <oliver@neukum.org>
Cc: Elina Pasheva <epasheva@sierrawireless.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ivan Kokshaysky [Wed, 23 Sep 2009 22:57:42 +0000 (15:57 -0700)]
alpha: AGP update (fixes compile failure)
This brings Alpha AGP platforms in sync with the change to struct
agp_memory (unsigned long *memory => struct page **pages).
Only compile tested (I don't have titan/marvel hardware), but this change
looks pretty straightforward, so hopefully it's ok.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Dave Airlie <airlied@linux.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>