Linus Torvalds [Thu, 26 May 2011 19:13:22 +0000 (12:13 -0700)]
Merge branch 'spi/next' of git://git.secretlab.ca/git/linux-2.6
* 'spi/next' of git://git.secretlab.ca/git/linux-2.6:
spi/amba-pl022: work in polling or interrupt mode if pl022_dma_probe fails
spi/spi_s3c24xx: Use spi_bitbang_stop instead of spi_unregister_master in s3c24xx_spi_remove
spi/spi_nuc900: Use spi_bitbang_stop instead of spi_unregister_master in nuc900_spi_remove
spi/spi_tegra: use spi_unregister_master() instead of spi_master_put()
spi/spi_sh: use spi_unregister_master instead of spi_master_put in remove path
spi: Use void pointers for data in simple SPI I/O operations
spi/pl022: use cpu_relax in the busy loop
spi/pl022: mark driver non-experimental
spi/pl022: timeout on polled transfer v2
spi/dw_spi: improve the interrupt mode with the batch ops
spi/dw_spi: change poll mode transfer from byte ops to batch ops
spi/dw_spi: remove the un-necessary flush()
spi/dw_spi: unify the low level read/write routines
Linus Torvalds [Thu, 26 May 2011 19:11:54 +0000 (12:11 -0700)]
Merge branch 'omap-for-linus' of git://git./linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (33 commits)
OMAP3: PM: Boot message is not an error, and not helpful, remove it
OMAP3: cpuidle: change the power domains modes determination logic
OMAP3: cpuidle: code rework for improved readability
OMAP3: cpuidle: re-organize the C-states data
OMAP3: clean-up mach specific cpuidle data structures
OMAP3 cpuidle: remove useless SDP specific timings
usb: otg: OMAP4430: Powerdown the internal PHY when USB is disabled
usb: otg: OMAP4430: Fixing the omap4430_phy_init function
usb: musb: am35x: fix compile error when building am35x
usb: musb: OMAP4430: Power down the PHY during board init
omap: drop board-igep0030.c
omap: igep0020: add support for IGEP3
omap: igep0020: minor refactoring
omap: igep0020: name refactoring for future merge with IGEP3
omap: Remove support for omap2evm
arm: omap2plus: GPIO cleanup
omap: musb: introduce default board config
omap: move detection of NAND CS to common-board-devices
omap: use common initialization for PMIC i2c bus
omap: consolidate touch screen initialization among different boards
...
Linus Torvalds [Thu, 26 May 2011 19:11:17 +0000 (12:11 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/4xx: Adding PCIe MSI support
powerpc: Fix irq_free_virt by adjusting bounds before loop
powerpc/irq: Protect irq_radix_revmap_lookup against irq_free_virt
powerpc/irq: Check desc in handle_one_irq and expand generic_handle_irq
powerpc/irq: Always free duplicate IRQ_LEGACY hosts
powerpc/irq: Remove stale and misleading comment
powerpc/cell: Rename ipi functions to match current abstractions
powerpc/cell: Use common smp ipi actions
Remove unused MSG_ flags in linux/smp.h
powerpc/pseries: Update MAX_HCALL_OPCODE to reflect page coalescing
powerpc/oprofile: Handle events that raise an exception without overflowing
powerpc/ftrace: Implement raw syscall tracepoints on PowerPC
Linus Torvalds [Thu, 26 May 2011 19:03:50 +0000 (12:03 -0700)]
Fix build with !HUGETLBFS
I stupidly broke the case of CONFIG_HUGETLBFS=n when doing the
conversion to vm_flags_t in commit
ca16d140af91 ("mm: don't access
vm_flags as 'int'"). And my 'allyesconfig' build didn't find it, for
obvious reasons..
Include <linux/mm_types.h> in <linux/hugetlb.h>. The problem could have
been avoided by just turning the hugetlb_file_setup() error wrapper into
a macro, but mm_types.h is a reasonable include in this file.
Reported-by: Richard -rw- Weinberger <richard.weinberger@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 26 May 2011 17:55:15 +0000 (10:55 -0700)]
Merge branch 'linux-next' of git://git./linux/kernel/git/jlbec/ocfs2
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (28 commits)
Ocfs2: Teach local-mounted ocfs2 to handle unwritten_extents correctly.
ocfs2/dlm: Do not migrate resource to a node that is leaving the domain
ocfs2/dlm: Add new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG
Ocfs2/move_extents: Set several trivial constraints for threshold.
Ocfs2/move_extents: Let defrag handle partial extent moving.
Ocfs2/move_extents: move/defrag extents within a certain range.
Ocfs2/move_extents: helper to calculate the defraging length in one run.
Ocfs2/move_extents: move entire/partial extent.
Ocfs2/move_extents: helpers to update the group descriptor and global bitmap inode.
Ocfs2/move_extents: helper to probe a proper region to move in an alloc group.
Ocfs2/move_extents: helper to validate and adjust moving goal.
Ocfs2/move_extents: find the victim alloc group, where the given #blk fits.
Ocfs2/move_extents: defrag a range of extent.
Ocfs2/move_extents: move a range of extent.
Ocfs2/move_extents: lock allocators and reserve metadata blocks and data clusters for extents moving.
Ocfs2/move_extents: Add basic framework and source files for extent moving.
Ocfs2/move_extents: Adding new ioctl code 'OCFS2_IOC_MOVE_EXT' to ocfs2.
Ocfs2/refcounttree: Publicize couple of funcs from refcounttree.c
Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.
...
Linus Torvalds [Thu, 26 May 2011 17:50:56 +0000 (10:50 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/djm/tmem
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem:
xen: cleancache shim to Xen Transcendent Memory
ocfs2: add cleancache support
ext4: add cleancache support
btrfs: add cleancache support
ext3: add cleancache support
mm/fs: add hooks to support cleancache
mm: cleancache core ops functions and config
fs: add field to superblock to support cleancache
mm/fs: cleancache documentation
Fix up trivial conflict in fs/btrfs/extent_io.c due to includes
Linus Torvalds [Thu, 26 May 2011 17:49:11 +0000 (10:49 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent
xfs: check for valid indices in xfs_iext_get_ext and xfs_iext_idx_to_irec
xfs: fix up asserts in xfs_iflush_fork
xfs: do not do pointer arithmetic on extent records
xfs: do not use unchecked extent indices in xfs_bunmapi
xfs: do not use unchecked extent indices in xfs_bmapi
xfs: do not use unchecked extent indices in xfs_bmap_add_extent_*
xfs: remove if_lastex
xfs: remove the unused XFS_BMAPI_RSVBLOCKS flag
xfs: do not discard alloc btree blocks
xfs: add online discard support
Linus Torvalds [Thu, 26 May 2011 16:53:20 +0000 (09:53 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (61 commits)
jbd2: Add MAINTAINERS entry
jbd2: fix a potential leak of a journal_head on an error path
ext4: teach ext4_ext_split to calculate extents efficiently
ext4: Convert ext4 to new truncate calling convention
ext4: do not normalize block requests from fallocate()
ext4: enable "punch hole" functionality
ext4: add "punch hole" flag to ext4_map_blocks()
ext4: punch out extents
ext4: add new function ext4_block_zero_page_range()
ext4: add flag to ext4_has_free_blocks
ext4: reserve inodes and feature code for 'quota' feature
ext4: add support for multiple mount protection
ext4: ensure f_bfree returned by ext4_statfs() is non-negative
ext4: protect bb_first_free in ext4_trim_all_free() with group lock
ext4: only load buddy bitmap in ext4_trim_fs() when it is needed
jbd2: Fix comment to match the code in jbd2__journal_start()
ext4: fix waiting and sending of a barrier in ext4_sync_file()
jbd2: Add function jbd2_trans_will_send_data_barrier()
jbd2: fix sending of data flush on journal commit
ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly
...
Linus Torvalds [Thu, 26 May 2011 16:52:14 +0000 (09:52 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (25 commits)
cifs: remove unnecessary dentry_unhash on rmdir/rename_dir
ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir
exofs: remove unnecessary dentry_unhash on rmdir/rename_dir
nfs: remove unnecessary dentry_unhash on rmdir/rename_dir
ext2: remove unnecessary dentry_unhash on rmdir/rename_dir
ext3: remove unnecessary dentry_unhash on rmdir/rename_dir
ext4: remove unnecessary dentry_unhash on rmdir/rename_dir
btrfs: remove unnecessary dentry_unhash in rmdir/rename_dir
ceph: remove unnecessary dentry_unhash calls
vfs: clean up vfs_rename_other
vfs: clean up vfs_rename_dir
vfs: clean up vfs_rmdir
vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems
libfs: drop unneeded dentry_unhash
vfs: update dentry_unhash() comment
vfs: push dentry_unhash on rename_dir into file systems
vfs: push dentry_unhash on rmdir into file systems
vfs: remove dget() from dentry_unhash()
vfs: dentry_unhash immediately prior to rmdir
vfs: Block mmapped writes while the fs is frozen
...
KOSAKI Motohiro [Thu, 26 May 2011 10:16:19 +0000 (19:16 +0900)]
mm: don't access vm_flags as 'int'
The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor
'unsigned int'. This patch fixes such misuse.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
[ Changed to use a typedef - we'll extend it to cover more cases
later, since there has been discussion about making it a 64-bit
type.. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Magenheimer [Thu, 26 May 2011 16:02:21 +0000 (10:02 -0600)]
xen: cleancache shim to Xen Transcendent Memory
This patch provides a shim between the kernel-internal cleancache
API (see Documentation/mm/cleancache.txt) and the Xen Transcendent
Memory ABI (see http://oss.oracle.com/projects/tmem).
Xen tmem provides "hypervisor RAM" as an ephemeral page-oriented
pseudo-RAM store for cleancache pages, shared cleancache pages,
and frontswap pages. Tmem provides enterprise-quality concurrency,
full save/restore and live migration support, compression
and deduplication.
A presentation showing up to 8% faster performance and up to 52%
reduction in sectors read on a kernel compile workload, despite
aggressive in-kernel page reclamation ("self-ballooning") can be
found at:
http://oss.oracle.com/projects/tmem/dist/documentation/presentations/TranscendentMemoryXenSummit2010.pdf
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Dan Magenheimer [Thu, 26 May 2011 16:02:08 +0000 (10:02 -0600)]
ocfs2: add cleancache support
This eighth patch of eight in this cleancache series "opts-in"
cleancache for ocfs2. Clustered filesystems must explicitly enable
cleancache by calling cleancache_init_shared_fs anytime an instance
of the filesystem is mounted. Ocfs2 is currently the only user of
the clustered filesystem interface but nevertheless, the cleancache
hooks in the VFS layer are sufficient for ocfs2 including the matching
cleancache_flush_fs hook which must be called on unmount.
Details and a FAQ can be found in Documentation/vm/cleancache.txt
[v8: trivial merge conflict update]
[v5: jeremy@goop.org: simplify init hook and any future fs init changes]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Ted Tso <tytso@mit.edu>
Cc: Nitin Gupta <ngupta@vflare.org>
Dan Magenheimer [Thu, 26 May 2011 16:02:03 +0000 (10:02 -0600)]
ext4: add cleancache support
This seventh patch of eight in this cleancache series "opts-in"
cleancache for ext4. Filesystems must explicitly enable cleancache
by calling cleancache_init_fs anytime an instance of the filesystem
is mounted. For ext4, all other cleancache hooks are in
the VFS layer including the matching cleancache_flush_fs
hook which must be called on unmount.
Details and a FAQ can be found in Documentation/vm/cleancache.txt
[v6-v8: no changes]
[v5: jeremy@goop.org: simplify init hook and any future fs init changes]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Dan Magenheimer [Thu, 26 May 2011 16:01:56 +0000 (10:01 -0600)]
btrfs: add cleancache support
This sixth patch of eight in this cleancache series "opts-in"
cleancache for btrfs. Filesystems must explicitly enable
cleancache by calling cleancache_init_fs anytime an instance
of the filesystem is mounted. Btrfs uses its own readpage
which must be hooked, but all other cleancache hooks are in
the VFS layer including the matching cleancache_flush_fs hook
which must be called on unmount.
Details and a FAQ can be found in Documentation/vm/cleancache.txt
[v6-v8: no changes]
[v5: jeremy@goop.org: simplify init hook and any future fs init changes]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Dan Magenheimer [Thu, 26 May 2011 16:01:49 +0000 (10:01 -0600)]
ext3: add cleancache support
This fifth patch of eight in this cleancache series "opts-in"
cleancache for ext3. Filesystems must explicitly enable
cleancache by calling cleancache_init_fs anytime an instance
of the filesystem is mounted. For ext3, all other cleancache
hooks are in the VFS layer including the matching cleancache_flush_fs
hook which must be called on unmount.
Details and a FAQ can be found in Documentation/vm/cleancache.txt
[v6-v8: no changes]
[v5: jeremy@goop.org: simplify init hook and any future fs init changes]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Dan Magenheimer [Thu, 26 May 2011 16:01:43 +0000 (10:01 -0600)]
mm/fs: add hooks to support cleancache
This fourth patch of eight in this cleancache series provides the
core hooks in VFS for: initializing cleancache per filesystem;
capturing clean pages reclaimed by page cache; attempting to get
pages from cleancache before filesystem read; and ensuring coherency
between pagecache, disk, and cleancache. Note that the placement
of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic
change was required due to a patchset in 2.6.39.
All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become
a check of a boolean global if CONFIG_CLEANCACHE is set but no
cleancache "backend" has claimed cleancache_ops.
Details and a FAQ can be found in Documentation/vm/cleancache.txt
[v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function]
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Dan Magenheimer [Thu, 26 May 2011 16:01:36 +0000 (10:01 -0600)]
mm: cleancache core ops functions and config
This third patch of eight in this cleancache series provides
the core code for cleancache that interfaces between the hooks in
VFS and individual filesystems and a cleancache backend. It also
includes build and config patches.
Two new files are added: mm/cleancache.c and include/linux/cleancache.h.
Note that CONFIG_CLEANCACHE can default to on; in systems that do
not provide a cleancache backend, all hooks devolve to a simple
check of a global enable flag, so performance impact should
be negligible but can be reduced to zero impact if config'ed off.
However for this first commit, it defaults to off.
Details and a FAQ can be found in Documentation/vm/cleancache.txt
Credits: Cleancache_ops design derived from Jeremy Fitzhardinge
design for tmem
[v8: dan.magenheimer@oracle.com: fix exportfs call affecting btrfs]
[v8: akpm@linux-foundation.org: use static inline function, not macro]
[v7: dan.magenheimer@oracle.com: cleanup sysfs and remove cleancache prefix]
[v6: JBeulich@novell.com: robustly handle buggy fs encode_fh actor definition]
[v5: jeremy@goop.org: clean up global usage and static var names]
[v5: jeremy@goop.org: simplify init hook and any future fs init changes]
[v5: hch@infradead.org: cleaner non-global interface for ops registration]
[v4: adilger@sun.com: interface must support exportfs FS's]
[v4: hch@infradead.org: interface must support 64-bit FS on 32-bit kernel]
[v3: akpm@linux-foundation.org: use one ops struct to avoid pointer hops]
[v3: akpm@linux-foundation.org: document and ensure PageLocked reqts are met]
[v3: ngupta@vflare.org: fix success/fail codes, change funcs to void]
[v2: viro@ZenIV.linux.org.uk: use sane types]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Acked-by: Jan Beulich <JBeulich@novell.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Dan Magenheimer [Thu, 26 May 2011 16:01:19 +0000 (10:01 -0600)]
fs: add field to superblock to support cleancache
This second patch of eight in this cleancache series adds a field to
the generic superblock to squirrel away a pool identifier that is
dynamically provided by cleancache-enabled filesystems at mount time
to uniquely identify files and pages belonging to this mounted filesystem.
Details and a FAQ can be found in Documentation/vm/cleancache.txt
[v8: trivial merge conflict update]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Dan Magenheimer [Thu, 26 May 2011 16:00:56 +0000 (10:00 -0600)]
mm/fs: cleancache documentation
This patchset introduces cleancache, an optional new feature exposed
by the VFS layer that potentially dramatically increases page cache
effectiveness for many workloads in many environments at a negligible
cost. It does this by providing an interface to transcendent memory,
which is memory/storage that is not otherwise visible to and/or directly
addressable by the kernel.
Instead of being discarded, hooks in the reclaim code "put" clean
pages to cleancache. Filesystems that "opt-in" may "get" pages
from cleancache that were previously put, but pages in cleancache are
"ephemeral", meaning they may disappear at any time. And the size
of cleancache is entirely dynamic and unknowable to the kernel.
Filesystems currently supported by this patchset include ext3, ext4,
btrfs, and ocfs2. Other filesystems (especially those built entirely
on VFS) should be easy to add, but should first be thoroughly tested to
ensure coherency.
Details and a FAQ are provided in Documentation/vm/cleancache.txt
This first patch of eight in this cleancache series only adds two
new documentation files.
[v8: minor documentation changes by author]
[v3: akpm@linux-foundation.org: document sysfs API]
[v3: hch@infradead.org: move detailed description to Documentation/vm]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Theodore Ts'o [Thu, 26 May 2011 13:53:09 +0000 (09:53 -0400)]
jbd2: Add MAINTAINERS entry
Create a separate MAINTAINERS entry for jbd2
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Sage Weil [Tue, 24 May 2011 20:06:22 +0000 (13:06 -0700)]
cifs: remove unnecessary dentry_unhash on rmdir/rename_dir
Cifs has no problems with lingering references to unlinked directory
inodes.
CC: Steve French <sfrench@samba.org>
CC: linux-cifs@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:21 +0000 (13:06 -0700)]
ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir
Ocfs2 has no issues with lingering references to unlinked directory inodes.
CC: Mark Fasheh <mfasheh@suse.com>
CC: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:20 +0000 (13:06 -0700)]
exofs: remove unnecessary dentry_unhash on rmdir/rename_dir
Exofs has no problems with lingering references to unlinked directory
inodes.
CC: Benny Halevy <bhalevy@panasas.com>
CC: osd-dev@open-osd.org
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:19 +0000 (13:06 -0700)]
nfs: remove unnecessary dentry_unhash on rmdir/rename_dir
NFS has no problems with lingering references to unlinked directory
inodes.
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: linux-nfs@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:18 +0000 (13:06 -0700)]
ext2: remove unnecessary dentry_unhash on rmdir/rename_dir
ext2 has no problems with lingering references to unlinked directory
inodes.
CC: Jan Kara <jack@suse.cz>
CC: linux-ext4@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:17 +0000 (13:06 -0700)]
ext3: remove unnecessary dentry_unhash on rmdir/rename_dir
ext3 has no problems with lingering references to unlinked directory
inodes.
CC: Jan Kara <jack@suse.cz>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Andreas Dilger <adilger.kernel@dilger.ca>
CC: linux-ext4@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:16 +0000 (13:06 -0700)]
ext4: remove unnecessary dentry_unhash on rmdir/rename_dir
ext4 has no problems with lingering references to unlinked directory
inodes.
CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Andreas Dilger <adilger.kernel@dilger.ca>
CC: linux-ext4@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:15 +0000 (13:06 -0700)]
btrfs: remove unnecessary dentry_unhash in rmdir/rename_dir
Btrfs has no problems with lingering references to unlinked directory
inodes.
CC: Chris Mason <chris.mason@oracle.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:14 +0000 (13:06 -0700)]
ceph: remove unnecessary dentry_unhash calls
Ceph does not need these, and they screw up our use of the dcache as a
consistent cache.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:13 +0000 (13:06 -0700)]
vfs: clean up vfs_rename_other
Simplify control flow to match vfs_rename_dir.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:12 +0000 (13:06 -0700)]
vfs: clean up vfs_rename_dir
Simplify control flow through vfs_rename_dir.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:11 +0000 (13:06 -0700)]
vfs: clean up vfs_rmdir
Simplify the control flow with an out label.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 24 May 2011 20:06:10 +0000 (13:06 -0700)]
vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems
vfs_rename_dir() doesn't properly account for filesystems with
FS_RENAME_DOES_D_MOVE. If new_dentry has a target inode attached, it
unhashes the new_dentry prior to the rename() iop and rehashes it after,
but doesn't account for the possibility that rename() may have swapped
{old,new}_dentry. For FS_RENAME_DOES_D_MOVE filesystems, it rehashes
new_dentry (now the old renamed-from name, which d_move() expected to go
away), such that a subsequent lookup will find it. Currently all
FS_RENAME_DOES_D_MOVE filesystems compensate for this by failing in
d_revalidate.
The bug was introduced by: commit
349457ccf2592c14bdf13b6706170ae2e94931b1
"[PATCH] Allow file systems to manually d_move() inside of ->rename()"
Fix by not rehashing the new dentry. Rehashing used to be needed by
d_move() but isn't anymore.
Reported-by: Sage Weil <sage@newdream.net>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:09 +0000 (13:06 -0700)]
libfs: drop unneeded dentry_unhash
There are no libfs issues with dangling references to empty directories.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:08 +0000 (13:06 -0700)]
vfs: update dentry_unhash() comment
The helper is now only called by file systems, not the VFS.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:07 +0000 (13:06 -0700)]
vfs: push dentry_unhash on rename_dir into file systems
Only a few file systems need this. Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:06 +0000 (13:06 -0700)]
vfs: push dentry_unhash on rmdir into file systems
Only a few file systems need this. Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.
This does not change behavior for any in-tree file systems.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:05 +0000 (13:06 -0700)]
vfs: remove dget() from dentry_unhash()
This serves no useful purpose that I can discern. All callers (rename,
rmdir) hold their own reference to the dentry.
A quick audit of all file systems showed no relevant checks on the value
of d_count in vfs_rmdir/vfs_rename_dir paths.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Tue, 24 May 2011 20:06:04 +0000 (13:06 -0700)]
vfs: dentry_unhash immediately prior to rmdir
This presumes that there is no reason to unhash a dentry if we fail because
it is a mountpoint or the LSM check fails, and that the LSM checks do not
depend on the dentry being unhashed.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jan Kara [Mon, 23 May 2011 22:23:35 +0000 (00:23 +0200)]
vfs: Block mmapped writes while the fs is frozen
We should not allow file modification via mmap while the filesystem is
frozen. So block in block_page_mkwrite() while the filesystem is frozen.
We cannot do the blocking wait in __block_page_mkwrite() since e.g. ext4
will want to call that function with transaction started in some cases
and that would deadlock. But we can at least do the non-blocking reliable
check in __block_page_mkwrite() which is the hardest part anyway.
We have to check for frozen filesystem with the page marked dirty and under
page lock with which we then return from ->page_mkwrite(). Only that way we
cannot race with writeback done by freezing code - either we mark the page
dirty after the writeback has started, see freezing in progress and block, or
writeback will wait for our page lock which is released only when the fault is
done and then writeback will writeout and writeprotect the page again.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jan Kara [Mon, 23 May 2011 22:23:34 +0000 (00:23 +0200)]
vfs: Create __block_page_mkwrite() helper passing error values back
Create __block_page_mkwrite() helper which does all what block_page_mkwrite()
does except that it passes back errors from __block_write_begin /
block_commit_write calls.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Roman Borisov [Wed, 25 May 2011 23:26:48 +0000 (16:26 -0700)]
fs/namespace.c: bound mount propagation fix
This issue was discovered by users of busybox. And the bug is actual for
busybox users, I don't know how it affects others. Apparently, mount is
called with and without MS_SILENT, and this affects mount() behaviour.
But MS_SILENT is only supposed to affect kernel logging verbosity.
The following script was run in an empty test directory:
mkdir -p mount.dir mount.shared1 mount.shared2
touch mount.dir/a mount.dir/b
mount -vv --bind mount.shared1 mount.shared1
mount -vv --make-rshared mount.shared1
mount -vv --bind mount.shared2 mount.shared2
mount -vv --make-rshared mount.shared2
mount -vv --bind mount.shared2 mount.shared1
mount -vv --bind mount.dir mount.shared2
ls -R mount.dir mount.shared1 mount.shared2
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
rm -f mount.dir/a mount.dir/b mount.dir/c
rmdir mount.dir mount.shared1 mount.shared2
mount -vv was used to show the mount() call arguments and result.
Output shows that flag argument has 0x00008000 = MS_SILENT bit:
mount: mount('mount.shared1','mount.shared1','(null)',0x00009000,'(null)'):0
mount: mount('','mount.shared1','',0x0010c000,''):0
mount: mount('mount.shared2','mount.shared2','(null)',0x00009000,'(null)'):0
mount: mount('','mount.shared2','',0x0010c000,''):0
mount: mount('mount.shared2','mount.shared1','(null)',0x00009000,'(null)'):0
mount: mount('mount.dir','mount.shared2','(null)',0x00009000,'(null)'):0
mount.dir:
a
b
mount.shared1:
mount.shared2:
a
b
After adding --loud option to remove MS_SILENT bit from just one mount cmd:
mkdir -p mount.dir mount.shared1 mount.shared2
touch mount.dir/a mount.dir/b
mount -vv --bind mount.shared1 mount.shared1 2>&1
mount -vv --make-rshared mount.shared1 2>&1
mount -vv --bind mount.shared2 mount.shared2 2>&1
mount -vv --loud --make-rshared mount.shared2 2>&1 # <-HERE
mount -vv --bind mount.shared2 mount.shared1 2>&1
mount -vv --bind mount.dir mount.shared2 2>&1
ls -R mount.dir mount.shared1 mount.shared2 2>&1
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
rm -f mount.dir/a mount.dir/b mount.dir/c
rmdir mount.dir mount.shared1 mount.shared2
The result is different now - look closely at mount.shared1 directory listing.
Now it does show files 'a' and 'b':
mount: mount('mount.shared1','mount.shared1','(null)',0x00009000,'(null)'):0
mount: mount('','mount.shared1','',0x0010c000,''):0
mount: mount('mount.shared2','mount.shared2','(null)',0x00009000,'(null)'):0
mount: mount('','mount.shared2','',0x00104000,''):0
mount: mount('mount.shared2','mount.shared1','(null)',0x00009000,'(null)'):0
mount: mount('mount.dir','mount.shared2','(null)',0x00009000,'(null)'):0
mount.dir:
a
b
mount.shared1:
a
b
mount.shared2:
a
b
The analysis shows that MS_SILENT flag which is ON by default in any
busybox-> mount operations cames to flags_to_propagation_type function and
causes the error return while is_power_of_2 checking because the function
expects only one bit set. This doesn't allow to do busybox->mount with
any --make-[r]shared, --make-[r]private etc options.
Moreover, the recently added flags_to_propagation_type() function doesn't
allow us to do such operations as --make-[r]private --make-[r]shared etc.
when MS_SILENT is on. The idea or clearing the MS_SILENT flag came from
to Denys Vlasenko.
Signed-off-by: Roman Borisov <ext-roman.borisov@nokia.com>
Reported-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Alexander Shishkin <virtuoso@slind.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jonas Gorski [Tue, 24 May 2011 18:12:08 +0000 (20:12 +0200)]
exportfs: reallow building as a module
Commit
990d6c2d7aee921e3bce22b2d6a750fd552262be ("vfs: Add name to file
handle conversion support") changed EXPORTFS to be a bool.
This was needed for earlier revisions of the original patch, but the actual
commit put the code needing it into its own file that only gets compiled
when FHANDLE is selected which in turn selects EXPORTFS.
So EXPORTFS can be safely compiled as a module when not selecting FHANDLE.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 25 Mar 2011 15:00:12 +0000 (11:00 -0400)]
merge handle_reval_dot and nameidata_drop_rcu_last
new helper: complete_walk(). Done on successful completion
of walk, drops out of RCU mode, does d_revalidate of final
result if that hadn't been done already.
handle_reval_dot() and nameidata_drop_rcu_last() subsumed into
that one; callers converted to use of complete_walk().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 25 Mar 2011 14:32:48 +0000 (10:32 -0400)]
consolidate nameidata_..._drop_rcu()
Merge these into a single function (unlazy_walk(nd, dentry)),
kill ..._maybe variants
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Rupjyoti Sarmah [Tue, 29 Mar 2011 23:10:24 +0000 (23:10 +0000)]
powerpc/4xx: Adding PCIe MSI support
This patch adds MSI support for 440SPe, 460Ex, 460Sx and 405Ex.
Signed-off-by: Rupjyoti Sarmah <rsarmah@apm.com>
Signed-off-by: Tirumala R Marri <tmarri@apm.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joel Becker [Thu, 26 May 2011 04:51:55 +0000 (21:51 -0700)]
Merge branch 'move_extents' of git://oss.oracle.com/git/tye/linux-2.6 into ocfs2-merge-window
Conflicts:
fs/ocfs2/ioctl.c
Tristan Ye [Mon, 23 May 2011 07:57:26 +0000 (15:57 +0800)]
Ocfs2: Teach local-mounted ocfs2 to handle unwritten_extents correctly.
Oops, local-mounted of 'ocfs2_fops_no_plocks' is just missing the support
of unwritten_extents/punching-hole due to no func pointer was given correctly
to '.follocate' field.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Sunil Mushran [Thu, 19 May 2011 21:34:12 +0000 (14:34 -0700)]
ocfs2/dlm: Do not migrate resource to a node that is leaving the domain
During dlm domain shutdown, o2dlm has to free all the lock resources. Ones that
have no locks and references are freed. Ones that have locks and/or references
are migrated to another node.
The first task in migration is finding a target. Currently we scan the lock
resource and find one node that either has a lock or a reference. This is not
very efficient in a parallel umount case as we might end up migrating the
lock resource to a node which itself may have to migrate it to a third node.
The patch scans the dlm->exit_domain_map to ensure the target node is not
leaving the domain. If no valid target node is found, o2dlm does not migrate
the resource but instead waits for the unlock and deref messages that will
allow it to free the resource.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Sunil Mushran [Thu, 19 May 2011 21:34:11 +0000 (14:34 -0700)]
ocfs2/dlm: Add new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG
This patch adds a new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG and ups the dlm
protocol to 1.2.
o2dlm sends this new message in dlm_unregister_domain() to mark the beginning
of the exit domain. This message is sent to all nodes in the domain.
Currently o2dlm has no way of informing other nodes of its impending exit.
This information is useful as the other nodes could disregard the exiting
node in certain operations. For example, in resource migration. If two or
more nodes were umounting in parallel, it would be more efficient if o2dlm
were to choose a non-exiting node to be the new master node rather than an
exiting one.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Milton Miller [Tue, 24 May 2011 20:34:18 +0000 (20:34 +0000)]
powerpc: Fix irq_free_virt by adjusting bounds before loop
Instead of looping over each irq and checking against the irq array
bounds, adjust the bounds before looping.
The old code will not free any irq if the irq + count is above
irq_virq_count because the test in the loop is testing irq + count
instead of irq + i.
This code checks the limits to avoid unsigned integer overflows.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 24 May 2011 20:34:18 +0000 (20:34 +0000)]
powerpc/irq: Protect irq_radix_revmap_lookup against irq_free_virt
The radix-tree code uses call_rcu when freeing internal elements.
We must protect against the elements being freed while we traverse
the tree, even if the returned pointer will still be valid.
While preparing a patch to expand the context in which
irq_radix_revmap_lookup will be called, I realized that the
radix tree was not locked.
When asked
For a normal call_rcu usage, is it allowed to read the structure in
irq_enter / irq_exit, without additional rcu_read_lock? Could an
element freed with call_rcu advance with the cpu still between
irq_enter/irq_exit (and irq_disabled())?
Paul McKenney replied:
Absolutely illegal to do so. OK for call_rcu_sched(), but a
flaming bug for call_rcu().
And thank you very much for finding this!!!
Further analysis:
In the current CONFIG_TREE_RCU implementation. CONFIG_TREE_PREEMPT_RCU
(and CONFIG_TINY_PREEMPT_RCU) uses explicit counters.
These counters are reflected from per-CPU to global in the
scheduling-clock-interrupt handler, so disabling irq does prevent the
grace period from completing. But there are real-time implementations
(such as the one use by the Concurrent guys) where disabling irq
does -not- prevent the grace period from completing.
While an alternative fix would be to switch radix-tree to rcu_sched, I
don't want to audit the other users of radix trees (nor put alternative
freeing in the library). The normal overhead for rcu_read_lock and
unlock are a local counter increment and decrement.
This does not show up in the rcu lockdep because in 2.6.34 commit
2676a58c98 (radix-tree: Disable RCU lockdep checking in radix tree)
deemed it too hard to pass the condition of the protecting lock
to the library.
Signed-off-by: Milton Miller <miltonm@bga.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 24 May 2011 20:34:18 +0000 (20:34 +0000)]
powerpc/irq: Check desc in handle_one_irq and expand generic_handle_irq
Look up the descriptor and check that it is found in handle_one_irq
before checking if we are on the irq stack, and call the handler
directly using the descriptor if we are on the stack.
We need check irq_to_desc finds the descriptor to avoid a NULL
pointer dereference. It could have failed because the number from
ppc_md.get_irq was above NR_IRQS, or various exceptional conditions
with sparse irqs (eg race conditions while freeing an irq if its was
not shutdown in the controller).
fe12bc2c99 (genirq: Uninline and sanity check generic_handle_irq())
moved generic_handle_irq out of line to allow its use by interrupt
controllers in modules. However, handle_one_irq is core arch code.
It already knows the details of struct irq_desc and handling irqs in
the nested irq case. This will avoid the extra stack frame to return
the value we don't check.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 24 May 2011 20:34:17 +0000 (20:34 +0000)]
powerpc/irq: Always free duplicate IRQ_LEGACY hosts
Since kmem caches are allocated before init_IRQ as noted in
3af259d155
(powerpc: Radix trees are available before init_IRQ), we now call
kmalloc in all cases and can can always call kfree if we are asked
to allocate a duplicate or conflicting IRQ_HOST_MAP_LEGACY host.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 24 May 2011 20:34:18 +0000 (20:34 +0000)]
powerpc/irq: Remove stale and misleading comment
The comment claims we will call host->ops->map() to update the flags if
we find a previously established mapping, but we never did. We used
to call remap, but that call was removed in
da05198002 (powerpc: Remove
irq_host_ops->remap hook).
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 24 May 2011 20:34:18 +0000 (20:34 +0000)]
powerpc/cell: Rename ipi functions to match current abstractions
Rename functions and arguments to reflect current usage. iic_cause_ipi
becomes iic_message_pass and iic_ipi_to_irq becomes iic_msg_to_irq,
and iic_request_ipi now takes a message (msg) instead of an ipi number.
Also mesg is renamed to msg.
Commit
f1072939b6 (powerpc: Remove checks for MSG_ALL and
MSG_ALL_BUT_SELF) connected the smp_message_pass hook for cell to the
underlying iic_cause_IPI, a platform unique name. Later
23d72bfd8f
(powerpc: Consolidate ipi message mux and demux) added a cause_ipi
hook to the smp_ops, also used in message passing, but for controllers
that can not send 4 unique messages and require multiplexing. It is
even more confusing that the both take two arguments, but one is the
small message ordinal and the other is an opaque long data associated
with the cpu.
Since cell iic maps messages one to one to ipi irqs, rename the
function and argument to translate from ipi to message. Also make it
clear that iic_request_ipi takes a message number as the argument
for which ipi to create and request.
No functionional change, just renames to avoid future confusion.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 24 May 2011 20:34:18 +0000 (20:34 +0000)]
powerpc/cell: Use common smp ipi actions
The cell iic interrupt controller has enough software caused interrupts
to use a unique interrupt for each of the 4 messages powerpc uses.
This means each interrupt gets its own irq action/data combination.
Use the seperate, optimized, arch common ipi action functions
registered via the helper smp_request_message_ipi instead passing the
message as action data to a single action that then demultipexes to
the required acton via a switch statement.
smp_request_message_ipi will register the action as IRQF_PER_CPU
and IRQF_DISABLED, and WARN if the allocation fails for some reason,
so no need to print on that failure. It will return positive if
the message will not be used by the kernel, in which case we can
free the virq.
In addition to elimiating inefficient code, this also corrects the
error that a kernel built with kexec but without a debugger would
not register the ipi for kdump to notify the other cpus of a crash.
This also restores the debugger action to be static to kernel/smp.c.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Milton Miller [Tue, 10 May 2011 19:29:13 +0000 (19:29 +0000)]
Remove unused MSG_ flags in linux/smp.h
Now that powerpc has removed its use of MSG_ALL_BUT_SELF and MSG_ALL
all these MSG_ flags are unused.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Brian King [Tue, 24 May 2011 03:40:54 +0000 (03:40 +0000)]
powerpc/pseries: Update MAX_HCALL_OPCODE to reflect page coalescing
When page coalescing support was added recently, the MAX_HCALL_OPCODE
define was not updated for the newly added H_GET_MPP_X hcall.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Eric B Munson [Mon, 23 May 2011 04:22:40 +0000 (04:22 +0000)]
powerpc/oprofile: Handle events that raise an exception without overflowing
Commit
0837e3242c73566fc1c0196b4ec61779c25ffc93 fixes a situation on POWER7
where events can roll back if a specualtive event doesn't actually complete.
This can raise a performance monitor exception. We need to catch this to ensure
that we reset the PMC. In all cases the PMC will be less than 256 cycles from
overflow.
This patch lifts Anton's fix for the problem in perf and applies it to oprofile
as well.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Cc: <stable@kernel.org> # as far back as it applies cleanly
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ian Munsie [Wed, 2 Feb 2011 17:27:24 +0000 (17:27 +0000)]
powerpc/ftrace: Implement raw syscall tracepoints on PowerPC
This patch implements the raw syscall tracepoints on PowerPC and exports
them for ftrace syscalls to use.
To minimise reworking existing code, I slightly re-ordered the thread
info flags such that the new TIF_SYSCALL_TRACEPOINT bit would still fit
within the 16 bits of the andi. instruction's UI field. The instructions
in question are in /arch/powerpc/kernel/entry_{32,64}.S to and the
_TIF_SYSCALL_T_OR_A with the thread flags to see if system call tracing
is enabled.
In the case of 64bit PowerPC, arch_syscall_addr and
arch_syscall_match_sym_name are overridden to allow ftrace syscalls to
work given the unusual system call table structure and symbol names that
start with a period.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Eric Paris [Thu, 26 May 2011 02:49:18 +0000 (19:49 -0700)]
tmpfs: fix XATTR N overriding POSIX_ACL Y
Choosing TMPFS_XATTR default N was switching off TMPFS_POSIX_ACL,
even if it had been Y in oldconfig; and Linus reports that PulseAudio
goes subtly wrong unless it can use ACLs on /dev/shm.
Make TMPFS_POSIX_ACL select TMPFS_XATTR (and depend upon TMPFS),
and move the TMPFS_POSIX_ACL entry before the TMPFS_XATTR entry,
to avoid asking unnecessary questions then ignoring their answers.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Rothwell [Thu, 26 May 2011 01:09:10 +0000 (11:09 +1000)]
video: mb862xx: udelay need linux/delay.h
Fix this:
drivers/video/mb862xx/mb862xx-i2c.c: In function 'mb862xx_i2c_wait_event':
drivers/video/mb862xx/mb862xx-i2c.c:25: error: implicit declaration of function 'udelay'
caused by commit
f8a6b1f44833 ("video: mb862xx: add support for
controller's I2C bus adapter").
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 26 May 2011 01:10:16 +0000 (18:10 -0700)]
Merge git://git./linux/kernel/git/ebiederm/linux-2.6-nsfd
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd:
net: fix get_net_ns_by_fd for !CONFIG_NET_NS
ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry.
ns: Declare sys_setns in syscalls.h
net: Allow setting the network namespace by fd
ns proc: Add support for the ipc namespace
ns proc: Add support for the uts namespace
ns proc: Add support for the network namespace.
ns: Introduce the setns syscall
ns: proc files for namespace naming policy.
Linus Torvalds [Thu, 26 May 2011 01:06:54 +0000 (18:06 -0700)]
slub: remove no-longer used 'unlock_out' label
Commit
a71ae47a2cbf ("slub: Fix double bit unlock in debug mode")
removed the only goto to this label, resulting in
mm/slub.c: In function '__slab_alloc':
mm/slub.c:1834: warning: label 'unlock_out' defined but not used
fixed trivially by the removal of the label itself too.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 26 May 2011 00:00:17 +0000 (17:00 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (89 commits)
bonding: documentation and code cleanup for resend_igmp
bonding: prevent deadlock on slave store with alb mode (v3)
net: hold rtnl again in dump callbacks
Add Fujitsu 1000base-SX PCI ID to tg3
bnx2x: protect sequence increment with mutex
sch_sfq: fix peek() implementation
isdn: netjet - blacklist Digium TDM400P
via-velocity: don't annotate MAC registers as packed
xen: netfront: hold RTNL when updating features.
sctp: fix memory leak of the ASCONF queue when free asoc
net: make dev_disable_lro use physical device if passed a vlan dev (v2)
net: move is_vlan_dev into public header file (v2)
bug.h: Fix build with CONFIG_PRINTK disabled.
wireless: fix fatal kernel-doc error + warning in mac80211.h
wireless: fix cfg80211.h new kernel-doc warnings
iwlagn: dbg_fixed_rate only used when CONFIG_MAC80211_DEBUGFS enabled
dst: catch uninitialized metrics
be2net: hash key for rss-config cmd not set
bridge: initialize fake_rtable metrics
net: fix __dst_destroy_metrics_generic()
...
Fix up trivial conflicts in drivers/staging/brcm80211/brcmfmac/wl_cfg80211.c
Linus Torvalds [Wed, 25 May 2011 23:55:55 +0000 (16:55 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/cjb/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (75 commits)
mmc: core: eMMC bus width may not work on all platforms
mmc: sdhci: Auto-CMD23 fixes.
mmc: sdhci: Auto-CMD23 support.
mmc: core: Block CMD23 support for UHS104/SDXC cards.
mmc: sdhci: Implement MMC_CAP_CMD23 for SDHCI.
mmc: core: Use CMD23 for multiblock transfers when we can.
mmc: quirks: Add/remove quirks conditional support.
mmc: Add new VUB300 USB-to-SD/SDIO/MMC driver
mmc: sdhci-pxa: Add quirks for DMA/ADMA to match h/w
mmc: core: duplicated trial with same freq in mmc_rescan_try_freq()
mmc: core: add support for eMMC Dual Data Rate
mmc: core: eMMC signal voltage does not use CMD11
mmc: sdhci-pxa: add platform code for UHS signaling
mmc: sdhci: add hooks for setting UHS in platform specific code
mmc: core: clear MMC_PM_KEEP_POWER flag on resume
mmc: dw_mmc: fixed wrong regulator_enable in suspend/resume
mmc: sdhi: allow powering down controller with no card inserted
mmc: tmio: runtime suspend the controller, where possible
mmc: sdhi: support up to 3 interrupt sources
mmc: sdhi: print physical base address and clock rate
...
Linus Torvalds [Wed, 25 May 2011 23:54:01 +0000 (16:54 -0700)]
Merge branch 'kconfig-for-40' of git://git./linux/kernel/git/mmarek/kbuild-2.6
* 'kconfig-for-40' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
xconfig: merge code path to conf_write()
kconfig: do not record timestamp in .config
gconfig: Hide unused left treeview when start up the interface
gconfig: enable rules hint for main treeviews
MAINTAINERS: Update KCONFIG entry
kconfig-language: add to hints
kconfig: Document the new "visible if" syntax
kconfig: quiet commands when V=0
kconfig: change update-po-config to reflect new layout of arch/um
kconfig: make update-po-config work in KBUILD_OUTPUT
kconfig: rearrange clean-files
kconfig: change gconf to modify hostprogs-y like nconf and mconf
kconfig: change qconf to modify hostprogs-y like nconf and mconf
kconfig: only build kxgettext when needed
nconfig: Silence unused return values from wattrset
kconfig: Do not record timestamp in auto.conf and autoconf.h
kconfig: get rid of unused flags
kconfig: allow multiple inclusion of the same file
kconfig: Avoid buffer underrun in choice input
Linus Torvalds [Wed, 25 May 2011 23:53:14 +0000 (16:53 -0700)]
Merge branch 'for-2.6.40' of git://git./linux/kernel/git/oleg/misc
* 'for-2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc:
signal: sys_pause() should check signal_pending()
ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread
Linus Torvalds [Wed, 25 May 2011 23:52:50 +0000 (16:52 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: New driver for the SMSC EMC6W201
hwmon: (abituguru) Depend on DMI
hwmon: (it87) Use request_muxed_region
hwmon: (sch5627) Trigger Vbat measurements
hwmon: (sch5627) Add sch5627_send_cmd function
i8k: Integrate with the hwmon subsystem
hwmon: (max6650) Properly support the MAX6650
hwmon: (max6650) Drop device detection
Move ACPI power meter driver to hwmon
hwmon: (f71882fg) Add support for F71808A
hwmon: (f71882fg) Split has_beep in fan_has_beep and temp_has_beep
hwmon: (asc7621) Drop duplicate dependency
hwmon: (jc42) Change detection class
hwmon: Add driver for AMD family 15h processor power information
hwmon: (k10temp) Add support for Fam15h (Bulldozer)
hwmon: Use helper functions to set and get driver data
i8k: Avoid lahf in 64-bit code
Linus Torvalds [Wed, 25 May 2011 22:35:32 +0000 (15:35 -0700)]
Merge git://git./linux/kernel/git/cmetcalf/linux-tile
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (26 commits)
arch/tile: prefer "tilepro" as the name of the 32-bit architecture
compat: include aio_abi.h for aio_context_t
arch/tile: cleanups for tilegx compat mode
arch/tile: allocate PCI IRQs later in boot
arch/tile: support signal "exception-trace" hook
arch/tile: use better definitions of xchg() and cmpxchg()
include/linux/compat.h: coding-style fixes
tile: add an RTC driver for the Tilera hypervisor
arch/tile: finish enabling support for TILE-Gx 64-bit chip
compat: fixes to allow working with tile arch
arch/tile: update defconfig file to something more useful
tile: do_hardwall_trap: do not play with task->sighand
tile: replace mm->cpu_vm_mask with mm_cpumask()
tile,mn10300: add device parameter to dma_cache_sync()
audit: support the "standard" <asm-generic/unistd.h>
arch/tile: clarify flush_buffer()/finv_buffer() function names
arch/tile: kernel-related cleanups from removing static page size
arch/tile: various header improvements for building drivers
arch/tile: disable GX prefetcher during cache flush
arch/tile: tolerate disabling CONFIG_BLK_DEV_INITRD
...
Linus Torvalds [Wed, 25 May 2011 22:35:03 +0000 (15:35 -0700)]
Merge branch 'for-torvalds' of git://git./linux/kernel/git/linusw/linux-stericsson
* 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
mach-ux500: voltage domain regulators for DB8500
cpufreq: make DB8500 cpufreq driver compile
cpufreq: update DB8500 cpufreq driver
mach-ux500: move CPUfreq driver to cpufreq subsystem
mfd: add DB5500 PRCMU driver
mfd: update DB8500 PRCMU driver
mach-ux500: move the DB8500 PRCMU driver to MFD
mach-ux500: make PRCMU base address dynamic
mach-ux500: rename PRCMU driver per SoC
mach-ux500: update ASIC version detection
mach-ux500: update SoC and board IRQ handling
mach-ux500: update the DB5500 register file
mach-ux500: update the DB8500 register file
Linus Torvalds [Wed, 25 May 2011 22:34:14 +0000 (15:34 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/vapier/blackfin
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: (37 commits)
Blackfin: use new common PERCPU_INPUT define
MAINTAINERS: Fix Analog Devices mailinglist address
Blackfin: boards: update ASoC resources after machine driver overhaul
Blackfin: work around anomaly
05000480
Blackfin: fix addr type with bfin_write_{or,and} helpers
Blackfin: convert /proc/sram to seq_file
Blackfin: switch /proc/gpio to seq_file
Blackfin: fix indentation with bfin_read() helper
Blackfin: convert old cpumask API to new one
Blackfin: don't touch task->cpus_allowed directly
Blackfin: don't touch cpu_possible_map and cpu_present_map directly
Blackfin: bf548-ezkit/bf561-ezkit: update nor flash layout
Blackfin: initial perf_event support
Blackfin: update anomaly lists to latest public info
Blackfin: use on-chip reset func with newer parts
Blackfin: bf533-stamp/bf537-stamp: drop ad1980 from defconfigs
Blackfin: optimize MMR reads during startup a bit
Blackfin: bf537: demux port H mask A and emac rx ints
Blackfin: bf537: fix excessive gpio int demuxing
Blackfin: bf54x: drop unused pm gpio handling
...
Linus Torvalds [Wed, 25 May 2011 22:33:25 +0000 (15:33 -0700)]
Merge branch 'rmobile-latest' of git://git./linux/kernel/git/lethal/sh-2.6
* 'rmobile-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (34 commits)
ARM: mach-shmobile: mackerel: add renesas_usbhs support for USB1
ARM: mach-shmobile: Correct the G4EVM SDHI0 I/O range.
ARM: arch-shmobile: sh7372: add renesas_usbhs irq support
ARM: mach-shmobile: sh73a0: mark DMA slave ID 0 as invalid
ARM: mach-shmobile: mark DMA slave ID 0 as invalid
ARM: mach-shmobile: Enable DMAEngine for SDHI on AG5EVM
ARM: mach-shmobile: Enable DMAEngine for MMCIF on AG5EVM
ARM: mach-shmobile: sh73a0 DMA Engine support for SY-DMAC
dmaengine: shdma: Update SH_DMAC_MAX_CHANNELS to 20
dmaengine: shdma: Fix SH_DMAC_MAX_CHANNELS handling
dmaengine: shdma: Make second memory window optional
ARM: mach-shmobile: Tidy up after SH7372 pm changes.
ARM: mach-shmobile: sh7372 Core Standby CPUIdle
ARM: mach-shmobile: CPUIdle support
ARM: mach-shmobile: sh7372 Core Standby Suspend-to-RAM
ARM: mach-shmobile: Suspend-to-RAM support
mailmap: Add entry for Damian Hobson-Garcia.
ARM: switch mackerel to dynamically manage the platform camera
ARM: mach-shmobile: Add SDHI support for AG5EVM and sh73a0
ARM: arch-shmobile: Use multiple irq vectors for SDHI
...
Thomas Gleixner [Wed, 25 May 2011 21:08:17 +0000 (23:08 +0200)]
hrtimers: Fix typo causing erratic timers
commit
9ec2690758a5 ("timerfd: Manage cancelable timers in timerfd")
introduced a CONFIG_HIGHRES_TIMERS (should be CONFIG_HIGH_RES_TIMERS)
typo, which caused applications depending on CLOCK_REALTIME timers to
become sluggy due to the fact that the time base of the realtime
timers was not updated when the wall clock time was set.
This causes anything from 100% CPU use for some applications to odd
delays and hickups.
Reported-bisected-and-tested-by: Anca Emanuel <anca.emanuel@gmail.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Fatfingered-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Flavio Leitner [Wed, 25 May 2011 08:38:58 +0000 (08:38 +0000)]
bonding: documentation and code cleanup for resend_igmp
Improves the documentation about how IGMP resend parameter
works, fix two missing checks and coding style issues.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Wed, 25 May 2011 08:13:01 +0000 (08:13 +0000)]
bonding: prevent deadlock on slave store with alb mode (v3)
This soft lockup was recently reported:
[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<
ffffffff80064bf0>] [<
ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:
ffff810113167da8 EFLAGS:
00000286
RAX:
ffff810113167fd8 RBX:
ffff810123a47800 RCX:
0000000000ff1025
RDX:
0000000000000000 RSI:
ffff810123a47800 RDI:
ffff81021b57f6f8
RBP:
ffff81021b57f500 R08:
0000000000000000 R09:
000000000000000c
R10:
00000000ffffffff R11:
ffff81011d41c000 R12:
ffff81021b57f000
R13:
0000000000000000 R14:
0000000000000282 R15:
0000000000000282
FS:
00002b3b41ef3f50(0000) GS:
ffff810123b27940(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
00002b3b456dd000 CR3:
000000031fc60000 CR4:
00000000000006e0
Call Trace:
[<
ffffffff80064af9>] _spin_lock_bh+0x9/0x14
[<
ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
[<
ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
[<
ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
[<
ffffffff8006457b>] __down_write_nested+0x12/0x92
[<
ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
[<
ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
[<
ffffffff80016b87>] vfs_write+0xce/0x174
[<
ffffffff80017450>] sys_write+0x45/0x6e
[<
ffffffff8005d28d>] tracesys+0xd5/0xe0
It occurs because we are able to change the slave configuarion of a bond while
the bond interface is down. The bonding driver initializes some data structures
only after its ndo_open routine is called. Among them is the initalization of
the alb tx and rx hash locks. So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in our above softlock.
Note that sometimes this works, because in many cases an unlocked spinlock has
the raw_lock parameter initialized to zero (meaning that the kzalloc of the
net_device private data is equivalent to calling spin_lock_init), but thats not
true in all cases, and we aren't guaranteed that condition, so we need to pass
the relevant spinlocks through the spin_lock_init function.
Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
the ndo_init path, so they are ready for use by the bond_store_slaves path.
Change notes:
v2) Based on conversation with Jay and Nicolas it seems that the ability to
enslave devices while the bond master is down should be safe to do. As such
this is an outlier bug, and so instead we'll just initalize the errant spinlocks
in the init path rather than the open path, solving the problem. We'll also
remove the warnings about the bond being down during enslave operations, since
it should be safe
v3) Fix spelling error
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: jtluka@redhat.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: nicolas.2p.debian@gmail.com
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 25 May 2011 07:34:04 +0000 (07:34 +0000)]
net: hold rtnl again in dump callbacks
Commit
e67f88dd12f6 (dont hold rtnl mutex during netlink dump callbacks)
missed fact that rtnl_fill_ifinfo() must be called with rtnl held.
Because of possible deadlocks between two mutexes (cb_mutex and rtnl),
its not easy to solve this problem, so revert this part of the patch.
It also forgot one rcu_read_unlock() in FIB dump_rules()
Add one ASSERT_RTNL() in rtnl_fill_ifinfo() to remind us the rule.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Meelis Roos [Wed, 25 May 2011 05:43:47 +0000 (05:43 +0000)]
Add Fujitsu 1000base-SX PCI ID to tg3
This patch adds the PCI ID of Fujitsu 1000base-SX NIC to tg3 driver.
Tested to detect the card, MAC and serdes, not tested with link at the
moment since I have no fiber switch here. I did not add new constants to
the pci_ids.h header file since these constants are used only here.
Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Wed, 25 May 2011 04:55:51 +0000 (04:55 +0000)]
bnx2x: protect sequence increment with mutex
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 25 May 2011 04:40:11 +0000 (04:40 +0000)]
sch_sfq: fix peek() implementation
Since commit
eeaeb068f139 (sch_sfq: allow big packets and be fair),
sfq_peek() can return a different skb that would be normally dequeued by
sfq_dequeue() [ if current slot->allot is negative ]
Use generic qdisc_peek_dequeued() instead of custom implementation, to
get consistent result.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prarit Bhargava [Wed, 25 May 2011 02:12:23 +0000 (02:12 +0000)]
isdn: netjet - blacklist Digium TDM400P
[2nd try ... 1st attempt didn't make it to netdev mailing list]
A quick google search reveals that people with this card are blacklisting it
in the initramfs and in the module blacklist based on a statement that it
is unsupported. Since the older Digium is also unsupported I'm pretty
confident that this newer card is also not supported.
lspci -xxx -vv shows
04:07.0 Communication controller: Tiger Jet Network Inc. Tiger3XX Modem/ISDN interface
Subsystem: Device b100:0003
P.
----8<----
The Asterisk Voice Card, DIGIUM TDM400P is unsupported by the netjet driver.
Blacklist it like the Digium X100P/X101P card.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ulrich Hecht [Wed, 25 May 2011 01:07:22 +0000 (01:07 +0000)]
via-velocity: don't annotate MAC registers as packed
On ARM, memory accesses through packed pointers behave in unexpected
ways in GCC releases 4.3 and higher; see https://lkml.org/lkml/2011/2/2/163
for discussion.
In this particular case, 32-bit I/O registers are accessed bytewise,
causing incorrect setting of the DMA address registers which in turn
leads to an error interrupt storm that brings the system to a halt.
Since the mac_regs structure does not need any packing anyway, this patch
simply removes the attribute to fix the issue.
Signed-off-by: Ulrich Hecht <uli@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Tue, 24 May 2011 21:56:02 +0000 (21:56 +0000)]
xen: netfront: hold RTNL when updating features.
Konrad reports:
[ 0.930811] RTNL: assertion failed at /home/konrad/ssd/linux/net/core/dev.c (5258)
[ 0.930821] Pid: 22, comm: xenwatch Not tainted 2.6.39-05193-gd762f43 #1
[ 0.930825] Call Trace:
[ 0.930834] [<
ffffffff8143bd0e>] __netdev_update_features+0xae/0xe0
[ 0.930840] [<
ffffffff8143dd41>] netdev_update_features+0x11/0x30
[ 0.930847] [<
ffffffffa0037105>] netback_changed+0x4e5/0x800 [xen_netfront]
[ 0.930854] [<
ffffffff8132a838>] xenbus_otherend_changed+0xa8/0xb0
[ 0.930860] [<
ffffffff8157ca99>] ? _raw_spin_unlock_irqrestore+0x19/0x20
[ 0.930866] [<
ffffffff8132adfe>] backend_changed+0xe/0x10
[ 0.930871] [<
ffffffff8132875a>] xenwatch_thread+0xba/0x180
[ 0.930876] [<
ffffffff810a8ba0>] ? wake_up_bit+0x40/0x40
[ 0.930881] [<
ffffffff813286a0>] ? split+0xf0/0xf0
[ 0.930886] [<
ffffffff810a8646>] kthread+0x96/0xa0
[ 0.930891] [<
ffffffff815855a4>] kernel_thread_helper+0x4/0x10
[ 0.930896] [<
ffffffff815846b3>] ? int_ret_from_sys_call+0x7/0x1b
[ 0.930901] [<
ffffffff8157cf61>] ? retint_restore_args+0x5/0x6
[ 0.930906] [<
ffffffff815855a0>] ? gs_change+0x13/0x13
This update happens in xenbus watch callback context and hence does not already
hold the rtnl. Take the lock as necessary.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 24 May 2011 21:48:02 +0000 (21:48 +0000)]
sctp: fix memory leak of the ASCONF queue when free asoc
If an ASCONF chunk is outstanding, then the following ASCONF
chunk will be queued for later transmission. But when we free
the asoc, we forget to free the ASCONF queue at the same time,
this will cause memory leak.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Tue, 24 May 2011 08:31:09 +0000 (08:31 +0000)]
net: make dev_disable_lro use physical device if passed a vlan dev (v2)
If the device passed into dev_disable_lro is a vlan, then repoint the dev
poniter so that we actually modify the underlying physical device.
Signed-of-by: Neil Horman <nhorman@tuxdriver.com>
CC: davem@davemloft.net
CC: bhutchings@solarflare.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Tue, 24 May 2011 08:31:08 +0000 (08:31 +0000)]
net: move is_vlan_dev into public header file (v2)
Migrate is_vlan_dev() to if_vlan.h so that core networkig can use it
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: davem@davemloft.net
CC: bhutchings@solarflare.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Ding Dinghua [Wed, 25 May 2011 21:43:48 +0000 (17:43 -0400)]
jbd2: fix a potential leak of a journal_head on an error path
drop jh->b_jcount in error path
Signed-off-by: Ding Dinghua <dingdinghua@nrchpc.ac.cn>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Yongqiang Yang [Wed, 25 May 2011 21:41:48 +0000 (17:41 -0400)]
ext4: teach ext4_ext_split to calculate extents efficiently
Make ext4_ext_split() get extents to be moved by calculating in a statement
instead of counting in a loop.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Wed, 25 May 2011 21:39:48 +0000 (17:39 -0400)]
ext4: Convert ext4 to new truncate calling convention
Trivial conversion. Fixup one error handling case calling vmtruncate()
and remove ->truncate callback. We also fix a bug that IS_IMMUTABLE and
IS_APPEND files could not be truncated during failed writes. In fact, the
test can be completely removed as upper layers do necessary permission
checks for truncate in do_sys_[f]truncate() and may_open() anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Philip Rakity [Wed, 25 May 2011 01:14:58 +0000 (18:14 -0700)]
mmc: core: eMMC bus width may not work on all platforms
CMD19 -- The offical way to validate bus widths from the JEDEC spec
does not work on all platforms. Some platforms that use PCI/PCIe
to connect their SD controllers are known to fail.
If the quirk MMC_BUS_WIDTH_TEST is not defined we try to figure out
the bus width by reading the ext_csd at different bus widths and
compare this against the ext_csd read in 1 bit mode. If no ext_csd
is available we default to 1 bit operations.
Code has been tested on mmp2 against 8 bit eMMC and Transcend 2GB
card that is known to not work in 4 bit mode. The physical pins
on the card are not present to support 4 bit operation.
Signed-off-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Andrei Warkentin [Wed, 25 May 2011 14:42:50 +0000 (10:42 -0400)]
mmc: sdhci: Auto-CMD23 fixes.
Fixes bugs in Auto-CMD23 feature enable decision. Auto-CMD23
should be enabled if host is >= v3, and SDMA is not in use.
USE_ADMA | USE_SDMA | Auto-CMD23
---------+----------+-----------
0 | 0 | 1
---------+----------+-----------
0 | 1 | 0
---------+----------+-----------
1 | 0 | 1
---------+----------+-----------
1 | 1 | 1
Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Andrei Warkentin [Mon, 23 May 2011 20:06:39 +0000 (15:06 -0500)]
mmc: sdhci: Auto-CMD23 support.
Enables Auto-CMD23 support where available (SDHCI 3.0 controllers)
Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
Tested-by: Arindam Nath <arindam.nath@amd.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Andrei Warkentin [Mon, 23 May 2011 20:06:38 +0000 (15:06 -0500)]
mmc: core: Block CMD23 support for UHS104/SDXC cards.
SD cards operating at UHS104 or better support SET_BLOCK_COUNT.
Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
Reviewed-by: Arindam Nath <arindam.nath@amd.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Andrei Warkentin [Mon, 23 May 2011 20:06:37 +0000 (15:06 -0500)]
mmc: sdhci: Implement MMC_CAP_CMD23 for SDHCI.
Implements support for multiblock transfers bounded
by SET_BLOCK_COUNT (CMD23).
Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Andrei Warkentin [Mon, 23 May 2011 20:06:36 +0000 (15:06 -0500)]
mmc: core: Use CMD23 for multiblock transfers when we can.
CMD23-prefixed instead of open-ended multiblock transfers
have a performance advantage on some MMC cards.
Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Chris Metcalf [Wed, 25 May 2011 19:24:00 +0000 (15:24 -0400)]
arch/tile: prefer "tilepro" as the name of the 32-bit architecture
With this change, you can (and should) build with ARCH=tilepro for the
current 32-bit chips. Building with ARCH=tile continues to work, but
we've renamed the defconfig file to tilepro_defconfig for consistency.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Linus Torvalds [Wed, 25 May 2011 19:04:15 +0000 (12:04 -0700)]
Merge branch 'misc' of git://git./linux/kernel/git/mmarek/kbuild-2.6
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
export_report: use warn() to issue WARNING, so they go to stderr
export_report: sort SECTION 2 output
export_report: do collectcfiles work in perl itself
kbuild: make versioncheck work in KBUILD_OUTDIR
kbuild: make includecheck work in KBUILD_OUTDIR
kbuild: make headerdep work in KBUILD_OUTDIR
kbuild: add targets to PHONY
kbuild: don't warn about include/linux/version.h not including itself
eradicate bashisms in scripts/patch-kernel
Linus Torvalds [Wed, 25 May 2011 19:03:47 +0000 (12:03 -0700)]
Merge branch 'packaging' of git://git./linux/kernel/git/mmarek/kbuild-2.6
* 'packaging' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
kbuild: Create a kernel-headers RPM
rpm-pkg: Fix when current directory is a symlink
Replace '-' in kernel version with '_'
Linus Torvalds [Wed, 25 May 2011 18:46:31 +0000 (11:46 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
ceph: fix cap flush race reentrancy
libceph: subscribe to osdmap when cluster is full
libceph: handle new osdmap down/state change encoding
rbd: handle online resize of underlying rbd image
ceph: avoid inode lookup on nfs fh reconnect
ceph: use LOOKUPINO to make unconnected nfs fh more reliable
rbd: use snprintf for disk->disk_name
rbd: cleanup: make kfree match kmalloc
rbd: warn on update_snaps failure on notify
ceph: check return value for start_request in writepages
ceph: remove useless check
libceph: add missing breaks in addr_set_port
libceph: fix TAG_WAIT case
ceph: fix broken comparison in readdir loop
libceph: fix osdmap timestamp assignment
ceph: fix rare potential cap leak
libceph: use snprintf for unknown addrs
libceph: use snprintf for formatting object name
ceph: use snprintf for dirstat content
libceph: fix uninitialized value when no get_authorizer method is set
...