Jan Kara [Fri, 18 Sep 2009 16:22:29 +0000 (12:22 -0400)]
ext4: Update documentation about quota mount options
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Thu, 17 Sep 2009 15:55:58 +0000 (11:55 -0400)]
ext4: replace MAX_DEFRAG_SIZE with EXT_MAX_BLOCK
There's no reason to redefine the maximum allowable offset
in an extent-based file just for defrag;
EXT_MAX_BLOCK already does this.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 17 Sep 2009 13:34:16 +0000 (09:34 -0400)]
ext4: Fix the alloc on close after a truncate hueristic
In an attempt to avoid doing an unneeded flush after opening a
(previously non-existent) file with O_CREAT|O_TRUNC, the code only
triggered the hueristic if ei->disksize was non-zero. Turns out that
the VFS doesn't call ->truncate() if the file doesn't exist, and
ei->disksize is always zero even if the file previously existed. So
remove the test, since it isn't necessary and in fact disabled the
hueristic.
Thanks to Clemens Eisserer that he was seeing problems with files
written using kwrite and eclipse after sudden crashes caused by a
buggy Intel video driver.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Wed, 16 Sep 2009 23:30:40 +0000 (19:30 -0400)]
ext4: Add a tracepoint for ext4_alloc_da_blocks()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 17 Sep 2009 12:32:22 +0000 (08:32 -0400)]
ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags
EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag,
and the hex value assigned to it collides with FS_DIRECTIO_FL (which
is also stored in i_flags). There's no reason for the
EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use
i_state instead.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Wed, 16 Sep 2009 18:45:10 +0000 (14:45 -0400)]
ext4: limit block allocations for indirect-block files to < 2^32
Today, the ext4 allocator will happily allocate blocks past
2^32 for indirect-block files, which results in the block
numbers getting truncated, and corruption ensues.
This patch limits such allocations to < 2^32, and adds
BUG_ONs if we do get blocks larger than that.
This should address RH Bug 519471, ext4 bitmap allocator
must limit blocks to < 2^32
* ext4_find_goal() is modified to choose a goal < UINT_MAX,
so that our starting point is in an acceptable range.
* ext4_xattr_block_set() is modified such that the goal block
is < UINT_MAX, as above.
* ext4_mb_regular_allocator() is modified so that the group
search does not continue into groups which are too high
* ext4_mb_use_preallocated() has a check that we don't use
preallocated space which is too far out
* ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs
No attempt has been made to limit inode locations to < 2^32,
so we may wind up with blocks far from their inodes. Doing
this much already will lead to some odd ENOSPC issues when the
"lower 32" gets full, and further restricting inodes could
make that even weirder.
For high inodes, choosing a goal of the original, % UINT_MAX,
may be a bit odd, but then we're in an odd situation anyway,
and I don't know of a better heuristic.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Akira Fujita [Wed, 16 Sep 2009 18:25:39 +0000 (14:25 -0400)]
ext4: Fix different block exchange issue in EXT4_IOC_MOVE_EXT
If logical block offset of original file which is passed to
EXT4_IOC_MOVE_EXT is different from donor file's,
a calculation error occurs in ext4_calc_swap_extents(),
therefore wrong block is exchanged between original file and donor file.
As a result, we hit ext4_error() in check_block_validity().
To detect the logical offset difference in EXT4_IOC_MOVE_EXT,
add checks to mext_calc_swap_extents() and handle it as error,
since data exchange must be done between the same blocks in EXT4_IOC_MOVE_EXT.
Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Akira Fujita [Wed, 16 Sep 2009 18:25:07 +0000 (14:25 -0400)]
ext4: Add null extent check to ext_get_path
There is the possibility that path structure which is taken
by ext4_ext_find_extent() indicates null extents.
Because during data block exchanging in ext4_move_extents(),
constitution of an extent tree may be changed.
As a solution, the patch adds null extent check
to ext_get_path().
Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Akira Fujita [Wed, 16 Sep 2009 17:46:35 +0000 (13:46 -0400)]
ext4: Replace BUG_ON() with ext4_error() in move_extents.c
Replace BUG_ON calls with a call to ext4_error()
to print an error message if EXT4_IOC_MOVE_EXT failed
with some kind of reasons. This will help to debug.
Ted pointed this out, thanks.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Akira Fujita [Wed, 16 Sep 2009 17:46:38 +0000 (13:46 -0400)]
ext4: Replace get_ext_path macro with an inline funciton
Replace get_ext_path macro with an inline function,
since this macro looks like a function call but its arguments
get modified. Ted pointed this out, thanks.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 15 Sep 2009 02:59:50 +0000 (22:59 -0400)]
ext4: Fix include/trace/events/ext4.h to work with Systemtap
Using relative pathnames in #include statements interacts badly with
SystemTap, since the fs/ext4/*.h header files are not packaged up as
part of a distribution kernel's header files. Since systemtap doesn't
use TP_fast_assign(), we can use a blind structure definition and then
make sure the needed header files are defined before the ext4 source
files #include the trace/events/ext4.h header file.
https://bugzilla.redhat.com/show_bug.cgi?id=512478
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Fri, 11 Sep 2009 20:51:28 +0000 (16:51 -0400)]
ext4: Fix initalization of s_flex_groups
The s_flex_groups array should have been initialized using atomic_add
to sum up the free counts from the block groups that make up a
flex_bg. By using atomic_set, the value of the s_flex_groups array
was set to the values of the last block group in the flex_bg.
The impact of this bug is that the block and inode allocation
algorithms might not pick the best flex_bg for new allocation.
Thanks to Damien Guibouret for pointing out this problem!
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Andreas Schlick [Fri, 11 Sep 2009 03:16:07 +0000 (23:16 -0400)]
ext4: Always set dx_node's fake_dirent explicitly.
When ext4_dx_add_entry() has to split an index node, it has to ensure that
name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
won't recognise it as an intermediate htree node and consider the htree to
be corrupted.
Signed-off-by: Andreas Schlick <schlick@lavabit.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Fri, 11 Sep 2009 13:30:12 +0000 (09:30 -0400)]
ext4: Fix async commit mode to be safe by using a barrier
Previously the journal_async_commit mount option was equivalent to
using barrier=0 (and just as unsafe). This patch fixes it so that we
eliminate the barrier before the commit block (by not using ordered
mode), and explicitly issuing an empty barrier bio after writing the
commit block. Because of the journal checksum, it is safe to do this;
if the journal blocks are not all written before a power failure, the
checksum in the commit block will prevent the last transaction from
being replayed.
Using the fs_mark benchmark, using journal_async_commit shows a 50%
improvement:
FSUse% Count Size Files/sec App Overhead
8 1000 10240 30.5 28242
vs.
FSUse% Count Size Files/sec App Overhead
8 1000 10240 45.8 28620
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 10 Sep 2009 21:31:04 +0000 (17:31 -0400)]
ext4: Don't update superblock write time when filesystem is read-only
This avoids updating the superblock write time when we are mounting
the root file system read/only but we need to replay the journal; at
that point, for people who are east of GMT and who make their clock
tick in localtime for Windows bug-for-bug compatibility, and this will
cause e2fsck to complain and force a full file system check.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Thu, 10 Sep 2009 03:50:17 +0000 (23:50 -0400)]
ext4: Clarify the locking details in mballoc
We don't need to take the alloc_sem lock when we are adding new
groups, since mballoc won't see the new group added until we bump
sbi->s_groups_count.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Thu, 10 Sep 2009 03:34:50 +0000 (23:34 -0400)]
ext4: check for need init flag in ext4_mb_load_buddy
We should check for need init flag with the group's alloc_sem held, to
make sure while we are loading the buddy cache and holding a reference
to it, a file system resize can't add new blocks to same group.
The patch also drops the need init flag check in
ext4_mb_regular_allocator() because doing the check without holding
alloc_sem is racy.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Thu, 10 Sep 2009 03:47:46 +0000 (23:47 -0400)]
ext4: move ext4_mb_init_group() function earlier in the mballoc.c
This moves the function around so that it can be called from
ext4_mb_load_buddy().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Frank Mayhar [Thu, 10 Sep 2009 02:33:47 +0000 (22:33 -0400)]
ext4: Make non-journal fsync work properly
Teach ext4_write_inode() and ext4_do_update_inode() about non-journal
mode: If we're not using a journal, ext4_write_inode() now calls
ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc())
with a new "do_sync" parameter. If that parameter is nonzero _and_ we're
not using a journal, ext4_do_update_inode() calls sync_dirty_buffer()
instead of ext4_handle_dirty_metadata().
This problem was found in power-fail testing, checking the amount of
loss of files and blocks after a power failure when using fsync() and
when not using fsync(). It turned out that using fsync() was actually
worse than not doing so, possibly because it increased the likelihood
that the inodes would remain unflushed and would therefore be lost at
the power failure.
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 12 Sep 2009 17:41:55 +0000 (13:41 -0400)]
ext4: Assure that metadata blocks are written during fsync in no journal mode
When there is no journal present, we must attach buffer heads
associated with extent tree and indirect blocks to the inode's
mapping->private_list via mark_buffer_dirty_inode() so that
ext4_sync_file() --- which is called to service fsync() and
fdatasync() system calls --- can write out the inode's metadata blocks
by calling sync_mapping_buffers().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 10 Sep 2009 01:32:41 +0000 (21:32 -0400)]
ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()
When ext4 is using a journal, a metadata block which is deallocated
must be passed into the journal layer so it can be dropped from the
current transaction and/or revoked. This is done by calling the
functions ext4_journal_forget() and ext4_journal_revoke(), which call
jbd2_journal_forget(), and jbd2_journal_revoke(), respectively.
Since the jbd2_journal_forget() and jbd2_journal_revoke() call
bforget(), if ext4 is not using a journal, ext4_journal_forget() and
ext4_journal_revoke() must call bforget() to avoid a dirty metadata
block overwriting a block after it has been reallocated and reused for
another inode's data block.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 8 Sep 2009 12:21:26 +0000 (08:21 -0400)]
ext4: print more sysadmin-friendly message in check_block_validity()
Drop the WARN_ON(1), as he stack trace is not appropriate, since it is
triggered by file system corruption, and it misleads users into
thinking there is a kernel bug. In addition, change the message
displayed by ext4_error() to make it clear that this is a file system
corruption problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Thu, 10 Sep 2009 02:36:03 +0000 (22:36 -0400)]
ext4: Take page lock before looking at attached buffer_heads flags
In order to check whether the buffer_heads are mapped we need to hold
page lock. Otherwise a reclaim can cleanup the attached buffer_heads.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Akira Fujita [Sun, 6 Sep 2009 03:12:41 +0000 (23:12 -0400)]
ext4: Fix small typo for move_extent_per_page()
This function means moving extents every page, so change its name from
move_exgtent_par_page().
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.co.jp>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Akira Fujita [Sun, 6 Sep 2009 02:46:29 +0000 (22:46 -0400)]
ext4: Return exchanged blocks count to user space in failure
Return exchanged blocks count (moved_len) to user space,
if ext4_move_extents() failed on the way.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Akira Fujita [Sun, 6 Sep 2009 02:11:55 +0000 (22:11 -0400)]
ext4: Remove unneeded BUG_ON() in ext4_move_extents()
The ext4_move_extents() functions checks with BUG_ON() whether the
exchanged blocks count accords with request blocks count. But, if the
target range (orig_start + len) includes sparse block(s), 'moved_len'
(exchanged blocks count) does not agree with 'len' (request blocks
count), since sparse block is not counted in 'moved_len'. This causes
us to hit the BUG_ON(), even though the function succeeded.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Akira Fujita [Wed, 16 Sep 2009 18:28:22 +0000 (14:28 -0400)]
ext4: Fix wrong comparisons in mext_check_arguments()
The mext_check_arguments() function in move_extents.c has wrong
comparisons. orig_start which is passed from user-space is block
unit, but i_size of inode is byte unit, therefore the checks do not
work fine. This mis-check leads to the overflow of 'len' and then
hits BUG_ON() in ext4_move_extents(). The patch fixes this issue.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Reviewed-by: Greg Freemyer <greg.freemyer@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Christoph Hellwig [Sun, 6 Sep 2009 01:42:42 +0000 (21:42 -0400)]
ext4: fix cache flush in ext4_sync_file
We need to flush the write cache unconditionally in ->fsync, otherwise
writes into already allocated blocks can get lost. Writes into fully
allocated files are very common when using disk images for
virtualization, and without this fix can easily lose data after
an fdatasync, which is the typical implementation for a cache flush on
the virtual drive.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 5 Sep 2009 16:50:43 +0000 (12:50 -0400)]
ext4: Remove journal_checksum mount option and enable it by default
There's no real cost for the journal checksum feature, and we should
make sure it is enabled all the time.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 17 Sep 2009 12:50:18 +0000 (08:50 -0400)]
ext4: fix tracepoint format string warnings
Unlike on some other architectures ino_t is an unsigned int on s390.
So add an explicit cast to avoid lots of compile warnings:
In file included from include/trace/ftrace.h:285,
from include/trace/define_trace.h:61,
from include/trace/events/ext4.h:711,
from fs/ext4/super.c:50:
include/trace/events/ext4.h: In function 'ftrace_raw_output_ext4_free_inode':
include/trace/events/ext4.h:12: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tobias Klauser [Sat, 5 Sep 2009 13:28:54 +0000 (09:28 -0400)]
ext4: Declare seq_operations and file_operations structures as const
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 1 Sep 2009 03:13:11 +0000 (23:13 -0400)]
ext4: Add new tracepoint: trace_ext4_da_write_pages()
Add a new tracepoint which shows the pages that will be written using
write_cache_pages() by ext4_da_writepages().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 31 Aug 2009 21:00:59 +0000 (17:00 -0400)]
ext4: Restore wbc->range_start in ext4_da_writepages()
To solve a lock inversion problem, we implement part of the
range_cyclic algorithm in ext4_da_writepages(). (See commit
2acf2c26
for more details.)
As part of that change wbc->range_start was modified by ext4's
writepages function, which causes its callers to get confused since
they aren't expecting the filesystem to modify it. The simplest fix
is to save and restore wbc->range_start in ext4_da_writepages.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Thu, 17 Sep 2009 12:48:28 +0000 (08:48 -0400)]
ext4: Fix spelling typo in the trace format for trace_ext4_da_writepages()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sun, 30 Aug 2009 01:08:08 +0000 (21:08 -0400)]
ext4: Limit number of links that can be created by ext4_link()
In ext4_link we need to check using EXT4_LINK_MAX, and not
EXT4_DIR_LINK_MAX(), since ext4_link() is creating hard links of
regular files, and not directories.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Sat, 29 Aug 2009 01:43:15 +0000 (21:43 -0400)]
ext4: Allow rename to create more than EXT4_LINK_MAX subdirectories
Use EXT4_DIR_LINK_MAX so that rename() can move a directory into new
parent directory without running into the EXT4_LINK_MAX limit.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Fri, 28 Aug 2009 14:40:33 +0000 (10:40 -0400)]
ext4: fix extent sanity checking code with AGGRESSIVE_TEST
The extents sanity-checking code depends on the ext4_ext_space_*()
functions returning the maximum alloable size for eh_max; however,
when the debugging #ifdef AGGRESSIVE_TEST is enabled to test the
extent tree handling code, this prevents a normally created ext4
filesystem from being mounted with the errors:
Aug 26 15:43:50 bsd086 kernel: [ 96.070277] EXT4-fs error (device sda8): ext4_ext_check_inode: bad header/extent in inode #8: too large eh_max - magic f30a, entries 1, max 4(3), depth 0(0)
Aug 26 15:43:50 bsd086 kernel: [ 96.070526] EXT4-fs (sda8): no journal found
Bug reported by Akira Fujita.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Wed, 26 Aug 2009 02:36:45 +0000 (22:36 -0400)]
ext4: use ext4_grpblk_t more extensively
unsigned short is potentially too small to track blocks within
a group; today it is safe due to restrictions in e2fsprogs but
we have _lo / _hi bits for group blocks with the intent to go
up to 32 bits, so clean this up now.
There are many more places where we use unsigned/int/unsigned int
to contain a group block but this should at least fix all the
short types.
I added a few comments to the struct ext4_group_info definition
as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Wed, 26 Aug 2009 02:36:25 +0000 (22:36 -0400)]
ext4: use variables not types in sizeofs() for allocations
Precursor to changing some types; to keep things in sync, it
seems better to allocate/memset based on the size of the
variables we are using rather than on some disconnected
basic type like "unsigned short"
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Aneesh Kumar K.V [Wed, 26 Aug 2009 02:36:05 +0000 (22:36 -0400)]
ext4: Add missing unlock_new_inode() call in extent migration code
We need to unlock the new inode before iput. This patch fixes the
following warning when calling chattr +e to migrate a file to use
extents. It also fixes problems in when e4defrag attempts to
defragment an inode.
[ 470.400044] ------------[ cut here ]------------
[ 470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
[ 470.400072] Hardware name: N/A
.....
...
[ 470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
[ 470.400359] Call Trace:
[ 470.400372] [<
ffffffff81037771>] warn_slowpath_common+0x77/0x8f
[ 470.400385] [<
ffffffff81037798>] warn_slowpath_null+0xf/0x11
[ 470.400395] [<
ffffffff810b7f28>] generic_delete_inode+0x65/0x16a
[ 470.400405] [<
ffffffff810b8044>] generic_drop_inode+0x17/0x1bd
[ 470.400413] [<
ffffffff810b7083>] iput+0x61/0x65
[ 470.400455] [<
ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4]
[ 470.400492] [<
ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4]
[ 470.400507] [<
ffffffff810b1a91>] vfs_ioctl+0x1d/0x82
[ 470.400517] [<
ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9
[ 470.400527] [<
ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf
[ 470.400537] [<
ffffffff810b2087>] sys_ioctl+0x51/0x74
[ 470.400549] [<
ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b
[ 470.400557] ---[ end trace
ab85723542352dac ]---
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Tue, 18 Aug 2009 04:20:23 +0000 (00:20 -0400)]
ext4: Add feature set check helper for mount & remount paths
A user reported that although his root ext4 filesystem was mounting
fine, other filesystems would not mount, with the:
"Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF"
error on his 32-bit box built without CONFIG_LBDAF. This is because
the test at mount time for this situation was not being re-checked
on remount, and the normal boot process makes an ro->rw transition,
so this was being missed.
Refactor to make a common helper function to test the filesystem
features against the type of mount request (RO vs. RW) so that we
stay consistent.
Addresses Red-Hat-Bugzilla: #517650
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Tue, 18 Aug 2009 03:55:24 +0000 (23:55 -0400)]
simplify some logic in ext4_mb_normalize_request
While reading through some of the mballoc code it seems that a couple
spots in the size normalization function could be streamlined.
The test for non-overlapping PAs can be or'd for the start & end
conditions, and the tests for adjacent PAs can be else-if'd -
it's essentially independently testing:
if (A + B <= C)
...
if (A > C)
...
These cannot both be true so it seems like the else-if might
be slightly more efficient and/or informative.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Tue, 18 Aug 2009 03:51:29 +0000 (23:51 -0400)]
ext4: open-code ext4_mb_update_group_info
ext4_mb_update_group_info is only called in one place, and it's
extremely simple. There's no reason to have it in a separate function
in a separate file as far as I can tell, it just obfuscates what's
really going on.
Perhaps it was intended to keep the grp->bb_* manipulation local to
mballoc.c but we're already accessing other grp-> fields in balloc.c
directly so this seems ok.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Tue, 18 Aug 2009 03:48:51 +0000 (23:48 -0400)]
ext4: reject too-large filesystems on 32-bit kernels
ext4 will happily mount a > 16T filesystem on a 32-bit box, but
this is not safe; writes to the block device will wrap past 16T
and the page cache can't index past 16T (232 index * 4k pages).
Adding another test to the existing "too many sectors" test
should do the trick.
Add a comment, a relevant return value, and fix the reference
to the CONFIG_LBD(AF) option as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
H Hartley Sweeten [Tue, 18 Aug 2009 02:38:04 +0000 (22:38 -0400)]
jbd2: bitfields should be unsigned
This fixes sparse noise:
error: dubious one-bit signed bitfield
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@ucw.cz>
Jan Kara [Tue, 18 Aug 2009 02:17:20 +0000 (22:17 -0400)]
ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks()
During truncate we are sometimes forced to start a new transaction as
the amount of blocks to be journaled is both quite large and hard to
predict. So far we restarted a transaction while holding i_data_sem
and that violates lock ordering because i_data_sem ranks below a
transaction start (and it can lead to a real deadlock with
ext4_get_blocks() mapping blocks in some page while having a
transaction open).
We fix the problem by dropping the i_data_sem before restarting the
transaction and acquire it afterwards. It's slightly subtle that this
works:
1) By the time ext4_truncate() is called, all the page cache for the
truncated part of the file is dropped so get_block() should not be
called on it (we only have to invalidate extent cache after we
reacquire i_data_sem because some extent from not-truncated part could
extend also into the part we are going to truncate).
2) Writes, migrate or defrag hold i_mutex so they are stopped for all
the time of the truncate.
This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Tue, 18 Aug 2009 01:23:17 +0000 (21:23 -0400)]
jbd2: Annotate transaction start also for jbd2_journal_restart()
lockdep annotation for a transaction start has been at the end of
jbd2_journal_start(). But a transaction is also started from
jbd2_journal_restart(). Move the lockdep annotation to start_this_handle()
which covers both cases.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mingming [Fri, 18 Sep 2009 17:34:55 +0000 (13:34 -0400)]
ext4: Show unwritten extent flag in ext4_ext_show_leaf()
ext4_ext_show_leaf() will display the leaf extents when extent
debugging is enabled.
Printing out the unwritten bit is useful for debugging unwritten
extent, allow us to see the unwritten extents vs written extents,
after the unwritten extents are splitted or converted.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Mingming [Tue, 1 Sep 2009 12:44:37 +0000 (08:44 -0400)]
ext4: Compile warning fix when EXT_DEBUG enabled
When EXT_DEBUG is enabled I received the following compile warning on
PPC64:
CC [M] fs/ext4/inode.o
CC [M] fs/ext4/extents.o
fs/ext4/extents.c: In function ‘ext4_ext_rm_leaf’:
fs/ext4/extents.c:2097: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_lblk_t’
fs/ext4/extents.c: In function ‘ext4_ext_get_blocks’:
fs/ext4/extents.c:2789: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
fs/ext4/extents.c:2852: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘ext4_lblk_t’
fs/ext4/extents.c:2953: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’
CC [M] fs/ext4/migrate.o
The patch fixes compile warning.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Index: linux-2.6.31-rc4/fs/ext4/extents.c
===================================================================
Theodore Ts'o [Fri, 18 Sep 2009 17:34:02 +0000 (13:34 -0400)]
ext4: Avoid group preallocation for closed files
Currently the group preallocation code tries to find a large (512)
free block from which to do per-cpu group allocation for small files.
The problem with this scheme is that it leaves the filesystem horribly
fragmented. In the worst case, if the filesystem is unmounted and
remounted (after a system shutdown, for example) we forget the fact
that wee were using a particular (now-partially filled) 512 block
extent. So the next time we try to allocate space for a small file,
we will find *another* completely free 512 block chunk to allocate
small files. Given that there are 32,768 blocks in a block group,
after 64 iterations of "mount, write one 4k file in a directory,
unmount", the block group will have 64 files, each separated by 511
blocks, and the block group will no longer have any free 512
completely free chunks of blocks for group preallocation space.
So if we try to allocate blocks for a file that has been closed, such
that we know the final size of the file, and the filesystem is not
busy, avoid using group preallocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 10 Aug 2009 02:01:13 +0000 (22:01 -0400)]
ext4: Fix bugs in mballoc's stream allocation mode
The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
screwed up. These fields were getting unconditionally all the time,
set even when stream allocation had not taken place, and if they were
being used when the file was smaller than s_mb_stream_request, which
is when the allocation should _not_ be doing stream allocation.
Fix this by determining whether or not we stream allocation should
take place once, in ext4_mb_group_or_file(), and setting a flag which
gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
This simplifies the code and assures that we are consistently using
(or not using) the stream allocation logic.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sun, 9 Aug 2009 20:46:13 +0000 (16:46 -0400)]
ext4: Display the mballoc flags in mb_history in hex instead of decimal
Displaying the flags in base 16 makes it easier to see which flags
have been set.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Fri, 18 Sep 2009 17:38:55 +0000 (13:38 -0400)]
ext4: Add configurable run-time mballoc debugging
Allow mballoc debugging to be enabled at run-time instead of just at
compile time.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Peng Tao [Tue, 11 Aug 2009 03:05:28 +0000 (23:05 -0400)]
ext4: fix journal ref count in move_extent_par_page
move_extent_par_page calls a_ops->write_begin() to increase journal
handler's reference count. However, if either mext_replace_branches()
or ext4_get_block fails, the increased reference count isn't
decreased. This will cause a later attempt to umount of the fs to hang
forever. The patch addresses the issue by calling ext4_journal_stop()
if page is not NULL (which means a_ops->write_end() isn't invoked).
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Andreas Dilger [Tue, 11 Aug 2009 02:51:53 +0000 (22:51 -0400)]
jbd2: round commit timer up to avoid uncommitted transaction
fix jiffie rounding in jbd commit timer setup code. Rounding down
could cause the timer to be fired before the corresponding transaction
has expired. That transaction can stay not committed forever if no
new transaction is created or expicit sync/umount happens.
Signed-off-by: Alex Zhuravlev (Tomas) <alex.zhuravlev@sun.com>
Signed-off-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Roel Kluin [Tue, 11 Aug 2009 02:47:22 +0000 (22:47 -0400)]
ext4: remove redundant test on unsigned
unsigned i_block cannot be less than 0.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Peng Tao [Tue, 28 Jul 2009 01:44:40 +0000 (21:44 -0400)]
ext4: fix build warning when EXT4FS_DEBUG is on
When compiling with EXT4FS_DEBUG on, gcc will complain with following warnings:
linux-2.6/fs/ext4/ialloc.c: In function ‘ext4_count_free_inodes’:
linux-2.6/fs/ext4/ialloc.c:1192: warning: format ‘%lu’ expects type
‘long unsigned int’, but argument 2 has type ‘ext4_group_t’
So add a type cast to suppress it.
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Akira Fujita [Mon, 6 Jul 2009 03:04:36 +0000 (23:04 -0400)]
ext4: Fix compile warnings with MB_DEBUG
When MB_DEBUG is enabled, we get some compile warnings because
ext4_group_t is unsigned int. This patch fixes them.
Signed-off-by Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Joe Perches [Mon, 6 Jul 2009 02:33:08 +0000 (22:33 -0400)]
ext4: Remove unnecessary semicolons in mballoc.c
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Curt Wohlgemuth [Fri, 17 Jul 2009 14:54:08 +0000 (10:54 -0400)]
ext4: More buffer head reference leaks
After the patch I posted last week regarding buffer head ref leaks in
no-journal mode, I looked at all the code that uses buffer heads and
searched for more potential leaks.
The patch below fixes the issues I found; these can occur even when a
journal is present.
The change to inode.c fixes a double release if
ext4_journal_get_create_access() fails.
The changes to namei.c are more complicated. add_dirent_to_buf() will
release the input buffer head EXCEPT when it returns -ENOSPC. There are
some callers of this routine that don't always do the brelse() in the event
that -ENOSPC is returned. Unfortunately, to put this fix into ext4_add_entry()
required capturing the return value of make_indexed_dir() and
add_dirent_to_buf().
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Fri, 17 Jul 2009 14:40:01 +0000 (10:40 -0400)]
jbd2: Fail to load a journal if it is too short
Due to on disk corruption, it can happen that journal is too short. Fail
to load it in such case so that we don't oops somewhere later.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 28 Jul 2009 03:09:47 +0000 (23:09 -0400)]
ext4: Avoid null pointer dereference when decoding EROFS w/o a journal
We need to check to make sure a journal is present before checking the
journal flags in ext4_decode_error().
Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Manish Katiyar [Tue, 28 Jul 2009 01:38:17 +0000 (21:38 -0400)]
ext4: Fix typo in ext4/Kconfig
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Fri, 17 Jul 2009 13:01:04 +0000 (09:01 -0400)]
ext4: Fix memory leak fix when mounting an ext4 filesystem
The allocation of the ext4_group_info array was moved to a new
function ext4_mb_add_group_info() in commit
5f21b0e6 so that online
resize would use a common (and correct) codepath. Unfortunately, the
call to the new ext4_mb_add_group_info() function was added without
removing the code which originally allocated the array. This caused a
memory leak each time an ext4 filesystem was mounted.
The fix is simple; remove the code that did the original allocation,
since it is no longer needed.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Linus Torvalds [Wed, 16 Sep 2009 15:27:10 +0000 (08:27 -0700)]
Merge git://git./linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
debugfs: Modify default debugfs directory for debugging pktcdvd.
debugfs: Modified default dir of debugfs for debugging UHCI.
debugfs: Change debugfs directory of IWMC3200
debugfs: Change debuhgfs directory of trace-events-sample.h
debugfs: Fix mount directory of debugfs by default in events.txt
hpilo: add poll f_op
hpilo: add interrupt handler
hpilo: staging for interrupt handling
driver core: platform_device_add_data(): use kmemdup()
Driver core: Add support for compatibility classes
uio: add generic driver for PCI 2.3 devices
driver-core: move dma-coherent.c from kernel to driver/base
mem_class: fix bug
mem_class: use minor as index instead of searching the array
driver model: constify attribute groups
UIO: remove 'default n' from Kconfig
Driver core: Add accessor for device platform data
Driver core: move dev_get/set_drvdata to drivers/base/dd.c
Driver core: add new device to bus's list before probing
Linus Torvalds [Wed, 16 Sep 2009 15:11:54 +0000 (08:11 -0700)]
Merge git://git./linux/kernel/git/gregkh/staging-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (641 commits)
Staging: remove sxg driver
Staging: remove heci driver
Staging: remove at76_usb wireless driver.
Staging: rspiusb: remove the driver
Staging: meilhaus: remove the drivers
Staging: remove me4000 driver.
Staging: line6: ffzb returns an unsigned integer
Staging: line6: pod.c: style cleanups
Staging: iio: introduce missing kfree
Staging: dream: introduce missing kfree
Staging: comedi: addi-data: NULL dereference of amcc in v_pci_card_list_init()
Staging: vt665x: fix built-in compiling
Staging: rt3090: enable NATIVE_WPA_SUPPLICANT_SUPPORT option
Staging: rt3090: port changes in WPA_MIX_PAIR_CIPHER to rt3090
Staging: rt3090: rename device from raX to wlanX
Staging: rt3090: remove possible conflict with rt2860
Staging: rt2860/rt2870/rt3070/rt3090: fix compiler warning on x86_64
Staging: rt2860: add new device ids
Staging: rt3090: add device id 1462:891a
Staging: asus_oled: Cleaned up checkpatch issues.
...
Linus Torvalds [Wed, 16 Sep 2009 15:11:23 +0000 (08:11 -0700)]
Merge git://git./linux/kernel/git/gregkh/pcmcia-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pcmcia-2.6:
pcmcia: document return value of pcmcia_loop_config
pcmcia: dtl1_cs: fix pcmcia_loop_config logic
pcmcia: drop non-existant includes
pcmcia: disable prefetch/burst for OZ6933
pcmcia: fix incorrect argument order to list_add_tail()
pcmcia: drivers/pcmcia/pcmcia_resource.c: Remove unnecessary semicolons
pcmcia: Use phys_addr_t for physical addresses
pcmcia: drivers/pcmcia: Make static
Linus Torvalds [Wed, 16 Sep 2009 14:49:54 +0000 (07:49 -0700)]
Merge branch 'linux-next' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
PCI hotplug: clean up acpi_run_hpp()
PCI hotplug: acpiphp: use generic pci_configure_slot()
PCI hotplug: shpchp: use generic pci_configure_slot()
PCI hotplug: pciehp: use generic pci_configure_slot()
PCI hotplug: add pci_configure_slot()
PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
PCI: Clear saved_state after the state has been restored
PCI PM: Return error codes from pci_pm_resume()
PCI: use dev_printk in quirk messages
PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
PCI Hotplug: acpiphp: find bridges the easy way
PCI: pcie portdrv: remove unused variable
PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
ACPI PM: Replace wakeup.prepared with reference counter
PCI PM: Introduce device flag wakeup_prepared
PCI / ACPI PM: Rework some debug messages
PCI PM: Simplify PCI wake-up code
...
Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases. The
'needs_freset' initialization added in
6e19314cc ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
Linus Torvalds [Wed, 16 Sep 2009 14:46:34 +0000 (07:46 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: fix linkage problem with blk_iopoll and !CONFIG_BLOCK
Linus Torvalds [Wed, 16 Sep 2009 14:45:38 +0000 (07:45 -0700)]
Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
writeback: fix possible bdi writeback refcounting problem
writeback: Fix bdi use after free in wb_work_complete()
writeback: improve scalability of bdi writeback work queues
writeback: remove smp_mb(), it's not needed with list_add_tail_rcu()
writeback: use schedule_timeout_interruptible()
writeback: add comments to bdi_work structure
writeback: splice dirty inode entries to default bdi on bdi_destroy()
writeback: separate starting of sync vs opportunistic writeback
writeback: inline allocation failure handling in bdi_alloc_queue_work()
writeback: use RCU to protect bdi_list
writeback: only use bdi_writeback_all() for WB_SYNC_NONE writeout
fs: Assign bdi in super_block
writeback: make wb_writeback() take an argument structure
writeback: merely wakeup flusher thread if work allocation fails for WB_SYNC_NONE
writeback: get rid of wbc->for_writepages
fs: remove bdev->bd_inode_backing_dev_info
Nick Piggin [Tue, 15 Sep 2009 19:37:55 +0000 (21:37 +0200)]
writeback: fix possible bdi writeback refcounting problem
wb_clear_pending AFAIKS should not be called after the item has been
put on the list, except by the worker threads. It could lead to the
situation where the refcount is decremented below 0 and cause lots of
problems.
Presumably the !wb_has_dirty_io case is not a common one, so it can
be discovered when the thread wakes up to check?
Also add a comment in bdi_work_clear.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Nick Piggin [Tue, 15 Sep 2009 19:34:51 +0000 (21:34 +0200)]
writeback: Fix bdi use after free in wb_work_complete()
By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it
can already have been deallocated and used for something else in the
!on stack case, giving a false positive in this test and causing
corruption.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Nick Piggin [Tue, 15 Sep 2009 19:34:12 +0000 (21:34 +0200)]
writeback: improve scalability of bdi writeback work queues
If you're going to do an atomic RMW on each list entry, there's not much
point in all the RCU complexities of the list walking. This is only going
to help the multi-thread case I guess, but it doesn't hurt to do now.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Nick Piggin [Tue, 15 Sep 2009 19:32:58 +0000 (21:32 +0200)]
writeback: remove smp_mb(), it's not needed with list_add_tail_rcu()
list_add_tail_rcu contains required barriers.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 15 Sep 2009 19:27:40 +0000 (21:27 +0200)]
writeback: use schedule_timeout_interruptible()
Gets rid of a manual set_current_state().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 15 Sep 2009 18:04:57 +0000 (20:04 +0200)]
writeback: add comments to bdi_work structure
And document its retriever, get_next_work_item().
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 14 Sep 2009 10:57:56 +0000 (12:57 +0200)]
writeback: splice dirty inode entries to default bdi on bdi_destroy()
We cannot safely ensure that the inodes are all gone at this point
in time, and we must not destroy this bdi with inodes having off it.
So just splice our entries to the default bdi since that one will
always persist.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 16 Sep 2009 13:13:54 +0000 (15:13 +0200)]
writeback: separate starting of sync vs opportunistic writeback
bdi_start_writeback() is currently split into two paths, one for
WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback()
for WB_SYNC_ALL writeback and let bdi_start_writeback() handle
only WB_SYNC_NONE.
Push down the writeback_control allocation and only accept the
parameters that make sense for each function. This cleans up
the API considerably.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Sun, 13 Sep 2009 18:07:36 +0000 (20:07 +0200)]
writeback: inline allocation failure handling in bdi_alloc_queue_work()
This gets rid of work == NULL in bdi_queue_work() and puts the
OOM handling where it belongs.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 14 Sep 2009 11:12:40 +0000 (13:12 +0200)]
writeback: use RCU to protect bdi_list
Now that bdi_writeback_all() no longer handles integrity writeback,
it doesn't have to block anymore. This means that we can switch
bdi_list reader side protection to RCU.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 15 Sep 2009 07:53:35 +0000 (09:53 +0200)]
writeback: only use bdi_writeback_all() for WB_SYNC_NONE writeout
Data integrity writeback must use bdi_start_writeback() and ensure
that wbc->sb and wbc->bdi are set.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 16 Sep 2009 13:02:33 +0000 (15:02 +0200)]
fs: Assign bdi in super_block
We do this automatically in get_sb_bdev() from the set_bdev_super()
callback. Filesystems that have their own private backing_dev_info
must assign that in ->fill_super().
Note that ->s_bdi assignment is required for proper writeback!
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 16 Sep 2009 13:18:25 +0000 (15:18 +0200)]
writeback: make wb_writeback() take an argument structure
We need to be able to pass in range_cyclic as well, so instead
of growing yet another argument, split the arguments into a
struct wb_writeback_args structure that we can use internally.
Also makes it easier to just copy all members to an on-stack
struct, since we can't access work after clearing the pending
bit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Christoph Hellwig [Fri, 11 Sep 2009 07:47:56 +0000 (09:47 +0200)]
writeback: merely wakeup flusher thread if work allocation fails for WB_SYNC_NONE
Since it's an opportunistic writeback and not a data integrity action,
don't punt to blocking writeback. Just wakeup the thread and it will
flush old data.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 15 Sep 2009 13:10:20 +0000 (15:10 +0200)]
writeback: get rid of wbc->for_writepages
It's only set, it's never checked. Kill it.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 15 Sep 2009 07:43:56 +0000 (09:43 +0200)]
fs: remove bdev->bd_inode_backing_dev_info
It has been unused since it was introduced in:
commit
520808bf20e90fdbdb320264ba7dd5cf9d47dcac
Author: Andrew Morton <akpm@osdl.org>
Date: Fri May 21 00:46:17 2004 -0700
[PATCH] block device layer: separate backing_dev_info infrastructure
So lets just kill it.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Tue, 15 Sep 2009 19:53:11 +0000 (21:53 +0200)]
block: fix linkage problem with blk_iopoll and !CONFIG_BLOCK
kernel/built-in.o:(.data+0x17b0): undefined reference to `blk_iopoll_enabled'
Since the extern declaration makes the compile work, but the actual
symbol is missing when block/blk-iopoll.o isn't linked in.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Ingo Molnar [Tue, 15 Sep 2009 09:00:26 +0000 (11:00 +0200)]
slub: Fix build error in kmem_cache_open() with !CONFIG_SLUB_DEBUG
This build bug:
mm/slub.c: In function 'kmem_cache_open':
mm/slub.c:2476: error: 'disable_higher_order_debug' undeclared (first use in this function)
mm/slub.c:2476: error: (Each undeclared identifier is reported only once
mm/slub.c:2476: error: for each function it appears in.)
Triggers because there's no !CONFIG_SLUB_DEBUG definition for
disable_higher_order_debug.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Greg Kroah-Hartman [Tue, 15 Sep 2009 17:09:27 +0000 (10:09 -0700)]
Staging: remove sxg driver
Unfortunatly, the upstream company has abandonded development of this
driver. So it's best to just remove the driver from the tree.
Cc: Christopher Harrer <charrer@alacritech.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Fri, 11 Sep 2009 16:46:35 +0000 (09:46 -0700)]
Staging: remove heci driver
Intel has officially abandoned this project and does not want to
maintian it or have it included in the main kernel tree, as no one
should use the code, it's not needed anymore.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Thu, 3 Sep 2009 18:58:09 +0000 (11:58 -0700)]
Staging: remove at76_usb wireless driver.
There is already an in-kernel driver for this hardware (since 2.6.30),
at76c50x-usb, and it supports all of the same devices. So this driver
can now be deleted.
Acked-by: Kalle Valo <kalle.valo@iki.fi>
Cc: linux-wireless <linux-wireless@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Thu, 3 Sep 2009 04:33:06 +0000 (21:33 -0700)]
Staging: rspiusb: remove the driver
No one cares, it's a custom userspace interface, and the code hasn't
built in a long time. So remove it.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Thu, 3 Sep 2009 04:29:37 +0000 (21:29 -0700)]
Staging: meilhaus: remove the drivers
The comedi drivers should be used instead, no need to have
these in here as well.
Cc: David Kiliani <mail@davidkiliani.de>
Cc: Meilhaus Support <support@meilhaus.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Thu, 3 Sep 2009 04:27:08 +0000 (21:27 -0700)]
Staging: remove me4000 driver.
The comedi drivers should be used instead, no need to have
this driver in the tree duplicating that one.
Cc: Wolfgang Beiter <w.beiter@aon.at>
Cc: Guenter Gebhardt <g.gebhardt@meilhaus.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Frederik Deweerdt [Mon, 14 Sep 2009 08:52:37 +0000 (08:52 +0000)]
Staging: line6: ffzb returns an unsigned integer
find_first_zero_bit returns a positive value, use it accordingly.
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@xprog.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Frederik Deweerdt [Mon, 14 Sep 2009 08:51:38 +0000 (08:51 +0000)]
Staging: line6: pod.c: style cleanups
Line6 pod.c: Minor style cleanups
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@xprog.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Julia Lawall [Fri, 11 Sep 2009 16:22:27 +0000 (18:22 +0200)]
Staging: iio: introduce missing kfree
Error handling code following a kmalloc or kzalloc should free the
allocated data.
The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,f1,l;
position p1,p2;
expression *ptr != NULL;
@@
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
<... when != x
when != if (...) { <+...x...+> }
(
x->f1 = E
|
(x->f1 == NULL || ...)
|
f(...,x->f1,...)
)
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Julia Lawall [Fri, 11 Sep 2009 16:22:27 +0000 (18:22 +0200)]
Staging: dream: introduce missing kfree
Error handling code following a kmalloc or kzalloc should free the
allocated data.
The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,f1,l;
position p1,p2;
expression *ptr != NULL;
@@
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
<... when != x
when != if (...) { <+...x...+> }
(
x->f1 = E
|
(x->f1 == NULL || ...)
|
f(...,x->f1,...)
)
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Roel Kluin [Mon, 31 Aug 2009 08:54:37 +0000 (10:54 +0200)]
Staging: comedi: addi-data: NULL dereference of amcc in v_pci_card_list_init()
amcc allocation may fail, prevent a NULL dereference.
allocation may fail, prevent a dereference.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alexander Beregalov [Thu, 3 Sep 2009 16:38:14 +0000 (20:38 +0400)]
Staging: vt665x: fix built-in compiling
Fix this build error:
undefined reference to "__this_module"
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>