Btrfs: fix wrong outstanding_extents when doing DIO write
author Miao Xie <miaox@cn.fujitsu.com>
Thu, 21 Feb 2013 09:48:22 +0000 (02:48 -0700)
committer Chris Mason <chris.mason@fusionio.com>
Thu, 21 Feb 2013 13:11:43 +0000 (08:11 -0500)
When running xfstests case 083 on a filesystem mounted with
"compress-force=lzo", the following WARNINGs were triggered:
  WARNING: at fs/btrfs/inode.c:7908
  WARNING: at fs/btrfs/inode.c:7909
  WARNING: at fs/btrfs/inode.c:7911
  WARNING: at fs/btrfs/extent-tree.c:4510
  WARNING: at fs/btrfs/extent-tree.c:4511

This problem was introduced by the patch "Btrfs: fix deadlock due
to unsubmitted". That patch contains two bugs, which caused the
warnings above.

The first is an off-by-one bug: if the DIO write returns 0, it is
also a short write, and we need to release the space reserved for
it. That patch did not do so. Fix it by changing "ret > 0" to
"ret >= 0".
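
The boundary case can be sketched in user space as follows. This is
only an illustration of the two conditions, not kernel code: the
helper names are hypothetical, only the data-space release is
modelled, and EIOCBQUEUED is defined locally because it is a
kernel-internal errno value.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

/* Kernel-internal errno; defined here only so the sketch compiles. */
#ifndef EIOCBQUEUED
#define EIOCBQUEUED 529
#endif

/* Hypothetical helper: bytes of reserved space to release with "ret > 0". */
size_t dio_bytes_to_release_old(ssize_t ret, size_t count)
{
	if (ret < 0 && ret != -EIOCBQUEUED)
		return count;                  /* failed write: release everything */
	if (ret > 0 && (size_t)ret < count)
		return count - (size_t)ret;    /* short write: release the tail */
	return 0;                              /* ret == 0 falls through here */
}

/* Same helper with "ret >= 0": a 0-byte write counts as a short write. */
size_t dio_bytes_to_release_new(ssize_t ret, size_t count)
{
	if (ret < 0 && ret != -EIOCBQUEUED)
		return count;
	if (ret >= 0 && (size_t)ret < count)
		return count - (size_t)ret;
	return 0;
}
```

With count = 4096 and ret = 0, the old condition releases nothing,
while the new one releases all 4096 reserved bytes.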

The second is that ->outstanding_extents was increased twice when
a short write happened. ->outstanding_extents is a counter that
tracks the number of extent items we may use due to delalloc.
When we reserve free space for a delalloc write, we assume that
the write will introduce just one extent item, so we increase
->outstanding_extents by 1 at that time. We then increase it
every time we split the write, which is done at the beginning of
btrfs_get_blocks_direct(). So when a short write happens, we need
not increase ->outstanding_extents again, but that patch did.
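
The accounting described above can be modelled with a toy counter.
This is a simplified sketch that follows only the description in
this message; the toy_* names are hypothetical, and the real field
is BTRFS_I(inode)->outstanding_extents, updated under
BTRFS_I(inode)->lock.

```c
#include <stdbool.h>

/* Toy stand-in for the per-inode counter (no locking in this sketch). */
struct toy_inode {
	unsigned int outstanding_extents;
};

/* Reserving space assumes the write creates exactly one extent item. */
void toy_reserve(struct toy_inode *ino)
{
	ino->outstanding_extents++;
}

/* Each split of the write accounts for one more extent item. */
void toy_split(struct toy_inode *ino)
{
	ino->outstanding_extents++;
}

/*
 * Short-write handling. With extra_increment set, this models the
 * additional increment that the earlier patch performed.
 */
void toy_short_write(struct toy_inode *ino, bool extra_increment)
{
	if (extra_increment)
		ino->outstanding_extents++;
}
```

After one reservation and one split, the counter is 2. Taking the
short-write path with the extra increment leaves it at 3, one more
than the number of extent items accounted for.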

To fix the second problem, I rewrote the logic that updates
->outstanding_extents. We no longer increase it at the
beginning of btrfs_get_blocks_direct(); instead, we increase
it only when the split actually happens.

Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
fs/btrfs/inode.c

index 4d0aec0..40d49da 100644
@@ -6708,12 +6708,9 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
        int unlock_bits = EXTENT_LOCKED;
        int ret = 0;
 
-       if (create) {
-               spin_lock(&BTRFS_I(inode)->lock);
-               BTRFS_I(inode)->outstanding_extents++;
-               spin_unlock(&BTRFS_I(inode)->lock);
+       if (create)
                unlock_bits |= EXTENT_DELALLOC | EXTENT_DIRTY;
-       else
+       else
                len = min_t(u64, len, root->sectorsize);
 
        lockstart = start;
@@ -6855,6 +6852,10 @@ unlock:
                if (start + len > i_size_read(inode))
                        i_size_write(inode, start + len);
 
+               spin_lock(&BTRFS_I(inode)->lock);
+               BTRFS_I(inode)->outstanding_extents++;
+               spin_unlock(&BTRFS_I(inode)->lock);
+
                ret = set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
                                     lockstart + len - 1, EXTENT_DELALLOC, NULL,
                                     &cached_state, GFP_NOFS);
@@ -7362,14 +7363,11 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb,
        if (rw & WRITE) {
                if (ret < 0 && ret != -EIOCBQUEUED)
                        btrfs_delalloc_release_space(inode, count);
-               else if (ret > 0 && (size_t)ret < count) {
-                       spin_lock(&BTRFS_I(inode)->lock);
-                       BTRFS_I(inode)->outstanding_extents++;
-                       spin_unlock(&BTRFS_I(inode)->lock);
+               else if (ret >= 0 && (size_t)ret < count)
                        btrfs_delalloc_release_space(inode,
                                                     count - (size_t)ret);
-               }
-               btrfs_delalloc_release_metadata(inode, 0);
+               else
+                       btrfs_delalloc_release_metadata(inode, 0);
        }
 out:
        if (wakeup)