Chris Mason [Wed, 11 Jun 2008 20:51:38 +0000 (16:51 -0400)]
Btrfs: Fix mount -o max_inline=0
max_inline=0 used to force the max_inline size to one sector instead. Now
it properly disables inline data items, while still being able to read
any that happen to exist on disk.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
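The max_inline behaviour described in this commit can be sketched as a small predicate. This is only an illustration: may_store_inline() is a hypothetical helper, not a function from the btrfs sources, and it assumes the limit is expressed in bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the actual btrfs code: decide whether a write of
 * data_len bytes may be stored as a new inline data item.
 * max_inline == 0 disables new inline items; reading existing ones is a
 * separate code path and is not affected by this check.
 */
bool may_store_inline(uint64_t data_len, uint64_t max_inline)
{
    if (max_inline == 0)
        return false;   /* never create new inline items */
    return data_len <= max_inline;
}
```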
Chris Mason [Wed, 11 Jun 2008 20:50:36 +0000 (16:50 -0400)]
Btrfs: Add async worker threads for pre and post IO checksumming
Btrfs has been using workqueues to spread the checksumming load across
other CPUs in the system. But, workqueues only schedule work on the
same CPU that queued the work, giving them a limited benefit for systems with
higher CPU counts.
This code adds a generic facility to schedule work with pools of kthreads,
and changes the bio submission code to queue bios up. The queueing is
important to make sure large numbers of procs on the system don't
turn streaming workloads into random workloads by sending IO down
concurrently.
The end result of all of this is much higher performance (and CPU usage) when
doing checksumming on large machines. Two worker pools are created,
one for writes and one for endio processing. The two could deadlock if
we tried to service both from a single pool.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Christoph Hellwig [Tue, 10 Jun 2008 14:40:46 +0000 (10:40 -0400)]
btrfs: allow scanning multiple devices during mount
Allow specifying one or more device=/dev/foo options during mount
so that ioctls on the control device can be avoided. Especially useful
when trying to mount a multi-device setup as root.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
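As a rough userspace illustration of the option format, the sketch below pulls every device= entry out of a comma-separated option string. collect_device_opts() and MAX_DEVICE_PATH are invented for this example; the kernel's own parser is not shown in this log and will differ.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEVICE_PATH 64  /* illustrative limit, not taken from btrfs */

/*
 * Collect every "device=<path>" entry from a comma-separated mount option
 * string. Returns the number of paths copied into out (at most max_out).
 * Entries whose path does not fit in MAX_DEVICE_PATH are skipped.
 */
size_t collect_device_opts(const char *opts, char out[][MAX_DEVICE_PATH],
                           size_t max_out)
{
    static const char key[] = "device=";
    const size_t key_len = sizeof(key) - 1;
    size_t found = 0;

    while (*opts != '\0' && found < max_out) {
        /* One option runs up to the next comma or the end of the string. */
        size_t opt_len = strcspn(opts, ",");

        if (opt_len > key_len && strncmp(opts, key, key_len) == 0) {
            size_t path_len = opt_len - key_len;

            if (path_len < MAX_DEVICE_PATH) {
                memcpy(out[found], opts + key_len, path_len);
                out[found][path_len] = '\0';
                found++;
            }
        }
        opts += opt_len;
        if (*opts == ',')
            opts++;     /* skip the separator */
    }
    return found;
}
```

With the option string "device=/dev/sdb,degraded,device=/dev/sdc" this collects the two device paths and ignores the unrelated option.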
Christoph Hellwig [Tue, 10 Jun 2008 14:40:29 +0000 (10:40 -0400)]
btrfs: sanitize mount option parsing and early mount code
Also adds lots of comments to describe what's going on here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Christoph Hellwig [Tue, 10 Jun 2008 14:21:04 +0000 (10:21 -0400)]
btrfs: fix strange indentation in lookup_extent_mapping
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Christoph Hellwig [Tue, 10 Jun 2008 14:20:57 +0000 (10:20 -0400)]
btrfs: tiny makefile cleanup
Use normal kbuild syntax to build acl.o conditionally and remove
commented-out lines.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Sage Weil [Tue, 10 Jun 2008 14:07:39 +0000 (10:07 -0400)]
Btrfs: transaction ioctls
These ioctls let a user application hold a transaction open while it
performs a series of operations. A final ioctl does a sync on the fs
(closing the current transaction). This is the main requirement for
Ceph's OSD to be able to keep the data it's storing in a btrfs volume
consistent, and AFAICS it works just fine. The application would do
something like
fd = ::open("some/file", O_RDONLY);
::ioctl(fd, BTRFS_IOC_TRANS_START);
/* do a bunch of stuff */
::ioctl(fd, BTRFS_IOC_TRANS_END);
or just
::close(fd);
And to ensure it commits to disk,
::ioctl(fd, BTRFS_IOC_SYNC);
When a transaction is held open, the trans_handle is attached to the
struct file (via private_data) so that it will get cleaned up if the
process dies unexpectedly. A held transaction is also ended on fsync() to
avoid a deadlock.
A misbehaving application could also deliberately hold a transaction open,
effectively locking up the FS, so it may make sense to restrict these
ioctls to root.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yan [Tue, 10 Jun 2008 02:21:46 +0000 (22:21 -0400)]
Btrfs: Disable acl xattr handlers
The acl code is not yet complete, and the xattr handlers are causing
problems for cp -p on some distros.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Jan Engelhardt [Tue, 10 Jun 2008 02:19:40 +0000 (22:19 -0400)]
Btrfs: bdi_init and bdi_destroy come with 2.6.23
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Linda Knippers [Tue, 10 Jun 2008 02:17:11 +0000 (22:17 -0400)]
btrfsctl -A error code fixup
Send the error back to userland if the ioctl fails
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Sven Wegener [Tue, 10 Jun 2008 01:57:42 +0000 (21:57 -0400)]
Btrfs: Invalidate dcache entry after creating snapshot and
We need to invalidate an existing dcache entry after creating a new
snapshot or subvolume, because a negative dcache entry will stop us from
accessing the new snapshot or subvolume.
---
ctree.h | 23 +++++++++++++++++++++++
inode.c | 4 ++++
transaction.c | 4 ++++
3 files changed, 31 insertions(+)
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 9 Jun 2008 13:35:50 +0000 (09:35 -0400)]
Btrfs: Fix race in running_transaction checks
When a new transaction was started, the code would incorrectly
set the pointer in fs_info before all the data structures were setup.
fsync-heavy workloads hit races on the setup of the ordered inode spinlock.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Mingming [Tue, 27 May 2008 14:55:43 +0000 (10:55 -0400)]
btrfs delete ordered inode handling fix
Use btrfs_release_file instead of a put_inode call
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 27 May 2008 14:52:17 +0000 (10:52 -0400)]
Btrfs: Always use the async submission queue for checksummed writes
This avoids IO stalls and poorly ordered IO from inline writers mixing in
with the async submission queue
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Sat, 24 May 2008 18:04:53 +0000 (14:04 -0400)]
Btrfs: Allocator fix variety pack
* Force chunk allocation when find_free_extent has to do a full scan
* Record the max key at the start of defrag so it doesn't run forever
* Block groups might not be contiguous, make a forward search for the
next block group in extent-tree.c
* Get rid of extra checks for total fs size
* Fix relocate_one_reference to avoid relocating the same file data block
twice when referenced by an older transaction
* Use the open device count when allocating chunks so that we don't
try to allocate from devices that don't exist
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 16 May 2008 17:30:15 +0000 (13:30 -0400)]
Btrfs: Use kzalloc on the fs_devices allocation
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 16 May 2008 17:14:57 +0000 (13:14 -0400)]
Btrfs: Handle transid == 0 while opening devices
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 16 May 2008 17:06:51 +0000 (13:06 -0400)]
Btrfs: Enable btree balancing on old kernels again
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 15 May 2008 20:15:45 +0000 (16:15 -0400)]
Btrfs: Change the congestion functions to meter the number of async submits as well
The async submit workqueue was absorbing too many requests, leading to long
stalls in the async submitters.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 15 May 2008 13:13:45 +0000 (09:13 -0400)]
Fix corners in writepage and btrfs_truncate_page
The extent_io writepage calls needed an extra check for discarding
pages that started on the last byte in the file.
btrfs_truncate_page needed checks to make sure the page was still part
of the file after reading it, and most importantly, needed to wait for
all IO to the page to finish before freeing the corresponding extents on
disk.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 13 May 2008 20:03:06 +0000 (16:03 -0400)]
Fix btrfs_open_devices to deal with changes since the scan ioctls
Devices can change after the scan ioctls are done, and btrfs_open_devices
needs to be able to verify them as they are opened and used by the FS.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 13 May 2008 17:46:40 +0000 (13:46 -0400)]
Btrfs: Add mount -o degraded to allow mounts to continue with missing devices
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 12 May 2008 17:39:03 +0000 (13:39 -0400)]
Btrfs: Handle write errors on raid1 and raid10
When duplicate copies exist, writes are allowed to fail to one of those
copies. This changeset includes a few changes that allow the FS to
continue even when some IOs fail.
It also adds verification of the parent generation number for btree blocks.
This generation is stored in the pointer to a block, and it ensures
that missed writes are detected.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
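The parent generation check can be pictured with a small sketch. The structures and the name block_matches_parent() are simplified stand-ins chosen for this example, not the on-disk btrfs definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the pointer a parent node keeps to a child. */
struct block_ptr {
    uint64_t blocknr;      /* where the child block lives */
    uint64_t generation;   /* generation the parent expects to find there */
};

/* Simplified stand-in for the header of a tree block read from disk. */
struct block_header {
    uint64_t blocknr;
    uint64_t generation;   /* generation recorded when the block was written */
};

/*
 * Return true when the block that was read matches what the parent pointer
 * promised. A different generation suggests that a write to this copy was
 * missed, so the caller could try another mirror.
 */
bool block_matches_parent(const struct block_ptr *ptr,
                          const struct block_header *hdr)
{
    return hdr->blocknr == ptr->blocknr &&
           hdr->generation == ptr->generation;
}
```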
Chris Mason [Mon, 12 May 2008 16:59:19 +0000 (12:59 -0400)]
Btrfs: Pass down the expected generation number when reading tree blocks
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 9 May 2008 15:52:25 +0000 (11:52 -0400)]
Btrfs: Don't do btree balance_dirty_pages on old kernels, it stalls forever
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 9 May 2008 15:46:48 +0000 (11:46 -0400)]
Btrfs: Chunk relocation fine tuning, and add a few printks to show progress
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 8 May 2008 20:31:21 +0000 (16:31 -0400)]
Btrfs: A number of nodatacow fixes
Once part of a delalloc request fails the cow checks, just cow the
entire range
It is possible for the back references to all be from the same root,
but still have snapshots against an extent. The checks are now more strict,
forcing cow any time there are multiple refs against the data extent.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 8 May 2008 19:05:58 +0000 (15:05 -0400)]
Btrfs: Only open block devices once during mount -o subvol=
btrfs_open_devices needed a check to see if the device was already
open.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 8 May 2008 18:11:56 +0000 (14:11 -0400)]
Btrfs: Update nodatacow mode to support cloned single files and resizing
Before, nodatacow only checked to make sure multiple roots didn't have
references on a single extent. This check makes sure that multiple
inodes don't have references.
nodatacow needed an extra check to see if the block group was currently
readonly. This way cows forced by the chunk relocation code are honored.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
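The nodatacow checks described here amount to a decision about when an in-place overwrite is still safe. The sketch below is an assumption-laden summary: struct extent_info and nodatacow_must_cow() are hypothetical names, and the real code derives this information from back references rather than from precomputed counters.

```c
#include <stdbool.h>

/* Illustrative summary of an extent's state; not a btrfs structure. */
struct extent_info {
    unsigned int nr_roots;       /* roots holding a reference */
    unsigned int nr_inodes;      /* inodes holding a reference */
    bool block_group_readonly;   /* e.g. set while the chunk is relocated */
};

/* Return true when a nodatacow write must fall back to copy-on-write. */
bool nodatacow_must_cow(const struct extent_info *ext)
{
    /* A shared extent must not be overwritten in place. */
    if (ext->nr_roots > 1 || ext->nr_inodes > 1)
        return true;
    /* Honor cows forced by a read-only block group. */
    return ext->block_group_readonly;
}
```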
Chris Mason [Thu, 8 May 2008 17:26:18 +0000 (13:26 -0400)]
Btrfs: Properly find the root for snapshotted blocks during chunk relocation
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 7 May 2008 15:43:44 +0000 (11:43 -0400)]
Btrfs: Add support for online device removal
This required a few structural changes to the code that manages bdev pointers:
The VFS super block now gets an anon-bdev instead of a pointer to the
lowest bdev. This allows us to avoid swapping the super block bdev pointer
around at run time.
The code to read in the super block no longer goes through the extent
buffer interface. Things got ugly keeping the mapping constant.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 5 May 2008 10:26:21 +0000 (06:26 -0400)]
Btrfs: Fix clone ioctl to not hold the path over inserts
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 2 May 2008 20:13:49 +0000 (16:13 -0400)]
Btrfs: Silence bogus inode.c compiler warnings
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Jeff Mahoney [Fri, 2 May 2008 19:03:58 +0000 (15:03 -0400)]
Btrfs: Add workaround for AppArmor changing remove_suid()
In openSUSE 10.3, AppArmor modifies remove_suid to take a struct path
rather than just a dentry. This patch tests that the kernel is openSUSE
10.3 or newer and adjusts the call accordingly.
Debian/Ubuntu with AppArmor applied will also need a similar patch.
Maintainers of btrfs under those distributions should build on this
patch or, alternatively, alter their package descriptions to add
-DREMOVE_SUID_PATH to the compiler command line.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ b/compat.h 2008-02-06 16:46:13.000000000 -0500
@@ -0,0 +1,15 @@
+#ifndef _COMPAT_H_
+#define _COMPAT_H_
+
+
+/*
+ * Even if AppArmor isn't enabled, it still has different prototypes.
+ * Add more distro/version pairs here to declare which has AppArmor applied.
+ */
+#if defined(CONFIG_SUSE_KERNEL)
+# if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,22)
+# define REMOVE_SUID_PATH 1
+# endif
+#endif
+
+#endif /* _COMPAT_H_ */
--- a/file.c 2008-02-06 11:37:39.000000000 -0500
+++ b/file.c 2008-02-06 16:46:23.000000000 -0500
@@ -37,6 +37,7 @@
#include "ordered-data.h"
#include "ioctl.h"
#include "print-tree.h"
+#include "compat.h"
static int btrfs_copy_from_user(loff_t pos, int num_pages, int write_bytes,
@@ -790,7 +791,11 @@ static ssize_t btrfs_file_write(struct f
goto out_nolock;
if (count == 0)
goto out_nolock;
+#ifdef REMOVE_SUID_PATH
+ err = remove_suid(&file->f_path);
+#else
err = remove_suid(fdentry(file));
+#endif
if (err)
goto out_nolock;
file_update_time(file);
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 2 May 2008 18:49:33 +0000 (14:49 -0400)]
Btrfs: Fix do_sync_file_range ifdefs (2.6.22)
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 2 May 2008 18:43:15 +0000 (14:43 -0400)]
Btrfs: Compile warning fixup in volume.c
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Sage Weil [Fri, 2 May 2008 18:43:14 +0000 (14:43 -0400)]
Btrfs: Clone file data ioctl
Add a new ioctl to clone file data
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 30 Apr 2008 17:59:35 +0000 (13:59 -0400)]
Btrfs: Fixes for 2.6.18 enterprise kernels
2.6.18 seems to get caught in an infinite loop when
cancel_rearming_delayed_workqueue is called more than once, so this switches
to cancel_delayed_work, which is arguably more correct.
Also, balance_dirty_pages can run into problems with 2.6.18 based kernels
because it doesn't have the per-bdi dirty limits. This avoids calling
balance_dirty_pages on the btree inode unless there is actually something
to balance, which is a good optimization in general.
Finally there's a compile fix for ordered-data.h
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 29 Apr 2008 18:12:09 +0000 (14:12 -0400)]
Btrfs: Tune stripe selection for raid1 and raid10
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 29 Apr 2008 13:38:00 +0000 (09:38 -0400)]
Btrfs: Deal with failed writes in mirrored configurations
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 28 Apr 2008 20:40:52 +0000 (16:40 -0400)]
Btrfs: Drop some verbose printks
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 28 Apr 2008 19:29:52 +0000 (15:29 -0400)]
Btrfs: Add balance ioctl to restripe the chunks
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 28 Apr 2008 19:29:42 +0000 (15:29 -0400)]
Btrfs: Add new ioctl to add devices
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 28 Apr 2008 13:02:36 +0000 (09:02 -0400)]
Btrfs: Do more optimal file RA during shrinking and defrag
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Sat, 26 Apr 2008 15:03:32 +0000 (11:03 -0400)]
Btrfs: Avoid recursive chunk allocations
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 25 Apr 2008 20:53:30 +0000 (16:53 -0400)]
Btrfs: Make the resizer work based on shrinking and growing devices
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 25 Apr 2008 13:10:45 +0000 (09:10 -0400)]
Btrfs: write_cache_pages came in 2.6.22
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 25 Apr 2008 13:04:37 +0000 (09:04 -0400)]
Btrfs: Add failure handling for read_sys_array
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 25 Apr 2008 13:00:55 +0000 (09:00 -0400)]
Btrfs: write_extent_pages came in 2.6.23
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 25 Apr 2008 12:51:48 +0000 (08:51 -0400)]
Btrfs: Throttle file_write when data=ordered is flushing the inode
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 24 Apr 2008 18:42:46 +0000 (14:42 -0400)]
Btrfs: Fix balance_level to free the middle block if there is room in the left one
balance level starts by trying to empty the middle block, and then
pushes from the right to the middle. This might empty the right block
and leave a small number of pointers in the middle.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 24 Apr 2008 14:54:32 +0000 (10:54 -0400)]
Btrfs: Don't empty the middle buffer in push_nodes_for_insert
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 24 Apr 2008 13:34:34 +0000 (09:34 -0400)]
Btrfs: Fix split_node to require more empty slots in the node as well
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 24 Apr 2008 13:22:51 +0000 (09:22 -0400)]
Btrfs: Make sure nodes have enough room for a double split
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 22 Apr 2008 17:26:47 +0000 (13:26 -0400)]
Btrfs: Fix the unplug_io_fn to grab a consistent copy of page->mapping
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 22 Apr 2008 17:26:46 +0000 (13:26 -0400)]
Fix btrfs_get_extent and get_block corner cases, and disable O_DIRECT reads
The generic O_DIRECT code assumes all the bios have the same bdev,
which isn't true for multi-device btrfs.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 22 Apr 2008 13:24:20 +0000 (09:24 -0400)]
Btrfs: Set nodatasum on the inode when written by a nodatasum mount
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 22 Apr 2008 13:22:11 +0000 (09:22 -0400)]
Deal with page == NULL in the btrfs_unplug_io_fn
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 22 Apr 2008 13:22:07 +0000 (09:22 -0400)]
Btrfs: Add a special device list for chunk allocations
This allows other code that needs to walk every device in the FS to do so
without locking against allocations.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 21 Apr 2008 16:01:38 +0000 (12:01 -0400)]
Btrfs: Simplify device selection for mirrored reads
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 21 Apr 2008 14:03:05 +0000 (10:03 -0400)]
Btrfs: Make an unplug function that doesn't unplug every spindle
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 21 Apr 2008 12:52:50 +0000 (08:52 -0400)]
Btrfs: Remove debugging statements from the invalidatepage calls
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 21 Apr 2008 12:28:10 +0000 (08:28 -0400)]
Btrfs: Add 1MB to the min_free in alloc_chunk
This properly reflects the first 1MB we skip at the start of the device
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 18 Apr 2008 20:13:31 +0000 (16:13 -0400)]
Btrfs: Scale the bdi ra_pages by the number of devices in the FS
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 18 Apr 2008 20:11:30 +0000 (16:11 -0400)]
Force page->private removal in btrfs_invalidatepage
btrfs_invalidatepage is not allowed to leave pages around on the lru.
Any such pages will trigger an oops later on because the VM will see
page->private and assume it is a buffer head.
This also forces extra flushes of the async work queues before
dropping all the pages on the btree inode during unmount. Left over
items on the work queues are one possible cause of busy state ranges
during truncate_inode_pages.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 18 Apr 2008 18:17:20 +0000 (14:17 -0400)]
Btrfs: Set the btree inode i_size to OFFSET_MAX
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 18 Apr 2008 15:55:51 +0000 (11:55 -0400)]
Btrfs: Fix chunk allocation when some devices don't have enough room for stripes
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 18 Apr 2008 14:29:51 +0000 (10:29 -0400)]
Btrfs: Calculate appropriate chunk sizes for both small and large filesystems
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 18 Apr 2008 14:29:50 +0000 (10:29 -0400)]
Btrfs: Don't drop extent_map cache during releasepage on the btree inode
The btree inode should only have a single extent_map in the cache,
it doesn't make sense to ever drop it.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 18 Apr 2008 14:29:49 +0000 (10:29 -0400)]
Btrfs: Add support for labels in the super block
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 18 Apr 2008 14:29:38 +0000 (10:29 -0400)]
Btrfs: Check device uuids along with devids
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 17 Apr 2008 18:08:30 +0000 (14:08 -0400)]
Btrfs: Remove bogus max_sector warnings from the extent_io code
It was testing the bio before doing logical->physical mapping, so the
test was always wrong.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 17 Apr 2008 15:58:30 +0000 (11:58 -0400)]
Btrfs: Avoid 64 bit div for RAID10
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 17 Apr 2008 15:29:12 +0000 (11:29 -0400)]
Btrfs: Use the extent map cache to find the logical disk block during data retries
The data read retry code needs to find the logical disk block before it
can resubmit new bios. But, finding this block isn't allowed to take
the fs_mutex because that will deadlock with a number of different callers.
This changes the retry code to use the extent map cache instead, but
that requires the extent map cache to have the extent we're looking for.
This is a problem because btrfs_drop_extent_cache just drops the entire
extent instead of the little tiny part it is invalidating.
The bulk of the code in this patch changes btrfs_drop_extent_cache to
invalidate only a portion of the extent cache, and changes btrfs_get_extent
to deal with the results.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
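Dropping only part of a cached extent is essentially a range split. The following sketch shows that idea on plain byte ranges; struct range and drop_range() are made up for the illustration and do not mirror the extent_map API.

```c
#include <stdint.h>

/* Half-open byte range [start, end); a stand-in for a cached extent. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Remove [drop.start, drop.end) from the cached range em and write the
 * surviving pieces (zero, one, or two) to out. Returns the number of pieces.
 */
int drop_range(struct range em, struct range drop, struct range out[2])
{
    int n = 0;

    /* No overlap: the cached range survives untouched. */
    if (drop.end <= em.start || drop.start >= em.end) {
        out[n++] = em;
        return n;
    }
    /* Keep the part in front of the dropped region. */
    if (drop.start > em.start) {
        out[n].start = em.start;
        out[n].end = drop.start;
        n++;
    }
    /* Keep the part behind the dropped region. */
    if (drop.end < em.end) {
        out[n].start = drop.end;
        out[n].end = em.end;
        n++;
    }
    return n;
}
```

Invalidating one 4KB page in the middle of a 1MB cached extent leaves two pieces in the cache, so a later read retry can still find the blocks on either side.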
Chris Mason [Wed, 16 Apr 2008 17:06:16 +0000 (13:06 -0400)]
Btrfs: Only do async bio submission for pdflush
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 16 Apr 2008 16:59:22 +0000 (12:59 -0400)]
Btrfs: Don't wait on tree block writeback before freeing them anymore
This isn't required anymore because we don't reallocate blocks that
have already been written in this transaction.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 16 Apr 2008 15:15:20 +0000 (11:15 -0400)]
Btrfs: Write bio checksumming outside the FS mutex
This significantly improves streaming write performance by allowing
concurrency in the data checksumming.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 16 Apr 2008 15:14:51 +0000 (11:14 -0400)]
Btrfs: Create a work queue for bio writes
This allows checksumming to happen in parallel among many cpus, and
keeps us from bogging down pdflush with the checksumming code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 16 Apr 2008 14:49:51 +0000 (10:49 -0400)]
Btrfs: Add RAID10 support
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 15 Apr 2008 19:41:47 +0000 (15:41 -0400)]
Btrfs: Add chunk uuids and update multi-device back references
Block headers now store the chunk tree uuid
Chunk items record the device uuid for each stripe
Device extent items record better back refs to the chunk tree
Block groups record better back refs to the chunk tree
The chunk tree format has also changed. The objectid of BTRFS_CHUNK_ITEM_KEY
used to be the logical offset of the chunk. Now it is a chunk tree id,
with the logical offset being stored in the offset field of the key.
This allows a single chunk tree to record multiple logical address spaces,
upping the number of bytes indexed by a chunk tree from 2^64 to
2^128.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
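To make the new key layout concrete, here is a small sketch of a chunk item key with the chunk tree id in the objectid and the logical offset in the offset field. The struct, the helper names, and the numeric value of EXAMPLE_CHUNK_ITEM_KEY are placeholders for illustration, not the definitions from ctree.h.

```c
#include <stdint.h>

#define EXAMPLE_CHUNK_ITEM_KEY 1   /* placeholder, not the real key type value */

/* A btree key is an (objectid, type, offset) triple. */
struct sketch_key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* New layout: objectid names the chunk tree, offset is the logical offset. */
struct sketch_key chunk_item_key(uint64_t chunk_tree_id, uint64_t logical)
{
    struct sketch_key k;

    k.objectid = chunk_tree_id;
    k.type = EXAMPLE_CHUNK_ITEM_KEY;
    k.offset = logical;
    return k;
}

/* Order keys by objectid, then type, then offset. */
int key_cmp(const struct sketch_key *a, const struct sketch_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}
```

Because both the id and the offset are 64 bits wide, two chunks at the same logical offset stay distinct as long as their chunk tree ids differ, which is where the 2^128 figure comes from.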
Chris Mason [Mon, 14 Apr 2008 13:48:18 +0000 (09:48 -0400)]
Btrfs: A few updates for 2.6.18 and versions older than 2.6.25
This includes fixing a missing spinlock init call that caused oops on mount
for most kernels other than 2.6.25.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 14 Apr 2008 13:46:10 +0000 (09:46 -0400)]
Add a min size parameter to btrfs_alloc_extent
On huge machines, delayed allocation may try to allocate massive extents.
This change allows btrfs_alloc_extent to return something smaller than
the caller asked for, and the data allocation routines will loop over
the allocations until they fill the whole delayed alloc.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
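The loop described above can be modelled with a toy allocator. Everything in this sketch is hypothetical: toy_alloc_extent() merely caps each grant at max_extent bytes and stands in for btrfs_alloc_extent, whose real signature is not shown in this log.

```c
#include <stdint.h>

/*
 * Toy allocator: hands out at most max_extent bytes per call and fails if
 * that would be less than min_size. Returns the bytes granted, 0 on failure.
 */
uint64_t toy_alloc_extent(uint64_t want, uint64_t min_size, uint64_t max_extent)
{
    uint64_t got = want < max_extent ? want : max_extent;

    return got >= min_size ? got : 0;
}

/*
 * Fill a delayed allocation of total bytes by looping over smaller extents.
 * Returns the number of extents used, or -1 if the allocator gave up.
 */
int fill_delalloc(uint64_t total, uint64_t min_size, uint64_t max_extent)
{
    int extents = 0;

    while (total > 0) {
        /* Never insist on more than what is left to allocate. */
        uint64_t min_now = total < min_size ? total : min_size;
        uint64_t got = toy_alloc_extent(total, min_now, max_extent);

        if (got == 0)
            return -1;
        total -= got;
        extents++;
    }
    return extents;
}
```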
Miguel [Fri, 11 Apr 2008 19:50:59 +0000 (15:50 -0400)]
Btrfs: bio_endio support for linux 2.6.23 and older.
bio_endio() changed prototype in linux 2.6.24; support older kernels
using the older prototype.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Miguel [Fri, 11 Apr 2008 19:46:48 +0000 (15:46 -0400)]
Btrfs: define write_cache_pages for linux kernel <= 2.6.20 instead
write_cache_pages doesn't exist in linux 2.6.20; change the #if
condition to match that.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Miguel [Fri, 11 Apr 2008 19:45:51 +0000 (15:45 -0400)]
Btrfs: Endianness bug fix for v0.13 with kernels
Fix for an endianness bug when using btrfs v0.13 with kernels older than 2.6.23
Problem:
As of v0.13, btrfs-progs uses a crc32c.c equivalent to the one found in
linux-2.6.23/lib/libcrc32c.c. Since crc32c_le() changed in linux-2.6.23, when
running btrfs v0.13 with older kernels we have a mismatch between the versions
of crc32c_le() from btrfs-progs and libcrc32c in the kernel. This mismatch
causes a bug when using btrfs on big endian machines.
Solution:
A btrfs_crc32c() macro that, when compiling for kernels older than 2.6.23, does
endianness conversion on the parameters and return value of crc32c().
This endianness conversion nullifies the differences in implementation
of crc32c_le().
For kernel 2.6.23 or newer, it calls crc32c() directly.
Signed-off-by: Miguel Sousa Filipe <miguel.filipe@gmail.com>
---
Signed-off-by: Chris Mason <chris.mason@oracle.com>
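The cancelling effect of the wrapper can be shown in userspace. This sketch rests on a stated assumption: that the older routine converts its seed and its result between host and little-endian order, which is modelled by crc32c_old_style(). All function names here are invented for the example, and crc32c_sw() is a plain bitwise CRC32C rather than the kernel's table-driven code.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (reflected polynomial 0x82F63B78). The caller supplies the
 * seed; no initial or final inversion is applied here. */
uint32_t crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Swap the four bytes of x. */
uint32_t swab32_sketch(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Convert between host order and little-endian; the same operation works in
 * both directions, and it is a no-op on little-endian hosts. */
uint32_t le32_convert(uint32_t x)
{
    const uint16_t probe = 1;
    const int little = *(const unsigned char *)&probe == 1;

    return little ? x : swab32_sketch(x);
}

/* ASSUMPTION: models an older routine that converts seed and result itself. */
uint32_t crc32c_old_style(uint32_t seed, const void *data, size_t len)
{
    return le32_convert(crc32c_sw(le32_convert(seed), data, len));
}

/* Wrapper in the spirit of btrfs_crc32c(): convert the seed and the return
 * value once more so the old routine's conversions cancel out. */
uint32_t crc32c_compat(uint32_t seed, const void *data, size_t len)
{
    return le32_convert(crc32c_old_style(le32_convert(seed), data, len));
}
```

Since le32_convert() is its own inverse, crc32c_compat() returns the same value as crc32c_sw() on both little-endian and big-endian hosts.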
Chris Mason [Fri, 11 Apr 2008 16:16:46 +0000 (12:16 -0400)]
Btrfs: Fixup a few u64<->pointer casts for 32 bit
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 11 Apr 2008 14:51:07 +0000 (10:51 -0400)]
Btrfs: Add extra checks to avoid removing extent_state from pages we can't free
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 10 Apr 2008 20:19:33 +0000 (16:19 -0400)]
Btrfs: Write out all super blocks on commit, and bring back proper barrier support
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 10 Apr 2008 14:23:21 +0000 (10:23 -0400)]
Btrfs: Add O_DIRECT read and write (writes == buffered + cache flush)
This adds basic O_DIRECT read and write support. In the write case, we
just do a normal buffered write followed by a cache flush. O_DIRECT +
O_SYNC are required to trigger metadata syncs.
In the read case, there is a basic btrfs_get_block call for use by
the generic O_DIRECT code. This does honor multi-volume mapping rules
but it skips all checksumming.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 10 Apr 2008 14:23:19 +0000 (10:23 -0400)]
Btrfs: Disable extra debugging checks on tree blocks
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 9 Apr 2008 20:28:12 +0000 (16:28 -0400)]
Btrfs: Handle checksumming errors while reading data blocks
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 9 Apr 2008 20:28:12 +0000 (16:28 -0400)]
Btrfs: Retry metadata reads in the face of checksum failures
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 9 Apr 2008 20:28:12 +0000 (16:28 -0400)]
Btrfs: Handle data block end_io through the async work queue
Before, it was done by the bio end_io routine. The work queue code is able
to scale much better with faster IO subsystems.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 9 Apr 2008 20:28:12 +0000 (16:28 -0400)]
Btrfs: Do metadata checksums for reads via a workqueue
Before, metadata checksumming was done by the callers of read_tree_block,
which would set EXTENT_CSUM bits in the extent tree to show that a given
range of pages was already checksummed and didn't need to be verified
again.
But, those bits could go away via try_to_releasepage, and the end
result was bogus checksum failures on pages that never left the cache.
The new code validates checksums when the page is read. It is a little
tricky because metadata blocks can span pages and a single read may
end up going via multiple bios.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 9 Apr 2008 20:28:12 +0000 (16:28 -0400)]
Btrfs: Add additional debugging for metadata checksum failures
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 9 Apr 2008 20:28:12 +0000 (16:28 -0400)]
Change btrfs_map_block to return a structure with mappings for all stripes
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 4 Apr 2008 19:40:00 +0000 (15:40 -0400)]
Btrfs: Fix allocation profile init
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 4 Apr 2008 19:40:00 +0000 (15:40 -0400)]
Btrfs: Don't allow written blocks from this transaction to be reallocated
When a block is freed, it can be immediately reused if it is from
the current transaction. But, an extra check is required to make sure
the block had not been written yet. If it were reused after being written,
the transid in the block header might match the transid of the
next time the block was allocated.
The parent node records the transaction ID of the block it is pointing to,
and this is used as part of validating the block on reads. So, there
can only be one version of a block per transaction.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
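The reuse rule from this commit fits in a short predicate. struct freed_block and can_reuse_now() are illustrative names only; the actual code tracks this state on the extent buffer rather than in a separate structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative view of a freed tree block; not a btrfs structure. */
struct freed_block {
    uint64_t transid;   /* transaction that allocated the block */
    bool written;       /* has the block already gone to disk? */
};

/*
 * A freed block may be handed out again right away only if it was allocated
 * in the running transaction and has never been written. Once written, a
 * parent on disk may already record this transid for the old contents, so
 * reuse has to wait.
 */
bool can_reuse_now(const struct freed_block *b, uint64_t running_transid)
{
    return b->transid == running_transid && !b->written;
}
```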
Chris Mason [Thu, 3 Apr 2008 20:29:03 +0000 (16:29 -0400)]
Btrfs: Add support for duplicate blocks on a single spindle
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 3 Apr 2008 20:29:03 +0000 (16:29 -0400)]
Btrfs: Add support for mirroring across drives
Signed-off-by: Chris Mason <chris.mason@oracle.com>