DESCRIPTION
-----------
-The primary purpose of the balance feature is to spread block groups accross
+The primary purpose of the balance feature is to spread block groups across
all devices so they match constraints defined by the respective profiles. See
`mkfs.btrfs`(8) section 'PROFILES' for more details.
The scope of the balancing process can be further tuned by use of filters that
can select the block groups to process.
The balance operation is cancellable by the user. The on-disk state of the
filesystem is always consistent so an unexpected interruption (eg. system crash,
reboot) does not corrupt the filesystem. The progress of the balance operation
-is temporarily stored and will be resumed upon mount, unless the mount option
-'skip_balance' is specified.
+is temporarily stored as an internal state and will be resumed upon mount,
+unless the mount option 'skip_balance' is specified.
WARNING: running balance without filters will take a lot of time as it basically
rewrites the entire filesystem and needs to update all block pointers.
NOTE: The balance subcommand also exists under the *btrfs filesystem*
namespace. This still works for backward compatibility but is deprecated and
-should not be used anymore.
+should not be used any more.
NOTE: A short syntax *btrfs balance <path>* works due to backward compatibility
-but is deprecated and should not be used anymore. Use *btrfs balance start*
+but is deprecated and should not be used any more. Use *btrfs balance start*
command instead.
+PERFORMANCE IMPLICATIONS
+------------------------
+
+Balancing operations are very IO intensive and can also be quite CPU intensive,
+impacting other ongoing filesystem operations. Typically large amounts of data
+are copied from one location to another, with corresponding metadata updates.
+
+Depending upon the block group layout, it can also be seek heavy. Performance
+on rotational devices is noticeably worse compared to SSDs or fast arrays.
+
SUBCOMMAND
----------
*cancel* <path>::
-cancel running or paused balance
+cancel a running or paused balance; the command will block and wait until the
+currently processed block group completes
*pause* <path>::
pause running balance operation, this will store the state of the balance
progress and used filters to the filesystem
*resume* <path>::
-resume interrupted balance
+resume an interrupted balance; the balance state must be stored on the
+filesystem from a previous run, eg. after it was forcibly interrupted and the
+filesystem mounted again with 'skip_balance'
*start* [options] <path>::
start the balance operation according to the specified filters; without any
filters the entire filesystem is rewritten. The process runs in the foreground.
+
+NOTE: the balance command without filters will basically rewrite everything
+in the filesystem. The run time is potentially very long, depending on the
+filesystem size. To prevent starting a full balance by accident, the user is
+warned and has a few seconds to cancel the operation before it starts. The
+warning and delay can be skipped with '--full-balance' option.
++
+Please note that the filters must be written immediately after the '-d', '-m'
+and '-s' options, without a separating space, because the filter argument is
+optional and a bare '-d' etc. also works and means no filters.
++
`Options`
+
-d[<filters>]::::
act on data block groups, see the 'FILTERS' section for details about 'filters'
-f::::
force reducing of metadata integrity, eg. when going from 'raid1' to 'single'
+--background|--bg::::
+run the balance operation asynchronously in the background, uses `fork`(2) to
+start the process that calls the kernel ioctl
*status* [-v] <path>::
show status of a running or paused balance
FILTERS
-------
From kernel 3.3 onwards, btrfs balance can limit its action to a subset of the
-full filesystem, and can be used to change the replication configuration (e.g.
+whole filesystem, and can be used to change the replication configuration (e.g.
moving data from single to RAID1). This functionality is accessed through the
'-d', '-m' or '-s' options to btrfs balance start, which filter on data,
metadata and system blocks respectively.
*devid=<id>*::
Balance only block groups which have at least one chunk on the given
-device. To list devices with ids use *btrfs fi show*.
+device. To list devices with ids use *btrfs filesystem show*.
*drange=<range>*::
Balance only block groups which overlap with the given byte range on any
device.
+
NOTE: starting with kernel 4.5, the 'data' chunks can be converted to/from the
'DUP' profile on a single device.
++
+NOTE: starting with kernel 4.6, all profiles can be converted to/from 'DUP' on
+multi-device filesystems.
*limit=<number>*::
*limit=<range>*::
Process only the given number of chunks, after all other filters are applied.
This can be used to limit the amount of work done by a single balance run.
*stripes=<range>*::
Balance only block groups which have the given number of stripes. The parameter
-is a range specified as 'start..end'. Makes sense fo block group profiles that
+is a range specified as 'start..end'. Makes sense for block group profiles that
utilize striping, ie. RAID0/10/5/6. The range minimum and maximum are
inclusive.
ENOSPC
------
The way balance operates, it usually needs to temporarily create a new block
-group and move the old data there. For that it needs work space, otherwise
-it fails for ENOSPC reasons.
+group and move the old data there, before the old block group can be removed.
+For that it needs work space, otherwise it fails for ENOSPC reasons.
This is not the same ENOSPC as if the free space is exhausted. This refers to
-the space on the level of block groups.
+the space on the level of block groups, which are bigger parts of the filesystem
+that contain many file extents.
The free work space can be calculated from the output of the *btrfs filesystem show*
command:
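As an illustration, the following sketch parses a made-up *btrfs filesystem show*
output and computes the per-device free work space as 'size - used'. The label,
uuid, sizes and device path are hypothetical, chosen only for this example:

```shell
# Hypothetical 'btrfs filesystem show' output; label, uuid, sizes and the
# device path are made up for illustration.
show_output='Label: none  uuid: 11111111-2222-3333-4444-555555555555
	Total devices 1 FS bytes used 64.44GiB
	devid    1 size 100.00GiB used 91.70GiB path /dev/sda'

# Free work space per device is "size" minus "used" on each devid line;
# awk coerces the leading number out of values like "100.00GiB".
echo "$show_output" | awk '/devid/ {
	printf "%s: %.2f GiB of free work space\n", $8, ($4 + 0) - ($6 + 0)
}'
```

With the numbers above this prints roughly 8 GiB of free work space for the
single device.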
An example of a filter that does not require work space is 'usage=0'. This will
scan through all unused block groups of a given type and will reclaim the
-space. Ater that it might be possible to run other filters.
+space. After that it might be possible to run other filters.
**CONVERSIONS ON MULTIPLE DEVICES**
Conversion to profiles based on striping (RAID0, RAID5/6) requires work
space on each device. An interrupted balance may leave partially filled block
-groups that might consume the work space.
+groups that consume the work space.
+
+EXAMPLES
+--------
+
+A more comprehensive example when going from one to multiple devices, and back,
+can be found in section 'TYPICAL USECASES' of `btrfs-device`(8).
+
+MAKING BLOCK GROUP LAYOUT MORE COMPACT
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The layout of block groups is not normally visible; most tools report only
+summarized numbers of free or used space, but there are still some hints
+provided.
+
+Let's use the following real life example and start with the output:
+
+--------------------
+$ btrfs filesystem df /path
+Data, single: total=75.81GiB, used=64.44GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=15.87GiB, used=8.84GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
+
+Roughly calculating for data, '75G - 64G = 11G', the used/total ratio is
+about '85%'. How can we interpret that:
+
+* chunks are filled by 85% on average, ie. the 'usage' filter with anything
+ smaller than 85 will likely not affect anything
+* in a more realistic scenario, the space is distributed unevenly, we can
+ assume there are completely used chunks and the remaining are partially filled
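
The rough numbers above can be checked with plain shell arithmetic; the values
are taken from the *btrfs filesystem df* output shown earlier:

```shell
# Used/total ratio of the data block groups from the example above (GiB).
awk 'BEGIN {
	used = 64.44; total = 75.81
	printf "%.0f%% average chunk usage\n", 100 * used / total
}'
```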
+
+Compacting the layout could be used on both. In the former case it would
+spread the data of a given chunk into the others and then remove the emptied
+chunk. Here we can estimate that roughly 850 MiB of data have to be moved
+(85% of a 1 GiB chunk).
+
+In the latter case, targeting the partially used chunks will have to move less
+data and thus will be faster. A typical filter command would look like:
+
+--------------------
+# btrfs balance start -dusage=50 /path
+Done, had to relocate 2 out of 97 chunks
+
+$ btrfs filesystem df /path
+Data, single: total=74.03GiB, used=64.43GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=15.87GiB, used=8.84GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
+
+As you can see, the 'total' amount of data is decreased by just 1 GiB, which is
+an expected result. Let's see what will happen when we increase the estimated
+usage filter.
+
+--------------------
+# btrfs balance start -dusage=85 /path
+Done, had to relocate 13 out of 95 chunks
+
+$ btrfs filesystem df /path
+Data, single: total=68.03GiB, used=64.43GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=15.87GiB, used=8.85GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
+
+Now the used/total ratio is about 94% and we moved about '74G - 68G = 6G' of
+data to the remaining block groups, ie. the 6 GiB are now free of filesystem
+structures and can be reused for new data or metadata block groups.
+
+We can do a similar exercise with the metadata block groups, but this should
+not typically be necessary, unless the used/total ratio is really off. Here
+the ratio is roughly 50% but the difference as an absolute number is "a few
+gigabytes", which can be considered normal for a workload with snapshots or
+reflinks updated frequently.
+
+--------------------
+# btrfs balance start -musage=50 /path
+Done, had to relocate 4 out of 89 chunks
+
+$ btrfs filesystem df /path
+Data, single: total=68.03GiB, used=64.43GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=14.87GiB, used=8.85GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
+
+Just a 1 GiB decrease, which possibly means there are block groups with good
+utilization. Making the metadata layout more compact would in turn require
+updating more metadata structures, ie. lots of IO. As running out of metadata
+space is a more severe problem, it's not necessary to keep the utilization
+ratio too high. For the purpose of this example, let's see the effects of
+further compaction:
+
+--------------------
+# btrfs balance start -musage=70 /path
+Done, had to relocate 13 out of 88 chunks
+
+$ btrfs filesystem df /path
+Data, single: total=68.03GiB, used=64.43GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=11.97GiB, used=8.83GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
+
+GETTING RID OF COMPLETELY UNUSED BLOCK GROUPS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Normally the balance operation needs work space, to temporarily move the
+data before the old block groups get removed. If there's no work space, it
+ends with 'no space left'.
+
+There's a special case when the block groups are completely unused, possibly
+left after removing lots of files or deleting snapshots. Removing empty block
+groups is automatic since kernel 3.18. The same can be achieved manually with
+the 'usage=0' filter, with the notable exception that this operation does not
+require any work space. Thus it can be used to reclaim unused block groups and
+make the space available again.
+
+--------------------
+# btrfs balance start -dusage=0 /path
+--------------------
+
+This should lead to a decrease in the 'total' numbers in the *btrfs filesystem df* output.
EXIT STATUS
-----------