Documentation/btrfs-balance.asciidoc

   1 btrfs-balance(8)
   2 ================
   3
   4 NAME
   5 ----
   6 btrfs-balance - balance block groups on a btrfs filesystem
   7
   8 SYNOPSIS
   9 --------
  10 *btrfs balance* <subcommand> <args>
  11
  12 DESCRIPTION
  13 -----------
  14 The primary purpose of the balance feature is to spread block groups across
  15 all devices so they match constraints defined by the respective profiles. See
  16 `mkfs.btrfs`(8) section 'PROFILES' for more details.
  17 The scope of the balancing process can be further tuned by use of filters that
  18 can select the block groups to process. Balance works only on a mounted
  19 filesystem.
  20
  21 The balance operation is cancellable by the user. The on-disk state of the
  22 filesystem is always consistent so an unexpected interruption (eg. system crash,
  23 reboot) does not corrupt the filesystem. The progress of the balance operation
  24 is temporarily stored as an internal state and will be resumed upon mount,
  25 unless the mount option 'skip_balance' is specified.
  26
  27 WARNING: running balance without filters will take a lot of time as it basically
  28 rewrites the entire filesystem and needs to update all block pointers.
  29
  30 The filters can be used to perform the following actions:
  31
  32 - convert block group profiles (filter 'convert')
  33 - make block group usage more compact (filter 'usage')
  34 - perform actions only on a given device (filters 'devid', 'drange')
  35
  36 The filters can be applied to a combination of block group types (data,
  37 metadata, system). Note that changing 'system' needs the force option.
  38
  39 NOTE: the balance operation needs enough work space, ie. space that is
  40 completely unused in the filesystem, otherwise this may lead to ENOSPC reports.
  41 See the section 'ENOSPC' for more details.
  42
  43 COMPATIBILITY
  44 -------------
  45
  46 NOTE: The balance subcommand also exists under the *btrfs filesystem*
  47 namespace. This still works for backward compatibility but is deprecated and
  48 should not be used any more.
  49
  50 NOTE: A short syntax *btrfs balance <path>* works due to backward compatibility
  51 but is deprecated and should not be used any more. Use *btrfs balance start*
  52 command instead.
  53
  54 PERFORMANCE IMPLICATIONS
  55 ------------------------
  56
  57 Balancing operations are very IO intensive and can also be quite CPU intensive,
  58 impacting other ongoing filesystem operations. Typically large amounts of data
  59 are copied from one location to another, with corresponding metadata updates.
  60
  61 Depending upon the block group layout, it can also be seek heavy. Performance
  62 on rotational devices is noticeably worse compared to SSDs or fast arrays.
  63
  64 SUBCOMMAND
  65 ----------
  66 *cancel* <path>::
  67 cancel a running or paused balance; the command blocks and waits until the
  68 block group currently being processed completes
  69
  70 *pause* <path>::
  71 pause a running balance operation; this stores the state of the balance
  72 progress and the filters in use to the filesystem
  73
  74 *resume* <path>::
  75 resume an interrupted balance; the balance status must be stored on the filesystem
  76 from a previous run, eg. after it was forcibly interrupted and mounted again with
  77 'skip_balance'
  78
  79 *start* [options] <path>::
  80 start the balance operation according to the specified filters; without filters,
  81 the entire filesystem is rewritten. The process runs in the foreground.
  82 +
  83 NOTE: the balance command without filters will basically rewrite everything
  84 in the filesystem. The run time is potentially very long, depending on the
  85 filesystem size. To prevent starting a full balance by accident, the user is
  86 warned and has a few seconds to cancel the operation before it starts. The
  87 warning and delay can be skipped with the '--full-balance' option.
  88 +
  89 Please note that the filters must be written together with the '-d', '-m' and
  90 '-s' options, with no space in between, because the filters are optional and
  91 a bare '-d' etc. is also valid and means no filters.
  92 +
  93 `Options`
  94 +
  95 -d[<filters>]::::
  96 act on data block groups, see `FILTERS` section for details about 'filters'
  97 -m[<filters>]::::
  98 act on metadata chunks, see `FILTERS` section for details about 'filters'
  99 -s[<filters>]::::
 100 act on system chunks (requires '-f'), see `FILTERS` section for details about 'filters'.
 101 -v::::
 102 be verbose and print balance filter arguments
 103 -f::::
 104 force a reduction of metadata integrity, eg. when going from 'raid1' to 'single'
 105 --background|--bg::::
 106 run the balance operation asynchronously in the background, uses `fork`(2) to
 107 start the process that calls the kernel ioctl
 108
 109 *status* [-v] <path>::
 110 Show status of running or paused balance.
 111 +
 112 If '-v' option is given, output will be verbose.
 113
 114 FILTERS
 115 -------
 116 From kernel 3.3 onwards, btrfs balance can limit its action to a subset of the
 117 whole filesystem, and can be used to change the replication configuration (e.g.
 118 moving data from single to RAID1). This functionality is accessed through the
 119 '-d', '-m' or '-s' options to btrfs balance start, which filter on data,
 120 metadata and system blocks respectively.
 121
 122 A filter has the following structure: 'type'[='params'][,'type'=...]
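
As a minimal sketch of that structure, the shell fragment below joins two filter
terms ('usage' and 'limit', both described in this section) with commas and
prints the resulting command instead of running it. The helper name
`join_filters` and the values '50' and '10' are illustrative assumptions, not
part of btrfs-progs.

```shell
# Join filter terms with commas, e.g. "usage=50 limit=10" -> "usage=50,limit=10".
# join_filters is a hypothetical helper used only for this illustration.
join_filters() {
    ( IFS=,; printf '%s\n' "$*" )
}

data_filters=$(join_filters usage=50 limit=10)

# Dry run: print the command rather than executing it. The filter string is
# attached directly to -d, with no space in between.
echo btrfs balance start "-d${data_filters}" /path
# prints: btrfs balance start -dusage=50,limit=10 /path
```

Printing the command first makes it easy to check the filter string before a
balance is actually started.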
 123
 124 The available types are:
 125
 126 *profiles=<profiles>*::
 127 Balances only block groups with the given profiles. Parameters
 128 are a list of profile names separated by "'|'" (pipe).
 129
 130 *usage=<percent>*::
 131 *usage=<range>*::
 132 Balances only block groups with usage under the given percentage. The
 133 value of 0 is allowed and will clean up completely unused block groups; this
 134 should not require any new work space to be allocated. You may want to use 'usage=0'
 135 in case balance is returning ENOSPC and your filesystem is not too full.
 136 +
 137 The argument may be a single value or a range. The single value 'N' means 'at
 138 most N percent used', equivalent to '..N' range syntax. Kernels prior to 4.4
 139 accept only the single value format.
 140 The minimum range boundary is inclusive, maximum is exclusive.
 141
 142 *devid=<id>*::
 143 Balances only block groups which have at least one chunk on the given
 144 device. To list devices with ids use *btrfs filesystem show*.
 145
 146 *drange=<range>*::
 147 Balance only block groups which overlap with the given byte range on any
 148 device. Use in conjunction with 'devid' to filter on a specific device. The
 149 parameter is a range specified as 'start..end'.
 150
 151 *vrange=<range>*::
 152 Balance only block groups which overlap with the given byte range in the
 153 filesystem's internal virtual address space. This is the address space that
 154 most reports from btrfs in the kernel log use. The parameter is a range
 155 specified as 'start..end'.
 156
 157 *convert=<profile>*::
 158 Convert each selected block group to the profile named by the
 159 parameter.
 160 +
 161 NOTE: starting with kernel 4.5, the 'data' chunks can be converted to/from the
 162 'DUP' profile on a single device.
 163 +
 164 NOTE: starting with kernel 4.6, all profiles can be converted to/from 'DUP' on
 165 multi-device filesystems.
 166
 167 *limit=<number>*::
 168 *limit=<range>*::
 169 Process only the given number of chunks, after all filters are applied. This can be
 170 used to specifically target a chunk in connection with other filters ('drange',
 171 'vrange') or just simply limit the amount of work done by a single balance run.
 172 +
 173 The argument may be a single value or a range. The single value 'N' means 'at
 174 most N chunks', equivalent to '..N' range syntax. Kernels prior to 4.4 accept
 175 only the single value format.  The range minimum and maximum are inclusive.
 176
 177 *stripes=<range>*::
 178 Balance only block groups which have the given number of stripes. The parameter
 179 is a range specified as 'start..end'. Makes sense for block group profiles that
 180 utilize striping, ie. RAID0/10/5/6.  The range minimum and maximum are
 181 inclusive.
 182
 183 *soft*::
 184 Takes no parameters. Only has meaning when converting between profiles.
 185 When doing convert from one profile to another and soft mode is on,
 186 chunks that already have the target profile are left untouched.
 187 This is useful e.g. when half of the filesystem was converted earlier but got
 188 cancelled.
 189 +
 190 The soft mode switch is (like every other filter) per-type.
 191 For example, this means that we can convert metadata chunks the "hard" way
 192 while converting data chunks selectively with soft switch.
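
The per-type behaviour of 'soft' can be sketched as follows. The fragment only
prints a command line; the function name `soft_convert_cmd`, the target profile
'raid1' and the mount point '/path' are assumptions chosen for illustration.

```shell
# Print a balance command that converts data chunks with 'soft' (chunks already
# in the target profile are skipped) and metadata chunks without it (every
# metadata chunk is processed). soft_convert_cmd is a hypothetical helper.
soft_convert_cmd() {
    target=$1
    mountpoint=$2
    echo btrfs balance start "-dconvert=${target},soft" "-mconvert=${target}" "$mountpoint"
}

soft_convert_cmd raid1 /path
# prints: btrfs balance start -dconvert=raid1,soft -mconvert=raid1 /path
```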
 193
 194 Profile names, used in 'profiles' and 'convert', are one of: 'raid0', 'raid1',
 195 'raid10', 'raid5', 'raid6', 'dup', 'single'. The mixed data/metadata profiles
 196 can be converted in the same way, but conversion between mixed and non-mixed
 197 is not implemented. For the constraints of the profiles please refer to `mkfs.btrfs`(8),
 198 section 'PROFILES'.
 199
 200 ENOSPC
 201 ------
 202
 203 The way balance operates, it usually needs to temporarily create a new block
 204 group and move the old data there, before the old block group can be removed.
 205 For that it needs work space, otherwise it fails with ENOSPC.
 206 This is not the same ENOSPC as when the free space is exhausted. It refers to
 207 space at the level of block groups, which are larger parts of the filesystem
 208 that contain many file extents.
 209
 210 The free work space can be calculated from the output of the *btrfs filesystem show*
 211 command:
 212
 213 ------------------------------
 214    Label: 'BTRFS'  uuid: 8a9d72cd-ead3-469d-b371-9c7203276265
 215            Total devices 2 FS bytes used 77.03GiB
 216            devid    1 size 53.90GiB used 51.90GiB path /dev/sdc2
 217            devid    2 size 53.90GiB used 51.90GiB path /dev/sde1
 218 ------------------------------
 219
 220 'size' - 'used' = 'free work space' +
 221 '53.90GiB' - '51.90GiB' = '2.00GiB'
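
The same subtraction can be scripted. The sketch below is one possible way to
do it with awk, assuming that every 'devid' line reports both 'size' and 'used'
in GiB as in the sample above; the function name `free_work_space` is
hypothetical and other units are not handled.

```shell
# Print the free work space (size - used) for each device.
# Assumes both values carry a GiB suffix, as in the sample output.
free_work_space() {
    awk '$1 == "devid" {
        size = $4; used = $6
        sub(/GiB$/, "", size)
        sub(/GiB$/, "", used)
        printf "devid %s: %.2fGiB\n", $2, size - used
    }'
}

# Fed with the sample output here; on a live system the input would come from
# the btrfs filesystem show command instead.
free_work_space <<'EOF'
Label: 'BTRFS'  uuid: 8a9d72cd-ead3-469d-b371-9c7203276265
        Total devices 2 FS bytes used 77.03GiB
        devid    1 size 53.90GiB used 51.90GiB path /dev/sdc2
        devid    2 size 53.90GiB used 51.90GiB path /dev/sde1
EOF
# prints:
# devid 1: 2.00GiB
# devid 2: 2.00GiB
```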
 222
 223 An example of a filter that does not require workspace is 'usage=0'. This will
 224 scan through all unused block groups of a given type and will reclaim the
 225 space. After that it might be possible to run other filters.
 226
 227 **CONVERSIONS ON MULTIPLE DEVICES**
 228
 229 Conversion to profiles based on striping (RAID0, RAID5/6) requires work
 230 space on each device. An interrupted balance may leave partially filled block
 231 groups that consume the work space.
 232
 233 EXAMPLES
 234 --------
 235
 236 A more comprehensive example when going from one to multiple devices, and back,
 237 can be found in section 'TYPICAL USECASES' of `btrfs-device`(8).
 238
 239 MAKING BLOCK GROUP LAYOUT MORE COMPACT
 240 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 241
 242 The layout of block groups is not normally visible; most tools report only
 243 summarized numbers of free or used space, but there are still some hints
 244 provided.
 245
 246 Let's use the following real life example and start with the output:
 247
 248 --------------------
 249 $ btrfs filesystem df /path
 250 Data, single: total=75.81GiB, used=64.44GiB
 251 System, RAID1: total=32.00MiB, used=20.00KiB
 252 Metadata, RAID1: total=15.87GiB, used=8.84GiB
 253 GlobalReserve, single: total=512.00MiB, used=0.00B
 254 --------------------
 255
 256 Roughly calculating for data, '75G - 64G = 11G', the used/total ratio is
 257 about '85%'. How can we interpret that:
 258
 259 * chunks are 85% full on average, ie. the 'usage' filter with anything
 260   smaller than 85 will likely not affect anything
 261 * in a more realistic scenario, the space is distributed unevenly, we can
 262   assume there are completely used chunks and the remaining are partially filled
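
The used/total ratio above can also be computed from the *btrfs filesystem df*
output. This is a rough sketch under the assumption that the output has the
shape shown in this section; the function name `usage_ratio` is hypothetical
and only the B, KiB, MiB, GiB and TiB suffixes are recognized.

```shell
# Print the used/total percentage for every block group type.
usage_ratio() {
    awk -F'[ ,:=]+' '
    # Convert a value such as "75.81GiB" to bytes.
    function bytes(s,    n, u) {
        n = s + 0                 # leading numeric part
        u = s
        sub(/^[0-9.]+/, "", u)    # remaining unit suffix
        if (u == "KiB") return n * 1024
        if (u == "MiB") return n * 1024 ^ 2
        if (u == "GiB") return n * 1024 ^ 3
        if (u == "TiB") return n * 1024 ^ 4
        return n                  # plain bytes ("B")
    }
    $3 == "total" && $5 == "used" {
        total = bytes($4)
        used = bytes($6)
        if (total > 0)
            printf "%s: %.0f%%\n", $1, 100 * used / total
    }'
}

usage_ratio <<'EOF'
Data, single: total=75.81GiB, used=64.44GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=15.87GiB, used=8.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
EOF
# the first line printed is: Data: 85%
```

The result is only an average over all block groups of a type, so it serves as
a starting point for choosing a 'usage' filter value rather than as an exact
prediction of how many chunks will be relocated.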
 263
 264 Compacting the layout could be used on both. In the former case it would spread
 265 the data of a given chunk to the others and remove it. Here we can estimate that
 266 roughly 850 MiB of data have to be moved (85% of a 1 GiB chunk).
 267
 268 In the latter case, targeting the partially used chunks will have to move less
 269 data and thus will be faster. A typical filter command would look like:
 270
 271 --------------------
 272 # btrfs balance start -dusage=50 /path
 273 Done, had to relocate 2 out of 97 chunks
 274
 275 $ btrfs filesystem df /path
 276 Data, single: total=74.03GiB, used=64.43GiB
 277 System, RAID1: total=32.00MiB, used=20.00KiB
 278 Metadata, RAID1: total=15.87GiB, used=8.84GiB
 279 GlobalReserve, single: total=512.00MiB, used=0.00B
 280 --------------------
 281
 282 As you can see, the 'total' amount of data decreased by less than 2 GiB
 283 ('75.81GiB - 74.03GiB = 1.78GiB'), which is an expected result. Let's see what
 284 will happen when we increase the estimated usage filter.
 285
 286 --------------------
 287 # btrfs balance start -dusage=85 /path
 288 Done, had to relocate 13 out of 95 chunks
 289
 290 $ btrfs filesystem df /path
 291 Data, single: total=68.03GiB, used=64.43GiB
 292 System, RAID1: total=32.00MiB, used=20.00KiB
 293 Metadata, RAID1: total=15.87GiB, used=8.85GiB
 294 GlobalReserve, single: total=512.00MiB, used=0.00B
 295 --------------------
 296
 297 Now the used/total ratio is about 94% and we moved about '74G - 68G = 6G' of
 298 data to the remaining block groups, ie. the 6GiB are now free of filesystem
 299 structures, and can be reused for new data or metadata block groups.
 300
 301 We can do a similar exercise with the metadata block groups, but this should
 302 not typically be necessary, unless the used/total ratio is really off. Here
 303 the ratio is roughly 50% but the difference as an absolute number is "a few
 304 gigabytes", which can be considered normal for a workload with snapshots or
 305 reflinks updated frequently.
 306
 307 --------------------
 308 # btrfs balance start -musage=50 /path
 309 Done, had to relocate 4 out of 89 chunks
 310
 311 $ btrfs filesystem df /path
 312 Data, single: total=68.03GiB, used=64.43GiB
 313 System, RAID1: total=32.00MiB, used=20.00KiB
 314 Metadata, RAID1: total=14.87GiB, used=8.85GiB
 315 GlobalReserve, single: total=512.00MiB, used=0.00B
 316 --------------------
 317
 318 Just a 1 GiB decrease, which possibly means there are block groups with good
 319 utilization. Making the metadata layout more compact would in turn require
 320 updating more metadata structures, ie. lots of IO. As running out of metadata
 321 space is a more severe problem, it's not necessary to keep the utilization
 322 ratio too high. For the purpose of this example, let's see the effects of
 323 further compaction:
 324
 325 --------------------
 326 # btrfs balance start -musage=70 /path
 327 Done, had to relocate 13 out of 88 chunks
 328
 329 $ btrfs filesystem df /path
 330 Data, single: total=68.03GiB, used=64.43GiB
 331 System, RAID1: total=32.00MiB, used=20.00KiB
 332 Metadata, RAID1: total=11.97GiB, used=8.83GiB
 333 GlobalReserve, single: total=512.00MiB, used=0.00B
 334 --------------------
 335
 336 GETTING RID OF COMPLETELY UNUSED BLOCK GROUPS
 337 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 338
 339 Normally the balance operation needs work space, to temporarily move the
 340 data before the old block groups get removed. If there's no work space, it
 341 ends with 'no space left'.
 342
 343 There's a special case when the block groups are completely unused, possibly
 344 left after removing lots of files or deleting snapshots. Removing empty block
 345 groups is automatic since kernel 3.18. The same can be achieved manually and,
 346 unlike other balance operations, this does not require work space. Thus it
 347 can be used to reclaim unused block groups and make their space available.
 348
 349 --------------------
 350 # btrfs balance start -dusage=0 /path
 351 --------------------
 352
 353 This should lead to a decrease in the 'total' numbers in the *btrfs filesystem df* output.
 354
 355 EXIT STATUS
 356 -----------
 357 *btrfs balance* returns a zero exit status if it succeeds. A non-zero status is
 358 returned in case of failure.
 359
 360 AVAILABILITY
 361 ------------
 362 *btrfs* is part of btrfs-progs.
 363 Please refer to the btrfs wiki http://btrfs.wiki.kernel.org for
 364 further details.
 365
 366 SEE ALSO
 367 --------
 368 `mkfs.btrfs`(8),
 369 `btrfs-device`(8)