btrfs-balance(8)
================

NAME
----
btrfs-balance - balance block groups on a btrfs filesystem

SYNOPSIS
--------
*btrfs balance* <subcommand> <args>

DESCRIPTION
-----------
The primary purpose of the balance feature is to spread block groups across
all devices so they match constraints defined by the respective profiles. See
`mkfs.btrfs`(8) section 'PROFILES' for more details.
The scope of the balancing process can be further tuned by use of filters that
can select the block groups to process. Balance works only on a mounted
filesystem.

The balance operation is cancellable by the user. The on-disk state of the
filesystem is always consistent so an unexpected interruption (e.g. system crash,
reboot) does not corrupt the filesystem. The progress of the balance operation
is temporarily stored and will be resumed upon mount, unless the mount option
'skip_balance' is specified.

WARNING: running balance without filters will take a lot of time as it basically
rewrites the entire filesystem and needs to update all block pointers.

The filters can be used to perform the following actions:

- convert block group profiles (filter 'convert')
- make block group usage more compact (filter 'usage')
- perform actions only on a given device (filters 'devid', 'drange')

The filters can be applied to a combination of block group types (data,
metadata, system). Note that changing 'system' needs the force option ('-f').

NOTE: the balance operation needs enough work space, i.e. space that is
completely unused in the filesystem, otherwise this may lead to ENOSPC reports.
See the section 'ENOSPC' for more details.

COMPATIBILITY
-------------

NOTE: The balance subcommand also exists under the *btrfs filesystem*
namespace. This still works for backward compatibility but is deprecated and
should not be used anymore.

NOTE: A short syntax *btrfs balance <path>* works due to backward compatibility
but is deprecated and should not be used anymore. Use *btrfs balance start*
command instead.

PERFORMANCE IMPLICATIONS
------------------------

The balance operation is intensive, primarily in terms of IO, but it can also
be CPU intensive. It affects other activity on the filesystem. Typically a lot
of data is copied from one location to another, and a lot of metadata gets
updated.

Depending on the actual block group layout, it can also be seek-heavy. The
performance on rotational devices is noticeably worse than on SSDs or fast
arrays.

SUBCOMMAND
----------
*cancel* <path>::
cancel a running or paused balance; the command will block and wait until the
block group currently being processed is finished

*pause* <path>::
pause a running balance operation; this will store the state of the balance
progress and the used filters to the filesystem

*resume* <path>::
resume an interrupted balance; the balance status must be stored on the filesystem
from a previous run, e.g. after it was forcibly interrupted and mounted again with
'skip_balance'

*start* [options] <path>::
start the balance operation according to the specified filters; without
filters, the entire filesystem is rewritten. The process runs in the foreground.
+
NOTE: the balance command without filters will basically rewrite everything
in the filesystem. The run time is potentially very long, depending on the
filesystem size. To prevent starting a full balance by accident, the user is
warned and has a few seconds to cancel the operation before it starts. The
warning and delay can be skipped with the '--full-balance' option.
+
Please note that the filters must be written together with the '-d', '-m' and
'-s' options, with no space in between, because the filters are optional and a
bare '-d' etc. also works and means no filters.
+
`Options`
+
-d[<filters>]::::
act on data block groups, see `FILTERS` section for details about 'filters'
-m[<filters>]::::
act on metadata chunks, see `FILTERS` section for details about 'filters'
-s[<filters>]::::
act on system chunks (requires '-f'), see `FILTERS` section for details about 'filters'
-v::::
be verbose and print balance filter arguments
-f::::
force reducing of metadata integrity, e.g. when going from 'raid1' to 'single'
--background|--bg::::
run the balance operation asynchronously in the background, uses `fork`(2) to
start the process that calls the kernel ioctl

*status* [-v] <path>::
Show status of running or paused balance.
+
If the '-v' option is given, the output will be verbose.
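
As an illustration of how the subcommands fit together, the following sketch
wraps a typical sequence in shell functions. The function names and the
'usage=50' threshold are assumptions made for this example and are not part of
btrfs-progs; only the *btrfs balance* invocations themselves are taken from the
descriptions above.

```shell
# Start a filtered balance in the background and print its status.
# "$1" is the mount point of the filesystem.
balance_start_bg() {
    btrfs balance start --bg -dusage=50 "$1" &&
    btrfs balance status -v "$1"
}

# Pause a running balance; its state and filters are stored on the
# filesystem so that it can be continued later.
balance_pause() {
    btrfs balance pause "$1"
}

# Continue a balance that was paused or interrupted, e.g. after the
# filesystem was mounted with the skip_balance option.
balance_resume() {
    btrfs balance resume "$1"
}
```

A possible session would call 'balance_start_bg /path', later
'balance_pause /path', and finally 'balance_resume /path'.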

FILTERS
-------
From kernel 3.3 onwards, btrfs balance can limit its action to a subset of the
whole filesystem, and can be used to change the replication configuration (e.g.
moving data from single to RAID1). This functionality is accessed through the
'-d', '-m' or '-s' options to btrfs balance start, which filter on data,
metadata and system blocks respectively.

A filter has the following structure: 'type'[='params'][,'type'=...]
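
To make the structure concrete, here is a minimal sketch of a helper that joins
several filters with commas and attaches them to a block group type option.
The helper name 'balance_arg' is hypothetical; the filter names are the ones
documented in this section.

```shell
# Join filter expressions with commas and attach them to a block group
# type option (-d, -m or -s). No space may separate option and filters.
# Usage: balance_arg <option> [<filter>...]
# The body runs in a subshell so the IFS change stays local.
balance_arg() (
    opt=$1
    shift
    IFS=,
    printf '%s\n' "$opt$*"
)

# Quote the pipe in a profile list so the shell does not treat it as a
# pipeline operator.
balance_arg -d usage=50 limit=10         # prints: -dusage=50,limit=10
balance_arg -m 'profiles=raid1|single'   # prints: -mprofiles=raid1|single
```

The printed argument can then be passed to *btrfs balance start*, e.g.
'btrfs balance start -dusage=50,limit=10 /path'.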

The available types are:

*profiles=<profiles>*::
Balances only block groups with the given profiles. Parameters
are a list of profile names separated by "'|'" (pipe).

*usage=<percent>*::
*usage=<range>*::
Balances only block groups with usage under the given percentage. The
value of 0 is allowed and will clean up completely unused block groups; this
should not require allocating any new work space. You may want to use 'usage=0'
in case balance is returning ENOSPC and your filesystem is not too full.
+
The argument may be a single value or a range. The single value 'N' means 'at
most N percent used', equivalent to '..N' range syntax. Kernels prior to 4.4
accept only the single value format.
The minimum range boundary is inclusive, maximum is exclusive.

*devid=<id>*::
Balances only block groups which have at least one chunk on the given
device. To list devices with ids use *btrfs fi show*.

*drange=<range>*::
Balance only block groups which overlap with the given byte range on any
device. Use in conjunction with 'devid' to filter on a specific device. The
parameter is a range specified as 'start..end'.

*vrange=<range>*::
Balance only block groups which overlap with the given byte range in the
filesystem's internal virtual address space. This is the address space that
most reports from btrfs in the kernel log use. The parameter is a range
specified as 'start..end'.

*convert=<profile>*::
Convert each selected block group to the given profile name identified by
parameters.
+
NOTE: starting with kernel 4.5, the 'data' chunks can be converted to/from the
'DUP' profile on a single device.
+
NOTE: starting with kernel 4.6, all profiles can be converted to/from 'DUP' on
multi-device filesystems.

*limit=<number>*::
*limit=<range>*::
Process only the given number of chunks, after all filters are applied. This can be
used to specifically target a chunk in connection with other filters ('drange',
'vrange') or just simply limit the amount of work done by a single balance run.
+
The argument may be a single value or a range. The single value 'N' means 'at
most N chunks', equivalent to '..N' range syntax. Kernels prior to 4.4 accept
only the single value format. The range minimum and maximum are inclusive.

*stripes=<range>*::
Balance only block groups which have the given number of stripes. The parameter
is a range specified as 'start..end'. Makes sense for block group profiles that
utilize striping, i.e. RAID0/10/5/6. The range minimum and maximum are
inclusive.

*soft*::
Takes no parameters. Only has meaning when converting between profiles.
When doing convert from one profile to another and soft mode is on,
chunks that already have the target profile are left untouched.
This is useful e.g. when half of the filesystem was converted earlier but got
cancelled.
+
The soft mode switch is (like every other filter) per-type.
For example, this means that we can convert metadata chunks the "hard" way
while converting data chunks selectively with soft switch.

Profile names, used in 'profiles' and 'convert' are one of: 'raid0', 'raid1',
'raid10', 'raid5', 'raid6', 'dup', 'single'. The mixed data/metadata profiles
can be converted in the same way, but conversion between mixed and non-mixed
is not implemented. For the constraints of the profiles please refer to `mkfs.btrfs`(8),
section 'PROFILES'.
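
The 'convert', 'soft' and 'profiles' filters can be combined as in the
following sketch. The function names are made up for this example, and it
assumes a filesystem whose devices satisfy the constraints of the 'raid1'
profile.

```shell
# Convert data and metadata to raid1, skipping chunks that already have
# the target profile (soft). "$1" is the mount point.
convert_to_raid1() {
    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft "$1"
}

# Restrict a run to data block groups that currently use the single
# profile and convert only those.
convert_single_data_to_raid1() {
    btrfs balance start -dprofiles=single,convert=raid1 "$1"
}
```

Because 'soft' is given per type, dropping it from only the '-m' argument
would rewrite all metadata chunks while still skipping already converted data
chunks.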

ENOSPC
------

The way balance operates, it usually needs to temporarily create a new block
group and move the old data there. For that it needs work space, otherwise
it fails for ENOSPC reasons.
This is not the same ENOSPC as when the free space is exhausted. It refers to
the space on the level of block groups.

The free work space can be calculated from the output of the *btrfs filesystem show*
command:

------------------------------
   Label: 'BTRFS'  uuid: 8a9d72cd-ead3-469d-b371-9c7203276265
           Total devices 2 FS bytes used 77.03GiB
           devid    1 size 53.90GiB used 51.90GiB path /dev/sdc2
           devid    2 size 53.90GiB used 51.90GiB path /dev/sde1
------------------------------

'size' - 'used' = 'free work space' +
'53.90GiB' - '51.90GiB' = '2.00GiB'
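
The same subtraction can be scripted. The sketch below is only an
illustration: the function name 'free_work_space' is hypothetical, and it
assumes that 'size' and 'used' of a device are printed in the same unit, as in
the output above.

```shell
# Print the free work space (size - used) of every device listed in the
# output of "btrfs filesystem show", read from standard input.
# Assumption: size and used of one device are printed in the same unit.
free_work_space() {
    awk '$1 == "devid" {
        size = $4; used = $6
        unit = size; gsub(/[0-9.]/, "", unit)
        printf "%s %.2f%s\n", $8, size - used, unit
    }'
}
```

It would be used as 'btrfs filesystem show /path | free_work_space' and prints
one line per device, for example '/dev/sdc2 2.00GiB' for the first device
above.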

An example of a filter that does not require workspace is 'usage=0'. This will
scan through all unused block groups of a given type and will reclaim the
space. After that it might be possible to run other filters.

**CONVERSIONS ON MULTIPLE DEVICES**

Conversion to profiles based on striping (RAID0, RAID5/6) requires work
space on each device. An interrupted balance may leave partially filled block
groups that might consume the work space.

EXAMPLES
--------

A more comprehensive example when going from one to multiple devices, and back,
can be found in section 'TYPICAL USECASES' of `btrfs-device`(8).

MAKING BLOCK GROUP LAYOUT MORE COMPACT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The layout of block groups is not normally visible; most tools report only
summarized numbers of free or used space, but there are still some hints
provided.

Let's use the following real life example and start with the output:

--------------------
$ btrfs fi df /path
Data, single: total=75.81GiB, used=64.44GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=15.87GiB, used=8.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
--------------------

Roughly calculating for data, '75G - 64G = 11G', the used/total ratio is
about '85%'. We can interpret that in two ways:

* chunks are filled by 85% on average, i.e. the 'usage' filter with anything
  smaller than 85 will likely not affect anything
* in a more realistic scenario, the space is distributed unevenly, we can
  assume there are completely used chunks and the remaining are partially filled

Compacting the layout could be used on both. In the former case it would spread
data of a given chunk to the others and remove it. Here we can estimate that
roughly 850 MiB of data have to be moved (85% of a 1 GiB chunk).

In the latter case, targeting the partially used chunks will have to move less
data and thus will be faster. A typical filter command would look like:

--------------------
# btrfs balance start -dusage=50 /path
Done, had to relocate 2 out of 97 chunks

$ btrfs fi df /path
Data, single: total=74.03GiB, used=64.43GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=15.87GiB, used=8.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
--------------------

As you can see, the 'total' amount of data is decreased by less than 2 GiB,
which is an expected result. Let's see what will happen when we increase the
estimated usage filter.

--------------------
# btrfs balance start -dusage=85 /path
Done, had to relocate 13 out of 95 chunks

$ btrfs fi df /path
Data, single: total=68.03GiB, used=64.43GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=15.87GiB, used=8.85GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
--------------------

Now the used/total ratio is about 94% and we moved about '74G - 68G = 6G' of
data to the remaining block groups, i.e. the 6GiB are now free of filesystem
structures, and can be reused for new data or metadata block groups.
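
The used/total ratios quoted in this example can be derived from the
*btrfs fi df* output with a short script. This is a sketch only: the function
name 'usage_ratio' is hypothetical, and the parsing assumes the line format
shown in the listings above.

```shell
# Print the used/total ratio of each block group type from the output of
# "btrfs filesystem df", read from standard input.
usage_ratio() {
    awk '
    # Convert a value such as "75.81GiB" to bytes.
    function to_bytes(s,    n, unit) {
        n = s + 0
        unit = s
        gsub(/[0-9.,]/, "", unit)
        if (unit == "KiB") return n * 1024
        if (unit == "MiB") return n * 1024 * 1024
        if (unit == "GiB") return n * 1024 * 1024 * 1024
        if (unit == "TiB") return n * 1024 * 1024 * 1024 * 1024
        return n
    }
    /total=/ && /used=/ {
        type = $1
        sub(/,$/, "", type)
        total = $3; used = $4
        sub(/^total=/, "", total)
        sub(/^used=/, "", used)
        t = to_bytes(total)
        if (t > 0)
            printf "%s %.1f%%\n", type, 100 * to_bytes(used) / t
    }'
}
```

Used as 'btrfs fi df /path | usage_ratio', it prints one line per block group
type. For the first listing in this section, the data line reads 'Data 85.0%'.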

We can do a similar exercise with the metadata block groups, but this should
not be typically necessary, unless the used/total ratio is really off. Here
the ratio is roughly 55% but the difference as an absolute number is "a few
gigabytes", which can be considered normal for a workload with snapshots or
reflinks updated frequently.

--------------------
# btrfs balance start -musage=50 /path
Done, had to relocate 4 out of 89 chunks

$ btrfs fi df /path
Data, single: total=68.03GiB, used=64.43GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=14.87GiB, used=8.85GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
--------------------

Just 1 GiB decrease, which possibly means there are block groups with good
utilization. Making the metadata layout more compact would in turn require
updating more metadata structures, i.e. lots of IO. As running out of metadata
space is a more severe problem, it's not necessary to keep the utilization
ratio too high. For the purpose of this example, let's see the effects of
further compaction:

--------------------
# btrfs balance start -musage=70 /path
Done, had to relocate 13 out of 88 chunks

$ btrfs fi df /path
Data, single: total=68.03GiB, used=64.43GiB
System, RAID1: total=32.00MiB, used=20.00KiB
Metadata, RAID1: total=11.97GiB, used=8.83GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
--------------------

GETTING RID OF COMPLETELY UNUSED BLOCK GROUPS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Normally the balance operation needs work space to temporarily move the
data before the old block groups get removed. If there's no work space, it
ends with 'no space left'.

There's a special case when the block groups are completely unused, possibly
left after removing lots of files or deleting snapshots. Removing empty block
groups is automatic since kernel 3.18. The same can be achieved manually, with
the notable exception that this operation does not require the work space. Thus
it can be used to reclaim unused block groups and make their space available.

--------------------
# btrfs balance start -dusage=0 /path
--------------------

This should lead to a decrease in the 'total' numbers in the *btrfs fi df* output.
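
Once the unused block groups are reclaimed, filters with higher thresholds may
become possible. A minimal sketch of such a stepwise run follows; the function
name and the list of thresholds are arbitrary choices for this example, not a
recommendation from btrfs-progs.

```shell
# Compact data block groups in several passes with a rising usage
# threshold, starting with usage=0, which needs no work space.
# The thresholds are arbitrary examples. "$1" is the mount point.
compact_stepwise() {
    for pct in 0 10 25 50; do
        btrfs balance start -dusage="$pct" "$1" || return 1
    done
}
```

The loop stops at the first pass that fails, so a later and more expensive
pass is not attempted after an error such as ENOSPC.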

EXIT STATUS
-----------
*btrfs balance* returns a zero exit status if it succeeds. A non-zero status is
returned in case of failure.
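
A script can branch on this status. The following sketch uses a hypothetical
function name and illustrative messages; only the zero versus non-zero
distinction is taken from the text above.

```shell
# Run a filtered balance and report the outcome based on the exit status.
# "$1" is the mount point; the messages are illustrative.
balance_checked() {
    if btrfs balance start -dusage=50 "$1"; then
        echo "balance finished"
    else
        status=$?
        echo "balance failed with exit status $status" >&2
        return "$status"
    fi
}
```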

AVAILABILITY
------------
*btrfs* is part of btrfs-progs.
Please refer to the btrfs wiki http://btrfs.wiki.kernel.org for
further details.

SEE ALSO
--------
`mkfs.btrfs`(8),
`btrfs-device`(8)
 369 `mkfs.btrfs`(8),
 370 `btrfs-device`(8)