btrfs-progs: dump-super: changes in options to specify superblocks

diff --git a/Documentation/btrfs-man5.asciidoc b/Documentation/btrfs-man5.asciidoc

index 6580866..a1f364e 100644
--- a/Documentation/btrfs-man5.asciidoc
+++ b/Documentation/btrfs-man5.asciidoc
@@ -12,7 +12,12 @@ tools.  Currently covers:
  
  1. mount options
  
-2. file attributes
+2. filesystem features
+
+3. file attributes
+
+4. control device
+
  
  MOUNT OPTIONS
  -------------
@@ -27,6 +32,9 @@ options please refer to `mount`(8) manpage. The options are sorted alphabeticall
  +
  Enable/disable support for Posix Access Control Lists (ACLs).  See the
  `acl`(5) manual page for more information about ACLs.
++
+The support for ACL is build-time configurable (BTRFS_FS_POSIX_ACL) and
+mount fails if 'acl' is requested but the feature is not compiled in.
  
  *alloc_start='bytes'*::
  (default: 1M, minimum: 1M)
@@ -69,7 +77,7 @@ synchronize all pending data and ordinary metadata blocks, then writes the
  superblock and issues another flush.
  +
  The write flushes incur a slight hit and also prevent the IO block
-scheduler to reorder requests in more effective way. Disabling barriers gets
+scheduler to reorder requests in a more effective way. Disabling barriers gets
  rid of that penalty but will most certainly lead to a corrupted filesystem in
  case of a crash or power loss. The ordinary metadata blocks could be yet
  unwritten at the time the new superblock is stored permanently, expecting that
@@ -121,8 +129,11 @@ but a warning is printed if it's more than 300 seconds (5 minutes).
  +
  Control BTRFS file data compression.  Type may be specified as 'zlib',
  'lzo' or 'no' (for no compression, used for remounting).  If no type
-is specified, 'zlib' is used.  If compress-force is specified,
-all files will be compressed, whether or not they compress well.
+is specified, 'zlib' is used.  If 'compress-force' is specified,
+all files will be compressed, whether or not they compress well. Otherwise
+some simple heuristics are applied to detect an incompressible file. If the
+first blocks written to a file are not compressible, the whole file is
+permanently marked to skip compression.
  +
  NOTE: If compression is enabled, 'nodatacow' and 'nodatasum' are disabled.
  
@@ -133,6 +144,8 @@ NOTE: If compression is enabled, 'nodatacow' and 'nodatasum' are disabled.
  Enable data copy-on-write for newly created files.
  'Nodatacow' implies 'nodatasum', and disables 'compression'. All files created
  under 'nodatacow' are also set the NOCOW file attribute (see `chattr`(1)).
++
+NOTE: If 'nodatacow' or 'nodatasum' are enabled, compression is disabled.
  
  *datasum*::
  *nodatasum*::
@@ -142,6 +155,8 @@ Enable data checksumming for newly created files.
  'Datasum' implies 'datacow', ie. the normal mode of operation. All files created
  under 'nodatasum' inherit the "no checksums" property, however there's no
  corresponding file attribute (see `chattr`(1)).
++
+NOTE: If 'nodatacow' or 'nodatasum' are enabled, compression is disabled.
  
  *degraded*::
  (default: off)
@@ -171,15 +186,17 @@ underlying device, the operation may severely hurt performance in case the TRIM
  operation is synchronous (eg. with SATA devices up to revision 3.0).
  +
  If discarding is not necessary to be done at the block freeing time, there's
-*fstrim* tool that lets the filesystem discard all free blocks in a batch,
-possibly not much interfering with other operations.
+`fstrim` tool that lets the filesystem discard all free blocks in a batch,
+possibly not much interfering with other operations. Also, the device may
+ignore the TRIM command if the range is too small, so running the batch discard
+can actually discard the blocks.
  
  *enospc_debug*::
  *noenospc_debug*::
  (default: off)
  +
  Enable verbose output for some ENOSPC conditions. It's safe to use but can
-be noisy if the system hits reaches near-full state.
+be noisy if the system reaches near-full state.
  
  *fatal_errors='action'*::
  (since: 3.4, default: bug)
@@ -231,20 +248,31 @@ the option.
  NOTE: Defaults to off due to a potential overflow problem when the free space
  checksums don't fit inside a single page.
  
+*logreplay*::
+*nologreplay*::
+(default: on, even read-only)
++
+Enable/disable log replay at mount time. See also 'treelog'.
++
+WARNING: currently, the tree log is replayed even with a read-only mount! To
+disable that behaviour, mount also with 'nologreplay'.
+
  *max_inline='bytes'*::
-(default: min(8192, page size) )
+(default: min(2048, page size) )
  +
  Specify the maximum amount of space, in bytes, that can be inlined in
  a metadata B-tree leaf.  The value is specified in bytes, optionally
  with a K suffix (case insensitive).  In practice, this value
  is limited by the filesystem block size (named 'sectorsize' at mkfs time),
  and memory page size of the system. In case of sectorsize limit, there's
-some space unavailable due to leaf headers.  For example, a 4k sectorsize, max
-inline data is ~3900 bytes.
+some space unavailable due to leaf headers.  For example, with a 4k sectorsize,
+the maximum size of inline data is about 3900 bytes.
  +
-Inlining can be completely turned off specifying 0. This will increase data
+Inlining can be completely turned off by specifying 0. This will increase data
  block slack if file sizes are much smaller than block size but will reduce
  metadata consumption in return.
++
+NOTE: the default value has changed to 2048 in kernel 4.6.
  
  *metadata_ratio='value'*::
  (default: 0, internal logic)
@@ -252,17 +280,27 @@ metadata consumption in return.
  Specifies that 1 metadata chunk should be allocated after every 'value' data
  chunks. Default behaviour depends on internal logic, some percent of unused
  metadata space is attempted to be maintained but is not always possible if
-there's not space left for chunk allocation. The option could be useful to
+there's not enough space left for chunk allocation. The option could be useful to
  override the internal logic in favor of the metadata allocation if the expected
  workload is supposed to be metadata intense (snapshots, reflinks, xattrs,
  inlined files).
  
  *recovery*::
-(since: 3.2, default: off)
+(since: 3.2, default: off, deprecated since: 4.5)
  +
-Enable autorecovery attempts if a bad tree root is found at mount time.
-Currently this scans a backup list of several previous tree roots and tries to
-use the first readable. This can be used with read-only mounts as well.
+NOTE: this option has been replaced by 'usebackuproot' and should not be used
+but will work on 4.5+ kernels.
+
+*norecovery*::
+(since: 4.5, default: off)
++
+Do not attempt any data recovery at mount time. This will disable 'logreplay'
+and avoid other write operations.
++
+NOTE: The opposite option 'recovery' used to have different meaning but was
+changed for consistency with other filesystems, where 'norecovery' is used for
+skipping log replay. BTRFS does the same and in general will try to avoid any
+write operations.
  
  *rescan_uuid_tree*::
  (since: 3.12, default: off)
@@ -275,16 +313,28 @@ normally be needed.
  +
  Skip automatic resume of interrupted balance operation after mount.
  May be resumed with *btrfs balance resume* or the paused state can be removed
-by *btrfs balance cancel*.
+by *btrfs balance cancel*. The default behaviour is to resume an interrupted balance.
  
  *space_cache*::
+*space_cache=v2*::
  *nospace_cache*::
-('nospace_cache' since: 3.2, default: on)
+('nospace_cache' since: 3.2, 'space_cache=v2' since: 4.5, default: on)
++
+Options to control the free space cache.  This affects performance as searching
+for new free blocks could take longer if the space cache is not enabled. On the
+other hand, managing the space cache consumes some resources.  It can be
+disabled without clearing at mount time.
++
+There are two implementations of how the space is tracked. The safe default is
+'v1'.  On large filesystems (many-terabytes) and certain workloads the 'v1'
+performance may degrade.  This problem is addressed by 'v2', which is based on
+b-trees, sometimes referred to as 'free-space-tree'.
++
+'Compatibility notes:'
  +
-Disable freespace cache loading without clearing the cache and the free space
-cache will not be used during the mount. This affects performance as searching
-for new free blocks could take longer. On the other hand, managing the space
-cache consumes some resources.
+* the 'v2' has to be enabled manually at mount time, once
+* kernel without 'v2' support will be able to mount the filesystem in read-only mode
+* 'v2' can be removed by mounting with 'clear_cache'
  
  *ssd*::
  *nossd*::
@@ -293,7 +343,8 @@ cache consumes some resources.
  +
  Options to control SSD allocation schemes.  By default, BTRFS will
  enable or disable SSD allocation heuristics depending on whether a
-rotational or nonrotational disk is in use.  The 'ssd' and 'nossd' options
+rotational or non-rotational disk is in use (contents of
+'/sys/block/DEV/queue/rotational').  The 'ssd' and 'nossd' options
  can override this autodetection.
  +
  The 'ssd_spread' mount option attempts to allocate into bigger and aligned
@@ -309,6 +360,9 @@ This mount option overrides the default subvolume set for the given filesystem.
  Mount subvolume specified by a 'subvolid' number rather than the toplevel
  subvolume.  You can use *btrfs subvolume list* to see subvolume ID numbers.
  This mount option overrides the default subvolume set for the given filesystem.
++
+NOTE: if both 'subvolid' and 'subvol' are specified, they must point at the
+same subvolume, otherwise mount will fail.
  
  *subvolrootid='objectid'*::
  (irrelevant since: 3.2, formally deprecated since: 3.10)
@@ -321,9 +375,9 @@ subvolume that did not reside directly under the toplevel subvolume.
  +
  The number of worker threads to allocate. NRCPUS is number of on-line CPUs
  detected at the time of mount. Small number leads to less parallelism in
-processing data and metadata, higher numbers could lead to a performance due to
-increased locking contention, cache-line bouncing or costly data transfers
-between local CPU memories.
+processing data and metadata, higher numbers could lead to a performance hit
+due to increased locking contention, cache-line bouncing or costly data
+transfers between local CPU memories.
  
  *treelog*::
  *notreelog*::
@@ -334,10 +388,20 @@ stores changes without the need of a full filesystem sync. The log operations
  are flushed at sync and transaction commit. If the system crashes between two
  such syncs, the pending tree log operations are replayed during mount.
  +
-WARNING: currently, the tree log is replayed even with a read-only mount!
+WARNING: currently, the tree log is replayed even with a read-only mount! To
+disable that behaviour, mount also with 'nologreplay'.
  +
  The tree log could contain new files/directories, these would not exist on
-a mounted filesystm if the log is not replayed.
+a mounted filesystem if the log is not replayed.
+
+*usebackuproot*::
+*nousebackuproot*::
++
+Enable autorecovery attempts if a bad tree root is found at mount time.
+Currently this scans a backup list of several previous tree roots and tries to
+use the first readable. This can be used with read-only mounts as well.
++
+NOTE: This option has replaced 'recovery'.
  
  *user_subvol_rm_allowed*::
  (default: off)
@@ -345,6 +409,91 @@ a mounted filesystm if the log is not replayed.
  Allow subvolumes to be deleted by their respective owner. Otherwise, only the
  root user can do that.
  
+FILESYSTEM FEATURES
+-------------------
+
+The basic set of filesystem features gets extended over time. The backward
+compatibility is maintained and the features are optional: they need to be
+explicitly asked for, so accidental use will not create incompatibilities.
+
+There are several classes and the respective tools to manage the features:
+
+at mkfs time only::
+This is namely for core structures, like the b-tree nodesize, see
+`mkfs.btrfs`(8) for more details.
+
+after mkfs, on an unmounted filesystem::
+Features that may optimize internal structures or add new structures to support
+new functionality, see `btrfstune`(8). The command *btrfs inspect-internal
+dump-super device* will dump a superblock; you can map the value of
+'incompat_flags' to the features listed below.
+
+after mkfs, on a mounted filesystem::
+The features of a filesystem (with a given UUID) are listed in
+`/sys/fs/btrfs/UUID/features/`, one file per feature. The status is stored
+inside the file. The value '1' is for enabled, '0' means the feature had
+been enabled at mount time and turned off afterwards.
++
+Whether a particular feature can be enabled on a mounted filesystem can be found
+in the directory `/sys/fs/btrfs/features/`, one file per feature. The value '1'
+means the feature can be enabled.
+
+List of features (see also `mkfs.btrfs`(8) section 'FILESYSTEM FEATURES'):
+
+big_metadata::
+(since: 3.4)
++
+the filesystem uses 'nodesize' bigger than the page size
+compress_lzo::
+(since: 2.6.38)
++
+the 'lzo' compression has been used on the filesystem, either as a mount option
+or via *btrfs filesystem defrag*.
+
+default_subvol::
+(since: 2.6.34)
++
+the default subvolume has been set on the filesystem
+
+extended_iref::
+(since: 3.7)
++
+increased hardlink limit per file in a directory to 65536, older kernels
+supported a varying number of hardlinks depending on the sum of all file name
+sizes that can be stored into one metadata block
+
+mixed_backref::
+(since: 2.6.31)
++
+the last major disk format change, improved backreferences
+
+mixed_groups::
+(since: 2.6.37)
++
+mixed data and metadata block groups, ie. the data and metadata are not
+separated and occupy the same block groups, this mode is suitable for small
+volumes as there are no constraints how the remaining space should be used
+(compared to the split mode, where empty metadata space cannot be used for data
+and vice versa)
++
+on the other hand, the final layout is quite unpredictable and possibly highly
+fragmented, which means worse performance
+
+no_holes::
+(since: 3.14)
+improved representation of file extents where holes are not explicitly
+stored as an extent, saves a few percent of metadata if sparse files are used
+
+raid56::
+(since: 3.9)
++
+the filesystem contains or contained a raid56 profile of block groups
+
+skinny_metadata::
+(since: 3.10)
++
+reduced-size metadata for extent references, saves a few percent of metadata
+
  FILE ATTRIBUTES
  ---------------
  The btrfs filesystem supports setting the following file attributes using the
@@ -396,11 +545,43 @@ When set on a directory, all newly created files will inherit this attribute.
  No other attributes are supported.  For the complete list please refer to the
  `chattr`(1) manual page.
  
+CONTROL DEVICE
+--------------
+
+There's a character special device `/dev/btrfs-control` with major and minor
+numbers 10 and 234 (the device can be found under the 'misc' category).
+
+--------------------
+$ ls -l /dev/btrfs-control
+crw------- 1 root root 10, 234 Jan  1 12:00 /dev/btrfs-control
+--------------------
+
+The device accepts some ioctl calls that can perform the following actions on the
+filesystem module:
+
+* scan devices for btrfs filesystem (ie. to let multi-device filesystems mount
+  automatically) and register them with the kernel module
+* similar to scan, but also wait until the device scanning process is finished
+  for a given filesystem
+* get the supported features (can be also found under '/sys/fs/btrfs/features')
+
+
+The device is usually created by ..., but can be created manually:
+
+--------------------
+# mknod --mode=600 /dev/btrfs-control c 10 234
+--------------------
+
+The device is not strictly required but the device scanning will not work and a
+workaround would need to be used to mount a multi-device filesystem. The mount
+option 'device' can trigger the device scanning during mount.
+
  SEE ALSO
  --------
  `acl`(5),
  `btrfs`(8),
  `chattr`(1),
  `fstrim`(8),
+`ioctl`(2),
  `mkfs.btrfs`(8),
  `mount`(8)
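
Note on the new FILESYSTEM FEATURES section: the patch says the 'incompat_flags' value from *btrfs inspect-internal dump-super* can be mapped to the listed features. A minimal sketch of that mapping follows. The helper name `decode_incompat_flags` is hypothetical and not part of btrfs-progs. The bit positions are an assumption taken from the kernel's BTRFS_FEATURE_INCOMPAT_* definitions and should be checked against the headers of the kernel in use.

```python
# Hypothetical helper, not part of btrfs-progs: decode the 'incompat_flags'
# value printed by "btrfs inspect-internal dump-super".
# Assumption: bit positions follow the kernel's BTRFS_FEATURE_INCOMPAT_*
# macros; verify against your kernel headers before relying on them.
INCOMPAT_BITS = {
    0: "mixed_backref",
    1: "default_subvol",
    2: "mixed_groups",
    3: "compress_lzo",
    5: "big_metadata",
    6: "extended_iref",
    7: "raid56",
    8: "skinny_metadata",
    9: "no_holes",
}


def decode_incompat_flags(flags):
    """Return (names, unknown): known feature names set in flags, in bit
    order, and a mask of the remaining bits this table does not cover."""
    names = [name for bit, name in sorted(INCOMPAT_BITS.items())
             if flags & (1 << bit)]
    known_mask = sum(1 << bit for bit in INCOMPAT_BITS)
    return names, flags & ~known_mask


if __name__ == "__main__":
    # 0x161 is only an illustrative value, not output from a real filesystem.
    names, unknown = decode_incompat_flags(0x161)
    print(", ".join(names), "| unknown bits:", hex(unknown))
```

Bits that the table does not know are returned separately instead of being dropped, so a flag added by a newer kernel shows up as a nonzero 'unknown' mask.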