platform/kernel/linux-starfive.git
Christoph Hellwig [Thu, 9 Nov 2017 17:11:41 +0000 (09:11 -0800)]
xfs: add some comments to xfs_iext_insert/xfs_iext_insert_node

Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 9 Nov 2017 17:11:41 +0000 (09:11 -0800)]
xfs: fix number of records handling in xfs_iext_split_leaf

Fix the code to check the correct value, and remove duplicated handling of
an uneven number of records in the split algorithm.

Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Tim Hansen [Wed, 8 Nov 2017 20:00:40 +0000 (12:00 -0800)]
fs/xfs: Remove NULL check before kmem_cache_destroy

kmem_cache_destroy already checks for null values.

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Wed, 8 Nov 2017 20:21:05 +0000 (12:21 -0800)]
xfs: only check da node header padding on v5 filesystems

It turns out that we only started zeroing a new da btree node's block
header on v5 filesystems.  Prior to that, we just wouldn't set anything
at all, which means that the pad field never got set and would retain
whatever happened to be in memory.

Therefore, we can only check the pad for zeroness on v5 filesystems.
shared/006 on a v4 filesystem exposes this scrub bug.
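A minimal sketch of the version-gated check, assuming a simplified, hypothetical `struct da_node_hdr` (the real da3 node header has more fields) and a plain `is_v5` flag in place of the superblock feature test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical da node header; only the pad field matters here. */
struct da_node_hdr {
	uint16_t count;
	uint16_t level;
	uint32_t pad;
};

bool da_node_hdr_pad_ok(const struct da_node_hdr *hdr, bool is_v5)
{
	/* Only v5 filesystems zero the header, so only there is pad meaningful. */
	if (!is_v5)
		return true;
	return hdr->pad == 0;
}
```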

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Mon, 6 Nov 2017 20:09:29 +0000 (12:09 -0800)]
xfs: fix btree scrub deref check

The btree scrubber has some custom code to retrieve and check a btree
block via xfs_btree_lookup_get_block.  This function either returns
an error code (verifiers failed) or leaves *pblock untouched (bad
pointer).  Since we previously set *pblock to NULL, we need to check
*pblock, not pblock, to trigger the early bailout.
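A userspace sketch of this bug class, assuming a hypothetical `lookup_get_block()` stand-in for xfs_btree_lookup_get_block() with the three outcomes described above. The names and the return convention (negative errno, 0 to bail out, 1 when a block is ready) are illustrative only:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct btree_block {
	int level;
};

/* Hypothetical outcomes of the lookup helper. */
enum lookup_outcome { LOOKUP_OK, LOOKUP_BAD_PTR, LOOKUP_VERIFY_FAIL };

/*
 * Stand-in for the real lookup: returns an error if the verifiers fail,
 * and returns 0 without touching *pblock when the pointer is bad.
 */
static int lookup_get_block(enum lookup_outcome what,
			    struct btree_block *storage,
			    struct btree_block **pblock)
{
	if (what == LOOKUP_VERIFY_FAIL)
		return -EIO;
	if (what == LOOKUP_OK)
		*pblock = storage;
	return 0;
}

/* Returns -errno on error, 0 to bail out early, 1 if *pblock is ready. */
int scrub_get_block(enum lookup_outcome what, struct btree_block *storage,
		    struct btree_block **pblock)
{
	int error;

	*pblock = NULL;
	error = lookup_get_block(what, storage, pblock);
	if (error)
		return error;
	/*
	 * "if (!pblock)" would test the caller's pointer-to-pointer, which
	 * is never NULL here, so a bad pointer would slip through.  Testing
	 * *pblock catches the NULL we stored above.
	 */
	if (!*pblock)
		return 0;
	return 1;
}
```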

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Mon, 6 Nov 2017 20:01:48 +0000 (12:01 -0800)]
xfs: fix uninitialized return values in scrub code

Fix smatch complaints about uninitialized return codes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Mon, 6 Nov 2017 19:46:15 +0000 (11:46 -0800)]
xfs: pass inode number to xfs_scrub_ino_set_{preen,warning}

There are two ways to scrub an inode: by calling xfs_iget and checking
the raw inode core, or by loading the inode cluster buffer and checking
the on-disk contents directly.  The second method is only useful if
_iget fails the verifiers; in that case sc->ip is NULL, and calling the
tracepoint will crash the system.

Therefore, pass the raw inode number directly into the _preen and
_warning functions.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Mon, 6 Nov 2017 19:37:46 +0000 (11:37 -0800)]
xfs: refactor the directory data block bestfree checks

In a directory data block, the zeroth bestfree item must point to the
longest free space.  Therefore, when we check the bestfree block's
records against the data blocks, we only need to compare with bf[0] and
don't need the loop.

The weird loop was most probably the result of an earlier refactoring
gone bad.
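A minimal sketch of why one comparison suffices, assuming a hypothetical three-entry bestfree array kept sorted by descending length (the names here are illustrative, not the scrub code's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BESTFREE_COUNT 3	/* assumption: three tracked free regions */

struct bestfree {
	uint16_t offset;
	uint16_t length;
};

/* The bestfree array must be sorted by descending length. */
bool bestfree_sorted(const struct bestfree bf[BESTFREE_COUNT])
{
	for (int i = 1; i < BESTFREE_COUNT; i++)
		if (bf[i].length > bf[i - 1].length)
			return false;
	return true;
}

/*
 * Because bf[0] is the longest free space, a "best free" value recorded
 * elsewhere for this data block only has to match bf[0]; no loop over
 * the other entries is needed.
 */
bool best_matches_block(const struct bestfree bf[BESTFREE_COUNT],
			uint16_t best)
{
	return bf[0].length == best;
}
```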

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Christoph Hellwig [Mon, 6 Nov 2017 19:54:02 +0000 (11:54 -0800)]
xfs: mark xlog_verify_dest_ptr STATIC

We already marked the forward declaration STATIC, but not the function
body itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 6 Nov 2017 19:54:01 +0000 (11:54 -0800)]
xfs: mark xlog_recover_check_summary STATIC

We already marked the forward declaration STATIC, but not the function
body itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 6 Nov 2017 19:54:01 +0000 (11:54 -0800)]
xfs: mark xfs_btree_check_lblock and xfs_btree_check_ptr static

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 6 Nov 2017 19:54:00 +0000 (11:54 -0800)]
xfs: remove unreachable error injection code in xfs_qm_dqget

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 6 Nov 2017 19:54:00 +0000 (11:54 -0800)]
xfs: remove unused debug counts for xfs_lock_inodes

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 6 Nov 2017 19:53:59 +0000 (11:53 -0800)]
xfs: mark xfs_errortag_ktype static

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 6 Nov 2017 19:53:58 +0000 (11:53 -0800)]
xfs: trivial sparse fixes for the new scrub code

[darrick: fix broken initializer in xfs_scrub_xattr]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 6 Nov 2017 19:53:58 +0000 (11:53 -0800)]
xfs: always define STATIC to static noinline

Ever since we added the noinline tag, there has been no good reason to
define away the static for debug builds: we get just as good debug
information with or without it, so don't confuse sparse and other
checkers with it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:47 +0000 (10:34 -0700)]
xfs: move xfs_bmbt_irec and xfs_exntst_t to xfs_types.h

Neither defines an on-disk format, so move them out of xfs_format.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:47 +0000 (10:34 -0700)]
xfs: pass struct xfs_bmbt_irec to xfs_bmbt_validate_extent

This removes an unaligned load per extent, as well as the manual poking
into the on-disk extent format.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:47 +0000 (10:34 -0700)]
xfs: remove the nr_extents argument to xfs_iext_remove

We only have two places that remove 2 extents at the same time, so unroll
the loop there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:46 +0000 (10:34 -0700)]
xfs: remove the nr_extents argument to xfs_iext_insert

We only have two places that insert 2 extents at the same time, so unroll
the loop there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:46 +0000 (10:34 -0700)]
xfs: use a b+tree for the in-core extent list

Replace the current linear list and the indirection array for the in-core
extent list with a b+tree to avoid the need for larger memory allocations
for the indirection array when lots of extents are present.  The current
extent list implementation leads to heavy pressure on the memory
allocator when modifying files with a high extent count, and can lead
to high latencies because of that.

The replacement is a b+tree with a few quirks.  The leaf nodes directly
store the extent record in two u64 values.  The encoding is a little bit
different from the existing in-core extent records so that the start
offset and length which are required for lookups can be retrieved with
simple mask operations.  The inner nodes store a 64-bit key containing
the start offset in the first half of the node, and the pointers to the
next lower level in the second half.  In either case we walk the node
from the beginning to the end and do a linear search, as that is more
efficient for the low number of cache lines touched during a search
(2 for the inner nodes, 4 for the leaf nodes) than a binary search.
We store termination markers (zero length for the leaf nodes, an
otherwise impossible high bit for the inner nodes) to terminate the key
list / records instead of storing a count to use the available cache
lines as efficiently as possible.
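The leaf encoding can be sketched in userspace C.  The bit layout below (54-bit start offset plus the low 10 bits of the start block in one word; 21-bit length, an unwritten flag and the high 42 bits of the start block in the other) is an assumption chosen so that the start offset and length are each one mask away, as described above.  Only leaf nodes are shown, and all names and sizes are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK64LO(n)	((UINT64_C(1) << (n)) - 1)
#define STARTOFF_BITS	54	/* assumption */
#define LENGTH_BITS	21	/* assumption */
#define STARTBLOCK_BITS	52	/* assumption: 10 low + 42 high bits */
#define LEAF_RECS	15	/* assumption: records per 256-byte leaf */

struct irec {
	uint64_t startoff, startblock, blockcount;
	bool unwritten;
};

struct leaf_rec {
	uint64_t lo, hi;
};

void rec_set(struct leaf_rec *r, const struct irec *irec)
{
	uint64_t sb = irec->startblock & MASK64LO(STARTBLOCK_BITS);

	r->lo = (irec->startoff & MASK64LO(STARTOFF_BITS)) |
		((sb & MASK64LO(10)) << STARTOFF_BITS);
	r->hi = (irec->blockcount & MASK64LO(LENGTH_BITS)) |
		((uint64_t)irec->unwritten << LENGTH_BITS) |
		((sb >> 10) << (LENGTH_BITS + 1));
}

/* The two fields needed for lookups are a single mask away. */
static inline uint64_t rec_startoff(const struct leaf_rec *r)
{
	return r->lo & MASK64LO(STARTOFF_BITS);
}

static inline uint64_t rec_length(const struct leaf_rec *r)
{
	return r->hi & MASK64LO(LENGTH_BITS);
}

void rec_get(const struct leaf_rec *r, struct irec *irec)
{
	irec->startoff = rec_startoff(r);
	irec->blockcount = rec_length(r);
	irec->unwritten = (r->hi >> LENGTH_BITS) & 1;
	irec->startblock = (r->lo >> STARTOFF_BITS) |
			   ((r->hi >> (LENGTH_BITS + 1)) << 10);
}

/*
 * Linear search of a sorted leaf.  A zero length terminates the record
 * list, so no count is stored.  Returns the index of the record that
 * contains offset, or -1.
 */
int leaf_lookup(const struct leaf_rec leaf[LEAF_RECS], uint64_t offset)
{
	for (int i = 0; i < LEAF_RECS; i++) {
		uint64_t len = rec_length(&leaf[i]);
		uint64_t start = rec_startoff(&leaf[i]);

		if (len == 0 || offset < start)
			break;
		if (offset < start + len)
			return i;
	}
	return -1;
}
```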

One quirk of the algorithm is that, while we normally split a node half
and half like the usual btree implementations, entries added at the very
end of the list simply spill over into a new node of their own.  This
means we get a 100% fill grade for the common cases of bulk insertion
when reading an inode into memory and of purely sequential appends to a
file.  The downside is a slightly higher chance of splits on the first
random insertions.
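A sketch of the split decision described above, assuming a hypothetical `plan_split()` helper that only decides how entries are distributed (moving the entries and linking the new node are omitted):

```c
#include <assert.h>
#include <stdbool.h>

struct split_plan {
	int keep;		/* existing entries that stay in the old node */
	bool new_goes_left;	/* does the new entry land in the old node? */
};

/*
 * Hypothetical split planner for a full node holding nr_full entries
 * that must accept a new entry at index pos (0..nr_full).
 */
struct split_plan plan_split(int nr_full, int pos)
{
	struct split_plan plan;

	assert(nr_full > 0 && pos >= 0 && pos <= nr_full);
	if (pos == nr_full) {
		/* Appending at the very end: the new entry starts its own node. */
		plan.keep = nr_full;
		plan.new_goes_left = false;
	} else {
		/* Anywhere else: the usual half/half split. */
		plan.keep = nr_full / 2;
		plan.new_goes_left = pos <= plan.keep;
	}
	return plan;
}
```

With repeated appends, every node left behind stays completely full, which is where the 100% fill grade for sequential writes comes from.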

Both insert and removal manually recurse into the lower levels, but
the bulk deletion of the whole tree is still implemented as a recursive
function call, although one limited by the overall depth and with very
little stack usage in every iteration.

For the first few extents we dynamically grow the list from a single
extent to the next powers of two until we have a first full leaf block,
and then build the actual tree.
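A sketch of that growth policy, assuming a hypothetical leaf capacity of 15 records (the real capacity depends on the node size and layout):

```c
#include <assert.h>
#include <stdbool.h>

#define RECS_PER_LEAF 15	/* assumption: records in one full leaf */

/* Capacity (in records) for a small, not-yet-tree extent list of nr records. */
int small_list_capacity(int nr)
{
	int cap = 1;

	assert(nr > 0 && nr <= RECS_PER_LEAF);
	while (cap < nr)
		cap *= 2;		/* 1, 2, 4, 8, ... */
	return cap < RECS_PER_LEAF ? cap : RECS_PER_LEAF;
}

/* Past one full leaf, a real tree has to be built. */
bool needs_tree(int nr)
{
	return nr > RECS_PER_LEAF;
}
```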

The code started out based on the generic lib/btree.c code from Joern
Engel based on earlier work from Peter Zijlstra, but has since been
rewritten beyond recognition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:45 +0000 (10:34 -0700)]
xfs: allow unaligned extent records in xfs_bmbt_disk_set_all

To make life a little simpler, make xfs_bmbt_set_all aware of unaligned
access so that we can use it directly on the destination buffer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:45 +0000 (10:34 -0700)]
xfs: remove support for inlining data/extents into the inode fork

Supporting a small bit of data inside the inode fork blows up the fork size
a lot: removing the 32 bytes of inline data halves the effective size of
the inode fork (which still has a lot of unused padding left), and the
cost of a single kmalloc is negligible compared to the cost of reading
or creating an inode.

It also simplifies the fork management code a lot.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:44 +0000 (10:34 -0700)]
xfs: simplify xfs_reflink_convert_cow

Instead of looking up extents to convert and calling xfs_bmapi_write on
each of them, just let xfs_bmapi_write handle the full range.  To make
this robust, add a new XFS_BMAPI_CONVERT_ONLY flag that only converts
ranges and never allocates blocks.

[darrick: shorten the stringified CONVERT_ONLY trace flag]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:44 +0000 (10:34 -0700)]
xfs: iterate backwards in xfs_reflink_cancel_cow_blocks

Match the iteration order for extent deletion in the truncate and
reflink I/O completion path.

This also happens to make implementing the new incore extent list
a lot easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:43 +0000 (10:34 -0700)]
xfs: introduce the xfs_iext_cursor abstraction

Add a new xfs_iext_cursor structure to hide the direct extent map
index manipulations.  In addition to the existing lookup/get/insert/
remove and update routines, new primitives are provided to get the
first and last extent cursor, as well as to move up and down by one
extent.  Also new are convenience helpers to increment/decrement the
cursor and retrieve the new extent, to peek into the previous/next
extent without updating the cursor, and, last but not least, a macro
to iterate over all extents in a fork.
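A minimal sketch of the cursor idea, assuming a hypothetical fork backed by a flat array; the point is that callers move the cursor and copy out records without ever touching the index themselves, so the backing store could later become a tree.  All names below are illustrative, not the kernel's API (a peek-previous helper would mirror `iext_peek_next()`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct irec {
	uint64_t startoff, blockcount;
};

/* Hypothetical fork backed by a flat array; a real tree would hide this. */
struct ifork {
	const struct irec *recs;
	int nr;
};

/* Callers only ever see the cursor, never the raw index arithmetic. */
struct iext_cursor {
	int pos;
};

void iext_first(const struct ifork *f, struct iext_cursor *cur)
{
	(void)f;
	cur->pos = 0;
}

void iext_last(const struct ifork *f, struct iext_cursor *cur)
{
	cur->pos = f->nr - 1;
}

void iext_next(struct iext_cursor *cur) { cur->pos++; }
void iext_prev(struct iext_cursor *cur) { cur->pos--; }

/* Copy out the extent under the cursor; false if the cursor is off the end. */
bool iext_get(const struct ifork *f, const struct iext_cursor *cur,
	      struct irec *out)
{
	if (cur->pos < 0 || cur->pos >= f->nr)
		return false;
	*out = f->recs[cur->pos];
	return true;
}

/* Peek at the following extent without moving the cursor. */
bool iext_peek_next(const struct ifork *f, const struct iext_cursor *cur,
		    struct irec *out)
{
	struct iext_cursor tmp = *cur;

	iext_next(&tmp);
	return iext_get(f, &tmp, out);
}

/* Hypothetical iteration macro over every extent in the fork. */
#define for_each_iext(f, cur, rec) \
	for (iext_first((f), (cur)); iext_get((f), (cur), (rec)); iext_next(cur))
```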

[darrick: rename for_each_iext to for_each_xfs_iext]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:43 +0000 (10:34 -0700)]
xfs: iterate over extents in xfs_bmap_extents_to_btree

This actually makes the function very slightly less efficient for now, as we
detour through the expanded irec format between the in-core extent format
and the on-disk one instead of just endian-swapping them.  But with the
in-core extent btree, the in-core one will use a different format and the
representation will be entirely hidden.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:42 +0000 (10:34 -0700)]
xfs: iterate over extents in xfs_iextents_copy

This actually makes the function very slightly less efficient for now, as we
detour through the expanded irec format between the in-core extent format
and the on-disk one instead of just endian-swapping them.  But with the
in-core extent btree, the in-core one will use a different format and the
representation will be entirely hidden.  It also happens to make the
function a whole lot more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:42 +0000 (10:34 -0700)]
xfs: pass an on-disk extent to xfs_bmbt_validate_extent

This prepares for getting rid of the current in-memory extent format.
At the end of the series we will change the calling convention again
to pass the xfs_bmbt_irec structure once it is available everywhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:41 +0000 (10:34 -0700)]
xfs: treat idx as a cursor in xfs_bmap_collapse_extents

Stop poking before and after the index and just increment or decrement
it while doing our operations on it to prepare for a new extent list
implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:41 +0000 (10:34 -0700)]
xfs: treat idx as a cursor in xfs_bmap_del_extent_*

Stop poking before and after the index and just increment or decrement
it while doing our operations on it to prepare for a new extent list
implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:40 +0000 (10:34 -0700)]
xfs: treat idx as a cursor in xfs_bmap_add_extent_unwritten_real

Stop poking before and after the index and just increment or decrement
it while doing our operations on it to prepare for a new extent list
implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:40 +0000 (10:34 -0700)]
xfs: treat idx as a cursor in xfs_bmap_add_extent_hole_real

Stop poking before and after the index and just increment or decrement
it while doing our operations on it to prepare for a new extent list
implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:39 +0000 (10:34 -0700)]
xfs: treat idx as a cursor in xfs_bmap_add_extent_hole_delay

Stop poking before and after the index and just increment or decrement
it while doing our operations on it to prepare for a new extent list
implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:39 +0000 (10:34 -0700)]
xfs: treat idx as a cursor in xfs_bmap_add_extent_delay_real

Stop poking before and after the index and just increment or decrement
it while doing our operations on it to prepare for a new extent list
implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:38 +0000 (10:34 -0700)]
xfs: remove a duplicate assignment in xfs_bmap_add_extent_delay_real

Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Fri, 3 Nov 2017 17:34:38 +0000 (10:34 -0700)]
xfs: don't create overlapping extents in xfs_bmap_add_extent_delay_real

Two cases in xfs_bmap_add_extent_delay_real currently insert a new
extent before updating the existing one that is being split.  While
this works fine with a simple extent list, a more complex tree can't
easily cope with overlapping extents.  Reshuffle the code a bit to update
the slot of the existing delalloc extent to the new real extent before
inserting the shortened delalloc extent before or after it.  This
avoids the overlapping extents while still allowing us to update the
br_startblock field of the delalloc extent with the updated indirect
block reservation.
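A sketch of the reordering on a plain array, covering only the case where the new real extent sits at the start of the delalloc extent and ignoring the indirect block reservation.  The helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct ext {
	uint64_t off, len;
	bool delalloc;
};

/*
 * Hypothetical: turn the first alen blocks of the delalloc extent at
 * list[idx] into a real extent.  The existing slot is overwritten first,
 * and only then is the shortened delalloc remainder inserted after it,
 * so the list never holds two extents covering the same offsets.
 * Returns the new record count.
 */
int convert_delalloc_left(struct ext *list, int nr, int cap, int idx,
			  uint64_t alen)
{
	struct ext old = list[idx];

	assert(nr < cap && old.delalloc && alen > 0 && alen < old.len);
	list[idx] = (struct ext){ old.off, alen, false };
	memmove(&list[idx + 2], &list[idx + 1],
		(size_t)(nr - idx - 1) * sizeof(*list));
	list[idx + 1] = (struct ext){ old.off + alen, old.len - alen, true };
	return nr + 1;
}

/* True if no extent runs into the one after it. */
bool no_overlaps(const struct ext *list, int nr)
{
	for (int i = 1; i < nr; i++)
		if (list[i - 1].off + list[i - 1].len > list[i].off)
			return false;
	return true;
}
```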

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Thu, 2 Nov 2017 19:48:11 +0000 (12:48 -0700)]
xfs: scrub: avoid uninitialized return code

The newly added xfs_scrub_da_btree_block() function has one code path
that returns the 'error' variable without initializing it first, as
shown by this compiler warning:

fs/xfs/scrub/dabtree.c: In function 'xfs_scrub_da_btree_block':
fs/xfs/scrub/dabtree.c:462:9: error: 'error' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Return zero since the caller will exit the scrub code if we don't produce a
buffer pointer.

Fixes: 7c4a07a424c1 ("xfs: scrub directory/attribute btrees")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Eryu Guan [Thu, 2 Nov 2017 04:43:50 +0000 (21:43 -0700)]
xfs: truncate pagecache before writeback in xfs_setattr_size()

On truncate down, if the new size is not block size aligned, we zero the
rest of the block to avoid exposing stale data to the user, and
iomap_truncate_page() skips zeroing if the range is already in
unwritten state or a hole. Then we write back from the on-disk i_size
to the new size if this range hasn't been written to disk yet, and
truncate the page cache beyond the new EOF and set the in-core i_size.

The problem is that we could write data between di_size and newsize
before removing the page cache beyond newsize, as the extents may
still be in unwritten state right after a buffered write. As such, the
page of data that newsize lies in has not been zeroed by page cache
invalidation before it is written, and xfs_do_writepage() hasn't
triggered its "zero data beyond EOF" case because we haven't
updated the in-core i_size yet. A subsequent mmap read could then see
non-zero data past EOF.

I occasionally see this in fsx runs in fstests generic/112. A
simplified fsx operation sequence looks like this (assuming a 4k block
size xfs):

  fallocate 0x0 0x1000 0x0 keep_size
  write 0x0 0x1000 0x0
  truncate 0x0 0x800 0x1000
  punch_hole 0x0 0x800 0x800
  mapread 0x0 0x800 0x800

Here fallocate allocates an unwritten extent but doesn't update
i_size; the buffered write populates the page cache while the extent
is still unwritten; truncate skips zeroing the page past the new EOF
and writes the page to disk; punch_hole invalidates the page cache;
and finally mapread reads the block back and sees non-zero data
beyond EOF.

Fix it by moving truncate_setsize() before writeback so that page
cache invalidation zeros the partial page at the new EOF. This also
triggers the "zero data beyond EOF" case in xfs_do_writepage() at
writeback time, because newsize has been set and the page straddles
newsize.

While we're at it, also fix the wrong 'end' parameter of the
filemap_write_and_wait_range() call: 'end' is inclusive and should be
'newsize - 1'.

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Dave Chinner [Wed, 1 Nov 2017 22:02:48 +0000 (15:02 -0700)]
xfs: convert remaining xfs_sb_version_... checks to bool

Some were missed in the pass that converted the function return
values from int to bool. Update the remaining ones for consistency.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 31 Oct 2017 19:10:02 +0000 (12:10 -0700)]
xfs: scrub extended attribute leaf space

As we walk the attribute btree, explicitly check the structure of the
attribute leaves to make sure the pointers make sense and the freemap is
sensible.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Tue, 31 Oct 2017 19:04:49 +0000 (12:04 -0700)]
xfs: move error injection tags into their own file

Move the error injection tag names into a libxfs header so that we can
share them between kernel and userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Tue, 31 Oct 2017 19:04:24 +0000 (12:04 -0700)]
xfs: remove inode log format typedef

Remove xfs_inode_log_format_t now that xfs_inode_log_format is
explicitly padded and therefore is a real on-disk structure.  This
enables xfs/122 to check the size of the structure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Colin Ian King [Tue, 31 Oct 2017 16:56:06 +0000 (09:56 -0700)]
xfs: remove redundant assignment to variable bit

Variable bit is being assigned a value that is never read, hence
the assignment is redundant and can be removed. Cleans up clang
warning:

fs/xfs/libxfs/xfs_rtbitmap.c:675:3: warning: Value stored to
'bit' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Fri, 27 Oct 2017 16:20:28 +0000 (09:20 -0700)]
xfs: fix unused variable warning in xfs_buf_set_ref()

Fix an unused variable warning on non-DEBUG builds introduced by
commit 7561d27e90 ("xfs: buffer lru reference count error injection
tag").

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Wed, 25 Oct 2017 22:03:46 +0000 (15:03 -0700)]
xfs: compare btree block keys to parent block's keys during scrub

When we're done checking all the records/keys in a btree block, compute
the low and high key of the block and compare them to the associated key
in the parent btree block.
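A sketch of the comparison, assuming each record covers an inclusive range [start, end] and that the parent stores both a low and a high key for the child, as overlapping btrees do (a btree with only low keys would compare `low` alone).  The names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bt_key {
	uint64_t low, high;
};

/* Derive a block's low and high keys from its records' [start, end] ranges. */
struct bt_key block_keys(const uint64_t *start, const uint64_t *end, int nr)
{
	struct bt_key k;

	assert(nr > 0);
	k.low = start[0];
	k.high = end[0];
	for (int i = 1; i < nr; i++) {
		if (start[i] < k.low)
			k.low = start[i];
		if (end[i] > k.high)
			k.high = end[i];
	}
	return k;
}

/* The parent's key for this block must describe exactly the same range. */
bool keys_match_parent(struct bt_key child, struct bt_key parent)
{
	return child.low == parent.low && child.high == parent.high;
}
```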

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Wed, 25 Oct 2017 23:59:43 +0000 (16:59 -0700)]
xfs: abort dir/attr btree operation if btree is obviously weird

Abort a dir/attr btree operation if the attr btree has obvious problems,
such as loops back to the root or pointers that don't point down the
tree.  Found by fuzzing btree[0].before to zero in xfs/402, which
livelocks on the cycle in the attr btree.

Apply the same checks to xfs_da3_node_lookup_int.
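A sketch of these sanity checks during a descent, assuming a hypothetical in-memory array of blocks and a depth cap of 5.  Requiring each step to go down exactly one level also guarantees the walk terminates, even on a cycle that does not pass through the root:

```c
#include <assert.h>

#define MAX_DEPTH 5	/* assumption: cap on dir/attr btree height */

struct da_blk {
	int level;	/* 0 for leaves */
	int child;	/* block index of the next node down (nodes only) */
};

/*
 * Hypothetical descent from the root.  Returns the leaf's index, or -1
 * when the tree is obviously weird.
 */
int da_descend(const struct da_blk *blks, int nblks, int root)
{
	int blk = root;

	if (root < 0 || root >= nblks || blks[root].level >= MAX_DEPTH)
		return -1;
	while (blks[blk].level > 0) {
		int child = blks[blk].child;

		if (child < 0 || child >= nblks || child == root)
			return -1;	/* bad pointer, or a loop back to the root */
		if (blks[child].level != blks[blk].level - 1)
			return -1;	/* pointer does not point down the tree */
		blk = child;
	}
	return blk;
}
```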

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Wed, 25 Oct 2017 23:59:42 +0000 (16:59 -0700)]
xfs: refactor extended attribute list operation

When we're iterating the attribute list and we can't find our previous
location based off the attribute cursor, we'll instead walk down the
attribute btree from the root trying to find where we left off.  Move
this code into a separate function for later cleanups.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Wed, 25 Oct 2017 23:59:43 +0000 (16:59 -0700)]
xfs: validate sb_logsunit is a multiple of the fs blocksize

Make sure the log stripe unit is sane before proceeding with mounting.
AFAICT this means that logsunit has to be 0, 1, or a multiple of the fs
block size.  Found this by setting the LSB of logsunit in xfs/350 and
watching the system crash as soon as we try to write to the log.
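A sketch of the rule stated above, assuming both values are expressed in bytes; the function name is hypothetical and not the mount-time code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Accept a log stripe unit of 0, 1, or a multiple of the fs block size. */
bool logsunit_valid(uint32_t logsunit, uint32_t blocksize)
{
	assert(blocksize > 0);
	if (logsunit == 0 || logsunit == 1)
		return true;
	return logsunit % blocksize == 0;
}
```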

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Brian Foster [Thu, 26 Oct 2017 16:31:16 +0000 (09:31 -0700)]
xfs: drain the buffer LRU on mount

Log recovery of v4 filesystems does not use buffer verifiers because
log recovery historically can result in transient buffer corruption
when target buffers might be ahead of the log after a crash. v5
filesystems work around this problem with metadata LSN ordering.

While this log recovery verifier behavior is necessary on v4 supers,
it can result in leaving buffers around in the LRU without verifiers
attached for a significant amount of time. This leads to use of
unverified buffers while the filesystem is in active use, long after
recovery has completed.

To address this problem, drain all buffers from the LRU as a final
step of the log mount sequence. Note that this is done
unconditionally to provide a consistently clean cache footprint,
regardless of superblock version or log state. As a side effect,
this ensures that all cache resident, unverified buffers are
reclaimed after log recovery and therefore must be recreated with
verifiers on subsequent use.

Reported-by: Darrick Wong <darrick.wong@oracle.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Thu, 26 Oct 2017 16:31:16 +0000 (09:31 -0700)]
xfs: fix log block underflow during recovery cycle verification

It is possible for mkfs to format very small filesystems with too
small of an internal log with respect to the various minimum size
and block count requirements. If this occurs when the log happens to
be smaller than the scan window used for cycle verification and the
scan wraps the end of the log, the start_blk calculation in
xlog_find_head() underflows and leads to an attempt to scan an
invalid range of log blocks. This results in log recovery failure
and a failed mount.

Since there may be filesystems out in the wild with this kind of
geometry, we cannot simply refuse to mount. Instead, cap the scan
window for cycle verification to the size of the physical log. This
ensures that the cycle verification proceeds as expected when the
scan wraps the end of the log.
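
Here is the capping under simplified assumptions. scan_start_blk() and its parameters are hypothetical names, and the real xlog_find_head() does much more. The sketch only shows that once the window is clamped to the physical log size, the wrapped start block can no longer go negative.

```c
#include <assert.h>

/*
 * Hypothetical sketch: pick the start block of a backwards scan window of
 * scan_window blocks ending at head_blk, wrapping past block 0 to the end
 * of a log that is log_bbnum blocks long.
 */
static int scan_start_blk(int log_bbnum, int head_blk, int scan_window)
{
	/* Cap the window to the physical log so the math cannot underflow. */
	int num_scan = scan_window < log_bbnum ? scan_window : log_bbnum;

	assert(head_blk >= 0 && head_blk < log_bbnum);
	if (head_blk >= num_scan)
		return head_blk - num_scan;	/* window fits, no wrap */
	/* The part of the window before block 0 comes from the log's end. */
	return log_bbnum - (num_scan - head_blk);
}
```

Take a 100-block log with the head at block 10 and a 256-block window. Uncapped, the wrapped start is 100 - (256 - 10) = -146. With the cap, it is 10.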

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Brian Foster [Thu, 26 Oct 2017 16:31:15 +0000 (09:31 -0700)]
xfs: more robust recovery xlog buffer validation

mkfs has a historical problem where it can format very small
filesystems with too small of a physical log. Under certain
conditions, log recovery of an associated filesystem can end up
passing garbage parameter values to some of the cycle and log record
verification functions due to bugs in log recovery not dealing with
such filesystems properly. This results in attempts to read from
bogus/underflowed log block addresses.

Since the buffer read may ultimately succeed, log recovery can
proceed with bogus data and otherwise go off the rails and crash.
One example of this is a negative last_blk being passed to
xlog_find_verify_log_record() causing us to skip the loop, pass a
NULL head pointer to xlog_header_check_mount() and crash.

Improve the xlog buffer verification to address this problem. We
already verify xlog buffer length, so update this mechanism to also
sanity check for a valid log relative block address and otherwise
return an error. Pass a fixed, valid log block address from
xlog_get_bp() since the target address will be validated when the
buffer is read. This ensures that any bogus log block address/length
calculations lead to graceful mount failure rather than risking a
crash or worse if recovery proceeds with bogus data.
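
A sketch of the range check follows. It assumes block addresses and counts are in basic blocks relative to the start of the log. verify_log_range() is a hypothetical name. -EINVAL is used for illustration only; the error code the kernel returns may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical sketch: reject any (block, count) pair that does not lie
 * entirely inside a log of log_bbsize basic blocks, instead of letting a
 * read go ahead with a bogus, possibly underflowed, address.
 */
static int verify_log_range(int64_t log_bbsize, int64_t blk_no, int bbcount)
{
	if (blk_no < 0 || blk_no >= log_bbsize)
		return -EINVAL;		/* start block outside the log */
	if (bbcount <= 0 || blk_no + bbcount > log_bbsize)
		return -EINVAL;		/* empty, or runs past the end */
	return 0;
}
```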

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 23 Oct 2017 23:32:39 +0000 (16:32 -0700)]
xfs: add a new xfs_iext_lookup_extent_before helper

This helper looks up the last extent that covers space before the passed-in
block number.  This is useful for truncate and similar operations that
operate backwards over the extent list.  For xfs_bunmapi it is also
a slight optimization, as we can return early if there are no extents
at or below the end of the to-be-truncated range.
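
Here is the lookup on a sorted, non-overlapping array of extents with an exclusive end block. struct irec and lookup_extent_before() are hypothetical stand-ins, and the linear scan is for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent record; field names mimic XFS's irec. */
struct irec {
	uint64_t br_startoff;	/* first file block of the extent */
	uint64_t br_blockcount;	/* length in blocks */
};

/*
 * Find the last extent that starts before file block `end` (exclusive),
 * scanning backwards.  Returns false if there is none, so a truncate-style
 * caller can return early.
 */
static bool lookup_extent_before(const struct irec *ext, size_t n,
				 uint64_t end, size_t *idx)
{
	for (size_t i = n; i > 0; i--) {
		if (ext[i - 1].br_startoff < end) {
			*idx = i - 1;
			return true;
		}
	}
	return false;
}
```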

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 23 Oct 2017 23:32:39 +0000 (16:32 -0700)]
xfs: merge xfs_bmap_read_extents into xfs_iread_extents

xfs_iread_extents is just a trivial wrapper; there is no good reason
to keep the two separate.

[darrick: minor fixups having left xfs_bmbt_validate_extent intact]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Mon, 23 Oct 2017 23:32:38 +0000 (16:32 -0700)]
xfs: add asserts for the mmap lock in xfs_{insert,collapse}_file_space

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:08:52 +0000 (11:08 -0700)]
xfs: rewrite xfs_bmap_first_unused to make better use of xfs_iext_get_extent

Look at the return value of xfs_iext_get_extent instead of figuring out
the extent count first and looping up to it.
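
The loop shape can be sketched on a plain array. The hypothetical get_extent() accessor reports whether an extent exists at a given index, so the walk ends on its return value rather than on a precomputed count. The hole-finding logic is an illustrative reconstruction, not a copy of xfs_bmap_first_unused().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct irec {			/* hypothetical extent record */
	uint64_t br_startoff;
	uint64_t br_blockcount;
};

struct ext_list {		/* hypothetical stand-in for an inode fork */
	const struct irec *recs;
	size_t nrecs;
};

/* Copy out extent idx, or return false once idx is past the last one. */
static bool get_extent(const struct ext_list *l, size_t idx, struct irec *got)
{
	if (idx >= l->nrecs)
		return false;
	*got = l->recs[idx];
	return true;
}

/* First offset >= lowest that starts a hole of at least len blocks. */
static uint64_t first_unused(const struct ext_list *l, uint64_t len,
			     uint64_t lowest)
{
	uint64_t max = lowest;	/* end of the last extent seen, >= lowest */
	struct irec got;

	/* get_extent() returning false ends the walk; no count needed. */
	for (size_t idx = 0; get_extent(l, idx, &got); idx++) {
		/* Is the hole before this extent large enough? */
		if (got.br_startoff >= lowest + len &&
		    got.br_startoff - max >= len)
			break;
		uint64_t end = got.br_startoff + got.br_blockcount;
		max = end > lowest ? end : lowest;
	}
	return max;
}
```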

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:08:52 +0000 (11:08 -0700)]
xfs: don't rely on extent indices in xfs_bmap_insert_extents

Rewrite xfs_bmap_insert_extents so that we don't rely on extent indices
except for iterating over them.  Either being unable to iterate to the
previous extent or finding the extent that contains stop_fsb is a
sufficient exit condition, and we don't need to do any extent count games
given that:

  a) we already flushed all delalloc extents past our start offset
     before doing the operation
  b) xfs_iext_count() includes delalloc extents anyway
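
Here is the backwards walk on a plain array. It assumes any extent straddling stop_fsb has already been split, and it ignores the bmap and rmap btree updates. insert_shift() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct irec {			/* hypothetical extent record */
	uint64_t br_startoff;
	uint64_t br_blockcount;
};

/*
 * Shift every extent starting at or after stop_fsb right by shift blocks.
 * Walk backwards from the last extent and stop when there is no previous
 * extent or the current one starts before stop_fsb; the index is used
 * only to step between neighbours.
 */
static void insert_shift(struct irec *ext, size_t n, uint64_t stop_fsb,
			 uint64_t shift)
{
	for (size_t i = n; i > 0; i--) {
		struct irec *got = &ext[i - 1];

		if (got->br_startoff < stop_fsb)
			break;
		got->br_startoff += shift;
	}
}
```

Walking from the end means each extent moves into space that its right-hand neighbour has already vacated.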

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:08:51 +0000 (11:08 -0700)]
xfs: don't rely on extent indices in xfs_bmap_collapse_extents

Rewrite xfs_bmap_collapse_extents so that we don't rely on extent indices
except for iterating over them.  Not being able to iterate to the next
extent is a sufficient exit condition, and we don't need to do any extent
count games given that:

  a) we already flushed all delalloc extents past our start offset
     before doing the operation
  b) xfs_iext_count() includes delalloc extents anyway
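
Here is the forward walk for collapse, under similar simplifying assumptions: a plain array, no btree updates, and no merging of adjacent extents. collapse_shift() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct irec {			/* hypothetical extent record */
	uint64_t br_startoff;
	uint64_t br_blockcount;
};

/*
 * Shift every extent starting at or after start_fsb left by shift blocks,
 * walking forward until there is no next extent.  Returns -1 if a shifted
 * extent would go below offset 0 or overlap its (already shifted) left
 * neighbour.
 */
static int collapse_shift(struct irec *ext, size_t n, uint64_t start_fsb,
			  uint64_t shift)
{
	for (size_t i = 0; i < n; i++) {
		struct irec *got = &ext[i];

		if (got->br_startoff < start_fsb)
			continue;
		if (got->br_startoff < shift)
			return -1;
		uint64_t new_off = got->br_startoff - shift;
		if (i > 0 &&
		    ext[i - 1].br_startoff + ext[i - 1].br_blockcount > new_off)
			return -1;
		got->br_startoff = new_off;
	}
	return 0;
}
```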

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:08:51 +0000 (11:08 -0700)]
xfs: update got in xfs_bmap_shift_update_extent

This way the caller gets the proper updated extent returned in got.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:07:34 +0000 (11:07 -0700)]
xfs: remove xfs_bmse_shift_one

Instead, do the actual left and right shift work in the callers, and just
keep a helper to update the bmap and rmap btrees as well as the in-core
extent list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:07:11 +0000 (11:07 -0700)]
xfs: split xfs_bmap_shift_extents

Have a separate helper for insert vs collapse, as this prepares us for
simplifying the code in the next patches.

Also change the done output argument to a bool instead of int for both
new functions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:07:10 +0000 (11:07 -0700)]
xfs: remove XFS_BMAP_MAX_SHIFT_EXTENTS

The define was always set to 1, which means the code looping until that
limit was reached was dead code from the start.

Also remove an initialization of next_fsb for the done case that doesn't
fit the new code flow - it was never checked by the caller in the done
case to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:07:10 +0000 (11:07 -0700)]
xfs: inline xfs_shift_file_space into callers

The code is sufficiently different for the insert vs collapse cases both
in xfs_shift_file_space itself and the callers that untangling them will
make life a lot easier down the road.

We still keep a common helper for flushing all data and COW state to get
the inode into the right shape for shifting the extents around.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:07:09 +0000 (11:07 -0700)]
xfs: remove if_rdev

We can simply use the i_rdev field in the Linux inode and just convert
to and from the XFS dev_t when reading or logging/writing the inode.
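
Here is a conversion between an in-core device number and a 32-bit on-disk value. The layout, with the minor number in the low 18 bits and the major above it, is an assumption modelled on the SysV-style encoding. struct devnum and both helpers are hypothetical, not the kernel's dev_t helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed on-disk layout: minor in the low 18 bits, major above it. */
#define ODEV_MINORBITS	18
#define ODEV_MINORMASK	((1u << ODEV_MINORBITS) - 1)

/* Hypothetical in-core device number, standing in for the VFS i_rdev. */
struct devnum {
	uint32_t major;
	uint32_t minor;
};

/* Pack when logging/writing the inode (majors above 14 bits are lost). */
static uint32_t devnum_to_disk(struct devnum d)
{
	return (d.major << ODEV_MINORBITS) | (d.minor & ODEV_MINORMASK);
}

/* Unpack when reading the inode. */
static struct devnum devnum_from_disk(uint32_t v)
{
	struct devnum d = { v >> ODEV_MINORBITS, v & ODEV_MINORMASK };
	return d;
}
```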

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:07:09 +0000 (11:07 -0700)]
xfs: remove the never fully implemented UUID fork format

Remove the dead code dealing with the UUID fork format that was never
implemented in Linux (nor in IRIX, as far as I know).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:06:29 +0000 (11:06 -0700)]
xfs: remove XFS_BMAP_TRACE_EXLIST

Instead of looping over all extents in some debug-only helper, just
insert trace points into the loops that already exist in the calling
functions.

Also split the xfs_extlist trace point into one each for reading and
writing extents from disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:04:44 +0000 (11:04 -0700)]
xfs: move pre/post-bmap tracing into xfs_iext_update_extent

xfs_iext_update_extent already has basically all the information needed
to centralize the bmap pre/post tracing.  We just need to pass inode +
bmap state instead of the inode fork pointer to get all trace annotations.

In addition to covering all the existing trace points this gives us
tracing coverage for the extent shifting operations for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:04:44 +0000 (11:04 -0700)]
xfs: remove post-bmap tracing in xfs_bmap_local_to_extents

Now that we use xfs_iext_insert, this is already covered by the tracing
in that function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:04:43 +0000 (11:04 -0700)]
xfs: make better use of the 'state' variable in xfs_bmap_del_extent_real

We already have all the information about the fork as well as additional
tracing information, so pass that to xfs_iext_remove().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Christoph Hellwig [Thu, 19 Oct 2017 18:02:29 +0000 (11:02 -0700)]
xfs: add a xfs_bmap_fork_to_state helper

This creates the right initial bmap state from the passed-in inode
fork enum.
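
Such a mapping can be sketched as a switch. The fork identifiers and STATE_* flag values are hypothetical placeholders, not the kernel's constants.

```c
#include <assert.h>

/* Hypothetical fork identifiers and bmap state flags for illustration. */
enum fork_id { DATA_FORK, ATTR_FORK, COW_FORK };

#define STATE_ATTRFORK	(1 << 0)
#define STATE_COWFORK	(1 << 1)

/* Map the fork being operated on to the initial bmap state flags. */
static int bmap_fork_to_state(enum fork_id whichfork)
{
	switch (whichfork) {
	case ATTR_FORK:
		return STATE_ATTRFORK;
	case COW_FORK:
		return STATE_COWFORK;
	default:
		return 0;	/* the data fork needs no extra flag */
	}
}
```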

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:47 +0000 (21:37 -0700)]
xfs: scrub quota information

Perform some quick sanity testing of the disk quota information.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:46 +0000 (21:37 -0700)]
xfs: scrub realtime bitmap/summary

Perform simple tests of the realtime bitmap and summary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:46 +0000 (21:37 -0700)]
xfs: scrub directory parent pointers

Scrub parent pointers, sort of.  For directories, we can ride the
'..' entry up to the parent to confirm that there's at most one
dentry that points back to this directory.
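
Here is the back-link count. struct dirent_rec and count_backlinks() are hypothetical. The caller is assumed to have already reached the parent through the child's '..' entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical directory entry as seen when reading the parent directory. */
struct dirent_rec {
	const char *name;
	uint64_t ino;
};

/*
 * Count how many of the parent's entries, other than "." and "..",
 * point back at the child directory.
 */
static size_t count_backlinks(const struct dirent_rec *ents, size_t n,
			      uint64_t child_ino)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!strcmp(ents[i].name, ".") || !strcmp(ents[i].name, ".."))
			continue;
		if (ents[i].ino == child_ino)
			count++;
	}
	return count;
}
```

Directories cannot be hard-linked, so a count above one would indicate corruption.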

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:45 +0000 (21:37 -0700)]
xfs: scrub symbolic links

Create the infrastructure to scrub symbolic link data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:45 +0000 (21:37 -0700)]
xfs: scrub extended attributes

Scrub the hash tree, keys, and values in an extended attribute structure.
Refactor the attribute code to use the transaction if the caller supplied
one, to avoid buffer deadlocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:44 +0000 (21:37 -0700)]
xfs: scrub directory freespace

Check the free space information in a directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:44 +0000 (21:37 -0700)]
xfs: scrub directory metadata

Scrub the hash tree and all the entries in a directory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:43 +0000 (21:37 -0700)]
xfs: scrub directory/attribute btrees

Provide a way to check the shape and scrub the hashes and records
in a directory or extended attribute btree.  These are helper functions
for the directory & attribute scrubbers in subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[fengguang: remove unneeded variable to store return value]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:43 +0000 (21:37 -0700)]
xfs: scrub inode block mappings

Scrub an individual inode's block mappings to make sure they make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:42 +0000 (21:37 -0700)]
xfs: scrub inodes

Scrub the fields within an inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:41 +0000 (21:37 -0700)]
xfs: scrub refcount btrees

Plumb in the pieces necessary to check the refcount btree.  If rmap is
available, check the reference count by performing an interval query
against the rmapbt.
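
Here is a brute-force version of the cross-check. It assumes a per-block comparison is an acceptable stand-in for the interval query. The record layout and function name are hypothetical. A real implementation would query only the overlapping rmap records rather than scanning all of them for every block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reverse-mapping record: one owner of a physical range. */
struct rmap_rec {
	uint32_t startblock;
	uint32_t blockcount;
};

/*
 * Every block in [start, start + len) must be mapped by exactly
 * `refcount` rmap records for the refcount record to be consistent.
 */
static bool refcount_matches_rmap(const struct rmap_rec *rmaps, size_t n,
				  uint32_t start, uint32_t len,
				  uint32_t refcount)
{
	for (uint64_t bno = start; bno < (uint64_t)start + len; bno++) {
		uint32_t owners = 0;

		for (size_t i = 0; i < n; i++) {
			uint64_t lo = rmaps[i].startblock;
			uint64_t hi = lo + rmaps[i].blockcount;

			if (bno >= lo && bno < hi)
				owners++;
		}
		if (owners != refcount)
			return false;
	}
	return true;
}
```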

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:41 +0000 (21:37 -0700)]
xfs: scrub rmap btrees

Check the reverse mapping records to make sure that the contents
make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:40 +0000 (21:37 -0700)]
xfs: scrub inode btrees

Check the records of the inode btrees to make sure that the values
make sense given the inode records themselves.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:40 +0000 (21:37 -0700)]
xfs: scrub free space btrees

Check the extent records in the free space btrees to ensure that the values
look sane.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:39 +0000 (21:37 -0700)]
xfs: scrub the AGI

Add a forgotten check to the AGI verifier, then wire up the scrub
infrastructure to check the AGI contents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:39 +0000 (21:37 -0700)]
xfs: scrub AGF and AGFL

Check the block references in the AGF and AGFL headers to make sure
they make sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:38 +0000 (21:37 -0700)]
xfs: scrub the secondary superblocks

Ensure that the geometry presented in the backup superblocks matches
the primary superblock so that repair can recover the filesystem if
that primary gets corrupted.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:38 +0000 (21:37 -0700)]
xfs: create helpers to scan an allocation group

Add some helpers to enable us to lock an AG's headers, create btree
cursors for all btrees in that allocation group, and clean up
afterwards.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:37 +0000 (21:37 -0700)]
xfs: scrub btree keys and records

Give the btree scrubber the ability to check that the keys and
records are in the right order, and call out to our record
iterator to do the actual checking of the records.
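
The ordering check can be combined with a per-record callback. The sketch assumes keys must be strictly increasing; that rule is an assumption here, not necessarily the one every btree type uses. scrub_records() and key_nonzero() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*rec_check_fn)(uint64_t key);

/* Example per-record checker: a key of 0 is treated as invalid. */
static bool key_nonzero(uint64_t key)
{
	return key != 0;
}

/*
 * Check that consecutive keys are strictly increasing, calling the
 * optional per-record checker on each record as we go.
 */
static bool scrub_records(const uint64_t *keys, size_t n, rec_check_fn check)
{
	for (size_t i = 0; i < n; i++) {
		if (i > 0 && keys[i - 1] >= keys[i])
			return false;	/* out of order or duplicate key */
		if (check && !check(keys[i]))
			return false;	/* record-specific check failed */
	}
	return true;
}
```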

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:37 +0000 (21:37 -0700)]
xfs: scrub the shape of a metadata btree

Create a function that can check the shape of a btree -- each block
passes basic inspection and all the pointers look ok.  In the next patch
we'll add the ability to check the actual keys and records stored within
the btree.  Add some helper functions so that we report detailed scrub
errors in a uniform manner in dmesg.  These are helper functions for
subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:37 +0000 (21:37 -0700)]
xfs: create helpers to scrub a metadata btree

Create helper functions and tracepoints to deal with errors while
scrubbing a metadata btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:36 +0000 (21:37 -0700)]
xfs: create helpers to record and deal with scrub problems

Create helper functions to record CRC and corruption problems, and
deal with any other runtime errors that arise.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:36 +0000 (21:37 -0700)]
xfs: probe the scrub ioctl

Create a probe scrubber with id 0.  This will be used by xfs_scrub to
probe the kernel's abilities to scrub (and repair) the metadata.  We do
this by validating the ioctl inputs from userspace, preparing the
filesystem for a scrub (or a repair) operation, and immediately
returning to userspace.  Userspace can use the returned errno and
structure state to decide (in broad terms) if scrub/repair are
supported by the running kernel.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:35 +0000 (21:37 -0700)]
xfs: dispatch metadata scrub subcommands

Create structures needed to hold scrubbing context and dispatch incoming
commands to the individual scrubbers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:34 +0000 (21:37 -0700)]
xfs: create an ioctl to scrub AG metadata

Create an ioctl that can be used to scrub internal filesystem metadata.
The new ioctl takes the metadata type, an (optional) AG number, an
(optional) inode number and generation, and a flags argument.  This will
be used by the upcoming XFS online scrub tool.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:34 +0000 (21:37 -0700)]
xfs: create inode pointer verifiers

Create some helper functions to check that inode pointers point to
somewhere within the filesystem and not at the static AG metadata.
Move xfs_internal_inum and create a directory inode check function.
We will use these functions in scrub and elsewhere.
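
Here is an inode-number range check. It assumes an inode number packs the AG number above an in-AG inode number, which in turn packs an AG block number above a per-block inode index. struct geom, its fields, and verify_ino() are hypothetical. The static AG metadata is modelled as a fixed count of blocks at the start of each AG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filesystem geometry. */
struct geom {
	uint32_t agcount;		/* number of AGs */
	uint32_t agblocks;		/* blocks per AG */
	unsigned inopblog;		/* log2(inodes per block) */
	unsigned agblklog;		/* log2(agblocks), rounded up */
	uint32_t first_free_agbno;	/* blocks [0, this) hold AG headers */
};

/* Does ino point inside the filesystem and away from the AG headers? */
static bool verify_ino(const struct geom *g, uint64_t ino)
{
	unsigned agino_log = g->agblklog + g->inopblog;
	uint64_t agno = ino >> agino_log;
	uint64_t agino = ino & ((1ULL << agino_log) - 1);
	uint64_t agbno = agino >> g->inopblog;

	if (agno >= g->agcount)
		return false;		/* past the last AG */
	if (agbno >= g->agblocks)
		return false;		/* past the end of this AG */
	return agbno >= g->first_free_agbno;	/* not static metadata */
}
```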

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:33 +0000 (21:37 -0700)]
xfs: refactor btree block header checking functions

Refactor the btree block header checks to have an internal function that
returns the address of the failing check without logging errors.  The
scrubber will call the internal function, while the external version
will maintain the current logging behavior.
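
Here is the internal/external split. The header layout, magic value, and function names are hypothetical. The text above describes returning the address of the failing check; this sketch returns the source line instead, and only the wrapper logs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical btree block header. */
struct blk_hdr {
	uint32_t magic;
	uint16_t level;
	uint16_t numrecs;
};

#define BLK_MAGIC	0xABCD1234u	/* hypothetical magic number */
#define MAX_RECS	256		/* hypothetical per-block record limit */

/* Internal check: silent; 0 if sane, else the line of the failing test. */
static int check_block_hdr(const struct blk_hdr *h, uint16_t want_level)
{
	if (h->magic != BLK_MAGIC)
		return __LINE__;
	if (h->level != want_level)
		return __LINE__;
	if (h->numrecs > MAX_RECS)
		return __LINE__;
	return 0;
}

/* External wrapper: keeps the logging behaviour for regular callers. */
static int check_block_hdr_logged(const struct blk_hdr *h,
				  uint16_t want_level)
{
	int fa = check_block_hdr(h, want_level);

	if (fa)
		fprintf(stderr, "corrupt btree block header (check at line %d)\n",
			fa);
	return fa ? -1 : 0;
}
```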

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:33 +0000 (21:37 -0700)]
xfs: refactor btree pointer checks

Refactor the btree pointer checks so that we can call them from the
scrub code without logging errors to dmesg.  Preserve the existing error
reporting for regular operations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:32 +0000 (21:37 -0700)]
xfs: create block pointer check functions

Create some helper functions to check that a block pointer points
within the filesystem (or AG) and doesn't point at static metadata.
We will use this for scrub.
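
Here is the block-pointer check for a single AG. It assumes the AG headers occupy a fixed number of blocks at the start of the AG. struct ag_geom, verify_agbno() and verify_agbext() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-AG geometry. */
struct ag_geom {
	uint32_t agblocks;	/* blocks in this AG */
	uint32_t nr_static;	/* blocks [0, nr_static) hold the AG headers */
};

/* A single AG block pointer must be inside the AG and past the headers. */
static bool verify_agbno(const struct ag_geom *g, uint32_t agbno)
{
	return agbno >= g->nr_static && agbno < g->agblocks;
}

/* An extent must be non-empty, start validly and end inside the AG. */
static bool verify_agbext(const struct ag_geom *g, uint32_t agbno,
			  uint32_t len)
{
	/* Widen before adding so a huge len cannot wrap around. */
	return len > 0 && verify_agbno(g, agbno) &&
	       (uint64_t)agbno + len <= g->agblocks;
}
```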

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Darrick J. Wong [Wed, 18 Oct 2017 04:37:32 +0000 (21:37 -0700)]
xfs: return a distinct error code value for IGET_INCORE cache misses

For an XFS_IGET_INCORE iget operation, if the inode isn't in the cache,
return ENODATA so that we don't confuse it with the pre-existing ENOENT
cases (inode is in cache, but freed).
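
Here is the distinction on a toy cache array. struct cached_inode and iget_incore() are hypothetical; only the ENODATA/ENOENT split comes from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-core inode cache entry. */
struct cached_inode {
	uint64_t ino;
	bool freed;	/* in the cache, but the inode has been freed */
};

/*
 * In-core-only lookup: -ENODATA means "not cached, and we did not go to
 * disk", which callers can tell apart from -ENOENT, "cached but freed".
 */
static int iget_incore(const struct cached_inode *cache, size_t n,
		       uint64_t ino, const struct cached_inode **out)
{
	for (size_t i = 0; i < n; i++) {
		if (cache[i].ino != ino)
			continue;
		if (cache[i].freed)
			return -ENOENT;
		*out = &cache[i];
		return 0;
	}
	return -ENODATA;
}
```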

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>