From b246666ef792add7d5af09043297e2315b2e918a Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Fri, 11 Mar 2022 11:35:32 +0000 Subject: [PATCH] btrfs: release upper nodes when reading stale btree node from disk When reading a btree node (or leaf), at read_block_for_search(), if we can't find its extent buffer in the cache (the fs_info->buffer_radix radix tree), then we unlock all upper level nodes before reading the btree node/leaf from disk, to prevent blocking other tasks for too long. However if we find that the extent buffer is in the cache but it is not up to date, we don't unlock upper level nodes before reading it from disk, potentially blocking other tasks on upper level nodes for too long. Fix this inconsistent behaviour by unlocking upper level nodes if we need to read a node/leaf from disk because its in-memory extent buffer is not up to date. If we unlocked upper level nodes then we must return -EAGAIN to the caller, just like the case where the extent buffer is not cached in memory. And like that case, we determine if upper level nodes are locked by checking only if the parent node is locked - if it isn't, then no other upper level nodes are locked. This is actually a rare case, as if we have an extent buffer in memory, it typically has the uptodate flag set and passes all the checks done by btrfs_buffer_uptodate(). Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/ctree.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 8396079..e1e942e 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1409,12 +1409,21 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p, struct btrfs_key first_key; int ret; int parent_level; + bool unlock_up; + unlock_up = ((level + 1 < BTRFS_MAX_LEVEL) && p->locks[level + 1]); blocknr = btrfs_node_blockptr(*eb_ret, slot); gen = btrfs_node_ptr_generation(*eb_ret, slot); parent_level = btrfs_header_level(*eb_ret); btrfs_node_key_to_cpu(*eb_ret, &first_key, slot); + /* + * If we need to read an extent buffer from disk and we are holding locks + * on upper level nodes, we unlock all the upper nodes before reading the + * extent buffer, and then return -EAGAIN to the caller as it needs to + * restart the search. We don't release the lock on the current level + * because we need to walk this node to figure out which blocks to read. + */ tmp = find_extent_buffer(fs_info, blocknr); if (tmp) { if (p->reada == READA_FORWARD_ALWAYS) @@ -1436,6 +1445,9 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p, return 0; } + if (unlock_up) + btrfs_unlock_up_safe(p, level + 1); + /* now we're allowed to do a blocking uptodate check */ ret = btrfs_read_buffer(tmp, gen, parent_level - 1, &first_key); if (ret) { @@ -1443,17 +1455,14 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p, btrfs_release_path(p); return -EIO; } - *eb_ret = tmp; - return 0; + + if (unlock_up) + ret = -EAGAIN; + + goto out; } - if ((level + 1 < BTRFS_MAX_LEVEL) && p->locks[level + 1]) { - /* - * Reduce lock contention at high levels of the btree by - * dropping locks before we read. Don't release the lock - * on the current level because we need to walk this node - * to figure out which blocks to read. - */ + if (unlock_up) { btrfs_unlock_up_safe(p, level + 1); ret = -EAGAIN; } else { @@ -1478,6 +1487,7 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p, if (!extent_buffer_uptodate(tmp)) ret = -EIO; +out: if (ret == 0) { *eb_ret = tmp; } else { -- 2.7.4