From: Darrick J. Wong Date: Tue, 8 Jun 2021 16:26:44 +0000 (-0700) Subject: Merge tag 'inode-walk-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm... X-Git-Tag: accepted/tizen/unified/20230118.172025~6920^2~31 X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=ffc18582ed18f1bb16da9ec38a792c7cbc3714a1;p=platform%2Fkernel%2Flinux-rpi.git Merge tag 'inode-walk-cleanups-5.14_2021-06-03' of https://git./linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2 xfs: clean up incore inode walk functions This ambitious series aims to cleans up redundant inode walk code in xfs_icache.c, hide implementation details of the quotaoff dquot release code, and eliminates indirect function calls from incore inode walks. The first thing it does is to move all the code that quotaoff calls to release dquots from all incore inodes into xfs_icache.c. Next, it separates the goal of an inode walk from the actual radix tree tags that may or may not be involved and drops the kludgy XFS_ICI_NO_TAG thing. Finally, we split the speculative preallocation (blockgc) and quotaoff dquot release code paths into separate functions so that we can keep the implementations cohesive. Christoph suggested last cycle that we 'simply' change quotaoff not to allow deactivating quota entirely, but as these cleanups are to enable one major change in behavior (deferred inode inactivation) I do not want to add a second behavior change (quotaoff) as a dependency. To be blunt: Additional cleanups are not in scope for this series. Next, I made two observations about incore inode radix tree walks -- since there's a 1:1 mapping between the walk goal and the per-inode processing function passed in, we can use the goal to make a direct call to the processing function. Furthermore, the only caller to supply a nonzero iter_flags argument is quotaoff, and there's only one INEW flag. From that observation, I concluded that it's quite possible to remove two parameters from the xfs_inode_walk* function signatures -- the iter_flags, and the execute function pointer. The middle of the series moves the INEW functionality into the one piece (quotaoff) that wants it, and removes the indirect calls. The final observation is that the inode reclaim walk loop is now almost the same as xfs_inode_walk, so it's silly to maintain two copies. Merge the reclaim loop code into xfs_inode_walk. Lastly, refactor the per-ag radix tagging functions since there's duplicated code that can be consolidated. This series is a prerequisite for the next two patchsets, since deferred inode inactivation will add another inode radix tree tag and iterator function to xfs_inode_walk. v2: walk the vfs inode list when running quotaoff instead of the radix tree, then rework the (now completely internal) inode walk function to take the tag as the main parameter. v3: merge the reclaim loop into xfs_inode_walk, then consolidate the radix tree tagging functions v4: rebase to 5.13-rc4 v5: combine with the quotaoff patchset, reorder functions to minimize forward declarations, split inode walk goals from radix tree tags to reduce conceptual confusion v6: start moving the inode cache code towards the xfs_icwalk prefix * tag 'inode-walk-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: refactor per-AG inode tagging functions xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag xfs: pass struct xfs_eofblocks to the inode scan callback xfs: fix radix tree tag signs xfs: make the icwalk processing functions clean up the grab state xfs: clean up inode state flag tests in xfs_blockgc_igrab xfs: remove indirect calls from xfs_inode_walk{,_ag} xfs: remove iter_flags parameter from xfs_inode_walk_* xfs: move xfs_inew_wait call into xfs_dqrele_inode xfs: separate the dqrele_all inode grab logic from xfs_inode_walk_ag_grab xfs: pass the goal of the incore inode walk to xfs_inode_walk() xfs: rename xfs_inode_walk functions to xfs_icwalk xfs: move the inode walk functions further down xfs: detach inode dquots at the end of inactivation xfs: move the quotaoff dqrele inode walk into xfs_icache.c [djwong: added variable names to function declarations while fixing merge conflicts] Signed-off-by: Darrick J. Wong --- ffc18582ed18f1bb16da9ec38a792c7cbc3714a1 diff --cc fs/xfs/libxfs/xfs_ag.c index 5315e3f,c68a366..0765a0b --- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@@ -27,276 -27,6 +27,276 @@@ #include "xfs_defer.h" #include "xfs_log_format.h" #include "xfs_trans.h" +#include "xfs_trace.h" +#include "xfs_inode.h" +#include "xfs_icache.h" + + +/* + * Passive reference counting access wrappers to the perag structures. If the + * per-ag structure is to be freed, the freeing code is responsible for cleaning + * up objects with passive references before freeing the structure. This is + * things like cached buffers. + */ +struct xfs_perag * +xfs_perag_get( + struct xfs_mount *mp, + xfs_agnumber_t agno) +{ + struct xfs_perag *pag; + int ref = 0; + + rcu_read_lock(); + pag = radix_tree_lookup(&mp->m_perag_tree, agno); + if (pag) { + ASSERT(atomic_read(&pag->pag_ref) >= 0); + ref = atomic_inc_return(&pag->pag_ref); + } + rcu_read_unlock(); + trace_xfs_perag_get(mp, agno, ref, _RET_IP_); + return pag; +} + +/* + * search from @first to find the next perag with the given tag set. + */ +struct xfs_perag * +xfs_perag_get_tag( + struct xfs_mount *mp, + xfs_agnumber_t first, - int tag) ++ unsigned int tag) +{ + struct xfs_perag *pag; + int found; + int ref; + + rcu_read_lock(); + found = radix_tree_gang_lookup_tag(&mp->m_perag_tree, + (void **)&pag, first, 1, tag); + if (found <= 0) { + rcu_read_unlock(); + return NULL; + } + ref = atomic_inc_return(&pag->pag_ref); + rcu_read_unlock(); + trace_xfs_perag_get_tag(mp, pag->pag_agno, ref, _RET_IP_); + return pag; +} + +void +xfs_perag_put( + struct xfs_perag *pag) +{ + int ref; + + ASSERT(atomic_read(&pag->pag_ref) > 0); + ref = atomic_dec_return(&pag->pag_ref); + trace_xfs_perag_put(pag->pag_mount, pag->pag_agno, ref, _RET_IP_); +} + +/* + * xfs_initialize_perag_data + * + * Read in each per-ag structure so we can count up the number of + * allocated inodes, free inodes and used filesystem blocks as this + * information is no longer persistent in the superblock. Once we have + * this information, write it into the in-core superblock structure. + */ +int +xfs_initialize_perag_data( + struct xfs_mount *mp, + xfs_agnumber_t agcount) +{ + xfs_agnumber_t index; + struct xfs_perag *pag; + struct xfs_sb *sbp = &mp->m_sb; + uint64_t ifree = 0; + uint64_t ialloc = 0; + uint64_t bfree = 0; + uint64_t bfreelst = 0; + uint64_t btree = 0; + uint64_t fdblocks; + int error = 0; + + for (index = 0; index < agcount; index++) { + /* + * read the agf, then the agi. This gets us + * all the information we need and populates the + * per-ag structures for us. + */ + error = xfs_alloc_pagf_init(mp, NULL, index, 0); + if (error) + return error; + + error = xfs_ialloc_pagi_init(mp, NULL, index); + if (error) + return error; + pag = xfs_perag_get(mp, index); + ifree += pag->pagi_freecount; + ialloc += pag->pagi_count; + bfree += pag->pagf_freeblks; + bfreelst += pag->pagf_flcount; + btree += pag->pagf_btreeblks; + xfs_perag_put(pag); + } + fdblocks = bfree + bfreelst + btree; + + /* + * If the new summary counts are obviously incorrect, fail the + * mount operation because that implies the AGFs are also corrupt. + * Clear FS_COUNTERS so that we don't unmount with a dirty log, which + * will prevent xfs_repair from fixing anything. + */ + if (fdblocks > sbp->sb_dblocks || ifree > ialloc) { + xfs_alert(mp, "AGF corruption. Please run xfs_repair."); + error = -EFSCORRUPTED; + goto out; + } + + /* Overwrite incore superblock counters with just-read data */ + spin_lock(&mp->m_sb_lock); + sbp->sb_ifree = ifree; + sbp->sb_icount = ialloc; + sbp->sb_fdblocks = fdblocks; + spin_unlock(&mp->m_sb_lock); + + xfs_reinit_percpu_counters(mp); +out: + xfs_fs_mark_healthy(mp, XFS_SICK_FS_COUNTERS); + return error; +} + +STATIC void +__xfs_free_perag( + struct rcu_head *head) +{ + struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head); + + ASSERT(!delayed_work_pending(&pag->pag_blockgc_work)); + ASSERT(atomic_read(&pag->pag_ref) == 0); + kmem_free(pag); +} + +/* + * Free up the per-ag resources associated with the mount structure. + */ +void +xfs_free_perag( + struct xfs_mount *mp) +{ + struct xfs_perag *pag; + xfs_agnumber_t agno; + + for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) { + spin_lock(&mp->m_perag_lock); + pag = radix_tree_delete(&mp->m_perag_tree, agno); + spin_unlock(&mp->m_perag_lock); + ASSERT(pag); + ASSERT(atomic_read(&pag->pag_ref) == 0); + + cancel_delayed_work_sync(&pag->pag_blockgc_work); + xfs_iunlink_destroy(pag); + xfs_buf_hash_destroy(pag); + + call_rcu(&pag->rcu_head, __xfs_free_perag); + } +} + +int +xfs_initialize_perag( + struct xfs_mount *mp, + xfs_agnumber_t agcount, + xfs_agnumber_t *maxagi) +{ + struct xfs_perag *pag; + xfs_agnumber_t index; + xfs_agnumber_t first_initialised = NULLAGNUMBER; + int error; + + /* + * Walk the current per-ag tree so we don't try to initialise AGs + * that already exist (growfs case). Allocate and insert all the + * AGs we don't find ready for initialisation. + */ + for (index = 0; index < agcount; index++) { + pag = xfs_perag_get(mp, index); + if (pag) { + xfs_perag_put(pag); + continue; + } + + pag = kmem_zalloc(sizeof(*pag), KM_MAYFAIL); + if (!pag) { + error = -ENOMEM; + goto out_unwind_new_pags; + } + pag->pag_agno = index; + pag->pag_mount = mp; + + error = radix_tree_preload(GFP_NOFS); + if (error) + goto out_free_pag; + + spin_lock(&mp->m_perag_lock); + if (radix_tree_insert(&mp->m_perag_tree, index, pag)) { + WARN_ON_ONCE(1); + spin_unlock(&mp->m_perag_lock); + radix_tree_preload_end(); + error = -EEXIST; + goto out_free_pag; + } + spin_unlock(&mp->m_perag_lock); + radix_tree_preload_end(); + + /* Place kernel structure only init below this point. */ + spin_lock_init(&pag->pag_ici_lock); + spin_lock_init(&pag->pagb_lock); + spin_lock_init(&pag->pag_state_lock); + INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker); + INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC); + init_waitqueue_head(&pag->pagb_wait); + pag->pagb_count = 0; + pag->pagb_tree = RB_ROOT; + + error = xfs_buf_hash_init(pag); + if (error) + goto out_remove_pag; + + error = xfs_iunlink_init(pag); + if (error) + goto out_hash_destroy; + + /* first new pag is fully initialized */ + if (first_initialised == NULLAGNUMBER) + first_initialised = index; + } + + index = xfs_set_inode_alloc(mp, agcount); + + if (maxagi) + *maxagi = index; + + mp->m_ag_prealloc_blocks = xfs_prealloc_blocks(mp); + return 0; + +out_hash_destroy: + xfs_buf_hash_destroy(pag); +out_remove_pag: + radix_tree_delete(&mp->m_perag_tree, index); +out_free_pag: + kmem_free(pag); +out_unwind_new_pags: + /* unwind any prior newly initialized pags */ + for (index = first_initialised; index < agcount; index++) { + pag = radix_tree_delete(&mp->m_perag_tree, index); + if (!pag) + break; + xfs_buf_hash_destroy(pag); + xfs_iunlink_destroy(pag); + kmem_free(pag); + } + return error; +} static int xfs_get_aghdr_buf( diff --cc fs/xfs/libxfs/xfs_ag.h index 6006b43,4535de1..4c6f904 --- a/fs/xfs/libxfs/xfs_ag.h +++ b/fs/xfs/libxfs/xfs_ag.h @@@ -9,142 -9,6 +9,142 @@@ struct xfs_mount; struct xfs_trans; +struct xfs_perag; + +/* + * Per-ag infrastructure + */ + +/* per-AG block reservation data structures*/ +struct xfs_ag_resv { + /* number of blocks originally reserved here */ + xfs_extlen_t ar_orig_reserved; + /* number of blocks reserved here */ + xfs_extlen_t ar_reserved; + /* number of blocks originally asked for */ + xfs_extlen_t ar_asked; +}; + +/* + * Per-ag incore structure, copies of information in agf and agi, to improve the + * performance of allocation group selection. + */ +struct xfs_perag { + struct xfs_mount *pag_mount; /* owner filesystem */ + xfs_agnumber_t pag_agno; /* AG this structure belongs to */ + atomic_t pag_ref; /* perag reference count */ + char pagf_init; /* this agf's entry is initialized */ + char pagi_init; /* this agi's entry is initialized */ + char pagf_metadata; /* the agf is preferred to be metadata */ + char pagi_inodeok; /* The agi is ok for inodes */ + uint8_t pagf_levels[XFS_BTNUM_AGF]; + /* # of levels in bno & cnt btree */ + bool pagf_agflreset; /* agfl requires reset before use */ + uint32_t pagf_flcount; /* count of blocks in freelist */ + xfs_extlen_t pagf_freeblks; /* total free blocks */ + xfs_extlen_t pagf_longest; /* longest free space */ + uint32_t pagf_btreeblks; /* # of blocks held in AGF btrees */ + xfs_agino_t pagi_freecount; /* number of free inodes */ + xfs_agino_t pagi_count; /* number of allocated inodes */ + + /* + * Inode allocation search lookup optimisation. + * If the pagino matches, the search for new inodes + * doesn't need to search the near ones again straight away + */ + xfs_agino_t pagl_pagino; + xfs_agino_t pagl_leftrec; + xfs_agino_t pagl_rightrec; + + int pagb_count; /* pagb slots in use */ + uint8_t pagf_refcount_level; /* recount btree height */ + + /* Blocks reserved for all kinds of metadata. */ + struct xfs_ag_resv pag_meta_resv; + /* Blocks reserved for the reverse mapping btree. */ + struct xfs_ag_resv pag_rmapbt_resv; + + /* -- kernel only structures below this line -- */ + + /* + * Bitsets of per-ag metadata that have been checked and/or are sick. + * Callers should hold pag_state_lock before accessing this field. + */ + uint16_t pag_checked; + uint16_t pag_sick; + spinlock_t pag_state_lock; + + spinlock_t pagb_lock; /* lock for pagb_tree */ + struct rb_root pagb_tree; /* ordered tree of busy extents */ + unsigned int pagb_gen; /* generation count for pagb_tree */ + wait_queue_head_t pagb_wait; /* woken when pagb_gen changes */ + + atomic_t pagf_fstrms; /* # of filestreams active in this AG */ + + spinlock_t pag_ici_lock; /* incore inode cache lock */ + struct radix_tree_root pag_ici_root; /* incore inode cache root */ + int pag_ici_reclaimable; /* reclaimable inodes */ + unsigned long pag_ici_reclaim_cursor; /* reclaim restart point */ + + /* buffer cache index */ + spinlock_t pag_buf_lock; /* lock for pag_buf_hash */ + struct rhashtable pag_buf_hash; + + /* for rcu-safe freeing */ + struct rcu_head rcu_head; + + /* background prealloc block trimming */ + struct delayed_work pag_blockgc_work; + + /* + * Unlinked inode information. This incore information reflects + * data stored in the AGI, so callers must hold the AGI buffer lock + * or have some other means to control concurrency. + */ + struct rhashtable pagi_unlinked_hash; +}; + +int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount, + xfs_agnumber_t *maxagi); - int xfs_initialize_perag_data(struct xfs_mount *, xfs_agnumber_t); ++int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno); +void xfs_free_perag(struct xfs_mount *mp); + - struct xfs_perag *xfs_perag_get(struct xfs_mount *, xfs_agnumber_t); - struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *, xfs_agnumber_t, - int tag); - void xfs_perag_put(struct xfs_perag *pag); ++struct xfs_perag *xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno); ++struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *mp, xfs_agnumber_t agno, ++ unsigned int tag); ++void xfs_perag_put(struct xfs_perag *pag); + +/* + * Perag iteration APIs + * + * XXX: for_each_perag_range() usage really needs an iterator to clean up when + * we terminate at end_agno because we may have taken a reference to the perag + * beyond end_agno. Right now callers have to be careful to catch and clean that + * up themselves. This is not necessary for the callers of for_each_perag() and + * for_each_perag_from() because they terminate at sb_agcount where there are + * no perag structures in tree beyond end_agno. + */ +#define for_each_perag_range(mp, next_agno, end_agno, pag) \ + for ((pag) = xfs_perag_get((mp), (next_agno)); \ + (pag) != NULL && (next_agno) <= (end_agno); \ + (next_agno) = (pag)->pag_agno + 1, \ + xfs_perag_put(pag), \ + (pag) = xfs_perag_get((mp), (next_agno))) + +#define for_each_perag_from(mp, next_agno, pag) \ + for_each_perag_range((mp), (next_agno), (mp)->m_sb.sb_agcount, (pag)) + + +#define for_each_perag(mp, agno, pag) \ + (agno) = 0; \ + for_each_perag_from((mp), (agno), (pag)) + +#define for_each_perag_tag(mp, agno, pag, tag) \ + for ((agno) = 0, (pag) = xfs_perag_get_tag((mp), 0, (tag)); \ + (pag) != NULL; \ + (agno) = (pag)->pag_agno + 1, \ + xfs_perag_put(pag), \ + (pag) = xfs_perag_get_tag((mp), (agno), (tag))) struct aghdr_init_data { /* per ag data */