xfs: shutdown during log recovery needs to mark the log shutdown
author Dave Chinner <dchinner@redhat.com>
Wed, 30 Mar 2022 01:22:02 +0000 (18:22 -0700)
committer Darrick J. Wong <djwong@kernel.org>
Wed, 30 Mar 2022 01:22:02 +0000 (18:22 -0700)
When a checkpoint writeback is run by log recovery, corruption
propagated from the log can result in writeback verifiers failing
and calling xfs_force_shutdown() from
xfs_buf_delwri_submit_buffers().

This results in the mount being marked as shut down, but the log does
not get marked as shut down because:

        /*
         * If this happens during log recovery then we aren't using the runtime
         * log mechanisms yet so there's nothing to shut down.
         */
        if (!log || xlog_in_recovery(log))
                return false;
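A minimal, hypothetical C model of the resulting state mismatch may help. The types and functions below (model_log, model_mount, old_log_force_shutdown, model_first_shutdown) are invented for illustration and only approximate the real XFS code: in this sketch the first shutdown marks the mount shut down, but the pre-fix guard returns early while recovery is in progress, so the log is never marked.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the pre-fix logic. These types and
 * functions are illustrative stand-ins, not real XFS code.
 */
struct model_log {
	bool in_recovery; /* stands in for xlog_in_recovery(log) */
	bool shut_down;   /* stands in for the log's shutdown state */
};

struct model_mount {
	bool shut_down;        /* mount-level shutdown state */
	struct model_log *log;
};

/* Pre-fix guard: return early during recovery without marking the log. */
bool old_log_force_shutdown(struct model_log *log)
{
	if (!log || log->in_recovery)
		return false;
	log->shut_down = true;
	return true;
}

/* First shutdown: the mount is marked shut down unconditionally. */
void model_first_shutdown(struct model_mount *mp)
{
	mp->shut_down = true;
	(void)old_log_force_shutdown(mp->log);
}
```

Outside recovery both flags end up set; during recovery only the mount flag does, which is the inconsistency the rest of this message describes.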

If other buffers then fail (for example, because they detect the
mount shutdown), the resulting shutdown calls hang in
xfs_do_force_shutdown() waiting for the log to shut down, like this:

  __schedule+0x30d/0x9e0
  schedule+0x55/0xd0
  xfs_do_force_shutdown+0x1cd/0x200
  ? init_wait_var_entry+0x50/0x50
  xfs_buf_ioend+0x47e/0x530
  __xfs_buf_submit+0xb0/0x240
  xfs_buf_delwri_submit_buffers+0xfe/0x270
  xfs_buf_delwri_submit+0x3a/0xc0
  xlog_do_recovery_pass+0x474/0x7b0
  ? do_raw_spin_unlock+0x30/0xb0
  xlog_do_log_recovery+0x91/0x140
  xlog_do_recover+0x38/0x1e0
  xlog_recover+0xdd/0x170
  xfs_log_mount+0x17e/0x2e0
  xfs_mountfs+0x457/0x930
  xfs_fs_fill_super+0x476/0x830

xlog_force_shutdown() always needs to mark the log as shut down,
regardless of whether recovery is in progress, so that repeated
calls to xfs_force_shutdown() during recovery don't end up waiting
for the log to be shut down like this.
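The fixed logic, and why a second shutdown call no longer hangs, can be sketched with a similar hypothetical model. All names are invented for illustration, and the sleep in xfs_do_force_shutdown() is reduced to a check: a later caller either finds the log already marked shut down (SHUTDOWN_WAITED) or would wait forever (SHUTDOWN_WOULD_HANG). The model also reflects the patch's choice to skip the log force during recovery or after a log IO error.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the post-fix logic; these types and
 * functions are illustrative stand-ins, not real XFS code.
 */
struct model_log {
	bool in_recovery; /* stands in for xlog_in_recovery(log) */
	bool shut_down;   /* stands in for the log's shutdown state */
	bool forced;      /* set if a (modelled) log force was issued */
};

struct model_mount {
	bool shut_down;        /* mount-level shutdown state */
	struct model_log *log;
};

enum shutdown_result {
	SHUTDOWN_FIRST,      /* this call performed the shutdown */
	SHUTDOWN_WAITED,     /* later call: log already shut down, wait ends */
	SHUTDOWN_WOULD_HANG, /* later call: log never marked, wait never ends */
};

/* Post-fix: always mark the log; only force it outside recovery. */
bool fixed_log_force_shutdown(struct model_log *log, bool log_error)
{
	if (!log)
		return false;
	if (!log_error && !log->in_recovery)
		log->forced = true;
	log->shut_down = true;
	return true;
}

/*
 * A later caller waits for the log to be marked shut down. Rather than
 * sleeping, report whether that wait could ever finish in this model.
 */
enum shutdown_result model_force_shutdown(struct model_mount *mp, bool log_error)
{
	if (mp->shut_down) {
		if (mp->log && !mp->log->shut_down)
			return SHUTDOWN_WOULD_HANG;
		return SHUTDOWN_WAITED;
	}
	mp->shut_down = true;
	(void)fixed_log_force_shutdown(mp->log, log_error);
	return SHUTDOWN_FIRST;
}
```

Replacing fixed_log_force_shutdown() with a variant that returns early during recovery makes the second call in recovery return SHUTDOWN_WOULD_HANG, which corresponds to the hang in the trace above.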

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
fs/xfs/xfs_log.c

index 678ca01..fc893d3 100644
@@ -3847,11 +3847,7 @@ xlog_force_shutdown(
 {
        bool            log_error = (shutdown_flags & SHUTDOWN_LOG_IO_ERROR);
 
-       /*
-        * If this happens during log recovery then we aren't using the runtime
-        * log mechanisms yet so there's nothing to shut down.
-        */
-       if (!log || xlog_in_recovery(log))
+       if (!log)
                return false;
 
        /*
@@ -3860,10 +3856,16 @@ xlog_force_shutdown(
         * before the force will prevent the log force from flushing the iclogs
         * to disk.
         *
-        * Re-entry due to a log IO error shutdown during the log force is
-        * prevented by the atomicity of higher level shutdown code.
+        * When we are in recovery, there are no transactions to flush, and
+        * we don't want to touch the log because we don't want to perturb the
+        * current head/tail for future recovery attempts. Hence we need to
+        * avoid a log force in this case.
+        *
+        * If we are shutting down due to a log IO error, then we must avoid
+        * trying to write the log as that may just result in more IO errors and
+        * an endless shutdown/force loop.
         */
-       if (!log_error)
+       if (!log_error && !xlog_in_recovery(log))
                xfs_log_force(log->l_mp, XFS_LOG_SYNC);
 
        /*