md/raid5: release batch_last before waiting for another stripe_head
authorDavid Jeffery <djeffery@redhat.com>
Mon, 2 Oct 2023 18:32:29 +0000 (14:32 -0400)
committerSong Liu <song@kernel.org>
Tue, 3 Oct 2023 15:53:09 +0000 (08:53 -0700)
When raid5_get_active_stripe is called with a ctx containing a stripe_head in
its batch_last pointer, it can cause a deadlock if the task sleeps waiting on
another stripe_head to become available. The stripe_head held by batch_last
can be blocking the advancement of other stripe_heads, leading to no
stripe_heads being released so raid5_get_active_stripe waits forever.

Like with the quiesce state handling earlier in the function, batch_last
needs to be released by raid5_get_active_stripe before it waits for another
stripe_head.

Fixes: 3312e6c887fe ("md/raid5: Keep a reference to last stripe_head for batch")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231002183422.13047-1-djeffery@redhat.com
drivers/md/raid5.c

index 4cb9c60..284cd71 100644 (file)
@@ -854,6 +854,13 @@ struct stripe_head *raid5_get_active_stripe(struct r5conf *conf,
 
                set_bit(R5_INACTIVE_BLOCKED, &conf->cache_state);
                r5l_wake_reclaim(conf->log, 0);
+
+               /* release batch_last before wait to avoid risk of deadlock */
+               if (ctx && ctx->batch_last) {
+                       raid5_release_stripe(ctx->batch_last);
+                       ctx->batch_last = NULL;
+               }
+
                wait_event_lock_irq(conf->wait_for_stripe,
                                    is_inactive_blocked(conf, hash),
                                    *(conf->hash_locks + hash));