md/raid10: fix task hung in raid10d
authorLi Nan <linan122@huawei.com>
Wed, 22 Feb 2023 04:09:59 +0000 (12:09 +0800)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 11 May 2023 14:03:23 +0000 (23:03 +0900)
commit9050576bff9f900c44791c4cded31947094d2952
tree9fcca3f20a46fe547edc97032cbaa20eee2846fa
parentdf6222b01f5307711778cb54b36e66014a6d740a
md/raid10: fix task hung in raid10d

[ Upstream commit 72c215ed8731c88b2d7e09afc51fffc207ae47b8 ]

commit fe630de009d0 ("md/raid10: avoid deadlock on recovery.") allowed
normal io and sync io to exist at the same time. Task hung will occur as
below:

T1                      T2 T3 T4
raid10d
 handle_read_error
  allow_barrier
   conf->nr_pending--
    -> 0
                        //submit sync io
                        raid10_sync_request
                         raise_barrier
  ->will not be blocked
  ...
//submit to drivers
  raid10_read_request
   wait_barrier
    conf->nr_pending++
     -> 1
//retry read fail
raid10_end_read_request
 reschedule_retry
  add to retry_list
  conf->nr_queued++
   -> 1
//sync io fail
end_sync_read
 __end_sync_read
  reschedule_retry
   add to retry_list
                    conf->nr_queued++
     -> 2
 ...
 handle_read_error
 get form retry_list
 conf->nr_queued--
  freeze_array
   wait nr_pending == nr_queued+1
        ->1       ->2
   //task hung

retry read and sync io will be added to retry_list(nr_queued->2) if they
fails. raid10d() called handle_read_error() and hung in freeze_array().
nr_queued will not decrease because raid10d is blocked, nr_pending will
not increase because conf->barrier is not released.

Fix it by moving allow_barrier() after raid10_read_request().
raise_barrier() will wait for nr_waiting to become 0. Therefore, sync io
and regular io will not be issued at the same time.

Also remove the check of nr_queued in stop_waiting_barrier. It can be 0
but don't need to be blocking. Remove the check for MD_RECOVERY_RUNNING as
the check is redundent.

Fixes: fe630de009d0 ("md/raid10: avoid deadlock on recovery.")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230222041000.3341651-2-linan666@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/md/raid10.c