drbd: fix for possible deadlock on IO error during resync
author Lars Ellenberg <lars.ellenberg@linbit.com>
Tue, 14 Sep 2010 18:26:27 +0000 (20:26 +0200)
committer Philipp Reisner <philipp.reisner@linbit.com>
Thu, 14 Oct 2010 16:38:50 +0000 (18:38 +0200)
commit e9e6f3ec535d7b7c9e2ca64ad691e743e7d3c2f0
tree cbc17d81b9d937b4fc515548f30f5ed00be193ee
parent 22cc37a943832c948808884604ec6f5ff2594c1d

Scenario:

Something (say, flush-147:0) is in drbd_al_begin_io,
holding a local_cnt, waiting for the resync to make progress.

Disk fails, worker in after_state_ch does drbd_rs_cancel_all,
then waits for local_cnt to drop to zero.

flush-147:0 is woken by drbd_rs_cancel_all, needs to write an AL
transaction, and queues that on the worker.

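Deadlock: the worker cannot process the AL transaction, because it is
itself waiting for the local_cnt that flush-147:0 will only release
once that transaction has been written.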

Fix: do not wait in the worker. Instead, have put_ldev() trigger the
state change D_FAILED -> D_DISKLESS when necessary.
put_ldev() cannot do the state change directly, as it may or may not
already hold various spinlocks, so it queues a short work item instead.
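
The deferral described above can be sketched in plain userspace C. This is
only an illustration of the pattern, not the code from this commit: the
sim_* names, the toy work queue and the go_diskless_queued flag are all
hypothetical, and only local_cnt, D_FAILED and D_DISKLESS come from the
message itself. The point it shows is that dropping the last reference never
changes the state in place; it only queues work that the worker runs later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the disk states named in the commit message. */
enum disk_state { D_DISKLESS, D_FAILED, D_UP_TO_DATE };

struct sim_device;
typedef void (*work_fn)(struct sim_device *dev);

/* Hypothetical model of a device; not the real struct from drbd_int.h. */
struct sim_device {
	int local_cnt;            /* references held on the local disk */
	enum disk_state disk;
	bool go_diskless_queued;  /* queue the state-change work only once */
	work_fn queue[4];         /* toy stand-in for the worker's queue */
	size_t queued;
};

static void sim_queue_work(struct sim_device *dev, work_fn fn)
{
	assert(dev->queued < 4);
	dev->queue[dev->queued++] = fn;
}

/* Runs later, in worker context, where the caller's locks are not held. */
static void sim_go_diskless(struct sim_device *dev)
{
	if (dev->disk == D_FAILED && dev->local_cnt == 0)
		dev->disk = D_DISKLESS;
}

/*
 * Drop one reference. If it was the last one and the disk has failed,
 * do not change state here (the caller may hold spinlocks); queue a
 * short work item instead.
 */
static void sim_put_ldev(struct sim_device *dev)
{
	assert(dev->local_cnt > 0);
	if (--dev->local_cnt == 0 && dev->disk == D_FAILED &&
	    !dev->go_diskless_queued) {
		dev->go_diskless_queued = true;
		sim_queue_work(dev, sim_go_diskless);
	}
}

/* The worker drains its queue; it never blocks waiting for local_cnt. */
static void sim_run_worker(struct sim_device *dev)
{
	for (size_t i = 0; i < dev->queued; i++)
		dev->queue[i](dev);
	dev->queued = 0;
}
```

In this sketch, a device that fails while one reference is still held stays
in D_FAILED until the holder calls sim_put_ldev(); the next sim_run_worker()
pass then moves it to D_DISKLESS. Because the worker never sleeps on
local_cnt, it stays free to process other queued work in the meantime.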

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
drivers/block/drbd/drbd_int.h
drivers/block/drbd/drbd_main.c