drbd: fix potential deadlock when trying to detach during handshake
authorLars Ellenberg <lars.ellenberg@linbit.com>
Tue, 29 Aug 2017 08:20:43 +0000 (10:20 +0200)
committerJens Axboe <axboe@kernel.dk>
Tue, 29 Aug 2017 21:34:45 +0000 (15:34 -0600)
commit33d32fa7120ed184efc9be1ea3c016109b4fea84
tree30c0fdf900ecdfbf51207795eebc0d985a06effe
parent427fd2bee0a33a670de186387e79d280a6808a66
drbd: fix potential deadlock when trying to detach during handshake

When requesting a detach, we first suspend IO, and also inhibit meta-data IO
by means of drbd_md_get_buffer(), because we don't want to "fail" the disk
while there is IO in-flight: the transition into D_FAILED for detach purposes
may get misinterpreted as actual IO error in a confused endio function.

We wrap it all into wait_event(), to retry in case the drbd_req_state()
returns SS_IN_TRANSIENT_STATE, as it does for example during an ongoing
connection handshake.

In that example, the receiver thread may need to grab drbd_md_get_buffer()
during the handshake to make progress.  To avoid potential deadlock with
detach, detach needs to grab and release the meta data buffer inside of
that wait_event retry loop. To avoid lock inversion between
mutex_lock(&device->state_mutex) and drbd_md_get_buffer(device),
introduce a new enum chg_state_flag CS_INHIBIT_MD_IO, and move the
call to drbd_md_get_buffer() inside the state_mutex grabbed in
drbd_req_state().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
drivers/block/drbd/drbd_nl.c
drivers/block/drbd/drbd_state.c
drivers/block/drbd/drbd_state.h