drbd: fix potential spinlock deadlock
author Lars Ellenberg <lars.ellenberg@linbit.com>
Wed, 2 Nov 2011 15:29:45 +0000 (16:29 +0100)
committer Philipp Reisner <philipp.reisner@linbit.com>
Wed, 9 May 2012 13:15:58 +0000 (15:15 +0200)
drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks
left to be resynced (rs_left) in the current resync extent.
If it detects a mismatch, it complains, and forces a disconnect using
drbd_force_state(mdev, NS(conn, C_DISCONNECTING));

Unfortunately, this may be called while holding the req_lock,
and drbd_force_state() wants to acquire that lock itself. Deadlock.
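
As an illustration only, the self-deadlock can be modelled in user space with a
non-recursive test-and-set lock. This is a minimal sketch, not DRBD code: every
name in it (req_lock_sketch, lock_bounded, unlock_sketch, force_state_sketch,
force_state_with_lock_held) is hypothetical, and the spin is bounded so the
example terminates, whereas a real spinlock would spin forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for req_lock: a non-recursive test-and-set lock. */
static atomic_flag req_lock_sketch = ATOMIC_FLAG_INIT;

/* Spin at most max_spins times; a real spinlock would spin without bound. */
static bool lock_bounded(atomic_flag *lock, unsigned int max_spins)
{
	while (max_spins--) {
		if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
			return true;	/* lock acquired */
	}
	return false;	/* gave up: the holder never released the lock */
}

static void unlock_sketch(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Stand-in for drbd_force_state(): it takes the lock itself. */
static bool force_state_sketch(void)
{
	if (!lock_bounded(&req_lock_sketch, 1000))
		return false;
	unlock_sketch(&req_lock_sketch);
	return true;
}

/* Stand-in for a caller that already holds the lock. */
static bool force_state_with_lock_held(void)
{
	bool ok;

	if (!lock_bounded(&req_lock_sketch, 1000))
		return false;
	ok = force_state_sketch();	/* spins on the lock we hold ourselves */
	unlock_sketch(&req_lock_sketch);
	return ok;
}
```

Called without the lock held, force_state_sketch() succeeds; called through
force_state_with_lock_held(), it can never acquire the lock, which is the
situation described above.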

Don't force a disconnect, but fix up rs_left by recounting and
reassigning the number of dirty blocks in that extent.
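
The fix-up idea can be sketched in isolation as follows. This is a simplified
user-space model under stated assumptions, not the driver code: struct
extent_sketch, EXTENT_WORDS, extent_weight() and account_blocks() are
hypothetical names, with extent_weight() playing the role that
drbd_bm_e_weight() plays in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define EXTENT_WORDS 4	/* hypothetical extent size: 4 x 64 bits */

struct extent_sketch {
	uint64_t bits[EXTENT_WORDS];	/* dirty bits of this extent */
	int rs_left;			/* cached number of dirty bits */
	int rs_failed;			/* blocks whose resync failed */
};

/* Recount the set bits; stands in for drbd_bm_e_weight(). */
static int extent_weight(const struct extent_sketch *ext)
{
	int weight = 0;

	for (size_t i = 0; i < EXTENT_WORDS; i++) {
		uint64_t w = ext->bits[i];

		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			weight++;
		}
	}
	return weight;
}

/*
 * Account for `count` blocks. On a mismatch, refresh the cached rs_left
 * from the bitmap instead of forcing a state change. Returns 1 if a
 * fix-up was needed, 0 otherwise.
 */
static int account_blocks(struct extent_sketch *ext, int count, int success)
{
	if (success)
		ext->rs_left -= count;
	else
		ext->rs_failed += count;

	if (ext->rs_left < ext->rs_failed) {
		ext->rs_left = extent_weight(ext);
		return 1;
	}
	return 0;
}
```

Because the recount only reads the bitmap and writes the cached counter, it
needs no state change and therefore no second acquisition of the lock.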

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
drivers/block/drbd/drbd_actlog.c

index 3d7c215..601ad9e 100644
@@ -711,16 +711,20 @@ static void drbd_try_clear_on_disk_bm(struct drbd_conf *mdev, sector_t sector,
                        else
                                ext->rs_failed += count;
                        if (ext->rs_left < ext->rs_failed) {
-                               dev_err(DEV, "BAD! sector=%llus enr=%u rs_left=%d "
-                                   "rs_failed=%d count=%d\n",
+                               dev_warn(DEV, "BAD! sector=%llus enr=%u rs_left=%d "
+                                   "rs_failed=%d count=%d cstate=%s\n",
                                     (unsigned long long)sector,
                                     ext->lce.lc_number, ext->rs_left,
-                                    ext->rs_failed, count);
-                               dump_stack();
-
-                               lc_put(mdev->resync, &ext->lce);
-                               drbd_force_state(mdev, NS(conn, C_DISCONNECTING));
-                               return;
+                                    ext->rs_failed, count,
+                                    drbd_conn_str(mdev->state.conn));
+
+                               /* We don't expect to be able to clear more bits
+                                * than have been set when we originally counted
+                                * the set bits to cache that value in ext->rs_left.
+                                * Whatever the reason (disconnect during resync,
+                                * delayed local completion of an application write),
+                                * try to fix it up by recounting here. */
+                               ext->rs_left = drbd_bm_e_weight(mdev, enr);
                        }
                } else {
                        /* Normally this element should be in the cache,