rbd: cancel lock_dwork if the wait is interrupted
authorDongsheng Yang <dongsheng.yang@easystack.cn>
Fri, 27 Sep 2019 15:33:22 +0000 (15:33 +0000)
committerIlya Dryomov <idryomov@gmail.com>
Tue, 15 Oct 2019 15:43:15 +0000 (17:43 +0200)
There is a warning message in my test with below steps:

  # rbd bench --io-type write --io-size 4K --io-threads 1 --io-pattern rand test &
  # sleep 5
  # pkill -9 rbd
  # rbd map test &
  # sleep 5
  # pkill rbd

The reason is that the rbd_add_acquire_lock() is interruptable,
that means, when we kill the waiting on ->acquire_wait, the lock_dwork
could be still running.

1. do_rbd_add() 2. lock_dwork
rbd_add_acquire_lock()
  - queue_delayed_work()
lock_dwork queued
    - wait_for_completion_killable_timeout()  <-- kill happen
rbd_dev_image_unlock() <-- UNLOCKED now, nothing to do.
rbd_dev_device_release()
rbd_dev_image_release()
  - ...
lock successed here
     - cancel_delayed_work_sync(&rbd_dev->lock_dwork)

Then when we reach the rbd_dev_free(), WARN_ON is triggered because
lock_state is not RBD_LOCK_STATE_UNLOCKED.

To fix it, this commit make sure the lock_dwork was finished before
calling rbd_dev_image_unlock().

On the other hand, this would not happend in do_rbd_remove(), because
after rbd mapped, lock_dwork will only be queued for IO request, and
request will continue unless lock_dwork finished. when we call
rbd_dev_image_unlock() in do_rbd_remove(), all requests are done.
That means, lock_state should not be locked again after
rbd_dev_image_unlock().

[ Cancel lock_dwork in rbd_add_acquire_lock(), only if the wait is
  interrupted. ]

Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
drivers/block/rbd.c

index 7c4350c0fb77e6ba5366bf4455ecb8f410484499..39136675dae5b381241a3ae2acce1cd93e2cfa39 100644 (file)
@@ -6639,10 +6639,13 @@ static int rbd_add_acquire_lock(struct rbd_device *rbd_dev)
        queue_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0);
        ret = wait_for_completion_killable_timeout(&rbd_dev->acquire_wait,
                            ceph_timeout_jiffies(rbd_dev->opts->lock_timeout));
-       if (ret > 0)
+       if (ret > 0) {
                ret = rbd_dev->acquire_err;
-       else if (!ret)
-               ret = -ETIMEDOUT;
+       } else {
+               cancel_delayed_work_sync(&rbd_dev->lock_dwork);
+               if (!ret)
+                       ret = -ETIMEDOUT;
+       }
 
        if (ret) {
                rbd_warn(rbd_dev, "failed to acquire exclusive lock: %ld", ret);