ocfs2/dlm: continue to purge recovery lockres when recovery master goes down
authorpiaojun <piaojun@huawei.com>
Tue, 2 Aug 2016 21:02:19 +0000 (14:02 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Tue, 2 Aug 2016 21:31:41 +0000 (17:31 -0400)
commitee8f7fcbe638b07e8d1c3dc98e8be35e56306d05
tree25b25d9131e4a07e700302c01a5f3924d9c8ae2b
parent309e91911daede6adde0364f489e69909c3f6894
ocfs2/dlm: continue to purge recovery lockres when recovery master goes down

We found a dlm-blocked situation caused by continuous breakdown of
recovery masters described below.  To solve this problem, we should
purge recovery lock once detecting recovery master goes down.

N3                      N2                   N1(reco master)
                        go down
                                             pick up recovery lock and
                                             begin recoverying for N2

                                             go down

pick up recovery
lock failed, then
purge it:
dlm_purge_lockres
  ->DROPPING_REF is set

send deref to N1 failed,
recovery lock is not purged

find N1 go down, begin
recoverying for N1, but
blocked in dlm_do_recovery
as DROPPING_REF is set:
dlm_do_recovery
  ->dlm_pick_recovery_master
    ->dlmlock
      ->dlm_get_lock_resource
        ->__dlm_wait_on_lockres_flags(tmpres,
   DLM_LOCK_RES_DROPPING_REF);

Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/ocfs2/dlm/dlmcommon.h
fs/ocfs2/dlm/dlmmaster.c
fs/ocfs2/dlm/dlmrecovery.c
fs/ocfs2/dlm/dlmthread.c