ceph: take snap_empty_lock atomically with snaprealm refcount change
author Jeff Layton <jlayton@kernel.org>
Tue, 3 Aug 2021 16:47:34 +0000 (12:47 -0400)
committer Ilya Dryomov <idryomov@gmail.com>
Wed, 4 Aug 2021 17:20:29 +0000 (19:20 +0200)
commit 8434ffe71c874b9c4e184b88d25de98c2bf5fe3f
tree 5ef46afe4558aa95b07363f4eb1f19dce90b8b46
parent bf2ba432213fade50dd39f2e348085b758c0726e

There is a race in ceph_put_snap_realm. The change to the nref and the
spinlock acquisition are not done atomically, so you could decrement
nref, and before you take the spinlock, the nref is incremented again.
At that point, you end up putting it on the empty list when it
shouldn't be there. Eventually __cleanup_empty_realms runs and frees
it while it is still in use.

Fix this by protecting the 1->0 transition with atomic_dec_and_lock,
and just drop the spinlock if we can get the rwsem.
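
The 1->0 side of the fix can be illustrated with a userspace sketch. This is
not the kernel code: struct realm, empty_lock, dec_and_lock() and realm_put()
are hypothetical stand-ins, a pthread mutex replaces the spinlock, a boolean
replaces the empty list, and the rwsem trylock path is left out. It only
shows the idea behind atomic_dec_and_lock: the decrement that reaches zero
happens with the lock already held.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a snap realm; not the kernel structure. */
struct realm {
    atomic_int nref;
    bool on_empty_list;     /* stands in for membership on the empty list */
};

static pthread_mutex_t empty_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Userspace analog of atomic_dec_and_lock(): decrement locklessly while
 * the count stays above zero. If the decrement might reach zero, take
 * the lock first. Returns true, with the lock held, only when the count
 * actually dropped to zero.
 */
static bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
    int old = atomic_load(cnt);

    /* Fast path: a failed compare-exchange reloads 'old' and retries. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return false;
    }

    /* Slow path: perform the possible 1->0 transition under the lock. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(cnt, 1) == 1)
        return true;        /* caller now owns the lock */
    pthread_mutex_unlock(lock);
    return false;
}

static void realm_put(struct realm *r)
{
    if (!dec_and_lock(&r->nref, &empty_lock))
        return;

    /* No one can re-take a reference from zero without empty_lock. */
    r->on_empty_list = true;
    pthread_mutex_unlock(&empty_lock);
}
```

Because the zero count and the list insertion are observed under the same
lock, a concurrent getter cannot slip in between them, provided the getter
also takes the lock for its 0->1 transition.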

Because these objects can also undergo a 0->1 refcount transition, we
must protect that change as well with the spinlock. Increment locklessly
unless the value is at 0, in which case we take the spinlock, increment
and then take it off the empty list if it did the 0->1 transition.
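
The get side described above might look roughly like this in userspace. As
before, this is only a sketch under assumptions: struct realm, empty_lock,
inc_not_zero() and realm_get() are hypothetical names, a pthread mutex stands
in for the spinlock, and a boolean stands in for the empty list.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a snap realm; not the kernel structure. */
struct realm {
    atomic_int nref;
    bool on_empty_list;     /* stands in for membership on the empty list */
};

static pthread_mutex_t empty_lock = PTHREAD_MUTEX_INITIALIZER;

/* Increment unless the count is zero; returns true if it incremented. */
static bool inc_not_zero(atomic_int *cnt)
{
    int old = atomic_load(cnt);

    /* A failed compare-exchange reloads 'old', so zero is re-checked. */
    while (old != 0) {
        if (atomic_compare_exchange_weak(cnt, &old, old + 1))
            return true;
    }
    return false;
}

static void realm_get(struct realm *r)
{
    /* Common case: the realm is already referenced, no lock needed. */
    if (inc_not_zero(&r->nref))
        return;

    /* Count was zero: serialize the 0->1 transition with the put side. */
    pthread_mutex_lock(&empty_lock);
    if (atomic_fetch_add(&r->nref, 1) == 0)
        r->on_empty_list = false;   /* we revived it, take it off the list */
    pthread_mutex_unlock(&empty_lock);
}
```

The second check under the lock matters: another thread may have done the
0->1 transition between the failed lockless attempt and the lock
acquisition, in which case only the count is bumped.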

With these changes, I'm removing the dout() messages from these
functions, as well as in __put_snap_realm. They've always been racy, and
it's better to not print values that may be misleading.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46419
Reported-by: Mark Nelson <mnelson@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
fs/ceph/snap.c