ceph: take snap_empty_lock atomically with snaprealm refcount change
authorJeff Layton <jlayton@kernel.org>
Tue, 3 Aug 2021 16:47:34 +0000 (12:47 -0400)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Wed, 18 Aug 2021 06:59:18 +0000 (08:59 +0200)
commit2fe07584a6236d22be17f3866c4c45e0a3058d2a
treeeddb739e0a1aeea861e0effa6d0e8f4ffa3715ff
parenta23aced54c2c9053a0956fc1a8a6d7c0a0fff96f
ceph: take snap_empty_lock atomically with snaprealm refcount change

commit 8434ffe71c874b9c4e184b88d25de98c2bf5fe3f upstream.

There is a race in ceph_put_snap_realm. The change to the nref and the
spinlock acquisition are not done atomically, so you could decrement
nref, and before you take the spinlock, the nref is incremented again.
At that point, you end up putting it on the empty list when it
shouldn't be there. Eventually __cleanup_empty_realms runs and frees
it when it's still in-use.

Fix this by protecting the 1->0 transition with atomic_dec_and_lock,
and just drop the spinlock if we can get the rwsem.

Because these objects can also undergo a 0->1 refcount transition, we
must protect that change as well with the spinlock. Increment locklessly
unless the value is at 0, in which case we take the spinlock, increment
and then take it off the empty list if it did the 0->1 transition.

With these changes, I'm removing the dout() messages from these
functions, as well as in __put_snap_realm. They've always been racy, and
it's better to not print values that may be misleading.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46419
Reported-by: Mark Nelson <mnelson@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fs/ceph/snap.c