rename(): fix the locking of subdirectories

author Al Viro <viro@zeniv.linux.org.uk>

Mon, 20 Nov 2023 01:25:58 +0000 (20:25 -0500)

committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Thu, 1 Feb 2024 00:18:57 +0000 (16:18 -0800)
author Al Viro <viro@zeniv.linux.org.uk>
Mon, 20 Nov 2023 01:25:58 +0000 (20:25 -0500)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 1 Feb 2024 00:18:57 +0000 (16:18 -0800)
diff --git a/Documentation/filesystems/directory-locking.rst b/Documentation/filesystems/directory-locking.rst

index dccd61c..193c226 100644 (file)
--- a/Documentation/filesystems/directory-locking.rst
+++ b/Documentation/filesystems/directory-locking.rst
@@ -22,13 +22,16 @@ exclusive.
  3) object removal.  Locking rules: caller locks parent, finds victim,
  locks victim and calls the method.  Locks are exclusive.
  
-4) rename() that is _not_ cross-directory.  Locking rules: caller locks the
-parent and finds source and target.  We lock both (provided they exist).  If we
-need to lock two inodes of different type (dir vs non-dir), we lock directory
-first.  If we need to lock two inodes of the same type, lock them in inode
-pointer order.  Then call the method.  All locks are exclusive.
-NB: we might get away with locking the source (and target in exchange
-case) shared.
+4) rename() that is _not_ cross-directory.  Locking rules: caller locks
+the parent and finds source and target.  Then we decide which of the
+source and target need to be locked.  Source needs to be locked if it's a
+non-directory; target - if it's a non-directory or about to be removed.
+Take the locks that need to be taken, in inode pointer order if need
+to take both (that can happen only when both source and target are
+non-directories - the source because it wouldn't be locked otherwise
+and the target because mixing directory and non-directory is allowed
+only with RENAME_EXCHANGE, and that won't be removing the target).
+After the locks had been taken, call the method.  All locks are exclusive.
  
  5) link creation.  Locking rules:
  
@@ -44,20 +47,17 @@ rules:
  
         * lock the filesystem
         * lock parents in "ancestors first" order. If one is not ancestor of
-         the other, lock them in inode pointer order.
+         the other, lock the parent of source first.
         * find source and target.
         * if old parent is equal to or is a descendent of target
           fail with -ENOTEMPTY
         * if new parent is equal to or is a descendent of source
           fail with -ELOOP
-       * Lock both the source and the target provided they exist. If we
-         need to lock two inodes of different type (dir vs non-dir), we lock
-         the directory first. If we need to lock two inodes of the same type,
-         lock them in inode pointer order.
+       * Lock subdirectories involved (source before target).
+       * Lock non-directories involved, in inode pointer order.
         * call the method.
  
-All ->i_rwsem are taken exclusive.  Again, we might get away with locking
-the source (and target in exchange case) shared.
+All ->i_rwsem are taken exclusive.
  
  The rules above obviously guarantee that all directories that are going to be
  read, modified or removed by method will be locked by caller.
@@ -67,6 +67,7 @@ If no directory is its own ancestor, the scheme above is deadlock-free.
  
  Proof:
  
+[XXX: will be updated once we are done massaging the lock_rename()]
         First of all, at any moment we have a linear ordering of the
         objects - A < B iff (A is an ancestor of B) or (B is not an ancestor
          of A and ptr(A) < ptr(B)).
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst

index 7be2900..bd12f2f 100644 (file)
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -101,7 +101,7 @@ symlink:    exclusive
  mkdir:         exclusive
  unlink:                exclusive (both)
  rmdir:         exclusive (both)(see below)
-rename:                exclusive (all) (see below)
+rename:                exclusive (both parents, some children) (see below)
  readlink:      no
  get_link:      no
  setattr:       exclusive
@@ -123,6 +123,9 @@ get_offset_ctx  no
         Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
         exclusive on victim.
         cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
+       ->unlink() and ->rename() have ->i_rwsem exclusive on all non-directories
+       involved.
+       ->rename() has ->i_rwsem exclusive on any subdirectory that changes parent.
  
  See Documentation/filesystems/directory-locking.rst for more detailed discussion
  of the locking scheme for directory operations.
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst

index 4d05b98..41d964b 100644 (file)
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1045,3 +1045,21 @@ filesystem type is now moved to a later point when the devices are closed:
  As this is a VFS level change it has no practical consequences for filesystems
  other than that all of them must use one of the provided kill_litter_super(),
  kill_anon_super(), or kill_block_super() helpers.
+
+---
+
+**mandatory**
+
+If ->rename() update of .. on cross-directory move needs an exclusion with
+directory modifications, do *not* lock the subdirectory in question in your
+->rename() - it's done by the caller now [that item should've been added in
+28eceeda130f "fs: Lock moved directories"].
+
+---
+
+**mandatory**
+
+On same-directory ->rename() the (tautological) update of .. is not protected
+by any locks; just don't do it if the old parent is the same as the new one.
+We really can't lock two subdirectories in same-directory rename - not without
+deadlocks.
diff --git a/fs/namei.c b/fs/namei.c

index 94565bd..e728ba0 100644 (file)
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3021,20 +3021,14 @@ static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2)
         p = d_ancestor(p2, p1);
         if (p) {
                 inode_lock_nested(p2->d_inode, I_MUTEX_PARENT);
-               inode_lock_nested(p1->d_inode, I_MUTEX_CHILD);
+               inode_lock_nested(p1->d_inode, I_MUTEX_PARENT2);
                 return p;
         }
  
         p = d_ancestor(p1, p2);
-       if (p) {
-               inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
-               inode_lock_nested(p2->d_inode, I_MUTEX_CHILD);
-               return p;
-       }
-
-       lock_two_inodes(p1->d_inode, p2->d_inode,
-                       I_MUTEX_PARENT, I_MUTEX_PARENT2);
-       return NULL;
+       inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
+       inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
+       return p;
  }
  
  /*
@@ -4733,11 +4727,12 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
   *
   *     a) we can get into loop creation.
   *     b) race potential - two innocent renames can create a loop together.
- *        That's where 4.4 screws up. Current fix: serialization on
+ *        That's where 4.4BSD screws up. Current fix: serialization on
   *        sb->s_vfs_rename_mutex. We might be more accurate, but that's another
   *        story.
- *     c) we have to lock _four_ objects - parents and victim (if it exists),
- *        and source.
+ *     c) we may have to lock up to _four_ objects - parents and victim (if it exists),
+ *        and source (if it's a non-directory or a subdirectory that moves to
+ *        different parent).
   *        And that - after we got ->i_mutex on parents (until then we don't know
   *        whether the target exists).  Solution: try to be smart with locking
   *        order for inodes.  We rely on the fact that tree topology may change
@@ -4769,6 +4764,7 @@ int vfs_rename(struct renamedata *rd)
         bool new_is_dir = false;
         unsigned max_links = new_dir->i_sb->s_max_links;
         struct name_snapshot old_name;
+       bool lock_old_subdir, lock_new_subdir;
  
         if (source == target)
                 return 0;
@@ -4822,15 +4818,32 @@ int vfs_rename(struct renamedata *rd)
         take_dentry_name_snapshot(&old_name, old_dentry);
         dget(new_dentry);
         /*
-        * Lock all moved children. Moved directories may need to change parent
-        * pointer so they need the lock to prevent against concurrent
-        * directory changes moving parent pointer. For regular files we've
-        * historically always done this. The lockdep locking subclasses are
-        * somewhat arbitrary but RENAME_EXCHANGE in particular can swap
-        * regular files and directories so it's difficult to tell which
-        * subclasses to use.
+        * Lock children.
+        * The source subdirectory needs to be locked on cross-directory
+        * rename or cross-directory exchange since its parent changes.
+        * The target subdirectory needs to be locked on cross-directory
+        * exchange due to parent change and on any rename due to becoming
+        * a victim.
+        * Non-directories need locking in all cases (for NFS reasons);
+        * they get locked after any subdirectories (in inode address order).
+        *
+        * NOTE: WE ONLY LOCK UNRELATED DIRECTORIES IN CROSS-DIRECTORY CASE.
+        * NEVER, EVER DO THAT WITHOUT ->s_vfs_rename_mutex.
          */
-       lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
+       lock_old_subdir = new_dir != old_dir;
+       lock_new_subdir = new_dir != old_dir || !(flags & RENAME_EXCHANGE);
+       if (is_dir) {
+               if (lock_old_subdir)
+                       inode_lock_nested(source, I_MUTEX_CHILD);
+               if (target && (!new_is_dir || lock_new_subdir))
+                       inode_lock(target);
+       } else if (new_is_dir) {
+               if (lock_new_subdir)
+                       inode_lock_nested(target, I_MUTEX_CHILD);
+               inode_lock(source);
+       } else {
+               lock_two_nondirectories(source, target);
+       }
  
         error = -EPERM;
         if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target)))
@@ -4878,8 +4891,9 @@ int vfs_rename(struct renamedata *rd)
                         d_exchange(old_dentry, new_dentry);
         }
  out:
-       inode_unlock(source);
-       if (target)
+       if (!is_dir || lock_old_subdir)
+               inode_unlock(source);
+       if (target && (!new_is_dir || lock_new_subdir))
                 inode_unlock(target);
         dput(new_dentry);
         if (!error) {
author	Al Viro <viro@zeniv.linux.org.uk>
	Mon, 20 Nov 2023 01:25:58 +0000 (20:25 -0500)
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
	Thu, 1 Feb 2024 00:18:57 +0000 (16:18 -0800)
Documentation/filesystems/directory-locking.rst		patch \| blob \| history
Documentation/filesystems/locking.rst		patch \| blob \| history
Documentation/filesystems/porting.rst		patch \| blob \| history
fs/namei.c		patch \| blob \| history