mm/uffd: reset write protection when unregister with wp-mode
authorPeter Xu <peterx@redhat.com>
Thu, 11 Aug 2022 20:13:40 +0000 (16:13 -0400)
committerAndrew Morton <akpm@linux-foundation.org>
Sat, 20 Aug 2022 22:17:45 +0000 (15:17 -0700)
The motivation of this patch comes from a recent report and patchfix from
David Hildenbrand on hugetlb shared handling of wr-protected page [1].

With the reproducer provided in commit message of [1], one can leverage
the uffd-wp lazy-reset of ptes to trigger a hugetlb issue which can affect
not only the attacker process, but also the whole system.

The lazy-reset mechanism of uffd-wp was used to make unregister faster,
meanwhile it has an assumption that any leftover pgtable entries should
only affect the process on its own, so not only the user should be aware
of anything it does, but also it should not affect outside of the process.

But it seems that this is not true, and it can also be utilized to make
some exploit easier.

So far there's no clue showing that the lazy-reset is important to any
userfaultfd users because normally the unregister will only happen once
for a specific range of memory of the lifecycle of the process.

Considering all above, what this patch proposes is to do explicit pte
resets when unregister an uffd region with wr-protect mode enabled.

It should be the same as calling ioctl(UFFDIO_WRITEPROTECT, wp=false)
right before ioctl(UFFDIO_UNREGISTER) for the user.  So potentially it'll
make the unregister slower.  From that pov it's a very slight abi change,
but hopefully nothing should break with this change either.

Regarding to the change itself - core of uffd write [un]protect operation
is moved into a separate function (uffd_wp_range()) and it is reused in
the unregister code path.

Note that the new function will not check for anything, e.g.  ranges or
memory types, because they should have been checked during the previous
UFFDIO_REGISTER or it should have failed already.  It also doesn't check
mmap_changing because we're with mmap write lock held anyway.

I added a Fixes upon introducing of uffd-wp shmem+hugetlbfs because that's
the only issue reported so far and that's the commit David's reproducer
will start working (v5.19+).  But the whole idea actually applies to not
only file memories but also anonymous.  It's just that we don't need to
fix anonymous prior to v5.19- because there's no known way to exploit.

IOW, this patch can also fix the issue reported in [1] as the patch 2 does.

[1] https://lore.kernel.org/all/20220811103435.188481-3-david@redhat.com/

Link: https://lkml.kernel.org/r/20220811201340.39342-1-peterx@redhat.com
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fs/userfaultfd.c
include/linux/userfaultfd_k.h
mm/userfaultfd.c

index 1c44bf75f9160cd509f684f0727c8a36be085527..175de70e3adfdd68a88ef6c11e16e68c8f296a17 100644 (file)
@@ -1601,6 +1601,10 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
                        wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range);
                }
 
+               /* Reset ptes for the whole vma range if wr-protected */
+               if (userfaultfd_wp(vma))
+                       uffd_wp_range(mm, vma, start, vma_end - start, false);
+
                new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
                prev = vma_merge(mm, prev, start, vma_end, new_flags,
                                 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
index 732b522bacb7e5c15a8d6e00f9f4d4e8c6daa4ea..e1b8a915e9e9fb1346fc2ffdc354afcdbf6627d3 100644 (file)
@@ -73,6 +73,8 @@ extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
 extern int mwriteprotect_range(struct mm_struct *dst_mm,
                               unsigned long start, unsigned long len,
                               bool enable_wp, atomic_t *mmap_changing);
+extern void uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *vma,
+                         unsigned long start, unsigned long len, bool enable_wp);
 
 /* mm helpers */
 static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
index 07d3befc80e4134dd6b54be127197ffaad5c10e2..7327b2573f7c2f83c0475109fe7b6e6479b270a5 100644 (file)
@@ -703,14 +703,29 @@ ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
                              mmap_changing, 0);
 }
 
+void uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma,
+                  unsigned long start, unsigned long len, bool enable_wp)
+{
+       struct mmu_gather tlb;
+       pgprot_t newprot;
+
+       if (enable_wp)
+               newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
+       else
+               newprot = vm_get_page_prot(dst_vma->vm_flags);
+
+       tlb_gather_mmu(&tlb, dst_mm);
+       change_protection(&tlb, dst_vma, start, start + len, newprot,
+                         enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
+       tlb_finish_mmu(&tlb);
+}
+
 int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
                        unsigned long len, bool enable_wp,
                        atomic_t *mmap_changing)
 {
        struct vm_area_struct *dst_vma;
        unsigned long page_mask;
-       struct mmu_gather tlb;
-       pgprot_t newprot;
        int err;
 
        /*
@@ -750,15 +765,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
                        goto out_unlock;
        }
 
-       if (enable_wp)
-               newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
-       else
-               newprot = vm_get_page_prot(dst_vma->vm_flags);
-
-       tlb_gather_mmu(&tlb, dst_mm);
-       change_protection(&tlb, dst_vma, start, start + len, newprot,
-                         enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
-       tlb_finish_mmu(&tlb);
+       uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
 
        err = 0;
 out_unlock: