mm: clean up the last pieces of page fault accountings
authorPeter Xu <peterx@redhat.com>
Wed, 12 Aug 2020 01:38:57 +0000 (18:38 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Wed, 12 Aug 2020 17:58:04 +0000 (10:58 -0700)
Here're the last pieces of page fault accounting that were still done
outside handle_mm_fault() where we still have regs==NULL when calling
handle_mm_fault():

arch/powerpc/mm/copro_fault.c:   copro_handle_mm_fault
arch/sparc/mm/fault_32.c:        force_user_fault
arch/um/kernel/trap.c:           handle_page_fault
mm/gup.c:                        faultin_page
                                 fixup_user_fault
mm/hmm.c:                        hmm_vma_fault
mm/ksm.c:                        break_ksm

Some of them has the issue of duplicated accounting for page fault
retries.  Some of them didn't do the accounting at all.

This patch cleans all these up by letting handle_mm_fault() to do per-task
page fault accounting even if regs==NULL (though we'll still skip the perf
event accountings).  With that, we can safely remove all the outliers now.

There's another functional change in that now we account the page faults
to the caller of gup, rather than the task_struct that passed into the gup
code.  More information of this can be found at [1].

After this patch, below things should never be touched again outside
handle_mm_fault():

  - task_struct.[maj|min]_flt
  - PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]

[1] https://lore.kernel.org/lkml/CAHk-=wj_V2Tps2QrMn20_W0OJF9xqNh52XSGA42s-ZJ8Y+GyKw@mail.gmail.com/

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200707225021.200906-25-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/powerpc/mm/copro_fault.c
arch/um/kernel/trap.c
mm/gup.c
mm/memory.c

index 2d0276abe0a68a6077f167b558fc1d51dbec41db..8acd001789561e4adaf6a320bc2bda7bebecdeda 100644 (file)
@@ -76,11 +76,6 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
                BUG();
        }
 
-       if (*flt & VM_FAULT_MAJOR)
-               current->maj_flt++;
-       else
-               current->min_flt++;
-
 out_unlock:
        mmap_read_unlock(mm);
        return ret;
index 8d9870d76da12258e56132749ead061b03f018e5..ad12f78bda7e48e127b81d96eb295456f9d8cb8c 100644 (file)
@@ -88,10 +88,6 @@ good_area:
                        BUG();
                }
                if (flags & FAULT_FLAG_ALLOW_RETRY) {
-                       if (fault & VM_FAULT_MAJOR)
-                               current->maj_flt++;
-                       else
-                               current->min_flt++;
                        if (fault & VM_FAULT_RETRY) {
                                flags |= FAULT_FLAG_TRIED;
 
index ae7121d729fa784b3b88e2829d125c8217647d56..d5d44c68fa19e48fe384234892f2d528c0bb9ef0 100644 (file)
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -893,13 +893,6 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
                BUG();
        }
 
-       if (tsk) {
-               if (ret & VM_FAULT_MAJOR)
-                       tsk->maj_flt++;
-               else
-                       tsk->min_flt++;
-       }
-
        if (ret & VM_FAULT_RETRY) {
                if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
                        *locked = 0;
@@ -1255,12 +1248,6 @@ retry:
                goto retry;
        }
 
-       if (tsk) {
-               if (major)
-                       tsk->maj_flt++;
-               else
-                       tsk->min_flt++;
-       }
        return 0;
 }
 EXPORT_SYMBOL_GPL(fixup_user_fault);
index 9b7d35734caaf262c27d6da39e4150225d19afd9..2b7f0e00f3120f480bb637801854c3d3a21b6e6d 100644 (file)
@@ -4400,20 +4400,23 @@ static inline void mm_account_fault(struct pt_regs *regs,
         */
        major = (ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED);
 
+       if (major)
+               current->maj_flt++;
+       else
+               current->min_flt++;
+
        /*
-        * If the fault is done for GUP, regs will be NULL, and we will skip
-        * the fault accounting.
+        * If the fault is done for GUP, regs will be NULL.  We only do the
+        * accounting for the per thread fault counters who triggered the
+        * fault, and we skip the perf event updates.
         */
        if (!regs)
                return;
 
-       if (major) {
-               current->maj_flt++;
+       if (major)
                perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
-       } else {
-               current->min_flt++;
+       else
                perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
-       }
 }
 
 /*