migrate_pages: batch flushing TLB
author    Huang Ying <ying.huang@intel.com>
          Mon, 13 Feb 2023 12:34:43 +0000 (20:34 +0800)
committer Andrew Morton <akpm@linux-foundation.org>
          Fri, 17 Feb 2023 04:43:54 +0000 (20:43 -0800)

TLB flushing can cost a significant number of CPU cycles during folio
migration in some situations, for example, when migrating a folio of a
process with multiple active threads that run on multiple CPUs.  Now that
the _unmap and _move stages are batched in migrate_pages(), the TLB
flushing can be batched easily with the existing TLB flush batching
mechanism.  This patch implements that.
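
To illustrate why batching helps, here is a minimal sketch of a
simplified cost model.  It is not kernel code: the function names, the
"one IPI per remote CPU per flush" rule, and the batch size parameter are
all assumptions made for illustration, and the real savings depend on the
workload.

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code.  It counts the IPIs needed to
 * flush remote TLBs while unmapping nr_folios folios of a process whose
 * threads run on nr_remote_cpus other CPUs.
 */

/*
 * Unbatched: every unmapped folio triggers its own flush, and every
 * flush sends one IPI to each remote CPU.
 */
static unsigned long ipis_unbatched(unsigned long nr_folios,
				    unsigned long nr_remote_cpus)
{
	return nr_folios * nr_remote_cpus;
}

/*
 * Batched: the PTEs of a whole batch are cleared first and the flush is
 * only recorded as pending; a single flush then covers the batch.
 */
static unsigned long ipis_batched(unsigned long nr_folios,
				  unsigned long nr_remote_cpus,
				  unsigned long batch_size)
{
	unsigned long nr_batches;

	assert(batch_size > 0);
	/* Round up: a partially filled batch still needs one flush. */
	nr_batches = (nr_folios + batch_size - 1) / batch_size;
	return nr_batches * nr_remote_cpus;
}
```

Under these assumptions, unmapping 512 folios with 7 remote CPUs costs
512 * 7 IPIs when each folio is flushed separately, but only 7 IPIs when
all 512 folios share one flush.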

We use the following test case to test the patch.

On a 2-socket Intel server,

- Run pmbench memory accessing benchmark

- Run `migratepages` to migrate pages of pmbench between node 0 and
  node 1 back and forth.

With the patch, the number of TLB flushing IPIs is reduced by 99.1% during
the test, and the number of pages migrated successfully per second
increases by 291.7%.

Haoxin helped to test the patchset on an ARM64 server with 128 cores and 2
NUMA nodes.  The test results show that the page migration performance
increases by up to 78%.

NOTE: TLB flushing is batched only for normal folios, not for THP folios,
because the overhead of TLB flushing for THP folios is much lower than
that for normal folios (about 1/512 on the x86 platform).
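
The 1/512 figure can be checked with a little arithmetic.  The sketch
below assumes the usual x86-64 sizes of 4 KiB for a base page and 2 MiB
for a PMD-mapped THP, and assumes one flush per unmapped folio; the macro
and function names are hypothetical.

```c
#include <assert.h>

/* Assumed x86-64 sizes; other architectures and configs may differ. */
#define BASE_PAGE_SIZE	(4UL * 1024)
#define THP_SIZE	(2UL * 1024 * 1024)

/* Number of base pages covered by one THP. */
static unsigned long base_pages_per_thp(void)
{
	return THP_SIZE / BASE_PAGE_SIZE;
}

/*
 * Flushes needed to unmap `bytes` of memory when each folio of size
 * folio_size needs one flush (rounded up for a partial folio).
 */
static unsigned long flushes_for(unsigned long bytes,
				 unsigned long folio_size)
{
	assert(folio_size > 0);
	return (bytes + folio_size - 1) / folio_size;
}
```

With these sizes, migrating 1 GiB takes 262144 flushes as base pages but
only 512 flushes as THPs, so there is far less to gain from batching the
THP case.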

Link: https://lkml.kernel.org/r/20230213123444.155149-9-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/migrate.c
mm/rmap.c

index 00713ccb6643e90e0eca1120643f622f72b0d1e8..2fa420e4f68c51b85fb2039593c37569b8a5fd3f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1248,7 +1248,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
                /* Establish migration ptes */
                VM_BUG_ON_FOLIO(folio_test_anon(src) &&
                               !folio_test_ksm(src) && !anon_vma, src);
-               try_to_migrate(src, 0);
+               try_to_migrate(src, TTU_BATCH_FLUSH);
                page_was_mapped = 1;
        }
 
@@ -1806,6 +1806,9 @@ retry:
        stats->nr_thp_failed += thp_retry;
        stats->nr_failed_pages += nr_retry_pages;
 move:
+       /* Flush TLBs for all unmapped folios */
+       try_to_unmap_flush();
+
        retry = 1;
        for (pass = 0;
             pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
index 8287f2cc327d3197b433a11ceec77cd6a7118e0c..15ae24585fc49df4e977a8940f858890f4b0dcf2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1952,7 +1952,21 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
                } else {
                        flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
                        /* Nuke the page table entry. */
-                       pteval = ptep_clear_flush(vma, address, pvmw.pte);
+                       if (should_defer_flush(mm, flags)) {
+                               /*
+                                * We clear the PTE but do not flush so potentially
+                                * a remote CPU could still be writing to the folio.
+                                * If the entry was previously clean then the
+                                * architecture must guarantee that a clear->dirty
+                                * transition on a cached TLB entry is written through
+                                * and traps if the PTE is unmapped.
+                                */
+                               pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+
+                               set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+                       } else {
+                               pteval = ptep_clear_flush(vma, address, pvmw.pte);
+                       }
                }
 
                /* Set the dirty flag on the folio now the pte is gone. */
@@ -2124,10 +2138,10 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
 
        /*
         * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
-        * TTU_SPLIT_HUGE_PMD and TTU_SYNC flags.
+        * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
         */
        if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
-                                       TTU_SYNC)))
+                                       TTU_SYNC | TTU_BATCH_FLUSH)))
                return;
 
        if (folio_is_zone_device(folio) &&