mm/mprotect.c: don't touch single threaded PTEs which are on the right node
author Andi Kleen <ak@linux.intel.com>
Tue, 13 Dec 2016 00:41:47 +0000 (16:41 -0800)
committer Linus Torvalds <torvalds@linux-foundation.org>
Tue, 13 Dec 2016 02:55:07 +0000 (18:55 -0800)
We had some problems with pages getting unmapped in single-threaded,
affinitized processes.  The cause was tracked down to NUMA scanning.

It doesn't make any sense to unmap a page if the process is
single-threaded and the page is already on the node the process is
running on.

Add a check for this case to the NUMA protection code, and skip the
unmapping when it holds.
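
The check can be sketched in plain userspace C.  This is only a model of
the decision, not kernel code: pick_target_node(), skip_numa_unmap(),
selfcheck() and NO_NODE are hypothetical names standing in for the
target_node logic, the page_to_nid() comparison and NUMA_NO_NODE, and the
sketch assumes real pages always have a node id >= 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's NUMA_NO_NODE. */
#define NO_NODE (-1)

/*
 * Computed once per PTE range: a target node exists only for a NUMA
 * hinting scan of a private (non-shared) VMA whose address space has a
 * single user.  Otherwise return NO_NODE, which matches no real page.
 */
static int pick_target_node(bool prot_numa, bool vma_shared,
                            int mm_users, int current_node)
{
    if (prot_numa && !vma_shared && mm_users == 1)
        return current_node;
    return NO_NODE;
}

/*
 * Evaluated per PTE: leave the PTE alone if the page already sits on
 * the target node.  Assumes page_node >= 0, so NO_NODE never matches.
 */
static bool skip_numa_unmap(int target_node, int page_node)
{
    return target_node == page_node;
}

/* Small usage example: a task running on node 0. */
static void selfcheck(void)
{
    /* Single-threaded, private mapping: local pages are skipped. */
    int target = pick_target_node(true, false, 1, 0);
    assert(skip_numa_unmap(target, 0));
    assert(!skip_numa_unmap(target, 1));

    /* Two users of the mm: no target node, so nothing is skipped. */
    target = pick_target_node(true, false, 2, 0);
    assert(!skip_numa_unmap(target, 0));
}
```

Computing the target node once per range keeps the per-PTE cost down to
a single integer comparison.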

In theory the process could be migrated to another node later, but its
pages will eventually be rescanned, unmapped and migrated at that point.

In theory this could be made fancier by remembering this state per
process or even per mm.  However, that would need extra tracking and be
more complicated, and the simple check seems to work fine so far.

[ak@linux.intel.com: v3: Minor updates from Mel. Change code layout]
Link: http://lkml.kernel.org/r/1476382117-5440-1-git-send-email-andi@firstfloor.org
Link: http://lkml.kernel.org/r/1476288949-20970-1-git-send-email-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/mprotect.c

index 1193652..05a02b7 100644
@@ -69,11 +69,17 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
        pte_t *pte, oldpte;
        spinlock_t *ptl;
        unsigned long pages = 0;
+       int target_node = NUMA_NO_NODE;
 
        pte = lock_pte_protection(vma, pmd, addr, prot_numa, &ptl);
        if (!pte)
                return 0;
 
+       /* Get target node for single threaded private VMAs */
+       if (prot_numa && !(vma->vm_flags & VM_SHARED) &&
+           atomic_read(&vma->vm_mm->mm_users) == 1)
+               target_node = numa_node_id();
+
        arch_enter_lazy_mmu_mode();
        do {
                oldpte = *pte;
@@ -95,6 +101,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
                                /* Avoid TLB flush if possible */
                                if (pte_protnone(oldpte))
                                        continue;
+
+                               /*
+                                * Don't mess with PTEs if page is already on the node
+                                * a single-threaded process is running on.
+                                */
+                               if (target_node == page_to_nid(page))
+                                       continue;
                        }
 
                        ptent = ptep_modify_prot_start(mm, addr, pte);