mm: multi-gen LRU: fix crash during cgroup migration
author: Yu Zhao <yuzhao@google.com>
Mon, 16 Jan 2023 03:44:05 +0000 (20:44 -0700)
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 9 Feb 2023 10:28:20 +0000 (11:28 +0100)
commit de08eaa6156405f2e9369f06ba5afae0e4ab3b62 upstream.

lru_gen_migrate_mm() assumes that lru_gen_add_mm() has already run for
the mm.  This isn't true in the following scenario:

    CPU 1                         CPU 2

  clone()
    cgroup_can_fork()
                                cgroup_procs_write()
    cgroup_post_fork()
                                  task_lock()
                                  lru_gen_migrate_mm()
                                  task_unlock()
    task_lock()
    lru_gen_add_mm()
    task_unlock()

When this happens, the kernel crashes because of linked list corruption
(mm_struct->lru_gen.list).

Link: https://lore.kernel.org/r/20230115134651.30028-1-msizanoen@qtmlabs.xyz/
Link: https://lkml.kernel.org/r/20230116034405.2960276-1-yuzhao@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: msizanoen <msizanoen@qtmlabs.xyz>
Tested-by: msizanoen <msizanoen@qtmlabs.xyz>
Cc: <stable@vger.kernel.org> [6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8fcc5fa..96eb9da 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3290,13 +3290,16 @@ void lru_gen_migrate_mm(struct mm_struct *mm)
        if (mem_cgroup_disabled())
                return;
 
+       /* migration can happen before addition */
+       if (!mm->lru_gen.memcg)
+               return;
+
        rcu_read_lock();
        memcg = mem_cgroup_from_task(task);
        rcu_read_unlock();
        if (memcg == mm->lru_gen.memcg)
                return;
 
-       VM_WARN_ON_ONCE(!mm->lru_gen.memcg);
        VM_WARN_ON_ONCE(list_empty(&mm->lru_gen.list));
 
        lru_gen_del_mm(mm);