sched/numa: Fix NULL pointer dereference in task_numa_migrate()
author    Rik van Riel <riel@redhat.com>
Tue, 12 Nov 2013 00:29:25 +0000 (19:29 -0500)
committer Ingo Molnar <mingo@kernel.org>
Wed, 13 Nov 2013 12:33:51 +0000 (13:33 +0100)
The cpusets code can split up the scheduler's domain tree into
smaller domains.  Some of those smaller domains may not cross
NUMA nodes at all, leaving the per-cpu sd_numa pointer NULL and
causing a NULL pointer dereference in task_numa_migrate().

Tasks cannot be migrated out of such a domain, so the patch
also sets p->numa_preferred_nid to the node the task is
currently on, to prevent the migration from being retried
over and over again.

Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/n/tip-oosqomw0Jput0Jkvoowhrqtu@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kernel/sched/fair.c

index df77c60..c11e36f 100644
@@ -1201,9 +1201,21 @@ static int task_numa_migrate(struct task_struct *p)
         */
        rcu_read_lock();
        sd = rcu_dereference(per_cpu(sd_numa, env.src_cpu));
-       env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
+       if (sd)
+               env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
        rcu_read_unlock();
 
+       /*
+        * Cpusets can break the scheduler domain tree into smaller
+        * balance domains, some of which do not cross NUMA boundaries.
+        * Tasks that are "trapped" in such domains cannot be migrated
+        * elsewhere, so there is no point in (re)trying.
+        */
+       if (unlikely(!sd)) {
+               p->numa_preferred_nid = cpu_to_node(task_cpu(p));
+               return -EINVAL;
+       }
+
        taskweight = task_weight(p, env.src_nid);
        groupweight = group_weight(p, env.src_nid);
        update_numa_stats(&env.src_stats, env.src_nid);
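
For readers unfamiliar with the pattern, the fix boils down to: use the
RCU-looked-up domain only if it is non-NULL, and otherwise give up on the
migration and pin the task's preferred node to where it already runs.
Below is a minimal, self-contained userspace sketch of that control flow;
the names (struct domain, lookup_domain, try_migrate) and the fallback
imbalance_pct value are illustrative stand-ins, not kernel APIs.

	#include <stdio.h>
	#include <errno.h>

	/* Illustrative stand-in for the kernel's per-cpu sd_numa
	 * lookup; "domain" and "lookup_domain" are not kernel APIs. */
	struct domain { int imbalance_pct; };

	static struct domain *lookup_domain(int cpu)
	{
		static struct domain numa_domain = { .imbalance_pct = 125 };

		/* Even CPUs mimic a cpuset-trapped CPU whose balance
		 * domain does not cross NUMA nodes, i.e. a NULL
		 * per-cpu sd_numa pointer. */
		return (cpu & 1) ? &numa_domain : NULL;
	}

	static int try_migrate(int src_cpu)
	{
		int imbalance_pct = 100;	/* illustrative default */
		struct domain *sd = lookup_domain(src_cpu);

		/* Mirror the patch: dereference sd only if it exists... */
		if (sd)
			imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;

		/* ...and bail out entirely when there is no NUMA-spanning
		 * domain, instead of crashing or retrying forever.  The
		 * real code also pins p->numa_preferred_nid to the
		 * task's current node at this point. */
		if (!sd)
			return -EINVAL;

		printf("cpu %d: balancing with imbalance_pct=%d\n",
		       src_cpu, imbalance_pct);
		return 0;
	}

	int main(void)
	{
		try_migrate(0);	/* trapped domain: returns -EINVAL */
		try_migrate(2);	/* trapped domain: returns -EINVAL */
		try_migrate(1);	/* NUMA domain present: proceeds */
		return 0;
	}

The two separate checks on sd mirror the patch itself, where the
dereference happens inside the RCU read-side critical section and the
bail-out happens after rcu_read_unlock().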