rcu: Avoid RCU-preempt expedited grace-period botch
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed, 21 Sep 2011 21:41:37 +0000 (14:41 -0700)
Commit:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CommitDate: Sun, 11 Dec 2011 18:31:21 +0000 (10:31 -0800)
Because rcu_read_unlock_special() samples rcu_preempted_readers_exp(rnp)
after dropping rnp->lock, the following sequence of events is possible
(a simplified user-space sketch of the racy pattern follows the list):

1. Task A exits its RCU read-side critical section, removes
itself from the ->blkd_tasks list, releases rnp->lock, and is
then preempted.  Task B remains on the ->blkd_tasks list and
blocks the current expedited grace period.

2. Task B exits its RCU read-side critical section and removes
itself from the ->blkd_tasks list.  Because it is the last task
blocking the current expedited grace period, it ends that
expedited grace period.

3. Task A resumes and samples rcu_preempted_readers_exp(rnp), which
of course indicates that nothing is blocking the nonexistent
expedited grace period.  Task A is again preempted.

4. Some other CPU starts an expedited grace period.  There are several
tasks blocking this expedited grace period queued on the
same rcu_node structure that Task A was using in step 1 above.

5. Task A examines its state and incorrectly concludes that it was
the last task blocking the expedited grace period on the current
rcu_node structure.  It therefore reports completion up the
rcu_node tree.

6. The expedited grace period can then incorrectly complete before
the tasks blocked on this same rcu_node structure exit their
RCU read-side critical sections.  Arbitrarily bad things happen.
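
For illustration only, here is a minimal user-space model of the racy
pattern, with a pthread mutex standing in for rnp->lock and a plain
counter standing in for the ->blkd_tasks list.  The names exp_readers,
report_exp_done(), and buggy_unlock_path() are hypothetical, not
kernel code:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t rnp_lock = PTHREAD_MUTEX_INITIALIZER;
static int exp_readers = 1;     /* tasks blocking the current expedited GP */

static void report_exp_done(void)
{
        printf("expedited grace period reported complete\n");
}

/* Buggy pattern: the emptiness check runs after rnp_lock is dropped. */
static void buggy_unlock_path(void)
{
        pthread_mutex_lock(&rnp_lock);
        exp_readers--;          /* remove self from the blocked list */
        pthread_mutex_unlock(&rnp_lock);

        /*
         * RACE WINDOW: another CPU can start a new expedited grace
         * period here (steps 3-5 above), yet this stale sample may
         * still report that grace period complete.
         */
        if (exp_readers == 0)
                report_exp_done();
}

int main(void)
{
        buggy_unlock_path();
        return 0;
}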

This commit therefore takes a snapshot of rcu_preempted_readers_exp(rnp)
prior to dropping the lock, so that only the last task thinks that it is
the last task, thus avoiding the failure scenario laid out above.
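
Continuing the same hypothetical user-space model from above, the fix
amounts to sampling the expedited state while still holding the lock,
so the decision is made against the grace period this task actually
belonged to:

/* Fixed pattern: snapshot the expedited state before dropping the lock. */
static void fixed_unlock_path(void)
{
        int empty_exp_now;

        pthread_mutex_lock(&rnp_lock);
        exp_readers--;                          /* remove self */
        empty_exp_now = (exp_readers == 0);     /* snapshot under rnp_lock */
        pthread_mutex_unlock(&rnp_lock);

        /*
         * Only the task whose removal actually emptied the list sees
         * empty_exp_now set; readers queued for a later expedited
         * grace period cannot perturb this snapshot.
         */
        if (empty_exp_now)
                report_exp_done();
}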

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4b9b9f8..7986053 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -312,6 +312,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
 {
        int empty;
        int empty_exp;
+       int empty_exp_now;
        unsigned long flags;
        struct list_head *np;
 #ifdef CONFIG_RCU_BOOST
@@ -382,8 +383,10 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
                /*
                 * If this was the last task on the current list, and if
                 * we aren't waiting on any CPUs, report the quiescent state.
-                * Note that rcu_report_unblock_qs_rnp() releases rnp->lock.
+                * Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
+                * so we must take a snapshot of the expedited state.
                 */
+               empty_exp_now = !rcu_preempted_readers_exp(rnp);
                if (!empty && !rcu_preempt_blocked_readers_cgp(rnp)) {
                        trace_rcu_quiescent_state_report("preempt_rcu",
                                                         rnp->gpnum,
@@ -406,7 +409,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
                 * If this was the last task on the expedited lists,
                 * then we need to report up the rcu_node hierarchy.
                 */
-               if (!empty_exp && !rcu_preempted_readers_exp(rnp))
+               if (!empty_exp && empty_exp_now)
                        rcu_report_exp_rnp(&rcu_preempt_state, rnp);
        } else {
                local_irq_restore(flags);