percpu-refcount was incorrectly using preempt_disable/enable() for RCU
critical sections against call_rcu().
6a24474da8 ("percpu-refcount:
consistently use plain (non-sched) RCU") fixed it by replacing the
preemption operations with rcu_read_[un]lock(), citing that there isn't
any advantage in using sched-RCU over the usual one; however,
rcu_read_[un]lock() for the preemptible RCU implementation -
CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly
more expensive than preempt_disable/enable().

In a contrived microbenchmark which repeats the following,

 - percpu_ref_get()
 - copy 32 bytes of data into percpu buffer
 - percpu_ref_put()
 - copy 32 bytes of data into percpu buffer

rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by
about 15% when compared to using sched-RCU.
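
For illustration only, the shape of that loop can be sketched in
userspace C. This is not the benchmark that was actually run: struct
fake_ref, fake_ref_get(), fake_ref_put(), fake_percpu_buf and
run_bench() are hypothetical stand-ins, a plain counter replaces the
percpu counter, and no RCU read-side section is taken, so it shows the
structure of the loop rather than the measured cost.

```c
#include <string.h>

/*
 * Hypothetical userspace stand-ins. The real percpu_ref_get() and
 * percpu_ref_put() are kernel functions that update a percpu counter
 * inside an RCU read-side critical section; none of that is modelled.
 */
struct fake_ref {
	long count;
};

static void fake_ref_get(struct fake_ref *ref)
{
	ref->count++;
}

static void fake_ref_put(struct fake_ref *ref)
{
	ref->count--;
}

/* Stands in for the percpu buffer the benchmark copies into. */
static char fake_percpu_buf[32];

/*
 * Repeat the four benchmark steps: get, copy 32 bytes, put, copy 32
 * bytes. Returns the final counter value, which is the starting value
 * because every get is paired with a put.
 */
static long run_bench(struct fake_ref *ref, long iterations)
{
	/* 31 characters plus the terminating NUL fill all 32 bytes. */
	static const char src[32] = "0123456789abcdef0123456789abcde";
	long i;

	for (i = 0; i < iterations; i++) {
		fake_ref_get(ref);
		memcpy(fake_percpu_buf, src, sizeof(fake_percpu_buf));
		fake_ref_put(ref);
		memcpy(fake_percpu_buf, src, sizeof(fake_percpu_buf));
	}
	return ref->count;
}
```

Timing run_bench() once per RCU flavor would only be meaningful in the
kernel, where the get/put bodies actually enter and leave the read-side
critical section.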

As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications.  Convert to RCU-sched.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
{
unsigned __percpu *pcpu_count;
- rcu_read_lock();
+ rcu_read_lock_sched();
pcpu_count = ACCESS_ONCE(ref->pcpu_count);
else
atomic_inc(&ref->count);
- rcu_read_unlock();
+ rcu_read_unlock_sched();
}
/**
unsigned __percpu *pcpu_count;
int ret = false;
- rcu_read_lock();
+ rcu_read_lock_sched();
pcpu_count = ACCESS_ONCE(ref->pcpu_count);
ret = true;
}
- rcu_read_unlock();
+ rcu_read_unlock_sched();
return ret;
}
{
unsigned __percpu *pcpu_count;
- rcu_read_lock();
+ rcu_read_lock_sched();
pcpu_count = ACCESS_ONCE(ref->pcpu_count);
else if (unlikely(atomic_dec_and_test(&ref->count)))
ref->release(ref);
- rcu_read_unlock();
+ rcu_read_unlock_sched();
}
#endif
(((unsigned long) ref->pcpu_count)|PCPU_REF_DEAD);
ref->confirm_kill = confirm_kill;
- call_rcu(&ref->rcu, percpu_ref_kill_rcu);
+ call_rcu_sched(&ref->rcu, percpu_ref_kill_rcu);
}