tick/broadcast: Reduce lock cacheline contention
authorWaiman Long <longman@redhat.com>
Mon, 30 Jan 2017 17:57:43 +0000 (12:57 -0500)
committerThomas Gleixner <tglx@linutronix.de>
Sat, 4 Feb 2017 07:54:46 +0000 (08:54 +0100)
commit668802c25729a8e3423015c33c05f1c3be3858e9
tree86bcd7436b39f4a4273bdace08a0b86d14e8196c
parent9556ad6ad0c60a23a7db36af65c9ffff51bbf644
tick/broadcast: Reduce lock cacheline contention

It was observed that on an Intel x86 system without the ARAT (Always
running APIC timer) feature and with fairly large number of CPUs as
well as CPUs coming in and out of intel_idle frequently, the lock
contention on the tick_broadcast_lock can become significant.

To reduce contention, the lock is put into its own cacheline and all
the cpumask_var_t variables are put into the __read_mostly section.

Running the SP benchmark of the NAS Parallel Benchmarks on a 4-socket
16-core 32-thread Nehalam system, the performance number improved
from 3353.94 Mop/s to 3469.31 Mop/s when this patch was applied on
a 4.9.6 kernel.  This is a 3.4% improvement.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1485799063-20857-1-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
include/linux/cpumask.h
kernel/time/tick-broadcast.c