smp: Reduce NMI traffic from CSD waiters to CSD destination
author Imran Khan <imran.f.khan@oracle.com>
Mon, 8 May 2023 22:31:24 +0000 (08:31 +1000)
committer Paul E. McKenney <paulmck@kernel.org>
Mon, 10 Jul 2023 21:19:04 +0000 (14:19 -0700)
commit 0d3a00b370424f5f1b9fd037bc8a4a3e7cbf0939
tree f8aaac6f2a87ebd06cde951c710984a168695eb5
parent 5bd00f6db012f75b42434d39b7fec98b95c1afcc

On systems with hundreds of CPUs, if most of the CPUs detect a CSD hang,
then all of these waiting CPUs send an NMI to the destination CPU in
order to dump its backtrace.

Given enough NMIs, the destination CPU will spend much of its time
producing backtraces, thus further delaying that CPU's response to the
original CSD IPI.  In the worst case, by the time the destination CPU is
done handling all of these backtrace NMIs, the CSD wait timeout will
have elapsed again, so the waiters resend their backtrace NMIs,
further delaying forward progress.

Therefore, to avoid these delays, issue the backtrace NMI only from
the first waiter.  The destination CPU's other waiters can make use of
the backtrace obtained from the first waiter's NMI.
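
The "first waiter only" rule can be sketched in userspace C11 as follows.
This is an illustration under stated assumptions, not the code in
kernel/smp.c: every name here (NR_CPUS_SKETCH, backtrace_requested,
send_backtrace_nmi(), csd_waiter_timed_out(),
csd_destination_made_progress()) is hypothetical, the NMI is replaced by
a counter, and re-arming the flag once the destination makes progress is
an assumption that the text above does not spell out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for this sketch */

/* Per-destination flag: true once some waiter has requested a backtrace. */
static atomic_bool backtrace_requested[NR_CPUS_SKETCH];

/* Stand-in for the real NMI: counts backtrace requests per destination. */
static atomic_int nmi_sent[NR_CPUS_SKETCH];

static void send_backtrace_nmi(int cpu)
{
	atomic_fetch_add(&nmi_sent[cpu], 1);
}

/*
 * Called by a waiter whose CSD wait on dest_cpu timed out.  Only the
 * waiter that flips the flag from false to true sends the NMI; every
 * other waiter sees true and relies on the backtrace already requested.
 * Returns true if this caller sent the NMI.
 */
static bool csd_waiter_timed_out(int dest_cpu)
{
	assert(dest_cpu >= 0 && dest_cpu < NR_CPUS_SKETCH);

	/* atomic_exchange() returns the previous value of the flag. */
	if (!atomic_exchange(&backtrace_requested[dest_cpu], true)) {
		send_backtrace_nmi(dest_cpu);
		return true;
	}
	return false;
}

/*
 * Assumed re-arm step: once the destination CPU processes its queue, a
 * later hang on the same CPU may trigger a fresh backtrace.
 */
static void csd_destination_made_progress(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	atomic_store(&backtrace_requested[cpu], false);
}
```

With hundreds of waiters calling csd_waiter_timed_out() on the same
destination, exactly one call returns true per hang, so the destination
receives a single backtrace request instead of one per waiter.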

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
kernel/smp.c