printk: disable optimistic spin during panic
author Stephen Brennan <stephen.s.brennan@oracle.com>
Wed, 2 Feb 2022 17:18:19 +0000 (09:18 -0800)
committer Petr Mladek <pmladek@suse.com>
Mon, 14 Feb 2022 12:39:20 +0000 (13:39 +0100)
A CPU executing with console lock spinning enabled might be halted
during a panic. Before the panicking CPU calls console_flush_on_panic(),
it may call console_trylock_spinning(), which attempts to optimistically spin,
deadlocking the panic CPU:

CPU 0 (panic CPU)             CPU 1
-----------------             ------
                              printk() {
                                vprintk_func() {
                                  vprintk_default() {
                                    vprintk_emit() {
                                      console_unlock() {
                                        console_lock_spinning_enable();
                                        ... printing to console ...
panic() {
  crash_smp_send_stop() {
    NMI  -------------------> HALT
  }
  atomic_notifier_call_chain() {
    printk() {
      ...
      console_trylock_spinning() {
        // optimistic spin infinitely

This hang during panic can be induced when a kdump kernel is loaded and
crash_kexec_post_notifiers=1 is present on the kernel command line. The
following script, which concurrently writes to /dev/kmsg and triggers a
panic, can result in this hang:

    #!/bin/bash
    date
    # 991 chars (based on log buffer size):
    chars="$(printf 'a%.0s' {1..991})"
    while :; do
        echo $chars > /dev/kmsg
    done &
    echo c > /proc/sysrq-trigger &
    date
    exit

To avoid this deadlock, ensure that console_trylock_spinning() does not
allow spinning once a panic has begun.

Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220202171821.179394-3-stephen.s.brennan@oracle.com
kernel/printk/printk.c

index f04bbed..e83c127 100644
@@ -1847,6 +1847,16 @@ static int console_trylock_spinning(void)
        if (console_trylock())
                return 1;
 
+       /*
+        * It's unsafe to spin once a panic has begun. If we are the
+        * panic CPU, we may have already halted the owner of the
+        * console_sem. If we are not the panic CPU, then we should
+        * avoid taking console_sem, so the panic CPU has a better
+        * chance of cleanly acquiring it later.
+        */
+       if (panic_in_progress())
+               return 0;
+
        printk_safe_enter_irqsave(flags);
 
        raw_spin_lock(&console_owner_lock);
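
The effect of the added check can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the atomics below are hypothetical stand-ins for console_sem, the console owner state, and the panic CPU marker; the max_spins bound and the check_panic switch exist only so the model terminates and can contrast the behavior with and without the check. A return value of -1 marks the case where the real code would have spun forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PANIC_CPU_INVALID (-1)

/* Hypothetical stand-ins for kernel state (assumptions for this model). */
static atomic_int  panic_cpu = PANIC_CPU_INVALID; /* CPU that called panic() */
static atomic_bool console_locked = false;        /* models console_sem */
static atomic_bool owner_spinning_enabled = false;/* owner accepts a waiter */

/* Model: a panic has begun once some CPU has claimed panic_cpu. */
static bool panic_in_progress(void)
{
	return atomic_load(&panic_cpu) != PANIC_CPU_INVALID;
}

/* Model: take the console lock only if it is currently free. */
static bool console_trylock(void)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&console_locked, &expected, true);
}

/*
 * Model of console_trylock_spinning().
 * Returns 1 if the lock was obtained, 0 if the caller gave up,
 * and -1 if the spin budget ran out (an endless spin in real life).
 */
static int console_trylock_spinning(long max_spins, bool check_panic)
{
	if (console_trylock())
		return 1;

	/* The fix: never wait for a handover once a panic has begun. */
	if (check_panic && panic_in_progress())
		return 0;

	/* No owner willing to hand over the lock: nothing to wait for. */
	if (!atomic_load(&owner_spinning_enabled))
		return 0;

	/* Wait for the owner to hand over; a halted owner never will. */
	for (long i = 0; i < max_spins; i++) {
		if (!atomic_load(&owner_spinning_enabled))
			return 1;
	}
	return -1;
}
```

With the lock held by a halted owner that left spinning enabled, the model gives up immediately when the panic check is active and exhausts its spin budget when it is not, which mirrors the hang shown in the diagram in the commit message.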