powerpc/time: Always set decrementer in timer_interrupt()
authorMichael Ellerman <mpe@ellerman.id.au>
Wed, 20 Apr 2022 14:16:57 +0000 (00:16 +1000)
committerMichael Ellerman <mpe@ellerman.id.au>
Thu, 21 Apr 2022 06:10:56 +0000 (16:10 +1000)
This is a partial revert of commit 0faf20a1ad16 ("powerpc/64s/interrupt:
Don't enable MSR[EE] in irq handlers unless perf is in use").

Prior to that commit, we always set the decrementer in
timer_interrupt(), to clear the timer interrupt. Otherwise we could end
up continuously taking timer interrupts.

When high res timers are enabled there is no problem seen with leaving
the decrementer untouched in timer_interrupt(), because it will be
programmed via hrtimer_interrupt() -> tick_program_event() ->
clockevents_program_event() -> decrementer_set_next_event().

However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we
see a stall/lockup, because tick_nohz_handler() does not cause a
reprogram of the decrementer, leading to endless timer interrupts.
Example trace:

  [    1.898617][    T7] Freeing initrd memory: 2624K^M
  [   22.680919][    C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:^M
  [   22.682281][    C1] rcu:     0-....: (25 ticks this GP) idle=073/0/0x1 softirq=10/16 fqs=1050 ^M
  [   22.682851][    C1]  (detected by 1, t=2102 jiffies, g=-1179, q=476)^M
  [   22.683649][    C1] Sending NMI from CPU 1 to CPUs 0:^M
  [   22.685252][    C0] NMI backtrace for cpu 0^M
  [   22.685649][    C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc2-00185-g0faf20a1ad16 #145^M
  [   22.686393][    C0] NIP:  c000000000016d64 LR: c000000000f6cca4 CTR: c00000000019c6e0^M
  [   22.686774][    C0] REGS: c000000002833590 TRAP: 0500   Not tainted  (5.16.0-rc2-00185-g0faf20a1ad16)^M
  [   22.687222][    C0] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24000222  XER: 00000000^M
  [   22.688297][    C0] CFAR: c00000000000c854 IRQMASK: 0 ^M
  ...
  [   22.692637][    C0] NIP [c000000000016d64] arch_local_irq_restore+0x174/0x250^M
  [   22.694443][    C0] LR [c000000000f6cca4] __do_softirq+0xe4/0x3dc^M
  [   22.695762][    C0] Call Trace:^M
  [   22.696050][    C0] [c000000002833830] [c000000000f6cc80] __do_softirq+0xc0/0x3dc (unreliable)^M
  [   22.697377][    C0] [c000000002833920] [c000000000151508] __irq_exit_rcu+0xd8/0x130^M
  [   22.698739][    C0] [c000000002833950] [c000000000151730] irq_exit+0x20/0x40^M
  [   22.699938][    C0] [c000000002833970] [c000000000027f40] timer_interrupt+0x270/0x460^M
  [   22.701119][    C0] [c0000000028339d0] [c0000000000099a8] decrementer_common_virt+0x208/0x210^M

Possibly this should be fixed in the lowres timing code, but that would
be a generic change and could take some time and may not backport
easily, so for now make the programming of the decrementer unconditional
again in timer_interrupt() to avoid the stall/lockup.

Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use")
Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20220420141657.771442-1-mpe@ellerman.id.au
arch/powerpc/kernel/time.c

index f5cbfe5efd25fdfe2236f40194388511c7f320a4..f80cce0e38994538dcef03bd2dec3f0e03ffa716 100644 (file)
@@ -615,23 +615,22 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
                return;
        }
 
-       /* Conditionally hard-enable interrupts. */
-       if (should_hard_irq_enable()) {
-               /*
-                * Ensure a positive value is written to the decrementer, or
-                * else some CPUs will continue to take decrementer exceptions.
-                * When the PPC_WATCHDOG (decrementer based) is configured,
-                * keep this at most 31 bits, which is about 4 seconds on most
-                * systems, which gives the watchdog a chance of catching timer
-                * interrupt hard lockups.
-                */
-               if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
-                       set_dec(0x7fffffff);
-               else
-                       set_dec(decrementer_max);
+       /*
+        * Ensure a positive value is written to the decrementer, or
+        * else some CPUs will continue to take decrementer exceptions.
+        * When the PPC_WATCHDOG (decrementer based) is configured,
+        * keep this at most 31 bits, which is about 4 seconds on most
+        * systems, which gives the watchdog a chance of catching timer
+        * interrupt hard lockups.
+        */
+       if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
+               set_dec(0x7fffffff);
+       else
+               set_dec(decrementer_max);
 
+       /* Conditionally hard-enable interrupts. */
+       if (should_hard_irq_enable())
                do_hard_irq_enable();
-       }
 
 #if defined(CONFIG_PPC32) && defined(CONFIG_PPC_PMAC)
        if (atomic_read(&ppc_n_lost_interrupts) != 0)