cpu/hotplug: Prevent self deadlock on CPU hot-unplug
authorThomas Gleixner <tglx@linutronix.de>
Wed, 23 Aug 2023 08:47:02 +0000 (10:47 +0200)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Wed, 13 Sep 2023 07:43:00 +0000 (09:43 +0200)
commitfea9dd8653ff39ce383c54e747bde4c39289b4ad
treed0a23025a494f22dc84fd1796cbf883ecdeb8c8e
parent4245ca8f4051459beebb70f189329b670efa32a1
cpu/hotplug: Prevent self deadlock on CPU hot-unplug

commit 2b8272ff4a70b866106ae13c36be7ecbef5d5da2 upstream.

Xiongfeng reported and debugged a self deadlock of the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.

    CPU1                          CPU2

T1 sets cfs_quota
   starts hrtimer cfs_bandwidth 'period_timer'
T1 is migrated to CPU2
T1 initiates offlining of CPU1
Hotplug operation starts
  ...
'period_timer' expires and is re-enqueued on CPU1
  ...
take_cpu_down()
  CPU1 shuts down and does not handle timers
  anymore. They have to be migrated in the
  post dead hotplug steps by the control task.

T1 runs the post dead offline operation
       T1 is scheduled out
T1 waits for 'period_timer' to expire

T1 waits there forever if it is scheduled out before it can execute the hrtimer
offline callback hrtimers_dead_cpu().

Cure this by delegating the hotplug control operation to a worker thread on
an online CPU. This takes the initiating user space task, which might be
affected by the bandwidth timer, completely out of the picture.

Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernel/cpu.c