driver core: fix deadlock in __device_attach
authorZhang Wensheng <zhangwensheng5@huawei.com>
Wed, 18 May 2022 07:45:16 +0000 (15:45 +0800)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Tue, 14 Jun 2022 16:36:09 +0000 (18:36 +0200)
commitdf6de52b80aa3b46f5ac804412355ffe2e1df93e
treeedb79cf2f6fe2abea6ce500f076a03219f5cf5de
parentcdf1a683a01583bca4b618dd16223cbd6e462e21
driver core: fix deadlock in __device_attach

[ Upstream commit b232b02bf3c205b13a26dcec08e53baddd8e59ed ]

In __device_attach function, The lock holding logic is as follows:
...
__device_attach
device_lock(dev)      // get lock dev
  async_schedule_dev(__device_attach_async_helper, dev); // func
    async_schedule_node
      async_schedule_node_domain(func)
        entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
/* when fail or work limit, sync to execute func, but
   __device_attach_async_helper will get lock dev as
   well, which will lead to A-A deadlock.  */
if (!entry || atomic_read(&entry_count) > MAX_WORK) {
  func;
else
  queue_work_node(node, system_unbound_wq, &entry->work)
  device_unlock(dev)

As shown above, when it is allowed to do async probes, because of
out of memory or work limit, async work is not allowed, to do
sync execute instead. it will lead to A-A deadlock because of
__device_attach_async_helper getting lock dev.

To fix the deadlock, move the async_schedule_dev outside device_lock,
as we can see, in async_schedule_node_domain, the parameter of
queue_work_node is system_unbound_wq, so it can accept concurrent
operations. which will also not change the code logic, and will
not lead to deadlock.

Fixes: 765230b5f084 ("driver-core: add asynchronous probing support for drivers")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Link: https://lore.kernel.org/r/20220518074516.1225580-1-zhangwensheng5@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/base/dd.c