habanalabs: abort waiting user threads upon error
author Tomer Tayar <ttayar@habana.ai>
Tue, 8 Nov 2022 12:34:43 +0000 (14:34 +0200)
committer Oded Gabbay <ogabbay@kernel.org>
Thu, 26 Jan 2023 08:56:20 +0000 (10:56 +0200)
commit ce259804d22f16eac8e6e4757b48fe4d98e76cc6
tree 190f008ae0647ddc8df09b8df0b8f6caa2082c9f
parent cdacf3c0007e42feca23ad4021eaa2de5a589988

The user should close the FD upon being notified about an error, after
which a device reset takes place.

However, if the user has pending threads that wait for completions,
the device release won't be called and eventually the watchdog timeout
will expire, leading to a hard reset and killing the user process.

To avoid this, abort such waiting threads right after the error
notification, and block any subsequent wait operations.
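
The abort-and-block pattern described above can be illustrated with a
minimal userspace sketch using POSIX threads. This is not the driver
code from this commit: the names completion_ctx, ctx_wait and
ctx_abort_waiters, and the -ECANCELED return value, are hypothetical
and chosen only for illustration.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace analogue; NOT the habanalabs driver code. */
struct completion_ctx {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;	/* the awaited work has completed */
	bool aborted;	/* an error was reported; waits must fail */
};

static void ctx_init(struct completion_ctx *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	pthread_cond_init(&ctx->cond, NULL);
	ctx->done = false;
	ctx->aborted = false;
}

/*
 * Wait for completion. Returns 0 on completion, or -ECANCELED
 * (an assumed error code) if the wait was aborted. Once the abort
 * flag is set, new calls return immediately without sleeping.
 */
static int ctx_wait(struct completion_ctx *ctx)
{
	int rc;

	pthread_mutex_lock(&ctx->lock);
	while (!ctx->done && !ctx->aborted)
		pthread_cond_wait(&ctx->cond, &ctx->lock);
	rc = ctx->aborted ? -ECANCELED : 0;
	pthread_mutex_unlock(&ctx->lock);

	return rc;
}

/* Called right after the error notification: wake every waiter. */
static void ctx_abort_waiters(struct completion_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->aborted = true;	/* also blocks all later waits */
	pthread_cond_broadcast(&ctx->cond);
	pthread_mutex_unlock(&ctx->lock);
}

/* Thread entry point that returns the wait result via its exit value. */
static void *waiter_thread(void *arg)
{
	return (void *)(intptr_t)ctx_wait(arg);
}
```

In this sketch the abort flag is sticky, so a thread already sleeping
and a thread that calls ctx_wait() later both get the same error and
can proceed to release their resources.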

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
drivers/misc/habanalabs/common/command_submission.c
drivers/misc/habanalabs/common/device.c
drivers/misc/habanalabs/common/habanalabs.h