habanalabs: add dedicated define for hard reset
author    Omer Shpigelman <oshpigelman@habana.ai>
          Sat, 9 May 2020 09:18:01 +0000 (12:18 +0300)
committer Oded Gabbay <oded.gabbay@gmail.com>
          Tue, 19 May 2020 11:48:41 +0000 (14:48 +0300)

Gaudi requires longer waiting during reset due to closing of network ports.
Add this explanation to the relevant comment in the code and add a
dedicated define for this reset timeout period, instead of multiplying
another define.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
drivers/misc/habanalabs/device.c
drivers/misc/habanalabs/habanalabs.h

diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
index c89157dafa33740307ce0acce1df1510a0cc6f15..f618cff9a1674886a629604b9ea9e9d4c7416877 100644
@@ -1326,11 +1326,12 @@ void hl_device_fini(struct hl_device *hdev)
         * This function is competing with the reset function, so try to
         * take the reset atomic and if we are already in middle of reset,
         * wait until reset function is finished. Reset function is designed
-        * to always finish (could take up to a few seconds in worst case).
+        * to always finish. However, in Gaudi, because of all the network
+        * ports, the hard reset could take between 10-30 seconds
         */
 
        timeout = ktime_add_us(ktime_get(),
-                               HL_PENDING_RESET_PER_SEC * 1000 * 1000 * 4);
+                               HL_HARD_RESET_MAX_TIMEOUT * 1000 * 1000);
        rc = atomic_cmpxchg(&hdev->in_reset, 0, 1);
        while (rc) {
                usleep_range(50, 200);
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index cfb306daa8d4791998cc22cfed6883dbc73b854e..d77410886a673f434cb3bd5dda776c991455a88e 100644
@@ -25,6 +25,8 @@
 
 #define HL_PENDING_RESET_PER_SEC       30
 
+#define HL_HARD_RESET_MAX_TIMEOUT      120
+
 #define HL_DEVICE_TIMEOUT_USEC         1000000 /* 1 s */
 
 #define HL_HEARTBEAT_PER_USEC          5000000 /* 5 s */
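
The change above does not alter the wait budget: the old expression, HL_PENDING_RESET_PER_SEC * 4, already came to 120 seconds. The following is a minimal user-space sketch of that arithmetic and of the cmpxchg semantics that the loop in hl_device_fini() relies on. It is not driver code. The helper names hard_reset_timeout_usec() and try_take_reset() are hypothetical, the local USEC_PER_SEC macro is defined here for illustration, and C11 atomics stand in for the kernel's atomic_cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Values copied from the habanalabs.h hunk above. */
#define HL_PENDING_RESET_PER_SEC	30
#define HL_HARD_RESET_MAX_TIMEOUT	120

/* Local illustration macro (assumption: defined here, not taken from the driver). */
#define USEC_PER_SEC	1000000ULL

/* Compile-time check that the new define keeps the old 30 s * 4 wait. */
_Static_assert(HL_HARD_RESET_MAX_TIMEOUT == HL_PENDING_RESET_PER_SEC * 4,
	       "dedicated define must preserve the previous timeout");

/* Hypothetical helper: the wait budget in microseconds, as fed to ktime_add_us(). */
static inline uint64_t hard_reset_timeout_usec(void)
{
	return HL_HARD_RESET_MAX_TIMEOUT * USEC_PER_SEC;
}

/*
 * Hypothetical user-space stand-in for atomic_cmpxchg(&hdev->in_reset, 0, 1).
 * Returns the previous value of the flag: 0 means the caller now owns it,
 * non-zero means a reset is already in flight and the caller must keep polling.
 */
static inline int try_take_reset(atomic_int *in_reset)
{
	int expected = 0;

	/* On failure, 'expected' is overwritten with the current flag value. */
	atomic_compare_exchange_strong(in_reset, &expected, 1);
	return expected;
}
```

With these helpers, a first call to try_take_reset() on a zeroed flag returns 0, and any later call returns 1 until the flag is cleared, which is why the driver loop keeps sleeping while rc is non-zero and gives up only once the 120 second deadline has passed.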