This patch fixes issue in which we have timeout for multi-CS although
the CS in the list actually completed.
Example scenario (the two threads marked as WAIT for the thread that
handles the wait_for_multi_cs and CMPL as the thread that signal
completion for both CS and multi-CS):
1. Submit CS with sequence X
2. [WAIT]: call wait_for_multi_cs with single CS X
3. [CMPL]: CS X do invoke complete_all for both CS and multi-CS
(multi_cs_completion_done still false)
4. [WAIT]: enter poll_fences, reinit the completion and find the CS
as completed when asking on the fence but multi_cs_done is
still false it returns that no CS actually completed
5. [CMPL]: set multi_cs_handling_done as true
6. [WAIT]: wait for completion but no CS to awake the wait context
and hence wait till timeout
Solution: if CS detected as completed in poll_fences but multi_cs_done
is still false invoke complete_all to the multi-CS completion
and so it will not go to sleep in wait_for_completion but
rather will have a "second chance" to wait for
multi_cs_completion_done.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* returns to user indicating CS completed before it finished
* all of its mcs handling, to avoid race the next time the
* user waits for mcs.
+ * note: when reaching this case fence is definitely not NULL
+ * but NULL check was added to overcome static analysis
*/
- if (!fence->mcs_handling_done)
+ if (fence && !fence->mcs_handling_done) {
+ /*
+ * in case multi CS is completed but MCS handling not done
+ * we "complete" the multi CS to prevent it from waiting
+ * until time-out and the "multi-CS handling done" will have
+ * another chance at the next iteration
+ */
+ complete_all(&mcs_compl->completion);
break;
+ }
mcs_data->completion_bitmap |= BIT(i);
/*