There is a double completion associated with error handling for RC QPs.
The sequence is:
- do_rc_ack() receives an RNR NAK while the QP is configured with 0
rnr_retries, so the QP is moved to the error state.
- qib_error_qp() stops the pending timer
- qib_rc_send_complete() is called from sdma_complete()
- qib_rc_send_complete() starts the timer because the msb of the psn
just completed says an ack is needed.
- ipoib keeps posting WQEs to the QP, now in error, and each is flushed
- rc_timeout() calls qib_restart_rc()
- qib_restart_rc() calls qib_send_complete() with an
IB_WC_RETRY_EXC_ERR on a wqe that has already been completed
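The fix avoids starting the timer unless the QP state still allows
receive processing (QIB_PROCESS_RECV_OK), since on an error'ed QP the
ack the timer would wait for will never arrive.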
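As a rough illustration only, the guard added by the fix can be modelled outside the driver as below. Every model_* and MODEL_* name is a hypothetical stand-in, not the qib driver's definition, the flag values are illustrative, and the real condition also checks QIB_S_WAIT_RNR and QIB_S_WAIT_PSN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for driver definitions; values are illustrative. */
enum { MODEL_QPS_RTS, MODEL_QPS_ERR, MODEL_QPS_COUNT };

#define MODEL_PROCESS_RECV_OK 0x01u      /* state may still process receives */
#define MODEL_REQ_ACK         (1u << 31) /* msb of the PSN word: ack requested */
#define MODEL_S_TIMER         0x02u      /* retry timer already running */

/* Per-state capability table, in the spirit of ib_qib_state_ops[]. */
static const unsigned model_state_ops[MODEL_QPS_COUNT] = {
    [MODEL_QPS_RTS] = MODEL_PROCESS_RECV_OK,
    [MODEL_QPS_ERR] = 0,
};

struct model_qp {
    int state;        /* index into model_state_ops */
    unsigned s_flags; /* MODEL_S_* flags */
    unsigned s_acked; /* oldest unacked request */
    unsigned s_tail;  /* next request to be posted */
};

/*
 * Start the timer only if an ack was requested, requests are still
 * outstanding, no timer is running, and the QP can still receive the ack.
 */
static bool model_should_start_timer(const struct model_qp *qp, uint32_t psn)
{
    return (psn & MODEL_REQ_ACK) &&
           qp->s_acked != qp->s_tail &&
           !(qp->s_flags & MODEL_S_TIMER) &&
           (model_state_ops[qp->state] & MODEL_PROCESS_RECV_OK);
}
```

In this model a QP in MODEL_QPS_ERR never arms the timer, so the later timeout and retry path that produced the second completion cannot be reached.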
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* there are still requests that haven't been acked.
*/
if ((psn & IB_BTH_REQ_ACK) && qp->s_acked != qp->s_tail &&
- !(qp->s_flags & (QIB_S_TIMER | QIB_S_WAIT_RNR | QIB_S_WAIT_PSN)))
+ !(qp->s_flags & (QIB_S_TIMER | QIB_S_WAIT_RNR | QIB_S_WAIT_PSN)) &&
+ (ib_qib_state_ops[qp->state] & QIB_PROCESS_RECV_OK))
start_timer(qp);
while (qp->s_last != qp->s_acked) {