RDMA/rxe: Handle remote errors in the midst of a Read reply sequence
author Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Thu, 13 Oct 2022 01:47:24 +0000 (10:47 +0900)
committer Leon Romanovsky <leon@kernel.org>
Tue, 25 Oct 2022 05:56:32 +0000 (08:56 +0300)
A requesting node does not correctly handle an error that is reported in
the middle of a multi-packet Read response sequence; instead, it tries to
resend the request endlessly. Let the completer terminate the connection
in that case.
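
As a rough illustration of the decision this patch adds to check_psn(), the following standalone sketch models only the patched branch. The enum values and the function name classify_old_psn() are hypothetical stand-ins chosen for this example, not the kernel's definitions; the real code operates on struct rxe_qp, struct rxe_pkt_info and struct rxe_send_wqe.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the IB opcodes involved (not kernel names). */
enum opcode {
	OP_ACKNOWLEDGE,
	OP_READ_RESP_FIRST,
	OP_READ_RESP_MIDDLE,
	OP_READ_RESP_LAST,
	OP_OTHER
};

/* Hypothetical stand-ins for the completer states returned by the branch. */
enum comp_state {
	ST_COMP_ACK,
	ST_CHECK_ACK,
	ST_DONE
};

/*
 * Simplified model of the branch patched in check_psn().
 * prev_opcode plays the role of qp->comp.opcode, i.e. the last response
 * opcode the completer has seen for the current request.
 */
static enum comp_state classify_old_psn(uint32_t pkt_psn,
					enum opcode pkt_opcode,
					uint32_t wqe_last_psn,
					enum opcode prev_opcode)
{
	/* Pre-existing case: the packet matches the end of the WQE. */
	if (pkt_psn == wqe_last_psn)
		return ST_COMP_ACK;

	/*
	 * New case: an ACK/NAK packet arrives while a multi-packet Read
	 * response is still in progress. Pass it on so its syndrome can be
	 * checked instead of silently dropping it.
	 */
	if (pkt_opcode == OP_ACKNOWLEDGE &&
	    (prev_opcode == OP_READ_RESP_FIRST ||
	     prev_opcode == OP_READ_RESP_MIDDLE))
		return ST_CHECK_ACK;

	/* Anything else is still ignored, as before the patch. */
	return ST_DONE;
}
```

In this model, an acknowledge packet that arrives after a FIRST or MIDDLE read response is routed to the ACK check rather than discarded, which is what allows the NAK code to be examined.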

Link: https://lore.kernel.org/r/20221013014724.3786212-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
drivers/infiniband/sw/rxe/rxe_comp.c

index fb0c008..c9170dd 100644
@@ -200,6 +200,10 @@ static inline enum comp_state check_psn(struct rxe_qp *qp,
                 */
                if (pkt->psn == wqe->last_psn)
                        return COMPST_COMP_ACK;
+               else if (pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE &&
+                        (qp->comp.opcode == IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST ||
+                         qp->comp.opcode == IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE))
+                       return COMPST_CHECK_ACK;
                else
                        return COMPST_DONE;
        } else if ((diff > 0) && (wqe->mask & WR_ATOMIC_OR_READ_MASK)) {
@@ -228,6 +232,10 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 
        case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST:
        case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
+               /* Check NAK code to handle a remote error */
+               if (pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE)
+                       break;
+
                if (pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE &&
                    pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST) {
                        /* read retries of partial data may restart from