RDMA/rxe: Fix incomplete state save in rxe_requester
authorBob Pearson <rpearsonhpe@gmail.com>
Fri, 21 Jul 2023 20:07:49 +0000 (15:07 -0500)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Wed, 13 Sep 2023 07:42:52 +0000 (09:42 +0200)
commit70518f3aaf5a059b691867d7d2d46b999319656a
tree0ec5f80374da640b138cddb78848715e55d51d52
parent59a4f61feccf9c5518c76d6efd1742694040d1d5
RDMA/rxe: Fix incomplete state save in rxe_requester

[ Upstream commit 5d122db2ff80cd2aed4dcd630befb56b51ddf947 ]

If a send packet is dropped by the IP layer in rxe_requester()
the call to rxe_xmit_packet() can fail with err == -EAGAIN.
To recover, the state of the wqe is restored to the state before
the packet was sent so it can be resent. However, the routines
that save and restore the state miss a significnt part of the
variable state in the wqe, the dma struct which is used to process
through the sge table. And, the state is not saved before the packet
is built which modifies the dma struct.

Under heavy stress testing with many QPs on a fast node sending
large messages to a slow node dropped packets are observed and
the resent packets are corrupted because the dma struct was not
restored. This patch fixes this behavior and allows the test cases
to succeed.

Fixes: 3050b9985024 ("IB/rxe: Fix race condition between requester and completer")
Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/infiniband/sw/rxe/rxe_req.c