The ack timeout retransmission time is affected by two factors: the
packet lifetime and the HCA processing time. The default packet lifetime
(CMA_IBOE_PACKET_LIFETIME) is currently 18, which means the minimum ack
timeout is about 2 seconds (2^(18+1) * 4 us = 2.097 seconds). The packet
lifetime is the maximum transmission time of a packet on the network,
and 2 seconds is too long for that.
Assume the network is a Clos topology with three layers, so every packet
passes through five switch hops. Assume each switch has a 128 MB buffer
and the port transmission rate is 25 Gbit/s. The maximum transmission
time of a packet is then about 200 ms (128 MB * 5 / 25 Gbit/s). Doubling
that for redundancy still gives less than 500 ms.
So change CMA_IBOE_PACKET_LIFETIME to 16. By the same formula, the
maximum transmission time of a packet will then be a little over 500 ms
(2^(16+1) * 4 us = 524 ms), which is long enough.
Link: https://lore.kernel.org/r/20221125010026.755-1-lengchao@huawei.com
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
#define CMA_CM_RESPONSE_TIMEOUT 20
#define CMA_MAX_CM_RETRIES 15
#define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24)
-#define CMA_IBOE_PACKET_LIFETIME 18
+#define CMA_IBOE_PACKET_LIFETIME 16
#define CMA_PREFERRED_ROCE_GID_TYPE IB_GID_TYPE_ROCE_UDP_ENCAP
static const char * const cma_events[] = {