RDMA/rtrs-clt: Avoid run destroy_con_cq_qp/create_con_cq_qp in parallel
author Jack Wang <jinpu.wang@cloud.ionos.com>
Fri, 23 Oct 2020 07:43:44 +0000 (09:43 +0200)
committer Jason Gunthorpe <jgg@nvidia.com>
Wed, 28 Oct 2020 16:17:39 +0000 (13:17 -0300)
commit fcf2959da6a74e71a85ab666e732fa1ed4da2c9a
tree d5cab1815b66691f84d91c54931c4ed6d9a933d8
parent 73385fdbc43df2e9ba07d4a459d6e0e2110ad2d8

Two kworkers can race with each other:

        CPU0                             CPU1
    addr_resolver kworker           reconnect kworker
    rtrs_clt_rdma_cm_handler
    rtrs_rdma_addr_resolved
    create_con_cq_qp: s.dev_ref++
    "s.dev_ref is 1"
                                    wait in create_cm fails with TIMEOUT
                                    destroy_con_cq_qp: --s.dev_ref
                                    "s.dev_ref is 0"
                                    destroy_con_cq_qp: sess->s.dev = NULL
    rtrs_cq_qp_create -> create_qp(con, sess->dev->ib_pd...)
    sess->dev is NULL, panic.

Fix the problem by using a mutex to serialize create_con_cq_qp and
destroy_con_cq_qp.
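
The serialization idea can be sketched in userspace with POSIX threads.
This is only an analogy under stated assumptions, not the kernel patch:
every name below (demo_con, demo_create, demo_destroy, run_race_demo) is
hypothetical, and pthread_mutex_t stands in for the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device and the connection. */
struct demo_dev { int pd; };

struct demo_con {
	pthread_mutex_t lock;   /* serializes create and destroy */
	struct demo_dev *dev;   /* NULL while no reference is held */
	int dev_ref;
	int qp_created;
	int last_pd;            /* value read through dev by the last create */
};

static struct demo_dev the_dev = { .pd = 42 };

/* Take a device reference, then dereference the device, all under the lock. */
static int demo_create(struct demo_con *con)
{
	int pd;

	pthread_mutex_lock(&con->lock);
	if (con->dev_ref++ == 0)
		con->dev = &the_dev;
	/* Without the lock, a concurrent destroy could clear dev right here. */
	assert(con->dev != NULL);
	pd = con->dev->pd;
	con->qp_created = 1;
	pthread_mutex_unlock(&con->lock);
	return pd;
}

/* Drop the reference only if create completed; clear dev at zero. */
static void demo_destroy(struct demo_con *con)
{
	pthread_mutex_lock(&con->lock);
	if (con->qp_created) {
		con->qp_created = 0;
		if (--con->dev_ref == 0)
			con->dev = NULL;
	}
	pthread_mutex_unlock(&con->lock);
}

static void *create_worker(void *arg)
{
	struct demo_con *con = arg;

	con->last_pd = demo_create(con);
	return NULL;
}

static void *destroy_worker(void *arg)
{
	demo_destroy(arg);
	return NULL;
}

/* Race create against destroy repeatedly; return 0 if state stays sane. */
int run_race_demo(int iterations)
{
	struct demo_con con = { .dev = NULL };
	int i;

	pthread_mutex_init(&con.lock, NULL);
	for (i = 0; i < iterations; i++) {
		pthread_t a, b;

		con.last_pd = -1;
		if (pthread_create(&a, NULL, create_worker, &con))
			return -1;
		if (pthread_create(&b, NULL, destroy_worker, &con)) {
			pthread_join(a, NULL);
			return -1;
		}
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (con.last_pd != 42)
			return -1;
		demo_destroy(&con); /* tear down whatever the race left behind */
		if (con.dev_ref != 0 || con.dev != NULL)
			return -1;
	}
	pthread_mutex_destroy(&con.lock);
	return 0;
}
```

Calling run_race_demo() from a main() exercises both orderings: either
destroy runs first and finds nothing to tear down, or create finishes
before destroy drops the reference. In neither case is dev cleared in the
middle of create.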

Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20201023074353.21946-4-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
drivers/infiniband/ulp/rtrs/rtrs-clt.c
drivers/infiniband/ulp/rtrs/rtrs-clt.h