IB/qib: Fix panic in RC error flushing logic
The following panic can occur when flushing a QP:
RIP: 0010:[<ffffffffa0168e8b>]  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]
RSP: 0018:ffff8803cdc6fc90  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8803d84ba000 RCX: 0000000000000000
RDX: 0000000000000005 RSI: ffffc90015a53430 RDI: ffff8803d84ba000
RBP: ffff8803cdc6fce0 R08: ffff8803cdc6fc90 R09: 0000000000000001
R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8803d84ba0c0
R13: ffff8803d84ba5cc R14: 0000000000000800 R15: 0000000000000246
FS:  0000000000000000(0000) GS:ffff880036600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000034 CR3: 00000003e44f9000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qib/0 (pid: 1350, threadinfo ffff8803cdc6e000, task ffff88042728a100)
Stack:
 53544c5553455201 0000000100000005 0000000000000000 ffff8803d84ba000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000001 ffff8803cdc6fd30 ffffffffa0165d7a
Call Trace:
 [<ffffffffa0165d7a>] qib_make_rc_req+0x36a/0xe80 [ib_qib]
 [<ffffffffa0165a10>] ? qib_make_rc_req+0x0/0xe80 [ib_qib]
 [<ffffffffa01698b3>] qib_do_send+0xf3/0xb60 [ib_qib]
 [<ffffffff814db757>] ? thread_return+0x4e/0x777
 [<ffffffffa01697c0>] ? qib_do_send+0x0/0xb60 [ib_qib]
 [<ffffffff81088bf0>] worker_thread+0x170/0x2a0
 [<ffffffff8108e530>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81088a80>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108e1c6>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108e130>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
RIP  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]
The RC error state flush logic in qib_make_rc_req() could return all
of the acked wqes and potentially empty the queue. It would then
unconditionally try to return a flush completion via
qib_send_complete() for an invalid wqe or, worse, a valid one that is
not queued. The panic results when the completion code tries to
maintain an MR reference count for a NULL MR.
This fix modifies the logic to send only one completion per
qib_make_rc_req() call, changing the completion status from
IB_WC_SUCCESS to IB_WC_WR_FLUSH_ERR as the completions progress.
The outer loop will call qib_make_rc_req() as many times as necessary
to flush the queue.
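The one-completion-per-call scheme described above can be illustrated
with a small user-space model. This is only a sketch under stated
assumptions, not the driver's code: struct toy_qp, toy_send_complete(),
toy_flush_one() and toy_flush_all() are hypothetical names, and the
queue is reduced to three ring indices (oldest uncompleted entry,
oldest unacked entry, next free slot).

```c
#include <stdio.h>

/* Hypothetical stand-ins for IB_WC_SUCCESS / IB_WC_WR_FLUSH_ERR. */
enum wc_status { WC_SUCCESS, WC_WR_FLUSH_ERR };

#define QSIZE 8

/* Toy send queue: entries in [last, acked) were acked by the peer,
 * entries in [acked, head) were posted but never acked. */
struct toy_qp {
	unsigned last;   /* oldest entry not yet completed */
	unsigned acked;  /* oldest entry not yet acked */
	unsigned head;   /* next free slot */
	enum wc_status done[QSIZE]; /* completions, in order */
	unsigned ndone;
};

/* Record one completion and retire the entry at 'last'. If that
 * entry was also the oldest unacked one, move 'acked' along too. */
static void toy_send_complete(struct toy_qp *qp, enum wc_status st)
{
	unsigned old_last = qp->last;

	qp->done[qp->ndone++] = st;
	qp->last = (old_last + 1) % QSIZE;
	if (qp->acked == old_last)
		qp->acked = qp->last;
}

/* At most one completion per call. Returns 0 when the queue is
 * empty, so no completion is ever generated for a missing entry. */
static int toy_flush_one(struct toy_qp *qp)
{
	if (qp->last == qp->head)
		return 0;
	toy_send_complete(qp, qp->last != qp->acked ?
			  WC_SUCCESS : WC_WR_FLUSH_ERR);
	return 1;
}

/* Models the outer loop: call until nothing is left to flush. */
static unsigned toy_flush_all(struct toy_qp *qp)
{
	unsigned calls = 0;

	while (toy_flush_one(qp))
		calls++;
	return calls;
}
```

In this model, a queue whose entries have all been acked yields only
success completions and then stops, which corresponds to the case that
previously led to the completion for an invalid wqe.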
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>