IB/cm: Mark stale CM id's whenever the mad agent was unregistered
When a CM id object has a port assigned to it, the cm-id has asked to
send through that specific port. If that port was then removed
(hot-unplug event), the cm-id was not updated.
To fix this, each port keeps a list of all the cm-ids that plan to send
through it, and when the port is removed it marks all of them as
invalid.
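The idea can be sketched as a small userspace model. This is only an
illustration, not the ib_cm code: struct port, struct cm_id, port_track(),
port_remove() and cm_id_send() are hypothetical names, and the locking a
real implementation would need is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the ib_cm types. */
struct port;

struct cm_id {
	struct port *port;	/* port this id asked to send through */
	bool port_gone;		/* set once that port has been removed */
	struct cm_id *next;	/* link in the port's list of ids */
};

struct port {
	struct cm_id *ids;	/* every cm_id planning to use this port */
};

/* Record that `id` intends to send through `port`. */
static void port_track(struct port *port, struct cm_id *id)
{
	assert(port && id);
	id->port = port;
	id->port_gone = false;
	id->next = port->ids;
	port->ids = id;
}

/* Hot-unplug path: mark every tracked id as stale, then empty the list. */
static void port_remove(struct port *port)
{
	struct cm_id *id = port->ids;

	while (id) {
		struct cm_id *next = id->next;

		id->port_gone = true;
		id->port = NULL;
		id->next = NULL;
		id = next;
	}
	port->ids = NULL;
}

/* Send path: refuse to touch a port that no longer exists. */
static int cm_id_send(const struct cm_id *id)
{
	if (id->port_gone || !id->port)
		return -EINVAL;
	return 0;	/* a real sender would allocate and post a MAD here */
}
```

In this model a send attempted after port_remove() fails with -EINVAL
instead of dereferencing state that belonged to the removed port.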
This commit fixes a kernel panic that occurs when traffic is running
between guests and one guest is force-rebooted mid-traffic:
Call Trace:
[<ffffffff815271fa>] ? panic+0xa7/0x16f
[<ffffffff8152b534>] ? oops_end+0xe4/0x100
[<ffffffff8104a00b>] ? no_context+0xfb/0x260
[<ffffffff81084db2>] ? del_timer_sync+0x22/0x30
[<ffffffff8104a295>] ? __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff81084240>] ? process_timeout+0x0/0x10
[<ffffffff8104a363>] ? bad_area_nosemaphore+0x13/0x20
[<ffffffff8104aabf>] ? __do_page_fault+0x31f/0x480
[<ffffffff81065df0>] ? default_wake_function+0x0/0x20
[<ffffffffa0752675>] ? free_msg+0x55/0x70 [mlx5_core]
[<ffffffffa0753434>] ? cmd_exec+0x124/0x840 [mlx5_core]
[<ffffffff8105a924>] ? find_busiest_group+0x244/0x9f0
[<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
[<ffffffff8152a815>] ? page_fault+0x25/0x30
[<ffffffffa024da25>] ? cm_alloc_msg+0x35/0xc0 [ib_cm]
[<ffffffffa024e821>] ? ib_send_cm_dreq+0xb1/0x1e0 [ib_cm]
[<ffffffffa024f836>] ? cm_destroy_id+0x176/0x320 [ib_cm]
[<ffffffffa024fb00>] ? ib_destroy_cm_id+0x10/0x20 [ib_cm]
[<ffffffffa034f527>] ? ipoib_cm_free_rx_reap_list+0xa7/0x110 [ib_ipoib]
[<ffffffffa034f590>] ? ipoib_cm_rx_reap+0x0/0x20 [ib_ipoib]
[<ffffffffa034f5a5>] ? ipoib_cm_rx_reap+0x15/0x20 [ib_ipoib]
[<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
[<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
[<ffffffff8109aef6>] ? kthread+0x96/0xa0
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffff8109ae60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>