iavf: fix hang on reboot with ice

When a system with an E810 adapter and existing VFs is rebooted, the
following hang may be observed.

PID 1 is hung in iavf_remove(), part of a network driver:
 PID: 1  TASK: ffff965400e5a340  CPU: 24  COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot, the PM shutdown callbacks of all drivers are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However, iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case, it sleeps forever.

So if iavf_shutdown() gets invoked before iavf_remove(), the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().
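
The ordering problem and the early return can be illustrated with a
small user-space model. This is only a sketch under stated assumptions,
not the driver code: the model_* names, the struct layout and the
max_polls bound (which stands in for the endless sleep so the example
terminates) are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the adapter states named above. */
enum model_state {
	MODEL_RUNNING,
	MODEL_DOWN,
	MODEL_INIT_FAILED,
	MODEL_REMOVE,
};

struct model_adapter {
	enum model_state state;
};

/* Models the shutdown callback: it only marks the adapter as removed. */
static void model_shutdown(struct model_adapter *a)
{
	a->state = MODEL_REMOVE;
}

/* The remove path only proceeds from one of these three states. */
static bool model_state_allows_remove(enum model_state s)
{
	return s == MODEL_RUNNING || s == MODEL_DOWN ||
	       s == MODEL_INIT_FAILED;
}

/*
 * Models the remove path. Returns the number of polls spent waiting,
 * or -1 if max_polls was exhausted (the model's equivalent of the hang).
 * with_fix enables the early return described in this commit message.
 */
static int model_remove(struct model_adapter *a, bool with_fix, int max_polls)
{
	int polls = 0;

	/* Shutdown already ran, so there is nothing left to wait for. */
	if (with_fix && a->state == MODEL_REMOVE)
		return 0;

	while (!model_state_allows_remove(a->state)) {
		if (++polls >= max_polls)
			return -1;
		/* The real code sleeps here (usleep_range in the trace). */
	}

	a->state = MODEL_REMOVE;
	return polls;
}
```

In this model, calling model_shutdown() followed by model_remove()
without the early return exhausts max_polls, because nothing can move
the state out of MODEL_REMOVE. With the early return it exits at once.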

Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <mcornea@redhat.com>
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>