Alim Akhtar [Sun, 6 May 2018 10:14:16 +0000 (15:44 +0530)]
scsi: ufs: add quirk to disallow reset of interrupt aggregation
Some host controllers support interrupt aggregation but don't allow
resetting counter and timer in software.
Signed-off-by: Seungwon Jeon <essuuj@gmail.com>
Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Alim Akhtar [Sun, 6 May 2018 10:14:15 +0000 (15:44 +0530)]
scsi: ufs: add quirk to fix mishandling utrlclr/utmrlclr
In the expected behavior, writing '0' to a bit clears the corresponding
slot and writing '1' leaves it unchanged. If a host controller handles
this the other way round, UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR can be used.
[mkp: typo]
Signed-off-by: Seungwon Jeon <essuuj@gmail.com>
Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Reviewed-by: "Asutosh Das (asd)" <asutoshd@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
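The polarity difference described in the utrlclr/utmrlclr commit above can be sketched in plain C. This is a minimal userspace illustration, not the driver code: clear_mask() and its broken_clr argument are hypothetical stand-ins for the register write and for UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Value to write to a 32-bit "list clear" register in order to clear
 * slot `pos` (0..31). Hypothetical helper, for illustration only.
 *
 * Expected behavior: writing 0 clears a slot, 1 leaves it unchanged.
 * Quirky controller: the polarity is inverted.
 */
uint32_t clear_mask(unsigned int pos, bool broken_clr)
{
	uint32_t bit = UINT32_C(1) << pos;

	return broken_clr ? bit : ~bit;
}
```

For slot 3, a conforming controller would be sent every bit set except bit 3, while a controller needing the quirk would be sent only bit 3.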
Kees Cook [Wed, 2 May 2018 23:58:09 +0000 (16:58 -0700)]
scsi: ufs: ufshcd: Remove VLA usage
On the quest to remove all VLAs from the kernel[1] this moves buffers
off the stack. In the second instance, this collapses two separately
allocated buffers into a single buffer, since they are used
consecutively, which saves 256 bytes (QUERY_DESC_MAX_SIZE + 1) of stack
space.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Luc Van Oostenryck [Tue, 24 Apr 2018 13:15:58 +0000 (15:15 +0200)]
scsi: mptlan: Fix mpt_lan_sdu_send()'s return type
The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.
Fix this by returning 'netdev_tx_t' in this driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
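The return-type mismatch fixed in the mptlan commit above can be illustrated with a small userspace sketch. The enum below is a stand-in for the kernel's netdev_tx_t (the two values are assumed to mirror the kernel header), and net_device_ops_sketch and sdu_send_sketch() are hypothetical names, not the driver's.

```c
/* Userspace stand-in for the kernel typedef; values are illustrative. */
typedef enum {
	NETDEV_TX_OK   = 0x00,
	NETDEV_TX_BUSY = 0x10,
} netdev_tx_t;

/* Hypothetical ops table carrying the prototype the core expects. */
struct net_device_ops_sketch {
	netdev_tx_t (*ndo_start_xmit)(int queue_full);
};

/* Declared with netdev_tx_t rather than int, so the prototype matches. */
static netdev_tx_t sdu_send_sketch(int queue_full)
{
	return queue_full ? NETDEV_TX_BUSY : NETDEV_TX_OK;
}

static const struct net_device_ops_sketch ops_sketch = {
	.ndo_start_xmit = sdu_send_sketch,
};
```

Had sdu_send_sketch() been declared as returning int, the initializer would assign a function of a different type to the ndo_start_xmit pointer, which is the kind of mismatch a checker such as sparse reports.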
Wen Xiong [Wed, 9 May 2018 18:47:54 +0000 (13:47 -0500)]
scsi: ipr: new IOASC update
This patch adds a new adapter error log for P9 systems with the new AZ
SAS cable.
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Tue, 8 May 2018 21:54:35 +0000 (22:54 +0100)]
scsi: esas2r: fix spelling mistake: "requestss" -> "requests"
Trivial fix to spelling mistake in esas2r_debug message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Andrei Vagin [Thu, 22 Mar 2018 06:55:02 +0000 (23:55 -0700)]
scsi: target: target/file: Add support of direct and async I/O
There are two advantages:
* Direct I/O avoids the write-back cache, so it reduces the effect on
other processes in the system.
* Async I/O allows several commands to be handled concurrently.
DIO + AIO shows better performance for random write operations:
Mode: O_DSYNC Async: 1
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sda --runtime=20 --numjobs=2
WRITE: bw=45.9MiB/s (48.1MB/s), 21.9MiB/s-23.0MiB/s (22.0MB/s-25.2MB/s), io=919MiB (963MB), run=20002-20020msec
Mode: O_DSYNC Async: 0
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sdb --runtime=20 --numjobs=2
WRITE: bw=1607KiB/s (1645kB/s), 802KiB/s-805KiB/s (821kB/s-824kB/s), io=31.8MiB (33.4MB), run=20280-20295msec
Known issue:
DIF (PI) emulation doesn't work when a target uses async I/O, because
DIF metadata is saved in a separate file, and synchronizing writes to
two files, so that a following read operation always returns consistent
metadata for a specified block, is another non-trivial task.
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kees Cook [Wed, 2 May 2018 22:55:45 +0000 (15:55 -0700)]
scsi: libosd: Remove VLA usage
On the quest to remove all VLAs from the kernel[1] this rearranges the
code to avoid a VLA warning under -Wvla (gcc doesn't recognize "const"
variables as not triggering VLA creation). Additionally cleans up
variable naming to avoid 80 character column limit.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
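The remark in the libosd commit above, that gcc treats an array sized by a "const" variable as a VLA, can be shown with a short sketch. SKETCH_BUF_LEN and fill_fixed_buffer() are hypothetical names chosen for the example; the real driver code is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size; an enum constant is an integer constant expression. */
enum { SKETCH_BUF_LEN = 16 };

size_t fill_fixed_buffer(unsigned char fill)
{
	/*
	 * Writing "const size_t len = 16; unsigned char buf[len];" makes
	 * buf formally a VLA in C, so -Wvla warns. Using an enum constant
	 * (or a macro) as the bound gives an ordinary fixed-size array.
	 */
	unsigned char buf[SKETCH_BUF_LEN];

	memset(buf, fill, sizeof(buf));
	return sizeof(buf);
}
```

Because the bound is a constant expression, sizeof(buf) is evaluated at compile time and the stack usage is fixed.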
Zhu Lingshan [Wed, 2 May 2018 03:13:44 +0000 (11:13 +0800)]
scsi: tcmu: refactor nl wr_cache attr with new helpers
Use the new netlink event helpers tcmu_netlink_event_init() and
tcmu_netlink_event_send() to refactor the netlink event attribute
TCMU_ATTR_WRITECACHE (which belongs to TCMU_CMD_RECONFIG_DEVICE) and
corresponds to emulate_write_cache in configFS.
Remove tcmu_netlink_event(), since the new netlink event helpers
replace it.
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Zhu Lingshan [Wed, 2 May 2018 03:13:43 +0000 (11:13 +0800)]
scsi: tcmu: refactor nl dev_size attr with new helpers
Use the new netlink event helpers tcmu_netlink_event_init() and
tcmu_netlink_event_send() to refactor the netlink event attribute
TCMU_ATTR_DEV_SIZE (which belongs to TCMU_CMD_RECONFIG_DEVICE) and
corresponds to dev_size in configFS.
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Zhu Lingshan [Wed, 2 May 2018 03:13:42 +0000 (11:13 +0800)]
scsi: tcmu: refactor nl dev_cfg attr with new nl helpers
Use the new netlink event helpers tcmu_netlink_event_init() and
tcmu_netlink_event_send() to refactor the netlink event attribute
TCMU_ATTR_DEV_CFG (which belongs to TCMU_CMD_RECONFIG_DEVICE) and
corresponds to dev_config in configFS.
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Zhu Lingshan [Wed, 2 May 2018 03:13:41 +0000 (11:13 +0800)]
scsi: tcmu: refactor rm_device cmd with new nl helpers
Use the new netlink event helpers tcmu_netlink_event_init() and
tcmu_netlink_event_send() to refactor the netlink event
TCMU_CMD_REMOVED_DEVICE.
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Zhu Lingshan [Wed, 2 May 2018 03:13:40 +0000 (11:13 +0800)]
scsi: tcmu: refactor add_device cmd with new nl helpers
Use the new netlink event helpers tcmu_netlink_event_init() and
tcmu_netlink_event_send() to refactor the netlink event
TCMU_CMD_ADDED_DEVICE.
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Zhu Lingshan [Wed, 2 May 2018 03:13:39 +0000 (11:13 +0800)]
scsi: tcmu: add new netlink events helpers
Add new netlink event helpers tcmu_netlink_event_init() and
tcmu_netlink_event_send(). These new functions are intended to replace
the existing netlink event helper function tcmu_netlink_event().
The existing function tcmu_netlink_event() works well for events like
TCMU_ADDED_DEVICE and TCMU_REMOVED_DEVICE, which have only one netlink
attribute. But if a command requires more than one attribute, we have
to use a struct for the parameter reconfig_data. It is hard to make one
struct, or a union inside one struct, fit every command with different
attributes; it may get long and ugly.
With the two new functions, we can call tcmu_netlink_event_init() to
initialize a netlink event, add all the attributes we need using
nla_put_xxx(), and finally call tcmu_netlink_event_send() to send it
out. That way we don't need a long struct or union to send multiple
attributes for different commands.
[mkp: typos]
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
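The init / add attributes / send sequence described in the tcmu helpers commit above can be sketched as a small builder. This is a userspace illustration of the calling pattern only: every sketch_* name is hypothetical, and no real netlink or nla_put_xxx() call is made.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ATTRS 8

struct sketch_attr {
	int type;
	uint64_t value;
};

struct sketch_event {
	int cmd;
	size_t nattrs;
	struct sketch_attr attrs[SKETCH_MAX_ATTRS];
};

/* Step 1: start an event for one command. */
void sketch_event_init(struct sketch_event *ev, int cmd)
{
	ev->cmd = cmd;
	ev->nattrs = 0;
}

/* Step 2: append one attribute; returns 0 on success, -1 if full. */
int sketch_event_put(struct sketch_event *ev, int type, uint64_t value)
{
	if (ev->nattrs >= SKETCH_MAX_ATTRS)
		return -1;
	ev->attrs[ev->nattrs].type = type;
	ev->attrs[ev->nattrs].value = value;
	ev->nattrs++;
	return 0;
}

/* Step 3: "send" the event; here it only reports the attribute count. */
size_t sketch_event_send(const struct sketch_event *ev)
{
	return ev->nattrs;
}
```

The point of the split is that each caller decides how many attributes to add between init and send, so no shared struct or union has to anticipate every command's attribute set.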
Wenwen Wang [Tue, 8 May 2018 00:54:01 +0000 (19:54 -0500)]
scsi: 3w-xxxx: fix a missing-check bug
In tw_chrdev_ioctl(), the length of the data buffer is first copied
from the userspace pointer 'argp' and saved to the kernel object
'data_buffer_length'. Then a security check is performed on it to make
sure that the length is not more than 'TW_MAX_IOCTL_SECTORS *
512'. Otherwise, an error code -EINVAL is returned. If the security
check is passed, the entire ioctl command is copied again from the
'argp' pointer and saved to the kernel object 'tw_ioctl'. Then, various
operations are performed on 'tw_ioctl' according to the 'cmd'. Given
that the 'argp' pointer resides in userspace, a malicious userspace
process can race to change the buffer length between the two
copies. This way, the user can bypass the security check and inject
an invalid data buffer length. This can cause potential security issues
in the subsequent execution.
This patch checks for capable(CAP_SYS_ADMIN) in tw_chrdev_open() to
avoid the above issues.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
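The double-fetch race described in the 3w-xxxx commit above comes from validating one copy of user data and then using a second copy. The patch itself restricts who can open the device; the sketch below instead shows the general single-fetch pattern, as a different technique for comparison. All names (sketch_ioctl, sketch_fetch_once(), SKETCH_MAX_LEN) are hypothetical, and memcpy() stands in for copy_from_user().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_LEN 4096u	/* hypothetical limit */

struct sketch_ioctl {
	unsigned int data_buffer_length;
	unsigned int opcode;
};

/*
 * `user` stands in for memory that another thread may rewrite at any
 * time. Copy it exactly once, then validate and use only the private
 * copy, so the checked value and the used value cannot differ.
 */
int sketch_fetch_once(const struct sketch_ioctl *user,
		      struct sketch_ioctl *out)
{
	struct sketch_ioctl local;

	memcpy(&local, user, sizeof(local));	/* the only read of `user` */
	if (local.data_buffer_length > SKETCH_MAX_LEN)
		return -EINVAL;
	*out = local;
	return 0;
}
```

With a single fetch there is no window between the check and the use in which a racing writer could substitute a larger length.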
Wenwen Wang [Tue, 8 May 2018 00:46:43 +0000 (19:46 -0500)]
scsi: 3w-9xxx: fix a missing-check bug
In twa_chrdev_ioctl(), the ioctl driver command is first copied from
the userspace pointer 'argp' and saved to the kernel object
'driver_command'. Then a security check is performed on the data buffer
size indicated by 'driver_command', which is
'driver_command.buffer_length'. If the security check is passed, the
entire ioctl command is copied again from the 'argp' pointer and saved
to the kernel object 'tw_ioctl'. Then, various operations are performed
on 'tw_ioctl' according to the 'cmd'. Given that the 'argp' pointer
resides in userspace, a malicious userspace process can race to change
the buffer size between the two copies. This way, the user can bypass
the security check and inject an invalid data buffer size. This can
cause potential security issues in the subsequent execution.
This patch checks for capable(CAP_SYS_ADMIN) in twa_chrdev_open() to
avoid the above issues.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tomohiro Kusumi [Fri, 4 May 2018 23:45:29 +0000 (16:45 -0700)]
scsi: mpt3sas: fix header path in ioctl documentation
MPT2_MAGIC_NUMBER as well as drivers/scsi/mpt2sas/mpt2sas_ctl.h were
removed to reuse mpt3sas code since commit 09ec55ed74 ("mpt2sas: Remove
.c and .h files from mpt2sas driver").
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tomohiro Kusumi [Fri, 4 May 2018 23:45:28 +0000 (16:45 -0700)]
scsi: mpt3sas: remove obsolete path "drivers/scsi/mpt2sas/" from MAINTAINERS
drivers/scsi/mpt2sas/ no longer exists after commit
c84b06a48c ("mpt3sas: Single driver module which supports both SAS 2.0 &
SAS 3.0 HBAs") merged/removed it.
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dan Carpenter [Thu, 3 May 2018 10:54:32 +0000 (13:54 +0300)]
scsi: megaraid: silence a static checker bug
If we had more than 32 megaraid cards then it would cause memory
corruption. That's not likely, of course, but it's handy to enforce it
and make the static checker happy.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
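The bound enforced by the megaraid commit above amounts to refusing a registration once a fixed-size table is full. A minimal sketch, assuming a table of 32 entries as stated in the commit text; the sketch_* names and the -ENODEV return value are illustrative choices, not taken from the driver.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_CONTROLLERS 32	/* limit taken from the commit text */

static void *sketch_hba_table[SKETCH_MAX_CONTROLLERS];
static unsigned int sketch_hba_count;

/* Record one controller; refuse instead of writing past the table. */
int sketch_register_hba(void *hba)
{
	if (sketch_hba_count >= SKETCH_MAX_CONTROLLERS)
		return -ENODEV;
	sketch_hba_table[sketch_hba_count++] = hba;
	return 0;
}
```

Without the check, the 33rd call would store past the end of the array, which is the memory corruption the static checker flags.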
Colin Ian King [Thu, 3 May 2018 10:18:07 +0000 (11:18 +0100)]
scsi: mptsas: fix spelling mistake: "matchs" -> "matches"
Trivial fix to spelling mistake in warning message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Thu, 3 May 2018 09:26:12 +0000 (10:26 +0100)]
scsi: lpfc: fix spelling mistakes: "mabilbox" and "maibox"
Trivial fix to spelling mistakes in lpfc_printf_log log message
"mabilbox" -> "mailbox"
"maibox" -> "mailbox"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Andrei Vagin [Wed, 2 May 2018 20:31:13 +0000 (13:31 -0700)]
scsi: qla2xxx: remove the unused tcm_qla2xxx_cmd_wq
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Wed, 2 May 2018 09:12:43 +0000 (10:12 +0100)]
scsi: mptfusion: fix spelling mistake: "initators" -> "initiators"
Trivial fix to spelling mistake in text string.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiaofei Tan [Wed, 2 May 2018 15:56:34 +0000 (23:56 +0800)]
scsi: hisi_sas: workaround a v3 hw hilink bug
There is an SoC bug in the development version of the v3 hw. When
hot-unplugging a directly attached disk, the PHY down interrupt may not
happen. This is very easy to reproduce on some boards.
When this issue occurs, the controller will receive many invalid dword
frames, and the "alos" fields of register HILINK_ERR_DFX can indicate
that the disk was unplugged.
As a workaround, this patch detects this issue in the channel
interrupt and works around it with the following steps:
- Disable the PHY
- Clear error code and interrupt
- Enable the PHY
Then the HW will reissue PHY down interrupt.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
John Garry [Wed, 2 May 2018 15:56:33 +0000 (23:56 +0800)]
scsi: hisi_sas: add readl poll timeout helper wrappers
It is common to use readl poll timeout helpers in the driver, so create
custom wrappers.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiaofei Tan [Wed, 2 May 2018 15:56:32 +0000 (23:56 +0800)]
scsi: hisi_sas: remove redundant handling to event95 for v3
Event95 is used for DFX purpose. The relevant bit for this interrupt in
the ENT_INT_SRC_MSK3 register has been disabled, so remove the
processing.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 2 May 2018 15:56:31 +0000 (23:56 +0800)]
scsi: hisi_sas: config ATA de-reset as an constrained command for v3 hw
As an unconstrained command, a command can be sent to a SATA disk even
if the SATA disk status is BUSY, ERR or DRQ.
If an ATA reset assert is successful but the ATA reset de-assert fails,
then the reset de-assert is retried. If the reset de-assert retry is
successful, we think it is okay to probe the device, but actually it
still has ERR status.
In this scenario we need to retry both the ATA reset assertion and
de-assertion instead.
As such, we config ATA reset de-assert as a constrained command: if the
ATA reset de-assert fails, then the ATA reset de-assert retry will also
fail. Then we will retry the proper process of ATA reset assert and
de-assert again.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 2 May 2018 15:56:30 +0000 (23:56 +0800)]
scsi: hisi_sas: update PHY linkrate after a controller reset
After the controller is reset, we currently may not honour the PHY max
linkrate set via sysfs, in that after a reset we always revert to max
linkrate of 12Gbps, ignoring the value set via sysfs.
This patch modifies the policy to set the programmed PHY linkrate,
honouring the max linkrate programmed via sysfs.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
John Garry [Wed, 2 May 2018 15:56:29 +0000 (23:56 +0800)]
scsi: hisi_sas: stop controller timer for reset
We should only have the timer enabled after PHY up after controller
reset, so disable prior to reset.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 2 May 2018 15:56:28 +0000 (23:56 +0800)]
scsi: hisi_sas: check sas_dev gone earlier in hisi_sas_abort_task()
It is possible to dereference a NULL pointer in hisi_sas_abort_task() in
a special scenario when the device has been removed.
If an SMP task times out, it will call hisi_sas_abort_task() to
recover. And currently there is a check in hisi_sas_abort_task() to
avoid the situation of processing the abort for the removed device.
However we have an ordering problem, in that we may reference a task for
the removed device before checking if the device has been removed.
Fix this by only referencing the sas_dev after we know it is still
present.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 2 May 2018 15:56:27 +0000 (23:56 +0800)]
scsi: hisi_sas: fix PI memory size
There are 28 bytes of protection information record of SSP for v3 hw, 16
bytes for v2 hw, and probably 24 for v1 hw (forgotten now).
So use a value big enough in hisi_sas_command_table_ssp.prot to cover
all cases.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 2 May 2018 15:56:26 +0000 (23:56 +0800)]
scsi: hisi_sas: check host frozen before calling "done" function
When the host is frozen in SCSI EH state, at any point after the LLDD
sets SAS_TASK_STATE_DONE for the sas_task task state, libsas may free
the task; see sas_scsi_find_task().
This puts the LLDD in a difficult position, in that once it sets
SAS_TASK_STATE_DONE for the task state it should not reference the
sas_task again. But the LLDD will check the sas_task indirectly when
calling task->task_done()->sas_scsi_task_done() or sas_ata_task_done()
(which check whether the host is actually in the frozen state).
And the LLDD cannot set SAS_TASK_STATE_DONE for the task state after
task->task_done() is called (as the sas_task is free'd at this point).
This situation would seem to be a problem made by libsas.
To work around, check in the LLDD whether the host is in frozen state to
ensure it is ok to call task->task_done() function. If in the frozen
state, we rely on SCSI EH and libsas to free the sas_task directly.
We do not do this for the following IO types:
- SMP - they are managed in libsas directly, outside SCSI EH
- Any internally originated IO, for similar reason
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 2 May 2018 15:56:25 +0000 (23:56 +0800)]
scsi: hisi_sas: Add some checks to avoid free'ing a sas_task twice
If the SCSI host enters EH, any pending IO will be processed by SCSI
EH. However it is possible that SCSI EH will try to abort the IO and
also at the same time the IO completes in the driver. In this situation
there is a small chance of freeing the sas_task twice.
If another IO then re-uses the freed sas_task before the second free
takes place, the wrong sas_task may be freed.
To avoid this situation, add some checks to increase reliability. The
sas_task task state flag SAS_TASK_STATE_ABORTED is used to mutually
protect the LLDD and libsas freeing the task.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 2 May 2018 15:56:24 +0000 (23:56 +0800)]
scsi: hisi_sas: optimise the usage of DQ locking
In the DQ tasklet processing it is not necessary to take the DQ lock, as
there is no contention between adding slots to the CQ and removing slots
from the matching DQ.
In addition, since we run each DQ in a separate tasklet context, there
would be no possible contention between DQ processing running for the
same queue in parallel.
It is still necessary to take hisi_hba lock when free'ing slots.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Sat, 5 May 2018 03:37:59 +0000 (20:37 -0700)]
scsi: lpfc: Comment cleanup regarding Broadcom copyright header
Fix small formatting and wording nits in Broadcom copyright header
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Sat, 5 May 2018 03:37:58 +0000 (20:37 -0700)]
scsi: lpfc: update driver version to 12.0.0.3
Update the driver version to 12.0.0.3
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Sat, 5 May 2018 03:37:57 +0000 (20:37 -0700)]
scsi: lpfc: Enhance log messages when reporting CQE errors
Enhance log messages for CQEs as they were not reporting certain fields.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Sat, 5 May 2018 03:37:56 +0000 (20:37 -0700)]
scsi: lpfc: Fix up log messages and stats counters in IO submit code path
Fix up log messages and add an fcp error stat counter in the IO submit
code path to make diagnosing problems easier
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Sat, 5 May 2018 03:37:54 +0000 (20:37 -0700)]
scsi: lpfc: Driver NVME load fails when CPU cnt > WQ resource cnt
If the cpu count is larger than the number of WQ resources available,
adapter attachment eventually fails due to a WQ_CREATE failure.
Calculate the number of WQs desired (which initializes to cpu count)
after accounting for the number of queues the adapter supports and the
number allocated to SCSI and the control/ELS path, and scale down if
necessary.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
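The scaling rule described in the WQ resource commit above can be written as a small calculation. This is a sketch under assumptions: the function name and all four parameters are hypothetical, and the real driver may account for resources differently.

```c
/*
 * Start from the CPU count, subtract the queues reserved for SCSI and
 * for the control/ELS path from what the adapter supports, and scale
 * the NVME WQ count down to whatever is left. Names are hypothetical.
 */
unsigned int sketch_nvme_wq_count(unsigned int cpus,
				  unsigned int adapter_wqs,
				  unsigned int scsi_wqs,
				  unsigned int ctrl_wqs)
{
	unsigned int reserved = scsi_wqs + ctrl_wqs;
	unsigned int avail = adapter_wqs > reserved ?
			     adapter_wqs - reserved : 0;

	return cpus < avail ? cpus : avail;
}
```

For example, with 64 CPUs, 32 adapter WQs, 8 SCSI WQs and 1 control WQ, the sketch yields 23 rather than attempting 64 creations and failing partway.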
James Smart [Sat, 5 May 2018 03:37:53 +0000 (20:37 -0700)]
scsi: lpfc: Handle new link fault code returned by adapter firmware.
When the driver encounters a link event ACQE with a fault code it doesn't
recognize, it logs an "Invalid" fault type and further treats the unknown
value as a mailbox command failure. First off, there is no "invalid"
value, only values that are unknown. Secondly, the fault code doesn't
indicate status - the rest of the ACQE contains that status so there is
no reason to "fail the commands".
Change the "Invalid" to "Unknown". There is no "invalid" code value.
Separate fault code parsing and message generation from any mbx handling
status.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Sat, 5 May 2018 03:37:52 +0000 (20:37 -0700)]
scsi: lpfc: Correct fw download error message
In situations when the firmware image is inappropriate for the chip
type, initial validation checks were light, allowing the checks to pass,
thus allowing the firmware to be downloaded. Eventually, after the
download, the chip rejects the firmware but it is logged as a generic
firmware download error.
Revise the initial checks to validate the image vs asic type so that the
correct message is displayed and the download process is avoided.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Sat, 5 May 2018 03:37:51 +0000 (20:37 -0700)]
scsi: lpfc: enhance LE data structure copies to hardware
The driver builds the control structures in host memory using
definitions that are based on 32-bit words. After building the structure
it is then written to the adapter.
This patch slightly optimizes LE hosts by copying the structures via
64-bit copies. This is doable as the adapter interface is LE thus there
is no byteswapping as the copy is performed.
The same optimization would be nice on BE systems, but when byteswapping
occurs, it swaps 32-bit words as well, thus trashing the control
structure. Given the amount of code that is dependent upon the 32-bit
word definition, it was decided to not change things for the minor
optimization. Thus PPC 64-bit systems stick with doing 32-bit copies.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
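The reason given in the LE copy commit above for not using 64-bit copies on BE hosts can be checked with plain arithmetic: byteswapping a 64-bit unit also exchanges the positions of the two 32-bit words inside it. The helpers below are hypothetical and written out by hand so the sketch does not depend on any platform byteswap routine.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word. */
uint32_t sketch_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Reverse the eight bytes of a 64-bit unit. */
uint64_t sketch_swab64(uint64_t v)
{
	return ((uint64_t)sketch_swab32((uint32_t)v) << 32) |
	       sketch_swab32((uint32_t)(v >> 32));
}

/* Pack two 32-bit control words into one unit, `lo` in bits 0..31. */
uint64_t sketch_pack(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

Swapping the packed pair gives the same result as swapping each word and then exchanging their positions, so a structure defined in terms of 32-bit words would have adjacent words transposed.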
James Smart [Sat, 5 May 2018 03:37:50 +0000 (20:37 -0700)]
scsi: lpfc: Change IO submit return to EBUSY if remote port is recovering
I/O submission paths in the lpfc nvme path are rejecting the io with an
error code that reflects back to the caller as a hard io failure. Many
of these conditions are transient and would likely resolve if retried.
Correct by returning -EBUSY, which the FC transport triggers off of to
return busy status codes to the blk-mq layer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:09:05 +0000 (06:09 -0700)]
scsi: qedf: Update version number to 8.33.16.20
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:09:04 +0000 (06:09 -0700)]
scsi: qedf: Update copyright for 2018
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:09:03 +0000 (06:09 -0700)]
scsi: qedf: Add more defensive checks for concurrent error conditions
During an uplink toggle test all error handling is done via timeout and
firmware error conditions which can occur concurrently:
- SCSI layer timeouts
- Error detect CQEs
- Firmware detected underruns
- ABTS timeouts
All these concurrent events require more defensive checks in the driver
including:
- Check both internally and externally generated aborts to make sure the
xid has not already been aborted in another context or in cleanup.
- Check back pointers in qedf_cmd_timeout to verify the context of the
io_req, fcport and qedf_ctx
- Check rport state in host reset handler to not reset the whole host
if the rport is already uploaded or in the process of relogin
- Check the state of an fcport before initiating a middle path ELS
request
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:09:02 +0000 (06:09 -0700)]
scsi: qedf: Set the UNLOADING flag when removing a vport
Similar to what we do when we remove a PCI function, set the
QEDF_UNLOADING flag to prevent any requests from being queued while a
vport is being deleted. This prevents any requests from getting stuck
in limbo when the vport is unloaded or deleted.
Fixes the crash:
PID: 106676 TASK: ffff9a436aa90000 CPU: 12 COMMAND: "multipathd"
#0 [ffff9a43567d3550] machine_kexec+522 at ffffffffaca60b2a
#1 [ffff9a43567d35b0] __crash_kexec+114 at ffffffffacb13512
#2 [ffff9a43567d3680] crash_kexec+48 at ffffffffacb13600
#3 [ffff9a43567d3698] oops_end+168 at ffffffffad117768
#4 [ffff9a43567d36c0] no_context+645 at ffffffffad106f52
#5 [ffff9a43567d3710] __bad_area_nosemaphore+116 at ffffffffad106fe9
#6 [ffff9a43567d3760] bad_area+70 at ffffffffad107379
#7 [ffff9a43567d3788] __do_page_fault+1247 at ffffffffad11a8cf
#8 [ffff9a43567d37f0] do_page_fault+53 at ffffffffad11a915
#9 [ffff9a43567d3820] page_fault+40 at ffffffffad116768
[exception RIP: qedf_init_task+61]
RIP: ffffffffc0e13c2d RSP: ffff9a43567d38d0 RFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffbe920472c738 RCX: ffff9a434fa0e3e8
RDX: ffff9a434f695280 RSI: ffffbe920472c738 RDI: ffff9a43aa359c80
RBP: ffff9a43567d3950 R8: 0000000000000c15 R9: ffff9a3fb09b9880
R10: ffff9a434fa0e3e8 R11: ffff9a43567d35ce R12: 0000000000000000
R13: ffff9a434f695280 R14: ffff9a43aa359c80 R15: ffff9a3fb9e005c0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:09:01 +0000 (06:09 -0700)]
scsi: qedf: Add additional checks when restarting an rport due to ABTS timeout
There are a couple of kernel cases when we restart a remote port due to
ABTS timeout that we need to handle:
1. Flush any outstanding ABTS requests when flushing I/Os so that we do
not hold up the eh_abort handler indefinitely causing process hangs.
2. Check if we are currently uploading a connection before issuing an
ABTS.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:09:00 +0000 (06:09 -0700)]
scsi: qedf: If qed fails to enable MSI-X fail PCI probe
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:59 +0000 (06:08 -0700)]
scsi: qedf: Honor default_prio module parameter even if DCBX does not converge
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:58 +0000 (06:08 -0700)]
scsi: qedf: Improve firmware debug dump handling
Get all firmware debug data instead of just a grc dump.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Saurav Kashyap [Wed, 25 Apr 2018 13:08:57 +0000 (06:08 -0700)]
scsi: qedf: Remove setting DCBX pending during soft context reset
PROBLEM DESCRIPTION:
According to the logs, the STAG was changing, which triggered a soft
reset. During soft reset we used to bring the virtual link down and up
and also disable the DCBx flag. Since this was only a virtual link flap,
DCBx never converged again.
SOLUTION:
The code change removes the disabling of the DCBx flag from soft reset.
Signed-off-by: Saurav Kashyap <saurav.kashyap@cavium.com>
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:56 +0000 (06:08 -0700)]
scsi: qedf: Add task id to kref_get_unless_zero() debug messages when flushing requests
Helps to corroborate which requests we can't get a reference on and
whether it's a real bug or not.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:55 +0000 (06:08 -0700)]
scsi: qedf: Check if link is already up when receiving a link up event from qed
[mkp: typo]
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:54 +0000 (06:08 -0700)]
scsi: qedf: Return request as DID_NO_CONNECT if MSI-X is not enabled
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:53 +0000 (06:08 -0700)]
scsi: qedf: Release RRQ reference correctly when RRQ command times out
When an RRQ request times out the reference is not getting decremented
correctly as there are still ELS commands leftover when we flush any
pending I/Os during offload:
[ 281.788553] [0000:21:00.3]:[qedf_cmd_timeout:58]:4: ELS timeout, xid=0x96a.
...
[ 281.788553] [0000:21:00.3]:[qedf_cmd_timeout:58]:4: ELS timeout, xid=0x96a.
[ 281.788772] [0000:21:00.3]:[qedf_rrq_compl:182]:4: Entered.
[ 281.788774] [0000:21:00.3]:[qedf_rrq_compl:200]:4: rrq_compl: orig io = ffffc90004c556f8, orig xid = 0x81b, rrq_xid = 0x96a, refcount=1
...
[ 331.448032] [0000:21:00.3]:[qedf_flush_els_req:1512]:4: Flushing ELS request xid=0x96a refcount=2.
The fix is to call kref_put on the rrq_req in case of timeout, as the
timeout handler will call rrq_compl directly vs. a normal completion
where it is called from els_compl.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:52 +0000 (06:08 -0700)]
scsi: qedf: Honor priority from DCBX FCoE App tag
We currently hard code the priority in the 8021q tag to 3 for FCoE
traffic. The vast majority of the time this is fine but if the priority
is something else besides 3, any VLAN ID comparison either in the
non-offload path or offload path will fail and cause dropped frames
where none are expected.
Change the behavior so that the driver default is 3 if we do not get any
DCBX convergence.
If DCBX does converge, then set the FIP/FCoE priority in the following
manner:
1. If the qedf_default_prio modparam is set use that
2. If the DCBX FCoE priority is not in range (0..7) use 3
3. Use the DCBX FCoE priority we get in the driver's DCBX handler
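The selection order described in this commit message can be sketched roughly as below. This is a userspace illustration only, not the driver's code: `pick_fcoe_prio`, `PRIO_UNSET`, `FCOE_DEFAULT_PRIO` and the parameter names are hypothetical, and it assumes an unset module parameter is represented by a sentinel value.

```c
#include <assert.h>
#include <stdbool.h>

#define FCOE_DEFAULT_PRIO 3   /* driver default named in the commit message */
#define PRIO_UNSET (-1)       /* hypothetical sentinel: modparam not given */

/*
 * Hypothetical helper: returns the 802.1Q priority to use for FIP/FCoE.
 *   dcbx_converged - whether DCBX negotiation completed
 *   modparam_prio  - module parameter value, or PRIO_UNSET
 *   dcbx_prio      - priority reported by the DCBX FCoE app tag
 */
int pick_fcoe_prio(bool dcbx_converged, int modparam_prio, int dcbx_prio)
{
	/* Assumed precondition: a set modparam is already within 0..7. */
	assert(modparam_prio == PRIO_UNSET ||
	       (modparam_prio >= 0 && modparam_prio <= 7));

	if (!dcbx_converged)
		return FCOE_DEFAULT_PRIO;   /* no convergence: default of 3 */
	if (modparam_prio != PRIO_UNSET)
		return modparam_prio;       /* rule 1: module parameter wins */
	if (dcbx_prio < 0 || dcbx_prio > 7)
		return FCOE_DEFAULT_PRIO;   /* rule 2: out of range, use 3 */
	return dcbx_prio;                   /* rule 3: use the DCBX value */
}
```

For instance, `pick_fcoe_prio(true, PRIO_UNSET, 5)` yields 5, while an out-of-range DCBX value such as 9 falls back to 3.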
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:51 +0000 (06:08 -0700)]
scsi: qedf: Add dcbx_not_wait module parameter so we won't wait for DCBX convergence to start discovery
This module parameter works around cases where we do not receive
the DCBX handler notification from qed but discovery is still possible
if we send out a FIP VLAN request regardless of the DCBX state.
[mkp: zeroday warning]
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:50 +0000 (06:08 -0700)]
scsi: qedf: Sanity check FCoE/FIP priority value to make sure it's between 0 and 7
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:49 +0000 (06:08 -0700)]
scsi: qedf: Add check for offload before flushing I/Os for target
We need to check that an fcport is offloaded before we try to flush any
requests. Not doing so could lead to undefined results and most likely a
crash.
Fixes the oops:
[ 343.971886] [0000:42:00.3]:[qedf_execute_tmf:2070]:8: wait for tm_cmpl timeout!
[ 343.971933] BUG: unable to handle kernel paging request at 00000000000024a8
[ 343.971949] IP: [<ffffffffa06b8cc6>] qedf_flush_active_ios+0x46/0x260 [qedf]
[ 343.971952] PGD 42c569067 PUD 4160fe067 PMD 0
[ 343.971954] Oops: 0000 [#1] SMP
[ 343.972008] Modules linked in: qedf(OEX) qed(OEX) bnx2i cnic fuse af_packet iscsi_ibft msr xfs intel_rapl sb_edac edac_core x86_pkg_temp_thermal bnx2x geneve intel_powerclamp vxlan coretemp ipmi_ssif ipmi_devintf kvm_intel kvm libiscsi joydev irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel tg3 ip6_udp_tunnel udp_tunnel mdio libcrc32c iTCO_wdt scsi_transport_iscsi uio drbg iTCO_vendor_support iscsi_boot_sysfs dcdbas(X) ipmi_si ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper ptp pps_core pcspkr libphy lpc_ich mfd_core cryptd fjes wmi ipmi_msghandler button crc8 libfcoe libfc scsi_transport_fc mei_me mei shpchp processor acpi_pad btrfs xor hid_generic usbhid raid6_pq sd_mod sr_mod cdrom mgag200 crc32c_intel i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
[ 343.972020] fb_sys_fops ttm ahci ehci_pci libahci ehci_hcd drm libata usbcore megaraid_sas usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 [last unloaded: qedf]
[ 343.972022] Supported: Yes, External
[ 343.972026] CPU: 30 PID: 12777 Comm: sg_reset Tainted: G W OE X 4.4.73-5-default #1
[ 343.972027] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.1.3 11/20/2013
[ 343.972029] task: ffff88018dfc0e80 ti: ffff88042bd7c000 task.ti: ffff88042bd7c000
[ 343.972036] RIP: 0010:[<ffffffffa06b8cc6>] [<ffffffffa06b8cc6>] qedf_flush_active_ios+0x46/0x260 [qedf]
[ 343.972038] RSP: 0018:ffff88042bd7fbe0 EFLAGS: 00010286
[ 343.972039] RAX: 0000000000000000 RBX: ffff88042ce37800 RCX: 0000000000000400
[ 343.972040] RDX: 000000000000060e RSI: ffffffffa06be830 RDI: ffff8807e5072cc0
[ 343.972041] RBP: 0000000000001000 R08: ffffffffa06bff4d R09: ffff88018dd84580
[ 343.972042] R10: 000000000000018b R11: 0000000000000002 R12: 0000000000002003
[ 343.972043] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8807e5072cc0
[ 343.972046] FS: 00007fc1c8809700(0000) GS:ffff88042fbc0000(0000) knlGS:0000000000000000
[ 343.972048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 343.972049] CR2: 00000000000024a8 CR3: 00000004236ec000 CR4: 00000000001406e0
[ 343.972050] Stack:
[ 343.972053] 504c78750607e154 ffffffff810a7d10 ffff88042ce37800 0000000000000010
[ 343.972055] 0000000000002003 ffff8807ff480c48 ffff8807e5072cc0 ffffc90004ec4ff8
[ 343.972057] ffffffffa06b9b86 ffff880800000010 0000000000000282 ffff88042ce37800
[ 343.972058] Call Trace:
[ 343.972094] [<ffffffffa06b9b86>] qedf_initiate_tmf+0x346/0x3e0 [qedf]
[ 343.972120] [<ffffffffa000fa06>] scsi_try_bus_device_reset+0x26/0x40 [scsi_mod]
[ 343.972133] [<ffffffffa001038e>] scsi_ioctl_reset+0x13e/0x260 [scsi_mod]
[ 343.972145] [<ffffffffa000f416>] scsi_ioctl+0x136/0x3d0 [scsi_mod]
[ 343.972154] [<ffffffff812ff6eb>] blkdev_ioctl+0x6bb/0x950
[ 343.972164] [<ffffffff8123cfed>] block_ioctl+0x3d/0x40
[ 343.972170] [<ffffffff81217e2d>] do_vfs_ioctl+0x2cd/0x4a0
[ 343.972186] [<ffffffff81218074>] SyS_ioctl+0x74/0x80
[ 343.972193] [<ffffffff8160916e>] entry_SYSCALL_64_fastpath+0x12/0x6d
[ 343.975285] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0x6d
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:48 +0000 (06:08 -0700)]
scsi: qedf: Fix VLAN display when printing sent FIP frames
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:47 +0000 (06:08 -0700)]
scsi: qedf: Add missing skb frees in error path
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:46 +0000 (06:08 -0700)]
scsi: qedf: Increase the number of default FIP VLAN request retries to 60
Some configurations need more than 30 seconds to respond to a FIP VLAN
request so increase the default to 60 seconds.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chad Dupuis [Wed, 25 Apr 2018 13:08:45 +0000 (06:08 -0700)]
scsi: qedf: Synchronize rport restarts when multiple ELS commands time out
If multiple ELS commands time out, such as aborts, they could all try to
restart the same rport at the same time. This could mean multiple
processes trying to clean up any outstanding commands or trying
to upload the same port.
Add a new flag (QEDF_RPORT_IN_RESET) and check other fcport state flags
before trying to reset the port.
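The idea of guarding the reset with an "in reset" flag can be sketched in userspace C11 as below. This is a minimal illustration under stated assumptions, not the driver's implementation: `struct port_state`, `port_try_begin_reset` and `port_end_reset` are hypothetical names, `in_reset` only plays the role of QEDF_RPORT_IN_RESET, and the `offloaded` flag stands in for the other fcport state flags the commit message mentions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-port state; not the driver's struct. */
struct port_state {
	atomic_bool in_reset;   /* plays the role of QEDF_RPORT_IN_RESET */
	atomic_bool offloaded;  /* assumed "session is offloaded" state flag */
};

/*
 * Returns true if the caller won the right to reset the port. Of several
 * concurrent callers, only one sees the flag change from false to true.
 */
bool port_try_begin_reset(struct port_state *p)
{
	assert(p != NULL);
	if (!atomic_load(&p->offloaded))
		return false;   /* nothing to reset */
	/* atomic_exchange returns the previous value of the flag. */
	return !atomic_exchange(&p->in_reset, true);
}

/* Clears the flag once the reset has finished. */
void port_end_reset(struct port_state *p)
{
	assert(p != NULL);
	atomic_store(&p->in_reset, false);
}
```

A timeout handler in this sketch would call `port_try_begin_reset()` and simply return if it yields false, so that only one handler cleans up or uploads the port.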
Fixes the crash:
[17501.824701] ------------[ cut here ]------------
[17501.824733] kernel BUG at include/asm-generic/dma-mapping-common.h:65!
[17501.824760] invalid opcode: 0000 [#1] SMP
[17501.824781] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ses enclosure dm_service_time vfat fat sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass joydev btrfs hpilo raid6_pq iTCO_wdt iTCO_vendor_support xor hpwdt ipmi_ssif sg crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ioatdma lpc_ich glue_helper ablk_helper i2c_i801 shpchp cryptd ipmi_si pcspkr acpi_power_meter ipmi_devintf pcc_cpufreq dca wmi ipmi_msghandler dm_multipath nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod
[17501.825119] crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qedf(OE) drm libfcoe ahci qedi(OE) crct10dif_pclmul libfc libahci uio crct10dif_common crc32c_intel libiscsi libata scsi_transport_iscsi scsi_transport_fc tg3 qede(OE) scsi_tgt hpsa qed(OE) i2c_core ptp scsi_transport_sas pps_core iscsi_boot_sysfs dm_mirror dm_region_hash dm_log dm_mod
[17501.825292] CPU: 8 PID: 10531 Comm: kworker/u96:1 Tainted: G OE ------------ 3.10.0-693.el7.x86_64 #1
[17501.825330] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 06/02/2016
[17501.825372] Workqueue: fc_rport_eq fc_rport_work [libfc]
[17501.825395] task: ffff88101bca8000 ti: ffff881025278000 task.ti: ffff881025278000
[17501.825424] RIP: 0010:[<ffffffffc042def9>] [<ffffffffc042def9>] qedf_unmap_sg_list.isra.15+0x89/0x90 [qedf]
[17501.825471] RSP: 0018:ffff88102527bb98 EFLAGS: 00010212
[17501.825493] RAX: ffff8800224eac00 RBX: ffffc9000cd05210 RCX: 0000000000001000
[17501.825520] RDX: 000000007e655e40 RSI: 0000000000001000 RDI: ffff88107fe3b098
[17501.826683] RBP: ffff88102527bba0 R08: ffffffff81a13200 R09: 0000000000000286
[17501.827747] R10: 0000000000000004 R11: 0000000000000005 R12: ffffc9000cd051b8
[17501.828804] R13: ffff881037640c28 R14: 0000000000000007 R15: ffffc9000cd05200
[17501.829850] FS: 0000000000000000(0000) GS:ffff88103fa00000(0000) knlGS:0000000000000000
[17501.830910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17501.831966] CR2: 00007f9b94005f38 CR3: 00000000019f2000 CR4: 00000000003407e0
[17501.833027] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17501.834087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17501.835142] Stack:
[17501.836201] ffff881033ddbb80 ffff88102527bc30 ffffffffc042f834 0000000000002710
[17501.837264] ffff88102527bbd0 ffffffff8133d9dd ffffc9000cd052a0 ffff88102527bc30
[17501.838325] ffffffff816a9c65 0000000000000001 ffff88101bca8000 ffffffff810c4810
[17501.839388] Call Trace:
[17501.840446] [<ffffffffc042f834>] qedf_scsi_done+0x54/0x1d0 [qedf]
[17501.841504] [<ffffffff8133d9dd>] ? list_del+0xd/0x30
[17501.842537] [<ffffffff816a9c65>] ? wait_for_completion_timeout+0x125/0x140
[17501.843560] [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
[17501.844577] [<ffffffffc0430311>] qedf_initiate_cleanup+0x2e1/0x310 [qedf]
[17501.845587] [<ffffffffc04305fe>] qedf_flush_active_ios+0x10e/0x260 [qedf]
[17501.846612] [<ffffffffc042892f>] qedf_cleanup_fcport+0x5f/0x370 [qedf]
[17501.847613] [<ffffffffc04292d8>] qedf_rport_event_handler+0x398/0x950 [qedf]
[17501.848602] [<ffffffff810cdc7c>] ? dequeue_entity+0x11c/0x5d0
[17501.849581] [<ffffffff81098a2b>] ? __internal_add_timer+0xab/0x130
[17501.850555] [<ffffffff810ce54e>] ? dequeue_task_fair+0x41e/0x660
[17501.851528] [<ffffffffc03241a4>] fc_rport_work+0xf4/0x6c0 [libfc]
[17501.852490] [<ffffffff810a881a>] process_one_work+0x17a/0x440
[17501.853446] [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
himanshu.madhani@cavium.com [Tue, 1 May 2018 16:01:54 +0000 (09:01 -0700)]
scsi: qla2xxx: Update driver version to 10.00.00.07-k
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 1 May 2018 16:01:53 +0000 (09:01 -0700)]
scsi: qla2xxx: Fix TMF and Multi-Queue config
For target mode, a task management command is queued to a specific CPU
based on where the SCSI command is residing. This prevents a race
condition in which the task management command gets ahead of the regular
SCSI command.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
himanshu.madhani@cavium.com [Tue, 1 May 2018 16:01:52 +0000 (09:01 -0700)]
scsi: qla2xxx: Prevent relogin loop by removing stale code
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 1 May 2018 16:01:51 +0000 (09:01 -0700)]
scsi: qla2xxx: Remove stale debug value for login_retry flag
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 1 May 2018 16:01:50 +0000 (09:01 -0700)]
scsi: qla2xxx: Use predefined get_datalen_for_atio() inline function
- Uses predefined inline function to access add_cdb_len field in ATIO.
- Return SS_RESIDUAL_UNDER status when sending BUSY
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 1 May 2018 16:01:49 +0000 (09:01 -0700)]
scsi: qla2xxx: Fix Inquiry command being dropped in Target mode
When a connection is established, the target core session may not be
created immediately. Current code will drop/terminate the command based
on the session state. This patch will return BUSY status for any
commands arriving on wire before the session is created.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 1 May 2018 16:01:48 +0000 (09:01 -0700)]
scsi: qla2xxx: Move GPSC and GFPNID out of session management
Move GPSC & GFPNID commands out of session management to reduce the time
lag in reporting the session state to the remote port. These commands are
not essential when it comes to maintaining the rport state. Delay sending
these commands until after the rport state is set to Online.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 1 May 2018 16:01:47 +0000 (09:01 -0700)]
scsi: qla2xxx: Reduce redundant ADISC command for RSCNs
For each RSCN that triggers a rescan of the fabric, ADISC is used to
revalidate an existing session. If the RSCN is not affecting all
existing sessions, then driver should not send redundant ADISC for all
existing sessions.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 1 May 2018 16:01:46 +0000 (09:01 -0700)]
scsi: qla2xxx: Delete session for nport id change
This patch fixes a regression introduced by commit a4239945b8ad ("scsi:
qla2xxx: Add switch command to simplify fabric discovery") by scheduling
session deletion when the Nport ID changes.
[mkp: clarified commit]
Fixes: a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery")
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 1 May 2018 16:01:45 +0000 (09:01 -0700)]
scsi: qla2xxx: Fix Rport and session state getting out of sync
This patch fixes rport state and session state getting out of sync.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 1 May 2018 16:01:44 +0000 (09:01 -0700)]
scsi: qla2xxx: Fix sending ADISC command for login
This patch fixes the login_retry logic for the ADISC command.
When the login_retry count reaches 0, further attempts to send the ADISC
command are ignored by the code. Remove this redundant login_retry count
check from qla24xx_fcport_handle_login().
[mkp: fix typo]
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:43 +0000 (05:28 -0400)]
scsi: mpt3sas: Update driver version "25.100.00.00"
Update driver version to match OOB/internal driver version.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:42 +0000 (05:28 -0400)]
scsi: mpt3sas: fix possible memory leak.
In the ioctl exit path the driver refers to ioc_list to free the memory
associated with diag buffers and the event_log pointer used to save events.
If ctl_exit() is called after unregistering the driver, then ioc_list will
be empty and hence the driver will not be able to free the allocated memory,
which in turn causes a memory leak.
So call ctl_exit() before unregistering the mpt3sas driver.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:41 +0000 (05:28 -0400)]
scsi: mpt3sas: For NVME device, issue a protocol level reset
1) Manufacturing Page 11 contains parameters to control internal
firmware behavior. FW/driver behaviour can be changed based on the
AddlFlags2 field (the flag tm_custom_handling is used for this).
a) For a PCIe device, a protocol level reset should be used if the flag
tm_custom_handling is 0. Since Abort Task Set, LUN reset and Target
reset all result in a protocol level reset, the driver should issue
only one type of reset; if that fails, it should escalate to
a controller reset (diag reset/OCR).
b) If the driver has control over the TM reset timeout value, then
the driver should use the value exposed in PCIe Device Page 2 for the
PCIe device (field ControllerResetTO).
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:40 +0000 (05:28 -0400)]
scsi: mpt3sas: Update MPI Headers
Update MPI Files to support protocol level reset for NVMe device.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:39 +0000 (05:28 -0400)]
scsi: mpt3sas: Report Firmware Package Version from HBA Driver.
Added function _base_display_fwpkg_version, which sends FWUpload request
to pull FW package version from FW Image Header. Now driver prints FW
package version in addition to FW version if the PackageVersion is
valid.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:38 +0000 (05:28 -0400)]
scsi: mpt3sas: Cache enclosure pages during enclosure add.
In function _scsih_add_device, for each device connected to an
enclosure, the driver reads the enclosure page (to get details like
enclosure handle, enclosure logical ID, enclosure level etc.).
With this patch, instead of reading the enclosure page every time, the
driver maintains a list of enclosure devices (an enclosure device is added
to the list on an enclosure add event and removed from the list on
delete events) and uses the enclosure page from the list.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:37 +0000 (05:28 -0400)]
scsi: mpt3sas: Allow processing of events during driver unload.
Events were not processed during driver unload, hence unloading the
driver didn't complete when drives were disconnected during the unload.
So don't block events in the ISR path, i.e., remove the flag
ioc->remove_host so that events are processed during driver
unload. This allows driver unload to complete by processing drive
removal events.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:36 +0000 (05:28 -0400)]
scsi: mpt3sas: Increase event log buffer to support 24 port HBA's.
On 24 port HBAs the IOC generates more events in certain cases and
the current circular buffer may be overwritten. Hence increase the event
log buffer to accommodate more events.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:35 +0000 (05:28 -0400)]
scsi: mpt3sas: Added support for SAS Device Discovery Error Event.
The SAS Device Discovery Error Event is sent to the host when discovery
of a particular device has failed, even after the maximum number of
retries by the IOC.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:34 +0000 (05:28 -0400)]
scsi: mpt3sas: Enhanced handling of Sense Buffer.
Enhanced DMA allocation for the Sense Buffer for the case where the
allocation does not fit within the same 4GB region. Introduced the
is_MSB_are_same function to check whether the allocated buffer is within
a 4GB range or not.
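One way to express a "same 4GB region" check is to compare the upper 32 bits of the first and last byte addresses of the buffer. The sketch below is an illustration under that assumption, not the driver's is_MSB_are_same implementation; `same_4gb_window` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: true when [start, start + len) lies inside a single
 * 4 GB-aligned window, i.e. the upper 32 address bits never change.
 */
bool same_4gb_window(uint64_t start, uint64_t len)
{
	/* Assumed precondition: non-empty range that does not wrap around. */
	assert(len > 0 && start <= UINT64_MAX - (len - 1));

	uint64_t last = start + len - 1;   /* address of the final byte */
	return (start >> 32) == (last >> 32);
}
```

A buffer starting at 0xFFFFF000 with length 0x1000 ends exactly at the 4GB boundary and passes the check; with length 0x2000 it crosses the boundary and fails.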
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:33 +0000 (05:28 -0400)]
scsi: mpt3sas: Optimize I/O memory consumption in driver.
For every IO, a PAGE-size block of memory is allocated for handling NVMe
native PRPs. In addition, for every IO an amount of memory equal to
(chains needed per IO * chain buffer size, e.g. 38 * 128 bytes) is
allocated for chain buffers.
However, at any point in time the IO request is either for an NVMe target
device (where the PRP page is used for framing PRPs) or for a SCSI
target device (where chain buffers are used for framing chain
SGEs). This patch modifies the driver to reuse the same pre-allocated PRP
page buffers as chain buffers for IOs targeted at SCSI target
devices, so there is no need to allocate separate buffers for chain SGEs.
If the number of chain buffers needed for an IO doesn't fit in the
PRP page size, then the driver maintains separate buffers for the extra
chain buffers that exceed the PRP page size. For example, with a PRP
page size of 4K and a chain buffer size of 128 bytes, the number of chain
buffers that can fit in a PRP page is 4096/128 = 32. If the number of
chain buffers needed per IO exceeds 32, say 36, then the driver allocates
the remaining 4 chain buffers individually.
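The arithmetic in the example above can be written out as a small sketch. The function names are hypothetical and the code only reproduces the calculation from the commit message, not the driver's allocation logic.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chain buffers that fit in one PRP page. */
size_t chains_per_prp_page(size_t page_size, size_t chain_size)
{
	assert(chain_size > 0);
	return page_size / chain_size;
}

/* Chain buffers that still need a separate allocation for one IO. */
size_t extra_chains_needed(size_t chains_per_io, size_t page_size,
			   size_t chain_size)
{
	size_t fit = chains_per_prp_page(page_size, chain_size);

	return chains_per_io > fit ? chains_per_io - fit : 0;
}
```

With a 4096-byte page and 128-byte chain buffers, 32 buffers fit in the page, so an IO needing 36 chains leaves 4 to be allocated separately.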
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:32 +0000 (05:28 -0400)]
scsi: mpt3sas: Lockless access for chain buffers.
Introduces a chain lookup table/tracker and implements access to chain
buffers using the smid. Removed the linked-list based access to chain
buffers, which requires a lock and allocated as many chains as needed.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:31 +0000 (05:28 -0400)]
scsi: mpt3sas: Pre-allocate RDPQ Array at driver boot time.
Instead of allocating the RDPQ array (which stores the addresses of each
RDPQ pool) at run time, it is now allocated once at driver load time and
reused during host reset operations (instead of allocating and freeing
this buffer on the fly during every host reset operation), then freed
during driver unload.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chaitra P B [Tue, 24 Apr 2018 09:28:30 +0000 (05:28 -0400)]
scsi: mpt3sas: Bug fix for big endian systems.
This patch fixes sparse warnings and bugs on big endian systems.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Sun, 29 Apr 2018 12:31:49 +0000 (13:31 +0100)]
scsi: mpt3sas: fix spelling mistake: "disbale" -> "disable"
Trivial fix to spelling mistake in module parameter description text
[mkp: applied by hand]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Sun, 29 Apr 2018 12:25:32 +0000 (13:25 +0100)]
scsi: megaraid_sas: fix spelling mistake: "disbale" -> "disable"
Trivial fix to spelling mistake in module parameter description text
[mkp: applied by hand]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Fri, 27 Apr 2018 19:15:52 +0000 (20:15 +0100)]
scsi: esas2r: fix spelling mistake: "asynchromous" -> "asynchronous"
Trivial fix to spelling mistake in module description text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Wed, 25 Apr 2018 10:58:43 +0000 (11:58 +0100)]
scsi: isci: remove redundant check on in_connection_align_insertion_frequency
The sanity check on u->in_connection_align_insertion_frequency is being
performed twice and hence the first check can be removed since it is
redundant. Cleans up cppcheck warning:
drivers/scsi/ibmvscsi/ibmvscsi.c:1711: (warning) Identical inner 'if'
condition is always true.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
YueHaibing [Sat, 21 Apr 2018 10:58:35 +0000 (18:58 +0800)]
scsi: a100u2w: Use module_pci_driver
Remove boilerplate code by using macro module_pci_driver.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
YueHaibing [Sat, 21 Apr 2018 10:58:34 +0000 (18:58 +0800)]
scsi: wd719x: Use module_pci_driver
Remove boilerplate code by using macro module_pci_driver.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
YueHaibing [Sat, 21 Apr 2018 10:58:33 +0000 (18:58 +0800)]
scsi: am53c974: Use module_pci_driver
Remove boilerplate code by using macro module_pci_driver.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Sun, 15 Apr 2018 14:52:37 +0000 (16:52 +0200)]
scsi: scsi_transport_sas: don't bounce highmem pages for the smp handler
All three instances of ->smp_handler deal with highmem backed requests
just fine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Arnd Bergmann [Fri, 20 Apr 2018 16:04:40 +0000 (18:04 +0200)]
scsi: ips: fix firmware timestamps for 32-bit
do_gettimeofday() is deprecated since it will stop working in 2038 on
32-bit platforms, leading to incorrect times passed to the firmware.
On 64-bit platforms the current code appears to be fine, as the
calculation passes an 8-bit century number into the firmware that can
represent times long in the future (possibly until 25599).
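The 25599 figure follows if the year is split into an 8-bit century and a two-digit year within the century: 255 * 100 + 99 = 25599. The sketch below assumes that layout purely for illustration; `struct fw_year`, `encode_year` and `decode_year` are hypothetical and do not describe the actual ips firmware format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of a calendar year into century and year-of-century. */
struct fw_year {
	uint8_t century;   /* 0..255 */
	uint8_t year;      /* 0..99 */
};

struct fw_year encode_year(unsigned int year)
{
	/* Largest encodable year with an 8-bit century: 25599. */
	assert(year <= 255u * 100u + 99u);

	struct fw_year y = { (uint8_t)(year / 100), (uint8_t)(year % 100) };
	return y;
}

unsigned int decode_year(struct fw_year y)
{
	return y.century * 100u + y.year;
}
```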
Using ktime_get_real_seconds() to get a 64-bit seconds value and
time64_to_tm() to convert it into the firmware format greatly simplifies
the ips timekeeping code, makes 32-bit and 64-bit behave the same way
here, and gets us closer to removing the deprecated interfaces.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Arnd Bergmann [Fri, 20 Apr 2018 16:02:09 +0000 (18:02 +0200)]
scsi: esas2r: use ktime_get_real_seconds()
do_gettimeofday() is deprecated because of the y2038 overflow. Here, we
use the result to pass into a 32-bit field in the firmware, which still
risks an overflow, but if the firmware is written to expect unsigned
values, it can at least last until y2106, and there is not much we can
do about it.
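The y2038 and y2106 limits can be checked with a rough calculation: a signed 32-bit count of seconds since 1970 tops out after about 68 years, an unsigned one after about 136 years. The sketch below uses the average Gregorian year length as an approximation; `overflow_year` and `to_fw_seconds` are hypothetical helpers, not code from the esas2r driver.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_AVG_YEAR 31556952ULL   /* 365.2425 days * 86400 s */

/* Approximate calendar year in which a seconds-since-1970 counter tops out. */
unsigned int overflow_year(uint64_t max_seconds)
{
	return 1970u + (unsigned int)(max_seconds / SECONDS_PER_AVG_YEAR);
}

/* The truncation at issue: only the low 32 bits fit the firmware field. */
uint32_t to_fw_seconds(int64_t seconds)
{
	assert(seconds >= 0);
	return (uint32_t)seconds;
}
```

Here `overflow_year(INT32_MAX)` gives 2038 and `overflow_year(UINT32_MAX)` gives 2106, matching the years named in the commit message.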
This changes do_gettimeofday() to ktime_get_real_seconds(), which at
least simplifies the code a bit, and avoids the deprecated
interface. I'm adding a comment about the overflow to document what
happens.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
YueHaibing [Thu, 29 Mar 2018 11:43:11 +0000 (19:43 +0800)]
scsi: mvumi: Using module_pci_driver
Remove boilerplate code by using macro module_pci_driver.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>