YueHaibing [Sat, 25 May 2019 12:37:05 +0000 (20:37 +0800)]
scsi: megaraid_sas: remove set but not used variable 'sge_sz'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_create_frame_pool:
drivers/scsi/megaraid/megaraid_sas_base.c:4124:6: warning: variable sge_sz set but not used [-Wunused-but-set-variable]
It's not used any more since commit 200aed582d61 ("megaraid_sas: endianness
related bug fixes and code optimization")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nathan Chancellor [Thu, 6 Jun 2019 05:24:21 +0000 (22:24 -0700)]
scsi: lpfc: Avoid unused function warnings
When building powerpc pseries_defconfig or powernv_defconfig:
drivers/scsi/lpfc/lpfc_nvmet.c:224:1: error: unused function
'lpfc_nvmet_get_ctx_for_xri' [-Werror,-Wunused-function]
drivers/scsi/lpfc/lpfc_nvmet.c:246:1: error: unused function
'lpfc_nvmet_get_ctx_for_oxid' [-Werror,-Wunused-function]
These functions are only compiled when CONFIG_NVME_TARGET_FC is enabled.
Use that same condition so there is no more warning. While the fixes commit
did not introduce these functions, it caused these warnings.
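A minimal sketch of the pattern described above, written for userspace.
DEMO_NVME_TARGET_FC and the demo_* functions are hypothetical stand-ins, not
the lpfc code: a static helper is wrapped in the same preprocessor condition
as its only caller, so it is never compiled as an unused function.

```c
/* Hypothetical stand-in for CONFIG_NVME_TARGET_FC; 0 means "not configured". */
#ifndef DEMO_NVME_TARGET_FC
#define DEMO_NVME_TARGET_FC 0
#endif

#if DEMO_NVME_TARGET_FC
/* Static helper: compiled only when its single caller below is compiled. */
static int demo_get_ctx_for_xri(int xri)
{
	return xri + 1;
}
#endif

/* Returns a context id, or -1 when target support is compiled out. */
int demo_lookup(int xri)
{
#if DEMO_NVME_TARGET_FC
	return demo_get_ctx_for_xri(xri);
#else
	(void)xri;
	return -1;
#endif
}
```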
Fixes: 4064b27417a7 ("scsi: lpfc: Make some symbols static")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jack Wang [Thu, 6 Jun 2019 15:33:05 +0000 (17:33 +0200)]
scsi: MAINTAINERS: update maintainer for PM8001
Lindar's email address has been bouncing for some time, so just remove it.
ProfitBricks was rebranded to 1 & 1 Cloud IONOS, so update my email address
too.
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nathan Chancellor [Mon, 3 Jun 2019 23:44:06 +0000 (16:44 -0700)]
scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
clang warns:
drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
case IBMVSCSI_HOST_ACTION_NONE:
^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
here
if (rc) {
^~
Initialize rc in the IBMVSCSI_HOST_ACTION_UNBLOCK case statement then
shuffle IBMVSCSI_HOST_ACTION_NONE down to the default case statement and
make it return early so that rc is never used uninitialized in this
function.
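The control flow described above can be sketched as follows. This is an
illustration only: the demo_* names and return values are hypothetical, and
the real function does considerably more work in each case.

```c
/* Hypothetical action states, loosely modelled on the description above. */
enum demo_action {
	DEMO_ACTION_NONE,
	DEMO_ACTION_RESET,
	DEMO_ACTION_REENABLE,
	DEMO_ACTION_UNBLOCK,
};

/*
 * Returns 0 when there was nothing to do, 1 when the action succeeded and
 * -1 when it failed. reset_rc simulates the result of a reset/reenable.
 */
int demo_do_work(enum demo_action action, int reset_rc)
{
	int rc;

	switch (action) {
	case DEMO_ACTION_UNBLOCK:
		rc = 0;			/* explicitly initialized in this case */
		break;
	case DEMO_ACTION_RESET:
	case DEMO_ACTION_REENABLE:
		rc = reset_rc;
		break;
	case DEMO_ACTION_NONE:
	default:
		return 0;		/* early return: rc is never read */
	}

	if (rc)
		return -1;
	return 1;
}
```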
Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum action states")
Link: https://github.com/ClangBuiltLinux/linux/issues/502
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Suggested-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
YueHaibing [Fri, 31 May 2019 15:28:41 +0000 (23:28 +0800)]
scsi: lpfc: Make some symbols static
Fix sparse warnings:
drivers/scsi/lpfc/lpfc_sli.c:115:1: warning: symbol 'lpfc_sli4_pcimem_bcopy' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_sli.c:7854:1: warning: symbol 'lpfc_sli4_process_missed_mbox_completions' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_nvmet.c:223:27: warning: symbol 'lpfc_nvmet_get_ctx_for_xri' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_nvmet.c:245:27: warning: symbol 'lpfc_nvmet_get_ctx_for_oxid' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_init.c:75:10: warning: symbol 'lpfc_present_cpu' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
YueHaibing [Fri, 31 May 2019 15:27:45 +0000 (23:27 +0800)]
scsi: lpfc: Remove set but not used variables 'qp'
Fixes gcc '-Wunused-but-set-variable' warnings:
drivers/scsi/lpfc/lpfc_init.c: In function lpfc_setup_cq_lookup:
drivers/scsi/lpfc/lpfc_init.c:9359:30: warning: variable qp set but not used [-Wunused-but-set-variable]
It's not used since commit e70596a60f88 ("scsi: lpfc: Fix poor use of
hardware queues if fewer irq vectors")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enzo Matsumiya [Tue, 7 May 2019 15:39:05 +0000 (12:39 -0300)]
scsi: qla2xxx: remove double assignment in qla2x00_update_fcport
Remove double assignment in qla2x00_update_fcport().
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 29 May 2019 09:58:47 +0000 (17:58 +0800)]
scsi: hisi_sas: Disable stash for v3 hw
For v3 hw, stash is enabled to promote performance, but it does little to
improve performance according to current tests. What's more, it causes
exceptions for some situations, so disable it.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Luo Jiaxing [Wed, 29 May 2019 09:58:46 +0000 (17:58 +0800)]
scsi: hisi_sas: Ignore the error code between phy down to phy up
Several error codes will be generated between PHY down and PHY up.
This issue was introduced by HW design. The designers came to the
conclusion that we should ignore these errors.
Signed-off-by: Jiaxing Luo <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 29 May 2019 09:58:45 +0000 (17:58 +0800)]
scsi: hisi_sas: Change the type of some numbers to unsigned
Some tools report the following error at two places in our code:
runtime error: left shift of 4 by 29 places cannot be represented in type
'int'
So change the type of the two numbers to unsigned to avoid the error.
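To illustrate why the unsigned type helps, here is a small sketch assuming a
32-bit int. demo_shift() is a hypothetical helper, not code from the driver:
the signed expression 4 << 29 would need the value 2^31, which a 32-bit int
cannot represent, while 4U << 29 is well defined.

```c
#include <stdint.h>

/* Shift an unsigned constant; 4U << 29 yields 0x80000000 without overflow. */
uint32_t demo_shift(unsigned int places)
{
	return 4U << places;
}
```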
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
John Garry [Wed, 29 May 2019 09:58:44 +0000 (17:58 +0800)]
scsi: hisi_sas: Reduce HISI_SAS_SGE_PAGE_CNT in size
Macro HISI_SAS_SGE_PAGE_CNT is defined to SG_CHUNK_SIZE, which is 128.
This means that sizeof(struct hisi_sas_slot_buf_table) is 4192. This is
just over 4K, which can mean inefficient DMA memory usage (for no PI).
Reduce the size of HISI_SAS_SGE_PAGE_CNT to 124 to fit in a 4K page. With
this change, we experience no performance hit.
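The arithmetic can be checked with a small sketch. The two constants below
are assumptions, not taken from this commit: a 24-byte SGE entry and 1120
bytes for the remaining fields of the table, chosen so that 128 entries give
the 4192 bytes quoted above.

```c
#include <stddef.h>

#define DEMO_SGE_SIZE	24u	/* assumed bytes per SGE entry */
#define DEMO_FIXED_SIZE	1120u	/* assumed size of the other table fields */

/* Total table size in bytes for a given number of SGE entries. */
size_t demo_table_size(size_t sge_cnt)
{
	return sge_cnt * DEMO_SGE_SIZE + DEMO_FIXED_SIZE;
}
```

Under these assumptions, 128 entries give 4192 bytes and 124 entries give
exactly 4096 bytes.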
Cc: dann frazier <dann.frazier@canonical.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiaofei Tan [Wed, 29 May 2019 09:58:43 +0000 (17:58 +0800)]
scsi: hisi_sas: Fix the issue of argument mismatch of printing ecc errors
The argument of dev_err() called by multi_bit_ecc_error_process_v3_hw() is
not right. We pass two arguments, but there is only one printk format
specifier in the string.
Also move the print format string to dev_err().
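A userspace sketch of the idea, using snprintf() in place of dev_err(). The
message text and demo_format_ecc() are hypothetical. Keeping the format
string as a literal in the call lets the compiler check that the number of
format specifiers matches the number of arguments.

```c
#include <stdio.h>

/* Two arguments, two format specifiers, format literal at the call site. */
int demo_format_ecc(char *buf, size_t len, unsigned int irq_value,
		    unsigned int err_addr)
{
	return snprintf(buf, len, "multi-bit ecc error (0x%x), addr 0x%08x",
			irq_value, err_addr);
}
```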
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen [Wed, 29 May 2019 09:58:42 +0000 (17:58 +0800)]
scsi: hisi_sas: Delete PHY timers when rmmod or probe failed
When removing the driver or when probe fails, we need to delete the PHY
timers.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bharath Vedartham [Wed, 22 May 2019 16:01:49 +0000 (21:31 +0530)]
scsi: message: fusion: Use kmemdup instead of memcpy and kmalloc
Replace kmalloc + memcpy with kmemdup.
This was reported by coccinelle.
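For readers unfamiliar with the helper, a userspace analogue of the pattern
looks like this. demo_memdup() is a hypothetical stand-in built on malloc();
the kernel's kmemdup() additionally takes a GFP flags argument.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate len bytes and copy src into them; returns NULL on failure. */
void *demo_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```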
Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
YueHaibing [Sat, 25 May 2019 12:42:02 +0000 (20:42 +0800)]
scsi: megaraid_sas: remove set but not used variables 'host' and 'wait_time'
Fixes gcc '-Wunused-but-set-variable' warnings:
drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_suspend:
drivers/scsi/megaraid/megaraid_sas_base.c:7269:20: warning: variable host set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_aen_polling:
drivers/scsi/megaraid/megaraid_sas_base.c:8397:15: warning: variable wait_time set but not used [-Wunused-but-set-variable]
'host' never used since introduction in commit 31ea7088974c ("[SCSI]
megaraid_sas: add hibernation support")
'wait_time' never used since commit 11c71cb4ab7c ("megaraid_sas: Do not
allow PCI access during OCR")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
YueHaibing [Sat, 25 May 2019 12:38:21 +0000 (20:38 +0800)]
scsi: megaraid_sas: remove set but not used variable 'cur_state'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_transition_to_ready:
drivers/scsi/megaraid/megaraid_sas_base.c:3900:6: warning: variable cur_state set but not used [-Wunused-but-set-variable]
Never used since commit 7218df69e360 ("[SCSI] megaraid_sas: use the
firmware boot timeout when waiting for commands")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Gen Zhang [Thu, 30 May 2019 01:10:30 +0000 (09:10 +0800)]
scsi: mpt3sas_ctl: fix double-fetch bug in _ctl_ioctl_main()
In _ctl_ioctl_main(), 'ioctl_header' is fetched the first time from
userspace. 'ioctl_header.ioc_number' is then checked. The legal result is
saved to 'ioc'. Then, in the MPT3COMMAND case, the whole struct is fetched
again from userspace. Then _ctl_do_mpt_command() is called with 'ioc' and
'karg' as inputs.
However, a malicious user can change the 'ioc_number' between the two
fetches, which can cause potential security issues. Moreover, a
malicious user can provide a valid 'ioc_number' to pass the check in the
first fetch, and then modify it before the second fetch.
To fix this, we need to recheck the 'ioc_number' in the second fetch.
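The recheck can be sketched in userspace as follows. Everything here is
hypothetical: memcpy() stands in for copy_from_user(), the structs are
simplified, and the 'between' hook only exists so that a concurrent
modification of the user buffer can be simulated.

```c
#include <string.h>

#define DEMO_MAX_IOC 8

struct demo_hdr { int ioc_number; };
struct demo_cmd { struct demo_hdr hdr; int payload; };

/* Simulates a malicious user changing the buffer between the two fetches. */
void demo_flip_ioc(struct demo_cmd *user)
{
	user->hdr.ioc_number = 99;
}

/* Returns the validated ioc_number, -1 if invalid, -2 if it changed. */
int demo_ioctl(struct demo_cmd *user, void (*between)(struct demo_cmd *))
{
	struct demo_hdr hdr;
	struct demo_cmd karg;

	memcpy(&hdr, user, sizeof(hdr));		/* first fetch */
	if (hdr.ioc_number < 0 || hdr.ioc_number >= DEMO_MAX_IOC)
		return -1;

	if (between)
		between(user);

	memcpy(&karg, user, sizeof(karg));		/* second fetch */
	if (karg.hdr.ioc_number != hdr.ioc_number)	/* recheck */
		return -2;

	return karg.hdr.ioc_number;
}
```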
Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
Acked-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stanley Chu [Tue, 21 May 2019 06:44:54 +0000 (14:44 +0800)]
scsi: ufs: Add error-handling of Auto-Hibernate
Currently auto-hibernate is activated if host supports auto-hibern8
capability. However error-handling is not implemented, which makes the
feature somewhat risky.
If either "Hibernate Enter" or "Hibernate Exit" fails during auto-hibernate
flow, the corresponding interrupt "UIC_HIBERNATE_ENTER" or
"UIC_HIBERNATE_EXIT" shall be raised according to UFS specification.
This patch adds auto-hibernate error-handling:
- Monitor "Hibernate Enter" and "Hibernate Exit" interrupts after
auto-hibernate feature is activated.
- If a failure happens, trigger error-handling just like
"manual-hibernate" failure and apply the same recovery flow: schedule
UFS error handler in ufshcd_check_errors(), and then do host reset and
restore in UFS error handler.
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stanley Chu [Tue, 21 May 2019 06:44:53 +0000 (14:44 +0800)]
scsi: ufs: Do not overwrite Auto-Hibernate timer
Some vendor-specific initialization flow may set its own auto-hibernate
timer. In this case, do not overwrite timer value as "default value" in
ufshcd_init().
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stanley Chu [Tue, 21 May 2019 06:44:52 +0000 (14:44 +0800)]
scsi: ufs: Introduce ufshcd_is_auto_hibern8_supported()
The check for Auto-Hibernation support is used in many places in the
driver, so refactor it into ufshcd_is_auto_hibern8_supported() to make the
code cleaner.
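A helper of this kind might look roughly like the sketch below. The struct,
the mask name and the bit position are hypothetical placeholders, not the
driver's definitions; the point is only that the capability test lives in
one place.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CAP_AUTO_HIBERN8	(1u << 23)	/* hypothetical capability bit */

struct demo_hba {
	uint32_t capabilities;
};

/* Single place to test the capability instead of open-coding the mask. */
static inline bool demo_is_auto_hibern8_supported(const struct demo_hba *hba)
{
	return (hba->capabilities & DEMO_CAP_AUTO_HIBERN8) != 0;
}
```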
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jason Yan [Mon, 20 May 2019 14:06:00 +0000 (22:06 +0800)]
scsi: libsas: no need to join wide port again in sas_ex_discover_dev()
Since we are processing events synchronously now, the second call of
sas_ex_join_wide_port() in sas_ex_discover_dev() is not needed. There will
be no races with other works in disco workqueue. So remove the second
sas_ex_join_wide_port().
I did not change the return value of 'res' to error when discover failed
because we need to continue to discover other phys if one phy discover
failed. So let's keep that logic as before and just add a debug log to
detect the failure. Also return directly if a second fanout expander is
attached to the parent expander, because there is nothing more to do after
the phy is disabled.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Thomas Meyer [Wed, 29 May 2019 20:21:36 +0000 (22:21 +0200)]
scsi: lpfc: Use *_pool_zalloc rather than *_pool_alloc
Use *_pool_zalloc rather than *_pool_alloc followed by memset with 0.
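As a userspace analogy only (calloc() standing in for a zeroing pool
allocator, with hypothetical demo_* names), the two forms below produce the
same zeroed object, but the second needs no separate memset.

```c
#include <stdlib.h>
#include <string.h>

struct demo_obj {
	int a;
	int b;
};

/* Before: allocate, then clear by hand. */
struct demo_obj *demo_alloc_then_memset(void)
{
	struct demo_obj *p = malloc(sizeof(*p));

	if (p)
		memset(p, 0, sizeof(*p));
	return p;
}

/* After: a zeroing allocation does both steps in one call. */
struct demo_obj *demo_zalloc(void)
{
	return calloc(1, sizeof(struct demo_obj));
}
```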
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Colin Ian King [Wed, 22 May 2019 08:39:03 +0000 (09:39 +0100)]
scsi: hpsa: fix an uninitialized read and dereference of pointer dev
Currently the check for a lockup_detected failure exits via the label
return_reset_status that reads and dereferences an uninitialized pointer
dev. Fix this by ensuring dev is initialized to NULL.
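The shape of the problem can be sketched like this. The demo_* names are
hypothetical and the real reset handler is far larger: an early exit jumps
to a shared label that inspects the pointer, so the pointer has to be
initialized before that jump can happen.

```c
#include <stddef.h>

struct demo_dev {
	int in_reset;
};

/* Returns 0 on success, -1 if a lockup was detected or no device given. */
int demo_reset(struct demo_dev *candidate, int lockup_detected)
{
	struct demo_dev *dev = NULL;	/* initialized: the exit path reads it */
	int rc = 0;

	if (lockup_detected || !candidate) {
		rc = -1;
		goto out;		/* dev is still NULL here */
	}

	dev = candidate;
	dev->in_reset = 1;		/* reset work would happen here */

out:
	if (dev)
		dev->in_reset = 0;
	return rc;
}
```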
Addresses-Coverity: ("Uninitialized pointer read")
Fixes: 14991a5bade5 ("scsi: hpsa: correct device resets")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hariprasad Kelam [Tue, 28 May 2019 01:21:52 +0000 (06:51 +0530)]
scsi: target/iscsi: fix possible condition with no effect (if == else)
Fix the following warning reported by coccicheck:
drivers/target/iscsi/iscsi_target_nego.c:175:6-8: WARNING: possible
condition with no effect (if == else)
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Weitao Hou [Mon, 20 May 2019 03:24:03 +0000 (11:24 +0800)]
scsi: pm8001: Fix typo in code comments
Fix abord to abort.
Signed-off-by: Weitao Hou <houweitaoo@gmail.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ondrej Zary [Mon, 27 May 2019 20:19:47 +0000 (22:19 +0200)]
scsi: fdomain: Add PCMCIA support
Add PCMCIA card support to Future Domain SCSI driver.
Tested with IBM SCSI PCMCIA Adapter 40G1890.
Signed-off-by: Ondrej Zary <linux@zary.sk>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ondrej Zary [Sat, 18 May 2019 19:47:24 +0000 (21:47 +0200)]
scsi: fdomain: Add register definitions
Add register bit definitions from documentation to header file and use them
instead of magic constants. No changes to generated binary.
Signed-off-by: Ondrej Zary <linux@zary.sk>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tyrel Datwyler [Fri, 3 May 2019 00:50:58 +0000 (19:50 -0500)]
scsi: ibmvscsi: fix tripping of blk_mq_run_hw_queue WARN_ON
After a successful SRP login response we call scsi_unblock_requests() to
kick any pending IOs. The callback to process this SRP response happens in
a tasklet and therefore is in softirq context. The result of such is that
when blk-mq is enabled, it is no longer safe to call scsi_unblock_requests()
from this context. Doing so triggers the following WARN_ON
splat in dmesg after a host reset or CRQ reenablement.
WARNING: CPU: 0 PID: 0 at block/blk-mq.c:1375 __blk_mq_run_hw_queue+0x120/0x180
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.0.0-rc8 #4
NIP [c0000000009771e0] __blk_mq_run_hw_queue+0x120/0x180
LR [c000000000977484] __blk_mq_delay_run_hw_queue+0x244/0x250
Call Trace:
__blk_mq_delay_run_hw_queue+0x244/0x250
blk_mq_run_hw_queue+0x8c/0x1c0
blk_mq_run_hw_queues+0x60/0x90
scsi_run_queue+0x1e4/0x3b0
scsi_run_host_queues+0x48/0x80
login_rsp+0xb0/0x100
ibmvscsi_handle_crq+0x30c/0x3e0
ibmvscsi_task+0x54/0xe0
tasklet_action_common.isra.3+0xc4/0x1a0
__do_softirq+0x174/0x3f4
irq_exit+0xf0/0x120
__do_irq+0xb0/0x210
call_do_irq+0x14/0x24
do_IRQ+0x9c/0x130
hardware_interrupt_common+0x14c/0x150
This patch fixes the issue by introducing a new host action for unblocking
the scsi requests in our separate work thread.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tyrel Datwyler [Fri, 3 May 2019 00:50:57 +0000 (19:50 -0500)]
scsi: ibmvscsi: redo driver work thread to use enum action states
The current implementation relies on two flags in the driver's private host
structure to signal the need for a host reset or to reenable the CRQ after
a LPAR migration. This patch does away with those flags and introduces a
single action flag and defined enums for the supported kthread work
actions. Lastly, the if/else logic is replaced with a switch statement.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tyrel Datwyler [Fri, 3 May 2019 00:50:56 +0000 (19:50 -0500)]
scsi: ibmvscsi: Wire up host_reset() in the driver's scsi_host_template
Wire up the host_reset function in our driver_template to allow a
user-requested adapter reset via the host_reset sysfs attribute.
Example:
echo "adapter" > /sys/class/scsi_host/host0/host_reset
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:11 +0000 (17:49 -0700)]
scsi: lpfc: Update lpfc version to 12.2.0.3
Update lpfc version to 12.2.0.3
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:10 +0000 (17:49 -0700)]
scsi: lpfc: Fix kernel warnings related to smp_processor_id()
Kernel warnings may be seen with preempt debugging enabled.
Replace smp_processor_id calls with raw_smp_processor_id or cpu information
stored in hdwq structures.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:09 +0000 (17:49 -0700)]
scsi: lpfc: Fix BFS crash with DIX enabled
Crashes in scsi_queue_rq or in dma_unmap_direct_sg during BFS when lpfc has
lpfc_enable_bg=1.
lpfc is setting DIX and prot sg after scsi_add_host_with_dma() has been
called. The scsi_host_set_prot() and scsi_host_set_guard() routines need to
be called before scsi_add_host_with_dma().
Revise the calling sequence to set the protection/guard data before calling
scsi_add_host_with_dma().
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:08 +0000 (17:49 -0700)]
scsi: lpfc: Fix FDMI fc4type for nvme support
FDMI protocol support registration was not accurately showing nvme
support. The fcponly-path clears the parameter object.
Move the code out of the fcponly code path. Fix the FDMI registration data
to properly check for nvme support. Commonize the manner in which the fdmi
routines set protocol support.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:07 +0000 (17:49 -0700)]
scsi: lpfc: Fix fcp_rsp_len checking on lun reset
Issuing a LUN reset was resulting in a command failure which then escalated
to a host reset.
The FCP-4 spec allows fcp_rsp_len field to specify the number of valid
bytes of FCP_RSP_INFO, and the value could be 4 or 8. The driver is
allowing only a value of 8, thus it failed the command.
Revise the driver to allow 4 or 8.
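The revised check amounts to something like the following sketch.
demo_rsp_len_valid() is a hypothetical helper that reflects only the two
values named above, not the driver's full response validation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept either length that the description above names as valid. */
bool demo_rsp_len_valid(uint32_t rsp_len)
{
	return rsp_len == 4 || rsp_len == 8;
}
```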
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:06 +0000 (17:49 -0700)]
scsi: lpfc: Fix poor use of hardware queues if fewer irq vectors
While fixing the resources per socket, realized the driver was not using
hardware queues (up to 1 per cpu) if there were fewer interrupt
vectors. The driver was only using the hardware queue assigned to the cpu
with the vector.
Rework the affinity map check to use the additional hardware queue elements
that had been allocated. If the cpu count exceeds the hardware queue count,
share queues, choosing the cpu to share with in this order: hyperthread
peer, core peer, socket peer, or finally a similar cpu in a different socket.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:05 +0000 (17:49 -0700)]
scsi: lpfc: Fix oops when driver is loaded with 1 interrupt vector
The driver was coded expecting enough hardware queues and interrupt vectors
such that there was at least one per socket. In the case where there were
fewer vectors than sockets, cpus were left unassigned, resulting in null
pointers.
Rework the affinity mappings. Map settings for the cpu's that are in the
irq cpu mask. For each cpu not in the mask, map to another cpu that does
have a mask. Choice of the "other" cpu will attempt to map to the same cpu
but differing hyperthread, or cpu within the same core, or cpu within same
socket, or finally cpu in the base socket.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:04 +0000 (17:49 -0700)]
scsi: lpfc: Fix incorrect logical link speed on trunks when links down
Invalid logical speed is displayed for trunk enabled ports when all ports
are down. Also noted that link speed is incorrectly reported for the units
when links are up.
Current code is returning the logical link speed from the last event from
the adapter. In cases where the last link went down, the link speed in the
event was not valid - meaning that although the links were down the field
had a bogus value.
Rework the event handling to qualify the trunk link state before using the
event speed data.
Also correct units on other areas where the logical link speed was taken
from a link event.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:03 +0000 (17:49 -0700)]
scsi: lpfc: Fix memory leak in abnormal exit path from lpfc_eq_create
eq create is leaking mailbox memory if it encounters an error.
Rework the error path to free the memory.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:02 +0000 (17:49 -0700)]
scsi: lpfc: Rework misleading nvme not supported in firmware message
The driver unconditionally says fw doesn't support nvme when in
truth it was a driver parameter setting that disabled nvme support.
Rework the code validating nvme support to accurately report what
condition is disabling nvme support. Save state on whether
fw supports nvme in case sysfs attributes change dynamically.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:01 +0000 (17:49 -0700)]
scsi: lpfc: Fix hardlockup in scsi_cmd_iocb_cmpl
There is a race condition with the abort handler declaring a waitq
item on its stack, followed by a timeout in the abort handler that
has it give up on the abort and return to its caller. When the io is
finally aborted and its completion handler called, it references
the waitq element that the abort_handler set up, which is no longer
valid resulting in a deadlock.
Fix by clearing the waitq reference, under lock, when the abort
handler timeout gives up. Have the completion handler validate the
waitq before referencing it.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:49:00 +0000 (17:49 -0700)]
scsi: lpfc: Cancel queued work for an IO when processing a received ABTS
When queued work is executed posting a new command to the transport
the driver is reporting a null buffer.
The driver had received an ABTS which matched a command that had
been scheduled for delivery to the transport. The driver proceeded
to cancel the command, but the work item was never cancelled.
Fix by cancelling the queued work item. Also turns out the ABTS
response was not properly sending a BA_ACC, so set the flag to
send the ACC.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:48:59 +0000 (17:48 -0700)]
scsi: lpfc: Prevent 'use after free' memory overwrite in nvmet LS handling
Use-after-free memory overwrite detected. Problem reported
by Ewan Milne at Red Hat after running lpfc target with additional
memory checking enabled.
Race condition when lpfc_nvmet_xmt_ls_rsp_cmp frees the ctxp
memory in interrupt context before lpfc_nvmet_xmt_ls_rsp
clears a field in the ctxp after successfully issuing the wqe.
Remove the unnecessary ctxp write after reposting the rq buffer. The
ctxp->rqb_buffer field is not checked in LS handling after the wqe
is submitted.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reported-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:48:58 +0000 (17:48 -0700)]
scsi: lpfc: Fix PT2PT PLOGI collison stopping discovery
Under heavy load the target stops responding, the driver's aborts
time out, and we start recovery by logging out of the target, but
the target is never logged into again.
In a point-to-point scenario, there were battling PLOGI's. When we
received a PLOGI request after having sent one, the driver cancels
the processing of the original plogi. However, the completion path
of the remaining plogi was coded to skip the reg_rpi that should
be happening on the 2nd plogi.
Correct by adding a simple pt2pt check such that the 2nd plogi isn't
skipped and the reg_login occurs.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:48:57 +0000 (17:48 -0700)]
scsi: lpfc: Revert message logging on unsupported topology
Turns out the message change in 12.2.0.1 for unsupported topology
makes the linux driver out of sync with other products.
Revert the message back to the prior content for product consistency.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:48:56 +0000 (17:48 -0700)]
scsi: lpfc: Fix nvmet handling of received ABTS for unmapped frames
The driver currently is relying on firmware to match ABTSs to existing
exchanges. This works fine as long as an exchange has been assigned to the
io and work posted to it. However, for unmapped frames (rxid=0xFFFF), the
driver has yet to assign an xri. The driver was blindly saying it couldn't
match the ABTS and sending the BA_xxx. However, the command frame may have
been in queues waiting on xri's before posting to the nvmet_fc layer. When
xri's became available, the command frame would still be pushed to the
transport and that io would execute, even though the io had been killed by
ABTS. The initiator, seeing the io ABTS'd, would reuse the exchange for a
different io which would be received on the target and pushed up. If the
"zombie" io then came back down and started transmitting, the initiator
would match the oxid and accept erroneous data. Bad things happened.
Add tracking of active exchanges in the target to allow matching of a
received ABTS against active or pending IO requests. If the ABTS is matched
to a pending or active IO, the driver initiates cleanup and conditionally
notifies the transport.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:48:55 +0000 (17:48 -0700)]
scsi: lpfc: Separate CQ processing for nvmet_fc upcalls
Currently the driver is notified of new command frame receipt by CQEs. As
part of the CQE processing, the driver upcalls the nvmet_fc transport to
deliver the command. nvmet_fc, as part of receiving the command builds out
a context for it, where one of the first steps is to allocate memory for
the io.
When running with tests that do large ios (1MB), it was found on some
systems, the total number of outstanding I/O's, at 1MB per, completely
consumed the system's memory. Thus additional ios were getting blocked in
the memory allocator. Given that this blocked the lpfc thread processing
CQEs, there were lots of other commands that were received and which are
then held up, and given CQEs are serially processed, the aggregate delays
for an IO waiting behind the others became cumulative - enough so that the
initiator hit timeouts for the ios.
The basic fix is to avoid the direct upcall and instead schedule a work
item for each io as it is received. This allows the cq processing to
complete very quickly, and each io can then run or block on its own.
However, this general solution hurts latency when there are few ios. As
such, implemented the fix such that the driver watches how many CQEs it has
processed sequentially in one run. As long as the count is below a
threshold, the direct nvmet_fc upcall will be made. Only when the count is
exceeded will it revert to work scheduling.
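The threshold policy described above can be sketched as follows. The
threshold value and all demo_* names are hypothetical, and real dispatch
would call into the transport or queue a work item rather than count.

```c
#define DEMO_DIRECT_UPCALL_MAX 4u	/* hypothetical threshold */

enum demo_dispatch {
	DEMO_DIRECT_UPCALL,
	DEMO_SCHEDULE_WORK,
};

/* Decide how to deliver a command given the CQEs already handled this run. */
enum demo_dispatch demo_pick_dispatch(unsigned int cqes_done_in_run)
{
	return cqes_done_in_run < DEMO_DIRECT_UPCALL_MAX ?
		DEMO_DIRECT_UPCALL : DEMO_SCHEDULE_WORK;
}

/* Count how many commands in one run of CQEs end up deferred to work items. */
unsigned int demo_count_deferred(unsigned int cqes_in_run)
{
	unsigned int i, deferred = 0;

	for (i = 0; i < cqes_in_run; i++)
		if (demo_pick_dispatch(i) == DEMO_SCHEDULE_WORK)
			deferred++;
	return deferred;
}
```

With this placeholder threshold, a short run of 3 CQEs is delivered entirely
by direct upcall, while a run of 10 defers the last 6 to work items.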
Given that debug of this showed a surprisingly long delay in cq processing,
the io timer stats were updated to better reflect the processing of the
different points.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:48:54 +0000 (17:48 -0700)]
scsi: lpfc: Revise message when stuck due to unresponsive adapter
Revise a stalled adapter message to also include the number of jobs that
are stalling the thread.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:48:53 +0000 (17:48 -0700)]
scsi: lpfc: Correct nvmet buffer free race condition
A race condition resulted in receive buffers being placed in the free list
twice.
Change the locking and handling to check whether the "other" path will be
freeing the entry in a later thread and skip it if it is.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:48:52 +0000 (17:48 -0700)]
scsi: lpfc: Fix nvmet target abort cmd matching
After receiving an unsolicited ABTS (meaning rxid is 0xFFFF), the driver
used the oxid from the initiator to match against a local xri which may
have been allocated for the io. The local xri corresponds to the rxid, not
the oxid, so the check is invalid and results in the command not being
matched or being erroneously matched.
Change the lookup to use the oxid and the SID to match against the
received IO's original values.
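A lookup keyed on the oxid and the SID could look roughly like the sketch
below. It is a userspace illustration only: struct rcv_io and
find_io_for_abts() are hypothetical names, and the flat array stands in
for whatever list the driver actually keeps for received IOs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a received IO; not the lpfc context structure. */
struct rcv_io {
	uint32_t sid;   /* source ID of the initiator that sent the command */
	uint16_t oxid;  /* originator exchange ID of the received command */
	bool active;    /* entry currently describes an outstanding IO */
};

/*
 * Find the outstanding IO an ABTS refers to. Both the SID and the oxid
 * must match, because different initiators may reuse the same oxid.
 */
static struct rcv_io *find_io_for_abts(struct rcv_io *ios, size_t n,
				       uint32_t sid, uint16_t oxid)
{
	for (size_t i = 0; i < n; i++) {
		if (ios[i].active && ios[i].sid == sid && ios[i].oxid == oxid)
			return &ios[i];
	}
	return NULL; /* no outstanding IO matches this abort */
}
```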
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Smart [Wed, 22 May 2019 00:48:51 +0000 (17:48 -0700)]
scsi: lpfc: Fix alloc context on oas lun creations
Softlockups are seen in low memory situations. They are due to doing
oas_lun allocation with GFP_KERNEL in atomic contexts.
Change the calls to oas_lun to indicate atomic context so that GFP_ATOMIC
is used.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:50 +0000 (10:05 -0700)]
scsi: megaraid_sas: Update driver version to 07.708.03.00
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:49 +0000 (10:05 -0700)]
scsi: megaraid_sas: Export RAID map through debugfs
Create a debugfs interface for megaraid_sas driver. Provide interface to
dump driver RAID map in debugfs.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:48 +0000 (10:05 -0700)]
scsi: megaraid_sas: Fix MSI-X vector print
Print FW supported MSI-X vector count only if FW supports
MSI-X.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:47 +0000 (10:05 -0700)]
scsi: megaraid_sas: Add debug prints for device list
Add debug prints related to the device list being returned by firmware.
Add a debug flag to activate these prints.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:46 +0000 (10:05 -0700)]
scsi: megaraid_sas: Add prints in suspend and resume path
Add prints in resume/suspend path to help in debugging hibernation
issues. The print gives an indication when the driver entry points are
called.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:45 +0000 (10:05 -0700)]
scsi: megaraid_sas: Print firmware interrupt status
Add a print to dump the interrupt status in system log for debugging.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:44 +0000 (10:05 -0700)]
scsi: megaraid_sas: Print FW fault information
When driver detects a firmware fault during load, dump additional
information on fault code and subcode that will help in debugging.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:43 +0000 (10:05 -0700)]
scsi: megaraid_sas: Export RAID map id through sysfs
Add a sysfs interface to get the raid map index that is being used by
driver.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:42 +0000 (10:05 -0700)]
scsi: megaraid_sas: Print BAR information from driver
Add prints for BAR address information during driver load. This helps in
debugging issues with BAR address changing during OS boot.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:41 +0000 (10:05 -0700)]
scsi: megaraid_sas: Dump system registers for debugging
When the controller fails to transition to READY state during driver probe,
dump the system interface register set. This will give a snapshot of the
firmware status for debugging driver load issues.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:40 +0000 (10:05 -0700)]
scsi: megaraid_sas: Dump system interface regs from sysfs
Add a sysfs interface to dump the controller's system interface registers.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:39 +0000 (10:05 -0700)]
scsi: megaraid_sas: Add formatting option for megasas_dump
Add option to format the buffer that is being dumped. Currently, the IO
frame and chain frame dumped in the syslog is getting split across multiple
lines based on the formatting. Fix this by using KERN_CONT in printk.
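The idea of starting one log record per row and appending the remaining
words as continuations can be sketched in userspace C as below. This is
not the megasas_dump() implementation: dump_words() is a hypothetical
helper, fprintf() stands in for printk(), and the offset prefix is an
assumed format.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write 'count' 32-bit words as hex, 'per_line' words per output line.
 * Returns the number of lines written.
 */
static size_t dump_words(FILE *out, const uint32_t *buf, size_t count,
			 size_t per_line)
{
	size_t lines = 0;

	if (per_line == 0)
		return 0;

	for (size_t i = 0; i < count; i++) {
		/* Start of a row: comparable to a fresh printk() record. */
		if (i % per_line == 0)
			fprintf(out, "%04zx:", i * sizeof(uint32_t));

		/* Same row: comparable to printk(KERN_CONT ...). */
		fprintf(out, " %08x", (unsigned int)buf[i]);

		if ((i + 1) % per_line == 0 || i + 1 == count) {
			fputc('\n', out);
			lines++;
		}
	}
	return lines;
}
```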
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:38 +0000 (10:05 -0700)]
scsi: megaraid_sas: Enhance internal DCMD timeout prints
Add prints to identify the internal DCMD opcode that has timed out.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:37 +0000 (10:05 -0700)]
scsi: megaraid_sas: Enhance prints in OCR and TM path
This patch enhances the existing debug prints in reset and task management
path.
These debug prints in the adapter reset path help with debugging issues
related to IO timeouts that are seen frequently in the field. Add
additional debug prints to dump the pending command frames before
initiating an adapter reset. Also, print FastPath IOs that are
outstanding.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:36 +0000 (10:05 -0700)]
scsi: megaraid_sas: Load balance completions across all MSI-X
The driver will use "reply descriptor post queues" in round robin fashion
when the combined MSI-X mode is not enabled. With this, IO completions are
distributed and load balanced equally across all the available reply
descriptor post queues.
This is enabled only if combined MSI-X mode is not enabled in firmware.
This improves performance and also fixes soft lockups.
When load balancing is enabled, IRQ affinity from driver needs to be
disabled.
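Round-robin selection of a reply queue can be sketched with a shared
atomic counter, as below. This is a userspace illustration using C11
atomics rather than the driver's code: struct rr_state, rr_init() and
rr_pick_queue() are hypothetical names.

```c
#include <stdatomic.h>

/* Hypothetical round-robin state shared by all submitters. */
struct rr_state {
	atomic_uint next;        /* ever-increasing submission counter */
	unsigned int num_queues; /* number of reply queues, must be > 0 */
};

static void rr_init(struct rr_state *s, unsigned int num_queues)
{
	atomic_init(&s->next, 0);
	s->num_queues = num_queues;
}

/*
 * Pick the reply queue for the next IO. The atomic increment lets
 * several CPUs submit concurrently without taking a lock.
 */
static unsigned int rr_pick_queue(struct rr_state *s)
{
	return atomic_fetch_add(&s->next, 1) % s->num_queues;
}
```

Because consecutive IOs land on different queues regardless of the
submitting CPU, a per-CPU IRQ affinity no longer matches the completion
pattern, which fits the note above that driver IRQ affinity has to be
disabled in this mode.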
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:35 +0000 (10:05 -0700)]
scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups
Issue Description:
We have seen CPU lockup issues in the field on systems with a large (more
than 96) logical CPU count. The SAS3.0 controller (Invader series) supports
a maximum of 96 MSI-X vectors and the SAS3.5 product (Ventura) supports a
maximum of 128 MSI-X vectors.
This may be a generic issue (if a PCI device supports completion on
multiple reply queues).
Let me explain it w.r.t. megaraid_sas supported h/w just to simplify the
problem and possible changes to handle such issues. The MegaRAID controller
supports multiple reply queues in the completion path. The driver creates
MSI-X vectors for the controller as "minimum of (FW supported reply queues,
logical CPUs)". If the submitter is not interrupted via completion on the
same CPU, there is a loop in the IO path. This behavior can cause hard/soft
CPU lockups, IO timeouts, system sluggishness, etc.
Example - one CPU (e.g. CPU A) is busy submitting IOs and another CPU
(e.g. CPU B) is busy processing the corresponding IOs' reply descriptors
from the reply descriptor queue upon receiving interrupts from the HBA. If
CPU A is continuously pumping IOs, then CPU B (which is executing the ISR)
will always see valid reply descriptors in the reply descriptor queue and
will keep processing those reply descriptors in a loop without quitting
the ISR handler.
The megaraid_sas driver exits the ISR handler only when it finds an unused
reply descriptor in the reply descriptor queue. Since CPU A will be
continuously sending IOs, CPU B may always see a valid reply descriptor
(posted by the HBA firmware after processing the IO) in the reply
descriptor queue. In the worst case, the driver will not quit this loop in
the ISR handler. Eventually, the CPU lockup will be detected by the
watchdog.
The above behavior is not common if "rq_affinity" is set to 2 or if the
affinity_hint is honored by irqbalance as "exact". If rq_affinity is set
to 2, the submitter will always be interrupted via completion on the same
CPU. If irqbalance is using the "exact" policy, the interrupt will be
delivered to the submitter CPU.
Problem statement:
If the ratio of CPU count to MSI-X vector (reply descriptor queue) count is
not 1:1, we are still exposed to the issue explained above, and for that we
don't have any solution.
Exposure to soft/hard lockups is seen if the CPU count is more than the
number of MSI-X vectors supported by the device.
If the ratio of CPU count to MSI-X vector count is not 1:1 (in other words,
if it is something like X:1, where X > 1), then the 'exact' irqbalance
policy or rq_affinity = 2 won't help to avoid CPU hard/soft lockups. There
won't be a one-to-one mapping between CPUs and MSI-X vectors; instead, one
MSI-X interrupt (or reply descriptor queue) is shared by a group/set of
CPUs, so a loop can form in the IO path within that CPU group and lockups
may be observed.
For example: consider a system with two NUMA nodes, each node having four
logical CPUs, and with two MSI-X vectors enabled on the HBA. The ratio of
CPU count to MSI-X vector count is then 4:1.
e.g.
MSI-X vector 0 is affined to CPU 0, CPU 1, CPU 2 & CPU 3 of NUMA node 0 and
MSI-X vector 1 is affined to CPU 4, CPU 5, CPU 6 & CPU 7 of NUMA node 1.
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 --> MSI-X 0
node 0 size: 65536 MB
node 0 free: 63176 MB
node 1 cpus: 4 5 6 7 --> MSI-X 1
node 1 size: 65536 MB
node 1 free: 63176 MB
Assume that the user started an application which uses all the CPUs of NUMA
node 0 for issuing IOs. Only one CPU from the affinity list (say CPU 0; it
can be any CPU since this behavior depends upon irqbalance) will receive
the interrupts from MSI-X 0 for all the IOs. Eventually, CPU 0's IO
submission percentage will decrease and its ISR processing percentage will
increase as it becomes busier with processing the interrupts. Gradually
the IO submission percentage on CPU 0 will drop to zero and its ISR
processing percentage will reach 100%, as an IO loop has already formed
within NUMA node 0, i.e. CPU 1, CPU 2 & CPU 3 will be continuously busy
submitting heavy IOs and only CPU 0 is busy in the ISR path, as it always
finds a valid reply descriptor in the reply descriptor queue. Eventually,
we will observe the hard lockup here.
The chance of hard/soft lockups occurring grows with the value of X: the
higher the value of X, the higher the chance of observing CPU lockups.
Solution:
Use IRQ poll interface defined in "irq_poll.c".
The megaraid_sas driver will execute the ISR routine in softirq context and
will always quit the loop based on the budget provided in the IRQ poll
interface.
The driver will switch to IRQ poll only when more than a threshold number
of reply descriptors are handled in one ISR. Currently the threshold is set
to 1/4th of the HBA queue depth.
In these scenarios (i.e. where the ratio of CPU count to MSI-X vector count
is X:1, where X > 1), the IRQ poll interface will avoid CPU hard lockups
through a voluntary exit from the reply queue processing based on budget.
Note - Only one MSI-X vector is busy doing processing.
Select CONFIG_IRQ_POLL from driver Kconfig for driver compilation.
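The budget-based exit described in the solution can be modeled in
userspace C as in the sketch below. It does not use the kernel's
irq_poll API and is not the megaraid_sas code: struct reply_queue,
rq_threshold(), rq_isr() and rq_poll() are hypothetical names, and a
plain counter stands in for the reply descriptor queue.

```c
#include <stdbool.h>

/* Hypothetical model of one reply descriptor queue. */
struct reply_queue {
	unsigned int pending;   /* valid reply descriptors waiting */
	unsigned int threshold; /* descriptors per ISR before polling */
	bool poll_mode;         /* true while polling owns the queue */
};

/* Threshold as stated above: 1/4th of the HBA queue depth. */
static unsigned int rq_threshold(unsigned int queue_depth)
{
	return queue_depth / 4;
}

/*
 * Interrupt path: drain descriptors, but hand over to polling once more
 * than 'threshold' descriptors were handled in this one invocation.
 */
static unsigned int rq_isr(struct reply_queue *q)
{
	unsigned int done = 0;

	while (q->pending > 0) {
		q->pending--;
		done++;
		if (done > q->threshold) {
			q->poll_mode = true;
			break;
		}
	}
	return done;
}

/*
 * Poll path: never handle more than 'budget' descriptors per call, so
 * the CPU always gets a chance to leave the loop. Using less than the
 * budget means the queue is drained and polling can stop.
 */
static unsigned int rq_poll(struct reply_queue *q, unsigned int budget)
{
	unsigned int done = 0;

	while (done < budget && q->pending > 0) {
		q->pending--;
		done++;
	}
	if (done < budget)
		q->poll_mode = false;
	return done;
}
```

Even if submitters keep the queue full, each rq_poll() call returns after
at most 'budget' descriptors, which is the voluntary exit that the
unbounded ISR loop lacked.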
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:34 +0000 (10:05 -0700)]
scsi: megaraid_sas: Block PCI config space access from userspace during OCR
While an online controller reset (OCR) is in progress, there is a short
duration where all access to the controller's PCI config space from the
host needs to be blocked. This is due to a hardware limitation of MegaRAID
controllers.
With this patch, the driver will block all access to the controller's
config space from userland applications by calling pci_cfg_access_lock()
while OCR is in progress and unlocking after the controller comes back to
ready state.
Added a helper function which locks the config space before initiating OCR
and waits for the controller to become READY.
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:33 +0000 (10:05 -0700)]
scsi: megaraid_sas: Rework code around controller reset
No functional change. This patch reworks code around controller reset path
which gets rid of a couple of goto labels. This is in preparation for the
next patch which adds PCI config space access locking while controller
reset is in progress.
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:32 +0000 (10:05 -0700)]
scsi: megaraid_sas: fw_reset_no_pci_access required for MFI adapters only
fw_reset_no_pci_access is only applicable for MFI controllers and is not
used for Fusion controllers.
For all Fusion controllers, driver can check reset adapter bit in
status register before performing a chip reset without
setting "fw_reset_no_pci_access".
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Shivasharan S [Tue, 7 May 2019 17:05:30 +0000 (10:05 -0700)]
scsi: megaraid_sas: Remove unused variable target_index
No functional change. Remove set but unused variable in
megasas_set_static_target_properties.
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ondrej Zary [Tue, 14 May 2019 17:23:09 +0000 (19:23 +0200)]
scsi: fdomain: Resurrect driver - ISA support
Future Domain 16xx ISA SCSI card support.
Tested on IBM 92F0330 card (18C50 chip) with v1.00 BIOS.
Signed-off-by: Ondrej Zary <linux@zary.sk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ondrej Zary [Tue, 14 May 2019 17:23:08 +0000 (19:23 +0200)]
scsi: fdomain: Resurrect driver - PCI support
Future Domain TMC-3260/AHA-2920A PCI card support.
Tested on Adaptec AHA-2920A PCI card.
Signed-off-by: Ondrej Zary <linux@zary.sk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ondrej Zary [Tue, 14 May 2019 17:23:07 +0000 (19:23 +0200)]
scsi: fdomain: Resurrect driver - Core
Future Domain TMC-16xx/TMC-3260 SCSI driver.
This is the core driver, common for PCI, ISA and PCMCIA cards.
Signed-off-by: Ondrej Zary <linux@zary.sk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Tue, 7 May 2019 18:32:40 +0000 (13:32 -0500)]
scsi: hpsa: update driver version
[mkp: wrong baseline, applied by hand]
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Tue, 7 May 2019 18:32:33 +0000 (13:32 -0500)]
scsi: hpsa: correct device resets
Correct a race condition that occurs between the reset handler and the
completion handler. There are times when the wait_event condition is
never met due to this race condition and the reset never completes.
The reset_pending field is NULL initially.
t    Reset Handler Thread          Completion Thread
--   --------------------          -----------------
t1                                 if (c->reset_pending)
t2   c->reset_pending = dev;          if (atomic_dec_and_test(counter))
t3   atomic_inc(counter)                 wake_up_all(event_sync_wait_queue)
t4
t5   wait_event(...counter == 0)
Kernel.org Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=199435
Bug 199435 - HPSA + P420i resetting logical Direct-Access
never complete
Reviewed-by: Justin Lindley <justin.lindley@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Tue, 7 May 2019 18:32:26 +0000 (13:32 -0500)]
scsi: hpsa: do not complete cmds for deleted devices
Close up a rare multipath issue.
Close up small hole where a command completes after a device has been
removed from SML and before the device is re-added.
- Mark device as removed in slave_destroy
- Do not complete commands for deleted devices
Reviewed-by: Justin Lindley <justin.lindley@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Tue, 7 May 2019 18:32:20 +0000 (13:32 -0500)]
scsi: hpsa: wait longer for ptraid commands
Wait longer for outstanding commands before removing a multipath
device. Increase the timeout value for ptraid commands.
Reviewed-by: Justin Lindley <justin.lindley@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Tue, 7 May 2019 18:32:13 +0000 (13:32 -0500)]
scsi: hpsa: check for tag collision
Correct rare multipath issue where a device is deleted with an
outstanding cmd which results in a tag collision.
The cmd eventually completes. If a collision is detected wait until
the command slot is cleared.
Reviewed-by: Justin Lindley <justin.lindley@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Tue, 7 May 2019 18:32:07 +0000 (13:32 -0500)]
scsi: hpsa: use local workqueues instead of system workqueues
Avoid system stalls by switching to local workqueue.
Reviewed-by: Justin Lindley <justin.lindley@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Tue, 7 May 2019 18:32:00 +0000 (13:32 -0500)]
scsi: hpsa: correct simple mode
Correct issue with hpsa_simple_mode module parameter. Driver was
hanging due to incorrect interrupt setup.
Reviewed-by: Justin Lindley <justin.lindley@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hannes Reinecke [Mon, 6 May 2019 06:19:15 +0000 (08:19 +0200)]
scsi: osst: kill obsolete driver
The osst driver is becoming obsolete, as the manufacturer went out of
business ages ago, and the maintainer has no means of testing any
improvements anymore. Plus these days flash drives are cheaper and offer a
higher capacity. So drop it completely.
Cc: Willem Riede <osst@riede.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Tue, 30 Apr 2019 21:39:19 +0000 (14:39 -0700)]
scsi: sd: Inline sd_probe_part2()
Make sd_probe() easier to read by inlining sd_probe_part2(). This patch
does not change any functionality.
[mkp: applied by hand]
Cc: Lee Duncan <lduncan@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Tue, 30 Apr 2019 21:39:18 +0000 (14:39 -0700)]
scsi: sd: Rely on the driver core for asynchronous probing
As explained during the 2018 LSF/MM session about increasing SCSI disk
probing concurrency, the problems with the current probing approach are as
follows:
- The driver core is unaware of asynchronous SCSI LUN probing.
wait_for_device_probe() waits for all asynchronous probes except
asynchronous SCSI disk probes.
- There is unnecessary serialization between sd_probe() and sd_remove().
This can lead to a deadlock.
Hence this patch that modifies the sd driver such that it uses the driver
core framework for asynchronous probing. The async domain and
get_device()/put_device() pairs that became superfluous due to this change
are removed.
This patch does not affect the time needed for loading the scsi_debug
kernel module with parameters delay=0 and max_luns=256.
This patch depends on commit
ef0ff68351be ("driver core: Probe devices
asynchronously instead of the driver") that went upstream in kernel version
v5.1-rc1.
Cc: Lee Duncan <lduncan@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:16 +0000 (12:14 -0400)]
scsi: st: add a SPDX tag to st.c
st.c is the only st file missing licensing information. Add a
GPLv2 tag for the default kernel license.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:15 +0000 (12:14 -0400)]
scsi: sr: add a SPDX tag to sr.c
sr.c is the only sr file missing licensing information. Add a
GPLv2 tag for the default kernel license.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:14 +0000 (12:14 -0400)]
scsi: sg: switch to SPDX tags
Use the GPLv2+ SPDX tag instead of verbose boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:13 +0000 (12:14 -0400)]
scsi: ses: switch to SPDX tags
Use the GPLv2 SPDX tag instead of verbose boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:12 +0000 (12:14 -0400)]
scsi: sd: switch remaining files to SPDX tags
Use the GPLv2 SPDX tag instead of verbose boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:11 +0000 (12:14 -0400)]
scsi: sd: add a SPDX tag to sd.c
sd.c is the only sd file missing licensing information. Add a
GPLv2 tag for the default kernel license.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:10 +0000 (12:14 -0400)]
scsi: libsas: switch remaining files to SPDX tags
Use the GPLv2 SPDX tag instead of verbose boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:09 +0000 (12:14 -0400)]
scsi: libsas: switch sas_ata.[ch] to SPDX tags
Use the GPLv2+ SPDX tag instead of verbose boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:08 +0000 (12:14 -0400)]
scsi: libsas: add a SPDX tag to sas_task.c
sas_task.c is the only libsas file missing licensing information. Add a
GPLv2 tag for the default kernel license.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:07 +0000 (12:14 -0400)]
scsi: libiscsi: switch to SPDX tags
Use the GPLv2+ SPDX tag instead of verbose boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:06 +0000 (12:14 -0400)]
scsi: libfcoe: switch to SPDX tags
Use the GPLv2 SPDX tag instead of verbose boilerplate text.
[mkp: fixed comment syntax on *.c]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:05 +0000 (12:14 -0400)]
scsi: libfc: switch to SPDX tags
Use the GPLv2 SPDX tag instead of verbose boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:04 +0000 (12:14 -0400)]
scsi: libfc: remove duplicate GPL boilerplate text
The libfc uapi headers already have proper SPDX tags, remove the
duplicate boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:03 +0000 (12:14 -0400)]
scsi: scsi_transport_srp: switch to SPDX tags
Use the GPLv2+ SPDX tag instead of verbose boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:02 +0000 (12:14 -0400)]
scsi: scsi_transport_spi: switch to SPDX tags
Use the GPLv2+ SPDX tag instead of verbose boilerplate text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Wed, 1 May 2019 16:14:01 +0000 (12:14 -0400)]
scsi: scsi_transport_sas: switch to SPDX tags
Use the GPLv2 SPDX tag instead of a free form blurb.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>