review.tizen.org Git - platform/kernel/linux-rpi.git/log

Merge branch '5.3/scsi-sg' into scsi-next

scsi: qla2xxx: move IO flush to the front of NVME rport unregistration

On session deletion, current qla code would unregister an NVMe session
before flushing IOs. This patch would move the unregistration of NVMe
session after IO flush. This way FC-NVMe layer would not have to wait for
stuck IOs. In addition, qla2xxx would stop accepting new IOs during session
deletion.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix NVME cmd and LS cmd timeout race condition

This patch uses kref to protect access between fcp_abort path and nvme
command and LS command completion path.  Stack trace below shows the abort
path is accessing stale memory (nvme_private->sp).

When command kref reaches 0, nvme_private & srb resource will be
disconnected from each other.  Any subsequence nvme abort request will not
be able to reference the original srb.

[ 5631.003998] BUG: unable to handle kernel paging request at 00000010000005d8
[ 5631.004016] IP: [<ffffffffc087df92>] qla_nvme_abort_work+0x22/0x100 [qla2xxx]
[ 5631.004086] Workqueue: events qla_nvme_abort_work [qla2xxx]
[ 5631.004097] RIP: 0010:[<ffffffffc087df92>]  [<ffffffffc087df92>] qla_nvme_abort_work+0x22/0x100 [qla2xxx]
[ 5631.004109] Call Trace:
[ 5631.004115]  [<ffffffffaa4b8174>] ? pwq_dec_nr_in_flight+0x64/0xb0
[ 5631.004117]  [<ffffffffaa4b9d4f>] process_one_work+0x17f/0x440
[ 5631.004120]  [<ffffffffaa4bade6>] worker_thread+0x126/0x3c0

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: on session delete, return nvme cmd

- on session delete or chip reset, reject all NVME commands.

- on NVME command submission error, free srb resource.

Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Fix kernel crash after disconnecting NVMe devices

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffc050d10c>] qla_nvme_unregister_remote_port+0x6c/0xf0 [qla2xxx]
PGD 800000084cf41067 PUD 84d288067 PMD 0
Oops: 0000 [#1] SMP
Call Trace:
[<ffffffff98abcfdf>] process_one_work+0x17f/0x440
[<ffffffff98abdca6>] worker_thread+0x126/0x3c0
[<ffffffff98abdb80>] ? manage_workers.isra.26+0x2a0/0x2a0
[<ffffffff98ac4f81>] kthread+0xd1/0xe0
[<ffffffff98ac4eb0>] ? insert_kthread_work+0x40/0x40
[<ffffffff9918ad37>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffff98ac4eb0>] ? insert_kthread_work+0x40/0x40
RIP [<ffffffffc050d10c>] qla_nvme_unregister_remote_port+0x6c/0xf0 [qla2xxx]

The crash is due to a bad entry in the nvme_rport_list. This list is not
protected, and when a remoteport_delete callback is called, driver
traverses the list and crashes.

Actually, the list could be removed and driver could traverse the main
fcport list instead. Fix does exactly that.

Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Update driver version to 07.710.06.00-rc1

Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Introduce various Aero performance modes

For Aero adapters, driver provides three different performance modes
controlled through module parameter named 'perf_mode'. Below are those
performance modes:

0: Balanced - Additional high IOPS reply queues will be enabled along with
    low latency queues. Interrupt coalescing will be enabled only for these
    high IOPS reply queues.

1: IOPS - No additional high IOPS queues are enabled. Interrupt coalescing
    will be enabled on all reply queues.

2: Latency - No additional high IOPS queues are enabled. Interrupt
    coalescing will be disabled on all reply queues. This is a legacy
    behavior similar to Ventura & Invader Series.

Default performance mode settings:

- Performance mode set to 'Balanced', if Aero controller is working in
   16GT/s PCIe speed.

- Performance mode will be set to 'Latency' mode for all other cases.

Through module parameter 'perf_mode', user can override default performance
mode to desired one.

Captured some performance numbers with these performance modes.  4k Random
Read IO performance numbers on 24 SAS SSD drives for above three
performance modes. Performance data is from Intel Skylake and HGST SS300
(drive model SDLL1DLR400GCCA1).

IOPS:
-----------------------------------------------------------------------
  |perf_mode    | qd = 1 | qd = 64 |   note                             |
  |-------------|--------|---------|-------------------------------------
  |balanced     |  259K  |  3061k  | Provides max performance numbers   |
  |             |        |         | both on lower QD workload &        |
  |             |        |         | also on higher QD workload         |
  |-------------|--------|---------|-------------------------------------
  |iops         |  220K  |  3100k  | Provides max performance numbers   |
  |             |        |         | only on higher QD workload.        |
  |-------------|--------|---------|-------------------------------------
  |latency      |  246k  |  2226k  | Provides good performance numbers  |
  |             |        |         | only on lower QD worklaod.         |
  -----------------------------------------------------------------------

Average Latency:
  -----------------------------------------------------
  |perf_mode    |  qd = 1      |    qd = 64           |
  |-------------|--------------|----------------------|
  |balanced     |  92.05 usec  |    501.12 usec       |
  |-------------|--------------|----------------------|
  |iops         |  108.40 usec |    498.10 usec       |
  |-------------|--------------|----------------------|
  |latency      |  97.10 usec  |    689.26 usec       |
  -----------------------------------------------------

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Use high IOPS queues based on IO workload

The driver will use round-robin method for IO submission in batches within
the high IOPS queues when the number of in-flight ios on the target device
is larger than 8. Otherwise the driver will use low latency reply queues.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Set affinity for high IOPS reply queues

High iops queues are mapped to non-managed IRQs. Set affinity of
non-managed irqs to local numa node. Low latency queues are mapped to
managed IRQs.

Driver reserves some reply queues for high IOPS queues (through
pci_alloc_irq_vectors_affinity and .pre_vectors interface). The rest of
queues are for low latency.

Based on IO workload, driver will decide which group of reply queues
(either high IOPS queues or low latency queues) to be used.
High IOPS queues will be mapped to local numa node of controller and
low latency queues will be mapped to CPUs across numa nodes. In general,
high IOPS and low latency queues should fit into 128 reply queues
which is the max number of reply queues supported by Aero adapters.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Enable coalescing for high IOPS queues

Driver should enable interrupt coalescing (during driver load and after
Controller Reset) for High IOPS queues by masking appropriate bits in IOC
INIT frame.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Add support for High IOPS queues

Aero controllers support balanced performance mode through the ability to
configure queues with different properties.

Reply queues with interrupt coalescing enabled are called "high iops reply
queues" and reply queues with interrupt coalescing disabled are called "low
latency reply queues".

The driver configures a combination of high iops and low latency reply
queues if:

- HBA is an AERO controller;

- MSI-X vectors supported by the HBA is 128;

- Total CPU count in the system more than high iops queue count;

- Driver is loaded with default max_msix_vectors module parameter; and

- System booted in non-kdump mode.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Add support for MPI toolbox commands

Added driver support to allow passthrough MPI toolbox type MFI commands to
firmware based on firmware capability.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Offload Aero RAID5/6 division calculations to driver

For RAID5/RAID6 volumes configured behind Aero, driver will be doing 64bit
division operations on behalf of firmware as controller's ARM CPU is very
slow in this division. Later, driver calculates Q-ARM, P-ARM and Log-ARM and
passes those values to firmware by writing these values to RAID_CONTEXT.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: RAID1 PCI bandwidth limit algorithm is applicable for only Ventura

RAID1 PCI bandwidth limit algorithm is not applicable to Aero as it's PCIe
Gen4 adapter.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: megaraid_sas: Add check for count returned by HOST_DEVICE_LIST DCMD

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Handle sequence JBOD map failure at driver level

Issue: This issue is applicable to scenario when JBOD sequence map is
unavailable (memory allocation for JBOD sequence map failed) to driver but
feature is supported by firmware. If the driver sends a JBOD IO by not
adding 255 (MAX_PHYSICAL_DEVICES - 1) to device ID when underlying firmware
supports JBOD sequence map, it will lead to the IO failure.

Fix: For JBOD IOs, driver will not use the RAID map to fetch the devhandle
if JBOD sequence map is unavailable. Driver will set Devhandle to 0xffff
and Target ID to 'device ID + 255 (MAX_PHYSICAL_DEVICES - 1)'.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Don't send FPIO to RL Bypass queue

Firmware does not expect FastPath IO sent through Region Lock Bypass queue.
Though firmware never exposes such settings when fastpath IO can be sent to
RL bypass queue but it's safer to remove dead code which directs fastpath
IO to RL Bypass queue.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: In probe context, retry IOC INIT once if firmware is in fault

Issue: Under certain conditions, controller goes in FAULT state after IOC
INIT fired to firmware. Such Fault can be recovered through controller
reset.

Fix: In driver probe context, if firmware fault is observed post IOC INIT,
driver would do controller reset followed by retry logic for IOC INIT
command.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Release Mutex lock before OCR in case of DCMD timeout

Issue: There is possibility of few DCMDs timing out with 'reset_mutex' lock
held. As part of DCMD timeout handling, driver calls function
megasas_reset_fusion which also tries to acquire same lock 'reset_mutex'
and end up with deadlock.

Fix: Upon timeout of DCMDs (which are fired with 'reset_mutex' lock held),
driver will release 'reset_mutex' before calling OCR function and will
acquire lock again after OCR function returns.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Call disable_irq from process IRQ poll

On PowerPC architecture, calling disable_irq_nosync from IRQ context is not
providing the required effect.

In current megaraid_sas driver, disable_irq_nosync is being called from IRQ
context before enabling IRQ poll. But due to the issue seen on PPC, after
IRQ poll disable and legacy ISR is enabled, we are not seeing our ISR
getting called.

Fix: Call disable_irq from IRQ poll thread context instead of IRQ context.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Remove few debug counters from IO path

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Add support for Non-secure Aero PCI IDs

This patch will add support for non-secure Aero adapter PCI IDs. Driver
will throw an error message when a non-secure type controller is
detected. Purpose of this interface is to avoid interacting with any
firmware which is not secured/signed by Broadcom. Any tampering on Firmware
component will be detected by hardware and it will be communicated to the
driver to avoid any further interaction with that component.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Add 32 bit atomic descriptor support to AERO adapters

Aero adapters provides Atomic Request Descriptor as an alternative method
for posting an entry onto a request queue. The posting of an Atomic Request
Descriptor is an atomic operation, providing a safe mechanism for multiple
processors on the host to post requests without synchronization. This
Atomic Request Descriptor format is identical to first 32 bits of Default
Request Descriptor and uses only 32 bits.

If Aero adapters support Atomic descriptor, driver should use it for
posting IOs and DCMDs to firmware.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Modified the logic to collect IOP event logs

Added the logic for collecting IOP log respective to event log size.

Signed-off-by: Deepak Ukey <deepak.ukey@microchip.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pm80xx: Event log size through sysfs

Added support to read event log size from MPI configuration table and
export through sysfs.

Signed-off-by: Deepak Ukey <deepak.ukey@microchip.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Fix msix load balance on and off settings

Enable msix load balance only when combined reply queue mode is disabled on
the SAS3 and above generation HBA devices.

Earlier msix load balance used to enable if the number of online cpus is
greater than the number of MSI-X vectors enabled on the HBA.  Combined reply
queue mode will be disabled only on those HBA which works in shared
resources mode. I.e. on SAS3 HBAs it will be <= 8 and on SAS35 HBA devices
it will be <= 16.

- Before this patch if system has 256 logical CPUs and HBA exposes 128
   MSI-X vectors, driver will enable msix load balance.

- After this patch if system has 256 logical CPUs and HBA exposes 128
   MSI-X vectors, driver will disable msix load balance.

- After this patch if system has 256 logical CPUs and HBA exposes 16 MSI-X
   vectors (due to combined reply queue mode being off in HW), driver will
   enable msix load balance.

Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Determine smp affinity on per HBA basis

Even though 'smp_affinity_enable' module parameter is enabled, if the
number of online CPUs is bigger than the number of msix vectors enabled on
that HBA, then smp affinity settings should be disabled only for this HBA.

But currently the smp affinity setting is disabled globally and hence smp
affinity will be disabled for subsequent HBAs even though number of msix
vectors enabled for this HBA matches the number of online CPU.

To fix this, define a per HBA variable smp_affinity_enable. Initially this
variable is initialized with smp_affinity_enable module parameter value. If
this HBA has less number of msix vectors configured when compared to number
of online cpus, then only this HBA's variable smp_affinity_enable is set to
zero.

Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Use configured PCIe link speed, not max

When enabling high iops queues, the driver should use the HBA's configured
PCIe link speed instead of looking for the maximum link speed.

I.e. enable high iops queues only if Aero/Sea HBA's configured PCIe link
speed is set to 16GT/s.

Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Remove CPU arch check to determine perf_mode

Currently default perf_mode is set to 'balanced' on Intel architecture
machines and on other machines default perf_mode is set to 'latency' mode.

This CPU architecture check is removed and the default perf_mode mode is
set to 'balanced' mode on all machines.

User can choose the required performance mode using perf_mode module
parameter.

Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Documentation: Announce ufs-tool v1.0

The ufs-tool stable release v1.0 is available at:

https://github.com/westerndigitalcorporation/ufs-tool

Feedback and bug reports, as always, are welcomed.

Signed-off-by: Arthur Simchaev <Arthur.Simchaev@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: bnx2fc: fix bnx2fc_cmd refcount imbalance in send_srr

If cb_arg alloc failed, we can't release the struct orig_io_req refcount
before we take its refcount. As Saurav said, move the srr_err label down
to avoid unnecessary refcount release and nullptr free.

Signed-off-by: Lin Yi <teroincn@163.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: bnx2fc: fix bnx2fc_cmd refcount imbalance in send_rec

If cb_arg alloc failed, we can't release the struct orig_io_req refcount
before we take its refcount. As Saurav said, move the rec_err label down
to avoid unnecessary refcount release and nullptr free.

Signed-off-by: Lin Yi <teroincn@163.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: bnx2fc: Update the driver version to 2.12.10

Update the driver version to 2.12.10.

Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: bnx2fc: Limit the IO size according to the FW capability

- Reduce the sg_tablesize to 255.

- Reduce the MAX BDs firmware can handle to 255.

- Return IO to ML if BD goes more then 255 after split.

- Correct the size of each BD split to 0xffff.

Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: bnx2fc: Do not allow both a cleanup completion and abort completion for the same request

If firmware sends either cleanup or abort completion, it means other won't
be sent. Clean out flags for other as well.

Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: bnx2fc: Separate out completion flags and variables for abort and cleanup

Separate out abort and cleanup flag and completion, to have better
understaning of what is getting processed.

Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: bnx2fc: Only put reference to io_req in bnx2fc_abts_cleanup if cleanup times out

In certain tests where the SCSI error handler issues an abort that is
already outstanding, we will cleanup the command so that the SCSI error
handler can proceed.  In some of these cases we were seeing a command
mismatch:

kernel: scsi host2: bnx2fc: xid:0x42b eh_abort - refcnt = 2
kernel: bnx2fc: eh_abort: io_req (xid = 0x42b) already in abts processing
kernel: scsi host2: bnx2fc: xid:0x42b Entered bnx2fc_initiate_cleanup
kernel: scsi host2: bnx2fc: xid:0x42b CLEANUP io_req xid = 0x80b
kernel: scsi host2: bnx2fc: xid:0x80b cq_compl- cleanup resp rcvd
kernel: scsi host2: bnx2fc: xid:0x42b complete - rx_state = 9
kernel: scsi host2: bnx2fc: xid:0x42b Entered process_cleanup_compl refcnt = 2, cmd_type = 1
kernel: scsi host2: bnx2fc: xid:0x42b scsi_done. err_code = 0x7
kernel: scsi host2: bnx2fc: xid:0x42b sc=ffff8807f93dfb80, result=0x7, retries=0, allowed=5
kernel: ------------[ cut here ]------------
kernel: WARNING: at /root/rpmbuild/BUILD/netxtreme2-7.14.43/obj/default/bnx2fc-2.12.1/driver/bnx2fc_io.c:1347 bnx2fc_eh_abort+0x56f/0x680 [bnx2fc]()
kernel: xid=0x42b refcount=-1
kernel: Modules linked in:
kernel: nls_utf8 isofs sr_mod cdrom tcp_lp dm_round_robin xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge ebtable_filter ebtables fuse ip6table_filter ip6_tables iptable_filter bnx2fc(OE) cnic(OE) uio fcoe libfcoe 8021q libfc garp mrp scsi_transport_fc stp llc scsi_tgt vfat fat dm_service_time intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses enclosure ipmi_ssif i2c_core hpilo hpwdt wmi sg ipmi_devintf pcspkr ipmi_si ipmi_msghandler shpchp acpi_power_meter dm_multipath nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs sd_mod crc_t10dif
kernel: crct10dif_generic bnx2x(OE) crct10dif_pclmul crct10dif_common crc32c_intel mdio ptp pps_core libcrc32c smartpqi scsi_transport_sas fjes uas usb_storage dm_mirror dm_region_hash dm_log dm_mod
kernel: CPU: 9 PID: 2012 Comm: scsi_eh_2 Tainted: G        W  OE  ------------   3.10.0-514.el7.x86_64 #1
kernel: Hardware name: HPE Synergy 480 Gen10/Synergy 480 Gen10 Compute Module, BIOS I42 03/21/2018
kernel: ffff8807f25a3d98 0000000015e7fa0c ffff8807f25a3d50 ffffffff81685eac
kernel: ffff8807f25a3d88 ffffffff81085820 ffff8807f8e39000 ffff880801ff7468
kernel: ffff880801ff7610 0000000000002002 ffff8807f8e39014 ffff8807f25a3df0
kernel: Call Trace:
kernel: [<ffffffff81685eac>] dump_stack+0x19/0x1b
kernel: [<ffffffff81085820>] warn_slowpath_common+0x70/0xb0
kernel: [<ffffffff810858bc>] warn_slowpath_fmt+0x5c/0x80
kernel: [<ffffffff8168d842>] ? _raw_spin_lock_bh+0x12/0x50
kernel: [<ffffffffa0549e6f>] bnx2fc_eh_abort+0x56f/0x680 [bnx2fc]
kernel: [<ffffffff814570af>] scsi_error_handler+0x59f/0x8b0
kernel: [<ffffffff81456b10>] ? scsi_eh_get_sense+0x250/0x250
kernel: [<ffffffff810b052f>] kthread+0xcf/0xe0
kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
kernel: [<ffffffff81696418>] ret_from_fork+0x58/0x90
kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
kernel: ---[ end trace 42deb88f2032b111 ]---

The reason that there was a mismatch is that the SCSI command is actual
returned from the cleanup handler.  In previous testing, the type of
cleanup notification we'd get from the CQE did not trigger the code that
returned the SCSI command.  To overcome the previous behavior we would put
a reference in bnx2fc_abts_cleanup() to account for the SCSI command.
However, in cases where the SCSI command is actually off, we end up with an
extra put.

The fix for this is to only take the extra put in bnx2fc_abts_cleanup if
the completion for the cleanup times out.

Signed-off-by: Chad Dupuis <cdupuis@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: bnx2fc: Redo setting source FCoE MAC

For bnx2fc, the source FCoE MAC is stored in the fcoe_port struct in the
data_src_mac field.  Currently this is set in fcoe_ctlr_recv_flogi which
ends up setting it by simply using fc_fcoe_set_mac() which only uses the
default FCF-MAC.  We still want to store the source FCoE MAC in
port->data_src_mac but we want to snoop the FLOGI response payload so as to
set it in the following method:

1. If a granted_mac is found, use that.

2. If not granted_mac is there but there is a FCF-MAP from the FCF then
   create the MAC from the FCF-MAP and the destination ID from the frame.

3. If there is no FCF-MAP the use the spec. default FCF-MAP and the
   destination ID from the frame.

Signed-off-by: Chad Dupuis <cdupuis@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufshdc-pci: Add Intel PCI IDs for EHL

Add more Intel PCI IDs.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs-bsg: complete ufs-bsg job only if no error

In the case of UPIU/DME request execution failed in UFS device,
ufs_bsg_request() will complete the failed bsg job by calling
bsg_job_done(). Meanwhile, it returns this error status to blk-mq layer,
then triggers blk-mq completing this request again, this will cause the
following panic.

Call trace:
ll_sc___cmpxchg_case_acq_32+0x4/0x20
complete+0x28/0x70
blk_end_sync_rq+0x24/0x30
blk_mq_end_request+0xb8/0x118
bsg_job_put+0x4c/0x58
bsg_complete+0x20/0x30
blk_done_softirq+0xb4/0xe8
do_softirq+0x154/0x3f0
run_ksoftirqd+0x4c/0x68
smpboot_thread_fn+0x22c/0x268
kthread+0x130/0x138
ret_from_fork+0x10/0x1c
Code: f84107fe d65f03c0 d503201f f9800011 (885ffc10)
---[ end trace d92825bff6326e66 ]---
Kernel panic - not syncing: Fatal exception in interrupt

This patch is to fix this issue. The solution is to complete the ufs-bsg
job only if no error happened.

[mkp: commit description tweak]

Fixes: df032bf27a41 (scsi: ufs: Add a bsg endpoint that supports UPIUs)
Signed-off-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs-bsg: fix typo in ufs_bsg_request

Correct dev_dbg to dev_err, so as to print out the error information in
case of DME command failed.

Signed-off-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: virtio_scsi: remove unused 'affinity_hint_set'

The 'affinity_hint_set' is not used any longer since commit
0d9f0a52c8b9 ("virtio_scsi: use virtio IRQ affinity").

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: use DEVICE_ATTR_{RO, RW}

Use existing macros. No functional change.

[mkp: typo]

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: make driver options visible in sys

Support is easier with all driver parameters visible in sysfs. Also I've
replaced a constant with an octal permission.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs-qcom: Add support for platforms booting ACPI

New Qualcomm AArch64 based laptops are now available which use UFS as their
primary data storage medium. These devices are supplied with ACPI support
out of the box. This patch ensures the Qualcomm UFS driver will be bound
when the "QCOM24A5" H/W device is advertised as present.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Use struct_size() helper

One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with
memory for some number of elements for that array. For example:

struct MR_PD_CFG_SEQ_NUM_SYNC {
...
struct MR_PD_CFG_SEQ seq[1];
} __packed;

Make use of the struct_size() helper instead of an open-coded version in
order to avoid any potential type mistakes.

So, replace the following form:

sizeof(struct MR_PD_CFG_SEQ_NUM_SYNC) + (sizeof(struct MR_PD_CFG_SEQ) * (MAX_PHYSICAL_DEVICES - 1))

with:

struct_size(pd_sync, seq, MAX_PHYSICAL_DEVICES - 1)

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mac_scsi: Treat Last Byte Sent time-out as failure

A system bus error during a PDMA send operation can result in bytes being
lost. Theoretically that could cause the target to remain in DATA OUT phase
and the initiator (expecting a phase change) would time-out waiting for the
Last Byte Sent flag. Should that happen, fail the transfer so the core
driver will stop using PDMA with this target.

Cc: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mac_scsi: Enable PDMA on Mac IIfx

Add support for Apple's custom "SCSI DMA" chip. This patch doesn't make use
of its DMA capability. Just the PDMA capability is sufficient to improve
sequential read throughput by a factor of 5.

Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: Joshua Thompson <funaho@jurai.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mac_scsi: Fix pseudo DMA implementation, take 2

A system bus error during a PDMA transfer can mess up the calculation of
the transfer residual (the PDMA handshaking hardware lacks a byte
counter). This results in data corruption.

The algorithm in this patch anticipates a bus error by starting each
transfer with a MOVE.B instruction. If a bus error is caught the transfer
will be retried. If a bus error is caught later in the transfer (for a
MOVE.W instruction) the transfer gets failed and subsequent requests for
that target will use PIO instead of PDMA.

This avoids the "!REQ and !ACK" error so the severity level of that message
is reduced to KERN_DEBUG.

Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v4.14+
Fixes: 3a0f64bfa907 ("mac_scsi: Fix pseudo DMA implementation")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reported-by: Chris Jones <chris@martin-jones.com>
Tested-by: Stan Johnson <userm57@yahoo.com>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mac_scsi: Increase PIO/PDMA transfer length threshold

Some targets introduce delays when handshaking the response to certain
commands. For example, a disk may send a 96-byte response to an INQUIRY
command (or a 24-byte response to a MODE SENSE command) too slowly.

Apparently the first 12 or 14 bytes are handshaked okay but then the system
bus error timeout is reached while transferring the next word.

Since the scsi bus phase hasn't changed, the driver then sets the target
borken flag to prevent further PDMA transfers. The driver also logs the
warning, "switching to slow handshake".

Raise the PDMA threshold to 512 bytes so that PIO transfers will be used
for these commands. This default is sufficiently low that PDMA will still
be used for READ and WRITE commands.

The existing threshold (16 bytes) was chosen more or less at random.
However, best performance requires the threshold to be as low as possible.
Those systems that don't need the PIO workaround at all may benefit from
mac_scsi.setup_use_pdma=1

Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: stable@vger.kernel.org # v4.14+
Fixes: 3a0f64bfa907 ("mac_scsi: Fix pseudo DMA implementation")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: NCR5380: Handle PDMA failure reliably

A PDMA error is handled in the core driver by setting the device's 'borken'
flag and aborting the command. Unfortunately, do_abort() is not
dependable. Perform a SCSI bus reset instead, to make sure that the command
fails and gets retried.

Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: stable@vger.kernel.org # v4.20+
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: NCR5380: Always re-enable reselection interrupt

The reselection interrupt gets disabled during selection and must be
re-enabled when hostdata->connected becomes NULL. If it isn't re-enabled a
disconnected command may time-out or the target may wedge the bus while
trying to reselect the host. This can happen after a command is aborted.

Fix this by enabling the reselection interrupt in NCR5380_main() after
calls to NCR5380_select() and NCR5380_information_transfer() return.

Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: stable@vger.kernel.org # v4.9+
Fixes: 8b00c3d5d40d ("ncr5380: Implement new eh_abort_handler")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Revert "scsi: ncr5380: Increase register polling limit"

This reverts commit 4822827a69d7cd3bc5a07b7637484ebd2cf88db6.

The purpose of that commit was to suppress a timeout warning message which
appeared to be caused by target latency. But suppressing the warning is
undesirable as the warning may indicate a messed up transfer count.

Another problem with that commit is that 15 ms is too long to keep
interrupts disabled as interrupt latency can cause system clock drift and
other problems.

Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 4822827a69d7 ("scsi: ncr5380: Increase register polling limit")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: wd719x: Fix resets and aborts

Host reset oopses because it calls wd719x_chip_init, which calls
request_firmware, under a spinlock. Stop the RISC first, then flush active
SCBs under a spinlock. Finally call wd719x_chip_init unlocked.

Also found and fixed more bugs during tests:

Affected active SCBs were not flushed during abort, bus and device
reset. This caused problems in a following host reset (hang or oops).

Device and bus reset failed under load because the result of the reset
command is WD719X_SUE_TERM or WD719X_SUE_RESET. Don't treat these codes as
error in wd719x_wait_done.

wd719x_direct_cmd for RESET/ABORT commands didn't work properly, causing
timeouts. Looks like it was caused by the WD719X_DISABLE_INT bit. Not
setting it for RESET/ABORT commands seems to fix the probem. Also lower
the log level of the corresponding "direct command completed" message to
debug.

Unfortunately, my documentation is missing some pages, including page
67 (SPIDER67.gif) about resets :(

Reported-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: Ondrej Zary <linux@zary.sk>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: RDMA/srp: Fix a sleep-in-invalid-context bug

The previous patch guarantees that srp_queuecommand() does not get
invoked while reconnecting occurs. Hence remove the code from
srp_queuecommand() that prevents command queueing while reconnecting.
This patch avoids that the following can appear in the kernel log:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 5600, name: scsi_eh_9
1 lock held by scsi_eh_9/5600:
#0: (rcu_read_lock){....}, at: [<00000000cbb798c7>] __blk_mq_run_hw_queue+0xf1/0x1e0
Preemption disabled at:
[<00000000139badf2>] __blk_mq_delay_run_hw_queue+0x78/0xf0
CPU: 9 PID: 5600 Comm: scsi_eh_9 Tainted: G W 4.15.0-rc4-dbg+ #1
Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 2.5.4 01/22/2016
Call Trace:
dump_stack+0x67/0x99
___might_sleep+0x16a/0x250 [ib_srp]
__mutex_lock+0x46/0x9d0
srp_queuecommand+0x356/0x420 [ib_srp]
scsi_dispatch_cmd+0xf6/0x3f0
scsi_queue_rq+0x4a8/0x5f0
blk_mq_dispatch_rq_list+0x73/0x440
blk_mq_sched_dispatch_requests+0x109/0x1a0
__blk_mq_run_hw_queue+0x131/0x1e0
__blk_mq_delay_run_hw_queue+0x9a/0xf0
blk_mq_run_hw_queue+0xc0/0x1e0
blk_mq_start_hw_queues+0x2c/0x40
scsi_run_queue+0x18e/0x2d0
scsi_run_host_queues+0x22/0x40
scsi_error_handler+0x18d/0x5f0
kthread+0x11c/0x140
ret_from_fork+0x24/0x30

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: Avoid that .queuecommand() gets called for a blocked SCSI device

Several SCSI transport and LLD drivers surround code that does not
tolerate concurrent calls of .queuecommand() with scsi_target_block() /
scsi_target_unblock(). These last two functions use
blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
queues to prevent concurrent .queuecommand() calls. However, that is
not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
Hence surround the .queuecommand() call from the SCSI error handler with
code that avoids that .queuecommand() gets called in the blocked state.

Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
code that calls blk_get_request() + blk_execute_rq() is not an option
since scsi_send_eh_cmnd() must be able to make forward progress even
if all requests have been allocated.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: Restrict user space SCSI device state changes to "running" and "offline"

The ability to modify the SCSI device state was introduced by commit
638127e579a4 ("[PATCH] Fix error handler offline behaviour"; v2.6.12). That
same commit introduced the following device states:

       { SDEV_CREATED, "created" },
       { SDEV_RUNNING, "running" },
       { SDEV_CANCEL,  "cancel"  },
       { SDEV_DEL,     "deleted" },
       { SDEV_QUIESCE, "quiesce" },
       { SDEV_OFFLINE, "offline" },

The SDEV_BLOCK state was introduced later to avoid that an FC cable pull
would immediately result in an I/O error (commit 1094e682310e; "[PATCH]
suspending I/Os to a device"; v2.6.12). That same patch introduced the
ability to set the SDEV_BLOCK state from user space. I'm not sure whether
that ability was introduced on purpose or accidentally.

Since there is agreement that only writing "running" or "offline" into
the SCSI sysfs device state attribute makes sense, restrict sysfs writes
to these values.

This patch makes sure that SDEV_BLOCK is only used for its original
purpose, namely to allow transport drivers and LLDs to block further
.queuecommand() calls while transport layer or adapter recovery is in
progress.

Note: a web search for "/sys/class/scsi_device" AND "device/state"
revealed several storage configuration guides. The instructions I found
in these guides tell users to write the value "running" or "offline" in
the SCSI device state sysfs attribute and no other values.

[mkp: typo]

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: James Smart <james.smart@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: cxgb4i: add support for IEEE_8021QAZ_APP_SEL_STREAM selector

IEEE_8021QAZ_APP_SEL_STREAM is a valid selector for iSCSI connections, so
add code to use IEEE_8021QAZ_APP_SEL_STREAM selector to get priority mask.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: tcmu: Simplify tcmu_update_uio_info()

Use 'kasprintf()' instead of:
   - snprintf(NULL, 0...
   - kmalloc(...
   - snprintf(...

This is less verbose and saves 7 bytes (i.e. the space for '/(null)') if
'udev->dev_config' is NULL.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: storvsc: Add ability to change scsi queue depth

Adding functionality to allow the SCSI queue depth to be changed by
utilizing the "scsi_change_queue_depth" function.

[mkp: checkpatch]

Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.

This patch fixes the following warning:

drivers/scsi/mpt3sas/mpt3sas_base.c: In function  _base_update_ioc_page1_inlinewith_perf_mode :
drivers/scsi/mpt3sas/mpt3sas_base.c:4510:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
   if (ioc->high_iops_queues) {
      ^
drivers/scsi/mpt3sas/mpt3sas_base.c:4530:2: note: here
  case MPT_PERF_MODE_LATENCY:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.

Fixes: 30cb97023f38 ("scsi: mpt3sas: Introduce perf_mode module parameter")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: libsas: aic94xx: hisi_sas: mvsas: pm8001: Use dev_is_expander()

Many times in libsas, and in LLDDs which use libsas, the check for an
expander device is re-implemented or open coded.

Use dev_is_expander() instead. We rename this from
sas_dev_type_is_expander() to not spill so many lines in referencing.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: don't preallocate small SGL in case of NO_SG_CHAIN

The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't
support SG_CHAIN, preallocation of small SGL can't work at all.

Fix this issue by not using small preallocation in case of NO_SG_CHAIN.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lib/sg_pool.c: clear 'first_chunk' in case of no preallocation

If user doesn't ask to preallocate by passing zero 'nents_first_chunk' to
sg_alloc_table_chained, we need to make sure that 'first_chunk' is cleared.
Otherwise, __sg_alloc_table() still may think that the 1st SGL should be
from the preallocation.

Fixes the issue by clearing 'first_chunk' in sg_alloc_table_chained() if
'nents_first_chunk' is zero.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: avoid preallocating big SGL for data

scsi_mq_setup_tags() preallocates a big buffer for the IO SGL. The size is
based on scsi_mq_sgl_size() which is determined based on
shost->sg_tablesize and SG_CHUNK_SIZE.

Modern DMA engines are often capable of dealing with very big segments so
the resulting scsi_mq_sgl_size() is often too big. SG_CHUNK_SIZE results in
a static 4KB SGL allocation per command.

If an HBA has lots of deep queues, preallocation for the sg list can
consume substantial amounts of memory. For lpfc, nr_hw_queues can be 70
and each queue's depth 3781. This means the resulting preallocation for
the data SGL is 70*3781*2K = 517MB.

Switch to runtime allocation for SGL for lists longer than 2 entries. This
is the approach used by NVMe PCI so it should be reasonable for SCSI as
well. Runtime SGL allocation has always been the case for the legacy I/O
path so this is nothing new.

[mkp: attempted to clarify commit desc]

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: avoid preallocating big SGL for protection information

scsi_mq_setup_tags() currently preallocates a big buffer for protection
SGL entries. scsi_mq_sgl_size() is used to determine the size for both data
and protection information scatterlists but the protection buffer is
usually much smaller. For example, one 512-byte sector needs 8 bytes of
protection information. Given that the maximum number of sectors for one
request is 2560 (BLK_DEF_MAX_SECTORS) sectors, the max protection
information buffer size is just 20K.

The protection information segment count generally matches the number of
bios in the request. As a result, the typical actual number of segments
won't be very big. And should the need arise, allocating a bigger SGL from
slab is fast enough.

Pre-allocate only one SGL entry for protection information and switch to
runtime allocation in case that the protection information segment number
is bigger than 1. This reduces memory tied up by static command
allocations. For example, 500+ MB is saved on single lpfc HBA.

[mkp: attempted to clarify commit desc]

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lib/sg_pool.c: improve APIs for allocating sg pool

sg_alloc_table_chained() currently allows the caller to provide one
preallocated SGL and returns if the requested number isn't bigger than
size of that SGL. This is used to inline an SGL for an IO request.

However, scattergather code only allows that size of the 1st preallocated
SGL to be SG_CHUNK_SIZE(128). This means a substantial amount of memory
(4KB) is claimed for the SGL for each IO request. If the I/O is small, it
would be prudent to allocate a smaller SGL.

Introduce an extra parameter to sg_alloc_table_chained() and
sg_free_table_chained() for specifying size of the preallocated SGL.

Both __sg_free_table() and __sg_alloc_table() assume that each SGL has the
same size except for the last one. Change the code to allow both functions
to accept a variable size for the 1st preallocated SGL.

[mkp: attempted to clarify commit desc]

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: netdev@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: esp: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: NCR5380: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Cc: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: wd33c93: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ppa: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pcmcia: nsp_cs: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: imm: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aha152x: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

Finn added the change to replace SCp.buffers_residual with
sg_is_last() for fixing updating it, and the similar change has been
applied on NCR5380.c

[mkp: clarified commit message]

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: s390: zfcp_fc: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Cc: Steffen Maier <maier@linux.ibm.com>
Cc: Benjamin Block <bblock@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: staging: unisys: visorhba: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Cc: devel@driverdev.osuosl.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: usb: image: microtek: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Cc: Oliver Neukum <oliver@neukum.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pmcraid: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ipr: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mvumi: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message and folded in build fix reported by zeroday]

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Reviewed by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: advansys: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: vmw_pscsi: use sg helper to iterate over scatterlist

Unlike the legacy I/O path, scsi-mq preallocates a large array to hold
the scatterlist for each request. This static allocation can consume
substantial amounts of memory on modern controllers which support a
large number of concurrently outstanding requests.

To facilitate a switch to a smaller static allocation combined with a
dynamic allocation for requests that need it, we need to make sure all
SCSI drivers handle chained scatterlists correctly.

Convert remaining drivers that directly dereference the scatterlist
array to using the iterator functions.

[mkp: clarified commit message]

Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Update driver version to 29.100.00.00

Update driver version from 28.100.00.00 to 29.100.00.00
This is equivalent to Phase 10 OOB driver.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Introduce perf_mode module parameter

1. Introduce module parameter perf_mode for only Aero/Sea generation HBAs.

2. Update IOC page1 fields according to performance mode.

Below are the performance modes that can be enabled with module parameter
perf_mode:

0: Balanced - Few high iops reply queues will be enabled.  Interrupt
    coalescing will be enabled only for these high iops reply descriptor
    queues.

1: Iops - Interrupt coalescing will be enabled on all reply queues.
    Coalescing timeout is set to 0x20.This is default value for Aero.

2: Latency - Interrupt coalescing will be enabled on all reply queues.
    Coalescing timeout is set to 0xA.  This is a legacy behavior similar to
    Ventura & Invader HBA series.

Default perf mode set by driver will be balanced mode if the following
conditions are met:

- CPU vendor = Intel;
- Aero controller working in 16GT/s pcie speed

Performance mode will be set to latency mode for all other cases.

4k Random Read IO performance numbers on 24 SAS SSD drives for above three
permormance modes. Performance data is from Intel Skylake and HGST SS300
(drive model SDLL1DLR400GCCA1).

IOPs:
-----------------------------------------------------------------------
  |perf_mode    | qd = 1 | qd = 64 |   note                             |
  |-------------|--------|---------|-------------------------------------
  |balanced     |  259K  |  3061k  | Provides max performance numbers   |
  |             |        |         | both on lower QD workload &        |
  |             |        |         | also on higher QD workload         |
  |-------------|--------|---------|-------------------------------------
  |iops         |  220K  |  3100k  | Provides max performance numbers   |
  |             |        |         | only on higher QD workload.        |
  |-------------|--------|---------|-------------------------------------
  |latency      |  246k  |  2226k  | Provides good performance numbers  |
  |             |        |         | only on lower QD worklaod.         |
  -----------------------------------------------------------------------

Avarage Latency:
  -----------------------------------------------------
  |perf_mode    |  qd = 1      |    qd = 64           |
  |-------------|--------------|----------------------|
  |balanced     |  92.05 usec  |    501.12 usec       |
  |-------------|--------------|----------------------|
  |iops         |  108.40 usec |    498.10 usec       |
  |-------------|--------------|----------------------|
  |latency      |  97.10 usec  |    689.26 usec       |
  -----------------------------------------------------

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Enable interrupt coalescing on high iops

Enable interrupt coalescing only on high iops queues.

In ioc config page 1, offset 0x14 (ProductSpecific field) is used to
determine interrupt coalescing enabled/disabled on per reply descriptor
post queue group(8) basis. If 31st bit is zero, then interrupt coalescing
is enabled for all reply descriptor post queues. If 31st bit is set to one,
then user can enable/disable interrupt coalescing on per reply descriptor
post queue group(8) basis. So to enable interrupt coalescing only on first
reply descriptor post queue group (i.e. on high iops queues), set bit 0 and
31.

This configuration should reset during driver unload or shutdown to the
default settings. For this, the driver takes copy of default ioc page 1 and
copies back the default or unmodified ioc page1 during unload and
shutdown. This means that on next driver load (e.g. if older version driver
is loaded by user), current modified changes on ioc page1 won't take
effect.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Affinity high iops queues IRQs to local node

High iops queues are mapped to non-managed irqs. Set affinity of
non-managed irqs to local numa node. Low latency queues are mapped to
managed irqs.

Driver reserves some reply queues for max iops (through
pci_alloc_irq_vectors_affinity and .pre_vectors interface). The rest of
queues are for low latency.

Based on io workload in io submission path, driver will decide which group
of reply queues (either high iops queues or low latency queues) to be
used. High iops queues will be mapped to local numa node of controller and
low latency queues will be mapped to cpus across numa nodes. In general,
high iops and low latency queues should fit into 128 reply queues
which is the max number of reply queues supported by Aero/Sea.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: save and use MSI-X index for posting RD

In the IO submission path _base_get_msix_index is called twice. Initially
while getting the smid and subsequently while posting the request
descriptor (RD).

Refactor code to query msix index only while posting the request
descriptor. Save determined msix index in msix_io field.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Use high iops queues under some circumstances

The driver will use round-robin method for io submission in batches within
the high iops queues when the number of in-flight ios on the target device
is larger than 8. Otherwise the driver will use low latency reply queues.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: change _base_get_msix_index prototype

Code refactoring.

In function _base_get_msix_index, add scmd as second argument. This change
is made in preparation for the next patch where we introduce a new function
to get the MSI-X index for high iops queues.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Add flag high_iops_queues

Aero controllers support balanced performance mode through the ability to
configure queues with different properties.

Reply queues with interrupt coalescing enabled are called "high iops reply
queues" and reply queues with interrupt coalescing disabled are called "low
latency reply queues".

The driver configures a combination of high iops and low latency reply
queues if:

- HBA is an AERO controller;

- MSI-X vectors supported by the HBA is 128;

- Total CPU count in the system more than high iops queue count;

- Driver is loaded with default max_msix_vectors module parameter; and

- System booted in non-kdump mode.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Add Atomic RequestDescriptor support on Aero

If the Aero HBA supports Atomic Request Descriptors, it sets the Atomic
Request Descriptor Capable bit in the IOCCapabilities field of the IOCFacts
Reply message. Driver uses an Atomic Request Descriptor as an alternative
method for posting an entry onto a request queue.

The posting of an Atomic Request Descriptor is an atomic operation,
providing a safe mechanism for multiple processors on the host to post
requests without synchronization. This Atomic Request Descriptor format is
identical to first 32 bits of Default Request Descriptor and uses only 32
bits.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: function pointers of request descriptor

This code refactoring introduces function pointers.

Host uses Request Descriptors of different types for posting an entry onto
a request queue. Based on controller type and capabilities, host can also
use atomic descriptors other than normal descriptors. Using function
pointer will avoid if-else statements

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: isci: Grammar s/the its/its/

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aic7xxx: Spelling s/configuraion/configuration/

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Remove unused including <linux/version.h>

Remove including <linux/version.h> that don't need it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: use DEVICE_ATTR_{RO, RW}

Use existing macros. No functional change.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: use octal permissions instead of constants

Checkpatch emits a warning when using symbolic permissions. Use octal
permissions instead. No functional change.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: make max_sectors visible in sys

Support is easier with all driver parameters visible in sysfs.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: remove set but not used variables 'buff_addr' and 'ci_h'

Fixes gcc '-Wunused-but-set-variable' warnings:

drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_fw_crash_buffer_show:
drivers/scsi/megaraid/megaraid_sas_base.c:3138:16: warning: variable buff_addr set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_get_pd_list:
drivers/scsi/megaraid/megaraid_sas_base.c:4426:13: warning: variable ci_h set but not used [-Wunused-but-set-variable]

'buff_addr' is never used since inroduction in commit fc62b3fc9021
("megaraid_sas : Firmware crash dump feature support")

'ci_h' is not used since commit 9b3d028f3468 ("scsi: megaraid_sas:
Pre-allocate frequently used DMA buffers")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>