Randy Dunlap [Thu, 18 May 2023 21:27:40 +0000 (14:27 -0700)]
scsi: docs: introduction: Multiple cleanups
Modify URLs to use https instead of http.
Remove ancient URLs that don't work.
Change "scsi" in text to "SCSI".
Change "cdrom" in text to "CD-ROM".
Drop the reference to "autoclean" for modules since I can't
find it in any current documentation.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230518212749.18266-3-rdunlap@infradead.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Randy Dunlap [Thu, 18 May 2023 21:27:39 +0000 (14:27 -0700)]
scsi: docs: Organize the SCSI documentation
Break the SCSI documentation up into categories:
Introduction, APIs, driver parameters, and host adapter drivers instead
of alphabetical by document file name (i.e., no organization).
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230518212749.18266-2-rdunlap@infradead.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Gustavo A. R. Silva [Wed, 17 May 2023 21:22:45 +0000 (15:22 -0600)]
scsi: lpfc: Replace one-element array with flexible-array member
One-element arrays are deprecated, and we are replacing them with flexible
array members instead. So, replace one-element arrays with flexible-array
members in a couple of structures, and refactor the rest of the code,
accordingly.
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
on memcpy() and help us make progress towards globally enabling
-fstrict-flex-arrays=3 [1].
This results in no differences in binary output.
Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/295
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/6c6dcab88524c14c47fd06b9332bd96162656db5.1684358315.git.gustavoars@kernel.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christophe JAILLET [Sun, 7 May 2023 15:23:49 +0000 (17:23 +0200)]
scsi: mpi3mr: Fix the type used for pointers to bitmap
Bitmaps are "unsigned long[]", so better use "unsigned long *" instead of a
plain "void *" when dealing with pointers to bitmaps.
This is more informative.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/8bdf9148ce1a5d01aac11c46c8617b477813457e.1683473011.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Yuchen Yang [Fri, 5 May 2023 14:12:55 +0000 (22:12 +0800)]
scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe()
Smatch complains that:
tw_probe() warn: missing error code 'retval'
This patch adds error checking to tw_probe() to handle initialization
failure. If tw_reset_sequence() function returns a non-zero value, the
function will return -EINVAL to indicate initialization failure.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Yuchen Yang <u202114568@hust.edu.cn>
Link: https://lore.kernel.org/r/20230505141259.7730-1-u202114568@hust.edu.cn
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Keoseong Park [Wed, 3 May 2023 10:46:30 +0000 (19:46 +0900)]
scsi: ufs: core: Return earlier if ufshcd_hba_init_crypto_capabilities() fails
The 'err' variable is used only as the result of
ufshcd_hba_init_crypto_capabilities(), so return 'err' immediately when
failed. If it is not an error, explicitly return 0.
Signed-off-by: Keoseong Park <keosung.park@samsung.com>
Link: https://lore.kernel.org/r/20230503104630epcms2p8b82734102ffb920531e9264604086372@epcms2p8
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin K. Petersen [Mon, 22 May 2023 21:09:51 +0000 (17:09 -0400)]
Merge patch series "Add Command Duration Limits support"
Niklas Cassel <nks@flawful.org> says:
This series adds support for Command Duration Limits.
The series is based on linux tag: v6.4-rc1
The series can also be found in git: https://github.com/floatious/linux/commits/cdl-v7
=================
CDL in ATA / SCSI
=================
Command Duration Limits is defined in:
T13 ATA Command Set - 5 (ACS-5) and
T10 SCSI Primary Commands - 6 (SPC-6) respectively
(a simpler version of CDL is defined in T10 SPC-5).
CDL defines Duration Limits Descriptors (DLD).
7 DLDs for read commands and 7 DLDs for write commands.
Simply put, a DLD contains a limit and a policy.
A command can specify that a certain limit should be applied by setting
the DLD index field (3 bits, so 0-7) in the command itself.
The DLD index points to one of the 7 DLDs.
DLD index 0 means no descriptor, so no limit.
DLD index 1-7 means DLD 1-7.
A DLD can have a few different policies, but the two major ones are:
-Policy 0xF (abort), command will be completed with command aborted error
(ATA) or status CHECK CONDITION (SCSI), with sense data indicating that
the command timed out.
-Policy 0xD (complete-unavailable), command will be completed without
error (ATA) or status GOOD (SCSI), with sense data indicating that the
command timed out. Note that the command will not have transferred any
data to/from the device when the command timed out, even though the
command returned success.
Regardless of the CDL policy, in case of a CDL timeout, the I/O will
result in a -ETIME error to user-space.
The DLDs are defined in the CDL log page(s) and are readable and writable.
Reading and writing the CDL DLDs are outside the scope of the kernel.
If a user wants to read or write the descriptors, they can do so using a
user-space application that sends passthrough commands, such as cdl-tools:
https://github.com/westerndigitalcorporation/cdl-tools
================================
The introduction of ioprio hints
================================
What the kernel does provide, is a method to let I/O use one of the CDL DLDs
defined in the device. Note that the kernel will simply forward the DLD index
to the device, so the kernel currently does not know, nor does it need to know,
how the DLDs are defined inside the device.
The way that the CDL DLD index is supplied to the kernel is by introducing a
new 10 bit "ioprio hint" field within the existing 16 bit ioprio definition.
Currently, only 6 out of the 16 ioprio bits are in use, the remaining 10 bits
are unused, and are currently explicitly disallowed to be set by the kernel.
For now, we only add ioprio hints representing CDL DLD index 1-7. Additional
ioprio hints for other QoS features could be defined in the future.
A theoretical future work could be to make an I/O scheduler aware of these
hints. E.g. for CDL, an I/O scheduler could make use of the duration limit
in each descriptor, and take that information into account while scheduling
commands. Right now, the ioprio hints will be ignored by the I/O schedulers.
==============================
How to use CDL from user-space
==============================
Since CDL is mutually exclusive with NCQ priority
(see ncq_prio_enable and sas_ncq_prio_enable in
Documentation/ABI/testing/sysfs-block-device),
CDL has to be explicitly enabled using:
echo 1 > /sys/block/$bdev/device/cdl_enable
Since the ioprio hints are supplied through the existing I/O priority API,
it should be simple for an application to make use of the ioprio hints.
It simply has to reuse one of the new macros defined in
include/uapi/linux/ioprio.h: IOPRIO_PRIO_HINT() or IOPRIO_PRIO_VALUE_HINT(),
and supply one of the new hints defined in include/uapi/linux/ioprio.h:
IOPRIO_HINT_DEV_DURATION_LIMIT_[1-7], which indicates that the I/O should
use the corresponding CDL DLD index 1-7.
By reusing the I/O priority API, the user can both define a DLD to use per
AIO (io_uring sqe->ioprio or libaio iocb->aio_reqprio) or per-thread
(ioprio_set()).
=======
Testing
=======
With the following fio patches:
https://github.com/floatious/fio/commits/cdl
fio adds support for ioprio hints, such that CDL can be tested using e.g.:
fio --ioengine=io_uring --cmdprio_percentage=10 --cmdprio_hint=DLD_index
A simple way to test is to use a DLD with a very short duration limit,
and send large reads. Regardless of the CDL policy, in case of a CDL
timeout, the I/O will result in a -ETIME error to user-space.
We also provide a CDL test suite located in the cdl-tools repo, see:
https://github.com/westerndigitalcorporation/cdl-tools#testing-a-system-command-duration-limits-support
We have tested this patch series using:
-real hardware
-the following QEMU implementation:
https://github.com/floatious/qemu/tree/cdl
(NOTE: the QEMU implementation requires you to define the CDL policy at compile
time, so you currently need to recompile QEMU when switching between policies.)
===================
Further information
===================
For further information about CDL, see Damien's slides:
Presented at SDC 2021:
https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf
Presented at Lund Linux Con 2022:
https://drive.google.com/file/d/1I6ChFc0h4JY9qZdO1bY5oCAdYCSZVqWw/view?usp=sharing
================
Changes since V6
================
-Rebased series on v6.4-rc1.
-Picked up Reviewed-by tags from Hannes (Thank you Hannes!)
-Picked up Reviewed-by tag from Christoph (Thank you Christoph!)
-Changed KernelVersion from 6.4 to 6.5 for new sysfs attributes.
For older change logs, see previous patch series versions:
https://lore.kernel.org/linux-scsi/
20230406113252.41211-1-nks@flawful.org/
https://lore.kernel.org/linux-scsi/
20230404182428.715140-1-nks@flawful.org/
https://lore.kernel.org/linux-scsi/
20230309215516.
3800571-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/
20230124190308.127318-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/
20230112140412.667308-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/
20221208105947.
2399894-1-niklas.cassel@wdc.com/
Link: https://lore.kernel.org/r/20230511011356.227789-1-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Niklas Cassel [Thu, 11 May 2023 01:13:52 +0000 (03:13 +0200)]
scsi: ata: libata: Handle completion of CDL commands using policy 0xD
A CDL timeout for policy 0xF is defined as a NCQ error, just with a CDL
specific sk/asc/ascq in the sense data. Therefore, the existing code in
libata does not need to be modified to handle a policy 0xF CDL timeout.
For Command Duration Limits policy 0xD:
The device shall complete the command without error with the additional
sense code set to DATA CURRENTLY UNAVAILABLE.
Since a CDL timeout for policy 0xD is not an error, we cannot use the NCQ
Command Error log (10h).
Instead, we need to read the Sense Data for Successful NCQ Commands log
(0Fh).
In the success case, just like in the error case, we cannot simply read a
log page from the interrupt handler itself, since reading a log page
involves sending a READ LOG DMA EXT or READ LOG EXT command.
Therefore, we add a new EH action ATA_EH_GET_SUCCESS_SENSE. When a command
completes without error, and when the ATA_SENSE bit is set, this new action
is set as pending, and EH is scheduled.
This way, similar to the NCQ error case, the log page will be read from EH
context.
An alternative would have been to add a new kthread or workqueue to handle
this. However, extending EH can be done with minimal changes and avoids the
need to synchronize a new kthread/workqueue with EH.
Co-developed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-20-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:51 +0000 (03:13 +0200)]
scsi: ata: libata: Set read/write commands CDL index
For devices supporting the command duration limits feature, translate the
dld field of read and write operation to set the command duration limit
index field of the command task file when the duration limit feature is
enabled.
The function ata_set_tf_cdl() is introduced to do this. For unqueued (non
NCQ) read and write operations, this function sets the command duration
limit index set as the lower 3 bits of the feature field. For queued NCQ
read/write commands, the index is set as the lower 3 bits of the auxiliary
field.
The flag ATA_QCFLAG_HAS_CDL is introduced to indicate that a command
taskfile has a non zero cdl field.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-19-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:50 +0000 (03:13 +0200)]
scsi: ata: libata: Add ATA feature control sub-page translation
Add support for the ATA feature control sub-page of the control mode page
to enable/disable the command duration limits feature using the cdl_ctrl
field of the ATA feature control sub-page.
Both mode sense and mode select translation are supported. For mode sense,
the ata device flag ATA_DFLAG_CDL_ENABLED is used to cache the status of
the command duration limits feature. Enabling this feature is done using a
SET FEATURES command with a cdl action set to 1 when the page cdl_ctrl
field value is 0x2 (T2A and T2B pages supported). If this field is 0, CDL
is disabled using the SET FEATURES command with a cdl action set to 0.
Since a device CDL and NCQ priority features should not be used
simultaneously, ata_mselect_control_ata_feature() returns an error when
attempting to enable CDL with the device priority feature enabled.
Conversely, the function ata_ncq_prio_enable_store() used to enable the use
of the device NCQ priority feature through sysfs is modified to return an
error if the device CDL feature is enabled.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-18-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:49 +0000 (03:13 +0200)]
scsi: ata: libata-scsi: Add support for CDL pages mode sense
Modify ata_scsiop_mode_sense() and ata_msense_control() to support mode
sense access to the T2A and T2B sub-pages of the control mode page.
ata_msense_control() is modified to support sub-pages. The T2A sub-page is
generated using the read descriptors of the command duration limits log
page 18h. The T2B sub-page is generated using the write descriptors of the
same log page. With the addition of these sub-pages, getting all sub-pages
of the control mode page is also supported by increasing the value of
ATA_SCSI_RBUF_SIZE from 576B up to 2048B to ensure that all sub-pages fit
in the fill buffer.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-17-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:48 +0000 (03:13 +0200)]
scsi: ata: libata-scsi: Handle CDL bits in ata_scsiop_maint_in()
For a scsi MAINTENANCE_IN/MI_REPORT_SUPPORTED_OPERATION_CODES operation,
add the translation of the rwcdlp and cdlp bits for the READ 16 and WRITE
16 commands. If the ATA device does not support command duration limits,
these bits are always 0. If the ATA device supports command duration
limits, the rwcdlp bit is set to 1 for READ 16 and WRITE 16 and the cdlp
bits are set to 0x1 for READ 16 and 0x2 for WRITE 16. These correspond to
the T2A mode page containing the read descriptors and to the T2B mode page
containing the write descriptors, as defined in SAT-5.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-16-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:47 +0000 (03:13 +0200)]
scsi: ata: libata: Detect support for command duration limits
Use the supported capabilities identify device data log page to detect if a
device supports the command duration limits feature. For devices supporting
this feature, set the device flag ATA_DFLAG_CDL. To support SCSI-ATA
translation, retrieve the command duration limits log page 18h and cache
this page content using the cdl array added to the ata_device data
structure.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-15-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Niklas Cassel [Thu, 11 May 2023 01:13:46 +0000 (03:13 +0200)]
scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION
Currently, ata_eh_request_sense() unconditionally sets the scsicmd->result
to SAM_STAT_CHECK_CONDITION.
For Command Duration Limits policy 0xD:
The device shall complete the command without error (SAM_STAT_GOOD) with
the additional sense code set to DATA CURRENTLY UNAVAILABLE.
It is perfectly fine to have sense data for a command that returned
completion without error.
In order to support for CDL policy 0xD, we have to remove this assumption
that having sense data means that the command failed
(SAM_STAT_CHECK_CONDITION).
Change ata_eh_request_sense() to not set SAM_STAT_CHECK_CONDITION, and
instead move the setting of SAM_STAT_CHECK_CONDITION to the single caller
that wants SAM_STAT_CHECK_CONDITION set, that way ata_eh_request_sense()
can be reused in a follow-up patch that adds support for CDL policy 0xD.
The only caller of ata_eh_request_sense() is protected by: if (!(qc->flags
& ATA_QCFLAG_SENSE_VALID)), so we can remove this duplicated check from
ata_eh_request_sense() itself.
Additionally, ata_eh_request_sense() is only called from
ata_eh_analyze_tf(), which is only called when iteratating the QCs using
ata_qc_for_each_raw(), which does not include the internal tag, so cmd can
never be NULL (all non-internal commands have qc->scsicmd set), so remove
the !cmd check as well.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-14-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Niklas Cassel [Thu, 11 May 2023 01:13:45 +0000 (03:13 +0200)]
scsi: ata: libata-scsi: Remove unnecessary !cmd checks
There is no need to check if !cmd as this can only happen for ATA internal
commands which uses the ATA internal tag (32).
Most users of ata_scsi_set_sense() are from _xlat functions that translate
a scsicmd to an ATA command. These obviously have a qc->scsicmd.
ata_scsi_qc_complete() can also call ata_scsi_set_sense() via
ata_gen_passthru_sense() / ata_gen_ata_sense(), called via
ata_scsi_qc_complete(). This callback is only called for translated
commands, so it also has a qc->scsicmd.
ata_eh_analyze_ncq_error(): the NCQ error log can only contain a 0-31
value, so it will never be able to get the ATA internal tag (32).
ata_eh_request_sense(): only called by ata_eh_analyze_tf(), which is only
called when iteratating the QCs using ata_qc_for_each_raw(), which does not
include the internal tag.
Since there is no existing call site where cmd can be NULL, remove the !cmd
check from ata_scsi_set_sense() and ata_scsi_set_sense_information().
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-13-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Niklas Cassel [Thu, 11 May 2023 01:13:44 +0000 (03:13 +0200)]
scsi: sd: Handle read/write CDL timeout failures
Commands using a duration limit descriptor that has limit policies set to a
value other than 0x0 may be failed by the device if one of the limits are
exceeded. For such commands, since the failure is the result of the user
duration limit configuration and workload, the commands should not be
retried and terminated immediately. Furthermore, to allow the user to
differentiate these "soft" failures from hard errors due to hardware
problem, a different error code than EIO should be returned.
There are 2 cases to consider:
(1) The failure is due to a limit policy failing the command with a check
condition sense key, that is, any limit policy other than 0xD. For this
case, scsi_check_sense() is modified to detect failures with the ABORTED
COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING or COMMAND
TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR
RECOVERY additional sense code. For these failures, a SUCCESS disposition
is returned so that scsi_finish_command() is called to terminate the
command.
(2) The failure is due to a limit policy set to 0xD, which result in the
command being terminated with a GOOD status, COMPLETED sense key, and DATA
CURRENTLY UNAVAILABLE additional sense code. To handle this case, the
scsi_check_sense() is modified to return a SUCCESS disposition so that
scsi_finish_command() is called to terminate the command. In addition,
scsi_decide_disposition() has to be modified to see if a command being
terminated with GOOD status has sense data. This is as defined in SCSI
Primary Commands - 6 (SPC-6), so all according to spec, even if GOOD status
commands were not checked before.
If scsi_check_sense() detects sense data representing a duration limit,
scsi_check_sense() will set the newly introduced SCSI ML byte
SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in scsi_noretry_cmd(),
so that a command that failed because of a CDL timeout cannot be
retried. The SCSI ML byte is also checked in scsi_result_to_blk_status() to
complete the command request with the BLK_STS_DURATION_LIMIT status, which
result in the user seeing ETIME errors for the failed commands.
Co-developed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-12-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:43 +0000 (03:13 +0200)]
scsi: sd: Set read/write command CDL index
Introduce the command duration limits helper function sd_cdl_dld() to set
the DLD bits of READ/WRITE 16 and READ/WRITE 32 commands to indicate to the
device the command duration limit descriptor to apply to the commands.
When command duration limits are enabled, sd_cdl_dld() obtains the index of
the descriptor to apply to the command using the hints field of the request
IO priority value (hints IOPRIO_HINT_DEV_DURATION_LIMIT_1 to
IOPRIO_HINT_DEV_DURATION_LIMIT_7).
If command duration limits is disabled (which is the default), the limit
index "0" is always used to indicate "no limit" for a command.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-11-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:42 +0000 (03:13 +0200)]
scsi: core: Allow enabling and disabling command duration limits
Add the sysfs scsi_device attribute cdl_enable to allow a user to enable or
disable a device command duration limits feature. CDL is disabled by
default. This feature must be explicitly enabled by a user by setting the
cdl_enable attribute to 1.
The new function scsi_cdl_enable() does not do anything beside setting the
cdl_enable field of struct scsi_device in the case of a (real) SCSI device
(e.g. a SAS HDD). For ATA devices, the command duration limits feature
needs to be enabled/disabled using the ATA feature sub-page of the control
mode page. To do so, the scsi_cdl_enable() function checks if this mode
page is supported using scsi_mode_sense(). If it is, scsi_mode_select() is
used to enable and disable CDL.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-10-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:41 +0000 (03:13 +0200)]
scsi: core: Detect support for command duration limits
Introduce the function scsi_cdl_check() to detect if a device supports
command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32
and WRITE 32 commands are checked using the function scsi_report_opcode()
to probe the rwcdlp and cdlp bits as they indicate the mode page defining
the command duration limits descriptors that apply to the command being
tested.
If any of these commands support CDL, the field cdl_supported of struct
scsi_device is set to 1 to indicate that the device supports CDL.
Support for CDL for a device is advertizes through sysfs using the new
cdl_supported device attribute. This attribute value is 1 for a device
supporting CDL and 0 otherwise.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-9-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:40 +0000 (03:13 +0200)]
scsi: core: Support Service Action in scsi_report_opcode()
The REPORT_SUPPORTED_OPERATION_CODES command allows checking for support of
commands that have the same opcode but different service actions, such as
READ 32 and WRITE 32. However, the current implementation of
scsi_report_opcode() only allows checking an operation code without a
service action differentiation.
Add the "sa" argument to scsi_report_opcode() to allow passing a service
action. If a non-zero service action is specified, the reporting options
field value is set to 3 to have the service action field taken into account
by the device. If no service action field is specified (zero), the
reporting options field is set to 1 as before.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-8-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:39 +0000 (03:13 +0200)]
scsi: core: Support retrieving sub-pages of mode pages
Allow scsi_mode_sense() to retrieve sub-pages of mode pages by adding the
subpage argument. Change all the current caller sites to specify the
subpage 0.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-7-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Niklas Cassel [Thu, 11 May 2023 01:13:38 +0000 (03:13 +0200)]
scsi: core: Rename and move get_scsi_ml_byte()
SCSI has two different getters:
- get_XXX_byte() (in scsi_cmnd.h) which takes a struct scsi_cmnd *, and
- XXX_byte() (in scsi.h) which takes a scmd->result.
The proper name for get_scsi_ml_byte() should thus be without the get_
prefix, as it takes a scmd->result. Rename the function to rectify this.
(This change was suggested by Mike Christie.)
Additionally, move get_scsi_ml_byte() to scsi_priv.h since both scsi_lib.c
and scsi_error.c will need to use this helper in a follow-up patch.
Cc: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-6-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Niklas Cassel [Thu, 11 May 2023 01:13:37 +0000 (03:13 +0200)]
scsi: core: Allow libata to complete successful commands via EH
In SCSI, we get the sense data as part of the completion, for ATA however,
we need to fetch the sense data as an extra step. For an aborted ATA
command the sense data is fetched via libata's ->eh_strategy_handler().
For Command Duration Limits policy 0xD:
The device shall complete the command without error with the additional
sense code set to DATA CURRENTLY UNAVAILABLE.
In order to handle this policy in libata, we intend to send a successful
command via SCSI EH, and let libata's ->eh_strategy_handler() fetch the
sense data for the good command. This is similar to how we handle an
aborted ATA command, just that we need to read the Successful NCQ Commands
log instead of the NCQ Command Error log.
When we get a SATA completion with successful commands, ATA_SENSE will be
set, indicating that some commands in the completion have sense data.
The sense_valid bitmask in the Sense Data for Successful NCQ Commands log
will inform exactly which commands that had sense data, which might be a
subset of all the commands that was completed in the same completion. (Yet
all will have ATA_SENSE set, since the status is per completion.)
The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE" sense
data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will not set
the scmd->result to DID_TIME_OUT for these commands. However, the
successful commands that did not have sense data, must not get their result
marked as DID_TIME_OUT by SCSI EH.
Add a new flag SCMD_FORCE_EH_SUCCESS, which tells SCSI EH to not mark a
command as DID_TIME_OUT, even if it has scmd->result == SAM_STAT_GOOD.
This will be used by libata in a subsequent commit.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-5-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:36 +0000 (03:13 +0200)]
scsi: block: Introduce BLK_STS_DURATION_LIMIT
Introduce the new block I/O status BLK_STS_DURATION_LIMIT for LLDDs to
report command that failed due to a command duration limit being
exceeded. This new status is mapped to the ETIME error code to allow users
to differentiate "soft" duration limit failures from other more serious
hardware related errors.
If we compare BLK_STS_DURATION_LIMIT with BLK_STS_TIMEOUT:
-BLK_STS_DURATION_LIMIT means that the drive gave a reply indicating that
the command duration limit was exceeded before the command could be
completed. This I/O status is mapped to ETIME for user space.
-BLK_STS_TIMEOUT means that the drive never gave a reply at all.
This I/O status is mapped to ETIMEDOUT for user space.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-4-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:35 +0000 (03:13 +0200)]
scsi: block: Introduce ioprio hints
I/O priorities currently only use 6-bits of the 16-bits ioprio value: the
3-upper bits are used to define up to 8 priority classes (4 of which are
valid) and the 3 lower bits of the value are used to define a priority
level for the real-time and best-effort class.
The remaining 10-bits between the I/O priority class and level are unused,
and in fact, cannot be used by the user as doing so would either result in
the value being completely ignored, or in an error returned by
ioprio_check_cap().
Use these 10-bits of an ioprio value to allow a user to specify I/O
hints. An I/O hint is defined as a 10-bitsvalue, allowing up to 1023
different hints to be specified, with the value 0 being reserved as the "no
hint" case. An I/O hint can apply to any I/O that specifies a valid
priority class other than NONE, regardless of the I/O priority level
specified.
To do so, the macros IOPRIO_PRIO_HINT() and IOPRIO_PRIO_VALUE_HINT() are
introduced in include/uapi/linux/ioprio.h to respectively allow a user to
get and set a hint in an ioprio value.
To support the ATA and SCSI command duration limits feature, 7 hints are
defined: IOPRIO_HINT_DEV_DURATION_LIMIT_1 to
IOPRIO_HINT_DEV_DURATION_LIMIT_7, allowing a user to specify which command
duration limit descriptor should be applied to the commands serving an
I/O. Specifying these hints has for now no effect whatsoever if the target
block devices do not support the command duration limits feature. However,
in the future, block I/O schedulers can be modified to optimize I/O issuing
order based on these hints, even for devices that do not support the
command duration limits feature.
Given that the 7 duration limits hints defined have no effect on any block
layer component, the actual definition of the duration limits implied by
these hints remains at the device level.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-3-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 11 May 2023 01:13:34 +0000 (03:13 +0200)]
scsi: block: ioprio: Clean up interface definition
The I/O priority user interface defines the 16-bits ioprio values as the
combination of the upper 3-bits for an I/O priority class and the lower
13-bits as priority data. However, the kernel only uses the lower 3-bits of
the priority data to define priority levels for the RT and BE priority
classes. The data part of an ioprio value is completely ignored for the
IDLE and NONE classes. This is enforced by checks done in
ioprio_check_cap(), which is called for all paths that allow defining an
I/O priority for I/Os: the per-context ioprio_set() system call, aio
interface and io_uring interface.
Clarify this fact in the uapi ioprio.h header file and introduce the
IOPRIO_PRIO_LEVEL_MASK and IOPRIO_PRIO_LEVEL() macros for users to define
and get priority levels in an ioprio value. The coarser macro
IOPRIO_PRIO_DATA() is retained for backward compatibility with old
applications already using it. There is no functional change introduced
with this.
In-kernel users of the IOPRIO_PRIO_DATA() macro which are explicitly
handling I/O priority data as a priority level are modified to use the new
IOPRIO_PRIO_LEVEL() macro without any functional change. Since f2fs is the
only user of this macro not explicitly using that value as a priority
level, it is left unchanged.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-2-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin K. Petersen [Mon, 22 May 2023 20:35:02 +0000 (16:35 -0400)]
Merge patch series "Use block pr_ops in LIO"
Mike Christie <michael.christie@oracle.com> says:
The patches in this thread allow us to use the block pr_ops with LIO's
target_core_iblock module to support cluster applications in VMs. They
were built over Linus's tree. They also apply over linux-next and
Martin's tree and Jens's trees.
Currently, to use windows clustering or linux clustering (pacemaker +
cluster labs scsi fence agents) in VMs with LIO and vhost-scsi, you
have to use tcmu or pscsi or use a cluster aware FS/framework for the
LIO pr file. Setting up a cluster FS/framework is pain and waste when
your real backend device is already a distributed device, and pscsi
and tcmu are nice for specific use cases, but iblock gives you the
best performance and allows you to use stacked devices like
dm-multipath. So these patches allow iblock to work like pscsi/tcmu
where they can pass a PR command to the backend module. And then
iblock will use the pr_ops to pass the PR command to the real devices
similar to what we do for unmap today.
The patches are separated in the following groups:
Patch 1 - 2:
- Add block layer callouts for reading reservations and rename reservation
error code.
Patch 3 - 5:
- SCSI support for new callouts.
Patch 6:
- DM support for new callouts.
Patch 7 - 13:
- NVMe support for new callouts.
Patch 14 - 18:
- LIO support for new callouts.
This patchset has been tested with the libiscsi PGR ops and with
window's failover cluster verification test. Note that for scsi
backend devices we need this patchset:
https://lore.kernel.org/linux-scsi/
20230123221046.125483-1-michael.christie@oracle.com/T/#m4834a643ffb5bac2529d65d40906d3cfbdd9b1b7
to handle UAs. To reduce the size of this patchset that's being done
separately to make reviewing easier. And to make merging easier this
patchset and the one above do not have any conflicts so can be merged
in different trees.
Link: https://lore.kernel.org/r/20230407200551.12660-1-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bagas Sanjaya [Wed, 10 May 2023 09:39:33 +0000 (16:39 +0700)]
scsi: dc395x: Documentation: Reword original driver attribution
The Linux kernel isn't in 2.6.x anymore, but rather the major version
has advanced much (currently 6.x). Reword the attribution.
Also, replace 404'ed 2.4 driver link with web.archive.org snapshot [1].
Link: https://web.archive.org/web/20140129181343/http://www.garloff.de/kurt/linux/dc395/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230510093933.19985-4-bagasdotme@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bagas Sanjaya [Wed, 10 May 2023 09:39:32 +0000 (16:39 +0700)]
scsi: dc395x: Documentation: Replace non-functional twibble.org list
Sync mailing list address in the documentation to follow MAINTAINERS.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230510093933.19985-3-bagasdotme@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bagas Sanjaya [Wed, 10 May 2023 09:39:31 +0000 (16:39 +0700)]
scsi: MAINTAINERS: Drop DC395x list and site
Emails to DC395x list bounce (550 error) and visiting the site returns 404
page.
Drop both twibble.org links. The driver should now be covered by linux-scsi
list.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230510093933.19985-2-bagasdotme@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jason Yan [Tue, 16 May 2023 11:01:31 +0000 (19:01 +0800)]
scsi: MAINTAINERS: Add a libsas entry
John has been reviewing libsas patches for years. And I have been
contributing to libsas for years and I am interested in reviewing and
testing libsas patches too. So add a libsas entry and add John and me as
reviewer.
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Link: https://lore.kernel.org/r/20230516110131.388634-1-yanaijie@huawei.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Acked-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Azeem Shaikh [Tue, 16 May 2023 02:54:04 +0000 (02:54 +0000)]
scsi: qla2xxx: Replace all non-returning strlcpy() with strscpy()
strlcpy() reads the entire source buffer first. This read may exceed the
destination size limit. This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1]. In an effort
to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Link: https://lore.kernel.org/r/20230516025404.2843867-1-azeemshaikh38@gmail.com
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Azeem Shaikh [Tue, 16 May 2023 02:53:55 +0000 (02:53 +0000)]
scsi: qla4xxx: Replace all non-returning strlcpy() with strscpy()
strlcpy() reads the entire source buffer first. This read may exceed the
destination size limit. This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1]. In an effort
to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Link: https://lore.kernel.org/r/20230516025355.2835898-1-azeemshaikh38@gmail.com
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Azeem Shaikh [Tue, 16 May 2023 02:53:22 +0000 (02:53 +0000)]
scsi: target: Replace all non-returning strlcpy() with strscpy()
strlcpy() reads the entire source buffer first. This read may exceed the
destination size limit. This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1]. In an effort
to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Link: https://lore.kernel.org/r/20230516025322.2804923-1-azeemshaikh38@gmail.com
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Azeem Shaikh [Tue, 16 May 2023 01:33:45 +0000 (01:33 +0000)]
scsi: bfa: Replace all non-returning strlcpy() with strscpy()
strlcpy() reads the entire source buffer first. This read may exceed the
destination size limit. This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1]. In an effort
to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Link: https://lore.kernel.org/r/20230516013345.723623-1-azeemshaikh38@gmail.com
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin K. Petersen [Wed, 17 May 2023 01:36:51 +0000 (21:36 -0400)]
Merge patch series "scsi: hisi_sas: Some misc changes"
Xiang Chen <chenxiang66@hisilicon.com> says:
This series contains some fixes including:
- Configure initial value of some registers according to HBA model
- Change DMA setup lock timeout from 100ms to 2.5s
- Fix warnings detected by sparse
Link: https://lore.kernel.org/r/1684118481-95908-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xingui Yang [Mon, 15 May 2023 02:41:21 +0000 (10:41 +0800)]
scsi: hisi_sas: Fix warnings detected by sparse
This patch fixes the following warning:
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:2168:43: sparse: sparse: restricted __le32 degrades to integer
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304161254.NztCVZIO-lkp@intel.com/
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1684118481-95908-4-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xingui Yang [Mon, 15 May 2023 02:41:20 +0000 (10:41 +0800)]
scsi: hisi_sas: Change DMA setup lock timeout to 2.5s
DMA setup lock timeout protection is added when DMA setup frames are
received. It's a function outside the protocol and used to prevent SATA
disk I/Os from being delivered for a long time. The default value is 100ms,
it's too strict and easily triggered timeout when the disk is overloaded or
faulty. Based on the average I/O latency of 300 disks, we adjust the value
to 2.5s.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1684118481-95908-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Yihang Li [Mon, 15 May 2023 02:41:19 +0000 (10:41 +0800)]
scsi: hisi_sas: Configure initial value of some registers according to HBA model
For SAS HBAs of 920 and previous version, we use init_reg_v3_hw() to set
some registers which are related to HW boards. For SAS HBAs of 920B and
later version, those HW registers are set through firmware. And different
HBA models are distinguished through pci_dev->revision.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1684118481-95908-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kees Cook [Thu, 11 May 2023 22:10:00 +0000 (15:10 -0700)]
scsi: megaraid_sas: Convert union megasas_sgl to flex-arrays
In the ongoing effort to replace all fake flexible arrays with true
flexible arrays, replace the sge32, sge64, and sge_skinny members of union
megasas_sgl with true flexible arrays. No binary differences are seen after
this change; sizes were already being manually calculated using the member
struct sizes directly.
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: megaraidlinux.pdl@broadcom.com
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230511220957.never.919-kees@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Krzysztof Kozlowski [Thu, 11 May 2023 17:52:04 +0000 (19:52 +0200)]
scsi: ufs: hwmon: Constify pointers to hwmon_channel_info
Statically allocated array of pointers to hwmon_channel_info can be made
const for safety.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230511175204.281038-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin K. Petersen [Mon, 8 May 2023 11:56:37 +0000 (07:56 -0400)]
Merge patch series "smartpqi updates"
Don Brace <don.brace@microchip.com> says:
These patches are based on Martin Petersen's 6.4/scsi-queue tree
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
6.4/scsi-queue
This set of changes consists of:
* Map entire BAR 0. The driver was mapping up to and including the
controller registers, but not all of BAR 0.
* Add PCI IDs to support new controllers.
* Clean up some code by removing unnecessary NULL checks. This cleanup is
a result of a Coverity report.
* Correct a rare memory leak whenever pqi_sas_port_add_rhpy() returns an
error. This was Suggested by: Yang Yingliang <yangyingliang@huawei.com>
* Remove atomic operations on variable raid_bypass_cnt. Accuracy is not
required for driver operation. Change type from atomic_t to unsigned
int.
* Correct a rare drive hot-plug removal issue where we get a NULL
io_request. We added a check for this condition.
* Turn on NCQ priority for AIO requests to disks comprising RAID devices.
* Correct byte aligned writew() operations on some ARM servers. Changed
the writew() to two writeb() operations.
* Change how the driver checks for a sanitize operation in progress. We
were using TEST UNIT READY. We removed the TEST UNIT READY code and are
now using the controller's firmware information in order to avoid issues
caused by drives failing to complete TEST UNIT READY.
* Some customers have been requesting that we add the NUMA node to
/sys/block/sd<scsi device>/device like the nvme driver does.
* Update the copyright information to match the current year.
* Bump the driver version to 2.1.22-040.
Link: https://lore.kernel.org/r/20230428153712.297638-1-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin K. Petersen [Mon, 8 May 2023 11:55:25 +0000 (07:55 -0400)]
Merge patch series "scsi: pm80xx: Enhanced debug logs for HW events"
Pranav Prasad <pranavpp@google.com> says:
This patch series enhances debug logs for pm80xx HW events, and provides a
minor fix in the case of a hard reset. The log enhancement involves changing
the log severity level to enable logging for HW events which consequently
help debug disk discovery issues.
1. Changed log severity level from MSG to EVENT for HW events. Enhanced
the HW event logs by adding the phyid.
2. Enabled INIT logging.
3. Log portid along with the PHY_UP event.
4. Print phyid and portid sent as part of device registration request.
5. Log port state during HW events.
6. Update phy_state and phy_attached to correct values after a hard reset.
Link: https://lore.kernel.org/r/20230418190101.696345-1-pranavpp@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin K. Petersen [Mon, 8 May 2023 11:54:31 +0000 (07:54 -0400)]
Merge patch series "scsi: libsas: remove empty branches and code simplification"
Jason Yan <yanaijie@huawei.com> says:
Three patches to remove two empty branches and a little code simplification.
Link: https://lore.kernel.org/r/20230421093744.1583609-1-yanaijie@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin K. Petersen [Mon, 8 May 2023 11:52:50 +0000 (07:52 -0400)]
Merge patch series "qla2xxx driver update"
Nilesh Javali <njavali@marvell.com> says:
Please apply the qla2xxx driver enhancement and bug fixes to the scsi tree
at your earliest convenience.
Link: https://lore.kernel.org/r/20230428075339.32551-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin K. Petersen [Mon, 8 May 2023 11:51:05 +0000 (07:51 -0400)]
Merge patch series "lpfc: Update lpfc to revision 14.2.0.12"
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.2.0.12
This patch set contains fixes flagged by code analyzer tools, introduces a
new CQE status to handle DMA errors, and replaces the usage of blk
interrupts with threaded interrupts.
The patches were cut against Martin's 6.4/scsi-queue tree.
Link: https://lore.kernel.org/r/20230417191558.83100-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dan Carpenter [Wed, 3 May 2023 10:40:59 +0000 (13:40 +0300)]
scsi: ufs: ufs-mediatek: Delete some dead code
There is already a test for "if (val == state)" earlier so it's not
possible here. Delete the dead code.
Fixes: 9006e3986f66 ("scsi: ufs-mediatek: Do not gate clocks if auto-hibern8 is not entered yet")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/68fce64f-4970-45f1-807e-6c0eecdfcdc2@kili.mountain
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jinhong Zhu [Tue, 2 May 2023 14:00:21 +0000 (22:00 +0800)]
scsi: qedf: Fix NULL dereference in error handling
Smatch reported:
drivers/scsi/qedf/qedf_main.c:3056 qedf_alloc_global_queues()
warn: missing unwind goto?
At this point in the function, nothing has been allocated so we can return
directly. In particular the "qedf->global_queues" have not been allocated
so calling qedf_free_global_queues() will lead to a NULL dereference when
we check if (!gl[i]) and "gl" is NULL.
Fixes: 61d8658b4a43 ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Signed-off-by: Jinhong Zhu <jinhongzhu@hust.edu.cn>
Link: https://lore.kernel.org/r/20230502140022.2852-1-jinhongzhu@hust.edu.cn
Reviewed-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Keoseong Park [Thu, 27 Apr 2023 09:44:20 +0000 (18:44 +0900)]
scsi: ufs: core: Change the module parameter macro of use_mcq_mode
mcq_mode_ops uses only param_{set,get}_bool(). Therefore, convert
module_param_cb() to module_param() and remove the mcq_mode_ops.
Signed-off-by: Keoseong Park <keosung.park@samsung.com>
Link: https://lore.kernel.org/r/20230427094420epcms2p1043333a3e0c0cf58e66164e0b83b3b02@epcms2p1
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Harshit Mogalapalli [Wed, 19 Apr 2023 06:42:56 +0000 (23:42 -0700)]
scsi: mpi3mr: Use -ENOMEM instead of -1 in mpi3mr_expander_add()
smatch warnings:
drivers/scsi/mpi3mr/mpi3mr_transport.c:1449 mpi3mr_expander_add() warn:
returning -1 instead of -ENOMEM is sloppy
No functional change.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202303202027.ZeDQE5Ug-lkp@intel.com/
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20230419064256.2532069-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Fri, 28 Apr 2023 15:37:12 +0000 (10:37 -0500)]
scsi: smartpqi: Update version to 2.1.22-040
Reviewed-by: Gerry Morong <gerry.morong@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-13-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Fri, 28 Apr 2023 15:37:11 +0000 (10:37 -0500)]
scsi: smartpqi: Update copyright to 2023
Update copyright to current year.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-12-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Fri, 28 Apr 2023 15:37:10 +0000 (10:37 -0500)]
scsi: smartpqi: Add sysfs entry for NUMA node in /sys/block/sdX/device
Although NUMA node is a PCIe device level attribute, it was requested the
NUMA node be added for each exposed device similar to NVMe disks.
Example for NVMe:
/sys/block/nvme1c1n1/device/numa_node
Example for smartpqi:
/sys/block/sdh/device/numa_node
cat /sys/block/sdh/device/numa_node
0
Reviewed-by: David Strahan <david.strahan@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-11-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Fri, 28 Apr 2023 15:37:09 +0000 (10:37 -0500)]
scsi: smartpqi: Stop sending driver-initiated TURs
Stop sending driver-initiated TURs to physical devices during driver
load/rescan.
Note: This does not affect SML initiated TURs.
Some Linux kernels can cause lengthy delays in OS boot if the kernel
detects that a drive is being sanitized/erased. We were using TURs to
detect if a sanitize/erase was in progress.
Some devices do not return the TUR in a timely manner, causing driver
load/rescan stalls.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-10-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Fri, 28 Apr 2023 15:37:08 +0000 (10:37 -0500)]
scsi: smartpqi: Fix byte aligned writew for ARM servers
Correct OOPs on ARM servers during driver init.
The driver attempts to update FW with max_feature_supported value using a
writew() kernel call using a byte aligned address. This fails on some ARM
systems.
Change the writew() to two writeb() calls to update this value.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-9-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Gilbert Wu [Fri, 28 Apr 2023 15:37:07 +0000 (10:37 -0500)]
scsi: smartpqi: Add support for RAID NCQ priority
Enable NCQ priority feature for the RAID path when AIO path is disabled.
Move function pqi_is_io_high_priority() up to avoid adding a prototype.
Remove unused argument ctrl_info.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Gilbert Wu <Gilbert.Wu@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-8-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Murthy Bhat [Fri, 28 Apr 2023 15:37:06 +0000 (10:37 -0500)]
scsi: smartpqi: Validate block layer host tag
Prevent OS crashes when a drive is hot removed during I/O stress test.
The I/O request pointer can be invalid if block layer provides incorrect
multi-queue host tag. This can lead to invalid I/O request pointer
dereference.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-7-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mike McGowen [Fri, 28 Apr 2023 15:37:05 +0000 (10:37 -0500)]
scsi: smartpqi: Remove contention for raid_bypass_cnt
Reduce CPU contention when incrementing variable raid_bypass_cnt.
Remove the atomic operations for this variable by changing the atomic to an
unsigned int and replace atomic operations with standard operations. The
value is only checked that it is increasing and accuracy is not required.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-6-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Fri, 28 Apr 2023 15:37:04 +0000 (10:37 -0500)]
scsi: smartpqi: Fix rare SAS transport memory leak
Free rphy when pqi_sas_port_add_rphy() returns an error.
If pqi_sas_port_add_rphy() returns an error, the 'rphy' allocated in
sas_end_device_alloc() needs to be freed.
It should be noted that no issues were ever reported.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Suggested-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-5-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kevin Barnett [Fri, 28 Apr 2023 15:37:03 +0000 (10:37 -0500)]
scsi: smartpqi: Remove NULL pointer check
Remove an unnecessary check for a NULL pointer. This unnecessary check was
flagged by Coverity.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-4-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
David Strahan [Fri, 28 Apr 2023 15:37:02 +0000 (10:37 -0500)]
scsi: smartpqi: Add new controller PCI IDs
All PCI ID entries in Hex.
Add PCI IDs for ZTE controllers:
VID / DID / SVID / SDID
---- ---- ---- ----
ZTE SmartROC3200 RS344-16i 4G 9005 / 028f / 1cf2 / 0804
ZTE SmartROC3200 RS345-16i 8G 9005 / 028f / 1cf2 / 0805
ZTE SmartIOC2200 RS346-16i 9005 / 028f / 1cf2 / 0806
ZTE SmartROC3200 RM344-16i 4G 9005 / 028f / 1cf2 / 54da
ZTE SmartROC3200 RM345-16i 8G 9005 / 028f / 1cf2 / 54db
ZTE SmartIOC2200 RM346-16i 9005 / 028f / 1cf2 / 54dc
Add PCI IDs for ByteDance controllers:
VID / DID / SVID / SDID
---- ---- ---- ----
ByteHBA JGH43014-8 9005 / 028f / 1e93 / 1005
Add PCI IDs for IBM controllers:
VID / DID / SVID / SDID
---- ---- ---- ----
IBM 4-Port 24G SAS 9005 / 028f / 1014 / 0718
Add PCI IDs for Cloudnine controllers:
VID / DID / SVID / SDID
---- ---- ---- ----
SmartHBA P6600-8i 9005 / 028f / 1f51 / 1001
SmartRAID P7604-8i 9005 / 028f / 1f51 / 1002
SmartHBA P6600-8e 9005 / 028f / 1f51 / 1003
SmartRAID P7604-8e 9005 / 028f / 1f51 / 1004
SmartHBA P6600-16i 9005 / 028f / 1f51 / 1005
SmartRAID P7608-16i 9005 / 028f / 1f51 / 1006
SmartHBA P6600-8i8e 9005 / 028f / 1f51 / 1007
SmartRAID P7608-8i8e 9005 / 028f / 1f51 / 1008
SmartHBA P6600-16e 9005 / 028f / 1f51 / 1009
SmartRAID P7608-16e 9005 / 028f / 1f51 / 100a
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: David Strahan <David.Strahan@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-3-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mike McGowen [Fri, 28 Apr 2023 15:37:01 +0000 (10:37 -0500)]
scsi: smartpqi: Map full length of PCI BAR 0
Map full length of PCI BAR 0 at driver init.
During driver initialization, the driver must make a kernel call to map the
controller registers into kernel address space. A parameter to this call
is the length of the memory to be mapped. The driver was specifying the
wrong length.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-2-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nilesh Javali [Fri, 28 Apr 2023 07:53:39 +0000 (00:53 -0700)]
scsi: qla2xxx: Update version to 10.02.08.300-k
Update version to 10.02.08.300-k.
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Fri, 28 Apr 2023 07:53:38 +0000 (00:53 -0700)]
scsi: qla2xxx: Wait for io return on terminate rport
System crash due to use after free.
Current code allows terminate_rport_io to exit before making
sure all IOs has returned. For FCP-2 device, IO's can hang
on in HW because driver has not tear down the session in FW at
first sign of cable pull. When dev_loss_tmo timer pops,
terminate_rport_io is called and upper layer is about to
free various resources. Terminate_rport_io trigger qla to do
the final cleanup, but the cleanup might not be fast enough where it
leave qla still holding on to the same resource.
Wait for IO's to return to upper layer before resources are freed.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Fri, 28 Apr 2023 07:53:37 +0000 (00:53 -0700)]
scsi: qla2xxx: Fix mem access after free
System crash, where driver is accessing scsi layer's
memory (scsi_cmnd->device->host) to search for a well known internal
pointer (vha). The scsi_cmnd was released back to upper layer which
could be freed, but the driver is still accessing it.
7 [
ffffa8e8d2c3f8d0] page_fault at
ffffffff86c010fe
[exception RIP: __qla2x00_eh_wait_for_pending_commands+240]
RIP:
ffffffffc0642350 RSP:
ffffa8e8d2c3f988 RFLAGS:
00010286
RAX:
0000000000000165 RBX:
0000000000000002 RCX:
00000000000036d8
RDX:
0000000000000000 RSI:
ffff9c5c56535188 RDI:
0000000000000286
RBP:
ffff9c5bf7aa4a58 R8:
ffff9c589aecdb70 R9:
00000000000003d1
R10:
0000000000000001 R11:
0000000000380000 R12:
ffff9c5c5392bc78
R13:
ffff9c57044ff5c0 R14:
ffff9c56b5a3aa00 R15:
00000000000006db
ORIG_RAX:
ffffffffffffffff CS: 0010 SS: 0018
8 [
ffffa8e8d2c3f9c8] qla2x00_eh_wait_for_pending_commands at
ffffffffc0646dd5 [qla2xxx]
9 [
ffffa8e8d2c3fa00] __qla2x00_async_tm_cmd at
ffffffffc0658094 [qla2xxx]
Remove access of freed memory. Currently the driver was checking to see if
scsi_done was called by seeing if the sp->type has changed. Instead,
check to see if the command has left the oustanding_cmds[] array as
sign of scsi_done was called.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Fri, 28 Apr 2023 07:53:36 +0000 (00:53 -0700)]
scsi: qla2xxx: Fix hang in task management
Task management command hangs where a side
band chip reset failed to nudge the TMF
from it's current send path.
Add additional error check to block TMF
from entering during chip reset and along
the TMF path to cause it to bail out, skip
over abort of marker.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Fri, 28 Apr 2023 07:53:35 +0000 (00:53 -0700)]
scsi: qla2xxx: Fix task management cmd fail due to unavailable resource
Task management command failed with status 2Ch which is
a result of too many task management commands sent
to the same target. Hence limit task management commands
to 8 per target.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304271952.NKNmoFzv-lkp@intel.com/
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Fri, 28 Apr 2023 07:53:34 +0000 (00:53 -0700)]
scsi: qla2xxx: Fix task management cmd failure
Task management cmd failed with status 30h which means
FW is not able to finish processing one task management
before another task management for the same lun.
Hence add wait for completion of marker to space it out.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304271802.uCZfwQC1-lkp@intel.com/
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com <mailto:himanshu.madhani@oracle.com>>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Fri, 28 Apr 2023 07:53:33 +0000 (00:53 -0700)]
scsi: qla2xxx: Multi-que support for TMF
Add queue flush for task management command, before
placing it on the wire.
Do IO flush for all Request Q's.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304271702.GpIL391S-lkp@intel.com/
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com <mailto:himanshu.madhani@oracle.com>>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jason Yan [Fri, 21 Apr 2023 09:37:44 +0000 (17:37 +0800)]
scsi: libsas: factor out sas_check_fanout_expander_topo()
To be consistent with sas_check_edge_expander_topo(), factor out
sas_check_fanout_expander_topo(). And remove the comment since we are not
spilling over 80 colums now.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Link: https://lore.kernel.org/r/20230421093744.1583609-4-yanaijie@huawei.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jason Yan [Fri, 21 Apr 2023 09:37:43 +0000 (17:37 +0800)]
scsi: libsas: Remove an empty branch in sas_check_parent_topology()
There is an empty "all good" branch in sas_check_parent_topology(). We can
reverse the test statement and remove the empty branch.
Moreover, factor out a helper sas_check_edge_expander_topo() to make the
code more readable.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Link: https://lore.kernel.org/r/20230421093744.1583609-3-yanaijie@huawei.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jason Yan [Fri, 21 Apr 2023 09:37:42 +0000 (17:37 +0800)]
scsi: libsas: Simplify sas_check_eeds()
In sas_check_eeds() there is an empty branch. We can reverse the test
expression and then remove the empty branch. Also the test expression is a
little bit complex so it deserves an individual function. And make the
continuing prototype lines indented after the opening parenthesis to follow
the standard coding style.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Link: https://lore.kernel.org/r/20230421093744.1583609-2-yanaijie@huawei.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Mon, 17 Apr 2023 19:15:58 +0000 (12:15 -0700)]
scsi: lpfc: Update lpfc version to 14.2.0.12
Update lpfc version to 14.2.0.12.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Mon, 17 Apr 2023 19:15:57 +0000 (12:15 -0700)]
scsi: lpfc: Replace blk_irq_poll intr handler with threaded IRQ
It has been determined that the threaded IRQ API accomplishes effectively
the same performance metrics as blk_irq_poll. As blk_irq_poll is mostly
scheduled by the softirqd and handled in softirq context, this is not
entirely desired from a Fibre Channel driver context. A threaded IRQ model
fits cleaner. This patch replaces the blk_irq_poll logic with threaded
IRQ.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Mon, 17 Apr 2023 19:15:56 +0000 (12:15 -0700)]
scsi: lpfc: Add new RCQE status for handling DMA failures
A new RCQE status value indicating DMA failure when transferring
asynchronously received data to an RQE is introduced. Such errors are
unexpected and handlers are updated to log KERN_ERR and dump lpfc's debug
trace buffer to kmsg.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Mon, 17 Apr 2023 19:15:55 +0000 (12:15 -0700)]
scsi: lpfc: Update congestion warning notification period
The CMF_SYNC_WQE command is updated to use an 8-bit field sync period. All
related variables used to calculate congestion warning notifications are
updated to 8-bit fields accordingly.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Mon, 17 Apr 2023 19:15:54 +0000 (12:15 -0700)]
scsi: lpfc: Match lock ordering of lpfc_cmd->buf_lock and hbalock for abort paths
The SCSI version of the abort handler routine, lpfc_abort_handler(), takes
the lpfc_cmd->buf_lock and then phba->hbalock.
Make the same change for the NVMe abort path, lpfc_nvme_fcp_abort(), to
have consistent lock ordering logic between the two abort paths.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Mon, 17 Apr 2023 19:15:53 +0000 (12:15 -0700)]
scsi: lpfc: Fix double free in lpfc_cmpl_els_logo_acc() caused by lpfc_nlp_not_used()
Smatch detected a double free path because lpfc_nlp_not_used() releases an
ndlp object before reaching lpfc_nlp_put() at the end of
lpfc_cmpl_els_logo_acc().
Remove the outdated lpfc_nlp_not_used() routine. In
lpfc_mbx_cmpl_ns_reg_login(), replace the call with lpfc_nlp_put(). In
lpfc_cmpl_els_logo_acc(), replace the call with lpfc_unreg_rpi() and keep
the lpfc_nlp_put() at the end of the routine. If ndlp's rpi was
registered, then lpfc_unreg_rpi()'s completion routine performs the final
ndlp clean up after lpfc_nlp_put() is called from lpfc_cmpl_els_logo_acc().
Otherwise if ndlp has no rpi registered, the lpfc_nlp_put() at the end of
lpfc_cmpl_els_logo_acc() is the final ndlp clean up.
Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking")
Cc: <stable@vger.kernel.org> # v5.11+
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/Y3OefhyyJNKH%2Fiaf@kili/
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee [Mon, 17 Apr 2023 19:15:52 +0000 (12:15 -0700)]
scsi: lpfc: Fix verbose logging for SCSI commands issued to SES devices
For SES LUNs with scsi_device sector_size member set to zero, there is no
point to log an LBA. When verbose FCP driver logging is enabled, sanity
check sector_size before calling scsi_get_lba() on a scsi_cmnd.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Changyuan Lyu [Wed, 19 Apr 2023 17:55:02 +0000 (17:55 +0000)]
scsi: pm80xx: Add GET_NVMD timeout during probe
Add a wait timeout to prevent the kernel from waiting for the GET_NVMD
response forever during probe. Add a check for the controller state before
issuing GET_NVMD request.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230419175502.919999-1-pranavpp@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Changyuan Lyu [Tue, 18 Apr 2023 19:01:01 +0000 (19:01 +0000)]
scsi: pm80xx: Update PHY state after hard reset
Update phy_attached, phy_state, and port_state to correct values after a
hard rest. Without this patch, after a successful hard reset, phy_attached
is still 0, as a result, any following hard reset will cause a PHY START to
be issued first.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230418190101.696345-7-pranavpp@google.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Akshat Jain [Tue, 18 Apr 2023 19:01:00 +0000 (19:01 +0000)]
scsi: pm80xx: Log port state during HW event
Log port state during PHY_DOWN event to understand reasoning for PHY_DOWNs.
Signed-off-by: Akshat Jain <akshatzen@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230418190101.696345-6-pranavpp@google.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Akshat Jain [Tue, 18 Apr 2023 19:00:59 +0000 (19:00 +0000)]
scsi: pm80xx: Log phy_id and port_id in the device registration request
Print phy_id and port_id sent as part of device registration request.
Signed-off-by: Akshat Jain <akshatzen@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230418190101.696345-5-pranavpp@google.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Akshat Jain [Tue, 18 Apr 2023 19:00:58 +0000 (19:00 +0000)]
scsi: pm80xx: Print port_id in HW events
Log port_id and phy_id along with the PHY_UP event.
Signed-off-by: Akshat Jain <akshatzen@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230418190101.696345-4-pranavpp@google.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Akshat Jain [Tue, 18 Apr 2023 19:00:57 +0000 (19:00 +0000)]
scsi: pm80xx: Enable init logging
Enable init logging to debug drive discovery issues.
Signed-off-by: Akshat Jain <akshatzen@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230418190101.696345-3-pranavpp@google.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Akshat Jain [Tue, 18 Apr 2023 19:00:56 +0000 (19:00 +0000)]
scsi: pm80xx: Log some HW events by default
Log the following hw_event logs under EVENT log severity to help debug disk
issues:
HW_EVENT_LINK_ERR_INVALID_DWORD
HW_EVENT_LINK_ERR_DISPARITY_ERROR
HW_EVENT_LINK_ERR_CODE_VIOLATION
HW_EVENT_LINK_ERR_LOSS_OF_DWORD_SYNCH
HW_EVENT_LINK_ERR_PHY_RESET_FAILED
HW_EVENT_INBOUND_CRC_ERROR
HW_EVENT_PHY_ERROR
HW_EVENT_SAS_PHY_UP
HW_EVENT_SATA_PHY_UP
HW_EVENT_SATA_SPINUP_HOLD
HW_EVENT_PHY_DOWN
HW_EVENT_PORT_INVALID
HW_EVENT_MALFUNCTION
HW_EVENT_PORT_RESET_TIMER_TMO
HW_EVENT_PORT_RECOVERY_TIMER_TMO
HW_EVENT_HARD_RESET_RECEIVED
HW_EVENT_ID_FRAME_TIMEOUT
HW_EVENT_PORT_RECOVER
Signed-off-by: Akshat Jain <akshatzen@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230418190101.696345-2-pranavpp@google.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Linus Torvalds [Sun, 7 May 2023 20:34:35 +0000 (13:34 -0700)]
Linux 6.4-rc1
Linus Torvalds [Sun, 7 May 2023 18:32:18 +0000 (11:32 -0700)]
Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git./linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"Third version of perf tool updates, with the build problems with with
using a 'vmlinux.h' generated from the main build fixed, and the bpf
skeleton build disabled by default.
Build:
- Require libtraceevent to build, one can disable it using
NO_LIBTRACEEVENT=1.
It is required for tools like 'perf sched', 'perf kvm', 'perf
trace', etc.
libtraceevent is available in most distros so installing
'libtraceevent-devel' should be a one-time event to continue
building perf as usual.
Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
sufficient for lots of users not interested in those libtraceevent
dependent features.
- Allow Python support in 'perf script' when libtraceevent isn't
linked, as not all features requires it, for instance Intel PT does
not use tracepoints.
- Error if the python interpreter needed for jevents to work isn't
available and NO_JEVENTS=1 isn't set, preventing a build without
support for JSON vendor events, which is a rare but possible
condition. The two check error messages:
$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)
- Make libbpf 1.0 the minimum required when building with out of
tree, distro provided libbpf.
- Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
demangler, add 'perf test' entry for it.
- Make binutils libraries opt in, as distros disable building with it
due to licensing, they were used for C++ demangling, for instance.
- Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
equivalent) isn't installed, we'll just have a build warning:
Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
- Add a feature test for scandirat(), that is not implemented so far
in musl and uclibc, disabling features that need it, such as
scanning for tracepoints in /sys/kernel/tracing/events.
perf BPF filters:
- New feature where BPF can be used to filter samples, for instance:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec
2273949 546850.708501: 5029 cycles:
ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec
2273949 546850.708508: 32409 cycles:
ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec
2273949 546850.708526: 143369 cycles:
ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec
2273949 546850.708600: 372650 cycles:
ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec
2273949 546850.708791: 482953 cycles:
ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true
2273949 546850.709036: 501985 cycles:
ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true
2273949 546850.709292: 503065 cycles:
7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
- In addition to 'period' (PERF_SAMPLE_PERIOD), the other
PERF_SAMPLE_ can be used for filtering, and also some other sample
accessible values, from tools/perf/Documentation/perf-record.txt:
Essentially the BPF filter expression is:
<term> <operator> <value> (("," | "||") <term> <operator> <value>)*
The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops
The <operator> can be one of:
==, !=, >, >=, <, <=, &
The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_locked)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
perf lock contention:
- Show lock type with address.
- Track and show mmap_lock, siglock and per-cpu rq_lock with address.
This is done for mmap_lock by following the current->mm pointer:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us
ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us
ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms
ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us
ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms
ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms
ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us
ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us
ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us
ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms
ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms
ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us
ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us
ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us
ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us
ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us
ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us
ffff8d053ce30c80
- Update default map size to 16384.
- Allocate single letter option -M for --map-nr-entries, as it is
proving being frequently used.
- Fix struct rq lock access for older kernels with BPF's CO-RE
(Compile once, run everywhere).
- Fix problems found with MSAn.
perf report/top:
- Add inline information when using --call-graph=fp or lbr, as was
already done to the --call-graph=dwarf callchain mode.
- Improve the 'srcfile' sort key performance by really using an
optimization introduced in 6.2 for the 'srcline' sort key that
avoids calling addr2line for comparision with each sample.
perf sched:
- Make 'perf sched latency/map/replay' to use "sched:sched_waking"
instead of "sched:sched_waking", consistent with 'perf record'
since
d566a9c2d482 ("perf sched: Prefer sched_waking event when it
exists").
perf ftrace:
- Make system wide the default target for latency subcommand, run the
following command then generate some network traffic and press
control+C:
# perf ftrace latency -T __kfree_skb
^C
DURATION | COUNT | GRAPH |
0 - 1 us | 27 | ############# |
1 - 2 us | 22 | ########### |
2 - 4 us | 8 | #### |
4 - 8 us | 5 | ## |
8 - 16 us | 24 | ############ |
16 - 32 us | 2 | # |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
#
perf top:
- Add --branch-history (LBR: Last Branch Record) option, just like
already available for 'perf record'.
- Fix segfault in thread__comm_len() where thread->comm was being
used outside thread->comm_lock.
perf annotate:
- Allow configuring objdump and addr2line in ~/.perfconfig., so that
you can use alternative binaries, such as llvm's.
perf kvm:
- Add TUI mode for 'perf kvm stat report'.
Reference counting:
- Add reference count checking infrastructure to check for use after
free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
more to come.
To build with it use -DREFCNT_CHECKING=1 in the make command line
to build tools/perf. Documented at:
https://perf.wiki.kernel.org/index.php/Reference_Count_Checking
- The above caught, for instance, fix, present in this series:
- Fix maps use after put in 'perf test "Share thread maps"':
'maps' is copied from leader, but the leader is put on line 79
and then 'maps' is used to read the reference count below - so
a use after put, with the put of maps happening within
thread__put.
Fixed by reversing the order of puts so that the leader is put
last.
- Also several fixes were made to places where reference counts were
not being held.
- Make this one of the tests in 'make -C tools/perf build-test' to
regularly build test it and to make sure no direct access to the
reference counted structs are made, doing that via accessors to
check the validity of the struct pointer.
ARM64:
- Fix 'perf report' segfault when filtering coresight traces by
sparse lists of CPUs.
- Add support for 'simd' as a sort field for 'perf report', to show
ARM's NEON SIMD's predicate flags: "partial" and "empty".
arm64 vendor events:
- Add N1 metrics.
Intel vendor events:
- Add graniterapids, grandridge and sierraforrest events.
- Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
silvermont, skylake, tigerlake and westmereep-dp
- Refresh metrics for alderlake-n, broadwell, broadwellde,
broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
skylakex.
perf stat:
- Implement --topdown using JSON metrics.
- Add TopdownL1 JSON metric as a default if present, but disable it
for now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.
- Use metrics for --smi-cost.
- Update topdown documentation.
Vendor events (JSON) infrastructure:
- Add support for computing and printing metric threshold values. For
instance, here is one found in thesapphirerapids json file:
{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
- Test parsing metric thresholds with the fake PMU in 'perf test
pmu-events'.
- Support for printing metric thresholds in 'perf list'.
- Add --metric-no-threshold option to 'perf stat'.
- Add rand (reverse and) and has_pmem (optane memory) support to
metrics.
- Sort list of input files to avoid depending on the order from
readdir() helping in obtaining reproducible builds.
S/390:
- Add common metrics: - CPI (cycles per instruction), prbstate (ratio
of instructions executed in problem state compared to total number
of instructions), l1mp (Level one instruction and data cache misses
per 100 instructions).
- Add cache metrics for z13, z14, z15 and z16.
- Add metric for TLB and cache.
ARM:
- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
(Memory Tagging Extension) and MOPS (Memory Operations) load/store.
Intel PT hardware tracing:
- Add event type names UINTR (User interrupt delivered) and UIRET
(Exiting from user interrupt routine), documented in table 32-50
"CFE Packet Type and Vector Fields Details" in the Intel Processor
Trace chapter of The Intel SDM Volume 3 version 078.
- Add support for new branch instructions ERETS and ERETU.
- Fix CYC timestamps after standalone CBR
ARM CoreSight hardware tracing:
- Allow user to override timestamp and contextid settings.
- Fix segfault in dso lookup.
- Fix timeless decode mode detection.
- Add separate decode paths for timeless and per-thread modes.
auxtrace:
- Fix address filter entire kernel size.
Miscellaneous:
- Fix use-after-free and unaligned bugs in the PLT handling routines.
- Use zfree() to reduce chances of use after free.
- Add missing 0x prefix for addresses printed in hexadecimal in 'perf
probe'.
- Suppress massive unsupported target platform errors in the unwind
code.
- Fix return incorrect build_id size in elf_read_build_id().
- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .
- Add missing new parameter in kfree_skb tracepoint to the python
scripts using it.
- Add 'perf bench syscall fork' benchmark.
- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
'perf mem'.
- Fix wrong size expectation for perf test 'Setup struct
perf_event_attr' caused by the patch adding
perf_event_attr::config3.
- Fix some spelling mistakes"
* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
Revert "perf build: Warn for BPF skeletons if endian mismatches"
perf metrics: Fix SEGV with --for-each-cgroup
perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
perf stat: Separate bperf from bpf_profiler
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf tracepoint: Fix memory leak in is_valid_tracepoint()
perf cs-etm: Add fix for coresight trace for any range of CPUs
perf build: Fix unescaped # in perf build-test
perf unwind: Suppress massive unsupported target platform errors
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
perf script: Print raw ip instead of binary offset for callchain
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf list: Modify the warning message about scandirat(3)
perf list: Fix memory leaks in print_tracepoint_events()
perf lock contention: Rework offset calculation with BPF CO-RE
perf lock contention: Fix struct rq lock access
perf stat: Disable TopdownL1 on hybrid
perf stat: Avoid SEGV on counter->name
...
Linus Torvalds [Sun, 7 May 2023 18:04:26 +0000 (11:04 -0700)]
Merge tag 'core-debugobjects-2023-05-06' of git://git./linux/kernel/git/tip/tip
Pull debugobjects fix from Thomas Gleixner:
"A single fix for debugobjects:
The recent fix to ensure atomicity of lookup and allocation
inadvertently broke the pool refill mechanism, so that debugobject
OOMs now in certain situations. The reason is that the functions which
got updated no longer invoke debug_objecs_init(), which is now the
only place to care about refilling the tracking object pool.
Restore the original behaviour by adding explicit refill opportunities
to those places"
* tag 'core-debugobjects-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobject: Ensure pool refill (again)
Linus Torvalds [Sun, 7 May 2023 17:57:14 +0000 (10:57 -0700)]
Merge tag 'v6.4-p2' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- A long-standing bug in crypto_engine
- A buggy but harmless check in the sun8i-ss driver
- A regression in the CRYPTO_USER interface
* tag 'v6.4-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: api - Fix CRYPTO_USER checks for report function
crypto: engine - fix crypto_queue backlog handling
crypto: sun8i-ss - Fix a test in sun8i_ss_setup_ivs()
Linus Torvalds [Sun, 7 May 2023 17:46:21 +0000 (10:46 -0700)]
Merge tag '6.4-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"smb3 client fixes, mostly DFS or reconnect related:
- Two DFS connection sharing fixes
- DFS refresh fix
- Reconnect fix
- Two potential use after free fixes
- Also print prefix patch in mount debug msg
- Two small cleanup fixes"
* tag '6.4-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Remove unneeded semicolon
cifs: fix sharing of DFS connections
cifs: avoid potential races when handling multiple dfs tcons
cifs: protect access of TCP_Server_Info::{origin,leaf}_fullpath
cifs: fix potential race when tree connecting ipc
cifs: fix potential use-after-free bugs in TCP_Server_Info::hostname
cifs: print smb3_fs_context::source when mounting
cifs: protect session status check in smb2_reconnect()
SMB3.1.1: correct definition for app_instance_id create contexts
Linus Torvalds [Sun, 7 May 2023 17:31:45 +0000 (10:31 -0700)]
Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A couple more patches that would be good to get into -rc1:
- Revert an i.MX patch that's causing video failures because division
math goes sideways
- Fix a clang + W=1 build isue where FIELD_PREP() is taking a 32-bit
variable instead of the usual u64 type
- Fix a Kconfig bug in the StarFive JH7110 clk config that selects a
reset controller when it can't be selected"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: starfive: Fix RESET_STARFIVE_JH7110 can't be selected in a specified case
clk: sp7021: Adjust width of _m in HWM_FIELD_PREP()
Revert "clk: imx: composite-8m: Add support to determine_rate"
Linus Torvalds [Sun, 7 May 2023 17:17:33 +0000 (10:17 -0700)]
Merge tag 'mailbox-v6.4' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
- mailbox api: allow direct registration to a channel and convert omap
and pcc to use mbox_bind_client
- omap and hi6220 : use of_property_read_bool
- test: fix double-free and use spinlock header
- rockchip and bcm-pdc: drop of_match_ptr
- mpfs: change config symbol
- mediatek gce: support MT6795
- qcom apcs: consolidate of_device_id and support IPQ9574
* tag 'mailbox-v6.4' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
dt-bindings: mailbox: qcom: add compatible for IPQ9574 SoC
mailbox: qcom-apcs-ipc: do not grow the of_device_id
dt-bindings: mailbox: qcom,apcs-kpss-global: use fallbacks for few variants
dt-bindings: mailbox: mediatek,gce-mailbox: Add support for MT6795
mailbox: mpfs: convert SOC_MICROCHIP_POLARFIRE to ARCH_MICROCHIP_POLARFIRE
mailbox: bcm-pdc: drop of_match_ptr for ID table
mailbox: rockchip: drop of_match_ptr for ID table
mailbox: mailbox-test: Fix potential double-free in mbox_test_message_write()
mailbox: mailbox-test: Explicitly include header for spinlock support
mailbox: Use of_property_read_bool() for boolean properties
mailbox: pcc: Use mbox_bind_client
mailbox: omap: Use mbox_bind_client
mailbox: Allow direct registration to a channel
Linus Torvalds [Sun, 7 May 2023 17:00:09 +0000 (10:00 -0700)]
Merge tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
"Nothing major in here, just two different parts:
- A small series from Breno that enables passing the full SQE down
for ->uring_cmd().
This is a prerequisite for enabling full network socket operations.
Queued up a bit late because of some stylistic concerns that got
resolved, would be nice to have this in 6.4-rc1 so the dependent
work will be easier to handle for 6.5.
- Fix for the huge page coalescing, which was a regression introduced
in the 6.3 kernel release (Tobias)"
* tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux:
io_uring: Remove unnecessary BUILD_BUG_ON
io_uring: Pass whole sqe to commands
io_uring: Create a helper to return the SQE size
io_uring/rsrc: check for nonconsecutive pages
Arnaldo Carvalho de Melo [Sat, 6 May 2023 21:07:37 +0000 (18:07 -0300)]
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
This reverts commit
a980755beb5aca9002e1c95ba519b83a44242b5b.
We need to better polish building with BPF skels, so revert back to
making it an experimental feature that has to be explicitely enabled
using BUILD_BPF_SKEL=1.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sat, 6 May 2023 21:06:43 +0000 (18:06 -0300)]
Revert "perf build: Warn for BPF skeletons if endian mismatches"
This reverts commit
51924ae69eea5bc90b5da525fbcf4bbd5f8551b3.
We need to better polish building with BPF skels, so revert back to
making it an experimental feature that has to be explicitely enabled
using BUILD_BPF_SKEL=1.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Sat, 6 May 2023 18:43:08 +0000 (11:43 -0700)]
Merge tag 'mm-stable-2023-05-06-10-49' of git://git./linux/kernel/git/akpm/mm
Pull dmapool updates - again - from Andrew Morton:
"Reinstate the dmapool changes which were accidentally removed by a
mishap on the last commit in the previous attempt at the series"
Fixes: 2d55c16c0c54 ("dmapool: create/destroy cleanup").
[ The whole old series:
def8574308ed..
2d55c16c0c54 results in an empty
diff because that last commit ended up being just a revert of all that
came everything before it. - Linus ]
* tag 'mm-stable-2023-05-06-10-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
dmapool: link blocks across pages
dmapool: don't memset on free twice
dmapool: simplify freeing
dmapool: consolidate page initialization
dmapool: rearrange page alloc failure handling
dmapool: move debug code to own functions
dmapool: speedup DMAPOOL_DEBUG with init_on_alloc
dmapool: cleanup integer types
dmapool: use sysfs_emit() instead of scnprintf()
dmapool: remove checks for dev == NULL
Linus Torvalds [Sat, 6 May 2023 18:25:03 +0000 (11:25 -0700)]
Merge tag 'mm-hotfixes-stable-2023-05-06-10-45' of git://git./linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"Five hotfixes.
Three are cc:stable, two pertain to merge window changes"
* tag 'mm-hotfixes-stable-2023-05-06-10-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
afs: fix the afs_dir_get_folio return value
nilfs2: do not write dirty data after degenerating to read-only
mm: do not reclaim private data from pinned page
nilfs2: fix infinite loop in nilfs_mdt_get_block()
mm/mmap/vma_merge: always check invariants
Keith Busch [Thu, 26 Jan 2023 21:51:24 +0000 (13:51 -0800)]
dmapool: link blocks across pages
The allocated dmapool pages are never freed for the lifetime of the pool.
There is no need for the two level list+stack lookup for finding a free
block since nothing is ever removed from the list. Just use a simple
stack, reducing time complexity to constant.
The implementation inserts the stack linking elements and the dma handle
of the block within itself when freed. This means the smallest possible
dmapool block is increased to at most 16 bytes to accommodate these
fields, but there are no exisiting users requesting a dma pool smaller
than that anyway.
Removing the list has a significant change in performance. Using the
kernel's micro-benchmarking self test:
Before:
# modprobe dmapool_test
dmapool test: size:16 blocks:8192 time:57282
dmapool test: size:64 blocks:8192 time:172562
dmapool test: size:256 blocks:8192 time:789247
dmapool test: size:1024 blocks:2048 time:371823
dmapool test: size:4096 blocks:1024 time:362237
After:
# modprobe dmapool_test
dmapool test: size:16 blocks:8192 time:24997
dmapool test: size:64 blocks:8192 time:26584
dmapool test: size:256 blocks:8192 time:33542
dmapool test: size:1024 blocks:2048 time:9022
dmapool test: size:4096 blocks:1024 time:6045
The module test allocates quite a few blocks that may not accurately
represent how these pools are used in real life. For a more marco level
benchmark, running fio high-depth + high-batched on nvme, this patch shows
submission and completion latency reduced by ~100usec each, 1% IOPs
improvement, and perf record's time spent in dma_pool_alloc/free were
reduced by half.
[kbusch@kernel.org: push new blocks in ascending order]
Link: https://lkml.kernel.org/r/20230221165400.1595247-1-kbusch@meta.com
Link: https://lkml.kernel.org/r/20230126215125.4069751-12-kbusch@meta.com
Fixes: 2d55c16c0c54 ("dmapool: create/destroy cleanup")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Keith Busch [Thu, 26 Jan 2023 21:51:23 +0000 (13:51 -0800)]
dmapool: don't memset on free twice
If debug is enabled, dmapool will poison the range, so no need to clear it
to 0 immediately before writing over it.
Link: https://lkml.kernel.org/r/20230126215125.4069751-11-kbusch@meta.com
Fixes: 2d55c16c0c54 ("dmapool: create/destroy cleanup")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>