review.tizen.org Git - platform/kernel/linux-starfive.git/log

scsi: lpfc: Fix pt2pt state transition causing rmmod hang

On a setup with a dual port HBA and both ports direct connected, an rmmod
hangs momentarily when we log an Illegal State Transition. Once it resumes,
a nodelist not empty logic is hit, which forces rmmod to cleanup and exit.
We're missing a state transition case in the discovery engine.

Fix by adding a case for a DEVICE_RM event while in the unmapped state to
avoid illegal state transition log message.

Link: https://lore.kernel.org/r/20210301171821.3427-17-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix nodeinfo debugfs output

The debugfs nodeinfo output gets jumbled when no rpri or a defer entry is
displayed. The misalignment makes it difficult to read.

Change the format to consistently print out a 4 character rpi, and turn
defer into a suffix.

Link: https://lore.kernel.org/r/20210301171821.3427-16-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix ADISC handling that never frees nodes

While testing target port swap test with ADISC enabled, several nodes
remain in UNUSED state. These nodes are never freed and rmmod hangs for
long time before finising with "0233 Nodelist not empty" error.

During PLOGI completion lpfc_plogi_confirm_nport() looks for existing nodes
with same WWPN. If found, the existing node is used to continue discovery.
The node on which plogi was performed is freed.  When ADISC is enabled, an
ADISC els request is triggered in response to an RSCN.  It's possible that
the ADISC may be rejected by the remote port causing the ADISC completion
handler to clear the port and node name in the node.  If this occurs, if a
PLOGI is received it causes a node lookup based on wwpn to now fail,
causing the port swap logic to kick in which allocates a new node and swaps
to it. This effectively orphans the original node structure.

Fix the situation by detecting when the lookup fails and forgo the node
swap and node allocation by using the node on which the PLOGI was issued.

Link: https://lore.kernel.org/r/20210301171821.3427-15-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix PLOGI ACC to be transmit after REG_LOGIN

The driver is seeing a scenario where PLOGI response was issued and traffic
is arriving while the adapter is still setting up the login context. This
is resulting in errors handling the traffic.

Change the driver so that PLOGI response is sent after the login context
has been setup to avoid the situation.

Link: https://lore.kernel.org/r/20210301171821.3427-14-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix dropped FLOGI during pt2pt discovery recovery

When connected in pt2pt mode, there is a scenario where the remote port
significantly delays sending a response to our FLOGI, but acts on the FLOGI
it sent us and proceeds to PLOGI/PRLI. The FLOGI ends up timing out and
kicks off recovery logic. End result is a lot of unnecessary state changes
and lots of discovery messages being logged.

Fix by terminating the FLOGI and noop'ing its completion if we have already
accepted the remote ports FLOGI and are now processing PLOGI.

Link: https://lore.kernel.org/r/20210301171821.3427-13-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix status returned in lpfc_els_retry() error exit path

An unlikely error exit path from lpfc_els_retry() returns incorrect status
to a caller, erroneously indicating that a retry has been successfully
issued or scheduled.

Change error exit path to indicate no retry.

Link: https://lore.kernel.org/r/20210301171821.3427-12-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix use after free in lpfc_els_free_iocb

There are several code paths where the following sequence occurs:

- An ndlp pointer is assigned to an iocb via a nlp_get()

- An attempt is made to issue the iocb, but it fails

- The failure case does a put on the ndlp then calls lpfc_els_free_iocb()

The put may free the ndlp structure, but the els_free_iocb may reference
the now-stale ndlp pointer and cause a crash.

Fix by ensuring that the lpfc_els_free_iocb() occurs before the
lpfc_nlp_put().

While fixing, refactor the code to better ensure this calling sequence.

Link: https://lore.kernel.org/r/20210301171821.3427-11-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix null pointer dereference in lpfc_prep_els_iocb()

It is possible to call lpfc_issue_els_plogi() passing a did for which no
matching ndlp is found. A call is then made to lpfc_prep_els_iocb() with a
null pointer to a lpfc_nodelist structure resulting in a null pointer
dereference.

Fix by returning an error status if no valid ndlp is found. Fix up comments
regarding ndlp reference counting.

Link: https://lore.kernel.org/r/20210301171821.3427-10-jsmart2021@gmail.com
Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking")
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix unnecessary null check in lpfc_release_scsi_buf

lpfc_fcp_io_cmd_wqe_cmpl() is intended to mirror
lpfc_nvme_io_cmd_wqe_cmpl() for sli4 fcp completions. When the routine was
added, lpfc_fcp_io_cmd_wqe_cmpl() included a null pointer check for
phba. However, phba is definitely valid, being dereferenced by the calling
routine and used later in the routine itself.

Remove the unnecessary null check.

Link: https://lore.kernel.org/r/20210301171821.3427-9-jsmart2021@gmail.com
Fixes: 96e209be6ecb ("scsi: lpfc: Convert SCSI I/O completions to SLI-3 and SLI-4 handlers")
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix pt2pt connection does not recover after LOGO

On a pt2pt setup, between 2 initiators, if one side issues a a LOGO, there
is no relogin attempt. The FC specs are grey in this area on which port
(higher wwn or not) is to re-login.

As there is no spec guidance, unconditionally re-PLOGI after the logout to
ensure a login is re-established.

Link: https://lore.kernel.org/r/20210301171821.3427-8-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix lpfc_els_retry() possible null pointer dereference

Driver crashed in lpfc_debugfs_disc_trc() due to null ndlp pointer. In
some calling cases, the ndlp is null and the did is looked up.

Fix by using the local did variable that is set appropriately based on ndlp
value.

Link: https://lore.kernel.org/r/20210301171821.3427-7-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix FLOGI failure due to accessing a freed node

After an initial successful FLOGI into the switch, if a subsequent FLOGI
fails the driver crashed accessing a node struct. On FLOGI error, the flogi
completion logic triggers the final dereference on the node structure
without checking if it is registered with a backend. The devloss logic is
triggered after node is freed leading to the access of freed node.

Fix by adjusting the error path to not take the final dereferece if there
is an outstanding transport registration. Let the transport devloss call
remove the final reference.

Link: https://lore.kernel.org/r/20210301171821.3427-6-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix stale node accesses on stale RRQ request

Whenever an RRQ needs to be triggered, the DID from the node structure and
node pointer are stored in the RRQ data structure and the RRQ is scheduled
for later transmission. However, at the point in time that the timer
triggers, there's no validation on the node pointer. Reference counters may
have freed the structure. Additionally the DID in the node may no longer be
valid.

Fix by not tracking the node pointer in the RRQ, only the DID. At the time
of the timer expiration, look up the node with the did and if present, send
the RRQ. If no node exists, no need to send the RRQ.

Link: https://lore.kernel.org/r/20210301171821.3427-5-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix reftag generation sizing errors

An LBA is 8 bytes. The driver generates a reftag from the LBA but the
reftag is 4 bytes. Thus scsi_get_lba() could return a value that exceeds
our reftag size.

Fix by converting all the code to calling the common routine
t10_pi_ref_tag() which returns a u32, thus ensuring a consistent 4byte
value. Also correct a few code lines that access LBA directly and ensure
64-bit data types are used.

Link: https://lore.kernel.org/r/20210301171821.3427-4-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix vport indices in lpfc_find_vport_by_vpid()

Calls to lpfc_find_vport_by_vpid() for the highest indexed vport fails with
error, "2936 Could not find Vport mapped to vpi XXX". Our vport indices in
the loop and if-clauses were off by one.

Correct the vpid range used for vpi lookup to include the highest possible
vpid.

Link: https://lore.kernel.org/r/20210301171821.3427-3-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: lpfc: Fix incorrect dbde assignment when building target abts wqe

The wqe_dbde field indicates whether a Data BDE is present in Words 0:2 and
should therefore should be clear in the abts request wqe. By setting the
bit we can be misleading fw into error cases.

Clear the wqe_dbde field.

Link: https://lore.kernel.org/r/20210301171821.3427-2-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: scsi_debug: Fix cmd duration calculation

In some cases, sdebug_defer::cmpl_ts (completion timestamp) wasn't being
properly set when REQ_HIPRI was given. Fix that and improve code to only
call ktime_get_boottime_ns() for commands with REQ_HIPRI set as cmpl_ts is
only used in that case.

Link: https://lore.kernel.org/r/20210304014107.307625-1-dgilbert@interlog.com
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Set shost as hctx driver_data

hctx->driver_data is not set for SCSI currently. Set hctx->driver_data =
shost.

Link: https://lore.kernel.org/r/20210215074048.19424-6-kashyap.desai@broadcom.com
Suggested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: scsi_debug: Add new defer type for mq_poll

Add a new sdeb_defer_type enumeration: SDEB_DEFER_POLL for requests that
have REQ_HIPRI set in cmd_flags field. It is expected that these requests
will be polled via the mq_poll entry point which is driven by calls to
blk_poll() in the block layer. Therefore timer events are not 'wired up' in
the normal fashion.

There are still cases with short delays (e.g. < 10 microseconds) where by
the time the command response processing occurs, the delay is already
exceeded in which case the code calls scsi_done() directly. In such cases
there is no window for mq_poll() to be called.

Add 'mq_polls' counter that increments on each scsi_done() called via the
mq_poll entry point. Can be used to show (with 'cat
/proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing completions
rather than some other mechanism.

Link: https://lore.kernel.org/r/20210215074048.19424-5-kashyap.desai@broadcom.com
Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: scsi_debug: mq_poll support

Add support of the mq_poll interface to scsi_debug. This feature
requires shared host tag support in kernel and driver.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210215074048.19424-4-kashyap.desai@broadcom.com
Cc: dgilbert@interlog.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: mq_poll support

Implement mq_poll interface support in megaraid_sas. This feature
requires shared host tag support in kernel and driver.

The driver can work in non-IRQ mode which means there will not be any MSI-x
vector associated for poll_queues. The MegaRAID hardware has a single
submission queue and multiple reply queues. However, using the shared host
tagset support will enable the driver to simulate multiple hardware queues.

Change driver to allocate some extra reply queues which will be marked as
poll_queues. These poll_queues will not have associated MSI-x vectors. All
I/O completions on these queues will be done through the IOPOLL interface.

megaraid_sas with 8 poll_queues and using the io_uring hiprio=1 setting can
reach 3.2M IOPS with zero interrupts generated by the hardware.

The IOPOLL feature can be enabled using module parameter poll_queues.

Link: https://lore.kernel.org/r/20210215074048.19424-3-kashyap.desai@broadcom.com
Cc: sumit.saxena@broadcom.com
Cc: chandrakanth.patil@broadcom.com
Cc: linux-block@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Add mq_poll support to SCSI layer

Currently IOPOLL support is only available in block layer. This patch
adds mq_poll support to the SCSI layer.

Link: https://lore.kernel.org/r/20210215074048.19424-2-kashyap.desai@broadcom.com
Cc: sumit.saxena@broadcom.com
Cc: chandrakanth.patil@broadcom.com
Cc: linux-block@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Make completion affinity configurable

It may not always be best to complete the IO on same CPU as it was
submitted on. This commit allows userspace to configure it.

This has been useful for vhost-scsi where we have a single thread for
submissions and completions. If we force the completion on the submission
CPU we may be adding conflicts with what the user has setup in the lower
levels with settings like the block layer rq_affinity or the driver's IRQ
or softirq (the network's rps_cpus value) settings.

We may also want to set it up where the vhost thread runs on CPU N and does
its submissions/completions there, and then have LIO do its completion
booking on CPU M, but can't configure the lower levels due to issues like
using dm-multipath with lots of paths (the path selector can throw commands
all over the system because it's only taking into account latency/throughput
at its level).

The new setting is in:

    /sys/kernel/config/target/$fabric/$target/param/cmd_completion_affinity

Writing:

    -1 -> Gives the current default behavior of completing on the
          submission CPU.

    -2 -> Completes the cmd on the CPU the lower layers sent it to us from.

   > 0 -> Completes on the CPU userspace has specified.

Link: https://lore.kernel.org/r/20210227170006.5077-26-michael.christie@oracle.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Flush submission work during TMR processing

If a cmd is on the submission workqueue then the TMR code will miss it, and
end up returning task not found or success for LUN resets. The fabric
driver might then tell the initiator that the running cmds have been
handled when they are about to run.

This adds a flush when we are processing TMRs to make sure queued cmds do
not run after returning the TMR response.

Link: https://lore.kernel.org/r/20210227170006.5077-25-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: tcmu: Add backend plug/unplug callouts

This patch adds plug/unplug callouts for tcmu, so we can avoid the number
of times we switch to userspace. Using this driver with tcm_loop is a
common config, and dependng on the nr_hw_queues (nr_hw_queues=1 performs
much better) and fio jobs (lower num jobs around 4) this patch can increase
IOPS by only around 5-10% because we hit other issues like the big per tcmu
device mutex.

Link: https://lore.kernel.org/r/20210227170006.5077-24-michael.christie@oracle.com
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: iblock: Add backend plug/unplug callouts

This patch adds plug/unplug callouts for iblock. For an initiator driver
like iSCSI which wants to pass multiple cmds to its xmit thread instead of
one cmd at a time, this increases IOPS by around 10% with vhost-scsi
(combined with the last patches we can see a total 40-50% increase). For
driver combos like tcm_loop and faster drivers like the iSER initiator, we
can still see IOPS increase by 20-30% when tcm_loop's nr_hw_queues setting
is also increased.

Link: https://lore.kernel.org/r/20210227170006.5077-23-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Fix backend plugging

target_core_iblock is plugging and unplugging on every command and this is
causing perf issues for drivers that prefer batched cmds. With recent
patches we can now take multiple cmds from a fabric driver queue and then
pass them down the backend drivers in a batch. This patch adds this support
by adding 2 callouts to the backend for plugging and unplugging the
device. Subsequent commits will add support for iblock and tcmu device
plugging.

Link: https://lore.kernel.org/r/20210227170006.5077-22-michael.christie@oracle.com
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Cleanup cmd flag bits

We have a couple holes in the cmd flags definitions. This cleans up the
definitions to fix that and make it easier to read.

Link: https://lore.kernel.org/r/20210227170006.5077-21-michael.christie@oracle.com
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: tcm_loop: Use LIO wq cmd submission helper

Convert loop to use the LIO wq cmd submission helper.

Link: https://lore.kernel.org/r/20210227170006.5077-20-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: tcm_loop: Use block cmd allocator for se_cmds

Make tcm_loop use the block layer cmd allocator for se_cmds instead of
using the tcm_loop_cmd_cache. In the future when we can use the host tags
for internal requests like TMFs we can completely kill the
tcm_loop_cmd_cache.

Link: https://lore.kernel.org/r/20210227170006.5077-19-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: vhost-scsi: Use LIO wq cmd submission helper

Convert vhost-scsi to use the LIO wq cmd submission helper.

Link: https://lore.kernel.org/r/20210227170006.5077-18-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Add workqueue based cmd submission

loop and vhost/scsi do their target cmd submission from driver
workqueues. This allows them to avoid an issue where the backend may block
waiting for resources like tags/requests, mem/locks, etc and that ends up
blocking their entire submission path and for the case of vhost-scsi both
the submission and completion path.

This patch adds a helper drivers can use to submit from a LIO workqueue.
This code will then be extended in the next patches to fix the plugging of
backend devices.

We are only converting vhost/loop initially, but the workqueue based
submission will work for other drivers and have similar benefits where the
main target loops will not end up blocking one some backend resource.

Link: https://lore.kernel.org/r/20210227170006.5077-17-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Add gfp_t arg to target_cmd_init_cdb()

tcm_loop could be used like a normal block device, so we can't use
GFP_KERNEL and should use GFP_NOIO. This adds a gfp_t arg to
target_cmd_init_cdb() and converts the users. For every driver but loop
GFP_KERNEL is kept.

This will also be useful in subsequent patches where loop needs to do
target_submit_prep() from interrupt context to get a ref to the se_device,
and so it will need to use GFP_ATOMIC.

Link: https://lore.kernel.org/r/20210227170006.5077-16-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Remove target_submit_cmd_map_sgls()

Convert target_submit_cmd() to do its own calls and then remove
target_submit_cmd_map_sgls() since no one uses it.

Link: https://lore.kernel.org/r/20210227170006.5077-15-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: tcm_fc: Convert to new submission API

target_submit_cmd() is now only for simple drivers that do their own sync
during shutdown and do not use target_stop_session().

tcm_fc uses target_stop_session() to sync session shutdown with LIO core,
so we use target_init_cmd(), target_submit_prep(), target_submit(), because
target_init_cmd() will now detect the target_stop_session() call and return
an error.

Link: https://lore.kernel.org/r/20210227170006.5077-14-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: xen-scsiback: Convert to new submission API

target_submit_cmd_map_sgls() is being removed, so convert xen to the new
submission API. This has it use target_init_cmd(), target_submit_prep(), or
target_submit() because we need to have LIO core map sgls which is now done
in target_submit_prep(). target_init_cmd() will never fail for xen because
it does its own sync during session shutdown, so we can remove that code.

Note: xen never calls target_stop_session() so target_submit_cmd_map_sgls()
never failed (in the new API target_init_cmd() handles
target_stop_session() being called when cmds are being submitted). If it
were to have used target_stop_session() and got an error, we would have hit
a refcount bug like xen and usb, because it does:

    if (rc < 0) {
            transport_send_check_condition_and_sense(se_cmd,
                            TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE, 0);
            transport_generic_free_cmd(se_cmd, 0);
    }

transport_send_check_condition_and_sense() calls queue_status which calls
scsiback_cmd_done->target_put_sess_cmd. We do an extra
transport_generic_free_cmd() call above which would have dropped the
refcount to -1 and the refcount code would spit out errors.

Link: https://lore.kernel.org/r/20210227170006.5077-13-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: vhost-scsi: Convert to new submission API

target_submit_cmd_map_sgls() is being removed, so convert vhost-scsi to the
new submission API. This has it use target_init_cmd(),
target_submit_prep(), target_submit() because we need to have LIO core map
sgls which is now done in target_submit_prep(), and in the next patches we
will do the target_submit step from the LIO workqueue.

Note: vhost-scsi never calls target_stop_session() so
target_submit_cmd_map_sgls() never failed (in the new API target_init_cmd()
handles target_stop_session() being called when cmds are being
submitted). If it were to have used target_stop_session() and got an error,
we would have hit a refcount bug like xen and usb, because it does:

if (rc < 0) {
transport_send_check_condition_and_sense(se_cmd,
TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE, 0);
transport_generic_free_cmd(se_cmd, 0);
}

transport_send_check_condition_and_sense() calls queue_status which does
transport_generic_free_cmd(), and then we do an extra
transport_generic_free_cmd() call above which would have dropped the
refcount to -1 and the refcount code would spit out errors.

Link: https://lore.kernel.org/r/20210227170006.5077-12-michael.christie@oracle.com
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: usb: gadget: Convert to new submission API

target_submit_cmd() is now only for simple drivers that do their own sync
during shutdown and do not use target_stop_session(). It will never return
a failure, so we can remove that code from the driver.

Note: Before these patches target_submit_cmd() would never return an error
for usb since it does not use target_stop_session(). If it did then we
would have hit a refcount error here:

    transport_send_check_condition_and_sense(se_cmd,
                             TCM_UNSUPPORTED_SCSI_OPCODE, 1);
    transport_generic_free_cmd(&cmd->se_cmd, 0);

transport_send_check_condition_and_sense() calls queue_status and the
driver can sometimes do transport_generic_free_cmd() from there via
uasp_status_data_cmpl(). In that case, the above
transport_generic_free_cmd() would then hit a refcount error.

So that other use of the above error path in the driver is also probably
wrong, but someone with the hardware needs to fix that.

Link: https://lore.kernel.org/r/20210227170006.5077-11-michael.christie@oracle.com
Cc: Felipe Balbi <balbi@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: sbp_target: Convert to new submission API

target_submit_cmd() is now only for simple drivers that do their own sync
during shutdown and do not use target_stop_session(). It will never return
a failure, so we can remove that code from the driver.

Link: https://lore.kernel.org/r/20210227170006.5077-10-michael.christie@oracle.com
Cc: Chris Boot <bootc@bootc.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: tcm_loop: Convert to new submission API

target_submit_cmd_map_sgls() is being removed, so convert loop to
the new submission API.

Even though loop does its own shutdown sync, this has loop use
target_init_cmd()/target_submit_prep()/target_submit() since it needed to
map sgls and in the next patches it will use the API to use LIO's
workqueue.

Link: https://lore.kernel.org/r/20210227170006.5077-9-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: qla2xxx: Convert to new submission API

target_submit_cmd() is now only for simple drivers that do their
own sync during shutdown and do not use target_stop_session().

tcm_qla2xxx uses target_stop_session() to sync session shutdown with LIO
core, so we use target_init_cmd()/target_submit_prep()/target_submit(),
because target_init_cmd() will detect the target_stop_session() call and
return an error.

Link: https://lore.kernel.org/r/20210227170006.5077-8-michael.christie@oracle.com
Cc: Nilesh Javali <njavali@marvell.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: ibmvscsi_tgt: Convert to new submission API

target_submit_cmd() is now only for simple drivers that do their own sync
during shutdown and do not use target_stop_session(). It will never return
a failure, so we can remove that code from the driver.

Link: https://lore.kernel.org/r/20210227170006.5077-7-michael.christie@oracle.com
Cc: Michael Cyr <mikecyr@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: srpt: Convert to new submission API

target_submit_cmd_map_sgls() is being removed, so convert srpt to the new
submission API.

srpt uses target_stop_session() to sync session shutdown with LIO core, so
we use target_init_cmd()/target_submit_prep()/target_submit(), because
target_init_cmd() will detect the target_stop_session() call and return an
error.

Link: https://lore.kernel.org/r/20210227170006.5077-6-michael.christie@oracle.com
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Break up target_submit_cmd_map_sgls()

This breaks up target_submit_cmd_map_sgls() into 3 helpers:

- target_init_cmd(): Do the basic general setup and get a refcount to the
   session to make sure the caller can execute the cmd.

- target_submit_prep(): Do the mapping, cdb processing and get a ref to
   the LUN.

- target_submit(): Pass the cmd to LIO core for execution.

The above functions must be used by drivers that either:

1. Rely on LIO for session shutdown synchronization by calling
    target_stop_session().

2. Need to map sgls.

When the next patches are applied then simple drivers that do not need the
extra functionality above can use target_submit_cmd() and not worry about
failures being returned and how to handle them, since many drivers were
getting this wrong and would have hit refcount bugs.

Also, by breaking target_submit_cmd_map_sgls() up into these 3 helper
functions, we can allow the later patches to do the init/prep from
interrupt context and then do the submission from a workqueue.

Link: https://lore.kernel.org/r/20210227170006.5077-5-michael.christie@oracle.com
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Nilesh Javali <njavali@marvell.com>
Cc: Michael Cyr <mikecyr@linux.ibm.com>
Cc: Chris Boot <bootc@bootc.net>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Rename transport_init_se_cmd()

Rename transport_init_se_cmd() to __target_init_cmd() to reflect that it is
more of an internal function that drivers should normally not use and
because we are going to add a new init function in the next patches.

Link: https://lore.kernel.org/r/20210227170006.5077-4-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Drop kref_get_unless_zero() in target_get_sess_cmd()

The kref_get_unless_zero() use in target_get_sess_cmd() was added in:

commit 1b4c59b7a1d0 ("target: fix potential race window in
target_sess_cmd_list_waiting()")'

but it does not seem to do anything.

The original patch might have thought we could have added the cmd to the
sess_wait_list and then target_wait_for_sess_cmds could do a put before
target_get_sess_cmd did its get. That wouldn't happen because we do the get
first then grab the sess lock and put it on the list.

It is also not needed now, because the sess_cmd_list does not exist anymore
and we instead wait on the session cmd_count.

The other problem with the commit is that several
target_submit_cmd_map_sgls()/target_submit_cmd() callers do not handle the
error case properly if it were to ever happen. These drivers think they
have their normal refcount on the cmd and in many cases do a
transport_generic_free_cmd() plus target_put_sess_cmd() so they would have
fired off the refcount WARN/BUGs.

This patch just changes the kref_get_unless_zero() to kref_get().

Link: https://lore.kernel.org/r/20210227170006.5077-3-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: target: core: Move t_task_cdb initialization

Prepare to split target_submit_cmd_map_sgls() so the initialization and
submission part can be called at different times. If the init part fails we
can reference the t_task_cdb early in some of the logging and tracing
code. Move it to transport_init_se_cmd() so we don't hit NULL pointer
crashes.

Link: https://lore.kernel.org/r/20210227170006.5077-2-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Replace sdev->device_busy with sbitmap

SCSI currently uses an atomic variable to track queue depth for each
attached device. The queue depth depends on many factors such as transport
type and device implementation. In addition, the SCSI device queue depth is
not a static entity but changes over time as a result of congestion
management.

While blk-mq currently tracks queue depth for each hctx, it can't easily be
changed to accommodate the SCSI per-device requirement.

The current approach of using an atomic variable doesn't scale well when
there are lots of CPU cores and the disk is very fast. IOPS can be
substantially impacted by the atomic in the hot path.

Replace the atomic variable sdev->device_busy with an sbitmap for tracking
the SCSI device queue depth.

It has been observed that IOPS is improved ~30% by this patchset in the
following test:

1) test machine(32 logical CPU cores)
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        2
Model name:          Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz

2) setup scsi_debug:
modprobe scsi_debug virtual_gb=128 max_luns=1 submit_queues=32 delay=0 max_queue=256

3) fio script:
fio --rw=randread --size=128G --direct=1 --ioengine=libaio --iodepth=2048 \
--numjobs=32 --bs=4k --group_reporting=1 --group_reporting=1 --runtime=60 \
--loops=10000 --name=job1 --filename=/dev/sdN

[mkp: fix device_busy reference in mpt3sas]

Link: https://lore.kernel.org/r/20210122023317.687987-14-ming.lei@redhat.com
Link: https://lore.kernel.org/linux-block/20200119071432.18558-6-ming.lei@redhat.com/
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Make sure sdev->queue_depth is <= max(shost->can_queue, 1024)

Limit SCSI device's queue depth to max(host->can_queue, 1024) in
scsi_change_queue_depth(). 1024 is big enough for saturating current fast
SCSI LUN(SSD or RAID volume on multiple SSDs). Also single hardware queue
depth is usually enough for saturating single LUN because per-core
performance is often considered in storage design.

This patch is needed for replacing sdev->device_busy with sbitmap which has
to be pre-allocated with reasonable max depth.

Link: https://lore.kernel.org/r/20210122023317.687987-13-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Add scsi_device_busy() wrapper

Add scsi_device_busy() helper to prepare drivers for tracking device queue
depth via sbitmap_queue.

Link: https://lore.kernel.org/r/20210122023317.687987-12-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: megaraid_sas: Replace sdev_busy with local counter

Use local tracking of per-sdev outstanding command since sdev_busy in SCSI
mid layer is improved for performance reason using sbitmap (earlier it was
atomic variable).

Link: https://lore.kernel.org/r/20210122023317.687987-11-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Put hot fields of scsi_host_template in one cacheline

The following three fields of scsi_host_template are referenced in the SCSI
I/O submission hot path. Put them together in one cacheline:

- cmd_size

- queuecommand

- commit_rqs

Link: https://lore.kernel.org/r/20210122023317.687987-10-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: blk-mq: Return budget token from .get_budget callback

SCSI uses a global atomic variable to track queue depth for each
LUN/request queue.

This doesn't scale well when there are lots of CPU cores and the disk is
very fast. It has been observed that IOPS is affected a lot by tracking
queue depth via sdev->device_busy in the I/O path.

Return budget token from .get_budget callback. The budget token can be
passed to driver so that we can replace the atomic variable with
sbitmap_queue and alleviate the scaling problems that way.

Link: https://lore.kernel.org/r/20210122023317.687987-9-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: blk-mq: Add callbacks for storing & retrieving budget token

Since SCSI is the only driver which requires dispatch budget move the token
from struct request to struct scsi_cmnd.

Link: https://lore.kernel.org/r/20210122023317.687987-8-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sbitmap: Add sbitmap_calculate_shift() helper

Move code for calculating default shift into a public helper which can be
used by SCSI.

Link: https://lore.kernel.org/r/20210122023317.687987-7-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sbitmap: Export sbitmap_weight

SCSI's .device_busy will be converted to sbitmap and sbitmap_weight is
needed. Export the helper.

The only existing user of sbitmap_weight() uses it to find out how many
bits are set and not cleared. Align sbitmap_weight() meaning with this
usage model.

Link: https://lore.kernel.org/r/20210122023317.687987-6-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sbitmap: Move allocation hint into sbitmap

Allocation hint should have belonged to sbitmap. Also, when sbitmap's depth
is high and there is no need to use mulitple wakeup queues, user can
benefit from percpu allocation hint too.

Move allocation hint into sbitmap, then SCSI device queue can benefit from
allocation hint when converting to plain sbitmap.

Convert vhost/scsi.c to use sbitmap allocation with percpu alloc hint. This
is more efficient than the previous approach.

Link: https://lore.kernel.org/r/20210122023317.687987-5-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: virtualization@lists.linux-foundation.org
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sbitmap: Add helpers for updating allocation hint

Add helpers for updating allocation hint so that we can avoid duplicate
code.

Prepare for moving allocation hint into sbitmap.

Link: https://lore.kernel.org/r/20210122023317.687987-4-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sbitmap: Maintain allocation round_robin in sbitmap

Currently the allocation round_robin info is maintained by sbitmap_queue.

However, bit allocation really belongs to sbitmap. Move it there.

Link: https://lore.kernel.org/r/20210122023317.687987-3-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: virtualization@lists.linux-foundation.org
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sbitmap: Remove sbitmap_clear_bit_unlock

No one uses this helper any more, so kill it.

Link: https://lore.kernel.org/r/20210122023317.687987-2-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufs-debugfs: Add user-defined exception event rate limiting

An enabled user-specified exception event that does not clear quickly will
repeatedly cause the handler to run. That could unduly disturb the driver
behaviour being tested or debugged. To prevent that add debugfs file
exception_event_rate_limit_ms. When a exception event happens, it is
disabled, and then after a period of time (default 20ms) the exception
event is enabled again.

Note that if the driver also has that exception event enabled, it will not
be disabled.

Link: https://lore.kernel.org/r/20210209062437.6954-5-adrian.hunter@intel.com
Acked-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufs-debugfs: Add user-defined exception_event_mask

Allow users to enable specific exception events via debugfs.

The bits enabled by the driver ee_drv_ctrl are separated from the bits
enabled by the user ee_usr_ctrl. The control mask ee_mask_ctrl is the
logical-or of those two. A mutex is needed to ensure that the masks match
what was written to the device.

Link: https://lore.kernel.org/r/20210209062437.6954-4-adrian.hunter@intel.com
Acked-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Add exception event definitions

For readability and completeness, add exception event definitions.

Link: https://lore.kernel.org/r/20210209062437.6954-3-adrian.hunter@intel.com
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Add exception event tracepoint

Currently, exception event status can be read from wExceptionEventStatus
attribute (sysfs file attributes/exception_event_status under the UFS host
controller device directory). Polling that attribute to track UFS exception
events is impractical, so add a tracepoint to track exception events for
testing and debugging purposes.

Note, by the time the exception event status is read, the exception event
may have cleared, so the value can be zero - see example below.

Note also, only enabled exception events can be reported. A subsequent
patch adds the ability for users to enable selected exception events via
debugfs.

Example with driver instrumented to enable all exception events:

  # echo 1 > /sys/kernel/debug/tracing/events/ufs/ufshcd_exception_event/enable

  ... do some I/O ...

  # cat /sys/kernel/debug/tracing/trace
  # tracer: nop
  #
  # entries-in-buffer/entries-written: 3/3   #P:5
  #
  #                                _-----=> irqs-off
  #                               / _----=> need-resched
  #                              | / _---=> hardirq/softirq
  #                              || / _--=> preempt-depth
  #                              ||| /     delay
  #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
  #              | |         |   ||||      |         |
       kworker/2:2-173     [002] ....   731.486419: ufshcd_exception_event: 0000:00:12.5: status 0x0
       kworker/2:2-173     [002] ....   732.608918: ufshcd_exception_event: 0000:00:12.5: status 0x4
       kworker/2:2-173     [002] ....   732.609312: ufshcd_exception_event: 0000:00:12.5: status 0x4

Link: https://lore.kernel.org/r/20210209062437.6954-2-adrian.hunter@intel.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Merge tag 'misc-5.12-2021-03-02' of git://git.kernel.dk/linux-block

Pull misc fixes from Jens Axboe:
"Two misc fixes that don't belong in other branches:

   - Fix a regression with ia64 signals, introduced by the
     TIF_NOTIFY_SIGNAL change in 5.11.

   - Fix the current swapfile regression from this merge window"

* tag 'misc-5.12-2021-03-02' of git://git.kernel.dk/linux-block:
  swap: fix swapfile read/write offset
  ia64: don't call handle_signal() unless there's actually a signal queued

swap: fix swapfile read/write offset

We're not factoring in the start of the file for where to write and
read the swapfile, which leads to very unfortunate side effects of
writing where we should not be...

Fixes: 48d15436fde6 ("mm: remove get_swap_bio")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ia64: don't call handle_signal() unless there's actually a signal queued

Sergei and John both reported that ia64 failed to boot in 5.11, and it
was related to signals. Turns out the ia64 signal handling is a bit odd,
it doesn't check the return value of get_signal() for whether there's a
signal to deliver or not. With the introduction of TIF_NOTIFY_SIGNAL,
then task_work could trigger it.

Fix it by only calling handle_signal() if we actually have a real signal
to deliver. This brings it in line with all other archs, too.

Fixes: b269c229b0e8 ("ia64: add support for TIF_NOTIFY_SIGNAL")
Reported-by: Sergei Trofimovich <slyich@gmail.com>
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'kmap-conversion-for-5.12' of git://git./linux/kernel/git/kdave/linux

Pull kmap conversion updates from David Sterba:
"This contains changes regarding kmap API use and eg conversion from
  kmap_atomic to kmap_local_page.

  The API belongs to memory management but to save cross-tree
  dependency headaches we've agreed to take it through the btrfs tree
  because there are some trivial conversions possible, while the rest
  will need some time and getting the easy cases out of the way would be
  convenient.

  The changes can be grouped:

   - function exports, new helpers

   - new VM_BUG_ON for additional verification; it's been discussed if
     it should be VM_BUG_ON or BUG_ON, the former was chosen due to
     performance reasons

   - code replaced by relevant helpers"

[ This is an updated version of a request that originally came in during
  the merge window, but I asked for some updates:

    https://lore.kernel.org/lkml/cover.1614090658.git.dsterba@suse.com/

  which is why this got merge after the merge window closed.  - Linus ]

* 'kmap-conversion-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: use copy_highpage() instead of 2 kmaps()
  btrfs: use memcpy_[to|from]_page() and kmap_local_page()
  mm/highmem: Add VM_BUG_ON() to mem*_page() calls
  mm/highmem: Introduce memcpy_page(), memmove_page(), and memset_page()
  mm/highmem: Convert memcpy_[to|from]_page() to kmap_local_page()
  mm/highmem: Lift memcpy_[to|from]_page to core

Merge tag 'for-5.12-rc1-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"This is the first batch of fixes that usually arrive during the merge
  window code freeze. Regressions and stable material.

  Regressions:

   - fix deadlock in log sync in zoned mode

   - fix bugs in subpage mode still wrongly assuming sectorsize == page
     size

  Fixes:

   - fix missing kunmap of the Q stripe in RAID6

   - block group fixes:
      - fix race between extent freeing/allocation when using bitmaps
      - avoid double put of block group when emptying cluster

   - swapfile fixes:
      - fix swapfile writes vs running scrub
      - fix swapfile activation vs snapshot creation

   - fix stale data exposure after cloning a hole with NO_HOLES enabled

   - remove tree-checker check that does not work in case information
     from other leaves is necessary"

* tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: fix deadlock on log sync
  btrfs: avoid double put of block group when emptying cluster
  btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled
  btrfs: tree-checker: do not error out if extent ref hash doesn't match
  btrfs: fix race between swap file activation and snapshot creation
  btrfs: fix race between writes to swap files and scrub
  btrfs: avoid checking for RO block group twice during nocow writeback
  btrfs: fix race between extent freeing/allocation when using bitmaps
  btrfs: make check_compressed_csum() to be subpage compatible
  btrfs: make btrfs_submit_compressed_read() subpage compatible
  btrfs: fix raid6 qstripe kmap

Linux 5.12-rc1

Merge tag 'ide-5.11-2021-02-28' of git://git.kernel.dk/linux-block

Pull ide fix from Jens Axboe:
"This is a leftover fix from 5.11, where I forgot to ship it your way"

* tag 'ide-5.11-2021-02-28' of git://git.kernel.dk/linux-block:
ide/falconide: Fix module unload

Merge tag 'kbuild-fixes-v5.12' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Fix UNUSED_KSYMS_WHITELIST for Clang LTO

- Make -s builds really silent irrespective of V= option

- Fix build error when SUBLEVEL or PATCHLEVEL is empty

* tag 'kbuild-fixes-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: Fix <linux/version.h> for empty SUBLEVEL or PATCHLEVEL again
  kbuild: make -s option take precedence over V=1
  ia64: remove redundant READELF from arch/ia64/Makefile
  kbuild: do not include include/config/auto.conf from adjust_autoksyms.sh
  kbuild: fix UNUSED_KSYMS_WHITELIST for Clang LTO
  kbuild: lto: add _mcount to list of used symbols

Merge tag 'csky-for-linus-5.12-rc1' of git://github.com/c-sky/csky-linux

Pull arch/csky updates from Guo Ren:
"Features:
   - add new memory layout 2.5G(user):1.5G(kernel)
   - add kmemleak support
   - reconstruct VDSO framework: add VDSO with GENERIC_GETTIMEOFDAY,
     GENERIC_TIME_VSYSCALL, HAVE_GENERIC_VDSO
   - add faulthandler_disabled() check
   - support (fix) swapon
   - add (fix) _PAGE_ACCESSED for default pgprot
   - abort uaccess retries upon fatal signal (from arm)

  Fixes and optimizations:
   - fix perf probe failure
   - fix show_regs doesn't contain regs->usp
   - remove custom asm/atomic.h implementation
   - fix barrier design
   - fix futex SMP implementation
   - fix asm/cmpxchg.h with correct ordering barrier
   - cleanup asm/spinlock.h
   - fix PTE global for 2.5:1.5 virtual memory
   - remove prologue of page fault handler in entry.S
   - fix TLB maintenance synchronization problem
   - add show_tlb for CPU_CK860 debug
   - fix FAULT_FLAG_XXX param for handle_mm_fault
   - fix update_mmu_cache called with user io mapping
   - fix do_page_fault parent irq status
   - fix a size determination in gpr_get()
   - pgtable.h: Coding convention
   - kprobe: Fix code in simulate without 'long'
   - fix pfn_valid error with wrong max_mapnr
   - use free_initmem_default() in free_initmem()
   - fix compile error"

* tag 'csky-for-linus-5.12-rc1' of git://github.com/c-sky/csky-linux: (30 commits)
  csky: Fixup compile error
  csky: use free_initmem_default() in free_initmem()
  csky: Fixup pfn_valid error with wrong max_mapnr
  csky: Add VDSO with GENERIC_GETTIMEOFDAY, GENERIC_TIME_VSYSCALL, HAVE_GENERIC_VDSO
  csky: kprobe: Fixup code in simulate without 'long'
  csky: Fixup swapon
  csky: pgtable.h: Coding convention
  csky: Fixup _PAGE_ACCESSED for default pgprot
  csky: remove unused including <linux/version.h>
  csky: Fix a size determination in gpr_get()
  csky: Reconstruct VDSO framework
  csky: mm: abort uaccess retries upon fatal signal
  csky: Sync riscv mm/fault.c for easy maintenance
  csky: Fixup do_page_fault parent irq status
  csky: Add faulthandler_disabled() check
  csky: Fixup update_mmu_cache called with user io mapping
  csky: Fixup FAULT_FLAG_XXX param for handle_mm_fault
  csky: Add show_tlb for CPU_CK860 debug
  csky: Fix TLB maintenance synchronization problem
  csky: Add kmemleak support
  ...

Merge tag 'riscv-for-linus-5.12-mw1' of git://git./linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:
"A pair of patches that slipped through the cracks:

   - enable CPU hotplug in the defconfigs

   - some cleanups to setup_bootmem

  There's also a single fix for some randconfig build failures:

   - make NUMA depend on SMP"

* tag 'riscv-for-linus-5.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Cleanup setup_bootmem()
  RISC-V: Enable CPU Hotplug in defconfigs
  RISC-V: Make NUMA depend on SMP

Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
"This is a few driver updates (iscsi, mpt3sas) that were still in the
  staging queue when the merge window opened (all committed on or before
  8 Feb) and some small bug fixes which came in during the merge window
  (all committed on 22 Feb)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (30 commits)
  scsi: hpsa: Correct dev cmds outstanding for retried cmds
  scsi: sd: Fix Opal support
  scsi: target: tcmu: Fix memory leak caused by wrong uio usage
  scsi: target: tcmu: Move some functions without code change
  scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
  scsi: aic7xxx: Remove unused function pointer typedef ahc_bus_suspend/resume_t
  scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
  scsi: ufs: Fix a duplicate dev quirk number
  scsi: aic79xx: Fix spelling of version
  scsi: target: core: Prevent underflow for service actions
  scsi: target: core: Add cmd length set before cmd complete
  scsi: iscsi: Drop session lock in iscsi_session_chkready()
  scsi: qla4xxx: Use iscsi_is_session_online()
  scsi: libiscsi: Reset max/exp cmdsn during recovery
  scsi: iscsi_tcp: Fix shost can_queue initialization
  scsi: libiscsi: Add helper to calculate max SCSI cmds per session
  scsi: libiscsi: Fix iSCSI host workq destruction
  scsi: libiscsi: Fix iscsi_task use after free()
  scsi: libiscsi: Drop taskqueuelock
  scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
  ...

Merge tag 'xfs-5.12-merge-6' of git://git./fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:
"The most notable fix here prevents premature reuse of freed metadata
  blocks, and adding the ability to detect accidental nested
  transactions, which are not allowed here.

   - Restore a disused sysctl control knob that was inadvertently
     dropped during the merge window to avoid fstests regressions.

   - Don't speculatively release freed blocks from the busy list until
     we're actually allocating them, which fixes a rare log recovery
     regression.

   - Don't nest transactions when scanning for free space.

   - Add an idiot^Wmaintainer light to detect nested transactions. ;)"

* tag 'xfs-5.12-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: use current->journal_info for detecting transaction recursion
  xfs: don't nest transactions when scanning for eofblocks
  xfs: don't reuse busy extents on extent trim
  xfs: restore speculative_cow_prealloc_lifetime sysctl

Merge tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block

Pull more block updates from Jens Axboe:
"A few stragglers (and one due to me missing it originally), and fixes
  for changes in this merge window mostly. In particular:

   - blktrace cleanups (Chaitanya, Greg)

   - Kill dead blk_pm_* functions (Bart)

   - Fixes for the bio alloc changes (Christoph)

   - Fix for the partition changes (Christoph, Ming)

   - Fix for turning off iopoll with polled IO inflight (Jeffle)

   - nbd disconnect fix (Josef)

   - loop fsync error fix (Mauricio)

   - kyber update depth fix (Yang)

   - max_sectors alignment fix (Mikulas)

   - Add bio_max_segs helper (Matthew)"

* tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block: (21 commits)
  block: Add bio_max_segs
  blktrace: fix documentation for blk_fill_rw()
  block: memory allocations in bounce_clone_bio must not fail
  block: remove the gfp_mask argument to bounce_clone_bio
  block: fix bounce_clone_bio for passthrough bios
  block-crypto-fallback: use a bio_set for splitting bios
  block: fix logging on capacity change
  blk-settings: align max_sectors on "logical_block_size" boundary
  block: reopen the device in blkdev_reread_part
  block: don't skip empty device in in disk_uevent
  blktrace: remove debugfs file dentries from struct blk_trace
  nbd: handle device refs for DESTROY_ON_DISCONNECT properly
  kyber: introduce kyber_depth_updated()
  loop: fix I/O error on fsync() in detached loop devices
  block: fix potential IO hang when turning off io_poll
  block: get rid of the trace rq insert wrapper
  blktrace: fix blk_rq_merge documentation
  blktrace: fix blk_rq_issue documentation
  blktrace: add blk_fill_rwbs documentation comment
  block: remove superfluous param in blk_fill_rwbs()
  ...

kbuild: Fix <linux/version.h> for empty SUBLEVEL or PATCHLEVEL again

Commit 78d3bb4483ba ("kbuild: Fix <linux/version.h> for empty SUBLEVEL
or PATCHLEVEL") fixed the build error for empty SUBLEVEL or PATCHLEVEL
by prepending a zero.

Commit 9b82f13e7ef3 ("kbuild: clamp SUBLEVEL to 255") re-introduced
this issue.

This time, we cannot take the same approach because we have C code:

#define LINUX_VERSION_PATCHLEVEL $(PATCHLEVEL)
#define LINUX_VERSION_SUBLEVEL $(SUBLEVEL)

Replace empty SUBLEVEL/PATCHLEVEL with a zero.

Fixes: 9b82f13e7ef3 ("kbuild: clamp SUBLEVEL to 255")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-and-tested-by: Sasha Levin <sashal@kernel.org>

kbuild: make -s option take precedence over V=1

'make -s' should be really silent. However, 'make -s V=1' prints noisy
log messages from some shell scripts.

Of course, such a combination is odd, but the build system needs to do
the right thing even if a user gives strange input.

If -s is given, KBUILD_VERBOSE should be forced to 0.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

ia64: remove redundant READELF from arch/ia64/Makefile

READELF is defined by the top Makefile.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: do not include include/config/auto.conf from adjust_autoksyms.sh

Commit cd195bc4775a ("kbuild: split adjust_autoksyms.sh in two parts")
split out the code that needs include/config/auto.conf.

This script no longer needs to include include/config/auto.conf.

Fixes: cd195bc4775a ("kbuild: split adjust_autoksyms.sh in two parts")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: fix UNUSED_KSYMS_WHITELIST for Clang LTO

Commit fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
does not work as expected if the .config file has already specified
CONFIG_UNUSED_KSYMS_WHITELIST="my/own/white/list" before enabling
CONFIG_LTO_CLANG.

So, the user-supplied whitelist and LTO-specific white list must be
independent of each other.

I refactored the shell script so CONFIG_MODVERSIONS and CONFIG_CLANG_LTO
handle whitelists in the same way.

Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>

Merge tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block

Pull io_uring thread rewrite from Jens Axboe:
"This converts the io-wq workers to be forked off the tasks in question
  instead of being kernel threads that assume various bits of the
  original task identity.

  This kills > 400 lines of code from io_uring/io-wq, and it's the worst
  part of the code. We've had several bugs in this area, and the worry
  is always that we could be missing some pieces for file types doing
  unusual things (recent /dev/tty example comes to mind, userfaultfd
  reads installing file descriptors is another fun one... - both of
  which need special handling, and I bet it's not the last weird oddity
  we'll find).

  With these identical workers, we can have full confidence that we're
  never missing anything. That, in itself, is a huge win. Outside of
  that, it's also more efficient since we're not wasting space and code
  on tracking state, or switching between different states.

  I'm sure we're going to find little things to patch up after this
  series, but testing has been pretty thorough, from the usual
  regression suite to production. Any issue that may crop up should be
  manageable.

  There's also a nice series of further reductions we can do on top of
  this, but I wanted to get the meat of it out sooner rather than later.
  The general worry here isn't that it's fundamentally broken. Most of
  the little issues we've found over the last week have been related to
  just changes in how thread startup/exit is done, since that's the main
  difference between using kthreads and these kinds of threads. In fact,
  if all goes according to plan, I want to get this into the 5.10 and
  5.11 stable branches as well.

  That said, the changes outside of io_uring/io-wq are:

   - arch setup, simple one-liner to each arch copy_thread()
     implementation.

   - Removal of net and proc restrictions for io_uring, they are no
     longer needed or useful"

* tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits)
  io-wq: remove now unused IO_WQ_BIT_ERROR
  io_uring: fix SQPOLL thread handling over exec
  io-wq: improve manager/worker handling over exec
  io_uring: ensure SQPOLL startup is triggered before error shutdown
  io-wq: make buffered file write hashed work map per-ctx
  io-wq: fix race around io_worker grabbing
  io-wq: fix races around manager/worker creation and task exit
  io_uring: ensure io-wq context is always destroyed for tasks
  arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
  io_uring: cleanup ->user usage
  io-wq: remove nr_process accounting
  io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
  net: remove cmsg restriction from io_uring based send/recvmsg calls
  Revert "proc: don't allow async path resolution of /proc/self components"
  Revert "proc: don't allow async path resolution of /proc/thread-self components"
  io_uring: move SQPOLL thread io-wq forked worker
  io-wq: make io_wq_fork_thread() available to other users
  io-wq: only remove worker from free_list, if it was there
  io_uring: remove io_identity
  io_uring: remove any grabbing of context
  ...

Merge branch 'work.misc' of git://git./linux/kernel/git/viro/vfs

Pull misc vfs updates from Al Viro:
"Assorted stuff pile - no common topic here"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  whack-a-mole: don't open-code iminor/imajor
  9p: fix misuse of sscanf() in v9fs_stat2inode()
  audit_alloc_mark(): don't open-code ERR_CAST()
  fs/inode.c: make inode_init_always() initialize i_ino to 0
  vfs: don't unnecessarily clone write access for writable fds

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Three more bugfixes and one revert. I accidently applied one patch too
  early"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: exynos5: Preserve high speed master code
  Revert "i2c: i2c-qcom-geni: Add shutdown callback for i2c"
  i2c: designware: Get right data length
  i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition

csky: Fixup compile error

: error: C++ style comments are not allowed in ISO C90
// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
^
error: (this will be reported only once per input file)

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>

csky: use free_initmem_default() in free_initmem()

The existing code is essentially
free_initmem_default()->free_reserved_area() without poisoning.

Note that existing code missed to update the managed page count of the
zone.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>

csky: Fixup pfn_valid error with wrong max_mapnr

The max_mapnr is the number of PFNs, not absolute PFN offset.
Using set_max_mapnr API instead of setting the value directly.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>

csky: Add VDSO with GENERIC_GETTIMEOFDAY, GENERIC_TIME_VSYSCALL, HAVE_GENERIC_VDSO

It could help to reduce the latency of the time-related functions
in user space.

We have referenced arm's and riscv's implementation for the patch.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Arnd Bergmann <arnd@arndb.de>

csky: kprobe: Fixup code in simulate without 'long'

The type of 'val' is 'unsigned long' in simulate_blz32, so 'val < 0'
can't be true.

Cast 'val' to 'long' here to determine branch token or not,

Fixup instructions: bnezad32, bhsz32, bhz32, blsz32, blz32

Link: https://lore.kernel.org/linux-csky/CAJF2gTQjKXR9gpo06WAWG1aquiT87mATiMGorXs6ChxOxoe90Q@mail.gmail.com/T/#t
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>

csky: Fixup swapon

Current csky's swappon is broken by wrong swap PTE entry format.
Now redesign the new format for abiv1 & abiv2 and make swappon +
zram work properly on csky machines.

C-SKY PTE has VALID, DIRTY to emulate PRESENT, READ, WRITE, EXEC
attributes. GLOBAL bit is shared by two pages in the same tlb
entry. So we need to keep GLOBAL, VALID, PRESENT zero in swp_pte.

To distinguish PAGE_NONE and swp_pte, we need to use an additional
bit (abiv1 is _PAGE_READ, abiv2 is _PAGE_WRITE).

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>

csky: pgtable.h: Coding convention

C-SKY page table attributes only have 'Dirty' and 'Valid' to
emulate 'PRESENT, READ, WRITE, EXEC, DIRTY, ACCESSED'.

This patch cleanup unnecessary definition.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>

kbuild: lto: add _mcount to list of used symbols

Some randconfig builds fail with undefined references to _mcount
when CONFIG_TRIM_UNUSED_KSYMS is set:

ERROR: modpost: "_mcount" [drivers/tee/optee/optee.ko] undefined!
ERROR: modpost: "_mcount" [drivers/fsi/fsi-occ.ko] undefined!
ERROR: modpost: "_mcount" [drivers/fpga/dfl-pci.ko] undefined!

Since there is already a list of symbols that get generated at link
time, add this one as well.

Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

riscv: Cleanup setup_bootmem()

After the following patches,

  commit de043da0b9e7 ("RISC-V: Fix usage of memblock_enforce_memory_limit")
  commit 1bd14a66ee52 ("RISC-V: Remove any memblock representing unusable memory area")
  commit b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")

some logic is useless, kill the mem_start/start/end and unneeded code.

Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

RISC-V: Enable CPU Hotplug in defconfigs

The CPU hotplug support has been tested on QEMU, Spike, and SiFive
Unleashed so let's enable it by default in RV32 and RV64 defconfigs.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

RISC-V: Make NUMA depend on SMP

In theory these are orthogonal, but in practice all NUMA systems are
SMP. NUMA && !SMP doesn't build, everyone else is coupling them, and I
don't really see any value in supporting that configuration.

Fixes: 4f0e8eef772e ("riscv: Add numa support for riscv64 platform")
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Atish Patra <atishp@atishpatra.org>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

block: Add bio_max_segs

It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the
sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to
be unsigned to make it easier for the users.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'docs-5.12-2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
"A handful of late-arriving documentation fixes, nothing all that
  notable"

* tag 'docs-5.12-2' of git://git.lwn.net/linux:
  docs: proc.rst: fix indentation warning
  Documentation: cgroup-v2: fix path to example BPF program
  docs: powerpc: Fix tables in syscall64-abi.rst
  Documentation: features: refresh feature list
  Documentation: features: remove c6x references
  docs: ABI: testing: ima_policy: Fixed missing bracket
  Fix unaesthetic indentation
  scripts: kernel-doc: fix array element capture in pointer-to-func parsing
  doc: use KCFLAGS instead of EXTRA_CFLAGS to pass flags from command line
  Documentation: proc.rst: add more about the 6 fields in loadavg

Merge tag 'for-linus' of git://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:

- Update for Litex SoC controller to support wider width registers as
   well as reset.

- Refactor SMP code to use device tree to define possible cpus.

- Update build including generating vmlinux.bin

* tag 'for-linus' of git://github.com/openrisc/linux:
  openrisc: Use devicetree to determine present cpus
  drivers/soc/litex: Add restart handler
  openrisc: add arch/openrisc/Kbuild
  drivers/soc/litex: make 'litex_[set|get]_reg()' methods private
  drivers/soc/litex: support 32-bit subregisters, 64-bit CPUs
  drivers/soc/litex: s/LITEX_REG_SIZE/LITEX_SUBREG_ALIGN/g
  drivers/soc/litex: separate MMIO from subregister offset calculation
  drivers/soc/litex: move generic accessors to litex.h
  openrisc: restart: Call common handlers before hanging
  openrisc: Add vmlinux.bin target

Merge tag 's390-5.12-2' of git://git./linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

- Fix physical vs virtual confusion in some basic mm macros and
   routines. Caused by __pa == __va on s390 currently.

- Get rid of on-stack cpu masks.

- Add support for complete CPU counter set extraction.

- Add arch_irq_work_raise implementation.

- virtio-ccw revision and opcode fixes.

* tag 's390-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cpumf: Add support for complete counter set extraction
  virtio/s390: implement virtio-ccw revision 2 correctly
  s390/smp: implement arch_irq_work_raise()
  s390/topology: move cpumasks away from stack
  s390/smp: smp_emergency_stop() - move cpumask away from stack
  s390/smp: __smp_rescan_cpus() - move cpumask away from stack
  s390/smp: consolidate locking for smp_rescan()
  s390/mm: fix phys vs virt confusion in vmem_*() functions family
  s390/mm: fix phys vs virt confusion in pgtable allocation routines
  s390/mm: fix invalid __pa() usage in pfn_pXd() macros
  s390/mm: make pXd_deref() macros return a pointer
  s390/opcodes: rename selhhhr to selfhr