platform/kernel/linux-starfive.git
4 years agofloppy: cleanup: make output_byte() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:44 +0000 (11:40 +0200)]
floppy: cleanup: make output_byte() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-14-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: cleanup: make wait_til_ready() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:43 +0000 (11:40 +0200)]
floppy: cleanup: make wait_til_ready() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-13-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: cleanup: make show_floppy() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:42 +0000 (11:40 +0200)]
floppy: cleanup: make show_floppy() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-12-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: cleanup: make reset_fdc_info() not rely on current_fdc anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:41 +0000 (11:40 +0200)]
floppy: cleanup: make reset_fdc_info() not rely on current_fdc anymore

Now the fdc is passed in argument so that the function does not
use current_fdc anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-11-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: cleanup: make twaddle() not rely on current_{fdc,drive} anymore
Willy Tarreau [Tue, 31 Mar 2020 09:40:40 +0000 (11:40 +0200)]
floppy: cleanup: make twaddle() not rely on current_{fdc,drive} anymore

Now the fdc and drive are passed in argument so that the function does
not use current_fdc nor current_drive anymore.

Link: https://lore.kernel.org/r/20200331094054.24441-10-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: use symbolic register names in the x86 port
Willy Tarreau [Tue, 31 Mar 2020 09:40:39 +0000 (11:40 +0200)]
floppy: use symbolic register names in the x86 port

Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-9-w@1wt.eu
Cc: x86@kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: use symbolic register names in the sparc64 port
Willy Tarreau [Tue, 31 Mar 2020 09:40:38 +0000 (11:40 +0200)]
floppy: use symbolic register names in the sparc64 port

Now by splitting the base address from the register index we can
use the symbolic register names instead of the hard-coded numeric
values.

Link: https://lore.kernel.org/r/20200331094054.24441-8-w@1wt.eu
Cc: "David S. Miller" <davem@davemloft.net>
[willy: fix printk warnings s/%lx/%x/g in sun_82077_fd_{inb,outb}()]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: use symbolic register names in the sparc32 port
Willy Tarreau [Tue, 31 Mar 2020 09:40:37 +0000 (11:40 +0200)]
floppy: use symbolic register names in the sparc32 port

The sparc port used to be forced to rely on numeric register indexes
with their equivalent in comments. Now that they don't depend on the
IO port we can use their symbolic names.

Link: https://lore.kernel.org/r/20200331094054.24441-7-w@1wt.eu
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: use symbolic register names in the powerpc port
Willy Tarreau [Tue, 31 Mar 2020 09:40:36 +0000 (11:40 +0200)]
floppy: use symbolic register names in the powerpc port

Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-6-w@1wt.eu
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: use symbolic register names in the parisc port
Willy Tarreau [Tue, 31 Mar 2020 09:40:35 +0000 (11:40 +0200)]
floppy: use symbolic register names in the parisc port

Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-5-w@1wt.eu
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: use symbolic register names in the m68k port
Willy Tarreau [Tue, 31 Mar 2020 09:40:34 +0000 (11:40 +0200)]
floppy: use symbolic register names in the m68k port

Now we can use FD_STATUS and FD_DATA instead of 4 or 5, let's do
this, and also use STATUS_DMA and STATUS_READY for the status bits.

Link: https://lore.kernel.org/r/20200331094054.24441-4-w@1wt.eu
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: add references to 82077's extra registers
Willy Tarreau [Tue, 31 Mar 2020 09:40:33 +0000 (11:40 +0200)]
floppy: add references to 82077's extra registers

This controller provides extra status registers SRA and SRB as well
as a tape drive register (TDR) and a data rate select register (DSR),
which are referenced in the sparc port, so let's have their symbolic
definitions centralized.

Link: https://lore.kernel.org/r/20200331094054.24441-3-w@1wt.eu
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agofloppy: split the base port from the register in I/O accesses
Willy Tarreau [Tue, 31 Mar 2020 09:40:32 +0000 (11:40 +0200)]
floppy: split the base port from the register in I/O accesses

Currently we have architecture-specific fd_inb() and fd_outb() functions
or macros, taking just a port which is in fact made of a base address and
a register. The base address is FDC-specific and derived from the local or
global "fdc" variable through the FD_IOPORT macro used in the base address
calculation.

This change splits this by explicitly passing the FDC's base address and
the register separately to fd_outb() and fd_inb(). It affects the
following archs:
  - x86, alpha, mips, powerpc, parisc, arm, m68k:
    simple remap of port -> base+reg

  - sparc32: use of reg only, since the base address was already masked
    out and the FDC controller is known from a static struct.

  - sparc64: like x86 for PCI, like sparc32 for 82077

Some archs use inline functions and others macros. This was not
unified in order to minimize the number of changes to review. For the
same reason checkpatch still spews a few warnings about things that
were already there before.

The parisc still uses hard-coded register values and could be cleaned up
by taking the register definitions.

The sparc per-controller inb/outb functions could further be refined
to explicitly take an FDC register instead of a port in argument but it
was not needed yet and may be cleaned later.

Link: https://lore.kernel.org/r/20200331094054.24441-2-w@1wt.eu
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ian Molton <spyro@f2s.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Denis Efremov <efremov@linux.com>
4 years agonvme: define constants for identification values
Keith Busch [Fri, 3 Apr 2020 17:53:46 +0000 (10:53 -0700)]
nvme: define constants for identification values

Improve code readability by defining the specification's constants that
the driver is using when decoding identification payloads.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Bart van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet: align addrfam list to spec
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:48 +0000 (01:56 -0700)]
nvmet: align addrfam list to spec

With reference to the NVMeOF Specification (page 44, Figure 38)
discovery log page entry provides address family field. We do set the
transport type field but the adrfam field is not set when using loop
transport and also it doesn't have support in the nvme-cli. So when
reading discovery log page with a loop transport it leads to confusing
output.

As per the spec for adrfam value 254 is reserved for Intra Host
Transport i.e. loopback), we add a required macro in the protocol
header file, set default port disc addr entry's adrfam to
NVMF_ADDR_FAMILY_MAX, and update nvmet_addr_family configfs array for
show/store attribute.

Without this patch, setting adrfam to (ipv4/ipv6/ib/fc/loop/" ") we get
following output for nvme discover command from nvme-cli which is
confusing.
trtype:  loop
adrfam:  ipv4
trtype:  loop
adrfam:  ipv6
trtype:  loop
adrfam:  infiniband
trtype:  loop
adrfam:  fibre-channel
trtype:  loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
adrfam:  pci            # <----- pci for loop
trtype:  loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = " "
adrfam:  pci            # <----- pci for unrecognized

This patch fixes above output :-
trtype:  loop
adrfam:  ipv4
trtype:  loop
adrfam:  ipv6
trtype:  loop
adrfam:  infiniband
trtype:  loop
adrfam:  fibre-channel
trtype:  loop           # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
adrfam:  loop           # <----- loop for loop
trtype:  loop # ${CFGFS_HOME}/config/nvmet/ports/adrfam = " "
adrfam:  unrecognized   # <----- unrecognized when invalid value

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet: centralize port enable access for configfs
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:47 +0000 (01:56 -0700)]
nvmet: centralize port enable access for configfs

The configfs attributes which are supposed to set when port is disable
such as addr[addrfam|portid|traddr|treq|trsvcid|inline_data_size|trtype]
has repetitive check and generic error message printing.

This patch creates centralize helper to check and print an error
message that also accepts caller as a parameter. This makes error
message easy to parse for the user, removes the duplicate code and
makes it available for futures such scenarios.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet: use type-name map for address treq
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:46 +0000 (01:56 -0700)]
nvmet: use type-name map for address treq

Currently nvmet_addr_treq_[store|show]() uses switch and if else
ladder for address transport requirements to string and reverse
mapping. With addtion of the generic nvmet_type_name_map structure
we can get rid of the switch and if else ladder with string
duplication.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet: use type-name map for ana states
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:45 +0000 (01:56 -0700)]
nvmet: use type-name map for ana states

Now that we have a generic type to name map for configfs, get rid of
the nvmet_ana_state_names structure and replace it with newly added
nvmet_type_name_map.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet: use type-name map for address family
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:44 +0000 (01:56 -0700)]
nvmet: use type-name map for address family

Right now nvmet_addr_adrfam_[store|show]() uses switch and if else
ladder for address family to string and reverse mapping which also
repeats the strings in show and store function.

With addition of generic nvmet_type_name_map structure we can now get rid
of the switch and if else ladder and string duplication.

Also, we add a newline in before found label in nvmet_addr_trtype_store()
which keeps goto label code consistent with
nvmet_allowed_hosts_drop_link(), nvmet_port_subsys_drop_link() and
nvmet_ana_group_ana_state_store().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet: add generic type-name mapping
Chaitanya Kulkarni [Mon, 4 May 2020 08:56:43 +0000 (01:56 -0700)]
nvmet: add generic type-name mapping

This patch adds a new type to name mapping generic structure. It
replaces nvmet_transport_name with new generic mapping structure
nvmet_transport.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-multipath: stop using ->queuedata
Christoph Hellwig [Sun, 29 Mar 2020 17:41:38 +0000 (19:41 +0200)]
nvme-multipath: stop using ->queuedata

nvme-multipath already uses the gendisk private data, not need to
also set up the request_queue queuedata and use it in one place only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-tcp: try to send request in queue_rq context
Sagi Grimberg [Fri, 1 May 2020 21:25:45 +0000 (14:25 -0700)]
nvme-tcp: try to send request in queue_rq context

Today, nvme-tcp automatically schedules a send request
to a workqueue context, which is 1 more than we'd need
in case the socket buffer is wide open.

However, because we have async send activity (as a result
of r2t, or write_space callbacks), we need to synchronize
sends from possibly multiple contexts (ideally all running
on the same cpu though).

Thus, we only try to send directly from queue_rq in cases:
1. the send_list is empty
2. we can send it synchronously (i.e. not from the RX path)
3. we run on the same cpu as the queue->io_cpu to avoid
   contention on the send operation.

Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-tcp: avoid scheduling io_work if we are already polling
Sagi Grimberg [Fri, 1 May 2020 21:25:44 +0000 (14:25 -0700)]
nvme-tcp: avoid scheduling io_work if we are already polling

When the user runs polled I/O, we shouldn't have to trigger
the workqueue to generate the receive work upon the .data_ready
upcall. This prevents a redundant context switch when the
application is already polling for completions.

Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-tcp: use bh_lock in data_ready
Sagi Grimberg [Thu, 30 Apr 2020 20:59:32 +0000 (13:59 -0700)]
nvme-tcp: use bh_lock in data_ready

data_ready may be invoked from send context or from
softirq, so need bh locking for that.

Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-pci: align io queue count with allocted nvme_queue in nvme_probe
Weiping Zhang [Sat, 2 May 2020 07:29:41 +0000 (15:29 +0800)]
nvme-pci: align io queue count with allocted nvme_queue in nvme_probe

Since commit 147b27e4bd08 ("nvme-pci: allocate device queues storage
space at probe"), nvme_alloc_queue does not alloc the nvme queues
itself anymore.

If the write/poll_queues module parameters are changed at runtime to
values larger than the number of allocated queues in nvme_probe,
nvme_alloc_queue will access unallocated memory.

Add a new nr_allocated_queues member to struct nvme_dev to record how
many queues were alloctated in nvme_probe to avoid using more than the
allocated queues after a reset following a change to the
write/poll_queues module parameters.

Also add nr_write_queues and nr_poll_queues members to allow refreshing
the number of write and poll queues based on a change to the module
parameters when resetting the controller.

Fixes: 147b27e4bd08 ("nvme-pci: allocate device queues storage space at probe")
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
[hch: add nvme_max_io_queues, update the commit message]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-pci: remove last_sq_tail
Keith Busch [Mon, 27 Apr 2020 18:54:46 +0000 (11:54 -0700)]
nvme-pci: remove last_sq_tail

The nvme driver does not have enough tags to wrap the queue, and blk-mq
will no longer call commit_rqs() when there are no new submissions to
notify.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-pci: remove volatile cqes
Keith Busch [Tue, 28 Apr 2020 14:21:56 +0000 (07:21 -0700)]
nvme-pci: remove volatile cqes

The completion queue entry is not volatile once the phase is confirmed.
Remove the volatile keywords and check the phase using the appropriate
READ_ONCE() accessor, allowing the compiler to optimize the remaining
completion path.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: flush scan work on passthrough commands
Keith Busch [Wed, 29 Apr 2020 20:31:23 +0000 (05:31 +0900)]
nvme: flush scan work on passthrough commands

If a passthrough command causes the namespace inventory or capabilities
to change, flush the scan work that handles these changes so the driver
synchronizes with the user command's effects before returning the result
to user space.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: clean up error handling in nvme_init_ns_head
Christoph Hellwig [Wed, 22 Apr 2020 07:59:08 +0000 (09:59 +0200)]
nvme: clean up error handling in nvme_init_ns_head

Use a common label for putting the nshead if needed and only convert
nvme status codes for the one case where it actually is needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fc: avoid gcc-10 zero-length-bounds warning
Arnd Bergmann [Thu, 30 Apr 2020 21:30:57 +0000 (23:30 +0200)]
nvme-fc: avoid gcc-10 zero-length-bounds warning

When CONFIG_ARCH_NO_SG_CHAIN is set, op->sgl[0] cannot be dereferenced,
as gcc-10 now points out:

drivers/nvme/host/fc.c: In function 'nvme_fc_init_request':
drivers/nvme/host/fc.c:1774:29: warning: array subscript 0 is outside the bounds of an interior zero-length array 'struct scatterlist[0]' [-Wzero-length-bounds]
 1774 |  op->op.fcp_req.first_sgl = &op->sgl[0];
      |                             ^~~~~~~~~~~
drivers/nvme/host/fc.c:98:21: note: while referencing 'sgl'
   98 |  struct scatterlist sgl[NVME_INLINE_SG_CNT];
      |                     ^~~

I don't know if this is a legitimate warning or a false-positive.
If this is just a false alarm, the warning is easily suppressed
by interpreting the array as a pointer.

Fixes: b1ae1a238900 ("nvme-fc: Avoid preallocating big SGL for data")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet: add ns revalidation support
Anthony Iliopoulos [Sun, 19 Apr 2020 23:48:50 +0000 (16:48 -0700)]
nvmet: add ns revalidation support

Add support for detecting capacity changes on nvmet blockdev and file
backed namespaces. This allows for emulating and testing online resizing
of nvme devices and filesystems on top.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
[chaitanya: Fix comments posted on V1]
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: reuse code a bit more]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: consolodate io settings
Keith Busch [Thu, 9 Apr 2020 16:09:08 +0000 (09:09 -0700)]
nvme: consolodate io settings

The stream parameters indicating optimal io settings were just getting
overwritten later. Rearrange the settings so the streams parameters can
be preserved if provided.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: revalidate namespace stream parameters
Keith Busch [Thu, 9 Apr 2020 16:09:07 +0000 (09:09 -0700)]
nvme: revalidate namespace stream parameters

The stream parameters are based on the currently formatted logical block
size. Recheck these parameters on namespace revalidation so the
registered constraints will be accurate.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: consolidate chunk_sectors settings
Keith Busch [Thu, 9 Apr 2020 16:09:06 +0000 (09:09 -0700)]
nvme: consolidate chunk_sectors settings

Move the quirked chunk_sectors setting to the same location as noiob so
one place registers this setting. And since the noiob value is only used
locally, remove the member from struct nvme_ns.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: revalidate after verifying identifiers
Keith Busch [Thu, 9 Apr 2020 16:09:05 +0000 (09:09 -0700)]
nvme: revalidate after verifying identifiers

If the namespace identifiers have changed, skip updating the disk
information, as that will register parameters from a mismatched
namespace.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-multipath: set bdi capabilities once
Keith Busch [Thu, 9 Apr 2020 16:09:04 +0000 (09:09 -0700)]
nvme-multipath: set bdi capabilities once

The queues' backing device info capabilities don't change with each
namespace revalidation. Set it only when each path's request_queue
is initially added to a multipath queue.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: check namespace head shared property
Keith Busch [Thu, 9 Apr 2020 16:09:02 +0000 (09:09 -0700)]
nvme: check namespace head shared property

Reject a new shared namespace if a duplicate unshared namespace exists.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: always search for namespace head
Keith Busch [Thu, 9 Apr 2020 16:09:01 +0000 (09:09 -0700)]
nvme: always search for namespace head

Even if a namespace reports it is not capable of sharing, search the
subsystem for a matching namespace head. If found, the driver should
reject that namespace since it's coming from an invalid configuration.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: release namespace head reference on error
Keith Busch [Thu, 9 Apr 2020 16:09:00 +0000 (09:09 -0700)]
nvme: release namespace head reference on error

If a namespace identification does not match the subsystem's head for
that NSID, release the reference that was taken when the matching head
was initially found.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: unlink head after removing last namespace
Keith Busch [Thu, 9 Apr 2020 16:08:59 +0000 (09:08 -0700)]
nvme: unlink head after removing last namespace

The driver had been unlinking the namespace head from the subsystem's
list only after the last reference was released, and outside of the
list's subsys->lock protection.

There is no reason to track an empty head, so unlink the entry from the
subsystem's list when the last namespace using that head is removed and
with the mutex lock protecting the list update. The next namespace to
attach reusing the previous NSID will allocate a new head rather than
find the old head with mismatched identifiers.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: remove the magic 1024 constant in nvme_scan_ns_list
Christoph Hellwig [Sat, 4 Apr 2020 08:34:21 +0000 (10:34 +0200)]
nvme: remove the magic 1024 constant in nvme_scan_ns_list

Replace it with a value derived from the identify data and nsid sizes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: avoid an Identify Controller command for each namespace scan
Christoph Hellwig [Sat, 4 Apr 2020 08:31:35 +0000 (10:31 +0200)]
nvme: avoid an Identify Controller command for each namespace scan

The namespace lists are 0-terminated, so we don't really need the NN value
execept for the legacy sequential scan.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: factor out a nvme_ns_remove_by_nsid helper
Christoph Hellwig [Sat, 4 Apr 2020 08:30:32 +0000 (10:30 +0200)]
nvme: factor out a nvme_ns_remove_by_nsid helper

Factor out a piece of deeply indented and logicaly separate code
from nvme_scan_ns_list into a new helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: clean up nvme_scan_work
Christoph Hellwig [Sat, 4 Apr 2020 08:16:03 +0000 (10:16 +0200)]
nvme: clean up nvme_scan_work

Move the check for the supported CNS value into nvme_scan_ns_list, and
limit the life time of the identify controller allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: refine the Qemu Identify CNS quirk
Christoph Hellwig [Sat, 4 Apr 2020 08:11:28 +0000 (10:11 +0200)]
nvme: refine the Qemu Identify CNS quirk

Add a helper to check if we can use Identify CNS values > 1, and refine
the Qemu quirk to not apply to reported versions larger than 1.1, as the
Qemu implementation had been fixed by then.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet-fc: slight cleanup for kbuild test warnings
James Smart [Mon, 6 Apr 2020 23:55:34 +0000 (16:55 -0700)]
nvmet-fc: slight cleanup for kbuild test warnings

The kbuild tst robot flagged the following 3 issues:

Case 1)
>> drivers/nvme/target/fc.c:1201:37: warning: Either the condition
>> '!assoc' is redundant or there is possible null pointer dereference:
>> assoc. [nullPointerRedundantCheck]
>>  struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
                                       ^
>> drivers/nvme/target/fc.c:1853:7: note: Assuming that condition '!assoc'
>> is not redundant
>>   if (!assoc)
         ^
>> drivers/nvme/target/fc.c:1850:37: note: Assignment
>> 'assoc=nvmet_fc_find_target_assoc(tgtport,be64_to_cpu(
>>              rqst->associd.association_id))', assigned value is 0
>>   assoc = nvmet_fc_find_target_assoc(tgtport,
                                       ^
>> drivers/nvme/target/fc.c:1896:31: note: Calling function
>> 'nvmet_fc_delete_target_assoc', 1st argument 'assoc' value is 0
>>  nvmet_fc_delete_target_assoc(assoc);
                                 ^

The tool isn't smart enough to see that line 1854 sets a ret value which
thereafter causes the routine to exit. This occurs before any of the assoc
references, so it is not an issue. There are 2 more reportings of this
same failure.

To quiet the tool - rework the if test that does the exit to also
reference assoc.  No change in logic otherwise.

Case 2)
drivers/nvme/target/fc.c:1202:29: warning: The scope of the variable
'queue' can be reduced. [variableScope]
    struct nvmet_fc_tgt_queue *queue;
                               ^

The tool is requesting the variable be declared within the code block
that utilizes it. Ignoring this report as existing code style is fine.

Case 3)
drivers/nvme/target/fc.c:1137:16: warning: Variable 'needrandom' is
assigned a value that is never used. [unreadVariable]
       needrandom = true;
                  ^

Another parsing issue with the tool. Given that parens were not used
with the list_for_each_entry() check, it inadvertantly thinks the
break exited the outer while loop not the inner for loop.

This is not an error. But, added parens to the inner list_for_each_entry()
to quiet the tool and as it is better coding style.

-- james

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reported-by: kbuild test robot <lkp@intel.com>
CC: kbuild test robot <lkp@intel.com>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet-rdma: use SRQ per completion vector
Max Gurtovoy [Wed, 19 Apr 2017 08:56:57 +0000 (11:56 +0300)]
nvmet-rdma: use SRQ per completion vector

In order to save resource allocation and utilize the completion
locality in a better way (compared to SRQ per device that exist today),
allocate Shared Receive Queues (SRQs) per completion vector. Associate
each created QP/CQ with an appropriate SRQ according to the queue index.
This association will reduce the lock contention in the fast path
(compared to SRQ per device solution) and increase the locality in
memory buffers. Add new module parameter for SRQ size to adjust it
according to the expected load. User should make sure the size is >= 256
to avoid lack of resources. Also reduce the debug level of "last WQE
reached" event that is raised when a QP is using SRQ during destruction
process to relief the log.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: remove unused parameter
Keith Busch [Fri, 3 Apr 2020 16:24:09 +0000 (09:24 -0700)]
nvme: remove unused parameter

nvme_alloc_ns_head() doesn't use the 'struct nvme_id_ns' parameter.
Remove it, and update caller accordingly.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: provide num dword helper
Keith Busch [Fri, 3 Apr 2020 16:24:01 +0000 (09:24 -0700)]
nvme: provide num dword helper

Various nvme commands use a zeroes based number of dwords field. Create
a helper function to convert byte lengths to this format.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: nvmet: Add Send LS Request and Abort LS Request support
James Smart [Tue, 31 Mar 2020 16:50:11 +0000 (09:50 -0700)]
lpfc: nvmet: Add Send LS Request and Abort LS Request support

Now that common helpers exist, add the ability to Send an NVME LS Request
and to Abort an outstanding LS Request to the nvmet side of the driver.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: nvmet: Add support for NVME LS request hosthandle
James Smart [Tue, 31 Mar 2020 16:50:10 +0000 (09:50 -0700)]
lpfc: nvmet: Add support for NVME LS request hosthandle

As the nvmet layer does not have the concept of a remoteport object, which
can be used to identify the entity on the other end of the fabric that is
to receive an LS, the hosthandle was introduced.  The driver passes the
hosthandle, a value representative of the remote port, with a ls request
receive. The LS request will create the association.  The transport will
remember the hosthandle for the association, and if there is a need to
initiate a LS request to the remote port for the association, the
hosthandle will be used. When the driver loses connectivity with the
remote port, it needs to notify the transport that the hosthandle is no
longer valid, allowing the transport to terminate associations related to
the hosthandle.

This patch adds support to the driver for the hosthandle. The driver will
use the ndlp pointer of the remote port for the hosthandle in calls to
nvmet_fc_rcv_ls_req().  The discovery engine is updated to invalidate the
hosthandle whenever connectivity with the remote port is lost.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: nvme: Add Receive LS Request and Send LS Response support to nvme
James Smart [Tue, 31 Mar 2020 16:50:09 +0000 (09:50 -0700)]
lpfc: nvme: Add Receive LS Request and Send LS Response support to nvme

Now that common helpers exist, add the ability to receive NVME LS requests
to the driver. New requests will be delivered to the transport by
nvme_fc_rcv_ls_req().

In order to complete the LS, add support for Send LS Response and send
LS response completion handling to the driver.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: Refactor Send LS Response support
James Smart [Tue, 31 Mar 2020 16:50:08 +0000 (09:50 -0700)]
lpfc: Refactor Send LS Response support

Currently, the ability to send an NVME LS response is limited to the nvmet
(controller/target) side of the driver.  In preparation of both the nvme
and nvmet sides supporting Send LS Response, rework the existing send
ls_rsp and ls_rsp completion routines such that there is common code that
can be used by both sides.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: Refactor Send LS Abort support
James Smart [Tue, 31 Mar 2020 16:50:07 +0000 (09:50 -0700)]
lpfc: Refactor Send LS Abort support

Send LS Abort support is needed when Send LS Request is supported.

Currently, the ability to abort an NVME LS request is limited to the nvme
(host) side of the driver.  In preparation of both the nvme and nvmet sides
supporting Send LS Abort, rework the existing ls_req abort routines such
that there is common code that can be used by both sides.

While refactoring it was seen the logic in the abort routine was incorrect.
It attempted to abort all NVME LS's on the indicated port. As such, the
routine was reworked to abort only the NVME LS request that was specified.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: Refactor Send LS Request support
James Smart [Tue, 31 Mar 2020 16:50:06 +0000 (09:50 -0700)]
lpfc: Refactor Send LS Request support

Currently, the ability to send an NVME LS request is limited to the nvme
(host) side of the driver.  In preparation of both the nvme and nvmet sides
support Send LS Request, rework the existing send ls_req and ls_req
completion routines such that there is common code that can be used by
both sides.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: Refactor NVME LS receive handling
James Smart [Tue, 31 Mar 2020 16:50:05 +0000 (09:50 -0700)]
lpfc: Refactor NVME LS receive handling

In preparation for supporting both intiator mode and target mode
receiving NVME LS's, commonize the existing NVME LS request receive
handling found in the base driver and in the nvmet side.

Using the original lpfc_nvmet_unsol_ls_event() and
lpfc_nvme_unsol_ls_buffer() routines as a templates, commonize the
reception of an NVME LS request. The common routine will validate the LS
request, that it was received from a logged-in node, and allocate a
lpfc_async_xchg_ctx that is used to manage the LS request. The role of
the port is then inspected to determine which handler is to receive the
LS - nvme or nvmet. As such, the nvmet handler is tied back in. A handler
is created in nvme and is stubbed out.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: Commonize lpfc_async_xchg_ctx state and flag definitions
James Smart [Tue, 31 Mar 2020 16:50:04 +0000 (09:50 -0700)]
lpfc: Commonize lpfc_async_xchg_ctx state and flag definitions

The last step of commonization is to remove the 'T' suffix from
state and flag field definitions.  This is minor, but removes the
mental association that it solely applies to nvmet use.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: Refactor nvmet_rcv_ctx to create lpfc_async_xchg_ctx
James Smart [Tue, 31 Mar 2020 16:50:03 +0000 (09:50 -0700)]
lpfc: Refactor nvmet_rcv_ctx to create lpfc_async_xchg_ctx

To support FC-NVME-2 support (actually FC-NVME (rev 1) with Ammendment 1),
both the nvme (host) and nvmet (controller/target) sides will need to be
able to receive LS requests.  Currently, this support is in the nvmet side
only. To prepare for both sides supporting LS receive, rename
lpfc_nvmet_rcv_ctx to lpfc_async_xchg_ctx and commonize the definition.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolpfc: Refactor lpfc nvme headers
James Smart [Tue, 31 Mar 2020 16:50:02 +0000 (09:50 -0700)]
lpfc: Refactor lpfc nvme headers

A lot of files in lpfc include nvme headers, building up relationships that
require a file to change for its headers when there is no other change
necessary. It would be better to localize the nvme headers.

There is also no need for separate nvme (initiator) and nvmet (tgt)
header files.

Refactor the inclusion of nvme headers so that all nvme items are
included by lpfc_nvme.h

Merge lpfc_nvmet.h into lpfc_nvme.h so that there is a single header used
by both the nvme and nvmet sides. This prepares for structure sharing
between the two roles. Prep to add shared function prototypes for upcoming
shared routines.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fcloop: add target to host LS request support
James Smart [Tue, 31 Mar 2020 16:50:01 +0000 (09:50 -0700)]
nvme-fcloop: add target to host LS request support

Add support for performing LS requests from target to host.
Include sending request from targetport, reception into host,
host sending ls rsp.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fcloop: refactor to enable target to host LS
James Smart [Tue, 31 Mar 2020 16:50:00 +0000 (09:50 -0700)]
nvme-fcloop: refactor to enable target to host LS

Currently nvmefc-loop only sends LS's from host to target.
Slightly rework data structures and routine names to reflect this
path. Allows a straight-forward conversion to be used by ls's
from target to host.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet-fc: Add Disconnect Association Xmt support
James Smart [Tue, 31 Mar 2020 16:49:59 +0000 (09:49 -0700)]
nvmet-fc: Add Disconnect Association Xmt support

As part of FC-NVME-2 (and ammendment on FC-NVME), the target is to
send a Disconnect LS after an association is terminated and any
exchanges for the association have been ABTS'd. The target is also
not to send the receipt to any Disconnect Association LS, received
to initiate the association termination or received while the
association is terminating, until the Disconnect LS has been transmit.

Add support for sending Disconnect Association LS after all I/O's
complete (which is after ABTS'd certainly). Utilizes the new LLDD
api to send ls requests.

There is no need to track the Disconnect LS response or to retry
after timeout. All spec requirements will have been met by waiting
for i/o completion to initiate the transmission.

Add support for tracking the reception of Disconnect Association
and defering the response transmission until after the Disconnect
Association LS has been transmit.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet-fc: rename ls_list to ls_rcv_list
James Smart [Tue, 31 Mar 2020 16:49:58 +0000 (09:49 -0700)]
nvmet-fc: rename ls_list to ls_rcv_list

In preparation to add ls request support, rename the current ls_list,
which is RCV LS request only, to ls_rcv_list.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet-fc: track hostport handle for associations
James Smart [Tue, 31 Mar 2020 16:49:57 +0000 (09:49 -0700)]
nvmet-fc: track hostport handle for associations

In preparation for sending LS requests for an association that
terminates, save and track the hosthandle that is part of the
LS's that are received to create associations.

Support consists of:
- Create a hostport structure that will be 1:1 mapped to a
  host port handle. The hostport structure is specific to
  a targetport.
- Whenever an association is created, create a host port for
  the hosthandle the Create Association LS was received from.
  There will be only 1 hostport structure created, with all
  associations that have the same hosthandle sharing the
  hostport structure.
- When the association is terminated, the hostport reference
  will be removed. After the last association for the host
  port is removed, the hostport will be deleted.
- Add support for the new nvmet_fc_invalidate_host() interface.
  In the past, the LLDD didn't notify loss of connectivity to
  host ports - the LLD would simply reject new requests and wait
  for the kato timeout to kill the association. Now, when host
  port connectivity is lost, the LLDD can notify the transport.
  The transport will initiate the termination of all associations
  for that host port. When the last association has been terminated
  and the hosthandle will no longer be referenced, the new
  host_release callback will be made to the lldd.
- For compatibility with prior behavior which didn't report the
  hosthandle:  the LLDD must set hosthandle to NULL. In these
  cases, not LS request will be made, and no host_release callbacks
  will be made either.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet-fc: perform small cleanups on unneeded checks
James Smart [Tue, 31 Mar 2020 16:49:56 +0000 (09:49 -0700)]
nvmet-fc: perform small cleanups on unneeded checks

While code reviewing saw a couple of items that can be cleaned up:
- In nvmet_fc_delete_target_queue(), the routine unlocks, then checks
  and relocks.  Reorganize to avoid the unlock/relock.
- In nvmet_fc_delete_target_queue(), there's a check on the disconnect
  state that is unnecessary as the routine validates the state before
  starting any action.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet-fc: add LS failure messages
James Smart [Tue, 31 Mar 2020 16:49:55 +0000 (09:49 -0700)]
nvmet-fc: add LS failure messages

Add LS reception failure messages

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fc: Add Disconnect Association Rcv support
James Smart [Tue, 31 Mar 2020 16:49:54 +0000 (09:49 -0700)]
nvme-fc: Add Disconnect Association Rcv support

The nvme-fc host transport did not support the reception of a
FC-NVME LS. Reception is necessary to implement full compliance
with FC-NVME-2.

Populate the LS receive handler, and specifically the handling
of a Disconnect Association LS. The response to the LS, if it
matched a controller, must be sent after the aborts for any
I/O on any connection have been sent.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet-fc: Update target for common definitions for LS handling
James Smart [Tue, 31 Mar 2020 16:49:53 +0000 (09:49 -0700)]
nvmet-fc: Update target for common definitions for LS handling

Given that both host and target now generate and receive LS's create
a single table definition for LS names. Each tranport half will have
a local version of the table.

Convert the target side transport to use the new common Create
Association LS validation routine.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fc: Update header and host for common definitions for LS handling
James Smart [Tue, 31 Mar 2020 16:49:52 +0000 (09:49 -0700)]
nvme-fc: Update header and host for common definitions for LS handling

Given that both host and target now generate and receive LS's create
a single table definition for LS names. Each tranport half will have
a local version of the table.

As Create Association LS is issued by both sides, and received by
both sides, create common routines to format the LS and to validate
the LS.

Convert the host side transport to use the new common Create
Association LS formatting routine.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fc: convert assoc_active flag to bit op
James Smart [Tue, 31 Mar 2020 16:49:51 +0000 (09:49 -0700)]
nvme-fc: convert assoc_active flag to bit op

Convert the assoc_active boolean flag to a bitop on the flags field.
The bit ops will provide atomicity.

To make this change, the flags field was converted to a long type,
which also affects the FCCTRL_TERMIO flag.  Both FCCTRL_TERMIO and
now ASSOC_ACTIVE flags are set/cleared by bit operations.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fc: Ensure private pointers are NULL if no data
James Smart [Tue, 31 Mar 2020 16:49:50 +0000 (09:49 -0700)]
nvme-fc: Ensure private pointers are NULL if no data

Ensure that when allocations are done, and the lldd options indicate
no private data is needed, that private pointers will be set to NULL
(catches driver error that forgot to set private data size).

Slightly reorg the allocations so that private data follows allocations
for LS request/response buffers. Ensures better alignments for the buffers
as well as the private pointer.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvmet-fc: Better size LS buffers
James Smart [Tue, 31 Mar 2020 16:49:49 +0000 (09:49 -0700)]
nvmet-fc: Better size LS buffers

Current code uses NVME_FC_MAX_LS_BUFFER_SIZE (2KB) when allocating
buffers for LS requests and responses. This is considerable overkill
for what is actually defined.

Rework code to have unions for all possible requests and responses
and size based on the unions.  Remove NVME_FC_MAX_LS_BUFFER_SIZE.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fc nvmet-fc: refactor for common LS definitions
James Smart [Tue, 31 Mar 2020 16:49:48 +0000 (09:49 -0700)]
nvme-fc nvmet-fc: refactor for common LS definitions

Routines in the target will want to be used in the host as well.
Error definitions should now shared as both sides will process
requests and responses to requests.

Moved common declarations to new fc.h header kept in the host
subdirectory.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fc and nvmet-fc: revise LLDD api for LS reception and LS request
James Smart [Tue, 31 Mar 2020 16:49:47 +0000 (09:49 -0700)]
nvme-fc and nvmet-fc: revise LLDD api for LS reception and LS request

The current LLDD api has:
  nvme-fc: contains api for transport to do LS requests (and aborts of
    them). However, there is no interface for reception of LS's and sending
    responses for them.
  nvmet-fc: contains api for transport to do reception of LS's and sending
    of responses for them. However, there is no interface for doing LS
    requests.

Revise the api's so that both nvme-fc and nvmet-fc can send LS's, as well
as receiving LS's and sending their responses.

Change name of the rcv_ls_req struct to better reflect generic use as
a context to used to send an ls rsp. Specifically:
  nvmefc_tgt_ls_req -> nvmefc_ls_rsp
  nvmefc_tgt_ls_req.nvmet_fc_private -> nvmefc_ls_rsp.nvme_fc_private

Change nvmet_fc_rcv_ls_req() calling sequence to provide handle that
can be used by transport in later LS request sequences for an association.

nvme-fc nvmet_fc nvme_fcloop:
  Revise to adapt to changed names in api header.
  Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle.
  Add stubs for new interfaces:
    host/fc.c: nvme_fc_rcv_ls_req()
    target/fc.c: nvmet_fc_invalidate_host()

lpfc:
  Revise to adapt code to changed names in api header.
  Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-fc: Sync header to FC-NVME-2 rev 1.08
James Smart [Tue, 31 Mar 2020 16:49:46 +0000 (09:49 -0700)]
nvme-fc: Sync header to FC-NVME-2 rev 1.08

A couple of minor changes occurred between 1.06 and 1.08:
- Addition of NVME_SR_RSP opcode
- change of SR_RSP status code 1 to Reserved

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agohfs: stop using ioctl_by_bdev
Christoph Hellwig [Fri, 8 May 2020 08:18:28 +0000 (10:18 +0200)]
hfs: stop using ioctl_by_bdev

Instead just call the CDROM layer functionality directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobdi: remove the name field in struct backing_dev_info
Christoph Hellwig [Mon, 4 May 2020 12:48:01 +0000 (14:48 +0200)]
bdi: remove the name field in struct backing_dev_info

The name is only printed for a not registered bdi in writeback.  Use the
device name there as is more useful anyway for the unlike case that the
warning triggers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobdi: simplify bdi_alloc
Christoph Hellwig [Mon, 4 May 2020 12:48:00 +0000 (14:48 +0200)]
bdi: simplify bdi_alloc

Merge the _node vs normal version and drop the superflous gfp_t argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobdi: remove bdi_register_owner
Christoph Hellwig [Mon, 4 May 2020 12:47:59 +0000 (14:47 +0200)]
bdi: remove bdi_register_owner

Split out a new bdi_set_owner helper to set the owner, and move the policy
for creating the bdi name back into genhd.c, where it belongs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobdi: unexport bdi_register_va
Christoph Hellwig [Mon, 4 May 2020 12:47:58 +0000 (14:47 +0200)]
bdi: unexport bdi_register_va

bdi_register_va is only used by super.c, which can't be modular.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agodriver core: remove device_create_vargs
Christoph Hellwig [Mon, 4 May 2020 12:47:57 +0000 (14:47 +0200)]
driver core: remove device_create_vargs

All external users of device_create_vargs are gone, so remove it and
open code it in the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: rename blk_mq_alloc_rq_maps
Weiping Zhang [Thu, 7 May 2020 13:04:42 +0000 (21:04 +0800)]
block: rename blk_mq_alloc_rq_maps

rename blk_mq_alloc_rq_maps to blk_mq_alloc_map_and_requests,
this function allocs both map and request, make function name align
with funtion.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: rename __blk_mq_alloc_rq_map
Weiping Zhang [Thu, 7 May 2020 13:04:22 +0000 (21:04 +0800)]
block: rename __blk_mq_alloc_rq_map

rename __blk_mq_alloc_rq_map to __blk_mq_alloc_map_and_request,
actually it alloc both map and request, make function name
align with function.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: alloc map and request for new hardware queue
Ming Lei [Thu, 7 May 2020 13:04:08 +0000 (21:04 +0800)]
block: alloc map and request for new hardware queue

Alloc new map and request for new hardware queue when increse
hardware queue count. Before this patch, it will show a
warning for each new hardware queue, but it's not enough, these
hctx have no maps and reqeust, when a bio was mapped to these
hardware queue, it will trigger kernel panic when get request
from these hctx.

Test environment:
 * A NVMe disk supports 128 io queues
 * 96 cpus in system

A corner case can always trigger this panic, there are 96
io queues allocated for HCTX_TYPE_DEFAULT type, the corresponding kernel
log: nvme nvme0: 96/0/0 default/read/poll queues. Now we set nvme write
queues to 96, then nvme will alloc others(32) queues for read, but
blk_mq_update_nr_hw_queues does not alloc map and request for these new
added io queues. So when process read nvme disk, it will trigger kernel
panic when get request from these hardware context.

Reproduce script:

nr=$(expr `cat /sys/block/nvme0n1/device/queue_count` - 1)
echo $nr > /sys/module/nvme/parameters/write_queues
echo 1 > /sys/block/nvme0n1/device/reset_controller
dd if=/dev/nvme0n1 of=/dev/null bs=4K count=1

[ 8040.805626] ------------[ cut here ]------------
[ 8040.805627] WARNING: CPU: 82 PID: 12921 at block/blk-mq.c:2578 blk_mq_map_swqueue+0x2b6/0x2c0
[ 8040.805627] Modules linked in: nvme nvme_core nf_conntrack_netlink xt_addrtype br_netfilter overlay xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nf_conntrack_tftp nft_masq nf_tables_set nft_fib_inet nft_f
ib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack tun bridge nf_defrag_ipv6 nf_defrag_ipv4 stp llc ip6_tables ip_tables nft_compat rfkill ip_set nf_tables nfne
tlink sunrpc intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel intel_
cstate intel_uncore raid0 joydev intel_rapl_perf ipmi_si pcspkr mei_me ioatdma sg ipmi_devintf mei i2c_i801 dca lpc_ich ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c sd_mod ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm d
rm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 8040.805637]  ahci drm i40e libahci crc32c_intel libata t10_pi wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nvme_core]
[ 8040.805640] CPU: 82 PID: 12921 Comm: kworker/u194:2 Kdump: loaded Tainted: G        W         5.6.0-rc5.78317c+ #2
[ 8040.805640] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
[ 8040.805641] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 8040.805642] RIP: 0010:blk_mq_map_swqueue+0x2b6/0x2c0
[ 8040.805643] Code: 00 00 00 00 00 41 83 c5 01 44 39 6d 50 77 b8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b bb 98 00 00 00 89 d6 e8 8c 81 03 00 eb 83 <0f> 0b e9 52 ff ff ff 0f 1f 00 0f 1f 44 00 00 41 57 48 89 f1 41 56
[ 8040.805643] RSP: 0018:ffffba590d2e7d48 EFLAGS: 00010246
[ 8040.805643] RAX: 0000000000000000 RBX: ffff9f013e1ba800 RCX: 000000000000003d
[ 8040.805644] RDX: ffff9f00ffff6000 RSI: 0000000000000003 RDI: ffff9ed200246d90
[ 8040.805644] RBP: ffff9f00f6a79860 R08: 0000000000000000 R09: 000000000000003d
[ 8040.805645] R10: 0000000000000001 R11: ffff9f0138c3d000 R12: ffff9f00fb3a9008
[ 8040.805645] R13: 000000000000007f R14: ffffffff96822660 R15: 000000000000005f
[ 8040.805645] FS:  0000000000000000(0000) GS:ffff9f013fa80000(0000) knlGS:0000000000000000
[ 8040.805646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8040.805646] CR2: 00007f7f397fa6f8 CR3: 0000003d8240a002 CR4: 00000000007606e0
[ 8040.805647] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8040.805647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8040.805647] PKRU: 55555554
[ 8040.805647] Call Trace:
[ 8040.805649]  blk_mq_update_nr_hw_queues+0x31b/0x390
[ 8040.805650]  nvme_reset_work+0xb4b/0xeab [nvme]
[ 8040.805651]  process_one_work+0x1a7/0x370
[ 8040.805652]  worker_thread+0x1c9/0x380
[ 8040.805653]  ? max_active_store+0x80/0x80
[ 8040.805655]  kthread+0x112/0x130
[ 8040.805656]  ? __kthread_parkme+0x70/0x70
[ 8040.805657]  ret_from_fork+0x35/0x40
[ 8040.805658] ---[ end trace b5f13b1e73ccb5d3 ]---
[ 8229.365135] BUG: kernel NULL pointer dereference, address: 0000000000000004
[ 8229.365165] #PF: supervisor read access in kernel mode
[ 8229.365178] #PF: error_code(0x0000) - not-present page
[ 8229.365191] PGD 0 P4D 0
[ 8229.365201] Oops: 0000 [#1] SMP PTI
[ 8229.365212] CPU: 77 PID: 13024 Comm: dd Kdump: loaded Tainted: G        W         5.6.0-rc5.78317c+ #2
[ 8229.365232] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
[ 8229.365253] RIP: 0010:blk_mq_get_tag+0x227/0x250
[ 8229.365265] Code: 44 24 04 44 01 e0 48 8b 74 24 38 65 48 33 34 25 28 00 00 00 75 33 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e c3 48 8d 68 10 4c 89 ef <44> 8b 60 04 48 89 ee e8 dd f9 ff ff 83 f8 ff 75 c8 e9 67 fe ff ff
[ 8229.365304] RSP: 0018:ffffba590e977970 EFLAGS: 00010246
[ 8229.365317] RAX: 0000000000000000 RBX: ffff9f00f6a79860 RCX: ffffba590e977998
[ 8229.365333] RDX: 0000000000000000 RSI: ffff9f012039b140 RDI: ffffba590e977a38
[ 8229.365349] RBP: 0000000000000010 R08: ffffda58ff94e190 R09: ffffda58ff94e198
[ 8229.365365] R10: 0000000000000011 R11: ffff9f00f6a79860 R12: 0000000000000000
[ 8229.365381] R13: ffffba590e977a38 R14: ffff9f012039b140 R15: 0000000000000001
[ 8229.365397] FS:  00007f481c230580(0000) GS:ffff9f013f940000(0000) knlGS:0000000000000000
[ 8229.365415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8229.365428] CR2: 0000000000000004 CR3: 0000005f35e26004 CR4: 00000000007606e0
[ 8229.365444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8229.365460] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8229.365476] PKRU: 55555554
[ 8229.365484] Call Trace:
[ 8229.365498]  ? finish_wait+0x80/0x80
[ 8229.365512]  blk_mq_get_request+0xcb/0x3f0
[ 8229.365525]  blk_mq_make_request+0x143/0x5d0
[ 8229.365538]  generic_make_request+0xcf/0x310
[ 8229.365553]  ? scan_shadow_nodes+0x30/0x30
[ 8229.365564]  submit_bio+0x3c/0x150
[ 8229.365576]  mpage_readpages+0x163/0x1a0
[ 8229.365588]  ? blkdev_direct_IO+0x490/0x490
[ 8229.365601]  read_pages+0x6b/0x190
[ 8229.365612]  __do_page_cache_readahead+0x1c1/0x1e0
[ 8229.365626]  ondemand_readahead+0x182/0x2f0
[ 8229.365639]  generic_file_buffered_read+0x590/0xab0
[ 8229.365655]  new_sync_read+0x12a/0x1c0
[ 8229.365666]  vfs_read+0x8a/0x140
[ 8229.365676]  ksys_read+0x59/0xd0
[ 8229.365688]  do_syscall_64+0x55/0x1d0
[ 8229.365700]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Tested-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: save previous hardware queue count before udpate
Weiping Zhang [Thu, 7 May 2020 13:03:56 +0000 (21:03 +0800)]
block: save previous hardware queue count before udpate

blk_mq_realloc_tag_set_tags will update set->nr_hw_queues, so
save old set->nr_hw_queues before call this function.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: free both rq_map and request
Weiping Zhang [Thu, 7 May 2020 13:03:39 +0000 (21:03 +0800)]
block: free both rq_map and request

Allocation:

__blk_mq_alloc_rq_map
blk_mq_alloc_rq_map
blk_mq_alloc_rq_map
tags = blk_mq_init_tags : kzalloc_node:
tags->rqs = kcalloc_node
tags->static_rqs = kcalloc_node
blk_mq_alloc_rqs
p = alloc_pages_node
tags->static_rqs[i] = p + offset;

Free:

blk_mq_free_rq_map
kfree(tags->rqs);
kfree(tags->static_rqs);
blk_mq_free_tags
kfree(tags);

The page allocated in blk_mq_alloc_rqs cannot be released,
so we should use blk_mq_free_map_and_requests here.

blk_mq_free_map_and_requests
blk_mq_free_rqs
__free_pages : cleanup for blk_mq_alloc_rqs
blk_mq_free_rq_map : cleanup for blk_mq_alloc_rq_map

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoMerge branch 'block-5.7' into for-5.8/block
Jens Axboe [Sat, 9 May 2020 22:13:58 +0000 (16:13 -0600)]
Merge branch 'block-5.7' into for-5.8/block

Pull in block-5.7 fixes for 5.8. Mostly to resolve a conflict with
the blk-iocost changes, but we also need the base of the bdi
use-after-free as well as we build on top of it.

* block-5.7:
  nvme: fix possible hang when ns scanning fails during error recovery
  nvme-pci: fix "slimmer CQ head update"
  bdi: add a ->dev_name field to struct backing_dev_info
  bdi: use bdi_dev_name() to get device name
  bdi: move bdi_dev_name out of line
  vboxsf: don't use the source name in the bdi name
  iocost: protect iocg->abs_vdebt with iocg->waitq.lock
  block: remove the bd_openers checks in blk_drop_partitions
  nvme: prevent double free in nvme_alloc_ns() error handling
  null_blk: Cleanup zoned device initialization
  null_blk: Fix zoned command handling
  block: remove unused header
  blk-iocost: Fix error on iocost_ioc_vrate_adj
  bdev: Reduce time holding bd_mutex in sync in blkdev_close()
  buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: fix possible hang when ns scanning fails during error recovery
Sagi Grimberg [Wed, 6 May 2020 22:44:02 +0000 (15:44 -0700)]
nvme: fix possible hang when ns scanning fails during error recovery

When the controller is reconnecting, the host fails I/O and admin
commands as the host cannot reach the controller. ns scanning may
revalidate namespaces during that period and it is wrong to remove
namespaces due to these failures as we may hang (see 205da2434301).

One command that may fail is nvme_identify_ns_descs. Since we return
success due to having ns identify descriptor list optional, we continue
to compare ns identifiers in nvme_revalidate_disk, obviously fail and
return -ENODEV to nvme_validate_ns, which will remove the namespace.

Exactly what we don't want to happen.

Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional")
Tested-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme-pci: fix "slimmer CQ head update"
Alexey Dobriyan [Thu, 7 May 2020 20:07:04 +0000 (23:07 +0300)]
nvme-pci: fix "slimmer CQ head update"

Pre-incrementing ->cq_head can't be done in memory because OOB value
can be observed by another context.

This devalues space savings compared to original code :-\

$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32)
Function                                     old     new   delta
nvme_poll_irqdisable                         464     456      -8
nvme_poll                                    455     447      -8
nvme_irq                                     388     380      -8
nvme_dev_disable                             955     947      -8

But the code is minimal now: one read for head, one read for q_depth,
one increment, one comparison, single instruction phase bit update and
one write for new head.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: John Garry <john.garry@huawei.com>
Tested-by: John Garry <john.garry@huawei.com>
Fixes: e2a366a4b0feaeb ("nvme-pci: slimmer CQ head update")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobdi: add a ->dev_name field to struct backing_dev_info
Christoph Hellwig [Mon, 4 May 2020 12:47:56 +0000 (14:47 +0200)]
bdi: add a ->dev_name field to struct backing_dev_info

Cache a copy of the name for the life time of the backing_dev_info
structure so that we can reference it even after unregistering.

Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears")
Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobdi: use bdi_dev_name() to get device name
Yufen Yu [Mon, 4 May 2020 12:47:55 +0000 (14:47 +0200)]
bdi: use bdi_dev_name() to get device name

Use the common interface bdi_dev_name() to get device name.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Add missing <linux/backing-dev.h> include BFQ

Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agobdi: move bdi_dev_name out of line
Christoph Hellwig [Mon, 4 May 2020 12:47:54 +0000 (14:47 +0200)]
bdi: move bdi_dev_name out of line

bdi_dev_name is not a fast path function, move it out of line.  This
prepares for using it from modular callers without having to export
an implementation detail like bdi_unknown_name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agovboxsf: don't use the source name in the bdi name
Christoph Hellwig [Mon, 4 May 2020 12:47:53 +0000 (14:47 +0200)]
vboxsf: don't use the source name in the bdi name

Simplify the bdi name to mirror what we are doing elsewhere, and
drop them name in favor of just using a number.  This avoids a
potentially very long bdi name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoiocost: protect iocg->abs_vdebt with iocg->waitq.lock
Tejun Heo [Mon, 4 May 2020 23:27:54 +0000 (19:27 -0400)]
iocost: protect iocg->abs_vdebt with iocg->waitq.lock

abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup
is and controls the activation of use_delay mechanism. Once a cgroup goes
over budget from forced IOs, it has to pay it back with its future budget.
The progress guarantee on debt paying comes from the iocg being active -
active iocgs are processed by the periodic timer, which ensures that as time
passes the debts dissipate and the iocg returns to normal operation.

However, both iocg activation and vdebt handling are asynchronous and a
sequence like the following may happen.

1. The iocg is in the process of being deactivated by the periodic timer.

2. A bio enters ioc_rqos_throttle(), calls iocg_activate() which returns
   without anything because it still sees that the iocg is already active.

3. The iocg is deactivated.

4. The bio from #2 is over budget but needs to be forced. It increases
   abs_vdebt and goes over the threshold and enables use_delay.

5. IO control is enabled for the iocg's subtree and now IOs are attributed
   to the descendant cgroups and the iocg itself no longer issues IOs.

This leaves the iocg with stuck abs_vdebt - it has debt but inactive and no
further IOs which can activate it. This can end up unduly punishing all the
descendants cgroups.

The usual throttling path has the same issue - the iocg must be active while
throttled to ensure that future event will wake it up - and solves the
problem by synchronizing the throttling path with a spinlock. abs_vdebt
handling is another form of overage handling and shares a lot of
characteristics including the fact that it isn't in the hottest path.

This patch fixes the above and other possible races by strictly
synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vlad Dmitriev <vvd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Fixes: e1518f63f246 ("blk-iocost: Don't let merges push vtime into the future")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoudf: stop using ioctl_by_bdev
Christoph Hellwig [Sat, 25 Apr 2020 07:57:06 +0000 (09:57 +0200)]
udf: stop using ioctl_by_bdev

Instead just call the CDROM layer functionality directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoisofs: stop using ioctl_by_bdev
Christoph Hellwig [Sat, 25 Apr 2020 07:57:05 +0000 (09:57 +0200)]
isofs: stop using ioctl_by_bdev

Instead just call the CDROM layer functionality directly, and turn the
hot mess in isofs_get_last_session into remotely readable code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agohfsplus: stop using ioctl_by_bdev
Christoph Hellwig [Sat, 25 Apr 2020 07:57:04 +0000 (09:57 +0200)]
hfsplus: stop using ioctl_by_bdev

Instead just call the CDROM layer functionality directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agocdrom: factor out a cdrom_multisession helper
Christoph Hellwig [Sat, 25 Apr 2020 07:57:03 +0000 (09:57 +0200)]
cdrom: factor out a cdrom_multisession helper

Factor out a version of the CDROMMULTISESSION ioctl handler that can
be called directly from kernel space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agocdrom: factor out a cdrom_read_tocentry helper
Christoph Hellwig [Sat, 25 Apr 2020 07:57:02 +0000 (09:57 +0200)]
cdrom: factor out a cdrom_read_tocentry helper

Factor out a version of the CDROMREADTOCENTRY ioctl handler that can
be called directly from kernel space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoide-cd: rename cdrom_read_tocentry
Christoph Hellwig [Sat, 25 Apr 2020 07:57:01 +0000 (09:57 +0200)]
ide-cd: rename cdrom_read_tocentry

Give the cdrom_read_tocentry function and ide_ prefix to not conflict
with the soon to be added generic function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>