platform/kernel/linux-starfive.git
3 years agoIB/hfi1: Remove indirect call to hfi1_ipoib_send_dma()
Mike Marciniszyn [Mon, 29 Mar 2021 13:54:11 +0000 (09:54 -0400)]
IB/hfi1: Remove indirect call to hfi1_ipoib_send_dma()

hfi1_ipoib_send() directly calls hfi1_ipoib_send_dma() with no value add.

Fix by renaming hfi1_ipoib_send_dma() to hfi1_ipoib_send().

Link: https://lore.kernel.org/r/1617026056-50483-6-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoIB/hfi1: Use napi_schedule_irqoff() for tx napi
Mike Marciniszyn [Mon, 29 Mar 2021 13:54:10 +0000 (09:54 -0400)]
IB/hfi1: Use napi_schedule_irqoff() for tx napi

The call is from an ISR context and napi_schedule_irqoff() can be used.

Change the call to the more efficient type.

Link: https://lore.kernel.org/r/1617026056-50483-5-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoIB/hfi1: Correct oversized ring allocation
Mike Marciniszyn [Mon, 29 Mar 2021 13:54:09 +0000 (09:54 -0400)]
IB/hfi1: Correct oversized ring allocation

The completion ring for tx is using the wrong size to size the ring,
oversizing the ring by two orders of magniture.

Correct the allocation size and use kcalloc_node() to allocate the ring.
Fix mistaken GFP defines in similar allocations.

Link: https://lore.kernel.org/r/1617026056-50483-4-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoIB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev
Mike Marciniszyn [Mon, 29 Mar 2021 13:54:08 +0000 (09:54 -0400)]
IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev

The current rdma_netdev handling in ipoib hooks the tx_timeout handler,
but prints out a totally useless message that prevents effective debugging
especially when multiple transmit queues are being used.

Add a tx_timeout rdma_netdev hook and implement the callback in the hfi1
to print additional information.

The existing non-helpful message is avoided when the driver has presented
a callback.

Link: https://lore.kernel.org/r/1617026056-50483-3-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoIB/hfi1: Add AIP tx traces
Mike Marciniszyn [Mon, 29 Mar 2021 13:54:07 +0000 (09:54 -0400)]
IB/hfi1: Add AIP tx traces

Add traces to allow for debugging issues with AIP tx.

Link: https://lore.kernel.org/r/1617026056-50483-2-git-send-email-dennis.dalessandro@cornelisnetworks.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Reorganize doorbell update interfaces for all queues
Yixian Liu [Sat, 27 Mar 2021 10:25:38 +0000 (18:25 +0800)]
RDMA/hns: Reorganize doorbell update interfaces for all queues

The doorbell update interfaces are very similar for different queues, such
as SQ, RQ, SRQ, CQ and EQ. So reorganize these code and also fix some
inappropriate naming.

Link: https://lore.kernel.org/r/1616840738-7866-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Support configuring doorbell mode of RQ and CQ
Yixian Liu [Sat, 27 Mar 2021 10:25:37 +0000 (18:25 +0800)]
RDMA/hns: Support configuring doorbell mode of RQ and CQ

HIP08 supports both normal and record doorbell mode for RQ and CQ, SQ
record doorbell for userspace is also supported by the software for
flushing CQE process. As now the capability of HIP08 are exposed to the
user and are configurable, the support of normal doorbell should be added
back.

Note that, if switching to normal doorbell, the kernel will report "flush
cqe is unsupported" if modify qp to error status as the flush is based on
record doorbell.

Link: https://lore.kernel.org/r/1616840738-7866-2-git-send-email-liweihang@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Simplify command fields for HEM base address configuration
Xi Wang [Sat, 27 Mar 2021 03:21:34 +0000 (11:21 +0800)]
RDMA/hns: Simplify command fields for HEM base address configuration

Use hr_reg_write() instead of roce_set_field() to simplify codes about
configuring HEM BA.

Link: https://lore.kernel.org/r/1616815294-13434-6-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Reorganize process of setting HEM
Xi Wang [Sat, 27 Mar 2021 03:21:33 +0000 (11:21 +0800)]
RDMA/hns: Reorganize process of setting HEM

Encapsulate configuring GMV base address and other type of HEM table into
two separate functions to make process of setting HEM clearer.

Link: https://lore.kernel.org/r/1616815294-13434-5-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Refactor reset state checking flow
Xi Wang [Sat, 27 Mar 2021 03:21:32 +0000 (11:21 +0800)]
RDMA/hns: Refactor reset state checking flow

The 'HNS_ROCE_OPC_QUERY_MB_ST' command will response the mailbox complete
status and hardware busy flag, and the complete status is only valid when
the busy flag is 0, so it's better to query these two fields at a time
rather than separately.

Link: https://lore.kernel.org/r/1616815294-13434-4-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Reorganize hns_roce_create_cq()
Yixing Liu [Sat, 27 Mar 2021 03:21:31 +0000 (11:21 +0800)]
RDMA/hns: Reorganize hns_roce_create_cq()

Encapsulate two subprocesses into functions.

Link: https://lore.kernel.org/r/1616815294-13434-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Refactor hns_roce_v2_poll_one()
Weihang Li [Sat, 27 Mar 2021 03:21:30 +0000 (11:21 +0800)]
RDMA/hns: Refactor hns_roce_v2_poll_one()

Encapsulate the process of obtaining the current QP and filling WC as
functions, also merge some duplicate code.

Link: https://lore.kernel.org/r/1616815294-13434-2-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoMAINTAINERS: remove Xavier as maintainer of HISILICON ROCE DRIVER
Weihang Li [Mon, 29 Mar 2021 08:46:24 +0000 (16:46 +0800)]
MAINTAINERS: remove Xavier as maintainer of HISILICON ROCE DRIVER

Wei Hu(Xavier) has left Hisilicon and his email address is invalid now.
I'd be glad to add him back with another address if he wants to continue
maintain this module.

Link: https://lore.kernel.org/r/1617007584-39842-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs-clt: Cap max_io_size
Jack Wang [Thu, 25 Mar 2021 15:33:05 +0000 (16:33 +0100)]
RDMA/rtrs-clt: Cap max_io_size

Max io size is limited by both remote buffer size and the max fr pages per
mr.

Link: https://lore.kernel.org/r/20210325153308.1214057-20-gi-oh.kim@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs: Cleanup unused 's' variable in __alloc_sess
Jack Wang [Thu, 25 Mar 2021 15:33:03 +0000 (16:33 +0100)]
RDMA/rtrs: Cleanup unused 's' variable in __alloc_sess

Link: https://lore.kernel.org/r/20210325153308.1214057-18-gi-oh.kim@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs-srv: Report temporary sessname for error message
Gioh Kim [Thu, 25 Mar 2021 15:33:02 +0000 (16:33 +0100)]
RDMA/rtrs-srv: Report temporary sessname for error message

Before receiving the session name, the error message cannot include any
information about which connection generates the error.

This patch stores the addresses of source and target in the sessname field
to show which generates the error. That field will be over-written
when receiving the session name from client.

Link: https://lore.kernel.org/r/20210325153308.1214057-17-gi-oh.kim@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs: New function converting rtrs_addr to string
Gioh Kim [Thu, 25 Mar 2021 15:32:59 +0000 (16:32 +0100)]
RDMA/rtrs: New function converting rtrs_addr to string

There is common code converting addresses of source machine and
destination machine to a string.  We already have a struct rtrs_addr to
store two addresses.  This patch introduces a new function that converts
two addresses into one string with struct rtrs_addr.

Link: https://lore.kernel.org/r/20210325153308.1214057-14-gi-oh.kim@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler
Guoqing Jiang [Thu, 25 Mar 2021 15:32:56 +0000 (16:32 +0100)]
RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler

Let these cases share the same path since all of them need to close
session.

Link: https://lore.kernel.org/r/20210325153308.1214057-11-gi-oh.kim@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrs
Guoqing Jiang [Thu, 25 Mar 2021 15:32:55 +0000 (16:32 +0100)]
RDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrs

The two members are not used in the code, so remove them.

Link: https://lore.kernel.org/r/20210325153308.1214057-10-gi-oh.kim@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs: Kill the put label in rtrs_srv_create_once_sysfs_root_folders
Guoqing Jiang [Thu, 25 Mar 2021 15:32:54 +0000 (16:32 +0100)]
RDMA/rtrs: Kill the put label in rtrs_srv_create_once_sysfs_root_folders

We can remove the label after move put_device to the right place.

Link: https://lore.kernel.org/r/20210325153308.1214057-9-gi-oh.kim@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_req
Guoqing Jiang [Thu, 25 Mar 2021 15:32:53 +0000 (16:32 +0100)]
RDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_req

There is no need to dereference 's' from 'sess', since we have "sess =
to_clt_sess(s)" before.

And we can deference 'dev' from 's' earlier.

Link: https://lore.kernel.org/r/20210325153308.1214057-8-gi-oh.kim@ionos.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com>
Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoMAINTAINERS: Change maintainer for rtrs module
Danil Kipnis [Thu, 25 Mar 2021 15:32:47 +0000 (16:32 +0100)]
MAINTAINERS: Change maintainer for rtrs module

Danil will step down, Haris will take over.  Also update to email address
to ionos.com, cloud.ionos.com will still work for sometime.

Link: https://lore.kernel.org/r/20210325153308.1214057-2-gi-oh.kim@ionos.com
Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Acked-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/uverbs: Fix -Wunused-function warning
YueHaibing [Thu, 1 Apr 2021 02:10:28 +0000 (10:10 +0800)]
RDMA/uverbs: Fix -Wunused-function warning

make W=1 warns this:

In file included from drivers/infiniband/sw/rdmavt/mmap.c:51:0:
./include/rdma/uverbs_ioctl.h:937:1:
 warning: ‘_uverbs_get_const_unsigned’ defined but not used [-Wunused-function]
 _uverbs_get_const_unsigned(u64 *to,
 ^~~~~~~~~~~~~~~~~~~~~~~~~~
./include/rdma/uverbs_ioctl.h:930:1:
 warning: ‘_uverbs_get_const_signed’ defined but not used [-Wunused-function]
 _uverbs_get_const_signed(s64 *to, const struct uverbs_attr_bundle *attrs_bundle,
 ^~~~~~~~~~~~~~~~~~~~~~~~

Make these functions inline to fix this warnings.

Fixes: 2904bb37b35d ("IB/core: Split uverbs_get_const/default to consider target type")
Link: https://lore.kernel.org/r/20210401021028.25720-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Support congestion control type selection according to the FW
Yangyang Li [Thu, 25 Mar 2021 13:33:56 +0000 (21:33 +0800)]
RDMA/hns: Support congestion control type selection according to the FW

The type of congestion control algorithm includes DCQCN, LDCP, HC3 and
DIP. The driver will select one of them according to the firmware when
querying PF capabilities, and then set the related configuration fields
into QPC.

Link: https://lore.kernel.org/r/1616679236-7795-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Support query information of functions from FW
Wei Xu [Thu, 25 Mar 2021 13:33:55 +0000 (21:33 +0800)]
RDMA/hns: Support query information of functions from FW

Add a new type of command to query mac id of functions from the firmware,
it is used to select the template of congestion algorithm. More info will
be supported in the future.

Link: https://lore.kernel.org/r/1616679236-7795-2-git-send-email-liweihang@huawei.com
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Shengming Shu <shushengming1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/core: Fix corrupted SL on passive side
Håkon Bugge [Mon, 22 Mar 2021 13:35:32 +0000 (14:35 +0100)]
RDMA/core: Fix corrupted SL on passive side

On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
Subnet Local is zero.

In cm_req_handler(), the cm_process_routed_req() function is called. Since
the Primary Subnet Local value is zero in the request, and since this is
RoCE (Primary Local LID is permissive), the following statement will be
executed:

      IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);

This corrupts SL in req_msg if it was different from zero. In other words,
a request to setup a connection using an SL != zero, will not be honored,
and a connection using SL zero will be created instead.

Fixed by not calling cm_process_routed_req() on RoCE systems, the
cm_process_route_req() is only for IB anyhow.

Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
Link: https://lore.kernel.org/r/1616420132-31005-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rxe: Remove rxe_dma_device declaration
Kamal Heib [Wed, 31 Mar 2021 10:20:43 +0000 (13:20 +0300)]
RDMA/rxe: Remove rxe_dma_device declaration

The function isn't implemented - delete the declaration.

Fixes: a9d2e9ae953f ("RDMA/siw,rxe: Make emulated devices virtual in the device tree")
Link: https://lore.kernel.org/r/20210331102043.691950-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/iw_cxgb4: Use DEFINE_SPINLOCK() for spinlock
Tang Yizhou [Wed, 31 Mar 2021 02:01:05 +0000 (10:01 +0800)]
RDMA/iw_cxgb4: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK() rather
than explicitly calling spin_lock_init().

Link: https://lore.kernel.org/r/20210331020105.4858-1-tangyizhou@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/iser: struct iscsi_iser_task is declared twice
Wan Jiabing [Fri, 26 Mar 2021 11:33:46 +0000 (19:33 +0800)]
RDMA/iser: struct iscsi_iser_task is declared twice

struct iscsi_iser_task has been declared at 201st line. Remove the
duplicate.

Link: https://lore.kernel.org/r/20210326113347.903976-1-wanjiabing@vivo.com
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Fix a spelling mistake in hns_roce_hw_v1.c
Ruiqi Gong [Tue, 30 Mar 2021 12:29:12 +0000 (08:29 -0400)]
RDMA/hns: Fix a spelling mistake in hns_roce_hw_v1.c

s/caculating/calculating

Link: https://lore.kernel.org/r/20210330122912.19989-1-gongruiqi1@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ruiqi Gong <gongruiqi1@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rxe: Split MEM into MR and MW
Bob Pearson [Thu, 25 Mar 2021 21:24:26 +0000 (16:24 -0500)]
RDMA/rxe: Split MEM into MR and MW

In the original rxe implementation it was intended to use a common object
to represent MRs and MWs but they are different enough to separate these
into two objects.

This allows replacing the mem name with mr for MRs which is more
consistent with the style for the other objects and less likely to be
confusing. This is a long patch that mostly changes mem to mr where it
makes sense and adds a new rxe_mw struct.

Link: https://lore.kernel.org/r/20210325212425.2792-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/efa: Use strscpy instead of strlcpy
Gal Pressman [Mon, 29 Mar 2021 12:01:28 +0000 (15:01 +0300)]
RDMA/efa: Use strscpy instead of strlcpy

The strlcpy function doesn't limit the source length, use the preferred
strscpy function instead.

Link: https://lore.kernel.org/r/20210329120131.18793-1-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoIB/isert: Fix a use after free in isert_connect_request
Lv Yunlong [Mon, 22 Mar 2021 16:13:25 +0000 (09:13 -0700)]
IB/isert: Fix a use after free in isert_connect_request

The device is got by isert_device_get() with refcount is 1, and is
assigned to isert_conn by
  isert_conn->device = device.

When isert_create_qp() failed, device will be freed with
isert_device_put().

Later, the device is used in isert_free_login_buf(isert_conn) by the
isert_conn->device->ib_device statement.

Free the device in the correct order.

Fixes: ae9ea9ed38c9 ("iser-target: Split some logic in isert_connect_request to routines")
Link: https://lore.kernel.org/r/20210322161325.7491-1-lyl2019@mail.ustc.edu.cn
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA: Fix a typo
Bhaskar Chowdhury [Mon, 22 Mar 2021 06:43:22 +0000 (12:13 +0530)]
RDMA: Fix a typo

s/struture/structure/

Link: https://lore.kernel.org/r/20210322064322.3933985-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoIB/hfi1: Fix a typo
Bhaskar Chowdhury [Mon, 22 Mar 2021 06:29:23 +0000 (11:59 +0530)]
IB/hfi1: Fix a typo

s/struture/structure/

And add the missing colon for kdoc

Link: https://lore.kernel.org/r/20210322062923.3306167-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/core: Correct misspellings of two words in comments
Yangyang Li [Fri, 19 Mar 2021 09:55:49 +0000 (17:55 +0800)]
RDMA/core: Correct misspellings of two words in comments

Correct the following spelling errors:
1. shold -> should
2. uncontext -> ucontext

Link: https://lore.kernel.org/r/1616147749-49106-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Set ODP caps only if device profile support ODP
Shay Drory [Thu, 18 Mar 2021 13:52:59 +0000 (15:52 +0200)]
RDMA/mlx5: Set ODP caps only if device profile support ODP

Currently, ODP caps are set during the init stage of mlx5_ib_dev,
regardless of whether the device profile supports ODP or not.  There is no
point in setting ODP caps if the device profile doesn't support
ODP. Hence, move setting the ODP caps to the odp_init stage.

Link: https://lore.kernel.org/r/20210318135259.681264-1-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Fix drop packet rule in egress table
Maor Gottlieb [Thu, 18 Mar 2021 13:51:23 +0000 (15:51 +0200)]
RDMA/mlx5: Fix drop packet rule in egress table

Initial drop action support missed that drop action can be added to egress
flow tables as well. Add the missing support.

This requires making sure that dest_type isn't set to PORT which in turn
exposes a possibility of passing dst while indicating number of dsts as
zero. Explicitly check for number of dsts and pass the appropriate
pointer.

Fixes: f29de9eee782 ("RDMA/mlx5: Add support for drop action in DV steering")
Link: https://lore.kernel.org/r/20210318135123.680759-1-leon@kernel.org
Reviewed-by: Mark Bloch <markb@nvidia.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/uverbs: Refactor rdma_counter_set_auto_mode and __counter_set_mode
Patrisious Haddad [Thu, 18 Mar 2021 11:05:02 +0000 (13:05 +0200)]
RDMA/uverbs: Refactor rdma_counter_set_auto_mode and __counter_set_mode

Success is returned in the following flows:
 * New mode is the same as the current one.
 * Switched to new mode and there are no bound counters yet.

Link: https://lore.kernel.org/r/20210318110502.673676-1-leon@kernel.org
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/bnxt_re: Move device to error state upon device crash
Selvin Xavier [Wed, 17 Mar 2021 08:15:42 +0000 (01:15 -0700)]
RDMA/bnxt_re: Move device to error state upon device crash

When the L2 driver detects a device crash or device undergone reset, it
invokes a stop callback to recover from error.

The current RoCE driver doesn't recover the device. So move the device to
error state and dispatch fatal events to all qps Release the MSIx vectors
to avoid a crash when L2 driver disables the MSIx.  Also, check for the
device state to avoid posting further commands to the HW.

Link: https://lore.kernel.org/r/1615968942-30970-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA: Support more than 255 rdma ports
Mark Bloch [Mon, 1 Mar 2021 07:04:20 +0000 (09:04 +0200)]
RDMA: Support more than 255 rdma ports

Current code uses many different types when dealing with a port of a RDMA
device: u8, unsigned int and u32. Switch to u32 to clean up the logic.

This allows us to make (at least) the core view consistent and use the
same type. Unfortunately not all places can be converted. Many uverbs
functions expect port to be u8 so keep those places in order not to break
UAPIs.  HW/Spec defined values must also not be changed.

With the switch to u32 we now can support devices with more than 255
ports. U32_MAX is reserved to make control logic a bit easier to deal
with. As a device with U32_MAX ports probably isn't going to happen any
time soon this seems like a non issue.

When a device with more than 255 ports is created uverbs will report the
RDMA device as having 255 ports as this is the max currently supported.

The verbs interface is not changed yet because the IBTA spec limits the
port size in too many places to be u8 and all applications that relies in
verbs won't be able to cope with this change. At this stage, we are
extending the interfaces that are using vendor channel solely

Once the limitation is lifted mlx5 in switchdev mode will be able to have
thousands of SFs created by the device. As the only instance of an RDMA
device that reports more than 255 ports will be a representor device and
it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other
ULPs aren't effected by this change and their sysfs/interfaces that are
exposes to userspace can remain unchanged.

While here cleanup some alignment issues and remove unneeded sanity
checks (mainly in rdmavt),

Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Support to query firmware version
Lang Cheng [Tue, 16 Mar 2021 08:09:21 +0000 (16:09 +0800)]
RDMA/hns: Support to query firmware version

Implement the ops named get_dev_fw_str to support ib_get_device_fw_str().

Link: https://lore.kernel.org/r/1615882161-53827-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Create ODP EQ only when ODP MR is created
Shay Drory [Sun, 14 Mar 2021 12:54:18 +0000 (14:54 +0200)]
RDMA/mlx5: Create ODP EQ only when ODP MR is created

There is no need to create the ODP EQ if the user doesn't use ODP MRs.
Hence, create it only when the first ODP MR is created. This EQ will be
destroyed only when the device is unloaded.
This will decrease the number of EQs created per device. for example: If
we creates 1K devices (SF/VF/etc'), than we will decrease the num of EQs
by 1K.

Link: https://lore.kernel.org/r/20210314125418.179716-1-leon@kernel.org
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Fix memory corruption when allocating XRCDN
Weihang Li [Mon, 22 Mar 2021 02:44:29 +0000 (10:44 +0800)]
RDMA/hns: Fix memory corruption when allocating XRCDN

It's incorrect to cast the type of pointer to xrcdn from (u32 *) to
(unsigned long *), then pass it into hns_roce_bitmap_alloc(), this will
lead to a memory corruption.

Fixes: 32548870d438 ("RDMA/hns: Add support for XRC on HIP09")
Link: https://lore.kernel.org/r/1616381069-51759-1-git-send-email-liweihang@huawei.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoIB/hns: Fix mispelling of subsystem
Bhaskar Chowdhury [Mon, 22 Mar 2021 02:27:51 +0000 (07:57 +0530)]
IB/hns: Fix mispelling of subsystem

s/wubsytem/subsystem/

Link: https://lore.kernel.org/r/20210322022751.4137205-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/include: Mundane typo fixes throughout the file
Bhaskar Chowdhury [Thu, 18 Mar 2021 10:04:53 +0000 (15:34 +0530)]
RDMA/include: Mundane typo fixes throughout the file

s/proviee/provide/
s/undelying/underlying/
s/quesiton/question/
s/drivr/driver/

Link: https://lore.kernel.org/r/20210318100453.9759-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/cma: Remove unused leftovers in cma code
Gal Pressman [Sun, 14 Mar 2021 14:34:27 +0000 (16:34 +0200)]
RDMA/cma: Remove unused leftovers in cma code

Commit ee1c60b1bff8 ("IB/SA: Modify SA to implicitly cache Class Port
info") removed the class_port_info_context struct usage, remove a couple
of leftovers.

Link: https://lore.kernel.org/r/20210314143427.76101-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA: Delete not-used static inline functions
Leon Romanovsky [Sun, 14 Mar 2021 13:39:08 +0000 (15:39 +0200)]
RDMA: Delete not-used static inline functions

Perform mass deletion of static inline functions that are not used.

Link: https://lore.kernel.org/r/20210314133908.291945-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA: Fix kernel-doc compilation warnings
Leon Romanovsky [Sun, 14 Mar 2021 13:39:07 +0000 (15:39 +0200)]
RDMA: Fix kernel-doc compilation warnings

This patch fixes bunch of kernel-doc compilation warnings like below:

 drivers/infiniband/hw/i40iw/i40iw_cm.c:4372: warning: expecting prototype for i40iw_ifdown_notify(). Prototype was for i40iw_if_notify() instead

Link: https://lore.kernel.org/r/20210314133908.291945-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Add missing returned error check of mlx5_ib_dereg_mr
Leon Romanovsky [Sun, 14 Mar 2021 08:22:50 +0000 (10:22 +0200)]
RDMA/mlx5: Add missing returned error check of mlx5_ib_dereg_mr

Fix the following smatch error:

drivers/infiniband/hw/mlx5/mr.c:1950 mlx5_ib_dereg_mr() error: uninitialized symbol 'rc'.

Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Link: https://lore.kernel.org/r/20210314082250.10143-1-leon@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Allow larger pages in DevX umem
Jason Gunthorpe [Thu, 4 Mar 2021 13:05:01 +0000 (15:05 +0200)]
RDMA/mlx5: Allow larger pages in DevX umem

The umem DMA list calculation was locked at 4k pages due to confusion
around how this API works and is used when larger pages are present.

The conclusion is:

 - umem's cannot extend past what is mapped into the process, so creating
   a lage page size and referring to a sub-range is not allowed

 - umem's must always have a page offset of zero, except for sub PAGE_SIZE
   umems

 - The feature of umem_offset to create multiple objects inside a umem
   is buggy and isn't used anyplace. Thus we can assume all users of the
   current API have umem_offset == 0 as well

Provide a new page size calculator that limits the DMA list to the VA
range and enforces umem_offset == 0.

Allow user space to specify the page sizes which it can accept, this
bitmap must be derived from the intended use of the umem, based on
per-usage HW limitations.

Link: https://lore.kernel.org/r/20210304130501.1102577-4-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoIB/core: Split uverbs_get_const/default to consider target type
Yishai Hadas [Thu, 4 Mar 2021 13:05:00 +0000 (15:05 +0200)]
IB/core: Split uverbs_get_const/default to consider target type

Change uverbs_get_const/uverbs_get_const_default to work properly with
both signed/unsigned parameters.

Current APIs mix s64 and u64 which leads to incorrect check when u64
value was supplied and its upper bit was set. In that case
uverbs_get_const() / uverbs_get_const_default() lower bound check may
fail unexpectedly, target is unsigned (lower bound is 0) but value
became negative as of the s64 usage.

Split to have two different APIs, no change to callers as the required
API will be called internally according to the target type.

Link: https://lore.kernel.org/r/20210304130501.1102577-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoIB/core: Drop WARN_ON() from ib_umem_find_best_pgsz()
Yishai Hadas [Thu, 4 Mar 2021 13:04:59 +0000 (15:04 +0200)]
IB/core: Drop WARN_ON() from ib_umem_find_best_pgsz()

The WARN_ON() issued as part of ib_umem_find_best_pgsz() blocked cases
when only page sizes larger than PAGE_SIZE were set, drop it to enable
those cases.

In addition, there is no need to have a specific check for zero
pgsz_bitmap, the function will do its job and return 0 at the end if
nothing match will be found.

Link: https://lore.kernel.org/r/20210304130501.1102577-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Fix mlx5 rates to IB rates map
Mark Zhang [Thu, 4 Mar 2021 12:45:17 +0000 (14:45 +0200)]
RDMA/mlx5: Fix mlx5 rates to IB rates map

Correct the map between mlx5 rates and corresponding ib rates, as they
don't always have a fixed offset between them.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20210304124517.1100608-4-leon@kernel.org
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Fix query RoCE port
Maor Gottlieb [Thu, 4 Mar 2021 12:45:15 +0000 (14:45 +0200)]
RDMA/mlx5: Fix query RoCE port

mlx5_is_roce_enabled returns the devlink RoCE init value, therefore it
should be used only when driver is loaded. Instead we just need to read
the roce_en field.

In addition, rename mlx5_is_roce_enabled to mlx5_is_roce_init_enabled.

Fixes: 7a58779edd75 ("IB/mlx5: Improve query port for representor port")
Link: https://lore.kernel.org/r/20210304124517.1100608-2-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Rename mlx5_mr_cache_invalidate() to revoke_mr()
Jason Gunthorpe [Thu, 4 Mar 2021 12:07:45 +0000 (14:07 +0200)]
RDMA/mlx5: Rename mlx5_mr_cache_invalidate() to revoke_mr()

Now that this is only used in a few places in mr.c give it a sensible
name. It has nothing to do with the cache and can be invoked on any
MR. DMA is stopped and the user cannot touch the MR any further once it
completes.

Link: https://lore.kernel.org/r/20210304120745.1090751-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()
Jason Gunthorpe [Thu, 4 Mar 2021 12:07:44 +0000 (14:07 +0200)]
RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()

Now that the SRCU stuff has been removed the entire MR destroy logic can
be made a lot simpler. Currently there are many different ways to destroy a
MR and it makes it really hard to do this task correctly. Route all
destruction through mlx5_ib_dereg_mr() and make it work for all
situations.

Since it turns out all the different MR types do basically the same thing
this removes a lot of knowledge of MR internals from ODP and leaves ODP
just exporting an operation to clean up children.

This fixes a few weird corner cases bugs and firmly uses the correct
ordering of the MR destruction:
 - Stop parallel access to the mkey via the ODP xarray
 - Stop DMA
 - Release the umem
 - Clean up ODP children
 - Free/Recycle the MR

Link: https://lore.kernel.org/r/20210304120745.1090751-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Use a union inside mlx5_ib_mr
Jason Gunthorpe [Thu, 4 Mar 2021 12:07:43 +0000 (14:07 +0200)]
RDMA/mlx5: Use a union inside mlx5_ib_mr

The struct mlx5_ib_mr can be used for three different things, but only one
at a time:

 - In the user MR cache
 - As a kernel MR
 - As a user MR

Overlay the three things into a single union with the following rules:

 - If the mr is found on the cache_ent->head list then it is a cache MR
   and umem == NULL. The entire union is zero after the MR is removed from
   the cache.

 - If umem != NULL or type == IB_MR_TYPE_USER then it is a user MR.

 - If umem == NULL then it is a kernel MR

This reduces the size of struct mlx5_ib_mr to 552 bytes from 702.

The only place the three flows overlap in the code is during dereg, so add
a few extra checks along there.

Link: https://lore.kernel.org/r/20210304120745.1090751-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/mlx5: Zero out ODP related items in the mlx5_ib_mr
Jason Gunthorpe [Thu, 4 Mar 2021 12:07:42 +0000 (14:07 +0200)]
RDMA/mlx5: Zero out ODP related items in the mlx5_ib_mr

All of the ODP code assumes when it calls mlx5_mr_cache_alloc() the ODP
related fields are zero'd. This is true if the MR was just allocated, but
if the MR is recycled through the cache then the values are never zero'd.

This causes a bug in the odp_stats, they don't reset when the MR is
reallocated, also is_odp_implicit is never 0'd.

So we can use memset on a block of the mlx5_ib_mr reorganize the structure
to put all the data that can be zero'd by the cache at the end.

It is organized as an anonymous struct because the next patch will make
this a union.

Delete the unused smr_info. Don't set the kernel only desc_size on the
user path. No longer any need to zero mr->parent before freeing it, the
memset() will get it now.

Fixes: a3de94e3d61e ("IB/mlx5: Introduce ODP diagnostic counters")
Link: https://lore.kernel.org/r/20210304120745.1090751-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Add support for XRC on HIP09
Wenpeng Liang [Thu, 4 Mar 2021 02:55:58 +0000 (10:55 +0800)]
RDMA/hns: Add support for XRC on HIP09

The HIP09 supports XRC transport service, it greatly saves the number of
QPs required to connect all processes in a large cluster.

Link: https://lore.kernel.org/r/1614826558-35423-1-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs-clt: Use rdma_event_msg in log
Jack Wang [Mon, 22 Feb 2021 14:15:51 +0000 (15:15 +0100)]
RDMA/rtrs-clt: Use rdma_event_msg in log

It's easier to understand a string instead of enum.

Link: https://lore.kernel.org/r/20210222141551.54345-2-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rtrs: Use new shared CQ mechanism
Jack Wang [Mon, 22 Feb 2021 14:15:50 +0000 (15:15 +0100)]
RDMA/rtrs: Use new shared CQ mechanism

Have the driver use shared CQs which provids a ~10%-20% improvement during
test.

Instead of opening a CQ for each QP per connection, a CQ for each QP will
be provided by the RDMA core driver that will be shared between the QPs on
that core reducing interrupt overhead.

Link: https://lore.kernel.org/r/20210222141551.54345-1-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/core: Remove unused req_ncomp_notif device operation
Gal Pressman [Thu, 11 Mar 2021 15:09:21 +0000 (17:09 +0200)]
RDMA/core: Remove unused req_ncomp_notif device operation

The request_ncomp_notif device operation and function are unused, remove
them.

Link: https://lore.kernel.org/r/20210311150921.23726-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/iwcm: Allow AFONLY binding for IPv6 addresses
Bernard Metzler [Fri, 19 Feb 2021 14:34:41 +0000 (15:34 +0100)]
RDMA/iwcm: Allow AFONLY binding for IPv6 addresses

Binding IPv6 address/port to AF_INET6 domain only is provided via
rdma_set_afonly(), but was not signalled to the provider.  Applications
like NFS/RDMA bind the same port to both IPv4 and IPv6 addresses
simultaneously and thus rely on it working correctly.

Link: https://lore.kernel.org/r/20210219143441.1068-1-bmt@zurich.ibm.com
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/hns: Use new SQ doorbell register for HIP09
Lang Cheng [Tue, 23 Feb 2021 12:20:33 +0000 (20:20 +0800)]
RDMA/hns: Use new SQ doorbell register for HIP09

HIP09 uses new address space to map SQ doorbell registers, the doorbell of
each QP is isolated based on the size of 64KB, which can improve the
performance in concurrency scenarios.

Link: https://lore.kernel.org/r/1614082833-23130-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoLinux 5.12-rc2
Linus Torvalds [Sat, 6 Mar 2021 01:33:41 +0000 (17:33 -0800)]
Linux 5.12-rc2

3 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Sat, 6 Mar 2021 01:27:59 +0000 (17:27 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Nothing special here, though Bob's regression fixes for rxe would have
  made it before the rc cycle had there not been such strong winter
  weather!

   - Fix corner cases in the rxe reference counting cleanup that are
     causing regressions in blktests for SRP

   - Two kdoc fixes so W=1 is clean

   - Missing error return in error unwind for mlx5

   - Wrong lock type nesting in IB CM"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()
  RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()
  RDMA/rxe: Fix missed IB reference counting in loopback
  RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc
  RDMA/mlx5: Set correct kernel-doc identifier
  IB/mlx5: Add missing error code
  RDMA/rxe: Fix missing kconfig dependency on CRYPTO
  RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep

3 years agoMerge tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 6 Mar 2021 01:23:03 +0000 (17:23 -0800)]
Merge tag 'gcc-plugins-v5.12-rc2' of git://git./linux/kernel/git/kees/linux

Pull gcc-plugins fixes from Kees Cook:
 "Tiny gcc-plugin fixes for v5.12-rc2. These issues are small but have
  been reported a couple times now by static analyzers, so best to get
  them fixed to reduce the noise. :)

   - Fix coding style issues (Jason Yan)"

* tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: latent_entropy: remove unneeded semicolon
  gcc-plugins: structleak: remove unneeded variable 'ret'

3 years agoMerge tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Sat, 6 Mar 2021 01:21:25 +0000 (17:21 -0800)]
Merge tag 'pstore-v5.12-rc2' of git://git./linux/kernel/git/kees/linux

Pull pstore fixes from Kees Cook:

 - Rate-limit ECC warnings (Dmitry Osipenko)

 - Fix error path check for NULL (Tetsuo Handa)

* tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Rate-limit "uncorrectable error in header" message
  pstore: Fix warning in pstore_kill_sb()

3 years agoMerge tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Fri, 5 Mar 2021 21:25:23 +0000 (13:25 -0800)]
Merge tag 'for-5.12/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
 "Fix DM verity target's optional Forward Error Correction (FEC) for
  Reed-Solomon roots that are unaligned to block size"

* tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm verity: fix FEC for RS roots unaligned to block size
  dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size

3 years agoMerge tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 5 Mar 2021 20:59:37 +0000 (12:59 -0800)]
Merge tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe fixes:
      - more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal
        Terjan)
      - fix a hwmon error return (Daniel Wagner)
      - fix the keep alive timeout initialization (Martin George)
      - ensure the model_number can't be changed on a used subsystem
        (Max Gurtovoy)

 - rsxx missing -EFAULT on copy_to_user() failure (Dan)

 - rsxx remove unused linux.h include (Tian)

 - kill unused RQF_SORTED (Jean)

 - updated outdated BFQ comments (Joseph)

 - revert work-around commit for bd_size_lock, since we removed the
   offending user in this merge window (Damien)

* tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block:
  nvmet: model_number must be immutable once set
  nvme-fabrics: fix kato initialization
  nvme-hwmon: Return error code when registration fails
  nvme-pci: add quirks for Lexar 256GB SSD
  nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
  nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
  rsxx: Return -EFAULT if copy_to_user() fails
  block/bfq: update comments and default value in docs for fifo_expire
  rsxx: remove unused including <linux/version.h>
  block: Drop leftover references to RQF_SORTED
  block: revert "block: fix bd_size_lock use"

3 years agoMerge tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 5 Mar 2021 20:44:43 +0000 (12:44 -0800)]
Merge tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A bit of a mix between fallout from the worker change, cleanups and
  reductions now possible from that change, and fixes in general. In
  detail:

   - Fully serialize manager and worker creation, fixing races due to
     that.

   - Clean up some naming that had gone stale.

   - SQPOLL fixes.

   - Fix race condition around task_work rework that went into this
     merge window.

   - Implement unshare. Used for when the original task does unshare(2)
     or setuid/seteuid and friends, drops the original workers and forks
     new ones.

   - Drop the only remaining piece of state shuffling we had left, which
     was cred. Move it into issue instead, and we can drop all of that
     code too.

   - Kill f_op->flush() usage. That was such a nasty hack that we had
     out of necessity, we no longer need it.

   - Following from ->flush() removal, we can also drop various bits of
     ctx state related to SQPOLL and cancelations.

   - Fix an issue with IOPOLL retry, which originally was fallout from a
     filemap change (removing iov_iter_revert()), but uncovered an issue
     with iovec re-import too late.

   - Fix an issue with system suspend.

   - Use xchg() for fallback work, instead of cmpxchg().

   - Properly destroy io-wq on exec.

   - Add create_io_thread() core helper, and use that in io-wq and
     io_uring. This allows us to remove various silly completion events
     related to thread setup.

   - A few error handling fixes.

  This should be the grunt of fixes necessary for the new workers, next
  week should be quieter. We've got a pending series from Pavel on
  cancelations, and how tasks and rings are indexed. Outside of that,
  should just be minor fixes. Even with these fixes, we're still killing
  a net ~80 lines"

* tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block: (41 commits)
  io_uring: don't restrict issue_flags for io_openat
  io_uring: make SQPOLL thread parking saner
  io-wq: kill hashed waitqueue before manager exits
  io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
  io_uring: don't keep looping for more events if we can't flush overflow
  io_uring: move to using create_io_thread()
  kernel: provide create_io_thread() helper
  io_uring: reliably cancel linked timeouts
  io_uring: cancel-match based on flags
  io-wq: ensure all pending work is canceled on exit
  io_uring: ensure that threads freeze on suspend
  io_uring: remove extra in_idle wake up
  io_uring: inline __io_queue_async_work()
  io_uring: inline io_req_clean_work()
  io_uring: choose right tctx->io_wq for try cancel
  io_uring: fix -EAGAIN retry with IOPOLL
  io-wq: fix error path leak of buffered write hash map
  io_uring: remove sqo_task
  io_uring: kill sqo_dead and sqo submission halting
  io_uring: ignore double poll add on the same waitqueue head
  ...

3 years agoMerge tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 5 Mar 2021 20:36:33 +0000 (12:36 -0800)]
Merge tag 'pm-5.12-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix the usage of device links in the runtime PM core code and
  update the DTPM (Dynamic Thermal Power Management) feature added
  recently.

  Specifics:

   - Make the runtime PM core code avoid attempting to suspend supplier
     devices before updating the PM-runtime status of a consumer to
     'suspended' (Rafael Wysocki).

   - Fix DTPM (Dynamic Thermal Power Management) root node
     initialization and label that feature as EXPERIMENTAL in Kconfig
     (Daniel Lezcano)"

* tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  powercap/drivers/dtpm: Add the experimental label to the option description
  powercap/drivers/dtpm: Fix root node initialization
  PM: runtime: Update device status before letting suppliers suspend

3 years agoMerge tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 5 Mar 2021 20:32:17 +0000 (12:32 -0800)]
Merge tag 'acpi-5.12-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Make the empty stubs of some helper functions used when CONFIG_ACPI is
  not set actually match those functions (Andy Shevchenko)"

* tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: bus: Constify is_acpi_node() and friends (part 2)

3 years agoMerge tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 5 Mar 2021 20:26:24 +0000 (12:26 -0800)]
Merge tag 'iommu-fixes-v5.12-rc1' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Fix a sleeping-while-atomic issue in the AMD IOMMU code

 - Disable lazy IOTLB flush for untrusted devices in the Intel VT-d
   driver

 - Fix status code definitions for Intel VT-d

 - Fix IO Page Fault issue in Tegra IOMMU driver

* tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix status code for Allocate/Free PASID command
  iommu: Don't use lazy flush for untrusted device
  iommu/tegra-smmu: Fix mc errors on tegra124-nyan
  iommu/amd: Fix sleeping in atomic in increase_address_space()

3 years agoMerge tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Fri, 5 Mar 2021 20:21:14 +0000 (12:21 -0800)]
Merge tag 'for-5.12-rc1-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "More regression fixes and stabilization.

  Regressions:

   - zoned mode
      - count zone sizes in wider int types
      - fix space accounting for read-only block groups

   - subpage: fix page tail zeroing

  Fixes:

   - fix spurious warning when remounting with free space tree

   - fix warning when creating a directory with smack enabled

   - ioctl checks for qgroup inheritance when creating a snapshot

   - qgroup
      - fix missing unlock on error path in zero range
      - fix amount of released reservation on error
      - fix flushing from unsafe context with open transaction,
        potentially deadlocking

   - minor build warning fixes"

* tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: do not account freed region of read-only block group as zone_unusable
  btrfs: zoned: use sector_t for zone sectors
  btrfs: subpage: fix the false data csum mismatch error
  btrfs: fix warning when creating a directory with smack enabled
  btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
  btrfs: export and rename qgroup_reserve_meta
  btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
  btrfs: fix spurious free_space_tree remount warning
  btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
  btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
  btrfs: ref-verify: use 'inline void' keyword ordering

3 years agoMerge tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 5 Mar 2021 20:12:28 +0000 (12:12 -0800)]
Merge tag 'devicetree-fixes-for-5.12-1' of git://git./linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Another batch of graph and video-interfaces schema conversions

 - Drop DT header symlink for dropped C6X arch

 - Fix bcm2711-hdmi schema error

* tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: media: Use graph and video-interfaces schemas, round 2
  dts: drop dangling c6x symlink
  dt-bindings: bcm2711-hdmi: Fix broken schema

3 years agoMerge tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Fri, 5 Mar 2021 20:04:59 +0000 (12:04 -0800)]
Merge tag 'trace-v5.12-rc1' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Functional fixes:

   - Fix big endian conversion for arm64 in recordmcount processing

   - Fix timestamp corruption in ring buffer on discarding events

   - Fix memory leak in __create_synth_event()

   - Skip selftests if tracing is disabled as it will cause them to
     fail.

  Non-functional fixes:

   - Fix help text in Kconfig

   - Remove duplicate prototype for trace_empty()

   - Fix stale comment about the trace_event_call flags.

  Self test update:

   - Add more information to the validation output of when a corrupt
     timestamp is found in the ring buffer, and also trigger a warning
     to make sure that tests catch it"

* tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix comment about the trace_event_call flags
  tracing: Skip selftests if tracing is disabled
  tracing: Fix memory leak in __create_synth_event()
  ring-buffer: Add a little more information and a WARN when time stamp going backwards is detected
  ring-buffer: Force before_stamp and write_stamp to be different on discard
  tracing: Fix help text of TRACEPOINT_BENCHMARK in Kconfig
  tracing: Remove duplicate declaration from trace.h
  ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount

3 years agoRDMA/rxe: Fix errant WARN_ONCE in rxe_completer()
Bob Pearson [Thu, 4 Mar 2021 19:20:49 +0000 (13:20 -0600)]
RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()

In rxe_comp.c in rxe_completer() the function free_pkt() did not clear skb
which triggered a warning at 'done:' and could possibly at 'exit:'. The
WARN_ONCE() calls are not actually needed.  The call to free_pkt() is
moved to the end to clearly show that all skbs are freed.

Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()
Bob Pearson [Thu, 4 Mar 2021 19:20:49 +0000 (13:20 -0600)]
RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()

rxe_rcv_mcast_pkt() dropped a reference to ib_device when no error
occurred causing an underflow on the reference counter.  This code is
cleaned up to be clearer and easier to read.

Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoRDMA/rxe: Fix missed IB reference counting in loopback
Bob Pearson [Thu, 4 Mar 2021 19:20:49 +0000 (13:20 -0600)]
RDMA/rxe: Fix missed IB reference counting in loopback

When the noted patch below extending the reference taken by
rxe_get_dev_from_net() in rxe_udp_encap_recv() until each skb is freed it
was not matched by a reference in the loopback path resulting in
underflows.

Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 years agoio_uring: don't restrict issue_flags for io_openat
Pavel Begunkov [Sun, 28 Feb 2021 22:35:14 +0000 (22:35 +0000)]
io_uring: don't restrict issue_flags for io_openat

45d189c606292 ("io_uring: replace force_nonblock with flags") did
something strange for io_openat() slicing all issue_flags but
IO_URING_F_NONBLOCK. Not a bug for now, but better to just forward the
flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme into block-5.12
Jens Axboe [Fri, 5 Mar 2021 16:13:07 +0000 (09:13 -0700)]
Merge tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme into block-5.12

Pull NVMe fixes from Christoph:

"nvme fixes for 5.12:

 - more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal Terjan)
 - fix a hwmon error return (Daniel Wagner)
 - fix the keep alive timeout initialization (Martin George)
 - ensure the model_number can't be changed on a used subsystem
   (Max Gurtovoy)"

* tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme:
  nvmet: model_number must be immutable once set
  nvme-fabrics: fix kato initialization
  nvme-hwmon: Return error code when registration fails
  nvme-pci: add quirks for Lexar 256GB SSD
  nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
  nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.

3 years agoio_uring: make SQPOLL thread parking saner
Jens Axboe [Fri, 5 Mar 2021 15:44:39 +0000 (08:44 -0700)]
io_uring: make SQPOLL thread parking saner

We have this weird true/false return from parking, and then some of the
callers decide to look at that. It can lead to unbalanced parks and
sqd locking. Have the callers check the thread status once it's parked.
We know we have the lock at that point, so it's either valid or it's NULL.

Fix race with parking on thread exit. We need to be careful here with
ordering of the sdq->lock and the IO_SQ_THREAD_SHOULD_PARK bit.

Rename sqd->completion to sqd->parked to reflect that this is the only
thing this completion event doesn.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: kill hashed waitqueue before manager exits
Jens Axboe [Fri, 5 Mar 2021 15:14:08 +0000 (08:14 -0700)]
io-wq: kill hashed waitqueue before manager exits

If we race with shutting down the io-wq context and someone queueing
a hashed entry, then we can exit the manager with it armed. If it then
triggers after the manager has exited, we can have a use-after-free where
io_wqe_hash_wake() attempts to wake a now gone manager process.

Move the killing of the hashed write queue into the manager itself, so
that we know we've killed it before the task exits.

Fixes: e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
Jens Axboe [Fri, 5 Mar 2021 04:02:58 +0000 (21:02 -0700)]
io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return

The callback can only be armed, if we get -EIOCBQUEUED returned. It's
important that we clear the WAITQ bit for other cases, otherwise we can
queue for async retry and filemap will assume that we're armed and
return -EAGAIN instead of just blocking for the IO.

Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't keep looping for more events if we can't flush overflow
Jens Axboe [Fri, 5 Mar 2021 00:15:48 +0000 (17:15 -0700)]
io_uring: don't keep looping for more events if we can't flush overflow

It doesn't make sense to wait for more events to come in, if we can't
even flush the overflow we already have to the ring. Return -EBUSY for
that condition, just like we do for attempts to submit with overflow
pending.

Cc: stable@vger.kernel.org # 5.11
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: move to using create_io_thread()
Jens Axboe [Thu, 4 Mar 2021 19:39:36 +0000 (12:39 -0700)]
io_uring: move to using create_io_thread()

This allows us to do task creation and setup without needing to use
completions to try and synchronize with the starting thread. Get rid of
the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
completion events - we can now do setup before the task is running.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge branch 'powercap'
Rafael J. Wysocki [Fri, 5 Mar 2021 15:19:10 +0000 (16:19 +0100)]
Merge branch 'powercap'

* powercap:
  powercap/drivers/dtpm: Add the experimental label to the option description
  powercap/drivers/dtpm: Fix root node initialization

3 years agonvmet: model_number must be immutable once set
Max Gurtovoy [Wed, 17 Feb 2021 17:19:40 +0000 (17:19 +0000)]
nvmet: model_number must be immutable once set

In case we have already established connection to nvmf target, it
shouldn't be allowed to change the model_number. E.g. if someone will
identify ctrl and get model_number of "my_model" later on will change
the model_numbel via configfs to "my_new_model" this will break the NVMe
specification for "Get Log Page – Persistent Event Log" that refers to
Model Number as: "This field contains the same value as reported in the
Model Number field of the Identify Controller data structure, bytes
63:24."

Although it doesn't mentioned explicitly that this field can't be
changed, we can assume it.

So allow setting this field only once: using configfs or in the first
identify ctrl operation.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-fabrics: fix kato initialization
Martin George [Thu, 11 Feb 2021 17:58:26 +0000 (23:28 +0530)]
nvme-fabrics: fix kato initialization

Currently kato is initialized to NVME_DEFAULT_KATO for both
discovery & i/o controllers. This is a problem specifically
for non-persistent discovery controllers since it always ends
up with a non-zero kato value. Fix this by initializing kato
to zero instead, and ensuring various controllers are assigned
appropriate kato values as follows:

non-persistent controllers  - kato set to zero
persistent controllers      - kato set to NVMF_DEV_DISC_TMO
                              (or any positive int via nvme-cli)
i/o controllers             - kato set to NVME_DEFAULT_KATO
                              (or any positive int via nvme-cli)

Signed-off-by: Martin George <marting@netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-hwmon: Return error code when registration fails
Daniel Wagner [Fri, 12 Feb 2021 09:30:15 +0000 (10:30 +0100)]
nvme-hwmon: Return error code when registration fails

The hwmon pointer wont be NULL if the registration fails. Though the
exit code path will assign it to ctrl->hwmon_device. Later
nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by
returning the error code from hwmon_device_register_with_info().

Fixes: ed7770f66286 ("nvme/hwmon: rework to avoid devm allocation")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-pci: add quirks for Lexar 256GB SSD
Pascal Terjan [Tue, 23 Feb 2021 22:10:46 +0000 (22:10 +0000)]
nvme-pci: add quirks for Lexar 256GB SSD

Add the NVME_QUIRK_NO_NS_DESC_LIST and NVME_QUIRK_IGNORE_DEV_SUBNQN
quirks for this buggy device.

Reported and tested in https://bugs.mageia.org/show_bug.cgi?id=28417

Signed-off-by: Pascal Terjan <pterjan@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
Zoltán Böszörményi [Sun, 21 Feb 2021 05:12:16 +0000 (06:12 +0100)]
nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state

My 2TB SKC2000 showed the exact same symptoms that were provided
in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on
Kingston A2000 SSDs"), i.e. a complete NVME lockup that needed
cold boot to get it back.

According to some sources, the A2000 is simply a rebadged
SKC2000 with a slightly optimized firmware.

Adding the SKC2000 PCI ID to the quirk list with the same workaround
as the A2000 made my laptop survive a 5 hours long Yocto bootstrap
buildfest which reliably triggered the SSD lockup previously.

Signed-off-by: Zoltán Böszörményi <zboszor@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
Julian Einwag [Tue, 16 Feb 2021 12:25:43 +0000 (13:25 +0100)]
nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.

The kernel fails to fully detect these SSDs, only the character devices
are present:

[   10.785605] nvme nvme0: pci function 0000:04:00.0
[   10.876787] nvme nvme1: pci function 0000:81:00.0
[   13.198614] nvme nvme0: missing or invalid SUBNQN field.
[   13.198658] nvme nvme1: missing or invalid SUBNQN field.
[   13.206896] nvme nvme0: Shutdown timeout set to 20 seconds
[   13.215035] nvme nvme1: Shutdown timeout set to 20 seconds
[   13.225407] nvme nvme0: 16/0/0 default/read/poll queues
[   13.233602] nvme nvme1: 16/0/0 default/read/poll queues
[   13.239627] nvme nvme0: Identify Descriptors failed (8194)
[   13.246315] nvme nvme1: Identify Descriptors failed (8194)

Adding the NVME_QUIRK_NO_NS_DESC_LIST fixes this problem.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205679
Signed-off-by: Julian Einwag <jeinwag-nvme@marcapo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
3 years agoMerge tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 5 Mar 2021 03:06:28 +0000 (19:06 -0800)]
Merge tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "More may show up but this is what I have at this stage: just a single
  nouveau regression fix, and a bunch of amdgpu fixes.

  amdgpu:
   - S0ix fix
   - Handle new NV12 SKU
   - Misc power fixes
   - Display uninitialized value fix
   - PCIE debugfs register access fix

  nouveau:
   - regression fix for gk104"

* tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie
  drm/amd/display: fix the return of the uninitialized value in ret
  drm/amdgpu: enable BACO runpm by default on sienna cichlid and navy flounder
  drm/amd/pm: correct Arcturus mmTHM_BACO_CNTL register address
  drm/amdgpu/swsmu/vangogh: Only use RLCPowerNotify msg for disable
  drm/amdgpu/pm: make unsupported power profile messages debug
  drm/amdgpu:disable VCN for Navi12 SKU
  drm/amdgpu: Only check for S0ix if AMD_PMC is configured
  drm/nouveau/fifo/gk104-gp1xx: fix creation of sw class

3 years agoMerge tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi
Linus Torvalds [Fri, 5 Mar 2021 02:53:30 +0000 (18:53 -0800)]
Merge tag 'mkp-scsi-fixes' of git://git./linux/kernel/git/mkp/scsi

Pull iSCSI fixes from Martin Petersen:
 "Three fixes for missed iSCSI verification checks (and make the sysfs
  files use "sysfs_emit()" - that's what it is there for)"

* tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
  scsi: iscsi: Verify lengths on passthrough PDUs
  scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
  scsi: iscsi: Restrict sessions and handles to admin capabilities

3 years agoMerge tag 'amd-drm-fixes-5.12-2021-03-03' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 5 Mar 2021 01:13:21 +0000 (11:13 +1000)]
Merge tag 'amd-drm-fixes-5.12-2021-03-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.12-2021-03-03:

amdgpu:
- S0ix fix
- Handle new NV12 SKU
- Misc power fixes
- Display uninitialized value fix
- PCIE debugfs register access fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210304043255.3792-1-alexander.deucher@amd.com
3 years agoMerge branch '00.00-inst' of git://github.com/skeggsb/linux into drm-fixes
Dave Airlie [Fri, 5 Mar 2021 00:55:57 +0000 (10:55 +1000)]
Merge branch '00.00-inst' of git://github.com/skeggsb/linux into drm-fixes

A single regression fix here that I noticed while testing a bunch of
boards for something else, not sure where this got lost!  Prevents 3D
driver from initialising on some GPUs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv5gmq14BrDmkMncfd=tHVSSaU89BdBEWfs6Jy-aRz03GQ@mail.gmail.com
3 years agoscsi: iscsi: Verify lengths on passthrough PDUs
Chris Leech [Wed, 24 Feb 2021 05:39:01 +0000 (21:39 -0800)]
scsi: iscsi: Verify lengths on passthrough PDUs

Open-iSCSI sends passthrough PDUs over netlink, but the kernel should be
verifying that the provided PDU header and data lengths fall within the
netlink message to prevent accessing beyond that in memory.

Cc: stable@vger.kernel.org
Reported-by: Adam Nichols <adam@grimm-co.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>