Shiraz Saleem [Tue, 2 Apr 2019 19:52:52 +0000 (14:52 -0500)]
RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs
Combine contiguous regions of PAGE_SIZE pages into single scatter list
entry while building the scatter table for a umem. This minimizes the
number of the entries in the scatter list and reduces the DMA mapping
overhead, particularly with the IOMMU.
Set default max_seg_size in core for IB devices to 2G and do not combine
if we exceed this limit.
Also, purge npages in struct ib_umem as we now DMA map the umem SGL with
sg_nents and npage computation is not needed. Drivers should now be using
ib_umem_num_pages(), so fix the last stragglers.
Move npages tracking to ib_umem_odp as ODP drivers still need it.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Gal Pressman <galpress@amazon.com>
Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Tue, 2 Apr 2019 12:39:55 +0000 (15:39 +0300)]
RDMA/cm: Remove useless zeroing of static global variable
Static global variables are initialized to zero by C standard,
there is no need to zero them again.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Potnuri Bharat Teja [Tue, 2 Apr 2019 09:16:11 +0000 (14:46 +0530)]
RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state
On receiving a TERM from tje peer, Host moves the QP to TERMINATE state
and then moves the adapter out of RDMA mode. After issuing a TERM, peer
issues a CLOSE and at this point of time if the connectivity between peer
and host is lost for a significant amount of time, the QP remains in
TERMINATE state.
Therefore c4iw_modify_qp() needs to initiate a close on entering terminate
state.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Mon, 25 Feb 2019 06:56:14 +0000 (08:56 +0200)]
RDMA/mlx5: Cleanup WQE page fault handler
Refactor the page fault handler to be more readable and extensible, this
cleanup was triggered by the error reported below. The code structure made
it unclear to the automatic tools to identify that such a flow is not
possible in real life because "requestor != NULL" means that "qp != NULL"
too.
drivers/infiniband/hw/mlx5/odp.c:1254 mlx5_ib_mr_wqe_pfault_handler()
error: we previously assumed 'qp' could be null (see line 1230)
Fixes:
08100fad5cac ("IB/mlx5: Add ODP SRQ support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Wed, 3 Apr 2019 18:28:05 +0000 (15:28 -0300)]
Merge HFI1 updates into k.o/for-next
Based on rdma.git for-rc for dependencies.
From Dennis Dalessandro:
====================
Here are some code improvement patches and fixes for less serious bugs to
TID RDMA than we sent for RC.
====================
* HFI1 updates:
IB/hfi1: Implement CCA for TID RDMA protocol
IB/hfi1: Remove WARN_ON when freeing expected receive groups
IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE
IB/hfi1: Add a function to read next expected psn from hardware flow
IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Mon, 18 Mar 2019 19:20:59 +0000 (12:20 -0700)]
IB/hfi1: Implement CCA for TID RDMA protocol
Currently, FECN handling is not implemented on TID RDMA expected receive
packets and therefore CCA can't be turned on when TID RDMA is
enabled. This patch adds the CCA support to TID RDMA protocol by:
- modifying FECN RSM rule to include kernel receive contexts
- For TID_RDMA READ RESP or TID RDMA ACK packet, a CNP will be sent out if
the FECN bit is set. For other TID RDMA packets that generate at least
one response packet, the BECN bit will be set in the first response
packet
- Copying expected packet data to destination buffer when FECN bit is set
in the TID RDMA READ RESP or TID RDMA WRITE DATA packet. In this case,
the expected packet is received as an eager packet
- Handling the TID sequence error for subsequent normal expected packets.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Mon, 18 Mar 2019 16:59:00 +0000 (09:59 -0700)]
IB/hfi1: Remove WARN_ON when freeing expected receive groups
When PSM user receive context is freed, the expected receive groups
allocated by the receive context will also been freed. However, if there
are still TID entries in use, the receive groups rcd->tid_full_list or
rcd->tid_used_list will not be empty, and thus triggering the WARN_ONs in
the function hfi1_free_ctxt_rcv_groups(). Even if the two lists may not
be empty, the hfi1 driver will free all TID entries and receive groups
associated with the receive context to prevent any resource leakage. Since
a clean user application exit is not controlled by the hfi1 driver, this
patch will remove the WARN_ONs in hfi1_free_ctxt_rcv_groups().
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Mon, 18 Mar 2019 16:58:50 +0000 (09:58 -0700)]
IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE
For expected packet receiving, the hfi1 hardware checks the KDETH PSN
automatically. However, when sequence error occurs, the hfi1 driver can
check the sequence instead until the hardware flow generation is reloaded.
TID RDMA READ and WRITE protocols implement similar software checking
mechanisms, but with different flags and different local variables to
store next expected PSN.
Unify the handling by using only one set of flag and local variable for
both TID RDMA READ and WRITE protocols.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Mon, 18 Mar 2019 16:58:40 +0000 (09:58 -0700)]
IB/hfi1: Add a function to read next expected psn from hardware flow
This patch adds a function to read next expected KDETH PSN from hardware
flow to simplify the code.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Mon, 18 Mar 2019 16:58:30 +0000 (09:58 -0700)]
IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA
The reference of destination memory region is first obtained when TID RDMA
WRITE request is first received on the responder side. This reference is
released once all TID RDMA WRITE RESP packets are sent to the requester
side, even though not all TID RDMA WRITE DATA packets may have been
received. This early release will especially be undesired if the software
needs to access the destination memory before the last data packet is
received.
This patch delays the release of the MR until all TID RDMA DATA packets
have been received. A helper function to release the reference is also
created to simplify the code.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Tue, 2 Apr 2019 12:35:13 +0000 (15:35 +0300)]
RDMA/cma: Set proper port number as index
Conversion from IDR to XArray missed the fact that idr_alloc() returned
index as a return value, this index was saved in port variable and used as
query index later on. This caused to the following error.
BUG: KASAN: use-after-free in cma_check_port+0x86a/0xa20 [rdma_cm]
Read of size 8 at addr
ffff888069fde998 by task ucmatose/387
CPU: 3 PID: 387 Comm: ucmatose Not tainted 5.1.0-rc2+ #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x7c/0xc0
print_address_description+0x6c/0x23c
? cma_check_port+0x86a/0xa20 [rdma_cm]
kasan_report.cold.3+0x1c/0x35
? cma_check_port+0x86a/0xa20 [rdma_cm]
? cma_check_port+0x86a/0xa20 [rdma_cm]
cma_check_port+0x86a/0xa20 [rdma_cm]
rdma_bind_addr+0x11bc/0x1b00 [rdma_cm]
? find_held_lock+0x33/0x1c0
? cma_ndev_work_handler+0x180/0x180 [rdma_cm]
? wait_for_completion+0x3d0/0x3d0
ucma_bind+0x120/0x160 [rdma_ucm]
? ucma_resolve_addr+0x1a0/0x1a0 [rdma_ucm]
ucma_write+0x1f8/0x2b0 [rdma_ucm]
? ucma_open+0x260/0x260 [rdma_ucm]
vfs_write+0x157/0x460
ksys_write+0xb8/0x170
? __ia32_sys_read+0xb0/0xb0
? trace_hardirqs_off_caller+0x5b/0x160
? do_syscall_64+0x18/0x3c0
do_syscall_64+0x95/0x3c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Allocated by task 381:
__kasan_kmalloc.constprop.5+0xc1/0xd0
cma_alloc_port+0x4d/0x160 [rdma_cm]
rdma_bind_addr+0x14e7/0x1b00 [rdma_cm]
ucma_bind+0x120/0x160 [rdma_ucm]
ucma_write+0x1f8/0x2b0 [rdma_ucm]
vfs_write+0x157/0x460
ksys_write+0xb8/0x170
do_syscall_64+0x95/0x3c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 381:
__kasan_slab_free+0x12e/0x180
kfree+0xed/0x290
rdma_destroy_id+0x6b6/0x9e0 [rdma_cm]
ucma_close+0x110/0x300 [rdma_ucm]
__fput+0x25a/0x740
task_work_run+0x10e/0x190
do_exit+0x85e/0x29e0
do_group_exit+0xf0/0x2e0
get_signal+0x2e0/0x17e0
do_signal+0x94/0x1570
exit_to_usermode_loop+0xfa/0x130
do_syscall_64+0x327/0x3c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reported-by: <syzbot+2e3e485d5697ea610460@syzkaller.appspotmail.com>
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Fixes:
638267537ad9 ("cma: Convert portspace IDRs to XArray")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Tue, 19 Mar 2019 09:10:08 +0000 (11:10 +0200)]
RDMA/hns: Fix bad endianess of port_pd variable
port_pd is treated as le32 in declaration and read, fix assignment to be
in le32 too. This change fixes the following compilation warnings.
drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: warning: incorrect type
in assignment (different base types)
drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: expected restricted __le32 [usertype] port_pd
drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: got restricted __be32 [usertype]
Fixes:
9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Lijun Ou <ouliun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shamir Rabinovitch [Sun, 31 Mar 2019 16:10:07 +0000 (19:10 +0300)]
IB: Pass only ib_udata in function prototypes
Now when ib_udata is passed to all the driver's object create/destroy APIs
the ib_udata will carry the ib_ucontext for every user command. There is
no need to also pass the ib_ucontext via the functions prototypes.
Make ib_udata the only argument psssed.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shamir Rabinovitch [Sun, 31 Mar 2019 16:10:06 +0000 (19:10 +0300)]
IB: Remove 'uobject->context' dependency in object destroy APIs
Now that we have the udata passed to all the ib_xxx object destroy APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shamir Rabinovitch [Sun, 31 Mar 2019 16:10:05 +0000 (19:10 +0300)]
IB: Pass uverbs_attr_bundle down ib_x destroy path
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x
destroy path as ib_udata. The next patch will use the ib_udata to free the
drivers destroy path from the dependency in 'uobject->context' as we
already did for the create path.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shamir Rabinovitch [Sun, 31 Mar 2019 16:10:04 +0000 (19:10 +0300)]
IB: Pass uverbs_attr_bundle down uobject destroy path
Pass uverbs_attr_bundle down the uobject destroy path. The next patch will
use this to eliminate the dependecy of the drivers in ib_x->uobject
pointers.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shamir Rabinovitch [Sun, 31 Mar 2019 16:10:03 +0000 (19:10 +0300)]
IB: ucontext should be set properly for all cmd & ioctl paths
the Attempt to use the below commit to initialize the ucontext for the
uobject destroy path has shown that the below commit is incomplete.
Parts were reverted and the ucontext set up in the uverbs_attr_bundle was
moved to rdma_lookup_get_uobject which is called from the uobj_get_XXX
macros and rdma_alloc_begin_uobject which is called when uobject is
created.
Fixes:
3d9dfd060391 ("IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows")
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:21:01 +0000 (16:21 -0800)]
opa_vnic: Convert vport_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:21:00 +0000 (16:21 -0800)]
qib: Convert qib_unit_table to XArray
Also remove qib_devs_list.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Fri, 8 Feb 2019 20:41:29 +0000 (15:41 -0500)]
hfi1: Convert hfi1_unit_table to XArray
Also remove hfi1_devs_list.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Sun, 10 Mar 2019 15:27:46 +0000 (17:27 +0200)]
RDMA/core: Don't compare specific bit after boolean AND
There is no need to perform extra comparison after boolean AND.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Sun, 10 Mar 2019 15:27:45 +0000 (17:27 +0200)]
RDMA/netlink: Remove unused data structure
Delete structure which is not connected due to removal in commit
cited in Fixes line.
Fixes:
a78e8723a505 ("RDMA/cma: Remove CM_ID statistics provided by rdma-cm module")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:59 +0000 (16:20 -0800)]
qedr: Convert srqidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:58 +0000 (16:20 -0800)]
qedr: Convert qpidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:57 +0000 (16:20 -0800)]
hfi1: Convert vesw_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 25 Oct 2018 15:15:34 +0000 (11:15 -0400)]
RDMA/hns: Convert qp_table_tree to XArray
Also fully initialise the qp before storing it in the XArray.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:39 +0000 (16:20 -0800)]
RDMA/hns: Convert cq_table to XArray
Change the locking to not disable interrupts as the lookup in interrupt
context will not see a freed CQ, thanks to the synchronize_irq() call
before freeing the CQ.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 12:01:49 +0000 (14:01 +0200)]
RDMA/core: Add command to set ib_core device net namspace sharing mode
Add netlink command that enables/disables sharing rdma device among
multiple net namespaces.
Using rdma tool,
$rdma sys set netns shared (default mode)
When rdma subsystem netns mode is set to shared mode, rdma devices
will be accessible in all net namespaces.
Using rdma tool,
$rdma sys set netns exclusive
When rdma subsystem netns mode is set to exclusive mode, devices
will be accessible in only one net namespace at any given
point of time.
If there are any net namespaces other than default init_net exists,
while executing this command, it will fail and mode cannot be changed.
To change this mode, netlink command is used instead of sysctl, because
netlink command allows to auto load a module.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 12:01:48 +0000 (14:01 +0200)]
RDMA/core: Add interface to read device namespace sharing mode
Add an interface via netlink command to query whether rdma devices are
shared among multiple net namespaces or not. When using RDMAtool, it can
be queried as,
$rdma system show netns
netns shared
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 12:01:47 +0000 (14:01 +0200)]
RDMA/core: Extend ib_device_get_by_index for net namespace
Extend ib_device_get_by_index() API to check device access for
net namespace for serving netlink commands.
Also enforce net ns check on dumpit commands which iterate over all
registered rdma devices and which don't call ib_device_get_by_index().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 12:01:46 +0000 (14:01 +0200)]
RDMA: Check net namespace access for uverbs, umad, cma and nldev
Introduce an API rdma_dev_access_netns() to check whether a rdma device
can be accessed from the specified net namespace or not.
Use rdma_dev_access_netns() while opening character uverbs, umad network
device and also check while rdma cm_id binds to rdma device.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 12:01:45 +0000 (14:01 +0200)]
RDMA/core: Add module param to disable device sharing among net ns
Add module parameter to change a sharing mode of ib_core early in the
boot process. This parameter helps to those systems where modern up
to date rdma tool (iproute2) package may not be available during
kernel upgrade cycle.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 11:56:15 +0000 (13:56 +0200)]
RDMA/core: Support core port attributes in non init_net
Now that sysfs compatibility layer for non init_net exists, add core port
attributes such as pkey and gid table to non init_net ns.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 11:56:13 +0000 (13:56 +0200)]
RDMA/core: Implement compat device/sysfs tree in net namespace
Implement compatibility layer sysfs entries of ib_core so that non
init_net net namespaces can also discover rdma devices.
Each non init_net net namespace has ib_core_device created in it.
Such ib_core_device sysfs tree resembles rdma devices found in
init_net namespace.
This allows discovering rdma devices in multiple non init_net net
namespaces via sysfs entries and helpful to rdma-core userspace.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 11:56:12 +0000 (13:56 +0200)]
RDMA/core: Restrict sysfs entries view to init_net
This is a preparation patch to provide isolation of rdma device in a
network namespace.
As first step, make rdma device visible only in init net namespace.
Subsequent patch will enable rdma device visibility back in multiple net
namespaces using compat ib_core_device device/sysfs tree.
Given that the IB subsystem depends on net stack, it needs to be
initialized after netdev and since it support devices, it needs to be
initialized before the device subsystem; therefore, change initcall
sequence to fs_initcall, so that when ib_core is compiled in the kernel
image, the right init sequence is followed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 11:56:11 +0000 (13:56 +0200)]
RDMA/core: Introduce ib_core_device to hold device
In order to support sysfs entries in multiple net namespaces for a rdma
device, introduce a ib_core_device whose scope is limited to hold core
device and per port sysfs related entries.
This is preparation patch so that multiple ib_core_devices in each net
namespace can be created in subsequent patch who all can share ib_device.
(a) Move sysfs specific fields to ib_core_device.
(b) Make sysfs and device life cycle related routines to work on
ib_core_device.
(c) Introduce and use rdma_init_coredev() helper to initialize
coredev fields.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Thu, 28 Mar 2019 16:49:47 +0000 (11:49 -0500)]
RDMA/rdmavt: Use correct sizing on buffers holding page DMA addresses
The buffer that holds the page DMA addresses is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.
Use ib_umem_num_pages() to size this buffer.
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Thu, 28 Mar 2019 16:49:46 +0000 (11:49 -0500)]
RDMA/rxe: Use correct sizing on buffers holding page DMA addresses
The buffer that holds the page DMA addresses is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.
Use ib_umem_num_pages() to size this buffer.
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Thu, 28 Mar 2019 16:49:45 +0000 (11:49 -0500)]
RDMA/mthca: Use correct sizing on buffers holding page DMA addresses
The buffer that holds the page DMA addresses is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.
Use ib_umem_num_pages() to size this buffer.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Thu, 28 Mar 2019 16:49:44 +0000 (11:49 -0500)]
RDMA/cxbg: Use correct sizing on buffers holding page DMA addresses
The PBL array that hold the page DMA address is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.
Use ib_umem_num_pages() to size this array.
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Selvin Xavier [Thu, 28 Mar 2019 16:49:43 +0000 (11:49 -0500)]
RDMA/bnxt_re: Use correct sizing on buffers holding page DMA addresses
umem->nmap is used while allocating internal buffer for storing
page DMA addresses. This causes out of bounds array access while iterating
the umem DMA-mapped SGL with umem page combining as umem->nmap can be
less than number of system pages in umem.
Use ib_umem_num_pages() instead of umem->nmap to size the page array.
Add a new structure (bnxt_qplib_sg_info) to pass sglist, npages and nmap.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Wed, 27 Mar 2019 23:50:51 +0000 (16:50 -0700)]
IB/qib: Remove a set-but-not-used variable
This patch avoids that a compiler warning is reported when building with
W=1.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes:
49c0e2414b20 ("IB/qib: Change SDMA progression mode depending on single- or multi-rail")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Wed, 27 Mar 2019 23:50:50 +0000 (16:50 -0700)]
IB/hfi1: Fix two format strings
Enable format string checking for hfi1_cdbg() and fix the resulting
compiler warnings.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Wed, 27 Mar 2019 23:50:48 +0000 (16:50 -0700)]
IB/mlx5: Declare devx_async_cmd_event_fops static
Avoid that sparse complains about a missing declaration.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes:
6bf8f22aea0d ("IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Wed, 27 Mar 2019 23:50:47 +0000 (16:50 -0700)]
RDMA/uverbs: Allow the compiler to verify declaration and definition consistency
This patch avoids that sparse reports the following warnings:
drivers/infiniband/core/uverbs_std_types_flow_action.c:442:30: warning: symbol 'uverbs_def_obj_flow_action' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_dm.c:112:30: warning: symbol 'uverbs_def_obj_dm' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_counters.c:153:30: warning: symbol 'uverbs_def_obj_counters' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_mr.c:213:30: warning: symbol 'uverbs_def_obj_mr' was not declared. Should it be static?
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes:
0bd01f3d0907 ("RDMA/uverbs: Require all objects to have a driver destroy function")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Wed, 27 Mar 2019 23:50:46 +0000 (16:50 -0700)]
RDMA/uverbs: Annotate uverbs_request_next_ptr() return value as a __user pointer
This patch avoids that sparse complains about a mismatch between the
returned value and the function return type.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes:
c3bea3d2dc53 ("RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Wed, 27 Mar 2019 23:50:45 +0000 (16:50 -0700)]
RDMA/uverbs: Add a __user annotation to a pointer
This patch avoids that sparse and smatch report the following:
warning: cast removes address space of expression
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Fixes:
3a6532c9af1a ("RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Zhu Yanjun [Wed, 27 Mar 2019 09:50:47 +0000 (05:50 -0400)]
IB/rxe: Replace av->network_type with skb->protocol
In the function rxe_init_packet, based on av->network_type, skb->protocol
is set to ipv4 or ipv6. The functions rxe_prepare and rxe_send are called
after the functin rxe_init_packet. So in these functions,
av->network_type can be replaced with skb->protocol.
The functions are in the xmit fast path. So with skb->protocol, the
performance will be better.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:49 +0000 (14:11 -0700)]
BPF: Add sample code for new ib_umad tracepoint
Provide a count of class types for a summary of MAD packets. The example
shows one way to filter the trace data based on management class.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:48 +0000 (14:11 -0700)]
IB/MAD: Add SMP details to MAD tracing
Decode more information from the packet and include it in the trace.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:47 +0000 (14:11 -0700)]
IB/UMAD: Add umad trace points
Trace MADs going to/from user space.
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:46 +0000 (14:11 -0700)]
IB/MAD: Add agent trace points
Trace agent details when agents are [un]registered. In addition, report
agent details on send/recv.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:45 +0000 (14:11 -0700)]
IB/MAD: Add recv path trace point
Trace received MAD details.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:44 +0000 (14:11 -0700)]
IB/MAD: Add send path trace points
Use the standard Linux trace mechanism to trace MADs being sent. 4 trace
points are added, when the MAD is posted to the qp, when the MAD is
completed, if a MAD is resent, and when the MAD completes in error.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com>
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Potnuri Bharat Teja [Tue, 26 Mar 2019 13:01:47 +0000 (18:31 +0530)]
iw_cxgb4: Update Maintainer details
Remove Steve and add undersigned as maintainer for iw_cxgb4 drivers.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Acked-by: Steve Wise <larrystevenwise@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yuval Shaia [Wed, 20 Mar 2019 14:14:18 +0000 (16:14 +0200)]
RDMA/vmw_pvrdma: Skip zeroing device attrs
Caller already clears props before calling query_device.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Artemy Kovalyov [Tue, 19 Mar 2019 09:24:39 +0000 (11:24 +0200)]
IB/mlx5: Compare only index part of a memory window rkey
The InfiniBand Architecture Specification section 10.6.7.2.4 TYPE 2 MEMORY
WINDOWS says that if the CI supports the Base Memory Management Extensions
defined in this specification, the R_Key format for a Type 2 Memory Window
must consist of:
* 24 bit index in the most significant bits of the R_Key, which is owned
by the CI, and
* 8 bit key in the least significant bits of the R_Key, which is owned by
the Consumer.
This means that the kernel should compare only the index part of a R_Key
to determine equality with another R_Key.
Fixes:
db570d7deafb ("IB/mlx5: Add ODP support to MW")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Artemy Kovalyov [Tue, 19 Mar 2019 09:24:37 +0000 (11:24 +0200)]
IB/mlx5: WQE dump jumps over first 16 bytes
Move index increment after its is used or otherwise it will start the dump
of the WQE from second WQE BB.
Fixes:
34f4c9554d8b ("IB/mlx5: Use fragmented QP's buffer for in-kernel users")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Moni Shoua [Tue, 19 Mar 2019 09:24:36 +0000 (11:24 +0200)]
IB/mlx5: Reset access mask when looping inside page fault handler
If page-fault handler spans multiple MRs then the access mask needs to
be reset before each MR handling or otherwise write access will be
granted to mapped pages instead of read-only.
Cc: <stable@vger.kernel.org> # 3.19
Fixes:
7bdf65d411c1 ("IB/mlx5: Handle page faults")
Reported-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Tue, 19 Mar 2019 09:10:09 +0000 (11:10 +0200)]
RDMA/hns: Limit scope of hns_roce_cmq_send()
The forgotten static keyword causes to the following error to appear while
building HNS driver. Declare hns_roce_cmq_send() to be static function to
fix this warning.
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:1089:5: warning: no previous
prototype for _hns_roce_cmq_send_ [-Wmissing-prototypes]
int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
Fixes:
6a04aed6afae ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Mon, 18 Mar 2019 16:55:49 +0000 (09:55 -0700)]
IB/hfi1: Fix the allocation of RSM table
The receive side mapping (RSM) on hfi1 hardware is a special
matching mechanism to direct an incoming packet to a given
hardware receive context. It has 4 instances of matching capabilities
(RSM0 - RSM3) that share the same RSM table (RMT). The RMT has a total of
256 entries, each of which points to a receive context.
Currently, three instances of RSM have been used:
1. RSM0 by QOS;
2. RSM1 by PSM FECN;
3. RSM2 by VNIC.
Each RSM instance should reserve enough entries in RMT to function
properly. Since both PSM and VNIC could allocate any receive context
between dd->first_dyn_alloc_ctxt and dd->num_rcv_contexts, PSM FECN must
reserve enough RMT entries to cover the entire receive context index
range (dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt) instead of only
the user receive contexts allocated for PSM
(dd->num_user_contexts). Consequently, the sizing of
dd->num_user_contexts in set_up_context_variables is incorrect.
Fixes:
2280740f01ae ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Mon, 18 Mar 2019 16:55:39 +0000 (09:55 -0700)]
IB/hfi1: Eliminate opcode tests on mr deref
When an old ack_queue entry is used to store an incoming request, it may
need to clean up the old entry if it is still referencing the
MR. Originally only RDMA READ request needed to reference MR on the
responder side and therefore the opcode was tested when cleaning up the
old entry. The introduction of tid rdma specific operations in the
ack_queue makes the specific opcode tests wrong. Multiple opcodes (RDMA
READ, TID RDMA READ, and TID RDMA WRITE) may need MR ref cleanup.
Remove the opcode specific tests associated with the ack_queue.
Fixes:
f48ad614c100 ("IB/hfi1: Move driver out of staging")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Mon, 18 Mar 2019 16:55:29 +0000 (09:55 -0700)]
IB/hfi1: Clear the IOWAIT pending bits when QP is put into error state
When a QP is put into error state, it may be waiting for send engine
resources. In this case, the QP will be removed from the send engine's
waiting list, but its IOWAIT pending bits are not cleared. This will
normally not have any major impact as the QP is being destroyed. However,
the QP still needs to wind down its operations, such as draining the send
queue by scheduling the send engine. Clearing the pending bits will avoid
any potential complications. In addition, if the QP will eventually hang,
clearing the pending bits can help debugging by presenting a consistent
picture if the user dumps the qp_stats.
This patch clears a QP's IOWAIT_PENDING_IB and IO_PENDING_TID bits in
priv->s_iowait.flags in this case.
Fixes:
5da0fc9dbf89 ("IB/hfi1: Prepare resource waits for dual leg")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kaike Wan [Mon, 18 Mar 2019 16:55:19 +0000 (09:55 -0700)]
IB/hfi1: Failed to drain send queue when QP is put into error state
When a QP is put into error state, all pending requests in the send work
queue should be drained. The following sequence of events could lead to a
failure, causing a request to hang:
(1) The QP builds a packet and tries to send through SDMA engine.
However, PIO engine is still busy. Consequently, this packet is put on
the QP's tx list and the QP is put on the PIO waiting list. The field
qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN;
(2) The QP is put into error state by the user application and
notify_error_qp() is called, which removes the QP from the PIO waiting
list and the packet from the QP's tx list. In addition, qp->s_flags is
cleared of RVT_S_ANY_WAIT_IO bits, which does not include
HFI1_S_WAIT_PIO_DRAIN bit;
(3) The hfi1_schdule_send() function is called to drain the QP's send
queue. Subsequently, hfi1_do_send() is called. Since the flag bit
HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails. As
a result, hfi1_do_send() bails out without draining any request from
the send queue;
(4) The PIO engine completes the sending and tries to wake up any QP on
its waiting list. But the QP has been removed from the PIO waiting
list and therefore is kept in sleep forever.
The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2).
HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN.
Fixes:
2e2ba09e48b7 ("IB/rdmavt, IB/hfi1: Create device dependent s_flags")
Cc: <stable@vger.kernel.org> # 4.19.x+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Sun, 17 Mar 2019 10:11:14 +0000 (12:11 +0200)]
overflow: Fix -Wtype-limits compilation warnings
Attempt to use check_shl_overflow() with inputs of unsigned type
produces the following compilation warnings.
drivers/infiniband/hw/mlx5/qp.c: In function _set_user_rq_size_:
./include/linux/overflow.h:230:6: warning: comparison of unsigned
expression >= 0 is always true [-Wtype-limits]
_s >= 0 && _s < 8 * sizeof(*d) ? _s : 0; \
^~
drivers/infiniband/hw/mlx5/qp.c:5820:6: note: in expansion of macro _check_shl_overflow_
if (check_shl_overflow(rwq->wqe_count, rwq->wqe_shift,
&rwq->buf_size))
^~~~~~~~~~~~~~~~~~
./include/linux/overflow.h:232:26: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
(_to_shift != _s || *_d < 0 || _a < 0 || \
^
drivers/infiniband/hw/mlx5/qp.c:5820:6: note: in expansion of macro _check_shl_overflow_
if (check_shl_overflow(rwq->wqe_count, rwq->wqe_shift, &rwq->buf_size))
^~~~~~~~~~~~~~~~~~
./include/linux/overflow.h:232:36: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
(_to_shift != _s || *_d < 0 || _a < 0 || \
^
drivers/infiniband/hw/mlx5/qp.c:5820:6: note: in expansion of macro _check_shl_overflow_
if (check_shl_overflow(rwq->wqe_count, rwq->wqe_shift,&rwq->buf_size))
^~~~~~~~~~~~~~~~~~
Fixes:
0c66847793d1 ("overflow.h: Add arithmetic shift helper")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Colin Ian King [Sat, 16 Mar 2019 23:05:12 +0000 (23:05 +0000)]
IB/iser: remove uninitialized variable len
The variable len is not being inintialized and the uninitialized value is
being returned. However, this return path is never reached because the
default case in the switch statement returns -ENOSYS. Clean up the code
by replacing the return -ENOSYS with a break for the default case and
returning -ENOSYS at the end of the function. This allows len to be
removed. Also remove redundant break that follows a return statement.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Kangjie Lu [Fri, 15 Mar 2019 06:57:14 +0000 (01:57 -0500)]
RDMA/i40iw: Handle workqueue allocation failure
alloc_ordered_workqueue may fail and return NULL. The fix captures the
failure and handles it properly to avoid potential NULL pointer
dereferences.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Reviewed-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Wed, 13 Mar 2019 19:05:59 +0000 (12:05 -0700)]
IB/core: Ensure an invalidate_range callback on ODP MR
No device supports ODP MR without an invalidate_range callback.
Warn on any any device which attempts to support ODP without supplying
this callback.
Then we can remove the checks for the callback within the code.
This stems from the discussion
https://www.spinics.net/lists/linux-rdma/msg76460.html
...which concluded this code was no longer necessary.
Acked-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Tue, 12 Mar 2019 08:15:44 +0000 (10:15 +0200)]
RDMA/rxe: Fix slab-out-bounds access which lead to kernel crash later
BUG: KASAN: slab-out-of-bounds in rxe_mem_init_user+0x6c1/0x740 [rdma_rxe]
Read of size 8 at addr
ffff88805c01a608 by task ib_send_bw/573
CPU: 24 PID: 573 Comm: ib_send_bw Not tainted 5.0.0-rc5+ #189
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
rxe_mem_init_user+0x6c1/0x740 [rdma_rxe]
rxe_reg_user_mr+0x9b/0x110 [rdma_rxe]
ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs]
ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs]
rxe_mem_init_user+0x6c1/0x740 [rdma_rxe]
rxe_reg_user_mr+0x9b/0x110 [rdma_rxe]
ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs]
ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs]
ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs]
ib_uverbs_ioctl+0x202/0x310 [ib_uverbs]
do_vfs_ioctl+0x193/0x1440
ksys_ioctl+0x3a/0x70
__x64_sys_ioctl+0x6f/0xb0
do_syscall_64+0x13f/0x570
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Allocated by task 573:
__kasan_kmalloc.constprop.5+0xc1/0xd0
__kmalloc+0x161/0x310
rxe_mem_alloc+0x52/0x470 [rdma_rxe]
rxe_mem_init_user+0x113/0x740 [rdma_rxe]
rxe_reg_user_mr+0x9b/0x110 [rdma_rxe]
ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs]
ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs]
ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs]
ib_uverbs_ioctl+0x202/0x310 [ib_uverbs]
do_vfs_ioctl+0x193/0x1440
ksys_ioctl+0x3a/0x70
__x64_sys_ioctl+0x6f/0xb0
do_syscall_64+0x13f/0x570
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 0:
__kasan_slab_free+0x12e/0x180
kfree+0x10a/0x2c0
rcu_process_callbacks+0xa77/0x1260
__do_softirq+0x2ad/0xacb
Test scenario:
ib_send_bw -x 1 -d rxe0 -a &
ib_send_bw -x 1 -d rxe0 -a localhost
Fixes:
8700e3e7c485 ("Soft RoCE driver")
Reported-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Enrico Weigelt, metux IT consult [Wed, 6 Mar 2019 22:08:45 +0000 (23:08 +0100)]
drivers: infiniband: Fix whitespace in kconfig
Adjust the kconfig whitespace in bnxt_re/iser to match the kernel
standard.
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Colin Ian King [Sat, 2 Mar 2019 23:06:36 +0000 (23:06 +0000)]
RDMA/nes: remove redundant check on udata
The non-null check on udata is redundant as this check was performed just
a few statements earlier and the check is always true as udata must be
non-null at this point. Remove redundant the check on udata and the
redundant else part that can never be executed.
Detected by CoverityScan, CID#1477317 ("Logically dead code")
Fixes:
899444505473 ("IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:21:06 +0000 (16:21 -0800)]
cma: Convert portspace IDRs to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:21:03 +0000 (16:21 -0800)]
ucm: Convert ctx_id_table to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:46 +0000 (16:20 -0800)]
ib core: Convert query_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:43 +0000 (16:20 -0800)]
RDMA/cm: Convert local_id_table to XArray
Also introduce cm_local_id() to reduce the amount of boilerplate when
converting a local ID to an XArray index.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:42 +0000 (16:20 -0800)]
IB/mad: Convert ib_mad_clients to XArray
Pull the allocation function out into its own function to reduce the
length of ib_register_mad_agent() a little and keep all the allocation
logic together.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:44 +0000 (16:20 -0800)]
mlx4: Convert pv_id_table to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:37 +0000 (16:20 -0800)]
mlx5: Convert mlx5_srq_table to XArray
Remove the custom spinlock as the XArray handles its own locking.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Tue, 26 Feb 2019 16:46:16 +0000 (08:46 -0800)]
IB/hfi1: Add running average for adaptive pio
The adaptive PIO implementation only considers the current packet size
when deciding between SDMA and pio for a packet.
This causes credit return forces if small and large packets are
interleaved.
Add a running average to avoid costly credit forces so that a large
sequence of small packets is required to go below the threshold that
chooses pio.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Erez Alfasi [Mon, 25 Feb 2019 06:52:30 +0000 (08:52 +0200)]
RDMA: Use __packed annotation instead of __attribute__ ((packed))
"__attribute__" set of macros has been standardized, have became more
potentially portable and consistent code back in v2.6.21 by commit
82ddcb040 ("[PATCH] extend the set of "__attribute__" shortcut macros").
Moreover, nowadays checkpatch.pl warns about using __attribute__((packed))
instead of __packed.
This patch converts all the "__attribute__ ((packed))" annotations to
"__packed" within the RDMA subsystem.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lijun Ou [Sat, 23 Feb 2019 12:01:28 +0000 (20:01 +0800)]
RDMA/hns: Delete unused variable in hns_roce_v2_modify_qp function
The src_mac array is not used in hns_roce_v2_modify_qp function.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lijun Ou [Sat, 23 Feb 2019 12:01:26 +0000 (20:01 +0800)]
RDMA/hns: Bugfix for sending with invalidate
According to IB protocol, the send with invalidate operation will not
invalidate mr that was created through a register mr or reregister mr.
Fixes:
e93df0108579 ("RDMA/hns: Support local invalidate for hip08 in kernel space")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lijun Ou [Sat, 23 Feb 2019 12:01:25 +0000 (20:01 +0800)]
RDMA/hns: Hide error print information with roce vf device
The driver should not print the error information when the hip08 driver
not support virtual function.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lijun Ou [Sat, 23 Feb 2019 12:01:24 +0000 (20:01 +0800)]
RDMA/hns: Only assgin some fields if the relatived attr_mask is set
According to IB protocol, some fields of qp context are filled with
optional when the relatived attr_mask are set. The relatived attr_mask
include IB_QP_TIMEOUT, IB_QP_RETRY_CNT, IB_QP_RNR_RETRY and
IB_QP_MIN_RNR_TIMER. Besides, we move some assignments of the fields of
qp context into the outside of the specific qp state jump function.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lijun Ou [Sat, 23 Feb 2019 12:01:23 +0000 (20:01 +0800)]
RDMA/hns: Update the range of raq_psn field of qp context
According to hip08 UM(User Manual), the raq_psn field size is [23:0].
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lijun Ou [Sat, 23 Feb 2019 12:01:22 +0000 (20:01 +0800)]
RDMA/hns: Only assign the fields of the rq psn if IB_QP_RQ_PSN is set
Only when the IB_QP_RQ_PSN flags of attr_mask is set is it valid to assign
the relatived fields of rq'psn into the qp context when modified qp.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lijun Ou [Sat, 23 Feb 2019 12:01:21 +0000 (20:01 +0800)]
RDMA/hns: Only assign the relatived fields of psn if IB_QP_SQ_PSN is set
Only when the IB_QP_SQ_PSN flags of attr_mask is set is it valid to assign
the relatived fields of psn into the qp context when modified qp.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:55 +0000 (16:20 -0800)]
cxgb4: Convert stid_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:54 +0000 (16:20 -0800)]
cxgb4: Convert atid_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:53 +0000 (16:20 -0800)]
cxgb4: Convert hwtid_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:52 +0000 (16:20 -0800)]
cxgb4: Convert mmidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:51 +0000 (16:20 -0800)]
cxgb4: Convert qpidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:50 +0000 (16:20 -0800)]
cxgb4: Convert cqidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:49 +0000 (16:20 -0800)]
cxgb3: Convert mmidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:48 +0000 (16:20 -0800)]
cxgb3: Convert qpidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:47 +0000 (16:20 -0800)]
cxgb3: Convert cqidr to XArray
It would make sense to convert this to an allocating XArray and remove the
kfifo that is currently used to allocate the CQID, but that work is better
done by someone who has the hardware to test with.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Linus Torvalds [Sun, 24 Mar 2019 21:02:26 +0000 (14:02 -0700)]
Linux 5.1-rc2
Linus Torvalds [Sun, 24 Mar 2019 20:41:37 +0000 (13:41 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for 5.1"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: prohibit fstrim in norecovery mode
ext4: cleanup bh release code in ext4_ind_remove_space()
ext4: brelse all indirect buffer in ext4_ind_remove_space()
ext4: report real fs size after failed resize
ext4: add missing brelse() in add_new_gdb_meta_bg()
ext4: remove useless ext4_pin_inode()
ext4: avoid panic during forced reboot
ext4: fix data corruption caused by unaligned direct AIO
ext4: fix NULL pointer dereference while journal is aborted
Linus Torvalds [Sun, 24 Mar 2019 18:42:10 +0000 (11:42 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
"Third more careful attempt for this set of fixes:
- Prevent a 32bit math overflow in the cpufreq code
- Fix a buffer overflow when scanning the cgroup2 cpu.max property
- A set of fixes for the NOHZ scheduler logic to prevent waking up
CPUs even if the capacity of the busy CPUs is sufficient along with
other tweaks optimizing the behaviour for asymmetric systems
(big/little)"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Skip LLC NOHZ logic for asymmetric systems
sched/fair: Tune down misfit NOHZ kicks
sched/fair: Comment some nohz_balancer_kick() kick conditions
sched/core: Fix buffer overflow in cgroup2 property cpu.max
sched/cpufreq: Fix 32-bit math overflow
Linus Torvalds [Sun, 24 Mar 2019 18:16:27 +0000 (11:16 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
"A larger set of perf updates.
Not all of them are strictly fixes, but that's solely the tip
maintainers fault as they let the timely -rc1 pull request fall
through the cracks for various reasons including travel. So I'm
sending this nevertheless because rebasing and distangling fixes and
updates would be a mess and risky as well. As of tomorrow, a strict
fixes separation is happening again. Sorry for the slip-up.
Kernel:
- Handle RECORD_MMAP vs. RECORD_MMAP2 correctly so different
consumers of the mmap event get what they requested.
Tools:
- A larger set of updates to perf record/report/scripts vs. time
stamp handling
- More Python3 fixups
- A pile of memory leak plumbing
- perf BPF improvements and fixes
- Finalize the perf.data directory storage"
[ Note: the kernel part is strictly a fix, the updates are purely to
tooling - Linus ]
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
perf bpf: Show more BPF program info in print_bpf_prog_info()
perf bpf: Extract logic to create program names from perf_event__synthesize_one_bpf_prog()
perf tools: Save bpf_prog_info and BTF of new BPF programs
perf evlist: Introduce side band thread
perf annotate: Enable annotation of BPF programs
perf build: Check what binutils's 'disassembler()' signature to use
perf bpf: Process PERF_BPF_EVENT_PROG_LOAD for annotation
perf symbols: Introduce DSO_BINARY_TYPE__BPF_PROG_INFO
perf feature detection: Add -lopcodes to feature-libbfd
perf top: Add option --no-bpf-event
perf bpf: Save BTF information as headers to perf.data
perf bpf: Save BTF in a rbtree in perf_env
perf bpf: Save bpf_prog_info information as headers to perf.data
perf bpf: Save bpf_prog_info in a rbtree in perf_env
perf bpf: Make synthesize_bpf_events() receive perf_session pointer instead of perf_tool
perf bpf: Synthesize bpf events with bpf_program__get_prog_info_linear()
bpftool: use bpf_program__get_prog_info_linear() in prog.c:do_dump()
tools lib bpf: Introduce bpf_program__get_prog_info_linear()
perf record: Replace option --bpf-event with --no-bpf-event
perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
...