platform/kernel/linux-rpi.git
6 years agordma/cxgb4: Fix SRQ endianness annotations
Bart Van Assche [Tue, 31 Jul 2018 15:25:41 +0000 (08:25 -0700)]
rdma/cxgb4: Fix SRQ endianness annotations

This patch avoids that sparse complains about casts to restricted __be32.

Fixes: a3cdaa69e4ae ("cxgb4: Adds CPL support for Shared Receive Queues")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agordma/cxgb4: Remove a set-but-not-used variable
Bart Van Assche [Tue, 31 Jul 2018 15:08:15 +0000 (08:08 -0700)]
rdma/cxgb4: Remove a set-but-not-used variable

This patch avoids that the following warning is reported when building with
W=1:

drivers/infiniband/hw/cxgb4/cm.c:1860:5: warning: variable 'status' set but not used [-Wunused-but-set-variable]
  u8 status;
     ^~~~~~

Fixes: 6a0b6174d35a ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Prefix _ib to IB/RoCE specific functions
Parav Pandit [Sun, 29 Jul 2018 08:53:16 +0000 (11:53 +0300)]
RDMA/core: Prefix _ib to IB/RoCE specific functions

In rdma cm module, functions which are common between IB and iWarp
are named with cma_.
iWarp specific functions are prefixed with cma_iw.
IB specific functions are perfixed with cma_ib.

However some functions in request processing path didn't follow
cma_ib notion. Prefix them with _ib for better code clarity.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Simplify gid type check in cma_acquire_dev()
Parav Pandit [Sun, 29 Jul 2018 08:53:15 +0000 (11:53 +0300)]
RDMA/core: Simplify gid type check in cma_acquire_dev()

cma_add_one() initializes the default GID regardless of device type.
listen_id is bound to a device and an IP address, its GID type is
initialized by cma_acquire_dev().

Therefore a valid default GID type is always available, it is not needed
to check port type during cma_acquire_dev().

Initialize gid type of a cm id when the cm_id is created instead of
doing conditional checks during cma_acquire_dev() and trying to
initialize to 0 during _cma_attach_to_dev().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Avoid holding lock while initializing fields on stack
Parav Pandit [Sun, 29 Jul 2018 08:53:14 +0000 (11:53 +0300)]
RDMA/core: Avoid holding lock while initializing fields on stack

In various functions rdma_cm_event is zero initialized on stack using
memset() while holding lock which is not necessary.
Therefore, don't hold the lock while initializing on stack.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Return bool instead of int
Parav Pandit [Sun, 29 Jul 2018 08:53:13 +0000 (11:53 +0300)]
RDMA/core: Return bool instead of int

Return bool for following internal and inline functions as their
underlying APIs return bool too.

1. cma_zero_addr()
2. cma_loopback_addr()
3. cma_any_addr()
4. ib_addr_any()
5. ib_addr_loopback()

While we are touching cma_loopback_addr(), remove extra white spaces
in it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Get rid of 1 bit boolean
Parav Pandit [Sun, 29 Jul 2018 08:53:12 +0000 (11:53 +0300)]
RDMA/cma: Get rid of 1 bit boolean

Arrange fields of cma_req_info structure for efficiency on
stack and get rid of one bit boolean field.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Constify path record, ib_cm_event, listen_id pointers
Parav Pandit [Sun, 29 Jul 2018 08:53:11 +0000 (11:53 +0300)]
RDMA/cma: Constify path record, ib_cm_event, listen_id pointers

Constify several pointers such as path_rec, ib_cm_event and listen_id
pointers in several functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Constify dst_addr argument
Parav Pandit [Sun, 29 Jul 2018 08:53:10 +0000 (11:53 +0300)]
RDMA/core: Constify dst_addr argument

Following APIs are not supposed to modify addr or dest_addr contents.
Therefore make those function argument const for better code
readability.

1. rdma_resolve_ip()
2. rdma_addr_size()
3. rdma_resolve_addr()

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Simplify rdma_resolve_addr() error flow
Parav Pandit [Sun, 29 Jul 2018 08:53:09 +0000 (11:53 +0300)]
RDMA/cma: Simplify rdma_resolve_addr() error flow

Currently dst address is first set and later on cleared on either of the
3 error conditions are met.
However none of the APIs or checks are supposed to refer to the
destination address of the cm_id.
Therefore, set the destination address after necessary checks pass which
simplifies the error flow.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Initialize resource type in __rdma_create_id()
Parav Pandit [Sun, 29 Jul 2018 08:53:08 +0000 (11:53 +0300)]
RDMA/cma: Initialize resource type in __rdma_create_id()

Currently rdma_cm_id's resource tracking fields such as owner task and
kern_name and other non resource tracking fields are initialized in
in single function __rdma_create_id().

Therefore, initialize rdma_cm_id's resource type also in same init
function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Program the tclass and flow label into the hardware
Lijun Ou [Mon, 30 Jul 2018 12:20:30 +0000 (20:20 +0800)]
RDMA/hns: Program the tclass and flow label into the hardware

This was missed in a few places, and was just using 0.

Also correct the spelling of HNS_ROCE_FLOW_LABEL_MASK

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Use macro instead of magic number
Lijun Ou [Mon, 30 Jul 2018 12:20:29 +0000 (20:20 +0800)]
RDMA/hns: Use macro instead of magic number

This patch mainly uses CMD_CSQ_DESC_NUM instead of magic number in order
to improve readability.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Modify qp will return errno when qp type is illegal
Lijun Ou [Mon, 30 Jul 2018 12:20:28 +0000 (20:20 +0800)]
RDMA/hns: Modify qp will return errno when qp type is illegal

Set for ret was missing in the error path here, resulting in incorrect
error code for modify_qp.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Assign the value for vlan field of qp context
Lijun Ou [Mon, 30 Jul 2018 12:20:27 +0000 (20:20 +0800)]
RDMA/hns: Assign the value for vlan field of qp context

This patch mainly fills the correct value into the vlan id field of qp
context as well as update the vlan field name according to the latest
hardware user manual.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Only assgin the fields of the av if IB_QP_AV bit is set
Lijun Ou [Mon, 30 Jul 2018 12:20:25 +0000 (20:20 +0800)]
RDMA/hns: Only assgin the fields of the av if IB_QP_AV bit is set

Only when the IB_QP_AV flag of attr_mask is set is it valid to assign the
related fields of the av into the qp context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/providers: Remove pointless functions
Kamal Heib [Fri, 27 Jul 2018 18:23:06 +0000 (21:23 +0300)]
RDMA/providers: Remove pointless functions

The rdma core is taking care of return the right error code when the
rdma device callbacks aren't supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Check for verbs callbacks before using them
Kamal Heib [Fri, 27 Jul 2018 18:23:05 +0000 (21:23 +0300)]
RDMA/core: Check for verbs callbacks before using them

Make sure the providers implement the verbs callbacks before calling
them, otherwise return -EOPNOTSUPP.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Remove {create,destroy}_ah from mandatory verbs
Kamal Heib [Fri, 27 Jul 2018 18:23:04 +0000 (21:23 +0300)]
RDMA/core: Remove {create,destroy}_ah from mandatory verbs

{create,destroy}_ah aren't mandatory verbs, because not all providers
are implementing them.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/ipoib: Fix check for return code from ib_create_srq
Kamal Heib [Mon, 30 Jul 2018 18:56:44 +0000 (21:56 +0300)]
RDMA/ipoib: Fix check for return code from ib_create_srq

Make sure to check for "-EOPNOTSUPP" instead of "-ENOSYS" which is the
return code from ib_create_srq() in case that it not supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/providers: Fix return value from create_srq callbacks
Kamal Heib [Mon, 30 Jul 2018 18:56:43 +0000 (21:56 +0300)]
RDMA/providers: Fix return value from create_srq callbacks

The proper return code is "-EOPNOTSUPP" when the create_srq() callback
is not supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx4: Use 4K pages for kernel QP's WQE buffer
Jack Morgenstein [Thu, 26 Jul 2018 07:08:37 +0000 (10:08 +0300)]
IB/mlx4: Use 4K pages for kernel QP's WQE buffer

In the current implementation, the driver tries to allocate contiguous
memory, and if it fails, it falls back to 4K fragmented allocation.

Once the memory is fragmented, the first allocation might take a lot
of time, and even fail, which can cause connection failures.

This patch changes the logic to always allocate with 4K granularity,
since it's more robust and more likely to succeed.

This patch was tested with Lustre and no performance degradation
was observed.

Note: This commit eliminates the "shrinking WQE" feature. This feature
depended on using vmap to create a virtually contiguous send WQ.
vmap use was abandoned due to problems with several processors (see the
commit cited below). As a result, shrinking WQE was available only with
physically contiguous send WQs. Allocating such send WQs caused the
problems described above.
Therefore, as a side effect of eliminating the use of large physically
contiguous send WQs, the shrinking WQE feature became unavailable.

Warning example:
worker/20:1: page allocation failure: order:8, mode:0x80d0
CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<ffffffff81686d81>] dump_stack+0x19/0x1b
[<ffffffff81186160>] warn_alloc_failed+0x110/0x180
[<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
[<ffffffff811ce868>] alloc_pages_current+0x98/0x110
[<ffffffff81184fae>] __get_free_pages+0xe/0x50
[<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
[<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
[<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
[<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
[<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
[<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
[<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
[<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
[<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
[<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
[<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
[<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
[<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]

Fixes: 73898db04301 ("net/mlx4: Avoid wrong virtual mappings")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language
Jason Gunthorpe [Thu, 26 Jul 2018 22:37:14 +0000 (16:37 -0600)]
IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language

This clearly indicates that the input is a bitwise combination of values
in an enum, and identifies which enum contains the definition of the bits.

Special accessors are provided that handle the mandatory validation of the
allowed bits and enforce the correct type for bitwise flags.

If we had introduced this at the start then the kabi would have uniformly
used u64 data to pass flags, however today there is a mixture of u64 and
u32 flags. All places are converted to accept both sizes and the accessor
fixes it. This allows all existing flags to grow to u64 in future without
any hassle.

Finally all flags are, by definition, optional. If flags are not passed
the accessor does not fail, but provides a value of zero.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const
Bart Van Assche [Wed, 18 Jul 2018 16:25:32 +0000 (09:25 -0700)]
RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const

Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5, ib_post_send(), IB_WR_REG_SIG_MR: Do not modify the 'wr' argument
Bart Van Assche [Wed, 18 Jul 2018 16:25:15 +0000 (09:25 -0700)]
IB/mlx5, ib_post_send(), IB_WR_REG_SIG_MR: Do not modify the 'wr' argument

Since the next patch will constify the wr pointer, do not modify the data
that pointer points at.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA: Constify the argument of the work request conversion functions
Bart Van Assche [Wed, 18 Jul 2018 16:25:14 +0000 (09:25 -0700)]
RDMA: Constify the argument of the work request conversion functions

When posting a send work request, the work request that is posted is not
modified by any of the RDMA drivers. Make this explicit by constifying
most ib_send_wr pointers in RDMA transport drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/iser: Inline two work request conversion functions
Bart Van Assche [Wed, 18 Jul 2018 16:25:13 +0000 (09:25 -0700)]
IB/iser: Inline two work request conversion functions

Since the next patch will change the return type of these functions into a
const pointer and since the iSER driver modifies the work request these
functions return a pointer two, inline two work request conversion
function calls. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/cache: Restore compatibility for ib_query_gid
Jason Gunthorpe [Fri, 27 Jul 2018 15:48:30 +0000 (09:48 -0600)]
IB/cache: Restore compatibility for ib_query_gid

Code changes in smc have become so complicated this cycle that the RDMA
patches to remove ib_query_gid in smc create too complex merge conflicts.
Allow those conflicts to be resolved by using the net/smc hunks by
providing a compatibility wrapper. During the second phase of the merge
window this wrapper will be deleted and smc updated to use the new API.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Enable modify_cq for uverbs.
Lijun Ou [Wed, 25 Jul 2018 07:29:41 +0000 (15:29 +0800)]
RDMA/hns: Enable modify_cq for uverbs.

The driver implements the modify_cq callback, but did not set the bit to
expose it to userspace.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Update the data type of immediate data
Lijun Ou [Wed, 25 Jul 2018 07:29:40 +0000 (15:29 +0800)]
RDMA/hns: Update the data type of immediate data

Because the data structure of hip08 is little endian, it needs to fix the
immediate field of wqe and cqe into __le32.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Use delay instead of usleep
Lijun Ou [Wed, 25 Jul 2018 07:29:38 +0000 (15:29 +0800)]
RDMA/hns: Use delay instead of usleep

In order to avoid using usleep function in lock function, we use delay
function instead of it.  Besides, it also use brackets for standardized
the computed order.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Add illegal hop_num judgement
Lijun Ou [Wed, 25 Jul 2018 07:29:37 +0000 (15:29 +0800)]
RDMA/hns: Add illegal hop_num judgement

When hop_num is more than three, it need to return -EINVAL.  This patch
fixes it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Return correct error code from hns_roce_v1_rsv_lp_qp()
Lijun Ou [Wed, 25 Jul 2018 07:29:36 +0000 (15:29 +0800)]
RDMA/hns: Return correct error code from hns_roce_v1_rsv_lp_qp()

When create loop qp fail, it will return the correct result when
modify_qp() fails.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Add 50GE type of hnae3 device match
Lijun Ou [Wed, 25 Jul 2018 07:29:33 +0000 (15:29 +0800)]
RDMA/hns: Add 50GE type of hnae3 device match

This patch adds PCI matching for the hns 50GE NIC.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/hns: Do not overwrite the error code during error unwind in hns_roce_init
Lijun Ou [Wed, 25 Jul 2018 07:29:31 +0000 (15:29 +0800)]
RDMA/hns: Do not overwrite the error code during error unwind in hns_roce_init

When init cmq fail in initial flow of RoCE, it should return the errno of
cmq_init function, not of the rest call.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: avoid excessive warning msgs when creating VFs on 2nd port
Qing Huang [Mon, 23 Jul 2018 21:15:08 +0000 (14:15 -0700)]
IB/mlx5: avoid excessive warning msgs when creating VFs on 2nd port

When a CX5 device is configured in dual-port RoCE mode, after creating
many VFs against port 1, creating the same number of VFs against port 2
will flood kernel/syslog with something like
"mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
affiliated."

So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
repeatedly attempts to bind the new mpi structure to every device on the
list until it finds an unbound device.

Change the log level from warn to dbg to avoid log flooding as the warning
should be harmless.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/usnic: Suppress a compiler warning
Bart Van Assche [Mon, 23 Jul 2018 22:37:01 +0000 (15:37 -0700)]
RDMA/usnic: Suppress a compiler warning

This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:

drivers/infiniband/hw/usnic/usnic_fwd.c:95:2: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 20 [-Wstringop-truncation]
  strncpy(ufdev->name, netdev_name(ufdev->netdev),
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    sizeof(ufdev->name) - 1);
    ~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/xprtrdma: Restore needed argument to ib_post_send
Jason Gunthorpe [Thu, 26 Jul 2018 17:36:50 +0000 (11:36 -0600)]
net/xprtrdma: Restore needed argument to ib_post_send

The call in svc_rdma_post_chunk_ctxt() does actually use bad_wr.

Fixes: ed288d74a9e5 ("net/xprtrdma: Simplify ib_post_(send|recv|srq_recv)() calls")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Do not ignore net namespace for unbound cm_id
Parav Pandit [Mon, 16 Jul 2018 08:50:13 +0000 (11:50 +0300)]
RDMA/cma: Do not ignore net namespace for unbound cm_id

Currently if the cm_id is not bound to any netdevice, than for such cm_id,
net namespace is ignored; which is incorrect.

Regardless of cm_id bound to a netdevice or not, net namespace must
match. When a cm_id is bound to a netdevice, in such case net namespace
and netdevice both must match.

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/cma: Consider netdevice for RoCE ports
Parav Pandit [Mon, 16 Jul 2018 08:50:12 +0000 (11:50 +0300)]
RDMA/cma: Consider netdevice for RoCE ports

When netdevice is not found for a request, and if it for RoCE port,
currently it allows matching the listener as long as port number matches
by ignoring the netdevice.

Now that we always prefer to have netdevice associated with RoCE, when
netdevice is not found, don't consider RoCE ports.

In other words, a NULL netdevice with RoCE is not acceptable. Therefore,
remove this confusing RoCE port ignorance check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Introduce and use sgid_attr in CM requests
Parav Pandit [Mon, 16 Jul 2018 08:50:11 +0000 (11:50 +0300)]
IB/core: Introduce and use sgid_attr in CM requests

For RoCE, when CM requests are received for RC and UD connections,
netdevice of the incoming request is unavailable. Because of that CM
requests are always forwarded to init_net namespace.

Now that we have the GID attribute available, introduce SGID attribute in
incoming CM requests and refer to the netdevice of it.  This is similar to
existing SGID attribute field in outgoing CM requests for RC and UD
transports.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/usnic: usnic should not select INFINIBAND_USER_ACCESS
Jason Gunthorpe [Tue, 24 Jul 2018 20:37:52 +0000 (14:37 -0600)]
IB/usnic: usnic should not select INFINIBAND_USER_ACCESS

This driver doesn't provide any kernel services, it only provides
an interface via uverbs, so it should depend on, not select, uverbs
support.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agordma/cxgb4: Add support for kernel mode SRQ's
Raju Rangoju [Wed, 25 Jul 2018 15:52:14 +0000 (21:22 +0530)]
rdma/cxgb4: Add support for kernel mode SRQ's

This patch implements the srq specific verbs such as create/destroy/modify
and post_srq_recv. And adds srq specific structures and defines to t4.h
and uapi.

Also updates the cq poll logic to deal with completions that are
associated with the SRQ's.

This patch also handles kernel mode SRQ_LIMIT events as well as flushed
SRQ buffers

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agordma/cxgb4: Add support for srq functions & structs
Raju Rangoju [Wed, 25 Jul 2018 15:52:13 +0000 (21:22 +0530)]
rdma/cxgb4: Add support for srq functions & structs

This patch adds kernel mode t4_srq structures and support functions,
uapi structures and defines, as well as firmware work request structures.

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Remove extra parentheses
Varsha Rao [Wed, 25 Jul 2018 18:43:56 +0000 (20:43 +0200)]
IB/core: Remove extra parentheses

Remove unnecessary parentheses to fix the clang warning of extraneous
parentheses.

Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/ocrdma: Suppress a compiler warning
Bart Van Assche [Wed, 18 Jul 2018 16:01:35 +0000 (09:01 -0700)]
RDMA/ocrdma: Suppress a compiler warning

This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:

In function 'ocrdma_mbx_get_ctrl_attribs',
    inlined from 'ocrdma_init_hw' at drivers/infiniband/hw/ocrdma/ocrdma_hw.c:3224:11:
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:1368:3: warning: 'strncpy' output may be truncated copying 31 bytes from a string of length 31 [-Wstringop-truncation]
   strncpy(dev->model_number,
   ^~~~~~~~~~~~~~~~~~~~~~~~~~
    hba_attribs->controller_model_number, 31);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Fix locking around struct ib_uverbs_file ucontext
Jason Gunthorpe [Tue, 10 Jul 2018 19:43:06 +0000 (13:43 -0600)]
IB/uverbs: Fix locking around struct ib_uverbs_file ucontext

We have a parallel unlocked reader and writer with ib_uverbs_get_context()
vs everything else, and nothing guarantees this works properly.

Audit and fix all of the places that access ucontext to use one of the
following locking schemes:
- Call ib_uverbs_get_ucontext() under SRCU and check for failure
- Access the ucontext through an struct ib_uobject context member
  while holding a READ or WRITE lock on the uobject.
  This value cannot be NULL and has no race.
- Hold the ucontext_lock and check for ufile->ucontext !NULL

This also re-implements ib_uverbs_get_ucontext() in a way that is safe
against concurrent ib_uverbs_get_context() and disassociation.

As a side effect, every access to ucontext in the commands is via
ib_uverbs_get_context() with an error check, or via the uobject, so there
is no longer any need for the core code to check ucontext on every command
call. These checks are also removed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Use the ucontext from the uobj, not the file
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:22 +0000 (20:55 -0600)]
IB/mlx5: Use the ucontext from the uobj, not the file

This approach matches the standard flow of the typical write method that
relies on the HW object to store the device and the uobject to access the
ucontext.  Avoids the use of the devx_ufile2uctx in several places will
make revising the semantics of ib_uverbs_get_ucontext() in the next patch
simpler.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Move the FD uobj type struct file allocation to alloc_commit
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:21 +0000 (20:55 -0600)]
IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit

Allocating the struct file during alloc_begin creates this strange
asymmetry with IDR, where the FD has two krefs pointing at it during the
pre-commit phase. In particular this makes the abort process for FD very
strange and confusing.

For instance abort currently calls the type's destroy_object twice, and
the fops release once if abort is done. This is very counter intuitive. No
fops should be called until alloc_commit succeeds, and destroy_object
should only ever be called once.

Moving the struct file allocation to the alloc_commit is now simple, as we
already support failure of rdma_alloc_commit_uobject, with all the
required rollback pieces.

This creates an understandable symmetry with IDR and simplifies/fixes the
abort handling for FD types.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Always propagate errors from rdma_alloc_commit_uobject()
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:20 +0000 (20:55 -0600)]
IB/uverbs: Always propagate errors from rdma_alloc_commit_uobject()

The ioctl framework already does this correctly, but the write path did
not. This is trivially fixed by simply using a standard pattern to return
uobj_alloc_commit() as the last statement in every function.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Rework the locking for cleaning up the ucontext
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:19 +0000 (20:55 -0600)]
IB/uverbs: Rework the locking for cleaning up the ucontext

The locking here has always been a bit crazy and spread out, upon some
careful analysis we can simplify things.

Create a single function uverbs_destroy_ufile_hw() that internally handles
all locking. This pulls together pieces of this process that were
sprinkled all over the places into one place, and covers them with one
lock.

This eliminates several duplicate/confusing locks and makes the control
flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely
simple.

Unfortunately we have to keep an extra mutex, ucontext_lock.  This lock is
logically part of the rwsem and provides the 'down write, fail if write
locked, wait if read locked' semantic we require.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Revise and clarify the rwsem and uobjects_lock
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:18 +0000 (20:55 -0600)]
IB/uverbs: Revise and clarify the rwsem and uobjects_lock

Rename 'cleanup_rwsem' to 'hw_destroy_rwsem' which is held across any call
to the type destroy function (aka 'hw' destroy). The main purpose of this
lock is to prevent normal add and destroy from running concurrently with
uverbs_cleanup_ufile()

Since the uobjects list is always manipulated under the 'hw_destroy_rwsem'
we can eliminate the uobjects_lock in the cleanup function. This allows
converting that lock to a very simple spinlock with a narrow critical
section.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Clarify and revise uverbs_close_fd
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:17 +0000 (20:55 -0600)]
IB/uverbs: Clarify and revise uverbs_close_fd

The locking requirements here have changed slightly now that we can rely
on the ib_uverbs_file always existing and containing all the necessary
locking infrastructure.

That means we can get rid of the cleanup_mutex usage (this was protecting
the check on !uboj->context).

Otherwise, follow the same pattern that IDR uses for destroy, acquire
exclusive write access, then call destroy and the undo the 'lookup'.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Revise the placement of get/puts on uobject
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:16 +0000 (20:55 -0600)]
IB/uverbs: Revise the placement of get/puts on uobject

This wasn't wrong, but the placement of two krefs didn't make any
sense. Follow some simple rules.

- A kref is held inside uobjects_list
- A kref is held inside the IDR
- A kref is held inside file->private
- A stack based kref is passed bettwen alloc_begin and
  alloc_abort/alloc_commit

Any place we destroy one of the above pointers, we stick a put,
or 'move' the kref into another pointer.

The key functions have sensible semantics:
- alloc_uobj fully initializes the common members in uobj, including
  the list
- Get rid of the uverbs_idr_remove_uobj helper since IDR remove
  does require put, but it depends on the situation. Later
  patches will re-consolidate this differently.
- alloc_abort always consumes the passed kref, done in the type
- alloc_commit always consumes the passed kref, done in the type
- rdma_remove_commit_uobject always pairs with a lookup_get

After it is all done the only control flow change is to:
- move a get from alloc_commit_fd_uobject to rdma_alloc_commit_uobject
- add a put to remove_commit_idr_uobject
- Consistenly use rdma_lookup_put in rdma_remove_commit_uobject at
  the right place

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Clarify the kref'ing ordering for alloc_commit
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:15 +0000 (20:55 -0600)]
IB/uverbs: Clarify the kref'ing ordering for alloc_commit

The alloc_commit callback makes the uobj visible to other threads,
and it does so using a 'move' semantic of the uobj kref on the stack
into the public storage (eg the IDR, uobject list and file_private_data)

Once this is done another thread could start up and trigger deletion
of the kref. Fortunately cleanup_rwsem happens to prevent this from
being a bug, but that is a fantastically unclear side effect.

Re-organize things so that alloc_commit is that last thing to touch
the uobj, get rid of the sneaky implicit dependency on cleanup_rwsem,
and add a comment reminding that uobj is no longer kref'd after
alloc_commit.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Handle IDR and FD types without truncation
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:14 +0000 (20:55 -0600)]
IB/uverbs: Handle IDR and FD types without truncation

Our ABI for write() uses a s32 for FDs and a u32 for IDRs, but internally
we ended up implicitly casting these ABI values into an 'int'. For ioctl()
we use a s64 for FDs and a u64 for IDRs, again casting to an int.

The various casts to int are all missing range checks which can cause
userspace values that should be considered invalid to be accepted.

Fix this by making the generic lookup routine accept a s64, which does not
truncate the write API's u32/s32 or the ioctl API's s64. Then push the
detailed range checking down to the actual type implementations to be
shared by both interfaces.

Finally, change the copy of the uobj->id to sign extend into a s64, so eg,
if we ever wish to return a negative value for a FD it is carried
properly.

This ensures that userspace values are never weirdly interpreted due to
the various trunctations and everything that is really out of range gets
an EINVAL.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Get rid of null_obj_type
Jason Gunthorpe [Wed, 11 Jul 2018 02:55:13 +0000 (20:55 -0600)]
IB/uverbs: Get rid of null_obj_type

If the method fails after calling rdma_explicit_destroy (eg if
copy_to_user faults) then it will trigger a kernel oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 800000000548d067 P4D 800000000548d067 PUD 54a0067 PMD 0
SMP PTI
CPU: 0 PID: 359 Comm: ibv_rc_pingpong Not tainted 4.18.0-rc1+ #28
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
RIP: 0010:          (null)
Code: Bad RIP value.
RSP: 0018:ffffc900001a3bf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88000603bd00 RCX: 0000000000000003
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88000603bd00
RBP: 0000000000000001 R08: ffffc900001a3cf8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900001a3cf0
R13: 0000000000000000 R14: ffffc900001a3cf0 R15: 0000000000000000
FS:  00007fb00dda8700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000000548e004 CR4: 00000000003606b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? rdma_lookup_put_uobject+0x22/0x50 [ib_uverbs]
 ? uverbs_finalize_object+0x3b/0x60 [ib_uverbs]
 ? uverbs_finalize_attrs+0x128/0x140 [ib_uverbs]
 ? ib_uverbs_cmd_verbs+0x698/0x7c0 [ib_uverbs]
 ? find_held_lock+0x2d/0x90
 ? __might_fault+0x39/0x90
 ? ib_uverbs_ioctl+0x111/0x1f0 [ib_uverbs]
 ? do_vfs_ioctl+0xa0/0x6d0
 ? trace_hardirqs_on_caller+0xed/0x180
 ? _raw_spin_unlock_irq+0x24/0x40
 ? syscall_trace_enter+0x138/0x1d0
 ? ksys_ioctl+0x35/0x60
 ? __x64_sys_ioctl+0x11/0x20
 ? do_syscall_64+0x5b/0x1c0
 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is because the type was replaced with the null_type during explicit
destroy that cannot complete the destruction.

One of the side effects of replacing the type is to make the object
handle totally unreachable - so no other command could attempt to use
it, even though it remains on the uboject list.

We can get the same end result by just fully destroying the object inside
rdma_explicit_destroy and leaving the caller the residual kref for the
uobj with no attached HW object, and no presence in the ubojects list.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/xprtrdma: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:31 +0000 (09:25 -0700)]
net/xprtrdma: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/smc: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:30 +0000 (09:25 -0700)]
net/smc: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/smc: Remove a WARN_ON() statement
Bart Van Assche [Wed, 18 Jul 2018 16:25:29 +0000 (09:25 -0700)]
net/smc: Remove a WARN_ON() statement

Remove a WARN_ON() statement that verifies something that is guaranteed
by the RDMA API, namely that the failed_wr pointer is not touched if an
ib_post_send() call succeeds and that it points at the failed wr if an
ib_post_send() call fails.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/rds: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:28 +0000 (09:25 -0700)]
net/rds: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/rds: Remove two WARN_ON() statements
Bart Van Assche [Wed, 18 Jul 2018 16:25:27 +0000 (09:25 -0700)]
net/rds: Remove two WARN_ON() statements

Remove two WARN_ON() statements that verify something that is guaranteed
by the RDMA API, namely that the failed_wr pointer is not touched if an
ib_post_send() call succeeds and that it points at the failed wr if an
ib_post_send() call fails.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/9p: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:26 +0000 (09:25 -0700)]
net/9p: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agofs/cifs: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:25 +0000 (09:25 -0700)]
fs/cifs: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonvmet-rdma: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:24 +0000 (09:25 -0700)]
nvmet-rdma: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonvme-rdma: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:23 +0000 (09:25 -0700)]
nvme-rdma: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/srpt: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:22 +0000 (09:25 -0700)]
IB/srpt: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/srp: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:21 +0000 (09:25 -0700)]
IB/srp: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/isert: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:20 +0000 (09:25 -0700)]
IB/isert: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/iser: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:19 +0000 (09:25 -0700)]
IB/iser: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/IPoIB: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:18 +0000 (09:25 -0700)]
IB/IPoIB: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Simplify ib_post_(send|recv|srq_recv)() calls
Bart Van Assche [Wed, 18 Jul 2018 16:25:17 +0000 (09:25 -0700)]
RDMA/core: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Allow ULPs to specify NULL as the third ib_post_(send|recv|srq_recv)() argument
Bart Van Assche [Wed, 18 Jul 2018 16:25:16 +0000 (09:25 -0700)]
IB/core: Allow ULPs to specify NULL as the third ib_post_(send|recv|srq_recv)() argument

This patch does not change the behavior of the modified functions.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/rxe: Drop QP0 silently
Zhu Yanjun [Fri, 13 Jul 2018 07:10:20 +0000 (03:10 -0400)]
IB/rxe: Drop QP0 silently

According to "Annex A16: RDMA over Converged Ethernet (RoCE)":

A16.4.3 MANAGEMENT INTERFACES

As defined in the base specification, a special Queue Pair, QP0 is defined
solely for communication between subnet manager(s) and subnet management
agents. Since such an IB-defined subnet management architecture is outside
the scope of this annex, it follows that there is also no requirement that
a port which conforms to this annex be associated with a QP0. Thus, for
end nodes designed to conform to this annex, the concept of QP0 is
undefined and unused for any port connected to an Ethernet network.

CA16-8: A packet arriving at a RoCE port containing a BTH with the
destination QP field set to QP0 shall be silently dropped.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/ipoib: Fix error return code in ipoib_dev_init()
Wei Yongjun [Wed, 11 Jul 2018 13:15:42 +0000 (13:15 +0000)]
IB/ipoib: Fix error return code in ipoib_dev_init()

Fix to return a negative error code from the ipoib_neigh_hash_init()
error handling case instead of 0, as done elsewhere in this function.

Fixes: 515ed4f3aab4 ("IB/IPoIB: Separate control and data related initializations")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Enable driver uapi commands for flow steering
Yishai Hadas [Mon, 23 Jul 2018 12:25:12 +0000 (15:25 +0300)]
IB/mlx5: Enable driver uapi commands for flow steering

Expose the mlx5 flow steering parsing trees, exposing the functionality to
user space.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Add support for a flow table destination for driver flow steering
Yishai Hadas [Mon, 23 Jul 2018 12:25:11 +0000 (15:25 +0300)]
IB/mlx5: Add support for a flow table destination for driver flow steering

Add support to set a destination that is a flow table, this can come from
the DEVX destination.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Support adding flow steering rule by raw description
Yishai Hadas [Mon, 23 Jul 2018 12:25:10 +0000 (15:25 +0300)]
IB/mlx5: Support adding flow steering rule by raw description

Add support to set a public flow steering rule when its destination is a
TIR by using raw specification data.

The logic follows the verbs API but instead of using ib_spec(s) the raw,
device specific, description is used.

This allows supporting specialty matchers without having to define new
matches in the verbs struct based language.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Introduce driver create and destroy flow methods
Yishai Hadas [Mon, 23 Jul 2018 12:25:09 +0000 (15:25 +0300)]
IB/mlx5: Introduce driver create and destroy flow methods

Introduce driver create and destroy flow methods on the uverbs flow
object.

This allows the driver to get its specific device attributes to match the
underlay specification while still using the generic ib_flow object for
cleanup and code sharing.

The IB object's attributes are set via the ib_set_flow() helper function.

The specific implementation for the given specification is added in
downstream patches.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB: Support ib_flow creation in drivers
Yishai Hadas [Mon, 23 Jul 2018 12:25:08 +0000 (15:25 +0300)]
IB: Support ib_flow creation in drivers

This patch considers the case that ib_flow is created by some device
driver with its specific parameters using the KABI infrastructure.

In that case both QP and ib_uflow_resources might not be applicable.
Downstream patches from this series use the above functionality.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Introduce flow steering matcher uapi object
Yishai Hadas [Mon, 23 Jul 2018 12:25:07 +0000 (15:25 +0300)]
IB/mlx5: Introduce flow steering matcher uapi object

Introduce flow steering matcher object and its create and destroy methods.

This matcher object holds some mlx5 specific driver properties that
matches the underlay device specification when an mlx5 flow steering group
is created.

It will be used in downstream patches to be part of mlx5 specific create
flow method.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoMerge branch 'mellanox/mlx5-next' into rdma.git for-next
Jason Gunthorpe [Tue, 24 Jul 2018 19:10:23 +0000 (13:10 -0600)]
Merge branch 'mellanox/mlx5-next' into rdma.git for-next

From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

This is required to resolve dependencies of the next series of RDMA
patches.

* branch 'mellanox/mlx5-next':
  net/mlx5: Add support for flow table destination number
  net/mlx5: Add forward compatible support for the FTE match data
  net/mlx5: Fix tristate and description for MLX5 module
  net/mlx5: Better return types for CQE API
  net/mlx5: Use ERR_CAST() instead of coding it
  net/mlx5: Add missing SET_DRIVER_VERSION command translation
  net/mlx5: Add XRQ commands definitions
  net/mlx5: Add core support for double vlan push/pop steering action
  net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures
  net/mlx5: FW tracer, add hardware structures
  net/mlx5: fix uaccess beyond "count" in debugfs read/write handlers

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/mlx5: Add support for flow table destination number
Yishai Hadas [Tue, 19 Jun 2018 12:23:36 +0000 (15:23 +0300)]
net/mlx5: Add support for flow table destination number

Add support to set a destination from a flow table number.
This functionality will be used in downstream patches from this
series by the DEVX stuff.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Add forward compatible support for the FTE match data
Yishai Hadas [Tue, 19 Jun 2018 12:09:48 +0000 (15:09 +0300)]
net/mlx5: Add forward compatible support for the FTE match data

Use the PRM size including the reserved when working with the FTE
match data.

This comes to support forward compatibility for cases that current
reserved data will be exposed by the firmware by an application that
uses the DEVX API without changing the kernel.

Also drop some driver checks around the match criteria leaving the work
for firmware to enable forward compatibility for future bits there.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi
Jason Gunthorpe [Wed, 11 Jul 2018 22:20:44 +0000 (16:20 -0600)]
IB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi

These constants are used in the ioctl interface so they are part of the
uapi, place them in the correct header for clarity.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoMAINTAINERS: Remove Dave Goodell from the usnic RDMA driver maintainer list
Bart Van Assche [Wed, 18 Jul 2018 16:14:36 +0000 (09:14 -0700)]
MAINTAINERS: Remove Dave Goodell from the usnic RDMA driver maintainer list

The e-mail address dgoodell@exch.cisco.com no longer exists. Additionally,
according to https://www.linkedin.com/in/goodell/ Dave is an Amazon
employee since December 2017. Hence remove his Cisco e-mail address from
the usnic maintainer list.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/bnxt_re: Modify a fall-through annotation
Bart Van Assche [Wed, 18 Jul 2018 15:58:02 +0000 (08:58 -0700)]
RDMA/bnxt_re: Modify a fall-through annotation

This patch avoids that gcc reports the following warning when building
with W=1:

drivers/infiniband/hw/bnxt_re/ib_verbs.c:2404:4: warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Remove set but not used variables
Kamal Heib [Wed, 18 Jul 2018 21:05:32 +0000 (00:05 +0300)]
RDMA/mlx5: Remove set but not used variables

Remove "uctx" and "pa" variables that were set but not used.

Fixes: a8b92ca1b0e5 ("IB/mlx5: Introduce DEVX")
Fixes: 8f0622873358 ("RDMA/mlx5: Remove debug prints of VMA pointers")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIPoIB: use kvzalloc to allocate an array of bucket pointers
Jan Dakinevich [Mon, 9 Jul 2018 13:51:03 +0000 (16:51 +0300)]
IPoIB: use kvzalloc to allocate an array of bucket pointers

This table by default takes 32KiB which is 3rd memory order. Meanwhile,
this memory is not aimed for DMA operation and could be safely allocated
by vmalloc.

Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/mlx5: Fix tristate and description for MLX5 module
Eran Ben Elisha [Tue, 17 Jul 2018 01:35:37 +0000 (18:35 -0700)]
net/mlx5: Fix tristate and description for MLX5 module

Current description did not include new devices. Fix that by proving the
correct generic description.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5: Better return types for CQE API
Tariq Toukan [Tue, 17 Jul 2018 01:35:36 +0000 (18:35 -0700)]
net/mlx5: Better return types for CQE API

Reduce sizes of return types.
Use bool for binary indication.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5: Use ERR_CAST() instead of coding it
Roi Dayan [Tue, 17 Jul 2018 01:35:35 +0000 (18:35 -0700)]
net/mlx5: Use ERR_CAST() instead of coding it

This makes it more readable that rule is being used to return an err.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5: Add missing SET_DRIVER_VERSION command translation
Noa Osherovich [Tue, 17 Jul 2018 01:35:34 +0000 (18:35 -0700)]
net/mlx5: Add missing SET_DRIVER_VERSION command translation

When translating command opcodes to a string, SET_DRIVER_VERSION
command was missing.

Fixes: 42ca502e179d0 ('net/mlx5_core: Use a macro in mlx5_command_str()')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5: Add XRQ commands definitions
Max Gurtovoy [Tue, 17 Jul 2018 01:35:33 +0000 (18:35 -0700)]
net/mlx5: Add XRQ commands definitions

Update mlx5 command list and error return function to handle XRQ
commands.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5: Add core support for double vlan push/pop steering action
Jianbo Liu [Tue, 17 Jul 2018 01:35:32 +0000 (18:35 -0700)]
net/mlx5: Add core support for double vlan push/pop steering action

As newer firmware supports double push/pop in a single FTE, we add
core bits and extend vlan action logic for it.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5: Expose MPEGC (Management PCIe General Configuration) structures
Eran Ben Elisha [Tue, 17 Jul 2018 01:35:31 +0000 (18:35 -0700)]
net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures

This patch exposes PRM layout for handling MPEGC (Management PCIe
General Configuration).

This will be used in the downstream patch for configuring MPEGC via the
driver.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5: FW tracer, add hardware structures
Feras Daoud [Tue, 17 Jul 2018 01:35:30 +0000 (18:35 -0700)]
net/mlx5: FW tracer, add hardware structures

This change adds the infrastructure to mlx5 core fw tracer.
It introduces the following 4 new registers:
MLX5_REG_MTRC_CAP  - Used to read tracer capabilities
MLX5_REG_MTRC_CONF - Used to set tracer configurations
MLX5_REG_MTRC_STDB - Used to query tracer strings database
MLX5_REG_MTRC_CTRL - Used to control the tracer

The capability of the tracing can be checked using mcam access
register, therefore, the mcam access register interface will expose
the tracer register.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agoIB/cm: Remove cma_multicast->igmp_joined
Jason Gunthorpe [Wed, 11 Jul 2018 08:20:29 +0000 (11:20 +0300)]
IB/cm: Remove cma_multicast->igmp_joined

This variable isn't read and written to with proper locking, so it is
racy. Instead of using an unlocked bool use presence in the mc->list

The caller could race rdma_join_multicast with rdma_leave_multicast which
would leak a mc join and cause a use after free of mc.

Instead, do not add the mc to the list until it has completed
initialization, all mcs on the list require leaving.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoRDMA/umem: Refactor exit paths in ib_umem_get
Leon Romanovsky [Tue, 10 Jul 2018 10:31:49 +0000 (13:31 +0300)]
RDMA/umem: Refactor exit paths in ib_umem_get

Simplify exit paths in ib_umem_get to use the standard goto unwind
pattern.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/umem: Don't hold mmap_sem for too long
Leon Romanovsky [Tue, 10 Jul 2018 10:31:48 +0000 (13:31 +0300)]
RDMA/umem: Don't hold mmap_sem for too long

DMA mapping is time consuming operation and doesn't need to be performed
with mmap_sem semaphore is held.

The semaphore only needs to be held for accounting and get_user_pages
related activities.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>