platform/kernel/linux-rpi.git
5 years agoRDMA/restrack: Consolidate task name updates in one place
Leon Romanovsky [Tue, 2 Oct 2018 08:48:02 +0000 (11:48 +0300)]
RDMA/restrack: Consolidate task name updates in one place

Unify task update and kernel name set in one place.

Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/restrack: Un-inline set task implementation
Leon Romanovsky [Tue, 2 Oct 2018 08:48:01 +0000 (11:48 +0300)]
RDMA/restrack: Un-inline set task implementation

Prepare rdma_restrack_set_task() call to accommodate more
code by moving its implementation from *.h to *.c.

Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Check error status of rdma_find_ndev_for_src_ip_rcu
Parav Pandit [Fri, 21 Sep 2018 14:18:24 +0000 (09:18 -0500)]
RDMA/core: Check error status of rdma_find_ndev_for_src_ip_rcu

rdma_find_ndev_for_src_ip_rcu() returns either valid netdev pointer or
ERR_PTR().  Instead of checking for NULL, check for error.

Fixes: caf1e3ae9fa6 ("RDMA/core Introduce and use rdma_find_ndev_for_src_ip_rcu")
Reported-by: syzbot+20c32fa6ff84a2d28c36@syzkaller.appspotmail.com
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt
Venkata Sandeep Dhanalakota [Wed, 26 Sep 2018 17:44:52 +0000 (10:44 -0700)]
IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt

This patch moves ruc_loopback() from hfi1 into rdmavt for code sharing
with the qib driver.

Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/{hfi1, qib, rdmavt}: Move send completion logic to rdmavt
Venkata Sandeep Dhanalakota [Wed, 26 Sep 2018 17:44:42 +0000 (10:44 -0700)]
IB/{hfi1, qib, rdmavt}: Move send completion logic to rdmavt

Moving send completion code into rdmavt in order to have shared logic
between qib and hfi1 drivers.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/{hfi1, qib, rdmavt}: Move copy SGE logic into rdmavt
Brian Welty [Wed, 26 Sep 2018 17:44:33 +0000 (10:44 -0700)]
IB/{hfi1, qib, rdmavt}: Move copy SGE logic into rdmavt

This patch moves hfi1_copy_sge() into rdmavt for sharing with qib.
This patch also moves all the wss_*() functions into rdmavt as
several wss_*() functions are called from hfi1_copy_sge()

When SGE copy mode is adaptive, cacheless copy may be done in some cases
for performance reasons. In those cases, X86 cacheless copy function
is called since the drivers that use rdmavt and may set SGE copy mode
to adaptive are X86 only. For this reason, this patch adds
"depends on X86_64" to rdmavt/Kconfig.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx4: Avoid implicit enumerated type conversion
Nathan Chancellor [Mon, 24 Sep 2018 19:57:16 +0000 (12:57 -0700)]
IB/mlx4: Avoid implicit enumerated type conversion

Clang warns when one enumerated type is implicitly converted to another.

drivers/infiniband/hw/mlx4/mad.c:1811:41: warning: implicit conversion
from enumeration type 'enum mlx4_ib_qp_flags' to different enumeration
type 'enum ib_qp_create_flags' [-Wenum-conversion]
                qp_init_attr.init_attr.create_flags = MLX4_IB_SRIOV_TUNNEL_QP;
                                                    ~ ^~~~~~~~~~~~~~~~~~~~~~~

drivers/infiniband/hw/mlx4/mad.c:1819:41: warning: implicit conversion
from enumeration type 'enum mlx4_ib_qp_flags' to different enumeration
type 'enum ib_qp_create_flags' [-Wenum-conversion]
                qp_init_attr.init_attr.create_flags = MLX4_IB_SRIOV_SQP;
                                                    ~ ^~~~~~~~~~~~~~~~~

The type mlx4_ib_qp_flags explicitly provides supplemental values to the
type ib_qp_create_flags. Make that clear to Clang by changing the
create_flags type to u32.

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Bugfix for atomic operation
Lijun Ou [Sun, 30 Sep 2018 09:00:38 +0000 (17:00 +0800)]
RDMA/hns: Bugfix for atomic operation

The atomic operation not supported inline. Besides, the standard atomic
operation only support a sge and the sge is placed in the wqe.

Fix: 384f881("RDMA/hns: Add atomic support")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Add vlan enable bit for hip08
Lijun Ou [Sun, 30 Sep 2018 09:00:37 +0000 (17:00 +0800)]
RDMA/hns: Add vlan enable bit for hip08

In order to extend vlan device range, the design add two field of qp
context for checking vlan packet in sender and in recevicer.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Support local invalidate for hip08 in kernel space
Lijun Ou [Sun, 30 Sep 2018 09:00:36 +0000 (17:00 +0800)]
RDMA/hns: Support local invalidate for hip08 in kernel space

This patch adds local invalidate Memory Region (MR) support in the kernel
space driver.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Update some fields of qp context
Lijun Ou [Sun, 30 Sep 2018 09:00:35 +0000 (17:00 +0800)]
RDMA/hns: Update some fields of qp context

The hip08 hardware has two version. the version id are 0x20 and 0x21
according to the pci revision. It needs to adjust some fields for
extending new features. The specific updates include:

1. Add some fields for supporting new features by enabling some reserved
   fields in 0x20 version.
2. remove some fields which the user is not visiable in order to support
   the extend features.
3. Init some fields with zero.

These updates is compatible with 0x20 version.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Limit extend sq sge num
Lijun Ou [Sun, 30 Sep 2018 09:00:34 +0000 (17:00 +0800)]
RDMA/hns: Limit extend sq sge num

According to hip08 limit, the buffer size of extend sge needs to be an
integer wqe_sge_buf_page size. For example, the value of sge_shift field
of qp context is greater or equal to eight when buffer page size is 4K
size. The value of sge_shift field of qp context assigned by
hr_qp->sge.sge_cnt.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Update some attributes of the RoCE device
Lijun Ou [Sun, 30 Sep 2018 09:00:33 +0000 (17:00 +0800)]
RDMA/hns: Update some attributes of the RoCE device

According to the IB protocol definition, the driver needs to show the
correct device information and the information will be queryed by device
attribute.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Configure ecn field of ip header
Lijun Ou [Sun, 30 Sep 2018 09:00:32 +0000 (17:00 +0800)]
RDMA/hns: Configure ecn field of ip header

In order to compatible with the third party RoCE device, The hardware
modify the set method for the ecn field of ip header in new hip08
version. The high 6bit of tclass be assigned for dscp field of packet.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Limit the size of extend sge of sq
Lijun Ou [Sun, 30 Sep 2018 09:00:31 +0000 (17:00 +0800)]
RDMA/hns: Limit the size of extend sge of sq

The hip08 split two hardware version. The version id are 0x20 and 0x21
according to the PCI revison. The max size of extend sge of sq is limited
to 2M for 0x20 version and 8M for 0x21 version. It may be exceeded to 2M
according to the algorithm that compute the product of wqe count and
extend sge number of every wqe. But the product always less than 8M.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Bugfix for CM test
Lijun Ou [Sun, 30 Sep 2018 09:00:30 +0000 (17:00 +0800)]
RDMA/hns: Bugfix for CM test

It will print the warning when the MSB bit of SLID is not zero running
cm_req_handler function that test CM. It needs to fixed zero when test
RoCE device.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Submit bad wr when post send wr exception
Lijun Ou [Sun, 30 Sep 2018 09:00:29 +0000 (17:00 +0800)]
RDMA/hns: Submit bad wr when post send wr exception

When user issues a RDMA read and enables sq inline, it needs to report a
bad wr to user.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Bugfix for reserved qp number
Lijun Ou [Sun, 30 Sep 2018 09:00:28 +0000 (17:00 +0800)]
RDMA/hns: Bugfix for reserved qp number

It needs to include two special qps for every port. The hip08 have four
ports and the all reserved qp numbers are eight.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/netlink: Simplify netlink listener existence check
Leon Romanovsky [Tue, 2 Oct 2018 08:49:24 +0000 (11:49 +0300)]
RDMA/netlink: Simplify netlink listener existence check

All users of rdma_nl_chk_listeners() are interested to get boolean answer
if netlink socket has listeners, so update all places to boolean function.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA: Remove unused parameter from ib_modify_qp_is_ok()
Kamal Heib [Tue, 2 Oct 2018 13:11:21 +0000 (16:11 +0300)]
RDMA: Remove unused parameter from ib_modify_qp_is_ok()

The ll parameter is not used in ib_modify_qp_is_ok(), so remove it.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/rxe: Remove unused addr_same()
Kamal Heib [Tue, 2 Oct 2018 08:03:10 +0000 (11:03 +0300)]
RDMA/rxe: Remove unused addr_same()

This function is not in use - delete it.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/rxe: avoid srq memory leak
Zhu Yanjun [Sun, 30 Sep 2018 06:27:16 +0000 (02:27 -0400)]
IB/rxe: avoid srq memory leak

In rxe_queue_init, q and q->buf are allocated. In do_mmap_info, q->ip is
allocated. When error occurs, rxe_srq_from_init and the later error
handler do not free these allocated memories.  This will make memory leak.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mthca: Fix error return code in __mthca_init_one()
Wei Yongjun [Sat, 29 Sep 2018 03:55:16 +0000 (03:55 +0000)]
IB/mthca: Fix error return code in __mthca_init_one()

Fix to return a negative error code from the mthca_cmd_init() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 80fd8238734c ("[PATCH] IB/mthca: Encapsulate command interface init")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/uverbs: Fix RCU annotation for radix slot deference
Jason Gunthorpe [Fri, 28 Sep 2018 22:28:02 +0000 (16:28 -0600)]
RDMA/uverbs: Fix RCU annotation for radix slot deference

The uapi radix tree is a write-once data structure protected by kref.
Once we get to the ioctl() fop it is not possible for anything else
to be writing to it, so the access should use rcu_dereference_protected.

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA: Fix building with CONFIG_MMU=n
Jason Gunthorpe [Fri, 28 Sep 2018 21:20:23 +0000 (15:20 -0600)]
RDMA: Fix building with CONFIG_MMU=n

The zap_vma_ptes() is declared but not defined on NOMMU kernels, causing a
link error for the newly added uverbs code:

drivers/infiniband/core/uverbs_main.o: In function `uverbs_user_mmap_disassociate':
uverbs_main.c:(.text+0x114c): undefined reference to `zap_vma_ptes'
drivers/infiniband/core/uverbs_main.o: In function `rdma_umap_open':
uverbs_main.c:(.text+0x53c): undefined reference to `zap_vma_ptes'

Since all user access for all of our drivers depend on remapping pages to
user space disable USER_ACCESS when there is no mmu.

Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Introduce and use cma_ib_acquire_dev()
Parav Pandit [Sat, 15 Sep 2018 09:07:57 +0000 (12:07 +0300)]
RDMA/cma: Introduce and use cma_ib_acquire_dev()

When RDMA CM connect request arrives for IB transport, it already contains
device, port, netdevice (optional).

Instead of traversing all the cma devices, use the cma device already
found by the cma_find_listener() for which a listener id is provided.

iWarp devices doesn't need to derive RoCE GIDs, therefore drop RoCE
specific checks from cma_acquire_dev() and rename it to
cma_iw_acquire_dev().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Introduce and use cma_acquire_dev_by_src_ip()
Parav Pandit [Sat, 15 Sep 2018 09:07:56 +0000 (12:07 +0300)]
RDMA/cma: Introduce and use cma_acquire_dev_by_src_ip()

Light weight version of cma_acquire_dev() just for binding with rdma
device based on source IP(v4/v6) address.

This simplifies cma_acquire_dev() to avoid listen_id specific checks and
also for subsequent simplification for IB vs iWarp.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/cma: Allow accepting requests for multi port rdma device
Parav Pandit [Sat, 15 Sep 2018 09:07:55 +0000 (12:07 +0300)]
RDMA/cma: Allow accepting requests for multi port rdma device

When IP failover is used between multiple ports of a given rdma device,
allow accepting CM requests from either of the ports.  This is applicable
for IPv4 and IPv6 non link local addressing scheme.

IPv6 link local addresses are bound. IP failover requests for listen
cm_ids bound to specific netdev interfaces cannot be supported.
(Similar to traditional sockets).

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/hfi1: Use VL15 for SM packets
Kaike Wan [Wed, 26 Sep 2018 17:56:12 +0000 (10:56 -0700)]
IB/hfi1: Use VL15 for SM packets

Subnet Management Packets (SMP) should exclusively use VL15 and their SL
is ignored (IBTA v1.3, Section 3.5.8.2). Therefore, when an SMP is posted,
the SL in the address handle can be set to 0 by a user
application. Consequently, when an address handle is created by the IB
core, some fields in struct rvt_ah may not be set correctly by using the
SL2SC and SC2VL tables at the time. Subsequently, when the request is post
sent, the incoming swqe may fail the validation check, resulting in the
rejection of the send request.

This patch fixes the problem by using VL15 for any validation, ignoring
the SL in the address handle.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/hfi1: Add mtu check for operational data VLs
Alex Estrin [Wed, 26 Sep 2018 17:56:03 +0000 (10:56 -0700)]
IB/hfi1: Add mtu check for operational data VLs

Since Virtual Lanes BCT credits and MTU are set through separate MADs, we
have to ensure both are valid, and data VLs are ready for transmission
before we allow port transition to Armed state.

Fixes: 5e2d6764a729 ("IB/hfi1: Verify port data VLs credits on transition to Armed")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/hfi1: Ensure ucast_dlid access doesnt exceed bounds
Dennis Dalessandro [Wed, 26 Sep 2018 17:55:53 +0000 (10:55 -0700)]
IB/hfi1: Ensure ucast_dlid access doesnt exceed bounds

The dlid assignment made by looking into the u_ucast_dlid array does not
do an explicit check for the size of the array. The code path to arrive at
def_port, the index value is long and complicated so its best to just have
an explicit check here.

Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/hfi1: Add static trace for iowait
Kaike Wan [Wed, 26 Sep 2018 17:27:03 +0000 (10:27 -0700)]
IB/hfi1: Add static trace for iowait

This patch adds the static trace for resource wait.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/hfi1: Prepare resource waits for dual leg
Dennis Dalessandro [Fri, 28 Sep 2018 14:17:09 +0000 (07:17 -0700)]
IB/hfi1: Prepare resource waits for dual leg

Current implementation allows each qp to have only one send engine.  As
such, each qp has only one list to queue prebuilt packets when send engine
resources are not available. To improve performance, it is desired to
support multiple send engines for each qp.

This patch creates the framework to support two send engines
(two legs) for each qp for the TID RDMA protocol, which can be easily
extended to support more send engines. It achieves the goal by creating a
leg specific struct, iowait_work in the iowait struct, to hold the
work_struct and the tx_list as well as a pointer to the parent iowait
struct.

The hfi1_pkt_state now has an additional field to record the current legs
work structure and that is now passed to all egress waiters to determine
the leg that needs to wait via a new iowait helper.  The APIs are adjusted
to use the new leg specific struct as required.

Many new and modified helpers are added to support this change.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/rdmavt: Rename check_send_wqe as setup_wqe
Kaike Wan [Wed, 26 Sep 2018 17:26:44 +0000 (10:26 -0700)]
IB/rdmavt: Rename check_send_wqe as setup_wqe

The driver-provided function check_send_wqe allows the hardware driver to
check and set up the incoming send wqe before it is inserted into the swqe
ring. This patch will rename it as setup_wqe to better reflect its
usage. In addition, this function is only called when all setup is
complete in rdmavt.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: remove set but not used variable 'dseg'
YueHaibing [Fri, 28 Sep 2018 10:59:53 +0000 (10:59 +0000)]
RDMA/hns: remove set but not used variable 'dseg'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'hns_roce_v2_post_send':
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:194:35: warning:
 variable 'dseg' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/qedr: Remove enumerated type qed_roce_ll2_tx_dest
Nathan Chancellor [Thu, 27 Sep 2018 20:55:58 +0000 (13:55 -0700)]
RDMA/qedr: Remove enumerated type qed_roce_ll2_tx_dest

Clang warns when one enumerated type is explicitly converted to another.

drivers/infiniband/hw/qedr/qedr_roce_cm.c:198:28: warning: implicit
conversion from enumeration type 'enum qed_roce_ll2_tx_dest' to
different enumeration type 'enum qed_ll2_tx_dest' [-Wenum-conversion]
        ll2_tx_pkt.tx_dest = pkt->tx_dest;
                           ~ ~~~~~^~~~~~~
1 warning generated.

Turns out that QED_ROCE_LL2_TX_DEST_NW and QED_ROCE_LL2_TX_DEST_LB are
only used once in the whole tree and QED_ROCE_LL2_TX_DEST_MAX is used
nowhere. Remove them and use the equivalent values from qed_ll2_tx_dest
in their place.

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/hfi1: Error path MAD response size is incorrect
Michael J. Ruhl [Fri, 28 Sep 2018 14:34:57 +0000 (07:34 -0700)]
IB/hfi1: Error path MAD response size is incorrect

If a MAD packet has incorrect header information, the logic uses the reply
path to report the error.  The reply path expects *resp_len to be set
prior to return.  Unfortunately, *resp_len is set to 0 for this path.
This causes an incorrect response packet.

Fix by ensuring that the *resp_len is defaulted to the incoming packet
size (wc->bytes_len - sizeof(GRH)).

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/rxe: replace kvfree with vfree
Zhu Yanjun [Sun, 30 Sep 2018 05:57:42 +0000 (01:57 -0400)]
IB/rxe: replace kvfree with vfree

The buf is allocated by vmalloc_user in the function rxe_queue_init.
So it is better to free it by vfree.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/iser: Fix possible NULL deref at iser_inv_desc()
Israel Rukshin [Wed, 26 Sep 2018 09:44:18 +0000 (09:44 +0000)]
IB/iser: Fix possible NULL deref at iser_inv_desc()

In case target remote invalidates bogus rkey and signature is not used,
pi_ctx is NULL deref.

The commit also fails the connection on bogus remote invalidation.

Fixes: 59caaed7a72a ("IB/iser: Support the remote invalidation exception")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Enable DEVX on IB
Yishai Hadas [Thu, 20 Sep 2018 18:45:21 +0000 (21:45 +0300)]
IB/mlx5: Enable DEVX on IB

IB has additional protections with SELinux that cannot be extended to the
DEVX domain. SELinux can restrict access to pkeys. The first version of
DEVX blocked IB entirely until this could be understood.

Since DEVX requires CAP_NET_RAW, it supersedes the SELinux restriction and
allows userspace to form arbitrary packets with arbitrary pkeys.

Thus we enable IB for DEVX when CAP_NET_RAW is given.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Enable DEVX white list commands
Yishai Hadas [Thu, 20 Sep 2018 18:45:20 +0000 (21:45 +0300)]
IB/mlx5: Enable DEVX white list commands

Enable DEVX white list commands without the need for CAP_NET_RAW.

DEVX uid must exist from the ucontext or the device so that the firmware
will mask unprivileged capabilities.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Manage device uid for DEVX white list commands
Yishai Hadas [Thu, 20 Sep 2018 18:45:19 +0000 (21:45 +0300)]
IB/mlx5: Manage device uid for DEVX white list commands

Manage device uid for DEVX white list commands.  The created device uid
will be used on white list commands if the user didn't supply its own uid.

This will enable the firmware to filter out non privileged functionality
as of the recognition of the uid.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Expose RAW QP device handles to user space
Yishai Hadas [Thu, 20 Sep 2018 18:45:18 +0000 (21:45 +0300)]
IB/mlx5: Expose RAW QP device handles to user space

Expose RAW QP device handles to user space by extending the UHW part of
mlx5_ib_create_qp_resp.

This data is returned only when DEVX context is used where it may be
applicable.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Acquire and release mmap_sem on page range
Parav Pandit [Tue, 25 Sep 2018 09:04:04 +0000 (12:04 +0300)]
RDMA/core: Acquire and release mmap_sem on page range

Currently mmap_sem is read locked while pinning the memory.  In a
multi-threaded application of a process, holding mmap_sem lock creates
contention with other threads who might be either registering memory,
creating QPs or simply doing mmap() as such operations also require to
hold the mmap_sem write lock.

All such operation cannot make forward progress until one memory pin
operation is completed.  It becomes more worse if the memory is unpinned
and/or memory registration is large (in GB range).

Therefore, instead of holding mmap_sem for too long (for whole region
pinning), acquire and release the lock for every few pages.  For example
on x86 with 4K page size, acquire and release mmap_sem for every 2Mbytes
memory chunk.

This allows other competing threads to make progress who might wish to
hold mmap_sem for shorter duration.

When memory registration latency is measured using [1] for memory sizes
ranging from 4K to 48GB, <= 1% or 0.5% degradation is noticed. In many
runs no difference is seen other than run-to-run variance.

In other targeted tests of users with large memory, desired improvements
are seen due to reduced contention of mmap_sem.

[1] https://github.com/paravmellanox/rtool

$ rdma_resource_lat -c 1 -s 48G -a -u L -i 500 -A

It registers pinned memory from 4K to 48GB size with 500 iterations for
each memory size.

$ rdma_resource_lat -c 1 -s 12G -a -u L -i 500 -t 4

4 competing threads pin memory, each of 12GB size with 500 iterations.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: fix spelling mistake "reseved" -> "reserved"
Colin Ian King [Thu, 27 Sep 2018 13:24:30 +0000 (14:24 +0100)]
RDMA/hns: fix spelling mistake "reseved" -> "reserved"

Trivial fix to spelling mistake in dev_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/sa: simplify return code logic for ib_nl_send_msg()
Alex Estrin [Wed, 26 Sep 2018 17:02:32 +0000 (10:02 -0700)]
IB/sa: simplify return code logic for ib_nl_send_msg()

rdma_nl_multicast() returns either negative error code
or zero if succeeded. Remove unnecessary ret code checks
and reassignments.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/hfi1: Move UnsupportedVL bits definitions to the correct header
Michael J. Ruhl [Wed, 26 Sep 2018 17:02:22 +0000 (10:02 -0700)]
IB/hfi1: Move UnsupportedVL bits definitions to the correct header

The UnsupportedVL SendCtrl register bit information is defined in
the module rather than the chip register header file.

Move the defines to the appropriate header file.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mthca: remove redundant inner check of mdev->mthca_flags
Colin Ian King [Wed, 26 Sep 2018 12:26:08 +0000 (13:26 +0100)]
IB/mthca: remove redundant inner check of mdev->mthca_flags

The inner check for mdev->mthca_flags & MTHCA_FLAG_MSI_X is redundant
as this is already true because of the previous identical check in
an outer if statement.  Remove it

Detected by cppcheck:
(warning) Identical inner 'if' condition is always true.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Add MW support for hip08
Yixian Liu [Sun, 23 Sep 2018 09:20:46 +0000 (17:20 +0800)]
RDMA/hns: Add MW support for hip08

This patch adds memory window (mw) support in the kernel space.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Add enable judgement for UD vlan
Lijun Ou [Sat, 22 Sep 2018 08:21:08 +0000 (16:21 +0800)]
RDMA/hns: Add enable judgement for UD vlan

According to the hardware modification, the vlan of the UD packet is based
on the ud_vlan_en field of the UD wqe to determine whether to add a vlan
header to the UD packet. The ud_vlan_en field is filled by the driver
according to the net device.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Add CM of vlan device support
Lijun Ou [Sat, 22 Sep 2018 08:21:07 +0000 (16:21 +0800)]
RDMA/hns: Add CM of vlan device support

This patch mainly sets the vlan_id field in the WC for rdma_listen() to
work over vlan. This is required by ib_init_ah_attr_from_wc() which is
called by the CM REQ handler.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Add atomic support
Lijun Ou [Sat, 22 Sep 2018 08:21:06 +0000 (16:21 +0800)]
RDMA/hns: Add atomic support

This patch adds atomic operations for hip08, includes fetchadd and cmpswap
operation.  In order to enable atomic, the driver needs to do the
following steps:

1. Enable the atomic caps for RoCE device
2. Post the wqe context of atomic type
3. Configure the atomic type of mtpt

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/hns: Refactor the codes for setting transport opode
Lijun Ou [Sat, 22 Sep 2018 08:21:05 +0000 (16:21 +0800)]
RDMA/hns: Refactor the codes for setting transport opode

Currently the transport opcodes which come from users configuration is set
by similar code. This patch simplifies it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/ulp: Use dev_name instead of ibdev->name
Jason Gunthorpe [Thu, 20 Sep 2018 22:42:27 +0000 (16:42 -0600)]
RDMA/ulp: Use dev_name instead of ibdev->name

These return the same thing but dev_name is a more conventional use of the
kernel API.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
5 years agoRDMA/drivers: Use dev_name instead of ibdev->name
Jason Gunthorpe [Thu, 20 Sep 2018 22:42:26 +0000 (16:42 -0600)]
RDMA/drivers: Use dev_name instead of ibdev->name

These return the same thing but dev_name is a more conventional use of the
kernel API.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/core: Use dev_name instead of ibdev->name
Jason Gunthorpe [Thu, 20 Sep 2018 22:42:25 +0000 (16:42 -0600)]
RDMA/core: Use dev_name instead of ibdev->name

These return the same thing but dev_name is a more conventional use of the
kernel API.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
5 years agoRDMA/drivers: Use dev_err/dbg/etc instead of pr_* + ibdev->name
Jason Gunthorpe [Thu, 20 Sep 2018 22:42:24 +0000 (16:42 -0600)]
RDMA/drivers: Use dev_err/dbg/etc instead of pr_* + ibdev->name

Kernel convention is that a driver for a subsystem will print using
dev_* on the subsystem's struct device, or with dev_* on the physical
device. Drivers should rarely use a pr_* function.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/core: Use dev_err/dbg/etc instead of pr_* + ibdev->name
Jason Gunthorpe [Thu, 20 Sep 2018 22:42:23 +0000 (16:42 -0600)]
RDMA/core: Use dev_err/dbg/etc instead of pr_* + ibdev->name

Any messages related to a device should be printed with the dev_*
formatters. This provides greater consistency for the user.

The core does not set pr_fmt so this has no significant change.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
5 years agoRDMA: Fully setup the device name in ib_register_device
Jason Gunthorpe [Tue, 25 Sep 2018 22:58:09 +0000 (16:58 -0600)]
RDMA: Fully setup the device name in ib_register_device

The current code has two copies of the device name, ibdev->dev and
dev_name(&ibdev->dev), and they are setup at different times, which is
very confusing.

Set them both up at the same time and make dev_name() the lead name, which
is the proper use of the driver core APIs. To make it very clear that the
name is not valid until registration pass it in to the
ib_register_device() call rather than messing with ibdev->name directly.

Also the reorganization now checks that dev_name is unique even if it does
not contain a %.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
5 years agoRDMA: Fix dependencies for rdma_user_mmap_io
Arnd Bergmann [Wed, 26 Sep 2018 19:36:52 +0000 (21:36 +0200)]
RDMA: Fix dependencies for rdma_user_mmap_io

The mlx4 driver produces a link error when it is configured
as built-in while CONFIG_INFINIBAND_USER_ACCESS is set to =m:

drivers/infiniband/hw/mlx4/main.o: In function `mlx4_ib_mmap':
main.c:(.text+0x1af4): undefined reference to `rdma_user_mmap_io'

The same function is called from mlx5, which already has a
dependency to ensure we can call it, and from hns, which
appears to suffer from the same problem.

This adds the same dependency that mlx5 uses to the other two.

Fixes: 6745d356ab39 ("RDMA/hns: Use rdma_user_mmap_io")
Fixes: c282da4109e4 ("RDMA/mlx4: Use rdma_user_mmap_io")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/umem: Fix potential addition overflow
Doug Ledford [Fri, 21 Sep 2018 15:30:13 +0000 (11:30 -0400)]
RDMA/umem: Fix potential addition overflow

Given a large enough memory allocation, it is possible to wrap the
pinned_vm counter.  Check for addition overflow to prevent such
eventualities.

Fixes: 40ddacf2dda9 ("RDMA/umem: Don't hold mmap_sem for too long")
Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/umem: Minor optimizations
Doug Ledford [Fri, 21 Sep 2018 15:30:12 +0000 (11:30 -0400)]
RDMA/umem: Minor optimizations

Noticed while reviewing commit d4b4dd1b9706 ("RDMA/umem: Do not use
current->tgid to track the mm_struct") patch.  Why would we take a lock,
adjust a protected variable, drop the lock, and *then* check the input
into our protected variable adjustment?  Then we have to take the lock
again on our error unwind.  Let's just check the input early and skip
taking the locks needlessly if the input isn't valid.

It was also noticed that we set mm = current->mm, we then never modify
mm, but we still go back and reference current->mm a number of times
needlessly.  Be consistent in using the stored reference in mm.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoiw_cxgb4: Use proper enumerated type in c4iw_bar2_addrs
Nathan Chancellor [Mon, 24 Sep 2018 19:29:03 +0000 (12:29 -0700)]
iw_cxgb4: Use proper enumerated type in c4iw_bar2_addrs

Clang warns when one enumerated type is implicitly converted to another.

drivers/infiniband/hw/cxgb4/qp.c:287:8: warning: implicit conversion
from enumeration type 'enum t4_bar2_qtype' to different enumeration type
'enum cxgb4_bar2_qtype' [-Wenum-conversion]
                                                 T4_BAR2_QTYPE_EGRESS,
                                                 ^~~~~~~~~~~~~~~~~~~~

c4iw_bar2_addrs expects a value from enum cxgb4_bar2_qtype so use the
corresponding values from that type so Clang is satisfied without changing
the meaning of the code.

T4_BAR2_QTYPE_EGRESS = CXGB4_BAR2_QTYPE_EGRESS = 0
T4_BAR2_QTYPE_INGRESS = CXGB4_BAR2_QTYPE_INGRESS = 1

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoRDMA/mlx5: Remove superfluous version print
Mark Bloch [Mon, 24 Sep 2018 18:58:07 +0000 (18:58 +0000)]
RDMA/mlx5: Remove superfluous version print

When profiles were introduced to MLX5 IB an unneeded version print when
creating an MLX5 IB device was added. Remove the print, we still have a
printk for driver version in mlx5_ib_add().

Fixes: 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/usnic: fix spelling mistake "unvalid" -> "invalid"
Colin Ian King [Mon, 24 Sep 2018 17:27:38 +0000 (18:27 +0100)]
IB/usnic: fix spelling mistake "unvalid" -> "invalid"

Trivial fix to spelling mistake in usnic_err error message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set valid umem bit on DEVX
Yishai Hadas [Thu, 20 Sep 2018 18:39:33 +0000 (21:39 +0300)]
IB/mlx5: Set valid umem bit on DEVX

Set valid umem bit on DEVX commands that use umem.
This will enforce the umem usage by the firmware and not the 'pas' info.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of TD commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:32 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of TD commands

Set uid as part of TD commands so that the firmware can
manage the TD object in a secured way.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of XRCD commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:31 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of XRCD commands

Set uid as part of XRCD commands so that the firmware can manage the
XRCD object in a secured way.

That will enable using an XRCD that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of CQ creation
Yishai Hadas [Thu, 20 Sep 2018 18:39:30 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of CQ creation

Set uid as part of CQ creation so that the firmware can manage the
CQ object in a secured way.

The uid for the destroy and the modify commands is set by mlx5_core.

This will enable using a CQ that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid upon PD allocation
Yishai Hadas [Thu, 20 Sep 2018 18:39:29 +0000 (21:39 +0300)]
IB/mlx5: Set uid upon PD allocation

Set uid as part of PD allocation, this uid is used for other mlx5
objects upon calling the firmware.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of RQT commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:28 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of RQT commands

Set uid as part of RQT commands so that the firmware can manage the
RQT object in a secured way.

That will enable using an RQT that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of TIS commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:27 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of TIS commands

Set uid as part of TIS commands so that the firmware can manage the
TIS object in a secured way.

That will enable using a TIS that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of TIR commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:26 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of TIR commands

Set uid as part of TIR commands so that the firmware can manage the
TIR object in a secured way.

That will enable using a TIR that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of MCG commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:25 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of MCG commands

Set uid as part of MCG commands so that the firmware can manage the
MCG object in a secured way.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of DCT commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:24 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of DCT commands

Set uid as part of DCT create command so that the firmware can
manage the DCT object in a secured way.

The uid for the destroy and drain commands are set by mlx5_core.

That will enable using a DCT that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of SRQ commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:23 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of SRQ commands

Set uid as part of SRQ create command so that the firmware can manage
the SRQ object in a secured way.

The uid for the destroy and modify commands are set by mlx5_core.

That will enable using a SRQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of SQ commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:22 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of SQ commands

Set uid as part of SQ commands so that the firmware can manage the
SQ object in a secured way.

The uid for the destroy command is set by mlx5_core.

This will enable using an SQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of RQ commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:21 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of RQ commands

Set uid as part of RQ commands so that the firmware can manage the
RQ object in a secured way.

The uid for the destroy command is set by mlx5_core.

This will enable using an RQ that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Set uid as part of QP creation
Yishai Hadas [Thu, 20 Sep 2018 18:39:20 +0000 (21:39 +0300)]
IB/mlx5: Set uid as part of QP creation

Set uid as part of QP creation so that the firmware can manage the
QP object in a secured way.

The uid for the destroy and the modify commands is set by mlx5_core.

This will enable using a QP that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoIB/mlx5: Use uid as part of PD commands
Yishai Hadas [Thu, 20 Sep 2018 18:39:19 +0000 (21:39 +0300)]
IB/mlx5: Use uid as part of PD commands

Use uid as part of PD commands so that the firmware can manage the
PD object in a secured way.

For example when a QP is created its uid must match the CQ uid which it
uses.

Next patches in this series will use the uid from the PD, then will come
a patch to set the uid on the PD so that all objects will be properly
work in one change.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agoMerge branch 'mellanox/mlx5-next' into rdma.git for-next
Jason Gunthorpe [Tue, 25 Sep 2018 20:02:48 +0000 (14:02 -0600)]
Merge branch 'mellanox/mlx5-next' into rdma.git for-next

From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

This is required to resolve dependencies of the next series of RDMA
patches.

* branch 'mellanox/mlx5-next':
  net/mlx5: Update mlx5_ifc with DEVX UID bits
  net/mlx5: Set uid as part of DCT commands
  net/mlx5: Set uid as part of SRQ commands
  net/mlx5: Set uid as part of SQ commands
  net/mlx5: Set uid as part of RQ commands
  net/mlx5: Set uid as part of QP commands
  net/mlx5: Set uid as part of CQ commands

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
5 years agonet/mlx5: Update mlx5_ifc with DEVX UID bits
Leon Romanovsky [Thu, 20 Sep 2018 18:35:26 +0000 (21:35 +0300)]
net/mlx5: Update mlx5_ifc with DEVX UID bits

Add DEVX information to WQ, SRQ, CQ, TIR, TIS, QP,
RQ, XRCD, PD, MKEY and MCG.

Each object that is created/destroyed/modified via verbs will
be stamped with a UID based on its user context. This is already
done for DEVX objects commands.

This will enable the firmware to enforce the usage of kernel objects
from the DEVX flow by validating that the same UID is used and the
resources are really related to the same user.

The addition of *_valid fields are needed to distinguish
how various addresses are passed.

For non-DEVX callers, all those fields will be zero.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/mlx5: Set uid as part of DCT commands
Yishai Hadas [Thu, 20 Sep 2018 18:35:25 +0000 (21:35 +0300)]
net/mlx5: Set uid as part of DCT commands

Set uid as part of DCT commands so that the firmware can manage the
DCT object in a secured way.

That will enable using a DCT that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/mlx5: Set uid as part of SRQ commands
Yishai Hadas [Thu, 20 Sep 2018 18:35:24 +0000 (21:35 +0300)]
net/mlx5: Set uid as part of SRQ commands

Set uid as part of SRQ commands so that the firmware can manage the
SRQ object in a secured way.

That will enable using an SRQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/mlx5: Set uid as part of SQ commands
Yishai Hadas [Thu, 20 Sep 2018 18:35:23 +0000 (21:35 +0300)]
net/mlx5: Set uid as part of SQ commands

Set uid as part of SQ commands so that the firmware can manage the
SQ object in a secured way.

That will enable using an SQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/mlx5: Set uid as part of RQ commands
Yishai Hadas [Thu, 20 Sep 2018 18:35:22 +0000 (21:35 +0300)]
net/mlx5: Set uid as part of RQ commands

Set uid as part of RQ commands so that the firmware can manage the
RQ object in a secured way.

That will enable using an RQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/mlx5: Set uid as part of QP commands
Yishai Hadas [Thu, 20 Sep 2018 18:35:21 +0000 (21:35 +0300)]
net/mlx5: Set uid as part of QP commands

Set uid as part of QP commands so that the firmware can manage the
QP object in a secured way.

That will enable using a QP that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agonet/mlx5: Set uid as part of CQ commands
Yishai Hadas [Thu, 20 Sep 2018 18:35:20 +0000 (21:35 +0300)]
net/mlx5: Set uid as part of CQ commands

Set uid as part of CQ commands so that the firmware can manage the CQ
object in a secured way.

The firmware should mark this CQ with the given uid so that it can
be used later on only by objects with the same uid.

Upon DEVX flows that use this CQ (e.g. create QP command), the
pointed CQ must have the same uid as of the issuer uid command.

When a command is issued with uid=0 it means that the issuer of the
command is trusted (i.e. kernel), in that case any pointed object
can be used regardless of its uid.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoMerge branch 'mlx5-vport-loopback' into rdma.get
Doug Ledford [Sat, 22 Sep 2018 00:41:58 +0000 (20:41 -0400)]
Merge branch 'mlx5-vport-loopback' into rdma.get

For dependencies, branch based on 'mlx5-next' of
    git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

mlx5 mcast/ucast loopback control enhancements from Leon Romanovsky:

====================
This is short series from Mark which extends handling of loopback
traffic. Originally mlx5 IB dynamically enabled/disabled both unicast
and multicast based on number of users. However RAW ethernet QPs need
more granular access.
====================

Fixed failed automerge in mlx5_ib.h (minor context conflict issue)

mlx5-vport-loopback branch:
    RDMA/mlx5: Enable vport loopback when user context or QP mandate
    RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
    RDMA/mlx5: Refactor transport domain bookkeeping logic
    net/mlx5: Rename incorrect naming in IFC file

Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/mlx5: Enable vport loopback when user context or QP mandate
Mark Bloch [Mon, 17 Sep 2018 10:30:49 +0000 (13:30 +0300)]
RDMA/mlx5: Enable vport loopback when user context or QP mandate

A user can create a QP which can accept loopback traffic, but that's not
enough. We need to enable loopback on the vport as well. Currently vport
loopback is enabled only when more than 1 users are using the IB device,
update the logic to consider whatever a QP which supports loopback was
created, if so enable vport loopback even if there is only a single user.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/mlx5: Allow creating RAW ethernet QP with loopback support
Mark Bloch [Mon, 17 Sep 2018 10:30:48 +0000 (13:30 +0300)]
RDMA/mlx5: Allow creating RAW ethernet QP with loopback support

Expose two new flags:
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC

Those flags can be used at creation time in order to allow a QP
to be able to receive loopback traffic (unicast and multicast).
We store the state in the QP to be used on the destroy path
to indicate with which flags the QP was created with.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/mlx5: Refactor transport domain bookkeeping logic
Mark Bloch [Mon, 17 Sep 2018 10:30:47 +0000 (13:30 +0300)]
RDMA/mlx5: Refactor transport domain bookkeeping logic

In preparation to enable loopback on a single user context move the logic
that enables/disables loopback to separate functions and group variables
under a single struct.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agonet/mlx5: Rename incorrect naming in IFC file
Mark Bloch [Mon, 17 Sep 2018 10:30:46 +0000 (13:30 +0300)]
net/mlx5: Rename incorrect naming in IFC file

Remove a trailing underscore from the multicast/unicast names.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
5 years agoRDMA/cxgb4: remove redundant null pointer check before kfree_skb
zhong jiang [Thu, 20 Sep 2018 09:52:42 +0000 (17:52 +0800)]
RDMA/cxgb4: remove redundant null pointer check before kfree_skb

kfree_skb has taken the null pointer into account. hence it is safe
to remove the redundant null pointer check before kfree_skb.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoIB/mlx4: Remove unnecessary parentheses
Nathan Chancellor [Thu, 20 Sep 2018 03:32:29 +0000 (20:32 -0700)]
IB/mlx4: Remove unnecessary parentheses

Clang warns when more than one set of parentheses are used in single
conditional statements.

drivers/infiniband/hw/mlx4/mcg.c:676:16: warning: equality comparison
with extraneous parentheses [-Wparentheses-equality]
                        if ((method == IB_MGMT_METHOD_GET_RESP)) {
                             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/mlx4/mcg.c:676:16: note: remove extraneous
parentheses around the comparison to silence this warning
                        if ((method == IB_MGMT_METHOD_GET_RESP)) {
                            ~       ^                         ~
drivers/infiniband/hw/mlx4/mcg.c:676:16: note: use '=' to turn this
equality comparison into an assignment
                        if ((method == IB_MGMT_METHOD_GET_RESP)) {
                                    ^~
                                    =

Remove the unnecessary parentheses to silence this warning.

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoIB/nes: Remove unnecessary parentheses
Nathan Chancellor [Thu, 20 Sep 2018 03:29:57 +0000 (20:29 -0700)]
IB/nes: Remove unnecessary parentheses

Clang warns when more than one set of parentheses are used in single
conditional statements.

drivers/infiniband/hw/nes/nes_hw.c:1446:27: warning: equality comparison
with extraneous parentheses [-Wparentheses-equality]
        } while ((temp_phy_data2 == temp_phy_data));
                  ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: remove extraneous
parentheses around the comparison to silence this warning
        } while ((temp_phy_data2 == temp_phy_data));
                 ~               ^               ~
drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: use '=' to turn this
equality comparison into an assignment
        } while ((temp_phy_data2 == temp_phy_data));
                                 ^~
                                 =

Remove the unnecessary parentheses to silence this warning.

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/uverbs: Get rid of ucontext->tgid
Jason Gunthorpe [Sun, 16 Sep 2018 17:48:12 +0000 (20:48 +0300)]
RDMA/uverbs: Get rid of ucontext->tgid

Nothing uses this now, just delete it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/umem: Avoid synchronize_srcu in the ODP MR destruction path
Jason Gunthorpe [Sun, 16 Sep 2018 17:48:11 +0000 (20:48 +0300)]
RDMA/umem: Avoid synchronize_srcu in the ODP MR destruction path

synchronize_rcu is slow enough that it should be avoided on the syscall
path when user space is destroying MRs. After all the rework we can now
trivially do this by having call_srcu kfree the per_mm.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/umem: Handle a half-complete start/end sequence
Jason Gunthorpe [Sun, 16 Sep 2018 17:48:10 +0000 (20:48 +0300)]
RDMA/umem: Handle a half-complete start/end sequence

mmu_notifier_unregister() can race between a invalidate_start/end and
cause the invalidate_end to be skipped. This causes an imbalance in the
locking, which lockdep complains about.

This is not actually a bug, as we immediately kfree the memory holding the
lock, but it simple enough to fix.

Mark when the notifier is being destroyed and abort the start callback.
This can be done under the lock we already obtained, and can re-purpose
the invalidate_range test we already have.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
5 years agoRDMA/umem: Get rid of per_mm->notifier_count
Jason Gunthorpe [Sun, 16 Sep 2018 17:48:09 +0000 (20:48 +0300)]
RDMA/umem: Get rid of per_mm->notifier_count

This is intrinsically racy and the scheme is simply unnecessary. New MR
registration can wait for any on going invalidation to fully complete.

      CPU0                              CPU1
                                  if (atomic_read())
 if (atomic_dec_and_test() &&
     !list_empty())
  { /* not taken */ }
                                       list_add()

Putting the new UMEM into some kind of purgatory until another invalidate
rolls through..

Instead hold the read side of the umem_rwsem across the pair'd start/end
and get rid of the racy 'deferred add' approach.

Since all umem's in the rbt are always ready to go, also get rid of the
mn_counters_active stuff.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>