Yixian Liu [Fri, 29 Dec 2017 11:26:18 +0000 (19:26 +0800)]
RDMA/hns: Add detailed comments for mb() call
This patch adds more detailed comments when we call the
memory barrier function, such as rmb, wmb and mb. Three
mb() callers are deleted since they are unnecessary.
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Sun, 24 Dec 2017 14:31:36 +0000 (16:31 +0200)]
IB/mlx5: Enable QP creation with a given blue flame index
This patch enables QP creation with a given BF index, this allows the
user space driver to share same BF between few QPs or alternatively have
a dedicated BF per QP.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Sun, 24 Dec 2017 14:31:35 +0000 (16:31 +0200)]
IB/mlx5: Expose dynamic mmap allocation
This patch exposes the option to dynamic allocates a UAR, this
functionality will be used in downstream patch in this series as
part of QP creation.
Specifically, the user space driver asks for a UAR allocation in a given
page index, upon success this UAR and its bfregs can be used as part of
QP creation by the user space driver.
To enable allocating more than 256 UARs the page index is encoded in an
extra one byte just after the command byte.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Sun, 24 Dec 2017 14:31:34 +0000 (16:31 +0200)]
IB/mlx5: Extend UAR stuff to support dynamic allocation
This patch extends the alloc context flow to be prepared for working
with dynamic UAR allocations.
Currently upon alloc context there is some fix size of UARs that are
allocated (named 'static allocation') and there is no option to user
application to ask for more or control which UAR will be used by which
QP.
In this patch the driver prepares its data structures to manage both the
static and the dynamic allocations and let the user driver knows about
the max value of dynamic blue-flame registers that are allowed.
Downstream patches from this series will enable the dynamic allocation
and the association as part of QP creation.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Maor Gottlieb [Sun, 24 Dec 2017 12:51:25 +0000 (14:51 +0200)]
IB/mlx5: Report inner RSS capability
Add missing inner RSS support capability as part of
the RSS supported fields.
In addition change MLX5_RX_HASH_INNER to 1UL << 31 in
order to define it as unsigned.
Fixes:
309fa3470fca ("IB/mlx5: Add support for RSS on the inner packet")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Guy Levi [Sun, 24 Dec 2017 12:51:24 +0000 (14:51 +0200)]
IB/mlx4: Add support to RSS hash for inner headers
Support RSS hash for inner headers according to a new flag,
MLX4_IB_RX_HASH_INNER provided by the vendor channel.
In case the flag is set, RSS hash will be done on the inner headers of
VXLAN packets (which are encapsulated).
Non-encapsulated packets will be hashed according to the outer headers.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Thu, 28 Dec 2017 04:50:46 +0000 (21:50 -0700)]
Merge branch 'from-rc' of git://git./linux/kernel/git/rdma/rdma.git
Patches for 4.16 that are dependent on patches sent to 4.15-rc.
These are small clean ups for the vmw_pvrdma and i40iw drivers.
* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
i40iw: Change accelerated flag to bool
Bryan Tan [Wed, 20 Dec 2017 19:27:28 +0000 (11:27 -0800)]
RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
BIT() should not be used in the UAPI header. Remove it.
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bryan Tan [Wed, 20 Dec 2017 19:27:00 +0000 (11:27 -0800)]
RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
refcount_t is the preferred type for refcounts. Change the
QP and CQ refcnt fields to use refcount_t.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bryan Tan [Wed, 20 Dec 2017 19:26:00 +0000 (11:26 -0800)]
RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
Convert the sizeof(void *) in two kcalloc calls to be more
specific for the arrays that are being allocated.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bryan Tan [Wed, 20 Dec 2017 19:24:33 +0000 (11:24 -0800)]
RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
Be more consistent in setting and checking is_kernel
flag for QPs and CQs.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bryan Tan [Wed, 20 Dec 2017 17:50:57 +0000 (09:50 -0800)]
RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
Support for SRQs were added in the vmw_pvrdma userlevel library
before two necessary macros were added into the kernel ABI header
file. Add the two UAR SRQ macros that are required by the userlevel
library so that the library can rely on the kernel ABI header file
for these SRQ macro definitions.
Fixes:
8b10ba783c9d ("RDMA/vmw_pvrdma: Add shared receive queue support")
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Henry Orosco [Sun, 17 Dec 2017 18:21:39 +0000 (12:21 -0600)]
i40iw: Change accelerated flag to bool
The accelerated flag only utilizes two values: 0 and 1.
Modify accelerated flag in struct i40iw_cm_node to bool.
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Randy Dunlap [Thu, 28 Dec 2017 00:33:57 +0000 (16:33 -0800)]
infiniband: drop unknown function from core_priv.h
Delete ibnl_chk_listeners() and its kernel-doc comments from the
core_priv.h header file. There is no such function.
Fixes:
233c1955835b ("RDMA/netlink: Reduce exposure of RDMA netlink functions")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Majd Dibbiny [Tue, 14 Nov 2017 12:51:56 +0000 (14:51 +0200)]
IB/core: Make sure that PSN does not overflow
The rq/sq->psn is 24 bits as defined in the IB spec, therefore we mask
out the 8 most significant bits to avoid overflow in modify_qp.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Nitzan Carmi [Tue, 26 Dec 2017 09:20:20 +0000 (11:20 +0200)]
IB/mlx5: Fix mlx5_ib_alloc_mr error flow
ibmr.device is being set only after ib_alloc_mr() is
(successfully) complete. Therefore, in case mlx5_core_create_mkey()
return with error, the error flow calls mlx5_free_priv_descs()
which uses ibmr.device (which doesn't exist yet), causing
a NULL dereference oops.
To fix this, the IB device should be set in the mr struct earlier
stage (e.g. prior to calling mlx5_core_create_mkey()).
Fixes:
8a187ee52b04 ("IB/mlx5: Support the new memory registration API")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Moni Shoua [Sun, 24 Dec 2017 11:54:58 +0000 (13:54 +0200)]
IB/core: Verify that QP is security enabled in create and destroy
The XRC target QP create flow sets up qp_sec only if there is an IB link with
LSM security enabled. However, several other related uAPI entry points blindly
follow the qp_sec NULL pointer, resulting in a possible oops.
Check for NULL before using qp_sec.
Cc: <stable@vger.kernel.org> # v4.12
Fixes:
d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Moni Shoua [Sun, 24 Dec 2017 11:54:57 +0000 (13:54 +0200)]
IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
If the input command length is larger than the kernel supports an error should
be returned in case the unsupported bytes are not cleared, instead of the
other way aroudn. This matches what all other callers of ib_is_udata_cleared
do and will avoid user ABI problems in the future.
Cc: <stable@vger.kernel.org> # v4.10
Fixes:
189aba99e700 ("IB/uverbs: Extend modify_qp and support packet pacing")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Majd Dibbiny [Sun, 24 Dec 2017 11:54:56 +0000 (13:54 +0200)]
IB/mlx5: Serialize access to the VMA list
User-space applications can do mmap and munmap directly at
any time.
Since the VMA list is not protected with a mutex, concurrent
accesses to the VMA list from the mmap and munmap can cause
data corruption. Add a mutex around the list.
Cc: <stable@vger.kernel.org> # v4.7
Fixes:
7c2344c3bbf9 ("IB/mlx5: Implements disassociate_ucontext API")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Don Hiatt [Fri, 22 Dec 2017 16:46:00 +0000 (08:46 -0800)]
IB/hfi1: Change slid arg in ingress_pkey_table_fail to 32bit
Change the slid arg to ingress_pkey_table_fail() to a full
32Bits and do not convert to 16Bits in caller. This is so we
can keep everything 32bit in the kernel and only change to
16bit at the uapi boundary.
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Don Hiatt [Fri, 22 Dec 2017 16:45:52 +0000 (08:45 -0800)]
IB/core: Use rdma_cap_opa_mad to check for OPA
Use rdma_cap_opa_mad() to check for OPA to promote code reuse.
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Tatyana Nikolova [Fri, 22 Dec 2017 15:47:02 +0000 (09:47 -0600)]
i40iw: Fix the connection ORD value for loopback
The accepting QP ORD value should be adjusted not to
exceed the peer QP IRD value (RFC 6581). This is
skipped for loopback. After the ORD is validated
by i40iw_record_ird_ord(), adjust the ORD value of
the loopback accepting QP to prevent overrunning the
IRD space of the peer QP. Also move the ORD accounting
for 0-byte RDMA read to i40iw_record_ird_ord().
Fixes:
f27b4746f378 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Tatyana Nikolova [Fri, 22 Dec 2017 15:47:01 +0000 (09:47 -0600)]
i40iw: Validate correct IRD/ORD connection parameters
Casting to u16 before validating IRD/ORD connection
parameters could cause recording wrong IRD/ORD values
in the cm_node. Validate the IRD/ORD parameters as
they are passed by the application before recording
them.
Fixes:
f27b4746f378 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Fri, 22 Dec 2017 15:47:00 +0000 (09:47 -0600)]
i40iw: Ignore LLP_DOUBT_REACHABILITY AE
The LLP_DOUBT_REACHABILITY Asynchronous Event (AE) is an early
warning of a connection issue. It is followed by LLP_TOO_MANY_RETRIES
AE, if the retransmit threshold is reached and recovery is not possible
for the connection.
Currently we terminate the connection on receiving the
LLP_DOUBT_REACHABILITY AE. Ignore this AE and
terminate the connection only on LLP_TOO_MANY_RETRIES AE.
This improves the user experience on cable disconnect/reconnect
scenario while running iWARP traffic. On cable disconnect,
the QP traffic is paused and the user has a larger and more
reasonable timeout within which if the cable is reconnected,
traffic can continue.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Fri, 22 Dec 2017 15:46:59 +0000 (09:46 -0600)]
i40iw: Fix sequence number for the first partial FPDU
Partial FPDU processing is broken as the sequence number
for the first partial FPDU is wrong due to incorrect
Q2 buffer offset. The offset should be 64 rather than 16.
Fixes:
786c6adb3a94 ("i40iw: add puda code")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Fri, 22 Dec 2017 15:46:58 +0000 (09:46 -0600)]
i40iw: Selectively teardown QPs on IP addr change event
On IP address change event, all connected QPs are torn down
irrespective of whether IP address is involved in a connection.
Only teardown connections those source or destination address
matches the netdev interface IP address being changed, and if
they are on the same VLAN as the netdev.
Fixes:
e5e74b61b165 ("i40iw: Add IP addr handling on netdev events")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Fri, 22 Dec 2017 15:46:57 +0000 (09:46 -0600)]
i40iw: Add notifier for network device events
Register a netdevice notifier for netdev UP/DOWN
notification events and report the appropriate ib event.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Fri, 22 Dec 2017 15:46:56 +0000 (09:46 -0600)]
i40iw: Correct Q1/XF object count equation
Lower Inbound RDMA Read Queue (Q1) object count by a factor of 2
as it is incorrectly doubled. Also, round up Q1 and Transmit FIFO (XF)
object count to power of 2 to satisfy hardware requirement.
Fixes:
86dbcd0f12e9 ("i40iw: add file to handle cqp calls")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Fri, 22 Dec 2017 15:46:55 +0000 (09:46 -0600)]
i40iw: Use utility function roundup_pow_of_two()
Consolidate all power of 2 round calculations to
use kernel utility function roundup_pow_of_two().
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Fri, 22 Dec 2017 15:46:54 +0000 (09:46 -0600)]
i40iw: Set MAX_IRD_SIZE to 64
Increase I40IW_MAX_IRD_SIZE to 64 which is the device limit.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Dennis Dalessandro [Tue, 19 Dec 2017 03:57:13 +0000 (19:57 -0800)]
rdma: Update maintainer contact for Intel RDMA drivers
Ensure both Mike and I are listed as maintainer contacts for Intel's qib,
hfi1, and rdmavt drivers.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Venkata Sandeep Dhanalakota [Tue, 19 Dec 2017 03:26:58 +0000 (19:26 -0800)]
IB/SA: Check dlid before SA agent queries for ClassPortInfo
SA queries SM for class port info when there is a LID_CHANGE event.
When a base lid is configured before fm is started ie when smlid is
not yet assigned, SA handles the LID_CHANGE event and tries query SM
with lid 0. This will cause an hang.
[ 1106.958820] INFO: task kworker/2:0:23 blocked for more than 120 seconds.
[ 1106.965082] Tainted: G O 4.12.0+ #1
[ 1106.969602] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 1106.977227] kworker/2:0 D 0 23 2 0x00000000
[ 1106.977250] Workqueue: infiniband update_ib_cpi [ib_core]
[ 1106.977261] Call Trace:
[ 1106.977273] __schedule+0x28e/0x860
[ 1106.977285] schedule+0x36/0x80
[ 1106.977298] schedule_timeout+0x1a3/0x2e0
[ 1106.977310] ? radix_tree_iter_tag_clear+0x1b/0x20
[ 1106.977322] ? idr_alloc+0x64/0x90
[ 1106.977334] wait_for_completion+0xe3/0x140
[ 1106.977347] ? wake_up_q+0x80/0x80
[ 1106.977369] update_ib_cpi+0x163/0x210 [ib_core]
[ 1106.977381] process_one_work+0x147/0x370
[ 1106.977394] worker_thread+0x4a/0x390
[ 1106.977406] kthread+0x109/0x140
[ 1106.977418] ? process_one_work+0x370/0x370
[ 1106.977430] ? kthread_park+0x60/0x60
[ 1106.977443] ret_from_fork+0x22/0x30
Always ensure a proper smlid is assigned before querying SM for cpi.
Fixes:
ee1c60b1bff ("IB/SA: Modify SA to implicitly cache Class Port info")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Sun, 17 Dec 2017 18:21:38 +0000 (12:21 -0600)]
nes: Change accelerated flag to bool
The accelerated flag only utilizes two values: 0 and 1.
Modify accelerated flag in struct nes_cm_node to bool.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Michael J. Ruhl [Fri, 22 Dec 2017 16:47:20 +0000 (08:47 -0800)]
IB/hfi: Only read capability registers if the capability exists
During driver init, various registers are saved to allow restoration
after an FLR or gen3 bump. Some of these registers are not available
in some circumstances (i.e. Virtual machines).
This bug makes the driver unusable when the PCI device is passed into
a VM, it fails during probe.
Delete unnecessary register read/write, and only access register if
the capability exists.
Cc: <stable@vger.kernel.org> # 4.14.x
Fixes:
a618b7e40af2 ("IB/hfi1: Move saving PCI values to a separate function")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pravin Shedge [Wed, 6 Dec 2017 16:49:39 +0000 (22:19 +0530)]
drivers: infiniband: remove duplicate includes
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yixian Liu [Tue, 14 Nov 2017 09:26:17 +0000 (17:26 +0800)]
RDMA/hns: Add eq support of hip08
This patch adds eq support for hip08. The eq table can
be multi-hop addressed.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yixian Liu [Tue, 14 Nov 2017 09:26:16 +0000 (17:26 +0800)]
RDMA/hns: Refactor eq code for hip06
Considering the compatibility of supporting hip08's eq
process and possible changes of data structure, this patch
refactors the eq code structure of hip06.
We move all the eq process code for hip06 from hns_roce_eq.c
into hns_roce_hw_v1.c, and also for hns_roce_eq.h. With
these changes, it will be convenient to add the eq support
for later hardware version.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Alex Vesker [Thu, 21 Dec 2017 15:38:27 +0000 (17:38 +0200)]
IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
The locking order of vlan_rwsem (LOCK A) and then rtnl (LOCK B),
contradicts other flows such as ipoib_open possibly causing a deadlock.
To prevent this deadlock heavy flush is called with RTNL locked and
only then tries to acquire vlan_rwsem.
This deadlock is possible only when there are child interfaces.
[ 140.941758] ======================================================
[ 140.946276] WARNING: possible circular locking dependency detected
[ 140.950950] 4.15.0-rc1+ #9 Tainted: G O
[ 140.954797] ------------------------------------------------------
[ 140.959424] kworker/u32:1/146 is trying to acquire lock:
[ 140.963450] (rtnl_mutex){+.+.}, at: [<
ffffffffc083516a>] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
[ 140.970006]
but task is already holding lock:
[ 140.975141] (&priv->vlan_rwsem){++++}, at: [<
ffffffffc0834ee1>] __ipoib_ib_dev_flush+0x51/0x4e0 [ib_ipoib]
[ 140.982105]
which lock already depends on the new lock.
[ 140.990023]
the existing dependency chain (in reverse order) is:
[ 140.998650]
-> #1 (&priv->vlan_rwsem){++++}:
[ 141.005276] down_read+0x4d/0xb0
[ 141.009560] ipoib_open+0xad/0x120 [ib_ipoib]
[ 141.014400] __dev_open+0xcb/0x140
[ 141.017919] __dev_change_flags+0x1a4/0x1e0
[ 141.022133] dev_change_flags+0x23/0x60
[ 141.025695] devinet_ioctl+0x704/0x7d0
[ 141.029156] sock_do_ioctl+0x20/0x50
[ 141.032526] sock_ioctl+0x221/0x300
[ 141.036079] do_vfs_ioctl+0xa6/0x6d0
[ 141.039656] SyS_ioctl+0x74/0x80
[ 141.042811] entry_SYSCALL_64_fastpath+0x1f/0x96
[ 141.046891]
-> #0 (rtnl_mutex){+.+.}:
[ 141.051701] lock_acquire+0xd4/0x220
[ 141.055212] __mutex_lock+0x88/0x970
[ 141.058631] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
[ 141.063160] __ipoib_ib_dev_flush+0x71/0x4e0 [ib_ipoib]
[ 141.067648] process_one_work+0x1f5/0x610
[ 141.071429] worker_thread+0x4a/0x3f0
[ 141.074890] kthread+0x141/0x180
[ 141.078085] ret_from_fork+0x24/0x30
[ 141.081559]
other info that might help us debug this:
[ 141.088967] Possible unsafe locking scenario:
[ 141.094280] CPU0 CPU1
[ 141.097953] ---- ----
[ 141.101640] lock(&priv->vlan_rwsem);
[ 141.104771] lock(rtnl_mutex);
[ 141.109207] lock(&priv->vlan_rwsem);
[ 141.114032] lock(rtnl_mutex);
[ 141.116800]
*** DEADLOCK ***
Fixes:
b4b678b06f6e ("IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Majd Dibbiny [Thu, 21 Dec 2017 15:38:26 +0000 (17:38 +0200)]
IB/mlx5: Fix congestion counters in LAG mode
Congestion counters are counted and queried per physical function.
When working in LAG mode, CNP packets can be sent or received on both
of the functions, thus congestion counters should be aggregated from
the two physical functions.
Fixes:
e1f24a79f424 ("IB/mlx5: Support congestion related counters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bryan Tan [Wed, 20 Dec 2017 17:51:40 +0000 (09:51 -0800)]
RDMA/vmw_pvrdma: Avoid use after free due to QP/CQ/SRQ destroy
The use of wait queues in vmw_pvrdma for handling concurrent
access to a resource leaves a race condition which can cause a use
after free bug.
Fix this by using the pattern from other drivers, complete() protected by
dec_and_test to ensure complete() is called only once.
Fixes:
29c8d9eba550 ("IB: Add vmw_pvrdma driver")
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bryan Tan [Wed, 20 Dec 2017 17:50:01 +0000 (09:50 -0800)]
RDMA/vmw_pvrdma: Use refcount_dec_and_test to avoid warning
refcount_dec generates a warning when the operation
causes the refcount to hit zero. Avoid this by using
refcount_dec_and_test.
Fixes:
8b10ba783c9d ("RDMA/vmw_pvrdma: Add shared receive queue support")
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bryan Tan [Wed, 20 Dec 2017 17:49:03 +0000 (09:49 -0800)]
RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path
The QP cleanup did not previously call ib_umem_release,
resulting in a user-triggerable kernel resource leak.
Fixes:
29c8d9eba550 ("IB: Add vmw_pvrdma driver")
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Steve Wise [Tue, 19 Dec 2017 22:02:10 +0000 (14:02 -0800)]
iw_cxgb4: when flushing, complete all wrs in a chain
If a wr chain was posted and needed to be flushed, only the first
wr in the chain was completed with FLUSHED status. The rest were
never completed. This caused isert to hang on shutdown due to the
missing completions which left iscsi IO commands referenced, stalling
the shutdown.
Fixes:
4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Steve Wise [Tue, 19 Dec 2017 18:29:25 +0000 (10:29 -0800)]
iw_cxgb4: reflect the original WR opcode in drain cqes
The flush/drain logic was not retaining the original wr opcode in
its completion. This can cause problems if the application uses
the completion opcode to make decisions.
Use bit 10 of the CQE header word to indicate the CQE is a special
drain completion, and save the original WR opcode in the cqe header
opcode field.
Fixes:
4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Steve Wise [Mon, 18 Dec 2017 21:10:00 +0000 (13:10 -0800)]
iw_cxgb4: Only validate the MSN for successful completions
If the RECV CQE is in error, ignore the MSN check. This was causing
recvs that were flushed into the sw cq to be completed with the wrong
status (BAD_MSN instead of FLUSHED).
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Anton Vasilyev [Tue, 8 Aug 2017 15:56:37 +0000 (18:56 +0300)]
RDMA/ocrdma: Fix permissions for OCRDMA_RESET_STATS
Debugfs file reset_stats is created with S_IRUSR permissions,
but ocrdma_dbgfs_ops_read() doesn't support OCRDMA_RESET_STATS,
whereas ocrdma_dbgfs_ops_write() supports only OCRDMA_RESET_STATS.
The patch fixes misstype with permissions.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bharat Potnuri [Tue, 28 Nov 2017 18:28:07 +0000 (23:58 +0530)]
iser-target: avoid reinitializing rdma contexts for isert commands
isert commands that failed during isert_rdma_rw_ctx_post() are queued to
Queue-Full(QF) queue and are scheduled to be reposted during queue-full
queue processing. During this reposting, the rdma contexts are initialised
again in isert_rdma_rw_ctx_post(), which is leaking significant memory.
unreferenced object 0xffff8830201d9640 (size 64):
comm "kworker/0:2", pid 195, jiffies
4295374851 (age 4528.436s)
hex dump (first 32 bytes):
00 60 8b cb 2e 00 00 00 00 10 00 00 00 00 00 00 .`..............
00 90 e3 cb 2e 00 00 00 00 10 00 00 00 00 00 00 ................
backtrace:
[<
ffffffff8170711e>] kmemleak_alloc+0x4e/0xb0
[<
ffffffff811f8ba5>] __kmalloc+0x125/0x2b0
[<
ffffffffa046b24f>] rdma_rw_ctx_init+0x15f/0x6f0 [ib_core]
[<
ffffffffa07ab644>] isert_rdma_rw_ctx_post+0xc4/0x3c0 [ib_isert]
[<
ffffffffa07ad972>] isert_put_datain+0x112/0x1c0 [ib_isert]
[<
ffffffffa07dddce>] lio_queue_data_in+0x2e/0x30 [iscsi_target_mod]
[<
ffffffffa076c322>] target_qf_do_work+0x2b2/0x4b0 [target_core_mod]
[<
ffffffff81080c3b>] process_one_work+0x1db/0x5d0
[<
ffffffff8108107d>] worker_thread+0x4d/0x3e0
[<
ffffffff81088667>] kthread+0x117/0x150
[<
ffffffff81713fa7>] ret_from_fork+0x27/0x40
[<
ffffffffffffffff>] 0xffffffffffffffff
Here is patch to use the older rdma contexts while reposting
the isert commands intead of reinitialising them.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:18 +0000 (14:52 +0200)]
IB/cm: Refactor to avoid setting path record software only fields
When path ah_attr initialization from path record
fails, ib_cm_send_rej() uses av.ah_attr fields to send out reject
message. In such cases initialization of path record software fields
is not needed. Code is simplified for same.
Additionally in current code in cm_req_handler, when ib_get_cached_gid
fails for a given sgid_index of the GID of the GRH of the incoming CM MAD,
error code 12 is sent. This error code refers to primary GID in incoming
CM REQ and not for the GID in in MAD packet.
Therefore code is refactored to send code 5 (unsupported request) for such
error.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:17 +0000 (14:52 +0200)]
IB/{core, umad, cm}: Rename ib_init_ah_from_wc to ib_init_ah_attr_from_wc
Currently ib_init_ah_from_wc initializes address handle attributes and
not the address handle object itself.
To avoid confusion between ah_attr vs ah, ib_init_ah_from_wc is
renamed to ib_init_ah_attr_from_wc to reflect that its initialzes
ah_attr.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:16 +0000 (14:52 +0200)]
IB/{core, cm, cma, ipoib}: Rename ib_init_ah_from_path to ib_init_ah_attr_from_path
Since ib_init_ah_from_path initializes the address handle attribute, it is
renamed to reflect so.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:15 +0000 (14:52 +0200)]
IB/cm: Fix sleeping while spin lock is held
In case of LAP are used for RoCE, it can lead to a problem of sleeping a
context while spin lock is held in below flow.
cm_lap_handler
->spin_lock
-> <..switch_case..>
-> cm_init_av_for_response
-> ib_init_ah_from_wc
-> rdma_addr_find_l2_eth_by_grh
wait_for_completion()
Therefore ah attribute initialization is done for incoming lap requests
outside of the lock context.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:14 +0000 (14:52 +0200)]
IB/cm: Handle address handle attribute init error
cm_init_av_by_path depends on ib_init_ah_from_path to initialize ah
attribute and ib_init_ah_from_path() can fail, such error should not
be ignored.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:13 +0000 (14:52 +0200)]
IB/{cm, umad}: Handle av init error
cm_init_av_for_response depends on ib_init_ah_from_wc() whose return
status is ignored.
ib_init_ah_from_wc() can fail and its return status should be handled as
done in this patch.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:12 +0000 (14:52 +0200)]
IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer
Currently there are no users of ib_find_gid for RoCE transport. It is
only used by IPoIB.
Therefore its simplified to ignore RoCE ports and GID type check which
was previously done for every port.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:11 +0000 (14:52 +0200)]
RDMA/core: Avoid copying ifindex twice
rdma_copy_addr copies the ifndex to bound_dev_if.
Therefore avoid copying it again after rdma_copy_addr call is completed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:10 +0000 (14:52 +0200)]
RDMA/{core, cma}: Simplify rdma_translate_ip
Since no caller needs vlan, rdma_translate_ip is simplified to avoid
vlan pointer.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:09 +0000 (14:52 +0200)]
IB/core: Removed unused function
rdma_addr_find_smac_by_sgid() is exported symbol not used by any kernel
module. Therefore its removed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:08 +0000 (14:52 +0200)]
RDMA/core: Avoid redundant memcpy in rdma_addr_find_l2_eth_by_grh
rdma_resolve_ip already copies 'addr' to its dev_addr argument.
Remove the duplicate memcpy and since it was the only user, remove the
'addr' member from resolve_cb_context.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:07 +0000 (14:52 +0200)]
IB/core: Avoid exporting module internal ib_find_gid_by_filter()
ib_find_gid_by_filter() is used only by ib_core, therefore avoid
exporting it.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:06 +0000 (14:52 +0200)]
IB/rxe: Avoid passing unused index pointer which is optional
While searching for GID, returned index is not used, so avoid passing
pointer during invocation.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:05 +0000 (14:52 +0200)]
IB/core: Refactor to avoid unnecessary check on GID lookup miss
Currently on every gid entry comparison miss found variable is checked;
which is not needed as those two comparison fail already indicate that
GID is not found yet.
So refactor to avoid such check and copy the GID index when found.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:04 +0000 (14:52 +0200)]
IB/core: Avoid unnecessary type cast
Type cast from void to struct find_gid_index_context is not needed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:03 +0000 (14:52 +0200)]
RDMA/cma: Introduce and use helper functions to init work
Introduce and user helper functions to initialize work for address
resolved and route resolved event that avoid code duplication at few
places.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:02 +0000 (14:52 +0200)]
RDMA/cma: Avoid setting path record type twice
Avoid setting path record type twice for RoCE.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:01 +0000 (14:52 +0200)]
RDMA/cma: Simplify netdev check
Current code checks for NULL ndev twice where 2nd check is always
invalid given the fact that during route resolving stage, device address
must be bound to netdevice interface.
This patch simplifies such check.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:52:00 +0000 (14:52 +0200)]
RDMA/cma: Set default GID type as RoCE when resolving RoCE route
As the function name suggests cma_resolve_iboe_route() resolves RoCE
route. However, its default GID type is IB_GID_TYPE_IB and not
IB_GID_TYPE_ROCE, even though both are mapped to the same enum value.
Change default GID type to IB_GID_TYPE_ROCE.
cma_iboe_set_mgid() is updated to reflect the RoCEv2 GID check.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Artemy Kovalyov [Tue, 14 Nov 2017 12:51:59 +0000 (14:51 +0200)]
IB/umem: Fix use of npages/nmap fields
In ib_umem structure npages holds original number of sg entries, while
nmap is number of DMA blocks returned by dma_map_sg.
Fixes:
c5d76f130b28 ('IB/core: Add umem function to read data from user-space')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Daniel Jurgens [Tue, 14 Nov 2017 12:51:58 +0000 (14:51 +0200)]
IB/cm: Add debug prints to ib_cm
Add debug prints to the error paths in the connection manager control
flows, to help debug connection management problems.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matan Barak [Tue, 14 Nov 2017 12:51:57 +0000 (14:51 +0200)]
IB/core: Fix memory leak in cm_req_handler error flows
In cm_req_handler error flows, sometimes cm_id_priv->timewait_info
isn't free'd.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:51:55 +0000 (14:51 +0200)]
RDMA/cma: Use correct size when writing netlink stats
The code was using the src size when formatting the dst. They are almost
certainly the same value but it reads wrong.
Fixes:
ce117ffac2e9 ("RDMA/cma: Export AF_IB statistics")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Erez Shitrit [Tue, 14 Nov 2017 12:51:54 +0000 (14:51 +0200)]
IB/ipoib: Update pathrec field if not valid record
In case that the PathRecord is not valid (SM changed its network prefix)
ipoib will continue issue PathQuery requests with the same parameters
that are in its database, which are no longer valid anymore.
Now the driver in that case will re-initialize the record from a valid
place (the priv structure keeps the updated values), and a valid request
will be issued.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Erez Shitrit [Tue, 14 Nov 2017 12:51:53 +0000 (14:51 +0200)]
IB/ipoib: Avoid memory leak if the SA returns a different DGID
The ipoib path database is organized around DGIDs from the LLADDR, but the
SA is free to return a different GID when asked for path. This causes a
bug because the SA's modified DGID is copied into the database key, even
though it is no longer the correct lookup key, causing a memory leak and
other malfunctions.
Ensure the database key does not change after the SA query completes.
Demonstration of the bug is as follows
ipoib wants to send to GID fe80:0000:0000:0000:0002:c903:00ef:5ee2, it
creates new record in the DB with that gid as a key, and issues a new
request to the SM.
Now, the SM from some reason returns path-record with other SGID (for
example, 2001:0000:0000:0000:0002:c903:00ef:5ee2 that contains the local
subnet prefix) now ipoib will overwrite the current entry with the new
one, and if new request to the original GID arrives ipoib will not find
it in the DB (was overwritten) and will create new record that in its
turn will also be overwritten by the response from the SM, and so on
till the driver eats all the device memory.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Erez Alfasi [Tue, 14 Nov 2017 12:51:52 +0000 (14:51 +0200)]
IB/mlx4: Remove unused ibpd parameter
Remove unused ibpd parameter from create_qp_rss() function.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:51:51 +0000 (14:51 +0200)]
IB/core: Avoid exporting module internal function
ib_security_modify_qp and ib_security_pkey_access are core internal
function. So avoid exporting them.
ib_security_pkey_access is used only when secuirty hooks are enabled so
avoid defining it otherwise.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:51:50 +0000 (14:51 +0200)]
IB/core: Depend on IPv6 stack to resolve link local address for RoCEv2
RoCEv1 does not use the IPv6 stack to resolve the link local DGID since it
uses GID address. It forms the DMAC directly from the DGID.
The code became confused and also tried to use this bypass for RoCEv2
packets, however RoCEv2 always uses a IP address in the GID and must
always use ARP or neighbor discovery to get the DMAC address.
Now that rdma_addr_find_l2_eth_by_grh() supports resolving link local
address to find destination mac address, lets make use of it.
This aligns it to how the rest of the IPv6 stack resolves link local
destination IPv6 address.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 14 Nov 2017 12:51:49 +0000 (14:51 +0200)]
IB/{core/cm}: Fix generating a return AH for RoCEE
When computing a UD reverse path (return AH) from a WC the code was not
doing a route lookup anchored in a specific netdevice. This caused several
bugs, including broken IPv6 link-local address support in RoCEv2. [1]
This fixes the lookup by determining the GID table entry that the HW
matched to the SGID for the WC and then using the netdevice from that
entry to perform the route and ND lookup for the 'DGID' to build a return
AH.
RoCE GID table management ensures that right upper netdevices of the
physical netdevices are added. Therefore init_ah_from_wc doesn't need to
perform such check.
Now that route lookup is done based on the netdevice of the GID entry,
simplify code to not have ifindex and vlan pointers. As part of that,
refactor to have netdevice as input parameter. This is already discussed
at [2].
Finally ib_init_ah_from_wc resolves dmac for unicast GID in similar way as
what ib_resolve_eth_dmac() does. So ib_resolve_eth_dmac is refactored to
split for unicast and non unicast GIDs, so that it can be reused by
ib_init_ah_from_wc.
While we are at refactoring ib_resolve_eth_dmac(), it is further
simplified
(a) to avoid hoplimit as optional parameter, as there is only one
user who always queries hoplimit.
(b) for empty line.
(c) avoided zero initialization of ret.
(d) removed as exported symbol as only ib core uses it.
For IPv6, this is tested using simple rping test as below.
rping -sv -a ::0
rping -c -a fe80::268a:7ff:fe55:4661%ens2f1 -C 1 -v -d
[1] https://www.spinics.net/lists/linux-rdma/msg45690.html
[2] https://www.spinics.net/lists/linux-rdma/msg45710.html
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Andrew F. Davis [Tue, 5 Dec 2017 19:15:53 +0000 (13:15 -0600)]
IB/ocrdma: Remove unneeded conversions to bool
Found with scripts/coccinelle/misc/boolconv.cocci.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Dan Carpenter [Tue, 5 Dec 2017 14:39:23 +0000 (17:39 +0300)]
IB/mlx4: Potential buffer overflow in _mlx4_set_path()
Smatch complains about this code:
drivers/infiniband/hw/mlx4/qp.c:1827 _mlx4_set_path()
error: buffer overflow 'dev->dev->caps.gid_table_len' 3 <= 255
The mlx4_ib_gid_index_to_real_index() does check that "port" is within
bounds, but we don't check the return value for errors. It seems simple
enough to add a check for that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Dan Carpenter [Tue, 5 Dec 2017 14:36:54 +0000 (17:36 +0300)]
RDMA/cxgb4: Add a sanity check in process_work()
The story is that Smatch marks skb->data as untrusted so it generates
a warning message here:
drivers/infiniband/hw/cxgb4/cm.c:4100 process_work()
error: buffer overflow 'work_handlers' 241 <= 255
In other places which handle this such as t4_uld_rx_handler() there is
some checking to make sure that the function pointer is not NULL. I
have added bounds checking and a check for NULL here as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Colin Ian King [Thu, 30 Nov 2017 13:30:06 +0000 (13:30 +0000)]
iw_cxgb4: make pointer reg_workq static
The pointer reg_workq is local to the source and does not need to be
in global scope, so make it static.
Cleans up sparse warning:
drivers/infiniband/hw/cxgb4/device.c:69:25: warning: symbol 'reg_workq'
was not declared. Should it be static?
Fixes:
1c8f1da5d851 ("iw_cxgb4: Fix possible circular dependency locking warning")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Arnd Bergmann [Mon, 27 Nov 2017 11:44:53 +0000 (12:44 +0100)]
infiniband: cxgb4: use ktime_get for timestamps
The debugfs file prints the difference between host timestamps as a
seconds/nanoseconds tuple, along with a 64-bit nanoseconds hardware
timestamp. The host time is read using getnstimeofday() which is
deprecated because of the y2038 overflow, and it suffers from time jumps
during settimeofday() and leap seconds.
Converting to ktime_get_ts64() would solve those two, but I'm going
a little further here by changing to ktime_get() and printing 64-bit
nanoseconds on both host and hw timestamps. This simplifies the code
further and makes the output easier to understand.
The format of the debugfs file obviously changes here, but this should
only be read by humans and not scripts, so I assume it's fine.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Geert Uytterhoeven [Wed, 29 Nov 2017 08:47:33 +0000 (09:47 +0100)]
RDMA/iwpm: Fix uninitialized error code in iwpm_send_mapinfo()
With gcc-4.1.2:
drivers/infiniband/core/iwpm_util.c: In function ‘iwpm_send_mapinfo’:
drivers/infiniband/core/iwpm_util.c:647: warning: ‘ret’ may be used uninitialized in this function
Indeed, if nl_client is not found in any of the scanned has buckets, ret
will be used uninitialized.
Preinitialize ret to -EINVAL to fix this.
Fixes:
30dc5e63d6a5ad24 ("RDMA/core: Add support for iWARP Port Mapper user space service")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yuval Shaia [Wed, 29 Nov 2017 06:34:02 +0000 (08:34 +0200)]
IB/ipoib: Warn when one port fails to initialize
If one port fails to initialize an error message should indicate the
reason and driver should continue serving the working port(s) and other
HCA(s).
Fixes:
e4b2d06892c7 ("IB/ipoib: Remove device when one port fails to init").
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Arvind Yadav [Tue, 14 Nov 2017 12:50:56 +0000 (18:20 +0530)]
RDMA/bnxt_re: Remove redundant bnxt_qplib_disable_nq() call
The bnxt_qplib_disable_nq() call is redundant as it occurs
after 'goto fail' and hence it called twice. Remove it.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yuval Shaia [Wed, 13 Dec 2017 10:25:19 +0000 (12:25 +0200)]
IB/ipoib: Restore MM behavior in case of tx_ring allocation failure
memalloc_noio_save modifies the behavior of MM, we must restore it after
we are done.
Fixes:
d83187dda9b9 ("IB/IPoIB: Convert IPoIB to memalloc_noio_* calls")
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Andy Shevchenko [Mon, 11 Dec 2017 13:05:05 +0000 (15:05 +0200)]
IB/srp: replace custom implementation of hex2bin()
There is no need to have a duplication of the generic library, i.e.
hex2bin().
Replace the open coded variant.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Arnd Bergmann [Mon, 11 Dec 2017 11:45:44 +0000 (12:45 +0100)]
IB/mlx5: revisit -Wmaybe-uninitialized warning
A warning that I thought I had fixed before occasionally comes
back in rare randconfig builds (I found 7 instances in the last
100000 builds, originally it was much more frequent):
drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_ib_reg_user_mr':
drivers/infiniband/hw/mlx5/mr.c:1229:5: error: 'order' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (order <= mr_cache_max_order(dev)) {
^
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'ncont' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'page_shift' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1260:2: error: 'npages' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I've looked at all those findings again and noticed that they are all
with CONFIG_INFINIBAND_USER_MEM=n, which means ib_umem_get() returns
an error unconditionally and we never initialize or use those variables.
This triggers a condition in gcc iff mr_umem_get() is partially but not
entirely inlined, which in turn depends on the exact combination of
optimization settings. This is a known problem with gcc, with no
easy solution in the compiler, so this adds another workaround that
should be more reliable than my previous attempt.
Returning an error from mlx5_ib_reg_user_mr() earlier means that we
can completely bypass the logic that caused the warning, the compiler
can now see that the variable is never accessed.
Fixes:
14ab8896f5d9 ("IB/mlx5: avoid bogus -Wmaybe-uninitialized warning")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mustafa Ismail [Fri, 8 Dec 2017 20:34:29 +0000 (14:34 -0600)]
i40iw: Reinitialize add_sd_cnt
add_sd_cnt in info structure passed to i40iw_create_hmc_obj_type
must be 0 and since it is modified during the call, it must be
reset in the loop. This avoids unnecessarily reprogramming the
SDs multiple times with the same values.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Chien Tin Tung [Fri, 8 Dec 2017 20:34:28 +0000 (14:34 -0600)]
i40iw: Use sqsize to initialize cqp_requests elements
Use sqsize instead of I40IW_CQP_SW_SQSIZE_2048 to initialize
cqp_requests elements in the for-loop as sqsize is used
to allocate memory for cqp_requests.
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yuval Shaia [Wed, 29 Nov 2017 09:24:29 +0000 (11:24 +0200)]
IB/ipoib: Replace printk with pr_warn
pr_* is the preferred way to print messages, replace all
printk(KERN_WARN, ...) with pr_warn.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Gomonovych, Vasyl [Tue, 28 Nov 2017 15:18:07 +0000 (16:18 +0100)]
IB/core: Use PTR_ERR_OR_ZERO()
Fix ptr_ret.cocci warnings:
drivers/infiniband/core/uverbs_cmd.c:1156:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Arnd Bergmann [Mon, 27 Nov 2017 11:38:10 +0000 (12:38 +0100)]
nes: remove unused 'timeval' struct member
There is a stale entry in nes_cm_tcp_context that has apparently
never been used in mainline linux.
I'm trying to kill off all users of timeval as part of the
y2038-safety work, so let's just remove this one.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Arnd Bergmann [Mon, 27 Nov 2017 11:38:09 +0000 (12:38 +0100)]
i40iw: remove unused 'timeval' struct member
There is a stale entry in i40iw_cm_tcp_context that apparently
was copied from the 'nes' driver but never used in i40iw.
I'm trying to kill off all users of timeval as part of the
y2038-safety work, so let's just remove this one.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yuval Shaia [Sun, 26 Nov 2017 11:51:35 +0000 (13:51 +0200)]
RDMA/vmw_pvrdma: Do not re-calculate npages
There is no need to re-calculate the number of pages since it is already
done in ib_umem_get.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Tested-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Geert Uytterhoeven [Sun, 19 Nov 2017 18:59:21 +0000 (19:59 +0100)]
i40w: Remove garbage at end of INFINIBAND_I40IW Kconfig section
Remove leftover garbage (containing Kconfig dependencies for another
symbol?)
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Joe Perches [Tue, 14 Nov 2017 21:29:41 +0000 (16:29 -0500)]
IB/qib: Cleanup qib_set_part_key() with direct returns
Perhaps the function is better written without
the empty bail: label and without setting ret
and just using return.
Combining the int/bool conversion of any and the
direct returns makes the resulting code clearer.
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Colin Ian King [Tue, 14 Nov 2017 21:29:36 +0000 (16:29 -0500)]
IB/qib: remove redundant setting of any in for-loop
The variable all is being set but is never read after this
hence it can be removed from the for loop initialization.
Cleans up clang warning:
drivers/infiniband/hw/qib/qib_file_ops.c:640:7: warning: Value
stored to 'any' is never read
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Tue, 14 Nov 2017 12:34:52 +0000 (04:34 -0800)]
IB/qib: Fix comparison error with qperf compare/swap test
This failure exists with qib:
ver_rc_compare_swap:
mismatch, sequence 2, expected
123456789abcdef, got 0
The request builder was using the incorrect inlines to
build the request header resulting in incorrect data
in the atomic header.
Fix by using the appropriate inlines to create the request.
Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes:
261a4351844b ("IB/qib,IB/hfi: Use core common header file")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jan Sokolowski [Tue, 14 Nov 2017 12:34:45 +0000 (04:34 -0800)]
IB/hfi1: Use 4096 for default active MTU in query_qp
Currently, if a port is queried that has an invalid
Maximum Transmission Unit, driver reports default MTU of 2048.
This in incorrect.
Use default value of 4096 if invalid.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Don Hiatt [Tue, 14 Nov 2017 12:34:38 +0000 (04:34 -0800)]
IB/CM: Change sgid to IB GID when handling CM request
ULPs do not understand OPA GIDs and will reject CM requests
if the sgid does not match the local_gid. In order to
fix this behavior we convert the OPA GID back to an IB GID.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>