platform/kernel/linux-starfive.git
6 years agoIB/mlx5: Assign send CQ and recv CQ of UMR QP
Majd Dibbiny [Mon, 30 Oct 2017 12:23:13 +0000 (14:23 +0200)]
IB/mlx5: Assign send CQ and recv CQ of UMR QP

The UMR's QP is created by calling mlx5_ib_create_qp directly, and
therefore the send CQ and the recv CQ on the ibqp weren't assigned.

Assign them right after calling the mlx5_ib_create_qp to assure
that any access to those pointers will work as expected and won't
crash the system as might happen as part of reset flow.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/cxgb4: Protect from possible dereference
Leon Romanovsky [Sun, 29 Oct 2017 19:34:35 +0000 (21:34 +0200)]
RDMA/cxgb4: Protect from possible dereference

Smatch tool reports the following error:
  drivers/infiniband/hw/cxgb4/qp.c:1886
c4iw_create_qp() error: we previously assumed 'ucontext'
could be null (see line 1804)

Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/bnxt_re: Remove unused vlan_tag variable
Leon Romanovsky [Sun, 29 Oct 2017 15:05:22 +0000 (17:05 +0200)]
RDMA/bnxt_re: Remove unused vlan_tag variable

The Broadcom driver produces the following compilation warning

drivers/infiniband/hw/bnxt_re/ib_verbs.c:
In function ‘bnxt_re_create_ah’:
drivers/infiniband/hw/bnxt_re/ib_verbs.c:668:6:
warning: variable ‘vlan_tag’ set but not used [-Wunused-but-set-variable]
u16 vlan_tag;

Let's remove it till vlan_tag will be implemented properly.

Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Add PCI write end padding support
Noa Osherovich [Sun, 29 Oct 2017 11:59:45 +0000 (13:59 +0200)]
IB/mlx5: Add PCI write end padding support

Add the PCI write end padding flag to device_cap_flags enum and set it
during mlx5_ib_query_device so it will be reported to user-space.

During WQ/QP creation, set that capability for WQ/QP if user requested
it and HW supports it.

PCI write end padding modification is not supported for now. There's no
such flag for a QP but for a WQ, create and modify use the same flag.
Return an error if PCI write end padding flag is set during modify_wq.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/core: Add PCI write end padding flags for WQ and QP
Noa Osherovich [Sun, 29 Oct 2017 11:59:44 +0000 (13:59 +0200)]
IB/core: Add PCI write end padding flags for WQ and QP

There are root complexes that are able to optimize their
performance when incoming data is multiple full cache lines.

PCI write end padding is the device's ability to pad the ending of
incoming packets (scatter) to full cache line such that the last
upstream write generated by an incoming packet will be a full cache
line.

Add a relevant entry to ib_device_cap_flags to report such capability
of an RDMA device.

Add the QP and WQ create flags:
 * A QP/WQ created with a scatter end padding flag will cause
   HW to pad the last upstream write generated by a packet to cache line.

User should consider several factors before activating this feature:
- In case of high CPU memory load (which may cause PCI back pressure in
  turn), if a large percent of the writes are partial cache line, this
  feature should be checked as an optional solution.
- This feature might reduce performance if most packets are between one
  and two cache lines and PCIe throughput has reached its maximum
  capacity. E.g. 65B packet from the network port will lead to 128B
  write on PCIe, which may cause traffic on PCIe to reach high
  throughput.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/rxe: don't crash, if allocation of crc algorithm failed
Thomas Bogendoerfer [Tue, 31 Oct 2017 10:16:46 +0000 (11:16 +0100)]
IB/rxe: don't crash, if allocation of crc algorithm failed

Following crash happens, if crc algorithm couldn't be allocated:

[ 1087.989072] rdma_rxe: loaded
[ 1097.855397] PCLMULQDQ-NI instructions are not detected.
[ 1097.901220] rdma_rxe: failed to allocate crc algorithmi err:-2
[ 1097.901248] BUG: unable to handle kernel
[ 1097.901249] NULL pointer dereference
[ 1097.901250]  at 0000000000000046
[...]

Reason is that rxe->tfm is assigned the error return, which will then
be used for crypto_free_shash() in rxe_cleanup. Fix by using a
temporary variable and assigning it rxe->tfm after allocation succeeded.

Fixes: cee2688e3cd6 ("IB/rxe: Offload CRC calculation when possible")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/core: Avoid crash on pkey enforcement failed in received MADs
Parav Pandit [Tue, 31 Oct 2017 15:33:18 +0000 (10:33 -0500)]
IB/core: Avoid crash on pkey enforcement failed in received MADs

Below kernel crash is observed when Pkey security enforcement fails on
received MADs. This issue is reported in [1].

ib_free_recv_mad() accesses the rmpp_list, whose initialization is
needed before accessing it.
When security enformcent fails on received MADs, MAD processing avoided
due to security checks failed.

OpenSM[3770]: SM port is down
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: PGD 0
kernel: P4D 0
kernel:
kernel: Oops: 0002 [#1] SMP
kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P          IO    4.13.4-1-pve #1
kernel: Hardware name: Dell       XS23-TY3        /9CMP63, BIOS 1.71 09/17/2013
kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000
kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286
kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000
kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20
kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0
kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38
kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880
kernel: FS:  0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0
kernel: Call Trace:
kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]
kernel:  __ib_process_cq+0x5c/0xb0 [ib_core]
kernel:  ib_cq_poll_work+0x20/0x60 [ib_core]
kernel:  process_one_work+0x1e9/0x410
kernel:  worker_thread+0x4b/0x410
kernel:  kthread+0x109/0x140
kernel:  ? process_one_work+0x410/0x410
kernel:  ? kthread_create_on_node+0x70/0x70
kernel:  ? SyS_exit_group+0x14/0x20
kernel:  ret_from_fork+0x25/0x30
kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38
kernel: CR2: 0000000000000008

[1] : https://www.spinics.net/lists/linux-rdma/msg56190.html

Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reported-by: Chris Blake <chrisrblake93@gmail.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/cxgb4: Annotate r2 and stag as __be32
Leon Romanovsky [Wed, 25 Oct 2017 20:10:19 +0000 (23:10 +0300)]
RDMA/cxgb4: Annotate r2 and stag as __be32

Chelsio cxgb4 HW is big-endian, hence there is need to properly
annotate r2 and stag fields as __be32 and not __u32 to fix the
following sparse warnings.

  drivers/infiniband/hw/cxgb4/qp.c:614:16:
    warning: incorrect type in assignment (different base types)
      expected unsigned int [unsigned] [usertype] r2
      got restricted __be32 [usertype] <noident>
  drivers/infiniband/hw/cxgb4/qp.c:615:18:
    warning: incorrect type in assignment (different base types)
      expected unsigned int [unsigned] [usertype] stag
      got restricted __be32 [usertype] <noident>

Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx4: Fix RSS's QPC attributes assignments
Guy Levi [Wed, 25 Oct 2017 19:39:35 +0000 (22:39 +0300)]
IB/mlx4: Fix RSS's QPC attributes assignments

In the modify QP handler the base_qpn_udp field in the RSS QPC is
overwrite later by irrelevant value assignment. Hence, ingress packets
which gets to the RSS QP will be steered then to a garbage QPN.

The patch fixes this by skipping the above assignment when a RSS QP is
modified, also, the RSS context's attributes assignments are relocated
just before the context is posted to avoid future issues like this.

Additionally, this patch takes the opportunity to change the code to be
disciplined to the device's manual and assigns the RSS QP context just at
RESET to INIT transition.

Fixes:3078f5f1bd8b ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx4: Add report for RSS capabilities by vendor channel
Guy Levi [Wed, 25 Oct 2017 19:39:34 +0000 (22:39 +0300)]
IB/mlx4: Add report for RSS capabilities by vendor channel

The mlx4's RSS patches submission missed a report of RSS capabilities
which should be reported by the vendor channel in query_device.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/umem: Avoid partial declaration of non-static function
Leon Romanovsky [Wed, 25 Oct 2017 15:56:49 +0000 (18:56 +0300)]
RDMA/umem: Avoid partial declaration of non-static function

The RDMA/umem uses generic RB-trees macros to generate various ib_umem
access functions. The generation is performed with INTERVAL_TREE_DEFINE
macro, which allows one of two modes: declare all functions as static or
declare none of the function to be static.

The second mode of operation produces the following sparse errors:
 drivers/infiniband/core/umem_rbtree.c:69:1:
warning: symbol 'rbt_ib_umem_iter_first' was not declared.
Should it be static?
 drivers/infiniband/core/umem_rbtree.c:69:1:
warning: symbol 'rbt_ib_umem_iter_next' was not declared.
Should it be static?

Code relocation together with declaration of such functions to be
"static" solves the issue.

Because there is no need to have separate file for two functions,
let's consolidate umem_rtree.c and umem_odp.c into one file.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Modify the usage of cmd_sn in hip08
oulijun [Fri, 10 Nov 2017 08:55:53 +0000 (16:55 +0800)]
RDMA/hns: Modify the usage of cmd_sn in hip08

The cmd_sn field of CQ doorbell inits for 0. It should be
increment on each first db rung after a completion Event.
if the cmd_sn of notify doorbell Adjacent two times is the
same, the hardware will distinguish it for the same notify
request and update its type according to the priority level
of next event and solicited event.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Unify the calculation for hem index in hip08
oulijun [Fri, 10 Nov 2017 08:55:52 +0000 (16:55 +0800)]
RDMA/hns: Unify the calculation for hem index in hip08

The calculation of hem index are different between hns_roce_table_get
and hns_roce_table_find. When the table chunk size of TRRL is not
divisible by object size, it will faile to find the trrl table.

This patch is to update the calculation of the hem index in the
hns_roce_table_find to the same as which in the hns_roce_table_get.

Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Set the owner field of SQWQE in hip08 RoCE
oulijun [Fri, 10 Nov 2017 08:55:51 +0000 (16:55 +0800)]
RDMA/hns: Set the owner field of SQWQE in hip08 RoCE

the owner need to be set when posting sqwqe in hip08 RoCE.
The owner be used according to the below algorithm:
The value of owner should be 1 in the first lap, it
should be 0 in the second lap and in turn.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Add sq_invld_flg field in QP context
oulijun [Fri, 10 Nov 2017 08:55:50 +0000 (16:55 +0800)]
RDMA/hns: Add sq_invld_flg field in QP context

In hip08 RoCE, it need to add the sq_invld_flg field
in QP context for RoCE hardware.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Update the usage of ack timeout in hip08
oulijun [Fri, 10 Nov 2017 08:55:49 +0000 (16:55 +0800)]
RDMA/hns: Update the usage of ack timeout in hip08

The ack timeout's value in qp context shall be a 5-bit value
and be assgined by users. When at of qpc is set zero, the
timer is disabled.

When attr_mask set for IB_QP_TIMEOUT, The ack timeout field
is effective.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Set sq_cur_sge_blk_addr field in QPC in hip08
oulijun [Fri, 10 Nov 2017 08:55:48 +0000 (16:55 +0800)]
RDMA/hns: Set sq_cur_sge_blk_addr field in QPC in hip08

If the extend sges exist, the sq_cur_sge_blk_addr field in QPC
(qp context) should be configured.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Enable the cqe field of sqwqe of RC
oulijun [Fri, 10 Nov 2017 08:55:47 +0000 (16:55 +0800)]
RDMA/hns: Enable the cqe field of sqwqe of RC

When sig_type of qpc is non-selectable, all sq's wqes will produce
cqe and not depend on the cqe attribute of wqe. When sig_type of
qpc is selectable, The cqe attribute of wqe will decide whether to
produce the cqe.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Set se attribute of sqwqe in hip08
oulijun [Fri, 10 Nov 2017 08:55:46 +0000 (16:55 +0800)]
RDMA/hns: Set se attribute of sqwqe in hip08

When send flags is IB_SEND_SOLICITED, the se(solicated event)
field of sqwqe will be set.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Configure fence attribute in hip08 RoCE
oulijun [Fri, 10 Nov 2017 08:55:45 +0000 (16:55 +0800)]
RDMA/hns: Configure fence attribute in hip08 RoCE

When post wr for mixed rdma operation, we need to use fence
mechanism to keep the correct execute order.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Configure TRRL field in hip08 RoCE device
oulijun [Fri, 10 Nov 2017 08:55:44 +0000 (16:55 +0800)]
RDMA/hns: Configure TRRL field in hip08 RoCE device

The TRRL(Target RDMA Read/aTOMIC List) record the information
of receiving RDMA READ or ATOMIC operation in hip08. It will
be used the hardware. The driver need to assign a continuous
physical address for trrl_ba field of qp context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Update calculation of irrl_ba field for hip08
oulijun [Fri, 10 Nov 2017 08:55:43 +0000 (16:55 +0800)]
RDMA/hns: Update calculation of irrl_ba field for hip08

The irrl(initiator RDMA Read/Atomic list) base address of qp
context is assigned for addr[63:6]. This patch mainly fixed
it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Configure sgid type for hip08 RoCE
Wei Hu(Xavier) [Thu, 26 Oct 2017 09:10:25 +0000 (17:10 +0800)]
RDMA/hns: Configure sgid type for hip08 RoCE

The hardware vendors need to generate RoCEv1 or RoCEv2
packet according to the sgid type configured.

Besides, update the gid table size for hip08 RoCE
device.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Generate gid type of RoCEv2
Wei Hu(Xavier) [Thu, 26 Oct 2017 09:10:24 +0000 (17:10 +0800)]
RDMA/hns: Generate gid type of RoCEv2

HNS_ROCE_CAP_FALG_ROCE_V1_V2 is added for selecting capability of
RoCE in hns driver. When HNS_ROCE_CAP_FALG_ROCE_V1_V2 is set,
driver will inform ib core that the related hns device can support
RoCEv2, and ib core can generate the gid of the related type.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Add rereg mr support for hip08
Wei Hu(Xavier) [Thu, 26 Oct 2017 09:10:23 +0000 (17:10 +0800)]
RDMA/hns: Add rereg mr support for hip08

This patch adds rereg mr support for hip08.

Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge branch 'k.o/for-rc' into k.o/for-next
Doug Ledford [Wed, 1 Nov 2017 19:25:27 +0000 (15:25 -0400)]
Merge branch 'k.o/for-rc' into k.o/for-next

Pick up the missing netlink oops fix

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Take advantage of kvzalloc_node in sdma initialization
Mike Marciniszyn [Mon, 23 Oct 2017 13:06:32 +0000 (06:06 -0700)]
IB/hfi1: Take advantage of kvzalloc_node in sdma initialization

The code that allocates the tx ring in the sdma code fails to take
advantage of kvzalloc variations.

Fix by converting to use kvzalloc_node.

Reported-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Don't modify num_user_contexts module parameter
Kamenee Arumugam [Mon, 23 Oct 2017 13:06:24 +0000 (06:06 -0700)]
IB/hfi1: Don't modify num_user_contexts module parameter

The driver parameter num_user_contexts controls global behavior and
should not be modified by the driver.
This patch eliminates modification of num_user_contexts by using a
local variable to keep track of the value.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Insure int mask for in-kernel receive contexts is clear
Mike Marciniszyn [Mon, 23 Oct 2017 13:06:16 +0000 (06:06 -0700)]
IB/hfi1: Insure int mask for in-kernel receive contexts is clear

The only use for the urg interrupt is for priority PSM packets.

There is no reason for this interrupt to be enabled for kernel
contexts.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Add tx_opcode_stats like the opcode_stats
Mike Marciniszyn [Mon, 23 Oct 2017 13:06:08 +0000 (06:06 -0700)]
IB/hfi1: Add tx_opcode_stats like the opcode_stats

This patch adds tx_opcode_stats to parallel the
(rx)opcode_stats in the debugfs.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Validate PKEY for incoming GSI MAD packets
Sebastian Sanchez [Wed, 25 Oct 2017 15:15:18 +0000 (08:15 -0700)]
IB/hfi1: Validate PKEY for incoming GSI MAD packets

These are the use-cases where the pkey needs to be tested to see
if a packet needs to be dropped.

a) Check if pkey is not FULL_MGMT_P_KEY or LIM_MGMT_P_KEY,
   drop the packet as it's not part of the management partition.
   Self-originated packets are an exception.

b) If pkey index points to FULL_MGMT_P_KEY and LIM_MGMT_P_KEY is
   in the table, the packet is coming from a management node,
   and the receiving node is also a management node, so it is safe
   for the packet to go through.

c) If pkey index points to FULL_MGMT_P_KEY and LIM_MGMT_P_KEY is
   NOT in the table, drop the packet as LIM_MGMT_P_KEY should
   always be in the pkey table. It could be a misconfiguration.

d) If pkey index points to LIM_MGMT_P_KEY and FULL_MGMT_P_KEY is
   NOT in the table, it is safe for the packet to go through
   since a non-management node is talking to another non-managment
   node.

e) If pkey index points to LIM_MGMT_P_KEY and FULL_MGMT_P_KEY is in
   the table, drop the packet because a non-management node is
   talking to a management node, and it could be an attack.

For the implementation, these rules can be simplied to only checking
for (a) and (e). There's no need to check for rule (b) as
the packet doesn't need to be dropped. Rule (c) is not possible in
the driver as LIM_MGMT_P_KEY is always in the pkey table.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIb/hfi1: Return actual operational VLs in port info query
Patel Jay P [Mon, 23 Oct 2017 13:05:53 +0000 (06:05 -0700)]
Ib/hfi1: Return actual operational VLs in port info query

__subn_get_opa_portinfo stores value returned by hfi1_get_ib_cfg() as
operational vls. hfi1_get_ib_cfg() returns vls_operational field in
hfi1_pportdata. The problem with this is that the value is always equal
to vls_supported field in hfi1_pportdata.

The logic to calculate operational_vls is to set value passed by FM
(in  __subn_set_opa_portinfo routine). If no value is passed then
default value is stored in operational_vls.

Field actual_vls_operational is calculated on the basis of buffer
control table. Hence, modifying hfi1_get_ib_cfg() to return
actual_operational_vls when used with HFI1_IB_CFG_OP_VLS parameter

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Patel Jay P <jay.p.patel@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Race condition between user notification and driver state
Michael J. Ruhl [Mon, 23 Oct 2017 13:05:45 +0000 (06:05 -0700)]
IB/hfi1: Race condition between user notification and driver state

The handler for link init state (HLS_UP_INIT) notifies userspace
(update_statusp()) before enabling the device
(RCV_CTRL_RCV_PORT_ENABLE_SMASK) or setting the device state
(ppd->host_link_state).  This causes a race condition where the
userspace thinks the interface is in the INIT state before the driver
has set that state.

Rework the code path to eliminate the race.

Delay setting the init state until after a HW settling period.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agobnxt_re: Implement the shutdown hook of the L2-RoCE driver interface
Somnath Kotur [Wed, 25 Oct 2017 03:42:30 +0000 (09:12 +0530)]
bnxt_re: Implement the shutdown hook of the L2-RoCE driver interface

When host is shutting down, it invokes the shutdown hook of the
L2 driver where it would attempt to free the MSI-X vectors, but would fail
because some vectors are held by the RoCE driver.
Implement the new hook in the L2 -> RoCE interface which will be invoked so that
the RoCE driver can unregister the device and free up the MSI-X vectors it had
claimed so that L2 can proceed with it's shutdown without failure.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/cxgb4: Declare stag as __be32
Leon Romanovsky [Wed, 25 Oct 2017 04:41:11 +0000 (07:41 +0300)]
RDMA/cxgb4: Declare stag as __be32

The scqe.stag is actually __b32, fix it.

  drivers/infiniband/hw/cxgb4/cq.c:754:52: warning: cast to restricted __be32

Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/rxe: Convert timers to use timer_setup()
Kees Cook [Tue, 24 Oct 2017 10:28:27 +0000 (03:28 -0700)]
IB/rxe: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Moni Shoua <monis@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/netlink: OOPs in rdma_nl_rcv_msg() from misinterpreted flag
Michael J. Ruhl [Tue, 24 Oct 2017 12:41:01 +0000 (08:41 -0400)]
RDMA/netlink: OOPs in rdma_nl_rcv_msg() from misinterpreted flag

rdma_nl_rcv_msg() checks to see if it should use the .dump() callback
or the .doit() callback.  The check is done with this check:

if (flags & NLM_F_DUMP) ...

The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH).

When an RDMA_NL_LS message (response) is received, the bit used for
indicating an error is the same bit as NLM_F_ROOT.

NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR.

ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error
occurs in the service.  The current code then misinterprets the
NLM_F_DUMP bit and trys to call the .dump() callback.

If the .dump() callback for the specified request is not available
(which is true for the RDMA_NL_LS messages) the following Oops occurs:

[ 4555.960256] BUG: unable to handle kernel NULL pointer dereference at
   (null)
[ 4555.969046] IP:           (null)
[ 4555.972664] PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0
[ 4555.979287] Oops: 0010 [#1] SMP
[ 4555.982809] Modules linked in: rpcrdma ib_isert iscsi_target_mod
target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod
dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd
glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core mei_me
lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si ipmi_devintf
ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace
sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci
pps_core drm dca libata i2c_algo_bit i2c_core
[ 4556.061190] CPU: 54 PID: 9841 Comm: ibacm Tainted: G          I
4.14.0-rc2+ #6
[ 4556.069667] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS
SE5C610.86B.01.01.0008.021120151325 02/11/2015
[ 4556.081339] task: ffff880855f42d00 task.stack: ffffc900246b4000
[ 4556.087967] RIP: 0010:          (null)
[ 4556.092166] RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246
[ 4556.098018] RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX:
0000000000000000
[ 4556.105997] RDX: 0000000000001100 RSI: ffff881058bb1320 RDI:
ffff881056362000
[ 4556.113984] RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09:
0000000000001100
[ 4556.121971] R10: ffff8810573a5000 R11: 0000000000000000 R12:
ffff881056362000
[ 4556.129957] R13: 0000000000000ec0 R14: ffff881058bb1320 R15:
0000000000000ec0
[ 4556.137945] FS:  00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000)
knlGS:0000000000000000
[ 4556.147000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4556.153433] CR2: 0000000000000000 CR3: 0000001056f5d003 CR4:
00000000001606e0
[ 4556.161419] Call Trace:
[ 4556.164167]  ? netlink_dump+0x12c/0x290
[ 4556.168468]  __netlink_dump_start+0x186/0x1f0
[ 4556.173357]  rdma_nl_rcv_msg+0x193/0x1b0 [ib_core]
[ 4556.178724]  rdma_nl_rcv+0xdc/0x130 [ib_core]
[ 4556.183604]  netlink_unicast+0x181/0x240
[ 4556.187998]  netlink_sendmsg+0x2c2/0x3b0
[ 4556.192392]  sock_sendmsg+0x38/0x50
[ 4556.196299]  SYSC_sendto+0x102/0x190
[ 4556.200308]  ? __audit_syscall_entry+0xaf/0x100
[ 4556.205387]  ? syscall_trace_enter+0x1d0/0x2b0
[ 4556.210366]  ? __audit_syscall_exit+0x209/0x290
[ 4556.215442]  SyS_sendto+0xe/0x10
[ 4556.219060]  do_syscall_64+0x67/0x1b0
[ 4556.223165]  entry_SYSCALL64_slow_path+0x25/0x25
[ 4556.228328] RIP: 0033:0x7fe0b9db2a63
[ 4556.232333] RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 ORIG_RAX:
000000000000002c
[ 4556.240808] RAX: ffffffffffffffda RBX: 0000000000000010 RCX:
00007fe0b9db2a63
[ 4556.248796] RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI:
000000000000000d
[ 4556.256782] RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09:
000000000000000c
[ 4556.265321] R10: 0000000000000000 R11: 0000000000000293 R12:
00007ffc55edc280
[ 4556.273846] R13: 000000000260b400 R14: 000000000000000d R15:
0000000000000001
[ 4556.282368] Code:  Bad RIP value.
[ 4556.286629] RIP:           (null) RSP: ffffc900246b7bc8
[ 4556.293013] CR2: 0000000000000000
[ 4556.297292] ---[ end trace 8d67abcfd10ec209 ]---
[ 4556.305465] Kernel panic - not syncing: Fatal exception
[ 4556.313786] Kernel Offset: disabled
[ 4556.321563] ---[ end Kernel panic - not syncing: Fatal exception
[ 4556.328960] ------------[ cut here ]------------

Special case RDMA_NL_LS response messages to call the appropriate
callback.

Additionally, make sure that the .dump() callback is not NULL
before calling it.

Fixes: 647c75ac59a48a54 ("RDMA/netlink: Convert LS to doit callback")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/cm: Fix memory corruption in handling CM request
Parav Pandit [Thu, 19 Oct 2017 05:40:30 +0000 (08:40 +0300)]
IB/cm: Fix memory corruption in handling CM request

In recent code, two path record entries are alwasy cleared while
allocated could be either one or two path record entries.
This leads to zero out of unallocated memory.

This fix initializes alternative path record only when alternative path
is set.

While we are at it, path record allocation doesn't check for OPA
alternative path, but rest of the code checks for OPA alternative path.
Path record allocation code doesn't check for OPA alternative LID.
This can further lead to memory corruption when only one path record is
allocated, but there is actually alternative OPA path record present in CM
request.

Cc: <stable@vger.kernel.org> # v4.12+
Fixes: 9fdca4da4d8c ("IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Add support for RSS on the inner packet
Maor Gottlieb [Thu, 19 Oct 2017 05:25:56 +0000 (08:25 +0300)]
IB/mlx5: Add support for RSS on the inner packet

Some user space application would like to do RSS on the inner
packet fields instead on the outer.
When MLX5_RX_HASH_INNER is set with one or more of the other
hash fields, then the RSS will be done using the inner packet.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Add tunneling offloads support
Maor Gottlieb [Thu, 19 Oct 2017 05:25:55 +0000 (08:25 +0300)]
IB/mlx5: Add tunneling offloads support

The device can support receive Stateless Offloads for the inner
packet's fields only when the packet is processed by TIR which is
enabled to support tunneling. Otherwise, the device treats the
packet as an ordinary non-tunneling packet and receive offloads
can be done only for the outer packet's field.
In order to enable receive Stateless Offloading support for incoming
tunneling traffic the TIR should be created with tunneled_offload_en.
Tunneling offloads is supported only be raw ethernet QP.

This patch includes:
* New QP creation flag for tunneling offloads.
* Reports device capabilities.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Update tunnel offloads bits
Maor Gottlieb [Thu, 19 Oct 2017 05:25:54 +0000 (08:25 +0300)]
IB/mlx5: Update tunnel offloads bits

This patch updates the mlx5_ifc with the following:
- Fix tunnel_stateless_gre typo.
- max_geneve_opt_len - Maximum geneve options length.
- tunnel_stateless_geneve_rx - If set, receive Stateless
  Offloads for Geneve tunneled (inner) packets are supported.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Support padded 128B CQE feature
Guy Levi [Thu, 19 Oct 2017 05:25:53 +0000 (08:25 +0300)]
IB/mlx5: Support padded 128B CQE feature

In some benchmarks and some CPU architectures, writing the CQE on a full
cache line size improves performance by saving memory access operations
(read-modify-write) relative to partial cache line change. This patch
lets the user to configure the device to pad the CQE up to 128B in case
its content is less than 128B. Currently the driver supports only padding
for a CQE size of 128B.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Support 128B CQE compression feature
Guy Levi [Thu, 19 Oct 2017 05:25:52 +0000 (08:25 +0300)]
IB/mlx5: Support 128B CQE compression feature

In commit 1cbe6fc86ccf ("IB/mlx5: Add support for CQE compressing") the
concept of CQE compression was introduced and added a support for 64B
CQE size. This change update the code to support 128B CQE size as well.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Add 128B CQE compression and padding HW bits
Guy Levi [Thu, 19 Oct 2017 05:25:51 +0000 (08:25 +0300)]
IB/mlx5: Add 128B CQE compression and padding HW bits

Adding new bits in mlx5_ifc_cmd_hca_cap to get the hardware capabilities
for:
 - compression_128: Support 128B CQE compression
 - cqe_128_always: Support 128B CQE padding

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Allow creation of a multi-packet RQ
Noa Osherovich [Tue, 17 Oct 2017 15:01:13 +0000 (18:01 +0300)]
IB/mlx5: Allow creation of a multi-packet RQ

Allow creation of a multi-packet receive queue.

In order to create a multi-packet RQ, the following fields in
the mlx5_ib_rwq should be set:
- log_num_strides: Log of number of strides per WQE
- single_stride_log_num_of_bytes: Log of a single stride size
- two_byte_shift_en: When enabled, hardware pads 2 bytes of zeros
  before writing the message to memory (e.g. for the IP alignment).

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Expose multi-packet RQ capabilities
Noa Osherovich [Tue, 17 Oct 2017 15:01:12 +0000 (18:01 +0300)]
IB/mlx5: Expose multi-packet RQ capabilities

This patch reports the device's striding RQ capabilities to
the user-space:
- min/max_single_stride_log_num_of_bytes: Log of min/max number of
  bytes in a single stride.
- min/max_single_wqe_log_num_of_strides: Log of min/max number of
  strides in a single WQE.
- supported_qpts: A bit mask to know which QP types support multi-
  packet RQ, for now only Raw Packet QPs.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Add modify CQ support for hip08
oulijun [Thu, 19 Oct 2017 03:52:40 +0000 (11:52 +0800)]
RDMA/hns: Add modify CQ support for hip08

It is needed to call modify cq API for modifying cq
context fields for controlling event generation
moderations. This patch mainly adds it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Update the PD&CQE&MTT specification in hip08
Wei Hu(Xavier) [Wed, 18 Oct 2017 09:32:46 +0000 (17:32 +0800)]
RDMA/hns: Update the PD&CQE&MTT specification in hip08

This patch updates the PD specification to 16M for hip08. And it
updates the numbers of mtt and cqe segments for the buddy.

As the CQE supports hop num 1 addressing, the CQE specification is
64k. This patch updates to set the CQE specification to 64k.

Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Update the IRRL table chunk size in hip08
Wei Hu(Xavier) [Wed, 18 Oct 2017 09:32:45 +0000 (17:32 +0800)]
RDMA/hns: Update the IRRL table chunk size in hip08

As the increase of the IRRL specification in hip08, the IRRL table
chunk size needs to be updated.
This patch updates the IRRL table chunk size to 256k for hip08.

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: Support WQE/CQE/PBL page size configurable feature in hip08
Wei Hu(Xavier) [Wed, 18 Oct 2017 09:32:44 +0000 (17:32 +0800)]
RDMA/hns: Support WQE/CQE/PBL page size configurable feature in hip08

This patch updates to support WQE, CQE and PBL page size configurable
feature, which includes base address page size and buffer page size.

Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/ipoib: Change number of TX wqe to 64
Erez Shitrit [Thu, 19 Oct 2017 04:56:44 +0000 (07:56 +0300)]
IB/ipoib: Change number of TX wqe to 64

NAPI budget is 64 packets, while maximum polling size for
the send CQ is 16. Let's bring them in sync, so the NAPI
budget will be reused completely.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/ipoib: Use NAPI in UD/TX flows
Erez Shitrit [Thu, 19 Oct 2017 04:56:43 +0000 (07:56 +0300)]
IB/ipoib: Use NAPI in UD/TX flows

Instead of explicit call to poll_cq of the tx ring, use the NAPI mechanism
to handle the completions of each packet that has been sent to the HW.

The next major changes were taken:
 * The driver init completion function in the creation of the send CQ,
   that function triggers the napi scheduling.
 * The driver uses CQ for RX for both modes UD and CM, and CQ for TX
   for CM and UD.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/ipoib: Get rid of the tx_outstanding variable in all modes
Erez Shitrit [Thu, 19 Oct 2017 04:56:42 +0000 (07:56 +0300)]
IB/ipoib: Get rid of the tx_outstanding variable in all modes

The first step toward using NAPI in the UD/TX flow is to separate
between two flows, the NAPI and the xmit, meaning no use of shared
variables between both flows.

This patch takes out the tx_outstanding variable that was used in both
flows and instead the driver uses the 2 cyclic ring variables: tx_head
and tx_tail, tx_head used in the xmit flow and tx_tail in the NAPI flow.

Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA: Remove Sean's and Hal's emails from MAINTAINER file
Leon Romanovsky [Sun, 15 Oct 2017 06:56:48 +0000 (09:56 +0300)]
RDMA: Remove Sean's and Hal's emails from MAINTAINER file

RDMA subsystem has one active maintainer who can apply
patches - Doug Ledford, but the RDMA entries in MAINTAINER file
are not stating it. The following patch removes Sean's and Hal's
emails from the maintainers list.

It will allow clean get_maintaner.pl output which is needed for people
outside of our community and various semi-automatic tools.

Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Move cqp_cmd_head init to CQP initialization
Bob Sharp [Mon, 16 Oct 2017 20:46:05 +0000 (15:46 -0500)]
i40iw: Move cqp_cmd_head init to CQP initialization

Control QP (CQP) command backlog list is initialized at
device initialization time.  It is not reinitialized in
the reset flow.  Move the initialization to CQP creation
time so the list can be initialized correctly for reset as well.

Fixes: 86dbcd0f12e9 ("i40iw: add file to handle cqp calls")
Signed-off-by: Bob Sharp <Robert.O.Sharp@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Remove UDA QP from QoS list if creation fails
Ivan Barrera [Mon, 16 Oct 2017 20:46:04 +0000 (15:46 -0500)]
i40iw: Remove UDA QP from QoS list if creation fails

If User-space Direct Access (UDA) QP creation fails,
the QP entry is not removed from QoS list. Fix this
by removing QP from QoS list if create QP fails.

Fixes: 0fc2dc58896f ("i40iw: Add Quality of Service support")
Signed-off-by: Ivan Barrera <ivan.d.barrera@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Clear CQP Head/Tail during initialization
Christopher Bednarz [Mon, 16 Oct 2017 20:46:03 +0000 (15:46 -0500)]
i40iw: Clear CQP Head/Tail during initialization

Clear Control Queue Pair (CQP) Head/Tail on CQP
initialization as during an adapter reset, these
values are not reinitialized. Tail is cleared by
writing 0 to CQP's tail register. Head is cleared
by writing 0 to CQP's doorbell register.

Fixes: 86dbcd0f12e9 ("i40iw: add file to handle cqp calls")
Signed-off-by: Christopher Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Refactor queue depth calculation
Shiraz Saleem [Mon, 16 Oct 2017 20:46:02 +0000 (15:46 -0500)]
i40iw: Refactor queue depth calculation

Queue depth calculations use a mix of work requests
and actual number of bytes. Consolidate all calculations
using minimum WQE size to avoid confusion.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Reinitialize IEQ on MTU change
Shiraz Saleem [Mon, 16 Oct 2017 20:46:01 +0000 (15:46 -0500)]
i40iw: Reinitialize IEQ on MTU change

On a netdev MTU change event, the iWARP
Exception Queue (IEQ) buffers may not be
sized properly to handle the new MTU.

Reinitialize the IEQ with new MTU size on MTU
change event.

Also, add define for the max ethernet frame size
field in IEQ QP context instead of the snd_mss
define which is for iWARP QPs' MSS field.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Move ceq_valid to i40iw_sc_dev structure
Shiraz Saleem [Mon, 16 Oct 2017 20:46:00 +0000 (15:46 -0500)]
i40iw: Move ceq_valid to i40iw_sc_dev structure

Completion Event Queues are created and destroyed on
a per device basis as opposed to per User-space Direct
Access resource.

Move ceq_valid to the correct place in i40iw_sc_dev
from i40iw_puda_rsrc.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Account for IPv6 header when setting MSS
Shiraz Saleem [Mon, 16 Oct 2017 20:45:59 +0000 (15:45 -0500)]
i40iw: Account for IPv6 header when setting MSS

The IPv6 header size is not subtracted from MTU when MSS is
set for QPs.

Save MTU opposed to MSS in the vsi struct during
initialization and calculate the MSS based on IPv4 vs
IPv6 connection.

Fixes: f27b4746f378 ("i40iw: add connection management code")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Remove unused structures
Mustafa Ismail [Mon, 16 Oct 2017 20:45:58 +0000 (15:45 -0500)]
i40iw: Remove unused structures

Some structures for post SQ operation are not used.
Remove the following:

i40iw_post_send_w_inv
i40iw_post_inline_send_w_inv
send_w_sol
send_w_inv
send_w_sol_inv
inline_send_w_sol
inline_send_w_inv
inline_send_w_sol_inv

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Move exception_lan_queue to VSI structure
Mustafa Ismail [Mon, 16 Oct 2017 20:45:57 +0000 (15:45 -0500)]
i40iw: Move exception_lan_queue to VSI structure

Consolidate exception_lan_queue under VSI structure
where it belongs. Remove it from device and QP structures.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Remove unused static_rsrc from i40iw_create_qp_info
Mustafa Ismail [Mon, 16 Oct 2017 20:45:56 +0000 (15:45 -0500)]
i40iw: Remove unused static_rsrc from i40iw_create_qp_info

The field static_rsrc in i40iw_create_qp_info is unused
and not needed. Remove it.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Ignore AE source field in AEQE for some AEs
Mustafa Ismail [Mon, 16 Oct 2017 20:45:55 +0000 (15:45 -0500)]
i40iw: Ignore AE source field in AEQE for some AEs

The AE source field in Asynchronous Event Queue Entry (AEQE) is
not set by the hardware for some AEs, but the code assumes it does.
This results in incorrect processing of some AEs.
Fix this by setting ae_src to I40IW_AE_SOURCE_RSVD for those AEs.

Fixes: 86dbcd0f12e9 ("i40iw: add file to handle cqp calls")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoi40iw: Cleanup AE processing
Mustafa Ismail [Mon, 16 Oct 2017 20:45:54 +0000 (15:45 -0500)]
i40iw: Cleanup AE processing

Remove unimplemented Asynchronous Events (AE) and correct names.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge branch 'timer_setup' into for-next
Doug Ledford [Wed, 18 Oct 2017 17:12:09 +0000 (13:12 -0400)]
Merge branch 'timer_setup' into for-next

Conflicts:
drivers/infiniband/hw/cxgb4/cm.c
drivers/infiniband/hw/qib/qib_driver.c
drivers/infiniband/hw/qib/qib_mad.c

There were minor fixups needed in these files.  Just minor context diffs
due to patches from independent sources touching the same basic area.

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge branch 'for-next-early' into for-next
Doug Ledford [Wed, 18 Oct 2017 17:07:13 +0000 (13:07 -0400)]
Merge branch 'for-next-early' into for-next

The early for-next branch was based on v4.14-rc2, while the shared pull
request I got from Mellanox used a v4.14-rc4 base.  I'm making the
branch that was the shared Mellanox pull request the new for-next branch
and merging the early for-next branch into it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/mlx5: Use ARRAY_SIZE
Jérémy Lefaure [Mon, 16 Oct 2017 05:45:17 +0000 (08:45 +0300)]
IB/mlx5: Use ARRAY_SIZE

Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/core: Fix calculation of maximum RoCE MTU
Parav Pandit [Mon, 16 Oct 2017 05:45:16 +0000 (08:45 +0300)]
IB/core: Fix calculation of maximum RoCE MTU

The original code only took into consideration the largest header
possible after the IB_BTH_BYTES.  This was incorrect, as the largest
possible header size is the largest possible combination of headers we
might run into.  The new code accounts for all possible headers in the
largest possible combination and subtracts that from the MTU to make
sure that all packets will fit on the wire.

Link: https://www.spinics.net/lists/linux-rdma/msg54558.html
Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/core: Fix use workqueue without WQ_MEM_RECLAIM
Parav Pandit [Mon, 16 Oct 2017 05:45:15 +0000 (08:45 +0300)]
IB/core: Fix use workqueue without WQ_MEM_RECLAIM

The IB/core provides address resolution service and invokes callback
handler when address resolve request completes of requester in worker
thread context.

Such caller might allocate or free memory in callback handler
depending on the completion status to make further progress or to
terminate a connection. Most ULPs resolve route which involves
allocating route entry and path record elements in callback event handler.

It has been noticed that WQ_MEM_RECLAIM flag should not be used for
workers that tend to allocate memory in this [1] thread discussion.

In order to mitigate this situation, WQ_MEM_RECLAIM flag was dropped for
other such WQs in this [2] patch.

Similar problem might arise with address resolution path, though its not
yet noticed. The ib_addr workqueue is not memory reclaim path due to its
nature of invoking callback that might allocate memory or don't free any
memory under memory pressure.

[1] https://www.spinics.net/lists/linux-rdma/msg53239.html
[2] https://www.spinics.net/lists/linux-rdma/msg53416.html

Fixes: f54816261c2b ("IB/addr: Remove deprecated create_singlethread_workqueue")
Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to MAC")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/core: Fix unable to change lifespan entry for hw_counters
Parav Pandit [Mon, 16 Oct 2017 05:45:14 +0000 (08:45 +0300)]
IB/core: Fix unable to change lifespan entry for hw_counters

This patch fixes the case where 'lifespan' entry of the hw_counters
is not writable. Currently write callback is not exposed for for
the hw_counters sysfs operation. Due to this, modifying lifespan
value results into permission denied error in below example.

echo 10 > /sys/class/infiniband/mlx5_0/ports/1/hw_counters/lifespan
-bash: /sys/class/infiniband/mlx5_0/ports/1/hw_counters/lifespan:
Permission denied

This patch adds the hook to modify any attribute which implements
store() operation.

Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB: Let ib_core resolve destination mac address
Parav Pandit [Mon, 16 Oct 2017 05:45:13 +0000 (08:45 +0300)]
IB: Let ib_core resolve destination mac address

Since IB/core resolves the destination mac address for user and kernel
consumers, avoid resolving in multiple provider drivers.

Only ib_core resolves DMAC now, therefore resolve_eth_dmac is removed as
exported symbol.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/core: Introduce and use rdma_create_user_ah
Parav Pandit [Mon, 16 Oct 2017 05:45:12 +0000 (08:45 +0300)]
IB/core: Introduce and use rdma_create_user_ah

Introduce rdma_create_user_ah API which allows passing udata to
provider driver and additionally which resolves DMAC for RoCE.

ib_resolve_eth_dmac() resolves destination mac address for unicast,
multicast, link local ipv4 mapped ipv6 and ipv6 destination gid entry.
This allows all RoCE provider drivers to avoid duplicating such code.

Such change brings consistency where IB core always resolves dmac and pass
it to RoCE provider drivers for user and kernel consumers, with this
ah_attr->roce.dmac is always an input field for provider drivers.

This uniformity avoids exporting ib_resolve_eth_dmac symbol to providers
or other modules. Therefore its removed as exported symbol at later in
the patch series.

Now uverbs and umad both makes use of rdma_create_user_ah API which
fixes the issue where umad has invalid DMAC for address.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/cxgb4: Convert timers to use timer_setup()
Kees Cook [Mon, 16 Oct 2017 22:52:31 +0000 (15:52 -0700)]
RDMA/cxgb4: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Also removes an unused timer and
drops a redundant initialization.

Cc: Steve Wise <swise@chelsio.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/i40iw: Convert timers to use timer_setup() (part 2)
Kees Cook [Tue, 17 Oct 2017 18:37:54 +0000 (11:37 -0700)]
RDMA/i40iw: Convert timers to use timer_setup() (part 2)

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

This includes the remaining timers missed in the earlier i40iw patch.

Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Shiraz Saleem <shiraz.saleem@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/cxgb3: Convert timers to use timer_setup()
Kees Cook [Mon, 16 Oct 2017 22:54:11 +0000 (15:54 -0700)]
RDMA/cxgb3: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Also removes an unused timer.

Cc: Steve Wise <swise@chelsio.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Convert timers to use timer_setup()
Kees Cook [Mon, 16 Oct 2017 22:51:54 +0000 (15:51 -0700)]
IB/hfi1: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Switches test of .data field to
.function, since .data will be going away.

Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/rdmavt: Convert timers to use timer_setup()
Kees Cook [Mon, 16 Oct 2017 22:51:13 +0000 (15:51 -0700)]
IB/rdmavt: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

setup_timer() was already being called before the open-coded init_timer()
and .data assignment. These are removed as well.

Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge tag 'mlx5-updates-2017-10-11' of git://git.kernel.org/pub/scm/linux/kernel...
Doug Ledford [Wed, 18 Oct 2017 15:03:35 +0000 (11:03 -0400)]
Merge tag 'mlx5-updates-2017-10-11' of git://git./linux/kernel/git/mellanox/linux into mlnx-shared

mlx5-updates-2017-10-11: IPoIB Muli Pkey support

This series provides the support for IPoIB Multi Pkey.  InfiniBand Pkeys
are the equivalent of Ethernet vlans.  Currently IPoIB device driver
supports only default Pkey and IPoIB Pkey child interfaces are not
supported with IPoIB offloads mode, this series will add the support for
that by allowing creating mlx5 multiple IPoIB netdevices with a
non-default Pkey.

mlx5 IPoIB Pkey child interface is smaller version of mlx5i IPoIB
interfaces and shares most of its resources with the parent IPoIB
interface, namely RX steering and ring queue resources.

The only mlx5 resources a child Pkey interface will be creating are the
TX rings, since they should be assigned to a specific Pkey.

mlx5i Pkey netdev is implemented via new mlx5e netdev profile
implemented in mlx5/core/ipoib/ipoib_vlan.c.

The series starts with a refactoring of mlx5e PTP and mlx5 clock
implementation to move the code to be part of mlx5 core rather than
mlx5e netdevice, in order to make mlx5 clock and PTP registration part
of the core to be shared with mlx5e master Ethernet netdev/IPoIB parent
netdev and mlx5_ib in the near future.

Add the support for attaching multiple underlay QPs for the different
Pkeys in mlx5 core RX steering.

Add Pkey index to rdma_netdev to add the ability to set PKEY index to
lower IPoIB offload netdev.

Use hash-table to map between DQPN (Destination QP number) to child
netdev for the IPoIB parent netdev to forward RX packets to the
corresponding child Pkey netdev, since the RX rings are shared.

The reset of the series adds the ipoib child Pkey: mlx5e netdev profile,
netdev nods implementation and minimal set of ethtool callbacks.

Thanks,
Saeed.

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srp: Make CM timeout dependent on subnet timeout
Bart Van Assche [Wed, 11 Oct 2017 17:27:29 +0000 (10:27 -0700)]
IB/srp: Make CM timeout dependent on subnet timeout

For small networks it is safe to reduce the subnet timeout from
its default value (18 for opensm) to 16. Make the SRP CM timeout
dependent on the subnet timeout such that decreasing the subnet
timeout also causes SRP failover and failback to occur faster.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srp: Cache global rkey
Bart Van Assche [Wed, 11 Oct 2017 17:27:28 +0000 (10:27 -0700)]
IB/srp: Cache global rkey

This is a micro-optimization for the hot path.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srp: Remove second argument of srp_destroy_qp()
Bart Van Assche [Wed, 11 Oct 2017 17:27:27 +0000 (10:27 -0700)]
IB/srp: Remove second argument of srp_destroy_qp()

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srp: Avoid that a cable pull can trigger a kernel crash
Bart Van Assche [Wed, 11 Oct 2017 17:27:26 +0000 (10:27 -0700)]
IB/srp: Avoid that a cable pull can trigger a kernel crash

This patch fixes the following kernel crash:

general protection fault: 0000 [#1] PREEMPT SMP
Workqueue: ib_mad2 timeout_sends [ib_core]
Call Trace:
 ib_sa_path_rec_callback+0x1c4/0x1d0 [ib_core]
 send_handler+0xb2/0xd0 [ib_core]
 timeout_sends+0x14d/0x220 [ib_core]
 process_one_work+0x200/0x630
 worker_thread+0x4e/0x3b0
 kthread+0x113/0x150

Fixes: commit aef9ec39c47f ("IB: Add SCSI RDMA Protocol (SRP) initiator")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Change default behavior from using SRQ to using RC
Bart Van Assche [Wed, 11 Oct 2017 17:27:25 +0000 (10:27 -0700)]
IB/srpt: Change default behavior from using SRQ to using RC

Although in the RC mode more resources are needed that mode has three
advantages over SRQ:
- It works with all RDMA adapters, even those that do not support
  SRQ.
- Posting WRs and polling WCs does not trigger lock contention
  because only one thread at a time accesses a WR or WC queue in
  non-SRQ mode.
- The end-to-end flow control mechanism is used.

>From the IB spec:

    C9-150.2.1: For QPs that are not associated with an SRQ, each HCA
    receive queue shall generate end-to-end flow control credits. If
    a QP is associated with an SRQ, the HCA receive queue shall not
    generate end-to-end flow control credits.

Add new configfs attributes that allow to configure which mode to use
(/sys/kernel/config/target/srpt/$GUID/$GUID/attrib/use_srq). Note:
only the attribute for port 1 is relevant on multi-port adapters.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Cache global L_Key
Bart Van Assche [Wed, 11 Oct 2017 17:27:24 +0000 (10:27 -0700)]
IB/srpt: Cache global L_Key

This patch is a micro-optimization for the hot path.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Limit the send and receive queue sizes to what the HCA supports
Bart Van Assche [Wed, 11 Oct 2017 17:27:23 +0000 (10:27 -0700)]
IB/srpt: Limit the send and receive queue sizes to what the HCA supports

Additionally, correct the comment about ch->rq_size.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/srpt: Do not accept invalid initiator port names
Bart Van Assche [Wed, 11 Oct 2017 17:27:22 +0000 (10:27 -0700)]
IB/srpt: Do not accept invalid initiator port names

Make srpt_parse_i_port_id() return a negative value if hex2bin()
fails.

Fixes: commit a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/uverbs: Make the code in ib_uverbs_cmd_verbs() less confusing
Bart Van Assche [Mon, 16 Oct 2017 15:47:45 +0000 (08:47 -0700)]
RDMA/uverbs: Make the code in ib_uverbs_cmd_verbs() less confusing

This patch reduces the number of #ifdefs and also avoids that
smatch reports the following:

drivers/infiniband/core/uverbs_ioctl.c:276: ib_uverbs_cmd_verbs() warn: if statement not indented
drivers/infiniband/core/uverbs_ioctl.c:280: ib_uverbs_cmd_verbs() warn: possible memory leak of 'ctx'
drivers/infiniband/core/uverbs_ioctl.c:315: ib_uverbs_cmd_verbs() warn: if statement not indented

References: commit fac9658cabb9 ("IB/core: Add new ioctl interface")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agobnxt_re: Make room for mapping beyond 32 entries
Somnath Kotur [Tue, 17 Oct 2017 08:31:35 +0000 (14:01 +0530)]
bnxt_re: Make room for mapping beyond 32 entries

Latest chip requires indexes 32 to 47 be used for the internal HW block
that manages queue mapping.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agobnxt_re: Fix incorrect usage of test_bit()
Somnath Kotur [Fri, 13 Oct 2017 06:08:00 +0000 (11:38 +0530)]
bnxt_re: Fix incorrect usage of test_bit()

test_bit() takes a bit number while the 'flags' field in
struct bnxt_qplib_rcfw was using actual BIT position converted
values.
Fix this by assigning bit numbers and use consistent APIs
all the flag values.
Also logging a message in case of failure.

Thanks to Dan Carpenter for pointing this out.

Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Add MODULE_FIRMWARE statements
Thomas Bogendoerfer [Wed, 11 Oct 2017 12:41:34 +0000 (14:41 +0200)]
IB/hfi1: Add MODULE_FIRMWARE statements

Provide information about used firmware files via modinfo.

Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoRDMA/hns: fix spelling mistake: "Reseved" -> "Reserved"
Colin Ian King [Tue, 10 Oct 2017 15:14:51 +0000 (16:14 +0100)]
RDMA/hns: fix spelling mistake: "Reseved" -> "Reserved"

Trivial fix to spelling mistake in dev_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge branch 'hfi1' into k.o/for-next
Doug Ledford [Wed, 18 Oct 2017 14:15:14 +0000 (10:15 -0400)]
Merge branch 'hfi1' into k.o/for-next

Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/rdmavt: Don't wait for resources in QP reset
Alex Estrin [Mon, 9 Oct 2017 19:38:33 +0000 (12:38 -0700)]
IB/rdmavt: Don't wait for resources in QP reset

Per the IBTA spec, QP destroy shall fail if the QP is attached
to multicast groups, although the spec is silent on modify_qp
to reset state. It implies that ULP must deregister QP from
all mcast groups for destroy to succeed.
The faulty patch "IB/ipoib: Update broadcast object if PKey value
was changed in index 0" exposed two issues in rdmavt:
1. Rvt QP reset waits for qp references to go to zero.
This will hang if QP is attached to multicast groups.
2. The mcast group detach will fail for a QP in reset state
therefore preventing ULP from correcting the issue.
This patch moves the reference count wait to the the destroy QP
path and allows a QP mcast detach to work in the reset state.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Set hdr_type when tx req is allocated
Kaike Wan [Mon, 9 Oct 2017 19:38:26 +0000 (12:38 -0700)]
IB/hfi1: Set hdr_type when tx req is allocated

Setting the protocol type should be part of initializing the tx request.
For UC and RC, the current protocol type is part of the qp priv structure.
For ud requests, it needs to be adjusted dynamically, based on the AV
posted with the WQE. This patch will simplify the initialization of the
tx request.

Fixes: 5b6cabb0db77 ("IB/hfi1: Add 16B RC/UC support")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Eliminate allocation while atomic
Don Hiatt [Mon, 9 Oct 2017 19:38:19 +0000 (12:38 -0700)]
IB/hfi1: Eliminate allocation while atomic

The PIO trailing buffer was being dynamically allocated
but the kcalloc return value was not being checked. Further,
the GFP_KERNEL was being used even though the send engine
might be called with interrupts disabled.

Since the maximum size of the trailing buffer is only 12
bytes (CRC = 4, LT = 1, Pad = 0 to 7 bytes) just statically
allocate the buffer, remove the alloc entirely and share it
with the SDMA engine by making it global.

Reported-by: Leon Romanovsky <leon@kernel.org>
Fixes: 566d53a82644 ("IB/hfi1: Enhance PIO/SDMA send for 16B")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Mask out A bit from psn trace
Don Hiatt [Mon, 9 Oct 2017 19:38:12 +0000 (12:38 -0700)]
IB/hfi1: Mask out A bit from psn trace

The trace logic prior to the fixes below used to mask the
A bit from the psn. It now mistakenly displays the A bit,
which is already displayed separately.

Fix by adding the appropriate mask to the psn tracing.

Fixes: 228d2af1b723 ("IB/hfi1: Separate input/output header tracing")
Fixes: 863cf89d472f ("IB/hfi1: Add 16B trace support")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Correct unnecessary acquisition of HW mutex
Grzegorz Morys [Mon, 9 Oct 2017 19:38:04 +0000 (12:38 -0700)]
IB/hfi1: Correct unnecessary acquisition of HW mutex

Avoid acquiring already acquired hardware mutex and releasing
the unacquired one as these are redundant operations.
Add printouts for such situations to help detect potential errors
within the driver.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Grzegorz Morys <grzegorz.morys@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Allow meta version 4 for platform configuration
Jakub Byczkowski [Mon, 9 Oct 2017 19:37:56 +0000 (12:37 -0700)]
IB/hfi1: Allow meta version 4 for platform configuration

Parsing of platform configuration format 4 will fail on meta
version check. Allow meta version 4 during parsing.

Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>