platform/kernel/linux-rpi.git
8 years agoIB/mlx5: Add create and destroy functionality for Raw Packet QP
majd@mellanox.com [Thu, 14 Jan 2016 17:13:04 +0000 (19:13 +0200)]
IB/mlx5: Add create and destroy functionality for Raw Packet QP

This patch adds support for Raw Packet QP for the mlx5 device.

Raw Packet QP, unlike other QP types, has no matching mlx5_core_qp
object but rather it is built of RQ/SQ/TIR/TIS/TD mlx5_core object.

Since the SQ and RQ work-queue (WQ) buffers are not contiguous like
other QPs, we allocate separate buffers in the user-space and pass
the address of each one of them separately to the kernel.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
majd@mellanox.com [Thu, 14 Jan 2016 17:13:03 +0000 (19:13 +0200)]
IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types

Extract specific IB QP fields to mlx5_ib_qp_trans structure.
The mlx5_core QP object resides in mlx5_ib_qp_base, which all QP types
inherit from. When we need to find mlx5_ib_qp using mlx5_core QP
(event handling and co), we use a pointer that resides in
mlx5_ib_qp_base.

In addition, we delete all redundant fields that weren't used anywhere
in the code:
-doorbell_qpn
-sq_max_wqes_per_wr
-sq_spare_wqes

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Allocate a Transport Domain for each ucontext
majd@mellanox.com [Thu, 14 Jan 2016 17:13:02 +0000 (19:13 +0200)]
IB/mlx5: Allocate a Transport Domain for each ucontext

Transport Domain groups several TIS and TIR object. By grouping
these object, it defines wheather local loopback packets that
are sent from the TIS objects in the group are received by the
TIR objects in the same group.

Allocate a Transport Domain(TD) for each user context to be used
in the future by Raw Packet QP for Self-Loopback Control.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx5_core: Warn on unsupported events of QP/RQ/SQ
majd@mellanox.com [Thu, 14 Jan 2016 17:13:01 +0000 (19:13 +0200)]
net/mlx5_core: Warn on unsupported events of QP/RQ/SQ

When an event arrives on QP/RQ/SQ, check whether it's supported,
and print a warning message otherwise.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx5_core: Add RQ and SQ event handling
majd@mellanox.com [Thu, 14 Jan 2016 17:13:00 +0000 (19:13 +0200)]
net/mlx5_core: Add RQ and SQ event handling

RQ/SQ will be used to implement IB verbs QPs, so the IB QP affiliated
events are affiliated also with SQs and RQs.

Since SQ, RQ and QP resource numbers do not share the same name
space, a queue type field was added to the event data to specify
the SW object that the event is affiliated with.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx5_core: Export transport objects
majd@mellanox.com [Thu, 14 Jan 2016 17:12:59 +0000 (19:12 +0200)]
net/mlx5_core: Export transport objects

To be used by mlx5_ib in the following patches for implementing
RAW PACKET QP.

Add mlx5_core_ prefix to alloc and delloc transport_domain since
they are exposed now.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Expose CQE version to user-space
Haggai Abramovsky [Thu, 14 Jan 2016 17:12:58 +0000 (19:12 +0200)]
IB/mlx5: Expose CQE version to user-space

Per user context, work with CQE version that both the user-space
and the kernel support. Report this CQE version via the response of
the alloc_ucontext command.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Add CQE version 1 support to user QPs and SRQs
Haggai Abramovsky [Thu, 14 Jan 2016 17:12:57 +0000 (19:12 +0200)]
IB/mlx5: Add CQE version 1 support to user QPs and SRQs

Enforce working with CQE version 1 when the user supports CQE
version 1 and asked to work this way.

If the user still works with CQE version 0, then use the default
CQE version to tell the Firmware that the user still works in the
older mode.

After this patch, the kernel still reports CQE version 0.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
Haggai Abramovsky [Thu, 14 Jan 2016 17:12:56 +0000 (19:12 +0200)]
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext

The wrong buffer size was passed to ib_is_udata_cleared.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/sa: Fix netlink local service GFP crash
Kaike Wan [Thu, 21 Jan 2016 13:41:31 +0000 (08:41 -0500)]
IB/sa: Fix netlink local service GFP crash

The rdma netlink local service registers a handler to handle RESOLVE
response and another handler to handle SET_TIMEOUT request. The first
thing these handlers do is to call netlink_capable() to check the
access right of the received skb to make sure that the sender has root
access. Under normal conditions, such responses and requests will be
directly forwarded to the handlers without going through the netlink_dump
pathway (see ibnl_rcv_msg() in drivers/infiniband/core/netlink.c).
However, a user application could send a RESOLVE request (not response)
to the local service, which will fall into the netlink_dump pathway,
where a new skb will be created without initializing the control block.
This new skb will be eventually forwarded to the local service RESOLVE
response handler. Unfortunately, netlink_capable() will cause general
protection fault if the skb's control block is not initialized. This
patch will address the problem by checking the skb first.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srpt: Remove redundant wc array
Sagi Grimberg [Tue, 12 Jan 2016 11:19:50 +0000 (13:19 +0200)]
IB/srpt: Remove redundant wc array

No usage after the conversion to the new CQ API.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib: Improve ipoib UD performance
Mike Marciniszyn [Thu, 24 Dec 2015 16:19:23 +0000 (11:19 -0500)]
IB/qib: Improve ipoib UD performance

Based on profiling, UD performance drops in case of processes
in a single client due to excess context switches when
the progress workqueue is scheduled.

This is solved by modifying the heuristic to select the
direct progress instead of the scheduling progress via
the workqueue when UD-like situations are detected in
the heuristic.

Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Advertise RoCE v2 support
Matan Barak [Thu, 14 Jan 2016 15:50:43 +0000 (17:50 +0200)]
IB/mlx4: Advertise RoCE v2 support

Advertise RoCE v2 support in port_immutable attributes according to
the hardware's capabilities. This enables the verbs stack to use
RoCE v2 mode.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Create and use another QP1 for RoCEv2
Moni Shoua [Thu, 14 Jan 2016 15:50:42 +0000 (17:50 +0200)]
IB/mlx4: Create and use another QP1 for RoCEv2

The mlx4 driver uses a special QP to implement the GSI QP. This kind
of QP allows to build the InfiniBand headers in software.
When mlx4 hardware builds the packet, it calculates the ICRC and puts
it at the end of the payload. However, this ICRC calculation depends
on the QP configuration, which is determined when the QP is modified
(roce_mode during INIT->RTR).
When receiving a packet, the ICRC verification doesn't depend on this
configuration.
Therefore, using two GSI QPs for send (one for each RoCE version) and
one GSI QP for receive are required.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
Moni Shoua [Thu, 14 Jan 2016 15:50:41 +0000 (17:50 +0200)]
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers

RoCEv2 packets are sent over IP/UDP protocols.
The mlx4 driver uses a type of RAW QP to send packets for QP1 and
therefore needs to build the network headers below BTH in software.

This patch adds option to build QP1 packets with IP and UDP headers if
RoCEv2 is requested.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Enable RoCE v2 when the IB device is added
Moni Shoua [Thu, 14 Jan 2016 15:50:40 +0000 (17:50 +0200)]
IB/mlx4: Enable RoCE v2 when the IB device is added

If the hardware supports RoCE v2, we configure the hardware UDP
port according to the RoCE v2 Annex when mlx4_ib device is added.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Support modify_qp for RoCE v2
Moni Shoua [Thu, 14 Jan 2016 15:50:39 +0000 (17:50 +0200)]
IB/mlx4: Support modify_qp for RoCE v2

In order to support modify_qp for RoCE v2, we need to set
the gid_type in the QP context.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add definition for the standard RoCE V2 UDP port
Moni Shoua [Thu, 14 Jan 2016 15:50:38 +0000 (17:50 +0200)]
IB/core: Add definition for the standard RoCE V2 UDP port

This will be used in hardware device driver when building QP or AH
contexts.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx4_core: Add support for RoCE v2 entropy
Moni Shoua [Thu, 14 Jan 2016 15:50:37 +0000 (17:50 +0200)]
net/mlx4_core: Add support for RoCE v2 entropy

In RoCE v2 we need to choose a source UDP port, we do so by using
entropy over the source and dest QPNs.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx4_core: Add support for configuring RoCE v2 UDP port
Moni Shoua [Thu, 14 Jan 2016 15:50:36 +0000 (17:50 +0200)]
net/mlx4_core: Add support for configuring RoCE v2 UDP port

In order to support RoCE v2, the hardware needs to be configured
to classify certain UDP packets as RoCE v2 packets and pass it
through its RoCE pipeline. This patch enables configuring this
UDP port.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Add support for setting RoCEv2 gids in hardware
Moni Shoua [Thu, 14 Jan 2016 15:50:35 +0000 (17:50 +0200)]
IB/mlx4: Add support for setting RoCEv2 gids in hardware

To tell hardware about a gid with type RoCEv2, software needs a new
modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can
replace the old method, MLX4_SET_PORT_GID_TABLE, for  RoCEv1 gids.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx4_core: Configure mlx4 hardware for mixed RoCE v1/v2 modes
Moni Shoua [Thu, 14 Jan 2016 15:50:34 +0000 (17:50 +0200)]
net/mlx4_core: Configure mlx4 hardware for mixed RoCE v1/v2 modes

If the hardware supports RoCE v2 (mixed with RoCE v1) mode, we enable
it. This is necessary in order to support RoCE v2.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Add gid_type to GID properties
Moni Shoua [Thu, 14 Jan 2016 15:50:33 +0000 (17:50 +0200)]
IB/mlx4: Add gid_type to GID properties

IB core driver adds a property of type to struct ib_gid_attr.
The mlx4 driver should take that in consideration when modifying or
querying the hardware gid table.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx4: Query RoCE support
Moni Shoua [Thu, 14 Jan 2016 15:50:32 +0000 (17:50 +0200)]
net/mlx4: Query RoCE support

Query the RoCE support from firmware using the appropriate firmware
commands. Downstream patches will read these capabilities and act
accordingly.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvc_rdma: use local_dma_lkey
Christoph Hellwig [Fri, 8 Jan 2016 07:53:41 +0000 (23:53 -0800)]
svc_rdma: use local_dma_lkey

We now alwasy have a per-PD local_dma_lkey available.  Make use of that
fact in svc_rdma and stop registering our own MR.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Add class for RDMA backwards direction transport
Chuck Lever [Thu, 7 Jan 2016 19:50:10 +0000 (14:50 -0500)]
svcrdma: Add class for RDMA backwards direction transport

To support the server-side of an NFSv4.1 backchannel on RDMA
connections, add a transport class that enables backward
direction messages on an existing forward channel connection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Define maximum number of backchannel requests
Chuck Lever [Thu, 7 Jan 2016 19:50:02 +0000 (14:50 -0500)]
svcrdma: Define maximum number of backchannel requests

Extra resources for handling backchannel requests have to be
pre-allocated when a transport instance is created. Set up
additional fields in svcxprt_rdma to track these resources.

The max_requests fields are elements of the RPC-over-RDMA
protocol, so they should be u32. To ensure that unsigned
arithmetic is used everywhere, some other fields in the
svcxprt_rdma struct are updated.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Make map_xdr non-static
Chuck Lever [Thu, 7 Jan 2016 19:49:53 +0000 (14:49 -0500)]
svcrdma: Make map_xdr non-static

Pre-requisite to use map_xdr in the backchannel code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Remove last two __GFP_NOFAIL call sites
Chuck Lever [Thu, 7 Jan 2016 19:49:45 +0000 (14:49 -0500)]
svcrdma: Remove last two __GFP_NOFAIL call sites

Clean up.

These functions can otherwise fail, so check for page allocation
failures too.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Add gfp flags to svc_rdma_post_recv()
Chuck Lever [Thu, 7 Jan 2016 19:49:37 +0000 (14:49 -0500)]
svcrdma: Add gfp flags to svc_rdma_post_recv()

svc_rdma_post_recv() allocates pages for receive buffers on-demand.
It uses GFP_KERNEL so the allocator tries hard, and may sleep. But
I'm about to add a call to svc_rdma_post_recv() from a function
that may not sleep.

Since all svc_rdma_post_recv() call sites can tolerate its failure,
allow it to fail if the page allocator returns nothing. Longer term,
receive buffers, being a finite resource per-connection, should be
pre-allocated and re-used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Remove unused req_map and ctxt kmem_caches
Chuck Lever [Thu, 7 Jan 2016 19:49:28 +0000 (14:49 -0500)]
svcrdma: Remove unused req_map and ctxt kmem_caches

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Improve allocation of struct svc_rdma_req_map
Chuck Lever [Thu, 7 Jan 2016 19:49:20 +0000 (14:49 -0500)]
svcrdma: Improve allocation of struct svc_rdma_req_map

To ensure this allocation cannot fail and will not sleep,
pre-allocate the req_map structures per-connection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Improve allocation of struct svc_rdma_op_ctxt
Chuck Lever [Thu, 7 Jan 2016 19:49:12 +0000 (14:49 -0500)]
svcrdma: Improve allocation of struct svc_rdma_op_ctxt

When the maximum payload size of NFS READ and WRITE was increased
by commit cc9a903d915c ("svcrdma: Change maximum server payload back
to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
increased to over 6KB (on x86_64). That makes allocating one of
these from a kmem_cache more likely to fail in situations when
system memory is exhausted.

Since I'm about to add a caller where this allocation must always
work _and_ it cannot sleep, pre-allocate ctxts for each connection.

Another motivation for this change is that NFSv4.x servers are
required by specification not to drop NFS requests. Pre-allocating
memory resources reduces the likelihood of a drop.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Clean up process_context()
Chuck Lever [Thu, 7 Jan 2016 19:49:03 +0000 (14:49 -0500)]
svcrdma: Clean up process_context()

Be sure the completed ctxt is put in every path.

The xprt enqueue can take a while, so put the completed ctxt back
in circulation _before_ enqueuing the xprt.

Remove/disable debugging.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agosvcrdma: Clean up rdma_create_xprt()
Chuck Lever [Thu, 7 Jan 2016 19:48:55 +0000 (14:48 -0500)]
svcrdma: Clean up rdma_create_xprt()

kzalloc is used here, so setting the atomic fields to zero is
unnecessary. sc_ord is set again in handle_connect_req. The other
fields are re-initialized in svc_rdma_accept().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Use hop-limit from IP stack for RoCE
Matan Barak [Mon, 4 Jan 2016 08:49:54 +0000 (10:49 +0200)]
IB/core: Use hop-limit from IP stack for RoCE

Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
RoCE. Fixing that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as
hop limit values.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Rename rdma_addr_find_dmac_by_grh
Matan Barak [Mon, 4 Jan 2016 08:49:53 +0000 (10:49 +0200)]
IB/core: Rename rdma_addr_find_dmac_by_grh

rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index and
downsteram patch will also add hop_limit as an output parameter,
thus we rename it to rdma_addr_find_l2_eth_by_grh.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cm: Fix a recently introduced deadlock
Bart Van Assche [Fri, 1 Jan 2016 12:17:46 +0000 (13:17 +0100)]
IB/cm: Fix a recently introduced deadlock

ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
that can be locked from inside an interrupt handler. Hence do not
enable interrupts inside cm_enter_timewait() if called with interrupts
disabled.

This patch fixes e.g. the following deadlock:
Acked-by: Erez Shitrit <erezsh@mellanox.com>
=================================
[ INFO: inconsistent lock state ]
4.4.0-rc7+ #1 Tainted: G            E
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x
74/0x1b0 [ib_cm]
{HARDIRQ-ON-W} state was registered at:
  [<ffffffff810a3c11>] mark_held_locks+0x71/0x90
  [<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
  [<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
  [<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
  [<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
  [<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
  [<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
  [<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
  [<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
  [<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
  [<ffffffff8107184d>] process_one_work+0x1bd/0x460
  [<ffffffff81073148>] worker_thread+0x118/0x420
  [<ffffffff81078454>] kthread+0xe4/0x100
  [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
irq event stamp: 1672286
hardirqs last  enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
softirqs last  enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cm_id_priv->lock)->rlock);
  <Interrupt>
    lock(&(&cm_id_priv->lock)->rlock);

 *** DEADLOCK ***

no locks held by swapper/8/0.

stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E   4.4.0-rc7+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
 ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
 0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
 ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
Call Trace:
 <IRQ>  [<ffffffff81251c1b>] dump_stack+0x4f/0x74
 [<ffffffff810a32f4>] print_usage_bug+0x184/0x190
 [<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
 [<ffffffff810a3995>] mark_lock+0x115/0x1b0
 [<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
 [<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
 [<ffffffff810a53c2>] lock_acquire+0x62/0x80
 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
 [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
 [<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
 [<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
 [<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
 [<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
 [<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
 [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
 [<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
 [<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
 [<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
 [<ffffffff81007f3d>] handle_irq+0x5d/0x130
 [<ffffffff810071fd>] do_IRQ+0x6d/0x130
 [<ffffffff8151d309>] common_interrupt+0x89/0x89
 <EOI>  [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
 [<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
 [<ffffffff810990d6>] call_cpuidle+0x36/0x60
 [<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
 [<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
 [<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
 [<ffffffff8103c443>] start_secondary+0x83/0x90

Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/srpt: Fix the RDMA completion handlers
Bart Van Assche [Thu, 31 Dec 2015 08:56:43 +0000 (09:56 +0100)]
IB/srpt: Fix the RDMA completion handlers

Avoid that the following kernel crash is triggered when processing
an RDMA completion:

BUG: unable to handle kernel paging request at 0000000100000198
IP: [<ffffffff810a4ea2>] __lock_acquire+0xa2/0x560
Call Trace:
 [<ffffffff810a53c2>] lock_acquire+0x62/0x80
 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
 [<ffffffffa04fd437>] srpt_rdma_read_done+0x57/0x120 [ib_srpt]
 [<ffffffffa0144dd3>] __ib_process_cq+0x43/0xc0 [ib_core]
 [<ffffffffa0145115>] ib_cq_poll_work+0x25/0x70 [ib_core]
 [<ffffffff8107184d>] process_one_work+0x1bd/0x460
 [<ffffffff81073148>] worker_thread+0x118/0x420
 [<ffffffff81078454>] kthread+0xe4/0x100
 [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70

Fixes: commit 59fae4deaad3 ("IB/srpt: chain RDMA READ/WRITE requests").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoirq_poll: Fix irq_poll_sched()
Bart Van Assche [Thu, 31 Dec 2015 08:56:03 +0000 (09:56 +0100)]
irq_poll: Fix irq_poll_sched()

The IRQ_POLL_F_SCHED bit is set as long as polling is ongoing.
This means that irq_poll_sched() must proceed if this bit has
not yet been set.

Fixes: commit ea51190c0315 ("irq_poll: fold irq_poll_sched_prep into irq_poll_sched").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Fix dereference before check
Matan Barak [Wed, 30 Dec 2015 14:14:18 +0000 (16:14 +0200)]
IB/core: Fix dereference before check

Sparse complains about dereference before check. Fixing this by
moving the check before the dereference.

Fixes: 200298326b27 ('IB/core: Validate route when we init ah')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Eliminate sparse false context imbalance warning
Matan Barak [Wed, 30 Dec 2015 14:14:17 +0000 (16:14 +0200)]
IB/core: Eliminate sparse false context imbalance warning

When write_gid function needs to do a sleep-able operation, it unlocks
table->rwlock and then relocks it. Sparse complains about context
imbalance.

This is safe as write_gid is always called with table->rwlock.
write_gid protects from simultaneous writes to this GID entry
by setting the GID_TABLE_ENTRY_INVALID flag.

Fixes: 9c584f049596 ('IB/core: Change per-entry lock in RoCE GID table to
     one lock')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling
Hal Rosenstock [Tue, 29 Dec 2015 10:43:43 +0000 (05:43 -0500)]
IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling

Port number is not part of ClassPortInfo attribute but is
still needed as a parameter when invoking process_mad.

To properly handle this attribute, port_num is added as a
parameter to get_counter_table and get_perf_mad was changed
not to store port_num in the attribute itself when it's
querying the ClassPortInfo attribute.

This handles issue pointed out by Matan Barak <matanb@dev.mellanox.co.il>

Fixes: 145d9c541032 ('IB/core: Display extended counter set if available')

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Remove set-but-not-used variable from ib_sg_to_pages()
Bart Van Assche [Tue, 29 Dec 2015 09:45:03 +0000 (10:45 +0100)]
IB/core: Remove set-but-not-used variable from ib_sg_to_pages()

Detected this by building the IB core with W=1. See also patch
"IB core: Fix ib_sg_to_pages()" (commit 8f5ba10ed40a).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leon.romanovsky@mellanox.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Fix passing casted pointer in mlx5_query_port_roce
Leon Romanovsky [Sat, 9 Jan 2016 11:06:25 +0000 (13:06 +0200)]
IB/mlx5: Fix passing casted pointer in mlx5_query_port_roce

Fix static checker warning:
        drivers/infiniband/hw/mlx5/main.c:149 mlx5_query_port_roce()
        warn: passing casted pointer '&props->qkey_viol_cntr' to
'mlx5_query_nic_vport_qkey_viol_cntr()' 32 vs 16.

Fixes: 3f89a643eb29 ("IB/mlx5: Extend query_device/port to support RoCE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mad: use CQ abstraction
Christoph Hellwig [Wed, 6 Jan 2016 06:46:12 +0000 (22:46 -0800)]
IB/mad: use CQ abstraction

Remove the local workqueue to process mad completions and use the CQ API
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mad: pass ib_mad_send_buf explicitly to the recv_handler
Christoph Hellwig [Mon, 4 Jan 2016 13:15:58 +0000 (14:15 +0100)]
IB/mad: pass ib_mad_send_buf explicitly to the recv_handler

Stop abusing wr_id and just pass the parameter explicitly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoinfiniband: Replace memset with eth_zero_addr
Lucas Tanure [Tue, 19 Jan 2016 14:06:30 +0000 (12:06 -0200)]
infiniband: Replace memset with eth_zero_addr

Use eth_zero_addr to assign the zero address to the given address
array instead of memset when second argument is address of zero.

Signed-off-by: Lucas Tanure <tanure@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Delete locally redefined variable
Leon Romanovsky [Tue, 19 Jan 2016 09:11:24 +0000 (11:11 +0200)]
IB/mlx5: Delete locally redefined variable

Fix the following sparse warning:
drivers/infiniband/hw/mlx5/main.c:1061:29: warning: symbol 'pfn' shadows
an earlier one
drivers/infiniband/hw/mlx5/main.c:1030:21: originally declared here

Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to user-space')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx4: Remove unused macro
Moni Shoua [Thu, 14 Jan 2016 15:48:07 +0000 (17:48 +0200)]
net/mlx4: Remove unused macro

The macro mlx4_foreach_non_ib_transport_port() is not used anywhere. Remove it.

Fixes: aa9a2d51a3e7 ("mlx4: Activate RoCE/SRIOV")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Take source mac from AH instead from the port
Moni Shoua [Thu, 14 Jan 2016 15:47:38 +0000 (17:47 +0200)]
IB/mlx4: Take source mac from AH instead from the port

In commit dbf727de7440 ("IB/core: Use GID table in AH creation and dmac
resolution") we copy source mac to mlx4_ah from the attributes of
gid at ib_ah_attr.grh.sgid_index. Now we can use it.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Initialize hop_limit when creating address handle
Matan Barak [Thu, 14 Jan 2016 15:47:02 +0000 (17:47 +0200)]
IB/mlx4: Initialize hop_limit when creating address handle

Hop limit value wasn't copied from attributes  when ah was created.
This may influence packets for unconnected services to get dropped in
routers when endpoints are not in the same subnet.

Fixes: fa417f7b520e ("IB/mlx4: Add support for IBoE")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Expose correct maximum number of CQE capacity
Leon Romanovsky [Thu, 14 Jan 2016 06:11:40 +0000 (08:11 +0200)]
IB/mlx5: Expose correct maximum number of CQE capacity

Maximum number of EQE capacity per CQ was mistakenly exposed
as CQE. Fix that.

Fixes: 938fe83c8dcb ("net/mlx5_core: New device capabilities handling")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiw_cxgb4: Take clip reference before starting IPv6 listen
Hariprasad S [Wed, 13 Jan 2016 04:33:14 +0000 (10:03 +0530)]
iw_cxgb4: Take clip reference before starting IPv6 listen

The h/w is designed in such a way that, if you do anything IPv6
related, a valid clip entry must be there. So take clip reference
before creating IPv6 listening servers, and then if we fail to
create server, release the clip entry.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiw_cxgb4: Fixes GW-Basic labels to meaningful error names
Hariprasad S [Tue, 12 Jan 2016 11:03:22 +0000 (16:33 +0530)]
iw_cxgb4: Fixes GW-Basic labels to meaningful error names

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiw_cxgb4: Fixes static checker warning in c4iw_rdev_open()
Hariprasad S [Tue, 12 Jan 2016 11:03:21 +0000 (16:33 +0530)]
iw_cxgb4: Fixes static checker warning in c4iw_rdev_open()

Commit c5dfb000b904 ("iw_cxgb4: Pass qid range to user space driver")
from Dec 11, 2015, leads to the following static checker warning:

drivers/infiniband/hw/cxgb4/device.c:857 c4iw_rdev_open()
        warn: variable dereferenced before check 'rdev->status_page'

Also we weren't deallocating ocqp pool in error path when failed to
allocate status page. Fixing it too.

Fixes: c5dfb000b904 ("iw_cxgb4: Pass qid range to user space driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: allocating too much memory in make_cma_ports()
Dan Carpenter [Tue, 12 Jan 2016 09:29:21 +0000 (12:29 +0300)]
IB/cma: allocating too much memory in make_cma_ports()

The issue here is that there is a cut and paste bug.  When we allocate
cma_dev_group->default_ports_group we use "sizeof(*cma_dev_group->ports)"
instead of "sizeof(*cma_dev_group->default_ports_group)".

We're bumping up against the 80 character limit so I introduced a new
local pointer "ports_group" to get around that.

Fixes: 045959db65c6 ('IB/cma: Add configfs for rdma_cm')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/nes: checking for NULL instead of IS_ERR
Dan Carpenter [Tue, 12 Jan 2016 09:27:43 +0000 (12:27 +0300)]
RDMA/nes: checking for NULL instead of IS_ERR

nes_reg_phys_mr() returns ERR_PTRs on error.  It doesn't return NULL.

This bug has been there for a while, but we recently changed from
calling a function pointer to calling nes_reg_phys_mr() directly so now
Smatch is able to detect the bug.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib: Support creating qps with GFP_NOIO flag
Vinit Agnihotri [Mon, 11 Jan 2016 17:57:25 +0000 (12:57 -0500)]
IB/qib: Support creating qps with GFP_NOIO flag

The current code is problematic when the QP creation and ipoib is used to
support NFS and NFS desires to do IO for paging purposes. In that case, the
GFP_KERNEL allocation in qib_qp.c causes a deadlock in tight memory
situations.

This fix adds support to create queue pair with GFP_NOIO flag for connected
mode only to cleanly fail the create queue pair in those situations.

Cc: <stable@vger.kernel.org> # 3.16+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/sysfs: Fix sparse warning on attr_id
Ira Weiny [Mon, 4 Jan 2016 03:44:25 +0000 (22:44 -0500)]
IB/sysfs: Fix sparse warning on attr_id

Attributed ID was declared as an int while the value should really be big
endian 16.

Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c")

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/be2net: Remove open and close entry points
Devesh Sharma [Thu, 24 Dec 2015 18:14:08 +0000 (13:14 -0500)]
RDMA/be2net: Remove open and close entry points

Recently Dough Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issueing "open" on be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
   device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
   So sequence of locks is rtnl_lock---> device_list lock

B.  When new ocrdma roce device gets registered, infiniband stack now
    takes rtnl_lock in ib_register_device() in GID initialization routines.
    So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.

In order to resolve the above deadlock condition, ocrdma intorduced a
patch to stop listening to administrative open/close events generated from
be2net driver. It now depends on link-state-change async-event generated from
CNA. This change leaves behind dead code which used to generate administrative
open/close events. This patch cleans-up all that dead code from be2net.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/ocrdma: Depend on async link events from CNA
Devesh Sharma [Thu, 24 Dec 2015 18:14:07 +0000 (13:14 -0500)]
RDMA/ocrdma: Depend on async link events from CNA

Recently Dough Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issuing "open" on be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
   device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
   So sequence of locks is rtnl_lock---> device_list lock

B.  When new ocrdma roce device gets registered, infiniband stack now
    takes rtnl_lock in ib_register_device() in GID initialization routines.
    So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.

With this patch we stop using administrative open and close events
injected by be2net driver. These events were used to dispatch PORT_ACTIVE
and PORT_ERROR events to the IB-stack. This patch implements a logic
to receive async-link-events generated from CNA whenever link-state-change
is detected. Now on, these async-events will be used to dispatch
PORT_ACTIVE and PORT_ERROR events to IB-stack.

Depending on async-events from CNA removes the need to hold device-list-mutex
and thus breaks the busy-wait scenario.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/ocrdma: Dispatch only port event when port state changes
Devesh Sharma [Thu, 24 Dec 2015 18:14:06 +0000 (13:14 -0500)]
RDMA/ocrdma: Dispatch only port event when port state changes

Dispatch only port event to IB stack when port state changes.
Don't explicitly modify qps to error. Let application listen to
port events on async event queue or let QP fail with retry-exceeded
completion error.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/ocrdma: Fix vlan-id assignment in qp parameters
Devesh Sharma [Thu, 24 Dec 2015 18:14:05 +0000 (13:14 -0500)]
RDMA/ocrdma: Fix vlan-id assignment in qp parameters

vlan-id is wrongly getting as 0 when PFC is enabled.
Set vlan-id configured by user in QP parameters.
In case vlan interface is not used, flash a warning to
user to configure vlan and assign vlan-id as 0 in qp params.

Fixes: dbf727de7440 ('IB/core: Use GID table in AH creation and dmac resolution')
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/cma: Fix RDMA port validation for iWarp
Matan Barak [Thu, 7 Jan 2016 09:19:29 +0000 (11:19 +0200)]
IB/cma: Fix RDMA port validation for iWarp

cma_validate_port wrongly assumed that Ethernet devices are RoCE
devices and thus their ndev should be matched in the GID table.
This broke the iWarp support. Fixing that matching the ndev only if
we work on a RoCE port.

Cc: <stable@vger.kernel.org> # 4.4.x-
Fixes: abae1b71dd37 ('IB/cma: cma_validate_port should verify the port
     and netdevice')
Reported-by: Hariprasad Shenai <hariprasad@chelsio.com>
Tested-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib: fix mcast detach when qp not attached
Mike Marciniszyn [Thu, 7 Jan 2016 21:44:10 +0000 (16:44 -0500)]
IB/qib: fix mcast detach when qp not attached

The code produces the following trace:

[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D
3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge
860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti:
ffff88007af1c000
[1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>]
qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:ffff88007af1dd70  EFLAGS: 00010246
[1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX:
000000000000000f
[1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI:
6764697200000000
[1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09:
0000000000000000
[1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12:
ffff88007baa1d98
[1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15:
0000000000000000
[1750924.420364] FS:  00007ffff7fd8740(0000) GS:ffff88007fc80000(0000)
knlGS:0000000000000000
[1750924.420364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4:
00000000000007e0
[1750924.420364] Stack:
[1750924.420364]  ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429
000000007af1de20
[1750924.420364]  ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70
ffffffffa00cb313
[1750924.420364]  00007fffffffde88 0000000000000000 0000000000000008
ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364]  [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350
[ib_qib]
[1750924.568035]  [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0
[ib_uverbs]
[1750924.568035]  [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035]  [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170
[ib_uverbs]
[1750924.568035]  [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035]  [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035]  [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035]  [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035]  [<ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035]  [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
<f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP  [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50
[ib_qib]
[1750924.568035]  RSP <ffff88007af1dd70>
[1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]

The fix is to note the qib_mcast_qp that was found.   If none is found, then
return EINVAL indicating the error.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/IPoIB: Fix kernel panic on multicast flow
Erez Shitrit [Thu, 7 Jan 2016 07:28:08 +0000 (09:28 +0200)]
IB/IPoIB: Fix kernel panic on multicast flow

ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the
parameter mcast->dev. That mcast is a temporary (used as an iterator)
variable that may be uninitialized.
There is no need to send the variable dev to the function, as each mcast
has its dev as a member in the mcast struct.

This causes the next panic:
RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
RSP: 0018: EFLAGS: 00010246
RAX: f0201 RBX: 24e00 RCX: 00000
....
....
Stack:
Call Trace:
ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
process_one_work+0x164/0x470
worker_thread+0x11d/0x420
...

Fixes: 5a0e81f6f483 ('IB/IPoIB: factor out common multicast list removal code')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reported-by: Doron Tsur <doront@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Support the remote invalidation exception
Jenny Derzhavetz [Thu, 24 Dec 2015 10:20:48 +0000 (12:20 +0200)]
IB/iser: Support the remote invalidation exception

Declare that we support remote invalidation in case we are:
1. using fastreg method
2. always registering memory

Detect the invalidated rkey from the work completion info so we
won't invalidate it locally. The spec mandates that we must not rely
on the target remote invalidate our rkey so we must check it upon
a receive (scsi response) completion.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Change the increment rkey flow logic
Sagi Grimberg [Wed, 9 Dec 2015 12:12:07 +0000 (14:12 +0200)]
IB/iser: Change the increment rkey flow logic

When we enable remote invalidate support we won't want to perform
local invalidates at the same time we do today, but we still need
to get new rkeys.  So, decouple the rkey update from the local
invalidate and tie it to memory reg instead.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/isert: Support the remote invalidation exception
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:06 +0000 (14:12 +0200)]
IB/isert: Support the remote invalidation exception

We'll use remote invalidate, according to negotiation result
during connection establishment. If the initiator declared that
it supports the remote invalidate exception and the local HCA
supports IB_DEVICE_MEM_MGT_EXTENSIONS then the target will
use IB_WR_SEND_WITH_INV with the correct rkey for the response.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/isert: Declare correct flags when accepting a connection
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:05 +0000 (14:12 +0200)]
IB/isert: Declare correct flags when accepting a connection

iser target does not support zero based virtual addresses and
send with invalidate, so it should declare that it doesn't.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/isert: Remove unused file iser_proto.h
Sagi Grimberg [Wed, 9 Dec 2015 12:12:04 +0000 (14:12 +0200)]
IB/isert: Remove unused file iser_proto.h

We don't need iser_proto.h anymore, remove it and
move (non-protocol) declarations to ib_isert.h

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser,isert: Create and use new shared header
Sagi Grimberg [Wed, 9 Dec 2015 12:12:03 +0000 (14:12 +0200)]
IB/iser,isert: Create and use new shared header

The iser RDMA_CM negotiation protocol is shared by
the initiator and the target, so have a shared header
for the defines and structure. Move relevant items from
the initiator and target headers.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: set intuitive values for mr_valid
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:02 +0000 (14:12 +0200)]
IB/iser: set intuitive values for mr_valid

This parameter is described as "is mr valid indicator".
In other words, it indicates whether memory registration
is valid or not. So intuitive values would be:
mr_valid=True, when memory registration is valid and
mr_valid=False otherwise.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Don't register memory for all immediate data writes
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:01 +0000 (14:12 +0200)]
IB/iser: Don't register memory for all immediate data writes

When all the task data is sent as immediate data, we are
allowed to use the local_dma_lkey as it is not sent to
the wire.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Reuse ib_sg_to_pages
Sagi Grimberg [Wed, 9 Dec 2015 12:12:00 +0000 (14:12 +0200)]
IB/iser: Reuse ib_sg_to_pages

We have in iser iser_sg_to_page_vec which has exactly
the same role as ib_sg_to_pages. Customize the page_vec
to hold a fake MR so we can reuse ib_sg_to_pages.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: Fix module init not cleaning up on error flow
Roi Dayan [Wed, 9 Dec 2015 12:11:59 +0000 (14:11 +0200)]
IB/iser: Fix module init not cleaning up on error flow

Destroy workqueue on transport register error, also
release kmem cache on workqueue allocation error.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: constify mmu_notifier_ops structures
Julia Lawall [Sun, 29 Nov 2015 22:02:51 +0000 (23:02 +0100)]
IB/core: constify mmu_notifier_ops structures

This mmu_notifier_ops structure is never modified, so declare it as
const, like the other mmu_notifier_ops structures.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/iser: constify iser_reg_ops structure
Julia Lawall [Sat, 28 Nov 2015 15:52:04 +0000 (16:52 +0100)]
IB/iser: constify iser_reg_ops structure

The iser_reg_ops structures are never modified, so declare them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoRDMA/nes: constify nes_cm_ops structure
Julia Lawall [Sat, 28 Nov 2015 14:00:37 +0000 (15:00 +0100)]
RDMA/nes: constify nes_cm_ops structure

The nes_cm_ops structure is never modified, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: report tx/rx checksum cap in query results
Bodong Wang [Fri, 18 Dec 2015 11:53:20 +0000 (13:53 +0200)]
IB/mlx5: report tx/rx checksum cap in query results

This patch will report the tx/rx checksum cap for raw qp via the
query device results.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Convert kmalloc to kmalloc_array for checkpatch
Leon Romanovsky [Thu, 17 Dec 2015 07:31:53 +0000 (09:31 +0200)]
IB/mlx4: Convert kmalloc to kmalloc_array for checkpatch

Convert kmalloc to be kmalloc_array to fix warnings below:

WARNING: Prefer kmalloc_array over kmalloc with multiply
+               qp->sq.wrid = kmalloc(qp->sq.wqe_cnt * sizeof(u64),

WARNING: Prefer kmalloc_array over kmalloc with multiply
+               qp->rq.wrid = kmalloc(qp->rq.wqe_cnt * sizeof(u64),

WARNING: Prefer kmalloc_array over kmalloc with multiply
+               srq->wrid = kmalloc(srq->msrq.max * sizeof(u64),

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx4: Suppress non-fatal memory allocations
Leon Romanovsky [Thu, 17 Dec 2015 07:31:52 +0000 (09:31 +0200)]
IB/mlx4: Suppress non-fatal memory allocations

Failure in kmalloc memory allocations will throw a warning about it.
Such warnings are not needed anymore, since in commit 0ef2f05c7e02
("IB/mlx4: Use vmalloc for WR buffers when needed"), fallback mechanism
from kmalloc() to __vmalloc() was added.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Advertise atomic capabilities in query device
Eran Ben Elisha [Mon, 14 Dec 2015 14:34:10 +0000 (16:34 +0200)]
IB/mlx5: Advertise atomic capabilities in query device

In order to ensure IB spec atomic correctness in atomic operations, if
HW is configured to host endianness, advertise IB_ATOMIC_HCA.  if not,
advertise IB_ATOMIC_NONE.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agonet/mlx5_core: Add setting ATOMIC endian mode
Eran Ben Elisha [Mon, 14 Dec 2015 14:34:09 +0000 (16:34 +0200)]
net/mlx5_core: Add setting ATOMIC endian mode

HW is capable of 2 requestor endianness modes for standard 8 Bytes
atomic: BE (0x0) and host endianness (0x1). Read the supported modes
from hca atomic capabilities and configure HW to host endianness mode if
supported.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiw_cxgb3: Fix incorrectly returning error on success
Hariprasad S [Fri, 11 Dec 2015 08:29:17 +0000 (13:59 +0530)]
iw_cxgb3: Fix incorrectly returning error on success

The cxgb3_*_send() functions return NET_XMIT_ values, which are
positive integers values. So don't treat positive return values
as an error.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoiw_cxgb4: Pass qid range to user space driver
Hariprasad S [Fri, 11 Dec 2015 07:32:01 +0000 (13:02 +0530)]
iw_cxgb4: Pass qid range to user space driver

Enhances the t4_dev_status_page to pass the qid start and size
attributes from iw_cxgb4 to libcxgb4.
Bump the ABI Version to 3 -> To allow libcxgb4 to detect old drivers and
revert to the old way of computing the qid ranges.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mad: Ensure fairness in ib_mad_completion_handler
Dean Luick [Thu, 10 Dec 2015 21:52:30 +0000 (16:52 -0500)]
IB/mad: Ensure fairness in ib_mad_completion_handler

It was found that when a process was rapidly sending MADs other processes could
be hung in their unregister calls.

This would happen when process A was injecting packets fast enough that the
single threaded workqueue was never exiting ib_mad_completion_handler.
Therefore when process B called flush_workqueue via the unregister call it
would hang until process A stopped sending MADs.

The fix is to periodically reschedule ib_mad_completion_handler after
processing a large number of completions.  The number of completions chosen was
decided based on the defaults for the recv queue size.  However, it was kept
fixed such that increasing those queue sizes would not adversely affect
fairness in the future.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Add driver cross-channel support
Leon Romanovsky [Sun, 20 Dec 2015 10:16:11 +0000 (12:16 +0200)]
IB/mlx5: Add driver cross-channel support

Add support of cross-channel functionality to mlx5
driver. This includes ability to ignore overrun for CQ
which intended for cross-channel, export device capability and
configure the QP to be sync master/slave queues.

The cross-channel enabled QP supports combination of
three possible properties:
* WQE processing on the receive queue of this QP
* WQE processing on the send queue of this QP
* WQE are supported on the send queue

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add cross-channel support
Leon Romanovsky [Sun, 20 Dec 2015 10:16:10 +0000 (12:16 +0200)]
IB/core: Add cross-channel support

The cross-channel feature allows to execute WQEs that involve
synchronization of I/O operations’ on different QPs.

This capability enables to program complex flows with a single
function call, hereby significantly reducing overhead associated
with I/O processing.

Cross-channel operations support is indicated by HCA capability
information.

The queue pairs can be configured to work as a “sync master queue”
or “sync slave queues”.

The added flags are:

1. Device capability flag IB_DEVICE_CROSS_CHANNEL for the
   devices that can perform cross-channel operations.

2. CQ property flag IB_CQ_FLAGS_IGNORE_OVERRUN to disable CQ overrun
   check. This check is useless in cross-channel scenario.

3. QP property flags to indicate if queues are slave or master:
   * IB_QP_CREATE_MANAGED_SEND indicates that posted send work requests
     will not be executed immediately and requires enabling.
   * IB_QP_CREATE_MANAGED_RECV indicates that posted receive work
     requests will not be executed immediately and requires enabling.
   * IB_QP_CREATE_CROSS_CHANNEL declares the QP to work in cross-channel
     mode. If IB_QP_CREATE_MANAGED_SEND and IB_QP_CREATE_MANAGED_RECV are
     not provided, this QP will be sync master queue, else it will be sync
     slave.

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Align coding style of ib_device_cap_flags structure
Leon Romanovsky [Sun, 20 Dec 2015 10:16:09 +0000 (12:16 +0200)]
IB/core: Align coding style of ib_device_cap_flags structure

Modify enum ib_device_cap_flags such that other patches which add new
enum values pass strict checkpatch.pl checks.

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Mmap the HCA's core clock register to user-space
Matan Barak [Tue, 15 Dec 2015 18:30:13 +0000 (20:30 +0200)]
IB/mlx5: Mmap the HCA's core clock register to user-space

In order to read the HCA's current cycles register, we need
to map it to user-space. Add support to map this register
via mmap command.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Add hca_core_clock_offset to udata in init_ucontext
Matan Barak [Tue, 15 Dec 2015 18:30:12 +0000 (20:30 +0200)]
IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext

Pass hca_core_clock_offset to user-space is mandatory in order to
let the user-space read the free-running clock register from the
right offset in the memory mapped page.
Passing this value is done by changing the vendor's command
and response of init_ucontext to be in extensible form.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Add support for hca_core_clock and timestamp_mask
Matan Barak [Tue, 15 Dec 2015 18:30:11 +0000 (20:30 +0200)]
IB/mlx5: Add support for hca_core_clock and timestamp_mask

Reporting the hca_core_clock (in kHZ) and the timestamp_mask in
query_device extended verb. timestamp_mask is used by users in order
to know what is the valid range of the raw timestamps, while
hca_core_clock reports the clock frequency that is used for
timestamps.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Add ib_is_udata_cleared
Matan Barak [Tue, 15 Dec 2015 18:30:10 +0000 (20:30 +0200)]
IB/core: Add ib_is_udata_cleared

Extending core and vendor verb commands require us to check that the
unknown part of the user's given command is all zeros.
Adding ib_is_udata_cleared in order to do so.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/mlx5: Add create_cq extended command
Matan Barak [Tue, 15 Dec 2015 18:30:09 +0000 (20:30 +0200)]
IB/mlx5: Add create_cq extended command

In order to create a CQ that supports timestamp, mlx5 needs to
support the extended create CQ command with the timestamp flag.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Display extended counter set if available
Christoph Lameter [Mon, 21 Dec 2015 14:20:29 +0000 (08:20 -0600)]
IB/core: Display extended counter set if available

Check if the extended counters are available and if so
create the proper extended and additional counters.

Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Specify attribute_id in port_table_attribute
Christoph Lameter [Mon, 21 Dec 2015 14:20:28 +0000 (08:20 -0600)]
IB/core: Specify attribute_id in port_table_attribute

Add the attr_id on port_table_attribute since we will have to add
a different port_table_attribute for the extended attribute soon.

Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/core: Create get_perf_mad function in sysfs.c
Christoph Lameter [Mon, 21 Dec 2015 14:20:27 +0000 (08:20 -0600)]
IB/core: Create get_perf_mad function in sysfs.c

Create a new function to retrieve performance management
data from the existing code in get_pma_counter().

Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoMAINTAINERS: Assign maintainer to Mellanox mlx4 core and IB drivers
Or Gerlitz [Wed, 23 Dec 2015 16:30:58 +0000 (18:30 +0200)]
MAINTAINERS: Assign maintainer to Mellanox mlx4 core and IB drivers

The driver was written originally by Roland Dreier, currently there's
no official maintainer. Yishai steps in as maintainer.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Roland Dreier <roland@kernel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>