Dan Carpenter [Tue, 12 Jan 2016 09:27:43 +0000 (12:27 +0300)]
RDMA/nes: checking for NULL instead of IS_ERR
nes_reg_phys_mr() returns ERR_PTRs on error. It doesn't return NULL.
This bug has been there for a while, but we recently changed from
calling a function pointer to calling nes_reg_phys_mr() directly so now
Smatch is able to detect the bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Vinit Agnihotri [Mon, 11 Jan 2016 17:57:25 +0000 (12:57 -0500)]
IB/qib: Support creating qps with GFP_NOIO flag
The current code is problematic when the QP creation and ipoib is used to
support NFS and NFS desires to do IO for paging purposes. In that case, the
GFP_KERNEL allocation in qib_qp.c causes a deadlock in tight memory
situations.
This fix adds support to create queue pair with GFP_NOIO flag for connected
mode only to cleanly fail the create queue pair in those situations.
Cc: <stable@vger.kernel.org> # 3.16+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Mon, 4 Jan 2016 03:44:25 +0000 (22:44 -0500)]
IB/sysfs: Fix sparse warning on attr_id
Attributed ID was declared as an int while the value should really be big
endian 16.
Fixes:
35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:08 +0000 (13:14 -0500)]
RDMA/be2net: Remove open and close entry points
Recently Dough Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issueing "open" on be2net interface.
The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.
A. be2net is sending administrative open/close event to ocrdma holding
device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
So sequence of locks is rtnl_lock---> device_list lock
B. When new ocrdma roce device gets registered, infiniband stack now
takes rtnl_lock in ib_register_device() in GID initialization routines.
So sequence of locks in this path is device_list lock ---> rtnl_lock.
This improper locking sequence causes deadlock.
In order to resolve the above deadlock condition, ocrdma intorduced a
patch to stop listening to administrative open/close events generated from
be2net driver. It now depends on link-state-change async-event generated from
CNA. This change leaves behind dead code which used to generate administrative
open/close events. This patch cleans-up all that dead code from be2net.
Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:07 +0000 (13:14 -0500)]
RDMA/ocrdma: Depend on async link events from CNA
Recently Dough Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issuing "open" on be2net interface.
The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.
A. be2net is sending administrative open/close event to ocrdma holding
device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
So sequence of locks is rtnl_lock---> device_list lock
B. When new ocrdma roce device gets registered, infiniband stack now
takes rtnl_lock in ib_register_device() in GID initialization routines.
So sequence of locks in this path is device_list lock ---> rtnl_lock.
This improper locking sequence causes deadlock.
With this patch we stop using administrative open and close events
injected by be2net driver. These events were used to dispatch PORT_ACTIVE
and PORT_ERROR events to the IB-stack. This patch implements a logic
to receive async-link-events generated from CNA whenever link-state-change
is detected. Now on, these async-events will be used to dispatch
PORT_ACTIVE and PORT_ERROR events to IB-stack.
Depending on async-events from CNA removes the need to hold device-list-mutex
and thus breaks the busy-wait scenario.
Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:06 +0000 (13:14 -0500)]
RDMA/ocrdma: Dispatch only port event when port state changes
Dispatch only port event to IB stack when port state changes.
Don't explicitly modify qps to error. Let application listen to
port events on async event queue or let QP fail with retry-exceeded
completion error.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:05 +0000 (13:14 -0500)]
RDMA/ocrdma: Fix vlan-id assignment in qp parameters
vlan-id is wrongly getting as 0 when PFC is enabled.
Set vlan-id configured by user in QP parameters.
In case vlan interface is not used, flash a warning to
user to configure vlan and assign vlan-id as 0 in qp params.
Fixes:
dbf727de7440 ('IB/core: Use GID table in AH creation and dmac resolution')
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 7 Jan 2016 09:19:29 +0000 (11:19 +0200)]
IB/cma: Fix RDMA port validation for iWarp
cma_validate_port wrongly assumed that Ethernet devices are RoCE
devices and thus their ndev should be matched in the GID table.
This broke the iWarp support. Fixing that matching the ndev only if
we work on a RoCE port.
Cc: <stable@vger.kernel.org> # 4.4.x-
Fixes:
abae1b71dd37 ('IB/cma: cma_validate_port should verify the port
and netdevice')
Reported-by: Hariprasad Shenai <hariprasad@chelsio.com>
Tested-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 7 Jan 2016 21:44:10 +0000 (16:44 -0500)]
IB/qib: fix mcast detach when qp not attached
The code produces the following trace:
[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D
3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge
860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task:
ffff8800366a9800 ti:
ffff88007af1c000 task.ti:
ffff88007af1c000
[1750924.420364] RIP: 0010:[<
ffffffffa0131d51>] [<
ffffffffa0131d51>]
qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:
ffff88007af1dd70 EFLAGS:
00010246
[1750924.420364] RAX:
0000000000000001 RBX:
ffff88007b822688 RCX:
000000000000000f
[1750924.420364] RDX:
ffff88007b822688 RSI:
ffff8800366c15a0 RDI:
6764697200000000
[1750924.420364] RBP:
ffff88007af1dd78 R08:
0000000000000001 R09:
0000000000000000
[1750924.420364] R10:
0000000000000011 R11:
0000000000000246 R12:
ffff88007baa1d98
[1750924.420364] R13:
ffff88003ecab000 R14:
ffff88007b822660 R15:
0000000000000000
[1750924.420364] FS:
00007ffff7fd8740(0000) GS:
ffff88007fc80000(0000)
knlGS:
0000000000000000
[1750924.420364] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[1750924.420364] CR2:
00007ffff597c750 CR3:
000000006860b000 CR4:
00000000000007e0
[1750924.420364] Stack:
[1750924.420364]
ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429
000000007af1de20
[1750924.420364]
ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70
ffffffffa00cb313
[1750924.420364]
00007fffffffde88 0000000000000000 0000000000000008
ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364] [<
ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350
[ib_qib]
[1750924.568035] [<
ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0
[ib_uverbs]
[1750924.568035] [<
ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035] [<
ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170
[ib_uverbs]
[1750924.568035] [<
ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035] [<
ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035] [<
ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035] [<
ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035] [<
ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035] [<
ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
<f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP [<
ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50
[ib_qib]
[1750924.568035] RSP <
ffff88007af1dd70>
[1750924.650439] ---[ end trace
73d5d4b3f8ad4851 ]
The fix is to note the qib_mcast_qp that was found. If none is found, then
return EINVAL indicating the error.
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Erez Shitrit [Thu, 7 Jan 2016 07:28:08 +0000 (09:28 +0200)]
IB/IPoIB: Fix kernel panic on multicast flow
ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the
parameter mcast->dev. That mcast is a temporary (used as an iterator)
variable that may be uninitialized.
There is no need to send the variable dev to the function, as each mcast
has its dev as a member in the mcast struct.
This causes the next panic:
RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
RSP: 0018: EFLAGS:
00010246
RAX: f0201 RBX: 24e00 RCX: 00000
....
....
Stack:
Call Trace:
ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
process_one_work+0x164/0x470
worker_thread+0x11d/0x420
...
Fixes:
5a0e81f6f483 ('IB/IPoIB: factor out common multicast list removal code')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reported-by: Doron Tsur <doront@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Thu, 24 Dec 2015 10:20:48 +0000 (12:20 +0200)]
IB/iser: Support the remote invalidation exception
Declare that we support remote invalidation in case we are:
1. using fastreg method
2. always registering memory
Detect the invalidated rkey from the work completion info so we
won't invalidate it locally. The spec mandates that we must not rely
on the target remote invalidate our rkey so we must check it upon
a receive (scsi response) completion.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:07 +0000 (14:12 +0200)]
IB/iser: Change the increment rkey flow logic
When we enable remote invalidate support we won't want to perform
local invalidates at the same time we do today, but we still need
to get new rkeys. So, decouple the rkey update from the local
invalidate and tie it to memory reg instead.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:06 +0000 (14:12 +0200)]
IB/isert: Support the remote invalidation exception
We'll use remote invalidate, according to negotiation result
during connection establishment. If the initiator declared that
it supports the remote invalidate exception and the local HCA
supports IB_DEVICE_MEM_MGT_EXTENSIONS then the target will
use IB_WR_SEND_WITH_INV with the correct rkey for the response.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:05 +0000 (14:12 +0200)]
IB/isert: Declare correct flags when accepting a connection
iser target does not support zero based virtual addresses and
send with invalidate, so it should declare that it doesn't.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:04 +0000 (14:12 +0200)]
IB/isert: Remove unused file iser_proto.h
We don't need iser_proto.h anymore, remove it and
move (non-protocol) declarations to ib_isert.h
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:03 +0000 (14:12 +0200)]
IB/iser,isert: Create and use new shared header
The iser RDMA_CM negotiation protocol is shared by
the initiator and the target, so have a shared header
for the defines and structure. Move relevant items from
the initiator and target headers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:02 +0000 (14:12 +0200)]
IB/iser: set intuitive values for mr_valid
This parameter is described as "is mr valid indicator".
In other words, it indicates whether memory registration
is valid or not. So intuitive values would be:
mr_valid=True, when memory registration is valid and
mr_valid=False otherwise.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:01 +0000 (14:12 +0200)]
IB/iser: Don't register memory for all immediate data writes
When all the task data is sent as immediate data, we are
allowed to use the local_dma_lkey as it is not sent to
the wire.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:00 +0000 (14:12 +0200)]
IB/iser: Reuse ib_sg_to_pages
We have in iser iser_sg_to_page_vec which has exactly
the same role as ib_sg_to_pages. Customize the page_vec
to hold a fake MR so we can reuse ib_sg_to_pages.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Roi Dayan [Wed, 9 Dec 2015 12:11:59 +0000 (14:11 +0200)]
IB/iser: Fix module init not cleaning up on error flow
Destroy workqueue on transport register error, also
release kmem cache on workqueue allocation error.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sun, 29 Nov 2015 22:02:51 +0000 (23:02 +0100)]
IB/core: constify mmu_notifier_ops structures
This mmu_notifier_ops structure is never modified, so declare it as
const, like the other mmu_notifier_ops structures.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sat, 28 Nov 2015 15:52:04 +0000 (16:52 +0100)]
IB/iser: constify iser_reg_ops structure
The iser_reg_ops structures are never modified, so declare them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sat, 28 Nov 2015 14:00:37 +0000 (15:00 +0100)]
RDMA/nes: constify nes_cm_ops structure
The nes_cm_ops structure is never modified, so declare it as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bodong Wang [Fri, 18 Dec 2015 11:53:20 +0000 (13:53 +0200)]
IB/mlx5: report tx/rx checksum cap in query results
This patch will report the tx/rx checksum cap for raw qp via the
query device results.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Thu, 17 Dec 2015 07:31:53 +0000 (09:31 +0200)]
IB/mlx4: Convert kmalloc to kmalloc_array for checkpatch
Convert kmalloc to be kmalloc_array to fix warnings below:
WARNING: Prefer kmalloc_array over kmalloc with multiply
+ qp->sq.wrid = kmalloc(qp->sq.wqe_cnt * sizeof(u64),
WARNING: Prefer kmalloc_array over kmalloc with multiply
+ qp->rq.wrid = kmalloc(qp->rq.wqe_cnt * sizeof(u64),
WARNING: Prefer kmalloc_array over kmalloc with multiply
+ srq->wrid = kmalloc(srq->msrq.max * sizeof(u64),
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Thu, 17 Dec 2015 07:31:52 +0000 (09:31 +0200)]
IB/mlx4: Suppress non-fatal memory allocations
Failure in kmalloc memory allocations will throw a warning about it.
Such warnings are not needed anymore, since in commit
0ef2f05c7e02
("IB/mlx4: Use vmalloc for WR buffers when needed"), fallback mechanism
from kmalloc() to __vmalloc() was added.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Eran Ben Elisha [Mon, 14 Dec 2015 14:34:10 +0000 (16:34 +0200)]
IB/mlx5: Advertise atomic capabilities in query device
In order to ensure IB spec atomic correctness in atomic operations, if
HW is configured to host endianness, advertise IB_ATOMIC_HCA. if not,
advertise IB_ATOMIC_NONE.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Eran Ben Elisha [Mon, 14 Dec 2015 14:34:09 +0000 (16:34 +0200)]
net/mlx5_core: Add setting ATOMIC endian mode
HW is capable of 2 requestor endianness modes for standard 8 Bytes
atomic: BE (0x0) and host endianness (0x1). Read the supported modes
from hca atomic capabilities and configure HW to host endianness mode if
supported.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Fri, 11 Dec 2015 08:29:17 +0000 (13:59 +0530)]
iw_cxgb3: Fix incorrectly returning error on success
The cxgb3_*_send() functions return NET_XMIT_ values, which are
positive integers values. So don't treat positive return values
as an error.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Fri, 11 Dec 2015 07:32:01 +0000 (13:02 +0530)]
iw_cxgb4: Pass qid range to user space driver
Enhances the t4_dev_status_page to pass the qid start and size
attributes from iw_cxgb4 to libcxgb4.
Bump the ABI Version to 3 -> To allow libcxgb4 to detect old drivers and
revert to the old way of computing the qid ranges.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 10 Dec 2015 21:52:30 +0000 (16:52 -0500)]
IB/mad: Ensure fairness in ib_mad_completion_handler
It was found that when a process was rapidly sending MADs other processes could
be hung in their unregister calls.
This would happen when process A was injecting packets fast enough that the
single threaded workqueue was never exiting ib_mad_completion_handler.
Therefore when process B called flush_workqueue via the unregister call it
would hang until process A stopped sending MADs.
The fix is to periodically reschedule ib_mad_completion_handler after
processing a large number of completions. The number of completions chosen was
decided based on the defaults for the recv queue size. However, it was kept
fixed such that increasing those queue sizes would not adversely affect
fairness in the future.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 20 Dec 2015 10:16:11 +0000 (12:16 +0200)]
IB/mlx5: Add driver cross-channel support
Add support of cross-channel functionality to mlx5
driver. This includes ability to ignore overrun for CQ
which intended for cross-channel, export device capability and
configure the QP to be sync master/slave queues.
The cross-channel enabled QP supports combination of
three possible properties:
* WQE processing on the receive queue of this QP
* WQE processing on the send queue of this QP
* WQE are supported on the send queue
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 20 Dec 2015 10:16:10 +0000 (12:16 +0200)]
IB/core: Add cross-channel support
The cross-channel feature allows to execute WQEs that involve
synchronization of I/O operations’ on different QPs.
This capability enables to program complex flows with a single
function call, hereby significantly reducing overhead associated
with I/O processing.
Cross-channel operations support is indicated by HCA capability
information.
The queue pairs can be configured to work as a “sync master queue”
or “sync slave queues”.
The added flags are:
1. Device capability flag IB_DEVICE_CROSS_CHANNEL for the
devices that can perform cross-channel operations.
2. CQ property flag IB_CQ_FLAGS_IGNORE_OVERRUN to disable CQ overrun
check. This check is useless in cross-channel scenario.
3. QP property flags to indicate if queues are slave or master:
* IB_QP_CREATE_MANAGED_SEND indicates that posted send work requests
will not be executed immediately and requires enabling.
* IB_QP_CREATE_MANAGED_RECV indicates that posted receive work
requests will not be executed immediately and requires enabling.
* IB_QP_CREATE_CROSS_CHANNEL declares the QP to work in cross-channel
mode. If IB_QP_CREATE_MANAGED_SEND and IB_QP_CREATE_MANAGED_RECV are
not provided, this QP will be sync master queue, else it will be sync
slave.
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 20 Dec 2015 10:16:09 +0000 (12:16 +0200)]
IB/core: Align coding style of ib_device_cap_flags structure
Modify enum ib_device_cap_flags such that other patches which add new
enum values pass strict checkpatch.pl checks.
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:13 +0000 (20:30 +0200)]
IB/mlx5: Mmap the HCA's core clock register to user-space
In order to read the HCA's current cycles register, we need
to map it to user-space. Add support to map this register
via mmap command.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:12 +0000 (20:30 +0200)]
IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext
Pass hca_core_clock_offset to user-space is mandatory in order to
let the user-space read the free-running clock register from the
right offset in the memory mapped page.
Passing this value is done by changing the vendor's command
and response of init_ucontext to be in extensible form.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:11 +0000 (20:30 +0200)]
IB/mlx5: Add support for hca_core_clock and timestamp_mask
Reporting the hca_core_clock (in kHZ) and the timestamp_mask in
query_device extended verb. timestamp_mask is used by users in order
to know what is the valid range of the raw timestamps, while
hca_core_clock reports the clock frequency that is used for
timestamps.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:10 +0000 (20:30 +0200)]
IB/core: Add ib_is_udata_cleared
Extending core and vendor verb commands require us to check that the
unknown part of the user's given command is all zeros.
Adding ib_is_udata_cleared in order to do so.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:09 +0000 (20:30 +0200)]
IB/mlx5: Add create_cq extended command
In order to create a CQ that supports timestamp, mlx5 needs to
support the extended create CQ command with the timestamp flag.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:20:29 +0000 (08:20 -0600)]
IB/core: Display extended counter set if available
Check if the extended counters are available and if so
create the proper extended and additional counters.
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:20:28 +0000 (08:20 -0600)]
IB/core: Specify attribute_id in port_table_attribute
Add the attr_id on port_table_attribute since we will have to add
a different port_table_attribute for the extended attribute soon.
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:20:27 +0000 (08:20 -0600)]
IB/core: Create get_perf_mad function in sysfs.c
Create a new function to retrieve performance management
data from the existing code in get_pma_counter().
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Wed, 23 Dec 2015 16:30:58 +0000 (18:30 +0200)]
MAINTAINERS: Assign maintainer to Mellanox mlx4 core and IB drivers
The driver was written originally by Roland Dreier, currently there's
no official maintainer. Yishai steps in as maintainer.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Roland Dreier <roland@kernel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Wed, 23 Dec 2015 16:30:57 +0000 (18:30 +0200)]
MAINTAINERS: Assign new maintainers to Mellanox mlx5 core and IB drivers
Matan and Leon step in as co-maintainers to replace Eli Cohen
who wrote and maintained the core and IB drivers.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:54 +0000 (19:12 +0100)]
IB: remove the write-only usecnt field from struct ib_mr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:53 +0000 (19:12 +0100)]
IB: remove the struct ib_phys_buf definition
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:52 +0000 (19:12 +0100)]
ehca: stop using struct ib_phys_buf
And simplify the calling convention for full-memory registrations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:51 +0000 (19:12 +0100)]
amso1100: fold c2_reg_phys_mr into c2_get_dma_mr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:50 +0000 (19:12 +0100)]
nes: simplify nes_reg_phys_mr calling conventions
Just pass and address/size pair instead of an ib_phys_buf array.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:49 +0000 (19:12 +0100)]
cxgb3: simplify iwch_get_dma_wr
Fold simplified versions of build_phys_page_list and
iwch_register_phys_mem into iwch_get_dma_wr now that no other callers
are left.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:48 +0000 (19:12 +0100)]
IB: remove in-kernel support for memory windows
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:47 +0000 (19:12 +0100)]
IB: remove support for phys MRs
We have stopped using phys MRs in the kernel a while ago, so let's
remove all the cruft used to implement them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-By: Devesh Sharma<devesh.sharma@avagotech.com> [ocrdma]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:46 +0000 (19:12 +0100)]
IB: remove ib_query_mr
This functionality has no users and was only supported by the staged out
EHCA driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:45 +0000 (19:12 +0100)]
IB: start documenting device capabilities
Just IB_DEVICE_LOCAL_DMA_LKEY and IB_DEVICE_MEM_MGT_EXTENSIONS for now
as I'm most familar with those.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:42:54 +0000 (08:42 -0600)]
IB/IPoIB: Move multicast specific code out of ipoib_main.c
Code cleanup to move multicast specific code that checks for
a sendonly join to ipoib_multicast.c. This allows the removal
of the export of __ipoib_mcast_find().
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:42:53 +0000 (08:42 -0600)]
IB/IPoIB: factor out common multicast list removal code
Code cleanup to remove multicast specific code from ipoib_main.c
The removal of a list of multicast groups occurs in three places.
Create a new function ipoib_mcast_remove_list(). Use this new
function in ipoib_main.c too.
That in turn allows the dropping of two functions that were
exported from ipoib_multicast.c for expiration of mc groups.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:25 +0000 (18:47 +0200)]
IB/mlx5: Support RoCE
Advertise RoCE support for IB/core layer and set the hardware to
work in RoCE mode.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:24 +0000 (18:47 +0200)]
IB/mlx5: Add RoCE fields to Address Vector
Set the address handle and QP address path fields according to the
link layer type (IB/Eth).
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:23 +0000 (18:47 +0200)]
IB/mlx5: Support IB device's callbacks for adding/deleting GIDs
These callbacks write into the mlx5 RoCE address table.
Upon del_gid we write a zero'd GID.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:22 +0000 (18:47 +0200)]
IB/mlx5: Set network_hdr_type upon RoCE responder completion
When handling a responder completion, if the link layer is Ethernet,
set the work completion network_hdr_type field according to CQE's
info and the IB_WC_WITH_NETWORK_HDR_TYPE flag.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:21 +0000 (18:47 +0200)]
IB/mlx5: Extend query_device/port to support RoCE
Using the vport access functions to retrieve the Ethernet
specific information and return this information in
ib_query_device and ib_query_port.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:20 +0000 (18:47 +0200)]
net/mlx5_core: Introduce access functions to query vport RoCE fields
Introduce access functions to query NIC vport system_image_guid,
node_guid and qkey_viol_cntr.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:19 +0000 (18:47 +0200)]
net/mlx5_core: Introduce access functions to enable/disable RoCE
A mlx5 Ethernet port must be explicitly enabled for RoCE.
When RoCE is not enabled on the port, the NIC will refuse to create
QPs attached to it and incoming RoCE packets will be considered by the
NIC as plain Ethernet packets.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:18 +0000 (18:47 +0200)]
net/mlx5_core: Break down the vport mac address query function
Introduce a new function called mlx5_query_nic_vport_context().
This function gets all the NIC vport attributes from the device.
The MAC address is just one of the NIC vport attributes, so
mlx5_query_nic_vport_mac_address() is now just a wrapper function
above mlx5_query_nic_vport_context().
More NIC vport attributes will be used in following commits.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:17 +0000 (18:47 +0200)]
IB/mlx5: Support IB device's callback for getting its netdev
For Eth ports only:
Maintain a net device pointer in mlx5_ib_device and update it
upon NETDEV_REGISTER and NETDEV_UNREGISTER events if the
net-device and IB device have the same PCI parent device.
Implement the get_netdev callback to return this net device.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:16 +0000 (18:47 +0200)]
IB/mlx5: Support IB device's callback for getting the link layer
Make the existing mlx5_ib_port_link_layer() signature match
the ib device callback signature (add port_num parameter).
Refactor it to use a sub function so that the link layer could
be queried also before the ibdev is created.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sat, 19 Dec 2015 20:48:59 +0000 (21:48 +0100)]
IB/usnic: delete unneeded IS_ERR test
kzalloc doesn't return ERR_PTR, so there is no need to test for it.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,e;
@@
* x = kzalloc(...)
... when != x = e
* IS_ERR_OR_NULL(x)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:19 +0000 (10:42 -0800)]
IB/usnic: Handle 0 counts in resource allocation
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:18 +0000 (10:42 -0800)]
IB/usnic: Fix resource leak in error case
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:17 +0000 (10:42 -0800)]
IB/usnic: Support more QP state transitions
They were already implemented at a lower layer, but the upper level
routine placed arbitrary restrictions on which transitions were
permitted. Simplify the state machine logic to live wholly in
usnic_ib_qp_grp_modify.
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:16 +0000 (10:42 -0800)]
IB/usnic: Fix message typo
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:15 +0000 (10:42 -0800)]
IB/usnic: Fix incorrect cast in usnic_ib_fw_string_to_u64
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:14 +0000 (10:42 -0800)]
IB/usnic: Improve a failure message
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:39:49 +0000 (10:39 -0800)]
IB/usnic: Remove unused prototype
query_protocol() was added in commit
6b90a6d66b17 ("IB/Verbs:
Implement new callback query_protocol()") and then removed in
commit
f9b22e355d38 ("IB/core: Convert core to use bitfield
for caps").
This left behind an unused prototype.
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Wed, 23 Dec 2015 12:56:57 +0000 (14:56 +0200)]
IB/cma: Join and leave multicast groups with IGMP
Since RoCEv2 is a protocol over IP header it is required to send IGMP
join and leave requests to the network when joining and leaving
multicast groups.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Wed, 23 Dec 2015 12:56:56 +0000 (14:56 +0200)]
IB/core: Initialize UD header structure with IP and UDP headers
ib_ud_header_init() is used to format InfiniBand headers
in a buffer up to (but not with) BTH. For RoCE UDP ENCAP it is
required that this function would be able to build also IP and UDP
headers.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 23 Dec 2015 12:56:55 +0000 (14:56 +0200)]
IB/cma: Add configfs for rdma_cm
Users would like to control the behaviour of rdma_cm.
For example, old applications which don't set the
required RoCE gid type could be executed on RoCE V2
network types. In order to support this configuration,
we implement a configfs for rdma_cm.
In order to use the configfs, one needs to mount it and
mkdir <IB device name> inside rdma_cm directory.
The patch adds support for a single configuration file,
default_roce_mode. The mode can either be "IB/RoCE v1" or
"RoCE v2".
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 23 Dec 2015 12:56:54 +0000 (14:56 +0200)]
IB/rdma_cm: Add wrapper for cma reference count
Currently, cma users can't increase or decrease the cma reference
count. This is necassary when setting cma attributes (like the
default GID type) in order to avoid use-after-free errors.
Adding cma_ref_dev and cma_deref_dev APIs.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 23 Dec 2015 12:56:53 +0000 (14:56 +0200)]
IB/core: Validate route when we init ah
In order to make sure API users don't try to use SGIDs which don't
conform to the routing table, validate the route before searching
the RoCE GID table.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 23 Dec 2015 12:56:52 +0000 (14:56 +0200)]
IB/core: Move rdma_is_upper_dev_rcu to header file
In order to validate the route, we need an easy way to check if a
net-device belongs to our RDMA device. Move this helper function
to a header file in order to make this check easier.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Somnath Kotur [Wed, 23 Dec 2015 12:56:51 +0000 (14:56 +0200)]
IB/core: Add rdma_network_type to wc
Providers should tell IB core the wc's network type.
This is used in order to search for the proper GID in the
GID table. When using HCAs that can't provide this info,
IB core tries to deep examine the packet and extract
the GID type by itself.
We choose sgid_index and type from all the matching entries in
RDMA-CM based on hint from the IP stack and we set hop_limit for
the IP packet based on above hint from IP stack.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Somnath Kotur <Somnath.Kotur@Avagotech.Com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 23 Dec 2015 12:56:50 +0000 (14:56 +0200)]
IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
Adding RoCE v2 GID type and port type. Vendors
which support this type will get their GID table
populated with RoCE v2 GIDs automatically.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 23 Dec 2015 12:56:49 +0000 (14:56 +0200)]
IB/core: Add gid attributes to sysfs
This patch set adds attributes of net device and gid type to each GID
in the GID table. Users that use verbs directly need to specify
the GID index. Since the same GID could have different types or
associated net devices, users should have the ability to query the
associated GID attributes. Adding these attributes to sysfs.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 23 Dec 2015 12:56:48 +0000 (14:56 +0200)]
IB/cm: Use the source GID index type
Previosuly, cm and cma modules supported only IB and RoCE v1 GID type.
In order to support multiple GID types, the gid_type is passed to
cm_init_av_by_path and stored in the path record.
The rdma cm client would use a default GID type that will be saved in
rdma_id_private.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 23 Dec 2015 12:56:47 +0000 (14:56 +0200)]
IB/core: Add gid_type to gid attribute
In order to support multiple GID types, we need to store the gid_type
with each GID. This is also aligned with the RoCE v2 annex "RoCEv2 PORT
GID table entries shall have a "GID type" attribute that denotes the L3
Address type". The currently supported GID is IB_GID_TYPE_IB which is
also RoCE v1 GID type.
This implies that gid_type should be added to roce_gid_table meta-data.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 28 Oct 2015 14:52:41 +0000 (16:52 +0200)]
IB/core: don't search the GID table twice
Previously, we've searched the GID table twice: first when we searched
the table for a GID matching the proposed new one, and second when we
didn't find a match, we searched again for an empty GID slot in the
table. Instead, search the table once noting the first empty slot as
we search for our target GID.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 28 Oct 2015 14:52:40 +0000 (16:52 +0200)]
IB/core: Change per-entry lock in RoCE GID table to one lock
Previously, IB GID cached used a lock per entry. This could result
in spending a lot of CPU cycles for locking and unlocking just
in order to find a GID. Changing this in favor of one lock per
a GID table.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 28 Oct 2015 14:52:39 +0000 (16:52 +0200)]
IB/core: Refactor GID cache's ib_dispatch_event
Refactor ib_dispatch_event into a new function in order to avoid
duplicating code in the next patch.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Doug Ledford [Tue, 22 Dec 2015 22:03:15 +0000 (17:03 -0500)]
Merge branches '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
drivers/infiniband/ulp/iser/iser_verbs.c
Or Gerlitz [Fri, 18 Dec 2015 08:59:50 +0000 (10:59 +0200)]
IB/core: Remove ib_query_device
The copy of the attributes present on the device is now used by all consumers
except for uverbs in case of serving user-space query, where dev->query_device
is called.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Fri, 18 Dec 2015 08:59:49 +0000 (10:59 +0200)]
staging/o2iblnd: Avoid calling ib_query_device
Instead, use the cached copy of the attributes present on the device.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Fri, 18 Dec 2015 08:59:48 +0000 (10:59 +0200)]
xprtrdma: Avoid calling ib_query_device
Instead, use the cached copy of the attributes present on the device.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Fri, 18 Dec 2015 08:59:47 +0000 (10:59 +0200)]
net/rds: Avoid calling ib_query_device
Instead, use the cached copy of the attributes present on the device.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Fri, 18 Dec 2015 08:59:46 +0000 (10:59 +0200)]
IB/ulps: Avoid calling ib_query_device
Instead, use the cached copy of the attributes present on the device.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Fri, 18 Dec 2015 08:59:45 +0000 (10:59 +0200)]
IB/core: Avoid calling ib_query_device
Use the cached copy of the attributes present on the device, except for
the case of a query originating from user-space, where we have to invoke
the driver query_device entry, so they can fill in their udata.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Fri, 18 Dec 2015 08:59:44 +0000 (10:59 +0200)]
IB/core: Save the device attributes on the device structure
This way both the IB core and upper level drivers can access these cached
device attributes rather than querying or caching them on their own.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Linus Torvalds [Mon, 21 Dec 2015 00:06:09 +0000 (16:06 -0800)]
Linux 4.4-rc6
Linus Torvalds [Sun, 20 Dec 2015 18:01:11 +0000 (10:01 -0800)]
Merge tag 'rtc-4.4-3' of git://git./linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"Late fixes for the RTC subsystem for 4.4:
A fix for a nasty hardware bug in rk808 and an initialization
reordering in da9063 to fix a possible crash"
* tag 'rtc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: da9063: fix access ordering error during RTC interrupt at system power on
rtc: rk808: Compensate for Rockchip calendar deviation on November 31st
Steve Twiss [Tue, 8 Dec 2015 16:28:39 +0000 (16:28 +0000)]
rtc: da9063: fix access ordering error during RTC interrupt at system power on
This fix alters the ordering of the IRQ and device registrations in the RTC
driver probe function. This change will apply to the RTC driver that supports
both DA9063 and DA9062 PMICs.
A problem could occur with the existing RTC driver if:
A system is started from a cold boot using the PMIC RTC IRQ to initiate a
power on operation. For instance, if an RTC alarm is used to start a
platform from power off.
The existing driver IRQ is requested before the device has been properly
registered.
i.e.
ret = devm_request_threaded_irq()
comes before
rtc->rtc_dev = devm_rtc_device_register();
In this case, the interrupt can be called before the device has been
registered and the handler can be called immediately. The IRQ handler
da9063_alarm_event() contains the function call
rtc_update_irq(rtc->rtc_dev, 1, RTC_IRQF | RTC_AF);
which in turn tries to access the unavailable rtc->rtc_dev.
The fix is to reorder the functions inside the RTC probe. The IRQ is
requested after the RTC device resource has been registered so that
get_irq_byname is the last thing to happen.
Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Julius Werner [Tue, 15 Dec 2015 23:02:49 +0000 (15:02 -0800)]
rtc: rk808: Compensate for Rockchip calendar deviation on November 31st
In A.D. 1582 Pope Gregory XIII found that the existing Julian calendar
insufficiently represented reality, and changed the rules about
calculating leap years to account for this. Similarly, in A.D. 2013
Rockchip hardware engineers found that the new Gregorian calendar still
contained flaws, and that the month of November should be counted up to
31 days instead. Unfortunately it takes a long time for calendar changes
to gain widespread adoption, and just like more than 300 years went by
before the last Protestant nation implemented Greg's proposal, we will
have to wait a while until all religions and operating system kernels
acknowledge the inherent advantages of the Rockchip system. Until then
we need to translate dates read from (and written to) Rockchip hardware
back to the Gregorian format.
This patch works by defining Jan 1st, 2016 as the arbitrary anchor date
on which Rockchip and Gregorian calendars are in sync. From that we can
translate arbitrary later dates back and forth by counting the number
of November/December transitons since the anchor date to determine the
offset between the calendars. We choose this method (rather than trying
to regularly "correct" the date stored in hardware) since it's the only
way to ensure perfect time-keeping even if the system may be shut down
for an unknown number of years. The drawback is that other software
reading the same hardware (e.g. mainboard firmware) must use the same
translation convention (including the same anchor date) to be able to
read and write correct timestamps from/to the RTC.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>