platform/kernel/linux-starfive.git
6 years agoRDMA/hns: Fix an error code in hns_roce_v2_init_eq_table()
Dan Carpenter [Mon, 10 Sep 2018 08:35:11 +0000 (11:35 +0300)]
RDMA/hns: Fix an error code in hns_roce_v2_init_eq_table()

The error code isn't set on this path.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of posting
Michael J. Ruhl [Mon, 10 Sep 2018 16:49:27 +0000 (09:49 -0700)]
IB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of posting

The post_send() path determines if it should post directly or, schedule
the post for later.  The current logic is:

  if the swqe ring is empty or (for hfi1) wqe->length <= piothreshold
    post the send
  else
    schedule

This can allow large requests to call the send engine directly.  Large
requests can potentially produce a large number of packets prior to
returning to the caller, blocking the caller from posting more requests,
and allowing better parallel processing.

Allow the driver(s) more say in this logic (pass call_send to the driver,
rather than examining a return value).

Update hfi1/qib logic to schedule the send engine if an RC or UC message
is larger than the QP MTU size.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoinfiniband: remove redundant condition check before debugfs_remove
zhong jiang [Sat, 8 Sep 2018 13:49:31 +0000 (21:49 +0800)]
infiniband: remove redundant condition check before debugfs_remove

debugfs_remove has taken the IS_ERR_OR_NULL into account. Just remove the
unnecessary condition.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Allow creating a matcher for a NIC TX flow table
Mark Bloch [Thu, 6 Sep 2018 14:27:08 +0000 (17:27 +0300)]
RDMA/mlx5: Allow creating a matcher for a NIC TX flow table

Currently a matcher can only be created and attached to a NIC RX flow
table. Extend it to allow it on NIC TX flow tables as well.

In order to achieve that, we:

1) Expose a new attribute: MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS.
   enum ib_flow_flags is used as valid flags. Only
   IB_FLOW_ATTR_FLAGS_EGRESS is supported.

2) Remove the requirement to have a DEVX or QP destination when creating a
   flow. A flow added to NIC TX flow table will forward the packet outside
   of the vport (Wire or E-Switch in the SR-iOV case).

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Add NIC TX namespace when getting a flow table
Mark Bloch [Thu, 6 Sep 2018 14:27:07 +0000 (17:27 +0300)]
RDMA/mlx5: Add NIC TX namespace when getting a flow table

Add the ability to get a NIC TX flow table when using _get_flow_table().
This will allow to create a matcher and a flow rule on the NIC TX path.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Add flow actions support to raw create flow
Mark Bloch [Thu, 6 Sep 2018 14:27:06 +0000 (17:27 +0300)]
RDMA/mlx5: Add flow actions support to raw create flow

Support attaching flow actions to a flow rule via raw create flow.
For now only NIC RX path is supported. This change requires to export
flow resources management functions so we can maintain proper bookkeeping
of flow actions.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Refactor raw flow creation
Mark Bloch [Thu, 6 Sep 2018 14:27:05 +0000 (17:27 +0300)]
RDMA/mlx5: Refactor raw flow creation

Move struct mlx5_flow_act to be passed from the method entry point,
this will allow to add support for flow action for the raw create flow
path.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Don't overwrite action if already set
Mark Bloch [Thu, 6 Sep 2018 14:27:04 +0000 (17:27 +0300)]
RDMA/mlx5: Don't overwrite action if already set

We support only a single action type per flow rule, in case the user passes
the same type of flow actions fail the flow creation.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Refactor flow action parsing to be more generic
Mark Bloch [Thu, 6 Sep 2018 14:27:03 +0000 (17:27 +0300)]
RDMA/mlx5: Refactor flow action parsing to be more generic

Make the parsing of flow actions more generic so it could be used by
mlx5 raw create flow.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/uverbs: Move flow resources initialization
Mark Bloch [Thu, 6 Sep 2018 14:27:02 +0000 (17:27 +0300)]
RDMA/uverbs: Move flow resources initialization

Use ib_set_flow() when initializing flow related resources.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/uverbs: Add IDRs array attribute type to ioctl() interface
Guy Levi [Thu, 6 Sep 2018 14:27:01 +0000 (17:27 +0300)]
IB/uverbs: Add IDRs array attribute type to ioctl() interface

Methods sometimes need to get a flexible set of IDRs and not a strict set
as can be achieved today by the conventional IDR attribute. Add a new
IDRS_ARRAY attribute to the generic uverbs ioctl layer.

IDRS_ARRAY points to array of idrs of the same object type and same access
rights, only write and read are supported.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>``
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Enable attaching packet reformat action to steering flows
Mark Bloch [Sun, 2 Sep 2018 09:51:36 +0000 (12:51 +0300)]
RDMA/mlx5: Enable attaching packet reformat action to steering flows

Any matching rules will be mutated based on the packet reformat context
which is attached to that given flow rule.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Enable reformat on NIC RX if supported
Mark Bloch [Sun, 2 Sep 2018 09:51:35 +0000 (12:51 +0300)]
RDMA/mlx5: Enable reformat on NIC RX if supported

A L3_TUNNEL_TO_L2 decap flow action requires to enable the encap bit on
the flow table, enable it if supported. This will allow to attach those
flow actions to NIC RX steering. We don't enable if running on a
representor.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Enable attaching DECAP action to steering flows
Mark Bloch [Sun, 2 Sep 2018 09:51:34 +0000 (12:51 +0300)]
RDMA/mlx5: Enable attaching DECAP action to steering flows

Any matching packet will be stripped of it's VXLAN tunnel, only the inner
L2 onward is left. The user will receive the decapsulated packet.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Enable decap and packet reformat on flow tables
Mark Bloch [Sun, 2 Sep 2018 09:51:33 +0000 (12:51 +0300)]
RDMA/mlx5: Enable decap and packet reformat on flow tables

If NIC RX flow tables support decap operation, enable it on creation,
This allows to perform decapsulation of tunnelled packets by steering
rules. If NIC TX flow tables support reformat operation, enable it on
creation.

We don't enable those capabilities on representors as the E-Switch should
handle packet modification (can be configured via TC) and as current
hardware can't handle both FDB and NIC flow tables with decap/packet
reformat support.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Enable attaching modify header to steering flows
Mark Bloch [Sun, 2 Sep 2018 09:51:32 +0000 (12:51 +0300)]
RDMA/mlx5: Enable attaching modify header to steering flows

When creating a flow steering rule, allow the user to attach a modify
header action.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Add NIC TX steering support
Mark Bloch [Sun, 2 Sep 2018 09:51:31 +0000 (12:51 +0300)]
RDMA/mlx5: Add NIC TX steering support

Just like ingress steering, allow a user to create steering rules that
match egress vport traffic. We expose the same number of priorities as
the bypass (NIC RX) steering.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Document QP @event_handler function
Chuck Lever [Tue, 4 Sep 2018 15:45:20 +0000 (11:45 -0400)]
RDMA/core: Document QP @event_handler function

Add helpful warning for RDMA consumer implementers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Document CM @event_handler function
Chuck Lever [Tue, 4 Sep 2018 15:45:14 +0000 (11:45 -0400)]
RDMA/core: Document CM @event_handler function

Code audit suggests that the RDMA CM event handler callback function is
_always_ invoked in a context that is safe to block. That's important for
consumer implementers to know, so document that in the comment before
rdma_create_id (where the handler function is set up by the consumer).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Assign device ifindex before publishing the device
Parav Pandit [Thu, 6 Sep 2018 07:58:57 +0000 (10:58 +0300)]
RDMA/core: Assign device ifindex before publishing the device

Even though device->ifindex is assigned before adding the device in the
list which is read by netlink flow, it is better to assign rdma device
index before publishing the device in the system to users and clients.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Follow correct unregister order between sysfs and cgroup
Parav Pandit [Thu, 6 Sep 2018 07:55:31 +0000 (10:55 +0300)]
RDMA/core: Follow correct unregister order between sysfs and cgroup

During register_device() init sequence is,
(a) register with rdma cgroup followed by
(b) register with sysfs

Therefore, unregister_device() sequence should follow the reverse order.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/umem: Restore lockdep check while downgrading lock
Leon Romanovsky [Mon, 3 Sep 2018 17:17:31 +0000 (20:17 +0300)]
RDMA/umem: Restore lockdep check while downgrading lock

Lockdep engine handles correctly downgrade of locks and it simply
incorrect to disable lockdep checks prior to calling mmu_notifier.

Remove lockdep_off and ensure locks correctness.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Define client_data_lock as rwlock instead of spinlock
Parav Pandit [Tue, 28 Aug 2018 12:08:45 +0000 (15:08 +0300)]
RDMA/core: Define client_data_lock as rwlock instead of spinlock

Even though device registration/unregistration and client
registration/unregistration is not a performance path, define the
client_data_lock as rwlock for code clarity.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Use simpler spin lock irq API from blocking context
Parav Pandit [Tue, 28 Aug 2018 12:08:44 +0000 (15:08 +0300)]
RDMA/core: Use simpler spin lock irq API from blocking context

add_client_context(), ib_unregister_device() and ib_unregister_client()
are designed to call from blocking context.  There is no need to save and
restore last interrupt state when called from such blocking context.  Even
though this is not a performance path, using the right spin lock API is
desired for code clarity.

To avoid checkpatch warning while removing flags, sizeof() is used.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Remove context entries from list while unregistering device
Parav Pandit [Tue, 28 Aug 2018 12:08:43 +0000 (15:08 +0300)]
RDMA/core: Remove context entries from list while unregistering device

While unregistering a device, remove the context elements from the list to
not have any stale entries. With that any errors/bugs can be checked when
device is freed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Use simplified list_for_each
Parav Pandit [Tue, 28 Aug 2018 12:08:42 +0000 (15:08 +0300)]
RDMA/core: Use simplified list_for_each

While traversing client_data_list in following conditions, linked list is
only read, no elements of the list are removed.  Therefore, use
list_for_each_entry(), instead of list_for_each_safe().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: No need to protect kfree with spin lock and semaphore
Parav Pandit [Tue, 28 Aug 2018 12:08:41 +0000 (15:08 +0300)]
RDMA/core: No need to protect kfree with spin lock and semaphore

While unregistering a client, only context removal should be protected
with lock. There is no need to protect a freeing of such context which is
already removed from the list.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/{cma, core}: Avoid callback on rdma_addr_cancel()
Parav Pandit [Tue, 28 Aug 2018 11:45:32 +0000 (14:45 +0300)]
RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()

Currently rdma_addr_cancel() is an async operation, which notifies that
cancel is done by executing the callback function given during
rdma_resolve_ip(). If resolve_ip request is already completed than
callback is not executed.

Instead, now rdma_resolve_addr() and rdma_addr_cancel() simplified in
following ways.
1. rdma_addr_cancel() now a synchronous method. If request was
pending, after it is cancelled, no callback is notified.
2. rdma_resolve_addr() and respective addr_handler() callback doesn't
need to hold reference to cm_id.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Rate limit MAD error messages
Parav Pandit [Tue, 28 Aug 2018 11:45:31 +0000 (14:45 +0300)]
RDMA/core: Rate limit MAD error messages

While registering a mad agent, a user space can trigger various errors
and flood the logs.

Therefore, decrease verbosity and rate limit such error messages.
While we are at it, use __func__ to print function name.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/ipoib: Ensure that MTU isn't less than minimum permitted
Muhammad Sammar [Tue, 28 Aug 2018 11:45:30 +0000 (14:45 +0300)]
IB/ipoib: Ensure that MTU isn't less than minimum permitted

It is illegal to change MTU to a value lower than the minimum MTU
stated in ethernet spec. In addition to that we need to add 4 bytes
for encapsulation header (IPOIB_ENCAP_LEN).

Before "ifconfig ib0 mtu 0" command, succeeds while it obviously shouldn't.

Signed-off-by: Muhammad Sammar <muhammads@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/mlx5: Don't hold spin lock while checking device state
Parav Pandit [Tue, 28 Aug 2018 11:45:29 +0000 (14:45 +0300)]
IB/mlx5: Don't hold spin lock while checking device state

mdev->state device state is not protected by the QP for which WRs are
being processed. Therefore, there is no need to hold spin lock while
checking mdev state.

Given that device fatal error is unlikely situation, wrap the condition
check with unlikely().

Additionally, kernel QP1 is also a kernel ULP for which soft CQEs needs
to be generated. Therefore, check for device fatal error before
processing QP1 work requests.

Fixes: 89ea94a7b6c4 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Fail early if unsupported QP is provided
Parav Pandit [Tue, 28 Aug 2018 11:45:28 +0000 (14:45 +0300)]
RDMA/core: Fail early if unsupported QP is provided

When requested QP type is not supported for a {device, port}, return the
error right away before validating all parameters during mad agent
registration time.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoMerge branch 'uverbs_dev_cleanups' into rdma.git for-next
Jason Gunthorpe [Wed, 5 Sep 2018 22:21:22 +0000 (16:21 -0600)]
Merge branch 'uverbs_dev_cleanups' into rdma.git for-next

For dependencies, branch based on rdma.git 'for-rc' of
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/

Pull 'uverbs_dev_cleanups' from Leon Romanovsky:

====================
Reuse the char device code interfaces to simplify ib_uverbs_device
creation and destruction. As part of this series, we are sending fix to
cleanup path, which was discovered during internal review,

The fix definitely can go to -rc, but it means that this series will be
dependent on rdma-rc.
====================

* branch 'uverbs_dev_cleanups':
  RDMA/uverbs: Use device.groups to initialize device attributes
  RDMA/uverbs: Use cdev_device_add() instead of cdev_add()
  RDMA/core: Depend on device_add() to add device attributes
  RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one()

Resolved conflict in ib_device_unregister_sysfs()

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/uverbs: Use device.groups to initialize device attributes
Parav Pandit [Wed, 5 Sep 2018 06:48:00 +0000 (09:48 +0300)]
RDMA/uverbs: Use device.groups to initialize device attributes

Instead of explicitly adding device attribute files and handling such
error conditions, depend on device core layer to create device attributes
files based group pointer NULL terminated array.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/uverbs: Use cdev_device_add() instead of cdev_add()
Parav Pandit [Wed, 5 Sep 2018 06:47:59 +0000 (09:47 +0300)]
RDMA/uverbs: Use cdev_device_add() instead of cdev_add()

Instead of doing two step process to add char device and create underlying
device, use cdev_device_add() which does both.

Currently a kobject per uverbs_device is created to keep reference to its
holding ib_uverbs_device in addition to its underlying device 'dev'.

Instead just use uverbs_device->dev to keep a reference to.

With this change there is single reference tracker for ib_uverbs_device
structure.

This allows for subsequent patch to registers group attribute as well
using single API cdev_device_add().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Depend on device_add() to add device attributes
Parav Pandit [Wed, 5 Sep 2018 06:47:58 +0000 (09:47 +0300)]
RDMA/core: Depend on device_add() to add device attributes

Instead of adding/removing device attribute files, depend on device_add()
which considers adding these device files based on NULL terminated
attributes group array.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one()
Parav Pandit [Wed, 5 Sep 2018 06:47:57 +0000 (09:47 +0300)]
RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one()

If ib_uverbs_create_uapi() fails, dev_num should be freed from the bitmap.

Fixes: 7d96c9b17636 ("IB/uverbs: Have the core code create the uverbs_root_spec")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agobnxt_re: Fix couple of memory leaks that could lead to IOMMU call traces
Somnath Kotur [Wed, 5 Sep 2018 07:50:34 +0000 (13:20 +0530)]
bnxt_re: Fix couple of memory leaks that could lead to IOMMU call traces

1. DMA-able memory allocated for Shadow QP was not being freed.
2. bnxt_qplib_alloc_qp_hdr_buf() had a bug wherein the SQ pointer was
   erroneously pointing to the RQ. But since the corresponding
   free_qp_hdr_buf() was correct, memory being free was less than what was
   allocated.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/core: Replace open-coded variant of get_device
Parav Pandit [Mon, 3 Sep 2018 17:20:25 +0000 (20:20 +0300)]
RDMA/core: Replace open-coded variant of get_device

Reuse existing get_device() API to do it symmetric to already used
put_device() in commit 924b8900a49d ("RDMA/core: Replace open-coded
variant of put_device")

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/uverbs: Declare closing variable as boolean
Leon Romanovsky [Mon, 3 Sep 2018 17:18:03 +0000 (20:18 +0300)]
RDMA/uverbs: Declare closing variable as boolean

The "closing" variable is used as boolean and set to "true" in one
place, update the declaration of that variable and their other
assignment to proper type.

Fixes: e951747a087a ("IB/uverbs: Rework the locking for cleaning up the ucontext")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/nes: Delete impossible debug prints
Leon Romanovsky [Wed, 5 Sep 2018 05:58:31 +0000 (08:58 +0300)]
RDMA/nes: Delete impossible debug prints

The pci-core and net-core logic ensure that parameters provided
to nes_probe() and nes_netdev_open() are valid, hence the assert
print are not possible.

Cc: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/qedr: remove set but not used variable 'ctx'
YueHaibing [Sat, 1 Sep 2018 03:53:46 +0000 (03:53 +0000)]
RDMA/qedr: remove set but not used variable 'ctx'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/qedr/verbs.c: In function 'qedr_create_srq':
drivers/infiniband/hw/qedr/verbs.c:1450:24: warning:
 variable 'ctx' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Rahul Verma <rahul.verma@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/srp: Remove unnecessary unlikely()
Igor Stoppa [Fri, 31 Aug 2018 20:16:45 +0000 (23:16 +0300)]
IB/srp: Remove unnecessary unlikely()

WARN_ON() already contains an unlikely(), so it's not necessary to wrap it
into another.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Add an unbound WQ type to the new CQ API
Jack Morgenstein [Mon, 27 Aug 2018 05:35:55 +0000 (08:35 +0300)]
IB/core: Add an unbound WQ type to the new CQ API

The upstream kernel commit cited below modified the workqueue in the
new CQ API to be bound to a specific CPU (instead of being unbound).
This caused ALL users of the new CQ API to use the same bound WQ.

Specifically, MAD handling was severely delayed when the CPU bound
to the WQ was busy handling (higher priority) interrupts.

This caused a delay in the MAD "heartbeat" response handling,
which resulted in ports being incorrectly classified as "down".

To fix this, add a new "unbound" WQ type to the new CQ API, so that users
have the option to choose either a bound WQ or an unbound WQ.

For MADs, choose the new "unbound" WQ.

Fixes: b7363e67b23e ("IB/device: Convert ib-comp-wq to be CPU-bound")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.m>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/bnxt_re: QPLIB: Add and use #define dev_fmt(fmt) "QPLIB: " fmt
Joe Perches [Fri, 10 Aug 2018 18:42:46 +0000 (11:42 -0700)]
RDMA/bnxt_re: QPLIB: Add and use #define dev_fmt(fmt) "QPLIB: " fmt

Consistently use the "QPLIB: " prefix for dev_<level> logging.

Miscellanea:

o Add missing newlines to avoid possible message interleaving
o Coalesce consecutive dev_<level> uses that emit a message header to
  avoid < 80 column lengths and mistakenly output on multiple lines
o Reflow modified lines to use 80 columns where appropriate
o Consistently use "%s: " where __func__ is output
o QPLIB: is now always output immediately after the dev_<level> header

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/ipoib: Avoid a race condition between start_xmit and cm_rep_handler
Aaron Knister [Fri, 24 Aug 2018 12:42:46 +0000 (08:42 -0400)]
IB/ipoib: Avoid a race condition between start_xmit and cm_rep_handler

Inside of start_xmit() the call to check if the connection is up and the
queueing of the packets for later transmission is not atomic which leaves
a window where cm_rep_handler can run, set the connection up, dequeue
pending packets and leave the subsequently queued packets by start_xmit()
sitting on neigh->queue until they're dropped when the connection is torn
down. This only applies to connected mode. These dropped packets can
really upset TCP, for example, and cause multi-minute delays in
transmission for open connections.

Here's the code in start_xmit where we check to see if the connection is
up:

       if (ipoib_cm_get(neigh)) {
               if (ipoib_cm_up(neigh)) {
                       ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
                       goto unref;
               }
       }

The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet in
start_xmit where packets are queued.

       if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
               push_pseudo_header(skb, phdr->hwaddr);
               spin_lock_irqsave(&priv->lock, flags);
               __skb_queue_tail(&neigh->queue, skb);
               spin_unlock_irqrestore(&priv->lock, flags);
       } else {
               ++dev->stats.tx_dropped;
               dev_kfree_skb_any(skb);
       }

The patch acquires the netif tx lock in cm_rep_handler for the section
where it sets the connection up and dequeues and retransmits deferred
skb's.

Fixes: 839fcaba355a ("IPoIB: Connected mode experimental support")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Knister <aaron.s.knister@nasa.gov>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoMerge branch 'mlx5-flow-mutate' into rdma.git for-next
Jason Gunthorpe [Wed, 5 Sep 2018 21:24:58 +0000 (15:24 -0600)]
Merge branch 'mlx5-flow-mutate' into rdma.git for-next

For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull Flow actions to mutate packets from Leon Romanovsky:

====================
This series exposes the ability to create flow actions which can
mutate packet headers. We do that by exposing two new verbs:
 * modify header - can change existing packet headers. packet
 * reformat - can encapsulate or decapsulate a packet.
              Once created a flow action must be attached to a steering
              rule for it to take effect.

The first 10 patches refactor mlx5_core code, rename internal structures
to better reflect their operation and export needed functions so the RDMA
side can allocate the action.

The last 5 patches expose via the IOCTL infrastructure mlx5_ib methods
which do the actual allocation of resources and return an handle to the
user. A user of this API is expected to know how to work with the device's
spec as the input to those function is HW depended.

An example usage of the modify header action is routing, A user can create
an action which edits the L2 header and decrease the TTL.

An example usage of the packet reformat action is VXLAN encap/decap which
is done by the HW.
====================

* branch 'mlx5-flow-mutate':
  RDMA/mlx5: Extend packet reformat verbs
  RDMA/mlx5: Add new flow action verb - packet reformat
  RDMA/uverbs: Add generic function to fill in flow action object
  RDMA/mlx5: Add a new flow action verb - modify header
  RDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language
  net/mlx5: Export packet reformat alloc/dealloc functions
  net/mlx5: Pass a namespace for packet reformat ID allocation
  net/mlx5: Expose new packet reformat capabilities
  {net, RDMA}/mlx5: Rename encap to reformat packet
  net/mlx5: Move header encap type to IFC header file
  net/mlx5: Break encap/decap into two separated flow table creation flags
  net/mlx5: Add support for more namespaces when allocating modify header
  net/mlx5: Export modify header alloc/dealloc functions
  net/mlx5: Add proper NIC TX steering flow tables support
  net/mlx5: Cleanup flow namespace getter switch logic

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Extend packet reformat verbs
Mark Bloch [Tue, 28 Aug 2018 11:18:54 +0000 (14:18 +0300)]
RDMA/mlx5: Extend packet reformat verbs

We expose new actions:

L2_TO_L2_TUNNEL - A generic encap from L2 to L2, the data passed should
  be the encapsulating headers.

L3_TUNNEL_TO_L2 - Will do decap where the inner packet starts from L3,
  the data should be mac or mac + vlan (14 or 18 bytes).

L2_TO_L3_TUNNEL - Will do encap where is L2 of the original packet will
  not be included, the data should be the encapsulating
  header.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Add new flow action verb - packet reformat
Mark Bloch [Tue, 28 Aug 2018 11:18:53 +0000 (14:18 +0300)]
RDMA/mlx5: Add new flow action verb - packet reformat

For now, only add L2_TUNNEL_TO_L2 option. This will allow to perform
generic decap operation if the encapsulating protocol is L2 based, and the
inner packet is also L2 based. For example this can be used to decap VXLAN
packets.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/uverbs: Add generic function to fill in flow action object
Mark Bloch [Tue, 28 Aug 2018 11:18:52 +0000 (14:18 +0300)]
RDMA/uverbs: Add generic function to fill in flow action object

Refactor the initialization of a flow action object to a common function.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/mlx5: Add a new flow action verb - modify header
Mark Bloch [Tue, 28 Aug 2018 11:18:51 +0000 (14:18 +0300)]
RDMA/mlx5: Add a new flow action verb - modify header

Expose the ability to create a flow action which changes packet
headers. The data passed from userspace should be modify header actions as
defined by HW specification.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language
Mark Bloch [Tue, 28 Aug 2018 11:18:50 +0000 (14:18 +0300)]
RDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language

This makes it clear and safe to access constants passed in from user
space. We define a consistent ABI of u64 for all constants, and verify
that the data passed in can be represented by the type the user supplies.

The expectation is this will always be used with an enum declaring the
constant values, and the user will use the enum type as input to the
accessor.

To retrieve the attribute value we introduce two helper calls - one
standard which may fail if attribute is not valid and one where caller can
provide a default value which will be used in case the attribute is not
valid (useful when attribute is optional).

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Export packet reformat alloc/dealloc functions
Mark Bloch [Tue, 28 Aug 2018 11:18:49 +0000 (14:18 +0300)]
net/mlx5: Export packet reformat alloc/dealloc functions

This will allow for the RDMA side to allocate packet reformat context.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Pass a namespace for packet reformat ID allocation
Mark Bloch [Tue, 28 Aug 2018 11:18:48 +0000 (14:18 +0300)]
net/mlx5: Pass a namespace for packet reformat ID allocation

Currently we attach packet reformat actions only to the FDB namespace.
In preparation to be able to use that for NIC steering, pass the actual
namespace as a parameter.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Expose new packet reformat capabilities
Mark Bloch [Tue, 28 Aug 2018 11:18:47 +0000 (14:18 +0300)]
net/mlx5: Expose new packet reformat capabilities

Expose new abilities when creating a packet reformat context.

The new types which can be created are:
MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL: Ability to create generic encap
operation to be done by the HW.

MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2: Ability to create generic decap
operation where the inner packet doesn't contain L2.

MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL: Ability to create generic encap
operation to be done by the HW. The L2 of the original packet
is dropped.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years ago{net, RDMA}/mlx5: Rename encap to reformat packet
Mark Bloch [Tue, 28 Aug 2018 11:18:46 +0000 (14:18 +0300)]
{net, RDMA}/mlx5: Rename encap to reformat packet

Renames all encap mlx5_{core,ib} code to use the new naming of packet
reformat. This change doesn't introduce any function change and is
needed to properly reflect the operation being done by this action.
For example not only can we encapsulate a packet, but also decapsulate it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Move header encap type to IFC header file
Mark Bloch [Tue, 28 Aug 2018 11:18:45 +0000 (14:18 +0300)]
net/mlx5: Move header encap type to IFC header file

Those bits are hardware specification and should be defined in the
IFC header file.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Break encap/decap into two separated flow table creation flags
Mark Bloch [Tue, 28 Aug 2018 11:18:44 +0000 (14:18 +0300)]
net/mlx5: Break encap/decap into two separated flow table creation flags

Today we are able to attach encap and decap actions only to the FDB. In
preparation to enable those actions on the NIC flow tables, break the
single flag into two. Those flags control whatever a decap or encap
operations can be attached to the flow table created. For FDB, if
encapsulation is required, we set both of them.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Add support for more namespaces when allocating modify header
Mark Bloch [Tue, 28 Aug 2018 11:18:43 +0000 (14:18 +0300)]
net/mlx5: Add support for more namespaces when allocating modify header

There are RX and TX flow steering namespaces with different number of
actions. Initialize them accordingly.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Export modify header alloc/dealloc functions
Mark Bloch [Tue, 28 Aug 2018 11:18:42 +0000 (14:18 +0300)]
net/mlx5: Export modify header alloc/dealloc functions

Those functions will be used by the RDMA side to create modify header
actions to be attached to flow steering rules via verbs.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Add proper NIC TX steering flow tables support
Mark Bloch [Tue, 28 Aug 2018 11:18:41 +0000 (14:18 +0300)]
net/mlx5: Add proper NIC TX steering flow tables support

Extend the ability to add steering rules to NIC TX flow tables.
For now, we are only adding TX bypass (egress) which is used by the RDMA
side. This will allow to shape outgoing traffic and tweak it if needed, for
example performing encapsulation or rewriting headers.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Cleanup flow namespace getter switch logic
Mark Bloch [Tue, 28 Aug 2018 11:18:40 +0000 (14:18 +0300)]
net/mlx5: Cleanup flow namespace getter switch logic

Refactor the switch logic so it's simpler to follow and understand.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoIB/mlx5: Change TX affinity assignment in RoCE LAG mode
Majd Dibbiny [Tue, 28 Aug 2018 11:29:05 +0000 (14:29 +0300)]
IB/mlx5: Change TX affinity assignment in RoCE LAG mode

In the current code, the TX affinity is per RoCE device, which can cause
unfairness between different contexts. e.g. if we open two contexts, and
each open 10 QPs concurrently, all of the QPs of the first context might
end up on the first port instead of distributed on the two ports as
expected

To overcome this unfairness between processes, we maintain per device TX
affinity, and per process TX affinity.

The allocation algorithm is as follow:

1. Hold two tx_port_affinity atomic variables, one per RoCE device and one
   per ucontext. Both initialized to 0.

2. In mlx5_ib_alloc_ucontext do:
 2.1. ucontext.tx_port_affinity = device.tx_port_affinity
 2.2. device.tx_port_affinity += 1

3. In modify QP INIT2RST:
 3.1. qp.tx_port_affinity = ucontext.tx_port_affinity % MLX5_PORT_NUM
 3.2. ucontext.tx_port_affinity += 1

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoiw_cxgb4: only allow 1 flush on user qps
Steve Wise [Fri, 31 Aug 2018 14:15:56 +0000 (07:15 -0700)]
iw_cxgb4: only allow 1 flush on user qps

Once the qp has been flushed, it cannot be flushed again.  The user qp
flush logic wasn't enforcing it however.  The bug can cause
touch-after-free crashes like:

Unable to handle kernel paging request for data at address 0x000001ec
Faulting instruction address: 0xc008000016069100
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4]
LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
Call Trace:
[c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
[c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4]
[c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core]
[c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core]
[c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm]
[c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm]
[c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm]
[c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm]
[c000000000444da4] __fput+0xe4/0x2f0

So fix flush_qp() to only flush the wq once.

Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoIB/core: Release object lock if destroy failed
Artemy Kovalyov [Tue, 28 Aug 2018 11:40:32 +0000 (14:40 +0300)]
IB/core: Release object lock if destroy failed

The object lock was supposed to always be released during destroy, but
when the destruction retry series was integrated with the destroy series
it created a failure path that missed the unlock.

Keep with convention, if destroy fails the caller must undo all locking.

Fixes: 87ad80abc70d ("IB/uverbs: Consolidate uobject destruction")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agoRDMA/ucma: check fd type in ucma_migrate_id()
Jann Horn [Mon, 3 Sep 2018 16:54:14 +0000 (18:54 +0200)]
RDMA/ucma: check fd type in ucma_migrate_id()

The current code grabs the private_data of whatever file descriptor
userspace has supplied and implicitly casts it to a `struct ucma_file *`,
potentially causing a type confusion.

This is probably fine in practice because the pointer is only used for
comparisons, it is never actually dereferenced; and even in the
comparisons, it is unlikely that a file from another filesystem would have
a ->private_data pointer that happens to also be valid in this context.
But ->private_data is not always guaranteed to be a valid pointer to an
object owned by the file's filesystem; for example, some filesystems just
cram numbers in there.

Check the type of the supplied file descriptor to be safe, analogous to how
other places in the kernel do it.

Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
6 years agonet/mlx5: Add memic command opcode to command checker
Ariel Levkovich [Mon, 3 Sep 2018 17:19:48 +0000 (20:19 +0300)]
net/mlx5: Add memic command opcode to command checker

Adding the alloc/dealloc memic FW command opcodes to
avoid "unknown command" prints in the command string
converter and internal error status handler.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agonet/mlx5: Fix atomic_mode enum values
Moni Shoua [Mon, 3 Sep 2018 17:19:28 +0000 (20:19 +0300)]
net/mlx5: Fix atomic_mode enum values

The field atomic_mode is 4 bits wide and therefore can hold values
from 0x0 to 0xf. Remove the unnecessary 20 bit shift that made the values
be incorrect. While that, remove unused enum values.

Fixes: 57cda166bbe0 ("net/mlx5: Add DCT command interface")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
6 years agoLinux 4.19-rc2
Linus Torvalds [Sun, 2 Sep 2018 21:37:30 +0000 (14:37 -0700)]
Linux 4.19-rc2

6 years agoMerge tag 'devicetree-fixes-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 2 Sep 2018 17:56:01 +0000 (10:56 -0700)]
Merge tag 'devicetree-fixes-for-4.19' of git://git./linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:
 "A couple of new helper functions in preparation for some tree wide
  clean-ups.

  I'm sending these new helpers now for rc2 in order to simplify the
  dependencies on subsequent cleanups across the tree in 4.20"

* tag 'devicetree-fixes-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: Add device_type access helper functions
  of: add node name compare helper functions
  of: add helper to lookup compatible child node

6 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Sun, 2 Sep 2018 17:44:28 +0000 (10:44 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "First batch of fixes post-merge window:

   - A handful of devicetree changes for i.MX2{3,8} to change over to
     new panel bindings. The platforms were moved from legacy
     framebuffers to DRM and some development board panels hadn't yet
     been converted.

   - OMAP fixes related to ti-sysc driver conversion fallout, fixing
     some register offsets, no_console_suspend fixes, etc.

   - Droid4 changes to fix flaky eMMC probing and vibrator DTS mismerge.

   - Fixed 0755->0644 permissions on a newly added file.

   - Defconfig changes to make ARM Versatile more useful with QEMU
     (helps testing).

   - Enable defconfig options for new TI SoC platform that was merged
     this window (AM6)"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: defconfig: Enable TI's AM6 SoC platform
  ARM: defconfig: Update the ARM Versatile defconfig
  ARM: dts: omap4-droid4: Fix emmc errors seen on some devices
  ARM: dts: Fix file permission for am335x-osd3358-sm-red.dts
  ARM: imx_v6_v7_defconfig: Select CONFIG_DRM_PANEL_SEIKO_43WVF1G
  ARM: mxs_defconfig: Select CONFIG_DRM_PANEL_SEIKO_43WVF1G
  ARM: dts: imx23-evk: Convert to the new display bindings
  ARM: dts: imx23-evk: Move regulators outside simple-bus
  ARM: dts: imx28-evk: Convert to the new display bindings
  ARM: dts: imx28-evk: Move regulators outside simple-bus
  Revert "ARM: dts: imx7d: Invert legacy PCI irq mapping"
  arm: dts: am4372: setup rtc as system-power-controller
  ARM: dts: omap4-droid4: fix vibrations on Droid 4
  bus: ti-sysc: Fix no_console_suspend handling
  bus: ti-sysc: Fix module register ioremap for larger offsets
  ARM: OMAP2+: Fix module address for modules using mpu_rt_idx
  ARM: OMAP2+: Fix null hwmod for ti-sysc debug

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 2 Sep 2018 17:11:30 +0000 (10:11 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Speculation:

   - Make the microcode check more robust

   - Make the L1TF memory limit depend on the internal cache physical
     address space and not on the CPUID advertised physical address
     space, which might be significantly smaller. This avoids disabling
     L1TF on machines which utilize the full physical address space.

   - Fix the GDT mapping for EFI calls on 32bit PTI

   - Fix the MCE nospec implementation to prevent #GP

  Fixes and robustness:

   - Use the proper operand order for LSL in the VDSO

   - Prevent NMI uaccess race against CR3 switching

   - Add a lockdep check to verify that text_mutex is held in
     text_poke() functions

   - Repair the fallout of giving native_restore_fl() a prototype

   - Prevent kernel memory dumps based on usermode RIP

   - Wipe KASAN shadow stack before rewinding the stack to prevent false
     positives

   - Move the AMS GOTO enforcement to the actual build stage to allow
     user API header extraction without a compiler

   - Fix a section mismatch introduced by the on demand VDSO mapping
     change

  Miscellaneous:

   - Trivial typo, GCC quirk removal and CC_SET/OUT() cleanups"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pti: Fix section mismatch warning/error
  x86/vdso: Fix lsl operand order
  x86/mce: Fix set_mce_nospec() to avoid #GP fault
  x86/efi: Load fixmap GDT in efi_call_phys_epilog()
  x86/nmi: Fix NMI uaccess race against CR3 switching
  x86: Allow generating user-space headers without a compiler
  x86/dumpstack: Don't dump kernel memory based on usermode RIP
  x86/asm: Use CC_SET()/CC_OUT() in __gen_sigismember()
  x86/alternatives: Lockdep-enforce text_mutex in text_poke*()
  x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit()
  x86/irqflags: Mark native_restore_fl extern inline
  x86/build: Remove jump label quirk for GCC older than 4.5.2
  x86/Kconfig: Fix trivial typo
  x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+
  x86/spectre: Add missing family 6 check to microcode check

6 years agoMerge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 2 Sep 2018 17:09:35 +0000 (10:09 -0700)]
Merge branch 'smp-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull CPU hotplug fix from Thomas Gleixner:
 "Remove the stale skip_onerr member from the hotplug states"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Remove skip_onerr field from cpuhp_step structure

6 years agoMerge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 2 Sep 2018 16:41:45 +0000 (09:41 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull core fixes from Thomas Gleixner:
 "A small set of updates for core code:

   - Prevent tracing in functions which are called from trace patching
     via stop_machine() to prevent executing half patched function trace
     entries.

   - Remove old GCC workarounds

   - Remove pointless includes of notifier.h"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Remove workaround for unreachable warnings from old GCC
  notifier: Remove notifier header file wherever not used
  watchdog: Mark watchdog touch functions as notrace

6 years agox86/pti: Fix section mismatch warning/error
Randy Dunlap [Sun, 2 Sep 2018 04:01:28 +0000 (21:01 -0700)]
x86/pti: Fix section mismatch warning/error

Fix the section mismatch warning in arch/x86/mm/pti.c:

WARNING: vmlinux.o(.text+0x6972a): Section mismatch in reference from the function pti_clone_pgtable() to the function .init.text:pti_user_pagetable_walk_pte()
The function pti_clone_pgtable() references
the function __init pti_user_pagetable_walk_pte().
This is often because pti_clone_pgtable lacks a __init
annotation or the annotation of pti_user_pagetable_walk_pte is wrong.
FATAL: modpost: Section mismatches detected.

Fixes: 85900ea51577 ("x86/pti: Map the vsyscall page if needed")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/43a6d6a3-d69d-5eda-da09-0b1c88215a2a@infradead.org
6 years agoMerge tag 'omap-for-v4.19/fixes-v2-signed' of git://git.kernel.org/pub/scm/linux...
Olof Johansson [Sun, 2 Sep 2018 01:22:19 +0000 (18:22 -0700)]
Merge tag 'omap-for-v4.19/fixes-v2-signed' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

Fixes for omap variants against v4.19-rc1

These are mostly fixes related to using ti-sysc interconnect target module
driver for accessing right register offsets for sgx and cpsw and for
no_console_suspend regression.

There is also a droid4 emmc fix where emmc may not get detected for some
models, and vibrator dts mismerge fix.

And we have a file permission fix for am335x-osd3358-sm-red.dts that
just got added. And we must tag RTC as system-power-controller for
am437x for PMIC to shut down during poweroff.

* tag 'omap-for-v4.19/fixes-v2-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: omap4-droid4: Fix emmc errors seen on some devices
  ARM: dts: Fix file permission for am335x-osd3358-sm-red.dts
  arm: dts: am4372: setup rtc as system-power-controller
  ARM: dts: omap4-droid4: fix vibrations on Droid 4
  bus: ti-sysc: Fix no_console_suspend handling
  bus: ti-sysc: Fix module register ioremap for larger offsets
  ARM: OMAP2+: Fix module address for modules using mpu_rt_idx
  ARM: OMAP2+: Fix null hwmod for ti-sysc debug

Signed-off-by: Olof Johansson <olof@lixom.net>
6 years agox86/vdso: Fix lsl operand order
Samuel Neves [Sat, 1 Sep 2018 20:14:52 +0000 (21:14 +0100)]
x86/vdso: Fix lsl operand order

In the __getcpu function, lsl is using the wrong target and destination
registers. Luckily, the compiler tends to choose %eax for both variables,
so it has been working so far.

Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt
6 years agoMerge tag 'linux-watchdog-4.19-rc2' of git://www.linux-watchdog.org/linux-watchdog
Linus Torvalds [Sat, 1 Sep 2018 20:17:15 +0000 (13:17 -0700)]
Merge tag 'linux-watchdog-4.19-rc2' of git://linux-watchdog.org/linux-watchdog

Pull watchdog fixlet from Wim Van Sebroeck:
 "Document support for r8a774a1"

* tag 'linux-watchdog-4.19-rc2' of git://www.linux-watchdog.org/linux-watchdog:
  dt-bindings: watchdog: renesas-wdt: Document r8a774a1 support

6 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 1 Sep 2018 20:03:32 +0000 (13:03 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "Two small fixes, one for the x86 Stoney SoC to get a more accurate clk
  frequency and the other to fix a bad allocation in the Nuvoton NPCM7XX
  driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: x86: Set default parent to 48Mhz
  clk: npcm7xx: fix memory allocation

6 years agox86/mce: Fix set_mce_nospec() to avoid #GP fault
LuckTony [Fri, 31 Aug 2018 16:55:06 +0000 (09:55 -0700)]
x86/mce: Fix set_mce_nospec() to avoid #GP fault

The trick with flipping bit 63 to avoid loading the address of the 1:1
mapping of the poisoned page while the 1:1 map is updated used to work when
unmapping the page. But it falls down horribly when attempting to directly
set the page as uncacheable.

The problem is that when the cache mode is changed to uncachable, the pages
needs to be flushed from the cache first. But the decoy address is
non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a
#GP fault.

Add code to change_page_attr_set_clr() to fix the address before calling
flush.

Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk
6 years agoIB/hfi1: Move URGENT IRQ enable to hfi1_rcvctrl()
Michael J. Ruhl [Thu, 16 Aug 2018 06:04:32 +0000 (23:04 -0700)]
IB/hfi1: Move URGENT IRQ enable to hfi1_rcvctrl()

User contexts use the receive URGENT interrupt.  However, enabling
the IRQ SRC in the file_ops module is not as clean as it could be.

Augment the _rcvctl() function to be able to enable/disable the IRQ
source.

Use the new interface from file_ops to enable/disable the IRQ.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Rework the IRQ API to be more flexible
Michael J. Ruhl [Thu, 16 Aug 2018 06:04:22 +0000 (23:04 -0700)]
IB/hfi1: Rework the IRQ API to be more flexible

The current IRQ API is an all or nothing interface.  This has two
problems:

  1. All IRQs are enabled regardless of use
  2. Moving from general interrupt to MSIx handling is difficult

Introduce a new API to enable/disable specific IRQs or a range of IRQs.

Do not enable and disable all IRQs in one step.

Rework various modules to enable/disable IRQs when needed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: PCIe bus width retry
Kamenee Arumugam [Thu, 16 Aug 2018 06:04:13 +0000 (23:04 -0700)]
IB/hfi1: PCIe bus width retry

Retry the PCIe link training up to 'pcie_retry' times
if the PCIe link width is narrower than the previous width.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Make the MSIx resource allocation a bit more flexible
Michael J. Ruhl [Thu, 16 Aug 2018 06:04:04 +0000 (23:04 -0700)]
IB/hfi1: Make the MSIx resource allocation a bit more flexible

The current method of allocating MSIx resources is a bit cumbersome,
and not very easily added to.

Refactor and re-order the code paths into a more consistent interface.

Update the interface so that allocations are not order dependent.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Prepare for new HFI1 MSIx API
Michael J. Ruhl [Thu, 16 Aug 2018 13:28:40 +0000 (06:28 -0700)]
IB/hfi1: Prepare for new HFI1 MSIx API

The current HFI1 MSIx API is difficult to follow, change, or add to.

In anticipation of moving to an more flexible API, move the current
MSIx functionality to the new msix.c module.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Get the hfi1_devdata structure as early as possible
Michael J. Ruhl [Thu, 16 Aug 2018 06:03:46 +0000 (23:03 -0700)]
IB/hfi1: Get the hfi1_devdata structure as early as possible

Currently several things occur before the hfi1_devdata structure is
allocated.  This leads to an inconsistent logging ability and makes
it more difficult to restructure some code paths.

Allocate (and do a minimal init) the structure as soon as possible.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: tune_pcie_caps is arbitrarily placed, poorly
Michael J. Ruhl [Thu, 16 Aug 2018 05:58:49 +0000 (22:58 -0700)]
IB/hfi1: tune_pcie_caps is arbitrarily placed, poorly

The tune_pcie_caps needs to occur sometime after PCI is enabled, but
before the HFI is enabled.  Currently it is placed in the MSIx
allocation code which doesn't really fit. Moving it to just after
the gen3 bump.

Clean up the associated code (modules, etc.).

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Remove duplicated defines
Michael J. Ruhl [Thu, 16 Aug 2018 05:58:40 +0000 (22:58 -0700)]
IB/hfi1: Remove duplicated defines

TXREQ defines are duplicated, incompletely, in the sdma header file.

Remove duplicate defines.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoIB/hfi1: Rework file list in Makefile
Dennis Dalessandro [Thu, 16 Aug 2018 05:58:31 +0000 (22:58 -0700)]
IB/hfi1: Rework file list in Makefile

We want to keep files in alphabetical order in our makefile, however this
just makes for messy diffs when adding (or removing) files. Let's just clean
this up and make it line by line.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
6 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 31 Aug 2018 16:20:30 +0000 (09:20 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "A few arm64 fixes came in this week, specifically fixing some nasty
  truncation of return values from firmware calls and resolving a
  VM_BUG_ON due to accessing uninitialised struct pages corresponding to
  NOMAP pages.

  Summary:

   - Fix typos in SVE documentation

   - Fix type-checking and implicit truncation for SMCCC calls

   - Force CONFIG_HOLES_IN_ZONE=y so that SLAB doesn't fall over NOMAP
     regions"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: always enable CONFIG_HOLES_IN_ZONE
  arm/arm64: smccc-1.1: Handle function result as parameters
  arm/arm64: smccc-1.1: Make return values unsigned long
  Documentation/arm64/sve: Couple of improvements and typos

6 years agox86/efi: Load fixmap GDT in efi_call_phys_epilog()
Joerg Roedel [Fri, 31 Aug 2018 08:05:38 +0000 (10:05 +0200)]
x86/efi: Load fixmap GDT in efi_call_phys_epilog()

When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap
for the simple reason that this address is also mapped for user-space.

The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT
to call EFI runtime services and switch back to the kernel GDT when they
return. But the switch-back uses the writable GDT, not the fixmap GDT.

When that happened and and the CPU returns to user-space it switches to the
user %cr3 and tries to restore user segment registers. This fails because
the writable GDT is not mapped in the user page-table, and without a GDT
the fault handlers also can't be launched. The result is a triple fault and
reboot of the machine.

Fix that by restoring the GDT back to the fixmap GDT which is also mapped
in the user page-table.

Fixes: 7757d607c6b3 x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: hpa@zytor.com
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org
6 years agoMerge tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 31 Aug 2018 15:45:16 +0000 (08:45 -0700)]
Merge tag 'for-linus-4.19b-rc2-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - minor cleanup avoiding a warning when building with new gcc

 - a patch to add a new sysfs node for Xen frontend/backend drivers to
   make it easier to obtain the state of a pv device

 - two fixes for 32-bit pv-guests to avoid intermediate L1TF vulnerable
   PTEs

* tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: remove redundant variable save_pud
  xen: export device state to sysfs
  x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear
  x86/xen: don't write ptes directly in 32-bit PV guests

6 years agoMerge tag 'm68k-for-v4.19-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 31 Aug 2018 15:42:46 +0000 (08:42 -0700)]
Merge tag 'm68k-for-v4.19-tag2' of git://git./linux/kernel/git/geert/linux-m68k

Pull m68k fix from Geert Uytterhoeven:
 "Just a single fix for a bug introduced during the merge window: fix
  wrong date and time on PMU-based Macs"

* tag 'm68k-for-v4.19-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k/mac: Use correct PMU response format

6 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Fri, 31 Aug 2018 15:38:53 +0000 (08:38 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

 - regression fixes for i801 and designware

 - better API and leak fix for releasing DMA safe buffers

 - better greppable strings for the bitbang algorithm

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: sh_mobile: fix leak when using DMA bounce buffer
  i2c: sh_mobile: define start_ch() void as it only returns 0 anyhow
  i2c: refactor function to release a DMA safe buffer
  i2c: algos: bit: make the error messages grepable
  i2c: designware: Re-init controllers with pm_disabled set on resume
  i2c: i801: Allow ACPI AML access I/O ports not reserved for SMBus

6 years agox86/nmi: Fix NMI uaccess race against CR3 switching
Andy Lutomirski [Wed, 29 Aug 2018 15:47:18 +0000 (08:47 -0700)]
x86/nmi: Fix NMI uaccess race against CR3 switching

A NMI can hit in the middle of context switching or in the middle of
switch_mm_irqs_off().  In either case, CR3 might not match current->mm,
which could cause copy_from_user_nmi() and friends to read the wrong
memory.

Fix it by adding a new nmi_uaccess_okay() helper and checking it in
copy_from_user_nmi() and in __copy_from_user_nmi()'s callers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
6 years agox86: Allow generating user-space headers without a compiler
Ben Hutchings [Wed, 29 Aug 2018 19:43:17 +0000 (20:43 +0100)]
x86: Allow generating user-space headers without a compiler

When bootstrapping an architecture, it's usual to generate the kernel's
user-space headers (make headers_install) before building a compiler.  Move
the compiler check (for asm goto support) to the archprepare target so that
it is only done when building code for the target.

Fixes: e501ce957a78 ("x86: Force asm-goto")
Reported-by: Helmut Grohne <helmutg@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk
6 years agox86/dumpstack: Don't dump kernel memory based on usermode RIP
Jann Horn [Tue, 28 Aug 2018 15:49:01 +0000 (17:49 +0200)]
x86/dumpstack: Don't dump kernel memory based on usermode RIP

show_opcodes() is used both for dumping kernel instructions and for dumping
user instructions. If userspace causes #PF by jumping to a kernel address,
show_opcodes() can be reached with regs->ip controlled by the user,
pointing to kernel code. Make sure that userspace can't trick us into
dumping kernel memory into dmesg.

Fixes: 7cccf0725cf7 ("x86/dumpstack: Add a show_ip() function")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: security@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
6 years agoof: Add device_type access helper functions
Rob Herring [Tue, 28 Aug 2018 20:10:48 +0000 (15:10 -0500)]
of: Add device_type access helper functions

In preparation to remove direct access to device_node.type, add
of_node_is_type() and of_node_get_device_type() helpers to check and
retrieve the device type.

Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
6 years agocpu/hotplug: Remove skip_onerr field from cpuhp_step structure
Mukesh Ojha [Tue, 28 Aug 2018 06:54:54 +0000 (12:24 +0530)]
cpu/hotplug: Remove skip_onerr field from cpuhp_step structure

When notifiers were there, `skip_onerr` was used to avoid calling
particular step startup/teardown callbacks in the CPU up/down rollback
path, which made the hotplug asymmetric.

As notifiers are gone now after the full state machine conversion, the
`skip_onerr` field is no longer required.

Remove it from the structure and its usage.

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1535439294-31426-1-git-send-email-mojha@codeaurora.org
6 years agoarm64: mm: always enable CONFIG_HOLES_IN_ZONE
James Morse [Thu, 30 Aug 2018 15:05:32 +0000 (16:05 +0100)]
arm64: mm: always enable CONFIG_HOLES_IN_ZONE

Commit 6d526ee26ccd ("arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA")
only enabled HOLES_IN_ZONE for NUMA systems because the NUMA code was
choking on the missing zone for nomap pages. This problem doesn't just
apply to NUMA systems.

If the architecture doesn't set HAVE_ARCH_PFN_VALID, pfn_valid() will
return true if the pfn is part of a valid sparsemem section.

When working with multiple pages, the mm code uses pfn_valid_within()
to test each page it uses within the sparsemem section is valid. On
most systems memory comes in MAX_ORDER_NR_PAGES chunks which all
have valid/initialised struct pages. In this case pfn_valid_within()
is optimised out.

Systems where this isn't true (e.g. due to nomap) should set
HOLES_IN_ZONE and provide HAVE_ARCH_PFN_VALID so that mm tests each
page as it works with it.

Currently non-NUMA arm64 systems can't enable HOLES_IN_ZONE, leading to
a VM_BUG_ON():

| page:fffffdff802e1780 is uninitialized and poisoned
| raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
| raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
| page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
| ------------[ cut here ]------------
| kernel BUG at include/linux/mm.h:978!
| Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[...]
| CPU: 1 PID: 25236 Comm: dd Not tainted 4.18.0 #7
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 40000085 (nZcv daIf -PAN -UAO)
| pc : move_freepages_block+0x144/0x248
| lr : move_freepages_block+0x144/0x248
| sp : fffffe0071177680
[...]
| Process dd (pid: 25236, stack limit = 0x0000000094cc07fb)
| Call trace:
|  move_freepages_block+0x144/0x248
|  steal_suitable_fallback+0x100/0x16c
|  get_page_from_freelist+0x440/0xb20
|  __alloc_pages_nodemask+0xe8/0x838
|  new_slab+0xd4/0x418
|  ___slab_alloc.constprop.27+0x380/0x4a8
|  __slab_alloc.isra.21.constprop.26+0x24/0x34
|  kmem_cache_alloc+0xa8/0x180
|  alloc_buffer_head+0x1c/0x90
|  alloc_page_buffers+0x68/0xb0
|  create_empty_buffers+0x20/0x1ec
|  create_page_buffers+0xb0/0xf0
|  __block_write_begin_int+0xc4/0x564
|  __block_write_begin+0x10/0x18
|  block_write_begin+0x48/0xd0
|  blkdev_write_begin+0x28/0x30
|  generic_perform_write+0x98/0x16c
|  __generic_file_write_iter+0x138/0x168
|  blkdev_write_iter+0x80/0xf0
|  __vfs_write+0xe4/0x10c
|  vfs_write+0xb4/0x168
|  ksys_write+0x44/0x88
|  sys_write+0xc/0x14
|  el0_svc_naked+0x30/0x34
| Code: aa1303e0 90001a01 91296421 94008902 (d4210000)
| ---[ end trace 1601ba47f6e883fe ]---

Remove the NUMA dependency.

Link: https://www.spinics.net/lists/arm-kernel/msg671851.html
Cc: <stable@vger.kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>