platform/kernel/linux-starfive.git
6 years agoMerge tag 'mlx5-updates-2018-02-23' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Wed, 28 Feb 2018 14:54:54 +0000 (09:54 -0500)]
Merge tag 'mlx5-updates-2018-02-23' of git://git./linux/kernel/git/mellanox/linux

Saeed Mahameed says:

mlx5-update-2018-02-23 (IB representors)

From: Mark Bloch <markb@mellanox.com>
=========
Add IB representor when in switchdev mode

The following series adds support for an IB (RAW Ethernet only) device
representor which is created when the user switches to switchdev mode.

Today when switching to switchdev mode the only representors which are
created are net devices. Each netdev is a representor of a virtual
function and any data sent via the representor is received on the virtual
function, and any data sent via the virtual function is received by the
representor.

For the mlx5 driver the main use of this functionality is to be able to
use Open vSwitch on the hypervisor in order to manage/control traffic
from/to the virtual functions. Open vSwitch can also work with  DPDK
devices and not just net devices, this series exposes an IB device, which
Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK.

An IB device representor exposes only RAW Ethernet QP capabilities and
the ability to create flow rules to direct traffic to its RX queues. The
state of the IB device (ACTIVE/DOWN etc..) is based on the state of the
corresponding net device representor. No other RDMA/RoCE functionality is
currently supported and no GID table is exposed.
=========

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlx4-misc'
David S. Miller [Tue, 27 Feb 2018 19:53:27 +0000 (14:53 -0500)]
Merge branch 'mlx4-misc'

Tariq Toukan says:

====================
mlx4_en misc for 4.17

This patchset contains misc enhancements from the team
to the mlx4 Eth driver.

Patch 1 by Eran adds physical layer counters.
Patch 2 by Eran cleans-up a redundant warn print.
Patch 3 combines the checks of two end cases into a single if statement.
Patch 4 takes common code structures out of the #ifdef, following your
comment on a previous patch.

Series generated against net-next commit:
f74290fdb363 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx4_en: RX csum, pre-define enabled protocols for IP status masking
Tariq Toukan [Tue, 27 Feb 2018 14:17:22 +0000 (16:17 +0200)]
net/mlx4_en: RX csum, pre-define enabled protocols for IP status masking

Pre-define a mask for IP status of a completion, that tests the
MLX4_CQE_STATUS_IPV6 only in case CONFIG_IPV6 is enabled.
Use it for IP status testing upon completion, instead of separating
the datapath into two flows.
This takes common code structures (such as closing parenthesis)
back to their original place, and makes code more readable.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx4_en: Combine checks of end-cases in RX completion function
Tariq Toukan [Tue, 27 Feb 2018 14:17:21 +0000 (16:17 +0200)]
net/mlx4_en: Combine checks of end-cases in RX completion function

Combine two end-cases in the same if statement with a single return value.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx4_en: Remove unnecessary warn print in reset config
Eran Ben Elisha [Tue, 27 Feb 2018 14:17:20 +0000 (16:17 +0200)]
net/mlx4_en: Remove unnecessary warn print in reset config

In mlx4_en_reset_config, there was a redundant warn print that was left
from previous versions of this function. No warn is needed anymore.

This warn can be confusing when RX-FCS is changed:
Turn OFF RX-FCS:
  mlx4_en: eth1: Changing device configuration rx filter(0) rx vlan(1)
Turn ON RX-FCS:
  mlx4_en: eth1: Changing device configuration rx filter(0) rx vlan(1)

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx4_en: Add physical RX/TX bytes/packets counters
Eran Ben Elisha [Tue, 27 Feb 2018 14:17:19 +0000 (16:17 +0200)]
net/mlx4_en: Add physical RX/TX bytes/packets counters

Add physical RX/TX packets/bytes counters into ethtool output to monitor
all traffic that was received and transmitted on the port. These
counters are available only for none Virtual Function.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlxsw-Offloading-encapsulated-SPAN'
David S. Miller [Tue, 27 Feb 2018 19:46:28 +0000 (14:46 -0500)]
Merge branch 'mlxsw-Offloading-encapsulated-SPAN'

Jiri Pirko says:

====================
mlxsw: Offloading encapsulated SPAN

Petr says:

This patch series introduces support for mirroring with GRE
encapsulation. It offloads tc action mirred mirror from a mlxsw port to
either a gretap or an ip6gretap netdevice.

Spectrum hardware needs to know all the details of the requested
encapsulation: source and destination MAC and IP addresses, details of
VLAN tagging, etc. The only variables are the encapsulated packet
itself, and TOS field, which may be inherited. To that end, mlxsw driver
resolves the route that encapsulated packets would take, queries the
corresponding neighbor, and with that configuration in hand, configures
the mirroring in the hardware.

The driver also hooks into event handlers for netdevice changes, FIB and
neighbor events, and reconsiders the configuration on each such change.
When the new configuration differs from the currently-offloaded one, the
existing offload is removed and replaced with a new one.

It is possible to mirror to {ip6,}gretap from a matchall rule as well as
from a flower match.

** Note that with this patch set, mlxsw build depends on NET_IPGRE and
   IPV6_GRE.

Current limitations:

- There has to be a route that directs packets to an mlxsw port. We
  intend to extend the logic to support other netdevice types in the
  future, but the eventual egress netdevice will have to be an mlxsw
  port in any case.

- Offload reconfiguration due to changes in netdevice configuration
  creates a window of time where packets are not mirrored. Under some
  circumstances this can be prevented by configuring an unused port
  analyzer and migrating mirrors over to that. However that's currently
  not implemented.

- Remote address of a tunnel device needs to be set, there may not be a
  GRE key, checksumming or sequence numbers, and TTL needs to be fixed
  (non-inherit). These are hard requirements imposed by the underlying
  hardware.

- TOS of a tunnel device needs to be "inherit". The hardware supports a
  fixed TOS, but that's currently not implemented.

The series start with two patches, #1 and #2, that publish one function
and add support for querying IPv6 tunnel parameters.

In patches #3 and #4, we introduce helpers to GRE and tunneling code
that we will use later in the patchset from the SPAN code.

Patches #5 and #6 introduce support for encapsulated SPAN in reg.h.

The following seven patches, #7-#13, then prepare the SPAN codebase for
introduction of mirroring to netdevices that don't correspond to front
panel ports.

Then #14 and #15 pull all this together to implement mirroring to
{ip6,}gretap netdevices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_span: Support mirror to ip6gretap
Petr Machata [Tue, 27 Feb 2018 13:53:49 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Support mirror to ip6gretap

Similarly to mirror-to-gretap, this enables mirroring to IPv6 gretap
netdevice.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_span: Support mirror to gretap
Petr Machata [Tue, 27 Feb 2018 13:53:48 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Support mirror to gretap

When a user requests mirror from a mlxsw physical port (possibly based
on an ACL match) to a gretap netdevice, the driver needs to resolve the
request to a particular physical port that the mirrored packets will
egress through, and a suite of configuration keys (importantly, IP and
MAC addresses). That means calling into routing and neighbor kernel code
to simulate the decisions made by the system for packets passing through
a gretap netdevice.

Add a new instance of mlxsw_sp_span_entry_ops to support this.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: Move a mirroring check to mlxsw_sp_span_entry_create
Petr Machata [Tue, 27 Feb 2018 13:53:47 +0000 (14:53 +0100)]
mlxsw: Move a mirroring check to mlxsw_sp_span_entry_create

The check for whether a mirror port (which is a mlxsw front panel port)
belongs to the same mlxsw instance as the mirrored port, is currently
only done in spectrum_acl, even though it's applicable for the matchall
case as well. Thus move it to mlxsw_sp_span_entry_create().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: Handle config changes pertinent to SPAN
Petr Machata [Tue, 27 Feb 2018 13:53:46 +0000 (14:53 +0100)]
mlxsw: Handle config changes pertinent to SPAN

For some netdevices, for which mlxsw offloads mirroring, may have a
complex relationship between the declared intent and low-level
device configuration.

Trying to accurately track which changes might influence offloading
decisions is finicky and error-prone. Instead, this patch introduces a
function mlxsw_sp_span_entry_respin, which re-queries the configuration
anew and, if different, removes the existing offloads and installs new
ones.

Call this function strategically at event handlers that might influence
the mirroring configuration.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_span: Generalize SPAN support
Petr Machata [Tue, 27 Feb 2018 13:53:45 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Generalize SPAN support

To support mirroring to different device types, the functions that
partake in configuring the port analyzer need to be extended to admit
non-trivial SPAN types.

Create a structure where all details of SPAN configuration are kept,
struct mlxsw_sp_span_parms. Also create struct mlxsw_sp_span_entry_ops
to keep per-SPAN-type operations.

Instantiate the latter once for MLXSW_REG_MPAT_SPAN_TYPE_LOCAL_ETH, and
once for a suite of NOP callbacks used for invalidated SPAN entry. Put
the formet as a sole member of a new array mlxsw_sp_span_entry_types,
where all known SPAN types are kept. Introduce a new function,
mlxsw_sp_span_entry_ops(), to look up the right ops suite given a
netdevice.

Change mlxsw_sp_span_mirror_add() to use both parms and ops structures.
Change mlxsw_sp_span_entry_get() and mlxsw_sp_span_entry_create() to
take these as arguments. Modify mlxsw_sp_span_entry_configure() and
mlxsw_sp_span_entry_deconfigure() to dispatch to ops.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Keep mirror netdev in mlxsw_sp_span_entry
Petr Machata [Tue, 27 Feb 2018 13:53:44 +0000 (14:53 +0100)]
mlxsw: spectrum: Keep mirror netdev in mlxsw_sp_span_entry

Currently the only mirror action supported by mlxsw is mirror to another
mlxsw physical port. Correspondingly, span_entry, which tracks each
mlxsw mirror in the system, currently holds a u8 number of the
destination port.

To extend this system to mirror to gretap and ip6gretap netdevices, have
struct mlxsw_sp_span_entry actually hold the destination netdevice
itself.

This change then trickles down in obvious manner to SPAN module API and
mirror-related interfaces in struct mlxsw_afa_ops.

To prevent use of invalid pointer, NETDEV_UNREGISTER needs to be hooked
and the corresponding SPAN entry invalidated.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_span: Extract mlxsw_sp_span_entry_{de, }configure()
Petr Machata [Tue, 27 Feb 2018 13:53:43 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Extract mlxsw_sp_span_entry_{de, }configure()

Configuring the hardware for encapsulated SPAN involves more code than
the simple mirroring case. Extract the related code to a separate
function to separate it from the rest of SPAN entry creation. Extract
deconfigure as well for symmetry, even though disablement is the same
regardless of SPAN type.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_span: Initialize span_entry.id eagerly
Petr Machata [Tue, 27 Feb 2018 13:53:42 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Initialize span_entry.id eagerly

It is known statically ahead of time which SPAN entry will have which
ID. Just initialize it eagerly in mlxsw_sp_span_init(), don't wait until
the entry is actually created. This simplifies some code in
mlxsw_sp_span_entry_create()

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: span: Remove span_entry by span_id
Petr Machata [Tue, 27 Feb 2018 13:53:41 +0000 (14:53 +0100)]
mlxsw: span: Remove span_entry by span_id

Instead of removing span_entry by the port number, allow removing by
SPAN id. That simplifies some code right here, and for mirroring to soft
netdevices, avoids problems with netdevice pointer invalidation and
reuse.

Rename mlxsw_sp_span_entry_find() to mlxsw_sp_span_entry_find_by_port()
and keep it--follow-up patches will make use of it.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: reg: Extend mlxsw_reg_mpat_pack()
Petr Machata [Tue, 27 Feb 2018 13:53:40 +0000 (14:53 +0100)]
mlxsw: reg: Extend mlxsw_reg_mpat_pack()

To support encapsulated SPAN, extend mlxsw_reg_mpat_pack() with a field
to set the SPAN type.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: reg: Add SPAN encapsulation to MPAT register
Petr Machata [Tue, 27 Feb 2018 13:53:39 +0000 (14:53 +0100)]
mlxsw: reg: Add SPAN encapsulation to MPAT register

MPAT Register is used to query and configure the Switch Port Analyzer
Table. To configure Port Analyzer to encapsulate mirrored packets,
additional fields need to be specified for the MPAT register.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip_tunnel: Rename & publish init_tunnel_flow
Petr Machata [Tue, 27 Feb 2018 13:53:38 +0000 (14:53 +0100)]
ip_tunnel: Rename & publish init_tunnel_flow

Initializing struct flowi4 is useful for drivers that need to emulate
routing decisions made by a tunnel interface. Publish the
function (appropriately renamed) so that the drivers in question don't
need to cut'n'paste it around.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: GRE: Add is_gretap_dev, is_ip6gretap_dev
Petr Machata [Tue, 27 Feb 2018 13:53:37 +0000 (14:53 +0100)]
net: GRE: Add is_gretap_dev, is_ip6gretap_dev

Determining whether a device is a GRE device is easily done by
inspecting struct net_device.type. However, for the tap variants, the
type is just ARPHRD_ETHER.

Therefore introduce two predicate functions that use netdev_ops to tell
the tap devices.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_ipip: Support decoding IPv6 tunnel addresses
Petr Machata [Tue, 27 Feb 2018 13:53:36 +0000 (14:53 +0100)]
mlxsw: spectrum_ipip: Support decoding IPv6 tunnel addresses

To support mirroring to ip6gretap, the SPAN module needs to be able to
decode IPv6 addresses specified at that tunnel.

Extend mlxsw_sp_ipip_netdev_saddr() and mlxsw_sp_ipip_netdev_daddr() to
support IPv6 addresses. To that end, add and publish a support function
mlxsw_sp_ipip_netdev_parms6().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_ipip: Extract mlxsw_sp_l3addr_is_zero
Petr Machata [Tue, 27 Feb 2018 13:53:35 +0000 (14:53 +0100)]
mlxsw: spectrum_ipip: Extract mlxsw_sp_l3addr_is_zero

Extract the logic for determining whether a given IPv4/IPv6 address is
all-zeroes from mlxsw_sp_ipip_tunnel_complete to a separate function.
Make that function public within the module.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ibmvnic-Miscellaneous-driver-fixes-and-enhancements'
David S. Miller [Tue, 27 Feb 2018 19:31:20 +0000 (14:31 -0500)]
Merge branch 'ibmvnic-Miscellaneous-driver-fixes-and-enhancements'

Thomas Falcon says:

====================
ibmvnic: Miscellaneous driver fixes and enhancements

There is not a general theme to this patch set other than that it
fixes a few issues with the ibmvnic driver. I will just give a quick
summary of what each patch does here.

"ibmvnic: Fix TX descriptor tracking again" resolves a race condition
introduced in an earlier fix to track outstanding transmit descriptors.
This condition can throw off the tracking counter to the point that
a transmit queue will halt forever.

"ibmvnic: Allocate statistics buffers during probe" allocates queue
statistics buffers on device probe to avoid a crash when accessing
statistics of an unopened interface.

"ibmvnic: Harden TX/RX pool cleaning" includes additional checks to
avoid a bad access when cleaning RX and TX buffer pools during a device
reset.

"ibmvnic: Report queue stops and restarts as debug output" changes TX
queue state notifications from informational to debug messages. This
information is not necessarily useful to a user and under load can result
in a lot of log output.

"ibmvnic: Do not attempt to login if RX or TX queues are not allocated"
checks that device queues have been allocated successfully before
attempting device login. This resolves a panic that could occur if a
user attempted to configure a device after a failed reset.

Thanks for your attention.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Do not attempt to login if RX or TX queues are not allocated
Thomas Falcon [Tue, 27 Feb 2018 00:10:59 +0000 (18:10 -0600)]
ibmvnic: Do not attempt to login if RX or TX queues are not allocated

If a device reset fails for some reason, TX and RX queue resources
could be released. If a user attempts to open the device in this scenario,
it may result in a kernel panic as the driver tries to access this
memory. To fix this, include a check before device login that TX/RX
queues are still there before enabling the device. In addition, return a
value that can be checked in case of any errors to avoid waiting for a
completion that will never come.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Report queue stops and restarts as debug output
Thomas Falcon [Tue, 27 Feb 2018 00:10:58 +0000 (18:10 -0600)]
ibmvnic: Report queue stops and restarts as debug output

It's not necessary to report each time a queue is stopped and restarted
as an informational message. Change that to be a debug message so that
it can be observed if needed but not printed by default.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Harden TX/RX pool cleaning
Thomas Falcon [Tue, 27 Feb 2018 00:10:57 +0000 (18:10 -0600)]
ibmvnic: Harden TX/RX pool cleaning

If the driver releases resources after a failed reset or some other
error, the driver might attempt to clean up and free memory that
isn't there anymore. Include some additional checks that RX/TX queues
along with their associated structures are still there before cleaning.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Allocate statistics buffers during probe
Thomas Falcon [Tue, 27 Feb 2018 00:10:56 +0000 (18:10 -0600)]
ibmvnic: Allocate statistics buffers during probe

Currently, buffers holding individual queue statistics are allocated
when the device is opened. If an ibmvnic interface is hotplugged or
initialized but never opened, an attempt to get statistics with
ethtool will result in a kernel panic.

Since the driver allocates a constant number, the maximum supported
queues, of buffers, these can be allocated during device probe and
freed when the device is hot-unplugged or the module is removed.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Fix TX descriptor tracking again
Thomas Falcon [Tue, 27 Feb 2018 00:10:55 +0000 (18:10 -0600)]
ibmvnic: Fix TX descriptor tracking again

Sorry, the previous change introduced a race condition between
transmit completion processing and tracking TX descriptors. If a
completion is received before the number of descriptors is logged,
the number of descriptors will be add but not removed. After enough
times, this could halt the transmit queue forever.

Log the number of descriptors used by a transmit before sending.
I stress tested the fix on two different systems running over the
weekend without any issues.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'stmmac-barrier-fixes-and-cleanup'
David S. Miller [Tue, 27 Feb 2018 19:28:11 +0000 (14:28 -0500)]
Merge branch 'stmmac-barrier-fixes-and-cleanup'

Niklas Cassel says:

====================
stmmac barrier fixes and cleanup
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: make dwmac4_release_tx_desc() clear all descriptor fields
Niklas Cassel [Mon, 26 Feb 2018 21:47:09 +0000 (22:47 +0100)]
net: stmmac: make dwmac4_release_tx_desc() clear all descriptor fields

Make dwmac4_release_tx_desc() clear all descriptor fields, not just
TDES2 and TDES3.

I'm suspecting that TDES0 and TDES1 wasn't cleared because the DMA
engine uses them to store the tx hardware timestamp (if PTP is enabled).

However, stmmac_tx_clean() calls stmmac_get_tx_hwtstamp(), which reads
and saves the timestamp, before it calls release_tx_desc(), so this
is not an issue.

stmmac_xmit() and stmmac_tso_xmit() both always overwrite TDES0,
however, stmmac_tso_xmit() sometimes sets TDES1, and since neither
stmmac_xmit() nor stmmac_tso_xmit() explicitly clears TDES1, both
functions might reuse a DMA descriptor with old TDES1 data.

I haven't observed any misbehavior even though TDES1 sometimes
point to an old skb, however, explicitly clearing both TDES0 and TDES1
in dwmac4_release_tx_desc() minimizes the chances of undefined behavior.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: ensure that the device has released ownership before reading data
Niklas Cassel [Mon, 26 Feb 2018 21:47:08 +0000 (22:47 +0100)]
net: stmmac: ensure that the device has released ownership before reading data

According to Documentation/memory-barriers.txt, we need to use a
dma_rmb() after reading the status/own bit, to ensure that all
descriptor fields are read after reading the own bit.

This way, we ensure that the DMA engine is done with the DMA
descriptor before we read the other descriptor fields, e.g. reading
the tx hardware timestamp (if PTP is enabled).

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: use correct barrier between coherent memory and MMIO
Niklas Cassel [Mon, 26 Feb 2018 21:47:07 +0000 (22:47 +0100)]
net: stmmac: use correct barrier between coherent memory and MMIO

The last memory barrier in stmmac_xmit()/stmmac_tso_xmit() is placed
between a coherent memory write and a MMIO write:

The own bit is written in First Desc (TSO: MSS desc or First Desc).
<barrier>
The DMA engine is started by a write to the tx desc tail pointer/
enable dma transmission register, i.e. a MMIO write.

This barrier cannot be a simple dma_wmb(), since a dma_wmb() is only
used to guarantee the ordering, with respect to other writes,
to cache coherent DMA memory.

To guarantee that the cache coherent memory writes have completed
before we attempt to write to the cache incoherent MMIO region,
we need to use the more heavyweight barrier wmb().

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: ensure that the MSS desc is the last desc to set the own bit
Niklas Cassel [Mon, 26 Feb 2018 21:47:06 +0000 (22:47 +0100)]
net: stmmac: ensure that the MSS desc is the last desc to set the own bit

A dma_wmb() is used to guarantee the ordering, with respect to
other writes, to cache coherent DMA memory.

There is a dma_wmb() in prepare_tx_desc()/prepare_tso_tx_desc() which
ensures that TDES0/1/2 is written before TDES3 (which contains the own
bit), for First Desc.

However, in the rare case that MSS changes, there will be a MSS
context descriptor in front of the regular DMA descriptors:

<MSS desc> <- DMA Next Descriptor
<First Desc>
<desc n>
<Last Desc>

Thus, for this special case, we need a dma_wmb()
after prepare_tso_tx_desc()/before writing the own bit to the MSS desc,
so that we flush the write to TDES3 for First Desc,
in order to ensure that the MSS descriptor is the last descriptor to
set the own bit.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'RDS-optimized-notification-for-zerocopy-completion'
David S. Miller [Tue, 27 Feb 2018 19:19:11 +0000 (14:19 -0500)]
Merge branch 'RDS-optimized-notification-for-zerocopy-completion'

Sowmini Varadhan says:

====================
RDS: optimized notification for zerocopy completion

Resending with acked-by additions: previous attempt does not show
up in Patchwork. This time with a new mail Message-Id.

RDS applications use predominantly request-response, transacation
based IPC, so that ingress and egress traffic are well-balanced,
and it is possible/desirable to reduce system-call overhead by
piggybacking the notifications for zerocopy completion response
with data.

Moreover, it has been pointed out that socket functions block
if sk_err is non-zero, thus if the RDS code does not plan/need
to use sk_error_queue path for completion notification, it
is preferable to remove the sk_errror_queue related paths in
RDS.

Both of these goals are implemented in this series.

v2: removed sk_error_queue support
v3: incorporated additional code review comments (details in each patch)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/net: reap zerocopy completions passed up as ancillary data.
Sowmini Varadhan [Tue, 27 Feb 2018 17:52:44 +0000 (09:52 -0800)]
selftests/net: reap zerocopy completions passed up as ancillary data.

PF_RDS sockets pass up cookies for zerocopy completion as ancillary
data. Update msg_zerocopy to reap this information.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agords: deliver zerocopy completion notification with data
Sowmini Varadhan [Tue, 27 Feb 2018 17:52:43 +0000 (09:52 -0800)]
rds: deliver zerocopy completion notification with data

This commit is an optimization over commit 01883eda72bd
("rds: support for zcopy completion notification") for PF_RDS sockets.

RDS applications are predominantly request-response transactions, so
it is more efficient to reduce the number of system calls and have
zerocopy completion notification delivered as ancillary data on the
POLLIN channel.

Cookies are passed up as ancillary data (at level SOL_RDS) in a
struct rds_zcopy_cookies when the returned value of recvmsg() is
greater than, or equal to, 0. A max of RDS_MAX_ZCOOKIES may be passed
with each message.

This commit removes support for zerocopy completion notification on
MSG_ERRQUEUE for PF_RDS sockets.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/net: revert the zerocopy Rx path for PF_RDS
Sowmini Varadhan [Tue, 27 Feb 2018 17:52:42 +0000 (09:52 -0800)]
selftests/net: revert the zerocopy Rx path for PF_RDS

In preparation for optimized reception of zerocopy completion,
revert the Rx side changes introduced by Commit dfb8434b0a94
("selftests/net: add zerocopy support for PF_RDS test case")

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 27 Feb 2018 17:56:36 +0000 (12:56 -0500)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-02-26

This series contains updates to i40e and i40evf only.

Mariusz adds a new ethtool private flag for forcing true link state with
the requested changes from Jakub Kicinski.

Paweł fixes an issue where we were double locking the same resource
which would generate a kernel panic after bringing an interface up for
i40evf.

Alan modifies both drivers to use software values to determine if there
are packets stalled on the ring with the added benefit of being less CPU
intensive since we do not need to reach into the hardware to get the
values.

Colin Ian King provides a few fixes detected by Coverity, first was to
pass a struct by reference versus by value to be more efficient.  Then
verify the VSI pointer is not NULL before trying to dereference it.
Cleaned up redundant checks that always return true.

Dan Carpenter fixes over indented lines of code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: improve interrupt handling
Heiner Kallweit [Sat, 24 Feb 2018 15:53:23 +0000 (16:53 +0100)]
r8169: improve interrupt handling

This patch improves few aspects of interrupt handling:
- update to current interrupt allocation API
  (use pci_alloc_irq_vectors() instead of deprecated pci_enable_msi())
- this implicitly will allocate a MSI-X interrupt if available
- get rid of flag RTL_FEATURE_MSI
- remove some dead code, intentionally disabling (unreliable) MSI
  being partially available on old PCI chips.

The patch works fine on a RTL8168evl (chip version 34) and on a
RTL8169SB (chip version 04).

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: Add fib-onlink-tests.sh to TEST_PROGS
David Ahern [Mon, 26 Feb 2018 17:53:15 +0000 (09:53 -0800)]
selftests: Add fib-onlink-tests.sh to TEST_PROGS

Fixes: 153e1b84f477 ("selftests: Add FIB onlink tests")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'DPAA-Ethernet-fixes'
David S. Miller [Tue, 27 Feb 2018 16:40:04 +0000 (11:40 -0500)]
Merge branch 'DPAA-Ethernet-fixes'

Madalin Bucur says:

====================
DPAA Ethernet fixes

Fixed an issue on the Tx path that was visible in netperf
TCP_SENDFILE tests. Addressed another issue with Rx errors
not being always counted. Adding control for allmulti.

v2: rephrased commit message, reduced changes in the SG mapping fix
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodpaa_eth: Add allmulti option
Radu Bulie [Mon, 26 Feb 2018 17:24:04 +0000 (11:24 -0600)]
dpaa_eth: Add allmulti option

This patch adds allmulticast option for memac, dtsec
and 10GEC controllers.

Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodpaa_eth: refactor frag count checking
Madalin Bucur [Mon, 26 Feb 2018 17:24:03 +0000 (11:24 -0600)]
dpaa_eth: refactor frag count checking

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodpaa_eth: make sure all Rx errors are counted
Madalin Bucur [Mon, 26 Feb 2018 17:24:02 +0000 (11:24 -0600)]
dpaa_eth: make sure all Rx errors are counted

Simplify the code and avoid some Rx errors not being
accounted.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodpaa_eth: fix SG mapping
Madalin Bucur [Mon, 26 Feb 2018 17:24:01 +0000 (11:24 -0600)]
dpaa_eth: fix SG mapping

An issue in the code mapping the skb fragments into
scatter-gather frames was evidentiated by netperf
TCP_SENDFILE tests. The size was set wrong for all
fragments but the first, affecting the transmission
of any skb with more than one fragment.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ieee802154-for-davem-2018-02-26' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Tue, 27 Feb 2018 16:14:57 +0000 (11:14 -0500)]
Merge branch 'ieee802154-for-davem-2018-02-26' of git://git./linux/kernel/git/sschmidt/wpan-next

Stefan Schmidt says:

====================
pull-request: ieee802154-next 2018-02-26

An update from ieee802154 for *net-next*

Alexander corrected a setting which got lost during some 6lowpan rework
a while back and Xue Liu provided us with a new driver for the MCR20A
transceiver.

If there are any issues let me know. If not, please pull.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'pernet_operations-convert-part-3'
David S. Miller [Tue, 27 Feb 2018 16:01:40 +0000 (11:01 -0500)]
Merge branch 'pernet_operations-convert-part-3'

Kirill Tkhai says:

====================
Converting pernet_operations (part #3)

This patchset continues to review and to convert pernet_operations
to async. Where it is possible, they are grouped by type of actions
init/exit methods ([1/28], for example). I hope this will make
the review a little bit easier. The changes are tree-wide: in net, fs,
drivers and security.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert smack_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:04:02 +0000 (16:04 +0300)]
net: Convert smack_net_ops

These pernet_operations only register and unregister nf hooks.
So, they are able to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert selinux_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:03:55 +0000 (16:03 +0300)]
net: Convert selinux_net_ops

These pernet_operations only register and unregister nf hooks.
So, they are able to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert defrag6_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:03:45 +0000 (16:03 +0300)]
net: Convert defrag6_net_ops

These pernet_operations only unregister nf hooks.
So, they are able to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ila_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:03:36 +0000 (16:03 +0300)]
net: Convert ila_net_ops

These pernet_operations register and unregister nf hooks.
Also they populate and depopulate ila_net_id-pointed hash
table. The table is changed by hooks during skb processing
and via netlink request. It looks impossible for another
net pernet_operations to force the table reading or writing,
so, they are able to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert defrag4_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:03:15 +0000 (16:03 +0300)]
net: Convert defrag4_net_ops

These pernet_operations only unregister nf hooks.
So, they are able to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert clusterip_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:03:05 +0000 (16:03 +0300)]
net: Convert clusterip_net_ops

These pernet_operations register and unregister nf hooks,
and populate and destroy /proc entry. So, they are able
to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert brnf_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:02:56 +0000 (16:02 +0300)]
net: Convert brnf_net_ops

These pernet_operations only unregister nf hooks.
So, they are able to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ipvlan_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:02:48 +0000 (16:02 +0300)]
net: Convert ipvlan_net_ops

These pernet_operations unregister ipvlan net hooks.
nf_unregister_net_hooks() removes hooks one-by-one,
and then frees the memory via rcu. This looks similar
to that happens, when a new hooks is added: allocation
of bigger memory region, copy of old content, and rcu
freeing the old memory. So, all of net code should be
well with this behavior. Also at the time of hook
unregistering, there are no packets, and foreign net
pernet_operations are not interested in others hooks.
So, we mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert cfg802154_pernet_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:02:37 +0000 (16:02 +0300)]
net: Convert cfg802154_pernet_ops

These pernet_operations have only exit method, which
moves devices from cfg802154_rdev_list to init_net.
This may occur in any time from nl802154_wpan_phy_netns(),
so we are nice with rtnl_lock() synchronization.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert sit_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:02:27 +0000 (16:02 +0300)]
net: Convert sit_net_ops

These pernet_operations are similar to ip6_tnl_net_ops. Exit method
unregisters all net sit devices, and it looks like another
pernet_operations are not interested in foreign net sit list.
Init method registers netdevice. So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert vti6_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:02:19 +0000 (16:02 +0300)]
net: Convert vti6_net_ops

These pernet_operations are similar to ip6_tnl_net_ops. Exit method
unregisters all net vti6 tunnels, and it looks like another
pernet_operations are not interested in foreign net vti6 list.
Init method registers netdevice. So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ip6_tnl_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:02:11 +0000 (16:02 +0300)]
net: Convert ip6_tnl_net_ops

These pernet_operations are similar to ip6gre_net_ops. Exit method
unregisters all net ip6_tnl tunnels, and it looks like another
pernet_operations are not interested in foreign net tunnels list.
So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ip6gre_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:02:03 +0000 (16:02 +0300)]
net: Convert ip6gre_net_ops

These pernet_operations are similar to bond_net_ops. Exit method
unregisters all net ip6gre devices, and it looks like another
pernet_operations are not interested in foreign net ip6gre list
or net_generic()->tunnels_wc. Init method registers net device.
So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_n...
Kirill Tkhai [Mon, 26 Feb 2018 13:01:52 +0000 (16:01 +0300)]
net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops

These pernet_operations are similar to bond_net_ops. Exit methods
unregisters all net ipgre/ipgre_tap/erspan/vti/ipip devices, and it
looks like another pernet_operations are not interested in foreign
net ipgre/ipgre_tap/erspan/vti/ipip list. Init method also does not
intersect with something pernet-specific. So, it's possible
to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert br_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:01:43 +0000 (16:01 +0300)]
net: Convert br_net_ops

These pernet_operations are similar to bond_net_ops. Exit method
unregisters all net bridge devices, and it looks like another
pernet_operations are not interested in foreign net bridge list.
So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert vxlan_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:01:34 +0000 (16:01 +0300)]
net: Convert vxlan_net_ops

These pernet_operations are similar to bond_net_ops. Exit method
unregisters all net vlanx devices, and it looks like another
pernet_operations are not interested in foreign net vlanx list.
So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert ppp_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:01:25 +0000 (16:01 +0300)]
net: Convert ppp_net_ops

These pernet_operations are similar to bond_net_ops. Exit method
unregisters all net ppp devices, and it looks like another
pernet_operations are not interested in foreign net ppp list.
So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert gtp_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:01:16 +0000 (16:01 +0300)]
net: Convert gtp_net_ops

These pernet_operations are similar to bond_net_ops. Exit method
unregisters all net gtp devices, and it looks like another
pernet_operations are not interested in foreign net gtp list.
So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert geneve_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:00:48 +0000 (16:00 +0300)]
net: Convert geneve_net_ops

These pernet_operations are similar to bond_net_ops. Exit method
unregisters all net geneve devices, and it looks like another
pernet_operations are not interested in foreign net geneve list.
So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert bond_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 13:00:40 +0000 (16:00 +0300)]
net: Convert bond_net_ops

These pernet_operations populate/depopulate /proc and /sys
entries. Exit method unregisters all net bond devices, and
it seems another pernet_operations are not interested in
foreign net bond list. So, it's possible to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert tc_action_net_init() and tc_action_net_exit() based pernet_operations
Kirill Tkhai [Mon, 26 Feb 2018 13:00:31 +0000 (16:00 +0300)]
net: Convert tc_action_net_init() and tc_action_net_exit() based pernet_operations

These pernet_operations are from net/sched directory, and they call only
tc_action_net_init() and tc_action_net_exit():

bpf_net_ops
connmark_net_ops
csum_net_ops
gact_net_ops
ife_net_ops
ipt_net_ops
xt_net_ops
mirred_net_ops
nat_net_ops
pedit_net_ops
police_net_ops
sample_net_ops
simp_net_ops
skbedit_net_ops
skbmod_net_ops
tunnel_key_net_ops
vlan_net_ops

1)tc_action_net_init() just allocates and initializes per-net memory.
2)There should not be in-flight packets at the time of tc_action_net_exit()
call, or another pernet_operations send packets to dying net (except
netlink). So, it seems they can be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert sysctl creating and destroying pernet_operations
Kirill Tkhai [Mon, 26 Feb 2018 13:00:22 +0000 (16:00 +0300)]
net: Convert sysctl creating and destroying pernet_operations

These pernet_operations create and destroy sysctl tables,
and they are able to be executed in parallel with any others:

ip_vs_lblc_ops
ip_vs_lblcr_ops

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert cma_pernet_operations
Kirill Tkhai [Mon, 26 Feb 2018 13:00:12 +0000 (16:00 +0300)]
net: Convert cma_pernet_operations

These pernet_operations just create and destroy IDR.
So, we mark them as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert simple pernet_operations
Kirill Tkhai [Mon, 26 Feb 2018 12:59:56 +0000 (15:59 +0300)]
net: Convert simple pernet_operations

These pernet_operations make pretty simple actions
like variable initialization on init, debug checks
on exit, and so on, and they obviously are able
to be executed in parallel with any others:

vrf_net_ops
lockd_net_ops
grace_net_ops
xfrm6_tunnel_net_ops
kcm_net_ops
tcf_net_ops

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert nfs_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 12:59:47 +0000 (15:59 +0300)]
net: Convert nfs_net_ops

These pernet_operations just create and destroy /proc entries
and net_generic()->cb_ident_idr IDR. So, we are able to mark
them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert synproxy_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 12:59:37 +0000 (15:59 +0300)]
net: Convert synproxy_net_ops

These pernet_operations create and destroy /proc entries
and allocate extents to template ct, which depend on global
nf_ct_ext_types[] array. So, we are able to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert hashlimit_net_ops and recent_net_ops
Kirill Tkhai [Mon, 26 Feb 2018 12:59:28 +0000 (15:59 +0300)]
net: Convert hashlimit_net_ops and recent_net_ops

These pernet_operations just create and destroy /proc entries.
Also, new /proc entries also may come after new nf rules
are added, but this is not possible, when net isn't alive.
So, they are safe to be marked as async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert /proc creating and destroying pernet_operations
Kirill Tkhai [Mon, 26 Feb 2018 12:59:19 +0000 (15:59 +0300)]
net: Convert /proc creating and destroying pernet_operations

These pernet_operations just create and destroy /proc entries,
and they can safely marked as async:

pppoe_net_ops
vlan_net_ops
canbcm_pernet_ops
kcm_net_ops
pfkey_net_ops
pppol2tp_net_ops
phonet_net_ops

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipvlan: fix building with modular IPV6
Arnd Bergmann [Mon, 26 Feb 2018 09:41:30 +0000 (10:41 +0100)]
ipvlan: fix building with modular IPV6

We no longer depend on IPV6, but that now causes a link error with
CONFIG_IPV6=m and CONFIG_IPVLAN=y:

drivers/net/ipvlan/ipvlan_core.o: In function `ipvlan_queue_xmit':
ipvlan_core.c:(.text+0x1440): undefined reference to `ip6_route_output_flags'
drivers/net/ipvlan/ipvlan_core.o: In function `ipvlan_l3_rcv':
ipvlan_core.c:(.text+0x1818): undefined reference to `ip6_route_input_lookup'

This adds back the dependency on IPV6, with the option of building without
IPV6, but forcing IPVLAN to be a module when IPV6 is a module.

Fixes: 94333fac44d1 ("ipvlan: drop ipv6 dependency")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 27 Feb 2018 01:58:25 +0000 (20:58 -0500)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2018-02-26

This series contains updates to ixgbe and ixgbevf only.

Colin Ian King cleans up redundant variable assignments.

Tonghao Zhang updates ixgbe to avoid writing to the hardware when the
redirection table has not changed.

Jake fixes the driver logic for checking and clearing receive timestamp
hangs so that when the PTP_RX_TIMESTAMP_IN_REGISTER flag is set, we no
longer need to check for receive timestamp hangs, which in turn will
stop the spurious log messages.

Emil updates ixgbevf with several features and improvements done in
other drivers, starting with the handling of page addresses so that we
always refer to them using a void pointer.  Added a 'legacy-rx' flag to
allow switching between the old and new receive code paths.  Added
support for using 3K buggers in order 1 page.  Updated the driver to
ensure that calls to ixgbevf_open() are rtnl lock protected and improved
the error handling when setting up multiple queues.  Added support for
providing a buffer with head room and tail room to allow for shared
info, NET_SKB_PAD, and NET_IP_ALIGN, so that we can start using
build_skb to build frames instead of using memcpy() the headers.
Updated the logic of handling rings closer to ixgbe.  Consolidated the
receive paths to reduce duplication when we expand them in the future.
Added build_skb() support to ixgbevf.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoi40e: remove some stray indenting
Dan Carpenter [Wed, 21 Feb 2018 12:53:53 +0000 (15:53 +0300)]
i40e: remove some stray indenting

These two lines are indented too far.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: remove redundant array comparisons to 0 checks
Colin Ian King [Mon, 19 Feb 2018 10:23:30 +0000 (10:23 +0000)]
i40evf: remove redundant array comparisons to 0 checks

The checks to see if key->dst.s6_addr and key->src.s6_addr are null
pointers are redundant because these are constant size arrays and
so the checks always return true.  Fix this by removing the redundant
checks.   Also replace filter->f with vf, allowing wide lines to be
condensed and to rejoin some split wide lines.

Detected by CoverityScan, CID#1465279 ("Array compared to 0")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: check that pointer VSI is not null before dereferencing it
Colin Ian King [Thu, 15 Feb 2018 19:50:52 +0000 (19:50 +0000)]
i40e: check that pointer VSI is not null before dereferencing it

Function i40e_find_vsi_from_id can potentially return null, hence
VSI may be null, so defensively check it is non-null before
dereferencing it to check the seid.

Fixes: e284fc280473 ("i40e: Add and delete cloud filter")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: pass struct virtchnl_filter by reference rather than by value
Colin Ian King [Thu, 15 Feb 2018 19:26:05 +0000 (19:26 +0000)]
i40evf: pass struct virtchnl_filter by reference rather than by value

Passing struct virtchnl_filter f by value requires a 272 byte copy
on x86_64, so instead pass it by reference is much more efficient. Also
adjust some lines that are over 80 chars.

Detected by CoverityScan, CID#1465285 ("Big parameter passed by value")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e/i40evf: use SW variables for hang detection
Alan Brady [Mon, 12 Feb 2018 14:16:59 +0000 (09:16 -0500)]
i40e/i40evf: use SW variables for hang detection

The i40e_detect_recover_hung function uses the i40e_get_tx_pending
function to determine if there are packets stalled on the ring.
i40e_get_tx_pending calculates the pending packets using the head
writeback value and HW tail.  If the queue is stopped and we lose the
interrupt to update our next_to_clean then we a) won't get another
interrupt to clean because queue is stopped b) we won't catch the
problem with i40e_detect_recover_hung because the HW values look like
there's no packets waiting to be transmitted.  Using the SW values we
can catch the issue because next_to_clean will be out of sync with head
writeback.

This has the added benefit being less CPU intensive because we don't
need to reach into the hardware to get the values.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: Fix double locking the same resource
Paweł Jabłoński [Mon, 5 Feb 2018 21:03:36 +0000 (13:03 -0800)]
i40evf: Fix double locking the same resource

Removes the locking of adapter->mac_vlan_list_lock resource in
i40evf_add_filter(). The locking part is moved above i40evf_add_filter().
i40evf_add_filter(), called by i40evf_addr_sync(), was trying to lock the
resource again and double locking generated a kernel panic after bringing
an interface up.

Fixes: 8946b56354b7 ("i40evf: use __dev_[um]c_sync routines in
       .set_rx_mode")
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agonet: make kmem caches as __ro_after_init
Alexey Dobriyan [Sat, 24 Feb 2018 18:20:33 +0000 (21:20 +0300)]
net: make kmem caches as __ro_after_init

All kmem caches aren't reallocated once set up.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoi40e: link_down_on_close private flag support
Mariusz Stachura [Tue, 14 Nov 2017 12:00:50 +0000 (07:00 -0500)]
i40e: link_down_on_close private flag support

This patch introduces new ethtool private flag used for
forcing true link state. Function i40e_force_link_state that implements
this functionality was added, it sets phy_type = 0 in order to
work-around firmware's LESM. False positive error messages were
suppressed.

The ndo_open() should not succeed if there were issues with forcing link
state to be UP.

Added I40E_PHY_TYPES_BITMASK define with all phy types OR-ed together in
one bitmask.  Added after phy type definition, so it will be hard to
forget to include new phy types to the bitmask.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoMerge branch 'sonic-ethernet-cleanups'
David S. Miller [Mon, 26 Feb 2018 19:40:03 +0000 (14:40 -0500)]
Merge branch 'sonic-ethernet-cleanups'

Finn Thain says:

====================
Fixes, cleanup and modernization for SONIC ethernet drivers

Changes since v4 of combined patch series:
- Removed redundant and non-portable MACH_IS_MAC tests.
- Omitted patches unrelated to SONIC drivers.
- Dropped changes to the 'version_printed' logic and debug message text.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sonic: Replace custom debug logging with netif_* calls
Finn Thain [Sat, 24 Feb 2018 23:27:25 +0000 (18:27 -0500)]
net/sonic: Replace custom debug logging with netif_* calls

Eliminate duplicated debug code by moving it into the core driver.
Don't log the only valid silicon revision number (it's in the source).

Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Chris Zankel <chris@zankel.net>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sonic: Clean up and modernize log messages
Finn Thain [Sat, 24 Feb 2018 23:27:25 +0000 (18:27 -0500)]
net/sonic: Clean up and modernize log messages

Add missing printk severity levels by adopting pr_foo() calls for the
platform_driver and dev_foo() calls for the nubus_driver.
Avoid KERN_CONT usage as per advice from checkpatch.
Avoid #ifdef around printk calls.
Don't log driver probe messages after calling register_netdev().

Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Chris Zankel <chris@zankel.net>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/macsonic: Drop redundant MACH_IS_MAC test
Finn Thain [Sat, 24 Feb 2018 23:27:25 +0000 (18:27 -0500)]
net/macsonic: Drop redundant MACH_IS_MAC test

The MACH_IS_MAC test is redundant here because the platform device
won't get registered unless MACH_IS_MAC.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/macsonic: Convert to nubus_driver
Finn Thain [Sat, 24 Feb 2018 23:27:25 +0000 (18:27 -0500)]
net/macsonic: Convert to nubus_driver

This resolves an old issue preventing any NuBus SONIC NICs from
working in a Mac with an on-board SONIC device.

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: fix TSU init on SH7734/R8A7740
Sergei Shtylyov [Sat, 24 Feb 2018 19:41:45 +0000 (22:41 +0300)]
sh_eth: fix TSU init on SH7734/R8A7740

It appears that the single port Ether controllers having TSU (like SH7734/
R8A7740) need the same kind of treating in sh_eth_tsu_init() as R7S72100
currently has -- they also don't have the TSU registers related e.g. to
passing the frames between ports. Add the 'sh_eth_cpu_data::dual_port'
flag and use it as a new criterion for taking a "short path" in the TSU
init sequence in order to avoid writing to the non-existent registers...

Fixes: f0e81fecd4f8 ("net: sh_eth: Add support SH7734")
Fixes: 73a0d907301e ("net: sh_eth: add support R8A7740")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: TSU_QTAG0/1 registers the same as TSU_QTAGM0/1
Sergei Shtylyov [Sat, 24 Feb 2018 17:28:16 +0000 (20:28 +0300)]
sh_eth: TSU_QTAG0/1 registers the same as TSU_QTAGM0/1

The TSU_QTAG0/1 registers found in the Gigabit Ether controllers actually
have the same long name  as the TSU_QTAGM0/1 registers in the early Ether
controllers:  Qtag Addition/Deletion Set Register (Port 0/1 to 1/0); thus
there's no need to make a difference in sh_eth_tsu_init() between those
controllers. Unfortunately, we can't just remove TSU_QTAG0/1 from the
register *enum* because that would break the ethtool register dump...

Fixes: b0ca2a21f769 ("sh_eth: Add support of SH7763 to sh_eth")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotc: python3, string formattings
BTaskaya [Fri, 23 Feb 2018 19:57:35 +0000 (22:57 +0300)]
tc: python3, string formattings

This patch converts old type string formattings to new type string
formattings for adapting Linux Traffic Control (tc) unit testing suite
python3.

Linux Traffic Control (tc) unit testing suite's code quality improved is improved with this patch.
According to python documentation;
"The built-in string class provides the ability to do complex variable substitutions and
value formatting via the format() method described in PEP 3101. "
but the project was using old type formattings and new type string formattings together,
this patch's main purpose is converting all old types to new types.

Following files changed:
 1. tools/testing/selftests/tc-testing/tdc.py
 2. tools/testing/selftests/tc-testing/tdc_batch.py

Following PEP rules applied:
 1. PEP8 - Code Styling
 2. PEP3101 - Advanced Code Formatting

Signed-off-by: Batuhan Osman Taskaya <batuhanosmantaskaya@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoixgbevf: remove redundant initialization of variable 'dma'
Colin Ian King [Thu, 1 Feb 2018 18:35:39 +0000 (18:35 +0000)]
ixgbevf: remove redundant initialization of variable 'dma'

Variable dma is initialized with a value that is never read, later
on it is re-assigned a new value, hence the initialization is redundant
and can be removed.

Cleans up clang warning:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:584:13: warning: Value
stored to 'dma' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbevf: add build_skb support
Emil Tantilov [Wed, 31 Jan 2018 00:51:54 +0000 (16:51 -0800)]
ixgbevf: add build_skb support

Add support for build_skb() similar to:
commit 6f429223b31c ("ixgbe: Add support for build_skb")

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbevf: break out Rx buffer page management
Emil Tantilov [Wed, 31 Jan 2018 00:51:49 +0000 (16:51 -0800)]
ixgbevf: break out Rx buffer page management

Based on commit e014272672b9 ("igb: Break out Rx buffer page management")

Consolidate Rx code paths to reduce duplication when we expand them in
the future.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbevf: allocate the rings as part of q_vector
Emil Tantilov [Wed, 31 Jan 2018 00:51:43 +0000 (16:51 -0800)]
ixgbevf: allocate the rings as part of q_vector

Make it so that all rings allocations are made as part of q_vector.
The advantage to this is that we can keep all of the memory related to
a single interrupt in one page.

The goal is to bring the logic of handling rings closer to ixgbe.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbevf: make sure all frames fit minimum size requirements
Emil Tantilov [Wed, 31 Jan 2018 00:51:38 +0000 (16:51 -0800)]
ixgbevf: make sure all frames fit minimum size requirements

Similar to commit a50c29dd09ed
("ixgbe: Make certain that all frames fit minimum size requirements")

Make sure that any packet we attempt to transmit will meet minimum
size requirements.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbevf: add support for padding packet
Emil Tantilov [Wed, 31 Jan 2018 00:51:33 +0000 (16:51 -0800)]
ixgbevf: add support for padding packet

Following the logic from commit 2de6aa3a666e
("ixgbe: Add support for padding packet")

Add support for providing a buffer with headroom and tail room
to allow for shared info, NET_SKB_PAD, and NET_IP_ALIGN.  With this
combined with the DMA changes we can start using build_skb to build frames
around an incoming Rx buffer instead of having to memcpy the headers.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoixgbevf: setup queue counts
Emil Tantilov [Wed, 31 Jan 2018 00:51:27 +0000 (16:51 -0800)]
ixgbevf: setup queue counts

Add calls for netif_set_real_num_t/rx_queues() in ixgbevf_open().
Make sure that calls to ixgbevf_open() are rtnl protected and improve
the error handling when setting up multiple queues.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>