platform/kernel/linux-rpi.git
6 years agotipc: eliminate buffer cloning in function tipc_msg_extract()
Tung Nguyen [Thu, 28 Jun 2018 20:25:04 +0000 (22:25 +0200)]
tipc: eliminate buffer cloning in function tipc_msg_extract()

The function tipc_msg_extract() is using skb_clone() to clone inner
messages from a message bundle buffer. Although this method is safe,
it has an undesired effect that each buffer clone inherits the
true-size of the bundling buffer. As a result, the buffer clone
almost always ends up with being copied anyway by the message
validation function. This makes the cloning into a sub-optimization.

In this commit we take the consequence of this realization, and copy
each inner message to a separately allocated buffer up front in the
extraction function.

As a bonus we can now eliminate the two cases where we had to copy
re-routed packets that may potentially go out on the wire again.

Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: usb: Mark expected switch fall-throughs
Gustavo A. R. Silva [Thu, 28 Jun 2018 18:50:48 +0000 (13:50 -0500)]
net: usb: Mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: realtek: add support for RTL8211
Heiner Kallweit [Thu, 28 Jun 2018 18:46:45 +0000 (20:46 +0200)]
net: phy: realtek: add support for RTL8211

In preparation of adding phylib support to the r8169 driver we need
PHY drivers for all chip-internal PHY types. Fortunately almost all
of them are either supported by the Realtek PHY driver already or work
with the genphy driver.
Still missing is support for the PHY of RTL8169s, it requires a quirk
to properly support 100Mbit-fixed mode. The quirk was copied from
r8169 driver which copied it from the vendor driver.
Based on the PHYID the internal PHY seems to be a RTL8211.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: use standard debug output functions
Heiner Kallweit [Thu, 28 Jun 2018 18:36:15 +0000 (20:36 +0200)]
r8169: use standard debug output functions

I see no need to define a private debug output symbol, let's use the
standard debug output functions instead. In this context also remove
the deprecated PFX define.

The one assertion is wrong IMO anyway, this code path is used also
by chip version 01.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'smc-pnetid-and-SMC-D-support'
David S. Miller [Sat, 30 Jun 2018 11:42:26 +0000 (20:42 +0900)]
Merge branch 'smc-pnetid-and-SMC-D-support'

Ursula Braun says:

====================
smc: pnetid and SMC-D support

SMC requires a configured pnet table to map Ethernet interfaces to
RoCE adapter ports. For s390 there exists hardware support to group
such devices. The first three patches cover the s390 pnetid support,
enabling SMC-R usage on s390 without configuring an extra pnet table.

SMC currently requires RoCE adapters, and uses RDMA-techniques
implemented with IB-verbs. But s390 offers another method for
intra-CEC Shared Memory communication. The following seven patches
implement a solution to run SMC traffic based on intra-CEC DMA,
called SMC-D.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agos390/ism: add device driver for internal shared memory
Sebastian Ott [Thu, 28 Jun 2018 17:05:13 +0000 (19:05 +0200)]
s390/ism: add device driver for internal shared memory

Add support for the Internal Shared Memory vPCI Adapter.
This driver implements the interfaces of the SMC-D protocol.

Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: add SMC-D diag support
Hans Wippel [Thu, 28 Jun 2018 17:05:12 +0000 (19:05 +0200)]
net/smc: add SMC-D diag support

This patch adds diag support for SMC-D.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: add SMC-D support in af_smc
Hans Wippel [Thu, 28 Jun 2018 17:05:11 +0000 (19:05 +0200)]
net/smc: add SMC-D support in af_smc

This patch ties together the previous SMC-D patches. It adds support for
SMC-D to the listen and connect functions and, thus, enables SMC-D
support in the SMC code. If a connection supports both SMC-R and SMC-D,
SMC-D is preferred.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: add SMC-D support in data transfer
Hans Wippel [Thu, 28 Jun 2018 17:05:10 +0000 (19:05 +0200)]
net/smc: add SMC-D support in data transfer

The data transfer and CDC message headers differ in SMC-R and SMC-D.
This patch adds support for the SMC-D data transfer to the existing SMC
code. It consists of the following:

* SMC-D CDC support
* SMC-D tx support
* SMC-D rx support

The CDC header is stored at the beginning of the receive buffer. Thus, a
rx_offset variable is added for the CDC header offset within the buffer
(0 for SMC-R).

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: add SMC-D support in CLC messages
Hans Wippel [Thu, 28 Jun 2018 17:05:09 +0000 (19:05 +0200)]
net/smc: add SMC-D support in CLC messages

There are two types of SMC: SMC-R and SMC-D. These types are signaled
within the CLC messages during the CLC handshake. This patch adds
support for and checks of the SMC type.

Also, SMC-R and SMC-D need to exchange different information during the
CLC handshake. So, this patch extends the current message formats to
support the SMC-D header fields. The Proposal message can contain both
SMC-R and SMC-D information. The Accept and Confirm messages contain
either SMC-R or SMC-D information.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: add pnetid support for SMC-D and ISM
Hans Wippel [Thu, 28 Jun 2018 17:05:08 +0000 (19:05 +0200)]
net/smc: add pnetid support for SMC-D and ISM

SMC-D relies on PNETIDs to find usable SMC-D/ISM devices for a SMC
connection. This patch adds SMC-D/ISM support to the current PNETID
implementation.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: add base infrastructure for SMC-D and ISM
Hans Wippel [Thu, 28 Jun 2018 17:05:07 +0000 (19:05 +0200)]
net/smc: add base infrastructure for SMC-D and ISM

SMC supports two variants: SMC-R and SMC-D. For data transport, SMC-R
uses RDMA devices, SMC-D uses so-called Internal Shared Memory (ISM)
devices. An ISM device only allows shared memory communication between
SMC instances on the same machine. For example, this allows virtual
machines on the same host to communicate via SMC without RDMA devices.

This patch adds the base infrastructure for SMC-D and ISM devices to
the existing SMC code. It contains the following:

* ISM driver interface:
  This interface allows an ISM driver to register ISM devices in SMC. In
  the process, the driver provides a set of device ops for each device.
  SMC uses these ops to execute SMC specific operations on or transfer
  data over the device.

* Core SMC-D link group, connection, and buffer support:
  Link groups, SMC connections and SMC buffers (in smc_core) are
  extended to support SMC-D.

* SMC type checks:
  Some type checks are added to prevent using SMC-R specific code for
  SMC-D and vice versa.

To actually use SMC-D, additional changes to pnetid, CLC, CDC, etc. are
required. These are added in follow-up patches.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: optimize consumer cursor updates
Ursula Braun [Thu, 28 Jun 2018 17:05:06 +0000 (19:05 +0200)]
net/smc: optimize consumer cursor updates

The SMC protocol requires to send a separate consumer cursor update,
if it cannot be piggybacked to updates of the producer cursor.
Currently the decision to send a separate consumer cursor update
just considers the amount of data already received by the socket
program. It does not consider the amount of data already arrived, but
not yet consumed by the receiver. Basing the decision on the
difference between already confirmed and already arrived data
(instead of difference between already confirmed and already consumed
data), may lead to a somewhat earlier consumer cursor update send in
fast unidirectional traffic scenarios, and thus to better throughput.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: add pnetid support
Ursula Braun [Thu, 28 Jun 2018 17:05:05 +0000 (19:05 +0200)]
net/smc: add pnetid support

s390 hardware supports the definition of a so-call Physical NETwork
IDentifier (short PNETID) per network device port. These PNETIDS
can be used to identify network devices that are attached to the same
physical network (broadcast domain).

On s390 try to use the PNETID of the ethernet device port used for
initial connecting, and derive the IB device port used for SMC RDMA
traffic.

On platforms without PNETID support fall back to the existing
solution of a configured pnet table.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: determine port attributes independent from pnet table
Ursula Braun [Thu, 28 Jun 2018 17:05:04 +0000 (19:05 +0200)]
net/smc: determine port attributes independent from pnet table

For SMC it is important to know the current port state of RoCE devices.
Monitoring port states has been triggered, when a RoCE device was added
to the pnet table. To support future alternatives to the pnet table the
monitoring of ports is made independent of the existence of a pnet table.
It starts once the smc_ib_device is established.

Due to this change smc_ib_remember_port_attr() is now a local function
and shuffling its location and the location of its used functions
makes any forward references obsolete.

And the duplicate SMC_MAX_PORTS definition is removed.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'Fixes-for-running-mirror-to-gretap-tests-on-veth'
David S. Miller [Sat, 30 Jun 2018 11:34:09 +0000 (20:34 +0900)]
Merge branch 'Fixes-for-running-mirror-to-gretap-tests-on-veth'

Petr Machata says:

====================
Fixes for running mirror-to-gretap tests on veth

The forwarding selftests infrastructure makes it possible to run the
individual tests on a purely software netdevices. Names of interfaces to
run the test with can be passed as command line arguments to a test.
lib.sh then creates veth pairs backing the interfaces if none exist in
the system.

However, the tests need to recognize that they might be run on a soft
device. Many mirror-to-gretap tests are buggy in this regard. This patch
set aims to fix the problems in running mirror-to-gretap tests on veth
devices.

In patch #1, a service function is split out of setup_wait().
In patch #2, installing a trap is made optional.
In patch #3, tc filters in several tests are tweaked to work with veth.
In patch #4, the logic for waiting for neighbor is fixed for veth.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: mirror_gre_changes: Fix waiting for neighbor
Petr Machata [Thu, 28 Jun 2018 16:56:39 +0000 (18:56 +0200)]
selftests: forwarding: mirror_gre_changes: Fix waiting for neighbor

When running the test on soft devices, there's no mechanism to
gratuitously start resolving the neighbor for remote tunnel endpoint.
So instead of passively waiting, wait for the device to be up, and then
probe the neighbor with a ping.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Tweak tc filters for mirror-to-gretap tests
Petr Machata [Thu, 28 Jun 2018 16:56:33 +0000 (18:56 +0200)]
selftests: forwarding: Tweak tc filters for mirror-to-gretap tests

When running mirror_gre_bridge_1d_vlan tests on veth, several issues
cause spurious failures:

- vlan_ethtype should be ip, not ipv6 even in mirror-to-ip6gretap case,
  because the overlay packet is still IPv4.
- Similarly ip_proto matches the innermost IP protocol, so can't be used
  to filter out GRE packet. Drop the corresponding condition.
- Because the above fixes the filters to match in slow path as well,
  they need to be made skip_hw so as not to double-count packets.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: lib: Avoid trapping soft devices
Petr Machata [Thu, 28 Jun 2018 16:56:28 +0000 (18:56 +0200)]
selftests: forwarding: lib: Avoid trapping soft devices

There are several cases where traffic that would normally be forwarded
in silicon needs to be observed in slow path. That's achieved by
trapping such traffic, and the functions trap_install() and
trap_uninstall() realize that. However, such treatment is obviously
wrong if the device in question is actually a soft device not backed by
an ASIC.

Therefore try to trap if possible, but fall back to inserting a continue
if not.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: lib: Split out setup_wait_dev()
Petr Machata [Thu, 28 Jun 2018 16:56:20 +0000 (18:56 +0200)]
selftests: forwarding: lib: Split out setup_wait_dev()

Split out of setup_wait() a function setup_wait_dev() that waits for a
single device. This gives tests the opportunity to wait for a selected
device after they tinkered with its upness.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'xilinx_emaclite-coding-style'
David S. Miller [Sat, 30 Jun 2018 11:15:45 +0000 (20:15 +0900)]
Merge branch 'xilinx_emaclite-coding-style'

Radhey Shyam Pandey says:

====================
Fixes coding style in xilinx_emaclite.c

This patchset fixes checkpatch and kernel-doc warnings in
xilinx emaclite driver. No functional change.

Changes from v2:
-In 2/5 patch refactor if-else to make failure path return early.
-In 2/5 patch coalesce the format onto a single line and add the
missing space after the comma.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: emaclite: Remove unnecessary spaces
Radhey Shyam Pandey [Thu, 28 Jun 2018 13:11:50 +0000 (18:41 +0530)]
net: emaclite: Remove unnecessary spaces

This patch fixes below checkpatch checks-

CHECK: spaces preferred around that '*' (ctx:VxV)
CHECK: No space is necessary after a cast

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: emaclite: Fix block comments style
Radhey Shyam Pandey [Thu, 28 Jun 2018 13:11:49 +0000 (18:41 +0530)]
net: emaclite: Fix block comments style

This patch fixes below checkpatch warnings-

WARNING: Block comments use a trailing */ on a separate line
WARNING: Block comments use * on subsequent lines
WARNING: networking block comments don't use an empty /* line,
use /* Comment

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: emaclite: update kernel-doc comments
Radhey Shyam Pandey [Thu, 28 Jun 2018 13:11:48 +0000 (18:41 +0530)]
net: emaclite: update kernel-doc comments

This patch fixes below kernel-doc warnings:

Function parameter or member 'maxlen' not described in 'xemaclite_recv_data'
Function parameter or member 'address'not described in 'xemaclite_set_mac_address'
Excess function parameter 'addr' description in 'xemaclite_set_mac_address'
No description found for return value of 'xemaclite_interrupt'
No description found for return value of 'xemaclite_mdio_write'
Function parameter or member 'dev' not described in 'xemaclite_mdio_setup'
Excess function parameter 'ofdev' description in 'xemaclite_mdio_setup'
No description found for return value of 'xemaclite_open'
No description found for return value of 'xemaclite_close'
Excess function parameter 'match' description in 'xemaclite_of_probe'

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: emaclite: Simplify if-else statements
Radhey Shyam Pandey [Thu, 28 Jun 2018 13:11:47 +0000 (18:41 +0530)]
net: emaclite: Simplify if-else statements

Remove else as it is not required with if doing a return.
It also coalesce the format onto a single line and add the
missing space after the comma. Fixes below checkpatch warning-

WARNING: else is not generally useful after a break or return

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: emaclite: Use __func__ instead of hardcoded name
Radhey Shyam Pandey [Thu, 28 Jun 2018 13:11:46 +0000 (18:41 +0530)]
net: emaclite: Use __func__ instead of hardcoded name

Switch hardcoded function name with a reference to __func__ making
the code more maintainable. Address below checkpatch warning:

WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_read',
this function's name, in a string
+               "xemaclite_mdio_read(phy_id=%i, reg=%x) == %x\n",

WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_write',
this function's name, in a string
+               "xemaclite_mdio_write(phy_id=%i, reg=%x, val=%x)\n",

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mvpp2-Add-big-endian-support'
David S. Miller [Sat, 30 Jun 2018 09:54:09 +0000 (18:54 +0900)]
Merge branch 'mvpp2-Add-big-endian-support'

Maxime Chevallier says:

====================
net: mvpp2: Add big-endian support

This series allows to use PPv2 on system built as big endian.

The first patch fixes the way we represent TX and RX descriptors, so that
they used fixed little endianness as expected by the PPv2 controller.

The second reworks the way we handle the software representation of the
Header Parser entries, so that we don't use a union of arrays.

The last two patches fixes some incorrect byte swapping logic, that wen't
un-noticed on little-endian.

This whole series doesn't fix any existing bug for little-endian systems, and
since big-endian never worked for this driver, I didn't include 'fixes' tags.

This was tested on MacchiatoBin (Armada 8040).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvpp2: Use htons when checking protocol info
Maxime Chevallier [Thu, 28 Jun 2018 12:42:07 +0000 (14:42 +0200)]
net: mvpp2: Use htons when checking protocol info

When checking the skb->protocol field, we have to make sure we use the
proper endianness using htons, and not swab16.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvpp2: prs: Drop unnecessary swab16 in vlan detection
Maxime Chevallier [Thu, 28 Jun 2018 12:42:06 +0000 (14:42 +0200)]
net: mvpp2: prs: Drop unnecessary swab16 in vlan detection

Vlan IDs must not be swapped when creating Header Parser entries. This
has no effect on little-endian systems, but is wrong for big-endian.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvpp2: prs: Drop unions representing TCAM and SRAM entries
Maxime Chevallier [Thu, 28 Jun 2018 12:42:05 +0000 (14:42 +0200)]
net: mvpp2: prs: Drop unions representing TCAM and SRAM entries

PPv2's Header Parser use some large TCAM and SRAM entries, that are
duplicated in software so that we can write them to hardware only when
we are done modifying them.

Currently, PPv2 uses a union containing arrays of u32 and u8 to represent
these entries, to facilitate byte per byte access. This representation is
broken when we want to support big endian, and this makes the code
confusing to read.

This patch drops the union, and simply stores the TCAM and SRAM entries
as u32 arrays, each entry corresponding to a 32-bit register.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvpp2: Make TX / RX descriptors little-endian
Maxime Chevallier [Thu, 28 Jun 2018 12:42:04 +0000 (14:42 +0200)]
net: mvpp2: Make TX / RX descriptors little-endian

The PPv2 controller always expect descriptors to be in little endian. We
must therefore force descriptors to use that format, and convert to the
host endianness when necessary.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: add new SNMP counter for drops when try to queue in rcv queue
Yafang Shao [Thu, 28 Jun 2018 04:22:56 +0000 (00:22 -0400)]
tcp: add new SNMP counter for drops when try to queue in rcv queue

When sk_rmem_alloc is larger than the receive buffer and we can't
schedule more memory for it, the skb will be dropped.

In above situation, if this skb is put into the ofo queue,
LINUX_MIB_TCPOFODROP is incremented to track it.

While if this skb is put into the receive queue, there's no record.
So a new SNMP counter is introduced to track this behavior.

LINUX_MIB_TCPRCVQDROP:  Number of packets meant to be queued in rcv queue
but dropped because socket rcvbuf limit hit.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnx2x: Mark expected switch fall-throughs
Gustavo A. R. Silva [Thu, 28 Jun 2018 01:32:23 +0000 (20:32 -0500)]
bnx2x: Mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: Add support for CBS QDISC
Jose Abreu [Wed, 27 Jun 2018 14:57:02 +0000 (15:57 +0100)]
net: stmmac: Add support for CBS QDISC

This adds support for CBS reconfiguration using the TC application.

A new callback was added to TC ops struct and another one to DMA ops to
reconfigure the channel mode.

Tested in GMAC5.10.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'mlx5e-updates-2018-06-28' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Fri, 29 Jun 2018 14:54:31 +0000 (23:54 +0900)]
Merge tag 'mlx5e-updates-2018-06-28' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-06-28

mlx5e netdevice driver updates:

- Boris Pismenny added the support for UDP GSO in the first two patches.
  Impressive performance numbers are included in the commit message,
  @Line rate with ~half of the cpu utilization compared to non offload
  or no GSO at all.

- From Tariq Toukan:
  - Convert large order kzalloc allocations to kvzalloc.
  - Added performance diagnostic statistics to several places in data path.

From Saeed and Eran,
  - Update NIC HW stats on demand only, this is to eliminate the background
    thread needed to update some HW statistics in the driver cache in
    order to report error and drop counters from HW in ndo_get_stats.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-Geneve-options-support-for-TC-act_tunnel_key'
David S. Miller [Fri, 29 Jun 2018 14:50:27 +0000 (23:50 +0900)]
Merge branch 'net-Geneve-options-support-for-TC-act_tunnel_key'

Jakub Kicinski says:

====================
net: Geneve options support for TC act_tunnel_key

Simon & Pieter say:

This set adds Geneve Options support to the TC tunnel key action.
It provides the plumbing required to configure Geneve variable length
options.  The options can be configured in the form CLASS:TYPE:DATA,
where CLASS is represented as a 16bit hexadecimal value, TYPE as an 8bit
hexadecimal value and DATA as a variable length hexadecimal value.
Additionally multiple options may be listed using a comma delimiter.

v2:
 - fix sparse warnings in patches 3 and 4 (first one reported by
   build bot).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sched: add tunnel option support to act_tunnel_key
Simon Horman [Wed, 27 Jun 2018 04:39:37 +0000 (21:39 -0700)]
net/sched: add tunnel option support to act_tunnel_key

Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: check tunnel option type in tunnel flags
Pieter Jansen van Vuuren [Wed, 27 Jun 2018 04:39:36 +0000 (21:39 -0700)]
net: check tunnel option type in tunnel flags

Check the tunnel option type stored in tunnel flags when creating options
for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel
options on interfaces that are not associated with them.

Make sure all users of the infrastructure set correct flags, for the BPF
helper we have to set all bits to keep backward compatibility.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sched: act_tunnel_key: add extended ack support
Simon Horman [Wed, 27 Jun 2018 04:39:35 +0000 (21:39 -0700)]
net/sched: act_tunnel_key: add extended ack support

Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG
during validation of user input.

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sched: act_tunnel_key: disambiguate metadata dst error cases
Simon Horman [Wed, 27 Jun 2018 04:39:34 +0000 (21:39 -0700)]
net/sched: act_tunnel_key: disambiguate metadata dst error cases

Metadata may be NULL for one of two reasons:
* Missing user input
* Failure to allocate the metadata dst

Disambiguate these case by returning -EINVAL for the former and -ENOMEM
for the latter rather than -EINVAL for both cases.

This is in preparation for using extended ack to provide more information
to users when parsing their input.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: Support ethtool private flags
Arjun Vynipadath [Tue, 26 Jun 2018 11:40:50 +0000 (17:10 +0530)]
cxgb4: Support ethtool private flags

This is used to change TX workrequests, which helps in
host->vf communication.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: Add support for FW_ETH_TX_PKT_VM_WR
Arjun Vynipadath [Tue, 26 Jun 2018 11:40:25 +0000 (17:10 +0530)]
cxgb4: Add support for FW_ETH_TX_PKT_VM_WR

The present TX workrequest(FW_ETH_TX_PKT_WR) cant be used for
host->vf communication, since it doesn't loopback the outgoing
packets to virtual interfaces on the same port. This can be done
using FW_ETH_TX_PKT_VM_WR.
This fix depends on ethtool_flags to determine what WR to use for
TX path. Support for setting this flags by user is added in next
commit.

Based on the original work by : Casey Leedom <leedom@chelsio.com>

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: add support for SCTP_REUSE_PORT sockopt
Xin Long [Thu, 28 Jun 2018 07:31:00 +0000 (15:31 +0800)]
sctp: add support for SCTP_REUSE_PORT sockopt

This feature is actually already supported by sk->sk_reuse which can be
set by socket level opt SO_REUSEADDR. But it's not working exactly as
RFC6458 demands in section 8.1.27, like:

  - This option only supports one-to-one style SCTP sockets
  - This socket option must not be used after calling bind()
    or sctp_bindx().

Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
work in linux.

To separate it from the socket level version, this patch adds 'reuse' in
sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
setup limitations that are needed when it is being enabled.

"It should be noted that the behavior of the socket-level socket option
to reuse ports and/or addresses for SCTP sockets is unspecified", so it
leaves SO_REUSEADDR as is for the compatibility.

Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
functionality is nearly identical to SO_REUSEADDR, but with some
extra restrictions. Here it uses 'reuse' in sctp_sock instead of
'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
added in another patch.

Thanks to Neil to make this clear.

v1->v2:
  - add sctp_sk->reuse to separate it from the socket level version.
v2->v3:
  - improve changelog according to Marcelo's suggestion.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: stmmac: dwmac-rk: Add GMAC support for px30
David Wu [Thu, 28 Jun 2018 01:33:21 +0000 (09:33 +0800)]
net: ethernet: stmmac: dwmac-rk: Add GMAC support for px30

Add constants and callback functions for the dwmac on px30 Soc.
The base structure is the same, but registers and the bits in
them are moved slightly, and add the clk_mac_speed for selecting
mac speed.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotg3: Mark expected switch fall-throughs
Gustavo A. R. Silva [Thu, 28 Jun 2018 01:45:24 +0000 (20:45 -0500)]
tg3: Mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ila-Cleanup'
David S. Miller [Fri, 29 Jun 2018 02:32:55 +0000 (11:32 +0900)]
Merge branch 'ila-Cleanup'

Tom Herbert says:

====================
ila: Cleanup

Perform some cleanup in ILA code. This includes:

- Fix rhashtable walk for cases where nl dumps are done with muliple
  function calls. Add a skip index to skip over entries in
  a node that have been previously visitied. Call rhashtable_walk_peek
  to avoid dropping items between calls to ila_nl_dump.
- Call alloc_bucket_spinlocks to create bucket locks.
- Split out module initialization and netlink definitions into
  separate files.
- Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoila: Flush netlink command to clear xlat table
Tom Herbert [Wed, 27 Jun 2018 21:39:02 +0000 (14:39 -0700)]
ila: Flush netlink command to clear xlat table

Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoila: Create main ila source file
Tom Herbert [Wed, 27 Jun 2018 21:39:01 +0000 (14:39 -0700)]
ila: Create main ila source file

Create a main ila file that contains the module initialization functions
as well as netlink definitions. Previously these were defined in
ila_xlat and ila_common. This approach allows better extensibility.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoila: Call library function alloc_bucket_locks
Tom Herbert [Wed, 27 Jun 2018 21:39:00 +0000 (14:39 -0700)]
ila: Call library function alloc_bucket_locks

To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoila: Fix use of rhashtable walk in ila_xlat.c
Tom Herbert [Wed, 27 Jun 2018 21:38:59 +0000 (14:38 -0700)]
ila: Fix use of rhashtable walk in ila_xlat.c

Perform better EAGAIN handling, handle case where ila_dump_info
fails and we missed objects in the dump, and add a skip index
to skip over ila entires in a list on a rhashtable node that have
already been visited (by a previous call to ila_nl_dump).

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'hns3-a-few-code-improvements'
David S. Miller [Fri, 29 Jun 2018 02:06:35 +0000 (11:06 +0900)]
Merge branch 'hns3-a-few-code-improvements'

Peng Li says:

====================
net: hns3: a few code improvements

This patchset fixes a few code stylistic issues from
concentrated review, no functional changes introduced.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: use lower_32_bits and upper_32_bits
Huazhong Tan [Thu, 28 Jun 2018 04:12:29 +0000 (12:12 +0800)]
net: hns3: use lower_32_bits and upper_32_bits

MACRO lower_32_bits and upper_32_bits can help to get bits 0-31
and bits 32-63 of a number, so just use it.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: remove back in struct hclge_hw
Huazhong Tan [Thu, 28 Jun 2018 04:12:28 +0000 (12:12 +0800)]
net: hns3: remove back in struct hclge_hw

hclge_hw is embedded in hclge_dev, so use container_of instead of
back to get hclge_dev.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: remove the Redundant put_vector in hns3_client_uninit
Peng Li [Thu, 28 Jun 2018 04:12:27 +0000 (12:12 +0800)]
net: hns3: remove the Redundant put_vector in hns3_client_uninit

The interface h->ae_algo->ops->put_vector is called in both
hns3_nic_dealloc_vector_data and hns3_nic_uninit_vector_data in
hns3_client_uninit, this will cause vector freed twice.
This patch remove the Redundant put_vector to make vector freed
only once.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: print the ret value in error information
Peng Li [Thu, 28 Jun 2018 04:12:26 +0000 (12:12 +0800)]
net: hns3: print the ret value in error information

Print the ret value in error information can help find the reason.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: extraction an interface for state init|uninit
Peng Li [Thu, 28 Jun 2018 04:12:25 +0000 (12:12 +0800)]
net: hns3: extraction an interface for state init|uninit

Extraction an interface for state init|uninit to make the code
easier to read.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: remove unused head file in hnae3.c
Peng Li [Thu, 28 Jun 2018 04:12:24 +0000 (12:12 +0800)]
net: hns3: remove unused head file in hnae3.c

linux/slab.h is not used in hnae3.h, this patch removes it.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: add unlikely for error check
Peng Li [Thu, 28 Jun 2018 04:12:23 +0000 (12:12 +0800)]
net: hns3: add unlikely for error check

The first bd of a packet is invalid and invalid ring head for tx
IRQ is not offen, they may occur when there is error,
Add unlikely for error check branch is better for performance.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: add l4_type check for both ipv4 and ipv6
Peng Li [Thu, 28 Jun 2018 04:12:22 +0000 (12:12 +0800)]
net: hns3: add l4_type check for both ipv4 and ipv6

HW supports UDP, TCP and SCTP packets checksum for both ipv4 and
ipv6,  but do not support other type packets checksum for ipv4 or
ipv6.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: add vector status check before free vector
Peng Li [Thu, 28 Jun 2018 04:12:21 +0000 (12:12 +0800)]
net: hns3: add vector status check before free vector

If the hdev->vector_status[vector_id] is already HCLGE_INVALID_VPORT,
should log the error and return.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: rename the interface for init_client_instance and uninit_client_instance
Peng Li [Thu, 28 Jun 2018 04:12:20 +0000 (12:12 +0800)]
net: hns3: rename the interface for init_client_instance and uninit_client_instance

The interface init_client_instance and uninit_client_instance
do not register anything, only initialize the client instance.
This patch rename the related interface to make the function
name to indicate the purpose.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: remove hclge_get_vector_index from hclge_bind_ring_with_vector
Peng Li [Thu, 28 Jun 2018 04:12:19 +0000 (12:12 +0800)]
net: hns3: remove hclge_get_vector_index from hclge_bind_ring_with_vector

In hclge_unmap_ring_frm_vector, there are 2 steps:
step 1: get vector index.
step 2 unbind ring with vector.

But it gets vector id again in step 2 interface. This patch
removes hclge_get_vector_index from hclge_bind_ring_with_vector,
and make the step the same with hns3 PF driver.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: Update NIC HW stats on demand only
Saeed Mahameed [Thu, 24 May 2018 01:26:09 +0000 (18:26 -0700)]
net/mlx5e: Update NIC HW stats on demand only

Disable periodic stats update background thread and update stats in
background on demand when ndo_get_stats is called.

Having a background thread running in the driver all the time is bad for
power consumption and normally a user space daemon will query the stats
once every specific interval, so ideally the background thread and its
interval can be done in user space..

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
6 years agonet/mlx5e: Add counter for total num of NOP operations
Tariq Toukan [Sun, 13 May 2018 10:42:16 +0000 (13:42 +0300)]
net/mlx5e: Add counter for total num of NOP operations

A per-ring counter for NOP operations already exists.
Here I add a counter that sums them up.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Add counter for MPWQE filler strides
Tariq Toukan [Wed, 28 Jun 2017 16:27:18 +0000 (19:27 +0300)]
net/mlx5e: Add counter for MPWQE filler strides

Add ethtool counter to indicate the number of strides consumed
by filler CQEs.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Add channel events counter
Tariq Toukan [Tue, 13 Mar 2018 09:19:28 +0000 (11:19 +0200)]
net/mlx5e: Add channel events counter

Add per-channel and global ethtool counters for channel events.
Each event indicates an interrupt on one of the channel's
completion queues.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Add a counter for congested UMRs
Tariq Toukan [Sun, 4 Mar 2018 12:25:00 +0000 (14:25 +0200)]
net/mlx5e: Add a counter for congested UMRs

Add per-ring and global ethtool counters for congested UMR requests.
These events indicate congestion in UMR handlers in HW.

Such event is concluded when there's an outstanding UMR post,
yet the SW consumed at least two additional MPWQEs in the meanwhile.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Add NAPI statistics
Tariq Toukan [Wed, 2 May 2018 15:29:42 +0000 (18:29 +0300)]
net/mlx5e: Add NAPI statistics

Add per-channel and global ethtool counters for NAPI.
This helps us monitor and analyze performance in general.

- ch[i]_poll:
  the number of times the channel's NAPI poll was invoked.

- ch[i]_arm:
  the number of times the channel's NAPI poll completed
  and armed the completion queues.

- ch[i]_aff_change:
  the number of times the channel's NAPI poll explicitly
  stopped execution on a cpu due to a change in affinity.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Add XDP_TX completions statistics
Tariq Toukan [Sun, 4 Mar 2018 08:35:00 +0000 (10:35 +0200)]
net/mlx5e: Add XDP_TX completions statistics

Add per-ring and global ethtool counters for XDP_TX completions.
This helps us monitor and analyze XDP_TX flow performance.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Add TX completions statistics
Tariq Toukan [Wed, 18 Apr 2018 10:33:15 +0000 (13:33 +0300)]
net/mlx5e: Add TX completions statistics

Add per-ring and global ethtool counters for TX completions.
This helps us monitor and analyze TX flow performance.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: RX, Use existing WQ local variable
Tariq Toukan [Sun, 3 Jun 2018 14:41:48 +0000 (17:41 +0300)]
net/mlx5e: RX, Use existing WQ local variable

Local variable 'wq' already points to &sq->wq, use it.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Convert large order kzalloc allocations to kvzalloc
Tariq Toukan [Tue, 5 Jun 2018 08:47:04 +0000 (11:47 +0300)]
net/mlx5e: Convert large order kzalloc allocations to kvzalloc

Replace calls to kzalloc_node with kvzalloc_node, as it fallsback
to lower-order pages if the higher-order trials fail.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Add UDP GSO remaining counter
Boris Pismenny [Mon, 11 Jun 2018 14:24:58 +0000 (17:24 +0300)]
net/mlx5e: Add UDP GSO remaining counter

This patch adds a counter for tx UDP GSO packets that contain a segment
that is not aligned to MSS - remaining segment.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Add UDP GSO support
Boris Pismenny [Thu, 31 May 2018 12:29:42 +0000 (15:29 +0300)]
net/mlx5e: Add UDP GSO support

This patch enables UDP GSO support. We enable this by using two WQEs
the first is a UDP LSO WQE for all segments with equal length, and the
second is for the last segment in case it has different length.
Due to HW limitation, before sending, we must adjust the packet length fields.

We measure performance between two Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
machines connected back-to-back with Connectx4-Lx (40Gbps) NICs.
We compare single stream UDP, UDP GSO and UDP GSO with offload.
Performance:
| MSS (bytes) | Throughput (Gbps) | CPU utilization (%)
UDP GSO offload | 1472 | 35.6 | 8%
UDP GSO  | 1472 | 25.5 | 17%
UDP  | 1472 | 10.2 | 17%
UDP GSO offload | 1024 | 35.6 | 8%
UDP GSO | 1024 | 19.2 | 17%
UDP  | 1024 | 5.7 | 17%
UDP GSO offload | 512 | 33.8 | 16%
UDP GSO | 512 | 10.4 | 17%
UDP  | 512 | 3.5 | 17%

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agoMerge branch 'net-preserve-sock-reference-when-scrubbing-the-skb'
David S. Miller [Thu, 28 Jun 2018 13:21:33 +0000 (22:21 +0900)]
Merge branch 'net-preserve-sock-reference-when-scrubbing-the-skb'

Flavio Leitner says:

====================
net: preserve sock reference when scrubbing the skb.

The sock reference is lost when scrubbing the packet and that breaks
TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
performance impacts of about 50% in a single TCP stream when crossing
network namespaces.

XPS breaks because the queue mapping stored in the socket is not
available, so another random queue might be selected when the stack
needs to transmit something like a TCP ACK, or TCP Retransmissions.
That causes packet re-ordering and/or performance issues.

TSQ breaks because it orphans the packet while it is still in the
host, so packets are queued contributing to the buffer bloat problem.

Preserving the sock reference fixes both issues. The socket is
orphaned anyways in the receiving path before any relevant action,
but the transmit side needs some extra checking included in the
first patch.

The first patch will update netfilter to check if the socket
netns is local before use it.

The second patch removes the skb_orphan() from the skb_scrub_packet()
and improve the documentation.

ChangeLog:
- split into two (Eric)
- addressed Paolo's offline feedback to swap the checks in xt_socket.c
  to preserve original behavior.
- improved ip-sysctl.txt (reported by Cong)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoskbuff: preserve sock reference when scrubbing the skb.
Flavio Leitner [Wed, 27 Jun 2018 13:34:26 +0000 (10:34 -0300)]
skbuff: preserve sock reference when scrubbing the skb.

The sock reference is lost when scrubbing the packet and that breaks
TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
performance impacts of about 50% in a single TCP stream when crossing
network namespaces.

XPS breaks because the queue mapping stored in the socket is not
available, so another random queue might be selected when the stack
needs to transmit something like a TCP ACK, or TCP Retransmissions.
That causes packet re-ordering and/or performance issues.

TSQ breaks because it orphans the packet while it is still in the
host, so packets are queued contributing to the buffer bloat problem.

Preserving the sock reference fixes both issues. The socket is
orphaned anyways in the receiving path before any relevant action
and on TX side the netfilter checks if the reference is local before
use it.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetfilter: check if the socket netns is correct.
Flavio Leitner [Wed, 27 Jun 2018 13:34:25 +0000 (10:34 -0300)]
netfilter: check if the socket netns is correct.

Netfilter assumes that if the socket is present in the skb, then
it can be used because that reference is cleaned up while the skb
is crossing netns.

We want to change that to preserve the socket reference in a future
patch, so this is a preparation updating netfilter to check if the
socket netns matches before use it.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-sched-actions-code-style-cleanup-and-fixes'
David S. Miller [Thu, 28 Jun 2018 13:12:03 +0000 (22:12 +0900)]
Merge branch 'net-sched-actions-code-style-cleanup-and-fixes'

Roman Mashak says:

====================
net sched actions: code style cleanup and fixes

The patchset fixes a few code stylistic issues and typos, as well as one
detected by sparse semantic checker tool.

No functional changes introduced.

Patch 1 & 2 fix coding style bits caught by the checkpatch.pl script
Patch 3 fixes an issue with a shadowed variable
Patch 4 adds sizeof() operator instead of magic number for buffer length
Patch 5 fixes typos in diagnostics messages
Patch 6 explicitly sets unsigned char for bitwise operation

v2:
   - submit for net-next
   - added Reviewed-by tags
   - use u8* instead of char* as per Davide Caratti suggestion
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet sched actions: avoid bitwise operation on signed value in pedit
Roman Mashak [Wed, 27 Jun 2018 17:33:35 +0000 (13:33 -0400)]
net sched actions: avoid bitwise operation on signed value in pedit

Since char can be unsigned or signed, and bitwise operators may have
implementation-dependent results when performed on signed operands,
declare 'u8 *' operand instead.

Suggested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet sched actions: fix misleading text strings in pedit action
Roman Mashak [Wed, 27 Jun 2018 17:33:34 +0000 (13:33 -0400)]
net sched actions: fix misleading text strings in pedit action

Change "tc filter pedit .." to "tc actions pedit .." in error
messages to clearly refer to pedit action.

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet sched actions: use sizeof operator for buffer length
Roman Mashak [Wed, 27 Jun 2018 17:33:33 +0000 (13:33 -0400)]
net sched actions: use sizeof operator for buffer length

Replace constant integer with sizeof() to clearly indicate
the destination buffer length in skb_header_pointer() calls.

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet sched actions: fix sparse warning
Roman Mashak [Wed, 27 Jun 2018 17:33:32 +0000 (13:33 -0400)]
net sched actions: fix sparse warning

The variable _data in include/asm-generic/sections.h defines sections,
this causes sparse warning in pedit:

net/sched/act_pedit.c:293:35: warning: symbol '_data' shadows an earlier one
./include/asm-generic/sections.h:36:13: originally declared here

Therefore rename the variable.

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet sched actions: fix coding style in pedit headers
Roman Mashak [Wed, 27 Jun 2018 17:33:31 +0000 (13:33 -0400)]
net sched actions: fix coding style in pedit headers

Fix coding style issues in tc pedit headers detected by the
checkpatch script.

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet sched actions: fix coding style in pedit action
Roman Mashak [Wed, 27 Jun 2018 17:33:30 +0000 (13:33 -0400)]
net sched actions: fix coding style in pedit action

Fix coding style issues in tc pedit action detected by the
checkpatch script.

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetem: slotting with non-uniform distribution
Yousuk Seung [Wed, 27 Jun 2018 17:32:19 +0000 (10:32 -0700)]
netem: slotting with non-uniform distribution

Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.

Commit f043efeae2f1 ("netem: support delivering packets in delayed
time slots") added the slotting feature to approximate the behaviors
of media with packet aggregation but only supported a uniform
distribution for delays between transmission attempts. Tests with TCP
BBR with emulated wifi links with non-uniform distributions produced
more useful results.

Syntax:
   slot dist DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
      [bytes MAX_BYTES]

The syntax and use of the distribution table is the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table and it
takes DELAY + ( x * JITTER ) as delay. Correlation between values is not
supported.

Examples:
  Normal distribution delay with mean = 800us and stdev = 100us.
  > tc qdisc add dev eth0 root netem slot dist normal 800us 100us

  Optionally set the max slot size in bytes and/or packets.
  > tc qdisc add dev eth0 root netem slot dist normal 800us 100us \
    bytes 64k packets 42

Signed-off-by: Yousuk Seung <ysseung@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetlink: Return extack message if attribute validation fails
David Ahern [Tue, 26 Jun 2018 19:39:18 +0000 (12:39 -0700)]
netlink: Return extack message if attribute validation fails

Have one extack message for parsing and validating.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: xgmiitorgmii: Check read_status results
Brandon Maier [Tue, 26 Jun 2018 17:50:50 +0000 (12:50 -0500)]
net: phy: xgmiitorgmii: Check read_status results

We're ignoring the result of the attached phy device's read_status().
Return it so we can detect errors.

Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: xgmiitorgmii: Use correct mdio bus
Brandon Maier [Tue, 26 Jun 2018 17:50:49 +0000 (12:50 -0500)]
net: phy: xgmiitorgmii: Use correct mdio bus

The xgmiitorgmii is using the mii_bus of the device it's attached to,
instead of the bus it was given during probe.

Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: xgmiitorgmii: Check phy_driver ready before accessing
Brandon Maier [Tue, 26 Jun 2018 17:50:48 +0000 (12:50 -0500)]
net: phy: xgmiitorgmii: Check phy_driver ready before accessing

Since a phy_device is added to the global mdio_bus list during
phy_device_register(), but a phy_device's phy_driver doesn't get
attached until phy_probe(). It's possible of_phy_find_device() in
xgmiitorgmii will return a valid phy with a NULL phy_driver. Leading to
a NULL pointer access during the memcpy().

Fixes this Oops:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.40 #1
Hardware name: Xilinx Zynq Platform
task: ce4c8d00 task.stack: ce4ca000
PC is at memcpy+0x48/0x330
LR is at xgmiitorgmii_probe+0x90/0xe8
pc : [<c074bc68>]    lr : [<c0529548>]    psr: 20000013
sp : ce4cbb54  ip : 00000000  fp : ce4cbb8c
r10: 00000000  r9 : 00000000  r8 : c0c49178
r7 : 00000000  r6 : cdc14718  r5 : ce762800  r4 : cdc14710
r3 : 00000000  r2 : 00000054  r1 : 00000000  r0 : cdc14718
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 18c5387d  Table: 0000404a  DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0xce4ca210)
...
[<c074bc68>] (memcpy) from [<c0529548>] (xgmiitorgmii_probe+0x90/0xe8)
[<c0529548>] (xgmiitorgmii_probe) from [<c0526a94>] (mdio_probe+0x28/0x34)
[<c0526a94>] (mdio_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
[<c04db98c>] (driver_probe_device) from [<c04dbd58>] (__device_attach_driver+0xac/0x10c)
[<c04dbd58>] (__device_attach_driver) from [<c04d96f4>] (bus_for_each_drv+0x84/0xc8)
[<c04d96f4>] (bus_for_each_drv) from [<c04db5bc>] (__device_attach+0xd0/0x134)
[<c04db5bc>] (__device_attach) from [<c04dbdd4>] (device_initial_probe+0x1c/0x20)
[<c04dbdd4>] (device_initial_probe) from [<c04da8fc>] (bus_probe_device+0x98/0xa0)
[<c04da8fc>] (bus_probe_device) from [<c04d8660>] (device_add+0x43c/0x5d0)
[<c04d8660>] (device_add) from [<c0526cb8>] (mdio_device_register+0x34/0x80)
[<c0526cb8>] (mdio_device_register) from [<c0580b48>] (of_mdiobus_register+0x170/0x30c)
[<c0580b48>] (of_mdiobus_register) from [<c05349c4>] (macb_probe+0x710/0xc00)
[<c05349c4>] (macb_probe) from [<c04dd700>] (platform_drv_probe+0x44/0x80)
[<c04dd700>] (platform_drv_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
[<c04db98c>] (driver_probe_device) from [<c04dbc58>] (__driver_attach+0x10c/0x118)
[<c04dbc58>] (__driver_attach) from [<c04d9600>] (bus_for_each_dev+0x8c/0xd0)
[<c04d9600>] (bus_for_each_dev) from [<c04db1fc>] (driver_attach+0x2c/0x30)
[<c04db1fc>] (driver_attach) from [<c04daa98>] (bus_add_driver+0x50/0x260)
[<c04daa98>] (bus_add_driver) from [<c04dc440>] (driver_register+0x88/0x108)
[<c04dc440>] (driver_register) from [<c04dd6b4>] (__platform_driver_register+0x50/0x58)
[<c04dd6b4>] (__platform_driver_register) from [<c0b31248>] (macb_driver_init+0x24/0x28)
[<c0b31248>] (macb_driver_init) from [<c010203c>] (do_one_initcall+0x60/0x1a4)
[<c010203c>] (do_one_initcall) from [<c0b00f78>] (kernel_init_freeable+0x15c/0x1f8)
[<c0b00f78>] (kernel_init_freeable) from [<c0763d10>] (kernel_init+0x18/0x124)
[<c0763d10>] (kernel_init) from [<c0112d74>] (ret_from_fork+0x14/0x20)
Code: ba000002 f5d1f03c f5d1f05c f5d1f07c (e8b151f8)
---[ end trace 3e4ec21905820a1f ]---

Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ipsec-selftests-updates'
David S. Miller [Thu, 28 Jun 2018 07:10:08 +0000 (16:10 +0900)]
Merge branch 'ipsec-selftests-updates'

Shannon Nelson says:

====================
Updates for ipsec selftests

Fix up the existing ipsec selftest and add tests for
the ipsec offload driver API.

v2: addressed formatting nits in netdevsim from Jakub Kicinski
v3: a couple more nits from Jakub
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: rtnetlink: add ipsec offload API test
Shannon Nelson [Tue, 26 Jun 2018 17:07:55 +0000 (10:07 -0700)]
selftests: rtnetlink: add ipsec offload API test

Using the netdevsim as a device for testing, try out the XFRM commands
for setting up IPsec hardware offloads.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetdevsim: add ipsec offload testing
Shannon Nelson [Tue, 26 Jun 2018 17:07:54 +0000 (10:07 -0700)]
netdevsim: add ipsec offload testing

Implement the IPsec/XFRM offload API for testing.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: rtnetlink: use dummydev as a test device
Shannon Nelson [Tue, 26 Jun 2018 17:07:53 +0000 (10:07 -0700)]
selftests: rtnetlink: use dummydev as a test device

We really shouldn't mess with local system settings, so let's
use the already created dummy device instead for ipsec testing.
Oh, and let's put the temp file into a proper directory.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: rtnetlink: clear the return code at start of ipsec test
Shannon Nelson [Tue, 26 Jun 2018 17:07:52 +0000 (10:07 -0700)]
selftests: rtnetlink: clear the return code at start of ipsec test

Following the custom from the other functions, clear the global
ret code before starting the test so as to not have previously
failed tests cause us to thing this test has failed.

Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agol2tp: define helper for parsing struct sockaddr_pppol2tp*
Guillaume Nault [Tue, 26 Jun 2018 16:41:36 +0000 (18:41 +0200)]
l2tp: define helper for parsing struct sockaddr_pppol2tp*

'sockaddr_len' is checked against various values when entering
pppol2tp_connect(), to verify its validity. It is used again later, to
find out which sockaddr structure was passed from user space. This
patch combines these two operations into one new function in order to
simplify pppol2tp_connect().

A new structure, l2tp_connect_info, is used to pass sockaddr data back
to pppol2tp_connect(), to avoid passing too many parameters to
l2tp_sockaddr_get_info(). Also, the first parameter is void* in order
to avoid casting between all sockaddr_* structures manually.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: remove one indentation level in tcp_create_openreq_child
Eric Dumazet [Tue, 26 Jun 2018 15:45:49 +0000 (08:45 -0700)]
tcp: remove one indentation level in tcp_create_openreq_child

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: fix *enum* {A|M}PR_BIT
Sergei Shtylyov [Tue, 26 Jun 2018 15:42:33 +0000 (18:42 +0300)]
sh_eth: fix *enum* {A|M}PR_BIT

The *enum* {A|M}PR_BIT were declared in the commit 86a74ff21a7a ("net:
sh_eth: add support for  Renesas SuperH Ethernet") adding SH771x support,
however the SH771x manual  doesn't have the APR/MPR registers described
and the code writing to them for SH7710 was later removed by the commit
380af9e390ec ("net: sh_eth: CPU dependency code collect to "struct
sh_eth_cpu_data""). All the newer SoC manuals have these registers
documented as having a 16-bit TIME parameter of the PAUSE frame, not
1-bit -- update the *enum* accordingly, fixing up the APR/MPR writes...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotc-tests: add an extreme-case csum action test
Keara Leibovitz [Tue, 26 Jun 2018 14:16:28 +0000 (10:16 -0400)]
tc-tests: add an extreme-case csum action test

Added an extreme-case test for all 7 csum action headers.

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mscc-ocelot-add-more-features'
David S. Miller [Thu, 28 Jun 2018 05:18:49 +0000 (14:18 +0900)]
Merge branch 'mscc-ocelot-add-more-features'

Alexandre Belloni says:

====================
net: mscc: ocelot: add more features

This series adds link aggregation and VLAN filtering hardware offload
support to the ocelot driver.

PTP support will be sent later.

changes in v2:
 - rebased on v4.18-rc1
 - check for aggregation type and only offload it when type is hash (balance-xor
   or 802.3ad)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mscc: ocelot: add VLAN filtering
Antoine Tenart [Tue, 26 Jun 2018 12:28:49 +0000 (14:28 +0200)]
net: mscc: ocelot: add VLAN filtering

Add hardware VLAN filtering offloading on ocelot.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>