platform/kernel/linux-rpi.git
John Hurley [Sun, 4 Aug 2019 15:09:06 +0000 (16:09 +0100)]
net: sched: add ingress mirred action to hardware IR

TC mirred actions (redirect and mirror) can send to egress or ingress of a
device. Currently only egress is used for hw offload rules.

Modify the intermediate representation for hw offload to include mirred
actions that go to ingress. This gives drivers access to such rules so
that they can decide whether or not to offload them.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:05 +0000 (16:09 +0100)]
net: tc_act: add helpers to detect ingress mirred actions

TC mirred actions can send to egress or ingress on a given netdev. Helpers
exist to detect actions that are mirred to egress. Extend the header file
to include helpers to detect ingress mirred actions.
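
As a rough illustration of what such detection helpers look like, here is a
minimal user-space sketch. The enum, struct and function names are local
stand-ins chosen for this example, not the kernel's definitions.

```c
#include <stdbool.h>

/* Local stand-ins for the four mirred directions; names and values are
 * assumptions made for this sketch only. */
enum mirred_eaction {
	EGRESS_REDIR = 1,
	EGRESS_MIRROR,
	INGRESS_REDIR,
	INGRESS_MIRROR,
};

/* Hypothetical, heavily simplified action record. */
struct mirred_action {
	enum mirred_eaction eaction;
};

/* True when the action redirects the packet to the ingress side. */
static bool is_ingress_redirect(const struct mirred_action *a)
{
	return a->eaction == INGRESS_REDIR;
}

/* True when the action mirrors (copies) the packet to the ingress side. */
static bool is_ingress_mirror(const struct mirred_action *a)
{
	return a->eaction == INGRESS_MIRROR;
}
```

A driver-side caller would test each action with these predicates and reject
the rule if it cannot offload that direction.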

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:04 +0000 (16:09 +0100)]
net: sched: add skbedit of ptype action to hardware IR

TC rules can implement skbedit actions. Currently actions that modify the
skb mark are passed to offloading drivers via the hardware intermediate
representation in the flow_offload API.

Extend this to include skbedit actions that modify the packet type of the
skb. Such actions may be used to set the ptype to HOST when redirecting a
packet to ingress.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 4 Aug 2019 15:09:03 +0000 (16:09 +0100)]
net: tc_act: add skbedit_ptype helper functions

The tc_act header file contains an inline function that checks if an
action is changing the skb mark of a packet and a further function to
extract the mark.

Add similar functions to check for and get skbedit actions that modify
the packet type of the skb.
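
The check/get pair described above might be sketched as follows. The flag
values, struct layout and function names are assumptions for illustration and
are not taken from the tc_act header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits saying which field an skbedit action modifies. */
#define EDIT_F_MARK  0x4u
#define EDIT_F_PTYPE 0x8u

/* Hypothetical, simplified skbedit parameters. */
struct skbedit_params {
	uint32_t flags;
	uint32_t mark;
	uint16_t ptype;
};

/* Check helper: does this action modify the packet type? */
static bool is_skbedit_ptype(const struct skbedit_params *p)
{
	return (p->flags & EDIT_F_PTYPE) != 0;
}

/* Get helper: only meaningful when is_skbedit_ptype() returned true. */
static uint16_t skbedit_ptype(const struct skbedit_params *p)
{
	return p->ptype;
}
```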

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Aug 2019 21:21:21 +0000 (14:21 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-08-04

This series contains more updates to fm10k from Jake Keller.

Jake removes the unnecessary initialization of some variables to help
resolve static code checker warnings.  Explicitly return success during
resume, since the value of 'err' is always success.  Fixed an issue with
incrementing a void pointer, which can produce undefined behavior.  Used
the __always_unused macro for function templates that are passed as
parameters in functions, but are not used.  Simplified the code by
removing an unnecessary macro in determining the value of NON_Q_VECTORS.
Fixed an issue, using bitwise operations to prevent the low address
overwriting the high portion of the address.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 4 Aug 2019 07:52:33 +0000 (09:52 +0200)]
r8169: sync PCIe PHY init with vendor driver 8.047.01

Synchronize PCIe PHY initialization with vendor driver version 8.047.01.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 4 Aug 2019 07:47:51 +0000 (09:47 +0200)]
r8169: add helper r8168_mac_ocp_modify

Add a helper for MAC OCP read-modify-write operations.
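
For readers unfamiliar with the pattern, a read-modify-write helper can be
sketched in user space as below. The register file and the reg_read() /
reg_write() accessors are hypothetical stand-ins, not the driver's MAC OCP
access functions.

```c
#include <stdint.h>

#define NUM_REGS 16

/* Fake 16-bit register file standing in for device registers. */
static uint16_t fake_regs[NUM_REGS];

static uint16_t reg_read(unsigned int reg)
{
	return fake_regs[reg];
}

static void reg_write(unsigned int reg, uint16_t val)
{
	fake_regs[reg] = val;
}

/* Clear the bits in 'mask', then set the bits in 'set', in one helper. */
static void reg_modify(unsigned int reg, uint16_t mask, uint16_t set)
{
	uint16_t data = reg_read(reg);

	reg_write(reg, (uint16_t)((data & ~mask) | set));
}
```

Callers then replace an open-coded read, mask, or and write sequence with a
single reg_modify() call.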

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 4 Aug 2019 07:42:57 +0000 (09:42 +0200)]
r8169: remove access to legacy register MultiIntr

This piece of code was inherited from the RTL8139 driver. However, the
register at address 0x5c has a different meaning on RTL8169 and is unused,
so we can remove this access.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Aug 2019 21:18:20 +0000 (14:18 -0700)]
Merge branch 'fq_codel-small-optimizations'

Dave Taht says:

====================
Two small fq_codel optimizations

These two patches improve fq_codel performance
under extreme network loads. The first patch
more rapidly escalates the codel count under
overload, the second just kills a totally useless
statistic.

(sent together because they'd otherwise conflict)
====================

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Taht [Sat, 3 Aug 2019 23:37:29 +0000 (16:37 -0700)]
fq_codel: Kill useless per-flow dropped statistic

It is almost impossible to get anything other than a 0 out of
the flow->dropped statistic with a tc class dump, as it resets to 0
on every round.

It also conflates ecn marks with drops.

It would have been useful had it kept a cumulative drop count, but
it doesn't. This patch doesn't change the API, it just stops
tracking a stat and state that is impossible to measure and nobody
uses.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Taht [Sat, 3 Aug 2019 23:37:28 +0000 (16:37 -0700)]
Increase fq_codel count in the bulk dropper

In the field fq_codel is often used with a smaller memory or
packet limit than the default, and when the bulk dropper is hit,
the drop pattern bifurcates into one that more slowly increases
the codel drop rate and hits the bulk dropper more than it should.

The scan through the 1024 queues happens more often than it needs to.

This patch increases the codel count in the bulk dropper, but
does not change the drop rate there, relying on the next codel round
to deliver the next packet at the original drop rate
(after that burst of loss), then escalate to a higher signaling rate.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Chancellor [Sat, 3 Aug 2019 06:01:56 +0000 (23:01 -0700)]
net: mdio-octeon: Fix Kconfig warnings and build errors

After commit 171a9bae68c7 ("staging/octeon: Allow test build on
!MIPS"), the following combination of configs causes a few Kconfig
warnings and build errors (distilled from arm allyesconfig and Randy's
randconfig builds):

    CONFIG_NETDEVICES=y
    CONFIG_STAGING=y
    CONFIG_COMPILE_TEST=y

and CONFIG_OCTEON_ETHERNET as either a module or built-in.

WARNING: unmet direct dependencies detected for MDIO_OCTEON
  Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=y] && MDIO_BUS [=y]
&& 64BIT [=n] && HAS_IOMEM [=y] && OF_MDIO [=n]
  Selected by [y]:
  - OCTEON_ETHERNET [=y] && STAGING [=y] && (CAVIUM_OCTEON_SOC ||
COMPILE_TEST [=y]) && NETDEVICES [=y]

In file included from ../drivers/net/phy/mdio-octeon.c:14:
../drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of
function ‘writeq’; did you mean ‘writel’?
[-Werror=implicit-function-declaration]
  111 | #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
      |                                    ^~~~~~

CONFIG_64BIT is not strictly necessary if the proper readq/writeq
definitions are included from io-64-nonatomic-lo-hi.h.
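
To illustrate the idea behind the non-atomic lo-hi accessors, the sketch below
splits a 64-bit access into two 32-bit accesses, low word first. It is a
user-space approximation: the write32()/read32() helpers and the layout (low
word at the lower address) are assumptions, and the two halves are not
accessed atomically.

```c
#include <stdint.h>

/* Hypothetical 32-bit accessors over a plain memory window. */
static void write32(volatile uint32_t *addr, uint32_t val)
{
	*addr = val;
}

static uint32_t read32(const volatile uint32_t *addr)
{
	return *addr;
}

/* 64-bit write as two 32-bit writes: low half first, then high half. */
static void lo_hi_write64(volatile uint32_t *addr, uint64_t val)
{
	write32(addr, (uint32_t)val);
	write32(addr + 1, (uint32_t)(val >> 32));
}

/* 64-bit read as two 32-bit reads: low half first, then high half. */
static uint64_t lo_hi_read64(const volatile uint32_t *addr)
{
	uint32_t lo = read32(addr);
	uint32_t hi = read32(addr + 1);

	return ((uint64_t)hi << 32) | lo;
}
```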

CONFIG_OF_MDIO is not needed when CONFIG_COMPILE_TEST is enabled because
of commit f9dc9ac51610 ("of/mdio: Add dummy functions in of_mdio.h.").

Fixes: 171a9bae68c7 ("staging/octeon: Allow test build on !MIPS")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 2 Aug 2019 19:34:55 +0000 (15:34 -0400)]
net: dsa: dump CPU port regs through master

Merge the CPU port registers dump into the master interface registers
dump through ethtool, by nesting the ethtool_drvinfo and ethtool_regs
structures of the CPU port into the dump.

drvinfo->regdump_len will contain the full data length, while regs->len
will contain only the master interface registers dump length.

This allows, for example, dumping the CPU port registers on a ZII Dev
C board like this:

    # ethtool -d eth1
    0x004:                                              0x00000000
    0x008:                                              0x0a8000aa
    0x010:                                              0x01000000
    0x014:                                              0x00000000
    0x024:                                              0xf0000102
    0x040:                                              0x6d82c800
    0x044:                                              0x00000020
    0x064:                                              0x40000000
    0x084: RCR (Receive Control Register)               0x47c00104
        MAX_FL (Maximum frame length)                   1984
        FCE (Flow control enable)                       0
        BC_REJ (Broadcast frame reject)                 0
        PROM (Promiscuous mode)                         0
        DRT (Disable receive on transmit)               0
        LOOP (Internal loopback)                        0
    0x0c4: TCR (Transmit Control Register)              0x00000004
        RFC_PAUSE (Receive frame control pause)         0
        TFC_PAUSE (Transmit frame control pause)        0
        FDEN (Full duplex enable)                       1
        HBC (Heartbeat control)                         0
        GTS (Graceful transmit stop)                    0
    0x0e4:                                              0x76735d6d
    0x0e8:                                              0x7e9e8808
    0x0ec:                                              0x00010000
    .
    .
    .
    88E6352  Switch Port Registers
    ------------------------------
    00: Port Status                            0x4d04
          Pause Enabled                        0
          My Pause                             1
          802.3 PHY Detected                   0
          Link Status                          Up
          Duplex                               Full
          Speed                                100 or 200 Mbps
          EEE Enabled                          0
          Transmitter Paused                   0
          Flow Control                         0
          Config Mode                          0x4
    01: Physical Control                       0x003d
          RGMII Receive Timing Control         Default
          RGMII Transmit Timing Control        Default
          200 BASE Mode                        100
          Flow Control's Forced value          0
          Force Flow Control                   0
          Link's Forced value                  Up
          Force Link                           1
          Duplex's Forced value                Full
          Force Duplex                         1
          Force Speed                          100 or 200 Mbps
    .
    .
    .

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Aug 2019 19:37:56 +0000 (12:37 -0700)]
Merge branch 'drop_monitor-Various-improvements-and-cleanups'

Ido Schimmel says:

====================
drop_monitor: Various improvements and cleanups

This patchset performs various improvements and cleanups in drop monitor
with no functional changes intended. There are no changes in these
patches relative to the RFC I sent two weeks ago [1].

A followup patchset will extend drop monitor with a packet alert mode in
which the dropped packet is notified to user space instead of just a
summary of recent drops. Subsequent patchsets will add the ability to
monitor hardware originated drops via drop monitor.

[1] https://patchwork.ozlabs.org/cover/1135226/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:56 +0000 (16:19 +0300)]
drop_monitor: Use pre_doit / post_doit hooks

Each operation from user space should be protected by the global drop
monitor mutex. Use the pre_doit / post_doit hooks to take / release the
lock instead of doing it explicitly in each function.
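
The shape of this change can be sketched outside the kernel as a dispatcher
that brackets every handler with a pre and post hook. Everything below is a
hypothetical stand-in: the "lock" is a simple depth counter rather than a
mutex, and struct op is not the generic netlink family structure.

```c
/* Stand-in for the global mutex: counts how often the lock is held. */
static int lock_depth;

static void dm_lock(void)
{
	lock_depth++;
}

static void dm_unlock(void)
{
	lock_depth--;
}

/* One operation with optional hooks run before and after the handler. */
struct op {
	void (*pre_doit)(void);
	int (*doit)(int arg);
	void (*post_doit)(void);
};

/* Dispatcher: the handler itself no longer takes or releases the lock. */
static int run_op(const struct op *op, int arg)
{
	int ret;

	if (op->pre_doit)
		op->pre_doit();
	ret = op->doit(arg);
	if (op->post_doit)
		op->post_doit();
	return ret;
}

/* Example handler: returns its argument only if the lock is held once. */
static int handler(int arg)
{
	return lock_depth == 1 ? arg : -1;
}
```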

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:55 +0000 (16:19 +0300)]
drop_monitor: Add extack support

Add various extack messages to make drop_monitor more user friendly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:54 +0000 (16:19 +0300)]
drop_monitor: Avoid multiple blank lines

Remove multiple blank lines which are visually annoying and useless.

This suppresses the "Please don't use multiple blank lines" checkpatch
messages.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:53 +0000 (16:19 +0300)]
drop_monitor: Document scope of spinlock

While 'per_cpu_dm_data' is a per-CPU variable, its 'skb' and
'send_timer' fields can be accessed concurrently by the CPU sending the
netlink notification to user space from the workqueue and the CPU
tracing kfree_skb(). This spinlock is meant to protect against that.

Document its scope and suppress the checkpatch message "spinlock_t
definition without comment".

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:52 +0000 (16:19 +0300)]
drop_monitor: Rename and document scope of mutex

The 'trace_state_mutex' does not only protect the global 'trace_state'
variable, but also the global 'hw_stats_list'.

Subsequent patches are going to add more operations from user space to
drop_monitor and these all need to be mutually exclusive.

Rename 'trace_state_mutex' to the more fitting 'net_dm_mutex' name and
document its scope.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:51 +0000 (16:19 +0300)]
drop_monitor: Use correct error code

The error code 'ENOTSUPP' is reserved for use with NFS. Use 'EOPNOTSUPP'
instead.
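
A minimal user-space sketch of the convention follows. EOPNOTSUPP comes from
the standard <errno.h>, so callers can interpret it; the set_mode() function
and its argument are hypothetical.

```c
#include <errno.h>

/* Hypothetical handler: only mode 0 is implemented in this sketch. */
static int set_mode(int mode)
{
	if (mode != 0)
		return -EOPNOTSUPP; /* negative errno, kernel style */
	return 0;
}
```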

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 6 Aug 2019 13:06:09 +0000 (15:06 +0200)]
net: dsa: ksz: Drop NET_DSA_TAG_KSZ9477

This Kconfig option is unused, drop it.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 6 Aug 2019 13:06:08 +0000 (15:06 +0200)]
net: dsa: ksz: Merge ksz_priv.h into ksz_common.h

Merge the two headers into one, no functional change.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 6 Aug 2019 13:06:07 +0000 (15:06 +0200)]
net: dsa: ksz: Remove dead code and fix warnings

Remove ksz_port_cleanup(), which is unused. Add missing include
"ksz_common.h", which fixes the following warning when built with
make ... W=1

drivers/net/dsa/microchip/ksz_common.c:23:6: warning: no previous prototype for ‘...’ [-Wmissing-prototypes]

Note that the order of the headers cannot be swapped, as that would
trigger missing forward declaration errors, which would indicate the
way forward is to merge the two headers into one.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 2 Aug 2019 06:17:51 +0000 (02:17 -0400)]
cnic: Explicitly initialize all reference counts to 0.

The driver is relying on zero'ed allocated memory and does not
explicitly call atomic_set() to initialize the ref counts to 0.  Add
these atomic_set() calls so that it will be more straightforward
to convert atomic ref counts to refcount_t.
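
A user-space analogue of the idea, using C11 atomics instead of the kernel's
atomic_t, might look like this. The struct and function names are hypothetical;
the point is only that the counter is initialized explicitly rather than
relying on the zeroed allocation.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical context object carrying a reference count. */
struct ctx {
	atomic_int ref_count;
};

static struct ctx *ctx_alloc(void)
{
	struct ctx *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	/* Explicit initialization, even though calloc() zeroed the memory. */
	atomic_init(&c->ref_count, 0);
	return c;
}
```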

Reported-by: Chuhong Yuan <hslester96@gmail.com>
Cc: Rasesh Mody <rmody@marvell.com>
Cc: <GR-Linux-NIC-Dev@marvell.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 22:18:08 +0000 (15:18 -0700)]
ipv6: have a single rcu unlock point in __ip6_rt_update_pmtu

Simplify the unlock path in __ip6_rt_update_pmtu by using a
single point where rcu_read_unlock is called.
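
The single-exit pattern can be sketched as below. The lock functions are fake
stand-ins that only count nesting depth, and update_value() is a hypothetical
function, not the routing code.

```c
/* Fake read-side lock: tracks nesting depth so balance can be checked. */
static int read_lock_depth;

static void fake_read_lock(void)
{
	read_lock_depth++;
}

static void fake_read_unlock(void)
{
	read_lock_depth--;
}

/* Every early exit jumps to one label, so the unlock is written once. */
static int update_value(const int *entry, int new_val, int *out)
{
	int ret = -1;

	fake_read_lock();
	if (!entry)
		goto out_unlock;
	if (new_val >= *entry)
		goto out_unlock;
	*out = new_val;
	ret = 0;
out_unlock:
	fake_read_unlock();
	return ret;
}
```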

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 5 Aug 2019 17:50:05 +0000 (10:50 -0700)]
Merge tag 'mlx5-updates-2019-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-08-01

Misc updates for mlx5 netdev driver:

1) Ingress rate support for E-Switch vports from Eli.
2) Gavi introduces flow counters bulk allocation and pool,
   to improve the performance of flow counter acquisition.
3) From Tariq, micro improvements for tx path
4) From Shay, small improvement for XDP TX MPWQE inline flow.
5) Aya provides some cleanups for tx devlink health reporters.
6) Saeed, refactor checksum handling into a single function.
7) Tonghao, allows dropping specific tunnel packets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 5 Aug 2019 10:52:11 +0000 (11:52 +0100)]
][next] selftests: nettest: fix spelling mistake: "potocol" -> "protocol"

There is a spelling mistake in an error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Mon, 8 Jul 2019 23:12:36 +0000 (16:12 -0700)]
fm10k: fix fm10k_get_fault_pf to read correct address

Fix assignment of the FM10K_FAULT_ADDR_LO register into fault->address
by using a bit-wise |= operation. Without this, the low address is
completely overwriting the high portion of the address. This caused the
fault to incorrectly return only the lower 32 bits of the fault address.
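
The difference between assigning and OR-ing the low word can be sketched as
follows. The function name and parameters are hypothetical; the two 32-bit
arguments stand in for the values read from the high and low fault address
registers.

```c
#include <stdint.h>

/* Combine two 32-bit register values into one 64-bit address. */
static uint64_t fault_address(uint32_t addr_hi, uint32_t addr_lo)
{
	uint64_t address = (uint64_t)addr_hi << 32;

	/* '|=' keeps the upper 32 bits; a plain '=' would discard them. */
	address |= addr_lo;
	return address;
}
```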

This issue was detected by cppcheck and resolves the following warnings
produced by that tool:

[fm10k_pf.c:1668] -> [fm10k_pf.c:1670]: (style) Variable
'fault->address' is reassigned a value before the old one has been used.

[fm10k_pf.c:1669] -> [fm10k_pf.c:1670]: (style) Variable
'fault->address' is reassigned a value before the old one has been used.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:35 +0000 (16:12 -0700)]
fm10k: convert NON_Q_VECTORS(hw) into NON_Q_VECTORS

The driver currently uses a macro to decide whether we should use
NON_Q_VECTORS_PF or NON_Q_VECTORS_VF.

However, we also define NON_Q_VECTORS_VF to the same value as
NON_Q_VECTORS_PF. This means that the macro NON_Q_VECTORS(hw) will
always return the same value.

Let's just remove this macro, and replace it directly with an enum value
on the enum non_q_vectors.

This was detected by cppcheck and fixes the following warnings when
building with BUILD=KERNEL

[fm10k_ethtool.c:1123]: (style) Same value in both branches of ternary
operator.

[fm10k_ethtool.c:1142]: (style) Same value in both branches of ternary
operator.

[fm10k_main.c:1826]: (style) Same value in both branches of ternary
operator.

[fm10k_main.c:1849]: (style) Same value in both branches of ternary
operator.

[fm10k_main.c:1858]: (style) Same value in both branches of ternary
operator.

[fm10k_pci.c:901]: (style) Same value in both branches of ternary
operator.

[fm10k_pci.c:1040]: (style) Same value in both branches of ternary
operator.

[fm10k_pci.c:1726]: (style) Same value in both branches of ternary
operator.

[fm10k_pci.c:1763]: (style) Same value in both branches of ternary
operator.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:34 +0000 (16:12 -0700)]
fm10k: mark unused parameters with __always_unused

Several functions in the fm10k driver have specific function templates,
as they are used as function pointers. The parameters in these functions
are not always used. Explicitly mark unused parameters with the
__always_unused macro, so that the compiler will not warn about them
when building with the -Wunused-parameter warning enabled.
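
As a sketch of the technique, the example below marks a parameter that a
callback must accept but does not use. ALWAYS_UNUSED is a local macro defined
here for GCC and Clang; the callback type and functions are hypothetical.

```c
/* Local equivalent of an "always unused" marker (GCC/Clang attribute). */
#define ALWAYS_UNUSED __attribute__((unused))

/* Fixed callback signature that every handler has to match. */
typedef int (*msg_handler)(int id, const char *payload);

/* This handler ignores 'payload' but must still accept it. */
static int handle_id(int id, const char ALWAYS_UNUSED *payload)
{
	return id * 2;
}

/* Calls whichever handler was registered. */
static int dispatch(msg_handler handler, int id, const char *payload)
{
	return handler(id, payload);
}
```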

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:33 +0000 (16:12 -0700)]
fm10k: cast page_addr to u8 * when incrementing it

The page_addr variable is a void pointer. Incrementing it before calling
prefetch is technically undefined. Fix this by casting it to a u8*
pointer before incrementing it. This ensures that we increment the
pointer value in byte units, instead of relying on this undefined
behavior.
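
A minimal sketch of the cast, assuming a hypothetical advance_bytes() helper:
ISO C does not define arithmetic on void pointers (GCC accepts it as an
extension), so the pointer is converted to a byte pointer before the offset is
added.

```c
#include <stddef.h>
#include <stdint.h>

/* Advance a generic pointer by 'offset' bytes using a byte-sized type. */
static void *advance_bytes(void *page_addr, size_t offset)
{
	return (uint8_t *)page_addr + offset;
}
```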

This was detected by cppcheck, and resolves the following warning
produced by that tool:

[fm10k_main.c:328]: (portability) 'page_addr' is of type 'void *'. When
using void pointers in calculations, the behaviour is undefined.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:32 +0000 (16:12 -0700)]
fm10k: explicitly return 0 on success path in function

In the fm10k_handle_resume function, return 0 explicitly at the end of
the function instead of returning the err value.

This was detected by cppcheck and resolves the following style warning
produced by that tool:

[fm10k_pci.c:2768] -> [fm10k_pci.c:2787]: (warning) Identical condition
'err', second condition is always false

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:31 +0000 (16:12 -0700)]
fm10k: remove needless initialization of size local variable

The local variable 'size' in fm10k_dfwd_add_station is initialized, but
is always re-assigned immediately before use. Remove this unnecessary
initialization.

This was detected by cppcheck and resolves the following warning
produced by that tool:

[fm10k_netdev.c:1466]: (style) Variable 'size' is assigned a value that is never used.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:30 +0000 (16:12 -0700)]
fm10k: remove needless assignment of err local variable

The local variable err in several functions in the fm10k_netdev.c file
is initialized with a value that is never used. The err value is
immediately re-assigned in all cases where it will be checked. Remove
the unnecessary initializers.

This was detected by cppcheck and resolves the following warnings
produced by that tool:

[fm10k_netdev.c:999] -> [fm10k_netdev.c:1004]: (style) Variable 'err' is
reassigned a value before the old one has been used.

[fm10k_netdev.c:1019] -> [fm10k_netdev.c:1024]: (style) Variable 'err'
is reassigned a value before the old one has been used.

[fm10k_netdev.c:64]: (style) Variable 'err' is assigned a value that is
never used.

[fm10k_netdev.c:131]: (style) Variable 'err' is assigned a value that
is never used.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:29 +0000 (16:12 -0700)]
fm10k: remove unnecessary variable initializer

The err variable in the fm10k_tlv_attr_parse function is initialized
with zero. However, the function never reads err without first assigning
it from a function call. Remove this unnecessary initialization.

This was detected by cppcheck and resolves the following warning
produced by that tool:

[fm10k_tlv.c:498]: (style) Variable 'err' is assigned a value that is
never used.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Sat, 3 Aug 2019 17:42:05 +0000 (10:42 -0700)]
Merge branch 'net-l3-l4-functional-tests'

David Ahern says:

====================
net: Add functional tests for L3 and L4

This is a port of the functional test cases created during the development
of the VRF feature. It covers various permutations of icmp, tcp and udp
for IPv4 and IPv6 including negative tests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:48 +0000 (11:56 -0700)]
selftests: Add use case section to fcnal-test

Add use case section to fcnal-test.

Initial test is VRF based with a bridge and vlans. The commands
stem from bug reports fixed by:

a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev")
cd6428988bf4 ("netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slave")

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:47 +0000 (11:56 -0700)]
selftests: Add ipv6 netfilter tests to fcnal-test

Add IPv6 netfilter tests to send tcp reset or icmp unreachable for a
port. Initial tests are VRF only.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:46 +0000 (11:56 -0700)]
selftests: Add ipv4 netfilter tests to fcnal-test

Add netfilter tests to send tcp reset or icmp unreachable for a port.
Initial tests are VRF only.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:45 +0000 (11:56 -0700)]
selftests: Add ipv6 runtime tests to fcnal-test

Add IPv6 runtime tests where passive (no traffic flowing) and active
(with traffic) sockets are expected to be reset on device deletes.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:44 +0000 (11:56 -0700)]
selftests: Add ipv4 runtime tests to fcnal-test

Add runtime tests where passive (no traffic flowing) and active (with
traffic) sockets are expected to be reset on device deletes.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:43 +0000 (11:56 -0700)]
selftests: Add ipv6 address bind tests to fcnal-test

Add IPv6 address bind tests to fcnal-test.sh. Verifies socket binding to
local addresses for raw, tcp and udp including device and VRF cases.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:42 +0000 (11:56 -0700)]
selftests: Add ipv4 address bind tests to fcnal-test

Add address bind tests to fcnal-test.sh. Verifies socket binding to
local addresses for raw, tcp and udp including device and VRF cases.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:41 +0000 (11:56 -0700)]
selftests: Add ipv6 udp tests to fcnal-test

Add IPv6 udp tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures for both clients and servers. Includes permutations with
net.ipv4.udp_l3mdev_accept set to 0 and 1.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:40 +0000 (11:56 -0700)]
selftests: Add ipv4 udp tests to fcnal-test

Add udp tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures for both clients and servers. Includes permutations with
net.ipv4.udp_l3mdev_accept set to 0 and 1.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: Add ipv6 tcp tests to fcnal-test
David Ahern [Thu, 1 Aug 2019 18:56:39 +0000 (11:56 -0700)]
selftests: Add ipv6 tcp tests to fcnal-test

Add IPv6 tcp tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures for both clients and servers. Includes permutations with
net.ipv4.tcp_l3mdev_accept set to 0 and 1.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: Add ipv4 tcp tests to fcnal-test
David Ahern [Thu, 1 Aug 2019 18:56:38 +0000 (11:56 -0700)]
selftests: Add ipv4 tcp tests to fcnal-test

Add tcp tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures for both clients and servers. Includes permutations with
net.ipv4.tcp_l3mdev_accept set to 0 and 1.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: Add ipv6 ping tests to fcnal-test
David Ahern [Thu, 1 Aug 2019 18:56:37 +0000 (11:56 -0700)]
selftests: Add ipv6 ping tests to fcnal-test

Add IPv6 ping tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures.

Setup includes unreachable routes and fib rules blocking traffic.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: Add ipv4 ping tests to fcnal-test
David Ahern [Thu, 1 Aug 2019 18:56:36 +0000 (11:56 -0700)]
selftests: Add ipv4 ping tests to fcnal-test

Add ping tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures.

Setup includes unreachable routes and fib rules blocking traffic.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: Setup for functional tests for fib and socket lookups
David Ahern [Thu, 1 Aug 2019 18:56:35 +0000 (11:56 -0700)]
selftests: Setup for functional tests for fib and socket lookups

Initial commit for functional test suite for fib and socket lookups.
This commit contains the namespace setup, networking config, test options
and other basic infrastructure.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: Add nettest
David Ahern [Thu, 1 Aug 2019 18:56:34 +0000 (11:56 -0700)]
selftests: Add nettest

Add nettest - a simple program with an implementation for various networking
APIs. nettest is used for tcp, udp and raw functional tests for both IPv4
and IPv6.

Point of this command versus existing utilities:
- controlled implementation of the APIs and the order in which they
  are called,
- ability to verify ingress device, local and remote addresses,
- timeout for controlled test length,
- ability to discriminate a timeout from a system call failure, and
- simplicity with test scripts.

The command returns:
  0  on success,
  1  for any system call failure, and
  2  on timeout.
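
As an illustration only, a test harness could map these exit codes as
sketched below. The function names classify and check are hypothetical
and are not part of nettest or fcnal-test.sh; only the three exit codes
come from the list above.

```python
# Exit codes as listed in the commit message above.
NETTEST_OK = 0
NETTEST_SYSCALL_FAILURE = 1
NETTEST_TIMEOUT = 2


def classify(rc: int) -> str:
    """Map a nettest exit status to a label a test script could log."""
    if rc == NETTEST_OK:
        return "success"
    if rc == NETTEST_SYSCALL_FAILURE:
        return "syscall-failure"
    if rc == NETTEST_TIMEOUT:
        return "timeout"
    return "unexpected"


def check(rc: int, expected: int) -> bool:
    """A test passes only when the observed status equals the expected one,
    so an expected timeout (2) is never confused with a failed call (1)."""
    return rc == expected
```

Keeping timeout and system call failure as distinct codes is what lets a
negative test expect one of them specifically.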

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Sat, 3 Aug 2019 17:33:01 +0000 (10:33 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-08-01

This series for fm10k, by Jake Keller, reduces the scope of local variables
where possible.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'enetc-PCIe-MDIO'
David S. Miller [Sat, 3 Aug 2019 01:22:18 +0000 (18:22 -0700)]
Merge branch 'enetc-PCIe-MDIO'

Claudiu Manoil says:

====================
enetc: Add mdio bus driver for the PCIe MDIO endpoint

First patch fixes a sparse issue and cleans up accessors to avoid
casting to __iomem.  The second one cleans up the Makefile, to make
it easier to add new entries.

Third patch just registers the PCIe endpoint device containing
the MDIO registers as a standalone MDIO bus driver, to provide
an alternative way to control the MDIO bus.  The same code used
by the ENETC ports (eth controllers) to manage MDIO via local
registers applies and is reused.

Bindings are provided for the new MDIO node, similarly to ENETC
port nodes bindings.

Last patch enables the ENETC port 1 and its RGMII PHY on the
LS1028A QDS board, where the MDIO muxing configuration relies
on the MDIO support provided in the first patch.

Changes since v0:
v1 - fixed mdio bus allocation
v2 - cleaned up accessors to avoid casting
v3 - fixed spelling (mostly commit message)
v4 - fixed err path check blunder
v5 - fixed loadable module build, provided separate kbuild module
     for the driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoarm64: dts: fsl: ls1028a: Enable eth port1 on the ls1028a QDS board
Claudiu Manoil [Thu, 1 Aug 2019 11:52:53 +0000 (14:52 +0300)]
arm64: dts: fsl: ls1028a: Enable eth port1 on the ls1028a QDS board

LS1028a has one Ethernet management interface. On the QDS board, the
MDIO signals are multiplexed to either the on-board AR8035 PHY device or
to 4 PCIe slots allowing for SGMII cards.
To enable the Ethernet ENETC Port 1, which can only be connected to an
RGMII PHY, the multiplexer needs to be configured to route the MDIO to
the AR8035 PHY.  The MDIO/MDC routing is controlled by bits 7:4 of FPGA
board config register 0x54, and value 0 selects the on-board RGMII PHY.
The FPGA board config registers are accessible on the i2c bus, at address
0x66.
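
The bit-field update described above can be sketched as follows. This
is a model only: the constant and function names are hypothetical, the
register is assumed to be 8 bits wide, and the actual routing is
expressed in the devicetree rather than in code like this.

```python
# Values taken from the description above; names are illustrative.
MDIO_MUX_REG = 0x54        # FPGA board config register offset
MDIO_MUX_SHIFT = 4         # field occupies bits 7:4
MDIO_MUX_MASK = 0xF0
MDIO_MUX_RGMII_PHY = 0x0   # value 0 selects the on-board RGMII PHY


def select_mdio_route(reg_value: int, route: int) -> int:
    """Return reg_value with bits 7:4 replaced by route.

    Bits 3:0 are preserved. An 8-bit register is assumed.
    """
    if not 0 <= route <= 0xF:
        raise ValueError("route must fit in 4 bits")
    return (reg_value & ~MDIO_MUX_MASK & 0xFF) | (route << MDIO_MUX_SHIFT)
```

For example, select_mdio_route(0xA5, MDIO_MUX_RGMII_PHY) clears the
upper nibble and leaves the lower nibble untouched.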

The PF3 MDIO PCIe integrated endpoint device allows for centralized access
to the MDIO bus.  Add the corresponding devicetree node and set it to be
the MDIO bus parent.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodt-bindings: net: fsl: enetc: Add bindings for the central MDIO PCIe endpoint
Claudiu Manoil [Thu, 1 Aug 2019 11:52:52 +0000 (14:52 +0300)]
dt-bindings: net: fsl: enetc: Add bindings for the central MDIO PCIe endpoint

The on-chip PCIe root complex that integrates the ENETC ethernet
controllers also integrates a PCIe endpoint for the MDIO controller
providing for centralized control of the ENETC mdio bus.
Add bindings for this "central" MDIO Integrated PCIe Endpoint.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenetc: Add mdio bus driver for the PCIe MDIO endpoint
Claudiu Manoil [Thu, 1 Aug 2019 11:52:51 +0000 (14:52 +0300)]
enetc: Add mdio bus driver for the PCIe MDIO endpoint

ENETC ports can manage the MDIO bus via local register
interface.  However there's also a centralized way
to manage the MDIO bus, via the MDIO PCIe endpoint
device integrated by the same root complex that also
integrates the ENETC ports (eth controllers).

Depending on board design and use case, centralized
access to MDIO may be better than using local ENETC
port registers.  For instance, on the LS1028A QDS board
where MDIO muxing is required.  Also, the LS1028A on-chip
switch doesn't have a local MDIO register interface.

The current patch registers the above PCIe endpoint as a
separate MDIO bus and provides a driver for it by re-using
the code used for local MDIO access.  It also allows the
ENETC port PHYs to be managed by this driver if the local
"mdio" node is missing from the ENETC port node.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenetc: Clean up makefile
Claudiu Manoil [Thu, 1 Aug 2019 11:52:50 +0000 (14:52 +0300)]
enetc: Clean up makefile

Clean up overcomplicated makefile to make it more maintainable.
Basically, there's a set of common objects shared between
the PF and VF driver modules.  This can be implemented in a
simpler way, without conditionals, less repetition, allowing
also for easier updates in the future.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenetc: Clean up local mdio bus allocation
Claudiu Manoil [Thu, 1 Aug 2019 11:52:49 +0000 (14:52 +0300)]
enetc: Clean up local mdio bus allocation

What's needed is basically a pointer to the mdio registers.
This is one way to store it inside bus->priv allocated space,
without upsetting sparse.
Reworked accessors to avoid __iomem casting.
Used devm_* variant to further clean up the init error /
remove paths.

Fixes following sparse warning:
 warning: incorrect type in assignment (different address spaces)
    expected void *priv
    got struct enetc_mdio_regs [noderef] <asn:2>*[assigned] regs

Fixes: ebfcb23d62ab ("enetc: Add ENETC PF level external MDIO support")

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-dsa-mv88e6xxx-add-support-for-MV88E6220'
David S. Miller [Sat, 3 Aug 2019 00:58:53 +0000 (17:58 -0700)]
Merge branch 'net-dsa-mv88e6xxx-add-support-for-MV88E6220'

Hubert Feurstein says:

====================
net: dsa: mv88e6xxx: add support for MV88E6220

This patch series adds support for the MV88E6220 chip to the mv88e6xxx driver.
The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins.

Furthermore, PTP support is added to the MV88E6250 family.

v2:
 - insert all 6220 entries in correct numerical order
 - introduce invalid_port_mask
 - move ptp_cc_mult* to ptp_ops and restored original ptp_adjfine code
 - added Andrew's Reviewed-by to patches 2 and 4
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: add PTP support for MV88E6250 family
Hubert Feurstein [Wed, 31 Jul 2019 08:23:51 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: add PTP support for MV88E6250 family

This adds PTP support for the MV88E6250 family.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: order ptp structs numerically ascending
Hubert Feurstein [Wed, 31 Jul 2019 08:23:50 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: order ptp structs numerically ascending

As it is done for all the other structs within this driver.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: setup message port is not supported in the 6250 family
Hubert Feurstein [Wed, 31 Jul 2019 08:23:49 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: setup message port is not supported in the 6250 family

The MV88E6250 family doesn't support the MV88E6XXX_PORT_CTL1_MESSAGE_PORT
bit.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: introduce invalid_port_mask in mv88e6xxx_info
Hubert Feurstein [Wed, 31 Jul 2019 08:23:48 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: introduce invalid_port_mask in mv88e6xxx_info

With this it is possible to mark certain chip ports as invalid. This is
required for example for the MV88E6220 (which is in general a MV88E6250
with 7 ports) but the ports 2-4 are not routed to pins.

If a user configures an invalid port, an error is returned.
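
A minimal model of the check, assuming one mask bit per port. The mask
value is derived from "ports 2-4" above; the helper names are
hypothetical and the driver itself is written in C.

```python
def bit(n: int) -> int:
    """Single-bit mask, mirroring the kernel's BIT() macro."""
    return 1 << n


NUM_PORTS = 7
# Ports 2-4 are not routed to pins on the MV88E6220.
INVALID_PORT_MASK_6220 = bit(2) | bit(3) | bit(4)


def port_is_valid(port: int, num_ports: int, invalid_port_mask: int) -> bool:
    """False for out-of-range ports and for ports marked invalid.

    A caller would return an error to the user when this is False.
    """
    if port < 0 or port >= num_ports:
        return False
    return not (invalid_port_mask & bit(port))


usable_ports = [p for p in range(NUM_PORTS)
                if port_is_valid(p, NUM_PORTS, INVALID_PORT_MASK_6220)]
```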

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodt-bindings: net: dsa: marvell: add 6220 model to the 6250 family
Hubert Feurstein [Wed, 31 Jul 2019 08:23:47 +0000 (10:23 +0200)]
dt-bindings: net: dsa: marvell: add 6220 model to the 6250 family

The MV88E6220 is part of the MV88E6250 family.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: add support for MV88E6220
Hubert Feurstein [Wed, 31 Jul 2019 08:23:46 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: add support for MV88E6220

The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins. So the usable ports are 0, 1, 5 and 6.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-phy-Add-AST2600-MDIO-support'
David S. Miller [Sat, 3 Aug 2019 00:56:37 +0000 (17:56 -0700)]
Merge branch 'net-phy-Add-AST2600-MDIO-support'

Andrew Jeffery says:

====================
net: phy: Add AST2600 MDIO support

v2 of the ASPEED MDIO series addresses comments from Rob on the devicetree
bindings and Andrew on the driver itself.

v1 of the series can be found here:

http://patchwork.ozlabs.org/cover/1138140/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ftgmac100: Select ASPEED MDIO driver for the AST2600
Andrew Jeffery [Wed, 31 Jul 2019 05:39:59 +0000 (15:09 +0930)]
net: ftgmac100: Select ASPEED MDIO driver for the AST2600

Ensures we can talk to a PHY via MDIO on the AST2600, as the MDIO
controller is now separate from the MAC.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ftgmac100: Add support for DT phy-handle property
Andrew Jeffery [Wed, 31 Jul 2019 05:39:58 +0000 (15:09 +0930)]
net: ftgmac100: Add support for DT phy-handle property

phy-handle is necessary for the AST2600 which separates the MDIO
controllers from the MAC.

I've tried to minimise the intrusion of supporting the AST2600 to the
FTGMAC100 by leaving in place the existing MDIO support for the embedded
MDIO interface. The AST2400 and AST2500 continue to be supported this
way, as it avoids breaking/reworking existing devicetrees.

The AST2600 support by contrast requires the presence of the phy-handle
property in the MAC devicetree node to specify the appropriate PHY to
associate with the MAC. In the event that someone wants to specify the
MDIO bus topology under the MAC node on an AST2400 or AST2500, the
current auto-probe approach is done conditional on the absence of an
"mdio" child node of the MAC.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: Add mdio-aspeed
Andrew Jeffery [Wed, 31 Jul 2019 05:39:57 +0000 (15:09 +0930)]
net: phy: Add mdio-aspeed

The AST2600 design separates the MDIO controllers from the MAC, which is
where they were placed in the AST2400 and AST2500. Further, the register
interface is reworked again, so now we have three possible different
interface implementations, however this driver only supports the
interface provided by the AST2600. The AST2400 and AST2500 will continue
to be supported by the MDIO support embedded in the FTGMAC100 driver.

The hardware supports both C22 and C45 mode, but for the moment only C22
support is implemented.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodt-bindings: net: Add aspeed, ast2600-mdio binding
Andrew Jeffery [Wed, 31 Jul 2019 05:39:56 +0000 (15:09 +0930)]
dt-bindings: net: Add aspeed, ast2600-mdio binding

The AST2600 splits out the MDIO bus controller from the MAC into its own
IP block and rearranges the register layout. Add a new binding to
describe the new hardware.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: reduce risk of wakeup queue starvation
Jon Maloy [Tue, 30 Jul 2019 14:23:18 +0000 (16:23 +0200)]
tipc: reduce risk of wakeup queue starvation

In commit 365ad353c256 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.

Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.

We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.
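
The difference between the two approaches can be illustrated with a
simplified model. Queues are plain lists of (msg_id, level) tuples and
free_slots maps each importance level to its free backlog slots; these
structures and the function names are assumptions made for the sketch,
not the TIPC data structures.

```python
def wakeup_old(wakeupq, free_slots, max_failures=10):
    """Old behaviour as described: one failure counter shared by all levels."""
    moved, failures = [], 0
    for msg_id, level in wakeupq:
        if free_slots.get(level, 0) > 0:
            moved.append(msg_id)
        else:
            failures += 1
            if failures >= max_failures:
                break  # may stop before reaching levels that still have room
    return moved


def wakeup_new(wakeupq, free_slots):
    """New behaviour as described: account for free slots per level."""
    avail = dict(free_slots)   # do not modify the caller's mapping
    moved = []
    for msg_id, level in wakeupq:
        if avail.get(level, 0) <= 0:
            continue           # this level is full, later messages may still fit
        avail[level] -= 1
        moved.append(msg_id)
    return moved


# Twelve messages for a full level 0, then one for level 1 at the tail.
queue = [(i, 0) for i in range(12)] + [(99, 1)]
slots = {0: 0, 1: 3}
```

With this input the old model gives up before reaching message 99,
while the new model delivers it.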

Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agofm10k: reduce scope of the ring variable
Jacob Keller [Mon, 8 Jul 2019 23:12:28 +0000 (16:12 -0700)]
fm10k: reduce scope of the ring variable

Reduce the scope of the ring local variable in the fm10k_assign_l2_accel
function.

This was detected by cppcheck and resolves the following warning
produced by that tool:

[fm10k_netdev.c:1447]: (style) The scope of the variable 'ring' can be
reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce the scope of the result local variable
Jacob Keller [Mon, 8 Jul 2019 23:12:27 +0000 (16:12 -0700)]
fm10k: reduce the scope of the result local variable

Reduce the scope of the result local variable in the
fm10k_iov_msg_lport_state_pf function.

This was detected by cppcheck and resolves the following warning
produced by that tool:

[fm10k_pf.c:1435]: (style) The scope of the variable 'result' can be
reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce the scope of the local msg variable
Jacob Keller [Mon, 8 Jul 2019 23:12:26 +0000 (16:12 -0700)]
fm10k: reduce the scope of the local msg variable

The msg variable in the fm10k_mbx_validate_msg_size and
fm10k_sm_mbx_transmit functions is only used within the do {} loop
scope. Reduce its scope only to where it is used.

This was detected by cppcheck, and resolves the following warnings
produced by that tool:

[fm10k_mbx.c:299]: (style) The scope of the variable 'msg' can be reduced.
[fm10k_mbx.c:2004]: (style) The scope of the variable 'msg' can be reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce the scope of the local i variable
Jacob Keller [Mon, 8 Jul 2019 23:12:25 +0000 (16:12 -0700)]
fm10k: reduce the scope of the local i variable

Reduce the scope of the local loop variable in the
fm10k_check_hang_subtask function.

This was detected by cppcheck and resolves the following warning
produced by that tool:

[driver/fm10k_pci.c:852]: (style) The scope of the variable 'i' can be
reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce the scope of the err variable
Jacob Keller [Mon, 8 Jul 2019 23:12:24 +0000 (16:12 -0700)]
fm10k: reduce the scope of the err variable

Reduce the scope of the local variable err in the fm10k_detach_subtask
function.

This was detected by cppcheck and resolves the following warning
produced by that tool:

[fm10k_pci.c:403]: (style) The scope of the variable 'err' can be reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce the scope of the tx_buffer variable
Jacob Keller [Mon, 8 Jul 2019 23:12:23 +0000 (16:12 -0700)]
fm10k: reduce the scope of the tx_buffer variable

The tx_buffer local variable in the function fm10k_clean_tx_ring is not
used except inside a smaller block scope. Reduce the scope to its point
of use.

This was detected by cppcheck and resolves the following style warning
produced by that tool:

[fm10k_netdev.c:179]: (style) The scope of the variable 'tx_buffer' can
be reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce the scope of the q_idx local variable
Jacob Keller [Mon, 8 Jul 2019 23:12:22 +0000 (16:12 -0700)]
fm10k: reduce the scope of the q_idx local variable

Reduce the scope of the q_idx local variable in the fm10k_cache_ring_qos
function.

This was detected by cppcheck and resolves the following style warning
produced by that tool:

[fm10k_main.c:2016]: (style) The scope of the variable 'q_idx' can be
reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce the scope of local err variable
Jacob Keller [Mon, 8 Jul 2019 23:12:21 +0000 (16:12 -0700)]
fm10k: reduce the scope of local err variable

Reduce the scope of the local err variable in the fm10k_iov_alloc_data
function.

This was detected by cppcheck and resolves the following style warning
produced by that tool:

[fm10k_iov.c:426]: (style) The scope of the variable 'err' can be reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce the scope of qv local variable
Jacob Keller [Mon, 8 Jul 2019 23:12:20 +0000 (16:12 -0700)]
fm10k: reduce the scope of qv local variable

Reduce the scope of the qv vector pointer local variable in the
fm10k_set_coalesce function.

This was detected by cppcheck and resolves the following style warning
produced by that tool:

[fm10k_ethtool.c:658]: (style) The scope of the variable 'qv' can be
reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce scope of *p local variable
Jacob Keller [Mon, 8 Jul 2019 23:12:19 +0000 (16:12 -0700)]
fm10k: reduce scope of *p local variable

Reduce the scope of the char *p local variable to only the block where
it is used.

This was detected by cppcheck and resolves the following style warning
produced by that tool:

[fm10k_ethtool.c:229]: (style) The scope of the variable 'p' can be
reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofm10k: reduce scope of the err variable
Jacob Keller [Mon, 8 Jul 2019 23:12:18 +0000 (16:12 -0700)]
fm10k: reduce scope of the err variable

Reduce the scope of the err local variable in the fm10k_dcbnl_ieee_setets
function.

This was detected using cppcheck, and resolves the following style
warning:

[fm10k_dcbnl.c:37]: (style) The scope of the variable 'err' can be reduced.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoMerge branch 'net-dsa-mv88e6xxx-avoid-some-redundant-VTU-operations'
David S. Miller [Thu, 1 Aug 2019 20:43:09 +0000 (16:43 -0400)]
Merge branch 'net-dsa-mv88e6xxx-avoid-some-redundant-VTU-operations'

Vivien Didelot says:

====================
net: dsa: mv88e6xxx: avoid some redundant VTU operations

The mv88e6xxx driver currently uses a mv88e6xxx_vtu_get wrapper to get a
single entry and uses a boolean to optionally initialize a fresh one.

However the fresh entry is only needed in one place and mv88e6xxx_vtu_getnext
is simple enough to call it directly. Doing so makes the code easier to read,
especially for the return code expected by switchdev to honor software VLANs.

In addition to not loading the VTU again when an entry is already correctly
programmed, this also avoids programming the broadcast entries
again when updating a port's membership, e.g. from tagged to untagged.

This patch series removes the mv88e6xxx_vtu_get wrapper in favor of direct
calls to mv88e6xxx_vtu_getnext, and also renames the _mv88e6xxx_port_vlan_add
and _mv88e6xxx_port_vlan_del helpers using an old underscore prefix convention.

In case the port's membership is already correctly programmed in hardware,
the following debug message may be printed:

    [  745.989884] mv88e6085 2188000.ethernet-1:00: p4: already a member of VLAN 42
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: call vtu_getnext directly in vlan_add
Vivien Didelot [Thu, 1 Aug 2019 18:36:37 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: call vtu_getnext directly in vlan_add

Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read and
_mv88e6xxx_port_vlan_add is the only function requiring the preparation
of a new VLAN entry.

To simplify things, remove the mv88e6xxx_vtu_get wrapper and make
the VLAN lookup explicit in _mv88e6xxx_port_vlan_add. This rework
also avoids programming the broadcast entries again when changing a
port's membership, e.g. from tagged to untagged.

At the same time, rename the helper, which uses an old underscore prefix convention.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: call vtu_getnext directly in vlan_del
Vivien Didelot [Thu, 1 Aug 2019 18:36:36 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: call vtu_getnext directly in vlan_del

Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read.
Make explicit the call to mv88e6xxx_vtu_getnext in _mv88e6xxx_port_vlan_del
and the return value expected by switchdev in case of software VLANs.

At the same time, rename the helper, which uses an old underscore prefix convention.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: call vtu_getnext directly in db load/purge
Vivien Didelot [Thu, 1 Aug 2019 18:36:35 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: call vtu_getnext directly in db load/purge

mv88e6xxx_vtu_getnext is simple enough to call directly in the
mv88e6xxx_port_db_load_purge function, which makes explicit the return
code expected by switchdev for software VLANs when a hardware VLAN does
not exist.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: explicit entry passed to vtu_getnext
Vivien Didelot [Thu, 1 Aug 2019 18:36:34 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: explicit entry passed to vtu_getnext

mv88e6xxx_vtu_getnext interprets two members from the input
mv88e6xxx_vtu_entry structure: the (excluded) vid member to start
the iteration from, and the valid member specifying whether the VID
must be written or not (only required once at the start of a loop).

Make the assignment of these two fields explicit right before calling
mv88e6xxx_vtu_getnext, as is done in the mv88e6xxx_vtu_get wrapper.
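
The get-next lookup pattern can be modelled roughly as below. This is a
sketch under stated assumptions: the VTU is represented as a set of
valid VIDs, the end-of-table result (VID 0xfff, invalid) is assumed,
and the model ignores the hardware role of the input valid flag. The
names VtuEntry, vtu_getnext and vtu_lookup are illustrative.

```python
from dataclasses import dataclass

VID_MAX = 0xFFF  # assumed end-of-table marker


@dataclass
class VtuEntry:
    vid: int = 0
    valid: bool = False


def vtu_getnext(table: set, entry: VtuEntry) -> None:
    """Advance entry to the next valid VID strictly greater than entry.vid.

    The starting vid is excluded from the search, as described above.
    """
    later = sorted(v for v in table if v > entry.vid)
    if later:
        entry.vid, entry.valid = later[0], True
    else:
        entry.vid, entry.valid = VID_MAX, False


def vtu_lookup(table: set, vid: int) -> bool:
    """Look up a single VID by starting the iteration just below it."""
    entry = VtuEntry(vid=vid - 1, valid=False)  # explicit assignment of both fields
    vtu_getnext(table, entry)
    return entry.valid and entry.vid == vid
```

A lookup for a VID that is not programmed returns the next higher entry
(or the end marker), so the caller has to compare the returned vid.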

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: lock mutex in vlan_prepare
Vivien Didelot [Thu, 1 Aug 2019 18:36:33 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: lock mutex in vlan_prepare

Lock the mutex in the mv88e6xxx_port_vlan_prepare function
called by the DSA stack, instead of doing it in the internal
mv88e6xxx_port_check_hw_vlan helper.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: Allow dropping specific tunnel packets
Tonghao Zhang [Thu, 1 Aug 2019 08:40:59 +0000 (16:40 +0800)]
net/mlx5e: Allow dropping specific tunnel packets

In some cases, we want to prevent specific tunnel packets from reaching
the host, to avoid high CPU usage (e.g. during network attacks).
Other tunnel packets that are not matched in hardware are still
sent to the host.

    $ tc filter add dev vxlan_sys_4789 \
    protocol ip chain 0 parent ffff: prio 1 handle 1 \
    flower dst_ip 1.1.1.100 ip_proto tcp dst_port 80 \
    enc_dst_ip 2.2.2.100 enc_key_id 100 enc_dst_port 4789 \
    action tunnel_key unset pipe action drop

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: TX reporter cleanup
Aya Levin [Mon, 24 Jun 2019 17:33:52 +0000 (20:33 +0300)]
net/mlx5e: TX reporter cleanup

Remove redundant include files.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Set tx reporter only on successful creation
Aya Levin [Mon, 24 Jun 2019 16:34:42 +0000 (19:34 +0300)]
net/mlx5e: Set tx reporter only on successful creation

When failing to create the tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load, so avoid storing a
garbage/error pointer.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Fix mlx5e_tx_reporter_create return value
Aya Levin [Wed, 3 Jul 2019 06:16:52 +0000 (09:16 +0300)]
net/mlx5e: Fix mlx5e_tx_reporter_create return value

Return an error when failing to create a reporter in devlink. Since
NET_DEVLINK is mandatory for MLX5_CORE in Kconfig, the returned pointer
can't be NULL and can only hold an error in the bad path.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Rx, checksum handling refactoring
Saeed Mahameed [Fri, 3 May 2019 22:12:46 +0000 (15:12 -0700)]
net/mlx5e: Rx, checksum handling refactoring

Move the VLAN checksum fixup flow into mlx5e_skb_padding_csum(), which is
supposed to fix up the SKB checksum if needed, and rename
mlx5e_skb_padding_csum() to mlx5e_skb_csum_fixup().

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Tx, Soften inline mode VLAN dependencies
Tariq Toukan [Mon, 1 Jul 2019 09:08:08 +0000 (12:08 +0300)]
net/mlx5e: Tx, Soften inline mode VLAN dependencies

If capable, use zero inline mode in TX WQE for non-VLAN packets.
For VLAN ones, keep the enforcement of at least L2 inline mode,
unless the WQE VLAN insertion offload cap is on.
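
The selection rule can be sketched as a small decision function. The
enum values and the function name are assumptions made for
illustration; device_min_mode stands for whatever minimum inline mode
the device requires, with NONE meaning it is capable of zero inline.

```python
from enum import IntEnum


class InlineMode(IntEnum):
    # Ordered from least to most inlined headers (illustrative values).
    NONE = 0
    L2 = 1
    IP = 2
    TCP_UDP = 3


def tx_inline_mode(device_min_mode, has_vlan, wqe_vlan_insert_cap):
    """Pick the inline mode for one packet.

    Non-VLAN packets use the device minimum, which may be NONE.
    VLAN packets are raised to at least L2 unless the WQE VLAN
    insertion offload capability is available.
    """
    mode = InlineMode(device_min_mode)
    if has_vlan and not wqe_vlan_insert_cap:
        mode = max(mode, InlineMode.L2)
    return mode
```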

Performance:
Tested single-core packet rate with 64-byte packets.

NIC: ConnectX-5
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz

pktgen:
Before: 12.46 Mpps
After:  14.65 Mpps (+17.5%)

XDP_TX:
The MPWQE flow is not affected, as it already has this optimization.
So we test with priv-flag xdp_tx_mpwqe: off.

Before:  9.90 Mpps
After:  10.20 Mpps (+3%)

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Tested-by: Noam Stolero <noams@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
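
The inline-mode rule in the commit above can be read as a small decision
function. This is a hedged sketch of that reading, not the driver's code:
the enum, the function name and its parameters are all hypothetical, and
min_mode stands for whatever minimum inline mode the device requires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inline modes, ordered from fewest to most inlined headers. */
enum sketch_inline_mode {
	SKETCH_INLINE_NONE = 0,
	SKETCH_INLINE_L2   = 1,
};

/*
 * Pick the inline mode for one packet.
 *  - Non-VLAN packets use the device minimum (zero inline if capable).
 *  - VLAN packets are forced to at least L2 inline, unless the WQE VLAN
 *    insertion offload capability is on.
 */
static enum sketch_inline_mode
sketch_tx_inline_mode(enum sketch_inline_mode min_mode, bool has_vlan,
		      bool wqe_vlan_insert_cap)
{
	assert(min_mode == SKETCH_INLINE_NONE || min_mode == SKETCH_INLINE_L2);

	if (has_vlan && !wqe_vlan_insert_cap && min_mode < SKETCH_INLINE_L2)
		return SKETCH_INLINE_L2;

	return min_mode;
}
```

Under these assumptions, only a VLAN packet on a device without the
insertion offload capability pays the cost of copying L2 headers into the
WQE.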
5 years ago net/mlx5e: XDP, Slight enhancement for WQE fetch function
Tariq Toukan [Sun, 14 Jul 2019 14:50:51 +0000 (17:50 +0300)]
net/mlx5e: XDP, Slight enhancement for WQE fetch function

Instead of passing an output param, let the function return the
WQE pointer.
In addition, pass &pi so it gets its value in the function,
saving the redundant assignment that comes after it.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
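
The signature change described above can be illustrated with a small
sketch: the fetch helper returns the WQE pointer and fills in the producer
index through a pointer argument, so the caller needs no separate
assignment. The structures, field names and ring layout below are
hypothetical simplifications, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct sketch_wqe { uint32_t data[16]; };   /* hypothetical WQE layout */

struct sketch_sq {
	struct sketch_wqe *ring;  /* ring of WQEs, power-of-two length */
	uint16_t size_mask;       /* ring length minus one */
	uint16_t pc;              /* free-running producer counter */
};

/* Return a zeroed WQE at the current producer index and report the index. */
static struct sketch_wqe *sketch_fetch_wqe(struct sketch_sq *sq, uint16_t *pi)
{
	struct sketch_wqe *wqe;

	assert(sq != NULL && pi != NULL);

	*pi = sq->pc & sq->size_mask;   /* index computed once, inside */
	wqe = &sq->ring[*pi];
	memset(wqe, 0, sizeof(*wqe));

	return wqe;
}
```

A caller would then write something like
`wqe = sketch_fetch_wqe(sq, &pi);` and use both values directly.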
5 years ago net/mlx5e: XDP, Close TX MPWQE session when no room for inline packet left
Shay Agroskin [Sun, 12 May 2019 15:28:27 +0000 (18:28 +0300)]
net/mlx5e: XDP, Close TX MPWQE session when no room for inline packet left

In MPWQE mode, when transmitting packets with XDP, a packet that is smaller
than a certain size (set to 256 bytes) would be sent inline within its WQE
TX descriptor (mem-copied), in case the hardware tx queue is congested
beyond a pre-defined water-mark.

If an MPWQE cannot contain an additional inline packet, we close this
MPWQE session, and send the packet inlined within the next MPWQE.
To save some MPWQE session close+open operations, we don't open MPWQE
sessions that are contiguously smaller than a certain size (set to the
HW MPWQE maximum size). If there isn't enough contiguous room in the
send queue, we fill it with NOPs and wrap the send queue index around.

This way, qualified packets are always sent inline.

Perf tests:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

XDP_TX:

With 24 channels:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 113.6Mpps       | 96.3Mpps        | 84%          |
| after  |   115Mpps       | 99.5Mpps        | 86%          |

With one channel:

| ------ | bounced packets | inlined packets | inline ratio |
| before | 6.7Mpps         | 0pps            | 0%           |
| after  | 6.8Mpps         | 0pps            | 0%           |

As we can see, there is an improvement in both inline ratio and overall
packet rate for 24 channels. Also, we see no degradation for the
one-channel case.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years ago net/mlx5e: Tx, Strict the room needed for SQ edge NOPs
Tariq Toukan [Thu, 11 Jul 2019 08:20:22 +0000 (11:20 +0300)]
net/mlx5e: Tx, Strict the room needed for SQ edge NOPs

We use NOPs to populate the WQ fragment edge if the WQE does not fit
in the fragment, to avoid WQEs crossing a page boundary (or wrapping
around the WQ).

The upper bound on the needed number of NOPs is one WQEBB less than
the largest possible WQE, for otherwise the WQE would certainly fit.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
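
The bound stated above follows from a simple case split: NOPs are only
posted when the contiguous room is smaller than the WQE, so their count is
at most the WQE size minus one WQEBB. A minimal sketch of that reasoning,
with a hypothetical helper name and an assumed maximum WQE size:

```c
#include <assert.h>

/* Assumed largest WQE size in WQEBBs; the real value is device-specific. */
#define SKETCH_MAX_WQE_BBS 16u

/*
 * Number of NOP WQEBBs to post before a WQE of wqe_bbs building blocks,
 * given contig_bbs contiguous WQEBBs left before the fragment edge.
 * If the WQE fits, no NOPs are needed; otherwise fill the whole remainder.
 */
static unsigned int sketch_edge_nops(unsigned int contig_bbs,
				     unsigned int wqe_bbs)
{
	assert(contig_bbs >= 1 && wqe_bbs >= 1);
	assert(wqe_bbs <= SKETCH_MAX_WQE_BBS);

	return contig_bbs < wqe_bbs ? contig_bbs : 0;
}
```

Since NOPs are posted only when contig_bbs < wqe_bbs, the result never
exceeds wqe_bbs - 1, and therefore never exceeds SKETCH_MAX_WQE_BBS - 1.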
5 years ago net/mlx5: Add flow counter pool
Gavi Teitz [Thu, 27 Jun 2019 17:53:03 +0000 (20:53 +0300)]
net/mlx5: Add flow counter pool

Add a pool of flow counters, based on flow counter bulks, removing the
need to allocate a new counter via a costly FW command during the flow
creation process. The time it takes to acquire/release a flow counter
is cut from ~50 [us] to ~50 [ns].

The pool is part of the mlx5 driver instance, and provides flow
counters for aging flows. mlx5_fc_create() was modified to provide
counters for aging flows from the pool by default, and
mlx5_destroy_fc() was modified to release counters back to the pool
for later reuse. If bulk allocation is not supported or fails, and for
non-aging flows, the fallback behavior is to allocate and free
individual counters.

The pool is comprised of three lists of flow counter bulks, one of
fully used bulks, one of partially used bulks, and one of unused
bulks. Counters are provided from the partially used bulks first, to
help limit bulk fragmentation.

The pool maintains a threshold, and strives to keep the number of
available counters below it. The pool is increased in size when a
counter acquisition request is made and there are no available
counters, and it is decreased in size when the last counter in a bulk
is released and there are more available counters than the threshold.
All pool size changes are done in the context of the
acquiring/releasing process.

The value of the threshold is directly correlated to the number of
used counters the pool is providing, while constrained by a hard
maximum, and is recalculated every time a bulk is allocated/freed.
This ensures that the pool only consumes large amounts of memory for
available counters if the pool is being used heavily. When fully
populated and at the hard maximum, the buffer of available counters
consumes ~40 [MB].

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
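
The pool policy described in the commit above (prefer partially used
bulks, grow only when nothing is available, shrink when an emptied bulk
pushes the available count above the threshold) can be sketched as
follows. This is an illustration under stated assumptions, not the driver
code: it classifies bulks by their usage count instead of keeping three
linked lists, and the bulk length, slot count, threshold ratio and hard
maximum are made-up small values.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_BULK_LEN   4  /* counters per bulk (assumed, tiny on purpose) */
#define SK_MAX_BULKS  8  /* bulk slots in the pool (assumed) */
#define SK_MAX_THRESH 8  /* hard maximum for the threshold (assumed) */
#define SK_USED_RATIO 2  /* threshold = used / ratio (assumed) */

struct sk_bulk { bool allocated; int in_use; };

struct sk_pool {
	struct sk_bulk bulks[SK_MAX_BULKS];
	int used;       /* counters currently handed out */
	int available;  /* free counters in allocated bulks */
	int threshold;  /* target ceiling for available counters */
};

/* Recalculated whenever a bulk is allocated or freed. */
static void sk_update_threshold(struct sk_pool *p)
{
	int t = p->used / SK_USED_RATIO;

	p->threshold = t < SK_MAX_THRESH ? t : SK_MAX_THRESH;
}

/* Returns the index of the bulk that served the counter, or -1. */
static int sk_pool_acquire(struct sk_pool *p)
{
	int partial = -1, unused = -1, empty_slot = -1, idx, i;

	for (i = 0; i < SK_MAX_BULKS; i++) {
		struct sk_bulk *b = &p->bulks[i];

		if (!b->allocated) {
			if (empty_slot < 0)
				empty_slot = i;
		} else if (b->in_use == 0) {
			if (unused < 0)
				unused = i;
		} else if (b->in_use < SK_BULK_LEN && partial < 0) {
			partial = i;
		}
	}

	idx = partial >= 0 ? partial : unused;  /* partial bulks first */
	if (idx < 0) {
		if (empty_slot < 0)
			return -1;  /* a real driver would fall back here */
		idx = empty_slot;   /* grow: allocate a new bulk */
		p->bulks[idx].allocated = true;
		p->bulks[idx].in_use = 0;
		p->available += SK_BULK_LEN;
		sk_update_threshold(p);
	}

	p->bulks[idx].in_use++;
	p->used++;
	p->available--;
	return idx;
}

static void sk_pool_release(struct sk_pool *p, int idx)
{
	struct sk_bulk *b = &p->bulks[idx];

	assert(b->allocated && b->in_use > 0);
	b->in_use--;
	p->used--;
	p->available++;

	/* Shrink only when the bulk is empty and the pool is over threshold. */
	if (b->in_use == 0 && p->available > p->threshold) {
		b->allocated = false;
		p->available -= SK_BULK_LEN;
		sk_update_threshold(p);
	}
}
```

Serving from partially used bulks first keeps counters packed into as few
bulks as possible, which makes it more likely that some bulk drains
completely and can be freed.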
5 years ago net/mlx5: Add flow counter bulk infrastructure
Gavi Teitz [Thu, 27 Jun 2019 10:58:56 +0000 (13:58 +0300)]
net/mlx5: Add flow counter bulk infrastructure

Add infrastructure to track bulks of flow counters, providing
the means to allocate and deallocate bulks, and to acquire and
release individual counters from the bulks.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years ago net/mlx5: E-Switch, add ingress rate support
Eli Cohen [Wed, 8 May 2019 08:44:56 +0000 (11:44 +0300)]
net/mlx5: E-Switch, add ingress rate support

Use the scheduling elements to implement an ingress rate limiter on an
eswitch port's ingress traffic. Since the ingress of an eswitch port is
the egress of the VF port, we control eswitch ingress by controlling VF
egress.

Configuration is done using the ports' representor net devices.

Please note that burst size configuration is not supported by
ConnectX-5 and earlier-generation devices.

Configuration examples:
tc:
tc filter add dev enp59s0f0_0 root protocol ip matchall action police rate 1mbit burst 20k

ovs:
ovs-vsctl set interface eth0 ingress_policing_rate=1000

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>