platform/kernel/linux-starfive.git
5 years ago Merge branch 'mlxsw-spectrum_router-Add-GRE-tunnel-support-for-Spectrum-2'
David S. Miller [Sun, 20 Jan 2019 19:12:59 +0000 (11:12 -0800)]
Merge branch 'mlxsw-spectrum_router-Add-GRE-tunnel-support-for-Spectrum-2'

Ido Schimmel says:

====================
mlxsw: spectrum_router: Add GRE tunnel support for Spectrum-2

Nir says:

In Spectrum-2, HW implementation of layer 3 tunnels differs from
Spectrum-1 when it comes to the underlay routing table selection.
Spectrum-2 uses a dedicated RIF that points to the virtual router used
for forwarding the encapsulated packets, while Spectrum-1 explicitly
specifies the virtual router itself.

Patches #1 and #2 add fields to RITR (Router Interface Table Register) and
RTDP (Routing Tunnel Decap Properties) respectively. The fields are
required for the new underlay RIF needed for Spectrum-2.

Patches #3-4 allow a different set of RIF operations per ASIC type. The
first patch splits the operations and the following patch sets RIF ops
according to ASIC type.

Patches #5-9 introduce small changes to existing code to allow existence
of a dedicated underlay RIF along with the underlay virtual router, and
to support that new type of RIF that has no device.

Patch #10 takes care of updating the tunnel decap properties egress
underlay RIF required for Spectrum-2.

Patch #11 adds the implementation of Spectrum-2 specific RIF operations
and essentially enables layer 3 GRE tunnels on Spectrum-2.

Finally patches #12-18 add tests for GRE IP-in-IP tunnels, both in flat
and hierarchical topologies.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago selftests: forwarding: Add IP-in-IP GRE hierarchical topology with keys test
Nir Dotan [Sun, 20 Jan 2019 06:50:58 +0000 (06:50 +0000)]
selftests: forwarding: Add IP-in-IP GRE hierarchical topology with keys test

Add a test that checks IP-in-IP GRE tunneling and MTU change of tunnel,
where an ikey/okey pair is set. This test is based on hierarchical topology
described in file ipip_lib.sh.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago selftests: forwarding: Add IP-in-IP GRE hierarchical topology with key test
Nir Dotan [Sun, 20 Jan 2019 06:50:57 +0000 (06:50 +0000)]
selftests: forwarding: Add IP-in-IP GRE hierarchical topology with key test

Add a test that checks IP-in-IP GRE tunneling and MTU change of tunnel,
where a key is set. This test is based on hierarchical topology described
in file ipip_lib.sh.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago selftests: forwarding: Add IP-in-IP GRE hierarchical topology test
Nir Dotan [Sun, 20 Jan 2019 06:50:56 +0000 (06:50 +0000)]
selftests: forwarding: Add IP-in-IP GRE hierarchical topology test

Add a test that checks IP-in-IP GRE tunneling and MTU change of tunnel,
based on hierarchical topology described in file ipip_lib.sh.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago selftests: forwarding: Add IP-in-IP GRE flat topology with keys test
Nir Dotan [Sun, 20 Jan 2019 06:50:55 +0000 (06:50 +0000)]
selftests: forwarding: Add IP-in-IP GRE flat topology with keys test

Add a test that checks IP-in-IP GRE tunneling and MTU change of tunnel,
where an ikey/okey pair is set. This test is based on flat topology
described in file ipip_lib.sh.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago selftests: forwarding: Add IP-in-IP GRE flat topology with key test
Nir Dotan [Sun, 20 Jan 2019 06:50:54 +0000 (06:50 +0000)]
selftests: forwarding: Add IP-in-IP GRE flat topology with key test

Add a test that checks IP-in-IP GRE tunneling and MTU change of tunnel,
where a key is set. This test is based on flat topology described in file
ipip_lib.sh.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago selftests: forwarding: Add IP-in-IP GRE flat topology test
Nir Dotan [Sun, 20 Jan 2019 06:50:53 +0000 (06:50 +0000)]
selftests: forwarding: Add IP-in-IP GRE flat topology test

Add a test that checks IP-in-IP GRE tunneling and MTU change of tunnel,
based on flat topology described in file ipip_lib.sh.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago selftests: forwarding: Add IP tunneling lib
Nir Dotan [Sun, 20 Jan 2019 06:50:52 +0000 (06:50 +0000)]
selftests: forwarding: Add IP tunneling lib

Add a library with helper functions, to be used in testing IP-in-IP and GRE
tunnels, both in flat and in hierarchical topologies.
The topologies used in this library cover the three scenarios of tunnels -
a tunnel with no bound device, a tunnel with a bound device in the same VRF
and a tunnel with a bound device in a different VRF.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago mlxsw: spectrum_router: Add GRE tunnel support for Spectrum-2
Nir Dotan [Sun, 20 Jan 2019 06:50:51 +0000 (06:50 +0000)]
mlxsw: spectrum_router: Add GRE tunnel support for Spectrum-2

Spectrum-2 GRE tunnel implementation requires a specific underlay RIF that
points to the virtual router used for forwarding the encapsulated packet.

Add Spectrum-2 specific loopback router interface creation methods which
may create or reuse the dedicated underlay RIF.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
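
The "create or reuse" behaviour of the dedicated underlay RIF described above can be pictured as reference counting per virtual router. This is a minimal user-space sketch in Python, not the driver's C code; `VirtualRouter`, `ul_rif_get`, `ul_rif_put` and the callback parameters are hypothetical names used only for illustration.

```python
class VirtualRouter:
    """Toy virtual router that owns at most one shared underlay RIF."""

    def __init__(self, vr_id):
        self.vr_id = vr_id
        self.ul_rif = None
        self.ul_rif_refcnt = 0


def ul_rif_get(vr, alloc_index):
    """Create the underlay RIF on first use, otherwise reuse it."""
    if vr.ul_rif_refcnt == 0:
        # The underlay RIF is auxiliary, so no device is attached to it.
        vr.ul_rif = {"index": alloc_index(), "vr_id": vr.vr_id, "dev": None}
    vr.ul_rif_refcnt += 1
    return vr.ul_rif


def ul_rif_put(vr, free_index):
    """Drop one user; destroy the underlay RIF when the last user goes."""
    vr.ul_rif_refcnt -= 1
    if vr.ul_rif_refcnt == 0:
        free_index(vr.ul_rif["index"])
        vr.ul_rif = None
```

In this model two overlay RIFs in the same virtual router share one underlay RIF, and it disappears only when the second one is released.
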
5 years ago mlxsw: spectrum_router: Update tunnel decap properties
Nir Dotan [Sun, 20 Jan 2019 06:50:50 +0000 (06:50 +0000)]
mlxsw: spectrum_router: Update tunnel decap properties

Spectrum-2 requires specifying the egress RIF when setting tunnel decap
properties. Add a method for accessing the underlay RIF index and then use
it when setting decap properties.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago mlxsw: spectrum_router: Support RIF without device
Nir Dotan [Sun, 20 Jan 2019 06:50:49 +0000 (06:50 +0000)]
mlxsw: spectrum_router: Support RIF without device

Spectrum-2 underlay RIF is merely an auxiliary RIF that points to the
virtual router used for encapsulated packets lookup. It exists only when
its overlay RIF exists but may be shared with other overlay RIFs.
Hence it is undesirable to mark any device as related to it.

Therefore allow usage of NULL device when allocating RIF.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago mlxsw: spectrum_router: Change mlxsw_sp_ipip_lb_ul_vr_id()
Nir Dotan [Sun, 20 Jan 2019 06:50:48 +0000 (06:50 +0000)]
mlxsw: spectrum_router: Change mlxsw_sp_ipip_lb_ul_vr_id()

For the sake of Spectrum-2 GRE support, as the ul_vr_id field is reserved for
Spectrum-2, change the mlxsw_sp_ipip_lb_ul_vr_id() implementation not to use
the reserved field.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago mlxsw: spectrum_router: Add underlay RIF ID support
Nir Dotan [Sun, 20 Jan 2019 06:50:47 +0000 (06:50 +0000)]
mlxsw: spectrum_router: Add underlay RIF ID support

Spectrum-2 GRE tunnels underlay should be given not only the virtual router
information for an encapsulated packet lookup, but also an underlay RIF
object which belongs to a virtual router.

Therefore add ul_rif_id field in struct mlxsw_sp_rif_ipip_lb, to be used
later in Spectrum-2 underlay RIF implementation. This field complements
ul_vr_id field, already present and defined as reserved for Spectrum-2.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago mlxsw: spectrum_router: Mark RIF index as taken before creation
Nir Dotan [Sun, 20 Jan 2019 06:50:46 +0000 (06:50 +0000)]
mlxsw: spectrum_router: Mark RIF index as taken before creation

The presence of an allocated RIF in mlxsw_sp->router->rifs[rif_index] marks
that rif_index as taken.
Set the marking of a taken RIF to happen before calling ops->create in
order to allow creation of a GRE underlay RIF, which may be allocated and
created as part of an overlay RIF creation.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
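
Why the ordering matters can be shown with a small model: if the overlay RIF's index is not marked as taken before its create step runs, a nested underlay allocation could be handed the same index. The sketch below is a simplified Python model under that reading of the commit message; `RifTable` and its methods are hypothetical and do not mirror the driver's actual functions.

```python
class RifTable:
    """Toy RIF table: a slot holding an object means that index is taken."""

    def __init__(self, size):
        self.rifs = [None] * size

    def index_alloc(self):
        for index, rif in enumerate(self.rifs):
            if rif is None:
                return index
        raise RuntimeError("no free RIF index")

    def create_underlay(self):
        index = self.index_alloc()
        rif = {"index": index, "kind": "underlay"}
        self.rifs[index] = rif
        return rif

    def create_overlay(self):
        index = self.index_alloc()
        rif = {"index": index, "kind": "overlay"}
        # Mark the index as taken *before* running the create step, so the
        # nested underlay allocation below cannot be handed the same index.
        self.rifs[index] = rif
        try:
            rif["ul_rif"] = self.create_underlay()
        except RuntimeError:
            self.rifs[index] = None  # roll back the marking on failure
            raise
        return rif
```

With a four-entry table the overlay RIF takes index 0 and its underlay RIF index 1; with a one-entry table the creation fails and the marking is undone.
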
5 years ago mlxsw: spectrum_router: Adjust loopback RIF configuration
Nir Dotan [Sun, 20 Jan 2019 06:50:42 +0000 (06:50 +0000)]
mlxsw: spectrum_router: Adjust loopback RIF configuration

In Spectrum-2, the underlay routing table is pointed to by an underlay router
interface, in contrast to Spectrum-1, where only an underlay virtual router
should be set. That makes the underlay virtual router field in RITR
reserved for Spectrum-2.

Change loopback RIF creation function to support the new underlay RIF
field, however leave this field reserved for Spectrum-1 updates.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago mlxsw: spectrum: Set RIF ops per ASIC type
Nir Dotan [Sun, 20 Jan 2019 06:50:41 +0000 (06:50 +0000)]
mlxsw: spectrum: Set RIF ops per ASIC type

Set RIF ops array as member of mlxsw_sp in order to control which RIF
operations callbacks are called per ASIC type. This is needed to control
per ASIC handling of loopback RIF configurations.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
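
The idea of selecting RIF operations per ASIC type can be sketched as a table of callbacks that is chosen once per device. The following is a minimal user-space model in Python, not the driver's C code; `SP1_RIF_OPS`, `SP2_RIF_OPS`, `Switch` and the dictionary keys are hypothetical names, and the value 0 stands in for a reserved field.

```python
def sp1_ipip_lb_configure(rif):
    """Spectrum-1 style: name the underlay virtual router directly."""
    return {"ul_vr_id": rif["ul_vr_id"], "ul_rif_id": 0}


def sp2_ipip_lb_configure(rif):
    """Spectrum-2 style: point at an underlay RIF, leave the VR field reserved."""
    return {"ul_vr_id": 0, "ul_rif_id": rif["ul_rif_id"]}


# One ops table per ASIC type, keyed by RIF type.
SP1_RIF_OPS = {"ipip_lb": sp1_ipip_lb_configure}
SP2_RIF_OPS = {"ipip_lb": sp2_ipip_lb_configure}


class Switch:
    """Toy device instance that carries the ops table chosen at init time."""

    def __init__(self, rif_ops):
        self.rif_ops = rif_ops

    def configure_rif(self, rif_type, rif):
        return self.rif_ops[rif_type](rif)
```

Common code calls `configure_rif()` without knowing the ASIC type; only the table handed to `Switch` differs between the two generations.
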
5 years ago mlxsw: spectrum_router: Split RIF ops array for Spectrum-2 support
Nir Dotan [Sun, 20 Jan 2019 06:50:40 +0000 (06:50 +0000)]
mlxsw: spectrum_router: Split RIF ops array for Spectrum-2 support

Split RIF ops array for Spectrum-1 and Spectrum-2 callbacks in order to
support different sets of operations for loopback RIF handling, as
underlying implementation differs between the ASICs.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago mlxsw: reg: Add underlay egress RIF field in RTDP register
Ido Schimmel [Sun, 20 Jan 2019 06:50:39 +0000 (06:50 +0000)]
mlxsw: reg: Add underlay egress RIF field in RTDP register

In Spectrum-2 we need to specify the underlay egress router interface
when performing IP-in-IP and NVE packet decapsulation in the underlay
router.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago mlxsw: reg: Add fields to RITR - Router Interface Table Register
Nir Dotan [Sun, 20 Jan 2019 06:50:39 +0000 (06:50 +0000)]
mlxsw: reg: Add fields to RITR - Router Interface Table Register

Add fields relevant for Spectrum-2 Loopback IPinIP router interface
creation. Add additional Loopback RIF protocol value - Generic, used for
creation of an explicit underlay RIF, and also add a field named
underlay_rif used for specifying the underlay RIF of a tunnel.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'r8169-series-with-smaller-improvements'
David S. Miller [Sun, 20 Jan 2019 00:09:14 +0000 (16:09 -0800)]
Merge branch 'r8169-series-with-smaller-improvements'

Heiner Kallweit says:

====================
r8169: series with smaller improvements

Series with smaller improvements.

v2:
- fixed a small copy & paste error in patch 4
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago r8169: factor out getting ether_clk
Heiner Kallweit [Sat, 19 Jan 2019 21:07:34 +0000 (22:07 +0100)]
r8169: factor out getting ether_clk

rtl_init_one() is complex enough, so we better factor out getting the
ether_clk.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago r8169: replace mii_bus member with phy_device member in struct rtl8169_private
Heiner Kallweit [Sat, 19 Jan 2019 21:07:05 +0000 (22:07 +0100)]
r8169: replace mii_bus member with phy_device member in struct rtl8169_private

Accessing the phy_device indirectly via the netdevice causes a few issues:
- Accessing the phy_device when it's not attached may cause an NPE.
- If we have to access the phy_device when it's not attached we have
  to use mdiobus_get_phy() to get a reference to the phy_device.

Therefore store a phy_device reference in struct rtl8169_private directly.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago r8169: reset chip synchronously in __rtl8169_resume
Heiner Kallweit [Sat, 19 Jan 2019 21:06:25 +0000 (22:06 +0100)]
r8169: reset chip synchronously in __rtl8169_resume

Triggering an asynchronous reset is problematic for the following
reasons, therefore reset the chip synchronously.

- The reset routine resets registers and parameters behind our back,
  which may collide with code executed after triggering the reset.

- __rtl8169_resume() is called as part of pm_runtime_get_sync() and
  callers expect that the chip is fully resumed afterwards.

In the context of this driver, triggering an asynchronous reset should be
considered an emergency procedure.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago r8169: add helpers for locking / unlocking the config registers
Heiner Kallweit [Sat, 19 Jan 2019 21:05:48 +0000 (22:05 +0100)]
r8169: add helpers for locking / unlocking the config registers

Add helpers for locking / unlocking the config registers.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago r8169: improve rtl_pcie_state_l2l3_enable
Heiner Kallweit [Sat, 19 Jan 2019 21:05:14 +0000 (22:05 +0100)]
r8169: improve rtl_pcie_state_l2l3_enable

All calls to this function have the enable parameter set to false.
So we can replace the function with a disable-only version.

v2:
- fixed copy & paste error

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago r8169: initialize task workqueue only once
Heiner Kallweit [Sat, 19 Jan 2019 21:03:49 +0000 (22:03 +0100)]
r8169: initialize task workqueue only once

It's sufficient to initialize the workqueue once, therefore remove the
additional initialization whenever rtl_open() is called.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago r8169: remove unneeded call in pcierr
Heiner Kallweit [Sat, 19 Jan 2019 21:03:13 +0000 (22:03 +0100)]
r8169: remove unneeded call in pcierr

rtl8169_hw_reset() is called as part of the reset routine which is
scheduled in the line after. So we can remove the call to
rtl8169_hw_reset() here.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago r8169: remove rtl_get_events
Heiner Kallweit [Sat, 19 Jan 2019 21:02:40 +0000 (22:02 +0100)]
r8169: remove rtl_get_events

This helper is used only once, so remove it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net_sched: add performance counters for basic filter
Cong Wang [Fri, 18 Jan 2019 01:14:01 +0000 (17:14 -0800)]
net_sched: add performance counters for basic filter

Similar to u32 filter, it is useful to know how many times
we reach each basic filter and how many times we pass the
ematch attached to it.

Sample output:

filter protocol arp pref 49152 basic chain 0
filter protocol arp pref 49152 basic chain 0 handle 0x1  (rule hit 3 success 3)
action order 1: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 1 installed 81 sec used 4 sec
Action statistics:
Sent 126 bytes 3 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
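
The two counters in the sample output ("rule hit" and "success") can be modelled as follows. This is a toy Python sketch of the bookkeeping only, assuming a filter is "hit" whenever classification reaches it and "succeeds" when its ematch accepts the packet; `BasicFilter` and its fields are hypothetical names, not the kernel's structures.

```python
class BasicFilter:
    """Toy model of a classifier entry with two per-filter counters."""

    def __init__(self, ematch):
        self.ematch = ematch   # predicate standing in for the ematch tree
        self.rule_hit = 0      # times classification reached this filter
        self.success = 0       # times the ematch accepted the packet

    def classify(self, packet):
        self.rule_hit += 1
        if self.ematch(packet):
            self.success += 1
            return True
        return False

    def stats(self):
        return f"(rule hit {self.rule_hit} success {self.success})"


arp_only = BasicFilter(lambda pkt: pkt["protocol"] == "arp")
for pkt in ({"protocol": "arp"}, {"protocol": "ip"}, {"protocol": "arp"}):
    arp_only.classify(pkt)
print(arp_only.stats())  # prints "(rule hit 3 success 2)"
```

A gap between the two numbers shows how often packets reach the filter without passing its ematch.
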
5 years ago net: sock: do not set sk_cookie in sk_clone_lock()
Yafang Shao [Fri, 18 Jan 2019 05:00:51 +0000 (13:00 +0800)]
net: sock: do not set sk_cookie in sk_clone_lock()

The only call site of sk_clone_lock is in inet_csk_clone_lock,
and sk_cookie will be set there.
So we don't need to set sk_cookie in sk_clone_lock().

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago isdn: remove unneeded semicolon
YueHaibing [Fri, 18 Jan 2019 03:05:11 +0000 (11:05 +0800)]
isdn: remove unneeded semicolon

remove unneeded semicolon

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: usb: rtl8150: remove set but not used variable 'rx_stat'
Yue Haibing [Fri, 18 Jan 2019 02:06:49 +0000 (02:06 +0000)]
net: usb: rtl8150: remove set but not used variable 'rx_stat'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/usb/rtl8150.c: In function 'read_bulk_callback':
drivers/net/usb/rtl8150.c:391:6: warning:
 variable 'rx_stat' set but not used [-Wunused-but-set-variable]

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'dpaa2-eth-add-debugfs-statistics'
David S. Miller [Sat, 19 Jan 2019 18:28:43 +0000 (10:28 -0800)]
Merge branch 'dpaa2-eth-add-debugfs-statistics'

Ioana Ciornei says:

====================
dpaa2-eth: add debugfs statistics

This patch set exports detailed driver counters through debugfs.
Counters which are already available through ethtool are now
presented in a structured manner (per-core, per-FQ and
per-channel) in debugfs.

The first patch is changing the dpaa2_eth_queue_count into a macro
(in order to avoid a warning) while the second one is adding the
debugfs support.

Changes in v2:
  - remove the _exit annotation of dpaa2_eth_dbg_exit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago dpaa2-eth: add debugfs statistics
Ioana Radulescu [Fri, 18 Jan 2019 16:16:00 +0000 (16:16 +0000)]
dpaa2-eth: add debugfs statistics

Export detailed driver counters through debugfs.

Statistics already available in ethtool are presented in a
structured manner. Includes per-core, per-FQ and per-channel statistics.

Also transition from module_fsl_mc_driver to explicit module_init/exit
in order to create the debugfs directory besides registering the driver.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago dpaa2-eth: transform dpaa2_eth_queue_count into a macro
Ioana Ciornei [Fri, 18 Jan 2019 16:15:59 +0000 (16:15 +0000)]
dpaa2-eth: transform dpaa2_eth_queue_count into a macro

Transform dpaa2_eth_queue_count into a macro to follow the
convention used by dpaa2_eth_fs_count and other functions.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'net-use-strict-checks-in-doit-handlers'
David S. Miller [Sat, 19 Jan 2019 18:09:59 +0000 (10:09 -0800)]
Merge branch 'net-use-strict-checks-in-doit-handlers'

Jakub Kicinski says:

====================
net: use strict checks in doit handlers

This series extends strict argument checking to doit handlers
of the GET* nature.  This is a bit tricky since the strict checking
flag has already been released.

iproute2 did not have a release with strict checks enabled,
and it will only need a minor one-liner to pass strict checks
after all the work that DaveA has already done.

Big thanks to Dave Ahern for help and guidance.

v2:
 - remove unnecessary check in patch 5 (Nicolas);
 - add patch 7 (DaveA);
 - improve messages in patch 8 (DaveA).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: mpls: netconf: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:26 +0000 (10:46 -0800)]
net: mpls: netconf: perform strict checks also for doit handlers

Make RTM_GETNETCONF's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: mpls: route: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:25 +0000 (10:46 -0800)]
net: mpls: route: perform strict checks also for doit handlers

Make RTM_GETROUTE's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: ipv6: route: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:24 +0000 (10:46 -0800)]
net: ipv6: route: perform strict checks also for doit handlers

Make RTM_GETROUTE's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: ipv6: addrlabel: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:23 +0000 (10:46 -0800)]
net: ipv6: addrlabel: perform strict checks also for doit handlers

Make RTM_GETADDRLABEL's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: ipv6: netconf: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:22 +0000 (10:46 -0800)]
net: ipv6: netconf: perform strict checks also for doit handlers

Make RTM_GETNETCONF's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: ipv6: addr: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:21 +0000 (10:46 -0800)]
net: ipv6: addr: perform strict checks also for doit handlers

Make RTM_GETADDR's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: ipv4: ipmr: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:20 +0000 (10:46 -0800)]
net: ipv4: ipmr: perform strict checks also for doit handlers

Make RTM_GETROUTE's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

v2: - improve extack messages (DaveA).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: ipv4: route: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:19 +0000 (10:46 -0800)]
net: ipv4: route: perform strict checks also for doit handlers

Make RTM_GETROUTE's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

v2: - new patch (DaveA).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: ipv4: netconf: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:18 +0000 (10:46 -0800)]
net: ipv4: netconf: perform strict checks also for doit handlers

Make RTM_GETNETCONF's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: namespace: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:17 +0000 (10:46 -0800)]
net: namespace: perform strict checks also for doit handlers

Make RTM_GETNSID's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

v2: - don't check size >= sizeof(struct rtgenmsg) (Nicolas).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago rtnetlink: ifinfo: perform strict checks also for doit handler
Jakub Kicinski [Fri, 18 Jan 2019 18:46:16 +0000 (10:46 -0800)]
rtnetlink: ifinfo: perform strict checks also for doit handler

Make RTM_GETLINK's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago rtnetlink: stats: reject requests for unknown stats
Jakub Kicinski [Fri, 18 Jan 2019 18:46:15 +0000 (10:46 -0800)]
rtnetlink: stats: reject requests for unknown stats

In the spirit of strict checks, reject requests for stats the kernel
does not support when NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
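
Rejecting unknown stats amounts to a bitmask test that is only enforced when strict checking is on. The sketch below is a simplified Python model of that check; `SUPPORTED_STATS_MASK` is a hypothetical five-group mask and `check_stats_request` is an illustrative name, neither taken from the kernel sources.

```python
import errno

# Hypothetical set of stat groups this model "kernel" knows, one bit per group.
SUPPORTED_STATS_MASK = 0b11111


def check_stats_request(filter_mask, strict):
    """Return 0 if the request is acceptable, a negative errno otherwise."""
    unknown = filter_mask & ~SUPPORTED_STATS_MASK
    if strict and unknown:
        return -errno.EINVAL  # strict mode: refuse bits we do not understand
    return 0                  # legacy mode: unknown bits are silently ignored
```

A request carrying an unsupported bit therefore fails only for sockets that opted in to strict checking, which keeps older user space working.
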
5 years ago rtnetlink: stats: validate attributes in get as well as dumps
Jakub Kicinski [Fri, 18 Jan 2019 18:46:14 +0000 (10:46 -0800)]
rtnetlink: stats: validate attributes in get as well as dumps

Make sure NETLINK_GET_STRICT_CHK influences both GETSTATS doit
as well as the dump.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: netlink: add helper to retrieve NETLINK_F_STRICT_CHK
Jakub Kicinski [Fri, 18 Jan 2019 18:46:13 +0000 (10:46 -0800)]
net: netlink: add helper to retrieve NETLINK_F_STRICT_CHK

Dumps can read state of the NETLINK_F_STRICT_CHK flag from
a field in the callback structure.  For non-dump GET requests
we need a way to access the state of that flag from a socket.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
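
How a non-dump GET handler might use such a helper can be sketched as follows. This is a user-space Python model, not kernel code: `STRICT_CHK` is an illustrative bit rather than the kernel's numeric value, and `Sock`, `strict_from_socket`, `getlink_doit` and the header field names are hypothetical stand-ins.

```python
import errno
from dataclasses import dataclass

STRICT_CHK = 0x1  # illustrative bit, not the kernel's numeric value


@dataclass
class Sock:
    flags: int = 0


def strict_from_socket(sk):
    """Stand-in for the helper: read the flag straight from the socket."""
    return bool(sk.flags & STRICT_CHK)


def getlink_doit(sk, header, attrs, known_attrs):
    """Model of a non-dump GET handler that honours strict checking."""
    if not strict_from_socket(sk):
        return 0  # legacy behaviour: unexpected input is tolerated
    # Header fields that carry no meaning for a GET must be zero.
    if any(header.get(name, 0) for name in ("pad", "type", "flags", "change")):
        return -errno.EINVAL
    # Attributes the handler does not understand are refused.
    if set(attrs) - set(known_attrs):
        return -errno.EINVAL
    return 0
```

The same request is accepted on a legacy socket and refused on a strict one, which is the behaviour the doit-handler patches in this series describe.
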
5 years ago virtio-net: per-queue RPS config
Willem de Bruijn [Fri, 18 Jan 2019 01:08:53 +0000 (20:08 -0500)]
virtio-net: per-queue RPS config

On multiqueue network devices, RPS maps are configured independently
for each receive queue through /sys/class/net/$DEV/queues/rx-*.

On virtio-net currently all packets use the map from rx-0, because the
real rx queue is not known at time of map lookup by get_rps_cpu.

Call skb_record_rx_queue in the driver rx path to make lookup work.

Recording the receive queue has ramifications beyond RPS, such as in
sticky load balancing decisions for sockets (skb_tx_hash) and XPS.

Reported-by: Mark Hlady <mhlady@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
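
The effect of recording the receive queue can be illustrated with a small model of the map lookup. This is a simplified Python sketch, assuming the lookup falls back to queue 0 when no queue was recorded on the packet; `get_rps_map`, `driver_receive` and the dictionary-based packet are hypothetical, not the kernel's data structures.

```python
def get_rps_map(skb, rx_queue_maps):
    """Pick the RPS map for a packet; fall back to rx-0 if no queue is recorded."""
    index = skb.get("rx_queue")
    if index is None:
        index = 0
    return rx_queue_maps[index]


def driver_receive(skb, queue_index):
    """What the fix adds in the rx path: record the real receive queue."""
    skb["rx_queue"] = queue_index
    return skb
```

Without the recording step every packet resolves to the rx-0 map, regardless of which queue it arrived on.
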
5 years ago sch_api: Change signature of qdisc_tree_reduce_backlog() to use ints
Toke Høiland-Jørgensen [Wed, 9 Jan 2019 16:10:57 +0000 (17:10 +0100)]
sch_api: Change signature of qdisc_tree_reduce_backlog() to use ints

There are now several places where qdisc_tree_reduce_backlog() is called
with a negative number of packets (to signal an increase in number of
packets in the queue). Rather than rely on overflow behaviour, change the
function signature to use signed integers to communicate this usage to
people reading the code.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
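
The overflow behaviour mentioned above can be made concrete with 32-bit arithmetic. The following Python sketch emulates an unsigned 32-bit parameter with a mask; the function names are hypothetical and only the arithmetic is meant to be representative.

```python
U32_MASK = 0xFFFFFFFF


def to_u32(value):
    """Reinterpret a Python int as a 32-bit unsigned value."""
    return value & U32_MASK


def reduce_backlog_unsigned(qlen, n):
    """Old style: 'n' is unsigned, so -2 arrives as 0xFFFFFFFE and the
    subtraction only gives the right answer because it wraps modulo 2**32."""
    n = to_u32(n)
    return to_u32(qlen - n)


def reduce_backlog_signed(qlen, n):
    """New style: 'n' is signed, so a negative value plainly means growth."""
    return qlen - n
```

Both variants turn a queue length of 10 and an argument of -2 into 12, but only the signed version states the intent without depending on wraparound.
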
5 years ago Merge branch 'hns3-fixes'
David S. Miller [Fri, 18 Jan 2019 23:10:22 +0000 (15:10 -0800)]
Merge branch 'hns3-fixes'

Huazhong Tan says:

====================
net: hns3: code optimizations & bugfixes for HNS3 driver

This patchset includes bugfixes and code optimizations for the HNS3
ethernet controller driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: add HNAE3_RESTORE_CLIENT interface in enet module
Yunsheng Lin [Fri, 18 Jan 2019 08:13:14 +0000 (16:13 +0800)]
net: hns3: add HNAE3_RESTORE_CLIENT interface in enet module

The HNAE3_INIT_CLIENT interface is also used when changing the tc
configuration, but the vlan/mac hardware table does not need to be restored
when the tc configuration changes.

This patch adds a HNAE3_RESTORE_CLIENT interface to restore the
vlan/mac hardware table when resetting.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: do reinitialization while ETS configuration changed
Huazhong Tan [Fri, 18 Jan 2019 08:13:13 +0000 (16:13 +0800)]
net: hns3: do reinitialization while ETS configuration changed

When the ETS information is changed, the network device needs to be
re-initialized; otherwise information such as the receive queue
will be incorrect.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: fix wrong combined count returned by ethtool -l
Huazhong Tan [Fri, 18 Jan 2019 08:13:12 +0000 (16:13 +0800)]
net: hns3: fix wrong combined count returned by ethtool -l

The current code returns the number of all queues that can be used and
the number of queues that have been allocated, which is incorrect.
What should be returned is the number of queues allocated for each enabled
TC and the number of queues that can be allocated.

This patch fixes it.

Fixes: 482d2e9c1cc7 ("net: hns3: add support to query tqps number")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: adjust the use of alloc_tqps and num_tqps
Huazhong Tan [Fri, 18 Jan 2019 08:13:11 +0000 (16:13 +0800)]
net: hns3: adjust the use of alloc_tqps and num_tqps

The alloc_tqps field of struct hclge_vport represents the total number
of tqps allocated to the vport. The num_tqps field of struct
hnae3_knic_private_info indicates the total number of enabled tqps.
The two need to be distinguished during use.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: fix user configuration loss for ethtool -L
Huazhong Tan [Fri, 18 Jan 2019 08:13:10 +0000 (16:13 +0800)]
net: hns3: fix user configuration loss for ethtool -L

The ethtool -L option with the combined parameter changes the number of
multi-purpose channels of the specified network device. Under the current
scheme, the user configuration is lost after a reset or a change of the
TC information.

This patch fixes this issue. By default, this configuration is set to the
minimum of the number of queues for each enabled TC and the maximum
number supported by the hardware. When there is a user configuration, it
should be kept across resets and TC information changes as long as it is
within the hardware limits; otherwise it is set to the maximum number
supported by the hardware.
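
The selection rule described above can be sketched as follows. This is a
minimal illustrative model in Python, not the driver code; the names
pick_queue_count, queues_per_tc, hw_max and user_req are hypothetical.

```python
from typing import Optional


def pick_queue_count(queues_per_tc: int, hw_max: int,
                     user_req: Optional[int] = None) -> int:
    """Return the per-TC queue count to apply after a reset or TC change."""
    if user_req is None:
        # No user configuration: default to the smaller of the two limits.
        return min(queues_per_tc, hw_max)
    # Keep the user's value while it fits the hardware, otherwise clamp.
    return user_req if user_req <= hw_max else hw_max
```

For example, a user request of 2 survives a reset on hardware that supports
8 queues, while a request of 32 is clamped to 8.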

Fixes: 09f2af6405b8 ("net: hns3: add support to modify tqps number")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: remove redundant codes in hclge_knic_setup
Huazhong Tan [Fri, 18 Jan 2019 08:13:09 +0000 (16:13 +0800)]
net: hns3: remove redundant codes in hclge_knic_setup

The TC info will be updated in hclge_tm_vport_tc_info_update(),
so hclge_knic_setup() does not need to do it again.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: modify parameter checks in the hns3_set_channels
Huazhong Tan [Fri, 18 Jan 2019 08:13:08 +0000 (16:13 +0800)]
net: hns3: modify parameter checks in the hns3_set_channels

The number of queues for each enabled TC should range from 1 to
the maximum available value. Return directly if the value
is the same as the current one.
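
A minimal sketch of the two checks, written as a Python model rather than
the driver code. The function name, its parameters and the use of -EINVAL
for an out-of-range request are assumptions made for illustration.

```python
import errno


def check_new_queue_count(new_count: int, current: int, max_available: int):
    """Return (error, needs_change) for a requested per-TC queue count."""
    if new_count < 1 or new_count > max_available:
        return -errno.EINVAL, False   # outside the range 1..max_available
    if new_count == current:
        return 0, False               # same as the current value: nothing to do
    return 0, True                    # valid and different: apply the change
```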

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: add interface hclge_tm_bp_setup
Huazhong Tan [Fri, 18 Jan 2019 08:13:07 +0000 (16:13 +0800)]
net: hns3: add interface hclge_tm_bp_setup

Provide a common interface to complete the back pressure settings
of all enabled TCs, so that other functions can call this interface
directly to complete the corresponding operation.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: reuse reinitialization interface in the hns3_set_channels
Huazhong Tan [Fri, 18 Jan 2019 08:13:06 +0000 (16:13 +0800)]
net: hns3: reuse reinitialization interface in the hns3_set_channels

There is already a common interface for network device reinitialization,
so hns3_set_channels() should just call it.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: remove unnecessary hns3_adjust_tqps_num
Huazhong Tan [Fri, 18 Jan 2019 08:13:05 +0000 (16:13 +0800)]
net: hns3: remove unnecessary hns3_adjust_tqps_num

The parameter passed to hns3_set_channels() is already the number of
queues per channel of the enabled TC, so there is no need to divide it
by the number of enabled TCs.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: remove unused member in struct hns3_enet_ring
Huazhong Tan [Fri, 18 Jan 2019 08:13:04 +0000 (16:13 +0800)]
net: hns3: remove unused member in struct hns3_enet_ring

The irq_init_flag field in struct hns3_enet_ring is unnecessary.
This patch removes it.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: modify enet reinitialization interface
Huazhong Tan [Fri, 18 Jan 2019 08:13:03 +0000 (16:13 +0800)]
net: hns3: modify enet reinitialization interface

hns3_reset_notify_init_enet and hns3_reset_notify_uninit_enet are the
reinitialization interfaces that will be called when the device is reset,
the number of TCs changes, or the queue length changes. So these two
functions should call hns3_get_ring_config() and hns3_put_ring_config()
to allocate and free memory for the ring with the correct number.

Also, this patch fixes a double free problem when
hns3_reset_notify_uninit_enet calls hns3_nic_dealloc_vector_data.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'Devlink-health-reporting-and-recovery-system'
David S. Miller [Fri, 18 Jan 2019 22:51:23 +0000 (14:51 -0800)]
Merge branch 'Devlink-health-reporting-and-recovery-system'

Eran Ben Elisha says:

====================
Devlink health reporting and recovery system

The health mechanism is targeted for Real Time Alerting, in order to know when
something bad has happened to a PCI device
- Provide alert debug information
- Self healing
- If problem needs vendor support, provide a way to gather all needed debugging
  information.

The main idea is to unify and centralize driver health reports in the
generic devlink instance and allow the user to set different
attributes of the health reporting and recovery procedures.

The devlink health reporter:
The device driver creates a "health reporter" for each error/health type.
An error/health type can be known/generic (e.g. pci error, fw error, rx/tx
error) or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by devlink.
Device driver can provide specific callbacks for each "health reporter", e.g.
 - Recovery procedures
 - Diagnostics and object dump procedures
 - OOB initial attributes
Different parts of the driver can register different types of health reporters
with different handlers.

Once an error is reported, devlink health will do the following actions:
  * A log is sent to the kernel trace events buffer
  * Health status and statistics are updated for the reporter instance
  * An object dump is taken and saved at the reporter instance (as long as
    there is no other dump which is already stored)
  * An auto recovery attempt is made, depending on:
    - Auto-recovery configuration
    - Grace period vs. time passed since last recover

The user interface:
The user can access/change each reporter's attributes and driver-specific
callbacks via devlink, e.g. per error type (per health reporter):
 - Configure reporter's generic attributes (like: Disable/enable auto recovery)
 - Invoke recovery procedure
 - Run diagnostics
 - Object dump

The devlink health interface (via netlink):
DEVLINK_CMD_HEALTH_REPORTER_GET
  Retrieves status and configuration info per DEV and reporter.
DEVLINK_CMD_HEALTH_REPORTER_SET
  Allows reporter-related configuration setting.
DEVLINK_CMD_HEALTH_REPORTER_RECOVER
  Triggers a reporter's recovery procedure.
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
  Retrieves diagnostics data from a reporter on a device.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
  Retrieves the last stored dump. Devlink health
  saves a single dump. If a dump is not already stored by devlink
  for this reporter, devlink generates a new dump.
  The dump output is defined by the reporter.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
  Clears the last saved dump file for the specified reporter.

                                               netlink
                                      +--------------------------+
                                      |                          |
                                      |            +             |
                                      |            |             |
                                      +--------------------------+
                                                   |request for ops
                                                   |(diagnose,
 mlx5_core                             devlink     |recover,
                                                   |dump)
+--------+                            +--------------------------+
|        |                            |    reporter|             |
|        |                            |  +---------v----------+  |
|        |   ops execution            |  |                    |  |
|     <----------------------------------+                    |  |
|        |                            |  |                    |  |
|        |                            |  + ^------------------+  |
|        |                            |    | request for ops     |
|        |                            |    | (recover, dump)     |
|        |                            |    |                     |
|        |                            |  +-+------------------+  |
|        |     health report          |  | health handler     |  |
|        +------------------------------->                    |  |
|        |                            |  +--------------------+  |
|        |     health reporter create |                          |
|        +---------------------------->                          |
+--------+                            +--------------------------+

In this patchset, mlx5e TX reporter is implemented.

v2:
- Remove FW* reporters to decrease the amount of patches in the patchset
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add Documentation/networking/devlink-health.txt
Aya Levin [Thu, 17 Jan 2019 21:59:20 +0000 (23:59 +0200)]
devlink: Add Documentation/networking/devlink-health.txt

This patch adds a new file to add information about devlink health
mechanism.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: Add TX timeout support for mlx5e TX reporter
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:19 +0000 (23:59 +0200)]
net/mlx5e: Add TX timeout support for mlx5e TX reporter

With this patch, the ndo_tx_timeout callback will be redirected to the TX
reporter in order to detect a TX timeout error and report it to
devlink health. (The watchdog detects TX timeouts, but the driver verifies
that the issue still exists before launching any recover method.)

In addition, recovery from a TX timeout caused by a lost interrupt was added
to the TX reporter recover method. Recovering from a TX timeout caused by a
lost interrupt is not a new feature in the driver; this patch reorganizes the
functionality and moves it to the TX reporter recovery flow.

TX timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=TX: TX timeout on queue: 0, SQ: 0xd8a, CQ:
0x406, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 13972000

$devlink health diagnose pci/0000:00:09 reporter TX
SQ 0xd8a: HW state: 1, stopped: 1
SQ 0xe44: HW state: 1, stopped: 0
SQ 0xeb4: HW state: 1, stopped: 0
SQ 0xf1f: HW state: 1, stopped: 0
SQ 0xf80: HW state: 1, stopped: 0
SQ 0xfe5: HW state: 1, stopped: 0

$devlink health recover pci/0000:00:09 reporter TX
$devlink health show
pci/0000:00:09.0:
  name TX state healthy #err 1 #recover 1 last_dump_ts N/A dump_available false
    attributes:
        grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: Add TX reporter support
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:18 +0000 (23:59 +0200)]
net/mlx5e: Add TX reporter support

Add the mlx5e TX reporter to the devlink health reporters. This reporter will
be responsible for diagnosing, reporting and recovering from TX errors.
This patch declares the TX reporter operations and allocates the reporter
using the devlink health API. Currently, this reporter supports reporting and
recovering from send error CQEs only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full TX recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver;
this patch reorganizes the functions and adapts them for the devlink
health API. For this purpose, move code from en_main.c to a new file
named reporter_tx.c.
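
The success criteria for the two recover paths described above can be
modelled roughly as follows. This is a Python sketch under stated
assumptions, not the mlx5e implementation: tx_recover, recover_sq and
reopen_channels are hypothetical names, and the callables stand in for the
driver's real work.

```python
from typing import Callable, Optional


def tx_recover(sq: Optional[int],
               recover_sq: Callable[[int], bool],
               reopen_channels: Callable[[], bool]) -> bool:
    """Return True when the recover operation counts as successful."""
    if sq is not None:
        # Local SQ recover: a failure here fails the whole operation.
        return recover_sq(sq)
    # Full TX recover: success means the close/open cycle passed.
    return reopen_channels()
```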

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add health dump {get,clear} commands
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:17 +0000 (23:59 +0200)]
devlink: Add health dump {get,clear} commands

Add devlink health dump commands, in order to run a dump operation
over a specific reporter.

The supported operations are dump_get, to get the last saved
dump (if none exists, dump now), and dump_clear, to clear the last saved
dump.

The driver's callback for the diagnose command is expected to fill it
via the buffer descriptors API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.
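
The dump_get/dump_clear semantics can be illustrated with a small Python
model. It is only a sketch: DumpStore and take_dump are hypothetical names,
and take_dump stands in for the reporter's dump callback.

```python
class DumpStore:
    """Holds at most one saved dump for a reporter."""

    def __init__(self, take_dump):
        self._take_dump = take_dump   # stands in for the driver's dump callback
        self._saved = None

    def dump_get(self):
        # Return the saved dump; if none exists, dump now and save it.
        if self._saved is None:
            self._saved = self._take_dump()
        return self._saved

    def dump_clear(self):
        # Forget the saved dump so that the next dump_get takes a new one.
        self._saved = None
```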

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add health diagnose command
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:16 +0000 (23:59 +0200)]
devlink: Add health diagnose command

Add devlink health diagnose command, in order to run a diagnose
operation over a specific reporter.

The driver's callback for the diagnose command is expected to fill it
via the buffer descriptors API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add health recover command
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:15 +0000 (23:59 +0200)]
devlink: Add health recover command

Add devlink health recover command to the uapi, in order to allow the user
to execute a recover operation over a specific reporter.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add health set command
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:14 +0000 (23:59 +0200)]
devlink: Add health set command

Add devlink health set command, in order to set configuration parameters
for a specific reporter.
Supported parameters are:
- graceful_period: Time interval between auto recoveries (in msec)
- auto_recover: Determines if the devlink shall execute recover upon
receiving error for the reporter

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add health get command
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:13 +0000 (23:59 +0200)]
devlink: Add health get command

Add devlink health get command to provide reporter/s data for user space.
Add the ability to get data per reporter or dump data from all available
reporters.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add health report functionality
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:12 +0000 (23:59 +0200)]
devlink: Add health report functionality

Upon error discovery, every driver can report it to the devlink health
mechanism via the devlink_health_report function, using the appropriate
reporter registered to it. The driver can pass an error-specific context
which will be delivered to it as part of the dump / recovery callbacks.

Once an error is reported, devlink health will do the following actions:
* A log is sent to the kernel trace events buffer
* Health status and statistics are updated for the reporter instance
* An object dump is taken and stored at the reporter instance (as long
  as there is no other dump which is already stored)
* An auto recovery attempt is made, depending on:
  - Auto Recovery configuration
  - Grace period vs. time since last recover
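
The four actions above can be sketched as a small Python model. This is an
illustration under stated assumptions, not the devlink code: the Reporter
fields, the health_report signature, and the choice to restart the grace
period on every recovery attempt are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Reporter:
    recover: Callable[[object], bool]     # driver recover callback
    dump: Callable[[object], str]         # driver dump callback
    auto_recover: bool = True
    grace_period_ms: int = 500
    error_count: int = 0
    recovery_count: int = 0
    healthy: bool = True
    saved_dump: Optional[str] = None
    last_recovery_ms: Optional[int] = None


def health_report(rep: Reporter, msg: str, ctx: object,
                  now_ms: int, trace: List[str]) -> bool:
    """Handle one error report; return True if auto recovery succeeded."""
    trace.append(msg)                      # 1. log to the trace buffer
    rep.error_count += 1                   # 2. update status and statistics
    rep.healthy = False
    if rep.saved_dump is None:             # 3. keep only the first dump
        rep.saved_dump = rep.dump(ctx)
    if not rep.auto_recover:               # 4. auto recovery, if configured
        return False
    if (rep.last_recovery_ms is not None
            and now_ms - rep.last_recovery_ms < rep.grace_period_ms):
        return False                       # still inside the grace period
    rep.last_recovery_ms = now_ms
    if rep.recover(ctx):
        rep.recovery_count += 1
        rep.healthy = True
        return True
    return False
```

The error context passed by the driver (ctx) reaches both the dump and the
recover callbacks, as the commit message describes.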

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add health reporter create/destroy functionality
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:11 +0000 (23:59 +0200)]
devlink: Add health reporter create/destroy functionality

A devlink health reporter is an instance for reporting, diagnosing and
recovering from run time errors discovered by the reporters.
Define its data structure and supported operations.
In addition, expose a devlink API to create and destroy a reporter.
Each devlink instance will hold its own reporters list.

As part of the allocation, the driver shall provide a set of callbacks which
will be used by devlink in order to handle health reports and user
commands related to this reporter. In addition, the driver is entitled to
provide some priv pointer, which can be fetched from the reporter by the
devlink_health_reporter_priv function.

For each reporter, devlink will hold metadata of statistics,
buffers and status.
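
A rough Python model of the create/destroy life cycle described above. It
is a sketch, not the devlink API: the class and function names are
hypothetical, and rejecting duplicate reporter names is an assumption.

```python
class Devlink:
    def __init__(self):
        self.reporters = []          # each devlink instance holds its own list


class HealthReporter:
    def __init__(self, name, ops, priv):
        self.name = name
        self.ops = ops               # driver callbacks, e.g. {"recover": fn}
        self._priv = priv            # opaque driver pointer
        self.stats = {"errors": 0, "recoveries": 0}   # devlink-side metadata


def reporter_create(devlink, name, ops, priv=None):
    if any(r.name == name for r in devlink.reporters):
        raise ValueError("reporter already exists")   # assumption: unique names
    rep = HealthReporter(name, ops, priv)
    devlink.reporters.append(rep)
    return rep


def reporter_priv(rep):
    # Fetch the driver's private pointer back from the reporter.
    return rep._priv


def reporter_destroy(devlink, rep):
    devlink.reporters.remove(rep)
```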

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add health buffer support
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:10 +0000 (23:59 +0200)]
devlink: Add health buffer support

Devlink health buffer is a mechanism to pass descriptors between drivers
and devlink. The API allows the driver to add objects, object pair,
value array (nested attributes), value and name.

Driver can use this API to fill the buffers in a format which can be
translated by the devlink to the netlink message.

In order to fulfill it, an internal buffer descriptor is defined. This
will hold the data and metadata for each attribute and be used to pass
actual commands to netlink.

This mechanism will later be used in devlink health for dump and diagnose
data stored by the drivers.
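
The idea of a flat descriptor buffer that is later translated into nested
attributes can be sketched in Python as below. This is only a model: the
HealthBuffer class, the descriptor kinds and to_message are hypothetical
names, and nested lists stand in for netlink nests.

```python
class HealthBuffer:
    """Flat descriptor list that a driver fills and devlink later walks."""

    def __init__(self):
        self.desc = []                      # (kind, payload) tuples

    def put(self, kind, payload=None):
        self.desc.append((kind, payload))

    def pair(self, name, value):
        # An object pair is a name followed by a single value.
        self.put("pair_start")
        self.put("name", name)
        self.put("value", value)
        self.put("pair_end")


def to_message(buf):
    """Translate descriptors into nested lists, one list per nest."""
    stack = [[]]
    for kind, payload in buf.desc:
        if kind.endswith("_start"):
            stack.append([kind[:-len("_start")]])   # open a new nest
        elif kind.endswith("_end"):
            done = stack.pop()                      # close the current nest
            stack[-1].append(done)
        else:
            stack[-1].append((kind, payload))
    return stack[0]
```

A driver-side caller would wrap its pairs in an object, for example
put("object_start"), pair("SQ", 0xd8a), put("object_end").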

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet_sched: add hit counter for matchall
Cong Wang [Thu, 17 Jan 2019 20:44:25 +0000 (12:44 -0800)]
net_sched: add hit counter for matchall

Although matchall always matches packets, it still
relies on a protocol match first. So it is still useful to have
such a counter for matchall. Of course, unlike u32, every time
we hit a matchall filter, it is always a success, so we don't
have to distinguish them.

Sample output:

filter protocol 802.1Q pref 100 matchall chain 0
filter protocol 802.1Q pref 100 matchall chain 0 handle 0x1
  not_in_hw (rule hit 10)
action order 1: vlan  pop continue
 index 1 ref 1 bind 1 installed 40 sec used 1 sec
Action statistics:
Sent 836 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
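
Why the counter is meaningful even though matchall always matches can be
shown with a tiny Python model. It is a simplification with hypothetical
names: the protocol check is folded into the filter here purely for
illustration.

```python
class MatchallFilter:
    def __init__(self, protocol):
        self.protocol = protocol
        self.rule_hit = 0

    def classify(self, pkt_protocol):
        if pkt_protocol != self.protocol:
            return False      # the protocol match happens before matchall
        self.rule_hit += 1    # every packet that gets here is a hit
        return True
```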

Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'phy-improve-stopping-PHY'
David S. Miller [Fri, 18 Jan 2019 22:12:25 +0000 (14:12 -0800)]
Merge branch 'phy-improve-stopping-PHY'

Heiner Kallweit says:

====================
net: phy: improve stopping PHY

This patchset improves and simplifies stopping the PHY.

Heiner Kallweit (3):
  net: phy: stop PHY if needed when entering phy_disconnect
  net: phy: ensure phylib state machine is stopped after calling phy_stop
  net: phy: remove phy_stop_interrupts

v2:
- break down the patch to a patchset
v3:
- don't warn if driver didn't call phy_stop before phy_disconnect
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: remove phy_stop_interrupts
Heiner Kallweit [Thu, 17 Jan 2019 19:09:21 +0000 (20:09 +0100)]
net: phy: remove phy_stop_interrupts

Interrupts have been disabled in phy_stop() already. So we can remove
phy_stop_interrupts() and free the interrupt in phy_disconnect()
directly.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: ensure phylib state machine is stopped after calling phy_stop
Heiner Kallweit [Thu, 17 Jan 2019 19:08:39 +0000 (20:08 +0100)]
net: phy: ensure phylib state machine is stopped after calling phy_stop

The call to the phylib state machine in phy_stop() just ensures that
the state machine isn't re-triggered, but a state machine call may
be scheduled already. So let's call phy_stop_machine().
This also allows us to get rid of the call to phy_stop_machine() in
phy_disconnect().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: stop PHY if needed when entering phy_disconnect
Heiner Kallweit [Thu, 17 Jan 2019 19:07:54 +0000 (20:07 +0100)]
net: phy: stop PHY if needed when entering phy_disconnect

Stop PHY if needed when entering phy_disconnect. This allows drivers
that don't need a separate call to phy_stop() to omit this call.
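
The resulting behaviour can be modelled in a few lines of Python. This is a
sketch with hypothetical names (Phy, started, attached), not the phylib
code.

```python
class Phy:
    def __init__(self):
        self.started = False
        self.attached = True

    def start(self):
        self.started = True

    def stop(self):
        self.started = False

    def disconnect(self):
        if self.started:      # stop the PHY if the driver has not done so
            self.stop()
        self.attached = False
```

A driver that calls start() and then disconnect() ends up with a stopped,
detached PHY, with or without an explicit stop() in between.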

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: declare tcp_mmap() only when CONFIG_MMU is set
Yafang Shao [Thu, 17 Jan 2019 10:03:14 +0000 (18:03 +0800)]
tcp: declare tcp_mmap() only when CONFIG_MMU is set

tcp_mmap() is only defined when CONFIG_MMU is set, so only declare it
in that case.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: jme: fix indentation issues
Colin Ian King [Thu, 17 Jan 2019 00:03:26 +0000 (00:03 +0000)]
net: jme: fix indentation issues

There are two lines that have indentation issues, fix these. Also remove
an empty line.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: vxge: fix indentation issue
Colin Ian King [Wed, 16 Jan 2019 23:59:10 +0000 (23:59 +0000)]
net: vxge: fix indentation issue

There is a goto statement that is indented too deeply, fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: improve get_phy_id
Heiner Kallweit [Wed, 16 Jan 2019 18:52:51 +0000 (19:52 +0100)]
net: phy: improve get_phy_id

The only caller of get_phy_id() is get_phy_device(). There, a PHY ID of
0xffffffff is translated back to -ENODEV. So we can avoid some
overhead by returning -ENODEV directly.
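
A Python sketch of the idea, assuming a hypothetical read_reg callable that
returns a 16-bit MII register value. It follows the commit text and checks
for an ID of 0xffffffff; the exact comparison used by the kernel may
differ.

```python
import errno

MII_PHYSID1 = 2   # register holding the upper 16 bits of the PHY ID
MII_PHYSID2 = 3   # register holding the lower 16 bits of the PHY ID


def get_phy_id(read_reg):
    """Return the 32-bit PHY ID, or -ENODEV if no device responds."""
    phy_id = (read_reg(MII_PHYSID1) & 0xFFFF) << 16
    phy_id |= read_reg(MII_PHYSID2) & 0xFFFF
    if phy_id == 0xFFFFFFFF:
        # All ones means nothing answered: report it here, so the caller
        # does not have to translate the ID back into an error.
        return -errno.ENODEV
    return phy_id
```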

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: remove state PHY_CHANGELINK
Heiner Kallweit [Wed, 16 Jan 2019 18:47:57 +0000 (19:47 +0100)]
net: phy: remove state PHY_CHANGELINK

Since recent changes to the phylib state machine, state PHY_CHANGELINK
isn't used any longer. Therefore let's remove it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ip6_gre: remove gre_hdr_len from ip6erspan_rcv
Lorenzo Bianconi [Wed, 16 Jan 2019 18:38:05 +0000 (19:38 +0100)]
net: ip6_gre: remove gre_hdr_len from ip6erspan_rcv

Remove gre_hdr_len from the ip6erspan_rcv routine signature since
it is no longer used.

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'tcp_openreq_child'
David S. Miller [Fri, 18 Jan 2019 06:19:05 +0000 (22:19 -0800)]
Merge branch 'tcp_openreq_child'

Eric Dumazet says:

====================
tcp: remove code from tcp_create_openreq_child()

tcp_create_openreq_child() is essentially cloning a listener, then
must initialize some fields that can not be inherited.

Listeners are either fresh sockets, or sockets that came through
tcp_disconnect() after a session that dirtied many fields.

By moving code to tcp_disconnect(), we can shorten the time taken
to create a clone, since a tcp_disconnect() operation is very
unlikely.
====================
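
The reasoning behind the series can be illustrated with a small Python
model. It is a sketch only: the Sock class holds just three of the fields
mentioned in the patches, and the function names mirror the kernel ones
without reproducing their real bodies.

```python
import copy
from dataclasses import dataclass

TCP_INIT_CWND = 10   # initial congestion window (IW10)


@dataclass
class Sock:
    # Defaults stand in for what a fresh socket gets at init time.
    snd_cwnd: int = TCP_INIT_CWND
    retrans_out: int = 0
    sacked_out: int = 0


def tcp_disconnect(sk):
    # Rare path: restore every field a past session may have dirtied.
    sk.snd_cwnd = TCP_INIT_CWND
    sk.retrans_out = 0
    sk.sacked_out = 0


def create_openreq_child(listener):
    # Hot path: a plain copy, with no per-field re-initialization.
    return copy.copy(listener)
```

Since a listener is either fresh or has gone through tcp_disconnect(), the
copy already carries the right initial values.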

Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: move rx_opt & syn_data_acked init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:42 +0000 (11:23 -0800)]
tcp: move rx_opt & syn_data_acked init to tcp_disconnect()

If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: move tp->rack init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:41 +0000 (11:23 -0800)]
tcp: move tp->rack init to tcp_disconnect()

If we make sure all listeners have proper tp->rack value,
then a clone will also inherit proper initial value.

Note that fresh sockets init tp->rack from tcp_init_sock()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: move app_limited init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:40 +0000 (11:23 -0800)]
tcp: move app_limited init to tcp_disconnect()

If we make sure all listeners have app_limited set to ~0U,
then a clone will also inherit proper initial value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: move retrans_out, sacked_out, tlp_high_seq, last_oow_ack_time init to tcp_discon...
Eric Dumazet [Thu, 17 Jan 2019 19:23:39 +0000 (11:23 -0800)]
tcp: move retrans_out, sacked_out, tlp_high_seq, last_oow_ack_time init to tcp_disconnect()

If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: do not clear urg_data in tcp_create_openreq_child
Eric Dumazet [Thu, 17 Jan 2019 19:23:38 +0000 (11:23 -0800)]
tcp: do not clear urg_data in tcp_create_openreq_child

All listeners have this field cleared already, since tcp_disconnect()
clears it and newly created sockets also have a zero value here.

So a clone will inherit a zero value here.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: move snd_cwnd & snd_cwnd_cnt init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:37 +0000 (11:23 -0800)]
tcp: move snd_cwnd & snd_cwnd_cnt init to tcp_disconnect()

Passive connections can inherit proper value by cloning,
if we make sure all listeners have the proper values there.

tcp_disconnect() was setting snd_cwnd to 2, which seems
quite obsolete since IW10 adoption.

Also remove an obsolete comment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: move mdev_us init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:36 +0000 (11:23 -0800)]
tcp: move mdev_us init to tcp_disconnect()

If we make sure a listener always has its mdev_us
field set to TCP_TIMEOUT_INIT, we do not need to rewrite
this field after a new clone is created.

tcp_disconnect() is very seldom used in real applications.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: do not clear srtt_us in tcp_create_openreq_child
Eric Dumazet [Thu, 17 Jan 2019 19:23:35 +0000 (11:23 -0800)]
tcp: do not clear srtt_us in tcp_create_openreq_child

All listeners have this field cleared already, since tcp_disconnect()
clears it and newly created sockets also have a zero value here.

So a clone will inherit a zero value here.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: do not clear packets_out in tcp_create_openreq_child()
Eric Dumazet [Thu, 17 Jan 2019 19:23:34 +0000 (11:23 -0800)]
tcp: do not clear packets_out in tcp_create_openreq_child()

New sockets have this field cleared, and tcp_disconnect()
calls tcp_write_queue_purge() which, among other things,
also clears tp->packets_out.

So a listener is guaranteed to have this field cleared.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: move icsk_rto init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:33 +0000 (11:23 -0800)]
tcp: move icsk_rto init to tcp_disconnect()

If we make sure a listener always has its icsk_rto
field set to TCP_TIMEOUT_INIT, we do not need to rewrite
this field after a new clone is created.

tcp_disconnect() is very seldom used in real applications.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: do not set snd_ssthresh in tcp_create_openreq_child()
Eric Dumazet [Thu, 17 Jan 2019 19:23:32 +0000 (11:23 -0800)]
tcp: do not set snd_ssthresh in tcp_create_openreq_child()

New sockets get the field set to TCP_INFINITE_SSTHRESH in tcp_init_sock().
In case a socket had this field changed and transitions to TCP_LISTEN
state, tcp_disconnect() also makes sure snd_ssthresh is set to
TCP_INFINITE_SSTHRESH.

So a listener has this field set to TCP_INFINITE_SSTHRESH already.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>