David S. Miller [Tue, 30 Jun 2020 20:09:10 +0000 (13:09 -0700)]
Merge branch 'sfc-prerequisites-for-EF100-driver-part-2'
Edward Cree says:
====================
sfc: prerequisites for EF100 driver, part 2
Continuing on from [1], this series further prepares the sfc codebase
for the introduction of the EF100 driver.
[1]: https://lore.kernel.org/netdev/20200629.173812.1532344417590172093.davem@davemloft.net/T/
====================
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:15:34 +0000 (13:15 +0100)]
sfc: don't call tx_remove if there isn't one
EF100 won't have an efx->type->tx_remove method, because there's
nothing for it to do. So make the call conditional.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
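The conditional call described above can be modelled in user space. This is only a Python sketch of the idea, not the driver code: the names remove_tx_queue, with_hook and without_hook are hypothetical, and a missing method is modelled as an attribute set to None.

```python
from types import SimpleNamespace


def remove_tx_queue(nic_type, tx_queue):
    """Invoke the optional tx_remove hook; return True if it was called."""
    hook = getattr(nic_type, "tx_remove", None)
    if hook is None:
        # Nothing to do for NIC types that do not provide the method.
        return False
    hook(tx_queue)
    return True


removed = []
# Hypothetical NIC type that provides a tx_remove method.
with_hook = SimpleNamespace(tx_remove=removed.append)
# Hypothetical NIC type (EF100-like) that leaves the method unset.
without_hook = SimpleNamespace(tx_remove=None)

remove_tx_queue(with_hook, "txq0")
remove_tx_queue(without_hook, "txq1")
```

Only the first call reaches a hook; the second returns without doing anything.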
Edward Cree [Tue, 30 Jun 2020 12:15:10 +0000 (13:15 +0100)]
sfc: commonise initialisation of efx->vport_id
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:14:45 +0000 (13:14 +0100)]
sfc: commonise efx->[rt]xq_entries initialisation
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:14:13 +0000 (13:14 +0100)]
sfc: initialise max_[tx_]channels in efx_init_channels()
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:13:47 +0000 (13:13 +0100)]
sfc: move definition of EFX_MC_STATS_GENERATION_INVALID
Saves a whole #include from nic.c.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:13:15 +0000 (13:13 +0100)]
sfc: factor out efx_tx_tso_header_length() and understand encapsulation
ef100 will need to check this against NIC limits.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
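As a rough illustration of what a TSO header length that "understands encapsulation" has to compute, here is a user-space Python sketch. It assumes the header length is the offset of the (inner) TCP header plus the TCP header size; the dictionary keys and the nic_limit value are hypothetical and are not taken from the driver.

```python
def tso_header_length(skb):
    """Bytes from the start of the packet to the end of the TCP header.

    For encapsulated packets the inner transport header is used,
    otherwise the outer one. tcp_doff is the TCP data offset in
    32-bit words, as carried in the TCP header.
    """
    if skb["encapsulation"]:
        l4_offset = skb["inner_transport_offset"]
    else:
        l4_offset = skb["transport_offset"]
    return l4_offset + skb["tcp_doff"] * 4


# Plain TCP/IPv4: 14 (Ethernet) + 20 (IPv4) bytes before the TCP header.
plain = {"encapsulation": False, "transport_offset": 34, "tcp_doff": 5}

# VXLAN-style tunnel: outer Ethernet/IPv4/UDP/VXLAN (14 + 20 + 8 + 8)
# followed by inner Ethernet/IPv4 (14 + 20) before the inner TCP header.
tunnelled = {
    "encapsulation": True,
    "transport_offset": 34,
    "inner_transport_offset": 84,
    "tcp_doff": 5,
}

nic_limit = 96  # hypothetical hardware limit, for illustration only
too_long = tso_header_length(tunnelled) > nic_limit
```

A caller could use a comparison like the last line to decide whether the hardware can segment the packet.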
Edward Cree [Tue, 30 Jun 2020 12:12:49 +0000 (13:12 +0100)]
sfc: remove duplicate declaration of efx_enqueue_skb_tso()
Define it in nic_common.h, even though the ef100 driver will have a
different implementation backing it (actually a WARN_ON_ONCE, as it
should never get called by ef100, but it still needs to exist because
common TX path code references it).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:12:17 +0000 (13:12 +0100)]
sfc: commonise TSO fallback code
ef100 will need this if it gets GSO skbs it can't handle (e.g. too long
header length).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:11:52 +0000 (13:11 +0100)]
sfc: commonise efx_sync_rx_buffer()
The ef100 RX path will also need to DMA-sync RX buffers.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:11:35 +0000 (13:11 +0100)]
sfc: commonise some MAC configuration code
Refactor it a little as we go, and introduce efx_mcdi_set_mtu() which we
will later use for ef100 to change MTU without touching other MAC settings.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:03:47 +0000 (13:03 +0100)]
sfc: commonise miscellaneous efx functions
Various left-over bits and pieces from efx.c that are needed by ef100.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:03:21 +0000 (13:03 +0100)]
sfc: add missing licence info to mcdi_filters.c
Both the licence notice and the SPDX tag were missing from this file.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:02:56 +0000 (13:02 +0100)]
sfc: commonise MCDI MAC stats handling
Most of it was already declared in mcdi_port_common.h, so just move the
implementations to mcdi_port_common.c.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 30 Jun 2020 12:02:24 +0000 (13:02 +0100)]
sfc: move NIC-specific mcdi_port declarations out of common header
These functions are implemented in mcdi_port.c, which will not be linked
into the EF100 driver; thus their prototypes should not be visible in
common header files.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Jun 2020 20:05:10 +0000 (13:05 -0700)]
Merge branch 'Convert-Broadcom-SF2-to-mac_link_up-resolved-state'
Russell King says:
====================
Convert Broadcom SF2 to mac_link_up() resolved state
Convert Broadcom SF2 DSA support to use the newly provided resolved
link state via mac_link_up() rather than using the state in
mac_config().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 30 Jun 2020 10:28:18 +0000 (11:28 +0100)]
net: dsa/bcm_sf2: move pause mode setting into mac_link_up()
bcm_sf2 only appears to support pause modes on RGMII interfaces (the
enable bits are in the RGMII control register). Set up the pause modes
for RGMII connections.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 30 Jun 2020 10:28:13 +0000 (11:28 +0100)]
net: dsa/bcm_sf2: move speed/duplex forcing to mac_link_up()
Convert the bcm_sf2 to use the finalised speed and duplex in its
mac_link_up() call rather than the parameters in mac_config().
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 30 Jun 2020 10:28:08 +0000 (11:28 +0100)]
net: dsa/bcm_sf2: fix incorrect usage of state->link
state->link has never been valid in mac_config() implementations -
while it may be correct in some calls, it is not true that it can be
relied upon.
Fix bcm_sf2 to use the correct method of handling forced link status.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Jun 2020 20:03:27 +0000 (13:03 -0700)]
Merge branch 'Convert-Broadcom-B53-to-mac_link_up-resolved-state'
Russell King says:
====================
Convert Broadcom B53 to mac_link_up() resolved state
These two patches update the Broadcom B53 DSA support to use the newly
provided resolved link state via mac_link_up() rather than using the
state in mac_config().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 30 Jun 2020 10:25:06 +0000 (11:25 +0100)]
net: dsa/b53: use resolved link config in mac_link_up()
Convert the B53 driver to use the finalised link parameters in
mac_link_up() rather than the parameters in mac_config(). This is
just a matter of moving the call to b53_force_port_config().
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 30 Jun 2020 10:25:00 +0000 (11:25 +0100)]
net: dsa/b53: change b53_force_port_config() pause argument
Replace the b53_force_port_config() pause argument, which is based on
phylink's MLO_PAUSE_* definitions, with a pair of booleans. This
will allow us to move b53_force_port_config() from
b53_phylink_mac_config() to b53_phylink_mac_link_up().
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
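The change of argument type can be pictured with a small Python model. This is a sketch, not the driver code: the bit values given to MLO_PAUSE_RX and MLO_PAUSE_TX here are illustrative assumptions, and pause_flags_to_bools is a hypothetical helper.

```python
# Illustrative bit values only; the real definitions live in phylink.
MLO_PAUSE_RX = 1 << 0
MLO_PAUSE_TX = 1 << 1


def pause_flags_to_bools(pause):
    """Split an MLO_PAUSE_*-style bitmask into (tx_pause, rx_pause)."""
    return bool(pause & MLO_PAUSE_TX), bool(pause & MLO_PAUSE_RX)


def force_port_config(speed, duplex, tx_pause, rx_pause):
    """Model of a config helper that takes two booleans, not a bitmask."""
    return {
        "speed": speed,
        "full_duplex": duplex == "full",
        "tx_pause": tx_pause,
        "rx_pause": rx_pause,
    }


# A caller that still holds a bitmask converts it once at the boundary.
tx, rx = pause_flags_to_bools(MLO_PAUSE_TX)
cfg = force_port_config(1000, "full", tx, rx)
```

With booleans in the helper's signature, a caller that already has separate TX and RX pause values can pass them through without rebuilding a bitmask first.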
David S. Miller [Tue, 30 Jun 2020 19:59:15 +0000 (12:59 -0700)]
Merge tag 'batadv-next-for-davem-20200630' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- update mailing list URL, by Sven Eckelmann
- fix typos and grammar in documentation, by Sven Eckelmann
- introduce a configurable per interface hop penalty,
by Linus Luessing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 30 Jun 2020 04:43:13 +0000 (21:43 -0700)]
net: dsa: Improve subordinate PHY error message
It is not very informative to know the DSA master device when a
subordinate network device fails to get its PHY setup. Provide the
device name and capitalize PHY while we are at it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin [Tue, 30 Jun 2020 02:30:34 +0000 (10:30 +0800)]
hinic: remove unused but set variable
Remove an unused-but-set variable to avoid an auto build test warning.
Signed-off-by: Luo bin <luobin9@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Jun 2020 19:34:35 +0000 (12:34 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2020-06-29
This series contains updates to only the igc driver.
Sasha added Energy Efficient Ethernet (EEE) support and Latency Tolerance
Reporting (LTR) support for the igc driver. Added Low Power Idle (LPI)
counters and cleaned up unused TCP segmentation counters. Removed
igc_power_down_link() and call igc_power_down_phy_copper_base()
directly. Removed unneeded copper media check.
Andre cleaned up timestamping by removing unsupported features and
duplicate code for i225. Fixed the timestamp check to use the proper flag
instead of the skb for pending transmit timestamps. Refactored
igc_ptp_set_timestamp_mode() to simplify the flow.
v2: Removed the log message in patch 1 as suggested by David Miller.
Note: The locking issue Jakub Kicinski saw in patch 5 already exists
in the current net-next tree, so Andre will resolve it in a
follow-on patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sasha Neftin [Wed, 17 Jun 2020 12:01:31 +0000 (15:01 +0300)]
igc: Remove checking media type during MAC initialization
The i225 device supports only copper mode.
There is no point in checking the media type in the
igc_config_fc_after_link_up() method.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Wed, 10 Jun 2020 12:43:08 +0000 (15:43 +0300)]
igc: Remove unneeded check for copper media type
The PHY of the i225 device supports only copper mode.
There is no point in checking the media type in the
igc_power_up_link() method.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Mon, 8 Jun 2020 15:49:39 +0000 (18:49 +0300)]
igc: Refactor the igc_power_down_link()
Currently the igc_power_down_link() method just calls the
igc_power_down_phy_copper_base() method.
We can call igc_power_down_phy_copper_base() directly instead.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Sun, 7 Jun 2020 08:51:27 +0000 (11:51 +0300)]
igc: Remove TCP segmentation TX fail counter
The TCP segmentation TX context fail counter is not
applicable to i225 devices.
This patch removes the counter.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Thu, 4 Jun 2020 11:25:16 +0000 (14:25 +0300)]
igc: Add LPI counters
Add EEE TX LPI and EEE RX LPI counters. An EEE TX LPI event
occurs when the transmitter enters the EEE (IEEE 802.3az) LPI
state. An EEE RX LPI event occurs when the receiver detects
link partner entry into the EEE (IEEE 802.3az) LPI state.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Andre Guedes [Thu, 4 Jun 2020 00:01:05 +0000 (17:01 -0700)]
igc: Fix Rx timestamp disabling
When Rx timestamping is enabled, we set the timestamp bit in SRRCTL
register for each queue, but we don't clear it when disabling. This
patch fixes igc_ptp_disable_rx_timestamp() accordingly.
Also, this patch gets rid of igc_ptp_enable_tstamp_rxqueue() and
igc_ptp_enable_tstamp_all_rxqueues() and moves their logic into
igc_ptp_enable_rx_timestamp() to keep the enable and disable
helpers symmetric.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
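The symmetry between the enable and disable helpers can be sketched as a Python model of per-queue register values. This is not the driver code: the bit position used for SRRCTL_TIMESTAMP is an assumption made for illustration, and the function names are hypothetical.

```python
SRRCTL_TIMESTAMP = 1 << 30  # assumed bit position, for illustration only


def enable_rx_timestamp(srrctl_per_queue):
    """Set the timestamp bit in every queue's SRRCTL value."""
    return [val | SRRCTL_TIMESTAMP for val in srrctl_per_queue]


def disable_rx_timestamp(srrctl_per_queue):
    """Clear the timestamp bit again, leaving all other bits untouched."""
    return [val & ~SRRCTL_TIMESTAMP for val in srrctl_per_queue]


# Four queues with arbitrary unrelated bits already set.
initial = [0x0402, 0x0402, 0x0002, 0x0000]
enabled = enable_rx_timestamp(initial)
restored = disable_rx_timestamp(enabled)
```

In this model, a disable helper that skipped the clearing step would leave every queue with the bit still set, which is the asymmetry the patch describes.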
Andre Guedes [Thu, 4 Jun 2020 00:01:04 +0000 (17:01 -0700)]
igc: Refactor igc_ptp_set_timestamp_mode()
Current igc_ptp_set_timestamp_mode() logic is a bit tangled since it
handles many different hardware configurations in one single place,
making it harder to follow. This patch untangles that code by breaking
it into helper functions.
Quick note about the hw->mac.type check which was removed in this
refactoring: this check is not really needed since igc_i225 is the only
type supported by the IGC driver.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Andre Guedes [Thu, 4 Jun 2020 00:01:03 +0000 (17:01 -0700)]
igc: Remove UDP filter setup in PTP code
As implemented in igc_ethtool_get_ts_info(), igc only supports
HWTSTAMP_FILTER_ALL, so any HWTSTAMP_FILTER_* option the user may set falls
back to HWTSTAMP_FILTER_ALL.
HWTSTAMP_FILTER_ALL is implemented via Rx Time Sync Control (TSYNCRXCTL)
configuration which timestamps all incoming packets. Configuring a
UDP filter, in addition to TSYNCRXCTL, doesn't add much so this patch
removes that code. It also takes this opportunity to remove some
non-applicable comments.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Andre Guedes [Thu, 4 Jun 2020 00:01:02 +0000 (17:01 -0700)]
igc: Check __IGC_PTP_TX_IN_PROGRESS instead of ptp_tx_skb
The __IGC_PTP_TX_IN_PROGRESS flag indicates we have a pending Tx
timestamp. In some places, instead of checking that flag, we check
adapter->ptp_tx_skb. This patch fixes those places to use the flag.
Quick note about igc_ptp_tx_hwtstamp() change: when that function is
called, adapter->ptp_tx_skb is expected to be valid always so we
WARN_ON_ONCE() in case it is not.
Quick note about igc_ptp_suspend() change: when suspending, we don't
really need to check if there is a pending timestamp. We can simply
clear it unconditionally.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Andre Guedes [Thu, 4 Jun 2020 00:01:01 +0000 (17:01 -0700)]
igc: Remove duplicate code in Tx timestamp handling
The functions igc_ptp_tx_hang() and igc_ptp_tx_work() have duplicate
code which handles Tx timestamp timeouts. This patch does a trivial
refactoring by moving that code to its own function and reusing it.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Andre Guedes [Thu, 4 Jun 2020 00:01:00 +0000 (17:01 -0700)]
igc: Clean up Rx timestamping logic
Unlike I210, I225 doesn't report Rx timestamps via the TS bit in the
Rx descriptor plus the RXSTMPL/RXSTMPH registers. Rx timestamps are
reported in the packet buffer only, which is implemented by
igc_ptp_rx_pktstamp(). So this patch removes igc_ptp_rx_rgtstamp() and all
code related to it, which was copied from the igb driver.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Tue, 2 Jun 2020 07:50:47 +0000 (10:50 +0300)]
igc: Add initial LTR support
The LTR message on PCIe informs the downstream PCIe port of the
system of the requested latency within which PCIe must become
active.
This patch provides the LTR parameters recommended by the i225
specification.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Tue, 30 Jun 2020 00:45:02 +0000 (17:45 -0700)]
Merge branch 'Add-ethtool-extended-link-state'
Ido Schimmel says:
====================
Add ethtool extended link state
Amit says:
Currently, device drivers can only indicate to user space if the network
link is up or down, without additional information.
This patch set provides an infrastructure that allows these drivers to
expose more information to user space about the link state. The
information can save users' time when trying to understand why a link is
not operationally up, for example.
The above is achieved by extending the existing ethtool LINKSTATE_GET
command with attributes that carry the extended state.
For example, no link due to missing cable:
$ ethtool ethX
...
Link detected: no (No cable)
Beside the general extended state, drivers can pass additional
information about the link state using the sub-state field. For example:
$ ethtool ethX
...
Link detected: no (Autoneg, No partner detected)
In the future the infrastructure can be extended - for example - to
allow PHY drivers to report whether a downshift to a lower speed
occurred. Something like:
$ ethtool ethX
...
Link detected: yes (downshifted)
Patch set overview:
Patches #1-#3 move mlxsw ethtool code to a separate file
Patches #4-#5 add the ethtool infrastructure for extended link state
Patches #6-#7 add support of extended link state in the mlxsw driver
Patches #8-#10 add test cases
Changes since v1:
* In documentation, show ETHTOOL_LINK_EXT_STATE_* and
ETHTOOL_LINK_EXT_SUBSTATE_* constants instead of user-space strings
* Add `_CI_` to cable_issue substates to be consistent with
other substates
* Keep the commit messages within 75 columns
* Use u8 variable for __link_ext_substate
* Document the meaning of -ENODATA in get_link_ext_state() callback
description
* Do not zero data->link_ext_state_provided after getting an error
* Use `ret` variable for error value
Changes since RFC:
* Move documentation patch before ethtool patch
* Add nla_total_size() instead of sizeof() directly
* Return an error code from linkstate_get_ext_state()
* Remove SHORTED_CABLE, add CABLE_TEST_FAILURE instead
* Check if the interface is administratively up before setting ext_state
* Document all sub-states
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 29 Jun 2020 20:46:21 +0000 (23:46 +0300)]
selftests: forwarding: Add tests for ethtool extended state
Add tests to check what ethtool reports about the extended state.
The tests configure several states and verify that the correct extended
state is reported by ethtool.
Check extended state with substate (Autoneg) and extended state without
substate (No cable).
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 29 Jun 2020 20:46:20 +0000 (23:46 +0300)]
selftests: forwarding: forwarding.config.sample: Add port with no cable connected
Add a NETIF_NO_CABLE port to the tests topology.
The port can also be declared as an environment variable, and tests can be
run like this:
NETIF_NO_CABLE=eth9 ./test.sh eth{1..8}
The NETIF_NO_CABLE port will be used by ethtool_extended_state test.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 29 Jun 2020 20:46:19 +0000 (23:46 +0300)]
selftests: forwarding: ethtool: Move different_speeds_get() to ethtool_lib
Currently different_speeds_get() is used only by ethtool.sh tests.
The function can be useful for other tests that check ethtool
configurations.
Move the function to ethtool_lib in order to allow other tests to use
it.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 29 Jun 2020 20:46:18 +0000 (23:46 +0300)]
mlxsw: spectrum_ethtool: Add link extended state
Implement .get_link_ext_state() as part of ethtool_ops.
Query the link down reason from the PDDR register and convert it to an
ethtool link_ext_state.
If more information than the common link_ext_state is available,
also fill link_ext_substate with the appropriate value.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 29 Jun 2020 20:46:17 +0000 (23:46 +0300)]
mlxsw: reg: Port Diagnostics Database Register
The PDDR register enables reading the PHY debug database.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 29 Jun 2020 20:46:16 +0000 (23:46 +0300)]
ethtool: Add link extended state
Currently, drivers can only tell whether the link is up/down using
LINKSTATE_GET, but no additional information is given.
Add attributes to the LINKSTATE_GET command in order to allow drivers
to expose more information to the user in addition to the link state, to
ease the debug process, for example, the reason for a link down state.
Extended state consists of two attributes - link_ext_state and
link_ext_substate. The idea is to avoid 'vendor specific' states, in order
to prevent drivers from using a specific link_ext_state that could become a
common link_ext_state in the future.
The substates allow drivers to add more information to the common
link_ext_state. For example, a vendor can expose 'Autoneg' as link_ext_state
and add 'No partner detected during force mode' as link_ext_substate.
If a driver cannot pinpoint the extended state with the substate
accuracy, it is free to expose only the extended state and omit the
substate attribute.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
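The state/substate pairing described above, with the substate being optional, can be sketched in Python. The output format is modelled on the "Link detected" examples quoted in the cover letter; the function name and the way the strings are passed in are hypothetical, and the real ethtool user-space formatting may differ.

```python
def link_detected_line(link_up, ext_state=None, ext_substate=None):
    """Format a 'Link detected' line with optional extended state.

    A driver may report no extended state at all, only the extended
    state, or the extended state together with a substate. A substate
    without a state is ignored in this model.
    """
    text = "yes" if link_up else "no"
    if ext_state is not None:
        details = [ext_state]
        if ext_substate is not None:
            details.append(ext_substate)
        text += " (" + ", ".join(details) + ")"
    return "Link detected: " + text


print(link_detected_line(False, "No cable"))
print(link_detected_line(False, "Autoneg", "No partner detected"))
print(link_detected_line(True))
```

The first two calls reproduce the lines shown in the cover letter; the third shows the case where no extended state is available.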
Amit Cohen [Mon, 29 Jun 2020 20:46:15 +0000 (23:46 +0300)]
Documentation: networking: ethtool-netlink: Add link extended state
Add link extended state attributes.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 29 Jun 2020 20:46:14 +0000 (23:46 +0300)]
mlxsw: spectrum_ethtool: Move mlxsw_sp_port_type_speed_ops structs
Move mlxsw_sp1_port_type_speed_ops and mlxsw_sp2_port_type_speed_ops
with the relevant code from spectrum.c to spectrum_ethtool.c.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 29 Jun 2020 20:46:13 +0000 (23:46 +0300)]
mlxsw: Move ethtool_ops to spectrum_ethtool.c
Add spectrum_ethtool.c file for ethtool code.
Move ethtool_ops and the relevant code from spectrum.c to
spectrum_ethtool.c.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Mon, 29 Jun 2020 20:46:12 +0000 (23:46 +0300)]
mlxsw: spectrum_dcb: Rename mlxsw_sp_port_headroom_set()
mlxsw_sp_port_headroom_set() is defined twice - in spectrum.c and in
spectrum_dcb.c - with different arguments and different implementations
but the same name.
Rename mlxsw_sp_port_headroom_set() to mlxsw_sp_port_headroom_ets_set()
in order to allow using the second function in several files, and not
only as static function in spectrum.c.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sasha Neftin [Wed, 27 May 2020 20:51:32 +0000 (13:51 -0700)]
igc: Add initial EEE support
IEEE802.3az-2010 Energy Efficient Ethernet has been
approved as standard (September 2010) and the driver
can enable and disable it via ethtool.
Disable the feature by default on parts which support it.
Add enable/disable eee options.
tx-lpi, tx-timer and advertise are not supported yet.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Tue, 30 Jun 2020 00:42:48 +0000 (17:42 -0700)]
Merge branch 'dpaa2-eth-send-a-scatter-gather-FD-instead-of-realloc-ing'
Ioana Ciornei says:
====================
dpaa2-eth: send a scatter-gather FD instead of realloc-ing
This patch set changes the behaviour in case the Tx path is confronted
with an SKB with insufficient headroom for our hardware necessities (SW
annotation area). In the first patch, instead of realloc-ing the SKB we
now send an S/G frame descriptor, while the second one adds a new
software-held counter to account for these types of frames.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Mon, 29 Jun 2020 18:47:12 +0000 (21:47 +0300)]
dpaa2-eth: add software counter for Tx frames converted to S/G
With the previous commit, in case of insufficient SKB headroom on the Tx
path, instead of reallocating the SKB we now send an S/G frame descriptor.
Export the number of occurrences of this case as a per-CPU counter (in
debugfs) and as a total in the ethtool statistics - "tx converted sg
frames".
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Mon, 29 Jun 2020 18:47:11 +0000 (21:47 +0300)]
dpaa2-eth: send a scatter-gather FD instead of realloc-ing
Instead of realloc-ing the skb on the Tx path when the provided headroom
is smaller than the HW requirements, create a Scatter/Gather frame
descriptor with only one entry.
Remove the '[drv] tx realloc frames' counter exposed previously through
ethtool since it is no longer used.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
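The decision described above can be modelled roughly as follows. This is a Python sketch that assumes a linear skb and looks only at headroom; the function name, the dictionary layout and the converted_sg counter are hypothetical and do not mirror the driver's data structures.

```python
def build_tx_fd(skb_headroom, needed_headroom, stats):
    """Pick a frame descriptor format based on the available headroom.

    With enough headroom the buffer is sent as a single-buffer FD.
    Otherwise, instead of reallocating the skb, a scatter-gather FD
    with exactly one entry is used and the conversion is counted.
    """
    if skb_headroom >= needed_headroom:
        return {"format": "single", "sg_entries": 0}
    stats["converted_sg"] += 1
    return {"format": "sg", "sg_entries": 1}


stats = {"converted_sg": 0}
fd_ok = build_tx_fd(skb_headroom=128, needed_headroom=64, stats=stats)
fd_short = build_tx_fd(skb_headroom=16, needed_headroom=64, stats=stats)
```

The second call takes the scatter-gather path and bumps the counter, which is the case the follow-up patch exports through debugfs and ethtool.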
David S. Miller [Tue, 30 Jun 2020 00:37:49 +0000 (17:37 -0700)]
Merge branch 'sfc-prerequisites-for-EF100-driver-part-1'
Edward Cree says:
====================
sfc: prerequisites for EF100 driver, part 1
This continues the work started by Alex Maftei <amaftei@solarflare.com>
in the series "sfc: code refactoring", "sfc: more code refactoring",
"sfc: even more code refactoring" and "sfc: refactor mcdi filtering
code", to prepare for a new driver which will share much of the code
to support the new EF100 family of Solarflare/Xilinx NICs.
After this series, there will be approximately two more of these
'prerequisites' series, followed by the sfc_ef100 driver itself.
v2: fix reverse xmas tree in patch 5. (Left the cases in patches 7,
9 and 14 alone as those are all in pure movement of existing code.)
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:36:56 +0000 (14:36 +0100)]
sfc: extend common GRO interface to support CHECKSUM_COMPLETE
EF100 will use CHECKSUM_COMPLETE, but will also make use of
efx_rx_packet_gro(), thus needs to be able to pass the checksum value
into that function.
Drivers for older NICs pass in a csum of 0 to get the old semantics (use
the RX flags for CHECKSUM_UNNECESSARY marking).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:36:33 +0000 (14:36 +0100)]
sfc: commonise ARFS handling
EF100 will use the same approach to ARFS as EF10.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:39:32 +0000 (14:39 +0100)]
sfc: commonise drain event handling
Avoids a call from generic MCDI code into ef10.c.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:35:41 +0000 (14:35 +0100)]
sfc: commonise PCI error handlers
EF100 will use the same mechanisms for PCI error recovery.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:35:33 +0000 (14:35 +0100)]
sfc: track which BAR is mapped
EF100 needs to map multiple BARs (sequentially, not concurrently) in
order to read the Function Control Window during probe.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:35:25 +0000 (14:35 +0100)]
sfc: commonise FC advertising
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:35:15 +0000 (14:35 +0100)]
sfc: commonise other ethtool bits
A few more ethtool handlers which EF100 will share.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:35:05 +0000 (14:35 +0100)]
sfc: commonise ethtool NFC and RXFH/RSS functions
EF100 will share EF10's model of filtering, hashing and spreading.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:34:50 +0000 (14:34 +0100)]
sfc: commonise ethtool link handling functions
Link speeds, FEC, and autonegotiation are all things EF100 will share.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:34:39 +0000 (14:34 +0100)]
sfc: split up nic.h
The new nic_common.h contains the inlines for NIC-type function dispatch,
declarations for NIC-generic functions in nic.c, and other similar
NIC-generic functionality. Retained in nic.h are NIC-specific declarations
such as the siena and ef10 nic_data structs and various farch functions.
The EF100 driver will thus include nic_common.h but not nic.h.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:34:20 +0000 (14:34 +0100)]
sfc: refactor EF10 stats handling
Separate the generation-count handling from the format conversion, to
make it easier to re-use both for EF100.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:33:44 +0000 (14:33 +0100)]
sfc: don't try to create more channels than we can have VIs
Calculate efx->max_vis at probe time, and check against it in
efx_allocate_msix_channels() when considering whether to create XDP TX
channels.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:33:03 +0000 (14:33 +0100)]
sfc: extend bitfield macros up to POPULATE_DWORD_13
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:32:46 +0000 (14:32 +0100)]
sfc: determine flag word automatically in efx_has_cap()
Now that we have an _OFST definition for each individual flag bit,
callers of efx_has_cap() don't need to specify which flag word it's
in; we can just use the flag name directly in MCDI_CAPABILITY_OFST.
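As a rough illustration of the idea, the sketch below derives both the bit position and the containing flag word from the flag name by token pasting. All names in it (CAP_OUT_*, check_cap(), has_cap()) are hypothetical stand-ins and not the driver's actual definitions; it assumes each flag has an _LBN (bit number within a 32-bit word) and an _OFST (byte offset of that word).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generated definitions: bit number (_LBN) within a 32-bit
 * flag word, and byte offset (_OFST) of the word that contains the flag.
 */
#define CAP_OUT_TX_TSO_OFST	8
#define CAP_OUT_TX_TSO_LBN	3
#define CAP_OUT_RX_VXLAN_OFST	12
#define CAP_OUT_RX_VXLAN_LBN	17

/* Hypothetical byte offset of the first flag word in the response. */
#define CAP_WORD_BASE_OFST	8

struct caps {
	uint32_t words[2];
};

/* Test one capability bit, given its bit number and flag-word offset. */
static int check_cap(const struct caps *c, unsigned int lbn, size_t ofst)
{
	size_t idx = (ofst - CAP_WORD_BASE_OFST) / sizeof(uint32_t);

	assert(idx < 2 && lbn < 32);
	return (c->words[idx] >> lbn) & 1u;
}

/* The caller names only the flag; the word is found through _OFST. */
#define has_cap(c, field) \
	check_cap((c), CAP_OUT_##field##_LBN, CAP_OUT_##field##_OFST)
```

With this shape, a caller writes has_cap(&caps, TX_TSO) and never has to know which flag word holds the bit.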
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 29 Jun 2020 13:32:31 +0000 (14:32 +0100)]
sfc: update MCDI protocol headers
The script used to generate these now includes _OFST definitions for
flags, to identify the containing flag word.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Po Liu [Mon, 29 Jun 2020 06:54:16 +0000 (14:54 +0800)]
net:qos: police action offloading parameter 'burst' change to the original value
Since 'tcfp_burst' is stored with a TICK factor applied, drivers always
need to convert it back to the original value. This patch moves that
generic calculation into the core, so that 'burst' is restored to its
original value before offloading to the device driver.
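The kind of conversion involved can be sketched as follows. This is a userspace approximation, assuming the stored burst is a duration in nanoseconds (the time needed to send the burst at the configured rate) and the rate is in bytes per second; the helper name burst_ns_to_bytes() is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Convert a burst expressed as a duration back into bytes.
 * burst_ns:      time in ns needed to send the burst at the given rate
 * rate_bytes_ps: policing rate in bytes per second
 */
static uint64_t burst_ns_to_bytes(uint64_t burst_ns, uint64_t rate_bytes_ps)
{
	/* Guard the multiplication against 64-bit overflow. */
	assert(rate_bytes_ps == 0 || burst_ns <= UINT64_MAX / rate_bytes_ps);

	return burst_ns * rate_bytes_ps / NSEC_PER_SEC;
}
```

For example, a burst lasting 1 ms at 125000000 bytes/s (1 Gbit/s) corresponds to 125000 bytes.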
Signed-off-by: Po Liu <po.liu@nxp.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Jun 2020 00:29:38 +0000 (17:29 -0700)]
Merge branch 'MPTCP-improve-fallback-to-TCP'
Davide Caratti says:
====================
MPTCP: improve fallback to TCP
there are situations where MPTCP sockets should fall-back to regular TCP:
this series reworks the fallback code to pursue the following goals:
1) cleanup the non fallback code, removing most of 'if (<fallback>)' in
the data path
2) improve performance for non-fallback sockets, avoiding locks in poll()
further work will also leverage these changes to achieve:
a) more consistent behavior of getsockopt()/setsockopt() on passive sockets
after fallback
b) support for "infinite maps" as per RFC8684, section 3.7
the series is made of the following items:
- patch 1 lets sendmsg() / recvmsg() / poll() use the main socket also
after fallback
- patch 2 fixes 'simultaneous connect' scenario after fallback. The
problem was present also before the rework, but the fix is much easier
to implement after patch 1
- patches 3, 4, 5 are clean-ups for code that is no longer needed after the
fallback rework
- patch 6 fixes a race condition between close() and poll(). The problem
was theoretically present before the rework, but it became almost
systematic after patch 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 29 Jun 2020 20:26:25 +0000 (22:26 +0200)]
mptcp: close poll() races
mptcp_poll always returns POLLOUT for unblocking
connect(); ensure that the socket is in a suitable
state.
The MPTCP_DATA_READY bit is never cleared on accept:
ensure we don't leave mptcp_accept() with an empty
accept queue and such bit set.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 29 Jun 2020 20:26:24 +0000 (22:26 +0200)]
mptcp: __mptcp_tcp_fallback() returns a struct sock
Currently __mptcp_tcp_fallback() always returns NULL
on incoming connections, because MPTCP does not create
the additional socket for the first subflow.
Since the previous commit, no __mptcp_tcp_fallback()
caller needs a struct socket, so let __mptcp_tcp_fallback()
return the first subflow sock and cope correctly even with
incoming connections.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 29 Jun 2020 20:26:23 +0000 (22:26 +0200)]
mptcp: create first subflow at msk creation time
This cleans the code a bit and makes the behavior more consistent.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 29 Jun 2020 20:26:22 +0000 (22:26 +0200)]
mptcp: check for plain TCP sock at accept time
This cleans up the code a bit and avoids corrupted states
on weird syscall sequences (accept(), connect()).
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Mon, 29 Jun 2020 20:26:21 +0000 (22:26 +0200)]
mptcp: fallback in case of simultaneous connect
When an MPTCP client tries to connect to itself, tcp_finish_connect() is
never reached. Because of this, depending on the socket current state,
multiple faulty behaviours can be observed:
1) a WARN_ON() in subflow_data_ready() is hit
WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
[...]
CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
[...]
RIP: 0010:subflow_data_ready+0x18b/0x230
[...]
Call Trace:
tcp_data_queue+0xd2f/0x4250
tcp_rcv_state_process+0xb1c/0x49d3
tcp_v4_do_rcv+0x2bc/0x790
__release_sock+0x153/0x2d0
release_sock+0x4f/0x170
mptcp_shutdown+0x167/0x4e0
__sys_shutdown+0xe6/0x180
__x64_sys_shutdown+0x50/0x70
do_syscall_64+0x9a/0x370
entry_SYSCALL_64_after_hwframe+0x44/0xa9
2) client is stuck forever in mptcp_sendmsg() because the socket is not
TCP_ESTABLISHED
crash> bt 4847
PID: 4847 TASK: ffff88814b2fb100 CPU: 1 COMMAND: "gh35"
#0 [ffff8881376ff680] __schedule at ffffffff97248da4
#1 [ffff8881376ff778] schedule at ffffffff9724a34f
#2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
#3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
#4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
#5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
#6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
#7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
#8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
#9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
#10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
#11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
#12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
RIP: 00007f126f6956ed RSP: 00007ffc2a320278 RFLAGS: 00000217
RAX: ffffffffffffffda RBX: 0000000020000044 RCX: 00007f126f6956ed
RDX: 0000000000000004 RSI: 00000000004007b8 RDI: 0000000000000003
RBP: 00007ffc2a3202a0 R8: 0000000000400720 R9: 0000000000400720
R10: 0000000000400720 R11: 0000000000000217 R12: 00000000004004b0
R13: 00007ffc2a320380 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
3) tcpdump captures show that DSS is exchanged even when MP_CAPABLE handshake
didn't complete.
$ tcpdump -tnnr bad.pcap
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0
Force a fallback to TCP in these cases, and adjust the main socket
state to avoid hanging in mptcp_sendmsg().
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/35
Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Mon, 29 Jun 2020 20:26:20 +0000 (22:26 +0200)]
net: mptcp: improve fallback to TCP
Keep using MPTCP sockets and use a "dummy mapping" in case of fallback
to regular TCP. When fallback is triggered, skip addition of the MPTCP
option on send.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/11
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/22
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Baruch Siach [Sun, 28 Jun 2020 07:04:51 +0000 (10:04 +0300)]
net: phy: marvell10g: support XFI rate matching mode
When the hardware MACTYPE configuration pins are set to "XFI with Rate
Matching", the PHY interface operates at a fixed 10Gbps speed. The
MAC buffers packets in both directions to match various wire speeds.
Read the MAC Type field in the Port Control register, and set the MAC
interface speed accordingly.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Jun 2020 00:18:40 +0000 (17:18 -0700)]
Merge tag 'mlx5-tls-2020-06-26' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-tls-2020-06-26
1) Improve hardware layouts and structure for kTLS support
2) Generalize ICOSQ (Internal Channel Operations Send Queue)
Due to the asynchronous nature of adding new kTLS flows and handling
HW asynchronous kTLS resync requests, the XSK ICOSQ was extended to
support generic async operations, such as kTLS add flow and resync, in
addition to the existing XSK usages.
3) kTLS hardware flow steering and classification:
The driver already has the means to classify TCP IPv4/6 flows and send them
to the corresponding RSS HW engine. As reflected in patches 3 through 5,
the series adds a steering layer that hooks into the driver's TCP
classifiers and matches on well-known kTLS connections. In case of a
match, traffic is redirected to the kTLS decryption engine; otherwise,
traffic continues flowing normally to the TCP RSS engine.
4) kTLS add flow RX HW offload support
New offload contexts post their static/progress params WQEs
(Work Queue Element) to communicate the newly added kTLS contexts
over the per-channel async ICOSQ.
The Channel/RQ is selected according to the socket's rxq index.
A new TLS-RX workqueue is used to allow asynchronous addition of
steering rules, out of the NAPI context.
It will be also used in a downstream patch in the resync procedure.
Feature is OFF by default. Can be turned on by:
$ ethtool -K <if> tls-hw-rx-offload on
5) Added mlx5 kTLS SW stats; the new counters are documented in
Documentation/networking/tls-offload.rst
rx_tls_ctx - number of TLS RX HW offload contexts added to device for
decryption.
rx_tls_ooo - number of RX packets which were part of a TLS stream
but did not arrive in the expected order and triggered the resync
procedure.
rx_tls_del - number of TLS RX HW offload contexts deleted from device
(connection has finished).
rx_tls_err - number of RX packets which were part of a TLS stream
but were not decrypted due to unexpected error in the state machine.
6) Asynchronous RX resync
a. The NIC driver indicates that it would like to resync on some TLS
record within the received packet (P), but the driver does not
know (yet) which of the TLS records within the packet.
At this stage, the NIC driver will query the device to find the exact
TCP sequence for resync (tcpsn), however, the driver does not wait
for the device to provide the response.
b. Eventually, the device responds, and the driver provides the tcpsn
within the resync packet to KTLS. Now, KTLS can check the tcpsn against
any processed TLS records within packet P, and also against any record
that is processed in the future within packet P.
The asynchronous resync path simplifies the device driver, as it can
save bits on the packet completion (32-bit TCP sequence), and pass this
information on an asynchronous command instead.
Performance:
CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 24 cores, HT off
NIC: ConnectX-6 Dx 100GbE dual port
Goodput (app-layer throughput) comparison:
+---------------+-------+-------+---------+
| # connections | 1 | 4 | 8 |
+---------------+-------+-------+---------+
| SW (Gbps) | 7.26 | 24.70 | 50.30 |
+---------------+-------+-------+---------+
| HW (Gbps) | 18.50 | 64.30 | 92.90 |
+---------------+-------+-------+---------+
| Speedup | 2.55x | 2.56x | 1.85x * |
+---------------+-------+-------+---------+
* After linerate is reached, diff is observed in CPU util
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Jun 2020 00:08:28 +0000 (17:08 -0700)]
Merge branch 'TC-Introduce-qevents'
Petr Machata says:
====================
TC: Introduce qevents
The Spectrum hardware allows execution of one of several actions as a
result of queue management decisions: tail-dropping, early-dropping,
marking a packet, or passing a configured latency threshold or buffer
size. Such packets can be mirrored, trapped, or sampled.
Modeling the action to be taken as simply a TC action is very attractive,
but it is not obvious where to put these actions. At least with ECN marking
one could imagine a tree of qdiscs and classifiers that effectively
accomplishes this task, albeit in an impractically complex manner. But
there is just no way to match on dropped-ness of a packet, let alone
dropped-ness due to a particular reason.
To allow configuring user-defined actions as a result of inner workings of
a qdisc, this patch set introduces a concept of qevents. Those are attach
points for TC blocks, where filters can be put that are executed as the
packet hits well-defined points in the qdisc algorithms. The attached
blocks can be shared, in a manner similar to clsact ingress and egress
blocks, arbitrary classifiers with arbitrary actions can be put on them,
etc.
For example:
red limit 500K avpkt 1K qevent early_drop block 10
matchall action mirred egress mirror dev eth1
The central patch #2 introduces several helpers to allow easy and uniform
addition of qevents to qdiscs: initialization, destruction, qevent block
number change validation, and qevent handling, i.e. dispatch of the filters
attached to the block bound to a qevent.
Patch #1 adds root_lock argument to qdisc enqueue op. The problem this is
tackling is that if a qevent filter pushes packets to the same qdisc tree
that holds the qevent in the first place, attempt to take qdisc root lock
for the second time will lead to a deadlock. To solve the issue, qevent
handler needs to unlock and relock the root lock around the filter
processing. Passing root_lock around makes it possible to get the lock
where it is needed, and visibly so, such that it is obvious the lock will
be used when invoking a qevent.
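The unlock/relock pattern described above can be modelled in userspace roughly as follows. This is only a toy sketch: the lock is a plain flag that asserts on double acquisition (standing in for the deadlock), and every name in it (toy_lock, toy_qdisc, toy_qevent_handle(), mirror_filter()) is hypothetical rather than taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: acquiring it twice trips an assertion, which
 * models the deadlock on the real qdisc root lock.
 */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_qdisc {
	struct toy_lock *root_lock;
	int backlog;
};

typedef void (*toy_filter_fn)(struct toy_qdisc *q);

/* Enqueue must run with the root lock held. */
static void toy_enqueue_locked(struct toy_qdisc *q)
{
	assert(q->root_lock->held);
	q->backlog++;
}

/* A filter that pushes a packet back into the same qdisc tree, so it
 * needs to take the root lock itself.
 */
static void mirror_filter(struct toy_qdisc *q)
{
	toy_lock_acquire(q->root_lock);
	toy_enqueue_locked(q);
	toy_lock_release(q->root_lock);
}

/* Called from enqueue with root_lock held: drop the lock around filter
 * processing, then retake it before returning to the qdisc code.
 */
static void toy_qevent_handle(struct toy_qdisc *q, struct toy_lock *root_lock,
			      toy_filter_fn filter)
{
	assert(root_lock->held);
	toy_lock_release(root_lock);
	filter(q);
	toy_lock_acquire(root_lock);
}
```

Passing root_lock explicitly into the handler mirrors the point made above: the lock dependency is visible at every call site instead of being looked up behind the caller's back.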
The following two patches, #3 and #4, then add two qevents to the RED
qdisc: "early_drop" qevent fires when a packet is early-dropped; "mark"
qevent, when it is ECN-marked.
Patch #5 contains a selftest. I have mentioned this test when pushing the
RED ECN nodrop mode and said that "I have no confidence in its portability
to [...] different configurations". That still holds. The backlog and
packet size are tuned to make the test deterministic. But it is better than
nothing, and on the boxes that I ran it on it does work and shows that
qevents work the way they are supposed to, and that their addition has not
broken the other tested features.
This patch set does not deal with offloading. The idea there is that a
driver will be able to figure out that a given block is used in qevent
context by looking at binder type. A future patch-set will add a qdisc
pointer to struct flow_block_offload, which a driver will be able to
consult to glean the TC or other relevant attributes.
Changes from RFC to v1:
- Move a "q = qdisc_priv(sch)" from patch #3 to patch #4
- Fix deadlock caused by mirroring packet back to the same qdisc tree.
- Rename "tail" qevent to "tail_drop".
- Adapt to the new 100-column standard.
- Add a selftest
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 26 Jun 2020 22:45:29 +0000 (01:45 +0300)]
selftests: forwarding: Add a RED test for SW datapath
This test is inspired by the mlxsw RED selftest. It is much simpler to set
up (also because there is no point in testing PRIO / RED encapsulation). It
tests bare RED, ECN and ECN+nodrop modes of operation. On top of that it
tests RED early_drop and mark qevents.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 26 Jun 2020 22:45:28 +0000 (01:45 +0300)]
net: sched: sch_red: Add qevents "early_drop" and "mark"
In order to allow acting on dropped and/or ECN-marked packets, add two new
qevents to the RED qdisc: "early_drop" and "mark". Filters attached at
"early_drop" block are executed as packets are early-dropped, those
attached at the "mark" block are executed as packets are ECN-marked.
Two new attributes are introduced: TCA_RED_EARLY_DROP_BLOCK with the block
index for the "early_drop" qevent, and TCA_RED_MARK_BLOCK for the "mark"
qevent. Absence of these attributes signifies "don't care": no block is
allocated in that case, or the existing blocks are left intact in case of
the change callback.
For purposes of offloading, blocks attached to these qevents appear with
newly-introduced binder types, FLOW_BLOCK_BINDER_TYPE_RED_EARLY_DROP and
FLOW_BLOCK_BINDER_TYPE_RED_MARK.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 26 Jun 2020 22:45:27 +0000 (01:45 +0300)]
net: sched: sch_red: Split init and change callbacks
In the following patches, RED will get two qevents. The implementation will
be clearer if the callback for change is not a pure subset of the callback
for init. Split the two and promote attribute parsing to the callbacks
themselves from the common code, because it will be handy there.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 26 Jun 2020 22:45:26 +0000 (01:45 +0300)]
net: sched: Introduce helpers for qevent blocks
Qevents are attach points for TC blocks, where filters can be put that are
executed when "interesting events" take place in a qdisc. The data to keep
and the functions to invoke to maintain a qevent will be largely the same
between qevents. Therefore introduce sched-wide helpers for qevent
management.
Currently, similarly to ingress and egress blocks of the clsact pseudo-qdisc,
block attachment cannot be changed after the qdisc is created. To that
end, add a helper tcf_qevent_validate_change(), which verifies that the
block index attribute is either absent or, if present, that its value
matches the current one (i.e. there is no material change).
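The validation rule reads roughly like the sketch below. It is a simplified userspace model, not the kernel helper: the struct, the function name toy_qevent_validate_change() and the -1 error return are assumptions made for illustration, and a NULL pointer stands in for an absent netlink attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-qevent state: only the bound block index matters here. */
struct toy_qevent {
	uint32_t block_index;
};

/* Returns 0 if the change request is acceptable, -1 otherwise.
 * new_index == NULL models "attribute not present".
 */
static int toy_qevent_validate_change(const struct toy_qevent *qe,
				      const uint32_t *new_index)
{
	assert(qe != NULL);

	if (new_index == NULL)
		return 0;	/* "don't care": keep the existing block */
	if (*new_index == qe->block_index)
		return 0;	/* same block: no material change */
	return -1;		/* rebinding to another block is refused */
}
```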
The function tcf_qevent_handle() should be invoked when qdisc hits the
"interesting event" corresponding to a block. This function releases root
lock for the duration of executing the attached filters, to allow packets
generated through user actions (notably mirred) to be reinserted to the
same qdisc tree.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 26 Jun 2020 22:45:25 +0000 (01:45 +0300)]
net: sched: Pass root lock to Qdisc_ops.enqueue
A following patch introduces qevents, points in the qdisc algorithm where
a packet can be processed by user-defined filters. Should this processing
lead to a situation where a new packet is to be enqueued on the same port,
holding the root lock would lead to deadlocks. To solve the issue, the qevent
handler needs to unlock and relock the root lock when necessary.
To that end, add the root lock argument to the qdisc op enqueue, and
propagate throughout.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Jun 2020 00:06:19 +0000 (17:06 -0700)]
Merge branch 'net-ethernet-ti-am65-cpsw-update-and-enable-sr2-0-soc'
Grygorii Strashko says:
====================
net: ethernet: ti: am65-cpsw: update and enable sr2.0 soc
This series contains set of improvements for TI AM654x/J721E CPSW2G driver and
adds support for TI AM654x SR2.0 SoC.
Patch 1: adds VLAN restoration after "if down/up"
Patches 2-5: improvements
Patch 6: adds support for the TI AM654x SR2.0 SoC, which allows disabling the errata i2027 W/A.
By default, errata i2027 W/A (TX csum offload disabled) is enabled on AM654x SoC
for backward compatibility, unless SR2.0 SoC is identified using SOC BUS framework.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 26 Jun 2020 18:17:09 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-nuss: enable am65x sr2.0 support
The AM65x SR2.0 MCU CPSW has fixed errata i2027 "CPSW: CPSW Does Not
Support CPPI Receive Checksum (Host to Ethernet) Offload Feature". This
erratum is also fixed for the J721E SoC.
Use SOC bus data for K3 SoC identification and apply i2027 errata w/a only
for the AM65x SR1.0 SoC.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 26 Jun 2020 18:17:08 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-ethtool: configured critical setting only when no running netdevs
Ensure that critical settings can only be configured when there are no
running netdevs, i.e. all ports are down.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 26 Jun 2020 18:17:07 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-ethtool: skip hw cfg when change p0-rx-ptype-rrobin
Skip HW configuration when p0-rx-ptype-rrobin is changed as it will be done
by .ndev_open().
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 26 Jun 2020 18:17:06 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-nuss: fix ports mac sl initialization
The MAC SL has to be initialized for each port, otherwise
am65_cpsw_nuss_slave_disable_unused() will crash for disabled ports.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 26 Jun 2020 18:17:05 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw: move to pf_p0_rx_ptype_rrobin init in probe
The pf_p0_rx_ptype_rrobin is a global parameter, so move its initialization
to probe.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 26 Jun 2020 18:17:04 +0000 (21:17 +0300)]
net: ethernet: ti: am65-cpsw-nuss: restore vlan configuration while down/up
The vlan configuration is not restored after interface down/up sequence.
Steps to check:
# ip link add link eth0 name eth0.100 type vlan id 100
# ifconfig eth0 down
# ifconfig eth0 up
This patch fixes it, restoring vlan ALE entries on .ndo_open().
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Sun, 28 Jun 2020 10:14:13 +0000 (18:14 +0800)]
liquidio: use list_empty_careful in lio_list_delete_head
Use list_empty_careful() instead of open-coding.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geliang Tang [Sun, 28 Jun 2020 09:32:25 +0000 (17:32 +0800)]
sctp: use list_is_singular in sctp_list_single_entry
Use list_is_singular() instead of open-coding.
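For reference, the check that such a helper performs can be sketched in userspace like this. The struct mirrors the kernel's circular doubly-linked list layout, but the function names here are local stand-ins chosen for the example, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly-linked list node; an empty list head points at itself. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Link node n in just before head h, i.e. at the tail of the list. */
static void list_add_tail_node(struct list_head *n, struct list_head *h)
{
	assert(n != h);
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Exactly one entry: the list is not empty, and the first and last
 * entries are the same node.
 */
static bool list_has_single_entry(const struct list_head *h)
{
	return !list_is_empty(h) && h->next == h->prev;
}
```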
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Armin Wolf [Sat, 27 Jun 2020 22:07:47 +0000 (00:07 +0200)]
8390: Fix coding-style issues
Fix some coding-style issues, including one which
made the function pointers in the struct ei_device
hard to understand.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 27 Jun 2020 12:03:06 +0000 (15:03 +0300)]
net: mscc: ocelot: remove EXPORT_SYMBOL from ocelot_net.c
Now that all net_device operations are bundled together inside
mscc_ocelot.ko and no longer part of the common library, there's no
reason to export these symbols.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Jun 2020 03:56:38 +0000 (20:56 -0700)]
Merge branch 'r8169-make-RTL8401-a-separate-chip-version'
Heiner Kallweit says:
====================
r8169: make RTL8401 a separate chip version
So far RTL8401 was treated like a RTL8101e, meaning we relied on the BIOS
to configure MAC and PHY properly. Make RTL8401 a separate chip version
and copy MAC / PHY config from r8101 vendor driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 28 Jun 2020 21:17:07 +0000 (23:17 +0200)]
r8169: sync support for RTL8401 with vendor driver
So far RTL8401 was treated like a RTL8101e, meaning we relied on the BIOS
to configure MAC and PHY properly. Make RTL8401 a separate chip version
and copy MAC / PHY config from r8101 vendor driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sun, 28 Jun 2020 21:15:45 +0000 (23:15 +0200)]
r8169: merge handling of RTL8101e and RTL8100e
Chip versions 13, 14, 15 are treated the same by the driver, therefore
let's merge them.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 29 Jun 2020 03:52:53 +0000 (20:52 -0700)]
Merge branch 'netdev_tx_t'
Luc Van Oostenryck says:
====================
net: always use netdev_tx_t for xmit()'s return type
The ndo_start_xmit() methods should return a 'netdev_tx_t', not
an int, and so should return NETDEV_TX_OK, not 0.
The patches in the series fix most of the remaining drivers and
subsystems (those included in allyesconfig on x86).
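The shape of the fix can be illustrated with a small userspace stand-in. The enum below copies the two kernel return codes relevant here so the example compiles on its own; struct toy_ring and toy_start_xmit() are hypothetical and do not correspond to any driver touched by the series.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's transmit return type. */
typedef enum netdev_tx {
	NETDEV_TX_OK	= 0x00,	/* driver took care of the packet */
	NETDEV_TX_BUSY	= 0x10,	/* driver could not queue the packet */
} netdev_tx_t;

/* Hypothetical TX ring with a count of free descriptors. */
struct toy_ring {
	int free_slots;
};

/* Declared as netdev_tx_t rather than int, and returning the named
 * constants rather than a bare 0.
 */
static netdev_tx_t toy_start_xmit(struct toy_ring *ring)
{
	assert(ring->free_slots >= 0);

	if (ring->free_slots == 0)
		return NETDEV_TX_BUSY;

	ring->free_slots--;
	return NETDEV_TX_OK;
}
```

NETDEV_TX_OK has the value 0, so the generated code is the same as with "return 0"; the gain is type checking by tools such as sparse and a return value that documents itself.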
====================
Signed-off-by: David S. Miller <davem@davemloft.net>