platform/kernel/linux-rpi.git
18 months agoocteontx2-af: restore rxc conf after teardown sequence
Nithin Dabilpuram [Wed, 18 Jan 2023 12:03:52 +0000 (17:33 +0530)]
octeontx2-af: restore rxc conf after teardown sequence

CN10K CPT coprocessor includes a component named RXC which
is responsible for reassembly of inner IP packets. RXC has
the feature to evict oldest entries based on age/threshold.
The age/threshold is being set to minimum values to evict
all entries at the time of teardown.
This patch adds code to restore timeout and threshold config
after teardown sequence is complete as it is global config.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoocteontx2-af: optimize cpt pf identification
Srujana Challa [Wed, 18 Jan 2023 12:03:51 +0000 (17:33 +0530)]
octeontx2-af: optimize cpt pf identification

Optimize CPT PF identification in mbox handling for faster
mbox response by doing it at AF driver probe instead of
every mbox message.

Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoocteontx2-af: modify FLR sequence for CPT
Srujana Challa [Wed, 18 Jan 2023 12:03:50 +0000 (17:33 +0530)]
octeontx2-af: modify FLR sequence for CPT

On OcteonTX2 platform CPT instruction enqueue is only
possible via LMTST operations.
The existing FLR sequence mentioned in HRM requires
a dummy LMTST to CPT but LMTST can't be submitted from
AF driver. So, HW team provided a new sequence to avoid
dummy LMTST. This patch adds code for the same.

Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoocteontx2-af: add mbox for CPT LF reset
Srujana Challa [Wed, 18 Jan 2023 12:03:49 +0000 (17:33 +0530)]
octeontx2-af: add mbox for CPT LF reset

On OcteonTX2 SoC, the admin function (AF) is the only one with all
priviliges to configure HW and alloc resources, PFs and it's VFs
have to request AF via mailbox for all their needs.
This patch adds a new mailbox for CPT VFs to request for CPT LF
reset.

Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoocteontx2-af: recover CPT engine when it gets fault
Srujana Challa [Wed, 18 Jan 2023 12:03:48 +0000 (17:33 +0530)]
octeontx2-af: recover CPT engine when it gets fault

When CPT engine has uncorrectable errors, it will get halted and
must be disabled and re-enabled. This patch adds code for the same.

Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoice: use GNSS subsystem instead of TTY
Arkadiusz Kubalewski [Thu, 19 Jan 2023 00:58:36 +0000 (16:58 -0800)]
ice: use GNSS subsystem instead of TTY

Previously support for GNSS was implemented as a TTY driver, it allowed
to access GNSS receiver on /dev/ttyGNSS_<bus><func>.

Use generic GNSS subsystem API instead of implementing own TTY driver.
The receiver is accessible on /dev/gnss<id>. In case of multiple receivers
in the OS, correct device can be found by enumerating either:
- /sys/class/net/<eth port>/device/gnss/
- /sys/class/gnss/gnss<id>/device/

Using GNSS subsystem is superior to implementing own TTY driver, as the
GNSS subsystem was designed solely for this purpose. It also implements
TTY driver but in a common and defined way.

From user perspective, there is no difference in communicating with a
device, except new path to the device shall be used. The device will
provide same information to the userspace as the old one, and can be used
in the same way, i.e.:
old # gpsmon /dev/ttyGNSS_2100_0
new # gpsmon /dev/gnss0
There is no other impact on userspace tools.

User expecting onboard GNSS receiver support is required to enable
CONFIG_GNSS=y/m in kernel config.

Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: hns: Switch to use acpi_evaluate_dsm_typed()
Andy Shevchenko [Thu, 19 Jan 2023 19:11:01 +0000 (21:11 +0200)]
net: hns: Switch to use acpi_evaluate_dsm_typed()

The acpi_evaluate_dsm_typed() provides a way to check the type of the
object evaluated by _DSM call. Use it instead of open coded variant.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoACPI: utils: Add acpi_evaluate_dsm_typed() and acpi_check_dsm() stubs
Andy Shevchenko [Thu, 19 Jan 2023 19:11:00 +0000 (21:11 +0200)]
ACPI: utils: Add acpi_evaluate_dsm_typed() and acpi_check_dsm() stubs

When the ACPI part of a driver is optional the methods used in it
are expected to be available even if CONFIG_ACPI=n. This is not
the case for _DSM related methods. Add stubs for
acpi_evaluate_dsm_typed() and acpi_check_dsm() methods.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
David S. Miller [Fri, 20 Jan 2023 11:37:57 +0000 (11:37 +0000)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-19 (ice)

This series contains updates to ice driver only.

Tsotne and Anatolii implement new handling, and AdminQ command, for
firmware LLDP, adding a pending notification to allow for proper
cleanup between TC changes.

Amritha extends support for drop action outside of switchdev.

Siddaraju adjusts restriction for PTP HW clock adjustments.

Ani removes an unneeded non-null check and improves reporting of some link
modes to utilize more appropriate values.

Jesse adds checks to ensure PF VSI type.

Przemek combines duplicate checks of the same condition into one check.

Tony makes various cleanups to code: removes comments for cppcheck
suppressions, reduces scope of some variables, changes some return
statements to reflect an explicit 0 return, matches naming for function
declaration and definition, adds local variable for readability, and
fixes indenting.

Sergey separates DDP (Dynamic Device Personalization) code into its own
file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMerge branch 'net-dcb-rewrite-table'
David S. Miller [Fri, 20 Jan 2023 09:33:22 +0000 (09:33 +0000)]
Merge branch 'net-dcb-rewrite-table'

Daniel Machon says:

====================
net: Introduce new DCB rewrite table

There is currently no support for per-port egress mapping of priority to PCP and
priority to DSCP. Some support for expressing egress mapping of PCP is supported
through ip link, with the 'egress-qos-map', however this command only maps
priority to PCP, and for vlan interfaces only. DCB APP already has support for
per-port ingress mapping of PCP/DEI, DSCP and a bunch of other stuff. So why not
take advantage of this fact, and add a new table that does the reverse.

This patch series introduces the new DCB rewrite table. Whereas the DCB
APP table deals with ingress mapping of PID (protocol identifier) to priority,
the rewrite table deals with egress mapping of priority to PID.

It is indeed possible to integrate rewrite in the existing APP table, by
introducing new dedicated rewrite selectors, and altering existing functions
to treat rewrite entries specially. However, I feel like this is not a good
solution, and will pollute the APP namespace. APP is well-defined in IEEE, and
some userspace relies of advertised entries - for this fact, separating APP and
rewrite into to completely separate objects, seems to me the best solution.

The new table shares much functionality with the APP table, and as such, much
existing code is reused, or slightly modified, to work for both.

================================================================================
DCB rewrite table in a nutshell
================================================================================
The table is implemented as a simple linked list, and uses the same lock as the
APP table. New functions for getting, setting and deleting entries have been
added, and these are exported, so they can be used by the stack or drivers.
Additionnaly, new dcbnl_setrewr and dcnl_delrewr hooks has been added, to
support hardware offload of the entries.

================================================================================
Sparx5 per-port PCP rewrite support
================================================================================
Sparx5 supports PCP egress mapping through two eight-entry switch tables.
One table maps QoS class 0-7 to PCP for DE0 (DP levels mapped to
drop-eligibility 0) and the other for DE1. DCB does currently not have support
for expressing DP/color, so instead, the tagged DEI bit will reflect the DP
levels, for any rewrite entries> 7 ('de').

The driver will take apptrust (contributed earlier) into consideration, so
that the mapping tables only be used, if PCP is trusted *and* the rewrite table
has active mappings, otherwise classified PCP (same as frame PCP) will be used
instead.

================================================================================
Sparx5 per-port DSCP rewrite support
================================================================================
Sparx5 support DSCP egress mapping through a single 32-entry table. This table
maps classified QoS class and DP level to classified DSCP, and is consulted by
the switch Analyzer Classifier at ingress. At egress, the frame DSCP can either
be rewritten to classified DSCP to frame DSCP.

The driver will take apptrust into consideration, so that the mapping tables
only be used, if DSCP is trusted *and* the rewrite table has active mappings,
otherwise frame DSCP will be used instead.

================================================================================
Patches
================================================================================
Patch #1 modifies dcb_app_add to work for both APP and rewrite

Patch #2 adds dcbnl_app_table_setdel() for setting and deleting both APP and
         rewrite entries.

Patch #3 adds the rewrite table and all required functions, offload hooks and
         bookkeeping for maintaining it.

Patch #4 adds two new helper functions for getting a priority to PCP bitmask
         map, and a priority to DSCP bitmask map.

Patch #5 adds support for PCP rewrite in the Sparx5 driver.
Patch #6 adds support for DSCP rewrite in the Sparx5 driver.

================================================================================
v2 -> v3:
  in dcbnl_ieee_fill() use nla_nest_start() instead of the _noflag() version.
  Also, cancel the rewrite nest in case of an error (Petr Machata).

v1 -> v2:
  In dcb_setrewr() change proto to u16 as it ought to be, and remove zero
  initialization of err. (Dan Carpenter).
  Change name of dcbnl_apprewr_setdel -> dcbnl_app_table_setdel and change the
  function signature to take a single function pointer. Update uses accordingly
  (Petr Machata).

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: microchip: sparx5: add support for DSCP rewrite
Daniel Machon [Wed, 18 Jan 2023 21:08:30 +0000 (22:08 +0100)]
net: microchip: sparx5: add support for DSCP rewrite

Add support for DSCP rewrite in Sparx5 driver. On egress DSCP is
rewritten from either classified DSCP, or frame DSCP. Classified DSCP is
determined by the Analyzer Classifier on ingress, and is mapped from
classified QoS class and DP level. Classification of DSCP is by default
enabled for all ports.

It is required that DSCP is trusted for the egress port *and* rewrite
table is not empty, in order to rewrite DSCP based on classified DSCP,
otherwise DSCP is always rewritten from frame DSCP.

classified_dscp = qos_dscp_map[8 * dp_level + qos_class];
if (active_mappings && dscp_is_trusted)
rewritten_dscp = classified_dscp
else
rewritten_dscp = frame_dscp

To rewrite DSCP to 20 for any frames with priority 7:

$ dcb apptrust set dev eth0 order dscp
$ dcb rewr add dev eth0 7:20 <-- not in iproute2/dcb yet

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: microchip: sparx5: add support for PCP rewrite
Daniel Machon [Wed, 18 Jan 2023 21:08:29 +0000 (22:08 +0100)]
net: microchip: sparx5: add support for PCP rewrite

Add support for rewrite of PCP and DEI, based on classified Quality of
Service (QoS) class and Drop-Precedence (DP) level.

The DCB rewrite table is queried for mappings between priority and
PCP/DEI. The classified DP level is then encoded in the DEI bit, if a
mapping for DEI exists.

Sparx5 has four DP levels, where by default, 0 is mapped to DE0 and 1-3
are mapped to DE1. If a mapping exists where DEI=1, then all classified
DP levels mapped to DE1 will set the DEI bit. The other way around for
DEI=0. Effectively, this means that the tagged DEI bit will reflect the
DP level for any mappings where DEI=1.

Map priority=1 to PCP=1 and DEI=1:
$ dcb rewr add dev eth0 pcp-prio 1:1de

Map priority=7 to PCP=2 and DEI=0
$ dcb rewr add dev eth0 pcp-prio 7:2nd

Also, sparx5_dcb_ieee_dscp_setdel() has been refactored, to work for
both APP and rewrite entries.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: dcb: add helper functions to retrieve PCP and DSCP rewrite maps
Daniel Machon [Wed, 18 Jan 2023 21:08:28 +0000 (22:08 +0100)]
net: dcb: add helper functions to retrieve PCP and DSCP rewrite maps

Add two new helper functions to retrieve a mapping of priority to PCP
and DSCP bitmasks, where each bitmap contains ones in positions that
match a rewrite entry.

dcb_ieee_getrewr_prio_dscp_mask_map() reuses the dcb_ieee_app_prio_map,
as this struct is already used for a similar mapping in the app table.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: dcb: add new rewrite table
Daniel Machon [Wed, 18 Jan 2023 21:08:27 +0000 (22:08 +0100)]
net: dcb: add new rewrite table

Add new rewrite table and all the required functions, offload hooks and
bookkeeping for maintaining it. The rewrite table reuses the app struct,
and the entire set of app selectors. As such, some bookeeping code can
be shared between the rewrite- and the APP table.

New functions for getting, setting and deleting entries has been added.
Apart from operating on the rewrite list, these functions do not emit a
DCB_APP_EVENT when the list os modified. The new dcb_getrewr does a
lookup based on selector and priority and returns the protocol, so that
mappings from priority to protocol, for a given selector and ifindex is
obtained.

Also, a new nested attribute has been added, that encapsulates one or
more app structs. This attribute is used to distinguish the two tables.

The dcb_lock used for the APP table is reused for the rewrite table.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: dcb: add new common function for set/del of app/rewr entries
Daniel Machon [Wed, 18 Jan 2023 21:08:26 +0000 (22:08 +0100)]
net: dcb: add new common function for set/del of app/rewr entries

In preparation for DCB rewrite. Add a new function for setting and
deleting both app and rewrite entries. Moving this into a separate
function reduces duplicate code, as both type of entries requires the
same set of checks. The function will now iterate through a configurable
nested attribute (app or rewrite attr), validate each attribute and call
the appropriate set- or delete function.

Note that this function always checks for nla_len(attr_itr) <
sizeof(struct dcb_app), which was only done in dcbnl_ieee_set and not in
dcbnl_ieee_del prior to this patch. This means, that any userspace tool
that used to shove in data < sizeof(struct dcb_app) would now receive
-ERANGE.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agonet: dcb: modify dcb_app_add to take list_head ptr as parameter
Daniel Machon [Wed, 18 Jan 2023 21:08:25 +0000 (22:08 +0100)]
net: dcb: modify dcb_app_add to take list_head ptr as parameter

In preparation to DCB rewrite. Modify dcb_app_add to take new struct
list_head * as parameter, to make the used list configurable. This is
done to allow reusing the function for adding rewrite entries to the
rewrite table, which is introduced in a later patch.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMerge branch 'lan9303-phylink'
David S. Miller [Fri, 20 Jan 2023 08:53:13 +0000 (08:53 +0000)]
Merge branch 'lan9303-phylink'

Jerry Ray says:

====================
dsa: lan9303: Move to PHYLINK

This patch series moves the lan9303 driver to use the phylink
api away from phylib.

Migrating to phylink means removing the .adjust_link api. The
functionality from the adjust_link is moved to the phylink_mac_link_up
api.  The code being removed only affected the cpu port.  The other
ports on the LAN9303 do not need anything from the phylink_mac_link_up
api.

Patches:
 0001 - Whitespace only change aligning the dsa_switch_ops members.
No code changes.
 0002 - Moves the Turbo bit initialization out of the adjust_link api and
places it in a driver initialization execution path. It only needs
to be initialized once, it is never changed, and it is not a
per-port flag.
 0003 - Adds exception handling logic in the extremely unlikely event that
the read of the device fails.
 0004 - Performance optimization that skips a slow register write if there
is no need to perform it.
 0005 - Change the way we identify the xMII port as phydev will be NULL
when this logic is moved into phylink_mac_link_up.
 0006 - Removes adjust_link and begins using the phylink dsa_switch_ops
apis.
 0007 - Adds XMII port flow control settings in the phylink_mac_link_up()
api while cleaning up the ANEG / speed / duplex implementation.
---
v6->v7:
  - Moved the initialization of the Turbo bit into lan9303_setup().
  - Added a macro for determining is a port is an XMII port.
  - Added setting the XMII flow control in the phylink_mac_link_up() API.
  - removed unnecessary error handling and cleaned up the code flow in
    phylink_mac_link_up().
v5->v6:
  - Moved to using port number to identify xMII port for the LAN9303.
v4->v5:
  - Created prep patches to better show how things migrate.
  - cleaned up comments.
v3->v4:
  - Addressed whitespace issues as a separate patch.
  - Removed port_max_mtu api patch as it is unrelated to phylink migration.
  - Reworked the implementation to preserve the adjust_link functionality
    by including it in the phylink_mac_link_up api.
v2->v3:
  Added back in disabling Turbo Mode on the CPU MII interface.
  Removed the unnecessary clearing of the phy supported interfaces.
v1->v2:
  corrected the reported mtu size, removing ETH_HLEN and ETH_FCS_LEN
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agodsa: lan9303: Add flow ctrl in link_up
Jerry Ray [Tue, 17 Jan 2023 20:57:03 +0000 (14:57 -0600)]
dsa: lan9303: Add flow ctrl in link_up

While the prior patch moved the adjust_link code into the
phylink_mac_link_up api, this patch cleans it up and adds the setting the
port's flow control based on the phylink_mac_link_up input parameters.

Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agodsa: lan9303: Migrate to PHYLINK
Jerry Ray [Tue, 17 Jan 2023 20:57:02 +0000 (14:57 -0600)]
dsa: lan9303: Migrate to PHYLINK

This patch replaces the adjust_link api with the phylink apis that provide
equivalent functionality.

The remaining functionality from the adjust_link is now covered in the
phylink_mac_link_up api.

Removes:
.adjust_link
Adds:
.phylink_get_caps
.phylink_mac_link_up

Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agodsa: lan9303: Port 0 is xMII port
Jerry Ray [Tue, 17 Jan 2023 20:57:01 +0000 (14:57 -0600)]
dsa: lan9303: Port 0 is xMII port

In preparing to move the adjust_link logic into the phylink_mac_link_up
api, change the macro used to check for the cpu port. In
phylink_mac_link_up, the phydev pointer passed in for the CPU port is
NULL, so we can't keep using phy_is_pseudo_fixed_link(phydev).

Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agodsa: lan9303: write reg only if necessary
Jerry Ray [Tue, 17 Jan 2023 20:57:00 +0000 (14:57 -0600)]
dsa: lan9303: write reg only if necessary

As the regmap_write() is over a slow bus that will sleep, we can speed up
the boot-up time a bit by not bothering to clear a bit that is already
clear.

Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agodsa: lan9303: Add exception logic for read failure
Jerry Ray [Tue, 17 Jan 2023 20:56:59 +0000 (14:56 -0600)]
dsa: lan9303: Add exception logic for read failure

While it is highly unlikely a read will ever fail, This code fragment is
now in a function that allows us to return an error code. A read failure
here will cause the lan9303_probe to fail.

Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agodsa: lan9303: move Turbo Mode bit init
Jerry Ray [Tue, 17 Jan 2023 20:56:58 +0000 (14:56 -0600)]
dsa: lan9303: move Turbo Mode bit init

In preparing to remove the .adjust_link api, I am moving the one-time
initialization of the device's Turbo Mode bit into a different execution
path. This code clears (disables) the Turbo Mode bit which is never used
by this driver. Turbo Mode is a non-standard mode that would allow the
100Mbps RMII interface to run at 200Mbps.

Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agodsa: lan9303: align dsa_switch_ops members
Jerry Ray [Tue, 17 Jan 2023 20:56:57 +0000 (14:56 -0600)]
dsa: lan9303: align dsa_switch_ops members

Whitespace preparatory patch, making the dsa_switch_ops table consistent.
No code is added or removed.

Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18 months agoMerge tag 'mlx5-updates-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Fri, 20 Jan 2023 03:47:18 +0000 (19:47 -0800)]
Merge tag 'mlx5-updates-2023-01-18' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-01-18

1) From Rahul,
  1.1) extended range for PTP adjtime and adjphase
  1.2) adjphase function to support hardware-only offset control

2) From Roi, code cleanup to the TC module.

3) From Maor, TC support for Geneve and GRE with VF tunnel offload

4) Cleanups and minor updates.

* tag 'mlx5-updates-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Use read lock for eswitch get callbacks
  net/mlx5e: Remove redundant allocation of spec in create indirect fwd group
  net/mlx5e: Support Geneve and GRE with VF tunnel offload
  net/mlx5: E-Switch, Fix typo for egress
  net/mlx5e: Warn when destroying mod hdr hash table that is not empty
  net/mlx5e: TC, Use common function allocating flow mod hdr or encap mod hdr
  net/mlx5e: TC, Add tc prefix to attach/detach hdr functions
  net/mlx5e: TC, Pass flow attr to attach/detach mod hdr functions
  net/mlx5e: Add warning when log WQE size is smaller than log stride size
  net/mlx5e: Fail with messages when params are not valid for XSK
  net/mlx5: E-switch, Remove redundant comment about meta rules
  net/mlx5: Add hardware extended range support for PTP adjtime and adjphase
  net/mlx5: Add adjphase function to support hardware-only offset control
  net/mlx5: Suppress error logging on UCTX creation
  net/mlx5e: Suppress Send WQEBB room warning for PAGE_SIZE >= 16KB
====================

Link: https://lore.kernel.org/r/20230118183602.124323-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: phy: fix use of uninit variable when setting PLCA config
Piergiorgio Beruto [Wed, 18 Jan 2023 15:47:31 +0000 (16:47 +0100)]
net: phy: fix use of uninit variable when setting PLCA config

Coverity reported the following:

*** CID 1530573:    (UNINIT)
drivers/net/phy/phy-c45.c:1036 in genphy_c45_plca_set_cfg()
1030      return ret;
1031
1032      val = ret;
1033      }
1034
1035      if (plca_cfg->node_cnt >= 0)
vvv     CID 1530573:    (UNINIT)
vvv     Using uninitialized value "val".
1036      val = (val & ~MDIO_OATC14_PLCA_NCNT) |
1037            (plca_cfg->node_cnt << 8);
1038
1039      if (plca_cfg->node_id >= 0)
1040      val = (val & ~MDIO_OATC14_PLCA_ID) |
1041            (plca_cfg->node_id);
drivers/net/phy/phy-c45.c:1076 in genphy_c45_plca_set_cfg()
1070      return ret;
1071
1072      val = ret;
1073      }
1074
1075      if (plca_cfg->burst_cnt >= 0)
vvv     CID 1530573:    (UNINIT)
vvv     Using uninitialized value "val".
1076      val = (val & ~MDIO_OATC14_PLCA_MAXBC) |
1077            (plca_cfg->burst_cnt << 8);
1078
1079      if (plca_cfg->burst_tmr >= 0)
1080      val = (val & ~MDIO_OATC14_PLCA_BTMR) |
1081            (plca_cfg->burst_tmr);

This is not actually creating a real problem because the path leading to
'val' being used uninitialized will eventually override the full content
of that variable before actually using it for writing the register.
However, the fix is simple and comes at basically no cost.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Fixes: 493323416fed ("drivers/net/phy: add helpers to get/set PLCA configuration")
Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/f22f1864165a8dbac8b7a2277f341bc8e7a7b70d.1674056765.git.piergiorgio.beruto@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'devlink-linecard-and-reporters-locking-cleanup'
Jakub Kicinski [Fri, 20 Jan 2023 03:00:31 +0000 (19:00 -0800)]
Merge branch 'devlink-linecard-and-reporters-locking-cleanup'

Jiri Pirko says:

====================
devlink: linecard and reporters locking cleanup

This patchset does not change functionality.

Patches 1-2 remove linecards lock and reference counting, converting
them to be protected by devlink instance lock as the rest of
the objects.

Patches 3-4 fix the mlx5 auxiliary device devlink locking scheme whis is
needed for proper reporters lock conversion done in the following
patches.

Patches 5-8 remove reporters locks and reference counting, converting
them to be protected by devlink instance lock as the rest of
the objects.

Patches 9 and 10 convert linecards and reporters dumpit callbacks to
recently introduced devlink_nl_instance_iter_dump() infra.

Patch 11 removes no longer needed devlink_dump_for_each_instance_get()
helper.

The last patch adds assertion to devl_is_registered() as dependency on
other locks is removed.
====================

Link: https://lore.kernel.org/r/20230118152115.1113149-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: add instance lock assertion in devl_is_registered()
Jiri Pirko [Wed, 18 Jan 2023 15:21:15 +0000 (16:21 +0100)]
devlink: add instance lock assertion in devl_is_registered()

After region and linecard lock removals, this helper is always supposed
to be called with instance lock held. So put the assertion here and
remove the comment which is no longer accurate.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: remove devlink_dump_for_each_instance_get() helper
Jiri Pirko [Wed, 18 Jan 2023 15:21:14 +0000 (16:21 +0100)]
devlink: remove devlink_dump_for_each_instance_get() helper

devlink_dump_for_each_instance_get() is currently called from
a single place in netlink.c. As there is no need to use
this helper anywhere else in the future, remove it and
call devlinks_xa_find_get() directly from while loop
in devlink_nl_instance_iter_dump(). Also remove redundant
idx clear on loop end as it is already done
in devlink_nl_instance_iter_dump().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: convert reporters dump to devlink_nl_instance_iter_dump()
Jiri Pirko [Wed, 18 Jan 2023 15:21:13 +0000 (16:21 +0100)]
devlink: convert reporters dump to devlink_nl_instance_iter_dump()

Benefit from recently introduced instance iteration and convert
reporters .dumpit generic netlink callback to use it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: convert linecards dump to devlink_nl_instance_iter_dump()
Jiri Pirko [Wed, 18 Jan 2023 15:21:12 +0000 (16:21 +0100)]
devlink: convert linecards dump to devlink_nl_instance_iter_dump()

Benefit from recently introduced instance iteration and convert
linecards .dumpit generic netlink callback to use it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: remove reporter reference counting
Jiri Pirko [Wed, 18 Jan 2023 15:21:11 +0000 (16:21 +0100)]
devlink: remove reporter reference counting

As long as the reporter life time is protected by devlink instance
lock, the reference counting is no longer needed. Remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: remove devl*_port_health_reporter_destroy()
Jiri Pirko [Wed, 18 Jan 2023 15:21:10 +0000 (16:21 +0100)]
devlink: remove devl*_port_health_reporter_destroy()

Remove port-specific health reporter destroy function as it is
currently the same as the instance one so no longer needed. Inline
__devlink_health_reporter_destroy() as it is no longer called from
multiple places.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: remove reporters_lock
Jiri Pirko [Wed, 18 Jan 2023 15:21:09 +0000 (16:21 +0100)]
devlink: remove reporters_lock

Similar to other devlink objects, rely on devlink instance lock
and remove object specific reporters_lock.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: protect health reporter operation with instance lock
Jiri Pirko [Wed, 18 Jan 2023 15:21:08 +0000 (16:21 +0100)]
devlink: protect health reporter operation with instance lock

Similar to other devlink objects, protect the reporters list
by devlink instance lock. Alongside add unlocked versions
of health reporter create/destroy functions and use them in drivers
on call paths where the instance lock is held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet/mlx5: Remove MLX5E_LOCKED_FLOW flag
Jiri Pirko [Wed, 18 Jan 2023 15:21:07 +0000 (16:21 +0100)]
net/mlx5: Remove MLX5E_LOCKED_FLOW flag

The MLX5E_LOCKED_FLOW flag is not checked anywhere now so remove it
entirely.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet/mlx5e: Create separate devlink instance for ethernet auxiliary device
Jiri Pirko [Wed, 18 Jan 2023 15:21:06 +0000 (16:21 +0100)]
net/mlx5e: Create separate devlink instance for ethernet auxiliary device

The fact that devlink instance lock is held over mlx5 auxiliary devices
probe and remove routines brought a need to conditionally take devlink
instance lock there. The code is checking a MLX5E_LOCKED_FLOW flag
in mlx5 priv struct.

This is racy and may lead to access devlink objects without holding
instance lock or deadlock.

To avoid this, the only lock-wise sane solution is to make the
devlink entities created by the auxiliary device independent on
the original pci devlink instance. Create devlink instance for the
auxiliary device and put the uplink port instance there alongside with
the port health reporters.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: remove linecard reference counting
Jiri Pirko [Wed, 18 Jan 2023 15:21:05 +0000 (16:21 +0100)]
devlink: remove linecard reference counting

As long as the linecard life time is protected by devlink instance
lock, the reference counting is no longer needed. Remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agodevlink: remove linecards lock
Jiri Pirko [Wed, 18 Jan 2023 15:21:04 +0000 (16:21 +0100)]
devlink: remove linecards lock

Similar to other devlink objects, convert the linecards list to be
protected by devlink instance lock. Alongside with that rename the
create/destroy() functions to devl_* to indicate the devlink instance
lock needs to be held while calling them.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: ethernet: ti: am65-cpsw: Handle -EPROBE_DEFER for Serdes PHY
Siddharth Vadapalli [Wed, 18 Jan 2023 11:21:36 +0000 (16:51 +0530)]
net: ethernet: ti: am65-cpsw: Handle -EPROBE_DEFER for Serdes PHY

In the am65_cpsw_init_serdes_phy() function, the error handling for the
call to the devm_of_phy_get() function misses the case where the return
value of devm_of_phy_get() is ERR_PTR(-EPROBE_DEFER). Proceeding without
handling this case will result in a crash when the "phy" pointer with
this value is dereferenced by phy_init() in am65_cpsw_enable_phy().

Fix this by adding appropriate error handling code.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: dab2b265dd23 ("net: ethernet: ti: am65-cpsw: Add support for SERDES configuration")
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230118112136.213061-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: dsa: microchip: ptp: Fix error code in ksz_hwtstamp_set()
Dan Carpenter [Wed, 18 Jan 2023 10:28:21 +0000 (13:28 +0300)]
net: dsa: microchip: ptp: Fix error code in ksz_hwtstamp_set()

We want to return negative error codes here but the copy_to/from_user()
functions return the number of bytes remaining to be copied.

Fixes: c59e12a140fb ("net: dsa: microchip: ptp: Initial hardware time stamping support")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/Y8fJxSvbl7UNVHh/@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoMerge branch 'net-sfp-cleanup-i2c-dt-acpi-fwnode-includes'
Jakub Kicinski [Fri, 20 Jan 2023 02:55:37 +0000 (18:55 -0800)]
Merge branch 'net-sfp-cleanup-i2c-dt-acpi-fwnode-includes'

Russell King says:

====================
net: sfp: cleanup i2c / dt / acpi / fwnode / includes

This series cleans up the DT/fwnode/ACPI code in the SFP cage driver:

1. Use the newly introduced i2c_get_adapter_by_fwnode(), which removes
the need to know about ACPI handles to find the I2C device.

2. Use device_get_match_data() to get the match data, rather than
having to look up the matching DT device_id to get at the data.

3. Rename gpio_of_names, as this is not DT specific.

4. Remove acpi.h include which is no longer necessary.

5. Remove ctype.h include which, as far as I can tell, was never
necessary.
====================

Link: https://lore.kernel.org/r/Y8fH+Vqx6huYQFDU@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: sfp: remove unused ctype.h include
Russell King (Oracle) [Wed, 18 Jan 2023 10:21:18 +0000 (10:21 +0000)]
net: sfp: remove unused ctype.h include

An include of linux/ctype.h was added in commit 1323061a018a
("net: phy: sfp: Add HWMON support for module sensors") but nothing
was used from this header file. Remove this unnecessary include.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: sfp: remove acpi.h include
Russell King (Oracle) [Wed, 18 Jan 2023 10:21:13 +0000 (10:21 +0000)]
net: sfp: remove acpi.h include

Nothing in the sfp code now references anything from the ACPI header,
everything is done via fwnode APIs, so get rid of this header.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: sfp: rename gpio_of_names[]
Russell King (Oracle) [Wed, 18 Jan 2023 10:21:08 +0000 (10:21 +0000)]
net: sfp: rename gpio_of_names[]

There's nothing DT specific about the gpio_of_names array, let's drop
the _of infix.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: sfp: use device_get_match_data()
Russell King (Oracle) [Wed, 18 Jan 2023 10:21:03 +0000 (10:21 +0000)]
net: sfp: use device_get_match_data()

Rather than using of_match_node() to get the matching of_device_id
to then retrieve the match data, use device_get_match_data() instead
to avoid firmware specific functions, and free the driver from having
firmware specific code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: sfp: use i2c_get_adapter_by_fwnode()
Russell King (Oracle) [Wed, 18 Jan 2023 10:20:58 +0000 (10:20 +0000)]
net: sfp: use i2c_get_adapter_by_fwnode()

Use the newly introduced i2c_get_adapter_by_fwnode() API, so that we
can retrieve the I2C adapter in a firmware independent manner once we
have the fwnode handle for the adapter.

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agoice: Remove excess space
Tony Nguyen [Thu, 29 Dec 2022 19:05:20 +0000 (11:05 -0800)]
ice: Remove excess space

smatch reports inconsistent indenting due to an extra space; remove it to
resolve the issue.

smatch warnings:
drivers/net/ethernet/intel/ice/ice_lib.c:1673 ice_vsi_alloc_ring_stats() warn: inconsistent indenting

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: Introduce local var for readability
Tony Nguyen [Thu, 15 Dec 2022 21:36:51 +0000 (13:36 -0800)]
ice: Introduce local var for readability

Based on previous feedback[1], introduce a local var to make things more
readable.

[1] https://lore.kernel.org/netdev/20220315203218.607f612b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
18 months agoice: Match parameter name for ice_cfg_phy_fc()
Tony Nguyen [Thu, 15 Dec 2022 21:36:37 +0000 (13:36 -0800)]
ice: Match parameter name for ice_cfg_phy_fc()

The parameter name in the function declaration and definition do not
match; adjust the naming for consistency and to avoid confusion.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: Explicitly return 0
Tony Nguyen [Thu, 15 Dec 2022 21:36:25 +0000 (13:36 -0800)]
ice: Explicitly return 0

Previous checks, and goto, will catch all errors meaning these returns
will only return 0; explicitly return 0 for these cases.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
18 months agoice: Reduce scope of variables
Tony Nguyen [Wed, 23 Nov 2022 16:56:40 +0000 (08:56 -0800)]
ice: Reduce scope of variables

There are some places where the scope of a variable can
be reduced, so do that.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
18 months agoice: Move support DDP code out of ice_flex_pipe.c
Sergey Temerkhanov [Tue, 20 Dec 2022 15:01:14 +0000 (16:01 +0100)]
ice: Move support DDP code out of ice_flex_pipe.c

Currently, ice_flex_pipe.c includes the DDP loading functions
and has grown large. Although flexible processing support
code is related to DDP loading, these parts are distinct.
Move the DDP loading functionality from ice_flex_pipe.c to
a separate file.

Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: Remove cppcheck suppressions
Tony Nguyen [Wed, 23 Nov 2022 16:56:27 +0000 (08:56 -0800)]
ice: Remove cppcheck suppressions

The use of suppressions for cppcheck in the kernel does not look to be
standard as the ice driver is the only one doing it. Remove the
comments/suppressions.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: combine cases in ice_ksettings_find_adv_link_speed()
Przemek Kitszel [Wed, 23 Nov 2022 15:55:44 +0000 (16:55 +0100)]
ice: combine cases in ice_ksettings_find_adv_link_speed()

Combine if statements setting the same link speed together.

Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: Add support for 100G KR2/CR2/SR2 link reporting
Anirudh Venkataramanan [Wed, 23 Nov 2022 15:55:43 +0000 (16:55 +0100)]
ice: Add support for 100G KR2/CR2/SR2 link reporting

Commit 2736d94f351b ("ethtool: Added support for 50Gbps per lane link modes")
in v5.1 added (among other things) support for 100G CR2/KR2/SR2 link modes.
Advertise these link modes if the firmware reports the corresponding PHY types.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: add missing checks for PF vsi type
Jesse Brandeburg [Wed, 14 Dec 2022 00:01:31 +0000 (16:01 -0800)]
ice: add missing checks for PF vsi type

There were a few places we had missed checking the VSI type to make sure
it was definitely a PF VSI, before calling setup functions intended only
for the PF VSI.

This doesn't fix any explicit bugs but cleans up the code in a few
places and removes one explicit != vsi->type check that can be
superseded by this code (it's a super set)

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: remove redundant non-null check in ice_setup_pf_sw()
Anirudh Venkataramanan [Wed, 16 Nov 2022 12:20:32 +0000 (13:20 +0100)]
ice: remove redundant non-null check in ice_setup_pf_sw()

Remove a redundant null check, as vsi could not be null at this point.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: restrict PTP HW clock freq adjustments to 100, 000, 000 PPB
Siddaraju DH [Tue, 15 Nov 2022 09:41:35 +0000 (15:11 +0530)]
ice: restrict PTP HW clock freq adjustments to 100, 000, 000 PPB

The PHY provides only 39b timestamp. With current timing
implementation, we discard lower 7b, leaving 32b timestamp.
The driver reconstructs the full 64b timestamp by correlating the
32b timestamp with cached_time for performance. The reconstruction
algorithm does both forward & backward interpolation.

The 32b timeval has overflow duration of 2^32 counts ~= 4.23 second.
Due to interpolation in both direction, its now ~= 2.125 second
IIRC, going with at least half a duration, the cached_time is updated
with periodic thread of 1 second (worst-case) periodicity.

But the 1 second periodicity is based on System-timer.
With PPB adjustments, if the 1588 timers increments at say
double the rate, (2s in-place of 1s), the Nyquist rate/half duration
sampling/update of cached_time with 1 second periodic thread will
lead to incorrect interpolations.

Hence we should restrict the PPB adjustments to at least half duration
of cached_time update which translates to 500,000,000 PPB.

Since the periodicity of the cached-time system thread can vary,
it is good to have some buffer time and considering practicality of
PPB adjustments, limiting the max_adj to 100,000,000.

Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: Support drop action
Amritha Nambiar [Tue, 8 Nov 2022 22:08:12 +0000 (14:08 -0800)]
ice: Support drop action

Currently the drop action is supported only in switchdev mode.
Add support for offloading receive filters with action drop
in ADQ/non-ADQ modes. This is in addition to other actions
such as forwarding to a VSI (ADQ) or a queue (ADQ/non-ADQ).

Also renamed 'ch_vsi' to 'dest_vsi' as it is valid for multiple
actions such as forward to vsi/queue which may/may not create a
channel vsi.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: Handle LLDP MIB Pending change
Anatolii Gerasymenko [Wed, 24 Aug 2022 12:07:28 +0000 (14:07 +0200)]
ice: Handle LLDP MIB Pending change

If the number of Traffic Classes (TC) is decreased, the FW will no
longer remove TC nodes, but will send a pending change notification. This
will allow RDMA to destroy corresponding Control QP markers. After RDMA
finishes outstanding operations, the ice driver will send an execute MIB
Pending change admin queue command to FW to finish DCB configuration
change.

The FW will buffer all incoming Pending changes, so there can be only
one active Pending change.

RDMA driver guarantees to remove Control QP markers within 5000 ms.
Hence, LLDP response timeout txTTL (default 30 sec) will be met.

In the case of a Pending change, LLDP MIB Change Event (opcode 0x0A01) will
contain the whole new MIB. But Get LLDP MIB (opcode 0x0A00) AQ call would
still return an old MIB, as the Pending change hasn't been applied yet.
Add ice_get_dcb_cfg_from_mib_change() function to retrieve DCBX config
from LLDP MIB Change Event's buffer for Pending changes.

Co-developed-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoice: Add 'Execute Pending LLDP MIB' Admin Queue command
Tsotne Chakhvadze [Wed, 24 Aug 2022 12:07:27 +0000 (14:07 +0200)]
ice: Add 'Execute Pending LLDP MIB' Admin Queue command

In DCB Willing Mode (FW managed LLDP), when the link partner changes
configuration which requires fewer TCs, the TCs that are no longer
needed are suspended by EMP FW, removed, and never resumed. This occurs
before a MIB change event is indicated to SW. The permanent suspension and
removal of these TC nodes in the scheduler prevents RDMA from being able
to destroy QPs associated with this TC, requiring a CORE reset to recover.

A new DCBX configuration change flow is defined to allow SW driver and
other SW components (RDMA) to properly adjust to the configuration
changes before they are taking effect in HW. This flow includes a
two-way handshake between EMP FW<->LAN SW<->RDMA SW.

List of changes:
- Add 'Execute Pending LLDP MIB' AQC.
- Add 'Pending Event Enable' bit.
- Add additional logic to ignore Pending Event Enable' request
  while 'LLDP MIB Chnage' event is disabled.
- Add 'Execute Pending LLDP MIB' AQC sending function to FW,
  which is needed to take place MIB Event change.

Signed-off-by: Tsotne Chakhvadze <tsotne.chakhvadze@intel.com>
Co-developed-by: Karen Sornek <karen.sornek@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Co-developed-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
18 months agoMerge branch 'net-phy-remove-probe_capabilities'
Paolo Abeni [Thu, 19 Jan 2023 15:23:26 +0000 (16:23 +0100)]
Merge branch 'net-phy-remove-probe_capabilities'

Michael Walle says:

====================
net: phy: Remove probe_capabilities

With all the drivers which used .probe_capabilities converted to the
new c45 MDIO access methods, we can now decide based upon these whether
a bus driver supports c45 and we can get rid of the not widely used
probe_capabilites.

Unfortunately, due to a now broader support of c45 scans, this will
trigger a bug on some boards with a (c22-only) Micrel PHY. These PHYs
don't ignore c45 accesses correctly, thinking they are addressed
themselves and distrupt the MDIO access. To avoid this, a blacklist
for c45 scans is introduced.
====================

Link: https://lore.kernel.org/r/20230116-net-next-remove-probe-capabilities-v2-0-15513b05e1f4@walle.cc
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: phy: Remove probe_capabilities
Andrew Lunn [Wed, 18 Jan 2023 10:01:40 +0000 (11:01 +0100)]
net: phy: Remove probe_capabilities

Deciding if to probe of PHYs using C45 is now determine by if the bus
provides the C45 read method. This makes probe_capabilities redundant
so remove it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: phy: Decide on C45 capabilities based on presence of method
Andrew Lunn [Wed, 18 Jan 2023 10:01:39 +0000 (11:01 +0100)]
net: phy: Decide on C45 capabilities based on presence of method

Some PHYs provide invalid IDs in C22 space. If C45 is supported on the
bus an attempt can be made to get the IDs from the C45 space. Decide
on this based on the presence of the C45 read method in the bus
structure. This will allow the unreliable probe_capabilities to be
removed.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: mdio: scan bus based on bus capabilities for C22 and C45
Andrew Lunn [Wed, 18 Jan 2023 10:01:38 +0000 (11:01 +0100)]
net: mdio: scan bus based on bus capabilities for C22 and C45

Now that all MDIO bus drivers which set probe_capabilities to
MDIOBUS_C22_C45 have been converted to use the name API for C45
transactions, perform the scanning of the bus based on which methods
the bus provides.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: mdio: Add workaround for Micrel PHYs which are not C45 compatible
Andrew Lunn [Wed, 18 Jan 2023 10:01:37 +0000 (11:01 +0100)]
net: mdio: Add workaround for Micrel PHYs which are not C45 compatible

After scanning the bus for C22 devices, check if any Micrel PHYs have
been found.  They are known to do bad things if there are C45
transactions on the bus. Prevent the scanning of the bus using C45 if
such a PHY has been detected.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: mdio: Rework scanning of bus ready for quirks
Andrew Lunn [Wed, 18 Jan 2023 10:01:36 +0000 (11:01 +0100)]
net: mdio: Rework scanning of bus ready for quirks

Some C22 PHYs do bad things when there are C45 transactions on the
bus. In order to handle this, the bus needs to be scanned first for
C22 at all addresses, and then C45 scanned for all addresses.

The Marvell pxa168 driver scans a specific address on the bus to find
its PHY. This is a C22 only device, so update it to use the c22
helper.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: mdio: Move mdiobus_scan() within file
Andrew Lunn [Wed, 18 Jan 2023 10:01:35 +0000 (11:01 +0100)]
net: mdio: Move mdiobus_scan() within file

No functional change, just place it earlier in preparation for some
refactoring.

While at it, correct the comment format and one typo.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoMerge branch 'generic-implementation-of-phy-interface-and-fixed_phy-support-for-the...
Paolo Abeni [Thu, 19 Jan 2023 12:50:13 +0000 (13:50 +0100)]
Merge branch 'generic-implementation-of-phy-interface-and-fixed_phy-support-for-the-lan743x-device'

Pavithra Sathyanarayanan says:

====================
generic implementation of phy interface and fixed_phy support for the LAN743x device

This patch series includes the following changes:

- Remove the unwanted interface settings in the LAN743x driver as
  it is preset in EEPROM configurations.

- Handle generic implementation for the phy interfaces for different
  devices LAN7430/31 and pci11x1x.

- Add new feature for fixed_phy support at 1Gbps full duplex for the
  LAN7431 device if a phy not found over MDIO. Includes support for
  communication between a MAC in a LAN7431 device and custom phys
  without an MDIO interface.
====================

Link: https://lore.kernel.org/r/20230117141614.4411-1-Pavithra.Sathyanarayanan@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: lan743x: add fixed phy support for LAN7431 device
Pavithra Sathyanarayanan [Tue, 17 Jan 2023 14:16:14 +0000 (19:46 +0530)]
net: lan743x: add fixed phy support for LAN7431 device

Add fixed_phy support at 1Gbps full duplex for the lan7431 device
if a phy not found over MDIO. Tested with a MAC to MAC connection
from LAN7431 to a KSZ9893 switch. This avoids the Driver open error
in LAN743x. TX delay and internal CLK125 generation is already
enabled in EEPROM.

Signed-off-by: Pavithra Sathyanarayanan <Pavithra.Sathyanarayanan@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: lan743x: add generic implementation for phy interface selection
Pavithra Sathyanarayanan [Tue, 17 Jan 2023 14:16:13 +0000 (19:46 +0530)]
net: lan743x: add generic implementation for phy interface selection

Add logic to read the Phy interface from MAC_CR register for LAN743x
driver.

Checks for the LAN7430/31 or pci11x1x devices and the adapter
interface is updated accordingly. For LAN7431, adapter interface is set
based on Bit 19 of MAC_CR register as MII or RGMII which removes the
forced RGMII/GMII configurations in lan743x_phy_open().

Signed-off-by: Pavithra Sathyanarayanan <Pavithra.Sathyanarayanan@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agonet: lan743x: remove unwanted interface select settings
Pavithra Sathyanarayanan [Tue, 17 Jan 2023 14:16:12 +0000 (19:46 +0530)]
net: lan743x: remove unwanted interface select settings

Remove the MII/RGMII Selection settings in driver as it is preset
by the EEPROM and has the required configurations before the driver
loads for LAN743x.

Signed-off-by: Pavithra Sathyanarayanan <Pavithra.Sathyanarayanan@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoselftests/net: mv bpf/nat6to4.c to net folder
Hangbin Liu [Wed, 18 Jan 2023 02:09:27 +0000 (10:09 +0800)]
selftests/net: mv bpf/nat6to4.c to net folder

There are some issues with the bpf/nat6to4.c building.

1. It use TEST_CUSTOM_PROGS, which will add the nat6to4.o to
   kselftest-list file and run by common run_tests.
2. When building the test via `make -C tools/testing/selftests/
   TARGETS="net"`, the nat6to4.o will be build in selftests/net/bpf/
   folder. But in test udpgro_frglist.sh it refers to ../bpf/nat6to4.o.
   The correct path should be ./bpf/nat6to4.o.
3. If building the test via `make -C tools/testing/selftests/ TARGETS="net"
   install`. The nat6to4.o will be installed to kselftest_install/net/
   folder. Then the udpgro_frglist.sh should refer to ./nat6to4.o.

To fix the confusing test path, let's just move the nat6to4.c to net folder
and build it as TEST_GEN_FILES.

Fixes: edae34a3ed92 ("selftests net: add UDP GRO fraglist + bpf self-tests")
Tested-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230118020927.3971864-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
18 months agoMerge branch 'enetc-bd-ring-cleanup'
Jakub Kicinski [Thu, 19 Jan 2023 04:52:28 +0000 (20:52 -0800)]
Merge branch 'enetc-bd-ring-cleanup'

Vladimir Oltean says:

====================
ENETC BD ring cleanup

The highlights of this patch set are:

- Installing a BPF program and changing PTP RX timestamping settings are
  currently implemented through a port reconfiguration procedure which
  triggers an AN restart on the PHY, and these procedures are not
  generally guaranteed to leave the port in a sane state. Patches 9/12
  and 11/12 address that.

- Attempting to put the port down (or trying to reconfigure it) has the
  driver oppose some resistance if it's bombarded with RX traffic
  (it won't go down). Patch 12/12 addresses that.

The other 9 patches are just cleanup in the BD ring setup/teardown code,
which gradually led to bringing the driver in a position where resolving
those 2 issues was possible.
====================

Link: https://lore.kernel.org/r/20230117230234.2950873-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: prioritize ability to go down over packet processing
Vladimir Oltean [Tue, 17 Jan 2023 23:02:34 +0000 (01:02 +0200)]
net: enetc: prioritize ability to go down over packet processing

napi_synchronize() from enetc_stop() waits until the softirq has
finished execution and no longer wants to be rescheduled. However under
high traffic load, this will never happen, and the interface can never
be closed.

The problem is the fact that the NAPI poll routine is written to update
the consumer index which makes the device want to put more buffers in
the RX ring, which restarts the madness again.

Browsing around, it seems that some drivers like i40e keep a bit
(__I40E_VSI_DOWN) which they use as communication between the control
path and the data path. But that isn't my first choice, because
complications ensue - since the enetc hardirq may trigger while we are
in a theoretical ENETC_DOWN state, it may happen that enetc_msix() masks
it, but enetc_poll() never unmasks it. To prevent a stall in that case,
one would need to schedule all NAPI instances when ENETC_DOWN gets
cleared, to process what's pending.

I find it more desirable for the control path - enetc_stop() - to just
quiesce the RX ring and let the softirq finish what remains there,
without any explicit communication, just by making hardware not provide
any more packets.

This seems possible with the Enable bit of the RX BD ring (RBaMR[EN]).
I can't seem to find an exact definition of what this bit does, but when
the RX ring is disabled, the port seems to no longer update the producer
index, and not react to software updates of the consumer index.

In fact, the RBaMR[EN] bit is already toggled by the driver, but too
late for what we want:

enetc_close()
-> enetc_stop()
   -> napi_synchronize()
-> enetc_clear_bdrs()
   -> enetc_clear_rxbdr()

The enetc_clear_bdrs() function contains not only logic to disable the
RX and TX rings, but also logic to wait for the TX ring stop being busy.

We split enetc_clear_bdrs() into enetc_disable_bdrs() and
enetc_wait_bdrs(). One needs to run before napi_synchronize() and the
other after (NAPI also processes TX completions, so we maximize our
chances of not waiting for the ENETC_TBSR_BUSY bit - unless a packet is
stuck for some reason, ofc).

We also split off enetc_enable_bdrs() from enetc_setup_bdrs(), and call
this from the mirror position in enetc_start() compared to enetc_stop(),
i.e. right before netif_tx_start_all_queues().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: set up XDP program under enetc_reconfigure()
Vladimir Oltean [Tue, 17 Jan 2023 23:02:33 +0000 (01:02 +0200)]
net: enetc: set up XDP program under enetc_reconfigure()

Offloading a BPF program to the RX path of the driver suffers from the
same problems as the PTP reconfiguration - improper error checking can
leave the driver in an invalid state, and the link on the PHY is lost.

Reuse the enetc_reconfigure() procedure, but here, we need to run some
code in the middle of the ring reconfiguration procedure - while the
interface is still down. Introduce a callback which makes that possible.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: rename "xdp" and "dev" in enetc_setup_bpf()
Vladimir Oltean [Tue, 17 Jan 2023 23:02:32 +0000 (01:02 +0200)]
net: enetc: rename "xdp" and "dev" in enetc_setup_bpf()

Follow the convention from this driver, which is to name "struct
net_device *" as "ndev", and the convention from other drivers, to name
"struct netdev_bpf *" as "bpf".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: implement ring reconfiguration procedure for PTP RX timestamping
Vladimir Oltean [Tue, 17 Jan 2023 23:02:31 +0000 (01:02 +0200)]
net: enetc: implement ring reconfiguration procedure for PTP RX timestamping

The crude enetc_stop() -> enetc_open() mechanism suffers from 2
problems:

1. improper error checking
2. it involves phylink_stop() -> phylink_start() which loses the link

Right now, the driver is prepared to offer a better alternative: a ring
reconfiguration procedure which takes the RX BD size (normal or
extended) as argument. It allocates new resources (failing if that
fails), stops the traffic, and assigns the new resources to the rings.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: move phylink_start/stop out of enetc_start/stop
Vladimir Oltean [Tue, 17 Jan 2023 23:02:30 +0000 (01:02 +0200)]
net: enetc: move phylink_start/stop out of enetc_start/stop

We want to introduce a fast interface reconfiguration procedure, which
involves temporarily stopping the rings.

But we want enetc_start() and enetc_stop() to not restart PHY autoneg,
because that can take a few seconds until it completes again.

So we need part of enetc_start() and enetc_stop(), but not all of them.
Move phylink_start() right next to phylink_of_phy_connect(), and
phylink_stop() right next to phylink_disconnect_phy(), both still in
ndo_open() and ndo_stop().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: split ring resource allocation from assignment
Vladimir Oltean [Tue, 17 Jan 2023 23:02:29 +0000 (01:02 +0200)]
net: enetc: split ring resource allocation from assignment

We have a few instances in the enetc driver where the ring resources
(BD ring iomem, software BD ring, software TSO headers, basically
everything except RX buffers) need to be reallocated. For example, when
RX timestamping is enabled, the RX BD format changes to an extended one
(twice as large).

Currently, this is done using a simplistic enetc_close() -> enetc_open()
procedure. But this is quite crude, since it also invokes phylink_stop()
-> phylink_start(), the link is lost, and a few seconds need to pass for
autoneg to complete again.

In fact it's bad also due to the improper (yolo) error checking. In case
we fail to allocate new resources, we've already freed the old ones, so
the interface is more or less stuck.

To avoid that, we need a system where reconfiguration is possible in a
way in which resources are allocated upfront. This means that there will
be a higher memory usage temporarily, but the assignment of resources to
rings can be done when both the old and new resources are still available.

Introduce a struct enetc_bdr_resource which holds the resources for a
ring, be it RX or TX. This structure duplicates a lot of fields from
struct enetc_bdr (and access to the same fields in the ring structure
was left duplicated, to not change cache characteristics in the fast
path).

When enetc_alloc_tx_resources() runs, it returns an array of resource
elements (one per TX ring), in addition to the existing priv->tx_res.
To populate priv->tx_res with that array, one must call
enetc_assign_tx_resources(), and this also frees the old resources.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: bring "bool extended" to top-level in enetc_open()
Vladimir Oltean [Tue, 17 Jan 2023 23:02:28 +0000 (01:02 +0200)]
net: enetc: bring "bool extended" to top-level in enetc_open()

Extended RX buffer descriptors are necessary if they carry RX
timestamps, which will be true when PTP timestamping is enabled.

Right now, the rx_ring->ext_en is set from the function that allocates
ring resources (enetc_alloc_rx_resources() -> enetc_alloc_rxbdr()), and
also used later, in enetc_setup_rxbdr(). It is also used in the
enetc_rxbd() and enetc_rxbd_next() fast path helpers.

We want to decouple resource allocation from BD ring setup, but both
procedures depend on BD size (extended or not). Move the "extended"
boolean to enetc_open() and pass it both to the RX allocation procedure
as well as to the RX ring setup procedure. The latter will set
rx_ring->ext_en from now on.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: drop redundant enetc_free_tx_frame() call from enetc_free_txbdr()
Vladimir Oltean [Tue, 17 Jan 2023 23:02:27 +0000 (01:02 +0200)]
net: enetc: drop redundant enetc_free_tx_frame() call from enetc_free_txbdr()

The call path in enetc_close() is:

enetc_close()
-> enetc_free_rxtx_rings()
   -> enetc_free_tx_ring()
      -> enetc_free_tx_frame()
-> enetc_free_tx_resources()
   -> enetc_free_txbdr()
      -> enetc_free_tx_frame()

The enetc_free_tx_frame() function is written such that the second call
exits without doing anything, but nonetheless, it is completely
redundant. Delete it. This makes the TX teardown path more similar to
the RX one, where rx_swbd freeing is done in enetc_free_rx_ring(), not
in enetc_free_rxbdr().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: rx_swbd and tx_swbd are never NULL in enetc_free_rxtx_rings()
Vladimir Oltean [Tue, 17 Jan 2023 23:02:26 +0000 (01:02 +0200)]
net: enetc: rx_swbd and tx_swbd are never NULL in enetc_free_rxtx_rings()

The call path in enetc_close() is:

enetc_close()
-> enetc_free_rxtx_rings()
   -> enetc_free_rx_ring()
      -> tests whether rx_ring->rx_swbd is NULL
   -> enetc_free_tx_ring()
      -> tests whether tx_ring->tx_swbd is NULL
-> enetc_free_rx_resources()
   -> enetc_free_rxbdr()
      -> sets rxr->rx_swbd to NULL
-> enetc_free_tx_resources()
   -> enetc_free_txbdr()
      -> setx txr->tx_swbd to NULL

From the above, it is clear that due to the function ordering, the
checks for NULL are redundant, since the software buffer descriptor
arrays have not yet been set to NULL. Drop these checks.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: create enetc_dma_free_bdr()
Vladimir Oltean [Tue, 17 Jan 2023 23:02:25 +0000 (01:02 +0200)]
net: enetc: create enetc_dma_free_bdr()

This is a refactoring change which introduces the opposite function of
enetc_dma_alloc_bdr().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: set up RX ring indices from enetc_setup_rxbdr()
Vladimir Oltean [Tue, 17 Jan 2023 23:02:24 +0000 (01:02 +0200)]
net: enetc: set up RX ring indices from enetc_setup_rxbdr()

There is only one place which needs to set up indices in the RX ring.
Be consistent with what was done in the TX path and do this in
enetc_setup_rxbdr().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet: enetc: set next_to_clean/next_to_use just from enetc_setup_txbdr()
Vladimir Oltean [Tue, 17 Jan 2023 23:02:23 +0000 (01:02 +0200)]
net: enetc: set next_to_clean/next_to_use just from enetc_setup_txbdr()

enetc_alloc_txbdr() deals with allocating resources necessary for a TX
ring to work (the array of software BDs and the array of TSO headers).

The next_to_clean and next_to_use pointers are overwritten with proper
values which are read from hardware here:

enetc_open
-> enetc_alloc_tx_resources
   -> enetc_alloc_txbdr
      -> set to zero
-> enetc_setup_bdrs
   -> enetc_setup_txbdr
      -> read from hardware

So their initialization with zeroes is pointless and confusing.
Delete it.

Consequently, since enetc_setup_txbdr() has no opposite cleanup
function, also delete the resetting of these indices from
enetc_free_tx_ring().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
18 months agonet/mlx5e: Use read lock for eswitch get callbacks
Leon Romanovsky [Sun, 23 Oct 2022 16:52:16 +0000 (19:52 +0300)]
net/mlx5e: Use read lock for eswitch get callbacks

In commit 367dfa121205 ("net/mlx5: Remove devl_unlock from
mlx5_eswtich_mode_callback_enter") all functions were converted
to use write lock without relation to their actual purpose.

Change the devlink eswitch getters to use read and not write locks.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5e: Remove redundant allocation of spec in create indirect fwd group
Maor Dickman [Sun, 1 Jan 2023 07:06:57 +0000 (09:06 +0200)]
net/mlx5e: Remove redundant allocation of spec in create indirect fwd group

mlx5_add_flow_rules supports creating rules without any matches by passing NULL
pointer instead of spec, if NULL is passed it will use a static empty spec.
This make allocation of spec in mlx5_create_indir_fwd_group unnecessary.

Remove the redundant allocation.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5e: Support Geneve and GRE with VF tunnel offload
Maor Dickman [Tue, 20 Dec 2022 09:21:22 +0000 (11:21 +0200)]
net/mlx5e: Support Geneve and GRE with VF tunnel offload

Today VF tunnel offload (tunnel endpoint is on VF) is implemented
by indirect table which use rules that match on VXLAN VNI to
recirculated to root table, this limit the support for only
VXLAN tunnels.

This patch change indirect table to use one single match all rule
to recirculated to root table which is added when any tunnel decap
rule is added with tunnel endpoint is VF. This allow support of
Geneve and GRE with this configuration.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5: E-Switch, Fix typo for egress
Roi Dayan [Mon, 16 Jan 2023 12:49:37 +0000 (14:49 +0200)]
net/mlx5: E-Switch, Fix typo for egress

Fix engress to egress.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5e: Warn when destroying mod hdr hash table that is not empty
Roi Dayan [Thu, 22 Sep 2022 08:03:30 +0000 (11:03 +0300)]
net/mlx5e: Warn when destroying mod hdr hash table that is not empty

To avoid memory leaks add a warn when destroying mod hdr hash table
but the hash table is not empty.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5e: TC, Use common function allocating flow mod hdr or encap mod hdr
Roi Dayan [Wed, 21 Sep 2022 06:16:13 +0000 (09:16 +0300)]
net/mlx5e: TC, Use common function allocating flow mod hdr or encap mod hdr

Use mlx5e_tc_attach_mod_hdr() when allocating encap mod hdr and
remove mlx5e_tc_add_flow_mod_hdr() which is not being used now.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5e: TC, Add tc prefix to attach/detach hdr functions
Roi Dayan [Tue, 20 Sep 2022 14:22:21 +0000 (17:22 +0300)]
net/mlx5e: TC, Add tc prefix to attach/detach hdr functions

Currently there are confusing names for attach/detach functions.

mlx5e_attach_mod_hdr() vs mlx5e_mod_hdr_attach()
mlx5e_detach_mod_hdr() vs mlx5e_mod_hdr_detach()

Add tc prefix to the functions that are in en_tc.c to separate
from the functions in mod_hdr.c which has the mod_hdr prefix.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5e: TC, Pass flow attr to attach/detach mod hdr functions
Roi Dayan [Tue, 20 Sep 2022 14:20:01 +0000 (17:20 +0300)]
net/mlx5e: TC, Pass flow attr to attach/detach mod hdr functions

In preparation to remove duplicate functions handling mod hdr allocation
and the fact that modify hdr should be per flow attr and not flow
pass flow attr to the attach and detach mod hdr funcs.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5e: Add warning when log WQE size is smaller than log stride size
Adham Faris [Sun, 15 Jan 2023 08:21:11 +0000 (10:21 +0200)]
net/mlx5e: Add warning when log WQE size is smaller than log stride size

Add warning macro in the function mlx5e_mpwqe_get_log_num_strides()
when log WQE size is smaller than log stride size. Theoretically this
should not happen.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5e: Fail with messages when params are not valid for XSK
Adham Faris [Sun, 8 Jan 2023 07:45:36 +0000 (09:45 +0200)]
net/mlx5e: Fail with messages when params are not valid for XSK

Current XSK prerequisites validation implementation
(setup.c/mlx5e_validate_xsk_param()) fails silently when xsk
prerequisites are not fulfilled.
Add error messages to the kernel log to help the user understand what
went wrong when params are not valid for XSK.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5: E-switch, Remove redundant comment about meta rules
Roi Dayan [Thu, 17 Nov 2022 09:36:39 +0000 (11:36 +0200)]
net/mlx5: E-switch, Remove redundant comment about meta rules

Meta rules are created/destroyed per vport and not in eswitch
init/destroy.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5: Add hardware extended range support for PTP adjtime and adjphase
Rahul Rameshbabu [Wed, 12 Oct 2022 00:28:10 +0000 (17:28 -0700)]
net/mlx5: Add hardware extended range support for PTP adjtime and adjphase

Capable hardware can use an extended range for offsetting the clock. An
extended range of [-200000,200000] is used instead of [-32768,32767] for
the delta/phase parameter of the adjtime/adjphase ptp_clock_info callbacks.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
18 months agonet/mlx5: Add adjphase function to support hardware-only offset control
Rahul Rameshbabu [Wed, 12 Oct 2022 00:28:10 +0000 (17:28 -0700)]
net/mlx5: Add adjphase function to support hardware-only offset control

The adjtime function supports using hardware to set the clock offset when
the delta was supported by the hardware. When the delta is not supported by
the hardware, the driver handles adjusting the clock. The newly-introduced
adjphase function is similar to the adjtime function, except it guarantees
that a provided clock offset will be used directly by the hardware to
adjust the PTP clock. When the range is not acceptable by the hardware, an
error is returned.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>