review.tizen.org Git - platform/kernel/linux-rpi.git/log

devlink: keep the instance mutex alive until references are gone

The reference needs to keep the instance memory around, but also
the instance lock must remain valid. Users will take the lock,
check registration status and release the lock. mutex_destroy()
etc. belong in the same place as the freeing of the memory.

Unfortunately lockdep_unregister_key() sleeps so we need
to switch the an rcu_work.

Note that the problem is a bit hard to repro, because
devlink_pernet_pre_exit() iterates over registered instances.
AFAIU the instances must get devlink_free()d concurrently with
the namespace getting deleted for the problem to occur.

Reported-by: syzbot+d94d214ea473e218fc89@syzkaller.appspotmail.com
Reported-by: syzbot+9f0dd863b87113935acf@syzkaller.appspotmail.com
Fixes: 9053637e0da7 ("devlink: remove the registration guarantee of references")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230111042908.988199-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: disable ASPM in case of tx timeout

There are still single reports of systems where ASPM incompatibilities
cause tx timeouts. It's not clear whom to blame, so let's disable
ASPM in case of a tx timeout.

v2:
- add one-time warning for informing the user

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/92369a92-dc32-4529-0509-11459ba0e391@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'dt-bindings-first-batch-of-dt-schema-conversions-for-amlogic-meson-bindings'

Neil Armstrong says:

====================
dt-bindings: first batch of dt-schema conversions for Amlogic Meson bindings

Batch conversion of the following bindings:
[...]
- mdio-mux-meson-g12a.txt
====================

Link: https://lore.kernel.org/r/20221117-b4-amlogic-bindings-convert-v2-0-36ad050bb625@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: convert mdio-mux-meson-g12a.txt to dt-schema

Convert MDIO bus multiplexer/glue of Amlogic G12a SoC family bindings
to dt-schema.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2023-01-10' of git://git./linux/kernel/git/saeed/linux

mlx5-updates-2023-01-10

1) From Gal: Add debugfs entries for netdev nic driver
   - ktls, flow steering and hairpin info
   - useful for debug and performance analysis
   - e.g hairpin queue attributes, dump ktls tx pool size, etc

2) From Maher: Update shared buffer configuration on PFC commands
   2.1) For every change of buffer's headroom, recalculate the size of shared
       buffer to be equal to "total_buffer_size" - "new_headroom_size".
       The new shared buffer size will be split in ratio of 3:1 between
       lossy and lossless pools, respectively.

   2.2) For each port buffer change, count the number of lossless buffers.
       If there is only one lossless buffer, then set its lossless pool
       usage threshold to be infinite. Otherwise, if there is more than
       one lossless buffer, set a usage threshold for each lossless buffer.

    While at it, add more verbosity to debug prints when handling user
    commands, to assist in future debug.

3) From Tariq: Throttle high rate FW commands

4) From Shay: Properly initialize management PF

5) Various cleanup patches

Merge branch 'NCN26000-PLCA-RS-support'

Piergiorgio Beruto says:

====================
net: add PLCA RS support and onsemi NCN26000

This patchset adds support for getting/setting the Physical Layer
Collision Avoidace (PLCA) Reconciliation Sublayer (RS) configuration and
status on Ethernet PHYs that supports it.

PLCA is a feature that provides improved media-access performance in terms
of throughput, latency and fairness for multi-drop (P2MP) half-duplex PHYs.
PLCA is defined in Clause 148 of the IEEE802.3 specifications as amended
by 802.3cg-2019. Currently, PLCA is supported by the 10BASE-T1S single-pair
Ethernet PHY defined in the same standard and related amendments. The OPEN
Alliance SIG TC14 defines additional specifications for the 10BASE-T1S PHY,
including a standard register map for PHYs that embeds the PLCA RS (see
PLCA management registers at https://www.opensig.org/about/specifications/).

The changes proposed herein add the appropriate ethtool netlink interface
for configuring the PLCA RS on PHYs that supports it. A separate patchset
further modifies the ethtool userspace program to show and modify the
configuration/status of the PLCA RS.

Additionally, this patchset adds support for the onsemi NCN26000
Industrial Ethernet 10BASE-T1S PHY that uses the newly added PLCA
infrastructure.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/phy: add driver for the onsemi NCN26000 10BASE-T1S PHY

This patch adds support for the onsemi NCN26000 10BASE-T1S industrial
Ethernet PHY. The driver supports Point-to-Multipoint operation without
auto-negotiation and with link control handling. The PHY also features
PLCA for improving performance in P2MP mode.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/phy: add helpers to get/set PLCA configuration

This patch adds support in phylib to read/write PLCA configuration for
Ethernet PHYs that support the OPEN Alliance "10BASE-T1S PLCA
Management Registers" specifications. These can be found at
https://www.opensig.org/about/specifications/

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/phy: add connection between ethtool and phylib for PLCA

This patch adds the required connection between netlink ethtool and
phylib to resolve PLCA get/set config and get status messages.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/phy: add the link modes for the 10BASE-T1S Ethernet PHY

This patch adds the link modes for the IEEE 802.3cg Clause 147 10BASE-T1S
Ethernet PHY. According to the specifications, the 10BASE-T1S supports
Point-To-Point Full-Duplex, Point-To-Point Half-Duplex and/or
Point-To-Multipoint (AKA Multi-Drop) Half-Duplex operations.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ethtool: add netlink interface for the PLCA RS

Add support for configuring the PLCA Reconciliation Sublayer on
multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g.,
10BASE-T1S). This patch adds the appropriate netlink interface
to ethtool.

Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Use kzalloc() in mlx5e_accel_fs_tcp_create()

'accel_tcp' is allocted by kvzalloc() now, which is a small chunk.
Use kzalloc() directly instead of kvzalloc().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: remove redundant ret variable

Return value from mlx5dr_send_postsend_action() directly instead of taking
this in another redundant variable.

Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Replace 0-length array with flexible array

Zero-length arrays are deprecated[1]. Replace struct mlx5e_rx_wqe_cyc's
"data" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_alloc_rq':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:827:42: warning: array subscript f is outside array bounds of 'struct mlx5_wqe_data_seg[0]' [-Warray-bounds=]
  827 |                                 wqe->data[f].byte_count = 0;
      |                                 ~~~~~~~~~^~~
In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.h:11,
                 from drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:48,
                 from drivers/net/ethernet/mellanox/mlx5/core/en_main.c:42:
drivers/net/ethernet/mellanox/mlx5/core/en.h:250:39: note: while referencing 'data'
  250 |         struct mlx5_wqe_data_seg      data[0];
      |                                       ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Replace zero-length array with flexible-array member

Zero-length arrays are deprecated[1] and we are moving towards
adopting C99 flexible-array members instead. So, replace zero-length
array declaration in struct mlx5e_flow_meter_aso_obj with flex-array
member.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [2].

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Link: https://github.com/KSPP/linux/issues/78
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Prevent high-rate FW commands from populating all slots

Certain connection-based device-offload protocols (like TLS) use
per-connection HW objects to track the state, maintain the context, and
perform the offload properly. Some of these objects are created,
modified, and destroyed via FW commands. Under high connection rate,
this type of FW commands might continuously populate all slots of the FW
command interface and throttle it, while starving other critical control
FW commands.

Limit these throttle commands to using only up to a portion (half) of
the FW command interface slots. FW commands maximal rate is not hit, and
the same high rate is still reached when applying this limitation.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Introduce and use opcode getter in command interface

Introduce an opcode getter in the FW command interface, and use it.
Initialize the entry's opcode field early in cmd_alloc_ent() and use it
when possible.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Enable management PF initialization

Enable initialization of DPU Management PF, which is a new loopback PF
designed for communication with BMC.
For now Management PF doesn't support nor require most upper layer
protocols so avoid them.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add hairpin debugfs files

We refer to a TC NIC rule that involves forwarding as "hairpin".
Hairpin queues are mlx5 hardware specific implementation for hardware
forwarding of such packets.

For debug purposes, introduce debugfs files which:
* Expose the number of active hairpins
* Dump the hairpin table
* Allow control over the number and size of the hairpin queues instead
  of the hard-coded values.

This allows us to get visibility of the feature in order to improve it
for next generation hardware.

Add debugfs files:
  fs/tc/hairpin_num_active
  fs/tc/hairpin_num_queues
  fs/tc/hairpin_queue_size
  fs/tc/hairpin_table_dump

Note that the new values will only take effect on the next queues
creation, it does not affect existing queues.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add flow steering debugfs directory

Add a debugfs directory for flow steering related information.
The directory is currently empty, and will hold the 'tc' subdirectory in
a downstream patch.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add hairpin params structure

In preparation for downstream work to expose hairpin queues parameters,
introduce a hairpin parameters struct as part of the tc structure.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: kTLS, Add debugfs

Add TLS debugfs to improve observability by exposing the size of the tls
TX pool.

To observe the size of the TX pool:
$ cat /sys/kernel/debug/mlx5/<pci>/nic/tls/tx/pool_size

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Co-developed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add Ethernet driver debugfs

Similar to the mlx5_core debugfs, lay the groundwork for mlx5e debugfs
files under /sys/kernel/debug/mlx5/<pci>/nic/..

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Update shared buffer along with device buffer changes

Currently, the user can modify device's receive buffer size, modify the
mapping between QoS priority groups to buffers and change the buffer
state to become lossy/lossless via pfc command.

However, the shared receive buffer pool alignments, as a result of
such commands, is performed only when the shared buffer is in FW ownership.
When a user changes the mapping of priority groups or buffer size,
the shared buffer is moved to SW ownership.

Therefore, for devices that support shared buffer, handle the shared buffer
alignments in accordance to user's desired configurations.

Meaning, the following will be performed:
1. For every change of buffer's headroom, recalculate the size of shared
   buffer to be equal to "total_buffer_size" - "new_headroom_size".
   The new shared buffer size will be split in ratio of 3:1 between
   lossy and lossless pools, respectively.

2. For each port buffer change, count the number of lossless buffers.
   If there is only one lossless buffer, then set its lossless pool
   usage threshold to be infinite. Otherwise, if there is more than
   one lossless buffer, set a usage threshold for each lossless buffer.

While at it, add more verbosity to debug prints when handling user
commands, to assist in future debug.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add API to query/modify SBPR and SBCM registers

To allow users to configure shared receive buffer parameters through
dcbnl callbacks, expose an API to query and modify SBPR and SBCM registers,
which will be used in the upcoming patch.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Expose shared buffer registers bits and structs

Add the shared receive buffer management and configuration registers:
1. SBPR - Shared Buffer Pools Register
2. SBCM - Shared Buffer Class Management Register

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

qed: fix a typo in comment

Fix a typo of "permision" which should be "permission".

Signed-off-by: Dai Shixin <dai.shixin@zte.com.cn>
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Link: https://lore.kernel.org/r/202301091935262709751@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mdio-start-separating-c22-and-c45'

Michael Walle says:

====================
net: mdio: Start separating C22 and C45

This patch set starts the separation of C22 and C45 MDIO bus
transactions at the API level to the MDIO Bus drivers. C45 read and
write ops are added to the MDIO bus driver structure, and the MDIO
core will try to use these ops if requested to perform a C45
transfer. If not available a fallback to the older API is made, to
allow backwards compatibility until all drivers are converted.

A few drivers are then converted to this new API.

The core DSA patch was dropped for now as there is still an ongoing
discussion. It will be picked up in a later series again.

v2: https://lore.kernel.org/r/20221227-v6-2-rc1-c45-seperation-v2-0-ddb37710e5a7@walle.cc
v1: https://lore.kernel.org/r/20220508153049.427227-1-andrew@lunn.ch
====================

Link: https://lore.kernel.org/r/20221227-v6-2-rc1-c45-seperation-v3-0-ade1deb438da@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: Separate C22 and C45 transactions

The global2 SMI MDIO bus driver can perform both C22 and C45
transfers. Create separate functions for each and register the C45
versions using the new API calls where appropriate. Update the SERDES
code to make use of these new accessors.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: add mdiobus_c45_read/write_nested helpers

Some DSA devices pass through PHY access to the MDIO bus the switch is
on. Add C45 versions of the current C22 helpers for nested accesses to
MDIO busses, so that C22 and C45 can be separated in these DSA
drivers.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fec: Separate C22 and C45 transactions

The fec MDIO bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls where appropriate.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: xgmac_mdio: Separate C22 and C45 transactions

The xgmac MDIO bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls where appropriate.

While at it, remove the misleading comment. According to Vladimir
Oltean:
- miimcom is a register accessed by fsl_pq_mdio.c, not by xgmac_mdio.c
- "device dev" doesn't really refer to anything (maybe "dev_addr").
- I don't understand what is meant by the comment "All PHY
   configuration has to be done through the TSEC1 MIIM regs". Or rather
   said, I think I understand, but it is irrelevant to the driver for 2
   reasons:
    * TSEC devices use the fsl_pq_mdio.c driver, not this one
    * It doesn't matter to this driver whose TSEC registers are used for
      MDIO access. The driver just works with the registers it's given,
      which is a concern for the device tree.
- barring the above, the rest just describes the MDIO bus API, which is
   superfluous

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mvmdio: Convert XSMI bus to new API

The marvell MDIO driver supports two different hardware blocks. The
XSMI block is C45 only. Convert this block to the new API, and only
populate the c45 calls in the bus structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mdio-bitbang: Separate C22 and C45 transactions

The bitbbanging bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new driver API calls.

The SH Ethernet driver places wrappers around these functions. In
order to not break boards which might be using C45, add similar
wrappers for C45 operations.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: Move mdiobus_c45_addr() next to users

Now that mdiobus_c45_addr() is only used within the MDIO code during
fallback, move the function next to its only users. This function
should not be used any more in drivers, the c45 helpers should be used
in its place, so hiding it away will prevent any new users from being
added.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: C22 is now optional, EOPNOTSUPP if not provided

When performing a C22 operation, check that the bus driver actually
provides the methods, and return -EOPNOTSUPP if not. C45 only busses
do exist, and in future their C22 methods will be NULL.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mdiobus_register: update validation test

Now that C45 uses its own read/write methods, the validation performed
when a bus is registers needs updating. All combinations of C22 and
C45 are supported, but both read and write methods must be provided,
read only busses are not supported etc.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pcs: pcs-xpcs: Use C45 MDIO API

Convert the PCS-XPCS driver to make use of the C45 MDIO bus API for
modify_change().

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: Add dedicated C45 API to MDIO bus drivers

Currently C22 and C45 transactions are mixed over a combined API calls
which make use of a special bit in the reg address to indicate if a
C45 transaction should be performed. This makes it impossible to know
if the bus driver actually supports C45. Additionally, many C22 only
drivers don't return -EOPNOTSUPP when asked to perform a C45
transaction, they mistaking perform a C22 transaction.

This is the first step to cleanly separate C22 from C45. To maintain
backwards compatibility until all drivers which are capable of
performing C45 are converted to this new API, the helper functions
will fall back to the older API if the new API is not
supported. Eventually this fallback will be removed.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-mxl-gpy-broken-interrupt-fixes'

Michael Walle says:

====================
net: phy: mxl-gpy: broken interrupt fixes

The GPY215 has a broken interrupt pin. This patch series tries to
workaround that and because in general that is not possible, disables the
interrupts by default and falls back to polling mode. There is an opt-in
via the devicetree.

====================

Link: https://lore.kernel.org/r/20230109123013.3094144-1-michael@walle.cc
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: mxl-gpy: disable interrupts on GPY215 by default

The interrupts on the GPY215B and GPY215C are broken and the only viable
fix is to disable them altogether. There is still the possibilty to
opt-in via the device tree.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: allow a phy to opt-out of interrupt handling

Until now, it is not possible for a PHY driver to disable interrupts
during runtime. If a driver offers the .config_intr() as well as the
.handle_interrupt() ops, it is eligible for interrupt handling.
Introduce a new flag for the dev_flags property of struct phy_device, which
can be set by PHY driver to skip interrupt setup and fall back to polling
mode.

At the moment, this is used for the MaxLinear PHY which has broken
interrupt handling and there is a need to disable interrupts in some
cases.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: phy: add MaxLinear GPY2xx bindings

Add the device tree bindings for the MaxLinear GPY2xx PHYs, which
essentially adds just one flag: maxlinear,use-broken-interrupts.

One might argue, that if interrupts are broken, just don't use
the interrupt property in the first place. But it needs to be more
nuanced. First, this interrupt line is also used to wake up systems by
WoL, which has nothing to do with the (broken) PHY interrupt handling.

Second and more importantly, there are devicetrees which have this
property set. Thus, within the driver we have to switch off interrupt
handling by default as a workaround. But OTOH, a systems designer who
knows the hardware and knows there are no shared interrupts for example,
can use this new property as a hint to the driver that it can enable the
interrupt nonetheless.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: vendor-prefixes: add MaxLinear

MaxLinear is a manufacturer of integrated circuits.
https://www.maxlinear.com

Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'mv88e6xxx-add-mab-offload-support'

Hans J. Schultz says:

====================
mv88e6xxx: Add MAB offload support

This patch-set adds MAB [1] offload support in mv88e6xxx.

Patch #1: Correct default return value for mv88e6xxx_port_bridge_flags.

Patch #2: Shorten the locked section in
mv88e6xxx_g1_atu_prob_irq_thread_fn().

Patch #3: The MAB implementation for mv88e6xxx.

[1] https://git.kernel.org/netdev/net-next/c/4bf24ad09bc0
====================

Link: https://lore.kernel.org/r/20230108094849.1789162-1-netdev@kapio-technology.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mv88e6xxx: mac-auth/MAB implementation

This implementation for the Marvell mv88e6xxx chip series is based on
handling ATU miss violations occurring when packets ingress on a port
that is locked with learning on. This will trigger a
SWITCHDEV_FDB_ADD_TO_BRIDGE event, which will result in the bridge module
adding a locked FDB entry. This bridge FDB entry will not age out as
it has the extern_learn flag set.

Userspace daemons can listen to these events and either accept or deny
access for the host, by either replacing the locked FDB entry with a
simple entry or leave the locked entry.

If the host MAC address is already present on another port, a ATU
member violation will occur, but to no real effect, and the packet will
be dropped in hardware. Statistics on these violations can be shown with
the command and example output of interest:

ethtool -S ethX
NIC statistics:
...
atu_member_violation: 5
atu_miss_violation: 23
...

Where ethX is the interface of the MAB enabled port.

Furthermore, as added vlan interfaces where the vid is not added to the
VTU will cause ATU miss violations reporting the FID as
MV88E6XXX_FID_STANDALONE, we need to check and skip the miss violations
handling in this case.

Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mv88e6xxx: shorten the locked section in mv88e6xxx_g1_atu_prob_irq_thread_fn()

As only the hardware access functions up til and including
mv88e6xxx_g1_atu_mac_read() called under the interrupt handler
need to take the chip lock, we release the chip lock after this call.
The follow up code that handles the violations can run without the
chip lock held.
In further patches, the violation handler function will even be
incompatible with having the chip lock held. This due to an AB/BA
ordering inversion with rtnl_lock().

Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mv88e6xxx: change default return of mv88e6xxx_port_bridge_flags

The default return value -EOPNOTSUPP of mv88e6xxx_port_bridge_flags()
came from the return value of the DSA method port_egress_floods() in
commit 4f85901f0063 ("net: dsa: mv88e6xxx: add support for bridge flags"),
but the DSA API was changed in commit a8b659e7ff75 ("net: dsa: act as
passthrough for bridge port flags"), resulting in the return value
-EOPNOTSUPP not being valid anymore, and sections for new flags will not
need to set the return value to zero on success, as with the new mab flag
added in a following patch.

Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

amd-xgbe: Add support for 10 Mbps speed

Add the necessary changes to support 10 Mbps speed for BaseT and SFP
port modes. This is supported in MAC ver >= 30H.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://lore.kernel.org/r/20230109101819.747572-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

e1000e: Enable Link Partner Advertised Support

This enables link partner advertised support to show link modes and
pause frame use.

Signed-off-by: Jamie Gloudon <jamie.gloudon@gmx.fr>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230103230653.1102544-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: skb: remove old comments about frag_size for build_skb()

Since commit ce098da1497c ("skbuff: Introduce slab_build_skb()")
drivers trying to build skb around slab-backed buffers should
go via slab_build_skb() rather than passing frag_size = 0 to
the main build_skb().

Remove the copy'n'pasted comments about 0 meaning slab.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8152-NCM-firmwares'

Bjørn Mork says:

====================
r8152: allow firmwares with NCM support

Some device and firmware combinations with NCM support will
end up using the cdc_ncm driver by default.  This is sub-
optimal for the same reasons we've previously accepted the
blacklist hack in cdc_ether.

The recent support for subclassing the generic USB device
driver allows us to create a very slim driver with the same
functionality.  This patch set uses that to implement a
device specific configuration default which is independent
of any USB interface drivers.  This means that it works
equally whether the device initially ends up in NCM or ECM
mode, without depending on any code in the respective class
drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cdc_ether: no need to blacklist any r8152 devices

The r8152 driver does not need this anymore.

Dropping blacklist entries adds optional support for these
devices in ECM mode.

The 8153 devices are handled by the r8153_ecm driver when
in ECM mode, and must still be blacklisted here.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: add USB device driver for config selection

Subclassing the generic USB device driver to override the
default configuration selection regardless of matching interface
drivers.

The r815x family devices expose a vendor specific function which
the r8152 interface driver wants to handle.  This is the preferred
device mode. Additionally one or more USB class functions are
usually supported for hosts lacking a vendor specific driver. The
choice is USB configuration based, with one alternate function per
configuration.

Example device with both NCM and ECM alternate cfgs:

T:  Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  4 Spd=5000 MxCh= 0
D:  Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs=  3
P:  Vendor=0bda ProdID=8156 Rev=31.00
S:  Manufacturer=Realtek
S:  Product=USB 10/100/1G/2.5G LAN
S:  SerialNumber=001000001
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=256mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=00 Driver=r8152
E:  Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=83(I) Atr=03(Int.) MxPS=   2 Ivl=128ms
C:  #Ifs= 2 Cfg#= 2 Atr=a0 MxPwr=256mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0d Prot=00 Driver=
E:  Ad=83(I) Atr=03(Int.) MxPS=  16 Ivl=128ms
I:  If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=01 Driver=
I:  If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=01 Driver=
E:  Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
C:  #Ifs= 2 Cfg#= 3 Atr=a0 MxPwr=256mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=
E:  Ad=83(I) Atr=03(Int.) MxPS=  16 Ivl=128ms
I:  If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=
I:  If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=
E:  Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms

A problem with this is that Linux will prefer class functions over
vendor specific functions. Using the above example, Linux defaults
to cfg #2, running the device in a sub-optimal NCM mode.

Previously we've attempted to work around the problem by
blacklisting the devices in the ECM class driver "cdc_ether", and
matching on the ECM class function in the vendor specific interface
driver. The latter has been used to switch back to the vendor
specific configuration when the driver is probed for a class
function.

This workaround has several issues;
- class driver blacklists is additional maintanence cruft in an
  unrelated driver
- class driver blacklists prevents users from optionally running
  the devices in class mode
- each device needs double match entries in the vendor driver
- the initial probing as a class function slows down device
  discovery

Now these issues have become even worse with the introduction of
firmware supporting both NCM and ECM, where NCM ends up as the
default mode in Linux. To use the same workaround, we now have
to blacklist the devices in to two different class drivers and
add yet another match entry to the vendor specific driver.

This patch implements an alternative workaround strategy -
independent of the interface drivers.  It avoids adding a
blacklist to the cdc_ncm driver and will let us remove the
existing blacklist from the cdc_ether driver.

As an additional bonus, removing the blacklists allow users to
select one of the other device modes if wanted.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mptcp-next'

Mat Martineau says:

====================
mptcp: Protocol in-use tracking and code cleanup

Here's a collection of commits from the MPTCP tree:

Patches 1-4 and 6 contain miscellaneous code cleanup for more consistent
use of helper functions, existing local variables, and better naming.

Patches 5, 7, and 9 add sock_prot_inuse tracking for MPTCP and an
associated self test.

Patch 8 modifies the mptcp_connect self test tool to exit on SIGUSR1
when in "slow mode".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftest: mptcp: add test for mptcp socket in use

Add the function chk_msk_inuse() to diag.sh, which is used to check the
statistics of mptcp socket in use. As mptcp socket in listen state will
be closed randomly after 'accept', we need to get the count of listening
mptcp socket through 'ss' command.

All tests pass.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftest: mptcp: exit from copyfd_io_poll() when receive SIGUSR1

For now, mptcp_connect won't exit after receiving the 'SIGUSR1' signal
if '-r' is set. Fix this by skipping poll and sleep in copyfd_io_poll()
if 'quit' is set.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: add statistics for mptcp socket in use

Do the statistics of mptcp socket in use with sock_prot_inuse_add().
Therefore, we can get the count of used mptcp socket from
/proc/net/protocols:

& cat /proc/net/protocols
protocol  size sockets  memory press maxhdr  slab module     cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
MPTCPv6   2048      0       0   no       0   yes  kernel      y  n  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
MPTCP     1896      1       0   no       0   yes  kernel      y  n  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: rename 'sk' to 'ssk' in mptcp_token_new_connect()

'ssk' should be more appropriate to be the name of the first argument
in mptcp_token_new_connect().

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: init sk->sk_prot in build_msk()

The 'sk_prot' field in token KUNIT self-tests will be dereferenced in
mptcp_token_new_connect(). Therefore, init it with tcp_prot.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: introduce 'sk' to replace 'sock->sk' in mptcp_listen()

'sock->sk' is used frequently in mptcp_listen(). Therefore, we can
introduce the 'sk' and replace 'sock->sk' with it.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: use local variable ssk in write_options

The local variable 'ssk' has been defined at the beginning of the function
mptcp_write_options(), use it instead of getting 'ssk' again.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: use net instead of sock_net

Use the local variable 'net' instead of sock_net() in the functions where
the variable 'struct net *net' has been defined.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: use msk_owned_by_me helper

The helper msk_owned_by_me() is defined in protocol.h, so use it instead
of sock_owned_by_me().

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: optimize usbnet_bh() to reduce CPU load

The current source pushes skb into dev-done queue by calling
skb_dequeue_tail() and then pop it by skb_dequeue() to branch to
rx_cleanup state for freeing urb/skb in usbnet_bh(). It takes extra CPU
load, 2.21% (skb_queue_tail) as follows,

-   11.58%     0.26%  swapper          [k] usbnet_bh
   - 11.32% usbnet_bh
      - 6.43% skb_dequeue
           6.34% _raw_spin_unlock_irqrestore
      - 2.21% skb_queue_tail
           2.19% _raw_spin_unlock_irqrestore
      - 1.68% consume_skb
         - 0.97% kfree_skbmem
              0.80% kmem_cache_free
           0.53% skb_release_data

To reduce the extra CPU load use return values to call helper function
usb_free_skb() to free the resources instead of calling skb_queue_tail()
and skb_dequeue() for push and pop respectively.

-    7.87%     0.25%  swapper          [k] usbnet_bh
   - 7.62% usbnet_bh
      - 4.81% skb_dequeue
           4.74% _raw_spin_unlock_irqrestore
      - 1.75% consume_skb
         - 0.98% kfree_skbmem
              0.78% kmem_cache_free
           0.58% skb_release_data
        0.53% smsc95xx_rx_fixup

Signed-off-by: Leesoo Ahn <lsahn@ooseel.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy-micrel-warnings'

Divya Koppera says:

====================
Fixed warnings

Fixed warnings related to PTR_ERR and initialization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: Fix warn: passing zero to PTR_ERR

Handle the NULL pointer case

Fixes New smatch warnings:
drivers/net/phy/micrel.c:2613 lan8814_ptp_probe_once() warn: passing zero to 'PTR_ERR'

vim +/PTR_ERR +2613 drivers/net/phy/micrel.c
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: Fixed error related to uninitialized symbol ret

Initialized return variable

Fixes Old smatch warnings:
drivers/net/phy/micrel.c:1750 ksz886x_cable_test_get_status() error:
uninitialized symbol 'ret'.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-wangxun-adjust-code-structure'

Jiawen Wu says:

====================
net: wangxun: Adjust code structure

Remove useless structs 'txgbe_hw' and 'ngbe_hw' make the codes clear.
And move the same codes which sets MAC address between txgbe and ngbe
to libwx. Further more, rename struct 'wx_hw' to 'wx' and move total
adapter members to wx.
====================

Link: https://lore.kernel.org/r/20230106033853.2806007-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ngbe: Remove structure ngbe_adapter

Move the total private structure to libwx.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: Remove structure txgbe_adapter

Move the total private structure to libwx.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wangxun: Rename private structure in libwx

In order to move the total members in struct adapter to struct wx_hw
to keep the code clean, it's a bad name of 'wx_hw' only for hardware.
Rename 'wx_hw' to 'wx', and rename the pointers at use.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wangxun: Move MAC address handling to libwx

For setting MAC address, both txgbe and ngbe drivers have the same handling
flow with different parameters. Move the same codes to libwx.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ngbe: Move defines into unified file

Remove ngbe.h, move defines into ngbe_type.h file.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: Move defines into unified file

Remove txgbe.h, move defines into txgbe_type.h file.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ngbe: Remove structure ngbe_hw

Remove useless structure ngbe_hw to make the codes clear.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: Remove structure txgbe_hw

Remove useless structure txgbe_hw to make the codes clear.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: micrel: Change handler interrupt for lan8814

The lan8814 represents a package of 4 PHYs. All of them are sharing the
same interrupt line. So when a link was going down/up or a frame was
timestamped, then the interrupt handler of all the PHYs was called.
Which is all fine and expected but the problem is the way the handler
interrupt works.
Basically if one of the PHYs timestamp a frame, then all the other 3
PHYs were polling the status of the interrupt until that PHY actually
cleared the interrupt by reading the timestamp.
The reason of polling was in case another PHY was also timestamping a
frame at the same time, it could miss this interrupt. But this is not
the right approach, because it is the interrupt controller who needs to
call the interrupt handlers again if the interrupt line is still
active.
Therefore change this such when the interrupt handler is called check
only if the interrupt is for itself, otherwise just exit. In this way
save CPU usage.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230104194218.3785229-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: Replace 0-length array with flexible array

Zero-length arrays are deprecated[1]. Replace struct ethtool_rxnfc's
"rule_locs" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:

net/ethtool/common.c: In function 'ethtool_get_max_rxnfc_channel':
net/ethtool/common.c:558:55: warning: array subscript i is outside array bounds of '__u32[0]' {aka 'unsigned int[]'} [-Warray-bounds=]
  558 |                         .fs.location = info->rule_locs[i],
      |                                        ~~~~~~~~~~~~~~~^~~
In file included from include/linux/ethtool.h:19,
                 from include/uapi/linux/ethtool_netlink.h:12,
                 from include/linux/ethtool_netlink.h:6,
                 from net/ethtool/common.c:3:
include/uapi/linux/ethtool.h:1186:41: note: while referencing
'rule_locs'
1186 |         __u32                           rule_locs[0];
      |                                         ^~~~~~~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Andrew Lunn <andrew@lunn.ch>
Cc: kernel test robot <lkp@intel.com>
Cc: Oleksij Rempel <linux@rempel-privat.de>
Cc: Sean Anderson <sean.anderson@seco.com>
Cc: Alexandru Tachici <alexandru.tachici@analog.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230106042844.give.885-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipv6: rpl_iptunnel: Replace 0-length arrays with flexible arrays

Zero-length arrays are deprecated[1]. Replace struct ipv6_rpl_sr_hdr's
"segments" union of 0-length arrays with flexible arrays. Detected with
GCC 13, using -fstrict-flex-arrays=3:

In function 'rpl_validate_srh',
    inlined from 'rpl_build_state' at ../net/ipv6/rpl_iptunnel.c:96:7:
../net/ipv6/rpl_iptunnel.c:60:28: warning: array subscript <unknown> is outside array bounds of 'struct in6_addr[0]' [-Warray-bounds=]
   60 |         if (ipv6_addr_type(&srh->rpl_segaddr[srh->segments_left - 1]) &
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../include/net/rpl.h:12,
                 from ../net/ipv6/rpl_iptunnel.c:13:
../include/uapi/linux/rpl.h: In function 'rpl_build_state':
../include/uapi/linux/rpl.h:40:33: note: while referencing 'addr'
   40 |                 struct in6_addr addr[0];
      |                                 ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230105221533.never.711-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: ioam: Replace 0-length array with flexible array

Zero-length arrays are deprecated[1]. Replace struct ioam6_trace_hdr's
"data" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:

net/ipv6/ioam6_iptunnel.c: In function 'ioam6_build_state':
net/ipv6/ioam6_iptunnel.c:194:37: warning: array subscript <unknown> is outside array bounds of '__u8[0]' {aka 'unsigned char[]'} [-Warray-bounds=]
  194 |                 tuninfo->traceh.data[trace->remlen * 4] = IPV6_TLV_PADN;
      |                 ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
In file included from include/linux/ioam6.h:11,
                 from net/ipv6/ioam6_iptunnel.c:13:
include/uapi/linux/ioam6.h:130:17: note: while referencing 'data'
  130 |         __u8    data[0];
      |                 ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Justin Iurman <justin.iurman@uliege.be>
Tested-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://lore.kernel.org/r/20230105222115.never.661-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'devlink-unregister'

Jakub Kicinski says:

====================
devlink: remove the wait-for-references on unregister

Move the registration and unregistration of the devlink instances
under their instance locks. Don't perform the netdev-style wait
for all references when unregistering the instance.

Instead the devlink instance refcount will only ensure that
the memory of the instance is not freed. All places which acquire
access to devlink instances via a reference must check that the
instance is still registered under the instance lock.

This fixes the problem of the netdev code accessing devlink
instances before they are registered.

RFC: https://lore.kernel.org/all/20221217011953.152487-1-kuba@kernel.org/
- rewrite the cover letter
- rewrite the commit message for patch 1
- un-export and rename devl_is_alive
- squash the netdevsim patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: move devlink registration under the instance lock

To prevent races with netdev code accessing free devlink instances
move the registration under the devlink instance lock.
Core now waits for the instance to be registered before accessing it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: rename a label

err_dl_unregister should unregister the devlink instance.
Looks like renaming it was missed in one of the reshufflings.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: allow registering parameters after the instance

It's most natural to register the instance first and then its
subobjects. Now that we can use the instance lock to protect
the atomicity of all init - it should also be safe.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: don't require setting features before registration

Requiring devlink_set_features() to be run before devlink is
registered is overzealous. devlink_set_features() itself is
a leftover from old workarounds which were trying to prevent
initiating reload before probe was complete.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: remove the registration guarantee of references

The objective of exposing the devlink instance locks to
drivers was to let them use these locks to prevent user space
from accessing the device before it's fully initialized.
This is difficult because devlink_unregister() waits for all
references to be released, meaning that devlink_unregister()
can't itself be called under the instance lock.

To avoid this issue devlink_register() was moved after subobject
registration a while ago. Unfortunately the netdev paths get
a hold of the devlink instances _before_ they are registered.
Ideally netdev should wait for devlink init to finish (synchronizing
on the instance lock). This can't work because we don't know if the
instance will _ever_ be registered (in case of failures it may not).
The other option of returning an error until devlink_register()
is called is unappealing (user space would get a notification
netdev exist but would have to wait arbitrary amount of time
before accessing some of its attributes).

Weaken the guarantees of the devlink references.

Holding a reference will now only guarantee that the memory
of the object is around. Another way of looking at it is that
the reference now protects the object not its "registered" status.
Use devlink instance lock to synchronize unregistration.

This implies that releasing of the "main" reference of the devlink
instance moves from devlink_unregister() to devlink_free().

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: always check if the devlink instance is registered

Always check under the instance lock whether the devlink instance
is still / already registered.

This is a no-op for the most part, as the unregistration path currently
waits for all references. On the init path, however, we may temporarily
open up a race with netdev code, if netdevs are registered before the
devlink instance. This is temporary, the next change fixes it, and this
commit has been split out for the ease of review.

Note that in case of iterating over sub-objects which have their
own lock (regions and line cards) we assume an implicit dependency
between those objects existing and devlink unregistration.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: protect devlink->dev by the instance lock

devlink->dev is assumed to be always valid as long as any
outstanding reference to the devlink instance exists.

In prep for weakening of the references take the instance lock.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: update the code in netns move to latest helpers

devlink_pernet_pre_exit() is the only obvious place which takes
the instance lock without using the devl_ helpers. Update the code
and move the error print after releasing the reference
(having unlock and put together feels slightly idiomatic).

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: bump the instance index directly when iterating

xa_find_after() is designed to handle multi-index entries correctly.
If a xarray has two entries one which spans indexes 0-3 and one at
index 4 xa_find_after(0) will return the entry at index 4.

Having to juggle the two callbacks, however, is unnecessary in case
of the devlink xarray, as there is 1:1 relationship with indexes.

Always use xa_find() and increment the index manually.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sysctl: expose all net/core sysctls inside netns

All were not visible to the non-priv users inside netns. However,
with 4ecb90090c84 ("sysctl: allow override of /proc/sys/net with
CAP_NET_ADMIN"), these vars are protected from getting modified.
A proc with capable(CAP_NET_ADMIN) can change the values so
not having them visible inside netns is just causing nuisance to
process that check certain values (e.g. net.core.somaxconn) and
see different behavior in root-netns vs. other-netns

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'devlink-code-split-and-structured-instance-walk'

Jakub Kicinski says:

====================
devlink: code split and structured instance walk

Split devlink.c into a handful of files, trying to keep the "core"
code away from all the command-specific implementations.
The core code has been quite scattered until now. Going forward we can
consider using a source file per-subobject, I think that it's quite
beneficial to newcomers (based on relative ease with which folks
contribute to ethtool vs devlink). But this series doesn't split
everything out, yet - partially due to backporting concerns,
but mostly due to lack of time. Bulk of the netlink command
handling is left in a leftover.c file.

Introduce a context structure for dumps, and use it to store
the devlink instance ID of the last dumped devlink instance.
This means we don't have to restart the walk from 0 each time.

Finally - introduce a "structured walk". A centralized dump handler
in devlink/netlink.c which walks the devlink instances, deals with
refcounting/locking, simplifying the per-object implementations quite
a bit. Inspired by the ethtool code.

v1: https://lore.kernel.org/all/20230104041636.226398-1-kuba@kernel.org/
RFC: https://lore.kernel.org/all/20221215020155.1619839-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20230105040531.353563-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: convert remaining dumps to the by-instance scheme

Soon we'll have to check if a devlink instance is alive after
locking it. Convert to the by-instance dumping scheme to make
refactoring easier.

Most of the subobject code no longer has to worry about any devlink
locking / lifetime rules (the only ones that still do are the two subject
types which stubbornly use their own locking). Both dump and do callbacks
are given a devlink instance which is already locked and good-to-access
(do from the .pre_doit handler, dump from the new dump indirection).

Note that we'll now check presence of an op (e.g. for sb_pool_get)
under the devlink instance lock, that will soon be necessary anyway,
because we don't hold refs on the driver modules so the memory
in which ops live may be gone for a dead instance, after upcoming
locking changes.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: add by-instance dump infra

Most dumpit implementations walk the devlink instances.
This requires careful lock taking and reference dropping.
Factor the loop out and provide just a callback to handle
a single instance dump.

Convert one user as an example, other users converted
in the next change.

Slightly inspired by ethtool netlink code.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: uniformly take the devlink instance lock in the dump loop

Move the lock taking out of devlink_nl_cmd_region_get_devlink_dumpit().
This way all dumps will take the instance lock in the main iteration
loop directly, making refactoring and reading the code easier.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: restart dump based on devlink instance ids (function)

Use xarray id for cases of sub-objects which are iterated in
a function.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: restart dump based on devlink instance ids (nested)

Use xarray id for cases of simple sub-object iteration.
We'll now use the state->instance for the devlink instances
and state->idx for subobject index.

Moving the definition of idx into the inner loop makes sense,
so while at it also move other sub-object local variables into
the loop.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: restart dump based on devlink instance ids (simple)

xarray gives each devlink instance an id and allows us to restart
walk based on that id quite neatly. This is nice both from the
perspective of code brevity and from the stability of the dump
(devlink instances disappearing from before the resumption point
will not cause inconsistent dumps).

This patch takes care of simple cases where state->idx counts
devlink instances only.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: health: combine loops in dump

Walk devlink instances only once. Dump the instance reporters
and port reporters before moving to the next instance.
User space should not depend on ordering of messages.

This will make improving stability of the walk easier.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>