platform/kernel/linux-rpi.git
2 years agogenetlink: use iterator in the op to policy map dumping
Jakub Kicinski [Fri, 4 Nov 2022 19:13:40 +0000 (12:13 -0700)]
genetlink: use iterator in the op to policy map dumping

We can't put the full iterator in the struct ctrl_dump_policy_ctx
because dump context is statically sized by netlink core.
Allocate it dynamically.

Rename policy to dump_map to make the logic a little easier to follow.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: add iterator for walking family ops
Jakub Kicinski [Fri, 4 Nov 2022 19:13:39 +0000 (12:13 -0700)]
genetlink: add iterator for walking family ops

Subsequent changes will expose split op structures to users,
so walking the family ops with just an index will get harder.
Add a structured iterator, convert the simple cases.
Policy dumping needs more careful conversion.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: inline genl_get_cmd()
Jakub Kicinski [Fri, 4 Nov 2022 19:13:38 +0000 (12:13 -0700)]
genetlink: inline genl_get_cmd()

All callers go via genl_get_cmd_split() now, so rename it
to genl_get_cmd() remove the original.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: support split policies in ctrl_dumppolicy_put_op()
Jakub Kicinski [Fri, 4 Nov 2022 19:13:37 +0000 (12:13 -0700)]
genetlink: support split policies in ctrl_dumppolicy_put_op()

Pass do and dump versions of the op to ctrl_dumppolicy_put_op()
so that it can provide a different policy index for the two.

Since we now look at policies, and those are set appropriately
there's no need to look at the GENL_DONT_VALIDATE_DUMP flag.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: add policies for both doit and dumpit in ctrl_dumppolicy_start()
Jakub Kicinski [Fri, 4 Nov 2022 19:13:36 +0000 (12:13 -0700)]
genetlink: add policies for both doit and dumpit in ctrl_dumppolicy_start()

Separate adding doit and dumpit policies for CTRL_CMD_GETPOLICY.
This has no effect until we actually allow do and dump to come
from different sources as netlink_policy_dump_add_policy()
does deduplication.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: check for callback type at op load time
Jakub Kicinski [Fri, 4 Nov 2022 19:13:35 +0000 (12:13 -0700)]
genetlink: check for callback type at op load time

Now that genl_get_cmd_split() is informed what type of callback
user is trying to access (do or dump) we can check that this
callback is indeed available and return an error early.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: load policy based on validation flags
Jakub Kicinski [Fri, 4 Nov 2022 19:13:34 +0000 (12:13 -0700)]
genetlink: load policy based on validation flags

Set the policy and maxattr pointers based on validation flags.
genl_family_rcv_msg_attrs_parse() will do nothing and return NULL
if maxattrs is zero, so no behavior change is expected.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: introduce split op representation
Jakub Kicinski [Fri, 4 Nov 2022 19:13:33 +0000 (12:13 -0700)]
genetlink: introduce split op representation

We currently have two forms of operations - small ops and "full" ops
(or just ops). The former does not have pointers for some of the less
commonly used features (namely dump start/done and policy).

The "full" ops, however, still don't contain all the necessary
information. In particular the policy is per command ID, while
do and dump often accept different attributes. It's also not
possible to define different pre_doit and post_doit callbacks
for different commands within the family.

At the same time a lot of commands do not support dumping and
therefore all the dump-related information is wasted space.

Create a new command representation which can hold info about
a do implementation or a dump implementation, but not both at
the same time.

Use this new representation on the command execution path
(genl_family_rcv_msg) as we either run a do or a dump and
don't have to create a "full" op there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: move the private fields in struct genl_family
Jakub Kicinski [Fri, 4 Nov 2022 19:13:32 +0000 (12:13 -0700)]
genetlink: move the private fields in struct genl_family

Move the private fields down to form a "private section".
Use the kdoc "private:" label comment thing to hide them
from the main kdoc comment.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogenetlink: refactor the cmd <> policy mapping dump
Jakub Kicinski [Fri, 4 Nov 2022 19:13:31 +0000 (12:13 -0700)]
genetlink: refactor the cmd <> policy mapping dump

The code at the top of ctrl_dumppolicy() dumps mappings between
ops and policies. It supports dumping both the entire family and
single op if dump is filtered. But both of those cases are handled
inside a loop, which makes the logic harder to follow and change.
Refactor to split the two cases more clearly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'am65-cpsw-suspend-resume'
David S. Miller [Mon, 7 Nov 2022 12:20:03 +0000 (12:20 +0000)]
Merge branch 'am65-cpsw-suspend-resume'

Roger Quadros says:

====================
net: ethernet: ti: am65-cpsw: Add suspend/resume support

This series enables PM_SLEEP(suspend/resume) support to
the am65-cpsw network driver.

Dual-emac and Switch mode are tested to work with suspend/resume
on AM62-SK platform.

It can be verified on the following branch
https://github.com/rogerq/linux/commits/for-v6.2/am62-cpsw-lpm-1.0
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw: Fix hardware switch mode on suspend/resume
Roger Quadros [Fri, 4 Nov 2022 13:23:10 +0000 (15:23 +0200)]
net: ethernet: ti: am65-cpsw: Fix hardware switch mode on suspend/resume

On low power during system suspend the ALE table context is lost.
Save the ALE contect before suspend and restore it after resume.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw: retain PORT_VLAN_REG after suspend/resume
Roger Quadros [Fri, 4 Nov 2022 13:23:09 +0000 (15:23 +0200)]
net: ethernet: ti: am65-cpsw: retain PORT_VLAN_REG after suspend/resume

During suspend resume the context of PORT_VLAN_REG is lost so
save it during suspend and restore it during resume for
host port and slave ports.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: cpsw_ale: Add cpsw_ale_restore() helper
Roger Quadros [Fri, 4 Nov 2022 13:23:08 +0000 (15:23 +0200)]
net: ethernet: ti: cpsw_ale: Add cpsw_ale_restore() helper

This can be used by device driver to restore ALE context.
The data produced by cpsw_ale_dump() can be passed to
cpsw_ale_restore().

This is required as on certain platforms the ALE context
is lost on low power suspend/resume.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw: Add suspend/resume support
Roger Quadros [Fri, 4 Nov 2022 13:23:07 +0000 (15:23 +0200)]
net: ethernet: ti: am65-cpsw: Add suspend/resume support

Add PM handlers for System suspend/resume.

As DMA driver doesn't yet support suspend/resume we free up
the DMA channels at suspend and acquire and initialize them
at resume.

Move the init/free dma calls to ndo_open/close() hooks so
it is symmetric and easier to invoke from suspend/resume handler.

As CPTS looses contect during suspend/resume, invoke the
necessary CPTS suspend/resume helpers.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: ti: am65-cpsw/cpts: Add suspend/resume helpers
Roger Quadros [Fri, 4 Nov 2022 13:23:06 +0000 (15:23 +0200)]
net: ethernet: ti: am65-cpsw/cpts: Add suspend/resume helpers

CPTS looses context on suspend (e.g. on AM62).
Provide suspend/resume hooks in CPTS driver. These will be
invoked by CPSW driver if CPTS was instantiated by CPSW.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: fix yt8521 duplicated argument to & or |
Frank [Fri, 4 Nov 2022 08:44:41 +0000 (16:44 +0800)]
net: phy: fix yt8521 duplicated argument to & or |

cocci warnings: (new ones prefixed by >>)
>> drivers/net/phy/motorcomm.c:1122:8-35: duplicated argument to & or |
  drivers/net/phy/motorcomm.c:1126:8-35: duplicated argument to & or |
  drivers/net/phy/motorcomm.c:1130:8-34: duplicated argument to & or |
  drivers/net/phy/motorcomm.c:1134:8-34: duplicated argument to & or |

 The second YT8521_RC1R_GE_TX_DELAY_xx should be YT8521_RC1R_FE_TX_DELAY_xx.

Fixes: 70479a40954c ("net: phy: Add driver for Motorcomm yt8521 gigabit ethernet phy")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Frank <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agogve: Fix error return code in gve_prefill_rx_pages()
Yang Yingliang [Fri, 4 Nov 2022 06:17:36 +0000 (14:17 +0800)]
gve: Fix error return code in gve_prefill_rx_pages()

If alloc_page() fails in gve_prefill_rx_pages(), it should return
an error code in the error path.

Fixes: 82fd151d38d9 ("gve: Reduce alloc and copy costs in the GQ rx path")
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Catherine Sullivan <csully@google.com>
Cc: Shailend Chand <shailend@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: fec: simplify the code logic of quirks
Shenwei Wang [Fri, 4 Nov 2022 02:47:54 +0000 (21:47 -0500)]
net: fec: simplify the code logic of quirks

Simplify the code logic of handling the quirk of FEC_QUIRK_HAS_RACC.
If a SoC has the RACC quirk, the driver will enable the 16bit shift
by default in the probe function for a better performance.

This patch handles the logic in one place to make the logic simple
and clean. The patch optimizes the fec_enet_xdp_get_tx_queue function
according to Paolo Abeni's comments, and it also exludes the SoCs that
require to do frame swap from XDP support.

Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agos390/lcs: Fix return type of lcs_start_xmit()
Nathan Chancellor [Thu, 3 Nov 2022 17:01:30 +0000 (10:01 -0700)]
s390/lcs: Fix return type of lcs_start_xmit()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/s390/net/lcs.c:2090:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = lcs_start_xmit,
                                    ^~~~~~~~~~~~~~
  drivers/s390/net/lcs.c:2097:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = lcs_start_xmit,
                                    ^~~~~~~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of lcs_start_xmit() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agos390/netiucv: Fix return type of netiucv_tx()
Nathan Chancellor [Thu, 3 Nov 2022 17:01:29 +0000 (10:01 -0700)]
s390/netiucv: Fix return type of netiucv_tx()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/s390/net/netiucv.c:1854:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = netiucv_tx,
                                    ^~~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of netiucv_tx() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.

Additionally, while in the area, remove a comment block that is no
longer relevant.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agos390/ctcm: Fix return type of ctc{mp,}m_tx()
Nathan Chancellor [Thu, 3 Nov 2022 17:01:28 +0000 (10:01 -0700)]
s390/ctcm: Fix return type of ctc{mp,}m_tx()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/s390/net/ctcm_main.c:1064:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = ctcm_tx,
                                    ^~~~~~~
  drivers/s390/net/ctcm_main.c:1072:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = ctcmpc_tx,
                                    ^~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of ctc{mp,}m_tx() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.

Additionally, while in the area, remove a comment block that is no
longer relevant.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Add NAPI support
Haijun Liu [Thu, 3 Nov 2022 09:18:29 +0000 (14:48 +0530)]
net: wwan: t7xx: Add NAPI support

Replace the work queue based RX flow with a NAPI implementation
Remove rx_thread and dpmaif_rxq_work.
Enable GRO on RX path.
Introduce dummy network device. its responsibility is
    - Binds one NAPI object for each DL HW queue and acts as
      the agent of all those network devices.
    - Use NAPI object to poll DL packets.
    - Helps to dispatch each packet to the network interface.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Co-developed-by: Sreehari Kancharla <sreehari.kancharla@linux.intel.com>
Signed-off-by: Sreehari Kancharla <sreehari.kancharla@linux.intel.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Acked-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Acked-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: wwan: t7xx: Use needed_headroom instead of hard_header_len
Ilpo Järvinen [Thu, 3 Nov 2022 09:18:28 +0000 (14:48 +0530)]
net: wwan: t7xx: Use needed_headroom instead of hard_header_len

hard_header_len is used by gro_list_prepare() but on Rx, there
is no header so use needed_headroom instead.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sreehari Kancharla <sreehari.kancharla@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hinic: Add support for configuration of rx-vlan-filter by ethtool
Cai Huoqing [Thu, 3 Nov 2022 08:05:11 +0000 (16:05 +0800)]
net: hinic: Add support for configuration of rx-vlan-filter by ethtool

When ethtool config rx-vlan-filter, the driver will send
control command to firmware, then set to hardware in this patch.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hinic: Add control command support for VF PMD driver in DPDK
Cai Huoqing [Thu, 3 Nov 2022 08:05:10 +0000 (16:05 +0800)]
net: hinic: Add control command support for VF PMD driver in DPDK

HINIC has a mailbox for PF-VF communication and the VF driver
could send port control command to PF driver via mailbox.

The control command only can be set to register in PF,
so add support in PF driver for VF PMD driver control
command when VF PMD driver work with linux PF driver.

Then, no need to add handlers to nic_vf_cmd_msg_handler[],
because the host driver just forwards it to the firmware.
Actually the firmware works on a coprocessor MGMT_CPU(inside the NIC)
which will recv and deal with these commands.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hinic: Convert the cmd code from decimal to hex to be more readable
Cai Huoqing [Thu, 3 Nov 2022 08:05:09 +0000 (16:05 +0800)]
net: hinic: Convert the cmd code from decimal to hex to be more readable

The print cmd code is in hex, so using hex cmd code intead of
decimal is easy to check the value with print info.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: dsa-port: constrain number of 'reg' in ports
Krzysztof Kozlowski [Wed, 2 Nov 2022 16:15:12 +0000 (12:15 -0400)]
dt-bindings: net: dsa-port: constrain number of 'reg' in ports

'reg' without any constraints allows multiple items which is not the
intention in DSA port schema (as physical port is expected to have only
one address).

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: constrain number of 'reg' in ethernet ports
Krzysztof Kozlowski [Wed, 2 Nov 2022 16:15:11 +0000 (12:15 -0400)]
dt-bindings: net: constrain number of 'reg' in ethernet ports

'reg' without any constraints allows multiple items which is not the
intention for Ethernet controller's port number.

Constrain the 'reg' on AX88178 and LAN95xx USB Ethernet Controllers.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: axiemac: add PM callbacks to support suspend/resume
Andy Chiu [Tue, 1 Nov 2022 01:11:39 +0000 (09:11 +0800)]
net: axiemac: add PM callbacks to support suspend/resume

Support basic system-wide suspend and resume functions

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
David Yang [Fri, 28 Oct 2022 02:01:01 +0000 (10:01 +0800)]
net: mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood

Support mode switch properly, which is not available before.

If SoC has two Ethernet controllers, by setting both of them into MII
mode, the first controller enters GMII mode, while the second
controller is effectively disabled. This requires configuring (and
maybe enabling) the second controller in the device tree, even though
it cannot be used.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteon_ep: support Octeon device CNF95N
Veerasenareddy Burru [Thu, 3 Nov 2022 06:05:57 +0000 (23:05 -0700)]
octeon_ep: support Octeon device CNF95N

Add support for Octeon device CNF95N.
CNF95N is a Octeon Fusion family product with same PCI NIC
characteristics as CN93 which is currently supported by the driver.

update supported device list in Documentation.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Link: https://lore.kernel.org/r/20221103060600.1858-1-vburru@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'sfc-add-basic-flower-matches-to-offload'
Jakub Kicinski [Sat, 5 Nov 2022 02:54:29 +0000 (19:54 -0700)]
Merge branch 'sfc-add-basic-flower-matches-to-offload'

Edward Cree says:

====================
sfc: add basic flower matches to offload

Support offloading TC flower rules with matches on L2-L4 fields.
====================

Link: https://lore.kernel.org/r/cover.1667412458.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: add Layer 4 matches to ef100 TC offload
Edward Cree [Thu, 3 Nov 2022 15:27:31 +0000 (15:27 +0000)]
sfc: add Layer 4 matches to ef100 TC offload

Support matching on UDP/TCP source and destination ports and TCP flags,
 with masking if supported by the hardware.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: add Layer 3 flag matches to ef100 TC offload
Edward Cree [Thu, 3 Nov 2022 15:27:30 +0000 (15:27 +0000)]
sfc: add Layer 3 flag matches to ef100 TC offload

Support matching on ip_frag and ip_firstfrag.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: add Layer 3 matches to ef100 TC offload
Edward Cree [Thu, 3 Nov 2022 15:27:29 +0000 (15:27 +0000)]
sfc: add Layer 3 matches to ef100 TC offload

Support matching on IP protocol, Type of Service, Time To Live, source
 and destination addresses, with masking if supported by the hardware.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: add Layer 2 matches to ef100 TC offload
Edward Cree [Thu, 3 Nov 2022 15:27:28 +0000 (15:27 +0000)]
sfc: add Layer 2 matches to ef100 TC offload

Support matching on EtherType, VLANs and ethernet source/destination
 addresses, with masking if supported by the hardware.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc: check recirc_id match caps before MAE offload
Edward Cree [Thu, 3 Nov 2022 15:27:27 +0000 (15:27 +0000)]
sfc: check recirc_id match caps before MAE offload

Offloaded TC rules always match on recirc_id in the MAE, so we should
 check that the MAE reported support for this match before attempting
 to insert the rule.
These checks allow us to fail early, avoiding the PCIe round-trip to
 firmware for an MC_CMD_MAE_ACTION_RULE_INSERT that will only fail,
 and more importantly providing a more informative error message that
 identifies which match field is unsupported.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: renesas: Fix return type of rswitch_start_xmit()
Nathan Chancellor [Thu, 3 Nov 2022 22:00:32 +0000 (15:00 -0700)]
net: ethernet: renesas: Fix return type of rswitch_start_xmit()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/net/ethernet/renesas/rswitch.c:1533:20: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit = rswitch_start_xmit,
                          ^~~~~~~~~~~~~~~~~~
  1 error generated.

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of rswitch_start_xmit()
to match the prototype's to resolve the warning and CFI failure.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221103220032.2142122-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: remove redundant check in ip_metrics_convert()
Zhengchao Shao [Fri, 4 Nov 2022 02:25:13 +0000 (10:25 +0800)]
net: remove redundant check in ip_metrics_convert()

Now ip_metrics_convert() is only called by ip_fib_metrics_init(). Before
ip_fib_metrics_init() invokes ip_metrics_convert(), it checks whether
input parameter fc_mx is NULL. Therefore, ip_metrics_convert() doesn't
need to check fc_mx.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20221104022513.168868-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-ipa-more-endpoints'
David S. Miller [Fri, 4 Nov 2022 10:16:53 +0000 (10:16 +0000)]
Merge branch 'net-ipa-more-endpoints'

Alex Elder says:

====================
net: ipa: support more endpoints

This series adds support for more than 32 IPA endpoints.  To do
this, five registers whose bits represent endpoint state are
replicated as needed to represent endpoints beyond 32.  For existing
platforms, the number of endpoints is never greater than 32, so
there is just one of each register.  IPA v5.0+ supports more than
that though; these changes prepare the code for that.

Beyond that, the IPA fields that represent endpoints in a 32-bit
bitmask are updated to support an arbitrary number of these endpoint
registers.  (There is one exception, explained in patch 7.)

The first two patches are some sort of unrelated cleanups, making
use of a helper function introduced recently.

The third and fourth use parameterized functions to determine the
register offset for registers that represent endpoints.

The last five convert fields representing endpoints to allow more
than 32 endpoints to be represented.

Since v1, I have implemented Jakub's suggestions:
  - Don't print a message on (bitmap) memory allocation failure
  - Do not do "mass null checks" when allocating bitmaps
  - Rework some code to ensure error path is sane
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: use a bitmap for enabled endpoints
Alex Elder [Wed, 2 Nov 2022 22:11:39 +0000 (17:11 -0500)]
net: ipa: use a bitmap for enabled endpoints

Replace the 32-bit unsigned used to track enabled endpoints with a
Linux bitmap, to allow an arbitrary number of endpoints to be
represented.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: use a bitmap for set-up endpoints
Alex Elder [Wed, 2 Nov 2022 22:11:38 +0000 (17:11 -0500)]
net: ipa: use a bitmap for set-up endpoints

Replace the 32-bit unsigned used to track endpoints that have
completed setup with a Linux bitmap, to allow an arbitrary number
of endpoints to be represented.

Rework the error handling in ipa_endpoint_init() so the defined
endpoint bitmap is freed if an error occurs early.  Once endpoints
have been initialized, ipa_endpoint_exit() is used to recover if
the set of filtered endpoints is invalid.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: support more filtering endpoints
Alex Elder [Wed, 2 Nov 2022 22:11:37 +0000 (17:11 -0500)]
net: ipa: support more filtering endpoints

Prior to IPA v5.0, there could be no more than 32 endpoints.

A filter table begins with a bitmap indicating which endpoints have
a filter defined.  That bitmap is currently assumed to fit in a
32-bit value.

Starting with IPA v5.0, more than 32 endpoints are supported, so
it's conceivable that a TX endpoint has an ID that exceeds 32.
Increase the size of the field representing endpoints that support
filtering to 64 bits.  Rename the bitmap field "filtered".

Unlike other similar fields, we do not use an (arbitrarily long)
Linux bitmap for this purpose.  The reason is that if a filter table
ever *did* need to support more than 64 TX endpoints, its format
would change in ways we can't anticipate.

Have ipa_endpoint_init() return a negative errno rather than a mask
that indicates which endpoints support filtering, and have that
function assign the "filtered" field directly.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: use a bitmap for available endpoints
Alex Elder [Wed, 2 Nov 2022 22:11:36 +0000 (17:11 -0500)]
net: ipa: use a bitmap for available endpoints

Similar to the previous patch, replace the 32-bit unsigned used to
track endpoints supported by hardware with a Linux bitmap, to allow
an arbitrary number of endpoints to be represented.

Move ipa_endpoint_deconfig() above ipa_endpoint_config() and use
it in the error path of the latter function.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: use a bitmap for defined endpoints
Alex Elder [Wed, 2 Nov 2022 22:11:35 +0000 (17:11 -0500)]
net: ipa: use a bitmap for defined endpoints

IPA v5.0 supports more than 32 endpoints, so we will be unable to
represent endpoints defined in the configuration data with a 32-bit
value.  To prepare for that, convert the field in the IPA structure
representing defined endpoints to be a Linux bitmap.

Convert loops based on that field into for_each_set_bit() calls over
the new bitmap.  Note that the loop in ipa_endpoint_config() still
assumes there are 32 or fewer endpoints (when comparing against the
available endpoint bit mask); that assumption goes away in the next
patch.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: add a parameter to suspend registers
Alex Elder [Wed, 2 Nov 2022 22:11:34 +0000 (17:11 -0500)]
net: ipa: add a parameter to suspend registers

The SUSPEND_INFO, SUSPEND_EN, SUSPEND_CLR registers represent
endpoint IDs in a bit mask.  When more than 32 endpoints are
supported, these registers will be replicated as needed to represent
the number of supported endpoints.  Update the definitions of these
registers to have a stride of 4 bytes, and update the code that
operates them to select the proper offset and bit.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: add a parameter to aggregation registers
Alex Elder [Wed, 2 Nov 2022 22:11:33 +0000 (17:11 -0500)]
net: ipa: add a parameter to aggregation registers

Starting with IPA v5.0, a single IPA instance can have more than 32
endpoints defined.  To handle this, each register that holds a
bitmap of IPA endpoints is replicated as needed to represent the
available endpoints.

To prepare for this, registers that represent endpoint IDs in a bit
mask will be defined to have a parameter, with a stride value of 4
bytes.  The first 32 endpoints are represented in the first 32-bit
register, then the next (up to) 32 endpoints at an offset 4 bytes
higher.  When accessing such a register, the endpoint ID divided
by 32 determines the offset, and the endpoint ID modulo 32 defines
the endpoint's bit position within the register.

The first two registers we'll update for this are STATE_AGGR_ACTIVE
and AGGR_FORCE_CLOSE.

Until more than 32 endpoints are supported, this change has no
practical effect.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: use ipa_table_mem() in ipa_table_reset_add()
Alex Elder [Wed, 2 Nov 2022 22:11:32 +0000 (17:11 -0500)]
net: ipa: use ipa_table_mem() in ipa_table_reset_add()

Similar to the previous commit, pass flags rather than a memory
region ID to ipa_table_reset_add(), and there use ipa_table_mem() to
look up the memory region affected based on those flags.

Currently all eight of these table memory regions are assumed to
exist, because they all have canaries within them.  Stop assuming
that will always be the case, and in ipa_table_reset_add() allow
these memory regions to be non-existent.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: reduce arguments to ipa_table_init_add()
Alex Elder [Wed, 2 Nov 2022 22:11:31 +0000 (17:11 -0500)]
net: ipa: reduce arguments to ipa_table_init_add()

Recently ipa_table_mem() was added as a way to look up one of 8
possible memory regions by indicating whether it was a filter or
route table, hashed or not, and IPv6 or not.

We can simplify the interface to ipa_table_init_add() by passing two
flags to it instead of the opcode and both hashed and non-hashed
memory region IDs.  The "filter" and "ipv6" flags are sufficient to
determine the opcode to use, and with ipa_table_mem() can look up
the correct memory region as well.

It's possible to not have hashed tables, but we already verify the
number of entries in a filter or routing table is nonzero.  Stop
assuming a hashed table entry exists in ipa_table_init_add().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agords: remove redundant variable total_payload_len
Colin Ian King [Tue, 1 Nov 2022 10:41:04 +0000 (10:41 +0000)]
rds: remove redundant variable total_payload_len

Variable total_payload_len is being used to accumulate payload lengths
however it is never read or used afterwards. It is redundant and can
be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Fri, 4 Nov 2022 04:13:54 +0000 (21:13 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-11-02 (i40e, iavf)

This series contains updates to i40e and iavf drivers.

Joe Damato adds tracepoint information to i40e_napi_poll to expose helpful
debug information for users who'd like to get a better understanding of
how their NIC is performing as they adjust various parameters and tuning
knobs.

Note: this does not touch any XDP related code paths. This
tracepoint will only work when not using XDP. Care has been taken to avoid
changing control flow in i40e_napi_poll with this change.

Alicja adds error messaging for unsupported duplex settings for i40e.

Ye Xingchen replaces use of __FUNCTION__ with __func__ for iavf.

Bartosz changes tense of device removal message to be more clear on the
action for iavf.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  iavf: Change information about device removal in dmesg
  iavf: Replace __FUNCTION__ with __func__
  i40e: Add appropriate error message logged for incorrect duplex setting
  i40e: Add i40e_napi_poll tracepoint
  i40e: Record number of RXes cleaned during NAPI
  i40e: Record number TXes cleaned during NAPI
  i40e: Store the irq number in i40e_q_vector
====================

Link: https://lore.kernel.org/r/20221102211011.2944983-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Jakub Kicinski [Fri, 4 Nov 2022 04:12:56 +0000 (21:12 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-11-02 (e1000e, e1000, igc)

This series contains updates to e1000e, e1000, and igc drivers.

For e1000e, Sasha adds a new board type to help distinguish platforms and
adds device id support for upcoming platforms. He also adds trace points
for CSME flows to aid in debugging.

Ani removes unnecessary kmap_atomic call for e1000 and e1000e.

Muhammad sets speed based transmit offsets for launchtime functionality to
reduce latency for igc.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: Correct the launchtime offset
  e1000: Remove unnecessary use of kmap_atomic()
  e1000e: Remove unnecessary use of kmap_atomic()
  e1000e: Add e1000e trace module
  e1000e: Add support for the next LOM generation
  e1000e: Separate MTP board type from ADP
====================

Link: https://lore.kernel.org/r/20221102203957.2944396-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agohamradio: baycom_epp: Fix return type of baycom_send_packet()
Nathan Chancellor [Wed, 2 Nov 2022 16:06:10 +0000 (09:06 -0700)]
hamradio: baycom_epp: Fix return type of baycom_send_packet()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/net/hamradio/baycom_epp.c:1119:25: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit      = baycom_send_packet,
                                ^~~~~~~~~~~~~~~~~~
  1 error generated.

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of baycom_send_packet()
to match the prototype's to resolve the warning and CFI failure.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221102160610.1186145-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ethernet: ti: Fix return type of netcp_ndo_start_xmit()
Nathan Chancellor [Wed, 2 Nov 2022 16:09:33 +0000 (09:09 -0700)]
net: ethernet: ti: Fix return type of netcp_ndo_start_xmit()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/net/ethernet/ti/netcp_core.c:1944:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = netcp_ndo_start_xmit,
                                    ^~~~~~~~~~~~~~~~~~~~
  1 error generated.

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of
netcp_ndo_start_xmit() to match the prototype's to resolve the warning
and CFI failure.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221102160933.1601260-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: usb: Use kstrtobool() instead of strtobool()
Christophe JAILLET [Wed, 2 Nov 2022 06:36:23 +0000 (07:36 +0100)]
net: usb: Use kstrtobool() instead of strtobool()

strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.

In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.

While at it, include the corresponding header file (<linux/kstrtox.h>).

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/d4432a67b6f769cac0a9ec910ac725298b64e102.1667336095.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-fix-netdev-to-devlink_port-linkage-and-expose-to-user'
Jakub Kicinski [Fri, 4 Nov 2022 03:48:44 +0000 (20:48 -0700)]
Merge branch 'net-fix-netdev-to-devlink_port-linkage-and-expose-to-user'

Jiri Pirko says:

====================
net: fix netdev to devlink_port linkage and expose to user

Currently, the info about linkage from netdev to the related
devlink_port instance is done using ndo_get_devlink_port().
This is not sufficient, as it is up to the driver to implement it and
some of them don't do that. Also it leads to a lot of unnecessary
boilerplate code in all the drivers.

Instead of that, introduce a possibility for driver to expose this
relationship by new SET_NETDEV_DEVLINK_PORT macro which stores it into
dev->devlink_port. It is ensured by the driver init/fini flows that
the devlink_port pointer does not change during the netdev lifetime.
Devlink port is always registered before netdev register and
unregistered after netdev unregister.

Benefit from this linkage setup and remove explicit calls from driver
to devlink_port_type_eth_set() and clear(). Many of the driver
didn't use it correctly anyway. Let the devlink.c to track associated
netdev events and adjust type and type pointer accordingly. Also
use this events to to keep track on ifname change and remove RTNL lock
taking from devlink_nl_port_fill().

Finally, remove the ndo_get_devlink_port() ndo which is no longer used
and expose devlink_port handle as a new netdev netlink attribute to the
user. That way, during the ifname->devlink_port lookup, userspace app
does not have to dump whole devlink port list and instead it can just
do a simple RTM_GETLINK query.
====================

Link: https://lore.kernel.org/r/20221102160211.662752-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: expose devlink port over rtnetlink
Jiri Pirko [Wed, 2 Nov 2022 16:02:11 +0000 (17:02 +0100)]
net: expose devlink port over rtnetlink

Expose devlink port handle related to netdev over rtnetlink. Introduce a
new nested IFLA attribute to carry the info. Call into devlink code to
fill-up the nest with existing devlink attributes that are used over
devlink netlink.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: remove unused ndo_get_devlink_port
Jiri Pirko [Wed, 2 Nov 2022 16:02:10 +0000 (17:02 +0100)]
net: remove unused ndo_get_devlink_port

Remove ndo_get_devlink_port which is no longer used alongside with the
implementations in drivers.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: use devlink_port pointer instead of ndo_get_devlink_port
Jiri Pirko [Wed, 2 Nov 2022 16:02:09 +0000 (17:02 +0100)]
net: devlink: use devlink_port pointer instead of ndo_get_devlink_port

Use newly introduced devlink_port pointer instead of getting it calling
to ndo_get_devlink_port op.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: add not cleared type warning to port unregister
Jiri Pirko [Wed, 2 Nov 2022 16:02:08 +0000 (17:02 +0100)]
net: devlink: add not cleared type warning to port unregister

By the time port unregister is called. There should be no type set. Make
sure that the driver cleared it before and warn in case it didn't. This
enforces symmetricity with type set and port register.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: store copy netdevice ifindex and ifname to allow port_fill() without...
Jiri Pirko [Wed, 2 Nov 2022 16:02:07 +0000 (17:02 +0100)]
net: devlink: store copy netdevice ifindex and ifname to allow port_fill() without RTNL held

To avoid a need to take RTNL mutex in port_fill() function, benefit from
the introduce infrastructure that tracks netdevice notifier events.
Store the ifindex and ifname upon register and change name events.
Remove the rtnl_held bool propagated down to port_fill() function as it
is no longer needed.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: remove net namespace check from devlink_nl_port_fill()
Jiri Pirko [Wed, 2 Nov 2022 16:02:06 +0000 (17:02 +0100)]
net: devlink: remove net namespace check from devlink_nl_port_fill()

It is ensured by the netdevice notifier event processing, that only
netdev pointers from the same net namespaces are filled. Remove the
net namespace check from devlink_nl_port_fill() as it is no longer
needed.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: remove netdev arg from devlink_port_type_eth_set()
Jiri Pirko [Wed, 2 Nov 2022 16:02:05 +0000 (17:02 +0100)]
net: devlink: remove netdev arg from devlink_port_type_eth_set()

Since devlink_port_type_eth_set() should no longer be called by any
driver with netdev pointer as it should rather use
SET_NETDEV_DEVLINK_PORT, remove the netdev arg. Add a warn to
type_clear()

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_port
Jiri Pirko [Wed, 2 Nov 2022 16:02:04 +0000 (17:02 +0100)]
net: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_port

Benefit from the previously implemented tracking of netdev events in
devlink code and instead of calling  devlink_port_type_eth_set() and
devlink_port_type_clear() to set devlink port type and link to related
netdev, use SET_NETDEV_DEVLINK_PORT() macro to assign devlink_port
pointer to netdevice which is about to be registered.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: track netdev with devlink_port assigned
Jiri Pirko [Wed, 2 Nov 2022 16:02:03 +0000 (17:02 +0100)]
net: devlink: track netdev with devlink_port assigned

Currently, ethernet drivers are using devlink_port_type_eth_set() and
devlink_port_type_clear() to set devlink port type and link to related
netdev.

Instead of calling them directly, let the driver use
SET_NETDEV_DEVLINK_PORT macro to assign devlink_port pointer and let
devlink to track it. Note the devlink port pointer is static during
the time netdevice is registered.

In devlink code, use per-namespace netdev notifier to track
the netdevices with devlink_port assigned and change the internal
devlink_port type and related type pointer accordingly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: take RTNL in port_fill() function only if it is not held
Jiri Pirko [Wed, 2 Nov 2022 16:02:02 +0000 (17:02 +0100)]
net: devlink: take RTNL in port_fill() function only if it is not held

Follow-up patch is going to introduce a netdevice notifier event
processing which is called with RTNL mutex held. Processing of this will
eventually lead to call to port_notity() and port_fill() which currently
takes RTNL mutex internally. So as a temporary solution, propagate a
bool indicating if the mutex is already held. This will go away in one
of the follow-up patches.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: move port_type_netdev_checks() call to __devlink_port_type_set()
Jiri Pirko [Wed, 2 Nov 2022 16:02:01 +0000 (17:02 +0100)]
net: devlink: move port_type_netdev_checks() call to __devlink_port_type_set()

As __devlink_port_type_set() is going to be called directly from netdevice
notifier event handle in one of the follow-up patches, move the
port_type_netdev_checks() call there.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: move port_type_warn_schedule() call to __devlink_port_type_set()
Jiri Pirko [Wed, 2 Nov 2022 16:02:00 +0000 (17:02 +0100)]
net: devlink: move port_type_warn_schedule() call to __devlink_port_type_set()

As __devlink_port_type_set() is going to be called directly from netdevice
notifier event handle in one of the follow-up patches, move the
port_type_warn_schedule() call there.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: devlink: convert devlink port type-specific pointers to union
Jiri Pirko [Wed, 2 Nov 2022 16:01:59 +0000 (17:01 +0100)]
net: devlink: convert devlink port type-specific pointers to union

Instead of storing type_dev as a void pointer, convert it to union and
use it to store either struct net_device or struct ib_device pointer.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'bridge-add-mac-authentication-bypass-mab-support'
Jakub Kicinski [Fri, 4 Nov 2022 03:46:36 +0000 (20:46 -0700)]
Merge branch 'bridge-add-mac-authentication-bypass-mab-support'

Ido Schimmel says:

====================
bridge: Add MAC Authentication Bypass (MAB) support

Patch #1 adds MAB support in the bridge driver. See the commit message
for motivation, design choices and implementation details.

Patch #2 adds corresponding test cases.

Follow-up patchsets will add offload support in mlxsw and mv88e6xxx.
====================

Link: https://lore.kernel.org/r/20221101193922.2125323-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: forwarding: Add MAC Authentication Bypass (MAB) test cases
Hans J. Schultz [Tue, 1 Nov 2022 19:39:22 +0000 (21:39 +0200)]
selftests: forwarding: Add MAC Authentication Bypass (MAB) test cases

Add four test cases to verify MAB functionality:

* Verify that a locked FDB entry can be generated by the bridge,
  preventing a host from communicating via the bridge. Test that user
  space can clear the "locked" flag by replacing the entry, thereby
  authenticating the host and allowing it to communicate via the bridge.

* Test that an entry cannot roam to a locked port, but that it can roam
  to an unlocked port.

* Test that MAB can only be enabled on a port that is both locked and
  has learning enabled.

* Test that locked FDB entries are flushed from a port when MAB is
  disabled.

Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agobridge: Add MAC Authentication Bypass (MAB) support
Hans J. Schultz [Tue, 1 Nov 2022 19:39:21 +0000 (21:39 +0200)]
bridge: Add MAC Authentication Bypass (MAB) support

Hosts that support 802.1X authentication are able to authenticate
themselves by exchanging EAPOL frames with an authenticator (Ethernet
bridge, in this case) and an authentication server. Access to the
network is only granted by the authenticator to successfully
authenticated hosts.

The above is implemented in the bridge using the "locked" bridge port
option. When enabled, link-local frames (e.g., EAPOL) can be locally
received by the bridge, but all other frames are dropped unless the host
is authenticated. That is, unless the user space control plane installed
an FDB entry according to which the source address of the frame is
located behind the locked ingress port. The entry can be dynamic, in
which case learning needs to be enabled so that the entry will be
refreshed by incoming traffic.

There are deployments in which not all the devices connected to the
authenticator (the bridge) support 802.1X. Such devices can include
printers and cameras. One option to support such deployments is to
unlock the bridge ports connecting these devices, but a slightly more
secure option is to use MAB. When MAB is enabled, the MAC address of the
connected device is used as the user name and password for the
authentication.

For MAB to work, the user space control plane needs to be notified about
MAC addresses that are trying to gain access so that they will be
compared against an allow list. This can be implemented via the regular
learning process with the sole difference that learned FDB entries are
installed with a new "locked" flag indicating that the entry cannot be
used to authenticate the device. The flag cannot be set by user space,
but user space can clear the flag by replacing the entry, thereby
authenticating the device.

Locked FDB entries implement the following semantics with regards to
roaming, aging and forwarding:

1. Roaming: Locked FDB entries can roam to unlocked (authorized) ports,
   in which case the "locked" flag is cleared. FDB entries cannot roam
   to locked ports regardless of MAB being enabled or not. Therefore,
   locked FDB entries are only created if an FDB entry with the given {MAC,
   VID} does not already exist. This behavior prevents unauthenticated
   devices from disrupting traffic destined to already authenticated
   devices.

2. Aging: Locked FDB entries age and refresh by incoming traffic like
   regular entries.

3. Forwarding: Locked FDB entries forward traffic like regular entries.
   If user space detects an unauthorized MAC behind a locked port and
   wishes to prevent traffic with this MAC DA from reaching the host, it
   can do so using tc or a different mechanism.

Enable the above behavior using a new bridge port option called "mab".
It can only be enabled on a bridge port that is both locked and has
learning enabled. Locked FDB entries are flushed from the port once MAB
is disabled. A new option is added because there are pure 802.1X
deployments that are not interested in notifications about locked FDB
entries.

Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 3 Nov 2022 18:38:42 +0000 (11:38 -0700)]
Merge git://git./linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 3 Nov 2022 17:51:59 +0000 (10:51 -0700)]
Merge tag 'net-6.1-rc4' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth and netfilter.

  Current release - regressions:

   - net: several zerocopy flags fixes

   - netfilter: fix possible memory leak in nf_nat_init()

   - openvswitch: add missing .resv_start_op

  Previous releases - regressions:

   - neigh: fix null-ptr-deref in neigh_table_clear()

   - sched: fix use after free in red_enqueue()

   - dsa: fall back to default tagger if we can't load the one from DT

   - bluetooth: fix use-after-free in l2cap_conn_del()

  Previous releases - always broken:

   - netfilter: netlink notifier might race to release objects

   - nfc: fix potential memory leak of skb

   - bluetooth: fix use-after-free caused by l2cap_reassemble_sdu

   - bluetooth: use skb_put to set length

   - eth: tun: fix bugs for oversize packet when napi frags enabled

   - eth: lan966x: fixes for when MTU is changed

   - eth: dwmac-loongson: fix invalid mdio_node"

* tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
  vsock: fix possible infinite sleep in vsock_connectible_wait_data()
  vsock: remove the unused 'wait' in vsock_connectible_recvmsg()
  ipv6: fix WARNING in ip6_route_net_exit_late()
  bridge: Fix flushing of dynamic FDB entries
  net, neigh: Fix null-ptr-deref in neigh_table_clear()
  net/smc: Fix possible leaked pernet namespace in smc_init()
  stmmac: dwmac-loongson: fix invalid mdio_node
  ibmvnic: Free rwi on reset success
  net: mdio: fix undefined behavior in bit shift for __mdiobus_register
  Bluetooth: L2CAP: Fix attempting to access uninitialized memory
  Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm
  Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
  Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect
  Bluetooth: L2CAP: Fix memory leak in vhci_write
  Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
  Bluetooth: virtio_bt: Use skb_put to set length
  Bluetooth: hci_conn: Fix CIS connection dst_type handling
  Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
  netfilter: ipset: enforce documented limit to prevent allocating huge memory
  isdn: mISDN: netjet: fix wrong check of device registration
  ...

2 years agoMerge tag 'powerpc-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Thu, 3 Nov 2022 17:27:28 +0000 (10:27 -0700)]
Merge tag 'powerpc-6.1-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix an endian thinko in the asm-generic compat_arg_u64() which led to
   syscall arguments being swapped for some compat syscalls.

 - Fix syscall wrapper handling of syscalls with 64-bit arguments on
   32-bit kernels, which led to syscall arguments being misplaced.

 - A build fix for amdgpu on Book3E with AltiVec disabled.

Thanks to Andreas Schwab, Christian Zigotzky, and Arnd Bergmann.

* tag 'powerpc-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Select ARCH_SPLIT_ARG64
  powerpc/32: fix syscall wrappers with 64-bit arguments
  asm-generic: compat: fix compat_arg_u64() and compat_arg_u64_dual()
  powerpc/64e: Fix amdgpu build on Book3E w/o AltiVec

2 years agoMerge branch 'add-new-pcp-and-apptrust-attributes-to-dcbnl'
Paolo Abeni [Thu, 3 Nov 2022 14:26:14 +0000 (15:26 +0100)]
Merge branch 'add-new-pcp-and-apptrust-attributes-to-dcbnl'

Daniel Machon says:

====================
Add new PCP and APPTRUST attributes to dcbnl

This patch series adds new extension attributes to dcbnl, to support PCP
prioritization (and thereby hw offloadable pcp-based queue
classification) and per-selector trust and trust order. Additionally,
the microchip sparx5 driver has been dcb-enabled to make use of the new
attributes to offload PCP, DSCP and Default prio to the switch, and
implement trust order of selectors.

For pre-RFC discussion see:
https://lore.kernel.org/netdev/Yv9VO1DYAxNduw6A@DEN-LT-70577/

For RFC series see:
https://lore.kernel.org/netdev/20220915095757.2861822-1-daniel.machon@microchip.com/

In summary: there currently exist no convenient way to offload per-port
PCP-based queue classification to hardware. The DCB subsystem offers
different ways to prioritize through its APP table, but lacks an option
for PCP. Similarly, there is no way to indicate the notion of trust for
APP table selectors. This patch series addresses both topics.

PCP based queue classification:
  - 8021Q standardizes the Priority Code Point table (see 6.9.3 of IEEE
    Std 802.1Q-2018).  This patch series makes it possible, to offload
    the PCP classification to said table.  The new PCP selector is not a
    standard part of the APP managed object, therefore it is
    encapsulated in a new non-std extension attribute.

Selector trust:
  - ASIC's often has the notion of trust DSCP and trust PCP. The new
    attribute makes it possible to specify a trust order of app
    selectors, which drivers can then react on.

DCB-enable sparx5 driver:
 - Now supports offloading of DSCP, PCP and default priority. Only one
   mapping of protocol:priority is allowed. Consecutive mappings of the
   same protocol to some new priority, will overwrite the previous. This
   is to keep a consistent view of the app table and the hardware.
 - Now supports dscp and pcp trust, by use of the introduced
   dcbnl_set/getapptrust ops. Sparx5 supports trust orders: [], [dscp],
   [pcp] and [dscp, pcp]. For now, only DSCP and PCP selectors are
   supported by the driver, everything else is bounced.

Patch #1 introduces a new PCP selector to the APP object, which makes it
possible to encode PCP and DEI in the app triplet and offload it to the
PCP table of the ASIC.

Patch #2 Introduces the new extension attributes
DCB_ATTR_DCB_APP_TRUST_TABLE and DCB_ATTR_DCB_APP_TRUST. Trusted
selectors are passed in the nested DCB_ATTR_DCB_APP_TRUST_TABLE
attribute, and assembled into an array of selectors:

  u8 selectors[256];

where lower indexes has higher precedence.  In the array, selectors are
stored consecutively, starting from index zero. With a maximum number of
256 unique selectors, the list has the same maximum size.

Patch #3 Sets up the dcbnl ops hook, and adds support for offloading pcp
app entries, to the PCP table of the switch.

Patch #4 Makes use of the dcbnl_set/getapptrust ops, to set a per-port
trust order.

Patch #5 Adds support for offloading dscp app entries to the DSCP table
of the switch.

Patch #6 Adds support for offloading default prio app entries to the
switch.

====================

Link: https://lore.kernel.org/r/20221101094834.2726202-1-daniel.machon@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: microchip: sparx5: add support for offloading default prio
Daniel Machon [Tue, 1 Nov 2022 09:48:34 +0000 (10:48 +0100)]
net: microchip: sparx5: add support for offloading default prio

Add support for offloading default prio {ETHERTYPE, 0, prio}.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: microchip: sparx5: add support for offloading dscp table
Daniel Machon [Tue, 1 Nov 2022 09:48:33 +0000 (10:48 +0100)]
net: microchip: sparx5: add support for offloading dscp table

Add support for offloading dscp app entries. Dscp values are global for
all ports on the sparx5 switch. Therefore, we replicate each dscp app
entry per-port.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: microchip: sparx5: add support for apptrust
Daniel Machon [Tue, 1 Nov 2022 09:48:32 +0000 (10:48 +0100)]
net: microchip: sparx5: add support for apptrust

Make use of set/getapptrust() to implement per-selector trust and trust
order.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: microchip: sparx5: add support for offloading pcp table
Daniel Machon [Tue, 1 Nov 2022 09:48:31 +0000 (10:48 +0100)]
net: microchip: sparx5: add support for offloading pcp table

Add new registers and functions to support offload of pcp app entries.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: dcb: add new apptrust attribute
Daniel Machon [Tue, 1 Nov 2022 09:48:30 +0000 (10:48 +0100)]
net: dcb: add new apptrust attribute

Add new apptrust extension attributes to the 8021Qaz APP managed object.

Two new attributes, DCB_ATTR_DCB_APP_TRUST_TABLE and
DCB_ATTR_DCB_APP_TRUST, has been added. Trusted selectors are passed in
the nested attribute DCB_ATTR_DCB_APP_TRUST, in order of precedence.

The new attributes are meant to allow drivers, whose hw supports the
notion of trust, to be able to set whether a particular app selector is
trusted - and in which order.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: dcb: add new pcp selector to app object
Daniel Machon [Tue, 1 Nov 2022 09:48:29 +0000 (10:48 +0100)]
net: dcb: add new pcp selector to app object

Add new PCP selector for the 8021Qaz APP managed object.

As the PCP selector is not part of the 8021Qaz standard, a new non-std
extension attribute DCB_ATTR_DCB_APP has been introduced. Also two
helper functions to translate between selector and app attribute type
has been added. The new selector has been given a value of 255, to
minimize the risk of future overlap of std- and non-std attributes.

The new DCB_ATTR_DCB_APP is sent alongside the ieee std attribute in the
app table. This means that the dcb_app struct can now both contain std-
and non-std app attributes. Currently there is no overlap between the
selector values of the two attributes.

The purpose of adding the PCP selector, is to be able to offload
PCP-based queue classification to the 8021Q Priority Code Point table,
see 6.9.3 of IEEE Std 802.1Q-2018.

PCP and DEI is encoded in the protocol field as 8*dei+pcp, so that a
mapping of PCP 2 and DEI 1 to priority 3 is encoded as {255, 10, 3}.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: mana: Assign interrupts to CPUs based on NUMA nodes
Saurabh Sengar [Tue, 1 Nov 2022 06:06:01 +0000 (23:06 -0700)]
net: mana: Assign interrupts to CPUs based on NUMA nodes

In large VMs with multiple NUMA nodes, network performance is usually
best if network interrupts are all assigned to the same virtual NUMA
node. This patch assigns online CPU according to a numa aware policy,
local cpus are returned first, followed by non-local ones, then it wraps
around.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1667282761-11547-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoMerge branch 'vsock-remove-an-unused-variable-and-fix-infinite-sleep'
Paolo Abeni [Thu, 3 Nov 2022 09:49:31 +0000 (10:49 +0100)]
Merge branch 'vsock-remove-an-unused-variable-and-fix-infinite-sleep'

Dexuan Cui says:

====================
vsock: remove an unused variable and fix infinite sleep

Patch 1 removes the unused 'wait' variable.
Patch 2 fixes an infinite sleep issue reported by a hv_sock user.
====================

Link: https://lore.kernel.org/r/20221101021706.26152-1-decui@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agovsock: fix possible infinite sleep in vsock_connectible_wait_data()
Dexuan Cui [Tue, 1 Nov 2022 02:17:06 +0000 (19:17 -0700)]
vsock: fix possible infinite sleep in vsock_connectible_wait_data()

Currently vsock_connectible_has_data() may miss a wakeup operation
between vsock_connectible_has_data() == 0 and the prepare_to_wait().

Fix the race by adding the process to the wait queue before checking
vsock_connectible_has_data().

Fixes: b3f7fd54881b ("af_vsock: separate wait data loop")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reported-by: Frédéric Dalleau <frederic.dalleau@docker.com>
Tested-by: Frédéric Dalleau <frederic.dalleau@docker.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agovsock: remove the unused 'wait' in vsock_connectible_recvmsg()
Dexuan Cui [Tue, 1 Nov 2022 02:17:05 +0000 (19:17 -0700)]
vsock: remove the unused 'wait' in vsock_connectible_recvmsg()

Remove the unused variable introduced by 19c1b90e1979.

Fixes: 19c1b90e1979 ("af_vsock: separate receive data loop")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: fec: add initial XDP support
Shenwei Wang [Mon, 31 Oct 2022 18:53:50 +0000 (13:53 -0500)]
net: fec: add initial XDP support

This patch adds the initial XDP support to Freescale driver. It supports
XDP_PASS, XDP_DROP and XDP_REDIRECT actions. Upcoming patches will add
support for XDP_TX and Zero Copy features.

As the patch is rather large, the part of codes to collect the
statistics is separated and will prepare a dedicated patch for that
part.

I just tested with the application of xdpsock.
  -- Native here means running command of "xdpsock -i eth0"
  -- SKB-Mode means running command of "xdpsock -S -i eth0"

The following are the testing result relating to XDP mode:

root@imx8qxpc0mek:~/bpf# ./xdpsock -i eth0
 sock0@eth0:0 rxdrop xdp-drv
                   pps            pkts           1.00
rx                 371347         2717794
tx                 0              0

root@imx8qxpc0mek:~/bpf# ./xdpsock -S -i eth0
 sock0@eth0:0 rxdrop xdp-skb
                   pps            pkts           1.00
rx                 202229         404528
tx                 0              0

root@imx8qxpc0mek:~/bpf# ./xdp2 eth0
proto 0:     496708 pkt/s
proto 0:     505469 pkt/s
proto 0:     505283 pkt/s
proto 0:     505443 pkt/s
proto 0:     505465 pkt/s

root@imx8qxpc0mek:~/bpf# ./xdp2 -S eth0
proto 0:          0 pkt/s
proto 17:     118778 pkt/s
proto 17:     118989 pkt/s
proto 0:          1 pkt/s
proto 17:     118987 pkt/s
proto 0:          0 pkt/s
proto 17:     118943 pkt/s
proto 17:     118976 pkt/s
proto 0:          1 pkt/s
proto 17:     119006 pkt/s
proto 0:          0 pkt/s
proto 17:     119071 pkt/s
proto 17:     119092 pkt/s

Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20221031185350.2045675-1-shenwei.wang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: tun: bump the link speed from 10Mbps to 10Gbps
Ilya Maximets [Mon, 31 Oct 2022 17:39:53 +0000 (18:39 +0100)]
net: tun: bump the link speed from 10Mbps to 10Gbps

The 10Mbps link speed was set in 2004 when the ethtool interface was
initially added to the tun driver.  It might have been a good
assumption 18 years ago, but CPUs and network stack came a long way
since then.

Other virtual ports typically report much higher speeds.  For example,
veth reports 10Gbps since its introduction in 2007.

Some userspace applications rely on the current link speed in
certain situations.  For example, Open vSwitch is using link speed
as an upper bound for QoS configuration if user didn't specify the
maximum rate.  Advertised 10Mbps doesn't match reality in a modern
world, so users have to always manually override the value with
something more sensible to avoid configuration issues, e.g. limiting
the traffic too much.  This also creates additional confusion among
users.

Bump the advertised speed to at least match the veth.

Alternative might be to explicitly report UNKNOWN and let the user
decide on a right value for them.  And it is indeed "the right way"
of fixing the problem.  However, that may cause issues with bonding
or with some userspace applications that may rely on speed value to
be reported (even though they should not).  Just changing the speed
value should be a safer option.

Users can still override the speed with ethtool, if necessary.

RFC discussion is linked below.

Link: https://lore.kernel.org/lkml/20221021114921.3705550-1-i.maximets@ovn.org/
Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-July/051958.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20221031173953.614577-1-i.maximets@ovn.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agoipv6: fix WARNING in ip6_route_net_exit_late()
Zhengchao Shao [Wed, 2 Nov 2022 02:06:10 +0000 (10:06 +0800)]
ipv6: fix WARNING in ip6_route_net_exit_late()

During the initialization of ip6_route_net_init_late(), if file
ipv6_route or rt6_stats fails to be created, the initialization is
successful by default. Therefore, the ipv6_route or rt6_stats file
doesn't be found during the remove in ip6_route_net_exit_late(). It
will cause WRNING.

The following is the stack information:
name 'rt6_stats'
WARNING: CPU: 0 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460
Modules linked in:
Workqueue: netns cleanup_net
RIP: 0010:remove_proc_entry+0x389/0x460
PKRU: 55555554
Call Trace:
<TASK>
ops_exit_list+0xb0/0x170
cleanup_net+0x4ea/0xb00
process_one_work+0x9bf/0x1710
worker_thread+0x665/0x1080
kthread+0x2e4/0x3a0
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cdb1876192db ("[NETNS][IPV6] route6 - create route6 proc files for the namespace")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221102020610.351330-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agobridge: Fix flushing of dynamic FDB entries
Ido Schimmel [Tue, 1 Nov 2022 18:57:53 +0000 (20:57 +0200)]
bridge: Fix flushing of dynamic FDB entries

The following commands should result in all the dynamic FDB entries
being flushed, but instead all the non-local (non-permanent) entries are
flushed:

 # bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static
 # bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic
 # ip link set dev br0 type bridge fdb_flush
 # bridge fdb show brport dummy1
 00:00:00:00:00:01 master br0 permanent
 33:33:00:00:00:01 self permanent
 01:00:5e:00:00:01 self permanent

This is because br_fdb_flush() works with FDB flags and not the
corresponding enumerator values. Fix by passing the FDB flag instead.

After the fix:

 # bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static
 # bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic
 # ip link set dev br0 type bridge fdb_flush
 # bridge fdb show brport dummy1
 00:aa:bb:cc:dd:ee master br0 static
 00:00:00:00:00:01 master br0 permanent
 33:33:00:00:00:01 self permanent
 01:00:5e:00:00:01 self permanent

Fixes: 1f78ee14eeac ("net: bridge: fdb: add support for fine-grained flushing")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20221101185753.2120691-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'rocker-two-small-changes'
Jakub Kicinski [Thu, 3 Nov 2022 03:45:26 +0000 (20:45 -0700)]
Merge branch 'rocker-two-small-changes'

Ido Schimmel says:

====================
rocker: Two small changes

Patch #1 avoids allocating and scheduling a work item when it is not
doing any work.

Patch #2 aligns rocker with other switchdev drivers to explicitly mark
FDB entries as offloaded. Needed for upcoming MAB offload [1].

[1] https://lore.kernel.org/netdev/20221025100024.1287157-1-idosch@nvidia.com/
====================

Link: https://lore.kernel.org/r/20221101123936.1900453-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agorocker: Explicitly mark learned FDB entries as offloaded
Ido Schimmel [Tue, 1 Nov 2022 12:39:36 +0000 (14:39 +0200)]
rocker: Explicitly mark learned FDB entries as offloaded

Currently, FDB entries that are notified to the bridge driver via
'SWITCHDEV_FDB_ADD_TO_BRIDGE' are always marked as offloaded by the
bridge. With MAB enabled, this will no longer be universally true.
Device drivers will report locked FDB entries to the bridge to let it
know that the corresponding hosts required authorization, but it does
not mean that these entries are necessarily programmed in the underlying
hardware.

We would like to solve it by having the bridge driver determine the
offload indication based of the 'offloaded' bit in the FDB notification
[1].

Prepare for that change by having rocker explicitly mark learned FDB
entries as offloaded. This is consistent with all the other switchdev
drivers.

[1] https://lore.kernel.org/netdev/20221025100024.1287157-4-idosch@nvidia.com/

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agorocker: Avoid unnecessary scheduling of work item
Ido Schimmel [Tue, 1 Nov 2022 12:39:35 +0000 (14:39 +0200)]
rocker: Avoid unnecessary scheduling of work item

The work item function ofdpa_port_fdb_learn_work() does not do anything
when 'OFDPA_OP_FLAG_LEARNED' is not set in the work item's flags.

Therefore, do not allocate and do not schedule the work item when the
flag is not set.

Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet, neigh: Fix null-ptr-deref in neigh_table_clear()
Chen Zhongjin [Tue, 1 Nov 2022 12:15:52 +0000 (20:15 +0800)]
net, neigh: Fix null-ptr-deref in neigh_table_clear()

When IPv6 module gets initialized but hits an error in the middle,
kenel panic with:

KASAN: null-ptr-deref in range [0x0000000000000598-0x000000000000059f]
CPU: 1 PID: 361 Comm: insmod
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:__neigh_ifdown.isra.0+0x24b/0x370
RSP: 0018:ffff888012677908 EFLAGS: 00000202
...
Call Trace:
 <TASK>
 neigh_table_clear+0x94/0x2d0
 ndisc_cleanup+0x27/0x40 [ipv6]
 inet6_init+0x21c/0x2cb [ipv6]
 do_one_initcall+0xd3/0x4d0
 do_init_module+0x1ae/0x670
...
Kernel panic - not syncing: Fatal exception

When ipv6 initialization fails, it will try to cleanup and calls:

neigh_table_clear()
  neigh_ifdown(tbl, NULL)
    pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev == NULL))
    # dev_net(NULL) triggers null-ptr-deref.

Fix it by passing NULL to pneigh_queue_purge() in neigh_ifdown() if dev
is NULL, to make kernel not panic immediately.

Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Link: https://lore.kernel.org/r/20221101121552.21890-1-chenzhongjin@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/smc: Fix possible leaked pernet namespace in smc_init()
Chen Zhongjin [Tue, 1 Nov 2022 09:37:22 +0000 (17:37 +0800)]
net/smc: Fix possible leaked pernet namespace in smc_init()

In smc_init(), register_pernet_subsys(&smc_net_stat_ops) is called
without any error handling.
If it fails, registering of &smc_net_ops won't be reverted.
And if smc_nl_init() fails, &smc_net_stat_ops itself won't be reverted.

This leaves wild ops in subsystem linkedlist and when another module
tries to call register_pernet_operations() it triggers page fault:

BUG: unable to handle page fault for address: fffffbfff81b964c
RIP: 0010:register_pernet_operations+0x1b9/0x5f0
Call Trace:
  <TASK>
  register_pernet_subsys+0x29/0x40
  ebtables_init+0x58/0x1000 [ebtables]
  ...

Fixes: 194730a9beb5 ("net/smc: Make SMC statistics network namespace aware")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20221101093722.127223-1-chenzhongjin@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agosfc (gcc13): synchronize ef100_enqueue_skb()'s return type
Jiri Slaby (SUSE) [Mon, 31 Oct 2022 11:44:40 +0000 (12:44 +0100)]
sfc (gcc13): synchronize ef100_enqueue_skb()'s return type

ef100_enqueue_skb() generates a valid warning with gcc-13:
  drivers/net/ethernet/sfc/ef100_tx.c:370:5: error: conflicting types for 'ef100_enqueue_skb' due to enum/integer mismatch; have 'int(struct efx_tx_queue *, struct sk_buff *)'
  drivers/net/ethernet/sfc/ef100_tx.h:25:13: note: previous declaration of 'ef100_enqueue_skb' with type 'netdev_tx_t(struct efx_tx_queue *, struct sk_buff *)'

I.e. the type of the ef100_enqueue_skb()'s return value in the declaration is
int, while the definition spells enum netdev_tx_t. Synchronize them to the
latter.

Cc: Martin Liska <mliska@suse.cz>
Cc: Edward Cree <ecree.xilinx@gmail.com>
Cc: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221031114440.10461-1-jirislaby@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agobonding (gcc13): synchronize bond_{a,t}lb_xmit() types
Jiri Slaby (SUSE) [Mon, 31 Oct 2022 11:44:09 +0000 (12:44 +0100)]
bonding (gcc13): synchronize bond_{a,t}lb_xmit() types

Both bond_alb_xmit() and bond_tlb_xmit() produce a valid warning with
gcc-13:
  drivers/net/bonding/bond_alb.c:1409:13: error: conflicting types for 'bond_tlb_xmit' due to enum/integer mismatch; have 'netdev_tx_t(struct sk_buff *, struct net_device *)' ...
  include/net/bond_alb.h:160:5: note: previous declaration of 'bond_tlb_xmit' with type 'int(struct sk_buff *, struct net_device *)'

  drivers/net/bonding/bond_alb.c:1523:13: error: conflicting types for 'bond_alb_xmit' due to enum/integer mismatch; have 'netdev_tx_t(struct sk_buff *, struct net_device *)' ...
  include/net/bond_alb.h:159:5: note: previous declaration of 'bond_alb_xmit' with type 'int(struct sk_buff *, struct net_device *)'

I.e. the return type of the declaration is int, while the definitions
spell netdev_tx_t. Synchronize both of them to the latter.

Cc: Martin Liska <mliska@suse.cz>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221031114409.10417-1-jirislaby@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoqed (gcc13): use u16 for fid to be big enough
Jiri Slaby (SUSE) [Mon, 31 Oct 2022 11:43:54 +0000 (12:43 +0100)]
qed (gcc13): use u16 for fid to be big enough

gcc 13 correctly reports overflow in qed_grc_dump_addr_range():
In file included from drivers/net/ethernet/qlogic/qed/qed.h:23,
                 from drivers/net/ethernet/qlogic/qed/qed_debug.c:10:
drivers/net/ethernet/qlogic/qed/qed_debug.c: In function 'qed_grc_dump_addr_range':
include/linux/qed/qed_if.h:1217:9: error: overflow in conversion from 'int' to 'u8' {aka 'unsigned char'} changes value from '(int)vf_id << 8 | 128' to '128' [-Werror=overflow]

We do:
  u8 fid;
  ...
  fid = vf_id << 8 | 128;

Since fid is 16bit (and the stored value above too), fid should be u16,
not u8. Fix that.

Cc: Martin Liska <mliska@suse.cz>
Cc: Ariel Elior <aelior@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221031114354.10398-1-jirislaby@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: broadcom: bcm4908_enet: report queued and transmitted bytes
Rafał Miłecki [Mon, 31 Oct 2022 10:48:56 +0000 (11:48 +0100)]
net: broadcom: bcm4908_enet: report queued and transmitted bytes

This allows BQL to operate avoiding buffer bloat and reducing latency.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Link: https://lore.kernel.org/r/20221031104856.32388-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>