review.tizen.org Git - platform/kernel/linux-rpi.git/log

net: move the ct helper function to nf_conntrack_helper for ovs and tc

Move ovs_ct_helper from openvswitch to nf_conntrack_helper and rename
as nf_ct_helper so that it can be used in TC act_ct in the next patch.
Note that it also adds the checks for the family and proto, as in TC
act_ct, the packets with correct family and proto are not guaranteed.

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ethtool: Fail number of channels change when it conflicts with rxnfc

Similar to what we do with the hash indirection table [1], when network
flow classification rules are forwarding traffic to channels greater
than the requested number of channels, fail the operation.
Without this, traffic could be directed to channels which no longer
exist (dropped) after changing number of channels.

[1] commit d4ab4286276f ("ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}")

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20221106123127.522985-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ethtool: linkstate: add a statistic for PHY down events

The previous attempt to augment carrier_down (see Link)
was not met with much enthusiasm so let's do the simple
thing of exposing what some devices already maintain.
Add a common ethtool statistic for link going down.
Currently users have to maintain per-driver mapping
to extract the right stat from the vendor-specific ethtool -S
stats. carrier_down does not fit the bill because it counts
a lot of software related false positives.

Add the statistic to the extended link state API to steer
vendors towards implementing all of it.

Implement for bnxt and all Linux-controlled PHYs. mlx5 and (possibly)
enic also have a counter for this but I leave the implementation
to their maintainers.

Link: https://lore.kernel.org/r/20220520004500.2250674-1-kuba@kernel.org
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20221104190125.684910-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-txgbe-fix-two-bugs-in-txgbe_calc_eeprom_checksum'

YueHaibing says:

====================
net: txgbe: Fix two bugs in txgbe_calc_eeprom_checksum

Fix memleak and unsigned comparison bugs in txgbe_calc_eeprom_checksum
====================

Link: https://lore.kernel.org/r/20221105080722.20292-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: Fix unsigned comparison to zero in txgbe_calc_eeprom_checksum()

The error checks on checksum for a negative error return always fails because
it is unsigned and can never be negative.

Fixes: 049fe5365324 ("net: txgbe: Add operations to interact with firmware")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: Fix memleak in txgbe_calc_eeprom_checksum()

eeprom_ptrs should be freed before returned.

Fixes: 049fe5365324 ("net: txgbe: Add operations to interact with firmware")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-11-04 (ixgbe, ixgbevf, igb)

This series contains updates to ixgbe, ixgbevf, and igb drivers.

Daniel Willenson adjusts descriptor buffer limits to be based on what
hardware supports instead of using a generic, least common value for
ixgbe.

Ani removes local variable for ixgbe, instead returning conditional result
directly.

Yang Li removes unneeded semicolon for ixgbe.

Jan adds error messaging when VLAN errors are encountered for ixgbevf.

Kees Cook prevents a potential use after free condition and explicitly
rounds up q_vector allocations so that allocations can be correctly
compared to ksize() for igb.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igb: Proactively round up to kmalloc bucket size
  igb: Do not free q_vector unless new one was allocated
  ixgbevf: Add error messages on vlan error
  ixgbe: Remove unneeded semicolon
  ixgbe: Remove local variable
  ixgbe: change MAX_RXD/MAX_TXD based on adapter type
====================

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221104205414.2354973-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: Fix potential skb memleak in netlink_ack

Fix coverity issue 'Resource leak'.

We should clean the skb resource if nlmsg_put/append failed.

Fixes: 738136a0e375 ("netlink: split up copies in the ack construction")
Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com>
Link: https://lore.kernel.org/r/bff442d62c87de6299817fe1897cc5a5694ba9cc.1667638204.git.chentao.kernel@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: implement live mac addr change

Implement live mac addr change for the macb ethernet driver.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221104204837.614459-1-roman.gushchin@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: remove explicit phylink_generic_validate() references

Virtually all conventional network drivers are now converted to use
phylink_generic_validate() - only DSA drivers and fman_memac remain,
so lets remove the necessity for network drivers to explicitly set
this member, and default to phylink_generic_validate() when unset.
This is possible as .validate must currently be set.

Any remaining instances that have not been addressed by this patch can
be fixed up later.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/E1or0FZ-001tRa-DI@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: move unnecessary linux/sfp.h include

lan966x_phylink.c doesn't make use of anything from linux/sfp.h, so
remove this unnecessary include.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1oqzx9-001r9g-HV@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'genetlink-per-op-type-policies'

Jakub Kicinski says:

====================
genetlink: support per op type policies

While writing new genetlink families I was increasingly annoyed by the fact
that we don't support different policies for do and dump callbacks.
This makes it hard to do proper input validation for dumps which usually
have a lot more narrow range of accepted attributes.

There is also a minor inconvenience of not supporting different per_doit
and post_doit callbacks per op.

This series addresses those problems by introducing another op format.

v3:
- minor fixes to patch 12 after I took it for a spin with a real family
- adjust commit msg in patch 8
v2: https://lore.kernel.org/all/20221102213338.194672-1-kuba@kernel.org/
- wait for net changes to propagate
- restore the missing comment in patch 1
- drop extra space in patch 3
- improve commit message in patch 4
v1: https://lore.kernel.org/all/20221018230728.1039524-1-kuba@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: convert control family to split ops

Prove that the split ops work.
Sadly we need to keep bug-wards compatibility and specify
the same policy for dump as do, even tho we don't parse
inputs for the dump.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: allow families to use split ops directly

Let families to hook in the new split ops.

They are more flexible and should not be much larger than
full ops. Each split op is 40B while full op is 48B.
Devlink for example has 54 dos and 19 dumps, 2 of the dumps
do not have a do -> 56 full commands = 2688B.
Split ops would have taken 2920B, so 9% more space while
allowing individual per/post doit and per-type policies.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: inline old iteration helpers

All dumpers use the iterators now, inline the cmd by index
stuff into iterator code.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: use iterator in the op to policy map dumping

We can't put the full iterator in the struct ctrl_dump_policy_ctx
because dump context is statically sized by netlink core.
Allocate it dynamically.

Rename policy to dump_map to make the logic a little easier to follow.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: add iterator for walking family ops

Subsequent changes will expose split op structures to users,
so walking the family ops with just an index will get harder.
Add a structured iterator, convert the simple cases.
Policy dumping needs more careful conversion.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: inline genl_get_cmd()

All callers go via genl_get_cmd_split() now, so rename it
to genl_get_cmd() remove the original.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: support split policies in ctrl_dumppolicy_put_op()

Pass do and dump versions of the op to ctrl_dumppolicy_put_op()
so that it can provide a different policy index for the two.

Since we now look at policies, and those are set appropriately
there's no need to look at the GENL_DONT_VALIDATE_DUMP flag.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: add policies for both doit and dumpit in ctrl_dumppolicy_start()

Separate adding doit and dumpit policies for CTRL_CMD_GETPOLICY.
This has no effect until we actually allow do and dump to come
from different sources as netlink_policy_dump_add_policy()
does deduplication.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: check for callback type at op load time

Now that genl_get_cmd_split() is informed what type of callback
user is trying to access (do or dump) we can check that this
callback is indeed available and return an error early.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: load policy based on validation flags

Set the policy and maxattr pointers based on validation flags.
genl_family_rcv_msg_attrs_parse() will do nothing and return NULL
if maxattrs is zero, so no behavior change is expected.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: introduce split op representation

We currently have two forms of operations - small ops and "full" ops
(or just ops). The former does not have pointers for some of the less
commonly used features (namely dump start/done and policy).

The "full" ops, however, still don't contain all the necessary
information. In particular the policy is per command ID, while
do and dump often accept different attributes. It's also not
possible to define different pre_doit and post_doit callbacks
for different commands within the family.

At the same time a lot of commands do not support dumping and
therefore all the dump-related information is wasted space.

Create a new command representation which can hold info about
a do implementation or a dump implementation, but not both at
the same time.

Use this new representation on the command execution path
(genl_family_rcv_msg) as we either run a do or a dump and
don't have to create a "full" op there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: move the private fields in struct genl_family

Move the private fields down to form a "private section".
Use the kdoc "private:" label comment thing to hide them
from the main kdoc comment.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: refactor the cmd <> policy mapping dump

The code at the top of ctrl_dumppolicy() dumps mappings between
ops and policies. It supports dumping both the entire family and
single op if dump is filtered. But both of those cases are handled
inside a loop, which makes the logic harder to follow and change.
Refactor to split the two cases more clearly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'am65-cpsw-suspend-resume'

Roger Quadros says:

====================
net: ethernet: ti: am65-cpsw: Add suspend/resume support

This series enables PM_SLEEP(suspend/resume) support to
the am65-cpsw network driver.

Dual-emac and Switch mode are tested to work with suspend/resume
on AM62-SK platform.

It can be verified on the following branch
https://github.com/rogerq/linux/commits/for-v6.2/am62-cpsw-lpm-1.0
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: am65-cpsw: Fix hardware switch mode on suspend/resume

On low power during system suspend the ALE table context is lost.
Save the ALE contect before suspend and restore it after resume.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: am65-cpsw: retain PORT_VLAN_REG after suspend/resume

During suspend resume the context of PORT_VLAN_REG is lost so
save it during suspend and restore it during resume for
host port and slave ports.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpsw_ale: Add cpsw_ale_restore() helper

This can be used by device driver to restore ALE context.
The data produced by cpsw_ale_dump() can be passed to
cpsw_ale_restore().

This is required as on certain platforms the ALE context
is lost on low power suspend/resume.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: am65-cpsw: Add suspend/resume support

Add PM handlers for System suspend/resume.

As DMA driver doesn't yet support suspend/resume we free up
the DMA channels at suspend and acquire and initialize them
at resume.

Move the init/free dma calls to ndo_open/close() hooks so
it is symmetric and easier to invoke from suspend/resume handler.

As CPTS looses contect during suspend/resume, invoke the
necessary CPTS suspend/resume helpers.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: am65-cpsw/cpts: Add suspend/resume helpers

CPTS looses context on suspend (e.g. on AM62).
Provide suspend/resume hooks in CPTS driver. These will be
invoked by CPSW driver if CPTS was instantiated by CPSW.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: fix yt8521 duplicated argument to & or |

cocci warnings: (new ones prefixed by >>)
>> drivers/net/phy/motorcomm.c:1122:8-35: duplicated argument to & or |
  drivers/net/phy/motorcomm.c:1126:8-35: duplicated argument to & or |
  drivers/net/phy/motorcomm.c:1130:8-34: duplicated argument to & or |
  drivers/net/phy/motorcomm.c:1134:8-34: duplicated argument to & or |

The second YT8521_RC1R_GE_TX_DELAY_xx should be YT8521_RC1R_FE_TX_DELAY_xx.

Fixes: 70479a40954c ("net: phy: Add driver for Motorcomm yt8521 gigabit ethernet phy")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Frank <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

gve: Fix error return code in gve_prefill_rx_pages()

If alloc_page() fails in gve_prefill_rx_pages(), it should return
an error code in the error path.

Fixes: 82fd151d38d9 ("gve: Reduce alloc and copy costs in the GQ rx path")
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Catherine Sullivan <csully@google.com>
Cc: Shailend Chand <shailend@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: simplify the code logic of quirks

Simplify the code logic of handling the quirk of FEC_QUIRK_HAS_RACC.
If a SoC has the RACC quirk, the driver will enable the 16bit shift
by default in the probe function for a better performance.

This patch handles the logic in one place to make the logic simple
and clean. The patch optimizes the fec_enet_xdp_get_tx_queue function
according to Paolo Abeni's comments, and it also exludes the SoCs that
require to do frame swap from XDP support.

Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/lcs: Fix return type of lcs_start_xmit()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/s390/net/lcs.c:2090:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = lcs_start_xmit,
                                    ^~~~~~~~~~~~~~
  drivers/s390/net/lcs.c:2097:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = lcs_start_xmit,
                                    ^~~~~~~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of lcs_start_xmit() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/netiucv: Fix return type of netiucv_tx()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/s390/net/netiucv.c:1854:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = netiucv_tx,
                                    ^~~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of netiucv_tx() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.

Additionally, while in the area, remove a comment block that is no
longer relevant.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/ctcm: Fix return type of ctc{mp,}m_tx()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/s390/net/ctcm_main.c:1064:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = ctcm_tx,
                                    ^~~~~~~
  drivers/s390/net/ctcm_main.c:1072:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = ctcmpc_tx,
                                    ^~~~~~~~~

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of ctc{mp,}m_tx() to
match the prototype's to resolve the warning and potential CFI failure,
should s390 select ARCH_SUPPORTS_CFI_CLANG in the future.

Additionally, while in the area, remove a comment block that is no
longer relevant.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wwan: t7xx: Add NAPI support

Replace the work queue based RX flow with a NAPI implementation
Remove rx_thread and dpmaif_rxq_work.
Enable GRO on RX path.
Introduce dummy network device. its responsibility is
    - Binds one NAPI object for each DL HW queue and acts as
      the agent of all those network devices.
    - Use NAPI object to poll DL packets.
    - Helps to dispatch each packet to the network interface.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Co-developed-by: Sreehari Kancharla <sreehari.kancharla@linux.intel.com>
Signed-off-by: Sreehari Kancharla <sreehari.kancharla@linux.intel.com>
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com>
Acked-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Acked-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wwan: t7xx: Use needed_headroom instead of hard_header_len

hard_header_len is used by gro_list_prepare() but on Rx, there
is no header so use needed_headroom instead.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sreehari Kancharla <sreehari.kancharla@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hinic: Add support for configuration of rx-vlan-filter by ethtool

When ethtool config rx-vlan-filter, the driver will send
control command to firmware, then set to hardware in this patch.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hinic: Add control command support for VF PMD driver in DPDK

HINIC has a mailbox for PF-VF communication and the VF driver
could send port control command to PF driver via mailbox.

The control command only can be set to register in PF,
so add support in PF driver for VF PMD driver control
command when VF PMD driver work with linux PF driver.

Then, no need to add handlers to nic_vf_cmd_msg_handler[],
because the host driver just forwards it to the firmware.
Actually the firmware works on a coprocessor MGMT_CPU(inside the NIC)
which will recv and deal with these commands.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hinic: Convert the cmd code from decimal to hex to be more readable

The print cmd code is in hex, so using hex cmd code intead of
decimal is easy to check the value with print info.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa-port: constrain number of 'reg' in ports

'reg' without any constraints allows multiple items which is not the
intention in DSA port schema (as physical port is expected to have only
one address).

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: constrain number of 'reg' in ethernet ports

'reg' without any constraints allows multiple items which is not the
intention for Ethernet controller's port number.

Constrain the 'reg' on AX88178 and LAN95xx USB Ethernet Controllers.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: axiemac: add PM callbacks to support suspend/resume

Support basic system-wide suspend and resume functions

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood

Support mode switch properly, which is not available before.

If SoC has two Ethernet controllers, by setting both of them into MII
mode, the first controller enters GMII mode, while the second
controller is effectively disabled. This requires configuring (and
maybe enabling) the second controller in the device tree, even though
it cannot be used.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeon_ep: support Octeon device CNF95N

Add support for Octeon device CNF95N.
CNF95N is a Octeon Fusion family product with same PCI NIC
characteristics as CN93 which is currently supported by the driver.

update supported device list in Documentation.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Link: https://lore.kernel.org/r/20221103060600.1858-1-vburru@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'sfc-add-basic-flower-matches-to-offload'

Edward Cree says:

====================
sfc: add basic flower matches to offload

Support offloading TC flower rules with matches on L2-L4 fields.
====================

Link: https://lore.kernel.org/r/cover.1667412458.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: add Layer 4 matches to ef100 TC offload

Support matching on UDP/TCP source and destination ports and TCP flags,
with masking if supported by the hardware.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: add Layer 3 flag matches to ef100 TC offload

Support matching on ip_frag and ip_firstfrag.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: add Layer 3 matches to ef100 TC offload

Support matching on IP protocol, Type of Service, Time To Live, source
and destination addresses, with masking if supported by the hardware.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: add Layer 2 matches to ef100 TC offload

Support matching on EtherType, VLANs and ethernet source/destination
addresses, with masking if supported by the hardware.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: check recirc_id match caps before MAE offload

Offloaded TC rules always match on recirc_id in the MAE, so we should
check that the MAE reported support for this match before attempting
to insert the rule.
These checks allow us to fail early, avoiding the PCIe round-trip to
firmware for an MC_CMD_MAE_ACTION_RULE_INSERT that will only fail,
and more importantly providing a more informative error message that
identifies which match field is unsupported.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: renesas: Fix return type of rswitch_start_xmit()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/net/ethernet/renesas/rswitch.c:1533:20: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit = rswitch_start_xmit,
                          ^~~~~~~~~~~~~~~~~~
  1 error generated.

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of rswitch_start_xmit()
to match the prototype's to resolve the warning and CFI failure.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221103220032.2142122-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: remove redundant check in ip_metrics_convert()

Now ip_metrics_convert() is only called by ip_fib_metrics_init(). Before
ip_fib_metrics_init() invokes ip_metrics_convert(), it checks whether
input parameter fc_mx is NULL. Therefore, ip_metrics_convert() doesn't
need to check fc_mx.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20221104022513.168868-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igb: Proactively round up to kmalloc bucket size

In preparation for removing the "silently change allocation size"
users of ksize(), explicitly round up all q_vector allocations so that
allocations can be correctly compared to ksize().

Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igb: Do not free q_vector unless new one was allocated

Avoid potential use-after-free condition under memory pressure. If the
kzalloc() fails, q_vector will be freed but left in the original
adapter->q_vector[v_idx] array position.

Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbevf: Add error messages on vlan error

ixgbevf did not provide an error in dmesg if VLAN addition failed.

Add two descriptive failure messages in the kernel log.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbe: Remove unneeded semicolon

./drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c:1305:2-3: Unneeded semicolon

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2688
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbe: Remove local variable

Remove local variable "match" and directly return evaluated conditional
instead.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbe: change MAX_RXD/MAX_TXD based on adapter type

Set the length limit for the receive descriptor buffer and transmit
descriptor buffer based on the controller type. The values used are called
out in the controller datasheets as a 'Note:' in the RDLEN and TDLEN
register descriptions.

This allows the user to use ethtool to allocate larger descriptor buffers
in the case where data is received or transmitted too quickly for the
driver to keep up.

Signed-off-by: Daniel Willenson <daniel@veobot.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge branch 'net-ipa-more-endpoints'

Alex Elder says:

====================
net: ipa: support more endpoints

This series adds support for more than 32 IPA endpoints.  To do
this, five registers whose bits represent endpoint state are
replicated as needed to represent endpoints beyond 32.  For existing
platforms, the number of endpoints is never greater than 32, so
there is just one of each register.  IPA v5.0+ supports more than
that though; these changes prepare the code for that.

Beyond that, the IPA fields that represent endpoints in a 32-bit
bitmask are updated to support an arbitrary number of these endpoint
registers.  (There is one exception, explained in patch 7.)

The first two patches are some sort of unrelated cleanups, making
use of a helper function introduced recently.

The third and fourth use parameterized functions to determine the
register offset for registers that represent endpoints.

The last five convert fields representing endpoints to allow more
than 32 endpoints to be represented.

Since v1, I have implemented Jakub's suggestions:
  - Don't print a message on (bitmap) memory allocation failure
  - Do not do "mass null checks" when allocating bitmaps
  - Rework some code to ensure error path is sane
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: use a bitmap for enabled endpoints

Replace the 32-bit unsigned used to track enabled endpoints with a
Linux bitmap, to allow an arbitrary number of endpoints to be
represented.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: use a bitmap for set-up endpoints

Replace the 32-bit unsigned used to track endpoints that have
completed setup with a Linux bitmap, to allow an arbitrary number
of endpoints to be represented.

Rework the error handling in ipa_endpoint_init() so the defined
endpoint bitmap is freed if an error occurs early. Once endpoints
have been initialized, ipa_endpoint_exit() is used to recover if
the set of filtered endpoints is invalid.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: support more filtering endpoints

Prior to IPA v5.0, there could be no more than 32 endpoints.

A filter table begins with a bitmap indicating which endpoints have
a filter defined.  That bitmap is currently assumed to fit in a
32-bit value.

Starting with IPA v5.0, more than 32 endpoints are supported, so
it's conceivable that a TX endpoint has an ID that exceeds 32.
Increase the size of the field representing endpoints that support
filtering to 64 bits.  Rename the bitmap field "filtered".

Unlike other similar fields, we do not use an (arbitrarily long)
Linux bitmap for this purpose.  The reason is that if a filter table
ever *did* need to support more than 64 TX endpoints, its format
would change in ways we can't anticipate.

Have ipa_endpoint_init() return a negative errno rather than a mask
that indicates which endpoints support filtering, and have that
function assign the "filtered" field directly.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: use a bitmap for available endpoints

Similar to the previous patch, replace the 32-bit unsigned used to
track endpoints supported by hardware with a Linux bitmap, to allow
an arbitrary number of endpoints to be represented.

Move ipa_endpoint_deconfig() above ipa_endpoint_config() and use
it in the error path of the latter function.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: use a bitmap for defined endpoints

IPA v5.0 supports more than 32 endpoints, so we will be unable to
represent endpoints defined in the configuration data with a 32-bit
value. To prepare for that, convert the field in the IPA structure
representing defined endpoints to be a Linux bitmap.

Convert loops based on that field into for_each_set_bit() calls over
the new bitmap. Note that the loop in ipa_endpoint_config() still
assumes there are 32 or fewer endpoints (when comparing against the
available endpoint bit mask); that assumption goes away in the next
patch.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: add a parameter to suspend registers

The SUSPEND_INFO, SUSPEND_EN, SUSPEND_CLR registers represent
endpoint IDs in a bit mask. When more than 32 endpoints are
supported, these registers will be replicated as needed to represent
the number of supported endpoints. Update the definitions of these
registers to have a stride of 4 bytes, and update the code that
operates them to select the proper offset and bit.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: add a parameter to aggregation registers

Starting with IPA v5.0, a single IPA instance can have more than 32
endpoints defined.  To handle this, each register that holds a
bitmap of IPA endpoints is replicated as needed to represent the
available endpoints.

To prepare for this, registers that represent endpoint IDs in a bit
mask will be defined to have a parameter, with a stride value of 4
bytes.  The first 32 endpoints are represented in the first 32-bit
register, then the next (up to) 32 endpoints at an offset 4 bytes
higher.  When accessing such a register, the endpoint ID divided
by 32 determines the offset, and the endpoint ID modulo 32 defines
the endpoint's bit position within the register.

The first two registers we'll update for this are STATE_AGGR_ACTIVE
and AGGR_FORCE_CLOSE.

Until more than 32 endpoints are supported, this change has no
practical effect.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: use ipa_table_mem() in ipa_table_reset_add()

Similar to the previous commit, pass flags rather than a memory
region ID to ipa_table_reset_add(), and there use ipa_table_mem() to
look up the memory region affected based on those flags.

Currently all eight of these table memory regions are assumed to
exist, because they all have canaries within them. Stop assuming
that will always be the case, and in ipa_table_reset_add() allow
these memory regions to be non-existent.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: reduce arguments to ipa_table_init_add()

Recently ipa_table_mem() was added as a way to look up one of 8
possible memory regions by indicating whether it was a filter or
route table, hashed or not, and IPv6 or not.

We can simplify the interface to ipa_table_init_add() by passing two
flags to it instead of the opcode and both hashed and non-hashed
memory region IDs. The "filter" and "ipv6" flags are sufficient to
determine the opcode to use, and with ipa_table_mem() can look up
the correct memory region as well.

It's possible to not have hashed tables, but we already verify the
number of entries in a filter or routing table is nonzero. Stop
assuming a hashed table entry exists in ipa_table_init_add().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: remove redundant variable total_payload_len

Variable total_payload_len is being used to accumulate payload lengths
however it is never read or used afterwards. It is redundant and can
be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-11-02 (i40e, iavf)

This series contains updates to i40e and iavf drivers.

Joe Damato adds tracepoint information to i40e_napi_poll to expose helpful
debug information for users who'd like to get a better understanding of
how their NIC is performing as they adjust various parameters and tuning
knobs.

Note: this does not touch any XDP related code paths. This
tracepoint will only work when not using XDP. Care has been taken to avoid
changing control flow in i40e_napi_poll with this change.

Alicja adds error messaging for unsupported duplex settings for i40e.

Ye Xingchen replaces use of __FUNCTION__ with __func__ for iavf.

Bartosz changes tense of device removal message to be more clear on the
action for iavf.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  iavf: Change information about device removal in dmesg
  iavf: Replace __FUNCTION__ with __func__
  i40e: Add appropriate error message logged for incorrect duplex setting
  i40e: Add i40e_napi_poll tracepoint
  i40e: Record number of RXes cleaned during NAPI
  i40e: Record number TXes cleaned during NAPI
  i40e: Store the irq number in i40e_q_vector
====================

Link: https://lore.kernel.org/r/20221102211011.2944983-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-11-02 (e1000e, e1000, igc)

This series contains updates to e1000e, e1000, and igc drivers.

For e1000e, Sasha adds a new board type to help distinguish platforms and
adds device id support for upcoming platforms. He also adds trace points
for CSME flows to aid in debugging.

Ani removes unnecessary kmap_atomic call for e1000 and e1000e.

Muhammad sets speed based transmit offsets for launchtime functionality to
reduce latency for igc.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: Correct the launchtime offset
  e1000: Remove unnecessary use of kmap_atomic()
  e1000e: Remove unnecessary use of kmap_atomic()
  e1000e: Add e1000e trace module
  e1000e: Add support for the next LOM generation
  e1000e: Separate MTP board type from ADP
====================

Link: https://lore.kernel.org/r/20221102203957.2944396-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hamradio: baycom_epp: Fix return type of baycom_send_packet()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/net/hamradio/baycom_epp.c:1119:25: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit      = baycom_send_packet,
                                ^~~~~~~~~~~~~~~~~~
  1 error generated.

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of baycom_send_packet()
to match the prototype's to resolve the warning and CFI failure.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221102160610.1186145-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: Fix return type of netcp_ndo_start_xmit()

With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/net/ethernet/ti/netcp_core.c:1944:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
          .ndo_start_xmit         = netcp_ndo_start_xmit,
                                    ^~~~~~~~~~~~~~~~~~~~
  1 error generated.

->ndo_start_xmit() in 'struct net_device_ops' expects a return type of
'netdev_tx_t', not 'int'. Adjust the return type of
netcp_ndo_start_xmit() to match the prototype's to resolve the warning
and CFI failure.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221102160933.1601260-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: Use kstrtobool() instead of strtobool()

strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.

In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.

While at it, include the corresponding header file (<linux/kstrtox.h>).

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/d4432a67b6f769cac0a9ec910ac725298b64e102.1667336095.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-fix-netdev-to-devlink_port-linkage-and-expose-to-user'

Jiri Pirko says:

====================
net: fix netdev to devlink_port linkage and expose to user

Currently, the info about linkage from netdev to the related
devlink_port instance is done using ndo_get_devlink_port().
This is not sufficient, as it is up to the driver to implement it and
some of them don't do that. Also it leads to a lot of unnecessary
boilerplate code in all the drivers.

Instead of that, introduce a possibility for driver to expose this
relationship by new SET_NETDEV_DEVLINK_PORT macro which stores it into
dev->devlink_port. It is ensured by the driver init/fini flows that
the devlink_port pointer does not change during the netdev lifetime.
Devlink port is always registered before netdev register and
unregistered after netdev unregister.

Benefit from this linkage setup and remove explicit calls from driver
to devlink_port_type_eth_set() and clear(). Many of the driver
didn't use it correctly anyway. Let the devlink.c to track associated
netdev events and adjust type and type pointer accordingly. Also
use this events to to keep track on ifname change and remove RTNL lock
taking from devlink_nl_port_fill().

Finally, remove the ndo_get_devlink_port() ndo which is no longer used
and expose devlink_port handle as a new netdev netlink attribute to the
user. That way, during the ifname->devlink_port lookup, userspace app
does not have to dump whole devlink port list and instead it can just
do a simple RTM_GETLINK query.
====================

Link: https://lore.kernel.org/r/20221102160211.662752-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: expose devlink port over rtnetlink

Expose devlink port handle related to netdev over rtnetlink. Introduce a
new nested IFLA attribute to carry the info. Call into devlink code to
fill-up the nest with existing devlink attributes that are used over
devlink netlink.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: remove unused ndo_get_devlink_port

Remove ndo_get_devlink_port which is no longer used alongside with the
implementations in drivers.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: use devlink_port pointer instead of ndo_get_devlink_port

Use newly introduced devlink_port pointer instead of getting it calling
to ndo_get_devlink_port op.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: add not cleared type warning to port unregister

By the time port unregister is called. There should be no type set. Make
sure that the driver cleared it before and warn in case it didn't. This
enforces symmetricity with type set and port register.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: store copy netdevice ifindex and ifname to allow port_fill() without RTNL held

To avoid a need to take RTNL mutex in port_fill() function, benefit from
the introduce infrastructure that tracks netdevice notifier events.
Store the ifindex and ifname upon register and change name events.
Remove the rtnl_held bool propagated down to port_fill() function as it
is no longer needed.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: remove net namespace check from devlink_nl_port_fill()

It is ensured by the netdevice notifier event processing, that only
netdev pointers from the same net namespaces are filled. Remove the
net namespace check from devlink_nl_port_fill() as it is no longer
needed.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: remove netdev arg from devlink_port_type_eth_set()

Since devlink_port_type_eth_set() should no longer be called by any
driver with netdev pointer as it should rather use
SET_NETDEV_DEVLINK_PORT, remove the netdev arg. Add a warn to
type_clear()

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_port

Benefit from the previously implemented tracking of netdev events in
devlink code and instead of calling devlink_port_type_eth_set() and
devlink_port_type_clear() to set devlink port type and link to related
netdev, use SET_NETDEV_DEVLINK_PORT() macro to assign devlink_port
pointer to netdevice which is about to be registered.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: track netdev with devlink_port assigned

Currently, ethernet drivers are using devlink_port_type_eth_set() and
devlink_port_type_clear() to set devlink port type and link to related
netdev.

Instead of calling them directly, let the driver use
SET_NETDEV_DEVLINK_PORT macro to assign devlink_port pointer and let
devlink to track it. Note the devlink port pointer is static during
the time netdevice is registered.

In devlink code, use per-namespace netdev notifier to track
the netdevices with devlink_port assigned and change the internal
devlink_port type and related type pointer accordingly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: take RTNL in port_fill() function only if it is not held

Follow-up patch is going to introduce a netdevice notifier event
processing which is called with RTNL mutex held. Processing of this will
eventually lead to call to port_notity() and port_fill() which currently
takes RTNL mutex internally. So as a temporary solution, propagate a
bool indicating if the mutex is already held. This will go away in one
of the follow-up patches.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: move port_type_netdev_checks() call to __devlink_port_type_set()

As __devlink_port_type_set() is going to be called directly from netdevice
notifier event handle in one of the follow-up patches, move the
port_type_netdev_checks() call there.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: move port_type_warn_schedule() call to __devlink_port_type_set()

As __devlink_port_type_set() is going to be called directly from netdevice
notifier event handle in one of the follow-up patches, move the
port_type_warn_schedule() call there.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devlink: convert devlink port type-specific pointers to union

Instead of storing type_dev as a void pointer, convert it to union and
use it to store either struct net_device or struct ib_device pointer.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bridge-add-mac-authentication-bypass-mab-support'

Ido Schimmel says:

====================
bridge: Add MAC Authentication Bypass (MAB) support

Patch #1 adds MAB support in the bridge driver. See the commit message
for motivation, design choices and implementation details.

Patch #2 adds corresponding test cases.

Follow-up patchsets will add offload support in mlxsw and mv88e6xxx.
====================

Link: https://lore.kernel.org/r/20221101193922.2125323-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: Add MAC Authentication Bypass (MAB) test cases

Add four test cases to verify MAB functionality:

* Verify that a locked FDB entry can be generated by the bridge,
  preventing a host from communicating via the bridge. Test that user
  space can clear the "locked" flag by replacing the entry, thereby
  authenticating the host and allowing it to communicate via the bridge.

* Test that an entry cannot roam to a locked port, but that it can roam
  to an unlocked port.

* Test that MAB can only be enabled on a port that is both locked and
  has learning enabled.

* Test that locked FDB entries are flushed from a port when MAB is
  disabled.

Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bridge: Add MAC Authentication Bypass (MAB) support

Hosts that support 802.1X authentication are able to authenticate
themselves by exchanging EAPOL frames with an authenticator (Ethernet
bridge, in this case) and an authentication server. Access to the
network is only granted by the authenticator to successfully
authenticated hosts.

The above is implemented in the bridge using the "locked" bridge port
option. When enabled, link-local frames (e.g., EAPOL) can be locally
received by the bridge, but all other frames are dropped unless the host
is authenticated. That is, unless the user space control plane installed
an FDB entry according to which the source address of the frame is
located behind the locked ingress port. The entry can be dynamic, in
which case learning needs to be enabled so that the entry will be
refreshed by incoming traffic.

There are deployments in which not all the devices connected to the
authenticator (the bridge) support 802.1X. Such devices can include
printers and cameras. One option to support such deployments is to
unlock the bridge ports connecting these devices, but a slightly more
secure option is to use MAB. When MAB is enabled, the MAC address of the
connected device is used as the user name and password for the
authentication.

For MAB to work, the user space control plane needs to be notified about
MAC addresses that are trying to gain access so that they will be
compared against an allow list. This can be implemented via the regular
learning process with the sole difference that learned FDB entries are
installed with a new "locked" flag indicating that the entry cannot be
used to authenticate the device. The flag cannot be set by user space,
but user space can clear the flag by replacing the entry, thereby
authenticating the device.

Locked FDB entries implement the following semantics with regards to
roaming, aging and forwarding:

1. Roaming: Locked FDB entries can roam to unlocked (authorized) ports,
   in which case the "locked" flag is cleared. FDB entries cannot roam
   to locked ports regardless of MAB being enabled or not. Therefore,
   locked FDB entries are only created if an FDB entry with the given {MAC,
   VID} does not already exist. This behavior prevents unauthenticated
   devices from disrupting traffic destined to already authenticated
   devices.

2. Aging: Locked FDB entries age and refresh by incoming traffic like
   regular entries.

3. Forwarding: Locked FDB entries forward traffic like regular entries.
   If user space detects an unauthorized MAC behind a locked port and
   wishes to prevent traffic with this MAC DA from reaching the host, it
   can do so using tc or a different mechanism.

Enable the above behavior using a new bridge port option called "mab".
It can only be enabled on a bridge port that is both locked and has
learning enabled. Locked FDB entries are flushed from the port once MAB
is disabled. A new option is added because there are pure 802.1X
deployments that are not interested in notifications about locked FDB
entries.

Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.1-rc4' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and netfilter.

  Current release - regressions:

   - net: several zerocopy flags fixes

   - netfilter: fix possible memory leak in nf_nat_init()

   - openvswitch: add missing .resv_start_op

  Previous releases - regressions:

   - neigh: fix null-ptr-deref in neigh_table_clear()

   - sched: fix use after free in red_enqueue()

   - dsa: fall back to default tagger if we can't load the one from DT

   - bluetooth: fix use-after-free in l2cap_conn_del()

  Previous releases - always broken:

   - netfilter: netlink notifier might race to release objects

   - nfc: fix potential memory leak of skb

   - bluetooth: fix use-after-free caused by l2cap_reassemble_sdu

   - bluetooth: use skb_put to set length

   - eth: tun: fix bugs for oversize packet when napi frags enabled

   - eth: lan966x: fixes for when MTU is changed

   - eth: dwmac-loongson: fix invalid mdio_node"

* tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
  vsock: fix possible infinite sleep in vsock_connectible_wait_data()
  vsock: remove the unused 'wait' in vsock_connectible_recvmsg()
  ipv6: fix WARNING in ip6_route_net_exit_late()
  bridge: Fix flushing of dynamic FDB entries
  net, neigh: Fix null-ptr-deref in neigh_table_clear()
  net/smc: Fix possible leaked pernet namespace in smc_init()
  stmmac: dwmac-loongson: fix invalid mdio_node
  ibmvnic: Free rwi on reset success
  net: mdio: fix undefined behavior in bit shift for __mdiobus_register
  Bluetooth: L2CAP: Fix attempting to access uninitialized memory
  Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm
  Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
  Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect
  Bluetooth: L2CAP: Fix memory leak in vhci_write
  Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
  Bluetooth: virtio_bt: Use skb_put to set length
  Bluetooth: hci_conn: Fix CIS connection dst_type handling
  Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
  netfilter: ipset: enforce documented limit to prevent allocating huge memory
  isdn: mISDN: netjet: fix wrong check of device registration
  ...

Merge tag 'powerpc-6.1-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- Fix an endian thinko in the asm-generic compat_arg_u64() which led to
   syscall arguments being swapped for some compat syscalls.

- Fix syscall wrapper handling of syscalls with 64-bit arguments on
   32-bit kernels, which led to syscall arguments being misplaced.

- A build fix for amdgpu on Book3E with AltiVec disabled.

Thanks to Andreas Schwab, Christian Zigotzky, and Arnd Bergmann.

* tag 'powerpc-6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Select ARCH_SPLIT_ARG64
  powerpc/32: fix syscall wrappers with 64-bit arguments
  asm-generic: compat: fix compat_arg_u64() and compat_arg_u64_dual()
  powerpc/64e: Fix amdgpu build on Book3E w/o AltiVec

Merge branch 'add-new-pcp-and-apptrust-attributes-to-dcbnl'

Daniel Machon says:

====================
Add new PCP and APPTRUST attributes to dcbnl

This patch series adds new extension attributes to dcbnl, to support PCP
prioritization (and thereby hw offloadable pcp-based queue
classification) and per-selector trust and trust order. Additionally,
the microchip sparx5 driver has been dcb-enabled to make use of the new
attributes to offload PCP, DSCP and Default prio to the switch, and
implement trust order of selectors.

For pre-RFC discussion see:
https://lore.kernel.org/netdev/Yv9VO1DYAxNduw6A@DEN-LT-70577/

For RFC series see:
https://lore.kernel.org/netdev/20220915095757.2861822-1-daniel.machon@microchip.com/

In summary: there currently exist no convenient way to offload per-port
PCP-based queue classification to hardware. The DCB subsystem offers
different ways to prioritize through its APP table, but lacks an option
for PCP. Similarly, there is no way to indicate the notion of trust for
APP table selectors. This patch series addresses both topics.

PCP based queue classification:
  - 8021Q standardizes the Priority Code Point table (see 6.9.3 of IEEE
    Std 802.1Q-2018).  This patch series makes it possible, to offload
    the PCP classification to said table.  The new PCP selector is not a
    standard part of the APP managed object, therefore it is
    encapsulated in a new non-std extension attribute.

Selector trust:
  - ASIC's often has the notion of trust DSCP and trust PCP. The new
    attribute makes it possible to specify a trust order of app
    selectors, which drivers can then react on.

DCB-enable sparx5 driver:
- Now supports offloading of DSCP, PCP and default priority. Only one
   mapping of protocol:priority is allowed. Consecutive mappings of the
   same protocol to some new priority, will overwrite the previous. This
   is to keep a consistent view of the app table and the hardware.
- Now supports dscp and pcp trust, by use of the introduced
   dcbnl_set/getapptrust ops. Sparx5 supports trust orders: [], [dscp],
   [pcp] and [dscp, pcp]. For now, only DSCP and PCP selectors are
   supported by the driver, everything else is bounced.

Patch #1 introduces a new PCP selector to the APP object, which makes it
possible to encode PCP and DEI in the app triplet and offload it to the
PCP table of the ASIC.

Patch #2 Introduces the new extension attributes
DCB_ATTR_DCB_APP_TRUST_TABLE and DCB_ATTR_DCB_APP_TRUST. Trusted
selectors are passed in the nested DCB_ATTR_DCB_APP_TRUST_TABLE
attribute, and assembled into an array of selectors:

  u8 selectors[256];

where lower indexes has higher precedence.  In the array, selectors are
stored consecutively, starting from index zero. With a maximum number of
256 unique selectors, the list has the same maximum size.

Patch #3 Sets up the dcbnl ops hook, and adds support for offloading pcp
app entries, to the PCP table of the switch.

Patch #4 Makes use of the dcbnl_set/getapptrust ops, to set a per-port
trust order.

Patch #5 Adds support for offloading dscp app entries to the DSCP table
of the switch.

Patch #6 Adds support for offloading default prio app entries to the
switch.

====================

Link: https://lore.kernel.org/r/20221101094834.2726202-1-daniel.machon@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: microchip: sparx5: add support for offloading default prio

Add support for offloading default prio {ETHERTYPE, 0, prio}.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: microchip: sparx5: add support for offloading dscp table

Add support for offloading dscp app entries. Dscp values are global for
all ports on the sparx5 switch. Therefore, we replicate each dscp app
entry per-port.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>