review.tizen.org Git - platform/kernel/linux-starfive.git/log

tcp: Introduce optional per-netns ehash.

The more sockets we have in the hash table, the longer we spend looking
up the socket.  While running a number of small workloads on the same
host, they penalise each other and cause performance degradation.

The root cause might be a single workload that consumes much more
resources than the others.  It often happens on a cloud service where
different workloads share the same computing resource.

On EC2 c5.24xlarge instance (196 GiB memory and 524288 (1Mi / 2) ehash
entries), after running iperf3 in different netns, creating 24Mi sockets
without data transfer in the root netns causes about 10% performance
regression for the iperf3's connection.

thash_entries sockets length Gbps
524288       1      1 50.7
   24Mi     48 45.1

It is basically related to the length of the list of each hash bucket.
For testing purposes to see how performance drops along the length,
I set 131072 (1Mi / 8) to thash_entries, and here's the result.

thash_entries sockets length Gbps
        131072       1      1 50.7
    1Mi      8 49.9
    2Mi     16 48.9
    4Mi     32 47.3
    8Mi     64 44.6
   16Mi    128 40.6
   24Mi    192 36.3
   32Mi    256 32.5
   40Mi    320 27.0
   48Mi    384 25.0

To resolve the socket lookup degradation, we introduce an optional
per-netns hash table for TCP, but it's just ehash, and we still share
the global bhash, bhash2 and lhash2.

With a smaller ehash, we can look up non-listener sockets faster and
isolate such noisy neighbours.  In addition, we can reduce lock contention.

We can control the ehash size by a new sysctl knob.  However, depending
on workloads, it will require very sensitive tuning, so we disable the
feature by default (net.ipv4.tcp_child_ehash_entries == 0).  Moreover,
we can fall back to using the global ehash in case we fail to allocate
enough memory for a new ehash.  The maximum size is 16Mi, which is large
enough that even if we have 48Mi sockets, the average list length is 3,
and regression would be less than 1%.

We can check the current ehash size by another read-only sysctl knob,
net.ipv4.tcp_ehash_entries.  A negative value means the netns shares
the global ehash (per-netns ehash is disabled or failed to allocate
memory).

  # dmesg | cut -d ' ' -f 5- | grep "established hash"
  TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc hugepage)

  # sysctl net.ipv4.tcp_ehash_entries
  net.ipv4.tcp_ehash_entries = 524288  # can be changed by thash_entries

  # sysctl net.ipv4.tcp_child_ehash_entries
  net.ipv4.tcp_child_ehash_entries = 0  # disabled by default

  # ip netns add test1
  # ip netns exec test1 sysctl net.ipv4.tcp_ehash_entries
  net.ipv4.tcp_ehash_entries = -524288  # share the global ehash

  # sysctl -w net.ipv4.tcp_child_ehash_entries=100
  net.ipv4.tcp_child_ehash_entries = 100

  # ip netns add test2
  # ip netns exec test2 sysctl net.ipv4.tcp_ehash_entries
  net.ipv4.tcp_ehash_entries = 128  # own a per-netns ehash with 2^n buckets

When more than two processes in the same netns create per-netns ehash
concurrently with different sizes, we need to guarantee the size in
one of the following ways:

  1) Share the global ehash and create per-netns ehash

  First, unshare() with tcp_child_ehash_entries==0.  It creates dedicated
  netns sysctl knobs where we can safely change tcp_child_ehash_entries
  and clone()/unshare() to create a per-netns ehash.

  2) Control write on sysctl by BPF

  We can use BPF_PROG_TYPE_CGROUP_SYSCTL to allow/deny read/write on
  sysctl knobs.

Note that the global ehash allocated at the boot time is spread over
available NUMA nodes, but inet_pernet_hashinfo_alloc() will allocate
pages for each per-netns ehash depending on the current process's NUMA
policy.  By default, the allocation is done in the local node only, so
the per-netns hash table could fully reside on a random node.  Thus,
depending on the NUMA policy the netns is created with and the CPU the
current thread is running on, we could see some performance differences
for highly optimised networking applications.

Note also that the default values of two sysctl knobs depend on the ehash
size and should be tuned carefully:

  tcp_max_tw_buckets  : tcp_child_ehash_entries / 2
  tcp_max_syn_backlog : max(128, tcp_child_ehash_entries / 128)

As a bonus, we can dismantle netns faster.  Currently, while destroying
netns, we call inet_twsk_purge(), which walks through the global ehash.
It can be potentially big because it can have many sockets other than
TIME_WAIT in all netns.  Splitting ehash changes that situation, where
it's only necessary for inet_twsk_purge() to clean up TIME_WAIT sockets
in each netns.

With regard to this, we do not free the per-netns ehash in inet_twsk_kill()
to avoid UAF while iterating the per-netns ehash in inet_twsk_purge().
Instead, we do it in tcp_sk_exit_batch() after calling tcp_twsk_purge() to
keep it protocol-family-independent.

In the future, we could optimise ehash lookup/iteration further by removing
netns comparison for the per-netns ehash.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Save unnecessary inet_twsk_purge() calls.

While destroying netns, we call inet_twsk_purge() in tcp_sk_exit_batch()
and tcpv6_net_exit_batch() for AF_INET and AF_INET6.  These commands
trigger the kernel to walk through the potentially big ehash twice even
though the netns has no TIME_WAIT sockets.

  # ip netns add test
  # ip netns del test

  or

  # unshare -n /bin/true >/dev/null

When tw_refcount is 1, we need not call inet_twsk_purge() at least
for the net.  We can save such unneeded iterations if all netns in
net_exit_list have no TIME_WAIT sockets.  This change eliminates
the tax by the additional unshare() described in the next patch to
guarantee the per-netns ehash size.

Tested:

  # mount -t debugfs none /sys/kernel/debug/
  # echo cleanup_net > /sys/kernel/debug/tracing/set_ftrace_filter
  # echo inet_twsk_purge >> /sys/kernel/debug/tracing/set_ftrace_filter
  # echo function > /sys/kernel/debug/tracing/current_tracer
  # cat ./add_del_unshare.sh
  for i in `seq 1 40`
  do
      (for j in `seq 1 100` ; do  unshare -n /bin/true >/dev/null ; done) &
  done
  wait;
  # ./add_del_unshare.sh

Before the patch:

  # cat /sys/kernel/debug/tracing/trace_pipe
    kworker/u128:0-8       [031] ...1.   174.162765: cleanup_net <-process_one_work
    kworker/u128:0-8       [031] ...1.   174.240796: inet_twsk_purge <-cleanup_net
    kworker/u128:0-8       [032] ...1.   174.244759: inet_twsk_purge <-tcp_sk_exit_batch
    kworker/u128:0-8       [034] ...1.   174.290861: cleanup_net <-process_one_work
    kworker/u128:0-8       [039] ...1.   175.245027: inet_twsk_purge <-cleanup_net
    kworker/u128:0-8       [046] ...1.   175.290541: inet_twsk_purge <-tcp_sk_exit_batch
    kworker/u128:0-8       [037] ...1.   175.321046: cleanup_net <-process_one_work
    kworker/u128:0-8       [024] ...1.   175.941633: inet_twsk_purge <-cleanup_net
    kworker/u128:0-8       [025] ...1.   176.242539: inet_twsk_purge <-tcp_sk_exit_batch

After:

  # cat /sys/kernel/debug/tracing/trace_pipe
    kworker/u128:0-8       [038] ...1.   428.116174: cleanup_net <-process_one_work
    kworker/u128:0-8       [038] ...1.   428.262532: cleanup_net <-process_one_work
    kworker/u128:0-8       [030] ...1.   429.292645: cleanup_net <-process_one_work

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Access &tcp_hashinfo via net.

We will soon introduce an optional per-netns ehash.

This means we cannot use tcp_hashinfo directly in most places.

Instead, access it via net->ipv4.tcp_death_row.hashinfo.

The access will be valid only while initialising tcp_hashinfo
itself and creating/destroying each netns.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Set NULL to sk->sk_prot->h.hashinfo.

We will soon introduce an optional per-netns ehash.

This means we cannot use the global sk->sk_prot->h.hashinfo
to fetch a TCP hashinfo.

Instead, set NULL to sk->sk_prot->h.hashinfo for TCP and get
a proper hashinfo from net->ipv4.tcp_death_row.hashinfo.

Note that we need not use sk->sk_prot->h.hashinfo if DCCP is
disabled.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Don't allocate tcp_death_row outside of struct netns_ipv4.

We will soon introduce an optional per-netns ehash and access hash
tables via net->ipv4.tcp_death_row->hashinfo instead of &tcp_hashinfo
in most places.

It could harm the fast path because dereferences of two fields in net
and tcp_death_row might incur two extra cache line misses. To save one
dereference, let's place tcp_death_row back in netns_ipv4 and fetch
hashinfo via net->ipv4.tcp_death_row"."hashinfo.

Note tcp_death_row was initially placed in netns_ipv4, and commit
fbb8295248e1 ("tcp: allocate tcp_death_row outside of struct netns_ipv4")
changed it to a pointer so that we can fire TIME_WAIT timers after freeing
net. However, we don't do so after commit 04c494e68a13 ("Revert "tcp/dccp:
get rid of inet_twsk_purge()""), so we need not define tcp_death_row as a
pointer.

Also, we move refcount_dec_and_test(&tw_refcount) from tcp_sk_exit() to
tcp_sk_exit_batch() as a debug check.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Clean up some functions.

This patch adds no functional change and cleans up some functions
that the following patches touch around so that we make them tidy
and easy to review/revert.  The changes are

  - Keep reverse christmas tree order
  - Remove unnecessary init of port in inet_csk_find_open_port()
  - Use req_to_sk() once in reqsk_queue_unlink()
  - Use sock_net(sk) once in tcp_time_wait() and tcp_v[46]_connect()

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

headers: Remove some left-over license text

Remove a left-over from commit 2874c5fd2842 ("treewide: Replace GPLv2
boilerplate/reference with SPDX - rule 152")

There is no need for an empty "License:".

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/0e5ff727626b748238f4b78932f81572143d8f0b.1662896317.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bonding: add a test for bonding lladdr target

This is a regression test for commit 592335a4164c ("bonding: accept
unsolicited NA message") and commit b7f14132bf58 ("bonding: use unspecified
address if no available link local address"). When the bond interface
up and no available link local address, unspecified address(::) is used to
send the NS message. The unsolicited NA message should also be accepted
for validation.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20220920033047.173244-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mux-multiplexer: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220915065043.665138-3-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mux-mmioreg: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220915065043.665138-2-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mux-meson-g12a: Switch to use dev_err_probe() helper

dev_err() can be replace with dev_err_probe() which will check if error
code is -EPROBE_DEFER.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220915065043.665138-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ravb: Add RZ/G2L MII interface support

EMAC IP found on RZ/G2L Gb ethernet supports MII interface.
This patch adds support for selecting MII interface mode.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20220914192604.265859-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: rtnetlink: Enslave device before bringing it up

Unlike with bridges, one can't add an interface to a bond and set it up
at the same time:

| # ip link set dummy0 down
| # ip link set dummy0 master bond0 up
| Error: Device can not be enslaved while up.

Of all drivers with ndo_add_slave callback, bond and team decline if
IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and
immediately up again) and the others just don't care.

Support the common notion of setting the interface up after enslaving it
by sorting the operations accordingly.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220914150623.24152-1-phil@nwl.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'macb-add-zynqmp-sgmii-dynamic-configuration-support'

Radhey Shyam Pandey says:

====================
macb: add zynqmp SGMII dynamic configuration support

This patchset add firmware and driver support to do SD/GEM dynamic
configuration. In traditional flow GEM secure space configuration
is done by FSBL. However in specific usescases like dynamic designs
where GEM is not enabled in base vivado design, FSBL skips GEM
initialization and we need a mechanism to configure GEM secure space
in linux space at runtime.
====================

Link: https://lore.kernel.org/r/1663158796-14869-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: Add zynqmp SGMII dynamic configuration support

Add support for the dynamic configuration which takes care of
configuring the GEM secure space configuration registers
using EEMI APIs.
High level sequence is to:
- Check for the PM dynamic configuration support, if no error proceed with
GEM dynamic configurations(next steps) otherwise skip the dynamic
configuration.
- Configure GEM Fixed configurations.
- Configure GEM_CLK_CTRL (gemX_sgmii_mode).
- Trigger GEM reset.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Conor Dooley <conor.dooley@microchip.com> (for MPFS)
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

firmware: xilinx: add support for sd/gem config

Add new APIs in firmware to configure SD/GEM registers. Internally
it calls PM IOCTL for below SD/GEM register configuration:
- SD/EMMC select
- SD slot type
- SD base clock
- SD 8 bit support
- SD fixed config
- GEM SGMII Mode
- GEM fixed config

Signed-off-by: Ronak Jain <ronak.jain@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xen-netfront: make bounce_skb static

The symbol is not used outside of the file, so mark it static.

Fixes the following warning:

./drivers/net/xen-netfront.c:676:16: warning: symbol 'bounce_skb' was not declared. Should it be static?

Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220914064339.49841-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: micrel: Add interrupts support for LAN8804 PHY

Add support for interrupts for LAN8804 PHY.

Tested-by: Michael Walle <michael@walle.cc> # on kontron-kswitch-d10
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20220913142926.816746-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'sfp-add-support-for-halny-gpon-module'

Russell King says:

====================
sfp: add support for HALNy GPON module

This series adds support for the HALNy GPON SFP module. In order to do
this sensibly, we need a more flexible quirk system, since we need to
change the behaviour of the SFP cage driver to ignore the LOS and
TX_FAULT signals after module detection.

Since we move the SFP quirks into the SFP cage driver, we can use it
for the MA5671A and 3FE46541AA modules as well.
====================

Link: https://lore.kernel.org/r/YyDUnvM1b0dZPmmd@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sfp: add support for HALNy GPON SFP

Add a quirk for the HALNy HL-GSFP module, which appears to have an
inverted RX_LOS signal, and maybe uses TX_FAULT as a serial port
transmit pin. Rather than use these hardware signals, switch to
using software polling for these status signals.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sfp: move Huawei MA5671A fixup

Move this module over to the new fixup mechanism.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sfp: move Alcatel Lucent 3FE46541AA fixup

Add a new fixup mechanism to the SFP quirks, and use it for this
module.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sfp: move quirk handling into sfp.c

We need to handle more quirks than just those which affect the link
modes of the module. Move the quirk lookup into sfp.c, and pass the
quirk to sfp-bus.c

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sfp: re-implement soft state polling setup

Re-implement the decision making for soft state polling. Instead of
generating the soft state mask in sfp_soft_start_poll() by looking at
which GPIOs are available, record their availability in
sfp_sm_mod_probe() in sfp->state_hw_mask.

This will then allow us to clear bits in sfp->state_hw_mask in module
specific quirks when the hardware signals should not be used, thereby
allowing us to switch to using the software state polling.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dsa: convert ocelot.txt to dt-schema

Replace the free-form description of device tree bindings for VSC9959
and VSC9953 with a YAML formatted dt-schema description. This contains
more or less the same information, but reworded to be a bit more
succint.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220913125806.524314-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ipa-a-mix-of-cleanups'

Alex Elder says:

====================
net: ipa: a mix of cleanups

This series contains a set of cleanups done in preparation for a
more substantitive upcoming series that reworks how IPA registers
and their fields are defined.

The first eliminates about half of the possible GSI register
constant symbols by removing offset definitions that are not
currently required.

The next two mainly rearrange code for some common enumerated types.

The next one fixes two spots that reuse local variable names in
inner scopes when defining offsets.

The next adds some additional restrictions on the value held in a
register.

And the last one just fixes two field mask symbol names so they
adhere to the common naming convention.
====================

Link: https://lore.kernel.org/r/20220910011131.1431934-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: fix two symbol names

All field mask symbols are defined with a "_FMASK" suffix, but
EOT_COAL_GRANULARITY and DRBIP_ACL_ENABLE are defined without one.
Fix that.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: update sequencer definition constraints

Starting with IPA v4.5, replication is done differently from before,
and as a result the "replication" portion of the how the sequencer
is specified must be zero.

Add a check for the configuration data failing that requirement, and
only update the sesquencer type value when it's supported.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: don't reuse variable names

In ipa_endpoint_init_hdr(), as well as ipa_endpoint_init_hdr_ext(),
a top-level automatic variable named "offset" is used to represent
the offset of a register.

However, deeper within each of those functions is *another*
definition of a local variable with the same name, representing
something else. Scoping rules ensure the result is what was
intended, but this variable name reuse is bad practice and makes
the code confusing.

Fix this by naming the inner variable "off". Use "off" instead of
"checksum_offset" in ipa_endpoint_init_cfg() for consistency.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: move and redefine ipa_version_valid()

Move the definition of ipa_version_valid(), making it a static
inline function defined together with the enumerated type in
"ipa_version.h". Define a new count value in the type.

Rename the function to be ipa_version_supported(), and have it
return true only if the IPA version supplied is explicitly supported
by the driver.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: move the definition of gsi_ee_id

Move the definition of the gsi_ee_id enumerated type out of "gsi.h"
and into "ipa_version.h". That latter header file isolates the
definition of the ipa_version enumerated type, allowing it to be
included in both IPA and GSI code. We have the same requirement for
gsi_ee_id, and moving it here makes it easier to get only that
definition without everything else defined in "gsi.h".

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: don't define unneeded GSI register offsets

Each GSI execution environment (EE) is able to access many of the
GSI registers associated with the other EEs. A block of GSI
registers is contained within a region of memory, and an EE's
register offset can be determined by adding the register's base
offset to the product of the EE ID and a fixed constant.

Despite this possibility, the AP IPA code *never* accesses any GSI
registers other than its own. So there's no need to define the
macros that compute register offsets for other EEs.

Redefine the AP access macros to compute the offset the way the more
general "any EE" macro would, and get rid of the unneeded macros.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ethernet-adi-add-adin1110-support'

Alexandru Tachici says:

====================
net: ethernet: adi: Add ADIN1110 support

The ADIN1110 is a low power single port 10BASE-T1L MAC-PHY
designed for industrial Ethernet applications. It integrates
an Ethernet PHY core with a MAC and all the associated analog
circuitry, input and output clock buffering.

ADIN1110 MAC-PHY encapsulates the ADIN1100 PHY. The PHY registers
can be accessed through the MDIO MAC registers.
We are registering an MDIO bus with custom read/write in order
to let the PHY to be discovered by the PAL. This will let
the ADIN1100 Linux driver to probe and take control of
the PHY.

The ADIN2111 is a low power, low complexity, two-Ethernet ports
switch with integrated 10BASE-T1L PHYs and one serial peripheral
interface (SPI) port.

The device is designed for industrial Ethernet applications using
low power constrained nodes and is compliant with the IEEE 802.3cg-2019
Ethernet standard for long reach 10 Mbps single pair Ethernet (SPE).
The switch supports various routing configurations between
the two Ethernet ports and the SPI host port providing a flexible
solution for line, daisy-chain, or ring network topologies.

The ADIN2111 supports cable reach of up to 1700 meters with ultra
low power consumption of 77 mW. The two PHY cores support the
1.0 V p-p operating mode and the 2.4 V p-p operating mode defined
in the IEEE 802.3cg standard.

The device integrates the switch, two Ethernet physical layer (PHY)
cores with a media access control (MAC) interface and all the
associated analog circuitry, and input and output clock buffering.

The device also includes internal buffer queues, the SPI and
subsystem registers, as well as the control logic to manage the reset
and clock control and hardware pin configuration.

Access to the PHYs is exposed via an internal MDIO bus. Writes/reads
can be performed by reading/writing to the ADIN2111 MDIO registers
via SPI.

On probe, for each port, a struct net_device is allocated and
registered. When both ports are added to the same bridge, the driver
will enable offloading of frame forwarding at the hardware level.

Driver offers STP support. Normal operation on forwarding state.
Allows only frames with the 802.1d DA to be passed to the host
when in any of the other states.

When both ports of ADIN2111 belong to the same SW bridge a maximum
of 12 FDB entries will offloaded by the hardware and are marked as such.
====================

Link: https://lore.kernel.org/r/20220913122629.124546-1-andrei.tachici@stud.acs.upb.ro
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: adin1110: Add docs

Add bindings for the ADIN1110/2111 MAC-PHY/SWITCH.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: adi: Add ADIN1110 support

The ADIN1110 is a low power single port 10BASE-T1L MAC-PHY
designed for industrial Ethernet applications. It integrates
an Ethernet PHY core with a MAC and all the associated analog
circuitry, input and output clock buffering.

ADIN1110 MAC-PHY encapsulates the ADIN1100 PHY. The PHY registers
can be accessed through the MDIO MAC registers.
We are registering an MDIO bus with custom read/write in order
to let the PHY to be discovered by the PAL. This will let
the ADIN1100 Linux driver to probe and take control of
the PHY.

The ADIN2111 is a low power, low complexity, two-Ethernet ports
switch with integrated 10BASE-T1L PHYs and one serial peripheral
interface (SPI) port.

The device is designed for industrial Ethernet applications using
low power constrained nodes and is compliant with the IEEE 802.3cg-2019
Ethernet standard for long reach 10 Mbps single pair Ethernet (SPE).
The switch supports various routing configurations between
the two Ethernet ports and the SPI host port providing a flexible
solution for line, daisy-chain, or ring network topologies.

The ADIN2111 supports cable reach of up to 1700 meters with ultra
low power consumption of 77 mW. The two PHY cores support the
1.0 V p-p operating mode and the 2.4 V p-p operating mode defined
in the IEEE 802.3cg standard.

The device integrates the switch, two Ethernet physical layer (PHY)
cores with a media access control (MAC) interface and all the
associated analog circuitry, and input and output clock buffering.

The device also includes internal buffer queues, the SPI and
subsystem registers, as well as the control logic to manage the reset
and clock control and hardware pin configuration.

Access to the PHYs is exposed via an internal MDIO bus. Writes/reads
can be performed by reading/writing to the ADIN2111 MDIO registers
via SPI.

On probe, for each port, a struct net_device is allocated and
registered. When both ports are added to the same bridge, the driver
will enable offloading of frame forwarding at the hardware level.

Driver offers STP support. Normal operation on forwarding state.
Allows only frames with the 802.1d DA to be passed to the host
when in any of the other states.

When both ports of ADIN2111 belong to the same SW bridge a maximum
of 12 FDB entries will offloaded by the hardware and are marked as such.

Co-developed-by: Lennart Franzen <lennart@lfdomain.com>
Signed-off-by: Lennart Franzen <lennart@lfdomain.com>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: adin1100: add PHY IDs of adin1110/adin2111

Add additional PHY IDs for the internal PHYs of adin1110 and adin2111.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'seg6-add-next-c-sid-support-for-srv6-end-behavior'

Andrea Mayer says:

====================
seg6: add NEXT-C-SID support for SRv6 End behavior

The Segment Routing (SR) architecture is based on loose source routing.
A list of instructions, called segments, can be added to the packet headers to
influence the forwarding and processing of the packets in an SR enabled
network.
In SRv6 (Segment Routing over IPv6 data plane) [1], the segment identifiers
(SIDs) are IPv6 addresses (128 bits) and the segment list (SID List) is carried
in the Segment Routing Header (SRH). A segment may correspond to a "behavior"
that is executed by a node when the packet is received.
The Linux kernel currently supports a large subset of the behaviors described
in [2] (e.g., End, End.X, End.T and so on).

Some SRv6 scenarios (i.e.: traffic-engineering, fast-rerouting, VPN, mobile
network backhaul, etc.) may require a large number of segments (i.e. up to 15).
Therefore, reducing the size of the SID List is useful to minimize the impact
on MTU (Maximum Transfer Unit) and to enable SRv6 on legacy hardware devices
with limited processing power that can suffer from long IPv6 headers.

Draft-ietf-spring-srv6-srh-compression [3] extends the SRv6 architecture by
providing different mechanisms for the efficient representation (i.e.
compression) of the SID List.

The NEXT-C-SID mechanism described in [3] offers the possibility of encoding
several SRv6 segments within a single 128 bit SID address. Such a SID address
is called a Compressed SID Container. In this way, the length of the SID List
can be drastically reduced. In some cases, the SRH can be omitted, as the IPv6
Destination Address can carry the whole Segment List, using its compressed
representation.

The NEXT-C-SID mechanism relies on the "flavors" framework defined in [2].
The flavors represent additional operations that can modify or extend a subset
of the existing behaviors.

In this patchset we extend the SRv6 Subsystem in order to support the
NEXT-C-SID mechanism.

In details the patchset is made of:
- patch 1/3: add netlink_ext_ack support in parsing SRv6 behavior attributes;
- patch 2/3: add NEXT-C-SID support for SRv6 End behavior;
- patch 3/3: add selftest for NEXT-C-SID in SRv6 End behavior.

The corresponding iproute2 patch for supporting the NEXT-C-SID in SRv6 End
behavior is provided in a separated patchset.

Comments, improvements and suggestions are always appreciated.

[1] - https://datatracker.ietf.org/doc/html/rfc8754
[2] - https://datatracker.ietf.org/doc/html/rfc8986
[3] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression

====================

Link: https://lore.kernel.org/r/20220912171619.16943-1-andrea.mayer@uniroma2.it
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End behavior

This selftest is designed for testing the support of NEXT-C-SID flavor
for SRv6 End behavior. It instantiates a virtual network composed of
several nodes: hosts and SRv6 routers. Each node is realized using a
network namespace that is properly interconnected to others through veth
pairs.
The test considers SRv6 routers implementing IPv4/IPv6 L3 VPNs leveraged
by hosts for communicating with each other. Such routers i) apply
different SRv6 Policies to the traffic received from connected hosts,
considering the IPv4 or IPv6 protocols; ii) use the NEXT-C-SID
compression mechanism for encoding several SRv6 segments within a single
128-bit SID address, referred to as a Compressed SID (C-SID) container.

The NEXT-C-SID is provided as a "flavor" of the SRv6 End behavior,
enabling it to properly process the C-SID containers. The correct
execution of the enabled NEXT-C-SID SRv6 End behavior is verified
through reachability tests carried out between hosts belonging to the
same VPN.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

seg6: add NEXT-C-SID support for SRv6 End behavior

The NEXT-C-SID mechanism described in [1] offers the possibility of
encoding several SRv6 segments within a single 128 bit SID address. Such
a SID address is called a Compressed SID (C-SID) container. In this way,
the length of the SID List can be drastically reduced.

A SID instantiated with the NEXT-C-SID flavor considers an IPv6 address
logically structured in three main blocks: i) Locator-Block; ii)
Locator-Node Function; iii) Argument.

                        C-SID container
+------------------------------------------------------------------+
|     Locator-Block      |Loc-Node|            Argument            |
|                        |Function|                                |
+------------------------------------------------------------------+
<--------- B -----------> <- NF -> <------------- A --------------->

   (i) The Locator-Block can be any IPv6 prefix available to the provider;

  (ii) The Locator-Node Function represents the node and the function to
       be triggered when a packet is received on the node;

(iii) The Argument carries the remaining C-SIDs in the current C-SID
       container.

The NEXT-C-SID mechanism relies on the "flavors" framework defined in
[2]. The flavors represent additional operations that can modify or
extend a subset of the existing behaviors.

This patch introduces the support for flavors in SRv6 End behavior
implementing the NEXT-C-SID one. An SRv6 End behavior with NEXT-C-SID
flavor works as an End behavior but it is capable of processing the
compressed SID List encoded in C-SID containers.

An SRv6 End behavior with NEXT-C-SID flavor can be configured to support
user-provided Locator-Block and Locator-Node Function lengths. In this
implementation, such lengths must be evenly divisible by 8 (i.e. must be
byte-aligned), otherwise the kernel informs the user about invalid
values with a meaningful error code and message through netlink_ext_ack.

If Locator-Block and/or Locator-Node Function lengths are not provided
by the user during configuration of an SRv6 End behavior instance with
NEXT-C-SID flavor, the kernel will choose their default values i.e.,
32-bit Locator-Block and 16-bit Locator-Node Function.

[1] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression
[2] - https://datatracker.ietf.org/doc/html/rfc8986

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

seg6: add netlink_ext_ack support in parsing SRv6 behavior attributes

An SRv6 behavior instance can be set up using mandatory and/or optional
attributes.
In the setup phase, each supplied attribute is parsed and processed. If
the parsing operation fails, the creation of the behavior instance stops
and an error number/code is reported to the user. In many cases, it is
challenging for the user to figure out exactly what happened by relying
only on the error code.

For this reason, we add the support for netlink_ext_ack in parsing SRv6
behavior attributes. In this way, when an SRv6 behavior attribute is
parsed and an error occurs, the kernel can send a message to the
userspace describing the error through a meaningful text message in
addition to the classic error code.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net-next: gro: Fix use of skb_gro_header_slow

In the cited commit, the function ipv6_gro_receive was accidentally
changed to use skb_gro_header_slow, without attempting the fast path.
Fix it.

Fixes: 35ffb6654729 ("net: gro: skb_gro_header helper function")
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Link: https://lore.kernel.org/r/20220911184835.GA105063@debian
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Ensure macsec_rule is always initiailized in macsec_fs_{r,t}x_add_rule()

Clang warns:

  drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:539:6: error: variable 'macsec_rule' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
          if (err)
              ^~~
  drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:598:9: note: uninitialized use occurs here
          return macsec_rule;
                ^~~~~~~~~~~
  drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:539:2: note: remove the 'if' if its condition is always false
          if (err)
          ^~~~~~~~
  drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:523:38: note: initialize the variable 'macsec_rule' to silence this warning
          union mlx5e_macsec_rule *macsec_rule;
                                              ^
                                              = NULL
  drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1131:6: error: variable 'macsec_rule' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
          if (err)
              ^~~
  drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1215:9: note: uninitialized use occurs here
          return macsec_rule;
                ^~~~~~~~~~~
  drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1131:2: note: remove the 'if' if its condition is always false
          if (err)
          ^~~~~~~~
  drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1118:38: note: initialize the variable 'macsec_rule' to silence this warning
          union mlx5e_macsec_rule *macsec_rule;
                                              ^
                                              = NULL
  2 errors generated.

If macsec_fs_{r,t}x_ft_get() fail, macsec_rule will be uninitialized.
Initialize it to NULL at the top of each function so that it cannot be
used uninitialized.

Fixes: e467b283ffd5 ("net/mlx5e: Add MACsec TX steering rules")
Fixes: 3b20949cb21b ("net/mlx5e: Add MACsec RX steering rules")
Link: https://github.com/ClangBuiltLinux/linux/issues/1706
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Link: https://lore.kernel.org/r/20220911085748.461033-1-nathan@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'dsa-changes-for-multiple-cpu-ports-part-4'

Vladimir Oltean says:

====================
DSA changes for multiple CPU ports (part 4)

Those who have been following part 1:
https://patchwork.kernel.org/project/netdevbpf/cover/20220511095020.562461-1-vladimir.oltean@nxp.com/
part 2:
https://patchwork.kernel.org/project/netdevbpf/cover/20220521213743.2735445-1-vladimir.oltean@nxp.com/
and part 3:
https://patchwork.kernel.org/project/netdevbpf/cover/20220819174820.3585002-1-vladimir.oltean@nxp.com/
will know that I am trying to enable the second internal port pair from
the NXP LS1028A Felix switch for DSA-tagged traffic via "ocelot-8021q".

This series represents the final part of that effort. We have:

- the introduction of new UAPI in the form of IFLA_DSA_MASTER, the
  iproute2 patch for which is here:
  https://patchwork.kernel.org/project/netdevbpf/patch/20220904190025.813574-1-vladimir.oltean@nxp.com/

- preparation for LAG DSA masters in terms of suppressing some
  operations for masters in the DSA core that simply don't make sense
  when those masters are a bonding/team interface

- handling all the net device events that occur between DSA and a
  LAG DSA master, including migration to a different DSA master when the
  current master joins a LAG, or the LAG gets destroyed

- updating documentation

- adding an implementation for NXP LS1028A, where things are insanely
  complicated due to hardware limitations. We have 2 tagging protocols:

  * the native "ocelot" protocol (NPI port mode). This does not support
    CPU ports in a LAG, and supports a single DSA master. The DSA master
    can be changed between eno2 (2.5G) and eno3 (1G), but all ports must
    be down during the changing process, and user ports assigned to the
    old DSA master will refuse to come up if the user requests that
    during a "transient" state.

  * the "ocelot-8021q" software-defined protocol, where the Ethernet
    ports connected to the CPU are not actually "god mode" ports as far
    as the hardware is concerned. So here, static assignment between
    user and CPU ports is possible by editing the PGID_SRC masks for
    the port-based forwarding matrix, and "CPU ports in a LAG" simply
    means "a LAG like any other".

The series was regression-tested on LS1028A using the local_termination.sh
kselftest, in most of the possible operating modes and tagging protocols.
I have not done a detailed performance evaluation yet, but using LAG, is
possible to exceed the termination bandwidth of a single CPU port in an
iperf3 test with multiple senders and multiple receivers.

v1 at:
https://patchwork.kernel.org/project/netdevbpf/cover/20220830195932.683432-1-vladimir.oltean@nxp.com/

Previous (older) RFC at:
https://lore.kernel.org/netdev/20220523104256.3556016-1-olteanv@gmail.com/
====================

Link: https://lore.kernel.org/r/20220911010706.2137967-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: felix: add support for changing DSA master

Changing the DSA master means different things depending on the tagging
protocol in use.

For NPI mode ("ocelot" and "seville"), there is a single port which can
be configured as NPI, but DSA only permits changing the CPU port
affinity of user ports one by one. So changing a user port to a
different NPI port globally changes what the NPI port is, and breaks the
user ports still using the old one.

To address this while still permitting the change of the NPI port,
require that the user ports which are still affine to the old NPI port
are down, and cannot be brought up until they are all affine to the same
NPI port.

The tag_8021q mode ("ocelot-8021q") is more flexible, in that each user
port can be freely assigned to one CPU port or to the other. This works
by filtering host addresses towards both tag_8021q CPU ports, and then
restricting the forwarding from a certain user port only to one of the
two tag_8021q CPU ports.

Additionally, the 2 tag_8021q CPU ports can be placed in a LAG. This
works by enabling forwarding via PGID_SRC from a certain user port
towards the logical port ID containing both tag_8021q CPU ports, but
then restricting forwarding per packet, via the LAG hash codes in
PGID_AGGR, to either one or the other.

When we change the DSA master to a LAG device, DSA guarantees us that
the LAG has at least one lower interface as a physical DSA master.
But DSA masters can come and go as lowers of that LAG, and
ds->ops->port_change_master() will not get called, because the DSA
master is still the same (the LAG). So we need to hook into the
ds->ops->port_lag_{join,leave} calls on the CPU ports and update the
logical port ID of the LAG that user ports are assigned to.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

docs: net: dsa: update information about multiple CPU ports

DSA now supports multiple CPU ports, explain the use cases that are
covered, the new UAPI, the permitted degrees of freedom, the driver API,
and remove some old "hanging fruits".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: allow masters to join a LAG

There are 2 ways in which a DSA user port may become handled by 2 CPU
ports in a LAG:

(1) its current DSA master joins a LAG

ip link del bond0 && ip link add bond0 type bond mode 802.3ad
ip link set eno2 master bond0

When this happens, all user ports with "eno2" as DSA master get
automatically migrated to "bond0" as DSA master.

(2) it is explicitly configured as such by the user

# Before, the DSA master was eno3
ip link set swp0 type dsa master bond0

The design of this configuration is that the LAG device dynamically
becomes a DSA master through dsa_master_setup() when the first physical
DSA master becomes a LAG slave, and stops being so through
dsa_master_teardown() when the last physical DSA master leaves.

A LAG interface is considered as a valid DSA master only if it contains
existing DSA masters, and no other lower interfaces. Therefore, we
mainly rely on method (1) to enter this configuration.

Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when
it becomes a standalone DSA master again. But the LAG master also has a
dev->dsa_ptr, and this is actually duplicated from one of the physical
LAG slaves, and therefore needs to be balanced when LAG slaves come and
go.

To the switch driver, putting DSA masters in a LAG is seen as putting
their associated CPU ports in a LAG.

We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG,
by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: propagate extack to port_lag_join

Drivers could refuse to offload a LAG configuration for a variety of
reasons, mainly having to do with its TX type. Additionally, since DSA
masters may now also be LAG interfaces, and this will translate into a
call to port_lag_join on the CPU ports, there may be extra restrictions
there. Propagate the netlink extack to this DSA method in order for
drivers to give a meaningful error message back to the user.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: suppress device links to LAG DSA masters

These don't work (print a harmless error about the operation failing)
and make little sense to have anyway, because when a LAG DSA master goes
away, we will introduce logic to move our CPU port back to the first
physical DSA master. So suppress these device links in preparation for
adding support for LAG DSA masters.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: suppress appending ethtool stats to LAG DSA masters

Similar to the discussion about tracking the admin/oper state of LAG DSA
masters, we have the problem here that struct dsa_port *cpu_dp caches a
single pair of orig_ethtool_ops and netdev_ops pointers.

So if we call dsa_master_setup(bond0, cpu_dp) where cpu_dp is also the
dev->dsa_ptr of one of the physical DSA masters, we'd effectively
overwrite what we cached from that physical netdev with what replaced
from the bonding interface.

We don't need DSA ethtool stats on the bonding interface when used as
DSA master, it's good enough to have them just on the physical DSA
masters, so suppress this logic.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: don't keep track of admin/oper state on LAG DSA masters

We store information about the DSA master's state in
cpu_dp->master_admin_up and cpu_dp->master_oper_up, and this assumes a
bijective association between a CPU port and a DSA master.

However, when we have CPU ports in a LAG (and DSA masters in a LAG too),
the way in which we set up things is that the physical DSA masters still
have dev->dsa_ptr pointing to our cpu_dp, but the bonding/team device
itself also has its dev->dsa_ptr pointing towards one of the CPU port
structures (the first one).

So logically speaking, that first cpu_dp can't keep track of both the
physical master's admin/oper state, and of the bonding master's state.

This isn't even needed; the reason why we keep track of the DSA master's
state is to know when it is available for Ethernet-based register access.
For that use case, we don't even need LAG; we just need to decide upon
one of the physical DSA masters (if there is more than 1 available) and
use that.

This change suppresses dsa_tree_master_{admin,oper}_state_change() calls
on LAG DSA masters (which will be supported in a future change), to
allow the tracking of just physical DSA masters.

Link: https://lore.kernel.org/netdev/628cc94d.1c69fb81.15b0d.422d@mx.google.com/
Suggested-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: allow the DSA master to be seen and changed through rtnetlink

Some DSA switches have multiple CPU ports, which can be used to improve
CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(),
sets up only the first one, leading to suboptimal use of hardware.

The desire is to not change the default configuration but to permit the
user to create a dynamic mapping between individual user ports and the
CPU port that they are served by, configurable through rtnetlink. It is
also intended to permit load balancing between CPU ports, and in that
case, the foreseen model is for the DSA master to be a bonding interface
whose lowers are the physical DSA masters.

To that end, we create a struct rtnl_link_ops for DSA user ports with
the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that
contains the ifindex of the newly desired DSA master.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: introduce dsa_port_get_master()

There is a desire to support for DSA masters in a LAG.

That configuration is intended to work by simply enslaving the master to
a bonding/team device. But the physical DSA master (the LAG slave) still
has a dev->dsa_ptr, and that cpu_dp still corresponds to the physical
CPU port.

However, we would like to be able to retrieve the LAG that's the upper
of the physical DSA master. In preparation for that, introduce a helper
called dsa_port_get_master() that replaces all occurrences of the
dp->cpu_dp->master pattern. The distinction between LAG and non-LAG will
be made later within the helper itself.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: introduce iterators over synced hw addresses

Some network drivers use __dev_mc_sync()/__dev_uc_sync() and therefore
program the hardware only with addresses with a non-zero sync_cnt.

Some of the above drivers also need to save/restore the address
filtering lists when certain events happen, and they need to walk
through the struct net_device :: uc and struct net_device :: mc lists.
But these lists contain unsynced addresses too.

To keep the appearance of an elementary form of data encapsulation,
provide iterators through these lists that only look at entries with a
non-zero sync_cnt, instead of filtering entries out from device drivers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'ice-l2tpv3-offload-support'

Tony Nguyen says:

====================
ice: L2TPv3 offload support

Wojciech Drewek says:

Add support for dissecting L2TPv3 session id in flow dissector. Add support
for this field in tc-flower and support offloading L2TPv3. Finally, add
support for hardware offload of L2TPv3 packets based on session id in
switchdev mode in ice driver.

Example filter:
  # tc filter add dev $PF1 ingress prio 1 protocol ip \
      flower \
        ip_proto l2tp \
        l2tpv3_sid 1234 \
        skip_sw \
      action mirred egress redirect dev $VF1_PR

Changes in iproute2 are required to use the new fields.

ICE COMMS DDP package is required to create a filter in ice.
COMMS DDP package contains profiles of more advanced protocols.
Without COMMS DDP package hw offload will not work, however
sw offload will still work.
====================

Link: https://lore.kernel.org/r/20220908171644.1282191-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ice: Add L2TPv3 hardware offload support

Add support for offloading packets based on L2TPv3 session id in switchdev
mode.

Example filter:
tc filter add dev $PF1 ingress prio 1 protocol ip flower ip_proto l2tp \
l2tpv3_sid 1234 skip_sw action mirred egress redirect dev $VF1_PR

Changes in iproute2 are required to be able to specify l2tpv3_sid.

ICE COMMS DDP package is required to create a filter as it contains L2TPv3
profiles.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

flow_offload: Introduce flow_match_l2tpv3

Allow to offload L2TPv3 filters by adding flow_rule_match_l2tpv3.
Drivers can extract L2TPv3 specific fields from now on.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: flower: Add L2TPv3 filter

Add support for matching on L2TPv3 session ID.
Session ID can be specified only when ip proto was
set to IPPROTO_L2TP.

Example filter:
  # tc filter add dev $PF1 ingress prio 1 protocol ip \
      flower \
        ip_proto l2tp \
        l2tpv3_sid 1234 \
        skip_sw \
      action mirred egress redirect dev $VF1_PR

Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

flow_dissector: Add L2TPv3 dissectors

Allow to dissect L2TPv3 specific field which is:
- session ID (32 bits)

L2TPv3 might be transported over IP or over UDP,
this implementation is only about L2TPv3 over IP.
IP protocol carries L2TPv3 when ip_proto is
IPPROTO_L2TP (115).

Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

uapi: move IPPROTO_L2TP to in.h

IPPROTO_L2TP is currently defined in l2tp.h, but most of
ip protocols are defined in in.h file. Move it there in order
to keep code clean.

Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeon_ep: Remove useless casting value returned by vzalloc to structure

coccinelle reports a warning

WARNING: casting value returned by memory allocation
function to (struct octep_rx_buffer *) is useless.

To fix this the useless cast is removed.

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Link: https://lore.kernel.org/r/Yx+sr9o0uylXVcOl@playground
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

openvswitch: Change the return type for vport_ops.send function hook to int

All usages of the vport_ops struct have the .send field set to
dev_queue_xmit or internal_dev_recv. Since most usages are set to
dev_queue_xmit, the function hook should match the signature of
dev_queue_xmit.

The only call to vport_ops->send() is in net/openvswitch/vport.c and it
throws away the return value.

This mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20220913230739.228313-1-nhuck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wwan: t7xx: Fix return type of t7xx_ccmni_start_xmit

The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of t7xx_ccmni_start_xmit should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://lore.kernel.org/r/20220912214510.929070-1-nhuck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wwan: iosm: Fix return type of ipc_wwan_link_transmit

The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of ipc_wwan_link_transmit should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://lore.kernel.org/r/20220912214455.929028-1-nhuck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: korina: Fix return type of korina_send_packet

The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of korina_send_packet should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220912214344.928925-1-nhuck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: litex: Fix return type of liteeth_start_xmit

The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of liteeth_start_xmit should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Gabriel Somlo <gsomlo@gmail.com>
Link: https://lore.kernel.org/r/20220912195307.812229-1-nhuck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: davinci_emac: Fix return type of emac_dev_xmit

The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of emac_dev_xmit should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220912195023.810319-1-nhuck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: davicom: Fix return type of dm9000_start_xmit

The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of dm9000_start_xmit should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220912194722.809525-1-nhuck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ax88796c: Fix return type of ax88796c_start_xmit

The ndo_start_xmit field in net_device_ops is expected to be of type
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev).

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of ax88796c_start_xmit should be changed from int to
netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Acked-by: Lukasz Stelmach <l.stelmach@samsung.com>
Link: https://lore.kernel.org/r/20220912194031.808425-1-nhuck@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'batadv-next-pullrequest-20220916' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

- bump version strings, by Simon Wunderlich

- drop unused headers in trace.h, by Sven Eckelmann

- drop initialization of flexible ethtool_link_ksettings,
   by Sven Eckelmann

- remove unused struct definitions, by Marek Lindner

* tag 'batadv-next-pullrequest-20220916' of git://git.open-mesh.org/linux-merge:
  batman-adv: remove unused struct definitions
  batman-adv: Drop initialization of flexible ethtool_link_ksettings
  batman-adv: Drop unused headers in trace.h
  batman-adv: Start new development cycle
====================

Link: https://lore.kernel.org/r/20220916161454.1413154-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlxsw-adjust-qos-tests-for-spectrum-4-testing'

Petr Machata says:

====================
mlxsw: Adjust QOS tests for Spectrum-4 testing

Amit writes:

Quality Of Service tests create congestion and verify the switch behavior.
To create congestion, they need to have more traffic than the port can
handle, so some of them force 1Gbps speed.

The tests assume that 1Gbps speed is supported. Spectrum-4 ASIC will not
support this speed in all ports, so to be able to run QOS tests there,
some adjustments are required.

Patch set overview:
Patch #1 adjusts qos_ets_strict, qos_mc_aware and sch_ets tests.
Patch #2 adjusts RED tests.
Patch #3 extends devlink_lib to support querying maximum pool size.
Patch #4 adds a test which can be used instead of qos_burst and do not
assume that 1Gbps speed is supported.
Patch #5 removes qos_burst test.
====================

Link: https://lore.kernel.org/r/cover.1663152826.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: Remove qos_burst test

The previous patch added a test which can be used instead of qos_burst.sh.
Remove this test.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: Add QOS test for maximum use of descriptors

Add an equivalent test to qos_burst, the test's purpose is same, but the
new test uses simpler topology and does not require forcing low speed.
In addition, it can be run Spectrum-2 and not only Spectrum-3+. The idea
is to use a shaper in order to limit the traffic and create congestion.

qos_burst test uses small pool, sends many small packets, and verify that
packets are not dropped, which means that many descriptors can be handled.
This test should check the change that commit c864769add96
("mlxsw: Configure descriptor buffers") pushed.

Instead, the new test tries to use more than 85% of maximum supported
descriptors. The idea is to use big pool (as much as the ASIC supports),
such that the pool size does not limit the traffic, then send many small
packets, which means that many descriptors are used, and check how many
packets the switch can handle.

The usage of shaper allows to run the test in all ASICs, regardless of
the CPU abilities, as it is able to create the congestion with low rate
of packets.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: devlink_lib: Add function for querying maximum pool size

The maximum pool size is exposed via 'devlink sb' command. The next
patch will add a test which increases some pools to the maximum size.

Add a function to query the value.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: Use shapers in QOS RED tests instead of forcing speed

QOS tests create congestion and verify the switch behavior. To create
congestion, they need to have more traffic than the port can handle, so
some of them force 1Gbps speed.

The tests assume that 1Gbps speed is supported, otherwise, they will fail.
Spectrum-4 ASIC will not support this speed in all ports, so to be able
to run the tests there, some adjustments are required. Use shapers to limit
the traffic instead of forcing speed. Note that for several ports, the
speed configuration is just for autoneg issues, so shaper is not needed
instead.

The tests already use ETS qdisc as a root and RED qdiscs as children. Add
a new TBF shaper to limit the rate of traffic, and use it as a root qdisc,
then save the previous hierarchy of qdiscs under the new TBF root.

In some ASICs, the shapers do not limit the traffic as accurately as
forcing speed. To make the tests stable, allow the backlog size to be up to
+-10% of the threshold. The aim of the tests is to make sure that with
backlog << threshold, there are no drops, and that packets are dropped
somewhere in vicinity of the configured threshold.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: Use shapers in QOS tests instead of forcing speed

QOS tests create congestion and verify the switch behavior. To create
congestion, they need to have more traffic than the port can handle, so
some of them force 1Gbps speed.

The tests assume that 1Gbps speed is supported, otherwise, they will fail.
Spectrum-4 ASIC will not support this speed in all ports, so to be able
to run QOS tests there, some adjustments are required. Use shapers to
limit the traffic instead of forcing speed. Note that for several ports,
the speed configuration is just for autoneg issues, so shaper is not needed
instead.

In tests that already use shapers, set the existing shaper to be a child of
a new TBF shaper which is added as a root qdisc and acts as a port shaper.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'remove-label-cpu-from-dsa-dt-bindings'

Vladimir Oltean says:

====================
Remove label = "cpu" from DSA dt-bindings

As explained in more detail in patch 1/3, label = "cpu" is not part of
DSA's device tree bindings, yet we have some checks in the dt-schema for
mt7530 which are written as if it was.

Reformulate those checks, and remove all occurrences of this seemingly
used, but actually unused, property from the binding examples.
====================

Link: https://lore.kernel.org/r/20220912175058.280386-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dsa: remove label = "cpu" from examples

This is not used by the DSA dt-binding, so remove it from all examples.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dsa: mt7530: stop requiring phy-mode on CPU ports

The common dsa-port.yaml does this (and more) since commit 2ec2fb8331af
("dt-bindings: net: dsa: make phylink bindings required for CPU/DSA
ports").

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dsa: mt7530: replace label = "cpu" with proper checks

The fact that some DSA device trees use 'label = "cpu"' for the CPU port
is nothing but blind cargo cult copying. The 'label' property was never
part of the DSA DT bindings for anything except the user ports, where it
provided a hint as to what name the created netdevs should use.

DSA does use the "cpu" port label to identify a CPU port in dsa_port_parse(),
but this is only for non-OF code paths (platform data).

The proper way to identify a CPU port is to look at whether the
'ethernet' phandle is present.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: rds: add missing __init/__exit annotations to module init/exit funcs

Add missing __init/__exit annotations to module init/exit funcs.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Link: https://lore.kernel.org/r/20220909091840.247946-1-xiujianfeng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rxrpc: remove rxrpc_max_call_lifetime declaration

rxrpc_max_call_lifetime has been removed since
commit a158bdd3247b ("rxrpc: Fix call timeouts"),
so remove it.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Link: https://lore.kernel.org/r/20220909064042.1149404-1-cuigaosheng1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Switch to kmemdup() when allocate dev_addr

Use kmemdup() helper instead of open-coding to
simplify the code when allocate dev_addr.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20220914140100.3795545-2-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: add missing error code in error path

Add missing error code when mlx5e_macsec_fs_add_rule() or
mlx5e_macsec_fs_init() fails. mlx5e_macsec_fs_init() don't
return ERR_PTR(), so replace IS_ERR_OR_NULL() check with
NULL pointer check.

Fixes: e467b283ffd5 ("net/mlx5e: Add MACsec TX steering rules")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20220914140100.3795545-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'standardized-ethtool-counters-for-nxp-enetc'

Vladimir Oltean says:

====================
Standardized ethtool counters for NXP ENETC

This is another preparation patch for the introduction of MAC Merge
Layer statistics, this time for the enetc driver (endpoint ports on the
NXP LS1028A). The same set of stats groups is supported as in the case
of the Felix DSA switch.
====================

Link: https://lore.kernel.org/r/20220909113800.55225-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: expose some standardized ethtool counters

Structure the code in such a way that it can be reused later for the
pMAC statistics, by just changing the "mac" argument to 1.

Usage:
ethtool --include-statistics --show-pause eno2
ethtool -S eno0 --groups eth-mac
ethtool -S eno0 --groups eth-ctrl
ethtool -S eno0 --groups rmon

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: parameterize port MAC stats to also cover the pMAC

The ENETC has counters for the eMAC and for the pMAC exactly 0x1000
apart from each other. The driver only contains definitions for PM0,
the eMAC.

Rather than duplicating everything for PM1, modify the register
definitions such that they take the MAC as argument.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ravb: Add R-Car Gen4 support

Add support for the Renesas Ethernet AVB (EtherAVB-IF) blocks on R-Car
Gen4 SoCs (e.g. R-Car V4H) by matching on a family-specific compatible
value.

These are treated the same as EtherAVB on R-Car Gen3.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/2ee968890feba777e627d781128b074b2c43cddb.1662718171.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'dt-bindings-net-renesas-etheravb-r-car-gen4-updates'

Geert Uytterhoeven says:

====================
dt-bindings: net: renesas,etheravb: R-Car Gen4 updates

This patch series contains two updates for the Renesas Ethernet AVB
Device Tree bindings.
====================

Link: https://lore.kernel.org/r/cover.1662714607.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: renesas,etheravb: Add r8a779g0 support

Document support for the Renesas Ethernet AVB (EtherAVB-IF) block in the
Renesas R-Car V4H (R8A779G0) SoC.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: renesas,etheravb: R-Car V3U is R-Car Gen4

Despite the name, R-Car V3U is the first member of the R-Car Gen4
family. Hence move its compatible value to the R-Car Gen4 section.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: micrel: Cable Diag feature for lan8814 phy

Support for Cable Diagnostics in lan8814 phy

Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220909083123.30134-1-Divya.Koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-fec-support-on-s32v234-platform'

Wei Fang says:

====================
Add FEC support on s32v234 platform

This series patches are to add FEC support on s32v234 platfom.
1. Add compatible string and quirks for fsl,s32v234
2. Update Kconfig to also check for ARCH_S32.
====================

Link: https://lore.kernel.org/r/20220907095649.3101484-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fec: Add initial s32v234 support

Update Kconfig to also check for ARCH_S32.
Add compatible string and quirks for fsl,s32v234

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: fec: add fsl,s32v234-fec to compatible property

Add fsl,s32v234-fec to compatible property to support s32v234 platform.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'ib-mfd-net-pinctrl-v6.0' of git://git./linux/kernel/git/lee/mfd

Lee Jones says:

====================
Immutable branch between MFD, Net and Pinctrl due for the v6.0 merge window

* tag 'ib-mfd-net-pinctrl-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: ocelot: Add support for the vsc7512 chip via spi
  dt-bindings: mfd: ocelot: Add bindings for VSC7512
  resource: add define macro for register address resources
  pinctrl: microchip-sgpio: add ability to be used in a non-mmio configuration
  pinctrl: microchip-sgpio: allow sgpio driver to be used as a module
  pinctrl: ocelot: add ability to be used in a non-mmio configuration
  net: mdio: mscc-miim: add ability to be used in a non-mmio configuration
  mfd: ocelot: Add helper to get regmap from a resource
====================

Link: https://lore.kernel.org/r/YxrjyHcceLOFlT/c@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: lan937x: fix reference count leak in lan937x_mdio_register()

This node pointer is returned by of_find_compatible_node() with
refcount incremented in this function. of_node_put() on it before
exitting this function.

Fixes: c9cd961c0d43 ("net: dsa: microchip: lan937x: add interrupt support for port phy link")
Signed-off-by: Sun Ke <sunke32@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220908040226.871690-1-sunke32@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: disable detection of chip version 36

I found no evidence that this chip version ever made it to the mass
market. Therefore disable detection. Like in similar cases before:
If nobody complains, we'll remove support for this chip version after
few months.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/ac622d4a-ae0a-3817-710f-849db4015c78@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: remove fs_mii_disconnect and fs_mii_connect declarations

fs_mii_disconnect and fs_mii_connect have been removed since
commit 5b4b8454344a ("[PATCH] FS_ENET: use PAL for mii management"),
so remove them.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220909062959.1144493-1-cuigaosheng1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'octeontx2-cn10k-ptp'

From: Naveen Mamindlapalli <naveenm@marvell.com>
To: <kuba@kernel.org>, <davem@davemloft.net>, <edumazet@google.com>,
<pabeni@redhat.com>, <richardcochran@gmail.com>,
<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<sgoutham@marvell.com>, <hkelam@marvell.com>
Cc: Naveen Mamindlapalli <naveenm@marvell.com>
Subject: [net-next PATCH 0/4] Add PTP support for CN10K silicon
Date: Sat, 10 Sep 2022 13:24:12 +0530 [thread overview]
Message-ID: <20220910075416.22887-1-naveenm@marvell.com> (raw)

This patchset adds PTP support for CN10K silicon, specifically
to workaround few hardware issues and to add 1-step mode.

Patchset overview:

Patch #1 returns correct ptp timestamp in nanoseconds captured
when external timestamp event occurs.

Patch #2 adds 1-step mode support.

Patch #3 implements software workaround to generate PPS output properly.

Patch #4 provides a software workaround for the rollover register default
value, which causes ptp to return the wrong timestamp.
====================

Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Initialize PTP_SEC_ROLLOVER register properly

Since the reset value of PTP_SEC_ROLLOVER is incorrect on
CNF10KB silicon, the ptp timestamps are inaccurate. This
patch initializes the PTP_SEC_ROLLOVER register properly
for the CNF10KB silicon.

Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>