review.tizen.org Git - platform/kernel/linux-starfive.git/log

dccp: annotate lockless accesses to sk->sk_err_soft

This field can be read/written without lock synchronization.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: annotate lockless accesses to sk->sk_err_soft

This field can be read/written without lock synchronization.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: partially enable SIOCSHWTSTAMP in container

Setting timestamp filter was explicitly disabled on vlan devices in
containers because it might affect other processes on the host. But it's
absolutely legit in case when real device is in the same namespace.

Fixes: 873017af7784 ("vlan: disable SIOCSHWTSTAMP in container")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'pcs_get_state-fixes'

Russell King (Oracle) says:

====================
Minor fixes for pcs_get_state() implementations

This series contains a number fixes for minor issues with some
pcs_get_state() implementations, particualrly for the phylink
state->an_enabled member. As they are minor, I'm suggesting we
queue them in net-next as there is follow-on work for these, and
there is no urgency for them to be in -rc.

Just like phylib, state->advertising's Autoneg bit is a copy of
state->an_enabled, and thus it is my intention to remove
state->an_enabled from phylink to simplify things.

This series gets rid of state->an_enabled assignments or
reporting that should never have been there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: pcs: lynx: don't print an_enabled in pcs_get_state()

an_enabled will be going away, and in any case, pcs_get_state() should
not be updating this member. Remove the print.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pcs: xpcs: remove double-read of link state when using AN

Phylink does not want the current state of the link when reading the
PCS link state - it wants the latched state. Don't double-read the
MII status register. Phylink will re-read as necessary to capture
transient link-down events as of dbae3388ea9c ("net: phylink: Force
retrigger in case of latched link-fail indicator").

The above referenced commit is a dependency for this change, and thus
this change should not be backported to any kernel that does not
contain the above referenced commit.

Fixes: fcb26bd2b6ca ("net: phy: Add Synopsys DesignWare XPCS MDIO module")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vxlan-MDB-support'

Ido Schimmel says:

====================
vxlan: Add MDB support

tl;dr
=====

This patchset implements MDB support in the VXLAN driver, allowing it to
selectively forward IP multicast traffic to VTEPs with interested
receivers instead of flooding it to all the VTEPs as BUM. The motivating
use case is intra and inter subnet multicast forwarding using EVPN
[1][2], which means that MDB entries are only installed by the user
space control plane and no snooping is implemented, thereby avoiding a
lot of unnecessary complexity in the kernel.

Background
==========

Both the bridge and VXLAN drivers have an FDB that allows them to
forward Ethernet frames based on their destination MAC addresses and
VLAN/VNI. These FDBs are managed using the same PF_BRIDGE/RTM_*NEIGH
netlink messages and bridge(8) utility.

However, only the bridge driver has an MDB that allows it to selectively
forward IP multicast packets to bridge ports with interested receivers
behind them, based on (S, G) and (*, G) MDB entries. When these packets
reach the VXLAN driver they are flooded using the "all-zeros" FDB entry
(00:00:00:00:00:00). The entry either includes the list of all the VTEPs
in the tenant domain (when ingress replication is used) or the multicast
address of the BUM tunnel (when P2MP tunnels are used), to which all the
VTEPs join.

Networks that make heavy use of multicast in the overlay can benefit
from a solution that allows them to selectively forward IP multicast
traffic only to VTEPs with interested receivers. Such a solution is
described in the next section.

Motivation
==========

RFC 7432 [3] defines a "MAC/IP Advertisement route" (type 2) [4] that
allows VTEPs in the EVPN network to advertise and learn reachability
information for unicast MAC addresses. Traffic destined to a unicast MAC
address can therefore be selectively forwarded to a single VTEP behind
which the MAC is located.

The same is not true for IP multicast traffic. Such traffic is simply
flooded as BUM to all VTEPs in the broadcast domain (BD) / subnet,
regardless if a VTEP has interested receivers for the multicast stream
or not. This is especially problematic for overlay networks that make
heavy use of multicast.

The issue is addressed by RFC 9251 [1] that defines a "Selective
Multicast Ethernet Tag Route" (type 6) [5] which allows VTEPs in the
EVPN network to advertise multicast streams that they are interested in.
This is done by having each VTEP suppress IGMP/MLD packets from being
transmitted to the NVE network and instead communicate the information
over BGP to other VTEPs.

The draft in [2] further extends RFC 9251 with procedures to allow
efficient forwarding of IP multicast traffic not only in a given subnet,
but also between different subnets in a tenant domain.

The required changes in the bridge driver to support the above were
already merged in merge commit 8150f0cfb24f ("Merge branch
'bridge-mcast-extensions-for-evpn'"). However, full support entails MDB
support in the VXLAN driver so that it will be able to selectively
forward IP multicast traffic only to VTEPs with interested receivers.
The implementation of this MDB is described in the next section.

Implementation
==============

The user interface is extended to allow user space to specify the
destination VTEP(s) and related parameters. Example usage:

# bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent dst 198.51.100.1
# bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent dst 192.0.2.1

$ bridge -d -s mdb show
dev vxlan0 port vxlan0 grp 239.1.1.1 permanent filter_mode exclude proto static dst 192.0.2.1    0.00
dev vxlan0 port vxlan0 grp 239.1.1.1 permanent filter_mode exclude proto static dst 198.51.100.1    0.00

Since the MDB is fully managed by user space and since snooping is not
implemented, only permanent entries can be installed and temporary
entries are rejected by the kernel.

The netlink interface is extended with a few new attributes in the
RTM_NEWMDB / RTM_DELMDB request messages:

[ struct nlmsghdr ]
[ struct br_port_msg ]
[ MDBA_SET_ENTRY ]
struct br_mdb_entry
[ MDBA_SET_ENTRY_ATTRS ]
[ MDBE_ATTR_SOURCE ]
struct in_addr / struct in6_addr
[ MDBE_ATTR_SRC_LIST ]
[ MDBE_SRC_LIST_ENTRY ]
[ MDBE_SRCATTR_ADDRESS ]
struct in_addr / struct in6_addr
[ ...]
[ MDBE_ATTR_GROUP_MODE ]
u8
[ MDBE_ATTR_RTPORT ]
u8
[ MDBE_ATTR_DST ] // new
struct in_addr / struct in6_addr
[ MDBE_ATTR_DST_PORT ] // new
u16
[ MDBE_ATTR_VNI ] // new
u32
[ MDBE_ATTR_IFINDEX ] // new
s32
[ MDBE_ATTR_SRC_VNI ] // new
u32

RTM_NEWMDB / RTM_DELMDB responses and notifications are extended with
corresponding attributes.

One MDB entry that can be installed in the VXLAN MDB, but not in the
bridge MDB is the catchall entry (0.0.0.0 / ::). It is used to transmit
unregistered multicast traffic that is not link-local and is especially
useful when inter-subnet multicast forwarding is required. See patch #12
for a detailed explanation and motivation. It is similar to the
"all-zeros" FDB entry that can be installed in the VXLAN FDB, but not
the bridge FDB.

"added_by_star_ex" entries
--------------------------

The bridge driver automatically installs (S, G) MDB port group entries
marked as "added_by_star_ex" whenever it detects that an (S, G) entry
can prevent traffic from being forwarded via a port associated with an
EXCLUDE (*, G) entry. The bridge will add the port to the port group of
the (S, G) entry, thereby creating a new port group entry. The
complexity associated with these entries is not trivial, but it needs to
reside in the bridge driver because it automatically installs MDB
entries in response to snooped IGMP / MLD packets.

The same in not true for the VXLAN MDB which is entirely managed by user
space who is fully capable of forming the correct replication lists on
its own. In addition, the complexity associated with the
"added_by_star_ex" entries in the VXLAN driver is higher compared to the
bridge: Whenever a remote VTEP is added to the catchall entry, it needs
to be added to all the existing MDB entries, as such a remote requested
all the multicast traffic to be forwarded to it. Similarly, whenever an
(*, G) or (S, G) entry is added, all the remotes associated with the
catchall entry need to be added to it.

Given the above, this patchset does not implement support for such
entries.  One argument against this decision can be that in the future
someone might want to populate the VXLAN MDB in response to decapsulated
IGMP / MLD packets and not according to EVPN routes. Regardless of my
doubts regarding this possibility, it can be implemented using a new
VXLAN device knob that will also enable the "added_by_star_ex"
functionality.

Testing
=======

Tested using existing VXLAN and MDB selftests under "net/" and
"net/forwarding/". Added a dedicated selftest in the last patch.

Patchset overview
=================

Patches #1-#3 are small preparations in the bridge driver. I plan to
submit them separately together with an MDB dump test case.

Patches #4-#6 are additional preparations centered around the extraction
of the MDB netlink handlers from the bridge driver to the common
rtnetlink code. This allows reusing the existing MDB netlink messages
for the configuration of the VXLAN MDB.

Patches #7-#9 include more small preparations in the common rtnetlink
code and the VXLAN driver.

Patch #10 implements the MDB control path in the VXLAN driver, which
will allow user space to create, delete, replace and dump MDB entries.

Patches #11-#12 implement the MDB data path in the VXLAN driver,
allowing it to selectively forward IP multicast traffic according to the
matched MDB entry.

Patch #13 finally enables MDB support in the VXLAN driver.

iproute2 patches can be found here [6].

Note that in order to fully support the specifications in [1] and [2],
additional functionality is required from the data path. However, it can
be achieved using existing kernel interfaces which is why it is not
described here.

Changelog
=========

Since v1 [7]:

Patch #9: Use htons() in 'case' instead of ntohs() in 'switch'.

Since RFC [8]:

Patch #3: Use NL_ASSERT_DUMP_CTX_FITS().
Patch #3: memset the entire context when moving to the next device.
Patch #3: Reset sequence counters when moving to the next device.
Patch #3: Use NL_SET_ERR_MSG_ATTR() in rtnl_validate_mdb_entry().
Patch #7: Remove restrictions regarding mixing of multicast and unicast
remote destination IPs in an MDB entry. While such configuration does
not make sense to me, it is no forbidden by the VXLAN FDB code and does
not crash the kernel.
Patch #7: Fix check regarding all-zeros MDB entry and source.
Patch #11: New patch.

[1] https://datatracker.ietf.org/doc/html/rfc9251
[2] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast
[3] https://datatracker.ietf.org/doc/html/rfc7432
[4] https://datatracker.ietf.org/doc/html/rfc7432#section-7.2
[5] https://datatracker.ietf.org/doc/html/rfc9251#section-9.1
[6] https://github.com/idosch/iproute2/commits/submit/mdb_vxlan_rfc_v1
[7] https://lore.kernel.org/netdev/20230313145349.3557231-1-idosch@nvidia.com/
[8] https://lore.kernel.org/netdev/20230204170801.3897900-1-idosch@nvidia.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: net: Add VXLAN MDB test

Add test cases for VXLAN MDB, testing the control and data paths. Two
different sets of namespaces (i.e., ns{1,2}_v4 and ns{1,2}_v6) are used
in order to test VXLAN MDB with both IPv4 and IPv6 underlays,
respectively.

Example truncated output:

# ./test_vxlan_mdb.sh
[...]
Tests passed: 620
Tests failed: 0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Enable MDB support

Now that the VXLAN MDB control and data paths are in place we can expose
the VXLAN MDB functionality to user space.

Set the VXLAN MDB net device operations to the appropriate functions,
thereby allowing the rtnetlink code to reach the VXLAN driver.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Add MDB data path support

Integrate MDB support into the Tx path of the VXLAN driver, allowing it
to selectively forward IP multicast traffic according to the matched MDB
entry.

If MDB entries are configured (i.e., 'VXLAN_F_MDB' is set) and the
packet is an IP multicast packet, perform up to three different lookups
according to the following priority:

1. For an (S, G) entry, using {Source VNI, Source IP, Destination IP}.
2. For a (*, G) entry, using {Source VNI, Destination IP}.
3. For the catchall MDB entry (0.0.0.0 or ::), using the source VNI.

The catchall MDB entry is similar to the catchall FDB entry
(00:00:00:00:00:00) that is currently used to transmit BUM (broadcast,
unknown unicast and multicast) traffic. However, unlike the catchall FDB
entry, this entry is only used to transmit unregistered IP multicast
traffic that is not link-local. Therefore, when configured, the catchall
FDB entry will only transmit BULL (broadcast, unknown unicast,
link-local multicast) traffic.

The catchall MDB entry is useful in deployments where inter-subnet
multicast forwarding is used and not all the VTEPs in a tenant domain
are members in all the broadcast domains. In such deployments it is
advantageous to transmit BULL (broadcast, unknown unicast and link-local
multicast) and unregistered IP multicast traffic on different tunnels.
If the same tunnel was used, a VTEP only interested in IP multicast
traffic would also pull all the BULL traffic and drop it as it is not a
member in the originating broadcast domain [1].

If the packet did not match an MDB entry (or if the packet is not an IP
multicast packet), return it to the Tx path, allowing it to be forwarded
according to the FDB.

If the packet did match an MDB entry, forward it to the associated
remote VTEPs. However, if the entry is a (*, G) entry and the associated
remote is in INCLUDE mode, then skip over it as the source IP is not in
its source list (otherwise the packet would have matched on an (S, G)
entry). Similarly, if the associated remote is marked as BLOCKED (can
only be set on (S, G) entries), then skip over it as well as the remote
is in EXCLUDE mode and the source IP is in its source list.

[1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-2.6

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: mdb: Add an internal flag to indicate MDB usage

Add an internal flag to indicate whether MDB entries are configured or
not. Set the flag after installing the first MDB entry and clear it
before deleting the last one.

The flag will be consulted by the data path which will only perform an
MDB lookup if the flag is set, thereby keeping the MDB overhead to a
minimum when the MDB is not used.

Another option would have been to use a static key, but it is global and
not per-device, unlike the current approach.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: mdb: Add MDB control path support

Implement MDB control path support, enabling the creation, deletion,
replacement and dumping of MDB entries in a similar fashion to the
bridge driver. Unlike the bridge driver, each entry stores a list of
remote VTEPs to which matched packets need to be replicated to and not a
list of bridge ports.

The motivating use case is the installation of MDB entries by a user
space control plane in response to received EVPN routes. As such, only
allow permanent MDB entries to be installed and do not implement
snooping functionality, avoiding a lot of unnecessary complexity.

Since entries can only be modified by user space under RTNL, use RTNL as
the write lock. Use RCU to ensure that MDB entries and remotes are not
freed while being accessed from the data path during transmission.

In terms of uAPI, reuse the existing MDB netlink interface, but add a
few new attributes to request and response messages:

* IP address of the destination VXLAN tunnel endpoint where the
  multicast receivers reside.

* UDP destination port number to use to connect to the remote VXLAN
  tunnel endpoint.

* VXLAN VNI Network Identifier to use to connect to the remote VXLAN
  tunnel endpoint. Required when Ingress Replication (IR) is used and
  the remote VTEP is not a member of originating broadcast domain
  (VLAN/VNI) [1].

* Source VNI Network Identifier the MDB entry belongs to. Used only when
  the VXLAN device is in external mode.

* Interface index of the outgoing interface to reach the remote VXLAN
  tunnel endpoint. This is required when the underlay destination IP is
  multicast (P2MP), as the multicast routing tables are not consulted.

All the new attributes are added under the 'MDBA_SET_ENTRY_ATTRS' nest
which is strictly validated by the bridge driver, thereby automatically
rejecting the new attributes.

[1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-3.2.2

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Expose vxlan_xmit_one()

Given a packet and a remote destination, the function will take care of
encapsulating the packet and transmitting it to the destination.

Expose it so that it could be used in subsequent patches by the MDB code
to transmit a packet to the remote destination(s) stored in the MDB
entry.

It will allow us to keep the MDB code self-contained, not exposing its
data structures to the rest of the VXLAN driver.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Move address helpers to private headers

Move the helpers out of the core C file to the private header so that
they could be used by the upcoming MDB code.

While at it, constify the second argument of vxlan_nla_get_addr().

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: bridge: mcast: Relax group address validation in common code

In the upcoming VXLAN MDB implementation, the 0.0.0.0 and :: MDB entries
will act as catchall entries for unregistered IP multicast traffic in a
similar fashion to the 00:00:00:00:00:00 VXLAN FDB entry that is used to
transmit BUM traffic.

In deployments where inter-subnet multicast forwarding is used, not all
the VTEPs in a tenant domain are members in all the broadcast domains.
It is therefore advantageous to transmit BULL (broadcast, unknown
unicast and link-local multicast) and unregistered IP multicast traffic
on different tunnels. If the same tunnel was used, a VTEP only
interested in IP multicast traffic would also pull all the BULL traffic
and drop it as it is not a member in the originating broadcast domain
[1].

Prepare for this change by allowing the 0.0.0.0 group address in the
common rtnetlink MDB code and forbid it in the bridge driver. A similar
change is not needed for IPv6 because the common code only validates
that the group address is not the all-nodes address.

[1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-2.6

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: bridge: mcast: Move MDB handlers out of bridge driver

Currently, the bridge driver registers handlers for MDB netlink
messages, making it impossible for other drivers to implement MDB
support.

As a preparation for VXLAN MDB support, move the MDB handlers out of the
bridge driver to the core rtnetlink code. The rtnetlink code will call
into individual drivers by invoking their previously added MDB net
device operations.

Note that while the diffstat is large, the change is mechanical. It
moves code out of the bridge driver to rtnetlink code. Also note that a
similar change was made in 2012 with commit 77162022ab26 ("net: add
generic PF_BRIDGE:RTM_ FDB hooks") that moved FDB handlers out of the
bridge driver to the core rtnetlink code.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mcast: Implement MDB net device operations

Implement the previously added MDB net device operations in the bridge
driver so that they could be invoked by core rtnetlink code in the next
patch.

The operations are identical to the existing br_mdb_{dump,add,del}
functions. The '_new' suffix will be removed in the next patch. The
functions are re-implemented in this patch to make the conversion in the
next patch easier to review.

Add dummy implementations when 'CONFIG_BRIDGE_IGMP_SNOOPING' is
disabled, so that an error will be returned to user space when it is
trying to add or delete an MDB entry. This is consistent with existing
behavior where the bridge driver does not even register rtnetlink
handlers for RTM_{NEW,DEL,GET}MDB messages when this Kconfig option is
disabled.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add MDB net device operations

Add MDB net device operations that will be invoked by rtnetlink code in
response to received RTM_{NEW,DEL,GET}MDB messages. Subsequent patches
will implement these operations in the bridge and VXLAN drivers.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'J784S4-CPSW9G-bindings'

Siddharth Vadapalli says:

====================
Add J784S4 CPSW9G NET Bindings

This series cleans up the bindings by reordering the compatibles, followed
by adding the bindings for CPSW9G instance of CPSW Ethernet Switch on TI's
J784S4 SoC.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: ti: k3-am654-cpsw-nuss: Add J784S4 CPSW9G support

Update bindings for TI K3 J784S4 SoC which contains 9 ports (8 external
ports) CPSW9G module and add compatible for it.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: ti: k3-am654-cpsw-nuss: Fix compatible order

Reorder compatibles to follow alphanumeric order.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mana: Add new MANA VF performance counters for easier troubleshooting

Extended performance counter stats in 'ethtool -S <interface>' output
for MANA VF to facilitate troubleshooting.

Tested-on: Ubuntu22
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wangxun: Implement the ndo change mtu interface

Add ngbe and txgbe ndo_change_mtu support.

Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: realtek: rtl8365mb: add change_mtu

The rtl8365mb was using a fixed MTU size of 1536, which was probably
inspired by the rtl8366rb's initial frame size. However, unlike that
family, the rtl8365mb family can specify the max frame size in bytes,
rather than in fixed steps.

DSA calls change_mtu for the CPU port once the max MTU value among the
ports changes. As the max frame size is defined globally, the switch
is configured only when the call affects the CPU port.

The available specifications do not directly define the max supported
frame size, but it mentions a 16k limit. This driver will use the 0x3FFF
limit as it is used in the vendor API code. However, the switch sets the
max frame size to 16368 bytes (0x3FF0) after it resets.

change_mtu uses MTU size, or ethernet payload size, while the switch
works with frame size. The frame size is calculated considering the
ethernet header (14 bytes), a possible 802.1Q tag (4 bytes), the payload
size (MTU), and the Ethernet FCS (4 bytes). The CPU tag (8 bytes) is
consumed before the switch enforces the limit.

During setup, the driver will use the default 1500-byte MTU of DSA to
set the maximum frame size. The current sum will be
VLAN_ETH_HLEN+1500+ETH_FCS_LEN, which results in 1522 bytes. Although
it is lower than the previous initial value of 1536 bytes, the driver
will increase the frame size for a larger MTU. However, if something
requires more space without increasing the MTU, such as QinQ, we would
need to add the extra length to the rtl8365mb_port_change_mtu() formula.

MTU was tested up to 2018 (with 802.1Q) as that is as far as mt7620
(where rtl8367s is stacked) can go. The register was manually
manipulated byte-by-byte to ensure the MTU to frame size conversion was
correct. For frames without 802.1Q tag, the frame size limit will be 4
bytes over the required size.

There is a jumbo register, enabled by default at 6k frame size.
However, the jumbo settings do not seem to limit nor expand the maximum
tested MTU (2018), even when jumbo is disabled. More tests are needed
with a device that can handle larger frames.

Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'add-ptp-support-for-sama7g5'

Durai Manickam says:

====================
Add PTP support for sama7g5

This patch series is intended to add PTP capability to the GEM and
EMAC for sama7g5.
====================

Link: https://lore.kernel.org/r/20230315095053.53969-1-durai.manickamkr@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: Add PTP support to EMAC for sama7g5

Add PTP capability to the Ethernet MAC.

Signed-off-by: Durai Manickam KR <durai.manickamkr@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: Add PTP support to GEM for sama7g5

Add PTP capability to the Gigabit Ethernet MAC.

Signed-off-by: Durai Manickam KR <durai.manickamkr@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: hellcreek: Get rid of custom led_init_default_state_get()

LED core provides a helper to parse default state from firmware node.
Use it instead of custom implementation.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://lore.kernel.org/r/20230314181824.56881-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: mrvl: Use of_property_read_bool() for boolean properties

It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties.
Convert reading boolean properties to of_property_read_bool().

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: mrvl: Move platform_data struct into driver

There are no users of nfcmrvl platform_data struct outside of the
driver and none will be added, so move it into the driver.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mxl-gpy: enhance delay time required by loopback disable function

GPY2xx devices need 3 seconds to fully switch out of loopback mode
before it can safely re-enter loopback mode. Implement timeout mechanism
to guarantee 3 seconds waited before re-enter loopback mode.

Signed-off-by: Xu Liang <lxu@maxlinear.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: Fix spelling mistake "minimim" -> "minimum"

There is a spelling mistake in a pr_warn_ratelimited message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20230314082315.26532-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'nfp-flower-add-support-for-multi-zone-conntrack'

Louis Peens says:

====================
nfp: flower: add support for multi-zone conntrack

This series add changes to support offload of connection tracking across
multiple zones. Previously the driver only supported offloading of a
single goto_chain, spanning a single zone. This was implemented by
merging a pre_ct rule, post_ct rule and the nft rule. This series
provides updates to let the original post_ct rule act as the new pre_ct
rule for a next set of merges if it contains another goto and
conntrack action. In pseudo-tc rule format this adds support for:

    ingress chain 0 proto ip flower
        action ct zone 1 pipe action goto 1

    ingress chain 1 proto ip flower ct_state +tr+new ct_zone 1
        action ct_clear pipe action ct zone 2 pipe action goto 2
    ingress chain 1 proto ip flower ct_state +tr+est ct_zone 1
        action ct_clear pipe action ct zone 2 pipe action goto 2

    ingress chain 2 proto ip flower ct_state +tr+new ct_zone 2
        action mirred egress redirect dev ...
    ingress chain 2 proto ip flower ct_state +tr+est ct_zone 2
        action mirred egress redirect dev ...

This can continue for up to a maximum of 4 zone recirculations.

The first few patches are some smaller preparation patches while the
last one introduces the functionality.
====================

Link: https://lore.kernel.org/r/20230314063610.10544-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: flower: offload tc flows of multiple conntrack zones

If goto_chain action present in the post ct flow rule, merge flow rules
in this ct-zone, create a new pre_ct entry as the pre ct flow rule of
next ct-zone, but do not offload merged flow rules to firmware. Repeat
the process in the next ct-zone until no goto_chain action present in
the post ct flow rule in a certain ct-zone, merged all the flow rules.
Offload to firmware finally.

Signed-off-by: Wentao Jia <wentao.jia@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: flower: prepare for parameterisation of number of offload rules

The fixed number of offload flow rule is only supported scenario of one
ct zone, in the scenario of multiple ct zones, dynamic number and more
number of offload flow rules are required. In order to support scenario
of multiple ct zones, parameter num_rules is added for to offload flow
rules

Signed-off-by: Wentao Jia <wentao.jia@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: flower: add goto_chain_index for ct entry

The chain_index has different means in pre ct entry and post ct entry.
In pre ct entry, it means chain index, but in post ct entry, it means
goto chain index, it is confused.

chain_index and goto_chain_index may be present in one flow rule, It
cannot be distinguished by one field chain_index, both chain_index
and goto_chain_index are required in the follow-up patch to support
multiple ct zones

Another field goto_chain_index is added to record the goto chain index.
If no goto action in post ct entry, goto_chain_index is 0.

Signed-off-by: Wentao Jia <wentao.jia@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: flower: refactor function "is_post_ct_flow"

'ct_clear' action only or no ct action is supported for 'post_ct_flow'.
But in scenario of multiple ct zones, one non 'ct_clear' ct action or
more ct actions, including 'ct_clear action', may be present in one flow
rule. If ct state match key is 'ct_established', the flow rule is still
expected to be classified as 'post_ct_flow'. Check ct status first in
function "is_post_ct_flow" to achieve this.

Signed-off-by: Wentao Jia <wentao.jia@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: flower: refactor function "is_pre_ct_flow"

In the scenario of multiple ct zones, ct state key match and ct action
is present in one flow rule, the flow rule is classified to post_ct_flow
in design.

There is no ct state key match for pre ct flow, the judging condition
is added to function "is_pre_ct_flow".

Chain_index is another field for judging which flows are pre ct flow
If chain_index not 0, the flow is not pre ct flow.

Signed-off-by: Wentao Jia <wentao.jia@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: flower: add get_flow_act_ct() for ct action

CT action is a special case different from other actions, CT clear action
is not required when get ct action, but this case is not considered.
If CT clear action in the flow rule, skip the CT clear action when get ct
action, return the first ct action that is not a CT clear action

Signed-off-by: Wentao Jia <wentao.jia@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge mlx5 updates 2023-03-13

Saeed Mahameed says:

====================
mlx5-updates-2023-03-13

1) Trivial cleanup patches
2) By Sandipan Patra: Implement thermal zone to report NIC temperature
3) Adham Faris, Improves devlink health diagnostics for netdev objects
4) From Maor, Enable TC offload for egress and engress MACVLAN over bond
5) From Gal, add devlink hairpin queues parameters to replace debugfs
as was discussed in [1]:
[1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/
====================

Link: https://lore.kernel.org/all/20230314054234.267365-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Enable TC offload for egress MACVLAN over bond

Support offloading of TC rules that mirror/redirect egress traffic to a
MACVLAN device, which is attached to bond device which master mlx5 devices.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-16-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Enable TC offload for ingress MACVLAN over bond

Support offloading of TC rules that filter ingress traffic from a MACVLAN
device, which is attached to bond device.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-15-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: TC, Extract indr setup block checks to function

In preparation for next patch which will add new check
if device block can be setup, extract all existing checks
to function to make it more readable and maintainable.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-14-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Add more information to hairpin table dump

Print the number of hairpin queues and size as part of the hairpin table
dump.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-13-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Add devlink hairpin queues parameters

We refer to a TC NIC rule that involves forwarding as "hairpin".
Hairpin queues are mlx5 hardware specific implementation for hardware
forwarding of such packets.

Per the discussion in [1], move the hairpin queues control (number and
size) from debugfs to devlink.

Expose two devlink params:
- hairpin_num_queues: control the number of hairpin queues
- hairpin_queue_size: control the size (in packets) of the hairpin queues

[1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Move needed PTYS functions to core layer

Downstream patches require devlink params to access the PTYS register,
move the needed functions from mlx5e to the core layer.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Add XSK RQ state flag for RQ devlink health diagnostics

Currently RQ health diagnostics doesn't inform the user whether an RQ
is an XSK RQ or not.

Address this, by adding XSK state flag to RQ SW state enum in core/en.h.
XSK will be '1' if current RQ is an XSK RQ, and it will be '0' if it's
not.

In this example below, it can be seen that XSK field value is '1' since
xdpsock program have been attached to channel 0 before issuing the
devlink query command:

$ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx

Output:
=======================================================================
Common config:
    RQ:
      type: 2 stride size: 4096 size: 16 ts_format: FRC
      CQ:
        stride size: 64 size: 1024
  RQs:
      channel ix: 0 rqn: 4236 HW state: 1 WQE counter: 15 posted WQEs: 15 cc: 15
        SW State:
          enabled: 1 recovering: 0 am: 1 no_csum_complete: 1 csum_full: 0 mini_cqe_hw_stridx: 1 shampo: 0 mini_cqe_enhanced: 0 xsk: 1
      CQ:
        cqn: 1085 HW status: 0 ci: 0 size: 1024
      EQ:
        eqn: 7 irqn: 32 vecidx: 0 ci: 5 size: 2048
      ICOSQ:
        sqn: 4229 HW state: 1 cc: 158 pc: 158 WQE size: 2048
        CQ:
          cqn: 1080 cc: 1 size: 2048

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Expose SQ SW state as part of SQ health diagnostics

Add SQ SW state textual representation to devlink health diagnostics
for tx reporter.

SQ SW state can be retrieved by issuing the devlink command below:

$ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter tx

Output
=======================================================================
Common Config:
    SQ:
      stride size: 64 size: 1024 ts_format: FRC
      CQ:
        stride size: 64 size: 1024
  SQs:
      channel ix: 0 tc: 0 txq ix: 0 sqn: 4170 HW state: 1 stopped: false cc: 0 pc: 0
        SW State:
          enabled: 1 mpwqe: 1 recovering: 0 ipsec: 0 am: 1 vlan_need_l2_inline: 1 pending_xsk_tx: 0 pending_tls_rx_resync: 0 xdp_multibuf: 0
      CQ:
        cqn: 1031 HW status: 0 ci: 0 size: 1024
      EQ:
        eqn: 7 irqn: 32 vecidx: 0 ci: 2 size: 2048

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Stringify RQ SW state in RQ devlink health diagnostics

One of the parameters that is retrieved/printed as a response to
devlink health diagnostics for rx reporter is the RQ SW state.

It's printed as a bitmap decimal number. Printing it as bitmap is
problematic and non informative.

In addition User can't count on SW state without accessing the kernel
sources (mlx5e rq state enum in en.h).

This patch prints RQ SW state in a textual representation, as a key:
value pairs, where disabled rq states will appear as '0' and enabled
ones will appear as '1'.

See below the generated output for rx health diagnostics devlink
command:

$ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx

Before:
=======================================================================
Common config:
    RQ:
      type: 2 stride size: 2048 size: 8 ts_format: FRC
      CQ:
        stride size: 64 size: 1024
  RQs:
      channel ix: 0 rqn: 4172 HW state: 1 SW state: 37 WQE counter: 7 posted WQEs: 7 cc: 7
      CQ:
        cqn: 1033 HW status: 0 ci: 0 size: 1024
      EQ:
        eqn: 7 irqn: 32 vecidx: 0 ci: 2 size: 2048
      ICOSQ:
        sqn: 4169 HW state: 1 cc: 74 pc: 74 WQE size: 128
        CQ:
          cqn: 1030 cc: 1 size: 128
      channel ix: 1 ...
        .
        .

After:
=======================================================================
Common config:
    RQ:
      type: 2 stride size: 2048 size: 8 ts_format: FRC
      CQ:
        stride size: 64 size: 1024
  RQs:
      channel ix: 0 rqn: 4172 HW state: 1 WQE counter: 7 posted WQEs: 7 cc: 7
        SW State:
          enabled: 1 recovering: 0 am: 1 no_csum_complete: 0 csum_full: 0 mini_cqe_hw_stridx: 1 shampo: 0 mini_cqe_enhanced: 0
      CQ:
        cqn: 1033 HW status: 0 ci: 0 size: 1024
      EQ:
        eqn: 7 irqn: 32 vecidx: 0 ci: 2 size: 2048
      ICOSQ:
        sqn: 4169 HW state: 1 cc: 74 pc: 74 WQE size: 128
        CQ:
          cqn: 1030 cc: 1 size: 128
       channel: ix: 1 ...
        .
        .

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Rename RQ/SQ adaptive moderation state flag

Dynamic interrupt moderation RQ and SQ feature represented by
MLX5E_RQ_STATE_AM and MLX5E_SQ_STATE_AM enums respectively, is not
consistent with the feature naming in the driver, and with the formal
feature and library names.

Hence, change MLX5E_RQ_STATE_AM and MLX5E_SQ_STATE_AM enum type names in
core/en.h to MLX5E_RQ_STATE_DIM and MLX5E_SQ_STATE_DIM respectively.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-7-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Utilize the entire fifo

Previous check was comparing against the fifo mask. The mask is size of the
fifo (power of two) minus one, so a less than or equal comparator should be
used for checking if the fifo has room for the SKB.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Implement thermal zone

Implement thermal zone support for mlx5 based HW. The NIC
uses temperature sensor provided by ASIC to report current temperature
to thermal core.

Signed-off-by: Sandipan Patra <spatra@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add comment to mlx5_devlink_params_register()

Add comment to mlx5_devlink_params_register() functions so it is clear
that only driver init params should be registered here.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Stop waiting for PCI up if teardown was triggered

If driver teardown is called while PCI is turned off, there is a race
between health recovery and teardown. If health recovery already started
it will wait 60 sec trying to see if PCI gets back and it can recover,
but actually there is no need to wait anymore once teardown was called.

Use the MLX5_BREAK_FW_WAIT flag which is set on driver teardown to break
waiting for PCI up.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: remove redundant clear_bit

When shutdown or remove callbacks are called the driver sets the flag
MLX5_BREAK_FW_WAIT, to stop waiting for FW as teardown was called. There
is no need to clear the bit as once shutdown or remove were called as
there is no way back, the driver is going down. Furthermore, if not
cleared the flag can be used also in other loops where we may wait while
teardown was already called.

Use test_bit() instead of test_and_clear_bit() as there is no need to
clear the flag.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: micrel: drop superfluous use of temp variable

'temp' was used before commit c0c99d0cd107 ("net: phy: micrel: remove
the use of .ack_interrupt()") refactored the code. Now, we can simplify
it a little.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230314124928.44948-1-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: update obsolete comment about PHY_STARTING

Commit 899a3cbbf77a ("net: phy: remove states PHY_STARTING and
PHY_PENDING") missed to update a comment in phy_probe. Remove
superfluous "Description:" prefix while we are here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20230314124856.44878-1-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
ice: refactor mailbox overflow detection

Jake Keller says:

The primary motivation of this series is to cleanup and refactor the mailbox
overflow detection logic such that it will work with Scalable IOV. In
addition a few other minor cleanups are done while I was working on the
code in the area.

First, the mailbox overflow functions in ice_vf_mbx.c are refactored to
store the data per-VF as an embedded structure in struct ice_vf, rather than
stored separately as a fixed-size array which only works with Single Root
IOV. This reduces the overall memory footprint when only a handful of VFs
are used.

The overflow detection functions are also cleaned up to reduce the need for
multiple separate calls to determine when to report a VF as potentially
malicious.

Finally, the ice_is_malicious_vf function is cleaned up and moved into
ice_virtchnl.c since it is not Single Root IOV specific, and thus does not
belong in ice_sriov.c

I could probably have done this in fewer patches, but I split pieces out to
hopefully aid in reviewing the overall sequence of changes. This does cause
some additional thrash as it results in intermediate versions of the
refactor, but I think its worth it for making each step easier to
understand.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: call ice_is_malicious_vf() from ice_vc_process_vf_msg()
  ice: move ice_is_malicious_vf() to ice_virtchnl.c
  ice: print message if ice_mbx_vf_state_handler returns an error
  ice: pass mbxdata to ice_is_malicious_vf()
  ice: remove unnecessary &array[0] and just use array
  ice: always report VF overflowing mailbox even without PF VSI
  ice: declare ice_vc_process_vf_msg in ice_virtchnl.h
  ice: initialize mailbox snapshot earlier in PF init
  ice: merge ice_mbx_report_malvf with ice_mbx_vf_state_handler
  ice: remove ice_mbx_deinit_snapshot
  ice: move VF overflow message count into struct ice_mbx_vf_info
  ice: track malicious VFs in new ice_mbx_vf_info structure
  ice: convert ice_mbx_clear_malvf to void and use WARN
  ice: re-order ice_mbx_reset_snapshot function
====================

Link: https://lore.kernel.org/r/20230313182123.483057-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptp: ines: drop of_match_ptr for ID table

The driver can match only via the DT table so the table should be always
used and the of_match_ptr does not have any sense (this also allows ACPI
matching via PRP0001, even though it might not be relevant here). This
also fixes !CONFIG_OF error:

drivers/ptp/ptp_ines.c:783:34: error: ‘ines_ptp_ctrl_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20230312132637.352755-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

scm: fix MSG_CTRUNC setting condition for SO_PASSSEC

Currently, kernel would set MSG_CTRUNC flag if msg_control buffer
wasn't provided and SO_PASSCRED was set or if there was pending SCM_RIGHTS.

For some reason we have no corresponding check for SO_PASSSEC.

In the recvmsg(2) doc we have:
       MSG_CTRUNC
              indicates that some control data was discarded due to lack
              of space in the buffer for ancillary data.

So, we need to set MSG_CTRUNC flag for all types of SCM.

This change can break applications those don't check MSG_CTRUNC flag.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
v2:
- commit message was rewritten according to Eric's suggestion
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-smc-updates'

Wenjia Zhang says:

====================
smc: Updates 2023-03-01

The 1st patch is to make implements later do not need to adhere to a
specific SEID format. The 2nd patch does some cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/ism: Remove extra include

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: Introduce explicit check for v2 support

Previously, v2 support was derived from a very specific format of the SEID
as part of the SMC-D codebase. Make this part of the SMC-D device API, so
implementers do not need to adhere to a specific SEID format.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: geneve: set IFF_POINTOPOINT with IFLA_GENEVE_INNER_PROTO_INHERIT

The GENEVE tunnel used with IFLA_GENEVE_INNER_PROTO_INHERIT is
point-to-point, so set IFF_POINTOPOINT to reflect that.

Signed-off-by: Josef Miegl <josef@miegl.cz>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ieee802154: ca8210: drop owner from driver

Core already sets owner in spi_driver.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ieee802154: adf7242: drop owner from driver

Core already sets owner in spi_driver.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ieee802154: ca8210: drop of_match_ptr for ID table

The driver can match only via the DT table so the table should be always
used and the of_match_ptr does not have any sense (this also allows ACPI
matching via PRP0001, even though it might not be relevant here).

drivers/net/ieee802154/ca8210.c:3174:34: error: ‘ca8210_of_ids’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ieee802154: at86rf230: drop of_match_ptr for ID table

The driver will match mostly by DT table (even thought there is regular
ID table) so there is little benefit in of_match_ptr (this also allows
ACPI matching via PRP0001, even though it might not be relevant here).

drivers/net/ieee802154/at86rf230.c:1644:34: error: ‘at86rf230_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ieee802154: mcr20a: drop of_match_ptr for ID table

The driver will match mostly by DT table (even thought there is regular
ID table) so there is little benefit in of_match_ptr (this also allows
ACPI matching via PRP0001, even though it might not be relevant here).

drivers/net/ieee802154/mcr20a.c:1340:34: error: ‘mcr20a_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ieee802154: adf7242: drop of_match_ptr for ID table

The driver will match mostly by DT table (even thought there is regular
ID table) so there is little benefit in of_match_ptr (this also allows
ACPI matching via PRP0001, even though it might not be relevant here).

drivers/net/ieee802154/adf7242.c:1322:34: error: ‘adf7242_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: ks8995: drop of_match_ptr for ID table

The driver will match mostly by DT table (even thought there is regular
ID table) so there is little benefit in of_match_ptr (this also allows
ACPI matching via PRP0001, even though it might not be relevant here).

drivers/net/phy/spi_ks8995.c:156:34: error: ‘ks8895_spi_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ocelot: drop of_match_ptr for ID table

The driver can match only via the DT table so the table should be always
used and the of_match_ptr does not have any sense (this also allows ACPI
matching via PRP0001, even though it might not be relevant here).

drivers/net/dsa/ocelot/ocelot_ext.c:143:34: error: ‘ocelot_ext_switch_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ksz9477: drop of_match_ptr for ID table

The driver will match mostly by DT table (even thought there is regular
ID table) so there is little benefit in of_match_ptr (this also allows
ACPI matching via PRP0001, even though it might not be relevant here).

drivers/net/dsa/microchip/ksz9477_i2c.c:84:34: error: ‘ksz9477_dt_ids’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: seville_vsc9953: drop of_match_ptr for ID table

The driver can match only via the DT table so the table should be always
used and the of_match_ptr does not have any sense (this also allows ACPI
matching via PRP0001, even though it might not be relevant here).

drivers/net/dsa/ocelot/seville_vsc9953.c:1070:34: error: ‘seville_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: lan9303: drop of_match_ptr for ID table

The driver will match mostly or only by DT table (even thought there is
regular ID table) so there is little benefit in of_match_ptr (this also
allows ACPI matching via PRP0001, even though it might not be relevant
here).

drivers/net/dsa/lan9303_i2c.c:97:34: error: ‘lan9303_i2c_of_match’ defined but not used [-Werror=unused-const-variable=]
drivers/net/dsa/lan9303_mdio.c:157:34: error: ‘lan9303_mdio_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: lantiq_gswip: mark OF related data as maybe unused

The driver can be compile tested with !CONFIG_OF making certain data
unused:

drivers/net/dsa/lantiq_gswip.c:1888:34: error: ‘xway_gphy_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: trf7970a: mark OF related data as maybe unused

The driver can be compile tested with !CONFIG_OF making certain data
unused:

drivers/nfc/trf7970a.c:2232:34: error: ‘trf7970a_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ni: drop of_match_ptr for ID table

The driver can match only via the DT table so the table should be always
used and the of_match_ptr does not have any sense (this also allows ACPI
matching via PRP0001, even though it is not relevant here).

drivers/net/ethernet/ni/nixge.c:1253:34: error: ‘nixge_dt_ids’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: samsung: sxgbe: drop of_match_ptr for ID table

The driver can match only via the DT table so the table should be always
used and the of_match_ptr does not have any sense (this also allows ACPI
matching via PRP0001, even though it is not relevant here).

drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c:220:34: error: ‘sxgbe_dt_ids’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: pxa168_eth: drop of_match_ptr for ID table

The driver can match only via the DT table so the table should be always
used and the of_match_ptr does not have any sense (this also allows ACPI
matching via PRP0001, even though it is not relevant here).

drivers/net/ethernet/marvell/pxa168_eth.c:1575:34: error: ‘pxa168_eth_of_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: generic: drop of_match_ptr for ID table

The driver can match only via the DT table so the table should be always
used and the of_match_ptr does not have any sense (this also allows ACPI
matching via PRP0001, even though it is not relevant here).

drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c:72:34: error: ‘dwmac_generic_match’ defined but not used [-Werror=unused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: qcom: drop of_match_ptr for ID table

The driver is specific to ARCH_QCOM which depends on OF thus the driver
is OF-only. Its of_device_id table is built unconditionally, thus
of_match_ptr() for ID table does not make sense.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-microchip-tc-ets'

Oleksij Rempel says:

====================
net: dsa: microchip: tc-ets support

changes v3:
- add tc_ets_supported to match supported devices
- dynamically regenerated default TC to queue map.
- add Acked-by to the first patch

changes v2:
- run egress limit configuration on all queue separately. Otherwise
configuration may not apply correctly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: add ETS Qdisc support for KSZ9477 series

Add ETS Qdisc support for KSZ9477 of switches. Current implementation is
limited to strict priority mode.

Tested on KSZ8563R with following configuration:
tc qdisc replace dev lan2 root handle 1: ets strict 4 \
  priomap 3 3 2 2 1 1 0 0
ip link add link lan2 name v1 type vlan id 1 \
  egress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7

and patched iperf3 version:
https://github.com/esnet/iperf/pull/1476
iperf3 -c 172.17.0.1 -b100M  -l1472 -t100 -u -R --sock-prio 2

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: add ksz_setup_tc_mode() function

Add ksz_setup_tc_mode() to make queue scheduling and shaping
configuration more visible.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6-optimize-rt6_score_route'

Eric Dumazet says:

====================
ipv6: optimize rt6_score_route()

This patch series remove an expensive rwlock acquisition
in rt6_check_neigh()/rt6_score_route().

First patch adds missing annotations, and second patch implements
the optimization.
====================

Link: https://lore.kernel.org/r/20230313201732.887488-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: remove one read_lock()/read_unlock() pair in rt6_check_neigh()

rt6_check_neigh() uses read_lock() to protect n->nud_state reading.

This seems overkill and causes false sharing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbour: annotate lockless accesses to n->nud_state

We have many lockless accesses to n->nud_state.

Before adding another one in the following patch,
add annotations to readers and writers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: Change lan966x_police_del return type

As the function always returns 0 change the return type to be
void instead of int. In this way also remove a wrong message
in case of error which would never happen.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230312195155.1492881-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Use of_property_present() for testing DT property presence

It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties. As
part of this, convert of_get_property/of_find_property calls to the
recently added of_property_present() helper when we just want to test
for presence of a property and nothing more.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230310144716.1544083-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
i40e: support XDP multi-buffer

Tirthendu Sarkar says:

This patchset adds multi-buffer support for XDP. Tx side already has
support for multi-buffer. This patchset focuses on Rx side. The last
patch contains actual multi-buffer changes while the previous ones are
preparatory patches.

On receiving the first buffer of a packet, xdp_buff is built and its
subsequent buffers are added to it as frags. While 'next_to_clean' keeps
pointing to the first descriptor, the newly introduced 'next_to_process'
keeps track of every descriptor for the packet.

On receiving EOP buffer the XDP program is called and appropriate action
is taken (building skb for XDP_PASS, reusing page for XDP_DROP, adjusting
page offsets for XDP_{REDIRECT,TX}).

The patchset also streamlines page offset adjustments for buffer reuse
to make it easier to post process the rx_buffers after running XDP prog.

With this patchset there does not seem to be any performance degradation
for XDP_PASS and some improvement (~1% for XDP_TX, ~5% for XDP_DROP) when
measured using xdp_rxq_info program from samples/bpf/ for 64B packets.

v1: https://lore.kernel.org/netdev/20230306210822.3381942-1-anthony.l.nguyen@intel.com/

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  i40e: add support for XDP multi-buffer Rx
  i40e: add xdp_buff to i40e_ring struct
  i40e: introduce next_to_process to i40e_ring
  i40e: use frame_sz instead of recalculating truesize for building skb
  i40e: Change size to truesize when using i40e_rx_buffer_flip()
  i40e: add pre-xdp page_count in rx_buffer
  i40e: change Rx buffer size for legacy-rx to support XDP multi-buffer
  i40e: consolidate maximum frame size calculation for vsi
====================

Link: https://lore.kernel.org/r/20230309212819.1198218-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tc-testing: add tests for action binding

Add tests that check if filters can bind actions, that is create an
action independently and then bind to a filter.

tdc-tests under category 'infra':
1..18
ok 1 abdc - Reference pedit action object in filter
ok 2 7a70 - Reference mpls action object in filter
ok 3 d241 - Reference bpf action object in filter
ok 4 383a - Reference connmark action object in filter
ok 5 c619 - Reference csum action object in filter
ok 6 a93d - Reference ct action object in filter
ok 7 8bb5 - Reference ctinfo action object in filter
ok 8 2241 - Reference gact action object in filter
ok 9 35e9 - Reference gate action object in filter
ok 10 b22e - Reference ife action object in filter
ok 11 ef74 - Reference mirred action object in filter
ok 12 2c81 - Reference nat action object in filter
ok 13 ac9d - Reference police action object in filter
ok 14 68be - Reference sample action object in filter
ok 15 cf01 - Reference skbedit action object in filter
ok 16 c109 - Reference skbmod action object in filter
ok 17 4abc - Reference tunnel_key action object in filter
ok 18 dadd - Reference vlan action object in filter

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230309175554.304824-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt: avoid overflow in bnxt_get_nvram_directory()

The value of an arithmetic expression is subject
of possible overflow due to a failure to cast operands to a larger data
type before performing arithmetic. Used macro for multiplication instead
operator for avoiding overflow.

Found by Security Code and Linux Verification
Center (linuxtesting.org) with SVACE.

Signed-off-by: Maxim Korotkov <korotkov.maxim.s@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230309174347.3515-1-korotkov.maxim.s@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: virtio_net: implement exact header length guest feature

Virtio spec introduced a feature VIRTIO_NET_F_GUEST_HDRLEN which when
set implicates that device benefits from knowing the exact size
of the header. For compatibility, to signal to the device that
the header is reliable driver also needs to set this feature.
Without this feature set by driver, device has to figure
out the header size itself.

Quoting the original virtio spec:
"hdr_len is a hint to the device as to how much of the header needs to
be kept to copy into each packet"

"a hint" might not be clear for the reader what does it mean, if it is
"maybe like that" of "exactly like that". This feature just makes it
crystal clear and let the device count on the hdr_len being filled up
by the exact length of header.

Also note the spec already has following note about hdr_len:
"Due to various bugs in implementations, this field is not useful
as a guarantee of the transport header size."

Without this feature the device needs to parse the header in core
data path handling. Accurate information helps the device to eliminate
such header parsing and directly use the hardware accelerators
for GSO operation.

virtio_net_hdr_from_skb() fills up hdr_len to skb_headlen(skb).
The driver already complies to fill the correct value. Introduce the
feature and advertise it.

Note that virtio spec also includes following note for device
implementation:
"Caution should be taken by the implementation so as to prevent
a malicious driver from attacking the device by setting
an incorrect hdr_len."

There is a plan to support this feature in our emulated device.
A device of SolidRun offers this feature bit. They claim this feature
will save the device a few cycles for every GSO packet.

Link: https://docs.oasis-open.org/virtio/virtio/v1.2/cs01/virtio-v1.2-cs01.html#x1-230006x3
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230309094559.917857-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hsr: Don't log netdev_err message on unknown prp dst node

If no frames has been exchanged with a node for HSR_NODE_FORGET_TIME, the
node will be deleted from the node_db list. If a frame is sent to the node
after it is deleted, a netdev_err message for each slave interface is
produced. This should not happen with dan nodes because of supervision
frames, but can happen often with san nodes, which clutters the kernel
log. Since the hsr protocol does not support sans, this is only relevant
for the prp protocol.

Signed-off-by: Kristian Overskeid <koverskeid@gmail.com>
Link: https://lore.kernel.org/r/20230309092302.179586-1-koverskeid@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: smsc: use device_property_present in smsc_phy_probe

Use unified device property API.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/a969f012-1d3b-7a36-51cf-89a5f8f15a9b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: socket: suppress unused warning

suppress unused warnings and fix the error that there is
with the W=1 enabled.

Warning generated

net/socket.c: In function ‘__sys_getsockopt’:
net/socket.c:2300:13: error: variable ‘max_optlen’ set but not used [-Werror=unused-but-set-variable]
2300 | int max_optlen;

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230310221851.304657-1-vincenzopalazzodev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: dp83867: Disable IRQs on suspend

Before putting the PHY into IEEE power down mode, disable IRQs to
prevent accessing the PHY once MDIO has already been shutdown.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230310074500.3472858-1-alexander.stein@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: micrel: Add support for PTP_PF_PEROUT for lan8841

Lan8841 has 10 GPIOs and it has 2 events(EVENT_A and EVENT_B). It is
possible to assigned the 2 events to any of the GPIOs, but a GPIO can
have only 1 event at a time.
These events are used to generate periodic signals. It is possible to
configure the length, the start time and the period of the signal by
configuring the event.
Currently the SW uses only EVENT_A to generate the perout.

These events are generated by comparing the target time with the PHC
time. In case the PHC time is changed to a value bigger than the target
time + reload time, then it would generate only 1 event and then it
would stop because target time + reload time is small than PHC time.
Therefore it is required to change also the target time every time when
the PHC is changed. The same will apply also when the PHC time is
changed to a smaller value.

This was tested using:
testptp -L 6,2
testptp -p 1000000000 -w 200000000

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230307214402.793057-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: call ice_is_malicious_vf() from ice_vc_process_vf_msg()

The main loop in __ice_clean_ctrlq first checks if a VF might be malicious
before calling ice_vc_process_vf_msg(). This results in duplicate code in
both functions to obtain a reference to the VF, and exports the
ice_is_malicious_vf() from ice_virtchnl.c unnecessarily.

Refactor ice_is_malicious_vf() to be a static function that takes a pointer
to the VF. Call this in ice_vc_process_vf_msg() just after we obtain a
reference to the VF by calling ice_get_vf_by_id.

Pass the mailbox data from the __ice_clean_ctrlq function into
ice_vc_process_vf_msg() instead of calling ice_is_malicious_vf().

This reduces the number of exported functions and avoids the need to obtain
the VF reference twice for every mailbox message.

Note that the state check for ICE_VF_STATE_DIS is kept in
ice_is_malicious_vf() and we call this before checking that state in
ice_vc_process_vf_msg. This is intentional, as we stop responding to VF
messages from a VF once we detect that it may be overflowing the mailbox.
This ensures that we continue to silently ignore the message as before
without responding via ice_vc_send_msg_to_vf().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>