review.tizen.org Git - platform/kernel/linux-starfive.git/log

devlink: Move devlink fmsg and health diagnose to health file

Devlink fmsg (formatted message) is used by devlink health diagnose,
dump and drivers which support these devlink health callbacks.
Therefore, move devlink fmsg helpers and related code to file health.c.
Move devlink health diagnose to file health.c. No functional change in
this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move devlink health report and recover to health file

Move devlink health report helper and recover callback and related code
from leftover.c to health.c. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move devlink health get and set code to health file

Move devlink health get and set callbacks and related code from
leftover.c to health.c. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: health: Fix nla_nest_end in error flow

devlink_nl_health_reporter_fill() error flow calls nla_nest_end(). Fix
it to call nla_nest_cancel() instead.

Note the bug is harmless as genlmsg_cancel() cancel the entire message,
so no fixes tag added.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Split out health reporter create code

Move devlink health reporter create/destroy and related dev code to new
file health.c. This file shall include all callbacks and functionality
that are related to devlink health.

In addition, fix kdoc indentation and make reporter create/destroy kdoc
more clear. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: update xdp_features with xdp multi-buff

Now ice driver supports xdp multi-buffer so add it to xdp_features.
Check vsi type before setting xdp_features flag.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/8a4781511ab6e3cd280e944eef69158954f1a15f.1676385351.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i40e: check vsi type before setting xdp_features flag

Set xdp_features flag just for I40E_VSI_MAIN vsi type since XDP is
supported just in this configuration.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/f2b537f86b34fc176fbc6b3d249b46a20a87a2f3.1676405131.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: support validated pause and autoneg in fixed-link

In fixed-link setup phylink_parse_fixedlink() unconditionally sets
Pause, Asym_Pause and Autoneg bits to "supported" bitmap, while MAC may
not support these.

This leads to ethtool reporting:

> Supported pause frame use: Symmetric Receive-only
> Supports auto-negotiation: Yes

regardless of what is actually supported.

Instead of unconditionally set Pause, Asym_Pause and Autoneg it is
sensible to set them according to validated "supported" bitmap, i.e. the
result of phylink_validate().

Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: no longer support SOCK_REFCNT_DEBUG feature

Commit e48c414ee61f ("[INET]: Generalise the TCP sock ID lookup routines")
commented out the definition of SOCK_REFCNT_DEBUG in 2005 and later another
commit 463c84b97f24 ("[NET]: Introduce inet_connection_sock") removed it.
Since we could track all of them through bpf and kprobe related tools
and the feature could print loads of information which might not be
that helpful even under a little bit pressure, the whole feature which
has been inactive for many years is no longer supported.

Link: https://lore.kernel.org/lkml/20230211065153.54116-1-kerneljasonxing@gmail.com/
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink-specs: add rx-push to ethtool family

Commit 5b4e9a7a71ab ("net: ethtool: extend ringparam set/get APIs for rx_push")
added a new attr for configuring rx-push, right after tx-push.
Add it to the spec, the ring param operation is covered by
the otherwise sparse ethtool spec.

Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20230214043246.230518-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-make-kobj_type-structures-constant'

Thomas Weißschuh says:

====================
net: make kobj_type structures constant

Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.
====================

Link: https://lore.kernel.org/r/20230211-kobj_type-net-v2-0-013b59e59bf3@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net-sysfs: make kobj_type structures constant

Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: make kobj_type structure constant

Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definition to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ipa-define-gsi-register-fields-differently'

Alex Elder says:

====================
net: ipa: define GSI register fields differently

Now that we have "reg" definitions in place to define GSI register
offsets, add the definitions for the fields of GSI registers that
have them.

There aren't many differences between versions, but a few fields are
present only in some versions of IPA, so additional "gsi_reg-vX.Y.c"
files are created to capture such differences. As in the previous
series, these files are created as near-copies of existing files
just before they're needed to represent these differences. The
first patch adds files for IPA v4.0, v4.5, and v4.9; the fifth patch
adds a file for IPA v4.11.

Note that the first and fifth patch cause some checkpatch warnings
because they align some continued lines with an open parenthesis
that at the fourth column.
====================

Link: https://lore.kernel.org/r/20230213162229.604438-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: define fields for remaining GSI registers

Define field IDs for the remaining GSI registers, and populate the
register definition files accordingly. Use the reg_*() functions to
access field values for those regiters, and get rid of the previous
field definition constants.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: add "gsi_v4.11.c"

The next patch adds a GSI register field that is only valid starting
at IPA v4.11. Create "gsi_v4.11.c" from "gsi_v4.9.c", changing only
the name of the public regs structure it defines.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: define fields for event-ring related registers

Define field IDs for the EV_CH_E_CNTXT_0 and EV_CH_E_CNTXT_8 GSI
registers, and populate the register definition files accordingly.
Use the reg_*() functions to access field values for those regiters,
and get rid of the previous field definition constants.

The remaining EV_CH_E_CNTXT_* registers are written with full 32-bit
values (and have no fields).

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: define more fields for GSI registers

Beyond the CH_C_QOS register, two other registers whose offset is
related to channel number have fields within them.

Define the fields within the CH_C_CNTXT_0 GSI register, using an
enumerated type to identify the register's fields, and define an
array of field masks to use for that register's reg structure.

For the CH_C_CNTXT_1 GSI register, ch_c_cntxt_1_length_encode()
previously hid the difference in bit width in the channel ring
length field. Instead, define a new field CH_R_LENGTH and encode
the ring size with reg_encode().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: define GSI CH_C_QOS register fields

Define the fields within the CH_C_QOS GSI register using an array of
field masks in that register's reg structure. Use the reg functions
for encoding values in those fields.

One field in the register is present for IPA v4.0-4.2 only, two
others are present starting at IPA v4.5, and one more is there
starting at IPA v4.9.

Drop the "GSI_" prefix in symbols defined in the gsi_prefetch_mode
enumerated type, and define their values using decimal rather than
hexidecimal values.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: populate more GSI register files

Create "gsi_v4.0.c", "gsi_v4.5.c", and "gsi_v4.9.c" as essentially
identical copies of "gsi_v3.5.1.c". The only difference is the name
of the exported "gsi_regs_vX_Y" structure. The next patch will
start differentiating them.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: don't allow to change net namespace for FW_ACTIVATE reload action

The change on network namespace only makes sense during re-init reload
action. For FW activation it is not applicable. So check if user passed
an ATTR indicating network namespace change request and forbid it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230213115836.3404039-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hv_netvsc: Check status in SEND_RNDIS_PKT completion message

Completion responses to SEND_RNDIS_PKT messages are currently processed
regardless of the status in the response, so that resources associated
with the request are freed. While this is appropriate, code bugs that
cause sending a malformed message, or errors on the Hyper-V host, go
undetected. Fix this by checking the status and outputting a rate-limited
message if there is an error.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1676264881-48928-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'add-support-for-per-action-hw-stats'

Oz Shlomo says:

====================
add support for per action hw stats

There are currently two mechanisms for populating hardware stats:
1. Using flow_offload api to query the flow's statistics.
   The api assumes that the same stats values apply to all
   the flow's actions.
   This assumption breaks when action drops or jumps over following
   actions.
2. Using hw_action api to query specific action stats via a driver
   callback method. This api assures the correct action stats for
   the offloaded action, however, it does not apply to the rest of the
   actions in the flow's actions array, as elaborated below.

The current hw_action api does not apply to the following use cases:
1. Actions that are implicitly created by filters (aka bind actions).
   In the following example only one counter will apply to the rule:
   tc filter add dev $DEV prio 2 protocol ip parent ffff: \
        flower ip_proto tcp dst_ip $IP2 \
        action police rate 1mbit burst 100k conform-exceed drop/pipe \
        action mirred egress redirect dev $DEV2

2. Action preceding a hw action.
   In the following example the same flow stats will apply to the sample and
   mirred actions:
    tc action add police rate 1mbit burst 100k conform-exceed drop / pipe
    tc filter add dev $DEV prio 2 protocol ip parent ffff: \
        flower ip_proto tcp dst_ip $IP2 \
        action sample rate 1 group 10 trunc 60 pipe \
        action police index 1 \
        action mirred egress redirect dev $DEV2

3. Meter action using jump control.
   In the following example the same flow stats will apply to both
   mirred actions:
    tc action add police rate 1mbit burst 100k conform-exceed jump 2 / pipe
    tc filter add dev $DEV prio 2 protocol ip parent ffff: \
        flower ip_proto tcp dst_ip $IP2 \
        action police index 1 \
        action mirred egress redirect dev $DEV2
        action mirred egress redirect dev $DEV3

This series provides the platform to query per action stats for in_hw flows.

The first four patches are preparation patches with no functionality change.
The fifth patch re-uses the existing flow action stats api to query action
stats for both classifier and action dumps.
The rest of the patches add per action stats support to the Mellanox driver.
====================

Link: https://lore.kernel.org/r/20230212132520.12571-1-ozsh@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: TC, support per action stats

Extend the action stats callback implementation to update stats for actions
that are associated with hw counters.
Note that the callback may be called from tc action utility or from tc
flower. Both apis expect the driver to return the stats difference from
the last update. As such, query the raw counter value and maintain
the diff from the last api call in the tc layer, instead of the fs_core
layer.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: TC, map tc action cookie to a hw counter

Currently a hardware counter is associated with a flow cookie.
This does not apply to flows using branching action which are required to
return per action stats.

A single counter may apply to multiple actions.
Scan the flow actions in reverse (from the last to the first action) while
caching the last counter.
Associate all the flow attribute tc action cookies with the current
cached counter.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: TC, store tc action cookies per attr

The tc parse action phase translates the tc actions to mlx5 flow
attributes data structure that is used during the flow offload phase.
Currently, the flow offload stage instantiates hw counters while
associating them to flow cookie. However, flows with branching
actions are required to associate a hardware counter with its action
cookies.

Store the parsed tc action cookies on the flow attribute.
Use the list of cookies in the next patch to associate a tc action cookie
with its allocated hw counter.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: TC, add hw counter to branching actions

Currently a hw count action is appended to the last action of the action
list. However, a branching action may terminate the action list before
reaching the last action.

Append a count action to a branching action.
In the next patches, filters with branching actions will read this counter
when reporting stats per action.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: support per action hw stats

There are currently two mechanisms for populating hardware stats:
1. Using flow_offload api to query the flow's statistics.
   The api assumes that the same stats values apply to all
   the flow's actions.
   This assumption breaks when action drops or jumps over following
   actions.
2. Using hw_action api to query specific action stats via a driver
   callback method. This api assures the correct action stats for
   the offloaded action, however, it does not apply to the rest of the
   actions in the flow's actions array.

Extend the flow_offload stats callback to indicate that a per action
stats update is required.
Use the existing flow_offload_action api to query the action's hw stats.
In addition, currently the tc action stats utility only updates hw actions.
Reuse the existing action stats cb infrastructure to query any action
stats.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: introduce flow_offload action cookie

Currently a hardware action is uniquely identified by the <id, hw_index>
tuple. However, the id is set by the flow_act_setup callback and tc core
cannot enforce this, and it is possible that a future change could break
this. In addition, <id, hw_index> are not unique across network namespaces.

Uniquely identify the action by setting an action cookie by the tc core.
Use the unique action cookie to query the action's hardware stats.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: pass flow_stats instead of multiple stats args

Instead of passing 6 stats related args, pass the flow_stats.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: act_pedit, setup offload action for action stats query

A single tc pedit action may be translated to multiple flow_offload
actions.
Offload only actions that translate to a single pedit command value.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: optimize action stats api calls

Currently the hw action stats update is called from tcf_exts_hw_stats_update,
when a tc filter is dumped, and from tcf_action_copy_stats, when a hw
action is dumped.
However, the tcf_action_copy_stats is also called from tcf_action_dump.
As such, the hw action stats update cb is called 3 times for every
tc flower filter dump.

Move the tc action hw stats update from tcf_action_copy_stats to
tcf_dump_walker to update the hw action stats when tc action is dumped.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: dsa: mediatek,mt7530: improve binding description

Fix inaccurate information about PHY muxing, and merge standalone and
multi-chip module MT7530 configuration methods.

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230212131258.47551-1-arinc.unal@arinc9.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'ipv6-more-drop-reason'

Eric Dumazet says:

====================
ipv6: more drop reason

Add more drop reasons to IPv6:

- IPV6_BAD_EXTHDR
- IPV6_NDISC_FRAG
- IPV6_NDISC_HOP_LIMIT
- IPV6_NDISC_BAD_CODE
====================

Link: https://lore.kernel.org/r/20230210184708.2172562-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: icmp6: add drop reason support to ndisc_rcv()

Creates three new drop reasons:

SKB_DROP_REASON_IPV6_NDISC_FRAG: invalid frag (suppress_frag_ndisc).

SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT: invalid hop limit.

SKB_DROP_REASON_IPV6_NDISC_BAD_CODE: invalid NDISC icmp6 code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: icmp6: add drop reason support to icmpv6_notify()

Accurately reports what happened in icmpv6_notify() when handling
a packet.

This makes use of the new IPV6_BAD_EXTHDR drop reason.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add pskb_may_pull_reason() helper

pskb_may_pull() can fail for two different reasons.

Provide pskb_may_pull_reason() helper to distinguish
between these reasons.

It returns:

SKB_NOT_DROPPED_YET : Success
SKB_DROP_REASON_PKT_TOO_SMALL : packet too small
SKB_DROP_REASON_NOMEM : skb->head could not be resized

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dropreason: add SKB_DROP_REASON_IPV6_BAD_EXTHDR

This drop reason can be used whenever an IPv6 packet
has a malformed extension header.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwc-qos: Make struct dwc_eth_dwmac_data::remove return void

All implementations of the remove callback return 0 unconditionally. So
in dwc_eth_dwmac_remove() there is no error handling necessary. Simplify
accordingly.

This is a preparation for making struct platform_driver::remove return
void, too.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230211112431.214252-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: Make stmmac_dvr_remove() return void

The function returns zero unconditionally. Change it to return void instead
which simplifies some callers as error handing becomes unnecessary.

This also makes it more obvious that most platform remove callbacks always
return zero.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230211112431.214252-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvneta: do not set xdp_features for hw buffer devices

Devices with hardware buffer management do not support XDP, so do not
set xdp_features for them.

Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/19b5838bb3e4515750af822edb2fa5e974d0a86b.1676196230.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hv_netvsc: add missing NETDEV_XDP_ACT_NDO_XMIT xdp-features flag

Add missing ndo_xdp_xmit bit to xdp_features capability flag.

Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/8e3747018f0fd0b5d6e6b9aefe8d9448ca3a3288.1676195726.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: add missing NETDEV_XDP_ACT_XSK_ZEROCOPY bit to xdp_features

Add missing xsk zero-copy bit to xdp_features capability flag.

Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c8949baafdf617188dcedb9033ce5a9ca6e9e5ff.1676195440.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_wed: No need to clear memory after a dma_alloc_coherent() call

dma_alloc_coherent() already clears the allocated memory, there is no need
to explicitly call memset().

Moreover, it is likely that the size in the memset() is incorrect and
should be "size * sizeof(*ring->desc)".

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/d5acce7dd108887832c9719f62c7201b4c83b3fb.1676184599.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: set xdp_features flag

Set xdp_features netdevice flag if lan966x nic supports xdp mode.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/01f4412f28899d97b0054c9c1a63694201301b42.1676055718.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ksz9477-eee-support'

Oleksij Rempel says:

====================
net: add EEE support for KSZ9477 switch family

changes v8:
- fix comment for linkmode_to_mii_eee_cap1_t() function
- add Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
- add Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>

changes v7:
- update documentation for genphy_c45_eee_is_active()
- address review comments on "net: dsa: microchip: enable EEE support"
  patch

changes v6:
- split patch set and send only first 9 patches
- Add Reviewed-by: Andrew Lunn <andrew@lunn.ch>
- use 0xffff instead of GENMASK
- Document @supported_eee
- use "()" with function name in comments

changes v5:
- spell fixes
- move part of genphy_c45_read_eee_abilities() to
  genphy_c45_read_eee_cap1()
- validate MDIO_PCS_EEE_ABLE register against 0xffff val.
- rename *eee_100_10000* to *eee_cap1*
- use linkmode_intersects(phydev->supported, PHY_EEE_CAP1_FEATURES)
  instead of !linkmode_empty()
- add documentation to linkmode/register helpers

changes v4:
- remove following helpers:
  mmd_eee_cap_to_ethtool_sup_t
  mmd_eee_adv_to_ethtool_adv_t
  ethtool_adv_to_mmd_eee_adv_t
  and port drivers from this helpers to linkmode helpers.
- rebase against latest net-next
- port phy_init_eee() to genphy_c45_eee_is_active()

changes v3:
- rework some parts of EEE infrastructure and move it to c45 code.
- add supported_eee storage and start using it in EEE code and by the
  micrel driver.
- add EEE support for ar8035 PHY
- add SmartEEE support to FEC i.MX series.

changes v2:
- use phydev->supported instead of reading MII_BMSR regiaster
- fix @get_eee > @set_eee

With this patch series we provide EEE control for KSZ9477 family of
switches and
AR8035 with i.MX6 configuration.
According to my tests, on a system with KSZ8563 switch and 100Mbit idle
link,
we consume 0,192W less power per port if EEE is enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: start using genphy_c45_ethtool_get/set_eee()

All preparations are done. Now we can start using new functions and remove
the old code.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: migrate phy_init_eee() to genphy_c45_eee_is_active()

Reduce code duplicated by migrating phy_init_eee() to
genphy_c45_eee_is_active().

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: c45: migrate to genphy_c45_write_eee_adv()

Migrate from genphy_config_eee_advert() to genphy_c45_write_eee_adv().

It should work as before except write operation to the EEE adv registers
will be done only if some EEE abilities was detected.

If some driver will have a regression, related driver should provide own
.get_features callback. See micrel.c:ksz9477_get_features() as example.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: c22: migrate to genphy_c45_write_eee_adv()

Migrate from genphy_config_eee_advert() to genphy_c45_write_eee_adv().

It should work as before except write operation to the EEE adv registers
will be done only if some EEE abilities was detected.

If some driver will have a regression, related driver should provide own
.get_features callback. See micrel.c:ksz9477_get_features() as example.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add genphy_c45_ethtool_get/set_eee() support

Add replacement for phy_ethtool_get/set_eee() functions.

Current phy_ethtool_get/set_eee() implementation is great and it is
possible to make it even better:
- this functionality is for devices implementing parts of IEEE 802.3
  specification beyond Clause 22. The better place for this code is
  phy-c45.c
- currently it is able to do read/write operations on PHYs with
  different abilities to not existing registers. It is better to
  use stored supported_eee abilities to avoid false read/write
  operations.
- the eee_active detection will provide wrong results on not supported
  link modes. It is better to validate speed/duplex properties against
  supported EEE link modes.
- it is able to support only limited amount of link modes. We have more
  EEE link modes...

By refactoring this code I address most of this point except of the last
one. Adding additional EEE link modes will need more work.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: export phy_check_valid() function

This function will be needed for genphy_c45_ethtool_get_eee() provided
by next patch.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: add ksz9477_get_features()

KSZ8563R, which has same PHYID as KSZ9477 family, will change "EEE control
and capability 1" (Register 3.20) content depending on configuration of
"EEE advertisement 1" (Register 7.60). Changes on the 7.60 will affect
3.20 register.

So, instead of depending on register 3.20, driver should set supported_eee.

Proper supported_eee configuration is needed to make use of generic
PHY c45 set/get_eee functions provided by next patches.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add genphy_c45_read_eee_abilities() function

Add generic function for EEE abilities defined by IEEE 802.3
specification. For now following registers are supported:
- IEEE 802.3-2018 45.2.3.10 EEE control and capability 1 (Register 3.20)
- IEEE 802.3cg-2019 45.2.1.186b 10BASE-T1L PMA status register
(Register 1.2295)

Since I was not able to find any flag signaling support of these
registers, we should detect link mode abilities first and then based on
these abilities doing EEE link modes detection.

Results of EEE ability detection will be stored into new variable
phydev->supported_eee.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: microchip: enable EEE support

Some of KSZ9477 family switches provides EEE support. To enable it, we
just need to register set_mac_eee/set_mac_eee handlers and validate
supported chip version and port.

Currently supported chip variants are: KSZ8563, KSZ9477, KSZ9563,
KSZ9567, KSZ9893, KSZ9896, KSZ9897. KSZ8563 supports EEE only with
100BaseTX/Full. Other chips support 100BaseTX/Full and 1000BaseTX/Full.
Low Power Idle configuration is not supported and currently not
documented in the datasheets.

EEE PHY specific tunings are not documented in the switch datasheets, but can
overlap with KSZ9131 specification.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ionic-on-chip-desc'

Shannon Nelson says:

====================
ionic: on-chip descriptors

We start with a couple of house-keeping patches that were originally
presented for 'net', then we add support for on-chip descriptor rings
for tx-push, as well as adding support for rx-push.

I have a patch for the ethtool userland utility that I can send out
once this has been accepted.

v4: added rx-push attributes to ethtool netlink
    converted CMB feature from using a priv-flag to using ethtool tx/rx-push

v3: edited commit message to describe interface-down limitation
    added warn msg if cmb_inuse alloc fails
    removed unnecessary clearing of phy_cmb_pages and cmb_npages
    changed cmb_rings_toggle to use cmb_inuse
    removed unrelated pci_set_drvdata()
    removed unnecessary (u32) cast
    added static inline func for writing CMB descriptors

v2: dropped the rx buffers patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: add tx/rx-push support with device Component Memory Buffers

The ionic device has on-board memory (CMB) that can be used
for descriptors as a way to speed descriptor access for faster
packet processing. It is rumored to improve latency and/or
packets-per-second for some profiles of small packet traffic,
although your mileage may vary.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethtool: extend ringparam set/get APIs for rx_push

Similar to what was done for TX_PUSH, add an RX_PUSH concept
to the ethtool interfaces.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: remove unnecessary void casts

Minor Code cleanup details.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: remove unnecessary indirection

We have the pointer already, don't need to go through the
lif struct for it.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-ipa-GSI-regs'

Alex Elder says:

====================
net: ipa: determine GSI register offsets differently

This series changes the way GSI register offset are specified, using
the "reg" mechanism currently used for IPA registers.  A follow-on
series will extend this work so fields within GSI registers are also
specified this way.

The first patch rearranges the GSI register initialization code so
it is similar to the way it's done for the IPA registers.  The
second identifies all the GSI registers in an enumerated type.
The third introduces "gsi_reg-v3.1.c" and uses the "reg" code to
define one GSI register offset.  The second-to-last patch just
adds "gsi_reg-v3.5.1.c", because that version introduces a new
register not previously defined.  All the rest just define the
rest of the GSI register offsets using the "reg" mechanism.

Note that, to have continued lines align with an open parenthesis,
new files created in this series cause some checkpatch warnings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: define IPA remaining GSI register offsets

Add the remaining GSI register offset definitions. Use gsi_reg()
rather than the corresponding GSI_*_OFFSET() macros to get the
offsets for these registers, and get rid of the macros.

Note that we are now defining information for the HW_PARAM_2
register, and that doesn't appear until IPA v3.5.1.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: add "gsi_v3.5.1.c"

The next patch adds a GSI register field that is only valid starting
at IPA v3.5.1. Create "gsi_v3.5.1.c" from "gsi_v3.1.c", changing
only the name of the public regs structure it defines.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: define IPA v3.1 GSI interrupt register offsets

Add definitions of the offsets for IRQ-related GSI registers. Use
gsi_reg() rather than the corresponding GSI_CNTXT_*_OFFSET() macros
to get the offsets for these registers, and get rid of the macros.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: define IPA v3.1 GSI event ring register offsets

Add definitions of the offsets and strides for registers whose
offset depends on an event ring ID, and use gsi_reg() and its
returned value to determine offsets for these registers. Get
rid of the corresponding GSI_EV_CH_E_*_OFFSET() macros.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: add more GSI register definitions

Continue populating with GSI register definitions, adding remaining
registers whose offset depends on a channel ID. Use gsi_reg() and
reg_n_offset() to determine offsets for those registers, and get rid
of the corresponding GSI_CH_C_*_OFFSET() macros.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: start creating GSI register definitions

Create a new register definition file in the "reg" subdirectory,
and begin populating it with GSI register definitions based on IPA
version.  The GSI registers haven't changed much, so several IPA
versions can share the same GSI register definitions.

As with IPA registers, an array of pointers indexed by GSI register ID
refers to these register definitions, and a new "regs" field in the
GSI structure is initialized in gsi_reg_init() to refer to register
information based on the IPA version (though for now there's only
one).  The new function gsi_reg() returns register information for
a given GSI register, and the result can be used to look up that
register's offset.

This patch is meant only to put the infrastructure in place, so only
eon register (CH_C_QOS) is defined for each version, and only the
offset and stride are defined for that register.  Use new function
gsi_reg() to look up that register's information to get its offset,
This makes the GSI_CH_C_QOS_OFFSET() unnecessary, so get rid of it.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: introduce GSI register IDs

Create a new gsi_reg_id enumerated type, which identifies each GSI
register with a symbolic identifier.

Create a function that indicates whether a register ID is valid.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipa: introduce gsi_reg_init()

Create a new source file "gsi_reg.c", and in it, introduce a new
function to encapsulate initializing GSI registers, including
looking up and I/O mapping their memory.

Create gsi_reg_exit() as the inverse of the init function.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: fix error recovery in qdisc_create()

If TCA_STAB attribute is malformed, qdisc_get_stab() returns
an error, and we end up calling ops->destroy() while ops->init()
has not been called yet.

While we are at it, call qdisc_put_stab() after ops->destroy().

Fixes: 1f62879e3632 ("net/sched: make stab available before ops->init() call")
Reported-by: syzbot+d44d88f1d11e6ca8576b@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: micrel: Add PHC support for lan8841

Add support for PHC and timestamping operations for the lan8841 PHY.
PTP 1-step and 2-step modes are supported, over Ethernet and UDP both
ipv4 and ipv6.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'devlink-params-cleanup'

Jiri Pirko says:

====================
devlink: params cleanups and devl_param_driverinit_value_get() fix

The primary motivation of this patchset is the patch #6, which fixes an
issue introduced by 075935f0ae0f ("devlink: protect devlink param list
by instance lock") and reported by Kim Phillips <kim.phillips@amd.com>
(https://lore.kernel.org/netdev/719de4f0-76ac-e8b9-38a9-167ae239efc7@amd.com/)
and my colleagues doing mlx5 driver regression testing.

The basis idea is that devl_param_driverinit_value_get() could be
possible to the called without holding devlink intance lock in
most of the cases (all existing ones in the current codebase),
which would fix some mlx5 flows where the lock is not held.

To achieve that, make sure that the param value does not change between
reloads with patch #2.

Also, convert the param list to xarray which removes the worry about
list_head consistency when doing lockless lookup.

The rest of the patches are doing some small related cleanup of things
that poke me in the eye during the work.

---
v1->v2:
- a small bug was fixed in patch #2, the rest of the code stays the same
so I left review/ack tags attached to them
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: add forgotten devlink instance lock assertion to devl_param_driverinit_value_set()

Driver calling devl_param_driverinit_value_set() has to hold devlink
instance lock while doing that. Put an assertion there.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: allow to call devl_param_driverinit_value_get() without holding instance lock

If the driver maintains following basic sane behavior, the
devl_param_driverinit_value_get() function could be called without
holding instance lock:

1) Driver ensures a call to devl_param_driverinit_value_get() cannot
   race with registering/unregistering the parameter with
   the same parameter ID.
2) Driver ensures a call to devl_param_driverinit_value_get() cannot
   race with devl_param_driverinit_value_set() call with
   the same parameter ID.
3) Driver ensures a call to devl_param_driverinit_value_get() cannot
   race with reload operation.

By the nature of params usage, these requirements should be
trivially achievable. If the driver for some off reason
is not able to comply, it has to take the devlink->lock while
calling devl_param_driverinit_value_get().

Remove the lock assertion and add comment describing
the locking requirements.

This fixes a splat in mlx5 driver introduced by the commit
referenced in the "Fixes" tag.

Lore: https://lore.kernel.org/netdev/719de4f0-76ac-e8b9-38a9-167ae239efc7@amd.com/
Reported-by: Kim Phillips <kim.phillips@amd.com>
Fixes: 075935f0ae0f ("devlink: protect devlink param list by instance lock")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: convert param list to xarray

Loose the linked list for params and use xarray instead.

Note that this is required to be eventually possible to call
devl_param_driverinit_value_get() without holding instance lock.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: use xa_for_each_start() helper in devlink_nl_cmd_port_get_dump_one()

As xarray has an iterator helper that allows to start from specified
index, use this directly and avoid repeated iteration from 0.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: fix the name of value arg of devl_param_driverinit_value_get()

Probably due to copy-paste error, the name of the arg is "init_val"
which is misleading, as the pointer is used to point to struct where to
store the current value. Rename it to "val" and change the arg comment
a bit on the way.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: make sure driver does not read updated driverinit param before reload

The driverinit param purpose is to serve the driver during init/reload
time to provide a value, either default or set by user.

Make sure that driver does not read value updated by user before the
reload is performed. Hold the new value in a separate struct and switch
it during reload.

Note that this is required to be eventually possible to call
devl_param_driverinit_value_get() without holding instance lock.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: don't use strcpy() to copy param value

No need to treat string params any different comparing to other types.
Rely on the struct assign to copy the whole struct, including the
string.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: ethtool: supplement nfp link modes supported

Add support for the following modes to the nfp driver:

NFP_MEDIA_10GBASE_LR
NFP_MEDIA_25GBASE_LR
NFP_MEDIA_25GBASE_ER

These modes are supported by the hardware and,
support for them was recently added to firmware.

Signed-off-by: Yu Xiao <yu.xiao@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: stop using NL_SET_ERR_MSG_MOD

NL_SET_ERR_MSG_MOD inserts the KBUILD_MODNAME and a ':' before the actual
extended error message. The devlink feature hasn't been able to be compiled
as a module since commit f4b6bcc7002f ("net: devlink: turn devlink into a
built-in").

Stop using NL_SET_ERR_MSG_MOD, and just use the base NL_SET_ERR_MSG. This
aligns the extended error messages better with the NL_SET_ERR_MSG_ATTR
messages as well.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: rds_rm_zerocopy_callback() correct order for list_add_tail()

rds_rm_zerocopy_callback() uses list_add_tail() with swapped
arguments. This links the list head with the new entry, losing
the references to the remaining part of the list.

Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-02-09 (i40e)

Jan removes i40e_status from the driver; replacing them with standard
kernel error codes.

Kees Cook replaces 0-length array with flexible array.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  net/i40e: Replace 0-length array with flexible array
  i40e: use ERR_PTR error print in i40e messages
  i40e: use int for i40e_status
  i40e: Remove string printing for i40e_status
  i40e: Remove unused i40e status codes
====================

Link: https://lore.kernel.org/r/20230209172536.3595838-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 's390-net-updates-2023-02-06'

Alexandra Winter says:

====================
s390/net: updates 2023-02-06

Just maintenance patches, no functional changes.
If you disagree with patch 4, we can leave it out.
We prefer scnprintf, but no strong opinion.
====================

Link: https://lore.kernel.org/r/20230209110424.1707501-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/qeth: Convert sprintf/snprintf to scnprintf

This LWN article explains the rationale for this change
https: //lwn.net/Articles/69419/
Ie. snprintf() returns what *would* be the resulting length,
while scnprintf() returns the actual length.

Reported-by: Jules Irenge <jbi.octave@gmail.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/qeth: Convert sysfs sprintf to sysfs_emit

Following the advice of the Documentation/filesystems/sysfs.rst.
All sysfs related show()-functions should only use sysfs_emit() or
sysfs_emit_at() when formatting the value to be returned to user space.

Reported-by: Jules Irenge <jbi.octave@gmail.com>
Reported-by: Joe Perches <joe@perches.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/qeth: Use constant for IP address buffers

Use INET6_ADDRSTRLEN constant with size of 48 which be used for char arrays
storing ip addresses (for IPv4 and IPv6)

Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/ctcm: cleanup indenting

Get rid of multiple smatch warnings, like:
warn: inconsistent indenting

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-renesas-rswitch-improve-tx-timestamp-accuracy'

Yoshihiro Shimoda says:

====================
net: renesas: rswitch: Improve TX timestamp accuracy

The patch [[123]/4] are minor refacoring for readability.
The patch [4/4] is for improving TX timestamp accuracy.
To improve the accuracy, it requires refactoring so that this is not
a fixed patch.

v2: https://lore.kernel.org/all/20230208235721.2336249-1-yoshihiro.shimoda.uh@renesas.com/
v1: https://lore.kernel.org/all/20230208073445.2317192-1-yoshihiro.shimoda.uh@renesas.com/
====================

Link: https://lore.kernel.org/r/20230209081741.2536034-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: renesas: rswitch: Improve TX timestamp accuracy

In the previous code, TX timestamp accuracy was bad because the irq
handler got the timestamp from the timestamp register at that time.

This hardware has "Timestamp capture" feature which can store
each TX timestamp into the timestamp descriptors. To improve
TX timestamp accuracy, implement timestamp descriptors' handling.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: renesas: rswitch: Remove gptp flag from rswitch_gwca_queue

In the previous code, the gptp flag was completely related to
the !dir_tx in struct rswitch_gwca_queue because
rswitch_gwca_queue_alloc() was called below:

< In rswitch_txdmac_alloc() >
err = rswitch_gwca_queue_alloc(ndev, priv, rdev->tx_queue, true, false,
TX_RING_SIZE);
So, dir_tx = true, and gptp = false.

< In rswitch_rxdmac_alloc() >
err = rswitch_gwca_queue_alloc(ndev, priv, rdev->rx_queue, false, true,
RX_RING_SIZE);
So, dir_tx = false, and gptp = true.

In the future, a new queue handling for timestamp will be implemented
and this gptp flag is confusable. So, remove the gptp flag.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: renesas: rswitch: Move linkfix variables to rswitch_gwca

To improve readability, move linkfix related variables to
struct rswitch_gwca. Also, rename function names "desc" with "linkfix".

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: renesas: rswitch: Rename rings in struct rswitch_gwca_queue

To add a new ring which is really related to timestamp (ts_ring)
in the future, rename the following members to improve readability:

ring --> tx_ring
ts_ring --> rx_ring

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: libwx: fix an error code in wx_alloc_page_pool()

This function always returns success. We need to preserve the error
code before setting rx_ring->page_pool = NULL.

Fixes: 850b971110b2 ("net: libwx: Allocate Rx and Tx resources")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/Y+T4aoefc1XWvGYb@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: ocelot: add PTP dependency for NET_DSA_MSCC_OCELOT_EXT

A new user of MSCC_OCELOT_SWITCH_LIB was added, bringing back an old
link failure that was fixed with e5f31552674e ("ethernet: fix
PTP_1588_CLOCK dependencies"):

x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_ptp_enable':
ocelot_ptp.c:(.text+0x8ee): undefined reference to `ptp_find_pin'
x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_get_ts_info':
ocelot_ptp.c:(.text+0xd5d): undefined reference to `ptp_clock_index'
x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_init_timestamp':
ocelot_ptp.c:(.text+0x15ca): undefined reference to `ptp_clock_register'
x86_64-linux-ld: drivers/net/ethernet/mscc/ocelot_ptp.o: in function `ocelot_deinit_timestamp':
ocelot_ptp.c:(.text+0x16b7): undefined reference to `ptp_clock_unregister'

Add the same PTP dependency here, as well as in the MSCC_OCELOT_SWITCH_LIB
symbol itself to make it more obvious what is going on when the next
driver selects it.

Fixes: 3d7316ac81ac ("net: dsa: ocelot: add external ocelot switch control")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Colin Foster <colin.foster@in-advantage.com>
Link: https://lore.kernel.org/r/20230209124435.1317781-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bridge-mcast-preparations-for-vxlan-mdb'

Ido Schimmel says:

====================
bridge: mcast: Preparations for VXLAN MDB

This patchset contains small preparations for VXLAN MDB that were split
from this RFC [1]. Tested using existing bridge MDB forwarding
selftests.

[1] https://lore.kernel.org/netdev/20230204170801.3897900-1-idosch@nvidia.com/
====================

Link: https://lore.kernel.org/r/20230209071852.613102-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: Add MDB dump test cases

The kernel maintains three markers for the MDB dump:

1. The last bridge device from which the MDB was dumped.
2. The last MDB entry from which the MDB was dumped.
3. The last port-group entry that was dumped.

Add test cases for large scale MDB dump to make sure that all the
configured entries are dumped and that the markers are used correctly.

Specifically, create 2 bridges with 32 ports and add 256 MDB entries in
which all the ports are member of. Test that each bridge reports 8192
(256 * 32) permanent entries. Do that with IPv4, IPv6 and L2 MDB
entries.

On my system, MDB dump of the above is contained in about 50 netlink
messages.

Example output:

# ./bridge_mdb.sh
[...]
INFO: # Large scale dump tests
TEST: IPv4 large scale dump tests                                   [ OK ]
TEST: IPv6 large scale dump tests                                   [ OK ]
TEST: L2 large scale dump tests                                     [ OK ]
[...]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bridge: mcast: Move validation to a policy

Future patches are going to move parts of the bridge MDB code to the
common rtnetlink code in preparation for VXLAN MDB support. To
facilitate code sharing between both drivers, move the validation of the
top level attributes in RTM_{NEW,DEL}MDB messages to a policy that will
eventually be moved to the rtnetlink code.

Use 'NLA_NESTED' for 'MDBA_SET_ENTRY_ATTRS' instead of
NLA_POLICY_NESTED() as this attribute is going to be validated using
different policies in the underlying drivers.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bridge: mcast: Remove pointless sequence generation counter assignment

The purpose of the sequence generation counter in the netlink callback
is to identify if a multipart dump is consistent or not by calling
nl_dump_check_consistent() whenever a message is generated.

The function is not invoked by the MDB code, rendering the sequence
generation counter assignment pointless. Remove it.

Note that even if the function was invoked, we still could not
accurately determine if the dump is consistent or not, as there is no
sequence generation counter for MDB entries, unlike nexthop objects, for
example.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bridge: mcast: Use correct define in MDB dump

'MDB_PG_FLAGS_PERMANENT' and 'MDB_PERMANENT' happen to have the same
value, but the latter is uAPI and cannot change, so use it when dumping
an MDB entry.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>