review.tizen.org Git - platform/kernel/linux-rpi.git/log

net/mlx5: Lag, Use flag to check for shared FDB mode

It's redundant and incorrect to also check that lag is in SR-IOV mode.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, Remove redundant bool allocation on the stack

There is no need to allocate a bool variable on the stack; the value can simply be returned directly.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, Use mlx5_lag_dev() instead of derefering pointers

Use the existing wrapper mlx5_lag_dev() to access the lag object from
dev for better maintainability and consistent code.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Lag, Update multiport eswitch check to log an error

Update the function to log an error to the user if offloading the rule
fails, and while there, add the correct prefix to the function name.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Merge branch 'net-smc-parallelism'

D. Wythe says:

====================
net/smc: optimize the parallelism of SMC-R connections

This patch set attempts to optimize the parallelism of SMC-R connections,
mainly to reduce unnecessary blocking on locks, and to fix exceptions that
occur after those optimizations.

According to the off-CPU graph, the SMC workers' off-CPU time is as follows:

smc_close_passive_work                  (1.09%)
        smcr_buf_unuse                  (1.08%)
                smc_llc_flow_initiate   (1.02%)

smc_listen_work                         (48.17%)
        __mutex_lock.isra.11            (47.96%)

An ideal SMC-R connection process should only block on network IO
events, but it is clear that SMC-R connections currently spend most of
their time queued on locks.

The goal of this patchset is to reach that ideal situation, where connections
are blocked on network IO events for the majority of their lifetime.

There are three big locks here:

1. smc_client_lgr_pending & smc_server_lgr_pending

2. llc_conf_mutex

3. rmbs_lock & sndbufs_lock

And an implementation issue:

1. confirm/delete rkey messages can't be sent concurrently, even though
the protocol allows it.

Unfortunately, the above problems together affect the parallelism of
SMC-R connections. If any of them is left unsolved, our goal cannot
be achieved.

After this patch set, we get a nearly ideal off-CPU graph, as
follows:

smc_close_passive_work                                  (41.58%)
        smcr_buf_unuse                                  (41.57%)
                smc_llc_do_delete_rkey                  (41.57%)

smc_listen_work                                         (39.10%)
        smc_clc_wait_msg                                (13.18%)
                tcp_recvmsg_locked                      (13.18%)
        smc_listen_find_device                          (25.87%)
                smcr_lgr_reg_rmbs                       (25.87%)
                        smc_llc_do_confirm_rkey         (25.87%)

We can see that most of the waiting time is now spent on network IO
events. This also brings a measurable performance improvement in our
short-lived connection wrk/nginx benchmark test:

+--------------+------+------+-------+--------+------+--------+
|conns/qps     |c4    | c8   |  c16  |  c32   | c64  |  c200  |
+--------------+------+------+-------+--------+------+--------+
|SMC-R before  |9.7k  | 10k  |  10k  |  9.9k  | 9.1k |  8.9k  |
+--------------+------+------+-------+--------+------+--------+
|SMC-R now     |13k   | 19k  |  18k  |  16k   | 15k  |  12k   |
+--------------+------+------+-------+--------+------+--------+
|TCP           |15k   | 35k  |  51k  |  80k   | 100k |  162k  |
+--------------+------+------+-------+--------+------+--------+

The benefit becomes less obvious as the number of connections increases,
because of the workqueue. If we change the workqueue to UNBOUND, we can
obtain at least a 4-5x performance improvement, reaching up to half of
TCP. However, this is not an elegant solution, and optimizing it properly
will be much more complicated. In any case, we will submit the relevant
optimization patches as soon as possible.

Please note that the premise here is that the lock-related problems
must be solved first; otherwise, no matter how we optimize the workqueue,
there won't be much improvement.

Because this series makes a lot of related code changes, if you have
any questions or suggestions, please let me know.

Thanks
D. Wythe

v1 -> v2:

1. Fix panic in the SMC-D scenario
2. Fix lnkc-related hashfn calculation exception caused by operator
precedence
3. Only wake up one connection if the lnk is not active
4. Delete obsolete unlock logic in smc_listen_work()
5. PATCH format, use reverse Christmas tree ordering
6. PATCH format, rename all xxx_lnk_xxx functions to xxx_link_xxx
7. PATCH format, add correct Fixes tags to the fix patches.
8. PATCH format, fix some spelling errors
9. PATCH format, rename slow to do_slow

v2 -> v3:

1. Add SMC-D support and remove the concept of link cluster, since SMC-D has
no link at all. Replace it with an lgr decision maker, which advises
SMC-D and SMC-R on whether to create a new link group.

2. Fix the corruption problem described by PATCH 'fix application
data exception' on SMC-D.

v3 -> v4:

1. Fix panic caused by an uninitialized map.

v4 -> v5:

1. Serialize SMC-D buf creation to avoid a potential error
2. Add a flag to synchronize the success of the first contact
with the readiness of the link group, for both SMC-D and SMC-R.
3. Fix a possible reference count leak in smc_llc_flow_start().
4. Reorder the patches so that the bugfix patches come first.

v5 -> v6:

1. Separate the bugfix patches to make them independent.
2. Merge patch 'fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending'
with patch 'remove locks smc_client_lgr_pending and smc_server_lgr_pending'
3. Fix code style, including alignment and reverse Christmas tree
style.
4. Fix a possible memory leak in smc_llc_rmt_delete_rkey()
and smc_llc_rmt_conf_rkey().

v6 -> v7:

1. Discard patch attempting to remove global locks
2. Discard patch attempting to make the confirm/delete rkey process concurrent
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore

It's clear that rmbs_lock and sndbufs_lock are intended to protect the
rmbs list and the sndbufs list, respectively.

During connection establishment, smc_buf_get_slot() is always
invoked, and it only has read semantics on the rmbs list and the
sndbufs list.

Based on the above considerations, we replace the mutexes with
rw_semaphores. Only smc_buf_get_slot() uses down_read(), which allows
it to run concurrently; all other parts use down_write() to keep
exclusive semantics.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs()

Unlike smc_buf_create() and smcr_buf_unuse(), smcr_lgr_reg_rmbs() must be
exclusive when the assigned rmb_desc is not yet registered, although it can
run in parallel when the assigned rmb_desc is already registered, since it
then only has read semantics on it. Hence, we cannot simply replace the
lock with a read semaphore.

The idea here is that if the assigned rmb_desc is already registered,
we use the read semaphore to protect the critical section; if the assigned
rmb_desc is not yet registered, we keep using the write semaphore
to preserve its exclusivity.

Because rmb_desc is reusable, this allows us to execute in parallel
in most cases.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse()

The following is part of the off-CPU graph during frequent processing of
short-lived SMC-R connections:

process_one_work                                        (51.19%)
        smc_close_passive_work                          (28.36%)
                smcr_buf_unuse                          (28.34%)
                        rwsem_down_write_slowpath       (28.22%)

        smc_listen_work                                 (22.83%)
                smc_clc_wait_msg                        (1.84%)
                smc_buf_create                          (20.45%)
                        smcr_buf_map_usable_links
                                rwsem_down_write_slowpath (20.43%)
                smcr_lgr_reg_rmbs                       (0.53%)
                        rwsem_down_write_slowpath       (0.43%)
                        smc_llc_do_confirm_rkey         (0.08%)

We can clearly see that during connection establishment, connections
spend their waiting time not on IO, but on llc_conf_mutex.

More importantly, the core critical sections (smcr_buf_unuse() &
smc_buf_create()) only have read semantics on links, so we can
easily replace the lock there with a read semaphore.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: llc_conf_mutex refactor, replace it with rw_semaphore

llc_conf_mutex is used to protect links and link-related configuration
in the same link group, for example, adding or deleting links. However,
in most cases, the protected critical section has only read semantics,
with no write semantics at all, such as obtaining a usable link or an
available rmb_desc.

This patch does a simple code refactoring: it replaces the mutex with an
rw_semaphore, mutex_lock with down_write and mutex_unlock with
up_write.

In theory, this replacement is equivalent, but after this patch,
we can distinguish lock granularity according to the different semantics
of the critical sections.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'updates-to-enetc-txq-management'

Vladimir Oltean says:

====================
Updates to ENETC TXQ management

The set ensures that the number of TXQs given by enetc to the network
stack (mqprio or TX hashing) + the number of TXQs given to XDP never
exceeds the number of available TXQs.

These are the first 4 patches of series "[v5,net-next,00/17] ENETC
mqprio/taprio cleanup" from here:
https://patchwork.kernel.org/project/netdevbpf/cover/20230202003621.2679603-1-vladimir.oltean@nxp.com/

There is no change in this version compared to that series. I split them off
because they contain a fix for net-next and it would be good if it
could go in quickly. I also did it to reduce the patch count of that
other series, in case I need to respin it again.
====================

Link: https://lore.kernel.org/r/20230203001116.3814809-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: ensure we always have a minimum number of TXQs for stack

Currently it can happen that an mqprio qdisc is installed with num_tc 8,
and this will reserve 8 (out of 8) TXQs for the network stack. Then we
can attach an XDP program, and this will crop 2 TXQs, leaving just 6 for
mqprio. That's not what the user requested, and we should fail it.

On the other hand, if mqprio isn't requested, we still give the 8 TXQs
to the network stack (with hashing among a single traffic class), but
then, cropping 2 TXQs for XDP is fine, because the user didn't
explicitly ask for any number of TXQs, so no expectations are violated.

Simply put, the logic that mqprio should impose a minimum number of TXQs
for the network stack never existed. Let's say (more or less arbitrarily) that
without mqprio, the driver expects a minimum number of TXQs equal to the
number of CPUs (on NXP LS1028A, that is either 1, or 2). And with mqprio,
mqprio gives the minimum required number of TXQs.
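A sketch of the resulting admission check, under stated assumptions: XDP is assumed to take one TXQ per CPU (matching the "crop 2 TXQs" example on a two-CPU part), the helper name stack_txqs() and its parameters are hypothetical, and the error code is chosen only for illustration. It reproduces the two scenarios above:

```c
#include <errno.h>
#include <stdbool.h>

/* total_txqs:  TX queues provided by the hardware (8 in the example)
 * num_cpus:    XDP is assumed to reserve one TXQ per CPU
 * mqprio_txqs: TXQs requested through mqprio, or 0 without mqprio
 * Returns the number of TXQs given to the stack, or a negative error. */
int stack_txqs(int total_txqs, int num_cpus, bool xdp, int mqprio_txqs)
{
	int avail = total_txqs - (xdp ? num_cpus : 0);
	/* Without mqprio the minimum is one TXQ per CPU; with mqprio it is
	 * whatever mqprio asked for. */
	int min = mqprio_txqs ? mqprio_txqs : num_cpus;

	if (avail < min)
		return -EBUSY;   /* error code for illustration only */
	return mqprio_txqs ? mqprio_txqs : avail;
}
```

With 8 TXQs and 2 CPUs, mqprio asking for 8 plus XDP is rejected, while plain hashing plus XDP still gets the remaining 6.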

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: recalculate num_real_tx_queues when XDP program attaches

Since the blamed net-next commit, enetc_setup_xdp_prog() no longer goes
through enetc_open(), and therefore, the function which was supposed to
detect whether a BPF program exists (in order to crop some TX queues
from network stack usage), enetc_num_stack_tx_queues(), no longer gets
called.

We can move the netif_set_real_num_rx_queues() call to enetc_alloc_msix()
(probe time), since it is a runtime invariant. We can do the same thing
with netif_set_real_num_tx_queues(), and let enetc_reconfigure_xdp_cb()
explicitly recalculate and change the number of stack TX queues.

Fixes: c33bfaf91c4c ("net: enetc: set up XDP program under enetc_reconfigure()")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: allow the enetc_reconfigure() callback to fail

enetc_reconfigure() was modified in commit c33bfaf91c4c ("net: enetc:
set up XDP program under enetc_reconfigure()") to take an optional
callback that runs while the netdev is down, but this callback currently
cannot fail.

Code up the error handling so that the interface is restarted with the
old resources if the callback fails.
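A minimal sketch of the rollback logic, assuming a simplified device whose resources are represented by a single integer; struct dev, dev_reconfigure() and the callback type are hypothetical and do not reflect the driver's real resource handling:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: 'cfg' stands for the active resource set. */
struct dev {
	int cfg;
	bool running;
};

typedef int (*reconf_cb_t)(struct dev *d, void *ctx);

static void dev_stop(struct dev *d)  { d->running = false; }
static void dev_start(struct dev *d) { d->running = true; }

/* Switch to new_cfg while the device is down. If the optional callback
 * fails, restart with the old resources and propagate the error. */
int dev_reconfigure(struct dev *d, int new_cfg, reconf_cb_t cb, void *ctx)
{
	int old_cfg = d->cfg;
	int err;

	dev_stop(d);
	d->cfg = new_cfg;

	if (cb) {
		err = cb(d, ctx);
		if (err) {
			d->cfg = old_cfg;   /* roll back to the old resources */
			dev_start(d);
			return err;
		}
	}

	dev_start(d);
	return 0;
}
```

Either way the interface ends up running; only the resource set differs depending on whether the callback succeeded.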

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: simplify enetc_num_stack_tx_queues()

We keep a pointer to the xdp_prog in the private netdev structure as
well; what's replicated per RX ring is done so just for more convenient
access from the NAPI poll procedure.

Simplify enetc_num_stack_tx_queues() by looking at priv->xdp_prog rather
than iterating through the information replicated per RX ring.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'raw-add-drop-reasons-and-use-another-hash-function'

Eric Dumazet says:

====================
raw: add drop reasons and use another hash function

The first two patches add drop reasons to raw input processing.

The last patch spreads RAW sockets in the shared hash tables
to avoid long hash buckets in some cases.
====================

Link: https://lore.kernel.org/r/20230202094100.3083177-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

raw: use net_hash_mix() in hash function

Some applications seem to rely on RAW sockets.

If they use private netns, we can avoid piling all RAW
sockets bound to a given protocol into a single bucket.

Also place (struct raw_hashinfo).lock into its own
cache line to limit false sharing.

An alternative would be to have per-netns hashtables,
but this seems too expensive for most netns
where RAW sockets are not used.
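The bucket selection change can be illustrated as follows. This is a sketch that assumes the mix is done with the kernel's multiplicative hash_32() (reproduced here with its GOLDEN_RATIO_32 constant) over the protocol XORed with a per-netns value; netns_salt stands in for net_hash_mix(net), and the 256-bucket table size is an assumption:

```c
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u  /* constant used by the kernel's hash_32() */
#define RAW_HTABLE_LOG  8            /* hypothetical: 256 buckets */

static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Old scheme (for comparison): the bucket depends only on the protocol,
 * so every netns using the same protocol shares one bucket. */
uint32_t raw_bucket_old(uint32_t netns_salt, int proto)
{
	(void)netns_salt;
	return (uint32_t)proto & ((1u << RAW_HTABLE_LOG) - 1);
}

/* New scheme: mix a per-netns value into the hash so the same protocol
 * in different namespaces can land in different buckets. */
uint32_t raw_bucket_new(uint32_t netns_salt, int proto)
{
	return hash_32(netns_salt ^ (uint32_t)proto, RAW_HTABLE_LOG);
}
```

With the old scheme, every namespace binding protocol 1 lands in bucket 1; with the salted hash, namespaces with different salts can spread across buckets.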

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: raw: add drop reasons

Use existing helpers and drop reason codes for RAW input path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: raw: add drop reasons

Use existing helpers and drop reason codes for RAW input path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'devlink-move-devlink-dev-code-to-a-separate-file'

Moshe Shemesh says:

====================
devlink: Move devlink dev code to a separate file

This patchset moves code from the file leftover.c to the new file dev.c.
About 1.3K lines are moved by this patchset covering most of the devlink
dev object callbacks and functionality: reload, eswitch, info, flash and
selftest.
====================

Link: https://lore.kernel.org/r/1675349226-284034-1-git-send-email-moshe@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move devlink dev selftest code to dev

Move devlink dev selftest callbacks and related code from leftover.c to
file dev.c. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move devlink_info_req struct to be local

As all users of the struct devlink_info_req are already in dev.c, move
this struct from devl_internal.h to be local in dev.c.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move devlink dev flash code to dev

Move devlink dev flash callbacks, helpers and other related code from
leftover.c to dev.c. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move devlink dev info code to dev

Move devlink dev info callbacks, related driver helper functions and
other related code from leftover.c to dev.c. No functional change in
this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move devlink dev eswitch code to dev

Move devlink dev eswitch callbacks and related code from leftover.c to
file dev.c. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move devlink dev reload code to dev

Move devlink dev reload callback and related code from leftover.c to
file dev.c. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Split out dev get and dump code

Move devlink dev get and dump callbacks and related dev code to the new file
dev.c. This file shall include all callbacks that are specific to the
devlink dev object.

No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: use NL_SET_ERR_MSG_WEAK_MOD() more consistently

Now that commit 028fb19c6ba7 ("netlink: provide an ability to set
default extack message") provides a weak function that doesn't override
an existing extack message provided by the driver, it makes sense to use
it also for LAG and HSR offloading, not just for bridge offloading.

Also consistently put the message string on a separate line, to reduce
line length from 92 to 84 characters.
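The difference between a normal and a weak extack message can be modeled in a few lines. This is a hypothetical userspace model, not the netlink API: struct ext_ack, set_err_msg(), set_err_msg_weak(), driver_lag_join() and core_lag_join() are illustrative names, and the message strings are made up:

```c
#include <errno.h>
#include <stddef.h>

/* Minimal stand-in for an extended ack: only the message pointer. */
struct ext_ack {
	const char *msg;
};

/* Normal set: always overwrites. */
void set_err_msg(struct ext_ack *ack, const char *msg)
{
	if (ack)
		ack->msg = msg;
}

/* Weak set: only fills in a default when nothing was reported yet. */
void set_err_msg_weak(struct ext_ack *ack, const char *msg)
{
	if (ack && !ack->msg)
		ack->msg = msg;
}

/* Hypothetical driver hook that may report its own, more specific reason. */
static int driver_lag_join(struct ext_ack *ack, int driver_has_reason)
{
	if (driver_has_reason)
		set_err_msg(ack, "Driver: LAG hash type not supported");
	return -EOPNOTSUPP;
}

/* Core layer: add a generic message without clobbering the driver's. */
int core_lag_join(struct ext_ack *ack, int driver_has_reason)
{
	int err = driver_lag_join(ack, driver_has_reason);

	if (err == -EOPNOTSUPP)
		set_err_msg_weak(ack, "Offloading not supported");
	return err;
}
```

The core layer can therefore always supply a fallback message, and the user still sees the driver's more specific reason whenever one exists.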

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230202140354.3158129-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'yt8531-support'

Frank Sae says:

====================
net: add dts for yt8521 and yt8531s, add driver for yt8531

Add dts for yt8521 and yt8531s, add driver for yt8531.
These patches have been verified on our AM335x platform (motherboard),
which has one integrated yt8521 and one RGMII interface.
It can connect to daughter boards such as a yt8531s or yt8531 board.

v5:
- change the compatible of yaml
- change the maintainers of yaml from "frank sae" to "Frank Sae"

v4:
- change default tx delay from 150ps to 1950ps
- add compatible for yaml

v3:
- change default rx delay from 1900ps to 1950ps
- moved ytphy_rgmii_clk_delay_config_with_lock from yt8521's patch to yt8531's patch
- removed unnecessary checks of phydev->attached_dev->dev_addr

v2:
- split the BIT macro into its own patch
- split "dts for yt8521/yt8531s ... " patch into two patches
- use standard rx-internal-delay-ps and tx-internal-delay-ps, removed motorcomm,sds-tx-amplitude
- removed ytphy_parse_dt, ytphy_probe_helper and ytphy_config_init_helper
- do not store dts args in yt8521_priv
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add driver for Motorcomm yt8531 gigabit ethernet phy

Add a driver for the Motorcomm yt8531 gigabit ethernet phy. We have
verified the driver on an AM335x platform with a yt8531 board. On the
board, the yt8531 gigabit ethernet phy works in UTP mode with an RGMII
interface, and supports 1000M/100M/10M speeds and WoL (magic packet).

Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add dts support for Motorcomm yt8531s gigabit ethernet phy

Add dts support for Motorcomm yt8531s gigabit ethernet phy.
Change yt8521_probe to support clk config of yt8531s. Because
yt8521_probe already does what yt8531s needs, the separate yt8531s
function is removed.
This patch has been verified on AM335x platform with yt8531s board.

Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add dts support for Motorcomm yt8521 gigabit ethernet phy

Add dts support for Motorcomm yt8521 gigabit ethernet phy.
Add ytphy_rgmii_clk_delay_config function to support dts config of
the rgmii clk delay. This function is common to yt8521, yt8531s
and yt8531.
This patch has been verified on AM335x platform.

Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add BIT macro for Motorcomm yt8521/yt8531 gigabit ethernet phy

Add BIT macros for Motorcomm yt8521/yt8531 gigabit ethernet phy.
This is a preparatory patch. Add BIT macros for the 0xA012 register, and
supplement those for the 0xA001 and 0xA003 registers. These will be used
to support dts.

Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: Add Motorcomm yt8xxx ethernet phy

Add a YAML binding document for the Motorcomm yt8xxx Ethernet phy.

Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'act_ct-UDP-NEW'

Vlad Buslov says:

====================
net: Allow offloading of UDP NEW connections via act_ct

Currently only bidirectional established connections can be offloaded
via act_ct. This approach allows hardcoding a lot of assumptions into
act_ct, flow_table and flow_offload intermediate layer code. In order
to enable offloading of unidirectional UDP NEW connections, start by
incrementally changing the following assumptions:

- Drivers assume that only established connections are offloaded and
  don't support updating existing connections. Extract ctinfo from meta
  action cookie and refuse offloading of new connections in the drivers.

- Fix flow_table offload fixup algorithm to calculate flow timeout
  according to current connection state instead of hardcoded
  "established" value.

- Add new flow_table flow flag that designates bidirectional connections
  instead of assuming it and hardcoding hardware offload of every flow
  in both directions.

- Add new flow_table flow flag that designates connections that are
  offloaded to hardware as "established" instead of assuming it. This
  allows some optimizations in act_ct and prevents spamming the
  flow_table workqueue with redundant tasks.

With all the necessary infrastructure in place modify act_ct to offload
UDP NEW as unidirectional connection. Pass reply direction traffic to CT
and promote connection to bidirectional when UDP connection state
changes to "assured". Rely on refresh mechanism to propagate connection
state change to supporting drivers.

Note that the early drop algorithm, which is designed to free up some space
in the connection tracking table when it becomes full (by randomly deleting
up to 5% of non-established connections), currently ignores connections
marked as "offloaded". Now, with UDP NEW connections becoming
"offloaded", a malicious user could perform a DoS attack by
filling the table with non-droppable UDP NEW connections, sending just
one packet in a single direction. To prevent such a scenario, change the
early drop algorithm to also consider "offloaded" connections for deletion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_conntrack: allow early drop of offloaded UDP conns

Both the synchronous early drop algorithm and the asynchronous gc worker
completely ignore connections with the IPS_OFFLOAD_BIT status bit set. With
the new functionality that enables UDP NEW connection offload in action CT,
a malicious user can flood the conntrack table with offloaded UDP connections
by sending just a single packet per 5-tuple, because such connections can no
longer be deleted by the early drop algorithm.

To mitigate the issue allow both early drop and gc to consider offloaded
UDP connections for deletion.
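The change in eligibility can be summarized as a predicate. This is a simplified sketch: struct conn and its fields are hypothetical, 'assured' approximates the real test for which entries early drop may evict, and UDP is identified by IP protocol number 17:

```c
#include <stdbool.h>

#define PROTO_UDP 17

/* Hypothetical, simplified view of a conntrack entry. */
struct conn {
	bool offloaded;   /* IPS_OFFLOAD_BIT set */
	bool assured;     /* assured entries are not early-drop candidates */
	int l4proto;      /* IP protocol number */
};

/* Before: any offloaded entry was skipped, so one-packet UDP NEW flows
 * that are now offloadable could never be evicted. */
bool early_drop_candidate_old(const struct conn *c)
{
	return !c->assured && !c->offloaded;
}

/* After: offloaded UDP entries are eligible as well. */
bool early_drop_candidate_new(const struct conn *c)
{
	if (c->assured)
		return false;
	return !c->offloaded || c->l4proto == PROTO_UDP;
}
```

Offloaded entries of other protocols keep their previous protection; only UDP, the protocol that can now be offloaded in the NEW state, loses it.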

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_ct: offload UDP NEW connections

Modify the offload algorithm of UDP connections to the following:

- Offload NEW connection as unidirectional.

- When connection state changes to ESTABLISHED, also update the hardware
flow. However, to prevent act_ct from spamming the offload add wq for
every packet coming in the reply direction in this state, verify whether the
connection has already been updated to ESTABLISHED in the drivers. If that
is the case, skip flow_table and let conntrack handle such packets,
which also allows conntrack to potentially promote the connection to
ASSURED.

- When connection state changes to ASSURED set the flow_table flow
NF_FLOW_HW_BIDIRECTIONAL flag which will cause refresh mechanism to offload
the reply direction.

All other protocols have their offload algorithm preserved and are always
offloaded as bidirectional.

Note that this change tries to minimize the load on the flow_table add
workqueue. First, it tracks the last ctinfo that was offloaded by using the new
flow 'NF_FLOW_HW_ESTABLISHED' flag and doesn't schedule the refresh for
reply direction packets when the offloads have already been updated with
the current ctinfo. Second, when the 'add' task executes on the workqueue, it
always updates the offload with the current flow state (by checking the
'bidirectional' flow flag and obtaining the actual ctinfo/cookie through the
meta action instead of caching any of these at the moment the 'add' work is
scheduled), removing the need to schedule more updates if the state changed
concurrently while the 'add' work was pending on the workqueue.
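A simplified state-machine sketch of the UDP path described above. The flag names come from this series, but their bit values, struct flow, enum ct_state and act_ct_udp() are hypothetical; in particular, the sketch sets NF_FLOW_HW_ESTABLISHED immediately, whereas the series records it when the workqueue task fills the meta action:

```c
#include <stdbool.h>

/* Flag names from the series; bit values are arbitrary here. */
enum { NF_FLOW_HW_BIDIRECTIONAL = 1 << 0, NF_FLOW_HW_ESTABLISHED = 1 << 1 };

enum ct_state { CT_NEW, CT_ESTABLISHED, CT_ASSURED };

struct flow {
	unsigned int flags;
	bool offloaded;
	int hw_updates;   /* how often the hardware rule was (re)programmed */
};

/* Returns true if the packet uses the flow table, false if it is handed
 * to conntrack instead. */
bool act_ct_udp(struct flow *f, enum ct_state st)
{
	switch (st) {
	case CT_NEW:
		if (!f->offloaded) {
			f->offloaded = true;          /* original direction only */
			f->hw_updates++;
		}
		return true;
	case CT_ESTABLISHED:
		if (f->flags & NF_FLOW_HW_ESTABLISHED)
			return false;                 /* already updated: let conntrack promote */
		f->flags |= NF_FLOW_HW_ESTABLISHED;
		f->hw_updates++;                      /* one update with the new ctinfo */
		return true;
	case CT_ASSURED:
		if (!(f->flags & NF_FLOW_HW_BIDIRECTIONAL)) {
			f->flags |= NF_FLOW_HW_BIDIRECTIONAL; /* refresh offloads reply dir */
			f->hw_updates++;
		}
		return true;
	}
	return false;
}
```

Repeated reply-direction packets in the ESTABLISHED state cost no extra hardware updates, which is the workqueue load reduction the commit message describes.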

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_ct: set ctinfo in meta action depending on ct state

Currently tcf_ct_flow_table_fill_actions() function assumes that only
established connections can be offloaded and always sets ctinfo to either
IP_CT_ESTABLISHED or IP_CT_ESTABLISHED_REPLY strictly based on direction
without checking actual connection state. To enable UDP NEW connection
offload set the ctinfo, metadata cookie and NF_FLOW_HW_ESTABLISHED
flow_offload flags bit based on ct->status value.
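A sketch of the ctinfo selection, assuming that the "seen reply" status bit is what separates NEW from ESTABLISHED in the original direction. The enum values mirror enum ip_conntrack_info; the function name, the flag's bit position and the simplified direction type are hypothetical:

```c
#include <stdbool.h>

/* Values mirror enum ip_conntrack_info (only the ones needed here). */
enum ct_info {
	CTINFO_ESTABLISHED = 0,
	CTINFO_NEW = 2,
	CTINFO_ESTABLISHED_REPLY = 3,
};

#define FLOW_HW_ESTABLISHED (1u << 0)   /* hypothetical bit position */

enum ct_dir { DIR_ORIGINAL, DIR_REPLY };

/* Choose ctinfo from the connection state instead of from the direction
 * alone. 'seen_reply' stands in for the corresponding ct->status bit. */
enum ct_info fill_ctinfo(enum ct_dir dir, bool seen_reply,
			 unsigned int *flow_flags)
{
	if (dir == DIR_REPLY)
		return CTINFO_ESTABLISHED_REPLY;

	if (!seen_reply)
		return CTINFO_NEW;

	*flow_flags |= FLOW_HW_ESTABLISHED;  /* remember what was offloaded */
	return CTINFO_ESTABLISHED;
}
```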

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: flowtable: cache info of last offload

Modify flow table offload to cache the last ct info status that was passed
to the driver offload callbacks by extending enum nf_flow_flags with new
"NF_FLOW_HW_ESTABLISHED" flag. Set the flag if ctinfo was 'established'
during the last act_ct meta actions fill call. This infrastructure change is
necessary to optimize the promotion of UDP connections from 'new' to
'established' in the following patches in this series.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: flowtable: allow unidirectional rules

Modify flow table offload to support unidirectional connections by
extending enum nf_flow_flags with new "NF_FLOW_HW_BIDIRECTIONAL" flag. Only
offload the reply direction when the flag is set. This infrastructure change is
necessary to support offloading UDP NEW connections in the original direction
in the following patches in this series.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: flowtable: fixup UDP timeout depending on ct state

Currently the flow_offload_fixup_ct() function assumes that only replied UDP
connections can be offloaded and hardcodes the UDP_CT_REPLIED timeout value. To
enable UDP NEW connection offload in the following patches, extract the actual
connection state from ct->status and set the timeout according to it.
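A sketch of the timeout selection, assuming the state is derived from a "seen reply" bit and that timeouts come from a per-netns table indexed by UDP state; the names and the example timeout values are hypothetical:

```c
#include <stdbool.h>

enum udp_ct_state { UDP_UNREPLIED, UDP_REPLIED, UDP_STATE_MAX };

/* Pick the timeout from the connection's actual state instead of always
 * using the "replied" value. 'seen_reply' stands in for the ct->status bit. */
unsigned int udp_fixup_timeout(const unsigned int timeouts[UDP_STATE_MAX],
			       bool seen_reply)
{
	enum udp_ct_state st = seen_reply ? UDP_REPLIED : UDP_UNREPLIED;

	return timeouts[st];
}
```

A unidirectional flow that never saw a reply therefore gets the (typically shorter) unreplied timeout when it falls back to software.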

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: flow_offload: provision conntrack info in ct_metadata

In order to offload connections in other states besides "established" the
driver offload callbacks need to have access to connection conntrack info.
Flow offload intermediate representation data structure already contains
that data encoded in 'cookie' field, so just reuse it in the drivers.

Reject offloading IP_CT_NEW connections for now by returning an error in
relevant driver callbacks based on value of ctinfo. Support for offloading
such connections will need to be added to the drivers afterwards.
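A sketch of how a driver might recover ctinfo from the cookie, assuming the cookie is the conntrack pointer with ctinfo packed into its low three bits (which requires at least 8-byte alignment). The helper names and struct conn are hypothetical; only the enum values mirror enum ip_conntrack_info:

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror enum ip_conntrack_info (only the ones needed here). */
enum { CTINFO_ESTABLISHED = 0, CTINFO_NEW = 2, CTINFO_ESTABLISHED_REPLY = 3 };
#define CTINFO_MASK 7UL   /* assumes ctinfo fits in the low 3 bits */

/* Hypothetical conntrack entry, aligned so the low 3 bits are free. */
struct conn { _Alignas(8) int dummy; };

/* Producer side: pack the pointer and ctinfo into one cookie. */
unsigned long make_cookie(const struct conn *ct, unsigned int ctinfo)
{
	return (unsigned long)(uintptr_t)ct | ctinfo;
}

const struct conn *cookie_to_ct(unsigned long cookie)
{
	return (const struct conn *)(uintptr_t)(cookie & ~CTINFO_MASK);
}

/* Driver side: recover ctinfo and refuse connections that are still NEW. */
int driver_ct_offload(unsigned long cookie)
{
	unsigned int ctinfo = cookie & CTINFO_MASK;

	if (ctinfo == CTINFO_NEW)
		return -EOPNOTSUPP;
	/* ... program the established flow into hardware ... */
	return 0;
}
```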

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan966x: Add VCAP debugFS support

Enable debugfs for vcap for lan966x. This allows printing all the
entries in the VCAP as well as port information about which keys
are configured.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rswitch-SERDES-PHY-init'

Yoshihiro Shimoda says:

====================
net: renesas: rswitch: Modify initialization for SERDES and PHY

- My platform has the 88x2110.
- The MACTYPE setting of strap pin on the platform is SXGMII.
- However, we realized that the SoC cannot communicate with the PHY over SXGMII
  because of a hardware specification mismatch.
- We have a lot of boards with the mismatched MACTYPE setting.

So, I would like to change the MACTYPE to SGMII in software for the platform.

The patch [1/5] sets phydev->host_interfaces by phylink for Marvell PHY
driver (marvell10g) to initialize the MACTYPE.

- The patch [1/5] simplifies the rswitch driver.
- The patch [2/5] converts to phy_device from phylink.
- The patch [3/5] sets phydev->host_interfaces from this driver without
  any new functions of phylib.
- The patch [4/5] adds phy_power_on() calling to initialize the Ethernet
  SERDES PHY driver (r8a779f0-eth-serdes) for each channel.
- The patch [5/5] adds "max-speed" handling.

Changes from v4:
https://lore.kernel.org/all/20230127142621.1761278-1-yoshihiro.shimoda.uh@renesas.com/
- No modification of phylink API.
- Convert to phylib instead of phylink.
- Add "max-speed" handling.

Changes from v3:
https://lore.kernel.org/all/20230127014812.1656340-1-yoshihiro.shimoda.uh@renesas.com/
- Keep a pointer of "port" and more simplify the code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: renesas: rswitch: Add "max-speed" handling

The previous code set the speed based on the PHY interface mode.
In addition, this hardware cannot change the speed at runtime.
To use other speeds, add "max-speed" handling to set each port's
speed if needed.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: renesas: rswitch: Add phy_power_{on,off}() calling

Some Ethernet PHYs (like marvell10g) decide the host interface
mode based on the media-side speed. So, the rswitch driver needs to
initialize one of the Ethernet SERDES (r8a779f0-eth-serdes) ports
after the Ethernet PHY link is up. The r8a779f0-eth-serdes driver has
.init() for initializing all ports and .power_on() for initializing
each port. So, add phy_power_{on,off} calls for it.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: renesas: rswitch: Add host_interfaces setting

Set phydev->host_interfaces before calling of_phy_connect() to
configure the PHY with the information of host_interfaces.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: renesas: rswitch: Convert to phy_device

The intention was to set phy_device->host_interfaces via phylink in
the future. However, it is difficult to implement phylink properly,
especially in-band mode support in this driver, because extra
initialization is needed after the Ethernet PHY link is up. So,
convert from phylink to phy_device.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: renesas: rswitch: Simplify struct phy * handling

Simplify struct phy *serdes handling by keeping the variable in
the struct rswitch_device.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add TCP_MINTTL drop reason

In the unlikely case incoming packets are dropped because
of IP_MINTTL / IPV6_MINHOPCOUNT constraints...
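A minimal sketch of the idea: when a packet's TTL (or IPv6 hop limit) is below the socket's configured minimum, the drop is attributed to a dedicated reason instead of a generic one. The enum and function names are hypothetical stand-ins, not the kernel's skb drop reason definitions.

```c
#include <stdint.h>

/* Hypothetical reasons; the kernel's own drop reason enum is not reproduced here. */
enum demo_drop_reason {
	DEMO_NOT_DROPPED = 0,
	DEMO_DROP_TCP_MINTTL,
};

/*
 * min_ttl models the value set with IP_MINTTL / IPV6_MINHOPCOUNT (0 = no limit);
 * ttl is the TTL or hop limit of the received packet.
 */
enum demo_drop_reason demo_check_minttl(uint8_t ttl, uint8_t min_ttl)
{
	if (ttl < min_ttl)
		return DEMO_DROP_TCP_MINTTL; /* report a specific reason, not a generic drop */
	return DEMO_NOT_DROPPED;
}
```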

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230201174345.2708943-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fec: do not double-parse 'phy-reset-active-high' property

Conversion to gpiod API done in commit 468ba54bd616 ("fec: convert
to gpio descriptor") clashed with gpiolib applying the same quirk to the
reset GPIO polarity (introduced in commit b02c85c9458c). This results in
the reset line being left active (the device being left in reset) when
the reset line is "active low".

Remove handling of 'phy-reset-active-high' property from the driver and
rely on gpiolib to apply needed adjustments to avoid ending up with the
double inversion/flipped logic.
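The failure mode can be seen with a toy model of an active-low line. Below, the library-side helper already maps a logical value to the physical level; if the driver also inverts based on the same polarity information, the two inversions cancel. All names are hypothetical; this is not the fec or gpiolib code.

```c
#include <stdbool.h>

/* Physical level produced for a logical value on a line with the given polarity. */
int demo_physical_level(int logical, bool active_low)
{
	return active_low ? !logical : logical;
}

/* Buggy: the driver applies its own polarity quirk on top of the library's. */
int demo_buggy_set(int logical, bool active_low)
{
	int value = active_low ? !logical : logical; /* driver-side inversion */

	return demo_physical_level(value, active_low); /* library inverts again */
}

/* Fixed: the driver passes logical values only and lets the library map them. */
int demo_fixed_set(int logical, bool active_low)
{
	return demo_physical_level(logical, active_low);
}
```

With an active-low line, demo_buggy_set(0, true) yields physical level 0, which keeps the line asserted and the device held in reset.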

Fixes: 468ba54bd616 ("fec: convert to gpio descriptor")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230201215320.528319-2-dmitry.torokhov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fec: restore handling of PHY reset line as optional

Conversion of the driver to gpiod API done in 468ba54bd616 ("fec:
convert to gpio descriptor") incorrectly made reset line mandatory and
resulted in aborting driver probe in cases where reset line was not
specified (note: this way of specifying PHY reset line is actually
deprecated).

Switch to using devm_gpiod_get_optional() and skip manipulating the reset
line if it cannot be located.
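The "optional" lookup pattern can be modelled as follows: a resource that is simply not described yields an absent (NULL) handle rather than an error, callers skip touching it, and genuine failures still abort the probe. In the kernel, devm_gpiod_get_optional() returns NULL when the GPIO is not described and an ERR_PTR() on real errors; `demo_get_optional()` and its errno convention below are hypothetical stand-ins.

```c
#include <errno.h>
#include <stddef.h>

struct demo_gpio { int level; };

/*
 * Hypothetical lookup: returns 0 and sets *out to the GPIO (or to NULL if the
 * line is not described), or returns a negative errno on a real failure.
 */
int demo_get_optional(struct demo_gpio *entry, int lookup_err,
		      struct demo_gpio **out)
{
	*out = NULL;
	if (lookup_err == -ENOENT)
		return 0;          /* not described: optional, so not an error */
	if (lookup_err)
		return lookup_err; /* real error: propagate to the probe */
	*out = entry;
	return 0;
}

/* Probe-time PHY reset: only toggle the line if one was found. */
int demo_reset_phy(struct demo_gpio *entry, int lookup_err)
{
	struct demo_gpio *reset;
	int err = demo_get_optional(entry, lookup_err, &reset);

	if (err)
		return err;
	if (!reset)
		return 0;          /* no reset line: nothing to do, probe continues */
	reset->level = 1;          /* assert reset (a real driver would also delay) */
	reset->level = 0;          /* deassert reset */
	return 0;
}
```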

Fixes: 468ba54bd616 ("fec: convert to gpio descriptor")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230201215320.528319-1-dmitry.torokhov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

net/core/gro.c
7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO")
b1a78b9b9886 ("net: add support for ipv4 big tcp")
https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.2-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and netfilter.

  Current release - regressions:

   - phy: fix null-deref in phy_attach_direct

   - mac802154: fix possible double free upon parsing error

  Previous releases - regressions:

   - bpf: preserve reg parent/live fields when copying range info,
     prevent mis-verification of programs as safe

   - ip6: fix GRE tunnels not generating IPv6 link local addresses

   - phy: dp83822: fix null-deref on DP83825/DP83826 devices

   - sctp: do not check hb_timer.expires when resetting hb_timer

   - eth: mtk_eth_soc: fix SGMII configuration after phylink conversion

  Previous releases - always broken:

   - eth: xdp: execute xdp_do_flush() before napi_complete_done()

   - skb: do not mix page pool and page referenced frags in GRO

   - bpf:
      - fix a possible task gone issue with bpf_send_signal[_thread]()
      - fix an off-by-one bug in bpf_mem_cache_idx() to select the right
        cache
      - add missing btf_put to register_btf_id_dtor_kfuncs
      - sockmap: don't let sock_map_{close,destroy,unhash} call itself

   - gso: fix null-deref in skb_segment_list()

   - mctp: purge receive queues on sk destruction

   - fix UaF caused by accept on already connected socket in exotic
     socket families

   - tls: don't treat list head as an entry in tls_is_tx_ready()

   - netfilter: br_netfilter: disable sabotage_in hook after first
     suppression

   - wwan: t7xx: fix runtime PM implementation

  Misc:

   - MAINTAINERS: spring cleanup of networking maintainers"

* tag 'net-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  mtk_sgmii: enable PCS polling to allow SFP work
  net: mediatek: sgmii: fix duplex configuration
  net: mediatek: sgmii: ensure the SGMII PHY is powered down on configuration
  MAINTAINERS: update SCTP maintainers
  MAINTAINERS: ipv6: retire Hideaki Yoshifuji
  mailmap: add John Crispin's entry
  MAINTAINERS: bonding: move Veaceslav Falico to CREDITS
  net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
  net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC
  virtio-net: Keep stop() to follow mirror sequence of open()
  selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
  selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
  selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
  selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
  can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
  can: isotp: split tx timer into transmission and timeout
  can: isotp: handle wait_event_interruptible() return values
  can: raw: fix CAN FD frame transmissions over CAN XL devices
  can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
  hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap()
  ...

Merge tag 'linux-kselftest-kunit-fixes-6.2-rc7' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull KUnit fixes from Shuah Khan:
"Three fixes: one for a bug that causes a kernel crash, one for a link
  error during build, and a third for an extra indirection issue in
  kunit_test_init_section_suites()"

* tag 'linux-kselftest-kunit-fixes-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: fix kunit_test_init_section_suites(...)
  kunit: fix bug in KUNIT_EXPECT_MEMEQ
  kunit: Export kunit_running()

Merge tag 'soc-fixes-6.2-3' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"The majority of bugfixes are once more for the NXP i.MX platform,
  addressing issues with i.MX8M (UART, watchdog and ethernet) as well as
  the imx8dxl power button and the USB modem on an imx7 board.

  The reason that i.MX always shows up here is obviously not that they
  are more buggy than the others, but they have the most boards and are
  good about getting fixes in quickly.

  The other DT fixes are for the Nuvoton wpcm450 flash controller and
  the i2c mux on an ASpeed board.

  Lastly, there are updates to the MAINTAINERS entries for Mediatek,
  AMD/Seattle and NXP SoCs, as well as a lone code fix for error
  handling in the allwinner 'rsb' bus driver"

* tag 'soc-fixes-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: dts: wpcm450: Add nuvoton,shm = <&shm> to FIU node
  MAINTAINERS: Update entry for MediaTek SoC support
  MAINTAINERS: amd: drop inactive Brijesh Singh
  ARM: dts: imx7d-smegw01: Fix USB host over-current polarity
  arm64: dts: imx8mm-verdin: Do not power down eth-phy
  MAINTAINERS: match freescale ARM64 DT directory in i.MX entry
  arm64: dts: imx8mm: Fix pad control for UART1_DTE_RX
  ARM: dts: aspeed: Fix pca9849 compatible
  arm64: dts: freescale: imx8dxl: fix sc_pwrkey's property name linux,keycode
  arm64: dts: imx8m-venice: Remove incorrect 'uart-has-rtscts'
  arm64: dts: imx8mm: Reinstate GPIO watchdog always-running property on eDM SBC
  bus: sunxi-rsb: Fix error handling in sunxi_rsb_init()

Merge tag 's390-6.2-4' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

- With CONFIG_VMAP_STACK enabled it is not possible to load the s390
   specific diag288_wdt watchdog module. The reason is that a pointer to
   a string is passed to an inline assembly; this string however is
   located on the stack, while the instruction within the inline
   assembly expects a physical address. Fix this by copying the string
   to a kmalloc'ed buffer.

- The diag288_wdt watchdog module does not indicate that it accesses
   memory from an inline assembly, which it does. Add "memory" to the
   clobber list to prevent the compiler from incorrectly optimizing
   code away.

- Pass size of the uncompressed kernel image to __decompress() call.
   Otherwise the kernel image decompressor may corrupt/overwrite an
   initrd. This was reported to happen on s390 after commit 2aa14b1ab2c4
   ("zstd: import usptream v1.5.2").
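The underlying issue is that a decompressor which is not told the size of its output buffer can write past it and clobber whatever follows in memory (here, the initrd). A toy run-length decoder shows the bounded variant; the format and function name are made up for illustration and are unrelated to the real __decompress() or zstd.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy RLE format: pairs of (count, byte). Writes at most out_len bytes and
 * returns the number written, or -1 if the output would overflow.
 */
long demo_rle_decode(const uint8_t *in, size_t in_len,
		     uint8_t *out, size_t out_len)
{
	size_t pos = 0;

	for (size_t i = 0; i + 1 < in_len; i += 2) {
		size_t count = in[i];
		uint8_t byte = in[i + 1];

		/* pos never exceeds out_len, so the subtraction cannot wrap */
		if (count > out_len - pos)
			return -1; /* refuse to overwrite memory past the buffer */
		for (size_t j = 0; j < count; j++)
			out[pos++] = byte;
	}
	return (long)pos;
}
```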

* tag 's390-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/decompressor: specify __decompress() buf len to avoid overflow
  watchdog: diag288_wdt: fix __diag288() inline assembly
  watchdog: diag288_wdt: do not use stack buffers for hardware data

Merge tag 'platform-drivers-x86-v6.2-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
"A set of AMD PMF fixes + a few other small fixes"

* tag 'platform-drivers-x86-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: touchscreen_dmi: Add Chuwi Vi8 (CWI501) DMI match
  platform/x86: thinkpad_acpi: Fix thinklight LED brightness returning 255
  platform/x86/amd: pmc: add CONFIG_SERIO dependency
  platform/x86/amd/pmf: Ensure mutexes are initialized before use
  platform/x86/amd/pmf: Fix to update SPS thermals when power supply change
  platform/x86/amd/pmf: Fix to update SPS default pprof thermals
  platform/x86/amd/pmf: update to auto-mode limits only after AMT event
  platform/x86/amd/pmf: Add helper routine to check pprof is balanced
  platform/x86/amd/pmf: Add helper routine to update SPS thermals

Merge branch 'fixes-for-mtk_eth_soc'

Bjørn Mork says:

====================
Fix mtk_eth_soc sgmii configuration.

This has been tested on a MT7986 with a Maxlinear GPY211C phy
permanently attached to the second SoC mac.
====================

Link: https://lore.kernel.org/r/20230201182331.943411-1-bjorn@mork.no
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mtk_sgmii: enable PCS polling to allow SFP work

Currently there is no IRQ handling (even though the SGMII supports it).
Enable polling to support SFP ports.

Fixes: 14a44ab0330d ("net: mtk_eth_soc: partially convert to phylink_pcs")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
[ bmork: changed "1" => "true" ]
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mediatek: sgmii: fix duplex configuration

The logic of the duplex bit is inverted. Setting it means half
duplex, not full duplex.

Fix and rename macro to avoid confusion.
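To make the inverted meaning explicit, a fix along these lines names the bit after what setting it actually does. The bit position and macro name below are hypothetical, not the actual mtk_eth_soc definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit: when SET, the link runs at HALF duplex. */
#define DEMO_SGMII_DUPLEX_HALF (1u << 4)

uint32_t demo_set_duplex(uint32_t reg, bool full_duplex)
{
	if (full_duplex)
		reg &= ~DEMO_SGMII_DUPLEX_HALF; /* clear => full duplex */
	else
		reg |= DEMO_SGMII_DUPLEX_HALF;  /* set => half duplex */
	return reg;
}
```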

Fixes: 7e538372694b ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mediatek: sgmii: ensure the SGMII PHY is powered down on configuration

The code expects the PHY to be in power down, which is only true after reset.
Allow changes of the SGMII parameters more than once.

Only power down when reconfiguring to avoid bouncing the link when there's
no reason to - based on code from Russell King.

There are cases when the SGMII_PHYA_PWD register contains 0x9 which
prevents SGMII from working. The SGMII still shows link but no traffic
can flow. Writing 0x0 to the PHYA_PWD register fixes the issue. 0x0 was
taken from a good working state of the SGMII interface.
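The reconfiguration flow can be sketched as follows: power the PHY down only when the requested parameters differ from the programmed ones, and, in this sketch, always finish by writing 0x0 to PHYA_PWD. The struct, the bit position and the single `mode` word are hypothetical; only the 0x0 value and the 0x9 symptom come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PHYA_PWD_BIT (1u << 4) /* hypothetical power-down bit */

struct demo_sgmii {
	uint32_t phya_pwd;        /* stand-in for the SGMII_PHYA_PWD register */
	uint32_t mode;            /* currently programmed SGMII parameters */
	unsigned int power_downs; /* how many times the link was bounced */
};

/* Returns true if the parameters changed and the PHY was powered down. */
bool demo_sgmii_config(struct demo_sgmii *s, uint32_t new_mode)
{
	bool changed = s->mode != new_mode;

	if (changed) {
		s->phya_pwd |= DEMO_PHYA_PWD_BIT; /* power down before reprogramming */
		s->mode = new_mode;
		s->power_downs++;
	}
	/*
	 * Release power-down with a full 0x0 write rather than clearing one
	 * bit, so stray values such as 0x9 are cleared as well.
	 */
	s->phya_pwd = 0x0;
	return changed;
}
```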

Fixes: 42c03844e93d ("net-next: mediatek: add support for MediaTek MT7622 SoC")
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
[ bmork: rebased and squashed into one patch ]
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'linux-can-fixes-for-6.2-20230202' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
can 2023-02-02

The first patch is by Ziyang Xuan and removes an errant WARN_ON_ONCE()
in the CAN J1939 protocol.

The next 3 patches are by Oliver Hartkopp. The first 2 target the CAN
ISO-TP protocol and fix the state machine with respect to signals and
a regression found by the syzbot.

The last patch is by me and adds a missing assignment in the ethtool ring
configuration callback.

* tag 'linux-can-fixes-for-6.2-20230202' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
  can: isotp: split tx timer into transmission and timeout
  can: isotp: handle wait_event_interruptible() return values
  can: raw: fix CAN FD frame transmissions over CAN XL devices
  can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
====================

Link: https://lore.kernel.org/r/20230202094135.2293939-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'maintainers-spring-refresh-of-networking-maintainers'

Jakub Kicinski says:

====================
MAINTAINERS: spring refresh of networking maintainers

Use Jon Corbet's script for generating statistics about maintainer
coverage to identify inactive maintainers of relatively active code.
Move them to CREDITS.
====================

Link: https://lore.kernel.org/r/20230201182014.2362044-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: update SCTP maintainers

Vlad has stepped away from SCTP related duties.
Move him to CREDITS and add Xin Long.

Subsystem SCTP PROTOCOL
  Changes 237 / 629 (37%)
  Last activity: 2022-12-12
  Vlad Yasevich <vyasevich@gmail.com>:
  Neil Horman <nhorman@tuxdriver.com>:
    Author 20a785aa52c8 2020-05-19 00:00:00 4
    Tags 20a785aa52c8 2020-05-19 00:00:00 84
  Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>:
    Author 557fb5862c92 2021-07-28 00:00:00 41
    Tags da05cecc4939 2022-12-12 00:00:00 197
  Top reviewers:
    [15]: lucien.xin@gmail.com
  INACTIVE MAINTAINER Vlad Yasevich <vyasevich@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: ipv6: retire Hideaki Yoshifuji

We very rarely hear from Hideaki Yoshifuji and the IPv4/IPv6
entry covers a lot of code. Asking people to CC someone who
rarely responds feels wrong.

Note that Hideaki Yoshifuji already has an entry in CREDITS
for IPv6 so not adding another one.

Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mailmap: add John Crispin's entry

John has not been CCed on some of the fixes which perhaps resulted
in the lack of review tags:

Subsystem MEDIATEK ETHERNET DRIVER
  Changes 50 / 295 (16%)
  Last activity: 2023-01-17
  Felix Fietkau <nbd@nbd.name>:
    Author 8bd8dcc5e47f 2022-11-18 00:00:00 33
    Tags 8bd8dcc5e47f 2022-11-18 00:00:00 38
  John Crispin <john@phrozen.org>:
  Sean Wang <sean.wang@mediatek.com>:
    Author 880c2d4b2fdf 2019-06-03 00:00:00 7
    Tags a5d75538295b 2020-04-07 00:00:00 10
  Mark Lee <Mark-MC.Lee@mediatek.com>:
    Author 8d66a8183d0c 2019-11-14 00:00:00 4
    Tags 8d66a8183d0c 2019-11-14 00:00:00 4
  Lorenzo Bianconi <lorenzo@kernel.org>:
    Author 08a764a7c51b 2023-01-17 00:00:00 68
    Tags 08a764a7c51b 2023-01-17 00:00:00 74
  Top reviewers:
    [12]: leonro@nvidia.com
    [6]: f.fainelli@gmail.com
    [6]: andrew@lunn.ch
  INACTIVE MAINTAINER John Crispin <john@phrozen.org>

Map his old address to the up-to-date one.

Acked-by: John Crispin <john@phrozen.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: bonding: move Veaceslav Falico to CREDITS

Veaceslav has stepped away from netdev:

Subsystem BONDING DRIVER
  Changes 96 / 319 (30%)
  Last activity: 2022-12-01
  Jay Vosburgh <j.vosburgh@gmail.com>:
    Author 4f5d33f4f798 2022-08-11 00:00:00 3
    Tags e5214f363dab 2022-12-01 00:00:00 48
  Veaceslav Falico <vfalico@gmail.com>:
  Andy Gospodarek <andy@greyhouse.net>:
    Tags 47f706262f1d 2019-02-24 00:00:00 4
  Top reviewers:
    [42]: jay.vosburgh@canonical.com
    [18]: jiri@nvidia.com
    [10]: jtoppins@redhat.com
  INACTIVE MAINTAINER Veaceslav Falico <vfalico@gmail.com>

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: openvswitch: fix flow memory leak in ovs_flow_cmd_new

Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is
not freed when an allocation of a key fails.

BUG: memory leak
unreferenced object 0xffff888116668000 (size 632):
  comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline]
    [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77
    [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957
    [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739
    [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
    [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800
    [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515
    [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
    [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
    [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339
    [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934
    [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline]
    [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671
    [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356
    [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410
    [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
    [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
    [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

To fix this the patch rearranges the goto labels to reflect the order of
object allocations and adds appropriate goto statements on the error
paths.
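The pattern being restored is the usual kernel-style error unwinding: error labels are laid out in reverse allocation order, so a failure at step N jumps to the label that frees exactly what steps 1..N-1 allocated. A userspace sketch with malloc()/free() and hypothetical object names (not the openvswitch code):

```c
#include <stdlib.h>

struct demo_flow { void *key; void *acts; };

static int demo_live_objects; /* outstanding allocations, used to detect leaks */

/* Allocate n bytes, or simulate an allocation failure when fail is set. */
static void *demo_alloc(size_t n, int fail)
{
	void *p = fail ? NULL : malloc(n);

	if (p)
		demo_live_objects++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		free(p);
		demo_live_objects--;
	}
}

/* fail_at selects which allocation fails (1..3), or 0 for success. */
int demo_flow_cmd_new(int fail_at, struct demo_flow **out)
{
	struct demo_flow *flow;

	flow = demo_alloc(sizeof(*flow), fail_at == 1);
	if (!flow)
		goto err_out;
	flow->key = demo_alloc(64, fail_at == 2);
	if (!flow->key)
		goto err_free_flow;      /* key failed: the flow must be freed too */
	flow->acts = demo_alloc(64, fail_at == 3);
	if (!flow->acts)
		goto err_free_key;

	*out = flow;
	return 0;

err_free_key:
	demo_free(flow->key);
err_free_flow:
	demo_free(flow);
err_out:
	return -1;
}
```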

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC

According to my tests on MT7621AT and MT7623NI SoCs, hardware DSA untagging
won't work on the second MAC. Therefore, disable this feature when the
second MAC of the MT7621 and MT7623 SoCs is being used.

Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Link: https://lore.kernel.org/netdev/6249fc14-b38a-c770-36b4-5af6d41c21d3@arinc9.com/
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230128094232.2451947-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio-net: Keep stop() to follow mirror sequence of open()

The commit cited in the Fixes tag frees the rxq XDP info while RQ NAPI is
still enabled and packet processing may be ongoing.

Follow the mirror sequence of open() in the stop() callback.
This ensures that when rxq info is unregistered, no rx
packet processing is ongoing.
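The "mirror" teardown can be illustrated with a step recorder: open() performs its steps in one order and stop() undoes them in exactly the reverse order, so NAPI is disabled before the rxq info it uses is unregistered. The recorded strings are only labels named after the kernel helpers; nothing here calls virtio-net or XDP code.

```c
#include <string.h>

#define DEMO_MAX_STEPS 8

static const char *demo_log[DEMO_MAX_STEPS];
static int demo_nlog;

static void demo_step(const char *name)
{
	if (demo_nlog < DEMO_MAX_STEPS)
		demo_log[demo_nlog++] = name;
}

/* open(): register the rxq info first, then start packet processing. */
void demo_open(void)
{
	demo_step("xdp_rxq_info_reg");
	demo_step("napi_enable");
}

/* stop(): mirror of open(), so no rx processing runs once the info is gone. */
void demo_stop(void)
{
	demo_step("napi_disable");
	demo_step("xdp_rxq_info_unreg");
}

/* Position of a step in the log, or -1 if it was never recorded. */
int demo_index_of(const char *name)
{
	for (int i = 0; i < demo_nlog; i++)
		if (strcmp(demo_log[i], name) == 0)
			return i;
	return -1;
}
```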

Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://lore.kernel.org/r/20230202163516.12559-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: Use sysfs_emit() instead of sprintf()

Follow the advice in Documentation/filesystems/sysfs.rst that show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.
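What sysfs_emit() adds over sprintf() is a bound: it formats into the PAGE_SIZE buffer handed to show() and returns the number of bytes actually stored. A userspace approximation with vsnprintf(); `demo_emit()` and the 4096-byte page size are assumptions for illustration, not the kernel implementation.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEMO_PAGE_SIZE 4096 /* assumed page size */

/* scnprintf-like: returns bytes actually stored (excluding NUL), at most size - 1. */
int demo_emit(char *buf, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf, DEMO_PAGE_SIZE, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	if (len >= DEMO_PAGE_SIZE)
		len = DEMO_PAGE_SIZE - 1; /* output was truncated to the page */
	return len;
}
```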

Signed-off-by: Bo Liu <liubo03@inspur.com>
Link: https://lore.kernel.org/r/20230201081438.3151-1-liubo03@inspur.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'amd-xgbe-add-support-for-2-5gbe-and-rx-adaptation'

Raju Rangoju says:

====================
amd-xgbe: add support for 2.5GbE and rx-adaptation

This patch series adds support for 2.5GbE in 10GBaseT mode and
rx-adaptation support for Yellow Carp devices.

1) Support for 2.5GbE:
   Add the necessary changes to the driver to fully recognize and enable
   2.5GbE speed in 10GBaseT mode.

2) Support for rx-adaptation:
   Supporting the 10G backplane mode without Auto-negotiation, and
   longer DAC cables, requires the PHY to perform the RX Adaptation
   sequence described in the Synopsys databook.
   Add the necessary changes to Yellow Carp devices to ensure seamless
   RX Adaptation for 10G-SFI (LONG DAC) and 10G-KR modes without
   Auto-Negotiation (CL72 not present).
====================

Link: https://lore.kernel.org/r/20230201054932.212700-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

amd-xgbe: add support for rx-adaptation

The existing implementation for non-Autonegotiation 10G speed modes does
not enable RX adaptation in the driver and FW. The RX Equalization
settings (AFE settings alone) are manually configured, and the existing
link-up sequence in the driver does not perform the RX adaptation process
described in the Synopsys databook. There's a customer request for 10G
backplane mode without Auto-negotiation and for longer DAC cables that
use the non-Autonegotiation mode. These modes require the PHY to perform
RX Adaptation.

The proposed logic adds the necessary changes to Yellow Carp devices to
ensure seamless RX Adaptation for 10G-SFI (LONG DAC) and 10G-KR without
AN (CL72 not present). The RX adaptation core algorithm is executed by
firmware; however, to trigger it, the driver must send a new mailbox
sub-command.

Co-developed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

amd-xgbe: add 2.5GbE support to 10G BaseT mode

Add support to the driver to fully recognize and enable 2.5GbE speed in
10GBaseT mode.

Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking

The test tool can check that the number of zerocopy completions is
valid, taking into consideration the number of datagram send calls. This
check can catch the system in a state where datagrams are still in
flight (for example in a qdisc, waiting for the network interface to
return a completion notification, etc).

This change adds retry logic that recomputes the number of completions
up to a configurable (via CLI) timeout (default: 2 seconds).
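The retry logic amounts to polling until the expected count is reached or a deadline passes. This sketch assumes a caller-supplied `read_completions` callback (hypothetical) and uses CLOCK_MONOTONIC for the deadline; the real selftest reads zerocopy completions from the socket error queue, which is not modelled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

typedef unsigned long (*demo_read_fn)(void *ctx);

static double demo_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Returns true once read_completions() reports at least `expected`, false on timeout. */
bool demo_wait_completions(demo_read_fn read_completions, void *ctx,
			   unsigned long expected, double timeout_s)
{
	double deadline = demo_now() + timeout_s;
	struct timespec interval = { 0, 1000000 }; /* 1 ms between polls */

	for (;;) {
		if (read_completions(ctx) >= expected)
			return true;
		if (demo_now() >= deadline)
			return false;
		nanosleep(&interval, NULL);
	}
}

/* Stand-in source for the example: one more completion per poll. */
unsigned long demo_fake_counter(void *ctx)
{
	unsigned long *n = ctx;

	return ++*n;
}
```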

Fixes: 79ebc3c26010 ("net/udpgso_bench_tx: options to exercise TX CMSG")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-4-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs

"udpgro_bench.sh" invokes the udpgso_bench_rx/udpgso_bench_tx programs
one after the other, and there is a chance that the rx one is not yet
ready to accept socket connections. This race could fail the test
with at least one of the following:

./udpgso_bench_tx: connect: Connection refused
./udpgso_bench_tx: sendmsg: Connection refused
./udpgso_bench_tx: write: Connection refused

This change addresses this by making udpgro_bench.sh wait for the rx
program to be ready before firing off the tx one - up to a 10s timeout.

Fixes: 3a687bef148d ("selftests: udp gso benchmark")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-3-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided

Leaving unrecognized arguments buried in the output can easily hide a
CLI/script typo. Avoid this by exiting when wrong arguments are provided to
the udpgso_bench test programs.
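Rejecting unrecognized options is the usual getopt() idiom: the `default` branch (getopt() returns '?') reports the problem instead of silently ignoring it. The option letters and `struct demo_opts` are hypothetical, and the function returns -1 rather than calling exit() so it can be exercised in isolation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct demo_opts {
	int port;
	int verbose;
};

/* Returns 0 on success, -1 on an unknown option or missing argument. */
int demo_parse_args(int argc, char **argv, struct demo_opts *o)
{
	int c;

	opterr = 0; /* print our own message below */
	optind = 1; /* restart scanning from argv[1] */
	while ((c = getopt(argc, argv, "p:v")) != -1) {
		switch (c) {
		case 'p':
			o->port = atoi(optarg);
			break;
		case 'v':
			o->verbose = 1;
			break;
		default: /* '?': unrecognized option or missing argument */
			fprintf(stderr, "invalid option: -%c\n", optopt);
			return -1;
		}
	}
	return 0;
}
```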

Fixes: 3a687bef148d ("selftests: udp gso benchmark")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-2-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning

This change fixes the following compiler warning:

/usr/include/x86_64-linux-gnu/bits/error.h:40:5: warning: ‘gso_size’ may be used uninitialized [-Wmaybe-uninitialized]
   40 |     __error_noreturn (__status, __errnum, __format, __va_arg_pack ());
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
udpgso_bench_rx.c: In function ‘main’:
udpgso_bench_rx.c:253:23: note: ‘gso_size’ was declared here
  253 |         int ret, len, gso_size, budget = 256;

Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO")
Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230201001612.515730-1-andrei.gherzan@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-sched-transition-act_pedit-to-rcu-and-percpu-stats'

Pedro Tammela says:

====================
net/sched: transition act_pedit to rcu and percpu stats

The software pedit action didn't get the same love as some of the
other actions and it's still using spinlocks and shared stats.
Therefore, transition the action to rcu and percpu stats which
improves the action's performance.

We test this change with a very simple packet forwarding setup:

tc filter add dev ens2f0 ingress protocol ip matchall \
   action pedit ex munge eth src set b8:ce:f6:4b:68:35 pipe \
   action pedit ex munge eth dst set ac:1f:6b:e4:ff:93 pipe \
   action mirred egress redirect dev ens2f1
tc filter add dev ens2f1 ingress protocol ip matchall \
   action pedit ex munge eth src set b8:ce:f6:4b:68:34 pipe \
   action pedit ex munge eth dst set ac:1f:6b:e4:ff:92 pipe \
   action mirred egress redirect dev ens2f0

Using TRex with an HTTP-like profile, in our setup with a 25G NIC
and a 26-core Intel CPU, we observe the following in perf:
   before:
    11.59%  2.30%  [kernel]  [k] tcf_pedit_act
       2.55% tcf_pedit_act
             8.38% _raw_spin_lock
                       6.43% native_queued_spin_lock_slowpath
   after:
    1.46%  1.46%  [kernel]  [k] tcf_pedit_act

tdc results for pedit after the patch:
1..69
ok 1 319a - Add pedit action that mangles IP TTL
ok 2 7e67 - Replace pedit action with invalid goto chain
ok 3 377e - Add pedit action with RAW_OP offset u32
ok 4 a0ca - Add pedit action with RAW_OP offset u32 (INVALID)
ok 5 dd8a - Add pedit action with RAW_OP offset u16 u16
ok 6 53db - Add pedit action with RAW_OP offset u16 (INVALID)
ok 7 5c7e - Add pedit action with RAW_OP offset u8 add value
ok 8 2893 - Add pedit action with RAW_OP offset u8 quad
ok 9 3a07 - Add pedit action with RAW_OP offset u8-u16-u8
ok 10 ab0f - Add pedit action with RAW_OP offset u16-u8-u8
ok 11 9d12 - Add pedit action with RAW_OP offset u32 set u16 clear u8 invert
ok 12 ebfa - Add pedit action with RAW_OP offset overflow u32 (INVALID)
ok 13 f512 - Add pedit action with RAW_OP offset u16 at offmask shift set
ok 14 c2cb - Add pedit action with RAW_OP offset u32 retain value
ok 15 1762 - Add pedit action with RAW_OP offset u8 clear value
ok 16 bcee - Add pedit action with RAW_OP offset u8 retain value
ok 17 e89f - Add pedit action with RAW_OP offset u16 retain value
ok 18 c282 - Add pedit action with RAW_OP offset u32 clear value
ok 19 c422 - Add pedit action with RAW_OP offset u16 invert value
ok 20 d3d3 - Add pedit action with RAW_OP offset u32 invert value
ok 21 57e5 - Add pedit action with RAW_OP offset u8 preserve value
ok 22 99e0 - Add pedit action with RAW_OP offset u16 preserve value
ok 23 1892 - Add pedit action with RAW_OP offset u32 preserve value
ok 24 4b60 - Add pedit action with RAW_OP negative offset u16/u32 set value
ok 25 a5a7 - Add pedit action with LAYERED_OP eth set src
ok 26 86d4 - Add pedit action with LAYERED_OP eth set src & dst
ok 27 f8a9 - Add pedit action with LAYERED_OP eth set dst
ok 28 c715 - Add pedit action with LAYERED_OP eth set src (INVALID)
ok 29 8131 - Add pedit action with LAYERED_OP eth set dst (INVALID)
ok 30 ba22 - Add pedit action with LAYERED_OP eth type set/clear sequence
ok 31 dec4 - Add pedit action with LAYERED_OP eth set type (INVALID)
ok 32 ab06 - Add pedit action with LAYERED_OP eth add type
ok 33 918d - Add pedit action with LAYERED_OP eth invert src
ok 34 a8d4 - Add pedit action with LAYERED_OP eth invert dst
ok 35 ee13 - Add pedit action with LAYERED_OP eth invert type
ok 36 7588 - Add pedit action with LAYERED_OP ip set src
ok 37 0fa7 - Add pedit action with LAYERED_OP ip set dst
ok 38 5810 - Add pedit action with LAYERED_OP ip set src & dst
ok 39 1092 - Add pedit action with LAYERED_OP ip set ihl & dsfield
ok 40 02d8 - Add pedit action with LAYERED_OP ip set ttl & protocol
ok 41 3e2d - Add pedit action with LAYERED_OP ip set ttl (INVALID)
ok 42 31ae - Add pedit action with LAYERED_OP ip ttl clear/set
ok 43 486f - Add pedit action with LAYERED_OP ip set duplicate fields
ok 44 e790 - Add pedit action with LAYERED_OP ip set ce, df, mf, firstfrag, nofrag fields
ok 45 cc8a - Add pedit action with LAYERED_OP ip set tos
ok 46 7a17 - Add pedit action with LAYERED_OP ip set precedence
ok 47 c3b6 - Add pedit action with LAYERED_OP ip add tos
ok 48 43d3 - Add pedit action with LAYERED_OP ip add precedence
ok 49 438e - Add pedit action with LAYERED_OP ip clear tos
ok 50 6b1b - Add pedit action with LAYERED_OP ip clear precedence
ok 51 824a - Add pedit action with LAYERED_OP ip invert tos
ok 52 106f - Add pedit action with LAYERED_OP ip invert precedence
ok 53 6829 - Add pedit action with LAYERED_OP beyond ip set dport & sport
ok 54 afd8 - Add pedit action with LAYERED_OP beyond ip set icmp_type & icmp_code
ok 55 3143 - Add pedit action with LAYERED_OP beyond ip set dport (INVALID)
ok 56 815c - Add pedit action with LAYERED_OP ip6 set src
ok 57 4dae - Add pedit action with LAYERED_OP ip6 set dst
ok 58 fc1f - Add pedit action with LAYERED_OP ip6 set src & dst
ok 59 6d34 - Add pedit action with LAYERED_OP ip6 dst retain value (INVALID)
ok 60 94bb - Add pedit action with LAYERED_OP ip6 traffic_class
ok 61 6f5e - Add pedit action with LAYERED_OP ip6 flow_lbl
ok 62 6795 - Add pedit action with LAYERED_OP ip6 set payload_len, nexthdr, hoplimit
ok 63 1442 - Add pedit action with LAYERED_OP tcp set dport & sport
ok 64 b7ac - Add pedit action with LAYERED_OP tcp sport set (INVALID)
ok 65 cfcc - Add pedit action with LAYERED_OP tcp flags set
ok 66 3bc4 - Add pedit action with LAYERED_OP tcp set dport, sport & flags fields
ok 67 f1c8 - Add pedit action with LAYERED_OP udp set dport & sport
ok 68 d784 - Add pedit action with mixed RAW/LAYERED_OP #1
ok 69 70ca - Add pedit action with mixed RAW/LAYERED_OP #2
====================

Link: https://lore.kernel.org/r/20230131190512.3805897-1-pctammela@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: simplify tcf_pedit_act

Remove the check for a negative number of keys, as
this can never happen.

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: transition act_pedit to rcu and percpu stats

The software pedit action didn't get the same love as some of the
other actions and is still using spinlocks and shared stats in the
datapath.
Transition the action to RCU and per-CPU stats, as this improves the
action's performance dramatically on multi-CPU deployments.
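A userspace sketch of the per-CPU stats idea, for illustration only: the fixed CPU count, `struct pcpu_stats` and both helpers are hypothetical, and the kernel would use its own per-CPU allocation and stats-sync helpers rather than a plain array. Each CPU only increments its own slot on the datapath, so no shared lock is taken; the control path folds the slots into a total.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4 /* assumed fixed CPU count for the sketch */

/* Hypothetical per-CPU counters: one slot per CPU, no shared lock. */
struct pcpu_stats {
	uint64_t packets[SKETCH_NR_CPUS];
	uint64_t bytes[SKETCH_NR_CPUS];
};

/* Datapath: a CPU only ever touches its own slot. */
static void stats_update(struct pcpu_stats *s, unsigned int cpu, uint64_t len)
{
	s->packets[cpu]++;
	s->bytes[cpu] += len;
}

/* Control path (e.g. a stats dump): fold every slot into one total. */
static void stats_fold(const struct pcpu_stats *s, uint64_t *packets,
		       uint64_t *bytes)
{
	*packets = 0;
	*bytes = 0;
	for (size_t cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		*packets += s->packets[cpu];
		*bytes += s->bytes[cpu];
	}
}
```

The sketch is single-threaded; a concurrent reader would additionally need protection against torn 64-bit reads on 32-bit hosts.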

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'rxrpc-next-20230131' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
Here's the fifth part of the patches for moving rxrpc from doing
a lot of its stuff in softirq context to doing it in an I/O thread in
process context, thereby making it easier to support a larger SACK
table.

The full description is in the description for the first part[1] which is
now upstream.  The second and third parts are also upstream[2].  A subset
of the original fourth part[3] got applied as a fix for a race[4].

The fifth part includes some cleanups:

(1) Miscellaneous trace header cleanups: fix a trace string, display the
     security index in rx_packet rather than displaying the type twice,
     remove some whitespace to make checkpatch happier and remove some
     excess tabulation.

(2) Convert ->recvmsg_lock to a spinlock as it's only ever locked
     exclusively.

(3) Make ->ackr_window and ->ackr_nr_unacked non-atomic as they're only
     used in the I/O thread.

(4) Don't use call->tx_lock to access ->tx_buffer as that is only accessed
     inside the I/O thread.  sendmsg() loads onto ->tx_sendmsg and the I/O
     thread decants from that to the buffer.

(5) Remove local->defrag_sem as DATA packets are transmitted serially by
     the I/O thread.

(6) Remove the service connection bundle, as it was only used for its
     channel_lock - which has now gone.

And some more significant changes:

(7) Add a debugging option to allow a delay to be injected into packet
     reception to help investigate the behaviour over longer links than
     just a few cm.

(8) Generate occasional PING ACKs to probe for RTT information during a
     receive-heavy call.

(9) Simplify the SACK table maintenance and ACK generation.  Now that both
     parts are done in the same thread, there's no possibility of a race
     and no need to try and be cunning to avoid taking a BH spinlock whilst
     copying the SACK table (which in the future will be up to 2K) and no
     need to rotate the copy to fit the ACK packet table.

(10) Use SKB_CONSUMED when freeing received DATA packets (stop dropwatch
     complaining).

* tag 'rxrpc-next-20230131' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  rxrpc: Kill service bundle
  rxrpc: Change rx_packet tracepoint to display securityIndex not type twice
  rxrpc: Show consumed and freed packets as non-dropped in dropwatch
  rxrpc: Remove local->defrag_sem
  rxrpc: Don't lock call->tx_lock to access call->tx_buffer
  rxrpc: Simplify ACK handling
  rxrpc: De-atomic call->ackr_window and call->ackr_nr_unacked
  rxrpc: Generate extra pings for RTT during heavy-receive call
  rxrpc: Allow a delay to be injected into packet reception
  rxrpc: Convert call->recvmsg_lock to a spinlock
  rxrpc: Shrink the tabulation in the rxrpc trace header a bit
  rxrpc: Remove whitespace before ')' in trace header
  rxrpc: Fix trace string
====================

Link: https://lore.kernel.org/all/20230131171227.3912130-1-dhowells@redhat.com/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

platform/x86: touchscreen_dmi: Add Chuwi Vi8 (CWI501) DMI match

Add a DMI match for the CWI501 version of the Chuwi Vi8 tablet,
pointing to the same chuwi_vi8_data as the existing CWI506 version
DMI match.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230202103413.331459-1-hdegoede@redhat.com

can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq

If a new ring layout is set, the max coalesced frames for RX and
TX are re-calculated, too. Add the missing assignment of the newly
calculated TX max coalesced frames.

Fixes: 656fc12ddaf8 ("can: mcp251xfd: add TX IRQ coalescing ethtool support")
Link: https://lore.kernel.org/all/20230130154334.1578518-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: split tx timer into transmission and timeout

The timer for the transmission of isotp PDUs formerly had two functions:
1. send two consecutive frames with a given time gap
2. monitor the timeouts for flow control frames and the echo frames

This led to larger txstate checks and potentially to a problem discovered
by syzbot, which runs its tests with the panic_on_warn feature enabled.

The former 'txtimer' function is split into 'txfrtimer' and 'txtimer'
to handle the two functions above with separate timer callbacks.

The two simplified timers now run in one-shot mode and make the state
transitions (especially with isotp_rcv_echo) easier to understand.

Fixes: 866337865f37 ("can: isotp: fix tx state handling for echo tx processing")
Reported-by: syzbot+5aed6c3aaba661f5b917@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org # >= v6.0
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230104145701.2422-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: isotp: handle wait_event_interruptible() return values

When wait_event_interruptible() has been interrupted by a signal, the
tx.state value might not be ISOTP_IDLE. Force the state machines
into the idle state to stop the timer handlers from continuing to work.
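The recovery step can be sketched as follows, assuming hypothetical `sketch_*` types in place of the real isotp socket; `err` stands in for the value returned by wait_event_interruptible(), which is non-zero when a signal interrupted the wait.

```c
#include <errno.h>

/* Hypothetical subset of isotp tx/rx states, not the real enum. */
enum sketch_isotp_state {
	SKETCH_ISOTP_IDLE,
	SKETCH_ISOTP_WAIT_FC,
	SKETCH_ISOTP_SENDING,
};

struct sketch_isotp_sock {
	enum sketch_isotp_state tx_state;
	enum sketch_isotp_state rx_state;
};

/*
 * 'err' models the result of wait_event_interruptible(): 0 when the
 * awaited condition became true, negative when a signal interrupted it.
 */
static int sketch_handle_wait_result(struct sketch_isotp_sock *so, int err)
{
	if (err) {
		/* Interrupted: force both state machines back to idle so a
		 * pending timer handler finds no transfer left to drive. */
		so->tx_state = SKETCH_ISOTP_IDLE;
		so->rx_state = SKETCH_ISOTP_IDLE;
		return err;
	}
	return 0;
}
```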

Fixes: 866337865f37 ("can: isotp: fix tx state handling for echo tx processing")
Cc: stable@vger.kernel.org
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230112192347.1944-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: raw: fix CAN FD frame transmissions over CAN XL devices

A CAN XL device is always capable of processing CAN FD frames. The former
check when sending CAN FD frames relied on the existence of a CAN FD
device and did not accept a CAN XL device, which would be correct
too.

With this patch the CAN FD feature is enabled automatically when CAN
XL is switched on - and CAN FD cannot be switched off while CAN XL is
enabled.

This precondition also leads to a cleanup and reduction of checks in
the hot path in raw_rcv() and raw_sendmsg(). Some conditions are
reordered to handle simple checks first.
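A minimal sketch of the FD/XL coupling described above, modelled as two hypothetical setter functions on a simplified socket struct (not the real raw socket code). Returning -EINVAL for an attempt to disable CAN FD while CAN XL is on is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-socket frame-type flags. */
struct sketch_raw_sock {
	bool fd_frames;
	bool xl_frames;
};

static int sketch_set_xl_frames(struct sketch_raw_sock *ro, bool on)
{
	ro->xl_frames = on;
	if (on)
		ro->fd_frames = true; /* enabling CAN XL implies CAN FD */
	return 0;
}

static int sketch_set_fd_frames(struct sketch_raw_sock *ro, bool on)
{
	/* CAN FD cannot be switched off while CAN XL is enabled. */
	if (ro->xl_frames && !on)
		return -EINVAL;
	ro->fd_frames = on;
	return 0;
}
```

With this invariant, a hot-path check for "may send CAN FD" only needs to test `fd_frames`, since `xl_frames` can never be set without it.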

changes since v1: https://lore.kernel.org/all/20230131091012.50553-1-socketcan@hartkopp.net
- fixed typo: devive -> device
changes since v2: https://lore.kernel.org/all/20230131091824.51026-1-socketcan@hartkopp.net/
- reorder checks in if statements to handle simple checks first

Fixes: 626332696d75 ("can: raw: add CAN XL support")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230131105613.55228-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate

The conclusion "j1939_session_deactivate() should be called with a
session ref-count of at least 2" is incorrect. In some concurrent
scenarios, j1939_session_deactivate can be called with the session
ref-count less than 2. This is not a problem, because
j1939_session_deactivate_locked() checks the session's active state
before putting the session.
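A single-threaded sketch of why a reference count below 2 is harmless here, assuming a hypothetical session struct with a plain integer refcount and an `active` flag in place of the kref and the active-session list: the reference is only dropped by the call that observes the session as still active.

```c
#include <stdbool.h>

/* Hypothetical, lock-free model of a session for illustration. */
struct sketch_session {
	int refcount;
	bool active;
};

static void sketch_session_put(struct sketch_session *s)
{
	s->refcount--;
}

/*
 * Drop the "active" reference only if the session is still active.
 * A second deactivation sees active == false and puts nothing, so it
 * is safe even when the caller holds fewer than two references.
 */
static bool sketch_session_deactivate(struct sketch_session *s)
{
	if (!s->active)
		return false;
	s->active = false;
	sketch_session_put(s);
	return true;
}
```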

Here is the concurrent scenario of the problem reported by syzbot
and my reproduction log.

        cpu0                            cpu1
                                j1939_xtp_rx_eoma
j1939_xtp_rx_abort_one
                                j1939_session_get_by_addr [kref == 2]
j1939_session_get_by_addr [kref == 3]
j1939_session_deactivate [kref == 2]
j1939_session_put [kref == 1]
j1939_session_completed
j1939_session_deactivate
WARN_ON_ONCE(kref < 2)

=====================================================
WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70
CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
RIP: 0010:j1939_session_deactivate+0x5f/0x70
Call Trace:
j1939_session_deactivate_activate_next+0x11/0x28
j1939_xtp_rx_eoma+0x12a/0x180
j1939_tp_recv+0x4a2/0x510
j1939_can_recv+0x226/0x380
can_rcv_filter+0xf8/0x220
can_receive+0x102/0x220
? process_backlog+0xf0/0x2c0
can_rcv+0x53/0xf0
__netif_receive_skb_one_core+0x67/0x90
? process_backlog+0x97/0x2c0
__netif_receive_skb+0x22/0x80

Fixes: 0c71437dd50d ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object")
Reported-by: syzbot+9981a614060dcee6eeca@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20210906094200.95868-1-william.xuanziyang@huawei.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap()

netvsc_dma_map() and netvsc_dma_unmap() currently check the cp_partial
flag and adjust the page_count so that pagebuf entries for the RNDIS
portion of the message are skipped when it has already been copied into
a send buffer. But this adjustment has already been made by code in
netvsc_send(). The duplicate adjustment causes some pagebuf entries to
not be mapped. In a normal VM, this doesn't break anything because the
mapping doesn’t change the PFN. But in a Confidential VM,
dma_map_single() does bounce buffering and provides a different PFN.
Failing to do the mapping causes the wrong PFN to be passed to Hyper-V,
and various errors ensue.

Fix this by removing the duplicate adjustment in netvsc_dma_map() and
netvsc_dma_unmap().

Fixes: 846da38de0e8 ("net: netvsc: Add Isolation VM support for netvsc driver")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1675135986-254490-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-af: Removed unnecessary debug messages.

The NPC exact match feature is supported on only one silicon
variant. Remove the debug messages that report this feature
as unavailable on all other silicon variants.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230201040301.1034843-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio-net: fix possible unsigned integer overflow

When the single-buffer xdp is loaded and after xdp_linearize_page()
is called, *num_buf becomes 0 and (*num_buf - 1) may overflow into
a large integer in virtnet_build_xdp_buff_mrg(), resulting in
unexpected packet dropping.
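The class of bug can be shown in isolation. In this sketch the limit `SKETCH_MAX_FRAGS` and both functions are hypothetical, not the driver's code; the cast makes explicit the int-to-unsigned conversion that happens when a signed value is compared with an unsigned one.

```c
#include <stdbool.h>

#define SKETCH_MAX_FRAGS 17u /* hypothetical unsigned fragment limit */

/*
 * Buggy form: for num_buf == 0, num_buf - 1 is -1, which converts to
 * UINT_MAX, so the packet looks oversized and would be dropped.
 */
static bool sketch_too_many_bufs_buggy(int num_buf)
{
	return (unsigned int)(num_buf - 1) > SKETCH_MAX_FRAGS;
}

/* Fixed form: only subtract once num_buf is known to be at least 2. */
static bool sketch_too_many_bufs_fixed(int num_buf)
{
	return num_buf > 1 && (unsigned int)(num_buf - 1) > SKETCH_MAX_FRAGS;
}
```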

Fixes: ef75cb51f139 ("virtio-net: build xdp_buff with multi buffers")
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20230131085004.98687-1-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: Fix devlink unregister

The exact match feature is only available on CN10K-B.
Unregister the exact match devlink entry only for
this silicon variant.

Fixes: 87e4ea29b030 ("octeontx2-af: Debugsfs support for exact match.")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230131061659.1025137-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igc: return an error if the mac type is unknown in igc_ptp_systim_to_hwtstamp()

clang static analysis reports
drivers/net/ethernet/intel/igc/igc_ptp.c:673:3: warning: The left operand of
  '+' is a garbage value [core.UndefinedBinaryOperatorResult]
   ktime_add_ns(shhwtstamps.hwtstamp, adjust);
   ^            ~~~~~~~~~~~~~~~~~~~~

igc_ptp_systim_to_hwtstamp() silently returns without setting the hwtstamp
if the mac type is unknown.  This should be treated as an error.
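A sketch of the error-propagation pattern, using a hypothetical MAC-type enum and a placeholder conversion rather than the driver's real SYSTIM arithmetic: the converter returns -EINVAL for an unknown type, and the caller only applies the adjustment when the conversion succeeded, so it never adds to an uninitialised value.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical MAC types; the real driver has its own enum. */
enum sketch_mac_type { SKETCH_MAC_I225, SKETCH_MAC_UNKNOWN };

/* Convert a raw SYSTIM value; fail instead of leaving *ns unset. */
static int sketch_systim_to_ns(enum sketch_mac_type type, uint64_t systim,
			       uint64_t *ns)
{
	switch (type) {
	case SKETCH_MAC_I225:
		*ns = systim; /* placeholder conversion for the sketch */
		return 0;
	default:
		return -EINVAL;
	}
}

/* Caller: apply the latency adjustment only on success. */
static int sketch_rx_hwtstamp(enum sketch_mac_type type, uint64_t systim,
			      uint64_t adjust, uint64_t *out)
{
	uint64_t ns;
	int err = sketch_systim_to_ns(type, systim, &ns);

	if (err)
		return err;
	*out = ns + adjust;
	return 0;
}
```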

Fixes: 81b055205e8b ("igc: Add support for RX timestamping")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230131215437.1528994-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: provide an ability to set default extack message

In a common netdev pattern, the extack pointer is forwarded to the drivers
to be filled with an error message. However, the caller can easily
overwrite the filled message.

Instead of adding multiple "if (!extack->_msg)" checks before every
NL_SET_ERR_MSG() call that follows a call to the driver, let's
add a new macro to common code.
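The behaviour of such a default-message macro can be sketched in userspace. The macro names, the `struct sketch_extack` stand-in for struct netlink_ext_ack, and the driver/core functions are all hypothetical; the point is that the core only fills the message when the driver has not already done so.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_ext_ack. */
struct sketch_extack {
	const char *msg;
};

/* Unconditional set: overwrites whatever is already there. */
#define SKETCH_SET_ERR_MSG(extack, text) ((extack)->msg = (text))

/* Default set: only fills the message if none has been set yet. */
#define SKETCH_SET_ERR_MSG_DEFAULT(extack, text)	\
	do {						\
		if (!(extack)->msg)			\
			(extack)->msg = (text);		\
	} while (0)

static const char sketch_drv_msg[] = "driver: unsupported configuration";
static const char sketch_core_msg[] = "core: operation failed";

/* Hypothetical driver op that fails, optionally reporting why. */
static int sketch_driver_op(struct sketch_extack *extack, int report)
{
	if (report)
		SKETCH_SET_ERR_MSG(extack, sketch_drv_msg);
	return -EOPNOTSUPP;
}

/* Core code: add a generic message without clobbering the driver's. */
static int sketch_core_op(struct sketch_extack *extack, int driver_reports)
{
	int err = sketch_driver_op(extack, driver_reports);

	if (err)
		SKETCH_SET_ERR_MSG_DEFAULT(extack, sketch_core_msg);
	return err;
}
```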

[1] https://lore.kernel.org/all/Y9Irgrgf3uxOjwUm@unreal
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/6993fac557a40a1973dfa0095107c3d03d40bec1.1675171790.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

neighbor: fix proxy_delay usage when it is zero

When set to zero, the neighbor sysctl proxy_delay value
does not cause an immediate reply for ARP/ND requests
as expected; instead, it causes a random delay in
[0, U32_MAX). This comment from
__get_random_u32_below() explains the reason:

/*
* This function is technically undefined for ceil == 0, and in fact
* for the non-underscored constant version in the header, we build bug
* on that. But for the non-constant case, it's convenient to have that
* evaluate to being a straight call to get_random_u32(), so that
* get_random_u32_inclusive() can work over its whole range without
* undefined behavior.
*/

Add a helper function that does not call get_random_u32_below()
if proxy_delay is zero and just uses the current value of
jiffies instead, causing pneigh_enqueue() to respond
immediately.

Also add the missing definition of proxy_delay to
ip-sysctl.txt.
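A sketch of the helper's logic, assuming hypothetical names and using rand() as a stand-in for get_random_u32_below(). Skipping the random call entirely when proxy_delay is zero is what makes the reply immediate; in this stand-in a zero ceiling would even be a division by zero.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for get_random_u32_below(); modulo bias is ignored here. */
static uint32_t sketch_random_below(uint32_t ceil)
{
	return (uint32_t)rand() % ceil;
}

/*
 * Hypothetical helper: when should a proxied ARP/ND request be
 * answered? 'now' plays the role of jiffies.
 */
static unsigned long sketch_proxy_expire(unsigned long now,
					 uint32_t proxy_delay)
{
	if (!proxy_delay)
		return now; /* never ask for a random value below 0 */
	return now + sketch_random_below(proxy_delay);
}
```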

Signed-off-by: Brian Haley <haleyb.dev@gmail.com>
Link: https://lore.kernel.org/r/20230130171428.367111-1-haleyb.dev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-support-ipv4-big-tcp'

Xin Long says:

====================
net: support ipv4 big tcp

This is similar to the BIG TCP patchset added by Eric for IPv6:

  https://lwn.net/Articles/895398/

Unlike IPv6, IPv4 tot_len is only 16 bits long, and the IPv4 header
has no extension header (option) to carry a BIG TCP packet's length. To
keep it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to
indicate this might be a BIG TCP packet and use skb->len as the real
IPv4 total length.

This will work safely, as all BIG TCP packets are GSO/GRO packets and
are processed on the same host where they were created. There is no
padding in GSO/GRO packets, and skb->len - network_offset is exactly the
IPv4 packet total length. Also, before implementing the feature, all the
places that may get iph tot_len from BIG TCP packets are taken care of
with some new APIs:

Patch 1 adds some APIs for iph tot_len setting and getting, which are
used in Patches 2-7 in all the places that IPv4 BIG TCP packets may reach.
Patch 8 adds a GSO_TCP tp_status for af_packet users, Patch 9 adds new
netlink attributes to make IPv4 BIG TCP configurable independently of
IPv6 BIG TCP, and Patch 10 implements this feature.

Note that changes similar to those in Patches 2-6 are also needed for IPv6
BIG TCP packets, and will be addressed in another patchset.

A similar performance test was done for IPv4 BIG TCP with a 25Gbit NIC
and 1.5K MTU:

No BIG TCP:
for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
168          322          337          3776.49
143          236          277          4654.67
128          258          288          4772.83
171          229          278          4645.77
175          228          243          4678.93
149          239          279          4599.86
164          234          268          4606.94
155          276          289          4235.82
180          255          268          4418.95
168          241          249          4417.82

Enable BIG TCP:
ip link set dev ens1f0np0 gro_ipv4_max_size 128000 gso_ipv4_max_size 128000
for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
161          241          252          4821.73
174          205          217          5098.28
167          208          220          5001.43
164          228          249          4883.98
150          233          249          4914.90
180          233          244          4819.66
154          208          219          5004.92
157          209          247          4999.78
160          218          246          4842.31
174          206          217          5080.99

Thanks for the feedback from Eric and David Ahern.
====================

Link: https://lore.kernel.org/r/cover.1674921359.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add support for ipv4 big tcp

Similar to Eric's IPv6 BIG TCP, this patch is to enable IPv4 BIG TCP.

Firstly, allow sk->sk_gso_max_size to be set to a value greater than
GSO_LEGACY_MAX_SIZE by not trimming gso_max_size in sk_trim_gso_size()
for IPv4 TCP sockets.

Then on the TX path, set the IP header tot_len to 0 when skb->len > IP_MAX_MTU
in __ip_local_out() to allow BIG TCP packets to be sent; this implies
that skb->len is the length of the IPv4 packet. On the RX path, use skb->len
as the length of the IPv4 packet when the IP header tot_len is 0 and
skb->len > IP_MAX_MTU in ip_rcv_core(). As the APIs iph_set_totlen() and
skb_ip_totlen() are used in __ip_local_out() and ip_rcv_core(), we only
need to update these APIs.

Also, in GRO receive, add the check for ETH_P_IP/IPPROTO_TCP, and allow
the merged packet size to be >= GRO_LEGACY_MAX_SIZE in skb_gro_receive().
In GRO complete, set the IP header tot_len to 0 in iph_set_totlen() when
the merged packet size is greater than IP_MAX_MTU, so that it can be
processed on the RX path.

Note that checking skb_is_gso_tcp() in the API iph_totlen() makes it
safe for this implementation to use iph->tot_len == 0 to indicate IPv4
BIG TCP packets.
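The tot_len encoding described above can be modelled in userspace as below. This is a sketch, not the kernel's iph_set_totlen()/iph_totlen(): `struct sketch_iphdr` and both helpers are hypothetical stand-ins, the `is_gso_tcp` flag replaces the skb_is_gso_tcp() check, and `buf_len` replaces skb->len minus the network offset.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IP_MAX_MTU 0xFFFFu /* largest value a 16-bit tot_len holds */

/* Minimal stand-in for the one iphdr field used here. */
struct sketch_iphdr {
	uint16_t tot_len; /* network byte order */
};

/* TX / GRO complete: lengths that do not fit in 16 bits become 0. */
static void sketch_iph_set_totlen(struct sketch_iphdr *iph, uint32_t len)
{
	iph->tot_len = len <= SKETCH_IP_MAX_MTU ? htons((uint16_t)len) : 0;
}

/*
 * RX: tot_len == 0 is trusted as "BIG TCP" only for TCP GSO packets,
 * in which case the length comes from the buffer itself.
 */
static uint32_t sketch_iph_totlen(const struct sketch_iphdr *iph,
				  bool is_gso_tcp, uint32_t buf_len)
{
	uint32_t len = ntohs(iph->tot_len);

	if (len == 0 && is_gso_tcp)
		return buf_len;
	return len;
}
```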

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add gso_ipv4_max_size and gro_ipv4_max_size per device

This patch introduces gso_ipv4_max_size and gro_ipv4_max_size
per device and adds netlink attributes for them, so that IPv4
BIG TCP can be guarded by a separate tunable in the next patch.

To avoid breaking old applications that use "gso/gro_max_size" for
IPv4 GSO packets, this patch also updates "gso/gro_ipv4_max_size"
in netif_set_gso/gro_max_size() if the new size isn't greater
than GSO_LEGACY_MAX_SIZE, so that nothing changes even if
userspace isn't aware of the new netlink attributes.
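The compatibility rule can be sketched as below; `struct sketch_netdev` and both setters are hypothetical, and 65536 is assumed to match GSO_LEGACY_MAX_SIZE. The GRO side would follow the same pattern.

```c
#include <stdint.h>

#define SKETCH_GSO_LEGACY_MAX_SIZE 65536u /* assumed legacy 64KB limit */

/* Hypothetical subset of per-device GSO limits. */
struct sketch_netdev {
	uint32_t gso_max_size;
	uint32_t gso_ipv4_max_size;
};

/*
 * Old-style setter: keep the IPv4 limit in sync while the new size
 * stays within the legacy limit, so userspace that only knows the
 * generic attribute sees no behaviour change.
 */
static void sketch_set_gso_max_size(struct sketch_netdev *dev, uint32_t size)
{
	dev->gso_max_size = size;
	if (size <= SKETCH_GSO_LEGACY_MAX_SIZE)
		dev->gso_ipv4_max_size = size;
}

/* New setter: only the IPv4 limit, which may exceed the legacy limit. */
static void sketch_set_gso_ipv4_max_size(struct sketch_netdev *dev,
					 uint32_t size)
{
	dev->gso_ipv4_max_size = size;
}
```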

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

packet: add TP_STATUS_GSO_TCP for tp_status

Introduce the TP_STATUS_GSO_TCP tp_status flag to tell the af_packet user
that this is a TCP GSO packet. When parsing IPv4 BIG TCP packets in
tcpdump/libpcap, it can use tp_len as the IPv4 packet length when this
flag is set, as iph tot_len is set to 0 for IPv4 BIG TCP packets.
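From the capture side, the rule might look like the sketch below. The function name is hypothetical, the flag is passed in as a bool rather than tested against tp_status, and it assumes tp_len covers exactly the IPv4 packet (for example on a cooked SOCK_DGRAM ring); with a link-layer header in front, that header's length would presumably need subtracting.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the IPv4 packet length a capture tool should use. 'pkt' points
 * at the IPv4 header, 'tp_len' is the frame's packet length and
 * 'gso_tcp' says whether TP_STATUS_GSO_TCP was set in tp_status.
 */
static uint32_t sketch_capture_ipv4_len(const uint8_t *pkt, uint32_t tp_len,
					bool gso_tcp)
{
	/* tot_len is bytes 2..3 of the IPv4 header, big-endian. */
	uint32_t tot_len = ((uint32_t)pkt[2] << 8) | pkt[3];

	if (tot_len == 0 && gso_tcp)
		return tp_len; /* BIG TCP: the header cannot carry the length */
	return tot_len;
}
```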

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipvlan: use skb_ip_totlen in ipvlan_get_L3_hdr

ipvlan devices call netif_inherit_tso_max() to get the tso_max_size/segs
from the lower device, so when the lower device supports BIG TCP, the ipvlan
devices support it too. We should therefore also handle their iph tot_len
accesses.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>