review.tizen.org Git - platform/kernel/linux-starfive.git/log

Merge branch 'mptcp-prepare-mptcp-packet-scheduler-for-bpf-extension'

Mat Martineau says:

====================
mptcp: Prepare MPTCP packet scheduler for BPF extension

The kernel's MPTCP packet scheduler has, to date, been a one-size-fits
all algorithm that is hard-coded. It attempts to balance latency and
throughput when transmitting data across multiple TCP subflows, and has
some limited tunability through sysctls. It has been a long-term goal of
the Linux MPTCP community to support customizable packet schedulers for
use cases that need to make different trade-offs regarding latency,
throughput, redundancy, and other metrics. BPF is well-suited for
configuring customized, per-packet scheduling decisions without having
to modify the kernel or manage out-of-tree kernel modules.

The first steps toward implementing BPF packet schedulers are to update
the existing MPTCP transmit loops to allow more flexible scheduling
decisions, and to add infrastructure for swappable packet schedulers.
The existing scheduling algorithm remains the default. BPF-related
changes will be in a future patch series.

This code has been in the MPTCP development tree for quite a while,
undergoing testing in our CI and community.

Patches 1 and 2 refactor the transmit code and do some related cleanup.

Patches 3-9 add infrastructure for registering and calling multiple
schedulers.

Patch 10 connects the in-kernel default scheduler to the new
infrastructure.
====================

Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-0-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: register default scheduler

This patch defines the default packet scheduler mptcp_sched_default.
Register it in mptcp_sched_init(), which is invoked in mptcp_proto_init().
Skip deleting this default scheduler in mptcp_unregister_scheduler().

Set msk->sched to the default scheduler when the input parameter of
mptcp_init_sched() is NULL.

Invoke mptcp_sched_default_get_subflow in get_send() and get_retrans()
if the defaut scheduler is set or msk->sched is NULL.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-10-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: use get_retrans wrapper

This patch adds the multiple subflows support for __mptcp_retrans(). Use
get_retrans() wrapper instead of mptcp_subflow_get_retrans() in it.

Check the subflow scheduled flags to test which subflow or subflows are
picked by the scheduler, use them to send data.

Move msk_owned_by_me() and fallback checks into get_retrans() wrapper
from mptcp_subflow_get_retrans().

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-9-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: use get_send wrapper

This patch adds the multiple subflows support for __mptcp_push_pending
and __mptcp_subflow_push_pending. Use get_send() wrapper instead of
mptcp_subflow_get_send() in them.

Check the subflow scheduled flags to test which subflow or subflows are
picked by the scheduler, use them to send data.

Move msk_owned_by_me() and fallback checks into get_send() wrapper from
mptcp_subflow_get_send().

This commit allows the scheduler to set the subflow->scheduled bit in
multiple subflows, but it does not allow for sending redundant data.
Multiple scheduled subflows will send sequential data on each subflow.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-8-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add scheduler wrappers

This patch defines two packet scheduler wrappers mptcp_sched_get_send()
and mptcp_sched_get_retrans(), invoke get_subflow() of msk->sched in
them.

Set data->reinject to true in mptcp_sched_get_retrans(), set it false in
mptcp_sched_get_send().

If msk->sched is NULL, use default functions mptcp_subflow_get_send()
and mptcp_subflow_get_retrans() to send data.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-7-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add scheduled in mptcp_subflow_context

This patch adds a new member scheduled in struct mptcp_subflow_context,
which will be set in the MPTCP scheduler context when the scheduler
picks this subflow to send data.

Add a new helper mptcp_subflow_set_scheduled() to set this flag using
WRITE_ONCE().

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-6-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add sched in mptcp_sock

This patch adds a new struct member sched in struct mptcp_sock.
And two helpers mptcp_init_sched() and mptcp_release_sched() to
init and release it.

Init it with the sysctl scheduler in mptcp_init_sock(), copy the
scheduler from the parent in mptcp_sk_clone(), and release it in
__mptcp_destroy_sock().

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-5-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add a new sysctl scheduler

This patch adds a new sysctl, named scheduler, to support for selection
of different schedulers. Export mptcp_get_scheduler helper to get this
sysctl.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-4-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: add struct mptcp_sched_ops

This patch defines struct mptcp_sched_ops, which has three struct members,
name, owner and list, and four function pointers: init(), release() and
get_subflow().

The scheduler function get_subflow() have a struct mptcp_sched_data
parameter, which contains a reinject flag for retrans or not, a subflows
number and a mptcp_subflow_context array.

Add the scheduler registering, unregistering and finding functions to add,
delete and find a packet scheduler on the global list mptcp_sched_list.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-3-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: drop last_snd and MPTCP_RESET_SCHEDULER

Since the burst check conditions have moved out of the function
mptcp_subflow_get_send(), it makes all msk->last_snd useless.
This patch drops them as well as the macro MPTCP_RESET_SCHEDULER.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-2-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: refactor push_pending logic

To support redundant package schedulers more easily, this patch refactors
__mptcp_push_pending() logic from:

For each dfrag:
While sends succeed:
Call the scheduler (selects subflow and msk->snd_burst)
Update subflow locks (push/release/acquire as needed)
Send the dfrag data with mptcp_sendmsg_frag()
Update already_sent, snd_nxt, snd_burst
Update msk->first_pending
Push/release on final subflow

->

While first_pending isn't empty:
Call the scheduler (selects subflow and msk->snd_burst)
Update subflow locks (push/release/acquire as needed)
For each pending dfrag:
While sends succeed:
Send the dfrag data with mptcp_sendmsg_frag()
Update already_sent, snd_nxt, snd_burst
Update msk->first_pending
Break if required by msk->snd_burst / etc
Push/release on final subflow

Refactors __mptcp_subflow_push_pending logic from:

For each dfrag:
While sends succeed:
Call the scheduler (selects subflow and msk->snd_burst)
Send the dfrag data with mptcp_subflow_delegate(), break
Send the dfrag data with mptcp_sendmsg_frag()
Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst
Update msk->first_pending

->

While first_pending isn't empty:
Call the scheduler (selects subflow and msk->snd_burst)
Send the dfrag data with mptcp_subflow_delegate(), break
Send the dfrag data with mptcp_sendmsg_frag()
For each pending dfrag:
While sends succeed:
Send the dfrag data with mptcp_sendmsg_frag()
Update already_sent, snd_nxt, snd_burst
Update msk->first_pending
Break if required by msk->snd_burst / etc

Move the duplicate code from __mptcp_push_pending() and
__mptcp_subflow_push_pending() into a new helper function, named
__subflow_push_pending(). Simplify __mptcp_push_pending() and
__mptcp_subflow_push_pending() by invoking this helper.

Also move the burst check conditions out of the function
mptcp_subflow_get_send(), check them in __subflow_push_pending() in
the inner "for each pending dfrag" loop.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-1-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2023-08-16' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-08-16

1) aRFS ethtool stats

Improve aRFS observability by adding new set of counters. Each Rx
ring will have this set of counters listed below.
These counters are exposed through ethtool -S.

1.1) arfs_add: number of times a new rule has been created.
1.2) arfs_request_in: number of times a rule  was requested to move from
   its current Rx ring to a new Rx ring (incremented on the destination
   Rx ring).
1.3) arfs_request_out: number of times a rule  was requested to move out
   from its current Rx ring (incremented on source/current Rx ring).
1.4) arfs_expired: number of times a rule has been expired by the
   kernel and removed from HW.
1.5) arfs_err: number of times a rule creation or modification has
   failed.

2) Supporting inline WQE when possible in SW steering

3) Misc cleanups and fixups to net-next branch

* tag 'mlx5-updates-2023-08-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Devcom, only use devcom after NULL check in mlx5_devcom_send_event()
  net/mlx5: DR, Supporting inline WQE when possible
  net/mlx5: Rename devlink port ops struct for PFs/VFs
  net/mlx5: Remove VPORT_UPLINK handling from devlink_port.c
  net/mlx5: Call mlx5_esw_offloads_rep_load/unload() for uplink port directly
  net/mlx5: Update dead links in Kconfig documentation
  net/mlx5: Remove health syndrome enum duplication
  net/mlx5: DR, Remove unneeded local variable
  net/mlx5: DR, Fix code indentation
  net/mlx5: IRQ, consolidate irq and affinity mask allocation
  net/mlx5e: Fix spelling mistake "Faided" -> "Failed"
  net/mlx5e: aRFS, Introduce ethtool stats
  net/mlx5e: aRFS, Warn if aRFS table does not exist for aRFS rule
  net/mlx5e: aRFS, Prevent repeated kernel rule migrations requests
====================

Link: https://lore.kernel.org/r/20230821175739.81188-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vrf: Remove unnecessary RCU-bh critical section

dev_queue_xmit_nit() already uses rcu_read_lock() / rcu_read_unlock()
and nothing suggests that softIRQs should be disabled around it.
Therefore, remove the rcu_read_lock_bh() / rcu_read_unlock_bh()
surrounding it.

Tested using [1] with lockdep enabled.

[1]
#!/bin/bash

ip link add name vrf1 up type vrf table 100
ip link add name veth0 type veth peer name veth1
ip link set dev veth1 master vrf1
ip link set dev veth0 up
ip link set dev veth1 up
ip address add 192.0.2.1/24 dev veth0
ip address add 192.0.2.2/24 dev veth1
ip rule add pref 32765 table local
ip rule del pref 0
tcpdump -i vrf1 -c 20 -w /dev/null &
sleep 10
ping -i 0.1 -c 10 -q 192.0.2.2

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230821142339.1889961-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vxlan: vnifilter: Use GFP_KERNEL instead of GFP_ATOMIC

The function is not called from an atomic context so use GFP_KERNEL
instead of GFP_ATOMIC. The allocation of the per-CPU stats is already
performed with GFP_KERNEL.

Tested using test_vxlan_vnifiltering.sh with CONFIG_DEBUG_ATOMIC_SLEEP.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230821141923.1889776-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: Remove unused declarations

Commit e8609e69470f ("net: ethernet: ti: am65-cpsw: Convert to PHYLINK")
removed am65_cpsw_nuss_adjust_link() but not its declaration.
Commit 84640e27f230 ("net: netcp: Add Keystone NetCP core ethernet driver")
declared but never implemented netcp_device_find_module().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230821134029.40084-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: microchip: Remove unused declarations

Commit 264a9c5c9dff ("net: sparx5: Remove unused GLAG handling in PGID")
removed sparx5_pgid_alloc_glag() but not its declaration.
Commit 27d293cceee5 ("net: microchip: sparx5: Add support for rule count by cookie")
removed vcap_rule_iter() but not its declaration.
Commit 8beef08f4618 ("net: microchip: sparx5: Adding initial VCAP API support")
declared but never implemented vcap_api_set_client().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821135556.43224-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: Remove unused declarations

Commit fbfb8031533c ("ionic: Add hardware init and device commands")
declared but never implemented ionic_q_rewind()/ionic_set_dma_mask().
Commit 969f84394604 ("ionic: sync the filters in the work task")
declared but never implemented ionic_rx_filters_need_sync().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20230821134717.51936-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: Remove unused declarations

Commit 6c30384eb1de ("net: mscc: ocelot: register devlink ports")
declared but never implemented ocelot_devlink_init() and
ocelot_devlink_teardown().
Commit 2096805497e2 ("net: mscc: ocelot: automatically detect VCAP constants")
declared but never implemented ocelot_detect_vcap_constants().
Commit 403ffc2c34de ("net: mscc: ocelot: add support for preemptible traffic classes")
declared but never implemented ocelot_port_update_preemptible_tcs().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230821130218.19096-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: Remove unused declarations

Commit 91a98917a883 ("net: dsa: microchip: move switch chip_id detection to ksz_common")
removed ksz8_switch_detect() but not its declaration.
Commit 6ec23aaaac43 ("net: dsa: microchip: move ksz_dev_ops to ksz_common.c")
declared but never implemented other functions.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20230821125501.19624-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: update port speed when getting bond speed

Andrew reported a bonding issue that if we put an active-back bond on top
of a 802.3ad bond interface. When the 802.3ad bond's speed/duplex changed
dynamically. The upper bonding interface's speed/duplex can't be changed at
the same time, which will show incorrect speed.

Fix it by updating the port speed when calling ethtool.

Reported-by: Andrew Schorr <ajschorr@alumni.princeton.edu>
Closes: https://lore.kernel.org/netdev/ZEt3hvyREPVdbesO@Laptop-X1/
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20230821101008.797482-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: remove unnecessary input parameter 'how' in ifdown function

When the ifdown function in the dst_ops structure is referenced, the input
parameter 'how' is always true. In the current implementation of the
ifdown interface, ip6_dst_ifdown does not use the input parameter 'how',
xfrm6_dst_ifdown and xfrm4_dst_ifdown functions use the input parameter
'unregister'. But false judgment on 'unregister' in xfrm6_dst_ifdown and
xfrm4_dst_ifdown is false, so remove the input parameter 'how' in ifdown
function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821084104.3812233-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

alx: fix OOB-read compiler warning

The following message shows up when compiling with W=1:

In function ‘fortify_memcpy_chk’,
    inlined from ‘alx_get_ethtool_stats’ at drivers/net/ethernet/atheros/alx/ethtool.c:297:2:
./include/linux/fortify-string.h:592:4: error: call to ‘__read_overflow2_field’
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Werror=attribute-warning]
  592 |    __read_overflow2_field(q_size_field, size);
      |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to get alx stats altogether, alx_get_ethtool_stats() reads
beyond hw->stats.rx_ok. Fix this warning by directly copying hw->stats,
and refactor the unnecessarily complicated BUILD_BUG_ON btw.

Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821013218.1614265-1-gongruiqi@huaweicloud.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: pcs: lynxi: implement pcs_disable op

When switching from 10GBase-R/5GBase-R/USXGMII to one of the interface
modes provided by mtk-pcs-lynxi we need to make sure to always perform
a full configuration of the PHYA.

Implement pcs_disable op which resets the stored interface mode to
PHY_INTERFACE_MODE_NA to trigger a full reconfiguration once the LynxI
PCS driver had previously been deselected in favor of another PCS
driver such as the to-be-added driver for the USXGMII PCS found in
MT7988.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/f23d1a60d2c9d2fb72e32dcb0eaa5f7e867a3d68.1692327891.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "pds_core: Fix some kernel-doc comments"

This reverts commit cb39c35783f26892bb1a72b1115c94fa2e77f4c5.
Patch was applied to hastily, the problem is already fixed
in Alex's vfio tree:
https://lore.kernel.org/all/20230821112237.105872b5.alex.williamson@redhat.com/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Devcom, only use devcom after NULL check in mlx5_devcom_send_event()

There is a warning reported by kernel test robot:

smatch warnings:
drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c:264
mlx5_devcom_send_event() warn: variable dereferenced before
IS_ERR check devcom (see line 259)

The reason for the warning is that the pointer is used before check, put
the assignment to comp after devcom check to silence the warning.

Fixes: 88d162b47981 ("net/mlx5: Devcom, Infrastructure changes")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202308041028.AkXYDwJ6-lkp@intel.com/
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Supporting inline WQE when possible

In WQE (Work Queue Entry), the two types of data segments memories are
pointers and inline data, where inline data is passed directly as
part of the WQE.
For software steering, the maximal inline size should be less than
2*MLX5_SEND_WQE_BB, i.e., the potential data must fit with the required
inline WQE headers.

Two consecutive blocks (MLX5_SEND_WQE_BB) are not guaranteed to reside
on the same memory page. Hence, writes to MLX5_SEND_WQE_BB should be
done separately, i.e., each MLX5_SEND_WQE_BB should be obtained using
the mlx5_wq_cyc_get_wqe macro.

Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Rename devlink port ops struct for PFs/VFs

As this struct is only used for devlink ports created for PF/VF,
add it to the name of the variable to distinguish from the SF related
ops struct.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Remove VPORT_UPLINK handling from devlink_port.c

It is not possible that the functions in devlink_port.c are called for
uplink port. Remove this leftover code.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Call mlx5_esw_offloads_rep_load/unload() for uplink port directly

For uplink port, mlx5_esw_offloads_load/unload_rep() are currently
called. There are 2 check inside, which effectively make the
functions a simple wrappers of mlx5_esw_offloads_rep_load/unload()
for uplink port. So avoid one check and indirection and call
mlx5_esw_offloads_rep_load/unload() for uplink port directly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Update dead links in Kconfig documentation

Point to NVIDIA documentation for device specific information now that the
Mellanox documentation site is deprecated. Refer to kernel documentation
sources for generic information not specific to mlx5 devices.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Remove health syndrome enum duplication

Health syndrome enum values were duplicated in mlx5_ifc and health.c,
the correct place for them is mlx5_ifc.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Remove unneeded local variable

Remove local variable that is already defined outside of
the scope of this block.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Fix code indentation

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: IRQ, consolidate irq and affinity mask allocation

Consolidate the mlx5_irq and mlx5_irq->mask allocation, to simplify
error flows and to match the dealloctation sequence @irq_release for
symmetry.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>

net/mlx5e: Fix spelling mistake "Faided" -> "Failed"

There is a spelling mistake in a warning message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: aRFS, Introduce ethtool stats

Improve aRFS observability by adding new set of counters. Each Rx
ring will have this set of counters listed below.
These counters are exposed through ethtool -S.

1) arfs_add: number of times a new rule has been created.
2) arfs_request_in: number of times a rule  was requested to move from
   its current Rx ring to a new Rx ring (incremented on the destination
   Rx ring).
3) arfs_request_out: number of times a rule  was requested to move out
   from its current Rx ring (incremented on source/current Rx ring).
4) arfs_expired: number of times a rule has been expired by the
   kernel and removed from HW.
5) arfs_err: number of times a rule creation or modification has
   failed.

This patch removes rx[i]_xsk_arfs_err counter and its documentation in
mlx5/counters.rst since aRFS activity does not occur in XSK RQ's.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>

net/mlx5e: aRFS, Warn if aRFS table does not exist for aRFS rule

aRFS tables should be allocated and exist in advance. Driver shouldn't
reach a point where it tries to add aRFS rule to table that does not
exist.

Add warning if driver encounters such situation.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: aRFS, Prevent repeated kernel rule migrations requests

aRFS rule movement requests from one Rx ring to other Rx ring arrive
from the kernel to ensure that packets are steered to the right Rx ring.
In the time interval until satisfying such a request, several more
requests might follow, for the same flow.

This patch detects and prevents repeated aRFS rules movement requests.

In mlx5e_rx_flow_steer() ndo, after finding the aRFS rule that have been
requested to move by the kernel, check if it's already requested to move
by calling work_busy(&arfs_rule->arfs_work) handler. IOW, if this
request is pending to be executed (in the work queue) or it's executing
now but hasn't finished yet, return current filter ID and don't issue a
new transition work.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

pds_core: Fix some kernel-doc comments

Fix some kernel-doc comments to silence the warnings:

drivers/net/ethernet/amd/pds_core/auxbus.c:18: warning: Function parameter or member 'pf' not described in 'pds_client_register'
drivers/net/ethernet/amd/pds_core/auxbus.c:18: warning: Excess function parameter 'pf_pdev' description in 'pds_client_register'
drivers/net/ethernet/amd/pds_core/auxbus.c:58: warning: Function parameter or member 'pf' not described in 'pds_client_unregister'
drivers/net/ethernet/amd/pds_core/auxbus.c:58: warning: Excess function parameter 'pf_pdev' description in 'pds_client_unregister'

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: annotate data-races around sk->sk_lingertime

sk_getsockopt() runs locklessly. This means sk->sk_lingertime
can be read while other threads are changing its value.

Other reads also happen without socket lock being held,
and must be annotated.

Remove preprocessor logic using BITS_PER_LONG, compilers
are smart enough to figure this by themselves.

v2: fixed a clang W=1 (-Wtautological-constant-out-of-range-compare) warning
(Jakub)

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IPv4: add extack info for IPv4 address add/delete

Add extack info for IPv4 address add/delete, which would be useful for
users to understand the problem without having to read kernel code.

No extack message for the ifa_local checking in __inet_insert_ifa() as
it has been checked in find_matching_ifa().

Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Check more MAC HW features for XGMAC Core 3.20

1. XGMAC Core does not have hash_filter definition, it uses
vlhash(VLAN Hash Filtering) instead, skip hash_filter when XGMAC.
2. Show exact size of Hash Table instead of raw register value.
3. Show full description of safety features defined by Synopsys Databook.
4. When safety feature is configured with no parity, or ECC only,
keep FSM Parity Checking disabled.

Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6-update-route-when-delete-saddr'

Hangbin Liu says:

====================
ipv6: update route when delete source address

Currently, when remove an address, the IPv6 route will not remove the
prefer source address when the address is bond to other device. Fix this
issue and add related tests as Ido and David suggested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: fib_test: add a test case for IPv6 source address delete

Add a test case for IPv6 source address delete.

As David suggested, add tests:
- Single device using src address
- Two devices with the same source address
- VRF with single device using src address
- VRF with two devices using src address

As Ido points out, in IPv6, the preferred source address is looked up in
the same VRF as the first nexthop device. This will give us similar results
to IPv4 if the route is installed in the same VRF as the nexthop device, but
not when the nexthop device is enslaved to a different VRF. So add tests:
- src address and nexthop dev in same VR
- src address and nexthop device in different VRF

The link local address delete logic is different from the global address.
It should only affect the associate device it bonds to. So add tests cases
for link local address testing.

Here is the test result:

IPv6 delete address route tests
    Single device using src address
    TEST: Prefsrc removed when src address removed on other device      [ OK ]
    Two devices with the same source address
    TEST: Prefsrc not removed when src address exist on other device    [ OK ]
    TEST: Prefsrc removed when src address removed on all devices       [ OK ]
    VRF with single device using src address
    TEST: Prefsrc removed when src address removed on other device      [ OK ]
    VRF with two devices using src address
    TEST: Prefsrc not removed when src address exist on other device    [ OK ]
    TEST: Prefsrc removed when src address removed on all devices       [ OK ]
    src address and nexthop dev in same VRF
    TEST: Prefsrc removed from VRF when source address deleted          [ OK ]
    TEST: Prefsrc in default VRF not removed                            [ OK ]
    TEST: Prefsrc not removed from VRF when source address exist        [ OK ]
    TEST: Prefsrc in default VRF removed                                [ OK ]
    src address and nexthop device in different VRF
    TEST: Prefsrc not removed from VRF when nexthop dev in diff VRF     [ OK ]
    TEST: Prefsrc not removed in default VRF                            [ OK ]
    TEST: Prefsrc removed from VRF when nexthop dev in diff VRF         [ OK ]
    TEST: Prefsrc removed in default VRF                                [ OK ]
    Table ID 0
    TEST: Prefsrc removed from default VRF when source address deleted  [ OK ]
    Link local source route
    TEST: Prefsrc not removed when delete ll addr from other dev        [ OK ]
    TEST: Prefsrc removed when delete ll addr                           [ OK ]
    TEST: Prefsrc not removed when delete ll addr from other dev        [ OK ]
    TEST: Prefsrc removed even ll addr still exist on other dev         [ OK ]

Tests passed:  19
Tests failed:   0

Suggested-by: Ido Schimmel <idosch@idosch.org>
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: do not match device when remove source route

After deleting an IPv6 address on an interface and cleaning up the
related preferred source entries, it is important to ensure that all
routes associated with the deleted address are properly cleared. The
current implementation of rt6_remove_prefsrc() only checks the preferred
source addresses bound to the current device. However, there may be
routes that are bound to other devices but still utilize the same
preferred source address.

To address this issue, it is necessary to also delete entries that are
bound to other interfaces but share the same source address with the
current device. Failure to delete these entries would leave routes that
are bound to the deleted address unclear. Here is an example reproducer
(I have omitted unrelated routes):

+ ip link add dummy1 type dummy
+ ip link add dummy2 type dummy
+ ip link set dummy1 up
+ ip link set dummy2 up
+ ip addr add 1:2:3:4::5/64 dev dummy1
+ ip route add 7:7:7:0::1 dev dummy1 src 1:2:3:4::5
+ ip route add 7:7:7:0::2 dev dummy2 src 1:2:3:4::5
+ ip -6 route show
1:2:3:4::/64 dev dummy1 proto kernel metric 256 pref medium
7:7:7::1 dev dummy1 src 1:2:3:4::5 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium
+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium

As Ido reminds, in IPv6, the preferred source address is looked up in
the same VRF as the first nexthop device, which is different with IPv4.
So, while removing the device checking, we also need to add an
ipv6_chk_addr() check to make sure the address does not exist on the other
devices of the rt nexthop device's VRF.

After fix:
+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 metric 1024 pref medium

Reported-by: Thomas Haller <thaller@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2170513
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: vrf_route_leaking: remove ipv6_ping_frag from default testing

As the initial commit 1a01727676a8 ("selftests: Add VRF route leaking
tests") said, the IPv6 MTU test fails as source address selection
picking ::1. Every time we run the selftest this one report failed.
There seems not much meaning to keep reporting a failure for 3 years
that no one plan to fix/update. Let't just skip this one first. We can
add it back when the issue fixed.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: release reference to inet6_dev pointer

addrconf_prefix_rcv returned early without releasing the inet6_dev
pointer when the PIO lifetime is less than accept_ra_min_lft.

Fixes: 5027d54a9c30 ("net: change accept_ra_min_rtr_lft to affect all RA lifetimes")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: selectively purge error queue in IP_RECVERR / IPV6_RECVERR

Setting IP_RECVERR and IPV6_RECVERR options to zero currently
purges the socket error queue, which was probably not expected
for zerocopy and tx_timestamp users.

I discovered this issue while preparing commit 6b5f43ea0815
("inet: move inet->recverr to inet->inet_flags"), I presume this
change does not need to be backported to stable kernels.

Add skb_errqueue_purge() helper to purge error messages only.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fixed_phy_register-return-value'

Ruan Jinjie says:

====================
net: Return PTR_ERR() for fixed_phy_register()

fixed_phy_register() returns not only -EIO or -ENODEV, but also
-EPROBE_DEFER, -EINVAL and -EBUSY. The Best practice is to return these
error codes with PTR_ERR().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan743x: Return PTR_ERR() for fixed_phy_register()

fixed_phy_register() returns -EPROBE_DEFER, -EINVAL and -EBUSY,
etc, in addition to -EIO. The Best practice is to return these
error codes with PTR_ERR().

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Return PTR_ERR() for fixed_phy_register()

fixed_phy_register() returns -EPROBE_DEFER, -EINVAL and -EBUSY,
etc, in addition to -ENODEV. The Best practice is to return these
error codes with PTR_ERR().

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bgmac: Return PTR_ERR() for fixed_phy_register()

fixed_phy_register() returns -EPROBE_DEFER, -EINVAL and -EBUSY,
etc, in addition to -ENODEV. The best practice is to return
these error codes with PTR_ERR().

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: realtek: add phylink_get_caps implementation

The user ports use RSGMII, but we don't have that, and DT doesn't
specify a phy interface mode, so phylib defaults to GMII. These support
1G, 100M and 10M with flow control. It is unknown whether asymetric
pause is supported at all speeds.

The CPU port uses MII/GMII/RGMII/REVMII by hardware pin strapping,
and support speeds specific to each, with full duplex only supported
in some modes. Flow control may be supported again by hardware pin
strapping, and theoretically is readable through a register but no
information is given in the datasheet for that.

So, we do a best efforts - and be lenient.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vcap_get_rule-return-value'

Ruan Jinjie says:

====================
net: Update and fix return value check for vcap_get_rule()

As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), which would be more cleaner.

So se IS_ERR() to update the return value and fix the issue
in lan966x_ptp_add_trap().

Changes in v2:
- Update vcap_get_rule() to always return an ERR_PTR().
- Update the return value fix in lan966x_ptp_add_trap().
- Update the return value check in sparx5_tc_free_rule_resources().
====================

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: sparx5: Update return value check for vcap_get_rule()

As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), so use IS_ERR() to check the return value.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan966x: Fix return value check for vcap_get_rule()

As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), so use IS_ERR() to fix the return value issue.

Fixes: 72df3489fb10 ("net: lan966x: Add ptp trap rules")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: microchip: vcap api: Always return ERR_PTR for vcap_get_rule()

As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), which would be more cleaner in this case.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: xgene: remove useless xgene_mdio_status

xgene_mdio_status is declared static, and is only written once by the
driver. It appears to have been this way since the driver was first
added to the kernel tree. No other users can be found, so let's remove
it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: ynl-gen: use temporary file for rendering

Currently any error during render leads to output an empty file.
That is quite annoying when using tools/net/ynl/ynl-regen.sh
which git greps files with content of "YNL-GEN.." and therefore ignores
empty files. So once you fail to regen, you have to checkout the file.

Avoid that by rendering to a temporary file first, only at the end
copy the content to the actual destination.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: intel: Enable correction of MAC propagation delay

All captured timestamps should be corrected by PHY, MAC and CDC introduced
latency/errors. The CDC correction is already used. Enable MAC propagation delay
correction as well which is available since commit 26cfb838aa00 ("net: stmmac:
correct MAC propagation delay").

Before:
|ptp4l[390.458]: rms 7 max 21 freq +177 +/- 14 delay 357 +/- 1

After:
|ptp4l[620.012]: rms 7 max 20 freq +195 +/- 14 delay 345 +/- 1

Tested on Intel Elkhart Lake.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Johannes Zink <j.zink@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add skb_queue_purge_reason and __skb_queue_purge_reason

skb_queue_purge() and __skb_queue_purge() become wrappers
around the new generic functions.

New SKB_DROP_REASON_QUEUE_PURGE drop reason is added,
but users can start adding more specific reasons.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'smc-features'

Guangguan Wang says:

====================
net/smc: several features's implementation for smc v2.1

This patch set implement several new features in SMC v2.1(https://
www.ibm.com/support/pages/node/7009315), including vendor unique
experimental options, max connections per lgr negotiation, max links
per lgr negotiation.

v1 - v2:
- rename field fce_v20 to fce_v2_base in struct
   smc_clc_first_contact_ext_v2x
- use smc_get_clc_first_contact_ext in smc_connect
   _rdma_v2_prepare
- adding comment about field vendor_oui in struct
   smc_clc_msg_smcd
- remove comment about SMC_CONN_PER_LGR_MAX in smc_
   clc_srv_v2x_features_validate
- rename smc_clc_clnt_v2x_features_validate

RFC v2 - v1:
- more description in commit message
- modify SMC_CONN_PER_LGR_xxx and SMC_LINKS_ADD_LNK_xxx
   macro defination and usage
- rename field release_ver to release_nr
- remove redundant release version check in client
- explicitly set the rc value in smc_llc_cli/srv_add_link

RFC v1 - RFC v2:
- Remove ini pointer NULL check and fix code style in
   smc_clc_send_confirm_accept.
- Optimize the max_conns check in smc_clc_xxx_v2x_features_validate.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: Extend SMCR v2 linkgroup netlink attribute

Add SMC_NLA_LGR_R_V2_MAX_CONNS and SMC_NLA_LGR_R_V2_MAX_LINKS
to SMCR v2 linkgroup netlink attribute SMC_NLA_LGR_R_V2 for
linkgroup's detail info showing.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: support max links per lgr negotiation in clc handshake

Support max links per lgr negotiation in clc handshake for SMCR v2.1,
which is one of smc v2.1 features. Server makes decision for the final
value of max links based on the client preferred max links and
self-preferred max links. Here use the minimum value of the client
preferred max links and server preferred max links.

Client                                       Server
     Proposal(max links(client preferred))
     -------------------------------------->

     Accept(max links(accepted value))
accepted value=min(client preferred, server preferred)
     <-------------------------------------

      Confirm(max links(accepted value))
     ------------------------------------->

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: support max connections per lgr negotiation

Support max connections per lgr negotiation for SMCR v2.1,
which is one of smc v2.1 features. Server makes decision for
the final value of max conns based on the client preferred
max conns and self-preferred max conns. Here use the minimum
value of client preferred max conns and server preferred max
conns.

Client                                     Server
     Proposal(max conns(client preferred))
     ------------------------------------>

     Accept(max conns(accepted value))
accepted value=min(client preferred, server preferred)
     <-----------------------------------

     Confirm(max conns(accepted value))
     ----------------------------------->

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: support smc v2.x features validate

Support SMC v2.x features validate for SMC v2.1. This is the frame
code for SMC v2.x features validate, and will take effects only when
the negotiated release version is v2.1 or later.

For Server, v2.x features' validation should be done in smc_clc_srv_
v2x_features_validate when receiving v2.1 or later CLC Proposal Message,
such as max conns, max links negotiation, the decision of the final
value of max conns and max links should be made in this function.
And final check for server when receiving v2.1 or later CLC Confirm
Message should be done in smc_clc_v2x_features_confirm_check.

For client, v2.x features' validation should be done in smc_clc_clnt_
v2x_features_validate when receiving v2.1 or later CLC Accept Message,
for example, the decision to accpt the accepted value or to decline
should be made in this function.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: add vendor unique experimental options area in clc handshake

Add vendor unique experimental options area in clc handshake. In clc
accept and confirm msg, vendor unique experimental options use the
16-Bytes reserved field, which defined in struct smc_clc_fce_gid_ext
in previous version. Because of the struct smc_clc_first_contact_ext
is widely used and limit the scope of modification, this patch moves
the 16-Bytes reserved field out of struct smc_clc_fce_gid_ext, and
followed with the struct smc_clc_first_contact_ext in a new struct
names struct smc_clc_first_contact_ext_v2x.

For SMC-R first connection, in previous version, the struct smc_clc_
first_contact_ext and the 16-Bytes reserved field has already been
included in clc accept and confirm msg. Thus, this patch use struct
smc_clc_first_contact_ext_v2x instead of the struct smc_clc_first_
contact_ext and the 16-Bytes reserved field in SMC-R clc accept and
confirm msg is compatible with previous version.

For SMC-D first connection, in previous version, only the struct smc_
clc_first_contact_ext is included in clc accept and confirm msg, and
the 16-Bytes reserved field is not included. Thus, when the negotiated
smc release version is the version before v2.1, we still use struct
smc_clc_first_contact_ext for compatible consideration. If the negotiated
smc release version is v2.1 or later, use struct smc_clc_first_contact_
ext_v2x instead.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: support smc release version negotiation in clc handshake

Support smc release version negotiation in clc handshake based on
SMC v2, where no negotiation process for different releases, but
for different versions. The latest smc release version was updated
to v2.1. And currently there are two release versions of SMCv2, v2.0
and v2.1. In the release version negotiation, client sends the preferred
release version by CLC Proposal Message, server makes decision for which
release version to use based on the client preferred release version and
self-supported release version (here choose the minimum release version
of the client preferred and server latest supported), then the decision
returns to client by CLC Accept Message. Client confirms the decision by
CLC Confirm Message.

Client                                    Server
      Proposal(preferred release version)
     ------------------------------------>

      Accept(accpeted release version)
min(client preferred, server latest supported)
     <------------------------------------

      Confirm(accpeted release version)
     ------------------------------------>

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: freescale: Remove unused declarations

Commit 5d93cfcf7360 ("net: dpaa: Convert to phylink") removed
fman_set_mac_active_pause()/fman_get_pause_cfg() but not declarations.
Commit 48257c4f168e ("Add fs_enet ethernet network driver, for several
embedded platforms.") declared but never implemented
fs_enet_platform_init() and fs_enet_platform_cleanup().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230817134159.38484-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: refine skb->ooo_okay setting

Enabling BIG TCP on a low end platform apparently increased
chances of getting flows locked on one busy TX queue.

A similar problem was handled in commit 9b462d02d6dd
("tcp: TCP Small Queues and strange attractors"),
but the strategy worked for either bulk flows,
or 'large enough' RPC. BIG TCP changed how large
RPC needed to be to enable the work around:
If RPC fits in a single skb, TSQ never triggers.

Root cause for the problem is a busy TX queue,
with delayed TX completions.

This patch changes how we set skb->ooo_okay to detect
the case TX completion was not done, but incoming ACK
already was processed and emptied rtx queue.

Update the comment to explain the tricky details.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230817182353.2523746-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bnxt_en-update-for-net-next'

Michael Chan says:

====================
bnxt_en: Update for net-next

This patchset contains 2 features:

- The page pool implementation for the normal RX path (non-XDP) for
paged buffers in the aggregation ring.

- Saving of the ring error counters across reset.
====================

Link: https://lore.kernel.org/r/20230817231911.165035-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Add tx_resets ring counter

Add a new tx_resets ring counter.  This counter will be saved as
tx_total_resets across any reset.  Since we currently do a full reset
in bnxt_sched_reset_txr(), the per ring counter will always be cleared
during reset.  Only the tx_total_resets count will be meaningful and we
only display this under ethtool -S.

Link: https://lore.kernel.org/netdev/CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com/
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Display the ring error counters under ethtool -S

The existing driver displays the sum of 4 ring counters under ethtool -S.
These counters are in the array bnxt_sw_func_stats. These counters are
summed at the time of ethtool -S and will be lost when the device is reset.

Replace these counters with the new total ring error counters added in the
last patch. These new counters are saved before reset. ethtool -S will
now display the sum of the saved counters plus the current counters.

Link: https://lore.kernel.org/netdev/CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Save ring error counters across reset

Currently, the ring counters are stored in the per ring datastructure.
During reset, all the rings are freed together with the associated
datastructures. As a result, all the ring error counters will be reset
to zero.

Add logic to keep track of the total error counts of all the rings
and save them before reset (including ifdown). The next patch will
display these total ring error counters under ethtool -S.

Link: https://lore.kernel.org/netdev/CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com/
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Increment rx_resets counter in bnxt_disable_napi()

If we are doing a complete reset with irq_re_init set to true in
bnxt_close_nic(), all the ring structures will be freed.  New
structures will be allocated in bnxt_open_nic().  The current code
increments rx_resets counter in bnxt_enable_napi() if bnapi->in_reset
is true.  In a complete reset, bnapi->in_reset will never be true
since the structure is just allocated.

Increment the rx_resets counter in bnxt_disable_napi() instead.  This
will allow us to save all the ring error counters including the
rx_resets counters in the next patch.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Let the page pool manage the DMA mapping

Use the page pool's ability to maintain DMA mappings for us.
This avoids re-mapping of the recycled pages.

Link: https://lore.kernel.org/netdev/20230728231829.235716-4-michael.chan@broadcom.com/
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP

Convert to use the page pool buffers for the aggregation ring when
running in non-XDP mode. This simplifies the driver and we benefit
from the recycling of pages. Adjust the page pool size to account
for the aggregation ring size.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-08-17 (ice)

This series contains updates to ice driver only.

Jan removes unused functions and refactors code to make, possible,
functions static.

Jake rearranges some functions to be logically grouped.

Marcin removes an unnecessary call to disable VLAN stripping.

Yang Yingliang utilizes list_for_each_entry() helper for a couple list
traversals.

Przemek removes some parameters from ice_aq_alloc_free_res() which were
always the same and reworks ice_aq_wait_for_event() to reduce chance of
race.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: split ice_aq_wait_for_event() func into two
  ice: embed &ice_rq_event_info event into struct ice_aq_task
  ice: ice_aq_check_events: fix off-by-one check when filling buffer
  ice: drop two params from ice_aq_alloc_free_res()
  ice: use list_for_each_entry() helper
  ice: Remove redundant VSI configuration in eswitch setup
  ice: move E810T functions to before device agnostic ones
  ice: refactor ice_vsi_is_vlan_pruning_ena
  ice: refactor ice_ptp_hw to make functions static
  ice: refactor ice_sched to make functions static
  ice: Utilize assign_bit() helper
  ice: refactor ice_vf_lib to make functions static
  ice: refactor ice_lib to make functions static
  ice: refactor ice_ddp to make functions static
  ice: remove unused methods
====================

Link: https://lore.kernel.org/r/20230817212239.2601543-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dm9051: Use PTR_ERR_OR_ZERO() to simplify code

Return PTR_ERR_OR_ZERO() instead of return 0 or PTR_ERR() to
simplify code.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230817022418.3588831-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sky2: Remove redundant NULL check for debugfs_create_dir

Since debugfs_create_dir() returns ERR_PTR, IS_ERR() is enough to
check whether the directory is successfully created. So remove the
redundant NULL check.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230817073017.350002-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: remove redundant pci_clear_master()

do_pci_disable_device() disable PCI bus-mastering as following:
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}
And pci_disable_device() sets dev->is_busmaster to 0.

pci_enable_device() is called only once before calling to
pci_disable_device() and such pci_clear_master() is not needed. So remove
redundant pci_clear_master().

Also rename goto label 'err_out_clear_master' to 'err_out_disable_device'.

Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20230817025709.2023553-1-liaoyu15@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
virtchnl: fix fake 1-elem arrays

Alexander Lobakin says:

6.5-rc1 started spitting warning splats when composing virtchnl
messages, precisely on virtchnl_rss_key and virtchnl_lut:

[   84.167709] memcpy: detected field-spanning write (size 52) of single
field "vrk->key" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1095
(size 1)
[   84.169915] WARNING: CPU: 3 PID: 11 at drivers/net/ethernet/intel/
iavf/iavf_virtchnl.c:1095 iavf_set_rss_key+0x123/0x140 [iavf]
...
[   84.191982] Call Trace:
[   84.192439]  <TASK>
[   84.192900]  ? __warn+0xc9/0x1a0
[   84.193353]  ? iavf_set_rss_key+0x123/0x140 [iavf]
[   84.193818]  ? report_bug+0x12c/0x1b0
[   84.194266]  ? handle_bug+0x42/0x70
[   84.194714]  ? exc_invalid_op+0x1a/0x50
[   84.195149]  ? asm_exc_invalid_op+0x1a/0x20
[   84.195592]  ? iavf_set_rss_key+0x123/0x140 [iavf]
[   84.196033]  iavf_watchdog_task+0xb0c/0xe00 [iavf]
...
[   84.225476] memcpy: detected field-spanning write (size 64) of single
field "vrl->lut" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1127
(size 1)
[   84.227190] WARNING: CPU: 27 PID: 1044 at drivers/net/ethernet/intel/
iavf/iavf_virtchnl.c:1127 iavf_set_rss_lut+0x123/0x140 [iavf]
...
[   84.246601] Call Trace:
[   84.247228]  <TASK>
[   84.247840]  ? __warn+0xc9/0x1a0
[   84.248263]  ? iavf_set_rss_lut+0x123/0x140 [iavf]
[   84.248698]  ? report_bug+0x12c/0x1b0
[   84.249122]  ? handle_bug+0x42/0x70
[   84.249549]  ? exc_invalid_op+0x1a/0x50
[   84.249970]  ? asm_exc_invalid_op+0x1a/0x20
[   84.250390]  ? iavf_set_rss_lut+0x123/0x140 [iavf]
[   84.250820]  iavf_watchdog_task+0xb16/0xe00 [iavf]

Gustavo already tried to fix those back in 2021[0][1]. Unfortunately,
a VM can run a different kernel than the host, meaning that those
structures are sorta ABI.
However, it is possible to have proper flex arrays + struct_size()
calculations and still send the very same messages with the same sizes.
The common rule is:

elem[1] -> elem[]
size = struct_size() + <difference between the old and the new msg size>

The "old" size in the current code is calculated 3 different ways for
10 virtchnl structures total. Each commit addresses one of the ways
cumulatively instead of per-structure.

I was planning to send it to -net initially, but given that virtchnl was
renamed from i40evf and got some fat style cleanup commits in the past,
it's not very straightforward to even pick appropriate SHAs, not
speaking of automatic portability. I may send manual backports for
a couple of the latest supported kernels later on if anyone needs it
at all.

[0] https://lore.kernel.org/all/20210525230912.GA175802@embeddedor
[1] https://lore.kernel.org/all/20210525231851.GA176647@embeddedor

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  virtchnl: fix fake 1-elem arrays for structures allocated as `nents`
  virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1`
  virtchnl: fix fake 1-elem arrays in structs allocated as `nents + 1` - 1
====================

Link: https://lore.kernel.org/r/20230816210657.1326772-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'batadv-next-pullrequest-20230816' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

- bump version strings, by Simon Wunderlich

- Remove unused declarations, by Yue Haibing

- Clean up MTU handling, by Sven Eckelmann (2 patches)

- Clean up/remove (obsolete) functions, by Sven Eckelmann (3 patches)

* tag 'batadv-next-pullrequest-20230816' of git://git.open-mesh.org/linux-merge:
  batman-adv: Drop per algo GW section class code
  batman-adv: Keep batadv_netlink_notify_* static
  batman-adv: Drop unused function batadv_gw_bandwidth_set
  batman-adv: Check hardif MTU against runtime MTU
  batman-adv: Avoid magic value for minimum MTU
  batman-adv: Remove unused declarations
  batman-adv: Start new development cycle
====================

Link: https://lore.kernel.org/r/20230816164000.190884-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fec: use napi_consume_skb() in fec_enet_tx_queue()

Now that the "budget" is passed into fec_enet_tx_queue(), one
optimization we can do is to use napi_consume_skb() to instead
of dev_kfree_skb_any().

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Suggested-by: Alexander H Duyck <alexander.duyck@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230816090242.463822-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: Remove unneeded semicolon

./drivers/net/ethernet/sfc/tc_conntrack.c:464:2-3: Unneeded semicolon

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230816004944.10841-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache

Since v6.5-rc1 MM-tree is merged and contains a new flag SLAB_NO_MERGE
in commit d0bf7d5759c1 ("mm/slab: introduce kmem_cache flag SLAB_NO_MERGE")
now is the time to use this flag for networking as proposed
earlier see link.

The SKB (sk_buff) kmem_cache slab is critical for network performance.
Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
performance by amortising the alloc/free cost.

For the bulk API to perform efficiently the slub fragmentation need to
be low. Especially for the SLUB allocator, the efficiency of bulk free
API depend on objects belonging to the same slab (page).

When running different network performance microbenchmarks, I started
to notice that performance was reduced (slightly) when machines had
longer uptimes. I believe the cause was 'skbuff_head_cache' got
aliased/merged into the general slub for 256 bytes sized objects (with
my kernel config, without CONFIG_HARDENED_USERCOPY).

For SKB kmem_cache network stack have other various reasons for
not merging, but it varies depending on kernel config (e.g.
CONFIG_HARDENED_USERCOPY). We want to explicitly set SLAB_NO_MERGE
for this kmem_cache to get most out of kmem_cache_{alloc,free}_bulk APIs.

When CONFIG_SLUB_TINY is configured the bulk APIs are essentially
disabled. Thus, for this case drop the SLAB_NO_MERGE flag.

Link: https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/169211265663.1491038.8580163757548985946.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/sfc/tc.c
fa165e194997 ("sfc: don't unregister flow_indr if it was never registered")
3bf969e88ada ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.5-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from ipsec and netfilter.

  No known outstanding regressions.

  Fixes to fixes:

   - virtio-net: set queues after driver_ok, avoid a potential race
     added by recent fix

   - Revert "vlan: Fix VLAN 0 memory leak", it may lead to a warning
     when VLAN 0 is registered explicitly

   - nf_tables:
      - fix false-positive lockdep splat in recent fixes
      - don't fail inserts if duplicate has expired (fix test failures)
      - fix races between garbage collection and netns dismantle

  Current release - new code bugs:

   - mlx5: Fix mlx5_cmd_update_root_ft() error flow

  Previous releases - regressions:

   - phy: fix IRQ-based wake-on-lan over hibernate / power off

  Previous releases - always broken:

   - sock: fix misuse of sk_under_memory_pressure() preventing system
     from exiting global TCP memory pressure if a single cgroup is under
     pressure

   - fix the RTO timer retransmitting skb every 1ms if linear option is
     enabled

   - af_key: fix sadb_x_filter validation, amment netlink policy

   - ipsec: fix slab-use-after-free in decode_session6()

   - macb: in ZynqMP resume always configure PS GTR for non-wakeup
     source

  Misc:

   - netfilter: set default timeout to 3 secs for sctp shutdown send and
     recv state (from 300ms), align with protocol timers"

* tag 'net-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
  ice: Block switchdev mode when ADQ is active and vice versa
  qede: fix firmware halt over suspend and resume
  net: do not allow gso_size to be set to GSO_BY_FRAGS
  sock: Fix misuse of sk_under_memory_pressure()
  sfc: don't fail probe if MAE/TC setup fails
  sfc: don't unregister flow_indr if it was never registered
  net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset
  net/mlx5: Fix mlx5_cmd_update_root_ft() error flow
  net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECT
  i40e: fix misleading debug logs
  iavf: fix FDIR rule fields masks validation
  ipv6: fix indentation of a config attribute
  mailmap: add entries for Simon Horman
  broadcom: b44: Use b44_writephy() return value
  net: openvswitch: reject negative ifindex
  team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
  net: phy: broadcom: stub c45 read/write for 54810
  netfilter: nft_dynset: disallow object maps
  netfilter: nf_tables: GC transaction race with netns dismantle
  netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
  ...

Merge tag 'drm-fixes-2023-08-18-1' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Regular enough week, mostly the usual amdgpu and i915 fixes.  Also
  qaic, nouveau, qxl and a revert for an EDID patch that had some side
  effects, along with a couple of panel fixes.

  edid:
   - revert mode parsing fix that had side effects.

  i915:
   - Fix the flow for ignoring GuC SLPC efficient frequency selection
   - Fix SDVO panel_type initialization
   - Fix display probe for IVB Q and IVB D GT2 server

  nouveau:
   - fix use-after-free in connector code

  qaic:
   - integer overflow check fix
   - fix slicing memory leak

  panel:
   - fix JDI LT070ME05000 probing
   - fix AUO G121EAN01 timings

  amdgpu:
   - SMU 13.x fixes
   - Fix mcbp parameter for gfx9
   - SMU 11.x fixes
   - Temporary fix for large numbers of XCP partitions
   - S0ix fixes
   - DCN 2.0 fix

  qxl:
   - fix use after free race in dumb object allocation"

* tag 'drm-fixes-2023-08-18-1' of git://anongit.freedesktop.org/drm/drm:
  drm/qxl: fix UAF on handle creation
  Revert "drm/edid: Fix csync detailed mode parsing"
  drm/nouveau/disp: fix use-after-free in error handling of nouveau_connector_create
  Revert "Revert "drm/amdgpu/display: change pipe policy for DCN 2.0""
  drm/amd: flush any delayed gfxoff on suspend entry
  drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix
  drm/amdgpu: skip xcp drm device allocation when out of drm resource
  drm/amd/pm: Update pci link width for smu v13.0.6
  drm/amd/pm: Fix temperature unit of SMU v13.0.6
  drm/amdgpu/pm: fix throttle_status for other than MP1 11.0.7
  drm/amdgpu: disable mcbp if parameter zero is set
  drm/amd/pm: disallow the fan setting if there is no fan on smu 13.0.0
  accel/qaic: Clean up integer overflow checking in map_user_pages()
  accel/qaic: Fix slicing memory leak
  drm/i915: fix display probe for IVB Q and IVB D GT2 server
  drm/i915/sdvo: fix panel_type initialization
  drm/i915/guc/slpc: Restore efficient freq earlier
  drm/panel: simple: Fix AUO G121EAN01 panel timings according to the docs
  drm/panel: JDI LT070ME05000 simplify with dev_err_probe()

Merge branch 'netconsole-enable-compile-time-configuration'

Breno Leitao says:

====================
netconsole: Enable compile time configuration

Enable netconsole features to be set at compilation time. Create two
Kconfig options that allow users to set extended logs and release
prepending features at compilation time.

The first patch de-duplicates the initialization code, and the second
patch adds the support in the de-duplicated code, avoiding touching two
different functions with the same change.
====================

Link: https://lore.kernel.org/r/20230811093158.1678322-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netconsole: Enable compile time configuration

Enable netconsole features to be set at compilation time. Create two
Kconfig options that allow users to set extended logs and release
prepending features at compilation time.

Right now, the user needs to pass command line parameters to netconsole,
such as "+"/"r" to enable extended logs and version prepending features.

With these two options, the user could set the default values for the
features at compile time, and don't need to pass it in the command line
to get them enabled, simplifying the command line.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230811093158.1678322-3-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netconsole: Create a allocation helper

De-duplicate the initialization and allocation code for struct
netconsole_target.

The same allocation and initialization code is duplicated in two
different places in the netconsole subsystem, when the netconsole target
is initialized by command line parameters (alloc_param_target()), and
dynamically by sysfs (make_netconsole_target()).

Create a helper function, and call it from the two different functions.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230811093158.1678322-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: fix -Wvoid-pointer-to-enum-cast warning

When building with clang 18 I see the following warning:
|       drivers/net/mdio/mdio-xgene.c:338:13: warning: cast to smaller integer
|               type 'enum xgene_mdio_id' from 'const void *' [-Wvoid-pointer-to-enum-cast]
|         338 |                 mdio_id = (enum xgene_mdio_id)of_id->data;

This is due to the fact that `of_id->data` is a void* while `enum
xgene_mdio_id` has the size of an int. This leads to truncation and
possible data loss.

Link: https://github.com/ClangBuiltLinux/linux/issues/1910
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230815-void-drivers-net-mdio-mdio-xgene-v1-1-5304342e0659@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'netem-use-a-seeded-prng-for-loss-and-corruption-events'

François Michel says:

====================
netem: use a seeded PRNG for loss and corruption events

In order to reproduce bugs or performance evaluation of
network protocols and applications, it is useful to have
reproducible test suites and tools. This patch adds
a way to specify a PRNG seed through the
TCA_NETEM_PRNG_SEED attribute for generating netem
loss and corruption events. Initializing the qdisc
with the same seed leads to the exact same loss
and corruption patterns. If no seed is explicitly
specified, the qdisc generates a random seed using
get_random_u64().

This patch can be and has been tested using tc from
the following iproute2-next fork:
https://github.com/francoismichel/iproute2-next

For instance, setting the seed 42424242 on the loopback
with a loss rate of 10% will systematically drop the 5th,
12th and 24th packet when sending 25 packets.
====================

Link: https://lore.kernel.org/r/20230815092348.1449179-1-francois.michel@uclouvain.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netem: use seeded PRNG for correlated loss events

Use prandom_u32_state() instead of get_random_u32() to generate
the correlated loss events of netem.

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20230815092348.1449179-4-francois.michel@uclouvain.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netem: use a seeded PRNG for generating random losses

Use prandom_u32_state() instead of get_random_u32() to generate
the random loss events of netem. The state of the prng is part
of the prng attribute of struct netem_sched_data.

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20230815092348.1449179-3-francois.michel@uclouvain.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netem: add prng attribute to netem_sched_data

Add prng attribute to struct netem_sched_data and
allows setting the seed of the PRNG through netlink
using the new TCA_NETEM_PRNG_SEED attribute.
The PRNG attribute is not actually used yet.

Signed-off-by: François Michel <francois.michel@uclouvain.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20230815092348.1449179-2-francois.michel@uclouvain.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: Use pci_dev_id() to simplify the code

PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.

Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230815024248.3519068-1-zhangjialin11@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tun: add __exit annotations to module exit func tun_cleanup()

Add missing __exit annotations to module exit func tun_cleanup().

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230814083000.3893589-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-08-16 (iavf, i40e)

This series contains updates to iavf and i40e drivers.

Piotr adds checks for unsupported Flow Director rules on iavf.

Andrii replaces incorrect 'write' messaging on read operations for i40e.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: fix misleading debug logs
iavf: fix FDIR rule fields masks validation
====================

Link: https://lore.kernel.org/r/20230816193308.1307535-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>