review.tizen.org Git - platform/kernel/linux-starfive.git/log

net: bridge: mst: prevent NULL deref in br_mst_info_size()

Call br_mst_info_size() only if vg pointer is not NULL.

general protection fault, probably for non-canonical address 0xdffffc0000000058: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000002c0-0x00000000000002c7]
CPU: 0 PID: 975 Comm: syz-executor.0 Tainted: G        W         5.17.0-next-20220321-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:br_mst_info_size+0x97/0x270 net/bridge/br_mst.c:242
Code: 00 00 31 c0 e8 ba 10 53 f9 31 c0 b9 40 00 00 00 4c 8d 6c 24 30 4c 89 ef f3 48 ab 48 8d 83 c0 02 00 00 48 89 04 24 48 c1 e8 03 <80> 3c 28 00 0f 85 ae 01 00 00 48 8b 83 c0 02 00 00 41 bf 04 00 00
RSP: 0018:ffffc900153770a8 EFLAGS: 00010202
RAX: 0000000000000058 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000040000 RSI: ffffffff88259876 RDI: ffffc900153772d8
RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffffff8db68957
R10: ffffffff881f737b R11: 0000000000000000 R12: 0000000000000000
R13: ffffc900153770d8 R14: 00000000000002a0 R15: 00000000ffffffff
FS:  00007f18bbb6f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020001a80 CR3: 000000001a7d9000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 00000000000000d8 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
br_get_link_af_size_filtered+0x6e9/0xc00 net/bridge/br_netlink.c:123
rtnl_link_get_af_size net/core/rtnetlink.c:598 [inline]
if_nlmsg_size+0x40c/0xa50 net/core/rtnetlink.c:1040
rtnl_calcit.isra.0+0x25f/0x460 net/core/rtnetlink.c:3780
rtnetlink_rcv_msg+0xa65/0xb80 net/core/rtnetlink.c:5937
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f18baa89049
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f18bbb6f168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f18bab9bf60 RCX: 00007f18baa89049
RDX: 0000000000000000 RSI: 0000000020001a80 RDI: 0000000000000004
RBP: 00007f18baae308d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffeedb2be2f R14: 00007f18bbb6f300 R15: 0000000000022000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:br_mst_info_size+0x97/0x270 net/bridge/br_mst.c:242
Code: 00 00 31 c0 e8 ba 10 53 f9 31 c0 b9 40 00 00 00 4c 8d 6c 24 30 4c 89 ef f3 48 ab 48 8d 83 c0 02 00 00 48 89 04 24 48 c1 e8 03 <80> 3c 28 00 0f 85 ae 01 00 00 48 8b 83 c0 02 00 00 41 bf 04 00 00
RSP: 0018:ffffc900153770a8 EFLAGS: 00010202
RAX: 0000000000000058 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000040000 RSI: ffffffff88259876 RDI: ffffc900153772d8
RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffffff8db68957
R10: ffffffff881f737b R11: 0000000000000000 R12: 0000000000000000
R13: ffffc900153770d8 R14: 00000000000002a0 R15: 00000000ffffffff
FS:  00007f18bbb6f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2ca22000 CR3: 000000001a7d9000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 00000000000000d8 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 122c29486e1f ("net: bridge: mst: Support setting and reporting MST port states")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tobias Waldekranz <tobias@waldekranz.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20220322012314.795187-1-eric.dumazet@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'selftests-forwarding-locked-bridge-port-fixes'

Ido Schimmel says:

====================
selftests: forwarding: Locked bridge port fixes

Two fixes for the locked bridge port selftest.
====================

Link: https://lore.kernel.org/r/20220321175102.978020-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: forwarding: Use same VRF for port and VLAN upper

The test creates a separate VRF for the VLAN upper, but does not destroy
it during cleanup, resulting in "RTNETLINK answers: File exists" errors.

Fix by using the same VRF for the port and its VLAN upper. This is OK
since their IP addresses do not overlap.

Before:

# ./bridge_locked_port.sh
TEST: Locked port ipv4                                              [ OK ]
TEST: Locked port ipv6                                              [ OK ]
TEST: Locked port vlan                                              [ OK ]

# ./bridge_locked_port.sh
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
TEST: Locked port ipv4                                              [ OK ]
TEST: Locked port ipv6                                              [ OK ]
TEST: Locked port vlan                                              [ OK ]

After:

# ./bridge_locked_port.sh
TEST: Locked port ipv4                                              [ OK ]
TEST: Locked port ipv6                                              [ OK ]
TEST: Locked port vlan                                              [ OK ]

# ./bridge_locked_port.sh
TEST: Locked port ipv4                                              [ OK ]
TEST: Locked port ipv6                                              [ OK ]
TEST: Locked port vlan                                              [ OK ]

Fixes: b2b681a41251 ("selftests: forwarding: tests of locked port feature")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: forwarding: Disable learning before link up

Disable learning before bringing the bridge port up in order to avoid
the FDB being populated and the test failing.

Before:

# ./bridge_locked_port.sh
RTNETLINK answers: File exists
TEST: Locked port ipv4                                              [FAIL]
         Ping worked after locking port, but before adding FDB entry
TEST: Locked port ipv6                                              [ OK ]
TEST: Locked port vlan                                              [ OK ]

After:

# ./bridge_locked_port.sh
TEST: Locked port ipv4                                              [ OK ]
TEST: Locked port ipv6                                              [ OK ]
TEST: Locked port vlan                                              [ OK ]

Fixes: b2b681a41251 ("selftests: forwarding: tests of locked port feature")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnx2x: truncate value to original sizing

The original behavior was to print out unsigned short or unsigned char
values. The change in commit d65aea8e8298 ("bnx2x: use correct format
characters") prints out the whole value if not truncated. So truncate
the value to an unsigned {short|char} to retain the original behavior.

Fixes: d65aea8e8298 ("bnx2x: use correct format characters")
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Bill Wendling <morbo@google.com>
Link: https://lore.kernel.org/r/20220321023155.106066-1-morbo@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-mscc-miim-add-integrated-phy-reset-support'

Michael Walle says:

====================
net: mscc-miim: add integrated PHY reset support

The MDIO driver has support to release the integrated PHYs from reset.
This was implemented for the SparX-5 for now. Now add support for the
LAN966x, too.
====================

Link: https://lore.kernel.org/r/20220318201324.1647416-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mscc-miim: add lan966x internal phy reset support

The LAN966x has two internal PHYs which are in reset by default. The
driver already supported the internal PHYs of the SparX-5. Now add
support for the LAN966x, too. Add a new compatible to distinguish them.

The LAN966x has additional control bits in this register, thus convert
the regmap_write() to regmap_update_bits() to leave the remaining bits
untouched. This doesn't change anything for the SparX-5 SoC, because
there, the register consists only of reset bits.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: mscc-miim: replace magic numbers for the bus reset

Replace the magic numbers by macros which are already defined. It seems
the original commit missed to use them.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: mscc-miim: add lan966x compatible

The MDIO controller has support to release the internal PHYs from reset
by specifying a second memory resource. This is different between the
currently supported SparX-5 and the LAN966x. Add a new compatible to
distinguish between these two.

Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: Fill in STU support for all supported chips

Some chips using the split VTU/STU design will not accept VTU entries
who's SID points to an invalid STU entry. Therefore, mark all those
chips with either the mv88e6352_g1_stu_* or mv88e6390_g1_stu_* ops as
appropriate.

Notably, chips for the Opal Plus (6085/6097) era seem to use a
different implementation than those from Agate (6352) and onwards,
even though their external interface is the same. The former happily
accepts VTU entries referencing invalid STU entries, while the latter
does not.

This fixes an issue where the driver would fail to probe switch trees
that contained chips of the Agate/Topaz generation which did not
declare STU support, as loaded VTU entries would be read back as
invalid.

Fixes: 49c98c1dc7d9 ("net: dsa: mv88e6xxx: Disentangle STU from VTU")
Reported-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Marek Behún <kabel@kernel.org>
Link: https://lore.kernel.org/r/20220319110345.555270-1-tobias@waldekranz.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: change fprintf format specifiers

`cur64`, `start64` and `ts_delta` are int64_t. Change format
specifiers in fprintf from `"%lu"` to `"%" PRId64` to adapt
to 32-bit and 64-bit systems.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Link: https://lore.kernel.org/r/20220319073730.5235-1-guozhengkui@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: felix: allow PHY_INTERFACE_MODE_INTERNAL on port 5

The Felix switch has 6 ports, 2 of which are internal.
Due to some misunderstanding, my initial suggestion for
vsc9959_port_modes[]:
https://patchwork.kernel.org/project/netdevbpf/patch/20220129220221.2823127-10-colin.foster@in-advantage.com/#24718277

got translated by Colin into a 5-port array, leading to an all-zero port
mode mask for port 5.

Fixes: acf242fc739e ("net: dsa: felix: remove prevalidate_phy_mode interface")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220318195812.276276-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-dsa-mv88e6xxx-mst-fixes'

Tobias Waldekranz says:

====================
net: dsa: mv88e6xxx: MST Fixes

1/2 fixes the issue reported by Marek here:

https://lore.kernel.org/netdev/20220318182817.5ade8ecd@dellmb/

2/2 adds a missing capability check to the new .vlan_msti_set
callback.
====================

Link: https://lore.kernel.org/r/20220318201321.4010543-1-tobias@waldekranz.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: Ensure STU support in VLAN MSTI callback

In the same way that we check for STU support in the MST state
callback, we should also verify it before trying to change a VLANs
MSTI membership.

Fixes: acaf4d2e36b3 ("net: dsa: mv88e6xxx: MST Offloading")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mv88e6xxx: Require ops be implemented to claim STU support

Simply having a physical STU table in the device doesn't do us any
good if there's no implementation of the relevant ops to access that
table. So ensure that chips that claim STU support can also talk to
the hardware.

This fixes an issue where chips that had a their ->info->max_sid
set (due to their family membership), but no implementation (due to
their chip-specific ops struct) would fail to probe.

Fixes: 49c98c1dc7d9 ("net: dsa: mv88e6xxx: Disentangle STU from VTU")
Reported-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-tls-some-optimizations-for-tls'

Ziyang Xuan says:

====================
net/tls: some optimizations for tls

Do some small optimizations for tls, including jump instructions
optimization, and judgement processes optimization.
====================

Link: https://lore.kernel.org/r/cover.1647658604.git.william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/tls: optimize judgement processes in tls_set_device_offload()

It is known that priority setting HW offload when set tls TX/RX offload
by setsockopt(). Check netdevice whether support NETIF_F_HW_TLS_TX or
not at the later stages in the whole tls_set_device_offload() process,
some memory allocations have been done before that. We must release those
memory and return error if we judge the netdevice not support
NETIF_F_HW_TLS_TX. It is redundant.

Move NETIF_F_HW_TLS_TX judgement forward, and move start_marker_record
and offload_ctx memory allocation back slightly. Thus, we can get
simpler exception handling process.

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/tls: remove unnecessary jump instructions in do_tls_setsockopt_conf()

Avoid using "goto" jump instruction unconditionally when we
can return directly. Remove unnecessary jump instructions in
do_tls_setsockopt_conf().

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Revert the softirq will run annotation in ____napi_schedule().

The lockdep annotation lockdep_assert_softirq_will_run() expects that
either hard or soft interrupts are disabled because both guaranty that
the "raised" soft-interrupts will be processed once the context is left.

This triggers in flush_smp_call_function_from_idle() but it this case it
explicitly calls do_softirq() in case of pending softirqs.

Revert the "softirq will run" annotation in ____napi_schedule() and move
the check back to __netif_rx() as it was. Keep the IRQ-off assert in
____napi_schedule() because this is always required.

Fixes: fbd9a2ceba5c7 ("net: Add lockdep asserts to ____napi_schedule().")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/YjhD3ZKWysyw8rc6@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'devlink-locking'

Jakub Kicinski says:

====================
devlink: hold the instance lock in eswitch callbacks

Series number 2 in the effort to hold the devlink instance lock
in call driver callbacks. We have the following drivers using
this API:

- bnxt, nfp, netdevsim - their own locking is removed / simplified
   by this series; all of them needed a lock to protect from changes
   to the number of VFs while switching modes, now the VF config bus
   callback takes the devlink instance lock via devl_lock();
- ice - appears not to allow changing modes while SR-IOV enabled,
   so nothing to do there;
- liquidio - does not contain any locking;
- octeontx2/af - is very special but at least doesn't have locking
   so doesn't get in the way either;
- mlx5 has a wealth of locks - I chickened out and dropped the lock
   in the callbacks so that I can leave the driver be, for now.

The last one is obviously not ideal, but I would prefer to transition
the API already as it make take longer.

v2: use a wrapper in mlx5 and extend the comment
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: hold the instance lock during eswitch_mode callbacks

Make the devlink core hold the instance lock during eswitch_mode
callbacks. Cheat in case of mlx5 (see the cover letter).

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: replace vfs_lock with devlink instance lock

Similarly to the previous commit, use the devlink instance
lock and let it replace the vfs_lock.

nsim_esw_legacy_enable() was locked by both port lock and
vfs lock so one set of lock/unlocks goes away.

netdevsim's .eswitch_mode_set callback is now ready for
the callback to take the instance lock.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: replace port_list_lock with devlink instance lock

Take advantage of the devlink instance lock for protecting
the port list. This will simplify locking even more once
all devlink callbacks hold the instance lock.

We need to add locking in nsim_dev_port_add_all() which used
to assume higher layer protection when accessing the list.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: add explicitly locked flavor of the rate node APIs

We'll need an explicitly locked rate node API for netdevsim
to switch eswitch mode setting to locked.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: use the devlink instance lock to protect sriov

In prep for .eswitch_mode_set being called with the devlink instance
lock held use that lock explicitly instead of creating a local mutex
just for the sriov reconfig.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'too-short'

Tong Zhang says:

====================
fix typos: "to short" -> "too short"

doing some code review and I found out there are a couple of places
where "too short" is misspelled as "to short".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: fix typo "frame to short" -> "frame too short"

"frame to short" -> "frame too short"

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i825xx: fix typo "Frame to short" -> "Frame too short"

"Frame to short" -> "Frame too short"

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/ctcm: fix typo "length to short" -> "length too short"

"packet length to short" -> "packet length too short"

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ar5523: fix typo "to short" -> "too short"

"RX USB to short" -> "RX USB too short"

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sparx5-mcast'

Casper Andersson says:

====================
net: sparx5: Add multicast support

Add multicast support to Sparx5.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sparx5: Add mdb handlers

Adds mdb handlers. Uses the PGID arbiter to
find a free entry in the PGID table for the
multicast group port mask.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sparx5: Add arbiter for managing PGID table

The PGID (Port Group ID) table holds port masks
for different purposes. The first 72 are reserved
for port destination masks, flood masks, and CPU
forwarding. The rest are shared between multicast,
link aggregation, and virtualization profiles. The
GLAG area is reserved to not be used by anything
else, since it is a subset of the MCAST area.

The arbiter keeps track of which entries are in
use. You can ask for a free ID or give back one
you are done using.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp3800'

Simon Horman says:

====================
nfp: support for NFP-3800

Yinjun Zhan says:

This is the second of a two part series to support the NFP-3800 device.

To utilize the new hardware features of the NFP-3800, driver adds support
of a new data path NFDK. This series mainly does some refactor work to the
data path related implementations. The data path specific implementations
are now separated into nfd3 and nfdk directories respectively, and the
common part is also moved into a new file.

* The series starts with a small refinement in Patch 1/10. Patches 2/10 and
  3/10 are the main refactoring of data path implementation, which prepares
  for the adding the NFDK data path.
* Before the introduction of NFDK, there's some more preparation work
  for NFP-3800 features, such as multi-descriptor per-packet and write-back
  mechanism of TX pointer, which is done in patches 4/10, 5/10, 6/10, 7/10.
* Patch 8/10 allows the driver to select data path according
  to firmware version. Finally, patches 9/10 and 10/10 introduce the new
  NFDK data path.

Changes between v1 and v2
* Correct kdoc for nfp_nfdk_tx()
* Correct build warnings on 32-bit

Thanks to everyone who contributed to this work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: nfdk: implement xdp tx path for NFDK

Due to the different definition of txbuf in NFDK comparing to NFD3,
there're no pre-allocated txbufs for xdp use in NFDK's implementation,
we just use the existed rxbuf and recycle it when xdp tx is completed.

For each packet to transmit in xdp path, we cannot use more than
`NFDK_TX_DESC_PER_SIMPLE_PKT` txbufs, one is to stash virtual address,
and another is for dma address, so currently the amount of transmitted
bytes is not accumulated. Also we borrow the last bit of virtual addr
to indicate a new transmitted packet due to address's alignment
attribution.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add support for NFDK data path

Add new data path.  The TX is completely different, each packet
has multiple descriptor entries (between 2 and 32).  TX ring is
divided into blocks 32 descriptor, and descritors of one packet
can't cross block bounds. The RX side is the same for now.

ABI version 5 or later is required.  There is no support for
VLAN insertion on TX. XDP_TX action and AF_XDP zero-copy is not
implemented in NFDK path.

Changes to Jakub's work:
* Move statistics of hw_csum_tx after jumbo packet's segmentation.
* Set L3_CSUM flag to enable recaculating of L3 header checksum
in ipv4 case.
* Mark the case of TSO a packet with metadata prepended as
unsupported.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Xingfeng Hu <xingfeng.hu@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Dianchao Wang <dianchao.wang@corigine.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: choose data path based on version

Prepare for choosing data path based on the firmware version field.
Exploit one bit from the reserved byte in the firmware version field
as the data path type. We need the firmware version right after
vNIC is allocated, so it has to be read inside nfp_net_alloc(),
callers don't have to set it afterwards.

Following patches will bring the implementation of the second data
path.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add per-data path feature mask

Make sure that features supported only by some of the data paths
are not enabled for all. Add a mask of supported features into
the data path op structure.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: use TX ring pointer write back

Newer versions of the PCIe microcode support writing back the
position of the TX pointer back into host memory. This speeds
up TX completions, because we avoid a read from device memory
(replacing PCIe read with DMA coherent read).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: move tx_ring->qcidx into cold data

QCidx is not used on fast path, move it to the lower cacheline.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: prepare for multi-part descriptors

New datapaths may use multiple descriptor units to describe
a single packet. Prepare for that by adding a descriptors
per simple frame constant into ring size calculations.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: use callbacks for slow path ring related functions

To reduce the coupling of slow path ring implementations and their
callers, use callbacks instead.

Changes to Jakub's work:
* Also use callbacks for xmit functions

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: move the fast path code to separate files

In preparation for support for a new datapath format move all
ring and fast path logic into separate files. It is basically
a verbatim move with some wrapping functions, no new structures
and functions added.

The current data path is called NFD3 from the initial version
of the driver ABI it used. The non-fast path, but ring related
functions are moved to nfp_net_dp.c file.

Changes to Jakub's work:
* Rebase on xsk related code.
* Split the patch, move the callback changes to next commit.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: calculate ring masks without conditionals

Ring enable masks are 64bit long.  Replace mask calculation from:
  block_cnt == 64 ? 0xffffffffffffffffULL : (1 << block_cnt) - 1
with:
  (U64_MAX >> (64 - block_cnt))
to simplify the code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next.
This patchset contains updates for the nf_tables register tracking
infrastructure, disable bogus warning when attaching ct helpers,
one namespace pollution fix and few cleanups for the flowtable.

1) Revisit conntrack gc routine to reduce chances of overruning
   the netlink buffer from the event path. From Florian Westphal.

2) Disable warning on explicit ct helper assignment, from Phil Sutter.

3) Read-only expressions do not update registers, mark them as
   NFT_REDUCE_READONLY. Add helper functions to update the register
   tracking information. This patch re-enables the register tracking
   infrastructure.

4) Cancel register tracking in case an expression fully/partially
   clobbers existing data.

5) Add register tracking support for remaining expressions: ct,
   lookup, meta, numgen, osf, hash, immediate, socket, xfrm, tunnel,
   fib, exthdr.

6) Rename init and exit functions for the conntrack h323 helper,
   from Randy Dunlap.

7) Remove redundant field in struct flow_offload_work.

8) Update nf_flow_table_iterate() to pass flowtable to callback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sparx5: Use vid 1 when bridge default vid 0 to avoid collision

Standalone ports use vid 0. Let the bridge use vid 1 when
"vlan_default_pvid 0" is set to avoid collisions. Since no
VLAN is created when default pvid is 0 this is set
at "PORT_ATTR_SET" and handled in the Switchdev fdb handler.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: remove unnecessary memset in qed_init_fw_funcs

allocated_mem is allocated by kcalloc(). The memory is set to zero.
It is unnecessary to call memset again.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlabel: fix out-of-bounds memory accesses

In calipso_map_cat_ntoh(), in the for loop, if the return value of
netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when
netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses
of bitmap[byte_offset] occurs.

The bug was found during fuzzing. The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0
Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252

CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17
Hardware name: linux,dummy-virt (DT)
Call trace:
  dump_backtrace+0x21c/0x230
  show_stack+0x1c/0x60
  dump_stack_lvl+0x64/0x7c
  print_address_description.constprop.0+0x70/0x2d0
  __kasan_report+0x158/0x16c
  kasan_report+0x74/0x120
  __asan_load1+0x80/0xa0
  netlbl_bitmap_walk+0x3c/0xd0
  calipso_opt_getattr+0x1a8/0x230
  calipso_sock_getattr+0x218/0x340
  calipso_sock_getattr+0x44/0x60
  netlbl_sock_getattr+0x44/0x80
  selinux_netlbl_socket_setsockopt+0x138/0x170
  selinux_socket_setsockopt+0x4c/0x60
  security_socket_setsockopt+0x4c/0x90
  __sys_setsockopt+0xbc/0x2b0
  __arm64_sys_setsockopt+0x6c/0x84
  invoke_syscall+0x64/0x190
  el0_svc_common.constprop.0+0x88/0x200
  do_el0_svc+0x88/0xa0
  el0_svc+0x128/0x1b0
  el0t_64_sync_handler+0x9c/0x120
  el0t_64_sync+0x16c/0x170

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: flowtable: pass flowtable to nf_flow_table_iterate()

The flowtable object is already passed as argument to
nf_flow_table_iterate(), do use not data pointer to pass flowtable.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: remove redundant field in flow_offload_work struct

Already available through the flowtable object, remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat_h323: eliminate anonymous module_init & module_exit

Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.

Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.

Example 1: (System.map)
ffffffff832fc78c t init
ffffffff832fc79e t init
ffffffff832fc8f8 t init

Example 2: (initcall_debug log)
calling  init+0x0/0x12 @ 1
initcall init+0x0/0x12 returned 0 after 15 usecs
calling  init+0x0/0x60 @ 1
initcall init+0x0/0x60 returned 0 after 2 usecs
calling  init+0x0/0x9a @ 1
initcall init+0x0/0x9a returned 0 after 74 usecs

Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_exthdr: add reduce support

Check if we can elide the load. Cancel if the new candidate
isn't identical to previous store.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_fib: add reduce support

The fib expression stores to a register, so we can't add empty stub.
Check that the register that is being written is in fact redundant.

In most cases, this is expected to cancel tracking as re-use is
unlikely.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_tunnel: track register operations

Check if the destination register already contains the data that this
tunnel expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information. This patch does not perform bitwise tracking.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_xfrm: track register operations

Check if the destination register already contains the data that this
xfrm expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_socket: track register operations

Check if the destination register already contains the data that this
socket expression performs. This allows to skip this redundant
operation. If the destination contains a different selector, update the
register tracking information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_immediate: cancel register tracking for data destination register

The immediate expression might clobber existing data on the registers,
cancel register tracking for the destination register.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_hash: track register operations

Check if the destination register already contains the data that this
osf expression performs. Always cancel register tracking for jhash since
this requires tracking multiple source registers in case of
concatenations. Perform register tracking (without bitwise) for symhash
since input does not come from source register.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_osf: track register operations

Allow to recycle the previous output of the OS fingerprint expression
if flags and ttl are the same.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_numgen: cancel register tracking

Random and increment are stateful, each invocation results in fresh output.
Cancel register tracking for these two expressions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_meta: extend reduce support to bridge family

its enough to export the meta get reduce helper and then call it
from nft_meta_bridge too.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_lookup: only cancel tracking for clobbered dregs

In most cases, nft_lookup will be read-only, i.e. won't clobber
registers. In case of map, we need to cancel the registers that will
see stores.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_ct: track register operations

Check if the destination register already contains the data that this ct
expression performs. This allows to skip this redundant operation. If
the destination contains a different selector, update the register
tracking information.

Export nft_expr_reduce_bitwise as a symbol since nft_ct might be
compiled as a module.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: cancel tracking for clobbered destination registers

Output of expressions might be larger than one single register, this might
clobber existing data. Reset tracking for all destination registers that
required to store the expression output.

This patch adds three new helper functions:

- nft_reg_track_update: cancel previous register tracking and update it.
- nft_reg_track_cancel: cancel any previous register tracking info.
- __nft_reg_track_cancel: cancel only one single register tracking info.

Partial register clobbering detection is also supported by checking the
.num_reg field which describes the number of register that are used.

This patch updates the following expressions:

- meta_bridge
- bitwise
- byteorder
- meta
- payload

to use these helper functions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not reduce read-only expressions

Skip register tracking for expressions that perform read-only operations
on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to
avoid defining stubs for these expressions.

This patch re-enables register tracking which was disabled in ed5f85d42290
("netfilter: nf_tables: disable register tracking"). Follow up patches
add remaining register tracking for existing expressions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: Add and use nf_ct_set_auto_assign_helper_warned()

The function sets the pernet boolean to avoid the spurious warning from
nf_ct_lookup_helper() when assigning conntrack helpers via nftables.

Fixes: 1a64edf54f55 ("netfilter: nft_ct: add helper set support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: revisit gc autotuning

as of commit 4608fdfc07e1
("netfilter: conntrack: collect all entries in one cycle")
conntrack gc was changed to run every 2 minutes.

On systems where conntrack hash table is set to large value, most evictions
happen from gc worker rather than the packet path due to hash table
distribution.

This causes netlink event overflows when events are collected.

This change collects average expiry of scanned entries and
reschedules to the average remaining value, within 1 to 60 second interval.

To avoid event overflows, reschedule after each bucket and add a
limit for both run time and number of evictions per run.

If more entries have to be evicted, reschedule and restart 1 jiffy
into the future.

Reported-by: Karel Rericha <karel@maxtel.cz>
Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge tag 'mlx5-updates-2022-03-18' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-03-18

1) XDP multi buffer support

This series enables XDP on non-linear legacy RQ in multi buffer mode.

When XDP is enabled, fragmentation scheme on non-linear legacy RQ is
adjusted to comply to limitations of XDP multi buffer (fragments of the
same size). DMA addresses of fragments are stored in struct page for the
completion handler to be able to unmap them. XDP_TX is supported.

XDP_REDIRECT is not yet supported, the XDP core blocks it for multi
buffer packets at the moment.

2) Trivial cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/klassert/
ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2022-03-19

1) Delete duplicated functions that calls same xfrm_api_check.
From Leon Romanovsky.

2) Align userland API of the default policy structure to the
internal structures. From Nicolas Dichtel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ptp: ocp: use snprintf() in ptp_ocp_verify()

This code is fine, but it's easier to review if we use snprintf()
instead of sprintf().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220318074723.GA6617@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: st21nfca: remove unnecessary skb check before kfree_skb()

The skb will be checked in kfree_skb(), so remove the outside check.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20220318072728.2659578-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '40GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
40GbE Intel Wired LAN Driver Updates 2022-03-17

This series contains updates to i40e and igb drivers.

Tom Rix moves a conversion to little endian to occur only when the
value is used for i40e. He also zeros out a structure to resolve
possible use of garbage value for igb as reported by clang.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igb: zero hwtstamp by default
i40e: little endian only valid checksums
====================

Link: https://lore.kernel.org/r/20220317160236.3534321-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-net-next-2022-03-18' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

- Add support for Asus TF103C
- Add support for Realtek RTL8852B
- Add support for Realtek RTL8723BE
- Add WBS support to mt7921s

* tag 'for-net-next-2022-03-18' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (24 commits)
  Bluetooth: ath3k: remove superfluous header files
  Bluetooth: bcm203x: remove superfluous header files
  Bluetooth: hci_bcm: Add the Asus TF103C to the bcm_broken_irq_dmi_table
  Bluetooth: mt7921s: Add WBS support
  Bluetooth: mt7921s: Add .btmtk_get_codec_config_data
  Bluetooth: mt7921s: Add .get_data_path_id
  Bluetooth: mt7921s: Set HCI_QUIRK_VALID_LE_STATES
  Bluetooth: btmtksdio: Fix kernel oops in btmtksdio_interrupt
  Bluetooth: btmtkuart: fix error handling in mtk_hci_wmt_sync()
  Bluetooth: call hci_le_conn_failed with hdev lock in hci_le_conn_failed
  Bluetooth: Send AdvMonitor Dev Found for all matched devices
  Bluetooth: msft: Clear tracked devices on resume
  Bluetooth: fix incorrect nonblock bitmask in bt_sock_wait_ready()
  Bluetooth: Don't assign twice the same value
  Bluetooth: btrtl: Add support for RTL8852B
  Bluetooth: hci_uart: add missing NULL check in h5_enqueue
  Bluetooth: Fix use after free in hci_send_acl
  Bluetooth: btusb: Use quirk to skip HCI_FLT_CLEAR_ALL on fake CSR controllers
  Bluetooth: hci_sync: Add a new quirk to skip HCI_FLT_CLEAR_ALL
  Bluetooth: btmtkuart: fix the conflict between mtk and msft vendor event
  ...
====================

Link: https://lore.kernel.org/r/20220318224752.1477292-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qlcnic: remove redundant assignment to variable index

Variable index is being assigned a value that is never read, it is being
re-assigned later in a following for-loop. The assignment is redundant
and can be removed.

Cleans up clang scan build warning:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c:1358:17: warning:
Although the value stored to 'index' is used in the enclosing expression,
the value is never actually read from 'index' [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220318012035.89482-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atl1c: remove redundant assignment to variable size

Variable sie is being assigned a value that is never read. The
The assignment is redundant and can be removed.

Cleans up clang scan build warning:
drivers/net/ethernet/atheros/atl1c/atl1c_main.c:1054:22: warning:
Although the value stored to 'size' is used in the enclosing
expression, the value is never actually read from 'size'
[deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220318005021.82073-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: send ADD_ADDR echo before create subflows

In some corner cases, the peer handing an incoming ADD_ADDR option, can
receive a retransmitted ADD_ADDR for the same address before the subflow
creation completes.

We can avoid the above issue by generating and sending the ADD_ADDR echo
before starting the MPJ subflow connection.

This slightly changes the behaviour of the packetdrill tests as the
ADD_ADDR echo packet is sent earlier.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20220317221444.426335-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Remove unnecessary brackets around CONFIG_AF_UNIX_OOB.

Let's remove unnecessary brackets around CONFIG_AF_UNIX_OOB.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/r/20220317032308.65372-1-kuniyu@amazon.co.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: HTB, remove unused function declaration

There is no function mlx5e_get_sq(), remove the declaration.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Statify function mlx5_cmd_trigger_completions

Starting from commit
4cab346bcf74 ("net/mlx5: No command allowed when command interface is not ready"),
no calls to mlx5_cmd_trigger_completions() are external to cmd.c anymore.
Make it a static function.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Remove MLX5E_XDP_TX_DS_COUNT

After introducing multi-buffer XDP_TX, the MLX5E_XDP_TX_DS_COUNT define
became misleading. It's no longer the DS count of an XDP_TX WQE, this
WQE can be longer because of fragments.

As this define is only used at one place in mlx5e_open_xdpsq(), it's
also not very useful anymore. This commit removes the define and puts
the calculation of ds_count for prefilled single-fragment WQEs inline.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Permit XDP with non-linear legacy RQ

Now that legacy RQ implements XDP in the non-linear mode, stop blocking
this configuration. Allow non-linear mode only for programs aware of
multi buffer.

XDP performance with linear mode RQ hasn't changed.

Baseline (MTU 1500, TX MPWQE, legacy RQ, single core):
60-byte packets, XDP_DROP: 11.25 Mpps
60-byte packets, XDP_TX: 9.0 Mpps
60-byte packets, XDP_PASS: 668 kpps

Multi buffer (MTU 9000, TX MPWQE, legacy RQ, single core):
60-byte packets, XDP_DROP: 10.1 Mpps
60-byte packets, XDP_TX: 6.6 Mpps
60-byte packets, XDP_PASS: 658 kpps
8900-byte packets, XDP_DROP: 769 kpps (100% of sent packets)
8900-byte packets, XDP_TX: 674 kpps (100% of sent packets)
8900-byte packets, XDP_PASS: 637 kpps

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Support multi buffer XDP_TX

This commit enables passing multi buffer XDP frames to the TX handlers
on XDP_TX. Fragments are DMA synchronized to the device and queued to
the xdpi_fifo for a subsequent unmapping.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Unindent the else-block in mlx5e_xmit_xdp_buff

The next commit will add more indentation levels to mlx5e_xmit_xdp_buff.
To keep indentation minimal, unindent the else-block of the if-statement
by doing an early return.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Implement sending multi buffer XDP frames

xmit_xdp_frame is extended to support sending fragmented XDP frames. The
next commit will start using this functionality.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Don't prefill WQEs in XDP SQ in the multi buffer mode

When MPWQE is disabled, mlx5e_open_xdpsq() prefills the common fields of
WQEs in the XDP SQ to save time when sending packets.
mlx5e_xmit_xdp_frame() runs on the prefilled fields, however, sending
multi buffer XDP frames would require changing some of these fields on a
per-packet basis. Besides that, mlx5e_xmit_xdp_frame() will be used as a
fallback to send multi buffer XDP frames when MPWQE is enabled (MPWQE
can only handle linear packets).

In order to prepare for XDP multi buffer support, this commit introduces
a mode for mlx5e_xmit_xdp_frame() that fills all the fields itself.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Remove assignment of inline_hdr.sz on XDP TX

When MPWQE is disabled, mlx5e_open_xdpsq prefills the common fields of
WQEs in the XDP SQ to save time when sending packets. One of such fields
is eseg->inline_hdr.sz, which can be either 0 or MLX5E_XDP_MIN_INLINE,
depending on the inline mode of the SQ.

The inline mode can't change during the lifetime of the SQ, so setting
this field again in mlx5e_xmit_xdp_frame is redundant. Moreover, the
xmit function only sets it to MLX5E_XDP_MIN_INLINE, but not to 0 in the
other case.

This commit removes the redundant assignment in mlx5e_xmit_xdp_frame.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Move mlx5e_xdpi_fifo_push out of xmit_xdp_frame

The implementations of xmit_xdp_frame get the xdpi parameter of type
struct mlx5e_xdp_info for the sole purpose of calling
mlx5e_xdpi_fifo_push() on success.

This commit moves this call outside of xmit_xdp_frame, shifting this
responsibility to the caller. It will allow more fine-grained handling
of XDP info for cases when an xdp_frame is fragmented.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Store DMA address inside struct page

Use page_pool_set_dma_addr() to store the DMA address of a page inside
struct page, in order to avoid passing struct mlx5e_dma_info to XDP
handlers. Previously, struct mlx5e_dma_info was used to pass both the
DMA address and the page, and it worked well for the single-fragment
case.

When XDP multi buffer is in use, and a fragmented xdp_frame has to be
transmitted, the driver needs to know the DMA addresses of fragments,
however, the array of fragments in struct skb_shared_info doesn't
contain them. In order to pass the DMA addresses, the driver puts them
into struct page itself, which is accessible from the array of fragments
in struct skb_shared_info. The existing XDP handlers are modified to
remove the dependency on struct mlx5e_dma_info.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ

This commit adds XDP multi buffer support to the RX path in the
non-linear legacy RQ mode. mlx5e_xdp_handle is called from
mlx5e_skb_from_cqe_nonlinear.

XDP_TX action for fragmented XDP frames is not yet supported and
blocked.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Use page-sized fragments with XDP multi buffer

The implementation of XDP in mlx5e assumes that the frame size is equal
to the page size. Force this limitation in the non-linear mode for XDP
multi buffer.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP

XDP multi buffer implementation in the kernel assumes that all fragments
have the same size. bpf_xdp_frags_increase_tail uses this assumption to
get the size of the last fragment, and __xdp_build_skb_from_frame uses
it to calculate truesize as nr_frags * xdpf->frame_sz.

The current implementation of mlx5e uses fragments of different size in
non-linear legacy RQ. Specifically, the last fragment can be larger than
the others. It's an optimization for packets smaller than MTU.

This commit adapts mlx5e to the kernel limitations and makes it use
fragments of the same size, in order to add support for XDP multi
buffer. The change is applied only if XDP is active, otherwise the old
optimization still applies.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Prepare non-linear legacy RQ for XDP multi buffer support

mlx5e_skb_from_cqe_nonlinear creates an xdp_buff first, putting the
first fragment as the linear part, and the rest of fragments as
fragments to struct skb_shared_info in the tailroom. Then it creates an
SKB in place, based on the xdp_buff. The XDP program is not called in
this commit yet.

This commit contains no functional change, except the SKB is built over
the whole frag_stride of the first fragment, instead of the minimal size
required (headroom, data and skb_shared_info).

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net: set default rss queues num to physical cores / 2

Network drivers can call to netif_get_num_default_rss_queues to get the
default number of receive queues to use. Right now, this default number
is min(8, num_online_cpus()).

Instead, as suggested by Jakub, use the number of physical cores divided
by 2 as a way to avoid wasting CPU resources and to avoid using both CPU
threads, but still allowing to scale for high-end processors with many
cores.

As an exception, select 2 queues for processors with 2 cores, because
otherwise it won't take any advantage of RSS despite being SMP capable.

Tested: Processor Intel Xeon E5-2620 (2 sockets, 6 cores/socket, 2
threads/core). NIC Broadcom NetXtreme II BCM57810 (10GBps). Ran some
tests with `perf stat iperf3 -R`, with parallelisms of 1, 8 and 24,
getting the following results:
- Number of queues: 6 (instead of 8)
- Network throughput: not affected
- CPU usage: utilized 0.05-0.12 CPUs more than before (having 24 CPUs
this is only 0.2-0.5% higher)
- Reduced the number of context switches by 7-50%, being more noticeable
when using a higher number of parallel threads.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Link: https://lore.kernel.org/r/20220315091832.13873-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Bluetooth: ath3k: remove superfluous header files

ath3k.c hasn't use any macro or function declared in linux/device.h.
Thus, these files can be removed from ath3k.c safely without
affecting the compilation of the ./drivers/bluetooth module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: bcm203x: remove superfluous header files

bcm203x.c hasn't use any macro or function declared in linux/atomic.h.
Thus, these files can be removed from bcm203x.c safely without
affecting the compilation of the ./drivers/bluetooth module

Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: hci_bcm: Add the Asus TF103C to the bcm_broken_irq_dmi_table

The DSDT for the Asus TF103C specifies a IOAPIC IRQ for the HCI -> host IRQ
but this is not correct. Unlike the previous entries in the table, this
time the correct GPIO to use instead is known; and the TF103C is battery
powered making runtime-pm support more important.

Extend the bcm_broken_irq_dmi_table mechanism to allow specifying the right
GPIO instead of just always disabling runtime-pm and add an entry to it for
the Asus TF103C.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: mt7921s: Add WBS support

It is time to add wide band speech (WBS) support.

Reviewed-by: Mark Chen <markyawenchen@gmail.com>
Co-developed-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Yake Yang <yake.yang@mediatek.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: mt7921s: Add .btmtk_get_codec_config_data

add .btmtk_get_codec_config_data to get codec configuration data.

In HFP offload usecase, controllers need to be set codec details before
opening SCO. This callback function is used to fetch vendor specific codec
config data.

This is a preliminary patch to add the WBS support to the MT7921 driver.

Reviewed-by: Mark Chen <markyawenchen@gmail.com>
Co-developed-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Yake Yang <yake.yang@mediatek.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: mt7921s: Add .get_data_path_id

Add .get_data_path_id to fetch data_path_id for MT7921 to support HFP
offload use case.

This is a preliminary patch to add the WBS support to the MT7921 driver.

Reviewed-by: Mark Chen <markyawenchen@gmail.com>
Co-developed-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Yake Yang <yake.yang@mediatek.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: mt7921s: Set HCI_QUIRK_VALID_LE_STATES

The patch set HCI_QUIRK_VALID_LE_STATES to be consistent with the btusb for
MT7921 and is required for the likes of experimental LE simultaneous roles.

Reviewed-by: Mark Chen <markyawenchen@gmail.com>
Co-developed-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Yake Yang <yake.yang@mediatek.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>