platform/kernel/linux-starfive.git
2 years agonet: dsa: add support for phylink mac_select_pcs()
Russell King (Oracle) [Thu, 17 Feb 2022 18:30:35 +0000 (18:30 +0000)]
net: dsa: add support for phylink mac_select_pcs()

Add DSA support for the phylink mac_select_pcs() method so DSA drivers
can return provide phylink with the appropriate PCS for the PHY
interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ethernet: xilinx: cleanup comments
Tom Rix [Thu, 17 Feb 2022 16:05:18 +0000 (08:05 -0800)]
net: ethernet: xilinx: cleanup comments

Remove the second 'the'.
Replacements:
endiannes to endianness
areconnected to are connected
Mamagement to Management
undoccumented to undocumented
Xilink to Xilinx
strucutre to structure

Change kernel-doc comment style to c style for
/* Management ...

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: gro: Fix a 'directive in macro's argument list' sparse warning
Gal Pressman [Thu, 17 Feb 2022 08:07:55 +0000 (10:07 +0200)]
net: gro: Fix a 'directive in macro's argument list' sparse warning

Following the cited commit, sparse started complaining about:
../include/net/gro.h:58:1: warning: directive in macro's argument list
../include/net/gro.h:59:1: warning: directive in macro's argument list

Fix that by moving the defines out of the struct_group() macro.

Fixes: de5a1f3ce4c8 ("net: gro: minor optimization for dev_gro_receive()")
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agos390/qeth: Remove redundant 'flush_workqueue()' calls
Xu Wang [Wed, 16 Feb 2022 07:51:55 +0000 (07:51 +0000)]
s390/qeth: Remove redundant 'flush_workqueue()' calls

'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20220216075155.940-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: delete unused exported symbols for ethtool PHY stats
Vladimir Oltean [Wed, 16 Feb 2022 19:37:26 +0000 (21:37 +0200)]
net: dsa: delete unused exported symbols for ethtool PHY stats

Introduced in commit cf963573039a ("net: dsa: Allow providing PHY
statistics from CPU port"), it appears these were never used.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220216193726.2926320-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: add sanity check in proto_register()
Eric Dumazet [Wed, 16 Feb 2022 17:18:01 +0000 (09:18 -0800)]
net: add sanity check in proto_register()

prot->memory_allocated should only be set if prot->sysctl_mem
is also set.

This is a followup of commit 25206111512d ("crypto: af_alg - get
rid of alg_memory_allocated").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220216171801.3604366-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: ll_temac: Use GFP_KERNEL instead of GFP_ATOMIC when possible
Christophe JAILLET [Wed, 16 Feb 2022 20:16:16 +0000 (21:16 +0100)]
net: ll_temac: Use GFP_KERNEL instead of GFP_ATOMIC when possible

XTE_MAX_JUMBO_FRAME_SIZE is over 9000 bytes and the default value for
'rx_bd_num' is RX_BD_NUM_DEFAULT (i.e. 1024)

So this loop allocates more than 9 Mo of memory.

Previous memory allocations in this function already use GFP_KERNEL, so
use __netdev_alloc_skb_ip_align() and an explicit GFP_KERNEL instead of a
implicit GFP_ATOMIC.

This gives more opportunities of successful allocation.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/694abd65418b2b3974106a82d758e3474c65ae8f.1645042560.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: nixge: Use GFP_KERNEL instead of GFP_ATOMIC when possible
Christophe JAILLET [Wed, 16 Feb 2022 20:38:11 +0000 (21:38 +0100)]
net: nixge: Use GFP_KERNEL instead of GFP_ATOMIC when possible

NIXGE_MAX_JUMBO_FRAME_SIZE is over 9000 bytes and RX_BD_NUM 128.

So this loop allocates more than 1 Mo of memory.

Previous memory allocations in this function already use GFP_KERNEL, so
use __netdev_alloc_skb_ip_align() and an explicit GFP_KERNEL instead of a
implicit GFP_ATOMIC.

This gives more opportunities of successful allocation.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/28d2c8e05951ad02a57eb48333672947c8bb4f81.1645043881.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'mptcp-selftest-fine-tuning-and-cleanup'
Jakub Kicinski [Fri, 18 Feb 2022 04:00:02 +0000 (20:00 -0800)]
Merge branch 'mptcp-selftest-fine-tuning-and-cleanup'

Mat Martineau says:

====================
mptcp: Selftest fine-tuning and cleanup

Patch 1 adjusts the mptcp selftest timeout to account for slow machines
running debug builds.

Patch 2 simplifies one test function.

Patches 3-6 do some cleanup, like deleting unused variables and avoiding
extra work when only printing usage information.

Patch 7 improves the checksum tests by utilizing existing checksum MIBs.
====================

Link: https://lore.kernel.org/r/20220218030311.367536-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: add csum mib check for mptcp_connect
Geliang Tang [Fri, 18 Feb 2022 03:03:11 +0000 (19:03 -0800)]
selftests: mptcp: add csum mib check for mptcp_connect

This patch added the data checksum error mib counters check for the
script mptcp_connect.sh when the data checksum is enabled.

In do_transfer(), got the mib counters twice, before and after running
the mptcp_connect commands. The latter minus the former is the actual
number of the data checksum mib counter.

The output looks like this:

ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP   (duration    86ms) [ OK ]
ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP   (duration    66ms) [ FAIL ]
server got 1 data checksum error[s]

Fixes: 94d66ba1d8e48 ("selftests: mptcp: enable checksum in mptcp_connect.sh")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/255
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: join: check for tools only if needed
Matthieu Baerts [Fri, 18 Feb 2022 03:03:10 +0000 (19:03 -0800)]
selftests: mptcp: join: check for tools only if needed

To allow showing the 'help' menu even if these tools are not available.

While at it, also avoid launching the command then checking $?. Instead,
the check is directly done in the 'if'.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: join: create tmp files only if needed
Matthieu Baerts [Fri, 18 Feb 2022 03:03:09 +0000 (19:03 -0800)]
selftests: mptcp: join: create tmp files only if needed

These tmp files will only be created when a test will be launched.

This avoid 'dd' output when '-h' is used for example.

While at it, also avoid creating netns that will be removed when
starting the first test.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: join: remove unused vars
Matthieu Baerts [Fri, 18 Feb 2022 03:03:08 +0000 (19:03 -0800)]
selftests: mptcp: join: remove unused vars

Shellcheck found that these variables were set but never used.

Note that rndh is no longer prefixed with '0-' but it doesn't change
anything.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: join: exit after usage()
Matthieu Baerts [Fri, 18 Feb 2022 03:03:07 +0000 (19:03 -0800)]
selftests: mptcp: join: exit after usage()

With an error if it is an unknown option.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: simplify pm_nl_change_endpoint
Geliang Tang [Fri, 18 Feb 2022 03:03:06 +0000 (19:03 -0800)]
selftests: mptcp: simplify pm_nl_change_endpoint

This patch simplified pm_nl_change_endpoint(), using id-based address
lookups only. And dropped the fragile way of parsing 'addr' and 'id'
from the output of pm_nl_show_endpoints().

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoselftests: mptcp: increase timeout to 20 minutes
Matthieu Baerts [Fri, 18 Feb 2022 03:03:05 +0000 (19:03 -0800)]
selftests: mptcp: increase timeout to 20 minutes

With the increase number of tests, one CI instance, using a debug kernel
config and not recent hardware, takes around 10 minutes to execute the
slowest MPTCP test: mptcp_join.sh.

Even if most CIs don't take that long to execute these tests --
typically max 10 minutes to run all selftests -- it will help some of
them if the timeout is increased.

The timeout could be disabled but it is always good to have an extra
safeguard, just in case.

Please note that on slow public CIs with kernel debug settings, it has
been observed it can easily take up to 45 minutes to execute all tests
in this very slow environment with other jobs running in parallel.
The slowest test, mptcp_join.sh takes ~30 minutes in this case.

In such environments, the selftests timeout set in the 'settings' file
is disabled because this environment is known as being exceptionnally
slow. It has been decided not to take such exceptional environments into
account and set the timeout to 20min.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Jakub Kicinski [Fri, 18 Feb 2022 01:23:51 +0000 (17:23 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2022-02-17

We've added 29 non-merge commits during the last 8 day(s) which contain
a total of 34 files changed, 1502 insertions(+), 524 deletions(-).

The main changes are:

1) Add BTFGen support to bpftool which allows to use CO-RE in kernels without
   BTF info, from Mauricio Vásquez, Rafael David Tinoco, Lorenzo Fontana and
   Leonardo Di Donato. (Details: https://lpc.events/event/11/contributions/948/)

2) Prepare light skeleton to be used in both kernel module and user space
   and convert bpf_preload.ko to use light skeleton, from Alexei Starovoitov.

3) Rework bpftool's versioning scheme and align with libbpf's version number;
   also add linked libbpf version info to "bpftool version", from Quentin Monnet.

4) Add minimal C++ specific additions to bpftool's skeleton codegen to
   facilitate use of C skeletons in C++ applications, from Andrii Nakryiko.

5) Add BPF verifier sanity check whether relative offset on kfunc calls overflows
   desc->imm and reject the BPF program if the case, from Hou Tao.

6) Fix libbpf to use a dynamically allocated buffer for netlink messages to
   avoid receiving truncated messages on some archs, from Toke Høiland-Jørgensen.

7) Various follow-up fixes to the JIT bpf_prog_pack allocator, from Song Liu.

8) Various BPF selftest and vmtest.sh fixes, from Yucong Sun.

9) Fix bpftool pretty print handling on dumping map keys/values when no BTF
   is available, from Jiri Olsa and Yinjun Zhang.

10) Extend XDP frags selftest to check for invalid length, from Lorenzo Bianconi.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (29 commits)
  bpf: bpf_prog_pack: Set proper size before freeing ro_header
  selftests/bpf: Fix crash in core_reloc when bpftool btfgen fails
  selftests/bpf: Fix vmtest.sh to launch smp vm.
  libbpf: Fix memleak in libbpf_netlink_recv()
  bpftool: Fix C++ additions to skeleton
  bpftool: Fix pretty print dump for maps without BTF loaded
  selftests/bpf: Test "bpftool gen min_core_btf"
  bpftool: Gen min_core_btf explanation and examples
  bpftool: Implement btfgen_get_btf()
  bpftool: Implement "gen min_core_btf" logic
  bpftool: Add gen min_core_btf command
  libbpf: Expose bpf_core_{add,free}_cands() to bpftool
  libbpf: Split bpf_core_apply_relo()
  bpf: Reject kfunc calls that overflow insn->imm
  selftests/bpf: Add Skeleton templated wrapper as an example
  bpftool: Add C++-specific open/load/etc skeleton wrappers
  selftests/bpf: Fix GCC11 compiler warnings in -O2 mode
  bpftool: Fix the error when lookup in no-btf maps
  libbpf: Use dynamically allocated buffer when receiving netlink messages
  libbpf: Fix libbpf.map inheritance chain for LIBBPF_0.7.0
  ...
====================

Link: https://lore.kernel.org/r/20220217232027.29831-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agobpf: bpf_prog_pack: Set proper size before freeing ro_header
Song Liu [Thu, 17 Feb 2022 18:30:01 +0000 (10:30 -0800)]
bpf: bpf_prog_pack: Set proper size before freeing ro_header

bpf_prog_pack_free() uses header->size to decide whether the header
should be freed with module_memfree() or the bpf_prog_pack logic.
However, in kvmalloc() failure path of bpf_jit_binary_pack_alloc(),
header->size is not set yet. As a result, bpf_prog_pack_free() may treat
a slice of a pack as a standalone kvmalloc'd header and call
module_memfree() on the whole pack. This in turn causes use-after-free by
other users of the pack.

Fix this by setting ro_header->size before freeing ro_header.

Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com
Reported-by: syzbot+ecb1e7e51c52f68f7481@syzkaller.appspotmail.com
Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220217183001.1876034-1-song@kernel.org
2 years agoMerge branch 'prestera-route-offloading'
David S. Miller [Thu, 17 Feb 2022 20:45:31 +0000 (20:45 +0000)]
Merge branch 'prestera-route-offloading'

Yevhen Orlov says:

====================
net: marvell: prestera: add basic routes offloading

Add support for blackhole and local routes for Marvell Prestera driver.
Subscribe on fib notifications and handle add/del.

Add features:
 - Support route adding.
   e.g.: "ip route add blackhole 7.7.1.1/24"
   e.g.: "ip route add local 9.9.9.9 dev sw1p30"
 - Support "rt_trap", "rt_offload", "rt_offload_failed" flags
 - Handle case, when route in "local" table overlaps route in "main" table
   example:
        ip ro add blackhole 7.7.7.7
        ip ro add local 7.7.7.7 dev sw1p30
        # blackhole route will be deoffloaded. rt_offload flag disappeared

Limitations:
 - Only "blackhole" and "local" routes supported. "nexthop" routes is TRAP
   for now and will be implemented soon.
 - Only "local" and "main" tables supported
====================

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: marvell: prestera: handle fib notifications
Yevhen Orlov [Wed, 16 Feb 2022 01:05:57 +0000 (03:05 +0200)]
net: marvell: prestera: handle fib notifications

For now we support only TRAP or DROP, so we can offload only "local" or
"blackhole" routes.
Nexthop routes is TRAP for now. Will be implemented soon.

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: marvell: prestera: add hardware router objects accounting for lpm
Yevhen Orlov [Wed, 16 Feb 2022 01:05:56 +0000 (03:05 +0200)]
net: marvell: prestera: add hardware router objects accounting for lpm

Add new router_hw object "fib_node". For now it support only DROP and
TRAP mode.

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: marvell: prestera: Add router LPM ABI
Yevhen Orlov [Wed, 16 Feb 2022 01:05:55 +0000 (03:05 +0200)]
net: marvell: prestera: Add router LPM ABI

Add functions to create/delete lpm entry in hw.
prestera_hw_lpm_add() take index of allocated virtual router.
Also it takes grp_id, which is index of allocated nexthop group.
ABI to create nexthop group will be added soon.

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 17 Feb 2022 20:22:28 +0000 (12:22 -0800)]
Merge git://git./linux/kernel/git/netdev/net

Fast path bpf marge for some -next work.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 17 Feb 2022 20:01:54 +0000 (12:01 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2022-02-17

We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 8 files changed, 119 insertions(+), 15 deletions(-).

The main changes are:

1) Add schedule points in map batch ops, from Eric.

2) Fix bpf_msg_push_data with len 0, from Felix.

3) Fix crash due to incorrect copy_map_value, from Kumar.

4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.

5) Fix a bpf_timer initialization issue with clang, from Yonghong.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add schedule points in batch ops
  bpf: Fix crash due to out of bounds access into reg2btf_ids.
  selftests: bpf: Check bpf_msg_push_data return value
  bpf: Fix a bpf_timer initialization issue
  bpf: Emit bpf_timer in vmlinux BTF
  selftests/bpf: Add test for bpf_timer overwriting crash
  bpf: Fix crash due to incorrect copy_map_value
  bpf: Do not try bpf_msg_push_data with len 0
====================

Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 17 Feb 2022 19:44:20 +0000 (11:44 -0800)]
Merge git://git./linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 17 Feb 2022 19:33:59 +0000 (11:33 -0800)]
Merge tag 'net-5.17-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and netfilter.

  Current release - regressions:

   - dsa: lantiq_gswip: fix use after free in gswip_remove()

   - smc: avoid overwriting the copies of clcsock callback functions

  Current release - new code bugs:

   - iwlwifi:
      - fix use-after-free when no FW is present
      - mei: fix the pskb_may_pull check in ipv4
      - mei: retry mapping the shared area
      - mvm: don't feed the hardware RFKILL into iwlmei

  Previous releases - regressions:

   - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()

   - tipc: fix wrong publisher node address in link publications

   - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
     assertion

   - bgmac: make idm and nicpm resource optional again

   - atl1c: fix tx timeout after link flap

  Previous releases - always broken:

   - vsock: remove vsock from connected table when connect is
     interrupted by a signal

   - ping: change destination interface checks to match raw sockets

   - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
     semantics (and null-deref) after SO_RESERVE_MEM was added

   - ipv6: make exclusive flowlabel checks per-netns

   - bonding: force carrier update when releasing slave

   - sched: limit TC_ACT_REPEAT loops

   - bridge: multicast: notify switchdev driver whenever MC processing
     gets disabled because of max entries reached

   - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found

   - iwlwifi: fix locking when "HW not ready"

   - phy: mediatek: remove PHY mode check on MT7531

   - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN

   - dsa: lan9303:
      - fix polarity of reset during probe
      - fix accelerated VLAN handling"

* tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  bonding: force carrier update when releasing slave
  nfp: flower: netdev offload check for ip6gretap
  ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
  ipv4: fix data races in fib_alias_hw_flags_set
  net: dsa: lan9303: add VLAN IDs to master device
  net: dsa: lan9303: handle hwaccel VLAN tags
  vsock: remove vsock from connected table when connect is interrupted by a signal
  Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
  ping: fix the dif and sdif check in ping_lookup
  net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
  net: sched: limit TC_ACT_REPEAT loops
  tipc: fix wrong notification node addresses
  net: dsa: lantiq_gswip: fix use after free in gswip_remove()
  ipv6: per-netns exclusive flowlabel checks
  net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
  CDC-NCM: avoid overflow in sanity checking
  mctp: fix use after free
  net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
  bonding: fix data-races around agg_select_timer
  dpaa2-eth: Initialize mutex used in one step timestamping path
  ...

2 years agoselftests/bpf: Fix crash in core_reloc when bpftool btfgen fails
Yucong Sun [Thu, 17 Feb 2022 18:02:10 +0000 (10:02 -0800)]
selftests/bpf: Fix crash in core_reloc when bpftool btfgen fails

Avoid unnecessary goto cleanup, as there is nothing to clean up.

Signed-off-by: Yucong Sun <fallentree@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220217180210.2981502-1-fallentree@fb.com
2 years agoselftests/bpf: Fix vmtest.sh to launch smp vm.
Yucong Sun [Thu, 17 Feb 2022 15:52:12 +0000 (07:52 -0800)]
selftests/bpf: Fix vmtest.sh to launch smp vm.

Fix typo in vmtest.sh to make sure it launch proper vm with 8 cpus.

Signed-off-by: Yucong Sun <fallentree@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220217155212.2309672-1-fallentree@fb.com
2 years agobonding: force carrier update when releasing slave
Zhang Changzhong [Wed, 16 Feb 2022 14:18:08 +0000 (22:18 +0800)]
bonding: force carrier update when releasing slave

In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.

Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.

Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0

Fixes: ff59c4563a8d ("[PATCH] bonding: support carrier state for master")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agobpf: Add schedule points in batch ops
Eric Dumazet [Thu, 17 Feb 2022 18:19:02 +0000 (10:19 -0800)]
bpf: Add schedule points in batch ops

syzbot reported various soft lockups caused by bpf batch operations.

 INFO: task kworker/1:1:27 blocked for more than 140 seconds.
 INFO: task hung in rcu_barrier

Nothing prevents batch ops to process huge amount of data,
we need to add schedule points in them.

Note that maybe_wait_bpf_programs(map) calls from
generic_map_delete_batch() can be factorized by moving
the call after the loop.

This will be done later in -next tree once we get this fix merged,
unless there is strong opinion doing this optimization sooner.

Fixes: aa2e93b8e58e ("bpf: Add generic support for update and delete batch ops")
Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Brian Vazquez <brianvv@google.com>
Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com
2 years agofs/file_table: fix adding missing kmemleak_not_leak()
Luis Chamberlain [Tue, 15 Feb 2022 02:08:28 +0000 (18:08 -0800)]
fs/file_table: fix adding missing kmemleak_not_leak()

Commit b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl
to its own file") fixed a regression, however it failed to add a
kmemleak_not_leak().

Fixes: b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file")
Reported-by: Tong Zhang <ztong0001@gmail.com>
Cc: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm...
Linus Torvalds [Thu, 17 Feb 2022 18:06:09 +0000 (10:06 -0800)]
Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix corrupt inject files when only last branch option is enabled with
   ARM CoreSight ETM

 - Fix use-after-free for realloc(..., 0) in libsubcmd, found by gcc 12

 - Defer freeing string after possible strlen() on it in the BPF loader,
   found by gcc 12

 - Avoid early exit in 'perf trace' due SIGCHLD from non-workload
   processes

 - Fix arm64 perf_event_attr 'perf test's wrt --call-graph
   initialization

 - Fix libperf 32-bit build for 'perf test' wrt uint64_t printf

 - Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to
   the CPU iterator

 - Sync linux/perf_event.h UAPI with the kernel sources

 - Update Jiri Olsa's email address in MAINTAINERS

* tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf bpf: Defer freeing string after possible strlen() on it
  perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
  libsubcmd: Fix use-after-free for realloc(..., 0)
  libperf: Fix perf_cpu_map__for_each_cpu macro
  perf cs-etm: Fix corrupt inject files when only last branch option is enabled
  perf cs-etm: No-op refactor of synth opt usage
  libperf: Fix 32-bit build for tests uint64_t printf
  tools headers UAPI: Sync linux/perf_event.h with the kernel sources
  perf trace: Avoid early exit due SIGCHLD from non-workload processes
  MAINTAINERS: Update Jiri's email address

2 years agoMerge tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof...
Linus Torvalds [Thu, 17 Feb 2022 17:54:00 +0000 (09:54 -0800)]
Merge tag 'modules-5.17-rc5' of git://git./linux/kernel/git/mcgrof/linux

Pull module fix from Luis Chamberlain:
 "Fixes module decompression when CONFIG_SYSFS=n

  The only fix trickled down for v5.17-rc cycle so far is the fix for
  module decompression when CONFIG_SYSFS=n. This was reported through
  0-day"

* tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: fix building with sysfs disabled

2 years agonfp: flower: netdev offload check for ip6gretap
Danie du Toit [Thu, 17 Feb 2022 12:48:20 +0000 (14:48 +0200)]
nfp: flower: netdev offload check for ip6gretap

IPv6 GRE tunnels are not being offloaded, this is caused by a missing
netdev offload check. The functionality of IPv6 GRE tunnel offloading
was previously added but this check was not included. Adding the
ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly.

Fixes: f7536ffb0986 ("nfp: flower: Allow ipv6gretap interface for offloading")
Signed-off-by: Danie du Toit <danie.dutoit@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
Eric Dumazet [Wed, 16 Feb 2022 17:32:17 +0000 (09:32 -0800)]
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt

Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to gi6->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().

BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt

read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
 fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
 fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
 fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
 fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
 __ip6_del_rt net/ipv6/route.c:3876 [inline]
 ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
 __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
 ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
 __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
 ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
 inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
 __sock_release net/socket.c:650 [inline]
 sock_close+0x6c/0x150 net/socket.c:1318
 __fput+0x295/0x520 fs/file_table.c:280
 ____fput+0x11/0x20 fs/file_table.c:313
 task_work_run+0x8e/0x110 kernel/task_work.c:164
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
 exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
 syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
 do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x44/0xae

write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
 fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
 nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
 nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
 nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
 nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
 nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

value changed: 0x22 -> 0x2a

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 0c5fcf9e249e ("IPv6: Add "offload failed" indication to routes")
Fixes: bb3c4ab93e44 ("ipv6: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv4: fix data races in fib_alias_hw_flags_set
Eric Dumazet [Wed, 16 Feb 2022 17:32:16 +0000 (09:32 -0800)]
ipv4: fix data races in fib_alias_hw_flags_set

fib_alias_hw_flags_set() can be used by concurrent threads,
and is only RCU protected.

We need to annotate accesses to following fields of struct fib_alias:

    offload, trap, offload_failed

Because of READ_ONCE()WRITE_ONCE() limitations, make these
field u8.

BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set

read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
 fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
 fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

value changed: 0x00 -> 0x02

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 90b93f1b31f8 ("ipv4: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: lan9303: add VLAN IDs to master device
Mans Rullgard [Wed, 16 Feb 2022 20:48:18 +0000 (20:48 +0000)]
net: dsa: lan9303: add VLAN IDs to master device

If the master device does VLAN filtering, the IDs used by the switch
must be added for any frames to be received.  Do this in the
port_enable() function, and remove them in port_disable().

Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: lan9303: handle hwaccel VLAN tags
Mans Rullgard [Wed, 16 Feb 2022 12:46:34 +0000 (12:46 +0000)]
net: dsa: lan9303: handle hwaccel VLAN tags

Check for a hwaccel VLAN tag on rx and use it if present.  Otherwise,
use __skb_vlan_pop() like the other tag parsers do.  This fixes the case
where the VLAN tag has already been consumed by the master.

Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomm: don't try to NUMA-migrate COW pages that have other uses
Linus Torvalds [Thu, 17 Feb 2022 16:57:47 +0000 (08:57 -0800)]
mm: don't try to NUMA-migrate COW pages that have other uses

Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:

 "All the details are in the bug, but the bottom line is that somehow,
  this patch causes corruption when the numa balancing feature is
  enabled AND we don't use process affinity AND we use GUP to pin pages
  so our accelerator can DMA to/from system memory.

  Either disabling numa balancing, using process affinity to bind to
  specific numa-node or reverting this patch causes the bug to
  disappear"

and Oded bisected the issue to commit 09854ba94c6a ("mm: do_wp_page()
simplification").

Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW.  But it appears it
does.  Suspicious.

However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical.  It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.

The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings migth be
paged out and the only reference to them would be in the page count.

Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.

Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information).  Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.

The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless.

Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215616
Link: https://lore.kernel.org/all/CAFCwf10eNmwq2wD71xjUhqkvv5+_pJMR1nPug2RqNDcFT4H86Q@mail.gmail.com/
Reported-and-tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agovsock: remove vsock from connected table when connect is interrupted by a signal
Seth Forshee [Thu, 17 Feb 2022 14:13:12 +0000 (08:13 -0600)]
vsock: remove vsock from connected table when connect is interrupted by a signal

vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.

Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.

Note for backporting: this patch requires d5afa82c977e ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.

Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoRevert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
Jonas Gorski [Wed, 16 Feb 2022 18:46:34 +0000 (10:46 -0800)]
Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"

This reverts commit 3710e80952cf2dc48257ac9f145b117b5f74e0a5.

Since idm_base and nicpm_base are still optional resources not present
on all platforms, this breaks the driver for everything except Northstar
2 (which has both).

The same change was already reverted once with 755f5738ff98 ("net:
broadcom: fix a mistake about ioremap resource").

So let's do it again.

Fixes: 3710e80952cf ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
[florian: Added comments to explain the resources are optional]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoipv6/addrconf: ensure addrconf_verify_rtnl() has completed
Eric Dumazet [Wed, 16 Feb 2022 18:20:37 +0000 (10:20 -0800)]
ipv6/addrconf: ensure addrconf_verify_rtnl() has completed

Before freeing the hash table in addrconf_exit_net(),
we need to make sure the work queue has completed,
or risk NULL dereference or UAF.

Thus, use cancel_delayed_work_sync() to enforce this.
We do not hold RTNL in addrconf_exit_net(), making this safe.

Fixes: 8805d13ff1b2 ("ipv6/addrconf: use one delayed work per netns")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220216182037.3742-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: allow out-of-order netdev unregistration
Jakub Kicinski [Tue, 15 Feb 2022 22:53:10 +0000 (14:53 -0800)]
net: allow out-of-order netdev unregistration

Sprinkle for each loops to allow netdevices to be unregistered
out of order, as their refs are released.

This prevents problems caused by dependencies between netdevs
which want to release references in their ->priv_destructor.
See commit d6ff94afd90b ("vlan: move dev_put into vlan_dev_uninit")
for example.

Eric has removed the only known ordering requirement in
commit c002496babfd ("Merge branch 'ipv6-loopback'")
so let's try this and see if anything explodes...

Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20220215225310.3679266-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: transition netdev reg state earlier in run_todo
Jakub Kicinski [Tue, 15 Feb 2022 22:53:09 +0000 (14:53 -0800)]
net: transition netdev reg state earlier in run_todo

In prep for unregistering netdevs out of order move the netdev
state validation and change outside of the loop.

While at it modernize this code and use WARN() instead of
pr_err() + dump_stack().

Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20220215225310.3679266-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agolibbpf: Fix memleak in libbpf_netlink_recv()
Andrii Nakryiko [Thu, 17 Feb 2022 07:39:58 +0000 (23:39 -0800)]
libbpf: Fix memleak in libbpf_netlink_recv()

Ensure that libbpf_netlink_recv() frees dynamically allocated buffer in
all code paths.

Fixes: 9c3de619e13e ("libbpf: Use dynamically allocated buffer when receiving netlink messages")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20220217073958.276959-1-andrii@kernel.org
2 years agoping: fix the dif and sdif check in ping_lookup
Xin Long [Wed, 16 Feb 2022 05:20:52 +0000 (00:20 -0500)]
ping: fix the dif and sdif check in ping_lookup

When 'ping' changes to use PING socket instead of RAW socket by:

   # sysctl -w net.ipv4.ping_group_range="0 100"

There is another regression caused when matching sk_bound_dev_if
and dif, RAW socket is using inet_iif() while PING socket lookup
is using skb->dev->ifindex, the cmd below fails due to this:

  # ip link add dummy0 type dummy
  # ip link set dummy0 up
  # ip addr add 192.168.111.1/24 dev dummy0
  # ping -I dummy0 192.168.111.1 -c1

The issue was also reported on:

  https://github.com/iputils/iputils/issues/104

But fixed in iputils in a wrong way by not binding to device when
destination IP is on device, and it will cause some of kselftests
to fail, as Jianlin noticed.

This patch is to use inet(6)_iif and inet(6)_sdif to get dif and
sdif for PING socket, and keep consistent with RAW socket.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
Daniele Palmas [Tue, 15 Feb 2022 11:13:35 +0000 (12:13 +0100)]
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990

Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990
0x1071 composition in order to avoid bind error.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ping6-SOL_IPV6'
David S. Miller [Thu, 17 Feb 2022 14:22:10 +0000 (14:22 +0000)]
Merge branch 'ping6-SOL_IPV6'

Jakub Kicinski says:

====================
net: ping6: support setting basic SOL_IPV6 options via cmsg

Support for IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG on ICMPv6
sockets and associated tests. I have no immediate plans to
implement IPV6_FLOWINFO and all the extension header stuff.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: basic test for IPV6_2292*
Jakub Kicinski [Thu, 17 Feb 2022 01:21:20 +0000 (17:21 -0800)]
selftests: net: basic test for IPV6_2292*

Add a basic test to make sure ping sockets don't crash
with IPV6_2292* options.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: test IPV6_HOPLIMIT
Jakub Kicinski [Thu, 17 Feb 2022 01:21:19 +0000 (17:21 -0800)]
selftests: net: test IPV6_HOPLIMIT

Test setting IPV6_HOPLIMIT via setsockopt and cmsg
across socket types.

Output without the kernel support (this series):

  Case HOPLIMIT ICMP cmsg - packet data returned 1, expected 0
  Case HOPLIMIT ICMP diff - packet data returned 1, expected 0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: test IPV6_TCLASS
Jakub Kicinski [Thu, 17 Feb 2022 01:21:18 +0000 (17:21 -0800)]
selftests: net: test IPV6_TCLASS

Test setting IPV6_TCLASS via setsockopt and cmsg
across socket types.

Output without the kernel support (this series):

  Case TCLASS ICMP cmsg - packet data returned 1, expected 0
  Case TCLASS ICMP cmsg - rejection returned 0, expected 1
  Case TCLASS ICMP diff - pass returned 1, expected 0
  Case TCLASS ICMP diff - packet data returned 1, expected 0
  Case TCLASS ICMP diff - rejection returned 0, expected 1

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoselftests: net: test IPV6_DONTFRAG
Jakub Kicinski [Thu, 17 Feb 2022 01:21:17 +0000 (17:21 -0800)]
selftests: net: test IPV6_DONTFRAG

Test setting IPV6_DONTFRAG via setsockopt and cmsg
across socket types.

Output without the kernel support (this series):

    Case DONTFRAG ICMP setsock returned 0, expected 1
    Case DONTFRAG ICMP cmsg returned 0, expected 1
    Case DONTFRAG ICMP both returned 0, expected 1
    Case DONTFRAG ICMP diff returned 0, expected 1
  FAIL - 4/24 cases failed

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ping6: support setting basic SOL_IPV6 options via cmsg
Jakub Kicinski [Thu, 17 Feb 2022 01:21:16 +0000 (17:21 -0800)]
net: ping6: support setting basic SOL_IPV6 options via cmsg

Support setting IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG
during sendmsg via SOL_IPV6 cmsgs.

tclass and dontfrag are init'ed from struct ipv6_pinfo in
ipcm6_init_sk(), while hlimit is inited to -1, so we need
to handle it being populated via cmsg explicitly.

Leave extension headers and flowlabel unimplemented.
Those are slightly more laborious to test and users
seem to primarily care about IPV6_TCLASS.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'switchdev-BRENTRY'
David S. Miller [Thu, 17 Feb 2022 14:17:10 +0000 (14:17 +0000)]
Merge branch 'switchdev-BRENTRY'

Vladimir Oltean says:

====================
kRemove BRENTRY checks from switchdev drivers

As discussed here:
https://patchwork.kernel.org/project/netdevbpf/patch/20220214233111.1586715-2-vladimir.oltean@nxp.com/#24738869

no switchdev driver makes use of VLAN port objects that lack the
BRIDGE_VLAN_INFO_BRENTRY flag. Notifying them in the first place rather
seems like an omission of commit 9c86ce2c1ae3 ("net: bridge: Notify
about bridge VLANs").

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag") that was just merged, the bridge no
longer notifies switchdev upon creation of these VLANs, so we can remove
the checks from drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ti: cpsw: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:52 +0000 (18:47 +0200)]
net: ti: cpsw: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ti: am65-cpsw-nuss: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:51 +0000 (18:47 +0200)]
net: ti: am65-cpsw-nuss: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: sparx5: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:50 +0000 (18:47 +0200)]
net: sparx5: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: lan966x: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:49 +0000 (18:47 +0200)]
net: lan966x: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agomlxsw: spectrum: remove guards against !BRIDGE_VLAN_INFO_BRENTRY
Vladimir Oltean [Wed, 16 Feb 2022 16:47:48 +0000 (18:47 +0200)]
mlxsw: spectrum: remove guards against !BRIDGE_VLAN_INFO_BRENTRY

Since commit 3116ad0696dd ("net: bridge: vlan: don't notify to switchdev
master VLANs without BRENTRY flag"), the bridge no longer emits
switchdev notifiers for VLANs that don't have the
BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
Remove them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ptp-over-udp-dsa'
David S. Miller [Thu, 17 Feb 2022 14:06:51 +0000 (14:06 +0000)]
Merge branch 'ptp-over-udp-dsa'

Vladimir Oltean says:

====================
Support PTP over UDP with the ocelot-8021q DSA tagging protocol

The alternative tag_8021q-based tagger for Ocelot switches, added here:
https://patchwork.kernel.org/project/netdevbpf/cover/20210129010009.3959398-1-olteanv@gmail.com/

gained support for PTP over L2 here:
https://patchwork.kernel.org/project/netdevbpf/cover/20210213223801.1334216-1-olteanv@gmail.com/

mostly as a minimum viable requirement. That PTP support was mostly
self-contained code that installed some rules to replicate PTP packets
on the CPU queue, in felix_setup_mmio_filtering().

However ocelot-8021q starts to look more interesting for general purpose
usage, so it is now time to reduce the technical debt by integrating the
PTP traps used by Felix for tag_8021q with the rest of the Ocelot driver.

There is further consolidation of traps to be done. The cookies used by
MRP traps overlap with the cookies used for tag_8021q PTP traps, so
those features could not be used at the same time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: tag_ocelot_8021q: calculate TX checksum in software for deferred packets
Vladimir Oltean [Wed, 16 Feb 2022 14:30:14 +0000 (16:30 +0200)]
net: dsa: tag_ocelot_8021q: calculate TX checksum in software for deferred packets

DSA inherits NETIF_F_CSUM_MASK from master->vlan_features, and the
expectation is that TX checksumming is offloaded and not done in
software.

Normally the DSA master takes care of this, but packets handled by
ocelot_defer_xmit() are a very special exception, because they are
actually injected into the switch through register-based MMIO. So the
DSA master is not involved at all for these packets => no one calculates
the checksum.

This allows PTP over UDP to work using the ocelot-8021q tagging
protocol.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: felix: update destinations of existing traps with ocelot-8021q
Vladimir Oltean [Wed, 16 Feb 2022 14:30:13 +0000 (16:30 +0200)]
net: dsa: felix: update destinations of existing traps with ocelot-8021q

Historically, the felix DSA driver has installed special traps such that
PTP over L2 works with the ocelot-8021q tagging protocol; commit
0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP
timestamping") has the details.

Then the ocelot switch library also gained more comprehensive support
for PTP traps through commit 96ca08c05838 ("net: mscc: ocelot: set up
traps for PTP packets").

Right now, PTP over L2 works using ocelot-8021q via the traps it has set
for itself, but nothing else does. Consolidating the two code blocks
would make ocelot-8021q gain support for PTP over L4 and tc-flower
traps, and at the same time avoid some code and TCAM duplication.

The traps are similar in intent, but different in execution, so some
explanation is required. The traps set up by felix_setup_mmio_filtering()
are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2
filter, and the IS2 is where the 'trap' action resides. The traps set up
by ocelot_trap_add(), on the other hand, have a single filter, in VCAP
IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure
that the hardcoded traps take precedence and cannot be overridden by the
Ocelot switch library.

So in principle, the PTP traps needed for ocelot-8021q in the Felix
driver can rely on ocelot_trap_add(), but the filters need to be patched
to account for a quirk that LS1028A has: the quirk_no_xtr_irq described
in commit 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP
timestamping"). Live-patching is done by iterating through the trap list
every time we know it has been updated, and transforming a trap into a
redirect + CPU copy if ocelot-8021q is in use.

Making the DSA ocelot-8021q tagger work with the Ocelot traps means we
can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and
OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta,
OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO
(the alternative would have been to left-shift all cookie numbers by 1).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: felix: remove dead code in felix_setup_mmio_filtering()
Vladimir Oltean [Wed, 16 Feb 2022 14:30:12 +0000 (16:30 +0200)]
net: dsa: felix: remove dead code in felix_setup_mmio_filtering()

There has been some controversy related to the sanity check that a CPU
port exists, and commit e8b1d7698038 ("net: dsa: felix: Fix memory leak
in felix_setup_mmio_filtering") even "corrected" an apparent memory leak
as static analysis tools see it.

However, the check is completely dead code, since the earliest point at
which felix_setup_mmio_filtering() can be called is:

felix_pci_probe
-> dsa_register_switch
   -> dsa_switch_probe
      -> dsa_tree_setup
         -> dsa_tree_setup_cpu_ports
            -> dsa_tree_setup_default_cpu
               -> contains the "DSA: tree %d has no CPU port\n" check
         -> dsa_tree_setup_master
            -> dsa_master_setup
               -> sysfs_create_group(&dev->dev.kobj, &dsa_group);
                  -> makes tagging_store() callable
                     -> dsa_tree_change_tag_proto
                        -> dsa_tree_notify
                           -> dsa_switch_event
                              -> dsa_switch_change_tag_proto
                                 -> ds->ops->change_tag_protocol
                                    -> felix_change_tag_protocol
                                       -> felix_set_tag_protocol
                                          -> felix_setup_tag_8021q
                                             -> felix_setup_mmio_filtering
                                                -> breaks at first CPU port

So probing would have failed earlier if there wasn't any CPU port
defined.

To avoid all confusion, delete the dead code and replace it with a
comment.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: annotate which traps need PTP timestamping
Vladimir Oltean [Wed, 16 Feb 2022 14:30:11 +0000 (16:30 +0200)]
net: mscc: ocelot: annotate which traps need PTP timestamping

The ocelot switch library does not need this information, but the felix
DSA driver does.

As a reminder, the VSC9959 switch in LS1028A doesn't have an IRQ line
for packet extraction, so to be notified that a PTP packet needs to be
dequeued, it receives that packet also over Ethernet, by setting up a
packet trap. The Felix driver needs to install special kinds of traps
for packets in need of RX timestamps, such that the packets are
replicated both over Ethernet and over the CPU port module.

But the Ocelot switch library sets up more than one trap for PTP event
messages; it also traps PTP general messages, MRP control messages etc.
Those packets don't need PTP timestamps, so there's no reason for the
Felix driver to send them to the CPU port module.

By knowing which traps need PTP timestamps, the Felix driver can
adjust the traps installed using ocelot_trap_add() such that only those
will actually get delivered to the CPU port module.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: keep traps in a list
Vladimir Oltean [Wed, 16 Feb 2022 14:30:10 +0000 (16:30 +0200)]
net: mscc: ocelot: keep traps in a list

When using the ocelot-8021q tagging protocol, the CPU port isn't
configured as an NPI port, but is a regular port. So a "trap to CPU"
operation is actually a "redirect" operation. So DSA needs to set up the
trapping action one way or another, depending on the tagging protocol in
use.

To ease DSA's work of modifying the action, keep all currently installed
traps in a list, so that DSA can live-patch them when the tagging
protocol changes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: dsa: felix: use DSA port iteration helpers
Vladimir Oltean [Wed, 16 Feb 2022 14:30:09 +0000 (16:30 +0200)]
net: dsa: felix: use DSA port iteration helpers

Use the helpers that avoid the quadratic complexity associated with
calling dsa_to_port() indirectly: dsa_is_unused_port(),
dsa_is_cpu_port().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: avoid overlap in VCAP IS2 between PTP and MRP traps
Vladimir Oltean [Wed, 16 Feb 2022 14:30:08 +0000 (16:30 +0200)]
net: mscc: ocelot: avoid overlap in VCAP IS2 between PTP and MRP traps

OCELOT_VCAP_IS2_TAG_8021Q_TXVLAN overlaps with OCELOT_VCAP_IS2_MRP_REDIRECT.
To avoid this, make OCELOT_VCAP_IS2_MRP_REDIRECT take the cookie region
from N to 2 * N - 1 (where N is ocelot->num_phys_ports).

To avoid any risk that the singleton (not per port) VCAP IS2 filters
overlap with per-port VCAP IS2 filters, we must ensure that the number
of singleton filters is smaller than the number of physical ports.
This is true right now, but may change in the future as switches with
less ports get supported, or more singleton filters get added. So to be
future-proof, let's move the singleton filters at the end of the range,
where they won't overlap with anything to their right.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: use a single VCAP filter for all MRP traps
Vladimir Oltean [Wed, 16 Feb 2022 14:30:07 +0000 (16:30 +0200)]
net: mscc: ocelot: use a single VCAP filter for all MRP traps

The MRP assist code installs a VCAP IS2 trapping rule for each port, but
since the key and the action is the same, just the ingress port mask
differs, there isn't any need to do this. We can save some space in the
TCAM by using a single filter and adjusting the ingress port mask.

Reuse the ocelot_trap_add() and ocelot_trap_del() functions for this
purpose.

Now that the cookies are no longer per port, we need to change the
allocation scheme such that MRP traps use a fixed number.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: delete OCELOT_MRP_CPUQ
Vladimir Oltean [Wed, 16 Feb 2022 14:30:06 +0000 (16:30 +0200)]
net: mscc: ocelot: delete OCELOT_MRP_CPUQ

MRP frames are configured to be trapped to the CPU queue 7, and this
number is reflected in the extraction header. However, the information
isn't used anywhere, so just leave MRP frames to go to CPU queue 0
unless needed otherwise.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: consolidate cookie allocation for private VCAP rules
Vladimir Oltean [Wed, 16 Feb 2022 14:30:05 +0000 (16:30 +0200)]
net: mscc: ocelot: consolidate cookie allocation for private VCAP rules

Every use case that needed VCAP filters (in order: DSA tag_8021q, MRP,
PTP traps) has hardcoded filter identifiers that worked well enough for
that use case alone. But when two or more of those use cases would be
used together, some of those identifiers would overlap, leading to
breakage.

Add definitions for each cookie and centralize them in ocelot_vcap.h,
such that the overlaps are more obvious.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: mscc: ocelot: use a consistent cookie for MRP traps
Vladimir Oltean [Wed, 16 Feb 2022 14:30:04 +0000 (16:30 +0200)]
net: mscc: ocelot: use a consistent cookie for MRP traps

The driver uses an identifier equal to (ocelot->num_phys_ports + port)
for MRP traps installed when the system is in the role of an MRC, and an
identifier equal to (port) otherwise.

Use the same identifier in both cases as a consolidation for the various
cookie values spread throughout the driver.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge tag 'mlx5-updates-2022-02-16' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Thu, 17 Feb 2022 10:57:21 +0000 (10:57 +0000)]
Merge tag 'mlx5-updates-2022-02-16' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-02-16

Misc updates for mlx5:
1) Alex Liu Adds support for using xdp->data_meta

2) Aya Levin Adds PTP counters and port time stamp mode for representors and
switchdev mode.

3) Tariq Toukan, Striding RQ simple improvements.

4) Roi Dayan (7): Create multiple attr instances per flow

Some TC actions use post actions for their implementation.
For example CT and sample actions.

Create a new flow attr instance after each multi table action and
create a post action rule for it as a generic parsing step.
Now multi table actions like CT, sample don't require to do it.

When flow has multiple attr instances, the first flow attr is being
offloaded normally and linked to the next attr (post action rule) by
setting an id on reg_c for matching.
Post action rule (rule created from second attr instance) match the
id on reg_c and does rest of the actions.

Example rule with actions CT,goto will be created with 2 attr instances
as following: attr1(CT)->attr2(goto)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoperf bpf: Defer freeing string after possible strlen() on it
Arnaldo Carvalho de Melo [Wed, 16 Feb 2022 19:01:00 +0000 (16:01 -0300)]
perf bpf: Defer freeing string after possible strlen() on it

This was detected by the gcc in Fedora Rawhide's gcc:

  50    11.01 fedora:rawhide                : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
        inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
    util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
     1225 |                 *key_scan_pos += strlen(map_opt);
          |                                  ^~~~~~~~~~~~~~~
    util/bpf-loader.c:1223:9: note: call to 'free' here
     1223 |         free(map_name);
          |         ^~~~~~~~~~~~~~
    cc1: all warnings being treated as errors

So do the calculations on the pointer before freeing it.

Fixes: 04f9bf2bac72480c ("perf bpf-loader: Add missing '*' for key_scan_pos")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agonet/mlx5e: TC, Allow sample action with CT
Roi Dayan [Thu, 2 Dec 2021 12:47:45 +0000 (14:47 +0200)]
net/mlx5e: TC, Allow sample action with CT

Allow sample+CT actions but still block sample+CT NAT
as it is not supported.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: TC, Make post_act parse CT and sample actions
Roi Dayan [Sun, 19 Dec 2021 08:31:18 +0000 (10:31 +0200)]
net/mlx5e: TC, Make post_act parse CT and sample actions

Before this commit post_act can be used for normal rules
and didn't handle special cases like CT and sample.
With this commit post_act rule can also handle the special cases
when needed.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: TC, Clean redundant counter flag from tc action parsers
Roi Dayan [Sun, 21 Nov 2021 09:38:53 +0000 (11:38 +0200)]
net/mlx5e: TC, Clean redundant counter flag from tc action parsers

When tc actions being parsed only the last flow attr created needs the
counter flag and the previous flags being reset.
Clean the flag from the tc action parsers.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Use multi table support for CT and sample actions
Roi Dayan [Wed, 1 Dec 2021 11:27:37 +0000 (13:27 +0200)]
net/mlx5e: Use multi table support for CT and sample actions

CT and sample actions use post actions for their implementation.
Flag those actions as multi table actions so the post act infrastructure
will handle the post actions allocation.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Create new flow attr for multi table actions
Roi Dayan [Sun, 8 Aug 2021 06:38:03 +0000 (09:38 +0300)]
net/mlx5e: Create new flow attr for multi table actions

Some TC actions use post actions for their implementation.
For example CT and sample actions.

Create a new flow attr after each multi table action and
create a post action rule for it.

First flow attr being offloaded normally and linked to the next
attr (post action rule) with setting an id on reg_c.
Post action rules match the id on reg_c and continue to the next one.

The flow counter is allocated on the last rule.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Add post act offload/unoffload API
Roi Dayan [Thu, 14 Oct 2021 08:31:51 +0000 (11:31 +0300)]
net/mlx5e: Add post act offload/unoffload API

Introduce mlx5e_tc_post_act_offload() and mlx5e_tc_post_act_unoffload()
to be able to unoffload and reoffload existing post action rules handles.
For example in neigh update events, the driver removes and readds rules in
hardware.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Pass actions param to actions_match_supported()
Roi Dayan [Sun, 12 Sep 2021 08:43:17 +0000 (11:43 +0300)]
net/mlx5e: Pass actions param to actions_match_supported()

Currently the mlx5_flow object contains a single mlx5_attr instance.
However, multi table actions (e.g. CT) instantiate multiple attr instances.

Currently action_match_supported() reads the actions flag from the
flow's attribute instance. Modify the function to receive the action
flags as a parameter which is set by the calling function and
pass the aggregated actions to actions_match_supported().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: TC, Move flow hashtable to be per rep
Paul Blakey [Mon, 24 Jan 2022 13:03:27 +0000 (15:03 +0200)]
net/mlx5e: TC, Move flow hashtable to be per rep

To allow shared tc block offload between two or more reps of the
same eswitch, move the tc flow hashtable to be per rep, instead
of per eswitch.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: E-Switch, Add support for tx_port_ts in switchdev mode
Aya Levin [Thu, 20 Jan 2022 15:53:06 +0000 (17:53 +0200)]
net/mlx5e: E-Switch, Add support for tx_port_ts in switchdev mode

When turning on tx_port_ts (private flag) a PTP-SQ is created. Consider
this queue when adding rules matching SQs to VPORTs. Otherwise the
traffic on this queue won't reach the wire.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: E-Switch, Add PTP counters for uplink representor
Aya Levin [Wed, 19 Jan 2022 16:28:20 +0000 (18:28 +0200)]
net/mlx5e: E-Switch, Add PTP counters for uplink representor

There is a configuration where the uplink interface is the synchronizer.
Add PTP counters for this interface for monitoring.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: RX, Restrict bulk size for small Striding RQs
Tariq Toukan [Wed, 19 Jan 2022 16:35:42 +0000 (18:35 +0200)]
net/mlx5e: RX, Restrict bulk size for small Striding RQs

In RQs of type multi-packet WQE (Striding RQ), each WQE is relatively
large (typically 256KB) but their number is relatively small (8 in
default).

Re-mapping the descriptors' buffers before re-posting them is done via
UMR (User-Mode Memory Registration) operations.

On the one hand, posting UMR WQEs in bulks reduces communication overhead
with the HW and better utilizes its processing units.
On the other hand, delaying the WQE repost operations for a small RQ
(say, of 4 WQEs) might drastically hit its performance, causing packet
drops due to no receive buffer, for high or bursty incoming packets rate.

Here we restrict the bulk size for too small RQs. Effectively, with the current
constants, RQ of size 4 (minimum allowed) would have no bulking, while larger
RQs will continue working with bulks of 2.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Default to Striding RQ when not conflicting with CQE compression
Tariq Toukan [Tue, 11 Jan 2022 16:28:12 +0000 (18:28 +0200)]
net/mlx5e: Default to Striding RQ when not conflicting with CQE compression

CQE compression is turned on by default on slow pci systems to help
reduce the load on pci.
In this case, Striding RQ was turned off as CQEs of packets that span
several strides were not compressed, significantly reducing the compression
effectiveness.
This issue does not exist when using the newer mini_cqe format "stride_index".
Hence, allow defaulting to Striding RQ in this case.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Generalize packet merge error message
Tariq Toukan [Wed, 5 Jan 2022 21:18:36 +0000 (23:18 +0200)]
net/mlx5e: Generalize packet merge error message

Update the old error message for LRO state modify with the new
general name.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Add support for using xdp->data_meta
Alex Liu [Thu, 20 Jan 2022 19:34:59 +0000 (11:34 -0800)]
net/mlx5e: Add support for using xdp->data_meta

Add support for using xdp->data_meta for cross-program communication

Pass "true" to the last argument of xdp_prepare_buff().

After SKB is built, call skb_metadata_set() if metadata was pushed.

Signed-off-by: Alex Liu <liualex@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5e: Fix spelling mistake "supoported" -> "supported"
Colin Ian King [Mon, 31 Jan 2022 08:43:17 +0000 (08:43 +0000)]
net/mlx5e: Fix spelling mistake "supoported" -> "supported"

There is a spelling mistake in a NL_SET_ERR_MSG_MOD error
message.  Fix it.

Fixes: 3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet: rtnetlink: rtnl_stats_get(): Emit an extack for unset filter_mask
Petr Machata [Wed, 16 Feb 2022 14:31:36 +0000 (15:31 +0100)]
net: rtnetlink: rtnl_stats_get(): Emit an extack for unset filter_mask

Both get and dump handlers for RTM_GETSTATS require that a filter_mask, a
mask of which attributes should be emitted in the netlink response, is
unset. rtnl_stats_dump() does include an extack in the bounce,
rtnl_stats_get() however does not. Fix the omission.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/01feb1f4bbd22a19f6629503c4f366aed6424567.1645020876.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'mptcp-so_sndtimeo-and-misc-cleanup'
Jakub Kicinski [Thu, 17 Feb 2022 04:52:09 +0000 (20:52 -0800)]
Merge branch 'mptcp-so_sndtimeo-and-misc-cleanup'

Mat Martineau says:

====================
mptcp: SO_SNDTIMEO and misc. cleanup

Patch 1 adds support for the SO_SNDTIMEO socket option on MPTCP sockets.

The remaining patches are various small cleanups:

Patch 2 removes an obsolete declaration.

Patches 3 and 5 remove unnecessary function parameters.

Patch 4 removes an extra cast.

Patches 6 and 7 add some const and ro_after_init modifiers.

Patch 8 removes extra storage of TCP helpers.
====================

Link: https://lore.kernel.org/r/20220216021130.171786-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: don't save tcp data_ready and write space callbacks
Florian Westphal [Wed, 16 Feb 2022 02:11:30 +0000 (18:11 -0800)]
mptcp: don't save tcp data_ready and write space callbacks

Assign the helpers directly rather than save/restore in the context
structure.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: mark ops structures as ro_after_init
Florian Westphal [Wed, 16 Feb 2022 02:11:29 +0000 (18:11 -0800)]
mptcp: mark ops structures as ro_after_init

These structures are initialised from the init hooks, so we can't make
them 'const'.  But no writes occur afterwards, so we can use ro_after_init.

Also, remove bogus EXPORT_SYMBOL, the only access comes from ip
stack, not from kernel modules.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: constify a bunch of of helpers
Paolo Abeni [Wed, 16 Feb 2022 02:11:28 +0000 (18:11 -0800)]
mptcp: constify a bunch of of helpers

A few pm-related helpers don't touch arguments which lacking
the const modifier, let's constify them.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: drop port parameter of mptcp_pm_add_addr_signal
Geliang Tang [Wed, 16 Feb 2022 02:11:27 +0000 (18:11 -0800)]
mptcp: drop port parameter of mptcp_pm_add_addr_signal

Drop the port parameter of mptcp_pm_add_addr_signal() and reflect it to
avoid passing too many parameters.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: drop unneeded type casts for hmac
Geliang Tang [Wed, 16 Feb 2022 02:11:26 +0000 (18:11 -0800)]
mptcp: drop unneeded type casts for hmac

Drop the unneeded type casts to 'unsigned long long' for printing out the
hmac values in add_addr_hmac_valid() and subflow_thmac_valid().

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: drop unused sk in mptcp_get_options
Geliang Tang [Wed, 16 Feb 2022 02:11:25 +0000 (18:11 -0800)]
mptcp: drop unused sk in mptcp_get_options

The parameter 'sk' became useless since the code using it was dropped
from mptcp_get_options() in the commit 8d548ea1dd15 ("mptcp: do not set
unconditionally csum_reqd on incoming opt"). Let's drop it.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: mptcp_parse_option is no longer exported
Matthieu Baerts [Wed, 16 Feb 2022 02:11:24 +0000 (18:11 -0800)]
mptcp: mptcp_parse_option is no longer exported

Options parsing in now done from mptcp_incoming_options().

mptcp_parse_option() has been removed from mptcp.h when CONFIG_MPTCP is
defined but not when it is not.

Fixes: cfde141ea3fa ("mptcp: move option parsing into mptcp_incoming_options()")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomptcp: add SNDTIMEO setsockopt support
Geliang Tang [Wed, 16 Feb 2022 02:11:23 +0000 (18:11 -0800)]
mptcp: add SNDTIMEO setsockopt support

Add setsockopt support for SO_SNDTIMEO_OLD and SO_SNDTIMEO_NEW to fix this
error reported by the mptcp bpf selftest:

 (network_helpers.c:64: errno: Operation not supported) Failed to set SO_SNDTIMEO
 test_mptcp:FAIL:115

 All error logs:

 (network_helpers.c:64: errno: Operation not supported) Failed to set SO_SNDTIMEO
 test_mptcp:FAIL:115
 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: Fix an ignored error return from dm9051_get_regs()
Yang Li [Wed, 16 Feb 2022 01:45:07 +0000 (09:45 +0800)]
net: Fix an ignored error return from dm9051_get_regs()

The return from the call to dm9051_get_regs() is int, it can be
a negative error code, however this is being assigned to an unsigned
int variable 'ret', so making 'ret' an int.

Eliminate the following coccicheck warning:
./drivers/net/ethernet/davicom/dm9051.c:527:5-8: WARNING: Unsigned
expression compared with zero: ret < 0

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220216014507.109117-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: sched: limit TC_ACT_REPEAT loops
Eric Dumazet [Tue, 15 Feb 2022 23:53:05 +0000 (15:53 -0800)]
net: sched: limit TC_ACT_REPEAT loops

We have been living dangerously, at the mercy of malicious users,
abusing TC_ACT_REPEAT, as shown by this syzpot report [1].

Add an arbitrary limit (32) to the number of times an action can
return TC_ACT_REPEAT.

v2: switch the limit to 32 instead of 10.
    Use net_warn_ratelimited() instead of pr_err_once().

[1] (C repro available on demand)

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    1-...!: (10500 ticks this GP) idle=021/1/0x4000000000000000 softirq=5592/5592 fqs=0
        (t=10502 jiffies g=5305 q=190)
rcu: rcu_preempt kthread timer wakeup didn't happen for 10502 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu:    Possible timer handling issue on cpu=0 timer-softirq=3527
rcu: rcu_preempt kthread starved for 10505 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:I stack:29344 pid:   14 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:4986 [inline]
 __schedule+0xab2/0x4db0 kernel/sched/core.c:6295
 schedule+0xd2/0x260 kernel/sched/core.c:6368
 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1963
 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2136
 kthread+0x2e9/0x3a0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 3646 Comm: syz-executor358 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline]
RIP: 0010:cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline]
RIP: 0010:pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:437 [inline]
RIP: 0010:__pv_queued_spin_lock_slowpath+0x3b8/0xb40 kernel/locking/qspinlock.c:508
Code: 48 89 eb c6 45 01 01 41 bc 00 80 00 00 48 c1 e9 03 83 e3 07 41 be 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 2c 01 eb 0c <f3> 90 41 83 ec 01 0f 84 72 04 00 00 41 0f b6 45 00 38 d8 7f 08 84
RSP: 0018:ffffc9000283f1b0 EFLAGS: 00000206
RAX: 0000000000000003 RBX: 0000000000000000 RCX: 1ffff1100fc0071e
RDX: 0000000000000001 RSI: 0000000000000201 RDI: 0000000000000000
RBP: ffff88807e0038f0 R08: 0000000000000001 R09: ffffffff8ffbf9ff
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000004c1e
R13: ffffed100fc0071e R14: 0000000000000001 R15: ffff8880b9c3aa80
FS:  00005555562bf300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffdbfef12b8 CR3: 00000000723c2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline]
 queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
 queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
 do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:115
 spin_lock_bh include/linux/spinlock.h:354 [inline]
 sch_tree_lock include/net/sch_generic.h:610 [inline]
 sch_tree_lock include/net/sch_generic.h:605 [inline]
 prio_tune+0x3b9/0xb50 net/sched/sch_prio.c:211
 prio_init+0x5c/0x80 net/sched/sch_prio.c:244
 qdisc_create.constprop.0+0x44a/0x10f0 net/sched/sch_api.c:1253
 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5594
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:725
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7ee98aae99
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdbfef12d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffdbfef1300 RCX: 00007f7ee98aae99
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 000000000000000d R11: 0000000000000246 R12: 00007ffdbfef12f0
R13: 00000000000f4240 R14: 000000000004ca47 R15: 00007ffdbfef12e4
 </TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.293 msecs
NMI backtrace for cpu 1
CPU: 1 PID: 3260 Comm: kworker/1:3 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: mld mld_ifc_work
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
 print_cpu_stall kernel/rcu/tree_stall.h:604 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:688 [inline]
 rcu_pending kernel/rcu/tree.c:3919 [inline]
 rcu_sched_clock_irq.cold+0x5c/0x759 kernel/rcu/tree.c:2617
 update_process_times+0x16d/0x200 kernel/time/timer.c:1785
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
 __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
 __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
 __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0xc/0x70 kernel/kcov.c:286
Code: 00 00 00 48 89 7c 30 e8 48 89 4c 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 <89> f1 65 48 8b 34 25 00 70 02 00 e8 14 f9 ff ff 84 c0 74 4b 48 8b
RSP: 0018:ffffc90002c5eea8 EFLAGS: 00000246
RAX: 0000000000000007 RBX: ffff88801c625800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: ffff8880137d3100 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff874fcd88 R11: 0000000000000000 R12: ffff88801d692dc0
R13: ffff8880137d3104 R14: 0000000000000000 R15: ffff88801d692de8
 tcf_police_act+0x358/0x11d0 net/sched/act_police.c:256
 tcf_action_exec net/sched/act_api.c:1049 [inline]
 tcf_action_exec+0x1a6/0x530 net/sched/act_api.c:1026
 tcf_exts_exec include/net/pkt_cls.h:326 [inline]
 route4_classify+0xef0/0x1400 net/sched/cls_route.c:179
 __tcf_classify net/sched/cls_api.c:1549 [inline]
 tcf_classify+0x3e8/0x9d0 net/sched/cls_api.c:1615
 prio_classify net/sched/sch_prio.c:42 [inline]
 prio_enqueue+0x3a7/0x790 net/sched/sch_prio.c:75
 dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3668
 __dev_xmit_skb net/core/dev.c:3756 [inline]
 __dev_queue_xmit+0x1f61/0x3660 net/core/dev.c:4081
 neigh_hh_output include/net/neighbour.h:533 [inline]
 neigh_output include/net/neighbour.h:547 [inline]
 ip_finish_output2+0x14dc/0x2170 net/ipv4/ip_output.c:228
 __ip_finish_output net/ipv4/ip_output.c:306 [inline]
 __ip_finish_output+0x396/0x650 net/ipv4/ip_output.c:288
 ip_finish_output+0x32/0x200 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0x196/0x310 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:451 [inline]
 ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:126
 iptunnel_xmit+0x628/0xa50 net/ipv4/ip_tunnel_core.c:82
 geneve_xmit_skb drivers/net/geneve.c:966 [inline]
 geneve_xmit+0x10c8/0x3530 drivers/net/geneve.c:1077
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one net/core/dev.c:3473 [inline]
 dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489
 __dev_queue_xmit+0x2985/0x3660 net/core/dev.c:4116
 neigh_hh_output include/net/neighbour.h:533 [inline]
 neigh_output include/net/neighbour.h:547 [inline]
 ip6_finish_output2+0xf7a/0x14f0 net/ipv6/ip6_output.c:126
 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
 __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170
 ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
 dst_output include/net/dst.h:451 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 mld_sendpack+0x9a3/0xe40 net/ipv6/mcast.c:1826
 mld_send_cr net/ipv6/mcast.c:2127 [inline]
 mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2659
 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
 worker_thread+0x657/0x1110 kernel/workqueue.c:2454
 kthread+0x2e9/0x3a0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
----------------
Code disassembly (best guess):
   0:   48 89 eb                mov    %rbp,%rbx
   3:   c6 45 01 01             movb   $0x1,0x1(%rbp)
   7:   41 bc 00 80 00 00       mov    $0x8000,%r12d
   d:   48 c1 e9 03             shr    $0x3,%rcx
  11:   83 e3 07                and    $0x7,%ebx
  14:   41 be 01 00 00 00       mov    $0x1,%r14d
  1a:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  21:   fc ff df
  24:   4c 8d 2c 01             lea    (%rcx,%rax,1),%r13
  28:   eb 0c                   jmp    0x36
* 2a:   f3 90                   pause <-- trapping instruction
  2c:   41 83 ec 01             sub    $0x1,%r12d
  30:   0f 84 72 04 00 00       je     0x4a8
  36:   41 0f b6 45 00          movzbl 0x0(%r13),%eax
  3b:   38 d8                   cmp    %bl,%al
  3d:   7f 08                   jg     0x47
  3f:   84                      .byte 0x84

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220215235305.3272331-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>