Alexei Starovoitov [Tue, 18 Jul 2023 22:21:09 +0000 (15:21 -0700)]
Merge branch 'two-more-fixes-for-check_max_stack_depth'
Kumar Kartikeya Dwivedi says:
====================
Two more fixes for check_max_stack_depth
I noticed two more bugs while reviewing the code, description and
examples available in the patches.
One leads to incorrect subprog index to be stored in the frame stack
maintained by the function (leading to incorrect tail_call_reachable
marks, among other things).
The other problem is missing exploration pass of other async callbacks
when they are not called from the main prog. Call chains rooted at them
can thus bypass the stack limits (32 call frames * max permitted stack
depth per function).
Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/
20230713003118.1327943-1-memxor@gmail.com
* Fix commit message for patch 2 (Alexei)
====================
Link: https://lore.kernel.org/r/20230717161530.1238-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Mon, 17 Jul 2023 16:15:30 +0000 (21:45 +0530)]
selftests/bpf: Add more tests for check_max_stack_depth bug
Another test which now exercies the path of the verifier where it will
explore call chains rooted at the async callback. Without the prior
fixes, this program loads successfully, which is incorrect.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Mon, 17 Jul 2023 16:15:29 +0000 (21:45 +0530)]
bpf: Repeat check_max_stack_depth for async callbacks
While the check_max_stack_depth function explores call chains emanating
from the main prog, which is typically enough to cover all possible call
chains, it doesn't explore those rooted at async callbacks unless the
async callback will have been directly called, since unlike non-async
callbacks it skips their instruction exploration as they don't
contribute to stack depth.
It could be the case that the async callback leads to a callchain which
exceeds the stack depth, but this is never reachable while only
exploring the entry point from main subprog. Hence, repeat the check for
the main subprog *and* all async callbacks marked by the symbolic
execution pass of the verifier, as execution of the program may begin at
any of them.
Consider functions with following stack depths:
main: 256
async: 256
foo: 256
main:
rX = async
bpf_timer_set_callback(...)
async:
foo()
Here, async is not descended as it does not contribute to stack depth of
main (since it is referenced using bpf_pseudo_func and not
bpf_pseudo_call). However, when async is invoked asynchronously, it will
end up breaching the MAX_BPF_STACK limit by calling foo.
Hence, in addition to main, we also need to explore call chains
beginning at all async callback subprogs in a program.
Fixes:
7ddc80a476c2 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi [Mon, 17 Jul 2023 16:15:28 +0000 (21:45 +0530)]
bpf: Fix subprog idx logic in check_max_stack_depth
The assignment to idx in check_max_stack_depth happens once we see a
bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of
the code performs a few checks and then pushes the frame to the frame
stack, except the case of async callbacks. If the async callback case
causes the loop iteration to be skipped, the idx assignment will be
incorrect on the next iteration of the loop. The value stored in the
frame stack (as the subprogno of the current subprog) will be incorrect.
This leads to incorrect checks and incorrect tail_call_reachable
marking. Save the target subprog in a new variable and only assign to
idx once we are done with the is_async_cb check which may skip pushing
of frame to the frame stack and subsequent stack depth checks and tail
call markings.
Fixes:
7ddc80a476c2 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Geetha sowjanya [Sun, 16 Jul 2023 09:37:41 +0000 (15:07 +0530)]
octeontx2-pf: Dont allocate BPIDs for LBK interfaces
Current driver enables backpressure for LBK interfaces.
But these interfaces do not support this feature.
Hence, this patch fixes the issue by skipping the
backpressure configuration for these interfaces.
Fixes:
75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool").
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20230716093741.28063-1-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ido Schimmel [Sat, 15 Jul 2023 15:36:05 +0000 (18:36 +0300)]
vrf: Fix lockdep splat in output path
Cited commit converted the neighbour code to use the standard RCU
variant instead of the RCU-bh variant, but the VRF code still uses
rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup
code in its IPv4 and IPv6 output paths, resulting in lockdep splats
[1][2]. Can be reproduced using [3].
Fix by switching to rcu_read_lock() / rcu_read_unlock().
[1]
=============================
WARNING: suspicious RCU usage
6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted
-----------------------------
include/net/neighbour.h:302 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by ping/183:
#0:
ffff888105ea1d80 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0xc6c/0x33c0
#1:
ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output+0x2e3/0x2030
stack backtrace:
CPU: 0 PID: 183 Comm: ping Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xc1/0xf0
lockdep_rcu_suspicious+0x211/0x3b0
vrf_output+0x1380/0x2030
ip_push_pending_frames+0x125/0x2a0
raw_sendmsg+0x200d/0x33c0
inet_sendmsg+0xa2/0xe0
__sys_sendto+0x2aa/0x420
__x64_sys_sendto+0xe5/0x1c0
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[2]
=============================
WARNING: suspicious RCU usage
6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted
-----------------------------
include/net/neighbour.h:302 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by ping6/182:
#0:
ffff888114b63000 (sk_lock-AF_INET6){+.+.}-{0:0}, at: rawv6_sendmsg+0x1602/0x3e50
#1:
ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output6+0xe9/0x1310
stack backtrace:
CPU: 0 PID: 182 Comm: ping6 Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xc1/0xf0
lockdep_rcu_suspicious+0x211/0x3b0
vrf_output6+0xd32/0x1310
ip6_local_out+0xb4/0x1a0
ip6_send_skb+0xbc/0x340
ip6_push_pending_frames+0xe5/0x110
rawv6_sendmsg+0x2e6e/0x3e50
inet_sendmsg+0xa2/0xe0
__sys_sendto+0x2aa/0x420
__x64_sys_sendto+0xe5/0x1c0
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[3]
#!/bin/bash
ip link add name vrf-red up numtxqueues 2 type vrf table 10
ip link add name swp1 up master vrf-red type dummy
ip address add 192.0.2.1/24 dev swp1
ip address add 2001:db8:1::1/64 dev swp1
ip neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud perm dev swp1
ip neigh add 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud perm dev swp1
ip vrf exec vrf-red ping 192.0.2.2 -c 1 &> /dev/null
ip vrf exec vrf-red ping6 2001:db8:1::2 -c 1 &> /dev/null
Fixes:
09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/netdev/CA+G9fYtEr-=GbcXNDYo3XOkwR+uYgehVoDjsP0pFLUpZ_AZcyg@mail.gmail.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230715153605.4068066-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 18 Jul 2023 09:50:10 +0000 (11:50 +0200)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-14 (ice)
This series contains updates to ice driver only.
Petr Oros removes multiple calls made to unregister netdev and
devlink_port.
Michal fixes null pointer dereference that can occur during reload.
====================
Link: https://lore.kernel.org/r/20230714201041.1717834-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Heiner Kallweit [Fri, 14 Jul 2023 05:39:36 +0000 (07:39 +0200)]
r8169: fix ASPM-related problem for chip version 42 and 43
Referenced commit missed that for chip versions 42 and 43 ASPM
remained disabled in the respective rtl_hw_start_...() routines.
This resulted in problems as described in the referenced bug
ticket. Therefore re-instantiate the previous logic.
Fixes:
5fc3f6c90cca ("r8169: consolidate disabling ASPM before EPHY access")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217635
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Fri, 14 Jul 2023 00:46:22 +0000 (17:46 -0700)]
net: dsa: microchip: correct KSZ8795 static MAC table access
The KSZ8795 driver code was modified to use on KSZ8863/73, which has
different register definitions. Some of the new KSZ8795 register
information are wrong compared to previous code.
KSZ8795 also behaves differently in that the STATIC_MAC_TABLE_USE_FID
and STATIC_MAC_TABLE_FID bits are off by 1 when doing MAC table reading
than writing. To compensate that a special code was added to shift the
register value by 1 before applying those bits. This is wrong when the
code is running on KSZ8863, so this special code is only executed when
KSZ8795 is detected.
Fixes:
4b20a07e103f ("net: dsa: microchip: ksz8795: add support for ksz88xx chips")
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Jul 2023 06:33:39 +0000 (07:33 +0100)]
Merge branch 'sched-fixes'
Victor Nogueira says:
====================
net: sched: Fixes for classifiers
Four different classifiers (bpf, u32, matchall, and flower) are
calling tcf_bind_filter in their callbacks, but arent't undoing it by
calling tcf_unbind_filter if their was an error after binding.
This patch set fixes all this by calling tcf_unbind_filter in such
cases.
This set also undoes a refcount decrement in cls_u32 when an update
fails under specific conditions which are described in patch #3.
v1 -> v2:
* Remove blank line after fixes tag
* Fix reverse xmas tree issues pointed out by Simon
v2 -> v3:
* Inline functions cls_bpf_set_parms and fl_set_parms to avoid adding
yet another parameter (and a return value at it) to them.
* Remove similar fixes for u32 and matchall, which will be sent soon,
once we find a way to do the fixes without adding a return parameter
to their set_parms functions.
v3 -> v4:
* Inline mall_set_parms to avoid adding yet another parameter.
* Remove set_flags parameter from u32_set_parms and create a separate
function for calling tcf_bind_filter and tcf_unbind_filter in case of
failure.
* Change cover letter title to also encompass refcnt fix for u32
v4 -> v5:
* Change back tag to net
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Nogueira [Thu, 13 Jul 2023 18:05:14 +0000 (15:05 -0300)]
net: sched: cls_flower: Undo tcf_bind_filter in case of an error
If TCA_FLOWER_CLASSID is specified in the netlink message, the code will
call tcf_bind_filter. However, if any error occurs after that, the code
should undo this by calling tcf_unbind_filter.
Fixes:
77b9900ef53a ("tc: introduce Flower classifier")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Nogueira [Thu, 13 Jul 2023 18:05:13 +0000 (15:05 -0300)]
net: sched: cls_bpf: Undo tcf_bind_filter in case of an error
If cls_bpf_offload errors out, we must also undo tcf_bind_filter that
was done before the error.
Fix that by calling tcf_unbind_filter in errout_parms.
Fixes:
eadb41489fd2 ("net: cls_bpf: add support for marking filters as hardware-only")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Nogueira [Thu, 13 Jul 2023 18:05:12 +0000 (15:05 -0300)]
net: sched: cls_u32: Undo refcount decrement in case update failed
In the case of an update, when TCA_U32_LINK is set, u32_set_parms will
decrement the refcount of the ht_down (struct tc_u_hnode) pointer
present in the older u32 filter which we are replacing. However, if
u32_replace_hw_knode errors out, the update command fails and that
ht_down pointer continues decremented. To fix that, when
u32_replace_hw_knode fails, check if ht_down's refcount was decremented
and undo the decrement.
Fixes:
d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Nogueira [Thu, 13 Jul 2023 18:05:11 +0000 (15:05 -0300)]
net: sched: cls_u32: Undo tcf_bind_filter if u32_replace_hw_knode
When u32_replace_hw_knode fails, we need to undo the tcf_bind_filter
operation done at u32_set_parms.
Fixes:
d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Nogueira [Thu, 13 Jul 2023 18:05:10 +0000 (15:05 -0300)]
net: sched: cls_matchall: Undo tcf_bind_filter in case of failure after mall_set_parms
In case an error occurred after mall_set_parms executed successfully, we
must undo the tcf_bind_filter call it issues.
Fix that by calling tcf_unbind_filter in err_replace_hw_filter label.
Fixes:
ec2507d2a306 ("net/sched: cls_matchall: Fix error path")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 15 Jul 2023 03:39:35 +0000 (20:39 -0700)]
Merge branch 'net-fix-kernel-doc-problems-in-include-net'
Randy Dunlap says:
====================
net: fix kernel-doc problems in include/net/
Fix many (but not all) kernel-doc warnings in include/net/.
[PATCH v2 net 1/9] net: bonding: remove kernel-doc comment marker
[PATCH v2 net 2/9] net: cfg802154: fix kernel-doc notation warnings
[PATCH v2 net 3/9] codel: fix kernel-doc notation warnings
[PATCH v2 net 4/9] devlink: fix kernel-doc notation warnings
[PATCH v2 net 5/9] inet: frags: remove kernel-doc comment marker
[PATCH v2 net 6/9] net: llc: fix kernel-doc notation warnings
[PATCH v2 net 7/9] net: NSH: fix kernel-doc notation warning
[PATCH v2 net 8/9] pie: fix kernel-doc notation warning
[PATCH v2 net 9/9] rsi: remove kernel-doc comment marker
====================
Link: https://lore.kernel.org/r/20230714045127.18752-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Fri, 14 Jul 2023 04:51:27 +0000 (21:51 -0700)]
rsi: remove kernel-doc comment marker
Change an errant kernel-doc comment marker (/**) to a regular
comment to prevent a kernel-doc warning.
rsi_91x.h:3: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Copyright (c) 2017 Redpine Signals Inc.
Fixes:
4c10d56a76bb ("rsi: add header file rsi_91x")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Prameela Rani Garnepudi <prameela.j04cs@gmail.com>
Cc: Siva Rebbagondla <siva.rebbagondla@redpinesignals.com>
Acked-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230714045127.18752-10-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Fri, 14 Jul 2023 04:51:26 +0000 (21:51 -0700)]
pie: fix kernel-doc notation warning
Spell a struct member's name correctly to prevent a kernel-doc
warning.
pie.h:38: warning: Function parameter or member 'tupdate' not described in 'pie_params'
Fixes:
b42a3d7c7cff ("pie: improve comments and commenting style")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Leslie Monis <lesliemonis@gmail.com>
Cc: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>
Cc: Gautam Ramakrishnan <gautamramk@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Link: https://lore.kernel.org/r/20230714045127.18752-9-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Fri, 14 Jul 2023 04:51:25 +0000 (21:51 -0700)]
net: NSH: fix kernel-doc notation warning
Use the struct member's name and the correct format to prevent a
kernel-doc warning.
nsh.h:200: warning: Function parameter or member 'context' not described in 'nsh_md1_ctx'
Fixes:
1f0b7744c505 ("net: add NSH header structures and helpers")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jiri Benc <jbenc@redhat.com>
Link: https://lore.kernel.org/r/20230714045127.18752-8-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Fri, 14 Jul 2023 04:51:24 +0000 (21:51 -0700)]
net: llc: fix kernel-doc notation warnings
Use the corrent function parameter name or format to prevent
kernel-doc warnings.
Add 2 function parameter descriptions to prevent kernel-doc warnings.
llc_pdu.h:278: warning: Function parameter or member 'da' not described in 'llc_pdu_decode_da'
llc_pdu.h:278: warning: Excess function parameter 'sa' description in 'llc_pdu_decode_da'
llc_pdu.h:330: warning: Function parameter or member 'skb' not described in 'llc_pdu_init_as_test_cmd'
llc_pdu.h:379: warning: Function parameter or member 'svcs_supported' not described in 'llc_pdu_init_as_xid_cmd'
llc_pdu.h:379: warning: Function parameter or member 'rx_window' not described in 'llc_pdu_init_as_xid_cmd'
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230714045127.18752-7-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Fri, 14 Jul 2023 04:51:23 +0000 (21:51 -0700)]
inet: frags: eliminate kernel-doc warning
Modify the anonymous enum kernel-doc content so that it doesn't cause
a kernel-doc warning.
inet_frag.h:33: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Fixes:
1ab1934ed80a ("inet: frags: enum the flag definitions and add descriptions")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230714045127.18752-6-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Fri, 14 Jul 2023 04:51:22 +0000 (21:51 -0700)]
devlink: fix kernel-doc notation warnings
Spell function or struct member names correctly.
Use ':' instead of '-' for struct member entries.
Mark one field as private in kernel-doc.
Add a few entries that were missing.
Fix a typo.
These changes prevent kernel-doc warnings:
devlink.h:252: warning: Function parameter or member 'field_id' not described in 'devlink_dpipe_match'
devlink.h:267: warning: Function parameter or member 'field_id' not described in 'devlink_dpipe_action'
devlink.h:310: warning: Function parameter or member 'match_values_count' not described in 'devlink_dpipe_entry'
devlink.h:355: warning: Function parameter or member 'list' not described in 'devlink_dpipe_table'
devlink.h:374: warning: Function parameter or member 'actions_dump' not described in 'devlink_dpipe_table_ops'
devlink.h:374: warning: Function parameter or member 'matches_dump' not described in 'devlink_dpipe_table_ops'
devlink.h:374: warning: Function parameter or member 'entries_dump' not described in 'devlink_dpipe_table_ops'
devlink.h:374: warning: Function parameter or member 'counters_set_update' not described in 'devlink_dpipe_table_ops'
devlink.h:374: warning: Function parameter or member 'size_get' not described in 'devlink_dpipe_table_ops'
devlink.h:384: warning: Function parameter or member 'headers' not described in 'devlink_dpipe_headers'
devlink.h:384: warning: Function parameter or member 'headers_count' not described in 'devlink_dpipe_headers'
devlink.h:398: warning: Function parameter or member 'unit' not described in 'devlink_resource_size_params'
devlink.h:487: warning: Function parameter or member 'id' not described in 'devlink_param'
devlink.h:645: warning: Function parameter or member 'overwrite_mask' not described in 'devlink_flash_update_params'
Fixes:
1555d204e743 ("devlink: Support for pipeline debug (dpipe)")
Fixes:
d9f9b9a4d05f ("devlink: Add support for resource abstraction")
Fixes:
eabaef1896bc ("devlink: Add devlink_param register and unregister")
Fixes:
5d5b4128c4ca ("devlink: introduce flash update overwrite mask")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Moshe Shemesh <moshe@mellanox.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230714045127.18752-5-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Fri, 14 Jul 2023 04:51:21 +0000 (21:51 -0700)]
codel: fix kernel-doc notation warnings
Use '@' before the struct member names in kernel-doc notation
to prevent kernel-doc warnings.
codel.h:158: warning: Function parameter or member 'ecn_mark' not described in 'codel_stats'
codel.h:158: warning: Function parameter or member 'ce_mark' not described in 'codel_stats'
Fixes:
76e3cc126bb2 ("codel: Controlled Delay AQM")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Link: https://lore.kernel.org/r/20230714045127.18752-4-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Fri, 14 Jul 2023 04:51:20 +0000 (21:51 -0700)]
net: cfg802154: fix kernel-doc notation warnings
Add an enum heading to the kernel-doc comments to prevent
kernel-doc warnings.
cfg802154.h:174: warning: Cannot understand * @WPAN_PHY_FLAG_TRANSMIT_POWER: Indicates that transceiver will support
on line 174 - I thought it was a doc line
cfg802154.h:192: warning: Enum value 'WPAN_PHY_FLAG_TXPOWER' not described in enum 'wpan_phy_flags'
cfg802154.h:192: warning: Excess enum value 'WPAN_PHY_FLAG_TRANSMIT_POWER' description in 'wpan_phy_flags'
Fixes:
edea8f7c75ec ("cfg802154: introduce wpan phy flags")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@datenfreihafen.org>
Cc: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230714045127.18752-3-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Fri, 14 Jul 2023 04:51:19 +0000 (21:51 -0700)]
net: bonding: remove kernel-doc comment marker
Change an errant kernel-doc comment marker (/**) to a regular
comment to prevent a kernel-doc warning.
bonding.h:282: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Returns NULL if the net_device does not belong to any of the bond's slaves
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Link: https://lore.kernel.org/r/20230714045127.18752-2-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michal Swiatkowski [Thu, 6 Jul 2023 06:25:51 +0000 (08:25 +0200)]
ice: prevent NULL pointer deref during reload
Calling ethtool during reload can lead to call trace, because VSI isn't
configured for some time, but netdev is alive.
To fix it add rtnl lock for VSI deconfig and config. Set ::num_q_vectors
to 0 after freeing and add a check for ::tx/rx_rings in ring related
ethtool ops.
Add proper unroll of filters in ice_start_eth().
Reproduction:
$watch -n 0.1 -d 'ethtool -g enp24s0f0np0'
$devlink dev reload pci/0000:18:00.0 action driver_reinit
Call trace before fix:
[66303.926205] BUG: kernel NULL pointer dereference, address:
0000000000000000
[66303.926259] #PF: supervisor read access in kernel mode
[66303.926286] #PF: error_code(0x0000) - not-present page
[66303.926311] PGD 0 P4D 0
[66303.926332] Oops: 0000 [#1] PREEMPT SMP PTI
[66303.926358] CPU: 4 PID: 933821 Comm: ethtool Kdump: loaded Tainted: G OE 6.4.0-rc5+ #1
[66303.926400] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.
070920180847 07/09/2018
[66303.926446] RIP: 0010:ice_get_ringparam+0x22/0x50 [ice]
[66303.926649] Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 c0 09 00 00 c7 46 04 e0 1f 00 00 c7 46 10 e0 1f 00 00 48 8b 50 20 <48> 8b 12 0f b7 52 3a 89 56 14 48 8b 40 28 48 8b 00 0f b7 40 58 48
[66303.926722] RSP: 0018:
ffffad40472f39c8 EFLAGS:
00010246
[66303.926749] RAX:
ffff98a8ada05828 RBX:
ffff98a8c46dd060 RCX:
ffffad40472f3b48
[66303.926781] RDX:
0000000000000000 RSI:
ffff98a8c46dd068 RDI:
ffff98a8b23c4000
[66303.926811] RBP:
ffffad40472f3b48 R08:
00000000000337b0 R09:
0000000000000000
[66303.926843] R10:
0000000000000001 R11:
0000000000000100 R12:
ffff98a8b23c4000
[66303.926874] R13:
ffff98a8c46dd060 R14:
000000000000000f R15:
ffffad40472f3a50
[66303.926906] FS:
00007f6397966740(0000) GS:
ffff98b390900000(0000) knlGS:
0000000000000000
[66303.926941] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[66303.926967] CR2:
0000000000000000 CR3:
000000011ac20002 CR4:
00000000007706e0
[66303.926999] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[66303.927029] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[66303.927060] PKRU:
55555554
[66303.927075] Call Trace:
[66303.927094] <TASK>
[66303.927111] ? __die+0x23/0x70
[66303.927140] ? page_fault_oops+0x171/0x4e0
[66303.927176] ? exc_page_fault+0x7f/0x180
[66303.927209] ? asm_exc_page_fault+0x26/0x30
[66303.927244] ? ice_get_ringparam+0x22/0x50 [ice]
[66303.927433] rings_prepare_data+0x62/0x80
[66303.927469] ethnl_default_doit+0xe2/0x350
[66303.927501] genl_family_rcv_msg_doit.isra.0+0xe3/0x140
[66303.927538] genl_rcv_msg+0x1b1/0x2c0
[66303.927561] ? __pfx_ethnl_default_doit+0x10/0x10
[66303.927590] ? __pfx_genl_rcv_msg+0x10/0x10
[66303.927615] netlink_rcv_skb+0x58/0x110
[66303.927644] genl_rcv+0x28/0x40
[66303.927665] netlink_unicast+0x19e/0x290
[66303.927691] netlink_sendmsg+0x254/0x4d0
[66303.927717] sock_sendmsg+0x93/0xa0
[66303.927743] __sys_sendto+0x126/0x170
[66303.927780] __x64_sys_sendto+0x24/0x30
[66303.928593] do_syscall_64+0x5d/0x90
[66303.929370] ? __count_memcg_events+0x60/0xa0
[66303.930146] ? count_memcg_events.constprop.0+0x1a/0x30
[66303.930920] ? handle_mm_fault+0x9e/0x350
[66303.931688] ? do_user_addr_fault+0x258/0x740
[66303.932452] ? exc_page_fault+0x7f/0x180
[66303.933193] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixes:
5b246e533d01 ("ice: split probe into smaller functions")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Petr Oros [Mon, 19 Jun 2023 10:58:13 +0000 (12:58 +0200)]
ice: Unregister netdev and devlink_port only once
Since commit
6624e780a577fc ("ice: split ice_vsi_setup into smaller
functions") ice_vsi_release does things twice. There is unregister
netdev which is unregistered in ice_deinit_eth also.
It also unregisters the devlink_port twice which is also unregistered
in ice_deinit_eth(). This double deregistration is hidden because
devl_port_unregister ignores the return value of xa_erase.
[ 68.642167] Call Trace:
[ 68.650385] ice_devlink_destroy_pf_port+0xe/0x20 [ice]
[ 68.655656] ice_vsi_release+0x445/0x690 [ice]
[ 68.660147] ice_deinit+0x99/0x280 [ice]
[ 68.664117] ice_remove+0x1b6/0x5c0 [ice]
[ 171.103841] Call Trace:
[ 171.109607] ice_devlink_destroy_pf_port+0xf/0x20 [ice]
[ 171.114841] ice_remove+0x158/0x270 [ice]
[ 171.118854] pci_device_remove+0x3b/0xc0
[ 171.122779] device_release_driver_internal+0xc7/0x170
[ 171.127912] driver_detach+0x54/0x8c
[ 171.131491] bus_remove_driver+0x77/0xd1
[ 171.135406] pci_unregister_driver+0x2d/0xb0
[ 171.139670] ice_module_exit+0xc/0x55f [ice]
Fixes:
6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Yan Zhai [Thu, 13 Jul 2023 17:28:00 +0000 (10:28 -0700)]
gso: fix dodgy bit handling for GSO_UDP_L4
Commit
1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4
packets.") checks DODGY bit for UDP, but for packets that can be fed
directly to the device after gso_segs reset, it actually falls through
to fragmentation:
https://lore.kernel.org/all/CAJPywTKDdjtwkLVUW6LRA2FU912qcDmQOQGt2WaDo28KzYDg+A@mail.gmail.com/
This change restores the expected behavior of GSO_UDP_L4 packets.
Fixes:
1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4 packets.")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Ming [Thu, 13 Jul 2023 12:16:03 +0000 (20:16 +0800)]
net: ethernet: Remove repeating expression
Identify issues that arise by using the tests/doublebitand.cocci
semantic patch. Need to remove duplicate expression in if statement.
Signed-off-by: Wang Ming <machel@vivo.com>
Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Ming [Thu, 13 Jul 2023 09:51:06 +0000 (17:51 +0800)]
bna: Remove error checking for debugfs_create_dir()
It is expected that most callers should _ignore_ the errors return by
debugfs_create_dir() in bnad_debugfs_init().
Signed-off-by: Wang Ming <machel@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Golle [Thu, 13 Jul 2023 02:42:29 +0000 (03:42 +0100)]
net: ethernet: mtk_eth_soc: handle probe deferral
Move the call to of_get_ethdev_address to mtk_add_mac which is part of
the probe function and can hence itself return -EPROBE_DEFER should
of_get_ethdev_address return -EPROBE_DEFER. This allows us to entirely
get rid of the mtk_init function.
The problem of of_get_ethdev_address returning -EPROBE_DEFER surfaced
in situations in which the NVMEM provider holding the MAC address has
not yet be loaded at the time mtk_eth_soc is initially probed. In this
case probing of mtk_eth_soc should be deferred instead of falling back
to use a random MAC address, so once the NVMEM provider becomes
available probing can be repeated.
Fixes:
656e705243fd ("net-next: mediatek: add support for MT7623 ethernet")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kuniyuki Iwashima [Wed, 12 Jul 2023 15:44:49 +0000 (08:44 -0700)]
bridge: Add extack warning when enabling STP in netns.
When we create an L2 loop on a bridge in netns, we will see packets storm
even if STP is enabled.
# unshare -n
# ip link add br0 type bridge
# ip link add veth0 type veth peer name veth1
# ip link set veth0 master br0 up
# ip link set veth1 master br0 up
# ip link set br0 type bridge stp_state 1
# ip link set br0 up
# sleep 30
# ip -s link show br0
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
956553768 12861249 0 0 0
12861249 <-. Keep
TX: bytes packets errors dropped carrier collsns | increasing
1027834 11951 0 0 0 0 <-' rapidly
This is because llc_rcv() drops all packets in non-root netns and BPDU
is dropped.
Let's add extack warning when enabling STP in netns.
# unshare -n
# ip link add br0 type bridge
# ip link set br0 type bridge stp_state 1
Warning: bridge: STP does not work in non-root netns.
Note this commit will be reverted later when we namespacify the whole LLC
infra.
Fixes:
e730c15519d0 ("[NET]: Make packet reception network namespace safe")
Suggested-by: Harry Coin <hcoin@quietfountain.com>
Link: https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com/
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tanmay Patil [Wed, 12 Jul 2023 11:06:57 +0000 (16:36 +0530)]
net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()
CPSW ALE has 75 bit ALE entries which are stored within three 32 bit words.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions assume that the
field will be strictly contained within one word. However, this is not
guaranteed to be the case and it is possible for ALE field entries to span
across up to two words at the most.
Fix the methods to handle getting/setting fields spanning up to two words.
Fixes:
db82173f23c5 ("netdev: driver: ethernet: add cpsw address lookup engine support")
Signed-off-by: Tanmay Patil <t-patil@ti.com>
[s-vadapalli@ti.com: rephrased commit message and added Fixes tag]
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Brown [Wed, 12 Jul 2023 11:16:16 +0000 (12:16 +0100)]
net: dsa: ar9331: Use explict flags for regmap single read/write
The at9331 is only able to read or write a single register at once. The
driver has a custom regmap bus and chooses to tell the regmap core about
this by reporting the maximum transfer sizes rather than the explicit
flags that exist at the regmap level. Since there are a number of
problems with the raw transfer limits and the regmap level flags are
better integrated anyway convert the driver to use the flags.
No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alan Stern [Wed, 12 Jul 2023 14:15:10 +0000 (10:15 -0400)]
net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb
The syzbot fuzzer identified a problem in the usbnet driver:
usb 1-1: BOGUS urb xfer, pipe 3 != type 1
WARNING: CPU: 0 PID: 754 at drivers/usb/core/urb.c:504 usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504
Modules linked in:
CPU: 0 PID: 754 Comm: kworker/0:2 Not tainted 6.4.0-rc7-syzkaller-00014-g692b7dc87ca6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Workqueue: mld mld_ifc_work
RIP: 0010:usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504
Code: 7c 24 18 e8 2c b4 5b fb 48 8b 7c 24 18 e8 42 07 f0 fe 41 89 d8 44 89 e1 4c 89 ea 48 89 c6 48 c7 c7 a0 c9 fc 8a e8 5a 6f 23 fb <0f> 0b e9 58 f8 ff ff e8 fe b3 5b fb 48 81 c5 c0 05 00 00 e9 84 f7
RSP: 0018:
ffffc9000463f568 EFLAGS:
00010086
RAX:
0000000000000000 RBX:
0000000000000001 RCX:
0000000000000000
RDX:
ffff88801eb28000 RSI:
ffffffff814c03b7 RDI:
0000000000000001
RBP:
ffff8881443b7190 R08:
0000000000000001 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000001 R12:
0000000000000003
R13:
ffff88802a77cb18 R14:
0000000000000003 R15:
ffff888018262500
FS:
0000000000000000(0000) GS:
ffff8880b9800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000556a99c15a18 CR3:
0000000028c71000 CR4:
0000000000350ef0
Call Trace:
<TASK>
usbnet_start_xmit+0xfe5/0x2190 drivers/net/usb/usbnet.c:1453
__netdev_start_xmit include/linux/netdevice.h:4918 [inline]
netdev_start_xmit include/linux/netdevice.h:4932 [inline]
xmit_one net/core/dev.c:3578 [inline]
dev_hard_start_xmit+0x187/0x700 net/core/dev.c:3594
...
This bug is caused by the fact that usbnet trusts the bulk endpoint
addresses its probe routine receives in the driver_info structure, and
it does not check to see that these endpoints actually exist and have
the expected type and directions.
The fix is simply to add such a check.
Reported-and-tested-by: syzbot+63ee658b9a100ffadbe2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/
000000000000a56e9105d0cec021@google.com/
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Oliver Neukum <oneukum@suse.com>
Link: https://lore.kernel.org/r/ea152b6d-44df-4f8a-95c6-4db51143dcc1@rowland.harvard.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Walleij [Wed, 12 Jul 2023 22:34:05 +0000 (00:34 +0200)]
dsa: mv88e6xxx: Do a final check before timing out
I get sporadic timeouts from the driver when using the
MV88E6352. Reading the status again after the loop fixes the
problem: the operation is successful but goes undetected.
Some added prints show things like this:
[ 58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
[ 58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
ATU op 4000, fid 0001
(...)
[ 61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
[ 61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
for PHY command 1860 to complete
The reason is probably not the commands: I think those are
mostly fine with the 50+50ms timeout, but the problem
appears when OpenWrt brings up several interfaces in
parallel on a system with 7 populated ports: if one of
them take more than 50 ms and waits one or more of the
others can get stuck on the mutex for the switch and then
this can easily multiply.
As we sleep and wait, the function loop needs a final
check after exiting the loop if we were successful.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Cc: Tobias Waldekranz <tobias@waldekranz.com>
Fixes:
35da1dfd9484 ("net: dsa: mv88e6xxx: Improve performance of busy bit polling")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230712223405.861899-1-linus.walleij@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 13 Jul 2023 21:21:22 +0000 (14:21 -0700)]
Merge tag 'net-6.5-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter, wireless and ebpf.
Current release - regressions:
- netfilter: conntrack: gre: don't set assured flag for clash entries
- wifi: iwlwifi: remove 'use_tfh' config to fix crash
Previous releases - regressions:
- ipv6: fix a potential refcount underflow for idev
- icmp6: ifix null-ptr-deref of ip6_null_entry->rt6i_idev in
icmp6_dev()
- bpf: fix max stack depth check for async callbacks
- eth: mlx5e:
- check for NOT_READY flag state after locking
- fix page_pool page fragment tracking for XDP
- eth: igc:
- fix tx hang issue when QBV gate is closed
- fix corner cases for TSN offload
- eth: octeontx2-af: Move validation of ptp pointer before its usage
- eth: ena: fix shift-out-of-bounds in exponential backoff
Previous releases - always broken:
- core: prevent skb corruption on frag list segmentation
- sched:
- cls_fw: fix improper refcount update leads to use-after-free
- sch_qfq: account for stab overhead in qfq_enqueue
- netfilter:
- report use refcount overflow
- prevent OOB access in nft_byteorder_eval
- wifi: mt7921e: fix init command fail with enabled device
- eth: ocelot: fix oversize frame dropping for preemptible TCs
- eth: fec: recycle pages for transmitted XDP frames"
* tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
selftests: tc-testing: add test for qfq with stab overhead
net/sched: sch_qfq: account for stab overhead in qfq_enqueue
selftests: tc-testing: add tests for qfq mtu sanity check
net/sched: sch_qfq: reintroduce lmax bound check for MTU
wifi: cfg80211: fix receiving mesh packets without RFC1042 header
wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set()
net: txgbe: fix eeprom calculation error
net/sched: make psched_mtu() RTNL-less safe
net: ena: fix shift-out-of-bounds in exponential backoff
netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()
net/sched: flower: Ensure both minimum and maximum ports are specified
MAINTAINERS: Add another mailing list for QUALCOMM ETHQOS ETHERNET DRIVER
docs: netdev: update the URL of the status page
wifi: iwlwifi: remove 'use_tfh' config to fix crash
xdp: use trusted arguments in XDP hints kfuncs
bpf: cpumap: Fix memory leak in cpu_map_update_elem
wifi: airo: avoid uninitialized warning in airo_get_rate()
octeontx2-pf: Add additional check for MCAM rules
net: dsa: Removed unneeded of_node_put in felix_parse_ports_node
net: fec: use netdev_err_once() instead of netdev_err()
...
Linus Torvalds [Thu, 13 Jul 2023 20:44:28 +0000 (13:44 -0700)]
Merge tag 'trace-v6.5-rc1-3' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix some missing-prototype warnings
- Fix user events struct args (did not include size of struct)
When creating a user event, the "struct" keyword is to denote that
the size of the field will be passed in. But the parsing failed to
handle this case.
- Add selftest to struct sizes for user events
- Fix sample code for direct trampolines.
The sample code for direct trampolines attached to handle_mm_fault().
But the prototype changed and the direct trampoline sample code was
not updated. Direct trampolines needs to have the arguments correct
otherwise it can fail or crash the system.
- Remove unused ftrace_regs_caller_ret() prototype.
- Quiet false positive of FORTIFY_SOURCE
Due to backward compatibility, the structure used to save stack
traces in the kernel had a fixed size of 8. This structure is
exported to user space via the tracing format file. A change was made
to allow more than 8 functions to be recorded, and user space now
uses the size field to know how many functions are actually in the
stack.
But the structure still has size of 8 (even though it points into the
ring buffer that has the required amount allocated to hold a full
stack.
This was fine until the fortifier noticed that the
memcpy(&entry->caller, stack, size) was greater than the 8 functions
and would complain at runtime about it.
Hide this by using a pointer to the stack location on the ring buffer
instead of using the address of the entry structure caller field.
- Fix a deadloop in reading trace_pipe that was caused by a mismatch
between ring_buffer_empty() returning false which then asked to read
the data, but the read code uses rb_num_of_entries() that returned
zero, and causing a infinite "retry".
- Fix a warning caused by not using all pages allocated to store ftrace
functions, where this can happen if the linker inserts a bunch of
"NULL" entries, causing the accounting of how many pages needed to be
off.
- Fix histogram synthetic event crashing when the start event is
removed and the end event is still using a variable from it
- Fix memory leak in freeing iter->temp in tracing_release_pipe()
* tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix memory leak of iter->temp when reading trace_pipe
tracing/histograms: Add histograms to hist_vars if they have referenced variables
tracing: Stop FORTIFY_SOURCE complaining about stack trace caller
ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
ring-buffer: Fix deadloop issue on reading trace_pipe
tracing: arm64: Avoid missing-prototype warnings
selftests/user_events: Test struct size match cases
tracing/user_events: Fix struct arg size match check
x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret()
arm64: ftrace: Add direct call trampoline samples support
samples: ftrace: Save required argument registers in sample trampolines
Linus Torvalds [Thu, 13 Jul 2023 20:39:36 +0000 (13:39 -0700)]
Merge tag 'for-linus-6.5-rc2-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- a cleanup of the Xen related ELF-notes
- a fix for virtio handling in Xen dom0 when running Xen in a VM
* tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent
x86/Xen: tidy xen-head.S
Linus Torvalds [Thu, 13 Jul 2023 20:34:00 +0000 (13:34 -0700)]
Merge tag 'sh-for-v6.5-tag2' of git://git./linux/kernel/git/glaubitz/sh-linux
Pull sh fixes from John Paul Adrian Glaubitz:
"The sh updates introduced multiple regressions.
In particular, the change
a8ac2961148e ("sh: Avoid using IRQ0 on SH3
and SH4") causes several boards to hang during boot due to incorrect
IRQ numbers.
Geert Uytterhoeven has contributed patches that handle the virq offset
in the IRQ code for the dreamcast, highlander and r2d boards while
Artur Rojek has contributed a patch which handles the virq offset for
the hd64461 companion chip"
* tag 'sh-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: hd64461: Handle virq offset for offchip IRQ base and HD64461 IRQ
sh: mach-dreamcast: Handle virq offset in cascaded IRQ demux
sh: mach-highlander: Handle virq offset in cascaded IRL demux
sh: mach-r2d: Handle virq offset in cascaded IRL demux
Zheng Yejian [Thu, 13 Jul 2023 14:14:35 +0000 (22:14 +0800)]
tracing: Fix memory leak of iter->temp when reading trace_pipe
kmemleak reports:
unreferenced object 0xffff88814d14e200 (size 256):
comm "cat", pid 336, jiffies
4294871818 (age 779.490s)
hex dump (first 32 bytes):
04 00 01 03 00 00 00 00 08 00 00 00 00 00 00 00 ................
0c d8 c8 9b ff ff ff ff 04 5a ca 9b ff ff ff ff .........Z......
backtrace:
[<
ffffffff9bdff18f>] __kmalloc+0x4f/0x140
[<
ffffffff9bc9238b>] trace_find_next_entry+0xbb/0x1d0
[<
ffffffff9bc9caef>] trace_print_lat_context+0xaf/0x4e0
[<
ffffffff9bc94490>] print_trace_line+0x3e0/0x950
[<
ffffffff9bc95499>] tracing_read_pipe+0x2d9/0x5a0
[<
ffffffff9bf03a43>] vfs_read+0x143/0x520
[<
ffffffff9bf04c2d>] ksys_read+0xbd/0x160
[<
ffffffff9d0f0edf>] do_syscall_64+0x3f/0x90
[<
ffffffff9d2000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
when reading file 'trace_pipe', 'iter->temp' is allocated or relocated
in trace_find_next_entry() but not freed before 'trace_pipe' is closed.
To fix it, free 'iter->temp' in tracing_release_pipe().
Link: https://lore.kernel.org/linux-trace-kernel/20230713141435.1133021-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes:
ff895103a84ab ("tracing: Save off entry when peeking at next entry")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Paolo Abeni [Thu, 13 Jul 2023 09:12:01 +0000 (11:12 +0200)]
Merge branch 'net-sched-fixes-for-sch_qfq'
Pedro Tammela says:
====================
net/sched: fixes for sch_qfq
Patch 1 fixes a regression introduced in 6.4 where the MTU size could be
bigger than 'lmax'.
Patch 3 fixes an issue where the code doesn't account for qdisc_pkt_len()
returning a size bigger then 'lmax'.
Patches 2 and 4 are selftests for the issues above.
====================
Link: https://lore.kernel.org/r/20230711210103.597831-1-pctammela@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pedro Tammela [Tue, 11 Jul 2023 21:01:03 +0000 (18:01 -0300)]
selftests: tc-testing: add test for qfq with stab overhead
A packet with stab overhead greater than QFQ_MAX_LMAX should be dropped
by the QFQ qdisc as it can't handle such lengths.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pedro Tammela [Tue, 11 Jul 2023 21:01:02 +0000 (18:01 -0300)]
net/sched: sch_qfq: account for stab overhead in qfq_enqueue
Lion says:
-------
In the QFQ scheduler a similar issue to CVE-2023-31436
persists.
Consider the following code in net/sched/sch_qfq.c:
static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
unsigned int len = qdisc_pkt_len(skb), gso_segs;
// ...
if (unlikely(cl->agg->lmax < len)) {
pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
cl->agg->lmax, len, cl->common.classid);
err = qfq_change_agg(sch, cl, cl->agg->class_weight, len);
if (err) {
cl->qstats.drops++;
return qdisc_drop(skb, sch, to_free);
}
// ...
}
Similarly to CVE-2023-31436, "lmax" is increased without any bounds
checks according to the packet length "len". Usually this would not
impose a problem because packet sizes are naturally limited.
This is however not the actual packet length, rather the
"qdisc_pkt_len(skb)" which might apply size transformations according to
"struct qdisc_size_table" as created by "qdisc_get_stab()" in
net/sched/sch_api.c if the TCA_STAB option was set when modifying the qdisc.
A user may choose virtually any size using such a table.
As a result the same issue as in CVE-2023-31436 can occur, allowing heap
out-of-bounds read / writes in the kmalloc-8192 cache.
-------
We can create the issue with the following commands:
tc qdisc add dev $DEV root handle 1: stab mtu 2048 tsize 512 mpu 0 \
overhead
999999999 linklayer ethernet qfq
tc class add dev $DEV parent 1: classid 1:1 htb rate 6mbit burst 15k
tc filter add dev $DEV parent 1: matchall classid 1:1
ping -I $DEV 1.1.1.2
This is caused by incorrectly assuming that qdisc_pkt_len() returns a
length within the QFQ_MIN_LMAX < len < QFQ_MAX_LMAX.
Fixes:
462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: Lion <nnamrec@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pedro Tammela [Tue, 11 Jul 2023 21:01:01 +0000 (18:01 -0300)]
selftests: tc-testing: add tests for qfq mtu sanity check
QFQ only supports a certain bound of MTU size so make sure
we check for this requirement in the tests.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pedro Tammela [Tue, 11 Jul 2023 21:01:00 +0000 (18:01 -0300)]
net/sched: sch_qfq: reintroduce lmax bound check for MTU
25369891fcef deletes a check for the case where no 'lmax' is
specified which
3037933448f6 previously fixed as 'lmax'
could be set to the device's MTU without any bound checking
for QFQ_LMAX_MIN and QFQ_LMAX_MAX. Therefore, reintroduce the check.
Fixes:
25369891fcef ("net/sched: sch_qfq: refactor parsing of netlink parameters")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Artur Rojek [Mon, 10 Jul 2023 23:31:32 +0000 (01:31 +0200)]
sh: hd64461: Handle virq offset for offchip IRQ base and HD64461 IRQ
A recent change to start counting SuperH IRQ #s from 16 breaks support
for the Hitachi HD64461 companion chip.
Move the offchip IRQ base and HD64461 IRQ # by 16 in order to
accommodate for the new virq numbering rules.
Fixes:
a8ac2961148e ("sh: Avoid using IRQ0 on SH3 and SH4")
Signed-off-by: Artur Rojek <contact@artur-rojek.eu>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20230710233132.69734-1-contact@artur-rojek.eu
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Geert Uytterhoeven [Sun, 9 Jul 2023 13:10:43 +0000 (15:10 +0200)]
sh: mach-dreamcast: Handle virq offset in cascaded IRQ demux
Take into account the virq offset when translating cascaded interrupts.
Fixes:
a8ac2961148e8c72 ("sh: Avoid using IRQ0 on SH3 and SH4")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/7d0cb246c9f1cd24bb1f637ec5cb67e799a4c3b8.1688908227.git.geert+renesas@glider.be
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Geert Uytterhoeven [Sun, 9 Jul 2023 13:10:23 +0000 (15:10 +0200)]
sh: mach-highlander: Handle virq offset in cascaded IRL demux
Take into account the virq offset when translating cascaded IRL
interrupts.
Fixes:
a8ac2961148e8c72 ("sh: Avoid using IRQ0 on SH3 and SH4")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/4fcb0d08a2b372431c41e04312742dc9e41e1be4.1688908186.git.geert+renesas@glider.be
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Geert Uytterhoeven [Sun, 9 Jul 2023 11:15:49 +0000 (13:15 +0200)]
sh: mach-r2d: Handle virq offset in cascaded IRL demux
When booting rts7751r2dplus_defconfig on QEMU, the system hangs due to
an interrupt storm on IRQ 20. IRQ 20 aka event 0x280 is a cascaded IRL
interrupt, which maps to IRQ_VOYAGER, the interrupt used by the Silicon
Motion SM501 multimedia companion chip. As rts7751r2d_irq_demux() does
not take into account the new virq offset, the interrupt is no longer
translated, leading to an unhandled interrupt.
Fix this by taking into account the virq offset when translating
cascaded IRL interrupts.
Fixes:
a8ac2961148e8c72 ("sh: Avoid using IRQ0 on SH3 and SH4")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/r/
fbfea3ad-d327-4ad5-ac9c-
648c7ca3fe1f@roeck-us.net
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/2c99d5df41c40691f6c407b7b6a040d406bc81ac.1688901306.git.geert+renesas@glider.be
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Jakub Kicinski [Thu, 13 Jul 2023 01:13:57 +0000 (18:13 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
pull-request: bpf 2023-07-12
We've added 5 non-merge commits during the last 7 day(s) which contain
a total of 7 files changed, 93 insertions(+), 28 deletions(-).
The main changes are:
1) Fix max stack depth check for async callbacks, from Kumar.
2) Fix inconsistent JIT image generation, from Björn.
3) Use trusted arguments in XDP hints kfuncs, from Larysa.
4) Fix memory leak in cpu_map_update_elem, from Pu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xdp: use trusted arguments in XDP hints kfuncs
bpf: cpumap: Fix memory leak in cpu_map_update_elem
riscv, bpf: Fix inconsistent JIT image generation
selftests/bpf: Add selftest for check_stack_max_depth bug
bpf: Fix max stack depth check for async callbacks
====================
Link: https://lore.kernel.org/r/20230712223045.40182-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Tue, 11 Jul 2023 11:50:52 +0000 (13:50 +0200)]
wifi: cfg80211: fix receiving mesh packets without RFC1042 header
Fix ethernet header length field after stripping the mesh header
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/CT5GNZSK28AI.2K6M69OXM9RW5@syracuse/
Fixes:
986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces")
Reported-and-tested-by: Nicolas Escande <nico.escande@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230711115052.68430-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zhang Shurong [Thu, 6 Jul 2023 02:45:00 +0000 (10:45 +0800)]
wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set()
If there is a failure during rtw89_fw_h2c_raw() rtw89_debug_priv_send_h2c
should return negative error code instead of a positive value count.
Fix this bug by returning correct error code.
Fixes:
e3ec7017f6a2 ("rtw89: add Realtek 802.11ax driver")
Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/tencent_AD09A61BC4DA92AD1EB0790F5C850E544D07@qq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Tue, 11 Jul 2023 06:34:14 +0000 (14:34 +0800)]
net: txgbe: fix eeprom calculation error
For some device types like TXGBE_ID_XAUI, *checksum computed in
txgbe_calc_eeprom_checksum() is larger than TXGBE_EEPROM_SUM. Remove the
limit on the size of *checksum.
Fixes:
049fe5365324 ("net: txgbe: Add operations to interact with firmware")
Fixes:
5e2ea7801fac ("net: txgbe: Fix unsigned comparison to zero in txgbe_calc_eeprom_checksum()")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://lore.kernel.org/r/20230711063414.3311-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 12 Jul 2023 23:28:53 +0000 (16:28 -0700)]
Merge tag 'for-linus' of https://github.com/openrisc/linux
Pull OpenRISC fix from Stafford Horne:
- During the 6.4 cycle my fpu support work broke ABI compatibility in
the sigcontext struct. This was noticed by musl libc developers after
the release. This fix restores the ABI.
* tag 'for-linus' of https://github.com/openrisc/linux:
openrisc: Union fpcsr and oldmask in sigcontext to unbreak userspace ABI
Mohamed Khalfella [Wed, 12 Jul 2023 22:30:21 +0000 (22:30 +0000)]
tracing/histograms: Add histograms to hist_vars if they have referenced variables
Hist triggers can have referenced variables without having direct
variables fields. This can be the case if referenced variables are added
for trigger actions. In this case the newly added references will not
have field variables. Not taking such referenced variables into
consideration can result in a bug where it would be possible to remove
hist trigger with variables being refenced. This will result in a bug
that is easily reproducable like so
$ cd /sys/kernel/tracing
$ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events
$ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
$ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger
$ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
[ 100.263533] ==================================================================
[ 100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180
[ 100.265520] Read of size 8 at addr
ffff88810375d0f0 by task bash/439
[ 100.266320]
[ 100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4
[ 100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 100.268561] Call Trace:
[ 100.268902] <TASK>
[ 100.269189] dump_stack_lvl+0x4c/0x70
[ 100.269680] print_report+0xc5/0x600
[ 100.270165] ? resolve_var_refs+0xc7/0x180
[ 100.270697] ? kasan_complete_mode_report_info+0x80/0x1f0
[ 100.271389] ? resolve_var_refs+0xc7/0x180
[ 100.271913] kasan_report+0xbd/0x100
[ 100.272380] ? resolve_var_refs+0xc7/0x180
[ 100.272920] __asan_load8+0x71/0xa0
[ 100.273377] resolve_var_refs+0xc7/0x180
[ 100.273888] event_hist_trigger+0x749/0x860
[ 100.274505] ? kasan_save_stack+0x2a/0x50
[ 100.275024] ? kasan_set_track+0x29/0x40
[ 100.275536] ? __pfx_event_hist_trigger+0x10/0x10
[ 100.276138] ? ksys_write+0xd1/0x170
[ 100.276607] ? do_syscall_64+0x3c/0x90
[ 100.277099] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.277771] ? destroy_hist_data+0x446/0x470
[ 100.278324] ? event_hist_trigger_parse+0xa6c/0x3860
[ 100.278962] ? __pfx_event_hist_trigger_parse+0x10/0x10
[ 100.279627] ? __kasan_check_write+0x18/0x20
[ 100.280177] ? mutex_unlock+0x85/0xd0
[ 100.280660] ? __pfx_mutex_unlock+0x10/0x10
[ 100.281200] ? kfree+0x7b/0x120
[ 100.281619] ? ____kasan_slab_free+0x15d/0x1d0
[ 100.282197] ? event_trigger_write+0xac/0x100
[ 100.282764] ? __kasan_slab_free+0x16/0x20
[ 100.283293] ? __kmem_cache_free+0x153/0x2f0
[ 100.283844] ? sched_mm_cid_remote_clear+0xb1/0x250
[ 100.284550] ? __pfx_sched_mm_cid_remote_clear+0x10/0x10
[ 100.285221] ? event_trigger_write+0xbc/0x100
[ 100.285781] ? __kasan_check_read+0x15/0x20
[ 100.286321] ? __bitmap_weight+0x66/0xa0
[ 100.286833] ? _find_next_bit+0x46/0xe0
[ 100.287334] ? task_mm_cid_work+0x37f/0x450
[ 100.287872] event_triggers_call+0x84/0x150
[ 100.288408] trace_event_buffer_commit+0x339/0x430
[ 100.289073] ? ring_buffer_event_data+0x3f/0x60
[ 100.292189] trace_event_raw_event_sys_enter+0x8b/0xe0
[ 100.295434] syscall_trace_enter.constprop.0+0x18f/0x1b0
[ 100.298653] syscall_enter_from_user_mode+0x32/0x40
[ 100.301808] do_syscall_64+0x1a/0x90
[ 100.304748] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.307775] RIP: 0033:0x7f686c75c1cb
[ 100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48
[ 100.317847] RSP: 002b:
00007ffc60137a38 EFLAGS:
00000246 ORIG_RAX:
0000000000000021
[ 100.321200] RAX:
ffffffffffffffda RBX:
000055f566469ea0 RCX:
00007f686c75c1cb
[ 100.324631] RDX:
0000000000000001 RSI:
0000000000000001 RDI:
000000000000000a
[ 100.328104] RBP:
00007ffc60137ac0 R08:
00007f686c818460 R09:
000000000000000a
[ 100.331509] R10:
0000000000000000 R11:
0000000000000246 R12:
0000000000000009
[ 100.334992] R13:
0000000000000007 R14:
000000000000000a R15:
0000000000000007
[ 100.338381] </TASK>
We hit the bug because when second hist trigger has was created
has_hist_vars() returned false because hist trigger did not have
variables. As a result of that save_hist_vars() was not called to add
the trigger to trace_array->hist_vars. Later on when we attempted to
remove the first histogram find_any_var_ref() failed to detect it is
being used because it did not find the second trigger in hist_vars list.
With this change we wait until trigger actions are created so we can take
into consideration if hist trigger has variable references. Also, now we
check the return value of save_hist_vars() and fail trigger creation if
save_hist_vars() fails.
Link: https://lore.kernel.org/linux-trace-kernel/20230712223021.636335-1-mkhalfella@purestorage.com
Cc: stable@vger.kernel.org
Fixes:
067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pedro Tammela [Tue, 11 Jul 2023 02:16:34 +0000 (23:16 -0300)]
net/sched: make psched_mtu() RTNL-less safe
Eric Dumazet says[1]:
-------
Speaking of psched_mtu(), I see that net/sched/sch_pie.c is using it
without holding RTNL, so dev->mtu can be changed underneath.
KCSAN could issue a warning.
-------
Annotate dev->mtu with READ_ONCE() so KCSAN don't issue a warning.
[1] https://lore.kernel.org/all/CANn89iJoJO5VtaJ-2=_d2aOQhb0Xw8iBT_Cxqp2HyuS-zj6azw@mail.gmail.com/
v1 -> v2: Fix commit message
Fixes:
d4b36210c2e6 ("net: pkt_sched: PIE AQM scheme")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230711021634.561598-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Krister Johansen [Tue, 11 Jul 2023 01:36:21 +0000 (18:36 -0700)]
net: ena: fix shift-out-of-bounds in exponential backoff
The ENA adapters on our instances occasionally reset. Once recently
logged a UBSAN failure to console in the process:
UBSAN: shift-out-of-bounds in build/linux/drivers/net/ethernet/amazon/ena/ena_com.c:540:13
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 28 PID: 70012 Comm: kworker/u72:2 Kdump: loaded not tainted 5.15.117
Hardware name: Amazon EC2 c5d.9xlarge/, BIOS 1.0 10/16/2017
Workqueue: ena ena_fw_reset_device [ena]
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x36
__ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e
? __const_udelay+0x43/0x50
ena_delay_exponential_backoff_us.cold+0x16/0x1e [ena]
wait_for_reset_state+0x54/0xa0 [ena]
ena_com_dev_reset+0xc8/0x110 [ena]
ena_down+0x3fe/0x480 [ena]
ena_destroy_device+0xeb/0xf0 [ena]
ena_fw_reset_device+0x30/0x50 [ena]
process_one_work+0x22b/0x3d0
worker_thread+0x4d/0x3f0
? process_one_work+0x3d0/0x3d0
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
Apparently, the reset delays are getting so large they can trigger a
UBSAN panic.
Looking at the code, the current timeout is capped at 5000us. Using a
base value of 100us, the current code will overflow after (1<<29). Even
at values before 32, this function wraps around, perhaps
unintentionally.
Cap the value of the exponent used for this backoff at (1<<16) which is
larger than currently necessary, but large enough to support bigger
values in the future.
Cc: stable@vger.kernel.org
Fixes:
4bb7f4cf60e3 ("net: ena: reduce driver load time")
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230711013621.GE1926@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steven Rostedt (Google) [Wed, 12 Jul 2023 14:52:35 +0000 (10:52 -0400)]
tracing: Stop FORTIFY_SOURCE complaining about stack trace caller
The stack_trace event is an event created by the tracing subsystem to
store stack traces. It originally just contained a hard coded array of 8
words to hold the stack, and a "size" to know how many entries are there.
This is exported to user space as:
name: kernel_stack
ID: 4
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int size; offset:8; size:4; signed:1;
field:unsigned long caller[8]; offset:16; size:64; signed:0;
print fmt: "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n",i
(void *)REC->caller[0], (void *)REC->caller[1], (void *)REC->caller[2],
(void *)REC->caller[3], (void *)REC->caller[4], (void *)REC->caller[5],
(void *)REC->caller[6], (void *)REC->caller[7]
Where the user space tracers could parse the stack. The library was
updated for this specific event to only look at the size, and not the
array. But some older users still look at the array (note, the older code
still checks to make sure the array fits inside the event that it read.
That is, if only 4 words were saved, the parser would not read the fifth
word because it will see that it was outside of the event size).
This event was changed a while ago to be more dynamic, and would save a
full stack even if it was greater than 8 words. It does this by simply
allocating more ring buffer to hold the extra words. Then it copies in the
stack via:
memcpy(&entry->caller, fstack->calls, size);
As the entry is struct stack_entry, that is created by a macro to both
create the structure and export this to user space, it still had the caller
field of entry defined as: unsigned long caller[8].
When the stack is greater than 8, the FORTIFY_SOURCE code notices that the
amount being copied is greater than the source array and complains about
it. It has no idea that the source is pointing to the ring buffer with the
required allocation.
To hide this from the FORTIFY_SOURCE logic, pointer arithmetic is used:
ptr = ring_buffer_event_data(event);
entry = ptr;
ptr += offsetof(typeof(*entry), caller);
memcpy(ptr, fstack->calls, size);
Link: https://lore.kernel.org/all/20230612160748.4082850-1-svens@linux.ibm.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230712105235.5fc441aa@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Zheng Yejian [Wed, 12 Jul 2023 06:04:52 +0000 (14:04 +0800)]
ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
As comments in ftrace_process_locs(), there may be NULL pointers in
mcount_loc section:
> Some architecture linkers will pad between
> the different mcount_loc sections of different
> object files to satisfy alignments.
> Skip any NULL pointers.
After commit
20e5227e9f55 ("ftrace: allow NULL pointers in mcount_loc"),
NULL pointers will be accounted when allocating ftrace pages but skipped
before adding into ftrace pages, this may result in some pages not being
used. Then after commit
706c81f87f84 ("ftrace: Remove extra helper
functions"), warning may occur at:
WARN_ON(pg->next);
To fix it, only warn for case that no pointers skipped but pages not used
up, then free those unused pages after releasing ftrace_lock.
Link: https://lore.kernel.org/linux-trace-kernel/20230712060452.3175675-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes:
706c81f87f84 ("ftrace: Remove extra helper functions")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Dan Carpenter [Tue, 11 Jul 2023 08:52:26 +0000 (11:52 +0300)]
netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()
The simple_write_to_buffer() function is designed to handle partial
writes. It returns negatives on error, otherwise it returns the number
of bytes that were able to be copied. This code doesn't check the
return properly. We only know that the first byte is written, the rest
of the buffer might be uninitialized.
There is no need to use the simple_write_to_buffer() function.
Partial writes are prohibited by the "if (*ppos != 0)" check at the
start of the function. Just use memdup_user() and copy the whole
buffer.
Fixes:
d3cbb907ae57 ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/7c1f950b-3a7d-4252-82a6-876e53078ef7@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 12 Jul 2023 19:16:47 +0000 (12:16 -0700)]
Merge tag 'platform-drivers-x86-v6.5-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Misc small fixes and hw-id additions"
* tag 'platform-drivers-x86-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: touchscreen_dmi: Add info for the Archos 101 Cesium Educ tablet
platform/x86: dell-ddv: Fix mangled list in documentation
platform/x86: dell-ddv: Improve error handling
platform/x86/amd: pmf: Add new ACPI ID AMDI0103
platform/x86/amd: pmc: Add new ACPI ID AMDI000A
platform/x86/amd: pmc: Apply nvme quirk to HP 15s-eq2xxx
platform/x86: Move s2idle quirk from thinkpad-acpi to amd-pmc
platform/x86: int3472/discrete: set variable skl_int3472_regulator_second_sensor storage-class-specifier to static
platform/x86/intel/tpmi: Prevent overflow for cap_offset
platform/x86: wmi: Replace open coded guid_parse_and_compare()
platform/x86: wmi: Break possible infinite loop when parsing GUID
Linus Torvalds [Wed, 12 Jul 2023 19:01:16 +0000 (12:01 -0700)]
Merge tag 'probes-fixes-v6.5-rc1' of git://git./linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- Fix fprobe's rethook release issues:
- Release rethook after ftrace_ops is unregistered so that the
rethook is not accessed after free.
- Stop rethook before ftrace_ops is unregistered so that the
rethook is NOT used after exiting unregister_fprobe()
- Fix eprobe cleanup logic. If it attaches to multiple events and
failes to enable one of them, rollback all enabled events correctly.
- Fix fprobe to unlock ftrace recursion lock correctly when it missed
by another running kprobe.
- Cleanup kprobe to remove unnecessary NULL.
- Cleanup kprobe to remove unnecessary 0 initializations.
* tag 'probes-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
kernel: kprobes: Remove unnecessary ‘0’ values
kprobes: Remove unnecessary ‘NULL’ values from correct_ret_addr
fprobe: add unlock to match a succeeded ftrace_test_recursion_trylock
kernel/trace: Fix cleanup logic of enable_trace_eprobe
fprobe: Release rethook after the ftrace_ops is unregistered
Linus Torvalds [Wed, 12 Jul 2023 18:56:22 +0000 (11:56 -0700)]
Merge tag 'for-linus-
2023071101' of git://git./linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- AMD SFH shift-out-of-bounds fix (Basavaraj Natikar)
- avoid struct memcpy overrun warning in the hid-hyperv module (Arnd
Bergmann)
- a quick HID kselftests script fix for our CI to be happy (Benjamin
Tissoires)
- various fixes and additions of device IDs
* tag 'for-linus-
2023071101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: amd_sfh: Fix for shift-out-of-bounds
HID: amd_sfh: Rename the float32 variable
HID: input: fix mapping for camera access keys
HID: logitech-hidpp: Add wired USB id for Logitech G502 Lightspeed
HID: nvidia-shield: Pack inner/related declarations in HOSTCMD reports
HID: hyperv: avoid struct memcpy overrun warning
selftests: hid: fix vmtests.sh not running make headers
Zheng Yejian [Sat, 8 Jul 2023 22:51:44 +0000 (06:51 +0800)]
ring-buffer: Fix deadloop issue on reading trace_pipe
Soft lockup occurs when reading file 'trace_pipe':
watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [cat:4488]
[...]
RIP: 0010:ring_buffer_empty_cpu+0xed/0x170
RSP: 0018:
ffff88810dd6fc48 EFLAGS:
00000246
RAX:
0000000000000000 RBX:
0000000000000246 RCX:
ffffffff93d1aaeb
RDX:
ffff88810a280040 RSI:
0000000000000008 RDI:
ffff88811164b218
RBP:
ffff88811164b218 R08:
0000000000000000 R09:
ffff88815156600f
R10:
ffffed102a2acc01 R11:
0000000000000001 R12:
0000000051651901
R13:
0000000000000000 R14:
ffff888115e49500 R15:
0000000000000000
[...]
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f8d853c2000 CR3:
000000010dcd8000 CR4:
00000000000006e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
__find_next_entry+0x1a8/0x4b0
? peek_next_entry+0x250/0x250
? down_write+0xa5/0x120
? down_write_killable+0x130/0x130
trace_find_next_entry_inc+0x3b/0x1d0
tracing_read_pipe+0x423/0xae0
? tracing_splice_read_pipe+0xcb0/0xcb0
vfs_read+0x16b/0x490
ksys_read+0x105/0x210
? __ia32_sys_pwrite64+0x200/0x200
? switch_fpu_return+0x108/0x220
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x61/0xc6
Through the vmcore, I found it's because in tracing_read_pipe(),
ring_buffer_empty_cpu() found some buffer is not empty but then it
cannot read anything due to "rb_num_of_entries() == 0" always true,
Then it infinitely loop the procedure due to user buffer not been
filled, see following code path:
tracing_read_pipe() {
... ...
waitagain:
tracing_wait_pipe() // 1. find non-empty buffer here
trace_find_next_entry_inc() // 2. loop here try to find an entry
__find_next_entry()
ring_buffer_empty_cpu(); // 3. find non-empty buffer
peek_next_entry() // 4. but peek always return NULL
ring_buffer_peek()
rb_buffer_peek()
rb_get_reader_page()
// 5. because rb_num_of_entries() == 0 always true here
// then return NULL
// 6. user buffer not been filled so goto 'waitgain'
// and eventually leads to an deadloop in kernel!!!
}
By some analyzing, I found that when resetting ringbuffer, the 'entries'
of its pages are not all cleared (see rb_reset_cpu()). Then when reducing
the ringbuffer, and if some reduced pages exist dirty 'entries' data, they
will be added into 'cpu_buffer->overrun' (see rb_remove_pages()), which
cause wrong 'overrun' count and eventually cause the deadloop issue.
To fix it, we need to clear every pages in rb_reset_cpu().
Link: https://lore.kernel.org/linux-trace-kernel/20230708225144.3785600-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes:
a5fb833172eca ("ring-buffer: Fix uninitialized read_stamp")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Arnd Bergmann [Wed, 17 May 2023 12:51:48 +0000 (14:51 +0200)]
tracing: arm64: Avoid missing-prototype warnings
These are all tracing W=1 warnings in arm64 allmodconfig about missing
prototypes:
kernel/trace/trace_kprobe_selftest.c:7:5: error: no previous prototype for 'kprobe_trace_selftest_target' [-Werror=missing-pro
totypes]
kernel/trace/ftrace.c:329:5: error: no previous prototype for '__register_ftrace_function' [-Werror=missing-prototypes]
kernel/trace/ftrace.c:372:5: error: no previous prototype for '__unregister_ftrace_function' [-Werror=missing-prototypes]
kernel/trace/ftrace.c:4130:15: error: no previous prototype for 'arch_ftrace_match_adjust' [-Werror=missing-prototypes]
kernel/trace/fgraph.c:243:15: error: no previous prototype for 'ftrace_return_to_handler' [-Werror=missing-prototypes]
kernel/trace/fgraph.c:358:6: error: no previous prototype for 'ftrace_graph_sleep_time_control' [-Werror=missing-prototypes]
arch/arm64/kernel/ftrace.c:460:6: error: no previous prototype for 'prepare_ftrace_return' [-Werror=missing-prototypes]
arch/arm64/kernel/ptrace.c:2172:5: error: no previous prototype for 'syscall_trace_enter' [-Werror=missing-prototypes]
arch/arm64/kernel/ptrace.c:2195:6: error: no previous prototype for 'syscall_trace_exit' [-Werror=missing-prototypes]
Move the declarations to an appropriate header where they can be seen
by the caller and callee, and make sure the headers are included where
needed.
Link: https://lore.kernel.org/linux-trace-kernel/20230517125215.930689-1-arnd@kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Florent Revest <revest@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
[ Fixed ftrace_return_to_handler() to handle CONFIG_HAVE_FUNCTION_GRAPH_RETVAL case ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Thu, 29 Jun 2023 23:50:49 +0000 (23:50 +0000)]
selftests/user_events: Test struct size match cases
The self tests for user_events currently does not ensure that the edge
case for struct types work properly with size differences.
Add cases for mis-matching struct names and sizes to ensure they work
properly.
Link: https://lkml.kernel.org/r/20230629235049.581-3-beaub@linux.microsoft.com
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Ido Schimmel [Tue, 11 Jul 2023 07:08:09 +0000 (10:08 +0300)]
net/sched: flower: Ensure both minimum and maximum ports are specified
The kernel does not currently validate that both the minimum and maximum
ports of a port range are specified. This can lead user space to think
that a filter matching on a port range was successfully added, when in
fact it was not. For example, with a patched (buggy) iproute2 that only
sends the minimum port, the following commands do not return an error:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
# tc filter show dev swp1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
filter protocol ip pref 1 flower chain 0 handle 0x2
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 2 ref 1 bind 1
Fix by returning an error unless both ports are specified:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
Error: Both min and max source ports must be specified.
We have an error talking to the kernel
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
Error: Both min and max destination ports must be specified.
We have an error talking to the kernel
Fixes:
5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Jul 2023 09:07:24 +0000 (10:07 +0100)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
igc: Fix corner cases for TSN offload
Florian Kauer says:
The igc driver supports several different offloading capabilities
relevant in the TSN context. Recent patches in this area introduced
regressions for certain corner cases that are fixed in this series.
Each of the patches (except the first one) addresses a different
regression that can be separately reproduced. Still, they have
overlapping code changes so they should not be separately applied.
Especially #4 and #6 address the same observation,
but both need to be applied to avoid TX hang occurrences in
the scenario described in the patches.
====================
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Halaney [Mon, 10 Jul 2023 19:50:57 +0000 (14:50 -0500)]
MAINTAINERS: Add another mailing list for QUALCOMM ETHQOS ETHERNET DRIVER
linux-arm-msm is the list most people subscribe to in order to receive
updates about Qualcomm related drivers. Make sure changes for the
Qualcomm ethernet driver make it there.
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20230710195240.197047-1-ahalaney@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 10 Jul 2023 17:46:36 +0000 (10:46 -0700)]
docs: netdev: update the URL of the status page
Move the status page from vger to the same server as mailbot.
Link: https://lore.kernel.org/r/20230710174636.1174684-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg [Mon, 10 Jul 2023 14:50:39 +0000 (16:50 +0200)]
wifi: iwlwifi: remove 'use_tfh' config to fix crash
This is equivalent to 'gen2', and it was always confusing to have
two identical config entries. The split config patch actually had
been originally developed after removing 'use_tfh" and didn't add
the use_tfh in the new configs as they'd later been copied to the
new files. Thus the easiest way to fix the init crash here now is
to just remove use_tfh (which is erroneously unset in most of the
configs now) and use 'gen2' in the code instead.
There's possibly still an unwind error in iwl_txq_gen2_init() as
it crashes if TXQ 0 fails to initialize, but we can deal with it
later since the original failure is due to the use_tfh confusion.
Tested-by: Xi Ruoyao <xry111@xry111.site>
Reported-and-tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Reported-and-tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217622
Link: https://lore.kernel.org/all/9274d9bd3d080a457649ff5addcc1726f08ef5b2.camel@xry111.site/
Link: https://lore.kernel.org/all/CAAJw_Zug6VCS5ZqTWaFSr9sd85k%3DtyPm9DEE%2BmV%3DAKoECZM%2BsQ@mail.gmail.com/
Fixes:
19898ce9cf8a ("wifi: iwlwifi: split 22000.c into multiple files")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230710145038.84186-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Larysa Zaremba [Tue, 11 Jul 2023 10:59:26 +0000 (12:59 +0200)]
xdp: use trusted arguments in XDP hints kfuncs
Currently, verifier does not reject XDP programs that pass NULL pointer to
hints functions. At the same time, this case is not handled in any driver
implementation (including veth). For example, changing
bpf_xdp_metadata_rx_timestamp(ctx, ×tamp);
to
bpf_xdp_metadata_rx_timestamp(ctx, NULL);
in xdp_metadata test successfully crashes the system.
Add KF_TRUSTED_ARGS flag to hints kfunc definitions, so driver code
does not have to worry about getting invalid pointers.
Fixes:
3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs")
Reported-by: Stanislav Fomichev <sdf@google.com>
Closes: https://lore.kernel.org/bpf/ZKWo0BbpLfkZHbyE@google.com/
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230711105930.29170-1-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pu Lehui [Tue, 11 Jul 2023 11:58:48 +0000 (19:58 +0800)]
bpf: cpumap: Fix memory leak in cpu_map_update_elem
Syzkaller reported a memory leak as follows:
BUG: memory leak
unreferenced object 0xff110001198ef748 (size 192):
comm "syz-executor.3", pid 17672, jiffies
4298118891 (age 9.906s)
hex dump (first 32 bytes):
00 00 00 00 4a 19 00 00 80 ad e3 e4 fe ff c0 00 ....J...........
00 b2 d3 0c 01 00 11 ff 28 f5 8e 19 01 00 11 ff ........(.......
backtrace:
[<
ffffffffadd28087>] __cpu_map_entry_alloc+0xf7/0xb00
[<
ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
[<
ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
[<
ffffffffadc7349b>] map_update_elem+0x4cb/0x720
[<
ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
[<
ffffffffb029cc80>] do_syscall_64+0x30/0x40
[<
ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
BUG: memory leak
unreferenced object 0xff110001198ef528 (size 192):
comm "syz-executor.3", pid 17672, jiffies
4298118891 (age 9.906s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
ffffffffadd281f0>] __cpu_map_entry_alloc+0x260/0xb00
[<
ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
[<
ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
[<
ffffffffadc7349b>] map_update_elem+0x4cb/0x720
[<
ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
[<
ffffffffb029cc80>] do_syscall_64+0x30/0x40
[<
ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
BUG: memory leak
unreferenced object 0xff1100010fd93d68 (size 8):
comm "syz-executor.3", pid 17672, jiffies
4298118891 (age 9.906s)
hex dump (first 8 bytes):
00 00 00 00 00 00 00 00 ........
backtrace:
[<
ffffffffade5db3e>] kvmalloc_node+0x11e/0x170
[<
ffffffffadd28280>] __cpu_map_entry_alloc+0x2f0/0xb00
[<
ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
[<
ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
[<
ffffffffadc7349b>] map_update_elem+0x4cb/0x720
[<
ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
[<
ffffffffb029cc80>] do_syscall_64+0x30/0x40
[<
ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
In the cpu_map_update_elem flow, when kthread_stop is called before
calling the threadfn of rcpu->kthread, since the KTHREAD_SHOULD_STOP bit
of kthread has been set by kthread_stop, the threadfn of rcpu->kthread
will never be executed, and rcpu->refcnt will never be 0, which will
lead to the allocated rcpu, rcpu->queue and rcpu->queue->queue cannot be
released.
Calling kthread_stop before executing kthread's threadfn will return
-EINTR. We can complete the release of memory resources in this state.
Fixes:
6710e1126934 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230711115848.2701559-1-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Randy Dunlap [Sun, 9 Jul 2023 13:31:54 +0000 (06:31 -0700)]
wifi: airo: avoid uninitialized warning in airo_get_rate()
Quieten a gcc (11.3.0) build error or warning by checking the function
call status and returning -EBUSY if the function call failed.
This is similar to what several other wireless drivers do for the
SIOCGIWRATE ioctl call when there is a locking problem.
drivers/net/wireless/cisco/airo.c: error: 'status_rid.currentXmitRate' is used uninitialized [-Werror=uninitialized]
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/39abf2c7-24a-f167-91da-ed4c5435d1c4@linux-m68k.org
Link: https://lore.kernel.org/r/20230709133154.26206-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Suman Ghosh [Mon, 10 Jul 2023 10:30:27 +0000 (16:00 +0530)]
octeontx2-pf: Add additional check for MCAM rules
Due to hardware limitation, MCAM drop rule with
ether_type == 802.1Q and vlan_id == 0 is not supported. Hence rejecting
such rules.
Fixes:
dce677da57c0 ("octeontx2-pf: Add vlan-etype to ntuple filters")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20230710103027.2244139-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Thomas GENTY [Fri, 7 Jul 2023 14:14:25 +0000 (16:14 +0200)]
platform/x86: touchscreen_dmi: Add info for the Archos 101 Cesium Educ tablet
Add info for the Archos 101 Cesium Educ tablet
It was tested using gslx680_ts_acpi module
PR at https://github.com/onitake/gsl-firmware/pull/210 for the firmware
Signed-off-by: Thomas GENTY <tomlohave@gmail.com>
Link: https://lore.kernel.org/r/20230707141425.21473-1-tomlohave@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Armin Wolf [Fri, 7 Jul 2023 01:03:33 +0000 (03:03 +0200)]
platform/x86: dell-ddv: Fix mangled list in documentation
Add missing empty line necessary for sphinx to recognize
the list. Also reword the first entry a little bit.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230707010333.12954-2-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Armin Wolf [Fri, 7 Jul 2023 01:03:32 +0000 (03:03 +0200)]
platform/x86: dell-ddv: Improve error handling
If for some reason a external function returns -ENODEV,
no error message is being displayed because the driver
assumes that -ENODEV can only be returned internally if
no sensors, etc where found.
Fix this by explicitly returning 0 in such a case since
missing hardware is no error. Also remove the now obsolete
check for -ENODEV.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230707010333.12954-1-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Shyam Sundar S K [Tue, 11 Jul 2023 10:09:03 +0000 (15:39 +0530)]
platform/x86/amd: pmf: Add new ACPI ID AMDI0103
Add new ACPI ID AMDI0103 used by upcoming AMD platform to the PMF
supported list of devices.
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://lore.kernel.org/r/20230711100903.384151-1-Shyam-sundar.S-k@amd.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Shyam Sundar S K [Tue, 11 Jul 2023 10:03:44 +0000 (15:33 +0530)]
platform/x86/amd: pmc: Add new ACPI ID AMDI000A
Add new ACPI ID AMDI000A used by upcoming AMD platform to the pmc
supported list of devices
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://lore.kernel.org/r/20230711100344.383948-1-Shyam-sundar.S-k@amd.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Mario Limonciello [Mon, 10 Jul 2023 18:39:34 +0000 (13:39 -0500)]
platform/x86/amd: pmc: Apply nvme quirk to HP 15s-eq2xxx
HP 15s-eq2xxx is an older Lucienne laptop that has a problem resuming
from s2idle when IOMMU is enabled. The symptoms very closely resemble
that of the Lenovo issues with NVME resume. Lucienne was released in
a similar timeframe as the Renoir / Cezanne Lenovo laptops and they
may have similar BIOS code.
Applying the same quirk to this system allows the system to work with
IOMMU enabled and s2idle resume to work.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2684
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230710183934.17315-3-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Mario Limonciello [Mon, 10 Jul 2023 18:39:33 +0000 (13:39 -0500)]
platform/x86: Move s2idle quirk from thinkpad-acpi to amd-pmc
It turns out that some-non Lenovo systems can benefit from the quirk
introduced for Lenovo systems in commit
455cd867b85b5 ("platform/x86:
thinkpad_acpi: Add a s2idle resume quirk for a number of laptops").
So move this quirk into running from the amd-pmc driver instead.
No intended functional changes.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230710183934.17315-2-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Lu Hongfei [Mon, 10 Jul 2023 03:18:59 +0000 (11:18 +0800)]
net: dsa: Removed unneeded of_node_put in felix_parse_ports_node
Remove unnecessary of_node_put from the continue path to prevent
child node from being released twice, which could avoid resource
leak or other unexpected issues.
Signed-off-by: Lu Hongfei <luhongfei@vivo.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Fixes:
de879a016a94 ("net: dsa: felix: add functionality when not all ports are supported")
Link: https://lore.kernel.org/r/20230710031859.36784-1-luhongfei@vivo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 11 Jul 2023 08:00:52 +0000 (10:00 +0200)]
Merge branch 'net-fec-fix-some-issues-of-ndo_xdp_xmit'
Wei Fang says:
====================
net: fec: fix some issues of ndo_xdp_xmit()
We encountered some issues when testing the ndo_xdp_xmit() interface
of the fec driver on i.MX8MP and i.MX93 platforms. These issues are
easy to reproduce, and the specific reproduction steps are as follows.
step1: The ethernet port of a board (board A) is connected to the EQOS
port of i.MX8MP/i.MX93, and the FEC port of i.MX8MP/i.MX93 is connected
to another ethernet port, such as a switch port.
step2: Board A uses the pktgen_sample03_burst_single_flow.sh to generate
and send packets to i.MX8MP/i.MX93. The command is shown below.
./pktgen_sample03_burst_single_flow.sh -i eth0 -d 192.168.6.8 -m \
56:bf:0d:68:b0:9e -s 1500
step3: i.MX8MP/i.MX93 use the xdp_redirect bfp program to redirect the
XDP frames from EQOS port to FEC port. The command is shown below.
./xdp_redirect eth1 eth0
After a few moments, the warning or error logs will be printed in the
console, for more details, please refer to the commit message of each
patch.
====================
Link: https://lore.kernel.org/r/20230706081012.2278063-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Fang [Thu, 6 Jul 2023 08:10:12 +0000 (16:10 +0800)]
net: fec: use netdev_err_once() instead of netdev_err()
In the case of heavy XDP traffic to be transmitted, the console
will print the error log continuously if there are lack of enough
BDs to accommodate the frames. The log looks like below.
[ 160.013112] fec
30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.023116] fec
30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.028926] fec
30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.038946] fec
30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.044758] fec
30be0000.ethernet eth0: NOT enough BD for SG!
Not only will this log be replicated and redundant, it will also
degrade XDP performance. So we use netdev_err_once() instead of
netdev_err() now.
Fixes:
6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Fang [Thu, 6 Jul 2023 08:10:11 +0000 (16:10 +0800)]
net: fec: increase the size of tx ring and update tx_wake_threshold
When the XDP feature is enabled and with heavy XDP frames to be
transmitted, there is a considerable probability that available
tx BDs are insufficient. This will lead to some XDP frames to be
discarded and the "NOT enough BD for SG!" error log will appear
in the console (as shown below).
[ 160.013112] fec
30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.023116] fec
30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.028926] fec
30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.038946] fec
30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.044758] fec
30be0000.ethernet eth0: NOT enough BD for SG!
In the case of heavy XDP traffic, sometimes the speed of recycling
tx BDs may be slower than the speed of sending XDP frames. There
may be several specific reasons, such as the interrupt is not
responsed in time, the efficiency of the NAPI callback function is
too low due to all the queues (tx queues and rx queues) share the
same NAPI, and so on.
After trying various methods, I think that increase the size of tx
BD ring is simple and effective. Maybe the best resolution is that
allocate NAPI for each queue to improve the efficiency of the NAPI
callback, but this change is a bit big and I didn't try this method.
Perheps this method will be implemented in a future patch.
This patch also updates the tx_wake_threshold of tx ring which is
related to the size of tx ring in the previous logic. Otherwise,
the tx_wake_threshold will be too high (403 BDs), which is more
likely to impact the slow path in the case of heavy XDP traffic,
because XDP path and slow path share the tx BD rings. According
to Jakub's suggestion, the tx_wake_threshold is at least equal to
tx_stop_threshold + 2 * MAX_SKB_FRAGS, if a queue of hundreds of
entries is overflowing, we should be able to apply a hysteresis
of a few tens of entries.
Fixes:
6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Fang [Thu, 6 Jul 2023 08:10:10 +0000 (16:10 +0800)]
net: fec: recycle pages for transmitted XDP frames
Once the XDP frames have been successfully transmitted through the
ndo_xdp_xmit() interface, it's the driver responsibility to free
the frames so that the page_pool can recycle the pages and reuse
them. However, this action is not implemented in the fec driver.
This leads to a user-visible problem that the console will print
the following warning log.
[ 157.568851] page_pool_release_retry() stalled pool shutdown 1389 inflight 60 sec
[ 217.983446] page_pool_release_retry() stalled pool shutdown 1389 inflight 120 sec
[ 278.399006] page_pool_release_retry() stalled pool shutdown 1389 inflight 181 sec
[ 338.812885] page_pool_release_retry() stalled pool shutdown 1389 inflight 241 sec
[ 399.226946] page_pool_release_retry() stalled pool shutdown 1389 inflight 302 sec
Therefore, to solve this issue, we free XDP frames via xdp_return_frame()
while cleaning the tx BD ring.
Fixes:
6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Fang [Thu, 6 Jul 2023 08:10:09 +0000 (16:10 +0800)]
net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP
When a XDP program is installed or uninstalled, fec_restart() will
be invoked to reset MAC and buffer descriptor rings. It's reasonable
not to transmit any packet during the process of reset. However, the
NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by default,
that is to say, it's possible that the fec_enet_xdp_xmit() will be
invoked even if the process of reset is not finished. In this case,
the redirected XDP frames might be dropped and available transmit BDs
may be incorrectly deemed insufficient. So this patch disable the
NETDEV_XDP_ACT_NDO_XMIT feature by default and dynamically configure
this feature when the bpf program is installed or uninstalled.
Fixes:
e4ac7cc6e5a4 ("net: fec: turn on XDP features")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Björn Töpel [Mon, 10 Jul 2023 07:41:31 +0000 (09:41 +0200)]
riscv, bpf: Fix inconsistent JIT image generation
In order to generate the prologue and epilogue, the BPF JIT needs to
know which registers that are clobbered. Therefore, the during
pre-final passes, the prologue is generated after the body of the
program body-prologue-epilogue. Then, in the final pass, a proper
prologue-body-epilogue JITted image is generated.
This scheme has worked most of the time. However, for some large
programs with many jumps, e.g. the test_kmod.sh BPF selftest with
hardening enabled (blinding constants), this has shown to be
incorrect. For the final pass, when the proper prologue-body-epilogue
is generated, the image has not converged. This will lead to that the
final image will have incorrect jump offsets. The following is an
excerpt from an incorrect image:
| ...
| 3b8:
00c50663 beq a0,a2,3c4 <.text+0x3c4>
| 3bc:
0020e317 auipc t1,0x20e
| 3c0:
49630067 jalr zero,1174(t1) # 20e852 <.text+0x20e852>
| ...
| 20e84c: 8796 c.mv a5,t0
| 20e84e: 6422 c.ldsp s0,8(sp) # Epilogue start
| 20e850: 6141 c.addi16sp sp,16
| 20e852: 853e c.mv a0,a5 # Incorrect jump target
| 20e854: 8082 c.jr ra
The image has shrunk, and the epilogue offset is incorrect in the
final pass.
Correct the problem by always generating proper prologue-body-epilogue
outputs, which means that the first pass will only generate the body
to track what registers that are touched.
Fixes:
2353ecc6f91f ("bpf, riscv: add BPF JIT for RV64G")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230710074131.19596-1-bjorn@kernel.org
Beau Belgrave [Thu, 29 Jun 2023 23:50:48 +0000 (23:50 +0000)]
tracing/user_events: Fix struct arg size match check
When users register an event the name of the event and it's argument are
checked to ensure they match if the event already exists. Normally all
arguments are in the form of "type name", except for when the type
starts with "struct ". In those cases, the size of the struct is passed
in addition to the name, IE: "struct my_struct a 20" for an argument
that is of type "struct my_struct" with a field name of "a" and has the
size of 20 bytes.
The current code does not honor the above case properly when comparing
a match. This causes the event register to fail even when the same
string was used for events that contain a struct argument within them.
The example above "struct my_struct a 20" generates a match string of
"struct my_struct a" omitting the size field.
Add the struct size of the existing field when generating a comparison
string for a struct field to ensure proper match checking.
Link: https://lkml.kernel.org/r/20230629235049.581-2-beaub@linux.microsoft.com
Cc: stable@vger.kernel.org
Fixes:
e6f89a149872 ("tracing/user_events: Ensure user provided strings are safely formatted")
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
YueHaibing [Fri, 23 Jun 2023 09:16:40 +0000 (17:16 +0800)]
x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret()
This is now unused, so can remove it.
Link: https://lore.kernel.org/linux-trace-kernel/20230623091640.21952-1-yuehaibing@huawei.com
Cc: <mark.rutland@arm.com>
Cc: <tglx@linutronix.de>
Cc: <mingo@redhat.com>
Cc: <bp@alien8.de>
Cc: <dave.hansen@linux.intel.com>
Cc: <x86@kernel.org>
Cc: <hpa@zytor.com>
Cc: <peterz@infradead.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Fri, 7 Jul 2023 14:03:19 +0000 (23:03 +0900)]
fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
Ensure running fprobe_exit_handler() has finished before
calling rethook_free() in the unregister_fprobe() so that caller can free
the fprobe right after unregister_fprobe().
unregister_fprobe() ensured that all running fprobe_entry/exit_handler()
have finished by calling unregister_ftrace_function() which synchronizes
RCU. But commit
5f81018753df ("fprobe: Release rethook after the ftrace_ops
is unregistered") changed to call rethook_free() after
unregister_ftrace_function(). So call rethook_stop() to make rethook
disabled before unregister_ftrace_function() and ensure it again.
Here is the possible code flow that can call the exit handler after
unregister_fprobe().
------
CPU1 CPU2
call unregister_fprobe(fp)
...
__fprobe_handler()
rethook_hook() on probed function
unregister_ftrace_function()
return from probed function
rethook hooks
find rh->handler == fprobe_exit_handler
call fprobe_exit_handler()
rethook_free():
set rh->handler = NULL;
return from unreigster_fprobe;
call fp->exit_handler() <- (*)
------
(*) At this point, the exit handler is called after returning from
unregister_fprobe().
This fixes it as following;
------
CPU1 CPU2
call unregister_fprobe()
...
rethook_stop():
set rh->handler = NULL;
__fprobe_handler()
rethook_hook() on probed function
unregister_ftrace_function()
return from probed function
rethook hooks
find rh->handler == NULL
return from rethook
rethook_free()
return from unreigster_fprobe;
------
Link: https://lore.kernel.org/all/168873859949.156157.13039240432299335849.stgit@devnote2/
Fixes:
5f81018753df ("fprobe: Release rethook after the ftrace_ops is unregistered")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Florent Revest [Thu, 27 Apr 2023 14:07:00 +0000 (16:07 +0200)]
arm64: ftrace: Add direct call trampoline samples support
The ftrace samples need per-architecture trampoline implementations
to save and restore argument registers around the calls to
my_direct_func* and to restore polluted registers (eg: x30).
These samples also include <asm/asm-offsets.h> which, on arm64, is not
necessary and redefines previously defined macros (resulting in
warnings) so these includes are guarded by !CONFIG_ARM64.
Link: https://lkml.kernel.org/r/20230427140700.625241-3-revest@chromium.org
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Florent Revest [Thu, 27 Apr 2023 14:06:59 +0000 (16:06 +0200)]
samples: ftrace: Save required argument registers in sample trampolines
The ftrace-direct-too sample traces the handle_mm_fault function whose
signature changed since the introduction of the sample. Since:
commit
bce617edecad ("mm: do page fault accounting in handle_mm_fault")
handle_mm_fault now has 4 arguments. Therefore, the sample trampoline
should save 4 argument registers.
s390 saves all argument registers already so it does not need a change
but x86_64 needs an extra push and pop.
This also evolves the signature of the tracing function to make it
mirror the signature of the traced function.
Link: https://lkml.kernel.org/r/20230427140700.625241-2-revest@chromium.org
Cc: stable@vger.kernel.org
Fixes:
bce617edecad ("mm: do page fault accounting in handle_mm_fault")
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stafford Horne [Wed, 28 Jun 2023 16:54:40 +0000 (17:54 +0100)]
openrisc: Union fpcsr and oldmask in sigcontext to unbreak userspace ABI
With commit
27267655c531 ("openrisc: Support floating point user api") I
added an entry to the struct sigcontext which caused an unwanted change
to the userspace ABI.
To fix this we use the previously unused oldmask field space for the
floating point fpcsr state. We do this with a union to restore the ABI
back to the pre kernel v6.4 ABI and keep API compatibility.
This does mean if there is some code somewhere that is setting oldmask
in an OpenRISC specific userspace sighandler it would end up setting the
floating point register status, but I think it's unlikely as oldmask was
never functional before.
Fixes:
27267655c531 ("openrisc: Support floating point user api")
Reported-by: Szabolcs Nagy <nsz@port70.net>
Closes: https://lore.kernel.org/openrisc/
20230626213840.GA1236108@port70.net/
Signed-off-by: Stafford Horne <shorne@gmail.com>
Linus Torvalds [Mon, 10 Jul 2023 17:04:26 +0000 (10:04 -0700)]
Merge tag 'linux-watchdog-6.5-rc2' of git://linux-watchdog.org/linux-watchdog
Pull watchdog update from Wim Van Sebroeck:
- Add Loongson-1 watchdog dt-bindings
* tag 'linux-watchdog-6.5-rc2' of git://www.linux-watchdog.org/linux-watchdog:
dt-bindings: watchdog: Add Loongson-1 watchdog
Linus Torvalds [Mon, 10 Jul 2023 16:53:11 +0000 (09:53 -0700)]
Merge tag 'v6.5-p2' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"Fix a couple of regressions in af_alg and incorrect return values in
crypto/asymmetric_keys/public_key"
* tag 'v6.5-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algif_hash - Fix race between MORE and non-MORE sends
KEYS: asymmetric: Fix error codes
crypto: af_alg - Fix merging of written data into spliced pages
Florian Kauer [Wed, 14 Jun 2023 14:07:14 +0000 (16:07 +0200)]
igc: Fix inserting of empty frame for launchtime
The insertion of an empty frame was introduced with
commit
db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
in order to ensure that the current cycle has at least one packet if
there is some packet to be scheduled for the next cycle.
However, the current implementation does not properly check if
a packet is already scheduled for the current cycle. Currently,
an empty packet is always inserted if and only if
txtime >= end_of_cycle && txtime > last_tx_cycle
but since last_tx_cycle is always either the end of the current
cycle (end_of_cycle) or the end of a previous cycle, the
second part (txtime > last_tx_cycle) is always true unless
txtime == last_tx_cycle.
What actually needs to be checked here is if the last_tx_cycle
was already written within the current cycle, so an empty frame
should only be inserted if and only if
txtime >= end_of_cycle && end_of_cycle > last_tx_cycle.
This patch does not only avoid an unnecessary insertion, but it
can actually be harmful to insert an empty packet if packets
are already scheduled in the current cycle, because it can lead
to a situation where the empty packet is actually processed
as the first packet in the upcoming cycle shifting the packet
with the first_flag even one cycle into the future, finally leading
to a TX hang.
The TX hang can be reproduced on a i225 with:
sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 1 \
map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 \
base-time 0 \
sched-entry S 01 300000 \
flags 0x1 \
txtime-delay 500000 \
clockid CLOCK_TAI
sudo tc qdisc replace dev enp1s0 parent 100:1 etf \
clockid CLOCK_TAI \
delta 500000 \
offload \
skip_sock_check
and traffic generator
sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns
with traffic.cfg
#define ETH_P_IP 0x0800
{
/* Ethernet Header */
0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed
0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 48, 1, # IP Src - adapt as needed
10, 0, 48, 10, # IP Dest - adapt as needed
const16(5555), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
and the observed message with that is for example
igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang
Tx Queue <0>
TDH <32>
TDT <3c>
next_to_use <3c>
next_to_clean <32>
buffer_info[next_to_clean]
time_stamp <
ffff26a8>
next_to_watch <
00000000632a1828>
jiffies <
ffff27f8>
desc.status <1048000>
Fixes:
db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Florian Kauer [Wed, 14 Jun 2023 14:07:13 +0000 (16:07 +0200)]
igc: Fix launchtime before start of cycle
It is possible (verified on a running system) that frames are processed
by igc_tx_launchtime with a txtime before the start of the cycle
(baset_est).
However, the result of txtime - baset_est is written into a u32,
leading to a wrap around to a positive number. The following
launchtime > 0 check will only branch to executing launchtime = 0
if launchtime is already 0.
Fix it by using a s32 before checking launchtime > 0.
Fixes:
db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>