Eric Dumazet [Mon, 21 Jun 2021 18:02:44 +0000 (11:02 -0700)]
ieee802154: hwsim: avoid possible crash in hwsim_del_edge_nl()
Both MAC802154_HWSIM_ATTR_RADIO_ID and MAC802154_HWSIM_ATTR_RADIO_EDGE
must be present to avoid a crash.
Fixes:
f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@datenfreihafen.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20210621180244.882076-1-eric.dumazet@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Dongliang Mu [Wed, 16 Jun 2021 02:09:01 +0000 (10:09 +0800)]
ieee802154: hwsim: Fix memory leak in hwsim_add_one
No matter from hwsim_remove or hwsim_del_radio_nl, hwsim_del fails to
remove the entry in the edges list. Take the example below, phy0, phy1
and e0 will be deleted, resulting in e1 not freed and accessed in the
future.
hwsim_phys
|
------------------------------
| |
phy0 (edges) phy1 (edges)
----> e1 (idx = 1) ----> e0 (idx = 0)
Fix this by deleting and freeing all the entries in the edges list
between hwsim_edge_unsubscribe_me and list_del(&phy->list).
Reported-by: syzbot+b80c9959009a9325cdff@syzkaller.appspotmail.com
Fixes:
1c9f4a3fce77 ("ieee802154: hwsim: fix rcu handling")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20210616020901.2759466-1-mudongliangabcd@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Dongliang Mu [Fri, 11 Jun 2021 01:58:12 +0000 (09:58 +0800)]
ieee802154: hwsim: Fix possible memory leak in hwsim_subscribe_all_others
In hwsim_subscribe_all_others, the error handling code performs
incorrectly if the second hwsim_alloc_edge fails. When this issue occurs,
it goes to sub_fail, without cleaning the edges allocated before.
Fixes:
f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20210611015812.1626999-1-mudongliangabcd@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Aleksander Jan Bajkowski [Tue, 8 Jun 2021 21:21:07 +0000 (23:21 +0200)]
net: lantiq: disable interrupt before sheduling NAPI
This patch fixes TX hangs with threaded NAPI enabled. The scheduled
NAPI seems to be executed in parallel with the interrupt on second
thread. Sometimes it happens that ltq_dma_disable_irq() is executed
after xrx200_tx_housekeeping(). The symptom is that TX interrupts
are disabled in the DMA controller. As a result, the TX hangs after
a few seconds of the iperf test. Scheduling NAPI after disabling
interrupts fixes this issue.
Tested on Lantiq xRX200 (BT Home Hub 5A).
Fixes:
9423361da523 ("net: lantiq: Disable IRQs only if NAPI gets scheduled ")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shay Agroskin [Tue, 8 Jun 2021 16:42:54 +0000 (19:42 +0300)]
net: ena: fix DMA mapping function issues in XDP
This patch fixes several bugs found when (DMA/LLQ) mapping a packet for
transmission. The mapping procedure makes the transmitted packet
accessible by the device.
When using LLQ, this requires copying the packet's header to push header
(which would be passed to LLQ) and creating DMA mapping for the payload
(if the packet doesn't fit the maximum push length).
When not using LLQ, we map the whole packet with DMA.
The following bugs are fixed in the code:
1. Add support for non-LLQ machines:
The ena_xdp_tx_map_frame() function assumed that LLQ is
supported, and never mapped the whole packet using DMA. On some
instances, which don't support LLQ, this causes loss of traffic.
2. Wrong DMA buffer length passed to device:
When using LLQ, the first 'tx_max_header_size' bytes of the
packet would be copied to push header. The rest of the packet
would be copied to a DMA'd buffer.
3. Freeing the XDP buffer twice in case of a mapping error:
In case a buffer DMA mapping fails, the function uses
xdp_return_frame_rx_napi() to free the RX buffer and returns from
the function with an error. XDP frames that fail to xmit get
freed by the kernel and so there is no need for this call.
Fixes:
548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 8 Jun 2021 11:15:35 +0000 (14:15 +0300)]
net: dsa: felix: re-enable TX flow control in ocelot_port_flush()
Because flow control is set up statically in ocelot_init_port(), and not
in phylink_mac_link_up(), what happens is that after the blamed commit,
the flow control remains disabled after the port flushing procedure.
Fixes:
eb4733d7cffc ("net: dsa: felix: implement port flushing on .phylink_mac_link_down")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Tue, 8 Jun 2021 08:06:41 +0000 (11:06 +0300)]
net: rds: fix memory leak in rds_recvmsg
Syzbot reported memory leak in rds. The problem
was in unputted refcount in case of error.
int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
int msg_flags)
{
...
if (!rds_next_incoming(rs, &inc)) {
...
}
After this "if" inc refcount incremented and
if (rds_cmsg_recv(inc, msg, rs)) {
ret = -EFAULT;
goto out;
}
...
out:
return ret;
}
in case of rds_cmsg_recv() fail the refcount won't be
decremented. And it's easy to see from ftrace log, that
rds_inc_addref() don't have rds_inc_put() pair in
rds_recvmsg() after rds_cmsg_recv()
1) | rds_recvmsg() {
1) 3.721 us | rds_inc_addref();
1) 3.853 us | rds_message_inc_copy_to_user();
1) + 10.395 us | rds_cmsg_recv();
1) + 34.260 us | }
Fixes:
bdbe6fbc6a2f ("RDS: recv.c")
Reported-and-tested-by: syzbot+5134cdf021c4ed5aaa5f@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 8 Jun 2021 19:11:21 +0000 (12:11 -0700)]
Merge tag 'batadv-net-pullrequest-
20210608' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- Avoid WARN_ON timing related checks, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 8 Jun 2021 14:59:51 +0000 (16:59 +0200)]
vrf: fix maximum MTU
My initial goal was to fix the default MTU, which is set to 65536, ie above
the maximum defined in the driver: 65535 (ETH_MAX_MTU).
In fact, it's seems more consistent, wrt min_mtu, to set the max_mtu to
IP6_MAX_MTU (65535 + sizeof(struct ipv6hdr)) and use it by default.
Let's also, for consistency, set the mtu in vrf_setup(). This function
calls ether_setup(), which set the mtu to 1500. Thus, the whole mtu config
is done in the same function.
Before the patch:
$ ip link add blue type vrf table 1234
$ ip link list blue
9: blue: <NOARP,MASTER> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether fa:f5:27:70:24:2a brd ff:ff:ff:ff:ff:ff
$ ip link set dev blue mtu 65535
$ ip link set dev blue mtu 65536
Error: mtu greater than device maximum.
Fixes:
5055376a3b44 ("net: vrf: Fix ping failed when vrf mtu is set to 0")
CC: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gushengxian [Tue, 8 Jun 2021 02:19:32 +0000 (19:19 -0700)]
net: appletalk: fix the usage of preposition
The preposition "for" should be changed to preposition "of".
Signed-off-by: gushengxian <gushengxian@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Yongjun [Tue, 8 Jun 2021 01:53:15 +0000 (09:53 +0800)]
net: ipv4: Remove unneed BUG() function
When 'nla_parse_nested_deprecated' failed, it's no need to
BUG() here, return -EINVAL is ok.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nanyong Sun [Tue, 8 Jun 2021 01:51:58 +0000 (09:51 +0800)]
net: ipv4: fix memory leak in netlbl_cipsov4_add_std
Reported by syzkaller:
BUG: memory leak
unreferenced object 0xffff888105df7000 (size 64):
comm "syz-executor842", pid 360, jiffies
4294824824 (age 22.546s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
00000000e67ed558>] kmalloc include/linux/slab.h:590 [inline]
[<
00000000e67ed558>] kzalloc include/linux/slab.h:720 [inline]
[<
00000000e67ed558>] netlbl_cipsov4_add_std net/netlabel/netlabel_cipso_v4.c:145 [inline]
[<
00000000e67ed558>] netlbl_cipsov4_add+0x390/0x2340 net/netlabel/netlabel_cipso_v4.c:416
[<
0000000006040154>] genl_family_rcv_msg_doit.isra.0+0x20e/0x320 net/netlink/genetlink.c:739
[<
00000000204d7a1c>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
[<
00000000204d7a1c>] genl_rcv_msg+0x2bf/0x4f0 net/netlink/genetlink.c:800
[<
00000000c0d6a995>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504
[<
00000000d78b9d2c>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
[<
000000009733081b>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
[<
000000009733081b>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340
[<
00000000d5fd43b8>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929
[<
000000000a2d1e40>] sock_sendmsg_nosec net/socket.c:654 [inline]
[<
000000000a2d1e40>] sock_sendmsg+0x139/0x170 net/socket.c:674
[<
00000000321d1969>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350
[<
00000000964e16bc>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404
[<
000000001615e288>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433
[<
000000004ee8b6a5>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47
[<
00000000171c7cee>] entry_SYSCALL_64_after_hwframe+0x44/0xae
The memory of doi_def->map.std pointing is allocated in
netlbl_cipsov4_add_std, but no place has freed it. It should be
freed in cipso_v4_doi_free which frees the cipso DOI resource.
Fixes:
96cb8e3313c7a ("[NetLabel]: CIPSOv4 and Unlabeled packet integration")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 7 Jun 2021 17:35:30 +0000 (11:35 -0600)]
neighbour: allow NUD_NOARP entries to be forced GCed
IFF_POINTOPOINT interfaces use NUD_NOARP entries for IPv6. It's possible to
fill up the neighbour table with enough entries that it will overflow for
valid connections after that.
This behaviour is more prevalent after commit
58956317c8de ("neighbor:
Improve garbage collection") is applied, as it prevents removal from
entries that are not NUD_FAILED, unless they are more than 5s old.
Fixes:
58956317c8de (neighbor: Improve garbage collection)
Reported-by: Kasper Dupont <kasperd@gjkwv.06.feb.2021.kasperd.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Mon, 7 Jun 2021 18:46:23 +0000 (21:46 +0300)]
revert "net: kcm: fix memory leak in kcm_sendmsg"
In commit
c47cc304990a ("net: kcm: fix memory leak in kcm_sendmsg")
I misunderstood the root case of the memory leak and came up with
completely broken fix.
So, simply revert this commit to avoid GPF reported by
syzbot.
Im so sorry for this situation.
Fixes:
c47cc304990a ("net: kcm: fix memory leak in kcm_sendmsg")
Reported-by: syzbot+65badd5e74ec62cb67dc@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Jun 2021 20:12:08 +0000 (13:12 -0700)]
Merge branch 'mlxsw-fixes'
Merge branch 'mlxsw-fixes'
Ido Schimmel says:
====================
mlxsw: Thermal and qdisc fixes
Patches #1-#2 fix wrong validation of burst size in qdisc code and a
user triggerable WARN_ON().
Patch #3 fixes a regression in thermal monitoring of transceiver modules
and gearboxes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mykola Kostenok [Sun, 6 Jun 2021 08:24:32 +0000 (11:24 +0300)]
mlxsw: core: Set thermal zone polling delay argument to real value at init
Thermal polling delay argument for modules and gearboxes thermal zones
used to be initialized with zero value, while actual delay was used to
be set by mlxsw_thermal_set_mode() by thermal operation callback
set_mode(). After operations set_mode()/get_mode() have been removed by
cited commits, modules and gearboxes thermal zones always have polling
time set to zero and do not perform temperature monitoring.
Set non-zero "polling_delay" in thermal_zone_device_register() routine,
thus, the relevant thermal zones will perform thermal monitoring.
Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Fixes:
5d7bd8aa7c35 ("thermal: Simplify or eliminate unnecessary set_mode() methods")
Fixes:
1ee14820fd8e ("thermal: remove get_mode() operation of drivers")
Signed-off-by: Mykola Kostenok <c_mykolak@nvidia.com>
Acked-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Sun, 6 Jun 2021 08:24:31 +0000 (11:24 +0300)]
mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
In mlxsw Qdisc offload, find_class() is an operation that yields a qdisc
offload descriptor given a parental qdisc descriptor and a class handle. In
__mlxsw_sp_qdisc_ets_graft() however, a band number is passed to that
function instead of a handle. This can lead to a trigger of a WARN_ON
with the following splat:
WARNING: CPU: 3 PID: 808 at drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c:1356 __mlxsw_sp_qdisc_ets_graft+0x115/0x130 [mlxsw_spectrum]
[...]
Call Trace:
mlxsw_sp_setup_tc_prio+0xe3/0x100 [mlxsw_spectrum]
qdisc_offload_graft_helper+0x35/0xa0
prio_graft+0x176/0x290 [sch_prio]
qdisc_graft+0xb3/0x540
tc_modify_qdisc+0x56a/0x8a0
rtnetlink_rcv_msg+0x12c/0x370
netlink_rcv_skb+0x49/0xf0
netlink_unicast+0x1f6/0x2b0
netlink_sendmsg+0x1fb/0x410
____sys_sendmsg+0x1f3/0x220
___sys_sendmsg+0x70/0xb0
__sys_sendmsg+0x54/0xa0
do_syscall_64+0x3a/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xae
Since the parent handle is not passed with the offload information, compute
it from the band number and qdisc handle.
Fixes:
28052e618b04 ("mlxsw: spectrum_qdisc: Track children per qdisc")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Sun, 6 Jun 2021 08:24:30 +0000 (11:24 +0300)]
mlxsw: reg: Spectrum-3: Enforce lowest max-shaper burst size of 11
A max-shaper is the HW component responsible for delaying egress traffic
above a configured transmission rate. Burst size is the amount of traffic
that is allowed to pass without accounting. The burst size value needs to
be such that it can be expressed as 2^BS * 512 bits, where BS lies in a
certain ASIC-dependent range. mlxsw enforces that this holds before
attempting to configure the shaper.
The assumption for Spectrum-3 was that the lower limit of BS would be 5,
like for Spectrum-1. But as of now, the limit is still 11. Therefore fix
the driver accordingly, so that incorrect values are rejected early with a
proper message.
Fixes:
23effa2479ba ("mlxsw: reg: Add max_shaper_bs to QoS ETS Element Configuration")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 6 Jun 2021 14:24:22 +0000 (17:24 +0300)]
ethtool: Fix NULL pointer dereference during module EEPROM dump
When get_module_eeprom_by_page() is not implemented by the driver, NULL
pointer dereference can occur [1].
Fix by testing if get_module_eeprom_by_page() is implemented instead of
get_module_info().
[1]
BUG: kernel NULL pointer dereference, address:
0000000000000000
[...]
CPU: 0 PID: 251 Comm: ethtool Not tainted 5.13.0-rc3-custom-00940-g3822d0670c9d #989
Call Trace:
eeprom_prepare_data+0x101/0x2d0
ethnl_default_doit+0xc2/0x290
genl_family_rcv_msg_doit+0xdc/0x140
genl_rcv_msg+0xd7/0x1d0
netlink_rcv_skb+0x49/0xf0
genl_rcv+0x1f/0x30
netlink_unicast+0x1f6/0x2c0
netlink_sendmsg+0x1f9/0x400
__sys_sendto+0xe1/0x130
__x64_sys_sendto+0x1b/0x20
do_syscall_64+0x3a/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes:
c97a31f66ebc ("ethtool: wire in generic SFP module access")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Fri, 4 Jun 2021 11:18:18 +0000 (16:48 +0530)]
cxgb4: avoid link re-train during TC-MQPRIO configuration
When configuring TC-MQPRIO offload, only turn off netdev carrier and
don't bring physical link down in hardware. Otherwise, when the
physical link is brought up again after configuration, it gets
re-trained and stalls ongoing traffic.
Also, when firmware is no longer accessible or crashed, avoid sending
FLOWC and waiting for reply that will never come.
Fix following hung_task_timeout_secs trace seen in these cases.
INFO: task tc:20807 blocked for more than 122 seconds.
Tainted: G S 5.13.0-rc3+ #122
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:tc state:D stack:14768 pid:20807 ppid: 19366 flags:0x00000000
Call Trace:
__schedule+0x27b/0x6a0
schedule+0x37/0xa0
schedule_preempt_disabled+0x5/0x10
__mutex_lock.isra.14+0x2a0/0x4a0
? netlink_lookup+0x120/0x1a0
? rtnl_fill_ifinfo+0x10f0/0x10f0
__netlink_dump_start+0x70/0x250
rtnetlink_rcv_msg+0x28b/0x380
? rtnl_fill_ifinfo+0x10f0/0x10f0
? rtnl_calcit.isra.42+0x120/0x120
netlink_rcv_skb+0x4b/0xf0
netlink_unicast+0x1a0/0x280
netlink_sendmsg+0x216/0x440
sock_sendmsg+0x56/0x60
__sys_sendto+0xe9/0x150
? handle_mm_fault+0x6d/0x1b0
? do_user_addr_fault+0x1c5/0x620
__x64_sys_sendto+0x1f/0x30
do_syscall_64+0x3c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7f73218321
RSP: 002b:
00007ffd19626208 EFLAGS:
00000246 ORIG_RAX:
000000000000002c
RAX:
ffffffffffffffda RBX:
000055b7c0a8b240 RCX:
00007f7f73218321
RDX:
0000000000000028 RSI:
00007ffd19626210 RDI:
0000000000000003
RBP:
000055b7c08680ff R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
000055b7c085f5f6
R13:
000055b7c085f60a R14:
00007ffd19636470 R15:
00007ffd196262a0
Fixes:
b1396c2bd675 ("cxgb4: parse and configure TC-MQPRIO offload")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunjian Wang [Fri, 4 Jun 2021 11:03:18 +0000 (19:03 +0800)]
sch_htb: fix refcount leak in htb_parent_to_leaf_offload
The commit
ae81feb7338c ("sch_htb: fix null pointer dereference
on a null new_q") fixes a NULL pointer dereference bug, but it
is not correct.
Because htb_graft_helper properly handles the case when new_q
is NULL, and after the previous patch by skipping this call
which creates an inconsistency : dev_queue->qdisc will still
point to the old qdisc, but cl->parent->leaf.q will point to
the new one (which will be noop_qdisc, because new_q was NULL).
The code is based on an assumption that these two pointers are
the same, so it can lead to refcount leaks.
The correct fix is to add a NULL pointer check to protect
qdisc_refcount_inc inside htb_parent_to_leaf_offload.
Fixes:
ae81feb7338c ("sch_htb: fix null pointer dereference on a null new_q")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Suggested-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Jun 2021 21:27:07 +0000 (14:27 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-06-04
This series contains updates to virtchnl header file and ice driver.
Brett fixes VF being unable to request a different number of queues then
allocated and adds clearing of VF_MBX_ATQLEN register for VF reset.
Haiyue handles error of rebuilding VF VSI during reset.
Paul fixes reporting of autoneg to use the PHY capabilities.
Dave allows LLDP packets without priority of TC_PRIO_CONTROL to be
transmitted.
Geert Uytterhoeven adds explicit padding to virtchnl_proto_hdrs
structure in the virtchnl header file.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Jun 2021 21:25:14 +0000 (14:25 -0700)]
Merge branch 'wireguard-fixes'
Jason A. Donenfeld says:
====================
wireguard fixes for 5.13-rc5
Here are bug fixes to WireGuard for 5.13-rc5:
1-2,6) These are small, trivial tweaks to our test harness.
3) Linus thinks -O3 is still dangerous to enable. The code gen wasn't so
much different with -O2 either.
4) We were accidentally calling synchronize_rcu instead of
synchronize_net while holding the rtnl_lock, resulting in some rather
large stalls that hit production machines.
5) Peer allocation was wasting literally hundreds of megabytes on real
world deployments, due to oddly sized large objects not fitting
nicely into a kmalloc slab.
7-9) We move from an insanely expensive O(n) algorithm to a fast O(1)
algorithm, and cleanup a massive memory leak in the process, in
which allowed ips churn would leave danging nodes hanging around
without cleanup until the interface was removed. The O(1) algorithm
eliminates packet stalls and high latency issues, in addition to
bringing operations that took as much as 10 minutes down to less
than a second.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Fri, 4 Jun 2021 15:17:38 +0000 (17:17 +0200)]
wireguard: allowedips: free empty intermediate nodes when removing single node
When removing single nodes, it's possible that that node's parent is an
empty intermediate node, in which case, it too should be removed.
Otherwise the trie fills up and never is fully emptied, leading to
gradual memory leaks over time for tries that are modified often. There
was originally code to do this, but was removed during refactoring in
2016 and never reworked. Now that we have proper parent pointers from
the previous commits, we can implement this properly.
In order to reduce branching and expensive comparisons, we want to keep
the double pointer for parent assignment (which lets us easily chain up
to the root), but we still need to actually get the parent's base
address. So encode the bit number into the last two bits of the pointer,
and pack and unpack it as needed. This is a little bit clumsy but is the
fastest and less memory wasteful of the compromises. Note that we align
the root struct here to a minimum of 4, because it's embedded into a
larger struct, and we're relying on having the bottom two bits for our
flag, which would only be 16-bit aligned on m68k.
The existing macro-based helpers were a bit unwieldy for adding the bit
packing to, so this commit replaces them with safer and clearer ordinary
functions.
We add a test to the randomized/fuzzer part of the selftests, to free
the randomized tries by-peer, refuzz it, and repeat, until it's supposed
to be empty, and then then see if that actually resulted in the whole
thing being emptied. That combined with kmemcheck should hopefully make
sure this commit is doing what it should. Along the way this resulted in
various other cleanups of the tests and fixes for recent graphviz.
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Fri, 4 Jun 2021 15:17:37 +0000 (17:17 +0200)]
wireguard: allowedips: allocate nodes in kmem_cache
The previous commit moved from O(n) to O(1) for removal, but in the
process introduced an additional pointer member to a struct that
increased the size from 60 to 68 bytes, putting nodes in the 128-byte
slab. With deployed systems having as many as 2 million nodes, this
represents a significant doubling in memory usage (128 MiB -> 256 MiB).
Fix this by using our own kmem_cache, that's sized exactly right. This
also makes wireguard's memory usage more transparent in tools like
slabtop and /proc/slabinfo.
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Fri, 4 Jun 2021 15:17:36 +0000 (17:17 +0200)]
wireguard: allowedips: remove nodes in O(1)
Previously, deleting peers would require traversing the entire trie in
order to rebalance nodes and safely free them. This meant that removing
1000 peers from a trie with a half million nodes would take an extremely
long time, during which we're holding the rtnl lock. Large-scale users
were reporting 200ms latencies added to the networking stack as a whole
every time their userspace software would queue up significant removals.
That's a serious situation.
This commit fixes that by maintaining a double pointer to the parent's
bit pointer for each node, and then using the already existing node list
belonging to each peer to go directly to the node, fix up its pointers,
and free it with RCU. This means removal is O(1) instead of O(n), and we
don't use gobs of stack.
The removal algorithm has the same downside as the code that it fixes:
it won't collapse needlessly long runs of fillers. We can enhance that
in the future if it ever becomes a problem. This commit documents that
limitation with a TODO comment in code, a small but meaningful
improvement over the prior situation.
Currently the biggest flaw, which the next commit addresses, is that
because this increases the node size on 64-bit machines from 60 bytes to
68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up
using twice as much memory per node, because of power-of-two
allocations, which is a big bummer. We'll need to figure something out
there.
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Fri, 4 Jun 2021 15:17:35 +0000 (17:17 +0200)]
wireguard: allowedips: initialize list head in selftest
The randomized trie tests weren't initializing the dummy peer list head,
resulting in a NULL pointer dereference when used. Fix this by
initializing it in the randomized trie test, just like we do for the
static unit test.
While we're at it, all of the other strings like this have the word
"self-test", so add it to the missing place here.
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Fri, 4 Jun 2021 15:17:34 +0000 (17:17 +0200)]
wireguard: peer: allocate in kmem_cache
With deployments having upwards of 600k peers now, this somewhat heavy
structure could benefit from more fine-grained allocations.
Specifically, instead of using a 2048-byte slab for a 1544-byte object,
we can now use 1544-byte objects directly, thus saving almost 25%
per-peer, or with 600k peers, that's a savings of 303 MiB. This also
makes wireguard's memory usage more transparent in tools like slabtop
and /proc/slabinfo.
Fixes:
8b5553ace83c ("wireguard: queueing: get rid of per-peer ring buffers")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Fri, 4 Jun 2021 15:17:33 +0000 (17:17 +0200)]
wireguard: use synchronize_net rather than synchronize_rcu
Many of the synchronization points are sometimes called under the rtnl
lock, which means we should use synchronize_net rather than
synchronize_rcu. Under the hood, this expands to using the expedited
flavor of function in the event that rtnl is held, in order to not stall
other concurrent changes.
This fixes some very, very long delays when removing multiple peers at
once, which would cause some operations to take several minutes.
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Fri, 4 Jun 2021 15:17:32 +0000 (17:17 +0200)]
wireguard: do not use -O3
Apparently, various versions of gcc have O3-related miscompiles. Looking
at the difference between -O2 and -O3 for gcc 11 doesn't indicate
miscompiles, but the difference also doesn't seem so significant for
performance that it's worth risking.
Link: https://lore.kernel.org/lkml/CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKWTTsA@mail.gmail.com/
Link: https://lore.kernel.org/lkml/CAHmME9otB5Wwxp7H8bR_i2uH2esEMvoBMC8uEXBMH9p0q1s6Bw@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Fri, 4 Jun 2021 15:17:31 +0000 (17:17 +0200)]
wireguard: selftests: make sure rp_filter is disabled on vethc
Some distros may enable strict rp_filter by default, which will prevent
vethc from receiving the packets with an unrouteable reverse path address.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Fri, 4 Jun 2021 15:17:30 +0000 (17:17 +0200)]
wireguard: selftests: remove old conntrack kconfig value
On recent kernels, this config symbol is no longer used.
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Fixes:
e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Wed, 19 May 2021 19:43:50 +0000 (21:43 +0200)]
virtchnl: Add missing padding to virtchnl_proto_hdrs
On m68k (Coldfire M547x):
CC drivers/net/ethernet/intel/i40e/i40e_main.o
In file included from drivers/net/ethernet/intel/i40e/i40e_prototype.h:9,
from drivers/net/ethernet/intel/i40e/i40e.h:41,
from drivers/net/ethernet/intel/i40e/i40e_main.c:12:
include/linux/avf/virtchnl.h:153:36: warning: division by zero [-Wdiv-by-zero]
153 | { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) }
| ^
include/linux/avf/virtchnl.h:844:1: note: in expansion of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’
844 | VIRTCHNL_CHECK_STRUCT_LEN(2312, virtchnl_proto_hdrs);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/avf/virtchnl.h:844:33: error: enumerator value for ‘virtchnl_static_assert_virtchnl_proto_hdrs’ is not an integer constant
844 | VIRTCHNL_CHECK_STRUCT_LEN(2312, virtchnl_proto_hdrs);
| ^~~~~~~~~~~~~~~~~~~
On m68k, integers are aligned on addresses that are multiples of two,
not four, bytes. Hence the size of a structure containing integers may
not be divisible by 4.
Fix this by adding explicit padding.
Fixes:
1f7ea1cd6a374842 ("ice: Enable FDIR Configure for AVF")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dave Ertman [Wed, 5 May 2021 21:17:59 +0000 (14:17 -0700)]
ice: Allow all LLDP packets from PF to Tx
Currently in the ice driver, the check whether to
allow a LLDP packet to egress the interface from the
PF_VSI is being based on the SKB's priority field.
It checks to see if the packets priority is equal to
TC_PRIO_CONTROL. Injected LLDP packets do not always
meet this condition.
SCAPY defaults to a sk_buff->protocol value of ETH_P_ALL
(0x0003) and does not set the priority field. There will
be other injection methods (even ones used by end users)
that will not correctly configure the socket so that
SKB fields are correctly populated.
Then ethernet header has to have to correct value for
the protocol though.
Add a check to also allow packets whose ethhdr->h_proto
matches ETH_P_LLDP (0x88CC).
Fixes:
0c3a6101ff2d ("ice: Allow egress control packets from PF_VSI")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Paul Greenwalt [Wed, 5 May 2021 21:17:58 +0000 (14:17 -0700)]
ice: report supported and advertised autoneg using PHY capabilities
Ethtool incorrectly reported supported and advertised auto-negotiation
settings for a backplane PHY image which did not support auto-negotiation.
This can occur when using media or PHY type for reporting ethtool
supported and advertised auto-negotiation settings.
Remove setting supported and advertised auto-negotiation settings based
on PHY type in ice_phy_type_to_ethtool(), and MAC type in
ice_get_link_ksettings().
Ethtool supported and advertised auto-negotiation settings should be
based on the PHY image using the AQ command get PHY capabilities with
media. Add setting supported and advertised auto-negotiation settings
based get PHY capabilities with media in ice_get_link_ksettings().
Fixes:
48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Haiyue Wang [Fri, 26 Feb 2021 21:19:31 +0000 (13:19 -0800)]
ice: handle the VF VSI rebuild failure
VSI rebuild can be failed for LAN queue config, then the VF's VSI will
be NULL, the VF reset should be stopped with the VF entering into the
disable state.
Fixes:
12bb018c538c ("ice: Refactor VF reset")
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Brett Creeley [Fri, 26 Feb 2021 21:19:21 +0000 (13:19 -0800)]
ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
Some AVF drivers expect the VF_MBX_ATQLEN register to be cleared for any
type of VFR/VFLR. Fix this by clearing the VF_MBX_ATQLEN register at the
same time as VF_MBX_ARQLEN.
Fixes:
82ba01282cf8 ("ice: clear VF ARQLEN register on reset")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Brett Creeley [Fri, 26 Feb 2021 21:19:20 +0000 (13:19 -0800)]
ice: Fix allowing VF to request more/less queues via virtchnl
Commit
12bb018c538c ("ice: Refactor VF reset") caused a regression
that removes the ability for a VF to request a different amount of
queues via VIRTCHNL_OP_REQUEST_QUEUES. This prevents VF drivers to
either increase or decrease the number of queue pairs they are
allocated. Fix this by using the variable vf->num_req_qs when
determining the vf->num_vf_qs during VF VSI creation.
Fixes:
12bb018c538c ("ice: Refactor VF reset")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
David S. Miller [Thu, 3 Jun 2021 22:32:21 +0000 (15:32 -0700)]
Merge tag 'for-net-2021-06-03' of git://git./linux/kernel/git/bluetooth/bluetooth
bluetooth pull request for net:
- Fixes UAF and CVE-2021-3564
- Fix VIRTIO_ID_BT to use an unassigned ID
- Fix firmware loading on some Intel Controllers
Signed-off-by: David S. Miller <davem@davemloft.net>
Xuan Zhuo [Thu, 3 Jun 2021 17:09:01 +0000 (01:09 +0800)]
virtio-net: fix for skb_over_panic inside big mode
In virtio-net's large packet mode, there is a hole in the space behind
buf.
hdr_padded_len - hdr_len
We must take this into account when calculating tailroom.
[ 44.544385] skb_put.cold (net/core/skbuff.c:5254 (discriminator 1) net/core/skbuff.c:5252 (discriminator 1))
[ 44.544864] page_to_skb (drivers/net/virtio_net.c:485) [ 44.545361] receive_buf (drivers/net/virtio_net.c:849 drivers/net/virtio_net.c:1131)
[ 44.545870] ? netif_receive_skb_list_internal (net/core/dev.c:5714)
[ 44.546628] ? dev_gro_receive (net/core/dev.c:6103)
[ 44.547135] ? napi_complete_done (./include/linux/list.h:35 net/core/dev.c:5867 net/core/dev.c:5862 net/core/dev.c:6565)
[ 44.547672] virtnet_poll (drivers/net/virtio_net.c:1427 drivers/net/virtio_net.c:1525)
[ 44.548251] __napi_poll (net/core/dev.c:6985)
[ 44.548744] net_rx_action (net/core/dev.c:7054 net/core/dev.c:7139)
[ 44.549264] __do_softirq (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:560)
[ 44.549762] irq_exit_rcu (kernel/softirq.c:433 kernel/softirq.c:637 kernel/softirq.c:649)
[ 44.551384] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 13))
[ 44.551991] ? asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)
[ 44.552654] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)
Fixes:
fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: Corentin Noël <corentin.noel@collabora.com>
Tested-by: Corentin Noël <corentin.noel@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Jun 2021 22:21:58 +0000 (15:21 -0700)]
Merge tag 'ieee802154-for-davem-2021-06-03' of git://git./linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
An update from ieee802154 for your *net* tree.
This time we have fixes for the ieee802154 netlink code, as well as a driver
fix. Zhen Lei, Wei Yongjun and Yang Li each had a patch to cleanup some return
code handling ensuring we actually get a real error code when things fails.
Dan Robertson fixed a potential null dereference in our netlink handling.
Andy Shevchenko removed of_match_ptr()usage in the mrf24j40 driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Coco Li [Thu, 3 Jun 2021 07:32:58 +0000 (07:32 +0000)]
ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
Reported by syzbot:
HEAD commit:
90c911ad Merge tag 'fixes' of git://git.kernel.org/pub/scm..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
link: https://syzkaller.appspot.com/bug?extid=123aa35098fd3c000eb7
compiler: Debian clang version 11.0.1-2
==================================================================
BUG: KASAN: slab-out-of-bounds in fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline]
BUG: KASAN: slab-out-of-bounds in fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732
Read of size 8 at addr
ffff8880145c78f8 by task syz-executor.4/17760
CPU: 0 PID: 17760 Comm: syz-executor.4 Not tainted 5.12.0-rc8-syzkaller #0
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x202/0x31e lib/dump_stack.c:120
print_address_description+0x5f/0x3b0 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report+0x15c/0x200 mm/kasan/report.c:416
fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline]
fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732
fib6_nh_release+0x9a/0x430 net/ipv6/route.c:3536
fib6_info_destroy_rcu+0xcb/0x1c0 net/ipv6/ip6_fib.c:174
rcu_do_batch kernel/rcu/tree.c:2559 [inline]
rcu_core+0x8f6/0x1450 kernel/rcu/tree.c:2794
__do_softirq+0x372/0x7a6 kernel/softirq.c:345
invoke_softirq kernel/softirq.c:221 [inline]
__irq_exit_rcu+0x22c/0x260 kernel/softirq.c:422
irq_exit_rcu+0x5/0x20 kernel/softirq.c:434
sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1100
</IRQ>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
RIP: 0010:lock_acquire+0x1f6/0x720 kernel/locking/lockdep.c:5515
Code: f6 84 24 a1 00 00 00 02 0f 85 8d 02 00 00 f7 c3 00 02 00 00 49 bd 00 00 00 00 00 fc ff df 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 3d 00 00 00 00 00 4b c7 44 3d 09 00 00 00 00 43 c7 44 3d
RSP: 0018:
ffffc90009e06560 EFLAGS:
00000206
RAX:
1ffff920013c0cc0 RBX:
0000000000000246 RCX:
dffffc0000000000
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
ffffc90009e066e0 R08:
dffffc0000000000 R09:
fffffbfff1f992b1
R10:
fffffbfff1f992b1 R11:
0000000000000000 R12:
0000000000000000
R13:
dffffc0000000000 R14:
0000000000000000 R15:
1ffff920013c0cb4
rcu_lock_acquire+0x2a/0x30 include/linux/rcupdate.h:267
rcu_read_lock include/linux/rcupdate.h:656 [inline]
ext4_get_group_info+0xea/0x340 fs/ext4/ext4.h:3231
ext4_mb_prefetch+0x123/0x5d0 fs/ext4/mballoc.c:2212
ext4_mb_regular_allocator+0x8a5/0x28f0 fs/ext4/mballoc.c:2379
ext4_mb_new_blocks+0xc6e/0x24f0 fs/ext4/mballoc.c:4982
ext4_ext_map_blocks+0x2be3/0x7210 fs/ext4/extents.c:4238
ext4_map_blocks+0xab3/0x1cb0 fs/ext4/inode.c:638
ext4_getblk+0x187/0x6c0 fs/ext4/inode.c:848
ext4_bread+0x2a/0x1c0 fs/ext4/inode.c:900
ext4_append+0x1a4/0x360 fs/ext4/namei.c:67
ext4_init_new_dir+0x337/0xa10 fs/ext4/namei.c:2768
ext4_mkdir+0x4b8/0xc00 fs/ext4/namei.c:2814
vfs_mkdir+0x45b/0x640 fs/namei.c:3819
ovl_do_mkdir fs/overlayfs/overlayfs.h:161 [inline]
ovl_mkdir_real+0x53/0x1a0 fs/overlayfs/dir.c:146
ovl_create_real+0x280/0x490 fs/overlayfs/dir.c:193
ovl_workdir_create+0x425/0x600 fs/overlayfs/super.c:788
ovl_make_workdir+0xed/0x1140 fs/overlayfs/super.c:1355
ovl_get_workdir fs/overlayfs/super.c:1492 [inline]
ovl_fill_super+0x39ee/0x5370 fs/overlayfs/super.c:2035
mount_nodev+0x52/0xe0 fs/super.c:1413
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1497
do_new_mount fs/namespace.c:2903 [inline]
path_mount+0x196f/0x2be0 fs/namespace.c:3233
do_mount fs/namespace.c:3246 [inline]
__do_sys_mount fs/namespace.c:3454 [inline]
__se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3431
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665f9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007f68f2b87188 EFLAGS:
00000246 ORIG_RAX:
00000000000000a5
RAX:
ffffffffffffffda RBX:
000000000056bf60 RCX:
00000000004665f9
RDX:
00000000200000c0 RSI:
0000000020000000 RDI:
000000000040000a
RBP:
00000000004bfbb9 R08:
0000000020000100 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
000000000056bf60
R13:
00007ffe19002dff R14:
00007f68f2b87300 R15:
0000000000022000
Allocated by task 17768:
kasan_save_stack mm/kasan/common.c:38 [inline]
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:427 [inline]
____kasan_kmalloc+0xc2/0xf0 mm/kasan/common.c:506
kasan_kmalloc include/linux/kasan.h:233 [inline]
__kmalloc+0xb4/0x380 mm/slub.c:4055
kmalloc include/linux/slab.h:559 [inline]
kzalloc include/linux/slab.h:684 [inline]
fib6_info_alloc+0x2c/0xd0 net/ipv6/ip6_fib.c:154
ip6_route_info_create+0x55d/0x1a10 net/ipv6/route.c:3638
ip6_route_add+0x22/0x120 net/ipv6/route.c:3728
inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352
rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553
netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502
netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338
netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x5a2/0x900 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmsg+0x319/0x400 net/socket.c:2433
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
Last potentially related work creation:
kasan_save_stack+0x27/0x50 mm/kasan/common.c:38
kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345
__call_rcu kernel/rcu/tree.c:3039 [inline]
call_rcu+0x1b1/0xa30 kernel/rcu/tree.c:3114
fib6_info_release include/net/ip6_fib.h:337 [inline]
ip6_route_info_create+0x10c4/0x1a10 net/ipv6/route.c:3718
ip6_route_add+0x22/0x120 net/ipv6/route.c:3728
inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352
rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553
netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502
netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338
netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x5a2/0x900 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmsg+0x319/0x400 net/socket.c:2433
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
Second to last potentially related work creation:
kasan_save_stack+0x27/0x50 mm/kasan/common.c:38
kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345
insert_work+0x54/0x400 kernel/workqueue.c:1331
__queue_work+0x981/0xcc0 kernel/workqueue.c:1497
queue_work_on+0x111/0x200 kernel/workqueue.c:1524
queue_work include/linux/workqueue.h:507 [inline]
call_usermodehelper_exec+0x283/0x470 kernel/umh.c:433
kobject_uevent_env+0x1349/0x1730 lib/kobject_uevent.c:617
kvm_uevent_notify_change+0x309/0x3b0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4809
kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:877 [inline]
kvm_put_kvm+0x9c/0xd10 arch/x86/kvm/../../../virt/kvm/kvm_main.c:920
kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3120
__fput+0x352/0x7b0 fs/file_table.c:280
task_work_run+0x146/0x1c0 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x10b/0x1e0 kernel/entry/common.c:208
__syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
syscall_exit_to_user_mode+0x26/0x70 kernel/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at
ffff8880145c7800
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 56 bytes to the right of
192-byte region [
ffff8880145c7800,
ffff8880145c78c0)
The buggy address belongs to the page:
page:
ffffea00005171c0 refcount:1 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x145c7
flags: 0xfff00000000200(slab)
raw:
00fff00000000200 ffffea00006474c0 0000000200000002 ffff888010c41a00
raw:
0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880145c7780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8880145c7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
ffff8880145c7880: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8880145c7900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880145c7980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
In the ip6_route_info_create function, in the case that the nh pointer
is not NULL, the fib6_nh in fib6_info has not been allocated.
Therefore, when trying to free fib6_info in this error case using
fib6_info_release, the function will call fib6_info_destroy_rcu,
which it will access fib6_nh_release(f6i->fib6_nh);
However, f6i->fib6_nh doesn't have any refcount yet given the lack of allocation
causing the reported memory issue above.
Therefore, releasing the empty pointer directly instead would be the solution.
Fixes:
f88d8ea67fbdb ("ipv6: Plumb support for nexthop object in a fib6_info")
Fixes:
706ec91916462 ("ipv6: Fix nexthop refcnt leak when creating ipv6 route info")
Signed-off-by: Coco Li <lixiaoyan@google.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Jun 2021 22:17:33 +0000 (15:17 -0700)]
Merge tag 'wireless-drivers-2021-06-03' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.13
We have only mt76 fixes this time, most important being the fix for
A-MSDU injection attacks.
mt76
* mitigate A-MSDU injection attacks (CVE-2020-24588)
* fix possible array out of bound access in mt7921_mcu_tx_rate_report
* various aggregation and HE setting fixes
* suspend/resume fix for pci devices
* mt7615: fix crash when runtime-pm is not supported
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Yongjun [Wed, 2 Jun 2021 14:06:58 +0000 (22:06 +0800)]
fib: Return the correct errno code
When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Yongjun [Wed, 2 Jun 2021 14:06:40 +0000 (22:06 +0800)]
net: Return the correct errno code
When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Yongjun [Wed, 2 Jun 2021 14:06:30 +0000 (22:06 +0800)]
net/x25: Return the correct errno code
When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Wed, 2 Jun 2021 14:08:59 +0000 (19:38 +0530)]
cxgb4: fix regression with HASH tc prio value update
commit
db43b30cd89c ("cxgb4: add ethtool n-tuple filter deletion")
has moved searching for next highest priority HASH filter rule to
cxgb4_flow_rule_destroy(), which searches the rhashtable before the
the rule is removed from it and hence always finds at least 1 entry.
Fix by removing the rule from rhashtable first before calling
cxgb4_flow_rule_destroy() and hence avoid fetching stale info.
Fixes:
db43b30cd89c ("cxgb4: add ethtool n-tuple filter deletion")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Jun 2021 22:05:07 +0000 (15:05 -0700)]
Merge branch 'caif-fixes'
Pavel Skripkin says:
====================
This patch series fix 2 memory leaks in caif
interface.
Syzbot reported memory leak in cfserl_create().
The problem was in cfcnfg_add_phy_layer() function.
This function accepts struct cflayer *link_support and
assign it to corresponting structures, but it can fail
in some cases.
These cases must be handled to prevent leaking allocated
struct cflayer *link_support pointer, because if error accured
before assigning link_support pointer to somewhere, this pointer
must be freed.
Fail log:
[ 49.051872][ T7010] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6)
[ 49.110236][ T7042] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6)
[ 49.134936][ T7045] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6)
[ 49.163083][ T7043] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6)
[ 55.248950][ T6994] kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
int cfcnfg_add_phy_layer(..., struct cflayer *link_support, ...)
{
...
/* CAIF protocol allow maximum 6 link-layers */
for (i = 0; i < 7; i++) {
phyid = (dev->ifindex + i) & 0x7;
if (phyid == 0)
continue;
if (cfcnfg_get_phyinfo_rcu(cnfg, phyid) == NULL)
goto got_phyid;
}
pr_warn("Too many CAIF Link Layers (max 6)\n");
goto out;
...
if (link_support != NULL) {
link_support->id = phyid;
layer_set_dn(frml, link_support);
layer_set_up(link_support, frml);
layer_set_dn(link_support, phy_layer);
layer_set_up(phy_layer, link_support);
}
...
}
As you can see, if cfcnfg_add_phy_layer fails before layer_set_*,
link_support becomes leaked.
So, in this series, I made cfcnfg_add_phy_layer()
return an int and added error handling code to prevent
leaking link_support pointer in caif_device_notify()
and cfusbl_device_notify() functions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Thu, 3 Jun 2021 16:39:35 +0000 (19:39 +0300)]
net: caif: fix memory leak in cfusbl_device_notify
In case of caif_enroll_dev() fail, allocated
link_support won't be assigned to the corresponding
structure. So simply free allocated pointer in case
of error.
Fixes:
7ad65bf68d70 ("caif: Add support for CAIF over CDC NCM USB interface")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Thu, 3 Jun 2021 16:39:11 +0000 (19:39 +0300)]
net: caif: fix memory leak in caif_device_notify
In case of caif_enroll_dev() fail, allocated
link_support won't be assigned to the corresponding
structure. So simply free allocated pointer in case
of error
Fixes:
7c18d2205ea7 ("caif: Restructure how link caif link layer enroll")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+7ec324747ce876a29db6@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Thu, 3 Jun 2021 16:38:51 +0000 (19:38 +0300)]
net: caif: add proper error handling
caif_enroll_dev() can fail in some cases. Ingnoring
these cases can lead to memory leak due to not assigning
link_support pointer to anywhere.
Fixes:
7c18d2205ea7 ("caif: Restructure how link caif link layer enroll")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Thu, 3 Jun 2021 16:38:12 +0000 (19:38 +0300)]
net: caif: added cfserl_release function
Added cfserl_release() function.
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Jun 2021 22:02:55 +0000 (15:02 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
This series contains updates to igb, igc, ixgbe, ixgbevf, i40e and ice
drivers.
Kurt Kanzenbach fixes XDP for igb when PTP is enabled by pulling the
timestamp and adjusting appropriate values prior to XDP operations.
Magnus adds missing exception tracing for XDP on igb, igc, ixgbe,
ixgbevf, i40e and ice drivers.
Maciej adds tracking of AF_XDP zero copy enabled queues to resolve an
issue with copy mode Tx for the ice driver.
Note: Patch 7 will conflict when merged with net-next. Please carry
these changes forward. IGC_XDP_TX and IGC_XDP_REDIRECT will need to be
changed to return to conform with the net-next changes. Let me know if
you have issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Jun 2021 21:17:42 +0000 (14:17 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2021-06-02
The following pull-request contains BPF updates for your *net* tree.
We've added 2 non-merge commits during the last 7 day(s) which contain
a total of 4 files changed, 19 insertions(+), 24 deletions(-).
The main changes are:
1) Fix pahole BTF generation when ccache is used, from Javier Martinez Canillas.
2) Fix BPF lockdown hooks in bpf_probe_read_kernel{,_str}() helpers which caused
a deadlock from bcc programs, triggered OOM killer from audit side and didn't
work generally with SELinux policy rules due to pointing to wrong task struct,
from Daniel Borkmann.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Wed, 2 Jun 2021 19:26:40 +0000 (22:26 +0300)]
net: kcm: fix memory leak in kcm_sendmsg
Syzbot reported memory leak in kcm_sendmsg()[1].
The problem was in non-freed frag_list in case of error.
In the while loop:
if (head == skb)
skb_shinfo(head)->frag_list = tskb;
else
skb->next = tskb;
frag_list filled with skbs, but nothing was freeing them.
backtrace:
[<
0000000094c02615>] __alloc_skb+0x5e/0x250 net/core/skbuff.c:198
[<
00000000e5386cbd>] alloc_skb include/linux/skbuff.h:1083 [inline]
[<
00000000e5386cbd>] kcm_sendmsg+0x3b6/0xa50 net/kcm/kcmsock.c:967 [1]
[<
00000000f1613a8a>] sock_sendmsg_nosec net/socket.c:652 [inline]
[<
00000000f1613a8a>] sock_sendmsg+0x4c/0x60 net/socket.c:672
Reported-and-tested-by: syzbot+b039f5699bd82e1fb011@syzkaller.appspotmail.com
Fixes:
ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luiz Augusto von Dentz [Fri, 30 Apr 2021 23:05:01 +0000 (16:05 -0700)]
Bluetooth: btusb: Fix failing to init controllers with operation firmware
Some firmware when operation don't may have broken versions leading to
error like the following:
[ 6.176482] Bluetooth: hci0: Firmware revision 0.0 build 121 week 7 2021
[ 6.177906] bluetooth hci0: Direct firmware load for intel/ibt-20-0-0.sfi failed with error -2
[ 6.177910] Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-20-0-0.sfi (-2)
Since we load the firmware file just to check if its version had changed
comparing to the one already loaded we can just skip since the firmware
is already operation.
Fixes:
ac0565462e330 ("Bluetooth: btintel: Check firmware version before
download")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Marcel Holtmann [Thu, 3 Jun 2021 19:20:26 +0000 (21:20 +0200)]
Bluetooth: Fix VIRTIO_ID_BT assigned number
It turned out that the VIRTIO_ID_* are not assigned in the virtio_ids.h
file in the upstream kernel. Picking the next free one was wrong and
there is a process that has been followed now.
See https://github.com/oasis-tcs/virtio-spec/issues/108 for details.
Fixes:
afd2daa26c7a ("Bluetooth: Add support for virtio transport driver")
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
zhang kai [Wed, 2 Jun 2021 10:36:26 +0000 (18:36 +0800)]
sit: set name of device back to struct parms
addrconf_set_sit_dstaddr will use parms->name.
Signed-off-by: zhang kai <zhangkaiheb@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiapeng Chong [Wed, 2 Jun 2021 10:15:04 +0000 (18:15 +0800)]
rtnetlink: Fix missing error code in rtnl_bridge_notify()
The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'err'.
Eliminate the follow smatch warning:
net/core/rtnetlink.c:4834 rtnl_bridge_notify() warn: missing error code
'err'.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Jun 2021 20:49:08 +0000 (13:49 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Do not allow to add conntrack helper extension for confirmed
conntracks in the nf_tables ct expectation support.
2) Fix bogus EBUSY in nfnetlink_cthelper when NFCTH_PRIV_DATA_LEN
is passed on userspace helper updates.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Fijalkowski [Tue, 27 Apr 2021 19:52:09 +0000 (21:52 +0200)]
ice: track AF_XDP ZC enabled queues in bitmap
Commit
c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure")
silently introduced a regression and broke the Tx side of AF_XDP in copy
mode. xsk_pool on ice_ring is set only based on the existence of the XDP
prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed.
That is not something that should happen for copy mode as it should use
the regular data path ice_clean_tx_irq.
This results in a following splat when xdpsock is run in txonly or l2fwd
scenarios in copy mode:
<snip>
[ 106.050195] BUG: kernel NULL pointer dereference, address:
0000000000000030
[ 106.057269] #PF: supervisor read access in kernel mode
[ 106.062493] #PF: error_code(0x0000) - not-present page
[ 106.067709] PGD 0 P4D 0
[ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
[ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.
031920191559 03/19/2019
[ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
[ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
[ 106.115588] RSP: 0018:
ffffc9000d694e50 EFLAGS:
00010206
[ 106.120893] RAX:
0000000000000000 RBX:
ffff88984b8c8a00 RCX:
ffff889852581800
[ 106.128137] RDX:
0000000000000006 RSI:
0000000000000000 RDI:
ffff88984cd8b800
[ 106.135383] RBP:
ffff888123b50001 R08:
ffff889896800000 R09:
0000000000000800
[ 106.142628] R10:
0000000000000000 R11:
ffffffff826060c0 R12:
00000000000000ff
[ 106.149872] R13:
0000000000000000 R14:
0000000000000040 R15:
ffff888123b50018
[ 106.157117] FS:
0000000000000000(0000) GS:
ffff8897e0f40000(0000) knlGS:
0000000000000000
[ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 106.171163] CR2:
0000000000000030 CR3:
000000000560a004 CR4:
00000000007706e0
[ 106.178408] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 106.185653] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 106.192898] PKRU:
55555554
[ 106.195653] Call Trace:
[ 106.198143] <IRQ>
[ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
[ 106.205087] ice_napi_poll+0x3e/0x590 [ice]
[ 106.209356] __napi_poll+0x2a/0x160
[ 106.212911] net_rx_action+0xd6/0x200
[ 106.216634] __do_softirq+0xbf/0x29b
[ 106.220274] irq_exit_rcu+0x88/0xc0
[ 106.223819] common_interrupt+0x7b/0xa0
[ 106.227719] </IRQ>
[ 106.229857] asm_common_interrupt+0x1e/0x40
</snip>
Fix this by introducing the bitmap of queues that are zero-copy enabled,
where each bit, corresponding to a queue id that xsk pool is being
configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and
checked within ice_xsk_pool(). The latter is a function used for
deciding which napi poll routine is executed.
Idea is being taken from our other drivers such as i40e and ixgbe.
Fixes:
c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Magnus Karlsson [Mon, 10 May 2021 09:38:54 +0000 (11:38 +0200)]
igc: add correct exception tracing for XDP
Add missing exception tracing to XDP when a number of different
errors can occur. The support was only partial. Several errors
where not logged which would confuse the user quite a lot not
knowing where and why the packets disappeared.
Fixes:
73f1071c1d29 ("igc: Add support for XDP_TX action")
Fixes:
4ff320361092 ("igc: Add support for XDP_REDIRECT action")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Magnus Karlsson [Mon, 10 May 2021 09:38:53 +0000 (11:38 +0200)]
ixgbevf: add correct exception tracing for XDP
Add missing exception tracing to XDP when a number of different
errors can occur. The support was only partial. Several errors
where not logged which would confuse the user quite a lot not
knowing where and why the packets disappeared.
Fixes:
21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Magnus Karlsson [Mon, 10 May 2021 09:38:52 +0000 (11:38 +0200)]
igb: add correct exception tracing for XDP
Add missing exception tracing to XDP when a number of different
errors can occur. The support was only partial. Several errors
where not logged which would confuse the user quite a lot not
knowing where and why the packets disappeared.
Fixes:
9cbc948b5a20 ("igb: add XDP support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Magnus Karlsson [Mon, 10 May 2021 09:38:51 +0000 (11:38 +0200)]
ixgbe: add correct exception tracing for XDP
Add missing exception tracing to XDP when a number of different
errors can occur. The support was only partial. Several errors
where not logged which would confuse the user quite a lot not
knowing where and why the packets disappeared.
Fixes:
33fdc82f0883 ("ixgbe: add support for XDP_TX action")
Fixes:
d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Magnus Karlsson [Mon, 10 May 2021 09:38:50 +0000 (11:38 +0200)]
ice: add correct exception tracing for XDP
Add missing exception tracing to XDP when a number of different
errors can occur. The support was only partial. Several errors
where not logged which would confuse the user quite a lot not
knowing where and why the packets disappeared.
Fixes:
efc2214b6047 ("ice: Add support for XDP")
Fixes:
2d4238f55697 ("ice: Add support for AF_XDP")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Magnus Karlsson [Mon, 10 May 2021 09:38:49 +0000 (11:38 +0200)]
i40e: add correct exception tracing for XDP
Add missing exception tracing to XDP when a number of different errors
can occur. The support was only partial. Several errors where not
logged which would confuse the user quite a lot not knowing where and
why the packets disappeared.
Fixes:
74608d17fe29 ("i40e: add support for XDP_TX action")
Fixes:
0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Kurt Kanzenbach [Mon, 3 May 2021 07:28:00 +0000 (09:28 +0200)]
igb: Fix XDP with PTP enabled
When using native XDP with the igb driver, the XDP frame data doesn't point to
the beginning of the packet. It's off by 16 bytes. Everything works as expected
with XDP skb mode.
Actually these 16 bytes are used to store the packet timestamps. Therefore, pull
the timestamp before executing any XDP operations and adjust all other code
accordingly. The igc driver does it like that as well.
Tested with Intel i210 card and AF_XDP sockets.
Fixes:
9cbc948b5a20 ("igb: add XDP support")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Wei Yongjun [Wed, 19 May 2021 14:16:14 +0000 (14:16 +0000)]
ieee802154: fix error return code in ieee802154_llsec_getparams()
Fix to return negative error code -ENOBUFS from the error handling
case instead of 0, as done elsewhere in this function.
Fixes:
3e9c156e2c21 ("ieee802154: add netlink interfaces for llsec")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20210519141614.3040055-1-weiyongjun1@huawei.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Zhen Lei [Sat, 8 May 2021 06:25:17 +0000 (14:25 +0800)]
ieee802154: fix error return code in ieee802154_add_iface()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes:
be51da0f3e34 ("ieee802154: Stop using NLA_PUT*().")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210508062517.2574-1-thunder.leizhen@huawei.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Andy Shevchenko [Mon, 31 May 2021 13:22:26 +0000 (16:22 +0300)]
net: ieee802154: mrf24j40: Drop unneeded of_match_ptr()
Driver can be used in different environments and moreover, when compiled
with !OF, the compiler may issue a warning due to unused mrf24j40_of_match
variable. Hence drop unneeded of_match_ptr() call.
While at it, update headers block to reflect above changes.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210531132226.47081-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Yang Li [Sun, 25 Apr 2021 10:24:59 +0000 (18:24 +0800)]
net/ieee802154: drop unneeded assignment in llsec_iter_devkeys()
In order to keep the code style consistency of the whole file,
redundant return value ‘rc’ and its assignments should be deleted
The clang_analyzer complains as follows:
net/ieee802154/nl-mac.c:1203:12: warning: Although the value stored to
'rc' is used in the enclosing expression, the value is never actually
read from 'rc'
No functional change, only more efficient.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619346299-40237-1-git-send-email-yang.lee@linux.alibaba.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Wong Vee Khee [Wed, 2 Jun 2021 02:31:25 +0000 (10:31 +0800)]
net: stmmac: fix issue where clk is being unprepared twice
In the case of MDIO bus registration failure due to no external PHY
devices is connected to the MAC, clk_disable_unprepare() is called in
stmmac_bus_clk_config() and intel_eth_pci_probe() respectively.
The second call in intel_eth_pci_probe() will caused the following:-
[ 16.578605] intel-eth-pci 0000:00:1e.5: No PHY found
[ 16.583778] intel-eth-pci 0000:00:1e.5: stmmac_dvr_probe: MDIO bus (id: 2) registration failed
[ 16.680181] ------------[ cut here ]------------
[ 16.684861] stmmac-0000:00:1e.5 already disabled
[ 16.689547] WARNING: CPU: 13 PID: 2053 at drivers/clk/clk.c:952 clk_core_disable+0x96/0x1b0
[ 16.697963] Modules linked in: dwc3 iTCO_wdt mei_hdcp iTCO_vendor_support udc_core x86_pkg_temp_thermal kvm_intel marvell10g kvm sch_fq_codel nfsd irqbypass dwmac_intel(+) stmmac uio ax88179_178a pcs_xpcs phylink uhid spi_pxa2xx_platform usbnet mei_me pcspkr tpm_crb mii i2c_i801 dw_dmac dwc3_pci thermal dw_dmac_core intel_rapl_msr libphy i2c_smbus mei tpm_tis intel_th_gth tpm_tis_core tpm intel_th_acpi intel_pmc_core intel_th i915 fuse configfs snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_pcm snd_timer snd soundcore
[ 16.746785] CPU: 13 PID: 2053 Comm: systemd-udevd Tainted: G U 5.13.0-rc3-intel-lts #76
[ 16.756134] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.
2012031421 12/03/2020
[ 16.769465] RIP: 0010:clk_core_disable+0x96/0x1b0
[ 16.774222] Code: 00 8b 05 45 96 17 01 85 c0 7f 24 48 8b 5b 30 48 85 db 74 a5 8b 43 7c 85 c0 75 93 48 8b 33 48 c7 c7 6e 32 cc b7 e8 b2 5d 52 00 <0f> 0b 5b 5d c3 65 8b 05 76 31 18 49 89 c0 48 0f a3 05 bc 92 1a 01
[ 16.793016] RSP: 0018:
ffffa44580523aa0 EFLAGS:
00010086
[ 16.798287] RAX:
0000000000000000 RBX:
ffff8d7d0eb70a00 RCX:
0000000000000000
[ 16.805435] RDX:
0000000000000002 RSI:
ffffffffb7c62d5f RDI:
00000000ffffffff
[ 16.812610] RBP:
0000000000000287 R08:
0000000000000000 R09:
ffffa445805238d0
[ 16.819759] R10:
0000000000000001 R11:
0000000000000001 R12:
ffff8d7d0eb70a00
[ 16.826904] R13:
ffff8d7d027370c8 R14:
0000000000000006 R15:
ffffa44580523ad0
[ 16.834047] FS:
00007f9882fa2600(0000) GS:
ffff8d80a0940000(0000) knlGS:
0000000000000000
[ 16.842177] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 16.847966] CR2:
00007f9882bea3d8 CR3:
000000010b126001 CR4:
0000000000370ee0
[ 16.855144] Call Trace:
[ 16.857614] clk_core_disable_lock+0x1b/0x30
[ 16.861941] intel_eth_pci_probe.cold+0x11d/0x136 [dwmac_intel]
[ 16.867913] pci_device_probe+0xcf/0x150
[ 16.871890] really_probe+0xf5/0x3e0
[ 16.875526] driver_probe_device+0x64/0x150
[ 16.879763] device_driver_attach+0x53/0x60
[ 16.883998] __driver_attach+0x9f/0x150
[ 16.887883] ? device_driver_attach+0x60/0x60
[ 16.892288] ? device_driver_attach+0x60/0x60
[ 16.896698] bus_for_each_dev+0x77/0xc0
[ 16.900583] bus_add_driver+0x184/0x1f0
[ 16.904469] driver_register+0x6c/0xc0
[ 16.908268] ? 0xffffffffc07ae000
[ 16.911598] do_one_initcall+0x4a/0x210
[ 16.915489] ? kmem_cache_alloc_trace+0x305/0x4e0
[ 16.920247] do_init_module+0x5c/0x230
[ 16.924057] load_module+0x2894/0x2b70
[ 16.927857] ? __do_sys_finit_module+0xb5/0x120
[ 16.932441] __do_sys_finit_module+0xb5/0x120
[ 16.936845] do_syscall_64+0x42/0x80
[ 16.940476] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 16.945586] RIP: 0033:0x7f98830e5ccd
[ 16.949177] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48
[ 16.967970] RSP: 002b:
00007ffc66b60168 EFLAGS:
00000246 ORIG_RAX:
0000000000000139
[ 16.975583] RAX:
ffffffffffffffda RBX:
000055885de35ef0 RCX:
00007f98830e5ccd
[ 16.982725] RDX:
0000000000000000 RSI:
00007f98832541e3 RDI:
0000000000000012
[ 16.989868] RBP:
0000000000020000 R08:
0000000000000000 R09:
0000000000000000
[ 16.997042] R10:
0000000000000012 R11:
0000000000000246 R12:
00007f98832541e3
[ 17.004222] R13:
0000000000000000 R14:
0000000000000000 R15:
00007ffc66b60328
[ 17.011369] ---[ end trace
df06a3dab26b988c ]---
[ 17.016062] ------------[ cut here ]------------
[ 17.020701] stmmac-0000:00:1e.5 already unprepared
Removing the stmmac_bus_clks_config() call in stmmac_dvr_probe and let
dwmac-intel to handle the unprepare and disable of the clk device.
Fixes:
5ec55823438e ("net: stmmac: add clocks management for gmac driver")
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Reviewed-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Josh Triplett [Wed, 2 Jun 2021 01:38:41 +0000 (18:38 -0700)]
net: ipconfig: Don't override command-line hostnames or domains
If the user specifies a hostname or domain name as part of the ip=
command-line option, preserve it and don't overwrite it with one
supplied by DHCP/BOOTP.
For instance, ip=::::myhostname::dhcp will use "myhostname" rather than
ignoring and overwriting it.
Fix the comment on ic_bootp_string that suggests it only copies a string
"if not already set"; it doesn't have any such logic.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 2 Jun 2021 20:12:00 +0000 (13:12 -0700)]
Merge tag 'mlx5-fixes-2021-06-01' of git://git./linu
x/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-06-01
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 28 May 2021 09:16:31 +0000 (09:16 +0000)]
bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
Commit
59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
added an implementation of the locked_down LSM hook to SELinux, with the aim
to restrict which domains are allowed to perform operations that would breach
lockdown. This is indirectly also getting audit subsystem involved to report
events. The latter is problematic, as reported by Ondrej and Serhei, since it
can bring down the whole system via audit:
1) The audit events that are triggered due to calls to security_locked_down()
can OOM kill a machine, see below details [0].
2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit()
when trying to wake up kauditd, for example, when using trace_sched_switch()
tracepoint, see details in [1]. Triggering this was not via some hypothetical
corner case, but with existing tools like runqlat & runqslower from bcc, for
example, which make use of this tracepoint. Rough call sequence goes like:
rq_lock(rq) -> -------------------------+
trace_sched_switch() -> |
bpf_prog_xyz() -> +-> deadlock
selinux_lockdown() -> |
audit_log_end() -> |
wake_up_interruptible() -> |
try_to_wake_up() -> |
rq_lock(rq) --------------+
What's worse is that the intention of
59438b46471a to further restrict lockdown
settings for specific applications in respect to the global lockdown policy is
completely broken for BPF. The SELinux policy rule for the current lockdown check
looks something like this:
allow <who> <who> : lockdown { <reason> };
However, this doesn't match with the 'current' task where the security_locked_down()
is executed, example: httpd does a syscall. There is a tracing program attached
to the syscall which triggers a BPF program to run, which ends up doing a
bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does
the permission check against 'current', that is, httpd in this example. httpd
has literally zero relation to this tracing program, and it would be nonsensical
having to write an SELinux policy rule against httpd to let the tracing helper
pass. The policy in this case needs to be against the entity that is installing
the BPF program. For example, if bpftrace would generate a histogram of syscall
counts by user space application:
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
bpftrace would then go and generate a BPF program from this internally. One way
of doing it [for the sake of the example] could be to call bpf_get_current_task()
helper and then access current->comm via one of bpf_probe_read_kernel{,_str}()
helpers. So the program itself has nothing to do with httpd or any other random
app doing a syscall here. The BPF program _explicitly initiated_ the lockdown
check. The allow/deny policy belongs in the context of bpftrace: meaning, you
want to grant bpftrace access to use these helpers, but other tracers on the
system like my_random_tracer _not_.
Therefore fix all three issues at the same time by taking a completely different
approach for the security_locked_down() hook, that is, move the check into the
program verification phase where we actually retrieve the BPF func proto. This
also reliably gets the task (current) that is trying to install the BPF tracing
program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since
we're moving this out of the BPF helper's fast-path which can be called several
millions of times per second.
The check is then also in line with other security_locked_down() hooks in the
system where the enforcement is performed at open/load time, for example,
open_kcore() for /proc/kcore access or module_sig_check() for module signatures
just to pick few random ones. What's out of scope in the fix as well as in
other security_locked_down() hook locations /outside/ of BPF subsystem is that
if the lockdown policy changes on the fly there is no retrospective action.
This requires a different discussion, potentially complex infrastructure, and
it's also not clear whether this can be solved generically. Either way, it is
out of scope for a suitable stable fix which this one is targeting. Note that
the breakage is specifically on
59438b46471a where it started to rely on 'current'
as UAPI behavior, and _not_ earlier infrastructure such as
9d1f8be5cf42 ("bpf:
Restrict bpf when kernel lockdown is in confidentiality mode").
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says:
I starting seeing this with F-34. When I run a container that is traced with
BPF to record the syscalls it is doing, auditd is flooded with messages like:
type=AVC msg=audit(
1619784520.593:282387): avc: denied { confidentiality }
for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM"
scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0
tclass=lockdown permissive=0
This seems to be leading to auditd running out of space in the backlog buffer
and eventually OOMs the machine.
[...]
auditd running at 99% CPU presumably processing all the messages, eventually I get:
Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64
Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64
Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64
Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64
Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1
Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[...]
[1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDCHXJUj43J7jQ@mail.gmail.com/,
Serhei Makarov says:
Upstream kernel 5.11.0-rc7 and later was found to deadlock during a
bpf_probe_read_compat() call within a sched_switch tracepoint. The problem
is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend
testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on
ppc64le. Example stack trace:
[...]
[ 730.868702] stack backtrace:
[ 730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1
[ 730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 730.873278] Call Trace:
[ 730.873770] dump_stack+0x7f/0xa1
[ 730.874433] check_noncircular+0xdf/0x100
[ 730.875232] __lock_acquire+0x1202/0x1e10
[ 730.876031] ? __lock_acquire+0xfc0/0x1e10
[ 730.876844] lock_acquire+0xc2/0x3a0
[ 730.877551] ? __wake_up_common_lock+0x52/0x90
[ 730.878434] ? lock_acquire+0xc2/0x3a0
[ 730.879186] ? lock_is_held_type+0xa7/0x120
[ 730.880044] ? skb_queue_tail+0x1b/0x50
[ 730.880800] _raw_spin_lock_irqsave+0x4d/0x90
[ 730.881656] ? __wake_up_common_lock+0x52/0x90
[ 730.882532] __wake_up_common_lock+0x52/0x90
[ 730.883375] audit_log_end+0x5b/0x100
[ 730.884104] slow_avc_audit+0x69/0x90
[ 730.884836] avc_has_perm+0x8b/0xb0
[ 730.885532] selinux_lockdown+0xa5/0xd0
[ 730.886297] security_locked_down+0x20/0x40
[ 730.887133] bpf_probe_read_compat+0x66/0xd0
[ 730.887983] bpf_prog_250599c5469ac7b5+0x10f/0x820
[ 730.888917] trace_call_bpf+0xe9/0x240
[ 730.889672] perf_trace_run_bpf_submit+0x4d/0xc0
[ 730.890579] perf_trace_sched_switch+0x142/0x180
[ 730.891485] ? __schedule+0x6d8/0xb20
[ 730.892209] __schedule+0x6d8/0xb20
[ 730.892899] schedule+0x5b/0xc0
[ 730.893522] exit_to_user_mode_prepare+0x11d/0x240
[ 730.894457] syscall_exit_to_user_mode+0x27/0x70
[ 730.895361] entry_SYSCALL_64_after_hwframe+0x44/0xae
[...]
Fixes:
59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Reported-by: Jakub Hrozek <jhrozek@redhat.com>
Reported-by: Serhei Makarov <smakarov@redhat.com>
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Frank Eigler <fche@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f841c3@iogearbox.net
Pablo Neira Ayuso [Fri, 28 May 2021 11:45:16 +0000 (13:45 +0200)]
netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
The private helper data size cannot be updated. However, updates that
contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is
the same.
Fixes:
12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Thu, 27 May 2021 19:54:42 +0000 (21:54 +0200)]
netfilter: nft_ct: skip expectations for confirmed conntrack
nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed
conntrack entry. However, nf_ct_ext_add() can only be called for
!nf_ct_is_confirmed().
[ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00
[ 1825.351721] RSP: 0018:
ffffc90002e1f1e8 EFLAGS:
00010202
[ 1825.351790] RAX:
000000000000000e RBX:
ffff88814f5783c0 RCX:
ffffffffc0e4f887
[ 1825.351881] RDX:
dffffc0000000000 RSI:
0000000000000008 RDI:
ffff88814f578440
[ 1825.351971] RBP:
0000000000000000 R08:
0000000000000000 R09:
ffff88814f578447
[ 1825.352060] R10:
ffffed1029eaf088 R11:
0000000000000001 R12:
ffff88814f578440
[ 1825.352150] R13:
ffff8882053f3a00 R14:
0000000000000000 R15:
0000000000000a20
[ 1825.352240] FS:
00007f992261c900(0000) GS:
ffff889faec00000(0000) knlGS:
0000000000000000
[ 1825.352343] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 1825.352417] CR2:
000056070a4d1158 CR3:
000000015efe0000 CR4:
0000000000350ee0
[ 1825.352508] Call Trace:
[ 1825.352544] nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack]
[ 1825.352641] nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct]
[ 1825.352716] nft_do_chain+0x232/0x850 [nf_tables]
Add the ct helper extension only for unconfirmed conntrack. Skip rule
evaluation if the ct helper extension does not exist. Thus, you can
only create expectations from the first packet.
It should be possible to remove this limitation by adding a new action
to attach a generic ct helper to the first packet. Then, use this ct
helper extension from follow up packets to create the ct expectation.
While at it, add a missing check to skip the template conntrack too
and remove check for IPCT_UNTRACK which is implicit to !ct.
Fixes:
857b46027d6f ("netfilter: nft_ct: add ct expectations support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Yevgeny Kliteynik [Wed, 9 Dec 2020 14:40:38 +0000 (16:40 +0200)]
net/mlx5: DR, Create multi-destination flow table with level less than 64
Flow table that contains flow pointing to multiple flow tables or multiple
TIRs must have a level lower than 64. In our case it applies to muli-
destination flow table.
Fix the level of the created table to comply with HW Spec definitions, and
still make sure that its level lower than SW-owned tables, so that it
would be possible to point from the multi-destination FW table to SW
tables.
Fixes:
34583beea4b7 ("net/mlx5: DR, Create multi-destination table for SW-steering use")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Mon, 3 May 2021 14:16:44 +0000 (17:16 +0300)]
net/mlx5e: Fix conflict with HW TS and CQE compression
When a driver's profile doesn't support a dedicated PTP-RQ,
configuration of CQE compression while HW TS is configured should fail.
Fixes:
885b8cfb161e ("net/mlx5e: Update ethtool setting of CQE compression")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Mon, 3 May 2021 13:59:55 +0000 (16:59 +0300)]
net/mlx5e: Fix HW TS with CQE compression according to profile
When the driver's profile doesn't support a dedicated PTP-RQ, the PTP
accuracy of HW TS is affected by the CQE compression. In this case,
turn off CQE compression. Otherwise, the driver crashes:
BUG: kernel NULL pointer dereference, address:
0000000000000018
...
...
RIP: 0010:mlx5e_ptp_rx_set_fs+0x25/0x1a0 [mlx5_core]
...
...
Call Trace:
mlx5e_ptp_activate_channel+0xb2/0xf0 [mlx5_core]
mlx5e_activate_priv_channels+0x3b9/0x8c0 [mlx5_core]
? __mutex_unlock_slowpath+0x45/0x2a0
? mlx5e_refresh_tirs+0x151/0x1e0 [mlx5_core]
mlx5e_switch_priv_channels+0x1cd/0x2d0 [mlx5_core]
? mlx5e_xdp_allowed+0x150/0x150 [mlx5_core]
mlx5e_safe_switch_params+0x118/0x3c0 [mlx5_core]
? __mutex_lock+0x6e/0x8e0
? mlx5e_hwstamp_set+0xa9/0x300 [mlx5_core]
mlx5e_hwstamp_set+0x194/0x300 [mlx5_core]
? dev_ioctl+0x9b/0x3d0
mlx5i_ioctl+0x37/0x60 [mlx5_core]
mlx5i_pkey_ioctl+0x12/0x20 [mlx5_core]
dev_ioctl+0xa9/0x3d0
sock_ioctl+0x268/0x420
__x64_sys_ioctl+0x3d8/0x790
? lockdep_hardirqs_on_prepare+0xe4/0x190
do_syscall_64+0x2d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes:
960fbfe222a4 ("net/mlx5e: Allow coexistence of CQE compression and HW TS PTP")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Wed, 19 May 2021 07:00:27 +0000 (10:00 +0300)]
net/mlx5e: Fix adding encap rules to slow path
On some devices the ignore flow level cap is not supported and we
shouldn't use it. Setting the dest ft with mlx5_chains_get_tc_end_ft()
already gives the correct end ft if ignore flow level cap is supported
or not.
Fixes:
39ac237ce009 ("net/mlx5: E-Switch, Refactor chains and priorities")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Tue, 13 Apr 2021 11:35:22 +0000 (14:35 +0300)]
net/mlx5e: Check for needed capability for cvlan matching
If not supported show an error and return instead of trying to offload
to the hardware and fail.
Fixes:
699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Moshe Shemesh [Thu, 8 Apr 2021 04:30:57 +0000 (07:30 +0300)]
net/mlx5: Check firmware sync reset requested is set before trying to abort it
In case driver sent NACK to firmware on sync reset request, it will get
sync reset abort event while it didn't set sync reset requested mode.
Thus, on abort sync reset event handler, driver should check reset
requested is set before trying to stop sync reset poll.
Fixes:
7dd6df329d4c ("net/mlx5: Handle sync reset abort event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Thu, 29 Apr 2021 09:13:35 +0000 (12:13 +0300)]
net/mlx5e: Disable TLS offload for uplink representor
TLS offload is not supported in switchdev mode.
Fixes:
7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Tue, 25 May 2021 12:35:25 +0000 (15:35 +0300)]
net/mlx5e: Fix incompatible casting
Device supports setting of a single fec mode at a time, enforce this
by bitmap_weight == 1. Input from fec command is in u32, avoid cast to
unsigned long and use bitmap_from_arr32 to populate bitmap safely.
Fixes:
4bd9d5070b92 ("net/mlx5e: Enforce setting of a single FEC mode")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Joe Perches [Tue, 1 Jun 2021 16:38:58 +0000 (09:38 -0700)]
MAINTAINERS: nfc mailing lists are subscribers-only
It looks as if the MAINTAINERS entries for the nfc mailing list
should be updated as I just got a "rejected" bounce from the nfc list.
-------
Your message to the Linux-nfc mailing-list was rejected for the following
reasons:
The message is not from a list member
-------
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 1 Jun 2021 22:58:05 +0000 (15:58 -0700)]
Merge branch 'ktls-use-after-free'
Maxim Mikityanskiy says:
====================
Fix use-after-free after the TLS device goes down and up
This small series fixes a use-after-free bug in the TLS offload code.
The first patch is a preparation for the second one, and the second is
the fix itself.
v2 changes:
Remove unneeded EXPORT_SYMBOL_GPL.
====================
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxim Mikityanskiy [Tue, 1 Jun 2021 12:08:00 +0000 (15:08 +0300)]
net/tls: Fix use-after-free after the TLS device goes down and up
When a netdev with active TLS offload goes down, tls_device_down is
called to stop the offload and tear down the TLS context. However, the
socket stays alive, and it still points to the TLS context, which is now
deallocated. If a netdev goes up, while the connection is still active,
and the data flow resumes after a number of TCP retransmissions, it will
lead to a use-after-free of the TLS context.
This commit addresses this bug by keeping the context alive until its
normal destruction, and implements the necessary fallbacks, so that the
connection can resume in software (non-offloaded) kTLS mode.
On the TX side tls_sw_fallback is used to encrypt all packets. The RX
side already has all the necessary fallbacks, because receiving
non-decrypted packets is supported. The thing needed on the RX side is
to block resync requests, which are normally produced after receiving
non-decrypted packets.
The necessary synchronization is implemented for a graceful teardown:
first the fallbacks are deployed, then the driver resources are released
(it used to be possible to have a tls_dev_resync after tls_dev_del).
A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
mode. It's used to skip the RX resync logic completely, as it becomes
useless, and some objects may be released (for example, resync_async,
which is allocated and freed by the driver).
Fixes:
e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxim Mikityanskiy [Tue, 1 Jun 2021 12:07:59 +0000 (15:07 +0300)]
net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
RCU synchronization is guaranteed to finish in finite time, unlike a
busy loop that polls a flag. This patch is a preparation for the bugfix
in the next patch, where the same synchronize_net() call will also be
used to sync with the TX datapath.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiapeng Chong [Tue, 1 Jun 2021 11:04:51 +0000 (19:04 +0800)]
ethernet: myri10ge: Fix missing error code in myri10ge_probe()
The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'status'.
Eliminate the follow smatch warning:
drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3818 myri10ge_probe()
warn: missing error code 'status'.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 1 Jun 2021 22:24:11 +0000 (15:24 -0700)]
Merge branch 'virtio_net-build_skb-fixes'
Xuan Zhuo says:
====================
virtio-net: fix for build_skb()
The logic of this piece is really messy. Fortunately, my refactored patch can be
completed with a small amount of testing.
====================
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xuan Zhuo [Tue, 1 Jun 2021 06:40:00 +0000 (14:40 +0800)]
virtio_net: get build_skb() buf by data ptr
In the case of merge, the page passed into page_to_skb() may be a head
page, not the page where the current data is located. So when trying to
get the buf where the data is located, we should get buf based on
headroom instead of offset.
This patch solves this problem. But if you don't use this patch, the
original code can also run, because if the page is not the page of the
current data, the calculated tailroom will be less than 0, and will not
enter the logic of build_skb() . The significance of this patch is to
modify this logical problem, allowing more situations to use
build_skb().
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xuan Zhuo [Tue, 1 Jun 2021 06:39:59 +0000 (14:39 +0800)]
virtio-net: fix for unable to handle page fault for address
In merge mode, when xdp is enabled, if the headroom of buf is smaller
than virtnet_get_headroom(), xdp_linearize_page() will be called but the
variable of "headroom" is still 0, which leads to wrong logic after
entering page_to_skb().
[ 16.600944] BUG: unable to handle page fault for address:
ffffecbfff7b43c8[ 16.602175] #PF: supervisor read access in kernel mode
[ 16.603350] #PF: error_code(0x0000) - not-present page
[ 16.604200] PGD 0 P4D 0
[ 16.604686] Oops: 0000 [#1] SMP PTI
[ 16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G B 5.12.0+ #312
[ 16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04
[ 16.608217] RIP: 0010:unmap_page_range+0x947/0xde0
[ 16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[ 16.611863] RSP: 0018:
ffffc90002503c58 EFLAGS:
00010286
[ 16.612720] RAX:
ffffecbfff7b43c0 RBX:
00007f19f7203000 RCX:
ffffffff812ff359
[ 16.613853] RDX:
ffff888107778000 RSI:
0000000000000000 RDI:
0000000000000005
[ 16.614976] RBP:
ffffea000425e000 R08:
0000000000000000 R09:
3030303030303030
[ 16.616124] R10:
ffffffff82ed7d94 R11:
6637303030302052 R12:
7c00000afffded0f
[ 16.617276] R13:
0000000000000001 R14:
ffff888119ee7010 R15:
00007f19f7202000
[ 16.618423] FS:
0000000000000000(0000) GS:
ffff88842fd00000(0000) knlGS:
0000000000000000
[ 16.619738] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 16.620670] CR2:
ffffecbfff7b43c8 CR3:
0000000103220005 CR4:
0000000000370ee0
[ 16.621792] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 16.622920] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 16.624047] Call Trace:
[ 16.624525] ? release_pages+0x24d/0x730
[ 16.625209] unmap_single_vma+0xa9/0x130
[ 16.625885] unmap_vmas+0x76/0xf0
[ 16.626480] exit_mmap+0xa0/0x210
[ 16.627129] mmput+0x67/0x180
[ 16.627673] do_exit+0x3d1/0xf10
[ 16.628259] ? do_user_addr_fault+0x231/0x840
[ 16.629000] do_group_exit+0x53/0xd0
[ 16.629631] __x64_sys_exit_group+0x1d/0x20
[ 16.630354] do_syscall_64+0x3c/0x80
[ 16.630988] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 16.631828] RIP: 0033:0x7f1a043d0191
[ 16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167.
[ 16.633502] RSP: 002b:
00007ffe3d993308 EFLAGS:
00000246 ORIG_RAX:
00000000000000e7
[ 16.634737] RAX:
ffffffffffffffda RBX:
00007f1a044c9490 RCX:
00007f1a043d0191
[ 16.635857] RDX:
000000000000003c RSI:
00000000000000e7 RDI:
0000000000000000
[ 16.636986] RBP:
0000000000000000 R08:
ffffffffffffff88 R09:
0000000000000001
[ 16.638120] R10:
0000000000000008 R11:
0000000000000246 R12:
00007f1a044c9490
[ 16.639245] R13:
0000000000000001 R14:
00007f1a044c9968 R15:
0000000000000000
[ 16.640408] Modules linked in:
[ 16.640958] CR2:
ffffecbfff7b43c8
[ 16.641557] ---[ end trace
bc4891c6ce46354c ]---
[ 16.642335] RIP: 0010:unmap_page_range+0x947/0xde0
[ 16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[ 16.645983] RSP: 0018:
ffffc90002503c58 EFLAGS:
00010286
[ 16.646845] RAX:
ffffecbfff7b43c0 RBX:
00007f19f7203000 RCX:
ffffffff812ff359
[ 16.647970] RDX:
ffff888107778000 RSI:
0000000000000000 RDI:
0000000000000005
[ 16.649091] RBP:
ffffea000425e000 R08:
0000000000000000 R09:
3030303030303030
[ 16.650250] R10:
ffffffff82ed7d94 R11:
6637303030302052 R12:
7c00000afffded0f
[ 16.651394] R13:
0000000000000001 R14:
ffff888119ee7010 R15:
00007f19f7202000
[ 16.652529] FS:
0000000000000000(0000) GS:
ffff88842fd00000(0000) knlGS:
0000000000000000
[ 16.653887] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 16.654841] CR2:
ffffecbfff7b43c8 CR3:
0000000103220005 CR4:
0000000000370ee0
[ 16.655992] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 16.657150] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 16.658290] Kernel panic - not syncing: Fatal exception
[ 16.659613] Kernel Offset: disabled
[ 16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---
Fixes:
fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Mon, 31 May 2021 21:00:30 +0000 (17:00 -0400)]
net: sock: fix in-kernel mark setting
This patch fixes the in-kernel mark setting by doing an additional
sk_dst_reset() which was introduced by commit
50254256f382 ("sock: Reset
dst when changing sk_mark via setsockopt"). The code is now shared to
avoid any further suprises when changing the socket mark value.
Fixes:
84d1c617402e ("net: sock: add sock_set_mark")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 31 May 2021 10:20:45 +0000 (13:20 +0300)]
net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs
When using sub-VLANs in the range of 1-7, the resulting value from:
rx_vid = dsa_8021q_rx_vid_subvlan(ds, port, subvlan);
is wrong according to the description from tag_8021q.c:
| 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+-----------+-----+-----------------+-----------+-----------------------+
| DIR | SVL | SWITCH_ID | SUBVLAN | PORT |
+-----------+-----+-----------------+-----------+-----------------------+
For example, when ds->index == 0, port == 3 and subvlan == 1,
dsa_8021q_rx_vid_subvlan() returns 1027, same as it returns for
subvlan == 0, but it should have returned 1043.
This is because the low portion of the subvlan bits are not masked
properly when writing into the 12-bit VLAN value. They are masked into
bits 4:3, but they should be masked into bits 5:4.
Fixes:
3eaae1d05f2b ("net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Krzysztof Kozlowski [Mon, 31 May 2021 07:21:38 +0000 (09:21 +0200)]
nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
It's possible to trigger NULL pointer dereference by local unprivileged
user, when calling getsockname() after failed bind() (e.g. the bind
fails because LLCP_SAP_MAX used as SAP):
BUG: kernel NULL pointer dereference, address:
0000000000000000
CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-
20210521+ #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014
Call Trace:
llcp_sock_getname+0xb1/0xe0
__sys_getpeername+0x95/0xc0
? lockdep_hardirqs_on_prepare+0xd5/0x180
? syscall_enter_from_user_mode+0x1c/0x40
__x64_sys_getpeername+0x11/0x20
do_syscall_64+0x36/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xae
This can be reproduced with Syzkaller C repro (bind followed by
getpeername):
https://syzkaller.appspot.com/x/repro.c?x=
14def446e00000
Cc: <stable@vger.kernel.org>
Fixes:
d646960f7986 ("NFC: Initial LLCP support")
Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lin Ma [Sun, 30 May 2021 13:37:43 +0000 (21:37 +0800)]
Bluetooth: use correct lock to prevent UAF of hdev object
The hci_sock_dev_event() function will cleanup the hdev object for
sockets even if this object may still be in used within the
hci_sock_bound_ioctl() function, result in UAF vulnerability.
This patch replace the BH context lock to serialize these affairs
and prevent the race condition.
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Sriranjani P [Fri, 28 May 2021 07:10:56 +0000 (12:40 +0530)]
net: stmmac: fix kernel panic due to NULL pointer dereference of mdio_bus_data
Fixed link does not need mdio bus and in that case mdio_bus_data will
not be allocated. Before using mdio_bus_data we should check for NULL.
This patch fix the kernel panic due to NULL pointer dereference of
mdio_bus_data when it is not allocated.
Without this patch we do see following kernel crash caused due to kernel
NULL pointer dereference.
Call trace:
stmmac_dvr_probe+0x3c/0x10b0
dwc_eth_dwmac_probe+0x224/0x378
platform_probe+0x68/0xe0
really_probe+0x130/0x3d8
driver_probe_device+0x68/0xd0
device_driver_attach+0x74/0x80
__driver_attach+0x58/0xf8
bus_for_each_dev+0x7c/0xd8
driver_attach+0x24/0x30
bus_add_driver+0x148/0x1f0
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
dwc_eth_dwmac_driver_init+0x1c/0x28
do_one_initcall+0x78/0x158
kernel_init_freeable+0x1f0/0x244
kernel_init+0x14/0x118
ret_from_fork+0x10/0x30
Code:
f9002bfb 9113e2d9 910e6273 aa0003f7 (
f9405c78)
---[ end trace
32d9d41562ddc081 ]---
Fixes:
e5e5b771f684 ("net: stmmac: make in-band AN mode parsing is supported for non-DT")
Signed-off-by: Sriranjani P <sriranjani.p@samsung.com>
Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
Link: https://lore.kernel.org/r/20210528071056.35252-1-sriranjani.p@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Fri, 28 May 2021 12:03:04 +0000 (14:03 +0200)]
mt76: mt7921: remove leftover 80+80 HE capability
Fixes interop issues with some APs that disable HE Tx if this is present
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210528120304.34751-1-nbd@nbd.name