Taehee Yoo [Thu, 19 May 2022 03:15:54 +0000 (03:15 +0000)]
amt: fix gateway mode stuck
If a gateway can not receive any response to requests from a relay,
gateway resets status from SENT_REQUEST to INIT and variable about a
relay as well. And then it should start the full establish step
from sending a discovery message and receiving advertisement message.
But, after failure in amt_req_work() it continues sending a request
message step with flushed(invalid) relay information and sets SENT_REQUEST.
So, a gateway can't be established with a relay.
In order to avoid this situation, it stops sending the request message
step if it fails.
Fixes:
cbc21dc1cfe9 ("amt: add data plane of amt interface")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 19 May 2022 00:43:05 +0000 (17:43 -0700)]
net: stmmac: fix out-of-bounds access in a selftest
GCC 12 points out that struct tc_action is smaller than
struct tcf_action:
drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c: In function ‘stmmac_test_rxp’:
drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c:1132:21: warning: array subscript ‘struct tcf_gact[0]’ is partly outside array bounds of ‘unsigned char[272]’ [-Warray-bounds]
1132 | gact->tcf_action = TC_ACT_SHOT;
| ^~
Fixes:
ccfc639a94f2 ("net: stmmac: selftests: Add a selftest for Flexible RX Parser")
Link: https://lore.kernel.org/r/20220519004305.2109708-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Thu, 19 May 2022 02:01:48 +0000 (10:01 +0800)]
bonding: fix missed rcu protection
When removing the rcu_read_lock in bond_ethtool_get_ts_info() as
discussed [1], I didn't notice it could be called via setsockopt,
which doesn't hold rcu lock, as syzbot pointed:
stack backtrace:
CPU: 0 PID: 3599 Comm: syz-executor317 Not tainted 5.18.0-rc5-syzkaller-01392-g01f4685797a5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
bond_option_active_slave_get_rcu include/net/bonding.h:353 [inline]
bond_ethtool_get_ts_info+0x32c/0x3a0 drivers/net/bonding/bond_main.c:5595
__ethtool_get_ts_info+0x173/0x240 net/ethtool/common.c:554
ethtool_get_phc_vclocks+0x99/0x110 net/ethtool/common.c:568
sock_timestamping_bind_phc net/core/sock.c:869 [inline]
sock_set_timestamping+0x3a3/0x7e0 net/core/sock.c:916
sock_setsockopt+0x543/0x2ec0 net/core/sock.c:1221
__sys_setsockopt+0x55e/0x6a0 net/socket.c:2223
__do_sys_setsockopt net/socket.c:2238 [inline]
__se_sys_setsockopt net/socket.c:2235 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2235
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8902c8eb39
Fix it by adding rcu_read_lock and take a ref on the real_dev.
Since dev_hold() and dev_put() can take NULL these days, we can
skip checking if real_dev exist.
[1] https://lore.kernel.org/netdev/27565.
1642742439@famine/
Reported-by: syzbot+92beb3d46aab498710fa@syzkaller.appspotmail.com
Fixes:
aa6034678e87 ("bonding: use rcu_dereference_rtnl when get bonding active slave")
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220519020148.1058344-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Duoming Zhou [Wed, 18 May 2022 11:57:33 +0000 (19:57 +0800)]
NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx
There are sleep in atomic context bugs when the request to secure
element of st21nfca is timeout. The root cause is that kzalloc and
alloc_skb with GFP_KERNEL parameter and mutex_lock are called in
st21nfca_se_wt_timeout which is a timer handler. The call tree shows
the execution paths that could lead to bugs:
(Interrupt context)
st21nfca_se_wt_timeout
nfc_hci_send_event
nfc_hci_hcp_message_tx
kzalloc(..., GFP_KERNEL) //may sleep
alloc_skb(..., GFP_KERNEL) //may sleep
mutex_lock() //may sleep
This patch moves the operations that may sleep into a work item.
The work item will run in another kernel thread which is in
process context to execute the bottom half of the interrupt.
So it could prevent atomic context from sleeping.
Fixes:
2130fb97fecf ("NFC: st21nfca: Adding support for secure element")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220518115733.62111-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Wed, 18 May 2022 16:09:15 +0000 (12:09 -0400)]
Documentation: add description for net.core.gro_normal_batch
Describe it in admin-guide/sysctl/net.rst like other Network core options.
Users need to know gro_normal_batch for performance tuning.
Fixes:
323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs")
Reported-by: Prijesh Patel <prpatel@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/acf8a2c03b91bcde11f67ff89b6050089c0712a3.1652888963.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Harini Katakam [Wed, 18 May 2022 17:07:56 +0000 (22:37 +0530)]
net: macb: Fix PTP one step sync support
PTP one step sync packets cannot have CSUM padding and insertion in
SW since time stamp is inserted on the fly by HW.
In addition, ptp4l version 3.0 and above report an error when skb
timestamps are reported for packets that not processed for TX TS
after transmission.
Add a helper to identify PTP one step sync and fix the above two
errors. Add a common mask for PTP header flag field "twoStepflag".
Also reset ptp OSS bit when one step is not selected.
Fixes:
ab91f0a9b5f4 ("net: macb: Add hardware PTP support")
Fixes:
653e92a9175e ("net: macb: add support for padding and fcs computation")
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220518170756.7752-1-harini.katakam@xilinx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 19 May 2022 15:50:29 +0000 (05:50 -1000)]
Merge tag 'net-5.18-rc8' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, xfrm and netfilter subtrees.
Notably this reverts a recent TCP/DCCP netns-related change to address
a possible UaF.
Current release - regressions:
- tcp: revert "tcp/dccp: get rid of inet_twsk_purge()"
- xfrm: set dst dev to blackhole_netdev instead of loopback_dev in
ifdown
Previous releases - regressions:
- netfilter: flowtable: fix TCP flow teardown
- can: revert "can: m_can: pci: use custom bit timings for Elkhart
Lake"
- xfrm: check encryption module availability consistency
- eth: vmxnet3: fix possible use-after-free bugs in
vmxnet3_rq_alloc_rx_buf()
- eth: mlx5: initialize flow steering during driver probe
- eth: ice: fix crash when writing timestamp on RX rings
Previous releases - always broken:
- mptcp: fix checksum byte order
- eth: lan966x: fix assignment of the MAC address
- eth: mlx5: remove HW-GRO from reported features
- eth: ftgmac100: disable hardware checksum on AST2600"
* tag 'net-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.
ptp: ocp: change sysfs attr group handling
selftests: forwarding: fix missing backslash
netfilter: nf_tables: disable expression reduction infra
netfilter: flowtable: move dst_check to packet path
netfilter: flowtable: fix TCP flow teardown
net: ftgmac100: Disable hardware checksum on AST2600
igb: skip phy status check where unavailable
nfc: pn533: Fix buggy cleanup order
mptcp: Do TCP fallback on early DSS checksum failure
mptcp: fix checksum byte order
net: af_key: check encryption module availability consistency
net: af_key: add check for pfkey_broadcast in function pfkey_process
net/mlx5: Drain fw_reset when removing device
net/mlx5e: CT: Fix setting flow_source for smfs ct tuples
net/mlx5e: CT: Fix support for GRE tuples
net/mlx5e: Remove HW-GRO from reported features
net/mlx5e: Properly block HW GRO when XDP is enabled
net/mlx5e: Properly block LRO when XDP is enabled
net/mlx5e: Block rx-gro-hw feature in switchdev mode
...
Andrew Lunn [Wed, 18 May 2022 00:58:40 +0000 (02:58 +0200)]
net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.
It is possible to stack bridges on top of each other. Consider the
following which makes use of an Ethernet switch:
br1
/ \
/ \
/ \
br0.11 wlan0
|
br0
/ | \
p1 p2 p3
br0 is offloaded to the switch. Above br0 is a vlan interface, for
vlan 11. This vlan interface is then a slave of br1. br1 also has a
wireless interface as a slave. This setup trunks wireless lan traffic
over the copper network inside a VLAN.
A frame received on p1 which is passed up to the bridge has the
skb->offload_fwd_mark flag set to true, indicating that the switch has
dealt with forwarding the frame out ports p2 and p3 as needed. This
flag instructs the software bridge it does not need to pass the frame
back down again. However, the flag is not getting reset when the frame
is passed upwards. As a result br1 sees the flag, wrongly interprets
it, and fails to forward the frame to wlan0.
When passing a frame upwards, clear the flag. This is the Rx
equivalent of br_switchdev_frame_unmark() in br_dev_xmit().
Fixes:
f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20220518005840.771575-1-andrew@lunn.ch
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jonathan Lemon [Tue, 17 May 2022 21:46:00 +0000 (14:46 -0700)]
ptp: ocp: change sysfs attr group handling
In the detach path, the driver calls sysfs_remove_group() for the
groups it believes has been registered. However, if the group was
never previously registered, then this causes a splat.
Instead, compute the groups that should be registered in advance,
and then call sysfs_create_groups(), which registers them all at once.
Update the error handling appropriately.
Fixes:
c205d53c4923 ("ptp: ocp: Add firmware capability bits for feature gating")
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220517214600.10606-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joachim Wiberg [Wed, 18 May 2022 15:16:30 +0000 (17:16 +0200)]
selftests: forwarding: fix missing backslash
Fix missing backslash, introduced in
f62c5acc800ee. Causes all tests to
not be installed.
Fixes:
f62c5acc800e ("selftests/net/forwarding: add missing tests to Makefile")
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20220518151630.2747773-1-troglobit@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 19 May 2022 02:34:25 +0000 (19:34 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Reduce number of hardware offload retries from flowtable datapath
which might hog system with retries, from Felix Fietkau.
2) Skip neighbour lookup for PPPoE device, fill_forward_path() already
provides this and set on destination address from fill_forward_path for
PPPoE device, also from Felix.
4) When combining PPPoE on top of a VLAN device, set info->outdev to the
PPPoE device so software offload works, from Felix.
5) Fix TCP teardown flowtable state, races with conntrack gc might result
in resetting the state to ESTABLISHED and the time to one day. Joint
work with Oz Shlomo and Sven Auhagen.
6) Call dst_check() from flowtable datapath to check if dst is stale
instead of doing it from garbage collector path.
7) Disable register tracking infrastructure, either user-space or
kernel need to pre-fetch keys inconditionally, otherwise register
tracking assumes data is already available in register that might
not well be there, leading to incorrect reductions.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: disable expression reduction infra
netfilter: flowtable: move dst_check to packet path
netfilter: flowtable: fix TCP flow teardown
netfilter: nft_flow_offload: fix offload with pppoe + vlan
net: fix dev_fill_forward_path with pppoe + bridge
netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices
netfilter: flowtable: fix excessive hw offload attempts after failure
====================
Link: https://lore.kernel.org/r/20220518213841.359653-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 19 May 2022 00:32:27 +0000 (14:32 -1000)]
Merge tag 'block-5.18-2022-05-18' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Just a small fix for a missing fifo time assigment for the head
insertion case in mq-deadline"
* tag 'block-5.18-2022-05-18' of git://git.kernel.dk/linux-block:
block/mq-deadline: Set the fifo_time member also if inserting at head
Linus Torvalds [Thu, 19 May 2022 00:21:30 +0000 (14:21 -1000)]
Merge tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two small changes fixing issues from the 5.18 merge window:
- Fix wrong ordering of a tracepoint (Dylan)
- Fix MSG_RING on IOPOLL rings (me)"
* tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block:
io_uring: don't attempt to IOPOLL for MSG_RING requests
io_uring: fix ordering of args in io_uring_queue_async_work
Linus Torvalds [Thu, 19 May 2022 00:19:26 +0000 (14:19 -1000)]
Merge tag 'audit-pr-
20220518' of git://git./linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A single audit patch to fix a problem where a task's audit_context was
not being properly reset with io_uring"
* tag 'audit-pr-
20220518' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts
Linus Torvalds [Thu, 19 May 2022 00:15:35 +0000 (14:15 -1000)]
Merge tag 'selinux-pr-
20220518' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A single SELinux patch to fix an error path that was doing the wrong
thing with respect to freeing memory"
* tag 'selinux-pr-
20220518' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix bad cleanup on error in hashtab_duplicate()
Linus Torvalds [Thu, 19 May 2022 00:07:43 +0000 (14:07 -1000)]
Merge branch 'arm/fixes' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"The SoC bug fixes have calmed down sufficiently, there is one minor
update for the MAINTAINERS file, and few bug fixes for dts
descriptions:
- Updates to the BananaPi R2-Pro (rk3568) dts to match production
hardware rather than the prototype version.
- Qualcomm sm8250 soundwire gets disabled on some machines to avoid
crashes
- A number of aspeed SoC specific fixes, addressing incorrect pin
cotrol settings, some values in the romed8hm board, and a revert
for an accidental removal of a DT node"
* 'arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: omap: remove me as a maintainer
ARM: dts: aspeed: Add video engine to g6
ARM: dts: aspeed: romed8hm3: Fix GPIOB0 name
ARM: dts: aspeed: romed8hm3: Add lm25066 sense resistor values
ARM: dts: aspeed-g6: fix SPI1/SPI2 quad pin group
ARM: dts: aspeed-g6: add FWQSPI group in pinctrl dtsi
dt-bindings: pinctrl: aspeed-g6: add FWQSPI function/group
pinctrl: pinctrl-aspeed-g6: add FWQSPI function-group
dt-bindings: pinctrl: aspeed-g6: remove FWQSPID group
pinctrl: pinctrl-aspeed-g6: remove FWQSPID group in pinctrl
ARM: dts: aspeed-g6: remove FWQSPID group in pinctrl dtsi
arm64: dts: qcom: sm8250: don't enable rx/tx macro by default
arm64: dts: rockchip: Add gmac1 and change network settings of bpi-r2-pro
arm64: dts: rockchip: Change io-domains of bpi-r2-pro
Linus Torvalds [Thu, 19 May 2022 00:02:25 +0000 (14:02 -1000)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs
Pull misc fixes from Al Viro:
"vhost race fix and a percpu_ref_init-caused cgroup double-free fix.
The latter had manifested as buggered struct mount refcounting - those
are also using percpu data structures, but anything that does percpu
allocations could be hit"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Fix double fget() in vhost_net_set_backend()
percpu_ref_init(): clean ->percpu_count_ref on failure
Linus Torvalds [Wed, 18 May 2022 23:53:53 +0000 (13:53 -1000)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull mlx5 fix from Michael Tsirkin:
"One last minute fixup
The patch has been on list for a while but as it was posted as part of
a thread it was missed"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Use consistent RQT size
Al Viro [Mon, 16 May 2022 08:42:13 +0000 (16:42 +0800)]
Fix double fget() in vhost_net_set_backend()
Descriptor table is a shared resource; two fget() on the same descriptor
may return different struct file references. get_tap_ptr_ring() is
called after we'd found (and pinned) the socket we'll be using and it
tries to find the private tun/tap data structures associated with it.
Redoing the lookup by the same file descriptor we'd used to get the
socket is racy - we need to same struct file.
Thanks to Jason for spotting a braino in the original variant of patch -
I'd missed the use of fd == -1 for disabling backend, and in that case
we can end up with sock == NULL and sock != oldsock.
Cc: stable@kernel.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Eli Cohen [Mon, 16 May 2022 08:47:35 +0000 (11:47 +0300)]
vdpa/mlx5: Use consistent RQT size
The current code evaluates RQT size based on the configured number of
virtqueues. This can raise an issue in the following scenario:
Assume MQ was negotiated.
1. mlx5_vdpa_set_map() gets called.
2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower
than the configured max VQs.
3. A second set_map gets called, but now a smaller number of VQs is used
to evaluate the size of the RQT.
4. handle_ctrl_mq() is called with a value larger than what the RQT can
hold. This will emit errors and the driver state is compromised.
To fix this, we use a new field in struct mlx5_vdpa_net to hold the
required number of entries in the RQT. This value is evaluated in
mlx5_vdpa_set_driver_features() where we have the negotiated features
all set up.
In addition to that, we take into consideration the max capability of RQT
entries early when the device is added so we don't need to take consider
it when creating the RQT.
Last, we remove the use of mlx5_vdpa_max_qps() which just returns the
max_vas / 2 and make the code clearer.
Fixes:
52893733f2c5 ("vdpa/mlx5: Add multiqueue support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Wed, 18 May 2022 15:53:43 +0000 (05:53 -1000)]
Merge tag 'sound-5.18' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of last-minute HD- an USB-audio quirks in addition to a
fix for the legacy ISA wavefront driver.
All look small and easy"
* tag 'sound-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Restore Rane SL-1 quirk
ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machine
ALSA: hda/realtek: Add quirk for TongFang devices with pop noise
ALSA: hda/realtek: Add quirk for the Framework Laptop
ALSA: wavefront: Proper check of get_user() error
ALSA: hda/realtek: Add quirk for Dell Latitude 7520
ALSA: hda - fix unused Realtek function when PM is not enabled
ALSA: usb-audio: Don't get sample rate for MCT Trigger 5 USB-to-HDMI
Pablo Neira Ayuso [Wed, 18 May 2022 12:51:34 +0000 (14:51 +0200)]
netfilter: nf_tables: disable expression reduction infra
Either userspace or kernelspace need to pre-fetch keys inconditionally
before comparisons for this to work. Otherwise, register tracking data
is misleading and it might result in reducing expressions which are not
yet registers.
First expression is also guaranteed to be evaluated always, however,
certain expressions break before writing data to registers, before
comparing the data, leaving the register in undetermined state.
This patch disables this infrastructure by now.
Fixes:
b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ritaro Takenaka [Tue, 17 May 2022 10:55:30 +0000 (12:55 +0200)]
netfilter: flowtable: move dst_check to packet path
Fixes sporadic IPv6 packet loss when flow offloading is enabled.
IPv6 route GC and flowtable GC are not synchronized.
When dst_cache becomes stale and a packet passes through the flow before
the flowtable GC teardowns it, the packet can be dropped.
So, it is necessary to check dst every time in packet path.
Fixes:
227e1e4d0d6c ("netfilter: nf_flowtable: skip device lookup from interface index")
Signed-off-by: Ritaro Takenaka <ritarot634@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Tue, 17 May 2022 08:44:14 +0000 (10:44 +0200)]
netfilter: flowtable: fix TCP flow teardown
This patch addresses three possible problems:
1. ct gc may race to undo the timeout adjustment of the packet path, leaving
the conntrack entry in place with the internal offload timeout (one day).
2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
timeout is reached before the flow offload del.
3. tcp ct is always set to ESTABLISHED with a very long timeout
in flow offload teardown/delete even though the state might be already
CLOSED. Also as a remark we cannot assume that the FIN or RST packet
is hitting flow table teardown as the packet might get bumped to the
slow path in nftables.
This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
state transition.
Moreover, teturn the connection's ownership to conntrack upon teardown
by clearing the offload flag and fixing the established timeout value.
The flow table GC thread will asynchonrnously free the flow table and
hardware offload entries.
Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows on
which is also misleading since the flow is back to classic conntrack
path.
If nf_ct_delete() removes the entry from the conntrack table, then it
calls nf_ct_put() which decrements the refcnt. This is not a problem
because the flowtable holds a reference to the conntrack object from
flow_offload_alloc() path which is released via flow_offload_free().
This patch also updates nft_flow_offload to skip packets in SYN_RECV
state. Since we might miss or bump packets to slow path, we do not know
what will happen there while we are still in SYN_RECV, this patch
postpones offload up to the next packet which also aligns to the
existing behaviour in tc-ct.
flow_offload_teardown() does not reset the existing tcp state from
flow_offload_fixup_tcp() to ESTABLISHED anymore, packets bump to slow
path might have already update the state to CLOSE/FIN.
Joint work with Oz and Sven.
Fixes:
1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Joel Stanley [Tue, 17 May 2022 09:22:17 +0000 (18:52 +0930)]
net: ftgmac100: Disable hardware checksum on AST2600
The AST2600 when using the i210 NIC over NC-SI has been observed to
produce incorrect checksum results with specific MTU values. This was
first observed when sending data across a long distance set of networks.
On a local network, the following test was performed using a 1MB file of
random data.
On the receiver run this script:
#!/bin/bash
while [ 1 ]; do
# Zero the stats
nstat -r > /dev/null
nc -l 9899 > test-file
# Check for checksum errors
TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
if [ -z "$TcpInCsumErrors" ]; then
echo No TcpInCsumErrors
else
echo TcpInCsumErrors = $TcpInCsumErrors
fi
done
On an AST2600 system:
# nc <IP of receiver host> 9899 < test-file
The test was repeated with various MTU values:
# ip link set mtu 1410 dev eth0
The observed results:
1500 - good
1434 - bad
1400 - good
1410 - bad
1420 - good
The test was repeated after disabling tx checksumming:
# ethtool -K eth0 tx-checksumming off
And all MTU values tested resulted in transfers without error.
An issue with the driver cannot be ruled out, however there has been no
bug discovered so far.
David has done the work to take the original bug report of slow data
transfer between long distance connections and triaged it down to this
test case.
The vendor suspects this this is a hardware issue when using NC-SI. The
fixes line refers to the patch that introduced AST2600 support.
Reported-by: David Wilder <wilder@us.ibm.com>
Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Mitchell [Tue, 17 May 2022 18:01:05 +0000 (11:01 -0700)]
igb: skip phy status check where unavailable
igb_read_phy_reg() will silently return, leaving phy_data untouched, if
hw->ops.read_reg isn't set. Depending on the uninitialized value of
phy_data, this led to the phy status check either succeeding immediately
or looping continuously for 2 seconds before emitting a noisy err-level
timeout. This message went out to the console even though there was no
actual problem.
Instead, first check if there is read_reg function pointer. If not,
proceed without trying to check the phy status register.
Fixes:
b72f3f72005d ("igb: When GbE link up, wait for Remote receiver status condition")
Signed-off-by: Kevin Mitchell <kevmitch@arista.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lin Ma [Wed, 18 May 2022 10:53:21 +0000 (18:53 +0800)]
nfc: pn533: Fix buggy cleanup order
When removing the pn533 device (i2c or USB), there is a logic error. The
original code first cancels the worker (flush_delayed_work) and then
destroys the workqueue (destroy_workqueue), leaving the timer the last
one to be deleted (del_timer). This result in a possible race condition
in a multi-core preempt-able kernel. That is, if the cleanup
(pn53x_common_clean) is concurrently run with the timer handler
(pn533_listen_mode_timer), the timer can queue the poll_work to the
already destroyed workqueue, causing use-after-free.
This patch reorder the cleanup: it uses the del_timer_sync to make sure
the handler is finished before the routine will destroy the workqueue.
Note that the timer cannot be activated by the worker again.
static void pn533_wq_poll(struct work_struct *work)
...
rc = pn533_send_poll_frame(dev);
if (rc)
return;
if (cur_mod->len == 0 && dev->poll_mod_count > 1)
mod_timer(&dev->listen_timer, ...);
That is, the mod_timer can be called only when pn533_send_poll_frame()
returns no error, which is impossible because the device is detaching
and the lower driver should return ENODEV code.
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 May 2022 12:05:43 +0000 (13:05 +0100)]
Merge branch 'mptcp-checksums'
Mat Martineau says:
====================
mptcp: Fix checksum byte order on little-endian
These patches address a bug in the byte ordering of MPTCP checksums on
little-endian architectures. The __sum16 type is always big endian, but
was being cast to u16 and then byte-swapped (on little-endian archs)
when reading/writing the checksum field in MPTCP option headers.
MPTCP checksums are off by default, but are enabled if one or both peers
request it in the SYN/SYNACK handshake.
The corrected code is verified to interoperate between big-endian and
little-endian machines.
Patch 1 fixes the checksum byte order, patch 2 partially mitigates
interoperation with peers sending bad checksums by falling back to TCP
instead of resetting the connection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau [Tue, 17 May 2022 18:02:12 +0000 (11:02 -0700)]
mptcp: Do TCP fallback on early DSS checksum failure
RFC 8684 section 3.7 describes several opportunities for a MPTCP
connection to "fall back" to regular TCP early in the connection
process, before it has been confirmed that MPTCP options can be
successfully propagated on all SYN, SYN/ACK, and data packets. If a peer
acknowledges the first received data packet with a regular TCP header
(no MPTCP options), fallback is allowed.
If the recipient of that first data packet finds a MPTCP DSS checksum
error, this provides an opportunity to fail gracefully with a TCP
fallback rather than resetting the connection (as might happen if a
checksum failure were detected later).
This commit modifies the checksum failure code to attempt fallback on
the initial subflow of a MPTCP connection, only if it's a failure in the
first data mapping. In cases where the peer initiates the connection,
requests checksums, is the first to send data, and the peer is sending
incorrect checksums (see
https://github.com/multipath-tcp/mptcp_net-next/issues/275), this allows
the connection to proceed as TCP rather than reset.
Fixes:
dd8bcd1768ff ("mptcp: validate the data checksum")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 17 May 2022 18:02:11 +0000 (11:02 -0700)]
mptcp: fix checksum byte order
The MPTCP code typecasts the checksum value to u16 and
then converts it to big endian while storing the value into
the MPTCP option.
As a result, the wire encoding for little endian host is
wrong, and that causes interoperabilty interoperability
issues with other implementation or host with different endianness.
Address the issue writing in the packet the unmodified __sum16 value.
MPTCP checksum is disabled by default, interoperating with systems
with bad mptcp-level csum encoding should cause fallback to TCP.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/275
Fixes:
c5b39e26d003 ("mptcp: send out checksum for DSS")
Fixes:
390b95a5fb84 ("mptcp: receive checksum for DSS")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 May 2022 11:59:36 +0000 (12:59 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-05-17
This series contains updates to ice driver only.
Arkadiusz prevents writing of timestamps when rings are being
configured to resolve null pointer dereference.
Paul changes a delayed call to baseline statistics to occur immediately
which was causing misreporting of statistics due to the delay.
Michal fixes incorrect restoration of interrupt moderation settings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 May 2022 11:47:36 +0000 (12:47 +0100)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2022-05-18
1) Fix "disable_policy" flag use when arriving from different devices.
From Eyal Birger.
2) Fix error handling of pfkey_broadcast in function pfkey_process.
From Jiasheng Jiang.
3) Check the encryption module availability consistency in pfkey.
From Thomas Bartschies.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 May 2022 10:33:44 +0000 (11:33 +0100)]
Merge tag 'mlx5-fixes-2022-05-17' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2022-05-17
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Bartschies [Wed, 18 May 2022 06:32:18 +0000 (08:32 +0200)]
net: af_key: check encryption module availability consistency
Since the recent introduction supporting the SM3 and SM4 hash algos for IPsec, the kernel
produces invalid pfkey acquire messages, when these encryption modules are disabled. This
happens because the availability of the algos wasn't checked in all necessary functions.
This patch adds these checks.
Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Jiasheng Jiang [Tue, 17 May 2022 09:42:31 +0000 (17:42 +0800)]
net: af_key: add check for pfkey_broadcast in function pfkey_process
If skb_clone() returns null pointer, pfkey_broadcast() will
return error.
Therefore, it should be better to check the return value of
pfkey_broadcast() and return error if fails.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Al Viro [Wed, 18 May 2022 06:13:40 +0000 (02:13 -0400)]
percpu_ref_init(): clean ->percpu_count_ref on failure
That way percpu_ref_exit() is safe after failing percpu_ref_init().
At least one user (cgroup_create()) had a double-free that way;
there might be other similar bugs. Easier to fix in percpu_ref_init(),
rather than playing whack-a-mole in sloppy users...
Usual symptoms look like a messed refcounting in one of subsystems
that use percpu allocations (might be percpu-refcount, might be
something else). Having refcounts for two different objects share
memory is Not Nice(tm)...
Reported-by: syzbot+5b1e53987f858500ec00@syzkaller.appspotmail.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Shay Drory [Mon, 4 Apr 2022 07:47:36 +0000 (10:47 +0300)]
net/mlx5: Drain fw_reset when removing device
In case fw sync reset is called in parallel to device removal, device
might stuck in the following deadlock:
CPU 0 CPU 1
----- -----
remove_one
uninit_one (locks intf_state_mutex)
mlx5_sync_reset_now_event()
work in fw_reset->wq.
mlx5_enter_error_state()
mutex_lock (intf_state_mutex)
cleanup_once
fw_reset_cleanup()
destroy_workqueue(fw_reset->wq)
Drain the fw_reset WQ, and make sure no new work is being queued, before
entering uninit_one().
The Drain is done before devlink_unregister() since fw_reset, in some
flows, is using devlink API devlink_remote_reload_actions_performed().
Fixes:
38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Paul Blakey [Thu, 7 Apr 2022 10:37:32 +0000 (13:37 +0300)]
net/mlx5e: CT: Fix setting flow_source for smfs ct tuples
Cited patch sets flow_source to ANY overriding the provided spec
flow_source, avoiding the optimization done by commit
c9c079b4deaa
("net/mlx5: CT: Set flow source hint from provided tuple device").
To fix the above, set the dr_rule flow_source from provided flow spec.
Fixes:
3ee61ebb0df1 ("net/mlx5: CT: Add software steering ct flow steering provider")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Paul Blakey [Sun, 20 Mar 2022 12:02:24 +0000 (14:02 +0200)]
net/mlx5e: CT: Fix support for GRE tuples
cited commit removed support for GRE tuples when software steering was enabled.
To bring back support for GRE tuples, add GRE ipv4/ipv6 matchers.
Fixes:
3ee61ebb0df1 ("net/mlx5: CT: Add software steering ct flow steering provider")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Gal Pressman [Wed, 13 Apr 2022 12:50:42 +0000 (15:50 +0300)]
net/mlx5e: Remove HW-GRO from reported features
We got reports of certain HW-GRO flows causing kernel call traces, which
might be related to firmware. To be on the safe side, disable the
feature for now and re-enable it once a driver/firmware fix is found.
Fixes:
83439f3c37aa ("net/mlx5e: Add HW-GRO offload")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maxim Mikityanskiy [Tue, 12 Apr 2022 15:54:26 +0000 (18:54 +0300)]
net/mlx5e: Properly block HW GRO when XDP is enabled
HW GRO is incompatible and mutually exclusive with XDP and XSK. However,
the needed checks are only made when enabling XDP. If HW GRO is enabled
when XDP is already active, the command will succeed, and XDP will be
skipped in the data path, although still enabled.
This commit fixes the bug by checking the XDP and XSK status in
mlx5e_fix_features and disabling HW GRO if XDP is enabled.
Fixes:
83439f3c37aa ("net/mlx5e: Add HW-GRO offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maxim Mikityanskiy [Tue, 12 Apr 2022 15:37:03 +0000 (18:37 +0300)]
net/mlx5e: Properly block LRO when XDP is enabled
LRO is incompatible and mutually exclusive with XDP. However, the needed
checks are only made when enabling XDP. If LRO is enabled when XDP is
already active, the command will succeed, and XDP will be skipped in the
data path, although still enabled.
This commit fixes the bug by checking the XDP status in
mlx5e_fix_features and disabling LRO if XDP is enabled.
Fixes:
86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Mon, 11 Apr 2022 14:29:08 +0000 (17:29 +0300)]
net/mlx5e: Block rx-gro-hw feature in switchdev mode
When the driver is in switchdev mode and rx-gro-hw is set, the RQ needs
special CQE handling. Till then, block setting of rx-gro-hw feature in
switchdev mode, to avoid failure while setting the feature due to
failure while opening the RQ.
Fixes:
f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maxim Mikityanskiy [Mon, 4 Apr 2022 18:51:15 +0000 (21:51 +0300)]
net/mlx5e: Wrap mlx5e_trap_napi_poll into rcu_read_lock
The body of mlx5e_napi_poll is wrapped into rcu_read_lock to be able to
read the XDP program pointer using rcu_dereference. However, the trap RQ
NAPI doesn't use rcu_read_lock, because the trap RQ works only in the
non-linear mode, and mlx5e_skb_from_cqe_nonlinear, until recently,
didn't support XDP and didn't call rcu_dereference.
Starting from the cited commit, mlx5e_skb_from_cqe_nonlinear supports
XDP and calls rcu_dereference, but mlx5e_trap_napi_poll doesn't wrap it
into rcu_read_lock. It leads to RCU-lockdep warnings like this:
WARNING: suspicious RCU usage
This commit fixes the issue by adding an rcu_read_lock to
mlx5e_trap_napi_poll, similarly to mlx5e_napi_poll.
Fixes:
ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Yevgeny Kliteynik [Sun, 3 Apr 2022 20:18:10 +0000 (23:18 +0300)]
net/mlx5: DR, Ignore modify TTL on RX if device doesn't support it
When modifying TTL, packet's csum has to be recalculated.
Due to HW issue in ConnectX-5, csum recalculation for modify
TTL on RX is supported through a work-around that is specifically
enabled by configuration.
If the work-around isn't enabled, rather than adding an unsupported
action the modify TTL action on RX should be ignored.
Ignoring modify TTL action might result in zero actions, so in such
cases we will not convert the match STE to modify STE, as it is done
by FW in DMFS.
This patch fixes an issue where modify TTL action was ignored both
on RX and TX instead of only on RX.
Fixes:
4ff725e1d4ad ("net/mlx5: DR, Ignore modify TTL if device doesn't support it")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Shay Drory [Wed, 9 Mar 2022 12:45:58 +0000 (14:45 +0200)]
net/mlx5: Initialize flow steering during driver probe
Currently, software objects of flow steering are created and destroyed
during reload flow. In case a device is unloaded, the following error
is printed during grace period:
mlx5_core 0000:00:0b.0: mlx5_fw_fatal_reporter_err_work:690:(pid 95):
Driver is in error state. Unloading
As a solution to fix use-after-free bugs, where we try to access
these objects, when reading the value of flow_steering_mode devlink
param[1], let's split flow steering creation and destruction into two
routines:
* init and cleanup: memory, cache, and pools allocation/free.
* create and destroy: namespaces initialization and cleanup.
While at it, re-order the cleanup function to mirror the init function.
[1]
Kasan trace:
[ 385.119849 ] BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] Read of size 4 at addr
ffff888104b79308 by task bash/291
[ 385.119849 ]
[ 385.119849 ] CPU: 1 PID: 291 Comm: bash Not tainted 5.17.0-rc1+ #2
[ 385.119849 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[ 385.119849 ] Call Trace:
[ 385.119849 ] <TASK>
[ 385.119849 ] dump_stack_lvl+0x6e/0x91
[ 385.119849 ] print_address_description.constprop.0+0x1f/0x160
[ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] kasan_report.cold+0x83/0xdf
[ 385.119849 ] ? devlink_param_notify+0x20/0x190
[ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] devlink_nl_param_fill+0x18a/0xa50
[ 385.119849 ] ? _raw_spin_lock_irqsave+0x8d/0xe0
[ 385.119849 ] ? devlink_flash_update_timeout_notify+0xf0/0xf0
[ 385.119849 ] ? __wake_up_common+0x4b/0x1e0
[ 385.119849 ] ? preempt_count_sub+0x14/0xc0
[ 385.119849 ] ? _raw_spin_unlock_irqrestore+0x28/0x40
[ 385.119849 ] ? __wake_up_common_lock+0xe3/0x140
[ 385.119849 ] ? __wake_up_common+0x1e0/0x1e0
[ 385.119849 ] ? __sanitizer_cov_trace_const_cmp8+0x27/0x80
[ 385.119849 ] ? __rcu_read_unlock+0x48/0x70
[ 385.119849 ] ? kasan_unpoison+0x23/0x50
[ 385.119849 ] ? __kasan_slab_alloc+0x2c/0x80
[ 385.119849 ] ? memset+0x20/0x40
[ 385.119849 ] ? __sanitizer_cov_trace_const_cmp4+0x25/0x80
[ 385.119849 ] devlink_param_notify+0xce/0x190
[ 385.119849 ] devlink_unregister+0x92/0x2b0
[ 385.119849 ] remove_one+0x41/0x140
[ 385.119849 ] pci_device_remove+0x68/0x140
[ 385.119849 ] ? pcibios_free_irq+0x10/0x10
[ 385.119849 ] __device_release_driver+0x294/0x3f0
[ 385.119849 ] device_driver_detach+0x82/0x130
[ 385.119849 ] unbind_store+0x193/0x1b0
[ 385.119849 ] ? subsys_interface_unregister+0x270/0x270
[ 385.119849 ] drv_attr_store+0x4e/0x70
[ 385.119849 ] ? drv_attr_show+0x60/0x60
[ 385.119849 ] sysfs_kf_write+0xa7/0xc0
[ 385.119849 ] kernfs_fop_write_iter+0x23a/0x2f0
[ 385.119849 ] ? sysfs_kf_bin_read+0x160/0x160
[ 385.119849 ] new_sync_write+0x311/0x430
[ 385.119849 ] ? new_sync_read+0x480/0x480
[ 385.119849 ] ? _raw_spin_lock+0x87/0xe0
[ 385.119849 ] ? __sanitizer_cov_trace_cmp4+0x25/0x80
[ 385.119849 ] ? security_file_permission+0x94/0xa0
[ 385.119849 ] vfs_write+0x4c7/0x590
[ 385.119849 ] ksys_write+0xf6/0x1e0
[ 385.119849 ] ? __x64_sys_read+0x50/0x50
[ 385.119849 ] ? fpregs_assert_state_consistent+0x99/0xa0
[ 385.119849 ] do_syscall_64+0x3d/0x90
[ 385.119849 ] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 385.119849 ] RIP: 0033:0x7fc36ef38504
[ 385.119849 ] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f
80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 385.119849 ] RSP: 002b:
00007ffde0ff3d08 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
[ 385.119849 ] RAX:
ffffffffffffffda RBX:
000000000000000c RCX:
00007fc36ef38504
[ 385.119849 ] RDX:
000000000000000c RSI:
00007fc370521040 RDI:
0000000000000001
[ 385.119849 ] RBP:
00007fc370521040 R08:
00007fc36f00b8c0 R09:
00007fc36ee4b740
[ 385.119849 ] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fc36f00a760
[ 385.119849 ] R13:
000000000000000c R14:
00007fc36f005760 R15:
000000000000000c
[ 385.119849 ] </TASK>
[ 385.119849 ]
[ 385.119849 ] Allocated by task 65:
[ 385.119849 ] kasan_save_stack+0x1e/0x40
[ 385.119849 ] __kasan_kmalloc+0x81/0xa0
[ 385.119849 ] mlx5_init_fs+0x11b/0x1160
[ 385.119849 ] mlx5_load+0x13c/0x220
[ 385.119849 ] mlx5_load_one+0xda/0x160
[ 385.119849 ] mlx5_recover_device+0xb8/0x100
[ 385.119849 ] mlx5_health_try_recover+0x2f9/0x3a1
[ 385.119849 ] devlink_health_reporter_recover+0x75/0x100
[ 385.119849 ] devlink_health_report+0x26c/0x4b0
[ 385.275909 ] mlx5_fw_fatal_reporter_err_work+0x11e/0x1b0
[ 385.275909 ] process_one_work+0x520/0x970
[ 385.275909 ] worker_thread+0x378/0x950
[ 385.275909 ] kthread+0x1bb/0x200
[ 385.275909 ] ret_from_fork+0x1f/0x30
[ 385.275909 ]
[ 385.275909 ] Freed by task 65:
[ 385.275909 ] kasan_save_stack+0x1e/0x40
[ 385.275909 ] kasan_set_track+0x21/0x30
[ 385.275909 ] kasan_set_free_info+0x20/0x30
[ 385.275909 ] __kasan_slab_free+0xfc/0x140
[ 385.275909 ] kfree+0xa5/0x3b0
[ 385.275909 ] mlx5_unload+0x2e/0xb0
[ 385.275909 ] mlx5_unload_one+0x86/0xb0
[ 385.275909 ] mlx5_fw_fatal_reporter_err_work.cold+0xca/0xcf
[ 385.275909 ] process_one_work+0x520/0x970
[ 385.275909 ] worker_thread+0x378/0x950
[ 385.275909 ] kthread+0x1bb/0x200
[ 385.275909 ] ret_from_fork+0x1f/0x30
[ 385.275909 ]
[ 385.275909 ] The buggy address belongs to the object at
ffff888104b79300
[ 385.275909 ] which belongs to the cache kmalloc-128 of size 128
[ 385.275909 ] The buggy address is located 8 bytes inside of
[ 385.275909 ] 128-byte region [
ffff888104b79300,
ffff888104b79380)
[ 385.275909 ] The buggy address belongs to the page:
[ 385.275909 ] page:
00000000de44dd39 refcount:1 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x104b78
[ 385.275909 ] head:
00000000de44dd39 order:1 compound_mapcount:0
[ 385.275909 ] flags: 0x8000000000010200(slab|head|zone=2)
[ 385.275909 ] raw:
8000000000010200 0000000000000000 dead000000000122 ffff8881000428c0
[ 385.275909 ] raw:
0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 385.275909 ] page dumped because: kasan: bad access detected
[ 385.275909 ]
[ 385.275909 ] Memory state around the buggy address:
[ 385.275909 ]
ffff888104b79200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
[ 385.275909 ]
ffff888104b79280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 385.275909 ] >
ffff888104b79300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 385.275909 ] ^
[ 385.275909 ]
ffff888104b79380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 385.275909 ]
ffff888104b79400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 385.275909 ]]
Fixes:
e890acd5ff18 ("net/mlx5: Add devlink flow_steering_mode parameter")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Dickman [Mon, 21 Mar 2022 08:07:44 +0000 (10:07 +0200)]
net/mlx5: DR, Fix missing flow_source when creating multi-destination FW table
In order to support multiple destination FTEs with SW steering
FW table is created with single FTE with multiple actions and
SW steering rule forward to it. When creating this table, flow
source isn't set according to the original FTE.
Fix this by passing the original FTE flow source to the created
FW table.
Fixes:
34583beea4b7 ("net/mlx5: DR, Create multi-destination table for SW-steering use")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Duoming Zhou [Tue, 17 May 2022 01:25:30 +0000 (09:25 +0800)]
NFC: nci: fix sleep in atomic context bugs caused by nci_skb_alloc
There are sleep in atomic context bugs when the request to secure
element of st-nci is timeout. The root cause is that nci_skb_alloc
with GFP_KERNEL parameter is called in st_nci_se_wt_timeout which is
a timer handler. The call paths that could trigger bugs are shown below:
(interrupt context 1)
st_nci_se_wt_timeout
nci_hci_send_event
nci_hci_send_data
nci_skb_alloc(..., GFP_KERNEL) //may sleep
(interrupt context 2)
st_nci_se_wt_timeout
nci_hci_send_event
nci_hci_send_data
nci_send_data
nci_queue_tx_data_frags
nci_skb_alloc(..., GFP_KERNEL) //may sleep
This patch changes allocation mode of nci_skb_alloc from GFP_KERNEL to
GFP_ATOMIC in order to prevent atomic context sleeping. The GFP_ATOMIC
flag makes memory allocation operation could be used in atomic context.
Fixes:
ed06aeefdac3 ("nfc: st-nci: Rename st21nfcb to st-nci")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220517012530.75714-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christophe JAILLET [Sun, 15 May 2022 18:07:02 +0000 (20:07 +0200)]
net/qla3xxx: Fix a test in ql_reset_work()
test_bit() tests if one bit is set or not.
Here the logic seems to check of bit QL_RESET_PER_SCSI (i.e. 4) OR bit
QL_RESET_START (i.e. 3) is set.
In fact, it checks if bit 7 (4 | 3 = 7) is set, that is to say
QL_ADAPTER_UP.
This looks harmless, because this bit is likely be set, and when the
ql_reset_work() delayed work is scheduled in ql3xxx_isr() (the only place
that schedule this work), QL_RESET_START or QL_RESET_PER_SCSI is set.
This has been spotted by smatch.
Fixes:
5a4faa873782 ("[PATCH] qla3xxx NIC driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/80e73e33f390001d9c0140ffa9baddf6466a41a2.1652637337.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 17 May 2022 23:46:22 +0000 (13:46 -1000)]
Merge tag 'pci-v5.18-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Avoid putting Elo i2 PCIe Ports in D3cold because downstream devices
are inaccessible after going back to D0 (Rafael J. Wysocki)
- Qualcomm SM8250 has a ddrss_sf_tbu clock but SC8180X does not; make a
SC8180X-specific config without the clock so it probes correctly
(Bjorn Andersson)
- Revert aardvark chained IRQ handler rewrite because it broke
interrupt affinity (Pali Rohár)
* tag 'pci-v5.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler"
PCI: qcom: Remove ddrss_sf_tbu clock from SC8180X
PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold
Linus Torvalds [Tue, 17 May 2022 23:40:44 +0000 (13:40 -1000)]
Merge tag 'thermal-5.18-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix up a recent change in the int340x thermal driver that
inadvertently broke thermal zone handling on some systems
(Srinivas Pandruvada)"
* tag 'thermal-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: Mode setting with new OS handshake
Ondrej Mosnacek [Tue, 17 May 2022 12:08:16 +0000 (14:08 +0200)]
selinux: fix bad cleanup on error in hashtab_duplicate()
The code attempts to free the 'new' pointer using kmem_cache_free(),
which is wrong because this function isn't responsible of freeing it.
Instead, the function should free new->htable and clear the contents of
*new (to prevent double-free).
Cc: stable@vger.kernel.org
Fixes:
c7c556f1e81b ("selinux: refactor changing booleans")
Reported-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Julian Orth [Tue, 17 May 2022 10:32:53 +0000 (12:32 +0200)]
audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts
Not calling the function for dummy contexts will cause the context to
not be reset. During the next syscall, this will cause an error in
__audit_syscall_entry:
WARN_ON(context->context != AUDIT_CTX_UNUSED);
WARN_ON(context->name_count);
if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
audit_panic("unrecoverable error in audit_syscall_entry()");
return;
}
These problematic dummy contexts are created via the following call
chain:
exit_to_user_mode_prepare
-> arch_do_signal_or_restart
-> get_signal
-> task_work_run
-> tctx_task_work
-> io_req_task_submit
-> io_issue_sqe
-> audit_uring_entry
Cc: stable@vger.kernel.org
Fixes:
5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Julian Orth <ju.orth@gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Jens Axboe [Tue, 17 May 2022 18:32:05 +0000 (12:32 -0600)]
io_uring: don't attempt to IOPOLL for MSG_RING requests
We gate whether to IOPOLL for a request on whether the opcode is allowed
on a ring setup for IOPOLL and if it's got a file assigned. MSG_RING
is the only one that allows a file yet isn't pollable, it's merely
supported to allow communication on an IOPOLL ring, not because we can
poll for completion of it.
Put the assigned file early and clear it, so we don't attempt to poll
for it.
Reported-by: syzbot+1a0a53300ce782f8b3ad@syzkaller.appspotmail.com
Fixes:
3f1d52abf098 ("io_uring: defer msg-ring file validity check until command issue")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Michal Wilczynski [Sun, 8 May 2022 23:33:48 +0000 (19:33 -0400)]
ice: Fix interrupt moderation settings getting cleared
Adaptive-rx and Adaptive-tx are interrupt moderation settings
that can be enabled/disabled using ethtool:
ethtool -C ethX adaptive-rx on/off adaptive-tx on/off
Unfortunately those settings are getting cleared after
changing number of queues, or in ethtool world 'channels':
ethtool -L ethX rx 1 tx 1
Clearing was happening due to introduction of bit fields
in ice_ring_container struct. This way only itr_setting
bits were rebuilt during ice_vsi_rebuild_set_coalesce().
Introduce an anonymous struct of bitfields and create a
union to refer to them as a single variable.
This way variable can be easily saved and restored.
Fixes:
61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild")
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Paul Greenwalt [Thu, 28 Apr 2022 21:11:42 +0000 (14:11 -0700)]
ice: fix possible under reporting of ethtool Tx and Rx statistics
The hardware statistics counters are not cleared during resets so the
drivers first access is to initialize the baseline and then subsequent
reads are for reporting the counters. The statistics counters are read
during the watchdog subtask when the interface is up. If the baseline
is not initialized before the interface is up, then there can be a brief
window in which some traffic can be transmitted/received before the
initial baseline reading takes place.
Directly initialize ethtool statistics in driver open so the baseline will
be initialized when the interface is up, and any dropped packets
incremented before the interface is up won't be reported.
Fixes:
28dc1b86f8ea9 ("ice: ignore dropped packets during init")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Arkadiusz Kubalewski [Thu, 28 Apr 2022 08:33:50 +0000 (10:33 +0200)]
ice: fix crash when writing timestamp on RX rings
Do not allow to write timestamps on RX rings if PF is being configured.
When PF is being configured RX rings can be freed or rebuilt. If at the
same time timestamps are updated, the kernel will crash by dereferencing
null RX ring pointer.
PID: 1449 TASK:
ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51"
#0 [
ff1966a94a713bb0] machine_kexec at
ffffffff9d05a0be
#1 [
ff1966a94a713c08] __crash_kexec at
ffffffff9d192e9d
#2 [
ff1966a94a713cd0] crash_kexec at
ffffffff9d1941bd
#3 [
ff1966a94a713ce8] oops_end at
ffffffff9d01bd54
#4 [
ff1966a94a713d08] no_context at
ffffffff9d06bda4
#5 [
ff1966a94a713d60] __bad_area_nosemaphore at
ffffffff9d06c10c
#6 [
ff1966a94a713da8] do_page_fault at
ffffffff9d06cae4
#7 [
ff1966a94a713de0] page_fault at
ffffffff9da0107e
[exception RIP: ice_ptp_update_cached_phctime+91]
RIP:
ffffffffc076db8b RSP:
ff1966a94a713e98 RFLAGS:
00010246
RAX:
16e3db9c6b7ccae4 RBX:
ff187d269dd3c180 RCX:
ff187d269cd4d018
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
ff187d269cfcc644 R8:
ff187d339b9641b0 R9:
0000000000000000
R10:
0000000000000002 R11:
0000000000000000 R12:
ff187d269cfcc648
R13:
ffffffff9f128784 R14:
ffffffff9d101b70 R15:
ff187d269cfcc640
ORIG_RAX:
ffffffffffffffff CS: 0010 SS: 0018
#8 [
ff1966a94a713ea0] ice_ptp_periodic_work at
ffffffffc076dbef [ice]
#9 [
ff1966a94a713ee0] kthread_worker_fn at
ffffffff9d101c1b
#10 [
ff1966a94a713f10] kthread at
ffffffff9d101b4d
#11 [
ff1966a94a713f50] ret_from_fork at
ffffffff9da0023f
Fixes:
77a781155a65 ("ice: enable receive hardware timestamping")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Dave Cain <dcain@redhat.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Zixuan Fu [Sat, 14 May 2022 05:07:11 +0000 (13:07 +0800)]
net: vmxnet3: fix possible NULL pointer dereference in vmxnet3_rq_cleanup()
In vmxnet3_rq_create(), when dma_alloc_coherent() fails,
vmxnet3_rq_destroy() is called. It sets rq->rx_ring[i].base to NULL. Then
vmxnet3_rq_create() returns an error to its callers mxnet3_rq_create_all()
-> vmxnet3_change_mtu(). Then vmxnet3_change_mtu() calls
vmxnet3_force_close() -> dev_close() in error handling code. And the driver
calls vmxnet3_close() -> vmxnet3_quiesce_dev() -> vmxnet3_rq_cleanup_all()
-> vmxnet3_rq_cleanup(). In vmxnet3_rq_cleanup(),
rq->rx_ring[ring_idx].base is accessed, but this variable is NULL, causing
a NULL pointer dereference.
To fix this possible bug, an if statement is added to check whether
rq->rx_ring[0].base is NULL in vmxnet3_rq_cleanup() and exit early if so.
The error log in our fault-injection testing is shown as follows:
[ 65.220135] BUG: kernel NULL pointer dereference, address:
0000000000000008
...
[ 65.222633] RIP: 0010:vmxnet3_rq_cleanup_all+0x396/0x4e0 [vmxnet3]
...
[ 65.227977] Call Trace:
...
[ 65.228262] vmxnet3_quiesce_dev+0x80f/0x8a0 [vmxnet3]
[ 65.228580] vmxnet3_close+0x2c4/0x3f0 [vmxnet3]
[ 65.228866] __dev_close_many+0x288/0x350
[ 65.229607] dev_close_many+0xa4/0x480
[ 65.231124] dev_close+0x138/0x230
[ 65.231933] vmxnet3_force_close+0x1f0/0x240 [vmxnet3]
[ 65.232248] vmxnet3_change_mtu+0x75d/0x920 [vmxnet3]
...
Fixes:
d1a890fa37f27 ("net: VMware virtual Ethernet NIC driver: vmxnet3")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Zixuan Fu <r33s3n6@gmail.com>
Link: https://lore.kernel.org/r/20220514050711.2636709-1-r33s3n6@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Zixuan Fu [Sat, 14 May 2022 05:06:56 +0000 (13:06 +0800)]
net: vmxnet3: fix possible use-after-free bugs in vmxnet3_rq_alloc_rx_buf()
In vmxnet3_rq_alloc_rx_buf(), when dma_map_single() fails, rbi->skb is
freed immediately. Similarly, in another branch, when dma_map_page() fails,
rbi->page is also freed. In the two cases, vmxnet3_rq_alloc_rx_buf()
returns an error to its callers vmxnet3_rq_init() -> vmxnet3_rq_init_all()
-> vmxnet3_activate_dev(). Then vmxnet3_activate_dev() calls
vmxnet3_rq_cleanup_all() in error handling code, and rbi->skb or rbi->page
are freed again in vmxnet3_rq_cleanup_all(), causing use-after-free bugs.
To fix these possible bugs, rbi->skb and rbi->page should be cleared after
they are freed.
The error log in our fault-injection testing is shown as follows:
[ 14.319016] BUG: KASAN: use-after-free in consume_skb+0x2f/0x150
...
[ 14.321586] Call Trace:
...
[ 14.325357] consume_skb+0x2f/0x150
[ 14.325671] vmxnet3_rq_cleanup_all+0x33a/0x4e0 [vmxnet3]
[ 14.326150] vmxnet3_activate_dev+0xb9d/0x2ca0 [vmxnet3]
[ 14.326616] vmxnet3_open+0x387/0x470 [vmxnet3]
...
[ 14.361675] Allocated by task 351:
...
[ 14.362688] __netdev_alloc_skb+0x1b3/0x6f0
[ 14.362960] vmxnet3_rq_alloc_rx_buf+0x1b0/0x8d0 [vmxnet3]
[ 14.363317] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3]
[ 14.363661] vmxnet3_open+0x387/0x470 [vmxnet3]
...
[ 14.367309]
[ 14.367412] Freed by task 351:
...
[ 14.368932] __dev_kfree_skb_any+0xd2/0xe0
[ 14.369193] vmxnet3_rq_alloc_rx_buf+0x71e/0x8d0 [vmxnet3]
[ 14.369544] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3]
[ 14.369883] vmxnet3_open+0x387/0x470 [vmxnet3]
[ 14.370174] __dev_open+0x28a/0x420
[ 14.370399] __dev_change_flags+0x192/0x590
[ 14.370667] dev_change_flags+0x7a/0x180
[ 14.370919] do_setlink+0xb28/0x3570
[ 14.371150] rtnl_newlink+0x1160/0x1740
[ 14.371399] rtnetlink_rcv_msg+0x5bf/0xa50
[ 14.371661] netlink_rcv_skb+0x1cd/0x3e0
[ 14.371913] netlink_unicast+0x5dc/0x840
[ 14.372169] netlink_sendmsg+0x856/0xc40
[ 14.372420] ____sys_sendmsg+0x8a7/0x8d0
[ 14.372673] __sys_sendmsg+0x1c2/0x270
[ 14.372914] do_syscall_64+0x41/0x90
[ 14.373145] entry_SYSCALL_64_after_hwframe+0x44/0xae
...
Fixes:
5738a09d58d5a ("vmxnet3: fix checks for dma mapping errors")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Zixuan Fu <r33s3n6@gmail.com>
Link: https://lore.kernel.org/r/20220514050656.2636588-1-r33s3n6@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Xin Long [Mon, 16 May 2022 01:37:27 +0000 (21:37 -0400)]
xfrm: set dst dev to blackhole_netdev instead of loopback_dev in ifdown
The global blackhole_netdev has replaced pernet loopback_dev to become the
one given to the object that holds an netdev when ifdown in many places of
ipv4 and ipv6 since commit
8d7017fd621d ("blackhole_netdev: use
blackhole_netdev to invalidate dst entries").
Especially after commit
faab39f63c1f ("net: allow out-of-order netdev
unregistration"), it's no longer safe to use loopback_dev that may be
freed before other netdev.
This patch is to set dst dev to blackhole_netdev instead of loopback_dev
in ifdown.
v1->v2:
- add Fixes tag as Eric suggested.
Fixes:
faab39f63c1f ("net: allow out-of-order netdev unregistration")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/e8c87482998ca6fcdab214f5a9d582899ec0c648.1652665047.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Christophe JAILLET [Sun, 15 May 2022 17:01:56 +0000 (19:01 +0200)]
net: systemport: Fix an error handling path in bcm_sysport_probe()
if devm_clk_get_optional() fails, we still need to go through the error
handling path.
Add the missing goto.
Fixes:
6328a126896ea ("net: systemport: Manage Wake-on-LAN clock")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/99d70634a81c229885ae9e4ee69b2035749f7edc.1652634040.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Horatiu Vultur [Fri, 13 May 2022 18:00:30 +0000 (20:00 +0200)]
net: lan966x: Fix assignment of the MAC address
The following two scenarios were failing for lan966x.
1. If the port had the address X and then trying to assign the same
address, then the HW was just removing this address because first it
tries to learn new address and then delete the old one. As they are
the same the HW remove it.
2. If the port eth0 was assigned the same address as one of the other
ports eth1 then when assigning back the address to eth0 then the HW
was deleting the address of eth1.
The case 1. is fixed by checking if the port has already the same
address while case 2. is fixed by checking if the address is used by any
other port.
Fixes:
e18aba8941b40b ("net: lan966x: add mactable support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20220513180030.3076793-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pali Rohár [Sun, 15 May 2022 12:58:15 +0000 (14:58 +0200)]
Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler"
This reverts commit
1571d67dc190e50c6c56e8f88cdc39f7cc53166e.
This commit broke support for setting interrupt affinity. It looks like
that it is related to the chained IRQ handler. Revert this commit until
issue with setting interrupt affinity is fixed.
Fixes:
1571d67dc190 ("PCI: aardvark: Rewrite IRQ code to chained IRQ handler")
Link: https://lore.kernel.org/r/20220515125815.30157-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Jonathan Lemon [Fri, 13 May 2022 22:52:31 +0000 (15:52 -0700)]
ptp: ocp: have adjtime handle negative delta_ns correctly
delta_ns is a s64, but it was being passed ptp_ocp_adjtime_coarse
as an u64. Also, it turns out that timespec64_add_ns() only handles
positive values, so perform the math with set_normalized_timespec().
Fixes:
90f8f4c0e3ce ("ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments")
Suggested-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/20220513225231.1412-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Felix Fietkau [Mon, 9 May 2022 12:26:16 +0000 (14:26 +0200)]
netfilter: nft_flow_offload: fix offload with pppoe + vlan
When running a combination of PPPoE on top of a VLAN, we need to set
info->outdev to the PPPoE device, otherwise PPPoE encap is skipped
during software offload.
Fixes:
72efd585f714 ("netfilter: flowtable: add pppoe support")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Felix Fietkau [Mon, 9 May 2022 12:26:15 +0000 (14:26 +0200)]
net: fix dev_fill_forward_path with pppoe + bridge
When calling dev_fill_forward_path on a pppoe device, the provided destination
address is invalid. In order for the bridge fdb lookup to succeed, the pppoe
code needs to update ctx->daddr to the correct value.
Fix this by storing the address inside struct net_device_path_ctx
Fixes:
f6efc675c9dd ("net: ppp: resolve forwarding path for bridge pppoe devices")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Felix Fietkau [Mon, 9 May 2022 12:26:14 +0000 (14:26 +0200)]
netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices
The dst entry does not contain a valid hardware address, so skip the lookup
in order to avoid running into errors here.
The proper hardware address is filled in from nft_dev_path_info
Fixes:
72efd585f714 ("netfilter: flowtable: add pppoe support")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Felix Fietkau [Mon, 9 May 2022 12:26:13 +0000 (14:26 +0200)]
netfilter: flowtable: fix excessive hw offload attempts after failure
If a flow cannot be offloaded, the code currently repeatedly tries again as
quickly as possible, which can significantly increase system load.
Fix this by limiting flow timeout update and hardware offload retry to once
per second.
Fixes:
c07531c01d82 ("netfilter: flowtable: Remove redundant hw refresh bit")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Paolo Abeni [Fri, 13 May 2022 09:27:06 +0000 (11:27 +0200)]
net/sched: act_pedit: sanitize shift argument before usage
syzbot was able to trigger an Out-of-Bound on the pedit action:
UBSAN: shift-out-of-bounds in net/sched/act_pedit.c:238:43
shift exponent
1400735974 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 3606 Comm: syz-executor151 Not tainted 5.18.0-rc5-syzkaller-00165-g810c2f0a3f86 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
ubsan_epilogue+0xb/0x50 lib/ubsan.c:151
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x187 lib/ubsan.c:322
tcf_pedit_init.cold+0x1a/0x1f net/sched/act_pedit.c:238
tcf_action_init_1+0x414/0x690 net/sched/act_api.c:1367
tcf_action_init+0x530/0x8d0 net/sched/act_api.c:1432
tcf_action_add+0xf9/0x480 net/sched/act_api.c:1956
tc_ctl_action+0x346/0x470 net/sched/act_api.c:2015
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5993
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e2/0x800 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fe36e9e1b59
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007ffef796fe88 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007fe36e9e1b59
RDX:
0000000000000000 RSI:
0000000020000300 RDI:
0000000000000003
RBP:
00007fe36e9a5d00 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00007fe36e9a5d90
R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
</TASK>
The 'shift' field is not validated, and any value above 31 will
trigger out-of-bounds. The issue predates the git history, but
syzbot was able to trigger it only after the commit mentioned in
the fixes tag, and this change only applies on top of such commit.
Address the issue bounding the 'shift' value to the maximum allowed
by the relevant operator.
Reported-and-tested-by: syzbot+8ed8fc4c57e9dcf23ca6@syzkaller.appspotmail.com
Fixes:
8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takashi Iwai [Mon, 16 May 2022 10:31:12 +0000 (12:31 +0200)]
ALSA: usb-audio: Restore Rane SL-1 quirk
At cleaning up and moving the device rename from the quirk table to
its own table, we removed the entry for Rane SL-1 as we thought it's
only for renaming. It turned out, however, that the quirk is required
for matching with the device that declares itself as no standard
audio but only as vendor-specific.
Restore the quirk entry for Rane SL-1 to fix the regression.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215887
Fixes:
5436f59bc5bc ("ALSA: usb-audio: Move device rename and profile quirks to an internal table")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220516103112.12950-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
David S. Miller [Mon, 16 May 2022 08:39:37 +0000 (09:39 +0100)]
Merge tag 'linux-can-fixes-for-5.18-
20220514' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-05-14
this is a pull request of 2 patches for net/master.
Changes to linux-can-fixes-for-5.18-
20220513:
- adjusted Fixes: Tag on "Revert "can: m_can: pci: use custom bit timings for Elkhart Lake""
(Thanks Jakub)
Both patches are by Jarkko Nikula, target the m_can PCI driver
bindings, and fix usage of wrong bit timing constants for the Elkhart
Lake platform.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Chi [Fri, 13 May 2022 12:16:45 +0000 (20:16 +0800)]
ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machine
The HP EliteBook 630 is using ALC236 codec which used 0x02 to control mute LED
and 0x01 to control micmute LED. Therefore, add a quirk to make it works.
Signed-off-by: Andy Chi <andy.chi@canonical.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220513121648.28584-1-andy.chi@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Eyal Birger [Fri, 13 May 2022 20:34:02 +0000 (23:34 +0300)]
xfrm: fix "disable_policy" flag use when arriving from different devices
In IPv4 setting the "disable_policy" flag on a device means no policy
should be enforced for traffic originating from the device. This was
implemented by seting the DST_NOPOLICY flag in the dst based on the
originating device.
However, dsts are cached in nexthops regardless of the originating
devices, in which case, the DST_NOPOLICY flag value may be incorrect.
Consider the following setup:
+------------------------------+
| ROUTER |
+-------------+ | +-----------------+ |
| ipsec src |----|-|ipsec0 | |
+-------------+ | |disable_policy=0 | +----+ |
| +-----------------+ |eth1|-|-----
+-------------+ | +-----------------+ +----+ |
| noipsec src |----|-|eth0 | |
+-------------+ | |disable_policy=1 | |
| +-----------------+ |
+------------------------------+
Where ROUTER has a default route towards eth1.
dst entries for traffic arriving from eth0 would have DST_NOPOLICY
and would be cached and therefore can be reused by traffic originating
from ipsec0, skipping policy check.
Fix by setting a IPSKB_NOPOLICY flag in IPCB and observing it instead
of the DST in IN/FWD IPv4 policy checks.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Linus Torvalds [Mon, 16 May 2022 01:08:58 +0000 (18:08 -0700)]
Linux 5.18-rc7
Linus Torvalds [Sun, 15 May 2022 15:08:51 +0000 (08:08 -0700)]
Merge tag 'driver-core-5.18-rc7' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here is one fix, and three documentation updates for 5.18-rc7.
The fix is for the firmware loader which resolves a long-reported
problem where the credentials of the firmware loader could be set to a
userspace process without enough permissions to actually load the
firmware image. Many Android vendors have been reporting this for
quite some time.
The documentation updates are for the embargoed-hardware-issues.rst
file to add a new entry, change an existing one, and sort the list to
make changes easier in the future.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation/process: Update ARM contact for embargoed hardware issues
Documentation/process: Add embargoed HW contact for Ampere Computing
Documentation/process: Make groups alphabetical and use tabs consistently
firmware_loader: use kernel credentials when reading firmware
Linus Torvalds [Sun, 15 May 2022 15:07:07 +0000 (08:07 -0700)]
Merge tag 'char-misc-5.18-rc7' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are two small driver fixes for 5.18-rc7 that resolve reported
problems:
- slimbus driver irq bugfix
- interconnect sync state bugfix
Both of these have been in linux-next with no reported problems"
* tag 'char-misc-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
slimbus: qcom: Fix IRQ check in qcom_slim_probe
interconnect: Restore sync state by ignoring ipa-virt in provider count
Linus Torvalds [Sun, 15 May 2022 15:05:04 +0000 (08:05 -0700)]
Merge tag 'tty-5.18-rc7' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty n_gsm and serial driver fixes for 5.18-rc7
that resolve reported problems. They include:
- n_gsm fixes for reported issues
- 8250_mtk driver fixes for some platforms
- fsl_lpuart driver fix for reported problem.
- digicolor driver fix for reported problem.
All have been in linux-next for a while with no reported problems"
* tag 'tty-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
fsl_lpuart: Don't enable interrupts too early
tty: n_gsm: fix invalid gsmtty_write_room() result
tty: n_gsm: fix mux activation issues in gsm_config()
tty: n_gsm: fix buffer over-read in gsm_dlci_data()
serial: 8250_mtk: Fix register address for XON/XOFF character
serial: 8250_mtk: Make sure to select the right FEATURE_SEL
serial: 8250_mtk: Fix UART_EFR register address
tty/serial: digicolor: fix possible null-ptr-deref in digicolor_uart_probe()
Linus Torvalds [Sun, 15 May 2022 15:03:24 +0000 (08:03 -0700)]
Merge tag 'usb-5.18-rc7' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small fixes for reported issues with some USB drivers.
They include:
- xhci fixes for xhci-mtk platform driver
- typec driver fixes for reported problems.
- cdc-wdm read-stuck fix
- gadget driver fix for reported race condition
- new usb-serial driver ids
All of these have been in linux-next with no reported problems"
* tag 'usb-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: xhci-mtk: remove bandwidth budget table
usb: xhci-mtk: fix fs isoc's transfer error
usb: gadget: fix race when gadget driver register via ioctl
usb: typec: tcpci_mt6360: Update for BMC PHY setting
usb: gadget: uvc: allow for application to cleanly shutdown
usb: typec: tcpci: Don't skip cleanup in .remove() on error
usb: cdc-wdm: fix reading stuck on device close
USB: serial: qcserial: add support for Sierra Wireless EM7590
USB: serial: option: add Fibocom MA510 modem
USB: serial: option: add Fibocom L610 modem
USB: serial: pl2303: add device id for HP LM930 Display
Linus Torvalds [Sun, 15 May 2022 13:46:03 +0000 (06:46 -0700)]
Merge tag 'powerpc-5.18-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
- Fix KVM PR on 32-bit, which was broken by some MMU code refactoring.
Thanks to: Alexander Graf, and Matt Evans.
* tag 'powerpc-5.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()
Linus Torvalds [Sun, 15 May 2022 13:42:40 +0000 (06:42 -0700)]
Merge tag 'x86-urgent-2022-05-15' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for the handling of unpopulated sub-pmd spaces.
The copy & pasta from the corresponding s390 code screwed up the
address calculation for marking the sub-pmd ranges via memset by
omitting the ALIGN_DOWN() to calculate the proper start address.
It's a mystery why this code is not generic and shared because there
is nothing architecture specific in there, but that's too intrusive
for a backportable fix"
* tag 'x86-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix marking of unused sub-pmd ranges
Linus Torvalds [Sun, 15 May 2022 13:40:11 +0000 (06:40 -0700)]
Merge tag 'sched-urgent-2022-05-15' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"The recent expansion of the sched switch tracepoint inserted a new
argument in the middle of the arguments. This reordering broke BPF
programs which relied on the old argument list.
While tracepoints are not considered stable ABI, it's not trivial to
make BPF cope with such a change, but it's being worked on. For now
restore the original argument order and move the new argument to the
end of the argument list"
* tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/tracing: Append prev_state to tp args instead
Linus Torvalds [Sun, 15 May 2022 13:37:05 +0000 (06:37 -0700)]
Merge tag 'irq-urgent-2022-05-15' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for a recent (introduced in 5.16) regression in the core
interrupt code.
The consolidation of the interrupt handler invocation code added an
unconditional warning when generic_handle_domain_irq() is invoked from
outside hard interrupt context. That's overbroad as the requirement
for invoking these handlers in hard interrupt context is only required
for certain interrupt types. The subsequently called code already
contains a warning which triggers conditionally for interrupt chips
which indicate this requirement in their properties.
Remove the overbroad one"
* tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq()
Jarkko Nikula [Thu, 12 May 2022 12:41:44 +0000 (15:41 +0300)]
can: m_can: remove support for custom bit timing, take #2
Now when Intel Elkhart Lake uses again common bit timing and there are
no other users for custom bit timing, we can bring back the changes
done by the commit
0ddd83fbebbc ("can: m_can: remove support for
custom bit timing").
This effectively reverts commit
ea768b2ffec6 ("Revert "can: m_can:
remove support for custom bit timing"") while taking into account
commit
ea22ba40debe ("can: m_can: make custom bittiming fields const")
and commit
7d4a101c0bd3 ("can: dev: add sanity check in
can_set_static_ctrlmode()").
Link: https://lore.kernel.org/all/20220512124144.536850-2-jarkko.nikula@linux.intel.com
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jarkko Nikula [Thu, 12 May 2022 12:41:43 +0000 (15:41 +0300)]
Revert "can: m_can: pci: use custom bit timings for Elkhart Lake"
This reverts commit
0e8ffdf3b86dfd44b651f91b12fcae76c25c453b.
Commit
0e8ffdf3b86d ("can: m_can: pci: use custom bit timings for
Elkhart Lake") broke the test case using bitrate switching.
| ip link set can0 up type can bitrate 500000 dbitrate 4000000 fd on
| ip link set can1 up type can bitrate 500000 dbitrate 4000000 fd on
| candump can0 &
| cangen can1 -I 0x800 -L 64 -e -fb \
| -D 11223344deadbeef55667788feedf00daabbccdd44332211 -n 1 -v -v
Above commit does everything correctly according to the datasheet.
However datasheet wasn't correct.
I got confirmation from hardware engineers that the actual CAN
hardware on Intel Elkhart Lake is based on M_CAN version v3.2.0.
Datasheet was mirroring values from an another specification which was
based on earlier M_CAN version leading to wrong bit timings.
Therefore revert the commit and switch back to common bit timings.
Fixes:
ea4c1787685d ("can: m_can: pci: use custom bit timings for Elkhart Lake")
Link: https://lore.kernel.org/all/20220512124144.536850-1-jarkko.nikula@linux.intel.com
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reported-by: Chee Hou Ong <chee.houx.ong@intel.com>
Reported-by: Aman Kumar <aman.kumar@intel.com>
Reported-by: Pallavi Kumari <kumari.pallavi@intel.com>
Cc: <stable@vger.kernel.org> # v5.16+
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Linus Torvalds [Sat, 14 May 2022 18:43:47 +0000 (11:43 -0700)]
Merge tag 'perf-tools-fixes-for-v5.18-2022-05-14' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix two NDEBUG warnings in 'perf bench numa'
- Fix ARM coresight `perf test` failure
- Sync linux/kvm.h with the kernel sources
- Add James and Mike as Arm64 performance events reviewers
* tag 'perf-tools-fixes-for-v5.18-2022-05-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
MAINTAINERS: Add James and Mike as Arm64 performance events reviewers
tools headers UAPI: Sync linux/kvm.h with the kernel sources
perf tests: Fix coresight `perf test` failure.
perf bench: Fix two numa NDEBUG warnings
Harini Katakam [Thu, 12 May 2022 17:19:00 +0000 (22:49 +0530)]
net: macb: Increment rx bd head after allocating skb and buffer
In gem_rx_refill rx_prepared_head is incremented at the beginning of
the while loop preparing the skb and data buffers. If the skb or data
buffer allocation fails, this BD will be unusable BDs until the head
loops back to the same BD (and obviously buffer allocation succeeds).
In the unlikely event that there's a string of allocation failures,
there will be an equal number of unusable BDs and an inconsistent RX
BD chain. Hence increment the head at the end of the while loop to be
clean.
Fixes:
4df95131ea80 ("net/macb: change RX path for GEM")
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220512171900.32593-1-harini.katakam@xilinx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 May 2022 00:04:33 +0000 (17:04 -0700)]
Merge branch 'mptcp-subflow-accounting-fix'
Mat Martineau says:
====================
mptcp: Subflow accounting fix
This series contains a bug fix affecting the in-kernel path manager
(patch 1), where closing subflows would sometimes not adjust the PM's
count of active subflows. Patch 2 updates the selftests to exercise the
new code.
====================
Link: https://lore.kernel.org/r/20220512232642.541301-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Thu, 12 May 2022 23:26:42 +0000 (16:26 -0700)]
selftests: mptcp: add subflow limits test-cases
Add and delete a bunch of endpoints and verify the
respect of configured limits.
This covers the codepath introduced by the previous patch.
Fixes:
69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Thu, 12 May 2022 23:26:41 +0000 (16:26 -0700)]
mptcp: fix subflow accounting on close
If the PM closes a fully established MPJ subflow or the subflow
creation errors out in it's early stage the subflows counter is
not bumped accordingly.
This change adds the missing accounting, additionally taking care
of updating accordingly the 'accept_subflow' flag.
Fixes:
a88c9e496937 ("mptcp: do not block subflows creation on errors")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 13 May 2022 23:20:25 +0000 (16:20 -0700)]
Merge tag 'drm-fixes-2022-05-14' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"Turns out I was right, some fixes hadn't made it to me yet. The vmwgfx
ones also popped up later, but all seem like bad enough things to fix.
The dma-buf, vc4 and nouveau ones are all pretty small.
The fbdev fixes are a bit more complicated: a fix to cleanup fbdev
devices properly, uncovered some use-after-free bugs in existing
drivers. Then the fix for those bugs wasn't correct. This reverts that
fix, and puts the proper fixes in place in the drivers to avoid the
use-after-frees.
This has had a fair number of eyes on it at this stage, and I'm
confident enough that it puts things in the right place, and is less
dangerous than reverting our way out of the initial change at this
stage.
fbdev:
- revert NULL deref fix that turned into a use-after-free
- prevent use-after-free in fbdev
- efifb/simplefb/vesafb: fix cleanup paths to avoid use-after-frees
dma-buf:
- fix panic in stats setup
vc4:
- fix hdmi build
nouveau:
- tegra iommu present fix
- fix leak in backlight name
vmwgfx:
- Black screen due to fences using FIFO checks on SVGA3
- Random black screens on boot due to uninitialized drm_mode_fb_cmd2
- Hangs on SVGA3 due to command buffers being used with gbobjects"
* tag 'drm-fixes-2022-05-14' of git://anongit.freedesktop.org/drm/drm:
drm/vmwgfx: Disable command buffers on svga3 without gbobjects
drm/vmwgfx: Initialize drm_mode_fb_cmd2
drm/vmwgfx: Fix fencing on SVGAv3
drm/vc4: hdmi: Fix build error for implicit function declaration
dma-buf: call dma_buf_stats_setup after dmabuf is in valid list
fbdev: efifb: Fix a use-after-free due early fb_info cleanup
drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()
drm/nouveau/tegra: Stop using iommu_present()
fbdev: vesafb: Cleanup fb_info in .fb_destroy rather than .remove
fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove
fbdev: simplefb: Cleanup fb_info in .fb_destroy rather than .remove
fbdev: Prevent possible use-after-free in fb_release()
Revert "fbdev: Make fb_release() return -ENODEV if fbdev was unregistered"
Bart Van Assche [Fri, 13 May 2022 17:13:07 +0000 (10:13 -0700)]
block/mq-deadline: Set the fifo_time member also if inserting at head
Before commit
322cff70d46c the fifo_time member of requests on a dispatch
list was not used. Commit
322cff70d46c introduces code that reads the
fifo_time member of requests on dispatch lists. Hence this patch that sets
the fifo_time member when adding a request to a dispatch list.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Fixes:
322cff70d46c ("block/mq-deadline: Prioritize high-priority requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220513171307.32564-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dave Airlie [Fri, 13 May 2022 22:34:01 +0000 (08:34 +1000)]
Merge tag 'drm-misc-fixes-2022-05-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Multiple fixes to fbdev to address a regression at unregistration, an
iommu detection improvement for nouveau, a memory leak fix for nouveau,
pointer dereference fix for dma_buf_file_release(), and a build breakage
fix for vc4
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220513073044.ymayac7x7bzatrt7@houat
Dave Airlie [Fri, 13 May 2022 22:29:41 +0000 (08:29 +1000)]
Merge tag 'vmwgfx-drm-fixes-5.18-2022-05-13' of https://gitlab.freedesktop.org/zack/vmwgfx into drm-fixes
vmwgfx fixes for:
- Black screen due to fences using FIFO checks on SVGA3
- Random black screens on boot due to uninitialized drm_mode_fb_cmd2
- Hangs on SVGA3 due to command buffers being used with gbobjects
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a1d32799e4c74b8540216376d7576bb783ca07ba.camel@vmware.com
Linus Torvalds [Fri, 13 May 2022 21:32:53 +0000 (14:32 -0700)]
Merge tag 'gfs2-v5.18-rc4-fix3' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
"We've finally identified commit
dc732906c245 ("gfs2: Introduce flag
for glock holder auto-demotion") to be the other cause of the
filesystem corruption we've been seeing. This feature isn't strictly
necessary anymore, so we've decided to stop using it for now.
With this and the gfs_iomap_end rounding fix you've already seen
("gfs2: Fix filesystem block deallocation for short writes" in this
pull request), we're corruption free again now.
- Fix filesystem block deallocation for short writes.
- Stop using glock holder auto-demotion for now.
- Get rid of buffered writes inefficiencies due to page faults being
disabled.
- Minor other cleanups"
* tag 'gfs2-v5.18-rc4-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Stop using glock holder auto-demotion for now
gfs2: buffered write prefaulting
gfs2: Align read and write chunks to the page cache
gfs2: Pull return value test out of should_fault_in_pages
gfs2: Clean up use of fault_in_iov_iter_{read,write}able
gfs2: Variable rename
gfs2: Fix filesystem block deallocation for short writes
Andreas Gruenbacher [Wed, 11 May 2022 16:27:12 +0000 (18:27 +0200)]
gfs2: Stop using glock holder auto-demotion for now
We're having unresolved issues with the glock holder auto-demotion mechanism
introduced in commit
dc732906c245. This mechanism was assumed to be essential
for avoiding frequent short reads and writes until commit
296abc0d91d8
("gfs2: No short reads or writes upon glock contention"). Since then,
when the inode glock is lost, it is simply re-acquired and the operation
is resumed. This means that apart from the performance penalty, we
might as well drop the inode glock before faulting in pages, and
re-acquire it afterwards.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Andreas Gruenbacher [Wed, 4 May 2022 21:37:30 +0000 (23:37 +0200)]
gfs2: buffered write prefaulting
In gfs2_file_buffered_write, to increase the likelihood that all the
user memory we're trying to write will be resident in memory, carry out
the write in chunks and fault in each chunk of user memory before trying
to write it. Otherwise, some workloads will trigger frequent short
"internal" writes, causing filesystem blocks to be allocated and then
partially deallocated again when writing into holes, which is wasteful
and breaks reservations.
Neither the chunked writes nor any of the short "internal" writes are
user visible.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Linus Torvalds [Fri, 13 May 2022 20:13:48 +0000 (13:13 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four fixes, all in drivers.
These patches mosly fix error legs and exceptional conditions
(scsi_dh_alua, qla2xxx). The lpfc fixes are for coding issues with
lpfc features"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Correct BDE DMA address assignment for GEN_REQ_WQE
scsi: lpfc: Fix split code for FLOGI on FCoE
scsi: qla2xxx: Fix missed DMA unmap for aborted commands
scsi: scsi_dh_alua: Properly handle the ALUA transitioning state
Andreas Gruenbacher [Thu, 5 May 2022 11:32:23 +0000 (13:32 +0200)]
gfs2: Align read and write chunks to the page cache
Align the chunks that reads and writes are carried out in to the page
cache rather than the user buffers. This will be more efficient in
general, especially for allocating writes. Optimizing the case that the
user buffer is gfs2 backed isn't very useful; we only need to make sure
we won't deadlock.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Andreas Gruenbacher [Thu, 5 May 2022 10:53:26 +0000 (12:53 +0200)]
gfs2: Pull return value test out of should_fault_in_pages
Pull the return value test of the previous read or write operation out
of should_fault_in_pages(). In a following patch, we'll fault in pages
before the I/O and there will be no return value to check.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Andreas Gruenbacher [Thu, 5 May 2022 10:37:49 +0000 (12:37 +0200)]
gfs2: Clean up use of fault_in_iov_iter_{read,write}able
No need to store the return value of the fault_in functions in separate
variables.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>