Johannes Berg [Mon, 9 Oct 2023 08:18:01 +0000 (10:18 +0200)]
wifi: cfg80211: use system_unbound_wq for wiphy work
Since wiphy work items can run pretty much arbitrary
code in the stack/driver, it can take longer to run
all of this, so we shouldn't be using system_wq via
schedule_work(). Also, we lock the wiphy (which is
the reason this exists), so use system_unbound_wq.
Reported-and-tested-by: Kalle Valo <kvalo@kernel.org>
Fixes:
a3ee4dc84c4e ("wifi: cfg80211: add a work abstraction with special semantics")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Linus Torvalds [Thu, 5 Oct 2023 18:29:21 +0000 (11:29 -0700)]
Merge tag 'net-6.6-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth, netfilter, BPF and WiFi.
I didn't collect precise data but feels like we've got a lot of 6.5
fixes here. WiFi fixes are most user-awaited.
Current release - regressions:
- Bluetooth: fix hci_link_tx_to RCU lock usage
Current release - new code bugs:
- bpf: mprog: fix maximum program check on mprog attachment
- eth: ti: icssg-prueth: fix signedness bug in prueth_init_tx_chns()
Previous releases - regressions:
- ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling
- vringh: don't use vringh_kiov_advance() in vringh_iov_xfer(), it
doesn't handle zero length like we expected
- wifi:
- cfg80211: fix cqm_config access race, fix crashes with brcmfmac
- iwlwifi: mvm: handle PS changes in vif_cfg_changed
- mac80211: fix mesh id corruption on 32 bit systems
- mt76: mt76x02: fix MT76x0 external LNA gain handling
- Bluetooth: fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER
- l2tp: fix handling of transhdrlen in __ip{,6}_append_data()
- dsa: mv88e6xxx: avoid EEPROM timeout when EEPROM is absent
- eth: stmmac: fix the incorrect parameter after refactoring
Previous releases - always broken:
- net: replace calls to sock->ops->connect() with kernel_connect(),
prevent address rewrite in kernel_bind(); otherwise BPF hooks may
modify arguments, unexpectedly to the caller
- tcp: fix delayed ACKs when reads and writes align with MSS
- bpf:
- verifier: unconditionally reset backtrack_state masks on global
func exit
- s390: let arch_prepare_bpf_trampoline return program size, fix
struct_ops offsets
- sockmap: fix accounting of available bytes in presence of PEEKs
- sockmap: reject sk_msg egress redirects to non-TCP sockets
- ipv4/fib: send netlink notify when delete source address routes
- ethtool: plca: fix width of reads when parsing netlink commands
- netfilter: nft_payload: rebuild vlan header on h_proto access
- Bluetooth: hci_codec: fix leaking memory of local_codecs
- eth: intel: ice: always add legacy 32byte RXDID in supported_rxdids
- eth: stmmac:
- dwmac-stm32: fix resume on STM32 MCU
- remove buggy and unneeded stmmac_poll_controller, depend on NAPI
- ibmveth: always recompute TCP pseudo-header checksum, fix use of
the driver with Open vSwitch
- wifi:
- rtw88: rtw8723d: fix MAC address offset in EEPROM
- mt76: fix lock dependency problem for wed_lock
- mwifiex: sanity check data reported by the device
- iwlwifi: ensure ack flag is properly cleared
- iwlwifi: mvm: fix a memory corruption due to bad pointer arithm
- iwlwifi: mvm: fix incorrect usage of scan API
Misc:
- wifi: mac80211: work around Cisco AP 9115 VHT MPDU length"
* tag 'net-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
MAINTAINERS: update Matthieu's email address
mptcp: userspace pm allow creating id 0 subflow
mptcp: fix delegated action races
net: stmmac: remove unneeded stmmac_poll_controller
net: lan743x: also select PHYLIB
net: ethernet: mediatek: disable irq before schedule napi
net: mana: Fix oversized sge0 for GSO packets
net: mana: Fix the tso_bytes calculation
net: mana: Fix TX CQE error handling
netlink: annotate data-races around sk->sk_err
sctp: update hb timer immediately after users change hb_interval
sctp: update transport state when processing a dupcook packet
tcp: fix delayed ACKs for MSS boundary condition
tcp: fix quick-ack counting to count actual ACKs of new data
page_pool: fix documentation typos
tipc: fix a potential deadlock on &tx->lock
net: stmmac: dwmac-stm32: fix resume on STM32 MCU
ipv4: Set offload_failed flag in fibmatch results
netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
netfilter: nf_tables: Deduplicate nft_register_obj audit logs
...
Linus Torvalds [Thu, 5 Oct 2023 18:12:33 +0000 (11:12 -0700)]
Merge tag 'integrity-v6.6-fix' of git://git./linux/kernel/git/zohar/linux-integrity
Pull integrity fixes from Mimi Zohar:
"Two additional patches to fix the removal of the deprecated
IMA_TRUSTED_KEYRING Kconfig"
* tag 'integrity-v6.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: rework CONFIG_IMA dependency block
ima: Finish deprecation of IMA_TRUSTED_KEYRING Kconfig
Linus Torvalds [Thu, 5 Oct 2023 18:07:03 +0000 (11:07 -0700)]
Merge tag 'leds-fixes-6.6' of git://git./linux/kernel/git/lee/leds
Pull LED fix from Lee Jones:
"Just the one bug-fix:
- Fix regression affecting LED_COLOR_ID_MULTI users"
* tag 'leds-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds:
leds: Drop BUG_ON check for LED_COLOR_ID_MULTI
Linus Torvalds [Thu, 5 Oct 2023 18:03:20 +0000 (11:03 -0700)]
Merge tag 'mfd-fixes-6.6' of git://git./linux/kernel/git/lee/mfd
Pull MFD fixes from Lee Jones:
"A couple of small fixes:
- Potential build failure in CS42L43
- Device Tree bindings clean-up for a superseded patch"
* tag 'mfd-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
dt-bindings: mfd: Revert "dt-bindings: mfd: maxim,max77693: Add USB connector"
mfd: cs42l43: Fix MFD_CS42L43 dependency on REGMAP_IRQ
Linus Torvalds [Thu, 5 Oct 2023 17:56:18 +0000 (10:56 -0700)]
Merge tag 'ovl-fixes-6.6-rc5' of git://git./linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
- Fix for file reference leak regression
- Fix for NULL pointer deref regression
- Fixes for RCU-walk race regressions:
Two of the fixes were taken from Al's RCU pathwalk race fixes series
with his consent [1].
Note that unlike most of Al's series, these two patches are not about
racing with ->kill_sb() and they are also very recent regressions
from v6.5, so I think it's worth getting them into v6.5.y.
There is also a fix for an RCU pathwalk race with ->kill_sb(), which
may have been solved in vfs generic code as you suggested, but it
also rids overlayfs from a nasty hack, so I think it's worth anyway.
Link: https://lore.kernel.org/linux-fsdevel/20231003204749.GA800259@ZenIV/
* tag 'ovl-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix NULL pointer defer when encoding non-decodable lower fid
ovl: make use of ->layers safe in rcu pathwalk
ovl: fetch inode once in ovl_dentry_revalidate_common()
ovl: move freeing ovl_entry past rcu delay
ovl: fix file reference leak when submitting aio
Jakub Kicinski [Thu, 5 Oct 2023 16:34:34 +0000 (09:34 -0700)]
Merge branch 'mptcp-fixes-and-maintainer-email-update-for-v6-6'
Mat Martineau says:
====================
mptcp: Fixes and maintainer email update for v6.6
Patch 1 addresses a race condition in MPTCP "delegated actions"
infrastructure. Affects v5.19 and later.
Patch 2 removes an unnecessary restriction that did not allow additional
outgoing subflows using the local address of the initial MPTCP subflow.
v5.16 and later.
Patch 3 updates Matthieu's email address.
====================
Link: https://lore.kernel.org/r/20231004-send-net-20231004-v1-0-28de4ac663ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts [Wed, 4 Oct 2023 20:38:13 +0000 (13:38 -0700)]
MAINTAINERS: update Matthieu's email address
Use my kernel.org account instead.
The other one will bounce by the end of the year.
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231004-send-net-20231004-v1-3-28de4ac663ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Wed, 4 Oct 2023 20:38:12 +0000 (13:38 -0700)]
mptcp: userspace pm allow creating id 0 subflow
This patch drops id 0 limitation in mptcp_nl_cmd_sf_create() to allow
creating additional subflows with the local addr ID 0.
There is no reason not to allow additional subflows from this local
address: we should be able to create new subflows from the initial
endpoint. This limitation was breaking fullmesh support from userspace.
Fixes:
702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/391
Cc: stable@vger.kernel.org
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231004-send-net-20231004-v1-2-28de4ac663ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Wed, 4 Oct 2023 20:38:11 +0000 (13:38 -0700)]
mptcp: fix delegated action races
The delegated action infrastructure is prone to the following
race: different CPUs can try to schedule different delegated
actions on the same subflow at the same time.
Each of them will check different bits via mptcp_subflow_delegate(),
and will try to schedule the action on the related per-cpu napi
instance.
Depending on the timing, both can observe an empty delegated list
node, causing the same entry to be added simultaneously on two different
lists.
The root cause is that the delegated actions infra does not provide
a single synchronization point. Address the issue reserving an additional
bit to mark the subflow as scheduled for delegation. Acquiring such bit
guarantee the caller to own the delegated list node, and being able to
safely schedule the subflow.
Clear such bit only when the subflow scheduling is completed, ensuring
proper barrier in place.
Additionally swap the meaning of the delegated_action bitmask, to allow
the usage of the existing helper to set multiple bit at once.
Fixes:
bcd97734318d ("mptcp: use delegate action to schedule 3rd ack retrans")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231004-send-net-20231004-v1-1-28de4ac663ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remi Pommarel [Wed, 4 Oct 2023 14:33:56 +0000 (16:33 +0200)]
net: stmmac: remove unneeded stmmac_poll_controller
Using netconsole netpoll_poll_dev could be called from interrupt
context, thus using disable_irq() would cause the following kernel
warning with CONFIG_DEBUG_ATOMIC_SLEEP enabled:
BUG: sleeping function called from invalid context at kernel/irq/manage.c:137
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 10, name: ksoftirqd/0
CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G W 5.15.42-00075-g816b502b2298-dirty #117
Hardware name: aml (r1) (DT)
Call trace:
dump_backtrace+0x0/0x270
show_stack+0x14/0x20
dump_stack_lvl+0x8c/0xac
dump_stack+0x18/0x30
___might_sleep+0x150/0x194
__might_sleep+0x64/0xbc
synchronize_irq+0x8c/0x150
disable_irq+0x2c/0x40
stmmac_poll_controller+0x140/0x1a0
netpoll_poll_dev+0x6c/0x220
netpoll_send_skb+0x308/0x390
netpoll_send_udp+0x418/0x760
write_msg+0x118/0x140 [netconsole]
console_unlock+0x404/0x500
vprintk_emit+0x118/0x250
dev_vprintk_emit+0x19c/0x1cc
dev_printk_emit+0x90/0xa8
__dev_printk+0x78/0x9c
_dev_warn+0xa4/0xbc
ath10k_warn+0xe8/0xf0 [ath10k_core]
ath10k_htt_txrx_compl_task+0x790/0x7fc [ath10k_core]
ath10k_pci_napi_poll+0x98/0x1f4 [ath10k_pci]
__napi_poll+0x58/0x1f4
net_rx_action+0x504/0x590
_stext+0x1b8/0x418
run_ksoftirqd+0x74/0xa4
smpboot_thread_fn+0x210/0x3c0
kthread+0x1fc/0x210
ret_from_fork+0x10/0x20
Since [0] .ndo_poll_controller is only needed if driver doesn't or
partially use NAPI. Because stmmac does so, stmmac_poll_controller
can be removed fixing the above warning.
[0] commit
ac3d9dd034e5 ("netpoll: make ndo_poll_controller() optional")
Cc: <stable@vger.kernel.org> # 5.15.x
Fixes:
47dd7a540b8a ("net: add support for STMicroelectronics Ethernet controllers")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1c156a6d8c9170bd6a17825f2277115525b4d50f.1696429960.git.repk@triplefau.lt
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Mon, 2 Oct 2023 19:35:44 +0000 (12:35 -0700)]
net: lan743x: also select PHYLIB
Since FIXED_PHY depends on PHYLIB, PHYLIB needs to be set to avoid
a kconfig warning:
WARNING: unmet direct dependencies detected for FIXED_PHY
Depends on [n]: NETDEVICES [=y] && PHYLIB [=n]
Selected by [y]:
- LAN743X [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROCHIP [=y] && PCI [=y] && PTP_1588_CLOCK_OPTIONAL [=y]
Fixes:
73c4d1b307ae ("net: lan743x: select FIXED_PHY")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: lore.kernel.org/r/
202309261802.JPbRHwti-lkp@intel.com
Cc: Bryan Whitehead <bryan.whitehead@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Link: https://lore.kernel.org/r/20231002193544.14529-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christian Marangi [Mon, 2 Oct 2023 14:08:05 +0000 (16:08 +0200)]
net: ethernet: mediatek: disable irq before schedule napi
While searching for possible refactor of napi_schedule_prep and
__napi_schedule it was notice that the mtk eth driver disable the
interrupt for rx and tx AFTER napi is scheduled.
While this is a very hard to repro case it might happen to have
situation where the interrupt is disabled and never enabled again as the
napi completes and the interrupt is enabled before.
This is caused by the fact that a napi driven by interrupt expect a
logic with:
1. interrupt received. napi prepared -> interrupt disabled -> napi
scheduled
2. napi triggered. ring cleared -> interrupt enabled -> wait for new
interrupt
To prevent this case, disable the interrupt BEFORE the napi is
scheduled.
Fixes:
656e705243fd ("net-next: mediatek: add support for MT7623 ethernet")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20231002140805.568-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Oct 2023 09:45:08 +0000 (11:45 +0200)]
Merge branch 'net-mana-fix-some-tx-processing-bugs'
Haiyang Zhang says:
====================
net: mana: Fix some TX processing bugs
Fix TX processing bugs on error handling, tso_bytes calculation,
and sge0 size.
====================
Link: https://lore.kernel.org/r/1696020147-14989-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Haiyang Zhang [Fri, 29 Sep 2023 20:42:27 +0000 (13:42 -0700)]
net: mana: Fix oversized sge0 for GSO packets
Handle the case when GSO SKB linear length is too large.
MANA NIC requires GSO packets to put only the header part to SGE0,
otherwise the TX queue may stop at the HW level.
So, use 2 SGEs for the skb linear part which contains more than the
packet header.
Fixes:
ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Haiyang Zhang [Fri, 29 Sep 2023 20:42:26 +0000 (13:42 -0700)]
net: mana: Fix the tso_bytes calculation
sizeof(struct hop_jumbo_hdr) is not part of tso_bytes, so remove
the subtraction from header size.
Cc: stable@vger.kernel.org
Fixes:
bd7fc6e1957c ("net: mana: Add new MANA VF performance counters for easier troubleshooting")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Haiyang Zhang [Fri, 29 Sep 2023 20:42:25 +0000 (13:42 -0700)]
net: mana: Fix TX CQE error handling
For an unknown TX CQE error type (probably from a newer hardware),
still free the SKB, update the queue tail, etc., otherwise the
accounting will be wrong.
Also, TX errors can be triggered by injecting corrupted packets, so
replace the WARN_ONCE to ratelimited error logging.
Cc: stable@vger.kernel.org
Fixes:
ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Thu, 5 Oct 2023 01:19:55 +0000 (18:19 -0700)]
Merge tag 'rtla-v6.6-fixes' of git://git./linux/kernel/git/bristot/linux
Pull rtla fixes from Daniel Bristot de Oliveira:
"rtla (Real-Time Linux Analysis) tool fixes.
Timerlat auto-analysis:
- Timerlat is reporting thread interference time without thread noise
events occurrence. It was caused because the thread interference
variable was not reset after the analysis of a timerlat activation
that did not hit the threshold.
- The IRQ handler delay is estimated from the delta of the IRQ
latency reported by timerlat, and the timestamp from IRQ handler
start event. If the delta is near-zero, the drift from the external
clock and the trace event and/or the overhead can cause the value
to be negative. If the value is negative, print a zero-delay.
- IRQ handlers happening after the timerlat thread event but before
the stop tracing were being reported as IRQ that happened before
the *current* IRQ occurrence. Ignore Previous IRQ noise in this
condition because they are valid only for the *next* timerlat
activation.
Timerlat user-space:
- Timerlat is stopping all user-space thread if a CPU becomes
offline. Do not stop the entire tool if a CPU is/become offline,
but only the thread of the unavailable CPU. Stop the tool only, if
all threads leave because the CPUs become/are offline.
man-pages:
- Fix command line example in timerlat hist man page"
* tag 'rtla-v6.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux:
rtla: fix a example in rtla-timerlat-hist.rst
rtla/timerlat: Do not stop user-space if a cpu is offline
rtla/timerlat_aa: Fix previous IRQ delay for IRQs that happens after thread sample
rtla/timerlat_aa: Fix negative IRQ delay
rtla/timerlat_aa: Zero thread sum after every sample analysis
Eric Dumazet [Tue, 3 Oct 2023 18:34:55 +0000 (18:34 +0000)]
netlink: annotate data-races around sk->sk_err
syzbot caught another data-race in netlink when
setting sk->sk_err.
Annotate all of them for good measure.
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
write to 0xffff8881613bb220 of 4 bytes by task 28147 on cpu 0:
netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
__sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
__do_sys_recvfrom net/socket.c:2247 [inline]
__se_sys_recvfrom net/socket.c:2243 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffff8881613bb220 of 4 bytes by task 28146 on cpu 1:
netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
__sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
__do_sys_recvfrom net/socket.c:2247 [inline]
__se_sys_recvfrom net/socket.c:2243 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00000000 -> 0x00000016
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 28146 Comm: syz-executor.0 Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231003183455.3410550-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sun, 1 Oct 2023 15:04:20 +0000 (11:04 -0400)]
sctp: update hb timer immediately after users change hb_interval
Currently, when hb_interval is changed by users, it won't take effect
until the next expiry of hb timer. As the default value is 30s, users
have to wait up to 30s to wait its hb_interval update to work.
This becomes pretty bad in containers where a much smaller value is
usually set on hb_interval. This patch improves it by resetting the
hb timer immediately once the value of hb_interval is updated by users.
Note that we don't address the already existing 'problem' when sending
a heartbeat 'on demand' if one hb has just been sent(from the timer)
mentioned in:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg590224.html
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/75465785f8ee5df2fb3acdca9b8fafdc18984098.1696172660.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Sun, 1 Oct 2023 14:58:45 +0000 (10:58 -0400)]
sctp: update transport state when processing a dupcook packet
During the 4-way handshake, the transport's state is set to ACTIVE in
sctp_process_init() when processing INIT_ACK chunk on client or
COOKIE_ECHO chunk on server.
In the collision scenario below:
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag:
3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag:
144230885]
192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag:
3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO]
192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag:
3914796021]
when processing COOKIE_ECHO on 192.168.1.2, as it's in COOKIE_WAIT state,
sctp_sf_do_dupcook_b() is called by sctp_sf_do_5_2_4_dupcook() where it
creates a new association and sets its transport to ACTIVE then updates
to the old association in sctp_assoc_update().
However, in sctp_assoc_update(), it will skip the transport update if it
finds a transport with the same ipaddr already existing in the old asoc,
and this causes the old asoc's transport state not to move to ACTIVE
after the handshake.
This means if DATA retransmission happens at this moment, it won't be able
to enter PF state because of the check 'transport->state == SCTP_ACTIVE'
in sctp_do_8_2_transport_strike().
This patch fixes it by updating the transport in sctp_assoc_update() with
sctp_assoc_add_peer() where it updates the transport state if there is
already a transport with the same ipaddr exists in the old asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/fd17356abe49713ded425250cc1ae51e9f5846c6.1696172325.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Neal Cardwell [Sun, 1 Oct 2023 15:12:39 +0000 (11:12 -0400)]
tcp: fix delayed ACKs for MSS boundary condition
This commit fixes poor delayed ACK behavior that can cause poor TCP
latency in a particular boundary condition: when an application makes
a TCP socket write that is an exact multiple of the MSS size.
The problem is that there is painful boundary discontinuity in the
current delayed ACK behavior. With the current delayed ACK behavior,
we have:
(1) If an app reads data when > 1*MSS is unacknowledged, then
tcp_cleanup_rbuf() ACKs immediately because of:
tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||
(2) If an app reads all received data, and the packets were < 1*MSS,
and either (a) the app is not ping-pong or (b) we received two
packets < 1*MSS, then tcp_cleanup_rbuf() ACKs immediately beecause
of:
((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) ||
((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
!inet_csk_in_pingpong_mode(sk))) &&
(3) *However*: if an app reads exactly 1*MSS of data,
tcp_cleanup_rbuf() does not send an immediate ACK. This is true
even if the app is not ping-pong and the 1*MSS of data had the PSH
bit set, suggesting the sending application completed an
application write.
Thus if the app is not ping-pong, we have this painful case where
>1*MSS gets an immediate ACK, and <1*MSS gets an immediate ACK, but a
write whose last skb is an exact multiple of 1*MSS can get a 40ms
delayed ACK. This means that any app that transfers data in one
direction and takes care to align write size or packet size with MSS
can suffer this problem. With receive zero copy making 4KB MSS values
more common, it is becoming more common to have application writes
naturally align with MSS, and more applications are likely to
encounter this delayed ACK problem.
The fix in this commit is to refine the delayed ACK heuristics with a
simple check: immediately ACK a received 1*MSS skb with PSH bit set if
the app reads all data. Why? If an skb has a len of exactly 1*MSS and
has the PSH bit set then it is likely the end of an application
write. So more data may not be arriving soon, and yet the data sender
may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we
set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send
an ACK immediately if the app reads all of the data and is not
ping-pong. Note that this logic is also executed for the case where
len > MSS, but in that case this logic does not matter (and does not
hurt) because tcp_cleanup_rbuf() will always ACK immediately if the
app reads data and there is more than an MSS of unACKed data.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Guo <guoxin0309@gmail.com>
Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Neal Cardwell [Sun, 1 Oct 2023 15:12:38 +0000 (11:12 -0400)]
tcp: fix quick-ack counting to count actual ACKs of new data
This commit fixes quick-ack counting so that it only considers that a
quick-ack has been provided if we are sending an ACK that newly
acknowledges data.
The code was erroneously using the number of data segments in outgoing
skbs when deciding how many quick-ack credits to remove. This logic
does not make sense, and could cause poor performance in
request-response workloads, like RPC traffic, where requests or
responses can be multi-segment skbs.
When a TCP connection decides to send N quick-acks, that is to
accelerate the cwnd growth of the congestion control module
controlling the remote endpoint of the TCP connection. That quick-ack
decision is purely about the incoming data and outgoing ACKs. It has
nothing to do with the outgoing data or the size of outgoing data.
And in particular, an ACK only serves the intended purpose of allowing
the remote congestion control to grow the congestion window quickly if
the ACK is ACKing or SACKing new data.
The fix is simple: only count packets as serving the goal of the
quickack mechanism if they are ACKing/SACKing new data. We can tell
whether this is the case by checking inet_csk_ack_scheduled(), since
we schedule an ACK exactly when we are ACKing/SACKing new data.
Fixes:
fc6415bcb0f5 ("[TCP]: Fix quick-ack decrementing with TSO.")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231001151239.1866845-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Oct 2023 21:53:17 +0000 (14:53 -0700)]
Merge tag 'nf-23-10-04' of https://git./linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter patches for net
First patch resolves a regression with vlan header matching, this was
broken since 6.5 release. From myself.
Second patch fixes an ancient problem with sctp connection tracking in
case INIT_ACK packets are delayed. This comes with a selftest, both
patches from Xin Long.
Patch 4 extends the existing nftables audit selftest, from
Phil Sutter.
Patch 5, also from Phil, avoids a situation where nftables
would emit an audit record twice. This was broken since 5.13 days.
Patch 6, from myself, avoids spurious insertion failure if we encounter an
overlapping but expired range during element insertion with the
'nft_set_rbtree' backend. This problem exists since 6.2.
* tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
netfilter: nf_tables: Deduplicate nft_register_obj audit logs
selftests: netfilter: Extend nft_audit.sh
selftests: netfilter: test for sctp collision processing in nf_conntrack
netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp
netfilter: nft_payload: rebuild vlan header on h_proto access
====================
Link: https://lore.kernel.org/r/20231004141405.28749-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Sun, 1 Oct 2023 00:38:45 +0000 (17:38 -0700)]
page_pool: fix documentation typos
Correct grammar for better readability.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20231001003846.29541-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chengfeng Ye [Wed, 27 Sep 2023 18:14:14 +0000 (18:14 +0000)]
tipc: fix a potential deadlock on &tx->lock
It seems that tipc_crypto_key_revoke() could be be invoked by
wokequeue tipc_crypto_work_rx() under process context and
timer/rx callback under softirq context, thus the lock acquisition
on &tx->lock seems better use spin_lock_bh() to prevent possible
deadlock.
This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.
tipc_crypto_work_rx() <workqueue>
--> tipc_crypto_key_distr()
--> tipc_bcast_xmit()
--> tipc_bcbase_xmit()
--> tipc_bearer_bc_xmit()
--> tipc_crypto_xmit()
--> tipc_ehdr_build()
--> tipc_crypto_key_revoke()
--> spin_lock(&tx->lock)
<timer interrupt>
--> tipc_disc_timeout()
--> tipc_bearer_xmit_skb()
--> tipc_crypto_xmit()
--> tipc_ehdr_build()
--> tipc_crypto_key_revoke()
--> spin_lock(&tx->lock) <deadlock here>
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Fixes:
fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication")
Link: https://lore.kernel.org/r/20230927181414.59928-1-dg573847474@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ben Wolsieffer [Wed, 27 Sep 2023 17:57:49 +0000 (13:57 -0400)]
net: stmmac: dwmac-stm32: fix resume on STM32 MCU
The STM32MP1 keeps clk_rx enabled during suspend, and therefore the
driver does not enable the clock in stm32_dwmac_init() if the device was
suspended. The problem is that this same code runs on STM32 MCUs, which
do disable clk_rx during suspend, causing the clock to never be
re-enabled on resume.
This patch adds a variant flag to indicate that clk_rx remains enabled
during suspend, and uses this to decide whether to enable the clock in
stm32_dwmac_init() if the device was suspended.
This approach fixes this specific bug with limited opportunity for
unintended side-effects, but I have a follow up patch that will refactor
the clock configuration and hopefully make it less error prone.
Fixes:
6528e02cc9ff ("net: ethernet: stmmac: add adaptation for stm32mp157c.")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230927175749.1419774-1-ben.wolsieffer@hefring.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Benjamin Poirier [Tue, 26 Sep 2023 18:27:30 +0000 (14:27 -0400)]
ipv4: Set offload_failed flag in fibmatch results
Due to a small omission, the offload_failed flag is missing from ipv4
fibmatch results. Make sure it is set correctly.
The issue can be witnessed using the following commands:
echo "1 1" > /sys/bus/netdevsim/new_device
ip link add dummy1 up type dummy
ip route add 192.0.2.0/24 dev dummy1
echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/fib/fail_route_offload
ip route add 198.51.100.0/24 dev dummy1
ip route
# 192.168.15.0/24 has rt_trap
# 198.51.100.0/24 has rt_offload_failed
ip route get 192.168.15.1 fibmatch
# Result has rt_trap
ip route get 198.51.100.1 fibmatch
# Result differs from the route shown by `ip route`, it is missing
# rt_offload_failed
ip link del dev dummy1
echo 1 > /sys/bus/netdevsim/del_device
Fixes:
36c5100e859d ("IPv4: Add "offload failed" indication to routes")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230926182730.231208-1-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 4 Oct 2023 18:35:23 +0000 (11:35 -0700)]
Merge tag 'linux-kselftest-fixes-6.6-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"One single fix to Makefile to fix the incorrect TARGET name for uevent
test"
* tag 'linux-kselftest-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: Fix wrong TARGET in kselftest top level Makefile
Jakub Kicinski [Wed, 4 Oct 2023 18:30:21 +0000 (11:30 -0700)]
Merge tag 'wireless-2023-09-27' of git://git./linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Quite a collection of fixes this time, really too many
to list individually. Many stack fixes, even rfkill
(found by simulation and the new eevdf scheduler)!
Also a bigger maintainers file cleanup, to remove old
and redundant information.
* tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (32 commits)
wifi: iwlwifi: mvm: Fix incorrect usage of scan API
wifi: mac80211: Create resources for disabled links
wifi: cfg80211: avoid leaking stack data into trace
wifi: mac80211: allow transmitting EAPOL frames with tainted key
wifi: mac80211: work around Cisco AP 9115 VHT MPDU length
wifi: cfg80211: Fix 6GHz scan configuration
wifi: mac80211: fix potential key leak
wifi: mac80211: fix potential key use-after-free
wifi: mt76: mt76x02: fix MT76x0 external LNA gain handling
wifi: brcmfmac: Replace 1-element arrays with flexible arrays
wifi: mwifiex: Fix oob check condition in mwifiex_process_rx_packet
wifi: rtw88: rtw8723d: Fix MAC address offset in EEPROM
rfkill: sync before userspace visibility/changes
wifi: mac80211: fix mesh id corruption on 32 bit systems
wifi: cfg80211: add missing kernel-doc for cqm_rssi_work
wifi: cfg80211: fix cqm_config access race
wifi: iwlwifi: mvm: Fix a memory corruption issue
wifi: iwlwifi: Ensure ack flag is properly cleared.
wifi: iwlwifi: dbg_ini: fix structure packing
iwlwifi: mvm: handle PS changes in vif_cfg_changed
...
====================
Link: https://lore.kernel.org/r/20230927095835.25803-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 4 Oct 2023 15:28:07 +0000 (08:28 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-10-02
We've added 11 non-merge commits during the last 12 day(s) which contain
a total of 12 files changed, 176 insertions(+), 41 deletions(-).
The main changes are:
1) Fix BPF verifier to reset backtrack_state masks on global function
exit as otherwise subsequent precision tracking would reuse them,
from Andrii Nakryiko.
2) Several sockmap fixes for available bytes accounting,
from John Fastabend.
3) Reject sk_msg egress redirects to non-TCP sockets given this
is only supported for TCP sockets today, from Jakub Sitnicki.
4) Fix a syzkaller splat in bpf_mprog when hitting maximum program
limits with BPF_F_BEFORE directive, from Daniel Borkmann
and Nikolay Aleksandrov.
5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust
size_index for selecting a bpf_mem_cache, from Hou Tao.
6) Fix arch_prepare_bpf_trampoline return code for s390 JIT,
from Song Liu.
7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off,
from Leon Hwang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Use kmalloc_size_roundup() to adjust size_index
selftest/bpf: Add various selftests for program limits
bpf, mprog: Fix maximum program check on mprog attachment
bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets
bpf, sockmap: Add tests for MSG_F_PEEK
bpf, sockmap: Do not inc copied_seq when PEEK flag set
bpf: tcp_read_skb needs to pop skb regardless of seq
bpf: unconditionally reset backtrack_state masks on global func exit
bpf: Fix tr dereferencing
selftests/bpf: Check bpf_cubic_acked() is called via struct_ops
s390/bpf: Let arch_prepare_bpf_trampoline return program size
====================
Link: https://lore.kernel.org/r/20231002113417.2309-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Thu, 28 Sep 2023 13:12:44 +0000 (15:12 +0200)]
netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
nft_rbtree_gc_elem() walks back and removes the end interval element that
comes before the expired element.
There is a small chance that we've cached this element as 'rbe_ge'.
If this happens, we hold and test a pointer that has been queued for
freeing.
It also causes spurious insertion failures:
$ cat test-testcases-sets-0044interval_overlap_0.1/testout.log
Error: Could not process rule: File exists
add element t s { 0 - 2 }
^^^^^^
Failed to insert 0 - 2 given:
table ip t {
set s {
type inet_service
flags interval,timeout
timeout 2s
gc-interval 2s
}
}
The set (rbtree) is empty. The 'failure' doesn't happen on next attempt.
Reason is that when we try to insert, the tree may hold an expired
element that collides with the range we're adding.
While we do evict/erase this element, we can trip over this check:
if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new))
return -ENOTEMPTY;
rbe_ge was erased by the synchronous gc, we should not have done this
check. Next attempt won't find it, so retry results in successful
insertion.
Restart in-kernel to avoid such spurious errors.
Such restart are rare, unless userspace intentionally adds very large
numbers of elements with very short timeouts while setting a huge
gc interval.
Even in this case, this cannot loop forever, on each retry an existing
element has been removed.
As the caller is holding the transaction mutex, its impossible
for a second entity to add more expiring elements to the tree.
After this it also becomes feasible to remove the async gc worker
and perform all garbage collection from the commit path.
Fixes:
c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Sat, 23 Sep 2023 01:53:50 +0000 (03:53 +0200)]
netfilter: nf_tables: Deduplicate nft_register_obj audit logs
When adding/updating an object, the transaction handler emits suitable
audit log entries already, the one in nft_obj_notify() is redundant. To
fix that (and retain the audit logging from objects' 'update' callback),
Introduce an "audit log free" variant for internal use.
Fixes:
c520292f29b8 ("audit: log nftables configuration change events once per table")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Sat, 23 Sep 2023 01:53:49 +0000 (03:53 +0200)]
selftests: netfilter: Extend nft_audit.sh
Add tests for sets and elements and deletion of all kinds. Also
reorder rule reset tests: By moving the bulk rule add command up, the
two 'reset rules' tests become identical.
While at it, fix for a failing bulk rule add test's error status getting
lost due to its use in a pipe. Avoid this by using a temporary file.
Headings in diff output for failing tests contain no useful data, strip
them.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Xin Long [Tue, 3 Oct 2023 17:17:54 +0000 (13:17 -0400)]
selftests: netfilter: test for sctp collision processing in nf_conntrack
This patch adds a test case to reproduce the SCTP DATA chunk retransmission
timeout issue caused by the improper SCTP collision processing in netfilter
nf_conntrack_proto_sctp.
In this test, client sends a INIT chunk, but the INIT_ACK replied from
server is delayed until the server sends a INIT chunk to start a new
connection from its side. After the connection is complete from server
side, the delayed INIT_ACK arrives in nf_conntrack_proto_sctp.
The delayed INIT_ACK should be dropped in nf_conntrack_proto_sctp instead
of updating the vtag with the out-of-date init_tag, otherwise, the vtag
in DATA chunks later sent by client don't match the vtag in the conntrack
entry and the DATA chunks get dropped.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Xin Long [Tue, 3 Oct 2023 17:17:53 +0000 (13:17 -0400)]
netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp
In Scenario A and B below, as the delayed INIT_ACK always changes the peer
vtag, SCTP ct with the incorrect vtag may cause packet loss.
Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK
192.168.1.2 > 192.168.1.1: [INIT] [init tag:
1328086772]
192.168.1.1 > 192.168.1.2: [INIT] [init tag:
1414468151]
192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag:
1328086772]
192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag:
1650211246] *
192.168.1.2 > 192.168.1.1: [COOKIE ECHO]
192.168.1.1 > 192.168.1.2: [COOKIE ECHO]
192.168.1.2 > 192.168.1.1: [COOKIE ACK]
Scenario B: INIT_ACK is delayed until the peer completes its own handshake
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag:
3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag:
144230885]
192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag:
3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO]
192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag:
3914796021] *
This patch fixes it as below:
In SCTP_CID_INIT processing:
- clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] &&
ct->proto.sctp.init[!dir]. (Scenario E)
- set ct->proto.sctp.init[dir].
In SCTP_CID_INIT_ACK processing:
- drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] &&
ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C)
- drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] &&
ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A)
In SCTP_CID_COOKIE_ACK processing:
- clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir].
(Scenario D)
Also, it's important to allow the ct state to move forward with cookie_echo
and cookie_ack from the opposite dir for the collision scenarios.
There are also other Scenarios where it should allow the packet through,
addressed by the processing above:
Scenario C: new CT is created by INIT_ACK.
Scenario D: start INIT on the existing ESTABLISHED ct.
Scenario E: start INIT after the old collision on the existing ESTABLISHED
ct.
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag:
3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag:
144230885]
(both side are stopped, then start new connection again in hours)
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag:
242308742]
Fixes:
9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Fri, 29 Sep 2023 08:42:10 +0000 (10:42 +0200)]
netfilter: nft_payload: rebuild vlan header on h_proto access
nft can perform merging of adjacent payload requests.
This means that:
ether saddr 00:11 ... ether type 8021ad ...
is a single payload expression, for 8 bytes, starting at the
ethernet source offset.
Check that offset+length is fully within the source/destination mac
addersses.
This bug prevents 'ether type' from matching the correct h_proto in case
vlan tag got stripped.
Fixes:
de6843be3082 ("netfilter: nft_payload: rebuild vlan header when needed")
Reported-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: Florian Westphal <fw@strlen.de>
David Wilder [Tue, 26 Sep 2023 21:42:51 +0000 (16:42 -0500)]
ibmveth: Remove condition to recompute TCP header checksum.
In some OVS environments the TCP pseudo header checksum may need to be
recomputed. Currently this is only done when the interface instance is
configured for "Trunk Mode". We found the issue also occurs in some
Kubernetes environments, these environments do not use "Trunk Mode",
therefor the condition is removed.
Performance tests with this change show only a fractional decrease in
throughput (< 0.2%).
Fixes:
7525de2516fb ("ibmveth: Set CHECKSUM_PARTIAL if NULL TCP CSUM.")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 26 Sep 2023 14:06:58 +0000 (17:06 +0300)]
dmaengine: ti: k3-udma-glue: clean up k3_udma_glue_tx_get_irq() return
The k3_udma_glue_tx_get_irq() function currently returns negative error
codes on error, zero on error and positive values for success. This
complicates life for the callers who need to propagate the error code.
Also GCC will not warn about unsigned comparisons when you check:
if (unsigned_irq <= 0)
All the callers have been fixed now but let's just make this easy going
forward.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 26 Sep 2023 14:05:59 +0000 (17:05 +0300)]
net: ti: icssg-prueth: Fix signedness bug in prueth_init_tx_chns()
The "tx_chn->irq" variable is unsigned so the error checking does not
work correctly.
Fixes:
128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 26 Sep 2023 14:04:43 +0000 (17:04 +0300)]
net: ethernet: ti: am65-cpsw: Fix error code in am65_cpsw_nuss_init_tx_chns()
This accidentally returns success, but it should return a negative error
code.
Fixes:
93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Mon, 25 Sep 2023 10:30:57 +0000 (12:30 +0200)]
vringh: don't use vringh_kiov_advance() in vringh_iov_xfer()
In the while loop of vringh_iov_xfer(), `partlen` could be 0 if one of
the `iov` has 0 lenght.
In this case, we should skip the iov and go to the next one.
But calling vringh_kiov_advance() with 0 lenght does not cause the
advancement, since it returns immediately if asked to advance by 0 bytes.
Let's restore the code that was there before commit
b8c06ad4d67d
("vringh: implement vringh_kiov_advance()"), avoiding using
vringh_kiov_advance().
Fixes:
b8c06ad4d67d ("vringh: implement vringh_kiov_advance()")
Cc: stable@vger.kernel.org
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshihiro Shimoda [Tue, 26 Sep 2023 12:30:54 +0000 (21:30 +0900)]
rswitch: Fix PHY station management clock setting
Fix the MPIC.PSMCS value following the programming example in the
section 6.4.2 Management Data Clock (MDC) Setting, Ethernet MAC IP,
S4 Hardware User Manual Rev.1.00.
The value is calculated by
MPIC.PSMCS = clk[MHz] / (MDC frequency[MHz] * 2) - 1
with the input clock frequency from clk_get_rate() and MDC frequency
of 2.5MHz. Otherwise, this driver cannot communicate PHYs on the R-Car
S4 Starter Kit board.
Fixes:
3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Reported-by: Tam Nguyen <tam.nguyen.xa@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230926123054.3976752-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 3 Oct 2023 19:13:10 +0000 (12:13 -0700)]
Merge tag 'nfs-for-6.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"Stable fixes:
- Revert "SUNRPC dont update timeout value on connection reset"
- NFSv4: Fix a state manager thread deadlock regression
Fixes:
- Fix potential NULL pointer dereference in nfs_inode_remove_request()
- Fix rare NULL pointer dereference in xs_tcp_tls_setup_socket()
- Fix long delay before failing a TLS mount when server does not
support TLS
- Fix various NFS state manager issues"
* tag 'nfs-for-6.6-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: decrement nrequests counter before releasing the req
SUNRPC/TLS: Lock the lower_xprt during the tls handshake
Revert "SUNRPC dont update timeout value on connection reset"
NFSv4: Fix a state manager thread deadlock regression
NFSv4: Fix a nfs4_state_manager() race
SUNRPC: Fail quickly when server does not recognize TLS
Linus Torvalds [Tue, 3 Oct 2023 18:59:46 +0000 (11:59 -0700)]
Merge tag 'regulator-fix-v6.6-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Two things here, one is an improved fix for issues around freeing
devices when registration fails which replaces a half baked fix with a
more complete one which uses the device model release() function
properly.
The other fix is a device specific fix for mt6358, the driver said
that the LDOs supported mode configuration but this is not actually
the case and could cause issues"
* tag 'regulator-fix-v6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator/core: Revert "fix kobject release warning and memory leak in regulator_register()"
regulator/core: regulator_register: set device->class earlier
regulator: mt6358: split ops for buck and linear range LDO regulators
Linus Torvalds [Tue, 3 Oct 2023 18:57:37 +0000 (11:57 -0700)]
Merge tag 'regmap-fix-v6.6-rc4' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"A fix for a long standing issue where when we create a new node in an
rbtree register cache we were failing to convert the register address
of the new register into a bitmask correctly and marking the wrong
register as being present in the newly created node.
This would only have affected devices with a register stride other
than 1 but would corrupt data on those devices"
* tag 'regmap-fix-v6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: rbtree: Fix wrong register marked as in-cache when creating new node
Linus Torvalds [Tue, 3 Oct 2023 17:15:10 +0000 (10:15 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes, all in drivers.
The fnic one is the most extensive because the little used user
initiated device reset path never tagged the command and adding a tag
is rather involved. The other two fixes are smaller and more obvious"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: zfcp: Fix a double put in zfcp_port_enqueue()
scsi: fnic: Fix sg_reset success path
scsi: target: core: Fix deadlock due to recursive locking
Jeremy Cline [Fri, 8 Sep 2023 23:58:53 +0000 (19:58 -0400)]
net: nfc: llcp: Add lock when modifying device list
The device list needs its associated lock held when modifying it, or the
list could become corrupted, as syzbot discovered.
Reported-and-tested-by: syzbot+c1d0a03d305972dbbe14@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
c1d0a03d305972dbbe14
Signed-off-by: Jeremy Cline <jeremy@jcline.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes:
6709d4b7bc2e ("net: nfc: Fix use-after-free caused by nfc_llcp_find_local")
Link: https://lore.kernel.org/r/20230908235853.1319596-1-jeremy@jcline.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parthiban Veerasooran [Fri, 8 Sep 2023 04:45:48 +0000 (10:15 +0530)]
ethtool: plca: fix plca enable data type while parsing the value
The ETHTOOL_A_PLCA_ENABLED data type is u8. But while parsing the
value from the attribute, nla_get_u32() is used in the plca_update_sint()
function instead of nla_get_u8(). So plca_cfg.enabled variable is updated
with some garbage value instead of 0 or 1 and always enables plca even
though plca is disabled through ethtool application. This bug has been
fixed by parsing the values based on the attributes type in the policy.
Fixes:
8580e16c28f3 ("net/ethtool: add netlink interface for the PLCA RS")
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230908044548.5878-1-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gustavo A. R. Silva [Sun, 24 Sep 2023 01:15:59 +0000 (19:15 -0600)]
qed/red_ll2: Fix undefined behavior bug in struct qed_ll2_info
The flexible structure (a structure that contains a flexible-array member
at the end) `qed_ll2_tx_packet` is nested within the second layer of
`struct qed_ll2_info`:
struct qed_ll2_tx_packet {
...
/* Flexible Array of bds_set determined by max_bds_per_packet */
struct {
struct core_tx_bd *txq_bd;
dma_addr_t tx_frag;
u16 frag_len;
} bds_set[];
};
struct qed_ll2_tx_queue {
...
struct qed_ll2_tx_packet cur_completing_packet;
};
struct qed_ll2_info {
...
struct qed_ll2_tx_queue tx_queue;
struct qed_ll2_cbs cbs;
};
The problem is that member `cbs` in `struct qed_ll2_info` is placed just
after an object of type `struct qed_ll2_tx_queue`, which is in itself
an implicit flexible structure, which by definition ends in a flexible
array member, in this case `bds_set`. This causes an undefined behavior
bug at run-time when dynamic memory is allocated for `bds_set`, which
could lead to a serious issue if `cbs` in `struct qed_ll2_info` is
overwritten by the contents of `bds_set`. Notice that the type of `cbs`
is a structure full of function pointers (and a cookie :) ):
include/linux/qed/qed_ll2_if.h:
107 typedef
108 void (*qed_ll2_complete_rx_packet_cb)(void *cxt,
109 struct qed_ll2_comp_rx_data *data);
110
111 typedef
112 void (*qed_ll2_release_rx_packet_cb)(void *cxt,
113 u8 connection_handle,
114 void *cookie,
115 dma_addr_t rx_buf_addr,
116 bool b_last_packet);
117
118 typedef
119 void (*qed_ll2_complete_tx_packet_cb)(void *cxt,
120 u8 connection_handle,
121 void *cookie,
122 dma_addr_t first_frag_addr,
123 bool b_last_fragment,
124 bool b_last_packet);
125
126 typedef
127 void (*qed_ll2_release_tx_packet_cb)(void *cxt,
128 u8 connection_handle,
129 void *cookie,
130 dma_addr_t first_frag_addr,
131 bool b_last_fragment, bool b_last_packet);
132
133 typedef
134 void (*qed_ll2_slowpath_cb)(void *cxt, u8 connection_handle,
135 u32 opaque_data_0, u32 opaque_data_1);
136
137 struct qed_ll2_cbs {
138 qed_ll2_complete_rx_packet_cb rx_comp_cb;
139 qed_ll2_release_rx_packet_cb rx_release_cb;
140 qed_ll2_complete_tx_packet_cb tx_comp_cb;
141 qed_ll2_release_tx_packet_cb tx_release_cb;
142 qed_ll2_slowpath_cb slowpath_cb;
143 void *cookie;
144 };
Fix this by moving the declaration of `cbs` to the middle of its
containing structure `qed_ll2_info`, preventing it from being
overwritten by the contents of `bds_set` at run-time.
This bug was introduced in 2017, when `bds_set` was converted to a
one-element array, and started to be used as a Variable Length Object
(VLO) at run-time.
Fixes:
f5823fe6897c ("qed: Add ll2 option to limit the number of bds per packet")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/ZQ+Nz8DfPg56pIzr@work
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shigeru Yoshida [Sat, 23 Sep 2023 17:35:49 +0000 (02:35 +0900)]
net: usb: smsc75xx: Fix uninit-value access in __smsc75xx_read_reg
syzbot reported the following uninit-value access issue:
=====================================================
BUG: KMSAN: uninit-value in smsc75xx_wait_ready drivers/net/usb/smsc75xx.c:975 [inline]
BUG: KMSAN: uninit-value in smsc75xx_bind+0x5c9/0x11e0 drivers/net/usb/smsc75xx.c:1482
CPU: 0 PID: 8696 Comm: kworker/0:3 Not tainted 5.8.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x21c/0x280 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121
__msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
smsc75xx_wait_ready drivers/net/usb/smsc75xx.c:975 [inline]
smsc75xx_bind+0x5c9/0x11e0 drivers/net/usb/smsc75xx.c:1482
usbnet_probe+0x1152/0x3f90 drivers/net/usb/usbnet.c:1737
usb_probe_interface+0xece/0x1550 drivers/usb/core/driver.c:374
really_probe+0xf20/0x20b0 drivers/base/dd.c:529
driver_probe_device+0x293/0x390 drivers/base/dd.c:701
__device_attach_driver+0x63f/0x830 drivers/base/dd.c:807
bus_for_each_drv+0x2ca/0x3f0 drivers/base/bus.c:431
__device_attach+0x4e2/0x7f0 drivers/base/dd.c:873
device_initial_probe+0x4a/0x60 drivers/base/dd.c:920
bus_probe_device+0x177/0x3d0 drivers/base/bus.c:491
device_add+0x3b0e/0x40d0 drivers/base/core.c:2680
usb_set_configuration+0x380f/0x3f10 drivers/usb/core/message.c:2032
usb_generic_driver_probe+0x138/0x300 drivers/usb/core/generic.c:241
usb_probe_device+0x311/0x490 drivers/usb/core/driver.c:272
really_probe+0xf20/0x20b0 drivers/base/dd.c:529
driver_probe_device+0x293/0x390 drivers/base/dd.c:701
__device_attach_driver+0x63f/0x830 drivers/base/dd.c:807
bus_for_each_drv+0x2ca/0x3f0 drivers/base/bus.c:431
__device_attach+0x4e2/0x7f0 drivers/base/dd.c:873
device_initial_probe+0x4a/0x60 drivers/base/dd.c:920
bus_probe_device+0x177/0x3d0 drivers/base/bus.c:491
device_add+0x3b0e/0x40d0 drivers/base/core.c:2680
usb_new_device+0x1bd4/0x2a30 drivers/usb/core/hub.c:2554
hub_port_connect drivers/usb/core/hub.c:5208 [inline]
hub_port_connect_change drivers/usb/core/hub.c:5348 [inline]
port_event drivers/usb/core/hub.c:5494 [inline]
hub_event+0x5e7b/0x8a70 drivers/usb/core/hub.c:5576
process_one_work+0x1688/0x2140 kernel/workqueue.c:2269
worker_thread+0x10bc/0x2730 kernel/workqueue.c:2415
kthread+0x551/0x590 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
Local variable ----buf.i87@smsc75xx_bind created at:
__smsc75xx_read_reg drivers/net/usb/smsc75xx.c:83 [inline]
smsc75xx_wait_ready drivers/net/usb/smsc75xx.c:968 [inline]
smsc75xx_bind+0x485/0x11e0 drivers/net/usb/smsc75xx.c:1482
__smsc75xx_read_reg drivers/net/usb/smsc75xx.c:83 [inline]
smsc75xx_wait_ready drivers/net/usb/smsc75xx.c:968 [inline]
smsc75xx_bind+0x485/0x11e0 drivers/net/usb/smsc75xx.c:1482
This issue is caused because usbnet_read_cmd() reads less bytes than requested
(zero byte in the reproducer). In this case, 'buf' is not properly filled.
This patch fixes the issue by returning -ENODATA if usbnet_read_cmd() reads
less bytes than requested.
Fixes:
d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
Reported-and-tested-by: syzbot+6966546b78d050bb0b5d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=
6966546b78d050bb0b5d
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230923173549.3284502-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Ilya Maximets [Fri, 22 Sep 2023 21:04:58 +0000 (23:04 +0200)]
ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling
Commit
b0e214d21203 ("netfilter: keep conntrack reference until
IPsecv6 policy checks are done") is a direct copy of the old
commit
b59c270104f0 ("[NETFILTER]: Keep conntrack reference until
IPsec policy checks are done") but for IPv6. However, it also
copies a bug that this old commit had. That is: when the third
packet of 3WHS connection establishment contains payload, it is
added into socket receive queue without the XFRM check and the
drop of connection tracking context.
That leads to nf_conntrack module being impossible to unload as
it waits for all the conntrack references to be dropped while
the packet release is deferred in per-cpu cache indefinitely, if
not consumed by the application.
The issue for IPv4 was fixed in commit
6f0012e35160 ("tcp: add a
missing nf_reset_ct() in 3WHS handling") by adding a missing XFRM
check and correctly dropping the conntrack context. However, the
issue was introduced to IPv6 code afterwards. Fixing it the
same way for IPv6 now.
Fixes:
b0e214d21203 ("netfilter: keep conntrack reference until IPsecv6 policy checks are done")
Link: https://lore.kernel.org/netdev/d589a999-d4dd-2768-b2d5-89dec64a4a42@ovn.org/
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230922210530.2045146-1-i.maximets@ovn.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hangbin Liu [Fri, 22 Sep 2023 07:55:08 +0000 (15:55 +0800)]
ipv4/fib: send notify when delete source address routes
After deleting an interface address in fib_del_ifaddr(), the function
scans the fib_info list for stray entries and calls fib_flush() and
fib_table_flush(). Then the stray entries will be deleted silently and no
RTM_DELROUTE notification will be sent.
This lack of notification can make routing daemons, or monitor like
`ip monitor route` miss the routing changes. e.g.
+ ip link add dummy1 type dummy
+ ip link add dummy2 type dummy
+ ip link set dummy1 up
+ ip link set dummy2 up
+ ip addr add 192.168.5.5/24 dev dummy1
+ ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5
+ ip -4 route
7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
+ ip monitor route
+ ip addr del 192.168.5.5/24 dev dummy1
Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
As Ido reminded, fib_table_flush() isn't only called when an address is
deleted, but also when an interface is deleted or put down. The lack of
notification in these cases is deliberate. And commit
7c6bb7d2faaf
("net/ipv6: Add knob to skip DELROUTE message on device down") introduced
a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send
the route delete notify blindly in fib_table_flush().
To fix this issue, let's add a new flag in "struct fib_info" to track the
deleted prefer source address routes, and only send notify for them.
After update:
+ ip monitor route
+ ip addr del 192.168.5.5/24 dev dummy1
Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
Suggested-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230922075508.848925-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Amir Goldstein [Tue, 3 Oct 2023 06:21:27 +0000 (09:21 +0300)]
ovl: fix NULL pointer defer when encoding non-decodable lower fid
A wrong return value from ovl_check_encode_origin() would cause
ovl_dentry_to_fid() to try to encode fid from NULL upper dentry.
Reported-by: syzbot+2208f82282740c1c8915@syzkaller.appspotmail.com
Fixes:
16aac5ad1fa9 ("ovl: support encoding non-decodable file handles")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Linus Torvalds [Mon, 2 Oct 2023 21:10:46 +0000 (14:10 -0700)]
Merge tag 'ubifs-for-linus-6.6-rc5' of git://git./linux/kernel/git/rw/ubifs
Pull UBI fix from Richard Weinberger:
- Don't try to attach MTDs with erase block size 0
* tag 'ubifs-for-linus-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: Refuse attaching if mtd's erasesize is 0
Linus Torvalds [Mon, 2 Oct 2023 20:52:47 +0000 (13:52 -0700)]
Merge tag 'libnvdimm-fixes-6.6-rc5' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Dave Jiang:
- Fix incorrect calculation of idt size in NFIT
* tag 'libnvdimm-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
ACPI: NFIT: Fix incorrect calculation of idt size
Linus Torvalds [Mon, 2 Oct 2023 17:39:12 +0000 (10:39 -0700)]
Merge tag 'iommu-fixes-v6.6-rc4' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Arm SMMU fixes from Will Deacon:
- Fix TLB range command encoding when TTL, Num and Scale are all zero
- Fix soft lockup by limiting TLB invalidation ops issued by SVA
- Fix clocks description for SDM630 platform in arm-smmu DT binding
- Intel VT-d fix from Lu Baolu:
- Fix a suspend/hibernation problem in iommu_suspend()
- Mediatek driver: Fix page table sharing for addresses over 4GiB
- Apple/Dart: DMA_FQ handling fix in attach_dev()
* tag 'iommu-fixes-v6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Avoid memory allocation in iommu_suspend()
iommu/apple-dart: Handle DMA_FQ domains in attach_dev()
iommu/mediatek: Fix share pgtable for iova over 4GB
iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range
dt-bindings: arm-smmu: Fix SDM630 clocks description
iommu/arm-smmu-v3: Avoid constructing invalid range commands
Amir Goldstein [Mon, 2 Oct 2023 11:21:49 +0000 (14:21 +0300)]
ovl: make use of ->layers safe in rcu pathwalk
ovl_permission() accesses ->layers[...].mnt; we can't have ->layers
freed without an RCU delay on fs shutdown.
Fortunately, kern_unmount_array() that is used to drop those mounts
does include an RCU delay, so freeing is delayed; unfortunately, the
array passed to kern_unmount_array() is formed by mangling ->layers
contents and that happens without any delays.
The ->layers[...].name string entries are used to store the strings to
display in "lowerdir=..." by ovl_show_options(). Those entries are not
accessed in RCU walk.
Move the name strings into a separate array ofs->config.lowerdirs and
reuse the ofs->config.lowerdirs array as the temporary mount array to
pass to kern_unmount_array().
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20231002023711.GP3389589@ZenIV/
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Al Viro [Mon, 2 Oct 2023 02:36:43 +0000 (03:36 +0100)]
ovl: fetch inode once in ovl_dentry_revalidate_common()
d_inode_rcu() is right - we might be in rcu pathwalk;
however, OVL_E() hides plain d_inode() on the same dentry...
Fixes:
a6ff2bc0be17 ("ovl: use OVL_E() and OVL_E_FLAGS() accessors")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Al Viro [Mon, 2 Oct 2023 02:36:13 +0000 (03:36 +0100)]
ovl: move freeing ovl_entry past rcu delay
... into ->free_inode(), that is.
Fixes:
0af950f57fef "ovl: move ovl_entry into ovl_inode"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Amir Goldstein [Mon, 2 Oct 2023 10:04:45 +0000 (13:04 +0300)]
ovl: fix file reference leak when submitting aio
Commit
724768a39374 ("ovl: fix incorrect fdput() on aio completion")
took a refcount on real file before submitting aio, but forgot to
avoid clearing FDPUT_FPUT from real.flags stack variable.
This can result in a file reference leak.
Fixes:
724768a39374 ("ovl: fix incorrect fdput() on aio completion")
Reported-by: Gil Lev <contact@levgil.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Kees Cook [Fri, 22 Sep 2023 16:50:39 +0000 (09:50 -0700)]
sky2: Make sure there is at least one frag_addr available
In the pathological case of building sky2 with 16k PAGE_SIZE, the
frag_addr[] array would never be used, so the original code was correct
that size should be 0. But the compiler now gets upset with 0 size arrays
in places where it hasn't eliminated the code that might access such an
array (it can't figure out that in this case an rx skb with fragments
would never be created). To keep the compiler happy, make sure there is
at least 1 frag_addr in struct rx_ring_info:
In file included from include/linux/skbuff.h:28,
from include/net/net_namespace.h:43,
from include/linux/netdevice.h:38,
from drivers/net/ethernet/marvell/sky2.c:18:
drivers/net/ethernet/marvell/sky2.c: In function 'sky2_rx_unmap_skb':
include/linux/dma-mapping.h:416:36: warning: array subscript i is outside array bounds of 'dma_addr_t[0]' {aka 'long long unsigned int[]'} [-Warray-bounds=]
416 | #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/marvell/sky2.c:1257:17: note: in expansion of macro 'dma_unmap_page'
1257 | dma_unmap_page(&pdev->dev, re->frag_addr[i],
| ^~~~~~~~~~~~~~
In file included from drivers/net/ethernet/marvell/sky2.c:41:
drivers/net/ethernet/marvell/sky2.h:2198:25: note: while referencing 'frag_addr'
2198 | dma_addr_t frag_addr[ETH_JUMBO_MTU >> PAGE_SHIFT];
| ^~~~~~~~~
With CONFIG_PAGE_SIZE_16KB=y, PAGE_SHIFT == 14, so:
#define ETH_JUMBO_MTU 9000
causes "ETH_JUMBO_MTU >> PAGE_SHIFT" to be 0. Use "?: 1" to solve this build warning.
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202309191958.UBw1cjXk-lkp@intel.com/
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Fri, 22 Sep 2023 12:47:41 +0000 (09:47 -0300)]
net: dsa: mv88e6xxx: Avoid EEPROM timeout when EEPROM is absent
Since commit
23d775f12dcd ("net: dsa: mv88e6xxx: Wait for EEPROM done
before HW reset") the following error is seen on a imx8mn board with
a 88E6320 switch:
mv88e6085
30be0000.ethernet-1:00: Timeout waiting for EEPROM done
This board does not have an EEPROM attached to the switch though.
This problem is well explained by Andrew Lunn:
"If there is an EEPROM, and the EEPROM contains a lot of data, it could
be that when we perform a hardware reset towards the end of probe, it
interrupts an I2C bus transaction, leaving the I2C bus in a bad state,
and future reads of the EEPROM do not work.
The work around for this was to poll the EEInt status and wait for it
to go true before performing the hardware reset.
However, we have discovered that for some boards which do not have an
EEPROM, EEInt never indicates complete. As a result,
mv88e6xxx_g1_wait_eeprom_done() spins for a second and then prints a
warning.
We probably need a different solution than calling
mv88e6xxx_g1_wait_eeprom_done(). The datasheet for 6352 documents the
EEPROM Command register:
bit 15 is:
EEPROM Unit Busy. This bit must be set to a one to start an EEPROM
operation (see EEOp below). Only one EEPROM operation can be
executing at one time so this bit must be zero before setting it to
a one. When the requested EEPROM operation completes this bit will
automatically be cleared to a zero. The transition of this bit from
a one to a zero can be used to generate an interrupt (the EEInt in
Global 1, offset 0x00).
and more interesting is bit 11:
Register Loader Running. This bit is set to one whenever the
register loader is busy executing instructions contained in the
EEPROM."
Change to using mv88e6xxx_g2_eeprom_wait() to fix the timeout error
when the EEPROM chip is not present.
Fixes:
23d775f12dcd ("net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dinghao Liu [Fri, 22 Sep 2023 09:40:44 +0000 (17:40 +0800)]
ptp: ocp: Fix error handling in ptp_ocp_device_init
When device_add() fails, ptp_ocp_dev_release() will be called
after put_device(). Therefore, it seems that the
ptp_ocp_dev_release() before put_device() is redundant.
Fixes:
773bda964921 ("ptp: ocp: Expose various resources on the timecard.")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Vadim Feodrenko <vadim.fedorenko@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 1 Oct 2023 21:15:13 +0000 (14:15 -0700)]
Linux 6.6-rc4
Linus Torvalds [Sun, 1 Oct 2023 20:48:46 +0000 (13:48 -0700)]
Merge tag 'kbuild-fixes-v6.6-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix the module compression with xz so the in-kernel decompressor
works
- Document a kconfig idiom to express an optional dependency between
modules
- Make modpost, when W=1 is given, detect broken drivers that reference
.exit.* sections
- Remove unused code
* tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: remove stale code for 'source' symlink in packaging scripts
modpost: Don't let "driver"s reference .exit.*
vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros
modpost: add missing else to the "of" check
Documentation: kbuild: explain handling optional dependencies
kbuild: Use CRC32 and a 1MiB dictionary for XZ compressed modules
Linus Torvalds [Sun, 1 Oct 2023 20:33:25 +0000 (13:33 -0700)]
Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Fourteen hotfixes, eleven of which are cc:stable. The remainder
pertain to issues which were introduced after 6.5"
* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
Crash: add lock to serialize crash hotplug handling
selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
mm, memcg: reconsider kmem.limit_in_bytes deprecation
mm: zswap: fix potential memory corruption on duplicate store
arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
mm: hugetlb: add huge page size param to set_huge_pte_at()
maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
maple_tree: add mas_is_active() to detect in-tree walks
nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
mm: abstract moving to the next PFN
mm: report success more often from filemap_map_folio_range()
fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
Linus Torvalds [Sun, 1 Oct 2023 19:50:04 +0000 (12:50 -0700)]
Merge tag 'char-misc-6.6-rc4' of git://git./linux/kernel/git/gregkh/char-misc
Pull misc driver fix from Greg KH:
"Here is a single, much requested, fix for a set of misc drivers to
resolve a much reported regression in the -rc series that has also
propagated back to the stable releases. Sorry for the delay, lots of
conference travel for a few weeks put me very far behind in patch
wrangling.
It has been reported by many to resolve the reported problem, and has
been in linux-next with no reported issues"
* tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe
Linus Torvalds [Sun, 1 Oct 2023 19:44:45 +0000 (12:44 -0700)]
Merge tag 'tty-6.6-rc4' of git://git./linux/kernel/git/gregkh/tty
Pull tty / serial driver fixes from Greg KH:
"Here are two tty/serial driver fixes for 6.6-rc4 that resolve some
reported regressions:
- revert a n_gsm change that ended up causing problems
- 8250_port fix for irq data
both have been in linux-next for over a week with no reported
problems"
* tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: n_gsm: fix UAF in gsm_cleanup_mux"
serial: 8250_port: Check IRQ data before use
Jordan Rife [Thu, 21 Sep 2023 23:46:42 +0000 (18:46 -0500)]
net: prevent address rewrite in kernel_bind()
Similar to the change in commit
0bdf399342c5("net: Avoid address
overwrite in kernel_connect"), BPF hooks run on bind may rewrite the
address passed to kernel_bind(). This change
1) Makes a copy of the bind address in kernel_bind() to insulate
callers.
2) Replaces direct calls to sock->ops->bind() in net with kernel_bind()
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
Fixes:
4fbac77d2d09 ("bpf: Hooks for sys_bind")
Cc: stable@vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jordan Rife <jrife@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jordan Rife [Thu, 21 Sep 2023 23:46:41 +0000 (18:46 -0500)]
net: prevent rewrite of msg_name in sock_sendmsg()
Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel
space may observe their value of msg_name change in cases where BPF
sendmsg hooks rewrite the send address. This has been confirmed to break
NFS mounts running in UDP mode and has the potential to break other
systems.
This patch:
1) Creates a new function called __sock_sendmsg() with same logic as the
old sock_sendmsg() function.
2) Replaces calls to sock_sendmsg() made by __sys_sendto() and
__sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy,
as these system calls are already protected.
3) Modifies sock_sendmsg() so that it makes a copy of msg_name if
present before passing it down the stack to insulate callers from
changes to the send address.
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
Fixes:
1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Cc: stable@vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jordan Rife <jrife@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jordan Rife [Thu, 21 Sep 2023 23:46:40 +0000 (18:46 -0500)]
net: replace calls to sock->ops->connect() with kernel_connect()
commit
0bdf399342c5 ("net: Avoid address overwrite in kernel_connect")
ensured that kernel_connect() will not overwrite the address parameter
in cases where BPF connect hooks perform an address rewrite. This change
replaces direct calls to sock->ops->connect() in net with kernel_connect()
to make these call safe.
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
Fixes:
d74bad4e74ee ("bpf: Hooks for sys_connect")
Cc: stable@vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jordan Rife <jrife@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Thu, 21 Sep 2023 10:41:19 +0000 (11:41 +0100)]
ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()
Including the transhdrlen in length is a problem when the packet is
partially filled (e.g. something like send(MSG_MORE) happened previously)
when appending to an IPv4 or IPv6 packet as we don't want to repeat the
transport header or account for it twice. This can happen under some
circumstances, such as splicing into an L2TP socket.
The symptom observed is a warning in __ip6_append_data():
WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800
that occurs when MSG_SPLICE_PAGES is used to append more data to an already
partially occupied skbuff. The warning occurs when 'copy' is larger than
the amount of data in the message iterator. This is because the requested
length includes the transport header length when it shouldn't. This can be
triggered by, for example:
sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP);
bind(sfd, ...); // ::1
connect(sfd, ...); // ::1 port 7
send(sfd, buffer, 4100, MSG_MORE);
sendfile(sfd, dfd, NULL, 1024);
Fix this by only adding transhdrlen into the length if the write queue is
empty in l2tp_ip6_sendmsg(), analogously to how UDP does things.
l2tp_ip_sendmsg() looks like it won't suffer from this problem as it builds
the UDP packet itself.
Fixes:
a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Reported-by: syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@google.com/
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: David Ahern <dsahern@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: netdev@vger.kernel.org
cc: bpf@vger.kernel.org
cc: syzkaller-bugs@googlegroups.com
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 1 Oct 2023 16:50:58 +0000 (09:50 -0700)]
Merge tag 'x86-urgent-2023-10-01' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a kerneldoc build warning fix, add SRSO mitigation for
AMD-derived Hygon processors, and fix a SGX kernel crash in the page
fault handler that can trigger when ksgxd races to reclaim the SECS
special page, by making the SECS page unswappable"
* tag 'x86-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race
x86/srso: Add SRSO mitigation for Hygon processors
x86/kgdb: Fix a kerneldoc warning when build with W=1
Linus Torvalds [Sun, 1 Oct 2023 16:41:58 +0000 (09:41 -0700)]
Merge tag 'timers-urgent-2023-10-01' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix a spurious kernel warning during CPU hotplug events that may
trigger when timer/hrtimer softirqs are pending, which are otherwise
hotplug-safe and don't merit a warning"
* tag 'timers-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Tag (hr)timer softirq as hotplug safe
Linus Torvalds [Sun, 1 Oct 2023 16:38:05 +0000 (09:38 -0700)]
Merge tag 'sched-urgent-2023-10-01' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a RT tasks related lockup/live-lock during CPU offlining"
* tag 'sched-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Fix live lock between select_fallback_rq() and RT push
Linus Torvalds [Sun, 1 Oct 2023 16:34:53 +0000 (09:34 -0700)]
Merge tag 'perf-urgent-2023-10-01' of git://git./linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
"Misc fixes: work around an AMD microcode bug on certain models, and
fix kexec kernel PMI handlers on AMD systems that get loaded on older
kernels that have an unexpected register state"
* tag 'perf-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd: Do not WARN() on every IRQ
perf/x86/amd/core: Fix overflow reset on hotplug
Eric Dumazet [Thu, 21 Sep 2023 09:27:13 +0000 (09:27 +0000)]
neighbour: fix data-races around n->output
n->output field can be read locklessly, while a writer
might change the pointer concurrently.
Add missing annotations to prevent load-store tearing.
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 21 Sep 2023 08:46:26 +0000 (08:46 +0000)]
net: fix possible store tearing in neigh_periodic_work()
While looking at a related syzbot report involving neigh_periodic_work(),
I found that I forgot to add an annotation when deleting an
RCU protected item from a list.
Readers use rcu_deference(*np), we need to use either
rcu_assign_pointer() or WRITE_ONCE() on writer side
to prevent store tearing.
I use rcu_assign_pointer() to have lockdep support,
this was the choice made in neigh_flush_dev().
Fixes:
767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Masahiro Yamada [Sun, 1 Oct 2023 14:03:39 +0000 (23:03 +0900)]
kbuild: remove stale code for 'source' symlink in packaging scripts
Since commit
d8131c2965d5 ("kbuild: remove $(MODLIB)/source symlink"),
modules_install does not create the 'source' symlink.
Remove the stale code from builddeb and kernel.spec.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
David S. Miller [Sun, 1 Oct 2023 13:15:29 +0000 (14:15 +0100)]
Merge tag 'for-net-2023-09-20' of git://git./linux/kernel/git/bluetooth/bluetooth
bluetooth pull request for net:
- Fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER
- Fix handling of listen for ISO unicast
- Fix build warnings
- Fix leaking content of local_codecs
- Add shutdown function for QCA6174
- Delete unused hci_req_prepare_suspend() declaration
- Fix hci_link_tx_to RCU lock usage
- Avoid redundant authentication
Signed-off-by: David S. Miller <davem@davemloft.net>
Clark Wang [Thu, 21 Sep 2023 06:24:43 +0000 (14:24 +0800)]
net: stmmac: platform: fix the incorrect parameter
The second parameter of stmmac_pltfr_init() needs the pointer of
"struct plat_stmmacenet_data". So, correct the parameter typo when calling the
function.
Otherwise, it may cause this alignment exception when doing suspend/resume.
[ 49.067201] CPU1 is up
[ 49.135258] Internal error: SP/PC alignment exception:
000000008a000000 [#1] PREEMPT SMP
[ 49.143346] Modules linked in: soc_imx9 crct10dif_ce polyval_ce nvmem_imx_ocotp_fsb_s400 polyval_generic layerscape_edac_mod snd_soc_fsl_asoc_card snd_soc_imx_audmux snd_soc_imx_card snd_soc_wm8962 el_enclave snd_soc_fsl_micfil rtc_pcf2127 rtc_pcf2131 flexcan can_dev snd_soc_fsl_xcvr snd_soc_fsl_sai imx8_media_dev(C) snd_soc_fsl_utils fuse
[ 49.173393] CPU: 0 PID: 565 Comm: sh Tainted: G C 6.5.0-rc4-next-
20230804-05047-g5781a6249dae #677
[ 49.183721] Hardware name: NXP i.MX93 11X11 EVK board (DT)
[ 49.189190] pstate:
60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 49.196140] pc : 0x80800052
[ 49.198931] lr : stmmac_pltfr_resume+0x34/0x50
[ 49.203368] sp :
ffff800082f8bab0
[ 49.206670] x29:
ffff800082f8bab0 x28:
ffff0000047d0ec0 x27:
ffff80008186c170
[ 49.213794] x26:
0000000b5e4ff1ba x25:
ffff800081e5fa74 x24:
0000000000000010
[ 49.220918] x23:
ffff800081fe0000 x22:
0000000000000000 x21:
0000000000000000
[ 49.228042] x20:
ffff0000001b4010 x19:
ffff0000001b4010 x18:
0000000000000006
[ 49.235166] x17:
ffff7ffffe007000 x16:
ffff800080000000 x15:
0000000000000000
[ 49.242290] x14:
00000000000000fc x13:
0000000000000000 x12:
0000000000000000
[ 49.249414] x11:
0000000000000001 x10:
0000000000000a60 x9 :
ffff800082f8b8c0
[ 49.256538] x8 :
0000000000000008 x7 :
0000000000000001 x6 :
000000005f54a200
[ 49.263662] x5 :
0000000001000000 x4 :
ffff800081b93680 x3 :
ffff800081519be0
[ 49.270786] x2 :
0000000080800052 x1 :
0000000000000000 x0 :
ffff0000001b4000
[ 49.277911] Call trace:
[ 49.280346] 0x80800052
[ 49.282781] platform_pm_resume+0x2c/0x68
[ 49.286785] dpm_run_callback.constprop.0+0x74/0x134
[ 49.291742] device_resume+0x88/0x194
[ 49.295391] dpm_resume+0x10c/0x230
[ 49.298866] dpm_resume_end+0x18/0x30
[ 49.302515] suspend_devices_and_enter+0x2b8/0x624
[ 49.307299] pm_suspend+0x1fc/0x348
[ 49.310774] state_store+0x80/0x104
[ 49.314258] kobj_attr_store+0x18/0x2c
[ 49.318002] sysfs_kf_write+0x44/0x54
[ 49.321659] kernfs_fop_write_iter+0x120/0x1ec
[ 49.326088] vfs_write+0x1bc/0x300
[ 49.329485] ksys_write+0x70/0x104
[ 49.332874] __arm64_sys_write+0x1c/0x28
[ 49.336783] invoke_syscall+0x48/0x114
[ 49.340527] el0_svc_common.constprop.0+0xc4/0xe4
[ 49.345224] do_el0_svc+0x38/0x98
[ 49.348526] el0_svc+0x2c/0x84
[ 49.351568] el0t_64_sync_handler+0x100/0x12c
[ 49.355910] el0t_64_sync+0x190/0x194
[ 49.359567] Code: ???????? ???????? ???????? ???????? (????????)
[ 49.365644] ---[ end trace
0000000000000000 ]---
Fixes:
97117eb51ec8 ("net: stmmac: platform: provide stmmac_pltfr_init()")
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Sat, 30 Sep 2023 16:52:04 +0000 (18:52 +0200)]
modpost: Don't let "driver"s reference .exit.*
Drivers must not reference functions marked with __exit as these likely
are not available when the code is built-in.
There are few creative offenders uncovered for example in ARCH=amd64
allmodconfig builds. So only trigger the section mismatch warning for
W=1 builds.
The dual rule that drivers must not reference .init.* is implemented
since commit
0db252452378 ("modpost: don't allow *driver to reference
.init.*") which however missed that .exit.* should be handled in the
same way.
Thanks to Masahiro Yamada and Arnd Bergmann who gave valuable hints to
find this improvement.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Masahiro Yamada [Sat, 30 Sep 2023 07:13:35 +0000 (16:13 +0900)]
vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros
Remove the left-over of commit
e24f6628811e ("modpost: remove all
traces of cpuinit/cpuexit sections").
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Mauricio Faria de Oliveira [Thu, 28 Sep 2023 20:28:07 +0000 (17:28 -0300)]
modpost: add missing else to the "of" check
Without this 'else' statement, an "usb" name goes into two handlers:
the first/previous 'if' statement _AND_ the for-loop over 'devtable',
but the latter is useless as it has no 'usb' device_id entry anyway.
Tested with allmodconfig before/after patch; no changes to *.mod.c:
git checkout v6.6-rc3
make -j$(nproc) allmodconfig
make -j$(nproc) olddefconfig
make -j$(nproc)
find . -name '*.mod.c' | cpio -pd /tmp/before
# apply patch
make -j$(nproc)
find . -name '*.mod.c' | cpio -pd /tmp/after
diff -r /tmp/before/ /tmp/after/
# no difference
Fixes:
acbef7b76629 ("modpost: fix module autoloading for OF devices with generic compatible property")
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Linus Torvalds [Sun, 1 Oct 2023 01:41:37 +0000 (18:41 -0700)]
Merge tag 'soc-fixes-6.6' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are the latest bug fixes that have come up in the soc tree. Most
of these are fairly minor. Most notably, the majority of changes this
time are not for dts files as usual.
- Updates to the addresses of the broadcom and aspeed entries in the
MAINTAINERS file.
- Defconfig updates to address a regression on samsung and a build
warning from an unknown Kconfig symbol
- Build fixes for the StrongARM and Uniphier platforms
- Code fixes for SCMI and FF-A firmware drivers, both of which had a
simple bug that resulted in invalid data, and a lesser fix for the
optee firmware driver
- Multiple fixes for the recently added loongson/loongarch "guts" soc
driver
- Devicetree fixes for RISC-V on the startfive platform, addressing
issues with NOR flash, usb and uart.
- Multiple fixes for NXP i.MX8/i.MX9 dts files, fixing problems with
clock, gpio, hdmi settings and the Makefile
- Bug fixes for i.MX firmware code and the OCOTP soc driver
- Multiple fixes for the TI sysc bus driver
- Minor dts updates for TI omap dts files, to address boot time
warnings and errors"
* tag 'soc-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
MAINTAINERS: Fix Florian Fainelli's email address
arm64: defconfig: enable syscon-poweroff driver
ARM: locomo: fix locomolcd_power declaration
soc: loongson: loongson2_guts: Remove unneeded semicolon
soc: loongson: loongson2_guts: Convert to devm_platform_ioremap_resource()
soc: loongson: loongson_pm2: Populate children syscon nodes
dt-bindings: soc: loongson,ls2k-pmc: Allow syscon-reboot/syscon-poweroff as child
soc: loongson: loongson_pm2: Drop useless of_device_id compatible
dt-bindings: soc: loongson,ls2k-pmc: Use fallbacks for ls2k-pmc compatible
soc: loongson: loongson_pm2: Add dependency for INPUT
arm64: defconfig: remove CONFIG_COMMON_CLK_NPCM8XX=y
ARM: uniphier: fix cache kernel-doc warnings
MAINTAINERS: aspeed: Update Andrew's email address
MAINTAINERS: aspeed: Update git tree URL
firmware: arm_ffa: Don't set the memory region attributes for MEM_LEND
arm64: dts: imx: Add imx8mm-prt8mm.dtb to build
arm64: dts: imx8mm-evk: Fix hdmi@3d node
soc: imx8m: Enable OCOTP clock for imx8mm before reading registers
arm64: dts: imx8mp-beacon-kit: Fix audio_pll2 clock
arm64: dts: imx8mp: Fix SDMA2/3 clocks
...
Linus Torvalds [Sun, 1 Oct 2023 01:19:02 +0000 (18:19 -0700)]
Merge tag 'trace-v6.6-rc3' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Make sure 32-bit applications using user events have aligned access
when running on a 64-bit kernel.
- Add cond_resched in the loop that handles converting enums in
print_fmt string is trace events.
- Fix premature wake ups of polling processes in the tracing ring
buffer. When a task polls waiting for a percentage of the ring buffer
to be filled, the writer still will wake it up at every event. Add
the polling's percentage to the "shortest_full" list to tell the
writer when to wake it up.
- For eventfs dir lookups on dynamic events, an event system's only
event could be removed, leaving its dentry with no children. This is
totally legitimate. But in eventfs_release() it must not access the
children array, as it is only allocated when the dentry has children.
* tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Test for dentries array allocated in eventfs_release()
tracing/user_events: Align set_bit() address for all archs
tracing: relax trace_event_eval_update() execution with cond_resched()
ring-buffer: Update "shortest_full" in polling
Steven Rostedt (Google) [Sat, 30 Sep 2023 13:01:06 +0000 (09:01 -0400)]
eventfs: Test for dentries array allocated in eventfs_release()
The dcache_dir_open_wrapper() could be called when a dynamic event is
being deleted leaving a dentry with no children. In this case the
dlist->dentries array will never be allocated. This needs to be checked
for in eventfs_release(), otherwise it will trigger a NULL pointer
dereference.
Link: https://lore.kernel.org/linux-trace-kernel/20230930090106.1c3164e9@rorschach.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes:
ef36b4f92868 ("eventfs: Remember what dentries were created on dir open")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Beau Belgrave [Mon, 25 Sep 2023 23:08:28 +0000 (23:08 +0000)]
tracing/user_events: Align set_bit() address for all archs
All architectures should use a long aligned address passed to set_bit().
User processes can pass either a 32-bit or 64-bit sized value to be
updated when tracing is enabled when on a 64-bit kernel. Both cases are
ensured to be naturally aligned, however, that is not enough. The
address must be long aligned without affecting checks on the value
within the user process which require different adjustments for the bit
for little and big endian CPUs.
Add a compat flag to user_event_enabler that indicates when a 32-bit
value is being used on a 64-bit kernel. Long align addresses and correct
the bit to be used by set_bit() to account for this alignment. Ensure
compat flags are copied during forks and used during deletion clears.
Link: https://lore.kernel.org/linux-trace-kernel/20230925230829.341-2-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/20230914131102.179100-1-cleger@rivosinc.com/
Cc: stable@vger.kernel.org
Fixes:
7235759084a4 ("tracing/user_events: Use remote writes for event enablement")
Reported-by: Clément Léger <cleger@rivosinc.com>
Suggested-by: Clément Léger <cleger@rivosinc.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Clément Léger [Fri, 29 Sep 2023 19:16:37 +0000 (21:16 +0200)]
tracing: relax trace_event_eval_update() execution with cond_resched()
When kernel is compiled without preemption, the eval_map_work_func()
(which calls trace_event_eval_update()) will not be preempted up to its
complete execution. This can actually cause a problem since if another
CPU call stop_machine(), the call will have to wait for the
eval_map_work_func() function to finish executing in the workqueue
before being able to be scheduled. This problem was observe on a SMP
system at boot time, when the CPU calling the initcalls executed
clocksource_done_booting() which in the end calls stop_machine(). We
observed a 1 second delay because one CPU was executing
eval_map_work_func() and was not preempted by the stop_machine() task.
Adding a call to cond_resched() in trace_event_eval_update() allows
other tasks to be executed and thus continue working asynchronously
like before without blocking any pending task at boot time.
Link: https://lore.kernel.org/linux-trace-kernel/20230929191637.416931-1-cleger@rivosinc.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt (Google) [Fri, 29 Sep 2023 22:01:13 +0000 (18:01 -0400)]
ring-buffer: Update "shortest_full" in polling
It was discovered that the ring buffer polling was incorrectly stating
that read would not block, but that's because polling did not take into
account that reads will block if the "buffer-percent" was set. Instead,
the ring buffer polling would say reads would not block if there was any
data in the ring buffer. This was incorrect behavior from a user space
point of view. This was fixed by commit
42fb0a1e84ff by having the polling
code check if the ring buffer had more data than what the user specified
"buffer percent" had.
The problem now is that the polling code did not register itself to the
writer that it wanted to wait for a specific "full" value of the ring
buffer. The result was that the writer would wake the polling waiter
whenever there was a new event. The polling waiter would then wake up, see
that there's not enough data in the ring buffer to notify user space and
then go back to sleep. The next event would wake it up again.
Before the polling fix was added, the code would wake up around 100 times
for a hackbench 30 benchmark. After the "fix", due to the constant waking
of the writer, it would wake up over 11,0000 times! It would never leave
the kernel, so the user space behavior was still "correct", but this
definitely is not the desired effect.
To fix this, have the polling code add what it's waiting for to the
"shortest_full" variable, to tell the writer not to wake it up if the
buffer is not as full as it expects to be.
Note, after this fix, it appears that the waiter is now woken up around 2x
the times it was before (~200). This is a tremendous improvement from the
11,000 times, but I will need to spend some time to see why polling is
more aggressive in its wakeups than the read blocking code.
Link: https://lore.kernel.org/linux-trace-kernel/20230929180113.01c2cae3@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes:
42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Tested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Sat, 30 Sep 2023 18:07:26 +0000 (11:07 -0700)]
Merge tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- fix the narea calculation in swiotlb initialization (Ross Lagerwall)
- fix the check whether a device has used swiotlb (Petr Tesarik)
* tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix the check whether a device has used software IO TLB
swiotlb: use the calculated number of areas
Linus Torvalds [Sat, 30 Sep 2023 18:01:38 +0000 (11:01 -0700)]
Merge tag 'iomap-6.6-fixes-4' of git://git./fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
- Handle a race between writing and shrinking block devices by
returning EIO
- Fix a typo in a comment
* tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: Spelling s/preceeding/preceding/g
iomap: add a workaround for racy i_size updates on block devices
Linus Torvalds [Sat, 30 Sep 2023 17:07:33 +0000 (10:07 -0700)]
Merge tag 'i2c-for-6.6-rc4' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Usual business: a driver fix, a DT fix, a minor core fix"
* tag 'i2c-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: npcm7xx: Fix callback completion ordering
i2c: mux: Avoid potential false error message in i2c_mux_add_adapter
dt-bindings: i2c: mxs: Pass ref and 'unevaluatedProperties: false'
Linus Torvalds [Sat, 30 Sep 2023 16:59:37 +0000 (09:59 -0700)]
Merge tag 'acpi-6.6-rc4' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a possible NULL pointer dereference in the error path of
acpi_video_bus_add() resulting from recent changes (Dinghao Liu)"
* tag 'acpi-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Fix NULL pointer dereference in acpi_video_bus_add()
Linus Torvalds [Sat, 30 Sep 2023 16:53:09 +0000 (09:53 -0700)]
Merge tag 'powerpc-6.6-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix arch_stack_walk_reliable(), used by live patching
- Fix powerpc selftests to work with run_kselftest.sh
Thanks to Joe Lawrence and Petr Mladek.
* tag 'powerpc-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix emit_tests to work with run_kselftest.sh
powerpc/stacktrace: Fix arch_stack_walk_reliable()
Linus Torvalds [Sat, 30 Sep 2023 16:44:48 +0000 (09:44 -0700)]
Merge tag 'nfsd-6.6-2' of git://git./linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix NFSv4 READ corner case
* tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
Hou Tao [Thu, 28 Sep 2023 10:15:58 +0000 (18:15 +0800)]
bpf: Use kmalloc_size_roundup() to adjust size_index
Commit
d52b59315bf5 ("bpf: Adjust size_index according to the value of
KMALLOC_MIN_SIZE") uses KMALLOC_MIN_SIZE to adjust size_index, but as
reported by Nathan, the adjustment is not enough, because
__kmalloc_minalign() also decides the minimal alignment of slab object
as shown in new_kmalloc_cache() and its value may be greater than
KMALLOC_MIN_SIZE (e.g., 64 bytes vs 8 bytes under a riscv QEMU VM).
Instead of invoking __kmalloc_minalign() in bpf subsystem to find the
maximal alignment, just using kmalloc_size_roundup() directly to get the
corresponding slab object size for each allocation size. If these two
sizes are unmatched, adjust size_index to select a bpf_mem_cache with
unit_size equal to the object_size of the underlying slab cache for the
allocation size.
Fixes:
822fb26bdb55 ("bpf: Add a hint to allocated objects.")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/bpf/
20230914181407.GA1000274@dev-arch.thelio-3990X/
Signed-off-by: Hou Tao <houtao1@huawei.com>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://lore.kernel.org/r/20230928101558.2594068-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Sat, 30 Sep 2023 16:39:23 +0000 (09:39 -0700)]
Merge tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
"Fix for password freeing potential oops (also for stable)"
* tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
fs/smb/client: Reset password pointer to NULL
Baoquan He [Tue, 26 Sep 2023 12:09:05 +0000 (20:09 +0800)]
Crash: add lock to serialize crash hotplug handling
Eric reported that handling corresponding crash hotplug event can be
failed easily when many memory hotplug event are notified in a short
period. They failed because failing to take __kexec_lock.
=======
[ 78.714569] Fallback order for Node 0: 0
[ 78.714575] Built 1 zonelists, mobility grouping on. Total pages: 1817886
[ 78.717133] Policy zone: Normal
[ 78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 80.056643] PEFILE: Unsigned PE binary
=======
The memory hotplug events are notified very quickly and very many, while
the handling of crash hotplug is much slower relatively. So the atomic
variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.
Here, add a new mutex lock __crash_hotplug_lock to serialize crash hotplug
handling specifically. This doesn't impact the usage of __kexec_lock.
Link: https://lkml.kernel.org/r/20230926120905.392903-1-bhe@redhat.com
Fixes:
247262756121 ("crash: add generic infrastructure for crash hotplug support")
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>