Masami Hiramatsu (Google) [Tue, 11 Jul 2023 14:15:57 +0000 (23:15 +0900)]
Revert "tracing: Add "(fault)" name injection to kernel probes"
This reverts commit
2e9906f84fc7c99388bb7123ade167250d50f1c0.
It was turned out that commit
2e9906f84fc7 ("tracing: Add "(fault)"
name injection to kernel probes") did not work correctly and probe
events still show just '(fault)' (instead of '"(fault)"'). Also,
current '(fault)' is more explicit that it faulted.
This also moves FAULT_STRING macro to trace.h so that synthetic
event can keep using it, and uses it in trace_probe.c too.
Link: https://lore.kernel.org/all/168908495772.123124.1250788051922100079.stgit@devnote2/
Link: https://lore.kernel.org/all/20230706230642.3793a593@rorschach.local.home/
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Tue, 11 Jul 2023 14:15:48 +0000 (23:15 +0900)]
tracing/probes: Fix to update dynamic data counter if fetcharg uses it
Fix to update dynamic data counter ('dyndata') and max length ('maxlen')
only if the fetcharg uses the dynamic data. Also get out arg->dynamic
from unlikely(). This makes dynamic data address wrong if
process_fetch_insn() returns error on !arg->dynamic case.
Link: https://lore.kernel.org/all/168908494781.123124.8160245359962103684.stgit@devnote2/
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20230710233400.5aaf024e@gandalf.local.home/
Fixes:
9178412ddf5a ("tracing: probeevent: Return consumed bytes of dynamic area")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Tue, 11 Jul 2023 14:15:38 +0000 (23:15 +0900)]
tracing/probes: Fix not to count error code to total length
Fix not to count the error code (which is minus value) to the total
used length of array, because it can mess up the return code of
process_fetch_insn_bottom(). Also clear the 'ret' value because it
will be used for calculating next data_loc entry.
Link: https://lore.kernel.org/all/168908493827.123124.2175257289106364229.stgit@devnote2/
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/
8819b154-2ba1-43c3-98a2-
cbde20892023@moroto.mountain/
Fixes:
9b960a38835f ("tracing: probeevent: Unify fetch_insn processing common part")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Tue, 11 Jul 2023 14:15:29 +0000 (23:15 +0900)]
tracing/probes: Fix to avoid double count of the string length on the array
If an array is specified with the ustring or symstr, the length of the
strings are accumlated on both of 'ret' and 'total', which means the
length is double counted.
Just set the length to the 'ret' value for avoiding double counting.
Link: https://lore.kernel.org/all/168908492917.123124.15076463491122036025.stgit@devnote2/
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/
8819b154-2ba1-43c3-98a2-
cbde20892023@moroto.mountain/
Fixes:
88903c464321 ("tracing/probe: Add ustring type for user-space string")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Fri, 7 Jul 2023 16:38:03 +0000 (01:38 +0900)]
fprobes: Add a comment why fprobe_kprobe_handler exits if kprobe is running
Add a comment the reason why fprobe_kprobe_handler() exits if any other
kprobe is running.
Link: https://lore.kernel.org/all/168874788299.159442.2485957441413653858.stgit@devnote2/
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20230706120916.3c6abf15@gandalf.local.home/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Masami Hiramatsu (Google) [Fri, 7 Jul 2023 14:03:19 +0000 (23:03 +0900)]
fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
Ensure running fprobe_exit_handler() has finished before
calling rethook_free() in the unregister_fprobe() so that caller can free
the fprobe right after unregister_fprobe().
unregister_fprobe() ensured that all running fprobe_entry/exit_handler()
have finished by calling unregister_ftrace_function() which synchronizes
RCU. But commit
5f81018753df ("fprobe: Release rethook after the ftrace_ops
is unregistered") changed to call rethook_free() after
unregister_ftrace_function(). So call rethook_stop() to make rethook
disabled before unregister_ftrace_function() and ensure it again.
Here is the possible code flow that can call the exit handler after
unregister_fprobe().
------
CPU1 CPU2
call unregister_fprobe(fp)
...
__fprobe_handler()
rethook_hook() on probed function
unregister_ftrace_function()
return from probed function
rethook hooks
find rh->handler == fprobe_exit_handler
call fprobe_exit_handler()
rethook_free():
set rh->handler = NULL;
return from unreigster_fprobe;
call fp->exit_handler() <- (*)
------
(*) At this point, the exit handler is called after returning from
unregister_fprobe().
This fixes it as following;
------
CPU1 CPU2
call unregister_fprobe()
...
rethook_stop():
set rh->handler = NULL;
__fprobe_handler()
rethook_hook() on probed function
unregister_ftrace_function()
return from probed function
rethook hooks
find rh->handler == NULL
return from rethook
rethook_free()
return from unreigster_fprobe;
------
Link: https://lore.kernel.org/all/168873859949.156157.13039240432299335849.stgit@devnote2/
Fixes:
5f81018753df ("fprobe: Release rethook after the ftrace_ops is unregistered")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Li zeming [Tue, 11 Jul 2023 18:53:53 +0000 (02:53 +0800)]
kernel: kprobes: Remove unnecessary ‘0’ values
it is assigned first, so it does not need to initialize the assignment.
Link: https://lore.kernel.org/all/20230711185353.3218-1-zeming@nfschina.com/
Signed-off-by: Li zeming <zeming@nfschina.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Li zeming [Tue, 4 Jul 2023 19:43:59 +0000 (03:43 +0800)]
kprobes: Remove unnecessary ‘NULL’ values from correct_ret_addr
The 'correct_ret_addr' pointer is always set in the later code, no need
to initialize it at definition time.
Link: https://lore.kernel.org/all/20230704194359.3124-1-zeming@nfschina.com/
Signed-off-by: Li zeming <zeming@nfschina.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Ze Gao [Mon, 3 Jul 2023 09:23:36 +0000 (17:23 +0800)]
fprobe: add unlock to match a succeeded ftrace_test_recursion_trylock
Unlock ftrace recursion lock when fprobe_kprobe_handler() is failed
because of some running kprobe.
Link: https://lore.kernel.org/all/20230703092336.268371-1-zegao@tencent.com/
Fixes:
3cc4e2c5fbae ("fprobe: make fprobe_kprobe_handler recursion free")
Reported-by: Yafang <laoar.shao@gmail.com>
Closes: https://lore.kernel.org/linux-trace-kernel/CALOAHbC6UpfFOOibdDiC7xFc5YFUgZnk3MZ=3Ny6we=AcrNbew@mail.gmail.com/
Signed-off-by: Ze Gao <zegao@tencent.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tzvetomir Stoyanov (VMware) [Mon, 3 Jul 2023 04:28:53 +0000 (07:28 +0300)]
kernel/trace: Fix cleanup logic of enable_trace_eprobe
The enable_trace_eprobe() function enables all event probes, attached
to given trace probe. If an error occurs in enabling one of the event
probes, all others should be roll backed. There is a bug in that roll
back logic - instead of all event probes, only the failed one is
disabled.
Link: https://lore.kernel.org/all/20230703042853.1427493-1-tz.stoyanov@gmail.com/
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes:
7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Jiri Olsa [Thu, 15 Jun 2023 11:52:36 +0000 (13:52 +0200)]
fprobe: Release rethook after the ftrace_ops is unregistered
While running bpf selftests it's possible to get following fault:
general protection fault, probably for non-canonical address \
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
...
Call Trace:
<TASK>
fprobe_handler+0xc1/0x270
? __pfx_bpf_testmod_init+0x10/0x10
? __pfx_bpf_testmod_init+0x10/0x10
? bpf_fentry_test1+0x5/0x10
? bpf_fentry_test1+0x5/0x10
? bpf_testmod_init+0x22/0x80
? do_one_initcall+0x63/0x2e0
? rcu_is_watching+0xd/0x40
? kmalloc_trace+0xaf/0xc0
? do_init_module+0x60/0x250
? __do_sys_finit_module+0xac/0x120
? do_syscall_64+0x37/0x90
? entry_SYSCALL_64_after_hwframe+0x72/0xdc
</TASK>
In unregister_fprobe function we can't release fp->rethook while it's
possible there are some of its users still running on another cpu.
Moving rethook_free call after fp->ops is unregistered with
unregister_ftrace_function call.
Link: https://lore.kernel.org/all/20230615115236.3476617-1-jolsa@kernel.org/
Fixes:
5b0ab78998e3 ("fprobe: Add exit_handler support")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Linus Torvalds [Sun, 25 Jun 2023 23:29:58 +0000 (16:29 -0700)]
Linux 6.4
Linus Torvalds [Sun, 25 Jun 2023 22:36:01 +0000 (15:36 -0700)]
Merge tag 'i2c-for-6.4-rc8' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Nothing fancy. Two driver and one DT binding fix"
* tag 'i2c-for-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle
i2c: qup: Add missing unwind goto in qup_i2c_probe()
dt-bindings: i2c: opencores: Add missing type for "regstep"
Linus Torvalds [Sun, 25 Jun 2023 17:13:17 +0000 (10:13 -0700)]
Merge tag 'perf_urgent_for_v6.4' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Drop the __weak attribute from a function prototype as it otherwise
leads to the function getting replaced by a dummy stub
- Fix the umask value setup of the frontend event as former is
different on two Intel cores
* tag 'perf_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL
perf/core: Drop __weak attribute from arch_perf_update_userpage() prototype
Linus Torvalds [Sun, 25 Jun 2023 17:00:17 +0000 (10:00 -0700)]
Merge tag 'objtool_urgent_for_v6.4' of git://git./linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Add a ORC format hash to vmlinux and modules in order for other tools
which use it, to detect changes to it and adapt accordingly
* tag 'objtool_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Add ELF section with ORC version identifier
Linus Torvalds [Sun, 25 Jun 2023 16:47:04 +0000 (09:47 -0700)]
Merge tag 'x86_urgent_for_v6.4' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Do not use set_pgd() when updating the KASLR trampoline pgd entry
because that updates the user PGD too on KPTI builds, resulting in
memory corruption
- Prevent a panic in the IO-APIC setup code due to conflicting command
line parameters
* tag 'x86_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys
x86/mm: Avoid using set_pgd() outside of real PGD pages
Linus Torvalds [Fri, 23 Jun 2023 23:33:26 +0000 (16:33 -0700)]
Merge tag 'drm-fixes-2023-06-23' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Very quiet last week, just two misc fixes, one dp-mst and one qaic:
qaic:
- dma-buf import fix
dp-mst:
- fix NULL ptr deref"
[ It turns out it was a quiet week because Alex Deucher hadn't sent in
his pending AMD changes. So they are coming next - Linus ]
* tag 'drm-fixes-2023-06-23' of git://anongit.freedesktop.org/drm/drm:
drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
accel/qaic: Call DRM helper function to destroy prime GEM
Linus Torvalds [Fri, 23 Jun 2023 23:21:59 +0000 (16:21 -0700)]
Merge tag 'arm-fixes-6.4-3' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"The final bug fixes for Qualcomm and Rockchips came in, all of them
for devicetree files:
- Devices on Qualcomm SC7180/SC7280 that are cache coherent are now
marked so correctly to fix a regression after a change in kernel
behavior
- Rockchips has a few minor changes for correctness of regulator and
cache properties, as well as fixes for incorrect behavior of the
RK3568 PCI controller and reset pins on two boards"
* tag 'arm-fixes-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent
arm64: dts: rockchip: Fix rk356x PCIe register and range mappings
arm64: dts: rockchip: fix button reset pin for nanopi r5c
arm64: dts: rockchip: fix nEXTRST on SOQuartz
arm64: dts: rockchip: add missing cache properties
arm64: dts: rockchip: fix USB regulator on ROCK64
Linus Torvalds [Fri, 23 Jun 2023 23:09:53 +0000 (16:09 -0700)]
Merge tag 'for-6.4-rc7-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"Unfortunately the recent u32 overflow fix was not complete, there was
one conversion left, assertion not triggered by my tests but caught by
Qu's fstests case.
The "cleanup for later" has been promoted to a proper fix and wraps
all uses of the stripe left shift so the diffstat has grown but leaves
no potentially problematic uses.
We should have done it that way before, sorry"
* tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix remaining u32 overflows when left shifting stripe_nr
Linus Torvalds [Fri, 23 Jun 2023 23:04:35 +0000 (16:04 -0700)]
Merge tag 'block-6.4-2023-06-23' of git://git.kernel.dk/linux
Pull block fix from Jens Axboe:
"It's apparently the week of 'fixup something from last week', because
the same is true for this block pull request.
Fix up a lock grab that needs to be IRQ saving, rather than just IRQ
disabling, in the block cgroup code"
* tag 'block-6.4-2023-06-23' of git://git.kernel.dk/linux:
block: make sure local irq is disabled when calling __blkcg_rstat_flush
Linus Torvalds [Fri, 23 Jun 2023 22:56:44 +0000 (15:56 -0700)]
Merge tag 'iommu-fix-v6.4-rc7' of git://git./linux/kernel/git/joro/iommu
Pull iommu fix from Joerg Roedel:
- Fix potential memory leak in AMD IOMMU domain allocation path
* tag 'iommu-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix possible memory leak of 'domain'
Linus Torvalds [Fri, 23 Jun 2023 22:43:01 +0000 (15:43 -0700)]
Merge tag 'sound-6.4' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Three oneliner fixes: one for a thinko in SOF SoundWire code and two
HD-audio quirks for ASUS laptops. All device-specific and should be
safe to apply"
* tag 'sound-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for ASUS ROG GV601V
ALSA: hda/realtek: Add quirk for ASUS ROG G634Z
ASoC: intel: sof_sdw: Fixup typo in device link checking
Linus Torvalds [Fri, 23 Jun 2023 22:24:09 +0000 (15:24 -0700)]
Merge tag 'gpio-fixes-for-v6.4' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix IRQ initialization in gpiochip_irqchip_add_domain()
- add a missing return value check for platform_get_irq() in
gpio-sifive
- don't free irq_domains which GPIOLIB does not manage
* tag 'gpio-fixes-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Fix irq_domain resource tracking for gpiochip_irqchip_add_domain()
gpio: sifive: add missing check for platform_get_irq
gpiolib: Fix GPIO chip IRQ initialization restriction
Arnd Bergmann [Fri, 23 Jun 2023 20:13:22 +0000 (22:13 +0200)]
Merge tag 'qcom-arm64-fixes-for-6.4-2' of https://git./linux/kernel/git/qcom/linux into arm/fixes
One last Qualcomm ARM64 DeviceTree fix for v6.4
Changes related to cache management for DMA memory caused WiFi to stop
work on SC7180 and SC7280 based products, using TF-A. These changes
marks the relevant device dma-coherent to correct the behavior.
* tag 'qcom-arm64-fixes-for-6.4-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent
Link: https://lore.kernel.org/r/20230622203248.106422-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Fri, 23 Jun 2023 19:08:14 +0000 (12:08 -0700)]
workqueue: clean up WORK_* constant types, clarify masking
Dave Airlie reports that gcc-13.1.1 has started complaining about some
of the workqueue code in 32-bit arm builds:
kernel/workqueue.c: In function ‘get_work_pwq’:
kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
| ^
[ ... a couple of other cases ... ]
and while it's not immediately clear exactly why gcc started complaining
about it now, I suspect it's some C23-induced enum type handlign fixup in
gcc-13 is the cause.
Whatever the reason for starting to complain, the code and data types
are indeed disgusting enough that the complaint is warranted.
The wq code ends up creating various "helper constants" (like that
WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of
confused. The mask needs to be 'unsigned long', not some unspecified
enum type.
To make matters worse, the actual "mask and cast to a pointer" is
repeated a couple of times, and the cast isn't even always done to the
right pointer, but - as the error case above - to a 'void *' with then
the compiler finishing the job.
That's now how we roll in the kernel.
So create the masks using the proper types rather than some ambiguous
enumeration, and use a nice helper that actually does the type
conversion in one well-defined place.
Incidentally, this magically makes clang generate better code. That,
admittedly, is really just a sign of clang having been seriously
confused before, and cleaning up the typing unconfuses the compiler too.
Reported-by: Dave Airlie <airlied@gmail.com>
Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clark Wang [Mon, 29 May 2023 08:02:51 +0000 (16:02 +0800)]
i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle
Claim clkhi and clklo as integer type to avoid possible calculation
errors caused by data overflow.
Fixes:
a55fa9d0e42e ("i2c: imx-lpi2c: add low power i2c bus driver")
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Shuai Jiang [Tue, 18 Apr 2023 13:56:12 +0000 (21:56 +0800)]
i2c: qup: Add missing unwind goto in qup_i2c_probe()
Smatch Warns:
drivers/i2c/busses/i2c-qup.c:1784 qup_i2c_probe()
warn: missing unwind goto?
The goto label "fail_runtime" and "fail" will disable qup->pclk,
but here qup->pclk failed to obtain, in order to be consistent,
change the direct return to goto label "fail_dma".
Fixes:
9cedf3b2f099 ("i2c: qup: Add bam dma capabilities")
Signed-off-by: Shuai Jiang <d202180596@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Cc: <stable@vger.kernel.org> # v4.6+
Rob Herring [Tue, 13 Jun 2023 20:11:04 +0000 (14:11 -0600)]
dt-bindings: i2c: opencores: Add missing type for "regstep"
"regstep" may be deprecated, but it still needs a type.
Fixes:
8ad69f490516 ("dt-bindings: i2c: convert ocores binding to yaml")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Peter Korsgaard <peter@korsgaard.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Dave Airlie [Fri, 23 Jun 2023 02:16:47 +0000 (12:16 +1000)]
Merge tag 'drm-misc-fixes-2023-06-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.4:
- Qaic imported dma-buf fix.
- Fix null pointer deref when printing a dp-mst message.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <dev@lankhorst.se>
Link: https://patchwork.freedesktop.org/patch/msgid/e96b1965-ba67-7cc5-2358-826eb5b9b998@lankhorst.se
Linus Torvalds [Fri, 23 Jun 2023 00:59:51 +0000 (17:59 -0700)]
Merge tag 'net-6.4-rc8' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from ipsec, bpf, mptcp and netfilter.
Current release - regressions:
- netfilter: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
- eth: mlx5e:
- fix scheduling of IPsec ASO query while in atomic
- free IRQ rmap and notifier on kernel shutdown
Current release - new code bugs:
- phy: manual remove LEDs to ensure correct ordering
Previous releases - regressions:
- mptcp: fix possible divide by zero in recvmsg()
- dsa: revert "net: phy: dp83867: perform soft reset and retain
established link"
Previous releases - always broken:
- sched: netem: acquire qdisc lock in netem_change()
- bpf:
- fix verifier id tracking of scalars on spill
- fix NULL dereference on exceptions
- accept function names that contain dots
- netfilter: disallow element updates of bound anonymous sets
- mptcp: ensure listener is unhashed before updating the sk status
- xfrm:
- add missed call to delete offloaded policies
- fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets
- selftests: fixes for FIPS mode
- dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling
- eth: sfc: use budget for TX completions
Misc:
- wifi: iwlwifi: add support for SO-F device with PCI id 0x7AF0"
* tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits)
revert "net: align SO_RCVMARK required privileges with SO_MARK"
net: wwan: iosm: Convert single instance struct member to flexible array
sch_netem: acquire qdisc lock in netem_change()
selftests: forwarding: Fix race condition in mirror installation
wifi: mac80211: report all unusable beacon frames
mptcp: ensure listener is unhashed before updating the sk status
mptcp: drop legacy code around RX EOF
mptcp: consolidate fallback and non fallback state machine
mptcp: fix possible list corruption on passive MPJ
mptcp: fix possible divide by zero in recvmsg()
mptcp: handle correctly disconnect() failures
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
Revert "net: phy: dp83867: perform soft reset and retain established link"
net: mdio: fix the wrong parameters
netfilter: nf_tables: Fix for deleting base chains with payload
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: drop module reference after updating chain
netfilter: nf_tables: disallow timeout for anonymous sets
netfilter: nf_tables: disallow updates of anonymous sets
...
Linus Torvalds [Fri, 23 Jun 2023 00:54:10 +0000 (17:54 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Correctly save/restore PMUSERNR_EL0 when host userspace is using
PMU counters directly
- Fix GICv2 emulation on GICv3 after the locking rework
- Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
document why
Generic:
- Avoid setting page table entries pointing to a deleted memslot if a
host page table entry is changed concurrently with the deletion"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Avoid illegal stage2 mapping on invalid memory slot
KVM: arm64: Use raw_smp_processor_id() in kvm_pmu_probe_armpmu()
KVM: arm64: Restore GICv2-on-GICv3 functionality
KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded
KVM: arm64: PMU: Restore the host's PMUSERENR_EL0
Linus Torvalds [Fri, 23 Jun 2023 00:49:40 +0000 (17:49 -0700)]
Merge tag 'powerpc-6.4-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
- Disable IRQs when switching mm in exit_lazy_flush_tlb() called from
exit_mmap()
Thanks to Nicholas Piggin and Sachin Sant.
* tag 'powerpc-6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs enabled
Linus Torvalds [Fri, 23 Jun 2023 00:47:07 +0000 (17:47 -0700)]
Merge tag 'pci-v6.4-fixes-2' of git://git./linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Transfer Intel LGM GW PCIe maintenance from Rahul Tanwar to Chuanhua
Lei (Zhu YiXin)
* tag 'pci-v6.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Chuanhua Lei as Intel LGM GW PCIe maintainer
Linus Torvalds [Fri, 23 Jun 2023 00:42:07 +0000 (17:42 -0700)]
Merge tag 'mmc-v6.4-rc6' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- Fix support for deferred probing for several host drivers
- litex_mmc: Use async probe as it's common for all mmc hosts
- meson-gx: Fix bug when scheduling while atomic
- mmci_stm32: Fix max busy timeout calculation
- sdhci-msm: Disable broken 64-bit DMA on MSM8916
* tag 'mmc-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: usdhi60rol0: fix deferred probing
mmc: sunxi: fix deferred probing
mmc: sh_mmcif: fix deferred probing
mmc: sdhci-spear: fix deferred probing
mmc: sdhci-acpi: fix deferred probing
mmc: owl: fix deferred probing
mmc: omap_hsmmc: fix deferred probing
mmc: omap: fix deferred probing
mmc: mvsdio: fix deferred probing
mmc: mtk-sd: fix deferred probing
mmc: meson-gx: fix deferred probing
mmc: bcm2835: fix deferred probing
mmc: litex_mmc: set PROBE_PREFER_ASYNCHRONOUS
mmc: meson-gx: remove redundant mmc_request_done() call from irq context
mmc: mmci: stm32: fix max busy timeout calculation
mmc: sdhci-msm: Disable broken 64-bit DMA on MSM8916
Linus Torvalds [Fri, 23 Jun 2023 00:38:11 +0000 (17:38 -0700)]
Merge tag 'platform-drivers-x86-v6.4-5' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fix from Hans de Goede:
"One small fix for an AMD PMF driver issue which is causing issues for
users of just released AMD laptop models"
* tag 'platform-drivers-x86-v6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/amd/pmf: Register notify handler only if SPS is enabled
Linus Torvalds [Fri, 23 Jun 2023 00:32:34 +0000 (17:32 -0700)]
Merge tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"A fix for a race condition with poll removal and linked timeouts, and
then a few followup fixes/tweaks for the msg_control patch from last
week.
Not super important, particularly the sparse fixup, as it was broken
before that recent commit. But let's get it sorted for real for this
release, rather than just have it broken a bit differently"
* tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux:
io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr
io_uring/net: disable partial retries for recvmsg with cmsg
io_uring/net: clear msg_controllen on partial sendmsg retry
io_uring/poll: serialize poll linked timer start with poll removal
Linus Torvalds [Fri, 23 Jun 2023 00:27:16 +0000 (17:27 -0700)]
Merge tag 'cgroup-for-6.4-rc7-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"It's late but here are two bug fixes. Both fix problems which can be
severe but are very confined in scope. The risk to most use cases
should be minimal.
- Fix for an old bug which triggers if a cgroup subsystem is
remounted to a different hierarchy while someone is reading its
cgroup.procs/tasks file. The risk is pretty low given how seldom
cgroup subsystems are moved across hierarchies.
- We moved cpus_read_lock() outside of cgroup internal locks a while
ago but forgot to update the legacy_freezer leading to lockdep
triggers. Fixed"
* tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Do not corrupt task iteration when rebinding subsystem
cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}()
Paolo Bonzini [Thu, 22 Jun 2023 19:28:26 +0000 (15:28 -0400)]
Merge tag 'kvmarm-fixes-6.4-4' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.4, take #4
- Correctly save/restore PMUSERNR_EL0 when host userspace is using
PMU counters directly
- Fix GICv2 emulation on GICv3 after the locking rework
- Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
document why...
Douglas Anderson [Fri, 16 Jun 2023 15:14:41 +0000 (08:14 -0700)]
arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
Just like for sc7180 devices using the Chrome bootflow (AKA trogdor
and IDP), sc7280 devices using the Chrome bootflow also need their
firmware marked dma-coherent. On sc7280 this wasn't causing WiFi to
fail to startup, since WiFi works differently there. However, on
sc7280 devices we were still getting the message at bootup after
commit
7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache
invalidation from arch_dma_prep_coherent()"""):
qcom_scm firmware:scm: Assign memory protection call failed -22
qcom_rmtfs_mem
9c900000.memory: assign memory failed
qcom_rmtfs_mem: probe of
9c900000.memory failed with error -22
We should mark SCM properly just like we did for trogdor.
Fixes:
7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
Fixes:
7a1f4e7f740d ("arm64: dts: qcom: sc7280: Add basic dts/dtsi files for sc7280 soc")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230616081440.v2.4.I21dc14a63327bf81c6bb58fe8ed91dbdc9849ee2@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Douglas Anderson [Fri, 16 Jun 2023 15:14:40 +0000 (08:14 -0700)]
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
Trogdor devices use firmware backed by TF-A instead of Qualcomm's
normal TZ. On TF-A we end up mapping memory as cacheable.
Specifically, you can see in Trogdor's TF-A code [1] in
qti_sip_mem_assign() that we call qti_mmap_add_dynamic_region() with
MT_RO_DATA. This translates down to MT_MEMORY instead of
MT_NON_CACHEABLE or MT_DEVICE. Apparently Qualcomm's normal TZ
implementation maps the memory as non-cacheable.
Let's add the "dma-coherent" attribute to the SCM for trogdor.
Adding "dma-coherent" like this fixes WiFi on sc7180-trogdor
devices. WiFi was broken as of commit
7bd6680b47fa ("Revert "Revert
"arm64: dma: Drop cache invalidation from
arch_dma_prep_coherent()"""). Specifically at bootup we'd get:
qcom_scm firmware:scm: Assign memory protection call failed -22
qcom_rmtfs_mem
94600000.memory: assign memory failed
qcom_rmtfs_mem: probe of
94600000.memory failed with error -22
From discussion on the mailing lists [2] and over IRC [3], it was
determined that we should always have been tagging the SCM as
dma-coherent on trogdor but that the old "invalidate" happened to make
things work most of the time. Tagging it properly like this is a much
more robust solution.
[1] https://chromium.googlesource.com/chromiumos/third_party/arm-trusted-firmware/+/refs/heads/firmware-trogdor-13577.B/plat/qti/common/src/qti_syscall.c
[2] https://lore.kernel.org/r/
20230614165904.1.I279773c37e2c1ed8fbb622ca6d1397aea0023526@changeid
[3] https://oftc.irclog.whitequark.org/linux-msm/2023-06-15
Fixes:
7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
Fixes:
7ec3e67307f8 ("arm64: dts: qcom: sc7180-trogdor: add initial trogdor and lazor dt")
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230616081440.v2.3.Ic62daa649b47b656b313551d646c4de9a7da4bd4@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Douglas Anderson [Fri, 16 Jun 2023 15:14:39 +0000 (08:14 -0700)]
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
sc7180-idp is, for most intents and purposes, a trogdor device.
Specifically, sc7180-idp is designed to run the same style of firmware
as trogdor devices. This can be seen from the fact that IDP has the
same "Reserved memory changes" in its device tree that trogdor has.
Recently it was realized that we need to mark SCM as dma-coherent to
match what trogdor's style of firmware (based on TF-A) does [1]. That
means we need this dma-coherent tag on IDP as well.
Without this, on newer versions of Linux, specifically those with
commit
7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache
invalidation from arch_dma_prep_coherent()"""), WiFi will fail to
work. At bootup you'll see:
qcom_scm firmware:scm: Assign memory protection call failed -22
qcom_rmtfs_mem
94600000.memory: assign memory failed
qcom_rmtfs_mem: probe of
94600000.memory failed with error -22
[1] https://lore.kernel.org/r/
20230615145253.1.Ic62daa649b47b656b313551d646c4de9a7da4bd4@changeid
Fixes:
7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
Fixes:
f5ab220d162c ("arm64: dts: qcom: sc7180: Add remoteproc enablers")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230616081440.v2.2.I3c17d546d553378aa8a0c68c3fe04bccea7cba17@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Douglas Anderson [Fri, 16 Jun 2023 15:14:38 +0000 (08:14 -0700)]
dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent
Trogdor devices use firmware backed by TF-A instead of Qualcomm's
normal TZ. On TF-A we end up mapping memory as cacheable. Specifically,
you can see in Trogdor's TF-A code [1] in qti_sip_mem_assign() that we
call qti_mmap_add_dynamic_region() with MT_RO_DATA. This translates
down to MT_MEMORY instead of MT_NON_CACHEABLE or MT_DEVICE.
Let's allow devices like trogdor to be described properly by allowing
"dma-coherent" in the SCM node.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230616081440.v2.1.Ie79b5f0ed45739695c9970df121e11d724909157@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Gavin Shan [Thu, 15 Jun 2023 05:42:59 +0000 (15:42 +1000)]
KVM: Avoid illegal stage2 mapping on invalid memory slot
We run into guest hang in edk2 firmware when KSM is kept as running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
to zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page sharing by KSM
kvm_mmu_notifier_invalidate_range_start
:
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range
kvm_set_spte_gfn (C)
:
kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is
cd4c71835228 ("
KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit
d5d8184d35c9
("KVM: ARM: Memory virtualization setup").
Cc: stable@vger.kernel.org # v3.9+
Fixes:
d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reported-by: Shuai Hu <hshuai@redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Message-Id: <
20230615054259.14911-1-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Qu Wenruo [Thu, 22 Jun 2023 06:42:40 +0000 (14:42 +0800)]
btrfs: fix remaining u32 overflows when left shifting stripe_nr
There was regression caused by
a97699d1d610 ("btrfs: replace
map_lookup->stripe_len by BTRFS_STRIPE_LEN") and supposedly fixed by
a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr").
To avoid code churn the fix was open coding the type casts but
unfortunately missed one which was still possible to hit [1].
The missing place was assignment of bioc->full_stripe_logical inside
btrfs_map_block().
Fix it by adding a helper that does the safe calculation of the offset
and use it everywhere even though it may not be strictly necessary due
to already using u64 types. This replaces all remaining
"<< BTRFS_STRIPE_LEN_SHIFT" calls.
[1] https://lore.kernel.org/linux-btrfs/
20230622065438.86402-1-wqu@suse.com/
Fixes:
a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Ming Lei [Thu, 22 Jun 2023 08:42:49 +0000 (16:42 +0800)]
block: make sure local irq is disabled when calling __blkcg_rstat_flush
When __blkcg_rstat_flush() is called from cgroup_rstat_flush*() code
path, interrupt is always disabled.
When we start to flush blkcg per-cpu stats list in __blkg_release()
for avoiding to leak blkcg_gq's reference in commit
20cb1c2fb756
("blk-cgroup: Flush stats before releasing blkcg_gq"), local irq
isn't disabled yet, then lockdep warning may be triggered because
the dependent cgroup locks may be acquired from irq(soft irq) handler.
Fix the issue by disabling local irq always.
Fixes:
20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-block/pz2wzwnmn5tk3pwpskmjhli6g3qly7eoknilb26of376c7kwxy@qydzpvt6zpis/T/#u
Cc: stable@vger.kernel.org
Cc: Jay Shin <jaeshin@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20230622084249.1208005-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Abeni [Thu, 22 Jun 2023 12:39:06 +0000 (14:39 +0200)]
Merge tag 'nf-23-06-21' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
This is v3, including a crash fix for patch 01/14.
The following patchset contains Netfilter/IPVS fixes for net:
1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.
2) Fix chain binding transaction logic, add a bound flag to rule
transactions. Remove incorrect logic in nft_data_hold() and
nft_data_release().
3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing
the set/chain as a follow up to
1240eb93f061 ("netfilter: nf_tables:
incorrect error path handling with NFT_MSG_NEWRULE")
4) Drop map element references from preparation phase instead of
set destroy path, otherwise bogus EBUSY with transactions such as:
flush chain ip x y
delete chain ip x w
where chain ip x y contains jump/goto from set elements.
5) Pipapo set type does not regard generation mask from the walk
iteration.
6) Fix reference count underflow in set element reference to
stateful object.
7) Several patches to tighten the nf_tables API:
- disallow set element updates of bound anonymous set
- disallow unbound anonymous set/chain at the end of transaction.
- disallow updates of anonymous set.
- disallow timeout configuration for anonymous sets.
8) Fix module reference leak in chain updates.
9) Fix nfnetlink_osf module autoload.
10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as
in iptables-nft.
This Netfilter batch is larger than usual at this stage, I am aware we
are fairly late in the -rc cycle, if you prefer to route them through
net-next, please let me know.
netfilter pull request 23-06-21
* tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Fix for deleting base chains with payload
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: drop module reference after updating chain
netfilter: nf_tables: disallow timeout for anonymous sets
netfilter: nf_tables: disallow updates of anonymous sets
netfilter: nf_tables: reject unbound chain set before commit phase
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: disallow element updates of bound anonymous sets
netfilter: nf_tables: fix underflow in object reference counter
netfilter: nft_set_pipapo: .walk does not deal with generations
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: fix chain binding transaction logic
ipvs: align inner_mac_header for encapsulation
====================
Link: https://lore.kernel.org/r/20230621100731.68068-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Maciej Żenczykowski [Sun, 18 Jun 2023 10:31:30 +0000 (03:31 -0700)]
revert "net: align SO_RCVMARK required privileges with SO_MARK"
This reverts commit
1f86123b9749 ("net: align SO_RCVMARK required
privileges with SO_MARK") because the reasoning in the commit message
is not really correct:
SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such
it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check
and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which
sets the socket mark and does require privs.
Additionally incoming skb->mark may already be visible if
sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
Furthermore, it is easier to block the getsockopt via bpf
(either cgroup setsockopt hook, or via syscall filters)
then to unblock it if it requires CAP_NET_RAW/ADMIN.
On Android the socket mark is (among other things) used to store
the network identifier a socket is bound to. Setting it is privileged,
but retrieving it is not. We'd like unprivileged userspace to be able
to read the network id of incoming packets (where mark is set via
iptables [to be moved to bpf])...
An alternative would be to add another sysctl to control whether
setting SO_RCVMARK is privilged or not.
(or even a MASK of which bits in the mark can be exposed)
But this seems like over-engineering...
Note: This is a non-trivial revert, due to later merged commit
e42c7beee71d
("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()")
which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
Fixes:
1f86123b9749 ("net: align SO_RCVMARK required privileges with SO_MARK")
Cc: Larysa Zaremba <larysa.zaremba@intel.com>
Cc: Simon Horman <simon.horman@corigine.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Patrick Rohr <prohr@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kees Cook [Tue, 20 Jun 2023 19:42:38 +0000 (12:42 -0700)]
net: wwan: iosm: Convert single instance struct member to flexible array
struct mux_adth actually ends with multiple struct mux_adth_dg members.
This is seen both in the comments about the member:
/**
* struct mux_adth - Structure of the Aggregated Datagram Table Header.
...
* @dg: datagramm table with variable length
*/
and in the preparation for populating it:
adth_dg_size = offsetof(struct mux_adth, dg) +
ul_adb->dg_count[i] * sizeof(*dg);
...
adth_dg_size -= offsetof(struct mux_adth, dg);
memcpy(&adth->dg, ul_adb->dg[i], adth_dg_size);
This was reported as a run-time false positive warning:
memcpy: detected field-spanning write (size 16) of single field "&adth->dg" at drivers/net/wwan/iosm/iosm_ipc_mux_codec.c:852 (size 8)
Adjust the struct mux_adth definition and associated sizeof() math; no binary
output differences are observed in the resulting object file.
Reported-by: Florian Klink <flokli@flokli.de>
Closes: https://lore.kernel.org/lkml/
dbfa25f5-64c8-5574-4f5d-
0151ba95d232@gmail.com/
Fixes:
1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
Cc: M Chetan Kumar <m.chetan.kumar@intel.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Intel Corporation <linuxwwan@intel.com>
Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620194234.never.023-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Tue, 20 Jun 2023 18:44:25 +0000 (18:44 +0000)]
sch_netem: acquire qdisc lock in netem_change()
syzbot managed to trigger a divide error [1] in netem.
It could happen if q->rate changes while netem_enqueue()
is running, since q->rate is read twice.
It turns out netem_change() always lacked proper synchronization.
[1]
divide error: 0000 [#1] SMP KASAN
CPU: 1 PID: 7867 Comm: syz-executor.1 Not tainted 6.1.30-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
RIP: 0010:div64_u64 include/linux/math64.h:69 [inline]
RIP: 0010:packet_time_ns net/sched/sch_netem.c:357 [inline]
RIP: 0010:netem_enqueue+0x2067/0x36d0 net/sched/sch_netem.c:576
Code: 89 e2 48 69 da 00 ca 9a 3b 42 80 3c 28 00 4c 8b a4 24 88 00 00 00 74 0d 4c 89 e7 e8 c3 4f 3b fd 48 8b 4c 24 18 48 89 d8 31 d2 <49> f7 34 24 49 01 c7 4c 8b 64 24 48 4d 01 f7 4c 89 e3 48 c1 eb 03
RSP: 0018:
ffffc9000dccea60 EFLAGS:
00010246
RAX:
000001a442624200 RBX:
000001a442624200 RCX:
ffff888108a4f000
RDX:
0000000000000000 RSI:
000000000000070d RDI:
000000000000070d
RBP:
ffffc9000dcceb90 R08:
ffffffff849c5e26 R09:
fffffbfff10e1297
R10:
0000000000000000 R11:
dffffc0000000001 R12:
ffff888108a4f358
R13:
dffffc0000000000 R14:
0000001a8cd9a7ec R15:
0000000000000000
FS:
00007fa73fe18700(0000) GS:
ffff8881f6b00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fa73fdf7718 CR3:
000000011d36e000 CR4:
0000000000350ee0
Call Trace:
<TASK>
[<
ffffffff84714385>] __dev_xmit_skb net/core/dev.c:3931 [inline]
[<
ffffffff84714385>] __dev_queue_xmit+0xcf5/0x3370 net/core/dev.c:4290
[<
ffffffff84d22df2>] dev_queue_xmit include/linux/netdevice.h:3030 [inline]
[<
ffffffff84d22df2>] neigh_hh_output include/net/neighbour.h:531 [inline]
[<
ffffffff84d22df2>] neigh_output include/net/neighbour.h:545 [inline]
[<
ffffffff84d22df2>] ip_finish_output2+0xb92/0x10d0 net/ipv4/ip_output.c:235
[<
ffffffff84d21e63>] __ip_finish_output+0xc3/0x2b0
[<
ffffffff84d10a81>] ip_finish_output+0x31/0x2a0 net/ipv4/ip_output.c:323
[<
ffffffff84d10f14>] NF_HOOK_COND include/linux/netfilter.h:298 [inline]
[<
ffffffff84d10f14>] ip_output+0x224/0x2a0 net/ipv4/ip_output.c:437
[<
ffffffff84d123b5>] dst_output include/net/dst.h:444 [inline]
[<
ffffffff84d123b5>] ip_local_out net/ipv4/ip_output.c:127 [inline]
[<
ffffffff84d123b5>] __ip_queue_xmit+0x1425/0x2000 net/ipv4/ip_output.c:542
[<
ffffffff84d12fdc>] ip_queue_xmit+0x4c/0x70 net/ipv4/ip_output.c:556
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620184425.1179809-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shyam Sundar S K [Thu, 22 Jun 2023 06:03:09 +0000 (11:33 +0530)]
platform/x86/amd/pmf: Register notify handler only if SPS is enabled
Power source notify handler is getting registered even when none of the
PMF feature in enabled leading to a crash.
...
[ 22.592162] Call Trace:
[ 22.592164] <TASK>
[ 22.592164] ? rcu_note_context_switch+0x5e0/0x660
[ 22.592166] ? __warn+0x81/0x130
[ 22.592171] ? rcu_note_context_switch+0x5e0/0x660
[ 22.592172] ? report_bug+0x171/0x1a0
[ 22.592175] ? prb_read_valid+0x1b/0x30
[ 22.592177] ? handle_bug+0x3c/0x80
[ 22.592178] ? exc_invalid_op+0x17/0x70
[ 22.592179] ? asm_exc_invalid_op+0x1a/0x20
[ 22.592182] ? rcu_note_context_switch+0x5e0/0x660
[ 22.592183] ? acpi_ut_delete_object_desc+0x86/0xb0
[ 22.592186] ? acpi_ut_update_ref_count.part.0+0x22d/0x930
[ 22.592187] __schedule+0xc0/0x1410
[ 22.592189] ? ktime_get+0x3c/0xa0
[ 22.592191] ? lapic_next_event+0x1d/0x30
[ 22.592193] ? hrtimer_start_range_ns+0x25b/0x350
[ 22.592196] schedule+0x5e/0xd0
[ 22.592197] schedule_hrtimeout_range_clock+0xbe/0x140
[ 22.592199] ? __pfx_hrtimer_wakeup+0x10/0x10
[ 22.592200] usleep_range_state+0x64/0x90
[ 22.592203] amd_pmf_send_cmd+0x106/0x2a0 [amd_pmf
bddfe0fe3712aaa99acce3d5487405c5213c6616]
[ 22.592207] amd_pmf_update_slider+0x56/0x1b0 [amd_pmf
bddfe0fe3712aaa99acce3d5487405c5213c6616]
[ 22.592210] amd_pmf_set_sps_power_limits+0x72/0x80 [amd_pmf
bddfe0fe3712aaa99acce3d5487405c5213c6616]
[ 22.592213] amd_pmf_pwr_src_notify_call+0x49/0x90 [amd_pmf
bddfe0fe3712aaa99acce3d5487405c5213c6616]
[ 22.592216] notifier_call_chain+0x5a/0xd0
[ 22.592218] atomic_notifier_call_chain+0x32/0x50
...
Fix this by moving the registration of source change notify handler only
when SPS(Static Slider) is advertised as supported.
Reported-by: Allen Zhong <allen@atr.me>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217571
Fixes:
4c71ae414474 ("platform/x86/amd/pmf: Add support SPS PMF feature")
Tested-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://lore.kernel.org/r/20230622060309.310001-1-Shyam-sundar.S-k@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Danielle Ratson [Tue, 20 Jun 2023 12:45:15 +0000 (14:45 +0200)]
selftests: forwarding: Fix race condition in mirror installation
When mirroring to a gretap in hardware the device expects to be
programmed with the egress port and all the encapsulating headers. This
requires the driver to resolve the path the packet will take in the
software data path and program the device accordingly.
If the path cannot be resolved (in this case because of an unresolved
neighbor), then mirror installation fails until the path is resolved.
This results in a race that causes the test to sometimes fail.
Fix this by setting the neighbor's state to permanent in a couple of
tests, so that it is always valid.
Fixes:
35c31d5c323f ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d")
Fixes:
239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/268816ac729cb6028c7a34d4dda6f4ec7af55333.1687264607.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Benjamin Berg [Wed, 21 Jun 2023 12:05:44 +0000 (14:05 +0200)]
wifi: mac80211: report all unusable beacon frames
Properly check for RX_DROP_UNUSABLE now that the new drop reason
infrastructure is used. Without this change, the comparison will always
be false as a more specific reason is given in the lower bits of result.
Fixes:
baa951a1c177 ("mac80211: use the new drop reasons infrastructure")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230621120543.412920-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 22 Jun 2023 05:44:59 +0000 (22:44 -0700)]
Merge branch 'mptcp-fixes-for-6-4'
Matthieu Baerts says:
====================
mptcp: fixes for 6.4
Patch 1 correctly handles disconnect() failures that can happen in some
specific cases: now the socket state is set as unconnected as expected.
That fixes an issue introduced in v6.2.
Patch 2 fixes a divide by zero bug in mptcp_recvmsg() with a fix similar
to a recent one from Eric Dumazet for TCP introducing sk_wait_pending
flag. It should address an issue present in MPTCP from almost the
beginning, from v5.9.
Patch 3 fixes a possible list corruption on passive MPJ even if the race
seems very unlikely, better be safe than sorry. The possible issue is
present from v5.17.
Patch 4 consolidates fallback and non fallback state machines to avoid
leaking some MPTCP sockets. The fix is likely needed for versions from
v5.11.
Patch 5 drops code that is no longer used after the introduction of
patch 4/6. This is not really a fix but this patch can probably land in
the -net tree as well not to leave unused code.
Patch 6 ensures listeners are unhashed before updating their sk status
to avoid possible deadlocks when diag info are going to be retrieved
with a lock. Even if it should not be visible with the way we are
currently getting diag info, the issue is present from v5.17.
====================
Link: https://lore.kernel.org/r/20230620-upstream-net-20230620-misc-fixes-for-v6-4-v1-0-f36aa5eae8b9@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 20 Jun 2023 16:24:23 +0000 (18:24 +0200)]
mptcp: ensure listener is unhashed before updating the sk status
The MPTCP protocol access the listener subflow in a lockless
manner in a couple of places (poll, diag). That works only if
the msk itself leaves the listener status only after that the
subflow itself has been closed/disconnected. Otherwise we risk
deadlock in diag, as reported by Christoph.
Address the issue ensuring that the first subflow (the listener
one) is always disconnected before updating the msk socket status.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/407
Fixes:
b29fcfb54cd7 ("mptcp: full disconnect implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 20 Jun 2023 16:24:22 +0000 (18:24 +0200)]
mptcp: drop legacy code around RX EOF
Thanks to the previous patch -- "mptcp: consolidate fallback and non
fallback state machine" -- we can finally drop the "temporary hack"
used to detect rx eof.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 20 Jun 2023 16:24:21 +0000 (18:24 +0200)]
mptcp: consolidate fallback and non fallback state machine
An orphaned msk releases the used resources via the worker,
when the latter first see the msk in CLOSED status.
If the msk status transitions to TCP_CLOSE in the release callback
invoked by the worker's final release_sock(), such instance of the
workqueue will not take any action.
Additionally the MPTCP code prevents scheduling the worker once the
socket reaches the CLOSE status: such msk resources will be leaked.
The only code path that can trigger the above scenario is the
__mptcp_check_send_data_fin() in fallback mode.
Address the issue removing the special handling of fallback socket
in __mptcp_check_send_data_fin(), consolidating the state machine
for fallback and non fallback socket.
Since non-fallback sockets do not send and do not receive data_fin,
the mptcp code can update the msk internal status to match the next
step in the SM every time data fin (ack) should be generated or
received.
As a consequence we can remove a bunch of checks for fallback from
the fastpath.
Fixes:
6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 20 Jun 2023 16:24:20 +0000 (18:24 +0200)]
mptcp: fix possible list corruption on passive MPJ
At passive MPJ time, if the msk socket lock is held by the user,
the new subflow is appended to the msk->join_list under the msk
data lock.
In mptcp_release_cb()/__mptcp_flush_join_list(), the subflows in
that list are moved from the join_list into the conn_list under the
msk socket lock.
Append and removal could race, possibly corrupting such list.
Address the issue splicing the join list into a temporary one while
still under the msk data lock.
Found by code inspection, the race itself should be almost impossible
to trigger in practice.
Fixes:
3e5014909b56 ("mptcp: cleanup MPJ subflow list handling")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 20 Jun 2023 16:24:19 +0000 (18:24 +0200)]
mptcp: fix possible divide by zero in recvmsg()
Christoph reported a divide by zero bug in mptcp_recvmsg():
divide error: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 19978 Comm: syz-executor.6 Not tainted 6.4.0-rc2-gffcc7899081b #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:__tcp_select_window+0x30e/0x420 net/ipv4/tcp_output.c:3018
Code: 11 ff 0f b7 cd c1 e9 0c b8 ff ff ff ff d3 e0 89 c1 f7 d1 01 cb 21 c3 eb 17 e8 2e 83 11 ff 31 db eb 0e e8 25 83 11 ff 89 d8 99 <f7> 7c 24 04 29 d3 65 48 8b 04 25 28 00 00 00 48 3b 44 24 10 75 60
RSP: 0018:
ffffc90000a07a18 EFLAGS:
00010246
RAX:
000000000000ffd7 RBX:
000000000000ffd7 RCX:
0000000000040000
RDX:
0000000000000000 RSI:
000000000003ffff RDI:
0000000000040000
RBP:
000000000000ffd7 R08:
ffffffff820cf297 R09:
0000000000000001
R10:
0000000000000000 R11:
ffffffff8103d1a0 R12:
0000000000003f00
R13:
0000000000300000 R14:
ffff888101cf3540 R15:
0000000000180000
FS:
00007f9af4c09640(0000) GS:
ffff88813bd00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000001b33824000 CR3:
000000012f241001 CR4:
0000000000170ee0
Call Trace:
<TASK>
__tcp_cleanup_rbuf+0x138/0x1d0 net/ipv4/tcp.c:1611
mptcp_recvmsg+0xcb8/0xdd0 net/mptcp/protocol.c:2034
inet_recvmsg+0x127/0x1f0 net/ipv4/af_inet.c:861
____sys_recvmsg+0x269/0x2b0 net/socket.c:1019
___sys_recvmsg+0xe6/0x260 net/socket.c:2764
do_recvmmsg+0x1a5/0x470 net/socket.c:2858
__do_sys_recvmmsg net/socket.c:2937 [inline]
__se_sys_recvmmsg net/socket.c:2953 [inline]
__x64_sys_recvmmsg+0xa6/0x130 net/socket.c:2953
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f9af58fc6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:
00007f9af4c08cd8 EFLAGS:
00000246 ORIG_RAX:
000000000000012b
RAX:
ffffffffffffffda RBX:
00000000006bc050 RCX:
00007f9af58fc6a9
RDX:
0000000000000001 RSI:
0000000020000140 RDI:
0000000000000004
RBP:
0000000000000000 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000f00 R11:
0000000000000246 R12:
00000000006bc05c
R13:
fffffffffffffea8 R14:
00000000006bc050 R15:
000000000001fe40
</TASK>
mptcp_recvmsg is allowed to release the msk socket lock when
blocking, and before re-acquiring it another thread could have
switched the sock to TCP_LISTEN status - with a prior
connect(AF_UNSPEC) - also clearing icsk_ack.rcv_mss.
Address the issue preventing the disconnect if some other process is
concurrently performing a blocking syscall on the same socket, alike
commit
4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting").
Fixes:
a6b118febbab ("mptcp: add receive buffer auto-tuning")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/404
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 20 Jun 2023 16:24:18 +0000 (18:24 +0200)]
mptcp: handle correctly disconnect() failures
Currently the mptcp code has assumes that disconnect() can fail only
at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and
don't even bother returning an error code.
Soon mptcp_disconnect() will handle more error conditions: let's track
them explicitly.
As a bonus, explicitly annotate TCP-level disconnect as not failing:
the mptcp code never blocks for event on the subflows.
Fixes:
7d803344fdc3 ("mptcp: fix deadlock in fastopen error path")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 21 Jun 2023 20:59:46 +0000 (13:59 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-21
We've added 7 non-merge commits during the last 14 day(s) which contain
a total of 7 files changed, 181 insertions(+), 15 deletions(-).
The main changes are:
1) Fix a verifier id tracking issue with scalars upon spill,
from Maxim Mikityanskiy.
2) Fix NULL dereference if an exception is generated while a BPF
subprogram is running, from Krister Johansen.
3) Fix a BTF verification failure when compiling kernel with LLVM_IAS=0,
from Florent Revest.
4) Fix expected_attach_type enforcement for kprobe_multi link,
from Jiri Olsa.
5) Fix a bpf_jit_dump issue for x86_64 to pick the correct JITed image,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
selftests/bpf: add a test for subprogram extables
bpf: ensure main program has an extable
bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable.
selftests/bpf: Add test cases to assert proper ID tracking on spill
bpf: Fix verifier id tracking of scalars on spill
====================
Link: https://lore.kernel.org/r/20230621101116.16122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 21 Jun 2023 19:36:34 +0000 (12:36 -0700)]
Merge tag 'timers-urgent-2023-06-21' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single regression fix for a regression fix:
For a long time the tick was aligned to clock MONOTONIC so that the
tick event happened at a multiple of nanoseconds per tick starting
from clock MONOTONIC = 0.
At some point this changed as the refined jiffies clocksource which is
used during boot before the TSC or other clocksources becomes usable,
was adjusted with a boot offset, so that time 0 is closer to the point
where the kernel starts.
This broke the assumption in the tick code that when the tick setup
happens early on ktime_get() will return a multiple of nanoseconds per
tick. As a consequence applications which aligned their periodic
execution so that it does not collide with the tick were not longer
guaranteed that the tick period starts from time 0.
The fix for this regression was to realign the tick when it is
initially set up to a multiple of tick periods. That works as long as
the underlying tick device supports periodic mode, but breaks under
certain conditions when the tick device supports only one shot mode.
Depending on the offset, the alignment delta to clock MONOTONIC can
get in a range where the minimal programming delta of the underlying
clock event device is larger than the calculated delta to the next
tick. This results in a boot hang as the tick code tries to play catch
up, but as the tick never fires jiffies are not advanced so it keeps
trying for ever.
Solve this by moving the tick alignement into the NOHZ / HIGHRES
enablement code because at that point it is guaranteed that the
underlying clocksource is high resolution capable and not longer
depending on the tick.
This is far before user space starts, so at the point where
applications try to align their timers, the old behaviour of the tick
happening at a multiple of nanoseconds per tick starting from clock
MONOTONIC = 0 is restored"
* tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/common: Align tick period during sched_timer setup
Linus Torvalds [Wed, 21 Jun 2023 18:10:40 +0000 (11:10 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"A last minute revert to fix a regression"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
Revert "virtio-blk: support completion batching for the IRQ path"
Linus Torvalds [Wed, 21 Jun 2023 17:58:46 +0000 (10:58 -0700)]
Revert "efi: random: refresh non-volatile random seed when RNG is initialized"
This reverts commit
e7b813b32a42a3a6281a4fd9ae7700a0257c1d50 (and the
subsequent fix for it:
41a15855c1ee "efi: random: fix NULL-deref when
refreshing seed").
It turns otu to cause non-deterministic boot stalls on at least a HP
6730b laptop.
Reported-and-bisected-by: Sami Korkalainen <sami.korkalainen@proton.me>
Link: https://lore.kernel.org/all/GQUnKz2al3yke5mB2i1kp3SzNHjK8vi6KJEh7rnLrOQ24OrlljeCyeWveLW9pICEmB9Qc8PKdNt3w1t_g3-Uvxq1l8Wj67PpoMeWDoH8PKk=@proton.me/
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 21 Jun 2023 17:32:42 +0000 (10:32 -0700)]
Merge tag 'spi-fix-v6.4-rc7' of git://git./linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One last fix for SPI, just a simple fix for incorrect handling of
probe deferral for DMA in the Qualcomm GENI driver"
* tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
Linus Torvalds [Wed, 21 Jun 2023 17:29:42 +0000 (10:29 -0700)]
Merge tag 'regulator-fix-v6.4-rc7' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One simple fix for v6.4, some incorrectly specified bitfield masks in
the PCA9450 driver"
* tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
Linus Torvalds [Wed, 21 Jun 2023 17:25:43 +0000 (10:25 -0700)]
Merge tag 'regmap-fix-v6.4-rc7' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"One more fix for v6.4
The earlier fix to take account of the register data size when
limiting raw register writes exposed the fact that the Intel AVMM bus
was incorrectly specifying too low a limit on the maximum data
transfer, it is only capable of transmitting one register so had set a
transfer size limit that couldn't fit both the value and the the
register address into a single message"
* tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: spi-avmm: Fix regmap_bus max_raw_write
Jens Axboe [Tue, 20 Jun 2023 22:11:51 +0000 (16:11 -0600)]
io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr
Rather than assign the user pointer to msghdr->msg_control, assign it
to msghdr->msg_control_user to make sparse happy. They are in a union
so the end result is the same, but let's avoid new sparse warnings and
squash this one.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202306210654.mDMcyMuB-lkp@intel.com/
Fixes:
cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 19 Jun 2023 15:41:05 +0000 (09:41 -0600)]
io_uring/net: disable partial retries for recvmsg with cmsg
We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.
Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.
Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 19 Jun 2023 15:35:34 +0000 (09:35 -0600)]
io_uring/net: clear msg_controllen on partial sendmsg retry
If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.
Cc: stable@vger.kernel.org # 5.10+
Fixes:
cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Takashi Iwai [Wed, 21 Jun 2023 13:23:23 +0000 (15:23 +0200)]
Merge tag 'asoc-fix-v6.4-rc7' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fix for v6.4
A fix for a typoed iterator in the Intel Soundwire driver, fairly simple
on inspection though not reviewed by Intel.
Luke D. Jones [Wed, 21 Jun 2023 08:57:15 +0000 (20:57 +1200)]
ALSA: hda/realtek: Add quirk for ASUS ROG GV601V
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG GV601V series.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230621085715.5382-1-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Jiri Olsa [Sun, 18 Jun 2023 13:14:14 +0000 (15:14 +0200)]
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
We currently allow to create perf link for program with
expected_attach_type == BPF_TRACE_KPROBE_MULTI.
This will cause crash when we call helpers like get_attach_cookie or
get_func_ip in such program, because it will call the kprobe_multi's
version (current->bpf_ctx context setup) of those helpers while it
expects perf_link's current->bpf_ctx context setup.
Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type
only for programs attaching through kprobe_multi link.
Fixes:
ca74823c6e16 ("bpf: Add cookie support to programs attached with kprobe multi link")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230618131414.75649-1-jolsa@kernel.org
Florent Revest [Thu, 15 Jun 2023 14:56:07 +0000 (16:56 +0200)]
bpf/btf: Accept function names that contain dots
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM
leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn,
pahole creates BTF_KIND_FUNC entries for these and this makes the BTF
metadata validation fail because they contain a dot.
In a dramatic turn of event, this BTF verification failure can cause
the netfilter_bpf initialization to fail, causing netfilter_core to
free the netfilter_helper hashmap and netfilter_ftp to trigger a
use-after-free. The risk of u-a-f in netfilter will be addressed
separately but the existence of "asan.module_ctor" debug info under some
build conditions sounds like a good enough reason to accept functions
that contain dots in BTF.
Although using only LLVM=1 is the recommended way to compile clang-based
kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still
try to support that combination according to Nick. To clarify:
- > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended,
but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue
- <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in
which case GNU as will be used
Fixes:
1dc92851849c ("bpf: kernel side support for BTF Var and DataSec")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Yonghong Song <yhs@meta.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/bpf/20230615145607.3469985-1-revest@chromium.org
Michael S. Tsirkin [Thu, 8 Jun 2023 21:42:53 +0000 (17:42 -0400)]
Revert "virtio-blk: support completion batching for the IRQ path"
This reverts commit
07b679f70d73483930e8d3c293942416d9cd5c13.
This change appears to have broken things...
We now see applications hanging during disk accesses.
e.g.
multi-port virtio-blk device running in h/w (FPGA)
Host running a simple 'fio' test.
[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp
i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
This is repeatedly run in a loop.
After a few, normally <10 seconds, fio hangs.
With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging.
Last message:
fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
I think this means at the end of the run 1 queue was left incomplete.
'diskstats' (run while fio is hung) shows no outstanding transactions.
e.g.
$ cat /proc/diskstats
...
252 0 vda
1843140071 0
14745120568 712568645 0 0 0 0 0 3117947
712568645 0 0 0 0 0 0
252 16 vdb
1816291511 0
14530332088 704905623 0 0 0 0 0 3117711
704905623 0 0 0 0 0 0
...
Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called.
e.g.
PF= 0 vq=0 1 2 3
[a]request_count -
839416590 813148916 105586179 84988123
[b]completion1_count -
839416590 813148916 105586179 84988123
[c]completion2_count - 0 0 0 0
PF= 1 vq=0 1 2 3
[a]request_count -
823335887 812516140 104582672 75856549
[b]completion1_count -
823335887 812516140 104582672 75856549
[c]completion2_count - 0 0 0 0
i.e. the issue is after the virtio-blk driver.
This change was introduced in kernel 6.3.0.
I am seeing this using 6.3.3.
If I run with an earlier kernel (5.15), it does not occur.
If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail.
e.g.
kernel 5.15 - this is OK
virtio_blk.c,virtblk_done() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
blk_mq_complete_request(req);
}
kernel 6.3.3 - this fails
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
If I do, kernel 6.3.3 - this is OK
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
virtblk_request_done(req); //force this here...
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
Perhaps you might like to fix/test/revert this change...
Martin
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202306090826.C1fZmdMe-lkp@intel.com/
Cc: Suwan Kim <suwan.kim027@gmail.com>
Tested-by: edliaw@google.com
Reported-by: "Roberts, Martin" <martin.roberts@intel.com>
Message-Id: <
336455b4f630f329380a8f53ee8cad3868764d5c.
1686295549.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Francesco Dolcini [Mon, 19 Jun 2023 15:44:35 +0000 (17:44 +0200)]
Revert "net: phy: dp83867: perform soft reset and retain established link"
This reverts commit
da9ef50f545f86ffe6ff786174d26500c4db737a.
This fixes a regression in which the link would come up, but no
communication was possible.
The reverted commit was also removing a comment about
DP83867_PHYCR_FORCE_LINK_GOOD, this is not added back in this commits
since it seems that this is unrelated to the original code change.
Closes: https://lore.kernel.org/all/ZGuDJos8D7N0J6Z2@francesco-nb.int.toradex.com/
Fixes:
da9ef50f545f ("net: phy: dp83867: perform soft reset and retain established link")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Praneeth Bajjuri <praneeth@ti.com>
Link: https://lore.kernel.org/r/20230619154435.355485-1-francesco@dolcini.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Mon, 19 Jun 2023 09:49:48 +0000 (17:49 +0800)]
net: mdio: fix the wrong parameters
PHY address and device address are passed in the wrong order.
Cc: stable@vger.kernel.org
Fixes:
4e4aafcddbbf ("net: mdio: Add dedicated C45 API to MDIO bus drivers")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230619094948.84452-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 21 Jun 2023 00:20:22 +0000 (17:20 -0700)]
Merge tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git./linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"19 hotfixes. 8 of these are cc:stable.
This includes a wholesale reversion of the post-6.4 series 'make slab
shrink lockless'. After input from Dave Chinner it has been decided
that we should go a different way [1]"
Link: https://lkml.kernel.org/r/ZH6K0McWBeCjaf16@dread.disaster.area
* tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
selftests/mm: fix cross compilation with LLVM
mailmap: add entries for Ben Dooks
nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
Revert "mm: vmscan: make global slab shrink lockless"
Revert "mm: vmscan: make memcg slab shrink lockless"
Revert "mm: vmscan: add shrinker_srcu_generation"
Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless"
Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred"
Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()"
Revert "mm: shrinkers: convert shrinker_rwsem to mutex"
nilfs2: fix buffer corruption due to concurrent device reads
scripts/gdb: fix SB_* constants parsing
scripts: fix the gfp flags header path in gfp-translate
udmabuf: revert 'Add support for mapping hugepages (v4)'
mm/khugepaged: fix iteration in collapse_file
memfd: check for non-NULL file_seals in memfd_create() syscall
mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
mm/mprotect: fix do_mprotect_pkey() limit check
writeback: fix dereferencing NULL mapping->host on writeback_page_template
Linus Torvalds [Tue, 20 Jun 2023 22:45:34 +0000 (15:45 -0700)]
Merge tag 'acpi-6.4-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a kernel crash during early resume from ACPI S3 that has been
present since the 5.15 cycle when might_sleep() was added to
down_timeout(), which in some configurations of the kernel caused an
implicit preemption point to trigger at a wrong time"
* tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()
Linus Torvalds [Tue, 20 Jun 2023 22:39:41 +0000 (15:39 -0700)]
Merge tag 'thermal-6.4-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix a regression introduced during the 6.3 cycle causing
intel_soc_dts_iosf to report incorrect temperature values
due to a coding mistake (Hans de Goede)"
* tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/intel/intel_soc_dts_iosf: Fix reporting wrong temperatures
Linus Torvalds [Tue, 20 Jun 2023 22:01:08 +0000 (15:01 -0700)]
Merge tag 'trace-v6.4-rc6' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix MAINTAINERS file to point to proper mailing list for rtla and rv
The mailing list pointed to linux-trace-devel instead of
linux-trace-kernel. The former is for the tracing libraries and the
latter is for anything in the Linux kernel tree. The wrong mailing
list was used because linux-trace-kernel did not exist when rtla and
rv were created.
- User events:
- Fix matching of dynamic events to their user events
When user writes to dynamic_events file, a lookup of the
registered dynamic events is made, but there were some cases that
a match could be incorrectly made.
- Add auto cleanup of user events
Have the user events automatically get removed when the last
reference (file descriptor) is closed. This was asked for to
prevent leaks of user events hanging around needing admins to
clean them up.
- Add persistent logic (but not let user space use it yet)
In some cases, having a persistent user event (one that does not
get cleaned up automatically) is useful. But there's still debates
about how to expose this to user space. The infrastructure is
added, but the API is not.
- Update the selftests
Update the user event selftests to reflect the above changes"
* tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/user_events: Document auto-cleanup and remove dyn_event refs
selftests/user_events: Adapt dyn_test to non-persist events
selftests/user_events: Ensure auto cleanup works as expected
tracing/user_events: Add auto cleanup and future persist flag
tracing/user_events: Track refcount consistently via put/get
tracing/user_events: Store register flags on events
tracing/user_events: Remove user_ns walk for groups
selftests/user_events: Add perf self-test for empty arguments events
selftests/user_events: Clear the events after perf self-test
selftests/user_events: Add ftrace self-test for empty arguments events
tracing/user_events: Fix the incorrect trace record for empty arguments events
tracing: Modify print_fields() for fields output order
tracing/user_events: Handle matching arguments that is null from dyn_events
tracing/user_events: Prevent same name but different args event
tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list
Linus Torvalds [Tue, 20 Jun 2023 21:38:21 +0000 (14:38 -0700)]
Merge tag 'for-6.4-rc7-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more regression fix for an assertion failure that uncovered a
nasty problem with stripe calculations. This is caused by a u32
overflow when there are enough devices. The fstests require 6 so this
hasn't been caught, I was able to hit it with 8.
The fix is minimal and only adds u64 casts, we'll clean that up later.
I did various additional tests to be sure"
* tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix u32 overflows when left shifting stripe_nr
Phil Sutter [Fri, 16 Jun 2023 15:56:11 +0000 (17:56 +0200)]
netfilter: nf_tables: Fix for deleting base chains with payload
When deleting a base chain, iptables-nft simply submits the whole chain
to the kernel, including the NFTA_CHAIN_HOOK attribute. The new code
added by fixed commit then turned this into a chain update, destroying
the hook but not the chain itself. Detect the situation by checking if
the chain type is either netdev or inet/ingress.
Fixes:
7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Thu, 15 Jun 2023 08:14:25 +0000 (10:14 +0200)]
netfilter: nfnetlink_osf: fix module autoload
Move the alias from xt_osf to nfnetlink_osf.
Fixes:
f9324952088f ("netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.c")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Wed, 14 Jun 2023 21:20:18 +0000 (23:20 +0200)]
netfilter: nf_tables: drop module reference after updating chain
Otherwise the module reference counter is leaked.
Fixes
b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 13:22:18 +0000 (15:22 +0200)]
netfilter: nf_tables: disallow timeout for anonymous sets
Never used from userspace, disallow these parameters.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 13:22:01 +0000 (15:22 +0200)]
netfilter: nf_tables: disallow updates of anonymous sets
Disallow updates of set timeout and garbage collection parameters for
anonymous sets.
Fixes:
123b99619cca ("netfilter: nf_tables: honor set timeout and garbage collection updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 13:21:39 +0000 (15:21 +0200)]
netfilter: nf_tables: reject unbound chain set before commit phase
Use binding list to track set transaction and to check for unbound
chains before entering the commit phase.
Bail out if chain binding remain unused before entering the commit
step.
Fixes:
d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 13:21:33 +0000 (15:21 +0200)]
netfilter: nf_tables: reject unbound anonymous set before commit phase
Add a new list to track set transaction and to check for unbound
anonymous sets before entering the commit phase.
Bail out at the end of the transaction handling if an anonymous set
remains unbound.
Fixes:
96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 13:20:16 +0000 (15:20 +0200)]
netfilter: nf_tables: disallow element updates of bound anonymous sets
Anonymous sets come with NFT_SET_CONSTANT from userspace. Although API
allows to create anonymous sets without NFT_SET_CONSTANT, it makes no
sense to allow to add and to delete elements for bound anonymous sets.
Fixes:
96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 13:20:08 +0000 (15:20 +0200)]
netfilter: nf_tables: fix underflow in object reference counter
Since ("netfilter: nf_tables: drop map element references from
preparation phase"), integration with commit protocol is better,
therefore drop the workaround that
b91d90368837 ("netfilter: nf_tables:
fix leaking object reference count") provides.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 13:20:04 +0000 (15:20 +0200)]
netfilter: nft_set_pipapo: .walk does not deal with generations
The .walk callback iterates over the current active set, but it might be
useful to iterate over the next generation set. Use the generation mask
to determine what set view (either current or next generation) is use
for the walk iteration.
Fixes:
3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 12:51:49 +0000 (14:51 +0200)]
netfilter: nf_tables: drop map element references from preparation phase
set .destroy callback releases the references to other objects in maps.
This is very late and it results in spurious EBUSY errors. Drop refcount
from the preparation phase instead, update set backend not to drop
reference counter from set .destroy path.
Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the
reference counter because the transaction abort path releases the map
references for each element since the set is unbound. The abort path
also deals with releasing reference counter for new elements added to
unbound sets.
Fixes:
591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 12:45:26 +0000 (14:45 +0200)]
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
Add a new state to deal with rule expressions deactivation from the
newrule error path, otherwise the anonymous set remains in the list in
inactive state for the next generation. Mark the set/chain transaction
as unbound so the abort path releases this object, set it as inactive in
the next generation so it is not reachable anymore from this transaction
and reference counter is dropped.
Fixes:
1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 16 Jun 2023 12:45:22 +0000 (14:45 +0200)]
netfilter: nf_tables: fix chain binding transaction logic
Add bound flag to rule and chain transactions as in
6a0a8d10a366
("netfilter: nf_tables: use-after-free in failing rule with bound set")
to skip them in case that the chain is already bound from the abort
path.
This patch fixes an imbalance in the chain use refcnt that triggers a
WARN_ON on the table and chain destroy path.
This patch also disallows nested chain bindings, which is not
supported from userspace.
The logic to deal with chain binding in nft_data_hold() and
nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a
special handling in case a chain is bound but next expressions in the
same rule fail to initialize as described by
1240eb93f061 ("netfilter:
nf_tables: incorrect error path handling with NFT_MSG_NEWRULE").
The chain is left bound if rule construction fails, so the objects
stored in this chain (and the chain itself) are released by the
transaction records from the abort path, follow up patch ("netfilter:
nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
completes this error handling.
When deleting an existing rule, chain bound flag is set off so the
rule expression .destroy path releases the objects.
Fixes:
d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Russ Weight [Tue, 20 Jun 2023 20:28:24 +0000 (13:28 -0700)]
regmap: spi-avmm: Fix regmap_bus max_raw_write
The max_raw_write member of the regmap_spi_avmm_bus structure is defined
as:
.max_raw_write = SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
SPI_AVMM_VAL_SIZE == 4 and MAX_WRITE_CNT == 1 so this results in a
maximum write transfer size of 4 bytes which provides only enough space to
transfer the address of the target register. It provides no space for the
value to be transferred. This bug became an issue (divide-by-zero in
_regmap_raw_write()) after the following was accepted into mainline:
commit
3981514180c9 ("regmap: Account for register length when chunking")
Change max_raw_write to include space (4 additional bytes) for both the
register address and value:
.max_raw_write = SPI_AVMM_REG_SIZE + SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
Fixes:
7f9fb67358a2 ("regmap: add Intel SPI Slave to AVMM Bus Bridge support")
Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Signed-off-by: Russ Weight <russell.h.weight@intel.com>
Link: https://lore.kernel.org/r/20230620202824.380313-1-russell.h.weight@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Jeff Layton [Wed, 19 Apr 2023 11:24:46 +0000 (07:24 -0400)]
drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
I've been experiencing some intermittent crashes down in the display
driver code. The symptoms are ususally a line like this in dmesg:
amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port
000000006d3a3885: -5
...followed by an Oops due to a NULL pointer dereference.
Switch to using mgr->dev instead of state->dev since "state" can be
NULL in some cases.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855
Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230419112447.18471-1-jlayton@kernel.org
Mukesh Sisodiya [Mon, 19 Jun 2023 15:02:34 +0000 (17:02 +0200)]
wifi: iwlwifi: pcie: Handle SO-F device for PCI id 0x7AF0
Add support for AX1690i and AX1690s devices with
PCIE id 0x7AF0.
Cc: stable@vger.kernel.org # 6.1+
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230619150233.461290-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ross Lagerwall [Fri, 16 Jun 2023 16:45:49 +0000 (17:45 +0100)]
be2net: Extend xmit workaround to BE3 chip
We have seen a bug where the NIC incorrectly changes the length in the
IP header of a padded packet to include the padding bytes. The driver
already has a workaround for this so do the workaround for this NIC too.
This resolves the issue.
The NIC in question identifies itself as follows:
[ 8.828494] be2net 0000:02:00.0: FW version is 10.7.110.31
[ 8.834759] be2net 0000:02:00.0: Emulex OneConnect(be3): PF FLEX10 port 1
02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
Fixes:
ca34fe38f06d ("be2net: fix wrong usage of adapter->generation")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Link: https://lore.kernel.org/r/20230616164549.2863037-1-ross.lagerwall@citrix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Tue, 20 Jun 2023 18:50:40 +0000 (11:50 -0700)]
Merge tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Four smb3 server fixes, all also for stable:
- fix potential oops in parsing compounded requests
- fix various paths (mkdir, create etc) where mnt_want_write was not
checked first
- fix slab out of bounds in check_message and write"
* tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate session id and tree id in the compound request
ksmbd: fix out-of-bound read in smb2_write
ksmbd: add mnt_want_write to ksmbd vfs functions
ksmbd: validate command payload size
Qu Wenruo [Tue, 20 Jun 2023 09:57:31 +0000 (17:57 +0800)]
btrfs: fix u32 overflows when left shifting stripe_nr
[BUG]
David reported an ASSERT() get triggered during fio load on 8 devices
with data/raid6 and metadata/raid1c3:
fio --rw=randrw --randrepeat=1 --size=3000m \
--bsrange=512b-64k --bs_unaligned \
--ioengine=libaio --fsync=1024 \
--name=job0 --name=job1 \
The ASSERT() is from rbio_add_bio() of raid56.c:
ASSERT(orig_logical >= full_stripe_start &&
orig_logical + orig_len <= full_stripe_start +
rbio->nr_data * BTRFS_STRIPE_LEN);
Which is checking if the target rbio is crossing the full stripe
boundary.
[100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
[100.795] ------------[ cut here ]------------
[100.796] kernel BUG at fs/btrfs/raid56.c:1622!
[100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
[100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
[100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
[100.806] RSP: 0018:
ffff888104a8f300 EFLAGS:
00010246
[100.808] RAX:
00000000000000a1 RBX:
ffff8881075907e0 RCX:
ffffed1020951e01
[100.809] RDX:
0000000000000000 RSI:
0000000000000008 RDI:
0000000000000001
[100.811] RBP:
0000000141d20000 R08:
0000000000000001 R09:
ffff888104a8f04f
[100.813] R10:
ffffed1020951e09 R11:
0000000000000003 R12:
ffff88810e87f400
[100.815] R13:
0000000041d20000 R14:
0000000144529000 R15:
ffff888101524000
[100.817] FS:
0000000000000000(0000) GS:
ffff88811ac00000(0000) knlGS:
0000000000000000
[100.821] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[100.822] CR2:
000055d54e44c270 CR3:
000000010a9a1006 CR4:
00000000003706a0
[100.824] Call Trace:
[100.825] <TASK>
[100.825] ? die+0x32/0x80
[100.826] ? do_trap+0x12d/0x160
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.829] ? do_error_trap+0x90/0x130
[100.830] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.831] ? handle_invalid_op+0x2c/0x30
[100.833] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.835] ? exc_invalid_op+0x29/0x40
[100.836] ? asm_exc_invalid_op+0x16/0x20
[100.837] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.837] raid56_parity_write+0x64/0x270 [btrfs]
[100.838] btrfs_submit_chunk+0x26e/0x800 [btrfs]
[100.840] ? btrfs_bio_init+0x80/0x80 [btrfs]
[100.841] ? release_pages+0x503/0x6d0
[100.842] ? folio_unlock+0x2f/0x60
[100.844] ? __folio_put+0x60/0x60
[100.845] ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
[100.847] btrfs_submit_bio+0x21/0x60 [btrfs]
[100.847] submit_one_bio+0x6a/0xb0 [btrfs]
[100.849] extent_write_cache_pages+0x395/0x680 [btrfs]
[100.850] ? __extent_writepage+0x520/0x520 [btrfs]
[100.851] ? mark_usage+0x190/0x190
[100.852] extent_writepages+0xdb/0x130 [btrfs]
[100.853] ? extent_write_locked_range+0x480/0x480 [btrfs]
[100.854] ? mark_usage+0x190/0x190
[100.854] ? attach_extent_buffer_page+0x220/0x220 [btrfs]
[100.855] ? reacquire_held_locks+0x178/0x280
[100.856] ? writeback_sb_inodes+0x245/0x7f0
[100.857] do_writepages+0x102/0x2e0
[100.858] ? page_writeback_cpu_online+0x10/0x10
[100.859] ? __lock_release.isra.0+0x14a/0x4d0
[100.860] ? reacquire_held_locks+0x280/0x280
[100.861] ? __lock_acquired+0x1e9/0x3d0
[100.862] ? do_raw_spin_lock+0x1b0/0x1b0
[100.863] __writeback_single_inode+0x94/0x450
[100.864] writeback_sb_inodes+0x372/0x7f0
[100.864] ? lock_sync+0xd0/0xd0
[100.865] ? do_raw_spin_unlock+0x93/0xf0
[100.866] ? sync_inode_metadata+0xc0/0xc0
[100.867] ? rwsem_optimistic_spin+0x340/0x340
[100.868] __writeback_inodes_wb+0x70/0x130
[100.869] wb_writeback+0x2d1/0x530
[100.869] ? __writeback_inodes_wb+0x130/0x130
[100.870] ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
[100.870] wb_do_writeback+0x3eb/0x480
[100.871] ? wb_writeback+0x530/0x530
[100.871] ? mark_lock_irq+0xcd0/0xcd0
[100.872] wb_workfn+0xe0/0x3f0<
[CAUSE]
Commit
a97699d1d610 ("btrfs: replace map_lookup->stripe_len by
BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
u64 division.
Function btrfs_max_io_len() is to get the length to the stripe boundary.
It calculates the full stripe start offset (inside the chunk) by the
following code:
*full_stripe_start =
rounddown(*stripe_nr, nr_data_stripes(map)) <<
BTRFS_STRIPE_LEN_SHIFT;
The calculation itself is fine, but the value returned by rounddown() is
dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
returned int).
Thus the result is also u32, then we do the left shift, which can
overflow u32.
If such overflow happens, @full_stripe_start will be a value way smaller
than @offset, causing later "full_stripe_len - (offset -
*full_stripe_start)" to underflow, thus make later length calculation to
have no stripe boundary limit, resulting a write bio to exceed stripe
boundary.
There are some other locations like this, with a u32 @stripe_nr got left
shift, which can lead to a similar overflow.
[FIX]
Fix all @stripe_nr with left shift with a type cast to u64 before the
left shift.
Those involved @stripe_nr or similar variables are recording the stripe
number inside the chunk, which is small enough to be contained by u32,
but their offset inside the chunk can not fit into u32.
Thus for those specific left shifts, a type cast to u64 is necessary so
this patch does not touch them and the code will be cleaned up in the
future to keep the fix minimal.
Reported-by: David Sterba <dsterba@suse.com>
Fixes:
a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>